CL-bench Life, from the Tencent Hunyuan team and Fudan University’s NLP group, is a companion to CL-bench that shifts from clean professional sources to messy, real-life context. Instead of well-structured reference material, models are given everyday communication, fragmented notes, and behavioral traces that are socially grounded and temporally dispersed, and must learn from this context to complete tasks.
Tasks span three categories: communication & social interactions, fragmented information & revisions, and behavioral records & activity trails.
We source results from the public CL-bench Life leaderboard, where the “Life” results appear as a tab alongside the main CL-bench leaderboard.
CL-bench Life contains 405 expert-curated context–task pairs with 5,348 verification rubrics across its three categories. As in CL-bench, scoring is rubric-based and the headline metric is the solving rate, the percentage of tasks solved against their rubrics. Our chart also exposes the per-category solving rates.
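As a rough illustration of how a solving rate and its per-category breakdown can be computed from rubric outcomes, here is a minimal sketch. It assumes a task counts as solved only when every one of its rubrics passes; that rule, the `solving_rates` function, the record layout, and the toy data are all hypothetical and are not taken from the CL-bench Life evaluation code.

```python
from collections import defaultdict


def solving_rates(results):
    """Return (overall %, {category: %}) for a list of task records.

    Each record is a dict with a 'category' string and a 'rubric_passed'
    list of booleans (hypothetical layout, one boolean per rubric).
    """
    solved = defaultdict(int)
    total = defaultdict(int)
    for task in results:
        category = task["category"]
        total[category] += 1
        rubrics = task["rubric_passed"]
        # Assumption: a task is solved only if it has rubrics and all pass.
        if rubrics and all(rubrics):
            solved[category] += 1

    per_category = {c: 100.0 * solved[c] / total[c] for c in total}
    n_tasks = sum(total.values())
    overall = 100.0 * sum(solved.values()) / n_tasks if n_tasks else 0.0
    return overall, per_category


# Toy data for illustration only; these are not real benchmark results.
toy_results = [
    {"category": "communication & social interactions", "rubric_passed": [True, True]},
    {"category": "communication & social interactions", "rubric_passed": [True, False]},
    {"category": "fragmented information & revisions", "rubric_passed": [True]},
    {"category": "behavioral records & activity trails", "rubric_passed": [False]},
]

overall, per_category = solving_rates(toy_results)
print(f"overall solving rate: {overall:.1f}%")
for category, rate in per_category.items():
    print(f"{category}: {rate:.1f}%")
```

Under this assumption a single failed rubric marks the whole task as unsolved, so the solving rate can sit well below the share of individual rubrics that pass.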
For full details, see the CL-bench Life paper and dataset.