Кластер #3994 - News Clusters

LLM Research Papers: The 2025 List (January to June)

closed
Тип события	other
Тема	reasoning models
Организация
Страна
Статей	1
Уник. источников	1
Важность / Момент	0.69 / 0
Период	01.07.2025 11:11 — 01.07.2025 11:11
Создан	06.04.2026 06:18:06
Статьи в кластере 1

Заголовок
Источник
Дата публикации
Score
LLM Research Papers: The 2025 List (January to June)
ahead_of_ai
01.07.2025 11:11
Embedding sim.	1
Entity overlap	1
Title sim.	1
Time proximity	1
NLP тип	other
NLP организация
NLP тема	reasoning models
NLP страна
Открыть оригинал
As some of you know, I keep a running list of research papers I (want to) read and reference.
 About six months ago, I shared my 2024 list , which many readers found useful. So, I was thinking about doing this again. However, this time, I am incorporating that one piece of feedback kept coming up: "Can you organize the papers by topic instead of date?" 
 The categories I came up with are:
 Reasoning Models
 - 1a. Training Reasoning Models
 - 1b. Inference-Time Reasoning Strategies
 - 1c. Evaluating LLMs and/or Understanding Reasoning

 Other Reinforcement Learning Methods for LLMs

 Other Inference-Time Scaling Methods

 Efficient Training & Architectures

 Diffusion-Based Language Models

 Multimodal & Vision-Language Models

 Data & Pre-training Datasets

 Also, as LLM research continues to be shared at a rapid pace, I have decided to break the list into bi-yearly updates. This way, the list stays digestible, timely, and hopefully useful for anyone looking for solid summer reading material.
 Please note that this is just a curated list for now. In future articles, I plan to revisit and discuss some of the more interesting or impactful papers in larger topic-specific write-ups. Stay tuned!
 
 Announcement:
 It's summer! And that means internship season, tech interviews, and lots of learning.
To support those brushing up on intermediate to advanced machine learning and AI topics, I have made all 30 chapters of my Machine Learning Q and AI book freely available for the summer: 

&#128279; https://sebastianraschka.com/books/ml-q-and-ai/#table-of-contents 

Whether you are just curious and want to learn something new or prepping for interviews, hopefully this comes in handy. 

Happy reading, and best of luck if you are interviewing!
 
Reasoning Models 
 This year, my list is very reasoning model-heavy. So, I decided to subdivide it into 3 categories: Training, inference-time scaling, and more general understanding/evaluation.
 1a. Training Reasoning Models 
 This subsection focuses on training strategies specifically designed to improve reasoning abilities in LLMs. As you may see, much of the recent progress has centered around reinforcement learning (with verifiable rewards), which I covered in more detail in a previous article.
 
 Annotated figure from Reinforcement Pre-Training, https://arxiv.org/abs/2506.08007 
Jan, Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought, https://arxiv.org/abs/2501.04682 

Jan, The Lessons of Developing Process Reward Models in Mathematical Reasoning, https://arxiv.org/abs/2501.07301 

Jan, Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models, https://arxiv.org/abs/2501.09686 

Jan, Reasoning Language Models: A Blueprint, https://arxiv.org/abs/2501.11223 

Jan, Kimi k1.5: Scaling Reinforcement Learning with LLMs, https://arxiv.org/abs//2501.12599 

Jan, DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning, https://arxiv.org/abs/2501.12948 

Feb, Competitive Programming with Large Reasoning Models, https://arxiv.org/abs/2502.06807 

Feb, Demystifying Long Chain-of-Thought Reasoning in LLMs, Demystifying Long Chain-of-Thought Reasoning in LLMs, https://arxiv.org/abs/2502.03373 

Feb, LIMO: Less is More for Reasoning, https://arxiv.org/abs/2502.03387 

Feb, Teaching Language Models to Critique via Reinforcement Learning, https://arxiv.org/abs/2502.03492 

Feb, Training Language Models to Reason Efficiently, https://arxiv.org/abs/2502.04463 

Feb, Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning, https://arxiv.org/abs/2502.06781 

Feb, On the Emergence of Thinking in LLMs I: Searching for the Right Intuition, https://arxiv.org/abs/2502.06773 

Feb, LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters!, https://arxiv.org/abs/2502.07374 

Feb, Fino1: On the Transferability of Reasoning Enhanced LLMs to Finance, https://arxiv.org/abs/2502.08127 

Feb, Adapting Language-Specific LLMs to a Reasoning Model in One Day via Model Merging - An Open Recipe, https://arxiv.org/abs/2502.09056 

Feb, Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning, https://arxiv.org/abs/2502.14768 

Feb, SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution, https://arxiv.org/abs/2502.18449 

Mar, Learning from Failures in Multi-Attempt Reinforcement Learning, https://arxiv.org/abs/2503.04808 

Mar, The First Few Tokens Are All You Need: An Efficient and Effective Unsupervised Prefix Fine-Tuning Method for Reasoning Models, https://arxiv.org/abs/2503.02875 

Mar, R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning, https://arxiv.org/abs/2503.05592 

Mar, LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL, https://arxiv.org/abs/2503.07536 

Mar, Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning, https://arxiv.org/abs/2503.09516 

Mar, Towards Hierarchical Multi-Step Reward Models for Enhanced Reasoning in Large Language Models, https://arxiv.org/abs/2503.13551 

Mar, Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't, https://arxiv.org/abs/2503.16219 

Mar, ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning, https://arxiv.org/abs/2503.19470 

Mar, Understanding R1-Zero-Like Training: A Critical Perspective, https://arxiv.org/abs/2503.20783 

Mar, RARE: Retrieval-Augmented Reasoning Modeling, https://arxiv.org/abs/2503.23513 

Mar, Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model, https://arxiv.org/abs/2503.24290 

Mar, JudgeLRM: Large Reasoning Models as a Judge, https://arxiv.org/abs/2504.00050 

Apr, Concise Reasoning via Reinforcement Learning, https://arxiv.org/abs/2504.05185 

Apr, VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning, https://arxiv.org/abs/2504.08837 

Apr, Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning, https://arxiv.org/abs/2504.08672 

Apr, Leveraging Reasoning Model Answers to Enhance Non-Reasoning Model Capability, https://arxiv.org/abs/2504.09639 

Apr, Learning to Reason under Off-Policy Guidance, https://arxiv.org/abs/2504.14945 

Apr, Tina: Tiny Reasoning Models via LoRA, https://arxiv.org/abs/2504.15777 

Apr, Reinforcement Learning for Reasoning in Large Language Models with One Training Example, https://arxiv.org/abs/2504.20571 

Apr, Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math, https://arxiv.org/abs/2504.21233 

May, Llama-Nemotron: Efficient Reasoning Models, https://arxiv.org/abs/2505.00949 

May, RM-R1: Reward Modeling as Reasoning, https://arxiv.org/abs/2505.02387 

May, Absolute Zero: Reinforced Self-play Reasoning with Zero Data, https://arxiv.org/abs/2505.03335 

May, INTELLECT-2: A Reasoning Model Trained Through Globally Decentralized Reinforcement Learning, https://arxiv.org/abs/2505.07291 

May, MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining, https://arxiv.org/abs/2505.07608 

May, Qwen3 Technical Report, https://arxiv.org/abs/2505.09388 

May, Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models, https://arxiv.org/abs/2505.10554 

May, AdaptThink: Reasoning Models Can Learn When to Think, https://arxiv.org/abs/2505.13417 

May, Thinkless: LLM Learns When to Think, https://arxiv.org/abs/2505.13379 

May, General-Reasoner: Advancing LLM Reasoning Across All Domains, https://arxiv.org/abs/2505.14652 

May, Learning to Reason via Mixture-of-Thought for Logical Reasoning, https://arxiv.org/abs/2505.15817 

May, RL Tango: Reinforcing Generator and Verifier Together for Language Reasoning, https://arxiv.org/abs/2505.15034 

May, QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning, https://www.arxiv.org/abs/2505.17667 

May, Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles, https://arxiv.org/abs/2505.19914 

May, Learning to Reason without External Rewards, https://arxiv.org/abs/2505.19590 

May, Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents, https://arxiv.org/abs/2505.22954 

May, Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning, https://arxiv.org/abs/2505.24726 

May, ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models, https://arxiv.org/abs/2505.24864 

Jun, Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning, https://arxiv.org/abs/2506.01939 

Jun, Rewarding the Unlikely: Lifting GRPO Beyond Distribution Sharpening, https://www.arxiv.org/abs/2506.02355 

Jun, Reinforcement Pre-Training, https://arxiv.org/abs/2506.08007 

Jun, RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling, https://arxiv.org/abs/2506.08672 

Jun, Reinforcement Learning Teachers of Test Time Scaling, https://www.arxiv.org/abs/2506.08388 

Jun, Magistral, https://arxiv.org/abs/2506.10910 

Jun, Spurious Rewards: Rethinking Training Signals in RLVR, https://arxiv.org/abs/2506.10947 

Jun, AlphaEvolve: A coding agent for scientific and algorithmic discovery, https://arxiv.org/abs/2506.13131 

Jun, Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs, https://arxiv.org/abs/2506.14245 

Jun, Programming by Backprop: LLMs Acquire Reusable Algorithmic Abstractions During Code Training, https://arxiv.org/abs/2506.18777 

Jun, Bridging Offline and Online Reinforcement Learning for LLMs, https://arxiv.org/abs/2506.21495 

 1b. Inference-Time Reasoning Strategies 
 This part of the list covers methods that improve reasoning dynamically at test time, without requiring retraining. Often, these papers are focused on trading of computational performance for modeling performance.
 
 Read more