(Rio, April 26/27, 2026)
Accepted Papers
- NANOZK: Layerwise Zero-Knowledge Proofs for Verifiable Large Language Model Inference
  Zhaohui Wang
- Beaver: An Efficient Deterministic LLM Verifier
  Tarun Suresh, Nalin Wadhwa, Debangshu Banerjee, Gagandeep Singh
- RocqSmith: Can Automatic Optimization Forge Better Proof Agents?
  Andrei Kozyrev, Nikita Khramov, Denis Lochmelis, Valerio Morelli, Gleb Solovev, Anton Podkopaev
- Agentic Uncertainty Reveals Agentic Overconfidence
  Jean Kaddour, Srijan Patel, Gbetondji Dovonon, Leo Richter, Pasquale Minervini, Matt Kusner
- Autoformalizing Memory Device Specifications with Agents
  Jan Ole Ernst, Dmitri Saberi, Thomas Zimmermann, Derek Christ, Rajath Salegame, Suhaas Bhat, Stanislav Levental, Thomas Ahle, Matthias Jung
- The Dual Nature of Unlearning: Impact of Fact Salience and Model Fine-Tuning
  Anna Borisiuk, Andrey Savchenko, Alexander Panchenko, Elena Tutubalina
- Quokka: Accelerating Program Verification with LLMs via Invariant Synthesis
  Anjiang Wei, Tarun Suresh, Tianran Sun, Haoze Wu, Ke Wang, Alex Aiken
- Judging What We Cannot Solve: A Consequence-Based Approach for Oracle-Free Evaluation of Research-Level Math
  Guijin Son, Donghun Yang, Hitesh Laxmichand Patel, Hyunwoo Ko, Amit Agarwal, Sunghee Ahn, Kyong-Ha Lee, Youngjae Yu
- GLEAN: Guideline-Grounded Evidence Accumulation for High-Stakes Agent Verification
  Yichi Zhang, Nabeel Seedat, Yinpeng Dong, Peng Cui, Jun Zhu, Mihaela van der Schaar
- Evaluating Agentic Optimization on Large Codebases
  Atharva Sehgal, James Hou, Akanksha Sarkar, Ishaan Mantripragada, Swarat Chaudhuri, Jennifer Sun, Yisong Yue
- Conv-to-Bench: Evaluating Language Models via User–Assistant Dialogues in Code Tasks
  Victor dos Santos, André Castro, Samuel de Souza Toledo, Bruno Calura, Lisandra de Moura Menezes, Raul Mata, Telma de Lima Soares, Bryan Lincoln Marques de Oliveira
- Geometry of Reason: Probabilistic Spectral Verification for Mathematical Reasoning
  Valentin Noël
- A Nash Equilibrium Framework for Training-Free Multimodal Step Verification
  Rohit Sinha, Kunal Tilaganji, Tanuja Ganu, Nagarajan Natarajan, Amit Sharma, Vineeth Balasubramanian
- ToolPRMBench: Evaluating and Advancing Process Reward Models for Tool-using Agents
  Dawei Li, Yuguang Yao, Zhen Tan, Huan Liu, Ruocheng Guo
- ISO-Bench: Can Coding Agents Optimize Real-World Inference Workloads?
  Ayush Nangia, Shikhar Mishra, Aman Gokrani, Paras Chopra
- Identifying and Mitigating Reasoning Errors in VLM Verifiers via Activation Decomposition
  Joonhyuk Cha, Moises Andrade, Zsolt Kira
- Learning to Repair Lean Proofs from Compiler Feedback
  Evan Wang, Simon Chess, Daniel Lee, Siyuan Ge, Ajit Mallavarapu, Vasily Ilin
- Escaping Model Collapse via Synthetic Data Verification: Near-term Improvements and Long-term Convergence
  Bingji Yi, Qiyuan Liu, Yuwei Cheng, Haifeng Xu
- Grounding Long-Horizon Agent Coordination in GUI Environments via Contract-based Structural Planning
  Hao Yu, Weiming Li, Yueming Lyu, Jie-Jing Shao, Yulei Sui, Ivor Tsang, Haiyan Yin
- DafnyLLM: Pre-training Dafny Representations with Large Language Models for Code Verification
  Shentong Mo
- Enforcing Temporal Constraints for LLM Agents
  Adharsh Kamath, Sishen Zhang, Changming Xu, Shubham Dipak Ugare, Gagandeep Singh, Sasa Misailovic
- Epigraph-Guided Flow Matching for Safe and Performant Offline Reinforcement Learning
  Manan Tayal, Mumuksh Tayal
- Beyond Self-Checking: Fragment-Level Verification Across Diverse LLMs
  Ken Mueller, Arihant Choudhary, David Perez, Scott Mueller
- interwhen: A Generalizable Framework for Verifiable Reasoning with Test-time Monitors
  Vishak K Bhat, Prateek Chanda, Ashmit Khandelwal, Maitreyi Swaroop, Subbarao Kambhampati, Vineeth Balasubramanian, Nagarajan Natarajan, Amit Sharma
- Unified Operational Formalism for LLM-based Theorem-proving Systems
  Avaljot Singh, Shaurya Gomber, Yasmin Sarita, José Meseguer, Gagandeep Singh
- Learning to Rank the Initial Branching Order of SAT Solvers
  Arvid Eriksson, Gabriel Poesia, Roman Bresson, Karl H. Johansson, David Broman
- Verification Limits Code LLM Training
  Srishti Gureja, Marzieh Fadaee, Sara Hooker, Matthias Gallé, Jingyi He, Elena Tommasone
- ROC-n-reroll: How Verifier Imperfection Affects Test-time Scaling
  Florian Eddie Dorner, Yatong Chen, André F. Cruz, Fanny Yang
- Learning from Synthetic Data Improves Multi-hop Reasoning
  Anmol Kabra, Yilun Yin, Albert Gong, Kamilė Stankevičiūtė, Dongyoung Go, Johann Lee, Katie Luo, Carla Gomes, Kilian Weinberger
- Do LLMs Really Struggle at NL-FOL Translation? Revealing their Strengths via a Novel Benchmarking Strategy
  Andrea Brunello, Luca Geatti, Michele Mignani, Angelo Montanari, Nicola Saccomanno
- Computational Arbitrage in AI Model Markets
  Ricardo Dominguez-Olmedo, Bernhard Schölkopf, Moritz Hardt
- A Minimal Agent for Automated Theorem Proving
  Borja Requena, Austin Letson, Krystian Nowakowski, Izan Ferreiro, Leopoldo Sarra
- MM-Eval: A Multilingual Meta-Evaluation Benchmark for LLM-as-a-Judge and Reward Models
  Guijin Son, Dongkeun Yoon, Juyoung Suk, Javier Aula-Blasco, Mano Aslan, Kim Vu, Shayekh Islam, Jaume Prats-Cristià, Lucía Tormo-Bañuelos, Seungone Kim
- SorryDB: Can AI Provers Complete Real-World Lean Theorems?
  Austin Letson, Leopoldo Sarra, Auguste Poiroux, Oliver Dressler, Paul Lezeau, Dhyan Aranha, Frederick Pu, Aaron Hill, Miguel Hidalgo, Julian Berman, George Tsoukalas, Lenny Taelman
- Benchmarking Code Verification Strategies with LLMs-as-a-judge
  Arnav Kumar Jain, Justin Chiu, Tom Sherborne, Matthias Gallé
- Do LLMs Game Formalization? Evaluating Faithfulness in Logical Reasoning
  Kyuhee Kim, Auguste Poiroux, Antoine Bosselut
- RepairBench: Exploring Proof Repair in Lean
  Manooshree Patel, Bartosz Piotrowski, Leopold Haller, Hugh Leather
- Scaling Evaluation-Time Compute with Reasoning Models as Process Evaluators
  Seungone Kim, Ian Wu, Jinu Lee, Xiang Yue, Seongyun Lee, Minkyeong Moon, Carolin Lawrence, Kiril Gashteovski, Julia Hockenmaier, Graham Neubig, Sean Welleck
- FormalProofBench: Can Models Write Graduate Level Math Proofs That Are Formally Verified?
  Nikil Ravi, Kexing Ying, Elif Uskuplu, Rayan Krishnan, Langston Nashold, Vasilii Nesterov, Bingyu Xia, Janitha Aswedige