(Rio, April 26/27, 2026)

Accepted Papers

  1. NANOZK: Layerwise Zero-Knowledge Proofs for Verifiable Large Language Model Inference
    Zhaohui Wang
  2. Beaver: An Efficient Deterministic LLM Verifier
    Tarun Suresh, Nalin Wadhwa, Debangshu Banerjee, Gagandeep Singh
  3. RocqSmith: Can Automatic Optimization Forge Better Proof Agents?
    Andrei Kozyrev, Nikita Khramov, Denis Lochmelis, Valerio Morelli, Gleb Solovev, Anton Podkopaev
  4. Agentic Uncertainty Reveals Agentic Overconfidence
    Jean Kaddour, Srijan Patel, Gbetondji Dovonon, Leo Richter, Pasquale Minervini, Matt Kusner
  5. Autoformalizing Memory Device Specifications with Agents
    Jan Ole Ernst, Dmitri Saberi, Thomas Zimmermann, Derek Christ, Rajath Salegame, Suhaas Bhat, Stanislav Levental, Thomas Ahle, Matthias Jung
  6. The Dual Nature of Unlearning: Impact of Fact Salience and Model Fine-Tuning
    Anna Borisiuk, Andrey Savchenko, Alexander Panchenko, Elena Tutubalina
  7. Quokka: Accelerating Program Verification with LLMs via Invariant Synthesis
    Anjiang Wei, Tarun Suresh, Tianran Sun, Haoze Wu, Ke Wang, Alex Aiken
  8. Judging What We Cannot Solve: A Consequence-Based Approach for Oracle-Free Evaluation of Research-Level Math
    Guijin Son, Donghun Yang, Hitesh Laxmichand Patel, Hyunwoo Ko, Amit Agarwal, Sunghee Ahn, Kyong-Ha Lee, Youngjae Yu
  9. GLEAN: Guideline-Grounded Evidence Accumulation for High-Stakes Agent Verification
    Yichi Zhang, Nabeel Seedat, Yinpeng Dong, Peng Cui, Jun Zhu, Mihaela van der Schaar
  10. Evaluating Agentic Optimization on Large Codebases
    Atharva Sehgal, James Hou, Akanksha Sarkar, Ishaan Mantripragada, Swarat Chaudhuri, Jennifer Sun, Yisong Yue
  11. Conv-to-Bench: Evaluating Language Models via User–Assistant Dialogues in Code Tasks
    Victor dos Santos, André Castro, Samuel de Souza Toledo, Bruno Calura, Lisandra de Moura Menezes, Raul Mata, Telma de Lima Soares, Bryan Lincoln Marques de Oliveira
  12. Geometry of Reason: Probabilistic Spectral Verification for Mathematical Reasoning
    Valentin Noël
  13. A Nash Equilibrium Framework for Training-Free Multimodal Step Verification
    Rohit Sinha, Kunal Tilaganji, Tanuja Ganu, Nagarajan Natarajan, Amit Sharma, Vineeth Balasubramanian
  14. ToolPRMBench: Evaluating and Advancing Process Reward Models for Tool-using Agents
    Dawei Li, Yuguang Yao, Zhen Tan, Huan Liu, Ruocheng Guo
  15. ISO-Bench: Can Coding Agents Optimize Real-World Inference Workloads?
    Ayush Nangia, Shikhar Mishra, Aman Gokrani, Paras Chopra
  16. Identifying and Mitigating Reasoning Errors in VLM Verifiers via Activation Decomposition
    Joonhyuk Cha, Moises Andrade, Zsolt Kira
  17. Learning to Repair Lean Proofs from Compiler Feedback
    Evan Wang, Simon Chess, Daniel Lee, Siyuan Ge, Ajit Mallavarapu, Vasily Ilin
  18. Escaping Model Collapse via Synthetic Data Verification: Near-term Improvements and Long-term Convergence
    Bingji Yi, Qiyuan Liu, Yuwei Cheng, Haifeng Xu
  19. Grounding Long-Horizon Agent Coordination in GUI Environments via Contract-based Structural Planning
    Hao Yu, Weiming Li, Yueming Lyu, Jie-Jing Shao, Yulei Sui, Ivor Tsang, Haiyan Yin
  20. DafnyLLM: Pre-training Dafny Representations with Large Language Models for Code Verification
    Shentong Mo
  21. Enforcing Temporal Constraints for LLM Agents
    Adharsh Kamath, Sishen Zhang, Changming Xu, Shubham Dipak Ugare, Gagandeep Singh, Sasa Misailovic
  22. Epigraph-Guided Flow Matching for Safe and Performant Offline Reinforcement Learning
    Manan Tayal, Mumuksh Tayal
  23. Beyond Self-Checking: Fragment-Level Verification Across Diverse LLMs
    Ken Mueller, Arihant Choudhary, David Perez, Scott Mueller
  24. interwhen: A Generalizable Framework for Verifiable Reasoning with Test-time Monitors
    Vishak K Bhat, Prateek Chanda, Ashmit Khandelwal, Maitreyi Swaroop, Subbarao Kambhampati, Vineeth Balasubramanian, Nagarajan Natarajan, Amit Sharma
  25. Unified Operational Formalism for LLM-based Theorem-proving Systems
    Avaljot Singh, Shaurya Gomber, Yasmin Sarita, José Meseguer, Gagandeep Singh
  26. Learning to Rank the Initial Branching Order of SAT Solvers
    Arvid Eriksson, Gabriel Poesia, Roman Bresson, Karl H. Johansson, David Broman
  27. Verification Limits Code LLM Training
    Srishti Gureja, Marzieh Fadaee, Sara Hooker, Matthias Gallé, Jingyi He, Elena Tommasone
  28. ROC-n-reroll: How Verifier Imperfection Affects Test-time Scaling
    Florian Eddie Dorner, Yatong Chen, André F. Cruz, Fanny Yang
  29. Learning from Synthetic Data Improves Multi-hop Reasoning
    Anmol Kabra, Yilun Yin, Albert Gong, Kamilė Stankevičiūtė, Dongyoung Go, Johann Lee, Katie Luo, Carla Gomes, Kilian Weinberger
  30. Do LLMs Really Struggle at NL-FOL Translation? Revealing their Strengths via a Novel Benchmarking Strategy
    Andrea Brunello, Luca Geatti, Michele Mignani, Angelo Montanari, Nicola Saccomanno
  31. Computational Arbitrage in AI Model Markets
    Ricardo Dominguez-Olmedo, Bernhard Schölkopf, Moritz Hardt
  32. A Minimal Agent for Automated Theorem Proving
    Borja Requena, Austin Letson, Krystian Nowakowski, Izan Ferreiro, Leopoldo Sarra
  33. MM-Eval: A Multilingual Meta-Evaluation Benchmark for LLM-as-a-Judge and Reward Models
    Guijin Son, Dongkeun Yoon, Juyoung Suk, Javier Aula-Blasco, Mano Aslan, Kim Vu, Shayekh Islam, Jaume Prats-Cristià, Lucía Tormo-Bañuelos, Seungone Kim
  34. SorryDB: Can AI Provers Complete Real-World Lean Theorems?
    Austin Letson, Leopoldo Sarra, Auguste Poiroux, Oliver Dressler, Paul Lezeau, Dhyan Aranha, Frederick Pu, Aaron Hill, Miguel Hidalgo, Julian Berman, George Tsoukalas, Lenny Taelman
  35. Benchmarking Code Verification Strategies with LLMs-as-a-judge
    Arnav Kumar Jain, Justin Chiu, Tom Sherborne, Matthias Gallé
  36. Do LLMs Game Formalization? Evaluating Faithfulness in Logical Reasoning
    Kyuhee Kim, Auguste Poiroux, Antoine Bosselut
  37. RepairBench: Exploring Proof Repair in Lean
    Manooshree Patel, Bartosz Piotrowski, Leopold Haller, Hugh Leather
  38. Scaling Evaluation-Time Compute with Reasoning Models as Process Evaluators
    Seungone Kim, Ian Wu, Jinu Lee, Xiang Yue, Seongyun Lee, Minkyeong Moon, Carolin Lawrence, Kiril Gashteovski, Julia Hockenmaier, Graham Neubig, Sean Welleck
  39. FormalProofBench: Can Models Write Graduate Level Math Proofs That Are Formally Verified?
    Nikil Ravi, Kexing Ying, Elif Uskuplu, Rayan Krishnan, Langston Nashold, Vasilii Nesterov, Bingyu Xia, Janitha Aswedige