Gödel’s Poetry (PDF) is a Lean 4 system that combines specialized language models for proof generation with recursive decomposition of difficult theorems into simpler entailing propositions, coordinated through a multi-agent architecture as in the arXiv manuscript. Proof verification runs through the Kimina Lean Server; when direct proof generation (with verifier-guided self-correction in the style of Goedel-Prover-V2) does not yield a closed proof, the stack falls back to RAG-based theorem retrieval, proof sketching with sorry placeholders in explicit have obligations, AST-based extraction of those subgoals, recursive proof of each proposition, and finally proof reconstruction that substitutes verified sub-proofs back into the parent sketch.
The goedels-poetry repository (PyPI: goedels-poetry) remains authoritative for operational detail—supervisor routing, defaults, and harness semantics—while the arXiv PDF records motivation, ablations, and a frozen snapshot of benchmark numbers at submission time.
Design stance
Many LLM+ATP setups stop at verifier-guided self-correction: the prover revises tactics using Lean diagnostics. That matches the direct-proving regime reported at pass@32 on miniF2F without decomposition—90.4%. The additional design idea in Gödel’s Poetry is global repair when local correction saturates: RAG-based retrieval of library facts, a proof sketch that states entailing propositions as named have lines closed with sorry, sketch validation in Kimina, then AST extraction so unproven leaves become first-class goals for recursion.
Orchestration uses LangGraph and LangChain. A supervisor dispatches work to specialized agents, and each role may use a different OpenAI-compatible endpoint.
Pipeline
Autoformalization
Informal statements go through:
- the formalizer agent
- syntax validation against Kimina
- semantic validation to ensure the formal Lean statement preserves the informal problem
Direct proof generation
The prover agent (defaulting to Goedel-Prover-V2) completes the theorem using verifier-guided self-correction. Failed attempts return compiler errors to a corrector agent until Kimina accepts a proof or retry limits are reached.
Recursive decomposition
When direct proving fails:
- the query generation agent emits descriptions of potentially useful lemmas
- the theorem lookup agent queries a vector index over Mathlib
- the proof sketcher agent creates a decomposition using
havesubgoals ending insorry - Kimina validates the sketch
- an AST parser extracts unresolved subgoals
- each proposition recursively re-enters the proving pipeline
- verified sub-proofs are reconstructed into a complete proof
Architecture
The architecture combines:
- formalization and validation
- proof generation
- Kimina verification
- theorem retrieval
- proof sketching and decomposition
- AST-based subgoal extraction
- recursive proof search

Evaluation
The system is evaluated on miniF2F, a benchmark of olympiad-style formal mathematics problems.
The manuscript reports:
- 90.4% at pass@32 without recursive decomposition
- significant additional gains with decomposition and RAG enabled
The main technical contribution emphasized in the paper is the use of AST extraction from Lean 4 proof sketches to keep decomposition fully programmatic and verifier-aligned.
Reproducibility
Verification runs through Kimina. Retrieval uses a local LeanExplore backend to ensure compatibility with the checker workspace.
Agent backends are configurable via config.ini and environment overrides.
References
@misc{davis2025godelspoetry,
title={G\"odel's Poetry},
author={Kelly J. Davis},
year={2025},
eprint={2512.14252},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2512.14252},
}