Gödel's Poetry: AI, Lean 4 Prover

Gödel’s Poetry (PDF) is a Lean 4 system that combines specialized language models for proof generation with recursive decomposition of difficult theorems into simpler entailing propositions, coordinated through a multi-agent architecture as in the arXiv manuscript. Proof verification runs through the Kimina Lean Server; when direct proof generation (with verifier-guided self-correction in the style of Goedel-Prover-V2) does not yield a closed proof, the stack falls back to RAG-based theorem retrieval, proof sketching with sorry placeholders in explicit have obligations, AST-based extraction of those subgoals, recursive proof of each proposition, and finally proof reconstruction that substitutes verified sub-proofs back into the parent sketch.

The goedels-poetry repository (PyPI: goedels-poetry) remains authoritative for operational detail—supervisor routing, defaults, and harness semantics—while the arXiv PDF records motivation, ablations, and a frozen snapshot of benchmark numbers at submission time.

Design stance

Many LLM+ATP setups stop at verifier-guided self-correction: the prover revises tactics using Lean diagnostics. That matches the direct-proving regime reported at pass@32 on miniF2F without decomposition—90.4%. The additional design idea in Gödel’s Poetry is global repair when local correction saturates: RAG-based retrieval of library facts, a proof sketch that states entailing propositions as named have lines closed with sorry, sketch validation in Kimina, then AST extraction so unproven leaves become first-class goals for recursion.

Orchestration uses LangGraph and LangChain. A supervisor dispatches work to specialized agents, and each role may use a different OpenAI-compatible endpoint.

Pipeline

Autoformalization

Informal statements go through:

the formalizer agent
syntax validation against Kimina
semantic validation to ensure the formal Lean statement preserves the informal problem

Direct proof generation

The prover agent (defaulting to Goedel-Prover-V2) completes the theorem using verifier-guided self-correction. Failed attempts return compiler errors to a corrector agent until Kimina accepts a proof or retry limits are reached.

Recursive decomposition

When direct proving fails:

the query generation agent emits descriptions of potentially useful lemmas
the theorem lookup agent queries a vector index over Mathlib
the proof sketcher agent creates a decomposition using have subgoals ending in sorry
Kimina validates the sketch
an AST parser extracts unresolved subgoals
each proposition recursively re-enters the proving pipeline
verified sub-proofs are reconstructed into a complete proof

Architecture

The architecture combines:

formalization and validation
proof generation
Kimina verification
theorem retrieval
proof sketching and decomposition
AST-based subgoal extraction
recursive proof search

High-level architecture of Gödel’s Poetry

Evaluation

The system is evaluated on miniF2F, a benchmark of olympiad-style formal mathematics problems.

The manuscript reports:

90.4% at pass@32 without recursive decomposition
significant additional gains with decomposition and RAG enabled

The main technical contribution emphasized in the paper is the use of AST extraction from Lean 4 proof sketches to keep decomposition fully programmatic and verifier-aligned.

Reproducibility

Verification runs through Kimina. Retrieval uses a local LeanExplore backend to ensure compatibility with the checker workspace.

Agent backends are configurable via config.ini and environment overrides.

References

@misc{davis2025godelspoetry,
  title={G\"odel's Poetry},
  author={Kelly J. Davis},
  year={2025},
  eprint={2512.14252},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2512.14252},
}

Design stance#

Pipeline#

Autoformalization#

Direct proof generation#

Recursive decomposition#

Architecture#

Evaluation#

Reproducibility#

References#