Awesome Autoformalization

A selected list of Autoformalization papers, projects, frameworks, libraries, tools, and other resources.

This repo is intended to be a manually selected and curated list rather than an exhaustive or authoritative bibliography. The entries, metadata, descriptions, and reading guides are manually written by the maintainer, and thus inevitably constrained by one's knowledge and perspective. Important work may be missing, and some details or descriptions may be incomplete, outdated, or mistaken.

Corrections, suggestions, and friendly discussions are more than welcome! Please feel free to open an issue or pull request.

Papers

0. Survey, Positions, and Roadmaps

Year	Title	Venue	Main Techniques	Main Takeaways	Reading Guide
2019	A Promising Path Towards Autoformalization and General Artificial Intelligence	CICM 2020	-	1. Neural Autoformalizer + Neural Formal Theorem Prover + Symbolic Proof Verifier = Self-Improving Mathematical General Intelligence. 2. Autoformalization is essential for Neural Formal Theorem Proving to provide massive training data.	Pioneering Manifesto: a must-read if you're interested in the history; most visions anticipated current standard practices, while a few are slightly outdated (e.g. underestimation of Informal Theorem Provers). ★★★½☆
2024	Formal Mathematical Reasoning: A New Frontier in AI	ICML 2025 Position Track	-	1. Autoformalization has 3 important applications: synthesizing data for theorem provers, guiding formal theorem proving, and validating informal reasoning. 2. There are 3 main challenges in Autoformalization: automated evaluation, task decomposition, and interaction with formal systems.	Best First Read: great intro to the field; frames Autoformalization in the greater context of AI for Formal Mathematics, identifies critical challenges and future milestones. ★★★★½
2025	Autoformalization in the Era of Large Language Models: A Survey	-	-	1. A taxonomy of Autoformalization tasks based on the workflow: Data Pre-processing, Autoformalization Methods, Post-processing, and Evaluation. 2. Autoformalization is helpful in 3 applications: Software Verification, Medical Domains, and LLM Output Verification.	Good Resource Collection: a nice collection of Autoformalization datasets, benchmarks, methods, and progress in various mathematical domains; the resource coverage slightly lags the fast-moving frontier, and it focuses mostly on math and statement autoformalization. ★★★☆☆
2025	Towards a Common Framework for Autoformalization	AAAI 2026	-	1. An Autoformalization task consists of 3 components: The Informal Language `L_i` (a domain-specific subset of it), The Formal Language `L_f`, and the Semantic Equivalence Criterion `E`. 2. The general framework encompasses targets beyond Interactive Theorem Prover Languages: Automated Theorem Prover Inputs (e.g. Prover9), Logic Programming Languages (e.g. SWI-Prolog), Planning Languages (e.g. PDDL), and Knowledge Representations (e.g. OWL).	Inclusive Conceptual Framework: an inclusive conceptual view of Autoformalization with many applications beyond Mathematics. ★★★★☆
2026	Theory-Level Autoformalization: From Isolated Statements to Unified Formal Knowledge Bases	ICML 2026 Position Track	-	1. Advocates a focus shift from Statement-Level Autoformalization to Theory-Level: formalizing the entire context including axioms, definitions, notations, theorems, proofs, tactics, and their inter-dependencies. 2. There are 4 key challenges in theory-level: reliable equivalence checking, hierarchical decomposition and abstraction learning, low-resource domain-specific languages, and multi-modal inputs. 3. There are 3 promising paths: benchmarks with sound equivalence checking, general-purpose models over specialized autoformalizers, and a common intermediate representation as a unified formalization target.	Theory-Level Agenda: a forward-looking agenda focusing on Theory-Level Mathematics and other real-world applications (e.g. Automated Theorem Proving, Protocol Verification, Hardware Verification, Declarative Programming). No Rating: paper authored by the repo maintainer.

1. Symbolic Autoformalization

Under construction.

2. Neural Autoformalization

2.0 Early Explorations (Pre-ChatGPT Era)

2.1.1 Semantic Parsing

Year	Title	Venue	Main Techniques	Main Takeaways	Reading Guide
2015	Learning to Parse on Aligned Corpora (Rough Diamond)	ITP 2015	-	-	No Rating: mainly for historical context.
2017	System Description: Statistical Parsing of Informalized Mizar Formulas	SYNASC 2017	-	-	No Rating: mainly for historical context.

2.1.2 Neural Machine Translation (RNN, LSTM, Transformer, etc.)

Year	Title	Venue	Main Techniques	Main Takeaways	Reading Guide
2018	First Experiments with Neural Translation of Informal to Formal Mathematics	CICM 2018	-	-	No Rating: mainly for historical context.
2019	Exploration of Neural Machine Translation in Autoformalization of Mathematics in Mizar	CPP 2020	-	-	No Rating: mainly for historical context.

2.1.3 Large Language Models (Transformer Decoder-Based)

Year	Title	Venue	Main Techniques	Main Takeaways	Reading Guide
2022	Autoformalization with Large Language Models	NeurIPS 2022	-	1. General Foundation Models + Few-Shot Prompting works surprisingly well for Statement Autoformalization and generalizes to non-trivial cases. 2. Synthetic data curated via Autoformalization improves Neural Theorem Proving. 3. There is an asymmetry between Informalization and Formalization: LLMs are better at the former than the latter, providing the basis for methods leveraging back-translation.	Foundational Empirical Results: the findings are now widely accepted as common wisdom, which inspire the development of many important methods in the field. ★★★★☆
2022	Towards a Mathematics Formalisation Assistant using Large Language Models	-	-	-	No Rating: mainly for historical context.

2.1 Data Curation & Synthesis

Under construction.

2.2 Training Recipes

Under construction.

2.3 Inference-Time Techniques

Under construction.

2.4 Workflows & Harnesses

Under construction.

2.5 Benchmarking & Evaluation

Under construction.

Projects

Under construction.

Frameworks, Libraries, and Tools

Under construction.

Other Resources

Under construction.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Awesome Autoformalization

Papers

0. Survey, Positions, and Roadmaps

1. Symbolic Autoformalization

2. Neural Autoformalization

2.0 Early Explorations (Pre-ChatGPT Era)

2.1.1 Semantic Parsing

2.1.2 Neural Machine Translation (RNN, LSTM, Transformer, etc.)

2.1.3 Large Language Models (Transformer Decoder-Based)

2.1 Data Curation & Synthesis

2.2 Training Recipes

2.3 Inference-Time Techniques

2.4 Workflows & Harnesses

2.5 Benchmarking & Evaluation

Projects

Frameworks, Libraries, and Tools

Other Resources

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

Awesome Autoformalization

Papers

0. Survey, Positions, and Roadmaps

1. Symbolic Autoformalization

2. Neural Autoformalization

2.0 Early Explorations (Pre-ChatGPT Era)

2.1.1 Semantic Parsing

2.1.2 Neural Machine Translation (RNN, LSTM, Transformer, etc.)

2.1.3 Large Language Models (Transformer Decoder-Based)

2.1 Data Curation & Synthesis

2.2 Training Recipes

2.3 Inference-Time Techniques

2.4 Workflows & Harnesses

2.5 Benchmarking & Evaluation

Projects

Frameworks, Libraries, and Tools

Other Resources

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Packages