|
--- |
|
license: mit |
|
base_model: |
|
- deepseek-ai/DeepSeek-Prover-V1.5-SFT |
|
--- |
|
|
|
# MA-LoT: Model-Collaboration Lean-based Long Chain-of-Thought Reasoning enhances Formal Theorem Proving |
|
|
|
* Paper link: [https://arxiv.org/abs/2503.03205](https://arxiv.org/abs/2503.03205) |
|
* LoT-Solver (Based on DeepSeek-Prover-v1.5-SFT): [https://huggingface.co/RickyDeSkywalker/LoT-Solver](https://huggingface.co/RickyDeSkywalker/LoT-Solver) |
|
* LoT-Solver (Based on Godel-Prover-SFT): [https://huggingface.co/RickyDeSkywalker/LoT-Solver-Godel](https://huggingface.co/RickyDeSkywalker/LoT-Solver-Godel) |
|
* Website: [https://ma-lot.github.io/](https://ma-lot.github.io/) |
|
* LoT-CorrectionData: [https://huggingface.co/datasets/RickyDeSkywalker/LoT-CorrectionData/viewer/default/train](https://huggingface.co/datasets/RickyDeSkywalker/LoT-CorrectionData/viewer/default/train) |
|
|
|
<!-- Provide a quick summary of what the model is/does. --> |
|
|
|
This is the model under MA-LoT paper presented training pipeline trained on DeepSeek-Prover-v1.5-SFT. For usage of the model, pelase refer to our GitHub page. |
|
|
|
## Introduction |
|
Solving mathematical problems using computer-verifiable languages like Lean has significantly impacted the mathematical |
|
and computer science communities. State-of-the-art methods utilize a single Large Language Model (LLM) |
|
to generate complete proof or perform tree search, but they fail to balance these tasks. |
|
We propose **MA-LoT**: *Model-CollAboration Lean-based Long Chain-of-Thought*, |
|
a comprehensive framework for Lean4 theorem proving to solve this issue. |
|
It separates the cognition tasks of general NL for whole-proof generation and error analysis for proof correction using the |
|
model-collaboration method. |
|
We achieve this by structured interaction of the LLM and Lean4 verifier in Long CoT. |
|
To implement the framework, we propose the novel *LoT-Transfer Learning* training-inference pipeline, |
|
which enables the Long CoT thinking capability to LLMs without special data annotation. |
|
Extensive experiment shows that our framework achieves a **61.07%** accuracy rate on the Lean4 version of the MiniF2F-Test dataset, |
|
largely outperforming DeepSeek-V3 (33.61%), single-model tree search (InternLM-Step-Prover, 50.70%), |
|
and whole-proof generation (Godel-Prover, 55.33%) baselines. Furthermore, our findings highlight the potential |
|
of combining Long CoT with formal verification for a more insightful generation in a broader perspective. |
|
|
|
## Evaluation Results |
|
|
|
<p align="center"> |
|
<img width="100%" src="performance.png"> |
|
</p> |
|
|
|
## Usage example |
|
|
|
### General setup |
|
Load model |
|
```python |
|
import torch |
|
from transformers import AutoTokenizer, AutoModelForCausalLM |
|
|
|
MODEL_ID = "RickyDeSkywalker/LoT-Solver" |
|
|
|
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID) |
|
model = AutoModelForCausalLM.from_pretrained( |
|
MODEL_ID, |
|
torch_dtype=torch.bfloat16, |
|
device_map="auto" |
|
) |
|
``` |
|
|
|
### Prover Model |
|
|
|
#### Code-completion usage |
|
|
|
The model can be used as a general code-completion prover as follows: |
|
|
|
```python |
|
# formulate prompt |
|
prompt = """Complete the following Lean 4 code with explanatory comments preceding each line of code: |
|
|
|
```lean4 |
|
import Mathlib |
|
import Aesop |
|
|
|
set_option maxHeartbeats 0 |
|
|
|
open BigOperators Real Nat Topology Rat |
|
/--Suppose $a, b, c$ are the sides of a triangle. Prove that |
|
|
|
$a^2(b+c-a)+b^2(c+a-b)+c^2(a+b-c)\le{3abc}.$-/ |
|
theorem imo_1964_p2 |
|
(a b c : ℝ) |
|
(h₀ : 0 < a ∧ 0 < b ∧ 0 < c) |
|
(h₁ : c < a + b) |
|
(h₂ : b < a + c) |
|
(h₃ : a < b + c) : |
|
a^2 * (b + c - a) + b^2 * (c + a - b) + c^2 * (a + b - c) ≤ 3 * a * b * c := by""" |
|
|
|
# tokenize and generation |
|
input_tensor = tokenizer.encode(prompt, return_tensors="pt", add_special_tokens=False).to(model.device) |
|
outputs = model.generate(input_tensor.to(model.device), max_new_tokens=2048) |
|
|
|
# print results |
|
result = tokenizer.decode(outputs[0]) |
|
print(result) |
|
``` |
|
|
|
#### Lean-based Long CoT Prover: |
|
|
|
The model can also be used as a LoT(Lean long CoT) prover as follows: |
|
|
|
```python |
|
# formulate prompt |
|
prompt = """<|begin▁of▁sentence|>You are an expert in Lean4 theorem proving with exceptional strategic reasoning abilities. When solving problems, strictly follow this process: |
|
1. First, create a natural language proof draft explaining key insights and logical steps |
|
2. Then analyze required Lean4 tactics, specifying exact syntax and their logical purpose |
|
### Instruction: You will receive several Lean4 problems. For each: |
|
- Prove the theorem **WITH** Long Chain-of-Thought in the <Thought> block. |
|
- **Do not** reveal your chain of thought outside the <Thought> block. |
|
- **Ensure** the final Lean4 code or final result is placed **only** in <Output>. |
|
@ Natural language theorem statement: |
|
mathd_algebra_478: |
|
The volume of a cone is given by the formula $V = \frac{1}{3}Bh$, where $B$ is the area of the base and $h$ is the height. The area of the base of a cone is 30 square units, and its height is 6.5 units. What is the number of cubic units in its volume? Show that it is 65. |
|
|
|
@ Lean4 theorem statement: |
|
```lean4 |
|
theorem mathd_algebra_478 |
|
(b h v : ℝ) |
|
(h₀ : 0 < b ∧ 0 < h ∧ 0 < v) |
|
(h₁ : v = 1 / 3 * (b * h)) |
|
(h₂ : b = 30) |
|
(h₃ : h = 13 / 2) : |
|
v = 65 := by |
|
```& |
|
|
|
@ Lean4 theorem statement and proof with explanatory comments preceding each line of code: |
|
### Response: |
|
<Thought> |
|
Alright, I should do the following: |
|
|
|
1. Provide the natural language analysis for the theorem based on the Natural language theorem statement. |
|
|
|
2. Draft the Lean4 tactics I should use to solve the problem |
|
|
|
3. Write the output Lean4 code. |
|
|
|
The user also asks that I should avoid using the keyword `sorry` to give up the proof, so I will not write it in my Lean4 code. |
|
|
|
The `mathd_algebra_478` can be proofed by""" |
|
|
|
# tokenize and generation |
|
input_tensor = tokenizer.encode(prompt, return_tensors="pt", add_special_tokens=False).to(model.device) |
|
outputs = model.generate(input_tensor.to(model.device), max_new_tokens=2048) |
|
|
|
# print results |
|
result = tokenizer.decode(outputs[0]) |
|
print(result) |
|
``` |
|
|
|
### Corrector Agent |
|
|
|
The model can be used as a corrector agent as well, here is an example (Note that the corrector may be unable to ) |
|
|
|
```python |
|
prompt = """<|begin▁of▁sentence|>You are a helpful mathematical assistant specialized in formal theorem proving using Lean4. |
|
Your objectives: |
|
1. Read and interpret the Lean4 theorem statement and any error messages. |
|
2. **If a previous proof attempt was incorrect, analyze its exact mistakes and completely discard or rewrite the proof as needed.** |
|
3. **Avoid reusing incorrect proof structures or strategies unless explicitly validated as correct.** |
|
4. **Address all error messages** by modifying the proof structure as needed. |
|
5. Prove the theorem **WITH** Long Chain-of-Thought process in the <Thought> section, but **only place the corrected Lean4 code in the <Output> section**. |
|
6. **Ensure the new proof is logically valid and does not use `sorry`.**### Instruction: @ Natural language theorem statement: |
|
mathd_algebra_400 |
|
The volume of a cone is given by the formula $V = \frac{1}{3}Bh$, where $B$ is the area of the base and $h$ is the height. The area of the base of a cone is 30 square units, and its height is 6.5 units. What is the number of cubic units in its volume? Show that it is 65. |
|
|
|
@ Lean4 theorem statement: |
|
```lean4 |
|
theorem mathd_algebra_478 |
|
(b h v : ℝ) |
|
(h₀ : 0 < b ∧ 0 < h ∧ 0 < v) |
|
(h₁ : v = 1 / 3 * (b * h)) |
|
(h₂ : b = 30) |
|
(h₃ : h = 13 / 2) : |
|
v = 65 := by |
|
```& |
|
|
|
@ Lean4 theorem statement and proof with explanatory comments preceding each line of code: |
|
### Response: |
|
<Thought> |
|
Alright, I need to prove the theorem lean_workbook_plus_68493 using the Lean4 code. Here is my draft of the proof: |
|
|
|
```lean4 |
|
import Mathlib |
|
import Aesop |
|
|
|
set_option maxHeartbeats 0 |
|
|
|
open BigOperators Real Nat Topology Rat |
|
|
|
/--The volume of a cone is given by the formula $V = \frac{1}{3}Bh$, where $B$ is the area of the base and $h$ is the height. The area of the base of a cone is 30 square units, and its height is 6.5 units. What is the number of cubic units in its volume? Show that it is 65.-/ |
|
theorem mathd_algebra_478 |
|
(b h v : ℝ) |
|
(h₀ : 0 < b ∧ 0 < h ∧ 0 < v) |
|
(h₁ : v = 1 / 3 * (b * h)) |
|
(h₂ : b = 30) |
|
(h₃ : h = 13 / 2) : |
|
v = 65 := by |
|
rfl |
|
norm_num at h₁ |
|
linarith |
|
```& |
|
|
|
Let me test it in Lean4 |
|
|
|
Emmm, it seems the above proof is wrong. |
|
|
|
Let me check the error messages. |
|
|
|
OK, here are the error messages: |
|
|
|
```bash |
|
line 16 |
|
|
|
The rfl tactic failed. Possible reasons: |
|
- The goal is not a reflexive relation (neither `=` nor a relation with a @[refl] lemma). |
|
- The arguments of the relation are not equal. |
|
Try using the reflexivity lemma for your relation explicitly, e.g. `exact Eq.refl _` or |
|
`exact HEq.rfl` etc. |
|
b h v : ℝ |
|
h₀ : 0 < b ∧ 0 < h ∧ 0 < v |
|
h₁ : v = 1 / 3 * (b * h) |
|
h₂ : b = 30 |
|
h₃ : h = 13 / 2 |
|
⊢ v = 65 |
|
```& |
|
|
|
|
|
So, I will rethink a Lean4 proof following the steps |
|
|
|
1. Provide the natural language analysis for the theorem based on the Natural language theorem statement, Lean4 theorem statement, my previous proof and the error message. |
|
|
|
2. Draft the Lean4 tactics I should use to solve the problem |
|
|
|
3. Write the output Lean4 code. |
|
|
|
Let me analysis the wrong Lean4 solution through the error messages""" |
|
|
|
# tokenize and generation |
|
input_tensor = tokenizer.encode(prompt, return_tensors="pt", add_special_tokens=False).to(model.device) |
|
outputs = model.generate(input_tensor.to(model.device), max_new_tokens=2048) |
|
|
|
# print results |
|
result = tokenizer.decode(outputs[0]) |
|
print(result) |
|
``` |
|
|
|
|
|
|
|
|
|
## Citation |
|
```bibtex |
|
@misc{wang2025malot, |
|
title={MA-LoT: Model-Collaboration Lean-based Long Chain-of-Thought Reasoning enhances Formal Theorem Proving}, |
|
author={Ruida Wang and Rui Pan and Yuxin Li and Jipeng Zhang and Yizhen Jia and Shizhe Diao and Renjie Pi and Junjie Hu and Tong Zhang}, |
|
year={2025}, |
|
eprint={2503.03205}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.CL}, |
|
url={https://arxiv.org/abs/2503.03205}, |
|
} |
|
``` |