LoT-Solver-Godel / README.md

Update README.md

7bb2156 verified 3 months ago

10 kB

	---
	license: mit
	base_model:
	- deepseek-ai/DeepSeek-Prover-V1.5-SFT
	---

	# MA-LoT: Model-Collaboration Lean-based Long Chain-of-Thought Reasoning enhances Formal Theorem Proving

	* Paper link: [https://arxiv.org/abs/2503.03205](https://arxiv.org/abs/2503.03205)
	* LoT-Solver (Based on DeepSeek-Prover-v1.5-SFT): [https://huggingface.co/RickyDeSkywalker/LoT-Solver](https://huggingface.co/RickyDeSkywalker/LoT-Solver)
	* LoT-Solver (Based on Godel-Prover-SFT): [https://huggingface.co/RickyDeSkywalker/LoT-Solver-Godel](https://huggingface.co/RickyDeSkywalker/LoT-Solver-Godel)
	* Website: [https://ma-lot.github.io/](https://ma-lot.github.io/)
	* LoT-CorrectionData: [https://huggingface.co/datasets/RickyDeSkywalker/LoT-CorrectionData/viewer/default/train](https://huggingface.co/datasets/RickyDeSkywalker/LoT-CorrectionData/viewer/default/train)

	<!-- Provide a quick summary of what the model is/does. -->

	This is the model under MA-LoT paper presented training pipeline trained on DeepSeek-Prover-v1.5-SFT. For usage of the model, pelase refer to our GitHub page.

	## Introduction
	Solving mathematical problems using computer-verifiable languages like Lean has significantly impacted the mathematical
	and computer science communities. State-of-the-art methods utilize a single Large Language Model (LLM)
	to generate complete proof or perform tree search, but they fail to balance these tasks.
	We propose MA-LoT: Model-CollAboration Lean-based Long Chain-of-Thought,
	a comprehensive framework for Lean4 theorem proving to solve this issue.
	It separates the cognition tasks of general NL for whole-proof generation and error analysis for proof correction using the
	model-collaboration method.
	We achieve this by structured interaction of the LLM and Lean4 verifier in Long CoT.
	To implement the framework, we propose the novel LoT-Transfer Learning training-inference pipeline,
	which enables the Long CoT thinking capability to LLMs without special data annotation.
	Extensive experiment shows that our framework achieves a 61.07% accuracy rate on the Lean4 version of the MiniF2F-Test dataset,
	largely outperforming DeepSeek-V3 (33.61%), single-model tree search (InternLM-Step-Prover, 50.70%),
	and whole-proof generation (Godel-Prover, 55.33%) baselines. Furthermore, our findings highlight the potential
	of combining Long CoT with formal verification for a more insightful generation in a broader perspective.

	## Evaluation Results

	<p align="center">
	<img width="100%" src="performance.png">
	</p>

	## Usage example

	### General setup
	Load model
	```python
	import torch
	from transformers import AutoTokenizer, AutoModelForCausalLM

	MODEL_ID = "RickyDeSkywalker/LoT-Solver"

	tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
	model = AutoModelForCausalLM.from_pretrained(
	MODEL_ID,
	torch_dtype=torch.bfloat16,
	device_map="auto"
	)
	```

	### Prover Model

	#### Code-completion usage

	The model can be used as a general code-completion prover as follows:

	```python
	# formulate prompt
	prompt = """Complete the following Lean 4 code with explanatory comments preceding each line of code:

	```lean4
	import Mathlib
	import Aesop

	set_option maxHeartbeats 0

	open BigOperators Real Nat Topology Rat
	/--Suppose $a, b, c$ are the sides of a triangle. Prove that

	$a^2(b+c-a)+b^2(c+a-b)+c^2(a+b-c)\le{3abc}.$-/
	theorem imo_1964_p2
	(a b c : ℝ)
	(h₀ : 0 < a ∧ 0 < b ∧ 0 < c)
	(h₁ : c < a + b)
	(h₂ : b < a + c)
	(h₃ : a < b + c) :
	a^2 * (b + c - a) + b^2 * (c + a - b) + c^2 * (a + b - c) ≤ 3 * a * b * c := by"""

	# tokenize and generation
	input_tensor = tokenizer.encode(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
	outputs = model.generate(input_tensor.to(model.device), max_new_tokens=2048)

	# print results
	result = tokenizer.decode(outputs[0])
	print(result)
	```

	#### Lean-based Long CoT Prover:

	The model can also be used as a LoT(Lean long CoT) prover as follows:

	```python
	# formulate prompt
	prompt = """<｜begin▁of▁sentence｜>You are an expert in Lean4 theorem proving with exceptional strategic reasoning abilities. When solving problems, strictly follow this process:
	1. First, create a natural language proof draft explaining key insights and logical steps
	2. Then analyze required Lean4 tactics, specifying exact syntax and their logical purpose
	### Instruction: You will receive several Lean4 problems. For each:
	- Prove the theorem WITH Long Chain-of-Thought in the <Thought> block.
	- Do not reveal your chain of thought outside the <Thought> block.
	- Ensure the final Lean4 code or final result is placed only in <Output>.
	@ Natural language theorem statement:
	mathd_algebra_478:
	The volume of a cone is given by the formula $V = \frac{1}{3}Bh$, where $B$ is the area of the base and $h$ is the height. The area of the base of a cone is 30 square units, and its height is 6.5 units. What is the number of cubic units in its volume? Show that it is 65.

	@ Lean4 theorem statement:
	```lean4
	theorem mathd_algebra_478
	(b h v : ℝ)
	(h₀ : 0 < b ∧ 0 < h ∧ 0 < v)
	(h₁ : v = 1 / 3 * (b * h))
	(h₂ : b = 30)
	(h₃ : h = 13 / 2) :
	v = 65 := by
	```&

	@ Lean4 theorem statement and proof with explanatory comments preceding each line of code:
	### Response:
	<Thought>
	Alright, I should do the following:

	1. Provide the natural language analysis for the theorem based on the Natural language theorem statement.

	2. Draft the Lean4 tactics I should use to solve the problem

	3. Write the output Lean4 code.

	The user also asks that I should avoid using the keyword `sorry` to give up the proof, so I will not write it in my Lean4 code.

	The `mathd_algebra_478` can be proofed by"""

	# tokenize and generation
	input_tensor = tokenizer.encode(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
	outputs = model.generate(input_tensor.to(model.device), max_new_tokens=2048)

	# print results
	result = tokenizer.decode(outputs[0])
	print(result)
	```

	### Corrector Agent

	The model can be used as a corrector agent as well, here is an example (Note that the corrector may be unable to )

	```python
	prompt = """<｜begin▁of▁sentence｜>You are a helpful mathematical assistant specialized in formal theorem proving using Lean4.
	Your objectives:
	1. Read and interpret the Lean4 theorem statement and any error messages.
	2. If a previous proof attempt was incorrect, analyze its exact mistakes and completely discard or rewrite the proof as needed.
	3. Avoid reusing incorrect proof structures or strategies unless explicitly validated as correct.
	4. Address all error messages by modifying the proof structure as needed.
	5. Prove the theorem WITH Long Chain-of-Thought process in the <Thought> section, but only place the corrected Lean4 code in the <Output> section.
	6. Ensure the new proof is logically valid and does not use `sorry`.### Instruction: @ Natural language theorem statement:
	mathd_algebra_400
	The volume of a cone is given by the formula $V = \frac{1}{3}Bh$, where $B$ is the area of the base and $h$ is the height. The area of the base of a cone is 30 square units, and its height is 6.5 units. What is the number of cubic units in its volume? Show that it is 65.

	@ Lean4 theorem statement:
	```lean4
	theorem mathd_algebra_478
	(b h v : ℝ)
	(h₀ : 0 < b ∧ 0 < h ∧ 0 < v)
	(h₁ : v = 1 / 3 * (b * h))
	(h₂ : b = 30)
	(h₃ : h = 13 / 2) :
	v = 65 := by
	```&

	@ Lean4 theorem statement and proof with explanatory comments preceding each line of code:
	### Response:
	<Thought>
	Alright, I need to prove the theorem lean_workbook_plus_68493 using the Lean4 code. Here is my draft of the proof:

	```lean4
	import Mathlib
	import Aesop

	set_option maxHeartbeats 0

	open BigOperators Real Nat Topology Rat

	/--The volume of a cone is given by the formula $V = \frac{1}{3}Bh$, where $B$ is the area of the base and $h$ is the height. The area of the base of a cone is 30 square units, and its height is 6.5 units. What is the number of cubic units in its volume? Show that it is 65.-/
	theorem mathd_algebra_478
	(b h v : ℝ)
	(h₀ : 0 < b ∧ 0 < h ∧ 0 < v)
	(h₁ : v = 1 / 3 * (b * h))
	(h₂ : b = 30)
	(h₃ : h = 13 / 2) :
	v = 65 := by
	rfl
	norm_num at h₁
	linarith
	```&

	Let me test it in Lean4

	Emmm, it seems the above proof is wrong.

	Let me check the error messages.

	OK, here are the error messages:

	```bash
	line 16

	The rfl tactic failed. Possible reasons:
	- The goal is not a reflexive relation (neither `=` nor a relation with a @[refl] lemma).
	- The arguments of the relation are not equal.
	Try using the reflexivity lemma for your relation explicitly, e.g. `exact Eq.refl _` or
	`exact HEq.rfl` etc.
	b h v : ℝ
	h₀ : 0 < b ∧ 0 < h ∧ 0 < v
	h₁ : v = 1 / 3 * (b * h)
	h₂ : b = 30
	h₃ : h = 13 / 2
	⊢ v = 65
	```&


	So, I will rethink a Lean4 proof following the steps

	1. Provide the natural language analysis for the theorem based on the Natural language theorem statement, Lean4 theorem statement, my previous proof and the error message.

	2. Draft the Lean4 tactics I should use to solve the problem

	3. Write the output Lean4 code.

	Let me analysis the wrong Lean4 solution through the error messages"""

	# tokenize and generation
	input_tensor = tokenizer.encode(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
	outputs = model.generate(input_tensor.to(model.device), max_new_tokens=2048)

	# print results
	result = tokenizer.decode(outputs[0])
	print(result)
	```




	## Citation
	```bibtex
	@misc{wang2025malot,
	title={MA-LoT: Model-Collaboration Lean-based Long Chain-of-Thought Reasoning enhances Formal Theorem Proving},
	author={Ruida Wang and Rui Pan and Yuxin Li and Jipeng Zhang and Yizhen Jia and Shizhe Diao and Renjie Pi and Junjie Hu and Tong Zhang},
	year={2025},
	eprint={2503.03205},
	archivePrefix={arXiv},
	primaryClass={cs.CL},
	url={https://arxiv.org/abs/2503.03205},
	}
	```