Llama 3.2 3B — Smart Contract Decompiler (A1.5)

A LoRA fine-tuned Llama 3.2 3B model for decompiling EVM smart contract bytecode into human-readable Solidity source code.

This model implements the methodology from "Decompiling Smart Contracts with a Large Language Model" (arXiv:2506.19624v1).

Overview

Traditional decompilers (Panoramix, Heimdall) produce low-level, hard-to-read output with only 0.4–0.5 semantic similarity to the original source. This model achieves ~0.82 semantic similarity by combining deterministic static analysis with neural code generation in a two-stage pipeline:

  1. Bytecode → TAC — Static analysis converts raw EVM bytecode into a Three-Address Code (TAC) intermediate representation (control flow graph, basic blocks, jump targets, function selectors).
  2. TAC → Solidity — This fine-tuned LLM generates readable Solidity from the TAC representation.
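The two stages compose into a single decompilation function. A minimal sketch of that composition; `BytecodeAnalyzer` appears in the project, but the method name `to_tac` and the stubbed bodies below are illustrative placeholders, not the real API:

```python
class BytecodeAnalyzer:
    """Stage 1 stand-in: deterministic static analysis of EVM bytecode
    (CFG, basic blocks, jump targets, selectors). Stubbed for illustration."""

    def to_tac(self, bytecode_hex: str) -> str:
        # Hypothetical placeholder; the real analyzer emits full TAC blocks.
        return f"Block_0:\n  ; TAC stub for {len(bytecode_hex) // 2} bytes"


def decompile(bytecode_hex: str, generate) -> str:
    """Stage 1 (static analysis) feeds stage 2 (LLM generation)."""
    tac = BytecodeAnalyzer().to_tac(bytecode_hex)
    return generate(tac)  # stage 2: the fine-tuned LLM turns TAC into Solidity
```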

Model Details

| Property | Value |
|---|---|
| Base Model | meta-llama/Llama-3.2-3B |
| Fine-tuning Method | LoRA (Low-Rank Adaptation) via PEFT |
| Task | Causal Language Modeling (`CAUSAL_LM`) |
| LoRA Rank (r) | 16 |
| LoRA Alpha | 32 |
| LoRA Dropout | 0.1 |
| Target Modules | `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj` |
| Trainable Parameters | 13,631,488 (0.42% of 3.2B total) |
| Max Sequence Length | 4,096 tokens |
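The adapter hyperparameters above map directly onto a PEFT `LoraConfig`. A sketch of how such a configuration is typically constructed, not the project's verbatim training code:

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                 # LoRA rank
    lora_alpha=32,        # scaling factor
    lora_dropout=0.1,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
        "gate_proj", "up_proj", "down_proj",      # MLP projections
    ],
    task_type="CAUSAL_LM",
)
```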

Training Details

Dataset

  • Source: Ethereum mainnet verified contracts fetched via the Etherscan API
  • Format: JSONL with bytecode, tac, and solidity fields
  • Pipeline: Bytecode is fetched → converted to TAC via BytecodeAnalyzer (static analysis with control flow, basic blocks, dominance analysis, loop detection) → paired with the verified Solidity source
  • Size: 95 examples (85 train / 10 validation) from the demo dataset
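Each training example is one JSON object per line. A minimal sketch of parsing that format with the stdlib; the field names come from the description above, but the record contents are invented for illustration:

```python
import io
import json

# One illustrative JSONL record with the three fields described above.
raw = ('{"bytecode": "0x6080...", "tac": "Block_0:\\n  ...", '
       '"solidity": "contract C { }"}\n')

# In practice this would iterate over the dataset file instead of a StringIO.
records = [json.loads(line) for line in io.StringIO(raw) if line.strip()]
example = records[0]
bytecode, tac, solidity = example["bytecode"], example["tac"], example["solidity"]
```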

Training Configuration

| Parameter | Value |
|---|---|
| Epochs | 3 |
| Batch Size (per device) | 1 |
| Gradient Accumulation Steps | 8 |
| Effective Batch Size | 8 |
| Optimizer | AdamW (8-bit via bitsandbytes) |
| Learning Rate | 2×10⁻⁴ |
| LR Scheduler | Cosine |
| Warmup Steps | 3 |
| Weight Decay | 0.01 |
| Max Gradient Norm | 1.0 |
| FP16 | Yes |
| Gradient Checkpointing | Yes |
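In Hugging Face `transformers`, the configuration above corresponds roughly to the following `TrainingArguments`. This is a sketch, not the project's exact training script (the output path is assumed from the Usage section below):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="models/final_model",   # assumed path
    num_train_epochs=3,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,     # effective batch size = 1 × 8 = 8
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_steps=3,
    weight_decay=0.01,
    max_grad_norm=1.0,
    fp16=True,
    gradient_checkpointing=True,
    optim="adamw_bnb_8bit",            # 8-bit AdamW via bitsandbytes
)
```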

Training Results

| Metric | Value |
|---|---|
| Final Training Loss | 0.6553 |
| Training Duration | ~31 minutes |
| Total Optimization Steps | 285 |
| Hardware | NVIDIA RTX 4080 (16 GB VRAM) |
| Training Date | July 4, 2025 |

Evaluation Metrics

| Metric | Target | Description |
|---|---|---|
| Semantic Similarity | > 0.80 | CodeBERT embedding cosine similarity |
| Edit Distance | < 0.40 | Normalized Levenshtein distance |
| Success Rate | > 78% | Percentage of functions exceeding the similarity threshold |
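The edit-distance metric is a standard dynamic program: Levenshtein distance normalized by the longer string's length, so 0 means identical and 1 means completely different. A self-contained sketch (the exact normalization used in the paper's evaluation may differ):

```python
def normalized_levenshtein(a: str, b: str) -> float:
    """Levenshtein distance divided by the longer string's length."""
    if not a and not b:
        return 0.0
    prev = list(range(len(b) + 1))          # distances from "" to prefixes of b
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1] / max(len(a), len(b))
```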

Comparison with Traditional Decompilers

| Feature | This Model | Panoramix | Heimdall |
|---|---|---|---|
| Semantic Similarity | ~0.82 | ~0.45 | ~0.40 |
| Readable Output | Yes | Partial | Partial |
| Variable Naming | Inferred | Generic | Generic |
| Function Signatures | Yes | Yes | Yes |
| Complex Logic | Good | Limited | Limited |

Usage

Loading the Model

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B",
    device_map="auto",
    load_in_8bit=True,
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B")

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "askalgore/llama-3.2-3b-A1.5")
```

Using the Project Wrapper

```python
from src.model_setup import SmartContractLLM

llm = SmartContractLLM()
llm.load_model("models/final_model")
result = llm.generate(tac_input)
```

Prompt Format

```text
### Task: Convert the following Three-Address Code (TAC) representation to Solidity source code.

### TAC:
{tac_representation}

### Solidity:
```

The model generates Solidity code following the ### Solidity: marker.
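Assembling the prompt and slicing off the model's answer is plain string handling. A sketch (helper names are illustrative, not the project's API):

```python
PROMPT_TEMPLATE = (
    "### Task: Convert the following Three-Address Code (TAC) "
    "representation to Solidity source code.\n\n"
    "### TAC:\n{tac}\n\n"
    "### Solidity:\n"
)

def build_prompt(tac: str) -> str:
    """Fill the TAC representation into the training prompt format."""
    return PROMPT_TEMPLATE.format(tac=tac)

def extract_solidity(generated: str) -> str:
    """Everything after the final '### Solidity:' marker is the model's answer."""
    return generated.rsplit("### Solidity:", 1)[-1].strip()
```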

Three-Address Code (TAC) Representation

TAC is an intermediate representation produced by the static analysis stage. It captures:

  • Basic blocks — sequences of instructions with a single entry and exit point
  • Control flow — jumps, conditional branches, block predecessors/successors
  • Stack operations — translated from EVM's stack-based architecture to explicit temporary variables
  • Storage/Memory access — SLOAD, SSTORE, MLOAD, MSTORE operations
  • Function selectors — 4-byte keccak256 identifiers for function dispatch

Example TAC snippet:

```text
Block_0:
  temp_1 = CALLDATALOAD(0)
  temp_2 = SHR(224, temp_1)
  IF temp_2 == 0x70a08231 GOTO Block_balanceOf
  IF temp_2 == 0xa9059cbb GOTO Block_transfer
  GOTO Block_fallback
```
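The dispatcher above reads the first four bytes of calldata (`SHR(224, CALLDATALOAD(0))`) and compares them against known selectors. The same routing logic in Python, using the two selectors from the snippet:

```python
def dispatch(calldata: bytes) -> str:
    """Mirror of the TAC dispatcher: route on the 4-byte function selector."""
    if len(calldata) < 4:
        return "Block_fallback"
    # First 4 bytes, big-endian == SHR(224, CALLDATALOAD(0)) in the EVM.
    selector = int.from_bytes(calldata[:4], "big")
    return {
        0x70A08231: "Block_balanceOf",   # balanceOf(address)
        0xA9059CBB: "Block_transfer",    # transfer(address,uint256)
    }.get(selector, "Block_fallback")
```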

Limitations

  1. Approximate reconstruction — generated Solidity is an approximation, not an exact byte-for-byte match of the original source code
  2. Variable names — original variable and function parameter names cannot be recovered; the model infers reasonable names
  3. Compiler optimizations — some optimizations applied during compilation may not be reversible
  4. Complex patterns — very complex or unusual control flow may produce less accurate results
  5. Comments & NatSpec — original comments and documentation are not preserved
  6. Demo dataset scale — this checkpoint was trained on 95 examples; larger datasets (the paper used 238,446 pairs) would improve quality

Project Repository

🔗 GitHub — A1.5_Smart_Contract_Bytecode_To_Code_Generator

The repository includes:

  • Full two-stage decompilation pipeline
  • Web interface for interactive decompilation
  • Bytecode analyzer with control flow analysis
  • Dataset collection pipeline via Etherscan
  • Training and evaluation scripts

Citation

If you use this model, please cite the underlying research paper:

```bibtex
@article{david2025decompiling,
  title={Decompiling Smart Contracts with a Large Language Model},
  author={David, Sifei and Zhou, Zhiyu and Song, Xuan and Gervais, Arthur and Qin, Benjamin},
  journal={arXiv preprint arXiv:2506.19624v1},
  year={2025}
}
```

License

This model adapter is released under the MIT License. The base model (Llama 3.2 3B) is subject to Meta's Llama Community License.

Acknowledgments

  • Meta AI for the Llama 3.2 base model
  • Paper authors (David, Zhou, Song, Gervais, Qin) for the research methodology
  • Etherscan for verified smart contract access
  • Hugging Face for model hosting, Transformers, and PEFT libraries