Llama 3.2 3B — Smart Contract Decompiler (A1.5)

A LoRA fine-tuned Llama 3.2 3B model for decompiling EVM smart contract bytecode into human-readable Solidity source code.

This model implements the methodology from "Decompiling Smart Contracts with a Large Language Model" (arXiv:2506.19624v1).

Overview

Traditional decompilers (Panoramix, Heimdall) produce low-level, hard-to-read output with only 0.4–0.5 semantic similarity to the original source. This model achieves ~0.82 semantic similarity by combining deterministic static analysis with neural code generation in a two-stage pipeline:

  1. Bytecode → TAC — Static analysis converts raw EVM bytecode into a Three-Address Code (TAC) intermediate representation (control flow graph, basic blocks, jump targets, function selectors).
  2. TAC → Solidity — This fine-tuned LLM generates readable Solidity from the TAC representation.
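The two stages compose into a single decompilation function. A minimal sketch of that composition; `BytecodeAnalyzer` appears in the project, but the method name `to_tac` and the stubbed bodies below are illustrative placeholders, not the real API:

```python
class BytecodeAnalyzer:
    """Stage 1 stand-in: deterministic static analysis of EVM bytecode
    (CFG, basic blocks, jump targets, selectors). Stubbed for illustration."""

    def to_tac(self, bytecode_hex: str) -> str:
        # Hypothetical placeholder; the real analyzer emits full TAC blocks.
        return f"Block_0:\n  ; TAC stub for {len(bytecode_hex) // 2} bytes"


def decompile(bytecode_hex: str, generate) -> str:
    """Stage 1 (static analysis) feeds stage 2 (LLM generation)."""
    tac = BytecodeAnalyzer().to_tac(bytecode_hex)
    return generate(tac)  # stage 2: the fine-tuned LLM turns TAC into Solidity
```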

Model Details

| Property | Value |
|---|---|
| Base Model | meta-llama/Llama-3.2-3B |
| Fine-tuning Method | LoRA (Low-Rank Adaptation) via PEFT |
| Task | Causal Language Modeling (`CAUSAL_LM`) |
| LoRA Rank (r) | 16 |
| LoRA Alpha | 32 |
| LoRA Dropout | 0.1 |
| Target Modules | `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj` |
| Trainable Parameters | 13,631,488 (0.42% of 3.2B total) |
| Max Sequence Length | 4,096 tokens |
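The adapter hyperparameters above map directly onto a PEFT `LoraConfig`. A sketch of how such a configuration is typically constructed, not the project's verbatim training code:

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                 # LoRA rank
    lora_alpha=32,        # scaling factor
    lora_dropout=0.1,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
        "gate_proj", "up_proj", "down_proj",      # MLP projections
    ],
    task_type="CAUSAL_LM",
)
```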

Training Details

Dataset

  • Source: Ethereum mainnet verified contracts fetched via the Etherscan API
  • Format: JSONL with bytecode, tac, and solidity fields
  • Pipeline: Bytecode is fetched → converted to TAC via BytecodeAnalyzer (static analysis with control flow, basic blocks, dominance analysis, loop detection) → paired with the verified Solidity source
  • Size: 95 examples (85 train / 10 validation) from the demo dataset
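Each training example is one JSON object per line. A minimal sketch of parsing that format with the stdlib; the field names come from the description above, but the record contents are invented for illustration:

```python
import io
import json

# One illustrative JSONL record with the three fields described above.
raw = ('{"bytecode": "0x6080...", "tac": "Block_0:\\n  ...", '
       '"solidity": "contract C { }"}\n')

# In practice this would iterate over the dataset file instead of a StringIO.
records = [json.loads(line) for line in io.StringIO(raw) if line.strip()]
example = records[0]
bytecode, tac, solidity = example["bytecode"], example["tac"], example["solidity"]
```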

Training Configuration

| Parameter | Value |
|---|---|
| Epochs | 3 |
| Batch Size (per device) | 1 |
| Gradient Accumulation Steps | 8 |
| Effective Batch Size | 8 |
| Optimizer | AdamW (8-bit via bitsandbytes) |
| Learning Rate | 2×10⁻⁴ |
| LR Scheduler | Cosine |
| Warmup Steps | 3 |
| Weight Decay | 0.01 |
| Max Gradient Norm | 1.0 |
| FP16 | Yes |
| Gradient Checkpointing | Yes |
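In Hugging Face `transformers`, the configuration above corresponds roughly to the following `TrainingArguments`. This is a sketch, not the project's exact training script (the output path is assumed from the Usage section below):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="models/final_model",   # assumed path
    num_train_epochs=3,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,     # effective batch size = 1 × 8 = 8
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_steps=3,
    weight_decay=0.01,
    max_grad_norm=1.0,
    fp16=True,
    gradient_checkpointing=True,
    optim="adamw_bnb_8bit",            # 8-bit AdamW via bitsandbytes
)
```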

Training Results

| Metric | Value |
|---|---|
| Final Training Loss | 0.6553 |
| Training Duration | ~31 minutes |
| Total Optimization Steps | 285 |
| Hardware | NVIDIA RTX 4080 (16 GB VRAM) |
| Training Date | July 4, 2025 |

Evaluation Metrics

| Metric | Target | Description |
|---|---|---|
| Semantic Similarity | > 0.80 | CodeBERT embedding cosine similarity |
| Edit Distance | < 0.40 | Normalized Levenshtein distance |
| Success Rate | > 78% | Percentage of functions exceeding the similarity threshold |
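The edit-distance metric is a standard dynamic program: Levenshtein distance normalized by the longer string's length, so 0 means identical and 1 means completely different. A self-contained sketch (the exact normalization used in the paper's evaluation may differ):

```python
def normalized_levenshtein(a: str, b: str) -> float:
    """Levenshtein distance divided by the longer string's length."""
    if not a and not b:
        return 0.0
    prev = list(range(len(b) + 1))          # distances from "" to prefixes of b
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1] / max(len(a), len(b))
```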

Comparison with Traditional Decompilers

| Feature | This Model | Panoramix | Heimdall |
|---|---|---|---|
| Semantic Similarity | ~0.82 | ~0.45 | ~0.40 |
| Readable Output | Yes | Partial | Partial |
| Variable Naming | Inferred | Generic | Generic |
| Function Signatures | Yes | Yes | Yes |
| Complex Logic | Good | Limited | Limited |

Usage

Loading the Model

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B",
    device_map="auto",
    load_in_8bit=True,
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B")

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "askalgore/llama-3.2-3b-A1.5")
```

Using the Project Wrapper

```python
from src.model_setup import SmartContractLLM

llm = SmartContractLLM()
llm.load_model("models/final_model")
result = llm.generate(tac_input)
```

Prompt Format

```text
### Task: Convert the following Three-Address Code (TAC) representation to Solidity source code.

### TAC:
{tac_representation}

### Solidity:
```

The model generates Solidity code following the ### Solidity: marker.
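Assembling the prompt and slicing off the model's answer is plain string handling. A sketch (helper names are illustrative, not the project's API):

```python
PROMPT_TEMPLATE = (
    "### Task: Convert the following Three-Address Code (TAC) "
    "representation to Solidity source code.\n\n"
    "### TAC:\n{tac}\n\n"
    "### Solidity:\n"
)

def build_prompt(tac: str) -> str:
    """Fill the TAC representation into the training prompt format."""
    return PROMPT_TEMPLATE.format(tac=tac)

def extract_solidity(generated: str) -> str:
    """Everything after the final '### Solidity:' marker is the model's answer."""
    return generated.rsplit("### Solidity:", 1)[-1].strip()
```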

Three-Address Code (TAC) Representation

TAC is an intermediate representation produced by the static analysis stage. It captures:

  • Basic blocks — sequences of instructions with a single entry and exit point
  • Control flow — jumps, conditional branches, block predecessors/successors
  • Stack operations — translated from EVM's stack-based architecture to explicit temporary variables
  • Storage/Memory access — SLOAD, SSTORE, MLOAD, MSTORE operations
  • Function selectors — 4-byte keccak256 identifiers for function dispatch

Example TAC snippet:

```text
Block_0:
  temp_1 = CALLDATALOAD(0)
  temp_2 = SHR(224, temp_1)
  IF temp_2 == 0x70a08231 GOTO Block_balanceOf
  IF temp_2 == 0xa9059cbb GOTO Block_transfer
  GOTO Block_fallback
```
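The dispatcher above reads the first four bytes of calldata (`SHR(224, CALLDATALOAD(0))`) and compares them against known selectors. The same routing logic in Python, using the two selectors from the snippet:

```python
def dispatch(calldata: bytes) -> str:
    """Mirror of the TAC dispatcher: route on the 4-byte function selector."""
    if len(calldata) < 4:
        return "Block_fallback"
    # First 4 bytes, big-endian == SHR(224, CALLDATALOAD(0)) in the EVM.
    selector = int.from_bytes(calldata[:4], "big")
    return {
        0x70A08231: "Block_balanceOf",   # balanceOf(address)
        0xA9059CBB: "Block_transfer",    # transfer(address,uint256)
    }.get(selector, "Block_fallback")
```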

Limitations

  1. Approximate reconstruction — generated Solidity is an approximation, not an exact byte-for-byte match of the original source code
  2. Variable names — original variable and function parameter names cannot be recovered; the model infers reasonable names
  3. Compiler optimizations — some optimizations applied during compilation may not be reversible
  4. Complex patterns — very complex or unusual control flow may produce less accurate results
  5. Comments & NatSpec — original comments and documentation are not preserved
  6. Demo dataset scale — this checkpoint was trained on 95 examples; larger datasets (the paper used 238,446 pairs) would improve quality

Project Repository

🔗 GitHub — A1.5_Smart_Contract_Bytecode_To_Code_Generator

The repository includes:

  • Full two-stage decompilation pipeline
  • Web interface for interactive decompilation
  • Bytecode analyzer with control flow analysis
  • Dataset collection pipeline via Etherscan
  • Training and evaluation scripts

Citation

If you use this model, please cite the underlying research paper:

```bibtex
@article{david2025decompiling,
  title={Decompiling Smart Contracts with a Large Language Model},
  author={David, Sifei and Zhou, Zhiyu and Song, Xuan and Gervais, Arthur and Qin, Benjamin},
  journal={arXiv preprint arXiv:2506.19624v1},
  year={2025}
}
```

License

This model adapter is released under the MIT License. The base model (Llama 3.2 3B) is subject to Meta's Llama Community License.

Acknowledgments

  • Meta AI for the Llama 3.2 base model
  • Paper authors (David, Zhou, Song, Gervais, Qin) for the research methodology
  • Etherscan for verified smart contract access
  • Hugging Face for model hosting, Transformers, and PEFT libraries