metadata
base_model: Llama-3.2-3B-Instruct
tags:
  - text-generation-inference
  - transformers
  - unsloth
  - llama
  - trl
  - climate-policy
  - query-interpretation
  - lora
license: apache-2.0
language:
  - en

CLEAR Query Interpreter

This is the official implementation of the query interpretation model from our paper "CLEAR: Climate Policy Retrieval and Summarization Using LLMs" (WWW Companion '25).

Model Description

The model is a LoRA adapter for Llama-3.2-3B-Instruct, fine-tuned to decompose natural-language queries about climate policies into structured components for precise information retrieval.

Task

Query interpretation for climate policy retrieval, decomposing natural-language queries into three components:

  • Location (L): Geographic identification
  • Topics (T): Climate-related themes
  • Intent (I): Specific policy inquiries
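
As a rough illustration of this decomposition, the three components can be held in a small record type. The sketch below is an assumption made for illustration only: the class name `InterpretedQuery`, its field names, and the `has_location` helper are hypothetical and are not part of the released model or the paper's code. The example values come from the sample query in the Usage section.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class InterpretedQuery:
    """Hypothetical container for the L/T/I components of one user query."""

    suburb: Optional[str] = None  # Location (L): suburb, if mentioned
    state: Optional[str] = None   # Location (L): state code, if mentioned
    topics: List[str] = field(default_factory=list)   # Topics (T)
    intents: List[str] = field(default_factory=list)  # Intent (I): policy search queries

    def has_location(self) -> bool:
        """Return True when at least one geographic field was identified."""
        return self.suburb is not None or self.state is not None


example = InterpretedQuery(
    suburb="Burwood",
    state="VIC",
    topics=["renewable energy", "solar power"],
    intents=["Are there solar farm initiatives in Burwood Victoria?"],
)
print(example.has_location())  # True
```

Keeping location separate from topics and intent lets a retriever filter by geography first and then rank documents against the search queries.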

Training Details

  • Base Model: Llama-3.2-3B-Instruct
  • Training Data: 330 manually annotated queries
  • Annotators: Four Australia-based experts with media communication backgrounds
  • Hardware: NVIDIA A100 GPU
  • Parameters:
    • Batch size: 6
    • Sequence length: 1024
    • Optimizer: AdamW (weight decay 0.05)
    • Learning rate: 5e-5
    • Epochs: 10
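
For orientation, the listed hyperparameters imply a fairly short training run. The sketch below collects them in a plain dictionary and derives the number of optimizer steps. It assumes that all 330 annotated queries are used for training and that there is no gradient accumulation, neither of which is stated above. The dictionary keys are illustrative names, not the arguments of any particular trainer.

```python
import math

# Hyperparameters as listed in this section (key names are illustrative).
config = {
    "num_examples": 330,    # manually annotated queries
    "batch_size": 6,
    "max_seq_length": 1024,
    "weight_decay": 0.05,   # used with AdamW
    "learning_rate": 5e-5,
    "num_epochs": 10,
}

# Assumption: every example is in the training split and
# each batch corresponds to exactly one optimizer step.
steps_per_epoch = math.ceil(config["num_examples"] / config["batch_size"])
total_steps = steps_per_epoch * config["num_epochs"]

print(steps_per_epoch, total_steps)  # 55 550
```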

Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
device = "cuda" if torch.cuda.is_available() else "cpu"
model_name = "oscarwu/Llama-3.2-3B-CLEAR"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16
).to(device)

# Example query
query = "I live in Burwood (Vic) and want details on renewable energy initiatives. Are solar farms planned?"

# Format prompt
prompt = f"""Below is an instruction that describes a task, paired with an input that provides further context. Your response must be a valid JSON object, strictly following the requested format.
### Instruction:
Extract location, topics, and search queries from Australian climate policy questions. Your response must be a valid JSON object with the following structure:
{{
 "rag_queries": ["query1", "query2", "query3"],  // 1-3 policy search queries
 "topics": ["topic1", "topic2", "topic3"],       // 1-3 climate/environment topics
 "location": {{                                   
   "query_suburb": "suburb_name or null",
   "query_state": "state_code or null", 
   "query_lga": "lga_name or null"
 }}
}}
### Input:
{query}
### Response (valid JSON only):
"""

# Generate response
inputs = tokenizer(prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=220)

# Decode only the newly generated tokens so that `result` excludes the prompt
prompt_length = inputs["input_ids"].shape[1]
result = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(result)
```
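
Because the model returns free text, it can help to validate the decoded string before passing it to a retriever. The following is a minimal sketch rather than part of the released code: the helper `parse_interpretation` and the constants `RESPONSE_MARKER` and `REQUIRED_LOCATION_KEYS` are hypothetical names, and the checks simply mirror the structure requested in the prompt. The helper discards everything up to the response marker, so it should work whether or not the decoded string still contains the prompt.

```python
import json
from typing import Any, Dict

RESPONSE_MARKER = "### Response (valid JSON only):"
REQUIRED_LOCATION_KEYS = {"query_suburb", "query_state", "query_lga"}


def parse_interpretation(text: str) -> Dict[str, Any]:
    """Extract and sanity-check the JSON object in a model response (hypothetical helper)."""
    # Drop the prompt if it is still present; its template also contains braces.
    text = text.split(RESPONSE_MARKER)[-1]

    # Keep only the span between the first "{" and the last "}" in case
    # the model emits extra text around the JSON object.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(text[start:end + 1])

    # Check the fields requested in the prompt.
    for key in ("rag_queries", "topics"):
        items = data.get(key)
        if not isinstance(items, list) or not 1 <= len(items) <= 3:
            raise ValueError(f"'{key}' should be a list of 1-3 strings")
    location = data.get("location")
    if not isinstance(location, dict) or set(location) != REQUIRED_LOCATION_KEYS:
        raise ValueError("'location' should contain suburb, state and LGA keys")
    return data


sample = (
    '{"rag_queries": ["Are there solar farm initiatives in Burwood Victoria?"], '
    '"topics": ["solar power"], '
    '"location": {"query_suburb": "Burwood", "query_state": "VIC", "query_lga": null}}'
)
parsed = parse_interpretation(sample)
print(parsed["location"]["query_state"])  # VIC
```

In practice one would call `parse_interpretation(result)` on the decoded generation and handle `ValueError` (or `json.JSONDecodeError`, which is a subclass of it) by retrying or falling back to a keyword search.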

Example output for the query above:

```json
{
  "rag_queries": [
    "What renewable energy projects are planned for Burwood?",
    "Are there solar farm initiatives in Burwood Victoria?"
  ],
  "topics": [
    "renewable energy",
    "solar power"
  ],
  "location": {
    "query_suburb": "Burwood",
    "query_state": "VIC",
    "query_lga": null
  }
}
```