🧠 Multi-Task Address Reasoning Model v1.0
This is a multi-task fine-tuned model specialized for address correction, component extraction, and geographic Q&A with Chain of Thought reasoning. Built on Llama-3.2-1B-Instruct with LoRA fine-tuning using Unsloth.
🎯 Model Description
A multi-task Llama-3.2-1B model fine-tuned with LoRA for Indian address correction, component extraction, and geographic Q&A, using Chain of Thought reasoning.
Key Capabilities
- 🔧 Address Correction: Fix spelling errors, formatting issues, and incomplete addresses
- 📊 Component Extraction: Extract and structure address components (building, locality, city, state, pincode)
- ❓ Geographic Q&A: Answer questions about locations, states, cities, and geographic relationships
- 🧠 Chain of Thought Reasoning: Detailed step-by-step reasoning for address analysis
- 🎯 Multi-Task Learning: Single model handles multiple address-related tasks
📊 Model Architecture & Training
- Base Model: unsloth/Llama-3.2-1B-Instruct
- Fine-tuning Method: LoRA (Low-Rank Adaptation) via Unsloth
- LoRA Rank (r): 64
- LoRA Alpha: 128
- LoRA Dropout: 0.1
- Target Modules: q_proj, o_proj, k_proj, up_proj, v_proj, down_proj, gate_proj
- Model Size: ~276MB (adapter only)
- Checkpoint: 435
- Max Sequence Length: 1024 tokens (auto-optimized from sequence analysis)
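For reference, the adapter hyperparameters above correspond roughly to the following PEFT configuration (a minimal sketch, not the exact Unsloth training call; the bias and task_type values are assumptions):

from peft import LoraConfig

# Sketch of a LoRA configuration matching the values listed above
lora_config = LoraConfig(
    r=64,                      # LoRA rank
    lora_alpha=128,            # LoRA alpha
    lora_dropout=0.1,          # LoRA dropout
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    bias="none",               # assumption: biases were not adapted
    task_type="CAUSAL_LM",
)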
Training Configuration
- Learning Rate: 1e-4
- Batch Size: 32 effective (1 per device × 32 gradient accumulation steps)
- Epochs: 3
- Optimizer: adamw_8bit
- Scheduler: cosine
- Weight Decay: 0.01
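These hyperparameters map onto a Hugging Face TrainingArguments roughly as follows (a hedged sketch; the output directory and logging settings are placeholders, not taken from the original run):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="outputs",            # placeholder
    learning_rate=1e-4,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=32,  # effective batch size of 32
    num_train_epochs=3,
    optim="adamw_8bit",
    lr_scheduler_type="cosine",
    weight_decay=0.01,
    fp16=True,                       # assumption: fp16 training, matching the inference dtype
    logging_steps=10,                # placeholder
)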
🚀 Usage Examples
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import warnings
import json
warnings.filterwarnings("ignore")
# Base model used in training and the published LoRA adapter
base_model_name = "unsloth/Llama-3.2-1B-Instruct"
model_name = "shiprocket-ai/multitask-address-reasoning-llama-1B-model"
print("📥 Loading tokenizer...")
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Add pad token if missing
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
print("📥 Loading base model...")
# Load base model (non-quantized version as per training script)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)
print("📥 Loading LoRA adapter...")
# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, model_name)
print("✅ Model loaded successfully!")
def process_address_with_reasoning(prompt, max_new_tokens=400):
    """Process an address prompt with Chain of Thought reasoning (as trained)"""
    # Tokenize the prompt
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
    # Move inputs to the model's device
    device = next(model.parameters()).device
    inputs = {k: v.to(device) for k, v in inputs.items()}
    # Generate with reasoning (matching the parameters used when testing during training)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=0.1,  # low temperature, as used when testing during training
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
            use_cache=True
        )
    # Decode only the newly generated tokens
    input_length = inputs['input_ids'].shape[1]
    generated_tokens = outputs[0][input_length:]
    response = tokenizer.decode(generated_tokens, skip_special_tokens=True)
    return response.strip()
def fix_address_with_reasoning(address, max_new_tokens=400):
    """Fix an address with detailed Chain of Thought reasoning"""
    messages = [
        {"role": "user", "content": f"Fix and extract components from: {address}"}
    ]
    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    return process_address_with_reasoning(prompt, max_new_tokens)
def answer_geographic_question(question, max_new_tokens=150):
    """Answer geographic questions about addresses"""
    messages = [
        {"role": "user", "content": question}
    ]
    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    return process_address_with_reasoning(prompt, max_new_tokens)
def extract_components(address, max_new_tokens=200):
    """Extract address components with reasoning"""
    messages = [
        {"role": "user", "content": f"Extract all components from this address: {address}"}
    ]
    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    return process_address_with_reasoning(prompt, max_new_tokens)
# Test cases based on the training script examples
print("\n🏠 MULTI-TASK ADDRESS MODEL EXAMPLES")
print("=" * 60)
print("🧠 Testing Chain of Thought reasoning + Geographic Q&A")
print("📊 Model trained with LoRA r=64, alpha=128 for complex reasoning")
print("=" * 60)

# Test address correction with reasoning (examples from the training script)
test_addresses = [
    "pandit nagla badi masjid moradabad 244001",
    "sec 14 gurgoan haryana 122001",
    "koramangala bangalor 560095",
    "dlf cyber city gurgaon haryana"
]

print("\n🔧 TESTING ADDRESS CORRECTION WITH CHAIN OF THOUGHT:")
print("-" * 50)
for i, test_address in enumerate(test_addresses, 1):
    print(f"\n📍 Test {i}: {test_address}")
    result = fix_address_with_reasoning(test_address)
    print("🤖 Chain of Thought Response:")
    print(f"   {result}")
    print("-" * 40)
# Test geographic Q&A (examples from the training script)
qa_tests = [
    "Which state is Mumbai in?",
    "What is the pincode of Bangalore?",
    "Is Delhi a metro city?",
    "What tier city is Pune?",
    "Where is Connaught Place located?",
    "What state does Hyderabad belong to?",
    "Name a city in Karnataka.",
    "What is the postal code for Gurgaon?",
    "Which state is New Delhi in?",  # training example
    "What cities are in Maharashtra?"
]

print("\n❓ TESTING GEOGRAPHIC Q&A:")
print("-" * 50)
for i, question in enumerate(qa_tests[:8], 1):  # test the first 8 questions
    print(f"\n❓ Q{i}: {question}")
    result = answer_geographic_question(question)
    print(f"🤖 Answer: {result}")
# Test component extraction
print("\n📊 TESTING COMPONENT EXTRACTION:")
print("-" * 50)
extraction_tests = [
    "Flat 203, Emerald Heights, Sector 15, Gurugram, Haryana 122001",
    "DLF Cyber City, Cyber Hub, Gurgaon, Haryana",
    "Connaught Place, New Delhi, Delhi 110001"
]

for i, test_address in enumerate(extraction_tests, 1):
    print(f"\n📊 Extract {i}: {test_address}")
    result = extract_components(test_address)
    print(f"🤖 Components: {result}")

print("\n✅ ALL TESTS COMPLETED!")
print("🧠 Model demonstrates Chain of Thought reasoning")
print("📍 Geographic knowledge from NER training data")
print("🔧 Address correction with detailed analysis")
🧠 Training Methodology
This model was trained using a sophisticated multi-task approach:
1. Data Preparation Strategy
- Source: Address NER dataset with structured components (address → corrected_address → extracted_info)
- Multi-task Split: 70% Chain of Thought address correction + 30% Geographic Q&A
- Data Augmentation: Multi-task generation expanded the dataset to roughly 584.8% of its original size
- Reasoning Integration: Each sample includes step-by-step analytical reasoning
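A simplified sketch of how such a 70/30 task mix can be assembled is shown below; make_cot_sample and make_qa_samples are hypothetical stand-ins for the actual formatting code, which also injects the step-by-step reasoning:

import random

def make_cot_sample(record):
    # Hypothetical placeholder: format one Chain of Thought correction sample
    return {"task": "address_correction_cot", "record": record}

def make_qa_samples(record):
    # Hypothetical placeholder: derive one or more geographic Q&A pairs from the NER fields
    return [{"task": "geo_qa", "record": record}]

def build_multitask_dataset(records, cot_ratio=0.7, seed=42):
    """Assign each record to CoT correction (~70%) or geographic Q&A (~30%)."""
    rng = random.Random(seed)
    dataset = []
    for record in records:
        if rng.random() < cot_ratio:
            dataset.append(make_cot_sample(record))
        else:
            dataset.extend(make_qa_samples(record))
    return dataset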
2. Chain of Thought Address Correction
- Input: Raw/incomplete addresses with potential errors
- Process: Model analyzes, identifies issues, and explains corrections
- Output: Detailed reasoning + structured JSON with address components
- Examples: Spelling fixes, state inference, component extraction
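As a purely illustrative example of this target format (hypothetical, not a captured model response):

Input: sec 14 gurgoan haryana 122001
Reasoning: "gurgoan" is a misspelling of Gurgaon; "sec 14" expands to Sector 14; pincode 122001 is consistent with Gurgaon, Haryana.
Output: {"locality": "Sector 14", "city": "Gurgaon", "state": "Haryana", "pincode": "122001"}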
3. Geographic Q&A Generation
From each address record's NER data, the model generates multiple Q&A pairs:
- State-City relationships: "Which state is Mumbai in?" → "Mumbai is in Maharashtra state."
- Pincode queries: "What is the pincode of Bangalore?" → "The pincode of Bangalore is 560001."
- City tier classification: "Is Delhi a metro city?" → "Yes, Delhi is a metropolitan city."
- Locality mapping: "Where is Connaught Place?" → "Connaught Place is in New Delhi."
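A hedged sketch of this Q&A generation step, mirroring the templates above (the record field names are assumptions about the NER schema):

def qa_pairs_from_record(info):
    """Generate geographic Q&A pairs from one address record's NER fields."""
    pairs = []
    if info.get("city") and info.get("state"):
        pairs.append((f"Which state is {info['city']} in?",
                      f"{info['city']} is in {info['state']} state."))
    if info.get("city") and info.get("pincode"):
        pairs.append((f"What is the pincode of {info['city']}?",
                      f"The pincode of {info['city']} is {info['pincode']}."))
    if info.get("locality") and info.get("city"):
        pairs.append((f"Where is {info['locality']}?",
                      f"{info['locality']} is in {info['city']}."))
    return pairs

print(qa_pairs_from_record({"locality": "Connaught Place", "city": "New Delhi",
                            "state": "Delhi", "pincode": "110001"}))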
4. Sequence Optimization
- Dynamic Analysis: Analyzed 1000+ samples to determine the optimal context length
- Result: 99% of samples fit within 768 tokens
- Context Window: 1024 tokens, auto-selected from this analysis to leave headroom for reasoning tasks
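A hedged sketch of that length analysis, where formatted_samples stands in for the fully formatted prompt+response strings:

import numpy as np

def analyze_sequence_lengths(formatted_samples, tokenizer):
    """Tokenize a sample of formatted training texts and report the length distribution."""
    lengths = [len(tokenizer(text)["input_ids"]) for text in formatted_samples]
    p99 = int(np.percentile(lengths, 99))
    share_768 = sum(length <= 768 for length in lengths) / len(lengths)
    print(f"99th percentile length: {p99} tokens")
    print(f"Share fitting in 768 tokens: {share_768:.1%}")
    return p99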
🔧 Training Performance
- Final Training Loss: 0.5506
- Training Runtime: 3701.74 seconds (~1 hour)
- Training Samples/Second: 3.749
- Training Steps/Second: 0.118
- Total Epochs: 3.0
🎭 Supported Tasks
1. Address Correction with Reasoning
- Fix spelling errors and formatting issues
- Infer missing components (state, city tier)
- Provide step-by-step reasoning for corrections
2. Component Extraction
- Extract building names, localities, cities, states, pincodes
- Structure unstructured address data
- Identify address hierarchy and relationships
3. Geographic Q&A
- Answer questions about cities, states, and locations
- Provide geographic knowledge and relationships
- Handle location-based queries
4. Address Standardization
- Convert informal addresses to structured format
- Normalize address formats
- Handle various input formats
💡 Use Cases
1. E-commerce & Logistics
- Correct customer addresses during checkout
- Extract delivery components for routing
- Answer location-based customer queries
2. Data Processing & Migration
- Clean legacy address databases with reasoning
- Extract structured data from unstructured addresses
- Provide explanations for address corrections
3. Customer Support Automation
- Answer geographic questions about locations
- Help customers correct their addresses
- Provide location-based information
4. Address Intelligence
- Analyze address patterns and relationships
- Infer missing address components
- Provide geographic context and reasoning
🎯 Prompt Formats
The model works with Llama-3.2 chat format:
Address Correction
<|begin_of_text|><|start_header_id|>user<|end_header_id|>
Fix and extract components from: [address]<|eot_id|><|start_header_id|>assistant<|end_header_id|>
Geographic Q&A
<|begin_of_text|><|start_header_id|>user<|end_header_id|>
Which state is [location] in?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
Component Extraction
<|begin_of_text|><|start_header_id|>user<|end_header_id|>
Extract all components from this address: [address]<|eot_id|><|start_header_id|>assistant<|end_header_id|>
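In practice there is no need to build these strings by hand; tokenizer.apply_chat_template produces them from a message list (a sketch; the exact string depends on the chat_template.jinja shipped with the tokenizer and may include a default system header):

messages = [{"role": "user", "content": "Which state is Mumbai in?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)  # ends with the assistant header, ready for generation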
⚡ Performance Tips
- Temperature Settings: Use 0.1-0.3 for factual tasks, 0.3-0.5 for reasoning tasks
- Context Management: Keep prompts under 512 tokens for optimal performance
- Batch Processing: Process multiple addresses efficiently with batching
- Device Placement: Ensure all tensors are on the same device (GPU/CPU)
- Memory Management: Use float16 for memory efficiency
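A hedged sketch of batched correction following these tips; it assumes the model, tokenizer, and torch imports from the usage example above, and the padding side and generation settings are reasonable defaults rather than values from the original scripts:

def fix_addresses_batch(addresses, max_new_tokens=400):
    """Correct a batch of addresses in a single generate() call."""
    prompts = [
        tokenizer.apply_chat_template(
            [{"role": "user", "content": f"Fix and extract components from: {a}"}],
            tokenize=False, add_generation_prompt=True,
        )
        for a in addresses
    ]
    tokenizer.padding_side = "left"  # left-pad so generated tokens line up at the end
    inputs = tokenizer(prompts, return_tensors="pt", padding=True,
                       truncation=True, max_length=512)
    device = next(model.parameters()).device
    inputs = {k: v.to(device) for k, v in inputs.items()}
    with torch.no_grad():
        outputs = model.generate(**inputs, max_new_tokens=max_new_tokens,
                                 temperature=0.1, do_sample=True,
                                 pad_token_id=tokenizer.eos_token_id)
    new_tokens = outputs[:, inputs["input_ids"].shape[1]:]
    return [tokenizer.decode(t, skip_special_tokens=True).strip() for t in new_tokens]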
⚠️ Limitations
- Model Size: 1B parameters - may have limitations compared to larger models
- Training Data: Based on specific dataset - may not generalize to all address formats
- Geographic Scope: Optimized for Indian addresses and geography
- Reasoning Depth: Chain of thought reasoning may vary in complexity
- Device Compatibility: Requires proper device placement for inference
📋 Model Files
- adapter_config.json: LoRA adapter configuration
- adapter_model.safetensors: LoRA adapter weights
- tokenizer_config.json: Tokenizer configuration
- tokenizer.json: Tokenizer vocabulary and settings
- special_tokens_map.json: Special tokens mapping
- chat_template.jinja: Chat template for conversations
🔄 Model Updates
- Version: 1.0 (Checkpoint 435)
- Last Updated: 2025-07-08
- Training Framework: Unsloth + LoRA
- Base Model: Llama-3.2-1B-Instruct
📚 Citation
If you use this model in your research or applications, please cite:
@misc{multitask-address-reasoning-model,
  title={Multi-Task Address Reasoning Model},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/shiprocket-ai/multitask-address-reasoning-llama-1B-model}
}
📞 Support & Contact
For questions, issues, or feature requests:
- Open an issue in this repository
- Contact: shiprocket-ai team
- Documentation: See usage examples above
📜 License
This model is released under the Apache 2.0 License. See LICENSE file for details.
Multi-task address intelligence with reasoning - Built with 🧠 by the shiprocket-ai team using Unsloth