# Nemotron Hinglish 4B Thinking Tool Use
A fine-tuned version of NVIDIA's Nemotron-4-Mini-Hindi-4B-Instruct model for function calling and reasoning in Hindi and Hinglish (Hindi-English code-mixed language).
## Model Details
- Base Model: nvidia/Nemotron-4-Mini-Hindi-4B-Instruct
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- Languages: Hindi, Hinglish
- Capabilities: Function calling, reasoning, conversational AI
## Installation

```bash
pip install torch transformers peft datasets huggingface_hub accelerate
# Optional, only needed for the 4-bit quantization path:
pip install bitsandbytes
```
## Usage

### Loading the Model
```python
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

# Configuration for 4-bit quantization (optional, for memory efficiency)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Load the model
peft_model_id = "ankitdhiman/nemotron-hinglish-4b-thinking-tool-use"
device = "auto"

# Load configuration and base model
config = PeftConfig.from_pretrained(peft_model_id)
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    device_map=device,
    # quantization_config=bnb_config,  # Optional: enable 4-bit quantization
)

# Load the tokenizer and resize the base model's embeddings to match it,
# in case the fine-tuned tokenizer added tokens; this must happen before
# the adapter is applied
tokenizer = AutoTokenizer.from_pretrained(peft_model_id)
model.resize_token_embeddings(len(tokenizer))

# Load the LoRA adapter
model = PeftModel.from_pretrained(model, peft_model_id)
model.to(torch.bfloat16)  # skip this cast if you enabled 4-bit quantization
model.eval()
```
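Optionally, for deployment you can fold the adapter weights into the base model so inference runs without the PEFT wrapper. A minimal sketch using PEFT's `merge_and_unload` (only applicable when the base model was loaded without 4-bit quantization):

```python
# Merge the LoRA weights into the base model and drop the PEFT wrapper.
# Skip this if you loaded the base model in 4-bit, since the adapter
# cannot be merged into quantized tensors this way.
model = model.merge_and_unload()
model.eval()
```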
### Basic Conversation
```python
def generate_response(prompt, max_new_tokens=200):
    """Generate a response for a given prompt."""
    # Tokenize the prompt directly; the prompt string already contains the
    # model's special turn tokens, so no extra special tokens are added
    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
    inputs = {k: v.to(model.device) for k, v in inputs.items()}

    # Generate (a temperature this close to 0 makes sampling near-deterministic)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            top_p=0.95,
            temperature=0.01,
            repetition_penalty=1.0,
            eos_token_id=tokenizer.eos_token_id,
        )

    # Decode the response (note: this includes the prompt tokens as well)
    response = tokenizer.decode(outputs[0])
    return response

# Example usage - Hinglish conversation
prompt = """<extra_id_0>System
Respond in Hinglish. Use devnagri script for hindi. Do not use bullet points or markdown formatting.
<extra_id_1>User
Arre yaar, mujhe batao ki agar main Delhi se Mumbai road trip karun, toh approx kitna distance hoga aur kitna time lagega?
<extra_id_1>Assistant
"""

response = generate_response(prompt)
print(response)
```
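Because `outputs[0]` contains the prompt tokens, the decoded string repeats the prompt. If you only want the newly generated text, a small variant (a sketch; slicing by input length is a standard pattern):

```python
def generate_new_text(prompt, max_new_tokens=200):
    """Like generate_response, but return only the newly generated text."""
    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
    inputs = {k: v.to(model.device) for k, v in inputs.items()}

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            top_p=0.95,
            temperature=0.01,
            eos_token_id=tokenizer.eos_token_id,
        )

    # Slice off the prompt tokens before decoding
    prompt_len = inputs["input_ids"].shape[1]
    return tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)
```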
### Function Calling Example
```python
# Example with function calling - Currency conversion
# Note: this system prompt mentions <tools></tools> tags but lists the tools
# inside <AVAILABLE_TOOLS> and requests <TOOLCALL> tags in the reply
prompt = """<extra_id_0>System
You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags.
You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions.
Here are the available tools:
<AVAILABLE_TOOLS>[{'type': 'function', 'function': {'name': 'convert_currency', 'description': 'Convert from one currency to another', 'parameters': {'type': 'object', 'properties': {'amount': {'type': 'number', 'description': 'The amount to convert'}, 'from_currency': {'type': 'string', 'description': 'The currency to convert from'}, 'to_currency': {'type': 'string', 'description': 'The currency to convert to'}}, 'required': ['amount', 'from_currency', 'to_currency']}}},
{'type': 'function', 'function': {'name': 'calculate_distance', 'description': 'Calculate the distance between two locations', 'parameters': {'type': 'object', 'properties': {'start_location': {'type': 'string', 'description': 'The starting location'}, 'end_location': {'type': 'string', 'description': 'The ending location'}}, 'required': ['start_location', 'end_location']}}}]</AVAILABLE_TOOLS>
Use the following pydantic model json schema for each tool call you will make:
{'title': 'FunctionCall', 'type': 'object', 'properties': {'arguments': {'title': 'Arguments', 'type': 'object'}, 'name': {'title': 'Name', 'type': 'string'}}, 'required': ['arguments', 'name']}
For each function call return a json object with function name and arguments within <TOOLCALL>…</TOOLCALL> tags, as follows:
<TOOLCALL>
{tool_call}
</TOOLCALL>
Also, before making a call to a function take the time to plan the function to take. Make that thinking process between <think>{your thoughts}</think>
<extra_id_1>User
Hi, I need to convert 500 USD to Euros. Can you help me with that?
<extra_id_1>Assistant
<think>"""

response = generate_response(prompt, max_new_tokens=300)
print(response)
```
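The model's reply should contain a JSON object inside `<TOOLCALL>…</TOOLCALL>` tags. A minimal sketch for extracting and parsing it (`parse_tool_call` is an illustrative helper, not part of the model's API; the tag name and payload schema follow the prompt above):

```python
import ast
import json
import re

def parse_tool_call(response: str):
    """Extract the first <TOOLCALL>...</TOOLCALL> payload, if any."""
    match = re.search(r"<TOOLCALL>\s*(\{.*?\})\s*</TOOLCALL>", response, re.DOTALL)
    if match is None:
        return None
    payload = match.group(1)
    try:
        return json.loads(payload)  # strict JSON
    except json.JSONDecodeError:
        # Fall back in case the model emits Python-style single quotes
        return ast.literal_eval(payload)

call = parse_tool_call(response)
if call is not None:
    print(call["name"], call["arguments"])
```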
### Thinking Process Example
The model has been trained to show its reasoning process using `<think>` tags:
```python
# Example showing thinking process with function calling
prompt = """<extra_id_0>System
<extra_id_1>User
You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags.You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions.Here are the available tools:<tools> [{'type': 'function', 'function': {'name': 'get_random_joke', 'description': 'Get a random joke', 'parameters': {'type': 'object', 'properties': {}, 'required': []}}}, {'type': 'function', 'function': {'name': 'calculate_discount', 'description': 'Calculate the discount on a product', 'parameters': {'type': 'object', 'properties': {'original_price': {'type': 'number', 'description': 'The original price of the product'}, 'discount_percentage': {'type': 'number', 'description': 'The percentage discount on the product'}}, 'required': ['original_price', 'discount_percentage']}}}] </tools>Use the following pydantic model json schema for each tool call you will make: {'title': 'FunctionCall', 'type': 'object', 'properties': {'arguments': {'title': 'Arguments', 'type': 'object'}, 'name': {'title': 'Name', 'type': 'string'}}, 'required': ['arguments', 'name']}For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags as follows:
<tool_call>
{tool_call}
</tool_call>
Also, before making a call to a function take the time to plan the function to take. Make that thinking process between <think>{your thoughts}</think>
I'm feeling a bit down. Can you tell me a joke to cheer me up?
<extra_id_1>Assistant
"""

response = generate_response(prompt)
print(response)

# The model will show its thinking process like:
# <think>Okay, the user is feeling down and asks for a joke to cheer up. I look at the available functions: get_random_joke and calculate_discount. The function get_random_joke seems perfect because its purpose is exactly to provide a joke...</think>
# Then it will make the appropriate function call
```
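To separate the reasoning trace from the rest of the reply, you can split on the `<think>…</think>` span. Note that this example's prompt requests lower-case `<tool_call>` tags, unlike the `<TOOLCALL>` convention in the previous example, so match whichever tag your prompt specifies. A minimal sketch (`split_thinking` is an illustrative helper):

```python
import re

def split_thinking(response: str):
    """Return (thinking, remainder) from a model reply."""
    match = re.search(r"<think>(.*?)</think>", response, re.DOTALL)
    if match is None:
        return None, response
    thinking = match.group(1).strip()
    remainder = response[match.end():].strip()
    return thinking, remainder

thinking, answer = split_thinking(response)
print("Reasoning:", thinking)
print("Answer / tool call:", answer)
```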
## Training Details

### Datasets Used
- Function Calling Dataset: Jofthomas/hermes-function-calling-thinking-V1
- Hindi/Hinglish Dataset: maya-research/IndicVault (Hindi and Hinglish subsets)
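Both datasets are on the Hugging Face Hub, so they can be inspected with the `datasets` library; a minimal sketch (the IndicVault configuration and split names below are assumptions — check that dataset's card for the exact names):

```python
from datasets import load_dataset

# Function-calling conversations with explicit <think> reasoning traces
fc_data = load_dataset("Jofthomas/hermes-function-calling-thinking-V1", split="train")
print(fc_data[0])

# Hindi/Hinglish conversational data; the config name "hindi" is illustrative,
# consult the maya-research/IndicVault dataset card for the real subset names
hi_data = load_dataset("maya-research/IndicVault", "hindi", split="train")
```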
### Training Configuration
- LoRA Rank: 16
- LoRA Alpha: 64
- LoRA Dropout: 0.05
- Target Modules: up_proj, down_proj, q_proj, k_proj, o_proj, lm_head, embed_tokens
- Learning Rate: 1e-4
- Batch Size: 2 per device
- Gradient Accumulation: 2 steps
- Epochs: 10
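These hyperparameters map directly onto PEFT's `LoraConfig`; a minimal sketch reconstructing the adapter configuration from the values above (treating all listed modules as `target_modules` is an assumption — `embed_tokens` and `lm_head` are sometimes passed via `modules_to_save` instead):

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,            # LoRA rank
    lora_alpha=64,   # scaling factor
    lora_dropout=0.05,
    target_modules=[
        "up_proj", "down_proj", "q_proj", "k_proj",
        "o_proj", "lm_head", "embed_tokens",
    ],
    task_type="CAUSAL_LM",
)
```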
## Model Architecture
The model uses a custom chat template that supports:
- System prompts
- User messages
- Assistant responses
- Tool calls and tool responses
- Thinking process with `<think>` tags
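A small helper can assemble prompts in this template; a minimal sketch based on the turn markers visible in the examples above (the marker for tool-response turns is not shown in this card, so only System/User/Assistant turns are covered here):

```python
def build_prompt(system: str, turns: list[tuple[str, str]]) -> str:
    """Assemble a prompt using the <extra_id_0>/<extra_id_1> turn markers.

    `turns` is a list of (role, text) pairs with role "User" or "Assistant".
    The prompt ends with an open Assistant turn for the model to complete.
    """
    parts = [f"<extra_id_0>System\n{system}\n"]
    for role, text in turns:
        parts.append(f"<extra_id_1>{role}\n{text}\n")
    parts.append("<extra_id_1>Assistant\n")
    return "".join(parts)

prompt = build_prompt(
    "Respond in Hinglish. Use devnagri script for hindi.",
    [("User", "Mujhe ek chhota sa joke sunao.")],
)
response = generate_response(prompt)
```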
## Limitations
- The model is fine-tuned primarily for Hindi and Hinglish
- Function-calling quality depends on how closely a request matches the tool-call patterns seen in the training data
- The prompt format shown above (turn markers and tool tags) is required for best results
## License
This model inherits the license from the base NVIDIA Nemotron model. Please check the original model's license for usage terms.