# Nemotron Hinglish 4B Thinking Tool Use
A fine-tuned version of NVIDIA's Nemotron-4-Mini-Hindi-4B-Instruct model for function calling and reasoning in Hindi and Hinglish (Hindi-English code-mixed language).
## Model Details
- Base Model: nvidia/Nemotron-4-Mini-Hindi-4B-Instruct
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- Languages: Hindi, Hinglish
- Capabilities: Function calling, reasoning, conversational AI
## Installation

```bash
pip install torch transformers peft datasets huggingface_hub accelerate
# Optional, only needed for the 4-bit quantization path:
pip install bitsandbytes
```
## Usage

### Loading the Model
```python
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

# Configuration for 4-bit quantization (optional, for memory efficiency)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Load the model
peft_model_id = "ankitdhiman/nemotron-hinglish-4b-thinking-tool-use"
device = "auto"

# Load configuration and base model
config = PeftConfig.from_pretrained(peft_model_id)
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    device_map=device,
    # quantization_config=bnb_config,  # Optional: enable 4-bit quantization
)

# Load the tokenizer and resize the base model's embeddings to match it,
# in case the fine-tuned tokenizer added tokens; this must happen before
# the adapter is applied
tokenizer = AutoTokenizer.from_pretrained(peft_model_id)
model.resize_token_embeddings(len(tokenizer))

# Load the LoRA adapter
model = PeftModel.from_pretrained(model, peft_model_id)
model.to(torch.bfloat16)  # skip this cast if you enabled 4-bit quantization
model.eval()
```
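Optionally, for deployment you can fold the adapter weights into the base model so inference runs without the PEFT wrapper. A minimal sketch using PEFT's `merge_and_unload` (only applicable when the base model was loaded without 4-bit quantization):

```python
# Merge the LoRA weights into the base model and drop the PEFT wrapper.
# Skip this if you loaded the base model in 4-bit, since the adapter
# cannot be merged into quantized tensors this way.
model = model.merge_and_unload()
model.eval()
```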
### Basic Conversation
```python
def generate_response(prompt, max_new_tokens=200):
    """Generate a response for a given prompt."""
    # Tokenize the prompt directly; the prompt string already contains the
    # model's special turn tokens, so no extra special tokens are added
    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
    inputs = {k: v.to(model.device) for k, v in inputs.items()}

    # Generate (a temperature this close to 0 makes sampling near-deterministic)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            top_p=0.95,
            temperature=0.01,
            repetition_penalty=1.0,
            eos_token_id=tokenizer.eos_token_id,
        )

    # Decode the response (note: this includes the prompt tokens as well)
    response = tokenizer.decode(outputs[0])
    return response

# Example usage - Hinglish conversation
prompt = """<extra_id_0>System
Respond in Hinglish. Use devnagri script for hindi. Do not use bullet points or markdown formatting.
<extra_id_1>User
Arre yaar, mujhe batao ki agar main Delhi se Mumbai road trip karun, toh approx kitna distance hoga aur kitna time lagega?
<extra_id_1>Assistant
"""

response = generate_response(prompt)
print(response)
```
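Because `outputs[0]` contains the prompt tokens, the decoded string repeats the prompt. If you only want the newly generated text, a small variant (a sketch; slicing by input length is a standard pattern):

```python
def generate_new_text(prompt, max_new_tokens=200):
    """Like generate_response, but return only the newly generated text."""
    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
    inputs = {k: v.to(model.device) for k, v in inputs.items()}

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            top_p=0.95,
            temperature=0.01,
            eos_token_id=tokenizer.eos_token_id,
        )

    # Slice off the prompt tokens before decoding
    prompt_len = inputs["input_ids"].shape[1]
    return tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)
```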
### Function Calling Example
```python
# Example with function calling - Currency conversion
# Note: this system prompt mentions <tools></tools> tags but lists the tools
# inside <AVAILABLE_TOOLS> and requests <TOOLCALL> tags in the reply
prompt = """<extra_id_0>System
You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags.
You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions.
Here are the available tools:
<AVAILABLE_TOOLS>[{'type': 'function', 'function': {'name': 'convert_currency', 'description': 'Convert from one currency to another', 'parameters': {'type': 'object', 'properties': {'amount': {'type': 'number', 'description': 'The amount to convert'}, 'from_currency': {'type': 'string', 'description': 'The currency to convert from'}, 'to_currency': {'type': 'string', 'description': 'The currency to convert to'}}, 'required': ['amount', 'from_currency', 'to_currency']}}},
{'type': 'function', 'function': {'name': 'calculate_distance', 'description': 'Calculate the distance between two locations', 'parameters': {'type': 'object', 'properties': {'start_location': {'type': 'string', 'description': 'The starting location'}, 'end_location': {'type': 'string', 'description': 'The ending location'}}, 'required': ['start_location', 'end_location']}}}]</AVAILABLE_TOOLS>
Use the following pydantic model json schema for each tool call you will make:
{'title': 'FunctionCall', 'type': 'object', 'properties': {'arguments': {'title': 'Arguments', 'type': 'object'}, 'name': {'title': 'Name', 'type': 'string'}}, 'required': ['arguments', 'name']}
For each function call return a json object with function name and arguments within <TOOLCALL>…</TOOLCALL> tags, as follows:
<TOOLCALL>
{tool_call}
</TOOLCALL>
Also, before making a call to a function take the time to plan the function to take. Make that thinking process between <think>{your thoughts}</think>
<extra_id_1>User
Hi, I need to convert 500 USD to Euros. Can you help me with that?
<extra_id_1>Assistant
<think>"""

response = generate_response(prompt, max_new_tokens=300)
print(response)
```
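The model's reply should contain a JSON object inside `<TOOLCALL>…</TOOLCALL>` tags. A minimal sketch for extracting and parsing it (`parse_tool_call` is an illustrative helper, not part of the model's API; the tag name and payload schema follow the prompt above):

```python
import ast
import json
import re

def parse_tool_call(response: str):
    """Extract the first <TOOLCALL>...</TOOLCALL> payload, if any."""
    match = re.search(r"<TOOLCALL>\s*(\{.*?\})\s*</TOOLCALL>", response, re.DOTALL)
    if match is None:
        return None
    payload = match.group(1)
    try:
        return json.loads(payload)  # strict JSON
    except json.JSONDecodeError:
        # Fall back in case the model emits Python-style single quotes
        return ast.literal_eval(payload)

call = parse_tool_call(response)
if call is not None:
    print(call["name"], call["arguments"])
```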
### Thinking Process Example
The model has been trained to show its reasoning process using `<think>` tags:
```python
# Example showing thinking process with function calling
prompt = """<extra_id_0>System
<extra_id_1>User
You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags.You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions.Here are the available tools:<tools> [{'type': 'function', 'function': {'name': 'get_random_joke', 'description': 'Get a random joke', 'parameters': {'type': 'object', 'properties': {}, 'required': []}}}, {'type': 'function', 'function': {'name': 'calculate_discount', 'description': 'Calculate the discount on a product', 'parameters': {'type': 'object', 'properties': {'original_price': {'type': 'number', 'description': 'The original price of the product'}, 'discount_percentage': {'type': 'number', 'description': 'The percentage discount on the product'}}, 'required': ['original_price', 'discount_percentage']}}}] </tools>Use the following pydantic model json schema for each tool call you will make: {'title': 'FunctionCall', 'type': 'object', 'properties': {'arguments': {'title': 'Arguments', 'type': 'object'}, 'name': {'title': 'Name', 'type': 'string'}}, 'required': ['arguments', 'name']}For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags as follows:
<tool_call>
{tool_call}
</tool_call>
Also, before making a call to a function take the time to plan the function to take. Make that thinking process between <think>{your thoughts}</think>
I'm feeling a bit down. Can you tell me a joke to cheer me up?
<extra_id_1>Assistant
"""

response = generate_response(prompt)
print(response)

# The model will show its thinking process like:
# <think>Okay, the user is feeling down and asks for a joke to cheer up. I look at the available functions: get_random_joke and calculate_discount. The function get_random_joke seems perfect because its purpose is exactly to provide a joke...</think>
# Then it will make the appropriate function call
```
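To separate the reasoning trace from the rest of the reply, you can split on the `<think>…</think>` span. Note that this example's prompt requests lower-case `<tool_call>` tags, unlike the `<TOOLCALL>` convention in the previous example, so match whichever tag your prompt specifies. A minimal sketch (`split_thinking` is an illustrative helper):

```python
import re

def split_thinking(response: str):
    """Return (thinking, remainder) from a model reply."""
    match = re.search(r"<think>(.*?)</think>", response, re.DOTALL)
    if match is None:
        return None, response
    thinking = match.group(1).strip()
    remainder = response[match.end():].strip()
    return thinking, remainder

thinking, answer = split_thinking(response)
print("Reasoning:", thinking)
print("Answer / tool call:", answer)
```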
## Training Details

### Datasets Used
- Function Calling Dataset: Jofthomas/hermes-function-calling-thinking-V1
- Hindi/Hinglish Dataset: maya-research/IndicVault (Hindi and Hinglish subsets)
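Both datasets are on the Hugging Face Hub, so they can be inspected with the `datasets` library; a minimal sketch (the IndicVault configuration and split names below are assumptions — check that dataset's card for the exact names):

```python
from datasets import load_dataset

# Function-calling conversations with explicit <think> reasoning traces
fc_data = load_dataset("Jofthomas/hermes-function-calling-thinking-V1", split="train")
print(fc_data[0])

# Hindi/Hinglish conversational data; the config name "hindi" is illustrative,
# consult the maya-research/IndicVault dataset card for the real subset names
hi_data = load_dataset("maya-research/IndicVault", "hindi", split="train")
```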
### Training Configuration
- LoRA Rank: 16
- LoRA Alpha: 64
- LoRA Dropout: 0.05
- Target Modules: up_proj, down_proj, q_proj, k_proj, o_proj, lm_head, embed_tokens
- Learning Rate: 1e-4
- Batch Size: 2 per device
- Gradient Accumulation: 2 steps
- Epochs: 10
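These hyperparameters map directly onto PEFT's `LoraConfig`; a minimal sketch reconstructing the adapter configuration from the values above (treating all listed modules as `target_modules` is an assumption — `embed_tokens` and `lm_head` are sometimes passed via `modules_to_save` instead):

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,            # LoRA rank
    lora_alpha=64,   # scaling factor
    lora_dropout=0.05,
    target_modules=[
        "up_proj", "down_proj", "q_proj", "k_proj",
        "o_proj", "lm_head", "embed_tokens",
    ],
    task_type="CAUSAL_LM",
)
```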
## Model Architecture
The model uses a custom chat template that supports:
- System prompts
- User messages
- Assistant responses
- Tool calls and tool responses
- Thinking process with `<think>` tags
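A small helper can assemble prompts in this template; a minimal sketch based on the turn markers visible in the examples above (the marker for tool-response turns is not shown in this card, so only System/User/Assistant turns are covered here):

```python
def build_prompt(system: str, turns: list[tuple[str, str]]) -> str:
    """Assemble a prompt using the <extra_id_0>/<extra_id_1> turn markers.

    `turns` is a list of (role, text) pairs with role "User" or "Assistant".
    The prompt ends with an open Assistant turn for the model to complete.
    """
    parts = [f"<extra_id_0>System\n{system}\n"]
    for role, text in turns:
        parts.append(f"<extra_id_1>{role}\n{text}\n")
    parts.append("<extra_id_1>Assistant\n")
    return "".join(parts)

prompt = build_prompt(
    "Respond in Hinglish. Use devnagri script for hindi.",
    [("User", "Mujhe ek chhota sa joke sunao.")],
)
response = generate_response(prompt)
```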
## Limitations
- The model is fine-tuned primarily for Hindi and Hinglish
- Function-calling quality depends on how closely a request matches the tool-call patterns seen in the training data
- The prompt format shown above (turn markers and tool tags) is required for best results
## License
This model inherits the license from the base NVIDIA Nemotron model. Please check the original model's license for usage terms.