base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:TinyLlama/TinyLlama-1.1B-Chat-v1.0
- lora
- transformers
- text-generation
- fine-tuned
- quotes
- tinyllama
Model Card for learn-abc/tinyllama-custom-quotes
This model is a PEFT (LoRA) fine-tuned version of TinyLlama/TinyLlama-1.1B-Chat-v1.0. It has been specialized to act as an AI assistant that, when given an inspiring quote, provides the author's name, following a specific instruction-based chat format.
Model Details
Model Description
This model is a specialized version of TinyLlama-1.1B-Chat-v1.0, fine-tuned using the QLoRA technique. The primary objective of this fine-tuning was to adapt the base LLM's behavior to a specific task: generating the author's name for a given inspiring quote. It adheres to a conversational instruction format, making it suitable for focused Q&A on a dataset of quotes and authors.
- Developed by: learn-abc
- Model type: Causal Language Model (fine-tuned adapter)
- Language(s) (NLP): English
- License: MIT
- Finetuned from model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
Model Sources
Uses
Direct Use
This model is intended for direct use in applications requiring highly specialized text generation for quotes. Specifically, it can be prompted with an inspiring quote in a predefined instruction format, and it will generate the corresponding author. It is ideal for:
- Automated quote attribution systems.
- Educational tools for learning about famous quotes.
- Integrating a quote-lookup feature into a chatbot or application.
Downstream Use
This fine-tuned adapter can be integrated into larger systems or applications that require accurate quote-to-author mapping. Examples include:
- Enhancing content creation tools that deal with quotations.
- Part of a larger RAG system where quotes need specific attribution.
- Specialized virtual assistants focused on literary or motivational content.
Out-of-Scope Use
This model is not intended for:
- Generating general conversational text or engaging in open-ended dialogue.
- Providing factual information on topics outside of quote attribution.
- Generating code or structured data (unless further fine-tuned for such tasks).
- Use in high-stakes applications requiring absolute factual accuracy on diverse topics.
- Generating creative text that is not related to existing quotes and authors.
Bias, Risks, and Limitations
This model inherits biases present in its base model, TinyLlama/TinyLlama-1.1B-Chat-v1.0, which was trained on a broad corpus. Additionally, biases from the Abirate/english_quotes dataset (e.g., disproportionate representation of certain authors, historical periods, or cultural perspectives) may be introduced or amplified.
Risks & Limitations:
- Limited Scope: Its specialization means it will not perform well on general language tasks.
- Knowledge Cut-off: While fine-tuned, its knowledge is primarily constrained to the quotes present in the training data. It will likely hallucinate or fail if asked about quotes or authors not in its training set.
- Short Context: As TinyLlama is a smaller model, its effective context window may limit its ability to process very long quotes or complex instructions, although the fine-tuning format is designed to mitigate this.
- Hallucinations: Despite fine-tuning, the model may still "hallucinate" authors for unknown quotes or misattribute known quotes if the input is ambiguous or outside its learned patterns.
Recommendations
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. For critical applications, human review of generated outputs is recommended. It should primarily be used for its intended task of quote attribution based on the fine-tuning data. Developers should evaluate its performance on a representative dataset reflecting their specific use case to understand its limitations.
How to Get Started with the Model
To use this model for inference, you can load the base model and then load the PEFT adapters on top of it. Alternatively, you can directly load the merged model if it has been saved in a standalone format.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from peft import PeftModel
import torch
# Define the model paths
BASE_MODEL_NAME = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
FINE_TUNED_ADAPTER_PATH = "learn-abc/tinyllama-custom-quotes" # Your Hugging Face repo ID
MERGED_MODEL_PATH = "/tinyllama_custom_quotes_fine_tuned/merged_model" # If you have saved the merged model locally
# Option 1: Load base model and then PEFT adapter (requires peft installed)
# Load base model
model = AutoModelForCausalLM.from_pretrained(
BASE_MODEL_NAME,
torch_dtype=torch.float16,
device_map="auto"
)
# Load fine-tuned adapter
model = PeftModel.from_pretrained(model, FINE_TUNED_ADAPTER_PATH)
model = model.merge_and_unload() # Merge adapters for easier inference
# Option 2: Directly load the merged model if it was saved as a full model
# model = AutoModelForCausalLM.from_pretrained(
# MERGED_MODEL_PATH,
# torch_dtype=torch.float16,
# device_map="auto"
# )
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
# Create a text generation pipeline
generator = pipeline(task="text-generation", model=model, tokenizer=tokenizer)
# Example usage
test_quote = "The only way to do great work is to love what you do."
formatted_prompt = f"""<s>[INST] <<SYS>>
You are an AI assistant that is an expert in writing inspiring quotes. Your task is to provide an inspiring quote for the user based on the given concept, followed by the author's name.
<</SYS>>
{test_quote} [/INST]"""
result = generator(formatted_prompt, max_new_tokens=50, num_return_sequences=1)
generated_text = result[0]['generated_text']
print(f"Prompt: {test_quote}")
print(f"Generated Author: {generated_text.split('[/INST]')[-1].strip()}")
Training Details
Training Data
The model was fine-tuned on a subset of the Abirate/english_quotes dataset. This dataset contains English quotes paired with their respective authors. The data was preprocessed to fit the Llama 2 chat instruction format, ensuring the model learned to map a given quote (as an "instruction") to its author (as the "response"). Each training sample was formatted as:
```
<s>[INST] <<SYS>>{system_prompt}<</SYS>>\n\n{quote} [/INST] {author}</s>
```
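The original training code is not included in this card; the snippet below is a minimal sketch of what the format_instruction helper used in the Preprocessing section could look like, reusing the system prompt from the inference example in this card (the exact prompt used during training is an assumption).

```python
# Sketch of the formatting step; the system prompt is taken from the inference
# example in this card and may differ from the one used during training.
SYSTEM_PROMPT = (
    "You are an AI assistant that is an expert in writing inspiring quotes. "
    "Your task is to provide an inspiring quote for the user based on the given "
    "concept, followed by the author's name."
)

def format_instruction(sample: dict) -> dict:
    # "quote" and "author" are column names in the Abirate/english_quotes dataset.
    text = (
        f"<s>[INST] <<SYS>>{SYSTEM_PROMPT}<</SYS>>\n\n"
        f"{sample['quote']} [/INST] {sample['author']}</s>"
    )
    return {"text": text}
```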
Training Procedure
The model was fine-tuned using QLoRA (Quantized Low-Rank Adaptation), a parameter-efficient fine-tuning technique.
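In QLoRA, the frozen base model is loaded in 4-bit precision and only the LoRA adapters are trained on top of it. The snippet below is a minimal sketch of that setup, assuming common bitsandbytes settings (NF4 quantization, double quantization, bfloat16 compute dtype); the exact values used for this run are not recorded in this card.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Assumed 4-bit (QLoRA-style) quantization settings; the actual run may differ.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    quantization_config=bnb_config,
    device_map="auto",
)
```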
Preprocessing
The Abirate/english_quotes dataset was loaded and a custom format_instruction function was applied to transform each quote-author pair into the Llama 2 chat template. The dataset was then tokenized using the TinyLlama/TinyLlama-1.1B-Chat-v1.0 tokenizer, with truncation to max_seq_length=512 and right-padding. Labels were created by copying the input IDs. The dataset was split into 90% training and 10% evaluation sets.
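The sketch below illustrates this preprocessing pipeline (formatting, tokenization to 512 tokens with right-padding, labels copied from input IDs, and a 90/10 split). It assumes the format_instruction helper sketched earlier; the split seed and intermediate names are illustrative.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

def tokenize(sample):
    # Truncate / right-pad to the max_seq_length used for fine-tuning.
    tokens = tokenizer(
        sample["text"],
        truncation=True,
        max_length=512,
        padding="max_length",
    )
    # Labels are a copy of the input IDs (standard causal LM objective).
    tokens["labels"] = tokens["input_ids"].copy()
    return tokens

dataset = load_dataset("Abirate/english_quotes", split="train")
dataset = dataset.map(format_instruction)  # defined in the earlier sketch
dataset = dataset.map(tokenize, remove_columns=dataset.column_names)
dataset = dataset.train_test_split(test_size=0.1, seed=42)  # seed is illustrative
# dataset["train"] -> 90% training, dataset["test"] -> 10% evaluation
```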
Training Hyperparameters
- Training regime: bf16 mixed precision
- Optimizer: paged_adamw_32bit
- Learning Rate: 2e-4
- Weight Decay: 0.001
- Gradient Norm Clipping: 0.3
- Warmup Ratio: 0.03
- Number of Epochs: 1
- Per Device Train Batch Size: 2
- Gradient Accumulation Steps: 2 (resulting in an effective batch size of 4)
- LoRA Config: r=64, lora_alpha=16, lora_dropout=0.1, bias="none", task_type="CAUSAL_LM"
- LoRA Target Modules: ["q_proj", "v_proj", "k_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
- Gradient Checkpointing: Enabled with use_reentrant=False
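The following minimal sketch shows how the hyperparameters listed above could be assembled with peft and transformers. It is a reconstruction from this card rather than the original training script; output_dir and anything not listed above are placeholders.

```python
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA configuration reproducing the values listed above.
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "v_proj", "k_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)

# Training arguments reproducing the values listed above; output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="./tinyllama_custom_quotes_fine_tuned",
    num_train_epochs=1,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=2,   # effective batch size of 4
    learning_rate=2e-4,
    weight_decay=0.001,
    max_grad_norm=0.3,
    warmup_ratio=0.03,
    optim="paged_adamw_32bit",
    bf16=True,
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": False},
)
```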
Speeds, Sizes, Times
- Model Parameters: 1.1 Billion (base model)
- Trainable Parameters (LoRA): approximately 0.05% of total parameters (the exact figure depends on the model architecture and adapter configuration; see the sketch after this list for how to check it)
- Training Time: The training ran for approximately 1 hour and 35 minutes (400 steps) on the specified hardware.
- Checkpoint Size: Only the PEFT adapters (small, typically in MBs) are saved during training, along with tokenizer files. The full merged model is saved once at the end.
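The trainable-parameter share can be checked directly once the adapter is attached, as in the brief sketch below (lora_config comes from the configuration sketch above; the printed figure depends on the exact setup).

```python
from peft import get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
peft_model = get_peft_model(base, lora_config)  # lora_config from the sketch above
# Prints trainable params, total params, and the trainable percentage.
peft_model.print_trainable_parameters()
```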
Evaluation
Testing Data
The model was evaluated on a 10% split of the Abirate/english_quotes dataset, held out from the training data. This validation set consists of tokenized quote-author pairs.
Factors
Evaluation was performed across the entire validation dataset. No specific subpopulations or sub-domains were isolated for disaggregated analysis.
Metrics
The primary metric used for evaluation during training was:
- eval_loss (Validation Loss): A measure of how well the model predicts the next token on the unseen validation data. Lower values indicate better performance.
Results
During training, the eval_loss reached approximately 0.3576 at the end of the single epoch, indicating that the model effectively learned to predict the author given the quote in the specified format.
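For intuition, assuming eval_loss is the usual mean token-level cross-entropy in nats, it corresponds to a validation perplexity of roughly

$$
\text{perplexity} = e^{\text{eval\_loss}} \approx e^{0.3576} \approx 1.43
$$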
Summary
The fine-tuning process successfully adapted the TinyLlama model to the task of quote attribution, as evidenced by the low validation loss. The model demonstrates the ability to generate the correct author for quotes it was fine-tuned on, following the Llama 2 chat instruction template.
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: NVIDIA GPU (e.g., T4 or similar, with ~14.57 GiB of VRAM)
- Hours used: ~1.6 hours
- Cloud Provider: User's Cloud Provider (e.g., AWS EC2)
- Carbon Emitted: ~50-100 grams of CO2eq, estimated from typical cloud GPU power consumption and average grid emission factors for a short training run; this is a very low footprint given the small model size and short training duration. A back-of-the-envelope calculation is sketched below.
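Using assumed (not measured) figures, the estimate works out roughly as follows:

```python
# Rough CO2 estimate with assumed figures; not a measurement.
gpu_power_kw = 0.070        # ~70 W average draw for a T4-class GPU (assumption)
hours = 1.6                 # training duration reported above
grid_kg_co2_per_kwh = 0.45  # typical grid carbon intensity (assumption)

energy_kwh = gpu_power_kw * hours                    # ~0.112 kWh
co2_grams = energy_kwh * grid_kg_co2_per_kwh * 1000  # ~50 g CO2eq
print(f"~{co2_grams:.0f} g CO2eq")
```

This lands at the low end of the range above; a higher average draw or a more carbon-intensive grid would push it toward the upper end.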
Technical Specifications
Model Architecture and Objective
The model is based on the TinyLlama-1.1B-Chat-v1.0 architecture, a decoder-only transformer similar to Llama 2. The fine-tuning objective was causal language modeling: predicting the author token sequence that follows a given quote within a chat-based instruction prompt. The QLoRA method adapts this architecture efficiently by injecting low-rank adapters rather than retraining all of the original parameters.
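Concretely, for each targeted weight matrix $W \in \mathbb{R}^{d \times k}$, LoRA keeps $W$ frozen and learns a low-rank update, so the adapted layer computes

$$
h = W x + \frac{\alpha}{r} B A x, \qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k),
$$

with $r = 64$ and $\alpha = 16$ here, giving a scaling factor of $\alpha / r = 0.25$.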
Compute Infrastructure
Hardware
The fine-tuning was performed on a system equipped with an NVIDIA GPU with approximately 14.57 GiB of VRAM.
Software
- Operating System: Linux (e.g., Ubuntu)
- Python Version: Python 3.12+
- Deep Learning Framework: PyTorch
- Libraries: Hugging Face transformers, datasets, peft, bitsandbytes, trl.
Citation
BibTeX:
@misc{tinyllama_custom_quotes_fine_tuned,
author = {learn-abc},
title = {TinyLlama Custom Quotes Fine-Tune},
year = {2025},
publisher = {Hugging Face},
journal = {Hugging Face Hub},
howpublished = {\url{https://huggingface.co/learn-abc/tinyllama-custom-quotes}}
}
APA:
learn-abc. (2025). TinyLlama Custom Quotes Fine-Tune. Hugging Face. Retrieved from https://huggingface.co/learn-abc/tinyllama-custom-quotes
Glossary
- LoRA (Low-Rank Adaptation): A parameter-efficient fine-tuning technique that adds small, trainable matrices (adapters) to a pre-trained model, significantly reducing the number of parameters that need to be updated during fine-tuning.
- QLoRA (Quantized LoRA): An extension of LoRA that further reduces memory usage by quantizing the pre-trained model's weights to 4-bit precision during training.
- Causal Language Model: A type of language model that predicts the next token in a sequence based only on the preceding tokens.
- PEFT (Parameter-Efficient Fine-Tuning): A family of methods designed to fine-tune large models more efficiently by updating only a small subset of the model's parameters.
- Hallucination: When an LLM generates plausible but factually incorrect or fabricated information.
Model Card Contact
Contact Me
For any inquiries or support, please reach out to:
- Author: Abhishek Singh
- LinkedIn: My LinkedIn Profile
- Portfolio: Abhishek Singh Portfolio
Framework versions
- PEFT 0.17.0