base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:TinyLlama/TinyLlama-1.1B-Chat-v1.0
- lora
- transformers
- text-generation
- fine-tuned
- quotes
- tinyllama
Model Card for learn-abc/tinyllama-custom-quotes
This model is a PEFT (LoRA) fine-tuned version of TinyLlama/TinyLlama-1.1B-Chat-v1.0. It has been specialized to act as an AI assistant that, when given an inspiring quote, provides the author's name, following a specific instruction-based chat format.
Model Details
Model Description
This model is a specialized version of TinyLlama-1.1B-Chat-v1.0, fine-tuned using the QLoRA technique. The primary objective of this fine-tuning was to adapt the base LLM's behavior to a specific task: generating the author's name for a given inspiring quote. It adheres to a conversational instruction format, making it suitable for focused Q&A on a dataset of quotes and authors.
- Developed by: learn-abc
- Model type: Causal Language Model (fine-tuned adapter)
- Language(s) (NLP): English
- License: MIT
- Finetuned from model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
Model Sources
Uses
Direct Use
This model is intended for direct use in applications requiring highly specialized text generation for quotes. Specifically, it can be prompted with an inspiring quote in a predefined instruction format, and it will generate the corresponding author. It is ideal for:
- Automated quote attribution systems.
- Educational tools for learning about famous quotes.
- Integrating a quote-lookup feature into a chatbot or application.
Downstream Use
This fine-tuned adapter can be integrated into larger systems or applications that require accurate quote-to-author mapping. Examples include:
- Enhancing content creation tools that deal with quotations.
- Part of a larger RAG system where quotes need specific attribution.
- Specialized virtual assistants focused on literary or motivational content.
Out-of-Scope Use
This model is not intended for:
- Generating general conversational text or engaging in open-ended dialogue.
- Providing factual information on topics outside of quote attribution.
- Generating code or structured data (unless further fine-tuned for such tasks).
- Use in high-stakes applications requiring absolute factual accuracy on diverse topics.
- Generating creative text that is not related to existing quotes and authors.
Bias, Risks, and Limitations
This model inherits biases present in its base model, TinyLlama/TinyLlama-1.1B-Chat-v1.0, which was trained on a broad corpus. Additionally, biases from the Abirate/english_quotes dataset (e.g., disproportionate representation of certain authors, historical periods, or cultural perspectives) may be introduced or amplified.
Risks & Limitations:
- Limited Scope: Its specialization means it will not perform well on general language tasks.
- Knowledge Cut-off: While fine-tuned, its knowledge is primarily constrained to the quotes present in the training data. It will likely hallucinate or fail if asked about quotes or authors not in its training set.
- Short Context: As TinyLlama is a smaller model, its effective context window may limit its ability to process very long quotes or complex instructions, although the fine-tuning format is designed to mitigate this.
- Hallucinations: Despite fine-tuning, the model may still "hallucinate" authors for unknown quotes or misattribute known quotes if the input is ambiguous or outside its learned patterns.
Recommendations
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. For critical applications, human review of generated outputs is recommended. It should primarily be used for its intended task of quote attribution based on the fine-tuning data. Developers should evaluate its performance on a representative dataset reflecting their specific use case to understand its limitations.
How to Get Started with the Model
To use this model for inference, you can load the base model and then load the PEFT adapters on top of it. Alternatively, you can directly load the merged model if it has been saved in a standalone format.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from peft import PeftModel
import torch
# Define the model paths
BASE_MODEL_NAME = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
FINE_TUNED_ADAPTER_PATH = "learn-abc/tinyllama-custom-quotes" # Your Hugging Face repo ID
MERGED_MODEL_PATH = "/tinyllama_custom_quotes_fine_tuned/merged_model" # If you have saved the merged model locally
# Option 1: Load base model and then PEFT adapter (requires peft installed)
# Load base model
model = AutoModelForCausalLM.from_pretrained(
BASE_MODEL_NAME,
torch_dtype=torch.float16,
device_map="auto"
)
# Load fine-tuned adapter
model = PeftModel.from_pretrained(model, FINE_TUNED_ADAPTER_PATH)
model = model.merge_and_unload() # Merge adapters for easier inference
# Option 2: Directly load the merged model if it was saved as a full model
# model = AutoModelForCausalLM.from_pretrained(
# MERGED_MODEL_PATH,
# torch_dtype=torch.float16,
# device_map="auto"
# )
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
# Create a text generation pipeline
generator = pipeline(task="text-generation", model=model, tokenizer=tokenizer)
# Example usage
test_quote = "The only way to do great work is to love what you do."
formatted_prompt = f"""<s>[INST] <<SYS>>
You are an AI assistant that is an expert in writing inspiring quotes. Your task is to provide an inspiring quote for the user based on the given concept, followed by the author's name.
<</SYS>>
{test_quote} [/INST]"""
result = generator(formatted_prompt, max_new_tokens=50, num_return_sequences=1)
generated_text = result[0]['generated_text']
print(f"Prompt: {test_quote}")
print(f"Generated Author: {generated_text.split('[/INST]')[-1].strip()}")
Training Details
Training Data
The model was fine-tuned on a subset of the Abirate/english_quotes dataset. This dataset contains English quotes paired with their respective authors. The data was preprocessed to fit the Llama 2 chat instruction format, ensuring the model learned to map a given quote (as an "instruction") to its author (as the "response"). Each training sample was formatted as:
```
<s>[INST] <<SYS>>{system_prompt}<</SYS>>\n\n{quote} [/INST] {author}</s>
```
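The original training code is not included in this card; the snippet below is a minimal sketch of what the format_instruction helper used in the Preprocessing section could look like, reusing the system prompt from the inference example in this card (the exact prompt used during training is an assumption).

```python
# Sketch of the formatting step; the system prompt is taken from the inference
# example in this card and may differ from the one used during training.
SYSTEM_PROMPT = (
    "You are an AI assistant that is an expert in writing inspiring quotes. "
    "Your task is to provide an inspiring quote for the user based on the given "
    "concept, followed by the author's name."
)

def format_instruction(sample: dict) -> dict:
    # "quote" and "author" are column names in the Abirate/english_quotes dataset.
    text = (
        f"<s>[INST] <<SYS>>{SYSTEM_PROMPT}<</SYS>>\n\n"
        f"{sample['quote']} [/INST] {sample['author']}</s>"
    )
    return {"text": text}
```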
Training Procedure
The model was fine-tuned using QLoRA (Quantized Low-Rank Adaptation), a parameter-efficient fine-tuning technique.
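In QLoRA, the frozen base model is loaded in 4-bit precision and only the LoRA adapters are trained on top of it. The snippet below is a minimal sketch of that setup, assuming common bitsandbytes settings (NF4 quantization, double quantization, bfloat16 compute dtype); the exact values used for this run are not recorded in this card.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Assumed 4-bit (QLoRA-style) quantization settings; the actual run may differ.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    quantization_config=bnb_config,
    device_map="auto",
)
```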
Preprocessing
The Abirate/english_quotes dataset was loaded and a custom format_instruction function was applied to transform each quote-author pair into the Llama 2 chat template. The dataset was then tokenized using the TinyLlama/TinyLlama-1.1B-Chat-v1.0 tokenizer, with truncation to max_seq_length=512 and right-padding. Labels were created by copying the input IDs. The dataset was split into 90% training and 10% evaluation sets.
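The sketch below illustrates this preprocessing pipeline (formatting, tokenization to 512 tokens with right-padding, labels copied from input IDs, and a 90/10 split). It assumes the format_instruction helper sketched earlier; the split seed and intermediate names are illustrative.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

def tokenize(sample):
    # Truncate / right-pad to the max_seq_length used for fine-tuning.
    tokens = tokenizer(
        sample["text"],
        truncation=True,
        max_length=512,
        padding="max_length",
    )
    # Labels are a copy of the input IDs (standard causal LM objective).
    tokens["labels"] = tokens["input_ids"].copy()
    return tokens

dataset = load_dataset("Abirate/english_quotes", split="train")
dataset = dataset.map(format_instruction)  # defined in the earlier sketch
dataset = dataset.map(tokenize, remove_columns=dataset.column_names)
dataset = dataset.train_test_split(test_size=0.1, seed=42)  # seed is illustrative
# dataset["train"] -> 90% training, dataset["test"] -> 10% evaluation
```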
Training Hyperparameters
- Training regime: bf16 mixed precision
- Optimizer: paged_adamw_32bit
- Learning Rate: 2e-4
- Weight Decay: 0.001
- Gradient Norm Clipping: 0.3
- Warmup Ratio: 0.03
- Number of Epochs: 1
- Per Device Train Batch Size: 2
- Gradient Accumulation Steps: 2 (resulting in an effective batch size of 4)
- LoRA Config: r=64, lora_alpha=16, lora_dropout=0.1, bias="none", task_type="CAUSAL_LM"
- LoRA Target Modules: ["q_proj", "v_proj", "k_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
- Gradient Checkpointing: Enabled with use_reentrant=False
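The following minimal sketch shows how the hyperparameters listed above could be assembled with peft and transformers. It is a reconstruction from this card rather than the original training script; output_dir and anything not listed above are placeholders.

```python
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA configuration reproducing the values listed above.
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "v_proj", "k_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)

# Training arguments reproducing the values listed above; output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="./tinyllama_custom_quotes_fine_tuned",
    num_train_epochs=1,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=2,   # effective batch size of 4
    learning_rate=2e-4,
    weight_decay=0.001,
    max_grad_norm=0.3,
    warmup_ratio=0.03,
    optim="paged_adamw_32bit",
    bf16=True,
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": False},
)
```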
Speeds, Sizes, Times
- Model Parameters: 1.1 Billion (base model)
- Trainable Parameters (LoRA): approximately 0.05% of total parameters (the exact figure depends on the model architecture and adapter configuration; see the sketch after this list for how to check it)
- Training Time: The training ran for approximately 1 hour and 35 minutes (400 steps) on the specified hardware.
- Checkpoint Size: Only the PEFT adapters (small, typically in MBs) are saved during training, along with tokenizer files. The full merged model is saved once at the end.
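The trainable-parameter share can be checked directly once the adapter is attached, as in the brief sketch below (lora_config comes from the configuration sketch above; the printed figure depends on the exact setup).

```python
from peft import get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
peft_model = get_peft_model(base, lora_config)  # lora_config from the sketch above
# Prints trainable params, total params, and the trainable percentage.
peft_model.print_trainable_parameters()
```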
Evaluation
Testing Data
The model was evaluated on a 10% split of the Abirate/english_quotes dataset, held out from the training data. This validation set consists of tokenized quote-author pairs.
Factors
Evaluation was performed across the entire validation dataset. No specific subpopulations or sub-domains were isolated for disaggregated analysis.
Metrics
The primary metric used for evaluation during training was:
- eval_loss (Validation Loss): A measure of how well the model predicts the next token on the unseen validation data. Lower values indicate better performance.
Results
During training, the eval_loss reached approximately 0.3576 at the end of the single epoch, indicating that the model effectively learned to predict the author given the quote in the specified format.
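For intuition, assuming eval_loss is the usual mean token-level cross-entropy in nats, it corresponds to a validation perplexity of roughly

$$
\text{perplexity} = e^{\text{eval\_loss}} \approx e^{0.3576} \approx 1.43
$$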
Summary
The fine-tuning process successfully adapted the TinyLlama model to the task of quote attribution, as evidenced by the low validation loss. The model demonstrates the ability to generate the correct author for quotes it was fine-tuned on, following the Llama 2 chat instruction template.
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: NVIDIA GPU (e.g., T4 or similar, with ~14.57 GiB of VRAM)
- Hours used: ~1.6 hours
- Cloud Provider: User's Cloud Provider (e.g., AWS EC2)
- Carbon Emitted: ~50-100 grams of CO2eq, estimated from typical cloud GPU power consumption and average grid emission factors for a short training run; this is a very low footprint given the small model size and short training duration. A back-of-the-envelope calculation is sketched below.
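Using assumed (not measured) figures, the estimate works out roughly as follows:

```python
# Rough CO2 estimate with assumed figures; not a measurement.
gpu_power_kw = 0.070        # ~70 W average draw for a T4-class GPU (assumption)
hours = 1.6                 # training duration reported above
grid_kg_co2_per_kwh = 0.45  # typical grid carbon intensity (assumption)

energy_kwh = gpu_power_kw * hours                    # ~0.112 kWh
co2_grams = energy_kwh * grid_kg_co2_per_kwh * 1000  # ~50 g CO2eq
print(f"~{co2_grams:.0f} g CO2eq")
```

This lands at the low end of the range above; a higher average draw or a more carbon-intensive grid would push it toward the upper end.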
Technical Specifications
Model Architecture and Objective
The model is based on the TinyLlama-1.1B-Chat-v1.0 architecture, a decoder-only transformer similar to Llama 2. The fine-tuning objective was causal language modeling: predicting the author token sequence that follows a given quote within a chat-based instruction prompt. The QLoRA method adapts this architecture efficiently by injecting low-rank adapters rather than retraining all of the original parameters.
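Concretely, for each targeted weight matrix $W \in \mathbb{R}^{d \times k}$, LoRA keeps $W$ frozen and learns a low-rank update, so the adapted layer computes

$$
h = W x + \frac{\alpha}{r} B A x, \qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k),
$$

with $r = 64$ and $\alpha = 16$ here, giving a scaling factor of $\alpha / r = 0.25$.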
Compute Infrastructure
Hardware
The fine-tuning was performed on a system equipped with an NVIDIA GPU with approximately 14.57 GiB of VRAM.
Software
- Operating System: Linux (e.g., Ubuntu)
- Python Version: Python 3.12+
- Deep Learning Framework: PyTorch
- Libraries: Hugging Face transformers, datasets, peft, bitsandbytes, trl.
Citation
BibTeX:
@misc{tinyllama_custom_quotes_fine_tuned,
author = {learn-abc},
title = {TinyLlama Custom Quotes Fine-Tune},
year = {2025},
publisher = {Hugging Face},
journal = {Hugging Face Hub},
howpublished = {\url{https://huggingface.co/learn-abc/tinyllama-custom-quotes}}
}
APA:
learn-abc. (2025). TinyLlama Custom Quotes Fine-Tune. Hugging Face. Retrieved from https://huggingface.co/learn-abc/tinyllama-custom-quotes
Glossary
- LoRA (Low-Rank Adaptation): A parameter-efficient fine-tuning technique that adds small, trainable matrices (adapters) to a pre-trained model, significantly reducing the number of parameters that need to be updated during fine-tuning.
- QLoRA (Quantized LoRA): An extension of LoRA that further reduces memory usage by quantizing the pre-trained model's weights to 4-bit precision during training.
- Causal Language Model: A type of language model that predicts the next token in a sequence based only on the preceding tokens.
- PEFT (Parameter-Efficient Fine-Tuning): A family of methods designed to fine-tune large models more efficiently by updating only a small subset of the model's parameters.
- Hallucination: When an LLM generates plausible but factually incorrect or fabricated information.
Model Card Contact
Contact Me
For any inquiries or support, please reach out to:
- Author: Abhishek Singh
- LinkedIn: My LinkedIn Profile
- Portfolio: Abhishek Singh Portfolio
Framework versions
- PEFT 0.17.0