Model Card for llm-jp-13b-instruct-full-jaster-dpo

This is a human-preference-optimized version of the native Japanese model llm-jp/llm-jp-13b-instruct-full-jaster-v1.0.

Model Details

Model type: transformer-based large language model

Total tokens seen: 300B

Parameters: 13B

Layers: 40

Hidden size: 5120

Heads: 40

Context length: 2048

Training

Pre-training:

Hardware: 96 A100 40GB GPUs (MDX cluster)

Software: Megatron-DeepSpeed

Instruction tuning:

Hardware: 8 A100 40GB GPUs (MDX cluster)

Software: TRL, PEFT, and DeepSpeed

Human Preference Alignment:

Hardware: Apple M3 Max (MPS device): 16-core CPU, 16-core Neural Engine, 40-core GPU, 128 GB unified memory

Software: PyTorch (on MPS; see the device-selection sketch below), Hugging Face Transformers, PEFT (version 0.8.2)
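As a minimal illustration of running PyTorch on Apple silicon (this is not the authors' training script, just a sketch of selecting the MPS backend with a CPU fallback):

import torch

# Use the Metal Performance Shaders backend when available, otherwise fall back to CPU
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

# Tensors created with device=device are allocated on the M3 Max GPU when MPS is available
x = torch.randn(2, 3, dtype=torch.float16, device=device)
print(device, x.device)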

Tokenizer

The tokenizer of this model is based on the huggingface/tokenizers Unigram byte-fallback model. The vocabulary entries were converted from llm-jp-tokenizer v2.1 (50k). Please refer to the README.md of llm-jp-tokenizer for details on the vocabulary construction procedure.

  • Model: Hugging Face Fast Tokenizer using a Unigram byte-fallback model, which requires tokenizers>=0.14.0 (a loading example follows this list)
  • Training algorithm: SentencePiece Unigram byte-fallback
  • Training data: a subset of the datasets for model pre-training
  • Vocabulary size: 50,570 (mixed vocabulary of Japanese, English, and source code)
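As a quick sanity check, the tokenizer can be loaded with Hugging Face Transformers, provided tokenizers>=0.14.0 is installed. This is a minimal sketch; the repository id below is the one used in the Direct Use section:

from transformers import AutoTokenizer

# Requires tokenizers>=0.14.0 for the Unigram byte-fallback fast tokenizer
tokenizer = AutoTokenizer.from_pretrained("llmjp/llm-jp-13b-instruct-full-jaster-dpo")

print(tokenizer.vocab_size)                       # mixed Japanese/English/code vocabulary
print(tokenizer.tokenize("日本の首都は東京です。"))   # example segmentation of a Japanese sentence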

Model Description

This model was aligned with human preferences using an adapter-based approach from the PEFT library (https://github.com/huggingface/peft). The alignment objective was Direct Preference Optimization (DPO, https://arxiv.org/abs/2305.18290).
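For reference, the DPO objective from the paper can be written in a few lines of PyTorch. This is an illustrative sketch of the loss only, not the training code used for this model; the inputs are assumed to be per-sequence log-probabilities of the chosen and rejected responses under the policy and the frozen reference model:

import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Implicit reward for each completion: log-ratio of policy to reference
    chosen_rewards = policy_chosen_logps - ref_chosen_logps
    rejected_rewards = policy_rejected_logps - ref_rejected_logps
    # DPO loss: -log sigmoid(beta * (chosen margin - rejected margin))
    logits = beta * (chosen_rewards - rejected_rewards)
    return -F.logsigmoid(logits).mean()

# Toy usage with made-up per-sequence log-probabilities
loss = dpo_loss(torch.tensor([-12.3]), torch.tensor([-15.1]),
                torch.tensor([-13.0]), torch.tensor([-14.8]))
print(loss)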

Training Data

The data used for DPO was a Japanese translation of the original Anthropic Helpful and Harmless (HH-RLHF) dataset (https://huggingface.co/datasets/Anthropic/hh-rlhf) used for Reinforcement Learning from Human Feedback (https://arxiv.org/abs/2204.05862). The translation is available here: https://huggingface.co/datasets/shi3z/anthropic_hh_rlhf_japanese
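The translated preference data can be loaded with the Hugging Face datasets library. A minimal sketch; split and column names should be verified against the dataset card:

from datasets import load_dataset

# Japanese translation of Anthropic's HH-RLHF preference data
ds = load_dataset("shi3z/anthropic_hh_rlhf_japanese")

print(ds)              # available splits and columns as published on the Hub
print(ds["train"][0])  # one preference pair, assuming a "train" split exists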

Direct Use

import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model_name = "llmjp/llm-jp-13b-instruct-full-jaster-dpo"

# Load the DPO adapter together with its base model, quantized to 4-bit to reduce memory use
model = AutoPeftModelForCausalLM.from_pretrained(
    model_name,
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
    load_in_4bit=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)

# Japanese prompt: "Question: What is the capital of Japan?\n\nAnswer:"
inputs = tokenizer.encode("質問:日本の首都はどこですか?\n\n答え:", return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Author

Stephen Fitz (https://huggingface.co/stephenfitz) for LLMJP (https://huggingface.co/llm-jp)
