RWKV-4 World GGML

This repository contains quantized conversions of the current RWKV-4 World checkpoints.

For use with frontends that support GGML quantized RWKV models, such as rwkv.cpp and KoboldCpp.

Last updated on 2023-09-28.

Description:

RAM USAGE

Model                          Starting RAM usage (KoboldCpp)
RWKV-4-World-0.1B.q4_0.bin     289.3 MiB
RWKV-4-World-0.1B.q4_1.bin     294.7 MiB
RWKV-4-World-0.1B.q5_0.bin     300.2 MiB
RWKV-4-World-0.1B.q5_1.bin     305.7 MiB
RWKV-4-World-0.1B.q8_0.bin     333.1 MiB
RWKV-4-World-0.1B.f16.bin      415.3 MiB
RWKV-4-World-0.4B.q4_0.bin     484.1 MiB
RWKV-4-World-0.4B.q4_1.bin     503.7 MiB
RWKV-4-World-0.4B.q5_0.bin     523.1 MiB
RWKV-4-World-0.4B.q5_1.bin     542.7 MiB
RWKV-4-World-0.4B.q8_0.bin     640.2 MiB
RWKV-4-World-0.4B.f16.bin      932.7 MiB
RWKV-4-World-1.5B.q4_0.bin     1.2 GiB
RWKV-4-World-1.5B.q4_1.bin     1.3 GiB
RWKV-4-World-1.5B.q5_0.bin     1.4 GiB
RWKV-4-World-1.5B.q5_1.bin     1.5 GiB
RWKV-4-World-1.5B.q8_0.bin     1.9 GiB
RWKV-4-World-1.5B.f16.bin      3.0 GiB

Notes:

  • rwkv.cpp [0df970a] was used for conversion and quantization: the models were first converted to f16 GGML files, then quantized (see the sketch below).
  • KoboldCpp [bc841ec] was used to test the models.
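
As a reference, here is a minimal Python sketch of that two-step workflow, shelling out to the conversion and quantization scripts shipped with rwkv.cpp. The script names and arguments are assumptions based on the rwkv.cpp repository layout, and the input checkpoint filename is a placeholder; verify both against your checkout.

  # Sketch of the f16-then-quantize workflow using rwkv.cpp's helper scripts.
  # Script names/arguments are assumptions; verify against your rwkv.cpp checkout.
  import subprocess

  PTH = "RWKV-4-World-0.1B.pth"       # original PyTorch checkpoint (placeholder name)
  F16 = "RWKV-4-World-0.1B.f16.bin"   # intermediate f16 GGML file

  # Step 1: convert the PyTorch checkpoint to an f16 GGML file.
  subprocess.run(["python", "rwkv/convert_pytorch_to_ggml.py", PTH, F16, "FP16"], check=True)

  # Step 2: quantize the f16 file into each target format.
  for fmt in ["Q4_0", "Q4_1", "Q5_0", "Q5_1", "Q8_0"]:
      out = "RWKV-4-World-0.1B." + fmt.lower() + ".bin"
      subprocess.run(["python", "rwkv/quantize.py", F16, out, fmt], check=True)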

The original models can be found in BlinkDL's repository, and the original model card is reproduced below.


RWKV-4 World

Model Description

RWKV-4 trained on 100+ world languages (70% English, 15% multilang, 15% code).

World = Some_Pile + Some_RedPajama + Some_OSCAR + All_Wikipedia + All_ChatGPT_Data_I_can_find

XXXtuned = finetune of World on MC4, OSCAR, wiki, etc.

How to use:

The differences between World & Raven:

  • set pipeline = PIPELINE(model, "rwkv_vocab_v20230424") instead of 20B_tokenizer.json (EXACTLY AS WRITTEN HERE; "rwkv_vocab_v20230424" is included in rwkv 0.7.4+)
  • use Question/Answer or User/AI or Human/Bot for chat. DO NOT USE Bob/Alice or Q/A

For the 0.1/0.4/1.5B models, use fp32 for the first layer (it will overflow in fp16 at the moment; fixable in the future), or bf16 if you have a 30xx/40xx GPU. Example strategy: cuda fp32 *1 -> cuda fp16 (see the loading sketch below).
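
A minimal loading sketch with the rwkv pip package, combining the tokenizer and strategy points above (the checkpoint path is a placeholder; substitute your own):

  import os
  os.environ["RWKV_JIT_ON"] = "1"

  from rwkv.model import RWKV
  from rwkv.utils import PIPELINE

  # fp32 for the first layer, fp16 for the rest, per the strategy note above.
  model = RWKV(model="RWKV-4-World-1.5B", strategy="cuda fp32 *1 -> cuda fp16")

  # World models use the bundled rwkv_vocab_v20230424 vocabulary,
  # not 20B_tokenizer.json.
  pipeline = PIPELINE(model, "rwkv_vocab_v20230424")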

NOTE: the new greedy tokenizer (https://github.com/BlinkDL/ChatRWKV/blob/main/tokenizer/rwkv_tokenizer.py) tokenizes '\n\n' as one single token instead of ['\n','\n'].
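
Assuming the pipeline from the loading sketch above, this behavior can be checked directly:

  # With the World vocabulary, '\n\n' should encode to a single token id;
  # older tokenizers split it into two '\n' tokens.
  tokens = pipeline.encode("\n\n")
  print(tokens, len(tokens))  # expect a one-element list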

QA prompt (replace \n\n in xxx with \n):

Question: xxx

Answer:

and

Instruction: xxx

Input: xxx

Response:
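
A small helper that applies the replace-\n\n-with-\n rule before filling the QA template (the function name is illustrative; the same rule applies to the Instruction/Input/Response template):

  def qa_prompt(question: str) -> str:
      # '\n\n' is reserved as the template separator, so collapse any
      # blank lines inside the question itself.
      q = question.strip().replace("\n\n", "\n")
      return "Question: " + q + "\n\nAnswer:"

  print(qa_prompt("What is the capital of France?"))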

A good chat prompt (replace \n\n in xxx with \n):

User: hi

Assistant: Hi. I am your assistant and I will provide expert full response in full details. Please feel free to ask any question and I will always answer it.

User: xxx

Assistant:
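
Putting it together, a hedged generation sketch using the chat template and the pipeline from the loading sketch above (sampling settings are illustrative, not tuned):

  from rwkv.utils import PIPELINE_ARGS

  prompt = (
      "User: hi\n\n"
      "Assistant: Hi. I am your assistant and I will provide expert full response "
      "in full details. Please feel free to ask any question and I will always answer it.\n\n"
      "User: What is the RWKV architecture?\n\n"
      "Assistant:"
  )

  args = PIPELINE_ARGS(temperature=1.0, top_p=0.5)  # illustrative settings
  print(pipeline.generate(prompt, token_count=200, args=args))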