---
library_name: rkllm
license: llama2
language:
- en
base_model:
- meta-llama/CodeLlama-7b-Instruct-hf
pipeline_tag: text-generation
tags:
- text-generation-inference
- rkllm
- rk3588
- rockchip
- edge-ai
- llm
- nextcoder
- code
- chat
---
# CodeLlama-7b-Instruct-hf — RKLLM build for RK3588 boards
**Author:** @jamescallander
**Source model:** [meta-llama/CodeLlama-7b-Instruct-hf · Hugging Face](https://huggingface.co/meta-llama/CodeLlama-7b-Instruct-hf)
**Target:** Rockchip RK3588 NPU via RKNN-LLM Runtime
> This repository hosts a **conversion** of `CodeLlama-7b-Instruct-hf` for use on Rockchip RK3588 single-board computers (Orange Pi 5 Plus, Radxa Rock 5B+, Banana Pi M7, etc.). Conversion was performed using the [RKNN-LLM toolkit](https://github.com/airockchip/rknn-llm?utm_source=chatgpt.com).
#### Conversion details
- RKLLM-Toolkit version: v1.2.1
- NPU driver: v0.9.8
- Python: 3.12
- Quantization: `w8a8_g128`
- Output: single-file `.rkllm` artifact
- Tokenizer: not required at runtime (UI handles prompt I/O)
## ⚠️ Code generation disclaimer
🛑 **This model may produce incorrect or insecure code.**
- It is intended for **research, educational, and experimental purposes only**.
- Always **review, test, and validate code outputs** before using them in real projects.
- Do not rely on outputs for production, security-sensitive, or safety-critical systems.
- Use responsibly and in compliance with the source model’s license and restrictions.
## Intended use
- On-device coding assistant / code generation on RK3588 SBCs.
- CodeLlama-7b-Instruct-hf is tuned for software development and programming tasks, making it suitable for **edge deployment** where privacy and low power use are priorities.
## Limitations
- Requires 9 GB of free memory.
- Quantized build (`w8a8_g128`) may show small quality differences vs. full-precision upstream.
- Tested on Radxa Rock 5B+; other devices may require different drivers/toolkit versions.
- Generated code should always be reviewed before use in production systems.
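Before loading the model, it can help to confirm that the board actually has the memory listed above. The following is a minimal sketch, assuming a Linux system that exposes `MemAvailable` in `/proc/meminfo` and treating the 9 GB figure as 9 GiB; the function names are hypothetical and not part of the RKNN-LLM toolkit.

```python
def mem_available_gib(meminfo_text: str) -> float:
    """Return the MemAvailable value from /proc/meminfo contents, in GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            kib = int(line.split()[1])  # the kernel reports this value in kB (KiB)
            return kib / (1024 ** 2)
    raise ValueError("MemAvailable not found in meminfo text")


def has_enough_memory(meminfo_text: str, required_gib: float = 9.0) -> bool:
    """Check available memory against the requirement (assumed to be 9 GiB)."""
    return mem_available_gib(meminfo_text) >= required_gib


def read_meminfo(path: str = "/proc/meminfo") -> str:
    """Read the raw meminfo text from the running system."""
    with open(path, encoding="utf-8") as f:
        return f.read()
```

On the board, `has_enough_memory(read_meminfo())` returns `True` when at least 9 GiB is currently available.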
## Quick start (RK3588)
### 1) Install runtime
The RKNN-LLM toolkit and instructions can be found on the specific development board's manufacturer website or on [airockchip's GitHub page](https://github.com/airockchip).
Download and install the required packages as per the toolkit's instructions.
### 2) Simple Flask server deployment
The simplest way to deploy the converted `.rkllm` model is to use the example script provided in the toolkit under `rknn-llm/examples/rkllm_server_demo`:
```bash
python3 <TOOLKIT_PATH>/rknn-llm/examples/rkllm_server_demo/flask_server.py \
--rkllm_model_path <MODEL_PATH>/CodeLlama-7b-Instruct-hf_w8a8_g128_rk3588.rkllm \
--target_platform rk3588
```
### 3) Sending a request
A basic request has the following format:
```json
{
"model":"CodeLlama-7b-Instruct-hf",
"messages":[{
"role":"user",
"content":"<YOUR_PROMPT_HERE>"}],
"stream":false
}
```
Example request using `curl`:
```bash
curl -s -X POST <SERVER_IP_ADDRESS>:8080/rkllm_chat \
-H 'Content-Type: application/json' \
-d '{"model":"CodeLlama-7b-Instruct-hf","messages":[{"role":"user","content":"Create a python function to calculate factorials using recursive method."}],"stream":false}'
```
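The same request can be sent from Python. This is a minimal sketch using only the standard library; it assumes the server is reachable at the base URL you pass in and accepts the request body shown above. The helper names `build_chat_request` and `send_chat` and the 300-second timeout are illustrative choices, not part of the toolkit.

```python
import json
import urllib.request


def build_chat_request(base_url: str, prompt: str,
                       model: str = "CodeLlama-7b-Instruct-hf") -> urllib.request.Request:
    """Build a POST request for the /rkllm_chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/rkllm_chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(base_url: str, prompt: str, timeout: float = 300.0) -> dict:
    """Send the prompt and return the decoded JSON response."""
    req = build_chat_request(base_url, prompt)
    # Generation on an SBC can be slow, so the timeout is deliberately generous.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `send_chat("http://<SERVER_IP_ADDRESS>:8080", "Write a function that reverses a string.")` returns the parsed response as a dictionary.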
The response is formatted as follows:
```json
{
"choices":[{
"finish_reason":"stop",
"index":0,
"logprobs":null,
"message":{
"content":"<MODEL_REPLY_HERE>",
"role":"assistant"}}],
"created":null,
"id":"rkllm_chat",
"object":"rkllm_chat",
"usage":{
"completion_tokens":null,
"prompt_tokens":null,
"total_tokens":null}
}
```
Example response:
```json
{"choices":[{"finish_reason":"stop","index":0,"logprobs":null,"message":{"content":"Here is an example of how you can create a Python function to calculate factorials using the recursive method: ``` def factorial(n): if n == 0: return 1 else: return n * factorial(n-1) ``` This function uses the recursive formula for factorials, which is `n! = n * (n-1)!`, to calculate the factorial of a given number `n`. The base case is when `n` is 0, in which case the result is 1. Otherwise, the function calls itself with `n-1` as the argument and multiplies the result by `n`. For example, if you call the function with `n = 5`, it will calculate the factorial of 5 using the recursive formula: ``` factorial(5) = 5 * factorial(4) = 5 * (4 * factorial(3)) = 5 * (4 * (3 * factorial(2))) = 5 * (4 * (3 * (2 * factorial(1)))) = 5 * (4 * (3 * (2 * 1))) = 5 * (4 * (3 * 2)) = 5 * (4 * 6) = 5 * 24 = 120 ``` So the result of `factorial(5)` is 120. You can also use a loop to calculate factorials, here is an example: ``` def factorial(n): result = 1 for i in range(1, n+1): result *= i return result ``` This function uses a loop to iterate from 1 to `n` and multiply the result by each number. The base case is when `n` is 0, in which case the result is 1. You can also use the built-in `math.factorial()` function, it will calculate the factorial of a given number using the recursive method: ``` from math import factorial print(factorial(5)) # output: 120 ``` It's worth noting that the recursive method can be slower than the loop method for large values of `n`, because it requires making multiple function calls. However, the recursive method is often more concise and easier to understand, especially for small values of `n`.","role":"assistant"}}],"created":null,"id":"rkllm_chat","object":"rkllm_chat","usage":{"completion_tokens":null,"prompt_tokens":null,"total_tokens":null}}
```
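To pull just the model's reply out of a response like the one above, a client only needs the `choices[0].message.content` field. The sketch below assumes the response body matches the format shown in this section; `extract_reply` is a hypothetical helper name.

```python
import json


def extract_reply(response_text: str) -> str:
    """Return the assistant's reply text from a raw JSON response body."""
    data = json.loads(response_text)
    choices = data.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]
```

Note that the `usage` and `created` fields are `null` in the responses shown here, so a client should not rely on them for token accounting.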
### 4) UI compatibility
This server exposes an **OpenAI-compatible Chat Completions API**.
You can connect it to any OpenAI-compatible client or UI (for example, [Open WebUI](https://github.com/open-webui/open-webui?utm_source=chatgpt.com)).
- Configure your client with the API base: `http://<SERVER_IP_ADDRESS>:8080` and use the endpoint: `/rkllm_chat`
- Make sure the `model` field matches the converted model’s name, for example:
```json
{
"model": "CodeLlama-7b-Instruct-hf",
"messages": [{"role":"user","content":"Hello!"}],
"stream": false
}
```
## License
This conversion follows the license of the source model: [LICENSE.txt · meta-llama/Llama-2-7b-chat-hf at main](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/blob/main/LICENSE.txt)
- **Required notice:** see [`NOTICE`](NOTICE)
You must also comply with:
- [Responsible Use Guide](https://llama.meta.com/responsible-use-guide)