---
library_name: rkllm
license: llama2
language:
- en
base_model:
- meta-llama/CodeLlama-7b-Instruct-hf
pipeline_tag: text-generation
tags:
- text-generation-inference
- rkllm
- rk3588
- rockchip
- edge-ai
- llm
- nextcoder
- code
- chat
---

# CodeLlama-7b-Instruct-hf — RKLLM build for RK3588 boards

**Author:** @jamescallander
**Source model:** [meta-llama/CodeLlama-7b-Instruct-hf · Hugging Face](https://huggingface.co/meta-llama/CodeLlama-7b-Instruct-hf)
**Target:** Rockchip RK3588 NPU via RKNN-LLM Runtime

> This repository hosts a **conversion** of `CodeLlama-7b-Instruct-hf` for use on Rockchip RK3588 single-board computers (Orange Pi 5 Plus, Radxa Rock 5B+, Banana Pi M7, etc.). Conversion was performed using the [RKNN-LLM toolkit](https://github.com/airockchip/rknn-llm).

#### Conversion details

- RKLLM-Toolkit version: v1.2.1
- NPU driver: v0.9.8
- Python: 3.12
- Quantization: `w8a8_g128`
- Output: single-file `.rkllm` artifact
- Tokenizer: not required at runtime (UI handles prompt I/O)

## ⚠️ Code generation disclaimer

🛑 **This model may produce incorrect or insecure code.**

- It is intended for **research, educational, and experimental purposes only**.
- Always **review, test, and validate code outputs** before using them in real projects.
- Do not rely on outputs for production, security-sensitive, or safety-critical systems.
- Use responsibly and in compliance with the source model’s license and restrictions.

## Intended use

- On-device coding assistant / code generation on RK3588 SBCs.
- CodeLlama-7b-Instruct-hf is tuned for software development and programming tasks, making it suitable for **edge deployment** where privacy and low power use are priorities.

## Limitations

- Requires approximately 9 GB of free memory.
- Quantized build (`w8a8_g128`) may show small quality differences vs. the full-precision upstream model.
- Tested on a Radxa Rock 5B+; other devices may require different drivers/toolkit versions.
- Generated code should always be reviewed before use in production systems.

## Quick start (RK3588)

### 1) Install runtime

The RKNN-LLM toolkit and instructions can be found on the development board manufacturer's website or on [airockchip's GitHub page](https://github.com/airockchip). Download and install the required packages as per the toolkit's instructions.
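As an optional sanity check before loading the model, you can confirm how much memory is free and, on many RK3588 vendor kernels, read the RKNPU driver version from debugfs. The debugfs path below is an assumption that may differ between kernel/vendor images:

```bash
# Check available memory (the model needs roughly 9 GB free)
free -h

# Report the loaded RKNPU driver version (debugfs path may vary by kernel/vendor image)
sudo cat /sys/kernel/debug/rknpu/version
```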
### 2) Simple Flask server deployment

The simplest way to deploy the converted `.rkllm` model is with the example script provided in the toolkit under `rknn-llm/examples/rkllm_server_demo`:

```bash
python3 /rknn-llm/examples/rkllm_server_demo/flask_server.py \
  --rkllm_model_path /CodeLlama-7b-Instruct-hf_w8a8_g128_rk3588.rkllm \
  --target_platform rk3588
```

### 3) Sending a request

The basic format for a chat request is:

```json
{
  "model": "CodeLlama-7b-Instruct-hf",
  "messages": [{"role": "user", "content": ""}],
  "stream": false
}
```

Example request using `curl`:

```bash
curl -s -X POST :8080/rkllm_chat \
  -H 'Content-Type: application/json' \
  -d '{"model":"CodeLlama-7b-Instruct-hf","messages":[{"role":"user","content":"Create a python function to calculate factorials using recursive method."}],"stream":false}'
```

The response is formatted in the following way:

```json
{
  "choices": [{
    "finish_reason": "stop",
    "index": 0,
    "logprobs": null,
    "message": {
      "content": "",
      "role": "assistant"}}],
  "created": null,
  "id": "rkllm_chat",
  "object": "rkllm_chat",
  "usage": {
    "completion_tokens": null,
    "prompt_tokens": null,
    "total_tokens": null}
}
```

Example response:

```json
{"choices":[{"finish_reason":"stop","index":0,"logprobs":null,"message":{"content":"Here is an example of how you can create a Python function to calculate factorials using the recursive method: ``` def factorial(n): if n == 0: return 1 else: return n * factorial(n-1) ``` This function uses the recursive formula for factorials, which is `n! = n * (n-1)!`, to calculate the factorial of a given number `n`. The base case is when `n` is 0, in which case the result is 1. Otherwise, the function calls itself with `n-1` as the argument and multiplies the result by `n`. For example, if you call the function with `n = 5`, it will calculate the factorial of 5 using the recursive formula: ``` factorial(5) = 5 * factorial(4) = 5 * (4 * factorial(3)) = 5 * (4 * (3 * factorial(2))) = 5 * (4 * (3 * (2 * factorial(1)))) = 5 * (4 * (3 * (2 * 1))) = 5 * (4 * (3 * 2)) = 5 * (4 * 6) = 5 * 24 = 120 ``` So the result of `factorial(5)` is 120. You can also use a loop to calculate factorials, here is an example: ``` def factorial(n): result = 1 for i in range(1, n+1): result *= i return result ``` This function uses a loop to iterate from 1 to `n` and multiply the result by each number. The base case is when `n` is 0, in which case the result is 1. You can also use the built-in `math.factorial()` function, it will calculate the factorial of a given number using the recursive method: ``` from math import factorial print(factorial(5)) # output: 120 ``` It's worth noting that the recursive method can be slower than the loop method for large values of `n`, because it requires making multiple function calls. However, the recursive method is often more concise and easier to understand, especially for small values of `n`.","role":"assistant"}}],"created":null,"id":"rkllm_chat","object":"rkllm_chat","usage":{"completion_tokens":null,"prompt_tokens":null,"total_tokens":null}}
```
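For quick command-line testing, the assistant's reply can be pulled out of the response JSON with `jq`. This is a minimal sketch that assumes the server is reachable at `localhost:8080` and that `jq` is installed:

```bash
# Send a prompt and print only the assistant's reply text (requires jq)
curl -s -X POST http://localhost:8080/rkllm_chat \
  -H 'Content-Type: application/json' \
  -d '{"model":"CodeLlama-7b-Instruct-hf","messages":[{"role":"user","content":"Write a Python one-liner that squares the numbers 1 to 10."}],"stream":false}' \
  | jq -r '.choices[0].message.content'
```

The same `.choices[0].message.content` filter applies to the full example response shown above.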
### 4) UI compatibility

This server exposes an **OpenAI-compatible Chat Completions API**. You can connect it to any OpenAI-compatible client or UI (for example: [Open WebUI](https://github.com/open-webui/open-webui)).

- Configure your client with the API base `http://:8080` and use the endpoint `/rkllm_chat`.
- Make sure the `model` field matches the converted model's name, for example:

```json
{
  "model": "CodeLlama-7b-Instruct-hf",
  "messages": [{"role": "user", "content": "Hello!"}],
  "stream": false
}
```

# License

This conversion follows the license of the source model: [LICENSE.txt · meta-llama/Llama-2-7b-chat-hf at main](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/blob/main/LICENSE.txt)

**Required notice:** see [`NOTICE`](NOTICE)

You must also comply with:

- [Responsible Use Guide](https://llama.meta.com/responsible-use-guide)