GGML files are for CPU + GPU inference using llama.cpp
How to run in llama.cpp
```bash
./main -t 10 -ngl 32 -m ggml-model-q8_0.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "### Instruction: Write JQL(Jira query Language) for give input ### Input: stories assigned to manthan which are created in last 10 days with highest priority and label is set to release ### Response:"
```
Change `-t 10` to the number of physical CPU cores you have. For example, if your system has 8 cores/16 threads, use `-t 8`.

Change `-ngl 32` to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration.

To have a chat-style conversation, replace the `-p <PROMPT>` argument with `-i -ins`, as in the example below.
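For example, on a hypothetical machine with 8 physical cores and no GPU acceleration, a chat-style invocation might look like this (the values are illustrative; adjust them as described above):

```bash
# Illustrative example: 8 physical cores, no GPU offload, interactive chat mode
./main -t 8 -m ggml-model-q8_0.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -i -ins
```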
How to run in text-generation-webui
Further instructions here: text-generation-webui/docs/llama.cpp-models.md.
How to run using LangChain
Installation on CPU:

```bash
pip install llama-cpp-python
```

Installation on GPU (with cuBLAS acceleration):

```bash
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python
```
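Once llama-cpp-python is installed, you can optionally verify that it loads the GGML file directly before wiring it into LangChain. This is a minimal sketch, not part of the original instructions; `n_gpu_layers=32` is an illustrative assumption and should be removed (or set to 0) for a CPU-only build:

```python
# Minimal sketch: load the GGML model directly with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="./ggml-model-q8_0.bin",  # adjust to where you saved the model
    n_ctx=2048,        # context window
    n_gpu_layers=32,   # illustrative value; remove or set to 0 for CPU-only installs
)

output = llm(
    "### Instruction: Write JQL(Jira query Language) for give input "
    "### Input: stories assigned to manthan which are created in last 10 days "
    "with highest priority and label is set to release ### Response:",
    max_tokens=256,
)
print(output["choices"][0]["text"])
```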
```python
from langchain.llms import LlamaCpp
from langchain import PromptTemplate, LLMChain
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

n_gpu_layers = 40  # Change this value based on your model and your GPU VRAM pool.
n_batch = 512  # Should be between 1 and n_ctx, consider the amount of VRAM in your GPU.
n_ctx = 2048  # Context window size.

# Stream generated tokens to stdout as they are produced.
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

# Make sure the model path is correct for your system!
llm = LlamaCpp(
    model_path="./ggml-model-q8_0.bin",
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
    n_ctx=n_ctx,
    callback_manager=callback_manager,
    verbose=True,
)

llm("""### Instruction:
Write JQL(Jira query Language) for give input
### Input:
stories assigned to manthan which are created in last 10 days with highest priority and label is set to release
### Response:""")
```
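The `PromptTemplate` and `LLMChain` imports above are not used in the snippet; as a sketch, they could wrap the same prompt format into a reusable chain (reusing the `llm` object created above; the `{query}` variable name is an assumption):

```python
# Sketch: wrap the Instruction/Input/Response format in a reusable LLMChain.
template = """### Instruction:
Write JQL(Jira query Language) for give input
### Input:
{query}
### Response:"""

prompt = PromptTemplate(template=template, input_variables=["query"])
llm_chain = LLMChain(prompt=prompt, llm=llm)

print(llm_chain.run("stories assigned to manthan which are created in last 10 days with highest priority and label is set to release"))
```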
For more information, refer to the LangChain documentation.