GGML files are for CPU + GPU inference using llama.cpp.

How to run in llama.cpp

./main -t 10 -ngl 32 -m ggml-model-q8_0.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "### Instruction: Write JQL(Jira query Language) for give input ### Input: stories assigned to manthan which are created in last 10 days with highest priority and label is set to release ### Response:"

Change -t 10 to the number of physical CPU cores you have. For example, if your system has 8 cores/16 threads, use -t 8.
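The rule above can be sketched in Python. The standard library only reports logical CPUs through os.cpu_count(), so this sketch estimates the physical core count from an assumed threads_per_core value (2 is typical on SMT/Hyper-Threading systems). The helper name suggest_threads is hypothetical and is not part of llama.cpp.

```python
import os


def suggest_threads(logical_cpus, threads_per_core=2):
    """Estimate a value for -t from the logical CPU count.

    threads_per_core is an assumption: 2 is typical with SMT/Hyper-Threading;
    use 1 if your CPU exposes one thread per core.
    """
    if logical_cpus < 1 or threads_per_core < 1:
        raise ValueError("counts must be positive")
    # Never go below one thread, even on a single-CPU machine.
    return max(1, logical_cpus // threads_per_core)


if __name__ == "__main__":
    logical = os.cpu_count() or 1  # os.cpu_count() may return None
    print(f"-t {suggest_threads(logical)}")
```

If the estimate looks wrong for your machine, check the core count reported by your operating system and pass that number to -t directly.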

Change -ngl 32 to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration.

To have a chat-style conversation, replace the -p <PROMPT> argument with -i -ins.
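The three adjustments above (thread count, optional GPU offload, prompt versus chat mode) can be combined in a small Python helper that assembles the argument list. This is a minimal sketch: build_llama_args and its parameters are hypothetical names, and it uses only the flags shown in the command above.

```python
import shlex


def build_llama_args(model, prompt=None, threads=10, gpu_layers=None, chat=False):
    """Assemble the ./main argument list described above (hypothetical helper)."""
    args = ["./main", "-t", str(threads)]
    if gpu_layers is not None:
        # Omit -ngl entirely when there is no GPU acceleration.
        args += ["-ngl", str(gpu_layers)]
    args += ["-m", model, "--color", "-c", "2048",
             "--temp", "0.7", "--repeat_penalty", "1.1", "-n", "-1"]
    if chat:
        # Chat-style conversation: -i -ins replaces the -p <PROMPT> argument.
        args += ["-i", "-ins"]
    else:
        if prompt is None:
            raise ValueError("a prompt is required unless chat=True")
        args += ["-p", prompt]
    return args


# CPU-only chat session on a machine with 8 physical cores.
cpu_chat = build_llama_args("ggml-model-q8_0.bin", threads=8, chat=True)
print(shlex.join(cpu_chat))
```

The returned list can be passed to subprocess.run as-is, which avoids shell quoting problems with long prompts.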

How to run in text-generation-webui

Further instructions here: text-generation-webui/docs/llama.cpp-models.md.

How to run using LangChain

Installation on CPU

pip install llama-cpp-python

Installation on GPU

CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python

from langchain.llms import LlamaCpp
from langchain import PromptTemplate, LLMChain
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

n_gpu_layers = 40  # Change this value based on your model and your GPU VRAM pool.
n_batch = 512  # Should be between 1 and n_ctx; consider the amount of VRAM in your GPU.
n_ctx = 2048  # Context window size in tokens.

callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

# Make sure the model path is correct for your system!
llm = LlamaCpp(
    model_path="./ggml-model-q8_0.bin",
    n_gpu_layers=n_gpu_layers, n_batch=n_batch,
    callback_manager=callback_manager,
    verbose=True,
    n_ctx=n_ctx
)

llm("""### Instruction:
Write JQL(Jira query Language) for give input

### Input:
stories assigned to manthan which are created in last 10 days with highest priority and label is set to release

### Response:""")
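The prompt in the example above follows an Instruction/Input/Response layout. As a minimal sketch, the helper below fills that layout for an arbitrary request; build_prompt is a hypothetical name, and the default instruction text is copied verbatim from the example.

```python
def build_prompt(user_input, instruction="Write JQL(Jira query Language) for give input"):
    """Fill the Instruction/Input/Response template used in the example above."""
    return (
        "### Instruction:\n"
        f"{instruction}\n\n"
        "### Input:\n"
        f"{user_input.strip()}\n\n"
        "### Response:"
    )


prompt = build_prompt(
    "stories assigned to manthan which are created in last 10 days "
    "with highest priority and label is set to release"
)
print(prompt)
# With the llm object created in the example above, this would be: llm(prompt)
```

Keeping the template in one function makes it harder to drift from the layout shown in the example when the input text changes.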

For more information, refer to LangChain.


Dataset used to train ManthanKulakarni/JQL_LLaMa_GGML