Make GGUFs, please.

#1
by drmcbride - opened

We need GGUFs!

Jinx org

You can try building a GGUF yourself with this tool. It runs online and doesn't need local resources. Have fun!
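
If you prefer to do it locally instead, a minimal sketch with llama.cpp looks like this (the model path, output names, and chosen quant type are placeholders, not a tested recipe):

python convert_hf_to_gguf.py /path/to/Jinx-Qwen3-30B-A3B-Thinking-2507 --outfile jinx-qwen3-30b-a3b-bf16.gguf --outtype bf16
./llama-quantize jinx-qwen3-30b-a3b-bf16.gguf jinx-qwen3-30b-a3b-Q5_K_M.gguf Q5_K_M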

Best,
Jinx Team

I tried running Jinx-org's Qwen3-235B-A22B-Thinking-2507 and the 32B model; both respond without an opening <think> tag but do emit </think>. I tried quantized versions and the f16 32B version, and they all have this problem.

Could you please describe your process step by step? For example, your environment setup, IDE version, the commands you're running, and the output you're seeing.

Let me try:

  1. on Linux with ik_llama.cpp
  2. python convert_hf_to_gguf.py ~/.cache/huggingface/hub/models--Jinx-org--Jinx-Qwen3-235B-A22B-Thinking-2507/snapshots/fe1b7faefb33dd8d321eac938ed1db862e29035b --outfile Jinx-Qwen3-235B-A22B-Thinking-2507.gguf --outtype bf16
  3. llama-server --jinja --threads 16 --threads-batch 32 --no-mmap -m Jinx-Qwen3-30B-A3B-Thinking-2507.gguf --temp 0.7 --top-p 0.8 --top-k 20 --min-p 0.01 -c 32768 -np 1 -fmoe -ub 4096 -b 4096
  4. ask any question in the browser, get a response without <think> (see the template check below)
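
One thing worth checking at this point is whether the converted GGUF's embedded chat template already ends the generation prompt with <think>. Here is a minimal sketch with the gguf Python package (gguf-py), assuming the bf16 file from step 2; the string-decoding detail may differ across gguf-py versions:

from gguf import GGUFReader  # pip install gguf

reader = GGUFReader("Jinx-Qwen3-235B-A22B-Thinking-2507.gguf")
field = reader.get_field("tokenizer.chat_template")
if field is None:
    print("no chat template embedded in this GGUF")
else:
    # string metadata is stored as raw UTF-8 bytes in the field's last part
    template = bytes(field.parts[-1]).decode("utf-8")
    # if the template ends the generation prompt with "<think>", the opening
    # tag is injected by the prompt rather than generated by the model
    print(template[-200:])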

Could you please check whether your downloaded transformers weights work correctly with this script?

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Jinx-org/Jinx-Qwen3-235B-A22B-Thinking-2507"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist() 

# parsing thinking content
try:
    # rindex finding 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content) # no opening <think> tag
print("content:", content)
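
For what it's worth, if these models reuse the upstream Qwen3-*-Thinking-2507 chat template (an assumption worth verifying), the missing opening tag is expected behavior: the template itself appends <think> to the generation prompt, so the model only emits the closing </think>. A quick self-contained check:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Jinx-org/Jinx-Qwen3-235B-A22B-Thinking-2507")
rendered = tokenizer.apply_chat_template(
    [{"role": "user", "content": "hello"}],
    tokenize=False,
    add_generation_prompt=True,
)
# If the rendered prompt ends with "<think>\n", the opening tag comes from the
# chat template, so generated text will only ever contain the closing "</think>".
print(repr(rendered[-40:]))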

Respectfully, if people are having issues, why not just publish the GGUFs yourselves? You'll get much more attention from local users if you do.

hi @Jeol

I'd like to, but my device can't run the safetensors versions of Jinx-Qwen3-30B-A3B-Thinking-2507 or Jinx-org/Jinx-Qwen3-235B-A22B-Thinking-2507.

Sorry...

Hi @drmcbride , you are right. I should do this. I will add GGUFs for each model before the end of this week.

I promise people are going to use them, and then people will talk about your Jinx org.

To minimize the workload, I'd rather not run quantization for every possible setup. What's your preferred quantization approach, or do you have suggestions for the most effective configurations to prioritize?

I use ik_llama.cpp to get the best speed; you can try the "Secret Recipe" for ik_llama.cpp from https://huggingface.co/ubergarm/Qwen3-30B-A3B-Thinking-2507-GGUF

Most people use plain llama.cpp, LM Studio, or Ollama; for them, the Unsloth-style quants are the best option.

I guess Q5_K_M is a good starting point; for a bigger model like Jinx-org/Jinx-Qwen3-235B-A22B-Thinking-2507, Q3_K/Q2_K quants could be useful for low-memory setups.
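
For example, that prioritized set could be produced with llama.cpp's llama-quantize along these lines (file names are placeholders; the quant types follow the suggestion above):

./llama-quantize jinx-qwen3-235b-a22b-bf16.gguf jinx-qwen3-235b-a22b-Q5_K_M.gguf Q5_K_M
./llama-quantize jinx-qwen3-235b-a22b-bf16.gguf jinx-qwen3-235b-a22b-Q3_K_M.gguf Q3_K_M
./llama-quantize jinx-qwen3-235b-a22b-bf16.gguf jinx-qwen3-235b-a22b-Q2_K.gguf Q2_K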
