main-horse
/

UI-TARS-72B-SFT-Q4_K_M-GGUF

Image-Text-to-Text

Inference Endpoints

Model card Files Files and versions Community

UI-TARS-72B-SFT-Q4_K_M-GGUF / README.md

main-horse's picture

Update README.md

1c40e9c verified about 1 month ago

|

history blame contribute delete

1.7 kB

	---
	license: apache-2.0
	language:
	- en
	pipeline_tag: image-text-to-text
	tags:
	- multimodal
	- gui
	- llama-cpp
	- gguf-my-repo
	library_name: transformers
	base_model: bytedance-research/UI-TARS-72B-SFT
	---

	note: most qwen2 weights aren't divisible by 256, so this is really a q8/q5 quant.

	# main-horse/UI-TARS-72B-SFT-Q4_K_M-GGUF
	This model was converted to GGUF format from [`bytedance-research/UI-TARS-72B-SFT`](https://huggingface.co/bytedance-research/UI-TARS-72B-SFT) using llama.cpp.
	Refer to the [original model card](https://huggingface.co/bytedance-research/UI-TARS-72B-SFT) for more details on the model.

	## Use with llama.cpp
	Install llama.cpp through brew (works on Mac and Linux)

	```bash
	brew install llama.cpp

	```
	Invoke the llama.cpp server or the CLI.

	### CLI:
	```bash
	llama-cli --hf-repo main-horse/UI-TARS-72B-SFT-Q4_K_M-GGUF --hf-file UI-TARS-72B-SFT.Q4_K_M.gguf -p "The meaning to life and the universe is"
	```

	### Server:
	```bash
	llama-server --hf-repo main-horse/UI-TARS-72B-SFT-Q4_K_M-GGUF --hf-file UI-TARS-72B-SFT.Q4_K_M.gguf -c 2048
	```

	Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the Llama.cpp repo as well.
	Step 1: Clone llama.cpp from GitHub.
	```
	git clone https://github.com/ggerganov/llama.cpp
	cd llama.cpp
	```
	Step 2: Build using CMake.
	```
	cmake -B build -DGGML_CUDA=ON -DGGML_CUDA_F16=1 -DGGML_CUDA_FA_ALL_QUANTS=1 -DCMAKE_CUDA_ARCHITECTURES=...
	cmake --build build --config Release -j
	```
	Step 3: Run inference through the main binary.
	```
	./llama-server --hf-repo main-horse/UI-TARS-72B-SFT-Q4_K_M-GGUF --hf-file UI-TARS-72B-SFT.Q4_K_M.gguf -c 2048
	```