
Csaba Kecskemeti PRO

csabakecskemeti

Organizations

Zillow, DevQuasar, Hugging Face Party @ PyTorch Conference, Intelligent Estate, open/ acc

csabakecskemeti's activity

posted an update 9 days ago
reacted to fdaudens's post with 😎 9 days ago
🔊 Meet Kokoro Web - Free, ML speech synthesis on your computer, that'll make you ditch paid services!

28 natural voices, unlimited generations, and WebGPU acceleration. Perfect for journalists and content creators.

Test it with full articles—sounds amazingly human! 🎯🎙️

Xenova/kokoro-web
replied to their post 19 days ago

Yes, I did, and I tried it with Chrome Canary.
(I even have a demo page that uses it, but I can't recall the name right now; I'll share it later.)

It works fine, but:

  • it's still only available in experimental Chrome
  • what about other browsers?
  • you're locked in to one model
  • what if you're hosting your local AI on another local machine?

All in all, Chrome's built-in AI provides less flexibility, in my view.

I appreciate the comment, though.

replied to their post 20 days ago

This is obviously a prototype.
Security is a big concern here, but I believe it's possible to put together a proxy that is safe and allows nothing other than forwarding generate requests between the browser and the local LLM.
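
A minimal sketch of what such a "generate requests only" proxy could look like, using just the Python standard library. This is not the actual LLmaaS proxy. The backend URL (an Ollama-style `/api/generate` endpoint on port 11434), the allowed origin, the field whitelist, and the size limit are all assumptions chosen for illustration.

```python
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumptions (not taken from the LLmaaS code): an Ollama-style backend on
# localhost and a single web origin that is allowed to call the proxy.
BACKEND_URL = "http://127.0.0.1:11434/api/generate"
ALLOWED_ORIGIN = "https://devquasar.com"
ALLOWED_FIELDS = ("model", "prompt")
MAX_BODY_BYTES = 64 * 1024


def is_allowed(method: str, path: str, origin: str) -> bool:
    """Accept only POST /api/generate coming from the allowed origin."""
    return method == "POST" and path == "/api/generate" and origin == ALLOWED_ORIGIN


def sanitize(raw: bytes) -> bytes:
    """Keep only whitelisted fields and force a non-streaming response."""
    data = json.loads(raw)  # raises ValueError on malformed JSON
    if not isinstance(data, dict) or not isinstance(data.get("prompt"), str):
        raise ValueError("body must be a JSON object with a string 'prompt'")
    clean = {key: data[key] for key in ALLOWED_FIELDS if key in data}
    clean["stream"] = False
    return json.dumps(clean).encode()


class GenerateOnlyProxy(BaseHTTPRequestHandler):
    def do_POST(self):
        # Everything except the single generate route is rejected outright.
        if not is_allowed("POST", self.path, self.headers.get("Origin", "")):
            self.send_error(403)
            return
        try:
            length = int(self.headers.get("Content-Length", "0"))
            if not 0 <= length <= MAX_BODY_BYTES:
                raise ValueError("unacceptable body size")
            body = sanitize(self.rfile.read(length))
        except ValueError:
            self.send_error(400)
            return
        request = urllib.request.Request(
            BACKEND_URL, data=body, headers={"Content-Type": "application/json"}
        )
        with urllib.request.urlopen(request, timeout=120) as response:
            payload = response.read()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Access-Control-Allow-Origin", ALLOWED_ORIGIN)
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8000) -> None:
    """Bind to loopback only, so other machines cannot reach the proxy."""
    HTTPServer(("127.0.0.1", port), GenerateOnlyProxy).serve_forever()
```

Calling `serve()` would start the proxy. A real browser client would also need a CORS preflight (`OPTIONS`) handler and error handling for a missing backend, both omitted here to keep the sketch short.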

posted an update 21 days ago
Check out my idea:
LLmaaS - Local LLM as a Service

With LLmaaS, I propose leveraging locally running LLMs as a service, providing a standardized way for websites to access and utilize them for LLM-powered operations directly on the user’s device.

Demo, code, more detailed description.
https://devquasar.com/llmaas/
https://github.com/csabakecskemeti/LLmaaS
https://youtu.be/OOWGr8jcP5Q

Call for contributors
Join me in developing the LLmaaS proxy to make it a general-purpose tool for leveraging local LLMs on the web, with built-in security measures.
I'm looking for help to make the proxy more generic, so that it supports multiple local LLM services without any change on the HTML side.
I'm also looking for ideas on how to make the HTML part more modular and easier to use.
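
One possible shape for a backend-agnostic proxy, sketched under assumptions: the web page always sends the same generic request, and the proxy translates it per backend. The `GenerateRequest` type, the adapter names, and the registry are hypothetical and not part of LLmaaS. The endpoint paths and payload fields reflect my understanding of Ollama, OpenAI-compatible servers, and the llama.cpp server, and should be checked against each project's documentation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class GenerateRequest:
    """Hypothetical backend-neutral request the HTML side would always send."""
    prompt: str
    model: str = "default"
    max_tokens: int = 256


BackendCall = Tuple[str, dict]  # (endpoint path, JSON payload)


def to_ollama(req: GenerateRequest) -> BackendCall:
    payload = {
        "model": req.model,
        "prompt": req.prompt,
        "stream": False,
        "options": {"num_predict": req.max_tokens},
    }
    return "/api/generate", payload


def to_openai_compatible(req: GenerateRequest) -> BackendCall:
    payload = {"model": req.model, "prompt": req.prompt, "max_tokens": req.max_tokens}
    return "/v1/completions", payload


def to_llamacpp(req: GenerateRequest) -> BackendCall:
    return "/completion", {"prompt": req.prompt, "n_predict": req.max_tokens}


# Registry: supporting a new local LLM service means adding one entry here,
# with no change to the page that calls the proxy.
ADAPTERS: Dict[str, Callable[[GenerateRequest], BackendCall]] = {
    "ollama": to_ollama,
    "openai": to_openai_compatible,
    "llamacpp": to_llamacpp,
}


def build_backend_call(backend: str, req: GenerateRequest) -> BackendCall:
    """Translate the generic request for the configured backend."""
    if backend not in ADAPTERS:
        raise ValueError(f"unsupported backend: {backend}")
    return ADAPTERS[backend](req)
```

The backend name would come from the proxy's own configuration rather than from the browser, so a page cannot choose where its request is routed.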
replied to their post 24 days ago

I restarted the Space. As for the speed, I found that I had forgotten to offload the model to the GPU :D
Try it now.

replied to their post 24 days ago

You can try it here:
https://huggingface.co/spaces/DevQuasar/Mi50

But something seems off with my network or with HF; everything is very slow.

When I benchmarked the model with llama-bench, I got 60 t/s on the MI50.
Anyway, you can try it.

ROCR_VISIBLE_DEVICES=0 build/bin/llama-bench -m ~/Downloads/DevQuasar-R1-Uncensored-Llama-8B.Q8_0.gguf
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon VII, compute capability 9.0, VMM: no

| model | size | params | backend | ngl | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | ROCm | 99 | pp512 | 416.30 ± 0.07 |
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | ROCm | 99 | tg128 | 60.13 ± 0.02 |
posted an update 25 days ago
replied to their post 27 days ago

Here is the full result of the re-executed evaluation on deepseek-ai/DeepSeek-R1-Distill-Llama-8B with the suggested generation arguments.

mytable2.png

I see some marginal changes in the scores, but not much. If this is true, the original Llama 3.1 8B wins more tests than the DeepSeek R1 distilled model. I'm not sure what is going on. If anyone can run the eval, please share your results.
Again I can be totally wrong here.

Full result data (results with 2025-01-26 date)
https://github.com/csabakecskemeti/lm_eval_results/blob/main/deepseek-ai__DeepSeek-R1-Distill-Llama-8B/results_2025-01-26T22-29-00.931915.json

Eval command:
accelerate launch -m lm_eval --model hf --model_args pretrained=deepseek-ai/DeepSeek-R1-Distill-Llama-8B,parallelize=True,dtype="float16" --tasks hellaswag,leaderboard_gpqa,leaderboard_ifeval,leaderboard_math_hard,leaderboard_mmlu_pro,leaderboard_musr,leaderboard_bbh --batch_size auto:4 --log_samples --output_path eval_results --gen_kwargs temperature=0.6,top_p=0.95,do_sample=True

Eval output:
hf (pretrained=deepseek-ai/DeepSeek-R1-Distill-Llama-8B,parallelize=True,dtype=float16), gen_kwargs: (temperature=0.6,top_p=0.95,do_sample=True), limit: None, num_fewshot: None, batch_size: auto:4 (1,16,64,64)

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
| --- | --- | --- | --- | --- | --- | --- |
| hellaswag | 1 | none | 0 | acc | 0.5559 | ± 0.0050 |
| | | none | 0 | acc_norm | 0.7436 | ± 0.0044 |
| leaderboard_bbh | N/A | | | | | |
| - leaderboard_bbh_boolean_expressions | 1 | none | 3 | acc_norm | 0.8080 | ± 0.0250 |
| - leaderboard_bbh_causal_judgement | 1 | none | 3 | acc_norm | 0.5508 | ± 0.0365 |
| - leaderboard_bbh_date_understanding | 1 | none | 3 | acc_norm | 0.4240 | ± 0.0313 |
| - leaderboard_bbh_disambiguation_qa | 1 | none | 3 | acc_norm | 0.2240 | ± 0.0264 |
| - leaderboard_bbh_formal_fallacies | 1 | none | 3 | acc_norm | 0.5200 | ± 0.0317 |
| - leaderboard_bbh_geometric_shapes | 1 | none | 3 | acc_norm | 0.2360 | ± 0.0269 |
| - leaderboard_bbh_hyperbaton | 1 | none | 3 | acc_norm | 0.4840 | ± 0.0317 |
| - leaderboard_bbh_logical_deduction_five_objects | 1 | none | 3 | acc_norm | 0.3240 | ± 0.0297 |
| - leaderboard_bbh_logical_deduction_seven_objects | 1 | none | 3 | acc_norm | 0.4200 | ± 0.0313 |
| - leaderboard_bbh_logical_deduction_three_objects | 1 | none | 3 | acc_norm | 0.4040 | ± 0.0311 |
| - leaderboard_bbh_movie_recommendation | 1 | none | 3 | acc_norm | 0.6880 | ± 0.0294 |
| - leaderboard_bbh_navigate | 1 | none | 3 | acc_norm | 0.6240 | ± 0.0307 |
| - leaderboard_bbh_object_counting | 1 | none | 3 | acc_norm | 0.4040 | ± 0.0311 |
| - leaderboard_bbh_penguins_in_a_table | 1 | none | 3 | acc_norm | 0.2945 | ± 0.0379 |
| - leaderboard_bbh_reasoning_about_colored_objects | 1 | none | 3 | acc_norm | 0.4120 | ± 0.0312 |
| - leaderboard_bbh_ruin_names | 1 | none | 3 | acc_norm | 0.4600 | ± 0.0316 |
| - leaderboard_bbh_salient_translation_error_detection | 1 | none | 3 | acc_norm | 0.3440 | ± 0.0301 |
| - leaderboard_bbh_snarks | 1 | none | 3 | acc_norm | 0.5112 | ± 0.0376 |
| - leaderboard_bbh_sports_understanding | 1 | none | 3 | acc_norm | 0.4880 | ± 0.0317 |
| - leaderboard_bbh_temporal_sequences | 1 | none | 3 | acc_norm | 0.2080 | ± 0.0257 |
| - leaderboard_bbh_tracking_shuffled_objects_five_objects | 1 | none | 3 | acc_norm | 0.1800 | ± 0.0243 |
| - leaderboard_bbh_tracking_shuffled_objects_seven_objects | 1 | none | 3 | acc_norm | 0.1040 | ± 0.0193 |
| - leaderboard_bbh_tracking_shuffled_objects_three_objects | 1 | none | 3 | acc_norm | 0.3400 | ± 0.0300 |
| - leaderboard_bbh_web_of_lies | 1 | none | 3 | acc_norm | 0.4880 | ± 0.0317 |
| leaderboard_gpqa | N/A | | | | | |
| - leaderboard_gpqa_diamond | 1 | none | 0 | acc_norm | 0.2879 | ± 0.0323 |
| - leaderboard_gpqa_extended | 1 | none | 0 | acc_norm | 0.3004 | ± 0.0196 |
| - leaderboard_gpqa_main | 1 | none | 0 | acc_norm | 0.3036 | ± 0.0217 |
| leaderboard_ifeval | 3 | none | 0 | inst_level_loose_acc | 0.4556 | ± N/A |
| | | none | 0 | inst_level_strict_acc | 0.4400 | ± N/A |
| | | none | 0 | prompt_level_loose_acc | 0.3087 | ± 0.0199 |
| | | none | 0 | prompt_level_strict_acc | 0.2957 | ± 0.0196 |
| leaderboard_math_hard | N/A | | | | | |
| - leaderboard_math_algebra_hard | 2 | none | 4 | exact_match | 0.4821 | ± 0.0286 |
| - leaderboard_math_counting_and_prob_hard | 2 | none | 4 | exact_match | 0.2033 | ± 0.0364 |
| - leaderboard_math_geometry_hard | 2 | none | 4 | exact_match | 0.2197 | ± 0.0362 |
| - leaderboard_math_intermediate_algebra_hard | 2 | none | 4 | exact_match | 0.0750 | ± 0.0158 |
| - leaderboard_math_num_theory_hard | 2 | none | 4 | exact_match | 0.4026 | ± 0.0396 |
| - leaderboard_math_prealgebra_hard | 2 | none | 4 | exact_match | 0.4508 | ± 0.0359 |
| - leaderboard_math_precalculus_hard | 2 | none | 4 | exact_match | 0.0963 | ± 0.0255 |
| leaderboard_mmlu_pro | 0.1 | none | 5 | acc | 0.2741 | ± 0.0041 |
| leaderboard_musr | N/A | | | | | |
| - leaderboard_musr_murder_mysteries | 1 | none | 0 | acc_norm | 0.5200 | ± 0.0317 |
| - leaderboard_musr_object_placements | 1 | none | 0 | acc_norm | 0.3086 | ± 0.0289 |
| - leaderboard_musr_team_allocation | 1 | none | 0 | acc_norm | 0.3120 | ± 0.0294 |
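
For anyone who wants to check the "wins more tests" comparison, here is a rough sketch of how two result files could be compared task by task. It assumes the lm_eval results JSON has a top-level `"results"` object whose per-task entries use keys such as `"acc_norm,none"`; that layout, the metric preference order, and the function names are assumptions, so check them against your own files. Tasks that report none of the listed metrics (for example ifeval) are skipped, and differences smaller than the reported stderr should not be read as real wins.

```python
import json

# Assumed metric keys, in order of preference; adjust to the file at hand.
DEFAULT_METRICS = ("acc_norm,none", "exact_match,none", "acc,none")


def extract_scores(results: dict, metrics=DEFAULT_METRICS) -> dict:
    """Pick one numeric score per task; group rows without scores are skipped."""
    scores = {}
    for task, values in results.items():
        for metric in metrics:
            value = values.get(metric)
            if isinstance(value, (int, float)):
                scores[task] = float(value)
                break
    return scores


def load_scores(path: str, metrics=DEFAULT_METRICS) -> dict:
    """Read an lm_eval results file (assumed layout: {"results": {task: {...}}})."""
    with open(path) as handle:
        return extract_scores(json.load(handle)["results"], metrics)


def count_wins(scores_a: dict, scores_b: dict) -> tuple:
    """Return (wins for A, wins for B, ties) over the tasks both runs share."""
    wins_a = wins_b = ties = 0
    for task in scores_a.keys() & scores_b.keys():
        if scores_a[task] > scores_b[task]:
            wins_a += 1
        elif scores_a[task] < scores_b[task]:
            wins_b += 1
        else:
            ties += 1
    return wins_a, wins_b, ties
```

Typical use would be `count_wins(load_scores("llama.json"), load_scores("r1_distill.json"))`, where both file names are placeholders for two downloaded result files.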
replied to their post 28 days ago

I've rerun hellaswag with the suggested config, but the results haven't improved:

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
| --- | --- | --- | --- | --- | --- | --- |
| hellaswag | 1 | none | 0 | acc | 0.5559 | ± 0.0050 |
| | | none | 0 | acc_norm | 0.7436 | ± 0.0044 |

command:
accelerate launch -m lm_eval --model hf --model_args pretrained=deepseek-ai/DeepSeek-R1-Distill-Llama-8B,parallelize=True,dtype="float16" --tasks hellaswag --batch_size auto:4 --log_samples --output_path eval_results --gen_kwargs temperature=0.6,top_p=0.95,generate_until=64,do_sample=True

replied to their post 28 days ago

I missed this suggested configuration from the model card:
"For benchmarks requiring sampling, we use a temperature of $0.6$, a top-p value of $0.95$, and generate 64 responses per query to estimate pass@1."

Thanks to @shb777 and @bin110 for pointing this out!
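
The quoted sentence describes sampling 64 responses per query to estimate pass@1. A small sketch of how such an estimate is commonly computed, assuming the model card means the standard unbiased pass@k estimator (the card does not say which formula it uses): with n samples of which c are correct, pass@k = 1 - C(n-c, k) / C(n, k), which for k = 1 reduces to c / n.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int = 1) -> float:
    """Unbiased pass@k estimate from n sampled responses, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: any draw of k contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_counts, n: int, k: int = 1) -> float:
    """Average the per-query estimates over a benchmark."""
    counts = list(correct_counts)
    return sum(pass_at_k(n, c, k) for c in counts) / len(counts)
```

For example, a query with 16 correct answers out of 64 samples gives `pass_at_k(64, 16)` equal to 0.25.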

replied to their post 28 days ago
replied to their post 28 days ago
posted an update 29 days ago
I've run the Open LLM Leaderboard evaluations plus hellaswag on deepseek-ai/DeepSeek-R1-Distill-Llama-8B and compared the results with meta-llama/Llama-3.1-8B-Instruct. At first glance, R1 does not beat Llama overall.

If anyone wants to double-check, the results are posted here:
https://github.com/csabakecskemeti/lm_eval_results

Did I make a mistake, or is this model (at least this distilled version) not as good as, let alone better than, the competition?

I'll run the same on the Qwen 7B distilled version too.
posted an update about 1 month ago
reacted to mitkox's post with 🤗 about 2 months ago
Can it run DeepSeek V3 671B is the new 'can it run Doom'.

How minimalistic can I go with on-device AI and behemoth models? Here I'm running DeepSeek V3 MoE on a single A6000 GPU.

Not great, not terrible, for this minimalistic setup. I love the Mixture of Experts architectures. Typically I'm running my core LLM distributed over the 4 GPUs.

Make sure you own your AI. AI in the cloud is not aligned with you; it's aligned with the company that owns it.
replied to mitkox's post about 2 months ago

Deepseek-V3-Base Q2_K

AMD Ryzen™ Threadripper™ 3970X × 64
ASUS ROG ZENITH II EXTREME ALPHA
256.0 GiB
NVIDIA GeForce RTX™ 3090 / NVIDIA GeForce RTX™ 3090 / NVIDIA GeForce RTX™ 4080

replied to singhsidhukuldeep's post about 2 months ago

It seems it's happening:
ChatGPT
I provided context that contains no information about whether Berlin is the capital of Germany, yet my 'fake' source was cited.
Screenshot 2025-01-08 at 3.26.35 PM.png