I’ve been hacking on an upgraded GUI / command-center for llama.cpp and I think it’s finally in a good place to share.
Repo: https://github.com/jbulger82/LLAMA_Hub
What it is
LlamaHub is a local-first UI built around llama-server with a focus on speed, tooling, and workflow (not just “a chat box”). I’ve been using it as my daily driver and I’m getting excellent inference speed with OSS 20B and other models.
Highlights
Fast local inference (llama.cpp first-class; OpenAI-compatible endpoints optional)
Local embeddings + RAG (you can keep retrieval local even if you use cloud models for reasoning)
MCP tools stack built-in (and it’s actually usable from the UI)
Multi-agent workflows (I tested a setup where a local model orchestrated multiple cloud Gemini instances while embeddings stayed local)
Lots of UI polish (themes, layout controls, etc. — I wanted it to feel “daily-usable”)
What I’m looking for
I’d really love help from the llama.cpp community to:
fix logic in the smart canvas
improve model launching
polish rough edges / improve UX
sanity-check architecture choices
tighten setup docs and defaults
test on more hardware + OS combos
I’m specifically hoping to keep this a community-driven collab (I’m not interested in someone repackaging it and selling it as a “premium AI app”).
My 3 best launch commands (what I'm using currently)
This one kicks ass in the OpenAI Codex CLI!
jeff@jeff-STGAUBRON:~$ /home/jeff/llama-b6962-bin-ubuntu-vulkan-x64/build/bin/llama-server -m "/home/jeff/Desktop/models/gpt-oss-20b-Q4_K_M.gguf" -ngl 99 -c 131072 --parallel 1 --host 0.0.0.0 --port 8082 -b 2056 -ub 256 -fa auto --temp 1.0 --top-p 0.9 --top-k 40 --repeat-penalty 1.1 --repeat-last-n 200 --cache-type-k q8_0 --cache-type-v q8_0 --mlock --threads 8 --threads-batch 8 --chat-template-kwargs '{"reasoning_effort": "high"}' --jinja
Not as strong in Codex, but KICKASS in LlamaHub (it uses a custom chat template I made, included in the repo). Insane function/tool calling!
jeff@jeff-STGAUBRON:~$ /home/jeff/llama-b6962-bin-ubuntu-vulkan-x64/build/bin/llama-server -m "/home/jeff/Desktop/models/gpt-oss-20b-gpt-5-codex-distill.F16.gguf" -ngl 99 -c 131072 --parallel 1 --host 0.0.0.0 --port 8082 -b 2056 -ub 256 -fa auto --temp 1.0 --top-p 1.0 --top-k 40 --repeat-penalty 1.0 --repeat-last-n 200 --cache-type-k q8_0 --cache-type-v q8_0 --mlock --threads 24 --threads-batch 12 --chat-template-file "/home/jeff/Desktop/models/francine_oss.jinja.txt" --jinja
The best embedding model I've used to date!
jeff@jeff-STGAUBRON:~/build-cpu/bin$ /home/jeff/build-cpu/bin/llama-server --embedding -m "/home/jeff/Desktop/models/qwen3-embedding-0.6b-q4_k_m.gguf" -c 8192 -b 512 --parallel 1 --host 0.0.0.0
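With --embedding enabled you can query it through the OpenAI-compatible embeddings route. Note the command above doesn't set --port, so this assumes llama-server's default of 8080; adjust if yours differs:

curl http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"input": "LlamaHub keeps retrieval local."}'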
If you try it and hit issues, open a GitHub issue with your OS, GPU, llama.cpp build, model, and command-line flags, and I'll do my best to reproduce.
— Jeff