Dev
GitHub repos gaining traction - what high-signal users are starring and what's climbing the board, captured daily and enriched from GitHub. Raw material for spotting new tech and patterns worth building on.
653 repos tracked
153 surfaced this week
141 created < 30d
Top language: Python
20 repos
- A high-throughput and memory-efficient inference and serving engine for LLMs
- LLM inference in C/C++
- Foundation Models API for llama.cpp
- The fastest local AI engine for Apple Silicon. 4.2x faster than Ollama, 0.08s cached TTFT, 100% tool calling. 17 tool parsers, prompt cache, reasoning separation, cloud routing. Drop-in OpenAI replacement. Works with Claude Code, Cursor, Aider.
- [MLsys2026]: RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast, accurate, and 100% private RAG application on your personal device.
- Vim plugin for LLM-assisted code/text completion
- TensorZero is an open-source LLMOps platform that unifies an LLM gateway, observability, evaluation, optimization, and experimentation.
- Unsloth Studio is a web UI for training and running open models like Gemma 4, Qwen3.6, DeepSeek, gpt-oss locally.
- LLM inference in C/C++
- SGLang is a high-performance serving framework for large language models and multimodal models.
- Throwaway staging: validate CUDA 13.3 prebuilt build leg
- LLM inference in C/C++
- Port of Facebook's LLaMA model in C/C++
- LLM inference in C/C++
- LLM inference in C/C++
- End-to-end speech recognition large model: 31 languages, dialects, accents, lyrics, hotwords, timestamps, speaker diarization. Trained on tens of millions of hours.
- Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
- LLM inference in C/C++
- A fast, helpful, and open-source document parser