Dev
GitHub repos gaining traction - what high-signal users are starring and what's climbing the board, captured daily and enriched from GitHub. Raw material for spotting new tech and patterns worth building on.
653 repos tracked
153 surfaced this week
141 created < 30d
Top language: Python
49 repos
- A feed-forward 3D foundation model for reconstructing scenes from streaming data
- Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
- scikit-learn: machine learning in Python
- [MLsys2026]: RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast, accurate, and 100% private RAG application on your personal device.
- Distributed High-Performance Symbolic Regression in Julia
- Interactive error analysis skill for AI agents. Studies LLM trace datasets, builds a review UI, monitors annotations, categorizes failure modes, proposes new samples.
- Faker is a pure Elixir library for generating fake data.
- An open source multi-tool for exploring and publishing data
- High-Performance Symbolic Regression in Python and Julia
- A modern replacement for Redis and Memcached
- Extract structured data from documents quickly and accurately.
- Community extensions for TabPFN - the foundation model for tabular data. Built with TabPFN! 🤗
- NVIDIA Cosmos is an open platform of world models, datasets, and tools that enables developers to build Physical AI for robots, autonomous vehicles, smart infrastructure, and more.
- Python CLI utility and library for manipulating SQLite databases
- ABC: Scalable Behavior Cloning with Open Data, Training, and Evaluation
- Advanced permission management for Datasette
- Refiner by Macrodata Labs, a data processing framework for Machine Learning large scale datasets
- A public-domain dataset of prompts and scenarios for evaluating compliance with the OpenAI Model Spec.
- An LLM-powered agent for Datasette
- Apps that live inside Datasette
- Repo to the paper "Lie Point Symmetry Data Augmentation for Neural PDE Solvers"
- The official project website for Datasette
- Measuring frontier coding agents on original, long-horizon engineering tasks
- HTTP SQLite scale-to-zero database on the edge built on Cloudflare Durable Objects.
- A system for agentic LLM-powered data processing and ETL
- Nymeria: a massive collection of multimodal egocentric daily motion in the wild
- The new data-free filesystem!
- Data and software for building the ACL Anthology.
- [ICML '26] Code repo for the paper entitled "Convex Dataset Valuation for Post-Training" at ICML 2026.
- Data Journalist Agent: Transforming Data into Verifiable Multimodal Story
- Machine learning with dataframes
- Datasette plugin for authenticating access using API tokens
- Lens is a 3.8B-parameter text-to-image diffusion model that achieves quality competitive with and in several cases surpassing models like FLUX and SD3, while requiring significantly less training compute. Key ideas include maximizing data information density per batch and accelerating convergence.
- 🌌 A very small graph database in Zig
- O*NET pipeline + occupation-linked dataset for all 174 Seinfeld episodes - reproducibility repo for 'A show about nothing is a show about the AGI economy' (newyorkreviewofjobs.com)
- Search infrastructure for AI
- egraphs + datalog!
- Low-Cost LLM-Powered Data Processing with Theoretical Guarantees
- LLM Architecture Gallery source data
- A library of extension and helper modules for Python's data analysis and machine learning libraries.
- GDM Science Skills to speed up agentic scientific workflows with better grounding and higher token efficiency. Integrate insights from AlphaGenome, AFDB, UniProt and 30+ other databases and tools.
- Data Science Skills for AI agents like Claude Code
- ⚡ TabPFN: Foundation Model for Tabular Data ⚡
- Add secure sharing, document tracking, and workflow automation on top of your storage (Nextcloud, Google Drive, Dropbox). Self-hosted DocSend alternative.
- ocrscout is a toolkit for frontier OCR models that allows you to run, evaluate and profile OCR models on your own data and compute infrastructure
- High-quality search for AI-native applications.