memlocal
your ai's memory belongs on your device.
your application
agent, assistant, or app runtime
llm extraction
sensory buffer
ring buffer · ttl 5s · capacity 100 · noise filter
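a minimal sketch of that buffer policy in rust, assuming a hypothetical `SensoryBuffer` type (the name and methods are illustrative, not memlocal's api): expired entries are dropped on each push, and the oldest entry is evicted once capacity is reached.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// illustrative sensory buffer: a bounded ring with a per-entry ttl.
/// names and policy details are assumptions, not memlocal's actual api.
struct SensoryBuffer {
    entries: VecDeque<(Instant, String)>,
    capacity: usize,
    ttl: Duration,
}

impl SensoryBuffer {
    fn new(capacity: usize, ttl: Duration) -> Self {
        Self { entries: VecDeque::new(), capacity, ttl }
    }

    /// drop expired entries, then append; evict the oldest if full.
    fn push(&mut self, text: &str) {
        let now = Instant::now();
        self.entries.retain(|(t, _)| now.duration_since(*t) < self.ttl);
        if self.entries.len() == self.capacity {
            self.entries.pop_front(); // ring-buffer eviction
        }
        self.entries.push_back((now, text.to_string()));
    }

    fn len(&self) -> usize {
        self.entries.len()
    }
}

fn main() {
    let mut buf = SensoryBuffer::new(3, Duration::from_secs(5));
    for msg in ["a", "b", "c", "d"] {
        buf.push(msg);
    }
    // capacity 3: "a" was evicted when "d" arrived
    assert_eq!(buf.len(), 3);
    println!("buffered entries: {}", buf.len());
}
```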
context assembly
working memory
flat single-hop or sectioned multi-hop context
key facts → top evidence → raw excerpts → session context
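the sectioned layout above can be sketched as a function that concatenates the four sections in that fixed order; `assemble_context` and the section headers are assumptions for illustration, and empty sections are skipped (a flat single-hop context keeps only what it needs).

```rust
/// illustrative context assembly: concatenates sections in a fixed order.
/// section names follow the ordering above; the function is an assumption,
/// not memlocal's actual api.
fn assemble_context(
    key_facts: &[&str],
    top_evidence: &[&str],
    raw_excerpts: &[&str],
    session: &[&str],
) -> String {
    let sections = [
        ("key facts", key_facts),
        ("top evidence", top_evidence),
        ("raw excerpts", raw_excerpts),
        ("session context", session),
    ];
    let mut out = String::new();
    for (title, items) in sections {
        if items.is_empty() {
            continue; // a flat single-hop context may skip sections
        }
        out.push_str(&format!("## {title}\n"));
        for item in items {
            out.push_str(&format!("- {item}\n"));
        }
    }
    out
}

fn main() {
    let ctx = assemble_context(
        &["user prefers rust"],
        &["stated directly in conversation"],
        &[],
        &["current topic: memory systems"],
    );
    print!("{ctx}");
}
```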
store / retrieve
long-term memory
on a smartphone, you cannot justify running pinecone, elasticsearch, and neo4j just to give an agent memory. memlocal instead builds on CozoDB, which collapses graph, vector, full-text, and relational storage into a single embedded engine.
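to illustrate the "one engine" idea, here is a toy in-memory store whose single triple table serves relational lookups, graph traversal, and keyword matching at once. this is a deliberately tiny stand-in for the concept, not CozoDB's actual api.

```rust
/// toy stand-in for a single embedded engine: one dataset of
/// (subject, predicate, object) triples answers relational, graph,
/// and keyword queries. not cozodb's api, just the unification idea.
struct TinyStore {
    triples: Vec<(String, String, String)>,
}

impl TinyStore {
    fn new() -> Self {
        Self { triples: Vec::new() }
    }

    fn insert(&mut self, s: &str, p: &str, o: &str) {
        self.triples.push((s.into(), p.into(), o.into()));
    }

    /// relational lookup: all rows with a matching subject
    fn rows_for(&self, subject: &str) -> Vec<&(String, String, String)> {
        self.triples.iter().filter(|(s, _, _)| s == subject).collect()
    }

    /// one-hop graph traversal: objects reachable from a subject
    fn neighbors(&self, subject: &str) -> Vec<&str> {
        self.triples
            .iter()
            .filter(|(s, _, _)| s == subject)
            .map(|(_, _, o)| o.as_str())
            .collect()
    }

    /// substring match over all fields (stand-in for full-text search)
    fn keyword(&self, term: &str) -> Vec<&(String, String, String)> {
        self.triples
            .iter()
            .filter(|(s, p, o)| s.contains(term) || p.contains(term) || o.contains(term))
            .collect()
    }
}

fn main() {
    let mut db = TinyStore::new();
    db.insert("alice", "likes", "rust");
    db.insert("alice", "knows", "bob");
    assert_eq!(db.neighbors("alice").len(), 2);
    assert_eq!(db.keyword("rust").len(), 1);
    println!("rows for alice: {}", db.rows_for("alice").len());
}
```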
how a query is answered
- the raw query enters the retrieval pipeline.
- a query classifier determines whether the question is single-hop, multi-hop, temporal, or open-ended.
- six retrieval channels run in parallel: per-type semantic search, bm25 keyword matching, recursive graph traversal, triple fts, session-window expansion, and speaker-filtered search.
- all candidates are pooled, deduplicated, and reranked with a cross-encoder.
- the top-ranked items are assembled into a context block that adapts to query complexity.
- the context is injected into the llm prompt so most questions can be answered in a single call.
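the pooling, deduplication, and reranking steps above can be sketched as follows. the channel functions and the word-overlap scorer are stand-ins (a real cross-encoder scores query/candidate pairs jointly), and the channels run sequentially here rather than in parallel.

```rust
use std::collections::HashSet;

/// illustrative retrieval pipeline: run several channels, pool and
/// deduplicate candidates, then rerank by a scoring function.
/// channel contents and the scorer are stand-ins, not memlocal's internals.
fn retrieve(query: &str, channels: &[fn(&str) -> Vec<String>], top_k: usize) -> Vec<String> {
    // 1. run all channels and pool their candidates
    let mut pooled: Vec<String> = channels.iter().flat_map(|c| c(query)).collect();

    // 2. deduplicate while preserving first-seen order
    let mut seen = HashSet::new();
    pooled.retain(|item| seen.insert(item.clone()));

    // 3. rerank: stand-in scorer counts query words found in the candidate
    let score = |item: &String| -> usize {
        query.split_whitespace().filter(|w| item.contains(*w)).count()
    };
    pooled.sort_by_key(|item| std::cmp::Reverse(score(item)));
    pooled.truncate(top_k);
    pooled
}

// fake channels standing in for semantic search and bm25 keyword matching
fn semantic(_q: &str) -> Vec<String> {
    vec!["alice likes rust".into(), "bob likes go".into()]
}

fn keyword(_q: &str) -> Vec<String> {
    vec!["alice likes rust".into(), "rust is a language".into()]
}

fn main() {
    let channels: [fn(&str) -> Vec<String>; 2] = [semantic, keyword];
    let results = retrieve("who likes rust", &channels, 2);
    // the duplicate candidate appears once; the best match ranks first
    assert_eq!(results.len(), 2);
    println!("{results:?}");
}
```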
status
memlocal is under active development. the rust core compiles to shared libraries, with the flutter sdk available now and more native bindings planned.
open source under the apache license 2.0.
if you care about local-first ai, private-by-default memory, or just think agents should remember things without phoning home, follow along: the project is being built in the open.