Octocode 0.15.0: Local-First by Default, Hybrid Search Everywhere

The first time you ran octocode index on a fresh machine, you got a config error. Required: a Voyage API key. Fine for production teams who already have one — annoying for everyone trying Octocode for the first time. And honestly, not the right default for a tool that should run anywhere.

0.15.0 fixes that. No API key required to get a useful, fast, production-quality index. Hybrid search and reranking are on by default. And the structural grep that ships with Octocode now recovers automatically from the wrong patterns LLMs typically generate — instead of returning zero results and a shrug.

42 commits since 0.14.1. Here's what matters.

No API Key Required — Local Models Are Now the Default

The default config.toml in 0.15.0 runs entirely offline:

[embedding]
code_model = "fastembed:jinaai/jina-embeddings-v2-base-code"   # 768-d, purpose-built for code
text_model = "fastembed:nomic-ai/nomic-embed-text-v1.5"        # 8K context, Apache-2.0

[search.reranker]
enabled = true
model   = "fastembed:jina-reranker-v2-base-multilingual"

[search.hybrid]
enabled = true
default_vector_weight  = 0.6
default_keyword_weight = 0.4

Models download to system cache on first use. After that, everything runs locally — indexing, search, reranking. No network calls, no rate limits, no monthly bill.

Voyage, Cohere, and Jina API options are still there, commented out in the template. If you already have API keys configured, they still work exactly as before. But the default experience is now: clone repo, run octocode index, get results. That's it.

One thing to set expectations on: the first octocode index after upgrading will pull down jina-embeddings-v2-base-code and nomic-embed-text-v1.5, a few hundred MB in total. It's a one-time download; after that the models are cached.

Hybrid Search Is On by Default — Here's What Changed

Octocode used to pick between dense vector search and BM25. 0.15.0 fuses them on every query using Weighted Reciprocal Rank Fusion, running inside LanceDB.

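The practical difference: dense vectors lose to BM25 on identifier-heavy queries like "find parse_remote". BM25 loses on paraphrased intent like "function that handles remote pull setup." RRF scores each result by its rank in each list rather than by its raw score, so the two score scales never have to be compared directly, and the weights decide how much each list contributes. Neither approach starves the other.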

The fusion weights are exposed:

# Code search project — identifiers dominate
default_vector_weight  = 0.3
default_keyword_weight = 0.7

# Long-form docs — semantic intent dominates
default_vector_weight  = 0.8
default_keyword_weight = 0.2
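
To make the weights concrete, here is a minimal sketch of weighted RRF in Python. It is not Octocode's implementation (that runs inside LanceDB); the function name, the file names, and the smoothing constant k = 60 (the value from the original RRF paper) are assumptions for illustration.

```python
from collections import defaultdict


def weighted_rrf(ranked_lists, weights, k=60):
    """Fuse ranked result lists with weighted Reciprocal Rank Fusion.

    ranked_lists: dict of list name -> document ids, best match first.
    weights:      dict of list name -> weight applied to that list.
    k:            smoothing constant (assumed; the post does not state it).
    """
    scores = defaultdict(float)
    for name, docs in ranked_lists.items():
        for rank, doc in enumerate(docs, start=1):
            # Only the position in the list matters, never the raw score.
            scores[doc] += weights[name] / (k + rank)
    # Highest fused score first; ties broken by name for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results for one query.
lists = {
    "vector": ["auth.rs", "remote.rs", "pull.rs"],
    "keyword": ["remote.rs", "parse.rs", "auth.rs"],
}

fused_default = weighted_rrf(lists, {"vector": 0.6, "keyword": 0.4})
fused_keyword = weighted_rrf(lists, {"vector": 0.3, "keyword": 0.7})
print(fused_default)
print(fused_keyword)
```

With these toy lists, remote.rs comes out on top under both weightings because it ranks well in both lists. Shifting weight toward keywords only reorders the tail: parse.rs, which appears only in the keyword list, overtakes pull.rs.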

One housekeeping note: the old per-field BM25 weights (keyword_path_weight, keyword_content_weight, etc.) are gone. FTS now runs over a single enriched content column, so those knobs no longer have anything to act on. Old configs with those keys will still deserialize fine (Serde ignores unknown fields by default), but the values aren't used. If you had anything custom there, tune default_vector_weight / default_keyword_weight instead.
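
If you want to check an old config for leftovers, something like this sketch would do it. It walks a parsed config (for example, the dict returned by Python's tomllib) and reports removed keys. Only the two key names spelled out above are listed, and placing them under [search.hybrid] in the sample is an assumption, not the documented layout.

```python
# Key names taken from the note above; the "etc." there means this set
# is probably incomplete.
REMOVED_KEYS = {"keyword_path_weight", "keyword_content_weight"}


def find_removed_keys(table, prefix=""):
    """Return dotted paths of removed keys found anywhere in a config dict."""
    found = []
    for key, value in table.items():
        path = f"{prefix}.{key}" if prefix else key
        if key in REMOVED_KEYS:
            found.append(path)
        if isinstance(value, dict):
            # Descend into nested tables such as [search.hybrid].
            found.extend(find_removed_keys(value, path))
    return found


# Hypothetical old config, already parsed into nested dicts.
old_config = {
    "search": {
        "hybrid": {
            "enabled": True,
            "keyword_path_weight": 2.0,
            "default_vector_weight": 0.6,
        }
    }
}

stale = find_removed_keys(old_config)
print(stale)
```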

The reranker running by default means the first search after upgrade downloads the fastembed reranker model. If you want to skip it: [search.reranker] enabled = false.

Structural Grep No Longer Returns Blank Results

LLMs get the node kind wrong a lot. Python uses function_definition, not function_declaration. Rust uses function_item. The LLM picks whichever looks natural, gets zero results, and there's no hint why.

Now when a pattern matches nothing, Octocode automatically tries progressively looser interpretations — and knows the right kind per language. function_declaration in Python becomes function_definition. func, fn, function all resolve correctly, regardless of what the LLM typed.
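
The per-language lookup might look roughly like this. It's a minimal sketch: the table covers only the two languages and the aliases named above, and resolve_kind is a hypothetical name, not Octocode's API.

```python
# Canonical tree-sitter node kind for a function, per language.
# Only the two languages mentioned in the post are filled in.
CANONICAL_FUNCTION_KIND = {
    "python": "function_definition",
    "rust": "function_item",
}

# Spellings an LLM might use when it means "a function".
FUNCTION_ALIASES = {
    "func", "fn", "function",
    "function_declaration", "function_definition", "function_item",
}


def resolve_kind(language, kind):
    """Map a guessed function kind to the one the language's grammar uses.

    Kinds that are not function aliases, and languages missing from the
    table, pass through unchanged.
    """
    canonical = CANONICAL_FUNCTION_KIND.get(language.lower())
    if canonical is not None and kind in FUNCTION_ALIASES:
        return canonical
    return kind


print(resolve_kind("python", "function_declaration"))
print(resolve_kind("rust", "fn"))
```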

When nothing works after all that, you get a useful error instead of silence:

"In Python use function_definition, not function_declaration"

The other fix: large matches — whole class bodies, big function blocks — used to dump everything and blow the context window. Now they show the first few lines and a summary:

src/foo.rs:42:  pub fn handle_request(req: Request) -> Result<Response> {
                    let user = authenticate(&req)?;
                    let payload = req.json()?;
                    if !validate(&payload) {
... (24 more lines)

Short matches pass through unchanged.
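
The truncation rule is simple enough to sketch. The cutoff of four visible lines is read off the example above; the real threshold and the function name are assumptions.

```python
def truncate_match(lines, max_lines=4):
    """Keep the first max_lines lines and summarize the rest in one line."""
    if len(lines) <= max_lines:
        # Short matches pass through unchanged.
        return list(lines)
    hidden = len(lines) - max_lines
    noun = "line" if hidden == 1 else "lines"
    return list(lines[:max_lines]) + [f"... ({hidden} more {noun})"]


# A hypothetical 28-line match collapses to 4 lines plus a summary.
match = [f"line {i}" for i in range(1, 29)]
for line in truncate_match(match):
    print(line)
```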

GraphRAG Now Tracks Inheritance and Implementation

GraphRAG extracts a call graph between files and functions — and from 0.15.0, it also extracts extends and implements relationships. Across 9 languages: C++, Go, Java, JavaScript, TypeScript, PHP, Python, Ruby, Rust.

Each function/class entry now carries:

extends — superclasses, parent traits/interfaces, Go struct embedding
implements — interface implementations, trait satisfactions, Rust impl Trait for Type

These become edges in the graph. Type names are normalized — std::collections::HashMap<K, V>, com.example.Foo, Foo<T> all collapse to the bare receiver name for cross-file resolution.

octocode graphrag get-relationships --node_id src/auth/middleware.rs

This command now returns inheritance and impl edges alongside the existing import and call edges. An AI agent can ask "who implements Validator?" or "which classes extend BaseHandler?" without scanning files manually.
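
Here is a rough idea of what collapsing a type name to its bare receiver means, using the three examples from the normalization note above. The real normalizer presumably handles more syntax per language; this sketch and the name bare_type_name are assumptions for illustration.

```python
import re


def bare_type_name(name):
    """Reduce a qualified or generic type name to its last plain segment."""
    # Drop generic arguments: everything from the first '<' onward.
    name = name.split("<", 1)[0]
    # Keep the final segment of a '::' or '.' separated path.
    return re.split(r"::|\.", name)[-1].strip()


for raw in ["std::collections::HashMap<K, V>", "com.example.Foo", "Foo<T>"]:
    print(raw, "->", bare_type_name(raw))
```

Once both sides of an edge are reduced this way, an "implements Foo" in one file and a "class Foo" in another can be matched by a plain string comparison.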

Export and Import Your Index

Moving a codebase to a new machine used to mean re-running the full embedder. With a large repo and a local model, that's not fast.

# On machine A
$ octocode export
Exported 142.30 MB
/Users/dk/Work/myproject/octocode-abc123-20260520-203708.tar.zst

# Transfer the file, then on machine B
$ cd /path/to/same/project
$ octocode import octocode-abc123-20260520-203708.tar.zst

Format is tar.zst (multi-threaded zstd level 3). The archive includes both the main index and branch overlays, excludes lock files and transient state, and carries a provenance marker so import validates before extracting. Import is atomic — extracts to a sibling temp dir and renames in one operation. No partial state on failure.

Export takes the same index lock as the indexer and MCP server, so concurrent operations wait rather than race.
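
The extract-then-rename pattern behind the atomic import can be sketched like this. It's a generic illustration, not Octocode's code: atomic_install and the extract callback are hypothetical, and the sketch assumes the target directory does not exist yet and sits on the same filesystem as the staging directory.

```python
import os
import shutil
import tempfile


def atomic_install(extract, target):
    """Fill a sibling staging dir via extract(), then rename it into place.

    If extract() or the rename fails, the staging dir is removed and the
    target path is left untouched.
    """
    parent = os.path.dirname(os.path.abspath(target))
    # Staging lives next to the target so the rename stays on one filesystem.
    staging = tempfile.mkdtemp(prefix=".import-", dir=parent)
    try:
        extract(staging)
        os.rename(staging, target)
    except BaseException:
        shutil.rmtree(staging, ignore_errors=True)
        raise


def fake_extract(directory):
    # Stand-in for unpacking the archive.
    with open(os.path.join(directory, "data.bin"), "w") as handle:
        handle.write("index contents")


with tempfile.TemporaryDirectory() as root:
    index_dir = os.path.join(root, "index")
    atomic_install(fake_extract, index_dir)
    print(os.listdir(index_dir))
```

The point of the pattern is that readers only ever see one of two states: no index at the target path, or a complete one.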

Everything Else

The headlines above cover the biggest changes. Here's the rest:

Branch delta coherence — branch overlays no longer silently apply old override mappings on top of a moved main index. A new MasterState reconciler checks before every branch index and re-indexes the main store when its commit doesn't match. When the overlay can't be trusted, search and GraphRAG skip it with a visible warning instead of returning stale results.

Methods carry their parent type — method chunks from inside an impl, class, or module now include the owning type's name in their symbol list. BM25 and dense retrieval both hit the method chunk directly when you search for Suppression.mark_set or Foo.bar. Rust trait impls (impl Trait for Type) surface both Trait and Type.

Better class chunking — large classes in Python, TypeScript, C++, and Ruby now index methods individually, matching how Rust/JS/PHP/Java already worked. No more "everything in one giant chunk" results on large class bodies.

Multi-query cap raised 5 → 10 — MAX_QUERIES doubled. Agents that batched into two passes can now send a single request.

C++20 module extensions — .cppm, .ixx, .mxx, .ccm, .cxxm, .cc, .cxx, .c++, .hxx are now recognized and indexed as code blocks.

Expanded text file coverage — yaml, toml, dockerfile, makefile, ini, conf, env, xml, html, sql, csv, tsv, log are now indexed as text blocks. JSON, CSS, and Bash keep their dedicated tree-sitter paths.

DB maintenance runs automatically — every indexing run ends with optimize_tables(). Repeated incremental indexing used to leave growing unindexed tails the query path had to brute-force scan. Search stays fast as the database grows.

Static binaries everywhere — release pipeline now produces statically-linked ONNX Runtime binaries for x86_64-unknown-linux-musl, aarch64-unknown-linux-musl, glibc Linux, macOS x86_64, and Windows x64. No more "missing libonnxruntime.so" on Alpine containers.

Smaller binary, faster embeddings — release profile switched from opt-level = 3 to opt-level = "z". The binary is meaningfully smaller — and counterintuitively, local embedding runs got faster too. Smaller code fits better in CPU instruction cache, which matters more for this workload than aggressive inlining does.

Cross-modality fusion for --mode all — the old fixed 1/3 split across code/text/docs is replaced with per-modality RRF. Each input list contributes by rank within its own list. Better mix that adapts to where the matches actually are, with no scale-incompatibility issues between embedding models.
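
The last item is easiest to see with numbers. In this sketch (hypothetical file names and scores, k = 60 assumed), the docs hits come from a model whose raw scores sit on a much larger scale than the code hits. Sorting by raw score buries the code results; fusing by rank interleaves them.

```python
def merge_by_raw_score(results):
    """Naive merge: pool every hit and sort by raw score."""
    pooled = [hit for hits in results.values() for hit in hits]
    return [doc for doc, _ in sorted(pooled, key=lambda h: -h[1])]


def merge_by_rank(results, k=60):
    """Per-modality RRF: each hit contributes by its rank in its own list."""
    fused = {}
    for hits in results.values():
        ranked = sorted(hits, key=lambda h: -h[1])
        for rank, (doc, _score) in enumerate(ranked, start=1):
            fused[doc] = fused.get(doc, 0.0) + 1.0 / (k + rank)
    # Ties broken by name so the output is deterministic.
    return sorted(fused, key=lambda d: (-fused[d], d))


# Hypothetical hits; the two modalities use incompatible score scales.
results = {
    "code": [("src/auth.rs", 0.91), ("src/db.rs", 0.88)],
    "docs": [("README.md", 14.2), ("CHANGELOG.md", 9.7)],
}

naive = merge_by_raw_score(results)
fused = merge_by_rank(results)
print(naive)
print(fused)
```

In the naive merge both docs hits outrank every code hit purely because of scale. With rank-based fusion, the top hit of each modality shares the top score, then the second hits follow.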

Upgrade Path

Back up your existing index first: octocode export

Upgrade to 0.15.0:

# Homebrew
brew upgrade muvon/tap/octocode

# Universal installer
curl -fsSL https://raw.githubusercontent.com/Muvon/octocode/master/install.sh | sh

Decide on models: accept the new local defaults (no config changes needed), or paste your previous [embedding] and [search.reranker] config back to keep Voyage/Cohere/Jina.
If you want a clean reindex (recommended, since new chunking won't apply to already-indexed files otherwise): octocode clear && octocode index
Optional: tune default_vector_weight / default_keyword_weight depending on whether your project is identifier-heavy or docs-heavy.

If you were relying on pure vector ranking for reproducibility against the 0.14.x baseline, set [search.hybrid] enabled = false. Existing indexes built with voyage-code-3 will need re-indexing if you switch to local models — if you want to keep Voyage, set the models back explicitly in config.toml.

Octocode is open source (Apache 2.0) at github.com/Muvon/octocode. It's what powers code search inside Octomind — the MCP server is how they talk.

