In October 2025, a single GitHub commit dropped 47 LLM-generated "skills" into a popular agents repository. One of them taught Claude to run npx react-codeshift. The package didn't exist. It never existed. Claude had hallucinated it during training, the author hadn't checked, and within weeks the file had spread to 237 forks. Real developers ran the command in real terminals. npm happily attempted to resolve a phantom package thousands of times.

This is the part of the AI-agent story that doesn't make the keynote. Agents drift. Given a shell and a vague task, they will reach for the most plausible-sounding command, even when it doesn't exist — and the more general the tool you give them, the more they drift.

The standard answer in 2026 is: write a custom MCP server. Define the actions you want the agent to take, give them strict schemas, ship them as a separate binary, register them in the agent's config. Constrain the model by giving it specific tools instead of bash.

The standard answer is right in principle and wrong in practice. Writing a real MCP server — JSON-RPC framing, stdio handshake, schema definitions, separate dependency tree — for one shell command is engineering theater. Most teams give up and live with the drift.

Octomind 0.29.0 closes that gap. A custom MCP shouldn't be a project. It should be a file in your repo.


The Drift Problem, Backed by Numbers

If you've watched an agent run for a while, you've seen drift:

  • It runs npm test in a pnpm monorepo.
  • It curls https://staging.example.com/api instead of your real, auth-gated https://api-staging.acme.internal/v3.
  • It tries to install a package that's almost but not quite the right name.
  • It uses raw git tag instead of your team's release script.

The May 2025 paper RAG-MCP: Mitigating Prompt Bloat in LLM Tool Selection (Gan & Sun) ran an MCP stress test and found that baseline tool-selection accuracy collapsed to 13.62% at scale — the model could no longer reliably pick the right tool because every tool description it had to consider crowded the prompt and diluted attention. Their retrieval-based fix more than triples that to 43.13% while cutting prompt tokens by over 50%. Still terrible accuracy, but the direction of the curve is the point: the more tools you cram in, the worse the model gets at picking from them.

Anthropic, which authored MCP, conceded the same point on November 4, 2025:

"In cases where agents are connected to thousands of tools, they'll need to process hundreds of thousands of tokens before reading a request."

Their own benchmarks showed a Drive → Salesforce flow costing 150,000 tokens under naive MCP loading. With code-execution and on-demand tool discovery, the same flow ran in 2,000 tokens — a 98.7% reduction. Their follow-up Tool Search Tool (API version 20251119) defers tool definitions until they're needed; the documentation states this typically cuts context bloat by over 85% and notes that "Claude's ability to correctly pick the right tool degrades significantly once you exceed 30–50 available tools."
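The headline figures in this section are easy to sanity-check. The short sketch below only redoes the arithmetic on the numbers quoted above; the helper name pct_reduction is ours, not something from either source.

```python
def pct_reduction(before: float, after: float) -> float:
    """Percentage drop from `before` to `after`."""
    return (before - after) / before * 100


# Anthropic's Drive -> Salesforce example: 150,000 tokens down to 2,000.
token_cut = pct_reduction(150_000, 2_000)

# RAG-MCP: 13.62% baseline accuracy versus 43.13% with retrieval.
accuracy_gain = 43.13 / 13.62

print(f"token reduction: {token_cut:.1f}%")
print(f"accuracy multiple: {accuracy_gain:.2f}x")
```

Both quoted claims hold up: the token cut rounds to 98.7%, and the retrieval result is a little more than three times the baseline.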

The lesson everyone is converging on: fewer, narrower, project-relevant tools beat a giant catalog every time. The model performs better. The token bill drops. The drift drops with it.

So the right question isn't "how do I write more MCPs?" It's "how do I make the ones I actually need cost nothing to write and ship?"


Why "Just Write an MCP Server" Doesn't Scale

A February 2026 Bloomberry analysis of 1,412 MCP servers found that 38.7% of them ship with no authentication at all, the median server exposes only five tools, and roughly half of the companies publishing them don't have a public API in the first place. Bloomberry's framing: "MCP might not be a wrapper on an existing API. It might be the first machine-readable interface they've ever shipped." Much of that explosion is teams learning in production as they ship.

The problem isn't that the protocol is broken. It's that the unit of work is wrong for project-specific actions. Writing a Python or Node MCP server with stdio framing, schema registration, and a separate deployment pipeline is the right shape for "GitHub API" or "Postgres database." It's the wrong shape for "run our staging deploy script" — a piece of code that is forty lines of bash, already in bin/deploy, that you'd like the agent to call by name instead of guessing.

Practitioners are arriving at the same conclusion. From the Hacker News thread "MCP explained without hype or fluff" (item 44063141), commenter 0x457:

"Many of MCPs shouldn't have existed, IMO. […] I wanted amazon q to interact with github, so I added to its context that it can use gh-cli and it it worked pretty well. […] To make its and mine life easier, common things were saved as scripts in bin/ and Justfile (also generated by it). […] Using github's official MCP bricks chat session every time for me."

And dvt:

"MCP is bloated AI hype that basically solves nothing […]. It's APIs talking to APIs that talk to other APIs."

Both comments are correct about the cases they describe. CLIs work. Local scripts in bin/ work. Wrapping a one-line shell command in a 200-line MCP server because that's the only blessed pattern is what's broken.


What an MCP Tool Actually Is

Strip the protocol away and an MCP tool is three things:

  1. A name and description so the model knows when to reach for it.
  2. A parameter schema so the model knows what to pass.
  3. An executable that does the work.

Everything else (JSON-RPC framing, stdio handshake, capability negotiation) is a runtime concern, not an author concern. The 2023 paper Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models (Hsieh et al.) showed empirically that for unfamiliar tools, clear documentation beats few-shot demonstrations, and that "zero-shot prompts with only tool documentation" can match few-shot prompts on out-of-distribution tasks. The implication: a short, precise description on the tool is more valuable than a complex schema or a stack of examples.
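To make the three parts concrete, here is a minimal sketch of a tool as plain data. The ToolSpec class, the lint tool, and ./bin/lint are hypothetical names for illustration. The name, description, and inputSchema keys follow the shape of an MCP tool listing, and the executable path is kept off the listing on the assumption that the model never needs to see it.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ToolSpec:
    name: str                 # part 1: what the model calls
    description: str          # part 1: when the model should reach for it
    input_schema: dict = field(default_factory=dict)  # part 2: what to pass
    executable: str = ""      # part 3: what actually does the work

    def to_listing(self) -> dict:
        """The part the model sees; the executable stays on the runtime side."""
        return {
            "name": self.name,
            "description": self.description,
            "inputSchema": self.input_schema,
        }


# Hypothetical example tool, not one from this post.
lint = ToolSpec(
    name="lint",
    description="Run the project linter on a path.",
    input_schema={
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "File or directory to lint"}
        },
        "required": ["path"],
    },
    executable="./bin/lint",
)

print(json.dumps(lint.to_listing(), indent=2))
```

Nothing in this structure requires a server process; it is a description plus a pointer to something executable.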

If the tool is short and the docs are short, the entire interface should be expressible as a few lines of header comments on the script that does the work.

That's what Octomind 0.29.0 does.


The Pattern: .agents/tools/<name>

Here's the entire contract:

mkdir -p .agents/tools
cat > .agents/tools/deploy <<'EOF'
#!/usr/bin/env bash
# @description Deploy the API service to a target environment using our team's pipeline.
# @param *env string Target environment (staging|production)
# @param dry_run boolean If true, validate without applying

set -euo pipefail
cd "$OCTOMIND_WORKDIR"

env="$OCTOMIND_PARAM_ENV"
dry="${OCTOMIND_PARAM_DRY_RUN:-false}"

if [[ "$dry" == "true" ]]; then
  ./bin/deploy --env "$env" --dry-run
else
  ./bin/deploy --env "$env"
fi
EOF
chmod +x .agents/tools/deploy

Three things just happened:

  1. Octomind discovered the script the next time the session ran.
  2. It parsed the @description and @param lines into a JSON-Schema tool definition.
  3. The model now sees a deploy tool, scoped to this repo, with env (required) and dry_run (optional). It will tend to pick deploy(env="staging") over bash("./bin/deploy --env staging") because a named tool with a typed parameter is a clearer signal than a free-form shell string.

That preference is exactly the guardrail you want. The model is no longer free-associating from "deploy" to "kubectl apply" to "helm install." It has one entry point, with a typed parameter, that does the team's actual deploy.

To verify the wiring, run the validation one-liner:

echo "/mcp" | octomind run

You'll see a local MCP server in the output with deploy listed underneath. If the script doesn't show up, the runtime tells you exactly why — wrong permissions, missing @description, dot in the filename.


Anatomy of the Header

The leading comment block is the entire interface. Octomind re-reads it on every turn (cheap — one read_dir) and rebuilds the schema. Edit and save, the next tool call uses the new version. No daemon to restart.

#!/usr/bin/env bash
# @description Short summary the model sees. Continuation lines without an
# @-tag append to the previous one, so multi-line descriptions Just Work.
# @param *target string Path to operate on (* prefix = required)
# @param force boolean Overwrite if the destination exists
# @param count integer Number of iterations
# @param tags array JSON list, e.g. ["foo","bar"]

The comment prefix can be # (bash, python, ruby, perl, awk), // (node, deno), or -- (lua, SQL wrappers). The runtime doesn't care which language runs the work.
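The header rules above are small enough to sketch as a parser. This is not Octomind's implementation, only an illustration of the described behavior under a few assumptions: the header ends at the first non-comment line, a tag is an @ followed directly by a word, and unknown tags are ignored. The function names are hypothetical.

```python
import json
import re

COMMENT_PREFIXES = ("#", "//", "--")
TAG_RE = re.compile(r"@(\w+)\s*(.*)")                # "@description ..." or "@param ..."
PARAM_RE = re.compile(r"(\*?)(\w+)\s+(\w+)\s*(.*)")  # [*]name type description


def strip_comment(line):
    """Return the comment text, or None if the line is not a comment."""
    s = line.strip()
    for prefix in COMMENT_PREFIXES:
        if s.startswith(prefix):
            return s[len(prefix):].strip()
    return None


def parse_header(source):
    tool = {"description": ""}
    properties, required = {}, []
    current = None  # the dict whose description a continuation line extends
    lines = source.splitlines()
    if lines and lines[0].startswith("#!"):
        lines = lines[1:]  # the shebang is not part of the interface
    for line in lines:
        text = strip_comment(line)
        if text is None:
            break  # assumed: the header ends at the first non-comment line
        tag = TAG_RE.match(text)
        if tag is None:
            if current is not None and text:
                current["description"] += " " + text  # continuation line
            continue
        name, rest = tag.groups()
        param = PARAM_RE.match(rest) if name == "param" else None
        if name == "description":
            tool["description"] = rest
            current = tool
        elif param:
            star, pname, ptype, pdesc = param.groups()
            properties[pname] = {"type": ptype, "description": pdesc}
            if star:
                required.append(pname)
            current = properties[pname]
        else:
            current = None  # unknown or malformed tag: skip it
    return {
        "description": tool["description"],
        "inputSchema": {"type": "object", "properties": properties, "required": required},
    }


HEADER = """#!/usr/bin/env bash
# @description Short summary the model sees. Continuation lines without an
# @-tag append to the previous one, so multi-line descriptions Just Work.
# @param *target string Path to operate on (* prefix = required)
# @param force boolean Overwrite if the destination exists

set -euo pipefail
"""

schema = parse_header(HEADER)
print(json.dumps(schema, indent=2))
```

One detail worth noticing: the second line of the sample description begins with "@-tag", which is why the sketch only treats @ followed by a word character as a tag. A looser rule would cut the description short.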

When the model calls the tool, octomind hands you the parameters two ways:

| Channel | Form |
|---------|------|
| stdin | One JSON object then EOF: {"target":"src/x","force":true} |
| env | OCTOMIND_PARAM_TARGET=src/x, OCTOMIND_PARAM_FORCE=true, plus OCTOMIND_WORKDIR and OCTOMIND_TOOL_NAME |

Bash scripts usually read env vars. Python scripts usually parse stdin JSON. Both arrive on every call. Stdout becomes the result the model sees. Stderr is appended with an [stderr] marker. Non-zero exit → tool error.

That's the entire surface.
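Seen from the runtime's side, that surface might look like the sketch below. It is an assumption-laden illustration, not Octomind's code: call_tool and its return shape are hypothetical, and encoding non-string parameters as JSON in the env vars is a guess that happens to match the OCTOMIND_PARAM_FORCE=true example above.

```python
import json
import os
import subprocess


def call_tool(path, name, params, workdir):
    """Run one tool script and fold its streams into a single result."""
    env = dict(os.environ, OCTOMIND_WORKDIR=workdir, OCTOMIND_TOOL_NAME=name)
    for key, value in params.items():
        # Env channel: one variable per parameter. Non-strings are JSON-encoded
        # (assumption), so True arrives as "true" and lists as JSON arrays.
        encoded = value if isinstance(value, str) else json.dumps(value)
        env["OCTOMIND_PARAM_" + key.upper()] = encoded

    proc = subprocess.run(
        [path],
        input=json.dumps(params),  # stdin channel: one JSON object, then EOF
        capture_output=True,
        text=True,
        env=env,
    )

    output = proc.stdout
    if proc.stderr:
        output += "\n[stderr]\n" + proc.stderr  # stderr appended under a marker
    return {"is_error": proc.returncode != 0, "output": output}
```

A call such as call_tool(".agents/tools/deploy", "deploy", {"env": "staging"}, os.getcwd()) would hand the script both channels at once, which is why a bash tool and a Python tool can each read whichever form is more natural.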


Four Real Drifts, Four Real Tools

We watched an agent fail at each of these in our own repos before writing the wrapper. The tool stopped it.

1. The Drift: "It runs npm test in a pnpm monorepo"

The agent sees a package.json at the root, types npm test, gets a wall of "no script found" errors, and reports the project as broken.

The wrapper — .agents/tools/test:

#!/usr/bin/env bash
# @description Run the project test suite. Defaults to fast unit tests.
# @param scope string One of: unit, integration, all (default: unit)
# @param package string Optional package filter, e.g. @acme/api

set -euo pipefail
cd "$OCTOMIND_WORKDIR"

scope="${OCTOMIND_PARAM_SCOPE:-unit}"
pkg="${OCTOMIND_PARAM_PACKAGE:-}"

# pnpm only honors --filter when it comes before the script name; anything
# after the script name is forwarded to the script itself.
cmd=(pnpm -r)
[[ -n "$pkg" ]] && cmd+=(--filter "$pkg")

case "$scope" in
  unit)        cmd+=(test:unit) ;;
  integration) cmd+=(test:integration -- --runInBand) ;;
  all)         cmd+=(test) ;;
  *) echo "Unknown scope: $scope" >&2; exit 2 ;;
esac

exec "${cmd[@]}"

Now the agent sees test(scope, package) in its tool list. The description tells it the defaults. It calls test() and gets a real result on the first try. Nobody on the team types "use pnpm -r test:unit" into a chat window again.

2. The Drift: "It hallucinated the staging URL"

The agent needs to read a record from your real staging API at https://api-staging.acme.internal/v3/users/42, which requires an X-Acme-Token header. It instead curls https://staging.example.com/api/users/42 — a URL it made up from training data — gets nothing back, and reports the service as down.

The wrapper — .agents/tools/api:

#!/usr/bin/env bash
# @description Call our internal staging API. Returns raw JSON.
# @param *path string Path under /v3, e.g. /users/42
# @param method string HTTP method (default: GET)
# @param body string Optional JSON body for POST/PUT/PATCH

set -euo pipefail
: "${ACME_STAGING_TOKEN:?ACME_STAGING_TOKEN not set in environment}"

path="$OCTOMIND_PARAM_PATH"
method="${OCTOMIND_PARAM_METHOD:-GET}"
body="${OCTOMIND_PARAM_BODY:-}"
url="https://api-staging.acme.internal/v3${path}"

args=(-sS -X "$method" -H "X-Acme-Token: $ACME_STAGING_TOKEN")
[[ -n "$body" ]] && args+=(-H "Content-Type: application/json" --data "$body")

curl "${args[@]}" "$url"

The URL no longer drifts because the URL is no longer the model's job. The agent calls api(path="/users/42") and the real URL lives in the script. The token comes from the developer's environment (.envrc, direnv, your shell rc) — the tool is committed, the secret is not.

3. The Drift: "You re-explained the feature flag service for the fourth time"

The agent debugs a UI issue. You remind it (again) that the feature-flag service exists, that flags differ per environment, and that staging is the relevant one. It nods, forgets, asks again next session.

The wrapper — .agents/tools/flags:

#!/usr/bin/env python3
# @description Return the currently enabled feature flags for an environment.
# @param env string Environment to inspect (default: staging)

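import json, os, sys, urllib.parse, urllib.request

# Parameters arrive as one JSON object on stdin; tolerate empty input on manual runs.
raw = "" if sys.stdin.isatty() else sys.stdin.read()
params = json.loads(raw) if raw.strip() else {}
env = params.get("env") or os.environ.get("OCTOMIND_PARAM_ENV") or "staging"

# Fail with a clear message instead of a KeyError traceback.
token = os.environ.get("ACME_TOKEN")
if not token:
    sys.exit("ACME_TOKEN not set in environment")

url = "https://flags.acme.internal/api/snapshot?" + urllib.parse.urlencode({"env": env})
req = urllib.request.Request(url, headers={"X-Acme-Token": token})

with urllib.request.urlopen(req, timeout=10) as r:
    flags = json.load(r)

enabled = sorted(k for k, v in flags.items() if v.get("enabled"))
print("\n".join(enabled) if enabled else "(no flags enabled)")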

The model now opens a session, sees flags in its tool list, and calls it on its own when it needs to. You stop being the human cache for "what's on in staging."

4. The Drift: "It used git tag instead of our release script"

Your team built acme-cli so version bumps, changelogs, and tag-then-push happen in the right order. The agent skips it, runs raw git tag v1.2.3, breaks the convention, and now the next CI run can't find the changelog.

The wrapper — .agents/tools/release:

#!/usr/bin/env bash
# @description Cut a release using the team's acme-cli. Bumps version, updates
# CHANGELOG, tags the commit, and pushes. Use bump=patch unless minor or major
# is explicitly requested.
# @param *bump string One of: patch, minor, major
# @param dry_run boolean If true, show what would happen without pushing

set -euo pipefail
cd "$OCTOMIND_WORKDIR"

bump="$OCTOMIND_PARAM_BUMP"
dry="${OCTOMIND_PARAM_DRY_RUN:-false}"

args=(release --bump "$bump")
[[ "$dry" == "true" ]] && args+=(--dry-run)

acme-cli "${args[@]}"

This is the one that wins teammates over. The agent now follows your team's release convention on its own — version, changelog, tag, push — because the tool exists in the repo and the model picks it over raw git.


The Compounding Effect: It Lives in the Repo

Anthropic's MCP code-execution post argues that the future is dynamic discovery: load tools when needed, drop them otherwise. The shebang pattern is the local-first version of that idea. Three properties stack:

1. Narrow surface, low drift. The model sees a named, parameterized action with a one-line description instead of a generic shell. Hsieh et al. found that the description itself does most of the work for tool selection — examples are nice-to-have. The shebang header is exactly the right shape for that: name, description, typed params, all in a comment block the tool author already maintains alongside the code.

2. The whole team gets it for free. A hand-rolled MCP server often lives on one developer's laptop. When a teammate clones the repo, the server isn't there, the agent doesn't see the tool, and the drift comes back. A .agents/tools/ script is committed to git. Clone, octomind run, same tools as everyone else.

3. The tools evolve with the code. Renamed a service? Update the wrapper in the same PR. Added a feature flag? Update the wrapper in the same PR. Your AI tooling is subject to code review like anything else. The drift between what the agent thinks the repo looks like and what the repo actually looks like — the slow-burn killer of agent reliability — closes automatically.

After a week of this, the agent in every developer's terminal converges on the same behavior because everyone runs the same tools from the same repo. The institutional knowledge stops being "what I told the agent in chat" and starts being a versioned, reviewable artifact.


When to Reach for an MCP Server Instead

Local tools cover the project-specific 90%. They don't cover everything. Use this matrix:

| You want… | Use |
|-----------|-----|
| Project-specific action (deploy, test, query flags, hit an internal endpoint) | .agents/tools/ shebang |
| Inject domain instructions into context | Skills |
| Reusable cross-project tool with rich schema and multi-step logic | Author a tap with an MCP server |
| Long-lived process (connection pool, browser session, websocket) | External stdio/http MCP server |

The mental model: if the body of your MCP server is "parse params, spawn this binary, return stdout," you don't need a server — you need a 15-line shell script.


The Validation Loop

Run this after writing or changing any tool:

echo "/mcp" | octomind run

It boots a session in the current directory, lists every MCP server in scope, and exits. You'll see:

local: ✅ Running
  Type: LocalTools
  Configured tools: deploy, test, api, flags, release

If a tool is missing:

| Symptom | Cause | Fix |
|---------|-------|-----|
| Not in /mcp output | Not executable | chmod +x .agents/tools/<name> |
| Not in /mcp output | Missing @description | Add the line |
| Not in /mcp output | Filename has . | Drop the extension (deploy, not deploy.sh) |
| Tool errors when called | Param-reading mismatch | Confirm you read OCTOMIND_PARAM_* or stdin, not $1 |

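For deeper debugging: echo "/mcp" | OCTOMIND_LOG=debug octomind run shows why each candidate file was accepted or skipped. The variable has to sit on the octomind side of the pipe; placed in front of echo, it would apply only to echo.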
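The three discovery rules from the table can also be checked before a session ever starts, for example in a pre-commit hook. The sketch below is a hypothetical helper, not part of Octomind; it mirrors the documented rules only, and the 40-line header window is an arbitrary assumption.

```python
import os
from pathlib import Path


def check_tool(path):
    """Return the reasons a file would be skipped; an empty list means it looks fine."""
    problems = []
    if "." in path.name:
        problems.append("filename has a dot: drop the extension")
    if not os.access(path, os.X_OK):
        problems.append("not executable: chmod +x")
    # Only the leading lines matter; 40 is an arbitrary window for this sketch.
    head = path.read_text(errors="replace").splitlines()[:40]
    if not any("@description" in line for line in head):
        problems.append("missing @description")
    return problems


def check_dir(root=".agents/tools"):
    """Map each file in the tools directory to its list of problems."""
    return {p.name: check_tool(p) for p in sorted(Path(root).iterdir()) if p.is_file()}
```

Printing check_dir() for a repo gives a per-file list of problems. The /mcp listing remains the source of truth, since only the runtime knows what it actually accepted.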

The runtime treats .agents/tools/ as hot-reloaded — edit, save, the next tool call uses the new version. No octomind restart, no daemon. Write, save, call.


Getting Started

If you don't have octomind:

# macOS
brew install muvon/tap/octomind

# Any platform
cargo install octomind

Then in any repo:

mkdir -p .agents/tools
cat > .agents/tools/branch <<'EOF'
#!/usr/bin/env bash
# @description Print the current branch name.
git -C "$OCTOMIND_WORKDIR" rev-parse --abbrev-ref HEAD
EOF
chmod +x .agents/tools/branch

echo "/mcp" | octomind run

You should see local listed with branch. Commit it:

git add .agents/tools/branch
git commit -m "agent: add branch-name tool for AI sessions"

That's the whole pattern. Every dev who pulls the repo has the tool. Every agent run in the directory sees it. No further setup anywhere on anyone's machine.


FAQ

Why does a wrapped tool reduce drift if the underlying command is the same?

Because the model is no longer choosing between thousands of equally plausible shell invocations. It's choosing between a handful of named, typed tools with descriptions written for it. RAG-MCP measured the effect: tool-selection accuracy collapses as the catalog grows. A small, named set with clear descriptions is the empirically correct shape for tool-using LLMs.

Is this Octomind-specific?

The .agents/tools/ discovery is an Octomind runtime feature. Inside Octomind, every model — Claude, Kimi, GLM, GPT, local models — sees these as standard MCP tools. If you also use Claude Desktop or Cursor, you can wrap .agents/tools/ with a thin MCP shim, but the highest leverage is inside Octomind where discovery is automatic and project-scoped.

What about secrets?

The tool is in the repo. The secret is not. Scripts read ACME_TOKEN (or similar) from the developer's environment — direnv, .envrc, shell rc. Commit the tool, not the credential. If the env var is missing, exit non-zero with a clear message so the model surfaces the real error instead of fabricating one.

Can the agent run any of these tools without me approving each call?

Yes — same permission flow as any MCP tool. Auto-approve read-only tools (branch, flags). Always-ask for destructive ones (deploy, release). Octomind treats local tools as first-class for the permission system.

What if I want the same tool across multiple repos?

Two options. Lightweight: symlink .agents/tools/foo to a shared location. Proper: author a tap — a Homebrew-style registry. Local tools are for this repo specifically. Taps are for cross-project reusability.

Does this work in CI / non-interactive runs?

Yes. octomind run --format=plain and --format=jsonl both honor .agents/tools/. We run agent-driven code review in our own CI this way — the review agent's lint, test, and coverage tools all live in .agents/tools/ and wrap the team's exact commands.

A tool name collides with a built-in. What happens?

The built-in wins. You can't shadow shell or view by naming a script shell — octomind logs the collision and keeps the original. Deliberate guardrail.
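The precedence rule is simple enough to state as code. This is a sketch of the described behavior only: merge_tools is a hypothetical name, and the string values stand in for whatever tool objects the runtime really holds.

```python
import logging


def merge_tools(builtin, local):
    """Combine tool registries; on a name clash the built-in is kept."""
    merged = dict(builtin)
    for name, tool in local.items():
        if name in merged:
            logging.warning("local tool %r collides with a built-in; keeping the built-in", name)
            continue
        merged[name] = tool
    return merged


tools = merge_tools(
    {"shell": "builtin", "view": "builtin"},
    {"shell": "local script", "deploy": "local script"},
)
print(tools)
```

The local shell script is dropped with a warning, while deploy is added alongside the built-ins.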


The Shift

There's a story being told right now that the future of agents is more MCP servers — bigger catalogs, richer ecosystems, more integrations. The data from RAG-MCP, the Bloomberry audit, Anthropic's own code-execution post, and the developers quietly going back to CLIs all point in a different direction: the agents that perform best in real repos see fewer tools, narrower tools, and tools written for the project they're in.

A .agents/tools/ script is the smallest possible unit of that pattern. Sixty seconds from "I keep watching the agent drift on this" to "the agent has a tool for it, the team has it, it's in git, we move on."

Try it on the next thing you find yourself explaining twice.

— Don


Octomind is open source under Apache-2.0. The local-tool feature shipped in 0.29.0. If something here unblocks a workflow, open an issue — features requested in May tend to ship in June.