Few Token Do Trick: The Caveman Skill Topping GitHub Trending

The top trending repo on GitHub today is a Claude Code skill that makes your AI talk like a caveman. It’s a joke. It’s also a real answer to a real bill.

I went looking at what was trending on GitHub this morning and the number one project — nearly 3,000 stars in a single day — is called caveman. Its tagline: “why use many token when few token do trick.” It’s a plugin for Claude Code, Codex, Gemini, Cursor and thirty-odd other agents, and its entire premise is to make your coding assistant drop the filler and answer in terse, grammar-free caveman-speak. Same fix, a third of the words.

It reads like a shitpost. But I just wrote a post about the $12.8B AI-coding economy and how half of GitHub’s commits are now AI-touched — and caveman is a surprisingly sharp comment on that same trend. So let me take the joke seriously for a minute, because there’s a real engineering idea under the grunting.

The thing it’s actually attacking

Every reply your AI agent sends is billed by the token. And the default personality of these models is verbose — “Sure! I’d be happy to help. The issue you’re experiencing is most likely caused by…” Three sentences of throat-clearing before the one line you needed. You pay for every word of that preamble, on every reply, forever.

caveman’s before/after example is the whole pitch:

Normal (69 tokens): “The reason your React component is re-rendering is likely because you’re creating a new object reference on each render cycle… I’d recommend using useMemo to memoize the object.”
Caveman (19 tokens): “New object ref each render. Inline object prop = new ref = re-render. Wrap in useMemo.”

Identical technical content. Roughly a quarter of the tokens. Their benchmarks claim an average 65% output reduction across ten prompts, measured against the Claude API’s own token counts. Crucially, they keep code, commands, and error strings byte-for-byte exact. The compression is applied to the prose, never to the payload.
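The arithmetic behind those numbers is worth doing yourself. The sketch below makes two points. First, the React example alone is about a 72% cut. Second, an "average reduction" can mean two different things: you can average the per-prompt percentages, or you can pool the totals. The second and third prompt pairs are hypothetical placeholders, not caveman’s benchmark data. I don’t know which kind of average their 65% figure uses.

```python
def reduction(before: int, after: int) -> float:
    """Fraction of tokens removed, e.g. 0.65 for a 65% cut."""
    if before <= 0:
        raise ValueError("baseline token count must be positive")
    return (before - after) / before


def mean_of_ratios(pairs: list[tuple[int, int]]) -> float:
    """Average the per-prompt reductions (every prompt weighted equally)."""
    return sum(reduction(b, a) for b, a in pairs) / len(pairs)


def pooled_ratio(pairs: list[tuple[int, int]]) -> float:
    """Total tokens saved over total baseline tokens (long replies weigh more)."""
    total_before = sum(b for b, _ in pairs)
    total_after = sum(a for _, a in pairs)
    return reduction(total_before, total_after)


# The React example from the post: 69 tokens normal, 19 tokens caveman.
react = (69, 19)
print(f"React example: {reduction(*react):.1%} fewer tokens")  # 72.5%

# Hypothetical extra prompts, NOT caveman's benchmark data: one long answer
# that compresses poorly and one short answer that compresses well.
pairs = [react, (400, 240), (50, 10)]
print(f"mean of per-prompt cuts: {mean_of_ratios(pairs):.1%}")  # 64.2%
print(f"pooled cut:              {pooled_ratio(pairs):.1%}")    # 48.2%
```

Neither number is wrong, but they answer different questions. If every output token costs the same, the pooled figure is the one that tracks your bill.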

Why this is cleverer than it looks

The slogan they lead with is the real insight: “Caveman no make brain smaller. Caveman make mouth smaller.” It shrinks what the model says, not what it knows. The reasoning still happens at full fidelity inside the model; only the final rendering to text is compressed. That’s the right place to cut — you’re not asking it to think less, just to stop narrating.

A few details that show it’s more than a one-liner:

Levels. There’s lite, full, and ultra, plus a wenyan mode that renders in classical Chinese, which, only half-jokingly, packs the most meaning per token of any human language.
It keeps your language. Write Portuguese, it grunts back in compressed Portuguese. It compresses style, not meaning.
Compress the memory file too. /caveman-compress CLAUDE.md rewrites your project’s instruction file into terse form. That cuts input tokens on every later session, while code, URLs, and paths are preserved verbatim.
It measures itself. /caveman-stats reports real session token usage and lifetime dollar savings. The benchmarks live in the repo, committed and reproducible. For a meme project, that’s unusually honest.
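Claims like "code, URLs, and paths preserved verbatim" are easy to spot-check on your own files. The sketch below assumes you kept a copy of the original CLAUDE.md; the two strings stand in for the before and after files, and file handling is omitted. It uses rough regexes to pull out fenced code blocks, inline code spans, and URLs, then reports any that appear in the original but are missing from the compressed text. These patterns are my assumptions, not caveman’s own rules. Bare file paths outside backticks are not detected, and a substring match is only a smoke test.

```python
import re

# Rough patterns (assumptions, not caveman's own rules):
FENCE = re.compile(r"```.*?```", re.DOTALL)  # fenced code blocks
INLINE = re.compile(r"`[^`\n]+`")            # inline `code` spans
URL = re.compile(r"https?://[^\s)>\]`]+")    # http(s) URLs


def protected_spans(text: str) -> list[str]:
    """Collect spans that compression should leave byte-for-byte intact."""
    fences = FENCE.findall(text)
    # Drop fenced blocks first so their backticks aren't re-matched as inline code.
    rest = FENCE.sub("", text)
    return fences + INLINE.findall(rest) + URL.findall(rest)


def missing_after_compression(original: str, compressed: str) -> list[str]:
    """Protected spans from the original that no longer appear verbatim."""
    return [span for span in protected_spans(original) if span not in compressed]


original = (
    "Please always remember to run the full test suite before you commit anything.\n"
    "Use `npm run test:unit` for unit tests.\n"
    "Docs live at https://example.com/docs and config is in `src/config.ts`.\n"
    "```sh\nnpm ci\n```\n"
)
compressed = (
    "Run full tests before commit.\n"
    "Unit tests: `npm run test:unit`.\n"
    "Docs: https://example.com/docs. Config: `src/config.ts`.\n"
    "```sh\nnpm ci\n```\n"
)
# A deliberately broken compression that mangled a path.
bad = compressed.replace("`src/config.ts`", "`config.ts`")

print(missing_after_compression(original, compressed))  # []
print(missing_after_compression(original, bad))         # ['`src/config.ts`']
```

The prose lines are free to shrink. Only the protected spans have to match exactly, which mirrors the "compress the prose, never the payload" rule.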

Where I’d be careful

I like it, but a token-counter’s enthusiasm shouldn’t switch off the engineer’s skepticism:

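Output is the smaller half of the bill. Output tokens usually cost more per token than input tokens. But beyond what /caveman-compress trims from your instruction file, the savings are on output only: their own chart shows 0% saved on input. In agentic coding, the input side (your files, tool results, the whole context window fed back each turn) is usually far larger in volume and often the bigger cost. caveman helps, but it’s trimming the smaller line item.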
Terse isn’t always better for humans. When I’m debugging something subtle, the model’s “why” paragraph is sometimes the part that catches its own mistake. Compress the explanation away and you may also lose the reasoning where you’d have spotted the error. For grinding through boilerplate, grunt away. For a tricky design call, I want the full sentence.
Curl-pipe-bash installs. The one-line installer, which sets things up across 30+ agents, pipes a remote script straight into your shell. Convenient, but it’s exactly the kind of script I’d read before running. I’d apply the same instinct to any privileged installer.
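To see why output-only savings can be a small slice of the bill, here is a back-of-the-envelope model of one agentic session. Every number in it is an assumption for illustration:

- prices of $3 per million input tokens and $15 per million output tokens (check your provider’s current list prices),
- a 50-turn session that re-sends a 40k-token context each turn,
- 800 output tokens per turn,
- caveman’s claimed 65% output cut.

The model ignores prompt caching, which would discount the repeated input.

```python
from dataclasses import dataclass


@dataclass
class Session:
    turns: int
    input_tokens_per_turn: int      # whole context re-sent each turn (assumed)
    output_tokens_per_turn: int     # model's reply per turn (assumed)
    usd_per_m_input: float = 3.0    # assumed price, USD per million input tokens
    usd_per_m_output: float = 15.0  # assumed price, USD per million output tokens

    def cost(self, output_cut: float = 0.0) -> float:
        """Total USD for the session; output_cut=0.65 models replies 65% shorter."""
        inp = self.turns * self.input_tokens_per_turn * self.usd_per_m_input
        out = (self.turns * self.output_tokens_per_turn
               * (1 - output_cut) * self.usd_per_m_output)
        return (inp + out) / 1_000_000


s = Session(turns=50, input_tokens_per_turn=40_000, output_tokens_per_turn=800)
before, after = s.cost(), s.cost(output_cut=0.65)
print(f"before: ${before:.2f}  after: ${after:.2f}  saved: {1 - after / before:.1%}")
# before: $6.60  after: $6.21  saved: 5.9%
```

Under these assumptions, a 65% cut to output trims about 6% off the session, because output was only about 9% of the cost to begin with. Shift the mix toward long replies over a small context, such as chat-style Q&A, and the saving grows. That is where caveman pays off most.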

The takeaway

caveman is a joke with a real spreadsheet behind it. It won’t change how these tools reason, and it won’t touch your biggest cost line — but it’s a genuinely smart observation that the default verbosity of AI agents is a tax you can opt out of, and that you can do it without losing a byte of the technical answer. In a year where we’re all quietly watching our API bills climb, “same brain, smaller mouth” is a better engineering principle than it has any right to be. Few token do trick.

Syncster
