Field notes from shipping AI in production.
Long essays from the Cuecoder lab — only when there's something worth saying. No SEO bait. No recycled announcements.
Evals are the product. Everything else is a side effect.
After two years of shipping LLM features in production, the only artifact worth trusting is a well-designed eval suite. Here is how Cuecoder structures one.
Stop tuning prompts. Start tuning context.
Prompt engineering plateaus fast. The real lever is what you put in front of the model — retrieval, structured tools, and dynamic memory.
Why the gateway got rewritten in Rust
Two years of Python latency tax, a runaway autoscaler, and a 3am incident later — the case for moving the hot path off the GIL.
Building agents without frameworks
LangGraph, CrewAI, AutoGen: all make good demos, but none delivered what production needed. Here's the 200-line loop quietly serving production traffic.
Building a SaaS in public is a forcing function for taste
Public commits, public revenue, public incident reports. Why exposing the work changes what gets shipped.
The 1M context window is a trap (for now)
Bigger context windows are exciting, but useless without a retrieval strategy. A pragmatic guide to choosing what goes in them.