About evals.biz
Who writes this library and why.
evals.biz is a reference library for AI evaluation strategy, written for technical leaders — CTOs, architects, engineering managers, and senior engineers — who are building AI products and trying to understand how to maintain quality at scale.
The pieces here are not blog posts. They are intended to be durable reference material: frameworks and definitions that remain useful over time, not commentary on the latest model release. There are no publication dates, because nothing here is time-sensitive.
Why this exists
Most technical writing about AI evals is by and for practitioners: the engineers who implement evaluation infrastructure. That’s valuable, but it leaves a gap. The leaders who decide whether to invest in evals, how to structure the work, and how to explain it to their boards and teams often lack a concise, non-technical reference.
This library is an attempt to fill that gap. It explains the concepts without assuming you’re going to write the code yourself.
Who writes it
Grey Newell builds AI evaluation infrastructure and advises engineering teams on AI quality strategy. He is the author of the MIST stack, a suite of open-source tools for AI evaluation, inference routing, and observability.
All pieces on this site are written by Grey. There are no guest posts, no sponsored content, and no affiliate relationships with tool vendors. References to specific tools are made because they’re relevant, not because of any commercial arrangement.
Questions, disagreements, or requests for specific topics: greynewell.com.