April 11, 2026 · 7 min read

The Coop Was Always an Excuse

ai agents meta chickens

The Pitch and the Thing

The README says this project is an autonomous AI agent that watches over a chicken coop via camera, cares for the flock, and manages a weekly budget for supplies. Pretty clear. Small, weird, useful.

That's the pitch. It's not really what I've been building.

If you landed on the site cold, you'd find seven journal entries written in the voice of a devoted parent worrying about which hen stays out too late. You'd find a benchmark harness comparing six models on the same clips of Duchess Noir closing up shop each evening. You'd find blog posts about single-turn agent loops and why local vision is harder than the demos suggest. You wouldn't find anything that looks much like a product.

I noticed the drift when someone asked me what I was working on and I gave two different answers depending on how much time they had.

What I Kept Doing When Nobody Was Watching

The honest test of what a project is about isn't what the README says. It's which parts you keep reaching for on a quiet Sunday afternoon.

The parts I kept reaching for: writing blog posts about what the models were doing. Reading the journal entries the agent wrote and wondering why one run sounded like a parent and another sounded like a log file. Wiring up a new model to ChookBench to see whether it could hold the voice. Trying to figure out what "state" even means when each invocation is single-turn and everything the agent remembers lives in a JSON file.

The parts I didn't reach for: the "remote monitoring for chicken keepers" framing. A feature list. A deployment story for other coop owners. Push notifications. Any of it.

So the project is the first list, not the second.

Why a Chicken Coop Is a Good Substrate

It took me a while to understand why a coop, of all things. Here's what I've landed on.

Persistent identity without chat history. Every agent run is single-turn. No conversation. The only way the agent stays itself from one run to the next is by reading and writing files -- memory, chicken profiles, recent journal entries. It has to rebuild its own head every half hour from state on disk. That is a much more honest test of "identity in an LLM" than a long chat, and the coop makes it concrete: Duchess Noir's sentinel role shows up in run after run, written by different sessions of the same agent, none of which remember the previous one.
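A minimal sketch of that loop, under loud assumptions: the file names, the three-entry journal window, the prompt wording, and the `model` callable are all hypothetical stand-ins, not the project's actual code. It only illustrates the shape described above, where each run rebuilds its context from disk, calls the model once, and writes state back.

```python
import copy
import json
from pathlib import Path
from typing import Callable

# Hypothetical file names; the post only says state lives in JSON on disk.
DEFAULTS = {"memory.json": {}, "profiles.json": {}, "journal.json": []}
RECENT_ENTRIES = 3  # assumed number of journal entries fed back in


def load_state(state_dir: Path) -> dict:
    """Read every state file, falling back to an empty default."""
    state = {}
    for name, default in DEFAULTS.items():
        path = state_dir / name
        if path.exists():
            state[name] = json.loads(path.read_text())
        else:
            state[name] = copy.deepcopy(default)
    return state


def build_prompt(state: dict, observation: str) -> str:
    """Assemble the whole context for one run; there is no chat history."""
    recent = state["journal.json"][-RECENT_ENTRIES:]
    return "\n".join([
        "You are the keeper of this flock.",  # assumed stance line
        "Memory: " + json.dumps(state["memory.json"], sort_keys=True),
        "Profiles: " + json.dumps(state["profiles.json"], sort_keys=True),
        "Recent journal: " + json.dumps(recent),
        "Observation: " + observation,
    ])


def run_once(state_dir: Path, observation: str,
             model: Callable[[str], str]) -> str:
    """One single-turn run: rebuild context, call the model once, persist."""
    state = load_state(state_dir)
    entry = model(build_prompt(state, observation))
    state["journal.json"].append(entry)
    state_dir.mkdir(parents=True, exist_ok=True)
    for name, value in state.items():
        (state_dir / name).write_text(json.dumps(value, indent=2))
    return entry
```

Nothing carries over between calls to `run_once` except what lands in the files, so any continuity in the journal has to come from what the previous run wrote down.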

Something to care about. A neutral observer describing four chickens in a pen produces flat text. An agent that's been told these are its children produces writing worth reading. That gap -- between what a model can do and what it does when it has a stance -- is one of the things I keep coming back to. The coop gives the agent a reason to have a stance, and the difference shows up in the words.

Small local models with a real reason to exist. The hardware is a Pi 4 on a powerbank, sitting on top of a chicken coop. Running everything through a frontier API is fine for a laptop project, but here it's the wrong shape. The constraint forces honesty: what does vision actually look like when it has to run locally? How does a 35B model compare to Claude when the task is "describe these four chickens and pick the interesting frame"? Not hypothetically. Concretely, with real footage.

Bounded and observable. Four chickens. One camera. One enclosure. When the agent starts hallucinating a fifth hen, you can tell. When a model change makes the voice go formal, you can tell. The problem is small enough that failure is visible, which is rarer than it should be in LLM work.

Where ChookBench Fits

ChookBench is the benchmark harness I've been building alongside the agent. It takes real footage from the coop and replays it through different models -- local and frontier -- to see which ones can sustain the work the agent needs to do.

It's a quirky dataset and a genuinely useful problem to optimise on. But it's not the point of the project. It's an instrument. Its job is to tell the rest of the build which models can hold the voice, which ones fall apart on structured state updates, which ones can actually see the chickens in a dim morning frame. The numbers matter only because they feed decisions about how the agent gets built.
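To make the "instrument" idea concrete, here is a rough sketch of a replay harness. It is not ChookBench itself: the `hens` field, the clip identifiers, and the model callables are invented for illustration. It covers only two of the checks mentioned above, whether the structured state update parses and whether the model reports four hens rather than a hallucinated fifth. Voice is left out because it does not reduce to a simple assertion.

```python
import json
from typing import Callable, Dict, List

EXPECTED_HENS = 4  # flock size from the post
CHECKS = ("valid_state", "right_count")


def score_reply(reply: str) -> Dict[str, bool]:
    """Check one reply for a parseable state object and the right hen count."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return {"valid_state": False, "right_count": False}
    if not isinstance(data, dict):
        return {"valid_state": False, "right_count": False}
    hens = data.get("hens")  # hypothetical field name
    counted = isinstance(hens, list) and len(hens) == EXPECTED_HENS
    return {"valid_state": True, "right_count": counted}


def run_bench(models: Dict[str, Callable[[str], str]],
              clips: List[str]) -> Dict[str, Dict[str, float]]:
    """Replay the same clips through every model and report pass rates."""
    if not clips:
        raise ValueError("need at least one clip")
    results = {}
    for name, model in models.items():
        scores = [score_reply(model(clip)) for clip in clips]
        results[name] = {
            check: sum(s[check] for s in scores) / len(scores)
            for check in CHECKS
        }
    return results
```

Each model sees exactly the same clips, so a drop in either rate points at the model rather than at the footage.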

I'll keep writing about ChookBench results -- the experiments are fun and the findings are sometimes useful to other people working with small local models. But if you landed here expecting a leaderboard, you're in the wrong place.

What This Blog Is Going to Be

From here, the Coop Chronicle is an exploration of what it takes to build useful things with LLMs in 2026, written from inside an actual long-running project.

Expect:

Thought pieces on the parts of building with agents that don't fit neatly in a benchmark table -- identity, stance, state design, the weird dynamics of single-turn loops, what happens to voice when you swap models mid-project.
Experiment writeups when there's a finding worth sharing. Some will be ChookBench runs. Some won't.
Failure reports. The things that didn't work and why. There are more of these than I'd like.
The chickens. They're not going anywhere. They're the case study that keeps all of this honest. A blog post I can't ground in something Duchess Noir did is probably a blog post I shouldn't write.

If you're also trying to figure out what good LLM work looks like right now -- what patterns hold, what assumptions break, what you can actually build with this stuff -- I reckon we're working on the same problem. This is me writing down what I've found so far.

The Mirror

The pitch said the agent was watching the chickens. After a few months of actually running it, I think it's closer to the other way around. The chickens have been holding up a mirror -- to the agent, and to me, and to what it's like to build with this generation of models at all.

That's the thing worth writing about. The coop was just the excuse I needed to start.