QA is the Killer App for PI & Claude Coding Agents

2026-03-16 ai, tooling, qa, agents, cmux, pi, developer experience, pi agent

If 2025 was the year of the "harness," then 2026 is shaping up to be the year of "orchestration." Gas Town, FactoryFactory, Conductor...

All of them focus on how to build your application fast, but quality is left for you to assess after the polecats have finished their jobs around Gas Town.

Personally, I am not ready for this level of parallel development. Instead, I lean into using multiple agents to represent different perspectives, or personas, such as a potential customer doing due diligence or an SRE looking for performance issues.

I am working on a preconstruction application for a friend who works in commercial HVAC. He gives me great feedback, but we are busy people, and I want to show him the best version of the application. So why not clone his personality into an agent and give the agent access to a browser?

Fifteen minutes later, this persona, Mike Henderson, returned a detailed report that gave me clear action items for improvement. My initial tester had missed some of these items because he was focused on other parts of the application.

Setting it up yourself

At a basic level, all you need is a browser tool an agent can use (I like cmux, but sometimes use Chrome DevTools instead). Then, define a skill to perform the QA with details like how to log in, register, etc. Most of my projects leverage Lit, so the skill encodes some of the trickier parts of piercing the shadow DOM.

The key element, though, is a set of reference files for different personas that adjust the agent's focus when testing. I set up my skill so I can pass it a parameter to select a persona, like: qa-app customer
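Here is a minimal sketch of how a persona parameter could be resolved to a reference file. It assumes one Markdown file per persona in a `personas/` directory; the directory name, the function names, and the prompt wording are all hypothetical, not taken from my actual skill.

```python
from pathlib import Path

# Hypothetical location of the persona reference files (one .md file per persona).
PERSONA_DIR = Path("personas")


def list_personas(persona_dir: Path = PERSONA_DIR) -> list[str]:
    """Return the available persona names, derived from the file names."""
    return sorted(p.stem for p in persona_dir.glob("*.md"))


def build_qa_prompt(persona: str, persona_dir: Path = PERSONA_DIR) -> str:
    """Combine generic QA instructions with the selected persona's profile."""
    path = persona_dir / f"{persona}.md"
    if not path.is_file():
        known = ", ".join(list_personas(persona_dir)) or "none found"
        raise ValueError(f"Unknown persona {persona!r}; available: {known}")
    profile = path.read_text(encoding="utf-8").strip()
    return (
        "You are performing QA on the application as the persona below.\n"
        "Stay in character and report bugs and usability issues.\n\n"
        f"{profile}\n"
    )
```

Keeping each persona in its own file means adding a new perspective is just adding a file; the QA instructions themselves stay untouched.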

With this system in place, it is easy to launch headless processes via claude -p (ask Claude to do it for you).
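As a rough sketch, launching one headless run per persona could look like the following. The only Claude-specific piece is `claude -p`, which runs a prompt non-interactively; the persona list, the prompt wording, and the function names are assumptions for illustration, and the `claude` CLI must be on your PATH.

```python
import subprocess

# Hypothetical persona names; use whatever reference files your skill defines.
PERSONAS = ["customer", "sre"]


def build_command(persona: str, skill: str = "qa-app") -> list[str]:
    """Build the argument list for one headless QA run."""
    prompt = (
        f"Use the {skill} skill with the {persona} persona "
        "and write a QA report."
    )
    return ["claude", "-p", prompt]


def run_all(personas: list[str]) -> dict[str, str]:
    """Start every persona run first, then collect each report from stdout."""
    procs = {
        persona: subprocess.Popen(
            build_command(persona), stdout=subprocess.PIPE, text=True
        )
        for persona in personas
    }
    return {persona: proc.communicate()[0] for persona, proc in procs.items()}
```

Calling `run_all(PERSONAS)` returns a dictionary of persona name to report text. Because all processes are started before any output is read, the personas test in parallel rather than one after another.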

No merge conflicts to deal with, just bug reports and usability issues for your other agents to fix!

QA while you sleep

In February, I attended the Pragmatic Summit in San Francisco. The lead of OpenAI's Codex team devoted some time (but not enough!) to how they use agents for QA. He mentioned two key use cases:

On receipt of a bug report, attempting to reproduce it with an agent
Running agents during off hours to find new issues
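Both use cases can be sketched in a few lines. This is my own illustration, not how the Codex team does it: the 22:00 to 06:00 window, the function names, and the prompt wording are all assumptions.

```python
from datetime import datetime, time


def is_off_hours(
    now: datetime, start: time = time(22, 0), end: time = time(6, 0)
) -> bool:
    """Return True if `now` falls inside the off-hours window.

    Handles windows that wrap past midnight (start later than end).
    """
    current = now.time()
    if start <= end:
        return start <= current < end
    return current >= start or current < end


def repro_prompt(bug_report: str) -> str:
    """Wrap an incoming bug report in instructions to reproduce it."""
    return (
        "Try to reproduce the bug described below in the running application. "
        "List the exact steps you took and say whether the bug occurred.\n\n"
        f"Bug report:\n{bug_report.strip()}\n"
    )
```

A scheduler (cron, a CI job, or a simple loop) could check `is_off_hours(datetime.now())` before starting exploratory runs, while `repro_prompt` would be called whenever a new report arrives.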

Before going crazy with parallel development, try getting a fully automated pipeline running. Not only testing, but QA too. Consider other ideas, like having a totally automated "customer" use the application in production as a sort of canary.

Building fast is awesome, but if the end result is slop software, you will not win any customers.
