Remote Agents and You

Over the past month, I have been trying out remote agents on a complex legacy codebase as a way to automate bug fixes for tickets that land in a "Triage" queue in Linear.

For our experiments, I hand-pick the tickets to send to our remote agent, Codegen, which integrates directly with Linear. Concurrently, I use Claude Code and the Linear MCP Server to run the same process on my local machine as a point of comparison.

Out of the six experiments so far, neither system has been able to complete a ticket without manual intervention. But I run into more problems when using the remote agent.

Remote Agents & the Sandbox

Remote Agents run in tidy, sandboxed environments with limited access to the outside world. Configuration is minimal (Codegen has solid documentation), and I expect it to keep improving as the tech matures. However, things start to get hairy when your codebase is... messy.

Consider an all-too-common scenario:

  • Fat Django models leading to overreliance on the DB in tests
  • Integration tests not properly marked (see the sketch after this list)
  • Dependencies on private, internal-only GitHub packages
  • Hundreds of files, some that contain thousands of lines of code
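On the unmarked integration tests, a small amount of pytest glue goes a long way. A minimal sketch, assuming pytest and a conventional tests/integration/ directory (both the hook and the layout are illustrative, not our actual setup):

# conftest.py: auto-mark integration tests so they can be excluded.
import pytest

def pytest_collection_modifyitems(config, items):
    for item in items:
        # Anything under tests/integration/ gets the marker, so an agent
        # (or CI) can run just the fast suite: pytest -m "not integration"
        if "tests/integration" in str(item.fspath):
            item.add_marker(pytest.mark.integration)

With something like this in place, the agent's instructions can simply say to run pytest -m "not integration" instead of explaining which tests secretly need a database.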

So what do you do with a codebase like this? Locally, you get by with Makefiles and docker compose. To you, it's really not all that bad! You know how to run the tests even if the codebase is old and dusty. Your remote agent, though, is looking at it with fresh eyes and is horrified by what it sees (sorry Devin).

What I need in this situation is a way to create a sandboxed environment that can run a docker compose script. Trying to prepare this codebase for remote agents is a non-starter, unfortunately.

Designing for Remote Agents

If anyone from Codegen or Devin (or another provider) is reading this, I haven't given up on you all yet. I am going to better align our development practices with what the remote agent needs to be successful. This means:

  • True unit tests with no external dependencies (the repository pattern is back!); see the sketch after this list
  • Simplified setup using a tool like mise, with one or two clearly defined tasks run as part of pre-commit
  • Remote access for external systems to our internal GitHub packages
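To make the first bullet concrete, here is a minimal sketch of the repository pattern (illustrative names, not our production code). The business logic depends only on an interface, and the test swaps in an in-memory fake, so no database, Docker daemon, or private package is needed:

# repository_pattern.py: a minimal sketch with illustrative names.
from dataclasses import dataclass
from typing import Protocol

@dataclass
class Ticket:
    id: int
    status: str

class TicketRepository(Protocol):
    def get(self, ticket_id: int) -> Ticket: ...
    def save(self, ticket: Ticket) -> None: ...

def close_ticket(repo: TicketRepository, ticket_id: int) -> Ticket:
    # Business logic only sees the interface, never the Django ORM.
    ticket = repo.get(ticket_id)
    ticket.status = "closed"
    repo.save(ticket)
    return ticket

class InMemoryTicketRepository:
    # Test double: satisfies TicketRepository with a plain dict.
    def __init__(self, tickets: dict[int, Ticket]):
        self.tickets = tickets

    def get(self, ticket_id: int) -> Ticket:
        return self.tickets[ticket_id]

    def save(self, ticket: Ticket) -> None:
        self.tickets[ticket.id] = ticket

def test_close_ticket():
    repo = InMemoryTicketRepository({1: Ticket(id=1, status="open")})
    assert close_ticket(repo, 1).status == "closed"

A sandboxed agent can run test_close_ticket as-is, which is exactly the fast feedback loop it needs.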

Honestly, these are all things any high-quality codebase should aim for regardless of AI. If an intern or new hire can't figure out how to get started in your codebase, a remote agent is going to struggle just as much if not more.

Helpful MCP Servers

Anthropic's Model Context Protocol has made Claude Code my daily driver (sorry Aider). If you are unfamiliar, MCP is a standardized interface that exposes a service as a set of tools for an MCP client (Claude Code, Cursor, Copilot) to call.

I won't dig into the details of exactly how MCP works, but instead I want to provide a quick list of MCP servers that can expand your coding agent's capabilities.

Zen

Link

Zen is awesome. It provides tools that let your agent get a second opinion from another model, whether to handle a large refactor or to iterate on an idea. I briefly dabbled with a similar idea of my own, then found Zen and promptly stopped development. Learning how to trigger Zen takes some getting used to, but I find myself using it a few times per coding session.

MCP Toolbox for Databases

Link

Talk to your local database or to various Google data products, like BigQuery. I don't use it much on my personal projects; however, it is essential at my day job for diagnosing data issues and understanding relationships between tables. Recently, I used it to help plan a refactor of an existing data model.

Context7

Link

Code documentation for many different libraries and frameworks, available as plaintext. It works pretty well, but the interface seems a bit rough around the edges (I think it's case-sensitive, and there is no search feature). The promise of this one is huge, and I look forward to seeing it develop.

Cloudflare

Link

Documentation for Cloudflare available as a tool. Essential when dealing with Cloudflare specifically. I am using it on a project to help migrate from Pages to Workers.

Enterprise Vibe Coding

A few days ago, the team over at Cloudflare released an OAuth library that was developed with extensive help from Claude. They went the extra step of publishing detailed information about the prompts used, and the commit history gives insight into the limitations of Claude, if not of all coding agents.

For the past three weeks, I have been waking up at 5AM to work on some projects before my son needs to get ready for school. My goal is to figure out what level of guardrails is necessary to prevent the code from devolving into a sloppy mess. I find myself restarting projects, sometimes after a single day, because the code has become unmaintainable.

These attempts were done on pure vibes, with nothing written down beforehand.

What if I broke things down into user stories and Architectural Decision Records (ADRs)? Would an Agent be better behaved if I treated it more like an employee and myself like a PM?

Experimenting with ADRs

My most recent experiment was to develop ADRs with Claude acting as a reviewer before implementing anything (check out the lightweight ADR Tools). My thoughts became clearer! Claude gave me actionable feedback that refined my desires into real requirements!

I excitedly had Claude implement the first few of these and ended up with a working Flutter prototype in minutes. We continued on and knocked out two more ADRs. However, it got stuck trying to resolve an issue with closing parentheses, and that's when I noticed that the widgets it had built were becoming massive and unwieldy.

Instead of creating a generic list component (yes, I am implementing a TODO-list-like app; we all must do it), it created a highly specific component tightly coupled to the domain. My ADRs had been written with technical requirements, but without specifying how the agent should accomplish the goal.

Claude passed QA, but failed the code review.

Next Experiment: More Process

My next experiment will be to break these ADRs down into much more narrowly focused tasks. Each task would capture my thoughts on how the work should be done (implement a ListComponent vs. create a list for managing a trip to the grocery store). If I can properly articulate my vision, Claude and I should stay in better alignment over the long term.

It will lead to more frequent code reviews, but honestly this is not an undesirable outcome. Ten minutes of review could be the difference between restarting a project and actually finishing it.

To start, I want to try purely describing my ideas. Failing that, I will move towards defining the interfaces and tests myself.
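To give a flavor of what defining the interfaces myself might look like, here is an illustrative sketch (in Python for brevity; the actual prototype is Flutter). The generic ListComponent knows nothing about groceries; the domain is injected by the caller, which is the shape I wanted Claude to produce:

# list_component.py: an illustrative sketch, not the Flutter code itself.
from dataclasses import dataclass, field
from typing import Callable, Generic, TypeVar

T = TypeVar("T")

@dataclass
class ListComponent(Generic[T]):
    # Reusable list: items and per-item rendering are supplied by the caller.
    items: list[T] = field(default_factory=list)
    render_item: Callable[[T], str] = str

    def add(self, item: T) -> None:
        self.items.append(item)

    def render(self) -> str:
        return "\n".join(self.render_item(i) for i in self.items)

# A grocery trip is just one instantiation, not a bespoke widget:
groceries = ListComponent[str](items=["milk", "eggs"])
groceries.add("coffee")
print(groceries.render())

The failure mode I hit was the inverse: something like a hypothetical GroceryTripWidget with the domain baked directly into the component.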

My overall velocity will decrease, but the result will be more maintainable. Or so I hope.

Reformed AI Skeptic

If you even casually browse HackerNews, you probably saw "My AI Skeptic Friends are all Nuts" and the ~~raging flame war~~ heated debate it triggered.

I love HN for all of the skeptics and naysayers; it keeps me grounded and constantly questioning unbridled enthusiasm for AI. However, I am becoming less skeptical by the day as I continue to improve my systems for working with agents.

My adoption curve went like this:

  • 2023: Dabble with ChatGPT to answer questions. Begin deprecating StackOverflow as a resource. Very little coding with AI.
  • 2024: Decide to try again with Copilot. Opening VSCode made me physically ill, so I stopped and went back to Neovim. StackOverflow completely forgotten in favor of Claude/ChatGPT.
  • Late 2024/2025: Cursor is all the rage. I decided to install it, Zed, & VSCode (again). Super impressed by Cursor especially, but the IDE leaves me wanting.
  • Mid 2025: Rediscover Aider and how to configure it. Immediately I notice a sea change in code quality.
  • Now: Daily driver is Aider but starting to use Claude Code more.

With the help of agents, I have prototyped more at work, shaped ideas, and started (& finished!) a number of small projects. Honestly, it has made the act of coding really exciting again.

If you aren't seeing the benefits, take some time to configure the agent via its specific rules configuration, and have the agent help you write it. Configure aggressive linters to hold the agent accountable to high standards. You may surprise yourself with how much better the experience is.

Now listening to: Virtual Dreams II: Ambient Explorations in the House & Techno Age

Shifting Left

Not So Common Thoughts produced a great post that succinctly traces the historical ways technology has shifted production away from skilled labor (people needing years of experience with a certain skill) towards something approaching a critic: someone with good taste and judgment. Brian Eno recognized how music was affected in exactly this way by the advent of the computer sequencer.

What does this mean for the software engineer in the world of Claude & ChatGPT? Are we all just "Ideas Guys", endlessly bothering our agents to "trust them - this idea is worth millions"?

Personally, I think we are going to see a return of strict documentation and requirements at every step of the process. The clarity of your vision will become paramount if your vibecode is not to degenerate into complete slop. Humans have the distinct advantage of being able to deal with ambiguity, thanks to our massive context windows and prior knowledge of our company: how the code has evolved, its values, and so on. New hires often lack this historical information and thus fall into the same trap as an agent.

New hires need a high level of clarity in their tickets to operate independently. They need to know how to evaluate their own output to ensure success. As of right now, AI agents are no different. The more ambiguity you allow, the more the software is going to drift in unexpected ways as each agent defines its own success criteria.

I wonder if the waterfall methodology, at least at the beginning, is going to make a comeback in the new AI world. Consider this:

  1. Using deep research, you produce a high-level product roadmap and technical architecture
  2. Each item in the roadmap becomes a PRD (Product Requirements Document) with strictly defined requirements and UI design
  3. Architect engineers scaffold out the services, function definitions, and tests to ensure a unified design for the system
  4. Project management agents build out the tickets and consider task sequencing using Kanban
  5. Coder agents pick up and complete tasks. Tests act as their evaluation criteria

Right now, I think a human must be involved between each step of the process, but that involvement seems to be shrinking rapidly. Maybe the company of the future focuses only on #1 and the rest is taken care of.

But it raises the question: did Brian Eno know what Music for Airports was going to sound like before he started?

Minimal Viable Blogging

On HackerNews today, there were two articles very relevant to my journey in building out my website.

  1. The Minimum Viable Blog
  2. Why can't html alone do includes?

I was heavily inspired by articles like #1 to start blogging (not this one in particular given the date), but I made it my mission to not reach for Django, Rails, Astro, or really anything outside of HTML. I did not cut my teeth by building webpages, so I wanted to use this opportunity to learn how to do as much as possible with just HTML and CSS.

It did not take me very long to run into the problems listed in article #2. After writing only one blog post, I wanted to refactor the common header and footer elements into their own templates.

Nope, you can't do that with just HTML.

However, I did find a solution using Caddy's built-in features as I detail in my earlier posts. I saved myself from Django or Rails, but I guess I did end up writing a bit of Golang.

¯\_(ツ)_/¯

Sorting files in reverse order

This one had me scratching my head for far too long. How do you render a list of posts in reverse order? I knew that, to make it easy, I should name files so they sort in order on the file system.

I thought all I had to do was reverse (listFiles "./posts/") in my Caddyfile. This worked just fine on my machine, but when deployed, my post order would be wildly incorrect.

What I ended up doing was: reverse (sortAlpha (listFiles "./posts/")) which leverages the sortAlpha string list function from Sprig.
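The underlying gotcha generalizes beyond Caddy: directory listings come back in whatever order the file system returns them, so reversing alone only looks correct when that order happens to be sorted already. A quick Python analogy of the bug and the fix (illustrative, not code from this site):

import os

# Bug: os.listdir (like Caddy's listFiles) makes no ordering guarantee,
# so this only *looks* newest-first if the names happen to come back sorted.
posts = list(reversed(os.listdir("./posts/")))

# Fix: sort explicitly, then reverse.
posts = list(reversed(sorted(os.listdir("./posts/"))))
print(posts)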

Moral of the story: Read the f___ing Manual

Styling code blocks with Chroma

I have always found it difficult to style code blocks properly when using just HTML. My last attempt used a bit of JavaScript to generate the styling on the fly, which led to terrible layout shifts on first load.

Caddy uses Chroma to tokenize code blocks within a markdown file, and Chroma provides the styles. A fully static approach is exactly what I was looking for, but I struggled to get the styling to work.

What I had to do was generate the styles via the Chroma CLI tool and add them to my CSS file:

brew install chroma
chroma --html-styles -s catppuccin-latte | pbcopy

A list of styles can be found here

I fully believe there is a way to generate the styles inline, but this worked just fine for my needs.

Anatomy of a Post

Before diving into the template that drives the feed itself, let's break down what a post looks like and the magic of the splitFrontMatter function.

The file driving a post is laid out like so:


---
title: My title
date: 2025-04-25
tags: helpful, tags
description: SEO description
---
{markdown content}

splitFrontMatter automatically parses out everything between the --- marks and exposes each key as an entry in the Meta dictionary. We can then use these keys in our HTML template.


{{ $parsed := splitFrontMatter $content }}
{{ if $parsed }}
  {{ $title := $parsed.Meta.title }}
  {{ $date := $parsed.Meta.date }}
  {{ $tags := $parsed.Meta.tags }}
  
  <section class="blog post-item">
    <h2><a href="/post/{{ $file }}">{{ $title }}</a></h2>
    <div class="post-meta mono">
      <time datetime="{{ $date }}">{{ $date | htmlDate }}</time>
      {{ if $tags }}
        <span class="post-tags">{{ $tags }}</span>
      {{ end }}
    </div>
    <div class="post-content">
      {{ markdown $parsed.Body }}
    </div>
  </section>
{{ end }}

I was surprised by how intuitive it was to get things set up once I discovered the proper functions. Caddy is written in Golang and comes with a variety of helpful packages for template rendering:

  • Caddy itself: Contains most of the functions for manipulating files
  • Sprig: Contains the functions to format strings & dates, reverse lists, and much more
  • Golang text/template package: Necessary reference for the syntax driving the templating engine

Hello World

After many futile attempts to start blogging, I decided to start small with link posts and a more freeform feed.

I became a fan of Simon Willison's blog and was inspired by his post on how to run a link blog.

Of course, before I could get started, I had to ask myself how I was going to manage this feed (a.k.a. procrastinating on the hard task of actually writing).

Since I already knew a bit about Caddy's templating system, I was aware it was possible to build the feed without reaching for Hugo, Gatsby, Python, etc. So with a bit of help from Aider (not as much help as I would have liked; it got most of the functions wrong), I finally have my own link feed.

My first series of posts will focus on the techniques I used to set it up, in case you want to do it yourself.