How I Use AI to Build Software

AI has become part of my daily software workflow, but not in the way the demos usually make it look.

The useful version is less "one prompt builds the whole app" and more "a capable agent can hold a lot of repo context, make a focused change, run the checks, and hand me something I can actually review." That shift sounds small, but it changes the shape of the work. AI is not just a chat box I visit when I am stuck. It is now part of the development loop: idea, issue, branch, implementation, verification, review, and then a human decision about whether the work is good enough.

That loop is what I care about. The tools matter, and I keep a current list on my setup page, but the workflow matters more.

The stack I actually use

I use a few different AI tools because they occupy different parts of the system.

Claude Code is my terminal-based coding agent. I use it when I want to hand off a specific implementation task inside a repo: read the existing code, make the change, run the commands, and keep iterating until there is a clean diff.

Codex is another agent in the loop. I use it for implementation, code review, repo archaeology, browser verification, and second-pass thinking. Sometimes I want one agent to build and another to review. Sometimes I want Codex to work through a UI in the browser and tell me what actually rendered instead of what the code suggests should render.

Detent is the orchestration layer. It turns issues into isolated worktrees, gives each task a branch, and keeps the work tied to a board. That matters because agents are most useful when they have a clear lane. "Improve the site" is vague. "Write the article from this issue, follow the acceptance criteria, open a draft PR, and hand it back for review" is a job.

Custom Claude skills are how we package repeatable team workflows. Some work is not just code. We have church data migrations, Planning Center cleanup tasks, import verification, project routines, and recurring checks that need the same prompts, scripts, and guardrails every time. A skill turns "remember how we do this" into something closer to an operating procedure.

GitHub is the source of truth for the work itself. Issues define the request. Pull requests show the diff. Comments, checks, and review state decide what happens next.

Local verification is the anchor. Agents can be confident and wrong at the same time. The way out is boring and reliable: run the tests, run the build, inspect the UI, compare against the issue, and say plainly what was not verified.

The normal path from idea to review

Most useful work starts as a small, written issue. I try to make the issue concrete enough that an agent can act on it without inventing the product direction. A good issue says what should change, why it matters, and how we will know when it is done.

From there, Detent can pick up the issue and create a branch in an isolated worktree. That isolation is not glamorous, but it keeps the agent from trampling other work. It also makes the review easier because the diff belongs to one task.
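The isolation step can be sketched with plain `git worktree`. This is not Detent's implementation, only a minimal illustration of the idea under stated assumptions: the `issue-<number>-<slug>` branch convention, the sibling `<repo>-worktrees` directory, and both function names are hypothetical.

```python
import re
from pathlib import Path


def branch_name(issue_number: int, title: str) -> str:
    """Build a branch name from an issue (hypothetical naming convention)."""
    # Collapse every run of non-alphanumeric characters into a single dash.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"issue-{issue_number}-{slug}"


def worktree_command(repo_root: Path, issue_number: int, title: str) -> list[str]:
    """Return the git command that creates a new branch in its own worktree.

    The worktree is placed next to the repo rather than inside it, so the
    main checkout and any other in-progress task stay untouched.
    """
    branch = branch_name(issue_number, title)
    worktree_path = repo_root.parent / f"{repo_root.name}-worktrees" / branch
    return [
        "git", "-C", str(repo_root),
        "worktree", "add", "-b", branch, str(worktree_path),
    ]
```

The function only builds the command; passing the list to `subprocess.run` would execute it. One branch per worktree per issue is what keeps the resulting diff scoped to a single task.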

Once the agent starts, I want it to read before it writes. It should find the relevant files, inspect the surrounding patterns, and understand the commands from the repo itself. In a Next.js app, that might mean reading the article layout, the setup page, the package scripts, and the repo instructions before adding a page. In a Go service, it might mean reading the handler, the tests, and the domain package before changing behavior.

Then the agent implements the smallest thing that satisfies the issue. It runs focused checks while it works, broad checks before it calls the work ready, and a browser pass when the change affects the UI. Finally, it opens a draft pull request with a summary and test plan.
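As a rough sketch of that layering, the check plan can be derived from the files a change touches. Everything here is an assumption for illustration: the `npm` commands stand in for whatever the repo's own scripts define, and the suffix list and `.test.` filename convention are hypothetical.

```python
from dataclasses import dataclass

# Assumption: files with these suffixes affect what renders in the browser.
UI_SUFFIXES = (".tsx", ".jsx", ".css", ".mdx")


@dataclass
class CheckPlan:
    focused: list[str]        # fast checks to run while working
    broad: list[str]          # full checks before calling the work ready
    needs_browser_pass: bool  # True when the change can affect the UI


def plan_checks(changed_files: list[str]) -> CheckPlan:
    """Decide which checks a change needs, based on the files it touches."""
    test_files = [f for f in changed_files if ".test." in f]
    touches_ui = any(f.endswith(UI_SUFFIXES) for f in changed_files)
    return CheckPlan(
        focused=[f"npm test -- {f}" for f in test_files],
        broad=["npm run lint", "npm test", "npm run build"],
        needs_browser_pass=touches_ui,
    )
```

For example, `plan_checks(["app/articles/page.tsx", "lib/slug.test.ts"])` yields one focused test command, the three broad checks, and a required browser pass.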

That draft state is intentional. The agent can prepare work for review. It does not get to decide that the work should merge.

What I let agents do

I am comfortable letting agents do a lot of the mechanical and investigative work.

They are good at repo archaeology. If I need to know where a behavior lives, which branch introduced it, or why a test started failing, an agent can search faster than I can and keep a much larger map in its head.

They are good at focused implementation. "Add this article," "wire this button," "fix this query," "write the missing test," and "update this API response" are all good agent tasks when the acceptance criteria are clear.

They are useful reviewers. A second pass catches missing tests, confusing naming, edge cases, and places where the diff quietly does more than the issue asked for.

They are good at repeatable workflows. If the same business operation has to happen every week, I would rather encode the steps once than depend on someone remembering the right Slack message, SQL query, browser click path, and verification note.

They are also good at keeping momentum. A lot of software work stalls in the cracks: finding the right command, creating the branch, waiting for a build, checking a page, updating a PR body. Agents can keep those little handoffs from becoming a pile of unfinished tabs.

What I still verify myself

I still own taste, product judgment, and risk.

An agent can make the code compile. It cannot always tell whether the feature feels right, whether the empty state is humane, whether the copy has the right amount of confidence, or whether a workflow matches the messy reality of a customer's day.

I am also careful with anything high-stakes: money, migrations, destructive actions, permissions, customer data, production changes, and third-party systems that do not give you a clean undo button. In those spaces, I want evidence from the real source of truth. Not vibes. Not "the code looks like it should work." Actual database rows, API responses, workflow histories, logs, screenshots, or browser state.

The most important habit is separating verified facts from inference. "The tests passed" is different from "the import worked in staging." "The page builds" is different from "the UI reads well on mobile." "The branch has no diff" is different from "it is safe to delete the worktree." Saying those differences out loud prevents a lot of chaos.
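One way to make that separation mechanical is to attach evidence to every claim and label anything without evidence as an inference. This is a minimal sketch, not a tool I am describing; the `Claim` type and `summarize` helper are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Claim:
    statement: str
    # Evidence is whatever was actually observed: command output,
    # a log line, a database row, a screenshot path.
    evidence: Optional[str] = None

    @property
    def status(self) -> str:
        # A claim with no recorded evidence is only an inference.
        return "verified" if self.evidence else "inferred"


def summarize(claims: list[Claim]) -> str:
    """Render one line per claim, prefixed with its status."""
    return "\n".join(f"[{c.status}] {c.statement}" for c in claims)
```

With this shape, "The tests passed" backed by a recorded test run prints as `[verified]`, while "The import worked in staging" with nothing attached prints as `[inferred]`, however likely it seems.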

Where this helps most

The biggest win is not that AI writes code faster. It is that it lowers the cost of doing the responsible version of the work.

It is easier to ask for a test when the agent can write it. It is easier to verify a UI when the agent can start the dev server, open the page, and inspect the result. It is easier to understand a legacy path when the agent can trace files, commits, and logs without getting tired.

That means I use AI most heavily in places where context is the hard part: understanding a repo I have not touched in a while, threading a change through several layers, comparing a bug report against actual runtime evidence, or turning a rough operational routine into a repeatable process.

It also helps with review discipline. A good agent does not just say "done." It tells me what changed, what it ran, what failed, what it could not verify, and what still needs human attention.
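Those five parts map naturally onto a fixed report shape. The sketch below is an assumption about how such a handoff could be structured, with hypothetical field and class names, not a format any particular agent produces.

```python
from dataclasses import dataclass, field


@dataclass
class Handoff:
    changed: list[str] = field(default_factory=list)
    ran: list[str] = field(default_factory=list)
    failed: list[str] = field(default_factory=list)
    not_verified: list[str] = field(default_factory=list)
    needs_human: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the report as plain text, suitable for a PR description."""
        sections = [
            ("What changed", self.changed),
            ("What it ran", self.ran),
            ("What failed", self.failed),
            ("What was not verified", self.not_verified),
            ("What needs human attention", self.needs_human),
        ]
        out: list[str] = []
        for title, items in sections:
            out.append(title)
            # Print "none" for an empty section instead of dropping it,
            # so a missing entry is never mistaken for an omission.
            out.extend(f"- {item}" for item in (items or ["none"]))
            out.append("")
        return "\n".join(out).rstrip() + "\n"
```

Keeping every section visible, even when empty, forces the report to say "none" explicitly for failures and unverified items rather than leaving the reviewer to guess.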

The current limit

The limit is not intelligence in the abstract. The limit is ownership.

Software is full of judgment calls. What should this product become? Which customer pain matters most? Is this abstraction worth it? Does this page feel like us? Is this automation safe enough to run unattended? Those are not questions I want to outsource.

The best AI workflow I have found keeps the agent close to the work and the human close to the decision. Agents can gather context, make changes, run checks, and prepare evidence. I still decide whether the result is clear, useful, honest, and safe.

That is the version of AI development that is sticking for me: less magic trick, more working loop.