
Notes on how a world-class software engineer uses LLMs - Part 2

Article: How I program with LLMs

HN Comments: How I program with LLMs (crawshaw.io)

What the article covers

This document is a summary of my personal experiences using generative models while programming over the past year. It has not been a passive process. I have intentionally sought ways to use LLMs while programming to learn about them. The result has been that I now regularly use LLMs while working and I consider their benefits net-positive on my productivity. (My attempts to go back to programming without them are unpleasant.)

It’s very early but so far the experience has been positive.

So I followed this curiosity, to see if a tool that can generate something mostly not wrong most of the time could be a net benefit in my daily work. The answer appears to be yes, generative models are useful for me when I program.

My Thoughts

Overall takeaway

Today’s thoughts center on this part of David’s post:

Chat-based LLMs do best with exam-style questions

He then points to two primary elements:

  1. Avoid creating a situation with so much complexity and ambiguity that the LLM gets confused and produces bad results.
  2. Ask for work that is easy to verify.

This is in line with the idea that you should treat LLMs like very smart interns.

Just as you wouldn’t give an intern a vague, complex project without clear success criteria, LLMs need well-defined tasks with verifiable outcomes.

As for the ideal task, David comments:

The ideal task for an LLM is one where it needs to use a lot of common libraries (more than a human can remember, so it is doing a lot of small-scale research for you), working to an interface you designed or produces a small interface you can verify as sensible quickly, and it can write readable tests.

This framing of the task to give an LLM is interesting because most people’s experience with LLMs is through ChatGPT or Claude, so they only think of working inside a chat text box.
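
To make this concrete, here is a minimal Go sketch of what such a task might look like outside the chat box: a human designs a small interface and writes readable, table-driven tests, and the LLM is asked to produce the implementation. All names here are illustrative, not from David’s article.

```go
// Package slug sketches the shape of an "ideal LLM task": a human-designed
// interface plus readable tests; the implementation is what you would ask
// the LLM to write. basicSlugger stands in for that generated code.
package slug

import (
	"regexp"
	"strings"
)

// Slugger is the small interface the human designs up front.
type Slugger interface {
	Slugify(title string) string
}

type basicSlugger struct{}

var nonAlnum = regexp.MustCompile(`[^a-z0-9]+`)

func (basicSlugger) Slugify(title string) string {
	s := strings.ToLower(title)
	s = nonAlnum.ReplaceAllString(s, "-")
	return strings.Trim(s, "-")
}

func New() Slugger { return basicSlugger{} }
```

```go
// slug_test.go — readable, table-driven tests that make the LLM's work
// quick to verify: run `go test` and the answer is pass or fail.
package slug

import "testing"

func TestSlugify(t *testing.T) {
	cases := []struct{ name, in, want string }{
		{"lowercases", "Hello World", "hello-world"},
		{"strips punctuation", "Go, LLMs & Me!", "go-llms-me"},
		{"collapses spaces", "a   b", "a-b"},
	}
	s := New()
	for _, c := range cases {
		t.Run(c.name, func(t *testing.T) {
			if got := s.Slugify(c.in); got != c.want {
				t.Errorf("Slugify(%q) = %q, want %q", c.in, got, c.want)
			}
		})
	}
}
```

If the tests pass, the work is verified; if they fail, the failure message is the feedback you hand back to the model.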

Here are some practical thoughts for AI Engineering and AI Agents that follow from David’s framing.

Practical Implementation

As AI Agents proliferate, people must find the ideal task for them.

Per David’s (and my) experience working with LLMs, that will mean:

  • they do a common task
  • the task produces output that can be verified quickly, efficiently, and consistently (a rough sketch of such a check follows this list)
  • the work is testable (even though the AI Agent is probabilistic in nature)
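
One way to read those bullets in code: keep the agent’s output in a narrow, structured shape that a plain, deterministic function can check every time, before a human ever looks at it. A rough Go sketch, with hypothetical type and field names:

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// ReleaseNote is a hypothetical, narrowly scoped agent output:
// small enough to verify quickly, efficiently, and consistently.
type ReleaseNote struct {
	Title    string   `json:"title"`
	Summary  string   `json:"summary"`
	Breaking []string `json:"breaking"`
}

// verify is deterministic: the same agent output always passes or fails
// the same way, even though the agent that produced it is probabilistic.
func verify(raw []byte) (ReleaseNote, error) {
	var n ReleaseNote
	if err := json.Unmarshal(raw, &n); err != nil {
		return n, fmt.Errorf("output is not valid JSON: %w", err)
	}
	if n.Title == "" || n.Summary == "" {
		return n, errors.New("missing required fields: title and summary")
	}
	if len(n.Summary) > 500 {
		return n, errors.New("summary exceeds 500 characters")
	}
	return n, nil
}

func main() {
	// In practice raw would come from the agent; it is hard-coded here to
	// keep the sketch self-contained.
	raw := []byte(`{"title":"v1.2.0","summary":"Bug fixes and small improvements.","breaking":[]}`)
	if note, err := verify(raw); err != nil {
		fmt.Println("reject agent output:", err)
	} else {
		fmt.Println("accept agent output:", note.Title)
	}
}
```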

The more sophisticated the problem an AI Agent tries to solve, the more steps it will need to take, each following the template above.

This mirrors how humans work: they start with a big project and break it down into smaller pieces until each piece is easy to complete and verify.

The trouble with the AI Agent hype now is that most vendors are promising the moon while still figuring out how to deliver on the tiny pieces of the task.

On the other hand, as they work through those issues, the AI Agents should become increasingly capable over time.

Success will likely follow the pattern of other technology adoption curves: agents will start with simple, well-defined tasks and gradually expand their capabilities as trust and reliability grow.

Strategic Implications

Because AI Agents must work on low-complexity, verifiable problems, many of the more pressing issues still need to be solved by humans.

For technical leaders and programmers, some strategic considerations to think through:

  • AI Engineers need to fully understand the problem they are solving

  • The probabilistic nature of LLMs means guard-rails must be installed early (a rough sketch follows this list)

  • Verifying work is not a one-and-done task; it’s an every-time task

  • Start with tasks that have clear right/wrong answers before moving to more nuanced problems
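
What a guard-rail can look like in practice: a deterministic allow-list check that runs before every action the agent takes, installed on day one rather than after an incident. The ToolCall shape and tool names below are hypothetical.

```go
package main

import "fmt"

// ToolCall is a hypothetical representation of an action an agent wants
// to take (calling a tool with some arguments).
type ToolCall struct {
	Name string
	Args map[string]string
}

// allowedTools is the guard-rail: an explicit allow-list of actions.
// Anything not listed (e.g. "deploy_prod") is blocked by default.
var allowedTools = map[string]bool{
	"search_docs": true,
	"run_tests":   true,
	"open_ticket": true,
}

// guard runs before every call; it is deterministic and cheap.
func guard(call ToolCall) error {
	if !allowedTools[call.Name] {
		return fmt.Errorf("tool %q is not on the allow-list", call.Name)
	}
	return nil
}

func main() {
	for _, call := range []ToolCall{{Name: "run_tests"}, {Name: "deploy_prod"}} {
		if err := guard(call); err != nil {
			fmt.Println("blocked:", err)
			continue
		}
		fmt.Println("allowed:", call.Name)
	}
}
```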

Key Takeaways for AI Agent Development by AI Engineers

David’s experience (as of January 2025) with the kinds of problems LLMs can solve suggests some things to watch out for when developing AI Agents:

  1. Lower the complexity of the problems AI Agents solve. Currently, LLMs still get confused by too much context and ambiguity, so AI Agents should tackle small tasks instead of large ones.

  2. Understand the problem end-to-end. Because the workflow is built once and run many times in a non-deterministic way, AI Engineers should consider as many scenarios as possible.

  3. Focus on AI Agent outputs that a human can verify quickly, easily, and consistently. Even though humans are removed from the process itself, for the time being the results will still be used and acted on by humans, so it’s important to consider what they get out of the process.

  4. Design for iteration speed. The smaller the AI Agent, the more times you can run it on a process, which will help build trust in the agent and the solutions it provides.

  5. Make errors easy to spot and fix. If the output can be judged simply as “correct” or “not correct,” it is easier for a human or another LLM to check the work (a rough sketch of such a check-and-retry loop follows this list).
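
Putting points 3 through 5 together, a small agent task can be wrapped in a deterministic check and a bounded retry loop, so bad output is cheap to catch and iteration stays fast. Everything in this Go sketch is illustrative; runAgent is a stand-in for a real LLM call.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// runAgent stands in for a call to an LLM-backed agent. It deliberately
// returns a bad first draft so the retry path is exercised.
func runAgent(task string, attempt int) string {
	if attempt == 0 {
		return ""
	}
	return "Summary: " + task
}

// check is the deterministic verifier: cheap to run, the same every time,
// and easy for a human (or another LLM) to reason about.
func check(output string) error {
	if strings.TrimSpace(output) == "" {
		return errors.New("empty output")
	}
	if !strings.HasPrefix(output, "Summary: ") {
		return errors.New("output missing required \"Summary: \" prefix")
	}
	return nil
}

// runWithRetries keeps the agent's scope small and the loop bounded, so
// failures surface quickly instead of compounding across a big workflow.
func runWithRetries(task string, maxAttempts int) (string, error) {
	var lastErr error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		out := runAgent(task, attempt)
		if err := check(out); err != nil {
			lastErr = err
			continue // cheap to retry because the task is small
		}
		return out, nil
	}
	return "", fmt.Errorf("escalate to a human after %d attempts: %w", maxAttempts, lastErr)
}

func main() {
	out, err := runWithRetries("what changed in this release", 3)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("accepted:", out)
}
```

The smaller the task, the cheaper each pass through this loop becomes, which is exactly what makes trust in the agent possible to build.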

The future of AI Agents isn’t about solving the most complex problems first but about reliably solving simpler problems at scale while gradually expanding capabilities.