Notes on how a world-class software engineer uses LLMs - Part 1

Article: How I program with LLMs

HN Comments: How I program with LLMs (crawshaw.io)

What the article covers

This document is a summary of my personal experiences using generative models while programming over the past year. It has not been a passive process. I have intentionally sought ways to use LLMs while programming to learn about them. The result has been that I now regularly use LLMs while working and I consider their benefits net-positive on my productivity. (My attempts to go back to programming without them are unpleasant.)

It’s very early but so far the experience has been positive.

So I followed this curiosity, to see if a tool that can generate something mostly not wrong most of the time could be a net benefit in my daily work. The answer appears to be yes, generative models are useful for me when I program.

My Thoughts

Overall takeaway

It’s easy to go on Twitter/X and YouTube and find everyday software engineers building with LLMs.

It’s rarer to find a truly world-class software engineer like David Crawshaw, Co-Founder and CTO of Tailscale and formerly a Staff Software Engineer at Google, writing about how he’s using LLMs and what’s working for him.

The three ways David finds LLMs to be helpful are:

  1. Autocomplete: “This makes me more productive by doing a lot of the more-obvious typing for me”
  2. Search: “If I have a question about a complex environment…I will get a far better answer asking any consumer-based LLM, o1, sonnet 3.5, etc, than I do using an old fashioned web search engine and trying to parse the details out of whatever page I land on.”
  3. Chat-driven programming: “This is where I get the most value of LLMs, but also the one that bothers me the most. It involves learning a lot and adjusting how you program, and on principle I don’t like that.”

What makes this particularly interesting is that these use cases resonate across skill levels.

This reminds me of the quip “Stars! They’re Just Like Us,” in that I, too, use LLMs for Autocomplete, Search, and Chat-driven programming.

The paragraph that resonated with me the most is:

Let me try to motivate this for the skeptical. A lot of the value I personally get out of chat-driven programming is I reach a point in the day when I know what needs to be written, I can describe it, but I don’t have the energy to create a new file, start typing, then start looking up the libraries I need. (I’m an early-morning person, so this is usually any time after 11am for me, though it can also be any time I context-switch into a different language/framework/etc.) LLMs perform that service for me in programming. They give me a first draft, with some good ideas, with several of the dependencies I need, and often some mistakes. Often, I find fixing those mistakes is a lot easier than starting from scratch.

The key point here is that even world-class programmers eventually run out of energy to continue doing the work that they love.

The LLM lowers the barrier enough that it lets them continue programming (maybe not at full bore, but at least at some bore).

Until AGI / ASI takes over, the key is to find ways to use and build tools that make it easier for humans to accomplish their goals.

This blog post is part one, so I’ll go deeper into the different parts of the article tomorrow.

For now, here are some practical implementations for AI Engineering and AI Agents from the overall takeaways and the paragraph quoted above.

Practical Implementation

LLMs and AI Agents should lower the activation energy required to achieve a goal.

Goal achievement progression generally follows:

  1. Come up with a goal
  2. Look at ways to solve it
  3. Stress-test potential solutions
  4. Figure out how to achieve it within constraints
  5. Start the process
  6. React/Correct as you get closer to the end
  7. Achieve the goal

At each step, the LLM system should lower the activation energy, or remove the step for the user entirely.

The more sophisticated the goal, the longer each step takes, so breaking a step down, or partially solving it, keeps the human moving forward.
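To make that concrete, here’s a minimal Python sketch (my own illustration, not code from David’s article) of the progression above: each step can carry an optional LLM hook that produces a first draft for the human to review, and the lambda “assistants” are hypothetical stand-ins for real model calls.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Step:
    name: str
    # Optional hook: the LLM drafts a starting point that the human
    # reviews, lowering activation energy without removing judgment.
    llm_assist: Optional[Callable[[str], str]] = None

def run_pipeline(goal: str, steps: list[Step]) -> None:
    for step in steps:
        if step.llm_assist is not None:
            draft = step.llm_assist(goal)  # cheap first draft to react to
            print(f"{step.name}: review LLM draft -> {draft}")
        else:
            print(f"{step.name}: human does this unaided")

steps = [
    Step("Come up with a goal"),  # human-only: intent stays with the person
    Step("Look at ways to solve it",
         llm_assist=lambda g: f"three candidate approaches to '{g}'"),
    Step("Stress-test potential solutions",
         llm_assist=lambda g: f"likely failure modes of '{g}'"),
    Step("Figure out how to achieve it within constraints",
         llm_assist=lambda g: f"a plan for '{g}' under current constraints"),
    Step("Start the process",
         llm_assist=lambda g: f"skeleton code for '{g}'"),
    Step("React/correct as you get closer to the end"),  # human-led review
    Step("Achieve the goal"),
]

run_pipeline("add rate limiting to the API", steps)
```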

Scale & Performance Considerations

When thinking about scaling LLM assistance for programming or other problem solving, a few factors are worth considering:

  1. Energy Levels & Cognitive Load
  • Different times of day/days of the week affect programming capacity
  • LLMs can help maintain productivity during low-energy periods
  • Balancing tool assistance with developer understanding
  2. Context Management Load
  • IDE integration can bring too much complexity
  • A clean slate approach (browser-based) often works better
  • Manage the scope of context the LLM receives
  3. Verification Overhead
  • Always need to compile and test generated code (see the sketch after this list)
  • More structure means more verification points
  • Trade-off between granularity and verification time
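Here’s a sketch of what that verification loop can look like (my own illustration; `llm_generate` is a placeholder for a real model call, and I’m assuming a Go workspace where `go build` and `go test` are the checkpoints):

```python
import pathlib
import subprocess

def llm_generate(prompt: str) -> str:
    """Placeholder for a real model call; wire up your provider here."""
    raise NotImplementedError

def verify(workdir: str) -> str | None:
    """Compile and test; return error output, or None if everything passes."""
    for cmd in (["go", "build", "./..."], ["go", "test", "./..."]):
        result = subprocess.run(cmd, cwd=workdir, capture_output=True, text=True)
        if result.returncode != 0:
            return result.stdout + result.stderr
    return None

def chat_program(prompt: str, workdir: str, max_rounds: int = 3) -> bool:
    """Generate a draft, verify it, and feed failures back into the prompt."""
    for _ in range(max_rounds):
        code = llm_generate(prompt)
        (pathlib.Path(workdir) / "generated.go").write_text(code)
        errors = verify(workdir)
        if errors is None:
            return True   # compiles and tests pass
        prompt += f"\n\nThe previous attempt failed:\n{errors}\nPlease fix it."
    return False          # give up and hand the draft back to the human
```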

Having an LLM lower the barrier to problem solving is great, but given the current technology, it doesn’t offload the work; it just makes it less arduous.

Tools, no matter how advanced, require thoughtful integration.

The goal is to manage trade-offs, not eliminate them.

Trade-offs & Limitations

David’s article looks at some key trade-offs:

  1. Flexibility vs. Reliability
  • Chat-driven programming requires learning new workflows
  • Non-deterministic service with changing behaviors
  • Benefits may not justify adjustment costs for all developers
  2. Scope vs. Quality
  • LLMs do better with exam-style, contained questions
  • Large, complex workspaces can confuse LLMs
  • Need to carefully scope requests for best results (see the sketch after this list)
  3. Structure vs. Speed
  • More package boundaries mean more precise context for LLMs
  • Additional structure adds overhead
  • Need to balance organization with development velocity
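One way to carefully scope a request, sketched under my own assumptions (a repo organized into small Go packages; `package_context` is a hypothetical helper, not anything David names): build the prompt from a single package’s files instead of the whole workspace, keeping the question exam-style and contained.

```python
import pathlib

def package_context(pkg_dir: str, max_chars: int = 20_000) -> str:
    """Concatenate one package's source files into a bounded prompt context."""
    parts = [
        f"// file: {path.name}\n{path.read_text()}"
        for path in sorted(pathlib.Path(pkg_dir).glob("*.go"))
    ]
    return "\n\n".join(parts)[:max_chars]  # small packages fit comfortably

# An exam-style question against one package, not the whole repo.
prompt = (
    "Given only this package, add a function that deduplicates entries, "
    "plus table-driven tests:\n\n" + package_context("./internal/dedupe")
)
```

More package boundaries make this kind of scoping cheap, which is the Structure vs. Speed trade-off expressed in code.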

David concludes that while it’s a net positive, there’s still much work to be done.

In fact, he plugs a project he’s working on to solve some of these issues.

Strategic Implications

For technical leaders and programmers, some strategic considerations to think through:

  1. Tool Investment
  • Need specialized environments for LLM interaction
  • Different from traditional IDE integration
  • Teams need (re)training to get the most out of the tools
  2. Team Workflow
  • Changes how code structure decisions are made
  • Affects package organization strategies
  • Influences test writing approaches
  3. Code Organization
  • Trend toward more specialized, smaller packages
  • Less emphasis on the DRY principle
  • More focus on readability and maintainability

As LLMs speed up development, teams will have to decide what that speed buys:

  • more robust code written in the same time it takes today, or
  • more code written in the same time, at the current (non-LLM-aided) standard of robustness

Team Adoption Patterns

Based on David’s experiences, teams adopting LLM coding as the standard should consider the following:

  1. Progressive Integration
  • Start with autocomplete, as it’s the lowest risk and has immediate benefits
  • Move to search-based usage
  • Finally, experiment with chat-driven programming
  2. Environment Setup
  • Create clean, isolated environments for LLM interaction
  • Establish clear guidelines for when to use LLMs
  • Build feedback loops for what works/doesn’t work (see the sketch after this list)
  3. Cultural Shift
  • Accept that LLMs produce imperfect first drafts
  • Embrace verification and more rigorous testing as part of the workflow
  • Focus on review and improvement rather than perfect generation
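For the feedback loops in point 2, a team can start with nothing fancier than an append-only log of which suggestions were accepted or rejected. A minimal sketch (my own; the `llm_feedback.jsonl` file name is made up):

```python
import json
import pathlib
import time

LOG = pathlib.Path("llm_feedback.jsonl")

def record(kind: str, prompt: str, accepted: bool, note: str = "") -> None:
    """Append one accept/reject event so the team can review what works."""
    event = {
        "ts": time.time(),
        "kind": kind,            # "autocomplete" | "search" | "chat"
        "prompt": prompt[:200],  # enough to recognize the task later
        "accepted": accepted,
        "note": note,
    }
    with LOG.open("a") as f:
        f.write(json.dumps(event) + "\n")

# e.g. after reviewing a chat-driven draft:
record("chat", "write a rate limiter middleware", accepted=True,
       note="needed two fixes to error handling")
```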

This shift in mindset, more than any single tool choice, is what will drive results.

David shares in the article that based on his (non-rigorous) data, he has seen:

it appears from my records that for every two hours of programming I do now, I accept more than 10 autocomplete suggestions, use LLM for a search-like task once, and program in a chat session once.

Key Takeaways for AI Agent Development

David’s experiences with LLMs in programming suggest some fundamental principles for AI Agent development:

  1. Energy Management
  • AI Agents should reduce cognitive load
  • Focus on making tasks more manageable, not just automated
  • Help users maintain productivity through energy peaks and valleys
  2. Context Awareness
  • Keep interactions focused and well-scoped
  • Provide clean environments for specific tasks
  • Avoid overwhelming users with complexity
  3. Verification First
  • Build verification into the workflow
  • Make errors easy to spot and fix
  • Design for iteration rather than perfection

The goal isn’t to replace human judgment but to lower barriers and maintain productivity throughout the day.

As AI Agent tools evolve, success will come from finding the right balance between human insight and AI assistance.