My 2 Rules of Thumb (So Far) When Using AI Coding Tools
Rule #1 is pessimistic, but rule #2 is optimistic.
Here are my 2 rules of thumb so far when using AI coding tools (specifically Cursor, though it’s on my to-do list to check out others like Claude Code, Amp, etc.).
Rule #1
Never offload decision-making. You have to make all the decisions. Personally, I don’t even let it write code unless it’s a small snippet that follows a simple design pattern and would otherwise amount to a bunch of mechanical “copy, paste, minor tweak” work.
(YMMV; this rule may be specific to my situation, which is that most of the code I write involves heavy domain expertise and custom algorithms that you wouldn’t be able to interpolate even if your brain contained all the public information on the internet.)
Most of the times I’ve broken this rule, it has gone badly right off the bat. There were a couple times when I initially thought it went well, but later downstream I realized I’d gotten faked out, and it took me an order of magnitude more work to diagnose and resolve the issue than if I’d just written the damn thing myself to begin with.
This is probably best illustrated with an example, so here’s a snippet from a Slack message I wrote a couple months ago:
- "I'm currently debugging some stuff relating to the latest diagnostic update. The automatic diagnostic question pool updater seems to think that the question pool is insufficient coverage when double checking after it's selected the question pool to be sufficient coverage.
...
Amusingly, the bug was introduced by the LLM. The diagnostic fix involved a particular graph traversal that I thought the LLM might be able to write up more quickly than me thinking through it from scratch, so I had the LLM do it, and it had a subtle bug. But I didn't notice because the code reads fairly sensibly. It's like the bug was camouflaged. Like one of those instances where a student gets a wrong answer, and then you look at their work, and you're like 'WTF, this looks right,' and then you do it yourself on a separate piece of paper and compare your work to theirs and then you spot the issue.
Feels like a very asymmetric tradeoff here in that the amount of time it takes to debug an LLM-generated bug is extremely high relative to the upfront time savings of the LLM. In this case the LLM probably saved me 20 minutes on the implementation but then identifying this bug took about 3-4 hours because it was so subtle and so deep in the internals.
Probably, when I'm working with nontrivial algo stuff, even if it's not highly context-dependent, I should avoid copying any code from the LLM (even if I'm tweaking it afterwards). Instead, I can just look at the LLM-generated code to get a high-level idea of the implementation approach if it doesn't immediately jump out at me, and then write the actual code myself entirely from scratch."
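To make the “camouflaged bug” idea a bit more concrete, here’s a purely hypothetical sketch (the actual code from that incident isn’t shown above, and this isn’t it): a coverage-counting traversal over a topic graph where the correct and buggy versions differ only in whether the visited set is created fresh per question or hoisted out of the loop. Both read sensibly at a glance.

```python
from collections import defaultdict, deque

def coverage_counts(questions, children):
    """For each topic, count how many selected questions cover it,
    directly or through any descendant topic."""
    counts = defaultdict(int)
    for direct_topics in questions.values():
        visited = set()  # fresh per question (the correct behavior)
        queue = deque(direct_topics)
        while queue:
            topic = queue.popleft()
            if topic in visited:
                continue
            visited.add(topic)
            counts[topic] += 1
            queue.extend(children.get(topic, []))
    return counts

def coverage_counts_buggy(questions, children):
    """Identical, except the visited set is hoisted out of the loop. It reads
    like a harmless optimization, but each topic now only gets counted for the
    first question that reaches it, so coverage looks thinner than it is."""
    counts = defaultdict(int)
    visited = set()  # shared across questions -- the camouflaged bug
    for direct_topics in questions.values():
        queue = deque(direct_topics)
        while queue:
            topic = queue.popleft()
            if topic in visited:
                continue
            visited.add(topic)
            counts[topic] += 1
            queue.extend(children.get(topic, []))
    return counts

# Hypothetical data: two questions both hit "algebra", which has "linear-eqs"
# as a sub-topic.
children = {"algebra": ["linear-eqs"]}
questions = {"q1": ["algebra"], "q2": ["algebra"]}
print(dict(coverage_counts(questions, children)))        # {'algebra': 2, 'linear-eqs': 2}
print(dict(coverage_counts_buggy(questions, children)))  # {'algebra': 1, 'linear-eqs': 1}
```

The buggy version still runs, still returns plausible-looking numbers, and only undercounts when multiple questions reach the same topic, which is exactly the kind of thing that sails past a quick read and only surfaces hours later, deep in testing.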
Rule #2
It can save quite a bit of time and mental energy to have the LLM gather lots of info that you need to make decisions – where is this thing involved in the codebase, please suggest a high-level approach with pseudocode for a complicated graph traversal, please check over my work and let me know if you spot any bugs.
- It's helpful for tracking down where certain logic is in the codebase. Even if I could look this up myself in a few minutes, it still depletes some cognitive energy, which is something I've been trying to avoid (safely) where I can, since I seem to have some kind of daily limit before exhaustion sets in. It's also been very helpful for navigating code that's not directly under my umbrella, where I have much less of a mental map of where everything is. Basically like a GPS.
- And another thing I've been using it for is checking large code updates before I test them myself. It's perfect at catching dumb syntax errors and great at catching logical inconsistencies and drawing my attention to particular areas where a bug would be most likely to hide. It obviously doesn't remove the need to be careful and test stuff yourself, but it adds a really helpful "pre-testing" layer where
1) any issues it catches in pre-testing take a tenth as much time and cognitive effort to diagnose/resolve as they would if you waited for them to bubble up in usual testing, and
2) having that extra layer of evaluation from an independent evaluator reduces the likelihood of issues sneaking past usual testing.
More generally, I’m finding AI assistance really useful in cases where I need it to brainstorm stuff that doesn’t have to be 100% correct. Stuff like “it came up with 10 ideas, 4 of them are valid, 6 are not, but the 4 valid ideas saved a bunch of time.”
(Another example of this was diagnosing an unhandled state accumulation issue last month and brainstorming other areas where state accumulation could potentially cause an issue.)
Follow-Up Questions
Have you tried using Cursor rules?
I haven’t been using rules since the only code that I have it generate for me is narrowly-scoped autocomplete, which does a good job of inferring my desired structure.
I understand how rules would come in handy if the bottleneck is design and behavior, but unfortunately right now my bottleneck is that it doesn’t seem to have the domain expertise and extrapolation/generalization ability for the type of stuff I need, even when using the best models.
The few times I got it to generate a nontrivial amount of correct-looking code and got all excited about the productivity speedup, it always turned out there was some really subtle bug camouflaged in there, and diagnosing the issue ended up taking 10x longer than if I’d just written it myself to begin with.
My understanding of rules is that they can help a lot with design and behavior, but they can’t really guard against bugs stemming from a lack of domain expertise (unless the amount of required domain expertise outside the model’s training set is small enough that you can enumerate it).
I’ve also tried giving it very, very extensive information in the main prompt, but that information typically turns out to either 1) be so extensive that it would be faster for me to code the thing up myself, or 2) require so much thinking-through that I basically end up having to write the code just to figure out what information it needs to be told.
That said, I’ve of course been finding other use-cases for LLMs to do more menial work, even work that requires tons of context, and it’s been awesome. But the difference is that the places I’ve been able to leverage LLMs successfully have all been more cookie-cutter, factory-like, whereas most of the coding tasks I deal with are far more amorphous.