I Didn’t Realize How Fragile My Coding Chops Were Until I Started Working on Real Production Systems

by Justin Skycak (@justinskycak) on

Nothing prepared me for how violently they punish even the smallest mistake.

Want to get notified about new posts? Join the mailing list and follow on X/Twitter.


Even after building commercial ML/analytics, I didn’t realize how fragile my coding chops were until I started working on real production systems.

Nothing prepared me for how violently they punish even the smallest mistake.

“There was no avoiding pain. There were lots of unknown unknowns that eventually appeared, that I had to learn about the hard way.

For instance, cascading failures. On one hand it’s obvious that if one thing fails, it can cause another thing to fail, and so on. But unless you’ve lived it, you don’t realize the magnitude of a screw-up that can happen from one small thing.

Coming from a mathy coding background working in self-contained notebooks, you think the obvious solution is to just prevent things from breaking internally.

And that’s the first step, but then there’s another step: what if something does break? What happens then? You need to really limit the scope of the breaking, not to have it cascade out to other things.

You have to accept that something’s going to break at some point, even if you’re as careful as possible. You’re going to deploy a typo, or your tests are not going to catch some case, or somebody is going to run into some scenario that you never thought would have ever happened.

So it becomes really important to take error handling seriously, write robust code that can self-heal any problems in the inputs, and not let in any change to the system unless you validate beforehand that nothing’s going to screw up. Not just theoretical validations – actually have the system run everything, to the point where you are a hundred percent confident it’s guaranteed to continue working.

Because if you don’t, your system will eventually blow up, and it’ll happen at the worst time. It blows up overnight at 4am and you wake up to all these emails from people. People are upset, everybody’s confused, emails keep coming in. It ruins their day, it ruins your day, everyone’s day is ruined. It’s a disaster.

First time I accidentally blew up the task processor, I nearly puked. For a good minute I thought I was about to pass out. It was really stressful.

But that’s the kind of emotional scar tissue that guides you in the future. Like, okay, we’re not gonna do that again. Every time I’m writing code now, I’m thinking, like, how can I not be in that situation again? And I haven’t been in that situation since.

It sucks, but the pain teaches you to have respect for the process and gets you to do all the right things.”

(weaving together snippets from our discussion starting ~32:58)
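
To make the "limit the scope of the breaking" idea concrete, here is a minimal Python sketch. It is an illustration under my own assumptions, not the actual task processor from the story: the names `run_tasks`, `parse_score`, and the `task_processor` logger are all hypothetical. It shows two habits from the quote: containing a failure to the one task that raised it, and tolerating a bad input instead of crashing on it.

```python
from __future__ import annotations

import logging
import math
from typing import Callable, Iterable

# Hypothetical logger name, chosen only for this illustration.
logger = logging.getLogger("task_processor")


def run_tasks(tasks: Iterable[tuple[str, Callable[[], object]]]) -> dict[str, list[str]]:
    """Run every task, containing each failure to the task that raised it."""
    report: dict[str, list[str]] = {"succeeded": [], "failed": []}
    for name, task in tasks:
        try:
            task()
        except Exception:
            # Record the traceback and keep going, so one broken task
            # cannot cascade into the tasks queued after it.
            logger.exception("task %r failed", name)
            report["failed"].append(name)
        else:
            report["succeeded"].append(name)
    return report


def parse_score(raw: object, default: float = 0.0) -> float:
    """Fall back to a default when an input is missing, malformed, or non-finite."""
    try:
        value = float(raw)  # raises TypeError or ValueError on bad input
    except (TypeError, ValueError):
        return default
    return value if math.isfinite(value) else default


if __name__ == "__main__":
    report = run_tasks([
        ("first", lambda: None),
        ("broken", lambda: 1 / 0),  # fails, but "last" still runs
        ("last", lambda: None),
    ])
    print(report)
    print(parse_score("3.5"), parse_score("oops"), parse_score(None))
```

Silently swallowing errors would be its own kind of disaster, which is why the sketch logs every failure and returns a report of what failed. The point is only that the blast radius stays at one task instead of the whole run.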



