Guide · 6 min read
That Former Employee Left Behind Code You Can't Maintain. You're Now Stuck.
The System Nobody Else Understands
An engineer builds a critical system. It automates something important. A workflow. A calculation. A data pipeline. The engineer is the only one who understands how it works. The code has no documentation. The logic is complex. The system just works.
Three years later, the engineer leaves for a better opportunity. A month after they leave, the system breaks. A dependency changed. A server went down. A data format shifted. Something.
Now your team is in trouble. Nobody knows how to fix it. The engineer is gone. The code is there, but it might as well be written in ancient Egyptian. You're left with three options: 1) Hope the engineer answers your calls. 2) Spend thousands of dollars getting an external consultant to reverse-engineer the system. 3) Spend weeks rebuilding it from scratch. Every one of them is expensive and demoralizing, and the whole situation is completely avoidable.
Why This Happens
It happens because building and maintaining systems are treated as different problems. Building is fun. It's creative. You're solving problems. You're shipping features. Maintaining is boring. You're writing documentation. You're explaining decisions. You're making sure someone else can understand your work. So most engineers build, and maintenance gets skipped. Nobody decides this explicitly. It's just what happens when you're under deadline and maintenance feels optional.
The Damage It Does
Operations Risk — If the system breaks and only one person knows how it works, you're stuck. You can't fix it. You can't modify it. You can't improve it. You're trapped.
Scaling Friction — You want to scale the system. But nobody understands how it works. So you can't. You're stuck with the current capacity even if you want to grow.
Business Continuity — What if the person who built it gets hit by a bus? (Morbid, but real.) What if they leave? What if they're on vacation when the system breaks? You're vulnerable.
Knowledge Rot — Systems change. Dependencies get updated. Environments shift. If nobody's maintaining the documentation, your understanding of how the system works becomes increasingly inaccurate.
Hiring Friction — You want to hire a new engineer to maintain this system. But there's no documentation. The interview process is "Here's the code, can you figure out what it does?" Not a great interview.
Technology Decisions Are Made Poorly — Should we move this to a new framework? Upgrade this library? Migrate this database? You can't make informed decisions because you don't fully understand the system.
What This Looks Like in Practice
Scenario 1: The Billing System — An engineer built a system that calculates invoices. It's been running for five years. It's never had a problem. Nobody's looked at the code in years. One day, a new product is launched with different pricing logic. The system needs to be modified to handle it. You ask the original engineer (who's now at another company) if they remember how it works. They say: "Not really. It's been five years. Let me see the code." They spend a week deciphering their own system. Then they implement the change. It takes two weeks total when it should have taken two days. Cost: $3,000-5,000 in consulting fees.
Scenario 2: The Data Pipeline — An engineer built a script that runs nightly, pulls data from your API, processes it, and loads it into your data warehouse. It's been running fine for three years. Nobody touches it. One day, a bug in your API starts causing malformed responses. The script breaks. It crashes silently. For three days, no data is loaded. You don't notice until someone asks "Why do we have no data this week?" Nobody knows how to fix the script. It takes a week to get it working again. Cost: Three days of bad reporting. One week of engineering time. Possibly bad business decisions made on incomplete data.
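The silent failure in this scenario is the part that code can prevent. Below is a minimal Python sketch of a nightly job that checks the API response before loading it and reports failure through its exit code. Everything named here is an assumption for illustration: the field names in REQUIRED_FIELDS, the "records" key, and the fetch and load callables stand in for whatever your real script uses. Treating an empty response as an error is also an assumption; some pipelines legitimately have empty nights.

```python
import logging

logger = logging.getLogger("nightly_load")

# Hypothetical schema: replace with the fields your warehouse actually needs.
REQUIRED_FIELDS = ("id", "amount", "created_at")


class MalformedResponseError(Exception):
    """Raised when the API payload does not match the expected shape."""


def validate_records(payload):
    """Return the list of records, or raise if the payload looks wrong."""
    if not isinstance(payload, dict) or not isinstance(payload.get("records"), list):
        raise MalformedResponseError("payload has no 'records' list")
    records = payload["records"]
    if not records:
        # Assumption: an empty night is suspicious enough to fail on.
        raise MalformedResponseError("payload contains zero records")
    for index, record in enumerate(records):
        if not isinstance(record, dict):
            raise MalformedResponseError(f"record {index} is not an object")
        missing = [name for name in REQUIRED_FIELDS if name not in record]
        if missing:
            raise MalformedResponseError(f"record {index} is missing {missing}")
    return records


def run_nightly_load(fetch, load):
    """Fetch, validate, and load. Returns 0 on success and 1 on any failure."""
    try:
        records = validate_records(fetch())
        load(records)
    except Exception:
        # Log the full traceback instead of letting the job die quietly.
        logger.exception("nightly load failed; nothing was loaded")
        return 1
    logger.info("nightly load wrote %d records", len(records))
    return 0
```

If the script ends with sys.exit(run_nightly_load(fetch, load)), whatever schedules the job can alert on a non-zero exit code, so a broken night is noticed the next morning rather than three days later.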
Scenario 3: The Infrastructure — An engineer built an infrastructure as code setup using Terraform. It's complicated. The engineer leaves. Six months later, you need to add a new server. You look at the Terraform code. It's complex. The variables are cryptically named. There are no comments. You have no idea what happens if you change something. You're afraid to touch it. So you manually create the server instead, which defeats the purpose of infrastructure as code.
How to Know If You Have This Problem
Question 1: Is there code running in production that only one person understands? If yes, you have this problem.
Question 2: If that person left today, could someone else keep the system running? If the answer is "no" or "maybe, but it would take a while to figure out," you have this problem.
Question 3: Is there code running in production that has no documentation? If yes, you have this problem.
Question 4: Do you have a process for handling outages in critical systems? If the answer is "call [person who built it]," you have this problem.
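If you'd like a rough first pass at Question 1, commit history can hint at where to look. The Python sketch below lists files that only one author has ever touched. It rests on a loud assumption: that authorship is a proxy for understanding, which it only partly is. The function name and the (file_path, author) input shape are made up for this example.

```python
from collections import defaultdict


def single_author_files(commits):
    """Map each file touched by exactly one author to that author.

    `commits` is an iterable of (file_path, author) pairs, one pair per
    file per commit. This is a hypothetical input shape for the sketch.
    """
    authors_by_file = defaultdict(set)
    for file_path, author in commits:
        authors_by_file[file_path].add(author)
    return {
        file_path: next(iter(authors))
        for file_path, authors in sorted(authors_by_file.items())
        if len(authors) == 1
    }


# Invented history for illustration only.
history = [
    ("billing/invoice.py", "dana"),
    ("billing/invoice.py", "dana"),
    ("pipeline/nightly.py", "dana"),
    ("pipeline/nightly.py", "lee"),
]
at_risk = single_author_files(history)
```

Here at_risk contains only billing/invoice.py, because two people have worked on pipeline/nightly.py. The pairs could be extracted from version control output such as git log with the --name-only option. Treat the result as a list of places to ask questions, not as a verdict.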
How to Fix This
Level 1: Minimum Viable Documentation (Do This First)
For each critical system, create a one-page document that answers:
1) What does this system do? (One paragraph description.)
2) Where does it live? (What server, what code repository, how to access it.)
3) What does it depend on? (What databases, APIs, services does it need?)
4) How do you run it? (If it's a script, how do you execute it? If it's a service, how do you start/stop it?)
5) What can go wrong? (What are the failure modes? What would cause it to break?)
6) How do you know if it's broken? (Are there logs? Are there alerts? How would you notice?)
7) How do you fix it? (If it breaks, what steps do you take? Who do you call if you get stuck?)
8) Who maintains it? (Who's responsible for keeping it running?)
That's it. One page. Takes an hour to write. Saves weeks of trouble later.
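One way to keep the one-pager honest is to treat the eight questions as a checklist that a script can verify. The Python sketch below assumes the overview is stored as a simple key/value mapping. The key names and the example billing answers are invented for illustration; only the eight questions come from the list above.

```python
# The eight questions from the one-page overview, keyed by hypothetical names.
OVERVIEW_QUESTIONS = {
    "purpose": "What does this system do?",
    "location": "Where does it live?",
    "dependencies": "What does it depend on?",
    "how_to_run": "How do you run it?",
    "failure_modes": "What can go wrong?",
    "detection": "How do you know if it's broken?",
    "recovery": "How do you fix it?",
    "owner": "Who maintains it?",
}


def unanswered_questions(overview):
    """Return the questions whose answers are missing or blank."""
    return [
        question
        for key, question in OVERVIEW_QUESTIONS.items()
        if not str(overview.get(key) or "").strip()
    ]


# Invented example: a billing overview with one gap left to fill.
billing_overview = {
    "purpose": "Calculates monthly invoices from usage records.",
    "location": "Repository and server details go here.",
    "dependencies": "Usage database, pricing table, email service.",
    "how_to_run": "Scheduled monthly; can also be started by hand.",
    "failure_modes": "Missing pricing rows, database timeouts.",
    "detection": "Job log and a failure alert to the on-call channel.",
    "recovery": "Re-run for the affected month after fixing the input.",
    "owner": "",
}
gaps = unanswered_questions(billing_overview)
```

In this example gaps holds a single entry, "Who maintains it?", which is exactly the kind of hole that stays invisible until the system breaks.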
Level 2: Code Comments
Go through the critical code. Add comments that explain:
1) Why did we make this decision?
2) What's not obvious about this section?
3) What assumptions are we making?
Not every line needs a comment. Just the parts that are non-obvious.
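Here is a short Python sketch of what those three kinds of comments can look like in practice. The function, its rounding rule, and the string-amount input are invented for illustration and are not taken from any real billing system. The point is the comments, each of which answers one of the questions above.

```python
from decimal import Decimal, ROUND_HALF_UP


def invoice_total(line_amounts, tax_rate):
    """Return subtotal plus tax for one invoice (hypothetical example)."""
    # Why: Decimal instead of float. Binary floats cannot represent most
    # cent values exactly, and the error adds up across line items.
    subtotal = sum((Decimal(amount) for amount in line_amounts), Decimal("0"))

    # Not obvious: tax is rounded once, on the subtotal. Rounding each line
    # separately can produce a total that is off by a cent.
    tax = (subtotal * Decimal(tax_rate)).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP
    )

    # Assumption: amounts arrive as strings such as "19.99", all in the
    # same currency. Mixed currencies are not handled here.
    return subtotal + tax
```

Calling invoice_total(["19.99", "5.01"], "0.0825") gives Decimal("27.06"). Someone reading this in three years can see not just what the code does, but why it does it that way and where it will break.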
Level 3: Architecture Diagram
Draw a simple diagram showing:
1) Components of the system
2) How they connect
3) Where data flows
4) External dependencies
This can be as simple as a single slide in Google Slides. It takes 30 minutes and shows at a glance how the pieces fit together.
Level 4: Runbook
For critical systems, create a "runbook" that documents:
1) How to deploy the system
2) How to monitor it
3) How to respond to common failures
4) How to scale it up or down
5) How to get help
The Team Conversation to Have
Pull in the engineers who built critical systems. Say: "I realize we have several systems that only you understand. That's a risk for all of us. I'd like to work with you to document how they work, so that if you're hit by a bus, someone can keep them running. This also helps us onboard new people and make decisions about the system. Can we schedule time to document this?" Most engineers will say yes. They actually want to document. It's their product. They just need permission and time.
The Downloadable Resource
We've created a Critical Systems Documentation Template that includes:
1) A one-page system overview template (start here)
2) A code documentation checklist
3) An architecture diagram template
4) A runbook template (for operational procedures)
5) A checklist for "Is this system documented enough?"
Download it here: aiforbusiness.net/resources/critical-systems-documentation
Filling in the full template typically takes 2-3 hours per critical system, and it is probably the highest-ROI documentation you'll ever write.
What's Next
Once you've documented your systems, the next question becomes: Does your team actually know how to use them to make better decisions? The next article, "Your Team Can't Answer Basic Questions About Your Own Business," covers how poor data organization prevents your team from getting insights.