How to prevent software bugs from ruining your weekend

IT pros share some old horror stories of accidental disruption.

A hand dropping a laptop with code on the screen falling out of it. (Credit: Illustration: Anna Kim, Photos: Adobe Stock)

Decades ago, when Mathew Thomas was a young programmer for a pager company, he learned a valuable lesson that many veteran coders have likely learned, too:

If the weekend is almost here, don’t touch anything.

“If you really want to anger the gods of software, deploy new software on a Friday afternoon,” Thomas, now SVP of engineering at KnowBe4, told us, recalling how a well-meaning colleague at the time had tried to optimize a database by changing indexes, but did not change all of the code that used those indexes.

Software at that pager company connected a customer relationship management (CRM) network to a network of pagers. The changes accidentally took down the CRM system, Thomas said, and there was the risk that two million customers could not manage their devices.

“It was what we call a potential extinction-level event. I mean, it was that type of bug,” he told us, remembering working the weekend to solve the problem.

Veteran coders like Thomas spoke with us about the coding bugs that can ruin a Friday or a whole weekend—and what mechanisms (like emergency rollbacks) should be in place before everyone heads to happy hour.

It’s buggy out. A bug—“the bane of every developer’s existence,” according to Clinton Walker, associate director of delivery at Presidio—generally refers to code that doesn’t work as intended.

A bug can lead to unexpected consequences, ranging from error messages to downed CRMs.

Recent research from digital infrastructure advisory organization Uptime Institute found that 40% of 397 global data center operators and vendors experienced a “significant, serious, or severe IT service outage” caused by human error over the past three years. Top reasons reported by the IT practitioners included failure to follow correct procedure.

Also, according to Uptime: Of 207 respondents surveyed, 61% said a “software or configuration” error within the same time period led to an outage caused by a problem with a third-party IT service provider.

Top insights for IT pros

From cybersecurity and big data to cloud computing, IT Brew covers the latest trends shaping business tech in our 4x weekly newsletter, virtual events with industry experts, and digital guides.

Bugs happen—even to the Mars Climate Orbiter, which burned up following a failure to use metric measurement in the coding of a software file, according to NASA’s review board.

And we all remember CrowdStrike’s outage on a Friday.

How to get rid of all the bugs. Walker recommends many practices to prevent bugs, including static analysis tools followed by manual review from a senior engineer.

Dave Laskowski, director, application development and engineering at Protiviti, suggests modern CI/CD pipeline tools like Azure DevOps and plenty of unit tests, or short pieces of code written to run the main functional code and verify its output. “A good developer will write multiple unit tests with different inputs to thoroughly validate critical functions,” he wrote in an email to IT Brew.
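To make Laskowski's point concrete, here is a minimal sketch of what "multiple unit tests with different inputs" can look like, using Python's built-in `unittest` module. The function `apply_discount` and its rules are hypothetical, invented for illustration; they do not come from Protiviti or anyone quoted here.

```python
import unittest


def apply_discount(price: float, percent: float) -> float:
    """Hypothetical 'critical function': return price reduced by percent."""
    if price < 0:
        raise ValueError("price must be non-negative")
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)


class ApplyDiscountTests(unittest.TestCase):
    def test_typical_inputs(self):
        # Several inputs, including the 0% and 100% boundaries.
        cases = [
            (100.0, 0, 100.0),
            (100.0, 25, 75.0),
            (80.0, 50, 40.0),
            (19.99, 100, 0.0),
        ]
        for price, percent, expected in cases:
            with self.subTest(price=price, percent=percent):
                self.assertEqual(apply_discount(price, percent), expected)

    def test_rejects_invalid_inputs(self):
        # Bad inputs should fail loudly rather than return a wrong number.
        for price, percent in [(-1.0, 10), (10.0, -5), (10.0, 101)]:
            with self.subTest(price=price, percent=percent):
                with self.assertRaises(ValueError):
                    apply_discount(price, percent)
```

Saved in a file, tests like these can be run with `python -m unittest`, and a CI/CD pipeline can be configured to refuse a build when any of them fail.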

KnowBe4 deploys strict access controls for code tweakers, no matter the rank.

“Even I, as the SVP of engineering here at KnowBe4, don’t have permission to make a code change, merge that code change to our production branch, and ship that,” Thomas told us. “There are probably about four or five different software systems in between that would block me from doing something so reckless.”

After the call, he shared in an email other measures the company implements, including built-in automation blocks for production changes unless a certain number of engineers have reviewed and approved the modifications, as well as tools that can roll back disruptive code updates.
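As a rough illustration of those two ideas (an approval gate and a rollback path), the sketch below is one way they might be modeled. It is not KnowBe4's tooling: the names `ChangeRequest`, `is_deploy_allowed`, and `ReleaseHistory`, and the default of two required approvals, are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ChangeRequest:
    """Hypothetical record of a proposed production change."""
    author: str
    approvers: set = field(default_factory=set)


def is_deploy_allowed(change: ChangeRequest, required_approvals: int = 2) -> bool:
    """Allow a deploy only with enough approvals from people other than the author.

    The threshold of 2 is an assumed value, not one stated in the article.
    """
    independent = change.approvers - {change.author}
    return len(independent) >= required_approvals


class ReleaseHistory:
    """Hypothetical in-memory list of deployed versions, newest last."""

    def __init__(self) -> None:
        self._versions: list = []

    def deploy(self, version: str) -> None:
        self._versions.append(version)

    @property
    def live(self) -> Optional[str]:
        return self._versions[-1] if self._versions else None

    def rollback(self) -> str:
        """Drop the newest release and return the version that is live again."""
        if len(self._versions) < 2:
            raise RuntimeError("no earlier release to roll back to")
        self._versions.pop()
        return self._versions[-1]
```

In this sketch an author's own approval does not count, so a change approved only by its author and one colleague is still blocked until a second reviewer signs off.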

And, of course, there’s engineering policy to make sure everyone has a good weekend.

“Every engineer is fully aware that it is a violation of company policy to deploy changes after a certain time on a Friday,” Thomas wrote.
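A policy like that can also be enforced by a script in the deployment pipeline. The sketch below is an assumption-heavy example: Thomas did not name the cutoff time, so the 3pm value is made up, and extending the freeze through Saturday and Sunday is likewise a choice made for this illustration. The function name `is_in_deploy_freeze` is hypothetical.

```python
from datetime import datetime

FRIDAY = 4  # datetime.weekday() counts Monday as 0, so Friday is 4


def is_in_deploy_freeze(now: datetime, cutoff_hour: int = 15) -> bool:
    """Return True if deploys should be blocked at the given local time.

    Assumptions for illustration: the freeze starts at cutoff_hour on
    Friday (15 = 3pm, an invented value) and lasts through Sunday.
    """
    day = now.weekday()
    if day == FRIDAY:
        return now.hour >= cutoff_hour
    return day > FRIDAY  # Saturday (5) or Sunday (6)
```

A pipeline step could call `is_in_deploy_freeze(datetime.now())` and stop the release when it returns `True`, leaving an explicit override for genuine emergencies.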