Technical Disasters Should Not Catch Developers Unprepared

 2 years ago
source link: https://itnext.io/technical-disasters-should-not-catch-developers-unprepared-84bf3d09bfbe

It’s not if something will go wrong, but when


“If a catastrophic outcome is possible or you can’t judge the downside, stay away.” — Peter Bevelin

If you work on software projects long enough, you will suffer a technical catastrophe. The build will break, the server will explode, an environment will become corrupt, somebody with sausage fingers will delete data.

If something can go wrong on a software project, at some point it will go wrong (this is known as Murphy’s Law). Technology fails, humans make mistakes, designs are flawed and people leave; given enough time, accidents will happen.

Software development teams need to recover quickly when something goes wrong. That is what best practices are for; it is why we have source control, DevOps and documentation.

If developers are not prepared to recover quickly, they are not doing their job.

Covid was not a black swan

I was watching a video about COVID, US Pandemic Policy: Failures, Successes, and Lessons.

“This was not a black swan event. This was an entirely predicted and predictable event. We knew it was going to happen… And yet, we weren’t ready.” (Alex Tabarrok)

Scientists had warned that a virus like Covid, such as a killer flu, would emerge at some point. It had happened in the past and was likely to happen again. Each country had a responsibility to be prepared.

In April 2015, Bill Gates gave the TED Talk “The next outbreak? We’re not ready”. There had already been warnings such as SARS and bird flu.

Yet when Covid hit, the world wasn’t ready and was caught on the back foot.

Azure Mindset

Developers should live by the mantra — if it can go wrong, it will go wrong. I call this the Azure mindset.

Azure doesn’t try to stop servers from ever breaking, because that is impossible.

Instead, Azure focuses on recovering at speed: if a server breaks, another one can be spun up almost instantly.

Azure makes everything scriptable to enable DevOps to restore quickly and roll back if needed.
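The “replace, don’t repair” idea can be sketched without any cloud SDK. The code below is a hypothetical, in-memory simulation, not Azure’s API: `Instance`, `reconcile` and the image tags are invented for illustration. It assumes every server is created from a known-good image, so recovering from a failure means discarding the broken instance and creating a fresh one, and rolling back means recreating instances from the previous image.

```python
from dataclasses import dataclass, field
from itertools import count

# Hypothetical ID generator so each newly created instance is distinguishable.
_next_id = count(1)


@dataclass
class Instance:
    """Hypothetical stand-in for a VM or container built from a known-good image."""
    image: str
    instance_id: int = field(default_factory=lambda: next(_next_id))
    healthy: bool = True


def reconcile(instances: list[Instance], desired: int, image: str) -> list[Instance]:
    """Replace, don't repair: discard unhealthy instances and create fresh ones
    from `image` until `desired` instances are running again."""
    survivors = [i for i in instances if i.healthy][:desired]
    missing = desired - len(survivors)
    return survivors + [Instance(image) for _ in range(missing)]


pool = [Instance("web:v42") for _ in range(3)]
pool[1].healthy = False  # a server "explodes"
pool = reconcile(pool, desired=3, image="web:v42")
print(len(pool), all(i.healthy for i in pool))  # prints "3 True"

# Rolling back is the same scripted step, using the previous known-good image.
pool = reconcile([], desired=3, image="web:v41")
print([i.image for i in pool])  # prints "['web:v41', 'web:v41', 'web:v41']"
```

Orchestrators such as Kubernetes run reconciliation loops like this continuously, driven by health checks; the point of the sketch is only that recovery becomes a scripted, repeatable step rather than a manual repair.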

Best practices

Development disasters are a pain in the butt: they disrupt plans and make development late. Best practices should minimise disasters and enable the development team to recover quickly.

We don’t always realise it, but software development best practices stop the development team from being wiped out by disaster.

  • Backups let us restore systems and minimise data loss
  • Redundancy ensures we don’t have a single point of failure
  • Error handling and retry mechanisms let us recover from transient failures
  • Source control keeps every version of the code, so we can roll back
  • DevOps enables us to restore a vanilla environment rapidly
  • Nightly builds find build problems early
  • Unit tests and automated tests find bugs introduced by code changes
  • Alerting proactively warns us when services are down, reducing downtime
  • Automated DevOps deployments reduce manual deployment errors
  • Documentation protects against knowledge loss when individuals leave
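To make the error-handling and retry practice concrete, here is a minimal sketch, not taken from the original article, of retrying a transient failure with exponential backoff and jitter. The name `retry_with_backoff`, its parameter defaults and the choice of `ConnectionError`/`TimeoutError` as “transient” errors are assumptions for illustration; `flaky_fetch` is a hypothetical stand-in for a real network call.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of retries: surface the error so alerting can see it
            # Backoff cap doubles per attempt (0.1, 0.2, 0.4, ...) up to max_delay;
            # sleeping a random fraction of it ("full jitter") spreads out retries.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
    raise ValueError("max_attempts must be at least 1")


# Usage: a hypothetical call that fails twice with a transient error, then succeeds.
_calls = 0


def flaky_fetch() -> str:
    global _calls
    _calls += 1
    if _calls < 3:
        raise ConnectionError("transient network blip")
    return "payload"


print(retry_with_backoff(flaky_fetch, sleep=lambda _: None))  # prints "payload"
```

Only errors listed in `retry_on` are retried; anything else (a genuine bug) fails immediately, and once retries run out the error is re-raised so monitoring and alerting can see it instead of it being silently swallowed.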

Software development disasters

I have worked on software projects where things have gone bang: individuals accidentally deleted something, and environments became corrupt with no one able to work out why.

The important part isn’t finding out who to blame. The important part is how we recover from the situation as quickly as possible.

On the projects where we followed multiple best practices, recovery was straightforward and calm and took about a day. It still disrupted us, but we could reset the environment quickly and easily.

On the projects with a more cowboy approach, recovery involved heroic work from developers to come up with innovative fixes, followed by hours of effort outside working hours.

The low-quality development team needed a hero, and although everyone congratulated those involved, I took it as a warning. This time we got lucky; we got away with not doing development properly. Next time we might not be so lucky, and I didn’t want to be involved in a project that contributed to its own downfall through poor development practice.

When you see hero developers, it’s not something to celebrate; it’s a symptom of poor development practice. It’s a warning that you need to improve your development processes, or the development gods will take you down.

Conclusion

Developers cannot stop all the errors, mistakes, problems and issues with software development. It’s impossible to predict all problems in a complex development environment.

When you realise you can’t stop all problems, you can focus on being able to recover fast with minimum effort.

With software development, it’s not if things will go wrong but when. You can’t stop all the problems from happening, but you can be ready to recover when they do.

Technical problems are not black swan events; they can and should be predicted. Developers should be ready.
