Accelerating Developer Velocity with Time-Travel Debugging

July 15, 2021
5-minute read

Every company should want to increase its developer velocity. Time-travel debugging is one of the best-in-class tools that are a primary driver of developer velocity and a top contributor to business success.

Ten years ago, software started eating the world. Today, every company is a software company. In every industry segment, from IT (of course), through medical, financial, energy, retail, … everything, companies depend on software, and therefore on software developers, to achieve their business goals. In the never-ending quest to improve, industry leaders have coined the term “Developer Velocity” for the ability to improve business performance through software development. So, every company should want to increase its developer velocity. But what exactly does that mean?

During 2020, in the midst of the global COVID-19 crisis, McKinsey sought to identify and quantify what drives developer velocity. They asked hundreds of engineering and technology executives to rate their company’s performance on 46 drivers across 13 capability areas in the three broad categories of technology, working practices, and organizational enablement. The weighted average of scores across all the drivers was defined as the Developer Velocity Index (DVI).

Not surprisingly, McKinsey found that companies with a high DVI outperform others in the market by 4 to 5 times.

[Figure: Revenue CAGR. Source: McKinsey]

Drilling down into the numbers, McKinsey also found that tools in general, and development tools in particular, were among the drivers with the greatest impact on business performance, which brings me to tools for debugging in production.

Studies have shown that 43% of developers spend 25% of their time debugging in production.

That’s time spent fixing errors instead of delivering more value that contributes to business performance. This is why time-travel debugging in production is one of those tools that can help every business accelerate developer velocity.

[Figure: Impact on Developer Velocity. Source: McKinsey]

Alternatives aren’t good enough

While there are several alternatives for resolving errors in production, none of them provide the same level of production data and insights as time-travel debugging.

Legacy tools

Resolving errors in production is not new. Developers have had to grapple with production issues since the dawn of computing. Over the years, many tools have been introduced, including dump files, post-mortem analysis tools, profilers, remote debuggers, and more.

While all of these tools are better than nothing, they either don’t provide enough data for an effective root cause analysis or incur an unacceptable impact on performance. For example, remote debuggers may provide exception information, but they block the server, which is unacceptable in production. Dump files may not block your production servers, but they don’t show you the latest logs or HTTP requests, and they only provide local variables or source code if the code is not optimized (and production code usually is optimized). Furthermore, a dump file represents only a single point in time in the program’s history, which is usually not enough to understand the elusive chain of causality that made things break down.

So, these legacy tools don’t quite cut it.

Observability platforms

The last decade has seen the rise of application performance monitoring (APM) tools, which have evolved into full-blown observability platforms. These sophisticated tools have moved monitoring and error resolution in production systems forward by leaps and bounds, and they work well with modern architectures like microservices and serverless. However, none of these platforms provide the code-level observability you get with time-travel debugging. They are well suited to system-level problems, such as detecting overloaded microservices or downed virtual machines, but they do not provide the insights needed to resolve exceptions and logical errors that only manifest under the unique circumstances of production systems.

Traditional log-based debugging

There isn’t a developer out there who doesn’t write log entries. It’s, by far, the most common way developers try to debug production errors. It’s just so easy to write something like:

Logger.LogInformation("About to invoke transaction {id} on table {table}", transaction.Id, table.Name);

But log-based debugging is both inefficient and ineffective. Inefficient because developers write way too many log entries and typically never observe or analyze 99.9% of them. Ineffective because, for all the log lines they write, they never have the right data when and where they need it. This is the paradox of static logs: If you know what to log, you’ve already solved the bug. Therefore, debugging in production with logs is an arduous, time-consuming process that usually requires several iterations.

[Figure: Debugging with logs, the traditional way]

How time-travel debugging drives developer velocity

Time-travel debugging can slash the time developers spend on resolving production errors by up to 80%, which means that those developers can spend more time delivering business value.

It’s like any type of problem-solving. To resolve production errors effectively, developers need data. In development, they have all the data they need at their fingertips right in their IDE’s debugger. Not so in production.

To begin with, production errors can be very hard to reproduce. You can’t place breakpoints in production since that would interrupt service to your customers. Matching your production code to the right source code version is not as trivial as it may seem, and modern microservices and serverless architectures, where the offending code is running one moment and gone the next, only complicate matters. In most cases, developers don’t even have access to the production environments they need to debug. So, usually, they rely on log files, and we’ve just been through how inefficient and ineffective those are.

Time-travel debugging provides the development experience in production.

When your application throws an exception, Ozcode Live Debugger automatically captures a vast amount of data related to that exception. It starts with a complete recording of the code execution flow of the error across microservices (or serverless code) from user interaction to the line of code that threw the exception. This means that developers can step through the error execution flow, line by line, with full visibility into the call stack, locals, method parameters and return values, HTTP requests, and database queries at every step of the way. Ozcode makes this data available without interrupting service and with no noticeable impact on your production systems.
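To make the idea of automatic capture concrete, here is a minimal sketch of recording an exception at the moment it is thrown, without changing how the application handles it. It uses .NET's first-chance exception notification and is only an illustration of the concept, not a description of how Ozcode is implemented. The `CapturedError` record and `ExceptionRecorder` class are hypothetical names, and this sketch captures only the call stack, not locals, HTTP requests, or database queries.

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Runtime.ExceptionServices;

// Hypothetical record: what this sketch stores for each thrown exception.
public sealed record CapturedError(
    DateTime TimestampUtc, string ExceptionType, string Message, string CallStack);

public static class ExceptionRecorder
{
    // Thread-safe buffer of captured errors (assumption: kept in memory only).
    public static readonly ConcurrentQueue<CapturedError> Captured = new();

    public static void Install()
    {
        AppDomain.CurrentDomain.FirstChanceException += OnFirstChance;
    }

    private static void OnFirstChance(object? sender, FirstChanceExceptionEventArgs e)
    {
        // The handler runs on the throwing thread before any catch block,
        // so the call stack that led to the throw is still available.
        // It must not throw itself, so it only reads data and enqueues it.
        var stack = new StackTrace(1, false).ToString();
        Captured.Enqueue(new CapturedError(
            DateTime.UtcNow,
            e.Exception.GetType().FullName ?? "unknown",
            e.Exception.Message,
            stack));
    }
}

public static class Program
{
    private static int Divide(int total, int count) => total / count;

    public static void Main()
    {
        ExceptionRecorder.Install();

        try
        {
            Divide(10, 0); // throws DivideByZeroException
        }
        catch (DivideByZeroException)
        {
            // Handled as usual; the recorder did not interrupt anything.
        }

        foreach (var error in ExceptionRecorder.Captured)
        {
            Console.WriteLine($"{error.ExceptionType}: {error.Message}");
            Console.WriteLine(error.CallStack);
        }
    }
}
```

The point of the sketch is that capture happens as a side effect of the throw, so the code that handles the exception does not need to know about it. Recording locals, parameters, and return values at every step requires runtime instrumentation that goes well beyond this.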

But not all software errors generate exceptions. Logical bugs can make your application display incorrect behavior without throwing an exception. Here, Ozcode’s dynamic logging with tracepoints provides developers with the production data they need to resolve errors. By placing tracepoints in strategic locations in the code where they suspect the error originates, developers can add and remove log entries on the fly without impacting performance. In addition to the dynamic log entries, Ozcode also adds time-travel debug information to the methods that contain tracepoints so that developers can track production data step by step through the lines of code. Doesn’t that sound familiar? It’s exactly the experience a developer gets when debugging in her local environment, only now it’s in production.
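The difference between a static log line and a log entry that can be added and removed on the fly can be sketched in a few lines. This is a simplified illustration under stated assumptions, not Ozcode's mechanism: a real live debugger instruments the running code without source changes, whereas this sketch uses an explicit, hypothetical `Tracepoints.Hit` hook so that only the add/remove behavior is shown. The `Tracepoints` class, the location labels, and `ApplyDiscount` are all invented for the example.

```csharp
using System;
using System.Collections.Concurrent;

public static class Tracepoints
{
    // Key: a location label; value: the message template to emit there.
    private static readonly ConcurrentDictionary<string, string> Active = new();

    // Add or replace a tracepoint while the application is running.
    public static void Add(string location, string template) => Active[location] = template;

    // Remove a tracepoint; returns false if none was active at that location.
    public static bool Remove(string location) => Active.TryRemove(location, out _);

    // Called at candidate locations; does nothing unless a tracepoint is active.
    public static void Hit(string location, params object[] values)
    {
        if (Active.TryGetValue(location, out var template))
        {
            Console.WriteLine($"[trace {location}] " + string.Format(template, values));
        }
    }
}

public static class Program
{
    private static decimal ApplyDiscount(decimal price, decimal percent)
    {
        var result = price - price * percent / 100m;
        Tracepoints.Hit("ApplyDiscount:result", price, percent, result);
        return result;
    }

    public static void Main()
    {
        ApplyDiscount(200m, 15m); // no tracepoint yet: nothing is logged

        Tracepoints.Add("ApplyDiscount:result", "price={0} percent={1} result={2}");
        ApplyDiscount(200m, 15m); // the values at this location are now logged

        Tracepoints.Remove("ApplyDiscount:result");
        ApplyDiscount(200m, 15m); // silent again, with no redeploy
    }
}
```

Because the decision about what to log is made at run time, a developer can ask for exactly the values a suspected bug depends on, look at them, and then remove the entry again.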

Time-travel debugging, backed by science

McKinsey’s research showed that best-in-class tools are the primary driver of developer velocity and a top contributor to business success. This is exactly the category in which Ozcode’s live, time-travel debugger sits, with its ability to cut the time taken to resolve production errors by up to 80%. McKinsey’s research fully supports this assertion:

Additional areas that executives believe will accelerate software innovation and impact in the future include increased usage of product telemetry to make product decisions and automation in detecting and remediating production issues.
Source: McKinsey

So, if you want to be among the high-DVI companies that outperform the market by 4 to 5 times, time-travel debugging might not be the only change you need, but it’s a great place to start.

Ozcode Live Debugger

Zero code changes

Install a lightweight agent in 5 minutes.

Run anywhere

Easily deployed on-premises or in the cloud. Azure, AWS, Windows and Linux.

Low footprint

Less than 3% impact on runtime performance.

Rami Honig
