
Lightrun – the best way to debug production problems

source link: https://vladmihalcea.com/lightrun-debug-production-problems/
Last modified: Mar 4, 2022


Introduction

In this article, I’m going to present Lightrun, a very useful tool that I discovered recently while developing RevoGain and that helps me debug problems happening in production.

Lightrun is like no other tool I’ve used before since it allows us to insert log entries dynamically at runtime, capture snapshots, and even inject metrics without changing our production code.

This is especially useful when investigating issues reported by clients since we can figure out the problem while the user is executing the actions that can replicate the issue. Cool, right?

Getting Started with Lightrun

Setting up Lightrun is very easy and takes less than 5 minutes:

  • Step 1: Install the Lightrun IntelliJ IDEA plugin, which works with both the Ultimate and the Community editions.
  • Step 2: Create an account on the Lightrun App platform.
  • Step 3: Install the Lightrun JVM agent that will be used to introspect our application. On the Lightrun App platform, you can find instructions on how to set up the agent depending on your development and production system requirements.
  • Step 4: Configure your application to use the Lightrun JVM agent.

In my case, since RevoGain is a Spring Boot application, I can provide the agent in my local Windows environment, like this:

java -agentpath:%USER_HOME%/agent/lightrun_agent.dll ^
-jar revogain-%REVOGAIN_VERSION%.jar

And, for the production system, I can use this Linux-based command:

java -agentpath:$HOME/agent/lightrun_agent.so -jar revogain-$REVOGAIN_VERSION.jar
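To confirm that the JVM was actually started with the agent option, a small startup check can inspect the JVM input arguments. The following is a minimal sketch: the AgentCheck class and the hasNativeAgent method are hypothetical names, and the check only looks at the command-line arguments, it does not verify that the agent connected to the Lightrun platform.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical startup check; the class and method names are illustrative only.
class AgentCheck {

    // Returns true if any JVM argument loads a native agent via -agentpath
    // and the agent path contains the given library name fragment.
    static boolean hasNativeAgent(List<String> jvmArgs, String libraryFragment) {
        return jvmArgs.stream()
            .anyMatch(arg -> arg.startsWith("-agentpath:") && arg.contains(libraryFragment));
    }

    public static void main(String[] args) {
        // The RuntimeMXBean exposes the options the JVM was launched with.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Lightrun agent option present: "
            + hasNativeAgent(jvmArgs, "lightrun_agent"));
    }
}
```

Running this class with and without the -agentpath option shows whether the option reached the JVM, which helps rule out shell quoting or path expansion mistakes.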

Lightrun dynamic logging

A very common issue with parsing trading statements is when the trading balance doesn’t add up. This can happen with operations that are not yet supported or because either the statement file or the parsing logic is broken.

Debugging such issues requires having the trading statement, and, unfortunately, not all clients are willing to provide it for us to debug it locally. So, in these particular cases, adding a dynamic log entry is going to help us spot the problem while the user is parsing their statements.

So, let’s add a dynamic log entry that displays the calculated trading balance for a specific user:

[Screenshot: Lightrun simple log popup]

The Format text field defines the message that’s going to be logged. The {calculatedBalance} placeholder is going to be replaced with the value of the calculatedBalance local variable when executing the method in question.

The Condition text field allows us to define filtering criteria so that the message is logged only if the provided condition evaluates to true. In our case, we want to display this message only for the user with the identifier value of 1, as illustrated in the advanced log popup screenshot.

[Screenshot: Lightrun log advanced popup]

So, this log message is only going to be printed for the user with the id value of 1, while for other users, it will be ignored.
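To make the idea concrete, here is roughly what the same conditional log would look like if it were written by hand and redeployed. This is a conceptual sketch only and not how Lightrun works internally: the ConditionalLogSketch class, the userId parameter, and the message text are assumptions, and only the {calculatedBalance} placeholder comes from the example above.

```java
import java.math.BigDecimal;
import java.util.Map;
import java.util.Optional;

// Conceptual sketch only: a hand-written equivalent of a conditional dynamic log.
class ConditionalLogSketch {

    // Replaces each {name} placeholder with the value of the matching variable.
    static String render(String format, Map<String, Object> variables) {
        String message = format;
        for (Map.Entry<String, Object> entry : variables.entrySet()) {
            message = message.replace("{" + entry.getKey() + "}", String.valueOf(entry.getValue()));
        }
        return message;
    }

    // Produces the message only when the condition holds (user id equal to 1);
    // for every other user the log entry is skipped.
    static Optional<String> logIfMatches(long userId, BigDecimal calculatedBalance) {
        if (userId != 1L) {
            return Optional.empty();
        }
        return Optional.of(render(
            "Calculated balance: {calculatedBalance}",
            Map.of("calculatedBalance", calculatedBalance)));
    }

    public static void main(String[] args) {
        // Printed, because the condition matches.
        logIfMatches(1L, new BigDecimal("125.50")).ifPresent(System.out::println);
        // Nothing is printed, because the condition does not match.
        logIfMatches(2L, new BigDecimal("99.00")).ifPresent(System.out::println);
    }
}
```

The difference with a dynamic log is that the format and the condition are supplied at runtime from the IDE, so no code change or redeployment is needed.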

The Lightrun log messages are printed in the application log, but we can also pipe them to our IDE.

Next, we can ask the user to import a new trading statement, and the calculatedBalance log entries are going to be printed in the Lightrun Console, as follows:

[Screenshot: Lightrun log messages]

Brilliant!

Check out how the balance is being calculated based on the trading operation we are parsing from the statement. If the calculated balance doesn’t match the balance values provided by the statement, we can point out to the client what causes the issue so that they can inspect it as well.

Without Lightrun, we couldn’t simply debug the production system, since pausing the application in a debugger would halt the entire server and therefore affect availability.

And, that’s not all. Lightrun allows us to capture dynamic snapshots, as we will see in the next section.

Lightrun runtime snapshots

Another cool feature offered by Lightrun is the ability to capture runtime snapshots that contain both the stack trace and the variables available when the snapshot was taken.

Since RevoGain users are restricted to the countries where FastSpring, the external payment processor, is currently operating, we want to investigate the cases when the user country cannot be resolved. For this reason, we are going to use the following Lightrun snapshot:

[Screenshot: Lightrun snapshot popup]

The Condition text field is used to activate the snapshot only when the country local variable is null, meaning the location could not be fetched.

When trying to access the application from an IP address the GeoLocationService cannot process, we can see how Lightrun manages to capture the in-memory context at the time when the snapshot was created:

[Screenshot: Lightrun snapshot result]

Notice the geoLocationDTO object that was captured at the moment when the country object couldn’t be resolved.

This is a very valuable feature since it allows us to gather multiple pieces of information at once, rather than having to collect them using individual logs.
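Conceptually, a conditional snapshot resembles the hand-written capture below, which records the stack trace and a few variables only when the country is null. This is a sketch under stated assumptions, not Lightrun’s implementation: the SnapshotSketch class, the Snapshot holder, and the ipAddress parameter are hypothetical, and a real snapshot would contain whatever variables are in scope at the chosen line.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Conceptual sketch only: a hand-written stand-in for a conditional snapshot.
class SnapshotSketch {

    // Holds the call stack and the selected variables at capture time.
    static final class Snapshot {
        final StackTraceElement[] stackTrace;
        final Map<String, Object> variables;

        Snapshot(StackTraceElement[] stackTrace, Map<String, Object> variables) {
            this.stackTrace = stackTrace;
            this.variables = Collections.unmodifiableMap(variables);
        }
    }

    // Captures a snapshot only when the condition holds (country == null).
    static Optional<Snapshot> captureIfCountryMissing(String ipAddress, String country) {
        if (country != null) {
            return Optional.empty();
        }
        // LinkedHashMap is used because it accepts null values, unlike Map.of.
        Map<String, Object> variables = new LinkedHashMap<>();
        variables.put("ipAddress", ipAddress);
        variables.put("country", country);
        return Optional.of(new Snapshot(Thread.currentThread().getStackTrace(), variables));
    }

    public static void main(String[] args) {
        captureIfCountryMissing("203.0.113.7", null).ifPresent(snapshot -> {
            System.out.println("Variables: " + snapshot.variables);
            for (StackTraceElement frame : snapshot.stackTrace) {
                System.out.println("  at " + frame);
            }
        });
    }
}
```

Writing this by hand means choosing the variables in advance and redeploying, which is exactly the step a runtime snapshot avoids.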

Lightrun dynamic metrics

We can also add metrics dynamically without changing the source code we are monitoring. For instance, I employ this feature to figure out how long it takes to validate email addresses using the Stop Forum Spam API.

The reason I’m validating email addresses is that there are a plethora of bots running over the Internet trying to infest our applications with useless accounts that consume space in the database.

Adding a duration metric using Lightrun is very easy and, just as was the case with the dynamic logs and the runtime snapshots, we can do it directly from IntelliJ IDEA:

[Screenshot: Lightrun metrics duration popup]

Now, every time a user registers, the isSpam method invocation is going to be intercepted and monitored by Lightrun, and we are going to get the call durations printed in the Lightrun console:

[Screenshot: Lightrun metrics duration result]
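For comparison, measuring the duration of a call by hand would require wrapping it in timing code like the sketch below. The DurationMetricSketch class and the lambda standing in for isSpam are assumptions made for illustration; the real isSpam method calls the Stop Forum Spam API and its signature is not shown in this article.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Conceptual sketch only: manual timing of a method call.
class DurationMetricSketch {

    private final List<Long> durationsNanos = new ArrayList<>();

    // Runs the check, records how long it took, and returns the check's result.
    // The duration is recorded even if the check throws an exception.
    <T> boolean timed(Predicate<T> check, T input) {
        long start = System.nanoTime();
        try {
            return check.test(input);
        } finally {
            durationsNanos.add(System.nanoTime() - start);
        }
    }

    // Returns a copy of the recorded durations, in nanoseconds.
    List<Long> durationsNanos() {
        return new ArrayList<>(durationsNanos);
    }

    public static void main(String[] args) {
        DurationMetricSketch metric = new DurationMetricSketch();
        // Hypothetical stand-in for the real isSpam call.
        Predicate<String> isSpam = email -> email.endsWith("@spam.example");
        boolean spam = metric.timed(isSpam, "bot@spam.example");
        System.out.println("isSpam=" + spam + ", durations(ns)=" + metric.durationsNanos());
    }
}
```

With a dynamic duration metric, this wrapper never has to be added to the code base, and the measurement can be removed again once the investigation is over.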

Awesome, right?


Conclusion

Lightrun is easy to use but very powerful, as we can inject logs, collect snapshots, or instrument our code using metrics without changing the production source code, which would require a redeployment. And that’s big: since I’m offering a live chat to my clients, I can debug their production problems during our live conversation. This helps me provide exceptional support to my clients that I couldn’t provide without a tool like Lightrun.

For this article, I used the Lightrun Free Tier, which is limited to 3 agents. However, since RevoGain is a majestic monolith, this is not an issue for me.

If you are using a microservice architecture, and you wish to deploy more than 3 agents, then you will have to use the Professional edition instead.

