What the structured logs?
source link: https://sebport0.github.io/what-the-structured-logs/
Introduction
Hello! In this blog post, you and I will look together at two silly little examples related to structured logs inside the AWS CloudWatch service. Hopefully they’ll show you why structured logs are a good idea.
The mysterious process
There is a strange log trace. Nobody knows where it comes from. What kind of mysterious service has been producing such weird, automated, ghostly and terrifying text? We don’t know. We must find out more about this.
With determination, we go to the CloudWatch web console and open the log group. Our eyes cannot believe what they see.
| 1654486915158 | START RequestId: 09229113-7024-4110-bdb3-1ebc662c2575 Version: $LATEST |
| 1654486915158 | [INFO] 2022-06-06T03:41:55.158Z 09229113-7024-4110-bdb3-1ebc662c2575 {'state': 'PENDING', 'failed': False} |
| 1654486915158 | [INFO] 2022-06-06T03:41:55.158Z 09229113-7024-4110-bdb3-1ebc662c2575 {'state': 'INITIATED', 'failed': False} |
| 1654486915158 | [INFO] 2022-06-06T03:41:55.158Z 09229113-7024-4110-bdb3-1ebc662c2575 {'state': 'RUNNING', 'failed': False, 'progress': {'at': 0, 'remaining': 100}} |
| 1654486915159 | [INFO] 2022-06-06T03:41:55.158Z 09229113-7024-4110-bdb3-1ebc662c2575 {'state': 'RUNNING', 'failed': False, 'progress': {'at': 25, 'remaining': 75}} |
| 1654486915159 | [INFO] 2022-06-06T03:41:55.159Z 09229113-7024-4110-bdb3-1ebc662c2575 {'state': 'RUNNING', 'failed': False, 'progress': {'at': 50, 'remaining': 50}} |
| 1654486915159 | [ERROR] 2022-06-06T03:41:55.159Z 09229113-7024-4110-bdb3-1ebc662c2575 {'state': 'TERMINATING', 'failed': True, 'progress': {'at': 75, 'remaining': 25}} |
| 1654486915159 | [INFO] 2022-06-06T03:41:55.159Z 09229113-7024-4110-bdb3-1ebc662c2575 {'state': 'TERMINATED', 'failed': True, 'progress': {'at': 75, 'remaining': 25}, 'message': 'Task failed at 75%. Unable to upload images.'} |
| 1654486915159 | END RequestId: 09229113-7024-4110-bdb3-1ebc662c2575 |
| 1654486915159 | REPORT RequestId: 09229113-7024-4110-bdb3-1ebc662c2575 Duration: 1.50 ms Billed Duration: 2 ms Memory Size: 128 MB Max Memory Used: 38 MB Init Duration: 114.56 ms |
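For context, a Lambda handler producing logs like these might look something like the following minimal Python sketch. The state machine and the `record` helper are hypothetical, reconstructed from the log lines above; logging `json.dumps` output (instead of a raw dict) guarantees valid JSON that Logs Insights can auto-discover fields from.

```python
import json
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def record(state, failed, **extra):
    """Build one structured log record as a plain dict."""
    return {"state": state, "failed": failed, **extra}

def handler(event, context):
    # Hypothetical Lambda entry point; each state transition is logged
    # as one JSON line.
    logger.info(json.dumps(record("PENDING", False)))
    logger.info(json.dumps(record("INITIATED", False)))
    for at in (0, 25, 50):
        logger.info(json.dumps(
            record("RUNNING", False,
                   progress={"at": at, "remaining": 100 - at})))
    logger.error(json.dumps(
        record("TERMINATING", True,
               progress={"at": 75, "remaining": 25})))
    logger.info(json.dumps(
        record("TERMINATED", True,
               progress={"at": 75, "remaining": 25},
               message="Task failed at 75%. Unable to upload images.")))
```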
We take a closer look while trying to avoid the mental meltdown. There is hope! That looks like JSON inside a Lambda log. Maybe we can write a query inside Logs Insights and grasp, bit by bit, what those inhuman strings of text are trying to say. Let’s try to show only the log-related fields
Query
fields failed, state, progress
Output
-----------------------------------
| failed | state | progress |
| ------ | ----------- | -------- |
| | | |
| | | |
| 0 | PENDING | |
| 0 | INITIATED | |
| 0 | RUNNING | |
| 0 | RUNNING | |
| 0 | RUNNING | |
| 1 | TERMINATING | |
| 1 | TERMINATED | |
| | | |
-----------------------------------
From them we can see that something starts running… but it fails! And the process is terminated. Holy moly! What is this thing trying to do? What is going on with the progress field? Let’s retrieve one log to have a closer look at its structure
Query
fields @message
| filter @message like "INFO" and state = 'RUNNING'
| limit 1
Output
| [INFO] 2022-06-06T03:59:54.937Z 4e3b9e67-ac08-4545-968b-75742787a6e9 {'state': 'RUNNING', 'failed': False, 'progress': {'at': 0, 'remaining': 100}} |
Ok! The progress field also looks like JSON. Good, very good. We can be more precise now
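A small caveat if you ever need to inspect one of these payloads outside CloudWatch: the braces here are a Python dict repr rather than strict JSON (single quotes, `False` instead of `false`), so `json.loads` would reject them. A quick local parse of a sample line might look like this:

```python
import ast

line = ("[INFO] 2022-06-06T03:59:54.937Z 4e3b9e67-ac08-4545-968b-75742787a6e9 "
        "{'state': 'RUNNING', 'failed': False, 'progress': {'at': 0, 'remaining': 100}}")

# The payload starts at the first "{"; ast.literal_eval safely handles
# the Python-style quoting that json.loads would reject.
payload = ast.literal_eval(line[line.index("{"):])

assert payload["state"] == "RUNNING"
assert payload["progress"]["remaining"] == 100
```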
Query
fields
failed, state, progress.at as currentProgress, progress.remaining as remainingProgress
Output
--------------------------------------------------------------
| failed | state | currentProgress | remainingProgress |
| ------ | ----------- | --------------- | ----------------- |
| | | | |
| | | | |
| 0 | PENDING | | |
| 0 | INITIATED | | |
| 0 | RUNNING | 0 | 100 |
| 0 | RUNNING | 25 | 75 |
| 0 | RUNNING | 50 | 50 |
| 1 | TERMINATING | 75 | 25 |
| 1 | TERMINATED | 75 | 25 |
| | | | |
--------------------------------------------------------------
What kind of twisted intelligence would do such a thing!? Listing the current and the remaining progress on a scale of 100? Why!? Oh! Our minds cannot comprehend such a puzzling structure, but we cannot stop now; our destiny is tied to these logs. Our fingers move on their own
Query
fields @message
| filter failed = 1
Output
| @message
| [ERROR] 2022-06-06T03:59:54.937Z 4e3b9e67-ac08-4545-968b-75742787a6e9 {'state': 'TERMINATING', 'failed': True, 'progress': {'at': 75, 'remaining': 25}} |
| [INFO] 2022-06-06T03:59:54.937Z 4e3b9e67-ac08-4545-968b-75742787a6e9 {'state': 'TERMINATED', 'failed': True, 'progress': {'at': 75, 'remaining': 25}, 'message': 'Task failed at 75%. Unable to upload images.'} |
There is a message field in the last log. The last clue
Query
fields @message
| filter ispresent(message)
| display message
Output
| message |
| Task failed at 75%. Unable to upload images. |
------------------------------------------------
Images. Upload. The process was unable to upload images. Our lives will never be the same. Knowledge cursed us. We will never know. Were those the images of our salvation? Of our hope? Of our despair? We will never know.
Pings to another dimension
Finally, the dping CLI tool is ready. The d is for dimensional, because dping is a program that sends pings to servers in another dimension. Your teammates have been testing the application with a server inside the Always Is Christmas dimension, aka AXMAS-01, and the results are very promising. Now it is up to you to show some latency numbers.
First, you write a query to recover the logs
Query
fields @timestamp, @message
| sort by @timestamp asc
Output
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| @timestamp | @message |
| ----------------------- | ----------------------------------------------------------------------------------------------------------------------------------------- |
| 2022-06-06 02:13:39.119 | START RequestId: fdc95028-6bdd-4ae7-8d0d-b3fcffe17749 Version: $LATEST |
| 2022-06-06 02:13:39.120 | [INFO] 2022-06-06T02:13:39.120Z fdc95028-6bdd-4ae7-8d0d-b3fcffe17749 {'packet': 1, 'latency': 30} |
| 2022-06-06 02:13:39.120 | [INFO] 2022-06-06T02:13:39.120Z fdc95028-6bdd-4ae7-8d0d-b3fcffe17749 {'packet': 2, 'latency': 999} |
| 2022-06-06 02:13:39.120 | [INFO] 2022-06-06T02:13:39.120Z fdc95028-6bdd-4ae7-8d0d-b3fcffe17749 {'packet': 3, 'latency': 30} |
| 2022-06-06 02:13:39.120 | [INFO] 2022-06-06T02:13:39.120Z fdc95028-6bdd-4ae7-8d0d-b3fcffe17749 {'packet': 4, 'latency': 30} |
| 2022-06-06 02:13:39.120 | [INFO] 2022-06-06T02:13:39.120Z fdc95028-6bdd-4ae7-8d0d-b3fcffe17749 {'packet': 5, 'latency': 25} |
| 2022-06-06 02:13:39.120 | [INFO] 2022-06-06T02:13:39.120Z fdc95028-6bdd-4ae7-8d0d-b3fcffe17749 {'packet': 6, 'latency': 30} |
| 2022-06-06 02:13:39.120 | [INFO] 2022-06-06T02:13:39.120Z fdc95028-6bdd-4ae7-8d0d-b3fcffe17749 {'packet': 7, 'latency': 30} |
| 2022-06-06 02:13:39.120 | [INFO] 2022-06-06T02:13:39.120Z fdc95028-6bdd-4ae7-8d0d-b3fcffe17749 {'packet': 8, 'latency': 30} |
| 2022-06-06 02:13:39.120 | [INFO] 2022-06-06T02:13:39.120Z fdc95028-6bdd-4ae7-8d0d-b3fcffe17749 {'packet': 9, 'latency': 50} |
| 2022-06-06 02:13:39.121 | [INFO] 2022-06-06T02:13:39.120Z fdc95028-6bdd-4ae7-8d0d-b3fcffe17749 {'packet': 10, 'latency': 10} |
| 2022-06-06 02:13:39.121 | END RequestId: fdc95028-6bdd-4ae7-8d0d-b3fcffe17749 |
| 2022-06-06 02:13:39.121 | REPORT RequestId: fdc95028-6bdd-4ae7-8d0d-b3fcffe17749 Duration: 1.52 ms Billed Duration: 2 ms Memory Size: 128 MB Max Memory Used: 38 MB |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
They look good so far. Now you filter the results to show only the info level logs
Query
fields @timestamp, @message
| filter @message like "INFO"
| sort by @timestamp asc
Output
---------------------------------------------------------------------------------------------------------------------------------
| @timestamp | @message |
| ----------------------- | -------------------------------------------------------------------------------------------------- |
| 2022-06-06 02:13:39.120 | [INFO] 2022-06-06T02:13:39.120Z fdc95028-6bdd-4ae7-8d0d-b3fcffe17749 {'packet': 1, 'latency': 30} |
| 2022-06-06 02:13:39.120 | [INFO] 2022-06-06T02:13:39.120Z fdc95028-6bdd-4ae7-8d0d-b3fcffe17749 {'packet': 2, 'latency': 999} |
| 2022-06-06 02:13:39.120 | [INFO] 2022-06-06T02:13:39.120Z fdc95028-6bdd-4ae7-8d0d-b3fcffe17749 {'packet': 3, 'latency': 30} |
| 2022-06-06 02:13:39.120 | [INFO] 2022-06-06T02:13:39.120Z fdc95028-6bdd-4ae7-8d0d-b3fcffe17749 {'packet': 4, 'latency': 30} |
| 2022-06-06 02:13:39.120 | [INFO] 2022-06-06T02:13:39.120Z fdc95028-6bdd-4ae7-8d0d-b3fcffe17749 {'packet': 5, 'latency': 25} |
| 2022-06-06 02:13:39.120 | [INFO] 2022-06-06T02:13:39.120Z fdc95028-6bdd-4ae7-8d0d-b3fcffe17749 {'packet': 6, 'latency': 30} |
| 2022-06-06 02:13:39.120 | [INFO] 2022-06-06T02:13:39.120Z fdc95028-6bdd-4ae7-8d0d-b3fcffe17749 {'packet': 7, 'latency': 30} |
| 2022-06-06 02:13:39.120 | [INFO] 2022-06-06T02:13:39.120Z fdc95028-6bdd-4ae7-8d0d-b3fcffe17749 {'packet': 8, 'latency': 30} |
| 2022-06-06 02:13:39.120 | [INFO] 2022-06-06T02:13:39.120Z fdc95028-6bdd-4ae7-8d0d-b3fcffe17749 {'packet': 9, 'latency': 50} |
| 2022-06-06 02:13:39.121 | [INFO] 2022-06-06T02:13:39.120Z fdc95028-6bdd-4ae7-8d0d-b3fcffe17749 {'packet': 10, 'latency': 10} |
---------------------------------------------------------------------------------------------------------------------------------
Since you are an extreme person, you want to see the minimum and maximum latencies. Thanks to the logs being structured, you know that the queries can use the latency field.
Query
fields @timestamp, @message
| filter @message like "INFO"
| stats min(latency) as minLatency
Output
--------------
| minLatency |
| ---------- |
| 10 |
--------------
And for the maximum latency
Query
fields @timestamp, @message
| filter @message like "INFO"
| stats max(latency) as maxLatency
Output
--------------
| maxLatency |
| ---------- |
| 999 |
--------------
Hmmm… that’s quite a difference. What about the average latency?
Query
fields @timestamp, @message
| filter @message like "INFO"
| stats avg(latency)
Output
----------------
| avg(latency) |
| ------------ |
| 126.4 |
----------------
Maybe not that bad for another dimension. Now you want to extract some percentiles
Query
fields @timestamp, @message
| filter @message like "INFO"
| stats pct(latency, 10) as p10,
pct(latency, 25) as p25,
pct(latency, 50) as p50,
pct(latency, 75) as p75,
pct(latency, 95) as p95
Output
-------------------------------
| p10 | p25 | p50 | p75 | p95 |
| --- | --- | --- | --- | --- |
| 10 | 30 | 30 | 30 | 999 |
-------------------------------
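As a sanity check, the same numbers fall out of a quick local computation over the ten latency values from the logs (note that pct() in Logs Insights is an approximation, so percentiles from other tools may differ slightly):

```python
import statistics

latencies = [30, 999, 30, 30, 25, 30, 30, 30, 50, 10]

assert min(latencies) == 10                       # minLatency
assert max(latencies) == 999                      # maxLatency
assert sum(latencies) / len(latencies) == 126.4   # avg(latency)
assert statistics.median(latencies) == 30         # matches p50
```

The single 999 ms outlier is what drags the average up to 126.4 while the median stays at 30, which is exactly why the percentile view is worth the extra query.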
It’s getting late; you save the results, finish your coffee and call it a day. Cross-dimensional analytics are now a problem of the future.
Conclusion
Structured logging is a good idea. It’ll help you with automation and may show new ways to explore your application’s behaviour, especially if you pair it with services like AWS CloudWatch Logs Insights.
Check the list below to get more valuable information.
See you later!
Additional resources