Graphite Shows Metrics But No Data - Troubleshooting
source link: https://blog.jakubholy.net/2014/05/05/graphite-shows-metrics-but-no-data-troubleshooting/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
Update: Graphite data gotchas that got me
(These gotchas explain why I did not see any data.)- Graphite shows aggregated, not raw data if the selected query period (24h by default) is greater than the retention period of the highest precision. F.ex. with the schema "1s:30m,1m:1d,5m:2y" you will see data at the 1s precision only if you select period less than or equal to the past 30 minutes. With the default one, you will see the 1-minute aggregates. This applies both to the UI and whisper-fetch.py.
- Aggregation drops data unless by default at least 50% of available data slots have values (xFilesFactor=0.5). I.e. if your app sends data at a rate more than twice slower than Graphite expects them, they will never show up in aggregates. F.ex. with the schema "1s:30m,1m:1d,5m:2y" you must sends data at least 30 times within a minute for them to show in an aggregate.
Lesson learned: Always send data to Graphite in *exactly* same rate as its highest resolution
As described above, if you send data less frequently than twice the highest precision (if 1s => send at least every 2s), aggregation will ignore the data, with the default xFilesFactor=0.5 (a.k.a. min 50% of values reqired factor). On the other hand, if you send data more frequently than the highest precision, only the last data point received in each of the highest precision periods is recorded, others ignored - that's why f.ex. statsD flush period must = Graphite period.Exploring the whisper storage
We can go directly to itswhisper
storage and see is there f.ex. for the metric ring.request.active.count
:
ll -h /opt/graphite/storage/whisper/ring/requests/rate/count.wsp -rw-r--r-- 1 root root 2.5M May 5 11:01 /opt/graphite/storage/whisper/ring/requests/rate/count.wspOK, so the file is there and is not empty. What does it contain?
$ whisper-fetch.py ./count.wsp | tail -n1 1399287780 NoneThe data is hard to read, let's pretty print it:
$ whisper-fetch.py --pretty ./count.wsp | tail -n1 Mon May 5 11:02:00 2014 NoneOK, so the data is as expected but the value is None. Let's verify that:
$ whisper-fetch.py --pretty --json ./count.wsp { "start" : 1399193520, "end" : 1399279920, "step" : 60, "values" : [null, null, ...] }
We can get the same data also via the API: http://graphite-webapp.example.com/render?from=-24h&target=ring.request.active.count&title=Testing&rawData=true
So the values are indeed null. According to the docs: "If no value is recorded at a particular timestamp bucket in a series, the value will be None (null).".
We were able to find out what data and for what period Graphite has by reading directly its whisper files. The next step will be to find out why the values are null.
Update: What did I do wrong? I have not noticed that whisper-fetch shows me data aggregated to the next, 1 min precision, because I have asked for a period longer than the 30 min rentention period of my 1s precision (the default period of 24h). I also did not know that aggregation results in None if there are too few data points.
I run Graphite via Nginx so /var/log/nginx/error.log and /var/log/gunicorn/graphite.log might help.
Graphite's own logs are f.ex. /opt/graphite/storage/log/carbon-cache/carbon-cache-a/listener.log (and console.log, creates.log etc.) and /opt/graphite/storage/log/webapp/*.log.
Experimenting
Lets manually push some data to Graphite, namely its data collecting daemon Carbon, to see what happens.echo "jakub.test 42 $(date +%s)" | nc 0.0.0.0 2003 # Carbon listens at 2003
A minute or so later:
$ whisper-fetch.py --pretty /opt/graphite/storage/whisper/jakub/test.wsp | head -n1 Sun May 4 12:19:00 2014 None $ whisper-fetch.py --pretty /opt/graphite/storage/whisper/jakub/test.wsp | tail -n1 Mon May 5 12:09:00 2014 None $ whisper-fetch.py --pretty /opt/graphite/storage/whisper/jakub/test.wsp | grep -v None | wc -l 0What is happening? Carbon has created the metric and filled with "no data" for the past ~ 24 hours (it is, I believe, the default behavior to automatically add data for each period in the past 24h when creating a metric) but the value I sent to it is not there.
ngrep: What data is being sent?
Let's us see what data is actually being sent to carbon by intercepting the traffic on port 2003 with ngrep (thanks to Tim Zetha for the tip):graphite-server$ sudo ngrep -d any port 2003 & # Intercept packets on any (eth, lo, ..) device, port 2003 interface: any filter: (ip or ip6) and ( port 2003 ) graphite-server$ echo -e "jakub.test2 45 $(date +%s)" | sudo nc localhost 2003 #### T 127.0.0.1:34696 -> 127.0.0.1:2003 [AP] jakub.test2 45 1399362193. ####^Cexit 23 received, 0 dropped"T" means this is TCP (and not f.ex. U[DP]), the letters in [] are TCP flags as printed by ngrep (Ack, Fin = no more data, Sync = during conn setup, ...). A "#" is reportedly displayed for each non-matching packet (use -e[mpty] to print more of them). Notice that ngrep captures both in- and out-going packets.
------
PS: I have asked for tips at StackExchange.
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK