Spring Boot Observability: Validating Tail Latency with Percentiles

Each application has different latency requirements: some need to respond very fast, while for others latency is less important. However, faster is always better.

In this post, we are going to talk about latency, how we collect latency metrics, and how percentiles can help us understand what’s happening in our applications.

TRY IT YOURSELF: You can find the source code of this post here.

Spring Boot Observability Series

If you like this post and are interested in hearing more about my journey as a Software Engineer, you can follow me on Twitter and travel together.

What is latency?

Latency is the amount of time a process takes to complete. For a web application, it is the time between when a request is sent and when its response is returned, as we can see in the following image:

The latency between a request and a response

Latency is measured in units of time, such as seconds or milliseconds. The lower the value, the better.

Why is latency important?

Well, depending on your application, your customers might get angry if a click takes too long. Latency matters most when your application is customer-facing, for instance, a RESTful API that returns your bank account information: if you open your bank app on your phone and it takes a minute to load, you will be annoyed; it is even worse if you are trying to pay with it while a queue of people waits behind you.

On histograms and percentiles

Now that we know what latency is and why it matters, we need to choose how to analyze latency data, and here histograms and percentiles give us great insight.

Let’s start with the data: we have an endpoint GET /customers with the following latency measurements, only for the users Daniel and Andres, sorted by latency:

User | Latency (seconds)

Now, to create a histogram, we need to choose how to group the latency values into buckets and then count how many requests fall into each interval:

Bucket (seconds) | Frequency
1 to 3 | 4
4 to 6 | 2
7 to 9 | 4

There, we can see that the most frequent buckets are 1 to 3 seconds and 7 to 9 seconds, with 4 requests each. The histogram looks as follows:

Buckets vs Frequency

That gives us a lot of insight into how requests to GET /customers behave, but we need to define the buckets beforehand, which is hard if we don’t know the application we want to analyze well, or if its latency keeps changing because of optimizations.
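The bucketing step can be sketched in Java, the language of the Spring Boot application. This is a minimal illustration, assuming whole-second latencies that are never smaller than the first bucket's lower bound; the class and method names (LatencyHistogram, histogram) are hypothetical, and the ten samples are the GET /customers latencies from the tables in this post.

```java
import java.util.Map;
import java.util.TreeMap;

class LatencyHistogram {

    // Groups whole-second latencies into fixed-width buckets and counts them.
    // The key of the returned map is each bucket's lower bound (1, 4, 7, ...).
    // Assumes every latency is >= start; empty buckets are not added to the map.
    static Map<Integer, Integer> histogram(int[] latencies, int start, int width) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int latency : latencies) {
            int lower = start + ((latency - start) / width) * width;
            counts.merge(lower, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // The ten GET /customers latencies (seconds) from the tables in this post.
        int[] latencies = {1, 2, 2, 3, 4, 4, 7, 7, 7, 8};
        int width = 3;
        histogram(latencies, 1, width).forEach((lower, frequency) ->
                System.out.println(lower + " to " + (lower + width - 1) + ": " + frequency));
    }
}
```

With a width of 3 starting at 1, this yields the same three buckets as the table (1 to 3: 4, 4 to 6: 2, 7 to 9: 4). Note that the width and starting point still have to be chosen up front, which is exactly the limitation described above.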

Let’s look at the data from another point of view: tail latency. I want to see how badly my API responds, using buckets I define in a more flexible way, as percentages. To do so, we sort the data by latency and mark each value with its position in the dataset, from 0 % to 100 %, as follows:

Latency (seconds) | Percentage (%)
1 | < 10
2 | < 20
2 | < 30
3 | < 40
4 | < 50
4 | < 60
7 | < 70
7 | < 80
7 | < 90
8 | < 100

There, we can see that 100 % of the requests take 8 seconds or less, 60 % of the requests take 4 seconds or less, and so on. To describe this in another way:

Bucket (%) | Tail latency (seconds)
100 – 0 | 8
90 – 0 | 7
60 – 0 | 4

This view of the data helps us better understand the behavior of the API: the worst case for a request is 8 seconds, as 100 % of requests are at or below that threshold. The second row is the most interesting one: 90 % of the requests take 7 seconds or less, which means 10 % of the requests are outliers that take longer, which is fine. The following is the histogram for that table:

Tail latency in seconds vs percentage buckets

Why is the 90 – 0 % bucket important? Because it tells us whether our system performs well or not: if 90 % of your requests take less than 1 second and 100 % take less than 5 seconds, you are responding fast to most of the requests (90 %) and have only a few slow outliers (10 %).

NOTE: It is usually better to use the 99 %, 98 % and 95 % percentiles, as they give better insight into the tail: you can find large differences even between the 99th and the 98th percentile.
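The tail-latency table can be reproduced with the nearest-rank percentile method: sort the samples and take the value at rank ceil(p / 100 × n). This is a minimal sketch, not how Micrometer or Prometheus compute percentiles (other definitions interpolate and can give slightly different values); TailLatency and percentile are hypothetical names, and the data is again the ten GET /customers latencies from this post.

```java
import java.util.Arrays;

class TailLatency {

    // Nearest-rank percentile: the smallest sample such that at least p % of all
    // samples are less than or equal to it. Requires samples and 0 < p <= 100.
    static int percentile(int[] latencies, int p) {
        if (latencies.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        int[] sorted = latencies.clone();
        Arrays.sort(sorted);
        // ceil(p * n / 100) with integer arithmetic, avoiding floating-point rounding.
        int rank = (p * sorted.length + 99) / 100;
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        int[] latencies = {1, 2, 2, 3, 4, 4, 7, 7, 7, 8};
        for (int p : new int[] {100, 90, 60}) {
            System.out.println(p + " - 0 %: " + percentile(latencies, p) + " seconds");
        }
    }
}
```

For these samples the method returns 8, 7 and 4 seconds for the 100 %, 90 % and 60 % buckets, matching the table; unlike the histogram sketch, no bucket boundaries had to be chosen beforehand.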

Histograms and percentiles through time

We saw how to model latency data using percentiles to get more insightful results. However, in a real system, we need to take another variable into account: time.

I would like to see how latency behaves over time, while still using tail latency with percentiles. The following data shows that:

Bucket (%) | Tail latency (seconds) | When (time)
100 – 0 | 8 | 08:01:00
90 – 0 | 7 | 08:01:00
60 – 0 | 4 | 08:01:00
100 – 0 | 7 | 08:02:00
90 – 0 | 5 | 08:02:00
60 – 0 | 2 | 08:02:00
100 – 0 | 6 | 08:03:00
90 – 0 | 4 | 08:03:00
60 – 0 | 3 | 08:03:00

There, each bucket holds the value for a given moment in time, and this is what the histogram looks like:

Latency through time with percentiles

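There, we can see the whole picture of what’s happening in the system over time: at 08:01:00, 100 % of the requests respond in 8 seconds or less, and that improves to 6 seconds by 08:03:00. The 90 % bucket also improves, from 7 to 4 seconds, and the 60 % bucket ends lower than it started (from 4 to 3 seconds), after a dip to 2 seconds at 08:02:00.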
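To build a table like this from raw requests, one option is to group the samples into one-minute windows and compute a percentile per window. The sketch below assumes that approach with nearest-rank percentiles; Sample, WindowedPercentiles and perMinute are hypothetical names, and the sample values are made up for illustration. In the Spring Boot setup, Prometheus does the windowing for us with rate(...[1m]), as in the Grafana query later in this post.

```java
import java.time.LocalTime;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class WindowedPercentiles {

    // One observed request: when it completed and how long it took (hypothetical type).
    static final class Sample {
        final LocalTime at;
        final double latencySeconds;

        Sample(LocalTime at, double latencySeconds) {
            this.at = at;
            this.latencySeconds = latencySeconds;
        }
    }

    // Nearest-rank percentile of a non-empty list, p in (0, 100].
    static double percentile(List<Double> latencies, int p) {
        double[] sorted = latencies.stream().mapToDouble(Double::doubleValue).sorted().toArray();
        int rank = (p * sorted.length + 99) / 100;
        return sorted[rank - 1];
    }

    // Truncates each timestamp to its minute, groups the samples by minute,
    // and returns the p-th percentile of each one-minute window, ordered by time.
    static Map<LocalTime, Double> perMinute(List<Sample> samples, int p) {
        Map<LocalTime, List<Double>> windows = new TreeMap<>();
        for (Sample sample : samples) {
            LocalTime minute = sample.at.truncatedTo(ChronoUnit.MINUTES);
            windows.computeIfAbsent(minute, key -> new ArrayList<>()).add(sample.latencySeconds);
        }
        Map<LocalTime, Double> result = new TreeMap<>();
        windows.forEach((minute, latencies) -> result.put(minute, percentile(latencies, p)));
        return result;
    }

    public static void main(String[] args) {
        // Made-up samples spread over two minutes, for illustration only.
        List<Sample> samples = List.of(
                new Sample(LocalTime.of(8, 1, 5), 1.0),
                new Sample(LocalTime.of(8, 1, 20), 8.0),
                new Sample(LocalTime.of(8, 1, 40), 7.0),
                new Sample(LocalTime.of(8, 2, 10), 2.0),
                new Sample(LocalTime.of(8, 2, 30), 5.0),
                new Sample(LocalTime.of(8, 2, 50), 7.0));
        for (int p : new int[] {100, 90, 60}) {
            System.out.println(p + " - 0 %: " + perMinute(samples, p));
        }
    }
}
```

Each line of output is one bucket as a time series (one value per minute), which is the shape a graph like the one above plots.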

Now that we have seen how to analyze latency using histograms and percentiles, let’s see how to implement it in Spring Boot.

Measuring RESTful API latency with Micrometer

Micrometer lets us configure latency metrics for a RESTful API through the Spring Boot configuration (application.yml), as follows:

management:
  metrics:
    distribution:
      percentiles[http.server.requests]: 0.5, 0.7, 0.95, 0.99
      percentiles-histogram[http.server.requests]: true
      slo[http.server.requests]: 10ms, 100ms

There, we have:

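  • http.server.requests: the metric Spring Boot records, through Micrometer, for incoming requests to the API
  • percentiles: the percentiles Micrometer calculates inside the application and publishes directly, in this case 50 – 0 %, 70 – 0 %, 95 – 0 % and 99 – 0 %. These precomputed values can't be aggregated across instances
  • percentiles-histogram: tells Micrometer to publish histogram buckets, which Prometheus can use to calculate any percentile with histogram_quantile
  • slo: adds histogram buckets at the given service-level boundaries (10 ms and 100 ms), so we can count how many requests meet each objective and alert on it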

With the Micrometer Prometheus registry in the application, these metrics are exposed for Prometheus to scrape, and to analyze them, we need a new Grafana dashboard.

Seeing the Micrometer Metrics

Now, after exercising the application with Gatling, as we did in previous posts, we can see the metrics we set up at http://localhost:8080/actuator/prometheus:

http_server_requests_seconds_bucket{application="spring-observability",exception="None",method="GET",outcome="SUCCESS",status="200",uri="/customers",le="0.001048576"} 0.0
http_server_requests_seconds_bucket{application="spring-observability",exception="None",method="GET",outcome="SUCCESS",status="200",uri="/customers",le="0.001398101"} 0.0
http_server_requests_seconds_bucket{application="spring-observability",exception="None",method="GET",outcome="SUCCESS",status="200",uri="/customers",le="0.001747626"} 0.0
http_server_requests_seconds_bucket{application="spring-observability",exception="None",method="GET",outcome="SUCCESS",status="200",uri="/customers",le="0.002097151"} 3.0

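There, we see the buckets created by Micrometer: le is each bucket's upper bound in seconds, and the value is the cumulative number of requests that took at most that long (for example, 3 requests to /customers completed within about 2.1 ms).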
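Prometheus histogram buckets are cumulative, so the values never decrease as le grows. A minimal sketch of that layout follows, assuming hypothetical bucket bounds that include the 10 ms and 100 ms slo values and made-up observations; CumulativeBuckets and cumulative are illustrative names, and this models the exposition format rather than Micrometer's implementation. It also shows why an SLO bucket turns "share of requests under 100 ms" into a single division:

```java
import java.util.Locale;

class CumulativeBuckets {

    // Builds Prometheus-style cumulative bucket counts: counts[i] is the number of
    // observations <= upperBounds[i], and the extra last slot is the le="+Inf" bucket,
    // which always equals the total. upperBounds must be sorted in ascending order.
    static long[] cumulative(double[] upperBounds, double[] observations) {
        long[] counts = new long[upperBounds.length + 1];
        for (double value : observations) {
            for (int i = 0; i < upperBounds.length; i++) {
                if (value <= upperBounds[i]) {
                    counts[i]++;
                }
            }
            counts[upperBounds.length]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical bounds (seconds): 0.01 and 0.1 mirror the 10ms and 100ms slo values.
        double[] upperBounds = {0.01, 0.1, 0.5};
        // Made-up request latencies in seconds.
        double[] observations = {0.004, 0.008, 0.03, 0.09, 0.2, 0.7};
        long[] counts = cumulative(upperBounds, observations);
        for (int i = 0; i < upperBounds.length; i++) {
            System.out.println("le=\"" + upperBounds[i] + "\" " + counts[i]);
        }
        long total = counts[upperBounds.length];
        System.out.println("le=\"+Inf\" " + total);
        // Share of requests that met the 100 ms objective (index 1 is the 0.1 bound).
        System.out.printf(Locale.ROOT, "within 100 ms: %.2f%n", (double) counts[1] / total);
    }
}
```

For these observations the buckets hold 2, 4, 5 and 6 requests, so 4 out of 6 requests met the 100 ms objective.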

Histogram and Percentiles for Latency on Grafana

There is no pre-built Grafana dashboard for latency percentiles, so we are going to build one.

  1. Open Grafana on http://localhost:3000/
  2. Go to Dashboards / Manage / New Dashboard
  3. Click on Add an Empty Panel
  4. Set the title to /customers/transform; for simplicity, we are going to create a panel only for that URI
  5. In query A, add the following: histogram_quantile(0.99, sum(rate(http_server_requests_seconds_bucket{status!~"5..", uri="/customers/transform"}[1m])) by (le)). There, we ask Prometheus to estimate the 99th percentile from the histogram buckets, using the per-second rate of each bucket over the last minute, for requests to /customers/transform that don’t fail with an HTTP 5xx error. For more information about Prometheus histogram_quantile, check here.
  6. In the Legend field, enter 99 %
  7. Repeat steps 5 and 6 for other percentiles (90, 50, etc.), changing the first argument of histogram_quantile accordingly (0.9, 0.5, etc.)
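histogram_quantile does not see the raw latencies, only bucket counts, so it estimates the quantile by linear interpolation inside the bucket where the target rank falls. The sketch below is a simplified model of that documented behavior, assuming positive bucket bounds, a final +Inf bucket and 0 < q <= 1; HistogramQuantile and quantile are hypothetical names, and the bucket counts in main are made up. Because the query applies rate(...[1m]) to every bucket, the real inputs are per-second rates rather than raw counts, but the interpolation only depends on their ratios.

```java
class HistogramQuantile {

    // Estimates the q-quantile from cumulative buckets by linear interpolation,
    // a simplified model of Prometheus' histogram_quantile.
    // upperBounds: ascending, positive, last element Double.POSITIVE_INFINITY.
    // cumulative[i]: observations (or per-second rates) <= upperBounds[i].
    static double quantile(double q, double[] upperBounds, double[] cumulative) {
        int n = upperBounds.length;
        if (q <= 0 || q > 1 || n < 2 || cumulative.length != n
                || upperBounds[n - 1] != Double.POSITIVE_INFINITY) {
            throw new IllegalArgumentException("need 0 < q <= 1 and >= 2 buckets ending in +Inf");
        }
        double total = cumulative[n - 1];
        if (total == 0) {
            return Double.NaN; // no observations in the window
        }
        double rank = q * total;
        // Find the first bucket whose cumulative count reaches the rank.
        int b = 0;
        while (cumulative[b] < rank) {
            b++;
        }
        if (b == n - 1) {
            // The rank falls in the +Inf bucket: return the highest finite bound.
            return upperBounds[n - 2];
        }
        double bucketStart = (b == 0) ? 0.0 : upperBounds[b - 1];
        double countBefore = (b == 0) ? 0.0 : cumulative[b - 1];
        double inBucket = cumulative[b] - countBefore;
        // Assume observations are spread evenly inside the bucket.
        return bucketStart + (upperBounds[b] - bucketStart) * ((rank - countBefore) / inBucket);
    }

    public static void main(String[] args) {
        // Made-up cumulative counts for buckets le=0.1, 0.5, 1.0 and +Inf (seconds).
        double[] upperBounds = {0.1, 0.5, 1.0, Double.POSITIVE_INFINITY};
        double[] cumulative = {10, 60, 90, 100};
        for (double q : new double[] {0.5, 0.9, 0.99}) {
            System.out.println(q + " -> " + quantile(q, upperBounds, cumulative));
        }
    }
}
```

With these buckets, the median is estimated at about 0.42 seconds, and the 0.99 estimate is capped at 1.0, the highest finite bound. That cap illustrates why the accuracy of histogram_quantile depends on where the bucket boundaries are.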

NOTE: You can find the dashboard here.

After running the Gatling tests, you will see something like the following:

Grafana dashboard for tail latency

There, we can see how the tail latency of requests behaves over time.

Final Thought

Measuring the latency of our applications is important: we learned what latency is and how histograms and percentiles can help us analyze it.

Monitoring these metrics will help you avoid pitfalls, such as adding new code that doesn’t perform as you expect. You can also run automated performance tests in lower environments to check that new code doesn’t degrade the application’s performance.

