The goal of this post is to determine how latency is impacted if the service time is cut in half. I was watching a great presentation by Martin Thompson where he mentioned this example, and I wanted to freshen up my queuing theory a bit and do some calculations.

## Lots of formulas

Let’s first get some formulas out of the way.

The mean latency as a function of utilization:

\[R=\dfrac{1}{\mu(1-\rho)}\]

- \(R\) is latency (residency time)
- \(\mu\) is service rate
- \(\rho\) is utilization

\(\mu\) is the reciprocal of the service time \(S\).

\[\mu=\dfrac{1}{S}\]

And if we fill this in and simplify the formula a bit, we get the following:

\[R=\dfrac{1}{\dfrac{1}{S}(1-\rho)}\]

\[R=\dfrac{S}{1-\rho}\]

Another part we need is the calculation of the utilization:

\[\rho = S \lambda\]

- \(\rho\) is the utilization
- \(\lambda\) is the arrival rate
- \(S\) is the service time

This formula is known as the Utilization Law or Little’s Microscopic Law.
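The two formulas so far translate directly into code. The following is a minimal sketch, assuming the simple single-queue model used above; the function names `latency` and `utilization` are illustrative and do not come from Martin's talk.

```python
def latency(service_time: float, rho: float) -> float:
    """Mean latency R = S / (1 - rho); only defined for 0 <= rho < 1."""
    if not 0.0 <= rho < 1.0:
        raise ValueError("utilization must be in the range [0, 1)")
    return service_time / (1.0 - rho)


def utilization(service_time: float, arrival_rate: float) -> float:
    """Utilization Law: rho = S * lambda."""
    return service_time * arrival_rate


# Example inputs (assumed values, chosen only for illustration).
print(latency(0.1, 0.5))      # S = 0.1 s at 50% utilization
print(utilization(0.1, 5.0))  # S = 0.1 s at 5 requests/second
```

The guard on `rho` is there because the latency formula diverges as utilization approaches 1.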

The Utilization Law can be rearranged to give the arrival rate as a function of utilization and service time:

\[\lambda = \dfrac{\rho}{S}\]

## Comparing the latencies

In Martin’s example he has 2 services:

- an unoptimized service with \(S=0.1\) seconds
- an optimized service with \(S=0.05\) seconds

To make it a bit more tangible, let’s plot the latency curves as a function of utilization for both services.

- green: unoptimized service
- blue: optimized service

The x-axis is the utilization and the y-axis is the latency.
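The data behind such curves can be reproduced with a few lines of Python. This is a rough sketch rather than the code used for the plot; the sampled utilization values and all names are assumptions picked for illustration.

```python
def latency(service_time: float, rho: float) -> float:
    """Mean latency R = S / (1 - rho)."""
    return service_time / (1.0 - rho)


UNOPTIMIZED_S = 0.10  # seconds
OPTIMIZED_S = 0.05    # seconds

# Sample points along the x-axis (utilization); arbitrary choice.
utilizations = [0.1, 0.3, 0.5, 0.7, 0.9, 0.95]

# One row per utilization: (rho, unoptimized latency, optimized latency).
rows = [
    (rho, latency(UNOPTIMIZED_S, rho), latency(OPTIMIZED_S, rho))
    for rho in utilizations
]

print(f"{'rho':>5} {'unoptimized (s)':>16} {'optimized (s)':>14}")
for rho, r_slow, r_fast in rows:
    print(f"{rho:>5.2f} {r_slow:>16.3f} {r_fast:>14.3f}")
```

At any fixed utilization the optimized latency is half the unoptimized one, and both grow without bound as utilization approaches 1.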

The first thing we can calculate is the latency with a service time of 0.10 seconds and a utilization of 0.90, because this is the utilization Martin picked in his example:

\(R=\dfrac{S}{1-\rho}=\dfrac{0.10}{1.0-0.90}=1.0\) second.

The next part we need to determine is the arrival rate based on the utilization and service time. So with a service time of 0.10 and a utilization of 0.90 we get the following arrival rate:

\(\lambda = \dfrac{\rho}{S} = \dfrac{0.9}{0.1} = 9\) requests/second.

So if the service time is cut in half and we keep the same arrival rate, the utilization is also cut in half:

\[\rho = S \lambda = 0.05 \times 9 = 0.45\]

And if we fill this into the latency formula for the optimized service, we get:

\(R=\dfrac{S}{1-\rho} = \dfrac{0.05}{1-0.45}=0.091\) seconds.

We can see both the calculated latencies in the plot below:

So we went from a latency of 1 second to 0.091 seconds, which is an \(\dfrac{1}{0.091} \approx 11\) times improvement.
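The whole calculation can be checked end to end in a few lines. This is a sketch under the same assumptions as the formulas above, with the arrival rate derived from the Utilization Law as utilization divided by service time; the variable names are illustrative.

```python
S_UNOPTIMIZED = 0.10    # service time in seconds
S_OPTIMIZED = 0.05      # service time in seconds after the optimization
RHO_UNOPTIMIZED = 0.90  # utilization picked in the example


def latency(service_time: float, rho: float) -> float:
    """Mean latency R = S / (1 - rho)."""
    return service_time / (1.0 - rho)


# Arrival rate from the Utilization Law rearranged: lambda = rho / S.
arrival_rate = RHO_UNOPTIMIZED / S_UNOPTIMIZED

# Same arrival rate, half the service time: rho = S * lambda.
rho_optimized = S_OPTIMIZED * arrival_rate

r_before = latency(S_UNOPTIMIZED, RHO_UNOPTIMIZED)
r_after = latency(S_OPTIMIZED, rho_optimized)
improvement = r_before / r_after

print(f"utilization after optimization: {rho_optimized:.2f}")
print(f"latency before: {r_before:.3f} s, after: {r_after:.3f} s")
print(f"improvement: {improvement:.1f}x")
```

The improvement is computed from the unrounded latencies rather than from the rounded value 0.091.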

## Discrepancy

Martin’s claim is that the latency improvement is 20x. But the results based on the above formulas don’t agree with that. He makes a remark that the unit of processing is half the calculated latency, but I don’t understand what he means or how this is relevant for the latency calculation.

I haven’t configured comments yet on this new website. So if you want to comment, send me a message on Twitter.