Predicting Application Workload via Extreme Value Analysis
Scalability plays a key role in the success of an application or service. Being able to remain responsive in the face of heavy load is crucial for a growing application. A lot of effort is spent designing scalable systems with the aid of state-of-the-art paradigms, frameworks and tools. Developing a scalable system is, however, only part of the solution. A system capable of scaling is of no use if we do not know when to scale it. The point is that we want our services to scale up/out when the load is intensifying, but at the same time we want to avoid scaling up/out when it is not needed. Scaling, in fact, never comes for free: additional resources mean additional costs that we want to avoid unless they are really necessary.
To complicate this scenario, we might have committed to provide our users with a specific service level (e.g. through a Service Level Agreement). In these cases we cannot afford to scale up/out only once the load has already reached critical levels; we want instead to be proactive. That is, we scale up/out before the Quality of Service offered to the users degrades. Proactiveness, however, often leads to over-provisioning and non-cost-effective decisions.
I am currently working on a web application that does some heavy sound processing (such as the beat detection I discussed here) on user-submitted audio files. I want the processing service to scale proactively without ever degrading the user experience (i.e. avoiding long wait times). I am able to scale out these services, but I don't want to waste money on unused resources, so I was after some tool to help me predict the load on my system and decide more accurately when to scale. Several tools allow engineers to monitor the resource consumption (CPU, memory, network) and the load of their systems. Few techniques, however, are available to guide proactive scaling. Few tools, in fact, help engineers and system designers quantify possible future peaks in their system's workload, and almost none of these tools provides a quantifiable confidence for the predicted peak. Workload prediction (as well as resource demand prediction) is often performed by experts on the basis of time series and real-time information, but it is hardly ever guided by statistical/scientific evidence.
In this scenario, Extreme Value Analysis can help predict workload peaks starting from time series and can therefore serve as a fruitful aid to proactive scaling.
Extreme Value Analysis
Extreme Value Analysis (EVA) is a statistical theory that aims at assessing the probability of extreme events given an ordered sample of a random variable. EVA is applied in several fields such as structural engineering, finance and traffic prediction. Recent studies also apply EVA to the prediction of network congestion.
For practical applications of EVA, the Block Maxima technique can be used. With block maxima, the input time series is divided into blocks of fixed size, and only the maximum value of each block is kept. The resulting sample is called the block maxima series. EVA then fits a Generalized Extreme Value (GEV) distribution to the block maxima data. The GEV distribution is a family of continuous distributions that approximates well the behaviour of maximum values in a time series.
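As a rough illustration, this is what the block maxima step can look like in Python (the block size and the synthetic hourly data below are arbitrary choices made for the sake of the example):

import numpy as np

def block_maxima(series, block_size):
    # Split the series into consecutive blocks of fixed size
    # (dropping a trailing partial block) and keep each block's maximum.
    n_blocks = len(series) // block_size
    blocks = np.asarray(series[:n_blocks * block_size]).reshape(n_blocks, block_size)
    return blocks.max(axis=1)

# Example: synthetic hourly request counts grouped into daily blocks of 24 samples
hourly_requests = np.random.poisson(lam=4000, size=24 * 30)
daily_maxima = block_maxima(hourly_requests, block_size=24)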
Once a GEV continuous distribution has been fitted to the data, a survival function can be extracted from it. The survival function (also known as the reliability function or complementary cumulative distribution function) gives, for any value x, the probability p that a value higher than x occurs in the process represented by the original data. With such a function we can choose a probability value p and extract the corresponding value x such that the probability of witnessing a value higher than x is p. That is, we can not only identify peaks but also quantify the probability of exceeding them.
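More formally, if \(F\) is the cumulative distribution function of the fitted GEV distribution, its survival function is \(S(x) = P(X > x) = 1 - F(x)\), and the predicted peak for a chosen probability \(p\) is the value \(x = S^{-1}(p)\).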
In the figure above an example of a survival function is shown. As you can see from the plot, the value 6527 is associated with the probability \(10^{-3}\), which means that, in the process represented by the input time series, the probability of observing values that exceed 6527 is only \(10^{-3}\).
Predicting Workload
In the context of predicting workload peaks, the time series fed to Extreme Value Analysis can represent traffic load, memory consumption, CPU usage and so on. Consider, for instance, traffic load; the same reasoning also applies to the other metrics.
Given a time series representing traffic load, we can apply EVA to that data. Extreme Value Analysis fits a GEV continuous distribution to the block maxima, and from the corresponding survival function we can extract traffic values that are only exceeded with arbitrarily low probabilities. For instance, let's pick a probability \(p = 10^{-3}\) and its corresponding traffic load value x. We now have not only a prediction of the future traffic load but also the probability that this prediction is exceeded. Scaling up/out our service so that it can handle the traffic load x means knowing that the probability of overloading the service is \(p = 10^{-3}\). That is, 99.9% of uptime.
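The whole pipeline can be sketched with SciPy's genextreme distribution (this is just an illustration with made-up block maxima values, not the pyscale implementation described below):

from scipy.stats import genextreme

# Toy block maxima series (e.g. daily maxima of simultaneous requests)
maxima = [4120, 4480, 3950, 4710, 4300, 4050, 4390, 4620, 4210, 3980,
          4150, 4330, 4500, 4060, 4270, 4440, 3890, 4180, 4560, 4230]

# Fit a GEV distribution to the block maxima
shape, loc, scale = genextreme.fit(maxima)

# Value exceeded with probability p = 10^-3, i.e. the inverse survival function
p = 1e-3
peak = genextreme.isf(p, shape, loc=loc, scale=scale)
print("Provision for about %.0f simultaneous requests to keep the "
      "overload probability per block at %.0e" % (peak, p))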
Example
Consider as an example the following time series representing the maximum daily number of simultaneous requests to a service (for the last 20 days):
[ 4352, 4472, 3847, 4915, 4969, 4333, 4381, 4091, 4135, 4160,
3534, 4598, 4086, 3788, 4038, 3396, 4118, 3822, 4333, 4034 ]
If we apply Extreme Value Analysis to that data we get the following survival function:
Scaling out the service so that it can handle 6527 simultaneous requests ensures that it will be overloaded only with a probability \(p = 10^{-3}\), which corresponds to a 99.9% daily uptime.
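Put differently, since each value in this series is a daily maximum, an exceedance probability of \(10^{-3}\) means that on average at most about one day in a thousand should see the load exceed the provisioned capacity, i.e. \(1 - 10^{-3} = 99.9\%\) of days.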
In addition, the more data we have, the more accurately we can predict possible peaks. If we extend the previous example data with 20 more values (for a total of 40 days):
[ 4352, 4472, 3847, 4915, 4969, 4333, 4381, 4091, 4135, 4160,
3534, 4598, 4086, 3788, 4038, 3396, 4118, 3822, 4333, 4034,
3738, 4670, 3346, 4070, 3556, 3810, 3984, 3892, 4615, 3634,
4016, 3378, 4441, 3800, 4182, 3879, 3926, 3625, 4687, 3366 ]
If we apply Extreme Value Analysis to that data we get the following survival function:
This survival function associates a lower workload value (6031) with \(p = 10^{-3}\).
Source Code
You can find the Python pyscale module that implements this workload prediction on GitHub. To predict a load peak at a given probability you can instantiate an object of the class LoadPredictor as:
from pyscale import LoadPredictor
load_data = [ 4352, 4472, 3847, 4915, 4969, 4333, 4381, 4091, 4135, 4160,
3534, 4598, 4086, 3788, 4038, 3396, 4118, 3822, 4333, 4034 ]
predictor = LoadPredictor(load_data)
load_peak = predictor.predict_load(0.001)
You can also produce a plot of the survival function by instantiating an object of the class PredictionPlotter:
from pyscale import LoadPredictor, PredictionPlotter
load_data = [ 4352, 4472, 3847, 4915, 4969, 4333, 4381, 4091, 4135, 4160,
3534, 4598, 4086, 3788, 4038, 3396, 4118, 3822, 4333, 4034 ]
predictor = LoadPredictor(load_data)
plotter = PredictionPlotter(predictor)
plotter.xlabel('Requests')
plotter.ylabel('Probability')
plotter.plot("plot.png", 0.001)
Notes
I would like to compare the workloads predicted by this approach with real load data from a growing web service. That would allow me to really understand the cost impact of scaling when guided by these predictions. Unfortunately, I don't have real-world data for a thorough evaluation right now, but I should have some reasonably soon (as soon as I get that data I will update this post). If you ever give this approach a try and collect some data from your services/applications, please contact me, as I would love to have a look at the results.