The modulo and integer division operators are widely used by
programmers, regardless of the language used,
the type of application or its goal.
Modern off-the-shelf processors provide
machine instructions to perform such operations.
For instance x86 architecture has the
instruction that performs a division and places the reminder in the
Division and modulo operations are frequently executed as
recurrent parts of more complex functionalities.
Their impact on performance is, however, often underestimated.
Divisions in modern architectures, in fact,
may take several clock cycles, depending on the operands’ values.
Other integer or bitwise operations, on the other hand, usually
take only one cycle.
Due to their high frequency of execution and possibly longer latency,
modulo and division should be avoided in favor of bitwise
operations whenever possible.
How to avoid division and modulo?
Modulo can be easily translated into a bitwise
the divisor is a power of two. Consider, for instance,
the following C code:
It can be translated into:
In general, if
divisor is a power
n of two, the modulo operation can be translated to
a bitwise AND with
the integer division
value / divisor corresponds to
a right shift of
value >> n.
What is the real impact of modulo?
Saying that division/modulo should be avoided when possible
requires some evidence to be provided.
The question is: does the use of division/modulo
operation have a real impact on the
performance of the application when compared to
using the bitwise and?
To answer to this question I run the following C program on an x86 machine:
And compared its execution times with the same program
implemented using binary AND (
Results are shown below (1000 runs of the program
As you can see, in the average case the program with
the modulo operation is 62% slower than the one
It’s not always possibile to get rid of modulo and division in favor of bitiwise operations. However, there are cases in which this is possible. If you are in need for a circular buffer, for instance, of size s you might consider setting the size to the power of two closer to s to speed up performance on accessing the buffer. If s is close to the nearest power of two then the space wasted will be totally worth it.