Cache (pronounced like cash)
A cache is used to speed access to memory. There are two major ideas that
govern the effectiveness of a cache:
- When going to memory to fill the cache, get more code or data than is needed
- Hopefully, the extra code or data that is fetched will be used in the next
  few instructions
Since it takes so long to make a new request to RAM, it makes sense
to fetch a lot of code or data every time that RAM is accessed. Of course,
if the code or data that is fetched is not used before it is purged from
the cache, then it was a waste of time to bring it into the cache. However,
computer programs tend to conform to the Locality Principle: the next
memory location needed from RAM will be close to the current memory location.
Consider several situations for the locality principle:
- A loop without procedure calls
- A procedure that does not call other procedures
- Local variables in a function
- The elements of an array
In the first two examples, code is being referenced. When the first statement
for the loop or procedure is loaded from RAM, it will be placed into the
cache, along with many of the following instructions. Possibly all of the
instructions in the loop or function will be loaded into the cache during
the first read to RAM. Once instructions are in the cache, they can be read
much more quickly than if they were in RAM, so the execution cycle is sped
up. If the loop or function is large, then it may fill several lines of data
in the cache. In this case, every time a statement that is needed is not
in the cache, a call to RAM will occur, and another block of instructions
will be placed into the cache.
In the second two examples, data is being referenced. It is important to
note that all the local variables will be near each other in memory, as will
all the elements in an array. When the first local variable is loaded into
the cache, several more will be loaded at the same time. The same is
true for an array: when a reference is made to the array, several array
elements will be fetched into the cache. If there are a lot of local variables,
or a large array is being accessed, several cache lines may need to be filled.
Whenever a new cache line is filled, there will be a delay while RAM is being
accessed and the data is transferred to the cache. Any subsequent references
to data in that cache line will be much faster than accessing RAM again.
As a simple example, suppose that a cache line can hold either 4 instructions
or 4 data items. Consider the following program segment.
1 int a,b,c,d;
2 int arr[10];
3 a=0;
4 c=2;
5 d=3;
6 top: if (a >= 10) goto done;
7 b = c + d;
8 c = c + 1;
9 d = b * 3;
10 arr[a] = c + d;
11 a = a + 1;
12 goto top;
13 done:
Let's investigate how this program would get loaded into the cache. Assume
that the cache is empty when we start the program.
- Lines 1 and 2 do not affect the cache; they only declare data
- When the instruction on line 3 is needed, the cache is checked. Since it is
  not in the cache yet, it is fetched from RAM. At the same time, instructions
  4, 5, and 6 are fetched from RAM and placed into the same cache line
- When line 3 is executed, it references the variable a; at this time,
  a would be loaded into a new cache line. At the same time, variables
  b, c, and d would be loaded into the same cache line
- When lines 4, 5, and 6 are needed, they are already in the cache, so no RAM
  access is needed. Similarly, all the data referenced by these instructions
  is already in the cache, so no RAM access is needed here either
- When line 7 is needed, it is not in the cache, so it is fetched and placed
  into a new cache line. At the same time, instructions 8, 9, and 10 are
  loaded into the same cache line
- When lines 8 and 9 are needed, they are already in the cache, so no RAM
  access is needed. Similarly, all the data referenced by these instructions
  is already in the cache, so no RAM access is needed here either
- When line 10 is executed, it references item arr[0], so it is brought
  into a new cache line. At the same time, arr[1], arr[2], and
  arr[3] are brought into the same cache line
- When line 11 is needed, it is not in the cache, so it is fetched and placed
  into a new cache line. At the same time, instructions 12 and 13 are loaded
  into the same cache line. The instruction that follows line 13 would be
  added to this cache line as well, even if it isn't part of this program
- When lines 11 and 12 are executed, they are now in the cache, and so is all
  the data that they need, so no RAM access is needed
- The next three iterations of the loop do not require any accesses to RAM.
  All the instructions have been loaded into the cache, as well as the
  variables a, b, c, and d and the elements arr[0], arr[1], arr[2], and
  arr[3], so no RAM accesses are needed
- On the iteration when variable a is equal to 4, line 10 causes RAM
  to be accessed. At this time, arr[4], arr[5], arr[6], and arr[7] are
  loaded into a new cache line. No other RAM accesses are needed during
  this iteration, nor during the next three iterations of the loop
- On the iteration when variable a is equal to 8, line 10 causes RAM
  to be accessed. At this time, arr[8], arr[9], and whatever two data items
  happen to follow the array in memory are loaded into a new cache line.
  No other RAM accesses are needed during this iteration, nor during the
  final iteration of the loop
Some observations about this process:
- Three cache lines are needed to store the instructions
- Four cache lines are needed to store the data
- Extra code or data may be placed in a cache line that might not even be part
  of this program
- Once all the data and instructions are in the cache, the code executes very
  quickly
- Every time RAM is accessed, the code runs slower
A question to consider about this process: What would happen if there were
only 4 cache lines?