Cache (pronounced like cash)
A cache is used to speed access to memory. There are two major ideas that
govern the effectiveness of a cache:
- When going to memory to fill the cache, get more code or data than is needed
- Hopefully, the extra code or data that is fetched will be used in the next
  few instructions
Since it takes so long to make a new request to RAM, it makes sense
to fetch a lot of code or data every time that RAM is accessed. Of course,
if the code or data that is fetched is not used before it is purged from
the cache, then it was a waste of time to bring it into the cache. However,
computer programs tend to conform to the Locality Principle: the next
memory location needed from RAM will be close to the current memory location.
Consider several situations for the locality principle:
- A loop without procedure calls
- A procedure that does not call other procedures
- Local variables in a function
- The elements of an array
In the first two examples, code is being referenced. When the first statement
for the loop or procedure is loaded from RAM, it will be placed into the
cache, along with many of the following instructions. Possibly all of the
instructions in the loop or function will be loaded into the cache during
the first read to RAM. Once instructions are in the cache, they can be read
much more quickly than if they were in RAM, so the execution cycle is sped
up. If the loop or function is large, then it may fill several lines of data
in the cache. In this case, every time a statement that is needed is not
in the cache, a call to RAM will occur, and another block of instructions
will be placed into the cache.
In the second two examples, data is being referenced. It is important to
note that all the local variables will be near each other in memory, as will
all the elements in an array. When the first local variable is loaded into
the cache, several more will be loaded at the same time. The same is
true for an array: when a reference is made to the array, several array
elements will be fetched into the cache. If there are a lot of local variables,
or a large array is being accessed, several cache lines may need to be filled.
Whenever a new cache line is filled, there will be a delay while RAM is being
accessed and the data is transferred to the cache. Any subsequent references
to data in that cache line will be much faster than accessing RAM again.
As a simple example, suppose that a cache line can hold either 4 instructions
or 4 data items. Consider the following program segment.
1 int a,b,c,d;
2 int arr[10];
3 a=0;
4 c=2;
5 d=3;
6 top: if (a >= 10) goto done;
7 b = c + d;
8 c = c + 1;
9 d = b * 3;
10 arr[a] = c + d;
11 a = a + 1;
12 goto top;
13 done:
Let's investigate how this program would get loaded into the cache. Assume
that the cache is empty when we start the program.
- Lines 1 and 2 do not affect the cache; they only declare data
- When the instruction on line 3 is needed, the cache is checked. Since it is
  not in the cache yet, it is fetched from RAM. At the same time, instructions
  4, 5, and 6 are fetched from RAM and placed into the same cache line
- When line 3 is executed, it references the variable a; at this time,
  a would be loaded into a new cache line. At the same time, variables
  b, c, and d would be loaded into the same cache line
- When lines 4, 5, and 6 are needed, they are already in the cache, so no RAM
  access is needed. Similarly, all the data referenced by these instructions
  is already in the cache, so no RAM access is needed here either
- When line 7 is needed, it is not in the cache, so it is fetched and placed
  into a new cache line. At the same time, instructions 8, 9, and 10 are
  loaded into the same cache line
- When lines 8 and 9 are needed, they are already in the cache, so no RAM
  access is needed. Similarly, all the data referenced by these instructions
  is already in the cache, so no RAM access is needed here either
- When line 10 is executed, it references item arr[0], so it is brought
  into a new cache line. At the same time, arr[1], arr[2], and
  arr[3] are brought into the same cache line
- When line 11 is needed, it is not in the cache, so it is fetched and placed
  into a new cache line. At the same time, instructions 12 and 13 are loaded
  into the same cache line. The instruction that follows line 13 would be
  added to this cache line as well, even if it isn't part of this program
- When lines 11 and 12 are executed, they are now in the cache, and so is all
  the data that they need, so no RAM access is needed
- The next three iterations of the loop do not require any accesses to RAM.
  All the instructions have been loaded into the cache, as well as the
  variables a, b, c, and d and the elements arr[0], arr[1], arr[2], and
  arr[3], so no RAM accesses are needed
- On the iteration when variable a is equal to 4, line 10 causes RAM
  to be accessed. At this time, arr[4], arr[5], arr[6], and arr[7] are
  loaded into a new cache line. No other RAM accesses are needed during
  this iteration, nor during the next three iterations of the loop
- On the iteration when variable a is equal to 8, line 10 causes RAM
  to be accessed. At this time, arr[8], arr[9], and whatever two data items
  happen to follow the array in memory are loaded into a new cache line.
  No other RAM accesses are needed during this iteration, nor during the
  final iteration of the loop
Some observations about this process:
- Three cache lines are needed to store the instructions
- Four cache lines are needed to store the data
- Extra code or data may be placed in a cache line that might not even be part
  of this program
- Once all the data and instructions are in the cache, the code executes very
  quickly
- Every time RAM is accessed, the code runs slower
A question to consider about this process: What would happen if there were
only 4 cache lines?