
The Computer in a Nutshell

This article provides a very brief introduction to the notion of the stored program computer, or von Neumann machine, as it is sometimes called.

Before we can describe a computer, we have to say what it is. That is easier said than done because there are different types of computer and the nature of the computer has evolved over the years. For our current purposes, we can say that a computer is a device that executes or runs a program, and a program is a sequence of operations (or instructions). A computer must have three elements. First, a memory that holds the instructions. Second, a mechanism that fetches the instructions from memory. Third, a mechanism that executes the instructions. In practice, a real computer needs a fourth element: an input/output device that takes in information from the outside world and delivers new information to the outside world. In the case of the personal computer, typical input devices are the mouse and keyboard, and a typical output device is the screen. In the case of an aircraft, input devices may be sensors that measure speed, location, heading, and pitch, and output devices may be the actuators that move the flying surfaces (rudder, ailerons, and elevator) and the power throttles.

Figure 1 describes the organization of the conventional digital computer. This is simplified (partially because there may be multiple memories, control units, and even ALUs). However, in principle, it describes the elements that all digital computers have had since the late 1940s.

Figure 1 The stored program digital computer
We have not mentioned data. Where does that fit into the von Neumann model? Data is the information processed by the computer; for example, if you wrote a program to calculate y = p^2 + q^2, we could say that the data is y, p, q, p^2, and q^2. All input devices convert physical quantities (e.g., moving a mouse or hitting a key) into data.

The very first computers were really (by today’s standards) calculating devices. They could receive data, but there was no program in the sense that the term is now used; that is, the computer performed a sequence of operations that were predetermined by its hardware and could not (easily) be changed. You may wonder why anyone would make such a computer. In the 1940s the military wanted to calculate rapidly where to point large guns in order to hit distant targets. To do this, you have to perform complex calculations that take into account atmospheric conditions and even the rotation of the Earth.

The breakthrough in computer design came with the notion of the stored program. If a program is represented as data and stored in memory, you can change the data (program) and make the computer do anything you want. That is, you can program it.

The von Neumann stored program computer stores instructions in the same memory as the data used by the computer. In fact, there is absolutely no difference between data and instructions; both are just strings of bits. The only difference between instructions and data lies in how they are interpreted; indeed, one of the reasons that programs crash is that certain errors cause a computer to read data and then try to interpret it as an instruction. Even worse, it occurred to someone that you could inject data into a computer and then pass control to it (i.e., treat it as an instruction). If this data represents instructions that are harmful, we call it a computer virus.

Computers that store instructions and data in separate memories are called Harvard machines, to distinguish them from von Neumann stored program machines with one common program/data memory. Harvard machines exist today in special applications such as digital signal processing (e.g., operating with audio, video, and the analog signals you might get from an EEG or seismograph).

Fetch/execute Cycle

The von Neumann computer operates in a two-phase (also called two-beat or two-cycle) mode called fetch and execute. Obviously, a computer can’t execute an instruction that is located in memory until it has read the instruction from memory and decoded it. The computer is then able to execute the instruction it fetched from memory. As you can see, all instructions are executed in two parts: fetch the instruction and then execute it.

Now, suppose that the instruction itself accesses memory; for example, consider the operation STORE r1,1234 (store register r1 in memory location 1234). In order to execute this operation, the computer has to access memory location 1234. The computer has accessed memory twice; once to read the instruction and once to access the memory location specified by the instruction. Some instructions might specify two addresses; for example MOVE 1234,2000 (move the contents of memory location 2000 to memory location 1234). In this case, there are three memory accesses: fetch the instruction, read the contents of location 2000, store the data in location 1234.

The von Neumann computer accesses memory at least once per instruction. The data path between the CPU and memory can prove to be the limiting factor in the performance of a von Neumann computer. This limitation is called the von Neumann bottleneck. A lot of the effort in computer design goes into dealing with the von Neumann bottleneck.

Summary

The stored program digital computer is also called a von Neumann machine. Its principal characteristic is that data and instructions are stored in the same memory. Instructions are executed in two phases: fetch an instruction from memory and then execute that instruction. A limitation of the stored program digital computer is the path between the CPU and memory because it has to be used by both instructions and data.

Example

Suppose it takes T seconds to read memory, P seconds to execute an instruction, and a fraction f of instructions (0 < f < 1) requires one additional memory access. How long does it take to execute n instructions?

The n instructions each require T seconds to fetch, so the total time is nT.

Each instruction requires P seconds to execute, so the total execution time is nP

A fraction f of the instructions requires an additional memory cycle; that is, fnT in total.

The time to execute all instructions is nT + nP + fnT = n(T + P + fT) = n(P + T(1 + f)) seconds.
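We can check this arithmetic with a short C function. The figures used in main are illustrative assumptions, not values from the text.

#include <stdio.h>

/* Total time to execute n instructions using the formula
   n(P + T(1 + f)) derived above. */
double total_time(double n, double T, double P, double f)
{
    return n * (P + T * (1.0 + f));
}

int main(void)
{
    /* Illustrative values: one million instructions, a 50 ns memory
       read, a 10 ns execution phase, and 30% of instructions making
       one additional data access. */
    double t = total_time(1e6, 50e-9, 10e-9, 0.3);
    printf("Execution time: %g seconds\n", t);   /* 0.075 seconds */
    return 0;
}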

Components of the CPU

Before we describe the operation of a computer we need to define four terms: memory, register, ALU and bus.

Memory

Information in a computer (both instructions and data) is stored in its memory, which is sometimes called main memory, immediate access memory, or main store. In the PC world, this memory is frequently called DRAM (because it’s made from DRAM chips). The memory is called immediate access memory because data is immediately available, unlike in secondary memory (e.g., hard drives, CDs, DVDs, tape) where data can take a relatively long time to access. Note that when we say ‘immediately available’ the comparison is being made with disk drives, which are millions of times slower. In reality, it might take 50 ns (i.e., 50 x 10^-9 s) to access memory.

Main store can be described formally as ‘random access, read/write memory’, which means that the time taken to access a stored element at any random location is constant and that you can read data from it or write data to it.

Figure 2 illustrates memory conceptually. It has three ports (a port is a point of connection). An address port that contains the address (location) of the data to be accessed, a data port via which data enters the memory or leaves the memory, and a control port that determines the operation of the memory – read or write.

The address port has n bits, which means that 2^n different locations can be accessed. If there are 8 address lines, these can express all 2^8 = 256 values from 00000000, 00000001, 00000010, … , 11111110, 11111111. A memory chip might have 24 address lines, allowing 2^24 locations to be accessed.
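The relationship between address lines and addressable locations is just a power of two, as this small C sketch shows:

#include <stdio.h>

int main(void)
{
    /* The number of addressable locations for n address lines is 2^n. */
    for (unsigned n = 8; n <= 24; n += 8) {
        unsigned long locations = 1UL << n;
        printf("%2u address lines -> %lu locations\n", n, locations);
    }
    return 0;
}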

The data port is bidirectional because data flows into memory during a write operation and out of memory during a read operation. The width of a data port is typically 1, 4, 8, or 16 bits. If you want to implement a computer with a 64-bit word, you can use 64 chips with a 1-bit data bus side by side, or 16 4-bit chips, or 8 8-bit chips, or 4 16-bit chips.


Figure 2 Memory
The capacity of a memory (the total number of bits it stores) is given by the number of locations multiplied by the number of bits stored at each location. The result is normally expressed in bytes. Several examples are given below. The total capacity of a memory is expressed in KB, MB, GB, or even TB. However, you have to be careful. Traditionally, the K in memory terminology represents 1,024 (i.e., 2^10), so that 8,388,608 bytes is expressed as 8M bytes. However, some use decimal powers of 1,000 and express 8,388,608 bytes as 8.4M (because it looks bigger).

Example


Chip                           Locations            Capacity (bits)               Capacity (bytes)   Capacity

8 address lines, 16 data bits  2^8 = 256            256 x 16 = 4,096              512                ½K
16 address lines, 1 data bit   2^16 = 65,536        65,536 x 1 = 65,536           8,192              8K
24 address lines, 4 data bits  2^24 = 16,777,216    16,777,216 x 4 = 67,108,864   8,388,608          8M
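A short C program reproducing the table’s arithmetic (the three chip sizes are the examples above):

#include <stdio.h>

int main(void)
{
    /* Capacity in bits = 2^(address lines) x (data bits);
       divide by 8 for bytes. */
    struct { unsigned addr_lines, data_bits; } chip[] = {
        { 8, 16 }, { 16, 1 }, { 24, 4 }
    };
    for (int i = 0; i < 3; i++) {
        unsigned long locations = 1UL << chip[i].addr_lines;
        unsigned long bits = locations * chip[i].data_bits;
        printf("%2u address lines, %2u data bits: %9lu bits = %9lu bytes\n",
               chip[i].addr_lines, chip[i].data_bits, bits, bits / 8);
    }
    return 0;
}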


The control port of a memory includes the signals that select (activate) the memory and determine whether data is to be written into it or read from it.


In a high-level language like C, reading from memory is expressed by x = y[p] (i.e., read location p of memory array y and call the data x). Similarly, writing data is expressed as y[p] = x (i.e., store data x in location p of memory array y).
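For example, as a complete (if trivial) C program; the array name y and the address p are arbitrary:

#include <stdio.h>

int main(void)
{
    int y[256] = { 0 };   /* a tiny 'memory' of 256 locations */
    int p = 42;           /* an address */
    int x;

    y[p] = 1234;          /* write: store 1234 in location p */
    x = y[p];             /* read: copy the contents of location p to x */
    printf("x = %d\n", x);
    return 0;
}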


Registers


The register is a single memory location. You can regard it as a memory that is m bits wide and one location deep, where m is usually the wordlength of the computer. Memories have addresses; a register doesn’t need an address because there is only one location. Registers have names that usually describe how they are used; for example, instruction register describes the register that holds an instruction.


Registers are the fastest storage locations in a computer and are used to store temporary data during the fetching and executing of an instruction.


Buses

A bus is a data highway that connects registers, memories, and functional units such as ALUs (and even I/O devices). Although many devices can be connected to a bus, only one device at a time can put data on a bus. However, multiple devices can read data off a bus simultaneously.

The ALU

The ALU or arithmetic and logic unit is the device where data processing takes place. It has four ports: two input ports that accept two m-bit words, an output port that delivers an m-bit result, and a control input that defines the operation to be performed; for example, addition, subtraction, bit shifting, logical AND, and so on. There’s also a carry-bit output because adding two m-bit numbers such as 10001111 and 11000001 results in 01010000 with a carry-out of 1.
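This carry behavior is easy to verify in C; a sketch using the two 8-bit values from the text:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint8_t p = 0x8F;                /* 10001111 = 143 */
    uint8_t q = 0xC1;                /* 11000001 = 193 */
    unsigned sum = (unsigned)p + q;  /* 9-bit result: 101010000 */
    uint8_t result = sum & 0xFF;     /* low 8 bits: 01010000 */
    unsigned carry = sum >> 8;       /* carry-out: 1 */
    printf("result = %02X, carry = %u\n", result, carry);
    return 0;
}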

The Processor

We can now build a computer. Figure 3 illustrates the structure of a very simple stored program computer.

Figure 3 Structure of a computer
In figure 3 the registers are in blue, the buses in pink, the functional units in yellow, and the memory in green. We are going to explain the operation of this computer step by step.

Fetching the Instruction

We begin with the register called the program counter, PC. This register contains the address of the next instruction to be executed. The name program counter is rather unfortunate because it is misleading; a better name would be ‘next instruction pointer register’, but we are stuck with program counter.

The contents of the program counter are fed to the register called the memory address register, MAR. This register holds the address of the next location in memory to be accessed.

We can express the operation “copy program counter to memory address register” as


[MAR] ← [PC]


Once the program counter has been used to access an instruction, it must be incremented to point to the next instruction; that is,


[PC] ← [PC] + 1


In the next step, the instruction must be read from memory. In this computer, it is copied into the memory buffer register (MBR), which holds all data arriving from memory or about to be written into memory. We can write this as:


[MBR] ← [M[MAR]]


This expression is read as “the memory buffer register receives the contents of the memory location whose address is given by the contents of the memory address register”.


Finally, the contents of the memory buffer register are copied to the instruction register, where the instruction is held while it is being decoded and executed. We can write


[IR] ← [MBR]


Putting all these together, we can express the fetch phase as:


[MAR] ← [PC]

[PC]  ← [PC] + 1

[MBR] ← [M[MAR]]

[IR]  ← [MBR]
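The whole fetch phase can be mimicked by a few lines of C. The following toy-machine sketch is ours, not the article’s: a small word-addressed memory and variables named after the registers in figure 3.

#include <stdio.h>
#include <stdint.h>

#define MEM_SIZE 4096

uint16_t M[MEM_SIZE];        /* main memory */
uint16_t PC, MAR, MBR, IR;   /* the registers of figure 3 */

/* The fetch phase, step by step as in the RTL above. */
void fetch(void)
{
    MAR = PC;        /* [MAR] <- [PC]     */
    PC  = PC + 1;    /* [PC]  <- [PC] + 1 */
    MBR = M[MAR];    /* [MBR] <- [M[MAR]] */
    IR  = MBR;       /* [IR]  <- [MBR]    */
}

int main(void)
{
    M[0] = 0x1234;   /* pretend an instruction is stored at address 0 */
    PC = 0;
    fetch();
    printf("IR = %04X, PC = %u\n", IR, PC);   /* IR = 1234, PC = 1 */
    return 0;
}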


Figure 4 Parts of the computer taking part in the fetch phase
Once the instruction (the op-code) is in the instruction register, it can be executed. The details are beyond the scope of this article. However, the basic principles are straightforward. The bits of the op-code are used in conjunction with a stream of clock pulses to generate a series of signals that control the computer’s buses, registers, and functional units. These signals steer data from its source to functional units and then to its destination. These actions are performed by the block labeled control unit in figure 3.

Instruction Formats

Before we can demonstrate how the computer of figure 3 executes an instruction, we have to introduce the notion of instruction formats; that is, how a computer instruction is formatted (i.e., structured).

Instruction formats are generally (and loosely) categorized by the number of addresses; for example, one-address, or three-addresses.

Three-address Instructions

A typical arithmetic operation is p = q + r, which requires three operands: p, q, and r. Consequently, a three-address instruction has the format operation destination,source_1,source_2; for example, ADD p,q,r.

If an operation is specified by 8 bits, and each operand address requires 32 bits, the total length of an instruction is 8 + 3 * 32 = 8 + 96 = 104 bits. Such an instruction length is not practical. Consequently, three-address instruction formats of this type do not exist today. However, RISC processors like MIPS and ARM do support three-address formats of the form ADD r0,r1,r2, where the operands are internal registers rather than external memory locations. Instead of millions of memory locations, you can address only 16 registers (ARM) or 32 registers (MIPS).
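Register operands are what make the format affordable: a 5-bit field selects one of 32 registers. The packing below is an invented layout for illustration, not the real ARM or MIPS encoding.

#include <stdio.h>
#include <stdint.h>

/* Pack an 8-bit op-code and three 5-bit register numbers into one
   32-bit word (8 + 3 x 5 = 23 bits used). Hypothetical layout. */
uint32_t encode(uint8_t op, uint8_t rd, uint8_t rs1, uint8_t rs2)
{
    return ((uint32_t)op << 24) | ((uint32_t)rd << 10)
         | ((uint32_t)rs1 << 5) | rs2;
}

int main(void)
{
    uint32_t add_r0_r1_r2 = encode(0x01, 0, 1, 2);  /* ADD r0,r1,r2 */
    printf("ADD r0,r1,r2 encodes as %08X\n", add_r0_r1_r2);
    return 0;
}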

Two-address Instructions

A two-address instruction has the format operation destination,source; for example, ADD destination,source. Since only two operands are specified and three are needed, the destination operand is used twice; that is, ADD destination,source is defined as

destination = destination + source

The two-address instruction is smaller than the three-address instruction in terms of bit length. However, it is still impractical. Microprocessors (8-bit devices and members of the Intel IA32 and Motorola 68K families) implement a version of this instruction format where one operand is a memory address and the other is a register; for example, ADD r0,1234 adds the contents of memory location 1234 to the contents of register r0 and puts the sum in register r0.

One-address instructions

A one-address instruction has a single address field. This means that operations must use an implicit register; that is, a one-address instruction is really a two-address instruction where one address is always the same. This one address was called the accumulator (because it accumulated the result). In general, one-address machines are implemented only by 8-bit microprocessors used in embedded and control systems. Moreover, most 8-bit computers have more than one accumulator.

The machine of figure 3 is a one-address machine. We’ve chosen this structure because it is the simplest way of illustrating the structure of a computer and the operation of the fetch/execute cycle.

In order to illustrate the execute phase we have to demonstrate an actual instruction because the process is different for each instruction. Let’s use ADD 1234 (add the contents of memory location 1234 to the accumulator).

We begin with the instruction ADD 1234 in the instruction register. Figure 5 shows the parts of the CPU used in the first part of the execute phase. The operand address (i.e., 1234) is sent to the memory address register and latched. Now it can be used to access the actual operand in memory.

Figure 5 The execute phase (part 1)
The operand is read from memory and fed to the memory buffer register where it is latched. We can write this in RTL as:

[MAR] ← [Operand]

[MBR] ← [M[MAR]]


Now the memory buffer register contains the actual data to be used in the operation. The memory buffer register is connected to the P port of the ALU.


The instruction is decoded and the control unit sends the appropriate code for ADD to the ALU. The contents of the accumulator are fed to the ALU. Now the data from memory and the data from the accumulator are added together and the result is sent to the accumulator, figure 6.


The full RTL for the addition is:


[MAR] ← [Operand]

[MBR] ← [M[MAR]]

ALU   ← [MBR]

ALU   ← [A]

[A]   ← ALU


Note that the ALU does not have square brackets because it does not store data. The actual operation implemented is


[A] ← [A] + [MBR]
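Continuing our toy-machine sketch from the fetch phase, here is a minimal execute phase for this one-address ADD. The instruction encoding (a 4-bit op-code and a 12-bit address field) is our own assumption, not the article’s.

#include <stdio.h>
#include <stdint.h>

#define MEM_SIZE 4096
#define OP_ADD   0x1          /* made-up op-code for ADD */

uint16_t M[MEM_SIZE];         /* main memory */
uint16_t PC, MAR, MBR, IR, A; /* registers; A is the accumulator */

void fetch(void)
{
    MAR = PC;  PC = PC + 1;  MBR = M[MAR];  IR = MBR;
}

/* Execute ADD address: top 4 bits of IR are the op-code, the low
   12 bits are the operand address (an assumed format). */
void execute(void)
{
    uint16_t operand = IR & 0x0FFF;
    if ((IR >> 12) == OP_ADD) {
        MAR = operand;   /* [MAR] <- [Operand] */
        MBR = M[MAR];    /* [MBR] <- [M[MAR]]  */
        A   = A + MBR;   /* ALU adds [MBR] and [A]; [A] <- ALU */
    }
}

int main(void)
{
    M[0]    = (OP_ADD << 12) | 1234;  /* ADD 1234 at address 0 */
    M[1234] = 42;                     /* the operand */
    A = 8;  PC = 0;
    fetch();
    execute();
    printf("A = %u\n", A);            /* 8 + 42 = 50 */
    return 0;
}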



Figure 6 The execute phase (part 2)
The control signals generated by the control unit are: enable a register onto the bus, clock data off a bus into a register, set the memory to read or write, and select the ALU function.


So far, we have demonstrated how the computer performs data processing. The same process would implement:


Store accumulator in memory

Load accumulator from memory

Add (or any other operation) data to the accumulator

Add (or any other operation) data in the accumulator to memory.


There are only three elements missing: literals, pointer-based addressing, and conditional operations.


We cannot yet deal with literals (e.g., ADD #5, which adds the literal value 5 to the accumulator to implement [A] ← [A] + 5). In practice this is easy because there is a path between the IR and the MBR; that is, the actual (literal) operand can be fed from the IR directly to the MBR. The operation ADD #5 is therefore:


[MBR] ← [Operand]

ALU   ← [MBR]

ALU   ← [A]

[A]   ← ALU


Pointer-based addressing (e.g., ADD [r0], which adds the contents of the memory location pointed at by r0 to the accumulator) would require the addition of a pointer register that can be loaded from data in the MBR, and which can send data to the MAR in order to access memory.
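In C terms, pointer-based addressing is one extra level of indexing; the register name r0 below is illustrative.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint16_t M[4096] = { 0 };  /* memory */
    uint16_t A  = 8;           /* accumulator */
    uint16_t r0 = 1234;        /* pointer register holding an address */

    M[1234] = 42;
    A = A + M[r0];             /* ADD [r0]: one extra memory indirection */
    printf("A = %u\n", A);     /* 50 */
    return 0;
}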


Conditional Behavior


We cannot yet implement conditional branches such as BEQ XYZ (branch on zero to instruction XYZ). Conditional branches are necessary to implement if…then…else statements, and for, while, and repeat loops.


In order to implement a conditional operation, a register is connected to the ALU to receive the Z, N, V, and C (zero, negative, overflow, and carry) flags from the result of an operation. These flag bits are stored in the CCR (condition code register), which is also called the status register or flag register.


The appropriate flags are tested to implement an operation such as branch on zero or branch on greater than. Then the result of the test is used to load the program counter with either its next sequential address or the branch target address. Consider BEQ XYZ (branch on zero to XYZ). In RTL this is simply


IF Z = 0 THEN [PC] ← [PC]    ;if not zero then continue
         ELSE [PC] ← [IR]    ;else load the target address
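In the toy-machine style used earlier, the branch decision looks like this; the Z flag variable and the 12-bit address field are assumptions carried over from our earlier sketches.

#include <stdio.h>
#include <stdint.h>

uint16_t PC, IR;
unsigned Z;   /* zero flag from the CCR */

/* BEQ: if the last result was zero (Z = 1), load PC with the branch
   target held in the instruction; otherwise PC already points at the
   next sequential instruction. */
void beq(void)
{
    uint16_t target = IR & 0x0FFF;   /* assumed 12-bit address field */
    if (Z == 1)
        PC = target;                 /* [PC] <- [IR]: take the branch */
}

int main(void)
{
    PC = 1;  IR = 0x2100;  /* pretend a BEQ with target 0x100 */
    Z = 1;                 /* last result was zero */
    beq();
    printf("PC = %03X\n", PC);   /* 100: branch taken */
    return 0;
}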


Points to Understand and Remember




The Memory Wall

Year by year, computers get faster. The rate at which processors execute instructions has increased, with higher clock speeds, better semiconductor technology, pipelining, and superscalar processing.


Memory (DRAM) has also been getting faster; for example, DDR1 chips have been replaced by DDR2 and then DDR3 chips.

However, the speed of the CPU has been increasing at a greater rate than that of memory. Consequently, computers have to wait for the memory to fetch data and instructions. The growing gap between processor cycle time and memory access time is called the memory wall and some believe it to be the ultimate limit to the increase in processor performance.


Computers reduce the effect of slow memory by putting frequently used data in very fast memory called cache memory. In general, over 90% of memory accesses are to cache rather than to DRAM.
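The benefit of a high cache hit rate can be quantified with a simple average-access-time calculation; the hit rate and access times below are illustrative assumptions, not measurements.

#include <stdio.h>

int main(void)
{
    /* Average access time = h * t_cache + (1 - h) * t_dram. */
    double h = 0.90;          /* fraction of accesses served by cache */
    double t_cache = 1e-9;    /* 1 ns cache access (assumed) */
    double t_dram  = 50e-9;   /* 50 ns DRAM access (assumed) */
    double t_avg = h * t_cache + (1.0 - h) * t_dram;
    printf("Average access time: %.1f ns\n", t_avg * 1e9);  /* 5.9 ns */
    return 0;
}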


Register Transfer Language

Register transfer language, RTL, is used to define operations in a computer in an algebra-like notation.

Square brackets indicate ‘the contents of’; for example, [A] means the contents of the accumulator.

The back arrow, ←, indicates data transfer; for example, [P] ← [Q] means that the contents of Q are copied to P.

The notation [MAR] ← 0 means that the actual literal value 0 is written into the memory address register.

If [M] means the contents of the memory and [MAR] means the contents of the memory address register, then the notation [M[MAR]] means the contents of memory whose address is in the memory address register.