Skip to content

CIS 4710 Notes 1.1

Published: at 08:01 AM

This set of notes covers CIS 4710 at UPenn, which is computer architecture, organization and design. The goal of this course is to build a RISC-V processor, specifically a 32-bit RV32IM core with a cache.

Table of Contents

Open Table of Contents

Intro to Computer Architecture

Computers are complex systems, but are built in two layers: software and hardware. These two layers are connected together by the instruction set architecture (ISA). The ISA enables the implementation of software through the hardware. The implementation of this ISA in terms of design processor, ALU, memory, I/O is known as the microarchitecture. Specifically, this course is about general-purpose CPUs, which is a processor that can run an OS. There also exist application-specific chips called Application-Specific Integrated Circuits, or ASICs. These are critical for domain-specific functionality in hardware, like GPUs for graphics or accelerators for video encoding. The tradeoff is that while the hardware is much less flexible in its capability, it is much stronger at its specific application.

The basic element of the computer is the solid-state transistor. These are the building block of integrated circuits (ICs). ICs are high performance, high reliability, low cost, and low power. There are several kinds of integrated circuit families:

ISAs and RISC-V

Remember that a computer is a finite state machine. It has a few different product states:

A computer executes instructions. It follows a strict loop:

This is known as the Von-Neumann model. This model treats a program as just data in its memory.

The hardware-software interface is the ISA. It is the functional definition of HW storage locations and operations. This could be retrieving storage from registers or memory, or computing operations like add, multiply, etc. It is a precise description of how to invoke and access them. Assembly is a human-readable version of machine code, which translates directly into machine code, which is the binary format that the processor understands. The program that translates assembly to machine code is known as the assembler.

RISC-V

RISC-V is an open-source ISA. Other ISAs like ARM are commercial and require licensing. In contrast, anyone can build an RV chip without a license.

A 64-bit ISA is an ISA with 64-bit pointers. Often, registers are also 64 bits to hold pointers. The address space is also 64 bits. However, this does not describe the number of registers, instruction size, amount of memory, or all the data widths that instructions may process.

RV has two base ISAs: RV32I and RV64I. The 32 represents the fact that our ISA will be 32-bit. The extension I represents that the ISA has only the 47 base integers instructions. We will be building the RV32IM, which additionally gives us multiply/divide instructions.

RV32IM

The RV32IM is described by the 32-bit program counter, 32 registers called x0-x31, where x0 always has the value 0, and each register has an alias. For example, x2 = sp. Instructions are also a fixed 32-bits in size. You can refer to the RV ISA spec here. Here is part of the ISA:

RV ISA

What you’ll notice is that RV has no NOT operation. The way that RV does it is by performing the XOR operation of a register with -1, which by two’s complement, has all the bits set to 1. This logically will flip each bit in a register.

RV also works with both signed and unsigned numbers. Consider the instructions slti and sltiu. These instructions can be used as the following:

Memory Instructions

RV has 5 load instructions:

As you can see, RV is byte addressable. This means that each address points to a single byte of data, and that no smaller units can be manipulated. A memory load instructions looks like this: lb x1,0(x0). The lb is the opcode, x1 the destination register. The next 0 is the immediate, and finally, the x0 is the source register. The memory address that we load from is the $immediate + source$. Therefore, lb x1, 3(x0) would load the byte at address 0x3 into x1. However, lh x1, 3(x0) would load the half at the address 0x3, so it would load both the bytes at 0x3 and 0x4 together.

Memory Access Alignment

All storage devices have some internal structure. Let’s assume that our memory is built out of 4 byte chunks. What would happen if we invoked lh x1, 3(x0)?. If an address we access is a multiple of the memory chunk size, then we call that access aligned. The access command we specified is misaligned. A misaligned access may require multiple physical memory accesses, which is unfavorable. ISAs handle misaligned memory accesses in different ways. Some architectures like MIPS and ARMv5 disallow it and crash the program. Others, like x86, ARMv6+, and RV just handle it transparently. This makes it easier to program but adds significantly more hardware complexity. However, compilers will mostly try to avoid misaligned accesses.

Consider a struct in C:

struct foo {
	char c;
	int i;
}

Here, we know the char uses 1 byte of memory and the int uses 64. However, this struct does not use 5 bytes of memory, because that would lead to misalignment of memory, from either the next struct or variable. Instead, the char uses its own chunk of 4 bytes, throwing away 3 bytes to keep the memory aligned. Therefore, foo takes up 8 bytes.

Memory Endianness

Endianness refers to the arrangement of bytes in a multi-byte number. Within a byte, the arrangement is fixed, and the least significant bit is always in position 0. However, when it comes to the arrangement of bytes in a multi-byte value, there are two types of configurations. Big-endian refers to the most significant bit being the first in memory. This order is sensible, and if memory lay out flat, then it would follow our idea of left to right for numbers and the significance of their digits. However, most of the world actually runs on Little-endian, which is when the least significant bit is the first in memory. x86, ARM, and RV are all runtime-configurable, but all generally run in LE.

Little endian exists because its popularity was driven by x86. This is because integer casting is free in LE. Consider a 4-byte integer in BE. If you want to cast that datatype to a lower size datatype like a char or a short, then you’d have to empty the more significant bits and shift the target bits back. However, in LE, you don’t need to do anything. The more significant bits are no longer considered to account for memory alignment, and the least significant bits are already in place. Therefore, if you have an int*, then casting that to a char* is free and requires no data manipulation, and they point to the same address.

Jumps

Jumps are used for function calls, returns, and just plain jumps: anywhere in the program where the next instruction is not just the next memory cell. The opcode for jump is jal, which stands for jump and link. A command looks like jal x1, 0x123. This command jumps to the PC + 0x123 and writes PC + 4 to x1. It is important to write PC+4 to x1 because this essentially represents the next instruction we should be returning to (essentially playing the role of R7 on the LC4), using an offset of 4 because the program instructions are also treated as memory aligned. The opcode jalr is also a jump, but the destination is a register instead. In this case, the command jalr x1, 0x10(x2) jumps to the value in x2 + 0x10, and writes PC+4 to x1. A regular jump is just jal x0, offset

Branches

A branch is based on comparing the values in two registers. It takes the opcode beq, and a command looks like this: beq x1, x2, 0x456. This instruction translates to branch to PC+0x456 if x1 == x2. There are also bltu and bgeu for unsigned comparisons, translating to “branch less than unsigned” and “branch greater than unsigned”.

RV Assembly

Here is some C code for getting the sum of a static array:

int array[5];
int sum;
void array_sum() {
	for (int i = 0; i < 5; i++) {
	sum += array[i];
	}
}

In RV assembly, this would translate to the following:

.data
array: .word 1, 2, 3, 4, 5

array_sum:
la a1, array         //load base address
li a2, 5             // array length
li a3, 0             // initialize sum register

loop:
	lw a4, 0(a1)     // load value at pointer a1
	add a3, a3, a4   // add loaded value to sum at a3
	addi a1, a1, 4   // increment the pointer by 4
	addi a2, a2, -1  // decrement the loop index
	bnez a2, loop    // check if a2 == 0, else jump to loop