Tuesday, 21 October 2014

Creating an NES Emulator From Scratch Part 1: CPU Basics

My first task was to read up a little on how emulation actually works. In our case, emulation means designing a software program that acts like a hardware component, such as a microprocessor.

Before going further, I needed to understand how a CPU works, as well as some history. So here we go. There’s a TL;DR at the bottom if this gets confusing… it’s a lot to take in, though not all of the history or small details are that important.

A CPU is usually built as a microprocessor: a chip that contains a logic center, as well as small areas of memory called Registers. Registers temporarily store numbers, states, or memory address pointers while a program is working with them. They exist because it is much faster for a CPU to read a value from a register than to fetch the same value from RAM.
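Since the whole point of this series is to rebuild hardware in software, here’s a rough sketch of what a set of registers might look like in an emulator. It assumes the register layout of the 6502 family (the NES CPU is based on the 6502): 8-bit A, X, Y, stack pointer and status registers, plus a 16-bit program counter. The class and method names are my own hypothetical choices, and every register starts at zero for simplicity, which isn’t meant to match the real power-on state.

```python
from dataclasses import dataclass


@dataclass
class Registers:
    """A hypothetical software model of the 6502's registers (names are illustrative)."""

    a: int = 0   # accumulator, 8 bits
    x: int = 0   # index register X, 8 bits
    y: int = 0   # index register Y, 8 bits
    sp: int = 0  # stack pointer, 8 bits
    p: int = 0   # processor status flags, 8 bits
    pc: int = 0  # program counter (address of the next instruction), 16 bits

    def load_a(self, value: int) -> None:
        # An 8-bit register can only hold 0-255, so keep just the low 8 bits.
        self.a = value & 0xFF

    def advance_pc(self, count: int = 1) -> None:
        # The 16-bit program counter wraps around after 0xFFFF.
        self.pc = (self.pc + count) & 0xFFFF


regs = Registers()
regs.load_a(300)  # 300 doesn't fit in 8 bits
print(regs.a)     # 44 (300 - 256)
```

The masking with `& 0xFF` and `& 0xFFFF` is there because real registers have a fixed size: anything that doesn’t fit simply falls off the top.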

Programs executed by CPUs are stored in Binary, aka Machine Code. This type of language is made up of 1’s and 0’s. Each 1 or 0 is called a bit, and 8 bits make 1 byte. 1,000 bytes make a kilobyte (in computing, a kilobyte is also often counted as 1,024 bytes), 1,000 kilobytes make a megabyte, and so on. Just think: every photo you have ever taken and every digital music track you have ever listened to is actually millions of 1’s and 0’s.
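To put some numbers on that, here’s a tiny sketch of the arithmetic. It uses the decimal kilobyte/megabyte (1,000 bytes per kilobyte); with the 1,024 convention the totals come out slightly bigger. The 3-megabyte photo is just a made-up example size.

```python
BITS_PER_BYTE = 8


def bits_in(num_bytes: int) -> int:
    """How many individual 1s and 0s it takes to store num_bytes bytes."""
    return num_bytes * BITS_PER_BYTE


# A byte has 8 bits, so it can hold 2**8 = 256 different patterns (0-255).
patterns_per_byte = 2 ** BITS_PER_BYTE

# The number 65 written out as one byte, bit by bit.
print(format(65, "08b"))  # 01000001

# A hypothetical 3-megabyte photo (3,000,000 bytes) is 24 million bits.
print(bits_in(3 * 1000 * 1000))  # 24000000
```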

You may now be thinking “Hmm, I understand how data could be stored in 1’s and 0’s, but how can programs possibly be made up of them, since they aren’t physical data in the same sense as a picture?”

The gritty details are beyond the scope of this first post, but CPUs (and most microchips) map each pattern of 1’s and 0’s in the incoming stream of data to one of their internal instructions. Think of it like Morse code, where letters are spelled using long (written as dashes - ) or short (written as dots . ) bursts of light, sound, hand signals, etc. If you were in danger, you would signal SOS to a plane with . . . - - - . . .
SOS is the instruction, and Morse code is how it is relayed. Binary can be thought of in a similar way: each instruction corresponds to a different pattern of 1’s and 0’s. The pattern that identifies an instruction is called its OpCode (short for operation code).
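Here’s a rough sketch of that idea in code: the CPU reads a byte, checks which pattern it is, and carries out the matching instruction, a bit like reading dots and dashes and turning them into letters. The four opcode values are real 6502 opcodes, but the `run` function is hypothetical and heavily simplified: there are no status flags, no memory besides the program itself, and BRK (which really triggers an interrupt) just stops the loop here.

```python
def run(program: list[int]) -> dict[str, int]:
    """Toy fetch-decode-execute loop for a handful of real 6502 opcodes."""
    regs = {"A": 0, "X": 0}
    pc = 0  # program counter: index of the next byte to read
    while pc < len(program):
        opcode = program[pc]  # fetch the next byte
        pc += 1
        # decode + execute: each byte pattern means a different instruction
        if opcode == 0xA9:    # LDA #value: load the following byte into A
            regs["A"] = program[pc]
            pc += 1
        elif opcode == 0xAA:  # TAX: copy A into X
            regs["X"] = regs["A"]
        elif opcode == 0xE8:  # INX: add 1 to X, wrapping at 8 bits
            regs["X"] = (regs["X"] + 1) & 0xFF
        elif opcode == 0x00:  # BRK: treated here as "stop"
            break
        else:
            raise ValueError(f"unknown opcode {opcode:#04x}")
    return regs


# A9 05 = LDA #5, AA = TAX, E8 = INX, 00 = BRK
print(run([0xA9, 0x05, 0xAA, 0xE8, 0x00]))  # {'A': 5, 'X': 6}
```

A real emulator works the same basic way, just with a lot more opcodes and a lot more bookkeeping per instruction.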



Back to programming. Even if you are familiar with programming, you may not realize that although you write your program in C++ (or Java, which takes a detour through bytecode first), compiling it eventually converts it into this machine code so the processor can run it. But before ‘high level’ languages such as those existed, programmers would have had to write all their programs in binary… had it not been for a few other “inventions”.

If you look at a page full of 1’s and 0’s, eventually you are going to get dizzy and the whole lot will become unreadable. To make binary more human-readable, we use a numbering system called Hexadecimal. I will have posted a separate explanation of Hex by the time this goes up, so I’m not going to cover it again here.
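Just to show the payoff: each hex digit stands for exactly four bits, so any byte fits in two hex digits. This is a small sketch; `to_hex` is a hypothetical helper written out by hand, while Python’s built-in `format` does the same job.

```python
byte = 0b10101001           # eight bits, hard to read at a glance
print(format(byte, "08b"))  # 10101001
print(format(byte, "02X"))  # A9


def to_hex(bits: str) -> str:
    """Convert a string of 1s and 0s (length a multiple of 4) to hexadecimal."""
    digits = "0123456789ABCDEF"
    # Take four bits at a time and look up the matching hex digit.
    return "".join(digits[int(bits[i:i + 4], 2)] for i in range(0, len(bits), 4))


# 1010 -> A, 1001 -> 9, 0000 -> 0, 0101 -> 5
print(to_hex("1010100100000101"))  # A905
```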


Okay, so at least now we aren’t looking at a page full of 1’s and 0’s, but hexadecimal isn’t much better. That’s why the next step was to program in Assembly. Assembly assigns “words” and “syntax” to hexadecimal (and thus binary) commands. I put those two words in quotation marks because, as you will see, Assembly kinda sucks to read as well. Still, programs began to look much more structured.
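To make that concrete, here’s a sketch of going from hex bytes to Assembly words, which is what a disassembler does. The opcode values are real 6502 encodings and the `$` prefix for hex values is the usual 6502 assembler convention, but the table is a tiny subset and the `disassemble` function is hypothetical.

```python
# Hypothetical lookup table: opcode byte -> (mnemonic, number of operand bytes).
OPCODES = {
    0xA9: ("LDA #", 1),  # load the accumulator with an immediate value
    0xAA: ("TAX", 0),    # transfer A to X
    0xE8: ("INX", 0),    # increment X
    0x00: ("BRK", 0),    # break
}


def disassemble(program: bytes) -> list[str]:
    """Turn raw machine-code bytes into human-readable assembly lines."""
    lines = []
    pc = 0
    while pc < len(program):
        mnemonic, operand_count = OPCODES[program[pc]]
        operands = program[pc + 1 : pc + 1 + operand_count]
        # Write each operand byte as hex with a "$" prefix.
        lines.append(mnemonic + "".join(f"${b:02X}" for b in operands))
        pc += 1 + operand_count
    return lines


for line in disassemble(bytes([0xA9, 0x05, 0xAA, 0xE8, 0x00])):
    print(line)
# LDA #$05
# TAX
# INX
# BRK
```

“A9 05 AA E8 00” and “LDA #$05 / TAX / INX / BRK” are the same program; the second is just easier on the eyes.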

Even so, apart from straight-up binary or Hex, Assembly is one of the most painful languages to program in, especially for beginners (like me!), because it looks so abstract yet readable at the same time. These days, the main reasons to learn assembly are if you are really interested in how CPUs work, if you have lots of spare time and want to learn a low-level language, or if you are writing an emulator, a compiler, or another niche program. Don’t be afraid though: we won’t write our emulator in assembly.

Note that assembly, though a human-readable representation of machine code, can’t be fed to the processor directly. It has to be assembled (translated into binary by a program called an assembler) first.
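The translation from Assembly text to binary is done by a program called an assembler. Here’s a hypothetical mini-assembler that only understands four 6502 instructions (LDA with an immediate value, TAX, INX and BRK), using their real opcode values. A real assembler also handles labels, addressing modes, error reporting and much more.

```python
# Real 6502 opcodes for a tiny subset of instructions.
MNEMONICS = {"TAX": 0xAA, "INX": 0xE8, "BRK": 0x00}
LDA_IMMEDIATE = 0xA9
LDA_PREFIX = "LDA #$"


def assemble(source: str) -> bytes:
    """Translate simple assembly text into machine-code bytes."""
    out = bytearray()
    for raw in source.splitlines():
        line = raw.split(";")[0].strip()  # drop comments and surrounding spaces
        if not line:
            continue
        if line.startswith(LDA_PREFIX):
            out.append(LDA_IMMEDIATE)
            # The operand is written in hex; bytearray rejects values over 255.
            out.append(int(line[len(LDA_PREFIX):], 16))
        elif line in MNEMONICS:
            out.append(MNEMONICS[line])
        else:
            raise ValueError(f"can't assemble: {line!r}")
    return bytes(out)


program = assemble("""
LDA #$05  ; put 5 in A
TAX       ; copy it to X
INX       ; X = X + 1
BRK
""")
print(program.hex())  # a905aae800
```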

Eventually people got tired of Assembly and invented higher-level programming languages (like C) that are much easier to read and program in. A program called a compiler translates code written in these languages into Binary (many compilers go through Assembly as an in-between step), and at that point the computer is ready to run the program.
 
My fingers are hurting, so here’s the TL;DR:

- Emulation is using one computer to run a program whose instructions were designed for another, so those instructions have to be translated or imitated in software.
- A CPU is a chip that has a logic center and small areas of memory called Registers.
- Registers are temporary storage areas that programs use while they work.
- Programs are a series of 1’s and 0’s that a CPU reads as instructions. Different patterns of 1’s and 0’s mean different things.
- The pattern of 0’s and 1’s that identifies an instruction is called its OpCode.
- Hexadecimal is used to make data easier to read than if it were in Binary.
- Assembly is used to make program data easier to read than if it were in Hex.
- High Level programming languages are used to make program data easier to read (and write!) than if it were in Assembly.
