Introduction to Hacking

From Data Crystal
Revision as of 20:36, 20 October 2005 by (talk)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

The practice of ROM hacking has and continues to interest thousands as a hobby. Comprising both the analysis and manipulation of data, hacking can appeal to the spirit of exploration, creative problem solving, engineering, and creativity. Thanks to the subject's breadth and its propagation through the Internet, ROM hacking has become a sort of art form. The Wikipedia article on ROM hacking offers an excellent overview of the practice, but it is not a prerequisite for this article, which intends to introduce any interested person to the subject and to set aspiring hackers on their ways.

Aspects of ROM Hacking

It is possible to view ROM hacking as two somewhat different practices: analysis and manipulation, the differences between the two being somewhat comparable to the differences between pure and applied sciences (respectively). Many casual hackers and gaming enthusiasts are familiar with the end result of data manipulation - a completed hack, usually distributed in a patch. Hackers alter the data in a game for whatever purposes they desire. A hacker can make a game easier or more difficult, change graphics, alter or translate text, or even use and/or abuse the game's engine to design a game of his or her own. To make these modifications, of course, requires either specialized tools or knowledge of the workings of the game. As such, the other aspect of ROM hacking is the analysis of ROMs to discover their inner workings and contents. Some hackers prefer the challenge of reverse-engineering and probing the deepest recesses of ROMs; some hackers prefer to use hacking to express themselves creatively. Some hackers enjoy both aspects.

Binary and Hexadecimal

To become a hacker, no matter what the intent, it will most likely be necessary to have at least a casual understanding of the binary and hexadecimal number systems and the applications of these systems to computer science. Such an explanation follows; readers familiar with these systems and their use may skip the following section.


Most readers will be familiar with the everyday number system, in which thirteen is written 13, a hundred and two is written 102, and so on. This system is known as the decimal or base 10 (or synonymously radix 10) number system. Within the base 10 system, we use ten digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9. Each digit stands for a quantity: none, one, two, and so on. Rather than having to write a number such as eleven as, for example, 9 + 2 or 8 + 3, we have a system of columns. A one-digit number has numbers only in the "ones column", where each digit is worth just what it says: 6 means six and so on. The next column is the tens column, in which every digit is worth ten times its normal value. 16, for example, means one ten and six ones, better known as sixteen. The next column after the tens is the hundreds, followed by the thousands and so on. A pattern should be evident: in base 10, the columns have values of 100, 101, 102, 103, and so on. Let us extend this concept to a system of a different radix.

It is suggested that the readability of this section could be improved with tables or diagrams.


Of fundamental importance to computer science is the binary system, which is base 2. Attempt to comprehend such a system with the same approach that was taken to comprehend base 10. In the base 2 system, there are two digits: the familiar 0 and 1, respectively meaning none and one. With only these quantities, we cannot represent two, as no digit on its own represents two. Therefore we use a system of columns, where the first column has value of 20, the second 21, then 22, 23, 24, and so on. Evaluating these exponentials, we see that we have a ones column, then a twos, then 4, 8, 16, 32, 64, 128, and so on. So, to write two in binary, we write 10. Observe why this works: 10 means one two and no ones, thus two. As another example, observe that one can represent twenty-one in binary as 10101 (one sixteen, one four, and one one). Though binary numbers can become long somewhat quickly (notice that the three-digit decimal number 255 represented in binary, 11111111, is an eight-digit number), binary allows us to represent any value with nothing more than the simple values one or none.


Assuming comprehension of the previous section, it should not be difficult to understand one further number system: hexadecimal, or base 16. One potentially alarming fact is that such a system would require sixteen digits. This is handled simply: after the familiar digits 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9, we introduce the digits A, B, C, D, E, and F. Just as 9 represents nine, A represents ten, B represents eleven, and so on. Note that in hexadecimal, a quantity as large as fifteen can be represented with a single digit (F). For larger quantities, again, we use columns. The columns' values are 160, the second 161, then 162, 163, 164, and so on. Note that these values quickly become alarmingly large (1, 16, 256, 4096...); fortunately, there is rarely need to use more than a sixteens column. As an example, the value thirty-five is written in hexadecimal as 23, which indicates two sixteens and three ones. To demonstrate the unfamiliar digits, note that the value sixty is written in hexadecimal as 3C; that is, three sixteens and twelve ones, where we recall that the digit C represents twelve. Notice that in hexadecimal, a two-digit number can represent a value as large as two hundred and fifty five (FF in hex).

With multiple number systems in common use amongst hackers, it is not always immediately clear what number system is being used. The presence of letters is a pretty good indication that the number is in hex, but the number 14 could very well mean fourteen (decimal) or twenty (hex). To clear up confusion, hexadecimal numbers are commonly preceeded with the symbol 0x. The 0x symbol does not have a mathematical value or meaning; it simply indicates that the number proceeding it is written in hex. The number 0x21, for example, is thirty-three, not twenty-one.

Use of the Systems

It may not be immediately obvious why it is important to know the binary and hexidecimal systems, but a bit of knowledge of the workings of computers should clear things up. All information in the memory of a computer is stored by electrical switches which are either on or off. There are no in-between values; the only states are on and off. The binary digits 0 and 1 can represent off and on, respectively. Therefore each on-or-off switch in memory is represented by one binary digit, or bit. Of course, a one-digit binary number does not allow for the storage of large quantities, so bits are divided into groups of eight known as bytes. Note that this means that a byte can store any value from 00000000 to 11111111 (decimal 0 and 255 respectively). To represent values 256 and larger, computers simply group bytes together. For instance, the number 600 requires two bytes, as such: 00000010 01011000. To verify this more easily, rather than taking the 1 in the first byte to mean one five-hundred-and-twelve, simply determine the value of each byte independently, multiply the value of the higher byte by 256, and add - note that this works as since 10 * 100000000 = 1000000000. Using our example (and writing in decimal for convenience), we have that the combination of the bytes 00000010 01011000 has a value of 256(2) + (64 + 16 + 8) = 600. It should be noted that whether the byte of the higher value (known as the high byte) does not always come before the lower byte: this is the concept of endianness, and it varies from platform to platform.

The use of hexidecimal in computer science is not so much a necessity as a great convenience. There are two important observations to be made about hexadecimal. Note first that any byte can be written as a two-digit hexadecimal number, from the smallest (0x00, 0 in decimal) to the largest (0xFF, 255 in decimal). The previous two-byte value 600 is written in hex as 02 58. For the second convenience of hexadecimal, pay close attention to the following observation: 0x58 = 01011000, 0101 = 0x5, and 1000 = 0x8. This applies to all bytes: the high hex digit represents the four highest bits; the low hex digit represents the four lowest bits. A group of four bits is known as a nybble. With this in mind, it is easy to mentally convert from binary to hex and vice versa. For example, observing 11010111, 1101 = 0xD and 0111 = 0x7, so 11010111 = 0xD7. Calculations between these systems and decimal, when not easily computed mentally, can be performed by many calculators, including the calculator calc.exe included with most versions of Microsoft Windows.

This article is unfinished and in progress.