Integer Variables and Bit Manipulation
This section goes much more deeply into the way a computer represents numbers and character strings. You might start off by skimming this section to see whether you will need to study it in detail. You will need this material only if you pack several pieces of data in one variable or if you want to use -calc- operations on character strings. A variable such as v150 can hold a number as big as 10322 (the number 1 followed by 322 zeros) or a non-zero number as small as 10-293 (a 1 in the 293rd position after the decimal point). These huge or tiny numbers may be positive or negative, from ±10-293 up to ±10322. Any number held in v150 is recorded as sixty tiny “bits” of information. For example, whether the number is positive or negative is one bit of information, and whether the magnitude is 10+200 or 10-200 is another bit of information. The remaining 58 bits of information are used to specify precisely the number held in v150.
What is a bit? A bit is the smallest possible piece of information and represents a two-way (binary) choice such as yes or no, or true or false, or up or down (anything with two possibilities). A number is positive or negative and these two possibilities can be represented by one bit of information. Numbers themselves can be represented by bits corresponding to yes or no. Let us see how any number from zero to seven can be represented by three bits corresponding to the yes or no answers to just three questions. Suppose a friend is thinking of a number between zero and seven and you are to determine it by asking the fewest possible questions to be answered yes or no. Suppose the friend's number is 6:
From this you correctly conclude that the number is 6. You determined that the number was made up of a 4, a 2, and no 1. You might also say that the number can be represented by the sequence “yes,yes,no”!
As another example, try to guess a number between zero and 63 chosen by the friend. Suppose it is 37:
So the number is 37, or perhaps “yes,no,no,yes,no,yes”. Try this questioning strategy on any number from zero to 63 and you will find that six questions are always sufficient to determine the number. The strategy depends on cutting the unknown range in two each time (a so-called “binary chop”).
Conversely, any number between zero and 63 can be represented by a sequence of yes and no answers to six such questions. What number is represented by the sequence?
This number must be built up of a 32, a 16, no 8, a 4, no 2, and a 1. 32+16+4+1 is 53, so the sequence represents the number 53. Because a yes or no answer is the smallest bit of information we can extract from our friend, we say any number between zero (six nos) and 63 (six yeses) can be represented by six bits. If on the other hand we know the number is between zero and seven, three bits are sufficient to describe the number fully. Similarly, numbers up to 15 (24-1) can be expressed with four bits, and numbers up to 31 (25-1) with five bits. Each new power of two requires another bit because it requires another yes/no question to be asked.
This method of representing numbers as a sequence of bits, each bit corresponding to a yes or no, is called “binary notation” and is the method normally used by computers. Whether a computer bit represents yes or no is typically specified by a tiny electronic switch being on or off, or by a tiny piece of iron being magnetized up or down. A TUTOR variable contains sixty bits of yes/no information and could therefore be used to hold a positive integer as big as (260-1), which is approximately 1018, or 1 followed by 18 zeros. What do we do about negative integers? Instead of using all sixty bits we could give up one bit to represent whether the number is positive or negative (again, a two-way or binary bit of information) and just use 59 bits for the magnitude of the number. In this way we could represent positive or negative integers up to ±(259— 1), which is approximately plus or minus one-half of 1018.
But what do we do about bigger numbers, or numbers such as 3.782 which are not integers? The scheme used on the CONTROL DATA® PLATO computer is analogous to the scientific notation used to express large numbers. For example, 6.02×1023 is a much more compact form than 602 followed by 21 zeros, and it consists of two essential pieces: the number 6.02 and the exponent or power of ten (23). Instead of using 59 bits for the number, we use only 48 bits and use 11 bits for the exponent. Of these .11 bits, one is used to say whether the exponent is positive or negative (the difference between 10+6, a million, and 10-6, one-millionth). The remaining ten bits are used to represent exponents as big as one thousand (210-1 is 1023, to be precise). The exponent is actually a power of two rather than ten, as though our scientific notation for the number 40 were written as 5×23 instead of 4×101. That is, instead of expressing the number 40 as 4×101, we express it as 5 x 23, putting the 5 in our 48-bit number and the 3 in the 11-bit exponent storage place. In this way we split up the 60 bits as:
The 48-bit number will hold an integer as big as (248-1), which is about 2.5×1014. If we wish to represent the number 1/4, the variable will have a number of 247 and an exponent of -49:
|247 x 2-49=2-2=1/4|
That is, the 48-bit number will hold a large integer, 247, and the exponent or power of 2, will be -49. The complicated format just described is that used by the PLATO computer when we calculate with variables vl through v150. It automatically takes care of an enormous range of numbers by separating each number into a 48-bit number and a power of two. This format is called “fractional” or “floating-point” format because non-integral values can be expressed and the position of the decimal point floats automatically right or left as operations are performed on the variable.
Sometimes this format is not suitable, particularly when dealing with strings of characters. The -storea- and -pack- commands place ten alphanumeric characters into each variable or “word” (a computer variable is often called a “word” because it can contain several characters). We simply split up the sixty bits of the word into ten characters of six bits each, six bits being sufficient to specify one of 64 possible characters, from character number zero to character number 63 (26-1). In this scheme character number 1 corresponds to an “a”, number 2 to a “b”, number 26 to a “z”, number 27 to a “0”, number 28 to a “r, etc. A capital D requires two 6-bit character slots including one for a “shift” character (which happens to be number 56) and one for a lower-case “d” (number 4). The -showa- command takes such strings of 6-bit character codes and displays the corresponding letters, numbers, or punctuation marks on the student's screen.
Nonsensical things happen when a -showa- command is used to display a word which contains a floating-point number. The two sign bits (for the number and for the exponent) and the first four bits of the exponent make up the first 6-bit character code. The last six bits of the exponent are taken as specifying the second 6-bit code. Then the remaining 48 bits are taken as specifyik eight 6-bit character codes. Small wonder that using a -showa- on anything other than character strings usually puts gibberish on the screen. On the other hand, using a -show- with a character string gives nonsense: the floating-point exponent is made up out of pieces of the first and second 6-bit character codes, the 48-bit number comes from the last eight character codes, and whether the number and the exponent are positive or negative is determined by the first two bits of the first character code. (See Fig. 10-1)
So far we have kept numerical manipulations (-talc-, -store-, -show-) completely separate from character string manipulations (-storea-, -showa-). The reasons should now be clear. It is sometimes advantageous, however, to be able to use the power of -talc- in manipulating character strings and similar sequences of bits. For such manipulations we would like to notify TUTOR not to pack numbers into a variable in the useful but complicated floating-point format. This is done by referring to “integer variables”:
|n1,n2,n3—– ——— n149,n150|
The integer variable n17 is the same storage place as v17, but its internal format will be different. If we say “calc v17⇐6”, TUTOR will put into variable number 17 the number 6, expressed as 6×245 with an exponent of -45, so that the complete number is 6 x 245×2-45, or 6. If on the other hand we say “calc n1”7 ”, TUTOR will just put the number 6 into variable number 17. (See Fig. 10-2.) Since the number 6 requires only three bits to specify it, variable 17 will have its first 57 bits unused (unlike the situation when we refer to the 17th variable as v17, in which case both the exponent and the magnitude portions of the variable contain information).
Consider the following sequence:
If we say “calc n23⇐5.7”, variable n23 will be assigned the value 6. Rounding is performed in assigning values to integer variables. If truncation is desired, use the “int” function: “n23⇐int(5.7)” will assign the integer part (5) to n23. Indexed integer variables are written as “n(index)” in analogy with “v(index)”.
The -showa- and -storea- commands may be used with either v-variables or n-variables. These commands simply interpret any v- or n-variable as a character string. This is the reason why we were able to use -showa- and -storea- without discussing integer variables.
It is possible to shift the bits around inside an integer variable. In particular, a “circular left shift”, abbreviated as “$cls$”, will move bits to the left, with a wrap-around to the right end of the variable. For example:
will display an “f” even though the -showa- will display only the first character, because the “6” has been shifted left 54 bit positions (9 six-bit character positions). A circular left shift of 54 may also be thought of as a right circular shift of 6 because of the wrap-around nature of the circular shift.
We have been using “n17” as an example, but we should actually be writing “inum” or some such name, where we have used a -define- to specify that “inum=n17”. For the remainder of this chapter we revert, therefore, to the custom of referring to variables (v or n) by name rather than number. Also, if we want the character code corresponding to the letter “f” we should use “f” rather than 6. For example:
The quotation marks can be used to specify strings of characters. For example:
will put these numbers in inum:
A “showa inum,10” will display “cat”. Notice, particularly, that using quotes in a -calc- to define a character string puts the string at the right (“right adjusted”), whereas the -storea- and -pack- commands produce left-adjusted character strings. It is possible to create left-adjusted character strings by using single quote marks: inum`cat' will place the “cat” in the first three character positions rather than the last three.
Let us now return to our early example of the number 37 expressed as the sequence of six bits “yes,no,no,yes,no,yes”. If we let 1 stand for “yes”, and 0 for “no”, we might write this sequence as:
|(1×32)+(0×16)+(0×8)+(1×4)+(0×2)+(1×1) = 32+0+0+4+0+1 = 37|
or even more suggestively:
|(1×25)+(0×24)+(0×23)+(1×22)+(0×21)+(1×20) = 32+0+0+4+0+1 37|
(Note that 20 equals 1.) Writing the sequence in this way is analogous to writing 524 as:
|(5×102)+(2×101)+(4×100) = 500+20+4 = 524|
In other words, when we write 524 we imply a “place notation” in base 10 such that each digit is associated with a power of 10: 5×102, 2×101, 4×100. Similarly, rewriting our yes and no sequences as 1 and 0 sequences, we find that the string of ones and zeros turns out to be the place notation in base 2 for the number being represented.
Here are some examples. (10012 means 1001 in base 2.)
10012 = 23+20 = 8+1 = 9
11002 = 23+22 = 8+4 = 12
1101012 = 25+24+22+20 = 32+16+4+1 = 53
10000012 = 26+20 = 64+1 = 65
This base 2 (or “binary”)notation can be used to represent any pattern of bits in an integer variable, and with some practice you can mentally convert back and forth between base 10 and base 2. This becomes important if you perform certain kinds of bit manipulations.
An important property of binary representations is that shifting left or right is equivalent to multiplying or dividing. Consider these examples:
So, a left shift of N bit positions is equivalent to multiplying by 2N. A right shift of N bit positions is equivalent to division by 2N (assuming no bits wrap around to the left end in a $cls$ of 60—N). There exists an “arithmetic right shift”, $ars$, which is not circular but simply throws away any bits that fall off the right end of the word:
This corresponds to a division by 23, with truncation (9/23 = 9/8 which truncates to 1).
A major use of the 60 bits held in an integer variable is to pack into one word many pieces of information. For example, you might have 60 “flags” set up or down (1 or 0) to indicate 60 yes or no conditions, perhaps corresponding to whether each of 60 drill items has been answered correctly or not. Or you might keep fifteen 4-bit counters in one word: each 4-bit counter could count from zero to as high as 15 (24-1) to keep track of how well the student did on each of fifteen problems. Ten bits is sufficient to specify integers as large as 1023: you could store six 10-bit baseball batting averages in one word, with suitable normalizations. Suppose a batting average is .324. Multiply by a thousand to make it an integer (324) and store this integer in one of the 10-bit slots. When you withdraw this integer, divide it by a thousand to rescale it to a fraction (.324). When we discussed arrays we had exam scores ranging from zero to 100. The next larger power of two is 128 (27), so we need only 7 bits for each integer exam score. Eight such 7-bit quantities could be stored in one 60-bit word.
How do you extract a piece of information packed in a word? As an example, suppose you want three bits located in the 19th of twenty 3-bit slots of variable “spack”:
The number 7 is 1112 (base 2: 4+2+1), so it is a 3-bit quantity with all three bits “set” or “on” (non-zero). The $mask$ operation pulls out the corresponding part of the other word, the 3-bit piece we are interested in. In an expression (x $mask$ y), the result will have bits set (1) only in those bit positions where both x and y have bits set. In those bit positions where either x or y have bits which are “reset” or “off” (0), the $mask$ operation produces a 0. We could also have used a “segment” definition to split up the word into 3-bit segments.
A 4-bit mask would be 15 (11112) and a 5-bit mask would be 31 (111112). (Again, “segment” definitions of 4 or 5 bits could be used.) You might even need a mask such as 1101112 (or 55) which will extract bits located in the five bit positions where 1101112 has bits set. There should be a simpler way of writing down numbers corresponding to particular bit patterns. Certainly, reading the number 55 does not immediately conjure up the bit pattern 1101112!
A compact way of expressing patterns of bits depends on whether or not each set of three bits can represent a number from 0 to 7:
Just as each digit in a decimal number (base 10) runs from 0 to 9, so do the individual numerals run from 0 to 7 in an octal number (base 8). Octal numbers are useful only because they represent a compact way of expressing bit patterns. With practice, you should be able to convert between octal and base 2 instantaneously, and between base 8 and base 10 somewhat slower! See the table below.
|base 10||base 8||base 2|
|0||0||0 or 000|
|1||1||1 or 001|
|2||2||10 or 010|
|3||3||11 or 110|
The conversion between base 8 and base 2 is a matter of memorizing the first eight patterns, after which translating 11010110111012 to octal is simply a matter of drawing some dividers every three bits:
How about the octal version of the number 79? The biggest power of 8 in 79 is 82 (64), and 79 is 15 more than 64. In turn, 15 is 1×81+7×80, so:
|7910 = 1×64+1×8+7×1 = 1×82+1×81+7×80 = 1178|
Luckily, in bit manipulations the conversions between base 2 and base 8 are more important than the harder conversions between base 8 and base 10.
To express an octal number in TUTOR, use an initial letter “o”:
x $mask$ o37
will extract the right-most 5 bits from x, because o37 = 378 = 0111112, which has 5 bits set. Naturally, a number starting with the letter “o” must not contain 8's or 9's.
You can display an octal number with a -showo- command (show octal):
will display “00000000000000000047” on the screen (3910=478). The default format is twenty (3-bit) octads, corresponding to a whole 60-bit word:
will display “0047”, showing just four octads.
Now that we have discussed the octal notation, it is possible to point out what happens to negative numbers:
will display “77777777777777777730”. A negative number is the “complement” of the positive number (binary l's are changed to 0's and binary 0's are changed to l's). In octal, the complement of 0 is 7 (0002→1112 = 78), and the complement of 78 is 08. In the example shown, octal 478 is 001112, whose complement is 0110002, or 308. Notice that the left-most bit (the “sign” bit) of a negative number is always set. In order for a negative number to stay negative upon performing an “arithmetic right shift”, all the left-most bits are set. So,
o40000000000000003242 $ars$ 6
Only the sign bit was set among the left-most bits before the shift (o40 is 1000002), but after the shift the first seven bits are all set. The “circular left shift”, $cls$, does not do anything special with the sign bit.
It is interesting to see the bits set for floating-point numbers:
will make this display;
Note that the negative number is the complement of the positive. The 48-bit magnitude (6000000000000000) represents a huge integer (6×245). The eleven bits between the sign bit and the 48-bit magnitude give the power of two (-46) by which the magnitude is to be scaled (3 = 6×245×2-46 = 6×2-1 = 3). A bias of 20008 is added to the correct exponent (-46, or —568) to give an eleven-bit exponent of 17218. Exponents less than 20008 represent negative powers and exponents greater than 20008 represent positive powers.
We have encountered octal numbers (e.g., o327) which can be shifted left ($cls$) and right ($ars$) and complemented (by making them negative). Pieces can be extracted with a $mask$ operation. Additional bit operations are $union$, $diff$, and “bitcnt”. The “bitcnt” function gives the number of bits set in a word: bitcnt(o25) is 3, because o25 is 0101012, which has 3 bits set; bitcnt (—o25) is 57, since the complement will have only 3 of 60 bits not set; and bitcnt (0) is 0. Like $mask$, $union$ and $diff$ operate on the individual bit positions, with all 60 done at once:
x $mask$ y produces a 1 only where both x and y have l's. x $union$ y produces a 1 where either x or y or both have l's. x $diff$ y produces a 1 only where x and y differ.
Note that $union$ might be called “merge”, since l's will appear in every bit position where either x or y have bits set. The $diff$ operation might also be referred to as an “exclusive” union, since it will merge bits except for those places where both x and y have bits set.
While $mask$ can be used to extract a piece of information from a word, a $mask$ that includes all but that piece followed by a $union$ can be used to insert a new piece of information.
These bit operations can be used with arrays. For example, if A, B, and C are true arrays, the statement “C⇐A $diff$ B” will replace each element of C by the bit difference of the corresponding elements of A and B.