# Integer Variables and Bit Manipulation

This section goes much more deeply into the way a computer represents
numbers and character strings. You might start off by skimming
this section to see whether you will need to study it in detail. You will
need this material only if you pack several pieces of data in one variable
or if you want to use -calc- operations on character strings.
A variable such as v150 can hold a number as big as 10^{322}
(the number 1 followed by 322 zeros) or a non-zero number as small as 10^{-293}
(a 1 in the 293rd position after the decimal point). These huge or tiny
numbers may be positive or negative, from ±10^{-293} up to ±10^{322}. Any
number held in v150 is recorded as sixty tiny “bits” of information. For
example, whether the number is positive or negative is one bit of
information, and whether the magnitude is 10^{+200} or 10^{-200} is another bit
of information. The remaining 58 bits of information are used to specify
precisely the number held in v150.

What is a bit? A bit is the smallest possible piece of information and represents a two-way (binary) choice such as yes or no, or true or false, or up or down (anything with two possibilities). A number is positive or negative and these two possibilities can be represented by one bit of information. Numbers themselves can be represented by bits corresponding to yes or no. Let us see how any number from zero to seven can be represented by three bits corresponding to the yes or no answers to just three questions. Suppose a friend is thinking of a number between zero and seven and you are to determine it by asking the fewest possible questions to be answered yes or no. Suppose the friend's number is 6:

From this you correctly conclude that the number is 6. You determined that the number was made up of a 4, a 2, and no 1. You might also say that the number can be represented by the sequence “yes,yes,no”!

As another example, try to guess a number between zero and 63 chosen by the friend. Suppose it is 37:

So the number is 37, or perhaps “yes,no,no,yes,no,yes”. Try this questioning strategy on any number from zero to 63 and you will find that six questions are always sufficient to determine the number. The strategy depends on cutting the unknown range in two each time (a so-called “binary chop”).

Conversely, any number between zero and 63 can be represented by a sequence of yes and no answers to six such questions. What number is represented by the sequence?

This number must be built up of a 32, a 16, no 8, a 4, no 2, and a 1.
32+16+4+1 is 53, so the sequence represents the number 53.
Because a yes or no answer is the smallest bit of information we can
extract from our friend, we say any number between zero (six nos) and 63
(six yeses) can be represented by six bits. If on the other hand we know
the number is between zero and seven, three bits are sufficient to describe
the number fully. Similarly, numbers up to 15 (2^{4}-1) can be expressed
with four bits, and numbers up to 31 (2^{5}-1) with five bits. Each new
power of two requires another bit because it requires another yes/no
question to be asked.

This method of representing numbers as a sequence of bits, each bit
corresponding to a yes or no, is called “binary notation” and is the
method normally used by computers. Whether a computer bit represents
yes or no is typically specified by a tiny electronic switch being on or off,
or by a tiny piece of iron being magnetized up or down. A TUTOR
variable contains sixty bits of yes/no information and could therefore be
used to hold a positive integer as big as (2^{60}-1), which is approximately
1018, or 1 followed by 18 zeros. What do we do about negative integers?
Instead of using all sixty bits we could give up one bit to represent
whether the number is positive or negative (again, a two-way or binary bit
of information) and just use 59 bits for the magnitude of the number. In
this way we could represent positive or negative integers up to ±(2^{59}— 1),
which is approximately plus or minus one-half of 10^{18}.

But what do we do about bigger numbers, or numbers such as 3.782
which are not integers? The scheme used on the CONTROL DATA®
PLATO computer is analogous to the scientific notation used to express
large numbers. For example, 6.02×10^{23} is a much more compact form
than 602 followed by 21 zeros, and it consists of two essential pieces: the
number 6.02 and the exponent or power of ten (23). Instead of using 59
bits for the number, we use only 48 bits and use 11 bits for the exponent.
Of these .11 bits, one is used to say whether the exponent is positive or
negative (the difference between 10^{+6}, a million, and 10^{-6}, one-millionth).
The remaining ten bits are used to represent exponents as big as one
thousand (2^{10}-1 is 1023, to be precise). The exponent is actually a power
of two rather than ten, as though our scientific notation for the number 40
were written as 5×2^{3} instead of 4×10^{1}. That is, instead of expressing the
number 40 as 4×10^{1}, we express it as 5 x 2^{3}, putting the 5 in our 48-bit
number and the 3 in the 11-bit exponent storage place. In this way we
split up the 60 bits as:

The 48-bit number will hold an integer as big as (2^{48}-1), which is about
2.5×10^{14}. If we wish to represent the number 1/4, the variable will have a
number of 2^{47} and an exponent of -49:

2^{47} x 2^{-49}=2^{-2}=1/4 |

That is, the 48-bit number will hold a large integer, 2^{47}, and the exponent or power of 2, will be -49. The complicated format just described is that
used by the PLATO computer when we calculate with variables vl
through v150. It automatically takes care of an enormous range of
numbers by separating each number into a 48-bit number and a power of
two. This format is called “fractional” or “floating-point” format because
non-integral values can be expressed and the position of the decimal
point floats automatically right or left as operations are performed on the
variable.

Sometimes this format is not suitable, particularly when dealing with
strings of characters. The -storea- and -pack- commands place ten
alphanumeric characters into each variable or “word” (a computer
variable is often called a “word” because it can contain several characters).
We simply split up the sixty bits of the word into ten characters of
six bits each, six bits being sufficient to specify one of 64 possible
characters, from character number zero to character number 63 (2^{6}-1). In
this scheme character number 1 corresponds to an “a”, number 2 to a “b”,
number 26 to a “z”, number 27 to a “0”, number 28 to a “r, etc. A capital
D requires two 6-bit character slots including one for a “shift” character
(which happens to be number 56) and one for a lower-case “d” (number
4). The -showa- command takes such strings of 6-bit character codes and
displays the corresponding letters, numbers, or punctuation marks on the
student's screen.

Nonsensical things happen when a -showa- command is used to display a word which contains a floating-point number. The two sign bits (for the number and for the exponent) and the first four bits of the exponent make up the first 6-bit character code. The last six bits of the exponent are taken as specifying the second 6-bit code. Then the remaining 48 bits are taken as specifyik eight 6-bit character codes. Small wonder that using a -showa- on anything other than character strings usually puts gibberish on the screen. On the other hand, using a -show- with a character string gives nonsense: the floating-point exponent is made up out of pieces of the first and second 6-bit character codes, the 48-bit number comes from the last eight character codes, and whether the number and the exponent are positive or negative is determined by the first two bits of the first character code. (See Fig. 10-1)

So far we have kept numerical manipulations (-talc-, -store-, -show-) completely separate from character string manipulations (-storea-, -showa-). The reasons should now be clear. It is sometimes advantageous, however, to be able to use the power of -talc- in manipulating character strings and similar sequences of bits. For such manipulations we would like to notify TUTOR not to pack numbers into a variable in the useful but complicated floating-point format. This is done by referring to “integer variables”:

n1,n2,n3—– ——— n149,n150 |

The integer variable n17 is the same storage place as v17, but its internal
format will be different. If we say “calc v17⇐6”, TUTOR will put into
variable number 17 the number 6, expressed as 6×2^{45} with an exponent of
-45, so that the complete number is 6 x 2^{45}×2^{-45}, or 6. If on the other hand we say “calc n1”7 ”, TUTOR will just put the number 6 into
variable number 17. (See Fig. 10-2.) Since the number 6 requires only
three bits to specify it, variable 17 will have its first 57 bits unused (unlike
the situation when we refer to the 17th variable as v17, in which case both
the exponent and the magnitude portions of the variable contain information).

Consider the following sequence:

If we say “calc n23⇐5.7”, variable n23 will be assigned the value 6. Rounding is performed in assigning values to integer variables. If truncation is desired, use the “int” function: “n23⇐int(5.7)” will assign the integer part (5) to n23. Indexed integer variables are written as “n(index)” in analogy with “v(index)”.

The -showa- and -storea- commands may be used with either v-variables or n-variables. These commands simply interpret any v- or n-variable as a character string. This is the reason why we were able to use -showa- and -storea- without discussing integer variables.

It is possible to shift the bits around inside an integer variable. In particular, a “circular left shift”, abbreviated as “$cls$”, will move bits to the left, with a wrap-around to the right end of the variable. For example:

will display an “f” even though the -showa- will display only the first character, because the “6” has been shifted left 54 bit positions (9 six-bit character positions). A circular left shift of 54 may also be thought of as a right circular shift of 6 because of the wrap-around nature of the circular shift.

We have been using “n17” as an example, but we should actually be writing “inum” or some such name, where we have used a -define- to specify that “inum=n17”. For the remainder of this chapter we revert, therefore, to the custom of referring to variables (v or n) by name rather than number. Also, if we want the character code corresponding to the letter “f” we should use “f” rather than 6. For example:

The quotation marks can be used to specify strings of characters. For example:

will put these numbers in inum:

A “showa inum,10” will display “cat”. Notice, particularly, that using quotes in a -calc- to define a character string puts the string at the right (“right adjusted”), whereas the -storea- and -pack- commands produce left-adjusted character strings. It is possible to create left-adjusted character strings by using single quote marks: inum`cat' will place the “cat” in the first three character positions rather than the last three.

Let us now return to our early example of the number 37 expressed as the sequence of six bits “yes,no,no,yes,no,yes”. If we let 1 stand for “yes”, and 0 for “no”, we might write this sequence as:

100101 |

(1×32)+(0×16)+(0×8)+(1×4)+(0×2)+(1×1) = 32+0+0+4+0+1 = 37 |

or even more suggestively:

(1×2 ^{5})+(0×2^{4})+(0×2^{3})+(1×2^{2})+(0×2^{1})+(1×2^{0}) = 32+0+0+4+0+1 37 |

(Note that 2^{0} equals 1.) Writing the sequence in this way is analogous to
writing 524 as:

(5×10 ^{2})+(2×10^{1})+(4×10^{0}) = 500+20+4 = 524 |

In other words, when we write 524 we imply a “place notation” in base
10 such that each digit is associated with a power of 10: 5×10^{2}, 2×10^{1},
4×10^{0}. Similarly, rewriting our yes and no sequences as 1 and 0
sequences, we find that the string of ones and zeros turns out to be the
place notation in base 2 for the number being represented.

Here are some examples. (1001_{2} means 1001 in base 2.)

1001 _{2} = 2^{3}+2^{0} = 8+1 = 91100 _{2} = 2^{3}+2^{2} = 8+4 = 12110101 _{2} = 2^{5}+2^{4}+2^{2}+2^{0} = 32+16+4+1 = 531000001 _{2} = 2^{6}+2^{0} = 64+1 = 65 |

This base 2 (or “binary”)notation can be used to represent any pattern of bits in an integer variable, and with some practice you can mentally convert back and forth between base 10 and base 2. This becomes important if you perform certain kinds of bit manipulations.

An important property of binary representations is that shifting left or right is equivalent to multiplying or dividing. Consider these examples:

**⇐ Shift left 2 places**9 $cls$ 2 = 1001<sub>2</sub> $cls$ 2 = 1001

**00**<sub>2</sub> = 36

**(left shift 2 is like multiplying by 2**

^{2}or 4)**⇐ Shift left 3 places**9 $cls$ 3 = 1001<sub>2</sub> $cls$ 3 = 1001

**000**<sub>2</sub> = 72

**(left shift 3 is like multiplying by 2**

^{3}or 8)
So, a left shift of N bit positions is equivalent to multiplying by 2^{N}. A
right shift of N bit positions is equivalent to division by 2^{N} (assuming no
bits wrap around to the left end in a $cls$ of 60—N). There exists an
“arithmetic right shift”, $ars$, which is not circular but simply throws
away any bits that fall off the right end of the word:

**Thrown Away**9 $ars$ 3 = 1001<sub>2</sub> $ars$ 3 = 1

**001**= 1

This corresponds to a division by 2^{3}, with truncation (9/2^{3} = 9/8 which
truncates to 1).

A major use of the 60 bits held in an integer variable is to pack into
one word many pieces of information. For example, you might have 60
“flags” set up or down (1 or 0) to indicate 60 yes or no conditions, perhaps
corresponding to whether each of 60 drill items has been answered
correctly or not. Or you might keep fifteen 4-bit counters in one word:
each 4-bit counter could count from zero to as high as 15 (2^{4}-1) to keep
track of how well the student did on each of fifteen problems. Ten bits is
sufficient to specify integers as large as 1023: you could store six 10-bit
baseball batting averages in one word, with suitable normalizations.
Suppose a batting average is .324. Multiply by a thousand to make it an
integer (324) and store this integer in one of the 10-bit slots. When you
withdraw this integer, divide it by a thousand to rescale it to a fraction
(.324). When we discussed arrays we had exam scores ranging from zero
to 100. The next larger power of two is 128 (2^{7}), so we need only 7 bits for
each integer exam score. Eight such 7-bit quantities could be stored in
one 60-bit word.

How do you extract a piece of information packed in a word? As an example, suppose you want three bits located in the 19th of twenty 3-bit slots of variable “spack”:

The number 7 is 111_{2} (base 2: 4+2+1), so it is a 3-bit quantity with all
three bits “set” or “on” (non-zero). The $mask$ operation pulls out the
corresponding part of the other word, the 3-bit piece we are interested in.
In an expression (x $mask$ y), the result will have bits set (1) only in
those bit positions where both x and y have bits set. In those bit positions
where either x or y have bits which are “reset” or “off” (0), the $mask$
operation produces a 0. We could also have used a “segment” definition
to split up the word into 3-bit segments.

A 4-bit mask would be 15 (1111_{2}) and a 5-bit mask would be 31
(11111_{2}). (Again, “segment” definitions of 4 or 5 bits could be used.) You
might even need a mask such as 110111_{2} (or 55) which will extract bits
located in the five bit positions where 110111_{2} has bits set. There should
be a simpler way of writing down numbers corresponding to particular
bit patterns. Certainly, reading the number 55 does not immediately
conjure up the bit pattern 110111_{2}!

A compact way of expressing patterns of bits depends on whether or not each set of three bits can represent a number from 0 to 7:

Just as each digit in a decimal number (base 10) runs from 0 to 9, so do the individual numerals run from 0 to 7 in an octal number (base 8). Octal numbers are useful only because they represent a compact way of expressing bit patterns. With practice, you should be able to convert between octal and base 2 instantaneously, and between base 8 and base 10 somewhat slower! See the table below.

base 10 | base 8 | base 2 |
---|---|---|

0 | 0 | 0 or 000 |

1 | 1 | 1 or 001 |

2 | 2 | 10 or 010 |

3 | 3 | 11 or 110 |

4 | 4 | 100 |

5 | 5 | 101 |

6 | 6 | 110 |

7 | 7 | 111 |

8 | 10 | 1000 |

9 | 11 | 1001 |

10 | 12 | 1010 |

11 | 13 | 1011 |

12 | 14 | 1100 |

13 | 15 | 1101 |

The conversion between base 8 and base 2 is a matter of memorizing the
first eight patterns, after which translating 1101011011101_{2} to octal is
simply a matter of drawing some dividers every three bits:

_{8}

**What is 15335**8

_{8}in base 10?^{4}| 8

^{3}| 8

^{2}| 8

^{1}| 8

^{0}4096 | 512 | 64 | 8 | 1 1 5 3 3 5 = 1×4096 + 5×512 + 3×64 + 3×8 + 5 = 5853

_{10}

How about the octal version of the number 79? The biggest power of 8 in
79 is 8^{2} (64), and 79 is 15 more than 64. In turn, 15 is 1×8^{1}+7×8^{0}, so:

79 _{10} = 1×64+1×8+7×1 = 1×8^{2}+1×8^{1}+7×8^{0} = 117_{8} |

Luckily, in bit manipulations the conversions between base 2 and base 8 are more important than the harder conversions between base 8 and base 10.

To express an octal number in TUTOR, use an initial letter “o”:

x $mask$ o37

will extract the right-most 5 bits from x, because o37 = 37_{8} = 011111_{2},
which has 5 bits set. Naturally, a number starting with the letter “o” must
not contain 8's or 9's.

You can display an octal number with a -showo- command (show octal):

showo 39

will display “00000000000000000047” on the screen (39_{10}=47_{8}). The
default format is twenty (3-bit) octads, corresponding to a whole 60-bit
word:

showo 39,4

will display “0047”, showing just four octads.

Now that we have discussed the octal notation, it is possible to point out what happens to negative numbers:

showo -39

will display “77777777777777777730”. A negative number is the “complement”
of the positive number (binary l's are changed to 0's and binary
0's are changed to l's). In octal, the complement of 0 is 7 (000_{2}→111_{2} =
7_{8}), and the complement of 7_{8} is 0_{8}. In the example shown, octal 47_{8} is 00111_{2}, whose complement is 011000_{2}, or 30_{8}. Notice that the left-most
bit (the “sign” bit) of a negative number is always set. In order for a
negative number to stay negative upon performing an “arithmetic right
shift”, all the left-most bits are set. So,

o40000000000000003242 $ars$ 6

yields:

o7400000000000000032.

Only the sign bit was set among the left-most bits before the shift (o40 is
100000_{2}), but after the shift the first seven bits are all set. The “circular
left shift”, $cls$, does not do anything special with the sign bit.

It is interesting to see the bits set for floating-point numbers:

will make this display;

Note that the negative number is the complement of the positive. The
48-bit magnitude (6000000000000000) represents a huge integer (6×2^{45}).
The eleven bits between the sign bit and the 48-bit magnitude give the
power of two (-46) by which the magnitude is to be scaled (3 =
6×2^{45}×2^{-46} = 6×2^{-1} = 3). A bias of 2000_{8} is added to the correct
exponent (-46, or —568) to give an eleven-bit exponent of 1721_{8}.
Exponents less than 2000_{8} represent negative powers and exponents
greater than 2000_{8} represent positive powers.

We have encountered octal numbers (e.g., o327) which can be shifted
left ($cls$) and right ($ars$) and complemented (by making them negative).
Pieces can be extracted with a $mask$ operation. Additional bit
operations are $union$, $diff$, and “bitcnt”. The “bitcnt” function gives
the number of bits set in a word: bitcnt(o25) is 3, because o25 is 010101_{2},
which has 3 bits set; bitcnt (—o25) is 57, since the complement will have
only 3 of 60 bits not set; and bitcnt (0) is 0. Like $mask$, $union$ and
$diff$ operate on the individual bit positions, with all 60 done at once:

x $mask$ y produces a 1 only where both x and y have l's. x $union$ y produces a 1 where either x or y or both have l's. x $diff$ y produces a 1 only where x and y differ.

Note that $union$ might be called “merge”, since l's will appear in every bit position where either x or y have bits set. The $diff$ operation might also be referred to as an “exclusive” union, since it will merge bits except for those places where both x and y have bits set.

While $mask$ can be used to extract a piece of information from a word, a $mask$ that includes all but that piece followed by a $union$ can be used to insert a new piece of information.

These bit operations can be used with arrays. For example, if A, B, and C are true arrays, the statement “C⇐A $diff$ B” will replace each element of C by the bit difference of the corresponding elements of A and B.