MTArchive: September 2011

Sunday, September 25, 2011

A Catalog of Random Bits

Random numbers are a precious commodity, whether expressed as strings of decimal digits or simply 1s and 0s. Computer scientist and mathematician George Marsaglia (1924-2011) spent most of his career producing and testing random numbers.

Over the years, Marsaglia and his collaborators identified a variety of flaws in computer-based random number generators, invented more robust versions of existing generators, and developed a suite of rigorous tests to check for randomness.

Scientists, engineers, and others use random numbers for tackling a wide range of problems, from modeling molecular behavior and sampling opinion to solving certain equations and testing the efficiency of algorithms.

Random numbers are also used in computer graphics for rendering realistic-looking images, such as mottled surfaces and other textures. They play a crucial role in a wide variety of games, including electronic versions of slot machines, lotteries, and other forms of gambling. Indeed, the world of gambling inspired use of the term "Monte Carlo method" to describe any algorithm that employs random numbers.

A century ago, researchers who needed random numbers in their scientific work actually tossed coins, rolled dice, dealt cards, or picked numbered balls out of urns. They also combed census records, directories, and other lists to glean the digits they required.

In the early 1900s, L.H.C. Tippett, a physicist and statistician at the Biometric Laboratory at University College in London, went to the trouble of compiling a table of 10,000 random numbers, obtained by selecting 40,000 single digits from census records giving the measured areas of parishes in England. He did it as a service to others who needed a convenient source of random numbers.

In the 1950s, experts at the RAND Corporation used a special machine—in effect, an electronic roulette wheel—to generate random bits, which were then converted into a table of a million random decimal digits and published as a book. To remove slight biases uncovered in the course of rigorous, intensive testing and attributed to faulty wiring, the digits were further randomized by adding all pairs and retaining only the final digit.

Shortly after the introduction of computers in the late 1940s, researchers began to search for efficient ways of obtaining random numbers to use in computer programs. They developed random-number generators that scramble the bits of a given number in such a way that the digits of the scrambled result appear to have no connection with any previously generated numbers.

In reality, however, the numbers in the resulting sequences typically don't meet all the criteria to establish randomness. Patterns often remain because the computer follows a set procedure to generate the numbers, and restarting the process produces the same sequence. Eventually, the sequences themselves begin repeating. It's all quite deterministic.

During the 1990s, in a project funded by the National Science Foundation, Marsaglia packed his hard-won expertise onto a CD-ROM to create a catalog of random bits for the information age.

Marsaglia's primary purpose was to provide a source of what he termed "unassailable random numbers" for serious Monte Carlo studies and simulations, which often require billions of such numbers. In smaller doses, these apparently patternless strings of bits are available to provide reliable and verifiable random choices in designing experiments and clinical trials, for running sample surveys and choosing jury panels, and in other applications of a statistical nature.

The Marsaglia Random Number CD-ROM contained some 5 billion random bits, divided into 60 10-megabyte files, which could be read 1, 8, 16, or more bits at a time, depending on the application. The random bits were made by combining three sources of electronic white noise with the output from the best of the latest crop of deterministic random number generators.

The result of combining a truly random string with any other bit string—no matter how patterned—is still random. So Marsaglia also mixed digital tracks from rap and classical music and even a few digitized pictures into some of the 10-megabyte files on the CD-ROM.

Both the unadulterated and the mixed files passed Marsaglia's "Diehard" battery of randomness tests, which was also on the disk (along with text files explaining the theory underlying the tests and the random number generators used to create the random bits).

"They seem to pass all tests I have put to them—and I have some very stringent tests," Marsaglia once noted.

Loosely speaking, any number that belongs to a sequence of random digits is a member of that string by chance. Ideally, it has no connection with any other number in the sequence, and each number has a specified probability of falling in a given range. If the probability distribution is uniform, you would expect each of the 10 digits from 0 to 9 to occur about one-tenth of the time.

For binary digits, 1s and 0s should come up just about equally often in the long run. It's also useful to check whether the four pairs of digits 00, 01, 10, and 11 each come up roughly 25 percent of the time, and whether the eight triples 000, 001, . . . 111 each appear about 12.5 percent of the time. There are equivalent checks for larger groups of bits and for various groupings.

Such traditional tests, however, aren't sufficient to guarantee a reasonable degree of randomness. Over the years, Marsaglia developed additional tests to weed out hidden patterns.

For example, in one of his Diehard tests he considered a string of random bits as a succession of overlapping 20-letter "words," made up of the "letters" 1 and 0. The first word starts with the first bit of the string, the second word with the second bit, and so on.

Given that there are 220 possible 20-letter words, the test involves examining the first 2 million overlapping words derived from the random string to count how many of the possible words appear in the sequence. For a truly random sequence, the number of missing words should be very close to 142,000, according to probability theory.

Marsaglia called this a "monkey test," based on the proverbial monkey sitting at a keyboard and randomly hitting keys. "It's one of my most stringent tests," he noted. "No commercial electronic generator of random bits that I know of can pass this test." Interestingly, generators that rely on electronic noise to produce random digits also fail.

The playful names of other tests hint at alternative strategies for rooting out patterns in sequences purported to be random: the birthday-spacing test, the count-the-1s test, the parking-lot test, the minimum-distance test, the overlapping-sums test, and the craps test.

Checking for randomness is difficult partly because it's the nature of randomness that anything can happen. Ironically, if a generator produces sequences of random numbers that always pass a given test, then it's probably flawed!

As mathematician Robert R. Coveyou (1915-1996) once put it, "The generation of random numbers is too important to be left to chance."

Originally posted April 22, 1996

References:

Gardner, M. 1989. Random numbers. In Mathematical Carnival. Mathematical Association of America.

Knuth, D.E. 1969. The Art of Computer Programming. Addison-Wesley.

Marsaglia, G. 1968. Random numbers fall mainly in the planes. Proceedings of the National Academy of Science 61 (September): 25-28.

Marsaglia, G., and A. Zaman. 1994. Some portable very-long-period random number generators. Computers in Physics 8 (January/February):117-121.

______. 1991. A new class of random number generators. Annals of Applied Probability 1(No. 3):462-480.

Peterson, I. 1998. The Jungles of Randomness: A Mathematical Safari. Wiley.

______. 1990. Islands of Truth: A Mathematical Mystery Cruise. W. H. Freeman.

RAND Corporation. 1955. A Million Random Digits with 100,000 Normal Deviates. Free Press.

Friday, September 16, 2011

Spreading Rumors

Three people take longer to share their gossip than four people!

This curious result arises out of the following situation: A group of friends love sharing their gossip. Each gossiper initially knows something that no one else in the group knows. Each day, some of the gossipers phone each other and exchange all the news they have collected so far. On a given day, however, each gossiper can participate in only one phone call, and no conference calls are allowed.

What is the minimum number of days required for all gossipers in the group to learn all the news?

If the group consists of only one member, the minimum number of days is zero; for two people, only one call is required to share two pieces of gossip. A group of three people needs three days to get everyone up to speed, and a group of four people takes only two days.

Here's the table that C. Kenneth Fan, Bjorn Poonen, and George Poonen produced to illustrate these results:

Number of gossipers	1	2	3	4	5	6	7	8
Minimum number of days	0	1	3	2	4	3	4	3

The minimum number of days doesn't increase steadily as the number of participants grows. From the pattern, you can see that it would probably be helpful to consider odd and even numbers of participants separately.

Suppose a group has an even number of gossipers. On the first day, the group can be divided in half. Each of the two subgroups will learn all the gossip within the subgroup in a certain number of days. Gossipers in one subgroup can then be paired with gossipers in the other subgroup so that, in another day, all the gossipers know all the gossip.

It turns out that if the number of gossipers, n, is 2^D, then the minimum number of days required to spread the rumors to everyone in the group is D. Thus, four (2²) participants would take two days; and eight (2³) gossipers would require three days.

Now, consider a group of seven gossipers: Alice, Bob, Carol, Daniel, Eric, Fred, and Greta. One way to spread the rumors is to divide the group into two subgroups, one consisting of four members (Alice, Bob, Carol, and Daniel) and the other of three members (Eric, Fred, and Greta). On the first day, Alice phones Eric, Bob phones Fred, and Carol phones Greta, while Daniel takes a break. At this point, Alice, Bob, Carol, and Daniel collectively have all the information available, and it takes them two days to share it among themselves. Then, Alice calls back Eric, Bob calls Fred, and Carol calls Greta to complete the information transfer. Total time: four days.

Illustration of how rumors can spread in four days among seven gossipers. The numbers indicate the day on which the connected couple communicates.

Four days is the best that a group of seven can do no matter what the strategy.

In an article in Mathematics Magazine, Fan, Poonen, and Poonen prove that, in general, the minimum number of days for an even number of participants to share information is given by the exponent of the power of two exactly representing the number of gossipers or by the exponent of the next higher power of two. Thus, four (2²) gossipers would need two days, six would need three days (the next higher power of two is 2³), eight would need three days, and 10 would need four days (because 10 is greater than eight but smaller than 16, or 2⁴).

For an odd number of participants, just add one day to the number of days required by the next higher even number of gossipers.

Fan, Poonen, and Poonen readily concede that their mathematical model of rumor mongering may be somewhat unrealistic. "Admittedly, one day is a bit much for the time it takes for even the most loquacious of rumor mongers to communicate with a friend!" they commented. "Perhaps one hour or even one minute would have been more realistic."

Interestingly, the problem can also be interpreted in terms of communication between computers operating in parallel. "The result can be useful, for instance, when multiple copies of a database are distributed to several locations and replication is needed to keep the databases synchronized," the mathematicians said. "In fact, this is the situation in which our problem originally arose."

References:

Fan, C.K., B. Poonen, and G. Poonen. 1997. How to spread rumors fast. Mathematics Magazine 70(February):40-42.

Peterson, I. 2002. Mathematical Treks: From Surreal Numbers to Magic Circles. Mathematical Association of America.

Originally posted March 17, 1997

Saturday, September 10, 2011

Powerful Sequences

Number sequences suggest all sorts of intriguing puzzles and patterns.

Consider, for example, the sequence of counting numbers:

…

Take out every second number, leaving just the odd numbers:

…

Then, form the cumulative totals of these numbers, as follows:

1, (1 + 3) = 4, (4 + 5) = 9, (9 + 7) = 16, (16 + 9) = 25, (25 + 11) = 36, (36 + 13) = 49, (49 + 15) = 64, . . .

Out pops the sequence of consecutive squares: 1 = 1², 4 = 2², 9 = 3², 16 = 4², 25 = 5², 36 = 6², 49 = 7², 64 = 8², and so on.

This seemingly magical transformation of one sequence into another was first discovered and explored by mathematician Alfred Moessner in the early 1950s. He found a host of such relationships between different number sequences.

Again starting with the sequence of counting numbers, suppose you take away every third number (multiples of 3):

…

Then, add up what's left to get cumulative totals:

1, (1 + 2) = 3, (3 + 4) = 7, (7 + 5) = 12, (12 + 7) = 19, (19 + 8) = 27, (27 + 10) = 37, (37 + 11) = 48, (48 + 13) = 61, (61 + 14) = 75, (75 + 16) = 91, . . .

You end up with the following sequence:

…

Remove every second number in the new list, and total the remaining numbers. What do you end up with?

…

1, (1 + 7) = 8, (8 + 19) = 27, (27 + 37) = 64, (64 + 91) = 125, (125 + 91) = 216, . . .

You get the sequence of cubes: 1 = 1³, 8 = 2³, 27 = 3³, 64 = 4³, 125 = 5³, 216 = 6³, . . .

If you go through the same procedure again, this time striking out every fourth number at the start, the result should now come as no surprise. You end up with the sequence of fourth powers: 1 = 1⁴, 16 = 2⁴, 81 = 3⁴, 256 = 4⁴ . . .

In general, taking out the nth number and following the appropriate procedure gives a sequence of nth powers.

What happens if you take out the so-called triangular numbers:

1, (1 + 2) = 3, (1 + 2 + 3) = 6, (1 + 2 + 3 + 4) = 10, (1 + 2 + 3 + 4 + 5) = 15, . . ., (1 + 2 + 3 + 4 + . . . n)

and as before, calculate cumulative totals, then take out the first, third, sixth, tenth, fifteenth, and so on numbers from the new list, then continue on, as above?

1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	…
2	4	5	7	8	9	11	12	13	14	…
2	6	11	18	26	35	46	58	71	85	…
6	18	26	46	58	71	…
6	24	50	96	154	225	…
24	96	154	…
24	120	274	…
120	…

Notice that the unique numbers down the left-hand side are the factorial numbers: 1, (1 x 2) = 2, (1 x 2 x 3) = 6, (1 x 2 x 3 x 4) = 24, (1 x 2 x 3 x 4 x 5) = 120, or in general, (1 x 2 x 3 x ... x n). Somehow, the recipe turns addition into multiplication. For more information about a related sequence, see Moessner’s factorial triangle at the On-Line Encyclopedia of Integer Sequences.

I first came across these surprising sequence transformations when Richard Guy described them at a meeting on recreational mathematics held in 1986 at the University of Calgary. This material—and much, much more—is included in a fascinating book by Guy and John H. Conway titled The Book of Numbers. If you want to stretch your mind from the integers to the surreal, this is the book to read!

References:

Conway, J.H., and R.K. Guy. 1996. The Book of Numbers. Springer-Verlag.

Enzensberger, H.M. 1997. The Number Devil: A Mathematical Adventure. Metropolitan Books.

Gardner, M. 1997. Strong laws of small primes. In The Last Recreations: Hydras, Eggs, and Other Mathematical Mystifications. Springer-Verlag.

Guy, R.K. 1994. The strong law of small numbers. In The Lighter Side of Mathematics: Proceedings of the Eugène Strens Memorial Conference on Recreational Mathematics and Its History, R.K. Guy and R.E. Woodrow, eds. Mathematical Association of America.

Long, C.T. 1982. Strike it out—and add it up. Mathematical Gazette, 66(December):273-277.

Peterson, I. 2002. Next in line. In Mathematical Treks: From Surreal Numbers to Magic Circles. Mathematical Association of America.

______. 1990. Islands of Truth: A Mathematical Mystery Cruise. W.H. Freeman.

Originally posted Nov. 18, 1996.

1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	…
2	4	5	7	8	9	11	12	13	14	…
2	6	11	18	26	35	46	58	71	85	…
6	18	26	46	58	71	…
6	24	50	96	154	225	…
24	96	154	…
24	120	274	…
120	…

1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	…
2	4	5	7	8	9	11	12	13	14	…
2	6	11	18	26	35	46	58	71	85	…
6	18	26	46	58	71	…
6	24	50	96	154	225	…
24	96	154	…
24	120	274	…
120	…

1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	…
2	4	5	7	8	9	11	12	13	14	…
2	6	11	18	26	35	46	58	71	85	…
6	18	26	46	58	71	…
6	24	50	96	154	225	…
24	96	154	…
24	120	274	…
120	…