May 01, 2017

Entropy and Random Number Generators


a study of RNG and a custom RNG solution

Computers use a logical device known as a pseudo random number generator (PRNG), also known as a deterministic random bit generator (DRBG). This is an algorithm for generating a sequence of numbers that approximates the properties of random numbers. The sequence is not truly random in that it is completely determined by a relatively small set of initial values, called the PRNG's state.
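
To see that determinism in action, here is a minimal sketch of a linear congruential generator written as a shell script. The constants are the classic glibc-style LCG parameters and the script is for illustration only, not for real use: run it twice with the same seed and it prints the exact same "random" sequence.

#!/bin/bash
## toy linear congruential PRNG: the output is completely
## determined by the starting state (the seed)
seed=12345
i=0
while [ $i -lt 5 ]; do
    seed=$(( (1103515245 * seed + 12345) % 2147483648 ))
    echo $seed
    i=$(( i + 1 ))
done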

In Unix-like operating systems, /dev/random is a special file that serves as a random number generator or as a pseudo random number generator. It allows access to environmental noise collected from device drivers and other sources.

Not all operating systems implement the same semantics for /dev/random. Linux was the first operating system to implement a random number generator in this way.

What's the problem with our random number generators ?

When read, the /dev/random device will return random bytes only within the estimated number of bits of noise in the entropy pool. If the entropy pool is empty, reads from /dev/random will block until additional environmental noise is gathered. This means that any program that needs randomness will stop working until /dev/random fills again. If your web server running SSL stops serving data due to lack of entropy your customers will not be happy. /dev/random should be suitable for uses that need very high quality randomness such as one-time pad or key generation.

Unlike /dev/random, the /dev/urandom device will return as many bytes as are requested. As a result, if there is not sufficient entropy in the entropy pool, the returned values are theoretically vulnerable to a cryptographic attack on the algorithms used by the driver. Knowledge of how to do this is not available in the current non-classified literature, but it is theoretically possible that such an attack may exist. If this is a concern in your application, always use /dev/random instead.
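
A quick way to see the difference between the two devices is to read a few kilobytes from each with a time limit. On a quiet machine /dev/random may stall until more noise arrives while /dev/urandom returns immediately. This is only an illustrative sketch and assumes the GNU coreutils "timeout" command is available.

## /dev/random may block until the entropy pool refills
timeout 5 dd if=/dev/random of=/dev/null bs=512 count=16

## /dev/urandom never blocks and returns right away
timeout 5 dd if=/dev/urandom of=/dev/null bs=512 count=16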

The OpenBSD group has a random number generator called /dev/arandom which is used by default. It is a source of extremely good entropy and it is incredibly fast. Expect to see an average of 40 megabytes per second of random data from this device.

Finally, /dev/srandom is a very simple and slow generator. We do not recommend using it at all.

There are solutions to the problem of lack of entropy. We will go through how to test the entropy pool and some software and hardware solutions. Finally, we will set up our own homemade random number generator.

NOTE: The word "entropy" generally means "chaos", "disorder", or "uncertainty". In the context of this page "entropy" is used to describe random computer data.

Testing the random number generators

Before we make any changes we need to test our current random number generator. To do this we will collect data from the random device of our choice, like /dev/random or /dev/urandom, and then use the program "ent" to test the entropy of the data set.

Get ENT, the Pseudorandom Number Sequence Test Program

First, download and compile ENT, the Pseudorandom Number Sequence Test Program. A quick Google search for the program name will find it if the file location changes.

Once the program is downloaded, uncompress it and compile the binary with "make". When you are done you will only need the "ent" binary and can delete the rest. We will make a directory under /tmp called /tmp/random and do the rest of our testing in it.

mkdir /tmp/random
cd /tmp/random
wget http://www.fourmilab.ch/random/random.zip
unzip random.zip
make

How to collect random data

We can collect random data from /dev/random using "dd". This command will perform 8,192 reads of up to 512 bytes each from /dev/random and output the results to a file we will call "random_output". NOTE: if your /dev/random generator is too slow this command will take forever to finish. The random_output file should be around 1 to 4 megabytes in size depending on how many bytes each read from /dev/random returns on your system. For now, if the dd command does not finish in 3 seconds or so hit Ctrl-c to kill dd.

dd if=/dev/random of=random_output count=8192

How do we use ent ?

Ent can be used against any file containing data and will test the randomness of that data. It performs a variety of tests on the stream of bytes in our file called random_output (or standard input if no file is specified) and prints its report on the standard output stream. Simply execute ent against the file like so:

./ent random_output

What tests does ent use to verify our data is random ?

Let's look at the output of the ent program against some random data we generated. The random data in our "random_output" file was collected from /dev/random on an Ubuntu test box with rngd running, using the command "dd if=/dev/random of=random_output count=8192". Ent then checks the randomness using six (6) separate tests and this is the result:

user@machine:/tmp/random$ ./ent random_output 
Entropy = 7.999825 bits per byte.

Optimum compression would reduce the size
of this 1036003 byte file by 0 percent.

Chi square distribution for 1036003 samples is 251.25, and randomly
would exceed this value 55.46 percent of the times.

Arithmetic mean value of data bytes is 127.4824 (127.5 = random).
Monte Carlo value for Pi is 3.141283511 (error 0.01 percent).
Serial correlation coefficient is 0.000858 (totally uncorrelated = 0.0).

What do each of the tests mean ?

Let's go through each line of the results above and see what values we obtained and what values we wanted to reach.

Entropy -- 8 bits per byte

The information density of the contents of the file, expressed as a number of bits per character. The result above, from processing our random data file, indicates the file is extremely dense in information, essentially random. Hence, compression of the file is unlikely to reduce its size. We received a score of 7.999825 bits per byte out of a possible 8 bits per byte which is extremely good.
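
The figure ent reports is the classic Shannon entropy, H = -sum( p(b) * log2 p(b) ), taken over the 256 possible byte values. As a rough cross-check of that first line you can compute the same number yourself with od and awk; this is only a sketch and assumes the random_output file from the dd step above.

## count byte frequencies and sum -p*log2(p) over all byte values
od -An -v -tu1 random_output | tr -s ' ' '\n' | grep -v '^$' | \
  awk '{ c[$1]++; n++ }
       END { for (b in c) { p = c[b]/n; H -= p * log(p)/log(2) }
             printf "Entropy = %f bits per byte.\n", H }'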

Optimum compression -- 0 %

Data compression or source coding is the process of encoding information using fewer bits (or other information-bearing units) than an unencoded representation would use, through use of specific encoding schemes. If the same data structure is repeated multiple times a short binary representation can stand in for long data structures and thus reduce the size of the compressed file. If our random data is truly random then we should NOT see any compression at all. Our goal is 0% compression and that is what we see in our test results above.
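
A simple sanity check of the compression claim is to run the file through gzip at maximum effort and compare sizes. Truly random data should not shrink at all; in fact the gzip output is usually a little larger because of its header and bookkeeping.

## compare the original size with the gzip -9 compressed size
ls -l random_output | awk '{ print "original:   " $5 " bytes" }'
gzip -9 -c random_output | wc -c | awk '{ print "compressed: " $1 " bytes" }'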

Chi square distribution -- between 10% and 90%

The chi-square test is the most commonly used test for the randomness of data, and is extremely sensitive to errors in pseudorandom sequence generators. The chi-square distribution is calculated for the stream of bytes in the file and expressed as an absolute number and a percentage which indicates how frequently a truly random sequence would exceed the value calculated. We interpret the percentage as the degree to which the sequence tested is suspected of being non-random. If the percentage is greater than 90% or less than 10%, the sequence is almost certainly not random. Our result was 55.46% which easily passes.
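
For a 256-bin byte histogram the statistic itself is just sum( (observed - expected)^2 / expected ) with expected = filesize / 256, and the percentage ent prints is where that statistic falls on a chi-square distribution with 255 degrees of freedom. The sketch below computes only the raw statistic; ent does the percentile conversion for you.

## chi-square statistic over the 256 byte values; a value near 255
## (the degrees of freedom) is what a random file should produce
od -An -v -tu1 random_output | tr -s ' ' '\n' | grep -v '^$' | \
  awk '{ c[$1]++; n++ }
       END { e = n/256
             for (b = 0; b < 256; b++) x += (c[b]-e)*(c[b]-e)/e
             printf "Chi square statistic = %.2f\n", x }'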

Arithmetic mean -- 127.5

This is simply the result of summing all the bytes in the file and dividing by the file length. If the data is close to random, this should be about 127.5. If the mean departs from this value the values are consistently high or low. We received a 127.4824 which is quite close to 127.5.
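
The same od and awk approach reproduces the mean as a one-liner; again this is just a cross-check sketch against the random_output file.

## average of all byte values; about 127.5 is expected for random data
od -An -v -tu1 random_output | tr -s ' ' '\n' | grep -v '^$' | \
  awk '{ s += $1; n++ } END { printf "Arithmetic mean = %.4f\n", s/n }'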

Monte Carlo value for Pi -- 3.14159265

Each successive sequence of six bytes is used as 24 bit X and Y co-ordinates within a square. If the distance of the randomly-generated point is less than the radius of a circle inscribed within the square, the six-byte sequence is considered a hit. The percentage of hits can be used to calculate the value of Pi. For very large streams the value will approach the correct value of Pi if the sequence is close to random. Our value of 3.141283511 is off by 0.01%. Very good result.
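
The sketch below follows the same idea, close to but not byte-for-byte identical with ent's implementation: take the bytes six at a time, build two 24 bit coordinates, and count how many points land inside the quarter circle of radius 2^24 - 1.

## estimate Pi from 6-byte (x,y) points: pi is roughly 4 * hits / points
od -An -v -tu1 random_output | tr -s ' ' '\n' | grep -v '^$' | \
  awk '{ b[++i] = $1
         if (i == 6) {
             x = b[1]*65536 + b[2]*256 + b[3]
             y = b[4]*65536 + b[5]*256 + b[6]
             r = 16777215               ## 2^24 - 1
             n++; if (x*x + y*y <= r*r) hits++
             i = 0 } }
       END { printf "Monte Carlo value for Pi is %.9f\n", 4*hits/n }'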

Serial correlation coefficient -- 0.0

This quantity measures the extent to which each byte in the file depends upon the previous byte. For random sequences, this value (this can be positive or negative) will, of course, be close to zero. Our test reveals a value of 0.000858 which is pretty close to 0.0.
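
This is a standard lag-one correlation: pair each byte with the byte that follows it and compute the Pearson correlation of those pairs. The rough sketch below does exactly that (ent also wraps the last byte back around to the first, which this version skips).

## lag-1 serial correlation of byte values; near 0.0 means uncorrelated
od -An -v -tu1 random_output | tr -s ' ' '\n' | grep -v '^$' | \
  awk 'NR > 1 { n++; sx += prev; sy += $1
                sxx += prev*prev; syy += $1*$1; sxy += prev*$1 }
       { prev = $1 }
       END { num = n*sxy - sx*sy
             den = sqrt((n*sxx - sx*sx) * (n*syy - sy*sy))
             printf "Serial correlation coefficient = %.6f\n", num/den }'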

Fun with encryption and randomness

You can also take the generated random data you collect from the pseudo random generator and display the data as an image. Check out Roland's Fun with encryption and randomness page where he uses data from openssl to graphically show randomness using greyscale imaging.
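
If you want to try the same trick yourself, ImageMagick can render raw bytes as a greyscale bitmap. The sketch below, assuming ImageMagick's "convert" is installed, takes the first 65,536 bytes of our random_output file as a 256x256 image; any visible stripes or blocks would betray non-random data.

## render the first 64 KB of random data as a 256x256 greyscale image
head -c 65536 random_output > random_64k
convert -size 256x256 -depth 8 gray:random_64k random_view.png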

What are the qualities of a good random number generator ?

A good random number generator has two main qualities: high quality random data and high speed, measured in megabytes per second. The quality of the random data is more important than the speed, but we really want both.

To test for high quality random data we use the program "ent". Look at the section above on how to interpret the results of ent. To test the speed of entropy generation we use the "dd" command and look at the output in megabytes per second (MB/s). A speed of 1.5 megabytes per second is the minimum acceptable value. You really want your random number generator to supply high quality entropy in the ten (10) to sixty (60) megabyte per second range.
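
To get the speed number, read a fixed amount of data from the device and let dd report the throughput. For example (bs=1M is GNU dd syntax; BSD dd uses bs=1m, and the numbers will vary by machine):

## measure the raw generator speed; dd prints the MB/s on its last line
dd if=/dev/urandom of=/dev/null bs=1M count=64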

For example, /dev/random has very good entropy, but poor speed. /dev/urandom has poor entropy and good speed. /dev/arandom on an OpenBSD box has both. Let's look at some of the solutions.

Increasing Linux entropy for random numbers

Increasing the entropy on a Linux system is quite easy. One solution is a daemon called "rngd". It runs in the background and significantly increases the amount of entropy on the system. The quality of the data is about the same as what /dev/random would produce; rngd simply makes more per second.

To increase the amount of entropy in /dev/random you can use the program "rngd". Install the rpm or deb for your Linux distribution under the name rng-tools.
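
For example, the install is a single package (the exact package name may vary slightly by distribution and release):

## Debian / Ubuntu
sudo apt-get install rng-tools

## RedHat / CentOS / Fedora
sudo yum install rng-tools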

Start rngd using the following line. To start the daemon on reboot add the following to your /etc/rc.local .

## start random number seeder
rngd -r /dev/urandom -o /dev/random -t 1

You can query the /proc file system to see how many bits of entropy the system has available in the /dev/random pool. We suggest checking the entropy before and after you start rngd. On a standard Linux system the available entropy hovers around 150 bits. After you start rngd the pool stays close to its maximum of 4096 bits.

watch -n 1 cat /proc/sys/kernel/random/entropy_avail

Using the "ent" tool as explained above we test the /dev/random device with the rngd daemon running for true entropy. We are using dd to collect some data and test the output. As you can see /dev/random plus rngd passes and we were able to collect data at 1.8 MB/s.

user@machine: dd if=/dev/random of=output count=8192
0+8192 records in
2018+1 records out
1033440 bytes (1.0 MB) copied, 0.577313 s, 1.8 MB/s

user@machine:  ./ent output 
Entropy = 7.999827 bits per byte.

Optimum compression would reduce the size
of this 1033440 byte file by 0 percent.

Chi square distribution for 1033440 samples is 247.33, and randomly
would exceed this value 62.30 percent of the times.

Arithmetic mean value of data bytes is 127.4286 (127.5 = random).
Monte Carlo value for Pi is 3.145587552 (error 0.13 percent).
Serial correlation coefficient is -0.000732 (totally uncorrelated = 0.0).

OpenBSD entropy and random numbers with /dev/arandom (FreeBSD)

The OpenBSD group has an ARC4 generator called /dev/arandom that is quite good and incredibly fast. In OpenBSD 4.9 and earlier the other random number generators like /dev/random, /dev/urandom and /dev/srandom are too slow and weak to use. In OpenBSD 5.1 and later it seems all the random devices now use the ARC4 generator so it does not matter which name you point to. The arc4random interface was designed to solve the problem of emptying the kernel entropy pool with non-cryptographic applications. In Linux this is solved with /dev/erandom if available. The arc4random library function is a companion function. It is designed to never fail.

Let's say we have a program coded to use /dev/urandom for entropy and the gettimeofday library function if /dev/urandom fails (like in a chroot). The problem is that when the gettimeofday function is being used it is fairly obvious that the output has a sequence (i.e. the current date/time stamp) and thus informs an attacker that the system time is being used for entropy in this program.

The arc4random library function also uses /dev/urandom (or /dev/erandom), and the gettimeofday library function if /dev/urandom fails, except that the entropy is digested by the arcfour algorithm.

The result is that with as little as a one millisecond difference from gettimeofday, arc4random's output will be completely different and it is impossible for an attacker to know whether the entropy came from /dev/urandom or the system time. Furthermore, even if /dev/urandom (or even /dev/erandom) and gettimeofday fail, arc4random will fall back to the uninitialized contents of a large character array (i.e. garbage data in memory).

Many packages will use the arc4random library function by default if it is found, such as OpenSSL, OpenSSH, OpenNTPD, and Bind9.

To make sure all programs use /dev/arandom in OpenBSD 4.9 and earlier you can simply move the unwanted random devices out of the way and symlink their names to /dev/arandom. Remember you do not need this step if you are using OpenBSD 5.1 or later.

mv /dev/random /dev/random_OLD
mv /dev/srandom /dev/srandom_OLD
mv /dev/urandom /dev/urandom_OLD

ln -s /dev/arandom /dev/random
ln -s /dev/arandom /dev/srandom
ln -s /dev/arandom /dev/urandom

Using the "ent" tool as explained above we can test the /dev/arandom device for true entropy. Here we use dd to collect some data and test the output. As you can see /dev/arandom passes without issue. Also notice that we collect the data at around 70.88 megabytes per second.

user@machine: dd if=/dev/arandom of=output count=819200
819200+0 records in
819200+0 records out
419430400 bytes transferred in 5.917 secs (70883363 bytes/sec)

user@machine:  ./ent output 
Entropy = 7.999956 bits per byte.

Optimum compression would reduce the size
of this 419430400 byte file by 0 percent.

Chi square distribution for 419430400 samples is 257.21, and randomly
would exceed this value 44.95 percent of the times.

Arithmetic mean value of data bytes is 127.5178 (127.5 = random).
Monte Carlo value for Pi is 3.144255776 (error 0.08 percent).
Serial correlation coefficient is -0.000120 (totally uncorrelated = 0.0).

Buying a hardware random number generator

A hardware random number generator is a device which generates random numbers from a physical process. Such devices are often based on microscopic phenomena such as thermal noise or the photoelectric effect or other quantum phenomena. These processes are, in theory, completely unpredictable, and the theory's assertions of unpredictability are subject to experimental test.

The good news is some modern CPUs now have true random number generators built on die. Select Intel Ivy Bridge CPUs supply the Digital Random Number Generator (DRNG). Ivy Bridge's DRNG can generate high quality random numbers (standards compliant) at 2 - 3 Gbps and is available to both user and OS level code. The newer Intel Haswell CPUs have even better support with RdRand (Wikipedia) instructions for random number generation which is compliant with security and cryptographic standards such as NIST SP800-90, FIPS 140-2, and ANSI X9.82.

At the inexpensive end is the USB Hardware Random Number Generator named "Entropy Key" by Simtec Electronics. This is a small USB sized key you can plug into any system and it is only about $40 US. The Entropy Key uses P-N semiconductor junctions reverse biased with a high enough voltage to bring them near to, but not beyond, breakdown in order to generate noise. In other words, it has a pair of devices that are wired up in such a way that as a high potential is applied across them, where electrons do not normally flow in this direction and would be blocked, the high voltage compresses the semiconduction gap sufficiently that the occasional stray electron will quantum tunnel through the P-N junction. (This is sometimes referred to as avalanche noise.) When this happens is unpredictable, and the occurrence of these events is what the Entropy Key measures. The entropy can be collected in a file or passed to egd, but by default it is added to the kernel entropy pool for use by /dev/random and things like ASLR.

At the expensive end of the spectrum there are many choices from hardware PCI cards to entire boxes using multiple levels of randomization. Those systems are well beyond the scope of this page.

What about making your own random number generator ?

Making your own random number generator is not difficult once you understand one important point: you need access to a source of random seed data. This random seed can come from anywhere you are reasonably sure no one can predict. For example, the site random.org uses a radio antenna to collect atmospheric distortions for their random seed. Anything from lightning, white noise, or vehicles can induce noise into the atmosphere and that data, at a certain frequency, is received by random.org.

You can do the same on a much cheaper level using most anything you can connect to your system. For example, what about using a random channel on a wireless network for random seed data ?

Using wireless network traffic to create random data

We will be setting up a passive wireless card on an Ubuntu Linux box. The network card will choose a random WiFi channel and passively listen to data on multiple networks. The data accepted will be time stamped to the sixth (6th) decimal place making sure every accepted line is unique. We will also receive the signal quality of the remote transmitter or access point as well as wireless options like speeds and frequencies.

Find a wireless card that can scan and passively monitor

The hardest part of passively monitoring a wireless network is finding a card that will support this option. Know that most wireless cards do not support any options other than connecting, negotiating and transmitting to a valid network. This is not what we want to do. We want to passively monitor a network so no one knows that we are listening to the traffic.

One of the best USB based wireless cards out there is the Alfa 1000mW 1W 802.11b/g USB Wireless WiFi Network Adapter with the original Alfa Screw-On Swivel 9dBi Rubber Antenna. It is a small gray external base with a 16 inch black rubber antenna. The Alfa has a one (1) watt transmitter and the black antenna can pick up wireless networks at an incredible distance. We have been able to get packets from a wireless network two (2) city blocks from our location without issue. You can pick up this unit for as little as $30 (US) from amazon.com.

This unit allows you to channel scan, passively monitor and inject packets into any network. This device is perfect if you want to use KisMAC or Kismet, a network detector, sniffer, and intrusion detection program. The drivers are already available in default installs of Ubuntu Linux ( driver rtl8187 - RTL8187vB ) and many other distributions. BTW, the Alfa unit works perfectly fine in Windows and OS X on the Macintosh (Mac).

Listen to wireless network passively

We want to listen to the network passively so no one should know we are listening and we do not introduce anything to the network.

Connect the Alfa to the USB port. Depending on your OS the wireless card may or may not come on-line and be bound to wlan0. In the latest version of Ubuntu /var/log/messages shows that the USB device is recognized, but the OS does not automatically bring up the device.

First we need to bring up the wlan0 interface and do a channel scan to look for the most used channel to listen for traffic on. What we want is a channel with the most users and the highest amount of traffic.

ON A SIDE NOTE: Many wireless products in the U.S. ship with a default Wi-Fi channel of 6. Unlike television channels, some Wi-Fi channel numbers overlap with each other. Channel 1 uses the lowest frequency band and each subsequent channel increases the frequency slightly. Therefore, the further apart two channel numbers are, the less the degree of overlap and likelihood of interference. If encountering interference with a neighbor's WLAN, change to a distant channel. Both channels 1 and 11 do not overlap with the default channel 6; use one of these three channels for best results. Since we are only passively listening to data we really do not care what channel the most people are on.

Let's bring up the wireless interface, scan the networks and pick a channel.

## bring up the interface
sudo ifconfig wlan0 up

## scan for nodes on all channels, sort for most users
sudo iwlist wlan0 scan | grep "Channel:" | sort | uniq -c | sort -n

## choose a channel that has the most clients (i.e. ch 4)
sudo iwconfig wlan0 channel 4

## verify you are on the channel you selected
sudo iwlist wlan0 channel | grep Current

Now that we have found a channel with a few clients on it we can take the wlan0 device down and switch to passive mode called "monitor" mode.

# put the interface in passive mode
sudo ifconfig wlan0 down
sudo iwconfig wlan0 mode monitor
sudo ifconfig wlan0 up

Generating entropy using a wireless network seed

Once the wireless network is up and receiving data we have a source of random seed data. We need to listen to the network, capture the data and use OpenSSL to encrypt it in real time.

The "random_number_gen.sh" script shown below works as follows:

The script in essence will take all the data it receives from the wireless interface, including statistics and time stamps, and encrypt it. The encrypted data is extremely hard to predict and hard to crack given the Advanced Encryption Standard (AES) encryption scheme and the fact that we used a random password. Even we do not know how to decrypt the data since the random password is never saved anywhere.

The following code is a script we called, "random_number_gen.sh". You are welcome to copy and paste the following:

#!/bin/bash

## generate a random password 
password=`openssl rand -base64 48`

## listen to the wireless network and encrypt in real time
tcpdump -KnOSx -vvv -i wlan0 | openssl enc -aes-256-cbc -pass pass:$password > random_output

Testing the quality of our random data

Once you have some data in the "random_output" file created by the "random_number_gen.sh" script you need to test if it is actually random. Just as explained earlier on this page we will use the program "ent". Running ent against our output file reveals that our data from our random wireless network is in fact a good seed source:

user@machine$  ./ent random_output 
Entropy = 7.999990 bits per byte.

Optimum compression would reduce the size
of this 19333120 byte file by 0 percent.

Chi square distribution for 19333120 samples is 262.75, and randomly
would exceed this value 35.59 percent of the times.

Arithmetic mean value of data bytes is 127.4861 (127.5 = random).
Monte Carlo value for Pi is 3.143249955 (error 0.05 percent).
Serial correlation coefficient is 0.000581 (totally uncorrelated = 0.0).

Testing the speed of our random data

Running the "random_number_gen.sh" script on our passively monitored wireless network for 90 seconds we received 10,315,709 characters of random data. That works out to approximately 114,618 characters per second, or roughly 112 kilobytes per second. Not too bad, but not as fast as rngd. If we had a network with more traffic, like a public wireless network at a cafe, the speeds would be better.

## count number of characters
user@machine$  wc -m random_output
10315709 random_output

## calculate the characters per second
user@machine$  calc 10315709 / 90
  ~114618.98888888888888888889

That's about it. We have built a darn good entropy source.