1948: Claude Shannon Creates Modern Information Theory
While working for Bell Laboratories, the mathematician Claude Shannon wrote his landmark
paper, A Mathematical Theory of Communication (republished the following year as a book
with an introductory essay by Warren Weaver). This paper showed that information is not
just an abstract concept, but a quantity that can be measured and manipulated
mathematically. Shannon formed the basic model of communication that has since become the
information theorist's coat of arms (Fig. 1).
Figure 1 - Claude Shannon's information model
By treating information as a measurable quantity, Shannon was able to derive rules and
theorems that would shape the way information is transmitted. These rules apply as much
today as they did half a century ago. Shannon's model assumes that any given message is
drawn from a finite set of possible messages with known probabilities. To send a message,
a transmitter encodes it into binary digits, or bits, as he called them.
Using bits, any message can be broken down into a series of zeros and ones and sent
across a channel. By analyzing how messages are formed and sent, Shannon made it
possible to encode messages using an efficient, minimal-redundancy code.
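Shannon's measure of information, the entropy of the message set, follows directly from those message probabilities. A minimal sketch (the example probabilities are made up for illustration):

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a finite message set with given probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair coin flip carries exactly 1 bit of information.
print(entropy([0.5, 0.5]))   # 1.0
# A biased source carries less, because its outcomes are more predictable.
print(entropy([0.9, 0.1]))   # ~0.469
```

The entropy is the theoretical floor on the average number of bits per message that any code can achieve.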
Other ideas in Shannon's paper include how fast information can be sent through a
given channel and what a channel's maximum capacity is. One of the greatest theorems
in the paper is a proof that data can be sent over a noisy channel with arbitrarily
small error. With this new mathematical groundwork, Shannon paved the way for other
scholars to take up his work in the new field of information theory.
1949: Claude Shannon and Communication in the Presence of Noise
After Claude Shannon wrote A Mathematical Theory of Communication in 1948, he
followed up with another paper titled Communication in the Presence of Noise. This
paper provided a geometrical representation of transmitted signals as points in a
function space. In it, Shannon deduced new results about the bandwidth of a given
channel and the threshold effect. With new mathematical models, Shannon developed
ideas of how information can be sent under non-ideal conditions, taking into account
factors such as the power of the transmission and the bandwidth of the channel.
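The best-known result from this line of work is the Shannon-Hartley capacity formula, C = W log2(1 + S/N), which ties together exactly the factors named above: bandwidth W and signal-to-noise ratio S/N. A quick sketch (the telephone-line numbers are a standard illustrative example, not from the text):

```python
import math

def channel_capacity(bandwidth_hz, snr):
    """Shannon-Hartley capacity in bits/s: C = W * log2(1 + S/N)."""
    return bandwidth_hz * math.log2(1 + snr)

# A 3 kHz telephone line with a signal-to-noise ratio of 1000 (30 dB):
print(channel_capacity(3000, 1000))   # ~29,902 bits per second
```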
1950: Richard Hamming and Error Correction
Richard Hamming worked on finding ways to correct errors in binary transmissions.
His idea was that messages could be sent in 7-bit blocks: four information bits combined
with three check bits. By checking the oddness or evenness (parity) of different
combinations of the bits, the receiver can determine whether the block arrived error-free.
If the received bits do not agree with the parity calculations, the receiver can compute
where the error most likely lies. This is a fairly basic error-correction technique
but not the most efficient: for every block of data transmitted, only 4/7 (about 57%)
is the actual message, and the rest are check bits. Furthermore, Hamming's procedure
only guards against one error in a 7-bit block and cannot guarantee that a message with
more than one error will be corrected (or even detected).
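The scheme can be sketched in a few lines. The bit layout below follows the standard textbook convention (parity bits at positions 1, 2, and 4), which the text does not spell out, so treat it as one possible arrangement:

```python
def hamming74_encode(d):
    """Encode 4 data bits into a 7-bit codeword with 3 parity (check) bits."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4              # parity over positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4              # parity over positions 3, 6, 7
    p3 = d2 ^ d3 ^ d4              # parity over positions 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]   # positions 1..7

def hamming74_decode(c):
    """Correct a single flipped bit, then return the 4 data bits."""
    c = list(c)
    # Recompute each parity check; the failing checks spell out the
    # (1-indexed) position of the error in binary.
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:                    # nonzero syndrome = position of the error
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1                        # flip one bit in transit
print(hamming74_decode(word))       # recovers [1, 0, 1, 1]
```

Flipping any single bit of the codeword, data or check, is corrected; flipping two is not, exactly the limitation noted above.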
1951: Claude Shannon and the Entropy of Printed English
In his original paper, A Mathematical Theory of Communication, Shannon described
entropy as the average number of binary digits, or bits, per letter needed to encode a
given message. In 1951, he wrote a new paper titled Prediction and Entropy of Printed
English. He wanted to carry out further statistical analysis of how to encode the
language as efficiently as possible. He figured that one could predict a given letter
from a string of preceding letters, based on the letter probabilities of the English
language.
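Shannon's prediction idea can be illustrated with a toy next-letter predictor built from bigram counts. This is a modern sketch, not Shannon's own experiment, and the sample text is an arbitrary example far too small for real statistics:

```python
from collections import Counter, defaultdict

def bigram_predictor(text):
    """Build a table mapping each letter to the letter most likely to follow it."""
    follows = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        follows[a][b] += 1
    return {a: counts.most_common(1)[0][0] for a, counts in follows.items()}

sample = "the theory of the thing"
predict = bigram_predictor(sample)
print(predict["t"])   # 'h' -- 'h' follows 't' every time in this sample
```

The better such predictions become, the fewer bits per letter are actually needed, which is how Shannon estimated the entropy of English.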
1952: David A. Huffman and Minimum-Redundancy Codes
Shannon defined "minimal redundancy" as the lowest possible average message length
given D coding digits. While Shannon wrote proofs on this topic, David A. Huffman
gave an actual coding technique that achieves minimal redundancy in his 1952 paper, "A
Method for the Construction of Minimum-Redundancy Codes." Given certain
restrictions on how a set of N messages can be arranged, it is possible to
construct a code that yields the lowest possible average message length. Huffman placed
these restrictions on the message codes in order to achieve the lowest possible average
length for the message ensemble:
- No two messages may consist of identical arrangements of coding digits (if two
arrangements were identical, one message could be eliminated from the set without any
loss of information).
- The message codes must be constructed in such a way that no additional indication is
necessary to specify where a message code begins and ends once the starting point of a
sequence is known. In other words, no code can appear, digit for digit, as the first
part of any message code of greater length (the prefix condition).
- The messages must be arranged so that the length, L(i), of a given message
code is never less than the length of the code of a more probable message.
If P(i) is the probability of the i-th message being chosen, then:

P(1) ≥ P(2) ≥ . . . ≥ P(N)
L(1) ≤ L(2) ≤ . . . ≤ L(N-1) = L(N)

L(N-1) equals L(N) because the first L(N-1) digits of the Nth message cannot be used as
the code of any other message. If more digits were added to the Nth message, then the
(N-1)th message's code would be encapsulated as the first part of the Nth message's
code, violating the prefix condition above.
- At least two and not more than D of the messages with code length L(N) have codes
which are alike except for their final digits.
- Each possible sequence of L(N)-1 digits must be used either as a message code or must
have one of its prefixes used as a message code. This ensures that the Nth message has
as few digits as possible, hence reducing the total number of digits in the entire
message ensemble.
Huffman provided ways of manipulating messages so that these restrictions could be met.
Once this is in place, he gave the following instructions:
- Since the last two messages are exactly alike except for their final digits, they can
be combined into a single composite message whose code (not yet determined) is their
common prefix of length L(N)-1.
- A new set of messages can be constructed with one less member because two of the
messages are now combined.
- Manipulate the messages until the restrictions, (a) through (e), are again met.
- Repeat this process until all the messages are encoded.
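The repeated-merging procedure above can be sketched in Python. This is a minimal modern version using a heap, assuming D = 2 coding digits; the symbol weights are made-up examples:

```python
import heapq

def huffman_code(freqs):
    """Build a minimum-redundancy (Huffman) code for a symbol -> weight map."""
    # Start with one single-symbol "message" per entry; repeatedly merge the
    # two least probable entries, as in the procedure described above.
    heap = [(w, i, [(sym, "")]) for i, (sym, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tick = len(heap)                        # tiebreaker so tuples stay comparable
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        merged = [(s, "0" + code) for s, code in c1] + \
                 [(s, "1" + code) for s, code in c2]
        heapq.heappush(heap, (w1 + w2, tick, merged))
        tick += 1
    return dict(heap[0][2])

codes = huffman_code({"a": 0.45, "b": 0.25, "c": 0.15, "d": 0.15})
print(codes)   # the most probable symbol gets the shortest code
```

The resulting code is prefix-free, so a decoder needs no markers between messages, exactly as restriction (b) demands.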
This encoding scheme works like signs along a river: by following the restrictions, the
code lays out a "path" by which the original messages can be recreated.
Figure 2: This table shows messages being combined into one
Because the messages satisfy these constraints at each step of the encoding,
Huffman's technique ensures that the message ensemble is encoded with the lowest
average length. This technique (known as Huffman coding) has had an effect not only on
information theory but also on fields such as data compression. Every time someone plays
an "mp3" file, they are reconstructing messages that have been compressed into a
small ensemble code.
1952: E.N. Gilbert and Error-Resistant Alphabets
Shannon's theorems prove that coding schemes exist by which information can be
transmitted with arbitrarily small error. In 1952, E.N. Gilbert constructed a method for
sending messages with small amounts of error in his paper, A Comparison of Signaling
Alphabets. Gilbert's idea was that a binary alphabet could be constructed in which an
error in any letter could not cause it to be interpreted as any other letter. So even if
a signal is received with errors, the receiver will still know what message the
transmitter sent. Gilbert's code offers an advantage over Shannon's in that Shannon's is
reliable only on average, and certain messages still carry an above-average probability
of coding error. Gilbert's code, while perhaps not as fast or efficient as Shannon's,
provides stronger error correction for any given message.
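One way to illustrate Gilbert's idea is the greedy construction behind what is now called the Gilbert bound: keep any word that is far enough, in Hamming distance, from every word already kept, so that a small number of flipped bits can never turn one codeword into another. This is a modern sketch of the principle, not Gilbert's own procedure:

```python
from itertools import product

def hamming_distance(a, b):
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))

def greedy_code(n, min_dist):
    """Greedily collect n-bit words that are pairwise >= min_dist apart."""
    code = []
    for word in product("01", repeat=n):
        w = "".join(word)
        if all(hamming_distance(w, c) >= min_dist for c in code):
            code.append(w)
    return code

# 5-bit words at distance >= 3: after any single-bit error, the received
# word is still closer to the sent codeword than to any other codeword.
print(greedy_code(5, 3))
```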
1954: Amiel Feinstein and Noisy Channels
In his paper, A New Basic Theorem of Information Theory, Amiel Feinstein
provides new proofs and theorems for Shannon's original ideas. In particular, Feinstein
concentrated his work on noisy channels and continuous-wave channels. Like other
mathematicians, Feinstein investigated the question of channel capacity and minimizing
error. This highly mathematical paper rethinks Shannon's theories and proves them using
techniques never used by information theory's inventor. Feinstein discovered new bounds
and limits for a given noisy channel, and a new mathematical representation that still
agrees with Shannon's original theorem but rests on a more rigorous base. His great
contribution to the field of information theory is the result that the probability of
error decays exponentially with the block length of a coded message.
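In modern notation, this exponential decay is usually written as a bound of the following form, where n is the block length, R the code rate, C the channel capacity, and E(R) the so-called error exponent:

```latex
P_e \le e^{-n\,E(R)}, \qquad E(R) > 0 \quad \text{for all rates } R < C .
```

Doubling the block length thus (at least) squares down the error probability, which is why long block codes can be made so reliable.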
1955 - 1957: Coding for Noisy Channels
After Feinstein's paper on noisy channels and minimizing errors, several new papers
were written exploring whether such a channel can be coded so that errors are almost
non-existent. In 1955, Peter Elias produced his paper, Coding for Noisy Channels, in
which he tries to outline practical, realistic ways of sending codes with minimal
error. His idea required a channel with memory, meaning that the receiver can check the
received signals against the sent signals (hence the signals are stored in memory).
If the received signal has errors, the receiver can ask the transmitter for the closest
possible match. However, such an idea cannot be put into practice because of the large
amounts of memory and time needed to check the results against the original information.
Furthermore, such a technique would require a special feedback line that would be as
error-prone as any other channel. Elias continued working with Hamming's check-symbol
code and produced a model for error-free transmission, though he admits that such a
method would require large amounts of time.
A year later, Shannon wrote The Zero Error Capacity of a Noisy Channel. Using
Elias's new idea of a feedback channel, Shannon was able to write proofs of how a
zero-error transmission could exist, but he offered no practical way of creating such a
system.
Figure 3 - Now do you see why noise and interference are such a problem?
1960: The Reed-Solomon Code and Error Correction
Even after twenty years of theories and methods, an effective way of correcting errors
was still elusive. In 1960, however, Irving Reed and Gustave Solomon came up with a new
coding technique that could correct up to 8,232 consecutive errors, far beyond anything
the codes of Hamming or Gilbert could achieve. Their paper, Polynomial Codes over
Certain Finite Fields, showed how this feat was possible. While the technique is
mathematically involved, it is conceptually quite simple. The Reed-Solomon code can
detect errors in a transmission the same way the human eye can spot bad points in an
otherwise smooth parabolic curve: it sees bad points in the code but can recover,
because it knows the general shape of the transmission.
This coding technique has had a profound effect on communications because it is
implemented in everything from CDs and hard drives to deep-space probe communication.
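The "general shape" intuition can be illustrated with a toy example using ordinary polynomials over the rationals, rather than the finite fields of the actual Reed-Solomon construction. The message is a quadratic; sending five evaluations instead of the minimum three adds the redundancy that exposes a corrupted point:

```python
from fractions import Fraction
from itertools import combinations

def interpolate(points, x):
    """Evaluate the Lagrange polynomial through `points` at `x`, exactly."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        term = Fraction(yi)
        for j, (xj, _) in enumerate(points):
            if i != j:
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total

def p(x):
    """The 'message': the polynomial 2x^2 + 3x + 5 (three coefficients)."""
    return 2 * x * x + 3 * x + 5

received = [(x, p(x)) for x in range(5)]   # five evaluations, two redundant
received[2] = (2, 999)                     # one value corrupted in transit

# Any 3 of the 5 points determine a quadratic; the subsets that avoid the
# bad point all agree, outvoting those that include it -- the intuition
# behind Reed-Solomon decoding (real codes work over finite fields).
votes = [interpolate(list(sub), 10) for sub in combinations(received, 3)]
best = max(set(votes), key=votes.count)
print(best)   # 235 == p(10), recovered despite the corrupted point
```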
1960 - Present: David K. Berlo and Communication Theory
Shannon's communication model was used not only in information theory, which is
inherently mathematical, but also in communication theory, which is mostly sociological.
An example can be seen in David K. Berlo's book, The Process of Communication, where he
uses the ideas of a source, encoder, channel, decoder, and receiver to describe social
interaction between humans. He showed how each component determines the meaning of a
message. It is this idea of meaning that separates information theory from communication
theory. While the two fields have different foci and goals, both stem from the same
early ideas of how information is transmitted and received.
1961 - 1965: Fano, Gallager and the Decay of Errors
In 1961, R.M. Fano determined an explicit representation of the exponential decay
coefficient for block codes, describing the relationship between code rate, block
length, and probability of error. This decay coefficient is nonzero only for rates below
the channel capacity. In 1965, R.G. Gallager provided a much simpler proof of Fano's
results and gave his best estimate of a lower bound on the decay coefficient for
low-rate codes (a proof of this coefficient has yet to be given).
1970s - Present: The New Challenge: Networks
With the creation of the ARPAnet in 1969 and the growth of networked computers and
servers, point-to-point communication gave way to multiterminal problems. This massive,
worldwide network, now called the Internet, presents many questions and problems for
information theorists. In 1973, David Slepian and Jack Wolf published the paper that
marked the beginning of the field of multiterminal source coding theory. They were able
to specify completely the admissible rate region for noiseless coding of a memoryless
multiterminal source; in 1975, Thomas Cover extended their results to the case of a
stationary ergodic multiterminal source by means of an elegant proof.
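For two correlated sources X and Y encoded separately at rates R_X and R_Y, the Slepian-Wolf admissible rate region is, in modern notation:

```latex
R_X \ge H(X \mid Y), \qquad
R_Y \ge H(Y \mid X), \qquad
R_X + R_Y \ge H(X, Y).
```

Remarkably, the total rate H(X, Y) is no more than a single encoder that sees both sources would need, even though each encoder here sees only its own source.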
Fig. 4 - A Nokia cell phone
Networks are important to a wide variety of fields and not just the scientific
community. Businesses of all sorts depend on their internal and external networks working
properly and efficiently. The largest network, the Internet, must handle millions of
transmissions every day.
Because of this large growth in networks, research has grown as well. Many universities
and companies devote a great deal of time and money to making sure networks run properly
and efficiently. Researchers create new technologies every day, such as personal digital
assistants and cell phones, all of which are wired into huge networks and must run at
peak performance. These new technologies bring many questions and problems with them,
all of which are the targets of extensive research.
Fig. 5 - A 3Com Palm III
Figures 4 and 5 show just a few of the technologies that are the subjects of many
studies and experiments on how to send data most efficiently. As advanced as they may
seem, today's researchers are still exploring the same problems that faced Shannon and
others in the 1950s.