Information
Information, in Shannon's theory

The simplest of the many definitions of information in Shannon's theory is that information is a decrease in uncertainty. To make this definition concrete, consider a set of colored shapes. This is the set of all possible shapes, and Bob is trying to guess which particular shape Alice is holding. Bob is in a state of uncertainty, since he has no idea which shape Alice holds. What's more, each shape is equally likely to be the one Alice is holding.

Now, say Alice tells Bob that she is holding a blue shape. This reduces the set of possibilities and therefore represents a decrease in Bob's uncertainty. He is by no means certain of the shape, but he has a better idea of which shape she is holding. Bob has gained information.

Now for a subtler point. Say that Alice had instead told Bob that she was holding a red shape. In this case, she would actually have conveyed more information: instead of three possible choices, Bob would now have only two, so his final uncertainty would be smaller. Since the initial uncertainty is the same in both cases, the overall decrease in uncertainty is greater when Alice says "red."
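One way to make "decrease in uncertainty" numeric is to measure the base-2 logarithm of how much the set of possibilities shrinks. The counts below are illustrative assumptions, not from the text: suppose eight equally likely shapes in all, three of them blue and two of them red.

```python
import math

# Assumed counts (illustrative only): 8 shapes total, 3 blue, 2 red.
total, blue, red = 8, 3, 2

# Information gained = log2(possibilities before / possibilities after).
info_blue = math.log2(total / blue)  # uncertainty drops from 8 choices to 3
info_red = math.log2(total / red)    # uncertainty drops from 8 choices to 2

# "Red" leaves fewer remaining possibilities, so it conveys more information.
assert info_red > info_blue
```

Under these assumed counts, "red" is worth exactly 2 bits, while "blue" is worth about 1.42 bits.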
Another interesting case occurs when the probabilities are not all equal. Say that the probability that Alice is holding a blue shape is less than the probability that she is holding any other shape. In this scenario, by telling Bob that she is holding a blue shape, she conveys more information than in the original case where all probabilities are equal. What if she had instead said that she is holding a red shape? Whether she conveys more or less information depends on the relative probabilities of the blue and red shapes.

The above metaphor allows us to visualize information theory as encompassing two realms, the known and the unknown. Information is gained when the known grows and/or the unknown shrinks.

Another paradigm that is helpful in understanding information theory defines information as "surprisal." If event X has a smaller probability than event Y, I should be more surprised to learn that X has occurred; hence, I would receive more information in that case.

An instructive example of information theory in practice is its application to language. English, or any other language for that matter, has sets of constraints that must be met: there are dependencies and rules governing how letters and symbols are put together to form words, sentences, and paragraphs. Therefore, written English carries less information, in the probabilistic sense, than a random set of letters.
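The surprisal idea above can be sketched directly: the information from an event is the negative base-2 log of its probability. The two probability values here are made-up examples, not from the text.

```python
import math

def surprisal(p):
    # Bits of information from learning that an event of probability p occurred.
    return -math.log2(p)

# Made-up probabilities: event X is rarer than event Y.
p_x, p_y = 0.1, 0.4

# The less likely event is more surprising, hence more informative.
assert surprisal(p_x) > surprisal(p_y)
```

A certain event (p = 1) has surprisal zero: learning something you already knew conveys no information.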
Given this definition of information as a decrease in uncertainty, Shannon devised an information-generating function h(p) which, given a set of N independent states a1, ..., aN and N corresponding probabilities p1, ..., pN, determines how much information is generated. In his ground-breaking paper, Shannon required that h(p) satisfy five properties: h(p) is continuous in p; h(p) >= 0; h(1) = 0, so that a certain event conveys no information; h(p) increases as p decreases; and, for independent events X and Y with probabilities p and q, h(p*q) = h(p) + h(q).
The last property, which is not immediately obvious, follows from the fact that the probability of two independent events occurring in concert is the product of their individual probabilities. It can thus be restated: "The information from event XY equals the information from event X plus the information from event Y." It can be proved that the only function that satisfies these five properties is

h(p) = -log_b(p)

This is Shannon's most famous result. Although the logarithmic relation between information and probability had been conceived of before, Shannon was able to draw mind-boggling implications from this simple formula.

An immediate implication of the information-generating function is the formula for the average information per symbol in a set of symbols:

H = p1*h(p1) + p2*h(p2) + p3*h(p3) + ... + pN-1*h(pN-1) + pN*h(pN)

H is also called entropy. This formula predicts the efficiency limits of any binary code for the set of symbols it describes. In particular, it can predict the most efficient code for compressing the English language.
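A minimal sketch of h(p) and the entropy H, using base-2 logarithms so the results come out in bits. The two example distributions are made-up values, chosen only to show that a skewed distribution has lower entropy than a uniform one.

```python
import math

def h(p, base=2):
    # Shannon's information function: h(p) = -log_b(p)
    return -math.log(p) / math.log(base)

def entropy(probs, base=2):
    # H = p1*h(p1) + p2*h(p2) + ... + pN*h(pN)
    # (terms with probability 0 contribute nothing)
    return sum(p * h(p, base) for p in probs if p > 0)

# Additivity for independent events: h(p*q) = h(p) + h(q)
p, q = 0.5, 0.25
assert math.isclose(h(p * q), h(p) + h(q))

# Four equally likely symbols: H = log2(4) = 2 bits per symbol.
# A skewed distribution over the same four symbols has lower entropy,
# so on average its symbols can be encoded with fewer bits.
assert math.isclose(entropy([0.25] * 4), 2.0)
assert entropy([0.7, 0.1, 0.1, 0.1]) < 2.0
```

This is exactly why constrained, non-uniform sources such as written English compress well: their entropy per symbol is lower than that of uniformly random letters, and H sets the floor on the average number of bits per symbol any binary code can achieve.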