home overview techniques technology resources about
intro | multiple meanings | word prediction | part of speech tagging | context

Part-of-Speech Tagging

Often ambiguities in the meaning of a word can be resolved by determining what part of speech that word is serving as in the target sentence. For instance, although content can have two meanings, each meaning is specific to the part of speech that it is used as. Thus, if one could determine what part of speech it was, there would be no need to use n-gram models or other methods for disambiguation. To this end, some researchers study algorithms intended to determine a word’s part of speech.

In part of speech tagging algorithms, two main sources of information are used to compute the probability that a specific tag is correct: the probability of a specific tag for a specific word and the relative probability of the current sequence of tags in English. To combine these two probabilities and determine the best overall tag for a given word, many statisticians use Hidden Markov Models (HMMs).

In understanding how an HMM works, first we must examine how it would work if it only took into account the probability that a specific tag occurs with a specific word. This process works in ways very similar to n-grams (which are actually instances of Markov Models) in that it makes the bigram assumption the word’s tag can be determined simply based on the previous word’s tag. Given that the model has been trained on a tagged corpus, which already has part of speech information added, it can calculate the probabilities that specific words serve as nouns, verbs, or other parts of speech. Additionally, it can calculate the probability that one part of speech occurs after another part of speech. The model’s task can best be understood by using an example, such as “to flower.” We assume that we know the proper tag for “to” and that we know “flower” can be a noun or a verb. Then we seek to maximize the product of the probability that the tag for flower follows the tag for to and the probability that, given we are expecting a certain tag, “flower” is the word matching this tag. Expressed mathematically, we compare the values of P(VERB | TO) P(flower | VERB) and P(NOUN | TO)P(flower | NOUN), as “TO” is the correct tag for “to.” This is the task that the model would face if it were seeking to tag an individual word when given the preceding word’s tag.

In actuality, HMMs usually try to tag whole sentences at one time, and they are not given any tags that are certain. Thus, there are many more probabilities and comparisons involved. To limit these computations and create a process that is manageable given time and computing constraints, HMMs are usually programmed to make some n-gram assumption, often a trigram assumption. Consequently, long sentences can still be tagged without taking enormous amounts of time, and assumptions are made that allow for linear tagging time rather than exponential tagging time based on the length of the text to be tagged. Using the techniques described, researchers have reported accuracy rates that are greater than 96%. Current research aims to increase these accuracy rates by taking more factors than simply the previous (n – 1) tags into account when calculating the next tag.