Part-of-Speech Tagging
Often, ambiguities in the meaning of a word can be resolved by determining
what part of speech that word is serving as in the target sentence.
For instance, although “content” can have two meanings (the noun meaning
what something contains and the adjective meaning satisfied), each meaning
is specific to a single part of speech. Thus, if one could determine
which part of speech the word is serving as, there would be no need
to use n-gram models or other methods for disambiguation. To this
end, some researchers study algorithms intended to determine a word’s
part of speech.
In part-of-speech tagging algorithms, two main sources of information
are used to compute the probability that a specific tag is correct:
the probability of a specific tag for a specific word, and the
probability of the current sequence of tags occurring in English. To combine
these two probabilities and determine the best overall tag for a
given word, many statisticians use Hidden Markov Models (HMMs).
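To make these two probability sources concrete, here is a minimal Python sketch that estimates both from a tagged corpus. The corpus contents, tag names, and function names are all hypothetical and invented for illustration, and no smoothing is applied, so unseen words or tags receive zero probability.

```python
from collections import Counter, defaultdict

# Hypothetical toy tagged corpus; a real tagger would be trained on a
# large annotated corpus such as the Penn Treebank.
corpus = [
    [("we", "PRON"), ("want", "VERB"), ("to", "TO"), ("flower", "VERB")],
    [("the", "DET"), ("flower", "NOUN"), ("is", "VERB"), ("red", "ADJ")],
]

emission_counts = defaultdict(Counter)    # tag -> counts of words it appears as
transition_counts = defaultdict(Counter)  # previous tag -> counts of next tags
tag_counts = Counter()

for sentence in corpus:
    prev_tag = "<S>"  # sentence-start marker
    for word, tag in sentence:
        emission_counts[tag][word] += 1
        transition_counts[prev_tag][tag] += 1
        tag_counts[tag] += 1
        prev_tag = tag

def p_word_given_tag(word, tag):
    """P(word | tag): probability that an occurrence of this tag is this word."""
    return emission_counts[tag][word] / tag_counts[tag]

def p_tag_given_prev(tag, prev_tag):
    """P(tag | prev_tag): probability that this tag follows the previous one."""
    return transition_counts[prev_tag][tag] / sum(transition_counts[prev_tag].values())
```

Here p_word_given_tag("flower", "VERB") and p_tag_given_prev("VERB", "TO") supply exactly the two quantities described above, one per source of information.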
To understand how an HMM works, we must first examine how it would
work if it only had to tag a single word whose preceding tag is
known. This process works much like n-grams (which are in fact
instances of Markov models) in that it makes a bigram assumption:
a word’s tag can be determined based solely on the previous word’s
tag. Given that the model has been trained on a tagged corpus, which
already has part-of-speech information added, it can calculate the
probability that a specific word serves as a noun, a verb, or another
part of speech.
Additionally, it can calculate the probability that one part of
speech occurs after another part of speech. The model’s task
can best be understood by using an example, such as “to flower.”
We assume that we know the proper tag for “to” and that
we know “flower” can be a noun or a verb. Then we seek
to maximize the product of two probabilities: the probability that
the tag for “flower” follows the tag for “to,” and the probability
that, given we are expecting a certain tag, “flower” is the word
matching this tag. Expressed mathematically, we compare the values
of P(VERB | TO) · P(flower | VERB) and P(NOUN | TO) · P(flower | NOUN), as “TO”
is the correct tag for “to.” This is the task that the
model would face if it were seeking to tag an individual word when
given the preceding word’s tag.
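As a concrete sketch of this comparison, the snippet below plugs in hypothetical probability values (the numbers are invented, not drawn from any real corpus; in practice they would come from counts like those in the sketch above) and picks whichever tag maximizes the product:

```python
# Hypothetical probability values for the "to flower" example.
p_verb_after_to = 0.83         # P(VERB | TO)
p_noun_after_to = 0.00047      # P(NOUN | TO)
p_flower_given_verb = 0.00012  # P(flower | VERB)
p_flower_given_noun = 0.00057  # P(flower | NOUN)

score_verb = p_verb_after_to * p_flower_given_verb  # P(VERB | TO) * P(flower | VERB)
score_noun = p_noun_after_to * p_flower_given_noun  # P(NOUN | TO) * P(flower | NOUN)
print("VERB" if score_verb > score_noun else "NOUN")  # prints VERB for these values
```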
In actuality, HMMs usually try to tag whole sentences at one time,
and they are not given any tags that are certain. Thus, there are
many more probabilities and comparisons involved. To limit these
computations and create a process that is manageable given time
and computing constraints, HMMs are usually programmed to make some
n-gram assumption, often a trigram assumption. Consequently, tagging
time grows linearly rather than exponentially with the length of the
text to be tagged, and even long sentences can be tagged without
taking enormous amounts of time. Using the techniques described, researchers have
reported accuracy rates that are greater than 96%. Current research
aims to increase these accuracy rates by taking more factors than
simply the previous (n – 1) tags into account when calculating
the next tag.
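The passage above describes whole-sentence tagging only at a high level; one standard way to obtain linear-time tagging from an HMM is the Viterbi dynamic-programming algorithm. The sketch below implements it under the simpler bigram assumption (a trigram tagger would carry pairs of tags as its states) with hypothetical probability tables, tag names, and example sentence, and no smoothing for unseen words:

```python
import math

def safe_log(p):
    # Work in log space to avoid underflow on long sentences;
    # zero probability becomes negative infinity.
    return math.log(p) if p > 0 else float("-inf")

def viterbi(words, tags, trans, emit):
    """Best tag sequence under a bigram assumption.

    trans[prev][t] = P(t | prev); emit[t][w] = P(w | t).
    A sketch, not a production tagger.
    """
    n = len(words)
    best = [dict() for _ in range(n)]  # best[i][t]: best log-prob ending in tag t
    back = [dict() for _ in range(n)]  # back[i][t]: predecessor tag on that path
    for t in tags:
        best[0][t] = safe_log(trans["<S>"].get(t, 0)) + safe_log(emit[t].get(words[0], 0))
    for i in range(1, n):
        for t in tags:
            e = safe_log(emit[t].get(words[i], 0))
            prev = max(tags, key=lambda p: best[i - 1][p] + safe_log(trans[p].get(t, 0)))
            best[i][t] = best[i - 1][prev] + safe_log(trans[prev].get(t, 0)) + e
            back[i][t] = prev
    # Trace back from the best final tag.
    tag = max(best[-1], key=best[-1].get)
    path = [tag]
    for i in range(n - 1, 0, -1):
        tag = back[i][tag]
        path.append(tag)
    return list(reversed(path))

# Hypothetical probability tables for a tiny tag set.
trans = {"<S>": {"PRON": 0.5, "TO": 0.1, "VERB": 0.4},
         "PRON": {"VERB": 0.9, "TO": 0.1},
         "VERB": {"TO": 0.5, "PRON": 0.25, "VERB": 0.25},
         "TO": {"VERB": 0.8, "PRON": 0.1, "TO": 0.1}}
emit = {"PRON": {"we": 1.0},
        "VERB": {"want": 0.5, "flower": 0.5},
        "TO": {"to": 1.0}}

print(viterbi(["we", "want", "to", "flower"], ["PRON", "VERB", "TO"], trans, emit))
# -> ['PRON', 'VERB', 'TO', 'VERB']
```

Because each word only requires comparing every tag against every possible previous tag, the running time is O(n · |tags|²), linear in sentence length, rather than the exponential cost of scoring every possible tag sequence separately.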