Techniques

Multiple Meanings

The problem of multiple meanings is complicated, and the more you think about it, the more complicated it becomes. The authors of Verbmobil examine it with regard to the word “open.”

How many different things can we mean with just this one word? Offhand, the authors of Verbmobil can find about seven meanings. Here they are: open door, open golf tournament, open question, open eyes, open job opportunity, open morning, and open football player.
Verbmobil continues:

“... note that while we say ‘his eyes are open’ when they are uncovered and ready to receive stimulus, we do not say ‘his nose is open’ when he is ready to smell, nor ‘his feet are open’ when his socks are removed. We speak of ‘empty glasses’ and ‘free-range chickens’ when ‘open’ might just as well have been used” (Kay 15).

And all of this complication results from just one word! Think about what happens when an ambiguous word such as “open” is combined with another word that carries a variety of meanings. Then think about putting that phrase in a sentence, surrounded by the context of other ambiguous phrases. Then think about what happens when things like idioms are thrown in. One infamous example of such an ambiguous sentence is “Time flies like an arrow.” Its meaning is fairly obvious to a human, but for a computer it raises quite a bit of confusion. Is “time,” in fact, a noun, or are “time flies” a special kind of insect that prefers arrows?

Many English words have multiple meanings, something that humans generally cope with via context and knowledge of the world. However, natural language processing systems must use other methods to interpret the correct sense, or meaning, of words in order to understand sentences.

Supervised Methods of Disambiguation
Whenever a source of knowledge, such as a dictionary or human intervention, is used in learning, that learning is known as supervised. A number of supervised methods for disambiguating words are available, and they tend to be more accurate than unsupervised methods; to give a taste of these supervised methods, this introduction will focus on the Naive Bayes approach and dictionary disambiguation.

The Naive Bayes approach uses surrounding words to disambiguate the particular sense in which a word is being used. Instead of focusing on the order of words, as n-gram models do, Naive Bayes treats the surrounding words as unordered and looks at which words commonly occur around the target word. If such words fall into a few distinct clusters, it is likely that each cluster corresponds to a particular sense of the word. This technique is considered supervised because it requires a labeled training corpus; the material on which it is trained must classify each occurrence of the word as corresponding to a particular sense. To understand how this method works, one can examine a word like “plane.” Assuming there are only two senses of plane, the training corpus might mark the sense of plane as a transportation device as the first sense and the sense of plane as a geometrical idea as the second sense. The model would then process the training corpus, keeping track of words that often occur near “plane.” The results might be tabulated as below:

first sense (transportation): flying, airport, flight, time, departure, arrival, security...
second sense (geometry): two-dimensional, coordinate, math, angle, algebra, graphing...

When the Naive Bayes model is tested and encounters the word “plane,” it then checks whether the words around this occurrence of “plane” more closely match the first sense or the second sense to determine which sense is more probable.
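To make this concrete, here is a minimal sketch of a Naive Bayes disambiguator for “plane,” written in Python with a tiny hand-labeled training set invented for illustration (the data, function names, and two-sense setup are assumptions, not part of any particular system). It counts which words appear near each sense during training, then picks the sense whose counts best match a new context, with add-one smoothing so unseen words do not zero out the score.

from collections import Counter, defaultdict
import math

# Hypothetical labeled examples: (context words near "plane", sense label).
training_data = [
    (["flight", "airport", "departure", "security"], "transportation"),
    (["flying", "arrival", "airport", "time"], "transportation"),
    (["coordinate", "angle", "two-dimensional", "math"], "geometry"),
    (["graphing", "algebra", "coordinate", "angle"], "geometry"),
]

# Count how often each sense occurs and which words appear near each sense.
sense_counts = Counter()
word_counts = defaultdict(Counter)
for context, sense in training_data:
    sense_counts[sense] += 1
    for word in context:
        word_counts[sense][word] += 1

def most_likely_sense(context):
    """Pick the sense whose training contexts best match the new context."""
    vocab = {w for counts in word_counts.values() for w in counts}
    best_sense, best_score = None, float("-inf")
    for sense in sense_counts:
        # Log prior: how common this sense is overall.
        score = math.log(sense_counts[sense] / sum(sense_counts.values()))
        total = sum(word_counts[sense].values())
        for word in context:
            # Add-one smoothing keeps unseen words from zeroing out the score.
            score += math.log((word_counts[sense][word] + 1) / (total + len(vocab)))
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

print(most_likely_sense(["the", "flight", "left", "the", "airport"]))
# prints: transportation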

Another type of supervised learning can be done using algorithms that employ a dictionary for help with disambiguation. These algorithms can use an unmarked training corpus, unlike the Naive Bayes approach, and they use the dictionary as the source of all the senses of a word. Like Naive Bayes, however, these approaches are concerned with what words are close to the target word. The models then check if any of the words that are nearby appear in one of the sense definitions for the target word. Based on these matches between context and sense definition, the model calculates the probability that the word is an instance of each of the senses. The sense with the highest probability becomes the model’s guess for the meaning of the word.
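As a rough sketch of this dictionary idea (close in spirit to the Lesk algorithm), the following Python fragment scores each sense of “plane” by how many context words appear in its definition. The two definitions are invented for illustration, and the probability calculation is simplified here to a raw overlap count.

# Hypothetical dictionary entries; a real system would read these from a
# machine-readable dictionary.
sense_definitions = {
    "transportation": "a powered vehicle with wings that travels by flight through the air",
    "geometry": "a flat two-dimensional surface defined by points and coordinates in math",
}

def dictionary_sense(context_words):
    """Score each sense by how many context words appear in its definition."""
    best_sense, best_overlap = None, -1
    for sense, definition in sense_definitions.items():
        definition_words = set(definition.split())
        overlap = sum(1 for w in context_words if w in definition_words)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

print(dictionary_sense(["the", "flight", "took", "off", "into", "the", "air"]))
# prints: transportation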

The similarity of this dictionary approach to the Naive Bayes model prompts the question of why one would use one method over the other. Each model has trade-offs. Naive Bayes requires a labeled training corpus, which takes a great deal of human effort to produce and is subjective, depending on how the annotator divides up the senses, while many digital versions of dictionaries exist that can easily be employed in the dictionary algorithms. However, the Naive Bayes approach can have significantly more words correlated with specific senses than the dictionary approach, increasing its accuracy. One approach to boosting the accuracy of dictionary algorithms (while still avoiding the need for a labeled training corpus) is to add a thesaurus to the tools that the algorithm can use. With this tool, the model can use words close to the target word that are synonyms of words in the target word’s definition, providing more data for disambiguation. The strengths of Naive Bayes and dictionary disambiguation can thus be combined to some extent.
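One way to picture the thesaurus step is as an expansion of the context before the dictionary lookup, as in the small sketch below. The synonym table and definitions are invented for illustration; a real system would consult a digital thesaurus and dictionary rather than these hand-written entries.

# Hypothetical definitions and thesaurus entries, invented for illustration.
sense_definitions = {
    "transportation": "a powered vehicle with wings that travels by flight through the air",
    "geometry": "a flat two-dimensional surface defined by points and coordinates in math",
}
synonyms = {
    "aircraft": ["plane", "jet", "flight"],
    "slope": ["angle", "gradient", "incline"],
}

def thesaurus_sense(context_words):
    """Expand the context with synonyms, then score each sense by the overlap
    between the expanded context and that sense's definition."""
    expanded = list(context_words)
    for word in context_words:
        expanded.extend(synonyms.get(word, []))
    best_sense, best_overlap = None, -1
    for sense, definition in sense_definitions.items():
        definition_words = set(definition.split())
        overlap = sum(1 for w in expanded if w in definition_words)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

print(thesaurus_sense(["the", "aircraft", "landed", "safely"]))
# prints: transportation, since "flight" (a synonym of "aircraft")
# appears in the transportation definition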