Techniques

Context

Verbmobil also provides us with the following two sentences:

a. Attach the amplifier to the output terminal with the red wire.
b. Attach the amplifier to the output terminal with the red dot (33).

These sentences are nearly identical; in fact, they differ by only one noun! However, to process them correctly, a computer with NLP capabilities would have to know that “with the red dot” is an adjectival phrase describing the terminal, while “with the red wire” is an adverbial phrase describing the method of attachment.
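One way to make the ambiguity concrete is to enumerate both places the trailing “with” phrase can attach. The sketch below is a minimal illustration, assuming the sentence has already been split into a verb, a direct object, a destination, and a trailing phrase; the `Reading` class and `attachments` function are hypothetical names, not part of Verbmobil or any real parser.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Reading:
    """One interpretation of 'VERB OBJ to DEST with PP'."""
    attaches_to: str   # "verb" (how to attach) or "noun" (which terminal)
    paraphrase: str

def attachments(verb: str, obj: str, dest: str, pp: str) -> list[Reading]:
    """Return both structurally possible readings of the trailing 'with' phrase."""
    return [
        # Adverbial reading: the phrase names the instrument of the action.
        Reading("verb", f"Use the {pp} to {verb} the {obj} to the {dest}."),
        # Adjectival reading: the phrase picks out which destination is meant.
        Reading("noun", f"{verb.capitalize()} the {obj} to the {dest} that has the {pp}."),
    ]

for phrase in ("red wire", "red dot"):
    for r in attachments("attach", "amplifier", "output terminal", phrase):
        print(f"[{r.attaches_to}] {r.paraphrase}")
```

Structure alone produces the same two candidates for “wire” and “dot”; nothing in this step prefers one reading over the other, which is why additional knowledge is needed.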

Let’s take both of Verbmobil’s sentences and demonstrate how interpreting them correctly requires two types of specialized knowledge.

We can arrive at the first of these by attempting to interpret sentence (b) in the same way as sentence (a). That is, we can rephrase the sentence as “Use the red dot to attach the amplifier to the output terminal.” This is clearly incorrect, and a computer could determine this from the definition of dot: [example]. A dot is not an object in three-dimensional space, let alone one long enough to reach from one part to another, so it cannot connect two objects in three-dimensional space. This is simple common sense. Thus, in order to have NLP capabilities, a computer will need common-sense knowledge. It could begin to acquire this knowledge simply by being able to access the definitions of many words. Next, it would need to know the connections between definitions: for example, that after one goes through a doorway, one is inside a room.
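The reasoning in this paragraph can be sketched as a filter that discards the instrument (adverbial) reading when the noun could not physically connect two objects. This is a toy illustration under strong assumptions: the `LEXICON` entries are hand-written stand-ins for features a system might extract from dictionary definitions, and `can_be_instrument_of_attaching` and `preferred_attachment` are hypothetical helpers.

```python
# Hypothetical features a system might extract from word definitions.
LEXICON = {
    "wire": {"physical": True, "elongated": True},
    "dot": {"physical": False, "elongated": False},  # a mark, not a 3-D object
}

def can_be_instrument_of_attaching(noun: str) -> bool:
    """A connector must be a physical object long enough to span two parts."""
    features = LEXICON.get(noun, {})
    return features.get("physical", False) and features.get("elongated", False)

def preferred_attachment(noun: str) -> str:
    """Prefer the adverbial (verb) reading only when common sense allows it."""
    return "verb" if can_be_instrument_of_attaching(noun) else "noun"

print(preferred_attachment("wire"))  # verb
print(preferred_attachment("dot"))   # noun
```

Unknown words fall back to the noun reading here; that default is an arbitrary choice for the sketch. A real system would need vastly broader coverage, plus the links between definitions that the paragraph describes.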

Now let’s try to interpret sentence (a) like sentence (b). As a refresher, sentence (a) is: “Attach the amplifier to the output terminal with the red wire.” If we read “with the red wire” as an adjectival phrase, as it is in sentence (b), the sentence will mean: “Attach the amplifier to the output terminal that has a red wire on it.” At first, this slight ambiguity seems not to make much of a difference. However, assuming that the computer possesses some capacity to process language systematically, it will notice that the sentence no longer specifies how the two should be attached: there is no adverbial phrase giving such an instruction. The computer, then, may conclude that an “amplifier” and an “output terminal with a red wire” are two things that can be attached to each other without the use of a wire, even though a wire is mentioned in the sentence! This is, at best, absurd. However, to fix it, one would have to know something about the world that is not easily categorized.
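The absurdity described here can be flagged mechanically: under the adjectival reading of sentence (a), the verb “attach” is left without an instrument even though the sentence mentions something that could serve as one. The sketch below assumes each reading has already been reduced to a dictionary of filled roles; the role names (`instrument`, `destination_modifier`), the `CONNECTORS` set, and `wastes_a_connector` are illustrative assumptions, not drawn from any particular system.

```python
# Hypothetical set of nouns known to be usable as connectors.
CONNECTORS = {"wire", "cable", "screw", "clip"}

def wastes_a_connector(reading: dict) -> bool:
    """Flag a reading where 'attach' has no instrument while a usable
    connector is spent merely describing the destination."""
    modifier = reading.get("destination_modifier")
    return reading.get("instrument") is None and modifier in CONNECTORS

adverbial = {"instrument": "wire", "destination_modifier": None}
adjectival = {"instrument": None, "destination_modifier": "wire"}

print(wastes_a_connector(adverbial))   # False
print(wastes_a_connector(adjectival))  # True
```

The same check leaves the adjectival reading of sentence (b) alone, because “dot” is not in the connector set; the check still depends on someone having encoded which things can connect other things.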

Such is the goal of the CYC knowledge base, which attempts to encode a vast amount of information in order to represent all fundamental human knowledge. CYC stores pieces of information together with assertions that connect them to one another. The knowledge base is broken down into several categories that form a pyramid of associations; these include living things, organizational plans, politics and warfare, and events and scripts. CYC stores this knowledge in CycL, a representation language based on predicate calculus.
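To show how assertions that connect pieces of information let a system derive facts it was never told directly, here is a small sketch. It borrows the names of CycL’s `isa` (instance of) and `genls` (subcollection of) predicates, but it is plain Python rather than CycL, and every fact in `ASSERTIONS` is invented for illustration, not taken from the real CYC knowledge base.

```python
from collections import defaultdict

# Hypothetical assertions in (predicate, subject, object) form; not real Cyc content.
ASSERTIONS = [
    ("genls", "Wire", "FlexibleConnector"),
    ("genls", "FlexibleConnector", "Connector"),
    ("genls", "Connector", "PhysicalObject"),
    ("genls", "Dot", "VisualMark"),
    ("isa", "RedWire1", "Wire"),
    ("isa", "RedDot1", "Dot"),
]

GENLS = defaultdict(set)  # collection -> direct supercollections
ISA = defaultdict(set)    # individual -> collections it belongs to
for pred, subj, obj in ASSERTIONS:
    (GENLS if pred == "genls" else ISA)[subj].add(obj)

def superclasses(collection: str) -> set[str]:
    """All collections reachable through genls links (transitive closure)."""
    seen, stack = set(), [collection]
    while stack:
        for parent in GENLS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def instance_of(thing: str, collection: str) -> bool:
    """True if thing isa some collection that is, or specializes, `collection`."""
    return any(c == collection or collection in superclasses(c) for c in ISA[thing])

print(instance_of("RedWire1", "Connector"))  # True
print(instance_of("RedDot1", "Connector"))   # False
```

No assertion states directly that the red wire is a connector; the answer comes from chaining more general assertions, which is the kind of linking between pieces of information that lets a knowledge base answer the common-sense questions raised above.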