TriQL -- The Trio Query Language |
This document describes TriQL, the SQL-based query language being implemented as part of the Trio project.
Trio is a new prototype database management system under development at Stanford. Trio is based on the ULDB model, an extension to the relational model that adds both uncertainty (U) and lineage (L) as first-class concepts. TriQL is designed for querying and updating ULDB databases.
ULDBs are introduced first, followed by the TriQL language. This document is not a formal specification; rather, it introduces the data model and TriQL language primarily through examples. More details on the model, and formal query semantics, can be found in overview papers linked from the Trio home page.
Saw(witness,color,car) Drives(person,color,car)
The following table has two tuples, each with two
alternative values for its uncertain attributes color and car:
Amy | (blue,Honda) || (red,Toyota) |
Betty | (green,Mazda) || (green,Toyota) |
Amy | (blue,Honda) || (red,Toyota) | ? |
Betty | (green,Mazda) || (green,Toyota) |
Amy | (blue,Honda) || (red,Toyota) | ? |
Betty | (green,Mazda) || (green,Toyota) || (green,NULL) |
Amy | (blue,Honda):0.4 || (red,Toyota):0.3 | ? |
Betty | (green,Mazda):0.2 || (green,Toyota):0.5 || (green,NULL):0.3 |
Confidences are interpreted probabilistically by default, with alternative values for the same tuple being disjoint and alternative values for different tuples being independent. For example in the data above, by default the probability of Betty seeing a green Mazda is 0.2, the probability of Amy not seeing any car ("?") is 0.3, and the probability of Amy seeing a red Toyota while Betty sees a green Toyota is 0.15. Other interpretations for confidence values can be specified by the user, as discussed in Result Confidences below.
Semantics: Formally, a ULDB represents a set of possible instances, each of which is a database of regular relations. Without confidences, each possible instance corresponds to choosing one alternative (or "?" if present) for the uncertain attributes in each tuple of each ULDB relation. When confidences are present, possible instances have associated probabilities derived from the confidences of the chosen alternatives.
Select OPTIONS select-list From table-list {Where predicate} {Group By attribute-list {Having predicate}} {Order By attribute-list}The OPTIONS are considerably expanded from those in SQL. In this specification, "|" denotes alternation (specifying that two options may not appear together). In a query, options may be listed in any order:
{Merged | Distinct} {Flatten | GroupAlts(attribute-list)} {NoConf | NoMaybe} {NoLineage}
What follows is an informal description of logical query evaluation beginning with the From, stepping through the Where and Group-By clauses, creating the result tuples, and applying the OPTIONS. This description is informal and not a complete syntactic or semantic specification. Further details (although still not a complete formal specificaiton) are supplied in the remainder of the document.
When a query executes, there are three components to its result that can, for the most part, be considered separately:
The data set used for most examples in this document expands the
one in the ULDBs section above, but excludes
confidence values for now. The tuples are numbered for convenience
when referring to them, and the red titles are for referring back
easily to the sample data -- neither of these aspects are part of the
data itself.
1. | Amy | (blue,Honda) || (red,Toyota) | ? |
2. | Betty | (green,Mazda) || (green,Toyota) || (green,NULL) | |
3. | Cathy | (red,Acura) | ? |
4. | Diane | (red,Toyota) || (blue,Toyota) |
1. | (Frank,red,Toyota) || (Frank,blue,Toyota) | |
2. | (Billy,blue,Honda) | ? |
3. | (Jimmy,green,Mazda) || (Johnny,green,Mazda) |
Semantics: Recall that a ULDB represents a set of possible instances, which are regular databases. The formal semantics of TriQL states that the result of a SQL query Q on a ULDB database D must represent exactly those possible instances generated by applying Q to each possible instance of D. An operational semantics that respects this formal semantics has also been defined; see An Introduction to ULDBs and the Trio System.
The following query finds (possible) sightings that involve Toyotas:
Select * From Saw Where car = 'Toyota'The result of this query on the sample data is:
Amy | (red,Toyota) | ? |
Betty | (green,Toyota) | ? |
Diane | (red,Toyota) || (blue,Toyota) |
The following query finds the colors of (possibly) sighted Mazdas and Toyotas:
Select color From Saw Where car = 'Mazda' or car = 'Toyota'The result of this query on the sample data is:
red | ? |
green || green | ? |
red || blue |
Select Merged color From Saw Where car = 'Mazda' or car = 'Toyota'The result of the modified query on the sample data is:
red | ? |
green | ? |
red || blue |
Select S.witness, D.person as suspect, D.color, D.car From Drives D, Saw S Where D.color = S.color and D.car = S.carThe result of this query on the sample data follows. Note that witness remains a certain attribute in the query result.
Amy | (Frank,red,Toyota) | ? |
Diane | (Frank,red,Toyota) || (Frank,blue,Toyota) | ? |
Amy | (Billy,blue,Honda) | ? |
Betty | (Jimmy,green,Mazda) || (Johnny,green,Mazda) | ? |
Running the same query but omitting the witness returns:
(Frank,red,Toyota) | ? |
(Frank,red,Toyota) || (Frank,blue,Toyota) | ? |
(Billy,blue,Honda) | ? |
(Jimmy,green,Mazda) || (Johnny,green,Mazda) | ? |
Finally consider a query that generates suspects based on car color only:
Select Merged D.person as suspect, D.color From Drives D, Saw S Where D.color = S.colorThe result of this query on the sample data is:
(Frank,red) || (Frank,blue) | ? |
(Frank,red) | ? |
(Frank,red) || (Frank,blue) | ? |
(Billy,blue) | ? |
(Billy,blue) | ? |
(Jimmy,green) || (Johnny,green) |
Select color, car From Drives Where (color,car) Not In (Select color, car From Saw)(In the absence of multi-attribute In, the same query can be written using Not-Exists.) On first glance one might think the result on the sample data is empty, since every car in Drives also appears in some alternative of Saw. However quite the opposite is true: the result contains all of the cars in Drives, since for every car there is some possibility that the car does not appear in Saw:
(red,Toyota) || (blue,Toyota) | ? |
(blue,Honda) | ? |
(green,Mazda) || (green,Mazda) | ? |
Examples with subqueries in the From clause appear in Grouping and Aggregation and in Result Confidences, below.
Select Distinct color, car from SawThe result of this query on the sample data is:
(blue,Honda) | ? |
(red,Toyota) | ? |
(green,Mazda) || (green,Toyota) || (green,NULL) | |
(red,Acura) | ? |
(blue,Toyota) | ? |
Note that by definition the Distinct option eliminates horizontal as well as vertical duplicates.
Select Merged color, Count(*) as #sightings From Saw Group By colorThe result of this query on the sample data is:
blue | 1 || 2 | ? |
red | 1 || 2 || 3 | ? |
green | 1 |
The following query is rather contrived, but it further illustrates aggregation while also demonstrating a subquery in the From clause, and again showing the importance of lineage for certain types of query results. This query finds, for all sighting counts, how many different colors have that count. (For example, the result might tell us that the number of colors with 2 sightings is 3.)
Select Merged CC.#sightings, Count(*) as #colors From (Select Merged color, Count(*) as #sightings From Saw Group By color) as CC Group By CC.#sightingsThe subquery named CC is the same query as in the previous example. The result of this query on the sample data is:
1 | 1 || 2 || 3 | |
2 | 1 || 2 | ? |
3 | 1 | ? |
In general, grouping and aggregation can be very expensive -- just the result data (never mind its computation or lineage) can be exponential in the size of the input data. Thus, TriQL also includes more efficient variants for the aggregate functions. Specifically, for each of the five standard SQL aggregation functions -- Count, Min, Max, Aum, and Avg -- TriQL includes three variants: low, high, and expected.
If the full result of an aggregation in a TriQL query (without horizontal merging) is the set of possible values {A1, A2, ..., An}, then the corresponding low aggregate is the least value in the set, the corresponding high aggregate is the greatest value in the set, and the corresponding expected aggregate is the "expected value" of the set in a probabilistic sense: the average of the values, weighted by confidences if present.
For example, suppose we modify our "sightings by car color" query to:
Select color, Lcount(*) as low, Hcount(*) as high, Ecount(*) as exp From Saw Group By colorThe result of this query on the sample data is:
blue | 0 | 2 | 1.0 |
red | 0 | 3 | 1.5 |
green | 1 | 1 | 1.0 |
With five standard aggregate functions and four possibilities for each, TriQL offers 20 built-in aggregate functions as shown in the following table. (Currently, TriQL does not include the Distinct versions of the aggregate functions, e.g., "Select Count(Distinct witness)..." is not included. Equivalent queries can be written -- albeit less conveniently -- using extra subqueries and standard aggregation.)
Full | Low | High | Expected | |
Count | Count | Lcount | Hcount | Ecount |
Min | Min | Lmin | Hmin | Emin |
Max | Max | Lmax | Hmax | Emax |
Sum | Sum | Lsum | Hsum | Esum |
Average | Avg | Lavg | Havg | Eavg |
As a simple example, the following query finds cars (without colors) that appear in both Saw and Drives.
(Select car from Saw) Intersect (Select car from Drives)The result of this query on the sample data is:
Toyota | |
Honda | ? |
Mazda | ? |
Select S.witness, D.person as suspect, D.color, D.car From Drives D, Saw S Where D.color = S.color and D.car = S.car Order By S.witnessThe result of this query on the sample data is:
Amy | (Frank,red,Toyota) | ? |
Amy | (Billy,blue,Honda) | ? |
Betty | (Jimmy,green,Mazda) || (Johnny,green,Mazda) | ? |
Diane | (Frank,red,Toyota) || (Frank,blue,Toyota) | ? |
Ordering by confidence values is discussed in Ordering By Result Confidences, below
Select Flatten * From SawThe result of this query on the sample data is:
Amy | blue | Honda | ? |
Amy | red | Toyota | ? |
Betty | green | Mazda | ? |
Betty | green | Toyota | ? |
Betty | green | NULL | ? |
Cathy | red | Acura | ? |
Diane | red | Toyota | ? |
Diane | blue | Toyota | ? |
The lineage of a flattened result tuple identifies the tuple alternative from which it was derived. Thus, considering lineage, flatten has little more than a cosmetic effect: lineage doesn't permit separate result tuples derived from alternatives of the same input tuple to coexist. Hence, queries with Flatten commonly include the NoLineage option (below).
GroupAlts takes a list of attributes. It "reorganizes" the result by creating tuples with common values for the specified attributes and alternative values for the others. Implicitly, the data is flattened and then new tuples with alternative values are created. As an example, the following query reorganizes the Saw table so that the data is "keyed" on color and uncertain about car and witness, rather than "keyed" on witness and uncertain about car and color.
Select GroupAlts(color) * From SawThe result of this query on the sample data is:
blue | (Amy,Honda) || (Diane,Toyota) | ? |
red | (Amy,Toyota) || (Cathy,Acura) || (Diane,Toyota) | ? |
green | (Betty,Mazda) || (Betty,Toyota) || (Betty,NULL) |
Subqueries enclosed in [] instead of the usual () are "horizontal": they refer to the set of alternatives in each tuple being evaluated. Horizontal subqueries can be used in the Where and Select clauses. Enclosing a subquery in "[]" signals that instead of evaluating the subquery over the entire database, it should instead be evaluated over the set of alternatives comprising the "current tuple" being processed by the Where or Select clause. Thus, the From list in a horizontal subquery must only include tables in the From list of the outer query.
Since horizontal subqueries treat the alternatives in the "current tuple" as a regular relation and not as a ULDB relation, horizontal subqueries may use SQL constructs but not TriQL constructs. The only exception is that TriQL built-in functions and predicates Conf(), Maybe(), and Lineage() are permitted in horizontal subqueries.
A series of examples motivates horizontal subqueries and illustrates their various uses.
The following query finds all sighting tuples with at least two alternatives involving Toyotas:
Select * From Saw Where 2 <= [Select Count(*) From Saw Where car = 'Toyota']In the "[]" subquery, Saw refers to the alternatives of the "current tuple" being considered, treating the set of alternatives as a relation. The result of this query on the sample data is:
Diane | (red,Toyota) || (blue,Toyota) |
The next example finds all Saw tuples for which no car appears in more than one alternative, returning witnesses and cars.
Select witness, car From Saw Where 1 =All [Select Count(*) From Saw Group By car]All the tuples in Saw satisfy the predicate except for tuple 4. The result of this query on the sample data is thus:
Amy | Honda || Toyota | ? |
Betty | Mazda || Toyota || NULL | |
Cathy | Acura | ? |
Although horizontal subqueries refer to all the alternatives of the "current tuple", the Where predicate is still evaluated on one alternative at a time, and each alternative may contribute to the result. For example, suppose we modify our first query to select the car attribute only:
Select car From Saw Where 2 <= [Select Count(*) From Saw Where car = 'Toyota']The result of this query on the sample data is:
Toyota || Toyota |
In the examples so far, predicates on horizontal subqueries have been true for all alternatives of a tuple or for none of them. However, once horizontal subquery results are compared to attributes, or horizontal subqueries have correlated references, this property no longer holds. For example, the following query returns only those alternatives whose color is lexicographically <= to all other colors in its tuple:
Select * From Saw Where color <=All [Select color From Saw]The result of this query on the sample data is:
Amy | (blue,Honda) | ? |
Betty | (green,Mazda) || (green,Toyota) || (green,NULL) | |
Cathy | (red,Acura) | ? |
Diane | (blue,Toyota) |
Select * From Saw S1 Where Exists [Select * From Saw S2 Where S1.car = S2.car and S1.color <> S2.color]The result of this query on the sample data is:
Diane | (red,Toyota) || (blue,Toyota) |
Next consider horizontal subqueries when the outer query involves more than one table. The following query joins Drives and Saw to generate a set of suspects, as in the original join example, but now it keeps only those result tuples where at least one alternative mentions Johnny or Diane.
Select S.witness, D.person as suspect, D.color, D.car From Drives D, Saw S Where D.color = S.color and D.car = S.car And Exists [Select * From Drives, Saw Where suspect = 'Johnny' Or witness = 'Diane']The result of this query on the sample data is:
Diane | (Frank,red,Toyota) || (Frank,blue,Toyota) | ? |
Betty | (Jimmy,green,Mazda) || (Johnny,green,Mazda) | ? |
(Exists [Select * From Drives Where suspect = 'Johnny'] Or Exists [Select * From Saw Where witness = 'Diane'])In a TriQL query, the "current tuple" being evaluated combines one tuple from each of the tables in the From clause, just as in SQL. However, in TriQL there is an additional "nested cross-product" in which all combinations of alternatives in the combined tuples are considered. (An operational semantics detailing this behavior can be found in An Introduction to ULDBs and the Trio System.) Horizontal subqueries effectively perform the same nested cross-product, but only using the tuples from the tables listed in the subquery's From clause.
For example, in the first expression of the join query above, the horizontal subquery logically performs a nested cross-product of Drives and Saw alternatives in the "current tuple", then checks if any of the cross-product alternatives mention suspect Johnny or witness Diane. Since suspect is from Drives and witness is from Saw, and the query is asking for the "or" and not the "and" of the two conditions, using two one-table horizontal subqueries in the second formulation of the query is equivalent. If the query instead wanted to keep only those result tuples where at least one alternative mentions both Johnny and Diane together, then the horizontal subquery could not be split up, i.e., only the first formulation with Or replaced by And would be correct (and the result on the sample data would be empty).
where the table-list in the expanded version implicitly lists all tables in the outer query. ("{}" encloses optional clauses, and note that the "Select" keyword is also dropped in this shortcut.)
For example, the first query under [] in the Where Clause can be rewritten using this shortcut as:
Select * From Saw Where 2 <= [Count(*) Where car = 'Toyota']Even more dramatic, the query performing lexicographic color comparisons can be rewritten as:
Select * From Saw Where color <=All [color]When this syntactic shortcut is used, attribute references within "[]" cannot be qualified with a table name or variable -- they are bound implicitly to the tables in the implicit table-list. If attribute names need to be disambiguated then the shortcut cannot be used.
[* Where Predicate]
is shorthand for
[Predicate]
For example, in the query under [] with Joins, the first shortcut can be used to reduce the final conjunct in the Where clause to:
Exists [* Where suspect = 'Johnny' Or witness = 'Diane']and the second shortcut can be used to further reduce it to:
Exists [suspect = 'Johnny' Or witness = 'Diane']The final example for this section is another demonstration of the second shortcut, also highlighting succinctly the difference between putting a condition inside "[]" versus outside. The query:
Select * From Saw Where Exists [car = 'Toyota']returns all tuples where at least one alternative contains Toyota:
Amy | (blue,Honda) || (red,Toyota) | ? |
Betty | (green,Mazda) || (green,Toyota) || (green,NULL) | |
Diane | (red,Toyota) || (red,Toyota) |
Amy | (red,Toyota) | ? |
Betty | (green,Toyota) | ? |
Diane | (red,Toyota) || (blue,Toyota) |
Select Merged witness, [Count(Distinct color)] as #colors From SawThe result of this query on the sample data follows.
Amy | 2 | ? |
Betty | 1 | |
Cathy | 1 | ? |
Diane | 2 |
Similarly, the following query associates with every Toyota alternative the number of non-Toyota alternatives in its tuple:
Select witness, color, car, [Count(*) as #nonToyota Where car <> 'Toyota'] From Saw Where car = 'Toyota'The result of this query on the sample data is:
Amy | (red,Toyota,1) | ? |
Betty | (green,Toyota,1) | |
Diane | (red,Toyota,0) || (blue,Toyota,0) | ? |
In the two previous examples the horizontal subquery always produces a single value, so the usual SQL (TriQL) semantics for the Select clause can be followed. If the horizontal subquery in a Select clause may produce a tuple or set of tuples, then it must be the only component of the Select clause, and the tuples produced by the horizontal subquery become the alternatives of the result tuple.
As an example, the following query returns the number of alternatives for each car-color combination in the tuples resulting from joining Drives and Saw.
Select [Select D.color, D.car, Count(*) as #alts From Drives D, Saw S Where D.color = S.color and D.car = S.car Group By D.color, D.car] From Drives, SawReferring back to Join for the join result, the result of this query on the sample data is:
(red,Toyota,1) | ? |
(red,Toyota,1) || (blue,Toyota,1) | ? |
(blue,Honda,1) | ? |
(green,Mazda,2) | ? |
A syntactic subtlety occurs with horizontal subqueries when the outer query includes two or more occurrences of a single table (i.e., self-joins). Tables in the From clause of a horizontal subquery correspond to tables in the outer query by table name. However, if a horizontal subquery includes table T, and table T occurs more than once in the From clause of the outer query, then the inner reference is ambiguous. This ambiguity is resolved by defining that the nth reference to a table T in the From clause of a horizontal subquery corresponds to the nth reference to T in the outer query's From clause.
As an example, suppose we perform a self-join on two instances of Saw to find pairs of witnesses who saw the same type of car:
Select S1.witness as witness1, S2.witness as witness2, S1.car, S1.color as color1, S2.color as color2 From Saw S1, Saw S2 Where S1.car = S2.car And S1.witness < S2.witnessThe result of this query on the sample data follows. Note that witness1 and witness2 are listed first for convenience in table layout (at the expense of understandability!), since they are certain attributes in the result.
Amy | Betty | (Toyota,red,green) | ? |
Amy | Diane | (Toyota,red,red) || (Toyota,red,blue) | ? |
Betty | Diane | (Toyota,green,red) || (Toyota,green,blue) | ? |
Now suppose we only want to keep those tuples where in at least one of the alternatives, the colors of the two sighted cars are the same. We write this query as:
Select S1.car, S1.witness as witness1, S1.color as color1, S2.witness as witness2, S2.color as color2 From Saw S1, Saw S2 Where S1.car = S2.car And S1.witness < S2.witness And Exists [Select * from Saw S3, Saw S4 Where S3.color = S4.color]This query returns only the second tuple of the result above. The first (S3) and second (S4) references to Saw in the horizontal subquery by definition correspond to the first (S1) and second (S2) references in the outer query (although due to its symmetric nature, in this particular query the correspondance is not exploited). This query also illustrates the use of table variables in horizontal subqueries, which are permitted just as in regular subqueries, with or without self-joins.
Neither of the syntactic shortcuts described above can be used in horizontal subqueries when the outer query has self-joins. For both shortcuts, ambiguity would be guaranteed in any query that used them.
1. | Amy | (blue,Honda):0.4 || (red,Toyota):0.3 | ? |
2. | Betty | (green,Mazda):0.5 || (green,Toyota):0.2 || (green,NULL):0.3 | |
3. | Cathy | (red,Acura):0.6 | ? |
4. | Diane | (red,Toyota):0.2 || (blue,Toyota):0.8 |
1. | (Frank,red,Toyota):0.7 || (Frank,blue,Toyota):0.3 | |
2. | (Billy,blue,Honda):0.9 | ? |
3. | (Jimmy,green,Mazda):0.4 || (Johnny,green,Mazda):0.6 |
Select S.witness, D.person as suspect, D.color, D.car From Drives D, Saw S Where D.color = S.color and D.car = S.car And Conf(D) > 0.3 And Conf(S) > 0.3The result of this query on the sample data (with default probabilistic interpretation of confidences) is:
Amy | (Billy,blue,Honda):0.36 | ? |
Betty | (Jimmy,green,Mazda):0.2 || (Johnny,green,Mazda):0.3 | ? |
Queries may also filter tuples based on whether they include ?'s. With confidences, a query could determine if a tuple in a table T has a "?" by checking "[Select Sum(Conf(T)) From T] < 1". However, without confidences -- or simply for convenience -- a special built-in function Maybe() can be used. Maybe() takes a table name or variable from the From clause as a parameter, and it returns true if and only if the "current alternative" being evaluated from that table comes from a tuple with a "?". For example, the following query generates suspects based only on data from non-maybe tuples:
Select S.witness, D.person as suspect, D.color, D.car From Drives D, Saw S Where D.color = S.color and D.car = S.car And (Not Maybe(D)) And (Not Maybe(S))The result of this query on the sample data is:
Diane | (Frank,red,Toyota) || (Frank,blue,Toyota) | ? |
Betty | (Jimmy,green,Mazda) || (Johnny,green,Mazda) | ? |
An interesting special case occurs when Conf() lists all of the tables in the query's From clause. In this case, if the query's Select clause is a simple attribute list and does not override the default confidence values, and if no dupilicate-merging occurs (horizontal or vertical), then following TriQL semantics the Conf() function returns the same confidence value that will be assigned to the corresponding result alternative. (See Result Confidences below for more details.) Thus, Conf() can be used to filter results based on their confidence values, in addition to filtering input confidence values as in the earlier example. A syntactic shortcut for this case is Conf(*).
As an example that mixes single-table and multi-table Conf(), the following query once again computes the join of tables Drives and Saw to generate suspects, but it only uses sightings with confidence > 0.3 and only retains result alternatives with confidence > 0.2.
Select S.witness, D.person as suspect, D.color, D.car From Drives D, Saw S Where D.color = S.color and D.car = S.car And Conf(S) > 0.3 And Conf(*) > 0.2(Note Conf(*) could be replaced with Conf(D,S).) The result of this query on the sample data is:
Amy | (Billy,blue,Honda):0.36 | ? |
Betty | (Johnny,green,Mazda):0.3 | ? |
Diane | (Frank,blue,Toyota):0.24 | ? |
A query can override the default result confidence values by assigning values in its Select clause to the reserved attribute name conf. Suppose in the join example with confidences above, instead of multiplying to get result confidences, the lesser of the two confidences on the contributing alternatives should propagate to the result. The query is:
Select S.witness, D.person as suspect, D.color, D.car, smaller(Conf(D),Conf(S)) as conf From Drives D, Saw S Where D.color = S.color and D.car = S.car And Conf(D) > 0.3 And Conf(S) > 0.3The result of this query on the sample data is:
Amy | (Billy,blue,Honda):0.4 | ? |
Betty | (Jimmy,green,Mazda):0.4 || (Johnny,green,Mazda):0.5 | ? |
blue | 4 |
red | 1 |
green | 5 |
Select witness, color, car, rel/[sum(rel)] as conf From (Select witness, color, car, (Select reliability From ColorRel Where color = Saw.color) as rel From Saw)
The subquery in the From clause adds a reliability
measure to each alternative in Saw based on its color. In
the Select clause, rel refers to the value
in the "current alternative", while [sum(rel}] sums all
rel values in the current tuple. The result of the
From subquery on the sample data is:
Amy | (blue,Honda,4) || (red,Toyota,1) | ? |
Betty | (green,Mazda,5) || (green,Toyota,5) || (green,NULL,5) | |
Cathy | (red,Acura,1) | ? |
Diane | (red,Toyota,1) || (blue,Toyota,4) |
Amy | (blue,Honda):0.8 || (red,Toyota):0.2 |
Betty | (green,Mazda):0.333 || (green,Toyota):0.333 || (green,NULL):0.333 |
Cathy | (red,Acura):1.0 |
Diane | (red,Toyota):0.2 || (blue,Toyota):0.8 |
Confidence values may be computed via a user-defined plug-in function. Suppose for the sake of illusration that a user-defined function ToyotaConf takes as parameters an input confidence and a color, and adjusts the confidence in some fashion based on the color. Consider the query:
Select witness, color, car, ToyotaConf(Conf(Saw),color) as conf From Saw Where car = 'Toyota'Suppose the function doubles confidences for green Toyotas, halves them for red Toyotas, and leaves them unchanged for blue Toyotas. Then the result on the original sample data with confidences is:
Amy | (red,Toyota):0.15 | ? |
Betty | (green,Toyota):0.4 | ? |
Diane | (red,Toyota):0.1 || (blue,Toyota):0.8 | ? |
Select Merged car, 0.1 as conf From DrivesThe result over the original Drives table with confidences is:
Toyota:0.2 | ? |
Honda:0.1 | ? |
Mazda:0.2 | ? |
The special keyword uniform used with "as conf" assigns confidence values to a tuple as follows:
As an example of uniform, consider the following simple query:
Select *, uniform as conf From SawThe result of this query over the original Saw table, with or without confidences, is:
Amy | (blue,Honda):0.3333 || (red,Toyota):0.3333 | ? |
Betty | (green,Mazda):0.3333 || (green,Toyota):0.3333 || (green,NULL):0.3333 | |
Cathy | (red,Acura):0.5 | ? |
Diane | (red,Toyota):0.5 || (blue,Toyota):0.5 |
The keyword uniform can also precede a table name in the From clause of a query. The following two are equivalent:
From ..., uniform T, ... From ..., (Select *, uniform as conf From T) as T, ...Thus, preceding a table name by uniform in the From clause logically assigns uniform confidence values to the table for the duration of the query.
As an example of scaled, consider the following simple query:
Select *, scaled as conf From SawThe result of this query on the over the original Saw table with confidences is:
Amy | (blue,Honda):0.5714 || (red,Toyota):0.4286 |
Betty | (green,Mazda):0.5 || (green,Toyota):0.2 || (green,NULL):0.3 |
Cathy | (red,Acura):1.0 |
Diane | (red,Toyota):0.5 || (blue,Toyota):0.5 |
scaled as conf Conf(*)/[Sum(Conf(*))] as conf
It is also possible to update confidence values in a table, including the uniform and scaled options; see Update Statement below.
Select * From Saw Order By Confidences Desc, witnessThe result of this query on the sample data is:
Betty | (green,Mazda):0.5 || (green,Toyota):0.2 || (green,NULL):0.3 | |
Diane | (red,Toyota):0.2 || (blue,Toyota):0.8 | |
Amy | (blue,Honda):0.4 || (red,Toyota):0.3 | ? |
Cathy | (red,Acura):0.6 | ? |
Adding "compute confidences" to the end of a query causes result confidences to be computed as part of query execution, instead of on-demand. For example:
Select S.witness, D.person as suspect, D.color, D.car From Drives D, Saw S Where D.color = S.color and D.car = S.car Compute Confidencesover the sample data with confidences automatically computes the confidence values for the result. Without the last line, confidences would be computed only when requested.
Create Table Suspects as Select S.witness, D.person as suspect, S.color, S.car Into Suspects From Drives D, Saw S Where D.color = S.color and D.car = S.carThe Suspects table contains:
1. | Amy | (Frank,red,Toyota) | ? |
2. | Diane | (Frank,red,Toyota) || (Frank,blue,Toyota) | ? |
3. | Amy | (Billy,blue,Honda) | ? |
4. | Betty | (Jimmy,green,Mazda) || (Johnny,green,Mazda) | ? |
Alternative | Lineage |
Suspects[1,1] | Drives[1,1], Saw[1,2] |
Suspects[2,1] | Drives[1,1], Saw[4,1] |
Suspects[2,2] | Drives[1,2], Saw[4,2] |
Suspects[3,1] | Drives[2,1], Saw[1,1] |
Suspects[4,1] | Drives[3,1], Saw[2,1] |
Suspects[4,2] | Drives[3,2], Saw[2,1] |
Create Table NMSuspects as Select Suspects.* Into NMSuspects From Suspects, Drives, Saw Where Lineage(Suspects,Drives) And Lineage(Suspects,Saw) And (Not Maybe(Drives)) And (Not Maybe(Saw))Lineage(R,S) is true if and only if the R alternative being considered has the S alternative being considered in its lineage. Thus, without the last line, this query would produce a result that corresponds to the table of lineage relationships above. The last line of the query eliminates any join results involving Drives tuple 2 or Saw tuples 1 or 3, since they have ?'s. The final query result is:
Diane | (Frank,red,Toyota) || (Frank,blue,Toyota) | ? |
Betty | (Jimmy,green,Mazda) || (Johnny,green,Mazda) | ? |
Select * From Suspects Where Not Exists (Select * from Drives Where Lineage(Suspects,Drives) And Maybe(Drives)) And Not Exists (Select * from Saw Where Lineage(Suspects,Saw) And Maybe(Saw))When TriQL queries are issued over stored results of previous queries, indirect lineage results. In the general case, Lineage(R,S) is true if and only if the R alternative being considered has the S alternative being considered in the transitive closure of its lineage---i.e., the S alternative is reachable from the R alternative via lineage relationships. As an example of indirect lineage, the following query finds those alternatives in NMSuspects for which the original Drives and Saw tuples from which they were (indirectly) derived have no more than two alternatives.
Select * From NMSuspects Where 2 <=All (Select [Count(*)] from Drives Where Lineage(NMSuspects,Drives)) And 2 <=All (Select [Count(*)] from Saw Where Lineage(NMSuspects,Saw))The result of this query on the sample data is:
Diane | (Frank,red,Toyota) || (Frank,blue,Toyota) | ? |
The NoLineage option says that lineage should not be
retained for any data in the query result. In general omitting lineage
improves efficiency (both time and space) at the expense of
eliminating the possibility of querying or examining the derivation of
result data. However, omitting lineage can have semantic ramifications
in some cases. Recall the join query producing suspects:
Amy | (Frank,red,Toyota) | ? |
Diane | (Frank,red,Toyota) || (Frank,blue,Toyota) | ? |
Amy | (Billy,blue,Honda) | ? |
Betty | (Jimmy,green,Mazda) || (Johnny,green,Mazda) | ? |
The NoConf option is very simple: it omits confidence values from the query result. The NoMaybe option similarly omits ?'s from the query result. NoMaybe automatically omits confidence values as well, since confidence values can implicitly generate ?'s. As an example, the earlier query generating suspects based on high-confidence data only can be rewritten to drop the ?'s, and therefore also drop the confidence values:
Select NoMaybe S.witness, D.person as suspect, D.color, D.car From Drives D, Saw S Where D.color = S.color and D.car = S.car And Conf(D) > 0.3 And Conf(S) > 0.3returns on the sample data:
Amy | (Billy,blue,Honda) |
Betty | (Jimmy,green,Mazda) || (Johnny,green,Mazda) |
Insert Into table-name Values (tuple-spec) Insert Into table-name subqueryThe tuple-spec has a special syntax enabling the specification of alternative values for uncertain attributes, confidence values, and "?". Examples are:
Insert Into Saw Values ('Mary', [('black','Acura'):0.7 | ('purple','Mazda'):0.2]) Insert Into Drives Values ([('Hank','black','Acura') | ('Hank','purple','Mazda')])?A different kind of insertion -- adding more alternative values to existing tuples -- is part of the Update statement.
The syntax for Delete is the same as SQL, except for an added option "AdjConf":
Delete {AdjConf} From table-name {Where predicate}This statement deletes from the table every tuple alternative satisfying the predicate (or all tuples, in the absence of a Where clause). The predicate can be any SQL predicate, possibly including regular or horizontal subqueries, and TriQL built-in functions and predicates Conf(), Maybe(), and Lineage(), just as in the Where clause of Select statements.
If all alternative values for a given tuple are deleted, then the tuple is removed entirely from the table. Consider a tuple where some but not all alternative values are deleted. Then:
Amy | (blue,Honda):0.4 || (red,Toyota):0.3 | ? |
Betty | (green,Mazda):0.5 || (green,Toyota):0.2 || (green,NULL):0.3 | |
Cathy | (red,Acura):0.6 | ? |
Diane | (red,Toyota):0.2 || (blue,Toyota):0.8 |
Delete From Saw Where car = 'Toyota'Table Saw after this statement is:
Amy | (blue,Honda):0.4 | ? |
Betty | (green,Mazda):0.5 || (green,NULL):0.3 | ? |
Cathy | (red,Acura):0.6 | ? |
Amy | (blue,Honda):0.7 | ? |
Betty | (green,Mazda):0.625 || (green,NULL):0.375 | ? |
Cathy | (red,Acura):0.6 | ? |
Now consider the original Drives table without confidences:
(Frank,red,Toyota) || (Frank,blue,Toyota) | |
(Billy,blue,Honda) | ? |
(Jimmy,green,Mazda) || (Johnny,green,Mazda) |
Delete From Drives Where color = 'blue'Table Drives after this statement is:
(Frank,red,Toyota) | ? |
(Jimmy,green,Mazda) || (Johnny,green,Mazda) |
The basic Update statement is the same as in SQL:
Update table-name Set attr-list = expression-list {Where predicate}This statement updates every alternative satisfying the predicate (or all alternatives, in the absence of a Where clause). Recall that "alternative" denotes a complete tuple comprised of the certain attributes together with one choice of alternative values for the uncertain attributes.
As in Select and Delete statements, the predicate can be any SQL predicate, possibly including regular or horizontal subqueries, and TriQL built-in functions and predicates Conf(), Maybe(), and Lineage(). Each attribute in the attr-list is updated with the result of the corresponding expression on the right-hand side of the =, just as in SQL. The expression can be composed from attributes, constants, and regular or horizontal subqueries that always produce single-value results. In the special case where the updated attribute is "conf", the alternatives' confidence values are updated.
In general both certain and uncertain attributes may be updated, however updating certain attributes incurs some restrictions on the Update statement. Specifically, the corresponding expression on the right-hand-side of the =, and the query's Where predicate, must produce the same value across all alternatives in each tuple. (Otherwise whether or not to update the certain attribute, and what its new value is, can be ambiguous.) This requirement can be enforced syntactically by ensuring that every reference to an uncertain attribute, Conf(), or Lineage() occurs within a horizontal subquery.
As an example, consider a variation on the example query in Result Confidences. The following Update statement reduces confidences in table Saw using a scaling factor based on color reliability values in the range 1-5 drawn from table ColorRel.
Update Saw Set conf = conf * (Select reliability From ColorRel C Where C.color = Saw.color) / 5On the original Saw table with confidences, and using the ColorRel data shown earlier ([blue,4], [red,1], [green,5]), table Saw after this statement is:
Amy | (blue,Honda):0.32 || (red,Toyota):0.06 | ? |
Betty | (green,Mazda):0.5 || (green,Toyota):0.2 || (green,NULL):0.3 | |
Cathy | (red,Acura):0.12 | ? |
Diane | (red,Toyota):0.04 || (blue,Toyota):0.64 |
When confidence values are updated, there is no analogy to the AdjConf option in the Delete statement, and there is no guarantee that confidences in a tuple sum to <= 1. However, both the uniform and scaled options for "as conf" in queries (see Uniform and Scaled Result Confidences above) can also be used for updating confidences:
Update T Set conf = uniform Update T Set conf = scaledAnalogous to "scaled as conf", when table T has existing confidences the second statement is equivalent to:
Update T Set conf = Conf(*)/[Sum(Conf(*))]
Updating a regular attribute (as opposed to special attribute conf) introduces no surpises provided the restriction above on certain attributes is met, so no examples are provided.
A new type of update required in TriQL is one that adds one or more additional alternative values to one or more existing tuples. The following syntax is used:
Update {Merged} table-name AltInsert expression {Where predicate}The predicate is the same as in the regular Update statement, and it incurs the same restriction imposed for updating certain attributes: every reference to an uncertain attribute, Conf(), or Lineage() must occur within a horizontal subquery, to ensure that the predicate is either true or false across all alternatives in each tuple. (The expression does not incur this restriction.)
For each tuple whose alternatives satisfy the predicate (or all tuples, in the absence of a Where clause), the expression is evaluated to produce a set of additional alternative values. The expression can be "Values tuple-spec" as in an Insert statement to produce a single new alternative, or it can be a regular or horizontal subquery that produces zero or more results. In both cases, the schema of the new alternative values to be inserted must match the schema of the table being updated, restricted to its uncertain attributes. If new alternative values to be inserted are duplicates of existing ones (or of other new ones), then if option Merged is specified, horizontal duplicates are merged.
The following update adds to each tuple in the original without-confidences table Saw an additional alternative (color,NULL) for every existing alternative (color,car). (This update might be made if, for example, it's determined that witnesses are accurate with colors but often misidentify cars.)
Update Merged Saw AltInsert [color,NULL]Recall that the non-shortcut version of the horizontal subquery is "[Select color,NULL From Saw]", with Saw restricted to the alternatives of the "current tuple". Table Saw after this statement is:
Amy | (blue,Honda) || (red,Toyota) || (blue,NULL) || (red,NULL) | ? |
Betty | (green,Mazda) || (green,Toyota) || (green,NULL) | |
Cathy | (red,Acura) || (red,NULL) | ? |
Diane | (red,Toyota) || (blue,Toyota) || (red,NULL) || (blue,NULL) |
When new alternatives with confidence values are inserted, by default all confidence values in the expanded tuple are scaled so they sum to the same value summed to by the original confidences. (When horizontal duplicates are merged, their confidences are added before scaling, however one unresolved corner case occurs when duplicates already exist but new ones are eliminated.) As always, confidence values can subsequently be modified to override the default. Suppose the following update is applied to the original Saw table with confidences:
Update Merged Saw AltInsert [color, NULL, 0.1 as conf]Table Saw after this statement is (to 4 decimal places):
Amy | (blue,Honda):0.3111 || (red,Toyota):0.2333 || (blue,NULL):0.0777 || (red,NULL):0.0777 | ? |
Betty | (green,Mazda):0.3846 || (green,Toyota):0.1538 || (green,NULL):0.4615 | |
Cathy | (red,Acura):0.5143 || (red,NULL):0.0857 | ? |
Diane | (red,Toyota):0.1666 || (blue,Toyota):0.6666 || (red,NULL):0.0833 || (blue,NULL):0.0833 |
A second new type of Update is one that adds or removes ?'s on tuples. The syntax is very simple:
Update table-name {AddMaybe | DelMaybe} {Where predicate}As in the Update statement for adding alternatives, every reference to an uncertain attribute, Conf(), or Lineage() in the Where clause must occur within a horizontal subquery, to ensure that the predicate is either true or false across all alternatives in each tuple. This statement updates every tuple whose alternatives satisfy the predicate (or all tuples, in the absence of a Where clause). If AddMaybe is specified, a "?" is added to each updated tuple that does not already have a "?". If DelMaybe is specified, "?" is removed from each updated tuple that has a "?". As an example, the following query removes ?'s from any tuples in Saw having only one alternative:
Update Saw DelMaybe where [Count(*)] = 1On the original Saw table, this statement removes the "?" on tuple 3. As a second example, the following update adds ?'s to tuples 2 and 4 by updating any tuple with at least one Toyota alternative:
Update Saw AddMaybe where exists [car = 'Toyota']