HPL SPARQL.pdf
该文档为英文文档,详细介绍了SPARQL的关系代数,有具体的实例供参考和验证。SElecT ?name ?emailWhere C?person raf: type foaf: Person?person foaf: name ? nameOPTIONAL t ?person foaf: mbox ?email yTperson←? subject? person←? subjectpredicate-foaf: mbo?predicate=rdf: ty∧? object=oaf: PersonTriplesTripleTriplFig. 1. A SPARQL query and its translation into a relational operator treeThe framework presented here does not cover all of SPARQL. Only queriesagainst the default graph are supported; an extension with full RDF datasetsupport is outlined in section 2. 4, but not fully worked out. A translation ofFILTER expressions into sQL remains as future work. Some corner cases are noteasily translated they are discussed in section 41 A relational model over rdf termsAn rdf term is an iri or an rdf literal or a blank node as defined in theRDF Concepts and Abstract Syntax document 4. RDF terms are the scalarsof the relational model presented herea variable is a query variable, as defined in the SPARQL document [5.Forthe purposes of this report, variables are just names and are represented likethis: variableAn rdf tuple is a partial function from variables to rDf terms?person-naneenal?age→"42An rdf tuple is quite different from an rdf triple. a triple always hasthree components with fixed names, subject, predicate and object, while anRDf triple can havc any number of componcnts with arbitrary namcs. A triplois a statement-it implies a semantic relationship between its components. Atuple doesn't carry meaning, it is just a container that maps some variables tosome rdf termsNote, however, that any triple could be represented as a tuplesubjectprea.cateex: name?I will call this an s/p o tuple because it has three components namedsubject, predicate, object. There is a one-to-one mapping from RDf triples to/p/o tuplesRdf tuples are just another term for SPARQL solutions. The term tupleused universally in relational algebra anld I will go with it ill this documentThe variables in a tuple are called its attributes. An attribute is said tobe bound if it is in the tuples domain, and unbound otherwise. In the firstexample of this section, per son is bound, and? dateOf Birth is unbound becauseot, in the tuple1.1 RDF relationrow is an RDF tuple and each column is an attribute, named by a variable achAn Rdf relation is a set of rdf tuples. It can be represented as a table. eePerson?naremailex: Alice"AliceNot all tuples of a rclation nccd to have the samc attributes. An attributecan be unbound in some tuples In the example, ?email is unbound in one tupleThis is similar to the special value NUlL found in some definitions of relationalalgebra and sq, but not quite thetion 4An RdF relation's heading is the set of variables that may be bound in anyof its tuples. Intuitively speaking, the heading contains all column names. Theheading can usually obtained by looking at the attributes present in each tupleDefining an explicit heading that is independent from the tuples allows reasoningabout operations on relations without looking at the data they containIn the full relational model. every attribute has its own domain. In thepresented model, the domain of every attribute is simply the set of a II RDFterms. However, an extension that takes possible values of individual columnsinto account would have benefits. For example, an implementation might choosea more efficient strategy for handling an attribute whose domain is the set of allRDF literals with datatype xsd: integerA graph relation is any relation whose heading is subject,? predicateObject. Each of the attributes Illust be bound in every tuple, ?subject lustal ways he a literal, and ?predicate must, al ways be an IRI. In other words, everytuple must be an s/p/o tuple?subject ?predicate?objectex: Alicelex: name "Alice"ex: Alicelex: age33ex: Bob ex: nameBob'ex: Bob ex: mbox@example. org>ex: Bob ex: ageAs stated above, every s/p/o tuple has a corresponding RDF triple. It followsthat every graph relation has an equivalent rdf graph, and vice versa. Thegraph relation above corresponds to this rdf graphex: Alice ex: name Aliceex: Alice ex: age 33Bobx: Bob ex: mbolto: bob@example. orgx: Bobge"42The rest of the section defines operators on RDF relations1.2 SelectionSelection(o), sometimes also called restriction, is an unary operator that se-lects only those tuples of a relation for which a propositional formula holds. Thepropositions are assumed to have the expressivity of SParQl FILTEr expres-(7)A simple form of Selection is the constant attribute selection, whichaccepts all tuples with a constant value on some attribute. The example usesa glThe heading of any op(R)is the same as the heading of R1.3 Projection and renameThe projection operator (T) restricts a relation to a subset of its attributes.The rename operator(p) renames an attribute. For the purposes of this reportI will combine them into a projection /rename operator. This is a shorthandlotation to keep the operator trees sillaller丌 erson- subiect(Fncme←? objectln This projection contains only the ?subject and ?object attributes of relaR, relaled to? person and ?name1.4 Inner Join and left outer joinThe inner join(a) joins two relations on their shared attributes. A M Bntains all combinations of a tuple from A and a tuple from B, minus thosewhere the shared attributes are not, equalA shared attribute is any attribute that is bound in both tuples. This isdifferent from regular relational algebra, where a shared attribute would be anythat occurs in both relations. This is discussed in more detail in section 4.2The joins heading is the set union of the headings of A and BB?person ?name? person? parentex: Alice"Aliceex: Bob ex: Charlesex: Bob"Bobx:Bob ex:DorothyAx Bper son ?name? parentex: Bob"Bob"ex: Charlesex: Bob"Obex: DorothyThe ex: Alice from the first rclation row docs not appear in the join bccauscthere is no row in the second relation with the same value for personThe result of a left outer join(: )additionally contains all those tuplesrom the first relation that have no matching tuple in the secondA:凶Bmame? parentex: Alice"Aliceex: Bob"Bob"ex: Charlesex: Bob"BobDorothyI was unable to produce the correct symbol. The"dots are supposed to be connectedto the“ bowtieI introduce two additional operators, the conditional inner join and theconditional left outer join. They drop the combined tuples that do not matcha propositional formula. The difference between A Cip B and A D(op(B)that the proposition p can use at tributes from both A and B. The operators areused to overcome the"FILTER scope problem"discussed in section 4.41.5 UnionThe union(U)of two relations A and B is the set union of the tuples of A andB. Unlike in regular relational algebra, the headings of A and b do not needto be identical. Attributes unbound in the tuples of one relation are simply leftbound in the result relation. This is sometimes called an outer alm, ionThe heading of AU B is the set union of the headings of A and BA∪Bperson nancparentex: Alice"Alicex:Bob"Bobex: Bobex: CharlesDoroth1.6 Set DifferenceThisthis report and can not generally be expressed in SPARQL. I note that it wouldDe required for relation I completeness2 A relational definition of SPARQLThe SParQL specificatinally defines t he seics of the query languin terms of solutions. This is just a different term for an rDf tuple -a par-tial function from variables to rdF terms. The formal definitions describe theproperties that a tuple must have to be a solution to a pattern when matchedagainst a particular RDF dataset. This style of definition makes it easy to checkif a given tuple is a solution or not, but docsn't tcll how to find all solutionsThis section provides all altermative definition that produces the salle re-sults(although no proof is offered), but better informs implementation. Thisis somewhat analogous to the relational calculus and relational algebra in therelational model. Both are equivalent, but the calculus describes the propertiesof solutions, while the aIgebra builds the result from the da. a through a set, ofoperators that can be implemented reasonably straightforwardThe main part of a sParQl query is the query pattern. It is usually built outof several graph patterns, which can be combined recursively to form arbitrarilycomplex query patterns. This section provides translations for the different kindsof graph patterns into relational algebra expressions. No translation for the basicgraph pattern is given because it is just a group of triple patterns. The result oftranslating any graph pattern into a relational expression is called its patternIn the SPaRQL documents language, matching the entire query patternagainst a graph produces a solution sequence. This is the same as the patternrelation of the entire query patternAll the remaining bits of SPARQL processing, like DISTINCT, ORDER BY, andthe CONSTRUCT, DESCRIBE and ASK result forms, can then be applied to thesolution sequence, as defined in the SPARQL specificatiOn2.1 Triple patternsThe pattern relation of a triple pattern is obtained from the graph relation byconstant attribute selection on the concrete nodes and by projection/renamingOIl the variable lodes. The SPARQL triple patterN[?person foaf: name ? nametranslates to this relational operationI per son+-? subiect(o?predicate=foaf name (Triples)?ncme←? object2.2 Group patterns, OPTIONAL and FILtERSPARQL group patterns are used to build complex patterns out of simple partsThe order of the subpatterns within the group does not matter, with the ex-ception of OPTIONALs, where the order is significant because a match of oneOPTIONAL can cause a later one that would also have matched to fail. The partsof a group pattern are translated in this order1. Inner join all triple patterns, UNIONs and GRAPHs. Their relative order doesnot matter bccausc a凶B=B凶Aand凶(B凶C)=(凶B)凶C2. Then, left outer join all OPTIONAL subpatterns to the result of l, in the orderthey occur in the query, wit, h the optional subpattern becoming the rightside of the outer join. OPTIONALS must be handled with lower precedencethan the required subpatterns because otherwise, a matching optional partcould cause a required part to fail3.Finally, take each FILTER's condition and apply them as selection operatorsto the result of 2. They must be applied last because the condition canuse variables frol anywhere in the group. If there are Inlultiple FILteRsin the group, their relative order does not matter because pI(op2(R))(OpI(R)The SparQL specification is not entirely clear on this subject, but i believe thatthis is the intended behaviour of group patternsThe groupFILTER (pOPTIONAL p2).p3.Itranslates to:and p3 arc arbitrary sub-patterns in the SPARQL expressions, andtheir pattern relations in the operator treeThis translation does not work for some cases discussed in section 42.3 uNiON patternsA UNION pattern matches if any of its subpatterns matches. Its pattern relations obtained by Connecting the relations of the subpatterns in ally order with theunion operator.i p1UNION p2)translates to2.4 GRAPH and the rdf datasetThis rcport generally assumes that the SPaRQl query is evaluated against anindividual rdf graph- the default graph of an rdf dataset. This subsectionoutlines an extension that works with full rdf datasetsAll triple patterns that are inside a GRaPh pattern are extended to quadpatterns. The fourth element is the graph name provided by the GRAPH construct. It may be a variable or an IrirefInstead of the graph relation, we would have a dataset relation with afourth attribute, graph, which is bound to the graph name for triples in namedgraphs, and bound to some special value DEF AULTGRAPH for triples in the defaultgraph. Triple patterns match only when ?graph equals DEFAULTGRAPH, quadpatterns only if the fourth element matches graphThe patternGRAPH ?g p1would translate to丌g- graph( Datase)凶plThis account s for patterns like thisGRAPH? g OPTIONAL ?a: b: c))Somewhat unintuitively, this pattern binds g to all graph names in thedataset even if the optional part never matchesBlank nodes in an rdf dataset can never be shared between graphs. If labelsare used to identify blank nodes, then relabelling might be necessary when graphsarc addcd to the datasct3 Mapping to SQLThis section outlines a translation from the relational algebra into SQL state-ments. Some parts will be left a bit vague to keep the translation independentfrom the database layout used to represent RDf triples and nodes. Also, the fo-cus is not to produce optimized SQL, but rather to keep the translation straightforwardThe recursive nature of SPARQL's graph patterns requires some form osting in the generated SQL. An implementation may (and probably should)use a mix of different sQL language features to construct nested queriesTemporary tables: Intermediate results can be stored in temporary tablesThis approach is the most flexible. The temporary tables can be processedwith non-database code or used multiple times in different parts of thebigger query. It also produces quite manageable SQL statements becauseone big SPARQL query translates into several small sQL queries. In generalthe result of processing any subtree can be stored in a temporary table. thisapproach is used in 3Store 3Nested SELECTs: This is the approach used here. Bracketed SELECT statementsare used in the FROM clause of sQL queries. The translation is quite straightforward since the resulting queries closely mirror the structure of the relational algebra cxprcssion. Unfortunately, they arc also vory unwieldy andunlikely to be executed with good performance. At least some "fatteningis advisableBracketed JOINs: Several aliases of the triple table are connected by JDINsand LEFT JOINS Brackets around the joins ensure the right order. This ap-proach produces concise SQL statements, and informal testing with MySQLindicates that it produces the best performance. An algorithm for translat-ing into this form is more complex because column names from the basetables are carried a ll the way up through the levels of nesting and Whereconditions of lower levels must be put into the parents JOIN condition. Theunion operator can not be translated with this approach. Bracketed JOINsare not supported in MySQL prior to version 5The rest of the section provides translations for the relational operators3 This allows support for expressions that cannot be evaluated inside the databaselike extension functions or certain forms of regular expressions
下载地址
用户评论