Introduction to Database Design
XQuery, DTD, and XML Schema
Rasmus Pagh
Today’s lecture
XML tools, part 2:
XQuery
Schema languages:
DTD
XML Schema
An Introduction to XML and Web Technologies
Querying XML Documents
with XQuery
The following slides are based on slides by
Anders Møller & Michael I. Schwartzbach
© 2006 Addison-Wesley
XQuery 1.0
XML documents naturally generalize
database relations
XQuery is the corresponding generalization
of SQL
From Relations to Trees
Trees Are Not Relations
Not all trees satisfy the previous
characterization
Also, XML trees are ordered, while both
rows and columns of tables may be
permuted without changing the meaning of
the data
Relationship to XPath
XQuery 1.0 is a strict superset of XPath
2.0
Every XPath 2.0 expression is directly
an XQuery 1.0 expression (a query)
The extra expressive power is the
ability to
join information from different sources and
generate new XML fragments
Main construct: FLWOR expression
conceptually similar to select-from-where
syntax resembles that of an imperative language
FLWOR Example
<doubles>
{ for $s in fn:doc("students.xml")//student
let $m := $s/major
where fn:count($m) ge 2
order by $s/@id
return <double>
{ $s/name/text() }
</double>
}
</doubles>
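The same query can be traced clause by clause in an ordinary programming language. The following is a minimal Python sketch, not part of the original slides; the contents of STUDENTS_XML are a made-up stand-in for students.xml, which the slides do not show.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for students.xml (the real file is not shown).
STUDENTS_XML = """<students>
  <student id="200"><name>Carol</name><major>Math</major></student>
  <student id="150"><name>Bob</name><major>CS</major><major>Biology</major></student>
  <student id="100"><name>Alice</name><major>Math</major><major>Physics</major></student>
</students>"""


def double_majors(xml_text):
    root = ET.fromstring(xml_text)
    selected = []
    for s in root.iter("student"):             # for $s in ...//student
        m = s.findall("major")                 # let $m := $s/major
        if len(m) >= 2:                        # where fn:count($m) ge 2
            selected.append(s)
    selected.sort(key=lambda s: s.get("id"))   # order by $s/@id (compared as strings)
    doubles = ET.Element("doubles")
    for s in selected:                         # return <double>{ $s/name/text() }</double>
        ET.SubElement(doubles, "double").text = s.findtext("name")
    return doubles


result = double_majors(STUDENTS_XML)
print(ET.tostring(result, encoding="unicode"))
```

Each FLWOR clause maps to one step: iteration, a local binding, a filter, a sort, and construction of the output.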
XML Expressions
XQuery expressions may compute new XML
nodes
Expressions may denote element, character
data, comment, and processing instruction
nodes
Each node is created with a unique node
identity
Direct Constructors
Uses the standard XML syntax
The expression
<foo><bar/>baz</foo>
evaluates to the given XML fragment
Identity:
<foo/> is <foo/>
evaluates to false
The Difference Between For and Let (1/4)
for $x in (1, 2, 3, 4)
let $y := ("a", "b", "c")
return ($x, $y)
1, a, b, c, 2, a, b, c, 3, a, b, c, 4, a, b, c
The Difference Between For and Let (2/4)
let $x := (1, 2, 3, 4)
for $y in ("a", "b", "c")
return ($x, $y)
1, 2, 3, 4, a, 1, 2, 3, 4, b, 1, 2, 3, 4, c
The Difference Between For and Let (3/4)
for $x in (1, 2, 3, 4)
for $y in ("a", "b", "c")
return ($x, $y)
1, a, 1, b, 1, c, 2, a, 2, b, 2, c,
3, a, 3, b, 3, c, 4, a, 4, b, 4, c
The Difference Between For and Let (4/4)
let $x := (1, 2, 3, 4)
let $y := ("a", "b", "c")
return ($x, $y)
1, 2, 3, 4, a, b, c
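The four for/let combinations can be mimicked with Python comprehensions. This is only an illustrative sketch: the helper flatten is an assumption that models the fact that XQuery sequences are always flat, and the variable names are made up.

```python
xs = (1, 2, 3, 4)
ys = ("a", "b", "c")


def flatten(groups):
    """Concatenate the per-iteration results, as XQuery does with sequences."""
    out = []
    for group in groups:
        out.extend(group)
    return out


# (1/4) for $x ... let $y: one iteration per x, $y is the whole sequence
for_let = flatten((x, *ys) for x in xs)

# (2/4) let $x ... for $y: one iteration per y, $x is the whole sequence
let_for = flatten((*xs, y) for y in ys)

# (3/4) for $x ... for $y: one iteration per (x, y) pair
for_for = flatten((x, y) for x in xs for y in ys)

# (4/4) let $x ... let $y: no iteration at all, the body is evaluated once
let_let = [*xs, *ys]

for name, seq in [("for/let", for_let), ("let/for", let_for),
                  ("for/for", for_for), ("let/let", let_let)]:
    print(name, seq)
```

A for clause multiplies the number of times the return expression is evaluated; a let clause never does.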
Computing Joins
Join is implemented as nested loops
But not necessarily executed that way!
declare namespace rcp = "http://www.brics.dk/ixwt/recipes";
for $r in fn:doc("recipes.xml")//rcp:recipe
for $i in $r//rcp:ingredient/@name
for $s in fn:doc("fridge.xml")//stuff[text()=$i]
return $r/rcp:title/text()
Contents of fridge.xml:
<fridge>
<stuff>eggs</stuff>
<stuff>olive oil</stuff>
<stuff>ketchup</stuff>
<stuff>unrecognizable moldy thing</stuff>
</fridge>
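To illustrate why the query need not be executed as nested loops, here is a small Python sketch that evaluates the same join in two ways. The recipe data is invented for the example (recipes.xml itself is not shown here), and hash_join is just one possible alternative strategy, not what any particular XQuery engine does.

```python
import xml.etree.ElementTree as ET

NS = {"rcp": "http://www.brics.dk/ixwt/recipes"}

# Hypothetical miniature of recipes.xml.
RECIPES = ET.fromstring("""<rcp:collection xmlns:rcp="http://www.brics.dk/ixwt/recipes">
  <rcp:recipe><rcp:title>Omelette</rcp:title>
    <rcp:ingredient name="eggs"/><rcp:ingredient name="butter"/></rcp:recipe>
  <rcp:recipe><rcp:title>Ketchup Rice</rcp:title>
    <rcp:ingredient name="rice"/><rcp:ingredient name="ketchup"/></rcp:recipe>
  <rcp:recipe><rcp:title>Toast</rcp:title>
    <rcp:ingredient name="bread"/></rcp:recipe>
</rcp:collection>""")

FRIDGE = ET.fromstring(
    "<fridge><stuff>eggs</stuff><stuff>olive oil</stuff>"
    "<stuff>ketchup</stuff></fridge>")


def nested_loop_join(recipes, fridge):
    """Literal reading of the three nested for clauses."""
    titles = []
    for r in recipes.iterfind(".//rcp:recipe", NS):
        for i in r.iterfind(".//rcp:ingredient", NS):
            for s in fridge.iter("stuff"):
                if s.text == i.get("name"):          # stuff[text()=$i]
                    titles.append(r.findtext("rcp:title", namespaces=NS))
    return titles


def hash_join(recipes, fridge):
    """Same result here, but the fridge is scanned only once.
    Assumes no duplicate <stuff> values in the fridge."""
    in_fridge = {s.text for s in fridge.iter("stuff")}
    return [r.findtext("rcp:title", namespaces=NS)
            for r in recipes.iterfind(".//rcp:recipe", NS)
            for i in r.iterfind(".//rcp:ingredient", NS)
            if i.get("name") in in_fridge]


print(nested_loop_join(RECIPES, FRIDGE))
print(hash_join(RECIPES, FRIDGE))
```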
Example: Inverting a Relation
declare namespace rcp = "http://www.brics.dk/ixwt/recipes";
<ingredients>
{ for $i in distinct-values(
fn:doc("recipes.xml")//rcp:ingredient/@name)
return <ingredient name="{$i}">
{ for $r in fn:doc("recipes.xml")//rcp:recipe
where $r//rcp:ingredient[@name=$i]
return <title>{$r/rcp:title/text()}</title>
}
</ingredient>
}
</ingredients>
Adding order by $i before the outer return sorts the ingredients by name.
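The inversion (from recipe → ingredients to ingredient → recipes) can be sketched in Python as follows. The recipe data is invented for illustration, and sorting the names corresponds to the order by $i variant.

```python
import xml.etree.ElementTree as ET

NS = {"rcp": "http://www.brics.dk/ixwt/recipes"}

# Hypothetical miniature of recipes.xml.
RECIPES = ET.fromstring("""<rcp:collection xmlns:rcp="http://www.brics.dk/ixwt/recipes">
  <rcp:recipe><rcp:title>Omelette</rcp:title>
    <rcp:ingredient name="eggs"/><rcp:ingredient name="butter"/></rcp:recipe>
  <rcp:recipe><rcp:title>Pancakes</rcp:title>
    <rcp:ingredient name="eggs"/><rcp:ingredient name="flour"/></rcp:recipe>
</rcp:collection>""")


def invert(recipes):
    # distinct-values(...//rcp:ingredient/@name), sorted as with "order by $i"
    names = sorted({i.get("name")
                    for i in recipes.iterfind(".//rcp:ingredient", NS)})
    out = ET.Element("ingredients")
    for name in names:
        ingredient = ET.SubElement(out, "ingredient", {"name": name})
        for r in recipes.iterfind(".//rcp:recipe", NS):
            # where $r//rcp:ingredient[@name=$i]
            if any(i.get("name") == name
                   for i in r.iterfind(".//rcp:ingredient", NS)):
                title = ET.SubElement(ingredient, "title")
                title.text = r.findtext("rcp:title", namespaces=NS)
    return out


inverted = invert(RECIPES)
print(ET.tostring(inverted, encoding="unicode"))
```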
Semantics of FLWOR
let $a := <e>: Assign a value to local
variable $a, given by expression <e>.
for $t in <e> <b>: Iterate through
the list given by <e>, binding $t to
each item and executing <b> to build
output list.
where <p>: If predicate <p> is not
satisfied, go to next binding in for.
return <e>: Add <e> to output list of
enclosing expression.
order by $b: Order output list by $b.
Summary
XML trees generalize relational tables
XQuery similarly generalizes SQL
Next week: XSLT
XQuery and XSLT have roughly the same
expressive power
Suited for different application domains:
XQuery is designed for querying
XSLT is targeted at presentation/transformation
An Introduction to XML and Web Technologies
Schema Languages
The following slides are based on slides by
Anders Møller & Michael I. Schwartzbach
© 2006 Addison-Wesley
Next
The purpose of using schemas
The schema languages DTD and XML
Schema
Regular expressions – a commonly used
formalism in schema languages
Motivation
We have seen a Recipe Markup Language
...but so far only informally described
its syntax
How can we make tools that check that
an XML document is a syntactically
correct Recipe Markup Language
document (and thus meaningful)?
Implementing a specialized validation tool
for Recipe Markup Language is not the
solution...
XML Languages
XML language:
a set of XML documents with some semantics
schema:
a formal definition of the syntax of an XML
language (not its semantics)
schema language:
a notation for writing schemas
Validation
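[Figure: an instance document and a schema are given to a schema processor. If the document is valid, the output is a normalized instance document; if it is invalid, the output is an error message.]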
Why use Schemas?
Formal but human-readable
descriptions
a basis for writing programs that read documents
written in the markup language
Data validation can be performed with
existing schema processors
Regular Expressions
Commonly used in schema languages
to describe sequences of
characters or elements
Σ: an alphabet (e.g. Unicode characters or
element names)
Regular expressions are recursively defined:
σ matches the character σ∈Σ
α? matches zero or one α
α* matches zero or more α’s
α+ matches one or more α’s
α β matches any concatenation of an α and a β
α | β matches the union of α and β
Examples
A regular expression describing integers:
0|-?(1|2|3|4|5|6|7|8|9)(0|1|2|3|4|5|6|7|8|9)*
A regular expression describing the valid
contents of table elements in XHTML:
caption? (col*|colgroup*) thead? tfoot? (tbody+ | tr+)
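The integer expression can be tried out directly, since Python's re module accepts the same operators. This is a small sketch; the function name is_integer is made up, and fullmatch is used because the expression is meant to describe the whole string.

```python
import re

# The regular expression for integers, copied from the slide.
INTEGER = re.compile(r"0|-?(1|2|3|4|5|6|7|8|9)(0|1|2|3|4|5|6|7|8|9)*")


def is_integer(text):
    """True if the entire string is described by the expression."""
    return INTEGER.fullmatch(text) is not None


for candidate in ["0", "42", "-17", "007", "-0", ""]:
    print(repr(candidate), is_integer(candidate))
```

Note that the expression deliberately rejects leading zeros and a negative zero.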
DTD – Document Type Definition
Specified as an integral part of XML 1.0
A starting point for development of more
expressive schema languages
Considers elements, attributes, and character
data – processing instructions and comments
are mostly ignored
Document Type Declarations
Associates a DTD schema with the instance
document
Example:
<?xml version="1.1"?>
<!DOCTYPE collection SYSTEM "http://www.brics.dk/ixwt/recipes.dtd">
<collection>
...
</collection>
Element Declarations
<!ELEMENT element-name content-model >
Content models:
EMPTY
ANY
mixed content: (#PCDATA|e1|e2|...|en)*
element content: regular expression over element
names (concatenation is written with “,”)
Example:
<!ELEMENT table
(caption?,(col*|colgroup*),thead?,tfoot?,(tbody+|tr+)) >
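Because an element content model is a regular expression over element names, checking a sequence of child elements against it is ordinary regex matching. The sketch below is one simple way to do this in Python, not how real DTD validators are implemented: each element name is encoded as the name followed by a space, and the function names are hypothetical.

```python
import re


def content_model_to_regex(model):
    """Translate a DTD element-content model into a Python regex
    over a string of 'name ' tokens (a simplification)."""
    compact = re.sub(r"\s+", "", model)
    # Every element name becomes a group matching that name plus a separator;
    # the operators ( ) | ? * + are kept unchanged.
    pattern = re.sub(
        r"[A-Za-z_][A-Za-z0-9_.-]*",
        lambda m: "(?:" + re.escape(m.group()) + " )",
        compact,
    )
    # Concatenation is written "," in DTD and by juxtaposition in a regex.
    return re.compile(pattern.replace(",", ""))


def children_match(model_regex, child_names):
    encoded = "".join(name + " " for name in child_names)
    return model_regex.fullmatch(encoded) is not None


TABLE = content_model_to_regex(
    "(caption?,(col*|colgroup*),thead?,tfoot?,(tbody+|tr+))")

print(children_match(TABLE, ["caption", "col", "col", "thead", "tbody"]))
print(children_match(TABLE, ["tbody", "tr"]))
```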
Attribute-List Declarations
<!ATTLIST element-name attribute-definitions >
Each attribute definition consists of
an attribute name
an attribute type
a default declaration
Example:
<!ATTLIST input maxlength CDATA #IMPLIED
tabindex CDATA #IMPLIED>
Attribute Types
CDATA: any value
enumeration: (s1|s2|...|sn)
ID: must have unique value
IDREF (/ IDREFS): must match some ID
attribute(s)
...
Examples:
<!ATTLIST p align (left|center|right|justify) #IMPLIED>
<!ATTLIST recipe id ID #IMPLIED>
<!ATTLIST related ref IDREF #IMPLIED>
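The ID and IDREF constraints amount to two checks over the whole document: no ID value occurs twice, and every IDREF value occurs as some ID. A minimal Python sketch follows, assuming the two declarations above (recipe/@id is an ID, related/@ref is an IDREF); the sample documents and the function name are made up.

```python
import xml.etree.ElementTree as ET

# (element name, attribute name) pairs, taken from the ATTLIST examples.
ID_ATTRS = {("recipe", "id")}
IDREF_ATTRS = {("related", "ref")}


def check_id_constraints(root):
    errors = []
    ids = set()
    # Pass 1: collect ID values and report duplicates.
    for el in root.iter():
        for tag, attr in ID_ATTRS:
            if el.tag == tag and attr in el.attrib:
                value = el.get(attr)
                if value in ids:
                    errors.append("duplicate ID: " + value)
                ids.add(value)
    # Pass 2: every IDREF must match some ID anywhere in the document.
    for el in root.iter():
        for tag, attr in IDREF_ATTRS:
            if el.tag == tag and attr in el.attrib and el.get(attr) not in ids:
                errors.append("unmatched IDREF: " + el.get(attr))
    return errors


GOOD = ET.fromstring(
    '<collection><recipe id="r1"><related ref="r2"/></recipe>'
    '<recipe id="r2"/></collection>')
BAD = ET.fromstring(
    '<collection><recipe id="r1"><related ref="r9"/></recipe>'
    '<recipe id="r1"/></collection>')

print(check_id_constraints(GOOD))
print(check_id_constraints(BAD))
```

Two passes are needed because an IDREF may point forward to an ID that appears later in the document.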
Attribute Default Declarations
#REQUIRED
#IMPLIED (= optional)
"value" (= optional, but default provided)
#FIXED "value" (= optional in the document, but the attribute
always has this value)
Example:
<!ATTLIST form
action CDATA #REQUIRED
onsubmit CDATA #IMPLIED
method (get|post) "get"
enctype CDATA "application/x-www-form-urlencoded" >
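The effect of default declarations during normalization can be sketched as a function from the attributes written in the document to the attributes the application sees. This is an illustrative Python sketch only: the table FORM_ATTLIST is a hand-written encoding of the ATTLIST above, the names are hypothetical, and attribute types (such as the get|post enumeration) are not checked.

```python
# Hand-written encoding of the <!ATTLIST form ...> example:
# attribute name -> (kind of default declaration, default value)
FORM_ATTLIST = {
    "action": ("#REQUIRED", None),
    "onsubmit": ("#IMPLIED", None),
    "method": ("default", "get"),
    "enctype": ("default", "application/x-www-form-urlencoded"),
}


def apply_defaults(attrs, attlist):
    """Return the normalized attributes, or raise ValueError if invalid."""
    result = dict(attrs)
    for name, (kind, value) in attlist.items():
        if name in result:
            if kind == "#FIXED" and result[name] != value:
                raise ValueError("attribute %s must have value %r" % (name, value))
            continue
        if kind == "#REQUIRED":
            raise ValueError("missing required attribute " + name)
        if kind in ("default", "#FIXED"):
            result[name] = value      # the default is inserted
        # "#IMPLIED": the attribute simply stays absent
    return result


print(apply_defaults({"action": "/search"}, FORM_ATTLIST))
```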
RecipeML with DTD (1/2)
<!ELEMENT collection (description,recipe*)>
<!ELEMENT description (#PCDATA)>
<!ELEMENT recipe
(title,date,ingredient*,preparation,comment?,
nutrition,related*)>
<!ATTLIST recipe id ID #IMPLIED>
<!ELEMENT title (#PCDATA)>
<!ELEMENT date (#PCDATA)>
<!ELEMENT ingredient (ingredient*,preparation)?>
<!ATTLIST ingredient name CDATA #REQUIRED
amount CDATA #IMPLIED
unit CDATA #IMPLIED>
RecipeML with DTD (2/2)
<!ELEMENT preparation (step*)>
<!ELEMENT step (#PCDATA)>
<!ELEMENT comment (#PCDATA)>
<!ELEMENT nutrition EMPTY>
<!ATTLIST nutrition calories CDATA #REQUIRED
carbohydrates CDATA #REQUIRED
fat CDATA #REQUIRED
protein CDATA #REQUIRED
alcohol CDATA #IMPLIED>
<!ELEMENT related EMPTY>
<!ATTLIST related ref IDREF #REQUIRED>
Some limitations of DTD
1. Cannot constrain character data
2. Specification of attribute values is too limited
3. Character data cannot be combined with the
regular expression content model
4. The support for modularity, reuse, and evolution is
too primitive
5. No support for namespaces
XML Schema is a newer schema language
with fewer limitations.
XML Schema example (1/3)
Instance document:
<b:card xmlns:b="http://businesscard.org">
<b:name>John Doe</b:name>
<b:title>CEO, Widget Inc.</b:title>
<b:email>[email protected]</b:email>
<b:phone>(202) 555-1414</b:phone>
<b:logo b:uri="widget.gif"/>
</b:card>
XML Schema example (2/3)
Schema:
<schema xmlns="http://www.w3.org/2001/XMLSchema"
xmlns:b="http://businesscard.org"
targetNamespace="http://businesscard.org">
<element name="card" type="b:card_type"/>
<element name="name" type="string"/>
<element name="title" type="string"/>
<element name="email" type="string"/>
<element name="phone" type="string"/>
<element name="logo" type="b:logo_type"/>
<attribute name="uri" type="anyURI"/>
XML Schema example (3/3)
<complexType name="card_type">
<sequence>
<element ref="b:name"/>
<element ref="b:title"/>
<element ref="b:email"/>
<element ref="b:phone" minOccurs="0"/>
<element ref="b:logo" minOccurs="0"/>
</sequence>
</complexType>
<complexType name="logo_type">
<attribute ref="b:uri" use="required"/>
</complexType>
</schema>
XML Schema Types and Declarations
Simple type definition:
defines a family of Unicode text strings
Complex type definition:
defines a content and attribute model
Element declaration:
associates an element name with a simple or
complex type
Attribute declaration:
associates an attribute name with a simple type
Element and Attribute Declarations
Examples:
<element name="serialnumber"
type="nonNegativeInteger"/>
<attribute name="alcohol"
type="r:percentage"/>
Derived simple types
<simpleType name="score_from_0_to_100">
<restriction base="integer">
<minInclusive value="0"/>
<maxInclusive value="100"/>
</restriction>
</simpleType>
<simpleType name="percentage">
<restriction base="string">
<pattern value="([0-9]|[1-9][0-9]|100)%"/>
</restriction>
</simpleType>
(the pattern value is a regular expression)
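What the two derived types accept can be illustrated with a few lines of Python. This is a sketch under simplifying assumptions, not a schema processor: the function names are made up, and the pattern is applied with fullmatch because an XML Schema pattern must describe the entire value.

```python
import re


def is_score_from_0_to_100(text):
    """restriction of integer with minInclusive 0 and maxInclusive 100."""
    text = text.strip()                       # surrounding whitespace is ignored
    if re.fullmatch(r"[+-]?[0-9]+", text) is None:
        return False                          # not in the lexical space of integer
    return 0 <= int(text) <= 100


PERCENTAGE = re.compile(r"([0-9]|[1-9][0-9]|100)%")


def is_percentage(text):
    """restriction of string with the pattern facet from the slide."""
    return PERCENTAGE.fullmatch(text) is not None


for value in ["0", "100", "101", "-1"]:
    print(value, is_score_from_0_to_100(value))
for value in ["0%", "42%", "100%", "101%", "05%", "42"]:
    print(value, is_percentage(value))
```

The first type restricts the value space (a range of numbers), while the second restricts the lexical space (which strings are allowed).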
Simple Type Derivation – Union
<simpleType name="boolean_or_decimal">
<union>
<simpleType>
<restriction base="boolean"/>
</simpleType>
<simpleType>
<restriction base="decimal"/>
</simpleType>
</union>
</simpleType>
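A union type accepts a value if at least one of its member types accepts it. A minimal Python sketch of boolean_or_decimal follows; the helper names are hypothetical, and the regular expression is a simplified rendering of the lexical form of decimal (optional sign, digits, optional fraction, no exponent).

```python
import re


def is_boolean(text):
    # The four lexical forms of the built-in boolean type.
    return text.strip() in {"true", "false", "1", "0"}


def is_decimal(text):
    # Optional sign, then digits with an optional fractional part.
    return re.fullmatch(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)",
                        text.strip()) is not None


def is_boolean_or_decimal(text):
    """Union: valid if any member type accepts the value."""
    return is_boolean(text) or is_decimal(text)


for value in ["true", "3.14", "-.5", "maybe", "1e5"]:
    print(value, is_boolean_or_decimal(value))
```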
Complex Types
Content models as regular expressions:
Element reference: <element ref="name"/>
Concatenation: <sequence> ... </sequence>
Union: <choice> ... </choice>
All: <all> ... </all>
Element wildcard: <any namespace="..." processContents="..."/>
Attribute reference: <attribute ref="..."/>
Attribute wildcard: <anyAttribute namespace="..." processContents="..."/>
Cardinalities: minOccurs, maxOccurs, use="required"
Mixed content: mixed="true"
Example
<element name="order" type="n:order_type"/>
<complexType name="order_type" mixed="true">
<choice>
<element ref="n:address"/>
<sequence>
<element ref="n:email"
minOccurs="0"
maxOccurs="unbounded"/>
<element ref="n:phone"/>
</sequence>
</choice>
<attribute ref="n:id" use="required"/>
</complexType>
Global vs. Local Descriptions
Global (toplevel) style:
<element name="card"
type="b:card_type"/>
<element name="name"
type="string"/>
<complexType name="card_type">
<sequence>
<element ref="b:name"/>
...
</sequence>
</complexType>
Local (inlined) style:
<element name="card">
<complexType>
<sequence>
<element name="name"
type="string"/>
...
</sequence>
</complexType>
</element>
Requirements for Complex Types
Two element declarations that have the same
name and appear in the same complex type must
have identical types. The following is not allowed:
<complexType name="some_type">
<choice>
<element name="foo" type="string"/>
<element name="foo" type="integer"/>
</choice>
</complexType>
This requirement makes efficient implementation easier
all may only contain element declarations (e.g. not sequence)
Uniqueness, Keys, References
<element name="widget" xmlns:w="http://www.widget.org">
<complexType>
...
</complexType>
<key name="my_widget_key">
<selector xpath="w:components/w:part"/>
<field xpath="@manufacturer"/>
<field xpath="w:info/@productid"/>
</key>
<keyref name="annotation_references"
refer="w:my_widget_key">
<selector xpath=".//w:annotation"/>
<field xpath="@manu"/>
<field xpath="@prod"/>
</keyref>
</element>
key (my_widget_key): in every widget, each part must have
a unique (manufacturer, productid)
keyref (annotation_references): in every widget, for each annotation,
(manu, prod) must match a my_widget_key
unique: as key, but fields may be absent
Only a "downward" subset of XPath is used
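The meaning of the key and keyref definitions can be spelled out as a check over one widget element. The following Python sketch is a simplification, not a schema processor: the sample documents are invented, the function name is hypothetical, and each field is assumed to select at most one node.

```python
import xml.etree.ElementTree as ET

NS = {"w": "http://www.widget.org"}


def check_widget(widget):
    errors = []
    keys = set()
    # key: selector w:components/w:part, fields @manufacturer, w:info/@productid
    for part in widget.findall("w:components/w:part", NS):
        info = part.find("w:info", NS)
        key = (part.get("manufacturer"),
               info.get("productid") if info is not None else None)
        if None in key:
            errors.append("key field missing")        # a key needs all fields
        elif key in keys:
            errors.append("duplicate key: %s/%s" % key)
        else:
            keys.add(key)
    # keyref: selector .//w:annotation, fields @manu, @prod
    for annotation in widget.iterfind(".//w:annotation", NS):
        ref = (annotation.get("manu"), annotation.get("prod"))
        if None not in ref and ref not in keys:
            errors.append("unmatched keyref: %s/%s" % ref)
    return errors


GOOD = ET.fromstring("""<w:widget xmlns:w="http://www.widget.org">
  <w:components>
    <w:part manufacturer="Acme"><w:info productid="p1"/></w:part>
    <w:part manufacturer="Acme"><w:info productid="p2"/></w:part>
  </w:components>
  <w:annotation manu="Acme" prod="p2"/>
</w:widget>""")

BAD = ET.fromstring("""<w:widget xmlns:w="http://www.widget.org">
  <w:components>
    <w:part manufacturer="Acme"><w:info productid="p1"/></w:part>
    <w:part manufacturer="Acme"><w:info productid="p1"/></w:part>
  </w:components>
  <w:annotation manu="Bolt" prod="p7"/>
</w:widget>""")

print(check_widget(GOOD))
print(check_widget(BAD))
```

Unlike DTD's ID and IDREF, the scope of the check is a single widget element, and a key may combine several fields.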
RecipeML with XML Schema (1/5)
<schema xmlns="http://www.w3.org/2001/XMLSchema"
xmlns:r="http://www.brics.dk/ixwt/recipes"
targetNamespace="http://www.brics.dk/ixwt/recipes"
elementFormDefault="qualified">
<element name="collection">
<complexType>
<sequence>
<element name="description" type="string"/>
<element ref="r:recipe" minOccurs="0" maxOccurs="unbounded"/>
</sequence>
</complexType>
<unique name="recipe-id-uniqueness">
<selector xpath=".//r:recipe"/>
<field xpath="@id"/>
</unique>
<keyref name="recipe-references" refer="r:recipe-id-uniqueness">
<selector xpath=".//r:related"/>
<field xpath="@ref"/>
</keyref>
</element>
RecipeML with XML Schema (2/5)
<element name="recipe">
<complexType>
<sequence>
<element name="title" type="string"/>
<element name="date" type="string"/>
<element ref="r:ingredient" minOccurs="0" maxOccurs="unbounded"/>
<element ref="r:preparation"/>
<element name="comment" type="string" minOccurs="0"/>
<element ref="r:nutrition"/>
<element ref="r:related" minOccurs="0" maxOccurs="unbounded"/>
</sequence>
<attribute name="id" type="NMTOKEN"/>
</complexType>
</element>
RecipeML with XML Schema (3/5)
<element name="ingredient">
<complexType>
<sequence minOccurs="0">
<element ref="r:ingredient" minOccurs="0" maxOccurs="unbounded"/>
<element ref="r:preparation"/>
</sequence>
<attribute name="name" use="required"/>
<attribute name="amount" use="optional">
<simpleType>
<union>
<simpleType>
<restriction base="r:nonNegativeDecimal"/>
</simpleType>
<simpleType>
<restriction base="string">
<enumeration value="*"/>
</restriction>
</simpleType>
</union>
</simpleType>
</attribute>
<attribute name="unit" use="optional"/>
</complexType>
</element>
RecipeML with XML Schema (4/5)
<element name="preparation">
<complexType>
<sequence>
<element name="step" type="string" minOccurs="0" maxOccurs="unbounded"/>
</sequence>
</complexType>
</element>
<element name="nutrition">
<complexType>
<attribute name="calories" type="r:nonNegativeDecimal" use="required"/>
<attribute name="protein" type="r:percentage" use="required"/>
<attribute name="carbohydrates" type="r:percentage" use="required"/>
<attribute name="fat" type="r:percentage" use="required"/>
<attribute name="alcohol" type="r:percentage" use="optional"/>
</complexType>
</element>
<element name="related">
<complexType>
<attribute name="ref" type="NMTOKEN" use="required"/>
</complexType>
</element>
RecipeML with XML Schema (5/5)
<simpleType name="nonNegativeDecimal">
<restriction base="decimal">
<minInclusive value="0"/>
</restriction>
</simpleType>
<simpleType name="percentage">
<restriction base="string">
<pattern value="([0-9]|[1-9][0-9]|100)%"/>
</restriction>
</simpleType>
</schema>
Strengths of XML Schema
Namespace support
Data types (built-in and
derivation)
Modularization
Type derivation mechanism
RELAX NG
OASIS + ISO competitor to XML
Schema
Designed for simplicity and
expressiveness, solid mathematical
foundation
Several other proposals, e.g. DSD2.
Summary
Schema: formal description of the
syntax of an XML language
DTD: simple schema language
elements, attributes, entities, ...
XML Schema: more advanced schema
language
element/attribute declarations
simple types, complex types, type
derivations
global vs. local descriptions
...
Next weeks
Only two 1-hour sessions left!
XSLT
Exam run-through (preparation: 4 hours)
Three possibilities:
A) 8-10 AM next week
B) 8-10 AM in two weeks
C) 9-10 AM next week and in two weeks
Vote: What do you prefer?
More XML Schema
The following slides give more information and
examples on XML Schema.
They are part of the course curriculum and can be
considered supplements to the course literature.
Simple Types – Primitive
Type            Example values
string          any Unicode string
boolean         true, false, 1, 0
decimal         3.1415
float           6.02214199E23
double          42E970
dateTime        2004-09-26T16:29:00-05:00
time            16:29:00-05:00
date            2004-09-26
hexBinary       48656c6c6f0a
base64Binary    SGVsbG8K
anyURI          http://www.brics.dk/ixwt/
QName           rcp:recipe, recipe
...
Simple Type Derivation – List
<simpleType name="integerList">
<list itemType="integer"/>
</simpleType>
matches whitespace-separated lists of integers
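A value of a list type is split at whitespace, and each item must be valid for the item type. A minimal Python sketch for integerList follows; the function name is made up, and Python's split() treats slightly more characters as whitespace than XML does.

```python
import re


def parse_integer_list(text):
    """Split at whitespace and check every item against the integer type."""
    items = text.split()
    for item in items:
        if re.fullmatch(r"[+-]?[0-9]+", item) is None:
            raise ValueError("not an integer: " + item)
    return [int(item) for item in items]


print(parse_integer_list("1 2   -3\n42"))
```

The empty string is accepted and yields the empty list, since the list type itself places no bound on the number of items.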