Last Update: 7/27/03
A number of schema languages have been developed to extend the constructs available in DTDs and to allow additional validity constraints, beyond those provided by DTDs, to be placed on XML document instances. The schema language that has gained the most popularity is the W3C XML Schema Language, developed by the World Wide Web Consortium (W3C). The examples in this short tutorial discuss the W3C XML Schema language. It has seen the most use in data-centric applications, such as:
The best general place to start searching for information on XML Schema and its applications is the XML Coverpages. For additional information, see the final section of this document.
XML Schemas and DTDs both support:
XML Schemas offer support for the following constructs not available in DTDs:
It is interesting to note that, other than the five general entities predefined by the XML 1.0 specification (&lt;, &gt;, &amp;, &apos;, and &quot;), XML schema languages do not support any of the entity types that one commonly uses in a DTD. In document instances constructed using a schema, characters outside of ASCII that would typically have been escaped with a general entity should instead be written as Unicode decimal or hexadecimal character references. The parameter entities that one uses in a DTD to represent particular classes of attributes or elements are replaced by more powerful constructs that schema languages provide for indirection and abstraction in a markup language defined in XML. We'll go over how the W3C XML Schema handles these concepts.
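As an illustration, a DTD-based document might write an accented letter with an entity such as &eacute;, but a document validated only against a schema has no DTD to declare that entity, so the character is written as a numeric character reference instead. The sketch below reuses the greeting instance from later in this tutorial with a French message; the message text and lang="fr" are assumptions made for illustration. &#224; is the decimal reference for à (U+00E0), &#xE9; is the hexadecimal reference for é (U+00E9), and &amp; is one of the five predefined entities that remain available.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<greeting xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:noNamespaceSchemaLocation="greeting.xsd" lang="fr">
  <!-- decimal reference for a-grave, hexadecimal reference for e-acute,
       and the predefined amp entity for the ampersand -->
  <message type="AM">Bonjour &#224; tous &amp; bienvenue au caf&#xE9;</message>
  <date>10-12-2002</date>
</greeting>
```

A parser expands the references, so the application sees "Bonjour à tous & bienvenue au café".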
W3C XML Schemas take the file extension .xsd. An instance can be associated either with a schema for a particular namespace defined by that schema or with a locally stored schema file. Because schema declarations for namespaces are complicated and cumbersome, and also unreliable because one must always have network access to validate against the namespace schema, I'll discuss only how to associate an instance with a particular .xsd file. Consult XML in a Nutshell or the W3C site for information on how to declare a namespace for your schema.
One associates a document instance with a schema by declaring a special attribute on the root element of the document instance. To associate an instance with a particular schema file, and not a namespace, use the xsi:noNamespaceSchemaLocation attribute (with the xsi prefix bound to http://www.w3.org/2001/XMLSchema-instance), as in the document instance below:
<?xml version="1.0" encoding="UTF-8"?>
<greeting xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="greeting.xsd" lang="en">
<message type="AM">Hello Schema World</message>
<date>10-12-2002</date>
</greeting>
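To understand W3C XML Schema, one must learn the concepts of complex and simple types. A simple type can hold only character data: every attribute value has a simple type, as does an element that contains only text and has no attributes. An element that has attributes or child elements has a complex type. The annotated instance below marks one of each: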
<?xml version="1.0" encoding="UTF-8"?>
<!-- element greeting is a complex type -->
<greeting xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="greeting.xsd" lang="en"> <!-- attribute lang is a simple type -->
<message type="AM">Hello Schema World</message>
<date>10-12-2002</date>
</greeting>
An important feature most schema languages provide is the ability to place stronger restrictions than those provided by DTDs on the actual content that can occur within a particular element or in an attribute value. The W3C XML Schema defines a number of built-in simple datatypes that can be used to restrict the contents of elements and attribute values; see the W3C XML Schema specification for a full list of the built-in simple types. These built-in types can also serve as the base for one's own custom simple types, which are derived from them through inheritance. The following short example illustrates the use of simple types, complex types, and the creation of one's own custom simple types.
The declaration below describes a required attribute, "lang", whose value must be of the built-in simple type language. Its legal values are language identifiers such as en or en-US, built on the ISO standard codes for languages.
<xs:attribute name="lang" use="required" type="xs:language"/>
The following declaration creates a new simple type derived from the W3C built-in simple type string. The date simple type defined below derives from the built-in type by restriction, a form of inheritance. W3C XML Schema can restrict string content to a pattern defined by a regular expression, similar to the regular expressions available in most programming languages. The date custom simple type uses this facility to restrict its values to a common notation for a calendar date: two digits, a hyphen, two digits, a hyphen, and four digits (for example, 10-12-2002). We'll talk more about inheritance in example two.
<!-- define the date pattern simple type -->
<xs:simpleType name="date">
<xs:restriction base="xs:string">
<xs:pattern value="\d\d-\d\d-\d\d\d\d"/>
</xs:restriction>
</xs:simpleType>
The declaration of the element <date> whose type is the custom simple type date is as follows:
<xs:element name="date" type="date" />
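The date pattern above checks only the shape of a value, so a string such as 99-99-0000 is also accepted. If the values are meant to be month-day-year (an assumption; the tutorial does not say which order 10-12-2002 uses), a tighter pattern could constrain the month and day digits. The type name strictDate is hypothetical. W3C XML Schema patterns are implicitly anchored, so the expression must match the entire value.

```xml
<!-- hypothetical stricter variant of the date type, assuming MM-DD-YYYY order -->
<xs:simpleType name="strictDate">
  <xs:restriction base="xs:string">
    <!-- month 01-12, day 01-31, four-digit year; the whole value must match -->
    <xs:pattern value="(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])-\d{4}"/>
  </xs:restriction>
</xs:simpleType>
```

With this pattern, 10-12-2002 is still accepted and 99-99-0000 is rejected, but impossible dates such as 02-31-2002 still pass; a pattern checks characters, not the calendar.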
An element's complex type must be specified as part of that element's declaration. This can take two forms: (1) a nested <xs:complexType> element containing the definition of the complex type, as seen in the greeting.xsd example, or (2) the type attribute of the <xs:element> declaration references the name of a complex type defined elsewhere in the schema, as seen in the directory.xsd example. The declaration of the <message> element in our greeting schema, shown below, is an example of form (1).
<xs:element name="message">
<xs:complexType mixed="true">
<xs:attribute name="type" use="required" type="timeOfDay" />
</xs:complexType>
</xs:element>
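For comparison, the same <message> declaration could be written in form (2) by giving the complex type a name and referencing it through the type attribute. The name MessageType is hypothetical; the content it allows (mixed text plus a required type attribute) is the same as in the nested version.

```xml
<!-- form (2): a named complex type declared at the top level of the schema -->
<xs:complexType name="MessageType" mixed="true">
  <xs:attribute name="type" use="required" type="timeOfDay" />
</xs:complexType>

<!-- the element declaration refers to the named type instead of nesting it -->
<xs:element name="message" type="MessageType" />
```

A named type can be reused by several element declarations and can serve as the base for derived types, which is the approach directory.xsd takes.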
Both the example schemas discussed in this document, greeting.xsd and directory.xsd, contain several examples of the definition and usage of simple and complex types in W3C XML Schema.
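The complete greeting.xsd is not reproduced in this document, and the timeOfDay type referenced by the <message> declaration is never shown. The following is one possible reconstruction, assembled from the fragments above, that the greeting.xml instance should validate against. The AM/PM enumeration for timeOfDay is an assumption based on the instance's type="AM" value, and the actual file may differ.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- hypothetical: timeOfDay is not shown in the tutorial; AM/PM is assumed -->
  <xs:simpleType name="timeOfDay">
    <xs:restriction base="xs:string">
      <xs:enumeration value="AM"/>
      <xs:enumeration value="PM"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- the custom date type from the tutorial -->
  <xs:simpleType name="date">
    <xs:restriction base="xs:string">
      <xs:pattern value="\d\d-\d\d-\d\d\d\d"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- root element: a complex type with child elements and an attribute -->
  <xs:element name="greeting">
    <xs:complexType>
      <xs:sequence>
        <!-- form (1): anonymous complex type nested in the element -->
        <xs:element name="message">
          <xs:complexType mixed="true">
            <xs:attribute name="type" use="required" type="timeOfDay"/>
          </xs:complexType>
        </xs:element>
        <xs:element name="date" type="date"/>
      </xs:sequence>
      <!-- attribute declarations follow the content model -->
      <xs:attribute name="lang" use="required" type="xs:language"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

The schema declares no targetNamespace, which is why the instance uses xsi:noNamespaceSchemaLocation and why type="date" and type="timeOfDay" refer to the unprefixed types defined here rather than to anything in the xs namespace.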
To parse a document instance associated with a W3C XML Schema on the classroom servers, try:
classrm01:~/public_html/demos/schema 625 $ java dom.Writer -v -s greeting.xml
This command runs dom.Writer, a sample program from the Apache Xerces validating parser, with the -v and -s options, which turn on validation and schema validation support. If your file is valid, you should receive output that is a formatted version of your document instance.
The second schema example demonstrates W3C XML Schema constructs that use abstraction to let one define custom reusable complex types, and shows how one can use inheritance with these types to simplify the organization of a schema.
This example is a schema for a company directory. It defines one abstract complex type, Employee. An abstract type cannot be used directly as the type of an element in a document instance; its only purpose is to serve as a model from which other complex types can be derived. The Employee abstract type serves as the parent type of four other complex types: ChiefExecutive, Manager, SalaryWorker, and HourlyWorker. These four types are then used as the types of the elements that occur within the root element <directory>.
Types can be derived in W3C XML Schema using the <xs:extension> element. The base attribute of <xs:extension> names the complex type that is the parent of the current complex type. The following example shows the abstract complex type Employee and a complex type derived from it, ChiefExecutive.
<!-- Employee: an abstract complex type -->
<xs:complexType name="Employee" abstract="true">
<xs:sequence>
<xs:element name="name" type="xs:string" />
<xs:element name="email" type="xs:anyURI" />
<xs:element name="homepage" type="xs:anyURI" />
<xs:element name="bdate" type="date" />
<xs:element name="dept" type="xs:string" minOccurs="0" />
</xs:sequence>
</xs:complexType>
<!--
the extension element indicates the use of inheritance to create a new type based on
an existing simple or complex type
-->
<!-- create type ChiefExecutive -->
<xs:complexType name="ChiefExecutive">
<xs:complexContent>
<xs:extension base="Employee"> <!-- ChiefExecutive is derived from the abstract type Employee -->
<xs:sequence>
<xs:element name="compensation" type="xs:double" />
</xs:sequence>
</xs:extension>
</xs:complexContent>
</xs:complexType>
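Because <xs:extension> appends the new sequence to the inherited one, an element of type ChiefExecutive must contain Employee's child elements first and then <compensation>. A <ceo> element might therefore look like the fragment below; the name, addresses, date, and figure are invented for illustration, and <dept> could be left out because it is declared with minOccurs="0".

```xml
<ceo>
  <!-- elements inherited from Employee, in the declared order -->
  <name>Jane Doe</name>
  <email>mailto:jdoe@example.com</email>
  <homepage>http://www.example.com/~jdoe/</homepage>
  <bdate>01-15-1960</bdate>
  <dept>Executive Office</dept>
  <!-- element added by the ChiefExecutive extension -->
  <compensation>250000.00</compensation>
</ceo>
```

Placing <compensation> before <name>, or omitting <bdate>, would make the element invalid, since the combined content model is a single ordered sequence.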
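The complex type ChiefExecutive and the other three complex types derived from the abstract type Employee are associated with particular elements in the declaration of the complex type of the root <directory> element. As you can see, each child element of <directory> has one of the complex types derived from Employee. Because the declarations appear in an <xs:sequence> and minOccurs defaults to 1, a valid <directory> contains exactly one <ceo>, followed by one or more <manager>, <salaryworker>, and <hourlyworker> elements, in that order.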
<xs:element name="directory">
<xs:complexType>
<xs:sequence>
<xs:element name="ceo" type="ChiefExecutive" minOccurs="1" />
<xs:element name="manager" type="Manager" maxOccurs="unbounded" />
<xs:element name="salaryworker" type="SalaryWorker" maxOccurs="unbounded" />
<xs:element name="hourlyworker" type="HourlyWorker" maxOccurs="unbounded" />
</xs:sequence>
</xs:complexType>
</xs:element>
In looking at the complex types declared in directory.xsd, you may notice that both the Manager and SalaryWorker complex types contain declarations for an element called <salary>. This shows that the simple type of a particular element can vary within a schema: the simple type of <salary> is float in Manager and int in SalaryWorker. This is a small example of a construct that would not be possible using DTDs, where each element name has exactly one declaration for the whole document type.
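The two <salary> declarations might look like the sketch below. Only the element name salary and the float and int types come from the description of directory.xsd; the rest of each type's content is not shown in this document, so each sketch simply extends Employee with a salary element.

```xml
<!-- hypothetical sketch: Manager adds a salary element of type float -->
<xs:complexType name="Manager">
  <xs:complexContent>
    <xs:extension base="Employee">
      <xs:sequence>
        <xs:element name="salary" type="xs:float" />
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>

<!-- hypothetical sketch: SalaryWorker adds a salary element of type int -->
<xs:complexType name="SalaryWorker">
  <xs:complexContent>
    <xs:extension base="Employee">
      <xs:sequence>
        <xs:element name="salary" type="xs:int" />
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
```

Each salary declaration is local to its complex type, so a value such as 52000.50 is valid inside a <manager> element but invalid inside a <salaryworker> element.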
For starters, visit the XML Coverpages section on XML Schema, then try some of the following: