Introduction to XML Schema

Kevin Reiss

Last Update: 7/27/03

Background: Why XML Schema?

A number of schema languages have been developed to extend the constructs of DTDs and to allow additional validity constraints, beyond those DTDs provide, to be placed on XML document instances. The schema language that has gained the most popularity is the W3C XML Schema language, developed by the World Wide Web Consortium (W3C), and it is the one the examples in this short tutorial discuss. It has seen the most use in data-centric applications.

The best general place to start searching for information on XML schemas and their applications is the XML Coverpages. For additional information, see the final section of this document.

Schemas versus DTDs

Similarities

XML Schemas and DTDs both support:

  • Element Nesting
  • Element Occurrence Constraints
  • Attribute Types and Defaults

Differences

XML Schemas offer support for the following constructs not available in DTDs:

  • The ability to express your schema as a well-formed XML document instance.
  • More powerful element occurrence constraints
  • Enforcement of datatypes on element and attribute content. Example: element or attribute values can be restricted to contain only integer values, so that an error is generated if <age>twenty</age> appears instead of <age>20</age>.
  • The ability to define your own datatypes
  • The creation of namespace-aware element and attribute declarations
  • More powerful mixed content models
  • More powerful features for abstraction/indirection. Example: W3C XML Schema's support for inheritance and the definition of user-defined simple and complex types.

XML in a Nutshell, Chapter 16
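
To make two of these differences concrete, here is a minimal sketch of a datatype restriction and a numeric occurrence constraint. The element names <person>, <age>, and <phone> are illustrative assumptions and do not come from this tutorial's example files.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- hypothetical element, used only to illustrate the two constraints -->
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <!-- datatype enforcement: <age>20</age> is valid, <age>twenty</age> is not -->
        <xs:element name="age" type="xs:integer"/>
        <!-- occurrence constraint: between one and three <phone> elements -->
        <xs:element name="phone" type="xs:string" minOccurs="1" maxOccurs="3"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

A DTD can say "one or more" with +, but a bound such as "at most three" has to be spelled out by repeating the element in the content model.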

It is interesting to note that, other than the five predefined general entities of the XML 1.0 specification (&lt;, &gt;, &amp;, &apos;, and &quot;), XML schema languages do not support any of the entity types that one commonly uses in a DTD. In document instances constructed using a schema, decimal or hexadecimal Unicode character references should be used to include characters outside of ASCII that would typically have been escaped using general entities. The parameter entities that one uses to represent particular attribute or element classes in a DTD are replaced by the more powerful constructs that schema languages provide for indirection and abstraction. We'll go over how W3C XML Schema handles these concepts.
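
As a small illustration of the point about characters outside of ASCII, the fragment below writes é (U+00E9) without relying on a DTD-declared entity such as &eacute;. The <names> and <name> elements and their content are made up for this sketch.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- hypothetical instance: both references below resolve to the same character, U+00E9 -->
<names>
  <name>Ren&#233;</name>   <!-- decimal character reference -->
  <name>Ren&#xE9;</name>   <!-- hexadecimal character reference -->
</names>
```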

Associating an Instance with a particular W3C XML Schema

W3C XML Schema files take the file extension .xsd. An instance can be associated with a schema through a namespace that the schema defines, or directly with a locally stored schema file. Because schema declarations for namespaces are complicated and cumbersome, and also unreliable because one must always have net access to validate against the namespace schema, I'll discuss only how to associate an instance with a particular .xsd file. Consult XML in a Nutshell or the W3C site for information on how to declare a namespace for your schema.

One associates a document instance with a schema by declaring a special attribute on the root element of the document instance. To associate an instance with a particular schema file, and not a namespace, use the xsi:noNamespaceSchemaLocation attribute, as in the document instance below:

       <?xml version="1.0" encoding="UTF-8"?>
       <greeting xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:noNamespaceSchemaLocation="greeting.xsd" lang="en">
         <message type="AM">Hello Schema World</message>
         <date>10-12-2002</date>
       </greeting>
    

Complex and Simple Types

To understand W3C XML Schema one must learn the concepts of complex and simple types:

Simple Type
A simple type describes an element or attribute value that can contain only a basic type of data. The types that W3C XML Schema provides are comparable to what one can assign to a variable in a programming language: an integer, a string, a value matching a regular expression pattern, and so on. The equivalent DTD construct would be any attribute value, or any element whose content model can contain only parsed character data (#PCDATA). The attribute lang in the greeting.xml instance displayed below is an example of a simple type.
Complex Type
A complex type describes an element that can carry attributes and/or has a content model that contains other elements. Complex types can also be used to define reusable content models, in a fashion similar to the way parameter entities are used in DTDs. Complex types can also be used for inheritance, as demonstrated in example two, directory.xsd. The root element <greeting> in the greeting.xml instance displayed below is an example of a complex type.
       <?xml version="1.0" encoding="UTF-8"?>
       <!-- element greeting is a complex type -->
       <greeting xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:noNamespaceSchemaLocation="greeting.xsd" lang="en">  <!-- attribute lang is a simple type -->
         <message type="AM">Hello Schema World</message>   
         <date>10-12-2002</date>
       </greeting>
    

Using a Simple Type

Simple Types

An important feature most schema languages provide is the ability to place stronger restrictions than those provided by DTDs on the actual content that can occur within a particular element or attribute value. W3C XML Schema defines a number of built-in simple datatypes that can be used to restrict the contents of elements and attribute values; see the W3C Schema specification for a full list of the built-in simple types. These built-in types can also serve as the base for one's own custom simple types through inheritance. The following short example illustrates the use of simple types, complex types, and the creation of one's own custom simple types.

Example One Documents:

Declaring a built-in Simple Type

The declaration below describes an attribute, "lang", whose value must be of the built-in simple type language. Legal values are language identifiers such as en or fr-CA, which build on the ISO 639 language codes.

  <xs:attribute name="lang" use="required" type="xs:language"/>
       

Declaring a custom Simple Type

The following declaration creates a new simple type derived from a W3C built-in simple type, string. The date simple type defined below inherits from the built-in type by restricting it. W3C XML Schema provides the ability to restrict string content to a particular pattern defined by a regular expression, like those available in most programming languages. The date custom simple type uses this facility to restrict its contents to a common notation for a day of the year: two digits, two digits, and four digits, separated by hyphens, as in 10-12-2002. We'll talk more about inheritance in example two.

  <!-- define the date pattern simple type -->
  <xs:simpleType name="date">
    <xs:restriction base="xs:string">
      <xs:pattern value="\d\d-\d\d-\d\d\d\d"/>
    </xs:restriction>
  </xs:simpleType>
       

The declaration of the element <date> whose type is the custom simple type date is as follows:

   <xs:element name="date" type="date" />
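
To show what the pattern does and does not check, here are a few <date> values as they might appear in an instance. The values are invented for illustration. Note that the pattern must match the whole value, and that it constrains only the positions of digits and hyphens, not whether the result is a real calendar date.

```xml
<!-- hypothetical <date> values checked against \d\d-\d\d-\d\d\d\d -->
<examples>
  <date>10-12-2002</date>   <!-- accepted: matches the pattern -->
  <date>10-12-02</date>     <!-- rejected: the year has only two digits -->
  <date>10/12/2002</date>   <!-- rejected: slashes instead of hyphens -->
  <date>99-99-9999</date>   <!-- accepted: the pattern checks shape only, not calendar validity -->
</examples>
```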
   

Declaring a Complex Type

An element's complex type is specified in that element's declaration, in one of two forms: (1) a nested <xs:complexType> element contains the definition of the complex type, as in the greeting.xsd example; or (2) the type attribute of the <xs:element> element references, by name, a complex type declared elsewhere in the schema, as in the directory.xsd example. The declaration of the <message> element in our greeting schema is an example of form (1).

 <xs:element name="message">
    <xs:complexType mixed="true">
      <xs:attribute name="type" use="required" type="timeOfDay" />
    </xs:complexType>
  </xs:element>

Both the example schemas discussed in this document, greeting.xsd and directory.xsd, contain several examples of the definition and usage of simple and complex types in W3C XML Schema.
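
Putting the pieces from this section together, a complete schema for the greeting.xml instance might look roughly like the sketch below. This is a reconstruction from the fragments shown above, not the actual greeting.xsd file. In particular, the timeOfDay type is never shown in this tutorial, so its definition as an enumeration of AM and PM is an assumption.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- root element: a complex type with two child elements and one attribute -->
  <xs:element name="greeting">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="message"/>
        <xs:element ref="date"/>
      </xs:sequence>
      <xs:attribute name="lang" use="required" type="xs:language"/>
    </xs:complexType>
  </xs:element>

  <!-- text content plus a required type attribute -->
  <xs:element name="message">
    <xs:complexType mixed="true">
      <xs:attribute name="type" use="required" type="timeOfDay"/>
    </xs:complexType>
  </xs:element>

  <xs:element name="date" type="date"/>

  <!-- custom simple type: the date pattern from the tutorial -->
  <xs:simpleType name="date">
    <xs:restriction base="xs:string">
      <xs:pattern value="\d\d-\d\d-\d\d\d\d"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- ASSUMPTION: timeOfDay is not defined anywhere in the tutorial text -->
  <xs:simpleType name="timeOfDay">
    <xs:restriction base="xs:string">
      <xs:enumeration value="AM"/>
      <xs:enumeration value="PM"/>
    </xs:restriction>
  </xs:simpleType>

</xs:schema>
```

The xs prefix is bound to the W3C XML Schema namespace on the root <xs:schema> element, which is what every xs: name in the fragments above relies on.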

Validating a Document Instance with a W3C XML Schema

To parse a document instance associated with a W3C XML Schema on the classroom servers try:

classrm01:~/public_html/demos/schema 625 $ java dom.Writer -v -s greeting.xml
         

This command calls Xerces, a validating schema parser provided by the Apache project. If your file is valid, you should receive output that represents a formatted version of your document instance.

Complex Types and Inheritance

The second schema example demonstrates the W3C XML Schema constructs that use abstraction to let one define custom reusable complex types, and shows how inheritance among these types can simplify the organization of a schema.

Example Two Documents:

Declaring an Abstract Complex Type

This example is a schema for a company directory. It defines one abstract complex type, Employee. An abstract type cannot be used directly for an element in a document instance; its only purpose is to serve as a model from which other complex types can be derived. The Employee abstract type serves as the parent type of four other complex types: ChiefExecutive, Manager, SalaryWorker, and HourlyWorker. These four types are then used as the types of the elements that occur within the root element <directory>.

Types can be derived in W3C XML Schema using the <xs:extension> element. The base attribute of <xs:extension> names the complex type that is the parent of the current complex type. The following example shows the abstract complex type, Employee, and a complex type derived from it, ChiefExecutive.

   <!-- employee complex type abstract type -->
   <xs:complexType name="Employee" abstract="true">
     <xs:sequence>
      <xs:element name="name" type="xs:string" />
      <xs:element name="email" type="xs:anyURI" />
      <xs:element name="homepage" type="xs:anyURI" />
      <xs:element name="bdate" type="date" />
      <xs:element name="dept" type="xs:string" minOccurs="0" />
     </xs:sequence>
   </xs:complexType>
  
   <!-- 
           the extension element indicates the use of inheritance to create a new type based on
           an existing simple or complex type 
  -->

   <!-- create type ChiefExecutive -->
   <xs:complexType name="ChiefExecutive">
     <xs:complexContent>
      <xs:extension base="Employee">  <!-- ChiefExecutive is derived from the abstract type Employee -->
       <xs:sequence>
        <xs:element name="compensation" type="xs:double" />
       </xs:sequence>
      </xs:extension>
    </xs:complexContent>
   </xs:complexType>

Next, the complex type ChiefExecutive and the other three complex types derived from the abstract type Employee are associated with particular elements, in the declaration of the complex type of the root <directory> element. As you can see, each child element of <directory> takes one of the complex types derived from Employee.

   
   <xs:element name="directory">
    <xs:complexType>
     <xs:sequence>
      <xs:element name="ceo" type="ChiefExecutive" minOccurs="1" />
      <xs:element name="manager" type="Manager" maxOccurs="unbounded" /> 
      <xs:element name="salaryworker" type="SalaryWorker" maxOccurs="unbounded" />
      <xs:element name="hourlyworker" type="HourlyWorker" maxOccurs="unbounded" /> 
     </xs:sequence>  
    </xs:complexType>
   </xs:element> 
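
To see what extension means for a document instance, here is a sketch of a <ceo> element that would fit the ChiefExecutive type: the children inherited from Employee come first, in the order the base type declares them, followed by the <compensation> element that the extension adds. All values are invented, and <bdate> assumes the same date pattern type used in example one.

```xml
<!-- hypothetical fragment of a directory instance; all values are invented -->
<ceo>
  <!-- inherited from Employee, in the order of the base type's sequence -->
  <name>Pat Example</name>
  <email>mailto:pat@example.com</email>
  <homepage>http://www.example.com/~pat</homepage>
  <bdate>01-15-1960</bdate>
  <dept>Administration</dept>   <!-- optional: declared with minOccurs="0" -->
  <!-- added by the ChiefExecutive extension -->
  <compensation>250000.00</compensation>
</ceo>
```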
  

Looking at the complex types declared in directory.xsd, you may notice that both the Manager and SalaryWorker complex types contain declarations for an element called <salary>. This shows that the simple type of an element with a given name can vary within a schema: the simple type of <salary> in Manager is float, and in SalaryWorker it is int. This is a small example of a construct that is not possible using DTDs.
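
The full directory.xsd file is not reproduced here, but based on the description above, the two types might be declared along these lines. Only the <salary> element and its two types (float and int) come from the text; the rest of each definition mirrors ChiefExecutive and is an assumption.

```xml
<!-- sketch only: fragments meant to sit inside the <xs:schema> element of directory.xsd -->
<xs:complexType name="Manager">
  <xs:complexContent>
    <xs:extension base="Employee">
      <xs:sequence>
        <!-- for managers, salary is a float -->
        <xs:element name="salary" type="xs:float" />
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>

<xs:complexType name="SalaryWorker">
  <xs:complexContent>
    <xs:extension base="Employee">
      <xs:sequence>
        <!-- same element name, different simple type: here salary is an int -->
        <xs:element name="salary" type="xs:int" />
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
```

Because each <salary> declaration is local to its complex type, the two can coexist in one schema. A DTD allows only one declaration per element name.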

Where to Learn More

For starters, visit the XML Coverpages section on XML Schema, then try some of the following:

Info on W3C Schemas

Other XML Schema Languages

Schema Tools

Sources

  1. Elliotte Rusty Harold and W. Scott Means. XML in a Nutshell, 2nd Edition. O'Reilly & Associates, 2002. ISBN 0-596-00292-0.

Example Files