XML is often used to encode documentation for software and other technical information. Given this, it is somewhat surprising that no general-purpose documentation system for XML schemas has been widely adopted by XML developers. This project proposes to adopt the methodology of Knuth's Literate Programming (LitProg) [Knuth 1984] for XML schema construction. A LitProg system is built around a single source document (a web) that contains both the prose documentation of a piece of software and its source code. The LitProg environment processes the web in two steps: tangle, which produces the source code files, and weave, which produces typeset documentation.
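The two processing steps can be illustrated with a small sketch. It assumes a hypothetical web vocabulary (`<web>`, `<prose>`, and `<code file="...">`, none of which come from DSS or any existing LitProg tool): tangle gathers the code chunks for each output file in document order, while weave interleaves the prose with indented code listings. Plain text stands in for typeset output here.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical miniature web. The element names <web>, <prose>, <code> and the
# file attribute are illustrative assumptions, not a real LitProg tag set.
WEB = """<web>
  <prose>The root element must be called <em>note</em>.</prose>
  <code file="note.dtd">&lt;!ELEMENT note (to, body)&gt;</code>
  <prose>Both children hold plain text.</prose>
  <code file="note.dtd">&lt;!ELEMENT to (#PCDATA)&gt;
&lt;!ELEMENT body (#PCDATA)&gt;</code>
</web>"""


def tangle(web: ET.Element) -> dict[str, str]:
    """Collect <code> chunks per output file, in document order."""
    files: dict[str, list[str]] = {}
    for chunk in web.iter("code"):
        files.setdefault(chunk.get("file"), []).append(chunk.text or "")
    return {name: "\n".join(parts) + "\n" for name, parts in files.items()}


def weave(web: ET.Element) -> str:
    """Interleave prose and indented code listings as plain text."""
    blocks = []
    for node in web:
        if node.tag == "prose":
            # itertext() flattens inline markup such as <em> into plain text.
            blocks.append("".join(node.itertext()).strip())
        elif node.tag == "code":
            lines = (node.text or "").splitlines()
            listing = "\n".join("    " + line for line in lines)
            blocks.append(f"[{node.get('file')}]\n{listing}")
    return "\n\n".join(blocks)


root = ET.fromstring(WEB)

if __name__ == "__main__":
    print(weave(root))
    for name, text in tangle(root).items():
        print(f"--- {name} ---\n{text}", end="")
```

Because both outputs are derived from the same source, the DTD and its documentation cannot drift apart, which is the property LitProg is meant to guarantee.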
The proposed LitProg system, called DSS, will consist of three layers: a documentation layer, a syntax layer, and a semantic layer. The semantic layer will be implemented using the BECHAMEL [Dubin, et al 2003] system. BECHAMEL represents, in machine-readable form, the document semantics of XML/SGML document instances. Document semantics are the objects identified by a markup language, together with the properties of those objects and the relations between them. This new layer could give application developers a higher-level interface to an XML document than current models such as XPath, the DOM, or SAX provide.
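BECHAMEL itself expresses such semantics as Prolog rules; the Python sketch below only illustrates the general idea of turning markup into objects, properties, and relations. The vocabulary and both rules are assumptions chosen for illustration: a `lang` attribute is taken to apply to an element and all of its descendants unless overridden, and a `target` attribute is read as a relation between a note and the element it annotates. Queries are then phrased in terms of these facts rather than in terms of tree navigation.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical document; the vocabulary and both rules below are assumptions.
DOC = """<doc lang="en">
  <p id="p1">Hello.</p>
  <p id="p2" lang="fr">Bonjour.</p>
  <note id="n1" target="p2">A greeting.</note>
</doc>"""


def facts(root: ET.Element) -> set[tuple[str, str, str]]:
    """Derive (predicate, subject, object) triples from the markup.

    Assumed rules: a lang attribute applies to the element and all of its
    descendants unless overridden, and a target attribute names the element
    that a note annotates.
    """
    triples = set()

    def walk(elem: ET.Element, inherited_lang: str | None) -> None:
        lang = elem.get("lang", inherited_lang)
        eid = elem.get("id")
        if eid is not None:
            triples.add(("is_a", eid, elem.tag))
            if lang is not None:
                triples.add(("language", eid, lang))
            if elem.get("target") is not None:
                triples.add(("annotates", eid, elem.get("target")))
        for child in elem:
            walk(child, lang)

    walk(root, None)
    return triples


def objects(triples, predicate: str, subject: str) -> set[str]:
    """Answer a simple query: all objects related to subject by predicate."""
    return {o for p, s, o in triples if p == predicate and s == subject}


F = facts(ET.fromstring(DOC))

if __name__ == "__main__":
    # Which language is the passage annotated by n1 written in?
    for target in objects(F, "annotates", "n1"):
        print(target, objects(F, "language", target))
```

Note that `p1` never carries a `lang` attribute in the tree, yet the fact `("language", "p1", "en")` is available to queries; making such implicit properties explicit is the kind of service a semantic layer is meant to provide.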
There is a long, successful history of LitProg in the SGML/XML community. LitProg forms the basis for one of the most successful SGML/XML projects, the TEI (Text Encoding Initiative): the TEI Guidelines and DTD are both generated from the same ODD (One Document Does It All) source files. Michael Sperberg-McQueen's Sweb, which draws on experience in developing the ODD format, is a general-purpose LitProg system that uses SGML to encode a web [Sperberg-McQueen 1996]. Interestingly, the base tag set for an Sweb can be any SGML document type; the user need only incorporate a few LitProg-specific tags into the chosen tag set.
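The idea of layering a few LitProg tags over an arbitrary host tag set can be sketched in XML terms. In this sketch the LitProg tags live in a hypothetical namespace (`urn:example:litprog`) with hypothetical `scrap` and `ref` elements; these are not Sweb's actual tag names. The host elements (`article`, `title`, `para`) are ignored entirely by tangle, which only expands named scraps, recursively replacing each reference with the scrap it names.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace and tag names (scrap, ref) for the LitProg tags;
# Sweb's actual tag names may differ.
LP = "urn:example:litprog"
SCRAP, REF = f"{{{LP}}}scrap", f"{{{LP}}}ref"

# The host tag set (article, title, para) is arbitrary and ignored by tangle.
DOC = f"""<article xmlns:lp="{LP}">
  <title>A tiny schema</title>
  <para>The schema element wraps every declaration:</para>
  <lp:scrap name="schema">&lt;schema&gt;
<lp:ref name="decls"/>
&lt;/schema&gt;</lp:scrap>
  <para>So far there is only one declaration:</para>
  <lp:scrap name="decls">&lt;element name="note"/&gt;</lp:scrap>
</article>"""


def collect(root: ET.Element) -> dict:
    """Group scraps by name; same-named scraps are concatenated in order."""
    scraps = {}
    for scrap in root.iter(SCRAP):
        scraps.setdefault(scrap.get("name"), []).append(scrap)
    return scraps


def expand(name: str, scraps: dict, active: tuple = ()) -> str:
    """Tangle one scrap, recursively replacing each lp:ref with its scrap."""
    if name in active:
        raise ValueError(f"cyclic scrap reference: {name}")
    parts = []
    for scrap in scraps[name]:
        parts.append(scrap.text or "")
        for child in scrap:
            if child.tag == REF:
                parts.append(expand(child.get("name"), scraps, active + (name,)))
            # Text after a child element is stored in its tail, not in scrap.text.
            parts.append(child.tail or "")
    return "".join(parts)


if __name__ == "__main__":
    print(expand("schema", collect(ET.fromstring(DOC))))
```

Using a namespace is one way to keep the LitProg tags from colliding with the host vocabulary; because Sweb predates XML Namespaces and is SGML-based, it necessarily solves this differently.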
The high-quality, structured documentation produced for the TEI by the ODD system is unique among SGML/XML applications. Although there are a few other detailed approaches to schema documentation (such as the online guide to DocBook), the vast majority of applications rely on ad hoc or loosely structured comments within the DTD or schema. These comments are often run through documentation generators to document the elements and attributes; such generators are available in commercial XML editors such as XMLSpy and XMetaL, or as command-line utilities. The reference material they produce is similar to Javadoc output. Such documentation is usable as a quick reference, but it is inadequate for readers who need a detailed and thorough discussion of a content model within a schema.
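To make the limitation concrete, here is a minimal sketch of a comment-driven generator. It assumes one common convention, that a comment placed directly before an `<!ELEMENT>` declaration documents that element; it is not the algorithm used by XMLSpy, XMetaL, or any particular utility. The output is one Javadoc-like row per element, which is exactly why such reference material cannot carry an extended discussion of a content model.

```python
import re

# Sample DTD in which a comment directly before a declaration documents it.
# That pairing convention is an assumption; real tools use varied conventions.
DTD = """<!-- A short message. -->
<!ELEMENT note (to, body)>
<!-- Recipient of the
     message. -->
<!ELEMENT to (#PCDATA)>
<!ELEMENT body (#PCDATA)>
"""

# Optional comment (which may not contain "-->"), then an element declaration.
DECL = re.compile(
    r"(?:<!--((?:(?!-->).)*)-->\s*)?<!ELEMENT\s+([\w.:-]+)\s+([^>]*)>",
    re.S,
)


def quick_reference(dtd: str) -> list:
    """Return one (element, content model, description) row per declaration."""
    rows = []
    for match in DECL.finditer(dtd):
        comment, name, model = match.groups()
        # Collapse line breaks and indentation inside multi-line comments.
        doc = " ".join(comment.split()) if comment else "(undocumented)"
        rows.append((name, " ".join(model.split()), doc))
    return rows


if __name__ == "__main__":
    for name, model, doc in quick_reference(DTD):
        print(f"{name:6} {model:12} {doc}")
```

The negative lookahead keeps a comment that precedes some other declaration (an `<!ATTLIST>`, say) from being stretched across to the next element declaration.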
Throughout processing, both input and output should be well-formed XML, so that every stage of the three-layered system produces output that conforming XML software can digest. The only two exceptions to this XML-in/XML-out rule are the non-XML syntax of XML 1.0 DTDs and the current Prolog syntax of BECHAMEL; the core DSS schema will provide tags for wrapping DTD and Prolog declarations. For the semantic layer, an alternative to this approach will soon become available, as BECHAMEL will shortly be able to serialize rules as RDF. Figure 1 shows the input/output formats and processing steps, Figure 2 the modular structure of the schema for DSS source documents, Figure 3 a sample DSS source document, and Figure 4 a fragment of the DSS XML Schema.
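The wrapping approach can be sketched as follows. The wrapper elements `<dtd>` and `<prolog>`, the surrounding `<dss>`/`<section>` structure, and the Prolog rule with its predicates are all hypothetical stand-ins, since the actual DSS schema appears only in Figure 4. Wrapping each non-XML declaration in a CDATA section keeps the source well-formed; tangle writes the wrapped text out verbatim in its own syntax, while weave emits an XML fragment so that the documentation output also stays within the XML-in/XML-out rule.

```python
import xml.etree.ElementTree as ET

# Hypothetical DSS source. The wrapper elements <dtd> and <prolog> stand in for
# the DSS schema's real wrapper tags, which are not reproduced here; the Prolog
# rule and its predicates are likewise invented for illustration.
SOURCE = """<dss>
  <section>
    <title>Notes</title>
    <para>A note is addressed if it has a recipient.</para>
    <dtd><![CDATA[<!ELEMENT note (to, body)>]]></dtd>
    <prolog><![CDATA[addressed(Note) :- has_child(Note, to).]]></prolog>
  </section>
</dss>"""


def tangle(root: ET.Element) -> dict:
    """Write the wrapped non-XML declarations out verbatim, one file per syntax."""
    return {
        "schema.dtd": "\n".join(e.text or "" for e in root.iter("dtd")),
        "rules.pl": "\n".join(e.text or "" for e in root.iter("prolog")),
    }


def weave(root: ET.Element) -> str:
    """Render documentation as an XML (XHTML-like) fragment, keeping output XML."""
    out = ET.Element("div")
    for section in root.iter("section"):
        ET.SubElement(out, "h2").text = section.findtext("title")
        for child in section:
            if child.tag == "para":
                ET.SubElement(out, "p").text = child.text
            elif child.tag in ("dtd", "prolog"):
                # The serializer escapes "<" and "&", so the listing stays well-formed.
                ET.SubElement(out, "pre", {"class": child.tag}).text = child.text
    return ET.tostring(out, encoding="unicode")


root = ET.fromstring(SOURCE)

if __name__ == "__main__":
    print(weave(root))
    print(tangle(root))
```

Only the tangled `.dtd` and `.pl` files leave XML; once BECHAMEL can serialize rules as RDF, the Prolog wrapper could presumably be replaced by RDF/XML content.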
Figure 1: Processing steps
Figure 2: Modular DSS schema structure
An implementation using the tools and specifications listed under the headings Software and Syntax is planned. Please share your comments and suggestions. The project website is http://www.isrl.uiuc.edu/~kmreiss/projects/xmldoc
Figure 3: DSS document instance
Figure 4: DSS schema fragment
Thanks to Allen Renear and David Dubin for their advice and assistance on this project.
[Coates and Rendon 2002] Anthony B. Coates and Zarella Rendon. xmLP: a Literate Programming Tool for XML & Text. In B. T. Usdin and S. R. Newcomb, editors, Proceedings of Extreme Markup Languages 2002, Montreal, Canada, August 2002. Available online at http://xmlp.sourceforge.net/2002/extreme/index.html
[Dubin, et al 2003] D. Dubin, C. M. Sperberg-McQueen, A. Renear, and C. Huitfeldt. A Logic Programming Environment for Document Semantics and Inference. Literary and Linguistic Computing, 18(1):39-47, 2003. Available online at http://www3.oup.co.uk/litlin/hdb/Volume_18/Issue_01/180039.sgm.abs.html
[Knuth 1984] Donald Knuth. Literate Programming. The Computer Journal, 27(2):97-111, May 1984.
[Renear et al 2002] Allen Renear, David Dubin, and C. M. Sperberg-McQueen. Towards a Semantics for XML Markup. In Proceedings of the 2002 ACM Symposium on Document Engineering, pages 119-126. ACM Press, 2002. Available online at http://doi.acm.org/10.1145/585058.585081
[Sperberg-McQueen 1996] Michael Sperberg-McQueen. SWEB: an SGML Tag Set for Literate Programming. March 1996. Available online at http://www.w3.org/People/cmsmcq/1993/sweb.html
[Walsh 2002] Norman Walsh. Literate Programming in XML. October 15, 2002. Available online at http://nwalsh.com/docs/articles/xml2002/lp/paper.html