What is XML?
XML (eXtensible Markup Language) is a method for defining structure in documents. The philosophy behind XML is that the information (text, images, other parts) of a document can be identified through a set of rules. With these rules, a variety of software applications (like Web browsers) can interpret, display, or process data in documents.
XML is a subset [1] of Standard Generalized Markup Language (SGML) [2], a language used to define markup languages (one such language defined by SGML is HTML). XML, similar to HTML, was created to specifically address the issue of writing documents for the Web. And as in HTML, XML authors use elements bracketed by open and close tags. But unlike HTML, XML authors are not stuck with a fixed set of elements and entities. With XML, you can create your own elements, entities, and structural relationships to be used in your documents.
When you write documents in XML, you write text files containing elements and entities that mark up your content. But an XML document by itself cannot be displayed in a Web browser. You need to identify a Document Type Definition (DTD) that defines the relationships among your XML elements and entities, along with a specification for how the information in the document should appear in a browser.
The DTD defines what is called a "grammar" for your documents--the list of elements and entities as well as their structural relationships. You can reference DTDs that exist somewhere on the Web, or you can write the information that would be in the DTD right in your XML document.
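As a rough sketch of the second option, the Python snippet below writes a small DTD right inside a document and parses it with the standard library's xml.etree.ElementTree module. The element names (memo, to, body) and the org entity are made up for this illustration. Keep in mind that ElementTree only checks that the document is well formed and expands the internal entity; it does not validate the document against the grammar, so a validating parser would be needed for that step.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the elements (memo, to, body) and the entity
# (&org;) are invented for this example, not taken from any real DTD.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE memo [
  <!ELEMENT memo (to, body)>
  <!ELEMENT to   (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
  <!ENTITY  org  "Example Corp">
]>
<memo>
  <to>All staff</to>
  <body>Welcome to &org;.</body>
</memo>
"""

# The parser reads the internal DTD subset and expands &org;, but it does
# not check that <memo> really contains (to, body) in that order.
root = ET.fromstring(DOCUMENT)

print(root.tag)                # name of the document element
print(root.find("to").text)    # content of the <to> element
print(root.find("body").text)  # content of <body>, with &org; expanded
```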
When a user employs some software application (such as a Web browser) to view your XML documents, the elements and entities that are defined in the referenced DTDs are interpreted and displayed according to the behavior of a parser and the user's client software. You can define the look and feel of how the client software should display your document using a style sheet. Thus, the document's contents, the logical structure and element types possible in the document, and the appearance of the document, are all defined separately.
How to Put XML to Use?
You might wonder, though, why even use XML? It is a subset of SGML, so why not just use SGML? Well, SGML itself is fairly complex and contains many features that someone writing Web documents would probably not use. XML is a subset of SGML that is easier to use and customized to the kinds of information-sharing that occur on the Web.
Using XML, you will be able to define your own elements. This gives you a chance to create logical structure in your documents that is customized to your specific situation. This is helpful, because logical structures are often very context-dependent. For example, my concept of what a unit of thought is might not conform to the linear list of P elements available in HTML. Instead, I might want to have elements like SCENE, IMAGE, or PERSON in my documents to define thought structures that are in complex relationship with each other.
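To make this concrete, here is a minimal sketch in Python of a document that uses SCENE, IMAGE, and PERSON elements. The STORY wrapper element, the id and src attributes, and the sample names are my own assumptions for the example. Because the markup says what each piece of content is, a few lines of code can ask a structural question such as "who appears in which scene?"

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: STORY, the id/src attributes, and the names are
# assumptions made only for this example.
STORY = """<STORY>
  <SCENE id="s1">
    <PERSON>Ada</PERSON>
    <IMAGE src="harbor.png"/>
  </SCENE>
  <SCENE id="s2">
    <PERSON>Ada</PERSON>
    <PERSON>Grace</PERSON>
  </SCENE>
</STORY>"""

root = ET.fromstring(STORY)

# Map each scene id to the list of people marked up inside that scene.
people_by_scene = {
    scene.get("id"): [person.text for person in scene.findall("PERSON")]
    for scene in root.findall("SCENE")
}

# Collect the image files referenced anywhere in the story.
images = [image.get("src") for image in root.iter("IMAGE")]

print(people_by_scene)
print(images)
```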
XML's flexibility in defining the logical structure of documents makes it more useful in marking up complex databases or exchanging data among heterogeneous databases. With XML identifying the logical relationship among elements in a document, client applications can then be "smart" about what to do with them. Applications can thus be written which can dynamically provide different "views" of a document or exchange information about a document with other applications.
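The idea of different "views" can be sketched in a few lines of Python. The catalog markup below (catalog, item, name, price, and the sku attribute) is hypothetical, as are the two function names. One function turns the document into flat records of the kind another database could load, and the other builds a readable price list from the same markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog markup, invented for this example.
CATALOG = """<catalog>
  <item sku="A1"><name>Lamp</name><price currency="USD">20.00</price></item>
  <item sku="B2"><name>Desk</name><price currency="USD">150.00</price></item>
</catalog>"""


def as_records(xml_text):
    """View 1: flat records, e.g. for loading into another database."""
    root = ET.fromstring(xml_text)
    return [
        {
            "sku": item.get("sku"),
            "name": item.findtext("name"),
            "price": float(item.findtext("price")),
        }
        for item in root.findall("item")
    ]


def as_price_list(xml_text):
    """View 2: a human-readable price list built from the same markup."""
    return [f"{rec['name']}: {rec['price']:.2f}" for rec in as_records(xml_text)]


print(as_records(CATALOG))
print(as_price_list(CATALOG))
```

Neither view needs to know how the other one presents the data; both rely only on the logical structure that the markup identifies.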
Fans of "artificial agents" will be happy to know XML might benefit automated software programs that search the Web. Today, the lack of logical structure in many HTML documents has transformed the Web into a mix of text, full of elements and tags juxtaposed to look "cool," but often signifying nothing. Agent software can make use of industry-specific DTDs that convey meaning about the elements in a document.
XML also has the potential to improve hypertext as it is used on the Web. With HTML, you have the A element to create a one-way hard-coded link to another Net resource. With XML, the potential will exist to expand the kinds of linking possible to include other relationships, such as bidirectional links.
The style sheets used with XML have been defined by Cascading Style Sheets (CSS) [3], but the emphasis on style sheets for XML is on the emerging standard Extensible Stylesheet Language (XSL) [4, 5].
XML can be used to define application-specific DTDs that can then be shared by people working in the same field. Industry groups should use XML to define the logical structures that they would like to share in their documents. A DTD, for example, could be created by bibliographers to identify all the logical parts that could go into a bibliographic citation. Bibliographers can then create lists in the special bibliographical format that each publication requires.
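Here is a minimal sketch of the bibliography idea in Python. The element names (citation, author, last, first, and so on) are assumptions of mine rather than any published bibliographic DTD, and the two output layouts are invented stand-ins for the formats that different publications might require. The point is only that one marked-up citation can feed several formats.

```python
import xml.etree.ElementTree as ET

# Hypothetical citation markup; the element names are not from a real DTD.
CITATION = """<citation>
  <author><last>Bosak</last><first>Jon</first></author>
  <title>Four Myths about XML</title>
  <journal>IEEE Computer</journal>
  <volume>31</volume>
  <number>10</number>
  <year>1998</year>
  <pages>120-122</pages>
</citation>"""


def load_citation(xml_text):
    """Pull the logical parts of a citation into a plain dictionary."""
    root = ET.fromstring(xml_text)
    parts = {
        tag: root.findtext(tag)
        for tag in ("title", "journal", "volume", "number", "year", "pages")
    }
    parts["last"] = root.findtext("author/last")
    parts["first"] = root.findtext("author/first")
    return parts


def style_a(parts):
    """Invented layout A: author first, year in parentheses."""
    return ('{last}, {first}. "{title}." {journal} {volume}, '
            'no. {number} ({year}): {pages}.').format(**parts)


def style_b(parts):
    """Invented layout B: initial before surname, year at the end."""
    initial = parts["first"][0]
    return ('{initial}. {last}, "{title}," {journal}, vol. {volume}, '
            'no. {number}, pp. {pages}, {year}.').format(initial=initial, **parts)


parts = load_citation(CITATION)
print(style_a(parts))
print(style_b(parts))
```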
My opinion of XML?
XML helps address the issue of meaning in Web documents, something which has really been lost in the previous HTML standards and practices. XML rests on more than a decade of experience in SGML and therefore is a stable, international standard for which many tools already exist. Investing in XML development is a good idea and will benefit developers who want to create more meaningful Web documents.
Will XML eliminate the kind of diddling that HTML authors currently do to get their pages to "look right"? Probably not. Humans are not entirely logical, consistent, or complete. For databases, fixed information, and shared information, XML is wonderful. For expressive or personal documents, I think users will continue to enjoy fusing meaning and form in ways that defy logic. But XML is an improvement over what we have now with HTML as the dominant language of the Web. XML enables authors to separate style from substance, regardless of whether they want to play with the two at the same time.
[1] World Wide Web Consortium, Technical Report, W3C Recommendation 10-February-1998, "Extensible Markup Language (XML) 1.0," http://www.w3.org/TR/REC-xml.
[2] International Organization for Standardization, "Information processing -- Text and office systems -- Standard Generalized Markup Language (SGML)," ISO 8879:1986.
[3] World Wide Web Consortium, Technical Report, W3C Recommendation 12-May-1998, "Cascading Style Sheets, level 2 CSS2 Specification," http://www.w3.org/TR/REC-CSS2/.
[4] World Wide Web Consortium, Technical Report, W3C Working Draft 27 March 2000, "Extensible Stylesheet Language (XSL) Version 1.0," http://www.w3.org/TR/xsl/.
[5] Bosak, Jon, "Four Myths about XML," IEEE Computer (Vol. 31, No. 10, October 1998, pp. 120-122).
Information Sources for XML
- XML Cover Pages: This is a very extensive collection of information on XML standards, publications, software, support, events, and online sources of information, hosted by OASIS (Organization for the Advancement of Structured Information Standards). OASIS is a nonprofit, international consortium dedicated to accelerating the adoption of product-independent formats based on public standards. This site includes explanations of related open technologies including XSL, XSLT, XPath, XLink, XPointer, HyTime, DSSSL, CSS, SPDL, CGM, ISO-HTML, and others.
- W3C Extensible Markup Language (XML) Information: This is a set of information on XML provided by the World Wide Web Consortium (W3C). Here, you'll find links to working drafts of the XML specifications and reference information about the role of the W3C in XML use and development.
- December Communications, Inc. presentation chart on XML: This is the entry point to a set of charts that I use in presentations discussing XML. Includes some example syntax.