XML Does It Your Way! Part (I)


So tell me about it!

If you visit many development-related Web sites, you're probably at least aware that XML is one of the latest Web technologies. XML (Extensible Markup Language) is a simplified subset of SGML, the parent language of HTML, that lets you create your own custom tags and elements. The tags themselves provide information about the content within the tags. XML is a great way to separate data from style, and even from HTML itself.

An XML page is handled by an XML-capable browser, which uses an XML parser to read the page and then looks up how to style it in a separate style sheet written in XSL. The browser then displays the XML data in the format that the XSL style sheet specifies. Sound simple? Yeah, that's what I thought. But fear not, we'll walk you through the steps and point you in the right direction.

XML is similar to HTML in many ways. Essentially, XML holds the data for a page in between tags which look very similar to HTML. If you've looked at the source for an HTML document--and I must assume you have or you probably wouldn't be on this site--then you've undoubtedly seen the TAGS that surround the text which is eventually presented in the Web browser. XML also uses TAGS to surround data, only you get to "invent" some of the tags! Additionally, many of the markup tags used in HTML utilize both start tags and end tags, such as:

<B>This text is bold.</B>

XML is similar, except that EVERY element in XML must be closed: each start tag needs a matching end tag (an element with no content may use the shorthand <TAG/> instead). The tags are used to define the data they contain. In XML the term "element" is used to describe the tags (both start and end tags) and the data they contain as one unit. In this case I have defined a TOPIC element:

<TOPIC>The topic of this article is XML</TOPIC>
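To see the "element as one unit" idea in action, here is a minimal sketch that hands the TOPIC element to a parser and reads back its tag name and its data. Python and its standard-library ElementTree parser are assumptions made for illustration; they are not among the parsers this article discusses.

```python
import xml.etree.ElementTree as ET

# Parse the TOPIC element from the article into a single element object.
element = ET.fromstring("<TOPIC>The topic of this article is XML</TOPIC>")

# The tag name and the enclosed data travel together as one unit.
print(element.tag)
print(element.text)
```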

The use of nesting tags also occurs in both HTML and XML. However, most browsers will display the correct formatting even using improperly nested HTML tags, such as:

<B><I>This is bold, italic text</B></I>

To use the tags in an XML document, they must be properly nested, like this:

<B><I>This is bold, italic text</I></B>
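A browser may forgive the badly nested HTML, but an XML parser will not. The sketch below feeds both versions to a parser and reports whether each is accepted. It assumes Python's standard-library ElementTree parser, and the helper name is_well_formed is made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the XML parser accepts the markup, False if it rejects it."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


improper = "<B><I>This is bold, italic text</B></I>"  # end tags cross each other
proper = "<B><I>This is bold, italic text</I></B>"    # inner element closed first

print(is_well_formed(improper))
print(is_well_formed(proper))
```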

XML's nesting follows the logic of the data. For instance, suppose our first tag is "FOOD." Our next tag could be "FRUIT," and under that, "ROUND." Under that, let's create "ORANGE" and "TANGERINE." Even without knowing anything about XML, you'd likely realize that each is nested inside the other: a fruit is a food, and some fruit is round, including oranges and tangerines. Thus, the tags would nest as follows:

<FOOD>
  <FRUIT>
    <ROUND>
      <ORANGE>This is an orange.</ORANGE>
      <TANGERINE>This is a tangerine.</TANGERINE>
    </ROUND>
  </FRUIT>
</FOOD>

(Special thanks to Ken Balderrama for pointing out a more compact method of nesting XML tags.)

You could at this point add an OBLONG and a SMALL tag under FRUIT, with WATERMELON and SQUASH tags under OBLONG, and GRAPE and PLUM tags under SMALL, etc.
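The nesting can also be explored from code. The following sketch parses the FOOD example, prints an indented outline of its elements, and then adds an OBLONG branch under FRUIT as suggested above. Python's ElementTree is an assumption here, the outline helper is hypothetical, and the watermelon text is invented for the example.

```python
import xml.etree.ElementTree as ET

FOOD_XML = """<FOOD>
  <FRUIT>
    <ROUND>
      <ORANGE>This is an orange.</ORANGE>
      <TANGERINE>This is a tangerine.</TANGERINE>
    </ROUND>
  </FRUIT>
</FOOD>"""


def outline(element, depth=0):
    """Return one indented line per element, mirroring the nesting."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(FOOD_XML)
tree_lines = outline(root)
print("\n".join(tree_lines))

# Add an OBLONG element under FRUIT, with a WATERMELON element inside it.
fruit = root.find("FRUIT")
oblong = ET.SubElement(fruit, "OBLONG")
ET.SubElement(oblong, "WATERMELON").text = "This is a watermelon."
extended_lines = outline(root)
print("\n".join(extended_lines))
```

Each level of indentation in the printed outline corresponds to one level of nesting in the document.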

While we're talking about XML structure, we're going to discuss the way XML SHOULD be developed in order to work, and the way it actually DOES work. Every valid XML document should start with a statement which tells the browser which language and version is being used:

<?xml version="1.0"?>

After that statement, each XML document should either have its own DTD contained inline--in the same document:

<!DOCTYPE FOOD [
<!ENTITY WD "WebDeveloper">
<!ELEMENT FOOD (FRUIT)*>
<!ELEMENT FRUIT (ROUND)>
<!ELEMENT ROUND (ORANGE,TANGERINE)>
<!ELEMENT ORANGE (#PCDATA)>
<!ELEMENT TANGERINE (#PCDATA)>
]>

or it should point to a DTD kept in a separate file:

<!DOCTYPE FOOD SYSTEM "food.dtd">

The DTD tells the parser which tags are allowed and which tags may be nested inside which others, and it may also declare entities and contain conditional sections, etc. Our XML example above (the FOOD nest) is well-formed XML, meaning its tags are properly closed and nested, but it is not valid without a DTD (or a link to one) to check it against.

Conveniently, Microsoft has been nice enough to include a Java XML parser within the browser, and if that weren't enough, they've also got a small ActiveX component which may be used for the same purpose. An alert reader, Steve Yost, pointed out to us that the Microsoft Java parser may also be used with Netscape Navigator, and even provided us with a Netscape example which uses the MS Java parser.

While the ActiveX parser does not utilize the DTD, the Java parser is a validating parser which can utilize the DTD if you so desire. Additionally, other third-party parsers such as LARK may be used instead of the MS parser.
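The difference between a validating and a non-validating parser is easy to see in practice. The sketch below uses Python's standard-library ElementTree parser, which is an assumption for illustration and not one of the parsers named above; like the ActiveX parser described here, it does not validate against the DTD. The document is modeled on the FOOD example, with a DTD whose element names match it, and it deliberately leaves out the TANGERINE element that the DTD requires inside ROUND.

```python
import xml.etree.ElementTree as ET

# The DTD requires ROUND to contain ORANGE followed by TANGERINE,
# but this document omits TANGERINE on purpose.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE FOOD [
<!ENTITY WD "WebDeveloper">
<!ELEMENT FOOD (FRUIT)*>
<!ELEMENT FRUIT (ROUND)>
<!ELEMENT ROUND (ORANGE,TANGERINE)>
<!ELEMENT ORANGE (#PCDATA)>
<!ELEMENT TANGERINE (#PCDATA)>
]>
<FOOD>
  <FRUIT>
    <ROUND>
      <ORANGE>An orange from &WD;</ORANGE>
    </ROUND>
  </FRUIT>
</FOOD>"""

# A non-validating parser only checks well-formedness, so this succeeds
# even though the document breaks the DTD's rule for ROUND.
root = ET.fromstring(DOCUMENT)

orange = root.find("FRUIT/ROUND/ORANGE")
tangerine = root.find("FRUIT/ROUND/TANGERINE")

print(orange.text)   # the WD entity declared in the DTD is still expanded
print(tangerine)     # no TANGERINE element exists, and the parser did not object
```

A validating parser given the same document would be expected to report the missing TANGERINE element as an error.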

From Webdeveloper.com