Be warned that really understanding this document will require a good knowledge of XML and XSLT, although I have tried to make the explanation and examples as clear as possible.
Kevin has been experimenting with using XSLT to format simple XML output from Movable Type into a complete web page. Originally, he included the data for each weblog entry in a CDATA section containing literal XHTML, in much the same way as many RSS feeds. See the following example, reformatted for clarity:
<entry>
  <title>entry with images</title>
  <date>August 09, 2003</date>
  <author>Kevin</author>
  <idnum>000033</idnum>
  <permalink>http://alazanto.org/xml/archives/000033.xml</permalink>
  <body xmlns:html="http://www.w3.org/1999/xhtml"><![CDATA[<p><img class="archive" align="right"
    src="http://alazanto.org/images/sample.jpg"
    alt="photograph of a flower, just for show"/>Mauris felis elit, varius quis, pulvinar vel, sodales vehicula, mi. Nunc elementum pharetra elit. </p>]]>
  </body>
  <more xmlns:html="http://www.w3.org/1999/xhtml"><![CDATA[]]></more>
  <comment-link>http://alazanto.org/xml/archives/000033_comments.xml</comment-link>
  <comment-count>6</comment-count>
</entry>
The XML CDATA markup indicates that the data between <![CDATA[ and
]]> should not be interpreted as XML with elements and entity
references resolved. Instead, the data is included as a literal
string, exactly as if each < and & had been encoded as &lt; and
&amp; respectively. The result is a DOM tree in which <body> has no
element children, only text.
Note that in this DOM, the child text node of the
<body> element is just a string, with
no special meaning to an XML parser or an XSLT processor, even if it
looks to you like a paragraph from an XHTML document.
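To make the equivalence concrete, here is a small sketch with the paragraph text shortened. The <examples> wrapper is hypothetical and exists only so the fragment is well-formed. Both <body> elements should present the same string value to an XSLT processor, namely the characters <p>Mauris felis elit.</p>:

```xml
<examples>
  <!-- Form 1: markup protected by a CDATA section, as in Kevin's file -->
  <body><![CDATA[<p>Mauris felis elit.</p>]]></body>

  <!-- Form 2: the same characters written with entity references -->
  <body>&lt;p&gt;Mauris felis elit.&lt;/p&gt;</body>
</examples>
```

In neither form does <body> contain a <p> element. It contains text that happens to spell one out.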
We can write fairly simple XSLT templates to turn this XML into
XHTML for the browser. To include the literal XHTML in the result, we
can try the
disable-output-escaping attribute, with a template
something like this:
<xsl:template match="entry">
  <div class="entry">
    <h2><xsl:value-of select="title"/></h2>
    <xsl:value-of select="body" disable-output-escaping="yes"/>
  </div>
</xsl:template>
Without the disable-output-escaping attribute, the string value of
the <body> element would be written to the output so that it could be
read in again by another XML parser. In other words, each < would be
escaped as &lt;, and each & as &amp;. When processed in Internet
Explorer, or a stand-alone XSLT processor, the
disable-output-escaping attribute disables this escaping step, so
that the text child of the <body> node is included literally in the
output file as shown below, which is what Kevin expected:
<div class="entry">
  <h2>entry with images</h2>
  <p><img class="archive" align="right"
    src="http://alazanto.org/images/sample.jpg"
    alt="photograph of a flower, just for show"/>Mauris felis elit, varius quis, pulvinar vel, sodales vehicula, mi. Nunc elementum pharetra elit... </p>
</div>
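For contrast, here is a sketch of what the same template would be expected to write if the disable-output-escaping attribute were left off. This is an illustration, not captured output, and the details can vary between processors (some may write > literally and not as &gt;):

```xml
<!-- Expected shape of the output with normal escaping in effect:
     the markup from <body> arrives as escaped text, not as elements. -->
<div class="entry">
  <h2>entry with images</h2>
  &lt;p&gt;&lt;img class="archive" align="right" src="http://alazanto.org/images/sample.jpg" alt="photograph of a flower, just for show"/&gt;Mauris felis elit, varius quis, pulvinar vel, sodales vehicula, mi. Nunc elementum pharetra elit. &lt;/p&gt;
</div>
```

A browser reading this would show the tags on screen as text, because the <div> contains only an <h2> element and a run of characters.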
The problem occurs when trying to use the same templates in
Mozilla. The Mozilla XSLT processor doesn’t support
disable-output-escaping, since it transforms directly
from the source DOM to a destination DOM tree, without an output step
in which to disable escaping. The DOM that Mozilla constructs is
quite predictable, but not what Kevin wanted: the <div> holds the
<h2> element followed by a single text node containing the markup
as a string.
This means that Mozilla displays the markup to the user, complete with <p> and <img> tags, instead of the paragraph text with a floating image. Mozilla bug 98168 is about this behaviour, and comment 11 states quite clearly that it is expected and will not be changed.
The solution for Kevin is to create the original XML file without enclosing the paragraph in a CDATA section, making the image and paragraph tags real elements in the source XML DOM, so that they can be copied directly to the destination XHTML DOM. This small change to the source XML gives us a very different source DOM tree, in which <p> is an element child of <body> and <img> is an element child of <p>.
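As a sketch of that change, the entry might look something like the fragment below. Only <title> and <body> are shown, and dropping the xmlns:html declaration is my assumption; Kevin's actual file may differ.

```xml
<!-- Sketch of the revised source: the CDATA wrapper is gone, so <p> and
     <img> are parsed as real elements. Other children of <entry> omitted. -->
<entry>
  <title>entry with images</title>
  <body>
    <p><img class="archive" align="right"
            src="http://alazanto.org/images/sample.jpg"
            alt="photograph of a flower, just for show"/>Mauris felis elit, varius quis, pulvinar vel, sodales vehicula, mi. Nunc elementum pharetra elit. </p>
  </body>
</entry>
```

The trade-off is that the body must now be well-formed XML: every tag closed, empty elements written like <img .../>, and any literal & in the text written as &amp;. A CDATA section would have tolerated sloppier markup.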
With this input, the XSLT to copy the nodes can be just as simple,
using xsl:copy-of to copy all the elements under the <body> element,
but not the element itself:
<xsl:template match="entry">
  <div class="entry">
    <h2><xsl:value-of select="title"/></h2>
    <xsl:copy-of select="body/*"/>
  </div>
</xsl:template>
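One thing to watch with body/* is that it selects element children only, so any text placed directly inside <body>, outside a <p>, would be dropped. If that matters, a variant along these lines copies text nodes and comments as well. This is a sketch of my own, not something taken from Kevin's stylesheet:

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="entry">
    <div class="entry">
      <h2><xsl:value-of select="title"/></h2>
      <!-- node() selects element, text, comment and processing-instruction
           children; * selects element children only -->
      <xsl:copy-of select="body/node()"/>
    </div>
  </xsl:template>
</xsl:stylesheet>
```

For entries whose body consists entirely of paragraphs, as in the example above, the two select expressions should give the same visible result apart from whitespace between elements.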
Because the template copies elements instead of literal XHTML source code, Mozilla builds the page from the correct DOM tree, and the same stylesheet works just as well with Internet Explorer and external XSLT processors. The result DOM looks very similar to the input DOM: the <p> and <img> elements sit under the <div> just as they sat under <body>.
You can see the result of this in Kevin’s example XML weblog, in any web browser that supports XSLT.
(Finally, thanks to Kevin for using the
<xsl:copy-of> element, which I’d managed to miss in
four years of reading the XSLT spec.)