This document elaborates on my ideas about XSLT in response to Kevin Davis’s experiments with Movable Type at Alazanto.
Be warned that really understanding this document will require a good knowledge of XML and XSLT, although I have tried to make the explanation and examples as clear as possible.
The Problem
Kevin has been experimenting with using XSLT to format simple XML output from Movable Type into a complete web page. Originally, he included the data for each weblog entry in a CDATA section containing literal XHTML, in much the same way as many RSS feeds. See the following example, reformatted for clarity:
<entry>
<title>entry with images</title>
<date>August 09, 2003</date>
<author>Kevin</author>
<idnum>000033</idnum>
<permalink>http://alazanto.org/xml/archives/000033.xml</permalink>
<body xmlns:html="http://www.w3.org/1999/xhtml"><![CDATA[<p><img
class="archive" align="right" src="http://alazanto.org/images/sample.jpg"
alt="photograph of a flower, just for show"/>Mauris felis elit, varius
quis, pulvinar vel, sodales vehicula, mi. Nunc elementum pharetra elit.
</p>]]>
</body>
<more xmlns:html="http://www.w3.org/1999/xhtml"><![CDATA[]]></more>
<comment-link>http://alazanto.org/xml/archives/000033_comments.xml</comment-link>
<comment-count>6</comment-count>
</entry>
The XML CDATA markup indicates that the data between <![CDATA[ and ]]> should not be interpreted as XML, with elements and entity references resolved. Instead, the data is included as a literal string, exactly as if each <, > and & had been encoded as &lt;, &gt; and &amp; respectively. The result is a DOM tree like the following:
<entry>
  <title>
    'entry with images'
  <date>
    'August 09, 2003'
  <author>
    'Kevin'
  <idnum>
    '000033'
  <permalink>
    'http://alazanto.org/xml/archives/000033.xml'
  <body>
    '<p><img class="archive" align="right" src="http://alazanto.org/images/sample.jpg" alt="photograph of a flower, just for show"/>Mauris felis elit, varius quis, pulvinar vel, sodales vehicula, mi. Nunc elementum pharetra elit.</p>'
  <more>
  <comment-link>
    'http://alazanto.org/xml/archives/000033_comments.xml'
  <comment-count>
    '6'
Note that in this DOM, the child text node of the <body> element is just a string, with no special meaning to an XML parser or an XSLT processor, even if it looks to you like a paragraph from an XHTML document.
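To make this concrete, here is a minimal sketch in Python (not something Kevin's setup uses) that parses a cut-down <body> element with the standard library's ElementTree. The shortened paragraph text and the variable names are my own; the point is only that the CDATA form and the entity-escaped form give the parser the same thing: one string and no child elements.

```python
import xml.etree.ElementTree as ET

# The same paragraph, once inside a CDATA section and once with the
# markup characters escaped by hand (shortened sample text).
cdata_form = "<body><![CDATA[<p>Nunc elementum pharetra elit.</p>]]></body>"
escaped_form = "<body>&lt;p&gt;Nunc elementum pharetra elit.&lt;/p&gt;</body>"

cdata_body = ET.fromstring(cdata_form)
escaped_body = ET.fromstring(escaped_form)

# Both parse to a <body> element holding a single string of text.
print(repr(cdata_body.text))
print(cdata_body.text == escaped_body.text)

# Neither has any child elements: the "<p>" is just characters.
print(len(cdata_body), len(escaped_body))
```

Any conforming XML parser should behave the same way, which is why nothing downstream of the parser can tell that the string was meant to be XHTML.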
We can write fairly simple XSLT templates to turn this XML into XHTML for the browser. To include the literal XHTML in the result, we can try the XSLT disable-output-escaping attribute, with a template something like this:
<xsl:template match="entry">
  <div class="entry">
    <h2><xsl:value-of select="title"/></h2>
    <xsl:value-of select="body" disable-output-escaping="yes"/>
  </div>
</xsl:template>
Without the disable-output-escaping attribute, the string value of the <body> element would be written to the output so that it could be read in again by another XML parser. In other words, each < would be escaped as &lt;, each & as &amp;, and each > as &gt;.
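The default escaping step is easy to see outside XSLT as well. The following is a rough Python sketch, using ElementTree's serialiser as a stand-in for an XSLT processor's output stage; the sample string and variable names are invented for the illustration.

```python
import xml.etree.ElementTree as ET

# A text value that happens to contain markup characters.
body_text = "<p>Mauris & felis</p>"

# Build a result element and attach the string as its text node.
div = ET.Element("div")
div.text = body_text

# Serialising escapes <, > and & so that the output is well-formed XML.
serialized = ET.tostring(div, encoding="unicode")
print(serialized)

# Reading the output back in recovers exactly the original string.
round_tripped = ET.fromstring(serialized).text
print(round_tripped == body_text)
```

The escaping exists precisely so that this round trip works: another parser reading the output gets back the same text node, not a new <p> element.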
When the stylesheet is processed in Internet Explorer, or by a stand-alone XSLT processor, the disable-output-escaping attribute disables this escaping step, so that the text child of the <body> node is included literally in the output file as shown below, which is what Kevin expected:
<div class="entry">
<h2>entry with images</h2>
<p><img class="archive" align="right"
src="http://alazanto.org/images/sample.jpg"
alt="photograph of a flower, just for show"/>Mauris felis elit,
varius quis, pulvinar vel, sodales vehicula, mi. Nunc elementum
pharetra elit... </p>
</div>
The problem occurs when trying to use the same templates in Mozilla. The Mozilla XSLT processor doesn’t support disable-output-escaping, since it transforms directly from the source DOM to a destination DOM tree, without an output step in which to disable escaping. The DOM that Mozilla constructs is quite predictable, but not what Kevin wanted:
<div>
  @class='entry'
  <h2>
    'entry with images'
  '<p><img class="archive" align="right" src="http://alazanto.org/images/sample.jpg" alt="photograph of a flower, just for show"/>Mauris felis elit, varius quis, pulvinar vel, sodales vehicula, mi. Nunc elementum pharetra elit... </p>'
This means that Mozilla displays the markup to the user, complete with <p> and <img> tags, instead of the paragraph text with a floating image. Mozilla bug 98168 is about this behaviour, and comment 11 states quite clearly that it is expected and will not be changed.
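The difference between the two kinds of processor can be imitated in a few lines of Python. This is only an analogy, not a description of how Mozilla or Internet Explorer is implemented: the first half attaches the string straight to a result tree, the second half writes the string out unescaped and parses the output again. The shortened markup and the variable names are assumptions made for the sketch.

```python
import xml.etree.ElementTree as ET

# The string value of <body> in the CDATA version (shortened).
markup = '<p><img src="sample.jpg" alt="a flower"/>Mauris felis elit.</p>'

# Tree-to-tree, roughly what a DOM-to-DOM transform does:
# the string becomes a text node and is never parsed.
direct = ET.Element("div", {"class": "entry"})
direct.text = markup

# Serialise-then-parse, roughly what happens when escaping is disabled
# and the output is read by a browser: the string is parsed as markup.
unescaped_output = '<div class="entry">' + markup + "</div>"
reparsed = ET.fromstring(unescaped_output)

print(len(direct))    # child elements in the tree-to-tree result
print(len(reparsed))  # child elements after re-parsing the output
```

Only the second route ever runs the string through a parser, so only the second route ends up with real <p> and <img> elements.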
The Solution
The solution for Kevin is to create the original XML file without enclosing the paragraph in a CDATA section, making the image and paragraph tags real elements in the source XML DOM, so that they can be copied directly to the destination XHTML DOM. This small change to the source XML gives us a very different source DOM tree:
<entry>
  <title>
    'entry with images'
  <date>
    'August 09, 2003'
  <author>
    'Kevin'
  <idnum>
    '000033'
  <permalink>
    'http://alazanto.org/xml/archives/000033.xml'
  <body>
    <p>
      <img>
        @class='archive'
        @align='right'
        @src='http://alazanto.org/images/sample.jpg'
        @alt='photograph of a flower, just for show'
      'Mauris felis elit, varius quis, pulvinar vel, sodales vehicula, mi. Nunc elementum pharetra elit.'
  <more>
  <comment-link>
    'http://alazanto.org/xml/archives/000033_comments.xml'
  <comment-count>
    '6'
With this input, the XSLT to copy the nodes can be just as simple, using xsl:copy-of to copy all the elements under the source <body> element, but not the element itself:
<xsl:template match="entry">
  <div class="entry">
    <h2><xsl:value-of select="title"/></h2>
    <xsl:copy-of select="body/*"/>
  </div>
</xsl:template>
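For readers who think more easily in a general-purpose language, the template does roughly what this Python sketch does with ElementTree. It is an analogy, not a replacement for the stylesheet: render_entry is a hypothetical helper name, the source document is cut down to the two elements the template reads, and the image URL and text are shortened.

```python
import copy
import xml.etree.ElementTree as ET

# A cut-down entry in the CDATA-free form, with real <p> and <img> elements.
SOURCE = """<entry>
  <title>entry with images</title>
  <body><p><img class="archive" src="sample.jpg"
    alt="photograph of a flower"/>Mauris felis elit.</p></body>
</entry>"""

def render_entry(entry):
    """Build the result <div>, copying the element children of <body>."""
    div = ET.Element("div", {"class": "entry"})
    h2 = ET.SubElement(div, "h2")
    h2.text = entry.findtext("title")           # like xsl:value-of select="title"
    for child in entry.find("body"):            # like select="body/*"
        clone = copy.deepcopy(child)            # deep copy, as xsl:copy-of makes
        clone.tail = None                       # body/* selects elements only
        div.append(clone)
    return div

result = render_entry(ET.fromstring(SOURCE))
print(ET.tostring(result, encoding="unicode"))
```

Because whole elements are copied, the <img> and its attributes arrive in the result tree as nodes, and no escaping question ever arises.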
By copying elements instead of literal XHTML source code, Mozilla displays the page from the correct DOM tree, and it works just as well with Internet Explorer and external XSLT processors. The result looks very similar to the input DOM:
<div>
  @class='entry'
  <h2>
    'entry with images'
  <p>
    <img>
      @class='archive'
      @align='right'
      @src='http://alazanto.org/images/sample.jpg'
      @alt='photograph of a flower, just for show'
    'Mauris felis elit, varius quis, pulvinar vel, sodales vehicula, mi. Nunc elementum pharetra elit.'
You can see the result of this in Kevin’s example XML weblog, in any web browser that supports XSLT.
(Finally, thanks to Kevin for using the <xsl:copy-of> element, which I’d managed to miss in four years of reading the XSLT spec.)
Further Reading
- Meyer, Eric, ‘Considered Harmful Essays Considered Harmful’, 2002.
- Walsh, Norman, ‘Escaped Markup Considered Harmful’, XML.com, 2003.
- Rossney, Robert, ‘Creating CDATA sections with XSLT’, 2002.