Find processing instructions in XSLT

To get the value of a processing instruction from the input XML document in an XSLT stylesheet, use the processing-instruction('name') node test in an XPath expression; /processing-instruction('name') selects a processing instruction at the top level of the document. I use a user PI in my XML-formatted ASP responses to pass the logged-in user's parameters to XSLT, for example ...
<?user Web admin (admin@something.net) {00000000-0000-0000-0000-000000000000}?>
To get this string into an XSLT parameter, I use the following XSLT instruction:
<xsl:param name="username" select="/processing-instruction('user')" />

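A processing instruction can appear anywhere in the XML document. The expression above finds only top-level processing instructions (children of the root node, outside the document element). To find a processing instruction elsewhere, use the appropriate XPath syntax, for example //processing-instruction('user') to search the whole document.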
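As a minimal sketch of the difference, the stylesheet below reads one top-level and one nested processing instruction. The note PI and the data/row input structure are made up for illustration; only the user PI comes from the example above.

```xml
<?xml version="1.0" ?>
<!-- Hypothetical input document:
       <?user Web admin?>
       <data><row><?note nested PI?></row></data>
-->
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="text" />

<!-- Top-level PI: a child of the root node, outside the document element -->
<xsl:param name="username" select="/processing-instruction('user')" />

<!-- Nested PI: the descendant axis (//) searches the whole document -->
<xsl:param name="note" select="//processing-instruction('note')" />

<xsl:template match="/">
  <xsl:text>user=</xsl:text>
  <xsl:value-of select="$username" />
  <xsl:text>&#10;note=</xsl:text>
  <xsl:value-of select="$note" />
</xsl:template>

</xsl:stylesheet>
```

The value of a processing instruction is the text after its name, so with the input above the two parameters should hold Web admin and nested PI.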

Use the cells array, not DOM children, to select table cells

Recently, I wanted to use the DOM property firstChild to select the first cell in a TABLE row ... and got quite unexpected results: the firstChild in my case was a text node. Once I'd warmed up my grey cells, it all became quite obvious. For example, in the following HTML markup ...
<table>
  <tr id='test'>
    <td>Cell #1</td>
    <td>Cell #2</td>
  </tr>
</table>
... the first child of the TR element is the whitespace between the TR tag and the first TD tag. As the whitespace is allowed in that position by the HTML standard, you cannot rely on the TR.firstChild property. You should use TR.cells[0] to select the first cell.

Introduction to HIJAX

In my Introduction to HIJAX article recently published by InformIT, you'll find a hands-on approach to HIJAX methodology explained step-by-step on a sample application.

How do you write utf-8 data from ASP?

Obviously this is too simple, so it's hard to find an explicit answer in the online documentation:

  • To select the output encoding you want to use in the ASP script, you have to set the response.codepage property or change the per-page default with the @Codepage directive. To use utf-8 encoding, set the response.codepage to 65001.
  • The HTTP Content-type header should match the encoding you're using. Set the response.charset (second part of the content-type header) to "utf-8".
  • If you use a META tag in your HTML to set the content type, or the encoding attribute in the xml pseudo-instruction, these have to match the HTTP encoding as well. You don't have to specify utf-8 encoding in XML, as it's the default XML encoding.
  • Instead of setting response.codepage in every response, you can set session.codepage at the start of the session.
  • Important: If you use HTML forms, you have to set the @Codepage in all ASP scripts processing the forms or change the AspCodepage metabase property, otherwise the script might misinterpret the input data.

This post is part of You've asked for it series of articles.

Which XML encoding should I use?

The short answer is utf-8. The long answer goes along these lines:

  • utf-8 is the default XML encoding specified by the XML standard. If you use it, you don't have to define the encoding manually, thus reducing the chances of introducing errors. For example, some browsers get confused if the encoding specified in the xml pseudo-instruction is not the same as the one specified in the HTTP header.
  • All XML parsers are required to recognize utf-8 encoding. By using it, you don't risk any future compatibility issues. For example, some of my applications break on some installations of Internet Explorer on Vista as they use windows-1250 encoding. They work fine within IE on Windows XP, Firefox on Vista (and sometimes even with IE on Vista).
  • By using utf-8 you'll never encounter a character you cannot encode.

This post is part of You've asked for it series of articles.

You cannot disable output escaping in Firefox

This is probably well known to all experienced XSLT users, but if this saves a few readers the headaches I've had, it's well worth writing :) In Internet Explorer you can disable output escaping when using XSLT in the browser, and so generate HTML tags from the text content of XML nodes. You cannot do the same in Firefox, as Firefox (or any Gecko-based browser) uses XSLT to transform the input tree directly into the output (DOM) tree, whereas the XSLT transformation in Internet Explorer generates an intermediate string which is later re-parsed into an HTML DOM tree.

Firefox does not generate any error when encountering the disable-output-escaping attribute (the XSLT standard allows a processor to ignore it) but simply ignores it, so the < and > characters appear on the page as literal text instead of being interpreted as markup.
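A minimal sketch of the construct in question, assuming a hypothetical input document in which a snippet element carries escaped markup as text:

```xml
<?xml version="1.0" ?>
<!-- Hypothetical input document:
       <data><snippet>&lt;b&gt;bold&lt;/b&gt;</snippet></data>
-->
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="html" />

<xsl:template match="/data">
  <html>
    <body>
      <!-- A processor that serializes its output to a string may write the
           snippet text unescaped, so it is re-parsed as markup. A processor
           that builds the result tree directly keeps it as plain text. -->
      <p><xsl:value-of select="snippet" disable-output-escaping="yes" /></p>
    </body>
  </html>
</xsl:template>

</xsl:stylesheet>
```

Given the behavior described above, Internet Explorer would be expected to show the word bold in bold type, while Firefox would show the tags themselves as text.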

CSV format specifications

If you ever need to know specifics about Comma-Separated-Values (CSV) format, this is probably as good as you'll get. More about using XSLT to generate CSV files in an upcoming post :)

Firefox document object lacks DOM properties in XSLT-generated pages

I started working on a keyboard-shortcut library and knew (somewhere in the hidden inaccessible depths of my mind) that I had a problem with the document.body property a while ago. As it's required by the HTML DOM standard, I immediately suspected that it was an IE problem ... and chased that ghost for a while, until I realized how wrong I was.

The buggy browser was Firefox: if you create a web page with local XSLT transformation (dictated by the xml-stylesheet instruction in the XML document returned from the server) in Firefox (up to at least 2.0.0.6), everything looks normal, but document.body is null and document.forms, document.links and other similar arrays are empty.

To make matters more confusing, all other HTML DOM calls work (it's a different bug from this one), so I had to substitute the links array with document.getElementsByTagName("A") and the body property with document.getElementById("body") (and used the id attribute on the body tag).

Testing individual XSLT features

The easiest way to test individual XSLT features is as follows:


  • Install SAXON;

  • Create a small XML document with structure that represents your input data;

  • Create a small XSL stylesheet focusing on features you want to test;

  • Use xsl:output method='text' in the test stylesheet to produce the test results without XML garbage;

  • Perform the tests with SAXON;

  • Test the transformation in your target environment. If you're developing browser-side transformations, insert the xml-stylesheet pseudo-instruction in your XML document and open it with Internet Explorer and Firefox.
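A test stylesheet of the kind described in the steps above might look like the following sketch. The input structure (data/item with an id attribute) and the feature under test (position()) are assumptions chosen for illustration.

```xml
<?xml version="1.0" ?>
<!-- Hypothetical test input:
       <data><item id="a">One</item><item id="b">Two</item></data>
-->
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- Plain-text output keeps the test results free of XML markup -->
<xsl:output method="text" />

<xsl:template match="/data">
  <xsl:text>items: </xsl:text>
  <xsl:value-of select="count(item)" />
  <xsl:text>&#10;</xsl:text>
  <xsl:apply-templates select="item" />
</xsl:template>

<!-- Feature under test: position() within the selected node list -->
<xsl:template match="item">
  <xsl:value-of select="position()" />
  <xsl:text>: </xsl:text>
  <xsl:value-of select="@id" />
  <xsl:text>&#10;</xsl:text>
</xsl:template>

</xsl:stylesheet>
```

Writing all literal text through xsl:text keeps stray indentation out of the results, which makes it easier to compare the output of different processors.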

You can see how I've used this approach in this post.

Handling unexpected children in xsl:apply-templates

When you use xsl:apply-templates in an xsl:template, you might get extra text strings in the transformed results if your input XML data contains unexpected elements with text nodes (due to the built-in XSLT template rules, the extra elements themselves and their attributes are ignored, but their text is copied to the output).

There are three ways you can handle this situation:

  • Write an XML schema and validate input data against it before the transformation;
  • List the children you expect in the xsl:apply-templates instruction;
  • Define a low-priority default template that does nothing.
For example, when the input XML document ...
<?xml version="1.0" ?>
<data>
  <a>A1</a>
  <b>B1</b>
  <c>Unexpected</c>
  <b>B2</b>
  <a>A2</a>
</data>
... is processed with the stylesheet ...
<?xml version="1.0" ?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="data">
  <xsl:apply-templates /></xsl:template>

<xsl:template match="a">
  A: <xsl:value-of select="text()" />
</xsl:template>

<xsl:template match="b">
  B: <xsl:value-of select="text()" />
</xsl:template>

</xsl:stylesheet>
... you get the string Unexpected in the transformed results. You could either list the children you expect in the xsl:apply-templates, for example ...
<xsl:template match="data">
  <xsl:apply-templates select="a|b" /></xsl:template>
... or you could define a default template that does nothing (overriding the built-in rule that eventually outputs the child text nodes) ...
<xsl:template match="*" priority="-100" />

This post is part of You've asked for it series of articles.

Another IE bug: transformed plain text displayed as HTML

If you use browser-side XSLT transformation triggered with the xml-stylesheet pseudo-instruction in the source XML document and set the output format to plain text with xsl:output method='text', you'd expect the results to be displayed as unformatted text in the browser window (similar to what the browser does when served a document with content-type: text/plain). Not surprisingly, Firefox behaves as expected, but Internet Explorer 7 renders the results as pure HTML (going as far as interpreting the start-of-tag characters). For example, when the following XML document ...
<?xml version="1.0" ?>
<?xml-stylesheet href="t1.xsl" type="text/xsl"?>
<data>
  <row>Sample</row>
</data>
... is transformed with this stylesheet ...
<?xml version="1.0" ?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 
<xsl:output method="text" />

<xsl:template match="row">
&lt;b&gt;<xsl:value-of select="text()" />&lt;/b&gt;
</xsl:template>

</xsl:stylesheet>
... Firefox displays the resulting text (<b>Sample</b>), but Internet Explorer displays Sample in bold.

Yet another reason to use Saxon

Testing a sample stylesheet today, I discovered that Saxon reports a lot of recoverable errors that are silently ignored by other XSLT implementations (MSXML or Gecko). For example, when faced with this stylesheet ...
<?xml version="1.0" ?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 
<xsl:template match="data">
  <xsl:apply-templates /></xsl:template>
 
<xsl:template match="a">
  A: <xsl:value-of select="text()" />
</xsl:template>
 
<xsl:template match="a">
  B: <xsl:value-of select="text()" />
</xsl:template>
 
</xsl:stylesheet>
... Saxon would report an ambiguous rule match (even documenting the input element that caused the error and the source line number), whereas all the other XSLT processors would silently produce "best effort" results. In fact, the stylesheet had a typo: the last template should match the b element, and I would have had lots of problems detecting that with MSXML (for example).

Why would I use select='text()'

<xsl:value-of select='text()' /> is used when you want to select the text embedded between the opening and closing tag of the current element. However, although the text() node test selects all child text nodes, xsl:value-of in XSLT 1.0 outputs only the first node of the selected set, so you get only the first child text node. If you want to:

  • select all embedded text (including text within the child elements), use <xsl:value-of select='.' /> as this renders the whole tree of descendants as a text string;
  • select all text nodes within the current element but no text in descendant elements, use a <xsl:for-each select='text()' > loop.
For example, the following XML document ...
<?xml version="1.0" ?>
<?xml-stylesheet href="t1.xsl" type="text/xsl"?>
<data>
  <row>Before <b>an embedded tag</b> and after it</row>
</data>
... transformed with this stylesheet ...
<?xml version="1.0" encoding="windows-1250" ?>
<xsl:stylesheet
 version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 
<xsl:output method="text" encoding="windows-1250" />

<xsl:template match="row">
  Text(): <xsl:value-of select="text()" />
  Current element: <xsl:value-of select="." />
  Text loop: <xsl:for-each select="text()"><xsl:value-of select="." /></xsl:for-each>
</xsl:template>

</xsl:stylesheet>
... results in the following text:

Text(): Before
Current element: Before an embedded tag and after it
Text loop: Before and after it

This post is part of You've asked for it series of articles.

XSL: Remove whitespace from source XML document

While searching for a solution to a completely different problem, I've stumbled across a very elegant way to remove whitespace-only text nodes from input XML documents before they are transformed (see also description of problems caused by them): use the xsl:strip-space instruction, for example <xsl:strip-space elements='*' />
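A minimal sketch of the effect, assuming a hypothetical indented input document with row and cell elements:

```xml
<?xml version="1.0" ?>
<!-- Hypothetical input document (indented, one element per line):
       <data>
         <row>
           <cell>1</cell>
           <cell>2</cell>
         </row>
       </data>
-->
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="text" />

<!-- Remove whitespace-only text nodes from every element of the input -->
<xsl:strip-space elements="*" />

<xsl:template match="row">
  <!-- With strip-space only the cell elements remain as children; without
       it, the line breaks and indentation between them are child nodes too -->
  <xsl:value-of select="count(node())" />
</xsl:template>

</xsl:stylesheet>
```

With the instruction in place a conforming processor should report 2 child nodes for the row element. Without it, the whitespace between the tags would add three more text nodes and extra line breaks would leak into the output.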

Test for empty attributes in XSLT

Testing for missing attributes in XML elements is different from testing for attributes with empty values. You can test for missing attributes with <xsl:if test="not(@attribute)">, but this test will never succeed if the attribute is present but empty. In that case, you have to use the <xsl:if test="@attribute = ''"> condition.
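The two tests can be combined in a single xsl:choose. The sketch below assumes a hypothetical input with item elements and a name attribute.

```xml
<?xml version="1.0" ?>
<!-- Hypothetical input document:
       <data><item /><item name="" /><item name="x" /></data>
-->
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="text" />

<xsl:template match="item">
  <xsl:choose>
    <!-- Attribute is not present at all -->
    <xsl:when test="not(@name)">missing</xsl:when>
    <!-- Attribute is present, but its value is the empty string -->
    <xsl:when test="@name = ''">empty</xsl:when>
    <xsl:otherwise><xsl:value-of select="@name" /></xsl:otherwise>
  </xsl:choose>
  <xsl:text>&#10;</xsl:text>
</xsl:template>

</xsl:stylesheet>
```

For the three items above this should print missing, empty and x on separate lines. If you don't need to tell the two cases apart, test="not(string(@name))" is true for both a missing and an empty attribute.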

This post is part of You've asked for it series of articles.