XSLT: Extract file name from path

When transforming WordProcessingML documents, I wanted to output an IMG element whose HREF attribute would equal the picture name used in the Word file (assuming the picture is linked, not just inserted). However, the picture name in the Word file usually contains the whole path (even when the picture and the Word document are in the same directory), so I needed a function that would extract the file name from the path. As XSLT 1.0 has a very limited function set (and MSXML still doesn't support XSLT 2.0), I had to write the function myself using a named template:
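<!-- Recursively strips everything up to the last \ or / in $path -->
<xsl:template name="fileName">
  <xsl:param name="path" />
  <xsl:choose>
    <xsl:when test="contains($path,'\')">
      <xsl:call-template name="fileName">
        <xsl:with-param name="path" select="substring-after($path,'\')" />
      </xsl:call-template>
    </xsl:when>
    <xsl:when test="contains($path,'/')">
      <xsl:call-template name="fileName">
        <xsl:with-param name="path" select="substring-after($path,'/')" />
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise><xsl:value-of select="$path" /></xsl:otherwise>
  </xsl:choose>
</xsl:template>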

The function recognizes Windows- and Unix-style paths (forward and backward slashes are recognized as path delimiters).
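
As a usage sketch, the template could be called as shown below. The picture element and its src attribute are hypothetical stand-ins for whatever node holds the linked picture path in your source document; the fileName template is repeated only so that the stylesheet is self-contained.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" />

  <!-- Hypothetical input document: <picture src="C:\docs\images\logo.png" /> -->
  <xsl:template match="picture">
    <!-- Pass the full path; the template writes out only the file name -->
    <xsl:call-template name="fileName">
      <xsl:with-param name="path" select="@src" />
    </xsl:call-template>
  </xsl:template>

  <!-- Recursive helper: drops one leading path segment per call -->
  <xsl:template name="fileName">
    <xsl:param name="path" />
    <xsl:choose>
      <xsl:when test="contains($path,'\')">
        <xsl:call-template name="fileName">
          <xsl:with-param name="path" select="substring-after($path,'\')" />
        </xsl:call-template>
      </xsl:when>
      <xsl:when test="contains($path,'/')">
        <xsl:call-template name="fileName">
          <xsl:with-param name="path" select="substring-after($path,'/')" />
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$path" />
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

For the sample input in the comment, the output would be logo.png. Each recursive call removes everything up to and including the first delimiter, so the recursion depth equals the number of path segments.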

Count XML items with specific attribute value

To count the number of nodes that have a specific attribute, or that have an attribute with a specific value, in your XSLT transformation, use the following code:
  • All elements having the specified attribute:
    <xsl:value-of select="count(//*[@attr])" />
  • All elements having the attribute with the desired value:
    <xsl:value-of select="count(//*[@attr = 'value'])" />
For example, to count all elements in an XHTML document having a class name, use:
<xsl:value-of select="count(//*[@class])" />
To count all elements whose class name contains 'menu', use:
<xsl:value-of select="count(//*[contains(@class,'menu')])" />
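
Note that contains() is a plain substring test, so the last expression also counts elements with a class such as 'submenu'. If you only want elements where 'menu' is one of the space-separated class names, a common XPath 1.0 idiom is to pad the attribute value with spaces before testing. The stylesheet below is a minimal sketch of that idea, not the only way to do it:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" />

  <xsl:template match="/">
    <!-- normalize-space() collapses tabs, newlines and repeated spaces;
         the surrounding spaces let ' menu ' match a whole token only -->
    <xsl:value-of select="count(//*[contains(
        concat(' ', normalize-space(@class), ' '), ' menu ')])" />
  </xsl:template>
</xsl:stylesheet>
```

With this test, class="main menu" is counted and class="submenu" is not.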

Line breaks in XSLT-generated text document

If you're generating text documents with an XSLT transformation and use xsl:text tags for tight whitespace control, you might need to insert the end-of-line characters into the output stream manually. You could do that with a literal newline inside the xsl:text tag. However, using an explicit newline character reference (&#xa; or &#10;) results in more explicit code that's easier to read, understand and maintain.
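
The two variants might look like this in a minimal text-output stylesheet (the line contents are made up for illustration):

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" />

  <xsl:template match="/">
    <xsl:text>first line</xsl:text>
    <!-- Variant 1: literal newline; the closing tag has to start at column 0,
         otherwise the indentation ends up in the output as well -->
    <xsl:text>
</xsl:text>
    <xsl:text>second line</xsl:text>
    <!-- Variant 2: character reference; the layout of the stylesheet
         no longer affects the output -->
    <xsl:text>&#xa;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Both variants emit a single line feed, so the output is two lines, each terminated by a newline.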

XPath expressions in MSXML selectNodes() function are evaluated on the whole DOM tree

The selectNodes() function available in the Microsoft XML (MSXML) API evaluates an absolute XPath expression (one starting with / or //) against the whole DOM tree to which the DOM node belongs, not just the subtree of the node on which the selectNodes function was executed. For example, if you use the following code …
set firstDiv = document.selectSingleNode("//div")
set paraList = firstDiv.selectNodes("//p")
… the paraList will contain the list of all paragraphs in the whole document, not just the list of paragraphs in the DIV on which the selectNodes call was executed.
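
This follows from the XPath 1.0 rules: a path starting with // is anchored at the document root, while .// starts at the context node, so a leading dot in the selectNodes argument should restrict the search to the DIV. The same distinction applies inside an XSLT template, as the sketch below tries to show; the sample input in the comment is hypothetical.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" />

  <!-- Hypothetical input:
       <body><div><p/></div><div><p/><p/></div></body> -->
  <xsl:template match="/">
    <xsl:for-each select="//div">
      <!-- Absolute path: every p in the document, whatever the context node -->
      <xsl:value-of select="count(//p)" />
      <xsl:text> </xsl:text>
      <!-- Relative path: only p descendants of the current div -->
      <xsl:value-of select="count(.//p)" />
      <xsl:text>&#xa;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

For the sample input, the first column is 3 on both lines, while the second column is 1 for the first div and 2 for the second.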

Tight control on whitespace in XSLT-generated documents

Continuing from the previous post on the whitespace issues, here are a few rules that you should keep in mind:
  • Whitespace-only text nodes are not copied from the XSLT document into the output document (unless they are inside an xsl:text tag);
  • Text nodes containing non-whitespace characters are copied in their entirety, including any leading and trailing whitespace. Extra line breaks can easily appear in your output text, more so if you try to format the source XSLT document nicely;
  • If you want very tight control over the generated output, place the non-whitespace characters only within xsl:text tags. The content of the xsl:text tag (which cannot contain any embedded tags) is copied straight into the output document.

Whitespace control is extremely important if you're generating text output with an XSLT transformation.
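
The rules above can be illustrated with a small sketch. The item elements and their name attribute are hypothetical; the point is only the difference between the two loops.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" />

  <!-- Hypothetical input: <list><item name="a"/><item name="b"/></list> -->
  <xsl:template match="/">
    <!-- Loose: the text node holding "Name: " also holds the preceding
         line break and indentation, and all of it is copied to the output -->
    <xsl:for-each select="//item">
      Name: <xsl:value-of select="@name" />
    </xsl:for-each>

    <!-- Tight: only the contents of xsl:text reach the output; the
         whitespace-only nodes between the tags are stripped -->
    <xsl:for-each select="//item">
      <xsl:text>Name: </xsl:text>
      <xsl:value-of select="@name" />
      <xsl:text>&#xa;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

The first loop emits each name preceded by a line break and six spaces of indentation; the second emits exactly one "Name: x" line per item.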

Word to MediaWiki conversion with XSLT

I've tried two Word to MediaWiki converters: the set of macros described in the Word2MediaWikiPlus extension and the OpenOffice converter. The OpenOffice converter does not work too well (for example, if you have a paragraph style with the Courier font, it's not transformed into MediaWiki code markup), and I needed some extensions that would be hard to cram into the Word macros used by Word2MediaWikiPlus. I therefore decided to implement the converter as an XSLT translator (usable in Word 2003/2007) that should be easy(er) to modify for someone fluent in XSLT. You can find the current (pre-alpha) sources on SourceForge and download the alpha release. If I'm missing functionality you desperately need, let me know.

Soft breaks in WordProcessingML

WordProcessingML has an “interesting” way of representing soft breaks (Shift-Enter in Word): if a range (w:r, also called a run) has a w:br child, it represents a soft break at that position (see also Section 2.3.3 of Part 4 of ECMA-376). Here is the XML Notepad display of a sample three-line paragraph (with two soft breaks):

To process the soft break in a range with XSLT, use templates similar to these:
<xsl:template match="w:r">
  <xsl:apply-templates />
</xsl:template>

<xsl:template match="w:t"><xsl:value-of select="text()" /></xsl:template>

<xsl:template match="w:br|w:cr"><br /></xsl:template>
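
To try these templates, the w prefix has to be bound to a namespace. The sketch below assumes the ECMA-376 (Word 2007) main namespace; Word 2003 WordProcessingML documents use http://schemas.microsoft.com/office/word/2003/wordml instead. The w:p template and the sample paragraph in the comment are illustrative additions, not part of the original converter.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    exclude-result-prefixes="w">
  <xsl:output method="html" />

  <!-- Hand-written sample input (three lines, two soft breaks):
       <w:p xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
         <w:r><w:t>Line 1</w:t><w:br/><w:t>Line 2</w:t><w:br/><w:t>Line 3</w:t></w:r>
       </w:p> -->

  <!-- Illustrative wrapper: one HTML paragraph per Word paragraph -->
  <xsl:template match="w:p">
    <p><xsl:apply-templates select="w:r" /></p>
  </xsl:template>

  <!-- Process the children of a run in document order -->
  <xsl:template match="w:r">
    <xsl:apply-templates />
  </xsl:template>

  <!-- Text of the run -->
  <xsl:template match="w:t"><xsl:value-of select="text()" /></xsl:template>

  <!-- A soft break becomes an HTML line break -->
  <xsl:template match="w:br|w:cr"><br /></xsl:template>
</xsl:stylesheet>
```

Because w:r simply applies templates to its children, the text and break elements are emitted in the same order in which they appear in the run.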