xsl:element generates an extra linebreak

In a previous post I described how you can generate dynamic output elements with the xsl:element instruction. Unfortunately, xsl:element generates a newline before the element name (in both Saxon and MSXML) if you use HTML output (xsl:output with method='html'), resulting in extra line breaks in Blogger (I was developing a Word-to-Blogger converter). The only workaround I found is to use method='xml' with no XML declaration:

<xsl:output method="xml" omit-xml-declaration="yes" />

I didn’t want to have the <?xml ?> heading in the document as I’m pasting the transformation results straight into Blogger.

Query-string-based revision control

One of the easy ways to improve the perceived response time of your web site is to ensure that your web server sets an explicit Expires: HTTP header on the static components of your web pages (JavaScript libraries, CSS files, images …), thus reducing the number of HTTP requests sent by the browser. However, if you change your JavaScript code or CSS, your visitors could be stuck with the old version for a long time.

If you use static HTML and a decent development environment, you can easily rename the JavaScript or CSS files (and the HTML pages get updated as a side effect). For more complex environments, you could use an easy trick: append the revision number as a query string after the file name:

<link href="myStyle.css?57" rel="stylesheet" type="text/css" />
<script src="myCode.js?42" type="text/javascript"></script>

Most web servers ignore the query string after the name of a static file, but browsers cache on the whole URL (so x.js?1 is a different resource from x.js?2).
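
If the pages are generated by a script, the revision can be appended automatically. The following is a minimal Python sketch of that idea, not something from the original setup: the function names versioned_url and content_revision are hypothetical, and deriving the revision from a hash of the file contents is an assumption (a hand-maintained number such as 42 works just as well).

```python
import hashlib


def versioned_url(url: str, revision: str) -> str:
    """Append a revision marker to a static asset URL as its query string."""
    # Use '&' if the URL already carries a query string, '?' otherwise.
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}{revision}"


def content_revision(data: bytes, length: int = 8) -> str:
    """Derive a short revision marker from the file contents (assumed scheme)."""
    return hashlib.sha256(data).hexdigest()[:length]


# A hand-maintained revision number, as in the HTML example above.
print(versioned_url("myStyle.css", "57"))  # myStyle.css?57
```

With a content-based revision the URL changes only when the file itself changes, so the browser keeps using its cached copy until then.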

When boolean logic has a maybe value

When doing Word-to-Blogger conversion, I wanted to identify a group of code paragraphs and convert them into a single PRE element (Word-to-MediaWiki is simpler, as you just prepend a single space to the text and MediaWiki merges all code lines together). The original idea was pretty simple:
  • Match w:p elements with w:pPr/w:pStyle/@w:val = 'code';
  • Check whether the previous w:p element also has the code style; if not, emit the PRE element and handle the whole group of code paragraphs.

The initial code worked ...

<xsl:template match="w:p[w:pPr/w:pStyle/@w:val = 'code']">
  <xsl:if test="preceding-sibling::w:p[1]/w:pPr/w:pStyle/@w:val != 'code'">
    <pre class="{w:pPr/w:pStyle/@w:val}">
      <xsl:apply-templates select="." mode="pre" />
    </pre>
  </xsl:if>
</xsl:template>

... but only until I decided to add a border around the code style in Word. The border creates a wx:pBdrGroup in WordProcessingML ...

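... and the xsl:if test fails if there is no preceding w:p element. In XPath 1.0, a comparison between a node-set and a string is true only if some node in the set satisfies it, so with an empty node-set the != test is always false. To make the code work, I had to reverse the test condition and add a not at the beginning: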

<xsl:template match="w:p[w:pPr/w:pStyle/@w:val = 'code']">
  <xsl:if test="not(preceding-sibling::w:p[1]/w:pPr/w:pStyle/@w:val = 'code')">
... rest of code ...
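
The difference between the two tests is easier to see outside XSLT. The sketch below models the XPath 1.0 rule for comparing a node-set with a string in plain Python; the function names are made up for illustration, and a list of strings stands in for the string-values of the selected nodes.

```python
from typing import Iterable


def nodeset_not_equal(values: Iterable[str], literal: str) -> bool:
    """Model of XPath 1.0 `node-set != literal`: true if SOME node differs."""
    return any(value != literal for value in values)


def not_nodeset_equal(values: Iterable[str], literal: str) -> bool:
    """Model of `not(node-set = literal)`: true if NO node is equal."""
    return not any(value == literal for value in values)


# The preceding paragraph has another style: both tests start a new group.
print(nodeset_not_equal(["Normal"], "code"), not_nodeset_equal(["Normal"], "code"))
# prints: True True

# No preceding w:p at all (empty node-set): only the second test is true.
print(nodeset_not_equal([], "code"), not_nodeset_equal([], "code"))
# prints: False True
```

The two expressions agree whenever the node-set contains exactly one node; they differ only in the empty case, which is exactly the "maybe" value from the title.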

Automatic generation of HTML/WikiMarkup from Word

After developing the XSLT transformations that convert WordProcessingML (Microsoft Word XML) into the desired HTML or MediaWiki markup, I wanted to integrate them tightly into Word, so I wrote a Word macro that automatically:
  • Performs the conversion using the specified XSLT file;
  • Opens the new document as text;
  • Selects the converted text and copies it to the clipboard;
  • Reopens the original file;
  • Informs the user that the clipboard contains converted text, ready to be pasted into Blogger or MediaWiki.

Here is the source code for the macros:

Sub SaveAsWiki()
' SaveAsWiki Macro
' Save Word document as Wiki markup
    ConvertWithXSL "createWikiMarkup.xsl"
    MsgBox "The Wiki Markup is on the clipboard", vbOKOnly, "Done"
End Sub

Sub SaveAsBlogger()
' SaveAsBlogger: save word document as blogger-optimized HTML markup
    ConvertWithXSL "createBloggerMarkup.xsl"
    MsgBox "The Blogger Markup is on the clipboard", vbOKOnly, "Done"
End Sub

Sub ConvertWithXSL(XSLFile As String)
' ConvertWithXSL: converts the Word document with specified XSL
    Dim XPath, DocName, TxtPath
    XPath = ActiveDocument.AttachedTemplate.Path & _
            Application.PathSeparator & XSLFile
    TxtPath = Environ("TEMP")
    If TxtPath <> "" Then TxtPath = TxtPath & "\"
    TxtPath = TxtPath & "wmk.txt"
    DocName = ActiveDocument.FullName
    If MsgBox("The conversion process will lose all changes you've made. " & _
              "You have to save the document before running the conversion. " & _
              "Did you do it?", vbYesNo, "Warning") <> vbYes Then Exit Sub
    With ActiveDocument
        .XMLSaveDataOnly = False
        .XMLUseXSLTWhenSaving = True
        .XMLSaveThroughXSLT = XPath
        .XMLHideNamespaces = False
        .XMLShowAdvancedErrors = False
        .XMLSchemaReferences.HideValidationErrors = False
        .XMLSchemaReferences.AutomaticValidation = True
        .XMLSchemaReferences.IgnoreMixedContent = False
        .XMLSchemaReferences.AllowSaveAsXMLWithoutValidation = True
        .XMLSchemaReferences.ShowPlaceholderText = False
    End With
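    ActiveDocument.SaveAs _
        FileName:=TxtPath, FileFormat:=wdFormatXML, _
        AddToRecentFiles:=False
    ActiveDocument.Close SaveChanges:=wdDoNotSaveChanges
    Documents.Open FileName:=TxtPath, ConfirmConversions:=False, _
        ReadOnly:=False, AddToRecentFiles:=False, _
        Format:=wdOpenFormatAuto, Encoding:=65001
    ' Copy the converted text to the clipboard (steps assumed from the list above)
    Selection.WholeStory
    Selection.Copy
    ActiveDocument.Close SaveChanges:=wdDoNotSaveChanges
    Documents.Open FileName:=DocName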
End Sub

XSLT: Generate heading from WordProcessingML

Microsoft Word (and WordProcessingML) treats all paragraphs equally (all of them are represented as w:p elements), and it's the job of the XSLT programmer to separate headings (the paragraphs whose paragraph style contains w:pPr/w:outlineLvl/@w:val) from the regular text.

I started by defining a key that extracts the outline level from the paragraph style associated with the current w:p element:
<xsl:key name="outline" match="w:outlineLvl" use="ancestor::w:style[@w:type = 'paragraph']/@w:styleId" />
If successful, this key returns the w:outlineLvl element whose ancestor w:style element has the specified w:styleId attribute (and, yes, it took me five minutes to figure out what I had been doing when I revisited the code after three months). The key is then used in a template that uses xsl:element to create an Hx or P element:
<xsl:template match="w:p">
  <xsl:variable name="paraStyle" select="w:pPr/w:pStyle/@w:val" />
  <xsl:variable name="outLvl"
    select="key('outline',w:pPr/w:pStyle/@w:val)/@w:val" />
  <xsl:variable name="elName">
    <xsl:choose>
      <xsl:when test="$outLvl">h<xsl:value-of select="$outLvl + 1" /></xsl:when>
      <xsl:otherwise>p</xsl:otherwise>
    </xsl:choose>
  </xsl:variable>
  <xsl:element name="{$elName}">
... rest of translation code ...
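
To check the lookup logic without running a transformation, the same mapping can be modelled in Python with the standard library. This is only a sketch under assumptions: the sample document is a heavily simplified, hand-written fragment (not real Word output), the w: prefix is bound to what I believe is the Word 2003 WordprocessingML namespace, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for the w: prefix (Word 2003 WordprocessingML).
W = "http://schemas.microsoft.com/office/word/2003/wordml"
NS = {"w": W}

# Simplified, hand-written sample; real documents carry far more markup.
SAMPLE = f"""<w:wordDocument xmlns:w="{W}">
  <w:styles>
    <w:style w:type="paragraph" w:styleId="Heading2">
      <w:pPr><w:outlineLvl w:val="1"/></w:pPr>
    </w:style>
    <w:style w:type="paragraph" w:styleId="code"/>
  </w:styles>
  <w:body>
    <w:p><w:pPr><w:pStyle w:val="Heading2"/></w:pPr></w:p>
    <w:p><w:pPr><w:pStyle w:val="code"/></w:pPr></w:p>
    <w:p/>
  </w:body>
</w:wordDocument>"""


def outline_levels(root):
    """Map each paragraph styleId to its outline level (the role of the key)."""
    levels = {}
    for style in root.iterfind(".//w:style", NS):
        if style.get(f"{{{W}}}type") != "paragraph":
            continue
        level = style.find(".//w:outlineLvl", NS)
        if level is not None:
            levels[style.get(f"{{{W}}}styleId")] = int(level.get(f"{{{W}}}val"))
    return levels


def element_name(paragraph, levels):
    """Return 'hN' for heading paragraphs and 'p' for everything else."""
    style = paragraph.find("w:pPr/w:pStyle", NS)
    style_id = style.get(f"{{{W}}}val") if style is not None else None
    if style_id in levels:
        # Word outline levels start at 0, HTML headings at h1.
        return f"h{levels[style_id] + 1}"
    return "p"


root = ET.fromstring(SAMPLE)
levels = outline_levels(root)
names = [element_name(p, levels) for p in root.iterfind(".//w:body/w:p", NS)]
print(names)  # ['h2', 'p', 'p']
```

The dictionary plays the part of the xsl:key, and the + 1 mirrors the $outLvl + 1 in the template.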

XSLT: Use specific matches instead of xsl:choose

The if-then-else construct in XSLT 1.0 (xsl:choose) is "somewhat baroque", so it is often easier (and probably also faster) to select between alternatives in other ways, for example with specific patterns in the match attribute of xsl:template.

In one of my recent WordProcessingML (Microsoft Word XML) XSLT projects I had to transform paragraphs with the cite style into blockquote, those with code style into pre and the rest of them into regular paragraphs. Instead of using a complex xsl:choose structure, I've defined three templates:
<xsl:template match="w:p[w:pPr/w:pStyle/@w:val = 'cite']">
  <blockquote class="{w:pPr/w:pStyle/@w:val}">
    <xsl:apply-templates />
  </blockquote>
</xsl:template>
<xsl:template match="w:p[w:pPr/w:pStyle/@w:val = 'code']">
  <pre class="{w:pPr/w:pStyle/@w:val}">
    <xsl:apply-templates />
  </pre>
</xsl:template>
<xsl:template match="w:p">
... regular code goes here ...
</xsl:template>

PHP startup kit

I recently got involved in deploying open-source Web 2.0 applications (specifically phpBB, WordPress and MediaWiki), so I had to bite the PHP/MySQL bullet. If you're in the same situation, I would strongly recommend these books:

Full disclosure: if you click on one of the above links and actually buy the book, I might get around $1.00 from Amazon.