OMG: jQuery tag selectors are case sensitive

… if you use Firefox and client-side XSLT transformations. This is the scenario:

If you use tag names in jQuery selectors, for example $(".pulldownMenu P A").click(handler), the tag names are case insensitive in IE (and in Firefox when the transformation is performed on the server), but they become case sensitive when Firefox performs the client-side transformation, regardless of the xsl:output settings. To ensure your jQuery code works in all cases, use exactly the same tag-name casing in your jQuery selectors and your XSLT stylesheets.

Detect whether your browser has a working XSLT implementation

After I’ve finally managed to persuade IE7, FF and Transform jQuery plugin to work with my XSL documents, I’ve started testing the other two browsers I have: Opera and Chrome. The latest release of Opera might occasionally work. Chrome fails (as expected) as it doesn’t support xsl:import. Welcome to the next round of browser incompatibilities.

I’ve decided to ignore Chrome until the great minds @ Google decide to implement XSLT properly. Visitors using it will get a reduced experience … but of course I have to detect whether I should use XSLT or not.

Here’s a small Christmas gift if you have similar issues: a jQuery extension that checks whether the current visitor can use XSLT transformations.

jQuery.xslt = {
  need: function(success,url,xml) {
    var el;
    var debug = 1;

    function gotScript() {

      function transformFail(html,xsl,xml,obj,ex) {
        if (debug) alert("$.xslt.need failed: " + ex.message);
      }

      function transformDone(html,xsl,xml,obj) {
        $.xslt.has = html.search($.xslt.expectedResult) >= 0;
        if ($.xslt.has && success) success();
      }

      el = $("<div>");
      el.transform({xmlstr: xml, xsl: url,
        success: transformDone, error: transformFail});
    }

    url = url || $.xslt.defaultURL;
    xml = xml || $.xslt.defaultXML;
    $.getScript($.xslt.transformScript, gotScript);
  },
  has: false,

  defaultURL: "/forms/xml/static.xsl",
  defaultXML: "<section />",
  transformScript: "/common/js/jquery.transform.packed.js",
  expectedResult: /table/i
};

The extension implements a simple function and a property: $.xslt.need() and $.xslt.has. You can pass the URL of a sample XSL document and sample XML markup to the need function; it also accepts a success callback that tells you when it’s safe to install XSL-related action handlers.

The need function loads the jQuery Transform plugin (reducing the load time if the page does not need the XSLT functionality) and tries to transform a sample XSL document. The sample transformation should be as complex as possible: you should use xsl:import, xsl:include or the document() function if you use them in other transformations.

You could change the default parameters in the source code or write a setup function.
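For example, this is how you could defer all XSL-dependent wiring until the check has passed (a sketch assuming the extension above with its default parameters; installXslHandlers is a hypothetical function of your own):

```javascript
// On DOM ready, test for a working XSLT implementation before wiring up
// any functionality that depends on browser-side transformations.
$(function() {
  $.xslt.need(function() {
    // We only get here when the test transformation succeeded,
    // i.e. $.xslt.has is now true.
    installXslHandlers();   // hypothetical: your XSL-dependent setup
  });
});
```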

IE7, xsl:import and jQuery Transform plugin

After exhausting all other (more convenient from my perspective) approaches, I’ve had to admit that there’s only a single solution that allows you to include/import XSL stylesheets with the jQuery Transform plugin if you want to:

  • Use XSL documents from another directory on the Web server.
  • Use xsl:import or xsl:include in the XSL documents.
  • Have a reliable cross-browser implementation.

The solution, as you might have guessed, is to use the URL pointing to the XSL document in the call to $.transform (the xsl parameter). And just in case you’re wondering why I had to try everything else: I wanted to use the $.transform() call to get the transformed HTML without the delay incurred by setting the async parameter to false.

xsl:import fails when using an XSL object in $.transform

After the initial set of xsl:import related problems I’ve encountered running the jQuery Transform plugin in IE7, I’ve tried to pass the XML document as a string (using the xmlstr parameter) and the XSL as a parsed object (using the xslobj parameter) returned by a $.get call. It works perfectly in FF but fails miserably in IE7: the error messages indicate that the stylesheets requested with xsl:import are not imported at all. Back to the drawing board …

Import/include problems in jQuery Transform plugin

I started testing the jQuery Transform plugin with a set of pretty convoluted XSLT documents that I use on the target web site, which relies almost exclusively on browser-side XSLT transformations. All server responses are initially encoded as XML and transformed on the server only if the browser does not support XSLT (or if a spider has come to visit us). Not surprisingly, the XSLTs are heavily loaded with xsl:import and xsl:include statements.

Initially, I tried to control as much as possible: read XML and XSLT into strings and perform the transformation when everything has been collected. This approach fails miserably in IE7. The reason is simple: if I pass XSLT source or parsed XSLT object to the $.transform routine, the routine tries to fix xsl:import and xsl:include references using the current web page’s URL as the base reference, which is wrong if the stylesheet has been loaded from another directory.

Conclusion: if you read XSL documents from another directory and pass them as strings to the $.transform routine, make sure you use absolute references in the import/include statements.

jQuery Transform plugin

Finally I found some time to start working on XSLT transformations in the jQuery framework. I may be completely wrong, but I decided to go with the Transform plugin. If anyone has found a better plugin, I would highly appreciate your feedback :).

Refactoring a menu: remove inline CSS

If you’re new to this series, read the “Refactoring a simple menu” and “Changing the background images” first.

The next step in HTML/CSS refactoring was to clean up the very ugly HTML code … I don’t know what I was thinking at that time; the menu full of repetitive inline CSS is plainly stupid.

Looking at the HTML code, it’s obvious that:

  • The row buttons are independent DIV elements.
  • Each row button has a P child and potentially a DIV child (holding the pull-down menu box).
  • The pull-down menu box has P children that are styled identically to the P elements in the row buttons.

So here’s a very obvious CSS solution:

  • Mark the row button DIV elements with a class (rowMenu).
  • Define styling for P, A and DIV children of the DIV.rowMenu elements.

The resulting CSS is pretty simple …

DIV.rowMenu { float: left; position: relative; z-index: 100; }
DIV.rowMenu DIV { position: absolute; display: none; }
DIV.rowMenu P { … original btx170 definition … }
DIV.rowMenu P.down { … btx170_down definition … }
DIV.rowMenu A { color: #000; text-decoration: none; }

… and so is the cleaned-up HTML code:

<div class="rowMenu" id="IDAHYJQB">
  <p><a href="/climbing/myClimbs/myClimbs.asp">First page</a></p>
</div>
<div class="rowMenu" id="IDAKYJQB">
  <script>menuRegister('IDAKYJQB')</script>
  <p id="IDAKYJQB_main"><a href="javascript:menuClick('IDAKYJQB')">Add ...</a></p>
  <div id="IDAKYJQB_sub">
    <p><a onclick="menuSelect('IDAKYJQB')" 
        href="/climbing/myClimbs/myClimbs_add.asp">New entry </a></p>
    <p><a onclick="menuSelect('IDAKYJQB')" 
        href="/climbing/myClimbs/myClimbs_editWall.asp?a=add">Edit</a></p>
    … more …
  </div>
</div>

Next task: get rid of superfluous element IDs.

Refactoring a menu: Changing the background images to borders

Before reading this: read the “refactoring a simple menu” post to get the background information.

One of the stupidities I did in the original menu implementation was to use CSS background images for simple buttons that could be implemented equally well with CSS borders. It would have been hard(er) to get rid of the background images if I had had rounded corners or shaded button backgrounds, but I had neither. So here’s the new CSS definition, replacing images with borders.

The new CSS has another great side-effect: the button sizes are specified in EMs, not in pixels, so the buttons get resized automatically if you change points-to-pixels ratio.

.btx170,.btx170_down {
  overflow: hidden; cursor: pointer;
  padding: 0 0 2px 0; margin: 0 0; text-align: center;
  font-family: Verdana, Arial, Helvetica, sans-serif; 
  font-weight: 700; font-size: 8pt; }
  
.btx170, .btx170_down { width: 16em; }
.btx170 { background-color: #FFCE63; border: 2px outset #FFCE63; }
.btx170_down { background-color: rgb(255,156,0); 
  border: 2px inset rgb(255,156,0); }

Next task: Remove inline CSS

Refactoring a simple menu

Years ago I had to implement a drop-down menu. Nothing fancy; no open-on-hover magic, but a simple line of buttons with drop-down boxes that would open on clicking the button.

There were just a few annoying details: clicking a top-row button should obviously open a drop-down box, but also change the button’s state to “depressed” and close any other open drop-down box (and change the state of the corresponding buttons).

Here is my five-year-old HTML code …

<div style="float: left; position: relative; z-index: 100;" id="IDAHYJQB">
  <p class="btn170">
    <a href="/climbing/myClimbs/myClimbs.asp">First page</a>
  </p>
</div>
<div style="float: left; position: relative; z-index: 100;" id="IDAKYJQB">
  <script>menuRegister('IDAKYJQB')</script>
  <p class="btn170" id="IDAKYJQB_main">
    <a href="javascript:menuClick('IDAKYJQB')">Add ...</a>
  </p>
  <div style="position: absolute; display: none;" id="IDAKYJQB_sub">
    <p class="btn170">
      <a onclick="menuSelect('IDAKYJQB')" 
        href="/climbing/myClimbs/myClimbs_add.asp">New entry </a>
    </p>
    <p class="btn170">
      <a onclick="menuSelect('IDAKYJQB')" 
        href="/climbing/myClimbs/myClimbs_editWall.asp?a=add">Edit</a>
    </p>
    … more …
  </div>
</div>

… the corresponding CSS …

.btn170, .btn170_down { background-repeat: no-repeat; 
    width: 170px; height: 20px; line-height: 18px; overflow: hidden; 
    padding: 0 0; margin: 0 0; text-align: center;
    font-family: Verdana, Arial, Helvetica, sans-serif; 
    font-weight: 700; font-size: 11px; }

.btn170
  { background-image: url('images/button_170.gif');  }   
.btn170_down
  { background-image: url('images/button_down_170.gif'); }

… and JavaScript code …

var topMenuItems = [] ;

function addClass(id,sfx) {
  var se = getElement(id) ;
  if (se.className.indexOf(sfx) < 0) se.className = se.className + sfx ;
}
function removeClass(id,sfx) {
  var se = getElement(id) ;
  var i = se.className.indexOf(sfx) ;
  if (i > 0) se.className = se.className.substr(0,i) ;
}

function menuShow(id) {
  var se = getElement(id) ; se.menuActive = true ; 
  showElement(id + "_sub") ; addClass(id+"_main","_down"); }

function menuHide(id) { 
  var se = getElement(id) ; se.menuActive = false ; 
  hideElement(id + "_sub") ; removeClass(id+"_main","_down"); }

function menuGo(id,l) { menuHide(id); location.href = l; }
function menuSelect(id) { menuHide(id) ; }

function menuClick(id) {
  var i,se ;
  se = getElement(id) ;
  if (se.menuActive) {
    menuHide(id) ; return ;
  }
  for (i = 0 ; i < topMenuItems.length ; i++) {
    if (topMenuItems[i] != id) menuHide(topMenuItems[i]);
  }
  menuShow(id) ;
}

function menuRegister(id) { 
  topMenuItems[topMenuItems.length] = id ;
}

I will not try to explain what this code does, as it’s way too painful. As I’ll walk through the refactoring process, I’ll show you the changes I’ve made and the stupidities in the original code.

What else could you expect from Google :(

Armed with the minimal Atom-like document Blogger accepts, I wrote an XSL transformation that converted my proprietary news feed format into a Blogger-like Atom-ish format. I used Saxon as the transformation engine, resulting in a nice UTF-8 encoded file that opened easily in Internet Explorer, Firefox and XML Notepad.

The end result: Blogger refused to import the file until I opened it in FrontPage and prettified the XML text. Obviously they have a "custom" XML parser that has problems with some valid variants of XML. But what else can one expect from a company that wants to build everything on its own?

Import into Blogger from an Atom feed

Google has recently added the import functionality to Blogger, making it very easy to migrate from any other information service to Blogger. Unfortunately, the import process rejects anything that looks like it did not come from the Blogger export feature. I did not try to nail down the very minimum that would be required for the import to work, but this is a very minimalistic Atom document that works:

<?xml version='1.0' encoding='utf-8'?>
<feed xmlns='http://www.w3.org/2005/Atom' 
      xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/'  
      xmlns:gd='http://schemas.google.com/g/2005'
      xmlns:thr='http://purl.org/syndication/thread/1.0'>
    <id>tag:blogger.com,1999:blog-123456789.archive</id>
    <updated>2008-11-01T00:00:00.000+01:00</updated>
    <title type='text'>Sample import file</title>
    <author>
        <name>Example.com</name>
        <uri>http://www.blogger.com/profile/12345678</uri>
        <email>noreply@blogger.com</email>
    </author>
    <generator version='7.00' uri='http://www.blogger.com'>Blogger</generator>
    <entry>
        <id>tag:blogger.com,1999:blog-123456789.post-12345678</id>
        <published>2008-11-01T00:00:00.000+01:00</published>
        <updated>2008-11-01T00:00:00.000+01:00</updated>
        <category scheme='http://schemas.google.com/g/2005#kind' 
          term='http://schemas.google.com/blogger/2008/kind#post'/>
        <category scheme='http://www.blogger.com/atom/ns#' term='CatItem'/>
        <title type='text'>Post title</title>
        <content type='html'>Post text, HTML needs to be escaped</content>
        <author>
            <name>Example.com</name>
            <uri>http://www.ioshints.info/about</uri>
            <email>noreply@blogger.com</email>
        </author>
        <thr:total>0</thr:total>
    </entry>
</feed>

Poor man's AJAX: Browser-side XSLT transformation in IFRAME

One of the oldest methods to provide AJAX-like functionality in a browser that does not support XmlHttpRequest object is loading the dynamic content into a hidden IFRAME. It’s an unreliable technique that should not be used, but it can still provide a viable workaround in scenarios where you need to perform browser-side XSLT transformation in browsers with lousy JavaScript-based XSLT support (it looks like Chrome is still in this category).

This is a conceptual step-by-step description of the process:

  • Create a hidden IFRAME in the main page.
  • Define a callback function in the main page that will receive the results of the transformation. This callback function will have to insert the transformation results into the main page’s HTML/DOM structure.
  • When you need to download additional content, set the IFRAME’s src attribute to the URL of the dynamic content.

You should use a timeout in the main page to capture failed loads of the dynamic content. The timeout code should display an error message and kill the download in the IFRAME by resetting its src attribute.

  • The dynamic content loaded into the IFRAME should have XML MIME type and contain the xml-stylesheet processing instruction. This will cause the browser to perform XSLT transformation on the XML data.
  • The resulting HTML should include JavaScript call of the callback function in the parent frame, for example <body onload="parent.callback(document.body.innerHTML)">

Alternatively, you can use the onload handler in the IFRAME element to call the callback function.
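The steps above can be sketched as follows (browser-side JavaScript; the element IDs, loadDynamic and TIMEOUT_MS are illustrative names, not part of any library):

```javascript
var IFRAME_ID = 'hiddenLoader';    // hypothetical ID of the hidden IFRAME
var TIMEOUT_MS = 15000;            // give up on the load after 15 seconds
var loadTimer = null;

// Callback invoked by the transformed page via parent.callback(...);
// it inserts the transformation results into the main page.
function callback(html) {
  clearTimeout(loadTimer);
  document.getElementById('target').innerHTML = html;  // hypothetical target
}

function loadDynamic(url) {
  var frame = document.getElementById(IFRAME_ID);
  frame.src = url;                 // start loading the XML document
  loadTimer = setTimeout(function() {
    frame.src = 'about:blank';     // kill the stuck download
    alert('Could not load dynamic content');
  }, TIMEOUT_MS);
}
```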

Firefox forgets to create document.body object on Linux

If you use client-side XSLT transformations driven by xml-stylesheet pseudo-instruction in XML documents, you might encounter interesting problems when using Firefox 2 on Linux. When the  HTML page is created with XSLT transformation, Firefox does not create the document.body object, causing JavaScript libraries (for example, jQuery) to break.

Workaround: add an ID to your body tag (for example, <body id="body">) and fix the document.body object after the DOM is ready. To do it in jQuery, use the following code:

$(function() { if (!document.body) document.body = $('#body').get(0); });

XSLT transformation in ASP: non-standard XML encoding

If you want to generate documents with non-standard encodings with server-side XSLT transformation in IIS/ASP environment, you should:

  • Set the Response.Charset property to the desired character set.
  • Set the Response.Codepage property to the desired code page (or use the <%@ LANGUAGE="VBScript" CODEPAGE="codepage" %> directive in your ASP script).
  • Omit the XML declaration from the transformed text (using the omit-xml-declaration="yes" attribute on the xsl:output element) and prepend the desired XML declaration in front of the transformed text.

For example, the following program generates XML (or XHTML) document encoded in windows-1250 character set (codepage: 1250) …

<%
Const DOMClass = "MSXML2.DOMDocument"
Set XSLT = Server.CreateObject(DOMClass)
Set XDoc = Server.CreateObject(DOMClass)
XDoc.loadXML("<root greek='&#946;' ee='č' />")
XSLT.load(Server.MapPath("SampleXSLT.xsl"))
Response.Clear
Response.Charset = "windows-1250"
Response.Codepage = 1250
Response.ContentType = "text/xml"
Response.Write "<?xml version='1.0' encoding='windows-1250'?>"
Response.Write XDoc.transformNode(XSLT)
%>

… when using the following XSL transformation:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" omit-xml-declaration="yes" />
… rest deleted …

XSLT transformation in ASP: XML with MSXML6

Similar to the XML-to-HTML transformation with MSXML6, the results of the transformNode function do not include the encoding information; the XML text produced by the sample program with the XSL transformation described in the previous post is a perfect XML document:

<?xml version="1.0"?>
<output>
  Greek letter: β 
  EE: č
</output>

An XML document without the encoding attribute in the xml declaration is assumed to be encoded in UTF-8. The results of the transformNode function in MSXML6 are thus absolutely correct if you use UTF-8 output encoding.

XSLT transformation in ASP: XML with MSXML3

You might consider server-side XML-to-XML transformations rare, but you have to use <xsl:output method="xml"> if you want to generate valid XHTML code from your XML data. To shorten the printouts, we’ll use a simple (non-XHTML) XSL transformation to generate the test results:

<?xml version="1.0" encoding="utf-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    
<xsl:output method="xml" encoding="utf-8" />

<xsl:template match="root">
  <output>
    Greek letter: <xsl:value-of select="@greek" /> 
    EE: <xsl:value-of select="@ee" />
  </output>
</xsl:template> 

</xsl:stylesheet>

As with HTML, MSXML3 interferes with the output, inserting improper UTF-16 encoding directive, resulting in the following XML text:

<?xml version="1.0" encoding="UTF-16"?>
<output>
  Greek letter: β 
  EE: č
</output>

You could bypass this bug by setting the omit-xml-declaration attribute of the xsl:output element to yes

<?xml version="1.0" encoding="utf-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    
<xsl:output method="xml" encoding="utf-8" omit-xml-declaration="yes" />

… rest deleted …

… resulting in the following transformation output:

<output>
  Greek letter: β 
  EE: č
</output>

However, if you want to retain XML declaration in the XML document, you have to replace UTF-16 in the output string with UTF-8, like we did in the HTML transformation case. The following modified test program produces perfect XML document when used with the original XSLT transformation (without the omit-xml-declaration attribute):

<%
Const DOMClass = "MSXML2.DOMDocument"

Set XSLT = Server.CreateObject(DOMClass)
Set XDoc = Server.CreateObject(DOMClass)

XDoc.loadXML("<root greek='β' ee='č' />")
XSLT.load(Server.MapPath("SampleXSLT.xsl"))

Response.Clear
Response.Charset = "utf-8"
Response.Codepage = 65001
Response.Write Replace(XDoc.transformNode(XSLT), _
  "encoding=""UTF-16""","encoding=""utf-8""")
%>

XSLT transformation in ASP: HTML with MSXML6

If your IIS platform includes MSXML6, you should use MSXML6 to transform XML into HTML instead of MSXML3 (see the Using the right version of MSXML article for proper fallback process). MSXML6 still generates the META tag (the MSXML3-related post describes problems caused by the META tag), but does not include the charset parameter in it, resulting in perfect HTML output. The sample ASP program using the simple XSLT transformation from the “XSLT transformation in ASP: HTML with MSXML3” post produces the following output when using MSXML6 (MSXML2.DomDocument.6.0 ProgID):

<html>
<head>
<META http-equiv="Content-Type" content="text/html">
<title>Sample XSLT server-side transformation</title>
</head>
<body><node type="test">Greek letter: β EE: č</node></body>
</html>

If you want to use complex XSLT transformations with MSXML6, you might have to set additional second-level DOM properties, for example AllowDocumentFunction. You might also want to set ValidateOnParse to false if the validity of your XML document is not a major concern.

XSLT transformation in ASP: HTML with MSXML3

Most commonly, you’ll use server-side XSLT transformation in ASP to transform XML data into HTML using MSXML3 (available on almost all IIS platforms). The following XSLT stylesheet is used in the HTML test:

<?xml version="1.0" encoding="utf-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    
<xsl:output method="html" encoding="utf-8" />

<xsl:template match="root">
  <html>
    <head>
      <title>Sample XSLT server-side transformation</title>
    </head>
    <body>
      <node type="test">
        Greek letter: <xsl:value-of select="@greek" /> 
        EE: <xsl:value-of select="@ee" />
      </node>
    </body>
  </html>
</xsl:template> 

</xsl:stylesheet>

With MSXML3, the transformNode function inserts a META directive in the output stream, resulting in the following transformation result:

<html>
<head>
<META http-equiv="Content-Type" content="text/html; charset=UTF-16">
<title>Sample XSLT server-side transformation</title>
</head>
<body><node type="test">
Greek letter: β 
EE: č
</node>
</body>
</html>

On the other hand, the HTTP headers generated by the test ASP script claim the content is UTF-8 encoded (and the raw data dump performed with Fiddler confirms that). The response headers were taken from LiveHTTPHeaders Firefox extension:

HTTP/1.x 200 OK
Server: Microsoft-IIS/5.1
Date: Mon, 15 Sep 2008 11:20:56 GMT
X-Powered-By: ASP.NET
Content-Length: 222
Content-Type: text/html; Charset=utf-8
Cache-Control: private

Most browsers (including IE and Firefox) take the Content-type from HTTP headers, using the META directive as a fallback. They thus render the page as intended. Some browsers (including the TextView tab of Fiddler) prefer the META directive and produce garbage.

The solution

To match the META header with the HTTP headers, replace the charset=UTF-16 string in the text returned by the transformNode function with charset=UTF-8. The modified sample ASP script is included below:

<%
Const DOMClass = "MSXML2.DOMDocument"

Set XSLT = Server.CreateObject(DOMClass)
Set XDoc = Server.CreateObject(DOMClass)

XDoc.loadXML("<root greek='β' ee='č' />")
XSLT.load(Server.MapPath("SampleXSLT.xsl"))

Response.Clear
Response.Charset = "utf-8"
Response.Codepage = 65001
Response.Write Replace(XDoc.transformNode(XSLT),"charset=UTF-16","charset=UTF-8")
%>

XSLT transformation in ASP: The testbed

As I’ve mentioned in the previous post, MSXML functions called from ASP believe they work in UTF-16 environment. The transformNode function might insert this information in the output string based on several parameters, one of them being the version of the MSXML ActiveX control. To test behavior of various versions of MSXML, we’ll use the following test program:

<%
Const DOMClass = "MSXML2.DOMDocument"

Set XSLT = Server.CreateObject(DOMClass)
Set XDoc = Server.CreateObject(DOMClass)

XDoc.loadXML("<root greek='β' ee='č' />")
XSLT.load(Server.MapPath("SampleXSLT.xsl"))

Response.Clear
Response.Charset = "utf-8"
Response.Codepage = 65001
Response.Write XDoc.transformNode(XSLT)
%>

And a stylesheet similar to this one:

<?xml version="1.0" encoding="utf-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    
<xsl:output method="xml" encoding="utf-8" />

<xsl:template match="root">
  <output>
    <node type="test">
      Greek letter: <xsl:value-of select="@greek" /> 
      EE: <xsl:value-of select="@ee" />
    </node>
  </output>
</xsl:template> 

</xsl:stylesheet>

We’ll test two MSXML versions: MSXML3 (default with Windows XP) and MSXML6 (default with Vista, also available on Windows XP). MSXML3 should be available on almost all web servers; you might not get MSXML6 everywhere (my hosting provider did not offer it when it mattered most to me).

The test results will be published in the next few posts.

XSLT transformation in ASP: the principles

If you want to perform server-based XSLT transformations on IIS using classic ASP, you should keep in mind the following facts:

  • ASP uses 16-bit Unicode characters internally. MSXML ActiveX modules called from ASP therefore think they work in UTF-16 environment.
  • Due to the impression that the XSLT transformations happen in UTF-16 environment, MSXML procedures might prepend UTF-16 specific headers in transformed text (more about that in follow-up posts).
  • The transformed text returned to ASP from transformNode function is UTF-16 encoded.
  • When the ASP script sends the text to the browser (or writes it into a text file), the codepage set with Response.Codepage is used to transform UTF-16 into target character set.

Yahoo is not totally HTTP compliant ... what else is new?

An article in Cisco's support wiki caught my attention today: it claims that Cisco routers could deny access to yahoo.com because Yahoo!'s web servers emit invalid chunked encoding. Interesting ... so I've started Fiddler and opened Yahoo!'s home page. This is what I've got:
HTTP/1.1 200 OK
Date: Sun, 14 Sep 2008 13:42:11 GMT
P3P: policyref="http://p3p.yahoo.com/w3c/p3p.xml", CP="CAO DSP COR CUR ADM DEV TAI PSA PSD IVAi IVDi CONi TELo OTPi OUR DELi SAMi OTRi UNRi PUBi IND PHY ONL UNI PUR FIN COM NAV INT DEM CNT STA POL HEA PRE GOV"
Cache-Control: private
Vary: User-Agent
Set-Cookie: IU=deleted; expires=Sat, 15 Sep 2007 13:42:10 GMT; path=/; domain=.yahoo.com
Set-Cookie: FPCM=deleted; expires=Sat, 15 Sep 2007 13:42:10 GMT; path=/
Set-Cookie: D=_ylh=X3oDMTFkbmtlMG9nBF9TAzI3MTYxNDkEcGlkAzEyMjEzOTczMjEEdGVzdAMwBHRtcGwDaW5kZXgtbA--; path=/; domain=.yahoo.com
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html; charset=utf-8
Content-Encoding: gzip

7b17   
As you can see, there are actually three blanks after the chunk length, which is a clear violation of section 3.6.1 of RFC 2616 (HTTP): the chunk size should be followed by a semicolon (introducing chunk extensions) or by CRLF. I'm not really surprised; the crappy implementation of FrontPage extensions on Geocities (and the fact that they wanted to charge me for it) pushed me toward my own web site six years ago.
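The rule from section 3.6.1 is easy to express in code; here’s a small checker (a sketch; the function name is mine) for the first line of a chunk, assuming the line is taken up to but not including the CRLF:

```javascript
// Returns true if a chunk-header line conforms to RFC 2616 section 3.6.1:
// one or more hex digits, optionally followed by chunk extensions that
// start with a semicolon. Trailing blanks (as sent by Yahoo) are invalid.
function isValidChunkHeader(line) {
  return /^[0-9A-Fa-f]+(;.*)?$/.test(line);
}

console.log(isValidChunkHeader('7b17'));       // true
console.log(isValidChunkHeader('7b17;ext'));   // true
console.log(isValidChunkHeader('7b17   '));    // false - Yahoo's three blanks
```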

JavaScript: the good parts

If your brain hasn't been polluted with awful JavaScript practices yet (or if you're willing to submit to a proper brainwashing), read the JavaScript: The Good Parts book by Douglas Crockford. He covers what he believes (and he's usually right) are the good parts of JavaScript, resulting in scalable, easy-to-maintain code. Although the book targets beginner audiences (or programmers not yet familiar with JavaScript), it's an interesting read for anyone who wants to write better JavaScript code. Even I found a few interesting topics (for example, multiple ways to implement object constructors and inheritance) after working with JavaScript for 15+ years.

Chrome: first impressions

I've just downloaded Chrome (the new browser from Google), tested it on my applications and got immediately impressed - they all worked, even the client-side XSLT transformation driven by the xsl-stylesheet directive. Then I did a few more random tests and my enthusiasm was drastically reduced:
  • The inter-line spacing on table heading texts was way too large: <th valign='bottom'>Line 1<br />Line 2</th> produced a blank line between the two text lines. I did not investigate what the root cause might be as all the other major browsers render it almost identically, so I don't really care what upset Chrome.
  • The top line of our corporate Wiki (the Login text) is misplaced.
  • The View source window does not display processing instructions in XML documents.
  • And the worst offender: Blogger in draft works way better, faster and more reliably in Firefox or Internet Explorer than in Chrome.
Summary: I will have Chrome installed to test my applications (some visitors are already using it), but will not use it in the near future.

jQuery: Read This First

If you're paid for your programming efforts (or if you put even a marginal value on your time), I would strongly recommend that you buy and read a few jQuery-related books before jumping into the code writing. This approach is usually way more efficient (and gives you a broader picture) than relying on snippets provided by good uncle Google. If, on the other hand, you believe everything should be free, be my guest ... but then your time is probably worth approximately as much :)

The following two books were a perfect fit for my level of JavaScript/CSS experience. The first one is a great step-by-step introduction to jQuery (focusing on jQuery, not on mundane JavaScript or CSS details) and the second one serves me as a great paper reference (I am old enough to prefer paper to pixels).

jQuery.convert++

Readers of articles I wrote for InformIT have probably noticed that I've avoided using high-level JavaScript libraries and relied on thin browser abstraction layers: Sarissa for AJAX and X library for cross-browser DOM/DHTML compatibility.

Recently I've decided to try jQuery and got persuaded within a day. My new projects are using jQuery ... and you'll see a number of reasons in my future blog posts. Repetitive operations that I had to code previously became much simpler and more streamlined with jQuery functions.

Poor man's capitalization in JavaScript

I wanted to have simple capitalization in JavaScript (the first letter of the string should be upper-case, the rest of the string unchanged). Although the String object provides uppercase and lowercase functions, there is no capitalization function. The simplest expression I could come up with is this:
txt.substr(0,1).toUpperCase()+txt.substr(1)
Do you have a better solution?
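Not much better, but charAt reads slightly cleaner; wrapped into a reusable function (a sketch, with an empty-string guard added):

```javascript
// Capitalize the first letter, leave the rest of the string unchanged.
function capitalize(txt) {
  if (!txt) return txt;   // handle empty string (and null) safely
  return txt.charAt(0).toUpperCase() + txt.substr(1);
}

console.log(capitalize('hello world'));   // Hello world
```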

Testing Your Website in a Realistic Environment (InformIT article)

The last of my website-performance articles published by InformIT, Testing Your Website in a Realistic Environment, deals with an interesting question: "assuming you've fixed most of the performance problems your web site had, how can you test what your global visitors experience without buying a round-the-world ticket?"

Fix Your Web Site Performance Problems (InformIT article)

If you've realized that you might have HTTP-related performance problems when reading my Why is my web site so slow article published by InformIT, you can find a variety of quick fixes and more permanent solutions in my Fix Your Web Site Performance Problems article (also published by InformIT).

Multiple style attributes in IE and FF

I've just stumbled across an interesting discovery today: if you use the style attribute multiple times in a single HTML tag (which you should not do, BTW, but it could happen if you write HTML code by hand), Internet Explorer will merge the style definitions whereas Firefox will ignore the second style attribute.

Why is my web site so slow (InformIT article)

If you've been involved in more than a few website deployments, I'm positive that you've encountered the following scenario: The website was developed, tested, demonstrated to internal audiences, accepted, deployed, …and failed completely due to unacceptable performance. In my InformIT article Why is my web site so slow I'm describing various reasons that can cause unacceptable performance for global visitors of your web.

Section breaks in Office Open XML and WordML

Identifying the section breaks in a pure Office Open XML (OOXML) document is the ultimate nightmare: the only indication of a section break is the presence of a w:sectPr element within the last paragraph of the section. To complicate matters further, the last section in the document could be represented by a w:sectPr element that is a sibling of the w:p elements … and of course you could have documents without sections, in which case there would be no w:sectPr element anywhere in the XML. Just try to imagine writing an XSLT stylesheet that performs OOXML-to-HTML translation and splits the Word text into DIVs (a DIV for each section).

Fortunately, the task is much easier if you use WordML, which contains auxiliary hints in the wx namespace; in our case, the wx:sect element, which encloses all the paragraphs within a section.

For example, the following Word text … generates this WordML markup (to get the corresponding OOXML, remove the wx:sect elements).

Create post excerpts in Blogger

One of the features sorely missing in Blogger is the ability to write an excerpt for your post (Wordpress supports several different methods), so I had to write my own JavaScript solution that provides functionality similar to the more tag in Wordpress. It hides parts of the post’s text and displays a More button which reveals the hidden text. The hidden text is enclosed within SCRIPT tags:

<script>startHide()</script>
… extra text …
<script>endHide()</script>

I’m including a JavaScript library from one of my web sites into the Blogger template. If you want to have a Blogger-only solution, include the following JavaScript code in your template:

var isMainPage = 0;  // set to 1 (via setMainPage) on the blog's main page
var hideCount = 0;   // counts the hidden blocks on the page

function dw(t) { document.write(t); }

function setMainPage() { isMainPage = 1; }

// Starts a hidden block. On individual post pages the text is shown
// immediately; on the main page it's hidden and a "More ..." link is shown.
function startHide() {
  hideCount++;
  if (!isMainPage) { dw('<div id="show_' + hideCount + '">'); return; }
  dw('<p class="hideMenu" id="hideMenu_' + hideCount +
    '"><a href="javascript:showRest(' + hideCount + ')">More ...</a></p>');
  dw('<div id="hide_' + hideCount + '" style="display: none;">');
}

// Reveals the hidden block and hides the corresponding "More ..." link.
function showRest(id) {
  var e = document.getElementById('hide_' + id);
  if (e && e.style) e.style.display = "";
  e = document.getElementById('hideMenu_' + id);
  if (e && e.style) e.style.display = "none";
}

// Closes the DIV opened by startHide.
function endHide() {
  dw('</div>');
}

Spelling errors might break your WordML translations

The code that generates a single PRE block from multiple Word paragraphs worked perfectly … until Word decided I had made a spelling error in one of the listings. Based on where it thinks the error is, Word can generate a w:proofErr element between the w:p elements, breaking my code, which assumed that the w:p elements are adjacent siblings.

To fix this problem, I had to change the following-sibling::*[1] expression into following-sibling::w:p[1] throughout the affected code. The fixed templates are included below:

<xsl:template match="w:p[w:pPr/w:pStyle/@w:val = 'code']">
<xsl:if test="not(preceding-sibling::w:p[1]/w:pPr/w:pStyle/@w:val = 'code')">
<pre class='{w:pPr/w:pStyle/@w:val}'>
<xsl:apply-templates select='.' mode="pre" />
</pre>
</xsl:if>
</xsl:template>

<xsl:template match="w:p[w:pPr/w:pStyle/@w:val = 'code']" mode="pre">
<xsl:apply-templates />
<xsl:if test="following-sibling::w:p[1]/w:pPr/w:pStyle/@w:val = 'code'">
<xsl:text>&#x0a;</xsl:text>
<xsl:apply-templates mode="pre" select="following-sibling::w:p[1]" />
</xsl:if>
</xsl:template>

<xsl:template match="*" mode="pre" />

Check for empty string or empty node-set

After my close encounter with the ternary logic of XSLT (details are here), I started worrying about the results of every test that could involve empty elements (depending on how you phrase the test, an empty string might not be equal to an empty node-set). To ensure that I’m comparing strings even when one of the values might be empty, I’m using the string XSLT function, which converts whatever input it gets into a string (and a string can be reliably compared to another string). For example, to test whether the current element’s name attribute is empty or missing, use this test:

<xsl:if test="string(@name) = '' ">

WordML: Translate spaces in fixed-font text

I had problems with Blogger formatting, so I’ve decided to translate all fixed-font spaces in my Word texts into non-breaking spaces in the translated Blogger-ready HTML. To do this conversion, I had to find the font of the current range … but it could be stored in the range properties, the character style or the paragraph style, and the range properties could specify a proportional font that overrides a fixed font set in the character or paragraph style.

I’ve defined the xsl:key instructions that extract the paragraph or character style fonts …

<xsl:key name="parafont" match="w:rFonts/@w:ascii" 
use="ancestor::w:style[@w:type = 'paragraph']/@w:styleId" />
<xsl:key name="rangefont" match="w:rFonts/@w:ascii"
use="ancestor::w:style[@w:type = 'character']/@w:styleId" />

… and used them in a pretty complex xsl:choose statement (it has to handle quite a few cases) in the w:t (text-within-range) template:

<xsl:template match="w:t/text()">
<xsl:variable name="pfont"
select="key('parafont',ancestor::w:p/w:pPr/w:pStyle/@w:val)" />
<xsl:variable name="rfont"
select="key('rangefont',ancestor::w:r/w:rPr/w:rStyle/@w:val)" />
<xsl:variable name="font" select="ancestor::w:r/w:rPr/w:rFonts/@w:ascii" />
<xsl:variable name="xlate" select="translate(.,' ','&#160;')" />
<xsl:choose>
<xsl:when test="contains($font,'Courier')">
<xsl:value-of select="$xlate" /></xsl:when>
<xsl:when test="string($font) != ''">
<xsl:value-of select="." /></xsl:when>
<xsl:when test="contains($rfont,'Courier')">
<xsl:value-of select="$xlate" /></xsl:when>
<xsl:when test="string($rfont) != ''">
<xsl:value-of select="." /></xsl:when>
<xsl:when test="contains($pfont,'Courier')">
<xsl:value-of select="$xlate" /></xsl:when>
<xsl:otherwise><xsl:value-of select="." /></xsl:otherwise>
</xsl:choose>
</xsl:template>

WordML: Extract font from paragraph style

The paragraph styles in WordProcessingML have paragraph properties (w:pPr element) and range properties (w:rPr element). The font of a paragraph style is stored in the range properties (w:rPr/w:rFonts/@w:ascii element), as shown in the following snapshot from XML Notepad:

The way to extract the font name from a paragraph style based on the style name is very similar to the character style case: define an xsl:key that matches the paragraph styles (w:type = 'paragraph') with the given name and use it to extract the font name.

<xsl:key name="parafont" match="w:rFonts/@w:ascii"
use="ancestor::w:style[@w:type = 'paragraph']/@w:styleId" />

WordML: generate PRE element from multiple paragraphs

In my Word-to-Blogger converter, I wanted to convert a block of paragraphs with the code style into a single PRE element. As Word does not generate a grouping of paragraphs of the same style, I had to develop a pretty convoluted solution:

  • A special template matches paragraphs with the code style.
  • The template performs the translation only if the preceding paragraph does not have the same style. This ensures the subsequent paragraphs with the code style don’t generate extra PRE elements.
  • The template generates the PRE element and sends the current element through another translation using pre mode.
<xsl:template match="w:p[w:pPr/w:pStyle/@w:val = 'code']">
<xsl:if test="not(preceding-sibling::w:p[1]/w:pPr/w:pStyle/@w:val = 'code')">
<pre class='{w:pPr/w:pStyle/@w:val}'>
<xsl:apply-templates select='.' mode="pre" />
</pre>
</xsl:if>
</xsl:template>

The paragraph matching with mode="pre" is quite simple:

  • Child elements are processed (producing translated paragraph text).
  • If the following sibling has the code style (we haven’t reached the end of the PRE block), a newline is appended with the xsl:text instruction and the sibling (the next code paragraph) is translated with mode="pre".
<xsl:template match="w:p[w:pPr/w:pStyle/@w:val = 'code']" mode="pre">
<xsl:apply-templates />
<xsl:if test="following-sibling::*[1]/w:pPr/w:pStyle/@w:val = 'code'">
<xsl:text>&#x0a;</xsl:text>
<xsl:apply-templates mode="pre" select="following-sibling::*[1]" />
</xsl:if>
</xsl:template>

The default template for mode="pre" is empty, ensuring that non-paragraph nodes accidentally processed with mode="pre" do not generate output text.

<xsl:template match="*" mode="pre" />

The solution would have been simpler had I used the wx:pBdrGroup element that Word inserts around my code paragraphs (I’ve configured a border on the paragraph style), but the wx:pBdrGroup-based approach would break if someone decided to change the border of the code style.

WordML: extract font from character style

The font of a character style (the style of a range of text, not the whole paragraph) is stored in the w:styles/w:style/w:rPr/w:rFonts/@w:ascii element, as shown in the following snapshot from XML Notepad:

To get the character font from the style name, use the following xsl:key definition:

<xsl:key name="rangefont" match="w:rFonts/@w:ascii"
use="ancestor::w:style[@w:type = 'character']/@w:styleId" />

You should check the style type in the key definition to ensure that the key matches only the character styles.

Later on, you can use the rangefont key to check the font in a range-matching template. For example, to check whether the current range uses a Courier font, use the following xsl:choose block:

<xsl:choose>
<xsl:when test="contains(w:rPr/wx:font/@wx:val,'Courier')">
<!-- fixed-font specified in the range -->
</xsl:when>
<xsl:when test="contains(key('rangefont',w:rPr/w:rStyle/@w:val),'Courier')">
<!-- fixed-font specified in the character style -->
</xsl:when>
<xsl:otherwise>
<!-- not a fixed font -->
</xsl:otherwise>
</xsl:choose>

Drawing charts in your web pages

Here are a few links that will help you draw great charts in your web pages:

Open XML text file in Microsoft Word

The Word-to-Wiki converter macro I’ve described in one of the previous posts (developed in Word 2003) worked perfectly, but when I wanted to add a Word-to-Blogger macro (along the lines of the Word-to-Wiki concept, but with a different XSLT), things got complex. I didn’t want to have any whitespace between the P tags generated by the XSLT (Blogger interprets line breaks as implicit <br /> tags), so I wanted to generate XML, not HTML … only to find out that the default text converter used by Microsoft Word (wdOpenFormatAuto) …

Documents.Open FileName:=TxtPath, ConfirmConversions:=False, _
ReadOnly:=False, AddToRecentFiles:=False, _
Format:=wdOpenFormatAuto, Encoding:=65001

… removes the XML tags (leaving only the text nodes) when importing XML files as text. Next I’ve tried the wdOpenFormatText converter, only to find out that it cannot handle Unicode text. Great news … Finally I’ve managed to get exactly what I needed with the wdOpenFormatUnicodeText converter and msoEncodingUTF8 encoding:

Documents.Open FileName:=TxtPath, ConfirmConversions:=False, _
ReadOnly:=False, AddToRecentFiles:=False, _
Format:=wdOpenFormatUnicodeText, Encoding:=msoEncodingUTF8

xsl:element generates an extra linebreak

In a previous post I’ve described how you can generate dynamic output elements with the xsl:element instruction. Unfortunately, xsl:element generates a newline before the element name (in Saxon and in MSXML) if you use HTML output (xsl:output with method="html"), resulting in extra line breaks in Blogger (I was developing the Word-to-Blogger converter). The only workaround is to use method="xml" with no XML declaration:

<xsl:output method="xml" omit-xml-declaration="yes" />

I didn’t want to have the <?xml ?> heading in the document as I’m pasting the transformation results straight into Blogger.

Query-string-based revision control

One of the easy ways to improve the perceived response time of your web site is to ensure that your web server sets explicit Expires: HTTP header on the static web page components (JavaScript libraries, CSS files, images …), thus reducing the number of HTTP requests sent by the browser. However, if you change your JavaScript code or CSS, your visitors could be stuck with the old version for a long time.

If you use static HTML and a decent development environment, you can easily rename the JavaScript or CSS files (and the HTML pages get updated as a side effect). For more complex environments, you could use an easy trick: append the revision number as a query string after the file name:

<link href="myStyle.css?57" rel="stylesheet" type="text/css" />
<script src="myCode.js?42" type="text/javascript"></script>

Most web servers ignore the query string after the name of a static file, but browsers perform the caching on the whole URL (so x.js?1 is different from x.js?2).
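The trick can be captured in a one-line helper; a minimal sketch (the function name versioned is my own illustration, not code from the site):

```javascript
// Append the revision number as a query string so a revision bump forces
// browsers to fetch the new file (different URL, different cache entry).
// Helper name is my own, not from the post.
function versioned(url, revision) {
  return url + '?' + revision;
}
```

For example, versioned('myCode.js', 42) returns 'myCode.js?42', ready to be written into a SCRIPT or LINK tag.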

When boolean logic has a maybe value

When doing Word-to-Blogger conversion, I wanted to identify a group of code paragraphs and convert them into a single PRE element (Word-to-MediaWiki is simpler, as you just prepend a single space to the text and MediaWiki merges all code lines together). The original idea was pretty simple:
  • Match w:p elements with w:pPr/w:pStyle/@w:val = 'code';
  • Check whether the previous w:p element also has the code style; if not, emit the PRE element and handle the whole group of code paragraphs.

The initial code worked ...

<xsl:template match="w:p[w:pPr/w:pStyle/@w:val = 'code']">
  <xsl:if test="preceding-sibling::w:p[1]/w:pPr/w:pStyle/@w:val != 'code'">
    <pre class='{w:pPr/w:pStyle/@w:val}'>
      <xsl:apply-templates select='.' mode="pre" />
    </pre>
  </xsl:if>
</xsl:template>

... but only until I've decided to add a border around the code style in Word. The border creates a wx:pBdrGroup in WordProcessingML ...

... and the xsl:if test fails when there is no preceding w:p element: in XPath, a comparison between an empty node-set and a string is false regardless of the operator, so the != test cannot detect the missing sibling. To make the code work, I had to reverse the test condition and wrap it in not():

<xsl:template match="w:p[w:pPr/w:pStyle/@w:val = 'code']">
  <xsl:if test="not(preceding-sibling::w:p[1]/w:pPr/w:pStyle/@w:val = 'code')">
... rest of code ...
  </xsl:if>
</xsl:template>

Automatic generation of HTML/WikiMarkup from Word

After developing the XSLT translations that convert WordProcessingML (Microsoft Word XML) into desired HTML or MediaWiki markup, I wanted to integrate them tightly into Word, so I wrote a Word macro that automatically:
  • Performs the conversion using the specified XSLT file;
  • Opens the new document as text;
  • Selects the converted text and copies it to the clipboard;
  • Reopens the original file;
  • Informs the user that the clipboard contains converted text, ready to be pasted into Blogger or MediaWiki.

Here is the source code for the macros:

Sub SaveAsWiki()
'
' SaveAsWiki Macro
' Save Word document as Wiki markup
'
    ConvertWithXSL "createWikiMarkup.xsl"
    MsgBox "The Wiki Markup is on the clipboard", vbOKOnly, "Done"
End Sub

Sub SaveAsBlogger()
'
' SaveAsBlogger: save word document as blogger-optimized HTML markup
'
    ConvertWithXSL "createBloggerMarkup.xsl"
    MsgBox "The Blogger Markup is on the clipboard", vbOKOnly, "Done"
End Sub

Sub ConvertWithXSL(XSLFile As String)
'
' ConvertWithXSL: converts the Word document with specified XSL
'
    Dim XPath, DocName, TxtPath
    
    XPath = ActiveDocument.AttachedTemplate.Path & _
            Application.PathSeparator & XSLFile
    
    TxtPath = Environ("TEMP")
    If TxtPath <> "" Then TxtPath = TxtPath & "\"
    TxtPath = TxtPath & "wmk.txt"
    
    DocName = ActiveDocument.FullName
    If MsgBox("The conversion process will lose all changes you've made. " & _
              "You have to save the document before running the conversion. " & _
              "Did you do it?", vbYesNo, "Warning") <> vbYes Then Exit Sub
              
    With ActiveDocument
        .XMLSaveDataOnly = False
        .XMLUseXSLTWhenSaving = True
        .XMLSaveThroughXSLT = XPath
        .XMLHideNamespaces = False
        .XMLShowAdvancedErrors = False
        .XMLSchemaReferences.HideValidationErrors = False
        .XMLSchemaReferences.AutomaticValidation = True
        .XMLSchemaReferences.IgnoreMixedContent = False
        .XMLSchemaReferences.AllowSaveAsXMLWithoutValidation = True
        .XMLSchemaReferences.ShowPlaceholderText = False
    End With
    ActiveDocument.SaveAs _
        FileName:=TxtPath, FileFormat:=wdFormatXML, _
        AddToRecentFiles:=False
    ActiveDocument.Close
    
    Documents.Open FileName:=TxtPath, ConfirmConversions:=False, _
        ReadOnly:=False, AddToRecentFiles:=False, _
        Format:=wdOpenFormatAuto, Encoding:=65001
    Selection.WholeStory
    Selection.Copy
    ActiveDocument.Close
    
    Documents.Open FileName:=DocName
End Sub

XSLT: Generate heading from WordProcessingML

Microsoft Word (and WordProcessingML) treats all paragraphs equally (all of them are represented as w:p elements) and it's the job of the XSLT programmer to separate headings (the paragraphs whose paragraph style contains w:pPr/w:outlineLvl/@w:val) from the regular text.

I've started by defining a key that extracts the outline level from the paragraph style associated with the current w:p element:
<xsl:key name="outline" match="w:outlineLvl" use="ancestor::w:style[@w:type = 'paragraph']/@w:styleId" />
If successful, this key would return the w:outlineLvl element whose ancestor w:style element has the specified w:styleId attribute (and, yes, it took me five minutes to figure out what I've been doing when I've revisited the code after three months). The key is then used in a template that uses xsl:element to create a Hx or P element:
<xsl:template match="w:p">
  <xsl:variable name="paraStyle" select="w:pPr/w:pStyle/@w:val" />
  <xsl:variable name="outLvl"
    select="key('outline',w:pPr/w:pStyle/@w:val)/@w:val" />
  <xsl:variable name="elName">
    <xsl:choose>
      <xsl:when test="$outLvl">h<xsl:value-of select="$outLvl + 1" /></xsl:when>
      <xsl:otherwise>p</xsl:otherwise>
    </xsl:choose>
  </xsl:variable>
  <xsl:element name="{$elName}">
... rest of translation code ...
  </xsl:element>
</xsl:template>

XSLT: Use specific matches instead of xsl:choose

The if-then-else construct in XSLT 1.0 is "somewhat baroque"; it's thus easier (and probably also faster) to use other methods to select among alternatives, for example specific matches in the xsl:template match attribute.

In one of my recent WordProcessingML (Microsoft Word XML) XSLT projects I had to transform paragraphs with the cite style into blockquote, those with code style into pre and the rest of them into regular paragraphs. Instead of using a complex xsl:choose structure, I've defined three templates:
<xsl:template match="w:p[w:pPr/w:pStyle/@w:val = 'cite']">
  <blockquote class="{w:pPr/w:pStyle/@w:val}">
    <xsl:apply-templates />
  </blockquote>
</xsl:template>
 
<xsl:template match="w:p[w:pPr/w:pStyle/@w:val = 'code']">
  <pre class="{w:pPr/w:pStyle/@w:val}">
    <xsl:apply-templates />
  </pre>
</xsl:template>
 
<xsl:template match="w:p">
... regular code goes here ...
</xsl:template>

PHP startup kit

Recently I've got involved in deploying Open Source Web 2.0 applications (specifically, phpBB, Wordpress and MediaWiki), so I had to bite the PHP/MySQL bullet. If you're in the same situation, I would strongly recommend these books:

Full disclosure: If you click on one of the above links and actually buy the book, I might get around $1.00 from Amazon.

XSLT: Extract file name from path

When transforming WordProcessingML documents, I wanted to output an IMG element whose SRC attribute would be set to the picture name used in the Word file (assuming the picture is linked, not just inserted). However, the picture name in the Word file usually contains the whole path (even when the picture and the Word document are in the same directory), so I needed a function that would extract the file name from the path. As XSLT 1.0 has a very limited function set (and MSXML still doesn't support XSLT 2.0), I had to write the function myself using a named template:
<xsl:template name="fileName">
  <xsl:param name="path" />
  <xsl:choose>
    <xsl:when test="contains($path,'\')">
      <xsl:call-template name="fileName">
        <xsl:with-param name="path" select="substring-after($path,'\')" />
      </xsl:call-template>
    </xsl:when>
    <xsl:when test="contains($path,'/')">
      <xsl:call-template name="fileName">
        <xsl:with-param name="path" select="substring-after($path,'/')" />
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="$path" />
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

The function handles Windows- and Unix-style paths (both forward and backward slashes are recognized as path delimiters).
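For comparison, the same extraction outside XSLT (for example, in client-side JavaScript) is a one-liner; a minimal sketch, with the function name being my own choice:

```javascript
// Return the last segment of a path, treating both forward and backward
// slashes as delimiters, mirroring the recursive XSLT template above.
function fileName(path) {
  return path.split(/[\\/]/).pop();
}
```

For example, fileName('C:\\docs\\pic.png') and fileName('images/pic.png') both return 'pic.png'.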

Count XML items with specific attribute value

To count the nodes that have a specific attribute (or an attribute with a specific value) in your XSLT transformation, use the count function:
  • All elements having the specified attribute:
<xsl:value-of select="count(//*[@attr])" />
  • All elements having an attribute with the desired value:
<xsl:value-of select="count(//*[@attr = 'value'])" />
For example, to count all elements in an XHTML document having a class name, use:
<xsl:value-of select="count(//*[@class])" />
To count all elements with a class name containing 'menu', use:
<xsl:value-of select="count(//*[contains(@class,'menu')])" />

Line breaks in XSLT-generated text document

If you're generating text documents with XSLT transformation and use xsl:text tags for tight whitespace control, you might need to insert the end-of-line characters into the output stream manually. You could do that with a newline within the xsl:text tag, like this:
<xsl:text>
</xsl:text>
However, using the explicit newline character reference (&#xa; or &#10;) results in code that's easier to read, understand and maintain:
<xsl:text>&#xa;</xsl:text>

XPath expressions in the MSXML selectNodes() function are evaluated on the whole DOM tree

The selectNodes() function available in the Microsoft XML (MSXML) API evaluates the XPath expression supplied as its argument in the context of the whole DOM tree to which the node belongs, not just the subtree of the node on which selectNodes was executed. For example, if you use the following code …
set firstDiv = document.selectSingleNode("//div")
set paraList = firstDiv.selectNodes("//p")
… the paraList will contain the list of all paragraphs in the whole document, not just the paragraphs in the DIV on which the selectNodes call was executed. To restrict the search to the subtree, use a relative expression, for example firstDiv.selectNodes(".//p").

Tight control on whitespaces in XSLT-generated documents

Continuing from the previous post on the whitespace issues, here are a few rules that you should keep in mind:
  • Whitespace-only text nodes are not copied from the XSLT document into the output document;
  • Text nodes containing non-whitespace characters are copied in their entirety, including any whitespace characters, which are copied verbatim. Extra line breaks can thus easily appear in your output text, more so if you try to apply a nice readable format to the source XSLT document.
  • If you want very tight control over the generated output, place the non-whitespace characters only within xsl:text tags. The contents of an xsl:text tag (which cannot contain any embedded tags) are copied straight into the output document.

The whitespace control is extremely important if you're generating text output with XSLT transformation.

Word to MediaWiki conversion with XSLT

I've tried two Word to MediaWiki converters: the set of macros described in the Word2MediaWikiPlus extension and the OpenOffice converter. The OpenOffice converter does not work too well (for example, a paragraph style with a COURIER font is not transformed into MediaWiki code markup) and I needed some extensions that would be hard to cram into the Word macros used by Word2MediaWikiPlus, so I decided to implement the converter as an XSLT translator (usable in Word 2003/2007) that should be easy(er) to modify for someone fluent in XSLT. You can find the current (pre-alpha) sources on SourceForge and download the alpha release. If I'm missing functionality you desperately need, let me know.

Soft breaks in WordProcessingML

WordProcessingML has an “interesting” way of representing soft line breaks (Shift+Enter in Word): if a range (w:r, also called a run) has a w:br child, it represents a soft break at that position (see also Section 2.3.3 of Part 4 of ECMA-376). Here is the XML Notepad display of a sample three-line paragraph (with two soft breaks).

To process the soft break in a range with XSLT, use templates similar to these:
<xsl:template match="w:r">
  <xsl:apply-templates />
</xsl:template>

<xsl:template match="w:t"><xsl:value-of select="text()" /></xsl:template>

<xsl:template match="w:br|w:cr"><br /></xsl:template>

Generate ATOM feed in Microsoft SQL Server

Microsoft SQL Server 2005 can generate XML documents straight from relational tables. If you want to publish an ATOM feed based on data stored in your database, you no longer have to write complex server-side scripts; the SQL server can do all the work for you. I've described the step-by-step solution complete with code samples and printouts in an article “Generating Atom Feed from SQL Data” that was recently published by InformIT.

Implementing Access Controls on SQL Server Data

Most relational databases provide fine-tuned access controls to various objects in the database, including tables, views, and indices, but lack the support for individual row (record) access control. In the “Implementing Access Controls on SQL Server Data” article I wrote for InformIT, I'm describing how you can implement record-level access control in any relational database that supports triggers and separate access controls for views and underlying tables.

Analyze your web page performance

Straight from the Yahoo Developer Network: YSlow analyzes any web page and generates a grade for each rule and an overall grade. If a page can be improved, YSlow lists the specific changes to be made. Highly recommended tool :)

Back to the future?

In his blog post, Steve Souders describes how IE8 increases the page download performance by using more than two parallel HTTP sessions, briefly mentioning that this violates the recommendation from RFC 2616, which was, after all, written in 1999.

Some web developers might be too young to remember why RFC 2616 has the "two parallel sessions" recommendations. It was (among other reasons) a result of the disasters an earlier version of Internet Explorer (IE3?) caused on the Web infrastructure when Microsoft in its infinite wisdom decided to open multiple parallel HTTP sessions. The browsers quickly overloaded WAN links, caused server overloads (if you use 6 parallel sessions instead of two, all of a sudden the number of "visitors" increases three-fold), firewall failures (some firewalls had licenses limiting the number of parallel sessions) and potentially NAT failures (if you have to do port-address-translation, you might run out of port numbers).

But it looks like history needs to repeat itself ... or maybe this time the infrastructure is ready for the additional load? My "what could fail" bet would be on the servers, but we'll see in a few months ...

Reliability of client-side XSLT transformations

I've received an interesting question on my “Search Engine Optimization in XML+XSLT designs” post:
Are there considerable browser-specific differences in xsl transformation, or am I being overly cautious?
I've been using browser-side XSLT processing for three years. Although I had my initial share of problems with IE5 and some releases of IE6 (finally forcing me to perform server-side transformations for IE5), IE6/7 and Firefox behave almost identically as long as your XSLT is valid. Firefox is a bit more relaxed in error handling, so it might survive an invalid stylesheet and ignore the error. Handling XML/XSLT errors in IE is a nightmare: if the XSLT transformation fails, it's almost impossible to reload the document without closing the browser (obviously IE has some serious caching problems with XML documents). To test new XSLT transformations, I usually force server-side transformations and receive good error messages from the server's MSXML.

Opera is a different story. It didn't support XSLT until pretty recently, and even then the document() function was broken. I have simply decided not to offer XML data to Opera visitors (they are a minority anyway), and Safari is not widespread enough in my customer base to matter.

This post is part of You've asked for it series of articles.

Four very useful XSLT tricks

Attribute value template, Muenchian Grouping, Usage of XPath axes, handling input XML namespaces. Highly recommended reading :)

Web 2.0 is not necessarily a good thing

A sane voice in the Web-2.0-crazy world: Jakob Nielsen analyzes the impact of AJAX on web usability (beyond the cool factor) and its impact on the bottom line.

Set output attributes with XSLT

Setting attributes on output elements generated by an XSLT transform is extremely easy: just enclose an XPath expression in curly brackets within the quotes surrounding the attribute value. You can even concatenate multiple XPath expressions, or XPath expressions and string constants, within the attribute value. For example, if you want to set the CLASS attribute of the output DIV element to the value of the input @id attribute prefixed by "CSS-", use the following syntax:
<DIV class="CSS-{@id}">
However, if you want to eliminate the attributes that would have empty values (for example, the DIV element in the previous example should not have the class attribute if the @id input attribute does not exist), use xsl:attribute in combination with xsl:if or xsl:choose.

Microsoft's XMLHttpRequest Objects

If you don't want to use a wrapper library like Sarissa and plan to work out the differences between Microsoft's and other vendors' implementation of XMLHttpRequest, you'll find a lot of details in this post by Jonathan Snook.

Make Your Web Pages Mobile-Friendly

A while ago I had to develop a few mobile-oriented applications, essentially serving existing content on tiny screen resolutions. As one could expect, I approached it from the XML/XSLT perspective and designed a few XSLT stylesheets to render existing XML content (all of my recent content is in XML format) in a more mobile-friendly format.

At approximately the same time I was working a lot with Blogger APIs and Atom ... and you can see the results of these seemingly unrelated activities in my new InformIT article “Make Your Web Pages Mobile-Friendly” where I'm describing how you can take an existing Atom feed and wrap it into a mobile-friendly user interface.

UTF-8 text file generated from ASP

In the previous post, I've described how to output plain text encoded as UTF-8 from an ASP script. To save that text to a file on the client's disk drive, you can add the Content-disposition header, indicating that the ASP script response is actually an attachment that has to be saved to disk. However, some Windows programs expect to see the byte-order mark in front of the UTF-8 text. To generate this byte stream, I've used a simple trick: first I've switched the codepage to Windows-1252 (ensuring the three bytes are written out without UTF-8 transcoding), wrote the three-byte sequence, and switched back to UTF-8 (codepage 65001) to write the rest of the text.
Sub OutputCSV(finalText)
  Response.Clear
  Response.Charset = "utf-8"
  Response.ContentType = "text/plain"
  Response.AddHeader "Content-disposition", "attachment; filename=savedData.csv"
  Response.Expires = 0
  Response.Codepage = 1252
  Response.Write Chr(239) & Chr(187) & Chr(191)
  Response.Codepage = 65001
  Response.Write finalText
  Response.End
End Sub
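The three Chr values are simply the UTF-8 byte-order mark, 0xEF 0xBB 0xBF (239, 187, 191). A short Node.js sketch (my own illustration, unrelated to the ASP code) shows the same byte stream being assembled:

```javascript
// Build the same output stream as the ASP snippet: a UTF-8
// byte-order mark followed by the UTF-8 encoded text.
const bom = Buffer.from([0xef, 0xbb, 0xbf]); // Chr(239) & Chr(187) & Chr(191)
const body = Buffer.from('col1;col2\n', 'utf8');
const output = Buffer.concat([bom, body]);

// The first three bytes are the BOM that Windows programs look for.
console.log(output.slice(0, 3).toString('hex')); // efbbbf
```

Programs that understand the BOM skip it and read the rest of the stream as UTF-8.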

High-performance AJAX

Julien Lecomte has shared an excellent presentation describing how to create high-performance AJAX applications. Highly recommended reading :)