XSLT transformation in ASP: XML with MSXML6

Similar to the XML-to-HTML transformation with MSXML6, the results of the transformNode function do not include the encoding information; the XML text produced by the sample program with the XSL transformation described in the previous post is a perfect XML document:

<?xml version="1.0"?>
<output>
  Greek letter: β 
  EE: č
</output>

An XML document without the encoding attribute in the xml declaration is assumed to be encoded in UTF-8. The results of the transformNode function in MSXML6 are thus absolutely correct if you use UTF-8 output encoding.
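The UTF-8 default can be checked outside ASP with any conforming XML parser. The following is a minimal Python sketch (not part of the original test program) that encodes the output shown above as UTF-8 bytes and parses it back; the variable names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# The transformation output from above: no encoding in the XML declaration.
xml_text = '<?xml version="1.0"?>\n<output>\n  Greek letter: β\n  EE: č\n</output>'

# Encode the characters as UTF-8, as the ASP response would send them.
utf8_bytes = xml_text.encode("utf-8")

# With no encoding declaration and no byte order mark,
# the parser decodes the bytes as UTF-8.
root = ET.fromstring(utf8_bytes)
text = root.text

print("β" in text, "č" in text)
```

If the parser assumed any other encoding, the two non-ASCII characters would not survive the round trip.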

XSLT transformation in ASP: XML with MSXML3

You might consider server-side XML-to-XML transformations rare, but you have to use <xsl:output method="xml"> if you want to generate valid XHTML code from your XML data. To shorten the printouts, we’ll use a simple (non-XHTML) XSL transformation to generate the test results:

<?xml version="1.0" encoding="utf-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    
<xsl:output method="xml" encoding="utf-8" />

<xsl:template match="root">
  <output>
    Greek letter: <xsl:value-of select="@greek" /> 
    EE: <xsl:value-of select="@ee" />
  </output>
</xsl:template> 

</xsl:stylesheet>

As with HTML, MSXML3 interferes with the output, inserting an improper UTF-16 encoding directive and producing the following XML text:

<?xml version="1.0" encoding="UTF-16"?>
<output>
  Greek letter: β 
  EE: č
</output>

You could bypass this bug by setting the omit-xml-declaration attribute of the xsl:output element to yes:

<?xml version="1.0" encoding="utf-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    
<xsl:output method="xml" encoding="utf-8" omit-xml-declaration="yes" />

… rest deleted …

… resulting in the following transformation output:

<output>
  Greek letter: β 
  EE: č
</output>

However, if you want to retain the XML declaration in the XML document, you have to replace UTF-16 in the output string with UTF-8, as we did in the HTML transformation case. The following modified test program produces a perfect XML document when used with the original XSLT transformation (without the omit-xml-declaration attribute):

<%
Const DOMClass = "MSXML2.DOMDocument"

Set XSLT = Server.CreateObject(DOMClass)
Set XDoc = Server.CreateObject(DOMClass)

XDoc.loadXML("<root greek='β' ee='č' />")
XSLT.load(Server.MapPath("SampleXSLT.xsl"))

Response.Clear
Response.Charset = "utf-8"
Response.Codepage = 65001
Response.Write Replace(XDoc.transformNode(XSLT), _
  "encoding=""UTF-16""","encoding=""utf-8""")
%>
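The same string replacement can be sketched in Python to show why it works: once the declaration says utf-8, it agrees with the bytes that code page 65001 puts on the wire. The function name fix_declared_encoding and the sample string are assumptions made for this illustration, not part of the ASP program.

```python
import xml.etree.ElementTree as ET

def fix_declared_encoding(xml_text: str) -> str:
    """Replace the UTF-16 label with utf-8 (first occurrence only)."""
    return xml_text.replace('encoding="UTF-16"', 'encoding="utf-8"', 1)

# A string shaped like the MSXML3 output shown earlier.
msxml3_output = (
    '<?xml version="1.0" encoding="UTF-16"?>\n'
    '<output>\n  Greek letter: β\n  EE: č\n</output>'
)

fixed = fix_declared_encoding(msxml3_output)

# Roughly what Response.Codepage = 65001 does when the text is written out.
payload = fixed.encode("utf-8")

# The declaration and the actual byte encoding now agree, so parsing succeeds.
root = ET.fromstring(payload)
print(fixed.splitlines()[0])
```

Unlike the VBScript Replace call, this sketch limits the replacement to the first occurrence, so a matching string later in the document content would be left alone.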

XSLT transformation in ASP: HTML with MSXML6

If your IIS platform includes MSXML6, you should use MSXML6 to transform XML into HTML instead of MSXML3 (see the Using the right version of MSXML article for the proper fallback process). MSXML6 still generates the META tag (the MSXML3-related post describes problems caused by the META tag), but does not include the charset parameter in it, resulting in perfect HTML output. The sample ASP program using the simple XSLT transformation from the “XSLT transformation in ASP: HTML with MSXML3” post produces the following output when using MSXML6 (MSXML2.DOMDocument.6.0 ProgID):

<html>
<head>
<META http-equiv="Content-Type" content="text/html">
<title>Sample XSLT server-side transformation</title>
</head>
<body><node type="test">Greek letter: β EE: č</node></body>
</html>

If you want to use complex XSLT transformations with MSXML6, you might have to set additional second-level DOM properties, for example AllowDocumentFunction. You might also want to set ValidateOnParse to false if the validity of your XML document is not a major concern.

XSLT transformation in ASP: HTML with MSXML3

Most commonly, you’ll use server-side XSLT transformation in ASP to transform XML data into HTML using MSXML3 (available on almost all IIS platforms). The following XSLT stylesheet is used in the HTML test:

<?xml version="1.0" encoding="utf-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    
<xsl:output method="html" encoding="utf-8" />

<xsl:template match="root">
  <html>
    <head>
      <title>Sample XSLT server-side transformation</title>
    </head>
    <body>
      <node type="test">
        Greek letter: <xsl:value-of select="@greek" /> 
        EE: <xsl:value-of select="@ee" />
      </node>
    </body>
  </html>
</xsl:template> 

</xsl:stylesheet>

With MSXML3, the transformNode function inserts a META directive in the output stream, resulting in the following transformation result:

<html>
<head>
<META http-equiv="Content-Type" content="text/html; charset=UTF-16">
<title>Sample XSLT server-side transformation</title>
</head>
<body><node type="test">
Greek letter: β 
EE: č
</node>
</body>
</html>

On the other hand, the HTTP headers generated by the test ASP script claim the content is UTF-8 encoded (and the raw data dump performed with Fiddler confirms that). The response headers were captured with the LiveHTTPHeaders Firefox extension:

HTTP/1.x 200 OK
Server: Microsoft-IIS/5.1
Date: Mon, 15 Sep 2008 11:20:56 GMT
X-Powered-By: ASP.NET
Content-Length: 222
Content-Type: text/html; Charset=utf-8
Cache-Control: private

Most browsers (including IE and Firefox) take the character set from the Content-Type HTTP header, using the META directive only as a fallback, and thus render the page as intended. Some programs (including the TextView tab of Fiddler) prefer the META directive and produce garbage.
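The precedence described above can be sketched in a few lines of Python. This is a simplified illustration, not how any particular browser is implemented: the function names, the regular expressions and the windows-1252 default are all assumptions, and real browsers also look at byte order marks and other hints.

```python
import re
from typing import Optional

def charset_from_content_type(value: str) -> Optional[str]:
    """Extract the charset parameter from a Content-Type value, if present."""
    m = re.search(r'charset\s*=\s*"?([\w.:-]+)', value, re.IGNORECASE)
    return m.group(1).lower() if m else None

def effective_charset(http_content_type: Optional[str], html: str,
                      default: str = "windows-1252") -> str:
    """HTTP header first, META directive as a fallback, then a default."""
    if http_content_type:
        charset = charset_from_content_type(http_content_type)
        if charset:
            return charset
    # Fall back to the content attribute of a META tag in the page.
    m = re.search(r'<meta[^>]+content\s*=\s*"([^"]*)"', html, re.IGNORECASE)
    if m:
        charset = charset_from_content_type(m.group(1))
        if charset:
            return charset
    return default  # assumed default for this sketch

sample_html = ('<html><head><META http-equiv="Content-Type" '
               'content="text/html; charset=UTF-16"></head></html>')

print(effective_charset("text/html; Charset=utf-8", sample_html))
print(effective_charset("text/html", sample_html))
```

With the headers from the test script, the first call picks utf-8 from the HTTP header; without a charset in the header, the misleading UTF-16 value from the META directive wins.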

The solution

To match the META header with the HTTP headers, replace the charset=UTF-16 string in the text returned by the transformNode function with charset=UTF-8. The modified sample ASP script is included below:

<%
Const DOMClass = "MSXML2.DOMDocument"

Set XSLT = Server.CreateObject(DOMClass)
Set XDoc = Server.CreateObject(DOMClass)

XDoc.loadXML("<root greek='β' ee='č' />")
XSLT.load(Server.MapPath("SampleXSLT.xsl"))

Response.Clear
Response.Charset = "utf-8"
Response.Codepage = 65001
Response.Write Replace(XDoc.transformNode(XSLT),"charset=UTF-16","charset=UTF-8")
%>

XSLT transformation in ASP: The testbed

As I’ve mentioned in the previous post, MSXML functions called from ASP believe they work in a UTF-16 environment. The transformNode function might insert this information in the output string based on several parameters, one of them being the version of the MSXML ActiveX control. To test the behavior of various versions of MSXML, we’ll use the following test program:

<%
Const DOMClass = "MSXML2.DOMDocument"

Set XSLT = Server.CreateObject(DOMClass)
Set XDoc = Server.CreateObject(DOMClass)

XDoc.loadXML("<root greek='β' ee='č' />")
XSLT.load(Server.MapPath("SampleXSLT.xsl"))

Response.Clear
Response.Charset = "utf-8"
Response.Codepage = 65001
Response.Write XDoc.transformNode(XSLT)
%>

And a stylesheet similar to this one:

<?xml version="1.0" encoding="utf-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    
<xsl:output method="xml" encoding="utf-8" />

<xsl:template match="root">
  <output>
    <node type="test">
      Greek letter: <xsl:value-of select="@greek" /> 
      EE: <xsl:value-of select="@ee" />
    </node>
  </output>
</xsl:template> 

</xsl:stylesheet>

We’ll test two MSXML versions: MSXML3 (default with Windows XP) and MSXML6 (default with Vista, also available on Windows XP). MSXML3 should be available on almost all web servers, but you might not get MSXML6 everywhere (my hosting provider did not offer it when it mattered most to me).

The test results will be published in the next few posts.

XSLT transformation in ASP: the principles

If you want to perform server-based XSLT transformations on IIS using classic ASP, you should keep in mind the following facts:

  • ASP uses 16-bit Unicode characters internally. MSXML ActiveX modules called from ASP therefore think they work in a UTF-16 environment.
  • Because the XSLT transformations appear to happen in a UTF-16 environment, MSXML procedures might insert UTF-16-specific headers in the transformed text (more about that in follow-up posts).
  • The transformed text returned to ASP from the transformNode function is UTF-16 encoded.
  • When the ASP script sends the text to the browser (or writes it into a text file), the code page set with Response.Codepage is used to transform UTF-16 into the target character set.
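The difference between the internal representation and the bytes sent to the browser can be illustrated with a short Python sketch. It only mimics the conversion step and assumes UTF-8 (code page 65001) as the output encoding; the variable names are illustrative.

```python
# The same characters the test documents in these posts use.
text = "Greek letter: β EE: č"

# Two bytes per character: how the string is held inside ASP and MSXML.
utf16_bytes = text.encode("utf-16-le")

# What reaches the browser when the output code page is 65001 (UTF-8).
utf8_bytes = text.encode("utf-8")

print(len(text), len(utf16_bytes), len(utf8_bytes))
print(utf8_bytes.decode("utf-8") == text)
```

The two byte sequences differ in both length and content, which is why any encoding label in the transformed text has to describe the output code page and not the internal UTF-16 form.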

Yahoo is not totally HTTP compliant ... what else is new?

An article in Cisco's support wiki caught my attention today: it claims that Cisco routers could deny access to yahoo.com because Yahoo!'s web servers emit invalid chunked encoding. Interesting ... so I've started Fiddler and opened Yahoo!'s home page. This is what I've got:
HTTP/1.1 200 OK
Date: Sun, 14 Sep 2008 13:42:11 GMT
P3P: policyref="http://p3p.yahoo.com/w3c/p3p.xml", CP="CAO DSP COR CUR ADM DEV TAI PSA PSD IVAi IVDi CONi TELo OTPi OUR DELi SAMi OTRi UNRi PUBi IND PHY ONL UNI PUR FIN COM NAV INT DEM CNT STA POL HEA PRE GOV"
Cache-Control: private
Vary: User-Agent
Set-Cookie: IU=deleted; expires=Sat, 15 Sep 2007 13:42:10 GMT; path=/; domain=.yahoo.com
Set-Cookie: FPCM=deleted; expires=Sat, 15 Sep 2007 13:42:10 GMT; path=/
Set-Cookie: D=_ylh=X3oDMTFkbmtlMG9nBF9TAzI3MTYxNDkEcGlkAzEyMjEzOTczMjEEdGVzdAMwBHRtcGwDaW5kZXgtbA--; path=/; domain=.yahoo.com
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html; charset=utf-8
Content-Encoding: gzip

7b17   
As you can see, there are actually three blanks after the chunk length, which is a clear violation of section 3.6.1 of RFC 2616 (HTTP); the chunk length should be followed by a semicolon (for chunk extensions) or CRLF. I'm not really surprised; the crappy implementation of Frontpage extensions on Geocities (and the fact that they wanted to charge me for it) pushed me toward my own web site six years ago.
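A strict reading of the chunk-size line can be sketched as a small Python parser. This is an illustration of the rule as described above, not the code any router or server actually runs; the function name is an assumption and the chunk-extension part is simplified to "anything after a semicolon".

```python
import re

# chunk-size [ chunk-extension ] CRLF (RFC 2616, section 3.6.1).
# The extension grammar is simplified for this sketch.
_CHUNK_SIZE_LINE = re.compile(rb"([0-9A-Fa-f]+)(;[^\r\n]*)?\r\n")

def parse_chunk_size_line(line: bytes) -> int:
    """Return the chunk size, or raise ValueError on a malformed line."""
    m = _CHUNK_SIZE_LINE.fullmatch(line)
    if m is None:
        raise ValueError(f"invalid chunk-size line: {line!r}")
    # The chunk size is a hexadecimal number.
    return int(m.group(1).decode("ascii"), 16)

print(parse_chunk_size_line(b"7b17\r\n"))

try:
    parse_chunk_size_line(b"7b17   \r\n")  # three blanks, as sent by Yahoo!
except ValueError as err:
    print("rejected:", err)
```

A parser this strict rejects the line with the trailing blanks, which matches the behavior the Cisco article attributes to the routers.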

JavaScript: the good parts

If your brain hasn't been polluted with awful JavaScript practices yet (or if you're willing to submit to a proper brainwashing), read the book JavaScript: The Good Parts by Douglas Crockford. He covers what he believes are the good parts of JavaScript (and he is usually right), resulting in scalable, easy-to-maintain code. Although the book targets beginner audiences (or programmers not yet familiar with JavaScript), it's an interesting read for anyone who wants to write better JavaScript code. Even I found a few interesting topics (for example, multiple ways to implement object constructors and inheritance) after working with JavaScript for 15+ years.

Chrome: first impressions

I've just downloaded Chrome (the new browser from Google), tested it on my applications and was immediately impressed: they all worked, even the client-side XSLT transformation driven by the xml-stylesheet processing instruction. Then I did a few more random tests and my enthusiasm was drastically reduced:
  • The inter-line spacing on table heading texts was way too large: <th valign='bottom'>Line 1<br />Line 2</th> produced a blank line between the two text lines. I did not investigate what the root cause might be as all the other major browsers render it almost identically, so I don't really care what upset Chrome.
  • The top line of our corporate Wiki (the Login text) is misplaced.
  • The View source window does not display processing instructions in XML documents.
  • And the worst offender: Blogger in draft works way better, faster and more reliably in Firefox or Internet Explorer than in Chrome.
Summary: I will have Chrome installed to test my applications (some visitors are already using it), but will not use it in the near future.