Showing posts with label XML. Show all posts

Firefox 3.5 XHTML support stinks

Continuing my Firefox “quality” rants: I just found the willpower to completely redesign my AJAX framework, going from IFRAMEs to jQuery AJAX calls (replacing straightforward and quite elegant XSL transformations with pages of convoluted JavaScript code) to work around the bugs in FF3.0, only to find out that FF3.5 is even worse and introduced numerous additional “features”.

For example, FF3.5 requires an explicit </script> tag within an XHTML document with a proper DOCTYPE. The following document was validated with the W3C XHTML validator …

<!DOCTYPE html 
  PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" 
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <title>Test</title>
  <script src="… source URL … " type="text/javascript" />
</head>
<body id="body">
</body>
</html>

… but it does not load the script in FF3.5.

Before anyone starts telling me that it’s not so hard to include the closing </script> tag in your source – try telling that to the database with native XML support where I’m storing snippets of the code. I had to convert the field from XML to TEXT (and lose all XML goodies I might eventually get) just to avoid the elimination of the explicit closing tag.
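
One possible workaround, sketched here as an assumption rather than something I've used in the setup described above: keep the snippets in the XML field and expand self-closed script elements into explicit open/close pairs just before the markup is sent to the browser. The function name expandEmptyScripts is hypothetical, and the regular expression assumes that attribute values never contain a ">" character.

```javascript
// Hypothetical helper: rewrite <script ... /> as <script ...></script>.
// Assumes attribute values never contain ">" characters.
function expandEmptyScripts(markup) {
  return markup.replace(/<script\b([^>]*?)\s*\/>/gi, "<script$1></script>");
}

var stored = '<script src="a.js" type="text/javascript" />';
console.log(expandEmptyScripts(stored));
```

Script elements that already have an explicit closing tag are left untouched, so the function can be applied to mixed content.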

May I make a quick suggestion to the FF developers: maybe, just maybe, you might want to consider supporting existing web applications in parallel with adding new features that not too many people can use (because no other browser supports them yet).

What else could you expect from Google :(

Armed with the minimum ATOM-like document Blogger accepts, I wrote an XSL transformation that converted my proprietary news feed format into a Blogger-friendly ATOMish format. I used Saxon as the transformation engine, resulting in a nice UTF-8 encoded file that opened easily in Internet Explorer, Firefox and XML Notepad.

The end result: Blogger refused to import the file until I opened it in FrontPage and prettified the XML text. Obviously they have a "custom" XML parser that has problems with some valid variants of XML. But what else can one expect from a company that wants to build everything on its own.

Import into Blogger from an Atom feed

Google has recently added the import functionality to Blogger, making it very easy to migrate from any other information service to Blogger. Unfortunately, the import process rejects anything that looks like it did not come from the Blogger export feature. I did not try to nail down the very minimum that would be required for the import to work, but this is a very minimalistic Atom document that works:

<?xml version='1.0' encoding='utf-8'?>
<feed xmlns='http://www.w3.org/2005/Atom' 
      xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/'  
      xmlns:gd='http://schemas.google.com/g/2005'
      xmlns:thr='http://purl.org/syndication/thread/1.0'>
    <id>tag:blogger.com,1999:blog-123456789.archive</id>
    <updated>2008-11-01T00:00:00.000+01:00</updated>
    <title type='text'>Sample import file</title>
    <author>
        <name>Example.com</name>
        <uri>http://www.blogger.com/profile/12345678</uri>
        <email>noreply@blogger.com</email>
    </author>
    <generator version='7.00' uri='http://www.blogger.com'>Blogger</generator>
    <entry>
        <id>tag:blogger.com,1999:blog-123456789.post-12345678</id>
        <published>2008-11-01T00:00:00.000+01:00</published>
        <updated>2008-11-01T00:00:00.000+01:00</updated>
        <category scheme='http://schemas.google.com/g/2005#kind' 
          term='http://schemas.google.com/blogger/2008/kind#post'/>
        <category scheme='http://www.blogger.com/atom/ns#' term='CatItem'/>
        <title type='text'>Post title</title>
        <content type='html'>Post text, HTML needs to be escaped</content>
        <author>
            <name>Example.com</name>
            <uri>http://www.ioshints.info/about</uri>
            <email>noreply@blogger.com</email>
        </author>
        <thr:total>0</thr:total>
    </entry>
</feed>
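
The content element with type='html' carries the post body as escaped text. A minimal sketch of that escaping step, assuming the post body is available as a string; escapeForAtom and atomContent are illustrative names and not part of any Blogger API.

```javascript
// Escape the characters that would otherwise be read as markup.
// The ampersand must be replaced first so that the entities
// produced by the later replacements are not escaped again.
function escapeForAtom(html) {
  return html
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Wrap the escaped post body in the element used in the sample feed.
function atomContent(html) {
  return "<content type='html'>" + escapeForAtom(html) + "</content>";
}

console.log(atomContent("<p>Fish &amp; chips</p>"));
```

Note that an entity already present in the HTML (such as &amp;) gets escaped once more; that is intended, because the feed reader unescapes the content exactly once to recover the original HTML.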

XSLT transformation in ASP: non-standard XML encoding

If you want to generate documents with non-standard encodings with server-side XSLT transformation in IIS/ASP environment, you should:

  • Set the Response.Charset property to the desired character set;
  • Set the Response.Codepage property to the desired code page (or use the <%@ LANGUAGE="VBScript" CODEPAGE="codepage" %> directive in your ASP script);
  • Omit the XML declaration from the translated text (using the omit-xml-declaration="yes" attribute in the xsl:output element) and prepend the desired XML declaration to the translated text.

For example, the following program generates an XML (or XHTML) document encoded in the windows-1250 character set (code page 1250) …

<%
Const DOMClass = "MSXML2.DOMDocument"
Set XSLT = Server.CreateObject(DOMClass)
Set XDoc = Server.CreateObject(DOMClass)
XDoc.loadXML("<root greek='&#946;' ee='č' />")
XSLT.load(Server.MapPath("SampleXSLT.xsl"))
Response.Clear
Response.Charset = "windows-1250"
Response.Codepage = 1250
Response.ContentType = "text/xml"
Response.Write "<?xml version='1.0' encoding='windows-1250'?>"
Response.Write XDoc.transformNode(XSLT)
%>

… when using the following XSL transformation:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" omit-xml-declaration="yes" />
… rest deleted …

XSLT transformation in ASP: XML with MSXML6

Similar to the XML-to-HTML transformation with MSXML6, the results of the transformNode function do not include the encoding information; the XML text produced by the sample program with the XSL transformation described in the previous post is a perfect XML document:

<?xml version="1.0"?>
<output>
  Greek letter: β 
  EE: č
</output>

An XML document without the encoding attribute in the xml declaration is assumed to be encoded in UTF-8. The results of the transformNode function in MSXML6 are thus absolutely correct if you use UTF-8 output encoding.
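
The default-to-UTF-8 rule can be sketched as a small helper. This is an illustration only: declaredEncoding is a hypothetical name, and the function works on an already decoded string, so it ignores the byte order mark that could otherwise signal UTF-16.

```javascript
// Return the encoding named in the XML declaration, or "UTF-8" when
// the declaration (or its encoding pseudo-attribute) is missing.
// Simplification: a byte order mark is not considered because the
// input here is already a decoded string.
function declaredEncoding(xmlText) {
  var decl = /^<\?xml\s[^?]*\?>/.exec(xmlText);
  if (!decl) return "UTF-8";
  var enc = /\sencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']/.exec(decl[0]);
  return enc ? enc[1] : "UTF-8";
}

console.log(declaredEncoding('<?xml version="1.0"?><output/>'));
```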

XSLT transformation in ASP: XML with MSXML3

You might consider server-side XML-to-XML transformations rare, but you have to use <xsl:output method="xml"> if you want to generate valid XHTML code from your XML data. To shorten the printouts, we’ll use a simple (non-XHTML) XSL transformation to generate the test results:

<?xml version="1.0" encoding="utf-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    
<xsl:output method="xml" encoding="utf-8" />

<xsl:template match="root">
  <output>
    Greek letter: <xsl:value-of select="@greek" /> 
    EE: <xsl:value-of select="@ee" />
  </output>
</xsl:template> 

</xsl:stylesheet>

As with HTML, MSXML3 interferes with the output, inserting an incorrect UTF-16 encoding directive, resulting in the following XML text:

<?xml version="1.0" encoding="UTF-16"?>
<output>
  Greek letter: β 
  EE: č
</output>

You could bypass this bug by setting the omit-xml-declaration attribute of the xsl:output element to yes:

<?xml version="1.0" encoding="utf-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    
<xsl:output method="xml" encoding="utf-8" omit-xml-declaration="yes" />

… rest deleted …

… resulting in the following transformation output:

<output>
  Greek letter: β 
  EE: č
</output>

However, if you want to retain the XML declaration in the XML document, you have to replace UTF-16 in the output string with UTF-8, like we did in the HTML transformation case. The following modified test program produces a perfect XML document when used with the original XSLT transformation (without the omit-xml-declaration attribute):

<%
Const DOMClass = "MSXML2.DOMDocument"

Set XSLT = Server.CreateObject(DOMClass)
Set XDoc = Server.CreateObject(DOMClass)

XDoc.loadXML("<root greek='β' ee='č' />")
XSLT.load(Server.MapPath("SampleXSLT.xsl"))

Response.Clear
Response.Charset = "utf-8"
Response.Codepage = 65001
Response.Write Replace(XDoc.transformNode(XSLT), _
  "encoding=""UTF-16""","encoding=""utf-8""")
%>
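
The Replace call above depends on the exact casing and on double quotes in the declaration. A more tolerant variant of the same idea is sketched below in JavaScript; setDeclaredEncoding is a hypothetical helper name, and the sketch has only been reasoned through on sample strings, not run against MSXML output.

```javascript
// Replace the encoding pseudo-attribute in a leading XML declaration.
// Text without a declaration, or a declaration without an encoding,
// is returned unchanged. The original quote style is preserved.
function setDeclaredEncoding(xmlText, encoding) {
  return xmlText.replace(
    /^(<\?xml\s[^?]*?\sencoding\s*=\s*)(["'])[^"']*\2/,
    function (whole, prefix, quote) {
      return prefix + quote + encoding + quote;
    }
  );
}

console.log(setDeclaredEncoding('<?xml version="1.0" encoding="UTF-16"?><output/>', "utf-8"));
```

Because the pattern is anchored at the start of the text, an encoding="UTF-16" string that happens to appear later in the document body is not touched.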

XSLT transformation in ASP: HTML with MSXML6

If your IIS platform includes MSXML6, you should use MSXML6 to transform XML into HTML instead of MSXML3 (see the Using the right version of MSXML article for the proper fallback process). MSXML6 still generates the META tag (the MSXML3-related post describes problems caused by the META tag), but does not include the charset parameter in it, resulting in perfect HTML output. The sample ASP program using the simple XSLT transformation from the “XSLT transformation in ASP: HTML with MSXML3” post produces the following output when using MSXML6 (MSXML2.DOMDocument.6.0 ProgID):

<html>
<head>
<META http-equiv="Content-Type" content="text/html">
<title>Sample XSLT server-side transformation</title>
</head>
<body><node type="test">Greek letter: β EE: č</node></body>
</html>

If you want to use complex XSLT transformations with MSXML6, you might have to set additional second-level DOM properties, for example AllowDocumentFunction. You might also want to set ValidateOnParse to false if the validity of your XML document is not a major concern.

XSLT transformation in ASP: HTML with MSXML3

Most commonly, you’ll use server-side XSLT transformation in ASP to transform XML data into HTML using MSXML3 (available on almost all IIS platforms). The following XSLT stylesheet is used in the HTML test:

<?xml version="1.0" encoding="utf-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    
<xsl:output method="html" encoding="utf-8" />

<xsl:template match="root">
  <html>
    <head>
      <title>Sample XSLT server-side transformation</title>
    </head>
    <body>
      <node type="test">
        Greek letter: <xsl:value-of select="@greek" /> 
        EE: <xsl:value-of select="@ee" />
      </node>
    </body>
  </html>
</xsl:template> 

</xsl:stylesheet>

With MSXML3, the transformNode function inserts a META directive in the output stream, resulting in the following transformation result:

<html>
<head>
<META http-equiv="Content-Type" content="text/html; charset=UTF-16">
<title>Sample XSLT server-side transformation</title>
</head>
<body><node type="test">
Greek letter: β 
EE: č
</node>
</body>
</html>

On the other hand, the HTTP headers generated by the test ASP script claim the content is UTF-8 encoded (and the raw data dump performed with Fiddler confirms that). The response headers were taken from LiveHTTPHeaders Firefox extension:

HTTP/1.x 200 OK
Server: Microsoft-IIS/5.1
Date: Mon, 15 Sep 2008 11:20:56 GMT
X-Powered-By: ASP.NET
Content-Length: 222
Content-Type: text/html; Charset=utf-8
Cache-Control: private

Most browsers (including IE and Firefox) take the character set from the HTTP Content-Type header, using the META directive as a fallback. They thus render the page as intended. Some tools (including the TextView tab of Fiddler) prefer the META directive and produce garbage.
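
The precedence described above can be modelled in a few lines. This is a simplified sketch, not the actual algorithm of any browser (real implementations also look at byte order marks and other hints); charsetOf and effectiveCharset are hypothetical names.

```javascript
// Extract the charset parameter from a Content-Type value, if any.
function charsetOf(contentType) {
  var m = /;\s*charset\s*=\s*"?([^\s;"]+)"?/i.exec(contentType || "");
  return m ? m[1].toLowerCase() : null;
}

// Prefer the HTTP header; fall back to the META directive in the markup.
function effectiveCharset(httpContentType, html) {
  var fromHeader = charsetOf(httpContentType);
  if (fromHeader) return fromHeader;
  var meta = /<meta\s+http-equiv=["']?Content-Type["']?\s+content=["']([^"']*)["']/i.exec(html);
  return meta ? charsetOf(meta[1]) : null;
}

var page = '<html><head><META http-equiv="Content-Type" content="text/html; charset=UTF-16"></head></html>';
console.log(effectiveCharset("text/html; Charset=utf-8", page));
console.log(effectiveCharset("text/html", page));
```

With the headers shown above, the first call reports the character set from the HTTP header; the second call shows what a client would pick up from the MSXML3-generated META directive if the header carried no charset.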

The solution

To match the META header with the HTTP headers, replace the charset=UTF-16 string in the text returned by the transformNode function with charset=UTF-8. The modified sample ASP script is included below:

<%
Const DOMClass = "MSXML2.DOMDocument"

Set XSLT = Server.CreateObject(DOMClass)
Set XDoc = Server.CreateObject(DOMClass)

XDoc.loadXML("<root greek='β' ee='č' />")
XSLT.load(Server.MapPath("SampleXSLT.xsl"))

Response.Clear
Response.Charset = "utf-8"
Response.Codepage = 65001
Response.Write Replace(XDoc.transformNode(XSLT),"charset=UTF-16","charset=UTF-8")
%>

XSLT transformation in ASP: The testbed

As I’ve mentioned in the previous post, MSXML functions called from ASP believe they work in a UTF-16 environment. The transformNode function might insert this information in the output string based on several parameters, one of them being the version of the MSXML ActiveX control. To test the behavior of various versions of MSXML, we’ll use the following test program:

<%
Const DOMClass = "MSXML2.DOMDocument"

Set XSLT = Server.CreateObject(DOMClass)
Set XDoc = Server.CreateObject(DOMClass)

XDoc.loadXML("<root greek='β' ee='č' />")
XSLT.load(Server.MapPath("SampleXSLT.xsl"))

Response.Clear
Response.Charset = "utf-8"
Response.Codepage = 65001
Response.Write XDoc.transformNode(XSLT)
%>

And a stylesheet similar to this one:

<?xml version="1.0" encoding="utf-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    
<xsl:output method="xml" encoding="utf-8" />

<xsl:template match="root">
  <output>
    <node type="test">
      Greek letter: <xsl:value-of select="@greek" /> 
      EE: <xsl:value-of select="@ee" />
    </node>
  </output>
</xsl:template> 

</xsl:stylesheet>

We’ll test two MSXML versions: MSXML3 (default with Windows XP) and MSXML6 (default with Vista, also available on Windows XP). MSXML3 should be available on almost all web servers; you might not get MSXML6 everywhere (my hosting provider did not offer it when it mattered most to me).

The test results will be published in the next few posts.

XSLT transformation in ASP: the principles

If you want to perform server-based XSLT transformations on IIS using classic ASP, you should keep in mind the following facts:

  • ASP uses 16-bit Unicode characters internally. MSXML ActiveX modules called from ASP therefore think they work in a UTF-16 environment.
  • Because the XSLT transformations appear to happen in a UTF-16 environment, MSXML procedures might prepend UTF-16-specific headers to the transformed text (more about that in follow-up posts).
  • The transformed text returned to ASP from the transformNode function is UTF-16 encoded.
  • When the ASP script sends the text to the browser (or writes it into a text file), the code page set with Response.Codepage is used to convert UTF-16 into the target character set.
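
To make the difference between the internal representation and the bytes on the wire concrete, here is a small illustration. It assumes a Node.js environment (Buffer is a Node.js API and is not available in ASP or in a browser), and it only covers UTF-16 and UTF-8, not legacy code pages such as 1250.

```javascript
// Node.js only: Buffer is used to show the byte sequences.
// The string is "A", Greek beta and c-with-caron, written with
// escapes so the source file encoding does not matter.
var text = "A\u03B2\u010D";

// JavaScript strings, like ASP strings, hold 16-bit code units.
var utf16 = Buffer.from(text, "utf16le");
// This is what a UTF-8 response (code page 65001) would contain.
var utf8 = Buffer.from(text, "utf8");

console.log(utf16.toString("hex")); // 4100b2030d01
console.log(utf8.toString("hex"));  // 41ceb2c48d
```

The same three characters take six bytes in UTF-16 and five in UTF-8, which is why a declaration claiming UTF-16 on top of UTF-8 bytes confuses any client that trusts it.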

Generate ATOM feed in Microsoft SQL Server

Microsoft SQL Server 2005 can generate XML documents straight from the relational tables. If you want to publish an ATOM feed based on data stored in your database, you no longer have to write complex server-side scripts; the SQL server can do all the work for you. I've described the step-by-step solution complete with code samples and printouts in an article “Generating Atom Feed from SQL Data” that was recently published by InformIT.

Make Your Web Pages Mobile-Friendly

A while ago I had to develop a few mobile-oriented applications, essentially serving existing content on tiny screen resolutions. As one could expect, I approached it from the XML/XSLT perspective and designed a few XSLT stylesheets to render existing XML content (all of my recent content is in XML format) in a more mobile-friendly format.

At approximately the same time I was working a lot with Blogger APIs and Atom ... and you can see the results of these seemingly unrelated activities in my new InformIT article “Make Your Web Pages Mobile-Friendly”, in which I describe how you can take an existing Atom feed and wrap it into a mobile-friendly user interface.

Search Engine Optimization in XML+XSLT designs

It's amazing that the question “how do I perform SEO for a web site that does XSLT transformation on the browser” still pops up, as the short answer is very obviously “You can't.” The long answer is, as always, a bit more complex:
  • Google can process XML data, but only stores it in the supplementary index (lower rankings).
  • If the server's XML output does not contain enough context (for example, product description in HTML-ish format), the search engines cannot make any sense of it, so it will not be indexed appropriately.
  • Search engines will not follow explicit (let alone implicit) links in XML documents, so you need a sitemap (classic HTML page, Atom/RSS feed or Google sitemap) to help search engines find the content pages.
Until Google (and other search engines) implement XSLT transformations, the only sensible approach is to detect the client capabilities on the server and perform the XSLT transformation on the server if the client cannot do it (the whole architecture is described in my InformIT article Optimized Presentation of XML Content that was published almost exactly a year ago).
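
The server-side decision could look roughly like the sketch below. It is not the method from the InformIT article; the list of user-agent fragments and the function name transformLocation are assumptions made purely for illustration, and real capability detection would need a more careful list.

```javascript
// Hypothetical list of user-agent fragments for clients assumed to be
// unable to run XSLT themselves (crawlers, feed fetchers).
var SERVER_SIDE_AGENTS = ["googlebot", "bingbot", "slurp"];

// Decide where the transformation should run for a given request.
function transformLocation(userAgent) {
  var ua = (userAgent || "").toLowerCase();
  for (var i = 0; i < SERVER_SIDE_AGENTS.length; i++) {
    if (ua.indexOf(SERVER_SIDE_AGENTS[i]) >= 0) return "server";
  }
  // A missing user agent gets the safe option (server) as well.
  return ua ? "client" : "server";
}

console.log(transformLocation("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"));
```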

XML Handling in Microsoft SQL Server 2005

InformIT.com has just published my article describing how you can use XML data type in SQL Server 2005 to optimize SQL queries and updates.

Link multiple elements from source XML data

An interesting question was asked in Sun's Java forums: “How do I match attributes on different input nodes?” For example, I would like to link the target attribute of the source element in the following XML data with the target node (based on the num attribute of its id child).
<data>
  <source id="abc" target="123" />
 
  <target>
     <text>Message</text>
     <id num="123" />
  </target>
</data>
While it's easy to select the correct target node, it's harder to get the source attribute into the XPath expression, because the context node changes inside the predicate. A straightforward way to do it is to store the source attribute in a local variable and then use the variable value (which is context-independent) in the XPath expression (the XSLT current() function is an alternative):
<xsl:template match="source">
  <xsl:variable name="target" select="@target" />
  Source <xsl:value-of select="@id" />
    is associated with
  <xsl:value-of select="//target[id/@num = $target]/text" />
</xsl:template>
The final XPath expression works as follows:
  • It selects a target node anywhere in the source XML tree (the // path) such that the num attribute of its id child is equal to the local variable target
  • When the target node is selected, the value of the XPath expression is the value of its text child, which is then rendered into a string (collapsing all its descendant text nodes into the final result).
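
For readers more at home in JavaScript than in XPath, the same lookup can be written over plain objects. This is only an analogy under an assumed data model (the sample XML flattened into arrays of objects); linkedText is a hypothetical name.

```javascript
// The sample data from above, modelled as plain objects (an assumption
// made for illustration; a real page would work on a DOM tree).
var data = {
  sources: [{ id: "abc", target: "123" }],
  targets: [{ text: "Message", idNum: "123" }]
};

// Rough equivalent of //target[id/@num = $target]/text for one source.
function linkedText(source, targets) {
  var wanted = source.target;            // plays the role of $target
  for (var i = 0; i < targets.length; i++) {
    if (targets[i].idNum === wanted) return targets[i].text;
  }
  return "";                             // an empty node set renders as ""
}

console.log("Source " + data.sources[0].id +
  " is associated with " + linkedText(data.sources[0], data.targets));
```

The local variable wanted is fixed before the loop starts, just like the XSLT variable is evaluated before the predicate changes the context node.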

This post is part of the “You've asked for it” series of articles.

Storing XML Data in a Relational Database

The Storing XML Data in a Relational Database article just published by InformIT.com describes the various methods you can use to store XML data in an SQL database. I've also tried to explain when it would be appropriate to store XML data in a database and when you'd be better off using the traditional relational database model. The rest of the article details the procedures you can use to insert, query, retrieve and modify the stored XML data.

Interesting: greater-than character does not need to be escaped in XML

I always thought that the less-than, ampersand and greater-than characters have to be escaped in XML. As Micah Dubinko points out in his blog post, that's not strictly true; the greater-than character usually does not have to be escaped (here is the relevant part of the XML standard). The exception is the ]]> sequence in character data, where the greater-than character must be escaped.
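
A minimal escaping function that follows this rule could look like the sketch below. It applies to character data (element content) only; attribute values would also need their quote characters escaped. The name escapeXmlText is illustrative.

```javascript
// Escape only what XML character data requires: "&", "<", and the ">"
// that would otherwise complete a "]]>" sequence.
function escapeXmlText(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/\]\]>/g, "]]&gt;");
}

console.log(escapeXmlText("a > b && c < d ]]>"));
```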

Serve SQL data in XML format

InformIT has just published my next article that explains various options you have when your data resides in an SQL database and the client side of your AJAX application expects the data in XML format. The article covers a number of different approaches.

Firefox hides the ?xml processing instruction

While testing how various browsers handle XML processing instructions in the DOM, I found an interesting inconsistency: Internet Explorer includes the <?xml ?> declaration in the DOM tree as a processing instruction node (strictly speaking, it is the XML declaration and not a processing instruction), while Firefox hides it.

For example, when the following XML document ...
<?xml version="1.0" encoding="windows-1250"?>
<?xml-stylesheet href="countryPage.xsl" type="text/xsl"?>
<Test />
... is transformed into a DOM tree and processed with this JavaScript function:
function testPI(dom) {
  var topNodes = dom.childNodes;
  for (var i = 0; i < topNodes.length; i++) {
    var node = topNodes[i];
    // wr() is the output helper of the test page (not shown here)
    wr("nodeType=" + node.nodeType);
    if (node.nodeType == 7) {  // 7 = PROCESSING_INSTRUCTION_NODE
      wr("PI=" + node.target + " ... " + node.data);
    }
  }
}
... Internet Explorer generates the following text:
nodeType=7
PI=xml ... version="1.0" encoding="windows-1250"
nodeType=7
PI=xml-stylesheet ... href="countryPage.xsl" type="text/xsl"
nodeType=1
... while Firefox skips the <?xml ?> processing instruction:
nodeType=7
PI=xml-stylesheet ... href="countryPage.xsl" type="text/xsl"
nodeType=1
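
If your code has to behave the same way in both browsers, one option is to skip the node with the target xml explicitly. The sketch below assumes only the DOM properties used in the test function above (childNodes, nodeType, target, data); topLevelPIs is a hypothetical name.

```javascript
var PROCESSING_INSTRUCTION_NODE = 7;

// Collect top-level processing instructions, skipping the XML
// declaration that some DOM implementations report as a PI.
function topLevelPIs(dom) {
  var result = [];
  var nodes = dom.childNodes;
  for (var i = 0; i < nodes.length; i++) {
    var node = nodes[i];
    if (node.nodeType === PROCESSING_INSTRUCTION_NODE && node.target !== "xml") {
      result.push({ target: node.target, data: node.data });
    }
  }
  return result;
}
```

With this filter, the document above yields a single xml-stylesheet entry regardless of whether the browser exposes the declaration.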


Extract the default stylesheet from the DOM document

If you use XSLT transformations to change the XML data into HTML markup and use the xml-stylesheet processing instruction to specify the default XSLT stylesheet to use for the transform, you might need to fetch the default stylesheet value in JavaScript when developing AJAX applications.

You can find a good case study on using XML and XSLT in my InformIT article Optimized Presentation of XML Content.


The following JavaScript function extracts the default XSLT stylesheet from the XML document (passed to the function as DOM Document object). It relies on the fact that the DOM Document object extends the Node object and thus has the childNodes property that contains the root element as well as all processing instructions.

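function getXMLStylesheet(dom) {
  var topNodes = dom.childNodes;
  for (var i = 0; i < topNodes.length; i++) {
    var node = topNodes[i];
    // 7 = PROCESSING_INSTRUCTION_NODE
    if (node.nodeType == 7 && node.target == "xml-stylesheet") {
      // exec() returns null when there is no href, so test the match itself
      var match = /href="(.*?)"/i.exec(node.data);
      if (match) return match[1];
    }
  }
  return null;  // no xml-stylesheet processing instruction with href found
}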


A typical usage of this function is illustrated below:
<script src="/sarissa.js"></script>
...
<script>
  var xrq = new XMLHttpRequest();
  // the third argument (false) makes the request synchronous
  xrq.open("GET", "getPI.xml", false);
  xrq.send(null);
  if (xrq.status == 200) {
    var dom = xrq.responseXML;
    var xsl = getXMLStylesheet(dom);
    alert("stylesheet=" + xsl);
  }
</script>
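
The regular expression in getXMLStylesheet only recognizes a double-quoted href. If the stylesheet instruction might use single quotes, or if you also need the type value, the data string can be split into all of its pseudo-attributes. This is a sketch under the assumption that the data follows the usual name="value" layout; parsePseudoAttributes is a hypothetical name.

```javascript
// Parse pseudo-attributes such as href="..." type='...' from the data
// of an xml-stylesheet processing instruction.
function parsePseudoAttributes(data) {
  var attrs = {};
  var re = /([A-Za-z][\w-]*)\s*=\s*(?:"([^"]*)"|'([^']*)')/g;
  var m;
  while ((m = re.exec(data)) !== null) {
    // Group 2 holds a double-quoted value, group 3 a single-quoted one.
    attrs[m[1]] = m[2] || m[3] || "";
  }
  return attrs;
}

var pi = parsePseudoAttributes('href="countryPage.xsl" type="text/xsl"');
console.log(pi.href + " (" + pi.type + ")");
```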