Section breaks in Office Open XML and WordML

Identifying section breaks in a pure Office Open XML (OOXML) document is the ultimate nightmare: the only indication of a section break is the presence of a w:sectPr element within the paragraph properties (w:pPr) of the last paragraph of the section. To complicate matters further, the last section in the document can be represented by a w:sectPr element that is a sibling of the w:p elements … and of course you could have documents without sections, in which case there would be no w:sectPr element anywhere within the XML. Just try to imagine writing an XSLT transformation that converts OOXML to HTML and splits the Word text into DIVs (one DIV per section).
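To make the bookkeeping concrete, here is a minimal Python sketch (not XSLT) of the grouping logic described above. It assumes the standard WordprocessingML namespace and a flat w:body; the function name split_sections and the SAMPLE document are made up for illustration, and a real converter would also have to deal with tables and other body-level content.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hypothetical two-section document: the first paragraph closes section 1,
# and the body-level w:sectPr describes the last section.
SAMPLE = """<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
<w:body>
<w:p><w:pPr><w:sectPr/></w:pPr><w:r><w:t>First section</w:t></w:r></w:p>
<w:p><w:r><w:t>Second section, paragraph 1</w:t></w:r></w:p>
<w:p><w:r><w:t>Second section, paragraph 2</w:t></w:r></w:p>
<w:sectPr/>
</w:body>
</w:document>"""


def split_sections(document_xml):
    """Group the children of w:body into lists, one list per section."""
    root = ET.fromstring(document_xml)
    body = root.find("w:body", NS)
    sections, current = [], []
    if body is None:
        return sections
    for child in body:
        if child.tag == f"{{{W}}}sectPr":
            # Body-level w:sectPr: properties of the last section, not content.
            continue
        current.append(child)
        # A w:sectPr inside w:pPr marks this paragraph as the last one of its section.
        if child.tag == f"{{{W}}}p" and child.find("w:pPr/w:sectPr", NS) is not None:
            sections.append(current)
            current = []
    if current:
        # Whatever remains belongs to the last section (or to the only one,
        # if the document contains no w:sectPr at all).
        sections.append(current)
    return sections


print([len(s) for s in split_sections(SAMPLE)])  # prints [1, 2]
```

Even in a procedural language the section boundaries have to be inferred while walking the paragraphs, which is exactly what makes the same job awkward in XSLT 1.0.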

Fortunately, the task is much easier if you use WordML, which contains auxiliary hints in the wx namespace; in our case, the wx:sect element, which encloses all the paragraphs within a section.
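For comparison, a hedged Python sketch of the same grouping on WordML: because each section is already wrapped in wx:sect, a single query is enough. The namespace URIs are the Word 2003 ones for w and wx; the function name sections_as_text and the sample markup are hypothetical, and the sketch assumes wx:sect elements sit directly under w:body.

```python
import xml.etree.ElementTree as ET

NS = {
    "w": "http://schemas.microsoft.com/office/word/2003/wordml",
    "wx": "http://schemas.microsoft.com/office/word/2003/auxHint",
}

# Hypothetical WordML fragment with two sections.
SAMPLE_WORDML = """<w:wordDocument
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
    xmlns:wx="http://schemas.microsoft.com/office/word/2003/auxHint">
<w:body>
<wx:sect>
<w:p><w:pPr><w:sectPr/></w:pPr><w:r><w:t>One</w:t></w:r></w:p>
</wx:sect>
<wx:sect>
<w:p><w:r><w:t>Two</w:t></w:r></w:p>
<w:p><w:r><w:t>Three</w:t></w:r></w:p>
<w:sectPr/>
</wx:sect>
</w:body>
</w:wordDocument>"""


def sections_as_text(wordml):
    """Return one list of paragraph strings per wx:sect element."""
    root = ET.fromstring(wordml)
    result = []
    for sect in root.findall("w:body/wx:sect", NS):
        paragraphs = []
        # iter() also reaches paragraphs nested deeper inside the section.
        for p in sect.iter(f"{{{NS['w']}}}p"):
            text = "".join(t.text or "" for t in p.iter(f"{{{NS['w']}}}t"))
            paragraphs.append(text)
        result.append(paragraphs)
    return result


print(sections_as_text(SAMPLE_WORDML))  # prints [['One'], ['Two', 'Three']]
```

In XSLT terms, the equivalent would be a template matching wx:sect that emits one DIV and applies templates to its children.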

For example, the following Word text …… generates this WordML markup (to get the corresponding OOXML, remove the wx:sect elements).
