Despite the recent spate of anti-XHTML rants on blogs and HTML mailing lists, there is value in coding in XHTML and in the future of the XHTML Family of markup languages. However, my intention in this article is not to debate the merits of XHTML markup versus HTML 4.01, but to demonstrate how to publish XHTML documents as an XML application.
Extensible HyperText Markup Language, or XHTML, is "a reformulation of HTML 4 as an XML 1.0 application, and three DTDs corresponding to the ones defined by HTML 4." [1] This means that you can use the existing structure of HTML 4 but also draw on the functionality of XML to create your own custom elements when necessary.
But there is more to it than just adding custom elements. When you create a valid XHTML document, you have a well-formed XML document that can be read by, and interact with, other XML applications and parsers. This makes it easy to reuse the content in your document and publish it to other document types or databases. Content and data reuse has been the holy grail of computing for at least the last ten years, and XHTML brings Web content one step closer to this goal.
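As a minimal sketch of that kind of reuse, the following Python snippet reads an XHTML page with a generic XML parser from the standard library and pulls out its links. The sample page and the function name extract_links are hypothetical and exist only for illustration; the sample also omits the DOCTYPE to keep it short.

```python
import xml.etree.ElementTree as ET

# Namespace that XHTML elements live in.
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical sample page, not taken from the article.
SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head><title>Sample page</title></head>
  <body>
    <h1>Welcome</h1>
    <p>See the <a href="/about.html">about page</a> and
       the <a href="/contact.html">contact page</a>.</p>
  </body>
</html>"""


def extract_links(xhtml_text):
    """Return (href, link text) pairs for every XHTML a element."""
    # Parse bytes so the encoding named in the XML declaration is honoured.
    root = ET.fromstring(xhtml_text.encode("utf-8"))
    ns = {"x": XHTML_NS}
    return [(a.get("href"), "".join(a.itertext()).strip())
            for a in root.iterfind(".//x:a", ns)]


links = extract_links(SAMPLE)
print(links)
```

Because the page is well-formed XML, no HTML-specific parser or error recovery is needed to get at its content.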
XML, owing to its roots in SGML, demands well-formed documents. This requirement is in stark contrast to the current state of markup in the vast majority of web pages. When the Internet's popularity exploded, HTML could not keep up with the visual demands of Web designers and programmers. This led to custom tags being introduced by browser vendors, markup hacks by designers to get a desired effect, and a general disregard for "standards" compliance by all involved.
What we ended up with is tag soup. Error handling accounts for a large share of the code in the rendering engines of today's browsers. XHTML is a step toward breaking away from the sins of the past, with well-formed documents that are validated against a published DTD.
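The difference between tag soup and well-formed markup can be illustrated with a short Python sketch. It checks well-formedness only, using the standard library's XML parser; it does not validate against a DTD. The function name is_well_formed and both markup fragments are made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Overlapping b/i elements and an unclosed br: typical tag soup.
tag_soup = "<p>An <b>unclosed <i>mess</b> of tags<br></p>"
# Properly nested elements and an explicitly closed empty element.
clean = "<p>A <b>tidy <i>set</i></b> of tags<br /></p>"

print(is_well_formed(tag_soup))
print(is_well_formed(clean))
```

An XML parser simply rejects the first fragment, whereas an HTML rendering engine would have to guess what the author meant.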
So let's say you have created a well-formed and validated XHTML document. Now what? If you publish the document on the web, most web browsers will render the document as normal HTML, especially if you follow the HTML compatibility guidelines in Appendix C of the XHTML recommendation. All is well and you are on the cutting edge of markup... right? Not exactly. There is a problem with the media type of the document.
HTML documents are typically sent with a media type of text/html, but in order to utilize the advantages of XML, we will need a new media type. The application/xhtml+xml media type was registered by the W3C HTML Working Group in RFC 3236 [2] for this purpose. "[It] is the media type for XHTML Family document types, and in particular it is suitable for XHTML Host Language document types. XHTML Family document types suitable for this media type include [XHTML1], [XHTMLBasic], [XHTML11] and [XHTML+MathML]." [3]
There will be a growing number of these "+xml" application media types. The +xml suffix simply signifies that the document belongs to the XML family and needs to conform to its rules. Rather than reuse the existing application/xml or text/xml media types, a new media type was needed specifically for XHTML rendering purposes.
For example, if we used text/xml, although technically acceptable, the browser would most likely render the document as a well-formed XML text document. Elements such as links with href attributes would not be recognized as anything but text. Application/xhtml+xml forces the document to be rendered as an XML application but with the utility of an HTML document.
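The "+xml" naming convention described above can be checked mechanically. The following Python sketch is one way to do so; the function name is_xml_media_type is hypothetical, and the check is deliberately simple (it ignores parameters such as charset).

```python
def is_xml_media_type(content_type):
    """Return True if a Content-Type value names an XML media type."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return (media_type in ("text/xml", "application/xml")
            or media_type.endswith("+xml"))


print(is_xml_media_type("application/xhtml+xml; charset=utf-8"))
print(is_xml_media_type("text/html"))
```

A generic XML tool could use a test like this to decide whether to parse a document as XML, while a browser would still look at the full media type to decide how to render it.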
Now we need to tell our HTTP server to serve these documents using our new media type in the Content-Type header. For my demonstrations, I will be using Apache's HTTP server in a hosted environment where I do not have access to httpd.conf.
Typically, media types are associated with a specific file extension. In our first scenario, this is what we will try to accomplish. We will publish our valid XHTML documents to the Web server with an extension of .xhtml (.xht is also acceptable) and set the media type for this extension. This is accomplished simply by editing (or creating, if it does not already exist) the .htaccess file in the root web directory. We then add the following line to the file and save:
AddType application/xhtml+xml;charset=utf-8;qs=0.999 .xhtml
Now, any requested file that has a .xhtml extension will have its Content-Type set to application/xhtml+xml in the header. Here is an example header:
HTTP/1.1 200 OK
Date: Fri, 24 Jan 2003 20:13:58 GMT
Server: Apache/1.3.26 (Unix) PHP/4.2.2 mod_gzip/1.3.19.1a
Last-Modified: Fri, 24 Jan 2003 16:47:52 GMT
ETag: "55815a-992-3e316e38"
Accept-Ranges: bytes
Content-Length: 2450
Connection: close
Content-Type: application/xhtml+xml; charset=utf-8; qs=0.999
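To confirm what a server actually sends, the response head can be inspected programmatically. The Python sketch below parses an abbreviated copy of the header above with the standard library's email parser, which understands the same "Name: value" header syntax. The function name content_type_of is hypothetical, and the sample uses plain newlines rather than the CRLF line endings found on the wire.

```python
from email.parser import Parser

# Abbreviated copy of the response head shown above.
RAW_HEAD = (
    "HTTP/1.1 200 OK\n"
    "Date: Fri, 24 Jan 2003 20:13:58 GMT\n"
    "Content-Length: 2450\n"
    "Content-Type: application/xhtml+xml; charset=utf-8; qs=0.999\n"
)


def content_type_of(raw_head):
    """Return (media type, charset) from the head of an HTTP response."""
    # The status line is not a header, so split it off before parsing.
    _status_line, _, header_block = raw_head.partition("\n")
    headers = Parser().parsestr(header_block, headersonly=True)
    return headers.get_content_type(), headers.get_content_charset()


media_type, charset = content_type_of(RAW_HEAD)
print(media_type, charset)
```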
But, as you might have already guessed, there is a problem with this scenario. The User Agent requesting this document needs to know how to handle a file with an .xhtml extension served as application/xhtml+xml. By default, only Gecko-based browsers (Mozilla and Netscape 6+), Opera and Amaya are able to display .xhtml pages. Unless you are serving pages in an environment where you control the UAs, this is not a great option.
The second scenario involves content negotiation between the browser and the HTTP server. Here we will publish our XHTML documents with the normal .html extension for a Web page. But before the document is sent to the browser, we will check whether the UA claims, in its Accept header, that it can support our MIME type.
In order to accomplish this, we need to make sure that the mod_rewrite module has been installed on our Apache server. Once verified, add the following lines to the same .htaccess file [4]:
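# Turn on the rewrite engine for this directory
RewriteEngine on
# The UA must list application/xhtml+xml in its Accept header...
RewriteCond %{HTTP_ACCEPT} application/xhtml\+xml
# ...and must not have given it a q-value beginning with 0
RewriteCond %{HTTP_ACCEPT} !application/xhtml\+xml\s*;\s*q=0
# Only requests for files ending in .html are affected
RewriteCond %{REQUEST_URI} \.html$
# Only HTTP/1.1 requests are affected
RewriteCond %{THE_REQUEST} HTTP/1\.1
# Leave the URL unchanged ("-") and force the media type
RewriteRule .* - "[T=application/xhtml+xml; charset=utf-8]"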
Although this is still somewhat of a hack, it is a lot friendlier to the general UA population. It also lets us maintain only one version of our documents and have them rendered appropriately in the various UAs. Even Internet Explorer can display the documents with this method, albeit in Quirks Mode. If the UA does not claim support for application/xhtml+xml, the document is served as text/html.
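The decision the rewrite conditions make on the Accept header can be sketched in Python as follows. This is an illustration of the idea, not a translation of the rules: the function names parse_accept and choose_media_type are hypothetical, the sketch ignores the .html and HTTP/1.1 conditions, and it treats only a q-value of exactly zero as a refusal, whereas the pattern in the second condition matches any q-value that begins with 0.

```python
def parse_accept(accept_header):
    """Map each media range in an Accept header to its q-value (default 1.0)."""
    prefs = {}
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_range = parts[0].lower()
        if not media_range:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value.strip())
                except ValueError:
                    q = 0.0  # treat an unparseable q-value as "not acceptable"
        prefs[media_range] = q
    return prefs


def choose_media_type(accept_header):
    """Pick application/xhtml+xml only if the UA lists it explicitly with q > 0."""
    q = parse_accept(accept_header).get("application/xhtml+xml", 0.0)
    return "application/xhtml+xml" if q > 0 else "text/html"


# Illustrative Accept headers (assumed values, not captured from real browsers).
print(choose_media_type("text/xml,application/xhtml+xml,text/html;q=0.9,*/*;q=0.5"))
print(choose_media_type("image/gif, image/jpeg, */*"))
```

Note that a wildcard such as */* is deliberately not treated as a claim of XHTML support, which mirrors the first rewrite condition: the UA has to name application/xhtml+xml itself.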
Using the higher-level protocol, HTTP, as we did above, is also the best way to set the character set. However, it is recommended to set the character set in the document as well, preferably with the XML declaration or with a meta element that has http-equiv="Content-Type".
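Since the character set can now be stated in two places, it is worth checking that they agree. The Python sketch below reads the encoding from a document's XML declaration and compares it with the charset from the HTTP header. The function names and the sample document are hypothetical, and the pattern only handles an ASCII-compatible declaration at the very start of the document.

```python
import re

# Matches an XML declaration at the start of the document and captures
# the value of its encoding pseudo-attribute.
XML_DECL_ENCODING = re.compile(
    rb'^<\?xml\s+version\s*=\s*["\'][^"\']+["\']'
    rb'\s+encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_encoding(document_bytes):
    """Return the lower-cased encoding from the XML declaration, or None."""
    match = XML_DECL_ENCODING.match(document_bytes)
    return match.group(1).decode("ascii").lower() if match else None


def charsets_agree(http_charset, document_bytes):
    """True if the document declares no encoding or the same one as HTTP."""
    declared = declared_encoding(document_bytes)
    return declared is None or declared == http_charset.lower()


# Hypothetical start of an XHTML document.
doc = b'<?xml version="1.0" encoding="UTF-8"?>\n<html xmlns="http://www.w3.org/1999/xhtml"></html>'

print(declared_encoding(doc))
print(charsets_agree("utf-8", doc))
```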
Strictly speaking, according to the W3C, the only XHTML documents that "may" be served as text/html are those that follow the previously mentioned guidelines of Appendix C. All others "should not" be served as text/html but rather "should" be served as application/xhtml+xml.
In conclusion, you are now armed with the knowledge and tools to properly publish XHTML documents on the Internet. Hopefully this will open doors for your applications to harness the power of XML.
Resources
[1] XHTML 1.0 Recommendation, W3C Recommendation, 26 January 2000, revised 1 August 2002
[2] The application/xhtml+xml Media Type, The Internet Society, January 2002
[3] XHTML Media Types, W3C, 1 August 2002
[4] Christoph Schneegans's document on handling XHTML with HTTP/Apache
Ted Pibil
http://www.pibil.org/
Ted Pibil is a former lion-tamer and sword-swallower who now enjoys the more challenging world of markup and meta languages in distributed hypermedia systems.
He is a fellow Christian pilgrim and can be reached at www.pibil.org