HTML5 structure—HTML4 and XHTML1 to HTML5

boblet:

We’ve covered a lot of ground so far. To recap, HTML5 has four new block-level sectioning elements that we can use to give the parts of web pages more semantic meaning. These new elements are for ‘chunks of related content’—basically a logical section of the document:

  • <section>: a generic chunk of related content
  • <article>: an independent, self-contained chunk of related content that still makes sense on its own
  • <aside>: a chunk of content that is tangentially related to the content that surrounds it, but isn’t essential to understanding that content
  • <nav>: navigation for the site or page

These new sectioning elements should contain a <header> element with the section’s title, and any other introductory information. They can also contain one or more <footer> elements with additional information such as copyright, related links etc. It’s important to note that <header> and <footer> apply to the structural element they’re in—they’re not the same as a page header or page footer. It’s also important to remember that <header> and <footer> can’t contain other <header>s and <footer>s, and <footer> can’t contain heading or sectioning elements. Finally, while the words “header”, “footer” and “aside” all come with preconceptions, their semantic meaning comes from the types of content they contain, not from their presentation or relative placement. For example, an <aside> could contain a footnote, and a <footer> containing a ‘Top of Page’ link could appear at both the top and bottom of a section.
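
As a rough illustration of those rules, here is a minimal sketch of a single article with its own header and footer. The title, byline and link text are placeholders made up for the example, not taken from a real page.

```html
<article>
  <header>
    <!-- this header belongs to the article, not to the page -->
    <h1>Article title</h1>
    <p>Posted by an example author</p>
  </header>
  <p>Article content…</p>
  <footer>
    <!-- no headings or sectioning elements inside the footer -->
    <p>Copyright notice · <a href="#top">Top of page</a></p>
  </footer>
</article>
```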

Now let’s look at example structures for a basic article page, using the standard layout of a page header (with logo etc), navigation tabs, a main column, a side column, and a page footer.

Article Page

Here’s the outline of the parts of our page:

Article Page Layout
  • Site header (logo, search, …)
  • Site navigation
  • Main content (wrapper)
    • Main column
      • Article
        • Article title
        • Article metadata
        • Article content…
    • Sidebar
      • Sidebar title
      • Sidebar content…
  • Footer

So let’s write that in standard HTML4:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
   "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
  <head>
    <title>Article (HTML4)</title>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body>
    <div id="header">Site logo, search etc</div>
    <ul id="nav">
      <li>Site navigation</li>
    </ul>
    <div id="content">
      <div id="main"> <!-- main content (the article) -->
        <h1>Article title</h1>
        <p class="meta">Article metadata</p>
        <p>Article content…</p>
      </div>
      <div id="sidebar"> <!-- secondary content -->
        <h2>Sidebar title</h2>
        <p>Sidebar content…</p>
      </div>
    </div>
    <div id="footer">Footer</div>
  </body>
</html>

Here’s the same page in standard XHTML 1.0:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
  <head>
    <title>Article (XHTML1)</title>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  </head>
  <body>
    <div id="header">Site logo, search etc</div>
    <ul id="nav">
      <li>Site navigation</li>
    </ul>
    <div id="content">
      <div id="main"> <!-- main content (the article) -->
        <h1>Article title</h1>
        <p class="meta">Article metadata</p>
        <p>Article content…</p>
      </div>
      <div id="sidebar"> <!-- secondary content -->
        <h2>Sidebar title</h2>
        <p>Sidebar content…</p>
      </div>
    </div>
    <div id="footer">Footer</div>
  </body>
</html>

Now let’s convert that to HTML5, using the new structural elements:

<!-- 'HTML-style' HTML5 -->
<!doctype html>
<html lang="en">
  <head>
    <title>Article (HTML5)</title>
    <meta charset="utf-8">
  </head>
  <body>
    <header id="branding">Site logo, search etc</header>
    <nav>
      <ul>
        <li>Site navigation</li>
      </ul>
    </nav>
    <section id="content">
      <article> <!-- main content (the article) -->
        <header>
          <h1>Article title</h1>
          <p>Article metadata</p>
        </header>
        <p>Article content</p>
      </article>
      <section id="sidebar"> <!-- secondary content -->
        <h2>Sidebar title</h2>
        <p>Sidebar content</p>
      </section>
    </section>
    <footer id="footer">Footer</footer> <!-- a very basic footer! -->
  </body>
</html>

The same page in XHTML-style HTML5:

<!-- 'XHTML-style' HTML5 -->
<!doctype html>
<html lang="en">
  <head>
    <title>Article (HTML5)</title>
    <meta charset="utf-8" />
  </head>
  <body>
    <header id="branding">Site logo, search etc</header>
    <nav>
      <ul>
        <li>Site navigation</li>
      </ul>
    </nav>
    <section id="content">
      <article> <!-- main content (the article) -->
        <header>
          <h1>Article title</h1>
          <p>Article metadata</p>
        </header>
        <p>Article content</p>
      </article>
      <section id="sidebar"> <!-- secondary content -->
        <h2>Sidebar title</h2>
        <p>Sidebar content</p>
      </section>
    </section>
    <footer id="footer">Footer</footer> <!-- a very basic footer! -->
  </body>
</html>

Note that here we assume the sidebar contains content not related to the article (such as recent articles etc). If it only contained content tangentially related to the article we could use <aside>. We also assume that the <footer> doesn’t contain much more than a copyright statement and contact information. A detailed footer with headings etc would need its own <section>.
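
If the side column did hold only content tangentially related to the article, the content wrapper might look something like the sketch below. It reuses the ids from the HTML5 example; the sidebar heading is a placeholder, and keeping the <aside> as a sibling of the <article> is an assumption, one reasonable layout among several.

```html
<section id="content">
  <article> <!-- main content (the article) -->
    <header>
      <h1>Article title</h1>
      <p>Article metadata</p>
    </header>
    <p>Article content</p>
  </article>
  <aside id="sidebar"> <!-- tangentially related content -->
    <h2>Related reading</h2>
    <p>Sidebar content</p>
  </aside>
</section>
```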

doctype, charset & XHTML-style markup

You’ll notice the doctype and charset are both much simpler. While this style of charset declaration is recommended, the pre-HTML5 declarations <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> and <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> are both still valid. Also, if you’re viewing the XHTML-style code examples, you’ll note that the charset <meta> element still has an XHTML-style trailing slash in the HTML5 example. In fact XHTML-style markup like this (a closing / on empty elements) is also valid HTML5! This makes it very easy to migrate to HTML5 from both HTML and XHTML pages. You should try to avoid mixing HTML-style and XHTML-style code, however. Choose one style and stick with it.

HTML5 or XHTML5? Choose HTML5

If you currently use XHTML1.x you might be thinking of using XHTML5, the XML-compatible version of HTML5. If your website will have a general audience, don’t. XHTML5 must be sent with an XML MIME type (like application/xhtml+xml), and even IE8 still doesn’t support this. However, the hallmarks of XHTML coding (writing elements in lower case, correct nesting, closing tags, adding optional elements that add meaning) are all either compatible with HTML5 (which is case-insensitive) or encouraged in it.

Browser support; CSS and JS

So, does it work? Currently the HTML5 structural elements will work in modern browsers (Firefox 3+, Safari 3+, Opera 9+, Chrome 1+) as long as we declare them as block-level elements via this CSS:

/* Declaring HTML5 elements */
article, aside, dialog, figure, footer, header, legend, nav, section {
  display: block;
  }

and in Internet Explorer 8 and below we need to hack support in via JavaScript (I bet you didn’t see that coming ;-)

(function(){if(!/*@cc_on!@*/0)return;var e = "abbr,article,aside,audio,bb,canvas,datagrid,datalist,details,dialog,eventsource,figure,footer,header,hgroup,mark,menu,meter,nav,output,progress,section,time,video".split(','),i=e.length;while(i--){document.createElement(e[i])}})()

The recommended way to add this JavaScript is via Remy Sharp’s Google Code-hosted HTML5 shiv for IE, included in the head:

<!--[if IE]>
<script src="http://html5shiv.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->

So, all together now…

<head>
  <!--[if IE]>
    <script src="http://html5shiv.googlecode.com/svn/trunk/html5.js"></script>
  <![endif]-->
  <style type="text/css" media="screen">
    /* Declaring HTML5 elements */
    article, aside, dialog, figure, footer, header, legend, nav, section {
      display: block;
    }
  </style>
</head>

…but IE requiring JS means we’re screwed, right?

You might think that IE’s lack of support for these new elements without JavaScript means you can’t use HTML5 at all, but we can still benefit from HTML5’s greater semantic richness by using the HTML5 semantic element names as class names on <div>. There are two ways to do this: in HTML4/XHTML1.0, or in HTML5 itself. You’re probably already using a standard set of class and id names anyway, and this is in effect a standardised set of semantic class names. HTML5 is basically a superset of HTML4/XHTML1, so as long as you don’t use any new elements, HTML5 pages will work in IE. This approach also simplifies a future move to the HTML5 elements, and if you use the HTML5 doctype you can also use the more detailed HTML5 validators.

Here’s the HTML4 version using HTML5 class names:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
   "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
  <head>
    <title>Article (HTML4), with HTML5 class names</title>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body>
    <div id="branding" class="header">Site logo, search etc</div>
    <ul id="nav">
      <li>Site navigation</li>
    </ul>
    <div id="content" class="section">
      <div id="main" class="article"> <!-- main content -->
        <div class="header">
          <h1>Article title</h1>
          <p>Article metadata</p>
        </div>
        <p>Article content…</p>
      </div>
      <div id="sidebar" class="section"> <!-- secondary content -->
        <h2>Sidebar title</h2>
        <p>Sidebar content…</p>
      </div>
    </div>
    <div id="footer">Footer</div>
  </body>
</html>

Here’s the XHTML1 version using HTML5 class names:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
  <head>
    <title>Article (XHTML1), with HTML5 class names</title>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  </head>
  <body>
    <div id="branding" class="header">Site logo, search etc</div>
    <ul id="nav">
      <li>Site navigation</li>
    </ul>
    <div id="content" class="section">
      <div id="main" class="article"> <!-- main content -->
        <div class="header">
          <h1>Article title</h1>
          <p>Article metadata</p>
        </div>
        <p>Article content…</p>
      </div>
      <div id="sidebar" class="section"> <!-- secondary content -->
        <h2>Sidebar title</h2>
        <p>Sidebar content…</p>
      </div>
    </div>
    <div id="footer">Footer</div>
  </body>
</html>

Now in HTML5, again using <div> with HTML5 class names rather than the new HTML5 elements:

<!-- 'HTML-style' HTML5 -->
<!doctype html>
<html lang="en">
  <head>
    <title>Article (HTML5), with HTML5 class names</title>
    <meta charset="utf-8">
  </head>
  <body>
    <div id="branding" class="header">Site logo, search etc</div>
    <ul id="nav">
      <li>Site navigation</li>
    </ul>
    <div id="content" class="section">
      <div id="main" class="article"> <!-- main content -->
        <div class="header">
          <h1>Article title</h1>
          <p>Article metadata</p>
        </div>
        <p>Article content…</p>
      </div>
      <div id="sidebar" class="section"> <!-- secondary content -->
        <h2>Sidebar title</h2>
        <p>Sidebar content…</p>
      </div>
    </div>
    <div id="footer">Footer</div>
  </body>
</html>

The same page in XHTML-style HTML5:

<!-- 'XHTML-style' HTML5 -->
<!doctype html>
<html lang="en">
  <head>
    <title>Article (HTML5), with HTML5 class names</title>
    <meta charset="utf-8" />
  </head>
  <body>
    <div id="branding" class="header">Site logo, search etc</div>
    <ul id="nav">
      <li>Site navigation</li>
    </ul>
    <div id="content" class="section">
      <div id="main" class="article"> <!-- main content -->
        <div class="header">
          <h1>Article title</h1>
          <p>Article metadata</p>
        </div>
        <p>Article content…</p>
      </div>
      <div id="sidebar" class="section"> <!-- secondary content -->
        <h2>Sidebar title</h2>
        <p>Sidebar content…</p>
      </div>
    </div>
    <div id="footer">Footer</div>
  </body>
</html>

You may be wondering why these two examples are so similar—after all, only the doctype and charset differ! That’s because one of HTML5’s core principles is compatibility. If we don’t use any new HTML5 elements, a change of doctype might be all that’s required to convert a well-coded HTML or XHTML page to HTML5.

Why bother with HTML5?

So if you’re not going to use HTML5’s new elements, which IE doesn’t support without JavaScript, what’s the point of thinking about HTML5 now? I see several benefits:

  1. Thinking about HTML5’s structural elements (even if we only express the semantics via class names) will make our code more logical and semantic
  2. HTML5 is defined in far greater detail than previous HTML/XHTML specs, giving us more guidance in creating web pages
  3. Another benefit of this detail is more accurate validators (W3C, Validator.nu), with the potential for more detailed error messages
  4. If you think you might convert to HTML5 in the future, the HTML5-elements-as-class-names approach and a little regexp magic should remove a lot of the pain of converting
  5. Now that XHTML2 development is being halted, starting to learn about the official future of HTML is a good idea
  6. Using HTML5 is a sliding scale, not all or nothing. You can get benefits from simply changing the doctype, a five second job.
  7. Because browsers use the same parser for HTML5 as HTML4/XHTML1, and because backwards compatibility is a central tenet, using an HTML5 doctype today has almost no disadvantages (make sure to check HTML5 differences from HTML4, specifically 3.3-3.5).

It’s possible to just change the doctype and get some benefits from having converted to HTML5 (when you use a validator :). However, the more time you put into HTML5 the greater the reward. You’ll get the most benefit from rethinking your site’s semantics from an HTML5 perspective, although for the present I’d recommend adding these extra semantics via the HTML5-elements-as-class-names approach.

Questions? Feedback? Mistakes? Let me know via Twitter (@boblet)

Changes:

  1. 2009-07-16 Added notes about doctype, XHTML5 and XHTML-style coding in HTML5, thanks to feedback from @robertdot. Also changed doctype to lower case in HTML5 code examples for consistency (HTML5 is case-insensitive so either is fine).
  2. Added headings for the doctype, charset & XHTML-style markup and HTML5 or XHTML5? Choose HTML5 sections I added last time, for better scanability. Added more info on the HTML5 shiv plus a copy-paste-able head code block for adding JS and CSS (thanks to HTML5 Doctor for the prompt). Also I added a few more links, a couple more points to “Why bother with HTML5?” (‘sliding scale’ and ‘no disadvantages’), rewrote the conclusion, and added what could be my favourite header ever.

HTML5 structure—div, section & article

boblet:

It seems my HTML5 id/class name cheatsheet article interested a few people, so here’s the start of an in-depth look at the document structures that fall out of the HTML5 spec. First, let’s introduce three easily confused HTML5 structural elements:

  1. <div>: the generic flow container we all know and love. It’s a block-level element with no additional semantic meaning (W3C:Markup, WhatWG)
  2. <section>: a generic document or application section. A <section> normally has a heading (title) and maybe a footer too. It’s a chunk of related content, like a subsection of a long article, a major part of the page (eg the news section on the homepage), or a page in a webapp’s tabbed interface. (W3C:Markup, WhatWG)
  3. <article>: an independent part of a document or site. This means it should be able to ‘stand alone’, and still make sense if you encountered it somewhere else (eg in an RSS feed). Examples include a weblog article (duh), a forum post or a comment. Like <section> these generally have a header, and maybe a footer (W3C:HTML, WhatWG)

The difference between <div>, <section> and <article>

In writing semantic HTML we should use the most suitable or semantically accurate element. In HTML4 <div> is a general block-level container element; it doesn’t have any semantic meaning beyond being block-level, and is used when there are no more appropriate elements (ie all the time). There is no requirement for the things inside the <div> to be related to each other.

The new HTML5 <section> element is similar to <div> as a general container element, but it does have some semantic meaning—the things it contains are a logical group of related content:

“The section element represents a generic document or application section. A section, in this context, is a thematic grouping of content, typically with a header, possibly with a footer.”

It is also a ‘sectioning content’ element. Along with <article>, <nav> and <aside>, it indicates a new section in the document. Imagine making your page into a bulleted list of related parts—sectioning elements create a new bullet point in the page’s outline, with indentation reflecting nesting. Note <div> isn’t a sectioning element.
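
To make the bulleted-list idea concrete, here is a small sketch with the outline I would expect it to produce written as a comment underneath. The headings and the wrapper class are invented for the example, and the outline is my reading of how sectioning content nests rather than the output of any particular tool.

```html
<body>
  <h1>Example site</h1>
  <div class="wrapper"> <!-- a div adds nothing to the outline -->
    <section>
      <h2>News</h2>
      <article>
        <h3>First story</h3>
        <p>Story content…</p>
      </article>
    </section>
    <aside>
      <h2>About</h2>
      <p>About content…</p>
    </aside>
  </div>
</body>
<!--
  Rough outline:
  • Example site
    • News
      • First story
    • About
-->
```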

The new HTML5 <article> element is like a specialised kind of <section>; it has a more specific semantic meaning in that it is an independent, self-contained part of the page. We could use <section>, but using <article> gives more semantic meaning.

To draw an analogy with HTML4, we can compare this (kind of ;-) to <p> and <blockquote>. Both are block-level elements for text, but using <blockquote> gives more semantic meaning (this is a block of quoted text). It’s the same with <section> and <article>: <section> means related content, but <article> means one piece of related content which makes sense on its own, even outside the context of the page (the page’s header and footer etc).

The potentially confusing part of this is that <section> can be used for parts of a page (eg the main content column, the news section on a homepage) and contain <article>s, and also for sections of a long <article> (ie inside an <article>).

To decide which of these three elements is appropriate, first consider whether the enclosed content would make sense on its own in a feed reader. If so, use <article>. If not, is the enclosed content related? If so, use <section>. Finally, if there’s no semantic meaning, use <div>. Except for occasional use to provide a hook for styles, I expect the humble <div> will mostly be superseded by <section> and, where required, more specialised HTML5 elements.
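
Here is one way that decision could play out in markup. This is only a sketch: the headings, list items and the wrapper class name are made up for the example.

```html
<!-- makes sense on its own in a feed reader: article -->
<article>
  <h1>Post title</h1>
  <p>Post content…</p>
</article>

<!-- related content that relies on the page around it: section -->
<section>
  <h1>Latest news</h1>
  <ul>
    <li>Headline one</li>
    <li>Headline two</li>
  </ul>
</section>

<!-- no semantic grouping, just a hook for styles: div -->
<div class="wrapper">
  <p>Unrelated bits and pieces…</p>
</div>
```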

We use <section> and <article> just like <div> is used in HTML4—eg these elements can’t be used inside <blockquote> or <address>. Also avoid nesting an <article> inside another <article>—use <section> for indicating logical parts of an <article> instead.

Document structures

Let’s describe some common document structures as bulleted lists:

A weblog article

  • Weblog article
    • Title
    • Content…

In HTML4 we’d most probably wrap the article in a <div>. Obviously we should use <article> instead in HTML5.

A long article with subsections (like a thesis)

  • Article
    • Title
    • Content
      1. Subsection
        • Subtitle (section title)
        • Content
      2. Subsection
        • Subtitle (section title)
        • Content
      3. Subsection
        • Subtitle (section title)
        • Content

Again the article would generally be wrapped in a <div> in HTML4, and the subsections would only be suggested by <h1>-<h6> elements. In HTML5 the article should be wrapped in <article>, and the subsections of the <article> should be explicitly indicated by wrapping them in <section> elements, perhaps in ordered list items if you’d like section numbering.
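
A minimal sketch of that structure might look like the following. The titles and content are placeholders, and the heading levels are just one plausible choice.

```html
<article>
  <h1>Article title</h1>
  <p>Introductory content…</p>
  <section>
    <h2>First subsection title</h2>
    <p>Subsection content…</p>
  </section>
  <section>
    <h2>Second subsection title</h2>
    <p>Subsection content…</p>
  </section>
  <section>
    <h2>Third subsection title</h2>
    <p>Subsection content…</p>
  </section>
</article>
```

For numbered sections, each <section> could sit inside an <li> of an <ol> instead of directly inside the <article>.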

A weblog homepage

  • Weblog header
    • Logo
    • Search
  • Site navigation
  • Main content
    1. Weblog article
      • Title
      • Summary
    2. Weblog article
      • Title
      • Summary
    3. Weblog article
      • Title
      • Summary
  • Secondary content
    • Blogroll, photos, other content…
  • Footer

All the main block-level sections of this structure would generally be <div>s in HTML4. Using our pieces from above, in HTML5 the <article>s should all be inside one <section> (for main content), with the secondary content also inside a <section>. Any independent (‘can stand alone’) chunk of content in the secondary content might also be marked up with an <article>. Note that the semantically hardcore might consider choosing to wrap the list of articles in <ol> list items if they are sorted by date, which articles on weblog homepages normally are :)
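
Sketched as markup, the homepage could look something like this. The ids and headings are assumptions made for the example, and the header, navigation and footer are left as plain <div>s because this article only covers <section> and <article>.

```html
<body>
  <div id="header">Logo, search</div>
  <div id="nav">Site navigation</div>
  <section id="content"> <!-- main content -->
    <h1>Recent articles</h1>
    <article>
      <h2>First article title</h2>
      <p>Summary…</p>
    </article>
    <article>
      <h2>Second article title</h2>
      <p>Summary…</p>
    </article>
    <article>
      <h2>Third article title</h2>
      <p>Summary…</p>
    </article>
  </section>
  <section id="sidebar"> <!-- secondary content -->
    <h1>Elsewhere</h1>
    <p>Blogroll, photos, other content…</p>
  </section>
  <div id="footer">Footer</div>
</body>
```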

However, <section> and <article> aren’t all the new structural tags! Stay tuned to see what becomes of the header, navigation and footer.

Questions? Feedback? Mistakes? Let me know via Twitter (@boblet)

Changes:

  1. 2009-06-30 Added sectioning content link and improved explanation
  2. 2009-07-02 Improved explanation of nested articles
  3. 2009-07-03 Feedback link, minor improvements to nested list examples
  4. 2009-07-07 Added links to some other HTML5 articles I’ve written

XHTML2 is dead.

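In the comments on Jeffrey Zeldman’s blog post “XHTML WTF”, Tantek Çelik mentions that he was a member of the XHTML2 working group, but that six years ago he strongly disagreed with the other members, and the W3C did not listen to the objections either. The opposition began to gather in 2004. They set up on their own as the WHATWG, and the standard they produced is the (X)HTML5 that the W3C itself went on to adopt in 2007.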

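Some see the struggle between XHTML2 and (X)HTML5 as a tug-of-war between purists and pragmatists; others describe it as the difference between focusing on structured content and emphasising applications. You can see this in the fact that HTML5’s predecessor was called Web Applications 1.0. The main benefits of writing web pages in XML (machine processing and easy exchange of information) lost much of their appeal once RSS became widespread; the JavaScript revival made web apps the trend; and IE never supported application/xhtml+xml. All of these were indirect causes of XHTML2’s death.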