Using the HTML lang attribute

What do an American actor, a British sitcom character and an HTML attribute have in common? If you’ve ever watched Mary Poppins and winced at Dick Van Dyke’s attempt at an English accent, or found yourself laughing at Del Boy Trotter trying to speak French in Only Fools and Horses, you may well guess the answer.

The HTML lang attribute is used to identify the language of text content on the web. This information helps search engines return language-specific results, and it is also used by screen readers that switch language profiles to provide the correct accent and pronunciation.

To set the primary language for a document, you use the lang attribute on the <html> element:

<html lang="en">
...
</html>

The lang attribute takes an ISO language code as its value. Typically this is a two-letter code such as “en” for English, but it can also be an extended code such as “en-gb” for British English.
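
As a minimal sketch of an extended code in use, a page written in British English might be declared as follows. The title and paragraph text are placeholders invented for illustration. The value is not case-sensitive, so “en-gb” and “en-GB” are treated the same; writing the region part in capitals is simply a common convention.

```html
<!DOCTYPE html>
<html lang="en-GB">
<head>
<meta charset="utf-8">
<title>Placeholder title</title>
</head>
<body>
<!-- Everything in the document inherits en-GB unless an element overrides it -->
<p>The colour of the lorry was grey.</p>
</body>
</html>
```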

The lang attribute must also be used to identify chunks of text in a language that is different from the document’s primary language. For example:


<html lang="en">
...
<body>
<p>This page is written in English.</p>
<p lang="fr">Sauf pour ce qui est écrit en mauvais français.</p>
</body>
</html>
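
The attribute is not limited to block-level elements. It can also sit on an inline element, so a single foreign phrase inside an English sentence can be marked up as well. A small sketch, with a sentence made up for illustration:

```html
<!-- The surrounding paragraph inherits lang="en" from the <html> element -->
<p>He raised his glass and said <span lang="fr">bonjour</span> as he left.</p>
```

Only the wrapped phrase is treated as French; the rest of the sentence keeps the document’s primary language.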

The lang attribute is forgotten surprisingly often, perhaps because it makes no apparent difference unless you use a screen reader or you are a search engine.

Comments

Good reminder why developers should not forget to declare the language of the content. One remark or two:

“The lang attribute takes an ISO language code as its value” is half-true. For English, you should not use the ISO 639-2 code “eng”, but “en” (ISO 639-1), according to BCP 47. To look up language codes, use the IANA Language Subtag Registry; cf. the article “Two-letter or three-letter ISO language codes” at W3C Internationalization.

One more reason to declare the language is automated hyphenation (CSS `hyphens` property). Developers with Chrome as their primary browser tend to forget this, since it is still not implemented in Chrome (or its Blink brother Opera), although it is in Firefox, IE, Edge, and Safari. Hyphenation algorithms differ between languages, so hyphenation is only applied when the content language is declared, and it produces wrong results when the declaration is wrong. (Developers beware! Do not copy a declaration from elsewhere when your content is not in English.)

Other reasons include language-dependent (script-dependent) styling. The article “Why use the language attribute?” lists more.

Sylvie says:

Thank you, this is a good reminder.
As far as users of non-English screen readers are concerned: when they land on an English web page without a lang attribute, the page is spoken with the screen reader’s own language, which makes it impossible to understand unless the user disables language switching in the screen reader.
So don’t forget to add this attribute to your documents, even if your site speaks English!

I waded into this on my site a little while back and gathered a list of benefits:

It is important to avoid political pitfalls for CJK languages.
It will choose the appropriate formats for date and number inputs.
It is used by the spellcheck attribute.
Quote characters change for the q element.
In addition to hyphens, hanging punctuation and ::first-letter selectors can be affected.

Thanks to feedback from the interwebs I also identified how it affects screen readers:

VoiceOver on iOS uses the attribute to auto-switch voices.
VoiceOver can speak a particular language using a different accent when specified.
Leaving out the lang attribute may require the user to manually switch to the correct language for proper pronunciation.
JAWS uses it to load the correct phonetic engine / phonologic dictionary — Handy for sites with multiple languages.
NVDA (Windows) uses it in the same way as VoiceOver and JAWS.
When used in HTML that is used to form an ePub or Apple iBooks document, it affects how VoiceOver will read the book.

Neil Osman says:

As far as mixed language is concerned, CMS-generated content is a common obstacle… CKEditor provides a built-in mechanism to configure this.

In my experience with mixed languages, NVDA and JAWS switch automatically between English and Hebrew without lang attributes being set explicitly.
Also, the lang attribute does not guarantee correct reading of local date formats (dd/mm/yyyy).
I suspect that, more than anything else, the browser determines screen reader behavior.

Also, even if I explicitly set the lang attribute to Hebrew, this 05.08.2001 would be:
displayed in Chrome in a non-local format and read correctly in an English voice,
displayed in IE11 in local format but read as Aug 5th 2016 (in a Hebrew voice),
displayed in FF in local format but read as Aug 5th 2016 (in a Hebrew voice).

brianary says:

Technically, the country part of the language code is supposed to be capitalized, though most browsers don’t care.

Colete Anglia Romania says:

It seems that my template is overriding my language settings. I use the latest Joomla, 3.6.2, with only 3 extensions, and the template is from a known provider. In my heading I have only , and the lang attribute is empty.
Do I have to set the lang attribute in the template file, somewhere in the default.php file, or in the Joomla settings? Any help would be appreciated.

A couple of additions RE: multiple languages on the same page (your last example).

* In my experience, NVDA will only switch automatically between them when using the built-in eSpeak synthesizer, but not with installed third-party voices.
* Like Neil Osman said, CMSs make this hard. Here’s a small WP plugin that solves this nicely.