XHTML eReader Catalogs

XHTML Typography

typographic correctness via standard XHTML…

Table of Contents

Content

Layout

APPENDICES

Endnotes

Preface

…XHTML…

Publishing elegant type on the web has been next to impossible. Printed type has always been much clearer and easier to read than copy produced on typewriters, by early word processors, or by browsers.

Global acceptance of computers and the World Wide Web gave millions the ability to ‘publish’ documents—though not elegant documents—before HTML 4.01 and XHTML 1.0 came along. The web was missing almost all the necessary tools for elegant typography and is still missing many important ones. Now, finally, we have the basic essential tools for creating highly readable content presented by HTML 4 and XHTML 1 compliant browsers.

XHTML 1, CSS 2 and UTF-8

The scope of this document is the use of XHTML 1.0 to generate consistent, highly readable web pages. It covers the design of documents employing all the standard typography available in this environment. I’ve written this document (and the entire library) with ‘XHTML 1.0 Strict’ compliant documents as the design goal. The whole project could just as easily be written using ‘HTML 4.01 Strict’ compliant documents—more on this later. Special characters have up to four methods of reference: named, decimal, hexadecimal, and UTF-8 (Unicode). For easier, more intuitive web page construction I’m incorporating special characters with UTF-8 encoding.
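
As a rough illustration of the four reference methods, here is a small Python sketch (Python is my choice for illustration; it is not part of the library). It resolves the named, decimal, and hexadecimal references for one sample character, the em dash, with the standard `html.unescape` function, and compares them with the character's UTF-8 bytes.

```python
import html

# Three entity-style references to the same character (em dash, U+2014).
named = html.unescape("&mdash;")
decimal = html.unescape("&#8212;")
hexadecimal = html.unescape("&#x2014;")

# The fourth method: the character itself, stored in the file as UTF-8 bytes.
utf8_bytes = "\u2014".encode("utf-8")

print(named == decimal == hexadecimal == "\u2014")
print(utf8_bytes)
```

All four spellings name the same character; only the bytes that end up in the file differ.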

XHTML generation

Modifying a document for the XHTML eReader Library is described in detail below. The tasks are included in roughly the optimum order of execution.

At the start a document doesn’t need to be in XHTML format; it may be as simple as a text document exported as a ‘.txt’ file from MS Word, for instance. I recommend working in a plain text document for as long as possible so you can take advantage of spelling and syntax checking. Remember, always save as a text document; you never want Microsoft Word to save your file as HTML! Word’s HTML output is entirely non-standard; it will ruin the file and cause extra work cleaning out the extraneous mark-up.

Content

…XHTML…

Spaces

There are fifteen space characters defined in Unicode. Most aren’t defined in HTML, and you can ignore many of these. Though many should be a part of the web, let’s deal with the ones that are defined first.

The non-breaking space (&nbsp;) (&#160;), commonly found in otherwise-empty table cells, is safely referred to by either its named or numeric entity reference in all 3.0-level and higher browsers.

The thin space (&thinsp;) (&#8201;) is the space character defined in HTML that is most similar to the hair space. It is supposed to be one-fifth of an em in width, but is almost always rendered much wider. The only font I’ve found with a correctly designed hair space in it is Arial Unicode MS, and it renders both with almost exactly the same width.

Bottom line: unless you can be sure that your target audience has Arial Unicode MS installed, neither of these spaces has anything close to the desired and correct appearance.

The last two spaces in the HTML repertoire are the en space (&ensp;) (&#8194;) and the em space (&emsp;) (&#8195;). Can you guess how wide each is?

Both are visibly wider than a normal space, and once again, Arial Unicode MS is the only mainstream font that includes both, even though they are part of the official XHTML 1.0 specification.
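
To check which numeric reference goes with each named space, a short Python sketch like the following can help. It assumes only the standard `html` module; the `SPACES` table and the `code_points` helper are names I made up for this illustration.

```python
import html

# Named space references from the HTML 4 / XHTML 1 entity set.
SPACES = {
    "&nbsp;": "non-breaking space",
    "&thinsp;": "thin space",
    "&ensp;": "en space",
    "&emsp;": "em space",
}

def code_points(entities=SPACES):
    """Map each named reference to the decimal code point it resolves to."""
    return {name: ord(html.unescape(name)) for name in entities}

for name, number in code_points().items():
    print(f"{name:9} &#{number};  {SPACES[name]}")
```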

Sentences

…On every line of text in the document, watch all line breaks carefully. Be sensible.…

Period (.) Question Mark (?) Exclamation Point (!)

All sentences should end with a period (.), question mark (?), or exclamation point (!) followed by a single space ( ), also known as a word space. Search the document and replace any two successive spaces with a single space. This may need to be done in successive passes until no double spaces remain.
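
If you would rather script the search than repeat it by hand, the successive passes can be sketched in a few lines of Python. The function name is my own, and the sketch touches only ordinary spaces (U+0020), so non-breaking and other special spaces are left alone.

```python
def collapse_double_spaces(text: str) -> str:
    """Replace double spaces with single spaces until none remain."""
    while "  " in text:  # two ordinary spaces (U+0020)
        text = text.replace("  ", " ")
    return text

sample = "End of one.   Start of next.    And another."
print(collapse_double_spaces(sample))
```

A run of three or four spaces survives the first pass as a double space, which is why the loop repeats until nothing is left to replace.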

Search out and correct as many instances of primitive markup as you can find. Some examples of these anomalies are a double hyphen-minus (--) used mistakenly in place of an em dash, and double single quotes ('') used in place of an opening double quote (“) or a closing double quote (”).
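
A first pass at these two corrections might look like the Python sketch below. It is only a starting point under a simple assumption of mine: a doubled straight quote opens a quotation when it sits at the start of the text or after white space or an opening bracket, and closes one everywhere else. Runs of three or more hyphens, and quotes in unusual positions, still need a human eye.

```python
import re

EM_DASH = "\u2014"
OPEN_DOUBLE = "\u201c"
CLOSE_DOUBLE = "\u201d"

def fix_primitive_markup(text: str) -> str:
    """Replace '--' and doubled straight quotes with proper characters."""
    # Double hyphen-minus used as an em dash.
    text = text.replace("--", EM_DASH)
    # '' at the start of the text, or after white space or an opening
    # bracket, is treated as an opening double quote.
    text = re.sub(r"(?:^|(?<=[\s(\[]))''", OPEN_DOUBLE, text)
    # Any '' that remains is treated as a closing double quote.
    text = text.replace("''", CLOSE_DOUBLE)
    return text

print(fix_primitive_markup("He said ''wait--stop'' twice."))
```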

Abbreviations

…Avoid abbreviations. Use small caps for A.M. and P.M.; space once after the number, and use periods.…

Punctuation

…Use proper punctuation with parentheses.… …Reduce the size of the punctuation marks in headlines.…

Parentheses

Capitals

…XHTML…

ASCII Characters

…XHTML…

These are the characters that you type on a standard (US layout) keyboard. They should never be used in proper typography, but are often used because they’re easy to type and well supported. They should always be superseded by one of the following, depending on meaning.

ASCII Backtick (`)

This is the character that’s typically on a US layout keyboard below the tilde (~). The backtick shouldn’t be used in place of the opening single quote (‘), or for any other discernible typographic purpose.

ASCII Hyphen-minus (-)

The hyphen-minus is produced by the key next to the zero on your keyboard. Due to problems with hyphenation in the HTML specification, it should always be used whenever a hyphen is required, but never as a minus sign. More about this in the section on the minus character.

ASCII Apostrophe (')

This is the character that you type on a standard (US layout) keyboard with the unshifted key beside the semicolon. It should never be used in proper typography, but is often used because it’s easy to type and well supported. It should always be superseded by one of the following, depending on meaning.

ASCII Double Quote (")

This is the character that you type on a standard (US layout) keyboard with the shifted key beside the semicolon. It should never be used in proper typography, but is often used because it’s easy to type and well supported. It should always be superseded by one of the following, depending on meaning.

Apostrophes

…XHTML…

The curly apostrophe (&rsquo;) (&#8217;) (Option Shift ]) is the preferred character to use as an apostrophe, as in I’m coming, or He’s with me.

Quotation marks

…XHTML…

I’m going to make life easy on you here (well, mostly). There are actually fourteen quotation characters. (Eighteen if you count the big, bold versions in the Dingbats section of Unicode.) I’m going to pretend that most of them don’t exist—you’ll only need them for foreign languages anyway.

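Methods for correctly inserting curly quotes in web pages are not well understood. Do not, under any circumstances, use &#147; through &#149; for curly quotes; those numbers come from the Windows-1252 code page, and in Unicode they are control characters.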

Don’t ever trust the 8-bit representations to be correct, because they almost certainly won’t be. The biggest problem is that many web browsers assume that 8-bit characters refer to the local character system, translating your curly quotes or dashes into Greek or accented Latin characters on other platforms. These same browsers will always display the entity references correctly.
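
To see why a raw 8-bit value is untrustworthy, consider this small Python sketch. The byte value and the two code pages are examples I picked: the same byte is read as an opening double quote under Windows-1252 and as a different character under the classic Mac OS Roman set, while the entity reference resolves the same way everywhere.

```python
import html

raw_byte = b"\x93"  # one 8-bit value, as a Windows editor might save it

as_windows = raw_byte.decode("cp1252")   # Windows-1252 reading
as_mac = raw_byte.decode("mac_roman")    # classic Mac OS Roman reading

# The entity reference names the Unicode character itself, so it does
# not depend on the reader's local character set.
from_entity = html.unescape("&ldquo;")

print(repr(as_windows), repr(as_mac), repr(from_entity))
```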

Don’t ever try to ``fake it´´ with doubled-up grave accents and straight single quotes or acute accents, as most of the ``best-known newspapers" do.

Opening Single Quote (‘)

An opening single quote (&lsquo;) (&#8216;) (Option ]) should be used to start a quotation that’s delimited with single quotes, for example, ‘Over here!’.

Closing Single Quote (’)

A closing single quote (&rsquo;) (&#8217;) (Option Shift ]) should be used to end a quotation that’s delimited with single quotes, for example, ‘Over here!’.

Opening Double Quote (“)

An opening double quote (&ldquo;) (&#8220;) (Option [) should be used to start a quotation that’s delimited with double quotes, for example, “Over here!”.

Closing Double Quote (”)

A closing double quote (&rdquo;) (&#8221;) (Option Shift [) should be used to end a quotation that’s delimited with double quotes, for example, “Over here!”.

Single Left-pointing Angle Quotation Mark (‹)

Single Right-pointing Angle Quotation Mark (›)

For a single left-pointing angle quotation mark, use (&lsaquo;) (&#8249;); for a single right-pointing angle quotation mark, use (&rsaquo;) (&#8250;).

Left-pointing Double Angle Quotation Mark or Left Pointing Guillemet («)

Right-pointing Double Angle Quotation Mark or Right Pointing Guillemet (»)

For a left-pointing double angle quotation mark or left pointing guillemet, use (&laquo;) (&#171;); for a right-pointing double angle quotation mark or right pointing guillemet, use (&raquo;) (&#187;).

Hyphens

…Avoid too many hyphenations in any paragraph…

Hyphens are NOT Dashes!

Stop! Go back and re-read the subhead above—at least 2–3 times—then let it sink in before continuing.

The sentence above illustrates the proper use of the hyphen and the two main types of dashes. They are not the same, and must not be confused with each other. In some fancy fonts the difference is more than just their width—hyphens have a distinct serif. If you don’t already know the rules, let’s review them.

Hyphen (-) and Minus (−)

…Never have more than two hyphenations in a row…

The hyphen, inserted with the key next to the zero on your keyboard, is an ambiguous character suffering from an identity crisis. It can’t decide if it’s a hyphen, a minus, or an en dash—in fact, the Unicode specification describes it as “hyphen-minus” and defines very specific replacements for each of its personalities.

Use it if you need to insert a hyphen, but never for a minus (&minus;) (&#8722;) or any dash, since it has neither the correct width nor the correct vertical position for any of these. Compare these uses of a hyphen where a minus is called for…

with your proportional serif font…

1 +4 -2= 3 …hyphen is not correct
1 +4 −2= 3 …minus sign is correct

with your proportional sans serif font…

1 +4 -2= 3 …hyphen is not correct
1 +4 −2= 3 …minus sign is correct

with your monospace font…

1 +4 -2= 3 …hyphen is not correct
1 +4 −2= 3 …minus sign is correct
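
The two characters really are distinct code points, as the Python sketch below shows with the standard `unicodedata` module. The `fix_minus` helper is a hypothetical name of mine, and its rule is deliberately narrow: it swaps a hyphen-minus for a minus sign only when the character follows white space (or starts the string) and directly precedes a digit, so ordinary hyphenated words are left alone.

```python
import re
import unicodedata

HYPHEN_MINUS = "-"      # U+002D, the keyboard character
MINUS_SIGN = "\u2212"   # U+2212, &minus; / &#8722;

def fix_minus(expression: str) -> str:
    """Swap hyphen-minus for a minus sign before a digit, but only after
    white space or at the start of the string."""
    return re.sub(r"(?<!\S)-(?=\d)", MINUS_SIGN, expression)

print(unicodedata.name(HYPHEN_MINUS))
print(unicodedata.name(MINUS_SIGN))
print(fix_minus("1 +4 -2= 3"))
```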

Soft Hyphen (&shy;)

The soft hyphen (&shy;) (&#173;), also known as the “discretionary hyphen” or “optional hyphen”, is to be used for one purpose only—to indicate where a word may be broken at the end of a line. Otherwise, it is to remain invisible and not affect the appearance of the word.

Some browsers display it no matter where it falls, but this is not the correct behavior. Others in the past have recommended against its use because its behavior was not well-defined, but the HTML 4.01 spec makes its use and behavior clear and unambiguous.
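
Since a soft hyphen is invisible, it is easy to lose track of where you have put one. The Python sketch below, with helper names and syllable breaks of my own choosing, joins hand-picked syllables with soft hyphens and strips them out again, which is handy before searching or spell-checking.

```python
SOFT_HYPHEN = "\u00ad"  # &shy; / &#173;

def add_soft_hyphens(syllables):
    """Join hand-chosen syllables with soft hyphens (break opportunities)."""
    return SOFT_HYPHEN.join(syllables)

def strip_soft_hyphens(text: str) -> str:
    """Remove soft hyphens, e.g. before searching or comparing text."""
    return text.replace(SOFT_HYPHEN, "")

word = add_soft_hyphens(["ty", "pog", "ra", "phy"])
print(len(word), strip_soft_hyphens(word))
```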

Dashes

…Avoid too many hyphenations in any paragraph…

Em Dash (—)

The em dash (&mdash;) (&#8212;) (Option Shift Hyphen) is used to indicate a sudden break in thought (“I was thinking about writing a—what time did you say the movie started?”), a parenthetical statement that deserves more attention than parentheses indicate, or instead of a colon or semicolon to link clauses. It is also used to indicate an open range, such as from a given date with no end yet (as in “Richard Plourde [1947—] authored this document.”), or vague dates (as a stand-in for the last two digits of a four-digit year).

Two adjacent em dashes (——) are used to indicate missing letters in a word (“I just don’t f——ing care about 3.0 browsers”).

Three adjacent em dashes (———) are used to substitute for the author’s name when a repeated series of works are presented in a bibliography, as well as to indicate an entire missing word in the text.

En Dash (–)

The en dash (&ndash;) (&#8211;) (Option Hyphen) is used to indicate a range of just about anything with numbers, including dates, game scores, and pages in any sort of document.
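
Numeric ranges typed with a hyphen are easy to find mechanically. Here is a hedged Python sketch (the function name is mine) that replaces a hyphen-minus sitting directly between two digits with an en dash. Be warned that it will also catch things that are not ranges, such as ISO dates and telephone numbers, so review what it changes.

```python
import re

EN_DASH = "\u2013"  # &ndash; / &#8211;

def en_dash_ranges(text: str) -> str:
    """Replace a hyphen-minus directly between two digits with an en dash."""
    return re.sub(r"(?<=\d)-(?=\d)", EN_DASH, text)

print(en_dash_ranges("pages 12-15, 1994-2006, score 3-2"))
```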

The en dash is also used instead of the word “to” or a hyphen to indicate a connection between things, including geographic references (like the Mason–Dixon Line) and routes (such as the New York–Boston commuter train).

The en dash is used to hyphenate compounds of compounds, where at least one pair is already hyphenated (as in “Netscape 6.1 is an Open-Source–based browser.”). The Chicago Manual of Style also states that it should be used “Where one of the components of a compound adjective contains more than one word,” instead of a hyphen (as in “Netscape 6.1 is an Open Source–based browser”). Both of these rules are for clarity in indicating exactly what is being modified by the compound.

Other sources also specify the use of an en dash when referring to joint authors, as in the “Bose–Einstein” paper. Some also prefer it to a hyphen when text is set in all capital letters.

Some typographers prefer to use an en dash surrounded by full spaces instead of an em dash. Others prefer to insert hair spaces on either side of the em dash, but this is problematic with some web browsers (see the section on spaces for more detail).

Though some of the finer points in the rules are complex, their basic applications are clear-cut and their misuse easily identifiable. First, neither an em dash (—) nor an en dash (–) should be confused with the hyphen (-), which is used to join compound words together.

Languages

…XHTML…

Special characters

…XHTML…

Horizontal ellipsis (…)

Here are some fine points on the use of the horizontal ellipsis or three-dot leader (&hellip;) (&#8230;) (Option ;):

  1. An ellipsis is most often used to indicate one or more missing words in a quotation. It is also used to indicate when a thought or quotation trails off.
  2. When it occurs at the end of a sentence, it should be treated in one of three ways, depending on usage:
    1. If the ellipsis is being used to indicate one or more missing words in the sentence, then it should be followed by a period.
    2. If it indicates one or more missing sentences, then it should appear after the period of the preceding sentence, and with a space on either side.
    3. But if it indicates that the thought or quote is just trailing off at the end of a sentence, then only the ellipsis is used, to clarify that no words from a quotation were omitted, as would be the case if the additional period were there.
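
Typed text usually fakes the ellipsis with three periods. As a sketch only, the Python function below (the name is mine) converts a run of exactly three periods to the real character. Runs of four or more are deliberately left alone, because deciding whether the sentence period belongs before or after the ellipsis depends on the three cases above and needs a human decision.

```python
import re

ELLIPSIS = "\u2026"  # &hellip; / &#8230;

def fix_ellipses(text: str) -> str:
    """Replace each run of exactly three periods with an ellipsis."""
    return re.sub(r"(?<!\.)\.{3}(?!\.)", ELLIPSIS, text)

print(fix_ellipses("and so on..."))
```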

Single Prime (′) and Double Prime (″)

Many people (most, from what I’ve observed) believe that curly single opening and double opening quotes are the correct symbols for feet and inches. If you are one of these people, put out your hand so I can slap it with a ruler.

The correct symbols to use are prime and double prime. They look similar to curly quotes in a few fonts, but are usually much more distinct. They never, ever look like commas. They are usually set at a slight angle of 75°–80°, and are also usually tapered from the top to the bottom.

Single Prime (′)

A single prime (&prime;) (&#8242;) is used in mathematics, as in

x +y= x′ +y′= x″ +y″.

It’s also used as the symbol for feet, as in I am 6′ tall. Also, it’s used to represent minutes as in 36° 15′ 32″.

Double Prime (″)

A double prime (&Prime;) (&#8243;) is used to indicate inches or seconds. A doubled version of the prime symbol is also used in mathematics, as in

x +y= x′ +y′= x″ +y″.

It is also used to indicate inches, as in The table is 5′ 6″ long. It’s used to represent seconds as well, as in 36° 15′ 32″. (I won’t discuss the triple prime and the three reversed versions of these characters.)
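
If your measurements come out of a program, it is just as easy to emit true primes as straight quotes. The following Python sketch uses two helper functions I invented for the purpose; they format a length and an angle in the style of the examples above.

```python
PRIME = "\u2032"         # &prime; / &#8242;  feet, arc minutes
DOUBLE_PRIME = "\u2033"  # &Prime; / &#8243;  inches, arc seconds

def feet_inches(total_inches: int) -> str:
    """Format a length given in inches as feet and inches with primes."""
    feet, inches = divmod(total_inches, 12)
    return f"{feet}{PRIME} {inches}{DOUBLE_PRIME}"

def angle(degrees: int, minutes: int, seconds: int) -> str:
    """Format an angle as degrees, minutes, and seconds."""
    return f"{degrees}\u00b0 {minutes}{PRIME} {seconds}{DOUBLE_PRIME}"

print(feet_inches(66))
print(angle(36, 15, 32))
```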

Dagger (†)

For a dagger (&dagger;) (&#8224;)

Double Dagger (‡)

For a double dagger (&Dagger;) (&#8225;)

Per Mille Sign (‰)

For a per mille sign (&permil;) (&#8240;)

Euro Sign (€)

For a euro sign (&euro;) (&#8364;)

Bullets (Option 8)

…Use some sort of bullet when listing items, not a hyphen.…

ﬁ (Option Shift 5)

ﬂ (Option Shift 6)

© (Option g)

™ (Option 2)

® (Option r)

° (Option Shift 8)

¢ (Option 4)

⁄ (Option Shift 1)

¡ (Option 1)

¿ (Option Shift /)

£ (Option 3)

ç (Option c)

Ç (Option Shift c)


Numbers versus numerals

…XHTML…

There are two styles of Arabic numerals in type, lining figures and old style figures. Lining figures—also called ranging or modern figures—are the same height as the capital letters in the font. Old style figures—also called old face, non-lining or non-ranging figures—align on the x-height of the lowercase alphabet and have ascenders and descenders. I would expect that the numerals in your proportional serif font below are old style figures and that the numerals in the other two fonts are lining figures. Old style figures lend themselves to better use in text because they are less obtrusive. Lining figures are neater and are better suited to tabular and other technical uses.

with your proportional serif font…

XYZ1234567890

with your proportional sans serif font…

XYZ1234567890

with your monospace font…

XYZ1234567890

Superscripts and subscripts

…XHTML…

Fractions

…XHTML…

Underlining

…XHTML…

Layout

Line… font… text…
Line… font… text…
/* CSS/(X)HTML element selector {property: value;} rules */

html
{
  margin: 0;
  border: 0;
  padding: 0;
  background: repeat url(lines.png) #ffffee;
  color: #000000;
  font-family: Georgia, serif;
  font-size: 100%;
  font-weight: normal;
}

Typefaces

Most books, periodicals, newsletters, technical documents, etc. are best presented in proportional type. Use serif type for the body text unless you are going to compensate for the lower readability. Never combine two serif fonts on one page. Never combine two sans serif fonts on one page. Never combine more than two typefaces on one page (unless you’ve studied typography). So, to summarize, if you’re going to use more than one face, use one serif and one sans serif. The exception to this rule is when you need to use a monospace font for columnar data, printouts, or source code listings.

Display fonts

A serif font will provide greater readability especially when displayed in a font designed for the quirkiness of computer displays.

Printer fonts

Most fonts that are optimized for display will render very well on the printed page. Some fonts, Times New Roman for example, render exceptionally well on the printed page but poorly on a display screen.

Choosing typefaces

Do not, on any account, put the generic sans-serif or serif at the end of your CSS font list in such a situation. I will follow this advice as well. I had thought it imperative that they be included. Not so: including them is often recommended for typographical reasons, but the two criteria pull in opposite directions in this instance. You can have either better typography (with missing glyphs) or a better character repertoire (with somewhat clunky typography).

The choice of typefaces should remain consistent throughout the web site, if at all possible. When using bold, italic or bold italic fonts, do so sparingly. Never use the <b> or <i> elements. They are styling, which should not be included in the content mark-up; styling should always be kept in the stylesheets (more on this later). Always use <em> and </em> to mark places where emphasis is required, and let the browser (user agent) choose the proper form of emphasis for its environment. Also include the space before an emphasized word inside the mark-up.

Em and En

em
a unit of measurement defined as the point size of the font—12 point type uses a 12 point “em”
en
one-half of an “em”
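
These relative units are simple arithmetic on the font size. In the Python sketch below, the function name and the dictionary layout are my own; the thin space figure uses the one-fifth of an em mentioned in the section on spaces.

```python
def relative_units(font_size_pt: float) -> dict:
    """Return the em, en, and thin space widths, in points, for a font size."""
    return {
        "em": font_size_pt,        # the em equals the point size
        "en": font_size_pt / 2,    # one-half of an em
        "thin": font_size_pt / 5,  # one-fifth of an em
    }

print(relative_units(12))
```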

I'm using the three fonts (serif, sans serif, and monospace) thus…

Serif

Serif fonts should be used for the main body prose such as headings, lists, table data, term descriptions, and normal text. The font Georgia works very well on both Mac OS X and Windows systems, and there are no known tradeoffs provided its uses are limited to those specified above. Georgia should be used throughout all documents requiring a serif font. The stylesheet should call for the font thus…

For display font use…
{font: medium Georgia, serif;}
For print font use…
{font: 10.5pt "Georgia", serif;} /* small pica */

Georgia is a very pleasant font for use on both the display screen and the printed page. It includes regular, bold, italic, and bold italic. Georgia’s tradeoff is that IE won’t use it for the Mathematical Operators, so use sans serif for them… actually, I believe math looks better in sans serif.

But that's not the point! The font does not claim to support them, so IE will look for some better-populated font.

Font properties extension = charset/unicode reports, for this version of Georgia at least (that's Win/2000 Pro, font version 2.05) that it supports only:

Sans Serif

Sans serif fonts should be used for mathematical expressions such as equations, variables, constants, and some source code listings. Sans serif fonts are also used for menus, table labels, and description terms. The font Lucida Grande works very well on Mac OS X systems and the font Lucida Sans Unicode works very well on Windows systems. These fonts should be used throughout all documents requiring a sans serif font. The stylesheet should call for the font thus…

For display font use…
{font: medium "Lucida Grande", "Lucida Sans Unicode", sans-serif;}
For print font use…
{font: 10.5pt "Lucida Grande", "Lucida Sans Unicode", sans-serif;} /* small pica */

The tradeoffs are that Lucida Grande has no italics, and that Lucida Sans Unicode has a section of ugly characters around the ≥ and ≤ characters and has no bold or italics.

By contrast, Lucida Sans Unicode supports many extra ranges, including IPA Extensions, Combining Diacritical Marks, etc., and in particular Mathematical Operators.

Monospace

Monospace typefaces should be used for columnar presentations including numerical tables, printouts, and some source code listings—whether wrap or no wrap. The font Courier New works very well on both Mac OS X and Windows systems, and there are no known tradeoffs provided its uses are limited to those specified above. Courier New should be used throughout all documents requiring a monospace font. The stylesheet should call for the font thus…

For display font use…
{font: medium "Courier New", monospace;}
For print font use…
{font: 10.5pt "Courier New", monospace;} /* small pica */

Kerning

…XHTML…

White space

…Encourage whitespace.…

Cramping text

…Don't crowd text inside a box—let it breathe. Leave no widows or orphans…
…Never justify the text on a short line. Hang the punctuation off the aligned edge.…

Indents and Tabs

…Use a one-em first-line indent on all indented paragraphs…

Numerical column alignment

…Never use the spacebar to align text. Use a decimal or right-aligned tab for the numbers in a numbered paragraph…

Leading or linespace

…Keep the line spacing consistent…

Tighten up the leading in lines with all caps or with few ascenders. Adjust the spacing between paragraphs; rarely use a full line of space between paragraphs in body text. Either indent the first line of paragraphs or add extra space between them—not both.

Baseline alignment

…Align the first baselines of juxtaposed columns.…

Bibliography

  1. William Strunk: Elements of Style
  2. W3C HTML 4 & XHTML 1 entity definitions
  3. Unicode Consortium
  4. International System of Units
  5. NASA’s A Handbook for Technical Writers and Editors is of great help even if you don’t write about technical subjects.
  6. Jukka Korpela provides a great amount of detail on specific characters as part of a larger series on characters, and a buttload of additional web authoring–related information.
  7. Got a detailed question about which characters allow line breaks to occur? An update to the Unicode specification has all the answers.
 Copyright © 1994–2006  Richard R Plourde 
 Some rights reserved 
 SeaPlusPlus.net  xhtml  css — 2006.05.31