Em and En Dashes in Movable Type

Several users of the MTCurly plugin have written in or commented asking about having the module catch em dashes typed as two hyphens. The PHP function that parses the text on this site already runs this rule. It’s a pretty trivial task, but while I was thinking about it the issue of en dashes bothered me, so I decided to reread one of my favorite ALA articles, A List Apart: The Trouble With EM ‘n EN.

While the majority of people will just want the em dash, I think having en dashes generated automatically as well would be very convenient and would encourage more widespread adoption of them. But how do you notate an en dash using only the characters on the keyboard? I thought about this and came to the conclusion that there should be two optional add-ons to the code: one that changes two hyphens into an em dash, and one that changes two hyphens into an en dash and three into an em dash. Get the updated code. I’m curious to hear some thoughts on this.
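To make the two options concrete, here is a minimal sketch of the substitution rules in Python. It is not the plugin's actual code (nor the PHP function running on this site); the function name `convert_dashes` and the `en_dash_mode` flag are made up for illustration, and I'm assuming that runs of four or more hyphens should be left untouched.

```python
import re

EM_DASH = "\u2014"  # em dash
EN_DASH = "\u2013"  # en dash

# Match a run of exactly three (or exactly two) hyphens; the lookarounds
# reject runs that are part of a longer string of hyphens.
_THREE_HYPHENS = re.compile(r"(?<!-)---(?!-)")
_TWO_HYPHENS = re.compile(r"(?<!-)--(?!-)")


def convert_dashes(text, en_dash_mode=False):
    """Replace hyphen runs with dashes (hypothetical helper, not the plugin API).

    en_dash_mode=False: two hyphens become an em dash.
    en_dash_mode=True:  three hyphens become an em dash,
                        two hyphens become an en dash.
    """
    if en_dash_mode:
        text = _THREE_HYPHENS.sub(EM_DASH, text)
        return _TWO_HYPHENS.sub(EN_DASH, text)
    return _TWO_HYPHENS.sub(EM_DASH, text)


if __name__ == "__main__":
    print(convert_dashes("pages 10--20", en_dash_mode=True))
    print(convert_dashes("wait---what?", en_dash_mode=True))
    print(convert_dashes("wait--what?"))
```

One caveat with a sketch this simple: it knows nothing about markup, so the double hyphens in an HTML comment would be converted too. A real text filter would need to skip tags and comments before applying these rules.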

For those who aren’t familiar with the proper usage of em and en dashes, here’s a quote from the ALA article by Peter K. Sheerin, which I think is the best summary of the matter on the net.

The em dash (—) is used to indicate a sudden break in thought (“I was thinking about writing a—what time did you say the movie started?”), a parenthetical statement that deserves more attention than parentheses indicate, or instead of a colon or semicolon to link clauses. It is also used to indicate an open range, such as from a given date with no end yet (as in “Peter Sheerin [1969—] authored this document.”), or vague dates (as a stand-in for the last two digits of a four-digit year).

Two adjacent em dashes (a 2-em dash) are used to indicate missing letters in a word (“I just don’t f——ing care about 3.0 browsers”).

Three adjacent em dashes (a 3-em dash) are used to substitute for the author’s name when a repeated series of works are presented in a bibliography, as well as to indicate an entire missing word in the text.

The en dash (–) is used to indicate a range of just about anything with numbers, including dates, numbers, game scores, and pages in any sort of document.

It is also used instead of the word “to” or a hyphen to indicate a connection between things, including geographic references (like the Mason–Dixon Line) and routes (such as the New York–Boston commuter train).

It is used to hyphenate compounds of compounds, where at least one pair is already hyphenated (as in “Netscape 6.1 is an Open-Source–based browser.”). The Chicago Manual of Style also states that it should be used “Where one of the components of a compound adjective contains more than one word,” instead of a hyphen (as in “Netscape 6.1 is an Open Source–based browser”). Both of these rules are for clarity in indicating exactly what is being modified by the compound.

Other sources also specify the use of an en dash when referring to joint authors, as in the “Bose–Einstein” paper. Some also prefer it to a hyphen when text is set in all capital letters.

While quoting the above article, I noticed what may be an error in the source: the paragraphs about en dashes appear to actually use em dashes. I’ve corrected it in the quote above. Typos happen; I’ll drop Zeldman an email. (Time to break out the spel chequer.)

I’m also considering adding a few other things to the next version of the plugin, so if you have anything you’d like to see in there, let me know.