International URIs

Internationalized URI support in WordPress. Does your weblog tool do that? (This means you can have characters like جحخدذ in your URIs.)

11 replies on “International URIs”

Now that’s pretty darned sweet. Page-level character encoding is reliable enough by now, but how is client-side support for URIs? Any browser-specific issues popping up?

Oh, and who’s running the standards for URI-level i18n? Is there an IETF paper or two on the octet/unicode translation you’re doing, or is it adapted from another system? (I’m completely uninformed about this stuff.)

Perhaps I’m misunderstanding what the fix does, but what we really need for Western European languages is something which maps accented characters such as éèçàô to their non-accented ASCII equivalents.

For example, I have a category on a French-language site powered by WordPress with the name “événements” (events). This is transformed into “vnements” in the URL – far from ideal. What would be good is that “é” can be transformed into “e”, giving “evenements”.

Is that possible yet with WordPress?
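For illustration, the mapping asked about here (“é” to “e”) can be sketched with Unicode decomposition. This is a minimal Python sketch, not WordPress code; the function name `strip_accents` is hypothetical, and it only handles characters that decompose into a base letter plus combining marks.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Hypothetical helper: remove combining accent marks from text."""
    # NFKD splits a precomposed letter such as "é" into "e" followed by
    # a combining acute accent (U+0301).
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep the base letters and drop every combining mark.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("événements"))  # evenements
```

Letters with no decomposition (for example “ø” or “ß”) pass through unchanged, so a real slug generator would need an explicit mapping table for those.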

Richard, why would you want that? The word wouldn’t mean the same. You really want IRIs. Escaped URIs seem like a nice solution for the time being, especially since the ones WordPress generates are IRI compatible if I’m not mistaken.

If you really want to use unaccented versions you could use the slug of course.
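As a rough sketch of what “escaped” means here: the non-ASCII characters are encoded as UTF-8 and each resulting octet is written as `%XX`. The Python below uses the standard library’s `urllib.parse`; the function name `escape_segment` is hypothetical, and this shows the general technique rather than what WordPress itself does.

```python
from urllib.parse import quote, unquote


def escape_segment(segment: str) -> str:
    """Hypothetical helper: percent-encode one path segment as UTF-8."""
    # safe="" means "/" is escaped too, since this is a single segment.
    return quote(segment, safe="", encoding="utf-8")


escaped = escape_segment("événements")
print(escaped)           # %C3%A9v%C3%A9nements
print(unquote(escaped))  # événements
```

The round trip is lossless, which is why the escaped form can stand in for the unescaped word without changing its meaning.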

Anne – it may be an “evil” reason, but it’s primarily for SEO, and partly for usability (which certainly isn’t evil!).

If you take my example of “événements”, and Google for both the accented and non-accented versions, the number of results varies slightly, and the order too – but if you check the bold highlighting of terms within the search results, both versions are highlighted, irrespective of accents. Will the escaped URI work for both? What happens when the searcher leaves off one of the accents, or both, or uses the wrong accent (not everyone is a perfect speller, and many don’t use the accents in searches)?

Then comes the question of readability and usability: try reading the following URLs over the phone:

Which do you prefer?

SEO doesn’t matter. Google is capable of handling such URIs perfectly. However, like I mentioned above, you actually want IRIs. Those are not escaped and don’t need to be escaped. In such cases you can just use whichever word you want without having to make a US-ASCII friendly version of it.

Richard, transliteration of accented characters is the current behaviour of WP 1.3 but that’s pretty useless for people who blog in characters that really don’t have roman “equivalents.”
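To make that limitation concrete, here is a small Python sketch of a slug builder that transliterates when it can and falls back to percent-encoding when nothing ASCII survives. The name `make_slug` and the fallback rule are assumptions for illustration only, not the WordPress implementation.

```python
import unicodedata
from urllib.parse import quote


def make_slug(title: str) -> str:
    """Hypothetical slug builder: transliterate if possible, else escape."""
    # Decompose accented letters, then discard everything outside ASCII.
    decomposed = unicodedata.normalize("NFKD", title)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    if ascii_only:
        return ascii_only.lower()
    # Nothing survived (e.g. an all-Arabic title), so keep the original
    # characters and percent-encode their UTF-8 octets instead.
    return quote(title, safe="")


print(make_slug("Événements"))  # evenements
print(make_slug("جحخدذ"))       # %D8%AC%D8%AD%D8%AE%D8%AF%D8%B0
```

A title that mixes scripts would still lose its non-Latin part under this rule, which is one more argument for keeping the original characters in the URI.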