This post is a blogified version of a lightning talk I gave at BarCamp London 5. It was inspired by Chris Ball's Favourite Unicode Codepoints post. It's going to be in a weird talk/blogpost hybrid form that I hope my readers will excuse.
First, I want to say that this talk is not going to convey any useful information whatsoever. You won't learn anything about internationalization, or anything else, from it. I'm doing it just because it's going to be fun and awesome.
First, the famous mirror trick, where text can be shown upside down or mirrored left to right. None of it uses real Unicode characters like "mirrored e" or "upside-down a". It's just a bunch of characters that happen to look like that; for example, an upside-down "p" (as in pet) is obviously "d" (as in dog). If there's no good Latin letter, a letter from another script like Cyrillic, or from the IPA phonetic alphabet, is used. It will be more or less noticeable depending on your font.
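A minimal Python sketch of how such flip-text generators can work, under my own assumptions: reversing the string takes care of the left-right swap that a 180-degree rotation implies, and a lookup table swaps each letter for a look-alike. `FLIP` and `upside_down` are hypothetical names made up for this example, and the table is a small sample rather than any real tool's mapping; it uses Latin letters where they fit (p and d, b and q, n and u) and IPA "turned" letters otherwise.

```python
# Hypothetical look-alike table: each key maps to a character that resembles
# the key rotated by 180 degrees. Plain Latin letters are used where one fits;
# IPA "turned" letters fill the gaps.
FLIP = {
    "a": "\u0250",  # LATIN SMALL LETTER TURNED A
    "b": "q",
    "d": "p",
    "e": "\u01dd",  # LATIN SMALL LETTER TURNED E
    "h": "\u0265",  # LATIN SMALL LETTER TURNED H
    "m": "\u026f",  # LATIN SMALL LETTER TURNED M
    "n": "u",
    "p": "d",
    "q": "b",
    "r": "\u0279",  # LATIN SMALL LETTER TURNED R
    "t": "\u0287",  # LATIN SMALL LETTER TURNED T
    "u": "n",
    "w": "\u028d",  # LATIN SMALL LETTER TURNED W
}
# Add the reverse direction so flipping twice restores the original text.
FLIP.update({v: k for k, v in list(FLIP.items())})


def upside_down(text: str) -> str:
    """Return text that reads as if rotated 180 degrees.

    Reversing the string handles the left/right swap; the table handles
    each glyph. Characters without an entry are passed through unchanged.
    """
    return "".join(FLIP.get(ch, ch) for ch in reversed(text))


print(upside_down("pet"))
print(upside_down("unicode"))
```

Letters with no entry (like "o", "c" or "i") are passed through as-is, and those are where a real flip-text generator has to hunt for the best look-alike it can find, which is why the result looks better or worse depending on the font.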

Here's a real Unicode character - Skull and Crossbones, arrr! It's used as a danger sign, so it's arguably common enough for inclusion in Unicode.

This one I totally don't get. It's just a random icon that somehow got into Unicode. Unicode is huge, so they have very low standards for inclusion. Maybe it was in Microsoft Wingdings or something like that, and they thought that was a good enough reason to include it.

I half-get this one. The top three lines are the Japanese postal mark. Where the rest of the face comes from, and how it got into Unicode, is a mystery to me. It was probably included in some JIS standard as a joke and Unicode copied it, or something along those lines.
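Unicode's usual justification for legacy symbols like this is round-trip compatibility: text converted from an old encoding into Unicode and back should come out unchanged. A minimal Python sketch of that check, using the standard library's `shift_jis` codec (a JIS-family encoding) and `cp037` (an EBCDIC variant); the `round_trips` helper and the sample strings are my own illustrative choices. I'm assuming the face pictured here is U+3020 POSTAL MARK FACE, and I make no claim about which legacy set it came from, so the plain U+3012 POSTAL MARK is used for the Shift_JIS test instead.

```python
import unicodedata


def round_trips(text: str, codec: str) -> bool:
    """True if text -> legacy bytes -> text -> legacy bytes loses nothing."""
    legacy = text.encode(codec)   # Unicode -> legacy encoding
    back = legacy.decode(codec)   # legacy encoding -> Unicode
    return back == text and back.encode(codec) == legacy


# The plain Japanese postal mark, which Shift_JIS can represent.
postal = unicodedata.lookup("POSTAL MARK")
print(f"U+{ord(postal):04X}", round_trips(postal, "shift_jis"))

# The same idea with EBCDIC: the bytes differ from ASCII, but nothing is lost.
print(round_trips("Hello, BarCamp!", "cp037"), "A".encode("cp037"))

# The face variant has its own code point and name in Unicode.
print(unicodedata.name("\u3020"))  # POSTAL MARK FACE
```

The same check is what forces Unicode to keep oddball characters: if a legacy set has a code for something, dropping it from Unicode would make that text impossible to convert back.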

Operators from the APL programming language got into Unicode too. APL is like the Perl of the 1960s. This operator doesn't feel too good because it has to program in APL.

It's called Arabic ligature Uighur Kirghiz yeh with hamza above with alef maksura isolated form, and it's exactly what it says it is. It looks rather ordinary for this list, but it might be the character with the longest name.
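If you'd like to check the claim, a brute-force Python sketch can scan every code point in the Unicode database that ships with your Python and sort by name length. `longest_names` is a hypothetical helper written for this example. The answer depends on `unicodedata.unidata_version`, so a newer Unicode release may have added something even longer since this post was written.

```python
import sys
import unicodedata


def longest_names(limit: int = 3):
    """Return (length, code point, name) for the `limit` longest names."""
    named = []
    for cp in range(sys.maxunicode + 1):
        name = unicodedata.name(chr(cp), "")  # "" for unnamed code points
        if name:
            named.append((len(name), cp, name))
    named.sort(reverse=True)  # longest names first
    return named[:limit]


target = ("ARABIC LIGATURE UIGHUR KIRGHIZ YEH WITH HAMZA ABOVE "
          "WITH ALEF MAKSURA ISOLATED FORM")
ch = unicodedata.lookup(target)  # raises KeyError if the name doesn't exist
print(unicodedata.unidata_version, f"U+{ord(ch):04X}", len(target))

for length, cp, name in longest_names():
    print(length, f"U+{cp:04X}", name)
```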

Another Arabic one. Most ligatures are for just 2 or 3 characters, but the compatibility decomposition of this one is a whopping 18 characters. It means something like "May Allah bless him and grant him peace" and is used when the Prophet Muhammad is mentioned. By the way, I had a really funny picture of Muhammad that I wanted to put here, but I somehow cannot find it.
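Python's `unicodedata` module can show the decomposition directly. A minimal sketch, looking the ligature up by its Unicode name: the raw mapping is tagged `<isolated>`, which marks it as a compatibility mapping, so NFD leaves the ligature as a single code point while NFKD expands it into 18 code points (four words and three spaces).

```python
import unicodedata

# Look the ligature up by name rather than hard-coding its code point.
lig = unicodedata.lookup("ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM")

# Raw decomposition mapping from the Unicode Character Database:
# a formatting tag followed by hex code points.
mapping = unicodedata.decomposition(lig)
print(f"U+{ord(lig):04X}", mapping)

nfd = unicodedata.normalize("NFD", lig)    # canonical decomposition only
nfkd = unicodedata.normalize("NFKD", lig)  # canonical + compatibility
print(len(nfd), len(nfkd))
print(nfkd)
```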

How many loops are there?

This letter is very spidery so better be careful or it will bite you.

Sometimes it's not enough to be greater than, or even much greater than something else. Oh no, you need to be very much greater than. I think TeX is spoiling mathematicians and they come up with way too many symbols, and then we have to support them.

A polar opposite of the previous character: it's neither greater than nor less than. We kinda have a symbol for that already - U+003D EQUALS SIGN. OK, I know it's about partial orders, and it means that two objects cannot be compared, but it's no less funny for knowing that.

This is a very sad symbol. Not only is its heart heavy, it's also black. Is it a waste of a codepoint or what? It's just a random icon, not a meaningful "character".

That's my personal favorite for the "worst waste of a codepoint" award. Not only is "Floral Heart Bullet" not a character, they even included a reversed rotated version of it in Unicode. It's an icon, not a character.

We really need a punctuation mark that says "WTF". This entire list is one big interrobang use case, am I right?

The last one is not a character, but the entire Tibetan script. It looks absolutely beautiful.
If you have any questions related to this talk/blogpost, just put them in comments.
18 comments:
Apparently the snowman's from a legacy character set. Which one, I don't know.
Also, check out the Arabic Letter Teh (U+062A). Something shocking happens and it turns into Arabic Letter Teh Marbuta (U+0629). Maybe there is a close call in a soccer game, because Arabic Letter Teh Marbuta GOAAAAAL!!! (U+06C3) looks quite similar.
Finally, Arabic Letter Teh With Ring (U+067C) has various uses, even if you don't read and write Pashto.
nice post
Your blog is very nice
The floral heart bullet is actually called "the Aldus leaf." It's one of the oldest known ornaments used in printing. It has great historical and iconic value and is by no means a waste. In fact, the Aldus leaf can be seen as a symbol for the art of printing itself.
thats kinda cool actually :)
FAIL.
One of the design requirements of Unicode is that it be "round-trip compatible" with every crappy legacy encoding ever used seriously.
What that means is that you can take some knee-biting horrible encoding like EBCDIC (take your pick of the variant) and you can take text in that encoding, translate it into Unicode, then translate it *back* into EBCDIC and there will be enough information to reproduce the *exact same* EBCDIC code-points.
To do that, Unicode must include every silly, stupid character ever used in every obscure, local encoding out there. It's somewhat unfortunate, yes, but the alternative is to break round-trip compatibility.
The snowman and the monkey both come from the land of hello-kitty. Which one of Japan's several incompatible legacy encodings, I'm not sure. Probably the JIS family.
The funky changes mentioned by an earlier commenter regarding the Arabic Tah and Tah Marbuta are actually just a functional representation of an Arabic oddity. When a "Tah Marbuta" is placed at the end of a word, it generally functions as an "Ah" sound. However, if a possessive or other conjunction is added onto the end of the word, that "Tah Marbuta" turns into a normal "Tah" and assumes the normal "Tah" sound and functions. Oh how I love the Arabic.
This was awesome.
I like that the skull has teeth missing.
Thanks for an excellent read.
Just wanted to point out that APL is still in daily use and not some dead programming language from the 1960s. The inclusion of the APL character set in Unicode is really necessary for APL programmers - it's our ordinary working alphabet.
Anonymous: APL is a dead language from the 1960s. There are still a few leftover systems written in APL, just like there are still some vacuum tubes, horse buggies, and typewriters in use, but they're all dead technologies for practical purposes.
I'm sorry you think that. It's the number one problem that APL faces: because it's been around a long time, people think it's outdated.
I know and use lots of computer languages - C#, Java, Objective C, Ruby, etc. For some jobs APL is still my language of choice.
You should take time to find out more about APL.
No, taw, APL is not a dead language from the 1960s. The character in question was introduced and used in the language from the 1980s. APL may have been in its prime back then, but like Mary Queen of Scots, it's not dead yet.
I would appreciate you not mentioning that you found a funny picture of Muhammad. It is very offensive to people as a whole.
You say "Unicode is huge, so they have very low standards for inclusion.". But that's not true, in fact Unicode has very high standards for inclusion with some remarkably erudite and detailed discussion. It's not a perfect process, but by no means do they just include every goofy character set they find.
U+FDFD ﷽ (a ligature of "بسم الله الرحمن الرحيم") has to be one of the most awesome Unicode characters. However, it is hard to find a font supporting it. I have found only three fonts supporting it: Nafees Nastaleeq, GNU Unifont, and PakType Naskh. Though GNU Unifont looks really bad with it.