This post is a blogified version of a lightning talk I gave at BarCamp London 5. It was inspired by Chris Ball's Favourite Unicode Codepoints post. It's going to be in a weird talk/blogpost hybrid form that I hope my readers will excuse.
First, I want to say that this talk is not going to convey any useful information whatsoever. You won't learn anything from it, about internationalization or anything else. I'm doing it just because it's going to be fun and awesome.
First, the famous mirror trick, where text can be shown upside down, or mirrored left to right. None of it uses real Unicode characters like "mirrored e" or "upside-down a". It's just a bunch of characters that happen to look like that - for example, "upside-down p" (as in pet) is obviously "d" (as in dog). If there's no good Latin letter, a letter from another script is used, like Cyrillic, or from the International Phonetic Alphabet (IPA). It will be more or less noticeable depending on your font.
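To make the trick concrete, here is a minimal Python sketch: reverse the string, then swap each letter for an existing look-alike. The `FLIP` table and the `upside_down` helper are hypothetical and deliberately partial, chosen for illustration only. Most of the substitutes come from the IPA Extensions block, and the loop at the end prints each character's real Unicode name, which shows that none of them is an "upside-down letter".

```python
import unicodedata

# Hypothetical, partial lookup table: each lowercase Latin letter maps to an
# existing character that merely looks like it rotated 180 degrees.
# Letters not listed (o, s, x, z, l, ...) pass through unchanged.
FLIP = {
    "a": "\u0250",  # LATIN SMALL LETTER TURNED A
    "b": "q",
    "c": "\u0254",  # LATIN SMALL LETTER OPEN O
    "d": "p",
    "e": "\u01dd",  # LATIN SMALL LETTER TURNED E
    "h": "\u0265",  # LATIN SMALL LETTER TURNED H
    "m": "\u026f",  # LATIN SMALL LETTER TURNED M
    "n": "u",
    "p": "d",
    "q": "b",
    "r": "\u0279",  # LATIN SMALL LETTER TURNED R
    "t": "\u0287",  # LATIN SMALL LETTER TURNED T
    "u": "n",
    "v": "\u028c",  # LATIN SMALL LETTER TURNED V
    "w": "\u028d",  # LATIN SMALL LETTER TURNED W
    "y": "\u028e",  # LATIN SMALL LETTER TURNED Y
}


def upside_down(text: str) -> str:
    """Fake a 180-degree rotation: reverse the order, then swap look-alikes."""
    return "".join(FLIP.get(ch, ch) for ch in reversed(text.lower()))


if __name__ == "__main__":
    flipped = upside_down("unicode")
    print(flipped)
    # Show what each output character really is in Unicode.
    for ch in flipped:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Whether the result looks convincing depends on the font, as noted above.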
Here's a real Unicode character - Skull and Crossbones (U+2620), arrr! It's used as a danger sign, so it's arguably common enough for inclusion in Unicode.
This one I totally don't get. It's just a random icon that somehow got into Unicode. Unicode is huge, so they have very low standards for inclusion. Maybe it was in Microsoft Wingdings or something like that, and they thought that was a good enough reason to include it.
I half-get this one. The top three lines are the Japanese postal mark. Where the rest of the face comes from, and how it got into Unicode, is a mystery to me. It was probably included in some JIS standard as a joke and Unicode copied it, or something along those lines.
Operators from the APL programming language got into Unicode too. APL is like the Perl of the 1960s. This operator doesn't feel too good because it has to program in APL.
It's called Arabic ligature Uighur Kirghiz yeh with hamza above with alef maksura isolated form (U+FBF9), and it's exactly what it says it is. It looks rather ordinary for this list, but it might be the character with the longest name.
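"Might be" is the right hedge, because the answer depends on which Unicode version you ask. A quick way to check is to scan every code point with Python's standard `unicodedata` module, which bundles one specific version of the Unicode Character Database. The `longest_names` helper below is a hypothetical sketch written for this post. It makes no claim about what the top entry will be, since newer Unicode versions keep adding characters.

```python
import sys
import unicodedata


def longest_names(top: int = 5) -> list[tuple[int, str]]:
    """Return (length, 'U+XXXX NAME') pairs for the longest character names
    known to this Python's copy of the Unicode Character Database."""
    named = []
    for cp in range(sys.maxunicode + 1):
        name = unicodedata.name(chr(cp), None)  # None for unnamed code points
        if name is not None:
            named.append((len(name), f"U+{cp:04X} {name}"))
    named.sort(reverse=True)  # longest first
    return named[:top]


if __name__ == "__main__":
    print("Unicode version:", unicodedata.unidata_version)
    for length, entry in longest_names():
        print(length, entry)
```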
Another Arabic one (U+FDFA). Most ligatures are for just 2 or 3 characters, but the compatibility decomposition of this one is a whopping 18 characters. It means something like "May Allah bless him and grant him peace" and is used when the Prophet Muhammad is mentioned. By the way, I had a really funny picture of Muhammad that I wanted to put here, but somehow I cannot find it.
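You can see the 18 characters for yourself with Python's standard `unicodedata` module. In the Unicode Character Database, the decomposition of U+FDFA carries an `<isolated>` tag, which marks it as a compatibility decomposition rather than a canonical one. As a result, canonical normalization (NFD) leaves the ligature as a single character, while compatibility normalization (NFKD) expands it into Arabic letters separated by spaces.

```python
import unicodedata

LIGATURE = "\ufdfa"  # ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM

# Raw UCD decomposition field; a leading "<tag>" means compatibility mapping.
print(unicodedata.name(LIGATURE))
print(unicodedata.decomposition(LIGATURE))

# Canonical decomposition (NFD) does not touch the ligature...
print(len(unicodedata.normalize("NFD", LIGATURE)))

# ...but compatibility decomposition (NFKD) spells it out in full.
expanded = unicodedata.normalize("NFKD", LIGATURE)
print(len(expanded), expanded)
```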
How many loops are there?
This letter is very spidery, so you'd better be careful or it will bite you.
Sometimes it's not enough to be greater than, or even much greater than something else. Oh no, you need to be very much greater than. I think TeX is spoiling mathematicians and they come up with way too many symbols, and then we have to support them.
The polar opposite of the previous character. It's neither greater than nor less than. We kinda have a symbol for that already - U+003D EQUALS SIGN. OK, I know it's about partial orders, and it means that two objects cannot be compared, but it's no less funny for knowing that.
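For the partial-order reading, sets ordered by inclusion are the textbook example. Two different sets can each fail to be a subset of the other, so neither is "less than" the other and they aren't equal either. The sketch below assumes the character in question is U+2279 NEITHER GREATER-THAN NOR LESS-THAN, and uses it for exactly that case. `compare` is a hypothetical helper; the `<` and `>` operators on Python sets really do test for proper subsets and supersets.

```python
def compare(a: frozenset, b: frozenset) -> str:
    """Compare two sets under the subset partial order.

    Returns "=", "<", ">", or U+2279 when the sets are incomparable.
    """
    if a == b:
        return "="
    if a < b:  # proper subset
        return "<"
    if a > b:  # proper superset
        return ">"
    return "\u2279"  # neither a < b nor a > b, and yet a != b


if __name__ == "__main__":
    print(compare(frozenset({1}), frozenset({1, 2})))  # ordinary comparison
    print(compare(frozenset({1}), frozenset({2})))     # the incomparable case
```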
This is a very sad symbol. Not only is its heart heavy, it's also black. Is it a waste of a codepoint or what? It's just a random icon, not a meaningful "character".
That's my personal favorite for the "worst waste of a codepoint" award. Not only is "Floral Heart Bullet" not a character, they even included a reversed rotated version of it in Unicode. It's an icon, not a character.
We really need a punctuation mark that says "WTF". This entire list is one big interrobang use case, am I right?
The last one is not a character, but the entire Tibetan script. It looks absolutely beautiful.
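If you want to produce the characters named above without hunting through charts, Python's `unicodedata.lookup` accepts an official Unicode name and returns the character, or raises `KeyError` for a name it doesn't know. The list below is a sketch that covers only characters this post names or describes unambiguously. The official names are spelled the way the Unicode Character Database spells them.

```python
import unicodedata

# Official Unicode names of characters mentioned in this post.
NAMES = [
    "SKULL AND CROSSBONES",
    "VERY MUCH GREATER-THAN",
    "NEITHER GREATER-THAN NOR LESS-THAN",
    "EQUALS SIGN",
    "HEAVY BLACK HEART",
    "ROTATED FLORAL HEART BULLET",
    "REVERSED ROTATED FLORAL HEART BULLET",
    "INTERROBANG",
]

for name in NAMES:
    ch = unicodedata.lookup(name)  # KeyError if the name is unknown
    print(f"U+{ord(ch):04X} {ch} {name}")
```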
If you have any questions related to this talk/blogpost, just put them in comments.
49 comments:
Apparently the snowman's from a legacy character set. Which one, I don't know.
Also, check out the Arabic Letter Teh (U+062A). Something shocking happens and it turns into Arabic Letter Teh Marbuta (U+0629). Maybe there is a close call in a soccer game, because Arabic Letter Teh Marbuta GOAAAAAL!!! (U+06C3) looks quite similar.
Finally, Arabic Letter Teh With Ring (U+067C) has various uses, even if you don't read and write Pashto.
u r blog Is very nice
The floral heart bullet is actually called "the Aldus leaf." It's one of the oldest known ornaments used in printing. It has great historical and iconic value, and is by no means a waste. In fact, the Aldus leaf can be seen as a symbol for the art of printing itself.
thats kinda cool actually :)
FAIL.
One of the design requirements of Unicode is that it be "round-trip compatible" with every crappy legacy encoding ever used seriously.
What that means is that you can take some knee-biting horrible encoding like EBCDIC (take your pick of the variant) and you can take text in that encoding, translate it into Unicode, then translate it *back* into EBCDIC and there will be enough information to reproduce the *exact same* EBCDIC code-points.
To do that, Unicode must include every silly, stupid character ever used in every obscure, local encoding out there. It's somewhat unfortunate, yes, but the alternative is to break round-trip compatibility.
The snowman and the monkey both come from the land of hello-kitty. Which one of Japan's several incompatible legacy encodings, I'm not sure. Probably the JIS family.
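The round-trip property described in the comment above can be checked mechanically for a single-byte legacy encoding, using the codecs that ship with Python. The sketch decodes each of the 256 byte values to Unicode, encodes the result back, and reports any byte value that doesn't come back unchanged. `round_trip_failures` is a hypothetical helper written for this illustration. `cp037` is Python's codec name for EBCDIC code page 037 (US/Canada), which is one of the EBCDIC variants.

```python
def round_trip_failures(codec: str) -> list[int]:
    """Return byte values that do NOT survive bytes -> Unicode -> bytes in the
    given single-byte codec (bytes that cannot be decoded count as failures)."""
    failures = []
    for value in range(256):
        original = bytes([value])
        try:
            back = original.decode(codec).encode(codec)
        except UnicodeError:  # covers both decode and encode errors
            failures.append(value)
            continue
        if back != original:
            failures.append(value)
    return failures


if __name__ == "__main__":
    for codec in ("cp037", "latin-1", "ascii"):
        print(codec, "failures:", round_trip_failures(codec))
```

ASCII fails on every byte above 127 because those bytes cannot be decoded at all. A full round-trip mapping for the encodings Unicode cares about is exactly what the compatibility requirement is meant to guarantee.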
The funky changes mentioned by an earlier commentator in regards to the Arabic Tah and Tah Marbuta are actually just a functional representation of an Arabic oddity. When there is a "Tah Marbuta" placed at the end of a word, it generally functions as an "Ah" sound. However, if a possessive or other conjunction is added onto the end of the word, that "Tah Marbuta" turns into a normal "Tah" and assumes the normal "Tah" sound and functions. Oh how I love the Arabic.
This was awesome.
I like that the skull has teeth missing.
Thanks for an excellent read.
Just wanted to point out that APL is still in daily use and not some dead programming language from the 1960s. The inclusion of the APL character set in Unicode is really necessary for APL programmers - it's our ordinary working alphabet.
Anonymous: APL is a dead language from 1960s. There are still a few leftover systems written in APL, just like there are still some vacuum tubes, horse buggies, and typewriters in use, but they're all dead technologies for practical purposes.
I'm sorry you think that. It's the number one problem that APL faces - because it's been around a long time people think it's out-dated.
I know and use lots of computer languages - C#, Java, Objective C, Ruby, etc. For some jobs APL is still my language of choice.
You should take time to find out more about APL.
No, taw, APL is not a dead language from the 1960s. The character in question was introduced and used in the language from the 1980s. APL may have been in its prime back then, but like Mary Queen of Scots, it's not dead yet.
I would appreciate you not mentioning that u found a funny picture of Muhammad. It is very offensive to people as a whole.
You say "Unicode is huge, so they have very low standards for inclusion.". But that's not true, in fact Unicode has very high standards for inclusion with some remarkably erudite and detailed discussion. It's not a perfect process, but by no means do they just include every goofy character set they find.
U+FDFD ﷽ (a ligature of "ﻢﻴﺣﺮﻟﺍ ﻦﻤﺣﺮﻟﺍ ﷲﺍ ﻢﺴﺑ") has to be one of the most awesome Unicode characters. However, it is hard to find a font supporting it. I have found only three fonts supporting it: Nafees Nastaleeq, GNU Unifont, and PakType Naskh. Though GNU Unifont looks really bad with it.
I appreciate you mentioning that you found a funny image of Mohammad. It made someone overstate the severity of the infraction by stating that it is "very offensive to people as a whole."
Nice.
FLORAL HEART BULLET, REVERSED ROTATED is actually a typographic symbol used in French writing (mostly academic writing and belles-lettres). So it's not as useless as it may appear :)
Thanks for the interesting peep into the wonderfully cute world of Unicode characters. Saves a lot of trouble when you know it's already built-in and you don't have to create one from scratch. Would love to get a cheat sheet of these.
I have been trying to write something that you would understand, as you are non-Muslim, to express to you how disappointed I was when I read your comment about the funny pic you found of the prophet Mohamed.
So, at least out of respect for other people's point of view, please don't make such comments again.
We Muslims respect the other prophets and acknowledge them. We don't make funny comments about them or make them the subject of cartoons and jokes. They are prophets, which means God chose them among all other people to deliver his message. If God chose them to deliver his message to us, how could one underestimate them and make cartoons and jokes about them??!!
Anonymous: Making fun of religion has been an established part of Western culture at least since the Enlightenment - people have been making fun of religion, mostly the Christian religion but others are not spared either, for a very long time.
You should respect our culture, including our custom of making fun of different religions. We don't force you to make or read any cartoons or jokes yourself.
How cool is the interrobang‽
I want to know what the heck a character with the name of 'allah' is doing in the character set. Islam is a violent religion which has constantly attacked other beliefs since the illiterate 'prophet' mohamud appeared. See how these muslims will complain about pictures of mohamud and yet they kill each other every day, mutilate and rape their own women, and engage in deceit in order to spread their sick religion.
It's very funny, really enjoyed it a lot.
see this one
http://digitalpbk.blogspot.in/2006/11/fun-with-unicode-and-mirroring.html
grow up, muman. if you don't know what the hell you're talking about, just keep quiet
@muman613 If you really want to know, it's because simple text processors don't know to automatically put a shadda (for gemination) and an alif (for vocalization) on top of the second lam.
Oh and Islam prohibits all pictures of people, so you can imagine exactly how offended they get at a picture of Muhammad.
Take a look at this one:
ه̒ͨ҈҉҉҉҉҉҉҉҉҉҉҉҉҉҉҉҉҉҉҉҉҉҉҉҉҉҉
.....wzium!
WE ARE LEGION IN UNICODE LULZ
( ͡° ͜ʖ ͡°)
How do you appreciate something that is very offensive to people?
Aristocratic & educated westerners are known to uphold a level of respect for other religions & cultures. What you said makes no sense at all. Westerners are known to be very respectful people; maybe you are an exception.
i am very sorry, but i am missing U+1F4A9 - the pile of Poo
this is too fun to keep it for yourself.
Ok, I just wanted to say that all of your little comments about the different types of Unicode were absolutely hilarious.
I find your indignation EXTREMELY offensive. How dare you demand special treatment in this forum. All religion is complete twaddle, and our glorious western liberal civilization's freedoms that generations of people fought for will not pander to disgusting repressive censorship from people who want to take us back to the dark ages. Show some respect for the high ideals of secularism, and stop bleating.
God does not exist.
i think you have enough time for waste.
࿉ looks like poop to some and an egg with a curve and dot to others.
࿂
a wild unown appeared!
Hey fellow Anonymous users. You'll realize that getting each other mad on an old forum like this certainly isn't going to further your --... I'm really not a great example for preventing arguments if I do this, am I? Gosh, how paradoxical. Well, all of you of various religions, have a nice day, and don't worry about those who have harm in their hearts; just understand they know no other reaction. Perhaps one day they'll come around. Today is not that day.
Also, that poo Unicode character is definitely worth it.
𠆭𪟧嫐𠁥𡷉𠃠𢀓𠁼𛀀𢀓𠁧
࿂roll bowls
Look up U+130BA EGYPTIAN HIEROGLYPH D053 in some font. For some reason, many fonts with otherwise good coverage of Egyptian Hieroglyphs omit this one, or just render it as a square.
𓂺
🕋🕌۩۞ﷺﷻ﷼ﷲﷴﷷ﷽
This is neat
It's absolutely disgusting how you were planning on including an image of the Prophet Muhammad. Have some respect for other religions, you nasty swine.
๛ (Thai Character Khomut) actually looks much more boring.