The best kittens, technology, and video games blog in the world.

Thursday, January 15, 2009

Subtle changes is Web architecture

ridiculously cute bunny by ..its.magic.. from flickr (CC-NC-ND)
Today I'd like to discuss some things that have been happening to the Internet lately. I'm talking about the subtle stuff, not the well-known things like AJAX, Comet, and Javascript that doesn't suck.

wget --mirror no longer works

Simple mirroring crawlers like wget --mirror used to work very well, and you could easily get a static snapshot of most websites. They worked by extracting and following all links, like <a href=''>, <img src=''>, <script src=''>, and <link rel='stylesheet' href=''>. When you think about it, there weren't that many kinds of links.
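A minimal sketch of the kind of extraction such a crawler did, using Python's standard html.parser. The tag/attribute list mirrors the one above; the class and function names are illustrative assumptions, not how wget itself (written in C) is implemented.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# The (tag, attribute) pairs a simple mirroring crawler followed.
LINK_ATTRS = {
    ("a", "href"),
    ("img", "src"),
    ("script", "src"),
    ("link", "href"),
}

class LinkExtractor(HTMLParser):
    """Collect absolute URLs from the classic link-bearing tags only."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        for name, value in attrs:
            if (tag, name) in LINK_ATTRS and value:
                self.links.append(urljoin(self.base_url, value))

def extract_links(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

A URL that only appears inside an onclick handler, a script, or a CSS @import is never seen by this extractor, which is exactly where the simple approach breaks down.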

Now it fails for most websites. Links from Javascript and CSS @import are so prevalent that it's harder to find a website without them than one with them. In theory wget could be patched to support 90% of websites again, but unfortunately nobody seems to care, and it's written in C, which is not the most popular language amongst Web developers.
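Covering CSS is mostly a matter of also scanning stylesheets for @import and url(...) references. Here is a rough regex-based sketch in Python; it is an approximation under stated assumptions, not a full CSS parser (it ignores comments and escaped characters, for example), and links generated by Javascript would still need a real script engine.

```python
import re
from urllib.parse import urljoin

# @import "a.css";  @import 'a.css';  @import url(a.css);  @import url("a.css");
IMPORT_RE = re.compile(r"""@import\s+(?:url\(\s*)?["']?([^"'()\s;]+)""", re.IGNORECASE)
# Any url(...) reference: background images, fonts, and the url() form of @import.
URL_RE = re.compile(r"""url\(\s*["']?([^"'()\s]+)["']?\s*\)""", re.IGNORECASE)

def css_links(css, base_url):
    """Return absolute URLs referenced by a stylesheet, deduplicated, in order."""
    found = [m.group(1) for m in IMPORT_RE.finditer(css)]
    found += [m.group(1) for m in URL_RE.finditer(css)]
    # Relative URLs in CSS resolve against the stylesheet's own URL, not the page's.
    resolved = [urljoin(base_url, u) for u in found]
    return list(dict.fromkeys(resolved))
```

The same url('print.css') is caught by both patterns, so the result is deduplicated while keeping first-seen order.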

Per-website settings no longer work

It used to be possible to tell your browser to treat some website in a special way. If you didn't want to send cookies to Google Search, wanted to anonymize your traffic to a porn torrent search website through a proxy, or wanted to make Comedy Central think you're watching The Colbert Report from the US rather than the UK, you'd set it on a per-domain basis. That doesn't work any more, as too many websites use multiple domains on the same page, serving media files, Javascript, or anything else they feel like from different domains. They can even share domains with other, unrelated websites, for example by getting Flash videos from YouTube or images from S3. This means per-domain settings are no longer a good enough substitute for per-website settings, and no other equivalent is widely available.
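To make the problem concrete, here is a hypothetical Python sketch that measures what fraction of a single page's requests a per-domain setting would actually affect. The hostnames example-news.com, example-cdn.net, and the S3 bucket name are made up for illustration, and it assumes a per-domain setting covers the domain and its subdomains.

```python
from urllib.parse import urlsplit

def rule_applies(host, rule_domain):
    # Assumption: a per-domain setting covers the domain itself and its subdomains.
    return host == rule_domain or host.endswith("." + rule_domain)

def coverage(resource_urls, rule_domain):
    """Fraction of a page's requests that a setting for rule_domain would affect."""
    hosts = [urlsplit(u).hostname for u in resource_urls]
    matched = sum(1 for h in hosts if h and rule_applies(h, rule_domain))
    return matched / len(hosts) if hosts else 0.0

# One hypothetical "website" pulling resources from several unrelated domains.
page = [
    "http://www.example-news.com/story.html",
    "http://static.example-news.com/app.js",
    "http://img.example-cdn.net/photo.jpg",
    "http://www.youtube.com/v/abc123",
    "http://bucket.s3.amazonaws.com/clip.flv",
]
```

Here a setting for example-news.com reaches only two of the five requests; the CDN images, the embedded video, and the S3 file all slip past it.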

With most modern browsers and tabbed browsing, it's not even that simple to open separate instances of a browser and switch settings per instance. You'd need to use Firefox for normal browsing, Opera for some extra privacy, and Safari for porn, or something like that.

Email spam is no longer a problem

I don't mean there's no more spam; there's plenty of it. But good web email clients got unreasonably good at filtering it into Spam folders, something none of the desktop-based solutions of five years ago could do. So nobody needs to obfuscate their email address any more. I don't think spammers really cared about obfuscation anyway: do you think they're too dumb to use a regular expression like /@|\s*\[\s*at\s*\]\s*/i? Putting your email address in a captcha would work a lot better, but pretty much nothing supported it anyway.
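To illustrate how little obfuscation buys, here is a small Python sketch that undoes the common "[at]"/"[dot]" style before harvesting addresses. The bracket variants it handles and the simplified address pattern are assumptions for illustration, not a complete email grammar.

```python
import re

# "[at]", "(at)", "[dot]", "(dot)" with optional surrounding spaces, any case.
AT_RE = re.compile(r"\s*[\[(]\s*at\s*[\])]\s*", re.IGNORECASE)
DOT_RE = re.compile(r"\s*[\[(]\s*dot\s*[\])]\s*", re.IGNORECASE)
# Simplified address pattern: good enough for harvesting, not RFC 5322.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def deobfuscate(text):
    """Turn 'john [at] example [dot] com' back into 'john@example.com'."""
    return DOT_RE.sub(".", AT_RE.sub("@", text))

def harvest(text):
    """Find (de-obfuscated) email addresses in arbitrary text."""
    return EMAIL_RE.findall(deobfuscate(text))
```

For example, harvest("Write to john [at] example [dot] com or JANE (AT) example.org") finds both addresses.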

Anyway, I don't care any more, and I don't see why anybody else would. My email address is Tomasz.Wegrzanowski@gmail.com and you can send more Viagra ads to my Spam folder if you want to.

Life is harder for honest bot makers

In just a few short years, captchas have infected the entire Internet. They don't give any protection against people who make money by breaking your website: Indian captcha breakers cost something like a dollar per 1000 captchas, and even if that didn't work, specialized computer algorithms seem to be catching up with human solvers quickly enough.

The main result of captchas is making life harder for honest bot makers like me. Yes, I like screen-scraping sites and extracting fun info from them. Do you think I made the list of most popular blog posts in the sidebar by manually entering data into an OpenOffice Calc spreadsheet? No, I just screen-scraped Google Analytics, Delicious, and Blogger to get all the relevant data. Fortunately, none of them were captcha-protected.
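The actual script isn't shown here, so the following is only a hypothetical sketch of the idea: pull (page, count) rows out of an HTML stats table with Python's html.parser, then merge several sources into one ranking. The table layout, the normalize-by-maximum scoring, and all names are assumptions.

```python
from html.parser import HTMLParser

class TableRows(HTMLParser):
    """Collect the text of <td> cells, grouped by <tr> row (<th> header rows are skipped)."""

    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_cell = [], None, False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._row.append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr":
            if self._row:
                self.rows.append(self._row)
            self._row, self._in_cell = None, False

    def handle_data(self, data):
        if self._in_cell and self._row is not None:
            self._row[-1] += data.strip()

def parse_counts(html):
    """Turn rows like <tr><td>/post</td><td>1,234</td></tr> into {"/post": 1234}."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    counts = {}
    for row in parser.rows:
        if len(row) >= 2 and row[1].replace(",", "").isdigit():
            counts[row[0]] = int(row[1].replace(",", ""))
    return counts

def rank(*sources):
    """Scale each source by its own maximum so no single site dominates, then sum."""
    scores = {}
    for counts in sources:
        top = max(counts.values(), default=0)
        if top == 0:
            continue
        for page, n in counts.items():
            scores[page] = scores.get(page, 0.0) + n / top
    return sorted(scores, key=lambda p: (-scores[p], p))
```

Scaling by each source's maximum is just one arbitrary choice; it keeps raw pageview counts from drowning out much smaller bookmark counts.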

People like me cannot really afford to spend time finding and contracting Indian captcha breakers; that only makes sense for high-volume, for-profit operations like spammers. It's ironic that this alleged anti-spam measure hurts spammers very little and hurts honest bot makers a lot. I guess disabled people are not big fans of it either, and neither are ordinary people, who already fail half of the captchas and will soon fail 80% of them if the trend in captcha complexity continues.

Sensible ideas get abused a lot

If you want to register or change your password, you need to enter your new password. To make sure nobody sees it, it's hidden, but then it's easy to make a typo. It's not a big deal during login, as you can always retry, but it would be really annoying if you made a typo during registration and couldn't log in at all.

So a perfectly sensible idea was to make you enter your password twice. You're unlikely to make the same typo twice, so you're protected against both onlookers and mistakes. So who was the first asshole who thought it was a good idea to make people repeat their email address?
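The sensible part is trivial to implement. Here is a hypothetical server-side check in Python; the function name and messages are made up for illustration.

```python
def check_new_password(password, confirmation):
    """Return an error message, or None if the two hidden entries agree."""
    if not password:
        return "Password must not be empty."
    if password != confirmation:
        # A typo in either field makes the entries differ. Typing the same
        # typo twice is unlikely, so a match is good evidence of intent.
        return "Passwords do not match."
    return None
```

An email address is shown in plain text as you type it, so the same check adds nothing there: you can already see your typo.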

Another thing that really annoys me is websites that require captchas for anything other than potentially spammable actions. Some forums want captchas for search; how stupid is that? It's a pure usability breaker, making human action harder with no benefit whatsoever.

While I'm at it, let me whine about Xbox Live registration. They use a double-entered hidden password just like any website, except that the Xbox 360 doesn't have a keyboard, so you type on a big highlighted virtual keyboard on the screen, and every onlooker sees your password anyway.

Geographical restrictions are harder to bypass

The Web used to be full of open proxies, so limiting viewers by country didn't really work. Unfortunately, most proxies got closed because of spammers, and now it's really hard to access Pandora or Comedy Central from a UK IP address. Pandora is the only such service that I care about; if I wanted The Colbert Report that badly, I could always use BitTorrent. Unfortunately, the fascist music industry forced Pandora to introduce this and all its other restrictions, and none of the Pandora alternatives are even close. Believe me, I've tried many of them.
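Why open proxies defeated country limits is easy to see in a sketch: the server can only geolocate the address that connected to it. Below is a hypothetical Python illustration using the standard ipaddress module. The two-entry table stands in for a real GeoIP database, and the ranges are RFC 5737 documentation addresses, not real allocations.

```python
import ipaddress

# Hypothetical stand-in for a GeoIP database (RFC 5737 documentation ranges).
GEO_TABLE = {
    ipaddress.ip_network("192.0.2.0/24"): "US",
    ipaddress.ip_network("198.51.100.0/24"): "GB",
}
ALLOWED_COUNTRIES = {"US"}

def country_of(ip):
    """Look up the country code for an address, or None if it's unknown."""
    addr = ipaddress.ip_address(ip)
    for network, country in GEO_TABLE.items():
        if addr in network:
            return country
    return None

def is_allowed(remote_ip):
    # The server only knows the address of whoever opened the connection:
    # the visitor, or the proxy acting on the visitor's behalf.
    return country_of(remote_ip) in ALLOWED_COUNTRIES
```

A UK visitor at 198.51.100.7 is refused, but the same visitor going through an open proxy at 192.0.2.10 is let in, because the proxy's address is all the server sees. Fewer open proxies means fewer ways around the check.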

It's such a shame, because Pandora is the best thing to happen to music since Audiogalaxy, which was great in the same way: it made meaningful music recommendations. I'm surprised Pandora is still alive even in its restricted form. It must really piss off some people in the record industry that some innovation is happening in spite of all their efforts to suppress it.

Internet is a much milder place

People used to trick others into clicking links to goatse.cx and other shock sites. Now it's Rick Astley. Even 4chan, which not that long ago was full of gore and borderline kiddie porn, is fascinated by a cute hyperactive girl from Gaia Online. Torrent sites are no longer about kinky porn; they're mostly about downloading TV series so you can watch them without following TV schedules, which is just a more convenient version of TiVo.

There are no ads on the Internet

It's funny, because pretty much everything on the Internet is ad-supported, yet nobody has to watch ads unless they choose to by not installing an ad blocker. Ads started as fairly reasonable static links and banners that nobody cared about. Then ad companies, in their greed and stupidity, moved to animated banners, Flash, pop-ups, and pop-unders, and abused users so much that a great backlash emerged. Now all ads are gone, even the most meaningful and least annoying ones like Google text ads.
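The core of an ad blocker is simple: check every requested URL against a list of filter rules before fetching it. Here is a deliberately simplified Python sketch. The rules are hypothetical, and only a '*' wildcard is supported, whereas real Adblock filter lists use a much richer syntax.

```python
import re

# Hypothetical filter rules: '*' matches any run of characters, and a rule
# matches if it appears anywhere in the URL.
RULES = ["/banners/", "ads.example.com/*", "*/popunder*.js"]

def compile_rules(rules):
    # Escape regex metacharacters, then turn each escaped '*' back into '.*'.
    return [re.compile(re.escape(rule).replace(r"\*", ".*")) for rule in rules]

def is_blocked(url, compiled_rules):
    """True if any rule matches somewhere in the URL."""
    return any(rule.search(url) for rule in compiled_rules)
```

A filter like this knows nothing about how annoying an ad is. It blocks a tasteful text ad and a pop-under alike, which fits the all-or-nothing outcome described above.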

I like this example of consumers taking control. In pretty much everything else, big corporations abuse their power and force consumers to follow their ways, while governments passively stand by or even help the abusers. For example, banks can charge you extra fees whenever they want to, and it's up to you to challenge them. Who gave them that right? Why can't consumers charge banks fees for crappy service, for symmetry? Such abuses are really common, and consumers rarely stand up against them. I cannot really think of any case other than ad blocking where standing up worked.

Most of these processes have been happening slowly and quietly in the background, but over the years they have made the Internet a very different place from what it used to be.

19 comments:

Unknown said...

Um... $title =~ s/is/in/ ? Or is that on purpose?

Anonymous said...

Another thing that really annoys me are websites that require captchas for anything else than potentially spammable actions. Some forums want captchas for search, how stupid is that? It's pure usability breaker, making human action harder at no benefit whatsoever.

Search is often a CPU-heavy operation. Somebody wanting to ruin your day can easily start doing thousands of search requests to bring the site to its knees.

Of course, there are better solutions to that problem, but if we used good solutions this wouldn't be the internet, would it?

taw said...

Divided Mind: No, it's me relying on the spell checker too much. If it's spelled right, I think it's OK. I somehow make fewer mistakes when I don't have a spell checker.

Anonymous: You can rate-limit per IP, which would be slightly less assholish, or get a search feature that doesn't suck. Google can search the entire Internet without captchas, so why should your forum be any more complicated than that?
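Per-IP rate limiting, as suggested here, can be sketched with a sliding window of recent request timestamps per address. This is a hypothetical, in-memory Python illustration: a real forum running several server processes would need shared storage, the limit and window values are arbitrary, and the clock is injectable only so the logic can be tested.

```python
import time
from collections import defaultdict, deque

class PerIPRateLimiter:
    """Allow at most `limit` requests per `window` seconds from each IP address."""

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip):
        now = self.clock()
        recent = self.hits[ip]
        # Forget requests that have slid out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return False  # over the limit: reject instead of showing a captcha
        recent.append(now)
        return True
```

A search handler would call allow(client_ip) before running the expensive query. A flood from one address gets turned away, while everyone else searches without ever seeing a captcha.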

Anonymous said...

There are still a lot of pop-ups. I had to install Adblock on my new Ubuntu system.

taw said...

Anonymous: Installing Adblock is the same as installing Firefox and Flash, just something you do on every computer before you start browsing.

Anonymous said...

Search is a tough one. Google has also invested billions of dollars into its search technology. They can also buy their own airport. Does that mean we should all be able to duplicate their success?

But rather than duplicate, one can utilize/integrate Google's search for their own site, which is what many do nowadays.

Aaron Griffin said...

I haven't used any ad-blocking utilities in ages and am doing fine.

This article gets huge props from me. Cheers!

Anonymous said...

haha, Opera for porn. now that's an idea. a browser totally tricked out for streaming and dl'ing porn.

Anonymous said...

With most modern browsers and tabbed browsing it's not even that simple to open different instances of a browser, and switch settings per instance, you'd need to use Firefox for normal browsing, Opera for some extra privacy, and Safari for porn, or something like that.

With Firefox, you can create different profiles with different settings and run them simultaneously. No?

taw said...

prakash: I tried it, but it's a pain in the ass to set up, and it behaves funny when opening links from other applications.

Trev said...

What you're doing for your popular posts sounds like what PostRank does. You might want to try out the Top Posts widget.

taw said...

Trevor: It looks good. It also has a very low correlation with the results of my script; I wonder which results are better.

Trev said...

You can see the details for your blog here. The individual metrics (del.icio.us, Google blogsearch, etc.) will show up when you hover over the postrank.

DanielAjoy said...

Internet is a much milder place

I've noticed that too. Any idea why that is?

taw said...

Blank: Some obvious factors are the demographic shift away from college-age males, who tend to be the most cynical people, and commercialization.

Anonymous said...

Dude, awesome post! This is one of the few blog posts I have actually read all the way through (I got here from reddit.com).
Interesting read! And as for your "milder place" comments, it's just much more mainstream. It used to be that only geeks were even on the internet, let alone knew how to use it. Now, even my grandmother can find and print out her crossword puzzles from her town's newspaper's website. The internet has become mainstream, and the frameworks behind it are struggling to meet demand.

Anonymous said...

Won't somebody please think of the educated middle-class caucasian males :(

Anonymous said...

Firefox has support for multiple profiles. Just run firefox -no-remote -ProfileManager.

mathew said...

Actually, per-website settings still work pretty well, because typically the stuff that's transcluded from other domains is crap you don't want anyway, like Google Analytics, ads, and systems to let clueless morons comment on the page.

So what actually happens is that the site owner foolish enough to use files transcluded from other domains misses out on ad revenue and gets incorrect page metrics. Which isn't a problem for me.