Monday, August 27, 2012

Script to convert Google+ takeout into a single easy to use document

Search cat by zenera from flickr (CC-SA)

Google+ did many things wrong, like their discriminatory real name policy, but one surprising thing they did right, and that almost everybody else gets wrong, was making it easy to export all your data using Google Takeout.

Unfortunately, Google+ posts from Takeout (and pretty much everything else from Takeout) are pretty hard to use directly. But we're all hackers, so it's not a big deal to reformat them, and at least this one time it doesn't involve breaking any Terms of Service or working around rate limiters, captchas, and other such nonsense just to get your own data.

I wrote a script to process a Takeout archive into a single easy to search HTML document. Since it's pretty short, I put it in the unix-utilities repository on GitHub (the one I wrote about earlier) instead of making a new repository for it.

It's very easy to use (the Stream/ directory is how it's packed in the Takeout .zip):
process_gplus_takeout Stream/ output.html
It removes everything except actual content and attachments, and sorts entries by date. If you want to include different things or filter them, it should be pretty easy to modify the script.
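
For a rough idea of what the sort-and-combine step can look like, here is a minimal sketch. It is not the actual script: the `Entry` struct and `render_page` are names made up for this illustration, it assumes each post has already been reduced to a timestamp plus an HTML fragment, and the oldest-first order is an assumption too.

```ruby
require "time"

# Hypothetical record: one post reduced to its timestamp and HTML body.
Entry = Struct.new(:time, :html)

# Sort entries by date (oldest first, an assumption) and wrap them
# in one minimal HTML page.
def render_page(entries, title = "Google+ posts")
  body = entries.sort_by(&:time).map do |e|
    # getutc returns a new Time, so the entry itself is not modified
    stamp = e.time.getutc.iso8601
    "<div class=\"entry\">\n<h3>#{stamp}</h3>\n#{e.html}\n</div>"
  end
  <<~HTML
    <!DOCTYPE html>
    <html>
    <head><meta charset="utf-8"><title>#{title}</title></head>
    <body>
    #{body.join("\n")}
    </body>
    </html>
  HTML
end
```

Filtering would then be a matter of calling `select` or `reject` on the entries before they are rendered.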

It's even a reasonable example of how to use Hpricot to mass-process a lot of HTML documents if that's a new thing to you.

About the only hard part is arranging computations in a way that doesn't load the DOM of every single HTML file into memory simultaneously, but instead processes the files one by one and frees each DOM in between. It probably doesn't even matter in this case, since it's just a few MBs of HTML, so even all the DOMs will fit in memory together, but it's good practice in general.
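
The one-file-at-a-time pattern can be sketched roughly like this. It is an illustration under assumptions, not the script's real code: `extract_each` is a made-up helper name, and it hands the block the raw HTML string so the example runs with nothing but the standard library.

```ruby
# Visit the HTML files in a directory one at a time. The block gets the
# path and the raw HTML, and should return only the small piece of data
# worth keeping. Anything heavy built inside the block (such as a parsed
# DOM) becomes garbage as soon as the block returns.
def extract_each(dir)
  Dir.glob(File.join(dir, "*.html")).sort.map do |path|
    raw = File.read(path)
    result = yield(path, raw)
    result # only this small value is kept in the returned array
  end
end

# What to avoid: building every DOM first and extracting afterwards,
# which keeps all of them alive at the same time.
#   docs = paths.map { |path| parse(File.read(path)) }  # parse is hypothetical
```

In the real thing you would parse inside the block, for example with `Hpricot(raw)`, and return just the date and content you pulled out, so that at most one DOM is reachable at any moment:

```ruby
titles = extract_each("Stream/") { |path, raw| raw[/<title>(.*?)<\/title>/m, 1] }
```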

1 comment:

  1. Cheimon 21:26

    I have been playing Concentrated Vanilla for quite a while now, and felt like giving some more feedback to see if that ends up with a higher quality mod. I've been playing as Russia.

    First, the strong rebels. I'd like to re-emphasise that I reckon it's too strong: certain factions really depend on an early rush of taking different settlements (Russia, notably, starts with just Novgorod, and taking the rest of Russia takes many years). Lithuania and various other rebels survive into the 13th century as independent kingdoms, which doesn't really feel like vanilla. It's crippling to some nations that start off small, essentially: they can't grow and the AI wastes their armies on sending them piecemeal. It makes for interesting gameplay, but AI doesn't like it and I don't feel that it concentrates the vanilla experience. It just makes the first section of gameplay last an extra 50-70 years in a campaign.

    Cannons are fantastically powerful. This is brilliant. I use them in all my armies, along with cossack musketeers.

    Pikes are also very powerful, which might or might not be good. They're vulnerable to missiles, but because the AI never tries to exploit this, it ends up making them very powerful indeed. Playing as Scotland was very easy for this reason: even a simple unarmoured pike unit could hold the line nicely against superior warriors from the Holy Roman Empire.

    Sallying out is really, really easy to exploit...indeed, you can't stop them from going out. Russia overcame the enormous rebel hordes purely by sending out armies of archer militia with minimal infantry and pelting them to kill most of them before they were well out of the gate. My later gunpowder armies tend to treat sieges as a field battle where they're bunched together: get some cannon and lots of musketeers and make them run away before they touch the line or get organised. It's horrendously effective and it's weird to see that they don't try using the walls and streets, which would be much better.

    I'm assuming it's impossible to make carracks travel faster over the ocean like you've made agents so much faster. It'd be nice if you could, since planning the response to Aztec sieges years in advance doesn't feel right.

    Characters don't seem to age properly. Is this deliberate, or a bug?

    Cities are great and one turn building works very nicely. Given how useful chivalry is, I haven't yet seen a need to take taxes from anything but 'low', but that might just be me.

    The rebel version of Russia's 'Spearmen' Unit, frequently used in rebel armies, has no texture and appears as white. Any chance of implementing a fix? It must have been done by some modder or other.

    Any chance of adding a message when a princess comes of age? I often miss them, and they get bad traits the older they get, like 'man-hunter' that makes them far less useful.

    Lands to Conquer did a really nice job of making merchants make a decent amount of money without feeling overpowered (like Darth's merchants were). I think he did increase trade resources, which would speed up the campaign, which doesn't have to be bad. As I said before, outlying provinces are great for merchants...while those mines do make a chunk of money, historically money also came from trade in chocolate and tobacco and slaves etc. (which are there), and you're also losing about half of that money from mining to corruption. They're nice, but they're not worth going out of your way (i.e. fighting Aztec hordes and spending years at sea) to get at the moment.

    If starting positions are more equal, you could safely reduce the peacefulness of Lusted's AI. The computer is very hesitant to declare war: you can practically ally with everyone in the game before they've got around to infighting, and it makes Europe in particular a very static place to be...as do the impregnable rebel settlements, but I've already mentioned those.
