taw's blog: Writing replacement for magiccards.info

Saturday, October 10, 2015

Writing replacement for magiccards.info

♥ My Girl ♥ by Trish Hamme from flickr (CC-BY)

magiccards.info is probably the most useful MTG site out there. Comparing it to Gatherer is like comparing Birthing Pod to Search the City.

It has a few issues I wrote about, but they never bothered me enough to do anything more than complain on the blog. Unfortunately it seems to be abandoned, so someone needs to write a replacement, and that someone might just as well be me.

The plan

The plan is very simple:

get data on cards
write search engine which accepts magiccards.info style queries
write a web frontend to search engine
host it somewhere
get images
add extra functionality
avoid lawyers

Gathering data

Step one seemed like it would mean messy Gatherer scraping, but it turns out the mtgjson website already hosts 99% of the data we need. I manually entered a few more bits like which sets belong to which blocks, etc. There are still some missing bits like assigning frame types to individual promo cards, but that's really low priority.

For that matter I'm not sure magiccards.info is doing a particularly good job on it. For example what makes Unhinged Forests is:funny? No idea.

Writing search engine

I wasn't sure how complex it's going to be, but it turned out it's fairly easy. I wrote a search engine in Ruby, and put it on github. It can be used from command line, or as a library.

The vast majority of queries, like a:"steve argyle" t:elf, e:gatecrash r:mythic t:legendary t:angel, f:standard pow>cmc c:g, or "Birds of Paradise", translate to very simple code.
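To give a feel for what "very simple code" can look like, here is a minimal sketch, not the actual code from my project. The Card struct, its field names, the three made-up cards and the helpers term_to_predicate and search are all assumptions for illustration. It only covers a:, t:, r:, c:, pow>cmc and name terms; e: and f: are left out.

```ruby
# Hypothetical card record; field names are assumptions, not the real schema.
Card = Struct.new(:name, :types, :artist, :rarity, :power, :cmc, :colors, keyword_init: true)

# Made-up sample data, not real cards.
CARDS = [
  Card.new(name: "Example Elf", types: %w[creature elf], artist: "Jane Doe",
           rarity: "common", power: 3, cmc: 2, colors: %w[g]),
  Card.new(name: "Example Angel", types: %w[legendary creature angel], artist: "John Roe",
           rarity: "mythic", power: 4, cmc: 5, colors: %w[w]),
  Card.new(name: "Example Forest", types: %w[land], artist: "Jane Doe",
           rarity: "common", power: nil, cmc: 0, colors: [])
].freeze

# A term is either an optionally prefixed quoted string or a run of non-spaces.
TOKEN = /(?:\w+:)?"[^"]*"|\S+/

# Turn one query term into a predicate (a lambda taking a card).
def term_to_predicate(term)
  case term
  when /\Aa:(.+)\z/
    artist = $1.delete('"').downcase
    ->(card) { card.artist.downcase.include?(artist) }
  when /\At:(\w+)\z/
    type = $1.downcase
    ->(card) { card.types.include?(type) }
  when /\Ar:(\w+)\z/
    rarity = $1.downcase
    ->(card) { card.rarity == rarity }
  when /\Ac:(\w+)\z/
    colors = $1.downcase.chars
    ->(card) { colors.all? { |color| card.colors.include?(color) } }
  when /\Apow>cmc\z/
    # Cards without power (lands, for example) never match.
    ->(card) { !card.power.nil? && card.power > card.cmc }
  else
    # Anything else is treated as a (possibly quoted) name fragment.
    name = term.delete('"').downcase
    ->(card) { card.name.downcase.include?(name) }
  end
end

# A query matches a card when every one of its terms matches.
def search(cards, query)
  predicates = query.scan(TOKEN).map { |term| term_to_predicate(term) }
  cards.select { |card| predicates.all? { |predicate| predicate.call(card) } }
end

puts search(CARDS, 'a:"jane doe" t:elf').map(&:name)
```

Each term is independent, so adding a new operator in this sketch is just one more when branch.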

The search engine still misses a few things. Biggest remaining problems are:

How to handle 2-in-1 cards (split, flip, DFC) - are these simply two cards which just happen to share cardboard (and color identity for Commander), or is it one card? In a certain way Wear // Tear should return true when asked cmc=1 cmc=2 (this query is why Counter-Top loves this card so much), in a certain way it shouldn't.
What the hell do expressions like mana>={u/g} even mean? mana is a reasonable query when both card and query are just made out of "normal" mana symbols - or even Unhinged half-mana symbols - but is {u/g}>={r/u} true or false or what?
How to make it a bit more performant, as loading the 50MB JSON file takes about 3s on my laptop (after which individual queries are super fast) and uses too much memory for my liking.
Some fallback method to deal with spelling errors.
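For the spelling fallback in the last item, one possible approach (not something the search engine does yet) is to suggest the card names with the smallest edit distance to what was typed. The sketch below uses plain Levenshtein distance; the helper names edit_distance and closest_names and the max_distance cutoff of 2 are assumptions for illustration.

```ruby
# Levenshtein edit distance via dynamic programming, keeping one row at a time.
def edit_distance(a, b)
  previous = (0..b.length).to_a
  a.each_char.with_index(1) do |char_a, i|
    current = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      current << [
        previous[j] + 1,        # delete a character from a
        current[j - 1] + 1,     # insert a character into a
        previous[j - 1] + cost  # substitute (or keep) a character
      ].min
    end
    previous = current
  end
  previous.last
end

# Names within max_distance edits of the typed text, closest first.
def closest_names(names, typed, max_distance: 2)
  target = typed.downcase
  names
    .map { |name| [edit_distance(name.downcase, target), name] }
    .select { |distance, _name| distance <= max_distance }
    .sort
    .map { |_distance, name| name }
end

names = ["Birds of Paradise", "Birthing Pod", "Search the City"]
puts closest_names(names, "Birds of Paradize")
```

Comparing against every card name is quadratic per name, which is probably fine as a fallback that only runs when a query returns nothing.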

And there are still some small remaining bits. But generally, the search engine is about as good as the one on magiccards.info. A lot of improvements are already there. For example:

reminder text is stripped from Oracle, so you can now reasonably search for green flying creatures without hitting all reach creatures - and it also correctly makes Dryad Arbor match is:vanilla
support for b:Innistrad style queries for filtering cards by block
support for loyalty: queries
support for Unhinged fractional values
color identity queries ci:wb actually mean something now (in this case - can go into White/Black Commander deck)
support for querying random metadata mtgjson had like watermark:gruul
proper differentiation between cards with mana cost 0 and no mana cost (like lands)
some convenience features like not:black-bordered
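Two of the items above are easy to sketch. This is a rough illustration, not the project's actual code: strip_reminder_text, vanilla? and fits_color_identity? are hypothetical helpers, reminder text is assumed to be any parenthesised run in the rules text, and colors are assumed to be stored as arrays of one-letter strings.

```ruby
# Remove parenthesised reminder text (plus the whitespace before it).
def strip_reminder_text(text)
  text.gsub(/\s*\([^()]*\)/, "").strip
end

# A card whose text is empty once reminder text is gone counts as vanilla.
def vanilla?(text)
  strip_reminder_text(text).empty?
end

# ci:wb style check: every color in the card's identity must be allowed by the deck.
def fits_color_identity?(card_identity, deck_identity)
  (card_identity - deck_identity).empty?
end

reach = "Reach (This creature can block creatures with flying.)"
puts strip_reminder_text(reach)                      # the word "flying" is gone
puts fits_color_identity?(%w[w], %w[w b])            # mono-white fits a W/B deck
puts fits_color_identity?(%w[w g], %w[w b])          # green does not
```

With reminder text stripped before matching, a text search for "flying" no longer hits cards that only mention it inside the reach reminder.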

There's still stuff to do, but I'd say it's about 80% done after one productive all-nighter.

Writing web frontend

The search engine is written in Ruby, so it would be fairly easy to do a Rails front-end. This runs into the issue of how to present the data (just text, no pictures yet), writing a syntax guide, etc., but none of that is terribly complicated.

I could also do a silly thing and write a CoffeeScript version (or cross-compile Ruby to JavaScript to run on the client side) where all calculations are done client-side - which would incidentally completely solve the hosting problem, as static sites are really easy to host.

Host it somewhere

Somehow it's been a long while since I last put any nontrivial website online myself. Usually I use other people's EC2 accounts or servers for that.

It's mostly awkward from a monetary point of view, as it could get significant traffic (especially if magiccards.info never recovers). For text it's not too bad, but for tons of images it could get painful.

I'll look at it once I get the previous step done. It's probably something I should take a look at anyway.

Get images

The search engine doesn't necessarily require pictures - it could get away with just text data, as card images are more decoration than anything else. But it would be better with images. I'm not entirely sure where to get them from - presumably from wherever programs like Cockatrice get them. Hosting them is a bit of a bandwidth cost risk.

Add extra functionality

Once I go through the trouble of setting it all up, I could add some extra features like Sealed pool simulators (I wrote a few before), or whatever I feel like. No point talking about it this early.

Avoid lawyers

An unfortunate side effect of doing anything Magic-related online is that technically WotC lawyers can screw you over any time they want - and sometimes they go on a rampage with tons of innocent casualties, like fan art, draft simulators, etc.

It's pretty much random. Thousands of sites (blogs, shops, etc.) use Magic card images, it's well within the boundaries of traditional fair use, and most of the time Wizards couldn't care less. But it would suck horribly to put all that effort (and my own money for hosting) into setting up such a site, and then get it removed by a lawyer on a power trip.

Coming soon

Anyway, I'll keep you posted, or if you're impatient just check my github project. If you have any feature requests, github is probably the easiest way to send them, or just contact me in any other way.

1 comment:

ElephantofDoom said...: It seems like Magiccards.info has not been abandoned, it just is taking longer for the guy running it to update. Maybe you should contact him and offer to help him.; 06:27