Monday, November 09, 2015

Writing replacement for magiccards.info - syntax improvements

my adorable cat by pinkiwinkitinki from flickr (CC-SA)
A couple years ago I started bitching about how magiccards.info could be improved. Unfortunately that on its own rarely leads anywhere, so a couple of weeks ago I decided to write a replacement for magiccards.info. Since then it turns out that magiccards.info got sort of reactivated, so my project became a lot less urgent, but I got so much cool stuff implemented that I decided to come back and finish it.

All the code is available on github. It has a somewhat unpolished Ruby on Rails-based frontend (without any card pictures) as well as a command line interface and a Ruby API.

You can run it locally on just about any OSX machine if you know how to run Rails (git clone, bundle install, rails s), and it should somehow work on Windows and Linux too if you can run Rails there.

I obviously need to look for some hosting solution so it's usable for the average person. It would be relatively easy to add images to it, it's really mostly a hosting issue. The frontend links to Gatherer (as well as magiccards.info), so a picture is one click away, except for a few weirdo cards which are in the mtgjson database but not on Gatherer, like International Collector Edition. I guess I could enable hotlinking for local use, but let's not give WotC's paranoid lawyers any more reasons to go after this.

If anybody has hosting recommendations, I'd love to hear them. Oh and it should probably be called something more unique than magic-search-engine.

The search engine is backwards compatible with about 90% of magiccards.info syntax, so you won't need to break your habits. Here I'm documenting just the major additions and fixes.

Weirdly even some completely undocumented syntax from magiccards.info works the same - is:reserved and loyalty>=6 work identically on both search engines, even though I had no idea magiccards.info had them when I added them to my search engine (they're in the mtgjson data we both use, so it's not entirely accidental).

Full documentation for all the syntax is available in the frontend.

Automatic data cleanup

Real data is full of silly inconsistencies and it's the search engine's job to try to clean them up. The most important such cleanup is completely stripping out reminder text from o: queries, so it's finally possible to search for flying green creatures with a straightforward o:flying c:g t:creature - or to correctly have Transguild Courier and Dryad Arbor match is:vanilla for your sweet Muraganda Petroglyphs deck.
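To make the reminder text cleanup concrete, here is a minimal sketch in Python. It is not the project's Ruby code: the helper name is made up, and it assumes that reminder text is simply anything in parentheses, which may not be exactly what the real engine does.

```python
import re

# Assumption: reminder text is any parenthesized run of text.
REMINDER_TEXT = re.compile(r"\s*\([^()]*\)")

def strip_reminder_text(oracle_text: str) -> str:
    """Return Oracle text with parenthesized reminder text removed."""
    return REMINDER_TEXT.sub("", oracle_text).strip()

# Illustrative text: the only mention of "flying" is inside reminder text.
text = "Reach (This creature can block creatures with flying.)"
stripped = strip_reminder_text(text)
print(stripped)
```

With the reminder text gone, an o:flying style substring search no longer matches a creature which only has reach.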

Another thing that's not matched because it shouldn't be is foreign names in default mode, so you don't get unrequested silliness like this.

These are the biggest cleanups, but there are some weird things I ran into which I told the search engine to fix as well. For example, would you like to search for planeswalkers with a -3 loyalty ability? That would be t:planeswalker o:-3, right? Well, that won't work on raw data, because on planeswalkers it's a Unicode minus sign (U+2212), not the ASCII symbol which your keyboard and all other cards have. Except on the flip side of DFC Garruk, where it's ASCII again.
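The minus sign fix can be sketched as a normalization step applied to both the card text and the query before comparing them. This is an illustrative Python sketch with hypothetical function names and a made-up ability text, not the engine's actual code.

```python
UNICODE_MINUS = "\u2212"  # U+2212 MINUS SIGN, as found on planeswalker abilities

def normalize_minus(text: str) -> str:
    """Replace Unicode minus signs with the ASCII hyphen-minus."""
    return text.replace(UNICODE_MINUS, "-")

def text_matches(oracle_text: str, query: str) -> bool:
    """Case-insensitive substring match after normalizing both sides."""
    return normalize_minus(query).lower() in normalize_minus(oracle_text).lower()

# Made-up loyalty ability using the Unicode minus sign.
ability = "\u22123: Draw a card."
print("-3" in ability)              # naive substring search
print(text_matches(ability, "-3"))  # normalized search
```

Normalizing both sides means it doesn't matter which kind of minus the data or the user happens to use.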

Much better spelling suggestion system

If you try to search for "kolagan command" on magiccards.info, it will helpfully ask if you meant "Kolaghan's Command", which with one more click you can get to. But that works only for full card names - searching "kolagan" will return no cards and no suggestions.

Here it's much better. If the search engine can't find precisely what you were looking for, it takes every word in your query which is not in the name of any real Magic card and extends it to match likely misspellings. So you can search "kolagan", "purphuros", "tezeret seeker", "joira of githu" and so on.

The engine is smart enough not to do that if a word appears in the name of a real Magic card, so if you search for "mox f:standard", it will not try to autocorrect it to "ox" and return Standard-legal "Yoked Ox".
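Here is a rough Python sketch of that idea. The post doesn't say which similarity measure the engine uses, so difflib from the Python standard library is just a stand-in, and the tiny card list, function names and cutoff are all assumptions.

```python
import difflib

# Tiny stand-in for the real card name database.
CARD_NAMES = ["Kolaghan's Command", "Purphoros, God of the Forge", "Mox Opal", "Yoked Ox"]

def name_words(names):
    """Collect every lowercase word that appears in any card name."""
    words = set()
    for name in names:
        words.update(name.lower().replace(",", " ").split())
    return words

def correct_word(word, known_words):
    """Keep real name words untouched, otherwise suggest close matches."""
    word = word.lower()
    if word in known_words:
        return [word]  # a real word like "mox" is never autocorrected
    return difflib.get_close_matches(word, sorted(known_words), n=3, cutoff=0.7)

WORDS = name_words(CARD_NAMES)
print(correct_word("kolagan", WORDS))
print(correct_word("mox", WORDS))
```

Because "mox" is already a word in a real card name, it is returned as is and never gets "corrected" to "ox".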

Sensible sorting

You can specify how to sort the results as part of the query with sort:new, sort:old, sort:name etc.

sort:new / sort:old sensibly treat all supplemental sets as lower priority than every Standard set, so random duel deck reprints won't pollute your results when you're trying to search by actually newest.

Use sort:newall and sort:oldall if you want to treat all sets equally.
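The difference between sort:new and sort:newall comes down to the sort key. A minimal Python sketch, assuming each printing carries a release date and a flag saying whether its set is supplemental (the field names, set codes and dates are invented for illustration):

```python
from datetime import date
from typing import NamedTuple

class Printing(NamedTuple):
    name: str
    set_code: str
    release_date: date
    supplemental: bool  # assumed flag: True for duel decks and similar products

def sort_new_key(p: Printing):
    # Non-supplemental sets first (False sorts before True), then newest first.
    return (p.supplemental, -p.release_date.toordinal(), p.name)

def sort_newall_key(p: Printing):
    # All sets treated equally: newest first.
    return (-p.release_date.toordinal(), p.name)

printings = [
    Printing("Card A", "STD1", date(2014, 9, 26), False),
    Printing("Card B", "DUEL", date(2015, 2, 27), True),
    Printing("Card C", "STD2", date(2015, 1, 23), False),
]
new_order = [p.name for p in sorted(printings, key=sort_new_key)]
newall_order = [p.name for p in sorted(printings, key=sort_newall_key)]
print(new_order, newall_order)
```

The duel deck printing is the newest by date, but with the sort:new style key it still lands after both regular sets.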

Robust handling of multipart cards

This is probably the most complex additional syntax.

On base level search engine operates on individual parts of cards, so "Tear" or "Chandra, Roaring Flame" are what it cares about, not "Wear // Tear" or "Chandra, Fire of Kaladesh // Chandra, Roaring Flame" together. The only exception is color identity (ci:), which is defined by rules to apply on cardboard level (and which is currently broken on magiccards.info).

Of course sometimes you actually care about whole cards, so there's new syntax for that.

A // B will return any card with one part matching A, and the other part matching B. It can be used for title matches like Wear // Tear or Gideon // Kytheon, but queries can be anything like t:human // t:insect, or mana=1r // mana=w.

You don't even need to specify the other side if what you've got is specific enough. c:r t:werewolf // will return all multipart cards which have red Werewolf on one of their parts.

I'd guess this is most useful syntax, but if you want to be more specific you can use other:condition to specify what goes on the other (not returned) side of the card, like with c:w t:creature other:(c:b) returning just Cloistered Youth and Loyal Cathar (but not their black sides).

There's also part:condition syntax for specifying that either side matches, so for example part:t:enchantment matches cards which are either enchantments, or their other sides are.

For vast majority of queries I expect people to just use A // B syntax - which for that matter expands to part:(A other:B) behind the scenes.
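One way to picture how these operators fit together is to treat every condition as a function of a part and the card it belongs to. The Python sketch below follows the part:(A other:B) expansion described above, but the data model and all the names in it are assumptions, not the engine's real internals.

```python
from typing import Callable, NamedTuple, Tuple

class Part(NamedTuple):
    name: str
    colors: str  # e.g. "r" or "w"

class Card(NamedTuple):
    parts: Tuple[Part, ...]

Condition = Callable[[Part, Card], bool]

def color(c: str) -> Condition:
    return lambda p, card: c in p.colors

def both(a: Condition, b: Condition) -> Condition:
    return lambda p, card: a(p, card) and b(p, card)

def other_matches(cond: Condition) -> Condition:
    # other:cond - some different part of the same card matches cond
    return lambda p, card: any(cond(q, card) for q in card.parts if q is not p)

def any_part_matches(cond: Condition) -> Condition:
    # part:cond - any part of the card (including this one) matches cond
    return lambda p, card: any(cond(q, card) for q in card.parts)

def split_query(a: Condition, b: Condition) -> Condition:
    # A // B expands to part:(A other:B)
    return any_part_matches(both(a, other_matches(b)))

def search(cards, cond: Condition):
    return [p.name for card in cards for p in card.parts if cond(p, card)]

CARDS = [
    Card((Part("Wear", "r"), Part("Tear", "w"))),
    Card((Part("Lightning Bolt", "r"),)),
]
print(search(CARDS, split_query(color("r"), color("w"))))      # like c:r // c:w
print(search(CARDS, both(color("r"), other_matches(color("w")))))  # like c:r other:(c:w)
```

The first query returns both halves of the split card, while the other: version returns only the red half, and the single-part red card matches neither.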

To simply query particular type of multipart card you can use is:flip, is:split, is:dfc or particular kinds of them, or is:multipart to get them all.

Searching for multiple card versions

As analogue of part: system you can also query other printings of same card with alt:.

For example to find a card which was printed by both Rebecca Guay and someone who's not Rebecca Guay, ask a:"rebecca guay" alt:(-a:"rebecca guay"). Or to find all cards which stood test of time and were printed in both 1993 and 2015 ask year=1993 alt:year=2015.

Block queries

You want to get all equipment from Mirrodin block? That would be t:equipment (e:A or e:B or e:C) kind of query, if only you could remember set symbols for all its sets - and it doesn't help that Gatherer, magiccards.info, and mtgjson often use different symbols for same sets (for example Alpha can be 1ED, LEA, or AL in different sources).

That has trivial fix in t:equipment b:mirrodin syntax, which matches all equipment in a block. For that matter if you remember symbol of first set in the block, you can use it as a shortcut, so b:rtr t:angel returns you all Angels from Return to Ravnica block. Some sensible logic is used to prevent matches for overlapping blocks, so b:ravnica matches only original Ravnica.
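A block query can be thought of as a lookup from a block name (or the code of its first set) to a list of set codes. This Python sketch uses an exact name match to keep b:ravnica from matching Return to Ravnica; the real engine's "sensible logic" may well differ, and the table below is a tiny hand-written sample using commonly seen set codes.

```python
# Sample block table; the first code in each list is the block's first set.
BLOCKS = {
    "mirrodin": ["mrd", "dst", "5dn"],
    "ravnica": ["rav", "gpt", "dis"],
    "return to ravnica": ["rtr", "gtc", "dgm"],
}

def sets_in_block(query: str):
    """Resolve b:<query> to set codes by exact block name or first set code."""
    q = query.strip().lower()
    if q in BLOCKS:
        return BLOCKS[q]
    for codes in BLOCKS.values():
        if codes[0] == q:
            return codes
    return []

print(sets_in_block("ravnica"))
print(sets_in_block("rtr"))
```

A query like t:equipment b:mirrodin then becomes "type is equipment and set code is one of the returned codes".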

For that matter you have third alternative of using syntax like f:"mirrodin block" t:land for individual block formats, but that's more wordy, and excludes banned cards.

Time travel

You can time travel and search Magic cards as they used to be. So to search Standard as it was during New Phyrexia just ask for time:nph f:standard.

If you time travel it obviously won't show you any printing from the future, and formats will be as they were at that point, with set legality and banned and restricted list.
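Conceptually, time travel means filtering printings by release date and replaying banned and restricted list changes up to the chosen date. The Python sketch below shows that idea with entirely made-up cards, sets and dates; how the engine actually stores this data is an assumption.

```python
from datetime import date

# (card name, set code, release date) - invented sample data
PRINTINGS = [
    ("Example Card", "old", date(2003, 10, 2)),
    ("Example Card", "new", date(2012, 7, 13)),
]

# (effective date, card name, new status) - invented sample changes
BAN_CHANGES = [
    (date(2005, 3, 20), "Example Card", "banned"),
    (date(2009, 7, 1), "Example Card", "legal"),
]

def printings_at(when: date):
    """Printings that already existed at the given point in time."""
    return [(name, code) for name, code, released in PRINTINGS if released <= when]

def status_at(card: str, when: date, changes=BAN_CHANGES) -> str:
    """Replay announcements in date order and keep the last one in effect."""
    status = "legal"
    for effective, name, new_status in sorted(changes):
        if name == card and effective <= when:
            status = new_status
    return status

print(printings_at(date(2006, 1, 1)))
print(status_at("Example Card", date(2006, 1, 1)))
```

This also shows why a missing original banning announcement hurts: without the first entry, the replay has no way to know the card was ever banned.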

Historical banned and restricted list should be fully accurate from September 2004 Legacy/Vintage split onwards. Earlier than that best available data is often not fully consistent, so it's only going to be mostly correct. Quite often there's unbanning announcement, but original banning is lost somewhere in depths of rec.games.deckmaster Usenet group. If anybody has better data than what I managed to find (spending far more time on this than is reasonable), I'd love to hear about it.

There are no plans to do anything as extreme as getting old versions of Oracle text, but then again it would be kinda fun.

time: currently moves the entire query to a certain point in the past, so you can't mix multiple times in same query.

Reprint search

Database supports queries like e:ktk firstprint<ktk, which returns all cards from Khans of Tarkir which are reprints. You can use either a set code or a specific date (year or day) with any comparison operator, for example print>wwk t:jace (Jaces printed after Worldwake), firstprint=1993 r:mythic (all cards first printed in 1993 which ended up as mythic rares), or lastprint=8e (cards whose last printing was 8th Edition).
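A query like e:ktk firstprint<ktk boils down to comparing the release date of a card's earliest printing with the release date of the named set. A small Python sketch with invented set codes, dates and cards (none of this is the project's real data or API):

```python
from datetime import date

# Invented sets and release dates.
SET_DATES = {"aaa": date(2010, 1, 1), "bbb": date(2014, 1, 1)}

# Invented cards mapped to the sets they were printed in.
PRINTINGS = {"Old Card": ["aaa", "bbb"], "New Card": ["bbb"]}

def firstprint(card: str) -> date:
    """Release date of the earliest printing of a card."""
    return min(SET_DATES[code] for code in PRINTINGS[card])

def reprints_in(set_code: str):
    """Cards in the set whose first printing predates it (e:X firstprint<X)."""
    return sorted(
        card
        for card, codes in PRINTINGS.items()
        if set_code in codes and firstprint(card) < SET_DATES[set_code]
    )

print(reprints_in("bbb"))
```

Since the comparison is purely by date, anything released even a few days earlier counts as a prior printing.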

Unfortunately technically prerelease promos count as prior printing as prerelease cards are "released" one week before actual set. That might need some changing.

Support for nonstandard card types

By default only "normal" card types are included, but you can explicitly request other kinds by type, like with t:conspiracy, t:scheme or even t:dominaria.

You can also request all cards with t:*, presumably followed by more specific criteria.

This is implemented as a special filter - any reference to type of non-standard cards anywhere in the query will switch query from regular mode to everything mode, so for example if you ask for -t:scheme it will expand the query to match Planes, Conspiracies etc.
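The mode switch can be sketched as a scan of the query for any t: term naming a nonstandard type. This Python version is only an illustration: the list of types is an assumption, it ignores subtypes like t:dominaria, and the real engine presumably works on its parsed query rather than on raw text.

```python
import re

# Assumed list of nonstandard card types.
NONSTANDARD_TYPES = {"conspiracy", "scheme", "plane", "phenomenon", "vanguard"}

# Match t: terms, but not the "t:" inside operators like part: or alt:.
TYPE_TERM = re.compile(r"(?<![a-z])t:([^\s()]+)")

def uses_everything_mode(query: str) -> bool:
    """True if any t: term, even a negated one, names a nonstandard type or *."""
    for value in TYPE_TERM.findall(query.lower()):
        value = value.strip('"')
        if value == "*" or value in NONSTANDARD_TYPES:
            return True
    return False

print(uses_everything_mode("-t:scheme"))
print(uses_everything_mode("t:creature c:g"))
```

Note that the negated query still flips the switch, which is exactly why -t:scheme ends up matching Planes and Conspiracies.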

Tokens are currently not included, but since they're in mtgjson database I might change that.

English only

If English was good enough for Jesus, it ought to be good enough for Jace. Search results are not polluted by fake matches from foreign language cards like they are on magiccards.info.

Foreign card names are still in the database, so it's possible to add a mode to search for it, but it's going to be strictly English only by default.

Really minor improvements

Various information about card frame like layout:leveler (for card layout), w:gruul (for watermarks) is supported in addition to already existing queries like is:black-bordered, is:future etc.

You can use either f:edh or f:commander - they both work.

All is: queries can be negated with not: like not:reserved.

Unhinged fractional power, toughness, and cmc queries like pow=0.5 or cmc=0.5 work, at least mostly.

Split card system supports 5-part cards too, all one of them.

mana= queries check actual mana cost, not converted mana cost, so mana=0 will not return lands and manaless suspend cards, only cards with actual mana cost equal to 0.

mana= queries treat hybrid etc. mana as their own kind, because I couldn't come up with any consistent logic to do otherwise which would fit all special mana types. So Bioshift is mana={u/g}, not mana=g or anything like it.
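Both mana= rules can be sketched by parsing a cost into a multiset of symbols, where a hybrid symbol is its own key and a card without any mana cost is represented as None. This is a hedged Python illustration with a made-up parser, not the engine's implementation.

```python
import re
from collections import Counter
from typing import Optional

def parse_mana(cost: Optional[str]) -> Optional[Counter]:
    """Parse a cost like "1r" or "{u/g}" into symbol counts; None means no mana cost."""
    if cost is None:
        return None
    counts = Counter()
    for symbol in re.findall(r"\{[^}]+\}|\d+|[a-z]", cost.lower()):
        symbol = symbol.strip("{}")
        if symbol.isdigit():
            counts["generic"] += int(symbol)
        else:
            counts[symbol] += 1  # hybrid like "u/g" stays its own kind
    return counts

def mana_equals(card_cost: Optional[str], query: str) -> bool:
    parsed = parse_mana(card_cost)
    return parsed is not None and parsed == parse_mana(query)

print(mana_equals("{1}{R}", "1r"))
print(mana_equals("{u/g}", "g"))
print(mana_equals(None, "0"))
```

Symbol order doesn't matter for plain costs, a hybrid symbol never equals one of its halves, and a card with no mana cost at all never matches mana=0.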

Everything is documented on help page in the application.
