The best kittens, technology, and video games blog in the world.

Wednesday, October 12, 2016

How I made the site fast

red kitten 01 by mathias-erhart from flickr (CC-SA)
Website speed was one of the most common complaints about the site, so I decided to do something about it.

It's not necessary reading, but if you want more background, you can read about the site's architecture in my previous post.

Verify the problem

The site seemed "fast enough" to me, but multiple users complained about its performance, so presumably there was something real about it.

I still wanted to make sure. It's important to verify that the problem you're trying to fix is real and on your side, so as not to waste time fixing problems which might be caused by someone's crappy mobile connection or ISP throttling.

I checked Rails logs, and indeed - the timings I saw there were pretty poor. There wasn't even much need for statistical analysis - very simple queries shouldn't routinely take ~1000ms.

Have a test suite

Before I started any performance work, I needed a test suite to validate that I wasn't breaking anything. Fortunately the project was as TDD as it gets and had very comprehensive tests - even if I'm retrospectively somewhat regretting using minitest for that.

If you don't have a good test suite, don't despair - for optimizations you just need to find a reasonably representative sample of inputs (which you can use logs for), and record the current version's outputs for them. If something breaks in the process, you'll get different results, so you'll instantly know.
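As a concrete illustration, recording the current outputs for sampled queries and diffing against them later might look something like this - the search engine stand-in and file layout here are my own invention, not the site's actual code:

```ruby
# Sketch of "golden file" regression testing: record the current
# engine's outputs for sampled queries, then diff against them
# after every optimization.
require "json"

# Hypothetical stand-in for the real search engine - any callable works.
SEARCH_ENGINE = ->(query) { query.downcase.split.sort }

# Run every sampled query once and save the results to disk.
def record_golden(queries, path)
  golden = queries.to_h { |q| [q, SEARCH_ENGINE.call(q)] }
  File.write(path, JSON.pretty_generate(golden))
end

# Re-run the queries and return the ones whose results changed.
def verify_golden(path)
  JSON.parse(File.read(path)).select do |query, expected|
    SEARCH_ENGINE.call(query) != expected
  end.keys
end
```

An empty list from `verify_golden` means the optimization preserved behavior for every sampled query; anything else tells you exactly which queries to investigate.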

It's not as good as a dedicated test suite, as your random sample probably lacks important edge cases a proper test suite would generally have, but you get 80% of the value for 20% of the effort, and you can add a few tests for edge cases as you go.

Generate a benchmark

This was easy enough - I downloaded logs, extracted a list of queries from them, took a random sample, and tested time per request.

The best thing about this set was that I could very easily generate new benchmarks for specific types of queries with just grep - so when optimizing, say, card type searches, I could look for t:, which gave a much more accurate result than just throwing everything at it (assuming my optimizations didn't affect other query types, of course).
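That workflow - filtering sampled log queries by type, then timing just those - could be sketched like this; LOG_QUERIES and the per-query work are hypothetical stand-ins:

```ruby
require "benchmark"

# Hypothetical sample of queries extracted from logs.
LOG_QUERIES = ["t:goblin", "!Goblin Guide", "t:elf c:g", "f:standard"]

# Narrow a benchmark set down to one query type, like grepping logs.
def sample_queries(queries, pattern, n)
  queries.grep(pattern).sample(n)
end

# Average wall-clock time per query over the sample.
def time_per_query(queries)
  return 0.0 if queries.empty?
  total = Benchmark.realtime do
    queries.each { |q| q.downcase }  # stand-in for running the search
  end
  total / queries.size
end
```

Running `time_per_query(sample_queries(LOG_QUERIES, /t:/, 100))` before and after an optimization to type searches gives a focused number that isn't diluted by unrelated query types.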

As a secondary benchmark I also recorded the time to run the test suite. I didn't expect it to be a super sensitive measurement, but it's nice to have extra validation for zero effort, and I was going to run the tests a lot anyway.

A small complication for any kind of benchmarking is that you want your computer to not be overly busy, or it will contaminate the results. It doesn't need to be completely idle, as computers have multiple cores and you'll generally be using only one core, but I closed things like Chrome and used my other computer for such things for a while.


Profile the code

This is optional - profiling tools in Ruby are fairly weak and awkward - but I ran ruby-prof anyway to get some idea about code hot spots.

It identified which parts of the code were taking the most time, but it was mostly there to generate suggestions for what to look at.

Precompute data

Shockingly, the biggest hot spot was a very simple query type, ConditionExact, which corresponds to querying for an exact card name - with queries like !goblin guide or !Far // Away.

The query was actually very slow, as it was going through the entire database, doing something like this:
  • go through the whole database, every card printing
  • take query, downcase, fix unicode, fix spacing
  • take the card name, downcase, fix unicode, fix spacing
  • if they're the same, it's a match
That was really slow, and also quite silly. Card names don't have spacing problems (like double spaces between words), and we're already fixing unicode silliness like the "Æ" ligature when we load the database. As for the query, we can precompute all this when we parse it and create the Condition object.

It still needed the card name downcase step, as the card name was stored as something like Goblin Guide, but it needed to match goblin guide, GOBLIN GUIDE, and whatever other capitalization people use.

So the algorithm became:
  • go through the whole database, every card printing
  • take preprocessed query
  • take the card name, downcase
  • if they're the same, it's a match
That's still fairly inefficient, but it skips a huge amount of pointless string manipulation.
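The optimized shape can be sketched like this - class and method names are hypothetical simplifications of the real conditions, and the normalization is deliberately reduced to a couple of representative fixes:

```ruby
# Sketch of the precomputation idea: normalize the query once, in the
# constructor, instead of re-normalizing it for every card printing.
class ConditionExactSketch
  def initialize(query)
    @query = normalize(query)  # done once per query
  end

  # Per-card work is now just a downcase and a compare;
  # card names in the database are assumed already unicode-fixed.
  def match?(card_name)
    card_name.downcase == @query
  end

  private

  # Simplified stand-ins for the real unicode and spacing fixes.
  def normalize(s)
    s.downcase
     .gsub("æ", "ae")           # fix unicode silliness like "Æ"
     .gsub(/\s+/, " ").strip    # fix spacing
  end
end
```

The per-printing inner loop is where the time goes, so moving work from `match?` to the constructor pays off once per card in the database.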

Similar preprocessing was later applied to a lot of other conditions.

Separate common and uncommon cases

Going back to ConditionExact, it was actually two different queries:
  • exact card name - like !Goblin Guide
  • multipart card name - like !Far // Away
The multipart case was much more complex, as the search engine represents every part of a multipart card as a separate object, so matching them needed somewhat involved logic.

So I moved all the rare and complex code to ConditionExactMultipart, and I just had the common and simpler case left to optimize.

In a few other cases I set a flag if certain complex processing was needed. For example, we support queries like o:"when ~ enters the battlefield", where "~" matches the name of the card - so it matches Zealous Conscripts, as that card contains the text "When Zealous Conscripts enters the battlefield, ...".

Such queries require more complex logic than most o: (card text) queries, which just look for the same text every time and can do more preprocessing.

In this case we set a flag in ConditionOracle constructor and then follow different logic based on that flag.
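A minimal sketch of that flag, with hypothetical names and a deliberately simplified matcher:

```ruby
# Sketch of the constructor-flag idea: queries containing "~" need
# per-card name substitution; everything else can match directly.
class ConditionOracleSketch
  def initialize(query)
    @query = query.downcase
    @has_tilde = @query.include?("~")  # set the flag once, at parse time
  end

  def match?(card_name, oracle_text)
    text = oracle_text.downcase
    if @has_tilde
      # rare, complex path: substitute the card's own name for "~"
      text.include?(@query.gsub("~", card_name.downcase))
    else
      # common, fast path: plain substring search with a fixed string
      text.include?(@query)
    end
  end
end
```

The fast path never pays for the substitution machinery, which is the whole point of deciding once in the constructor rather than on every `match?` call.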

This optimization only makes sense when the simpler case is also the common one - if most cases are complex, you're not gaining much by splitting them.

Use indexes

Most queries use ConditionSimple, which just tests card printings one at a time with the match? method.

The alternative is for a condition to process the whole database and generate a Set of results in one go.

These two modes are combined by the most common combinator, ConditionAnd: it first gets all the Sets, takes their intersection, and then filters the survivors with each ConditionSimple one at a time.
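Here's a toy version of that combination - the class name is hypothetical, and conditions are represented as plain procs rather than the site's real condition objects:

```ruby
require "set"

# Sketch of the two evaluation modes. "Indexed" conditions return a
# whole result Set at once; "simple" conditions test one card at a time.
class ConditionAndSketch
  def initialize(indexed, simple)
    @indexed = indexed  # procs: database -> Set of cards
    @simple  = simple   # procs: card -> true/false
  end

  def search(db)
    # Start from the intersection of all indexed result sets
    # (the whole database if there are no indexed conditions).
    candidates = @indexed.map { |c| c.call(db) }.reduce(db.to_set, :&)
    # Then filter the survivors one at a time.
    candidates.select { |card| @simple.all? { |c| c.call(card) } }
  end
end
```

The early set intersection is what makes this fast: the per-card `match?`-style checks only ever run on cards that survived every indexed condition.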

Direct result Set generation was already used for a few queries - like searching for cards printed in a specific edition or block. I added it to format search as well, as f:standard is a reasonably common phrase in queries, and one which benefits a lot from this kind of early culling. Bigger formats like f:modern or f:commander didn't significantly benefit from this optimization, but they didn't seem to suffer either, so I applied it everywhere.

Of course the biggest beneficiary of direct access was ConditionExact, which got a downcased card name index and can now return results almost instantly.
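A hash index along those lines might be sketched as follows - the class name and data layout are hypothetical:

```ruby
# Sketch of a downcased-name index: a one-time Hash build turns each
# exact-name lookup into a single hash access instead of a full scan.
class NameIndexSketch
  def initialize(printings)
    # printings: array of [name, printing] pairs; names can repeat,
    # since a card may appear in several editions.
    @by_name = printings.group_by { |name, _| name.downcase }
  end

  # The query is assumed already preprocessed (downcased etc.).
  def lookup(preprocessed_query)
    (@by_name[preprocessed_query] || []).map { |_, printing| printing }
  end
end
```

The trade-off is memory: the index duplicates every name as a hash key, which is why it's worth building only for the most common query type.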

I didn't add any more indexes, as Ruby is very memory hungry and keeping memory use low is very important. Other queries were either much less common or wouldn't benefit anywhere near as much.

Optimize memory use

I wanted to take a quick go at reducing memory use, but unfortunately Ruby's memory profiling tools are still very poor, so I just did a few intuitive things, like removing unnecessarily duplicated data.

I pondered going further, but the speed improvements I got on the random sample of queries were already huge. I didn't need anything complex like adding caching, getting a bigger box, trying JRuby, or moving the search engine to an external service.

Verify the solution

Even though the changes looked really good in benchmarks, it's important to verify them in production. So I checked the logs again, and indeed the vast majority of requests are fast now.

There are a few which are still somewhat slow - for example, someone requesting the 52nd page of results for all cards legal in EDH - but it's a reasonably safe bet that most such weird requests are made by bots.

Overall, it's been a fairly straightforward and successful process.
