Sunday, June 25, 2017

11 Small Improvements For Ruby

"HALT! HOO GOEZ THAR?" 😸 by stratman² (2 many pix!) from flickr (CC-NC-ND)

For followup post with some solutions, check this.

A while ago I wrote a list of big ideas for Ruby. But there's also a lot of small things it could do.

Kill 0 octal prefix

I'm limiting this list to backwards compatible changes, but this is one exception. It technically breaks backwards compatibility, but in reality it's far more likely to quietly fix bugs than to introduce them.

Here's a quick quiz. What does this code print:

p 0123
p "0123".to_i
p Integer("0123")

Now run it in actual Ruby. If the output wasn't what you expected, that proves my point. The whole damn thing is far more likely to be an accident than intentional behaviour - especially in user input.

If you actually want octal - which nobody ever uses other than Unix file permissions, use 0o123.
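
To spell out the ways around this, here's a small sketch using only core Ruby. The variable names are just for illustration; the one thing worth remembering is that Integer() takes an optional base, which makes it ignore the leading zero.

```ruby
# Different ways of reading "0123", plus the explicit octal spelling.
literal  = 0123                 # a leading 0 makes this an octal literal
lenient  = "0123".to_i          # String#to_i defaults to base 10
strict   = Integer("0123")      # Integer() honours the 0 prefix
decimal  = Integer("0123", 10)  # an explicit base forces decimal parsing
explicit = 0o123                # unambiguous octal literal

p [literal, lenient, strict, decimal, explicit]
```

So for user input, Integer(str, 10) is probably what you want if you need strict parsing without the octal surprise.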

Add missing Hash methods

Ruby has a bad habit of treating parts of the standard library as second class, and nowhere is it more mind-boggling than with Hash, which might be the most commonly used object after String.

It only just got transform_values in 2.4, which was probably the most necessary one.

Some other methods which I remember needing a lot are:
  • Hash#compact
  • Hash#compact!
  • Hash#select_values
  • Hash#select_values!
  • Hash#reject_values
  • Hash#reject_values!
You can probably guess what they ought to do.
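
In case guessing isn't your thing, here's a minimal sketch of the non-bang versions. HashValueHelpers is a made-up module name, and these are standalone helpers rather than real Hash methods, so take it as one possible reading of what they'd do.

```ruby
# Hypothetical helpers sketching the proposed Hash methods.
# They take the hash as an argument instead of monkey patching Hash.
module HashValueHelpers
  module_function

  # Keep only the pairs whose value satisfies the block.
  def select_values(hash)
    hash.select { |_key, value| yield(value) }
  end

  # Drop the pairs whose value satisfies the block.
  def reject_values(hash)
    hash.reject { |_key, value| yield(value) }
  end

  # Drop the pairs whose value is nil.
  def compact(hash)
    hash.reject { |_key, value| value.nil? }
  end
end

scores = {alice: 10, bob: nil, carol: 3}
p HashValueHelpers.compact(scores)
p HashValueHelpers.select_values(scores) { |v| v.to_i > 5 }
```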

Hash#zip

Technically Hash#zip will call Enumerable#zip so it returns something, but that something is completely meaningless.

I needed it crazy often. With a = {x: 1, y: 2} and b = {y: 3, z: 4}, I want to run a.zip(b) and get {x: [1, nil], y: [2, 3], z: [nil, 4]}, which I can then map or transform_values to merge them in a meaningful way.

Current workaround of (a.keys|b.keys).map{|k| [k, [a[k], b[k]]]}.to_h works but good luck understanding this code if you run into it, so most people would probably just loop.
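
If you do want the workaround, giving it a name at least makes it readable. hash_zip is a hypothetical helper name, not anything Ruby provides, and it assumes a missing key should show up as nil.

```ruby
# Hypothetical helper: pair up the values of two hashes by key.
# Keys missing from one side get nil in that position.
def hash_zip(a, b)
  (a.keys | b.keys).map { |key| [key, [a[key], b[key]]] }.to_h
end

a = {x: 1, y: 2}
b = {y: 3, z: 4}
p hash_zip(a, b)
# => {:x=>[1, nil], :y=>[2, 3], :z=>[nil, 4]}
```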

Enumerable#count_by

Here's a simple SQL:

SELECT author, COUNT(*) count FROM posts GROUP BY author;

Now let's try doing this in ruby:

posts.count_by(&:author)

Well, there's nothing like it, so let's try to do it with existing API:

posts.group_by(&:author).map{|author, posts| [author, posts.size]}.to_h

For such a common operation having to do group_by / map / to_h feels real bad - and most people would just loop and += like we're coding in some javascript and not in a civilized language.

I'm not insisting on count_by - there could be a different solution (maybe some kind of posts.map(&:author).to_counts_hash).
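
For the record, here's roughly what I'd expect count_by to do, written as a standalone method so it doesn't touch Enumerable. The Post struct and the sample data are made up for the example.

```ruby
# Hypothetical count_by: count items by whatever the block returns.
def count_by(items)
  items.each_with_object(Hash.new(0)) do |item, counts|
    counts[yield(item)] += 1
  end
end

# Made-up sample data standing in for real posts.
Post = Struct.new(:author, :title)
posts = [
  Post.new("alice", "First"),
  Post.new("bob", "Second"),
  Post.new("alice", "Third"),
]

p count_by(posts, &:author)
# => {"alice"=>2, "bob"=>1}
```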

URI query parameters access

Ruby is an old language, and it added a bunch of networking related APIs back when the internet was young. I don't blame anyone for these APIs not being very good, but by now they really ought to be fixed or replaced.

One mind-bogglingly missing feature is access to query parameters in URI objects to extract or modify them. The library treats the whole query as an opaque string with no structure, and I guess expects people to use regular expressions and manual URI.encode / URI.decode.

There are gems like Addressable::URI that provide necessary functionality, and URI needs to either adapt or get replaced.
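
The closest the standard library gets is URI.decode_www_form and URI.encode_www_form, which at least spare you the regular expressions. A minimal sketch, assuming a made-up example.com URL and that no parameter is repeated (calling to_h would silently drop duplicates):

```ruby
require "uri"

# Made-up URL for illustration.
uri = URI.parse("https://example.com/search?q=ruby&page=2")

# Decode the opaque query string into a Hash of String => String.
params = URI.decode_www_form(uri.query || "").to_h

# Modify one parameter and write the query back.
params["page"] = "3"
uri.query = URI.encode_www_form(params)

puts uri
```

It works, but it's still three steps for what ought to be something like a single hash-style assignment on the URI itself.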

Replace net/http

It's a similar story of an API added back when the internet was young and we didn't know any better. By today's needs the API feels so bad that quite a few people literally use `curl ...`, and a lot more use one of a hundred replacement gems.

Just pick one of those gems, and make it the new official net/http. I doubt you can do worse than what's there now.

Again, I'm not blaming anyone, but it's time to move on. Python had urllib, urllib2, urllib3, and by now it's probably up to urllib42 or so.

Make bundler chill out about binding.pry

For better or worse bundler became the standard dependencies manager for ruby, and pry its standard debugger.

But if you try to use require "pry"; binding.pry somewhere in your bundle exec enabled app, it fails with LoadError: cannot load such file -- pry, so you either need to add pry to every single Gemfile, or edit the Gemfile and bundle install every time you need to debug anything, then undo that afterwards.

I don't really care how that's done - by moving pry to standard library, by some unbundled_require "pry", or special casing pry, the current situation is just too silly.

Actually, just make binding.pry work without any require

I have this ~/.rubyrc.rb:

begin
  require "pry"
rescue LoadError
end

which I load by setting the RUBYOPT=-r/home/taw/.rubyrc.rb environment variable in my shell.

It's such a nice quality of life improvement to type binding.pry instead of require "pry"; binding.pry, it really ought to be the default, whichever way that's implemented.

Pathname#glob

Pathname suffers from being treated as second class part of the stdlib.

Check out this code for finding all big text files in path = Pathname("some/directory"):

path.glob("*/*.txt").select{|file| file.size > 1000}

Sadly this API is missing.

In this case you can use:

Dir.glob("#{path}/*/*.txt").map{|subpath| Pathname(subpath)}.select{|file| file.size > 1000}

which not only looks ugly, it would also fail if path contains any funny characters.
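
One way around the funny characters problem is to keep the directory out of the pattern entirely. This sketch assumes Ruby 2.5 or newer, where Dir.glob accepts a base: keyword; glob_under is a hypothetical helper name.

```ruby
require "pathname"

# Hypothetical helper: glob relative to a directory, returning Pathnames.
# The directory goes in base: (Ruby 2.5+), so special characters in its
# name are never interpreted as part of the glob pattern.
def glob_under(path, pattern)
  Dir.glob(pattern, base: path.to_s).sort.map { |relative| path + relative }
end

path = Pathname("some/directory")
big_files = glob_under(path, "*/*.txt").select { |file| file.size > 1000 }
p big_files
```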

system should to_s its argument

If wait_time = 5 and uri = URI.parse("https://en.wikipedia.org/wiki/Fidget_Spinner"), then this code really ought to work:

system "wget", "-w", wait_time, uri

Instead we need to do this:

system "wget", "-w", wait_time.to_s, uri.to_s

There's seriously no need for this silliness.

This is especially annoying with Pathname objects, which naturally are used as command line arguments all the time. Oh and at least for Pathnames it used to work in Ruby 1.8 before they removed Pathname#to_str, so it's not like I'm asking for anything crazy.
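
Until then, a tiny wrapper does the job. stringify_args is a hypothetical helper, and this sketch deliberately ignores the environment hash and options hash that system also accepts, so it only covers the plain list-of-arguments case.

```ruby
require "uri"
require "pathname"

# Hypothetical helper: convert every argument to a String so that
# Integers, URIs, Pathnames etc. can be passed to system directly.
def stringify_args(*args)
  args.map(&:to_s)
end

wait_time = 5
uri = URI.parse("https://en.wikipedia.org/wiki/Fidget_Spinner")

p stringify_args("wget", "-w", wait_time, uri)
# Intended use: system(*stringify_args("wget", "-w", wait_time, uri))
```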

Ruby Object Notation

Serializing some data structures to send over to another program or save in a text file is a really useful feature, and it's surprising Ruby doesn't have such functionality yet.

So people use crazy things like:
  • Marshal - binary code, no guarantees of compatibility, no security, can't use outside Ruby
  • YAML - there's no compatibility between every library's idea of what counts as "YAML", really horrible idea
  • JSON - probably best solution now, but not human readable, no comments, dumb ban on line final commas, and data loss on conversion
  • JSON5 - fixes some of problems with JSON, but still data loss on conversion
What we really need is Ruby Object Notation. It would basically:
  • have strict standard
  • have implementations in different languages
  • with comments allowed, mandatory trailing commas before newline when generated, and other such sanity features
  • Would use same to_rbon / RBON.parse interface.
  • And have some pretty printer.
  • Support all standard Ruby objects which can be supported safely - so it could include Set.new(...), Time.new(...), URI.parse(...) etc., even though it'd actually treat them as grammar and not eval them directly.
  • Optionally allow apps to explicitly support own classes, and handle missing ones with exceptions.
This is an unproven concept and it should be a gem somewhere, not part of the standard library, but I'm surprised it's not done yet.

11 comments:

  1. posts.count_by(&:author) => posts.count(&:author)

    binding.pry => binding.irb since 2.4

  2. posts.count(&:author) will return number of posts with non-nil author, which is not even close to what we need.

    binding.irb is like a small step in right direction, but irb is far too limiting compared to pry.

  3. 'posts.group(:author).count' will return a hash with post ids as keys and the count of comments as values.

  4. Ack, that's ActiveRecord stuff, not Enumerable. My bad

  5. I'm 100% with you on the octal and count_by suggestions. I see no reason why octal literals shouldn't follow the same pattern as hex literals. I've also done the group_by-map-count dance often enough to wish there was a count_by method for this.

    I disagree with regards to Hash#zip. I don't know what you want to do with your hashes, but probably you could get away with passing a block for key conflicts to merge: Hash#merge { |key, old_val, new_val| ... }

  6. Also regarding Hash#select_values or #reject_values.. just do hash.reject { |_, v| .... } ?

  7. Kai:

    So starting with 2.4 (or using hash-polyfill gem) you can now do posts.group_by(&:author).transform_values(&:count) which is much better than previous 3-step process.

    select_values is mostly convenience feature so you can do .select_values(&:present?) instead of .select{|_,v| v.present?} Not a huge thing, but especially in long expression it will make things look nicer.

    .zip is useful for merge conflicts, but a lot of other things as well. For example when tests fail you can quickly display differences with a.zip(b).select_values{|x,y| x != y}
    There's a lot of cases like that.

  8. Re: net/http I think https://github.com/httprb/http has the cleanest and most matured API.

    As to count_by, I agree with the idea but the name is just as confusing, as demonstrated in the comments above already. :)

    For serialization, I wouldn't use Ruby Object Notation anyway, it's a little bit of improvement over Marshal in terms of compatibility, but I fail to see any improvements in terms of security. Complexity is the enemy of security, and that's why JSON survived as a universal API format IMO.

    Other than that, all good. :) Especially Pathname, yeah... I think it should be a first-class citizen in Ruby.

  9. kenn: In different codebases I used different http libraries, and any of them is far better than what ruby is doing.

    If you like Pathname, you might find this gem I wrote useful https://github.com/taw/pathname-glob

  10. taw: I've used more than a dozen myself, but anything before http.rb had rough corners here and there to be a part of standard lib. I feel the pain because I have a couple of gems where I didn't want to include external dependency, and had to work with net/http. It's a PITA for sure...

    Thanks for pointing me to pathname-glob! That's the biggest thing missing in Pathname. Great work!

  11. I rarely find myself to agree so completely with another developer. Thanks!
