The best kittens, technology, and video games blog in the world.

Monday, September 26, 2016

The Next Ruby

śpie by der_cat from flickr (CC-SA)

Ruby turned out to be the most influential language of the last few decades. In a way that's somewhat surprising, as it didn't come up with that many original ideas - it mostly extracted the best parts of Perl, Lisp, Smalltalk, and a few other languages, polished them, and assembled them into a coherent language.

The Great Ruby Convergence

Nowadays, every language is trying to be more and more like Ruby. What I find most remarkable is that features of Perl/Lisp/Smalltalk which Ruby accepted are now spreading like wildfire, and features of Perl/Lisp/Smalltalk which Ruby rejected got nowhere.

Here are some examples of features which were rare back when Ruby was created:
  • Lisp - higher-order functions - Ruby accepted, everyone does them now
  • Lisp - everything is a value - Ruby accepted, everyone is moving in this direction
  • Lisp - macros - Ruby rejected, nobody uses them
  • Lisp - linked lists - Ruby rejected, nobody uses them
  • Lisp - s-expression syntax - Ruby rejected, nobody uses it
  • Perl - string interpolation - Ruby accepted, everyone does it now
  • Perl - regexp literals - Ruby accepted, they're very popular now
  • Perl - CPAN - Ruby accepted as gems, every language has it now
  • Perl - list/scalar contexts - Ruby rejected, nobody uses them
  • Perl - string/number unification - Ruby rejected, nobody uses it except PHP
  • Perl - variable sigils - Ruby tweaked them, they see modest use in Ruby style (scope indicator), zero in Perl style (type indicator)
  • Smalltalk - message passing OO system - Ruby accepted, everyone is converging towards it
  • Smalltalk - message passing syntax - Ruby rejected, completely forgotten
  • Smalltalk - image-based development - Ruby rejected, completely forgotten
You could make a far longer list like that, and the correlation is very strong.
By using Ruby you're essentially using future technology.

That was 20 years ago!

A downside of having a popular language like Ruby is that you can't really introduce major backwards-incompatible changes. The Python 3 release was very unsuccessful (released December 2008, and today usage is still roughly evenly split between Python 2 and Python 3), and Perl 6 was a Duke Nukem Forever-level failure.
Even if we knew for certain that something would be an improvement - and usually there's a good deal of uncertainty before we try - we couldn't just break everyone's code. But let's speculate on some improvements we could make if we weren't constrained by backwards compatibility.

Use indentation, not end

Here's some Ruby code:
class Vector2D
  attr_accessor :x, :y
  def initialize(x, y)
    @x = x
    @y = y
  end
  def length
    Math.sqrt(@x**2 + @y**2)
  end
end

All the ends are nonsense. Why can't it look like this?

class Vector2D
  attr_accessor :x, :y
  def initialize(x, y)
    @x = x
    @y = y
  def length
    Math.sqrt(@x**2 + @y**2)


It's much cleaner. Every lexical token slows down code comprehension. It's not about characters - it really makes no difference whether it's end or } - but every extra token needs to be processed, even if it's meaningless.

Ruby dropped so much worthless crap - semicolons, type declarations, local variable declarations, obvious parentheses, pointless return statements, etc. - it's just weird that it kept the pointless end.

There's a minor complication that chaining blocks would look weird, but we can simply repurpose {} for chainable blocks, while dropping end:

ary.each do |item|
  puts item
versus:

ary.select{|item|
  item.price > 100
}.map{|item|
  item.name.capitalize
}.each{|name|
  puts name
}
This distinction is fairly close to contemporary Ruby style anyway.

If you're still not sure, HAML is a Ruby dialect which does just that. And CoffeeScript is a Ruby wannabe which does the same (while perhaps going a bit too far with its syntactic hacks).

Autoload code

Another pointless thing about Ruby is all the require and require_relative statements. But pretty much every Ruby project loads all the code in its directory tree anyway.

As Rails and rspec have shown - just let it go, load everything. Also make the whole standard library available right away - if someone wants to use Set, Pathname, URI, or Digest::SHA256, what is the point of those requires? Ruby can figure out just fine which files those are.

Files often depend on other files (like subclasses on parent classes), so they need to be loaded in the right order, but Rails autoloader already solves this problem.
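
As a rough sketch of the mechanism (this is not Rails' actual autoloader, and the lib directory and naming convention here are assumptions), constant-based autoloading can be bolted onto today's Ruby with const_missing:

def Object.const_missing(name)
  # Map a CamelCase constant name to a snake_case file,
  # e.g. MyClass -> lib/my_class.rb (convention assumed for this sketch)
  snake = name.to_s.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
  file  = File.expand_path("lib/#{snake}.rb")
  if File.exist?(file)
    require file
    const_get(name)    # the file is expected to define the constant
  else
    super              # fall back to the usual NameError
  end
end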

That still leaves files which add methods to existing objects or monkeypatch things - they'll still need manual loading - but we're talking about 1% of use cases.

Module nesting needs to die

Here's some random Ruby code from some gem, rspec-expectations-3.5.0/lib/rspec/expectations/version.rb:

module RSpec
  module Expectations
    module Version
      STRING = '3.5.0'
    end
  end
end


That's an appalling ratio of signal to boilerplate.

It could seriously be simply:
module Version
  STRING = '3.5.0'

With the whole fully qualified name simply inferred by the autoloader from the file path.

The first line is technically inferable too, but since it's usually something more complex like class Foo < Bar, it's fine to keep it even when we know we're in foo.rb.

Module nesting based constant resolution needs to die

As a related thing - constant resolution based on deep module nesting needs to die. In current Ruby:
Name = "Alice"
module Foo
  Name = "Bob"
end

module Foo::Bar
  def self.say_hi
    puts "Hi, #{Name}!"
  end
end

module Foo
  module Bar
    def self.say_hello
      puts "Hello, #{Name}!"
    end
  end
end

Foo::Bar.say_hi     # => Hi, Alice!
Foo::Bar.say_hello  # => Hello, Bob!


This is just crazy. Whichever way it goes, it should be consistent - and I'd say always fully qualify everything unless it's defined in the current module.
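
In today's Ruby, the only reliable defense is to qualify constants explicitly (same setup as the example above, repeated to keep this self-contained):

Name = "Alice"

module Foo
  Name = "Bob"
end

module Foo
  module Bar
    def self.say_hello
      puts "Hello, #{::Name}!"    # => Hello, Alice! - top level, explicitly
      puts "Hello, #{Foo::Name}!" # => Hello, Bob!   - Foo's, explicitly
    end
  end
end

Foo::Bar.say_hello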

New operators

Every DSL is abusing the handful of predefined operators like <<, [], and friends.

But there's seriously no reason not to let libraries define more.

Imagine this code:

class Vector2DTest
  def length_test
    v = Vector2D.new(30, 40)
    expect v.length ==? 50


That's so much cleaner than assert_equal or monkeypatching == to mean something else.

I expect that custom operators alone would go halfway toward making rspec-style weirdness unnecessary.

Or when I have variables representing 32-bit integers for interfacing with hardware, I want x >+ y and x >! y for signed and unsigned comparisons, instead of converting back and forth with x.to_i_signed > y.to_i_signed and x.to_i_unsigned > y.to_i_unsigned.

This will obviously be overused by some, but that's already true of operator overloading, and yet everybody can see it's a good idea.

We don't need to do anything crazy - OCaml is a decent example of a fairly restrictive form of operator overloading that's still good enough - any operator that starts with + parses like + in expressions, etc., and parsers don't need to be aware of which libraries are used.

a +!!! b *?% c would always mean a.send(:"+!!!", b.send(:"*?%", c)), regardless of whether those operators mean anything or not.
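
The dispatch side of this already exists in today's Ruby - you can define and send to a method with an arbitrary symbol name; the only missing piece is parser support for calling it infix. The operator and its meaning below are made up:

class Integer
  # Hypothetical operator, defined as a plain method under today's rules.
  define_method(:"+!!!") { |other| self + other * 1000 }
end

2.send(:"+!!!", 3)  # => 3002
# The proposal would just let you write `2 +!!! 3` and have the parser
# desugar it to exactly this send.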

Real keyword arguments

Ruby hacks fake keyword arguments by passing an extra Hash at the end - it sort of works, but it really messes up more complex situations, as Hashes can be regular positional arguments as well. It will also get messed up if you modify your keyword arguments, as that happily modifies the Hash in the caller.
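
A quick illustration of that aliasing problem under the trailing-Hash approach (method and variable names below are made up):

def configure(options = {})
  options[:verbose] = true unless options.key?(:verbose)  # silently mutates the caller's Hash
  options
end

settings = { level: 3 }
configure(settings)
settings  # => {:level=>3, :verbose=>true} - the caller's Hash was modified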

We don't check if the last argument is a Proc - block arguments are a real, separate thing. The same should apply to keyword arguments.

Ruby is currently built around the send operation:
  object.send(:method_name, *args, &block_arg)


we should make it:
  object.send(:method_name, *args, **kwargs, &block_arg)


It's a slightly incompatible change for code that relied on the previous hacky approach, and it makes method_missing a bit more verbose, but it's worth it - keyword arguments can help clean up a lot of complex APIs.
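
For reference, Ruby's keyword argument syntax (available since 2.0) already looks like the real thing at the definition site - the proposal is about making the dispatch level treat them as first-class too, rather than as trailing-Hash sugar. Method and argument names below are made up:

def resize(width, height, aspect: :keep, **extra)
  [width, height, aspect, extra]
end

resize(100, 50, aspect: :stretch, dpi: 300)
# => [100, 50, :stretch, {:dpi=>300}]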

Kill #to_sym / #to_s spam

This is somewhat of a cultural rather than a technical problem, but every codebase I've seen over the last few years is polluted by endless #to_sym / #to_s calls, and hacks like HashWithIndifferentAccess. Just don't.

This means the {foo: :bar} syntax needs to be interpreted as {"foo" => "bar"}, and seriously it just should be. The only reason to get anywhere close to Symbols should be metaprogramming.
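
The everyday symptom looks something like this:

h = { foo: 1 }
h["foo"]         # => nil - the key is :foo, not "foo"
h[:foo]          # => 1
h["foo".to_sym]  # => 1 - hence the endless #to_sym / #to_s calls

Under the proposal, all three lookups would hit the same key.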

The whole nonsense got even worse than Python's list vs tuples mess.

Method names should not be globally namespaced String

This is probably the biggest change I'd like to see, and it's somewhat speculative.

Everybody loves code like (2.hours + 30.minutes).ago because it's far superior to any alternatives, and everybody hates how many damn methods such DSLs add to common classes.

So here's a question - why do methods live in global namespace?

Imagine if this code was:

class Integer
  def time:hours
    60*self.time:minutes
  def time:minutes
    60*self
  def time:ago
    Date.now - self


and then:

  (2.time:hours + 30.time:minutes).time:ago

This would let you teach objects how to respond to as many messages as you want without any risk of global namespace pollution.

And in ways similar to how constant resolution works now with include you could do:

class Integer
  namespace time
    def minutes
      60*self
    def hours
      60*self.minutes
    def ago
      Date.now - self

and then:
  include time
  (2.hours + 30.minutes).ago

The obvious question is - how the hell is this different from refinements? While it seems related, this proposal doesn't change the object model in any way or bolt anything on top of it - you're still just sending messages around. It only changes object.foo() from object.send("foo".to_sym) in a global method namespace to object.send(resolve_in_local_lexical_context("foo")), with a resolution algorithm similar to the current constant resolution algorithm.
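
For comparison, here's what the existing mechanism - refinements - looks like in today's Ruby: scoped monkeypatching layered on the object model rather than a change to how method names resolve (the module name below is made up):

module TimeUnits
  refine Integer do
    def minutes
      60 * self
    end

    def hours
      60 * 60 * self
    end

    def ago
      Time.now - self
    end
  end
end

using TimeUnits

(2.hours + 30.minutes).ago  # => a Time two and a half hours in the past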

Of course this is a rather speculative idea, and it's difficult to explore all consequences without trying it out in practice.

Unified matching/destructuring

Here's a feature which a lot of typed functional programming languages have, and which Ruby sort of has just for Strings and regular expressions - you can test for a match and destructure in a single expression:

case str
when /c:([wubrg])/
  @color = $1
when /t:(\S+)/
  @type = $1

Doing this kind of matching on anything else doesn't work, because $1 and friends are some serious hackery:
  • $1 and friends are accessing parts of $~ - $1 is $~[1] and so on.
  • $~ is just a regular local variable - it is not a global, contrary to its $ sigil.
  • The =~ method sets $~ in the caller's context. It can do that because it's hacky C code.
Which unfortunately means it's not possible to write similar methods or extend their functionality without some serious C hacking.

But why not add a mechanism to set the caller's $~? Then we could create our own matchers:

case item
when Vector2D
  @x = $~x
  @y = $~y
when Numerical
  @x = $0
  @y = $0

To be fair, there's a workable hack for this, and we could write a library doing something like:
case s = Scanner(item)
when Vector2D
  @x = s.x
  @y = s.y
when Numerical
  @x = s.value
  @y = s.value
and the StringScanner class in the standard library, which needs just a tiny bit of extra functionality beyond what String / Regexp provide, already goes this way.
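
For what it's worth, a matcher along these lines already works in today's Ruby if the pattern object itself implements === - all names below are made up, and Vector2D is redefined as a bare Struct just to keep the example self-contained:

Vector2D = Struct.new(:x, :y)

class VectorMatcher
  attr_reader :x, :y

  # case/when calls pattern === value, so the matcher can stash the
  # destructured parts on itself instead of on a caller-side $~.
  def ===(value)
    return false unless value.respond_to?(:x) && value.respond_to?(:y)
    @x, @y = value.x, value.y
    true
  end
end

vec = VectorMatcher.new
case Vector2D.new(3, 4)
when vec
  puts "x=#{vec.x}, y=#{vec.y}"   # => x=3, y=4
end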

But even that would still need some kind of convention with regards to creating scanners and matchers - and once you have that, then why not take one extra step and fold =~ into it with shared syntactic sugar?

Let the useless parts go

Here's an easy one. Ruby has a lot of crap: @@class_variables, protected visibility (pop quiz: what does it actually do, and how does it interact with method_missing?), Perl-style special variables like $=, method synonyms like #collect for #map, the flip-flop operator, failed experiments like refinements, etc.

Just let it all go.

Wait, that's still Ruby!

Yeah, even after all these changes the language is essentially Ruby, and backwards incompatibility shouldn't be that much worse than Python 2 vs 3.

11 comments:

headius said...

Some interesting ideas. I have comments.

* python-style scoping has been discussed in the Ruby community for decades, and it's not likely to ever change. I don't have anything against it, really, if everyone settles on consistent indentation characters, but it's not going to happen. See a joke feature request related to "too many ends" here: https://bugs.ruby-lang.org/issues/5054
* I think having the whole stdlib always loaded is a terrible idea, not only because it means we'd actually have to *load* everything on startup, but because it pollutes the top-level namespace horribly. All languages "require" you to pull in some level of dependencies explicitly, and I personally think it's a good thing. Admittedly, one reason libraries don't even load *themselves* completely is because loading/parsing/compiling Ruby code from disk is a slow process, even on MRI and especially on JRuby.
* Module nesting constant lookup may be confusing, as in the case you point out. However, it's also how the following example works:
module Foo
  Name = 'Joe'
  def say_hello; "Hello, #{Name}"; end
end
In order to support the above and not cause the confusion you pointed out, we would need a separate place to store lexically-visible constants that isn't the module itself. Possible, but not how things are done right now.
* My experience with having unlimited operators is largely from watching the Scala community, where it seems like everyone agrees unlimited operators often leads to very confusing libraries. Basically, everyone decides their own meanings for their own arbitrary strings of symbols...potentially conflicting meanings across libraries. I guess I've never found it valuable to encode so much meaning in symbols when we have natural language to express that meaning.
* Re: challenges mixing hashes in positional args with keyword args: don't do that. Both need to be supported for backward-compatibility, so this isn't going to go away (at least, not until some hard choices in Ruby 3.0).
* The Symbol vs String distinction is troublesome, indeed. Symbol is both more and less than a String, though: it is idempotent and immutable. You could get there by simply using interned and frozen strings, but then you'd just move the issues to whether you've actually got an interned string or not. And not having them interned means an expensive equality check rather than a cheap identity check. I don't have a good answer here.
* "Method names should not be globally namespaced String" I'm not sure I understand. What you've done seems to be simply providing longer names for these methods using a character not currently allowed in method names. Refinements is an entirely different beast; it allows any scope to define a sort of "overlay" on top of existing classes such that the methods in the overlay are preferred. Basically, scoped monkey-patching. It also has a ton of complex problems that have not been solved yet.
* Unified destructuring is kinda cool. You might want to suggest it directly to ruby-lang.org. However, the various $ variables are starting to fall out of fashion (for good reason), and I'm not sure what this gains you over just using the variable name again.
* Useless bits like class variables, protected visibility, perl-style vars (see previous answer), etc: I agree with all your examples.

Peter Cooper said...

ary.each do |item|
  puts item


I think we could even go a step further with:

ary.each |item|
  puts item


(and make |identifier| a singular piece of syntax, so it's not confusable with ary.each | item | ...)

I suspect formatting will be fried on the above due to how ancient Blogger is, so you might need some imagination :)

taw said...

Peter Cooper: I'm not totally sure if it would work, as | is used for other things, and a lot of blocks have no arguments, so they might need ||, but it might work.

Dziulius said...
This comment has been removed by the author.
Anonymous said...

http://i2.kym-cdn.com/photos/images/facebook/000/865/302/3d5.gif

This is the worst possible idea. And fuck you for that. If you don't want to write END go write python, coffee or whatever else stupid whitespace languages exist. Tell me how would you do eval, define_method, etc?

At least I'm happy to know that as long as Matz is alive, this will never happen.

taw said...

I see some people feel very emotional about this.

define_method and friends are absolutely trivial - https://gist.github.com/nickjacob/300c508732857ad42683

Here's the Ruby version:

class Person
  define_method("overreacting?") do
    self.name == "Dziulius"

Anonymous said...

Using whitespace for logic instead of formatting is just plain stupid, even harmful. No sane person would prefer writing code which would break after deleting a whitespace or replacing it with a non-breaking space.

We have a quite large CoffeeScript code base, and it's a nightmare to maintain.

And your comparison with haml/slim is worth nothing. Erb is the way to go, and it always will be (unless things like JSX take over).

So you can have wet dreams about significant whitespace as much as you want, but that's never gonna happen. So stop giving heart attacks to real programmers.

Also, you just can't do eval with whitespace - that's one of the reasons Python (even 3.0) is much less dynamic compared to Ruby. And imagine all the libraries that would need to be rewritten. That will just never happen - you can have as many wet dreams as you like - it just won't.

Anonymous said...

Some of these suggestions lead me to believe that Ruby is not your primary language, or if it is, you really don't seem to have a good fundamental grasp of it. Looking through some of your repositories, I see a lot of huge no-no's, like loading your entire library in the main require call (Charlie explained why this is bad form), or using underscore variable names (aesthetic, but telling).

In short, your ideas would make Ruby fundamentally un-Ruby. They would break Ruby, and not just in a backwards compatibility way. So, to rebut your final point: no, it would no longer be Ruby.

Here's why:

1. Whitespace isn't about emotion, it's about parser rules. Ruby is meant to be expressive, and with that comes the ability to do weird things that Python would have an aneurysm over. Even your simplistic examples fall over when you bring them into the real world, and you even admit that edge cases come about really quick (chaining). The problem is that chaining is not always done via method calls, it is often done via operators:

diff = list.reduce do |a, b|
  a + b
end - otherlist.sum


CoffeeScript is actually a great example to bring up, because it shows how appending whitespace rules to Ruby's syntax is problematic. There are plenty of Ruby idioms that just break down with CoffeeScript.

2. So you want to know why methods live in a global namespace? Okay-- you could have just asked. The answer is: caching.

I'm not sure what your code example is supposed to do (are those methods scoped lexically? per-file? per-module?), but if you can rewrite the dispatch lookup rules every time you enter a new namespace, Ruby would never be fast. You're right that your proposal differs from refinements, but only insofar as it completely ignores the actual solutions refinements have to minimize issues around method caching and leakage. You're basically reimplementing refinements, but without scoping. Scoping is the complicated part, and you're completely handwaving over it.

3. Load everything: a horrible idea, one that you clearly haven't thought much about. For one, Rails doesn't load everything unless you ask it to. Loading everything up front in a server architecture is occasionally (but not always) a reasonable thing to do. Loading everything up front in a command line utility is wholly absurd-- you'd end up spending more time loading your utility than executing it. From a perf perspective, your suggestion would create an O(N) performance curve where N is the number of gems you have installed on your system. In other words, Ruby installs would get slower the more they get used, which to say the least is a really bad idea.

Even if you're talking about autoloads for libraries, you still have O(N) performance across files, so the larger a particular library gets, the slower all consumers of that library become. You mentioned this as a statement of fact, but in fact, this is NOT how most libraries are architected. If this is how you are writing your libraries, please stop doing this, it's bad practice. Use autoload for all your Ruby files, or only bring in files when they are explicitly needed.

contd

Anonymous said...

More importantly though, require statements indicate intent, especially when it comes to loading external code. What is Digest::SHA256? Where does it come from? How am I supposed to look up documentation for this?

Also, and equally importantly: which Digest::SHA256 are you asking for? Even dir==package based languages have this problem. I can publish gem2 which exports Digest::SHA256 and now you have a collision.

And, I realize that your suggestion about module nesting is meant to solve this, but here's the thing:

Your suggestion to remove namespace nesting would fundamentally change Ruby into a completely different language. Opening up other modules is a feature, not a bug. If all namespaces are lexically scoped to your active module, you could never monkeypatch any external classes. Even "String" would have problems with lookup. Which String am I using now? Could I rewrite your world by creating a lib/string.rb file without you ever knowing, because you aren't even calling require? What does "module ActiveRecord; def foo()" inside some lib/foo/bar/baz.rb do in your fork of Ruby? These are questions that impact how Ruby is used in very basic ways. Ruby as a "script" language would completely break, since all code organization would be forced into directory structures that may not fit the problem. You would necessarily have to define separate directories to export multiple modules, which may not even be possible if you don't have disk access.

Finally, your suggestion to use fully qualified module names everywhere would make Ruby very Java-esque. In other words, ugly as all hell. Any complex scenario would end up with Rails::ActiveRecord::Foo::Bar::Baz.new everywhere. This is hideous. It's also why include exists. Incidentally, your above suggestion falls apart once include semantics are brought in, since you can no longer scope inclusions to specific modules.

Basically all of your ideas are fairly horrific.

taw said...

Anonymous: What? Loading the whole library on the primary require is the standard way. objectiveflickr isn't even my code - it's just some gem I forked to add more features, and it was already doing the thing you find so objectionable, because that is the standard way of doing things in Ruby.

You seem to obsess over things you'd want to do <1% of the time, and over performance hacks - code like that "do end - more calculations" example is very rare.

On a sample of 430k lines of Ruby from Rails and related gems, there are 50223 bare ends, 664 ends followed by a comment, 351 followed by a method call, 118 followed by postfix if/unless/while/until, 19 by ] or ), and 31 followed by everything else.
So at worst we're forcing extra parentheses around 0.06% of end lines, while saving lexemes for 99.94%.

Similar argument applies to not autoloading, hacking paths etc. - there are plenty of ways to allow for 1% use case without forcing pointless verbosity for 99% use case.

Fully qualified names and autoloading are things Rails already uses, so I'm not even sure what you're talking about here. You can monkeypatch in Rails - it's just optimized for the 99% case (one class per file, matching its name), not for the 1% case (monkeypatching String).

jellymann said...

Indentation-based scoping is a rubbish idea. I've worked on enough coffeescript, and I find it utterly horrible to work with. I really, really like my "end"s, they make me feel safe.