Monday, February 19, 2007

Yannis's Law: Programmer Productivity Doubles Every 6 Years

[Image: (Soon to be) Frozen Nordic Toys by fisserman from flickr (CC-NC-SA)]
Everybody and their dog has heard about Moore's Law. Yet few know about the far more important Yannis's Law, which states:
Programmer productivity doubles every 6 years
There's no question that today it's possible to get a lot more functionality in less time and with smaller teams than in the past. The really shocking part is the magnitude of the difference. In his 1972 paper "On the Criteria to Be Used in Decomposing Systems into Modules", David Parnas wrote:
The KWIC index system accepts an ordered set of lines, each line is an ordered set of words, and each word is an ordered set of characters. Any line may be "circularly shifted" by repeatedly removing the first word and appending it at the end of the line. The KWIC index system outputs a listing of all circular shifts of all lines in alphabetical order. This is a small system. Except under extreme circumstances (huge data base, no supporting software), such a system could be produced by a good programmer within a week or two.
Fast forward to 2003, when Yannis Smaragdakis wrote:
I would not consider a programmer to be good if they cannot produce the KWIC system within an hour or two.
"An hour or two" seemed seriously excessive, so in 2007 I timed myself coding such a system. It took me 4 minutes and 11 seconds to create and test the following code:
res_lines = []
STDIN.each{|line|
    line.chomp!
    words = line.split(" ")
    n = words.size
    (0...n).each{|i|
        res_lines << (words[(i+1)..n] + words[0..i]).join(" ")
    }
}
res_lines.uniq!
res_lines.sort!
puts res_lines
That's 500 to 1200 times faster than a "good 1972 programmer". Maybe my Perl/Ruby background makes me a bit unrepresentative here, but would anybody call a person who cannot implement it in 15 minutes (say, during a job interview) a "good programmer"?
The progress is not over. You can get a huge productivity boost by moving from Java to Ruby, from PHP or J2EE to Ruby on Rails, or by seriously doing test-driven development. That can get you from the 1990s to the 2000s. And there are plenty of people who are still in the productivity 1980s: not using SVN, not knowing regular expressions and Unix, manually recompiling their programs. Some are even doing things as insane as coding application-specific network protocols and data storage formats! By the way, there's a cool movie by some NASA guy which times the development of a small website in multiple web frameworks, and the difference is really significant.
Some people actively deny this progress. The most famous example is Fred Brooks's 1986 essay "No Silver Bullet - Essence and Accidents of Software Engineering", which claimed that there's simply no way a 10x productivity increase could possibly ever happen. Guess what: since 1986 programmer productivity has already increased by about 11x (according to the Yannis's Law figure)!
So programmer productivity doubles every 6 years. Every 6 years it takes only half the time to do the same thing. How many other professions can claim something like that? Is there any other human activity for which this has been happening consistently for decades? (Actually I wouldn't be surprised if the answer was yes, even if with a somewhat slower pace of improvement.)
This means a few things. First, because fewer and fewer people will be necessary to do today's work, programming must expand to new areas - and maybe you should also look there. Second, drop the things keeping you in the past and move on. Do you really want to waste your time coding C if you can code many times more effectively in Ruby? Whether it's an Open Source project, a job, or your personal thing, it just makes no damn sense to accept this kind of productivity loss. And third - if you are the sort of person making technology decisions for your company or organization - "because we have always done it that way" is a terrible reason for choosing any programming-related technology. What they're making today is without doubt better, and programmers are really good at adapting, even if they don't want to admit it.
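The two figures quoted in this post (about 11x since 1986, and 500 to 1200 times faster than a week or two) can be checked with a few lines of Ruby. This is only a rough sketch: the method names are made up for illustration, and the 40-hour working week is an assumption, since Parnas only says "a week or two".

```ruby
# Growth factor implied by "productivity doubles every 6 years".
# The 6-year period is the figure from the post; the method name is hypothetical.
def productivity_factor(years, doubling_period = 6.0)
  2.0**(years / doubling_period)
end

# Speedup of a short coding session over an estimate given in weeks.
# hours_per_week = 40 is an assumption, not something Parnas states.
def speedup(weeks, seconds_taken, hours_per_week = 40)
  weeks * hours_per_week * 3600.0 / seconds_taken
end

kwic_seconds = 4 * 60 + 11                       # 4 minutes 11 seconds

puts productivity_factor(2007 - 1986).round(1)   # 21 years => 11.3
puts speedup(1, kwic_seconds).round              # one week  => 574
puts speedup(2, kwic_seconds).round              # two weeks => 1147
```

With a longer working week the upper bound moves higher, so the 500 to 1200 range is best read as an order-of-magnitude estimate.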

29 comments:

  1. You quote Fred Brooks wrong. What he says is "There is no single development, in either technology or management technique, which by itself promises even one order-of-magnitude improvement within a decade in productivity, in reliability, in simplicity." This is not even close to, as you say, "there's simply no way 10x productivity increase could possibly ever happen."

    What he was trying to do was combat the early, and still existing, myth that we're just one step away from magic. It is pretty amazing how fast programmer productivity has increased, but it's a lot of incremental improvement, and really is no silver bullet.

  2. Anonymous 15:04

    I tried implementing the KWIC program in Haskell. Took me slightly more than your 4 minutes, but then I'm a Haskell newbie and I had to look up some docs. But still impressive compared to one week.

    module Main where

    import Data.List (sort)

    shifts l = [ (drop n l)++(take n l) | n <- [1..length l] ]

    main = interact (unlines . sort . concatMap (map unwords . shifts . words) . lines)

    Blogger.com doesn't allow code or pre tags? WTF? The code is four lines. Five if you write it more readably :-). Greetings, Stephan

  3. Phil: Brooks was claiming much more than that.

    Some quotes:
    "Not only are there no silver bullets now in view, the very nature of software makes it unlikely that there will be any".
    "If we examine the three steps in software technology development that have been most fruitful in the past, we discover that each attacked a different major difficulty in building software, but that those difficulties have been accidental, not essential, difficulties. We can also see the natural limits to the extrapolation of each such attack."

    To me that sounds like "We were able to increase productivity in the past only because we were reducing accidental complexity. Now the complexity left is mostly essential complexity, and there's just no way to get around this complexity, ever". That's also how most people seem to be interpreting the essay.

    You can claim that Brooks was right, because the >10x productivity increase was due to a combination of techniques, not a single technique, and that it took longer than a decade, but even if you do so, the very concept of "essential" vs "accidental" complexity needs to be thrown away.

    gman: That's part of the "expand to new areas" I'm talking about. People are making Pacman-style games in Flash in a few hours these days, but thanks to the increased productivity completely new kinds of games are possible.

    Stephan Walter: It's pretty annoying, but Blogger bans most of the HTML tags in comments, and as far as I know there's no way to unban them in configuration. Programming blogs really need <code>, <pre> etc.

    Thanks for the "one to two week project" in four lines of Haskell.

  4. This comment has been removed by the author.

  5. Your comparison with Parnas's assignment is unfair. The exercise required the development of all subsystems, including persistence, I/O, etc., from scratch. The point of the article was the decomposition of a big system into modules, not how to write it as fast as possible. Libraries already existed in the 1970s, and without constraints this could probably have been done in less than 1h. IMHO the "real" productivity boost since 1970 is far less than your estimate, although I can't really put it in numbers.

  6. Anonymous 21:10

    A Lisp implementation of KWIC similar to the Ruby example can be done in a few minutes. Given that Lisp was a tool that existed by 1972, one could argue that indeed, there hasn't been much improvement in productivity.

  7. Anonymous 23:54

    [[To me that sounds like "We were able to increase productivity in the past only because we were reducing accidental complexity. Now the complexity left is mostly essential complexity, and there's just no way to get around this complexity, ever". That's also how most people seem to be interpreting the essay.]]
    Not true. Brooks did list in his essay the things that promise to effectively tackle the essential complexity such as reusability, incremental development process, education etc. What I learnt from the essay was that the inherent characteristics (invisibility, changeability etc.) of software make it hard to tackle the essential complexity of software construction, but it is not impossible.

    [[You can claim that Brooks was right, because the >10x productivity increase was due to a combination of techniques, not a single technique, and that it took longer than a decade, but even if you do so, the very concept of "essential" vs "accidental" complexity needs to be thrown away.]]
    I kinda agree with this. I think it's hard to separate the essential difficulties from the accidental ones. Brooks considered high-level languages an attack on the accidental difficulty that left the essential one (the design process) untouched, but that is not true - the amount of design needed for a Ruby project is much less than that for an Assembly project, given the same problem to be solved.

  8. Anonymous 04:27

    I think Brooks is talking more about complete systems development rather than discrete bits of coding. You did effectively show that individual programmers can do specific things much more quickly than before but something major like a back-end system for a bank takes as long as it always has if not longer.

    Admittedly, most large systems do more than they ever did in the past but people on the business side tend to think "how come I can't just click my fingers and have a fully operational system in place?" Because there's no silver bullet.

  9. In his follow-up essay '"No Silver Bullet" Refired', he says "The part of software building I called essence is the mental crafting of the conceptual construct; the part I called accident is its implementation process." I would interpret this to mean that the tools can get better and better, but we're always going to be building something new, and building new things is inherently difficult.

    While it may take 4 minutes to do KWIC now in Ruby vs. a week in Assembly on a mainframe (since Brooks was on the original OS/360 team), it's a toy example. The problem has inherently low essential complexity, so most of the complexity is accidental - you just have to write the code. If you were building an OS or an on-board flight control system (as Parnas did for the A7E), the essential complexity is a much greater proportion of the overall complexity. Brooks says in "Refired" that even if the accidental complexity is 9/10ths of the total complexity, shrinking it to zero would be the only way to give an order of magnitude.

    One of the other things that may also affect the outcome here is that powerful tools like Ruby have abstracted away the cruft of programming that previously existed and programmers are allowed more time to think about the essence, so they hone their essence skills much more than their accidental skills.
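    The 9/10ths argument above can be put as a small calculation. This is only a sketch of the arithmetic, in the style of Amdahl's law rather than anything taken from Brooks's text, and the method and parameter names are made up for illustration:

```ruby
# Overall speedup when only the "accidental" share of the effort gets faster.
# accidental_share:   fraction of total effort that is accidental (0.0 to 1.0)
# accidental_speedup: factor by which that share is sped up
def overall_speedup(accidental_share, accidental_speedup)
  essential_share = 1.0 - accidental_share
  1.0 / (essential_share + accidental_share / accidental_speedup)
end

# Accidental work eliminated entirely: the essential 1/10th is all that remains.
puts overall_speedup(0.9, Float::INFINITY).round(2)   # => 10.0

# Accidental work made "only" 10x faster: well short of an order of magnitude.
puts overall_speedup(0.9, 10).round(2)                # => 5.26
```

    So under this model an order-of-magnitude gain from tools alone requires that at least 9/10ths of the effort be accidental and that essentially all of it disappear.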

  10. You are correct that in software people are constantly working on new things, and possibly using new tools (this depends on familiarity, circumstance, etc).

    People I know, however, mostly interpret that to be what Brooks means - 1/10th of the effort would always be the essence - you cannot reduce it to zero, because it requires thinking.

    So that's why you are getting many comments correcting your assertion on Brooks, even though your conclusion is correct.

    Increase in Productivity != Silver Bullet.

  11. kwic's not a good example here. The ability to split lines and sort lists is built into modern languages like ruby. Productivity increases are based on library code instead of any inherent improvements in the language.

    Not saying that libraries aren't good and don't improve productivity, but the example you quoted seemed almost purpose built to show off ruby.

    It's like saying my new language "e" is great because it has a built in "kwic" function.

    newlist = new (kwic(system.oldlist))

    I wrote that code in three seconds, but it's no way to identify how good the imagined "e" language may be.

  12. Anonymous 09:25

    Fab writes: "Your comparison with Parnas's assignment is unfair"

    Fab has a good point. I could define a language with a built-in function to do the KWIC test with a single keyword. Does that make me 25 times better than Ruby? Knocking out KWIC implementations is meaningless.

    'Ahah' you say. But I can still call 'kwic' in one word from my language, so I'm still faster. 'But' I reply 'does your language have a pacman function, or gearsofwar or accounts_receivable function?' No.

    Nevertheless, it's interesting how much faster good programmers can get. Not everyone does though!

  13. Anonymous 05:06

    First, the Ruby program doesn't actually produce a KWIC index, it hasn't indexed anything (it needs line numbers or page numbers, or something to reference the source text), so an extra 10 minutes to test, discover the error, and correct it seems reasonable.

    Second, I believe a competent awk programmer on Unix would deliver the correct results within 15 minutes (I'm very rusty, so I'd take longer :-( ).
    I think a pipeline like
    awk '... permute lines ...' < file | sort | uniq | awk ' ... rotate lines and format ...'
    would do it (missing the option flags, sorry :-)

    Awk has been available since 1977/78 (http://www.softpanorama.org/History/Unix/index.shtml).

    So, I would suggest that ALL of the productivity was gained by 1978, and by Yannis's measure, almost no productivity gains have happened in the subsequent 30 years (sorry Yannis, the function looks much more complex and less satisfying !-)

    Our apparent productivity has come via hardware which is so fast and capable that we don't need to write using 'high performance' but unproductive tools as often; many of the productive tools and technologies have existed for years, waiting for the hardware to catch up, and new generations to embrace them.

  14. Anonymous 03:05

    taw started by saying the gaming example was a good one. That seems rather amusing to me, since the gaming companies are usually advertising for C++ developers - surely a rather unproductive language to be using in the mid-noughties!
    Yet you agreed we are more productive now than when Pac-Man was coded in ... well I don't know, but surely C or BASIC?
    So are we more productive? Probably - because, as pointed out by others, we have more libraries so that we don't repeat ourselves over and over ...
    As pointed out already, Ruby is more productive - for a small subset of problems, period!

  15. Anonymous 03:13

    We have mostly concentrated on languages here in terms of productivity - however taw originally mentioned something far more valuable, which was methodology. In particular he mentioned test first strategy.
    This I know from experience improves productivity - even if you shorten it to simply unit testing. I know my code produced via solid unit testing took longer than the other guys' at first. But months after my code was released, the others were still trying to get their code release-ready after multiple iterations with QA, while I was smoking my cigar (OK, well, I don't smoke literally), having had to touch up only one issue discovered after much (_much_) testing ... More productive? Absolutely.

  16. Jonathan: Back in the Pac-Man era game programmers coded in assembly, not in C. In fact in the early 1990s people were making a big deal out of Doom being coded mostly in C, with only small pieces in assembly.

    I don't know about too many games still coded in C++ without something on top of it. Most have C++ engines, but the engines are just libraries shared between multiple games. The games by themselves tend to be coded in some high-level language, either engine-specific or something generic like Lua, with only small parts still coded in C++.

    That's a long road since early 90s, from pure assembly to mostly Lua (or something at that level) with a bit of C++.

  17. Anonymous 16:03

    How much speed is gained just due to faster computers? You don't have to optimize your routines as much as you had to 20 years ago. More optimized code is harder to maintain as it often hides the underlying algorithm, etc.
    Then there are way faster compile times, etc.

  18. And another question is how much of the allotted week goes into documentation? Getting a slot on the test machine?

    Working around the fact that you only have 128k memory, and can't assume to be able to sort your data in main memory?

    Yesterday I hacked a little ruby script to find some jitter in a log file. The file was 1/2 Gig, and so was the ruby process after reading it into hashtables (after five minutes).

    You'd have an interesting time even implementing ruby on the machines of those days! Overlays, anyone?

  19. Anonymous 10:18

    taw said : "I don't know about too many games still coded in C++ without something on top of it. Most have C++ engines, but the engines are just libraries shared between multiple games. The games by themselves tend to be coded in some high-level language, either engine-specific or something generic like Lua, with only small parts still coded in C++."

    Oh, I just wish it were true :)
    I have been a programmer in the game industry for almost 10 years now, and I can tell you, both from personal experience in the companies I have worked in (and still do) and from following Game Developers Conference "publications", that we are really very, very far from it.
    Lua is a good example of a "high" level language that is used more and more, but only to assist game and level designers in scripting the game characters' behaviors and general gameplay. It's certainly not used at all by the engine programmers. We are still deeply stuck in C++, and the sad truth is that most game programmers have a solid aversion towards anything that is higher level.

    This industry is so young and immature that the teams are under too much pressure to even think of looking outside the small world of game development for possible evolutions. It's beginning to change, but in very restricted areas (read, a handful of small companies, no more) and I really don't expect us to experiment with higher level languages in the game engines before at least 5 years (and that's optimistic).

  20. We are still deeply stuck in C++, and the sad truth is that most game programmers have a solid aversion towards anything that is higher level.

    Thank you, Anonymous. You are the only one here who is talking about serious real-world scenarios instead of toy problems. This is work, not the fantasies of some dreamer. Your job deals with mission-critical applications such as weapons systems, trajectories, full-detail simulators. For those, it's c++ or nothing!

  21. Anonymous 11:17

    wonderful code for the kwic index!!!!.

    richard
    portal0001@lycos.com

  22. People are making Pacman-style games in Flash in a few hours these days, but thanks to the increased productivity completely new kinds of games are possible.

  23. I'm a little late to the party but this is what a good 1977 programmer would have written.

    awk '{ word=$1; $1=""; gsub("^[ \t]*", ""); print $0,word}' <lines | uniq | sort

    There have been major advances in programming but old school unix is still the be all and end all of text processing.

  24. npe: Thank you for proving my point, as your "1977's programmer" version doesn't do anything remotely like what it needs, so this "1977's programmer" would spend another week or so debugging.


  25. npe: Thank you for proving my point, as your "1977's programmer" version doesn't do anything remotely like what it needs, so this "1977's programmer" would spend another week or so debugging.


    Not sure if I follow what you mean. Could you elaborate?

  26. npe, you obviously neither understood the original description nor ran or read the ruby code. You missed the 'all circular shifts' part. When there is a line 'a b c', the output must also contain the lines 'b c a' and 'c a b'.

  27. Everybody seems to point at 1978's appearance of Awk.

    But, as an anonymous poster mentioned earlier, by 1972 Lisp had 14 years of existence, multiple implementations, with Lisp-specific hardware soon to follow.

    Lisp was so prominent in the 60's and 70's, that the first book on Lisp history was published in 1979, written by Herbert Stoyan.

    According to these materials, by early 1970's, Lisp was used to build systems of AI, computer algebra, circuit analysis and simulation, symbolic integration, proof automation, data visualisation (Quam67) and expert systems.

    For an index to Lisp history materials, you might want to see:
    http://archive.computerhistory.org/resources/text/FindingAids/102703236-Stoyan.pdf

  28. Anonymous 20:01

    talking about programmers' productivity...

    There's a new excellent code-editor extension called Flow.

    Really improves productivity by automating the process of browsing through Q&A sites (like StackOverflow)

    http://www.flowextension.com

  29. Eldc 13:51

    It's more beautiful to do this:

    words.size.times {
    res_lines << words.join(' ')
    words.push(words.shift)
    }
