There are all kinds of categorization schemes for programming languages, by paradigms or checklists of supported features. Categorization criteria tend to be highly subjective (e.g. "builtin support for regular expressions"), useless (e.g. "significant indentation"), or both (e.g. "does it have a standard").
I want to propose a new categorization - objective, easy to evaluate, and at the same time exposing something very deep about programming languages.
I will divide languages into:
- thin variable languages - where variables refer to data.
- fat variable languages - where variables contain data. Variables can also contain references to data, but there's a distinction between direct and indirect access.
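To make the distinction concrete, here is a minimal sketch in Python, which sits on the thin side. The variable names are made up for illustration. Assignment only binds another name to the same data, and getting the "variable contains data" behaviour of a fat variable language takes an explicit copy.

```python
import copy

# In Python a variable is a name that refers to an object; assignment never copies.
a = [1, 2, 3]
b = a                      # b now refers to the very same list as a
b.append(4)

same_object = a is b       # one object, two names
a_after = list(a)          # snapshot of a: the append through b is visible here

# To imitate a fat variable (a variable that holds its own data),
# the copy has to be requested explicitly.
c = copy.copy(a)
c.append(5)                # changes c only; a is left alone
independent = (a == [1, 2, 3, 4] and c == [1, 2, 3, 4, 5])
```

In a fat variable language like C++ the equivalent of `b = a` on a vector would have copied the data, and sharing would be what needs extra syntax.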
This division is very old. Assembly language is obviously a fat variable language, even though its variable system is very simple - registers and memory locations contain stuff directly, or contain references (memory addresses) to stuff. As languages need to be compiled to assembly, plenty of high performance languages follow this road. Fortran and C variables are just assembly variables plus types. C++ didn't break with this tradition; it made container variables much fatter and much more complicated - RAII, copy constructors, assignment operators, and all the related mess. Java, in spite of its superficial similarity to C++, is definitely a thin variable language.
Thin variable languages are also very old. The original Lisp was the first language with thin variables, and all Lisp dialects, just like all ML and Haskell dialects, are thin variable languages. I don't think a single seriously functional language uses fat variables.
Scripting languages are interesting. Old languages like Unix shell have very fat variables, even though all variables are simple strings. Perl and PHP continue this tradition, but Python and Ruby are soundly in the thin variable camp.
Having categorized all popular languages, let's make some observations.
- All (pure and impure, strict and lazy) functional languages are thin variable.
- All honestly object-oriented languages are thin variable.
- Thin variable languages and garbage collected languages are very closely related categories. There are some reference counted languages in both camps (Perl, PHP on the fat side, Python on the thin side), but there seems to be no thin language with manual memory allocation or fat language with full GC.
- All segfaulting languages (assembly, C, C++) use fat variables, but many fat variable languages are non-segfaulting (Fortran, bash, Perl, PHP).
- Dynamic typing is on both sides (Perl/PHP vs Lisp/Ruby).
- Explicit static typing is also on both sides (C/C++ vs Java).
- Implicit static typing is only on the thin side (ML/Haskell) and is actually quite popular there.
- Almost all thin languages have closures. A few languages like Python and Java have less than full closures, in the form of named inner functions or anonymous inner classes. In both cases it's a syntactic, not semantic, limitation.
- Almost no fat language has closures. A big exception is Perl, which has full closures.
- Lexical and global scope exist on both sides, in almost every language.
- Dynamic scope is unusual, but is supported on both the fat (Perl) and thin (some Lisp dialects including the original Lisp, Emacs Lisp, and Common Lisp, but not Scheme) sides.
- Rich literal notation is found on both sides (Perl vs Python/Ruby/Lisp/ML/etc.), and so is its absence (C/C++ vs Java).
- Macros (structural ones, as opposed to the C preprocessor's text substitution) exist only on the thin side (Lisps, Dylan, Nemerle). There doesn't seem to be any obvious reason for it.
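The closure observation in the list above is easy to see in code. This is a rough sketch in Python, using a named inner function as mentioned for that language. It assumes Python 3, where the `nonlocal` keyword lets the inner function rebind a captured variable. `make_counter` and the other names are invented for the example.

```python
def make_counter():
    count = 0              # local variable that the inner function captures

    def increment():       # named inner function acting as a closure
        nonlocal count     # rebind the captured variable instead of creating a new local
        count += 1
        return count

    return increment


counter_a = make_counter()
counter_b = make_counter()

# Each closure keeps its own captured count alive after make_counter has returned.
results = [counter_a(), counter_a(), counter_b()]   # [1, 2, 1]
```

The captured `count` outlives the call that created it, which is cheap to support when variables are just references to garbage collected data, and awkward when a variable is a slot on the stack.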
I could go on. It actually surprises me how many semantic differences follow the thin vs fat divide, with Perl and Java being the biggest outliers (and also their derivatives like PHP and C#). These outliers are very interesting. Java's lack of power is definitely syntactic not semantic, and there are plenty of JVM languages which are little more than fully compatible alternative syntaxes for Java with more expressive power. Nothing like that ever happened to popular fat variable languages like C/C++, which fail for semantic not syntactic reasons.
The biggest outlier on the fat side is Perl. While Perl was able to get an almost 100% score on the supported features checklist, it seems to be an evolutionary dead end. Every new thin variable language steals ideas from Perl, but the Perl 6 effort was never able to transform the language, and Perl programmers have been leaving for Ruby and Python for years now.
If you're writing a new language today, and every programmer should do that, just forget about fat variables. They have one big advantage of allowing explicit memory management, which can still result in more memory efficient programs, but that's about it. The expressiveness of fat variable languages has been pushed to the limits by Perl, and it seems it cannot be pushed any further. The thin side is already far ahead, with Ruby, Scheme, Haskell and all the small research languages you've never heard of.