Central to this flurry of activity is Paul Graham's quality metric, that shorter (in terms of node count) is better. In fact, Paul explicitly says "making programs short is what high level languages are for."
I have already posted an argument against this position, citing APL as an example of a very concise language that is nonetheless not widely considered to be the be-all-and-end-all of high-level languages (except, perhaps, among its adherents). Here I want to examine Paul's assumption on his own terms, and see what happens if you really take his quality metric seriously on its own terms.
Paul defines his quality metric at the end of this essay:
The most meaningful test of the length of a program is not lines or characters but the size of the codetree-- the tree you'd need to represent the source. The Arc example has a codetree of 23 nodes: 15 leaves/tokens + 8 interior nodes.
He is careful to use nodes rather than characters or lines of code in order to exclude obvious absurdities like single-character variable and function names (despite the fact that he has a clear tendency towards parsimony even in this regard). So let's take a look at Arc and see if we can improve it.
One of the distinguishing features of Arc compared to other Lisps is that it has two different binding forms that do the same thing, one for the single-variable case (LET) and another for the multiple-variable case (WITH). This was done because Paul examined a large corpus of Lisp source code and found that the traditional Lisp LET was used most of the time to bind only a single variable, and so it was worthwhile making this a special case.
But it turns out that we can achieve the exact same savings WITHOUT making the single-variable case special. Let's take a traditional Lisp LET form:
(let ((var1 expr1) (var2 expr2) ...) form1 form2 ... formN)
and just eliminate the parentheses:
(let var1 expr1 var2 expr2 ... form1 form2 ... formN)
At first glance this doesn't work because we have no way of knowing where the var-expr pairs stop and the body forms start. But actually this is not true. We CAN parse this if we observe that If there is more than one body form, then the first body form MUST be a combination (i.e. not an atom) in order to be semantically meaningful. So we can tell where the var/expr pairs stop and the body forms begin by looking for the first form in a varN position that is either 1) not an atom or 2) not followed by another form. If you format it properly it doesn't even look half bad:
(let var1 expr1
(let var1 expr1
In order to avoid confusion, I'm going to rename this construct BINDING-BLOCK (BB for short). BB is, by Paul's metric, an unambiguous improvement in the design of ARC because it's the same length (in terms of nodes) when used in the single-variable case, and it's a node shorter when used in the multiple-variable case.
In fact, we can do even better by observing that any symbol in the body of a binding block that is not at the end can be unambiguously considered to be a binding because otherwise it would not be semantically meaningful. So we can simplify the following case:
(bb var1 expr1
(bb var2 expr2
(bb var1 expr1
Again, this is an unambiguous win because every time we avoid a nested binding block we save two nodes.
But notice that something is happening that ought to make us feel just a teensy bit queasy about all these supposed "improvements": our code is starting to look not so much like Lisp. It's getting harder to parse, both for machines and humans. Logical sub-blocks are no longer delimited by parentheses, so it's harder to pick out where they start and end.
Still, it's not completely untenable to argue that binding blocks really are an improvement. (For one thing, they will probably appeal to parenthophobes.) I personally have mixed feelings about them. I kind of like how they can keep the code from creeping off to the right side of the screen, but I also like being able to e.g. select a sub-expression by double-clicking on a close-paren.
Can we do even better? It would seem that with binding-block we have squeezed every possible extraneous node out of the code. There simply are no more parentheses we can get rid of without making the code ambiguous. Well, it turns out that's not true. There is something else we can get rid of to make the code even shorter. You might want to see if you can figure out what it is before reading on.
Here's a hint: Arc already includes a construct [...] which is short for (fn (_) ...).
Why stop at one implied variable? Why not have [...] be short for (fn ((o _) (o _1) (o _2) ...) ...)? Again, this is an unambiguous improvement by Paul's quality metric because it allows us to replace ANY anonymous function with a shorter version using implied variable names, not just anonymous functions with one variable.
And now we should be feeling very queasy indeed. On Paul's quality metric, all those variable names are expensive (every one requires a node) and yet I hope I don't have to convince you that getting rid of them is not an unalloyed good. If you doubt this, consider that we can use the same argument convention for non-anonymous functions as well, and replace the lambda-list with a single number indicating the number of arguments:
(def foo 3 (do-some-computation-on _1 _2 _3))
Again, an unambiguous improvement by Paul's metric.
We can extend this to binding blocks as well, where we replace variable names with numbers as well. In fact, we don't even need to be explicit about the numbers! We can just implicitly bind the result of every form in a binding block to an implicitly named variable! Again, on Paul's quality metric this would be an unambiguous improvement in the language design.
All this is not as absurd as it might appear. If we push this line of reasoning to its logical conclusion we would end up with a language that very closely resembled Forth. Forth is a fine language. It has its fans. It's particularly good at producing very small code for use on very small embedded processors, so if you need to program an 8-bit microcontroller with 4k of RAM it's not a bad choice at all. But God help you if you need to debug a Forth program. If you think Perl is write-only, you ain't seen nothin' yet.
The flaw in the reasoning is obvious: adding nodes to program source has a cost to be sure, but it can also provide benefits in terms of readability, flexibility, maintainability, editability, reusability... and there's no way to make those tradeoffs unambiguously because they are incommensurate quantities. That is why language design is hard.
Now, I don't really mean to rag on Arc. I only want to point out that if Arc succeeds (and I think it could, and I hope it does) it won't be because it made programs as short as they can possibly be. It will be because it struck some balance between brevity and other factors that appeals to a big enough audience to reach critical mass. And I think Paul is smart enough and charismatic enough to gather that audience despite the fact that there might be a discrepancy between Paul's rhetoric and Arc's reality. But IMO the world would be a better place (and Arc would be a better language) if the complex tradeoffs of language design were taken more seriously.
Sami Samhuri asks:
How do you account for the following case?
(let i 0
(x y) (list 1 2)
(prn "i is " i)
(prn "x is " x)
(prn "y is " y))
Well, the easiest way is (bb i 0 x 1 y 2 ...
Or, less glibly: (bb i 0 L (something-that-returns-a-list) x (1st L) y (2nd L) ...
Of course, if you want to support destructuring-bind directly you need some kind of marker to distinguish trees of variables from forms. In this case, Arc's WITH syntax, which essentially uses a set of parens to be that marker, becomes tenable again (though it doesn't actually support destructuring even though the syntax could easily be extended to support it). Another possibility is to overload the square-bracket syntax, since that also would be a no-op in a binding position. Now things start to get even more complicated to parse, but hey, shorter is better, right?
But what happens if some day you decide that multiple values would be a spiffy feature to add to the language? Now you're back to needing a marker.
Personally, in my BB macro (in Common Lisp) use the :mv and :db keywords as markers for multiple-value-bind and destructuring-bind. So your example would be:
(bb i 0
:db (x y) (list 1 2)
; and just for good measure:
:mv (q s) (round x y)
I also use a prefix "$" character to designate dynamic bindings, which tends to send CL purists into conniptions. :-)
And to top it off, I use the :with keyword to fold in (with-X ...) forms as well. The result is some very un-Lispy-looking Lisp, but which shorter-is-better advocates ought to find appealing.
Here's the code for BB in case you're interested:
;;; Binding Block
(defmacro bb (&rest body)
((null (rst body)) (fst body))
((consp (1st body))
`(progn ,(1st body) (bb ,@(rst body))))
((not (symbolp (1st body)))
(error "~S is not a valid variable name" (1st body)))
((eq (1st body) ':mv)
(if (symbolp (2nd body))
`(let ((,(2nd body) (multiple-value-list ,(3rd body))))
(bb ,@(rrrst body)))
`(multiple-value-bind ,(2nd body) ,(3rd body)
(bb ,@(rrrst body)))))
((eq (1st body) :db)
`(destructuring-bind ,(2nd body) ,(3rd body)
(declare (special ,@(find-specials (2nd body))))
(bb ,@(rrrst body))))
((eq (1st body) :with)
`(,(concatenate-symbol 'with- (2nd body)) ,(3rd body) (bb ,@(rrrst body))))
((keywordp (1st body))
(error "~S is not a valid binding keyword" (1st body)))
(t `(let ((,(1st body) ,(2nd body)))
(declare (special ,@(find-specials (1st body))))
(bb ,@(rrst body))))))
Note that it would be straightforward to extend this to allow users to define their own binding keywords, which would really help things to spin wildly out of control :-)