Badly Typed

ill-informed idioglossia

Why Not Just Use Java?

So, this is kind of a hard one.

Java’s got a lot going for it… the JVM is a pretty impressive piece of work, the runtime is pretty capable (and quite portable, for what that’s worth), and there’s a whole raft of third-party libraries for all sorts of purposes. The language is quite mature and clearly isn’t about to go anywhere anytime soon, and there’s no shortage of folk out there who know how to code in it. It is also worth noting that there are many very powerful and quite ‘grown up’ open-source libraries and applications written in Java; things like Hadoop, Eclipse and OpenOffice spring to mind.

The language stagnated somewhat whilst Sun was at the helm (you could argue that one man’s “stagnation” is another’s “stability”, of course) but Oracle seem to have stepped up their game somewhat: Java 8 even gets closures, so once again it has a pretty comparable set of features to C# and C++ (!). On the flip side, it suffers from all of the same complaints I have about C# (not very surprising, really) plus a few more, like the fact that it inflicts a punishment called “type erasure” upon its generics, so their generic nature is effectively invisible at runtime. This is of course quite expedient and makes implementing generics a fair bit easier, but the fact that dotnet has generics as first-class citizens which support reflection shows that doing it properly isn’t just possible, but in some cases actually a benefit (but I won’t go into boxing issues here).
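To make the contrast concrete, here’s a quick C# sketch of what reflectable generics buy you: the CLR still knows at runtime that a List<int> is a list of ints, whereas an erased Java List<Integer> is just a List once compiled.

```csharp
using System;
using System.Collections.Generic;

class GenericsDemo
{
    static void Main()
    {
        var numbers = new List<int> { 1, 2, 3 };

        // The CLR keeps the type argument around at runtime...
        Type t = numbers.GetType();
        Console.WriteLine(t.IsGenericType);            // True
        Console.WriteLine(t.GetGenericArguments()[0]); // System.Int32

        // ...so code can inspect and act on it via reflection, which the
        // erased Java equivalent (a plain List at runtime) cannot offer.
    }
}
```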

Have a read of this rant by Jamie Zawinski. It is practically ancient history now, but many of its most important points still stand. I note he is still coding in C to this day… his observation that it is the only way to write portable code is sadly still very much the case.

That’s not the worst of it, of course.

Object Orientation

Object Orientation is the programming paradigm du jour. Everyone is at it; all the most popular languages either support it or demand it. Curiously though, it is slightly difficult to define.

To my mind, the very simplest possible definition of an object is this:

An object comprises some persistent state, and some code that acts upon that state.

(and just to define some terms before I get stuck in: I shall use the term “member variable(s)” to refer to an object’s state, and “methods” to refer to the functions that act upon those variables)

This is a very broad and inclusive definition, but it provides a good foundation for all future nitpicking. I’m reasonably certain that this is a required minimum of functionality… without persistent state, you just have a function, and without code to act upon that state you just have a value.

Given that this definition encompasses things like C functions with static variables, closures, and more or less any data structure (or indeed value) when combined with a function that either accepts or returns it, it is effectively so vague as to be almost useless.
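To make that concrete, here’s a throwaway C# sketch of two of those cases: the obvious one (a class with a member variable and a method) and the degenerate one (the moral equivalent of a C function with a static variable). The names are made up for the example.

```csharp
using System;

// The obvious reading of the definition: persistent state (count)
// plus code that acts upon that state (Increment).
class Counter
{
    private int count;                  // member variable
    public int Increment() => ++count;  // method
}

// The degenerate case the definition also lets in: the moral
// equivalent of a C function with a static local variable.
static class StaticCounter
{
    private static int count;
    public static int Increment() => ++count;
}

class ObjectDemo
{
    static void Main()
    {
        var c = new Counter();
        Console.WriteLine(c.Increment());             // 1
        Console.WriteLine(StaticCounter.Increment()); // 1
        Console.WriteLine(StaticCounter.Increment()); // 2
    }
}
```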

So let the nitpicking commence.

Expressions and Statements

First, I’ll attempt to define the terms. A statement is the smallest chunk of code in a language that does something. If an application or function can compile or run and actually do something, then presumably it comprises at least one statement. An expression is a chunk of code that, when evaluated, returns a value. An expression might be a statement but the reverse is not necessarily true; a statement that does not evaluate to something cannot be an expression. You can’t return it from a function or assign it to a variable, for example.
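C# happens to draw much the same line, which makes for a quick illustration:

```csharp
class ExpressionsVsStatements
{
    static void Main()
    {
        // "1 + 2" is an expression: it evaluates to a value, so it can be
        // assigned to a variable or returned from a function.
        int x = 1 + 2;

        // On its own it is not a legal statement; uncomment the next line
        // and the compiler rejects it with error CS0201 (an expression
        // can't stand alone as a statement).
        // 1 + 2;

        // Conversely, "if" is a statement, not an expression, so it can't
        // be assigned to anything -- that's what the ternary operator
        // (x > 2 ? 1 : 0) is for.
        if (x > 2) { System.Console.WriteLine(x); }
    }
}
```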

Seems like a fairly minor distinction, and it is one that I didn’t appreciate for a startlingly long time. The first thing I noticed was that a language that has both statements and expressions becomes a bit of an inconvenience if you want to make a nice REPL for it (a Read-Eval-Print Loop; it does exactly what it says on the tin, and better definitions may be found elsewhere on the internets). I’ve embedded a simple Lua REPL into my Last Big Project, which was a mostly straightforward task, given that Lua has eval (see last post about Dynamic Languages) in the form of loadstring (at least in versions 5.0 and 5.1) and reifies lexical environments as tables, so you can have a persistent sandbox environment for your REPL to play in without breaking everything else. And so on.

Now, your calculator (or computer simulation thereof) is a simple REPL; you stick in an expression like 1 + 2 and it calculates the result and shows it to you: 3. You can’t do that in Lua; 1 + 2 might be an expression (that evaluates to 3) but it isn’t a statement, and if you feed it to the Lua parser you’ll get unexpected symbol near '1' in return. Which sucks. So if you want to use your Lua REPL as a calculator, you have to wrap it in a value-returning statement, return 1 + 2, a slightly irritating little bit of ceremony.
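The usual workaround in the REPL wrapper is to try the input with return bolted on the front first, and only fall back to running it as a plain statement if that fails to parse. A rough C# sketch of the idea, with a hypothetical LoadChunk delegate standing in for whatever your Lua binding exposes as loadstring:

```csharp
using System;

// Stand-in for the host's "compile this chunk" call (loadstring in Lua 5.1):
// returns a callable on success, or null on a parse error. This delegate is
// hypothetical -- swap in whatever your Lua binding actually provides.
delegate Func<object[]> LoadChunk(string chunk);

static class ReplHelper
{
    // Try the input as an expression first by prepending "return ", so that
    // plain "1 + 2" just works; fall back to treating it as a statement.
    public static object[] Eval(string input, LoadChunk load)
    {
        var chunk = load("return " + input) ?? load(input);
        return chunk == null ? null : chunk();
    }
}
```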

There’s more, naturally.

Dynamism

In my short rambling intro to my thoughts on polyglot programming I glossed over static and dynamically typed languages a bit, and I guess I should probably expand on that a little. There’s a nice write-up by Steve Yegge of one of the talks he’s given on the subject which is well worth a read (or a listen).

Basically, dynamic languages do their type-checking at run time. That’s more or less it. So on the one hand, you lose all of the compile-time type checking, which has the potential to catch all manner of silly and subtle mistakes; you lose all of the programmer-friendly documentation that your type annotations provide; and you also lose the opportunity to hint to your compiler (or JIT) what the purpose of a given object is, which can hinder some kinds of optimisation.

On the other hand you suddenly have to type half as much (but see type inference below), and a handful of refactoring issues have suddenly gone out of the window because you can chop and change types and type names to your heart’s content. You are also basically forced to write decent test cases for any moderately complex project in order to make sure you haven’t made any stupid mistakes that a type system would have noticed… but then again, you should be writing lots of test cases anyway if you know what’s good for you.
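C# actually lets you opt into run-time checking per variable with dynamic, which makes for a handy side-by-side sketch of the trade:

```csharp
using System;
using Microsoft.CSharp.RuntimeBinder;

class DynamicDemo
{
    static void Main()
    {
        string s = "hello";
        Console.WriteLine(s.Length);       // 5
        // int n = s.Lenth;                // typo caught at compile time (CS1061)

        dynamic d = "hello";
        Console.WriteLine(d.Length);       // still 5, but resolved at runtime

        try
        {
            Console.WriteLine(d.Lenth);    // the same typo now compiles fine...
        }
        catch (RuntimeBinderException e)
        {
            // ...and only blows up when this line actually executes.
            Console.WriteLine(e.Message);
        }
    }
}
```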

Polyglot Programming I

One thing that became abundantly clear to me some time ago was that there was no one perfect programming language that I could use to do everything I wanted. This wasn’t really a surprise; pretty much everything in life is a compromise after all. In programming at least, however, you don’t have to pick one language for a job and stick with it. Enter the notion of polyglot programming: using multiple languages coupled together in the same application.

In my day job I do a fair bit of C++ work, in areas where reasonably fast low-level stuff is required, such as image processing on a live video stream. C++ gives me nice deterministic memory allocation and deallocation, raw pointer access and only a small amount of language “magic”, so I can reason about what is going on internally without too much effort. It makes a lousy application programming language, however. As I work for a Windows shop, we’re big on dotnet for application or service layer stuff, and dotnet gives you C++/CLI, an interesting halfway house that lets you mix garbage-collected managed objects with conventionally allocated and deallocated resources, use dotnet assemblies and, perhaps more importantly, create dotnet assemblies from your ANSI C++ code.

Once you’ve actually got one of these mixed assemblies building, it is simplicity itself to reference it from a nice C# application, in which it becomes vastly faster, easier and safer to write the rest of the system, since you don’t have to worry so much about bit-bashing. This is quite a powerful language combination, but C# is not the simplest or fastest-to-code language out there, on account of being a strong, safe, static, explicitly typed language. If the application layer could be written in a dynamic language instead, think how much easier life could be!
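For a flavour of what the C# side ends up looking like, here’s a minimal sketch; FrameProcessor is an invented stand-in for whatever managed wrapper type the real mixed-mode assembly would export.

```csharp
using System;

// Stand-in for the managed wrapper the C++/CLI project would export; in the
// real thing this type lives in the mixed-mode assembly and its methods call
// straight into the native image-processing code.
class FrameProcessor : IDisposable
{
    public FrameProcessor(int width, int height) { /* native setup */ }
    public byte[] Process(byte[] frame) { /* native call */ return frame; }
    public void Dispose() { /* deterministic native teardown */ }
}

class Pipeline
{
    static void Main()
    {
        // From the C# side it's just another .NET type: add a reference to
        // the mixed assembly and use it; the bit-bashing stays on the other
        // side of the boundary.
        using (var processor = new FrameProcessor(640, 480))
        {
            var frame = new byte[640 * 480 * 3];   // one RGB frame, say
            var result = processor.Process(frame);
            Console.WriteLine(result.Length);
        }
    }
}
```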

Perl

A few years ago in the dark days between my graduation and Getting A Real Job, I did some web dev stuff. Back in the day, if you wanted cheap web hosting, that basically meant you got PHP. I wrote quite a lot of PHP and cultivated a seething hatred for it; it has been some time since I wrote any so I won’t be enumerating my complaints here. Suffice to say, it is still not a particularly nice language but some of the worst warts have been burnt off so it appears to be slightly less awful to use now (6+ years on).

So I hunted around for a nice MVC framework in a slightly more capable language with a decent standard library, and hit upon Perl and its Catalyst framework. The reasons I chose Perl over Python (I couldn’t work out whether Django, TurboGears or Pylons was the better framework, which didn’t help) or Ruby on Rails seemed important at the time, but in retrospect weren’t really that compelling.

Nowadays I just use Perl for scripting. But why the change of heart?

Funargs

One of the programming language features I find myself using most often is the closure. Concise and accurate definitions of what a closure is are readily available elsewhere on the web, so I’ll just gloss over the problem by saying they’re a bit of (usually compile-time specified) code with some associated (usually runtime-generated) data. This is a bit of functionality that can basically be replaced with the aid of a class… consider function objects in C++ or anonymous Runnables in Java. In languages where functions are first-class types, however, the act of creating and using closures becomes significantly more convenient.

Quick terminology note: in this post, I’ll use “anonymous function”, “lambda” and “closure” interchangeably. This isn’t necessarily correct, but let’s not worry about that right now.
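The difference in convenience is easiest to see side by side; here’s the “class” version and the “closure” version of the same predicate in C# (ThresholdFilter is just a name made up for the example).

```csharp
using System;
using System.Collections.Generic;

// The "replace it with a class" route: the captured data becomes a member
// variable set in the constructor, and the code becomes a method.
class ThresholdFilter
{
    private readonly int threshold;
    public ThresholdFilter(int threshold) { this.threshold = threshold; }
    public bool Keep(int value) => value > threshold;
}

class ClosureDemo
{
    static void Main()
    {
        var numbers = new List<int> { 1, 5, 10, 15 };
        int threshold = 7;

        // Class version: explicit plumbing for one line of logic.
        var filter = new ThresholdFilter(threshold);
        var a = numbers.FindAll(filter.Keep);

        // Closure version: the lambda simply captures `threshold`.
        var b = numbers.FindAll(x => x > threshold);

        Console.WriteLine(string.Join(",", a));   // 10,15
        Console.WriteLine(string.Join(",", b));   // 10,15
    }
}
```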

Anyway, the ability to create and throw around functions on the fly is tempered by two things, the upwards and downwards funarg problems, where “upwards” and “downwards” refer to directions on the stack. The key notion here is the stack frame; all of a function’s local state is contained in a stack frame and remains on the stack until that function exits. When the function calls another function, all of the caller’s state is kept on the stack, ready to be re-used when the callee returns.

There are a handful of exceptions here, but for the moment I’ll ignore the languages which (for example) don’t worry about stacks.

The downwards funarg problem would appear to be the easier of the two, and the one that is most often dealt with in programming languages. It involves a called function accessing the values stored in stack frames below its own; all those stack frames and their associated values will remain on the stack for the lifetime of the called function, so there’s a minimum of magic involved.
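A quick C# rendering of the downwards case: a closure over one of Main’s locals is passed down into a callee and invoked there, while Main’s frame is still guaranteed to be live. (The upwards case, returning that closure so it outlives the frame that created it, is the one that forces the captured variable off the stack.)

```csharp
using System;

class FunargDemo
{
    // The callee just invokes whatever function it was handed.
    static void Apply(Action callback) => callback();

    static void Main()
    {
        int counter = 0;

        // Downwards funarg: the lambda travels down the call chain into
        // Apply, reading and writing `counter`, a local belonging to Main.
        // Main hasn't returned yet while Apply runs, so the captured value
        // is guaranteed to still exist.
        Apply(() => counter++);
        Apply(() => counter++);

        Console.WriteLine(counter);   // 2
    }
}
```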

On Blub

Blub is a hypothetical language coined by some guy called Paul Graham. He’s a Lisp programmer and venture capitalist, so that probably tells you a fair amount of what you need to know about him.

The idea is that you can arrange programming languages in some sort of hierarchy, ordered by some property. Let’s call it “expressive power”, which is a fairly nebulous concept we’ll gloss over for now. Just nod and smile. Assuming the languages being compared are Turing complete, they can all ultimately be used to solve the same problems (this is part of the Church-Turing thesis, some of the key groundwork for modern computer science), but the ease with which they may be used by a programmer differs wildly (see also the “Turing tarpit” problem).

Blub isn’t a new or sexy language, but it isn’t (very) old or crusty either. It is more expressive and easier to use than C, which is in turn much friendlier than assembler, and so on. It sits in the middle of the hierarchy of programming languages, and is probably used by your average corporate code monkey. We won’t worry about what lies above it in the hierarchy just yet.

And Yet

C# did not make the grade, however. There are two key reasons for this. Firstly, the CLR is not as readily available as its major competitor, the JVM. The official Oracle releases work out of the box on my usual development platforms, whereas the official CLR is Windows-only, which presents a significant hassle in terms of licence fees and platform requirements. Mono is better, but like the OpenJDK it feels a lot like a second-class citizen. Secondly, there’s a lingering spectre of litigation over anything that Microsoft releases for free… they’ve chosen to tie their fortunes to Windows, which does not necessarily seem like the smartest thing they could be doing, but does mean that using Microsoft IP on non-Microsoft platforms might prove to be a mistake in the future.

This seems a bit of a shame, seeing as neither of these are really good technical or aesthetic reasons for ditching the language. Never fear though, I’ve got plenty of gripes.

Some irritations I’ve found, in no particular order:

Next Up: C#

Given that my day job is at a Microsoft shop, it is fairly unsurprising that I’d use a fair bit of C#. Whilst it started as a bit of a “me, too!” response to Java (at least to my eyes), it seems to have turned into a very nice language indeed, surpassing Java in a lot of ways (for which I largely blame Sun).

The foundation of the .net framework (the CLR) provides a fair chunk of the features that many people like about C#, and those same features are fairly easily usable by other languages that compile to CLR bytecode, so I won’t talk about them much here. Not performing type erasure on generics at runtime is one, perhaps, along with the whole reflection mechanism and the stuff it provides, like attributes and so on. Delegates are another.
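A quick taste of the reflection and delegate side of that; the DescriptionAttribute and Widget types here are invented for the example.

```csharp
using System;
using System.Reflection;

// A home-made attribute: metadata attached to a type at compile time...
[AttributeUsage(AttributeTargets.Class)]
class DescriptionAttribute : Attribute
{
    public string Text { get; }
    public DescriptionAttribute(string text) { Text = text; }
}

[Description("An example of runtime-visible metadata")]
class Widget { }

class ReflectionDemo
{
    static void Main()
    {
        // ...which any code can read back at runtime via reflection.
        var attr = typeof(Widget).GetCustomAttribute<DescriptionAttribute>();
        Console.WriteLine(attr.Text);

        // Delegates: a function as a first-class value you can pass around.
        Func<int, int> square = x => x * x;
        Console.WriteLine(square(7));   // 49
    }
}
```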

LINQ doesn’t quite fit into being either a .net feature (you don’t see it in C++/CLI, for example, and F# is a bit of a second-class citizen when it comes to LINQ support) or a C#-only feature (VB.net gets it too). There’s plenty of interesting magic associated with the way LINQ works, but as I don’t spend a whole lot of time querying or performing complex transformations on various kinds of structured data objects, I don’t really use LINQ much at all. So let’s just gloss over it for now.
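For the unfamiliar, this is roughly the sort of thing LINQ is for, in its method-chaining form:

```csharp
using System;
using System.Linq;

class LinqDemo
{
    static void Main()
    {
        var words = new[] { "polyglot", "perl", "lua", "csharp", "java" };

        // A declarative query over an in-memory collection:
        // filter, sort, then project each element.
        var shortWords = words
            .Where(w => w.Length <= 5)
            .OrderBy(w => w)
            .Select(w => w.ToUpper());

        Console.WriteLine(string.Join(", ", shortWords));   // JAVA, LUA, PERL
    }
}
```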

Given that C# is a fairly conservative language in the whole Algol/C++ sort of family, there aren’t a whole lot of particularly exciting or alien language constructs that you actually get to make use of in “day to day” coding. A handful of features do stand out, however…