Saturday, July 16, 2011

Haskell - Can your language's type system do this?

Haskell is known for its very strict, and yet very versatile type system. There's so much to learn about it, and I'm barely scratching the surface myself. But I've thought of a simple example of something impressive that I think people who know nothing about the language could still appreciate, and hopefully it will convince you to check it out.

Note: There is something in Haskell called "Type level programming". I'll leave you to look it up and decide whether what I'm doing here is an example of this or not. I never figured that out. but what I'm about to show you is at very least something you probably can't do in your language, unless your language is even cooler than Haskell.

Let's start with this simple program:

data Purse a b c = Purse a b c


data Keys = Keys

data Phone = BlackBerry | IPhone | Android

data Wallet = LeatherWallet Float | MoneyClip Float


addKeys (Purse ( ) b c) newkeys = Purse newkeys b c

addPhone (Purse a ( ) c ) newphone = Purse a newphone c

addWallet (Purse a b ( )) newwallet = Purse a b newwallet


emptyPurse = Purse ( ) ( ) ( )


leaveHouse :: (Purse Keys Phone Wallet) -> ( )

leaveHouse purse = ( )


In short, this program makes sure that you add your keys, a phone, and a wallet to your purse before you leave the house. However, unlike most languages, your compiler will make you do this, rather than the runtime.

So first, let me decipher this thing for you.

data Purse a b c = Purse a b c


This creates a data type called Purse. It is a parametric data type, meaning that it takes other types as parameters, in this case 3 different ones. This is very similar to Java or C++, with something like Vector< int >. a, b, and c can be anything, at least for now. On the right side of the equal sign, we describe what it requires to construct an object of type Purse. This also takes three parameters, however these parameters are values, not types, but the values must be of types a, b and c respectively. (Note that in Haskell, it's fairly common for the type, and the value of the type, to have the same name. It might seem confusing, but if you understand the context they're each used in, it's unambiguous which is meant. So here, on the left, Purse is the type, but on the right, Purse a b c represents the value, or more properly one would say that on the right Purse is the type constructor)

This is basically like a struct in C. So in all, we have something like a templated struct in C++, with one data member of each parametric type. For example, based on this declaration, you could create a variable of type Purse Int Float Int that had the value Purse 1 2.3 4 or Purse 2 5.7 8. You could also (in the same program) create a variable of type Purse Int Char Char, and it could hold the value Purse 5 'a' '%' .

data Keys = Keys

data Phone = BlackBerry | IPhone | Android

data Wallet = LeatherWallet Float | MoneyClip Float


Phone and Wallet are also datatypes. They are not parametric, however they have a different feature. The variables of these types can each take on multiple forms. A Phone variable can either have the value BlackBerry, IPhone, or Android. Very similar to an enum in C, they basically are constants, and variables of type Phone are restricted to them. Wallet however, is a bit different. In addition to having multiple forms, it also has a Float parameter. If you remember Purse variables had parameters of unspecified types, however in this case we specify that it must be a Float. Keys is simplest of all, it can only have one value, Keys, with no parameters.

addKeys (Purse ( ) b c) newkeys = Purse newkeys b c


This is a function definition. It takes two parameters. Similar to def addKeys(purse, newkeys): in Python. However, Haskell uses something called pattern matching. You can find this in Erlang, and I'm sure a good handful of other languages as well. What this generally does is allow you to define a function multiple times, for different input, as long as every definition has the same type signature. However in this case we're only doing one, because we just want to use it to (partially) define the type signature. The first parameter of the Purse object has to be ( ), which is of type ( ). This is in some ways like None or NULL in other languages, however it's its own type, with only one value. Variables a and b can be any value of any type. What it returns is a new Purse, with newkeys as its first parameter, rather than ( ). The other parameters depend on what's passed in.This also means that the new Purse has a different type than the first one: Purse Keys b c, rather than Purse ( ) b c

Similar in Python would be:

def addKeys(purse, newkeys):

if purse.a == None:

newpurse = copy(purse)

newpurse.a = newkeys

return newpurse


The difference being that the Haskell version restricts the type of purse, the consequences of which we'll see in a bit.

addPhone (Purse a ( ) c ) newphone = Purse a newphone c

addWallet (Purse a b ( )) newwallet = Purse a b newwallet


Exactly the same as addKeys, except that ( ) are in different slots, appropriate to where this new phone or wallet should go. Again, note that the type of purse returned is different than then one entered, and that the type entered is partially parametric. For instance, for addPhone, c can either be of type Wallet or of type ( ) . Actually, again it could really be of any type, but if the purse was created with the functions defined in this program, it would be one of those.

emptyPurse = Purse ( ) ( ) ( )


This is defining a value called emptyPurse. It is a Purse of type Purse ( ) ( ) ( ) and of value Purse ( ) ( ) ( ) .

leaveHouse :: (Purse Keys Phone Wallet) -> ( )

leaveHouse purse = ( )


Finally, we have a function with an explicit type signature. Until now we've been taking advantage of type inference. However, here we want to force leaveHouse to have a specific type signature. The one argument to this function is of type Purse Keys Phone Wallet. Basically, any wallet with everything in it. No ( ). The return value of this function happens to be ( ) , but this is only because this is for demonstration. For our purposes, we don't care what it returns.

Ok cool, so what's the point? Well let's fire up the interpreter, ghci:

Prelude> :l purse.hs

[1 of 1] Compiling Main ( purse.hs, interpreted )

Ok, modules loaded: Main.

*Main> let p1 = emptyPurse

*Main> let p2 = addKeys p1 Keys

*Main> let p3 = addWallet p2 (LeatherWallet 50.24)

*Main> let p4 = addPhone p3 BlackBerry

*Main> leaveHouse p4

( )


Cool, we've left the house successfully. Let's inspect the types of each purse:

*Main> :t p1

p1 :: Purse ( ) ( ) ( )

*Main> :t p2

p2 :: Purse Keys ( ) ( )

*Main> :t p3

p3 :: Purse Keys ( ) Wallet

*Main> :t p4

p4 :: Purse Keys Phone Wallet


Remember, these aren't the values, these are the types.

Now let's restart out interpreter, and try to skip a step:

Prelude> :l purse.hs

[1 of 1] Compiling Main ( purse.hs, interpreted )

Ok, modules loaded: Main.

*Main> let p1 = emptyPurse

*Main> let p2 = addWallet p1 (MoneyClip 5.20)

*Main> let p3 = addKeys p2 Keys

*Main> leaveHouse p3


<interactive>:1:11:

Couldn't match expected type `Phone' against inferred type `()'

Expected type: Purse Keys Phone Wallet

Inferred type: Purse Keys () Wallet

In the first argument of `leaveHouse', namely `p3'

In the expression: leaveHouse p3

*Main> p4 = addKeys p3 Keys


We now have an error, because it was expecting a purse without ( ) anywhere in its type parameters. Remember, it's not complaining about a bad value, this pass doesn't know about the value, because this is a compiler error. It's complaining about the type. This doesn't look too different from something you might set up in Python, but remember, with Haskell, if you put these lines into a program, you would get these errors without having to run it (you wouldn't even be able to).

Now for fun, with the same interpreter session, let's try making a new purse, adding the keys to p3 a second time:

*Main> let p4 = addKeys p3 Keys


<interactive>:1:17:

Couldn't match expected type `()' against inferred type `Keys'

Expected type: Purse () t t1

Inferred type: Purse Keys () Wallet

In the first argument of `addKeys', namely `p3'

In the expression: addKeys p3 Keys


It's expecting a Purse with the first parameter of type ( ). Any Purse that has addKeys in its "history" will not have this in its type, it will be of type Keys, so the compiler will not let you call addKeys again.

Now, remember one particularly interesting thing here: when desiging the addKeys, addWallet, and addPhone functions, we did not know exactly what type of Purse would be going into them. We just set one restriction, that one particular slot had to be empty. The actual types of purse going into the functions were realized by what called the functions.

Why would you want to do something like this? Supposing you were creating a library function to call on some sort of API. You want to allow for multiple function calls to initialize some sort of object before kicking it off to the library. In a language like C or Python, failure to set some sort of initialization variable would lead to a runtime error you have to decipher. Here, the compiler won't let you run the program until you to set all of the necessary parameters, and you can set them in any order*. There's a benefit to having things fail before you even have a chance to run them. It's a pain to get working, but once it does, your program will have far fewer errors.

This trick I've described to you, however confusing I made it sound, is based on the very basics of the Haskell type system. There's a whole crazy world out there. It's complicated, but once you get to know it, you can force yourself into some pretty stable programs.

* Actually, the way I have it set up, while you can run these functions in any order, it (for better or for worse) forces you to call the functions in the same order every time you use them, if you ever use it more than once in the same program. However, using more advanced features (typeclasses) I think it would be possible to overcome this, though I've not tried it.

4 comments:

Jeremy Shaw said...

Here is a version that allows you to store an unlimited number of items in your purse in any order, but still ensures that you have the essential items:

http://hpaste.org/49143

The NotElem and IsElem classes are a bit ugly. I think it would be a bit cleaner in Agda.

Dan said...

I'm still a novice at Haskell, so it will take a minute to read through that and understand it all. Does this one still enforce the same things through the type system rather than during runtime?

Thanks for the feedback!

Jeremy Shaw said...

Yeah. If you look at the bottom of the example, you will see that purse3 and purse4 fail to type check.

The rules I enforce are:

(-->) only allows an item (such as Keys) to be added once.

leaveHome requires that you have your Keys, Wallet, and Phone. But you can have other things too.

Jeremy Shaw said...

Also, my example is pretty wacky Haskell code. You don't see code like that in the wild much.