Saturday, December 18, 2010

Cryptonomicon: A Lesson for my Hyper-Logical Friends

I'm currently reading Cryptonomicon by Neal Stephenson. I'm not very acquainted with literature at large, so forgive me if I'm being ignorant here, but this book seems to be one of very few that are in wide release and yet somewhat esoteric. That is to say, anybody can appreciate it, but I think it speaks specifically to computer programmers and mathematicians, and may not be fully understood by those who are unfamiliar with certain mathematical and engineering concepts and who don't share that mentality. Then again, the purpose could be to give outsiders some insight into the hyper-logical nerd mentality. Tom Wolfe does a similar thing, for instance, with the investment bankers in The Bonfire of the Vanities.

Though I think Neal Stephenson must have a closer personal connection with this mentality. It's a great book for a nerd because it's literature we can really relate to. It's told from the perspective of those of us who try to make logical sense of everything, see patterns all around us, and are confused by strange things like social niceties.

All in all, I think it teaches an important lesson to nerds and non-nerds alike. I've only just crossed the one-third mark (it's something like 1100 pages), but I came across a piece of dialog that I think is particularly insightful. In this scene, Randy Waterhouse pulls Eberhard Föhr aside during a business meeting and explains to him why one of their business partners, Avi, has withheld information from them for their own legal protection. Eberhard, being of this nerd mindset, is frustrated that his business partners are not behaving logically. Randy, being of the same mindset but somewhat more enlightened, explains to Eberhard the realities of dealing with illogical people, but he does so in logical terms that Eberhard can relate to. This conversation is amusing, like a lot of things in this book, because it demonstrates how we analytical types like to deconstruct everything.

Rather than risk inviting Neal Stephenson's lawyers (I have no idea how likely a scenario this is, but I don't care to do the research right now), I'll just invite you to read this page via Google Books.


I appreciate a couple of things about this passage. Firstly, I appreciate that Randy's character is a sort of enlightened techie, one we should aspire to be, who respects the qualities of other sorts of people even if he doesn't understand their mentality. Business people clueless about technology, idealistic designers with a vision, techies who can't design a usable interface to save their lives: we should all accept the limits of our own understanding, respect the others, and occasionally yield our own ideals for the sake of theirs. (For example, if "doing it right" means taking twice as long and failing in the market, what use is your ideally laid-out code if nobody's going to use it?)

The other thing I like about this passage is, as I mentioned above, the logical way that it approaches illogical people. Some nerds have a tendency to refuse to approach the world in anything other than a logical manner. Normal People may try to explain to them that the world, particularly other individuals, isn't rational at all, and that we should stop seeing things so logically. I include myself in this group of nerds, so honestly, this line of argument is ridiculous to me. The universe is logical. But I think that sometimes we as nerds are just Doing It Wrong, and we can take a cue from Randy here.

What we need to do is appreciate that the fact that people act irrationally, out of emotion, is just a condition of the world, just as we accept that animals are irrational or that the sun is hot. It's a datum. Further, accept that you yourself, the nerd, are also emotional, particularly when people don't act logically. This frustration with others' illogical behavior is based on an expectation that people will act contrary to their nature. You're ignoring a data point. You're mad at the sun for being hot. You're a non-techie who's mad at your computer for doing something other than exactly what you told it to. Now who is being irrational? I'm going to agitate a little and propose that we are in fact being hypocritical here.

The main problem, I think, is that we sometimes fail to distinguish between Logic and Logical Faculties. Expecting perfection in Logic is not the same as expecting a human to have perfect Logical Faculties. The universe works by rational laws. People are part of the universe, so their workings are rationally explainable. But this is entirely distinct from their Logical Faculties being able to perfectly model the world around them. Furthermore, people's ability to model the world around them is distinct from their ability to keep their Emotional Faculties from getting in the way. We humans are but animals who happen to possess a limited supply of Logical Faculties.

Expecting people to act in a rational straightforward manner is like expecting a computer to compute beyond its capacity. A problem may be Logically solvable. There is a perfect Logical progression toward the answer. If we treated computers the same way we sometimes treat other humans, we would demand that we should be able to stick the problem into a computer and get an instant output. But again, Logical Faculties are in limited supply. Somehow we don't seem to have a problem accepting this in computers. In fact, we have entire sub-fields of computer science, taking RAM, HD, and time limitations as data, and creating a whole new set of Logical problems. Why not accept the same limitations and challenges in humans?

Perhaps it's that there is one fundamental difference between computers and humans, which is that our departure from being perfect logic solvers is not just in our processing capabilities, but also, as Randy pointed out in the passage linked above, in our interfaces. Human interfaces are more like neural networks than serial connections. To gain access to the Logical Faculties, one must enter a pattern that is accepted by the neural network. The patterns include such things as social niceties and innuendo. Some of us have simpler interfaces than others. (And as Randy described, some may even require other humans to act as intermediate interfaces. When I worked at Oracle, there was a guy who was fluent in both Engineer and Customer, and intermediated all conversation. I understand this is a common thing to have in a company.)

And you, the nerd, are a neural network, at your core, not a Turing machine. You operate in that domain. That means you have the natural ability, however impaired by years sitting in front of the computer, to interface with other neural networks, if you would just accept your nature. This is in fact the only way you can communicate with other humans, so you might as well accept it for what it is. You may try to approximate a Turing machine, but your neural network nature will still show on occasion. For instance, as I pointed out above, when you are frustrated about others not behaving like Turing machines.

Monday, November 15, 2010

allCombinations: leaveOut

leaveOut

Ok, in my previous post I brought up a small Python module I was inspired to throw together while writing tests. It's still sitting in a gist, though I'll probably move it to a real repo before too long:

https://gist.github.com/674715

So I admit, as I was posting it, it occurred to me that to a large extent this stuff could be replaced with a nested for loop. For instance, this:

for lst in allCombinations([1, 2, oneOf(3,4), oneOf(5,6)]):

can be pulled off with:

for x in (3, 4):
    for y in (5, 6):
        lst = [1, 2, x, y]

Not a huge gain necessarily on my part. So as I was using it in my testing I realized I once again had engineered something for a tiny use that, neat as it is, could have been done much faster by brute force. But then, I realized another thing I could add that would make my code much more concise. I've added another keyword called "leaveOut". It lets you opt to not have the element show up at all. Here's an example:

allCombinations([1,2, oneOf(3, leaveOut), oneOf(4, leaveOut)])

This will return:

[ [1, 2, 3, 4], [1, 2, 3], [1, 2, 4], [1, 2] ]
And of course, the "leaveOut" case will omit dictionary entries and object data members as well.
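
For a flat list, the "leaveOut" behavior can be sketched in a few lines. This is only an illustration under assumptions, not the code from the gist: OneOf, LEAVE_OUT, and expand_list are stand-in names, and dicts and object attributes are ignored.

```python
from itertools import product

# Stand-in sentinel for the module's leaveOut (hypothetical name).
LEAVE_OUT = object()


class OneOf:
    """Stand-in marker: exactly one of `options` fills this slot."""

    def __init__(self, *options):
        self.options = options


def expand_list(template):
    """Expand a flat list template into every concrete list it describes."""
    # Each slot contributes a tuple of alternatives; a plain value has one.
    slots = [item.options if isinstance(item, OneOf) else (item,)
             for item in template]
    # Build every combination, then drop the slots that chose LEAVE_OUT.
    return [[value for value in combo if value is not LEAVE_OUT]
            for combo in product(*slots)]


print(expand_list([1, 2, OneOf(3, LEAVE_OUT), OneOf(4, LEAVE_OUT)]))
# prints [[1, 2, 3, 4], [1, 2, 3], [1, 2, 4], [1, 2]]
```

Comparing against the sentinel with `is` rather than `==` means a legitimate value such as None or 0 is never mistaken for an omitted slot.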

BTW

I should also mention another use case I thought of, "leaveOut" aside, that might be a real pain to handle without an aid such as allCombinations: dynamically created structures with an arbitrary number of variables. For example:

allCombinations( [ oneOf(1, 2) ] * x )

I've just generated all possible lists of either 1 or 2, of an arbitrary length, which can be set at runtime. Or how about something a bit more fun:

allCombinations( [ oneOf( *(list(range(y)) + [leaveOut]) ) for y in range(x) ] )
This takes all combinations of lists of length x, where each element can equal any integer from zero up to (but not including) its index, plus the combinations where items are omitted. Not horribly useful, but complicated.

To do these in a standard way you'd need x for loops, which you can't do directly. (I bet you could do it with recursion).
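
For the simple fixed-choices case, the standard library already covers the "x for loops" problem: itertools.product accepts a repeat argument that can be set at runtime. A minimal sketch, where all_lists is a hypothetical helper name and not part of the gist:

```python
from itertools import product


def all_lists(choices, length):
    """Return every list of `length` items drawn from `choices`."""
    # product(..., repeat=length) acts like `length` nested for loops.
    return [list(combo) for combo in product(choices, repeat=length)]


x = 2  # could just as well be computed at runtime
print(all_lists((1, 2), x))
# prints [[1, 1], [1, 2], [2, 1], [2, 2]]
```

This only handles flat sequences where every position has the same choices. The nested-structure and "leaveOut" cases are where a helper like allCombinations still earns its keep.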

Fixes

I'll also mention that I fixed a couple of general errors. oneOf on data members had a big bug. And now if you don't have oneOf in your structure, allCombinations just returns a list containing only the original structure, instead of looping to death.

Friday, November 12, 2010

allcombinations - generating combinations of python structures

Alright, on a whim I decided to make another tricky thing in Python. This one is less of a hack, and is more likely to be useful.

So let's say you want to do something with all combinations of... something.

[5, 6, oneOf(7,8,9), oneOf(10, 11, "shazaam")]

So you want to turn this structure into all the possibilities represented within:

[
[5, 6, 7, 10],
[5, 6, 7, 11],
[5, 6, 7, "shazaam"],
[5, 6, 8, 10],
[5, 6, 8, 11],
[5, 6, 8, "shazaam"],
[5, 6, 9, 10],
[5, 6, 9, 11],
[5, 6, 9, "shazaam"],
]

Well with allcombinations, you can do just that:

from allcombinations import allCombinations, oneOf

allCombinations( [5, 6, oneOf(7,8,9), oneOf(10, 11, "shazaam")] )

Here it is. The gist includes a more complicated example.

Features
The oneOf should be able to reside almost anywhere in your expression. It can be in a list (as seen here), in a dict, or even in the attribute of an object. It can also reside in a list within a dict within an object's attribute, etc, as long as it's nowhere within an unsupported container type.

Limitations:
This will only work if oneOf resides in a list, dict, or an object's attributes. It shouldn't work anywhere within a set, or any other structure I can't think of. If you try it in something unsupported, the oneOf object should just stick around in all your combinations.
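
To make the behavior above concrete, here is one way such an expansion could be written with recursion and itertools.product. It is a sketch under assumptions, not the code from the gist: OneOf and expand are stand-in names, and object attributes are left out to keep it short.

```python
from itertools import product


class OneOf:
    """Stand-in marker: exactly one of `options` appears in each result."""

    def __init__(self, *options):
        self.options = options


def expand(node):
    """Return a list of every concrete structure described by `node`."""
    if isinstance(node, OneOf):
        # An option may itself contain markers, so expand each one.
        return [result for option in node.options for result in expand(option)]
    if isinstance(node, list):
        # Expand every item, then take one expansion per position.
        return [list(combo)
                for combo in product(*(expand(item) for item in node))]
    if isinstance(node, dict):
        keys = list(node)
        return [dict(zip(keys, combo))
                for combo in product(*(expand(node[key]) for key in keys))]
    # Anything else, including unsupported containers such as sets,
    # passes through untouched (markers inside it stay as they are).
    return [node]


for combination in expand({"user": OneOf("alice", "bob"), "ids": [1, OneOf(2, 3)]}):
    print(combination)
```

A structure with no marker in it comes back as a one-element list holding an equal copy of the original, which lines up with the behavior described for allCombinations.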

I'm probably actually going to use this, particularly (again) for testing. Anyone else think they'd find it useful? Should I package it?

Testing (Django) views with pyquery

PyQuery is basically what it sounds like. Using jQuery syntax, you can query and even manipulate XML files. Obviously we don't (yet!) have Python in the browser, so it's not useful in the same domain, but it can help out in dealing with XML in general, in the same way as, say, lxml, but without having to learn about things like ElementTree for simple cases. It's particularly good for XHTML because jQuery (and thus PyQuery) uses CSS syntax for class= and id=. Which brings me to how I'm using it:

from django.test.client import Client
from pyquery import PyQuery
from django.test.testcases import TestCase

...

class TestSomeViews(TestCase):

    def testAView(self):
        client = Client()

        ...

        response = client.get("/someurl/")

        self.assertTrue("expected text" in PyQuery(response.content)("#someid").html())

(If you're unfamiliar with testing Django views, see this.)

For some basic tests, you can just search the entire response html, and not have to worry about where the text shows up. But suppose you're searching for a username in a particular part of your response. You're pretty likely to find that username elsewhere on the page, so you have to select out the part of the file where you expect it. I think this is much easier than using a regex.

So what this bit of PyQuery does is find the tag with the id of "someid" (presumably there's only one, being an id), and return the html within that tag. (If you search for a class that returns multiple tags, it seems that a simple call to .html() will only return the contents of the first one. This very well may match jQuery's behavior, I'm admittedly not that familiar, but just a heads-up.) For more details look at the PyQuery API.

Wednesday, October 27, 2010

Django Model Validation

I'm really excited about model validation, which arrived in Django 1.2. It's going to save me from the disastrous hack of ModelForms I've used so far. One wants something to validate data before saving it, because validating it by saving it is apparently a Bad Idea: you may have already made changes to the database by the time the error is thrown.

I wanted this done automatically, and Django only had form validation until 1.2, so I hacked forms into some sort of all-purpose wrapper for models. I've since learned to program Django like an adult, and model validation came around just in time anyway. But there's a problem I have with it.

First I should point out that there are a few different things that get validated, but the two I'll point out are A) checking that required fields are set and B) whatever I put into the clean() function. Judging by some errors I've gotten while debugging/testing, Django seems to be checking B before A.

Now, when I use a ModelForm, sometimes I want to use the commit=False option when I save. This returns a model that hasn't been committed to the database. Sometimes there's extra data I want to add to the model that the form didn't supply. Sometimes that data is in fact necessary for the model to be valid. So clearly Django shouldn't check A, and it doesn't. Here's the funny thing though: it does check B. Why would it do that? I can understand checking when I call is_valid(); I can control when that's called, making sure I've added everything first.

So far the consequence of checking B early has been that my clean() function tries to access members that haven't yet been set. So in clean() I just check for those items, and let it pass if they don't exist. I figure, yeah, it'll pass certain tests when it shouldn't. But if it's really running through full validation (not just B, as in the save(commit=False) case), that means it'll catch the missing members on the A pass anyway.

Maybe I got something wrong, but I thought it was a weird design decision.

Tuesday, October 5, 2010

Significant Whitespace in Python Data Structures

I recently wrote a program in Python for parsing files. I'm pretty naive still when it comes to functional programming, but I'm still excited about it, so I wanted it to be more functional in style. It had a complicated data structure representing the file structure, instead of a loop with a bunch of if-thens. By Python standards I may have gone a bit overboard. Guido probably would not have approved of my code (not to mention what follows in this blog post).

So as a result, whitespace became irrelevant in most of the program. Huge dicts of lists of tuples, etc. It made me think that significant whitespace might be handy for data too. And while talking on IRC about it this morning I realized I could sort of hack it using decorators and generators. So here's what it looks like:

http://gist.github.com/611646

So I have two examples:

  • example.py - This shows how you can define a more complicated structure with whitespace instead of a bunch of ){(}[].
  • inlinefunc.py - This demonstrates a sort of side-effect benefit. You can have multi-line functions inline in a list (or tuple or dict). Usually you're stuck with lambdas, and of course that starts to look confusing too.
I'd like to clean up the syntax. Obviously having to have a decorator before and a yield after isn't great. I'll have to think about how I could do that. Maybe make an even dirtier hack by doing introspection.
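
For readers who don't open the gist, the general shape of the decorator-plus-yield trick can be sketched as follows. This is a guess at the idea rather than the gist's actual code; as_list and pipeline are made-up names.

```python
def as_list(generator_function):
    """Decorator: replace the generator function with the list it yields."""
    return list(generator_function())


@as_list
def pipeline():
    # Each yield adds one element; indentation replaces brackets and commas.
    yield "strip"

    def normalize(line):
        # A multi-line function defined inline, where a lambda would get cramped.
        line = line.strip()
        return line.lower()
    yield normalize

    yield ("split", ",")


# After decoration, `pipeline` is a plain list, not a function.
print(pipeline[0], pipeline[1]("  MiXeD Case  "), pipeline[2])
```

The decorator runs the generator once at definition time, so the name ends up bound to ordinary data. That is also why both the decorator and the yields are needed in this form.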

Any thoughts?

Saturday, March 13, 2010

A useful script: copytoclipboard

Another quick tool I figured I'd share. Made this one a while ago. When you're doing stuff on the command line and you need to copy something for a desktop app, you end up having to select text on your terminal screen. Well, if it's a big text file, you have to copy, paste, scroll to the next part, copy (make sure you didn't copy any part twice), paste, etc.

Well, copytoclipboard takes standard input and puts it into the gtk clipboard (haven't tried this on KDE).

#!/usr/bin/env python

import sys

import pygtk
pygtk.require('2.0')
import gtk

c = gtk.Clipboard()
c.set_text(sys.stdin.read())
c.store()

For instance, to paste this code, I went into my terminal and typed:

cat copytoclipboard | copytoclipboard

and pasted the results here. For a less confusing example:

uptime | copytoclipboard

lets me paste this:

17:42:14 up 3:23, 2 users, load average: 0.35, 0.39, 0.41

EDIT: Or I could just do a google search and find that xclip already exists. Oh well.

EDIT2: No, xclip seems to be its own thing. I just tried it, doesn't seem to work with gtk. I guess I still win!

Thursday, February 25, 2010

"New Music" script

I just thought of a script (can't test it at work so I'll actually write it later, but it'd be only a few lines long). Would work thusly:
Desktop>newmusic britney_spears_leak_trance_remix.ogg ~/music/various/
It would act like mv, just move the song over to the "various" directory, but it would also add a symlink to
~/music/new/
That way, I'll remember to give it a listen. And when it's no longer novel I can just delete the symlink. Otherwise this crap gets accumulated on my Desktop. Maybe happens to you too.
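
Here is a rough sketch of what that script might look like in Python. It hasn't been run against a real music collection; the newmusic function name, the new_dir parameter, and the choice to create ~/music/new/ when it is missing are all assumptions.

```python
#!/usr/bin/env python3
import os
import shutil
import sys

# Assumed location of the "new music" symlinks, taken from the description above.
NEW_DIR = os.path.expanduser("~/music/new")


def newmusic(src, dest_dir, new_dir=NEW_DIR):
    """Move `src` into `dest_dir` and leave a symlink to it in `new_dir`."""
    os.makedirs(new_dir, exist_ok=True)
    # Like mv: the file keeps its name and lands in the destination directory.
    moved = shutil.move(src, os.path.join(dest_dir, os.path.basename(src)))
    # Link to the absolute path so the symlink works from anywhere.
    target = os.path.abspath(moved)
    link = os.path.join(new_dir, os.path.basename(target))
    os.symlink(target, link)
    return link


if __name__ == "__main__" and len(sys.argv) == 3:
    print(newmusic(sys.argv[1], sys.argv[2]))
```

Deleting the symlink later leaves the song itself untouched in its real directory.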

Maybe it could be made into a Nautilus script or something, but that seems a bit more tricky if you're dealing with drag n drops.

Sunday, February 14, 2010

Making new twitter followers marginally more convenient

Well, here's one tiny annoyance that I just randomly decided I felt like scratching: I get a handful of emails from twitter saying so-and-so is following me. Usually it's somebody I'm not interested in, but on occasion it's a friend. What annoys me is that I have to open their profile to see. I'd prefer to see the bio and their last few tweets in the email itself. That would help me decide much faster. Well, I've started on a real hack of a solution. It should make it a little less annoying:

http://gist.github.com/304335

Look it over. If you trust it, run it on a command prompt (works on Linux, all I can say). It asks for your gmail password. It opens browser tabs for all the twitter accounts that have followed you that are still in your gmail inbox.

Now, I'm making some wild assumptions about the emails that Twitter is sending us (namely that the first Twitter url is the profile in question), so this may not open up everything correctly. Especially if Twitter changes the email. So, I also open up a tab with a gmail search for all those emails. You can look it over and confirm that it opened up the right tabs, plus this helps you archive them right away.
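
The "first Twitter url" assumption boils down to something like the sketch below. It is not the code from the gist: first_twitter_url is a hypothetical helper, and the sample email body is invented for illustration, since the real wording of Twitter's emails isn't shown here.

```python
import re

# Matches a profile-style URL such as http://twitter.com/some_user.
TWITTER_URL = re.compile(r"https?://(?:www\.)?twitter\.com/[A-Za-z0-9_]+")


def first_twitter_url(body):
    """Return the first Twitter URL in an email body, or None if there is none."""
    match = TWITTER_URL.search(body)
    return match.group(0) if match else None


# Invented sample body; the real email text may differ.
sample = (
    "Some User (http://twitter.com/some_user) is now following you.\n"
    "Change your settings at http://twitter.com/settings\n"
)
print(first_twitter_url(sample))
# prints http://twitter.com/some_user
```

If Twitter ever puts a different link first, this picks the wrong page, which is exactly why the extra gmail search tab is worth having as a cross-check.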

If it doesn't work, do this first:

https://www.google.com/accounts/DisplayUnlockCaptcha

I guess it tells Google that your computer isn't engaging in any anti-human activities. I had to do it.

I may eventually have it generate an html page with all the useful info right there, so you can look it over quicker. I'll also look for a way to have a GUI password entry thing, so you won't need a cmd prompt. Or, you know, if any of you Open Sourcerers want to do it yourself and send back the update, that would be great too.