Saturday, January 28, 2012

Playing Words with Friends

Words with Friends


Words with Friends, if you're not already familiar, is a game of Scrabble that lets you play asynchronously with friends over the internet.


I recently started playing WoF because I like to think I have a large vocabulary and rarely get the opportunity to use it.  I also enjoy playing boardgames like Scrabble but between work and ... sleep... I don't often get the chance to get together with my friends and play Taluva into the wee hours.


Anyway, I have a confession to make.


I'm terrible at Words with Friends


Whenever it is my turn I look at my hand of letters:


ILNIREE


And think:


ILNIREE is not a word.  Darn.


And then I start to mentally rearrange letters to see if I can make other words.  WoF's conveniently includes a "Shuffle" button so I hit that a few times...  And while given enough time I do come up with a word a few things bug me about this process.


First of all, years of Google and Wikipedia use have atrophied important muscles like "memory".  Second, i'm a programmer and that means


Laziness1


Basically shuffling letters and checking or guessing that they are words is a perfectly straightforward although not terribly efficient way to try to optimize your scrabble game.  Of course, it's got a lot of drawbacks if you try to do it by hand.  It's slow and it's uninteresting.  To try to find every word possible with any of your 7 letters you'd need to try all permutations and there are over 10,000 permutations of 7 letters.  

Anyway, I'm bad at Words with Friends but I like programming so once I started playing WoF of course I gravitated towards...


Solving Words with Friends2


Boring and repetitive work is exactly what computers are good at it!  In fact the Python program to generate all possible permutations of 7 letters (1 letter words, 2 letter words, 3 letter words, etc) is pretty straightforward especially with a little help from our friend itertools.


My first and simplest attempt involved putting the contents of the standard unix dictionary (/usr/share/dict/words) into a python dictionary (an excellent hash table implementation). 


This isn't exactly what I came up with first but it should illustrate what I'm talking about:


Of course you end up looking at a lot of permutations for 7 letters (~13 thousand) so this can take a little while. It's still not slow on any reasonable computer but when you start adding in other letters to try and find words that might fit in with available spots on the board things can slow down. For example, say you could play words that begin with t, l, or r. Running the above getScrabbleLetters function for 10 letters requires you to check 9,864,101 permutations.

Worse still, if you add in the notion of wildcards (9 letters plus some character like '*' to represent a wildcard) you end up repeating those over 9 million lookups 26 times to check all the other possible letters of the alphabet in the wildcard slot.  That's over 250 million dictionary lookups to find a fairly small list of unique words.

However, if we step back for a moment and think about anagrams, a simple tweak can help process larger strings (and strings with wildcards) much more quickly.

Imagine you've got a couple anagrams like "was" and "saw".  They've got the same letters but count as two different words in our dictionary and require two separate lookups.  If we shift our perspective a little bit and start thinking about the letters as what's important we get some very nice performance improvements.

We still use a hash table as our primary data structure but instead of using the words as the keys we use the sorted letters of a word as the key.  The values become lists of all the words which include those same letters.  It's a little bit more work to insert, as we first have to calculate what the key would be, and it's a little more work to lookup, as we have to iterate over the list of possible words.  But it's great any time we care just about anagrams and it's much more efficient in terms of the number of lookups required to find all the scrabble words.  For example, for the same 10 letter case from earlier we now need only 1024 lookups.  ~26 thousand lookups for the wildcard case.

Without further ado that solution looks like this:



One other really useful bit is a quick way to look for all the words that have a given suffix or prefix. This is a little trickier than the anagram case but it's also more interesting because I got to spend a little time reading about a new data structure, a Trie. A Trie is a tree-like structure optimized for storing strings.  The root node of the tree represents the empty string and each of its children represents the first letter of a word.  Every child of these nodes is the next letter in a word starting with that substring.  


For example, a trie of nodes for the words "an apple yum" would look like:

So the trie solution for finding words that begin with a certain prefix is:



The suffix version is just a couple more lines with judicious use of the reversed function.

Conclusion and Next Steps

I think that Words with Friends is a lot more fun now that I have my computer as a crutch.  I've enjoyed playing with some simple algorithms for finding algorithms and looking up strings.

My next steps after today are going to be putting up my utilities here on my website so that I can share them with my non-programmer friends.  Building a simple web service for solving/cheating at scrabble is a hopefully interesting topic in its own right so building that will probably be my next blog post.

After that I'm interested in taking this to its logical conclusion and actually building (and blogging about) a scrabble solver where you can enter in the whole state of the scrabble board and your hand and have it tell you all your possible moves (or at least the subset of moves that give you higher scores).

If you made it this far you might be interested in the actual tested source code here on github.


1
Larry Wall calls this the first great virtue of a programmer.  
2
Perhaps a more honest heading here would be "Cheating at Words With Friends" :-)











Saturday, January 21, 2012

Emacs Archaeology



Recently a friend asked me for my Emacs configuration. RMS would excommunicate me from the church if I didn't make my configuration available to any who asked so, of course, I started packaging it up and shipping it off to my friend. However, to help my friend it seemed reasonable to explain what all I've accumulated over the last few years, highlight some areas she might want to customize differently, and to point out some modes and tools I find particularly helpful in my day to day work.

After opening up a terminal and poking around in my ~/etc/emacs.d directory I started to realize that after all this time I wasn't entirely familiar with my Emacs configuration either. Of course, no one has any geek cred at all if they can't explain how their editors configured so I started digging in to remind myself what the heck all this stuff does.  
Here's a few numbers. Over the years I've written ~1500 lines of elisp code spread across 27 *.el files and accumulated another ~60,000 lines of dependencies (mostly downloaded from EmacsWiki and/or Emacs Fu).

In any event, over the next few weeks I'll be cleaning things up a bit and posting more about some of the things about Emacs I really love.

If anyone's interested in my current configuration I have a bitbucket repo (maybe I should move it to
github?) here (https://bitbucket.org/steder/dotfiles) with all of my
dotfiles including the emacs stuff if you'd like to take a look.



Monday, December 19, 2011

Recipe: Programmatically Creating and Updating AWS security groups

I think I've rewritten this code 3 times now in the last year so it seems prudent to save it somewhere.  If other folks find it useful that'd be great.

The problem is a simple one.  You're looking to setup and install of a few machines on EC2, perhaps to run something fun like a Cassandra cluster.

Typically it's really tempting to just setup the security group once and never ever touch it again.  I'd log into the the AWS console, and following along with this datastax guide I would manually setup the group, launch instances, etc.

However, without automation there's some duplication of effort whenever someone on your team sets up a cluster and possibility for user error setting up security groups.  And of course we're already automating the other important bits like "launch a new instance" or "run a backup" already so why not manage security groups with the same scripts?

I'm currently working with Fabric to automate EC2 stuff so I pulled out the Python code I'm using to handle creation of security groups and permission rules within those groups.

The script attempts to be idempotent.  The idea here is that simply rerunning the script will, only if necessary, create groups, revoke old rules and authorize any new ones.

Anyway, without further ado here's the script:


Thursday, September 15, 2011

Simple Nosy Script: Personal Continuous Integration for TDD

I recently cleaned up and resurrected my old nosy script.  These days there are a few alternatives on PyPI as well though I prefer mine.  The whole concept of nosy is simply to rerun the tests whenever the code changes.  Personally, I find a script like this is really helpful for maintaining flow while doing test driven development.

What I like about this nosy script is that it allows me to basically just say "run this nosetests / trial command every time the code changes" and nothing else.  There's no config file to setup or any tool specific arguments.  You just need to know how to use your test runner.

Here's the code in case anyone is interested: https://gist.github.com/1220683

Oh, and despite the name I've used it successfully to work on twisted projects with trial.  I'm also willing to bet that it would work just fine with py.test.

Edit:  The polling loop with a time.sleep(1) is eating away at me now that I've posted this.  I'm thinking that the only way to live with myself is to replace that with listening for real filesystem events ala inotify... So do ease my conscience I'll see about doing a followup post to show what the script would look like with filesystem events instead of the polling loop..

Tuesday, September 13, 2011

Git Pre-commit Hook For Python now with extra Python!

After reading a post by Bryce Verdier, and inspired by comments that suggested the Python version of said hook script would not be as nice as the bash version I decided to hack up a quick python version of the same script using pyflakes instead of pylint.

#!/usr/bin/env python
#-*- mode: python -*-

from subprocess import Popen, PIPE
import sys

syntax_checker = "pyflakes"

def run(command):
    p = Popen(command.split(), stdout=PIPE, stderr=PIPE)
    p.wait()
    return p.returncode, p.stdout.read().strip().split(), p.stderr.read()

_, files_modified, _= run("git diff-index --name-only HEAD")

for fname in files_modified:
    if fname.endswith(".py"):
        print >>sys.stderr, "Checking syntax on %s: ... "%(fname,),
        exit_code, _, errors = run("%s %s"%(syntax_checker, fname))
        if exit_code != 0:
            print >>sys.stderr, "\rChecking syntax on %s: FAILED! \n%s"%(fname, errors)
            sys.exit(exit_code)
        else:
            print >>sys.stderr, "\rChecking syntax on %s: OK!"%(fname,)

You can download / fork this here if would like to give it a try: https://gist.github.com/1214061 And of course, if you're like me and you have no idea what to do with this script you can just do the following:

cp pre-commit.py YourGitProject/.git/hooks/pre-commit
chmod +x YourGitProject/.git/hooks/pre-commit

It's also worth noting that this version is currently really strict.  ANY warnings will cause your commit to fail.  Of course, replacing pyflakes with pylint again is a simple modification of the syntax_checker variable in the above script.


Thursday, August 18, 2011

Installing Xcode 3 on Lion


Much thanks to AnatomicWax!

Just reposting here:
  • Mount the Xcode 3.2.6 DMG
  • Open Terminal
  • Enter the command: export COMMAND_LINE_INSTALL=1
  • Enter the command: open “/Volumes/Xcode and iOS SDK/Xcode and iOS SDK.mpkg”
And on a related note, when I screwed up experimenting with Xcode3 and 4 removing them became an issue:  This mac developer tip worked like a charm though:


sudo <Xcode>/Library/uninstall-devtools --mode=all

Monday, May 9, 2011

Setting up a new macbook for work

First day at the new job and I'm setting up a new macbook pro (macbookpro 8,2) for work.

Here's a list of things I'm doing to setup this machine as a reference for future installations:

* Ran "Software Update"
* Installed XCode (3.x because it's free and is included on your install media) and X11
* Installed Chrome
* Installed Firefox
* Installed Opera

Break for lunch, frustrating phone call with fifth third bank, and reset my MCS account passwords...

* After debating using macports, fink, or homebrew decided to give homebrew a try.
* brew install python (which installs readline, gdbm, and python2.7.1).
* Added /usr/local/share/python to PATH
* easy_install pip
* pip install virtualenv virtualenvwrapper mercurial
* hg clone https://bitbucket.org/steder/dotfiles etc
* cd etc && ./setup_links_mac.sh
* Added path customizations for homebrew to ~/.bashrc
* brew install emacs --cocoa --with-x
* ln -s /usr/local/Cellar/emacs/23.3/Emacs.app /Applications/
* brew install subversion
* Didn't find grep in the default homebrew (they include system duplicates by default) but apparently you can just point homebrew at a recipe on some random site and it'll install it... Not sure I'm happy with this but that's probably the reason homebrew runs as a non-root user... Basically you can do: sudo brew install https://github.com/adamv/homebrew/raw/duplicates/Library/Formula/grep.rb
* generated an ssh key: ssh -t rsa -b 4096
* brew install ssh-copy-id
* enabled sshd in the sharing control panel
* ssh-copy-id msteder@localhost
* sent authorized keys files to systems guys for setup on the MCS resources.
* brew install wget
* brew install proctools
* brew install openssl
* Installed Dropbox
* Installed http://www.levien.com/type/myfonts/inconsolata.html

I think that may be just about enough setup for today. I'm thinking next steps are going to be installing AWS tools like s3cmd, boto, etc.