An Anagram Detector

Yesterday, I tackled the last problem in the “Aha! Algorithms” chapter of Jon Bentley’s “Programming Pearls”, an anagram detector. It takes a file consisting of words separated by a line and return lists containing anagrams separated by a line. It does not construct anagrams though and just hunts down a file for them.

The first step in building this includes an element of identity common to anagrams. This is called a signature. All anagrams belonging to the same class have the same signature. I chose my signature to be a string that contains letters followed by the number of times they appear in the word (the letters are in alphabetical order). Example:

"alphabet" - "a2b1e1h1l1p1t1"

The next step involves using a script that will bring together words with the same signature and these obviously are anagrams. Finally, a function prints these anagrams.

This portion of Python code generates the signature:

def sign(word):
        char_list = list(word)
        char_list.sort()
        word_list = []
        for char in char_list:
                if not char in word_list:
                        word_list.extend([char, str(word.count(char))])
                else:
                        continue
        signature = "".join(word_list)
        return signature

Next comes the function that is supposed to bring the anagrams together (so to speak):

def anagramCollect(filename):
        """Obtain Sinatures and Create a Dict.
        key = the signature
        value = a list of words with same signature"
""
        f = open(filename, "r")
        ana_dict = {}
        for word in f:
                signature = sign(word[:-1])
                if ana_dict.has_key(signature):
                        ana_dict[signature].append(word[:-1])
                else:
                        ana_dict[signature] = [word[:-1]]
        f.close()
        return ana_dict

In the above function, I used word[:-1] to avoid appending the newline character that seems to be unavoidable while using “for word in f:”.

Finally, a function to check the anagram dictionary for anagrams (should include those values whose length is more than 1) and to print them out.

def printAnagram(ana_dict):
        ana_list = ana_dict.values()
        for value in ana_list:
                if len(value) > 1:
                        print value
                else:
                        continue

I decided to test this script on a file containing words separated by lines. I used the following words:

pans
pots
opt
snap
stop
tops

When I ran the script, I got:

 python anagram.py
['pans', 'snap']
['pots', 'stop', 'tops']

So, it works! The script finally looks like this.

Well, Happy Coding :D

7 comments ↓

#1 Mike Pirnat on 06.14.08 at 12:46 pm

To get around the repeated slicing of the last character in your word, you could always do:

for word in f:
word = word.strip()

#2 DiThi on 06.16.08 at 2:58 am

Here’s my signature maker:

def sign(word):
x=list(word)
x.sort()
return ”.join(x)

It’s a different format of signature but I’m sure it’s faster.

#3 Ben on 06.16.08 at 11:47 am

@DiThi: good call.

If you’re using a recent Python, consider just doing: “”.join(sorted(WORD)).

#4 J on 06.22.08 at 2:38 am

To be even lazy, we can do:

from itertools import groupby

def sign(xs):
return “”.join(x + str(len(list(g)))
for x, g in groupby(sorted(xs)))

#5 Justin on 06.24.08 at 7:15 pm

def sign(word):
#can’t hash a list
return tuple(sorted(word))

def anagramCollect(filename):
f = open(filename, “r”)
ana_dict = {}
for word in f:
word = word.rstrip()
signature = sign(word)
ana_dict.setdefault(signature,[]).append(word)
f.close()
return ana_dict

#6 Maarten van Emden on 07.12.08 at 7:00 pm

I found your blog when searching for the source of the anagram program I vaguely remembered. So it was Bentley. Thanks. I also found http://pramode.net/2005/11/18/teaching-tips-the-anagram-problem-solved-in-unix-style/
Programming it is fun, but it is also fun to do it without programming, which is what Bentley showed. Bentley tells how Tom Cargill (another person worth knowing) describes the method: “first sort this way (moves hand from left to right), then sort that way (moves hand up and down).

#7 Shriphani’s Weblog - Huge Anagram List on 08.21.08 at 10:51 pm

[...] got on this blog) over to my account. Then, I ran a very huge file full of words through my anagram detector and the results were astonishing, a record 1.93 seconds to parse over 10000 words and pick [...]

Leave a Comment