Suffix Trees

Since I was quite bored this week (thanks to an October break I spent entirely in my room) and a homework assignment that quite seriously sucked balls, I decided to read something about strings and read about Suffix trees and decided to whip up an algorithm that would create a neat suffix tree by walking over a string and failed incredibly to get it to split at the right positions. Which means that while the suffix tree for “mississippi” is:

       |(1:mississippi)|leaf
tree:|
      |     |       |(6:ssippi)|leaf
      |     |(3:ssi)|
      |     |       |(9:ppi)|leaf
      |(2:i)|
      |     |(9:ppi)|leaf
      |
      |     |      |(6:ssippi)|leaf
      |     |(4:si)|
      |     |      |(9:ppi)|leaf
      |(3:s)|
      |     |     |(6:ssippi)|leaf
      |     |(5:i)|
      |     |     |(9:ppi)|leaf
      |
      |     |(10:pi)|leaf
      |(9:p)|
      |     |(11:i)|leaf

all I can get is a single level tree and I have no motivation whatsoever to get it to work correctly considering that we are covering computational complexity theory and that kenneth rosen makes it so hard for mankind to put his book down. By the way, here is my partial (decent word for “wrong”) suffix tree generator:

#!/usr/bin/env python
#Author: Shriphani Palakodety
#Mail: shriphani@shriphani.com
import sys
tree = {}

#Algorithm:
#Move through the string and pick characters and make splits where appropriate.

def parseString(stringName):
    stringName = stringName.lower() #Make it all bloody lowercase
    for i in xrange(len(stringName)):
        if tree.has_key(stringName[i]):
            appendChar(stringName[i],[stringName[i+1:]]) #Add a new branch if this character didn’t exist previously
        else:
            tree[stringName[i]]=[stringName[i+1:]]
    return tree

def appendChar(char, stringName):
    ‘Add a new branch to a tree’
    tree[char].append( stringName[0])

if __name__ == "__main__":
    stringName = sys.argv[1]
    print parseString(stringName)

And this is what I get for a bunch of strings:

shriphani@linux-s359:~/scripts/Strings> python suffix.py mississippi
{'i': ['ssissippi', 'ssippi', 'ppi', ''], 'p': ['pi', 'i'], 's': ['sissippi', 'issippi', 'sippi', 'ippi'], 'm': ['ississippi']}

shriphani@linux-s359:~/scripts/Strings> python suffix.py shriphani
{'a': ['ni'], 'i': ['phani', ''], 'h': ['riphani', 'ani'], 'n': ['i'], 'p': ['hani'], 's': ['hriphani'], 'r': ['iphani']}

shriphani@linux-s359:~/scripts/Strings> python suffix.py palakodety
{'a': ['lakodety', 'kodety'], 'e': ['ty'], 'd': ['ety'], 'k': ['odety'], 'l': ['akodety'], 'o': ['dety'], 'p': ['alakodety'], 't': ['y'], 'y': ['']}

shriphani@linux-s359:~/scripts/Strings> python suffix.py scrappycoco
{'a': ['ppycoco'], 'c': ['rappycoco', 'oco', 'o'], 'o': ['co', ''], 'p': ['pycoco', 'ycoco'], 's': ['crappycoco'], 'r': ['appycoco'], 'y': ['coco']}

Bingo.

Anyway, I was listening to a bunch of Erik Demaine lectures courtesy OCW (and I am extremely jealous of Demaine, he got his BS at age 14 !) and I sort of wondered when he mentioned that the Big O didn’t work when we had a chain of functions such as n2 = O(n3) = O(n4) but this doesn’t carry on indefinitely.

I couldn’t help wondering:
big-o-vs-small-o

So n = o(n2) and then I am left wondering, but I wrote n = O(n2);c=1, n0=1 and here I am stating that I have n is << n^2 so it is no longer an asymptotic upper bound…. So my question is, am I getting everything wrong here?

Leaving all that aside, I am setting new records for low personal hygiene standards (although I am quite careful about dental, hand and the rectal departments)… college life rocks….


About this entry