Shriphani Palakodety

In Pursuit Of Truth and Beauty

Shriphani Palakodety header image 2

Boyer – Moore String Search (Good Suffix Shift)

December 7th, 2008 · 1 Comment · Computer Science, python

The next string search algorithm I am going to cover seems pretty cool since the match is made in a reverse fashion. We check if the last character in the keyword equals some character in the text and decrease from that point on.As a result, a mismatch is detected extremely fast and this is also the reason why this algorithm is performs beautifully when the keyword is longer.

Now, when a mismatch occurs, the position where we look for the next character is determined by a table of values mapping each character in the keyword to an integer. Example, we find an “a” in the text which at a particular state is sitting atop an M in the keyword, we shift the position in the text by the corresponding number in our table and look again.

Here is how we generate the table:

bcs = {} #the table

def goodSuffixShift(key):

        for i in xrange(len(key)-1, -1, -1):
                if key[i] not in bcs.keys():
                        bcs[key[i]] = len(key)-i-1
 

Next, comes the actual search:

def search(text, key):
       
        i = len(key)-1

        index = len(key) -1

        j = i

        while True:

                print str(i) + ", " + str(j)

                if i == 0:
                        return j #if every character in the keyword matches, return current position in text

                elif j > len(text):
                        return "not found" #if we run out of the string while searching, there is no chance…

                elif text[j] != key[i] and text[j] not in bcs.keys():
                        j += len(key) #If there isn’t a match and we don’t have a switch value in our table, don’t bother looking for characters in the middle and start matching again
                        i = index

                elif text[j] != key[i] and text[j] in bcs.keys():
                        j += bcs[text[j]] #move forward as table says and start looking again
                        i = index

                else:
                        j -= 1 #in case a match is found, check for the character to the left
                        i -= 1

 

Here are some sample runs:

key_list = ["POWER", "HOUSE", "COMP", "SCIENCE", "SHRIPHANI", "BRUAH"]

text = "SHRIPHANI IS A COMPUTER SCIENCE POWERHOUSE"

for key in key_list:
        goodSuffixShift(key)
        print key + " location: " + str(search(text, key))
        bcs = {}
 

The output:

tark-b-120:scripts shriphani$ python boyer_moore.py
POWER location: 32
HOUSE location: 37
COMP location: 15
SCIENCE location: 24
SHRIPHANI location: 0
BRUAH location: not found

You can get the entire script at http://shriphani.com/scripts/boyer_moore.py.

Got to get back to Psychology and GUI stuff in Java. Anyway, GUIs are for losers, CLI FTW !!!

Tags:

One Comment so far ↓

  • meade

    There’s a problem with the code if you search for a single character, ‘A’ for example. I changed the following to make it work:
    while True:
    if i < 0: #changed
    return j + 1 #changed

    in the search function

Leave a Comment