Weblog of an Aspiring Computer Scientist
Random header image... Refresh for more!

Category — python

Soundex

I remembered reading about the soundex code used to encode similarly sounding names or names with similar spelling so that they could be retrieved easily.

This is what the soundex code for a word consists of:

  • The first letter of the name.
  • The letters A, E, I, O, U, Y, W and H are never used in encoding (except when they are the first character in a word)
  • The following table lists the character codes:
  • Code Key Letters
    1 B, P, F, V
    2 C, S, K, G, J, Q, X, Z
    3 D, T
    4 L
    5 M, N
    6 R
  • Letters are not encoded if the preceeding letters have the same code
  • Codes are truncated after the third digit.
  • Trailing zeros are appended as needed.

So, here is the snippet that should generate a soundex code for a given word:

code_dict = {‘A’:"0", "E":"0", "I":"0", "O":"0", "U":"0", ‘W’:"0", ‘H’:"0", ‘B’:1, ‘P’:1, ‘F’:1, ‘V’:1, ‘C’:2, ‘S’:2, ‘K’:2, ‘G’:2, ‘J’:2, ‘Q’:2, ‘X’:2, ‘Z’:2, ‘D’:3, ‘T’:3, ‘L’:4, ‘M’:5, ‘N’:5, ‘R’:6}

def codeGen(word):
word = word.upper()
code = [word[0]]
for i in range(1, len(word)):
if code_dict[word[i]] != "0":
if word[i] != word[i-1] and code_dict[word[i]] != code_dict[word[i-1]]:
code.append(str(code_dict[word[i]]))
else:
continue
else:
continue

if len(code) <= 4:
code += ["0"] * (4 - len(code))

else:
code = code[0:4]

return "".join(code)

#Tests
print codeGen("SCHAAK")
print codeGen("Lee")
print codeGen("Kuhne")
print codeGen("EBELL")
print codeGen("ebelson")
print codeGen("SCHAEFER")

The output:

$ python soundex.py
S200
L000
K500
E140
E142
S160

This problem was part of the Berkeley Programming Contest held in 2007 at the University of California Berkeley.

June 27, 2008   No Comments

An Anagram Detector

Yesterday, I tackled the last problem in the “Aha! Algorithms” chapter of Jon Bentley’s “Programming Pearls”, an anagram detector. It takes a file consisting of words separated by a line and return lists containing anagrams separated by a line. It does not construct anagrams though and just hunts down a file for them.

The first step in building this includes an element of identity common to anagrams. This is called a signature. All anagrams belonging to the same class have the same signature. I chose my signature to be a string that contains letters followed by the number of times they appear in the word (the letters are in alphabetical order). Example:

"alphabet" - "a2b1e1h1l1p1t1"

The next step involves using a script that will bring together words with the same signature and these obviously are anagrams. Finally, a function prints these anagrams.

This portion of Python code generates the signature:

def sign(word):
        char_list = list(word)
        char_list.sort()
        word_list = []
        for char in char_list:
                if not char in word_list:
                        word_list.extend([char, str(word.count(char))])
                else:
                        continue
        signature = "".join(word_list)
        return signature

Next comes the function that is supposed to bring the anagrams together (so to speak):

def anagramCollect(filename):
        """Obtain Sinatures and Create a Dict.
        key = the signature
        value = a list of words with same signature"
""
        f = open(filename, "r")
        ana_dict = {}
        for word in f:
                signature = sign(word[:-1])
                if ana_dict.has_key(signature):
                        ana_dict[signature].append(word[:-1])
                else:
                        ana_dict[signature] = [word[:-1]]
        f.close()
        return ana_dict

In the above function, I used word[:-1] to avoid appending the newline character that seems to be unavoidable while using “for word in f:”.

Finally, a function to check the anagram dictionary for anagrams (should include those values whose length is more than 1) and to print them out.

def printAnagram(ana_dict):
        ana_list = ana_dict.values()
        for value in ana_list:
                if len(value) > 1:
                        print value
                else:
                        continue

I decided to test this script on a file containing words separated by lines. I used the following words:

pans
pots
opt
snap
stop
tops

When I ran the script, I got:

 python anagram.py
['pans', 'snap']
['pots', 'stop', 'tops']

So, it works! The script finally looks like this.

Well, Happy Coding :D

June 14, 2008   5 Comments

Binary Search and Vector Rotation - Jon Bentley Inspired

Since I got my copy of “Programming Pearls” by Jon Bentley, I have hardly been able to enjoy the excellent writing skills of Jon Bentley. I finally got some time to read, “Aha! Algorithms” from his book and I encountered a section on “Binary Search”. I read somewhere that during a lecture at CMU, Jon said that most binary search implementations were broken. After reading a description of the problem from the book, I decided to whip up an implementation myself. Binary search involves probing the range to see on which side of the median, the number lies and the new range is that half. This process is repeated until the value is located. So, here is a very short binary search I wrote:

def search(num_list, num, begin, end):
        if end< begin:
                return "Number not found in list"
        pos = (begin+end)/2
        if num_list[pos] > num:
                return search(num_list, num, begin, pos - 1)
        elif num_list[pos] < num:
                return search(num_list, num, pos+1, end)
        else:
                return pos

num_list = [1, 2, 4, 5, 6, 8, 11, 34, 72, 112, 134, 145, 156, 167, 178, 189, 190, 210, 321, 432, 543, 654, 765, 876, 987]
value = 112

print search(num_list, value, 0, len(num_list))

I ran the script and this is what I got:

 python binsearch2.py
9

Another attempt:

num_list = [1, 2, 4, 5, 6, 8, 11, 34, 72, 112, 134, 145, 156, 167, 178, 189, 190, 210, 321, 432, 543, 654, 765, 876, 987]
value = 35
print search(num_list, value, 0, len(num_list))

The output:

python binsearch2.py
Number not found in list

The next problem in the same chapter involved rotating vectors. A beautiful algorithm was described in the book. To rotate “i” elements out of “n” in an array, the best solution I could think of was to throw i elements in a separate list and keep appending them in reverse order to a new list which contains the (n-i) elements. As usual, Jon Bentley had a better algorithm. He considered the list to be two groups “a” and “b”. So our list can be represented by “ab”. We need to obtain “ba”. If we denote a’s reverse as {a}^{r} and b’s reverse as {b}^{r}, then we get, ({{a}^{r} {b}^{r}})^{r} = ba. So, I implemented this algorithm in a few seconds using:

def rotate(list_name, i):
        ‘Rotate i elements of the given list’
        a = list_name[:i]
        b = list_name[i:]
        a.reverse()
        b.reverse()
        final_list = a+b
        final_list.reverse()
        return final_list

This is how it works:

print rotate([‘a’, ‘b’, ‘c’, ‘d’, ‘e’, ‘f’, ‘g’, ‘h’, ‘i’, ‘j’], 3)

The solution:

 python vectorrot.py
['d', 'e', 'f', 'g', 'h', 'i', 'j', 'a', 'b', 'c',]

.

Well, there’s another problem that I shall solve later. The book is really a masterpiece. I had a sneak-peek at what was to follow the next problem and it turns out its an anagram constructor! Wee! I can foresee a fun filled week ahead!

EDIT: The binary search implementation now terminates when it can’t find the value in the list.

June 12, 2008   5 Comments

Observations About Strings

I have recently added the podcasts on http://blog.stackoverflow.com to my feeds and in one of the episodes (second, I think), someone called in and asked Joel Spolsky whether he should learn C++ vs something else and aside the OOP what differentiated C and C++. Joel explained the benefits of C to Jeff and mentioned an example whose shaky reproduction goes like this, “If you are asked to write a script to replace every “x” in a string with a “y”, then the first solution most high level programmers would think of is to go over all the characters in the string and when they find an “x”, they append a “y” to the new string or they append the currently existing letter. They just don’t know how inefficient this is.”

I don’t know C (but I am learning it now) but I guess he might be right. I wrote a recursive solution for someone on orkut who had problems with it and I employed such a method (but considering that Python strings are immutable, I was forced to use a list and finally do a “”.join(string_list) ). Listening to that podcast made me think about other ways of doing that and obviously slicing a string comes to mind. I cannot plainly replace positions in Python strings (they’re immutable) and converting it to a list, going over replacing and then rejoining it sort of defeats the purpose of setting out to optimize that code. Here is the solution I whipped up for someone called Hari Priya on Orkut:

def changeXY(string_name, index=0, final_string=[]):
        if index == len(string_name):
                return "".join(final_string)
        else:
                if string_name[index] == "x":
                        final_string.append("y")
                        index += 1
                else:
                        final_string.append(string_name[index])
                        index += 1
                return changeXY(string_name, index, final_string)
 

Clearly, not the best, there’s a load of repetition there and all that but since I am too bored to fix that up, I will proceed with my next solution.

#!/usr/bin/env python
#Author: Shriphani Palakodety
#Filename: changeXY2.py

def changeXY(string_name):
        slices_list = []
        places = string_name.count("x")
        start = 0
        for i in xrange(places):
                position = string_name.find("x", start)
                slices_list.append(string_name[start:position]+"y")
                start = position+1
        slices_list.append(string_name[start:])
        return "".join(slices_list)
 

That is the solution where I construct portions of the string and join them at the end. Since the only idea I’ve got about efficiency is limited to checking the runtime, I did just that.

Step 1: Pick a string. I picked

boneywasawarriorwayayixkickedasterixandobelixbutt

.

Here is a time based comparision:

time python changeXY1.py
boneywasawarriorwayayiykickedasteriyandobeliybutt

real    0m0.301s
user    0m0.077s
sys     0m0.170s

That was the time taken for the recursive solution to replace all “x”s with “y”s.

time python changeXY2.py
boneywasawarriorwayayiykickedasteriyandobeliybutt

real    0m0.255s
user    0m0.077s
sys     0m0.124s

That was the time the second solution using slicing took to run. It took less time and hence it was more efficient than the recursive solution.

Did that deserve an entire blog post? Probably not. Still, stackoverflow’s podcasts are pretty good and any advice from Joel Spolsky is good advice. I might drop in a question but I have no idea what to ask him.
Well, Happy Coding :D

June 9, 2008   9 Comments

220, Anna Salai - Hurdle Cleared

Woohoo, I’ve managed to clear the hurdle at 220 Anna Salai. But let me tell you guys, Chennai can drive anyone mad. Here is what I recollect from my trip (I do not have a camera so no pics):

  • We drop in at a 3 star hotel named Dee Cee Manor ( who rates these hotels by the way? 3 stars is like overrating this establishment’s ridiculous rooms/food ).
  • The autos suck. Oh yes! Spencer’s Plaza is a cool shopping center and it is about 2 kilometers from Dee Cee Manor. The first auto driver I met said 120 rupees. F*ck, 120 ? For 2 kilometers? Right, I ask the bugger to put on the meter. He says no. The next guy I meet charges me 60. I climb in and realize that auto drivers in Hyderabad are better than most others
  • I then purchased two albums I had been dying to get my hands on, Paul McCartney’s Memory Almost Full and the #1 Beatle’s (I spent all my music aware days listening to the Beatles).
  • I then go to the consulate and the doors impress me.
  • The VO looks at my i20 and scribbles on my DS156 and then says in 1 week, your passport will be with you. Wee ! No questions !
  • Then I went to Tirupathi ( no stupid comments on my religious beliefs. My faith is my own ).
  • I took a train to the aforementioned location. A marwari family climbed and make a mess of everything, moving people around to accommodate that king sized, 20 member family of theirs. Bah, they kept shouting, put on music at 100 decibels ( and promptly earned abuse from my mother ). I know a marwari family myself and they are bloody educated and know how to behave in public. I still think people in our nation need to understand the notion of Public property and how all citizens are entitled to silence and deserve as much space as they have paid for ( which is limited to one seat in this case ).
  • I then realized that Tirupathi wasn’t something I could survive. Did nothing there.

Oh by the way, I solved a problem ( albiet an easy one ) from the YSU-ACM High School Programming Contest - 1991 ( The best I could perform when the silence was under siege by this bloody family) on the way to Tirupathi. Here is the question: http://www.cs.berkeley.edu/~hilfingr/programming-contest/ysu_94.pdf (the inverted triangles sum).

Here is the solution:

#!/usr/bin/env python
#Author: Shriphani Palakodety
#YSU-ACM programming contest
#1991 PSet
#Premature optimization is the root of all evil.
#        - Shriphani Palakodety ( 2008 A.D )
level = "To be given"
level_list = []

while True:
        if level != "":
                level = raw_input()
                level_list.append(level)
        else:
                break

level_list.pop(-1)

def printTriangle(level):
        for i in range(int(level), -1, -1):
                print "*" * i
               

for x in level_list:
        printTriangle(x)
 

Done. Expect an increase in post frequency now. Phew, the consulate appoinment gave me nightmares.

June 1, 2008   No Comments

A Few More Math Sums; Combinatorics and Triangular Numbers

I solved a couple of sums from the Project Euler archives and feel quite stupid since this is the first time in the last few weeks that a post has some code in it. I really ought to improve my efficiency. Well, the first sum I did was no. 42 and extremely special since it is the ultimate answer to the ultimate question. Well, the question was:

By converting each letter in a word to a number corresponding to its alphabetical position and adding these values we form a word value. For example, the word value for SKY is 19 + 11 + 25 = 55 = t10. If the word value is a triangle number then we shall call the word a triangle word.

Using words.txt (right click and ‘Save Link/Target As…’), a 16K text file containing nearly two-thousand common English words, how many are triangle words?

So, I whipped up a solution in no time. I was only hindered when it came to figuring out whether a number was a perfect square or not and I was trying to explore better algorithms that mine which involved squaring the integral part of the square root of type float obtained by using math.sqrt(num) and squaring this integral part to see if we return to the original number again. I couldn’t really find a decent solution so I just implemented my own answer:

#!/usr/bin/env python
#Author: Shriphani Palakodety
#Example 42, Project Euler Archives.

import string
import math

def makeAlphaDict():
        alpha_dict = {}
        for i in range(len(string.uppercase)):
                alpha_dict[string.uppercase[i]] = i + 1
        return alpha_dict

alpha_dict = makeAlphaDict()

def checkTriangular(word):
        word_sum = 0
        for char in word:
                if char == ‘"’:
                        continue
                else:
                        word_sum += alpha_dict[char]
        discrmnt = 8 * word_sum + 1
        if checkOdd(int(math.sqrt(discrmnt))) == True and checkPerfSquare(discrmnt) == True:
                return True
        else:
                return False
       
def checkPerfSquare(num):
        probable = int(math.sqrt(num))
        if probable * probable == num:
                return True
        else:
                return False

def checkOdd(num):
        if num % 2 == 0:
                return False
        else:
                return True

input = open("words.txt", "r")
words_list = input.readlines()[0].split(",")
input.close()
word_count = 0

for word in words_list:
        print word
        if checkTriangular(word) == True:
                word_count += 1
        else:
                continue

print word_count
 

Thats the first question. The next problem I solved was #53;

How many values of nCr, for 1 ? n ? 100, are greater than one-million?

Since \binom {n} {r} is a symmetric function about the middle term while r increases from 0 to n, I figured out that once we hit upon one (n, r) pair, without computing the rest, we could obtain the other values of r for a fixed value of n that crossed the million mark. Remember, if n is even, the number of terms is odd and if n is odd, number of terms is even. In case the number of terms is even, the middle terms are n / 2 and n / 2   1. In case, the number of terms is odd, the middle term is (n   1) / 2. Using just these three I could get to the solution in no time. Here is the code:

#!/usr/bin/env python
#Author: Shriphani Palakodety
#Project Euler Problem 53

def factorial(number):
        if number == 0 or number == 1:
                return 1
        else:
                return number * factorial(number - 1)

def combination(n, r):
        n_fact = factorial(n)
        r_fact = factorial(r)
        n_r_fact = factorial(n - r)

        return n_fact / (r_fact * n_r_fact)

def midTerm(n):
        if n % 2 == 0:
                mid_term_1 = n / 2
                return mid_term_1
        else:
                mid_term = (n + 1) / 2
                return mid_term

comb_num = 0

def countComb(n, r):
        ‘If a particular value is found to be greater than 0,
           it adds a certain number to comb_sum using properties of
           combinatorics.’

       
        if n % 2 != 0:
                num_to_add = 2 * (n / 2 - r + 1)
                return num_to_add
       
        else:
                num_to_add = 2 * ((n+1)/2 - r) + 1
                return num_to_add

for n in range(101):
        for r in range(midTerm(n) + 1):
                if combination(n, r) > 1000000:
                        comb_num += countComb(n, r)
                        break
                else:
                        continue

print comb_num

Thats the solution. And of course, who’ll forget the time taken to get to the solution.

shriphani@psp-laptop:~/project_euler$ time python ex53.py
4075

real    0m1.116s
user    0m0.140s
sys     0m0.024s
shriphani@psp-laptop:~/project_euler$

Till the next time I put up a post, HAPPY CODING !

May 18, 2008   No Comments

Python Variables

For a very long time, I was under the impression that variables in Python were like boxes and I could throw objects in them. So, according to me, if I wrote a piece of code that went:

>>> a = "Shriphani Palakodety"
>>> print a
Shriphani Palakodety

I used to think that there is a new box and I’ve thrown the string “Shriphani Palakodety” in there. If I do:

>>> a = "Shriphani Palakodety"
>>> a = "Doofus"
>>> print a
Doofus

the general idea is that “Doofus” replaces “Shriphani Palakodety” in box “a”.

I was reading a mail on Edu-Sig (Python) and John Zelle ( he wrote an introductory computer science book ) mentioned that variables were not boxes but merely labels ( or tags, or “Sticky Notes” whatever you want to call them ). So, if I do something like:

>>>a = "Shriphani Palakodety"

All I am doing is sticking “Sticky Note” “”a”" on an object ( in this case, “Shriphani Palakodety” ). When I do something like:

>>>a = "Shriphani Palakodety"
>>>a = "Doofus"

all I have done is move the Sticky Note “a” to “Doofus” and not remove “Shriphani Palakodety” from box “a” and put “Doofus” in it. Here is my idea of a verification:

>>>a = "Shriphani Palakodety"
>>>print id(a)
10353360
>>>a = "Doofus"
>>>print id(a)
10386016

So, if a was a box, its location shouldn’t have changed. But we observe that a’s location changes.

But the Sticky Note analogy can’t be entirely applied to variables. In real life, I can stick one sticky note over another and I can have three sticky notes such that one is stuck to the object and the other two are stuck to the sticky note itself. The figure below should illustrate that:

3 Sticky Notes Attached One Above Another

The above picture should depict the following Python code:

>>> x = y = z = 0
>>> x
0
>>> y
0
>>> z
0

So far so good. But let us say, I want to remove the Sticky-Note “y”. In real life that move would be represented by:

\

This means that “z” would be affected too which is not how it works in case of Python. Hence we come to the conclusion that a sticky note should be stuck only on the object and not on another sticky note.

OR

Python’s variables ( Sticky Notes ) come with a special kind of glue that isn’t sticky on material from which the sticky notes are made. So “x”, “y” and “z” should be stuck to the object but not to one another.

This completes what I learned from the Edu-Sig mailing list so far and I hope I joined this list earlier. John Zelle ( who wrote the textbook that MIT now uses for its introductory CS ) is there and I hope to keep learning more.

Well, I would love comments that will improve my knowledge but a comment on my bad GIMP skills will earn you spam and of course, I will call you names as well.

May 11, 2008   2 Comments

Prime Minister’s Office - Comic Sans ?

In the recent controversy in the Indian government, a politician seems to have abused his power by assigning tenders ( ? ) to his son. A news channel showed us a letter yesterday that seems to have originated from the Prime Minister’s office concerning this and as the newsreader shouted at the top of his voice ( I hate all Indian news channels because the readers either smile while delivering news like mass murder or a large number of deaths and they shout at the top of their voice. This is probably what they learned at a young age when their parents told them to speak at the top of their voice in a debate. I just hate them ). A surge of anger cruised through me when I saw that letter. IT WAS WRITTEN IN COMIC SANS !

Which professional uses comic sans and that too to draft a letter from the Prime Minister’s Office ? Comic sans is the least professional typeface on the planet and using it to draft formal letters ? I want the PMO to give a public apology for their careless use of typefaces.

Right, in some other news concerning what I love - programming, I have managed to solve another sum from the North America, East/Central 2007 problems set. This problem was about a combination lock with a few conditions as follows:

1. The lock dial must first be spun clockwise at least one full rotation, ending with the number x at the top (with no intervening counterclockwise turns). Note this could be accomplished with consecutive clockwise turns.
2. The lock must be turned counterclockwise until the number y appears at the top for the second time. Note this could be accomplished with consecutive counterclockwise turns (but no intervening clockwise turns).
3. The lock must then be turned clockwise until the number z appears on top, without going more than one full rotation. Note this could be accomplished with consecutive clockwise turns (but no intervening counterclockwise turns).

I wrote this to solve the question:

#!/usr/bin/env python
#ICPC Locks Problem:
#Author: Shriphani Palakodety

class Lock:
    ‘The lock class’
    def __init__(self, dig_on_dial, combination, x_state=0, y_state=0, z_state=0):
        self.dig_on_dial = dig_on_dial
        self.x_state = x_state
        self.y_state = y_state
        self.z_state = z_state
        self.combination = combination
        self.xturns = 0
        self.yturns = 0
        self.zturns = 0
    def __str__(self):
        return "(" + str(self.x_state) + ", " + str(self.y_state) + ", " + str(self.z_state) + ")"
   
    def x_turn(self):
        ‘One Clockwise turn of the first ring of numbers’
        if self.x_state == 0:
            self.x_state = self.dig_on_dial
            self.xturns += 1
            return self
        else:
            self.x_state -= 1
            return self
   
    def y_turn(self, no_of_turns):
        ‘One Counter Clockwise turn’
        if self.y_state == self.dig_on_dial:
            self.y_state = 0
            self.yturns += 1
            return self
        else:
            self.y_state += 1
            return self
   
    def z_turn(self):
        ‘One clockwise turn’
        if self.z_state == 0:
            self.z_state = self.dig_on_dial
            self.zturns += 1
            return self
        else:
            self.z_state -= 1
            return self
   
    def isClosed(self):
        ‘Check if lock is closed’
        if self.xturns < 1 or self.yturns < 1 or self.zturns < 1:
            return True
        elif not self.x_state == self.combination[0] and self.y_state == self.combination[1] and self.z_state == self.combination[2]:
            return True
        else:
            return False
 

Well, I’ll post more on my problem solving spree. I posted on Prof. Phil G’s forums for his advice and he is one cool professor. I really want to keep my personal whatnots away from my blog so that won’t find its way here.

May 1, 2008   No Comments

Roman Numerals - More Python

Since I love Python and since 90 % of those online judges won’t allow me to use Python to submit my solutions, I have to contend with writing the script and hoping it is right. I am now solving a few ICPC sums from last year and I absolutely love them. I first picked the North America, East/Central Problem Set and found this question on roman numerals. Basically subtracting two supplied roman numbers. Guess what this reminded me of, Claude Shannon of course ! Shannon designed a box that used only roman numbers. All hail God Shannon - the best the world had to offer. Well, I hacked up a simple solution in a few seconds. It looks horrible but who cares.

#!/usr/bin/env python
#Author: Shriphani Palakodety
#Roman Numerals Problem.
#Solution to:
#ICPC North America East-Central Problem1

#Roman to Decimal dictionary

rom2dec_dict = {"I":1, "V":5, "X":10, "L":50, "C":100, "D":500, "M":1000}

def romToDec(roman_string):
    ‘Convert from Roman to Decimal’
    #Move from right to left.
    number_dec = 0
    i = len(roman_string) - 1
    if i % 2 != 0:
        while i > 0:
            if rom2dec_dict[roman_string[i]] > rom2dec_dict[roman_string[i-1]]:
                number_dec += rom2dec_dict[roman_string[i]] - rom2dec_dict[roman_string[i-1]]
                i = i - 2    
                continue
            else:
                number_dec += rom2dec_dict[roman_string[i]] + rom2dec_dict[roman_string[i-1]]
                i = i - 2
                continue
    else:
        while i > 1:
            if rom2dec_dict[roman_string[i]] > rom2dec_dict[roman_string[i-1]]:
                number_dec +- rom2dec_dict[roman_string[i]] - rom2dec_dict[roman_string[i-1]]
                i = i - 2
                continue
            else:
                number_dec += rom2dec_dict[roman_string[i]] + rom2dec_dict[roman_string[i-1]]
                i = i - 2
                continue
        number_dec += rom2dec_dict[roman_string[0]]
    return number_dec

 

There, first ICPC problem. I would love to crunch this set in this week. Some problems seem tough and I really would love to solve them all. I found another interesting problem on circular combination locks. I am still thinking about that one and I hope I can solve it tomorrow.

Well, I am just loving the freedom. Happy Coding! :)

April 29, 2008   No Comments

Sorting Algorithms - Python Implementations

Well, I have to admit, I don’t know how to begin 1/2 the posts on topics like these. Let me just point out that I was trying to arrange a list of words in dictionary order without using the sort() method available for lists in Python and since I am a lazy what-not, I only wrote a small bubblesort implementation the crux of which was:

import string

alphas = string.uppercase

alpha_dict = {}

for i in range (1, 27):

    alpha_dict[alphas[i-1]] = i

So, I created a dictionary which went like:
{”A”: 1, “B”: 2, “C”: 3 ………, “Z”: 26}

One of the words in the list which had to be sorted was “BABELFISH”. Using the dictionary, I created a list consisting of numbers corresponding to the letters which composed the word.

nums_list = []

for char in word:

    nums_list.append(str(alpha_dict[char]))

    num = int(("").join(nums_list))

Then comes the actual bubblesort implementation. And when OpenBySource dissed me for sticking to something so inefficient, I decided I would go and read as much about sorting as I could and well, here is what I have learned.

First, the sorting algorithm that I always knew, BubbleSort. The algorithm is pretty simple. Begin at the first element, if this element is greater than the next element, swap their positions and proceed to the next element. This is how it is supposed to go. Here is a Python implementation:

#!/usr/bin/env python

#Author: Shriphani Palakodety

#Bubble Sort Algorithm Implementationimport copy

def sorter(num_list):

    new_num_list = copy.copy(num_list)

    n = 0

    for i in range(len(num_list)-1):

        if num_list[i] < num_list[i+1]:
continue

        else:

            num_list[i+1] = new_num_list[i]

            num_list[i] = new_num_list[i+1]

            new_num_list = copy.copy(num_list)

            n += 1
return num_list, n

num_list = [2,7,1,5,8,6]

def checker(num_list):

    while True:

        sacred_tuple = sorter(num_list)

        if sacred_tuple[1] != 0:

            num_list = sacred_tuple[0]

        else:

            return sacred_tuple[0]
print checker(num_list)

I can see the question coming up ( if it isn’t, please don’t bother telling me ). “What is that checker() doing there ?”. The thing is that you can always keep running the sorter function. The checker stops the sorting process run by the sorter function when no sorting is performed when sorter(list_name) is run.

The next algorithm I encountered struck me as stupid. Every single kid must have used it. I am ashamed of myself to even have written this thing. Anyway, it is called the “Selection Sort” algorithm. You pick the least valued term in a list, assign it to the first position, from the remaining elements, pick the smallest number, put it at the second position and so on. Here is what I wrote ( and what every 5 year old out there could have written ):

#!/usr/bin/env python
#Author: Shriphani Palakodety

def selSort(list_name):
    sorted_list = []
    while True:
        least_number = min(list_name)
        sorted_list.append(least_number)
        list_name.remove(least_number)
        if len(list_name) == 0:
            return sorted_list
            break
        else:
            continue

unsorted_list = [2, 1, 8, 10, 12, 9, 42, 21]
print selSort(unsorted_list)

Actually the code displayed above is a cross between what is known as the “Insertion Sort” algorithm and the “Selection Sort” algorithm. In the insertion sort algorithm, the sorted list is built one element at a time.

The next algorithm I am going to cover is the “Merge Sort” algorithm. This algorithm works by diving a list into a number of sublists and then merging them back to get a sorted list.

Here is the merge function:

def merger(left, right):
    left_index, right_index = 0, 0
    result = []
    while left_index < len(left) and right_index < len(right):
        if left[left_index] < right[right_index]:
            result.append(left[left_index])
            left_index = left_index + 1
        else:
            result.append(right[right_index])
            right_index = right_index + 1
     result += left[i:]
     result += right[j:]
     return result

And of course, now comes the splitter() function. I am too tired to explain wtf is going on.

def splitter(list_name):
    if len(list_name) < 2:
        return list
    else:
        middle = len(list_name) / 2
        left = splitter(list_name[:middle])
        right = splitter(list_name[middle:])
    return merger(left, right)

Happy Coding.

April 17, 2008   No Comments