June 27th, 2008 — python
I remembered reading about the soundex code used to encode similarly sounding names or names with similar spelling so that they could be retrieved easily.
This is what the soundex code for a word consists of:
- The first letter of the name.
- The letters A, E, I, O, U, Y, W and H are never used in encoding (except when they are the first character in a word)
- The following table lists the character codes:
| Code |
Key Letters |
| 1 |
B, P, F, V |
| 2 |
C, S, K, G, J, Q, X, Z |
| 3 |
D, T |
| 4 |
L |
| 5 |
M, N |
| 6 |
R |
- Letters are not encoded if the preceeding letters have the same code
- Codes are truncated after the third digit.
- Trailing zeros are appended as needed.
So, here is the snippet that should generate a soundex code for a given word:
code_dict =
{‘A’:
"0",
"E":
"0",
"I":
"0",
"O":
"0",
"U":
"0",
‘W’:
"0",
‘H’:
"0",
‘B’:
1,
‘P’:
1,
‘F’:
1,
‘V’:
1,
‘C’:
2,
‘S’:
2,
‘K’:
2,
‘G’:
2,
‘J’:
2,
‘Q’:
2,
‘X’:
2,
‘Z’:
2,
‘D’:
3,
‘T’:
3,
‘L’:
4,
‘M’:
5,
‘N’:
5,
‘R’:
6}
def codeGen(word):
word = word.upper()
code = [word[0]]
for i in range(1, len(word)):
if code_dict[word[i]] != "0":
if word[i] != word[i-1] and code_dict[word[i]] != code_dict[word[i-1]]:
code.append(str(code_dict[word[i]]))
else:
continue
else:
continue
if len(code) <= 4:
code += ["0"] * (4 - len(code))
else:
code = code[0:4]
return "".join(code)
#Tests
print codeGen("SCHAAK")
print codeGen("Lee")
print codeGen("Kuhne")
print codeGen("EBELL")
print codeGen("ebelson")
print codeGen("SCHAEFER")
The output:
$ python soundex.py
S200
L000
K500
E140
E142
S160
This problem was part of the Berkeley Programming Contest held in 2007 at the University of California Berkeley.
June 19th, 2008 — Daily life
Right, the greatest CS text in the world has graced the our house and our family is sure to churn out computer scientists for the next few generations.
If you still haven’t guessed it, I have purchased the first volume of “The Art of Computer Programming” by Donald Knuth, professor emeritus of the art of computer programming at Stanford University, receipient of the Turing award, creator of TeX and what not.
The book I have is a low price edition (the benefit of living in Asia) and hence I could get the greatest text of the century for a mere USD 10.6 .
The book is a masterpiece and I couldn’t help figuring out the enthusiast Knuth was. Right on the first page I saw:
“This series of books is affectionately dedicated to the type 650 computer once installed at the Case Institute of Technology, in remembrance of many pleasant evenings.”
I still don’t have volumes 2 and 3 and I will get them soon.
I was worried at first whether I would be able to understand the contents but it turns out that all I need to know to begin using the book is “in my head”.
Here are a few pics of this masterpiece.

Another pic:

Yet another pic:

Its adventure time. I have an ebook on algorithms by Umesh Vazirani but its just so much fun to be able to hold a book and read it.
June 17th, 2008 — Mathematics
Well, it is the first time I’ve really had a serious look at the big O notation. I actually knew what it was about etc, but I never really delved into it.
I was debating whether I needed to shell out money on a Cormen + Leiserson work but then I came across the penultimate draft of “Algorithms” by Umesh Virkumar Vazirani, Sanjoy Dasgupta and Chris Papadimitriou (how do you pronounce it? ) on Vazirani’s homepage: http://www.cs.berkeley.edu/~vazirani/algorithms.html
I decided to pick the big O notation and after not reading it right the first three times, I finally began to get a hang of things and decided to solve the problems at the end of the big O notation chapter and here is how it went:
Question 1 is boring and simple algebra, so we skip it.
Question 2):
Show that if c is a positive real number, then g(n) = 1 + c + c2 + c3 + . . . . + cn is :
(a). ?(1) if c < 1
(b). ?(n) if c = n
(c). ?(cn) if c > 1
The solution is simple. We know that g(n) is the sum of a geometric sequence. When the common ratio is less than 1, we know that the value of g(n) becomes: 1 / (1 - c). Since this solution has clearly defined bounds, we know that g(n) when c < 1 is ?(1).
Next, when c = 1, we know that every single term is going to be 1 (1 raised to any power is 1). Hence, the computer steps required to approach the solution are:
- (n-1) steps to go from c to cn and list them down as ones.
- 1 step to perform the addition.
The result is ?(n).
The final problem can be solved by observing that:

Here a = 1 and r = c.
It can be observed that 1 - r in the denominator is a constant and can be removed. After carrying out a similar process in the numerator we are left with rn+1 . This can be written as r * rn and since r is a constant, we can safely assume that g(n) is ?(cn).
I actually liked the prologue (believe it or not, this was part of the prologue. I have to get to the first chapter).
Disclaimer: In this post, one image was copied off Wikipedia and the question solved can be found in the penultimate draft of the book, “Algorithms” by Umesh Virkumar Vazirani, C. H. Papadimitriou and Sanjoy Dasgupta.
Enjoy this treat, I am not too sure how much money you need to shell out for the final edition of the book.
June 14th, 2008 — python
Yesterday, I tackled the last problem in the “Aha! Algorithms” chapter of Jon Bentley’s “Programming Pearls”, an anagram detector. It takes a file consisting of words separated by a line and return lists containing anagrams separated by a line. It does not construct anagrams though and just hunts down a file for them.
The first step in building this includes an element of identity common to anagrams. This is called a signature. All anagrams belonging to the same class have the same signature. I chose my signature to be a string that contains letters followed by the number of times they appear in the word (the letters are in alphabetical order). Example:
"alphabet" - "a2b1e1h1l1p1t1"
The next step involves using a script that will bring together words with the same signature and these obviously are anagrams. Finally, a function prints these anagrams.
This portion of Python code generates the signature:
def sign(word):
char_list = list(word)
char_list.sort()
word_list = []
for char in char_list:
if not char in word_list:
word_list.extend([char, str(word.count(char))])
else:
continue
signature = "".join(word_list)
return signature
Next comes the function that is supposed to bring the anagrams together (so to speak):
def anagramCollect(filename):
"""Obtain Sinatures and Create a Dict.
key = the signature
value = a list of words with same signature"""
f = open(filename, "r")
ana_dict = {}
for word in f:
signature = sign(word[:-1])
if ana_dict.has_key(signature):
ana_dict[signature].append(word[:-1])
else:
ana_dict[signature] = [word[:-1]]
f.close()
return ana_dict
In the above function, I used word[:-1] to avoid appending the newline character that seems to be unavoidable while using “for word in f:”.
Finally, a function to check the anagram dictionary for anagrams (should include those values whose length is more than 1) and to print them out.
def printAnagram(ana_dict):
ana_list = ana_dict.values()
for value in ana_list:
if len(value) > 1:
print value
else:
continue
I decided to test this script on a file containing words separated by lines. I used the following words:
pans
pots
opt
snap
stop
tops
When I ran the script, I got:
python anagram.py
['pans', 'snap']
['pots', 'stop', 'tops']
So, it works! The script finally looks like this.
Well, Happy Coding 
June 12th, 2008 — python
Since I got my copy of “Programming Pearls” by Jon Bentley, I have hardly been able to enjoy the excellent writing skills of Jon Bentley. I finally got some time to read, “Aha! Algorithms” from his book and I encountered a section on “Binary Search”. I read somewhere that during a lecture at CMU, Jon said that most binary search implementations were broken. After reading a description of the problem from the book, I decided to whip up an implementation myself. Binary search involves probing the range to see on which side of the median, the number lies and the new range is that half. This process is repeated until the value is located. So, here is a very short binary search I wrote:
def search
(num_list, num, begin, end
):
if end< begin:
return "Number not found in list"
pos =
(begin+end
)/
2
if num_list
[pos
] > num:
return search
(num_list, num, begin, pos -
1)
elif num_list
[pos
] < num:
return search
(num_list, num, pos
+1, end
)
else:
return pos
num_list = [1, 2, 4, 5, 6, 8, 11, 34, 72, 112, 134, 145, 156, 167, 178, 189, 190, 210, 321, 432, 543, 654, 765, 876, 987]
value = 112
print search(num_list, value, 0, len(num_list))
I ran the script and this is what I got:
python binsearch2.py
9
Another attempt:
num_list = [1, 2, 4, 5, 6, 8, 11, 34, 72, 112, 134, 145, 156, 167, 178, 189, 190, 210, 321, 432, 543, 654, 765, 876, 987]
value = 35
print search(num_list, value, 0, len(num_list))
The output:
python binsearch2.py
Number not found in list
The next problem in the same chapter involved rotating vectors. A beautiful algorithm was described in the book. To rotate “i” elements out of “n” in an array, the best solution I could think of was to throw i elements in a separate list and keep appending them in reverse order to a new list which contains the (n-i) elements. As usual, Jon Bentley had a better algorithm. He considered the list to be two groups “a” and “b”. So our list can be represented by “ab”. We need to obtain “ba”. If we denote a’s reverse as
and b’s reverse as
, then we get,
=
. So, I implemented this algorithm in a few seconds using:
def rotate(list_name, i):
”‘Rotate i elements of the given list’”
a = list_name[:i]
b = list_name[i:]
a.reverse()
b.reverse()
final_list = a+b
final_list.reverse()
return final_list
This is how it works:
print rotate([‘a’, ‘b’, ‘c’, ‘d’, ‘e’, ‘f’, ‘g’, ‘h’, ‘i’, ‘j’], 3)
The solution:
python vectorrot.py
['d', 'e', 'f', 'g', 'h', 'i', 'j', 'a', 'b', 'c',]
.
Well, there’s another problem that I shall solve later. The book is really a masterpiece. I had a sneak-peek at what was to follow the next problem and it turns out its an anagram constructor! Wee! I can foresee a fun filled week ahead!
EDIT: The binary search implementation now terminates when it can’t find the value in the list.
June 9th, 2008 — python
I have recently added the podcasts on http://blog.stackoverflow.com to my feeds and in one of the episodes (second, I think), someone called in and asked Joel Spolsky whether he should learn C++ vs something else and aside the OOP what differentiated C and C++. Joel explained the benefits of C to Jeff and mentioned an example whose shaky reproduction goes like this, “If you are asked to write a script to replace every “x” in a string with a “y”, then the first solution most high level programmers would think of is to go over all the characters in the string and when they find an “x”, they append a “y” to the new string or they append the currently existing letter. They just don’t know how inefficient this is.”
I don’t know C (but I am learning it now) but I guess he might be right. I wrote a recursive solution for someone on orkut who had problems with it and I employed such a method (but considering that Python strings are immutable, I was forced to use a list and finally do a “”.join(string_list) ). Listening to that podcast made me think about other ways of doing that and obviously slicing a string comes to mind. I cannot plainly replace positions in Python strings (they’re immutable) and converting it to a list, going over replacing and then rejoining it sort of defeats the purpose of setting out to optimize that code. Here is the solution I whipped up for someone called Hari Priya on Orkut:
def changeXY(string_name, index=0, final_string=[]):
if index == len(string_name):
return "".join(final_string)
else:
if string_name[index] == "x":
final_string.append("y")
index += 1
else:
final_string.append(string_name[index])
index += 1
return changeXY(string_name, index, final_string)
Clearly, not the best, there’s a load of repetition there and all that but since I am too bored to fix that up, I will proceed with my next solution.
#!/usr/bin/env python
#Author: Shriphani Palakodety
#Filename: changeXY2.py
def changeXY(string_name):
slices_list = []
places = string_name.count("x")
start = 0
for i in xrange(places):
position = string_name.find("x", start)
slices_list.append(string_name[start:position]+"y")
start = position+1
slices_list.append(string_name[start:])
return "".join(slices_list)
That is the solution where I construct portions of the string and join them at the end. Since the only idea I’ve got about efficiency is limited to checking the runtime, I did just that.
Step 1: Pick a string. I picked
boneywasawarriorwayayixkickedasterixandobelixbutt
.
Here is a time based comparision:
time python changeXY1.py
boneywasawarriorwayayiykickedasteriyandobeliybutt
real 0m0.301s
user 0m0.077s
sys 0m0.170s
That was the time taken for the recursive solution to replace all “x”s with “y”s.
time python changeXY2.py
boneywasawarriorwayayiykickedasteriyandobeliybutt
real 0m0.255s
user 0m0.077s
sys 0m0.124s
That was the time the second solution using slicing took to run. It took less time and hence it was more efficient than the recursive solution.
Did that deserve an entire blog post? Probably not. Still, stackoverflow’s podcasts are pretty good and any advice from Joel Spolsky is good advice. I might drop in a question but I have no idea what to ask him.
Well, Happy Coding 
June 4th, 2008 — Uncategorized
I have been observing the lifestyle of the 4th grader who is spending a fortnight at our place and positively, I hate it. Poor kid is sort of pushed around everyday. Why? To impart to him the knowledge of English Grammar. He recites the parts of speech (I know them too, just in case you think I have gotten away with that), examples of each part of speech, points them out in a sentence and solves long worksheets on a daily basis. Why? Grammar. I sympathize with him and would like to point out that he’d be better off perfecting his theoretical knowledge of mathematics, such as study what a symmetry is, study what homogeneity is, study what polynomials truly are and so on. He’ll benefit that way. Instead, recite each kind of noun, pronoun, verb, adverb etc. I sometimes wonder what the point of all this is. Sure, if he wants to compete with Natural Language Parsers like montylingua, then it might help.. but he’s got world class search engines to find help too! What is the point of teaching someone non-functional grammar?
I believe that if someone gets grammar right, it has to be the CollegeBoard. Look at the SAT paper. No one asks you what a noun is or what something is. You are forced to learn how to use correct english so that your dialect (there are so many right here that every 200 miles, you may see English being spoken differently) does not hinder your communication with someone in the West or in another part of the world. Focus on that.
Even nature has an example for us. What does a toddler first learn? How to communicate, or the definitions of the parts of speech? So far, I have only used my knowledge of the parts of speech to pass the intermediate examination in English. Instead of teaching kids what the building blocks of English are, teach them how to speak English well.
June 3rd, 2008 — Daily life
I was solving sums from something I found on the webpage of the Berkeley Programming Contest (the sum does not belong to the PSets of the Berkeley Contests). I am pretty bored now and I intend to do something different (the only things I can do are code or sleep). My dad thinks cool posts on the government will land me in jail or something, so thats out of contention. Praising MS is well… stupid on my part. I have not evolved to consider Vista as an alternative to Linux and let us be frank, I never will. Vista is just….. let’s say it irritates me. I am just forced to use it for now as taking Vista off will be like pulling the rug from under my family. Anyway, I haven’t spoken about college plans for a long time owing to the fact that I had this inferiority complex since MIT rejected me. I spoke to my MIT EC (I have been speaking to her quite regularly so I no longer look to her as an EC but as a mentor of sorts). Right, I am going to the home of Debian, the house of the largest cluster in the Big 10 campuses, Purdue University - West Lafayette. Right, those who want to laugh or otherwise can.
Anyway, my cousin who just finished his 4th grade has decided to spend a fortnight at our place and seems to be enjoying it ( he loves DBZ and some other BS ). So, I got a few vids for him and left my Debian Laptop for him to play games at gamegecko.com and some other site.
I engaged in the following conversation with him the other day:
Me: Can you code? Do they teach you stuff at your school?
He: I do small things with the logo turtle. Our teacher in school doesn’t teach anything. But I do stuff with logo at home.
I couldn’t help wondering, we need geeks and this is what our schools do? Believe me guys, this kid is from one of Hyderabad’s most posh establishments at the elementary school level. I am left speechless when I discover the trash that school level computer programs in Indian schools are infused with.
I have no idea where to find purdue badges to put on my site…. so site stays sans promotion. Purdue CS hm… sounds cool. Even Philip Greenspun seems to like it.
I have begun reading PhD comics and I seriously hope grad school isn’t like that.
I will continue with posts on code soon. But right now, I need some alternative to that - writing seems to be one and I hope to come up with some cool posts / essays etc.
Well, stay tuned for more code filled posts soon and yeah, I’m a future Boilermaker.
June 1st, 2008 — Daily life, college prep., python
Woohoo, I’ve managed to clear the hurdle at 220 Anna Salai. But let me tell you guys, Chennai can drive anyone mad. Here is what I recollect from my trip (I do not have a camera so no pics):
- We drop in at a 3 star hotel named Dee Cee Manor ( who rates these hotels by the way? 3 stars is like overrating this establishment’s ridiculous rooms/food ).
- The autos suck. Oh yes! Spencer’s Plaza is a cool shopping center and it is about 2 kilometers from Dee Cee Manor. The first auto driver I met said 120 rupees. F*ck, 120 ? For 2 kilometers? Right, I ask the bugger to put on the meter. He says no. The next guy I meet charges me 60. I climb in and realize that auto drivers in Hyderabad are better than most others
- I then purchased two albums I had been dying to get my hands on, Paul McCartney’s Memory Almost Full and the #1 Beatle’s (I spent all my music aware days listening to the Beatles).
- I then go to the consulate and the doors impress me.
- The VO looks at my i20 and scribbles on my DS156 and then says in 1 week, your passport will be with you. Wee ! No questions !
- Then I went to Tirupathi ( no stupid comments on my religious beliefs. My faith is my own ).
- I took a train to the aforementioned location. A marwari family climbed and make a mess of everything, moving people around to accommodate that king sized, 20 member family of theirs. Bah, they kept shouting, put on music at 100 decibels ( and promptly earned abuse from my mother ). I know a marwari family myself and they are bloody educated and know how to behave in public. I still think people in our nation need to understand the notion of Public property and how all citizens are entitled to silence and deserve as much space as they have paid for ( which is limited to one seat in this case ).
- I then realized that Tirupathi wasn’t something I could survive. Did nothing there.
Oh by the way, I solved a problem ( albiet an easy one ) from the YSU-ACM High School Programming Contest - 1991 ( The best I could perform when the silence was under siege by this bloody family) on the way to Tirupathi. Here is the question: http://www.cs.berkeley.edu/~hilfingr/programming-contest/ysu_94.pdf (the inverted triangles sum).
Here is the solution:
#!/usr/bin/env python
#Author: Shriphani Palakodety
#YSU-ACM programming contest
#1991 PSet
#Premature optimization is the root of all evil.
# - Shriphani Palakodety ( 2008 A.D )
level =
"To be given"
level_list =
[]
while True:
if level != "":
level = raw_input()
level_list.append(level)
else:
break
level_list.pop(-1)
def printTriangle(level):
for i in range(int(level), -1, -1):
print "*" * i
for x in level_list:
printTriangle(x)
Done. Expect an increase in post frequency now. Phew, the consulate appoinment gave me nightmares.
May 18th, 2008 — Mathematics, python
I solved a couple of sums from the Project Euler archives and feel quite stupid since this is the first time in the last few weeks that a post has some code in it. I really ought to improve my efficiency. Well, the first sum I did was no. 42 and extremely special since it is the ultimate answer to the ultimate question. Well, the question was:
By converting each letter in a word to a number corresponding to its alphabetical position and adding these values we form a word value. For example, the word value for SKY is 19 + 11 + 25 = 55 = t10. If the word value is a triangle number then we shall call the word a triangle word.
Using words.txt (right click and ‘Save Link/Target As…’), a 16K text file containing nearly two-thousand common English words, how many are triangle words?
So, I whipped up a solution in no time. I was only hindered when it came to figuring out whether a number was a perfect square or not and I was trying to explore better algorithms that mine which involved squaring the integral part of the square root of type float obtained by using math.sqrt(num) and squaring this integral part to see if we return to the original number again. I couldn’t really find a decent solution so I just implemented my own answer:
#!/usr/bin/env python
#Author: Shriphani Palakodety
#Example 42, Project Euler Archives.
import string
import math
def makeAlphaDict():
alpha_dict = {}
for i in range(len(string.uppercase)):
alpha_dict[string.uppercase[i]] = i + 1
return alpha_dict
alpha_dict = makeAlphaDict()
def checkTriangular(word):
word_sum = 0
for char in word:
if char == ‘"’:
continue
else:
word_sum += alpha_dict[char]
discrmnt = 8 * word_sum + 1
if checkOdd(int(math.sqrt(discrmnt))) == True and checkPerfSquare(discrmnt) == True:
return True
else:
return False
def checkPerfSquare(num):
probable = int(math.sqrt(num))
if probable * probable == num:
return True
else:
return False
def checkOdd(num):
if num % 2 == 0:
return False
else:
return True
input = open("words.txt", "r")
words_list = input.readlines()[0].split(",")
input.close()
word_count = 0
for word in words_list:
print word
if checkTriangular(word) == True:
word_count += 1
else:
continue
print word_count
Thats the first question. The next problem I solved was #53;
How many values of nCr, for 1 ? n ? 100, are greater than one-million?
Since
is a symmetric function about the middle term while r increases from 0 to n, I figured out that once we hit upon one (n, r) pair, without computing the rest, we could obtain the other values of r for a fixed value of n that crossed the million mark. Remember, if n is even, the number of terms is odd and if n is odd, number of terms is even. In case the number of terms is even, the middle terms are
and
. In case, the number of terms is odd, the middle term is
. Using just these three I could get to the solution in no time. Here is the code:
#!/usr/bin/env python
#Author: Shriphani Palakodety
#Project Euler Problem 53
def factorial(number):
if number == 0 or number == 1:
return 1
else:
return number * factorial(number - 1)
def combination(n, r):
n_fact = factorial(n)
r_fact = factorial(r)
n_r_fact = factorial(n - r)
return n_fact / (r_fact * n_r_fact)
def midTerm(n):
if n % 2 == 0:
mid_term_1 = n / 2
return mid_term_1
else:
mid_term = (n + 1) / 2
return mid_term
comb_num = 0
def countComb(n, r):
”‘If a particular value is found to be greater than 0,
it adds a certain number to comb_sum using properties of
combinatorics.’”
if n % 2 != 0:
num_to_add = 2 * (n / 2 - r + 1)
return num_to_add
else:
num_to_add = 2 * ((n+1)/2 - r) + 1
return num_to_add
for n in range(101):
for r in range(midTerm(n) + 1):
if combination(n, r) > 1000000:
comb_num += countComb(n, r)
break
else:
continue
print comb_num
Thats the solution. And of course, who’ll forget the time taken to get to the solution.
shriphani@psp-laptop:~/project_euler$ time python ex53.py
4075
real 0m1.116s
user 0m0.140s
sys 0m0.024s
shriphani@psp-laptop:~/project_euler$
Till the next time I put up a post, HAPPY CODING !