At CMU, there's a cluster on which I use EPD-Free^{1} as my Python distribution, which lives at:

`~/opt/Python27/bin/python`

I have a custom gcc/g++ (version 4.7.0) in `/opt/gcc/`

- First, grab boost from here, decompress it, and `cd` into it.
- Now, we need to configure boost:
  `./bootstrap.sh --prefix=~/opt/boost/ --libdir=~/opt/lib --with-libraries=signals,thread,python,mpi --with-python-root=/home/spalakod/opt/Python27/ --with-python-version=2.7`

- The above command generates a file called `project-config.jam`.
- This file contains some specifics about your Python setup (it lets you type in the path, version, etc. in case it gets them wrong).
- Despite MPI being specified, it got skipped. More on this later.
- Now, run `./b2`.

- Go to `stage/lib` in the appropriate directory (for me it was the directory I got by decompressing the original tarball).
- If you see an `mpi.so` there, you're good. I didn't, so I had to do the following:
  - Create a file called `user-config.jam`.
  - Place one line in it: `using mpi ;`
  - Now, run `./b2 --user-config=user-config.jam`

- At this stage I had an `mpi.so` in `stage/lib`. Add `/path/to/stage/lib` to `LD_LIBRARY_PATH` and `PYTHONPATH`.

- Now, none of the tests will pass, because the tests import mpi using `import boost.mpi`, but the way it is installed, we need to use `import mpi`.

- I have attached an archive of the tests that import MPI the correct (or, if you insist, incorrect) way: https://github.com/shriphani/mpi_python_tests ^{2}
- Also, put the export statements (for `LD_LIBRARY_PATH` etc.) in `~/.bashrc`.
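Once the export statements are in place, a quick sanity check is to confirm the directory actually made it into both variables. This is just a sketch; `/path/to/stage/lib` is the placeholder from the steps above, and the helper name `on_path` is mine:

```python
import os

lib_dir = "/path/to/stage/lib"  # placeholder: substitute your actual stage/lib path

def on_path(var, directory):
    # an environment variable like PYTHONPATH is a pathsep-separated list of dirs
    return directory in os.environ.get(var, "").split(os.pathsep)

for var in ("PYTHONPATH", "LD_LIBRARY_PATH"):
    print(var, "ok" if on_path(var, lib_dir) else "missing")
```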

`>>> dparser.parse("P 16:08 May 14, 2003 UTC", fuzzy=True)`

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/pymodules/python2.6/dateutil/parser.py", line 697, in parse
    return DEFAULTPARSER.parse(timestr, **kwargs)
  File "/usr/lib/pymodules/python2.6/dateutil/parser.py", line 301, in parse
    res = self._parse(timestr, **kwargs)
  File "/usr/lib/pymodules/python2.6/dateutil/parser.py", line 557, in _parse
    res.hour += 12
TypeError: unsupported operand type(s) for +=: 'NoneType' and 'int'

I am not sure what the fuck it is even doing...

Some form of AI, possibly.

You can get the latest build here

I recently made the switch from Mac OS X to Windows (I now own a 17'' XPS, which is an absolute beast of a machine). I have developed a major habit of wasting my time on sites like reddit, facebook (the addiction is rather huge here), gizmodo, engadget and the like. To get work done, it was imperative that I use my web browser less than usual. Luckily, I found one of the best apps ever written to boost productivity: SelfControl came to my rescue. SelfControl lets you type in a list of sites you want to block, and you are all set. Essentially, the one-stop solution for people with my problem.

I couldn't find a free app on Windows (which sucks), so I decided to write my own. Of course, it doesn't even match the awesomeness that SelfControl packs into its UI (I use the command line and a text file for user input, and SelfControl has absolutely the best user interface for the job), and I still don't have persistence across reboots working (yet).

My "app" (a small script) reads in domains from .list.txt (you can change this in the script) and adds one line per domain to the system hosts file:

127.0.0.1 _user_submitted_domain_

and voila, the site is blocked. To run the app, you supply the number of minutes to block as a command-line argument. So you would run:

python WinSelfControl.py #number_of_minutes

and it should go block the sites.

Of course, there are some quirks. For example, www.facebook.com is needed to block facebook. I initially added just facebook.com and it didn't work (probably my misunderstanding of how hosts files work).
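Given that quirk, one way around it is to write a line for both the bare domain and its `www.` variant. Here is a sketch of what the script could emit per domain (the helper name `hosts_entries` is mine, not from the actual script):

```python
def hosts_entries(domains):
    """Build the 127.0.0.1 lines to append to the hosts file,
    covering both example.com and www.example.com."""
    lines = []
    for d in domains:
        lines.append("127.0.0.1 " + d)
        if not d.startswith("www."):
            # also block the www. variant, per the quirk above
            lines.append("127.0.0.1 www." + d)
    return lines

print("\n".join(hosts_entries(["facebook.com", "www.reddit.com"])))
```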

Also, this app isn't persistent across shutdowns. If you shut down in the middle of its execution, you will have to remove the lines added to the hosts file yourself. So beware!

This was the product of a few hours and at last I can get back to productivity.

I have only tested this on a 64 bit Windows 7 machine. Also, this code looks like spaghetti code and it might break something. So no warranties there.

Source uploaded here: http://shriphani.com/scripts/WinSelfControl.py

Besides this: Microsoft interview on Tuesday, Thanksgiving break, and finals approaching fast, etc. Whatever.

And of course, more people (millions) know about that app than about mine (about 20). Hmm, no fair.

The additions to Listener:

- A new Voice Activity Detection algorithm [ link to pdf by Moattar and Homayounpour].
- Skype4Py is still buggy on OS X with Python 2.5, and it takes a nice solid dump (segfault) with Python 2.6, so I am not sure if the Skype part of the code works.
- Python only. I removed the applescript portion from my code.

I have moved the entire code over to github: http://github.com/shriphani/Listener/ . The VAD algorithms can be seen in the file VAD.py.

This VAD works by starting off with base thresholds for energy, power, and an attribute called the Spectral Flatness Measure. Luckily, the paper had pseudocode, so my DSP n00bishness wouldn't get in the way of progress.

Anyway, to get this version of Listener running, download Listener.tar, untar it, and run:

python2.5 audio_analysis.py

And you should be all set.

Example: 1 USD gives us 0.8 Euro, 1 Euro gives us 0.8 GBP, and 1 GBP gives us 1.7 USD. So if I start with 1 dollar, I can exchange it for 0.8 Euro, which can then be exchanged for 0.64 GBP, and when converted back to USD we have 1.088 dollars.

So if rate_i_j is used to represent the amount of currency_j which we can get for 1 unit of currency_i, we observe:

rate_i_j * rate_j_k * rate_k_l * ..... * rate_z_i > 1

Invert this to get:

1/(rate_i_j * rate_j_k * rate_k_l * ..... * rate_z_i) < 1

and take lg on both sides:

lg(1/rate_i_j) + lg(1/rate_j_k) + ..... + lg(1/rate_z_i) < 0

So we observe that if we represent our table using a graph with lg(1/exchange_rate) as the weights, then the presence of a negative weight cycle is all that is needed to ensure we get more $$ than we started out with.
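The USD/Euro/GBP numbers from the example bear this out; a quick check of both sides of the transformation:

```python
from math import log

# rates from the example: 1 USD -> 0.8 EUR, 1 EUR -> 0.8 GBP, 1 GBP -> 1.7 USD
usd_euro, euro_gbp, gbp_usd = 0.8, 0.8, 1.7

# going around the cycle multiplies the rates: > 1 means we end with more USD
cycle_product = usd_euro * euro_gbp * gbp_usd  # about 1.088

# the same cycle under lg(1/rate) weights sums to a negative number
cycle_weight = log(1 / usd_euro) + log(1 / euro_gbp) + log(1 / gbp_usd)

print(cycle_product, cycle_weight)
```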

And of course, negative weight cycle detection is really easy with the Bellman-Ford single-source shortest path algorithm.

This algorithm operates by the following dynamic programming equation:

distanceFromSource(v) = min(distanceFromSource(v), distanceFromSource(w) + weight(w, v))

When we set the distance of vertex 'v' from the source, we look at the edge (w, v) and observe that if we can get to 'v' by a shorter path through 'w', we discard the old distance and set distanceFromSource(w) + weight(w, v) as the new distance. This operation is called "relaxing an edge".

Now, we only need to observe every edge a certain number of times. Precisely |V| - 1 times. Why? Well, consider a graph where the max out-degree is 1.

So, we have something like: v1 ----> v2 ----> v3 ----> v4 ----> ........ ----> v{n}. This is a directed graph with n vertices. If our source is v1, then in the first of the |V| - 1 iterations we relax the first edge and fix the path to v2 (we cannot get to v2 by a shorter path, see?). In the second iteration the path to v3 is fixed, and continuing this way we finish fixing the paths to all the vertices in |V| - 1 iterations (remember, you relax **every** edge in each iteration).

To detect a negative weight cycle, finish relaxing all the edges |V| - 1 times. Now, carry the relax operation out once again. Since, all paths from the source to each vertex are guaranteed to be fixed by now, if you ever stumble upon a situation where you change distanceFromSource(v) for a vertex v, then a negative weight cycle starts and ends at v. The result of this is that a shortest path to v cannot be found, since for every path to v, you can make another trip about this negative-weight cycle and obtain an even shorter path.

So, I wrote a simple script for the Bellman-Ford routine and tested it using the USD, Euro, GBP example.

Here's the Bellman-Ford routine:

```
#!/usr/bin/env python
#Author: Shriphani Palakodety
#Bellman Ford Implementation. Look it up on wikipedia.

distances = {}
parents = {}

class Graph():
    '''A graph as a vertex list V and an edge dict E: (u, v) -> weight'''
    def __init__(self, V, E):
        self.V = V
        self.E = E

def Initialize(graph, start):
    '''Prepares the graph for the bellman-ford algorithm'''
    for vertex in graph.V:
        distances[vertex] = float('inf') #every vertex starts infinitely far away
        parents[vertex] = None
    distances[start] = 0

def Bellman_Ford(graph, start):
    Initialize(graph, start) #first initialize the graph
    for i in xrange(len(graph.V) - 1):
        for edge in graph.E.keys():
            if (graph.E[edge] + distances[edge[0]]) < distances[edge[1]]:
                distances[edge[1]] = graph.E[edge] + distances[edge[0]]
                parents[edge[1]] = edge[0]
    #one final pass to check for negative weight cycles.
    for edge in graph.E.keys():
        if (graph.E[edge] + distances[edge[0]]) < distances[edge[1]]:
            return edge[1] #a negative weight cycle was detected at this vertex
    return None #return None on successful completion.
```

And the test itself:

```
#!/usr/bin/env python
#Author: Shriphani Palakodety
#Mail: spalakod@purdue.edu
#Testing Arbitrage

import bellman_ford
from math import log

#in the format cur1_cur2 = no. of units of cur2 for 1 unit of cur1
usd_euro = 0.8
euro_gbp = 0.8
gbp_usd = 1.7

#Nodes list
nodes = ["usd", "euro", "gbp"]

#edge weights are log(1/rate), so a negative cycle is an arbitrage opportunity
edge_dict = {
    ("usd", "euro"): log(1 / usd_euro),
    ("euro", "gbp"): log(1 / euro_gbp),
    ("gbp", "usd"): log(1 / gbp_usd),
    ("euro", "usd"): log(usd_euro),
    ("gbp", "euro"): log(euro_gbp),
    ("usd", "gbp"): log(gbp_usd),
}

#make a small graph to represent the currency network.
cur_graph = bellman_ford.Graph(nodes, edge_dict)
print bellman_ford.Bellman_Ford(cur_graph, "usd")
```

And when run, I get:

% python arbitrage_test.py
usd

So, there's a negative weight cycle and if you kick off with some USD, you can get rich.

Get my routines here:

My code reads too much like pseudocode. Didn't know a theory class corrupts so much.

- MakeSet(x) : Create a set with the element 'x' in it.
- FindSet(x) : Return the set that the element belongs to.
- Union(x, y) : Make a new set defined as Z = X ∪ Y, where x ∈ X and y ∈ Y, and destroy X and Y.

Here each set is represented by placing its elements in the nodes of a linked list. In each node, we also store a pointer to a parent set object, which contains the name of the set and some other information, like the number of elements in the set, and so on.

MakeSet(x) : Constant Time

FindSet(x) : Return the set that x's parent pointer points to. Constant time.

Union(x, y) : Assume X is the set x resides in and Y is the set y resides in. Now, if we append the elements of X to Y, we would need to walk through X and set the parent pointers to Y. So, 1 union operation takes O(n) time.

With the weighted-union heuristic, the Union(x, y) implementation gets a slight change. When deciding which set to append to the other in the union operation, we append the set with fewer elements to the other. The advantage is that the walk through the list to change the parent pointers takes less time.

In the tree-based implementation, we represent a set by a rooted tree where each node points to its parent (not parent -> children, but the other way round). We define a special term here called rank. The rank of a set is an upper bound on the height of its tree. Why it is an upper bound will become evident soon.

MakeSet(x) : Constant Time. Let x be a node whose parent is itself

FindSet(x) : O(log(n)) time. Basically need to go from x to the root.

Union(x, y): Constant time. We just make the root of either X or Y (the sets x and y belong to respectively) the parent of the other root.

We use a heuristic here called path compression. When we run FindSet(x), we typically traverse the path from x to the root. With path compression, we basically make all nodes on this path the immediate children of the root. The advantage? Subsequent FindSet() operations on these nodes take constant time. This might modify the actual height (as you can see). BUT YOU DON'T CHANGE THE RANK HERE. Hence the rank is an "upper-bound" on the height. If we didn't have the path-compression heuristic, the rank would have been the height.
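To make the two heuristics concrete before the full class-based code, here is a compact dictionary-based sketch of union by rank with path compression (the names `parent`, `rank_of`, `make_set`, `find`, `union` are mine, not from the implementation that follows):

```python
parent, rank_of = {}, {}

def make_set(x):
    parent[x] = x  # a new element is the root of its own one-node tree
    rank_of[x] = 0

def find(x):
    # path compression: point every visited node directly at the root
    if parent[x] != x:
        parent[x] = find(parent[x])
    return parent[x]

def union(x, y):
    rx, ry = find(x), find(y)
    if rx == ry:
        return  # already in the same set
    # union by rank: attach the lower-rank root under the higher-rank one
    if rank_of[rx] < rank_of[ry]:
        rx, ry = ry, rx
    parent[ry] = rx
    if rank_of[rx] == rank_of[ry]:
        rank_of[rx] += 1

for v in "abcd":
    make_set(v)
union("a", "b")
union("c", "d")
union("b", "c")
print(find("a") == find("d"))  # True
```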

So, the code follows. First the linked list implementation:

Here are the class definitions for the Nodes, the List and the Universe

```
valueNodeDict = {}

class Node():
    '''Represents a node of the linked list'''
    def __init__(self, value, parent, next=None):
        self.value = value
        self.next = next
        self.parent = parent
        valueNodeDict[value] = self
    def __str__(self):
        return str(self.value)

class List():
    '''Represents the linked list itself'''
    def __init__(self, name, value=None):
        '''Creates a set and sets head and tail to the appropriate nodes'''
        self.name = name
        self.count = 0
        if value is None:
            self.head = None
            self.tail = None
        else:
            self.head = Node(value, self)
            self.tail = self.head
            self.count += 1
    def __str__(self):
        '''Lists out all the elements'''
        s = self.name + ": "
        node = self.head
        while node != None:
            s += str(node) + ", "
            node = node.next
        return s + "\n"
    def addElement(self, node):
        '''Add another node'''
        if self.head is None:
            self.head = node
        else:
            self.tail.next = node
        self.tail = node
        self.count += 1
    def changeParent(self, newParent):
        '''Changes the parent of every node in the set'''
        node = self.head
        while node is not None:
            node.parent = newParent
            node = node.next

class Universe():
    '''Operations you can perform in the universe'''
    def __init__(self, name="U"):
        '''Creates the universe where all your subsequent sets belong'''
        self.name = name
        self.setCount = 0
        self.sets = []
    def __str__(self):
        s = self.name + ":\n"
        for set in self.sets:
            s += "\t" + str(set) + "\n"
        return s
    def addSet(self, name=None, value=None):
        '''Adds a set to the universe'''
        if name is None:
            self.sets.append(List("S" + str(self.setCount), value))
            self.setCount += 1
        else:
            self.sets.append(List(name, value))

U = Universe()
```

As you can see, I have a valueNodeDict dictionary in the implementation. The idea is that I should be able to use the values themselves in the operations and not the nodes.

Next, the operations themselves:

```
def MakeSet(x, name=None):
    '''Makes a new Linked List'''
    U.addSet(name, x)

def FindSet(x):
    '''Find The Set This Node Belongs To'''
    return valueNodeDict[x].parent

def Union(x, y):
    '''Destructively Perform The Union'''
    if valueNodeDict[x].parent is not valueNodeDict[y].parent:
        x_set = valueNodeDict[x].parent
        y_set = valueNodeDict[y].parent
        if x_set.count < y_set.count:
            #x gets appended to y
            y_set.count += x_set.count
            x_set.changeParent(y_set)
            y_set.tail.next = x_set.head
            y_set.tail = x_set.tail
            U.sets.remove(x_set)
        else:
            #y gets appended to x
            x_set.count += y_set.count
            y_set.changeParent(x_set)
            x_set.tail.next = y_set.head
            x_set.tail = y_set.tail
            U.sets.remove(y_set)
```

Now, the tree implementation:

```
valueNodeDict = {}

class Node():
    '''Represents a node in a tree'''
    def __init__(self, value, parent, rank):
        self.value = value
        self.parent = parent
        self.rank = rank
        valueNodeDict[value] = self
    def __str__(self):
        return str(self.value)

class Universe():
    '''The Universe where all sets sit'''
    def __init__(self):
        self.sets = [] #list of all root nodes
    def addSet(self, root):
        self.sets.append(root)

U = Universe()
```

And the operations:

```
def internalFindSet(x):
    '''Find The Root Of The Tree That Contains This Node. Uses path compression'''
    if x.parent is not x:
        x.parent = internalFindSet(x.parent)
    return x.parent

def MakeSet(x):
    '''Make a new node whose parent is itself'''
    a = Node(x, None, 0)
    a.parent = a
    U.addSet(a)

def FindSet(x):
    '''Returns the root of the tree which contains this value'''
    x_node = valueNodeDict[x]
    return internalFindSet(x_node)

def Union(x, y):
    '''Destructively Unite X and Y where x belongs to X and y to Y'''
    x_set = FindSet(x)
    y_set = FindSet(y)
    if x_set is y_set: #already in the same set
        return
    if x_set.rank > y_set.rank:
        y_set.parent = x_set
    else:
        x_set.parent = y_set
        if x_set.rank == y_set.rank:
            y_set.rank += 1
```

As an added exercise, I also threw in Kruskal's algorithm for finding the Minimum Spanning Tree of a Graph. Here is my graph implementation which just consists of an edge-list and a vertex-list:

```
import heapq
from disjoint_set2 import *

class Vertex:
    def __init__(self, value):
        self.value = value
    def __str__(self):
        return self.value

class Edge:
    def __init__(self, vertex1, vertex2, weight):
        self.vertex1 = vertex1
        self.vertex2 = vertex2
        self.weight = weight
    def __str__(self):
        s = "Connects: " + str(self.vertex1) + " and " + str(self.vertex2) + "\tWeight: " + str(self.weight)
        return s

class Graph:
    def __init__(self, V, E):
        self.V = V
        self.E = E
```

I decided to create a small graph with nodes: 'a', 'b', 'c', 'd' and 'e'. The code to do that:

```
characters = ['a', 'b', 'c', 'd', 'e']
vertices = []
edges = []

for char in characters:
    vertices.append(Vertex(char))
```

Next, I decided to connect every vertex to every other vertex using edges whose weights are in increasing order as follows:

Edge from 'a' to 'a' has weight 0

From 'a' to 'b' we get 1

From 'a' to 'c' we get 2

and so on, you get the idea. The plan was to get a verifiable result straight away.

```
i = 0
for vertex1 in vertices:
    for vertex2 in vertices:
        edges.append(Edge(vertex1, vertex2, i))
        i += 1

G = Graph(vertices, edges)
```

Now, we prepare the required data structures for this algorithm. We need a priority queue. Python 2.6.5 ships with a heapq module. Using it, we can maintain a heap in a list, and the desired "bubble up" and "bubble down" operations are performed by the heapq module's routines. As is known to mankind, the root of any min-heap contains the element with the smallest key (assuming the heap stores (key, value) pairs). So, we finally have:
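A tiny illustration of the heapq behaviour the algorithm relies on (the edge names here are made up for the example):

```python
import heapq

heap = []
# push (weight, name) pairs in arbitrary order
for pair in [(3, "a-d"), (1, "a-b"), (2, "a-c")]:
    heapq.heappush(heap, pair)

# heappop always returns the pair with the smallest key first
print(heapq.heappop(heap))  # (1, 'a-b')
print(heapq.heappop(heap))  # (2, 'a-c')
```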

```
def kruskal(G):
    heap = []
    for edge in G.E:
        heapq.heappush(heap, (edge.weight, edge))
    T = [] #this contains all the edges in the tree
    # run makeset on all the vertices
    for vertex in G.V:
        MakeSet(vertex)
    while heap:
        min_edge = heapq.heappop(heap)[1]
        if FindSet(min_edge.vertex1) is not FindSet(min_edge.vertex2):
            # perform a union and add this edge to the Tree
            T.append(min_edge)
            Union(min_edge.vertex1, min_edge.vertex2)
    return T
```

So, I return a list of edges in the MST. When I run this on the graph created above, I get:

% python kruskal.py
Connects: a and b  Weight: 1
Connects: a and c  Weight: 2
Connects: a and d  Weight: 3
Connects: a and e  Weight: 4

Which when checked is the actual MST.

**Analysis**

Assume that we have n MakeSet operations out of m overall operations. With the weighted-union heuristic, any time a list is chosen for appending to another list, that list must have fewer elements than the other.

So, this is the pattern we have:

- If there is one element in the list and this list is chosen for appending to the other list, then the resulting size would be at least 2
- If the current size is 2 and we append this list to another list, the resulting size will be at least 4
- Once we append a list of size 4 to another list, the resulting list would have a size of 8.

So, we observe that in 3 append operations we reached a size of 8; that is, we reached size 8 in lg(8) operations. In general, we would reach size 'n' (the kruskal() routine obtains the MST of a connected graph at this stage) in lg(n) append operations involving any one element.

Also, there need to be n-1 union operations, since the universe finally contains just one set with all the vertices in it (assuming you have a connected graph). Each element's parent pointer changes at most lg(n) times, so the total cost of the unions is O(n * lg(n)).
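The doubling pattern above can be simulated directly: each time an element sits in the smaller list of a union, the merged list is at least twice the size of its old one, so one element's parent pointer changes at most lg(n) times.

```python
# worst case for a single element: its list is always the smaller side of
# the union, so its list's size at least doubles per parent-pointer change
n = 1024
size = 1
parent_changes = 0
while size < n:
    size *= 2
    parent_changes += 1
print(parent_changes)  # 10, i.e. lg(1024)
```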

Now, if the graph is pretty sparse, say 5 vertices and a single edge in the entire graph, the other operations dominate. So the actual running time is

O(m + n*lg(n))

With the tree-based structure, our course didn't cover the analysis, but we were told that the running time is a cool O(m * alpha(n)), where alpha(n) <= 4 in practically all circumstances.

The code can be obtained at:

In this post I used material from Cormen-Leiserson-Rivest-Stein's amazing book (It is the best book I've ever read. Please go get a copy. You won't regret it). And of course, stuff from Professor GNF's CS 381 class. It is the best CS course I've taken thus far.

Finally, to decide whether there was speech on an overall level, I look for at least 3 instances of 18 consecutive frames being marked as active (just random picks: 18 frames allows 8 active frames plus the 10 extra from the counter we have, and 3 looked like a good candidate when I tested by speaking my own name).
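That decision boils down to counting runs of active frames; here is a sketch of that counting step (the helper name `count_active_runs` is mine, the script inlines this logic):

```python
def count_active_runs(decision, run_len=18):
    """Count how many times run_len consecutive active frames appear,
    resetting the streak after each completed run (as the script does)."""
    runs, streak = 0, 0
    for flag in decision:
        if streak >= run_len:
            runs += 1
            streak = 0
        if flag == 1:
            streak += 1
        else:
            streak = 0
    return runs

# three bursts of activity separated by silence -> three runs
frames = ([1] * 19 + [0]) * 3
print(count_active_runs(frames))  # 3
```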

And as a final measure, I also ensure that the overall intensity beats 48 dB, so that only someone actually trying to have a conversation with me is recognized.
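The 48 dB gate is just the RMS of the raw samples on a log scale; a sketch of that computation (the function name `intensity_db` is mine):

```python
import math

def intensity_db(samples):
    """20 * log10 of the RMS of the samples, as in the script's final check."""
    rms = math.sqrt(sum(s * s for s in samples) / float(len(samples)))
    return 20 * math.log10(rms)

# a constant-amplitude signal of 100 has an RMS of 100 -> 40 dB
print(intensity_db([100] * 30))  # 40.0
```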

Finally, I made the switch from GeekTool to Growl, as this thing kept taking a solid amount of screen real estate; since I have one 23'' monitor and a 15'' monitor, it ended up positioned outside my laptop display's area. Growl seems like a better candidate overall, and since I could finally get the Growl bindings to build on my machine, I think I should let Growl handle this.

So, the only place where my VAD implementation (or my mod of whatever was in that paper) doesn't seem to work is in surroundings with a piano (in our dorm's lobby, for example). Very inconvenient, but whatever; some time in the future I will understand DSP and spectral analysis well enough to come up with a simple VAD algorithm of my own (as opposed to implementing something straight from a paper without understanding what is going on). Anyway, here is the updated script; it does well recognizing speech in relatively silent settings:

```
#!/usr/bin/env python
#Author: Shriphani Palakodety
#Tool to aid those with noise cancellation headphones
import pyaudio
import wave
import sys
import struct
import numpy
import time

Growl_exists = True
try:
    import Growl
except ImportError:
    print "No Growl"
    Growl_exists = False

skype_on_call = False

notifier = None
if Growl_exists:
    notifier = Growl.GrowlNotifier('Listener', ['Attention', 'test'])
    notifier.register()

def record():
    '''Records Input From Microphone Using PyAudio'''
    duration = 3 #record for 3 seconds
    outfile = "analysis.wav"
    p = pyaudio.PyAudio()
    inStream = p.open(format=pyaudio.paInt16, channels=1, rate=44100, input=True, frames_per_buffer=1024)
    out = []
    upper_lim = 44100 / 1024 * duration #number of 1024-sample chunks in `duration` seconds
    for i in xrange(0, upper_lim):
        data = inStream.read(1024)
        out.append(data)
    #now the writing section where we write to file
    data = ''.join(out)
    outFile = wave.open(outfile, "wb")
    outFile.setnchannels(1)
    outFile.setsampwidth(p.get_sample_size(pyaudio.paInt16))
    outFile.setframerate(44100)
    outFile.writeframes(data)
    outFile.close()
    analyze()

def analyze():
    if skype_on_call:
        print "\nSkype Call In Progress\nListener On Hold"
        return
    inFile = wave.open("analysis.wav", "rb") #open a wav file in read mode
    thresh = 1000 #establish a minimum threshold
    max_samp = 0
    decision = [0]
    inactive_counter = 0
    vals = inFile.readframes(inFile.getnframes()) #read in all the samples
    results = struct.unpack("%dh" % (inFile.getnframes()), vals) #unpack to get the samples
    results = [abs(x) for x in results]
    #now we pull 30 samples at a time (30 samples = 1 frame).
    for i in xrange(inFile.getnframes() / 30):
        frame = results[30 * i: 30 * (i + 1)]
        #adaptive threshold: exponential smoothing with the last frame's peak
        new_thresh = (thresh * (1 - (2.0 ** -7))) + ((2.0 ** -8) * max_samp)
        #check how many samples go above this new threshold
        count = 0
        for j in frame:
            if j > new_thresh:
                count += 1
        if count / 30.0 >= 0.9: #need it to beat 90%
            #frame is a candidate for speech
            decision.append(1)
        else:
            #this is where we use a counter based implementation for labelling inactiveness
            if inactive_counter < 10 and decision[-1] == 1: #we ignore silence for 10 runs
                decision.append(1)
                inactive_counter += 1
            else:
                inactive_counter = 0
                decision.append(0)
        #update the threshold and the max sample values
        thresh = new_thresh
        max_samp = max(frame)
    #final check for characterization as speech; we use another counter, since
    #the inactive counter would otherwise cause silence to be recognized as speech
    active_counter = 0
    final_num = 0
    for val in decision:
        if active_counter >= 18:
            print "Speech!"
            final_num += 1
            active_counter = 0
        if val == 1:
            active_counter += 1
        else:
            active_counter = 0
    results = [x ** 2 for x in results]
    intensity = 20 * numpy.log10(numpy.sqrt(sum(results) / inFile.getnframes()))
    if final_num >= 3 and intensity > 48:
        if Growl_exists:
            notifier.notify('Attention', 'Listener', 'Speech Detected Nearby')
        else:
            print "Speech Detected Nearby!\nSomeone might be calling you"
    inFile.close()

if __name__ == "__main__":
    f = open("skype_Status", "r")
    for new_line in f:
        if new_line.strip() == "PROGRESS":
            skype_on_call = True
    f.close()
    if skype_on_call:
        analyze()
    else:
        record()
```

Anyway, it would be really convenient if I could find something about VAD algorithms and improve listener to work better for my dorm room settings. It is doing a pretty good job already but there is always scope for improvement.

As always, my solutions need to be convoluted, and over here I make use of AppleScript to check whether there's a Skype call going on; you can find all that here.

Screenshots etc available on Listener's new home: http://shriphani.com/blog/listener/.
