Shriphani ‘PSP’ Palakodety

Weblog of an Aspiring Computer Scientist


Listener Gets a VAD

January 21st, 2010 · DSP, python

So, the beginning of the 4th semester in the midst of losers and overachievers, and this sem promises to set my a$$ on fire. As usual, I plan to continue working under Dr. Kihara this sem, so that should be interesting. Anyway, I decided to improve upon what listener offered by adding a VAD (voice activity detection) algorithm to it. I initially chose the algorithm by Moattar and Homayounpour but decided it left me with too much to do (it might certainly be a good candidate for later, when I have more time). Hence, I decided to snoop around for something simpler and found this paper, which seemed small and had a sort of ever-changing threshold for successive frames. The paper was authored by S. Milanovic, Z. Lukac and A. Domazetovic. I still don’t think I got it exactly right though. The paper mentioned that they used counters to mark frames as silence based on what the previous frame was, so I had to come up with a counter upper bound myself and finally chose to go with 10, i.e. even if a particular frame doesn’t make it beyond its threshold, it will still be marked as active if the previous frame was active and fewer than 10 such frames have been let through. This is done to accommodate situations where we end up reducing our volume at the end of a word / sentence.

Finally, to decide whether there was speech on an overall level, I look for at least 3 instances of 18 consecutive frames being marked as active (just random picks: 18 frames allows 8 active frames plus 10 additional for the counter we have, and 3 looked like a good candidate when I spoke my own name out).
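To make the two steps above concrete, here is a minimal sketch of the frame labelling and the run counting as standalone functions. It is written for Python 3 (the script further down is Python 2), the function names `label_frames` and `count_active_runs` are made up for illustration, and it mirrors my reading of the script in this post rather than the paper itself.

```python
def label_frames(samples, frame_size=30, thresh=1000.0, hangover=10):
    """Label each full frame 1 (active) or 0 (silent) using an adaptive threshold."""
    decisions = [0]          # dummy first entry so decisions[-1] always exists
    max_samp = 0             # peak of the previous frame
    inactive_counter = 0     # how many quiet frames have been let through
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = [abs(s) for s in samples[start:start + frame_size]]
        # the threshold decays slowly and is pulled up by the previous frame's peak
        thresh = thresh * (1 - 2.0 ** -7) + (2.0 ** -8) * max_samp
        above = sum(1 for s in frame if s > thresh)
        if above / float(frame_size) >= 0.9:
            decisions.append(1)
        elif inactive_counter < hangover and decisions[-1] == 1:
            # quiet frame right after an active one: keep it active for a while
            decisions.append(1)
            inactive_counter += 1
        else:
            decisions.append(0)
            inactive_counter = 0
        max_samp = max(frame)
    return decisions[1:]     # drop the dummy entry


def count_active_runs(decisions, run_length=18):
    """Count non-overlapping runs of run_length consecutive active frames."""
    runs = 0
    streak = 0
    for d in decisions:
        if d == 1:
            streak += 1
            if streak == run_length:
                runs += 1
                streak = 0
        else:
            streak = 0
    return runs


# 20 loud frames followed by 15 silent ones
samples = [5000] * (30 * 20) + [0] * (30 * 15)
decisions = label_frames(samples)
runs = count_active_runs(decisions)
```

With a constant peak m, the threshold update settles at m / 2, which is why a steady loud signal keeps clearing it. The run counter here counts a run as soon as it reaches 18 frames, which differs slightly from the loop in the script below.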

And as a final measure, I also ensure that the overall intensity beats 48 dB so that only someone trying to have a conversation with me is recognized.
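As a rough illustration of that intensity check, here is a minimal sketch (Python 3, standard library only). The name `intensity_db` is hypothetical, and the figure is in dB relative to a raw sample value of 1, not a calibrated sound pressure level.

```python
import math


def intensity_db(samples):
    """Return 20 * log10 of the RMS amplitude of raw sample values."""
    mean_square = sum(s * s for s in samples) / float(len(samples))
    if mean_square == 0:
        return float("-inf")   # pure digital silence has no finite level
    return 20 * math.log10(math.sqrt(mean_square))


level = intensity_db([1000] * 10)   # RMS of 1000 on the 16-bit scale
```

On this scale the 48 dB cutoff corresponds to an RMS sample value of roughly 251 out of a possible 32767.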

Finally, I made the switch from GeekTool to Growl, as GeekTool kept taking up a solid amount of screen real estate, and since I have one 23” monitor and a 15” monitor, the GeekTool output is positioned outside the real estate of my laptop’s display. Growl seems like a better candidate overall, and since I could finally get the Growl bindings to build on my machine, I think I should let Growl handle this.

So, the only place where my VAD implementation (or my mod of whatever was in that paper) doesn’t seem to work is in surroundings with a piano (in our dorm’s lobby for example). Very inconvenient, but whatever; probably some time in the future I will understand DSP and spectral analysis well enough to come up with a simple VAD algorithm of my own (as opposed to implementing something straight from a paper without any understanding of what is going on). Anyway, here is the updated script; it seems to do well recognizing speech in sort of silent settings:

#!/usr/bin/env python
#Author: Shriphani Palakodety
#Tool to aid those with noise cancellation headphones

import pyaudio
import wave
import sys
import struct
import numpy
import time

Growl_exists = True

try:
        import Growl
except ImportError:
        print "No Growl"
        Growl_exists = False
        pass

skype_on_call = False
notifier = 0
if Growl_exists:
        notifier = Growl.GrowlNotifier('Listener',  ['Attention', 'test'])
        #notifier.applicationName = ‘Listener’
        notifier.register()

def record():
    'Records Input From Microphone Using PyAudio'
    duration = 3 #record for 3 seconds. Pretty long duration don't you think
    outfile = "analysis.wav"
   
    p = pyaudio.PyAudio()
   
    inStream = p.open(format=pyaudio.paInt16, channels=1, rate=44100,input=True, frames_per_buffer=1024)

    out = []
    upper_lim = 44100 / 1024 * duration #number of 1024-sample chunks to read: 44100 / 1024 chunks per second * duration seconds
   
    for i in xrange(0, upper_lim):
        data = inStream.read(1024)
        out.append(data)
   
    #now the writing section where we write to file
    data = "".join(out)
    outFile = wave.open(outfile, "wb")
    outFile.setnchannels(1)
    outFile.setsampwidth(p.get_sample_size(pyaudio.paInt16))
    outFile.setframerate(44100)
    outFile.writeframes(data)
    outFile.close()
    analyze()

def analyze():
    if skype_on_call:
        print "\n"
        print "Skype Call In Progress"
        print "Listener On Hold"
        return
    inFile = wave.open("analysis.wav", "rb") #open a wav file in read mode
    thresh = 1000  #establish a minimum threshold
    max_samp = 0               
   
    decision = [0]

    #for i in xrange(441):

    inactive_counter = 0
       
    vals = inFile.readframes(inFile.getnframes()) #read in every sample in the file
    results = struct.unpack("%dh"%(inFile.getnframes()), vals)  #unpack to get the samples
    results = [abs(x) for x in results]
   
    #now we need to pull 30 samples at a time (30 samples = 1 frame).

    for i in xrange(4404):
        frame = results[30*i: 30*(i+1)]
        print frame
        new_thresh = (thresh * (1 - (2.0 ** -7)))  +  ((2.0 ** -8) * max_samp)
         
        #check how many samples go above this new threshold
        count = 0

        for j in frame:
            if j > new_thresh:
                count += 1
        if count / 30.0 >= 0.9 :   #need it to beat 90%
            #frame is a candidate for speech
            decision.append(1)
         
        else:
            #this is where we use a counter based implementation for labelling inactiveness
            if inactive_counter < 10 and decision[-1] == 1: #we ignore silence for 10 runs
                decision.append(1)
                inactive_counter += 1
            else:
                inactive_counter = 0
                decision.append(0)
         
        #update the threshold and the max sample values
        thresh = new_thresh
        max_samp = max(frame)

    #final check for characterization as speech, we use another counter
    active_counter = 0 #since the inactive counter will cause silence to be recognized as speech, we only consider speech as
    print decision
    final_num = 0
    for val in decision:
        if active_counter >= 18:
            print "Speech!"
            final_num += 1
            active_counter = 0
        if val == 1:
            active_counter += 1
        else:
            active_counter = 0

    results = [x ** 2 for x in results]
    intensity = 20 * numpy.log10(numpy.sqrt(sum(results)/inFile.getnframes()))
   
    if final_num >= 3 and intensity > 48:
        if Growl_exists:
            notifier.notify('Attention', 'Listener', 'Speech Detected Nearby')
        else:
            print "Speech Detected Nearby!\nSomeone might be calling you"
    inFile.close()

if __name__ == "__main__":
    f = open("skype_Status", "r")
    for new_line in f:
        if new_line.strip() == "PROGRESS": #strip the trailing newline before comparing
            skype_on_call = True

    if skype_on_call:
        analyze()
    else:
        record()
 

Anyway, it would be really convenient if I could find something about VAD algorithms and improve listener to work better for my dorm room settings. It is doing a pretty good job already but there is always scope for improvement.

As always, my solutions need to be convoluted and over here, I make use of applescript to check if there’s a skype call going on or not, so yeah, you can find all that here.

Screenshots etc available on Listener’s new home: http://shriphani.com/blog/listener/.


2009 – Recap

January 2nd, 2010 · Uncategorized

Well, this year, I did a lot of cool stuff. You can’t believe the stuff I did this year. First, I managed to blow up a stat course and get a B – right in the second semester in one of the easiest courses (supposedly). Well, the next sem went well. And that is all I did. Well, in code land here is the stuff I did:

1. Wrote a search engine – Seriously, Purdue CS students, take Dr. GRR’s course and you won’t regret it.

2. Wrote a Map Editor and a Turn by Turn direction finder – Although the GUI stuff was 100% done by my teammate (I suck at Swing), I worked with Dijkstra’s algorithm. (Again, Dr. GRR’s course).

3. Worked in real mode. Did you know the real mode is absofuckinglutely cool. From a debugger, you can modify the fkin screen. Every CS major should code in real mode – it is important that you do. Else you have missed out, believe me, you have missed out.

Now, some whacko notes:

From the DSP guide I managed to see a signal in a different way.

Lemma: The set of all signals composed of n samples is a linear space. (A signal with every sample set to 0 falls in this space, and we can add / scale signals without adding any more samples. So the n-point signals form a linear space. Q.E.D.)

Since we have a linear space, what is the basis of the space? These are the basis functions (the sinusoids). There are n + 2 sinusoids (n/2 + 1 cosines and n/2 + 1 sines), but two of these have all values set to 0. Hence these two are linearly dependent on the sinusoids we already have, leaving us with n linearly independent sinusoids. These span the n-dimensional space containing the signal, and the frequency domain provides us with the coordinates of the signal in the space represented by the basis functions. Useless piece of knowledge is now recorded on this site and will rot here till eternity.
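For what it is worth, the counting argument can be checked numerically. The sketch below (Python 3, standard library only, helper names made up for illustration) builds the n + 2 cosine and sine waves for an even n, counts the ones that are identically zero, and checks that the remaining ones are mutually orthogonal, which is enough to make them linearly independent.

```python
import math


def real_dft_basis(n):
    """Return the n + 2 cosine and sine basis signals for an n-point signal (n even)."""
    basis = []
    for k in range(n // 2 + 1):
        basis.append([math.cos(2 * math.pi * k * i / n) for i in range(n)])
    for k in range(n // 2 + 1):
        basis.append([math.sin(2 * math.pi * k * i / n) for i in range(n)])
    return basis


def is_zero(signal, eps=1e-9):
    """True if every sample is zero up to rounding error."""
    return all(abs(x) < eps for x in signal)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


n = 8
basis = real_dft_basis(n)
nonzero = [b for b in basis if not is_zero(b)]
zero_count = len(basis) - len(nonzero)

# n non-zero, mutually orthogonal vectors in an n-dimensional space form a basis
mutually_orthogonal = all(
    abs(dot(nonzero[i], nonzero[j])) < 1e-9
    for i in range(len(nonzero))
    for j in range(i + 1, len(nonzero))
)
```

The two all-zero waves are the sine waves at frequency 0 and at frequency n/2.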

It is 2010…….


The One PITA

November 29th, 2009 · Daily life, PITA, python

Well, it is thanksgiving break and I was so far having a decent semester, straight As in all exams (a perfect score in economics – not that I should be proud of it or anythin) and then terror strikes. Or well whatever the college version of a cataclysm is. I managed to f’kin ruin (misunderstand) the spec on a project and I am in danger of throwing away a coveted 4.0 GPA which would have been a great reward for the long hours of study I have put in + a retooling of my schedule so I have the multitasking capability of a mule and those $$ my parents spend so their son can enjoy a pain free life in a first world country and try to make them proud. Well, as it happens, I managed to (or at least I think – the scores are not yet out) misunderstand a spec AAAAARRRGGHH! In a datastructures course, I mastered AVL & Red-Black trees, spent hours trying to tweak my implementations, did well on the exam and managed to blow it when it came to reading a text file and filling an array. I just wonder how I even pull this stuff off. With Grad School apps coming in 2 years, what will I have to show – a carelessness in even reading specs seriously that puts doubts on the efficacy of my research methodology, I just hope I don’t cause major problems for myself.

Well, in case this doesn’t make sense, I managed to misunderstand a technique to populate an array with data (the data structs would work with this but oh no, in my interest to take the maximum from this course, I devoted long hours to getting the data structs right). There is a very good chance that the part I screwed up would end up being insignificant but that is not the point. The point is that once again my grade is going to shuttle between the first two letters of the english alphabet.

Anyway, in all this self hatred that I have been building up for myself, I have managed to learn some cool stuff in Dr. Kihara’s lab. The work there is pleasant, I can feel good boasting about it and when I talk to my friends’ parents, I can make them believe I am going to find a solution to poverty in 4 years and find a cure for cancer in my spare time (you can see I am not popular).

However, a weird question I was dealing with was that most of the stuff I have written for my work is in Python, and it would be cool if I could call it from Java since there are a bunch of people in the lab who use it. I am not looking for something complicated; I just want to call a bunch of methods from a Python module. JPype and Jython seem to be the things I should be looking at, but with my awesome constraints (Python 2.6 should be supported, it should make coffee, etc.), I will need to use my Uber-GOOGLE-SKILLZ.

Anyway, this blog has managed to get to PageRank 0 (I will interpret that as PageRank no longer being relevant) and I am now a suggestion on Google (yay! my plan for world domination is in full swing!)

And I love this Charlie dude who bites his bro’s finger, as for kanye, well even the prez thinks he’s a jackass:


Tinkering With A New Project

September 18th, 2009 · python

During the vacation this summer, I began working on creating an app that would help me respond to calls of attention when my auditory sensory capabilities were compromised courtesy the noise cancellation capabilities of my Bose headsets. Well, I managed to make a few mods to that script, used a touch of applescript (thank god OS X apps are scriptable) and introduced a bunch of new behaviors:

-> When a Skype call is in progress, the app stops listening.

Ok, not a bunch, just one. Also, this is not exactly an app you should be using since it relies too much on quirks in my own computing environment. Anyway, for more details you can head over to http://shriphani.com/Shriphani_Website/Listener.html.

So, a few screenshots:

Apart from that, I also wrote a protein function matching script as part of my research this semester. I will be putting it up soon. Wow, I have been more productive in these three weeks than all of last year.


Experiments With Interior Design

September 7th, 2009 · Daily life

Well, yesterday I mentioned the battle of tarkington, an elaborate arrangement of action figures to illustrate a story. Well here is the second part:

Snake Eyes struggles to aim at our heroes due to the high velocity winds. He decides to end it the cool way, by cutting the cord used by Data Frame and Beach Head. He begins his ascent, his destination: the smoke detector, his mission: to cut the damn cord.

IMG_0166

IMG_0168

IMG_0171

IMG_0172

Meanwhile, Beach Head looks down and sees the citadel of Qwerty City, Macbook Pro Center. A drop would mean certain death.

In the face of death, Beach Head puked – everything he ate over the last meal – totaling a whopping 1.5 Kg. (Belly Clench = Puking).

To his horror Beach Head realized that he had dropped his ammo + supplies bag !!

Suddenly Data Frame noticed a movement in the smoke alarm region. He saw Snake Eyes and the knife.

He alerted Beach Head about it. Beach Head rubbed his brow thinking that that may have well been his last vomit.

The decrease in the mass of the (Beach Head + Ammo Bag) system was just enough for the strong computer programmer, Data Frame to perform his famous maneuver. Using all his strength, he swung his arm lifting Beach Head and the gun and as the barrel came in line with his view of Snake Eyes, he fired. His life hinged on the accuracy of his aim:

mid – mid – somersault, Beach Head heard a bang and saw the bullet whizz past his face.

Snake Eyes stunned by this feat didn’t realize what was happening when the bullet hit him in the face. The impact was so severe, he lost balance and hit the wall:

Gravity showed up demanding a demonstration of free fall. On his way down, Snake Eyes realized that he would ram into Baroness as well.

Baroness oblivious to the activity, noticed a dark object hurtling towards her:

Baroness couldn’t hang on to the rope and began her fall with Snake Eyes:

On his way down Snake Eyes was visited by a strange thought “Radioactive Ammo”

Data Frame couldn’t believe his superhuman strength:

Data Frame and Beach Head, defending Qwerty City always.


Experiments With Interior Design

September 4th, 2009 · Daily life

So, I was trying to get my room to look like the best place on earth and I stumbled upon the idea to use my Action Figures to spice things up. Everyday, the action figures will be arranged to show a new scene. Well this is how it looks so far:

Baroness tries to kill anyone who plans to enter through the door of NE 436 in Tarkington Residence Hall and her trusted assistant plans to defend her in case things go wrong.

IMG_0147

Suddenly her assistant “Snake Eyes” notices “Dataframe” and “Beachhead” sneaking in on them and alerts Baroness:

IMG_0146

Snake Eyes takes aim.

IMG_0148

Fatigued by the lack of oxygen over the summer break, Beach Head struggles to make it across.

IMG_0152

Suddenly high velocity winds cause problems:

IMG_0156

Beach Head slips and his weapon falls. Data Frame, the computer programmer with extremely high upper body strength tries to save Beach Head. He is now forced to use his only weapon, his gun to prevent Beach Head from falling.

IMG_0158

IMG_0159

IMG_0160

IMG_0164

For the purposes of ensuring that this story continues, the high velocity winds prevent snake eyes from taking aim.

WILL BEACH HEAD SURVIVE? WILL THEY DEFEND NE436 TARKINGTON RES. HALL FROM A FUTURE OF NO SOCIAL INTERACTION?

To Be Continued……….


Bio-Informatics

September 4th, 2009 · python

It has been an amazing first week at college here. First, I began working with Prof Kihara and got to see a whopping 160 thousand (!) annotations. The coolest part is that I got to see a few protein function prediction algorithms (I get to code! Yippee!). Well, at one point I had to whip up my own factorial function (since, courtesy of the lab machines, I can’t use Python 2.6, which has math.factorial). Well, I modded a bit of the code I found at http://en.literateprograms.org/Factorials_with_prime_factorization_%28Python%29 (which helped me a lot, thanks). And then I mixed it up with my Miller-Rabin implementation and had a bit of fun :D.

So the slightly modded version of the factorial script should look like this.
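The modded script itself is not reproduced on this page. As a rough idea of the technique, here is a minimal sketch of a factorial computed through its prime factorization (Python 3). It is not the linked code or the lab version: it uses a plain sieve instead of Miller-Rabin to find the primes, and all function names are hypothetical.

```python
def primes_up_to(n):
    """Sieve of Eratosthenes: return every prime <= n."""
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= n:
        if is_prime[p]:
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
        p += 1
    return [i for i in range(n + 1) if is_prime[i]]


def prime_exponent_in_factorial(n, p):
    """Exponent of the prime p in n!: floor(n/p) + floor(n/p^2) + ..."""
    exponent = 0
    power = p
    while power <= n:
        exponent += n // power
        power *= p
    return exponent


def factorial(n):
    """Compute n! as the product of p ** exponent over all primes p <= n."""
    result = 1
    for p in primes_up_to(n):
        result *= p ** prime_exponent_in_factorial(n, p)
    return result
```

For example, 10! comes out as 2^8 * 3^4 * 5^2 * 7.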


Addiction Theory

August 17th, 2009 · Daily life, Psychology

School starts next week. I am now a sophomore and will work in a lab for the first time (yay!). Anyway, I was trying to understand how addiction works and I was not exactly pleased by the absence of theories, so I decided to give it my own spin. This is of course not researched and is just a hypothesis, so please don’t put too much weight on whatever is in this document. You can find it here: http://shriphani.com/essays/addiction_final.pdf


Tonsillectomy: Biggest PITA in the World

August 13th, 2009 · Daily life

Ok, Since it seems to be customary to document one’s suffering post one of the most commonly performed surgeries in the world, I have decided to put my experience online as well.

Ok, post surgery, I spent a day in the hospital. I am sure that in most cases you’re sent back home in a few hours but my doctor (who is pretty awesome) insisted that I stay under observation for a night.

Day Of Surgery:

I downed a decent amount of ice-cream and managed to down a decent amount of water. Apart from the blood I was spitting out for like 2 hours after the surgery, I didn’t feel a lot of pain. Of course, the intensity of my voice was less than 55 dB for sure. My speech was reduced to whispering.

Day 1:

Decent amount of pain. Downed a decent amount of painkillers and ate ice-cream + yoghurt

Day 2:

Decent amount of painkillers again, throat begins hurting a bit more than usual. Apart from like a spoonful of yoghurt, I couldn’t down anything else. Slept a decent amount of time.

Day 3:

Had a decent amount of painkillers again (something like 50 – 60 ml) and I could down a bit more of yogurt. At this point ice-cream and cold stuff started hurting.

Day 4:

In the morning all of a sudden, I felt no pain. I ate chips, bread and yogurt with ease. Sometime in the evening though, the pain came back with full force and it hurt real bad. I was worried because I thought I did something wrong. I couldn’t sleep that night. Also, had to take painkillers every 2 hours since it hurt real bad.

Day 5:

Thought I was going to die. Any activity involving the throat would lead to some severe pain in the ears / throat and I was getting dizzy. I downed a solid 100 ml of painkillers on this day. I also ate nothing.

Day 6:

Small signs of improvement, with regular painkiller doses, I was able to keep the pain to a minimum. I also shifted to a diet of bread + soup. Worked beautifully.

Day 7:

Felt real awesome. Went for like 10 hours without painkillers. Had to take a small dose before sleeping. Else, it felt great.

On day 8, I resumed my normal lifestyle. No pain whatsoever.

There, I didn’t read any torture stories and felt that with painkillers nearby, it can all be dealt with easily.

If you’re having this op done, do not hesitate. I don’t snore anymore and all traces of apnea have disappeared. Life’s awesome now.


First DSP Attempts

June 26th, 2009 · DSP, python

Well, since my last post on detecting calls of attention using my microphone, I have been paying attention to DSP since it seems to be one cool topic to spend 3 months on. So, I decided to begin reading the dspguide and found some cool stuff which I decided to tinker with. After reading about convolution, I decided to implement some basic filters that would help me amplify, add echoes and so on. First, implementations of convolution, the input side algorithm and the output side algorithm:

The input side algorithm:

def convolute_inside(impulse_response):
        global output_signal
        global input_signal
        len_out_signal = len(input_signal)+len(impulse_response)-1
        output_signal = [0 for x in xrange(len_out_signal)]
        for i in xrange(len(input_signal)):
                for j in xrange(len(impulse_response)):
                        output_signal[i+j] = output_signal[i+j] + input_signal[i]*impulse_response[j]
 

The output side algorithm:

def convolute_outside(impulse_response):
        global output_signal
        global input_signal
        len_out_signal = len(input_signal)+len(impulse_response)-1
        #start from a fresh, zeroed output so repeated calls do not pile up samples
        output_signal = [0 for x in xrange(len_out_signal)]
        for i in xrange(len_out_signal):
                for j in xrange(len(impulse_response)):
                        #print i, j
                        if not (i-j)<0 and not (i-j)>len(input_signal)-1:
                                output_signal[i] += impulse_response[j] * input_signal[i-j]
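
The two algorithms should always produce the same output. Here is a minimal sketch of both as pure functions without globals (Python 3; the names `convolve_input_side` and `convolve_output_side` are made up for this sketch), so that they can be compared directly.

```python
def convolve_input_side(signal, kernel):
    """Each input sample adds a scaled, shifted copy of the kernel to the output."""
    out = [0] * (len(signal) + len(kernel) - 1)
    for i, x in enumerate(signal):
        for j, h in enumerate(kernel):
            out[i + j] += x * h
    return out


def convolve_output_side(signal, kernel):
    """Each output sample is a weighted sum of the input samples that reach it."""
    out = [0] * (len(signal) + len(kernel) - 1)
    for i in range(len(out)):
        for j, h in enumerate(kernel):
            if 0 <= i - j < len(signal):   # skip indices that fall off the input
                out[i] += h * signal[i - j]
    return out


signal = [1, 2, 3, 4]
kernel = [1, 0, 0, 0, 2]   # original plus a copy delayed by 4 samples, doubled
from_input_side = convolve_input_side(signal, kernel)
from_output_side = convolve_output_side(signal, kernel)
```

Both calls give [1, 2, 3, 4, 2, 4, 6, 8] for this input.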

 

Next, using appropriate filters, we can add echoes and amplify the input signal. So, we first need to read in a wav file. Using my previous script, that should be simple:
Reading a wav file:

def readWavFile(input):
        inFile = wave.open(input, "rb")
        sample_rate = inFile.getframerate()
        total_samples = inFile.getnframes()
        vals = inFile.readframes(total_samples)
        inFile.close()
        return struct.unpack("%dh"%(total_samples), vals)
 

This next function adds an echo to the input signal:
Add an echo:

def add_echo():
        'Amounts to scaling and shifting the delta function and then adding it to a delta function'
       
        #considering the intensity of the echo to be 60% of that of the original signal and delayed by 1000 samples:
       
        impulse_response = [0 for x in xrange(1001)] #shifted and scaled delta function + delta function; 1001 entries put the echo exactly 1000 samples later
        impulse_response[0] = 1
        impulse_response[-1] = 0.6
        convolute_inside(impulse_response)

 

The above function considers that an echo occurs 1000 samples after the current one, with an intensity six tenths of the original signal. Since testing this becomes a serious pain, I decided to test my echo function using a different input signal and impulse response (the echo is 6/10 of the original intensity and 4 samples away).

def add_new_echo():
        'Amounts to scaling and shifting the delta function and then adding it to a delta function'
       
        #considering the intensity of the echo to be 60% of that of the original signal and delayed by 4 samples:
       
        impulse_response = [1,0,0,0,0.6] #shifted and scaled delta function + delta function
        convolute_inside(impulse_response)

input_signal = [1,2,3,4]
#print input_signal
add_new_echo()
print output_signal
 

The output of this is:

mouse-den% python convolution.py
[1, 2, 3, 4, 0.59999999999999998, 1.2, 1.7999999999999998, 2.3999999999999999]
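
One detail that is easy to get wrong is the length of the impulse response: an echo delayed by d samples needs d + 1 entries, with the echo gain at index d. A minimal sketch (Python 3; `echo_impulse_response` is a hypothetical helper, not part of the script above):

```python
def echo_impulse_response(delay, gain):
    """Delta function at index 0 plus a scaled delta function `delay` samples later."""
    if delay < 1:
        raise ValueError("delay must be at least 1 sample")
    response = [0] * (delay + 1)
    response[0] = 1          # the original signal passes through unchanged
    response[delay] = gain   # the echo arrives `delay` samples later
    return response


short_echo = echo_impulse_response(4, 0.6)
long_echo = echo_impulse_response(1000, 0.6)
```

Here `short_echo` is the same [1, 0, 0, 0, 0.6] used in the test above, and `long_echo` has 1001 entries.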

The impulse response for amplifying a signal would be a scaled delta function, so:

def amplify():
        'Make the impulse function a scaled delta function'
       
        impulse_response = [2]
        convolute_inside(impulse_response)
 

The code to write to a wave file should be pretty straightforward:

def writeWavFile(out):
        outStream = wave.open(out, "wb")
        outStream.setnchannels(1)
        outStream.setsampwidth(2)
        outStream.setframerate(44100)
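        data = ""
        for i in xrange(len(output_signal)):
                #clamp to the 16-bit range and convert to int so struct.pack accepts floats and amplified samples
                sample = int(max(-32768, min(32767, output_signal[i])))
                data += struct.pack('h', sample)
        outStream.writeframes(data)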
        outStream.close()

So, here is the test wave file: test.wav

And the amplified wav file: output1.wav

Now, to the part where we try to figure out if a signal contains another signal.

Turns out, to do that, you just need to obtain the cross-correlation of the input signal and the waveform we already have.

So, the code to do this for a signal that goes like [1,2,3,4] is:

def cross_correlate(input_file=""):
        'Cross correlate, obtain the graph and find the peak'
       
        global input_signal
        global output_signal
        #input_signal = record() #this procedure reads in the signal we need to find
        input_signal = [1,2,3,4,11,0,0,1,10,5,2]
        impulse_response = [1,2,3,4] #read in our signal
        impulse_response.reverse()
        convolute_outside(impulse_response)
        print output_signal
 

This results in a signal that looks like:

[4, 11, 20, 30, 64, 44, 26, 15, 43, 52, 44, 26, 9, 2]
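
The same numbers fall out of a direct sliding dot product, without reversing the template and convolving. A minimal sketch (Python 3; `cross_correlate_full` is a made-up name), which also maps each output index back to the position of the template in the input:

```python
def cross_correlate_full(signal, template):
    """Slide the template over the signal and take the dot product at every lag."""
    n, m = len(signal), len(template)
    out = []
    # lag is the signal index that lines up with template[0]
    for lag in range(-(m - 1), n):
        total = 0
        for k in range(m):
            idx = lag + k
            if 0 <= idx < n:   # treat samples outside the signal as 0
                total += signal[idx] * template[k]
        out.append(total)
    return out


signal = [1, 2, 3, 4, 11, 0, 0, 1, 10, 5, 2]
template = [1, 2, 3, 4]
correlation = cross_correlate_full(signal, template)
peak_index = correlation.index(max(correlation))
peak_lag = peak_index - (len(template) - 1)
```

Note that the largest value (64) sits at lag 1, not at lag 0 where the template actually matches (30): the big sample 11 dominates the plain dot product. Dividing by the energy of the window is one common way around that, though nothing in this post does so.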

So, I had to figure out an algorithm to detect the peak and I found one on stackoverflow.com and it goes like this:

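def find_peak():
        'Return the index of the largest sample that sits right before a steep step'
        #the function name is assumed; the original snippet started at the loop
        peak_options = {}
        #overall change from the first sample to the last, as a float so the division below is not truncated
        rise = float(output_signal[len(output_signal)-1] - output_signal[0])
        for i in xrange(len(output_signal)-1):
                travel = output_signal[i+1] - output_signal[i]

                if rise != 0 and (travel/rise) > 1:
                        #print output_signal[i]
                        peak_options[output_signal[i]]=i

        return peak_options[max(peak_options.keys())]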
 

In other news, I am going to work in Prof. Kihara’s bio-informatics laboratory this fall and I hope it works out.
