Predicting Which Hiragana Character You’re Drawing

I tried to take a shot at the problem shown above. It turns out that high-school-level math is sufficient to make a decent classifier for this.

In the Japanese alphabet(s), a character is composed of strokes, and these strokes have a fixed order. This restriction is pretty much all you need. For each stroke, I grab its end points and centroid and then normalize them.
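Here is a minimal sketch of that feature extraction. It is not the code from the repo: the function name stroke_features is made up for illustration, and I am assuming "normalize" means dividing by the canvas width and height so that every coordinate lands in [0, 1].

```python
def stroke_features(points, width, height):
    """Turn one stroke into 6 numbers: start point, end point, centroid.

    points is a list of (x, y) samples in drawing order.
    width and height are the canvas size (assumed normalization).
    """
    xs = [x / width for x, _ in points]
    ys = [y / height for _, y in points]
    cx = sum(xs) / len(xs)  # centroid = mean of the sampled points
    cy = sum(ys) / len(ys)
    return [xs[0], ys[0], xs[-1], ys[-1], cx, cy]


# A character drawn with several strokes is then just the concatenation
# of the per-stroke features, in stroke order.
features = stroke_features([(0, 0), (50, 100), (100, 200)], 100, 200)
```

Because the stroke order is fixed, the i-th stroke of a drawing can always be compared against the i-th stroke of a stored example.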

Once a stroke is finished, a nearest-neighbor classifier runs and fetches the 3 best matches. I have had very decent performance with this approach. Right now, no information about the distance between successive strokes is used. Incorporating that should improve performance significantly.
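A rough sketch of the "3 best matches" lookup, again not the repo code. The name top_matches and the (label, feature_vector) layout are assumptions, and I am assuming plain Euclidean distance between feature vectors of the same length.

```python
import math


def top_matches(query, examples, k=3):
    """Return the k closest distinct labels, nearest first.

    examples is a list of (label, feature_vector) pairs whose vectors
    have the same length as query.
    """
    ranked = sorted(examples, key=lambda ex: math.dist(query, ex[1]))
    labels = []
    for label, _ in ranked:
        if label not in labels:  # several examples may share a label
            labels.append(label)
        if len(labels) == k:
            break
    return labels


examples = [("a", [0, 0]), ("i", [1, 1]), ("u", [5, 5]), ("e", [9, 9])]
best = top_matches([0.9, 0.9], examples)
```

Running this after every stroke just means building the query from the strokes drawn so far and comparing it against examples truncated to the same number of strokes.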

Here’s a video of the classifier in action:

The full source code (very kludgy) is on GitHub.

I have further ideas for improving the performance and for training with some Kanji.

Enjoy :)

Communicating Through Fingertips – Finger Gesture Recognition Using Depth Data

In Prof. Vishy’s ML class (CS 590, a top-notch course with a top-notch professor), we don’t have a final; instead, we are supposed to apply ML to a problem we find interesting. Microsoft gave all of us interns a Kinect this summer, so I decided to put it to some use (I don’t have a TV, so the Xbox is just collecting dust).

My goal was to record finger gestures and then detect them when a user makes them. I had 2 constraints in mind: no OpenCV (i.e. I would use just depth data) and no wearing special gear to guide the tracking.

So, let us see what I did. Basically, I used the CandescentNUI Hand Tracker to get a collection of fingertip locations and points and then applied two techniques to try and recognize the gestures we make.

First, I tried the Passive-Aggressive algorithm by Crammer et al. This algorithm uses an online-learning approach to build a hyperplane. In 3 dimensions a hyperplane is a plane, in 2 dimensions it is a line, and so on. Basically, it is what you get when you try to define a “surface”-like structure for a space. Take 2 non-parallel vectors in 3D space and together they span an entire 2D plane. The hyperplane is just that: a subspace with 1 dimension less than the space we are operating in.

The hyperplane is supposed to act like a brick wall (if we’re in 3D; no point visualizing a higher dimension). When a new data point comes in, we check which side of the wall it lies on and label the point accordingly. This gives us a binary classifier.
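For the curious, here is a small sketch of the basic passive-aggressive update for a binary classifier. This is the textbook version, not necessarily what my project code does: the function names are made up, labels are assumed to be +1 or -1, and the hyperplane is assumed to pass through the origin (no bias term).

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def predict(w, x):
    """Which side of the wall is x on? Returns +1 or -1."""
    return 1 if dot(w, x) >= 0 else -1


def pa_update(w, x, y):
    """One online step: leave w alone if x is classified with margin >= 1
    (passive), otherwise move w just enough to fix that (aggressive)."""
    loss = max(0.0, 1.0 - y * dot(w, x))  # hinge loss
    norm_sq = dot(x, x)
    if loss == 0.0 or norm_sq == 0.0:
        return w
    tau = loss / norm_sq  # step size
    return [wi + tau * y * xi for wi, xi in zip(w, x)]


w = [0.0, 0.0]
for x, y in [([1.0, 0.0], 1), ([0.0, 2.0], -1)]:
    w = pa_update(w, x, y)
```

After these two updates w is [1.0, -0.5], so the first point lands on the positive side of the wall and the second on the negative side.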

The dataset consists of raw point coordinates in the space of the human palm seen by the Kinect. Now it turns out that the online passive-aggressive algorithm fails at constructing a decent hyperplane separating 2 classes (data points for 2 different gestures).

The obvious hack was to deploy a nearest-neighbors classifier. The trick I used was to run k-means with a large number of clusters on the data and build myself a dataset consisting entirely of cluster centers. That way I was able to reduce the number of neighbors to search tenfold and still get fantastic performance. A simple technique worked fabulously in this situation and I couldn’t be more pleased.
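The idea can be sketched in a few lines of plain Python. This is an illustration, not the KinectSpell code: the names kmeans, condense and classify are made up, the initialization (first k points) is deliberately naive, and I am assuming k-means is run separately per gesture so that every cluster center inherits a label.

```python
import math


def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm; returns up to k cluster centers."""
    centers = [list(p) for p in points[:k]]  # naive init: first k points
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        for i, group in enumerate(groups):
            if group:  # an empty cluster keeps its old center
                centers[i] = [sum(col) / len(group) for col in zip(*group)]
    return centers


def condense(data_by_label, k):
    """Replace each gesture's raw points with k labeled cluster centers."""
    return [(label, center)
            for label, points in data_by_label.items()
            for center in kmeans(points, k)]


def classify(query, labeled_centers):
    """1-nearest-neighbor over the cluster centers only."""
    return min(labeled_centers, key=lambda lc: math.dist(query, lc[1]))[0]


data = {
    "fist": [[0, 0], [0, 1], [1, 0], [1, 1]],
    "point": [[10, 10], [10, 11], [11, 10], [11, 11]],
}
centers = condense(data, k=1)
```

The nearest-neighbor search now only touches the centers instead of every recorded point, which is where the speedup comes from.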

Here is a video of the gesture-detector in action. The annotations should show you what to look at.

The source is up on GitHub. The code is very kludgy and I will fix it up after finals week. In case you’re in a hurry: http://github.com/shriphani/KinectSpell

Now, it is time to try and avoid failing in the finals x(.