Probability Trolls

In the not-too-distant past, I had the misfortune of being around probability trolls: people who exploit the bugs in the definition of probability to produce ridiculous answers to Fermi-type problems.

Q. What is the probability that there is life on Uranus?

A. 0.5 – either there is life or there isn’t.

Q. What is the probability that the earth will be destroyed because of the LHC?

A. 0.5 – either it will be destroyed or it won’t.

And so on.

The source of this trolling is the naive definition of probability, which only holds when every outcome in the sample space S is equally likely:

P(A) = |E(A)|/|S|

And it pisses me off. But I guess that means that the troll’s mission has been accomplished.
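To make the bug concrete, here is a minimal Python sketch. The function names and the weights are made up purely for illustration (nobody knows the real numbers). It shows that counting outcomes is fine for a fair die, but gives the troll's 0.5 the moment you ignore how likely each outcome actually is.

```python
from fractions import Fraction

def naive_probability(event, sample_space):
    """|E(A)| / |S|: only valid when every outcome in S is equally likely."""
    favorable = [outcome for outcome in sample_space if outcome in event]
    return Fraction(len(favorable), len(sample_space))

def weighted_probability(event, weights):
    """Sum the probability mass of the outcomes that belong to the event."""
    total = sum(weights.values())
    return sum(w for outcome, w in weights.items() if outcome in event) / total

# Fair die: all six outcomes are equally likely, so counting is legitimate.
fair = naive_probability({2, 4, 6}, [1, 2, 3, 4, 5, 6])

# The troll's argument: two outcomes, therefore one half.
troll = naive_probability({"life"}, ["life", "no life"])

# Hypothetical weights, invented only to show that the answer depends on them.
weights = {"life": Fraction(1, 1000), "no life": Fraction(999, 1000)}
honest = weighted_probability({"life"}, weights)

print(fair, troll, honest)
```

The troll's answer and the fair die's answer come out identical, which is exactly the problem: the formula never asks whether the two outcomes deserve equal weight.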

Predicting Which Hiragana Character You’re Drawing

I took a shot at the problem shown above. It turns out that high-school-level math is sufficient to make a decent classifier for this.

In the Japanese alphabet(s), a character is composed of strokes. These strokes have a fixed order. This restriction is pretty much all you need. I grab the stroke’s end points and centroid and then normalize.
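Here is a rough sketch of that feature extraction in Python. It is not the code from the repo: the function name is hypothetical, a stroke is assumed to be a list of (x, y) points, and "normalize" is assumed to mean dividing by the canvas size, which is only one of several reasonable choices.

```python
def stroke_features(points, width, height):
    """Return (start, centroid, end) of a stroke, scaled to the unit square.

    points        -- list of (x, y) samples in drawing order (assumed format)
    width, height -- canvas size used for normalization (an assumption)
    """
    if not points:
        raise ValueError("a stroke needs at least one point")

    xs = [x for x, _ in points]
    ys = [y for _, y in points]

    # Centroid: the plain average of all sampled points.
    cx = sum(xs) / len(xs)
    cy = sum(ys) / len(ys)

    # End points: first and last samples of the stroke.
    (sx, sy), (ex, ey) = points[0], points[-1]

    return (sx / width, sy / height,
            cx / width, cy / height,
            ex / width, ey / height)

# A three-point diagonal stroke on a 200x200 canvas.
features = stroke_features([(0, 0), (50, 100), (100, 200)], 200, 200)
print(features)
```

Each stroke ends up as a 6-number vector, so two strokes can be compared with an ordinary Euclidean distance.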

Once a stroke is finished, a nearest-neighbor classifier runs and fetches the 3 best matches. I have had very decent performance with this approach. Right now, no information about the distance between successive strokes is used. Incorporating that should improve performance significantly.
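The matching step could look something like the sketch below. Again, this is an illustration under assumptions rather than the repo's code: `best_matches` and the `templates` layout are hypothetical, the labels and feature values are made up, and the feature vectors are shortened to two numbers to keep the example small. Because stroke order is fixed, the n-th drawn stroke is compared with the n-th stroke of every template, and the per-stroke Euclidean distances are summed.

```python
import heapq
import math

def best_matches(drawn, templates, k=3):
    """Return the k template labels closest to the strokes drawn so far.

    drawn     -- list of feature vectors, one per finished stroke
    templates -- dict mapping label -> list of per-stroke feature vectors
    """
    scored = []
    for label, strokes in templates.items():
        if len(strokes) < len(drawn):
            continue  # this character has fewer strokes than already drawn
        # zip stops at len(drawn), so only the strokes drawn so far count.
        distance = sum(math.dist(a, b) for a, b in zip(drawn, strokes))
        scored.append((distance, label))
    return [label for _, label in heapq.nsmallest(k, scored)]

# Made-up templates with toy 2-number features, for illustration only.
templates = {
    "a": [(0.0, 0.0), (0.5, 0.5)],
    "i": [(1.0, 0.0), (0.2, 0.9)],
    "u": [(0.0, 3.0), (0.4, 0.4)],
    "e": [(5.0, 5.0)],
}

after_one_stroke = best_matches([(0.1, 0.0)], templates)
after_two_strokes = best_matches([(0.1, 0.0), (0.5, 0.5)], templates)
print(after_one_stroke, after_two_strokes)
```

Re-running the match after every stroke means the candidate list narrows as you draw, and characters with too few strokes drop out on their own.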

Here’s a video of the classifier in action:

The full source code (very kludgy) is on GitHub.

I have further ideas for improving the performance and for training with some Kanji.

Enjoy :)