
Predicting Which Hiragana Character You’re Drawing

I took a shot at the problem above. It turns out that high-school-level math is enough to build a decent classifier for it.

In the Japanese writing systems, a character is composed of strokes, and these strokes have a fixed order. This restriction is pretty much all you need. For each stroke, I grab its end points and centroid and then normalize them.
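To make the feature step concrete, here is a minimal sketch in Python. It is not the code from the repo. The function names are made up for illustration, and the normalization (scaling the whole drawing into a unit square using its bounding box) is an assumption, since any scheme that removes position and size would do.

```python
def normalize_strokes(strokes):
    """Scale every point into the unit square using the drawing's bounding box.

    `strokes` is a list of strokes; each stroke is a list of (x, y) points
    in drawing order. Bounding-box scaling is an assumption of this sketch.
    """
    xs = [x for stroke in strokes for x, _ in stroke]
    ys = [y for stroke in strokes for _, y in stroke]
    min_x, min_y = min(xs), min(ys)
    # Use one scale for both axes so the aspect ratio is preserved;
    # fall back to 1.0 for a degenerate single-point drawing.
    scale = max(max(xs) - min_x, max(ys) - min_y) or 1.0
    return [[((x - min_x) / scale, (y - min_y) / scale) for x, y in stroke]
            for stroke in strokes]


def stroke_features(stroke):
    """Return [start_x, start_y, end_x, end_y, centroid_x, centroid_y]."""
    start, end = stroke[0], stroke[-1]
    cx = sum(x for x, _ in stroke) / len(stroke)
    cy = sum(y for _, y in stroke) / len(stroke)
    return [*start, *end, cx, cy]


def drawing_features(strokes):
    """Concatenate the per-stroke features, keeping the stroke order."""
    features = []
    for stroke in normalize_strokes(strokes):
        features.extend(stroke_features(stroke))
    return features
```

Because the stroke order is fixed, the position of a stroke in the list already says which stroke of the character it is, so six numbers per stroke go a long way.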

Once a stroke is finished, a nearest-neighbor classifier runs and fetches the 3 best matches. I have had very decent performance with this approach. Right now, no information about the distance between successive strokes is used. Incorporating that should improve performance noticeably.
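The matching step could look roughly like the sketch below. Again, this is an illustration rather than the repo's code. The `top_matches` name, the Euclidean distance, and comparing only the strokes drawn so far against the start of each stored template are all assumptions.

```python
import math


def top_matches(query, templates, k=3):
    """Return the labels of the k templates closest to `query`.

    `query` is the flat feature vector of the strokes drawn so far.
    `templates` is a list of (label, feature_vector) pairs for complete
    characters. Only the leading part of each template is compared, which
    assumes the user follows the standard stroke order.
    """
    n = len(query)
    scored = []
    for label, features in templates:
        if len(features) < n:
            continue  # this character has fewer strokes than were drawn
        distance = math.dist(query, features[:n])  # Euclidean distance
        scored.append((distance, label))
    scored.sort()  # smallest distance first; ties fall back to label order
    return [label for _, label in scored[:k]]
```

Calling this after every finished stroke narrows the candidates as the drawing progresses, since characters with too few strokes drop out on their own.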

Here’s a video of the classifier in action:

The full source code (very kludgy) is on GitHub.

I have further ideas for improving the performance and for training on some Kanji.

Enjoy :)