Pincode search engine and an idea

Well, the URMS at NIPL is complete and I have decided to step aside for the “web designers” to take my place. No matter how hard I try, I end up dissatisfied with my work on making templates. That doesn’t mean that I hate the websites whose creators put in hours of hard work. I particularly like Jeff Croft’s website. The colors are not too showy and I can read the text without straining my eyes. Jeff Croft worked at LJW, the place where Django was created. I suppose it comes naturally to him.

I want to improve my Django skills. Recently jburd (IRC) gave me an interesting task: making a search engine for India’s pin codes. I wrote one in seconds and was immediately confronted with a problem. India is a nation with over a million towns and villages. Who would give me the pin codes of all these places, and when would I enter them into my db?

Search engines have to operate intelligently in order to produce relevant results. I remember when someone on IRC told me about the Levenshtein algorithm, which counts how many single-character edits separate two strings, and showed me an implementation in Python.
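For the curious, here is a minimal sketch of the Levenshtein distance in Python. It is not the implementation I was shown on IRC, just the usual dynamic-programming version written from memory. The helper `closest_places` and the sample place list are my own assumptions, added only to show how the distance could rank search results.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    # previous[j] holds the distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def closest_places(query, places, limit=3):
    """Hypothetical helper: rank known place names by edit distance to the query."""
    q = query.lower()
    return sorted(places, key=lambda name: levenshtein(q, name.lower()))[:limit]


if __name__ == "__main__":
    # Sample names only; a real database would hold far more places.
    print(closest_places("Sikandarabad", ["Hyderabad", "Secunderabad", "Ahmedabad"]))
```

A misspelt query still lands on the nearest known name, as long as the spelling is not too far off.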

I then hit upon an idea. English is not a scientific language: its spelling does not follow its pronunciation. Hence we have multiple spellings for the same name. Let us pick the word “Secunderabad”. I can also write Secunderabad as Sikandarabad or Sickandarabad. All sound the same. To a person who has only heard of Secunderabad but never seen how it is spelt, using a pincode search engine would be a bit of a problem. We could use a speech application (espeak?) to obtain a pronunciation of whatever the user typed, then rewrite that pronunciation in a scientific language (like Hindi, Sanskrit and other Indian languages), where what is written matches what is spoken. We could compare that to the data in our database and get an accurate result. Then we could use the speech application again to find the possible spellings in English and spit out the results.
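I have not built the espeak version, so the sketch below swaps in something much cruder to show the general shape of the idea: reduce every name to a rough "sound key" and compare keys instead of spellings. The functions `phonetic_key`, `build_index` and `lookup` are hypothetical names of my own, and the rules (treat `c` and `ck` as `k`, drop vowels after the first letter) are assumptions that happen to work for this example, not a real pronunciation model.

```python
import re


def phonetic_key(name: str) -> str:
    """Reduce a name to a crude consonant skeleton (assumed rules, not real phonetics)."""
    s = re.sub(r"[^a-z]", "", name.lower())     # keep letters only
    s = s.replace("ck", "k").replace("c", "k")  # assumption: c and ck sound like k
    s = re.sub(r"(.)\1+", r"\1", s)             # collapse doubled letters
    first, rest = s[:1], s[1:]
    return first + re.sub(r"[aeiou]", "", rest)  # drop vowels after the first letter


def build_index(places):
    """Map each sound key to the list of place names that share it."""
    index = {}
    for place in places:
        index.setdefault(phonetic_key(place), []).append(place)
    return index


def lookup(index, query):
    """Return every known place whose sound key matches the query's key."""
    return index.get(phonetic_key(query), [])


if __name__ == "__main__":
    index = build_index(["Secunderabad", "Hyderabad"])
    for spelling in ("Secunderabad", "Sikandarabad", "Sickandarabad"):
        print(spelling, "->", lookup(index, spelling))
```

All three spellings collapse to the same key, so they find the same entry. The rules break easily (a soft `c` as in "Cecil" is wrongly turned into `k`), which is exactly why a proper pronunciation from a speech application would be the better foundation.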

I don’t know, but I might have provided myself with an idea for research when I go to college. :)

Oh, and by the way, I have bundled the pincode search engine here.
