Monday, June 22, 2009

Get rid of modifiers

When you are performing queries you sometimes want to treat modified letters like their corresponding unmodified ones, e.g. è should be treated just like a plain e.

The first algorithm that comes to mind is probably a long switch over the modified characters, which is horribly ugly. The second one could be a map, slightly better but still ugly. Both approaches require quite a lot of work, and I didn't (I still don't) like them.

After investigating a little and asking some friends I was more or less resigned, until Gabriele pointed me to what I was actually looking for: java.text.Normalizer, which lets you transform an ugly string into a neat one with just a couple of lines of code:

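// NFD splits è into a plain e followed by a combining accent
String result = Normalizer.normalize(myString, Normalizer.Form.NFD);
// \W then drops everything outside [a-zA-Z_0-9], accents included
return result.replaceAll("\\W", "").toUpperCase();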
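
Keep in mind that \W throws away spaces and punctuation along with the accents. If you only want to lose the modifiers, a minimal sketch could delete just the combining marks with \p{M} instead. The class and method names here are made up for illustration, and I left out the toUpperCase() call, so add it back if you need case-insensitive matching:

```java
import java.text.Normalizer;

// Hypothetical helper, not part of the original snippet.
class AccentStripper {
    // Decompose the string, then delete only the combining marks
    // (Unicode category M) that NFD leaves after each base letter.
    static String strip(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // "perché no?" written with a Unicode escape
        System.out.println(strip("perch\u00e9 no?")); // prints "perche no?"
    }
}
```

Letters that have no decomposition, such as ø, pass through unchanged, so those would still need a map if you care about them.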

Now, that's what I call quite good...
