Wednesday, March 26, 2008

Steps to test glibc sorting order of any locale

At indic-mashup we had a good discussion on sorting orders and the sorting issues of Indic languages. The language experts posted their expected data at
http://www.indlinux.org/wiki/index.php/CollationData

I have recently completed the work for mr_IN and it has been upstreamed, so you can check mr_IN sorting in the next glibc releases.

I am blogging this here since it will be useful for linguists who want to test the sorting order for their languages, and it would be nice if we could test and correct the sorting order of all languages. :)
So the first step is to test the sorting and file bugs for any wrong sorting order ;) I have been working on collation for some time, so I will surely help in fixing them.

step 1: create a text file, e.g. barakhadi_test
step 2: write the data to be sorted into it, one syllable per line

so the content of your test file will look like this
http://pravins.fedorapeople.org/sorting/barakhadi_test
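
If you just want something small to try first, a test file can also be made straight from the terminal. This is only a sketch: the four syllables below are my own sample, deliberately out of order, and are not the contents of the file linked above.

```shell
# Hypothetical sample data: four Devanagari syllables, one per line,
# written in a deliberately wrong order so that sorting has work to do.
printf '%s\n' 'की' 'क' 'कि' 'का' > barakhadi_test
```

For a real test, replace the sample with the full expected data for your language.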

step 3: run the following command in a terminal
syntax : LC_ALL="locale name".utf8 sort "path/test file name"

for Marathi it will be
LC_ALL=mr_IN.utf8 sort barakhadi_test

it will print the sorted data in the terminal, one syllable per line
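
One thing to watch for: if the locale is not installed on your system, the switch to it fails and sort typically falls back to the plain "C" byte order, so the output still looks sorted but not by the rules you wanted to test. A rough way to check first is sketched below. The helper name have_locale is my own, and it assumes the `locale` command from glibc is available.

```shell
# Hypothetical helper: succeeds only if the given locale name appears in
# the output of `locale -a`. Case and hyphens are ignored, so both
# "mr_IN.utf8" and "mr_IN.UTF-8" are accepted.
have_locale() {
    want=$(printf '%s\n' "$1" | tr 'A-Z' 'a-z' | sed 's/-//g')
    locale -a 2>/dev/null | tr 'A-Z' 'a-z' | sed 's/-//g' | grep -qxF "$want"
}
```

It can be used like this: have_locale mr_IN.utf8 && LC_ALL=mr_IN.utf8 sort barakhadi_test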

if you want to write the sorted data to a file, use the following command instead of the one above

syntax : LC_ALL="locale name".utf8 sort "path/test file name" > output_file
LC_ALL=mr_IN.utf8 sort barakhadi_test > barakhadi_sorted

http://pravins.fedorapeople.org/sorting/barakhadi_sorted
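
If your test file is already written in the order you expect, you do not have to compare the two files by eye. A small sketch, assuming `diff` is installed; the function name compare_with_expected is only for illustration:

```shell
# Hypothetical helper: sort the expected-order file under the given locale
# and diff the result against the file itself.
#   $1 = locale name, e.g. mr_IN.utf8
#   $2 = file whose lines are already in the expected order
# No output and exit status 0 mean glibc agrees with the expected order;
# otherwise diff prints the lines that moved.
compare_with_expected() {
    LC_ALL="$1" sort "$2" | diff "$2" -
}
```

For example: compare_with_expected mr_IN.utf8 barakhadi_test
Any lines that diff prints are good material for a bug report.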

there are other ways to test as well, but I have described here the method I use