Pravin Satpute

Monday, May 05, 2008

Google Translation tool: (My First Look)

I tried the Google translation tool today, and it is nice to see the effort they have put into Hindi-to-English and English-to-Hindi translation:
http://www.google.com/translate_t
Playing with it for just 5 minutes gave me the following results:

Hindi to English:
i/p: तुम कौन हो -> What you have
expected result: Who are you
i/p: मे घर जा रहा हु -> Hu is in the home
expected result: I am going home

English to Hindi:
i/p: who are you -> आप जो कर रहे हैं
expected result: तुम कौन हो
i/p: go away -> दूर
expected result: दूर जा

So I found its accuracy to be somewhere around 20-30% for sentences. It is good for single words.

Translating between English and Hindi, in either direction, is a really tough job, because:
1. A single English word often has many Hindi equivalents, so it is really difficult for a tool to pick the right one. It needs manual editing to improve accuracy.

2. The meaning of a word keeps changing with the sentence it appears in when going from English to Hindi.

3. Sentence structure differs. If you observe carefully, 'is' and 'was' come in the middle of an English sentence, while their translations come at the end of the Hindi sentence.

4. Determining the gender of the speaker is a tedious task.
The statement "I am going" has two forms depending on who is actually speaking:
Male: "मैं जा रहा हूँ" Female: "मैं जा रही हूँ"
This can be resolved only if we know from previous statements who is speaking; otherwise we need manual editing.

5. Most important, the tool needs a strong dictionary in the background for all these comparisons, and it should list all possible Hindi words for each English word.
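
Points 3 and 4 can be illustrated with a toy Python sketch. It is only a hard-coded illustration of the gender problem, not how any real translation tool works; the function name `i_am_going` and the lookup table are my own, and the Hindi uses the standard spellings.

```python
# Toy illustration only: a real translator needs far more than a lookup table.
AUXILIARY = {"male": "रहा", "female": "रही"}


def i_am_going(speaker_gender):
    """Return the Hindi for "I am going" once the speaker's gender is known."""
    if speaker_gender not in AUXILIARY:
        # Without context from earlier sentences the tool can only guess.
        raise ValueError("speaker gender must be known from context")
    # Hindi word order: subject, verb stem, gendered auxiliary, then 'am' last.
    return " ".join(["मैं", "जा", AUXILIARY[speaker_gender], "हूँ"])
```

Calling `i_am_going("male")` and `i_am_going("female")` gives two sentences that differ only in the third word, which is exactly the choice a tool cannot make from the English text alone.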

Good to see Google trying this; some more work will definitely improve it a lot!

Friday, May 02, 2008

Including the smc-fonts package (Malayalam fonts) in Fedora 9

I thought it worth writing about this, since it is nice to see many Malayalam typefaces in Fedora 9 now. Malayalam users no longer need to install these fonts manually; they can simply run

yum install smc-fonts-*

and they will get these fonts in Fedora 9.

It is also worth writing about because adding this package to Fedora was really difficult: there were lots of issues with it under the Fedora packaging guidelines (FPG). I am happy that I finally made the required adjustments and that it is now in Fedora 9.

Issues:
1) The license was inside the fonts but not easily viewable; a person needed fontforge or another font tool to actually see it.

2) I then got detached license text from upstream in the tarball, but that did not solve the problem,
since this package consists of fonts from different upstream projects, all with different licenses,
and as per the FPG we cannot make a single package with different licenses inside it.

3) It is really nice that the FPG gives alternative options for almost every situation. I studied them with help from the experts, finally added subpackage support to smc-fonts, and then submitted it for review. Thanks to the FPG for the good documentation.

4) Thanks to Rahul Bhalerao, who actually did the review for this very complex packaging example. It was not a regular package: it had different licenses and different upstream versions.

You can see the bug for more details:
https://bugzilla.redhat.com/show_bug.cgi?id=433584

As per the review suggestions I modified the package, Jens Petersen finally gave CVS access, and now smc-fonts is in Fedora 9.
Cheers for that!

Presently I am the Fedora downstream maintainer of the smc-fonts package. It includes the fonts meera, rachana, raghumalayalam, dyuthi and suruma.

If you find any bug in these fonts, feel free to file it at https://bugzilla.redhat.com/enter_bug.cgi?product=Fedora

Tuesday, April 29, 2008

Differences between the Malayalam fonts lohit and meera

Here I am just mentioning the things I noticed when comparing these two fonts.
1) Both are mono-thick fonts, and the shapes are almost the same.
You can see the image (fontforge metrics window) below to check the shape difference; it is almost nil.

meera's shapes look a bit thicker than lohit's.

2) I noticed this difference while comparing the two fonts in OpenOffice.

meera really needs some work on this: if we set this font in a GUI application at size 8-9, it is unreadable. Even compared with Latin fonts at the same point size it is much smaller, almost half the size.

3) The major difference is that meera supports the traditional script whereas lohit follows the new script, so you need to choose the right font according to your script needs.

Yes, some rendering differences and bugs are possible, since they depend on the fonts' GSUB rules and the underlying rendering engine, but those are outside the scope of this blog.

Wednesday, March 26, 2008

Steps to test glibc sorting order of any locale

In indic-mashup we had a good discussion on sorting orders and the sorting issues of Indic languages. All the language experts posted their expected data at
http://www.indlinux.org/wiki/index.php/CollationData

I have recently completed the work for mr_IN, and it has been upstreamed, so you can check mr_IN sorting in the next glibc releases.

I am blogging this here since it will be useful for many linguists who want to test the sorting order for their languages, and it will be nice if we can test and correct the sorting order of all languages. :)
So the first step is to test sorting and file bugs for any wrong sorting order ;) As I have been working on collation for some time, I will surely help in fixing them.

step 1: create a text file, for example barakhadi_test
step 2: write the data to be sorted into it, one syllable per line

The content of your test file will look like this:
http://pravins.fedorapeople.org/sorting/barakhadi_test

step 3: use the following command in a terminal
syntax : LC_ALL="locale name".utf8 sort "path/test file name"

For Marathi it will be
LC_ALL=mr_IN.utf8 sort barakhadi_test

This prints the sorted data in the terminal, one syllable per line.

If you want to write the sorted data to a file, use the following command instead of the one above

syntax : LC_ALL="locale name".utf8 sort "path/test file name" > output_file
LC_ALL=mr_IN.utf8 sort barakhadi_test > barakhadi_sorted

http://pravins.fedorapeople.org/sorting/barakhadi_sorted

There are also other ways to test this, but I have described the method I use myself.
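
One such alternative, as a rough sketch: Python's standard `locale` module calls the same C library collation, so the test can be scripted. The function names `sort_with_locale` and `sort_file` are my own, and the sketch assumes the locale (for example mr_IN.utf8) is installed; if it is not, `locale.setlocale` raises `locale.Error`.

```python
import locale


def sort_with_locale(lines, locale_name):
    """Sort strings using the collation rules of the given locale.

    Raises locale.Error if the locale is not installed on the system.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # remember current setting
    locale.setlocale(locale.LC_COLLATE, locale_name)
    try:
        # strxfrm turns each string into a key that sorts in collation order
        return sorted(lines, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)  # always restore


def sort_file(path, locale_name):
    """Read one syllable per line from path and return the lines sorted."""
    with open(path, encoding="utf-8") as handle:
        syllables = [line.rstrip("\n") for line in handle if line.strip()]
    return sort_with_locale(syllables, locale_name)
```

For the Marathi case, `sort_file("barakhadi_test", "mr_IN.utf8")` should return the same order as the `sort` command above, and the result can be compared line by line with the expected order.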

Thursday, February 14, 2008

How to test Open Type Fonts

Fonts are a very important part of an operating system, and 100% accuracy in them is very important.

I am not from a QA background, but after more than three years of working on rendering issues I have seen many varieties of font problems. Here I am writing about some of the areas we should look at while testing fonts. These are just guidelines; please send me your suggestions if you have any.

Steps:
1. Cross-check the font with the Unicode charts:

This is a very important step. Use a tool such as Character Map in Fedora or charmap in Windows. These tools show the glyphs in a font and their corresponding Unicode values. Check these against the Unicode code charts at www.unicode.org/charts. A glyph may have been assigned the wrong Unicode value, and important Unicode characters may be missing. Also check which Unicode version the font supports, so that you compare against the charts for that same version.
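
Part of this check can be scripted. The sketch below assumes you already have the set of code points the font covers (how you extract it from the font is outside this sketch) and uses Python's standard `unicodedata` module as the reference. The function names are my own. Note that `unicodedata` reflects the Unicode version bundled with your Python (see `unicodedata.unidata_version`), so the version caveat above applies here too.

```python
import unicodedata


def missing_from_font(font_codepoints, block_start, block_end):
    """Return assigned code points in [block_start, block_end] the font lacks."""
    missing = []
    for codepoint in range(block_start, block_end + 1):
        # name() returns the default (None) for code points without a name,
        # which is how unassigned code points are skipped here
        if unicodedata.name(chr(codepoint), None) is None:
            continue
        if codepoint not in font_codepoints:
            missing.append(codepoint)
    return missing


def describe(codepoints):
    """Format code points as 'U+XXXX NAME' lines for a bug report."""
    return ["U+%04X %s" % (cp, unicodedata.name(chr(cp))) for cp in codepoints]
```

For Devanagari you would call `missing_from_font(coverage, 0x0900, 0x097F)` and pass the result to `describe` to get a readable list of what to look for in the chart.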

2. Create a test file for your language:

The content of this file should be all possible syllables of the language; for Marathi, say, it will be the barakhadi (combinations of every consonant with every matra). It should also contain some conjunct forms, e.g. matra ligatures and consonant ligatures. Making this file is a really big effort, but doing it once will help you forever when testing fonts. I think such files are already available for some scripts, so please search the net.
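
The plain consonant + matra part of such a file can be generated rather than typed. This is a minimal sketch: the function name `barakhadi` is my own, the list of signs is the usual twelve-form set (bare consonant, ten vowel signs, anusvara, visarga), and conjuncts and ligatures still have to be added by hand.

```python
# Vowel signs (matras) given as escapes so the exact code points are visible.
SIGNS = [
    "",        # bare consonant
    "\u093e",  # aa
    "\u093f",  # i
    "\u0940",  # ii
    "\u0941",  # u
    "\u0942",  # uu
    "\u0947",  # e
    "\u0948",  # ai
    "\u094b",  # o
    "\u094c",  # au
    "\u0902",  # anusvara
    "\u0903",  # visarga
]


def barakhadi(consonants, signs=SIGNS):
    """Return every consonant + sign combination, consonant by consonant."""
    return [consonant + sign for consonant in consonants for sign in signs]


def write_test_file(path, consonants):
    """Write one syllable per line, the layout a sorting test also expects."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(barakhadi(consonants)) + "\n")
```

Pass the consonants of your own language as a string; for example `barakhadi("कखग")` gives 36 syllables starting with क, का, कि.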

I would really appreciate it if people in the community who have such a file would share it; it will help a lot of other people too.

So when you are testing any new font for your language, just apply that font to this file and check whether it works properly. In case of any doubt you can refer to an already available, accurate font for cross-checking.

One problem with such a file is that, since you did not type the characters yourself, the source may have been typed using a wrong typing sequence. We are only viewing the text, so we cannot be sure what the person entered while typing it. So in case of any doubt, I suggest actually typing that character sequence yourself.

Example: I once took some data from esakal.com for testing, but some characters were not rendering properly. Later on I understood that their data entry operators were typing ZWJ in many unnecessary places (almost everywhere a half form of a consonant occurs).
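
Since ZWJ (U+200D) is invisible, a small script is an easy way to find it in borrowed test data. A minimal sketch, with function names of my own:

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER


def find_zwj(text):
    """Return the index of every ZWJ character in text."""
    return [index for index, char in enumerate(text) if char == ZWJ]


def strip_zwj(text):
    """Return text with every ZWJ removed."""
    return text.replace(ZWJ, "")
```

Use `strip_zwj` with care: in some places a ZWJ is intentional and removing it changes the rendering, so it is safer to look at the positions from `find_zwj` first.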

3. Problems of the rasterizer:

Many times, due to hinting problems, the rasterizer shows wrong GPOS attachments. In that case increase the font size and check again, or else take a printout of the document and check that.

I noticed this problem while working on a Nastaliq script font. Cursive attachment is the key feature of the Nastaliq script, and on screen there was a break in the cursive attachment, but the GPOS rules in the font were correct and it printed properly on paper, so the problem was on the rasterizer side.

4. Compatibility Issues:

This is a major problem of OpenType fonts. A font's GSUB rules give different results depending on the rendering engine's reordering methodology.
So even though our test file gives good results in gedit, it might not give the same results in kwrite or OpenOffice, since they use different rendering engines.
The behaviour will not be 100% different, but there are chances of some bugs, so please don't forget to test this.

5. Font Styles: (Normal, Italic, Bold, Bold Italic etc.)

Check whether it is possible to apply all styles to the font (Bold, Italic, Bold Italic).
I have seen this problem while creating Arabic fonts: even after installing four weights of the font (N, I, B and BI), I could not see any style variation when I tried them. After digging into it I found a problem in the settings of the .fog file (the source file): I was not updating the name in the TTF names field.

Testing styles is a very difficult task, since many editors have built-in support for italic and bold. So it is very important to check whether the glyph we are looking at comes from the Italic font or is the default italic synthesized by the editor. I did this by taking printouts :).
First take a printout of the default italic given by the editor. Then install the Italic weight of the font and take another printout. Comparing the two will give you exact results.

6. The font's internal name:

Whenever we double-click on a TrueType font, a viewer shows us its name, license information etc. In GNOME this is gnome-font-viewer; in Windows it is the Windows font viewer.
Testing this is also very important, since this is the name by which we identify the font when selecting it in various editors (OO, gedit etc.).
So test whether the font name shows as expected in the editor, e.g. 'samyak devanagari', while selecting the font in the menu bar.

7. Selecting the font:

This means actually applying the font to text. Sometimes editors don't allow us to do this. I have seen this problem with MS Office: after installing a font, it did not allow me to select it from the list.
The problem in this case is that the Unicode range bits in the OS/2 table are not set properly for the script.
http://www.microsoft.com/typography/unicode/ulu.htm
The Fontforge and Fontlab font editors do this for you; we just need to select the required script. But people who are still using Fontographer need to set it by writing the hex value for that script range.
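
To show what that hex value means, here is a small sketch of the bit arithmetic. The OS/2 table stores the ranges as four 32-bit fields (ulUnicodeRange1 to ulUnicodeRange4). The bit numbers below are the ones I know from the OS/2 specification (the page linked above lists all of them), and the function names are my own. Reading the fields out of a real font file is not covered here.

```python
# A few bit numbers from the OS/2 ulUnicodeRange list; see the spec for the rest.
UNICODE_RANGE_BITS = {"Arabic": 13, "Devanagari": 15, "Malayalam": 23}


def range_bit_is_set(unicode_ranges, bit):
    """unicode_ranges is (ulUnicodeRange1, ..., ulUnicodeRange4), 32 bits each."""
    word, offset = divmod(bit, 32)  # which field, and which bit inside it
    return bool((unicode_ranges[word] >> offset) & 1)


def set_range_bit(unicode_ranges, bit):
    """Return a copy of the four fields with the given bit switched on."""
    word, offset = divmod(bit, 32)
    updated = list(unicode_ranges)
    updated[word] |= 1 << offset
    return tuple(updated)
```

For Devanagari, setting bit 15 on empty fields gives 0x00008000 in the first field, which is the kind of hex value a Fontographer user has to type by hand.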

8. Printing Quality:

The shapes we see on screen are the result of hinting, anti-aliasing and some more processing by the rasterizer; screen resolution also affects them.
Check by taking a printout: it will show how the font looks in printed material, and you may be able to suggest some modifications to the glyphs.
Yes, we should definitely check this with different kinds of printer: dot-matrix, laser, inkjet. It actually tells you the printing quality of your font :)
That's why you can see many varieties of fonts today for different media (display, print etc.).

Please don't forget to mention the versions of all applications, along with the OS name, while filing a bug. Also, if you are giving key names from an ASCII keyboard, please mention which keyboard layout you are using, e.g. inscript, phonetic etc.

Tuesday, January 22, 2008

Temporary Solutions for Kashmiri Problems

* Everything in this blog post is for Kashmiri in Devanagari only.
Kashmiri was standardised in 2002 at CIIL by Shri R. K. Bhat, but sadly it still has not got the additional code points it requires in the Unicode Devanagari code page (U+0900). I have seen www.koshur.org; this website has lots of content on Kashmiri, its basics and its sounds, and anybody can learn Kashmiri using only the resources given there. Going through this website, a person will definitely understand that the Unicode Devanagari code page needs additional code points for the Kashmiri sounds û & ü.

Then what is the reason Unicode still does not support Kashmiri fully?
Actually, community people discussed this with Unicode earlier, but the vowel sounds û & ü that Kashmiri requires are somewhat similar to the Gurumukhi vowels u & uu, so Unicode suggested either using those or submitting a proposal to Unicode.
But that solution has many problems. First, the Gurumukhi shape is different from the one required.
There are also lots of other problems on the rendering and font side. If we type the Gurumukhi matra U+0A42 inside Devanagari text, the rendering engine identifies it as a Punjabi language syllable, treats it as an invalid syllable,
throws a U+25CC (dotted circle) mark between the Devanagari character and the Gurumukhi matra, and does not give the required combination.
Using the Gurumukhi matras alone will not solve the problem either, since the corresponding vowels are also required.
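
The script mismatch described above is easy to detect in text. The sketch below uses the first word of each character's Unicode name as a rough stand-in for its script; real rendering engines use the proper Unicode Script property, so treat this only as an approximation. The function names are my own.

```python
import unicodedata


def script_word(char):
    """First word of the Unicode character name, e.g. 'DEVANAGARI'."""
    return unicodedata.name(char, "UNKNOWN").split()[0]


def mixed_script_marks(text):
    """Find combining marks that follow a character from a different script.

    Returns a list of (index, base_script, mark_script) tuples.
    """
    problems = []
    for index in range(1, len(text)):
        base, mark = text[index - 1], text[index]
        # General categories starting with 'M' are combining marks (matras etc.)
        if unicodedata.category(mark).startswith("M"):
            if script_word(mark) != script_word(base):
                problems.append((index, script_word(base), script_word(mark)))
    return problems
```

Running it on Devanagari KA followed by U+0A42 reports a GURMUKHI mark on a DEVANAGARI base, which is the combination that gets the dotted circle.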

Presently, for translation work, I am putting these shapes at U+E500, U+E501, U+E502 and U+E503 in the lohit and samyak fonts. Yes, I know this is wrong, since data created using these code points will not be standard, but there is no other option: the Unicode process will take a long time and we cannot stop our work. As soon as we get code points in Unicode, we will use a converter to replace these with the proper values.
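
Such a converter could be as simple as the sketch below. The final code points are not known yet, so the mapping is passed in as a parameter and any values you see in an example call are placeholders, not real assignments. The function names are my own.

```python
import unicodedata

# The temporary private-use code points used in the fonts for now.
TEMPORARY_CODEPOINTS = (0xE500, 0xE501, 0xE502, 0xE503)


def is_private_use(char):
    """True for characters in a Unicode private use area (category 'Co')."""
    return unicodedata.category(char) == "Co"


def replace_temporary(text, final_codepoints):
    """Replace temporary code points using a {temporary: final} mapping."""
    table = {temp: chr(final) for temp, final in final_codepoints.items()}
    return text.translate(table)


def leftover_private_use(text):
    """Return the indexes of private-use characters still present in text."""
    return [index for index, char in enumerate(text) if is_private_use(char)]
```

After conversion, `leftover_private_use` should return an empty list; anything it reports is data that still depends on the temporary fonts.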

I have put the first character of the above image on the Inscript V key and the third character on B.
The second and fourth characters are on the # and $ keys.

Become the root user, then:
1. Take a backup of the font by copying /usr/share/fonts/lohit-hindi/
2. Put this font at the same location: http://pravins.fedorapeople.org/kashmiri_temp/lohit_hi.ttf
3. Take a backup of the mim file /usr/share/m17n/hi-inscript.mim
4. Put this file at the same location: http://pravins.fedorapeople.org/kashmiri_temp/hi-inscript.mim

Just log off and log in again, and you can now type Kashmiri using the #, $, V and B keys.
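
The two backup steps (1 and 3) can also be scripted. This is a minimal sketch with a function name of my own. It copies into a separate backup directory rather than next to the original, on the assumption that a second copy of a font left inside the fonts directory might itself be picked up as a font.

```python
import shutil
from pathlib import Path


def back_up(path, backup_dir):
    """Copy one file into backup_dir and return the path of the copy."""
    source = Path(path)
    target_dir = Path(backup_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    if target.exists():
        # Never overwrite an earlier backup silently.
        raise FileExistsError("backup already exists: %s" % target)
    shutil.copy2(source, target)  # copy2 also keeps timestamps
    return target
```

As root you would call, for example, `back_up("/usr/share/m17n/hi-inscript.mim", "/root/kashmiri_backup")`; the backup directory name here is only an example.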

Tuesday, July 17, 2007

Presenting in National Workshop on Calligraphy and Typography

It was a very big workshop cum seminar on calligraphy and typography organised by GIST, CDAC Pune. All the masters of this subject were present. The good thing was that calligraphers as well as technical people and font engineers attended. My topic was OpenType Fonts and Some Case Studies.

Tuesday, September 05, 2006

Friendship

Do you know the relationship between two eyes? They blink together, they move together, they cry together, they see things together and they sleep together, BUT THEY NEVER SEE EACH OTHER. That's what friendship is! If I called you and asked you to pick me up because something happened, would you come? If I had one day left to live my life, would you be part of my last day? If I needed a shoulder to cry on, would you give me yours? This is a test to see who is your real friend and who is just someone that talks to you when they are bored (and I do care!). You know that?

Monday, August 28, 2006

Till now I have tried my best

Dear friends,
From junior college until now I have always tried my best. This is the only thing that is always in our hands, and I have used it to the fullest. Whatever the outcome of a situation, we can be satisfied that we did our best in whatever time we got. My success story started from this attitude: even though many times I started from 0%, within a very short time I got the best results.