Tuesday, October 08, 2013

Open Source Language Summit Nov 2013 in Pune (Sprint ideas)

Hi All,

   I am sure those who took part in this summit last time are excited to attend the 2013 Open Source Language Summit as well.
   As part of the planning, we (Parag and Anish) discussed possible hackathons for this summit and listed the following items:

Language Support:
    Identifying the availability of basic components (Unicode, locale, fonts, IME and rendering support) for 287 languages, mostly by comparing against Wikipedia and Fedora language support.

Fonts:
The Lohit2 project, "Creating standard and reusable OpenType tables for Indian script fonts" [1], has already picked up pace with the release of Lohit Devanagari Alpha and the start of work on Gujarati and Kannada. Its humble goal of creating language rendering tables so that fonts render correctly across platforms makes it a priority item for the summit. The following are the things we are planning.
(Sprint 1)
- Lohit2
    - Hacking on Telugu and Kannada OpenType tables
    - Testing of the released Devanagari and Gujarati fonts

(Sprint 2)
For now this is just an idea about how we can use the Lohit2 reusable tables in the long run.
- Lohit-ise (I will give a short 10-minute presentation on this)
    - Work on Devanagari and Gujarati fonts

(Sprint 3)
Time to update the font packages in the major OS distributions. One can learn packaging through this sprint. We expect experts from the packaging teams of different distributions to be there to help.
- Packaging AnjaliOldLipi, Meera Tamil and SakalMarathi for Fedora, Debian and Ubuntu

Input Method
ibus-typing-booster [3] is a predictive text input method for the open source world, supporting 41 languages. A number of people are using it and have given positive feedback on it. During the last few conferences I got requests from active open source users to make it available on Debian and Ubuntu as well. We are planning this activity for the summit.
(Sprint 1)
- ibus-typing-booster
    - Packaging for Debian and Ubuntu

(Sprint 2)
We are not sure how much change will be needed to port ibus-typing-booster to the fcitx framework. We will discuss it and see what we can do. Help from fcitx users and developers is welcome from all perspectives :)
- ibus-typing-booster
    - Porting to fcitx

(Sprint 3)
During the last year a number of features have been added to it. This sprint is a presentation and testing event for them.
- ibus-typing-booster
    - Presentation on ibus-typing-booster features
    - Testing of support for Indian IMEs

(Sprint 4)
The plan is to create layout images for the inscript2 layouts.
- Keyboard Layout Images
    - Porting mapper scripts to support inscript2 layouts
    - Updating the keyboard layout images on the Fedora wiki

Rendering Engines
(Sprint 1)
This bug was identified during Lohit2 development. Unfortunately Behdad will not be able to attend this summit, so we will try to debug it and see if we can provide a patch to fix it.
- harfbuzz
    - Hacking on the "glyph moving across syllable boundaries" bug [2]


1. http://pravin-s.blogspot.in/2013/08/project-creating-standard-and-reusable.html
2. https://bugs.freedesktop.org/show_bug.cgi?id=69266
3. https://fedorahosted.org/ibus-typing-booster/

3 comments:

csslayer said...

Hi, I'm the main fcitx developer.

I'm not sure if it's fully covered, but fcitx already provides some auto-completion directly in the keyboard layout engine, by using enchant (which has hunspell/aspell/ispell backends) and presage.

https://fcitx-im.org/wiki/File:Fcitx-Keyboard.png

AFAIK ibus-typing-booster parses the hunspell dictionary itself instead of using the hunspell library, which could be faster than using the library directly (hunspell is too slow for an IME). For the same reason there is also a custom implementation for English in Fcitx.

https://github.com/fcitx/fcitx/tree/master/src/module/spell

And BTW, in Fcitx we don't want a separate engine for auto-completion (Fcitx now has auto-completion inside the keyboard engine). Input from your side is welcome, but I just want to mention that directly porting ibus-typing-booster as one or many separate IME engines doesn't follow the fcitx design.
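
To illustrate the "parse the hunspell dictionary yourself" idea mentioned above, here is a minimal sketch in Python. It is not the actual code of ibus-typing-booster or Fcitx. It assumes a plain hunspell .dic word list (an optional word count on the first line, then one "stem/FLAGS" entry per line), ignores affix rules and escaped slashes entirely, and the function names load_dic_words and complete are made up for this example.

```python
import bisect


def load_dic_words(lines):
    """Collect bare stems from hunspell .dic lines, ignoring affix flags."""
    words = set()
    for index, raw in enumerate(lines):
        line = raw.strip()
        if not line:
            continue
        # The first line of a .dic file is usually an approximate word count.
        if index == 0 and line.isdigit():
            continue
        # Drop "/FLAGS" and any tab-separated morphological fields.
        stem = line.split("/", 1)[0].split("\t", 1)[0].strip()
        if stem:
            words.add(stem)
    return sorted(words)


def complete(sorted_words, prefix, limit=5):
    """Return up to `limit` words starting with `prefix` via binary search."""
    start = bisect.bisect_left(sorted_words, prefix)
    matches = []
    for word in sorted_words[start:]:
        if not word.startswith(prefix):
            break
        matches.append(word)
        if len(matches) == limit:
            break
    return matches


if __name__ == "__main__":
    sample = ["4", "hello/S", "help/DGS", "world", "helmet\tpo:noun"]
    print(complete(load_dic_words(sample), "hel"))
```

Because the word list is sorted once up front, each lookup is a binary search plus a short scan, which is the kind of cost that is acceptable on every key press.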

Anish Patil said...

Hi csslayer!

Thank you for your input.
I am not familiar with fcitx.
What we are trying to achieve is to take the auto-completion part out of ibus-typing-booster. For that we have created a separate project called Libyokan:
https://gitorious.org/libyokan

Libyokan deals with both auto-completion and text prediction.
In Libyokan we build n-gram language models based on user input, and we will be building language models for the majority of languages.

It would be nice to see it in fcitx as well. Let me know your thoughts.

csslayer said...

Sounds like Presage (http://presage.sourceforge.net/), which is also used for text prediction. In the real world the Nokia N9 uses it, and fcitx also uses it as a backend.

In practice, presage lacks an API to iterate over the supported languages, and I don't want to implement that manually (iterating over an undocumented directory path is painful if anything changes upstream), so in fcitx I currently only use it for English. I hope this could be included in libyokan. (It seems to be missing currently.)

Another problem is that the library license is not very friendly for fcitx to use. Could it be changed to something else?

It looks like the current API is key-event driven, while I would prefer a Unicode character based API. The key compose code also lives together with the auto-completion in fcitx, and I don't want to convert back and forth between keysyms and Unicode. And I really dislike function keys (space, enter) being handled by the library. Libyokan, as a word prediction library, should only deal with strings; things like transliteration should be moved outside and put in the IM engine implementation.

And is libyokan's n-gram model character level or word level? It seems to be character level, otherwise it should be much larger. And I hope there could be some API to feed in surrounding text for prediction.
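
To make the points in this thread concrete (a word-level n-gram model, a string-only API, and feeding in surrounding text), here is a minimal sketch in Python. It is not libyokan's or presage's API. The class BigramPredictor and its learn and predict methods are hypothetical names for this example, and it uses a simple bigram count with no smoothing.

```python
from collections import Counter, defaultdict


class BigramPredictor:
    """Word-level bigram model that works only on Unicode strings."""

    def __init__(self):
        # Maps a previous word to a counter of the words seen after it.
        self._next = defaultdict(Counter)

    def learn(self, text):
        """Update the model from text the user has committed."""
        words = text.lower().split()
        for prev, cur in zip(words, words[1:]):
            self._next[prev][cur] += 1

    def predict(self, surrounding_text, prefix="", limit=3):
        """Suggest next words given the text before the cursor.

        `prefix` is the partially typed word, if any. No key events are
        involved; the caller decides what space or enter should do.
        """
        words = surrounding_text.lower().split()
        if not words:
            return []
        candidates = self._next.get(words[-1], Counter())
        # Most frequent first, ties broken alphabetically for stable output.
        ranked = sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))
        wanted = prefix.lower()
        return [w for w, _ in ranked if w.startswith(wanted)][:limit]


if __name__ == "__main__":
    model = BigramPredictor()
    model.learn("open source fonts open source tools open type tables")
    print(model.predict("we love open"))
```

In this shape the IM engine owns key handling and transliteration, and the predictor only sees the committed text and the current prefix, which is the split suggested in the comment above.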