Keynote Speaker: Ms. Lim Lian Tze
Title: Low Cost Construction of Multilingual Lexicons for Under-Resourced Languages
Multilingual translation lexicons are essential in many NLP applications, such as machine translation, and they are also valuable reading aids for human readers. Manually constructing multilingual lexicons from scratch, however, is very labour-intensive.
We describe a low-cost method for constructing a multilingual lexicon prototype from very simple input data, namely lists of bilingual translation mappings. Such bilingual resources are often freely available and easily obtainable from the Internet or from conventional paper-based dictionaries. Because the input requirements are so modest, our method is especially suitable for under-resourced languages and language pairs. In addition, the overall framework of our multilingual lexicon does not presume extensive linguistic knowledge on the part of contributors who wish to help improve the lexicon.
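The following is a minimal sketch of one simple way to merge bilingual lists into multilingual entries. It groups words by a shared pivot language, assumed here to be English. Pivoting is our illustrative assumption and is not necessarily the speaker's method. The function name `build_multilingual_lexicon` and the sample Malay-English and French-English data are hypothetical.

```python
from collections import defaultdict

def build_multilingual_lexicon(bilingual_lists):
    """Merge bilingual lists into multilingual entries keyed by a pivot word.

    Hypothetical input format (an assumption): a dict mapping a language code
    to a list of (word, pivot_word) pairs, where pivot_word is in the shared
    pivot language (English here).
    """
    # pivot word -> language code -> set of translations in that language
    lexicon = defaultdict(lambda: defaultdict(set))
    for lang, pairs in bilingual_lists.items():
        for word, pivot in pairs:
            lexicon[pivot.lower()][lang].add(word)
    # Convert to plain dicts with sorted word lists for stable, readable output
    return {
        pivot: {lang: sorted(words) for lang, words in entry.items()}
        for pivot, entry in lexicon.items()
    }

# Hypothetical sample data: Malay-English and French-English mappings
bilingual_lists = {
    "ms": [("sungai", "river"), ("bank", "bank"), ("tebing", "bank")],
    "fr": [("rivière", "river"), ("banque", "bank"), ("rive", "bank")],
}

lexicon = build_multilingual_lexicon(bilingual_lists)
print(lexicon["river"])  # {'ms': ['sungai'], 'fr': ['rivière']}
print(lexicon["bank"])   # {'ms': ['bank', 'tebing'], 'fr': ['banque', 'rive']}
```

The "bank" entry shows the main weakness of naive pivoting: the financial and riverside senses collapse into one entry. A prototype built this way would therefore still need sense distinctions, which could come from contributors or from contextual information.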
This allows any speaker of an under-resourced language to contribute to the project, making more language pairs available through the multilingual lexicon. We also show how the multilingual lexicon can be used as an "intelligent" reading aid, using contextual information mined from an untagged comparable bilingual corpus. Again, this makes under-resourced languages more accessible to speakers of other languages.
21st Jun 2012
Universiti Malaysia Sarawak (UNIMAS), Kota Samarahan, Sarawak, Malaysia