Doumi N, Lehireche A, Maurel D, Toumouh A, Khelifa C. FSM-based Free Resources and Tools for MSA Processing. ITCMS. 2017; 3(1): 1-11
URL: http://itcms.europeansp.org/article-11-169-en.html
Computer Science Dept., University of Saïda, Algeria
In this paper we present a set of resources and tools designed and implemented with finite-state machine technology for processing Modern Standard Arabic (MSA) text corpora. The resources are electronic lexical dictionaries containing lemmas, fully diacritized word forms, and their morpho-syntactic features; the features can be extended to include semantic categories. The tools comprise a tokenizer, a concordance tool, a lemmatizer, a POS tagger, a morphological analyzer, a sentence segmenter, and local grammars. Some of these tools are hard-coded in a programming language; the rest are designed as finite-state transducers and recursive transition networks. All of these resources and tools are freely available on the web as the Arabic package of the Unitex/GramLab platform.
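To illustrate how a dictionary of fully diacritized forms can serve lemmatization and POS tagging of text that is usually written without diacritics, here is a minimal Python sketch. It does not reproduce the authors' resources or the Unitex implementation: the entry layout `form,lemma.POS:features`, the three sample entries, the feature codes, and all function names are assumptions made for illustration only.

```python
import re
from collections import defaultdict

# Hypothetical sample entries, assumed layout: "diacritized_form,lemma.POS:features".
# The feature codes (P3ms, mp, ms) are invented for this sketch.
SAMPLE_ENTRIES = [
    "كَتَبَ,كَتَبَ.V:P3ms",
    "كُتُبٌ,كِتَابٌ.N:mp",
    "كِتَابٌ,كِتَابٌ.N:ms",
]

# Arabic tanwin, short vowels, shadda and sukun (U+064B to U+0652).
DIACRITICS = re.compile("[\u064B-\u0652]")


def strip_diacritics(word):
    """Remove Arabic diacritic marks so forms can be matched against raw text."""
    return DIACRITICS.sub("", word)


def parse_entry(line):
    """Split one entry of the assumed layout into its fields."""
    form, rest = line.split(",", 1)
    lemma_pos, _, features = rest.partition(":")
    lemma, _, pos = lemma_pos.rpartition(".")
    return {"form": form, "lemma": lemma, "pos": pos, "features": features}


def build_index(lines):
    """Index entries by their undiacritized surface form."""
    index = defaultdict(list)
    for line in lines:
        entry = parse_entry(line)
        index[strip_diacritics(entry["form"])].append(entry)
    return index


def analyze(token, index):
    """Return every dictionary analysis compatible with the token."""
    return index.get(strip_diacritics(token), [])


INDEX = build_index(SAMPLE_ENTRIES)

for analysis in analyze("كتب", INDEX):
    print(analysis["form"], analysis["lemma"], analysis["pos"], analysis["features"])
```

In this sketch the undiacritized token كتب matches both the verb and the plural-noun entry, which shows the kind of ambiguity that context-sensitive tools such as local grammars are meant to reduce. A production system would compile the dictionary into a finite-state machine rather than a hash map.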
