
Illustration by Dhabia AlMansoori.

NYU Abu Dhabi Professors Develop Arabic Thesaurus

Two professors have developed an Online Readability Leveled Arabic Thesaurus that provides a novel way of searching for Arabic words, their related forms and English equivalents.

Feb 14, 2021

On Jan. 19, NYU Abu Dhabi announced the launch of an Online Readability Leveled Arabic Thesaurus, which offers a novel way to search for Arabic words, especially for learners of the complex language.
Available online for free, the thesaurus was jointly developed by Associate Professor of Practice of Arabic Language Muhamed Al Khalil and Professor of Computer Science Nizar Habash. When a user enters an Arabic word, the tool automatically identifies all roots associated with the word and links to related words that share each root. It also provides a readability level, which indicates the language proficiency expected to understand the word. Results are primarily in Arabic, but users can also search in English, and English meanings are provided for each word.
“When reading literary text, we often find words that are hard — they may be fancy terms; more archaic; or convey complex multilayered meanings and connotations,” Al Khalil explained. “For educated readers, who know enough of the words in the text, it is not a problem to figure out the meaning using the context. But for learners — Arabic natives or non-native — this can be a struggle; the readability thesaurus can be used to help editors of the original text replace hard words with simpler synonyms, and to do so consistently.”
The thesaurus is part of a larger NYUAD-funded project titled Simplification of Arabic Masterpieces for Extensive Reading. The project aims to create a standard for simplifying modern Arabic fiction for school-aged learners.
Al Khalil believes that teachers and learners can benefit greatly from the thesaurus. “Most Arabic thesauri do not provide readability information,” he said. “This is the gap we fill.”
Currently, the interface focuses on Modern Standard Arabic, the form typically used in education and media. It sorts words into Levels 1-5, from beginner to specialist vocabulary. Asked how words are ranked on the readability scale, Habash explained that commonly spoken standard Arabic words, such as those heard on television, tend to fall into the lower Levels 1-3. Words cross over to Level 4 when they appear in literary discussion, and highly specialized words are classified as Level 5.
Habash is also the director of the Computational Approaches to Modeling Languages Lab, which conducts research and education in artificial intelligence. The lab's main research areas include Arabic natural language processing, machine translation, text analytics and dialogue systems.
“The tool is a demonstration of how basic enabling technologies for Arabic can help users. The same basic enabling technologies can be used in AI systems for disambiguation of ambiguous words,” Habash said. “The specific readability lexicon aspect is interesting for automatic AI text simplification, an area we look forward to exploring in the future.”
Charlie Fong is News Editor. Email her at feedback@thegazelle.org.