Title
Enhancing English-Arabic machine translation
Author
Mohammed, Ahmed Ibrahim El-taher.
Preparation committee
Researcher / Ahmed Ibrahim El-taher Mohammed
Supervisor / مفرح محمد سالم
Supervisor / أبو العلا عطيفى حسنين
Subject
Machine translation.
Publication date
2015.
Number of pages
ix, 84 p.
Language
English
Degree
Master's
Specialization
Computer Science
Publisher
Date of approval
1/1/2015
Place of approval
Zagazig University - Faculty of Engineering - Computer
Index
Only 14 of 84 pages are available for public view.

Abstract

Converting a treebank into a CCGbank opens the respective language to
the sophisticated tools developed for Combinatory Categorial Grammar
(CCG) and enriches cross-linguistic development.
In this thesis, we propose a transformation approach that converts a widely
recognized Arabic treebank into a CCGbank representation in order to gain
those benefits. The algorithm successfully transformed the Penn Arabic
Treebank (PATB) into the first complete Arabic CCGbank (ACCGbank).
Our proposed algorithm performs the transformation in four steps,
starting with a preprocessing step necessitated by the characteristics
and peculiarities of the Arabic language. This step normalizes the
PATB and makes it suitable for accurate conversion from PATB
to CCGbank. The second step determines the type of each node in the
PATB tree structure. Afterwards, the PATB's flat tree structures are
transformed into binary trees using binarization techniques. Finally, CCG
trees are formed from the binary tree structures, drawing on the information
extracted during the earlier steps to produce CCG-tags.
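
The binarization step can be illustrated with a minimal sketch. The abstract does not say which binarization technique the thesis uses, so the code below assumes a plain right-branching scheme that ignores heads; the `Node` class, the helper names, and the placeholder words `w1` to `w4` are all hypothetical and are not taken from the thesis. A head-aware scheme may well be needed for a real treebank conversion.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A hypothetical treebank node: a label, children, and a word for leaves."""
    label: str
    children: List["Node"] = field(default_factory=list)
    word: str = ""  # non-empty only for leaf nodes


def binarize(node: Node) -> Node:
    """Right-branching binarization: (X a b c) becomes (X a (X b c)).

    Assumption: heads are ignored; intermediate nodes reuse the parent label.
    """
    if not node.children:
        return node
    kids = [binarize(child) for child in node.children]
    # Fold the two rightmost children together until at most two remain.
    while len(kids) > 2:
        right = kids.pop()
        left = kids.pop()
        kids.append(Node(node.label, [left, right]))
    return Node(node.label, kids, node.word)


def max_arity(node: Node) -> int:
    """Largest number of children found at any node of the tree."""
    if not node.children:
        return 0
    return max(len(node.children), *(max_arity(c) for c in node.children))


def leaves(node: Node) -> List[str]:
    """Words of the tree in left-to-right order."""
    if not node.children:
        return [node.word]
    words: List[str] = []
    for child in node.children:
        words.extend(leaves(child))
    return words


# A flat clause with four daughters, using placeholder words.
flat = Node("S", [
    Node("VP", word="w1"),
    Node("NP", word="w2"),
    Node("NP", word="w3"),
    Node("PP", word="w4"),
])
binary = binarize(flat)
```

In this sketch every node of `binary` has at most two children, and `leaves(binary)` returns the words in their original order, which is the property a later category-assignment step would rely on.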
We conducted an experiment on several parts of the PATB, aiming to
convert the PATB into the ACCGbank. Our algorithm averaged a 97.96%
conversion rate across the PATB parts. Moreover, the resulting lexicon of
CCG-tags was four times larger than the PATB lexicon.
Keywords: Combinatory Categorial Grammar, Machine Translation,
Arabic CCGbank, Penn Arabic Treebank.