Abstract

Anaphora Resolution (AR) is the process of determining the antecedent of a given anaphor. It is an understudied problem in Arabic Natural Language Processing (ANLP), and some current Machine Translation (MT) systems still handle it poorly. AR is difficult because it requires several kinds of knowledge and resources (syntactic, lexical and morphological), and these are not available for a language like Arabic, whose Natural Language Processing (NLP) resources and tools are scarce. The proposed algorithm therefore follows a statistical, corpus-based approach. It uses the Web as a corpus to overcome data sparseness and to provide resources needed for Arabic AR, such as semantic features, collocational associations and non-pleonastic pronouns. Evaluated against a gold-standard set of manually annotated pronouns, the algorithm achieves an F-measure of 87.6%.
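The abstract does not say how collocational associations are scored or how candidates are ranked. The sketch below is therefore a minimal illustration under stated assumptions, not the author's method. It assumes that pointwise mutual information (PMI) over web hit counts measures how strongly each candidate antecedent collocates with the verb governing the pronoun, and that the highest-scoring candidate is chosen. It also shows how an F-measure combines precision and recall. The names `Counts`, `pmi`, `resolve` and `f_measure`, the English glosses and all hit counts are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class Counts:
    """Hypothetical web hit counts (assumption: obtained from a search engine)."""
    pair: int  # hits where the candidate noun co-occurs with the governing verb
    noun: int  # hits for the candidate noun alone
    verb: int  # hits for the governing verb alone


def pmi(c: Counts, total: int) -> float:
    """Pointwise mutual information: log2(P(noun, verb) / (P(noun) * P(verb)))."""
    if c.pair == 0 or c.noun == 0 or c.verb == 0:
        return float("-inf")  # no evidence of association
    return math.log2((c.pair * total) / (c.noun * c.verb))


def resolve(candidates: Dict[str, Counts], total: int) -> str:
    """Return the candidate antecedent most strongly associated with the verb."""
    return max(candidates, key=lambda noun: pmi(candidates[noun], total))


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall (F1)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical example: the pronoun is the object of "ate";
    # the candidate antecedents are "apple" and "table".
    total_pages = 10**9  # hypothetical size of the indexed Web
    candidates = {
        "apple": Counts(pair=900, noun=10_000, verb=50_000),
        "table": Counts(pair=20, noun=20_000, verb=50_000),
    }
    print(resolve(candidates, total_pages))  # prints "apple"
    print(f_measure(tp=8, fp=2, fn=2))
```

With these invented counts, "apple" scores about log2(1800) and "table" about log2(20), so "apple" is selected. PMI is only one possible association measure. Others, such as raw co-occurrence frequency or log-likelihood, would fit the same ranking loop.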