<<

PoS, and Dependencies Annotation Guidelines for Arabic

Mohammed Attia, Tolga Kayadelen, Ryan Mcdonald, Slav Petrov Google Inc. May, 2017

Table of Contents 1. Introduction...... 2 2. Tokenization...... 3 Arabic Clitic Table...... 4 Special Cases...... 4 3. POS Tagging...... 8 POS Quick Table...... 8 POS Tags...... 13 JJ: ...... 13 JJR: Elative Adjective...... 14 DT: The Arabic System...... 14 PDT: Predeterminers...... 15 RB: ...... 15 ADP/IN: Adpositions...... 16 PRP: Personal ...... 17 WP: interrogative/adjectival pronouns...... 19 VBN: active and passive ...... 19 VBG: masdar...... 20 RP: Particle...... 20 UH: or hesitation...... 21 SYM: Symbol...... 21 Specific Cases for POS...... 22 4. Morphological feature tagging...... 34 Guiding Principle...... 35 Intent vs Production...... 35 Proper...... 36 Specific Cases For Morphology...... 41 Plurality and Numerals...... 41 Pluralia Tantum...... 41 Ambiguity...... 42 Gender Representation...... 42 ...... 44 Personal Names...... 45 Idafa vs Apposition...... 45 Tagging Foreign Words...... 46 Tagging Dialectical Words...... 46 The Unspecified Tag...... 48

1 5. Dependencies...... 49 5.1 Dependency Quick Table...... 49 5.2 Dependency Labels...... 62 5.2.1 Root...... 62 5.2.2 Auxiliary...... 63 5.2.3 Arguments...... 63 5.3 Specific Issues with Dependency...... 87 MWE List...... 87 xcomp...... 89 Prep / Mark...... 90 Dates and Time...... 90 Light constructions...... 92 Quantifiers: predet vs. head...... 92 Interrogative pronouns...... 92 Multi-token subordinating conjunctions...... 94 Range expressions...... 94 Locutions: mwe...... 94 Relative pronouns...... 95 with omitted relative pronouns...... 96 Headless relative clauses...... 96 Parataxis vs. appos...... 97 Adjuncts: choice of the head...... 97 97...... لن ولكي Phrases Symbols in Dependency...... 97 98...... يمكن، يعجب، يكفي : with csubj 98...... المر الذي Subordinate sentences starting with Definition of prepositional argument (CLR)...... 99 Irregular Adjective Sequence...... 100 100...... ليس Other functions of Case for Nouns Modified by Numbers...... 100 Case for Words of non-Arabic Origin...... 100 Restrictive vs Non-Restrictive Relative/Qualifying Clauses...... 101 with ...... 101 فوق، بدل، تحت Modifiers...... 102 102...... (المتعدي لمفعولين) and ditransitives ,(تمييز) Tamyeez ,(حال) Haal

1. Introduction

The aim of this document is to provide a list of dependency tags that are to be used for the Arabic dependency annotation task, with examples provided for each tag. The dependency representation is a simple description of the grammatical relationships in a sentence. It represents all sentence relations uniformly typed as dependency relations. The dependencies are all binary relations between a governor

2 (also known the head) and a dependant (any of or modifier to the head).

In the following sections, the dependency relations are both given in relational format and in graph format, to foster a better understanding. In the relational format, the head of the dependency relation is given as the first argument and the dependant as the second argument of the relation. We represent these relations as follows:

relation(head, dependent)

This representation is a triple which shows a relation between a pair of words. For example, he slept can be represented as nsubj(slept, he) which means “the subject of slept is he.” In other words, the dependencies are all binary relations: a grammatical relation holds between a governor (or head) and a .المعمول and العامل dependent or between

Similarly, in the graph representation, the dependency arcs emanate from the head category towards the dependant category, that is; from the heads towards the modifiers/complements. In dependency structures two elements must be explicitly represented: 1. head-dependent relations (directed arcs) 2. functional categories (arc labels)

The grammatical relations are defined in Section 5, in alphabetical order according to the dependency’s abbreviated name.

2. Tokenization

The purpose of tokenization is to identify token boundaries. In Arabic, like in many other languages, tokenization is performed automatically via relying on limited set of token delimiters: space and punctuation symbols. In addition the AMP (Arabic morphological processor) also detects common clitics that are attached to the free morpheme e.g. single letter prepositions and object personal pronouns. However, sometimes tools fail to detect and tokenize every clitic due to homography, typos etc. This section provides guidance when tokenization errors are encountered.

3 Arabic Clitic Table The following table shows Arabic clitics and the course POS that they occur with.

# Description Verbs Nouns Adjective Adverbs Prons Particles Prep Conjs Question particle √ √ 1 √ √ √ √ √ √ أ √ √ و Conjunctions √ √ √ √ √ ف and” and“ 2 “then” √ Prepositions ب “ √ √ with” ك “as” ل “ 3 ”to √ ل la “then” ل 4 li “to” and س sa ”“will The definite √ 5 √ ال “Al” 6 Clitic pronouns √ √

Special Cases

Fossilization: Some words are originally two tokens. Yet, the frequency and regularity of them attached together make them annotated as one doc. However, these are considered as fossilized and should remain as one token: كذلك، لذلك، هكذا، كذا، آنذاك، حينئذ، طالما، قلما، عندما، حالما، كلما، إنما، لمما، لقد، كأن

Despite their high frequency, the following words should be tokenized: الن، اليوم، كما، بدون، بل، لكشك، أمل، لبد، ليسيما، بما

ما Issue with represents a homograph of a widely used POS. The space between it and the following ما The syllable word is often omitted. In the cases below, it should be tokenized:

4 :أخوات كان Verbal: generally مادام، مابرح ،(لزال as well) مازال

الذي Relative : when it means ما + Mostly prepositions مما، عما، للما1، مثلما

Tricky issues

بما ● as, ما and the ب is made of the preposition بما Attention should be paid that the :بما أ نن opposed to the mwe+mark construction رحب بما جاء (ما x,ب)pobj

بما أن الفوز تحقق تأهل الفريق للنهائيات (بما x,أن)mwe

:حيث or باعتبار The latter can be replaced with حيث أنه تحقق الفوز تأهل الفريق للنهائيات

كما ●

:is widely used in Arabic. The following table explains its uses and segmentation كما The word/phrase

كما

Function Description Example POS Tag Number of tokens

PRT - RP one كما يختص الوزراء بالنظر Resumptive/i Starting a sentence في المشاكل اليومية nitial faa

ADP- IN one ارتفعت اليسعار كما زاد Linking sub- Linking a clause to a المطروح في اليسواق conj .preceding sentence

ADP - IN + :Two إفعل كما تريد Prep+relativ Can be split into two ك / prep + ما / PRON - WP pobj يتقبلك كما أنت e pronoun tokens كما تحب which means when , لل مما Not to be confused with 1

5 :فيما ● can be either a temporal expression meaning "while" or tokenized into a prep+relative pronoun

فيما

Function Description Example POS Tag Number of tokens

ADP- IN one ارتفعت اليسعار فيما زاد Linking sub- Linking a clause to a المطروح في اليسواق ,conj preceding sentence providing temporal meaning

ADP - IN + :Two تناول التقرير جوانب عديدة Prep+relativ Can be split into two في / prep + ما / PRON - WP فيما يتعلق بالقتصاد e pronoun tokens, meaning in+what/which pobj

بما

Function Description Example POS Tag Number of tokens

PRT- RP one بما انك طيب, يسيحبك Linking sub- Linking a clause to a الناس ,conj preceding sentence providing a causative meaning

ADP - IN + :Two حدثني بما يسمع Prep+relativ Can be split into two في / prep + ما / e pronoun tokens, meaning PRON - WP in+what/which pobj

Fossilized:

6 and these should be ماAs shown in the Fossilization section above, many function words end with 2 annotated as single tokens: 3طالما، قلما، عندما، حالما، كلما، إنما، لمما، فيما

Prep + The Word of God The Arabic word of God has an exceptional spelling. Unlike other words that have AL as a main part, the word of God loses the Alif and have its first laam as a prep ل + ال = ل Therefore the segmentation should be as the following: NNP له + IN ل

Typos Misspelling and typos frequently cause error in automatic segmentation. The context clarifies the intended word. This largely happens when a final taa’ marbouta is written without dots which results in ”الفرق بين البطارية الجافه والسائله“ .mistaken it as a pronoun. E.g

It should be one token, JJ, but the system mistook it with VBN+PRP due to the lack of dots on the final taa’

Abbreviations and Acronyms Latin script abbreviations are usually written as one token. Their Arabic equivalent, however, is often written with spaces between the letter transliterations. In this case, the Arabic text should remain tokenized while the Latin one should not be over tokenized. In dependency, if the Latin was the appos, it should be attached to the rightmost Arabic token.

CNN: one token three tokens :يسي أن أن

Ellipsis Note that in many docs in Arabic ellipsis can be realized as two dots only instead of three. In tokenization consider as one token. .. يستظل باقية

ل Words starting with provides the meaning of negation, sometimes it is a part of a word and should not be ل While this

where a masdar can replace it and its following verb ما المصدرية Usually 2

3 Only as a temporal expression.

7 segmented from it. Below are some examples: wireless ليسلكي subconscious لوعي Invertebrates لفقاريات indifference لمبالة nonvascular لوعائي

To test whether these words should be segmented or not, precede them with the definite article. If the :should not be tokenized ل text remains valid and the POS of the word does not change, then the قرأت كتاب عن لفقاريات تعيش في الماء قرأت كتاب عن اللفقاريات التي تعيش في الماء

became definite. The two texts ل The structure here did not change, except that the word starting with The first one is a sentence while the second one, even if it is .ال below, however, differ with adding the correct, it changed to an NP:

لكشك انهم هناك *اللكشك انهم هناك

زال، دام are frequently used with some verbs, such as ل and ما As mentioned above, negative particles without a space in between. In these cases they should be retokenized, e.g. مازال -> [ما][زال] ● مادام -> [ما][دام] ● ● The same rule above applies to all tokens where a space is not provided يارب -> [يا][رب] ● عبدال -> [عبد][ال] ● هذاالنظام -> [هذا][ال][نظام] ●

3. POS Tagging

POS Quick Table

Coarse Fine Morph Morphological Description Example Tag Tag features values NOUN كتاب، كرايسة ,NN Common noun Gender masc, fem unsp_g

8 Number sing, dual, plur, كتاب، كتابان، كتب unsp_n

Animacy ratl, irrat, كتاب، كاتب unsp_r

ا ,Case nom, acc, gen كتا بب، كتابا، كتا بب unps_c

Definitene definite, indefinite كتاب، الكتاب ss

Proper true, false See section on Proper below بشار، يسلمى Gender masc, fem, unsp_g إبراهيم، مصر Number sing, dual, plur

NNP Proper noun Case nom, acc, gen

Animacy ratl, irrat, unsp_r

Proper true, false Electronic Proper true, false ADD address (email or URL) ADJ Gender masc, fem, مجتهد، مجتهدة unsp_g Number sing, dual, plur, مجتهد، مجتهدان، مجتهدون unsp_n Adjective JJ (including Case nom, acc, gen, ordinal numbers) unps_c Definitene def, indef مجتهد، المجتهد، الول الثاني، العشرون ss

Proper true, false الفضل، الفضلى JJR Comparative Gender masc, fem, unsp_g adjective الفضل، الفضلن، الفضلون ,Number sing, dual, plur unsp_n This is in the case of post- adjectives, prenominal adjectives are unsp for number and gender.

Case nom, acc, gen

أفضل، الفضل Definitene def, indef ss

9 Proper true, false DET ال DT Determiner Proper true, false أيسماء التبعيض: كل، نصف, بعض، Case nom, acc, gen جميع, أغلب, أكثر (when followed by a noun)، إلخ PDT quantifiers Proper true, false أي، أية WDT Wh-Determiner Proper true, false VERB لكلت لب، ككلت لب Voice pass, act, unsp لكلت لب، يكتب Aspect imperf, perf, unsp

Mood ind, sub, jus, imp, يكتب، أن يكتب، لم يكتب، اكتب unsp

يكتب، كتب - لم يكتب، يسيكتب (يسوف ,Tense pres, past, fut يكتب) - لن يكتب unsp VBC Verb conjugated أكتب، تكتب، يكتب Person 1,2,3

Number sing, dual, plur, كتب، كتبا، كتبوا unsp_n

كتب، كتبت Gender masc, fem, unsp_g

Proper true, false Number sing, dual, plur, ايسم الفاعل وايسم المفعول العامل unsp_n معربا، معرلبة Gender masc, fem, unsp_g

Case nom, acc, gen verb VBN form Voice pass, act, unsp

Definitene def, indef ss

Proper true, false Proper true, false verb number sing, dual, plur, المصدر العامل VBG form unsp_n case nom, acc, gen ADV أيضا، .RB Proper true, false This includes fixed (e.g

10 أبدا، .and open adverbs (e.g (فقط .(خاصة Question and Proper true, false كيف، متى، أين، لماذا، كم، حيث WRB relative adverbs ADP Proper true, false prepositions من، إلى، عن، على، إلخ Preposition or prepositionals IN Subordinating فوق، تحت، أمام، خلف، إلخ Subord_conj أن، عندما، وقتما، إلخ PRON أنا، أنت، هو، نـي، ـك، ـه Person 1,2,3 هو، هما، هم Number sing, dual, plur

Personal case nom, acc, gen PRP pronouns Gender masc, fem, هو، هي، هما unsp_g

proper true, false Relative and Proper true, false ما، ماذا، من WP interrogative pronouns non-referential Proper true, false (expletive) ضمير الشأن:الهاء في أنه EX pronoun ضمير الشأن Number sing, dual, plur, الذي، التي، إلخ unsp_n Relative REL الذي، التي، إلخ pronouns Gender masc, fem, unsp_g proper true, false هذا، هذه، هذان، هاتان، هؤلء Gender masc, fem, unsp_g هذا، هذه، هذان، هاتان، هؤلء ) Number sing, dual, plur PDEM (pronouns Case nom, acc, gen Proper true, false CONJ Coordinating Proper true, false و، ف، ثم، أو، أم، بل، حتى، لك نن، ل CC conjunction NUM

11 واحد وعشرون، إحدى وعشرون ,Gender masc, fem unsp_g Note digits (0-9*) are not CD Cardinal number assigned number and gender number sing, dual, plur proper true, false PRT هل، أ اليستفهامية، ل، لم، ما، لن، Proper true, false النافية، يسوف، س، إذا الفجائية، ما RP Particle المصدرية، الواو الزائدة، لم المر ، فاء الربط، أما، إل، إنما، ما التعجبية PUNCT Terminal Proper true, false . punctuation such ? ! . as Comma and Proper true, false , comma-like punctuation Colon and Proper true, false : semicolon Closing bracket Proper true, false ) punctuation Opening bracket Proper true, false ( punctuation Open quotation Proper true, false marks and `` similar punctuations Close quotation Proper true, false mark and other '' similar punctuation Hyphen, dashes, Proper true, false - and similar punctuation Proper true, false Note that in many docs in Arabic ellipsis can be realized as two dots only. In ... Ellipsis tokenization consider as one token. E.g. يستظل باقية .. X

12 Includes Proper true, false currency ($, €) SYM and percentage symbols (%). LS List symbols Proper true, false Proper true, false Affixes that are This tag will be used for affixes like ' when detached from ”يريد ون“ in 'ون AFX separated due to .conjunction, etc the word. FW Foreign words Proper true, false whose meaning is not known and cannot be inferred Goes With. Word Proper true, false parts separated GW due to bad تل ميذ .tokenization. e.g Interjection or Proper true, false (بلى، أجل، آه، كل، نعم، ياه) UH hesitation Non-final Proper true, false punctuation, including NFP emoticons and multi-symbol tokens XX Total garbage Proper true, false Reference for naming conventions: http://universaldependencies.github.io/docs/u/feat/all.html

POS Tags JJ: Adjective ● Adjectives in Arabic follow the modified noun and agree with it in number, gender and definiteness. and agree with the head noun in خبر Adjectives can also come in the predicate position ● .الرجل كريم .number and gender, e.g are annotated as الوزير السوداني .e.g ,(نسبة) Adjectives derived from proper nouns ● JJ/proper=false. Generally speaking .الغنياء يحسنون إلى الفقراء .Note that nominalized adjectives are NN, e.g ● any JJ (with the exception of elatives and ordinals) that is not modifying or predicating a noun is a (lexicalized) noun. من المقرر أن، من المهم أن، ) Nominalized adjectives are also found in constructions such as ● is من is NN/pobj, the prepositions كشائع Here .من الشائع أن يعاني المريض من مشاكل .E.g .(من الضروري أن .'is 'csubj (يعاني) heads (ROOT) and the heads of the following clauses

13 ● Ordinal numbers are JJ, e.g. الول، الثاني، العشرون ● يعد البراهمي ثاني يسيايسي يتعرض للغتيال ● يوم الخامس والعشرون من فبراير ●

JJR: Elative Adjective template and are derived from أفعل Elative adjectives (JJR) are adjective that come in the ● ordinary adjectives. (من عظيم) JJRأعظم ،(من فاضل) JJRأفضل ،(من ماهر) JJRأمهر ،(من ذكي) JJRأذكى ○ but they are not derived from another أفعل Note that some adjectives have the pattern ● adjective and they are JJ NOT JJR. They include personal traits and colors. The test is that with .e.g ,ألنفلعلة or لفنعلء this type of adjectives the feminine is formed to the pattern JJأيسود ،JJأصفر ،JJأبيض ،JJأكشقر ،JJأجوف ،JJأرمل ،JJأحمق ○ ● Elative adjectives (JJR) can come post-nominal or prenominal. When they come post- nominal (or as a predicate), agreement in definiteness is obligatory and agreement in number and gender becomes optional. الرجل الفضل، الرجلن الفضلن/الفضل، الرجال الفضلون/الفضل ○ .form أفعل and have ال When JJRs come prenominal, they are always without ● أفضل رجل، أفضل رجلن، أفضل الرجال ○ ● JJR are not nominalized, even when they come in nominal positions, e.g. JJRمما يريد، يعطف على الفقر JJR أفضل ،JJR هدف أو أكثر ●

DT: The Arabic Determiner System In Arabic the determiner system includes three classes

بعض هؤلء الرجال المخلصين .e.g some those the men the faithful ‘some of those faithful men’ some بعض .Quantifiers, e.g .1 ○ Morphology: This class does not inflect for number or gender ○ POS: PDT ○ Dependency: predet those هؤلء .Demonstrative Pronouns, e.g .2 ○ Morphology: this class inflects for number and gender ○ POS: PDEM ○ Dependency: predet the ال :Definite Article .3 ○ Morphology: does not inflect for number or gender ○ POS: DT ○ Dependency: det

should be tokenized separately from the following noun, even if the following ال The definite article .الفيس بوك or adjoined to a foreign name ,السي أي إيه an acronym ,البرادعي noun is a proper name

14 PDT: Predeterminers or the quantifiers. These are words that describe the quantity, amount or approximation of أيسماء التبعيض the nouns they precede. Generally speaking, quantifiers are known by the fact that they do not determine the number and gender of the whole NP, but gender and number is determined by the noun .(بعض الولد، بعض البنات) that follows the quantifier

List of quantifiers:

كشطر أضعاف ضعف أغلب أكثر بضع كافة جميع كل غالب آخر معظم غالبية بعض مختلف كشتى كلتا كل إحدى أحد خمس ثلث ربع يسائر عدة أكمل كامل جل

.بأكمله is usually found in constructions such as أكمل Note that

.كشبه منعدم .is also considered as PDT when modifying adjectives, e.g كشبه Note that

are morphologically specified for number and gender (unlike the rest of the كل، كلتا أحد أحدى Note that quantifiers). Nonetheless, as they are tagged as PDT, no gender or number is available/assigned to is one of the quantifiers that can function as a noun when it is not in idafa construction أحد ,them. Also no one at home ل أحد في البيت .e.g

WDT. This list contains only two instances: أية أي RB: Adverbs Fixed adverbs. This is the list of fixed (frozen) adverbs: آنذاك، إذاا، إذن، أيضا، عندئذ، هكذا، حينئذ، حينذاك، بعدئذ، هنا، هناك، هنالك، ربما، ثمة، لثنم، وقتذاك، وقتئذ، يومئذ، قط، فقط، فحسب، ههنا، يسيما، يساعتئذ، آنذاك، كذلك، لذلك، بعكد، قبكل، لذا

(RB (dep/tmodقبكل mwe من :is tagged like this من قبكل Note. The expression

Less frequent adverbs: إمذاك، عندذاك، آنئذ، قبلئذ، عامئذ، يسنتذاك، يومذاك ، عمئذ، يساعتئذ، لحظتئذ، ليلئذ

Open adverbs (adverbials). Unlike adverbs, the words in this category can also function as nouns or below, for instance, is the same as the English adverb حقا adjectives based on their usage. The word which means right and the indefinite حق I really saw him. It consist of the noun/رأيته حقا really as in nunation). Thus, the exact same word can be seen as an indefinite accusative) ا accusative ending of That was their right. RB is also used for adverbials /كان ذلك حقا لهم noun as in

1. Adverbial nouns (noun + accusative nunation): أبدا – جدا– جميعا – البتة– خاصة – فعل – صدفة – أصل – أيسايسا – حقا – فجأة مباكشرة – مثل – عبثا – مجانا – حتما – تقريبا – جملة – كافة – خصوصا – تباعا – عموما – تماما– جميعا– مستقبل – 2. Adverbial Adjectives (adjective + accusative nunation):

15 غالبا – دائما – أخيرا – طويل – قديما – حديثا – داخل – خارجا مطلقا – دائما– جيدا - مؤخرا – مقدما –باطل – محضا – يسريعا – قليل – ,and will not show accusative tanween ممنوع من الصرف Note that elative adjectives are diptote ينامون أفضل من ذي قبل and يسار أيسر لع من أخيه .e.g :(accusative nunation + (ي+Adverbial participles (relative adjective (noun .3 ثقافيا – صحيا – اجتماعيا – رياضيا – اقتصاديا – لغويا – عراقيا – كشخصيا – عشوائيا – كشفويا – يسيايسيا – مركزيا – محليا – عالميا حاليا – يسنويا – يوميا – كشهريا – أيسبوعيا 4. Adverbials of time (based on nouns that describe time): دوما – فجرا – ليل – الليلة – يوما – نهارا – صباحا – مساءا – ليل ونهارا – ليل نهار – غدا – حينا – أحيانا – أبدا – مرة4 – مرارا - أمس Sometimes they can be modified by adjective .ال Temporal accusative words with .5 RBيوم DTاليوم ال ،RBآن DTالن ال ○ الشهر المقبل، العام الفائت ○ when explicitly temporal and in idafa to a following noun. The المفعول فيه The case with .6 .is RB and the following noun will be in genitive idafa relation مفعول فيه (NNيوم TDال/RBمساء) مساء اليوم ○ صباح الغد ○ فجر الحد ○ وقت الظهيرة ○ نحو، حوالي، زهاء، قرابة Words meaning about .7 حضر حوالي خمسون طالبا ○ عاش زهاء يسبعين يسنة ○ 8. Elative adjectives when used as adverbs of degrees are also adverbials, RB. يحبه أكثلر من إخوته ○ يسافر أقنل من زملئه ○ كثيرا ما when not functioning as a subordinating conjunction, but used in the sense of طالما .9 قلي ال ما when it means قلما is also RB. The same thing is applicable on السلع الغذائية التي طالما مثلت مشكلة للمواطن البسيط ○

.is VBG المفعول لجله Notice that ADP/IN: Adpositions ● Prepositions: This is a closed list of words that only function as prepositions: ،من، إلى، عن، على، في، الباء، الكاف، اللم، واو القسم، حتى، منذ، مذ، التاء but RP, and the following عدا، حاكشا، خل، إل In our framework exceptive particles are not prepositions noun is either in the accusative or appositive. ● Open Prepositionals (quasi-prepositions): The words below usually act similarly to prepositions but can also be preceded by other prepositions or function as adverbials. They differ from adverbials since they precede nouns: مع، أمام، إثر، إزاء، بعد، بين، تجاه، تحت، تلو، حذو، حول، حين، خلف، ضمن، عقب، عبر، عند، فوق، فور، قبل، قبيل، قبالة، قرب، مع، أثناء، طوال، عوض، حسب، وفق، أمثال، ضد، مثل، كشبه، نحو، دون، لدى، خلل، وراء، حيال، جراء، ويسط، رغم، داخل، خارج، رهن، كبلعنيد، كنصب، قيد، طيلة، بيد، مقابل، نظير، كشمال، كشرق، جنوب، غرب، نتيجة ● Complex prepositions: If two prepositions follow each other, each of them should be

(will be NN (npadvmod in dependency ثلث مرات and مرتين is an RB (advmod in dependency) while while مرة Note that 4

16 -Note that the quasi . من على، من أمام، من خلل، بدون، بداخل، من فوق .marked with ‘IN’, e.g .من المام .then it an NN, e.g ال If it comes with .ال preposition in this case must come without ● Subordinating Conjunctions: The following words are subordinate conjunctions that link subordinate clauses to the main sentences. Subordinate clauses express condition, reason, time, location or opposition. They are dependent clauses as they cannot stand alone.

إن الشرطية، أن المصدرية، أ من (قال أ من أو قال إ من)، إذ، إذا، بينما، طالما، عندما، وقتما، حالما، فيما (فيما كان أخي نائما خرجت من المنزل)، لما (لما هزه وجده ميتا) ريثما، كما، كيما، بعدما، أنما، كي، لو، لول، حتى، ما الشرطية (لن تنجح ما لم تذاكر)، واو الحال (توفوا غرقا وهم يحاولون عبور الحدود)، فاء السببية (ل أيستطيع رؤيتك فالظلم دامس)، لم التعليل (السببية) (عاد ليقاوم الحتلل) 5حيث الجوازم التي تجزم فعلين وهي: إ نن ، إذما ، مهما ، متى ، أيان ، أين ، أمنى ، حيثما ، كيفما ، أي also أخوات إن: أن، ليت، لعل، عل، كأن، لكن وعسى

:is subordinating conjunction also in all the following examples أن أكشار إلى أن أعلن أن أخبرني بأن بما أنه اتفقوا أن جدير بالذكر أن

PRP: Personal Pronouns ● Personal Pronouns: الضمائر المنفصلة: أنا، نحن، أنت، أنلت، أنتما، أنتم ، أنتن، هو، هي، هما، هم، هن الضمائر المتصلة: -ني، -ي، -نا، -ك، -لك، -كما، -كم، -ك نن، -كه، -ها، -هما، -هم، -ه نن ضمائر النصب المنفصلة: هي: إياي وإيانا وإيا لك وإياكما وإياكم وإيالك وإياكما وإياك نن وإياه وإياهما وإياهم وإياها وإياه نن are not considered as pronouns here, but NN+PRP نفسه ونفسها، إلخ Note that ● Pronouns: ي، -نا، - لك، -لك، -كما، -كم، -ك نن، -كه، -ها، -هما، -هم، -ه نن-

● Interrogative Pronouns: ما، ماذا، لمن ● Non-Referential (expletive) Pronoun: "ضمير الشأن: الهاء في "أنه ● Relative Pronouns: الذي ، التي ، اللذان ، اللتان ، اللذين ، اللتين ، الذين ، اللى ، اللتي ، اللواتي، اللئي ● Demonstrative Pronouns: هذا ، هذه ، هذان ، هاتان ، هؤلء ، ذلك ، ذاك ، تلك ، أولئك Less frequent demonstrative pronouns: ذا، ذانك ، تانك ، ذلكم، ذلكما، ذلكن، تاك، تيك، تلكم، تلكما، تينك، ذينك، أولئكم

in the Similar Words with Different Functions حيث means where, it should be tagged as WRB. See the table of حيث if 5 section

17 ما Words ending with :in their structure, for instance ما Some words in Arabic include of

انما, عندما, حيثما, كيفما, كلما, فيما, بينما, حينما, بعدما, لما, كما, حالما, طالما, اينما, اذما, قلما, كيما, مهما, مادام All of the above words are subordinating conjunctions ADP/IN

With other words it is not clear, for example: ,مما, عما, بما is a relative pronoun. Therefore, it should be splitted from the attached morphemes ما Here, sometimes and each part is annotated separately.

If the sentence still الذي is a relative pronoun, we can replace it with ما In order to recognize whether the :would be a relative pronoun (WP). For example ما makes sense, the

هذا ما أكد عليه هذا الذي أكد عليه

حدثني عما يسمع حدثني عن الذي يسمع is not a relative pronoun since it can not be replaced with ما However, in the following sentences, the الذي قلما ينجح المتشائم قل الذي ينجح المتشائم*

is a relative pronoun, it will be possible to refer back to it with a pronoun, as shown in the first ما When example above. The second example can also be: حدثني عما يسمعه

was replaced with an English relative ما Moreover, when the sentence is translated to English, if the pronoun (e.g. that, which, what), it is most likely a relative pronoun. The first two examples above can be translated as:

That was what he affirmed. he told me about what he had heard. here is also a WP ما etc. The كشخص ما, كتاب ما or كشيء ما One of the common phrases in Arabic is

should also be separated and annotated as ما This .مازال , مادام like ما verbs occur with أخوات كان Some of an RP: في البيت VBC زال RPما

مما The case with which can be a preposition+relative pronoun or a single token ,مما A confusing case here is and (المر الذي) subordinating conjunction. It is considered subordinating conjunction if it means introduces a subordinate sentence

18 بلغ عدد المسجلين 2.7 مليون مشترك مما يشير إلى أن - equivalent to بلغ عدد المسجلين 2.7 مليون مشترك المر الذي يشير إلى أن -

(من الذي) And it is preposition+relative if it means يسئمت مما حدث ينبغي أن تتحقق مما تقرأ

WP: interrogative/adjectival pronouns ما، ماذا، من :This includes relative and interrogative pronouns ● كسر النافذة WPهو من ○ كسر النافذة WPمن ○ which comes after indefinite ما Note that this also includes adjectival/specificational ● nouns WPكشيء ما ○ WPكشخص ما ○ WPمكان ما ○

VBN: active and passive participles These are active and passive participles that follows one of the following patterns (fAEil, mafoEuwl, mufaE~il, MufaE~al, musotafoEil, mustafoEal, etc.) when they are followed by at least one argument. .attached) or indefinite ال Note that VBN can be be definite (with the definite article ايسم الفاعل وايسم المفعول (على وزن فاعل، مفعول، مفععل، مفنعل، مستفلعل، مستفلعل، متفاعل، منفعل، مفتعل، إلخ) إذا كان عامل (إذا كان متبوعا بمعمول أو أكثر: مفعول به أو جار ومجرور متعلق أو أن) VBN are adjectival and verbal, adjectival because they agree with the head noun in number and gender, and verbal because they govern an argument or modified by an adverb.

.حال There are two instances of VBN: 1) in direct adjectival/predicational position, 2) as

1). In direct adjectival/predicational position. VBN can modify or predicate a head noun and agrees with it in number, gender and definiteness (just like an ordinary adjective), and it governs an argument .الصادرة أمس .or is itself modified by an adverb, e.g التابعة للقوات .usually a closely related PP), e.g) السلطة المصادرة للحريات .1 الطائرة التابعة للقوات الجوية كانت في مهمة تدريب .2 في الصحف الصادرة أمس .3 يسكان التيبت المنفيين في الهند .4 الدليل الواضح كوضوح الشمس .5 الطالب الناجح دوما .6 ,the verb it was derived from + التي/الذي can be replaced with ال Notice that each VBN starting with the the VBN can be replaced with ,ال which emphasizes their verbal readings. Even in examples without verbs.

is also VBN. Notice that adverbials حال Circumstantial accusative .حال circumstantial accusative .2 agrees with the head noun in number and حال are both accusative, but the difference is that حال and

19 gender. Some examples: مؤكدا في الوقت نفسه أنها ليست عملية يسرية .7 ... وأضاف قائل: ل يمكن .8 آملين في التوصل إلى اتفاق .9 رفض اقتراحهم معتبرا أنه يتصل بمسائل لم يتفق عليها .10 :وأضاف مبتسما .11

حاصل، مسئولين، مجني the words بالحاصل على الجائزة، إلى المسئولين عن الصحيفة، بالمجني عليهم Note the examples don't fulfill any of the two conditions for VBN (they are neither in the adjectival/predicational position .and they should be NN, as they are considered as nominalized adjectives (حال or

which typically الصفات المشبهة ) Another exception is when the participles are in false idafa construction :These are JJ, such as .(الضافة اللفظية occur in الدخل JJ الفئات المحدودة Low(“limited”JJ)-income groups السبب JJ كانت تعاني من مرض مجهول She was suffering from an idiopathic (“unknown”JJ) disease

فرحان، عشان، كريم، قريب، حزين،مريض، كشجاع، أعور، ,adjective like الصفات المشبهة Also included in the list of .أعرج

VBG: masdar

المفعول لجله .1 In order to consider the masdar as VBG, it should be followed by two arguments. The first argument could be semantically the subject or object, and the second argument could be the object or a closely is VBG المفعول لجله related PP. Also notice that إزالته أثار الماضي، انخراطه في العمل السايسي، كونهم على حق ذهب طلبا للعلم

.المبتدأ والخبر takes two arguments كان the verb ,كوننا على درجة أخرى ، كونه يسفيرا Note that in the examples can be a noun, adjective, PP or adverb. In the cases above, both examples are masdar followed خبر The .خبر is also a يسفيرا and خبر is a على درجة .by two arguments and both will be VBG

المفعول المطلق العامل .2 المفعول المطلق العامل Cognate accusative heading an argument من المتوقع صعو د المؤكشر بدءا من أول الشهر ○ تضاعف مستخدمو النترنت وفقا للتقارير الريسمية ○ يربط كشرق المدينة بغربها مرورابويسطها ○

RP: Particle :Here is the list of particles in Arabic .(حروف) Particles in Arabic are non-derived fixed forms (هل، أ) إ نن التوكيدية

20 ما الزائدة: دائما ما يعود متأخرا ,الواو الزائدة، يسبق ورأيت ذلك من قبل, الواو اليسئنافية ل (ل ينمو، ل تسرف، ل أحد في البيت)، لم، لن (يسوف، س) إذا الفجائية، مثال: فإذا بالمتفرجين ينهضون لم المر في مثل: لنذهب (قد، لقد) فاء الربط، مثال، أما السلطة فليست مسالمة أما أل، إنما ل النافية للجنس إما

and the following noun is either in عدا، حاكشا، خل، إل، غير، يسوى Exceptive particles and nouns are also RP .(غير ويسوى the accusative or appositive (or genitive with غير are exceptive nouns and the noun following them are in the genitive. We treat غير و يسوى Note that .ما جاء غيكر محمد، ما رأيت غيلر محمد، ما مررت بغيلر محمد receives the case غير as an RP even if ويسوى .غير مستقر .is also RP when it precedes an adjective to convey negative meaning, e.g غير The word in which case it will be(ل غير) is always RP and in dependency unless it occurs in the expression غير So or (غير كونه) a noun (غير صالح) labeled as advmod6. It takes the neg label whether preceding an adjective .(غيره) pronoun كان غير صالح لليستخدام غير (غير x,صالح)neg لم تكملف أكثر من 115 دولرا فقط ل غير غير (غير x,تكلف)advmod (ل x,غير)neg

قال إ نن/علمت أ نن الشمس :when they serve as complementizers for verbs أ نن and إ نن The exception here is .In this case they are IN .مشرقة ما التعجبية أ ني، كأنما، كر نب، حتى، من الزائدة، الباء الزائدة، فاء الربط، فاء الجزاء، لم التوكيد Vocal Particles: أحرف النداء (يا، أيها، أيتها، أيا، أ، أي) UH: Interjection or hesitation نعم، ل، بلى، أجل، كل، كترى،آمين، ألو، آه، لول، أوكي، ويحك، أف يسبحان، يسرعان، بئس، هيا، حذار، آمين، هيهات SYM: Symbol SYM should be used for mathematical, scientific and technical symbols or expressions that aren't words or digits of language. It should not be used for any and all technical expressions. For instance,

when they occur as independent phrases, usually at the ل أكثر and ل أق مل The same is applicable on similar expressions like 6 end of the sentences.

21 the names of chemicals, units of measurements (including abbreviations thereof) and the like should be tagged as nouns. In short, SYM is for non-alphanumeric characters which are not also punctuation marks. Examples of symbols are @, #, $, &, %, ↔, =, /, etc. List symbols (LS) include bullet points (•, ◦), section signs (§), pilcrows (¶) etc. Non-final punctuation include emoticons like � , � , � etc.

Specific Cases for POS

Numbers: CD Numbers are either cardinal or ordinal. The POS tags are (NUM/CD) and (ADJ/JJ) respectively. Sometimes the numbers appear as digits. The POS is CD whether in time (e.g. 5:00), dates (e.g. 2001), .(طلب lists (e.g. 1, 2, 3) or normal counting (e.g. 3 it is 'num'; for lists (1, 2, 3) they are (طلب For dependency, it's not always the same. For counting (3 it is gmod because the first part is indefinite and the second part defines (عام discourse'; for years (2001' .it is appos because the first part is already definite. For serial number (e.g ,(الساعة it, for time (4:30 .(الحلقة ٢٩) episodes, movie parts, etc) it is amod Digits representing dates (such as 06/07/1993) are tagged as NUM/CD. Numbers can occur either written in letters or in digits: CD مائة DET/ال PREP/ب CD/60 ,and 1, 2, 3, 4 واحد، اثنين، ثلثة، أربعة، إلخ) The CD tag is only for for numbers within the cardinal counting is CD in آلف etc.). Therefore the word متر CD/تبلغ المسافة يستة آلف But the numbers in the sentence below are tagged as NN’s السنين NN/منذ عشرات NN/هاجر اللف and everything more إثنان dual for ,صفر and واحد The number feature for CD’s is as simply singular for than 2 takes plural. Fractions are treated based on their inherent features: sing/ربع dual/ربعين plur/أرباع plur/ثلث

Digits do not express any morphology. Therefore, They take the unspecified tag for number, gender and case: حضر ١١ رجل و ١١ امرأة (ل يتضمن أحد عشر رجل وإحدى عشرة امراة)

واحد، اثنين Postmodifier numbers (act as qualitative (affirmative ,صوتين اثنين and صوت واحد Postmodifier numbers in examples such as adjective and should be tagged JJ.

Appositive Appositive in the grammar is different from how appositive is defined in the . Appositives in

22 the grammar is only the cases defined in traditional Arabic grammar . The only common type in MSA In idafa the .المام علي، الرئيس أوباما and it also includes titles ,أخي محمود، زوجتي يسعاد such as ,بدل المطابق is second part is always in the genitive, but in apposition, the second part receives the same case as the مضاف ومضاف إليه first. So remember that some cases which were treated as appositive in semantics are مدينة بوريسعيد، قناة الجزيرة .here, e.g

ل يسيما or ليسيما :Word as mentioned يسيما .which we tag as a PRT/RP ل النافية للجنس is ل According to classical linguists, the -The first part is tagged as an as RP .يسيما and ل should split into ليسيما ,above, is an adverb. Therefore (ما and يسي mwe and the second as an ADV/RB (although many Arabic linguists would also split

وإل :Word "the usage is not the typical exceptive, but it means "or else و is preceded by the resumptive إل When is ADP/IN إل is RP and و and is followed by a subordinate clause. Here the ايستولى اللصوص على السلطة INإل RPل ينبغي أن يتناحر الثوار و

عدم :Word looks like a quantifier, but it isn't. In quantifiers the head determines the number and عدم The word gender is determined by the following word (which is considered as the head): بعض الرجال جاءوا .e.g أغلب النساء حضرن .e.g عدم But not with عدم الثقة يفقدك التوازن .e.g will be NN. The negative meaning they carry is a property of the semantics (not انعدام and عدم So morpho-) of the word.

(Prenominal Adjectives) إضافة غير حقيقية False Idafa There are three types of false idafa as detailed below JJ+NN (مترامية الطراف) Attributive false idafa .1 Attributive false idafa is an adjective that goes in idafa position to a following noun and modifies or predicates a preceding noun. The adjective agrees with the preceding noun in number, gender and definiteness. Like ordinary adjectives, adjectives in attributive false idafa acquire definiteness only by :In dependency the JJ is the head. Examples .ال the definite article (amod) ظروف اقتصادية بالغة الخطورة (الظروف القتصادية البالغة الخطورة) ● لفافة بيضاء اللون ● رجل قوي البنيان ●

NN+NN (كبار الزوار) Nominalized false idafa .2 Nominalized false idafa is an adjective (usually in the masculine, plural form) that goes in idafa position to a following noun and itself behaves like a noun (it does not modify or predicate a preceding noun). The adjective is considered nominalized and receives NN tag, and it is considered definite because it is in idafa construction. In dependency the nominalized adjective is the head. Examples: محدودي الدخل ● كبار المستثمرين ● صغار الفلحين/المربين ● JJR+NN - (أذكى الطلب) Elative false idafa .3

23 form) that goes in idafa position to a following تفضيل Elative false idafa is an adjective (in the elative noun and is usually in the singular masculine form. The adjective is given the JJR tag and is considered definite if the following noun is definite and indefinite otherwise. In dependency the JJR is the head. Examples: (pobj) في أفضل وقت ● (nsubj) قام أجدر المدريسين ● (dobj) أعطى أقوى رد ● Ordinal Numbers Prenominal ordinal numbers are JJ-HEAD and the following noun is gmod (General Rule: any prenominal JJ/JJR is the head). أول الطلب ● ثاني الطلب ● ثالث الطلب ●

Post-nominal ordinal number are JJ, the head is the noun and JJ is the amod الطالب الول ● conjوالعشرون amodالثالث rootالطالب الثالث والعشرون: الطالب ●

Fractional quantifiers are quantifiers PDT-predet ثلث الطلب ● ربع المعلمين ●

Non-Conventional Constructions Adjectival Modification of a Noun مدير عام الثقافة :Problem case In Arabic adjectival qualification is mutually exclusive with nominal (idafa) qualification. So you can مدير عام الثقافة Therefore, the construction .كتاب جديد الولد but not كتاب الولد الجديد or كتاب الولد or كتاب جديد say is non-conventional. This happened because (مدير عام في وزارة الثقافة أو مدير عام لمديرية الثقافة which means) NN/def because مدير ,is an MWE job title treated as a unit. So here it will be treated as JJ/indef مدير عام In syntax, it .(إضافة غير حقيقية) or in idafa construction ال an adjective is only definite when preceded by will not be treated as amod (adjectival modifier) but mwe.

Conjoined Mudaf جنوب وكشرق مكة :Problem case -but the non ,جنوب مكة وكشرقها This is also non-conventional. The conventional way to say it is conventional way is becoming very common these days due to the effect of translation. So, both of them will be treated as def (considering that they are both mudaf). In syntax, the second one will be treated as a conj dependent of the first.

Abbreviations and Acronyms Abbreviations and acronyms should be `gender/number/case/rationality = unspecified`. Abbreviations of names are tagged as NNP's, e.g. NNP ج NNمنطقة DTالمنطقةج: ال ● NNP .ع NNP.م NNP.ج. م. ع.: ج ●

24 NNPيسي NNPبي NNPبي DTالبي بي يسي: ال ● NNدي NNفي NNدي DTالدي في دي: ال ●

Definiteness, however, does not have the unspecified value. Hence, the Annotator should select def or indef based on his/her best judgment of the context. In the example below, for instance, the year is :acronym of the adjective for Gregorian calendar) should be def) م definite, therefore يسنة 2015م ●

As indicated in the examples above, the POS (as well as dependency labels and attachments) of abbreviations and acronyms is the same as the word they refer to: JJ /يسنة 1955 م CD/يقدر عدد يسكان الردن 10 م NN /تبلغ المسافة 100 م Some problematic examples Example: تلقت كشكوى من الطبيب إبراهيم أحمد محمد اليماني، ال__مسجون__ حاليا فى يسجن وادى النطرون is a VBN because it is followed by an adverb and an argument. One of them is enough to مسجون Here establish the case for VBN.

الطبيب إبراهيم اليمانى، ألقى القبض عليه فى 18 أغسطس 2013، و__محبوس__ حاليا على ذمة القضية same as above

تلقت كشكوى من الطبيب إبراهيم أحمد محمد اليماني، ال__جراح__ المشهور modifies المشهور Also .الطبيب is also an appositive of إبراهيم and الطبيب is an appositive of الجراح Here is a job title not an adjective, the adjectival الجراح and a JJ cannot modify another JJ. Also الجراح meaning will be graphic and definitely not intended here.

يسعى لضم مهاجم نادى ريال مدريد ال__كشاب__ الفارو موراتا إلى النادى اليطالى فى مويسم النتقالت الصيفية and الشاب relationship between بدل and is an NN. There is also a مهاجم is an appositive from الشاب ,Here .الفارو

وصف المدير الفني لتشيلسي النجليزي، ال__برتغالي__ جوزيه مورينيو تجربته في إيطاليا مع إنتر ميلن بالرائعة cannot be an adjective in this context, because it is separated from the noun البرتغالي Same as above, also which is not فيلم المويسم لمحمد رمضان الجديد as فيلم المويسم الجديد لمحمد رمضان by a PP. It will be like reading even though it is normally an adjective. If ,مدير here must be a noun, appositive to البرتغالي possible. So an adjective does not modify a noun, it is lexicalized as a noun and, thus, annotated as NN.

There are other examples where the usual POS of a word is changed based on its position in the are tagged as NN when they are outside the idafa construction كل and بعض sentence. Quantifiers like :(الكل من والبعض من .e.g) منهم NN/منهم لم ينجح، رأيت كل NN/البعض In addition to that, CD’s can function as adjectives if they modify nouns. In the example below the numbers modify the nouns and agree with them in morphological features رأيت ولدا واحدا وبنتين إثنتين

25 Similar Words with Different Functions Some word in Arabic have identical forms. However, they function differently. The purpose of this doc is to illustrate the most common ones of these words with explanations and examples to help differentiate them and select the suitable POS tags for them:

أي

Function Description Example POS Tag

PRT -RP درس البايولوجي أي علم Explanatory Particle Meaning “in other الحياءwords 7, “

DET - WDT ل تقلق على أي كشئ Wh-Determiner Usually followed by an indefinite compliment

DET - WDT أي الدروس حضرت؟ Interrogative Pronoun Followed by genitive nouns (idafa)

PRT -RP أي علي! تعال هنا Vocal Particle Only in vocative expressions

الباء

Function Description Example POS Tag

ADP - IN أه ال بكم Preposition .Meaning with, by, etc

PRT -RP "كفى بك داء ان ترى الموت Particle Does not have a كشافيا" أبو الطيب المتنبي . meaning. الباء الزائدة It often follows :or لست بقاتل negation.

حتى

the POS is RP rather than CC. The following , أو is the same as أي While the meaning of 7 noun is labeled as appos in dependency.

26 POS Tag Example Description Function

Separates part from whole Conjunction تعجب الجميع حتى الطفال CONJ - CC

Meaning “in order to” or “until” Subordinate درس حتى ينج لح. ADP - IN followed by a verb in a subjunctive Conjunction أيستمر حتى تحق لق أهدافك. mood

Meaning “till”, Followed by a Preposition بقي نائما حتى منتصلف النهار ADP - IN noun in a genitive case

Starting a new sentence, meaning Subordinate أصبح المكان مهجورا حتى ADP - IN even Conjunction“” الطيور رحلت منه

حيث

Function Description Example POS Tag

ADV - WRB يسأجدهم حيث يكونوا (Relative Adverb where (locative

ADP - IN السباحة رياضة مفيدة حيث Sub_Conj occurs at the تتحرك كل أعضاء الجسد beginning of a sentence linking it semantically to the previous one

من IN-mwe أرخص المدن من حي كث Nominal Following the حيث IN-prep تكاليف السكن preposition من

ب IN-mwe يعيد تريسيم المدن بحيث تكون حيث IN-mark تبعيتها لمحافظات أخرى

حين

Function Description Example POS Tag Dependency

ADP - IN mark حين عادوا, حين يأتي Sub-conj heading a clause الصباح

ADP - IN prep حين عودتهم, حينها Quasi- followed by a يكون... preposition genitive noun or

27 a VBG

NOUN - NN depends on its من حين لخر, كل Regular noun in a nominal function حين position

ADP - IN Mark (preceded في حين كانوا... Preceded by في Sub-conj (preceded by and heading a by mwe في clause

-the quasi حين the sub-conj is almost always followed by a verb. It can also be distinguished from حين if the meaning was the same with عند or عندما preposition, by applying the following test: replace it with .worked, it is quasi-preposition عند it is sub-con8j. If ,عندما

الفاء

Function Description Example POS Tag

Resumptive/initial faa Usually occurs after a PRT - RP أما السلطة فليست مسالمة. sentence starting with فالمصانع الكبرى تستخدم أما. Sometimes it also كميات من الغاز الطبيعي starts a sentence or a paragraph

PRT - RP إن كان حبي للوطن جريمة Conditional response In a response of a فإعتبروني أول مجرم faa conditional clause

ADP -IN تدرب الفريق كثيرا ففاز Linking faa connects causes and بالبطولة results or occurs between two sentences indicating cause, result, .consequence etc

CONJ - CC يأتي الشتاء فالربيع فالصيف Conjunction particle Indicates sequence. فالخريف Test: Can be replaced with ثم

كما is an exception في حين The mwe 8

28 Function Description Example POS Tag Dependency label

PRT - RP prt كما يختص الوزراء بالنظر Resumptive/i Starting a sentence في المشاكل اليومية nitial faa

ADP- IN mark ارتفعت اليسعار كما زاد Linking sub- Linking a clause to a المطروح في اليسواق conj .preceding sentence

ADP - IN + Prep + pobj إفعل كما تريد Prep+relativ Can be split into two PRON - WP يتقبلك كما أنت e pronoun tokens كما تحب

اللم

POS Tag Example Description Function

Followed by a verb Emphatic لذهب نن هناك PRT -RP with a subjunctive mood

Followed by a noun Preposition عاد للبيلت ADP - IN with a genitive case

Followed by a verb Imperative Particle لنذه نب PRT - RP with a jussive mood

Followed by a verb Explanatory زاره ليطمئ نن عليه ADP - IN with a subjunctive mood

ل

Function Description Example POS - Tag

PRT -RP-neg ل أحد في البيت من أخوات إ نن ل النافية للجنس

PRT - RP لتخاطر بسلمتك Followed by a verb in ل الناهية a jussive mood

PRT - RP لنذهب الى المكان القريب ل Conjunction combines single البعيد words only (does not combine sentences)

29 X - UH ل! Interjection Occurs by itself or in an answer to a yes/no question

often look the same. However, the first لك نن and لك نن ,Since most Arabic texts do not write short vowels or a subordinating conjunction ,من أخوات إ نن one is a conjunction while the second can be a particle

لكن

Function Description Example POS - Tag

CONJ - CC لم يأكلوا السمك لكن الدجاج ”Conjunction meaning “but rather usually preceded with negation

ADP - IN لكن الجو بارد -Precedes a subject من أخوات إ نن predicate sentence

ADP - IN فازوا بالمباراة ولكن ل يمكن Subordinating preceding a clause اعتبار هذا الفوز نهائيا conjunction

ما

Function Description Example POS Tag Dependency label

PRON - WP Depends on its هذا ما يسمعته Relative pronoun Can be replaced with function. In this الذي example: ROOT

ADP - IN mark بعدما تشرق الشمس = This ما and the ما المصدرية بعد كشروق الشمس verb following it can be replaced with masdar

PRT - RP prt ما أروعه! For exclamation ما التعجبية

PRT - RP neg " ما الحسن في وجه preceding a ما المشبهة بليس الفتى كشرفا له" أبو الطيب المتنبي

PRT - RP neg ما أدري Negative Particle It does not affect

30 the mood of the verb

PRON - WP Takes the ما هذا؟ Interrogative Meaning pronoun ”?“what predicate label. In this example: ROOT

PRT - RP prt (child of the كثيرا ما أذهب هناك It does not ما الزائدة (verb يتوقع بناء ما بين ألف change the إلى ألفين مسكن جديد meaning of the إذا ما أيد الجيش sentence تركشحه

PRON-WP amod رأيت كشيئا ما Pronoun ”Meaning “some

ADP -IN mark لن نذهب ما لم تأتي Conditional Can be replaced معنا with “if”

متى

Example POS Tag

ADV -WRB متى أتيت؟ Interrogative Adverb Asking about time

ADP - IN الصديق يساعدك متى ما Subordinate Meaning whenever تحتاج Conjunction

من

Function Description Example POS Tag

ADP -IN من يدر نس ينج نح Conditional Followed by a verb in a jussive mood

PRON - WP من في البيت؟ Interrogative Pronoun ”?Meaning “who

APD - IN دخل من الشباك Preposition ”Meaning “from

PRON - WP الصديق هو من تثق به Subordinate Can be replaced with الذي Conjunction

31 نحو

Function Description Example POS Tag Dependency label

ADP - IN prep يسار نحو الشمال Quasi- Accusative and preposition followed by a genitive noun - meaning: towards

ADV - RB advmod يمثل نحو ثلث السعر :Adverbial Meaning modifier approximately

NOUN - NN Based on its على نحو آخر Nominal Can be pluralized or position modified by an function in the adjective sentence

الواو

Function Description Example POS Tag

CONJ - CC زيد وعلي في المدريسة. Conjunction Connects two elements أحال فردي كشرطة للتحقيق asymmetrically. It can وذلك في إطار يسيايسة also connect two الوزارة في عدم التستر على sentences المخالفين

PRT - RP وتعقيبا على ذلك قال ... إلخ Starting a new واو اليسئنافية sentence

PRT - RP يسبق ويسمعت ذلك It does not change the واو الزائدة meaning of the sentence

APD - IN عاد وهو يسعيد Adds description واو الحالية

APD - IN ذهبت وعلي الى السوق Meaning “with” واو المعية إتركه وكشأنه

32 PRT - RP والل Used for oath واو القسم

واو Note about Annotating at the beginning of the sentence is RP واو ● in the middle of the sentence is واو ● ○ CONJ - cc by default, ○ considered RP-prt when ولو، وإ نن، .followed by a subordinating conjunction (IN), e.g ■ ,ولكن، ولعل، إلخ حاول الصلح ولكن لم يكلل بالنجاح such as before a (الواو الزائدة) or when it is redundant ■ بعض الدول وعلى رأيسها السعوديه تنتج النفط .parenthetical clauses/phrases, e.g

أن … وأن، .unless there is a preceding subconj then the waw is still cc, e.g ○ : لعل … ولعل، إلخ …طالب حسين بأن تتحول البنوك الزراعية إلى بنوك تسليف فلحى وأن تحصل فائدة ل تزيد عن ,(عندما، قبلما، وقتما، حالما) Also before temporal subordinating conjunctions ○ أخذ لقب الملك وعندما .that belong to a whole conjoined sentence, the waw will be a CC , e.g .مات كان ابنه هو التالي .will be the conj كان and (أخذ) will be cc attached to the ROOT واو In dependency the كان will be a child of عندما مات is still labeled as CONJ-cc واو In this example the

يسواء

Function Description Example POS Tag

NOUN-NN على السواء Noun usually in the fixed expression على السواء meaning equally

PRT -RP لم يفز بأي بطولة يسواء Preconjunction with أو Particle الدوري أم الكأس

ADP-IN يسأذهب يسواء وافق المدير أم Subordinating Introducing a subord لم يوافق conjunction sentence

33 مجرد

POS Tag Example Description Function modifying or Adjective كلم مجرد JJ predicating a noun with an argument Participle كلم مجرد من أي معنى VBN before nouns Noun مجرد كلم Noun-NN بمجرد وصوله بمجرد أن جاء

4. Morphological feature tagging animacy aspect case rat rational imperf imperfective nom Nominative irrat irrational perf perfective gen Genitive unsp_r unspecified unsp_a unspecified acc Accusative unsp_c unspecified definiteness gender mood def Definite masc masculine ind indicative indef Indefinite fem feminine sub subjunctive unsp_g unspecified imp imperative jus jussive unsp_m unspecified number person proper sing singular 1 1 true true plur plural 2 2 false false dual dual 3 3 unsp_n unspecified tense voice pres Imperfective without particles that refer to act active مع المضارع الغير مسبوق بلم the past or the future

34 و السين ويسوف ولن Perfective or imperfective preceded by the مع الماضي والمضارع negative past particle pass passive المسبوق بلم past imperfective preceded by one of the future السين ويسوف ولن :fut particles with the imperative مع المر unsp_n unspecified

Guiding Principle The guiding principle with morphology annotation is that we only follow the inherent (not contextual) morphological features. We do not impose morphological features that are not triggered by the words themselves. We use the context only to disambiguate, but not to assign morphological features to a we أنت ولد طيب word which doesn’t bear any manifestation of this feature. For example in the sentence we don’t use the نحن معلمات But in the example .أنلت and exclude أنلت use the context to disambiguate .as the pronoun itself is not specified for gender نحن context to assign gender feature to Foreign names are assigned gender if they invariably receive a particular gender. طرحت أبل نسخة جديدة .e.g أعلنت مايكرويسوفت عن .e.g Acronyms spelled out as letters, although the MWE could behave together with a specific gender, we because the individual letters ام, بي يسي، يسي إن إن .do not assign gender to each individual letter, e.g themselves do not trigger morphological features. We do not assume that small unit inherit features from the extended span. unsp_g يسي unsp_g بي unsp_g أعلنت الم unsp_g إن unsp_g إن unsp_g أذاعت السي The rest of the features for acronyms: Number: unsp Gender: unsp Animacy: irrational Case: unsp Definiteness: true (دي في دي Proper: true/false (depending on whether it refers to proper name or not such as

and borrowed foreign ,جيرمان وينجز The same applies for compound (MWE) foreign names such as :This also includes foreign compound names of locations .توك كشو words such as unsp_g فرانسسكو unsp_g يسان

البعض حضروا when used as NN. It is unspecified for gender, as we can say بعض Another example is .depending on the context والبعض حضرن، والبعض حضر Intent vs Production but it (مجزوم) It is written here in the jussive mood .ل يجد حلول غير أن يقم باختطاف الفتى :Problem case .(حرف من حروف النصب which is) أن since it comes after (منصوب) should be subjunctive

35 for علي We should consider user intent only in one case, that is obvious spelling errors, such as writing when things are clear from the context. But as we said that we abide by the طائرات for طئرات or على ,will be jussive يقم inherent" morphology of the word wrong case and mood will not be corrected. So" even in an indicative or subjunctive context.

A relevant question is do we label literally or for correctness? The answer is that we consider the user's intent as a judging dimension. If something is obviously a spelling error not intended by the user, then we give the labels as if the word was corrected. But if the user has likely intended what he/she said and what they said is grammatically wrong due to poor editing or short memory, we annotate what is there, is masc, and so on. Also the example 7 كان here كان في الدار أمرأة fem. Another example هي masc اليمن .e.g .in the singular, and we treat it like so جوال the user intended it like so with ,جوال

More examples: will be nom in all cases المسلمون the word - when in a nom position will be assigned genitive (assuming that gen is more frequent المسلمين the word - than acc)

is homograph, rather than unsp for gender and person. This is how it is taught in تكتب Note that language classes هي تكتب is 3rd person feminine in تكتب .e.g أنت تكتب is 2rd person masculine in تكتب .e.g which are described in grammar text books only as أنا ونحن So, this is different from the case for (is 1st person singular (gender is unspecified أنا .e.g (is 1st person dual/plural (gender is unspecified نحن .e.g

Case Ambiguity If the choice of case is between genitive and accusative, we choose genitive as it is most frequent: مؤقتين ايستقبل العاملون المؤقتين بمديرية الشباب والرياضة ● بني هؤلء هم بني الوطن ● مسلمين قام الخوان المسلمين بدور هام في ● But if the choice is between nominative and genitive, we choose nominative, as it is the default case: واضح أتمنى أن يكون واضح ● كل يضم كل من ● متراكم يظل متراكم ●

Proper Note on Proper: This is a feature we have implemented in all languages. It is clearly, not morphological, but we are annotating at the morphological layer in Textan. The need for this is that we don't want to have all parts of proper names to be just NNP (e.g., book title

36 'One Flew Over the Cuckoo's Nest'). Instead we want to mark them as actual PoS (determiner, preposition, verb) with corresponding morphological features. To show the span of the proper name we use the proper feature, so all items in my example will have proper=true, while also retaining their PoS: CD, VBD, IN, DT, NN, NN.

General Principles 1. The general rule for assigning proper in Arabic is if the word is capitalized in English. 2. Generally the property of properness indicates a reference to only one entity among many of its kind. So Laika is proper, German Shepherd is not. 3. This include names of the days and weeks/months. names of ,(رئيس، رئيس الوزراء، وزير، المستشار) A few exception to the first rule are titles .4 diseases (Asperger's syndrome), adjectives derived from proper nouns that are not part of a and nominalized adjectives derived from proper nouns, such as ,(قرار أمريكي ) proper name .المصريين، المسلمون، البوذيون، الديمقراطيون، السلفيون، الجهاديون، البيجماليون Specific Cases or short form وزارة المالية Names of ministries are proper whether mentioned in long form .1 .التربية والتعليم Similarly with .المالية 2. Generally to be considered proper the name of the organization need to be an official البورصة when looking it up, it shows as the official name. Same for مصرف يسوريا المركزي :name .المصرية البنك المركزي We can also accept slight (translation) variation of the name ○ .مصرف ليبيا المركزي official name is ,الليبي is بورصة so probably ,يسوق دبي المالي The official name is :بورصة دبي With ○ not proper. This is borderline. ,كأس السوبر اليسباني is proper, short for السوبر اليسباني .3 by itself (i.e. not followed by a name) is proper=false كأس ,However ○ .it is generic ,يسوبر because, unlike إدارة البحث are all proper because it is an official name, same as الجهاز المركزي للتنظيم والدارة .4 .الجنائي is a vague general term that does not indicate a specific entity and is الجهاز الداري للدولة .5 not proper. حزب in حزب With appositives consider whether it is part of the official name or not. So .6 By contrast .مهرجان كان السينمائي and ميدان التحرير is part of the official name, same as with الوفد .is not part of the official name رواية يعقوبيان in رواية ○ Generally in the media world, the appositive is not part of the name: برنامج البيت بيتك، فيلم قلب اليسد، قناة الجزيرة، جريدة اليوم السابع، مسرحية الزعيم، إلخ جامعة :Generally with place names the appositive is part of the name ○ القاهرة، مسجد الرحمة، مستشفى أيسيوط الجامعي، كنيسة القديسين، برج خليفة، بحيرة ناصر، محافظة القاهرة، قطاع غزة، مطار نيودلهي، محور أكتوبر، ميدان روكسي they take جامعة القاهرة، وزارة المالية With appositives that function as part of the name .7 .الجامعة، الوزارة proper=false when mentioned alone 8. With adjectives الزهر الشريف، الوليات المتحدة :They are proper if they are part of the name ○ المريكية، القاهرة الجديدة، الضفة الغربية، الشرق الويسط ○ They are not proper if just functioning as modifiers (whether derived قرار أمريكي، منتج صيني، ترحيب أوروبي (from proper names or not كشمال أفريقيا، غرب :Region names are also proper if they are geopolitically well defined .9

37 .أوروبا، أمريكا الشمالية، الدلتا، الوجه القبلي، الوجه البحري that precedes a proper noun is also proper if the definite article is ال The definite article .10 .البي بي يسي but not in ,البرادعي، الثلثاء، التحاد الوروبي generally inseparable, as in 11. Generic nouns derived from proper nouns are still generic and they take proper=false .بعض المريكيين/المصريين، المسلمون، البوذيون، الديمقراطيون، السلفيون، الجهاديون، البيجماليون كشركة جوجل، كشركة ) from the name كشركة With names of companies we tend to drop .12 .الشركة العربية للتصنيع، كشركة عز للحديد والصلب) unless it is part of the official name (مايكرويسوف .أفضل ممثل، أفضل مخرج، أفضل تصوير :Names of awards are proper=true .13

Tricky cases مجلس الدوما الرويسي is proper true دوما Only مؤيسسة الفيفا is proper true فيفا Only proper=true المجلس العسكري proper=false مجلس الوزراء proper=false رئايسة الجمهورية proper=true السفارة اليطالية

NNP and Proper NNP is assigned to proper nouns according to the following rules.

1. Person Names Names of people are NNP even if they have an adjective or common noun variant (or if they occur as MWE). (Note that gender for people’s names will be based on whether it is the name of a male or female): يسعيد، يسيف، وجيه، إنشراح، عواطف، محايسن، رجاء، مبارك، صلح الدين، عبد ال Saeed (happy), Saif (sword), Wagih (reasonable), Awatef (feelings), Ragaa (hope), Mubarak (blessed), Salah Aldin (reforming the religion) Abd Allah (slave of Allah) NNP/يسعيد Saeed (happy) NNP ال NNP عبد ال: عبد Abd Allah (slave of Allah)

All the common words in people’s names are tagged as NNP’s while function words take their regular POS tags: NNP دين DT ال NNP صلح الدين: صلح Salah Aldin (reforming the religion) PRPه NNP رب NNP عبد ربه: عبد Abd Rabbah (Slave of his Lord) NNP ال IN ب NNP معتصم DET l المعتصم بال: ال Alm’tasim billah (The Infallible by God)

2. Non-Person Names Names of places, organizations, etc which are single words are NNP even if they have an adjective or

38 common noun variant: الجزائر، الباطنية، الشرقية، القاهرة، مطروح، المغرب Algeria (the islands), Al-Batiniya (the internal), Al-Sharkia (the western), Al-Qahirah (Cairo, the victorious), Matrouh (subtracted), Al-Maghrib (the western) NNP / جزائر DT_proper /ال the-Algeria Algeria NNP زاد NN محلت NNP تمرد NN ايستمارة NNP مهنديسين DTال NN حي NNP اتحادية DTال NN قصر NNP جزيرة DTال NN قناة

MWE non-person names are treated compositionally if they have a compositional meaning يساحل العاج، الدار البيضاء، كوريا الشمالية، الوليات المتحدة المريكية، البحــر البيــض، البحــر الحمــر المتويســط، البحيــرات المــرة، بحيرة البردويل، رأس الرجاء الصالح، الخليج العربي Ivory Coast, Casablanca, North Korea, the United States of America, the Mediterranean, Red Sea, the Mediterranean, the Bitter Lakes, Lake Bardawil, Cape of Good Hope, the Arabian Gulf NN / عاج DT /ال NN /يساحل Ivory Coast JJ / كشمالية DT /ال NNP / كوريا North Korea JJ / أمريكية DT /ال JJ /متحدة DT/ال NN /وليات DT /ال the United States of America JJ /مرة DT /ال NN / بحيرات DT /ال the Bitter Lakes NNP /بردويل DT /ال NN / بحيرة Lake Bardawil NN / نور DT /ال CC/و NN /توحيد DT/ال NN /محلت JJ/Proper:true جديدة DT/Proper: true ال NNP مصر Egypt the new New Egypt Heliopolis

The determiner takes proper = true only if it was a part of the proper noun or the official name of an entity: NNPإبراكشي DT ال NN كشركة Al-Ibrashi company NN/proper=true هدى DT ال NN كشركة the Guidance company NN/proper=trueإعمار NN كشركة Urbanization company NN/proper=true كشجرة DT ال IN/proper=trueفوق NN/proper=trueفيلم أبي the movies My Dad is above the Tree

forget you, do you still“ أنساك، لسه فاكر، يسواح، جانا الهوى .This also includes events, books, song titles, e.g remember, traveller, love came to us

39 VBC/proper:true أنسا forget PRP/proper:true ك you

3. Non-Arabic Names ● Please follow the “General Principles” above to decide whether a given name is proper or not. ● Note that not all non-Arabic words are automatically considered as proper names in Arabic. There are many generic (lexicalized) words that are come from non-Arabic origin, such توك كشو، دي في دي، كمبيوتر، تليفزيون، كاميرا، لب توب، إلخ as a) Person Names All non-Arabic persons’ names are NNP whether written in Arabic or Latin Script. b) Non-persons’ names in Arabic script For MWE non-person names (organizations, CGD, events, etc.), all parts are NNP بوركينا فايسو، يساو باولو، نيو أورليانز Burkina Faso, Sao Paulo, New Orleans NNP / فايسو NNP/ بوركينا Burkina Faso NNP / موتورز NNP/ جينيرال NNPمايكرويسوف NN كشركة Microsoft company NNPأبل NN كشركة Apple company ديلي ميل DET/proper = falseصحيفة ال فويس NNP/proper = true برنامج ذا

Note that for foreign place/organization names we do not consider whether the place name is originally a person’s name or not. NNP /فرانسيسكو NNP/ يسان NNP /روتشر NNP /فيريرو NN/كشركة c) Non-persons’ names in Latin script Non-Arabic non-persons’ names when written in foreign script are analyzed based on their function in the source language if the source language is English (which could be understood by the majority of readers). 11. Samsung[NOUN_NNP] GALAXY[NOUN_NN] 5[NUM_CD] 12. Apple[NOUN_NN] TV[NOUN_NN] 13. Ford[NOUN_NNP] Mustang[NOUN_NN] RTR-X[NOUN_NN] If the source language not English, but it clearly appears from the context that the foreign word is functioning as name, assign NOUN_NNP. If a foreign name is multi-token but the internal

40 structure cannot be distinguished, assign NOUN_NNP to all parts of the foreign name. NOTE: if the foreign word that cannot be understood is not functioning as name, X_FW should be assigned.

4. Religions and Ideologies NNP : اليسلم، الديمقراطية، الشيوعية، الماركسية، الوهابية، المسيحية Religions and ideologies

5. Miscellaneous NNP We also assign NNP to: ● names of the weekdays ● names of the months

Specific Cases For Morphology Plurality and Numerals ● For plural irrational objects, number is “pl” and gender is specified by the grammatical is قلم is masculine because the singular form أقلم gender of the singular form. For example masculine. ● Numerals are generally tagged as unsp_g, except when they are preceding nouns, in which case they follow the inherent morphology. ● In certain cases, the nouns appear in their singular forms even if the preceding numerals means forty men but the literal translation is أربعون رج ال suggest that they are plurals. The phrase more like forty one of them (the men). Thus, and in order to obey the inherent morphology principle, the number tag should be singular.

Pluralia Tantum are collective nouns. They refer to groups of people or items but أيسماء الجموع The pluralia tantum or sometimes they have plural forms themselves. Hence, attention should be paid to what morphological features they take. They can be subcategorized as follows. جماعة، قبيلة، فريق، أيسرة، :such as ,ايسم جمع يجمع Group nouns 1 that have plural forms .1 قطيع، جيش، عائلة، قرية، لجنة، كشعب ○ gender: morphological gender ○ number: sing ○ rationality: irrat كشرطة، مباحث :that do not have plural forms, such as ايسم جمع Group nouns 2 .2 ○ gender: morphological gender ○ number: sing ○ rationality: irrat نساء، ناس، إبل Fixed plural and the singular is a different word .3 ○ gender: morphological gender ○ number: plur is irrat إبل are rat ناس، نساء :rationality: depends ○

41 رمل، تراب، ضباب :Mass nouns .4 ○ gender: morphological gender ○ number: sing ○ rationality: irrat the singular is formed by adding a taa marboutah in ,ايسم جنس جمعي Collective nouns .5 بقر، ذباب، تفاح، برقوق، عنب :the end, such as ○ gender: morphological gender ○ number: plural ○ rationality: irrat are plur and rat because they are invariably treated as such قوم ورهط :Exceptions .6

Ambiguity The Arabic language is usually written without the short vowel diacritics. Thus, words with different morphological values can appear as homographs. For instance, There are two pronouns for the second person singular, one for masculine and one for feminine. Yet, they look identical without the last short vowels diacritic: أنت تلعب أنت تلعبين

Likewise, verbs of present tense that that are conjugated for the third person feminine or second person masculine are written the same, even if with the short vowel diacritics: أنت لتنككلت كب هي لتنككت كب Therefore, in such instances we tag the morphological features according to the context. play" VBC/ MASC/Sing/2" تلعب You.2nd.masc" PRP/MASC" أنت play" VBC/FEM/Sing/2" تلعبين You.2nd.fem" PRP/FEM" أنت

In addition to that, some personal pronouns and their verb conjugation are the same for both masculine or feminine (see the table in the PRP section above for a full list of PRP’s and their morphological features). Therefore, the unspecified tag will be selected for gender even if the gender is revealed from the context: أصدقاء و ندرس هنا PRP/UNSP_gنحن صديقات و ندرس هنا PRP/UNSP_g9 نحن

In case of true ambiguity, we don’t recommend a default, but give it your best guess using your best .فحبك الحقيقى يحافﻆ عليك .judgment, e.g

Gender Representation Some words in Arabic are used for both masculine and feminine. Many job titles, for example, have a fixed masculine form but are sometimes used referring to females: الشركة ثم أصبحت رئيسها NN/MASC مدير PRP/FEMكانت هي في البرلمان NN/MASC نائب PRP/FEMهي

9 g is for gender

42 عام NN/MASC مدير NN/FEMمراتي .The default morphological feature of these titles is masc ،أيستاذ دكتور، مدير إدارة Other words include are inherently feminine. They are often used فريسة, ضحية, مشكلة أيسطورة Similarly, words like metaphorically. Therefore, they can also modify masculine entities. This can appear as a subject- predicate disagreement or noun-pronoun discord. Their gender tag should be fem even if they refer to a masculine being. PRP/MASCمصرعهم NN/FEMلقي ثلثة ضحايا كرة القدم NN/FEM ايسطورة NNP/MASC ميسي NN/FEM المشكلة PRP/MASC هو NN/MASC النفتاح NN/FEM المشكلة PRP/MASC هم NN/MASC الخوان

Also note that gender contradiction could be frequent in modern writing. This contradiction should also be reflected in our annotation.

Gender of the Arab Country Names The rule about the of Arab countries is that they should be feminine with the exception of the following: .العراق - لبنان - المغرب - السودان - الصومال - الردن - اليمن For non-Arabic countries, they are all treated as “fem”.

Gender with Foreign Names is masc and جاك In Arabic, the gender of a foreign person’s name is the same as the natural gender, so is a مايكرويسوفت .is fem. For places and organizations, the gender correlates with the hypernym, e.g جاكلين .in the language ”كشركة“ company, so it receives the same gender as the word

جنيرال موتورز، توك كشو، بوركينا فايسو، نيوز أون لين، أون تي في، يسان فرانسيسكو :Compound foreign names/words receive gender=unsp_g, because gender in this case is a property of the entire phrase and not of the individual words.

Gender with Numbers .ثلثة رجال وعشر نساء Numbers between 3 and 10 take the opposite gender of the noun they modify According to the inherent morphology principle the gender of the number is specified by the word itself not by the word it modifies. Therefore consider these examples: رجل unsp/وثلثون fem/ثلثة رجل unsp/مائة امرأة unsp/ألف Gender for human names ● The gender of first names should be the same as that of the human they are associated with, e.g. (fem)هدى ،(fem)يسعاد ،(masc)يسمير ،(masc) محمد ● The gender of last names should always be ‘masc’ whether used to refer to a male or or بيل as a name is masc whether referring to كلينتون Here .كانت كلنتون وزير الخارجية .female, e.g .هيلري

Words with varying gender يسوق، بلد، .Some words are gender-ambiguous and can be treated either as feminine or masculine, e.g

43 In this case, the context will decide the gender. If it can not be inferred from the context, give it .،ريح or هذا the best judgment of how it can mostly occur e.g. try a demonstrative pronoun and see if it takes .هذه

ضمير الفصل Case of the Separating Pronoun when (المبتدأ والخبر) is the pronoun between subject and predicate ضمير الفصل The separating pronoun It has no place in case marking “case=unsp” because most Arabic .العدل هو الحل .both are definite, e.g .”ايسم مهمل، ل محل له من العراب“ grammarians consider it as redundant neglected word

Metaphors Although metaphors denotes likeness among rational and irrational entities, the animacy tag is selected for each entity independently. If, for instance, an author is comparing a human being to an object, the human should be tagged as rational and the object as irrational. الشرق NN/IRRAT هي كوكب NNP/RAT أم كلثوم كرة القدم NN/IRRAT أيسطورة NNP/RAT بيكام

Attention should be paid to homonyms that can refer to both rational and irrational beings: تسطع في السماء الصافية NN/IRRAT هذه النجوم السينما والمسرح NN/RATهؤلء هم نجوم Definiteness The def feature value is for definite nouns, adjectives and comparative adjectives. Nouns are made or when they are in idafa construction where the second part ال definite either by adding the determiner but also if it was a ,ال mudaf ilaih) is definite. The mudaf ilaih can be definite, not only as a noun with) pronoun, demonstrative or a subordinate clause ,(كشركة إعمار .proper noun (or an NN/proper=true, e.g with a relative pronoun. In the idafa case, it is possible to find more than one noun combined with conjunctions having one mudaf ilaih. Although this is a non-conventional construction of idafa, if it occurs in the corpus, the nouns are def: جنوب وكشرق مكة في بحيرات وأنهار إفريقيا نمو وتطور اللغة العربية احترام قيم وعادات الحضارات الخرى أكبر وأحسن النباتات

In this example, 2000 is referring to one .(عام Note that the mudaf elih can also be a number, e.g. (2000 specific point in time. Thus it is definite. The same thing is applicable on percentage expressions e.g. .is definite نسبة in 50% نسبة the word Numbers that are not dates are not specific and when the mudaf elih is number, the mudaf remains indefinite, e.g.: توريد 18 طن قمح جذب 500 مستورد إصابة 24 مجندا

Attention should be paid if they were digits. In the context below, 3 is a digit and, thus, specified. This :رقم ,makes it definite and so is its mudaf الفقرة رقم 3

44 Personal Names People’s full names in the Arabic speaking regions are commonly composed of the first name followed by the family name. Sometimes the father’s or grandfather’s names are added between the first and the last name. The full name, thence, has a construction of idafa. This makes every name after the first one genitive: gen عطية nom قال منصور بنت son of, or بن/إبن However, sometimes, especially in the classical tradition of naming, words like is annotated as NN taking the same منصور بن عطية in بن daughter of, follow the first name. The word considering it as appositive. In dependency all parts of the name will be connected via nn منصور case as to the first name. gen عطية NOM بن nom قال منصور

.حاتم العجمى، محمد البغدادي، حسن حجازي :Names that look like adjectives are also treated as NNP

Special case: religion textbooks are NNP’s but a closely related tokens would be annotated compositionally with proper = true JJ - true كريم DET - true ال NNP - true قرآن DET - true ال Idafa vs Apposition .apposition, may appear similar ,بدل As indicated in the section above, the idafa, annexation, or Nevertheless, it is important to differentiate them in order to decide their case endings. While the second part of idafa is always genitive, the appositive takes the case ending of the noun it modifies. The following points should be considered when determining the Case tag: the sentence will be tagged according to ,مضاف إليه If a sentence falls in the position of ● مبتدأ مؤخر is nominative because القاهرة In this example برنامج هنا القاهرة .its internal structure, e.g والخبر هنا مقدم it will receive the genitive مضاف إليه If a noun or a falls in the position of ● قناة الجزيرة، حزب الحرية والعدالة .case, e.g it will be فيلم المذنبون، جماعة الخوان المسلمون has a difference case مضاف إليه In case the ● tagged with the explicit case it has, nom. ● If a named entity has a fixed case, in our annotation it will receive the explicit case, e.g. مدريسة المشاغبين هي مسرحية كوميدية، تعرضت الخوان المسلمين genitive in the following two examples لكثير من التجاوزات when the word does show case باعتبار المحل We consider the contextual case ● .”which is tagged “nom رأيت مويسى in مويسى morphologically such as

Many official names of locations and organizations are in idafa construction meant as a tribute to a person. In this case, even if the whole name refers to an inanimate entities (irrational), the idafa composition keeps the animacy and gender features of the person’s name: rat/femزينب rat/fem السيدة irrat/mascحي rat/masc ركشيد irrat/fem منطقة

However, when the names of these entities is foreign, they are tagged as irrational. In the example :only واكشنطن below, the official name is irrat/fem واكشنطن irrat/fem مدينة

45 Tagging Foreign Words Many foreign words are borrowed into Arabic. Some of these words take the regular morphological features of the Arabic words, and others are tagged as unsp.: then case=unsp, but if it انترنت .Case: if case with foreign words sounds unnatural, e.g ● .then assign case دولرا .sounds natural, e.g .(يسيديهات، فيديوهات) Number is singular unless explicitly plural ● If in doubt .هذا الفيديو وهذه السينما .Gender, consider how the word is invariably used,e.g ● each token is unsp_g يسي إن إن .assign unsp, e.g ● Rationality, consider how the word is invariably used. If in doubt assign unsp indef/عن فديو def/كشو def /تحدث في برنامج التوك Definiteness, decided by the context, e.g ● took def this is because, if we consider its original كشو Note that in this example جديد indef/كليب .is like an idafa but in a reversed word order توك كشو ,language

The same applies if names are written in Latin script, e.g.

بأنه أكثر من مجرد موقع مبتكر للتواصل الجتماعي Google+ يتميز موقع ●

Tagging Dialectical Words

The general rule in annotating dialectical words is to treat them according to their correspondents in precedes verbs to indicate future tense. Hence, like the future particle ح MSA. For example, the letter .in MSA, it is tagged as PRT -RP س حالعب = يسألع كب and it is لن is a negative particle similar to مش ,and is also RB. Similarly أيضا is equivalent to برضه ,Also tagged as PRT - RP even if it precedes parts of speech other than verbs: مش حالعب مش ممكن and both parts are tagged as RP. Sometimes ,ما … ش Usually negative in Egyptian Arabic has two parts .In this case it should also be tokenized and marked as RP .م is shortened to ما RPش VBCلعب RPما لعبش: ما RPش VBCرح RPمرحش: م

appears in Arabic dialects بس Like MSA, dialects have multi function words. For instance, the word .in MSA. Hence, the suitable tag for it is ADV - RB فقط meaning only or the adverb عندي وحدة بس : in which case it should be tagged either CONJ - CC or لكن Sometimes, it also acts like but or هو صغير بس انت كبرت

46 and the على It is fossilized from the preposition .عشان One of the commonly used words in Egyptian is whose POS tag كي means so that of for the sake of. Its parallel in MSA is عشان In most cases .كشأن noun is ADT - IN: إدرس عشان تنجح = إدرس كي تنجح Yet, it can also appear in the following usage: عشانك يا أحمد which is also ADP -IN ,ل The most fitting MSA here is the preposition

and the non referential في It consists on the preposition فيه Another fossilized prepositional phrase is but في It commonly appears as a preposition only .هناك The whole phrase is a synonym to .ه ,pronoun functions the same. In this context, both , and.. are tagged as RB. النت ADP/IN مشكله في ADV/RB فيه

There are, however, some parts of speech that are used only in dialects and do not have an equivalent in MSA. Tagging them will depend on their functions. e.g. in the Egyptian dialect, to indicate :is added as in ب continuation of a present verb, the letter ?what is he doing/ بيعمل أيه؟ here, functions as a particle and, therefore, should be tagged as PRT - RP ب The . أهي or أهو preceding personal pronouns as in (أداة التنبية or) أ Another dialect particle is the emphatic

Another difference between MSA and dialects is that in dialects, cases and moods (except imperative) are never pronounced. For their morphological values, the tag “unspecified” is selected.

it اللي ,The gender and number are also “unspecified” for the relative pronoun in the egyptian dialect .in MSA that are masculine and feminine respectively التي and الذي replaces الولد اللي راح البنت اللي راحت

,هم Yet, in Egyptian it can also appear as .هن Furthermore, the feminine plural pronoun in MSA is only which in MSA is strictly for masculine. Here the morphological gender value is also unspecified هما or :هم for البنات وأيستاتذتهم لكن هما اصروا وقالولى احنا كشفنالك كشغل كويس

Passive voice is not dialect). So, they are tagged انطلق invariably indicate passive in dialect (note that انفعل واتفعل Both with voice:pass. اتكسر، اتفصل، اتبهدل، اتباع، اتهدر، اترحم، اتستر، انكسر، انفتح، انهزم .e.g

.متبهدل، لمننلتلحر .Also participles from these verbs are passive, e.g

Dialect and MSA have a lot of words in common. These words are annotated as dialect only when adjacent to dialect, otherwise, MSA.

47 بيا unspecified_m /محدش يتصل بي indicative/ل أحد يتصل

Coding-switching conflict If the sentence contains both MSA and dialectal words, there are usually ambiguous words which are spelled and pronounced the same way in both MSA and dialect. Hence, they can be interpreted both ways. These ambiguous words are analysed as dialect only when surrounded by dialectal words, otherwise MSA. The Unspecified Tag As indicated in the sections above, the unspecified tag is used for tokens whose morphological value is not specified or when none of the available tags is applicable. For example, if a word is invariably used to modify nouns with different numbers and genders, then it should have the feature unspecified for number and gender. Below are more examples of the cases where unspecified should be selected: ● The tense, aspect and voice for the imperative verbs are always unspecified: ادرس كي تنجح are tagged as البعض، الكثر، الغلب، إلخ Quantifiers when acting as nouns ● unsp_g/unsp_n/unsp_r.

● There are a few tokens that are never considered quantifiers in POS but are assigned عديد and ,كثير, قليل similar morphological features. When in nominal position, the tokens but (كثيرون plural for ,كثير should be specified for number (singular for(من followed by) should be specified for باقي invariably unspecified for animacy10 and gender. Similarly, the token but invariably unspecified for (باقون :pl ,باقي :and number (sing (باقية :fem ,باقي :gender (masc animacy.

● The prenominal comparative adjectives (JJR) (unlike comparative adjectives that come after nouns) take the unspecified tag for gender and number: أفضل النساء أحسن الرجال أصغر محارب

● Case is dropped with non-Arabic words, e.g. للعلن عن فيلمها الجديد كامب أكس ري ● Digits do not express any morphology. Therefore, They take the unspecified tag for number, gender and case: حضر 11 رجل و 11 امرأة (ل يتضمن أحد عشر رجل وإحدى عشرة امرأة) ● When quantifiers act as nominals, they take the unspecified tag for number and is the same despite the difference in the بعض rationality. In the example below, the word morphological feature of the nouns they are associated with: البعض ذهبوا البعض ذهبن البعض من هذه الكشياء

forces the rationality of animacy ون Animacy is usually unsp. However, as will be mentioned below, the plural 10

48 as a quantifier means one of but it is also means someone. For the latter case, it is أحد The masc., sing., and rat: لم أجد أحدا

● Some nominal adjectives are treated differently. They take the unspecified tag for gender only. For instance: البعض هنا ول أدري أين الباقي although from the context it seems referring to plurality, takes sing for number ,باقي The word in the example above, it does inflect with gender and بعض and masc for gender because, unlike .etc باقون, باقية number like NN/gender: unsp, number: unsp, rationality: unsp البعض ● :NN /gender: unsp, rationality: unsp, number (من followed by) الكثير , القليل ● (as plural كثيرون , قليلون sing (vs .كثيرون, قليلون, باقون Exception for animacy for words like ○ .at the end indicates rationality. Therefore, they are rationality:rat ون The NN/gender: masc, number: sing, rationality:unsp :الباقي ● NN/gender: masc, number: sing, rationality:rat :أحدا ●

● When numbers refer to entities outside cardinal countings, they take the unspecified tag for rationality: العشرات من الناس العشرات من أنواع الطيور Hence, it is tagged as plural and feminine عشرة above is plural of عشرات The

ذو and Annotating اليسماء الخمسة ,brother أخو ,father أبو or the five nouns. These are اليسماء الخمسة In Arabic there is a class of nouns called owner of. They differ from regular nouns as their morphological ذو mouth and فو ,father-in-law حمو cases are represented with long vowels as they occur in idafa construction. For their POS tags, they are :often functions as an adjective ذو NN’s. However

الحتياجات الخاصة NN رياضات لذوي التجاه المتضاد JJالطريق الرئيسي ذو الطابع الزراعى JJالموارد الطبيعية ذات 5. Dependencies

5.1 Dependency Quick Table

The table below is the alphabetical list of all dependency relations for Arabic, with their respective definitions and various examples illustrating their usage. The current representation contains approximately 50 grammatical relations. The representation of grammatical relations corresponds to a binary relation between a governor element and a governed one, and must be read as follows:

grammatical_relation(head/governor, dependent)

49 .are not considered as governors, but as markers (السين ويسوف Note. Particles with verbs (such as

must be understood as a binary relation of ”.نهض زيد“ For instance, the subject relation for the sentence and then will ,زيد and the dependent proper noun نهض nominal subject (nsubj) between the head verb be formalized as follows:

(زيد x,نهض)nsubj

The full range of grammatical relation tagset is listed in the following table:

Label Description Example كان زيد مريضا acomp An adjectival complement of a verb is an (مريضا x,كان)adjectival phrase which functions as the acomp complement. ليس زيد مريضا (مريضا x,ليس)This relation specifically includes “be” acomp كان وأخواتها: كان، وأمسى، ) copula constructions أصبح زيد مريضا وأصب لح، وأضحى، و لظنل ، وبالت، وصار، وليس، وما زال، (مريضا x,أصبح)with adjective acomp (وما انلف نك، وما لفلتيلء، وما لبلر لح، وما دام .(الخبر الوصفي) predicatives بدا يسعيدا (يسعيدا x,بدا)acomp ظن It also includes verbs of uncertainty وأخواتها: ظن وحسب وخال وزعم ورأى وعلم ووجد ظننته غنيا واتخذ، ويسمع (غنيا x,ظننت)acomp

ل تضارب في البورصة حتى ل تخسر advcl An adverbial clause modifier of a verb or a (تخسر x,تضارب)clause is a clause modifying the verb advcl (temporal clause, consequence, conditional عاد من عمله يعاني من الرهاق .(.clause, purpose clause, etc (يعاني x,عاد)advcl Adverbial clauses can either be introduced by عمل باجتهاد حرصا على مسقبل أولده a marker or include a tensed verb, as in the (حرصا x,عمل)advcl الحال الجملة case of

محمد (صلى ال عليه ويسلم) .المفعول لجله It also includes Mafoul li’ajlih (صلى x,محمد)advcl الجمل It also covers parenthetical clauses تضاعف مستخدمو النترنت وفقا للتقارير .المعترضة الريسمية (وفقا x,تضاعف)It also include cognate accusative heading an advcl المفعول المطلق العامل argument

رأيت زميلي هناك -advmod An adverbial modifier of a word is a (non (هناك x,رأيت)advmod (الظروف) clausal) adverb or adverbial phrase that serves to modify the meaning of the منذ عام تقريبا .word

50 (تقريبا x,عام)advmod This includes also quantifier modifiers جميل جدا .modifying the head of a QP constituent (جدا x,جميل)advmod

يستعمل يسيارته كثيرا (كثيرا x,يستعمل)advmod

انتشر محليا ودوليا (محليا x,انتشر)advmod

اكشترى يسيارة جديدة amod An adjectival modifier of an NP is any (جديدة x,يسيارة)that serves to modify amod (النعت) adjectival phrase the meaning of the NP. اتجه علء اليسواني، مؤلف عمارة يعقوبيان، of an NP is (البدل) appos An appositional modifier إلى النشاط السيايسي an NP immediately following the first NP (مؤلف x,علء)that serves to define or modify that NP. It appos includes defining abbreviations in one of يعيش صديقي حسن في لندن these structures as well as parenthesized (حسن x,صديق)examples. In these cases the second appos constituent modifies the first. حضر الجتماع وزير الثقافة اليسبق فاروق حسني (فاروق x,وزير)appos

كان محمد طبيبا بارعا attr An attr dependent is a nominal phrase headed (طبيبا x,كان)and the attr ,كان وأخواتها by a copular verb such as verbs of transformation ليس محمد طبيبا (طبيبا x,ليس)Note that attr is different from acomp in that attr the dependent is a noun phrase, not an صار محمد طبيبا .adjective (طبيبا x,صار)attr Sometimes it is not clear what should be the subject and what the attribute. In such cases, من كان مدريسك؟ .a.k.a) المبتدأ والخبر we should follow the (مدرس x,كان)attr subject-predicate, topic-comment or theme- rheme) structure. Note that in questions the wh-pronoun or the noun in the wh-phrase is in attr relation to the ROOT. كان الرجل يؤدي ما عليه aux An auxiliary of a clause is considered as a (كان x,يؤدي)non-main verb of the clause: this is reserved aux that is when they are ,كان وأخواتها to aspectual كان قد نسي كل ما حدث .followed by another verb (كان x,نسي)aux

51 ليس يساعد أحدا (ليس x,يساعد)aux يحب الناس ويساعدهم cc A coordination is the relation between an (و x,يحب)element of a conjunct and the coordinating cc conjunction. We take one conjunct of a conjunction (normally the first) as the head of the conjunction.) Words that can receive و، ف، ثم، أو، أم، بل، حتى، لك نن، ل :that tag are أيقن أن الوضع لن يتغير ccomp A clausal complement of a verb or adjective (يتغير x,أيقنت)is a dependent clause with an internal subject ccomp which functions like an object of the verb, or يريد أن يحصل كل إنسان على حقه adjective. This is usually introduced in (يحصل x,يريد)Sometimes ccomp .أ نن Arabic by the introduces this kind of sentences when the أ نن أنا على يقين أن المشروع يسيحقق نجاحا كبيرا .subject is present (يحقق x,يقين)ccomp Clausal complements for nouns are usually كان متأكدا أن الحقيقة يستظهر “ or ”حقيقة أ من“ associated with nouns like (تظهر x,متأكدا)We analyze them the same ccomp .”التصريح أ من (parallel to the analysis of this class as كان متأكدا أن الحقيقة يستظهر content clauses” in Huddleston and Pullum“ (متأكدا x,كان)ccomp .(2002

,are VBNs كان وأخواتها When predicates of they are also labels as ccomp

?يحقق ما يريد in ما What about هو صاحب الشركة ومديرها. conj A conjunct is the relation between two (مدير x,صاحب)elements (any phrase type) connected by a conj و، ف، " coordinating conjunction, cc, such as هي لطيفة ومهذبة وكريمة We treat conjunctions ."ثم، إلخ (مهذبة x,لطيفة)asymmetrically: The head of the relation is conj (كريمة x,لطيفة)the first conjunct and other conjunctions conj depend on it via the conj relation. Implied coordination (with no conjunctions) are .(هي لطيفة، مهذبة وكريمة) treated the same

يسرني أن أكون نافعا csubj A clausal subject is a clausal syntactic (أكون x,يسر)subject of a clause, i.e., the subject is itself a csubj .الفاعل جملة مسبوقة بأن المصدرية .clause يزعجني أن تتدهور المور بهذا الشكل (تتدهور x,يزعج)The governor of this relation might not csubj always be a verb: when the verb is a copular من الصعب أن تصبر أمام التحديات verb, the root of the clause is the complement (تصبر x,من)of the copular verb. csubj

52 يستحسن أن تستأذنه أول csubjpass A clausal passive subject is a clausal (تستأذن x,يستحسن)csubjpass نائب .syntactic subject of a passive clause .الفاعل جملة مسبوقة بأن المصدرية يفضل أن يبدأ الطفل في الكتابة مبكرا (يبدأ x,يفضل)csubjpass طريق القاهرة كشرم الشيخ dep A dependency is labeled as dep when the (كشرم x,القاهرة)system is unable to determine a more precise dep dependency relation between two words. كان الطبيب هو المسؤول This may be because of a weird grammatical (مسئول x,كان)construction, a limitation in the Stanford att (هو x,طبيب)Dependency conversion software, a parser dep error, or because of an unresolved long الكتاب الذي ايستعرته .distance dependency (الذي x,ايستعرت)dobj (ه x,ايستعرت)We use this tag in Arabic with the separating dep and الطبيب هو المسئول as in ضمير الفصل pronoun البرادعي (70 عاما) الكتاب as in ضمير الربط the (عام x,برادعي)dep .الذي ايستعرته (x 70,عام)num ضمير الفصل By default the separating pronoun حسن إبراهيم، دكتوراه في القتصاد will be attached to the subject unless there is (دكتوراه x,حسن)a conflict in number and gender between the dep subject and predicate and the pronoun حسن إبراهيم، وزاركة التجارة ,(الضحية هم الضعفاء .follows the predicate (e.g (وزارة x,حسن)in such case it is attached to the predicate. dep

فيلم الجزيرة، جإخرا ك كشريف عرفة This tag also covers independent noun (إخراج x,فيلم)phrases in parenthetical position (indicating dep age, affiliation, qualification, etc.), which doesn’t have a clear syntactic function in the clause.

عاد الرئيس det A determiner is the relation between the head (ال x,رئيس)of an NP and its determiner. In Arabic this is det .ال only the definite article دارت السيارة (ال x,يسيارة)det أهل، كيف حالك؟ discourse This is used for and other (أهل x,كيف)discourse particles and elements (which are discourse not clearly linked to the structure of the آه ياني sentence, except in an expressive way). We (آه x,ياني)generally follow the guidelines of what the discourse Penn Treebanks count as an INTJ. This بلى، أجل، آه، كل، نعم، ) includes: interjections .(ياه

53 الطفل غلبه النعاس dislocated The dislocated relation is used for fronted (طفل x,غلب)topicalized) or postposed elements that do dislocated) not fulfill the usual core grammatical السيارة لونها غريب relations of a sentence. The dislocated (يسيارة x,غريب)element attaches to the head of the clause to dislocated which it belongs. الكاتب نشرت الجريدة قصة حياته (كاتب x,نشرت)This happens in complex sentences nominal dislocated sentences when the predicate is a complete أين وضعته، الكتاب sentence that contain a pronoun referring (كتاب x,وضعت)dislocated الخبر جملة بها ضمير يعود على .back to the subject المبتدأ قرأ الطالب الدرس dobj The direct object of a VP is the noun phrase (درس x,قرأ)which is the (accusative) object of the verb. dobj This includes also relative pronouns كشكره .introducing rcmod (ه x,كشكر)dobj It also covers the object of a الضيف الذي ايستقبلته .(VBG) and non-conjugated verbs (VBN) (الذي x,ايستقبل)dobj

انتظاره صدور الحكم (صدور x,انتظار)dobj The main .ضمير الشأن expl This relation captures زعمت أنه ل يمكن تحقيق أرباح .verb of the clause is the governor (ه x,يمكن)expl foreign We use “foreign” to label sequences of أغنية أوند اش لوف foreign words whose meaning is not (أوند x,أغنية)understood to the Annotator. These are given gmod (اش x,أوند)a linear analysis: the head is the first token in foreign (لوف x,أوند)the foreign phrase. foreign does not apply to foreign loanwords or to foreign names. It applies to quoted foreign text incorporated in a ترجمه sentence/discourse of the host language set fire to the rain (x set,ترجمة)unless we want to and know how to annotate gmod) the internal structure according to the syntax dobj(set, fire) of the foreign language). prep(set, to) The foreign tag is only for sequence of words det(rain, the) which are not names and not easily pobj(set, rain) intelligible by average readers. طالب العلم gmod The genitive modifier relation applies to (علم x,طالب)cases in which there is a genitive attribute gmod الضافة .modifying an NP relation مدرس الجغرافيا (جغرافيا x,مدرس)This includes also relative pronouns gmod introducing rcmod.

54 العالم الذي يقوم بدوره ممثل مغمور (الذي x,دور)gmod أوا ئل الثانوية goeswith This relation links two parts of a word that (ئل x,أوا)are separate in the text that is not well edited. goeswith The head is in some sense the “main” part, often the first part. أعطى محمدا كتابا iobj The indirect object of a VP is the noun (محمدا x,أعطى)phrase which is the (dative) object of the iobj verb. The indirect object is the one that can .ل be moved after the preposition

It will be noted that indirect objects introduced by a preposition will respect the prep+pobj construction (cf. pobj relation examples). كشركة الهدى، تليفون: 555-9814 إيميل: list The list relation is used for chains of comparable items. Web text often contains '[email protected] (تليفون x,الهدى)passages which are meant to be interpreted as list (إيميل x,الهدى)lists but are parsed as single sentences. Email list (x 555-9814,تليفون)signatures in particular contain these appos (x [email protected],إيميل)structures, in the form of contact appos information: the different contact information items are labeled as list; the key-value pair relations are labeled as “appos”. In lists with more than two items, all items of the list should modify the first one. أيقن أن الوضع لن يتغير mark A marker is the word introducing a finite (أن x,يتغير)clause subordinate to another clause. For a mark أ نن complement clause, this will typically be يريد أن يسافر For an adverbial clause, the marker is .وأ نن (أن x,يسافر)mark إذا، typically a subordinating conjunction like إ نن، لو، حتى، طالما، حالما، بينما، عندما, وأخوات إن (أ نن، يسيأتي عندما يحين الوقت The mark is a .ليت، لعل، عل، كأن، لكن وعسى)، إلخ (عندما x,يحين)dependent of the subordinate clause head. mark

يستعاقب إذا أخطأت (إذا x,أخطأت)mark

يسيسود السلم عندما يعم التفاهم (عندما x,يعم)mark

يستستمر الفوضى طالما ل توجد خطة (طالما x,توجد)mark غير أني كنت يسأبقى. (mwe The multi-word expression (modifier (غير x,أن)relation is one of the three relations mwe (alongside gmod and nn) for compounding. It

55 دخل المستشفى حيث أنه أصيب. is used for certain fixed grammaticized (حيث x,أن)expressions with function words that behave mwe like a single . Multiword بالنسبة للوضع هناك expressions are annotated in a flat, head-last (ل structure, in which all words in the prep(x,x (ب x,ل)expression modify the last word using the mwe (ال x,ل)mwe label. The leftmost (last) word takes the mwe (نسبة x,ل)label based on its function. mwe

مازال في البيت. (ما x,زال)mwe لم يحضر أحد. neg The negation modifier is the relation between (لم x,يحضر)a negation word and the word it modifies. neg مواد غير صالحة لليستعمال The particles that are assigned the neg label (غير x,صالحة)neg لم، لن، ل، ل النافية للجنس، غير :include ل يرد العودة. (ل x,يريد)neg باراك أوباما nn A noun compound modifier of an NP is a (أوباما x,باراك)noun that serves to modify the head noun. In nn Arabic, this name is used for the relation محمد حسني مبارك ,between parts of people's names, i.e. first (حسني x,محمد)middle and last names. nn (مبارك x,حسني)nn Note that the hierarchy of the phrasal heads عبد العاطي :would be the following (عاطي x,عبد)nn 1. first name (as it is the case bearer) أبو عمار 2. middle name (عمار x,أبو)nn 3. last name بن لدن This means that the first name is the parent (لدن x,بن)node of the second name, and the second nn name is the parent node of the last name. بوركينا فايسو (فايسو x,بوركينا)nn

توك كشو This tag is also used for all MWE proper (كشو x,توك)nouns that are tagged in the POS as (NNP nn The .بوركينا فايسو، جينرال موتورز NNP), such as أراب أيدول .first element will be the head (أيدول x,أراب)This tag is also used for all MWE Arabized nn nouns that do not fit the idafa pattern (the لوي فيتون second part is not definite) that are tagged in (فيتون x,لوي)nn توك كشو، دي في the POS as (NN NN) , such as The first element will be the head .دي، يسي دي فولكس فاجن .in a flat structure

56 (فاجن x,فولكس)nn

نجح نجاحا باهرا npadvmod This relation captures various places where (نجاحا x,نجح)something, syntactically a noun phrase (NP), npadvmod is used as an adverbial modifier in a زرعنا الرض ذراة .sentence (ذرة x,زرعنا)npadvmod These usages include: هو أحسن منه حال المفعول المطلق غير العامل i) Mafoul mutlaq) (حال x,أحسن)not including tamyeez of npadvmod التمييز ii) Tamyeez) (تمييز العدد) numbers زرته مرتين (مرتين x,زرت)npadvmod

طمأنت إدارة الشركة . nsubj A nominal subject is a noun phrase (إدارة x,طمأنت)which is the syntactic subject of a nsubj clause. الوضع يسير نحو اليستقرار (وضع x,يسير)The governor of this relation might not nsubj always be a verb: when the verb is a copula. كانت السماء ملبدة بالغيوم. (يسماء x,كانت)This includes also relative pronouns nsubj introducing rcmod. السيارة معطلة (يسيارة x,معطلة)nsubj فاعل الجملة الفعلية ومبتدأ الجملة اليسمية واليسم الموصول الذي يحل محل الفاعل. الوضع الذي تفاقم (الذي x,تفاقم)It also covers the subject of a verbal noun nsubj (VBG). وضعه صديقه في مأزق (ه x,وضع)nsubj ايستقبل الرئيس في المطار ايستقبال باهرا. nsubjpass A passive nominal subject is a noun (رئيس x,ايستقبل)phrase which is the syntactic nsubjpass subject of a passive clause. وضع القانون لحماية الحريات. (قانون x,وضع)nsubjpass اكشترى أربعة كتب. num A numeric modifier of a noun is any number (أربعة x,كتب)phrase that serves to modify the meaning of num the noun with a quantity. في الفصل ثلثون طالبا. Note that numbers in proper names are also (ثلثون x,طالب)annotated as num, according to the German num and English analysis. This applies in Arabic whether the number is ثلثة رجابل as in مضاف إليه and the noun is مضاف .ثلثون رجل such as تمييز or the noun is عدد يسكانها خمسة وثلثون مليون نسمة number An element of compound number is a part of

57 (ثلثون x,خمسة)a number phrase or currency amount. conj (مليون x,خمسة)We regard a number as a specialized kind of number multi-word expression. The head is always the first element. واو Many numbers have the conjunction “and” in their construction. The conjoined number will be labeled as conj ذهبت إلى السوق. p This is used for any piece of punctuation in a (. x,ذهبت)clause. Punctuations usually depend on the p head of sentence (root element). بعد أن فرغت من كشراء احتياجاتها، عادت إلى A punctuation mark preceding or following a المنزل. subordinated unit is attached to this unit. The (، x,فرغت)punctuation "frames" the subordinate p element. و في عام 1973، كطرحت الفكرة من جديد Similarly, commas with prepositional phrases (، x,في)will attach to the head of the prepositional p phrase. هؤلء ”الخبراء“ يتقاضون مبالغ خرافية. ,When punctuation marks (parentheses (” x,خبراء)quotes, hyphens, etc.) indicate a local p (“ x,خبراء)dependency, punctuation tag will be p dependent on this local head. In the case where the punctuation play the role of a coordinative conjunction, p() rel must be assigned to the local head. ردد مقولته الشهيره: ما نخاف على التحاد إل parataxis The parataxis relation (from Greek for “place من التحاد نفسه side by side”) is a relation between a word (نخاف x,ردد)often the main predicate of a sentence) and parataxis) other elements, such as a sentential يسأله أحد الصحفيين: هل حدث تقدم يذكر في ,”;“ parenthetical or a clause after a “:” or a المفاوضات؟ placed side by side without any explicit (حدث x,يسأل)coordination, subordination, or argument parataxis relation with the head word. Parataxis is a أصوات بعيدة تتردد "منصورة منصورة، ,discourse-like equivalent of coordination واحد دمنهور “ .and so usually obeys an iconic ordering (منصورة x,تتردد)Hence it is normal for the first part of a parataxis sentence to be the head and the second part to be the parataxis dependent, regardless of the headedness properties of the language. خلق مناخ جاذب لليستثمار partmod A participial modifier of an NP or VP or (جاذب x,مناخ)sentence is a participial verb form that serves partmod to modify the meaning of a noun phrase or المرأة المعتمدة على نفسها .sentence (معتمدة x,مرأة)partmod ايسم الفاعل وايسم ) Active and passive participles (موضع النعت) in modifying position (المفعول صواريخ موجهة ذاتيا when they have a verbal meaning followed (موجهة x,صواريخ)by an argument), i.e. one of these tests apply: partmod

58 1) When the active participle is يسقط مغشيا عليه (الرجل قائد السيارة) in idafa to the object (مغشيا x,يسقط)or the object is linked through the partmod دور الشرطة ) such as ل preposition دخل مبتسما or the passive participle ,(المحقق للمن (مبتسما x,دخل)followed by the subject with the partmod الزوجة المهجورة ) such as من preposition (من زوجها 2) Active or passive participle is followed by a closely related الطفل المعتمد على والديه، preposition -or a non الشخص المتأخر عن يسداد ديونه الموجه عن بعد argument preposition 3) When Active or passive participles are followed by an adverb الطاقة المولدة ذاتيا، الطفل المبتسم دوما 4) The tag also includes Haal حال ,adverbial adjuncts

أعاده القضاء بعد ما ألغاه الرئيس pcomp This is used when the complement of a (الغى x,بعد)preposition is a clause ( or finite pcomp clause) or prepositional phrase (or أكشار إلى أن بعض القوانين تخالف الديستور occasionally, an adverbial phrase). The (تخالف x,إلى)complement of a preposition is the head of a pcomp clause following the preposition, or the نحتاج لن نعيد المور إلى نصابها preposition head of the following PP. This (نعيد x,ل)happens when a preposition (or pcomp ما، أ نن، أ من prepositional) is followed by التنبيه بأنه ل يمكن السفر إلى بعض الدول (يمكن x,ب)pcomp

عاد دون أن يحقق ما يريد (يحقق x,دون)pcomp

كان راغبا في أن يعود (يعود x,راغب)pcomp عاد إلى المنزل pobj The object of a preposition is the head of a (منزل x,إلى)noun phrase following the preposition. pobj

تفوق على أقرانه This includes also relative pronouns (أقران x,على)introducing rcmod. pobj

صديقه الذي يسافر معه (الذي x,مع)pobj مرحتش postneg Postneg is used for the postverbal adverb of (ش x,رحت)Egyptian Arabic double negative. This tag postneg

59 only concerns the second negative particle ما قال لكشي حاجة؟ when we have a double negative adverb (ش x,قال)in postneg ”م/ما … ش/كشي“ construction such as colloquial Egyptian Arabic. إما نقاوم أو نستسلم. preconj A preconjunct is the relation between the (إما x,نقاوم)head of an VP or an NP and a word that preconj (أو x,نقاوم)appears at the beginning bracketing a cc conjunction (and puts emphasis on it, such as .("إما" بعض الكشخاص predet A predeterminer is the relation between the (بعض x,أكشخاص)head of an NP and a word that precedes and predet modifies the meaning of the NP determiner. جميع التجاهات (جميع x,اتجاهات)This applies in Arabic to demonstrative predet nouns and quantifiers. هذه الحقيقة (هذه x,حقيقة)predet

كل هذا العناء (كل x,عناء)predet (هذا x,عناء)predet يسافر إلى أيسوان ,prep A prepositional modifier of a verb, adjective (إلى x,يسافر)or noun is any prepositional phrase that prep serves to modify the meaning of the verb, أعجب بالمكان .adjective, noun, or even another preposition (ب x,أعجب)We define prepositional (or quasi- prep “ like (اليسماء الملزمة للضافة prepositions or يسار نحو الديكتاتورية .”etc. as instances of “prep ”أمام”, “فوق (نحو x,يسار)We don’t distinguish whether the preposition prep is CLR or not. يسيحاول prt This is reserved for the list of particles that (س x,يحاول)do not function as subordinating prt conjunctions, complementizers, negation or قد حدث السين ويسوف، أدوات اليستفهام: هل، أ؛ ما ) discourse (قد x,حدث)prt الزائدة؛ لم المر؛ أحرف النداء : يا، أيها، أيتها، أيا، أ، أي؛ قد، لقد، أما وإنما، وإل، ويسوى، وعدا، فاء الربط، ما هل يسافرت They include future .(التعجبية، ل النافية للجنس (هل x,يسافرت)as well as interrogative ( prt ,(س، يسوف) particles and ,(إ نن) affirmative ,(إل، عدا) exceptive ,(هل، أ .(ما) exclamatory particles Only vocative and exceptive particles attach have affirmative إنما and أما to nouns, but and should attach to the إن scope similar to predicate. الكتاب الذي أعرته لي كان رائعا. rcmod A modifier of an NP is a (أعرت x,كتاب)relative clause modifying the NP. This is a rcmod

60 link from a noun to the verb which heads a relative clause. أحرز الزمالك هدفين والهلي ثلثة أهداف remnant The remnant relation is used to provide a satisfactory treatment of ellipsis. This Pierre lit un livre et Paul le journal. (الهلي x,الزمالك)relation is intended to capture syntactic remnant (أهداف x,هدفين)structure in elliptical constructions with a remnant missing head element. The "remnant" relation links dependents without an explicit head in an elliptical construction to dependents with an explicit head.

Note in particular that (unlike for conj), remnant uses a chaining analysis where each subsequent remnant depends on the immediately preceding remnant/correlate. اتجه يمينا … كشمال reparandum We use reparandum to indicate disfluencies (يمينا x,كشمال)overridden in a speech repair. The disfluency reparandum is the dependent of the repair. الملك حسن … حسين (حسن x,حسين)reparandum اجتمع وزراء الخارجية لمناقشة الزمة. root The root grammatical relation points to the (اجتمع ,root of the sentence. A fake node "ROOT" is ROOT(X used as the governor. الوضع لن يتغير كثيرا (يتغير ,ROOT(X

كشكرا جزيل (كشكرا ,ROOT(X

الحالة مستقرة (مستقرة ,ROOT(X

مع السلمة! (مع ,ROOT(X ذهبنا أمس للسينما tmod A temporal modifier (of a VP, NP, or an (أمس x,ذهب)ADJP) is a bare noun phrase constituent or tmod اليسبوع “ and ”أمس”, “اليوم“ adverbials such as يفتح اليسبوع القادم that serves to modify the meaning ”القادم/المقبل (أيسبوع x,يفتح)of the constituent by specifying a time. tmod “tmod” captures temporal points and ايستمر ثلثة أيام duration; it does not capture repetition ('two (أيام x,ايستمر)times', which would be an 'npadvmod'). tmod

ماذا تقول يا محمد؟ vocative The vocative relation is used to mark (محمد x,تقول)dialogue participant addressed in text vocative (common in emails and newsgroup postings).

61 The relation links the addressee’s name to its أحرف host sentence. The usually occur after النداء: يا، أيها، أيتها، أيا، أ، أي يريد أن يستقيل xcomp An open clausal complement of a VP or an (يستقيل x,يريد)ADJP is a clausal complement without its xcomp own subject, whose reference is determined by an external subject. The name xcomp is borrowed from Lexical Functional Grammar.

5.2 Dependency Labels 5.2.1 Root The root grammatical relation points to the root of the sentence. A fake node "ROOT" is used as the governor: .اجتمع وزراء الخارجية لمناقشة الزمة (اجتمع ,ROOT(X

الوضع لن يتغير كثيرا (يتغير ,ROOT(X

A special class of cases is presented by adjectival and nominal roots that result from copula omission in present tense. When the copula is omitted, the copula complement (nominal or adjectival) should be annotated as ROOT. الحالة مستقرة (مستقرة ,ROOT(X

However, when the copula is overtly present on surface, it should be annotated as ROOT. كانت الحالة مستقرة (كانت ,ROOT(X

Note that comparative degree adjectives can be ROOTs just as positive degree adjectives. الوضع أصعب مما تصورنا (أصعب ,ROOT(X

There is also a possibility for other parts-of-speech to be a ROOT: الكتاب هناك (هناك ,ROOT(X

62 الكتاب على الطاولة (على ,ROOT(X كشكرا جزيل (كشكرا ,ROOT(X !مع السلمة (مع ,ROOT(X

5.2.2 Auxiliary ● auxiliary: aux كان An auxiliary of a clause is considered as a non-main verb of the clause: this is reserved to aspectual .that is when they are followed by another verb ,وأخواتها

كان الرجل يؤدي ما عليه (كان x,يؤدي)aux

كان قد نسي كل ما حدث (كان x,نسي)aux

ليس يساعد أحدا (ليس x,يساعد)aux

5.2.3 Arguments 5.2.3.1 Subjects ● Phrasal ○ nominal subject: nsubj (.فاعل الجملة الفعلية ومبتدأ الجملة اليسمية واليسم الموصول الذي يحل محل الفاعل)

A nominal subject is a noun phrase which is the syntactic subject of a clause. . طمأنت إدارة الشركة (إدارة x,طمأنت)nsubj

الوضع يسير نحو اليستقرار (وضع x,يسير)nsubj

.كانت السماء ملبدة بالغيوم (يسماء x,كانت)nsubj

The governor of this relation might not always be a verb: when the verb is a non-existing copula which can be ,(الخبر the root of the clause is the complement (or predicate ,(جملة ايسمية verbless sentence) an adjective, noun, adverb or preposition. السيارة معطلة (يسيارة x,معطلة)nsubj

63 محمد طبيب (محمد x,طبيب)nsubj الرجل هناك (رجل x,هناك)nsubj الولد في الحديقة (ولد x,في)nsubj

This includes also relative pronouns introducing rcmod. الوضع الذي تفاقم (الذي x,تفاقم)nsubj

It also covers the subject of a verbal noun (VBG). وضعه صديقه في مأزق (ه x,وضع)nsubj

○ passive nominal subject: nsubjpass A passive nominal subject is a noun phrase which is the syntactic subject of a passive clause. .ايسكتقلبل الرئيس في المطار ايستقبال باهرا (رئيس x,ايستقبل)nsubjpass

.كولضع القانون لحماية الحريات (قانون x,وضع)nsubjpass

● Clausal ○ clausal subject: csubj

الفاعل جملة .A clausal subject is a clausal syntactic subject of a clause, i.e., the subject is itself a clause .مسبوقة بأن المصدرية يسرني أن أكون نافعا (أكون x,يسر)csubj

يزعجني أن تتدهور المور بهذا الشكل (تتدهور x,يزعج)csubj

The governor of this relation might not always be a verb: when it is a verbless copula construction, the .(الخبر root of the clause is the complement (or predicate من الصعب أن تصبر أمام التحديات (تصبر x,من)csubj

○ passive clausal subject: csubjpass نائب الفاعل جملة مسبوقة بأن .A clausal passive subject is a clausal syntactic subject of a passive clause .المصدرية يستحسن أن تستأذنه أول (تستأذن x,يستحسن)csubjpass

64 يفضل أن يبدأ الطفل في الكتابة مبكرا (يبدأ x,يفضل)csubjpass

5.2.3.2 Complements ● Phrasal ○ direct object: dobj The direct object of a VP is the noun phrase which is the (accusative) object of the verb. قرأ الطالب الدرس (درس x,قرأ)dobj

كشكره (ه x,كشكر)dobj

This includes also relative pronouns introducing rcmod.

الضيف الذي ايستقبلته (الذي x,ايستقبل)dobj

It also covers the object of a verbal noun (VBG). انتظاره صدور الحكم (صدور x,انتظار)dobj

The object argument of the VBN’s also take dobj. منتظرا صدور الحكم

(صدور x,منتظرا)dobj

○ indirect object: iobj The indirect object of a VP is the noun phrase which is the (dative) object of the verb. The indirect It will be noted that indirect objects .ل object is the one that can be moved after the preposition introduced by a preposition will respect the prep+pobj construction (cf. pobj relation examples). أعطى محمدا كتابا (محمدا x,أعطى)iobj

○ object of a preposition: pobj The object of a preposition is the head of a noun phrase following the preposition. عاد إلى المنزل (منزل x,إلى)pobj

تفوق على أقرانه (أقران x,على)pobj

65 ○ adjectival complement: acomp

An adjectival complement of a verb is an adjectival phrase which functions as the complement. This كان وأخواتها: كان، وأمسى، وأصب لح،ليس، وأضحى، و لظنل ،) relation specifically includes “be” copula constructions .(الخبر الوصفي) with adjective predicatives (وبالت، وصار، وليس، وما زال، وما انلف نك، وما لفلتيلء، وما لبلر لح، وما دام كان زيد مريضا (مريضا x,كان)acomp

ليس زيد مريضا (مريضا x,ليس)acomp

أصبح زيد مريضا (مريضا x,أصبح)acomp

بدا يسعيدا (يسعيدا x,بدا)acomp

ظن وأخواتها: ظن وحسب وخال وزعم ورأى وعلم ووجد واتخذ، ويسمع It also includes verbs of uncertainty

ظننته مخلصا (مخلصا x,ظننت)acomp

○ attributive: attr .كان وأخواتها An attr dependent is a nominal phrase headed by a copular verb such as كان محمد طبيبا بارعا (طبيبا x,كان)attr

ليس محمد طبيبا (طبيبا x,ليس)attr

Note that attr is different from acomp in that the dependent is a noun phrase, not an adjective. Sometimes it is not clear what should be the subject and what the attribute. In such cases, we should .a.k.a. topic-comment or theme-rheme) structure) المبتدأ والخبر follow the صار محمد طبيبا (طبيبا x,صار)attr صار محمد كريما (كريما x,صار)acomp

Note that in questions the wh-pronoun or the noun in the wh-phrase is in attr relation to the ROOT. من كان مدريسك؟ (مدرس x,كان)attr

66 (أفعال التحويل) Verbs of Transforming Verbs of transformation are ditransitive verbs that take subjects and predicates as its two objects أفعال ) They are of three categories: verbs of knowing . الفعال التي تنصب مفعولين أصلهما مبتدأ وخبر arguments and verbs of ,ظن، زعم، حسب such as (أفعال الرجحان) verbs of thinking ,علم، وجد، رأى such as ,(اليقين جعل، صير، اتخذ such as (أفعال التحويل) transforming Unlike regular diatransitive verbs, the second object of the verbs of transformation should be labeled as attr instead of iobj. This is because of its preicational function.

ظننته طبيبا (طبيبا x,ظننت)attr ظننته كريما (كريما x,ظننت)acomp إتخذه صديقا (صديقا x,إتخذ)attr

might not be listed as a verb of transformation in توج This verb category is not a closed list. Verbs like Arabic grammar references. Yet, It can still be functioning like a verb of transformation: توجوه ملكا (ملكا x,توجوا)attr

إنتخبوأ أوباما رئيسا (رئيسا x,إنتخبوا)attr

To distinguish the attr second object from the iobj one, apply the following test: separate the two objects from the sentence. If they form a subject-predicate sentence, the predicate will be the attr:

Full Sentence Separated Objects Subject Predicate? attr or iobj yes attr هو صديق إتخذه صديقا yes attr أوباما رئيس إنتخبوأ أوباما رئيسا no iobj صديقه هدية أعطى الولد صديقه هدياة

● Clausal ○ finite clausal complement: ccomp

A clausal complement of a verb or adjective is a dependent clause with an internal subject which functions like an object of the verb, or adjective. This is usually introduced in Arabic by the .introduces this kind of sentences when the subject is present أ نن Sometimes .أ نن complementizer

67 أيقن أن الوضع لن يتغير (يتغير x,أيقنت)ccomp

يريد أن يحصل كل إنسان على حقه (يحصل x,يريد)ccomp

We analyze them the .”التصريح أ من“ or ”حقيقة أ من“ Clausal complements for nouns are limited to nouns like same (parallel to the analysis of this class as “content clauses” in Huddleston and Pullum 2002).

أنا على يقين أن المشروع يسيحقق نجاحا كبيرا (يحقق x,يقين)ccomp

كان متأكدا أن الحقيقة يستظهر (تظهر x,متأكدا)ccomp

أوضح أن على المواطن كشراء وحدات يسكنية (على x,أوضح)ccomp

○ non-finite clausal complement : xcomp

An open clausal complement of a VP or an ADJP is a clausal complement without its own subject, whose reference is determined by an external subject. The name xcomp is borrowed from Lexical - Functional Grammar. يريد أن يستقيل (يستقيل x,يريد)xcomp

Notice that in the sentences above, the subject of the xcomp is the same as the subject of its parent verb. Sometimes the subject of the xcomp is the direct object of the parent verb: يريدهم أن يعودوا

(يعودوا x,يريد)xcomp

The two tokens will be ل when it occurs with the negative particle أن Attention should be paid to and the following verb will be أن annotated similarly to ,ل Should split from the أ The . أل merged as treated also the same (ccomp/xcomp and subjunctive) was preceded by a prep the pcomp overrides أن Also, since every prep requires an argument, when the the xcomp:

كان راغبا في أن يعود (يعود x,راغبا)pcomp

The following needs consideration??

68 are control verbs that indicate verbal complement even if the أراد and حاول, ايستطاع, تمكن The verbs :ال masdar is attached with the definite article

حاول التدخل في المر .1 أراد التوجه إلى البيت .2 ايستطاع الخروج في الوقت المنايسب .3 تمكن من تعويض خسائره .4 واصل تغطية الحداث .5 مواصلة تغطية الحداث .6 رغب في توضيح وجهة نظره .7 الرغبة في الرحيل .8 (exceptional case) الرغبة في عودة النظام القديم .9 حرص على التحدث .10 ايستعد للقفز في الماء .11 (control to object) دفعه للغاء المبارة .12 ايستمر في محاورة خصمه .13 and what about these cases:

انتهى من اختيار الفريق ● رفض توقيع العقد ● قام بتوزيع الجوائز ● قيامه بتوزيع الجوائز ● يهدف إلى زيادة الوعي ● يجب توفير الخدمات ●

○ prepositional complement: pcomp

This is used when the complement of a preposition is a clause (infinitive or finite clause) or prepositional phrase (or occasionally, an adverbial phrase). The complement of a preposition is the head of a clause following the preposition, or the preposition head of the following PP. This happens ما، أ نن، أ من when a preposition (or prepositional) is followed by

أكشار إلى أن بعض القوانين تخالف الديستور (تخالف x,إلى)pcomp

نحتاج لن نعيد المور إلى نصابها (نعيد x,ل)pcomp

التنبيه بأنه ل يمكن السفر إلى بعض الدول (يمكن x,ب)pcomp

69 عاد دون أن يحقق ما يريد (يحقق x,دون)pcomp

: ما المصدرية the pcomp is applicable only if it was ,ما Note that with أعاده القضاء بعد ما ألغاه الرئيس (الغى x,بعد)pcomp

:is treated differently ما The relative pronoun لم يعلق على ما حدث في ليبيا (ما x,على)pobj (حدث x,ما)rcmod

5.2.4 Modifiers ● Phrasal ○ determiner: det A determiner is the relation between the head of an NP and its determiner. In Arabic this is only the .ال definite article عاد الرئيس (ال x,رئيس)det

دارت السيارة (ال x,يسيارة)det

○ predeterminer: predet A predeterminer is the relation between the head of an NP and a word that precedes and modifies the meaning of the NP determiner. This applies in Arabic to demonstrative nouns and quantifiers. بعض الكشخاص (بعض x,أكشخاص)predet

جميع التجاهات (جميع x,اتجاهات)predet

هذه الحقيقة (هذه x,حقيقة)predet

كل هذا العناء (كل x,عناء)predet (هذا x,عناء)predet ■ Nominalized predet’s. Some predet words function as nouns. Below are some examples: .some is widely used in Arabic texts / بعض ● some people / بعض الكشخاص In most cases, it is a predet as in the example above. However, as mentioned in the POS and Morphology sections,

70 Some have attended. In this case, it /البعض حضر can be nominal as in بعض is labeled as an nsubj. Moreover, it can appear in reciprocal expressions Here are the most common uses of these expressions .بعضهم البعض like and their dependency labeling:

is a بعض his is clearly subject object situation, where the first يحب بعضهم بعضا In - predet are different from the classical usage and they بعضهم بعضا وبعضهم البعض In MSA - are influenced by the translation of "each other". There is no traditional grammatical parsing to this new construction. Examples: يحب الولد بعضهم بعضا 1.11 يتشاجرالولد مع بعضهم البعض .2 (looks ungrammatical but common) مشكلت الطلب مع بعضهم بعضا .3 and second الولد pdt and the pronoun as appos to بعض In (1) we can have first - .as object بعض as pdt and the pronoun as the pobj and second بعض In (2) we can have the first - .as appos to the pronoun بعض as an بعض In (3) it can be treated as (2) considering that the case of the second - .هم intentional error. So it will have case=acc and it will be appos of

one (of) is another predet if it أحد/إحدى ● .one of the students / أحد الطلب specifies a quantity meaning one of as in no one /ل أحد في البيت On the other hand, if it means someone or one as in at home. Here it is labeled as an nsubj

○ adjectival modifier: amod that serves to modify the meaning of the (النعت) An adjectival modifier of an NP is any adjectival phrase NP. اكشترى يسيارة جديدة (جديدة x,يسيارة)amod

أمرضه الحزن المفرط (مفرط x,حزن)amod The amod is basically for adjectives. However, if these adjectives were nominals, they’d be labeled based on their function in the context. This is also applicable on the adjectives heading false idafa:

تحمل أهم الذكريات (أهم x,تحمل)dobj (ذكريات x,أهم)gmod

is present أولد This is different from the first example as the subject 11

71 ○ noun compound modifier: gmod

The genitive modifier relation applies to cases in which there is a genitive attribute modifying an NP. الضافة طالب العلم (علم x,طالب)gmod

مدرس الجغرافيا (جغرافيا x,مدرس)gmod However, sometimes tokens other than nouns مضاف اليه Note that gmod is usually a nominal like the to suit is a verb but it is the head of the/يليق ”from the novel “The Black Suits you / '' من رواية '' اليسود يليق بك :for example second part of an annexation i.e. in a position of a gmod. Thus, it is labeled as gmod ○ noun compound modifier: nn A noun compound modifier of an NP is a noun that serves to modify the head noun. In Arabic, this name is used for the relation between parts of people's names, i.e. first, middle and last names. Note that the hierarchy of the phrasal heads would be the following: first name (as it is the case bearer) middle name last name This means that the first name is the parent node of the second name, and the second name is the parent node of the last name. باراك أوباما (باراك x,أوباما)nn

محمد حسني مبارك (حسني x,محمد)nn (مبارك x,حسني)nn

If the first name was a compound noun, the next (middle or last) name will be attached to its rightmost token: عبد الفتاح السيسي (فتاح x,عبد)nn (يسيسي x,عبد)nn

:”(Alm’tasim billah (The Protected by God“ المعتصم بال .Some name include a preposition e.g NNP ال IN ب NNP معتصم DET l ال Function words like prepositions and determiners are not labeled as nn. Rather, they are prep and det respectively. Prepositions, on the other hand, always require an argument. Therefore, their arguments within the names will be pobj instead of nn: pobj ال prep ب nn12 معتصم det ال

The nn label is also used for all MWE proper nouns that are tagged in the POS as (NNP NNP), such as

12 Please note that if this is the first name, the label is usually not nn.

72 .The first element will be the head .بوركينا فايسو، جينرال موتورز بوركينا فايسو (فايسو x,بوركينا)nn

أراب أيدول (أيدول x,أراب)nn

لوي فيتون (فيتون x,لوي)nn

فولكس فاجن (فاجن x,فولكس)nn

This tag is also used for all MWE Arabized nouns that do not fit the idafa pattern (the second part is not The first element .توك كشو، دي في دي، يسي دي definite) that are tagged in the POS as (NN NN) , such as will be the head in a flat structure. توك كشو (كشو x,توك)nn

○ ‘goes with’ element: goeswith This relation links two parts of a word that are separate in the text that is not well edited. The head is in some sense the “main” part, often the first part. أوا ئل الثانوية (ئل x,أوا)goeswith

○ multi-word expression modifier: mwe The multi-word expression (modifier) relation is one of the three relations (alongside gmod and nn) for compounding. It is used for certain fixed grammaticized expressions with function words that behave like a single word. It is used for a closed set of dependencies between words in common multi-word expressions for which it seems difficult or unclear to assign any other relationships. This relation concerns grammatical idioms. Multiword expressions are annotated in a flat, head-last structure, in which all words in the expression modify the last word using the mwe label. The leftmost (last) word takes the label based on its function. .غير أني كنت يسأبقى (غير x,أن)mwe

.دخل المستشفى حيث أنه أصيب (أن x,حيث)mwe

بالنسبة للوضع هناك (ل prep(x,x (ب x,ل)mwe (ال x,ل)mwe (نسبة x,ل)mwe

.مازال في البيت

73 (ما x,زال)mwe

○ appositional modifier: appos of an NP is an NP immediately following the first NP that serves to (البدل) An appositional modifier define or modify that NP. It includes defining abbreviations in one of these structures as well as parenthesized examples. In these cases the second constituent modifies the first. اتجه علء اليسواني، مؤلف عمارة يعقوبيان، إلى النشاط السيايسي (مؤلف x,اليسواني)appos

يعيش صديقي حسن في لندن (حسن x,صديق)appos

حضر الجتماع وزير الثقافة اليسبق فاروق حسني (فاروق x,وزير)appos

Sometimes an NP can be modified by more than one appos, in this case all the appos’s are dependent on the first NP: ...قال المهندس كشريف ايسماعيل وزير البترول (كشريف x,المهندس)appos (وزير x,المهندس)appos

Apposition relations do not hold only among NPs. Parenthetical noun phrases will also be annotated as appositions. ينحدر مجدي يعقوب ( أكشهر أطباء القلب في العالم) من قرية بلبيس في الشرقية (أكشهر x,يعقوب)appos

نفس، عين، كل، :This includes one of the six words that modify an NP .التوكيد المعنوي This also includes جميع، كل، كلتا حضر الناظر نفسه (نفس x,ناظر)appos

Similarly, post-nominal demonstrative pronouns are also appos: حضر الناظر هذا (هذا x,ناظر)appos

If the appos was a clause, its head will take the appos label العضوة زوجاته قدوتي هي صاحبة المشاركة (قدوة x,عضوة)appos even if it was not a noun:

○ adverbial modifier: advmod that serves to (الظروف) An adverbial modifier of a word is a (non-clausal) adverb or adverbial phrase modify the meaning of the word.

74 رأيت زميلي هناك (هناك x,رأيت)advmod

منذ عام تقريبا (تقريبا x,عام)advmod

جميل جدا (جدا x,جميل)advmod

يستعمل يسيارته كثيرا (كثيرا x,يستعمل)advmod

انتشر محليا ودوليا (محليا x,انتشر)advmod

This includes also quantifiers and expressions modifying a number (num). This can come before or after the number. حوالي 30 رجل (حوالي advmod(30,x رجل فقط 30 (فقط advmod(30,x رجل على الكثر 30 (على x,أكثر)mwe (ال x,أكثر)mwe (أكثر advmod(30,x

Note the difference in annotating the following expressions: رأى ما يقرب من 30 رجل (ما x,رأى)dobj (يقرب x,ما)rcmod (من x,يقرب)prep (رجل x,من)pobj (x 30,رجل)num رأى في حدود 30 رجل (في x,رأى)prep (حدود x,في)pobj (رجل x,حدود)gmod (x 30,رجل)num

رأى أقل من 30 رجل (أقل x,رأى)dobj (من x,أقل)prep (رجل x,من)pobj (x 30,رجل)num رأى أكثر من 30 رجل

75 (أكثر x,رأى)dobj (من x,أكثر)prep (رجل x,من)pobj (x 30,رجل)num

○ noun phrase adverbial modifier: npadvmod This relation captures various places where something, syntactically a noun phrase (NP), is used as an adverbial modifier in a sentence. These usages include: المفعول المطلق i) Mafoul mutlaq) نجح نجاحا باهرا (نجاحا x,نجح)npadvmod

(تمييز العدد) not including tamyeez of numbers التمييز ii) Tamyeez) زرعنا الرض ذراة (ذرة x,زرعنا)npadvmod

هو أحسن منه حال (حال x,أحسن)npadvmod

جاء وحده

(وحد x,جاء)npadvmod

In the examples above, the npadvmod is attached to the head of its clause. However, if it was modifying a noun, it would be attached to it as its child: إذا ذكر ال وحده

(وحد x,ال)npadvmod زرته مرتين (مرتين x,زرت)npadvmod .it would be an advmod ,مرة ,is an npadvmod while if it was singular مرتين ,Note that in the last example

○ temporal modifier: tmod A temporal modifier (of a VP, NP, or an ADJP) is a bare noun phrase constituent or adverbials such as “ that serves to modify the meaning of the constituent by specifying a ”اليسبوع القادم/المقبل“ and ”أمس”, “اليوم time. “tmod” captures temporal points and duration; it does not capture repetition ('two times', which would be an 'npadvmod'). ذهبنا أمس للسينما (أمس x,ذهب)tmod

يفتح اليسبوع القادم (أيسبوع x,يفتح)tmod

76 ايستمر ثلثة أيام (ثلثة x,ايستمر)tmod

○ numeric modifier: num A numeric modifier of a noun is any number phrase that serves to modify the meaning of the noun with a quantity. Note that numbers in proper names are also annotated as num, according to the German and English analysis. or the noun ثلثة رجابل as in مضاف إليه and the noun is مضاف This applies in Arabic whether the number is .ثلثون رجل such as تمييز is .اكشترى أربعة كتب (أربعة x,كتب)num

.في الفصل ثلثون طالبا (ثلثون x,طالب)num

○ element of compound number: number An element of compound number is a part of a number phrase or currency amount. We regard a number as a specialized kind of multi-word expression. The head is always the first element. عدد يسكانها خمسة وثلثون مليون نسمة ( ثلثون x,خمسة)conj (مليون x,خمسة)number

○ negation modifier: neg The negation modifier is the relation between a negation word and the word it modifies. .لم يحضر أحد (لم x,يحضر)neg

.ل يرد العودة (ل x,يريد)neg

○ postverbal negation modifier: postneg Postneg is used for the postverbal adverb of Egyptian Arabic double negative. This tag only concerns م/ما … “ the second negative particle when we have a double negative adverb construction such as .in colloquial Egyptian Arabic ”ش/كشي مرحتش (ش x,رحت)postneg

ما قال لكشي حاجة؟ (ش x,قال)postneg

○ prepositional modifier: prep A prepositional modifier of a verb, adjective, or noun is any prepositional phrase that serves to modify

77 the meaning of the verb, adjective, noun, or even another preposition. etc. as ”أمام”, “فوق“ like (اليسماء الملزمة للضافة We define prepositional (or quasi-prepositions or instances of “prep”. We don’t distinguish whether the preposition is CLR or not. يسافر إلى أيسوان (إلى x,يسافر)prep

أعجب بالمكان (ب x,أعجب)prep

يسار نحو الديكتاتورية (نحو x,يسار)prep

○ marker: mark A marker is the word introducing a finite clause subordinate to another clause. For a complement For an adverbial clause, the marker is typically a subordinating .أ نن وأ نن clause, this will typically be The .إذا، إ نن، لو، حتى، طالما، حالما، بينما، عندما, وأخوات إن (أ نن، ليت، لعل، عل، كأن، لكن) وعسى، إلخ conjunction like mark is a dependent of the subordinate clause head. أيقن أن الوضع لن يتغير (أن x,يتغير)mark

يريد أن يسافر (أن x,يحصل)mark

يسيأتي عندما يحين الوقت (عندما x,يحين)mark

يستعاقب إذا أخطأت (إذا x,أخطأت)mark

يسيسود السلم حالما يعم التفاهم (حالما x,يعم)mark

طالما ل توجد خطة، يستستمر الفوضى

(طالما x,توجد)mark

حتى لو Some MWE subordinating conjunctions are لن يستطيع حتى لو أراد (لو x,أراد)mark (حتى x,لو)mwe

A marker is also the word introducing a ccomp, csubj and pcomp. It corresponds to words tagged as IN .(”إذا“ and ”أن“ mostly the words) أيقن أن الوضع يسيتحسن

78 (أن x,يتحسن)mark يسرني أن أيساعدك (أيساعد x,يسر)csubj

● Clausal ○ adverbial clause modifier: advcl An adverbial clause modifier of a verb or a clause is a clause modifying the verb (temporal clause, consequence, conditional clause, purpose clause, etc.). الحال الجملة Adverbial clauses are either introduced by a marker or include a tensed verb, as in the case of ل تضارب في البورصة حتى ل تخسر (تخسر x,تضارب)advcl

عاد من عمله يعاني من الرهاق (يعاني x,عاد)advcl

أحست بالظلم ينخر عظامها (ينخر x,ظلم)advcl Note that in the last example the advcl is a child of the noun it adverbially modifies rather than the verb المفعول لجله It also includes Mafoul li’ajlih عمل باجتهاد حرصا على مسقبل أولده (حرصا x,عمل)advcl

.الجمل المعترضة It also covers parenthetical clauses محمد (صلى ال عليه ويسلم) (صلى x,محمد)advcl إن الشبان موهوبون وهم كشقيقان وصديق لهما (كشقيقان x,موهوبون)advcl زار بعض الدول منها بريطانيا والسويد (من x,زار)advcl in the sentence changed its label from prep to advcl من Note that in the last example, the function of While the head of the predicate takes the advcl, in some adverbial clauses, the predicate is omitted. : لول starting with جملة الشرط Therefore, the subject takes the advcl. This mostly occurs with لول جاهير النادي لما تحقق الفوز (جماهير x,تحقق)advcl

المفعول المطلق العامل It also include cognate accusative heading an argument

تضاعف مستخدمو النترنت وفقا للتقارير الريسمية (وفقا x,تضاعف)advcl

○ particle modifier: prt This is reserved for the list of particles that do not function as subordinating conjunctions, السين ويسوف، أدوات اليستفهام: هل، أ؛ ما الزائدة؛ لم المر؛ أحرف النداء : يا، ) complementizers, negation or discourse They include future .(أيها، أيتها، أيا، أ، أي؛ قد، لقد، أما وإنما، وإل، ويسوى، وعدا، فاء الربط، ما التعجبية، ل النافية للجنس and exclamatory ,(إ نن) affirmative ,(إل، عدا) exceptive ,(هل، أ) as well as interrogative ,(س، يسوف) particles

79 .(ما) particles يسيحاول (س x,يحاول)prt

قد حدث (قد x,حدث)prt

هل يسافرت (هل x,يسافرت)prt have affirmative scope similar to إنما and أما Only vocative and exceptive particles attach to nouns, but .and should attach to the predicate إن

○ relative clause modifier: rcmod A relative clause modifier of an NP is a relative clause modifying the NP. This is a link from a noun to the verb which heads a relative clause. الضيف الذي غادر يسريعا (غادر x,ضيف)rcmod

Relative pronouns are attached to the rcmod according to their function: الضيف الذي غادر يسريعا (الذي x,غادر)nsubj

The rcmod label is for the head of the relative clause. Attention should be paid when the nouns modified by clauses are indefinite since there will be no explicit relative pronoun. In the previous two examples, the modified nouns are definite. Otherwise, there would be no relative pronoun:

ضيف غادر يسريعا (غادر x,ضيف)rcmod Or compare these two examples: ترك العمال التي ل تنسى (تنسى x,أعمال)rcmod

ترك أعما ال لتنسى

(تنسى x,أعمال)rcmod

○ participial modifier: partmod A participial modifier of an NP or VP or sentence is a participial verb form that serves to modify the meaning of a noun phrase or sentence. خلق مناخ جاذب لليستثمار (جاذب x,مناخ)partmod

المرأة المعتمدة على نفسها (معتمدة x,مرأة)partmod

80 صواريخ موجهة ذاتيا (موجهة x,صواريخ)partmod

when they have (موضع النعت) in modifying position (ايسم الفاعل وايسم المفعول) Active and passive participles a verbal meaning, i.e. one of these tests apply: or the object is linked (الرجل قائد السيارة) When the active participle is in idafa to the object (1 or the passive participle followed by ,(دور الشرطة المحقق للمن) such as ل through the preposition (الزوجة المهجورة من زوجها) such as من the subject with the preposition الطفل المعتمد على Active or passive participle is followed by a closely related preposition (2 الموجه عن بعد or a non-argument preposition والديه، الشخص المتأخر عن يسداد ديونه الطاقة المولدة ذاتيا، الطفل المبتسم When Active or passive participles are followed by an adverb (3 دوما Haal حال ,The tag also includes adverbial adjuncts (5

يسقط مغشيا عليه (مغشيا x,يسقط)partmod

دخل مبتسما (مبتسما x,دخل)partmod

5.2.5 Coordinations / juxtapositions

5.2.5.1 Coordination ● coordination: cc A coordination is the relation between an element of a conjunct and the coordinating conjunction. We take one conjunct of a conjunction (normally the first) as the head of the conjunction.) Words that can و، ف، ثم، أو، أم، بل، حتى، لكن، ل :receive that tag are يحب الناس ويساعدهم (و x,يحب)cc

واو Labeling at the beginning of the sentence is prt واو ● in the middle of the paragraph (between two sentences) is واو ● ○ cc by default, ○ considered prt only when followed by a subordinating conjunction. It will ولو، وإ نن، .be daughter of the subordinating conjunction (which is labelled mark), e.g ,وطالما، ولكن، ولعل، إلخ ○ If waw comes between two subordinating conjunctions, the waw is still :أن وأن، لعل ولعل، إلخ .cc, e.g ...طالب حسين بأن تتحول البنوك الزراعية إلى بنوك تسليف فلحى وأن تحصل فائدة ل تزيد عن

81 ● conjunct: conj A conjunct is the relation between two elements (any phrase type) connected by a coordinating We treat conjunctions asymmetrically: The head of the relation ."و، ف، ثم، إلخ" conjunction, cc, such as is the first conjunct and other conjunctions depend on it via the conj relation. Implied coordination .(هي لطيفة، مهذبة وكريمة) with no conjunctions) are treated the same)

.هو صاحب الشركة ومديرها (مدير x,صاحب)conj

هي لطيفة ومهذبة وكريمة (مهذبة x,لطيفة)conj (كريمة x,لطيفة)conj

● preconjunct: preconj A preconjunct is the relation between the head of an VP or an NP and a word that appears at the .("إما" beginning bracketing a conjunction (and puts emphasis on it, such as .إما نقاوم أو نستسلم (إما x,نقاوم)preconj (أو x,نقاوم)cc

5.2.5.2 Juxtaposition ● parataxis The parataxis relation (from Greek for “place side by side”) is a relation between a word (often the main predicate of a sentence) and other elements, such as a sentential parenthetical or a clause after a “:” or a “;”, placed side by side without any explicit coordination, subordination, or argument relation with the head word. Parataxis is a discourse-like equivalent of coordination, and so usually obeys an iconic ordering. Hence it is normal for the first part of a sentence to be the head and the second part to be the parataxis dependent, regardless of the headedness properties of the language.

ردد مقولته الشهيره: ما نخاف على التحاد إل من التحاد نفسه (نخاف x,ردد)parataxis

يسأله أحد الصحفيين: هل حدث تقدم يذكر في المفاوضات؟ (حدث x,يسأل)parataxis

5.2.6 Miscellaneous ● pleonastic pronoun : expl .The main verb of the clause is the governor .ضمير الشأن This relation captures زعمت أنه ل يمكن تحقيق أرباح (ه x,يمكن)expl

● remnant in ellipsis: remnant The remnant relation is used to provide a satisfactory treatment of ellipsis. This relation is intended to capture syntactic structure in elliptical constructions with a missing head element. The "remnant" relation links dependents without an explicit head in an elliptical construction to dependents with an

82 explicit head. Note in particular that (unlike for conj), remnant uses a chaining analysis where each subsequent remnant depends on the immediately preceding remnant/correlate.

أحرز الزمالك هدفين والهلي ثلثة أهداف

(الهلي x,الزمالك)remnant (أهداف x,هدفين)remnant

ل يمكن تمييز الصخور الطبيعية من الصطناعية (الصطناعية x,الطبيعية)remnant

Note that even if crossing dependencies must be avoided, ‘remnant’ (like ‘reparandum’ and ‘dislocated’) is a rare case where the phenomenon occurs.

● dislocated elements: dislocated The dislocated relation is used for fronted (topicalized) or postposed elements that do not fulfill the usual core grammatical relations of a sentence. The dislocated element attaches to the head of the clause to which it belongs.

This happens in complex sentences nominal sentences when the predicate is a complete sentence that الخبر جملة بها ضمير يعود على المبتدأ .contain a pronoun referring back to the subject

الطفل غلبه النعاس (طفل x,غلب)dislocated

السيارة لونها غريب (يسيارة x,غريب)dislocated

الكاتب نشرت الجريدة قصة حياته (كاتب x,نشرت)dislocated

أين وضعته، الكتاب (كتاب x,وضعت)dislocated

● overridden disfluency: reparandum We use reparandum to indicate disfluencies overridden in a speech repair. The disfluency is the dependent of the repair. اتجه يمينا … كشمال (يمينا x,كشمال)reparandum

الملك حسن … حسين (حسن x,حسين)reparandum

83 ● discourse element: discourse This is used for interjections and other discourse particles and elements (which are not clearly linked to the structure of the sentence, except in an expressive way). We generally follow the guidelines of what .(بلى، أجل، آه، كل، نعم، ياه) the Penn Treebanks count as an INTJ. This includes: interjections أهل، كيف حالك؟ (كيف x,أهل)discourse

آه ياني (ياني x,آه)discourse

Discourse also includes emoticons which we treat as compounds composed of punctuation rather than orthographic characters, the head should be the right-most character, with all other characters attached via discourse(). (-; لم أفهم ما قلت ((-; x,أفهم)discourse

● list: list The list relation is used for chains of comparable items. Web text often contains passages which are meant to be interpreted as lists but are parsed as single sentences. Email signatures in particular contain these structures, in the form of contact information: the different contact information items are labeled as list; the key-value pair relations are labeled as “appos”. In lists with more than two items, all items of the list should modify the first one. '[email protected] :كشركة الهدى، تليفون: 555-9814 إيميل (تليفون x,الهدى)list (إيميل x,الهدى)list (x 555-9814,تليفون)appos (x [email protected],إيميل)appos فيلم الجزيرة، إخراج ك كشريف عرفة، بطولة أحمد السقا (إخراج x,فيلم)list (بطولة x,فيلم)list (كشريف x,إخراج)gmod (أحمد x,بطولة)gmod ● vocative: vocative The vocative relation is used to mark dialogue participant addressed in text (common in emails and newsgroup postings). The relation links the addressee’s name to its host sentence. ماذا تقول يا محمد؟ (محمد x,تقول)vocative

● foreign: foreign We use “foreign” to label sequences of foreign words. These are given a linear analysis: the head is the first token in the foreign phrase. foreign does not apply to loanwords or to foreign names. It applies to quoted foreign text incorporated in a sentence/discourse of the host language (unless we want to and know how to annotate the internal structure according to the syntax of the foreign language).

84 أغنية أوند اش لوف (أوند x,أغنية)gmod (اش x,أوند)foreign (لوف x,أوند)foreign

set fire to the rain ترجمه (x set,ترجمة)gmod dobj(set, fire) prep(set, to) det(rain, the) pobj(set, rain)

● punctuation: p This is used for any piece of punctuation in a clause. Punctuations depend on the head of sentence (root element) or the head of the local phrase/clause. .ذهبت إلى السوق (. x,ذهبت)p

A punctuation mark preceding or following a subordinated unit is attached to this unit. The punctuation "frames" the subordinate element. .بعد أن فرغت من كشراء احتياجاتها، عادت إلى المنزل (، x,فرغت)p

Similarly, commas with prepositional phrases will attach to the head of the prepositional phrase.

و في عام 1973، كطرحت الفكرة من جديد (، x,في)p

When punctuation marks (parentheses, quotes, hyphens, etc.) indicate a local dependency, punctuation tag will be dependent on this local head.

.هؤلء ”الخبراء“ يتقاضون مبالغ خرافية (” x,خبراء)p (“ x,خبراء)p The followings are some examples of hyphen attachments to local heads:

التاريخ العربى ـ اليسلمى (-x,عربي)p In citations, the hyphens are also local: طاجن المكرونة باللحمة المفرومة بالصور - موقع كشهية

(- x,موقع)p

The same thing is applicable if the a colon was used instead of the hyphen:

85 .مكة المكرمة: كشف مدير المستشفى عن حزمة من إحصائيات لعداد المرضى

(: x,مكة)p

Or:

قيل: إن أباه كان من أعضاء جماعة

(: x,قيل)p

Moreover, a hyphen following a list number should be attached to that number إن أحسست بتل مصق في العجينة أضيفي المزيد من الدقيق 5-

p(5,x -)

In number ranges, the hyphens are attached to the first number:

بدأ بعد ذلك بالتحلل بنسبة 8- %18 يسنويا

p(8,x -)

In the case where the punctuation play the role of a coordinative conjunction, p() rel must be assigned to the local head.

● dependent: dep A dependency is labeled as dep when the system is unable to determine a more precise dependency relation between two words. This may be because of a weird grammatical construction, a limitation in the Stanford Dependency conversion software, a parser error, or because of an unresolved long distance dependency.

طريق القاهرة كشرم الشيخ (كشرم x,القاهرة)dep

and the الطبيب هو المسئول as in ضمير الفصل We use this tag in Arabic with the separating pronoun .الكتاب الذي ايستعرته as in ضمير الربط resumptive pronoun كان الطبيب هو المسؤول (مسئول x,كان)att (هو x,طبيب)dep

الكتاب الذي ايستعرته (الذي x,ايستعرت)dobj (ه x,ايستعرت)dep

86 will be attached to the subject unless there is a conflict in ضمير الفصل By default the separating pronoun الضحية .number and gender between the subject and predicate and the pronoun follows the predicate (e.g .in such case it is attached to the predicate ,(هم الضعفاء

in the place of the object or object of preposition, the (ضمير الربط) If there is a resumptive pronoun pronoun is given the dep function, and the relative pronoun receives the main function. الكتاب الذي أعرته لي كان رائعا (الذي x,أعرت)dobj (ه x,أعرت)dep المكان الذي ذهبت إليه (الذي x,إلى)pobj (ه x,إلى)dep

This tag also covers independent noun phrases in parenthetical position (indicating age, location, affiliation, qualification, etc.), which doesn’t have a clear syntactic function in the clause.

البرادعي (70 عاما) (عام x,برادعي)dep (x 70,عام)num في محافظة الخليل (جنوب الضفة) (جنوب x,محافظة)dep

(business-card like phrases) حسن إبراهيم، دكتوراه في القتصاد (دكتوراه x,حسن)dep

حسن إبراهيم، وزاركة التجارة (وزارة x,حسن)dep

فيلم الجزيرة، جإخرا ك كشريف عرفة (إخراج x,فيلم)dep

5.3 Specific Issues with Dependency

MWE List head is mark :أن or ما followed by complementizer ( ... ،كما، طالما، حالما) Function word ● كما/طالما/حالما أن ○ إل أن ○ غير أن ○ حيث أن ○ ما أن ○ ما إذا ○

87 ● Prep - Function words head: mark حتى لو (حتى ولو) ○ head: mark حتى إذا ○ head: mark بحيث ○ head: tmod من قبكل ○ head: tmod من بعكد ○ head: refer to the multi function words table في حين ○ head: cc meaning and then 13من كثنم ○ head: tmod فيما بعد ○

● Prep JJ/JJR: head is advmod (POS: IN-NN) بالتالي ○ (POS: IN-JJR) بالحرى ○ (POS: IN-JJR) على الرجح ○ (POS: IN-JJR) على الكثر ○ (POS: IN-JJR) على القل ○

● Prep NN prep: head is prep (POS: IN-NN-IN) على الرغم من ○ بالرغم من ○ بالضافة إلى ○ بالضافة ل ○

● Prep Prep: head is prep (POS: IN-IN) من على ○ من أمام ○ من خلل ○ بدون ○ من بين ○ بداخل ○ من فوق ○ head: prep من لقلبلل ○

● Fixed head: advmod يا ريت ○ head: advmod :يا ترى ○ head:advmod ليسيما ○ head: depends of the function of the verb in the text مازال ○ head: depends of the function of the verb in the text مادام ○

من هنا , من with fatha) the annotation of the phrase will be ADP-IN + ADV-RB because it is the same as) من لثم Note that with 13 .etc هناك

88 head:nsubj لكشك ○ head: mark إل إذا ○ head:mark إل لو ○ head:nsubj لبد ○

xcomp should not be included in xcomp relations. Only control verbs assign the كشرع وتم Aspectual verb like xcomp relations

كشرع في إنشاء السد .1 كشروعه في النوم .2 بدأ في زيارة البلد .3 أوكشك على دحر العدو .4 أخذ في النهيار .5 الرغبة في الرحيل .6 (exceptional case) الرغبة في عودة النظام القديم .7 حرص على التحدث .8 ايستعد للقفز في الماء .9 (control to object) دفعه للغاء المبارة .10 ايستمر في محاورة خصمه .11

.تم The same also applies to the verb of completion

تم تعيينه في وظيفة مرموقة .12 تم توفير المطلوب .13 يتم ايستيفاء الشروط .14

حاول and أراد, ايستطاع, تمكن Occurring in the complement of control verbs (1

حاول, ايستطاع, تمكن قدر, طالب, طلب, كلف, يجب, ينبغي, تمكن, رغب, واصل, حرص, ايستعد, أعاد، كرر, رفض, Verbs like are control verbs that indicate verbal complement even if the masdar is attached with the أراد and حاول :ال definite article

حاول التدخل في المر .15 أراد التوجه إلى البيت .16 ايستطاع الخروج في الوقت المنايسب .17 تمكن من تعويض خسائره .18

What about these cases:

انتهى من اختيار الفريق ●

89 رفض توقيع العقد ● قام بتوزيع الجوائز ● قيامه بتوزيع الجوائز ● يهدف إلى زيادة الوعي ● يجب توفير الخدمات ●

(إن وأخواتها) Pseudo-verbs They are ADP/IN/mark (subordinating conjunction (إن، ليت، لعل، عل، كأن، لكن list) أخوات إ نن For introducing a subordinate clause) it will be subconj قال starting a sentence is PRT/RP/prt, when used after إ نن التوكيدية For

Prep / Mark

and (من، إلى، عن، على، في، الباء، الكاف، اللم، واو القسم، حتى، منذ، مذ، التاء) prep: includes both prepositions :including (الكلمات الملزمة للضافة) :prepositionals or quasi-prepositions مع، أمام، إثر، إزاء، بعد، بين، تجاه، تحت، تلو، حذو، حول، حين، خلف، ضمن، عقب، عبر، عند، فوق، فور، قبل، قبيل، قبالة، قرب، مع، أثناء، طوال، عوض، حسب، وفق، أمثال، ضد، مثل، كشبه، نحو، دون، لدى، خلل، وراء، حيال، جراء، ويسط، رغم، داخل، خارج، رهن، كبلعنيد، كنصب، قيد، طيلة، بيد، مقابل، نظير، كشمال، كشرق، جنوب، غرب، نتيجة mark: A marker is the word introducing a finite clause subordinate to another clause. For a For an adverbial clause, the marker is typically a .أ نن وأ نن complement clause, this will typically be The mark is a dependent of the .إذا، إ نن، لو، حتى، طالما، حالما، بينما، عندما، إلخ subordinating conjunction like .أيقن أن الوضع لن يتغير :subordinate clause head. Example

Note that when a prep follows another prep, the first prep is labeled as mwe: (من x,أمام)mwe Dates and Time Dependency structure Day name will be considered as the head of the date expression and the day of month will be related to day name with the appos relation. Then, month name and year will be annotated as dependent elements: .يستعقد القمة المقبلة الثنين 30 نوفمبر، 2015 (الثنين x,تعقد)tmod (x 30,الثنين)appos (نوفمبر tmod(30,x (x 2015,نوفمبر)tmod

When day name is not mentioned, the day of month will be the head of the date: .يستعقد القمة المقبلة 30 نوفمبر، 2015 (x 30,تعقد)tmod (نوفمبر tmod(30,x (x 2015,نوفمبر)tmod

When hours are mentioned, they will be attached to the VP or NP head at the same level as the head of

90 date expression, or attached to the head of date expression if any constraints (such as ambiguity or crossing dependencies): يستبث المباراة الثنين الساعة 11 مساء (مبارات x,تبث)nsubjpass (اثنين x,تبث)tmod (يساعة x,تبث)tmod ( x11,يساعة)amod (مساء tmod(11,x

يستبث المباراة الثنين في العاكشرة مساء (الثنين x,تبث)tmod (في x,الثنين)prep (عاكشرة x,في)pobj (مساء x,عاكشرة)tmod

Relations In an adverbial function, dates and time as all temporal expressions are always annotated as tmod if the expression is a bare noun, and are always annotated as prep+pobj if they are introduced by a preposition:

: غادر يوم 7 يوليو (x 7,غادر)tmod (يوليو tmod(7,x (يوم appos(7,x

يسيغادر الخميس القادم (الخميس x,يغادر)tmod (قادم x,الخميس)amod

● introduced by a preposition: يسيغادر في 7 يوليو (في x,يغادر)prep (x 7,في)pobj (يوليو tmod(7,x

”كيف، متى“ كيف يستسافر؟ (كيف x,تسافر)advmod

.ل أعلم كيف أتصرف (كيف x,أتصرف)advmod

متى جئت؟ (متى x,جئت)advmod

91 constructions In case of light verb constructions (“support verbs”), the construction will be annotated compositionally, i.e., every argument will be linked to the head verb as direct objects or prepositional objects (they will not be tagged with mwe).

أخذ بالثأر (ب x,أخذ)prep (ثأر x,ب)pobj

أخذ يساترا (يساترا x,أخذ)dobj

ألقت نظرة على ابنها (نظرة x,ألقت)dobj (على x,ألقت)prep (ابن x,على)pobj

Quantifiers: predet vs. head The list of quantifiers are tagged predet when immediately preceding the noun they modify in a but they are treated as heads when followed by a prepositional ,(أكثر الناس) seemingly idafa construction .(الكثير من الناس) phrase

● quantifiers as predet: بعض الناس يعارض بل يسبب (بعض x,ناس)predet (ال x,ناس)det

يجب مراجعة جميع القرارات (جميع x,قرارات)predet (ال x,قرارات)det

● quantifiers as head: البعض من الناس يتصيدون الخطاء (من x,بعض)prep (ال x,بعض)det

Interrogative pronouns Interrogative pronouns are annotated according to their respective syntactic function in the sentence. If they fill an argument position of the verb, they could be nsubj, dobj or pobj: من فعل ذلك؟ (من x,فعل)nsubj

من قابلت هناك؟ (من x,قابلت)dobj

92 ماذا حدث؟ (ماذا x,حدث)nsubj

ماذا أكلت؟ (ماذا x,أكلت)dobj

ماذا أكلت؟ (ماذا x,أكلت)dobj

أي الكتب تحب؟ (أي x,كتب)predet

لمن توجه حديثك؟ (من x,ل)pobj (ل x,توجه)prep

إلى متى تماطل؟ (متى x,إلى)pobj (إلى x,تماطل)prep

In the following two examples, the interrogative pronouns are ROOT’s

من الجاني؟ (جاني x,من) nsubj

ما الحل؟ (الحل x,ما) nsubj

then they will be annotated as ,(أين، متى، كيف، لم، لماذا) If they fulfill an adverbial function in the sentence advmod: أين ذهبت أمس؟ (أين x,ذهبت)advmod

كيف حدث ذلك؟ (كيف x,حدث)advmod

لم فعلت هذا؟ (لم x,فعلت)advmod

لماذا هاجرت؟ (لماذا x,هاجرت)advmod

93 Multi-token subordinating conjunctions وقتما، بينما، طالما، حالما، فيما (فيما كان أخي نائما خرجت من المنزل)، لمما (لما هزه وجده ميتا)، ريثما، كما، كيما، بعدما، أنما، لول، عندما، إنما (إنما جاء ليبين وجهة نظره)، إذما، مهما، حيثما، كيفما، لئل، مما، لماذا All multi-token subordinating conjunctions above are treated as single units, and they are tagged as mark for advcl: .هرب لئل يعتقل (يعتقل x,هرب)advcl (لئل x,يعتقل)mark

Range expressions

Range expressions often include a verb, two prep’s, two numbers and one pobj. The dependency relation should be as the following: تتراوح بين 3 الى 5 قطع

prep(ranges,x (بين x,تتراوح)prep (x 3) between,بين)pobj (pobj(between,x 3 (الى x,تتراوح)prep (x 5) prep(ranges,x to,قطع)num (num(pieces,x 5 (قطع x,الى)pobj pobj(to,x pieces)

حكم من عام 2005 حتى عام 2007 (من x,حكم)prep (حتى x,حكم)prep

With numbers separated by a dash, the dash and the following number will be dependent on the first number. حكم: 406-454ه :Example (x 406,حكم)tmod p(406,x -) num(406,x 454)

Locutions: mwe

The multi-word expression relation is used for certain multi-word idioms that behave like a single function word. It is used for a closed set of dependencies between words in common multi-word expressions for which it seems difficult or unclear to assign any other relationships. Multiword expressions are annotated in a flat, head-last structure, in which all words in the expression modify the first one using the mwe label. لن يستطيع حتى لو أراد (حتى x,لو)mwe

94 (لو x,أراد)mark

Complex complementizers

and you cannot replace any ”أ نن، أ من، إذا“ If the sequence introducing a subordinate clause ends with element the sequence by any other word and if you cannot insert anything, then annotate the sequence .إل إذا، حتى لو، حيث أن، غير أن as a Multi-word expression, such as

.إل إذا كنت يسأبقى (إل x,إذا)mwe

.دخل المستشفى حيث أنه أصيب ( حيث x,أن)mwe

Complex prepositions

In case of complex prepositions, if you can substitute another word with a similar meaning or if you can insert some other word without changing the meaning, then annotate according to the internal structure. If not, annotate the sequence as a multi-word expression to which only one DepRel will be assigned: prep بالنسبة للوضع هناك (ل prep(x,x (ب x,ل)mwe (ال x,ل)mwe (نسبة x,ل)mwe

This also covers expressions such as: على الرغم من بالرغم من بالضافة إلى

حتى إذا ل كشك بدون

بالضافة ل

Relative pronouns

Relative pronouns introducing a relative clause (rcmod) have the same dependency tag as the extracted .when found, will be tagged as dep ,(ضمير الربط) element. Note that the resumptive pronoun

95 صديقي الذي جاء من بغداد (جاء x,صديق)rcmod (الذي x,جاء)nsubj

الكتاب الذي اكشتريته (اكشتريت x,كتاب)rcmod (الذي x,اكشتريت)dobj (ه x,اكشتريت)dep

etc. will be annotated ,الذي له، الذي عليه Relative pronouns extracted from a prepositional phrase such as with prep+pobj relations: الشخص الذي تحدثت معه (تحدثت x,كشخص)rcmod (مع x,تحدثت)prep (الذي x,مع)pobj (ه x,مع)dep

Nouns with omitted relative pronouns When indefinite nouns are modified by a clause the relative pronoun is dropped. In this case, the head of the modifying clause is still tagged as rcmod. لي صديق يعاني من الكتئاب (يعاني x,صديق)rcmod (من x,يعاني)prep

لم يجد أحدا يثق فيه (يثق x,أحدا)rcmod (في x,يثق)prep (ه x,في)pobj

Headless relative clauses Headless relative clauses are clauses with no NP head, e.g. قال الذي كان عنده ● يرفضون ما تماريسه إدارة الشركة ● وكان السيسي هو الذي اعلن اقالة مريسي ● كل كشركة تقول ما تريده عن الرقام ● In such examples, the relative pronoun becomes the head of the phrase and receives the relevant grammatical function, and the resumptive pronoun becomes the dobj when applicable. This treatment is applicable in two cases: 1. If the relative pronoun was in a nominal position e.g. pobj or dobj 2. If the relative clause was in a predicate position, its relative pronoun becomes the head

96 of the sentence

Parataxis vs. appos Basically, the parataxis dependency concerns a relation between two predications. Verb constructions or deverbal nouns can be considered as predication. On the other hand, appos applies to NPs where the dependent element that immediately follows the head element generally defines or specifies this latter: ردد مقولته الشهيره: ما نخاف على التحاد إل من التحاد نفسه (نخاف x,ردد)parataxis

يعيش صديقي حسن في لندن (حسن x,صديق)appos

Adjuncts: choice of the head As non-essential elements of the sentence, adjuncts have no specific position and thus can be in initial, medial or final position in the sentence, or can be moved anywhere. Here are 3 rules to follow so as to determine the head of adjuncts: ● When there isn’t any factor constraining the position of an adjunct, the rule is to attach it to the root predicate or to its head verb in an embedded proposition: اصطحب أولده إلى الحديقة الخميس الماضي. / الخميس الماضي اصطحب أولده إلى الحديقة. / اصطحب أولده .الخميس الماضي إلى الحديقة (الخميس x,اصطحب)tmod

is ambiguous. In مصدر عمال Sometimes, the scope of adjuncts of verbs and verbal nouns ● these situations, the adjunct will be attached according to the context, which generally depends on the position of the adjunct. We need to note also that we generally prefer to make attachment that avoid crossing dependency arcs. .اضطرب الخميس الماضي أثناء اجتماعه مع المدير (الخميس x,اضطرب)tmod .اضطرب أثناء اجتماعه الخميس الماضي مع المدير (الخميس x,اجتماع)tmod this will lead to crossed مع to اجتماع and then attach الخميس to اضطرب In the second example if we attach arcs.

ل ن ولكي Phrases -are subordinating conjunctions (ADP أن، وكي ,(is a preposition (ADP-IN ل the ،لن، لكي In the phrases (are mark (head of the subordinate phrase is pcomp أن، وكي and و is prep ل IN). In dependency labelling headed by the prep.

Symbols in Dependency All symbols should receive the p label and attached to their relative head as in the following examples:

97 20$ p(20, $)

( p(20, ⁰ ن20 (& x,يسمير)p يسمير & علي (> x,يسوريا)p <في >يسوريا

يمكن، يعجب، يكفي :Verbs with csubj :يعجب ويكفي behaves like يمكن The verb يمكنني أن أرحل يعجبني أن أرحل يكفيني أن أرحل يمكنني الرحيلك يعجبني الرحيلك يكفيني الرحيلك is the csubj/nsubj. The meaning is similar to الرحيل or أن أرحل is the dobj and ي Here the pronoun - .يعجب الولد إياي - Another evidence, from the conjugation of the verb, it is obvious that the pronoun is the dobj. The كشكرني.د .e.g ,ياء المتكلم and object is كشكرت .e.g ,تاء الفاعل subject pronominal suffix is :will be dislocated يمكن، يكفي، يعجب، يجوز Any fronted NP with - (with pronominal reference) محمد يمكنه أن يرحل (without pronominal reference) محمد يمكن أن يرحل محمد يعجبه أن يرحل محمد يجوز له أن يرحل محمد يكفيه أن يرحل

الرمر الذي Subordinate sentences starting with :are annotated a follow المر الذي Subordinate clauses starting with يؤكد will be a child of الذي (will be the head of the subordinate clause (child of the preceding clause أمر and the rest is annotated like any regular clause with an rcmod: لم يجدوا كشيئا المر الذي يؤكد كذب المعلومات (أمر x,يجدوا) advcl

98 (يؤكد x,أمر) rcmod (الذي x,يؤكد) nsubj

Definition of prepositional argument (CLR) A masdar is considered verbal (VBG) if it governs two argument, and active and passive participles are considered verbal when followed by one argument. The argument could be closely related preposition (CLR). The definition of CLR as in the ATB is “the preposition should have a particularly close relationship, and the PP-CLR should be obligatory for that sense of the verb.” Here are four cases of CLR that give more details. We explain it in terms of the verb that the masdar or participle is derived from.

1) Transitive verbs that take a PP instead of an object. The verb is transitive in the sense that the verb alone (without its complement) doesn’t make a complete sense/sentence. أثر على النمو رحب بالضيف ايستولى على يسفينة أفضى إلى الفشل

2) Transitive that takes a either a direct object or PP. The selection of the type of argument will lead to a difference in meaning. أدى إلى يسقوط بعض القتلى أخذ في العتبار عمل على النهوض بالبلد

3) Di-transitive that takes an object and a PP اتهمه بالتقصير لفت النظر إلى ضرورة عرض صديقه للخطر قال كشيئا عن الرئيس حذر صديقه من الهمال

4) Can either be transitive or take a PP argument. The selection of the type will lead to a difference in meaning. قام بضم الراضي جاء بخبر يسار وصل إلى الحل ايستمر في النمو ايستمع إلى الحوار فاز على خصمه

99 Irregular Adjective Sequence Case 1. In some instances we have an adjective sequence where the reference is to a compound noun.

الزعيمين السودانيين الجنوبيين الدوري الكوري الجنوبي رياح كشمالية كشرقية

.respectively كشمال كشرق and جنوب السودان، كوريا الشمالية So, the reference here is to In this case, attach both JJs to the NN, as it is irregular in Arabic to attach an adjective to another adjective.

Case 2. In the following example

اليساطير الهندو - أوروبية

.has the proper gender agreement أروبية and ال has هندو We have two partially-formed adjectives: only .and the hyphen will take GW/'goeswith' since they are behaving like one large token هندو Therefore

ليس Other functions of precedes a noun or ليس functions as neg and not as a predicate. This happens when ليس In some cases .Examples .(مبتدأ وخبر adjective phrases (not the typical

على is neg and child of ليس here --- يقوم هذا النظام الجديد ليس على المقولت والفتراضات علوية is neg and child of the adjective --- ليس here كشفته السفلية وليس العلوية

It can also function as preconj as in: نظرا لما يوفره من العديد من فرص العمل، ليس في نطاق محافظة المنيا فقط ولكن للمحافظات المجاورة أيضا

.when it functions merely as a negative particle, RP مهملة or غير عاملة is considered as ليس In this case Case for Nouns Modified by Numbers Arabic grammar classifies numbers into some that take a genitive tamyeez and some that take an accusative tamyeez. We treating tamyeez the same:

ك ثلثة أقلبم gen 10 3- رأيت أحلد عشلر كوكبا acc 11-19 تسعون يسياراة acc 90..20,30 قرأت واحدا و عشرين كتابا acc 99 21- مئة كتا بب gen 1000 ,100

Case for Words of non-Arabic Origin The guiding principle is to differentiate between whether the word is a translation or transliteration of a foreign word. Translation is typically marked a significant difference in the way a word is pronounced

100 from the original word. In transliteration there is no significant difference in pronunciation (apart from vowel lengthening and consonant mapping, e.g. p->b and v->f). then (الجبل اليسود، يساحل العاج، البويسنة والهريسك، اليونان، الصين، الهند If it is a translation (such as ● case should be assigned. then case (توك كشو، آي فون، جون يسيتوارت، بوركينا فايسو، نيويورك .If it is mere transliteration (e.g ● is not relevant and should be unsp_c. ● Words of non-Arabic origin which are institutionalized in Arabic should receive case .(خمسون دولرا، اكشترى تليفزيونا .e.g) are case=unsp_c (يناير-ديسمبر) Names of the months ● ألمانيا، يسويسرا، النمسا، فرنسا، .Non-Arab country names ending in Alif are case=unsp_c, e.g ● إيسبانيا، إنجلترا، ايستونيا، يسلوفينيا، إلخ

Restrictive vs Non-Restrictive Relative/Qualifying Clauses

● Qualifying clauses for definite nouns ○ recmod only when the clause is preceded by an explicit relative pronoun البطل الذي وقف أمام المدرعة :without waw ○ advcl in two cases: بعض :If the clause is not preceded by a relative pronoun ■ الدول منها السعودية ■ If the clauses is preceded by a relative pronoun with waw, In that case the clause will be .التطبيق المجاني والذي من خلله يمن تفقد حالة البطارية .e.g advcl to the modified noun and the waw will be a particle considering it as resumptive, and the relative pronoun will attach similar to its attachment rules in rcmod clauses. ● Qualifying clauses for indefinite nouns ○ recmod for restrictive relative clauses (where commas are not تمثال على رأيسه تاج، صديق يخون صديقه :(appropriate ○ advcl for non-restrictive relative clauses (where commas are appropriate): Some helpful .اعتقلت مواطنين فلسطينيين، معظمهم من مدن الضفة، واقتادتهم إلى مكان غير معلوم (معظمهم، بعضهم) syntactic clues here are when the clause being introduced by a quantifier .or separated with commas ,(منها، منهم .e.g) من or with adjectives فوق، بدل، تحت are followed by adjectives, they will be tagged RP-prt, and will be headed by the فوق، بدل، تحت ، When following adjective. الكشعة فوق البنفسجية (بنفسجية x,أكشعة)amod (فوق x,بنفسجية)prt

فوق المتويسط، تحت الحمراء، بدل الضائع ,Other examples .are typically prepositionals when followed by nouns غير، فوق، تحت، بدل .N.B

101 Noun Modifiers When nouns are used to modify another noun, the dependency relation will be ‘nn’ Examples: عن تقدير الدول اليسلمية العضاء في المنظمة الرجل الوطواط الرجل العنكبوت فندق خمس نجوم

POS: NN .و dep: nn dependency label for noun modifying another noun

(المتعدي لمفعولين) and ditransitives ,(تمييز) Tamyeez ,(حال) Haal عاكشت البنت بعيدة عن والديها، ) comes as adjective and doesn’t fit into partmod حال When the ● assign it as advmod and attach it to the noun it modifies (and agrees with) if it is ,(عثر عليها يسليمة .attach it to the verb (عاكشت بعيدة عن والديها) explicitly present, otherwise

assign (يبعد ميل، يزن رطل، نام يساعة، ايستقر يوما، يسار ميل With words of measurement (like ● .(ميل، رطل، إلخ) and npadvmod with the rest (يساعة، يوما) tmod with time expressions .are tamyeez and npadmod نائبا، ضحية، ملعبا the words ,عمل نائبا، وقع ضحية، تصلح ملعبا Also in ●

● With di-transitive verbs, try to force them into one of the two categories: as an argument and this is covered under verbs of مبتدأ وخبر Verbs that take .1 transforming in the GL (covering verbs of knowing, thinking and transforming). (طبيبا x,ظننت)attr ظننته طبيبا (ه x,ظننت)dobj ظننته طبيبا

(كريما x,ظننت)acomp ظننته كريما (ه x,ظننت)dobj ظننته كريما Verbs of 'making', 'appointing', 'selecting', 'choosing', etc. all go under “verbs of .”will all be “attr انتخب رئيسا، اختارها عاصمة، عينها معيدة transforming”, so all of those will take dobj and أعطى، منح، منع، يسأل، ألبس، كسا Verbs of giving .2 iobj

102