
Persian Language


Gal Halfon - Machine Translation Seminar

History

Language Origin

Historical Facts

Ancient Tribes

“Arya” tribes migrated to

Persian Ancient Empire

Zoroastrianism religion ( B.C)

Old and Middle Persian - until the 7th century A.D.

Muslim Empire

Muslim Religion - 7th century

Adopted alphabet

Persian Native Speakers Distribution

70 million fluent speakers

110 million total speakers

Types of Persian:

Western - Iran

Eastern -

Tajiki -

Persian Language

Grammar is similar to -origin

Arabic-based alphabet

Influence:

Arabic - 24% of the everyday vocabulary is of Arabic origin.

Little Turkish/Mongolian

Major Modifications from

Addition of 4 letters

Teh marbuta ( ة ) changes to heh ( ه )

5 major vowels - (a, i, e, o, u)

Persian

SOV

The main clause precedes a subordinate clause

Agglutination - stringing morphemes together

Nouns

Persian nouns have no grammatical gender

Nouns can be made plural using a separate word (ها) ‹hā›

Or pluralized using the suffix (ان) ‹-ān›

Definite direct objects are marked using the word (را) ‹rā› after the noun
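The two plural markers and the object marker can be sketched in a few lines of Python. This is only a toy illustration over Latin transliteration: the function names `pluralize` and `mark_object` are hypothetical, the hyphenated output is a display convention chosen here, and the choice between hā and ān is left to the caller rather than modeled.

```python
# Toy helpers over Latin transliteration (hypothetical names, not a real library).

def pluralize(noun: str, suffix: str = "hā") -> str:
    """Attach one of the two plural markers from the slide (hā or ān)."""
    if suffix not in ("hā", "ān"):
        raise ValueError("expected 'hā' or 'ān'")
    return f"{noun}-{suffix}"


def mark_object(noun_phrase: str) -> str:
    """Place the separate word rā after the noun phrase."""
    return f"{noun_phrase} rā"


print(pluralize("sag"))        # sag-hā
print(pluralize("sag", "ān"))  # sag-ān
print(mark_object("diruz"))    # diruz rā
```

The last call mirrors دیروز را ("yesterday" + rā) from the translation example later in the slides.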

Adjectives

Typically follow the noun, linked by the ezāfe construction (-e- or -i-).

barādar-e-bozorg = “Big Brother”

Can sometimes come before the noun.

khosh-bakht = good-luck

Example sentence

سگ من از گربه‌ی تو کوچکتر است

Sag-e man az gorbe-ye to kuchektar ast = "My dog is smaller than your cat."
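The ezāfe linking in the example sentence can be sketched as below. This is a simplified illustration, assuming transliterated input and assuming that the linker surfaces as -ye after a vowel-final word (as in gorbe-ye) and as -e otherwise (as in sag-e); the helper name `ezafe` and the vowel set are hypothetical.

```python
# Simplified ezāfe linker over transliteration (hypothetical helper).
VOWELS = set("aeiouāâ")  # assumed vowel inventory for the transliteration


def ezafe(head: str, modifier: str) -> str:
    """Link a head noun to the word that follows it with -e or -ye."""
    linker = "ye" if head[-1].lower() in VOWELS else "e"
    return f"{head}-{linker} {modifier}"


print(ezafe("sag", "man"))         # sag-e man
print(ezafe("gorbe", "to"))        # gorbe-ye to
print(ezafe("barādar", "bozorg"))  # barādar-e bozorg

# Rebuilding the example sentence from its parts.
sentence = f"{ezafe('sag', 'man')} az {ezafe('gorbe', 'to')} kuchektar ast"
print(sentence)                    # sag-e man az gorbe-ye to kuchektar ast
```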

Morpheme pattern:

( NEG - DUR or SUBJ / IMPER ) - root - PAST - PERSON - ACC-ENCLITIC
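A minimal sketch of how the slot order in this pattern could be applied, assuming transliterated morphemes supplied by the caller. The function name `build_verb` and the slot names are hypothetical, and the morphemes in the usage lines (tars "fear", khar "buy", mi-, -id, -am, -ash) are illustrative values that do not come from the slides.

```python
# Assemble a verb form by filling the slots of the pattern in fixed order.
# Slot order: NEG, DUR or SUBJ/IMPER prefix, root, PAST, PERSON, ACC-ENCLITIC.
from typing import Optional

SLOT_ORDER = ("neg", "prefix", "root", "past", "person", "enclitic")


def build_verb(
    root: str,
    neg: Optional[str] = None,
    prefix: Optional[str] = None,
    past: Optional[str] = None,
    person: Optional[str] = None,
    enclitic: Optional[str] = None,
    sep: str = "-",
) -> str:
    """Join the non-empty slots in pattern order, separated by `sep`."""
    slots = {
        "neg": neg,
        "prefix": prefix,
        "root": root,
        "past": past,
        "person": person,
        "enclitic": enclitic,
    }
    return sep.join(slots[name] for name in SLOT_ORDER if slots[name])


print(build_verb("tars", neg="ne", prefix="mi", person="am"))
# ne-mi-tars-am
print(build_verb("khar", prefix="mi", past="id", person="am", enclitic="ash"))
# mi-khar-id-am-ash
```

The first form corresponds to نمی ترسم ("I am not afraid") in the translation example later in the slides. The sketch only orders the slots; it does not check which combinations are grammatical.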

Tenses -

past, , , present, future

Very often used with:

“To do” = کردن

“Conversation” = گفتگو

“To speak” = گفتگو کردن

Translation Quality

Recent research (2013) from Columbia University:

Morphological analysis improves results significantly

Word order should still be improved
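One rough way to see the word-order problem is to compare unigram and bigram overlap between a system output and its reference: when most words are right but their order is wrong, unigram precision stays high while bigram precision drops. The sketch below uses BLEU-style clipped n-gram precision on the reference and translation from the example in this section. It is an illustration only, not the evaluation used in the cited research; the function names are hypothetical and the apostrophe in the reference is normalized to ASCII.

```python
# Clipped n-gram precision of a hypothesis against a single reference.
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(hyp, ref, n):
    """Share of hypothesis n-grams also found in the reference (counts clipped)."""
    hyp_counts = Counter(ngrams(hyp, n))
    ref_counts = Counter(ngrams(ref, n))
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return matched / total


reference = "i 'm not afraid of tomorrow because i have seen yesterday and i like today".split()
translation = "from tomorrow , not afraid because i have seen yesterday and today i love".split()

p1 = clipped_precision(translation, reference, 1)
p2 = clipped_precision(translation, reference, 2)
print(f"unigram precision: {p1:.2f}")  # unigram precision: 0.79
print(f"bigram precision:  {p2:.2f}")  # bigram precision:  0.46
```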

Example:

Input: از فردا نمی ترسم چراکه دیروز را دیده ام وامروز را دوست دارم

Reference: i 'm not afraid of tomorrow because i have seen yesterday and i like today

Translation: from tomorrow , not afraid because i have seen yesterday and today i love

Example -

Input: امریکا میگوید که برخلاف ادعای گروه موسوم به "دولت اسلامی"، هواپیمای اردنی با شلیک آنها سرنگون نشده است.

Translation: America says that contrary to the claims of so-called "Islamic state", Jordanian aircraft to shoot them down yet.

Input: روز چهارشنبه این هواپیما در قلمروی تحت نفوذ این گروه شبه نظامی گم شد و خلبانش به اسارت آنها در آمد.

Translation: On , the aircraft was lost in the sphere of influence of the militant group Khlbansh they were captured.

Input: شبه نظامیان "دولت اسلامی" که پیشتر به نام داعش شناخته میشدند، اعلام کرده بودند که با موشکهای ردیاب توانستند جنگنده F-16 را سرنگون کنند.

Translation: Militant "Islamic state" that previously were known as the Dash, had announced that the tracer missiles could overthrow F-16 fighter.

Input: در مقابل آمریکاییها میگویند "شواهد به وضوح" نشان میدهد که این ادعا صحیح نیست.

Translation: The Americans said "the evidence clearly" shows that this claim is not true.

Translation Tools

Translation preferences: Limited parallel corpora, high

Corpora:

TEP: English-Persian Parallel Corpus

El Kholy et al., 2013a; El Kholy et al., 2013b - 160,000 sentences

Analysis Tools

PerStem: morphological segmenter (Jadidinejad et al., 2010)

VerbStem: verb analyzer tool (Bijankhan et al., 2011)

References

http://en.wikipedia.org/wiki/Persian_language#Grammar

http://en.wikipedia.org/wiki/Persian_grammar

Improved Language Modeling for English-Persian Statistical Machine Translation: http://www.aclweb.org/anthology/W10-3810

http://www.cs.columbia.edu/~rasooli/papers/ijcnlp13.pdf