Iskandar Ding: Introduction to Tajik Persian 1 – the Alphabet

Posted 04 May 2020

Few learners of Persian realise that modern Persian currently has two official alphabets: the Perso-Arabic one many are familiar with, and the Cyrillic one used to write Tajik. Whether Tajik is a separate language from Persian is a socio-political question that has drawn much controversy within Tajikistan, in the Tajik-speaking areas of Uzbekistan, and among linguists worldwide. What withstands this debate, however, is the fact that the official, literary register of Tajik is no different from that of Iranian Persian and Afghan Persian (Dari), save for some particular stylistic preferences. From this point of view, Tajik Persian should not be considered a different language from Iranian Persian and Afghan Persian, not to mention that before the Soviet period, Persian speakers in Central Asia had always referred to their language as فارسی / Farsi.

It is therefore beneficial for learners of Persian to familiarise themselves with the Cyrillic alphabet, the current official alphabet of Tajik Persian, in order to access the rich literary tradition, scholarship, and media resources published in Tajikistan. The good news is that, because the Cyrillic alphabet was based on the Greek alphabet, from which the Latin alphabet (one of the most widely used writing systems in the world) was also ultimately developed, it will not take anyone familiar with the Latin alphabet long to master. Moreover, because Tajik Cyrillic writes every sound and its spelling is largely phonetic, it helps learners, especially at beginner and lower-intermediate levels, to grasp more quickly the pronunciation of words that the consonant-based Perso-Arabic script obscures. Students who wish to read classical Persian poetry but still lack an instinctive feeling for the metrical system (عروض / ʿarūz) will also benefit immensely from reading poetry collections published in Tajik Cyrillic, which presents sounds and syllables more transparently.
Fluent speakers of Persian who are literate in either script will also benefit from this short guide to the other script, which will greatly facilitate cross-border communication.

Modern Persian speakers in the post-Soviet space have used, in total, three official writing systems. Prior to 1929, the Perso-Arabic script was the only alphabet traditionally learnt and used by those who could read and write, as it still is in the rest of the Persian-speaking world. The official establishment of the Tajik Soviet Socialist Republic (Tajik SSR) in 1929 brought about a change of official script from the traditional Perso-Arabic script to a Latin alphabet, which had already been drafted in 1928. However, some books and newspapers were still published in the Perso-Arabic script, as most literate citizens needed time to adjust to the change. The Cyrillic alphabet began to be used in the 1930s and became official in 1940, a change accompanied by a ban on the use of the Perso-Arabic script.

The Standard Tajik Cyrillic alphabet is slightly modified from the Russian Cyrillic alphabet and has 35 letters. The letters follow the Russian order; letters with diacritics follow the ones from which they are derived. The correspondences between the Tajik Cyrillic and the Perso-Arabic alphabets are as follows:
Letter | Transliteration from Perso-Arabic | Perso-Arabic
--- | --- | ---
Аа | a | ا (initial), َ (fatḥa/zabar), or final ه with a grammatical function [1]
Бб | b | ب
Вв | v and w [2] | و
Гг | g | گ
Ғғ | gh (also ġ) | غ
Дд | d | د
Ее | ē (merged with ī in Iranian Persian) [3] | ی
Ёё | yā | یا
Жж | ž (also zh; j as in French) | ژ
Зз | z | ز/ض/ظ/ذ [4]
Ии | i (corresponding to the Iranian short e) | ِ (kasra/zēr) [5]
Ӣӣ | ī (used word-finally) | ی
Йй | y | ی
Кк | k | ک
Ққ | q (sometimes as gh in Iranian Persian) | ق
Лл | l | ل
Мм | m | م
Нн | n | ن
Оо | ā [6] | آ, or sequence-final ا
Пп | p | پ
Рр | r | ر
Сс | s | س/ص/ث
Тт | t | ت/ط
Уу | u (corresponding to the Iranian short o) | ُ (ḍamma/pēsh)
Ӯӯ | ō (merged with ū in Iranian Persian) or ū [7] | و
Фф | f | ف
Хх | ḫ (also kh, x) | خ
Ҳҳ | h | ه/ح
Чч | č (also ch) | چ
Ҷҷ | j (also dj; as the ‘j’ in ‘jam’) | ج
Шш | š | ش
Ъъ | ʿ and ʾ (glottal stop) [8] | ع/ء
Ээ | e (used word-initially) [9] | ِ (kasra/zēr)
Юю | yū/yu | یو/یُ
Яя | ya | یَ
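For readers who like to experiment, the correspondences above can be sketched as a simple lookup table. This is only an illustration, not an official transliteration standard: the names `TAJIK_TO_LATIN` and `transliterate` are invented here, each letter is given just one Latin value (the first option in the table), and the values chosen for ё (`yā`) and ю (`yu`) are assumptions of this sketch.

```python
import unicodedata

# One Latin value per Cyrillic letter, taking the first option in the table.
# ASSUMPTION: "ё" -> "yā" and "ю" -> "yu" are choices made for this sketch.
_RAW = {
    "а": "a", "б": "b", "в": "v", "г": "g", "ғ": "gh", "д": "d",
    "е": "ē", "ё": "yā", "ж": "ž", "з": "z", "и": "i", "ӣ": "ī",
    "й": "y", "к": "k", "қ": "q", "л": "l", "м": "m", "н": "n",
    "о": "ā", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ӯ": "ō", "ф": "f", "х": "ḫ", "ҳ": "h", "ч": "č", "ҷ": "j",
    "ш": "š", "ъ": "ʿ", "э": "e", "ю": "yu", "я": "ya",
}

# Normalise the keys so that letters typed as base letter + combining mark
# (for example и + macron for ӣ) still match their precomposed forms.
TAJIK_TO_LATIN = {unicodedata.normalize("NFC", k): v for k, v in _RAW.items()}


def transliterate(text):
    """Replace Tajik Cyrillic letters with Latin ones, letter by letter."""
    text = unicodedata.normalize("NFC", text)
    out = []
    for ch in text:
        latin = TAJIK_TO_LATIN.get(ch.lower())
        if latin is None:
            out.append(ch)                   # not a Tajik letter: keep as is
        elif ch.isupper():
            out.append(latin.capitalize())   # "Ғ" -> "Gh"
        else:
            out.append(latin)
    return "".join(out)


print(transliterate("карда"))                   # karda
print(transliterate("Фарҳанги забони тоҷикӣ"))  # Farhangi zabāni tājikī
```

The sketch only goes from Cyrillic to Latin. Going from Cyrillic to Perso-Arabic cannot be done letter by letter, since one Cyrillic letter such as з may stand for several Perso-Arabic letters.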
  N.B. Special attention should be paid to the following letters:
  • The Cyrillic в looks like the Latin capital letter B, but is in fact v.
  • The Cyrillic и, when italicised, looks like the Latin u, but is in fact i.
  • The Cyrillic н looks like the Latin capital letter H, but is in fact n.
  • The Cyrillic р looks like the Latin capital letter P, but is in fact r.
  • The Cyrillic с looks like the Latin capital letter C, but is in fact s.
  • The Cyrillic т, when italicised, looks like the Latin small letter m, but is in fact t.
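As a small self-check aid, the upright look-alike letters listed above can be put in a table and flagged in any word you are reading. This is a hypothetical helper written for this guide (`LOOKALIKES` and `false_friends` are invented names); it leaves out и and т, whose resemblance to Latin letters only appears in italic type.

```python
# Cyrillic letter -> (Latin letter it resembles, sound it actually represents)
LOOKALIKES = {
    "в": ("B", "v"),
    "н": ("H", "n"),
    "р": ("P", "r"),
    "с": ("C", "s"),
}


def false_friends(word):
    """List the look-alike letters in a word, in order of first appearance."""
    seen = []
    for ch in word.lower():
        if ch in LOOKALIKES and ch not in seen:
            seen.append(ch)
    return [(ch, *LOOKALIKES[ch]) for ch in seen]


for letter, looks_like, sound in false_friends("Нависанда"):
    print(f"{letter} looks like Latin {looks_like} but is read as {sound}")
```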
Test yourself
  1. Visual exercise: Here is a page from the Perso-Arabic/Cyrillic comparative index in the Фарҳанги забони тоҷикӣ (فرهنگ زبان تاجیکی). Read through the list to familiarise yourself with the letter correspondences.
  2. Below is the opening couplet of a famous poem by Saʿdi. See if you can work out which poem it is.
Банӣ Одам аъзои якдигаранд
Ки дар офариниш зи як гавҳаранд
Notes
  1. Such as the past participle ending ده -> да (کرده -> карда) and the adjectival/adverbial ه and -انه (امروزه -> имрӯза, دو نفره -> ду нафара, فداکارانه -> фидокорона, etc.). In Iranian Persian this has become /e/.
  2. The consonantal و in Persian may appear as w in transliteration styles that imitate the Arabic system or represent Afghan Persian pronunciation. Both /v/ and /w/ exist in Tajik Persian; see Introduction to Tajik Persian 2 for more explanation.
  3. You may have noticed that there is no separate letter representing the non-word-final long ī sound in Persian. This letter covers both the short i and the long ī sounds, representing the phonological reality that in Tajik, they are very often indistinguishable, except in careful pronunciation such as poetry recitation. Thus, ‘milk’ is шир in the Tajik script.
  4. The Tajik alphabet represents sounds, and letters which sound the same in the Perso-Arabic script are represented by one letter, which facilitates learning but obscures etymology.
  5. The izafa (ezafe) is not written out in the Perso-Arabic script, but is always written out (as и) in the Cyrillic.
  6. The Tajik long ā, represented by the Cyrillic letter o, is more rounded than it is in Iran and Afghanistan, and some speakers may pronounce it with excessive roundedness, resulting in a sound very much like the ‘or’ in the British pronunciation of ‘lord’.
  7. Some speakers may pronounce it as if it has an umlaut, i.e. somewhat like ö in German or Turkish. This is not regarded as standard by many.
  8. The initial glottal stop is not written in the official script, e.g. عضو -> узв, although scholars have proposed that ъ be written out to represent the etymological glottal stop.
  9. This letter is largely used in Russian loan words, but it also represents the short i in word-initial position when followed by what would be ع, ء or ح in the Perso-Arabic script. In this position the short i sounds like a short e, which is the Iranian pronunciation.