SANSKRIT PART-OF-SPEECH TAGSET

This POS tagset for Sanskrit was developed by Dr. R. Chandrashekar as part of his Ph.D. thesis ‘Part-of-Speech Tagging for Sanskrit’ (2007), under the supervision of Dr. Girish Nath Jha at the Special Centre for Sanskrit Studies, Jawaharlal Nehru University, New Delhi.

The tagset is classified according to the morphological structure of Sanskrit words. There are three kinds of tags in this tagset: word class main tags, feature sub-tags, and punctuation tags. A complete tag combines a word class main tag with its feature sub-tags, separated by an underscore delimiter (indeclinable and punctuation tags do not have sub-tags). All tags bear Sanskrit names, abbreviated as letter-digit acronyms in Roman script.
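As a minimal sketch of the composition rule just described, a main tag and its sub-tags can be joined with the underscore delimiter. The helper name `compose_tag` is hypothetical and not part of the tagset; it covers only the "main tag followed by sub-tags" pattern, not the verb tags, where the pada marker precedes the lakAra tag.

```python
def compose_tag(main_tag, *sub_tags):
    """Join a word class main tag with its feature sub-tags using '_'."""
    return "_".join([main_tag, *sub_tags])


# A common noun, masculine, case 3, singular.
print(compose_tag("N", "p", "3.1"))  # N_p_3.1
# Indeclinable tags take no sub-tags, so the main tag stands alone.
print(compose_tag("AV"))             # AV
```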

All words in Sanskrit except avyaya-s are inflected, so they all receive feature sub-tags. These inflections (vibhakti-s) also indicate the syntactic relations between the words in a sentence, so the tags in this tagset are morpho-syntactic. A few tags are semantic as well as morphological in nature: NK and NB, the tags for nouns with the sense of an agent and for impersonal/abstract nouns, are morpho-semantic. The tagset thus has tags with morphological, syntactic and semantic features. It was developed as a general-purpose tagset that can be used for shallow parsing of simple, sandhi-split Sanskrit prose.

The tagset has 65 word class tags, 43 feature sub-tags, 25 punctuation tags, and one tag, AJ, for unknown words: a total of 134 tags.

The word class tags are 8 Noun tags, 8 Pronoun tags, 3 Adjective tags, 9 Participle tags, 2 Number tags, 14 Compound tags, 11 indeclinable tags and 10 verb tags.
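The counts stated above can be verified with a short arithmetic check. This is only an illustrative sketch; the dictionary keys are informal labels chosen here, not tag names.

```python
# Word class counts as listed in the text above.
word_class_counts = {
    "noun": 8, "pronoun": 8, "adjective": 3, "participle": 9,
    "number": 2, "compound": 14, "indeclinable": 11, "verb": 10,
}
word_class_total = sum(word_class_counts.values())

# Word class tags + feature sub-tags + punctuation tags + the AJ tag.
grand_total = word_class_total + 43 + 25 + 1

print(word_class_total, grand_total)  # 65 134
```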

Feature tags are 3 Gender tags (p, s, n); 8x3 = 24 (Nominal) Case and Number tags (1.1 through 8.3); 4 Verb base modifying tags (Nd, Yn, Sn, Ni); 1 Verbal Preposition tag (UPA); 2 Pada tags (P and A); 2 Voice tags (Kr and Bh; the default kartari is unmarked); 3x3 = 9 (Verbal) Person and Number tags (1.1 through 3.3).

Annotation Guidelines

 

Noun Tags

Gender sub-tags: p,s,n (for masculine, feminine and neuter)

Declensional sub-tags: 1.1 for prathama vibhakti eka vacana,

1.2 for prathama vibhakti dvi vacana, and so on,

8.1 for sambodhana prathama vibhakti eka vacana
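The full set of 8x3 declensional sub-tags can be enumerated mechanically. The sketch below is an illustration only: the plain-ASCII spellings of the vibhakti names other than prathama and sambodhana are a romanization chosen here, and the names `VIBHAKTI_NAMES`, `VACANA_NAMES` and `declension_subtags` are hypothetical.

```python
# Case names in order 1..8 (ASCII romanization is an assumption of this sketch).
VIBHAKTI_NAMES = [
    "prathama", "dvitiya", "tritiya", "caturthi",
    "pancami", "shashthi", "saptami", "sambodhana prathama",
]
# Number names in order 1..3.
VACANA_NAMES = ["eka", "dvi", "bahu"]


def declension_subtags():
    """Map each 'case.number' sub-tag to a readable gloss."""
    return {
        f"{case}.{number}": (
            f"{VIBHAKTI_NAMES[case - 1]} vibhakti {VACANA_NAMES[number - 1]} vacana"
        )
        for case in range(1, 9)
        for number in range(1, 4)
    }


subtags = declension_subtags()
print(len(subtags))    # 24
print(subtags["1.2"])  # prathama vibhakti dvi vacana
```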

N

Nāmapada (Common Noun, with gender and declensional sub-tags) (e.g. adriḥ, grāma, rāja)

NA

Nāma Abhidhāna (Proper Noun, with gender, number and declensional sub-tags) (e.g. indraḥ, girīśaḥ, rādhikā, ḍitthaḥ)

NAD

Nāma Abhidhāna Desa (Proper Noun Country, with gender, number and declensional sub-tags) (e.g. vidarbhāḥ, lāṭāḥ)

NAP

Nāma Abhidhāna Pum_apatya (Patronymic Noun, with gender, number, and declensional sub-tags) (e.g. rāghavaḥ, vaiśravaṇaḥ, jānakī)

NAS

Nāma Abhidhāna Stri_apatya (Metronymic Noun, with gender, number, and declensional sub-tags) (e.g., pārthaḥ, saumitriḥ)

NAT

Nāma Abhidhāna Tadraja (Tadraja Noun, with gender, number, and declensional sub-tags) (e.g. maithilaḥ, vaidehaḥ)

NS

Nāma Sannaarthaka (Noun Desiderative, with gender, number, and declensional sub-tags) (e.g., pipāsā, cikitsā, cikīrṣuḥ)

 

E.g.: वेगेन - N_p_3.1 (nāmapada-puṃliṅga-tṛtīyā-vibhakti-eka-vacana) and tagged as - वेगेन[N_p_3.1] (tag within the square brackets)
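An annotated nominal token of the form word[TAG] can be split back into its parts with a small parser. This is a sketch under stated assumptions: the function name `parse_nominal` and the returned field names are hypothetical, and it handles only tags with exactly three underscore-separated parts (main tag, gender, case.number).

```python
import re

GENDER = {"p": "masculine", "s": "feminine", "n": "neuter"}

# word[TAG]: anything without brackets, then the tag inside square brackets.
TOKEN_RE = re.compile(r"^(?P<word>[^\[\]]+)\[(?P<tag>[^\[\]]+)\]$")


def parse_nominal(token):
    """Split 'word[MAIN_gender_case.number]' into its components."""
    match = TOKEN_RE.match(token)
    if match is None:
        raise ValueError(f"not an annotated token: {token!r}")
    main, gender, case_number = match.group("tag").split("_")
    case, number = case_number.split(".")
    return {
        "word": match.group("word"),
        "main": main,
        "gender": GENDER[gender],
        "case": int(case),
        "number": int(number),
    }


print(parse_nominal("वेगेन[N_p_3.1]"))
```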

 

Compounds

NCDI

Nāmapada which is a Compound of Dvandva Itaretara type (Coordinative Enumerative Compound which is a Noun, with gender and declensional sub-tags) (e.g. rāmakṛṣṇau, rāmalakṣmaṇabharataśatrughnāḥ)

NCDS

Nāmapada which is a Compound of Dvandva Samāhāra type (Coordinative Collective Compound which is a Noun, with neuter gender, singular number and declensional sub-tags) (e.g. pāṇipādam, āhāranidrābhayam)

NCT2

Nāmapada in 2nd case Compounded to form Tatpuruṣa dvitīyā type (Determinative Compound (accusative), with gender, number and declensional sub-tags) (e.g. kṛṣṇaśritaḥ, duḥkhātītaḥ)

NCT3

Nāmapada in 3rd case Compounded to form Tatpuruṣa tṛtīyā type (Determinative Compound (instrumental), with gender, number and declensional sub-tags) (e.g. haritrātaḥ, nakhabhinnaḥ, vākkalahaḥ)

NCT4

Nāmapada in 4th case Compounded to form Tatpuruṣa caturthī type (Determinative Compound (dative), with gender, number and declensional sub-tags) (e.g. yūpadāru, gohitam)

NCT5

Nāmapada in 5th case Compounded to form Tatpuruṣa pañcamī type (Determinative Compound (ablative), with gender, number and declensional sub-tags) (e.g. corabhayam, svargapatitaḥ)

NCT6

Nāmapada in 6th case Compounded to form Tatpuruṣa ṣaṣṭhī type (Determinative Compound (genitive), with gender, number and declensional sub-tags) (e.g. rājapuruṣaḥ, devendraḥ)

NCT7

Nāmapada in 7th case Compounded to form Tatpuruṣa saptamī type (Determinative Compound (locative), with gender, number and declensional sub-tags) (e.g. akṣaśauṇḍaḥ, īśvarādhīnaḥ)

NCAl

Nāmapada Compounded without the deletion of case inflection to form Aluk type (with gender, number and declensional sub-tags) (e.g. dhanañjayaḥ)

NCNT

Nāmapada Compounded with negation to form Nañ Tatpuruṣa type (with gender, number and declensional sub-tags) (e.g. abrāhmaṇaḥ)

NCK

Karmadhāraya Compound (with gender, number and declensional sub-tags) (e.g. ghanaśyāmaḥ, mukhakamalam, kṛṣṇasarpaḥ, śuklakṛṣṇaḥ)

NCD

Dvigu Compound (with gender, number and declensional sub-tags) (e.g. ṣaṇmāturaḥ, pañcavaṭī, pañcatantram)

NCB

Bahuvrīhi Compound (with gender, number and declensional sub-tags) (e.g. pītāmbaraḥ, cakrapāṇiḥ)

NCA

Avyaya Compound (e.g. yathāśakti, amśāmśi) Adverbial Compound

 

Pronoun Tags

With/without gender, number and declension sub-tags

SN

 

Sarva Nāman (Pronoun Other, with gender, number, and declensional sub-tags) (e.g., anyaḥ, aparā)

SNU

Sarva Nāman Uttama (Pronoun First Person, with number and declensional sub-tags) (e.g., asmad)

SNM

Sarva Nāman Madhyama (Pronoun Second Person, with number and declensional sub-tags) (e.g., tvad)

SNA

Sarva Nāman Ātman (Pronoun Reflexive, with or without gender, number, and declensional sub-tags) (e.g., nijaḥ, svasya)

SNN

Sarva Nāman Nirdesatmaka (Pronoun Demonstrative, with gender, number, and declensional sub-tags) (e.g., idam, saḥ)

SNP

Sarva Nāman Praśnārthika (Pronoun Interrogative, with gender, number, and declensional sub-tags) (e.g., kim, kad)

SNS

Sarva Nāman Sāmbandhika (Pronoun Relative, with gender, number, and declensional sub-tags) (e.g., yaḥ, yā)

 

अस्य[SNN_p_6.1] (Pronoun demonstrative masculine ṣaṣṭhī-vibhakti eka-vacana)

 

Adjective Tags

With gender, number and declension sub-tags

NVI

 

Nāma VIśeṣaṇa (Adjective, with gender, number, and declensional sub-tags) (e.g., sundaraḥ, krūrā)

NVIT

Nāma VIśeṣaṇa Tulanatmaka (Adjective Comparative, with gender, number, and declensional sub-tags) (e.g., alpabhāgyataraḥ, śreyaḥ)

NVIA

Nāma VIśeṣaṇa Atishayavaci (Adjective Superlative, with gender, number, and declensional sub-tags) (e.g., sattamaḥ, jyeṣṭhā)

विशालाः[NVI_p_1.3] (Nāma VIśeṣaṇa masculine prathama-vibhakti bahu-vacana)

 

Number Tags

With gender, number and declension sub-tags

SAM

Saṃkhyā (Cardinal Number, with gender, number, and declensional sub-tags) (e.g., ekaḥ, dve)

SAMY

Saṃkhyeya (Ordinal Number, with gender, number, and declensional sub-tags) (e.g., prathamaḥ, turīyā)

 

अष्टौ[SAM_p_1.3] (Cardinal Number masculine prathama-vibhakti bahu-vacana)

 

Participle Tags

With gender, number and declension sub-tags

Extra: with Nic: Ni_KB2_psn = kAritavat

with Nominal: Nd_Ni_KB2_psn = kRupAyita

KV1

Krdanta Vartamana 1 (Satr, with gender and declensional sub-tags) (e.g. kurvan, gacchat) Present Active Participle

KV2

Krdanta Vartamana 2 (Sanac, Satr, with gender and declensional sub-tags) (e.g. labhamanah, vardhamanam) Present Middle/Active Participle

KB1

Krdanta Bhuta 1 (Kta, with gender and declensional sub-tags) (e.g. drstah, gatam) Past Passive Participle

KB2

Krdanta Bhuta 2 (Ktavat, with gender and declensional sub-tags) (e.g. uktavat, drstavan) Past Active Participle

KAa

Krdanta Agami a (sya-satr, with gender and declensional sub-tags) (e.g. karisyat) Future Active Participle

KAb

Krdanta Agami b (sya-sanac, with gender and declensional sub-tags) (e.g. karisyamana) Future Passive Participle

KVI1

Krdanta VIdhyarthaka 1 (-ya, with gender and declensional sub-tags) (e.g. karya) Gerundive

KVI2

Krdanta VIdhyarthaka 2 (-tavya, with gender and declensional sub-tags) (e.g. kartavya) Gerundive

KVI3

Krdanta VIdhyarthaka 3 (-aniya, with gender and declensional sub-tags) (e.g. karaniya) Gerundive

 

क्रियमाणः[KV2_p_1.1] (vartamāna-kṛt-śatranta puṃliṅga prathamā-vibhakti eka-vacana)

 

Verb Tags

For 'Atmane pada' - add 'A' before the lakAra tag

For 'parasmai pada' - add 'P' before the lakAra tag

 

lakAra tags

laT lakAra Vartamana - laTV

liT lakAra Bhuta - liTB

luT lakAra Agami - luTAg

lR^iT lakAra Agami - lR^iTAg

loT lakAra Ajna - loTA

la~N lakAra Bhuta - la~NB

li~N lakAra Vidhi - li~NVi

li~N lakAra AshI - li~NAs

lu~N lakAra Bhuta - lu~NB

lR^i~N lakAra Sanketa - lR^i~NS

 

For 'purusha' and 'vacana':

1.1 - prathama purusha eka vacana

1.2 - prathama purusha dvi vacana

1.3 - prathama purusha bahu vacana

2.1 - madhyama purusha eka vacana

2.2 - madhyama purusha dvi vacana

2.3 - madhyama purusha bahu vacana

3.1 - uttama purusha eka vacana

3.2 - uttama purusha dvi vacana

3.3 - uttama purusha bahu vacana
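The person and number sub-tags follow a regular pattern: the digit before the dot is the purusha (1 prathama, 2 madhyama, 3 uttama) and the digit after it is the vacana (1 eka, 2 dvi, 3 bahu). A minimal decoding sketch, with the hypothetical helper name `describe_person_number`:

```python
PURUSHA = {1: "prathama", 2: "madhyama", 3: "uttama"}
VACANA = {1: "eka", 2: "dvi", 3: "bahu"}


def describe_person_number(sub_tag):
    """Expand a verbal 'person.number' sub-tag such as '2.3' into words."""
    person, number = (int(part) for part in sub_tag.split("."))
    return f"{PURUSHA[person]} purusha {VACANA[number]} vacana"


print(describe_person_number("1.1"))  # prathama purusha eka vacana
print(describe_person_number("2.3"))  # madhyama purusha bahu vacana
```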

 

These 'puruṣa' and 'vacana' sub-tags come after the lakAra tag.

laTV

VLaT (preceded by either P (parasmai) or A (atmane), and followed by purusa and vacana sub-tags) Vartamana Present Tense (bhavati)

liTB

BLiT (preceded by either P (parasmai) or A (atmane), and followed by purusa and vacana sub-tags) Bhuta Past Tense (babhuva)

luTAg

AgLuT (preceded by either P (parasmai) or A (atmane), and followed by purusa and vacana sub-tags) Agami Future Tense (bhavita)

lRuTAg

AgLRuT (preceded by either P (parasmai) or A (atmane), and followed by purusa and vacana sub-tags) Agami Future Tense (bhavisyati)

loTA

ALoT (preceded by either P (parasmai) or A (atmane), and followed by purusa and vacana sub-tags) Ajna Imperative mood (bhavatu)

la~gB

BLa~g (preceded by either P (parasmai) or A (atmane), and followed by purusa and vacana sub-tags) Bhuta Past Tense (abhavat)

li~gVi

Vidhi li~g (preceded by either P (parasmai) or A (atmane), and followed by purusa and vacana sub-tags) Vidhi Potential mood (bhavet)

li~gAs

Ashir li~g (preceded by either P (parasmai) or A (atmane), and followed by purusa and vacana sub-tags) Ashih Benedictive mood (bhuyat)

lu~gB

BLu~g (preceded by either P (parasmai) or A (atmane), and followed by purusa and vacana sub-tags) Bhuta Past Tense (abhut)

lRu~gS

SLRu~g (preceded by either P (parasmai) or A (atmane), and followed by purusa and vacana sub-tags) Sanketa Conditional mood (abhavisyat)

 

E.g. - for 'वहति' - the tag will be - P_laTV_1.1 - (parasmai-pada laT-lakAra prathama-purusha eka-vacana) and tagged as वहति[P_laTV_1.1] (tag within square brackets)
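The verb tag layout in this example (pada marker, lakAra tag, then person.number) can be sketched as a small builder. The function name `verb_tag` is hypothetical, and the set of lakAra tags uses the spellings from the 'lakAra tags' list above, which is an assumption about which spelling is canonical.

```python
# lakAra tag spellings taken from the 'lakAra tags' list (assumed canonical here).
LAKARA_TAGS = {
    "laTV", "liTB", "luTAg", "lR^iTAg", "loTA",
    "la~NB", "li~NVi", "li~NAs", "lu~NB", "lR^i~NS",
}


def verb_tag(pada, lakara, person, number):
    """Build a finite verb tag such as 'P_laTV_1.1'."""
    if pada not in ("P", "A"):
        raise ValueError("pada must be 'P' (parasmai) or 'A' (atmane)")
    if lakara not in LAKARA_TAGS:
        raise ValueError(f"unknown lakAra tag: {lakara!r}")
    if person not in (1, 2, 3) or number not in (1, 2, 3):
        raise ValueError("person and number must each be 1, 2 or 3")
    return f"{pada}_{lakara}_{person}.{number}"


print(verb_tag("P", "laTV", 1, 1))  # P_laTV_1.1
```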

 

Other Sub-tags:

Ni – ṇijanta – Causal Verb – precedes lakāra tag

e.g. kārayati[P_Ni_laTV_1.1]

Sn – sannanta – Desiderative Verb – precedes lakāra tag

e.g. cikīrṣati[P_Sn_laTV_1.1]

Nd – nāmadhātu – Nominal Verb – precedes lakāra tag

e.g. putrīyati[P_Nd_laTV_1.1]

Kr – karmani – Passive Verb – precedes lakāra tag

e.g. paṭhyate[A_Kr_laTV_1.1]

Bh – bhave – precedes lakāra tag

e.g. bhūyate[A_Bh_laTV_1.1]
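Because these sub-tags sit between the pada marker and the lakāra tag, a verb tag can be decomposed by position. The sketch below assumes the layout pada, optional modifier(s), lakāra tag, person.number; the function name `parse_verb_tag` and the returned field names are hypothetical.

```python
# Sub-tags that may appear between the pada marker and the lakara tag.
MODIFIERS = {"Ni", "Sn", "Nd", "Kr", "Bh"}


def parse_verb_tag(tag):
    """Split a verb tag like 'P_Ni_laTV_1.1' into its components."""
    parts = tag.split("_")
    if len(parts) < 3:
        raise ValueError(f"too few parts in verb tag: {tag!r}")
    pada, middle, person_number = parts[0], parts[1:-1], parts[-1]
    # The last middle element is the lakara tag; anything before it is a modifier.
    lakara, modifiers = middle[-1], middle[:-1]
    unknown = [m for m in modifiers if m not in MODIFIERS]
    if pada not in ("P", "A") or unknown:
        raise ValueError(f"unrecognised verb tag: {tag!r}")
    person, number = person_number.split(".")
    return {
        "pada": pada,
        "modifiers": modifiers,
        "lakara": lakara,
        "person": int(person),
        "number": int(number),
    }


print(parse_verb_tag("P_Ni_laTV_1.1"))
print(parse_verb_tag("A_Kr_laTV_1.1"))
```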

 

Avyaya Tags

  Tag

Description & Examples

AV

AVyaya (e.g. atha, iva, nu, saha) Particles

AVN

AVyaya Niṣedhārthaka (e.g. na, naiva, nahi, mā) Negative

AVC

AVyaya Conjunctive (e.g. ca, tu) Conjunctive

AVD

AVyaya Disjunctive (e.g. vā, athavā) Disjunctive

AVP

AVyaya Praśnārthika (e.g. api, kinnu) Interrogative

AVT

AVyaya Tumunnanta (e.g. gantum, pātum) Infinitive

AVK

AVyaya Ktvānta (e.g. bhuktvā, paṭhitvā) Gerund

AVL

AVyaya Lyabanta (e.g. ava-lambya, pra-hṛtya) Gerund

AVKV

AVyaya KriyāViśeṣaṇa (e.g. uccaiḥ, nīcaiḥ) Adverbs

UD

UDgara (e.g. hā, hanta) Interjection

 

नमः[AV], उच्चैः[AVKV]
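Since avyaya tags carry no sub-tags, tagging a known indeclinable reduces to a lexicon lookup. The following is a minimal sketch with a tiny hand-built lexicon drawn from the examples listed above; `AVYAYA_LEXICON` and `tag_avyaya` are hypothetical names, and a real tagger would need a far larger word list.

```python
# A few indeclinables from the examples above, mapped to their tags.
AVYAYA_LEXICON = {
    "atha": "AV", "iva": "AV", "nu": "AV", "saha": "AV",
    "na": "AVN", "naiva": "AVN", "nahi": "AVN",
    "ca": "AVC", "tu": "AVC",
    "gantum": "AVT",
}


def tag_avyaya(word):
    """Return 'word[TAG]' for a known indeclinable, or None if not listed."""
    tag = AVYAYA_LEXICON.get(word)
    return f"{word}[{tag}]" if tag is not None else None


print(tag_avyaya("ca"))      # ca[AVC]
print(tag_avyaya("gantum"))  # gantum[AVT]
```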

 

Punctuation and other Tags

|

PUN_VV - Punctuation Vākya Virāma - sentence end marker, half shloka marker, etc.

||

PUN_SA - Punctuation Śloka Anta - shloka end marker

,

PUN_LV – Punctuation Laghu VirŒma – comma

?

PUN_PC – Punctuation Praśna Cihna - question mark

!

PUN_AC – Punctuation Āścarya Cihna - exclamation mark

‘

PUN_UC – Punctuation Uddharana Cihna 1 -  quote open

’

PUN_SC – Punctuation Samvarana Cihna 1 - quote close

“

PUN_UCd – Punctuation Uddharana Cihnadvaya - double quote open

”

PUN_SCd – Punctuation Samvarana Cihnadvaya - double quote close

(

PUN_VU1 – Punctuation Valaya-Uddharana Cihna 1 - open braces

)

PUN_VS1 – Punctuation Valaya-Samvarana Cihna 1 - close braces

[

PUN_VU2 – Punctuation Valaya-Uddharana Cihna 2 - open square bracket

]

PUN_VS2 – Punctuation Valaya-Samvarana Cihna 2 - close square bracket

{

PUN_VU3 – Punctuation Valaya-Uddharana Cihna 3 - open flower bracket

}

PUN_VS3 – Punctuation Valaya-Samvarana Cihna 3 - close flower bracket

-

PUN_DS – dash

:

PUN_CL – colon

;

PUN_VA – Vākya-anga Anta - semi-colon

/

PUN_BS – Punctuation back slash

+

PUN_PL - Punctuation  plus sign

=

PUN_EQ - Punctuation equal sign

.

PUN_BIN – bindu - Punctuation  dot

*

PUN_LAG - Laghvikarana - abbreviation marker ( पं* for पंडित)

AB

AB AnyaBhāṣā - foreign word

SAM-SAM

SAM-SAM hyphenated number (१९४७-२००६)
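The punctuation tags lend themselves to a lookup table. One detail worth handling is that "||" must be tried before "|", otherwise a shloka end marker would be read as two sentence end markers. The sketch below covers only a subset of the symbols listed above and uses the ASCII bar as it appears in this document; `PUNCT_TAGS` and `match_punctuation` are hypothetical names.

```python
# A subset of the punctuation tags listed above.
PUNCT_TAGS = {
    "||": "PUN_SA", "|": "PUN_VV", ",": "PUN_LV", "?": "PUN_PC",
    "!": "PUN_AC", "(": "PUN_VU1", ")": "PUN_VS1", "-": "PUN_DS",
    ":": "PUN_CL", ";": "PUN_VA",
}


def match_punctuation(text):
    """Return (symbol, tag) for the longest symbol starting `text`, else None."""
    # Longest symbols first, so '||' wins over '|'.
    for symbol in sorted(PUNCT_TAGS, key=len, reverse=True):
        if text.startswith(symbol):
            return symbol, PUNCT_TAGS[symbol]
    return None


print(match_punctuation("|| 12"))  # ('||', 'PUN_SA')
print(match_punctuation("| x"))    # ('|', 'PUN_VV')
```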