SANSKRIT PART-OF-SPEECH TAGSET

This POS tagset for Sanskrit was developed by Dr. R. Chandrashekar as part of the Ph.D. thesis Part-of-Speech Tagging for Sanskrit (2007), under the supervision of Dr. Girish Nath Jha at the Special Centre for Sanskrit Studies, Jawaharlal Nehru University, New Delhi.

The tagset is organized according to the morphological structure of Sanskrit words. It has three kinds of tags: word class main tags, feature sub-tags and punctuation tags. A complete tag combines a word class main tag with its feature sub-tags, separated by an underscore delimiter (indeclinable and punctuation tags have no sub-tags). All tags bear Sanskrit names[1], abbreviated as letter-digit acronyms in Roman script.
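Because sub-tags are joined with a single delimiter, a tag can be taken apart mechanically. A minimal Python sketch; the function name `split_tag` is hypothetical, and treating the `PUN_` prefix as part of an atomic punctuation tag is an assumption, since punctuation tags contain an underscore but carry no sub-tags.

```python
def split_tag(tag: str) -> list[str]:
    """Split a tag into its underscore-separated parts.

    Punctuation tags (PUN_...) are treated as atomic and returned whole;
    indeclinable tags such as AVKV simply have no underscore to split on.
    """
    if tag.startswith("PUN_"):
        return [tag]
    return tag.split("_")


print(split_tag("N_p_3.1"))     # ['N', 'p', '3.1']
print(split_tag("P_laTV_1.1"))  # ['P', 'laTV', '1.1']
print(split_tag("AVKV"))        # ['AVKV']
print(split_tag("PUN_VV"))      # ['PUN_VV']
```

The parts are not always in main-tag-first order: a verb tag starts with its pada sub-tag (P or A), and the lakāra tag comes after it.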

All Sanskrit words except avyaya-s (indeclinables) are inflected, so every word except an avyaya receives feature sub-tags. These inflections (vibhakti-s) also indicate the syntactic relations between the words of a sentence, so the tags in this tagset are morpho-syntactic. A few tags are semantic as well as morphological: NK and NB, the tags for agent nouns and for impersonal/abstract nouns, are morpho-semantic in nature. The tagset thus has tags with morphological, syntactic and semantic features. It was developed as a general-purpose tagset for shallow parsing of simple, sandhi-split Sanskrit prose.

The tagset has 65 word class tags, 43 feature sub-tags, 25 punctuation tags and one tag, AJ, for unknown words: a total of 134 tags.

The word class tags comprise 8 noun tags, 8 pronoun tags, 3 adjective tags, 9 participle tags, 2 number tags, 14 compound tags, 11 indeclinable tags and 10 verb tags.
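The per-class figures add up to the stated totals. A short Python check of the arithmetic, using only the numbers given above (the variable names are illustrative):

```python
# Per-class word tag counts, copied from the text above.
WORD_CLASS_COUNTS = {
    "noun": 8,
    "pronoun": 8,
    "adjective": 3,
    "participle": 9,
    "number": 2,
    "compound": 14,
    "indeclinable": 11,
    "verb": 10,
}

word_class_total = sum(WORD_CLASS_COUNTS.values())
FEATURE_SUBTAGS = 43
PUNCTUATION_TAGS = 25
UNKNOWN_WORD_TAGS = 1  # the single AJ tag

grand_total = word_class_total + FEATURE_SUBTAGS + PUNCTUATION_TAGS + UNKNOWN_WORD_TAGS
print(word_class_total, grand_total)  # 65 134
```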

The feature sub-tags are: 3 gender tags (p, s, n); 8x3 = 24 nominal case-and-number tags (1.1 through 8.3); 4 verb-base modifying tags (Nd, Yn, Sn, Ni); 1 verbal preposition tag (UPA); 2 pada tags (P and A); 2 voice tags (Kr and Bh; the default, kartari, is unmarked); and 3x3 = 9 verbal person-and-number tags (1.1 through 3.3).

Annotation Guidelines

 

Noun Tags

Gender sub-tags: p,s,n (for masculine, feminine and neuter)

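Declensional sub-tags: the first digit gives the vibhakti (case; 8 is sambodhana, the vocative) and the second gives the vacana (number), e.g.

1.1 for prathamā vibhakti eka vacana

1.2 for prathamā vibhakti dvi vacana, and so on

8.1 for sambodhana prathamā vibhakti eka vacana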
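A minimal Python decoder for these case-number sub-tags; the names `VIBHAKTI`, `VACANA` and `decode_case_number` are illustrative, not part of the tagset.

```python
# First digit: vibhakti (1-7, with 8 for sambodhana, the vocative).
VIBHAKTI = {
    1: "prathamā", 2: "dvitīyā", 3: "tṛtīyā", 4: "caturthī",
    5: "pañcamī", 6: "ṣaṣṭhī", 7: "saptamī", 8: "sambodhana",
}
# Second digit: vacana (number).
VACANA = {1: "eka vacana", 2: "dvi vacana", 3: "bahu vacana"}


def decode_case_number(subtag: str) -> tuple[str, str]:
    """Turn a nominal sub-tag such as '3.1' into (vibhakti, vacana)."""
    case_digit, number_digit = (int(part) for part in subtag.split("."))
    if case_digit not in VIBHAKTI or number_digit not in VACANA:
        raise ValueError(f"not a nominal case-number sub-tag: {subtag!r}")
    return VIBHAKTI[case_digit], VACANA[number_digit]


print(decode_case_number("1.2"))  # ('prathamā', 'dvi vacana')
print(decode_case_number("8.1"))  # ('sambodhana', 'eka vacana')
```

Malformed input such as "9.1" or "1.2.3" raises ValueError instead of returning a guess.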

N

Nāmapada (Common Noun, with gender and declensional sub-tags) (e.g. adri, grāma, rāja)

NA

Nāma Abhidhāna (Proper noun, with gender, number and declensional sub-tags) (e.g. indrah, gira, rādhikā, ittha)

NAD

Nāma Abhidhāna Desa (Proper Noun Country, with gender, number and declensional sub-tags) (e.g. vidarbhā, lāṭā)

NAP

Nāma Abhidhāna Pum_apatya (Patronymic Noun, with gender, number, and declensional sub-tags) (e.g. rāghava, vaiśravaṇa, jānakī)

NAS

Nāma Abhidhāna Stri_apatya (Metronymic Noun, with gender, number, and declensional sub-tags) (e.g., pārtha, saumitri)

NAT

Nāma Abhidhāna Tadraja (Tadraja Noun, with gender, number, and declensional sub-tags) (e.g. maithila, vaideha)

NS

Nāma Sannaarthaka (Noun Desiderative, with gender, number, and declensional sub-tags) (e.g., pipāsā, cikitsā, cikīrṣu)

 

E.g.: वेगेन - N_p_3.1 (nāmapada-puṃliṅga-tṛtīyā-vibhakti-eka-vacana), tagged as वेगेन[N_p_3.1] (tag within square brackets)
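The word[TAG] notation can be read back mechanically. A minimal Python sketch, assuming each tagged token is a single whitespace-free string; `TOKEN_RE` and `parse_token` are hypothetical names.

```python
import re

# A tagged token is the word immediately followed by its tag in square
# brackets. The word part is matched lazily so that bracket punctuation
# tokens such as "[[PUN_VU2]" still split correctly.
TOKEN_RE = re.compile(r"(?P<word>.+?)\[(?P<tag>[^\[\]]+)\]")


def parse_token(token: str) -> tuple[str, str]:
    """Split 'word[TAG]' into (word, TAG)."""
    match = TOKEN_RE.fullmatch(token)
    if match is None:
        raise ValueError(f"not a tagged token: {token!r}")
    return match.group("word"), match.group("tag")


print(parse_token("वेगेन[N_p_3.1]"))  # ('वेगेन', 'N_p_3.1')
print(parse_token("[[PUN_VU2]"))      # ('[', 'PUN_VU2')
```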

 

Compounds

NCDI

Nāmapada which is a Compound of Dvandva Itaretara type (Coordinative Enumerative Compound which is a Noun, with gender and declensional sub-tags) (e.g. rāmakṛṣṇau, rāmalakṣmaṇabharataśatrughnāḥ)

NCDS

Nāmapada which is a Compound of Dvandva Samāhāra type (Coordinative Collective Compound which is a Noun, with neuter gender, singular number and declensional sub-tags) (e.g. pāṇipādam, āhāranidrābhayam)

NCT2

Nāmapada in 2nd case Compounded to form Tatpuruṣa dvitīyā type (Determinative Compound (accusative), with gender, number and declensional sub-tags) (e.g. kṛṣṇaśrita, duḥkhātīta)

NCT3

Nāmapada in 3rd case Compounded to form Tatpuruṣa tṛtīyā type (Determinative Compound (instrumental), with gender, number and declensional sub-tags) (e.g. haritrāta, nakhabhinna, vākkalaha)

NCT4

Nāmapada in 4th case Compounded to form Tatpuruṣa caturthī type (Determinative Compound (dative), with gender, number and declensional sub-tags) (e.g. yūpadāru, gohitam)

NCT5

Nāmapada in 5th case Compounded to form Tatpuruṣa pañcamī type (Determinative Compound (ablative), with gender, number and declensional sub-tags) (e.g. corabhayam, svargapatita)

NCT6

Nāmapada in 6th case Compounded to form Tatpuruṣa ṣaṣṭhī type (Determinative Compound (genitive), with gender, number and declensional sub-tags) (e.g. rājapuruṣa, devendra)

NCT7

Nāmapada in 7th case Compounded to form Tatpuruṣa saptamī type (Determinative Compound (locative), with gender, number and declensional sub-tags) (e.g. akṣaśauṇḍa, vardhna)

NCAl

Nāmapada Compounded without the deletion of case inflection to form Aluk type (with gender, number and declensional sub-tags) (e.g. dhanañjaya)

NCNT

Nāmapada Compounded with negation to form Nañ Tatpuruṣa type (with gender, number and declensional sub-tags) (e.g. abrāhmaṇa)

NCK

Karmadhāraya Compound (with gender, number and declensional sub-tags) (e.g. ghanaśyāma, mukhakamalam, kṛṣṇasarpa, śuklakṛṣṇa)

NCD

Dvigu Compound (with gender, number and declensional sub-tags) (e.g. ṣāṇmātura, pañcavaṭī, pañcatantram)

NCB

Bahuvrīhi Compound (with gender, number and declensional sub-tags) (e.g. pītāmbara, cakrapāṇi)

NCA

Avyaya Compound (e.g. yathāśakti, amāmi) Adverbial Compound

 

Pronoun Tags

With/without gender, number and declension sub-tags

SN

 

Sarva Nāman (Pronoun Other, with gender, number, and declensional sub-tags) (e.g., anya, aparā)

SNU

Sarva Nāman Uttama (Pronoun First Person, with number and declensional sub-tags) (e.g., asmad)

SNM

Sarva Nāman Madhyama (Pronoun Second Person, with number and declensional sub-tags) (e.g., tvad)

SNA

Sarva Nāman Ātman (Pronoun Reflexive, with or without gender, number, and declensional sub-tags) (e.g., nija, svasya)

SNN

Sarva Nāman Nirdesatmaka (Pronoun Demonstrative, with gender, number, and declensional sub-tags) (e.g., idam, sa)

SNP

Sarva Nāman Praśnārthika (Pronoun Interrogative, with gender, number, and declensional sub-tags) (e.g., kim, kad)

SNS

Sarva Nāman Sāmbandhika (Pronoun Relative, with gender, number, and declensional sub-tags) (e.g., ya, yā)

 

अस्य[SNN_p_6.1] (Pronoun demonstrative masculine ṣaṣṭhī-vibhakti eka-vacana)

 

Adjective Tags

With gender, number and declension sub-tags

NVI

 

Nāma VIśeṣaṇa (Adjective, with gender, number, and declensional sub-tags) (e.g., sundara, krūrā)

NVIT

Nāma VIśeṣaṇa Tulanātmaka (Adjective Comparative, with gender, number, and declensional sub-tags) (e.g., alpabhāgyatara, śreyaḥ)

NVIA

Nāma VIśeṣaṇa Atishayavaci (Adjective Superlative, with gender, number, and declensional sub-tags) (e.g., sattama, jyeṣṭhā)

विशालाः[NVI_p_1.3] (Nāma VIśeṣaṇa masculine prathamā-vibhakti bahu-vacana)

 

Number Tags

With gender, number and declension sub-tags

SAM

Saṅkhyā (Cardinal Number, with gender, number, and declensional sub-tags) (e.g., eka, dve)

SAMY

Saṅkhyeya (Ordinal Number, with gender, number, and declensional sub-tags) (e.g., prathama, turīyā)

 

अष्टौ[SAM_p_1.3] (Cardinal Number masculine prathamā-vibhakti bahu-vacana)

 

Participle Tags

With gender, number and declension sub-tags

Extra: with ṇic (causative): Ni_KB2_psn, e.g. kāritavat (psn stands for the gender sub-tag p, s or n)

With a nominal (nāmadhātu) base: Nd_Ni_KB2_psn, e.g. kṛpāyita

KV1

Krdanta Vartamana 1 (Satr, with gender and declensional sub-tags) (e.g. kurvan, gacchat) Present Active Participle

KV2

Krdanta Vartamana 2 (Sanac, Satr, with gender and declensional sub-tags) (e.g. labhamanah, vardhamanam) Present Middle/Active Participle

KB1

Krdanta Bhuta 1 (Kta, with gender and declensional sub-tags) (e.g. drstah, gatam) Past Passive Participle

KB2

Krdanta Bhuta 2 (Ktavat, with gender and declensional sub-tags) (e.g. uktavat, drstavan) Past Active Participle

KAa

Krdanta Agami a (sya-satr, with gender and declensional sub-tags) (e.g. karisyat) Future Active Participle

KAb

Krdanta Agami b (sya-sanac, with gender and declensional sub-tags) (e.g. karisyamana) Future Passive Participle

KVI1

Krdanta VIdhyarthaka 1 (-ya, with gender and declensional sub-tags) (e.g. karya) Gerundive

KVI2

Krdanta VIdhyarthaka 2 (-tavya, with gender and declensional sub-tags) (e.g. kartavya) Gerundive

KVI3

Krdanta VIdhyarthaka 3 (-aniya, with gender and declensional sub-tags) (e.g. karaniya) Gerundive

 

क्रियमाणः[KV2_p_1.1] (vartamāna-kṛt-śānajanta puṃliṅga prathamā-vibhakti eka-vacana)

 

Verb Tags

For 'Atmane pada' - add 'A' before the lakAra tag

For 'parasmai pada' - add 'P' before the lakAra tag

 

lakAra tags

laT lakAra Vartamana - laTV

liT lakAra Bhuta - liTB

luT lakAra Agami - luTAg

lR^iT lakAra Agami - lR^iTAg

loT lakAra Ajna - loTA

la~N lakAra Bhuta - la~NB

li~N lakAra vidhi - li~NVi

li~N lakAra AshI - li~NAs

lu~N lakAra Bhuta - lu~NB

lR^i~N lakAra Sanketa - lR^i~NS

 

For 'purusha' and 'vacana' - 1.1 - prathama purusha eka vacana

1.2 - prathama purusha dvi vacana

1.3 - prathama purusha bahu vacana

2.1 - madhyama purusha eka vacana

2.2 - madhyama purusha dvi vacana

2.3 - madhyama purusha bahu vacana

3.1 - uttama purusha eka vacana

3.2 - uttama purusha dvi vacana

3.3 - uttama purusha bahu vacana
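The Sanskrit order of persons runs opposite to the English one (prathama puruṣa is the English third person, uttama the first), so a lookup table helps avoid mistakes. A minimal Python sketch; `PURUSHA`, `VACANA` and `decode_person_number` are illustrative names.

```python
# Sanskrit person names mapped to their English grammatical equivalents.
PURUSHA = {
    1: ("prathama", "third person"),
    2: ("madhyama", "second person"),
    3: ("uttama", "first person"),
}
VACANA = {1: "eka vacana", 2: "dvi vacana", 3: "bahu vacana"}


def decode_person_number(subtag: str) -> str:
    """Describe a verbal sub-tag such as '1.1' in words."""
    person_digit, number_digit = (int(part) for part in subtag.split("."))
    if person_digit not in PURUSHA or number_digit not in VACANA:
        raise ValueError(f"not a verbal person-number sub-tag: {subtag!r}")
    name, english = PURUSHA[person_digit]
    return f"{name} purusha {VACANA[number_digit]} ({english})"


print(decode_person_number("1.1"))  # prathama purusha eka vacana (third person)
print(decode_person_number("2.3"))  # madhyama purusha bahu vacana (second person)
```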

 

These puruṣa and vacana sub-tags come after the lakāra tag. Each lakāra tag below is preceded by either P (parasmai) or A (atmane) and followed by the puruṣa and vacana sub-tags.

laTV

VLaT: Vartamana, Present Tense (bhavati)

liTB

BLiT: Bhuta, Past Tense (babhuva)

luTAg

AgLuT: Agami, Future Tense (bhavita)

lR^iTAg

AgLRuT: Agami, Future Tense (bhavisyati)

loTA

ALoT: Ajna, Imperative mood (bhavatu)

la~NB

BLa~N: Bhuta, Past Tense (abhavat)

li~NVi

Vidhi li~N: Vidhi, Potential mood (bhavet)

li~NAs

Ashir li~N: Ashih, Benedictive mood (bhuyat)

lu~NB

BLu~N: Bhuta, Past Tense (abhut)

lR^i~NS

SLRu~N: Sanketa, Conditional mood (abhavisyat)

 

E.g. for 'वहति' the tag is P_laTV_1.1 (parasmai-pada laT-lakAra prathama-purusha eka-vacana), tagged as वहति[P_laTV_1.1] (tag within square brackets)

 

Other sub-tags (each precedes the lakāra tag):

Ni: ṇijanta, Causative verb

e.g. kārayati[P_Ni_laTV_1.1]

Sn: sannanta, Desiderative verb

e.g. cikīrṣati[P_Sn_laTV_1.1]

Nd: nāmadhātu, Denominative (nominal) verb

e.g. putrīyati[P_Nd_laTV_1.1]

Kr: karmaṇi, Passive verb

e.g. paṭhyate[A_Kr_laTV_1.1]

Bh: bhāve, Impersonal verb

e.g. bhūyate[A_Bh_laTV_1.1]
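Putting the pieces together, a verb tag is the pada sub-tag, then any base or voice sub-tags, then the lakāra tag, then the puruṣa-vacana sub-tag. A minimal Python sketch of that ordering; `build_verb_tag` is a hypothetical name, Yn is included because it appears in the feature sub-tag inventory, and since the relative order of several modifiers on one verb is not specified here, the sketch simply keeps them in the order given.

```python
import re

PADA = {"P", "A"}  # parasmaipada, ātmanepada
# Lakāra tags as listed above (ITRANS spellings kept as in the tagset).
LAKARA = {
    "laTV", "liTB", "luTAg", "lR^iTAg", "loTA",
    "la~NB", "li~NVi", "li~NAs", "lu~NB", "lR^i~NS",
}
# Verb-base and voice sub-tags that precede the lakāra tag.
MODIFIERS = {"Ni", "Sn", "Nd", "Yn", "Kr", "Bh"}


def build_verb_tag(pada: str, lakara: str, person_number: str,
                   modifiers: tuple[str, ...] = ()) -> str:
    """Join pada, modifiers, lakāra and person.number with underscores."""
    if pada not in PADA:
        raise ValueError(f"unknown pada: {pada!r}")
    if lakara not in LAKARA:
        raise ValueError(f"unknown lakāra tag: {lakara!r}")
    unknown = set(modifiers) - MODIFIERS
    if unknown:
        raise ValueError(f"unknown modifier(s): {sorted(unknown)}")
    if not re.fullmatch(r"[1-3]\.[1-3]", person_number):
        raise ValueError(f"bad person-number sub-tag: {person_number!r}")
    return "_".join([pada, *modifiers, lakara, person_number])


print(build_verb_tag("P", "laTV", "1.1"))           # P_laTV_1.1
print(build_verb_tag("P", "laTV", "1.1", ("Ni",)))  # P_Ni_laTV_1.1
print(build_verb_tag("A", "laTV", "1.1", ("Kr",)))  # A_Kr_laTV_1.1
```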

 

Avyaya Tags

Tag

Description & Examples

AV

AVyaya (e.g. atha, iva, nu, saha) Particles

AVN

AVyaya Niṣedhārthaka (e.g. na, naiva, nahi, mā) Negative

AVC

AVyaya Conjunctive (e.g. ca, tu) Conjunctive

AVD

AVyaya Disjunctive (e.g. vā, athavā) Disjunctive

AVP

AVyaya Praśnārthika (e.g. api, kinnu) Interrogative

AVT

AVyaya Tumunnanta (e.g. gantum, pātum) Infinitive

AVK

AVyaya Ktvānta (e.g. bhuktvā, paṭhitvā) Gerund

AVL

AVyaya Lyabanta (e.g. ava-lambya, pra-hṛtya) Gerund

AVKV

AVyaya KriyāViśeṣaṇa (e.g. uccaiḥ, nīcaiḥ) Adverbs

UD

UDgara (e.g. hā, hanta) Interjection

 

नमः[AV], उच्चैः[AVKV]

 

Punctuation and other Tags

|

PUN_VV - Punctuation Vākya Virāma - sentence end marker, half-shloka marker, etc.

||

PUN_SA - Punctuation Śloka Anta - shloka end marker

,

PUN_LV Punctuation Laghu Virāma - comma

?

PUN_PC Punctuation Praśna Cihna - question mark

!

PUN_AC Punctuation Āścarya Cihna - exclamation mark

PUN_UC Punctuation Uddharana Cihna 1 - quote open

PUN_SC Punctuation Samvarana Cihna 1 - quote close

PUN_UCd Punctuation Uddharana Cihnadvaya - double quote open

PUN_SCd Punctuation Samvarana Cihnadvaya - double quote close

(

PUN_VU1 Punctuation Valaya-Uddharana Cihna 1 - open round bracket

)

PUN_VS1 Punctuation Valaya-Samvarana Cihna 1 - close round bracket

[

PUN_VU2 Punctuation Valaya-Uddharana Cihna 2 - open square bracket

]

PUN_VS2 Punctuation Valaya-Samvarana Cihna 2 - close square bracket

{

PUN_VU3 Punctuation Valaya-Uddharana Cihna 3 - open flower bracket

}

PUN_VS3 Punctuation Valaya-Samvarana Cihna 3 - close flower bracket

-

PUN_DS dash

:

PUN_CL colon

;

PUN_VA Vākya-aṅga Anta - semicolon

/

PUN_BS Punctuation slash

+

PUN_PL - Punctuation plus sign

=

PUN_EQ - Punctuation equal sign

.

PUN_BIN bindu - Punctuation dot

*

PUN_LAG - Laghvikarana - abbreviation marker ( पं* for पंडित)

AB

AB AnyaBhāṣā - foreign word

SAM-SAM

SAM-SAM hyphenated number (१९४७-२००६)
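To tag punctuation tokens automatically, the symbol list above can be turned into a lookup table. A minimal Python sketch: `PUNCTUATION_TAGS` and `punctuation_tag` are hypothetical names, the quote tags (PUN_UC, PUN_SC, PUN_UCd, PUN_SCd) are left out because their symbols are not shown in the list, and mapping the Devanagari daṇḍa characters । and ॥ to PUN_VV and PUN_SA is an assumption.

```python
from typing import Optional

# Symbol-to-tag table built from the list above. The Devanagari daṇḍa
# entries are an assumption: the list itself shows ASCII | and ||.
PUNCTUATION_TAGS = {
    "|": "PUN_VV", "।": "PUN_VV",
    "||": "PUN_SA", "॥": "PUN_SA",
    ",": "PUN_LV", "?": "PUN_PC", "!": "PUN_AC",
    "(": "PUN_VU1", ")": "PUN_VS1",
    "[": "PUN_VU2", "]": "PUN_VS2",
    "{": "PUN_VU3", "}": "PUN_VS3",
    "-": "PUN_DS", ":": "PUN_CL", ";": "PUN_VA",
    "/": "PUN_BS", "+": "PUN_PL", "=": "PUN_EQ",
    ".": "PUN_BIN", "*": "PUN_LAG",
}


def punctuation_tag(token: str) -> Optional[str]:
    """Return the punctuation tag for a token, or None if it is not punctuation."""
    return PUNCTUATION_TAGS.get(token)


tokens = ["रामः", "वनम्", "गच्छति", "।"]
print([(tok, punctuation_tag(tok)) for tok in tokens])
```

AB and SAM-SAM are not symbol lookups: they depend on recognizing a foreign word or a hyphenated numeral, so a tagger would need separate handling for them.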