SANSKRIT PART-OF-SPEECH TAGSET
This POS tagset for Sanskrit was developed by Dr. R. Chandrashekar as part of the Ph.D. thesis Part-of-Speech Tagging for Sanskrit (2007), under the supervision of Dr. Girish Nath Jha at the Special Centre for Sanskrit Studies, Jawaharlal Nehru University, New Delhi.
The tagset is classified according to the morphological structure of Sanskrit words. There are three kinds of tags in this tagset: word class main tags, feature sub-tags and punctuation tags. A full tag combines a word class main tag with its feature sub-tags, separated by an underscore delimiter (indeclinable and punctuation tags do not have sub-tags). All tags bear Sanskrit names[1], abbreviated as letter-digit acronyms in Roman script.
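The tag structure described above can be illustrated with a minimal Python sketch. The function name `split_tag` is hypothetical and not part of the tagset. The sketch assumes nominal-style tags, where the main tag comes first, and treats `PUN_` tags as indivisible because they carry no sub-tags.

```python
def split_tag(tag: str) -> tuple[str, list[str]]:
    """Split a nominal-style tag into (main tag, list of feature sub-tags)."""
    # Punctuation tags such as PUN_VV contain an underscore but have no
    # sub-tags, so this sketch returns them whole (an assumption).
    if tag.startswith("PUN_"):
        return tag, []
    main, *subs = tag.split("_")
    return main, subs


print(split_tag("N_p_3.1"))  # ('N', ['p', '3.1'])
print(split_tag("AV"))       # ('AV', [])
print(split_tag("PUN_VV"))   # ('PUN_VV', [])
```

Verb tags put the pada marker before the lakAra tag, so they would need a separate parser.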
All words in Sanskrit are inflected, so all of them except avyaya-s receive feature sub-tags. These inflections (vibhakti-s) also indicate the syntactic relations between the words in a sentence, so the tags in this tagset are morpho-syntactic. A few tags are semantic as well as morphological in nature: NK and NB, the tags for nouns with the sense of agent and for impersonal/abstract nouns, are morpho-semantic. The tagset therefore has tags with morphological, syntactic and semantic features. It is developed as a general-purpose tagset that can be used for shallow parsing of simple, sandhi-split Sanskrit prose.
The tagset has 65 word class tags, 43 feature sub-tags, 25 punctuation tags and one tag, AJ, for unknown words: a total of 134 tags.
The word class tags are 8 Noun tags, 8 Pronoun tags, 3 Adjective tags, 9 Participle tags, 2 Number tags, 14 Compound tags, 11 Indeclinable tags and 10 Verb tags.
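As a quick arithmetic check of the stated totals, the word class breakdown can be summed in a few lines of Python. The variable names are illustrative only; the numbers are those given in the text.

```python
# Word class tag counts as stated in the text.
word_class_counts = {
    "Noun": 8, "Pronoun": 8, "Adjective": 3, "Participle": 9,
    "Number": 2, "Compound": 14, "Indeclinable": 11, "Verb": 10,
}
word_class_total = sum(word_class_counts.values())
print(word_class_total)  # 65

# Stated totals: 43 feature sub-tags, 25 punctuation tags, 1 unknown-word tag.
grand_total = word_class_total + 43 + 25 + 1
print(grand_total)  # 134
```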
The feature tags are: 3 Gender tags (p, s, n); 8x3 = 24 (Nominal) Case and Number tags (1.1 through 8.3); 4 Verb base modifying tags (Nd, Yn, Sn, Ni); 1 Verbal Preposition tag (UPA); 2 Pada tags (P and A); 2 Voice tags (Kr and Bh; the default, kartari, is unmarked); 3x3 = 9 (Verbal) Person and Number tags (1.1 through 3.3).
Noun Tags
Gender sub-tags: p, s, n (for masculine, feminine and neuter)
Declensional sub-tags: 1.1 for prathama vibhakti eka vacana,
1.2 for prathama vibhakti dvi vacana, and so on
8.1 for sambodhana prathama vibhakti eka vacana
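The gender and declensional sub-tags can be decoded with simple lookup tables. The following is a sketch only: the function name `decode_declension` is hypothetical, and the vibhakti names for cases 2 to 7 are the standard Sanskrit ordinals, supplied here as an assumption because the text spells out only the first and eighth.

```python
GENDER = {"p": "masculine", "s": "feminine", "n": "neuter"}

# Case names: 1 and 8 are given in the text; 2-7 are the standard names
# (assumed here for illustration).
VIBHAKTI = {
    1: "prathamā", 2: "dvitīyā", 3: "tṛtīyā", 4: "caturthī",
    5: "pañcamī", 6: "ṣaṣṭhī", 7: "saptamī", 8: "sambodhana",
}
VACANA = {1: "eka", 2: "dvi", 3: "bahu"}


def decode_declension(sub_tag: str) -> tuple[str, str]:
    """Decode a 'case.number' sub-tag such as '3.1' into (vibhakti, vacana)."""
    case_digit, number_digit = sub_tag.split(".")
    return VIBHAKTI[int(case_digit)], VACANA[int(number_digit)]


print(GENDER["p"])               # masculine
print(decode_declension("3.1"))  # ('tṛtīyā', 'eka')
```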
N |
Nāmapada (Common Noun, with gender and declensional sub-tags) (e.g. adriḥ, grāma, rāja) |
NA |
Nāma Abhidhāna (Proper Noun, with gender, number and declensional sub-tags) (e.g. indraḥ, girīśaḥ, rādhikā, ḍitthaḥ) |
NAD |
Nāma Abhidhāna Desa (Proper Noun Country, with gender, number and declensional sub-tags) (e.g. vidarbhāḥ, lāṭāḥ) |
NAP |
Nāma Abhidhāna Pum_apatya (Patronymic Noun, with gender, number, and declensional sub-tags) (e.g. rāghavaḥ, vaiśravaṇaḥ, jānakī) |
NAS |
Nāma Abhidhāna Stri_apatya (Metronymic Noun, with gender, number, and declensional sub-tags) (e.g., pārthaḥ, saumitriḥ) |
NAT |
Nāma Abhidhāna Tadraja (Tadraja Noun, with gender, number, and declensional sub-tags) (e.g. maithilaḥ, vaidehaḥ) |
NS |
Nāma Sannaarthaka (Noun Desiderative, with gender, number, and declensional sub-tags) (e.g., pipāsā, cikitsā, cikīrṣuḥ) |
E.g.: वेगेन - N_p_3.1 (nāmapada-puṃliṅga-tṛtīyā-vibhakti-eka-vacana)
and tagged as - वेगेन[N_p_3.1] (tag within the square brackets)
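The word[TAG] convention shown above is easy to read back with a regular expression. This is a minimal sketch; `TOKEN_RE` and `parse_tagged` are hypothetical names, and the pattern assumes that words contain no whitespace, commas or square brackets, so bracket punctuation tokens would need special handling.

```python
import re

# A word (no spaces, commas or brackets) followed by its tag in [ ].
TOKEN_RE = re.compile(r"(?P<word>[^\[\]\s,]+)\[(?P<tag>[^\[\]]+)\]")


def parse_tagged(text: str) -> list[tuple[str, str]]:
    """Return (word, tag) pairs for every word[TAG] token found in text."""
    return [(m["word"], m["tag"]) for m in TOKEN_RE.finditer(text)]


print(parse_tagged("वेगेन[N_p_3.1]"))  # [('वेगेन', 'N_p_3.1')]
```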
Compounds
NCDI |
Nāmapada which is a Compound of Dvandva Itaretara type (Coordinative Enumerative Compound which is a Noun, with gender and declensional sub-tags) (e.g. rāmakṛṣṇau, rāmalakṣmaṇabharataśatrughnāḥ) |
NCDS |
Nāmapada which is a Compound of Dvandva Samāhāra type (Coordinative Collective Compound which is a Noun, with neuter gender, singular number and declensional sub-tags) (e.g. pāṇipādam, āhāranidrābhayam) |
NCT2 |
Nāmapada in 2nd case Compounded to form Tatpuruṣa dvitīyā type (Determinative Compound (accusative), with gender, number and declensional sub-tags) (e.g. kṛṣṇaśritaḥ, duḥkhātītaḥ) |
NCT3 |
Nāmapada in 3rd case Compounded to form Tatpuruṣa tṛtīyā type (Determinative Compound (instrumental), with gender, number and declensional sub-tags) (e.g. haritrātaḥ, nakhabhinnaḥ, vākkalahaḥ) |
NCT4 |
Nāmapada in 4th case Compounded to form Tatpuruṣa caturthī type (Determinative Compound (dative), with gender, number and declensional sub-tags) (e.g. yūpadāru, gohitam) |
NCT5 |
Nāmapada in 5th case Compounded to form Tatpuruṣa pañcamī type (Determinative Compound (ablative), with gender, number and declensional sub-tags) (e.g. corabhayam, svargapatitaḥ) |
NCT6 |
Nāmapada in 6th case Compounded to form Tatpuruṣa ṣaṣṭhī type (Determinative Compound (genitive), with gender, number and declensional sub-tags) (e.g. rājapuruṣaḥ, devendraḥ) |
NCT7 |
Nāmapada in 7th case Compounded to form Tatpuruṣa saptamī type (Determinative Compound (locative), with gender, number and declensional sub-tags) (e.g. akṣaśauṇḍaḥ, īśvarādhīnaḥ) |
NCAl |
Nāmapada Compounded without the deletion of case inflection to form Aluk type (gender, number and declensional sub-tags) (e.g. dhanañjayaḥ) |
NCNT |
Nāmapada Compounded with negation to form Nañ Tatpuruṣa type (gender, number and declensional sub-tags) (e.g. abrāhmaṇaḥ) |
NCK |
Karmadhāraya Compound (with gender, number and declensional sub-tags) (e.g. ) |
NCD |
Dvigu Compound (with gender, number and declensional sub-tags) (e.g. ṣāṇmāturaḥ, pañcavaṭī, pañcatantram) |
NCB |
Bahuvrīhi Compound (with gender, number and declensional sub-tags) (e.g. pītāmbaraḥ, cakrapāṇiḥ) |
NCA |
Avyaya Compound (e.g. yathāśakti, amśāmśi) Adverbial Compound |
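The digit in the NCT2 to NCT7 tags encodes the case of the first member of the Tatpuruṣa compound. A small sketch of reading it back follows; the name `tatpurusha_case` is hypothetical, and the case glosses are those given in the table above.

```python
import re

# Case glosses as given for NCT2 ... NCT7 in the compound table.
TATPURUSHA_CASE = {
    2: "accusative", 3: "instrumental", 4: "dative",
    5: "ablative", 6: "genitive", 7: "locative",
}


def tatpurusha_case(main_tag: str):
    """Return the case gloss for an NCTn main tag, or None for other tags."""
    match = re.fullmatch(r"NCT([2-7])", main_tag)
    return TATPURUSHA_CASE[int(match.group(1))] if match else None


print(tatpurusha_case("NCT3"))  # instrumental
print(tatpurusha_case("NCB"))   # None
```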
Pronoun Tags
With/without gender, number and declension sub-tags
SN |
Sarva Nāman (Pronoun Other, with gender, number, and declensional sub-tags) (e.g., anyaḥ, aparā) |
SNU |
Sarva Nāman Uttama (Pronoun First Person, number, and declensional sub-tags) (e.g., asmad) |
SNM |
Sarva Nāman Madhyama (Pronoun Second Person, number, and declensional sub-tags) (e.g., tvad) |
SNA |
Sarva Nāman Ātman (Pronoun Reflexive, with or without gender, number, and declensional sub-tags) (e.g., nijaḥ, svasya) |
SNN |
Sarva Nāman Nirdesatmaka (Pronoun Demonstrative, with gender, number, and declensional sub-tags) (e.g., idam, saḥ) |
SNP |
Sarva Nāman Prāśnārthika (Pronoun Interrogative, with gender, number, and declensional sub-tags) (e.g., kim, kad) |
SNS |
Sarva Nāman Sāmbandhika (Pronoun Relative, with gender, number, and declensional sub-tags) (e.g., yaḥ, yā) |
अस्य[SNN_p_6.1] (Pronoun demonstrative masculine ṣaṣṭhī-vibhakti eka-vacana)
Adjective Tags
With gender, number and declension sub-tags
NVI |
Nāma VIśeṣaṇa (Adjective, with gender, number, and declensional sub-tags) (e.g., sundaraḥ, krūrā) |
NVIT |
Nāma VIśeṣaṇa Tulanatmaka (Adjective Comparative, with gender, number, and declensional sub-tags) (e.g., alpabhāgyataraḥ, śreyaḥ) |
NVIA |
Nāma VIśeṣaṇa Atishayavaci (Adjective Superlative, with gender, number, and declensional sub-tags) (e.g., sattamaḥ, jyeṣṭhā) |
विशालाः[NVI_p_1.3] (Nāma VIśeṣaṇa masculine prathama-vibhakti bahu-vacana)
Number Tags
With gender, number and declension sub-tags
SAM |
Saṃkhyā (Cardinal Number, with gender, number, and declensional sub-tags) (e.g., ekaḥ, dve) |
SAMY |
Saṃkhyeya (Ordinal Number, with gender, number, and declensional sub-tags) (e.g., prathamaḥ, turīyā) |
अष्टौ[SAM_p_1.3] (Cardinal Number masculine prathama-vibhakti bahu-vacana)
Participle Tags
With gender, number and declension sub-tags
Extra: with Nic: Ni_KB2_psn = kAritavat
with Nominal: Nd_Ni_KB2_psn = kRupAyita
KV1 |
Krdanta Vartamana 1 (Satr, with gender and declensional sub-tags) (e.g. kurvan, gacchat) Present Active Participle |
KV2 |
Krdanta Vartamana 2 (Sanac, Satr, with gender and declensional sub-tags) (e.g. labhamanah, vardhamanam) Present Middle/Active Participle |
KB1 |
Krdanta Bhuta 1 (Kta, with gender and declensional sub-tags) (e.g. drstah, gatam) Past Passive Participle |
KB2 |
Krdanta Bhuta 2 (Ktavat, with gender and declensional sub-tags) (e.g. uktavat, drstavan) Past Active Participle |
KAa |
Krdanta Agami a (sya-satr, with gender and declensional sub-tags) (e.g. karisyat) Future Active Participle |
KAb |
Krdanta Agami b (sya-sanac, with gender and declensional sub-tags) (e.g. karisyamana) Future Passive Participle |
KVI1 |
Krdanta VIdhyarthaka 1 (-ya, with gender and declensional sub-tags) (e.g. karya) Gerundive |
KVI2 |
Krdanta VIdhyarthaka 2 (-tavya, with gender and declensional sub-tags) (e.g. kartavya) Gerundive |
KVI3 |
Krdanta VIdhyarthaka 3 (-aniya, with gender and declensional sub-tags) (e.g. karaniya) Gerundive |
क्रियमाणः[KV2_p_1.1] (vartamāna-kṛt-śatranta puṃliṅga prathamā-vibhakti eka-vacana)
Verb Tags
For 'Atmane pada' - add 'A' before the lakAra tag
For 'parasmai pada' - add 'P' before the lakAra tag
lakAra tags
laT lakAra Vartamana - laTV
liT lakAra Bhuta - liTB
luT lakAra Agami - luTAg
lR^iT lakAra Agami - lR^iTAg
loT lakAra Ajna - loTA
la~N lakAra Bhuta - la~NB
li~N lakAra vidhi - li~NVi
li~N lakAra AshI - li~NAs
lu~N lakAra Bhuta - lu~NB
lR^i~N lakAra Sanketa - lR^i~NS
For 'purusha' and 'vacana' -
1.1 - prathama purusha eka vacana
1.2 - prathama purusha dvi vacana
1.3 - prathama purusha bahu vacana
2.1 - madhyama purusha eka vacana
2.2 - madhyama purusha dvi vacana
2.3 - madhyama purusha bahu vacana
3.1 - uttama purusha eka vacana
3.2 - uttama purusha dvi vacana
3.3 - uttama purusha bahu vacana
These 'purusha' and 'vacana' tags come after the lakAra tag.
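The person and number sub-tag follows the same digit.digit pattern as the nominal case and number sub-tag, with the first digit naming the purusha and the second the vacana. A minimal decoding sketch, using the hypothetical name `decode_person_number`:

```python
# First digit: purusha; second digit: vacana.
PURUSHA = {1: "prathama", 2: "madhyama", 3: "uttama"}
VACANA = {1: "eka", 2: "dvi", 3: "bahu"}


def decode_person_number(sub_tag: str) -> tuple[str, str]:
    """Decode a verbal sub-tag such as '1.1' into (purusha, vacana)."""
    person, number = (int(part) for part in sub_tag.split("."))
    return PURUSHA[person], VACANA[number]


print(decode_person_number("1.1"))  # ('prathama', 'eka')
print(decode_person_number("3.2"))  # ('uttama', 'dvi')
```

Note that the numbering follows the Sanskrit order, so 1 (prathama) corresponds to what Western grammars call the third person.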
laTV |
VLaT (preceded with either P (parasmai) or A (atmane) and followed by purusa and vacana sub-tags) Vartamana Present Tense (bhavati) |
liTB |
BLiT (preceded with either P (parasmai) or A (atmane) and followed by purusa and vacana sub-tags) Bhuta Past Tense (babhuva) |
luTAg |
AgLuT (preceded with either P (parasmai) or A (atmane) and followed by purusa and vacana sub-tags) Agami Future Tense (bhavita) |
lRuTAg |
AgLRuT (preceded with either P (parasmai) or A (atmane) and followed by purusa and vacana sub-tags) Agami Future Tense (bhavisyati) |
loTA |
ALoT (preceded with either P (parasmai) or A (atmane) and followed by purusa and vacana sub-tags) Ajna Imperative mood (bhavatu) |
la~gB |
BLa~g (preceded with either P (parasmai) or A (atmane) and followed by purusa and vacana sub-tags) Bhuta Past Tense (abhavat) |
li~gVi |
Vidhi li~g (preceded with either P (parasmai) or A (atmane) and followed by purusa and vacana sub-tags) Vidhi Potential mood (bhavet) |
li~gAs |
Ashir li~g (preceded with either P (parasmai) or A (atmane) and followed by purusa and vacana sub-tags) Ashih Benedictive mood (bhuyat) |
lu~gB |
BLu~g (preceded with either P (parasmai) or A (atmane) and followed by purusa and vacana sub-tags) Bhuta Past Tense (abhut) |
lRu~gS |
SLRu~g (preceded with either P (parasmai) or A (atmane) and followed by purusa and vacana sub-tags) Sanketa Conditional mood (abhavisyat) |
E.g. - for 'वहति' - the tag will be - P_laTV_1.1 (parasmai-pada laT-lakAra prathama-purusha eka-vacana)
and tagged as वहति[P_laTV_1.1] (tag within square brackets)
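A finite verb tag is thus assembled as pada, lakAra tag, then purusha.vacana. The sketch below builds such a tag. `make_verb_tag` and the dictionary keys are hypothetical conveniences; the tag values are taken from the lakAra list above.

```python
# lakAra tag values as listed under "lakAra tags"; the keys are
# illustrative labels chosen for this sketch.
LAKARA_TAGS = {
    "laT": "laTV", "liT": "liTB", "luT": "luTAg", "lR^iT": "lR^iTAg",
    "loT": "loTA", "la~N": "la~NB", "vidhi li~N": "li~NVi",
    "AshI li~N": "li~NAs", "lu~N": "lu~NB", "lR^i~N": "lR^i~NS",
}


def make_verb_tag(pada: str, lakara: str, purusha: int, vacana: int) -> str:
    """Build a tag such as 'P_laTV_1.1' from its four components."""
    if pada not in ("P", "A"):
        raise ValueError("pada must be 'P' (parasmai) or 'A' (atmane)")
    if purusha not in (1, 2, 3) or vacana not in (1, 2, 3):
        raise ValueError("purusha and vacana must each be 1, 2 or 3")
    return f"{pada}_{LAKARA_TAGS[lakara]}_{purusha}.{vacana}"


print(make_verb_tag("P", "laT", 1, 1))  # P_laTV_1.1
```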
Other Sub-tags:
Ni - ṇijanta Causal Verb, precedes the lakāra tag, e.g. kārayati[P_Ni_laTV_1.1]
Sn - sannanta Desiderative Verb, precedes the lakāra tag, e.g. cikīrṣati[P_Sn_laTV_1.1]
Nd - nāmadhātu Nominal Verb, precedes the lakāra tag, e.g. putrīyati[P_Nd_laTV_1.1]
Kr - karmani Passive Verb, precedes the lakāra tag, e.g. paṭhyate[A_Kr_laTV_1.1]
Bh - bhave, precedes the lakāra tag, e.g. bhūyate[A_Bh_laTV_1.1]
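Putting the verb sub-tags together, a verb tag can be parsed from the outside in: the pada marker is first, the purusha.vacana pair is last, the lakAra tag sits just before it, and anything in between is a modifier. The following is a sketch under those assumptions; `VerbTag` and `parse_verb_tag` are hypothetical names, and only the five modifiers with examples above are accepted.

```python
from dataclasses import dataclass

# Modifier sub-tags that have worked examples in the text.
MODIFIERS = {"Ni", "Sn", "Nd", "Kr", "Bh"}


@dataclass
class VerbTag:
    pada: str
    modifiers: list
    lakara: str
    purusha: int
    vacana: int


def parse_verb_tag(tag: str) -> VerbTag:
    """Parse a tag such as 'A_Kr_laTV_1.1' into its components."""
    parts = tag.split("_")
    if len(parts) < 3 or parts[0] not in ("P", "A"):
        raise ValueError(f"not a verb tag: {tag!r}")
    pada, *middle, person_number = parts
    *modifiers, lakara = middle  # the lakAra tag is the last middle part
    unknown = [m for m in modifiers if m not in MODIFIERS]
    if unknown:
        raise ValueError(f"unknown modifier(s): {unknown}")
    purusha, vacana = (int(x) for x in person_number.split("."))
    return VerbTag(pada, modifiers, lakara, purusha, vacana)


print(parse_verb_tag("P_Ni_laTV_1.1"))
```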
Avyaya Tags
Tag |
Description & Examples |
AV |
AVyaya (e.g. atha, iva, nu, saha) Particles |
AVN |
AVyaya Niṣedhārthaka (e.g. na, naiva, nahi, mā) Negative |
AVC |
AVyaya Conjunctive (e.g. ca, tu) Conjunctive |
AVD |
AVyaya Disjunctive (e.g. vā, athavā) Disjunctive |
AVP |
AVyaya Prāśnārthika (e.g. api, kinnu) Interrogative |
AVT |
AVyaya Tumunnanta (e.g. gantum, pātum) Infinitive |
AVK |
AVyaya Ktvānta (e.g. bhuktvā, paṭhitvā) Gerund |
AVL |
AVyaya Lyabanta (e.g. ava-lambya, pra-hṛtya) Gerund |
AVKV |
AVyaya KriyāViśeṣaṇa (e.g. uccaiḥ, nīcaiḥ) Adverbs |
UD |
UDgara (e.g. hā, hanta) Interjection |
नमः[AV], उच्चैः[AVKV]
Punctuation and other Tags
| |
PUN_VV - Punctuation Vākya Virāma - sentence end marker, half shloka marker, etc. |
|| |
PUN_SA - Punctuation Śloka Anta - shloka end marker |
, |
PUN_LV - Punctuation Laghu Virāma - comma |
? |
PUN_PC - Punctuation Praśna Cihna - question mark |
! |
PUN_AC - Punctuation Āścarya Cihna - exclamation mark |
‘ |
PUN_UC - Punctuation Uddharana Cihna 1 - quote open |
’ |
PUN_SC - Punctuation Samvarana Cihna 1 - quote close |
“ |
PUN_UCd - Punctuation Uddharana Cihnadvaya - double quote open |
” |
PUN_SCd - Punctuation Samvarana Cihnadvaya - double quote close |
( |
PUN_VU1 - Punctuation Valaya-Uddharana Cihna 1 - open round bracket |
) |
PUN_VS1 - Punctuation Valaya-Samvarana Cihna 1 - close round bracket |
[ |
PUN_VU2 - Punctuation Valaya-Uddharana Cihna 2 - open square bracket |
] |
PUN_VS2 - Punctuation Valaya-Samvarana Cihna 2 - close square bracket |
{ |
PUN_VU3 - Punctuation Valaya-Uddharana Cihna 3 - open flower bracket |
} |
PUN_VS3 - Punctuation Valaya-Samvarana Cihna 3 - close flower bracket |
- |
PUN_DS - dash |
: |
PUN_CL - colon |
; |
PUN_VA - Vākya-anga Anta - semi-colon |
/ |
PUN_BS - Punctuation back slash |
+ |
PUN_PL - Punctuation plus sign |
= |
PUN_EQ - Punctuation equal sign |
. |
PUN_BIN - bindu - Punctuation dot |
* |
PUN_LAG - Laghvikarana - abbreviation marker (पं* for पंडित) |
AB |
AB AnyaBhāṣā - foreign word |
SAM-SAM |
SAM-SAM hyphenated number (१९४७-२००६) |
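Because punctuation tags have no sub-tags, tagging punctuation reduces to a table lookup. The sketch below covers the ASCII symbols from the table; the quotation-mark tags are omitted to keep the keys plain ASCII. `PUNCT_TAGS` and `tag_punctuation` are hypothetical names.

```python
# Symbol-to-tag pairs copied from the punctuation table.
PUNCT_TAGS = {
    "|": "PUN_VV", "||": "PUN_SA", ",": "PUN_LV", "?": "PUN_PC",
    "!": "PUN_AC", "(": "PUN_VU1", ")": "PUN_VS1", "[": "PUN_VU2",
    "]": "PUN_VS2", "{": "PUN_VU3", "}": "PUN_VS3", "-": "PUN_DS",
    ":": "PUN_CL", ";": "PUN_VA", "/": "PUN_BS", "+": "PUN_PL",
    "=": "PUN_EQ", ".": "PUN_BIN", "*": "PUN_LAG",
}


def tag_punctuation(symbol: str):
    """Return the punctuation tag for a symbol, or None if it is not listed."""
    return PUNCT_TAGS.get(symbol)


print(tag_punctuation("||"))  # PUN_SA
print(tag_punctuation("?"))   # PUN_PC
print(tag_punctuation("#"))   # None
```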