Syntactic Annotation

This section gives a detailed description of how annotation is carried out at the syntactic level. Syntactic annotation consists of two phases: (i) identifying the dependency structure of the sentence in the form of a tree and (ii) identifying the dependency relations and assigning them to the edges of the tree.

Identifying the Structure


The structure of the sentence is identified manually by attaching dependent nodes to their governing nodes. The head of a sentence is its predicate, and the predicate takes its arguments (noun phrases, adverbials) as children. The objective of this step is to identify the predicate-rooted structure and attach it to the technical root of the tree (AuxS, defined below). The sentence-final punctuation is also attached to the technical root. Once the structure is identified, every edge has to be labeled with its relation. For technical reasons, the relation between a dependent and its governing node is stored as an a-layer attribute of the dependent node, called 'afun'. The following sections explain each relation in detail.
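To make the storage convention concrete, here is a minimal Python sketch of an a-layer tree in which each dependent node carries its own afun. The class `ANode`, the helper `edges` and the root form `#AuxS` are hypothetical names chosen for illustration, not the API of any annotation tool. The sentence is the one from Figure 4.3, labeled as described in the AComp and AuxP sections.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class ANode:
    """A node of the analytical (a-) layer (hypothetical representation).

    The relation to the governing node is stored on the dependent itself,
    in `afun`, mirroring the convention described above.
    """
    form: str
    afun: Optional[str] = None
    # Excluded from repr/eq so that printing or comparing a node does not
    # recurse back up through its parent.
    parent: Optional["ANode"] = field(default=None, repr=False, compare=False)
    children: List["ANode"] = field(default_factory=list)

    def attach(self, child: "ANode", afun: str) -> "ANode":
        """Hang `child` under this node and store the edge label on the child."""
        child.parent = self
        child.afun = afun
        self.children.append(child)
        return child


def edges(node: ANode) -> Iterator[Tuple[str, Optional[str], str]]:
    """Yield (dependent, afun, governor) triples in depth-first order."""
    for child in node.children:
        yield child.form, child.afun, node.form
        yield from edges(child)


# Sentence of Figure 4.3: vaLarccikk Aka pOrAtuvOm .
root = ANode("#AuxS", afun="AuxS")          # technical root (placeholder form)
pred = root.attach(ANode("pOrAtuvOm"), "Pred")
aka = pred.attach(ANode("Aka"), "AuxP")     # postposition heads its phrase
aka.attach(ANode("vaLarccikk"), "AComp")    # obligatory adverbial
root.attach(ANode("."), "AuxK")             # sentence end hangs on the root

for dep, afun, gov in edges(root):
    print(f"{dep:12} --{afun}--> {gov}")
```

Keeping the label on the child rather than on a separate edge object means every node has exactly one incoming relation, which matches the tree constraint of one governor per dependent.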

Dependency Relations (Analytical functions)

Table 4.1: Dependency Relations (Afun values)

No   Afun     Description
1    AAdjn    Adverbial adjunct
2    AComp    Adverbial complement
3    Apos     Apposition
4    Atr      Attribute
5    AdjAtr   Adjectival participles or adjectivalized verbs
6    AuxA     Determiners
7    AuxC     Subordinating conjunctions
8    AuxG     Symbols other than commas and sentence terminators
9    AuxK     Sentence termination symbols
10   AuxP     Postpositions
11   AuxS     Technical root
12   AuxV     Auxiliary verbs
13   AuxX     Commas other than in coordination
14   AuxZ     Emphasis words or particles
15   CC       Part of a word
16   Comp     Complements other than those attached to verbs
17   Coord    Coordination node
18   Obj      Object (both direct and indirect)
19   Pnom     Predicate nominal
20   Pred     Predicate
21   Sb       Subject

The terms analytical function, 'afun' and dependency relation are used interchangeably in the following sections; they all refer to the relation between a dependent node and its governing node.
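As a small consistency aid, the inventory of Table 4.1 can be kept as a set and used to flag labels outside it. This is a sketch under the assumption that each token's afun is available as a plain string; `AFUNS` and `invalid_labels` are illustrative names.

```python
# The 21 afun values of Table 4.1.
AFUNS = {
    "AAdjn", "AComp", "Apos", "Atr", "AdjAtr", "AuxA", "AuxC", "AuxG",
    "AuxK", "AuxP", "AuxS", "AuxV", "AuxX", "AuxZ", "CC", "Comp",
    "Coord", "Obj", "Pnom", "Pred", "Sb",
}


def invalid_labels(labels):
    """Return (position, label) pairs whose label is not in Table 4.1.

    `labels` is any sequence of afun strings, e.g. one per token of an
    annotated sentence; None (an edge not yet labeled) is reported too.
    """
    return [(i, lab) for i, lab in enumerate(labels) if lab not in AFUNS]


# Labels for: vaLarccikk Aka pOrAtuvOm .  ("Adv" is a deliberate mistake)
print(invalid_labels(["AComp", "AuxP", "Pred", "AuxK"]))
print(invalid_labels(["Adv", "AuxP", None, "AuxK"]))
```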

Detailed Description of Analytical Functions


In this subsection, each analytical function defined in Table 4.1 is explained in detail.

AAdjn

The AAdjn relation marks adverbial adjuncts. Adverbial adjuncts are optional adverbial phrases, postpositional phrases, clauses or simple adverbs that modify verbs.


An example for AAdjn
Figure 4.1: Analytical Function - AAdjn

Tamil:
பின்னர் எப்படி 
உங்கள் 
மதச்சார்பின்மையை 
நம்புவது என்று 
கேள்வி
எழுப்பினார்
.
Tr:
pinnar
eppati
ungkaL
maTaccArpinmaiyai
wampuvaTu
enRu
kELvi
ezuppinAr
.
Gloss:
then
how
your
secular credential-ACC
believing
that
question
raised-he
.
English:
"Then how (can we) believe your secular credentials?", he asked .

Figure 4.1 shows how a simple adverb (pinnar/'then, later') is labeled AAdjn. Figure 4.2 below shows how an adverbial phrase ("by putting forward ...") is labeled AAdjn.


An example for AAdjn
Figure 4.2: Analytical Function - AAdjn

Tamil:
நாங்கள்
சமூக 
நீதியை 
உம்
சமதர்மத்தை
உம்  
சமமான
வளர்ச்சியை
உம்
முன்னிறுத்தி
பிரசாரம் 
செய்கிறோம்
.
Tr:
wAngkaL
camUka 
wITiyai 
um
camaTarmaTTai 
um
camamAna 
vaLarcciyai 
um 
munniRuTTi 
piracAram 
ceykiROm
.
Gloss:
we
social
justice
and
equality
and
equal
development
and
by putting forward
campaign
do
.
English:
We are campaigning by putting forward social justice, equality and equal development .


AComp

The AComp relation marks obligatory adverbials, or adverbial complements. AComp occurs in the same contexts as AAdjn; the only difference is that adverbial adjuncts (AAdjn) are optional elements of the sentence structure, whereas adverbial complements (AComp) are obligatory. During annotation, the choice is made by asking whether the adverbial is required to complete the sentence: if removing it does not affect the sentence as a whole, it is labeled AAdjn; otherwise it is labeled AComp. Figure 4.3 illustrates this idea.


An example for AComp
Figure 4.3: Analytical Function - AComp

Tamil:
வளர்ச்சிக்க்  ஆக  
போராடுவோம் 
.
Tr:
vaLarccikk
Aka
pOrAtuvOm
.
Gloss:
development
for (the sake of)
struggle-we
.
English:
We will struggle for development .

In Figure 4.3, the phrase "vaLarccikk Aka" (for development) is an obligatory element of the sentence. Although it can be argued that Tamil allows a minimal sentence consisting of just a verb, the decision is based mainly on whether the sentence still conveys its meaning without losing essential information.
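The deletion test itself is the annotator's judgement, but once that judgement is made the labeling is mechanical. The sketch below assumes a hypothetical token format `[form, parent, afun]` with 1-based parent ids (0 for the technical root), and follows the AuxP convention described later in this section: when the adverbial is a postpositional phrase, the postposition keeps AuxP and the AComp or AAdjn label goes to the element under it. The function name is illustrative.

```python
def label_adverbial(tokens, head, obligatory):
    """Label the adverbial headed by token `head` (1-based) as AComp or AAdjn.

    tokens: list of [form, parent, afun] with 1-based parent ids (0 = root).
    obligatory: the annotator's judgement from the deletion test, i.e. whether
    the sentence loses essential information when the adverbial is removed.
    """
    label = "AComp" if obligatory else "AAdjn"
    if tokens[head - 1][2] == "AuxP":
        # A postposition keeps AuxP; the function goes to the element(s) under it.
        for tok in tokens:
            if tok[1] == head:
                tok[2] = label
    else:
        tokens[head - 1][2] = label
    return tokens


# Figure 4.3: vaLarccikk Aka pOrAtuvOm .
sent = [["vaLarccikk", 2, None], ["Aka", 3, "AuxP"],
        ["pOrAtuvOm", 0, "Pred"], [".", 0, "AuxK"]]
label_adverbial(sent, head=2, obligatory=True)
print([afun for _, _, afun in sent])
```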

Apos

Adjectival clauses headed by enRa are appositional clauses. The entire finite clause is attached to enRa, which acts as a modifier of a noun phrase. The clausal head attached to enRa receives the Apos label. Figure 4.4 illustrates the labeling of Apos.


An example for Apos
Figure 4.4: Analytical Function - Apos

Tamil:
அதுபோன்று  பிகாரில்
உம்
காங்கிரஸ்
வெற்றி 
பெறும்
என்ற
நம்பிக்கை 
எனக்கு
உள்ளது
.
Tr:
aTupOnRu
pikAril
um
kAngkiraS 
veRRi
peRum 
enRa 
wampikkai 
enakku 
uLLaTu  
.
Gloss:
like that
in Bihar
also
Congress
victory
gain
that
confidence
me
be-exist
.
English:
I am confident that, similarly, the Congress will win in Bihar too .

Other occurrences of apposition are not handled at present; they will be included in future versions of the annotation.

Atr

An attribute (refer to the PDT documentation) is a sentence member that depends on a noun and closely determines its meaning. The original PDT annotation differentiates between "agreeing" and "non-agreeing" attributes. Since Tamil has no agreement between nouns and their modifiers, all noun modifiers receive the afun Atr. Noun modifiers include nouns (other than the head noun) in noun compounds, adjectives, numerals and adjectival participles. Figure 4.5 shows the usage of the afun Atr.


An example for Atr
Figure 4.5: Analytical Function - Atr

Tamil:
வடக்கு   மற்றும்  
கிழக்குப்  
பகுதிகளுக்க்
ஆக
55
பேருந்துகளை
இந்தியா
வழங்கிய் 
உள்ளது
.
Tr:
vatakku
maRRum 
kizakkup
pakuTikaLukk 
Aka 
55 
pEruwTukaLai 
iwTiyA 
vazangkiy 
uLLaTu 
.
Gloss:
north
and
east
regions
for
55
buses
India
give
AUX-PERF
.
English:
India has provided 55 buses for the northern and eastern regions .

AdjAtr

The AdjAtr label marks adjectival clauses, adjectivalized verbs and adjectival participles. These are equivalent to the -ing and -ed forms in English (singing girl, departed train). Verbs in Tamil can be adjectivalized in all three tenses, and they receive the appropriate tags depending on their word-form features. Figure 4.6 shows an example of AdjAtr labeling.

An example for AdjAtr
Figure 4.6: Analytical Function - AdjAtr

Tamil:
கொழும்ப்  இலிருந்து  
விமானம் 
மூலம்
வன்னி
ராணுவ 
தலைமையகத்துக்குச்
சென்ற
அவர்
பின்னர் 
அங்கிருந்து கார் 
 மூலம்
வவுனியாவுக்கு 
அழைத்துச்
செல்லப்
பட்டார்
.
Tr:
kozump
iliruwTu
vimAnam
mUlam 
vanni 
rANuva
TalaimaiyakaTTukkuc 
cenRa 
avar
pinnar
angkiruwTu 
kAr
mUlam  
vavuniyAvukku
azaiTTuc
cellap
pattAr
.
Gloss:
Colombo
from
plane
through  
vanni
military 
headquarters-to
who went
he or she
later
from there
car
through
to VavuniyA
taking
go
AUX-PASS
.
English:
He/she, who went from Colombo to the Vanni military headquarters by plane, was later taken from there to Vavuniya by car.


AuxA

Tamil has two demonstrative determiners, corresponding to 'this' and 'that' in English. At present, the question word ewTa/'which' is also tagged as a determiner. Determiners sometimes occur in a contracted form as a prefix of the following noun or noun phrase. During tokenization, these contracted determiners are separated from the noun or noun phrase and treated as separate tokens. The determiners are iwTa/'this', awTa/'that' and ewTa/'which', and the corresponding contracted forms are i, a and e. Due to orthographic rules, the first consonant of the following word is added to the contracted form (e.g. ap pakuTiyil, ic cattaTTin). Figure 4.7 illustrates the usage of the determiners.


An example for AuxA
Figure 4.7: Analytical Function - AuxA

Tamil:
அப் பகுதியில்
வாழ்ந்தவர்கள்  
தான்  ...
Tr:
ap
pakuTiyil
vAzwTavarkaL
TAn ...
Gloss:
that
in the region
those who have lived
EMPHASIS
English:
It was those who have lived in that region

Tamil:
இச்
சட்டத்தின் 
படி
Tr:
ic
cattaTTin
pati
Gloss:
this 
law
according to
English:
according to this law

Tamil:
இந்தப்
புதிய  
சட்டம் 
Tr:
iwTap
puTiya
cattam
Gloss:
this
new
law
English:
this new law

Tamil:
அந்த நிதி  
உதவியை 
Tr:
awTa
wiTi
uTaviyai
Gloss:
that
financial
help
English:
that financial help
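The tokenization step that separates contracted determiners could be approximated as follows. This is a minimal sketch over the transliteration used in the Tr lines. It assumes that the doubled consonant is one of the hard consonants k, c, T or p (as in ap pakuTiyil and ic cattaTTin), and that a lexicon of known words is available so that words which merely look like a contracted form (such as appA/'father') are not split. The function name and the lexicon argument are hypothetical.

```python
# Assumption: only these hard consonants are doubled after a contracted
# determiner; other sandhi patterns are not covered by this sketch.
HARD_CONSONANTS = ("k", "c", "T", "p")
CONTRACTED_DETERMINERS = ("a", "i", "e")   # a/'that', i/'this', e/'which'


def split_contracted_determiner(word, lexicon):
    """Split e.g. 'appakuTiyil' into ['ap', 'pakuTiyil'].

    The split is made only if the word starts with a contracted determiner
    followed by a doubled hard consonant AND the remainder is a known word
    in `lexicon`; otherwise the word is returned unchanged.
    """
    if (len(word) > 3
            and word[0] in CONTRACTED_DETERMINERS
            and word[1] in HARD_CONSONANTS
            and word[2] == word[1]
            and word[2:] in lexicon):
        return [word[:2], word[2:]]
    return [word]


lexicon = {"pakuTiyil", "cattaTTin"}
for w in ["appakuTiyil", "iccattaTTin", "appA"]:
    print(w, "->", split_contracted_determiner(w, lexicon))
```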

AuxC

In Tamil, clauses are embedded or adjoined either by morphologically marking the clause or by using separate words. When separate words are used, they function like the subordinating conjunctions of other languages such as English. These separate words are called complementizers in Tamil. Complementizers can be verbs, nouns, or postpositions following nominalized clauses. There are three complementizing verbs: en/'say', pOl/'seem' and Aku/'become'. They have a grammatical function when embedding clauses; otherwise they retain their lexical meanings. Noun complementizers include pOTu/'time', mun/'before', piRaku/'after', utan/'immediacy' and varai/'as long as'. Postpositions can also be interpreted as subordinating conjunctions when they are preceded by nominalized clauses. Refer to [1] for a detailed treatment of complementizers in Tamil.

An example for AuxC
Figure 4.8: Analytical Function - AuxC

Tamil:
தனித்துப்  போட்டியிடுவது  
தொடக்கத்தில்  
சிரமமானது
என்று
எனக்குத்
தெரியும்
.
Tr:
TaniTTup
pOttiyituvaTu
TotakkaTTil
ciramamAnaTu
enRu
enakkuT
Teriyum
.
Gloss:
separately 
competing
in the beginning
difficult
that
I
know
.
English:
I know that it will be difficult in the beginning to compete separately .


An example for AuxC
Figure 4.9: Analytical Function - AuxC

Tamil:
கொழும்பு
திரும்பியவ்
உடன்
புதன்கிழமை 
மாலை 
தமிழ்
எம்பிக்கள்
மற்றும் 
தமிழர் 
கட்சித் 
தலைவர்கள்
உடன்
அவர்
ஆலோசனை
செய்வார்
என்று
தெரிகிறது
.
Tr:
kozumpu
Tirumpiyav
utan                
puTankizamai 
mAlai 
Tamiz 
empikkaL
maRRum 
Tamizar 
katciT 
TalaivarkaL 
utan
avar
AlOcanai
ceyvAr
enRu
TerikiRaTu
.
Gloss:
Colombo
return
as soon as
Wednesday
evening
Tamil
MPs
and
Tamil
party
leaders
with
he/she
discussion
do
that
seem
.
English:
It seems that he/she will hold discussions with Tamil MPs and Tamil party leaders on Wednesday evening, as soon as he/she returns to Colombo .

Figures 4.8 and 4.9 illustrate the usage of AuxC.
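The conditions for AuxC can be summarized as a small decision function. This is a sketch, not the logic of any annotation tool: the part-of-speech values ('verb', 'noun', 'postposition') and the `embeds_clause` flag are assumptions standing in for information the annotator reads off the tree, and the noun list repeats only the examples given above.

```python
# Lemmas listed in the text; the noun list there is open-ended,
# so this set is illustrative, not exhaustive.
COMPLEMENTIZER_VERBS = {"en", "pOl", "Aku"}
COMPLEMENTIZER_NOUNS = {"pOTu", "mun", "piRaku", "utan", "varai"}


def is_auxc(lemma, pos, embeds_clause):
    """Decide whether a token should be labeled AuxC.

    pos: coarse part of speech ('verb', 'noun' or 'postposition'), a
    hypothetical tag set used only in this sketch.
    embeds_clause: True if the token takes a (nominalized) clause as its
    dependent; without it the word keeps its lexical meaning.
    """
    if not embeds_clause:
        return False
    if pos == "verb":
        return lemma in COMPLEMENTIZER_VERBS
    if pos == "noun":
        return lemma in COMPLEMENTIZER_NOUNS
    return pos == "postposition"


# Figure 4.8: enRu (lemma en) embeds "... ciramamAnaTu".
print(is_auxc("en", "verb", embeds_clause=True))
# Figure 4.9: the second utan ('with') only follows a noun phrase.
print(is_auxc("utan", "postposition", embeds_clause=False))
```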

AuxG

Symbols other than commas and sentence-terminal punctuation are labeled AuxG. Figure 4.10 shows the usage of the AuxG label.


An example for AuxG
Figure 4.10: Analytical Function - AuxG

Tamil:
ஆனால்
நாங்கள்  
(
காங்கிரஸ்
)
மத 
,
ஜாதி 
அரசியலை
கடுமையாக 
எதிர்க்கிறோம்
.
Tr:
AnAl
wAngkaL
(
kAngkiraS
)
  maTa
,
jATi
araciyalai 
katumaiyAka 
eTirkkiROm
.
Gloss:
But
we
(
Congress
)
religious
,
caste
politics
vehemently
oppose-we
.
English:
But we (Congress) vehemently oppose religious and caste politics .

AuxK

AuxK marks sentence-terminal punctuation; usually the full stop at the end of a sentence receives this label. Other terminal symbols such as ':', '?' and '!' also receive AuxK.

AuxP


AuxP marks postpositions, which are the heads of postpositional phrases. The postposition receives the AuxP label, and the element attached to it receives an afun (AAdjn, AComp, Atr or Comp) according to its context of occurrence. The following figures show how AuxP is labeled.

An example for AuxP
Figure 4.11: Analytical Function - AuxP

Tamil:
2001-ம்
ஆண்ட்
இலிருந்து 
ஆறு
ஆண்டுகளுக்கு  
பொருளாதாரம் 
மற்றும் 
புள்ளியியல் 
துறை 
சிறப்பு 
ஆணையர் 
ஆகப் 
பணியாற்றினார்
.
Tr:
2001-m
ANt
iliruwTu
ARu
ANtukaLukku
poruLATAram
maRRum 
puLLiyiyal
TuRai
ciRappu 
ANaiyar
Akap
paNiyARRinAr
.
Gloss:
in 2001
year
from
six
years
economy
and
statistics
department 
special
commissioner
as
worked-he
.
English:
From 2001, (he) worked for six years as special commissioner of the Economics and Statistics Department.

An example for AuxP
Figure 4.12: Analytical Function - AuxP

Tamil:
வவுனியாவில்
உள்ள
முகாமை அவர்
செவ்வாய்க்கிழமை
பார்வையிட்டார் 
.
Tr:
vavuniyAvil
uLLa
mukAmai
avar
cevvAykkizamai
pArvaiyittAr
.
Gloss:
in Vavuniya
exist
camp
he/she
on Tuesday
visited
.
English:
On Tuesday, he/she visited the camp in Vavuniya .


AuxS

AuxS labels the technical root of a tree. In Figure 4.12, the technical root (StaA) is the root of the tree structure. The predicate and the sentence-ending node are attached to this node.
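Because every dependent has exactly one governor and everything must ultimately hang from the technical root, a sentence can be checked mechanically. The sketch below assumes the tree is given as a list of head indices (1-based tokens, 0 for AuxS); `check_tree` is a hypothetical helper. It rejects out-of-range heads and cycles and returns the tokens attached directly to AuxS, which should normally be the predicate and the sentence-final punctuation.

```python
def check_tree(heads):
    """Check that `heads` describes a tree hanging from the technical root.

    heads[i] is the parent of token i+1 (tokens are 1-based, 0 is AuxS).
    Returns the list of tokens attached directly to the technical root.
    Raises ValueError on an out-of-range head, a self-loop or a cycle.
    """
    n = len(heads)
    for i, h in enumerate(heads, start=1):
        if not 0 <= h <= n or h == i:
            raise ValueError(f"token {i}: bad head {h}")
    for i in range(1, n + 1):
        seen, node = set(), i
        while node != 0:              # climb towards the technical root
            if node in seen:
                raise ValueError(f"token {i}: cycle")
            seen.add(node)
            node = heads[node - 1]
    return [i for i, h in enumerate(heads, start=1) if h == 0]


# Figure 4.3: vaLarccikk -> Aka -> pOrAtuvOm -> AuxS, and "." -> AuxS
print(check_tree([2, 3, 0, 0]))
```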

AuxV

Auxiliary verbs receive the afun AuxV. In compound verb constructions, the auxiliary verb is attached under the lexical verb. All auxiliary verbs, including those in passive constructions, receive the afun AuxV.
In Figure 4.13, there are two auxiliary verbs, patu/'experience' and uL/'exist'; both (here patt and uLLaTu) are labeled AuxV. The lexical verb receives the label Pred if there is only one clause in the sentence; otherwise, the label of the lexical verb depends on the governing clause.

An example for AuxV
Figure 4.13: Analytical Function - AuxV

Tamil:
எம்பிக்களுக்கு
மேலும்
ரூ.
10  ஆயிரம்
ஊதிய  
உயர்வு
அறிவிக்கப் 
பட்ட் 
உள்ளது
.
Tr:
empikkaLukku  
mElum
rU.
10
Ayiram
UTiya  
uyarvu  
aRivikkap 
patt  
uLLaTu
.
Gloss:
to MPs
more
Rs.
10
thousand 
salary
increment
announced
AUX-PASS 
AUX-PERF 
.
English:
A further salary increase of Rs. 10,000 has been announced for MPs.
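The AuxV rule attaches every auxiliary directly under the lexical verb, as patt and uLLaTu are attached under aRivikkap in Figure 4.13. The sketch below assumes a hypothetical tag set ('V' for a lexical verb, 'AUX' for an auxiliary) and 0-based token positions. Choosing the nearest preceding lexical verb is a simplifying assumption that relies on Tamil auxiliaries following their verb.

```python
def attach_auxiliaries(tags):
    """Return {aux_position: lexical_verb_position} for a sentence.

    tags: one tag per token, 'V' for a lexical verb, 'AUX' for an auxiliary
    and anything else for other tokens (a hypothetical tag set). Every
    auxiliary is hung directly under the nearest preceding lexical verb.
    """
    heads, last_verb = {}, None
    for i, tag in enumerate(tags):
        if tag == "V":
            last_verb = i
        elif tag == "AUX":
            if last_verb is None:
                raise ValueError(f"auxiliary at {i} has no lexical verb")
            heads[i] = last_verb
    return heads


# Figure 4.13: empikkaLukku mElum rU. 10 Ayiram UTiya uyarvu
#              aRivikkap patt uLLaTu .
tags = ["N", "ADV", "N", "NUM", "NUM", "N", "N", "V", "AUX", "AUX", "PUNCT"]
print(attach_auxiliaries(tags))
```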


AuxX

All commas except those acting as coordinators receive the afun AuxX.
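The rules for AuxK, AuxX and AuxG together cover sentence punctuation, so they can be sketched as one function. Treating a coordinating comma as Coord is an assumption carried over from PDT (Table 4.1 only says that AuxX excludes such commas); the function name and flags are illustrative.

```python
TERMINAL = {".", ":", "?", "!"}


def punct_afun(symbol, is_last, is_coord_head=False):
    """Afun for a punctuation token, following the AuxK/AuxX/AuxG rules.

    is_last: the token closes the sentence.
    is_coord_head: the comma acts as the coordination node (assumed here,
    as in PDT, to be labeled Coord).
    """
    if is_last and symbol in TERMINAL:
        return "AuxK"
    if symbol == ",":
        return "Coord" if is_coord_head else "AuxX"
    return "AuxG"


# Brackets and the final full stop of Figure 4.10
print([punct_afun(s, is_last=(s == ".")) for s in ["(", ")", "."]])
```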

AuxZ


The clitics um, E, TAn and mattum receive the afun AuxZ. Figure 4.14 illustrates the AuxZ label.

An example for AuxZ
Figure 4.14: Analytical Function - AuxZ

Tamil:
சமீபத்தில்
4
லட்சம்
சிமெண்ட்
மூட்டைகளைய் 
உம்
இந்தியா
வழங்கியது
.
Tr:
camIpaTTil 
4
latcam 
cimeNt
mUttaikaLaiy
um
 iwTiyA
vazangkiyaTu
.
Gloss:
recently
4
lakhs
cement
packets
also 
India
provided
.
English:
Recently, India also provided 4 lakh bags of cement.


CC


The label CC marks a word that is part of another word: a CC label on a word indicates that it is part of its parent word. In Tamil, a single verb denoting one action can be split into multiple words. During annotation, only one word of such a compound qualifies to take arguments; the other words are marked with the CC relation, meaning that they are part of the parent word. For example, Figure 4.15 shows that the Tamil verb veRRipeRu/'to win' is split into two words, veRRi and peRu. In such cases, the first word is taken as the parent, and all other words, including the arguments of the verb, are attached to this main part of the verb. It is still unclear whether this decision is right, because the morphological markers of the verb are added to the final part of the compound verb.


An example for CC
Figure 4.15: Analytical Function - CC

Tamil:
குஜராத்
உள்ளாட்சி  
தேர்தலில்
பெரும்பான்மையான
இடங்களில்
பாஜக
வெற்றி
பெற்றுள்ளது
.
Tr:
kujarAt
uLLAtci
TErTalil
perumpAnmaiyAna 
itangkaLil 
pAjaka
veRRi
peRRuLLaTu
.
Gloss:
Gujarat
municipal
in the election
majority
places
BJP
victory
received
.
English:
In the Gujarat municipal elections, the BJP has won in the majority of places.

At present, multiple words (other than modifiers) that make up a single noun are not handled.
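The CC convention for split verbs can be expressed as a re-attachment step. In this sketch, heads and afuns are dictionaries keyed by 1-based token id (0 is the technical root), and `attach_compound_verb` is a hypothetical helper: it makes the first part the parent, labels the other parts CC, and moves any dependents of the parts onto the first part, as described for veRRi peRRuLLaTu in Figure 4.15.

```python
def attach_compound_verb(heads, afuns, parts, verb_head, verb_afun):
    """Re-hang a verb split over several tokens (e.g. veRRi peRRuLLaTu).

    heads/afuns: dicts keyed by 1-based token id; a head of 0 is the root.
    parts: token ids of the compound, first part first.
    verb_head/verb_afun: where the whole verb attaches and with which afun.
    """
    main, part_set = parts[0], set(parts)
    heads[main], afuns[main] = verb_head, verb_afun
    for p in parts[1:]:
        heads[p], afuns[p] = main, "CC"
    # Arguments hanging on any part of the compound move to the first part.
    for i in heads:
        if i not in part_set and heads[i] in part_set:
            heads[i] = main
    return heads, afuns


# Figure 4.15: 1 kujarAt 2 uLLAtci 3 TErTalil 4 perumpAnmaiyAna 5 itangkaLil
#              6 pAjaka 7 veRRi 8 peRRuLLaTu 9 .
heads = {1: 2, 2: 3, 3: 8, 4: 5, 5: 8, 6: 8, 7: 8, 8: 0, 9: 0}
afuns = {}
attach_compound_verb(heads, afuns, parts=[7, 8], verb_head=0, verb_afun="Pred")
print(heads)
```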

Comp

The Comp label marks obligatory elements that do not attach to verbs. For example, in the phrase "1200kk um mERpatta poTumakkaL uyirizawT uLLanar" ('more than 1200 people have died'), 1200kk is an obligatory argument of mERpatta, so 1200kk is labeled Comp. Nouns (not modifiers) that obligatorily attach to other nouns are also labeled Comp. Comp is also used when a postpositional phrase (PP) attaches to a noun phrase: the postpositional head receives the AuxP label, while the head noun of the PP receives the Comp label. Figure 4.16 illustrates the example above.


An example for Comp
Figure 4.16: Analytical Function - Comp

Tamil:
ஆப்கனில்
,
2010
ஜனவரி
முதல்
ஜூன் 
மாதம்
வரை
மட்டும்
சுமார் 
1200க்க் 
உம் 
மேற்பட்ட 
பொதுமக்கள் 
உயிரிழந்த் 
உள்ளனர் 
.
Tr:
Apkanil
,
2010
janavari 
muTal 
jUn
mATam
varai
mattum 
cumAr 
1200kk
um
mERpatta 
poTumakkaL 
uyirizawT 
uLLanar
.
Gloss:
in Afghanistan
,
2010
January
from
June
month
to 
only
approximately 
1200

more than 
people
die
AUX-PERF
.
English:
In Afghanistan, more than 1200 civilians have died from January to June 2010 alone.

Coord

Coordination is one of the more complex phenomena in Tamil. Coordination can be expressed in at least two different ways. In the first method (for 'and' coordination), every conjoined element takes the inclusive particle 'um' (also) at the end of the word form. Thus, in this method, every conjoined element carries the suffix um, which indicates that coordination is taking place. The 'is_member' attribute of the conjoined elements is set to 1. A separator (comma) between elements is optional; it is perfectly legitimate to have no comma between any of the conjoined elements.

The second method is similar to 'and' coordination in English. The conjunction word maRRum (and) is placed between the conjoined elements. If there are more than two elements, maRRum is placed just before the last conjunct, and the other elements are separated by commas. Again, the 'is_member' attribute of the conjoined elements is set to 1.

'Or' coordination works in a similar way in both methods. In the first method, the suffix 'O' is added; in the second method, the conjunction word 'allaTu' (or) is placed between the last two conjoined elements, and the remaining elements are separated by commas. Figure 4.17 illustrates the usage of Coord.


An example for Coord
Figure 4.17: Analytical Function - Coord

Tamil:
கடந்த
சில  
மாதங்களுக்கு
விழிப்புப்
பணி
மற்றும் 
கண்காணிப்பு 
ஆணையர் 
ஆக 
நியமிக்கப்
பட்டார்
.
Tr:
katawTa
cila
mATangkaLukku 
vizippup 
paNi 
maRRum
kaNkANippu
ANaiyar
Aka
wiyamikkap 
pattAr
.
Gloss:
past
few
months
vigilance 
 work
 and
supervision
commissioner
as
appointed
AUX-PASS
.
English:
For the past few months, (he/she) has been appointed as commissioner of vigilance work and supervision .
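For the maRRum/allaTu method, a coordination can be built as follows. This sketch assumes the PDT convention that the conjunction word is the Coord node attached to the governor, while each conjunct hangs under it, carries the function of the whole coordination (for example Atr) and has is_member set to 1. The function name and dictionary layout are hypothetical, and commas separating additional conjuncts are not handled. The example is the coordinated modifier vatakku maRRum kizakkup pakuTikaLukk from Figure 4.5.

```python
def coordinate(heads, afuns, is_member, conjuncts, conj_word, governor, afun):
    """Build a maRRum/allaTu coordination in the (assumed) PDT style.

    heads, afuns, is_member: dicts keyed by 1-based token id, updated in place.
    conjuncts: ids of the conjoined elements; conj_word: id of maRRum/allaTu.
    governor: id the whole coordination depends on; afun: its shared function.
    """
    heads[conj_word], afuns[conj_word] = governor, "Coord"
    for c in conjuncts:
        heads[c], afuns[c], is_member[c] = conj_word, afun, 1


# Figure 4.5: 1 vatakku 2 maRRum 3 kizakkup 4 pakuTikaLukk
heads, afuns, is_member = {}, {}, {}
coordinate(heads, afuns, is_member, conjuncts=[1, 3], conj_word=2,
           governor=4, afun="Atr")
print(heads, afuns, is_member)
```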

Obj

Direct and indirect objects receive the afun Obj. In Figure 4.18, the direct object eTirpArppai/'expectation' (mikuwTa eTirpArppai, 'a lot of expectations') is labeled Obj.


An example for Obj
Figure 4.18: Analytical Function - Obj

Tamil:
அமெரிக்க
அதிபர் 
ஒபாமாவின் 
இந்தியப்  பயணம் 
மிகுந்த 
எதிர்பார்ப்பை 
ஏற்படுத்திய்
உள்ளது
.
Tr:
amerikka  
aTipar
opAmAvin
iwTiyap
payaNam 
mikuwTa 
eTirpArppai
ERpatuTTiy 
uLLaTu
.
Gloss:
American
president  
Obama's
India
visit
a lot of
expectations
created
AUX-PERF
.
English:
American President Obama's visit to India has created a lot of expectations .



Pnom

A nominal predicate occurs in copula (be) constructions or verbless constructions. In these constructions, the sentence has no lexical verb; instead, the predicate consists only of a noun phrase. This noun phrase is the predicate and receives the afun Pnom. Figure 4.19 and the example below show the usage of Pnom.

An example for afun Pnom
Figure 4.19: Analytical Function - Pnom

Tamil:
இது
மரியாதை
நிமித்தம்
ஆன
சந்திப்பு
.
Tr:
iTu
mariyATai
wimiTTam
Ana
cawTippu
.
Gloss:
This 
courtesy


call
.
English:
This is a courtesy call .




Pred

The predicate of the main clause receives Pred. Only finite verbs in Tamil can be the predicate of a sentence. In Tamil, finite verbs at the end of the sentence are the main predicates, so they receive the afun Pred. In some cases, there is no finite verb at the end of the sentence. In that case, a nominal predicate, if present, receives the Pnom afun; otherwise, the sentence has no Pred.

An example for Pred
Figure 4.20: Analytical Function - Pred

Tamil:
பன்னாட்டு
தொழில்  
நிறுவனங்களில் 
தொழிற்சங்க 
உரிமைகள்
மறுக்கப் 
படுகின்றன
.
Tr:
pannAttu
Tozil
wiRuvanangkaLil
ToziRcangka 
urimaikaL 
maRukkap 
patukinRana
.
Gloss:
multinational
business 
in industries
union
rights
denied
AUX-PASS
.
English:
In multinational business industries, union rights are denied .
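The Pred/Pnom decision for the main clause can be sketched as a look at the end of the sentence. The coarse tags used here ('finite', 'verb', 'aux', 'noun', 'punct' and so on) are hypothetical, and taking the last non-punctuation token as the head of the nominal predicate is an assumption based on Tamil being head-final. Because auxiliaries hang under the lexical verb (see AuxV), a sentence-final finite auxiliary such as patukinRana passes Pred back to the lexical verb maRukkap.

```python
def main_predicate(tokens):
    """Return (position, afun) for the main predicate, or None.

    tokens: list of (form, kind) pairs with a hypothetical coarse tag:
    'finite' (finite lexical verb), 'verb' (non-finite lexical verb),
    'aux' (auxiliary), 'noun', 'punct', ...
    """
    content = [i for i, (_, kind) in enumerate(tokens) if kind != "punct"]
    if not content:
        return None
    i = content[-1]
    kind = tokens[i][1]
    if kind in ("finite", "aux"):
        # Auxiliaries hang under the lexical verb, so skip back over them.
        while i > 0 and tokens[i][1] == "aux":
            i -= 1
        return (i, "Pred") if tokens[i][1] in ("verb", "finite") else None
    if kind == "noun":
        return i, "Pnom"        # verbless sentence: nominal predicate
    return None                 # no Pred for this sentence


# Figure 4.20 (tail): ... urimaikaL maRukkap patukinRana .
fig_4_20 = [("urimaikaL", "noun"), ("maRukkap", "verb"),
            ("patukinRana", "aux"), (".", "punct")]
print(main_predicate(fig_4_20))
```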


         

Sb

The label Sb is assigned to the subject of the sentence. If there is more than one subject in the sentence (i.e. in the case of multiple clauses), every subject receives the label Sb. Figure 4.21 shows an example of Sb usage.

An example for Sb
Figure 4.21: Analytical Function - Sb

Tamil:
தொழிலாளர்கள்
பழிவாங்கப் 
படுகின்றனர்
.
Tr:
TozilALarkaL
pazivAngkap
patukinRanar
.
Gloss:
employees
are victimized
AUX-PASS
.
English:
The employees are victimized .

References


1. Thomas Lehmann, A Grammar of Modern Tamil, Pondicherry Institute of Linguistics and Culture (PILC), 1989.