Dependency Grammars and Treebanks
Lectures: Markéta Lopatková, Daniel Zeman
-
Wed, room S1, 15:40-17:10
Practical sessions: Jiří Mírovský, Daniel Zeman
Remote classes from March 11, 2020 - please study the teaching material provided below.
Zoom on-line classes from March 18 on (Wednesday 15:40): https://matfyz.zoom.us/j/501653775
I will do my best to provide slides and additional reading each Tuesday afternoon. If you are interested, I am available for individual consultations, preferably during the lecture time slot (Wednesday afternoon). Please contact me in advance by email.
Lectures
-
Lecture 1 (February 19, 2020): Introduction, trees, word order, projectivity (pdf);
-
reading:
-
Kuhlmann, M., Nivre, J. (2006): Mildly Non-Projective Dependency Structures. In COLING/ACL Main Conference Poster Sessions, 507–514 (link).
-
Havelka, J. (2007): Mathematical Properties of Dependency Trees and their Application to Natural Language Syntax. PhD Thesis, MFF UK (link)
-
Lecture 2 (February 26, 2020): A bit of history; Dependency and Non-dependency relations (pdf)
-
reading:
-
Osborne, T. (2019) A Dependency Grammar of English. John Benjamins Publishing Company, Amsterdam/Philadelphia (available in my office)
-
Hajičová, E., Panevová, J., Sgall, P. (2002) Úvod do teoretické a počítačové lingvistiky, sv. I. Karolinum, Praha (available in the secretariat)
-
Štěpánek, J. (2006) Závislostní zachycení větné struktury v anotovaném syntaktickém korpusu. PhD Thesis, MFF UK (link)
-
Wikipedia - basic articles on dependency grammar are consistent with Timothy Osborne's approach
-
Lecture 3 (March 4, 2020): Intro to a stratificational language description (pdf)
-
reading:
-
Hajičová, E., Panevová, J., Sgall, P. (2002) Úvod do teoretické a počítačové lingvistiky, sv. I. Karolinum, Praha (available in the secretariat)
-
Štekauer, P., ed. (2000) Rudiments of English Linguistics. Slovacontact, Prešov.
-
Sgall, P. (1967) Generativní popis jazyka a česká deklinace. Academia, Praha (available in my office)
-
Žabokrtský, Z. (2006) Resemblances between Meaning ⇔ Text Theory and Functional Generative Description. In Proceedings of the 2nd International Conference of Meaning-Text Theory, Slavic Culture Languages Publishers House, Moskva, pp. 549-557. (link)
-
https://www.britannica.com/science/linguistics/Stratificational-grammar
-
Lecture 4 (March 11, 2020): TOPIC 1: Prague Dependency Treebank: Intro (pdf)
-
reading:
-
Hajičová, E., Panevová, J., Sgall, P. (2002) Úvod do teoretické a počítačové lingvistiky, sv. I. Karolinum, Praha (available in the secretariat)
-
PDT guide: http://ufal.mff.cuni.cz/pdt2.0
-
documentation (see individual corpora)
-
TOPIC 2: PDT: morphological annotation (pdf)
-
note: You are not expected to memorize the tag structure, but you might be asked to provide examples (using the following table pdf);
-
reading:
-
Matthews, H. (1997) The Concise Oxford Dictionary of Linguistics. Oxford University Press, Oxford
-
Filipec, J. (1994) Lexicology and Lexicography: Development and State of the Research. In Luelsdorff, P.A. (ed.) The Prague School of Structural and Functional Linguistics, Amsterdam-Philadelphia, John Benjamins, pp. 163–183
-
Hajič, J. (2004) Disambiguation of Rich Inflection (Computational Morphology of Czech). Karolinum, Charles University Press, Prague.
-
Straková Jana, Straka Milan and Hajič Jan. (2014) Open-Source Tools for Morphology, Lemmatization, POS Tagging and Named Entity Recognition. In Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 13-18, Baltimore, Maryland, June 2014. Association for Computational Linguistics.
-
Lecture 5 (March 18, 2020): Intro to UD, morphology (pdf, video)
-
reading:
-
Nivre Joakim et al. (2020) Universal Dependencies v2: An Evergrowing Multilingual Treebank Collection. To appear in: Proceedings of LREC 2020.
-
https://universaldependencies.org/
-
Lecture 6 (March 25, 2020): Surface syntactic annotation in PDT (a-layer) (pdf)
-
reading:
-
Hajič, J. (1998) Building a Syntactically Annotated Corpus: The Prague Dependency Treebank. In E. Hajičová (ed.): Issues of Valency and Meaning. Studies in Honour of Jarmila Panevová, Karolinum, Charles University Press, Prague, Czech Republic, pp. 106-132 (link)
-
Štekauer, P., ed. (2000) Rudiments of English Linguistics. Slovacontact, Prešov (chapter 4, Syntax)
-
Quirk, R., Greenbaum, S., Leech, G., Svartvik, J. (1985) A Comprehensive Grammar of the English Language, Longman, London.
-
PDT documentation: Manual for Analytical Annotation (link)
-
Table with analytical functions in PDT 2.0 (pdf)
-
Lectures 7 and 8 (April 1–8, 2020): Syntax in UD (pdf, video1, video2)
-
reading:
-
Nivre Joakim et al. (2020) Universal Dependencies v2: An Evergrowing Multilingual Treebank Collection. To appear in: Proceedings of LREC 2020.
-
https://universaldependencies.org/
-
Lecture 9 (April 15, 2020): Enhanced dependencies in UD (pdf, video)
-
reading:
-
Schuster, S., Manning, C. (2016) Enhanced English Universal Dependencies: An Improved Representation for Natural Language Understanding Tasks. In: Proceedings of LREC 2016, Portorož, Slovenia (pdf)
-
Droganova, K., Zeman, D. (2019) Towards Deep Universal Dependencies. In: Proceedings of Depling/Syntaxfest 2019, Paris, France (pdf)
-
https://universaldependencies.org/u/overview/enhanced-syntax.html
-
Lecture 10a (April 22, 2020): PropBank (pdf)
-
reading:
-
Kingsbury, P., Palmer, M. (2002) From Treebank to PropBank. In: Proceedings of the 3rd International Conference on Language Resources and Evaluation (LREC 2002), Las Palmas, Spain (pdf)
-
Palmer, M., Gildea, D., Kingsbury, P. (2005) The Proposition Bank: A Corpus Annotated with Semantic Roles. Computational Linguistics, 31:1 (pdf)
-
https://propbank.github.io/ (main project page)
-
http://verbs.colorado.edu/propbank/framesets-english-aliases/ (verb & frame dictionary)
-
https://github.com/propbank/propbank-documentation/blob/master/annotation-guidelines/Propbank-Annotation-Guidelines.pdf (annotation guidelines)
-
Lecture 10b (April 22, 2020): PDT: t-layer (Intro) (pdf)
-
reading:
-
Hajič, J., Hajičová, E., Mikulová, M., Mírovský, J.: Prague Dependency Treebank. Chapter in Ide, N., Pustejovsky, J. (eds.) Handbook of Linguistic Annotation, Springer, Berlin, pp. 555-594, 2017
-
Manual for Tectogrammatical Annotation (PDT 2.0) (link - not recommended :-) and its shortened version (link - recommended)
-
Cinková, S., Toman, J., Hajič, J. et al.: Tectogrammatical Annotation of the Wall Street Journal. The Prague Bulletin of Mathematical Linguistics 92, pp. 85-104, 2009 (link)
-
PDT 3.5 (https://ufal.mff.cuni.cz/pdt3.5)
-
PCEDT 2.0 (http://ufal.mff.cuni.cz/pcedt2.0/)
-
Table with t-nodes attributes in PDT 2.0 (pdf)
-
Lecture 11 (April 29, 2020): PDT: t-layer (valency as a keystone for syntactic structure) (pdf)
-
reading:
-
Manual for Tectogrammatical Annotation (PDT 2.0) (link - not recommended :-) and its shortened version (link - recommended)
-
Fillmore, C.J. (1968) The Case for Case. In Bach, E., Harms, R.T. (eds.) Universals in Linguistic Theory, Holt, Rinehart and Winston, pp. 1-88 (link)
-
Sgall, P., Panevová, J., Hajičová, E. (2004) Deep Syntactic Annotation: Tectogrammatical Annotation and Beyond. In A. Meyers (ed.) Proceedings of the HLT-NAACL 2004 Workshop: Frontiers in Corpus Annotation, ACL, Boston, Massachusetts, USA, pp. 32-38 (link)
-
Hajič, J. et al. (2003) PDT-VALLEX: Creating a Large-coverage Valency Lexicon for Treebank Annotation. In Proceedings of The Second Workshop on Treebanks and Linguistic Theories, Växjö University Press, Växjö, Sweden, pp. 57-68 (link)
-
PDT 3.5 (https://ufal.mff.cuni.cz/pdt3.5)
-
PCEDT 2.0 (http://ufal.mff.cuni.cz/pcedt2.0/)
-
FrameNet (https://framenet.icsi.berkeley.edu/fndrupal/)
-
additional reading:
-
Tesnière, L. (1959) Éléments de syntaxe structurale. Paris: Klincksieck (Elements of Structural Syntax, translated by Osborne, T. and Kahane, S., Amsterdam/Philadelphia: John Benjamins, 2015) - Open access: https://benjamins.com/catalog/z.185
-
Panevová, J. (1994) Valency Frames and the Meaning of the Sentence. In Luelsdorff, P. A. (ed.) The Prague School of Structural and Functional Linguistics, Amsterdam, Philadelphia, John Benjamins Publishing Company, pp. 223-243
-
Lecture 12 (May 6, 2020): PDT: t-layer (t-lemma, grammatemes), esp. for Czech (pdf)
-
reading:
-
Razímová, M., Žabokrtský, Z. (2006) Annotation of Grammatemes in the Prague Dependency Treebank 2.0. In Proceedings of Annotation Science Workshop, LREC 2006, ELRA, Genova, Italy, pp. 12-19 (link)
-
Ševčíková Razímová, M., Žabokrtský, Z. (2006) Systematic Parametrized Description of Pro-forms in the Prague Dependency Treebank 2.0. In Proceedings of TLT 2006, ÚFAL MFF UK, Praha, Czechia, pp. 175-186 (link)
-
Manual for Tectogrammatical Annotation (PDT 2.0) (link) (not covered in the shortened version)
-
Lecture 13 (May 13, 2020): PCEDT: t-layer for English (speaker: Dr. Cinková) (pdf)
-
reading:
-
Cinková, S., Toman, J., Hajič, J. et al.: Tectogrammatical Annotation of the Wall Street Journal. The Prague Bulletin of Mathematical Linguistics 92, pp. 85-104, 2009 (link)
-
Annotation of English on the tectogrammatical level - Reference book (link, i.e. shortened version of the Manual for Tectogrammatical Annotation)
-
PCEDT 2.0 (http://ufal.mff.cuni.cz/pcedt2.0/)
-
Lecture 14 (May 20, 2020; a later date is possible upon request by email): Exam - Final Test (remote form)
Anyone may take the test without having completed the homeworks; however, the final grade will be registered in SIS only after they are completed.
Note: At least 40% of the total score for the final test is necessary for passing!
Useful Links and Other Materials
-
Table with Czech positional morphological tags (pdf);
-
Table with analytical functions in PDT 2.0 (pdf);
-
Table with T-nodes attributes in PDT 2.0 (pdf);
-
PDT 2.0 Guide or here pdf
-
PDT documentation (PDT 3.0, 2.0)
-
Universal Dependencies (link)
Practical (lab) sessions
(click here)
Homeworks
-
All homeworks must be committed into the https://svn.ms.mff.cuni.cz/svn/undergrads/students svn repository; do not send your homeworks by e-mail.
-
Submit your work into your personal directories.
-
There is an explicit deadline for submitting each homework - Wednesday before midnight.
-
If the deadline is not met, ask for additional homework. All homeworks must be submitted in order to get the credit (zápočet).
-
You can solve an additional homework even if you submitted the normal homework in time (i.e., you can improve your average by solving some of the additional homeworks).
-
You have to e-mail us to confirm that your additional homework is ready to be graded. All additional homeworks must be submitted at least one week before the credit.
-
Each student is expected to create all homework solutions on their own; any cheating will be penalised (but you can send us an e-mail if you are stuck).
Final grade
-
Homework (40%)
-
Activity (10%)
-
Final test (50%) ... Note: 40% of the total score for the final test is necessary for passing!
-
Excellent: >= 90 %
-
Very good: >= 70 %
-
Good: >= 50 %
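As an illustration only, the weighting above can be sketched in a few lines of Python. The function name `final_grade` is hypothetical, and the sketch rests on three assumptions not spelled out on this page: each component is expressed as a percentage of its own maximum, the 40% rule applies to the final test score alone, and a weighted total below 50% means failing.

```python
def final_grade(homework: float, activity: float, test: float) -> str:
    """Combine component scores (each 0-100 % of its own maximum) into a grade.

    Assumed weights, taken from the list above: homework 40 %,
    activity 10 %, final test 50 %.
    """
    for name, value in (("homework", homework), ("activity", activity), ("test", test)):
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be between 0 and 100, got {value}")

    # Assumption: the 40 % minimum refers to the final test on its own.
    if test < 40:
        return "Fail"

    # Integer weights keep the arithmetic exact for whole-number inputs.
    total = (40 * homework + 10 * activity + 50 * test) / 100

    if total >= 90:
        return "Excellent"
    if total >= 70:
        return "Very good"
    if total >= 50:
        return "Good"
    # Assumption: anything below the "Good" threshold does not pass.
    return "Fail"
```

For example, `final_grade(80, 50, 70)` gives a weighted total of 72 and returns "Very good", while `final_grade(100, 100, 39)` returns "Fail" because the test score is under 40%. The sketch does not model the separate requirement that all homeworks be submitted before the grade is registered.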
Archive