SIS code: 
Semester: summer
E-credits: summer s.: 6
Examination: 2/2 C+Ex

 

Dependency Grammars and Treebanks

Lectures: Markéta Lopatková, Daniel Zeman

  • Wed, room S1, 15:40-17:10

Practical sessions: Jiří Mírovský, Daniel Zeman

  • Fri, SU1, 12:20-13:50

 

Remote classes from March 11, 2020 - please study the teaching material provided below.

Zoom on-line classes from March 18 on (Wednesday 15:40): https://matfyz.zoom.us/j/501653775

I will do my best to provide slides and additional reading each Tuesday afternoon. If you are interested, I am available for individual consultations, preferably during the lecture time slot (i.e., Wednesday afternoon). Please contact me in advance by email.

Lectures

  • Lecture 1 (February 19, 2020): Introduction, trees, word order, projectivity (pdf);
    • reading:
      • Kuhlmann, M., Nivre, J. (2006): Mildly Non-Projective Dependency Structures. In COLING/ACL Main Conference Poster Sessions, 507–514 (link).
      • Havelka, J. (2007): Mathematical Properties of Dependency Trees and their Application to Natural Language Syntax. PhD Thesis, MFF UK (link)
  • Lecture 2 (February 26, 2020): A bit of history; Dependency and Non-dependency relations (pdf)
    • reading:
      • Osborne, T. (2019) A Dependency Grammar of English. John Benjamins Publishing Company, Amsterdam/Philadelphia (available in my office)
      • Hajičová, E., Panevová, J., Sgall, P. (2002) Úvod do teoretické a počítačové lingvistiky, sv. I. Karolinum, Praha (available in the secretariat)
      • Štěpánek, J. (2006) Závislostní zachycení větné struktury v anotovaném syntaktickém korpusu. PhD Thesis, MFF UK (link)
      • Wikipedia - basic articles on dependency grammar are consistent with Timothy Osborne's approach
  • Lecture 3 (March 4, 2020): Intro to a stratificational language description (pdf)
    • reading:
      • Hajičová, E., Panevová, J., Sgall, P. (2002) Úvod do teoretické a počítačové lingvistiky, sv. I. Karolinum, Praha (available in the secretariat)
      • Štekauer, P., ed. (2000) Rudiments of English Linguistics. Slovacontact, Prešov.
      • Sgall, P. (1967) Generativní popis jazyka a česká deklinace. Academia, Praha (available in my office)
      • Žabokrtský, Z. (2006) Resemblances between Meaning ⇔ Text Theory and Functional Generative Description. In Proceedings of the 2nd International Conference of Meaning-Text Theory, Slavic Culture Languages Publishers House, Moskva, pp. 549-557. (link)
      • https://www.britannica.com/science/linguistics/Stratificational-grammar
  • Lecture 4 (March 11, 2020): TOPIC 1: Prague Dependency Treebank: Intro (pdf)
    • reading:
      • Hajičová, E., Panevová, J., Sgall, P. (2002) Úvod do teoretické a počítačové lingvistiky, sv. I. Karolinum, Praha (available in the secretariat)
      • PDT guide: http://ufal.mff.cuni.cz/pdt2.0
      • documentation (see individual corpora)
    • TOPIC 2: PDT: morphological annotation (pdf)
      • note: You are not expected to memorize the tag structure, but you might be asked to provide examples (using the following table: pdf);
    • reading:
  • Lecture 5 (March 18, 2020): Intro to UD, morphology (pdf, video)
    • reading:
      • Nivre, J. et al. (2020) Universal Dependencies v2: An Evergrowing Multilingual Treebank Collection. To appear in: Proceedings of LREC 2020.
      • https://universaldependencies.org/
  • Lecture 6 (March 25, 2020): Surface syntactic annotation in PDT (a-layer) (pdf)
    • reading:
      • Hajič, J. (1998) Building a Syntactically Annotated Corpus: The Prague Dependency Treebank. In E. Hajičová (ed.): Issues of Valency and Meaning. Studies in Honour of Jarmila Panevová, Karolinum, Charles University Press, Prague, Czech Republic, pp. 106-132 (link)
      • Štekauer, P., ed. (2000) Rudiments of English Linguistics. Slovacontact, Prešov (chapter 4, Syntax)
      • Quirk, R., Greenbaum, S., Leech, G., Svartvik, J. (1985) A Comprehensive Grammar of the English Language, Longman, London.
      • PDT documentation: Manual for Analytical Annotation (link)
      • Table with analytical functions in PDT 2.0 (pdf)
  • Lectures 7 and 8 (April 1–8, 2020): Syntax in UD (pdf, video1, video2)
    • reading:
      • Nivre, J. et al. (2020) Universal Dependencies v2: An Evergrowing Multilingual Treebank Collection. To appear in: Proceedings of LREC 2020.
      • https://universaldependencies.org/
  • Lecture 9 (April 15, 2020): Enhanced dependencies in UD (pdf, video)
    • reading:
      • Schuster, S., Manning, C. (2016) Enhanced English Universal Dependencies: An Improved Representation for Natural Language Understanding Tasks. In: Proceedings of LREC 2016, Portorož, Slovenia (pdf)
      • Droganova, K., Zeman, D. (2019) Towards Deep Universal Dependencies. In: Proceedings of Depling/Syntaxfest 2019, Paris, France (pdf)
      • https://universaldependencies.org/u/overview/enhanced-syntax.html
  • Lecture 10a (April 22, 2020): PropBank (pdf)
  • Lecture 10b (April 22, 2020): PDT: t-layer (Intro) (pdf)
    • reading:
      • Hajič, J., Hajičová, E., Mikulová, M., Mírovský, J.: Prague Dependency Treebank. Chapter in Ide, N., Pustejovsky, J. (eds.) Handbook of Linguistic Annotation, Springer, Berlin, pp. 555-594, 2017
      • Manual for Tectogrammatical Annotation (PDT 2.0) (link - not recommended :-) and its shortened version (link - recommended)
      • Cinková, S., Toman, J., Hajič, J. et al.: Tectogrammatical Annotation of the Wall Street Journal. The Prague Bulletin of Mathematical Linguistics 92, pp. 85-104, 2009 (link)
      • PDT 3.5 (https://ufal.mff.cuni.cz/pdt3.5)
      • PCEDT 2.0 (http://ufal.mff.cuni.cz/pcedt2.0/)
      • Table with t-nodes attributes in PDT 2.0 (pdf)
  • Lecture 11 (April 29, 2020): PDT: t-layer (valency as a keystone for syntactic structure) (pdf)
    • reading:
      • Manual for Tectogrammatical Annotation (PDT 2.0) (link - not recommended :-) and its shortened version (link - recommended)
      • Fillmore, C.J. (1968) The Case for Case. In (Bach, E., Harms, R.T., eds.) Universals in Linguistic Theory, Holt, Rinehart and Winston, p. 1-88 (link)
      • Sgall, P., Panevová, J., Hajičová, E. (2004)  Deep Syntactic Annotation: Tectogrammatical Annotation and Beyond. In A. Meyers (ed.) Proceedings of the HLT-NAACL 2004 Workshop: Frontiers in Corpus Annotation, ACL, Boston, Massachusetts, USA, pp. 32-38 (link)
      • Hajič, J. et al. (2003) PDT-VALLEX: Creating a Large-coverage Valency Lexicon for Treebank Annotation. In Proceedings of The Second Workshop on Treebanks and Linguistic Theories, Växjö University Press, Växjö, Sweden, p. 57-68 (link)
      • PDT 3.5 (https://ufal.mff.cuni.cz/pdt3.5)
      • PCEDT 2.0 (http://ufal.mff.cuni.cz/pcedt2.0/)
      • FrameNet (https://framenet.icsi.berkeley.edu/fndrupal/)
    • additional reading:
      • Tesnière, L. (1959) Éléments de syntaxe structurale. Paris: Klincksieck (Elements of Structural Syntax, translated by Osborne, T. and Kahane, S., Amsterdam/Philadelphia: John Benjamins, 2015) - Open access: https://benjamins.com/catalog/z.185
      • Panevová, J. (1994) Valency Frames and the Meaning of the Sentence. In Luelsdorff, P. A. (ed.) The Prague School of Structural and Functional Linguistics, Amsterdam, Philadelphia, John Benjamins Publishing Company, p. 223-243
  • Lecture 12 (May 6, 2020): PDT: t-layer (t-lemma, grammatemes), esp. for Czech (pdf)
    • reading:
      • Razímová, M., Žabokrtský, Z. (2006) Annotation of Grammatemes in the Prague Dependency Treebank 2.0. In Proceedings of Annotation Science Workshop, LREC 2006, ELRA, Genova, Italy, pp. 12-19 (link)
      • Ševčíková Razímová, M., Žabokrtský, Z. (2006) Systematic Parametrized Description of Pro-forms in the Prague Dependency Treebank 2.0. In Proceedings of TLT 2006, ÚFAL MFF UK, Praha, Czechia, pp. 175-186 (link)
      • Manual for Tectogrammatical Annotation (PDT 2.0) (link) (not covered in the shortened version)
  • Lecture 13 (May 13, 2020): PCEDT: t-layer for English (speaker: Dr. Cinková) (pdf)
    • reading:
      • Cinková, S., Toman, J., Hajič, J. et al.: Tectogrammatical Annotation of the Wall Street Journal. The Prague Bulletin of Mathematical Linguistics 92, pp. 85-104, 2009 (link)
      • Annotation of English on the tectogrammatical level - Reference book (link, i.e. shortened version of the Manual for Tectogrammatical Annotation)
      • PCEDT 2.0 (http://ufal.mff.cuni.cz/pcedt2.0/)
  • Lecture 14 (May 20, 2020; a later date is possible upon request by email): Exam - Final Test (remote form)

Anyone may take the test before completing the homeworks; however, the final grade will be registered in SIS only after all of them have been completed.

Note: 40% of the total score for the final test is necessary for passing!

Useful Links and Other Materials

  • Table with Czech positional morphological tags (pdf);
  • Table with analytical functions in PDT 2.0 (pdf);
  • Table with T-nodes attributes in PDT 2.0 (pdf);
  • PDT 2.0 Guide (or here as pdf)
  • PDT documentation (PDT 3.0, 2.0)
  • Universal Dependencies (link)

 

Practical (lab) sessions

(click here)

Homeworks

  • All homeworks must be committed into the https://svn.ms.mff.cuni.cz/svn/undergrads/students svn repository; do not send your homeworks by e-mail.
  • Submit your work into your personal directories.
  • There is an explicit deadline for submitting each homework - Wednesday before midnight.
  • If the deadline is not met, ask for additional homework. All homeworks must be submitted in order to get the credit (zápočet).
  • You can solve an additional homework even if you submitted the normal homework in time (i.e., you can improve your average by solving some of the additional homeworks).
  • You have to e-mail us to confirm your additional homework is ready to be rated. All additional homeworks must be submitted at least one week before the credit.
  • Each student is expected to work out all homework solutions independently; any cheating will be penalised (but you can send us an e-mail if you are stuck).

Final grade

  • Homework (40%)
  • Activity (10%)
  • Final test (50%) ... Note: 40% of the total score for the final test is necessary for passing!
  • Excellent: >= 90 %
  • Very good: >= 70 %
  • Good: >= 50 %
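
The weighting and thresholds above can be illustrated with a small calculation. The following is only a sketch under stated assumptions, not the official grading procedure: it assumes each component is expressed as a percentage from 0 to 100, that the 40% minimum applies to the final test's own score, and that a weighted total below 50% means failing (the list above does not name that outcome, so the label "Fail" is hypothetical). The function name `final_grade` is likewise made up for illustration, and the requirement that all homeworks be submitted is not modelled.

```python
def final_grade(homework: float, activity: float, final_test: float) -> str:
    """Return the grade label for three component scores given as percentages.

    Assumptions (not stated on the course page): every score is on a 0-100
    scale, and anything below the 'Good' threshold is labelled 'Fail'.
    """
    for score in (homework, activity, final_test):
        if not 0 <= score <= 100:
            raise ValueError("scores must be percentages between 0 and 100")

    # At least 40% of the final test score is necessary for passing.
    if final_test < 40:
        return "Fail"

    # Weights from the list above: homework 40%, activity 10%, final test 50%.
    total = (40 * homework + 10 * activity + 50 * final_test) / 100

    if total >= 90:
        return "Excellent"
    if total >= 70:
        return "Very good"
    if total >= 50:
        return "Good"
    return "Fail"  # hypothetical label for totals below 50%


if __name__ == "__main__":
    print(final_grade(homework=85, activity=100, final_test=72))
```

For example, perfect homework and activity scores do not compensate for a final test below 40%: under the assumptions above, `final_grade(100, 100, 39)` yields the failing label even though the weighted total would be 69.5%.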

Archive