Language Technologies for Research in Humanities


NPFL131 / ATKL00349

Pavel Straňák

stranak@ufal.mff.cuni.cz

Friday 12:30–14:00
Palachovo nám. 2, room S131

28. 4. 2023

NLP Applications and tools

Analysis of text

  1. sentence segmentation
  2. tokenisation
  3. stemming, POS tagging, lemmatisation and morphological analysis
  4. (surface) syntactic parser; chunker (shallow parsing, i.e. identification of phrases or clauses)
  5. deep syntax (e.g. in Treex, by transforming the surface parse tree)
  6. other units (on various layers of description):
    • named entity recognition (NER): person, place, date, …
    • coreference (pronominal, nominal)
    • temporal relations (X immediately after Y and together with Z, etc.)
    • word sense disambiguation (WSD; see t-lemma)
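Steps 1 and 2 can be illustrated with a deliberately naive, rule-based sketch. This is not how UDPipe or other trained tools work. The sketch assumes a sentence boundary after `.`, `!` or `?` when followed by whitespace and an uppercase letter, and treats tokens as runs of word characters or single punctuation marks. The function names `split_sentences` and `tokenize` are hypothetical.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive segmentation: split after ., ! or ? if the next word is capitalised."""
    sentences = []
    start = 0
    # Candidate boundaries: end punctuation followed by whitespace and a word.
    for match in re.finditer(r"[.!?]+(?=\s+\w)", text):
        next_text = text[match.end():].lstrip()
        if next_text[:1].isupper():
            sentences.append(text[start:match.end()].strip())
            start = match.end()
    rest = text[start:].strip()
    if rest:
        sentences.append(rest)
    return sentences


def tokenize(sentence: str) -> list[str]:
    """Naive tokenisation: words (Unicode-aware \\w+) or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


for sent in split_sentences("Pavel teaches on Friday. The room is S131! Is it in Prague?"):
    print(tokenize(sent))
```

The rules fail on cases such as an abbreviation followed by a capitalised name ("Dr. Novák" is split into two sentences), which is one reason real segmenters are trained on annotated data.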

Complex analysis: either a pipeline (a chain of many steps), or a joint problem for a statistical system that tries to learn all of the steps together. The joint approach makes sense: in reality the levels of description influence each other in complex ways, not as a simple chain.
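A minimal sketch of the chain (pipeline) view follows. It assumes that each step is a function that takes the annotations produced so far and returns them extended with a new layer. The whitespace tokenizer and the tiny lookup table `LEXICON` are hypothetical stand-ins for trained components. The point is that a later step only sees what the earlier steps produced.

```python
from functools import reduce
from typing import Callable

Doc = dict                      # annotation layers accumulated so far, keyed by name
Stage = Callable[[Doc], Doc]

# Hypothetical toy lexicon standing in for a trained POS tagger.
LEXICON = {"the": "DET", "cat": "NOUN", "sleeps": "VERB"}


def tokenize(doc: Doc) -> Doc:
    # Toy tokenizer: whitespace split, so punctuation stays attached to words.
    return {**doc, "tokens": doc["text"].split()}


def tag(doc: Doc) -> Doc:
    # The tagger works only on the tokens it receives; unknown tokens get "X".
    tags = [LEXICON.get(tok.lower(), "X") for tok in doc["tokens"]]
    return {**doc, "upos": tags}


def run_pipeline(text: str, stages: list[Stage]) -> Doc:
    # Apply the stages in a fixed order: each output is the next step's input.
    return reduce(lambda doc, stage: stage(doc), stages, {"text": text})


doc = run_pipeline("The cat sleeps.", [tokenize, tag])
print(doc["tokens"], doc["upos"])
```

Here the tokenizer leaves `sleeps.` as one token, so the tagger cannot recognise it and falls back to `X`. The tagging error is inherited from the earlier step. A joint model would instead learn both levels together, so that evidence from one level can correct the other.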

NLP Toolkits on the web

LINDAT Tools

The services run at LINDAT/CLARIN as simple web applications. Each app has a clickable GUI (an HTML web form) and a REST API.
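The REST API can be called from any HTTP client, not only from the web form or from `curl`. The sketch below uses only the Python standard library. It assumes the UDPipe endpoint `https://lindat.mff.cuni.cz/services/udpipe/api/process`, the parameters `data`, `model`, `tokenizer`, `tagger` and `parser`, and a JSON response with a `result` field. Check these details, and the available model names, in the tool's REST API documentation. The model name `english` and the helper names are assumptions.

```python
import json
import urllib.parse
import urllib.request

# Endpoint as we understand the UDPipe REST API documentation (verify there).
UDPIPE_URL = "https://lindat.mff.cuni.cz/services/udpipe/api/process"


def build_udpipe_request(text: str, model: str = "english") -> urllib.request.Request:
    # Assumption: an empty value switches a processing step on.
    params = {"data": text, "model": model, "tokenizer": "", "tagger": "", "parser": ""}
    body = urllib.parse.urlencode(params).encode("utf-8")
    # Supplying a body makes urllib send a form-urlencoded POST request.
    return urllib.request.Request(UDPIPE_URL, data=body)


def call_udpipe(text: str, model: str = "english") -> str:
    # Network call; returns the analysis stored in the JSON "result" field.
    with urllib.request.urlopen(build_udpipe_request(text, model)) as resp:
        return json.load(resp)["result"]
```

Running `print(call_udpipe("The cat sleeps."))` requires network access. The request built by `build_udpipe_request` can be inspected offline.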

NLTK

Homework

Experiment with the curl tool and the examples of using it to process data with UDPipe and other LINDAT tools. See the “REST API Documentation” of each tool in the links above.
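By default, UDPipe returns its analysis in the CoNLL-U format. It uses one token per line with ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), comment lines starting with `#`, and a blank line after each sentence. The block below is a minimal reader for a few of these columns, tested on a hand-written sample rather than on real tool output. `parse_conllu` is a hypothetical helper name.

```python
# Hand-written sample shaped like CoNLL-U output (not real UDPipe output).
ROWS = [
    ("1", "The", "the", "DET", "2", "det"),
    ("2", "cat", "cat", "NOUN", "3", "nsubj"),
    ("3", "sleeps", "sleep", "VERB", "0", "root"),
    ("4", ".", ".", "PUNCT", "3", "punct"),
]
SAMPLE = "# text = The cat sleeps.\n" + "".join(
    "\t".join([i, form, lemma, upos, "_", "_", head, rel, "_", "_"]) + "\n"
    for i, form, lemma, upos, head, rel in ROWS
) + "\n"


def parse_conllu(conllu: str) -> list[list[dict]]:
    """Return a list of sentences, each a list of token dicts."""
    sentences, current = [], []
    for line in conllu.splitlines():
        if not line.strip():            # a blank line ends a sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):        # sentence-level comments, e.g. "# text = ..."
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue                    # skip multiword-token ranges and empty nodes
        current.append({
            "id": int(cols[0]),
            "form": cols[1],
            "lemma": cols[2],
            "upos": cols[3],
            # HEAD is "_" when no parser was run
            "head": int(cols[6]) if cols[6].isdigit() else None,
            "deprel": cols[7],
        })
    if current:                         # the input may lack the final blank line
        sentences.append(current)
    return sentences


for token in parse_conllu(SAMPLE)[0]:
    print(token["form"], token["lemma"], token["upos"], token["head"], token["deprel"])
```

With a live response, pass the string from the JSON `result` field instead of `SAMPLE`.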