Tuesday, 12 September, 2023 - 15:00

Toward Natural Metalanguage Processing

People don't just talk with natural language: sometimes, they talk about it. A wealth of knowledge about words, grammar, and meaning is communicated metalinguistically - whether it's through dictionaries, language learning resources, scholarly works in linguistics and literature, or social/political/legal discourse. Are current NLP models fluent in metalanguage, and can they provide accurate metalinguistic explanations? I will present case studies looking at two metalinguistically rich genres: (i) online language discussion forums, and (ii) judicial rulings involving language interpretation. We find that large language models can, for the most part, categorize kinds of metalanguage, and can generate satisfactory answers to some (but not all) metalinguistic questions. (Joint work with Shabnam Behzad, Michael Kranzlein, Keisuke Sakaguchi, Kevin Tobia, and Amir Zeldes.)


Nathan Schneider is an annotation schemer and computational modeler for natural language. As Associate Professor of Linguistics and Computer Science at Georgetown University, he looks for synergies between practical language technologies and the scientific study of language. He specializes in broad-coverage linguistic analysis: designing linguistic representations of grammar and meaning, annotating them in corpora, and automating them with natural language processing techniques. A central focus in this research is the nexus between grammar and lexicon as manifested in multiword expressions and adpositions/case markers. Among his favorite acronyms are AMR, CCG, CxG, GUCL, SNACS, and UD. He is an NSF CAREER award recipient and has served the computational linguistics community in various ways, having chaired SemEval, the Linguistic Annotation Workshop, and GURT/SyntaxFest; served on the board of SIGLEX; and served as an action editor for TACL and ARR. He has inhabited UC Berkeley (B.A. in Computer Science and Linguistics), Carnegie Mellon University (Ph.D. in Language Technologies), and the University of Edinburgh (postdoc). Now a Hoya (https://en.wikipedia.org/wiki/Hoya_Saxa) and leader of NERT (http://nert.georgetown.edu/), he continues to play with data and algorithms for linguistic meaning.