University of Birmingham

Proceedings from Corpus Linguistics 2005

This is a collection of the papers presented at the Corpus Linguistics 2005 conference, which was held in Birmingham on July 14-17, 2005. Some of the papers are available either as Word documents or as PDF files.

The proceedings have been divided into 11 subcategories:     

Compiling a Corpus

Contrastive Corpus Linguistics

Discourse

Evaluation and Stance

Grammar

Language Learning & Error Analysis through Corpora

Language Processing & Corpus Tool

The Lexicon

Phraseology & Patterns in language

The Web as a corpus

Spoken Discourse

  

Compiling a corpus

Rachel Aires, Diana Santos & Sandra Aluisio: "Yes, user!": compiling a corpus according to what the user wants  
See "Yes, user!".doc

Latifa Al-Sulaiti and Eric Atwell: Extending the Corpus of Contemporary Arabic  
See Extending the Corpus of Contemporary Arabic.doc

Wendy Anderson & Dave Beavan: Internet delivery of time-synchronised multimedia: the SCOTS Projects
See Traditional transcriptions.doc

Caroline Barrière & Akakpo Agbago: Corpus Construction for Terminology
See Terminology.doc  

Sara Piccioni: The Lorca corpus at the crossroads of philology and corpus linguistics  
See The Lorca corpus at the crossroads of philology and corpus linguistics.doc

Gong Wengao: English in computer-mediated environments: a neglected dimension in large English corpus compilation
See English in computer-mediated environments.pdf

Hilary Nesi, Sheena Gardner, Richard Forsyth, Dawn Hindle, Paul Wickens, Signe Ebeling, Maria Leedham, Paul Thompson and Alois Heuboeck: Towards the compilation of a corpus of assessed student writing 
See Towards the compilation of a corpus.doc

Contrastive Corpus Linguistics

Gisle Andersen: Assessing algorithms for automatic extraction of anglicisms in Norwegian texts
See Assessing algorithms.doc

József Andor: A Lexical Semantic-Pragmatic Analysis of the Meaning Potentials of Amplifying Prefixes in English and Hungarian: A Corpus-based Case Study of Near Synonymy
See A Corpus-based Case Study of Near Synonymy.doc

Annalisa Sandrelli & Claudio Bendazzoli: Lexical patterns in simultaneous interpreting: a preliminary investigation of EPIC (European Parliament Interpreting Corpus)
See Lexical patterns in simultaneous interpreting: a preliminary investigation of EPIC.doc

Marianna Apidianaki: Translation prediction using word co-occurrence graphs 
See Translation prediction using word cooccurrence graphs

Tatjana Balažic Bulc: Connectors in students' academic writing in two closely related languages
See Connectors in students' academic writing in two closely related languages.doc

Silvia Bernardini & Marco Baroni: Spotting translationese: A corpus-driven approach using support vector machines
See Spotting translationese.doc   

Gabriela Castelo Branco Ribeiro & Maria Carmelita Padua Dias: Two corpus-based studies about the translation of adjectives in English and Brazilian Portuguese  

Wallace Chen: Patterns of Connectors in the English-Chinese Parallel Corpus of Popular Science Texts  

Debbie Elliott: Using corpora to automatically detect untranslated and "outrageous" words in machine translation output

Ana Frankenberg-Garcia: A corpus-based study of loan words in original and translated texts
See A corpus-based study of loan words.doc

Randall L. Jones: Analysis of lexical correspondence in an English-German parallel corpus

Zhenglin Jin & Caroline Barrière: Exploring sentence variations in bilingual corpora
See Exploring sentence variations with bilingual corpora.doc

Tony McEnery and Richard Xiao: Passive constructions in English and Chinese: A contrastive and translation study  
See Passive constructions in English and Chinese.doc

Stella Neumann and Silvia Hansen-Schirra: The CroCo Project: Crosslinguistic corpora for the investigation of explicitation in translations
See The CroCo Project.pdf

Pablo Romero Fresco:  The translation of phraseology in a parallel (English-Spanish) audiovisual corpus.
See The translation of phraseology in a parallel.doc  

Doaa A. Samy: Named Entities: Structure and Translation. A Study Based on a Parallel Corpus (Arabic-Spanish-English)  
See Named Entities.doc

Tamás Váradi: Taking stock of the Bilingual Lexicon
See Taking Stock of the Bilingual Lexicon.doc

Discourse

Nadine Aldinger: Corpus-driven genitive disambiguation
See Corpus-driven genitive disambiguation.doc

Minhee Bang: Representation of foreign countries in two US newspapers: premodifications of keywords, countries, country, nations and nation  
See Representation of foreign countries in two US newspapers.doc

Michael Barlow: Input grammars and output grammars: Investigating the language of individual speakers

Christian Chiarcos & Olga Krasavina: Rhetorical Distance Revisited: A pilot study
See Rhetorical Distance Revisited.doc  

Huaqing Hong: SCORE: A Multimodal Corpus Database of Education Discourse in Singapore Schools
See Scope.pdf

Henk Louw: Really Too Very Much: Adverbial Intensifiers in Black South African English  
See REALLY TOO VERY MUCH.doc

Ling Yin & Richard Power: Investigation of the structure of topic expressions: a corpus-based approach  
See Investigation of the Structure of Topic Expressions.doc

Massimo Poesio & Ron Artstein: Annotating (anaphoric) ambiguity
See Annotating (Anaphoric) Ambiguity.pdf

Evaluation and Stance

Monika A. Bednarek: "He's nice but Tim" -- contrastive evaluation in the British press
See 'He's nice but Tim': contrast in British newspaper discourse.doc

Sara Radighieri: Arts in the news: Evaluative language use in the 'arts review'
See Arts in the news.doc

Grammar

Solveig Granath & Michael Wherrity: Prepositions with that-clause complements in tagged corpora, with a special focus on in that
See Prepositions with that-clause complements in tagged corpora.doc

Vladimir Petkevic & Frantisek Cermak: Linguistically motivated tagging as the base for a corpus-based grammar
See Linguistically Motivated Tagging as a Base for a Corpus-Based Grammar.doc

Simone Sarmento: Distribution of Modal Verbs in an Aviation Corpus  
See Distribution of Modal Verbs in an Aviation Corpus.doc

Chris Shei: Analysing Chinese Sentence-final Particles Using Academia Sinica Balanced Corpus of Modern Chinese
See Analysing Chinese Sentence.doc   

Seo-in Shin: Automatic Pattern Extraction for Korean Sentence Parsing  
See Automatic Pattern Extraction for Korean Sentence Parsing.doc

Language Learning & Error Analysis through Corpora

Mariko Abe and Yukio Tono: Variations in L2 spoken and written English: investigating patterns of grammatical errors across proficiency levels
See Variations in L2 spoken and written English.doc  

María Belén Díez Bedmar: Struggling with English at University level: error patterns and problematic areas of first-year students' interlanguage
See Bedmar Uni English.doc

Xiaotian Guo: Modal Auxiliaries in Phraseology: A Contrastive Study of learner English and NS English
See A Contrastive Study of Learner English and NS English.doc

Anke Lüdeling, Peter Adolphs, Emil Kroymann & Maik Walter: Multi-level error annotation in learner corpora
See Multi-level error annotation in learner corpora.doc

Zhang Yang: College English Course Corpus

Language Processing & Corpus Tool

Sabine Bartsch, Elke Teich, Monica Holtz & Richard Eckart: Corpus-based register profiling of texts from mechanical engineering
See Corpus-based register profiling of texts.pdf

Anja Belz: Corpus-driven Generation of weather Forecasts 
See Corpus-driven Generation of weather Forecasts.pdf

Pernilla Danielsson & Andrew Sayers: Enhancing Concordance Method: Introducing the CHAB  

Stefan Evert & Manuela Schonenberger: Separating the sheep from the goats: Clarifying corpus content using XML
See Separating the sheep from the goats.doc

David Hardcastle: Using the distributional hypothesis to derive co-occurrence scores from the British National Corpus
See Using the distributional hypothesis.doc

Laura Löfberg, Scott Piao, Asko Nykanen, Krista Varantola, Paul Rayson and Jukka-Pekka Juntunen: A semantic tagger for the Finnish language
See A semantic tagger for the Finnish language.doc

Yuji Matsumoto, Masayuki Asahara, Kou Kawabe, Yurika Takashi, Yukio Tono, Akira Ohtani and Toshio Morita: ChaKi: An Annotated Corpora Management and Search System  
See ChaKi.doc

Débora Oliveira, Diana Santos, Luis Sarmento & Belinda Maia: Corpus analysis for indexing: when corpus-based terminology makes a difference
See Corpus analysis for indexing.doc

Shih-Ping Wang: Integrating corpora and word-focused tasks into a linguistics project for word growth  
See Integrating corpora and word-focused tasks into a linguistics project.doc

Maria Zimina: Bi-text topography and quantitative approaches of parallel text processing
See Bi-text Topography and Quantitative Approaches.doc

Eros Zanchetta and Marco Baroni: Morph-it! A free corpus-based morphological resource for the Italian language
See Morph-it!.doc

  

The Lexicon

Antti Arppe: The role of morphological features in distinguishing semantically similar words
See The role of morphological features in distinguishing semantically similar words.doc

Jørg Asmussen: Automatic determination of new words within domain-specific vocabularies using document classification and frequency profiling
See Automatic detection of new domain-specific words.

Marco Baroni & Stefan Evert: Testing the extrapolation quality of word frequency models  
See Testing the extrapolation quality.pdf

Dr Paul Doyle: Replicating Corpus-Based Linguistics: Investigating Lexical Networks in Text
See Replication and Corpus Linguistics.pdf

Cvetana Krstev & Dusko Vitas: Corpus and Lexicon - Mutual In-completeness
See Corpus and Lexicon.doc

Jennifer Pedler: Using semantic associations for the detection of real-word spelling errors
See Using semantic associations for the detection of real-word spelling errors.doc  

Scott S.L. Piao, Dawn Archer, Olga Mudraya, Paul Rayson, Roger Garside, Tony McEnery, Andrew Wilson: A Large Semantic Lexicon for Corpus Annotation
See A Large Semantic Lexicon for Corpus Annotation.doc

Elisabete Marques Ranchhod: Using Corpora to Increase Portuguese MWE Dictionaries. Tagging MWE in a Portuguese Corpus.
See Using Corpora to Increase Portuguese MWE Dictionaries.pdf

Sofie Van Gijsel, Dirk Speelman & Dirk Geeraerts: A Variationist, Corpus Linguistic Analysis of Lexical Richness
See Lexical Richness.doc

Phraseology & Patterns in language

Frantisek Cermak & Michal Křen: Large Corpora, Lexical Frequencies and Coverage of Texts
See Large Corpora, Lexical Frequencies and Coverage of Texts.doc  

Christopher Gledhill & Pierre Frath: A Reference-based Theory of Phraseological Units: the Evidence of Fossils.
See A Reference-based Theory of Phraseological Units.doc

Eva Hajičová, Jiri Havelka & Katerina Vesela: Corpus Evidence of Contextual Boundness and Focus
See Corpus Evidence of Contextual Boundness and Focus.doc 

Csaba Oravecz, Karoly Varasdi & Viktor Nagy: Lexical idiosyncrasy in MWE extraction
See Lexical idiosyncrasy in MWE extraction.doc  

Bertus van Rooy: Expressions of modality in Black South African English  
See Expressions of modality in Black South African English.doc

Petra Storjohann: Corpus-driven vs. corpus-based approach to the study of relational patterns  
See Corpus-driven vs. corpus-based approach.doc

Christiane Wanzeck: The Determination of Phraseological Units in Historical Corpora: An Analysis System for Early New High German  
See The Determination of Phraseological Units in Historical Corpora.doc

  

The Web as a corpus

Abdulrahman Almuhareb & Massimo Poesio: Finding Attributes in the Web
See Finding Attributes in the Web Using a Parser.pdf

Ilias Koutsis, George Kouklakis, George Mikros & George Markopoulos: MINOTAVROS: A tool for the semiautomated creation of large corpora from the Web.
See Minotavros.doc

Alexander Mehler & Rudiger Gleim: Polymorphism in Generic Web Units - A Corpus Linguistic Study
See Alexander_Mehler_and_Ruediger_Gleim_Corpus_Linguistics_2005.pdf

Antoinette Renouf: The WebCorp Search Engine: a holistic approach to web text search  
See The WebCorp Search Engine.doc

Jesús Tomás, Francisco Casacuberta & Jaime Lloret: WebMining: Non-supervised system to obtain parallel corpus from the Web
See WebMining.pdf

Motoko Ueyama & Marco Baroni: Automated construction and evaluation of a Japanese web-based reference corpus
See Automated Construction and Evaluation of Japanese Web-based Reference Corpora.doc  

Spoken Discourse

Adriano Allora: A Tentative Typology of Net-mediated Communication
See A Tentative Typology of Net-mediated Communication.pdf

Knut Hofland & Annette Myre Jorgensen: COLA: A Spanish spoken corpus of youth language  
See COLA.doc

Kikuo Maekawa: Quantitative Analysis of Word-form Variation Using a Spontaneous Speech Corpus  
See Quantitative Analysis of Word-form Variation.doc

Antonio Moreno-Sandoval & Ana Gonzales-Ledesma: Pragmatic analysis of man-machine interactions in a spontaneous speech corpus  
See Pragmatic analysis of man-machine interactions.doc