Pre-conference workshops

The following five pre-conference workshops will be held on Thursday, 3 July. Participants can choose a morning workshop, an afternoon workshop or both, or the full-day workshop.

Morning session: 10h00 - 13h00
Afternoon session: 14h30 - 17h30

Workshop 1
Annotating pedagogy: implementing language teaching and learning-oriented annotation on corpora

Workshop 3
Learning Mandarin with the Sketch Engine: verb-object compounds in basic vocabulary

CANCELLED

Workshop 2
Using XAIRA to explore your XML Corpus

Workshop 4
Exploring and teaching the phraseology of academic discourse

Workshop 5
TaLC at TaLC: Teaching and Linguateca's (Portuguese language) Corpora


Registration opens in April, and the fee includes lunch, coffee break(s) and workshop materials. Places are limited and will be assigned on a first-come, first-served basis.

40 euros per workshop (early-bird, before May 30)
50 euros per workshop (normal)
50% discount for students (ID required)


Workshop 1
Annotating pedagogy: implementing language teaching and learning-oriented annotation on corpora

Pascual Pérez-Paredes, Universidad de Murcia, Spain
José M. Alcaraz, Universidad de Murcia, Spain

Keywords: Corpus annotation, pedagogical corpora, user-centered corpus exploitation, XML resources

The main aim of this hands-on workshop is to show participants practical ways in which corpora can be annotated from a pedagogical perspective, and to help them create a corpus that is pedagogically rich and usable in the language classroom. This workshop is an excellent opportunity to encourage language teachers and researchers to think of pedagogy as an annotation target, alongside the better-established targets of morphology and syntax.

The TaLC community has been debating the pedagogic uses of corpora for over a decade now. In this debate, different contributions have addressed the exploitation of language corpora in the classroom, both as direct and as indirect sources of information and activities. For the most part, these proposals have made extensive use of principled L1 corpora, the BNC being a case in point. Proposals to integrate the BNC into the language curriculum abound and address a wide range of curricular elements.

However, pedagogical annotation has not been on the agenda of language educators. The reasons for this cannot easily be enumerated here. Suffice it to say that the CL community's fascination with the usefulness of principled corpora such as the BNC has delayed the implementation of other initiatives that may meet the needs of non-linguists or of non-tertiary students of foreign languages.

The workshop is structured in five stages, which combine lecture input with a predominantly hands-on approach. First, participants will be introduced to the notion of pedagogical annotation. Here, we will present a theoretical framework that will serve as the basis for understanding the steps that follow. Second, we will offer an overview of the annotation tool used in this workshop: SACODEYL Annotator. This free, open-source application has been developed within the framework of SACODEYL, an international, EU-funded Minerva project that provides online data-driven learning (DDL) opportunities for young people. Although participants will be given precise instructions on how to use the main functions of the tool, the emphasis of the workshop is not this particular software but the technology behind it, that is, XML and, in particular, the extensibility of this markup language.

After this, participants will be given the chance to annotate part of a corpus themselves, so as to test the theoretical approach and the know-how discussed above. This is the most important activity of the workshop. The preparatory work for the annotation stage includes dividing a text or interview into learning units and structuring an annotation taxonomy tree. The level of granularity is for the user or annotator to decide. This user-driven annotation may cover topics, grammar, lexis, target exploitation level and similar pedagogical units that may play a role in the teaching or research context of those taking part in the workshop.
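As a rough illustration of what the annotation stage produces, the Python sketch below uses the standard-library `xml.etree.ElementTree` module. It divides a short interview into learning units and tags each unit with categories drawn from a small taxonomy tree. The element and attribute names (`interview`, `unit`, `tag`, `category`, `value`), the taxonomy and the sample sentences are all hypothetical. This is not the schema used by SACODEYL Annotator.

```python
import xml.etree.ElementTree as ET

# Hypothetical taxonomy tree: top-level categories and their sub-categories.
# The annotator chooses the categories and the level of granularity.
TAXONOMY = {
    "topic": ["family", "school", "free time"],
    "grammar": ["past simple", "present perfect"],
    "lexis": ["phrasal verbs", "collocations"],
}


def annotate(units):
    """Build an XML tree from a list of (text, [(category, value), ...]) pairs,
    one pair per learning unit. Tags not found in TAXONOMY are rejected."""
    root = ET.Element("interview")
    for i, (text, tags) in enumerate(units, start=1):
        unit = ET.SubElement(root, "unit", id=f"u{i}")
        for category, value in tags:
            if value not in TAXONOMY.get(category, []):
                raise ValueError(f"{category}/{value} is not in the taxonomy")
            ET.SubElement(unit, "tag", category=category, value=value)
        ET.SubElement(unit, "text").text = text
    return root


def units_tagged(root, category, value):
    """Return the text of every learning unit carrying the given tag."""
    return [
        unit.findtext("text")
        for unit in root.iter("unit")
        if any(t.get("category") == category and t.get("value") == value
               for t in unit.iter("tag"))
    ]


doc = annotate([
    ("I have lived here since I was ten.",
     [("grammar", "present perfect"), ("topic", "family")]),
    ("After school we usually hang out in the park.",
     [("topic", "free time"), ("lexis", "phrasal verbs")]),
])
print(ET.tostring(doc, encoding="unicode"))
print(units_tagged(doc, "lexis", "phrasal verbs"))
```

The taxonomy is held as plain data. Adding a new branch (for instance, a target exploitation level) therefore changes neither the code that writes the XML nor the code that queries it, which is one practical sense in which XML annotation is extensible.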

After this hands-on practice, a debate will explore the different views on the text(s) annotated, with emphasis on the flexibility of the tool and the structuring possibilities of XML annotation. The fifth and final stage of the workshop will offer more technical information on how annotated XML corpora can be exploited and used.

We expect that by the end of the workshop participants will not only have mastered the basics of XML-driven pedagogical annotation but will also have shared views and developed their own appreciation of how pedagogy can actually be annotated and further exploited in the classroom.

Participants will be provided with details on how to access and further use SACODEYL tools.


Workshop 2
Using XAIRA to explore your XML Corpus

Guy Aston, SSLMIT, University of Bologna, Italy
Lou Burnard, Oxford University Computing Services, UK

Keywords: XAIRA, XML, retrieval software, indexing, computer-aided learning

This workshop will introduce participants to the latest version of the XAIRA system, originally developed for use with the British National Corpus but now enhanced as a general-purpose, open-source, cross-platform software architecture.

Participants will learn how this software can take advantage of all the XML markup in the new XML edition of the British National Corpus. They will also learn how to use Xaira with their own corpora. Xaira can operate on a simple collection of plain text files, with no markup at all. It can also operate on a collection of texts with very sophisticated embedded linguistic markup, provided this is expressed in some dialect of XML. Participants will learn how to customize the program for either kind of material.

We will provide a series of exploratory exercises designed to demonstrate the search capabilities of the system when used with the BNC. These will include the production of concordances, word lists, collocation lists, etc. in the usual way, but with an emphasis on the kinds of application likely to be most useful in a language teaching environment. Particular attention will be paid to issues of integration and portability, in order to show how results obtained with Xaira can be integrated into other teaching material. We will also present and discuss strategies for encouraging students' own exploration of corpus resources using the program.
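For readers unfamiliar with these output types, the sketch below shows in plain Python what a frequency word list and a keyword-in-context (KWIC) concordance amount to. It is a hypothetical illustration with a naive tokenizer and invented sample text. It neither uses nor imitates Xaira's own query interface.

```python
import re
from collections import Counter


def tokenize(text):
    """Naive tokenizer: lower-cased runs of ASCII letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def word_list(tokens, n=10):
    """Return the n most frequent word forms with their counts."""
    return Counter(tokens).most_common(n)


def concordance(tokens, node, width=4):
    """Return KWIC lines: `width` tokens of left and right context around each hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left:>35} [{tok}] {right}")
    return lines


sample = ("The results show that the method works. We show the results in "
          "Table 2, and the table shows that results vary.")
tokens = tokenize(sample)
print(word_list(tokens, 2))  # [('the', 4), ('results', 3)]
for line in concordance(tokens, "results"):
    print(line)
```

Right-aligning the left context keeps the search word in a fixed column, which is what makes concordance lines easy to scan in class.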

All material used at the workshop will be made available online, and we will therefore need network access. Participants will also be encouraged to bring their own materials for experimentation.

http://www.natcorp.ox.ac.uk/workshop/
http://www.xaira.org


Workshop 3
Learning Mandarin with the Sketch Engine: verb-object compounds in basic vocabulary

Simon Smith, Ming Chuan University, Taiwan

Keywords: Mandarin, Chinese, verb-object compounds, vocabulary, Sketch Engine

This workshop is aimed at those with no knowledge of Mandarin (or any other kind of Chinese), although all are welcome to come along. Participants will first be shown how to recognize enough Chinese characters to be able to follow the concordances and examples presented; a brief introduction to the Sketch Engine corpus query tool (SkE, described by Kilgarriff et al. 2004) will also be given. SkE will be used to demonstrate the verb-object compounding process, which plays an important role in Mandarin word formation.

The introduction to SkE will in part resemble a walkthrough presentation, prepared by the authors, which participants may wish to view in advance (http://mcu.edu.tw/~ssmith/walkthrough/). For simplicity’s sake, we will first demonstrate the functions of SkE, including concordances, word sketches and thesaurus, using the English version of SkE, before moving on to the Chinese corpus and the workshop proper.

Because the audience will be unfamiliar with Chinese characters, a short introduction to the writing system will be needed. Where possible, we will focus on the relatively small class of characters traditionally designated “pictorial” or “ideographic”. These are quite easy to recognize and remember (they are, after all, supposed to resemble the objects they represent), and many of them do figure in verb-object compounds. For example, the characters 車 (a wheeled vehicle), 門 (a door or gate) and 刀 (knife) appear respectively in 開車 (to drive, literally “open a car”), 開門 (to open a door, or figuratively to open for business) and 開刀 (to have an operation or operate on someone, literally “open knife”).

開, in the above examples, seems to be acting as a kind of light verb, or at least to be making a much smaller semantic contribution than the nominal (object) compound elements. It will be shown that another frequent verbal participant in verb-object compounds, 打, literally meaning “hit”, is semantically bleached to an even greater extent. In yet other cases, it is the nominal component that appears to be redundant. Examples of these three varieties of verb-object compound will be discussed in as much detail as time permits.

Because word segmentation of Chinese texts is a non-trivial task, there is no universal agreement on where to draw the line between compounding (a word-formation process) and collocation in Chinese. The segmentation procedure used to mark up the Chinese corpus will therefore be briefly described.
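Forward maximum matching, a simple textbook baseline, shows why segmentation bears on this distinction. It is not necessarily the procedure used for the workshop corpus. The segmenter scans left to right and takes the longest dictionary entry at each position. Whether 開車 comes out as one word (a compound) or as two (a verb plus its object) therefore depends entirely on what the lexicon contains. The two toy lexicons below are hypothetical.

```python
def fmm_segment(text, lexicon, max_len=4):
    """Forward maximum matching: at each position take the longest lexicon
    entry (up to max_len characters) that matches; if none matches, emit a
    single character."""
    words, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + n]
            if n == 1 or chunk in lexicon:
                words.append(chunk)
                i += n
                break
    return words


sentence = "我開車"  # "I drive (a car)"
print(fmm_segment(sentence, {"開車", "開門", "開刀"}))  # ['我', '開車']
print(fmm_segment(sentence, set()))                     # ['我', '開', '車']
```

Real segmenters add statistical models and larger lexicons, but the lexicon-dependence illustrated here is the root of the disagreement described above.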

Aside from the morphological and collocational interest, participants will have plenty of opportunities to practise and acquire some basic language along the way. They will be encouraged to join in with pronunciation practice and simple drills, and it is hoped that the experience will be a novel, enjoyable and beneficial one for all involved.

Kilgarriff, A., Rychly, P., Smrz, P. and Tugwell, D. 2004. “The Sketch Engine.” Paper presented at EURALEX, Lorient, France, July 2004.
Xiao, R. and McEnery, T. 2006. “Collocation, Semantic Prosody, and Near Synonymy: A Cross-Linguistic Perspective.” Applied Linguistics 27: 103-129.


Workshop 4
Exploring and teaching the phraseology of academic discourse

Michael Barlow, University of Auckland, New Zealand
Ute Römer, University of Michigan, Ann Arbor, USA

Keywords: phraseology, collocations, EAP teaching, academic speech and writing, disciplinary discourse

Phraseology has had a rather marginal status in linguistic analysis and description within those linguistic theories that treat lexis and grammar separately, dealing with words independently of their preferred grammatical structures and with grammatical structures independently of the words that typically occur in them (cf. Sinclair 2005; see also Ellis 2008). Consequently, the phrase has not always received the attention it deserves, given its central status as a meaning-carrying unit (cf. Römer 2008; Sinclair 1996). Although corpus linguists have worked on different approaches to phraseology (see e.g. Biber, Conrad and Cortes 2003; Hunston and Francis 2000; Meunier and Granger 2008; Scott and Tribble 2006), there remains a gap between a general awareness of phraseology and knowledge of practical methods for identifying and analysing lexical units. The purpose of the present workshop is to provide hands-on experience in extracting and analysing phraseological units.

Phraseological items (or collocations, multi-word units and lexical bundles, to list a few commonly used alternative terms) can be broadly defined as repeatedly occurring contiguous or non-contiguous combinations of two or (usually) more words that carry meaning. We assume that it is essential for the language learner (and teacher) to know which word combinations are most commonly used in a particular type of discourse (e.g. in academic speech), so that learning and teaching activities can centre on those combinations and on the semantic and pragmatic meanings they convey. But how can pedagogically relevant phraseological items be identified in corpora?

A new generation of software tools is now becoming available that enables users to extract lists of candidate phraseological items from a corpus for inspection. One of these new phraseological search engines is Collocate (Barlow 2004). Collocate uses frequency information and statistical measures (t-score, log-likelihood, MI) to retrieve lists of:

(a) collocations with a specified search word and within a set span (e.g. four words),
(b) n-grams (lexical bundles) of different lengths, and
(c) collocations extracted from the corpus as a whole.
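As a rough guide to what such lists involve, the Python sketch below counts n-grams (type (b)) and scores the collocates of a search word within a set span (type (a)), using one common formulation of MI and the t-score. Expected co-occurrence frequency is estimated under independence as E = f(node) * f(collocate) * 2 * span / N. MI is then log2(O / E), and the t-score is (O - E) / sqrt(O), where O is the observed co-occurrence count. These formulas and the toy corpus are assumptions for illustration only. Collocate's own implementation may differ, and log-likelihood is omitted for brevity.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count contiguous n-grams (lexical bundles) of length n."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def collocates(tokens, node, span=4):
    """Score words occurring within `span` tokens either side of `node`.

    Returns {collocate: (observed, MI, t-score)}. Expected frequency assumes
    independence: E = f(node) * f(collocate) * (2 * span) / N.
    """
    freq = Counter(tokens)
    n_tokens = len(tokens)
    observed = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            observed.update(tokens[max(0, i - span):i])       # left context
            observed.update(tokens[i + 1:i + 1 + span])       # right context
    scores = {}
    for word, o in observed.items():
        e = freq[node] * freq[word] * 2 * span / n_tokens
        scores[word] = (o, math.log2(o / e), (o - e) / math.sqrt(o))
    return scores


tokens = "in order to understand the data we need in order to test it".split()
print(ngrams(tokens, 3).most_common(1))  # [(('in', 'order', 'to'), 2)]
for word, (o, mi, t) in sorted(collocates(tokens, "order", span=1).items(),
                               key=lambda kv: kv[1][2], reverse=True):
    print(f"{word:<10} O={o}  MI={mi:.2f}  t={t:.2f}")
```

The two statistics rank candidates differently. MI rewards exclusive pairings even when they are rare, while the t-score favours frequent combinations. This is one reason a tool may offer more than one measure.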

In this 3-hour workshop, we will first demonstrate some of the Collocate facilities, focussing on the extraction of phrases and meaningful items from corpora such as MICASE (the Michigan Corpus of Academic Spoken English) and MICUSP (the Michigan Corpus of Upper-level Student Papers). The focus will be on corpora that capture English academic discourse, but the analytic steps we will demonstrate are universal and can also be applied to other types of discourse and to data from languages other than English. We will discuss the ways in which Collocate can provide insights into the phraseological profile of academic speech and writing, and how the program can highlight which word combinations the members of a particular disciplinary discourse community tend to use in order to convey particular meanings.

In the hands-on part of the workshop, the participants will be able to work with and evaluate different functions and statistics used in Collocate in order to produce and interpret lists of collocations from different disciplinary subsets of MICASE and MICUSP. We will then move on from analysis to pedagogy and show how Collocate output can be used in the creation of innovative teaching materials. After a discussion of learner and EAP instructor needs, workshop participants will be provided with exercise templates and get the opportunity to design their own exercises for use in the EAP classroom.

Barlow, M. 2004. Collocate 1.0: Locating collocations and terminology. Houston, TX: Athelstan.
Biber, D., Conrad, S. and Cortes, V. 2003. "Lexical bundles in speech and writing: an initial taxonomy." In A. Wilson, P. Rayson and T. McEnery (eds), Corpus Linguistics by the Lune: A Festschrift for Geoffrey Leech. Frankfurt/Main: Peter Lang. 71-92.
Ellis, N. C. 2008. "Phraseology: the periphery and the heart of language." In F. Meunier and S. Granger (eds), Phraseology in Foreign Language Learning and Teaching. Amsterdam: Benjamins. 1-13.
Hunston, S. and Francis, G. 2000. Pattern Grammar: A corpus-driven approach to the lexical grammar of English. Amsterdam: Benjamins.
Meunier, F. and Granger, S. (eds) 2008. Phraseology in Foreign Language Learning and Teaching. Amsterdam: Benjamins.
Römer, U. 2008. "Identification impossible? A corpus approach to realisations of evaluative meaning in academic writing." Functions of Language 15/1: 115-130.
Scott, M. and Tribble, C. 2006. Textual Patterns: keyword and corpus analysis in language education. Amsterdam: Benjamins.
Sinclair, J. McH. 1996. "The search for units of meaning." TEXTUS IX: 75-106.
Sinclair, J. McH. 2005. "The phrase, the whole phrase, and nothing but the phrase." Keynote given at Phraseology 2005, Louvain-la-Neuve, October 13-15, 2005.


Workshop 5
TaLC at TaLC: Teaching and Linguateca's (Portuguese language) Corpora

Ana Frankenberg-Garcia, Linguateca (FCCN) & ISLA Lisboa, Portugal
Belinda Maia, Linguateca (CLUP) & Universidade do Porto, Portugal
Cláudia Freitas, Linguateca (CISUC/DEI/FCTUC), Portugal
Diana Santos, Linguateca (SINTEF - ICT), Norway

Keywords: Portuguese corpora, Portuguese language teaching, Portuguese language resources, Linguateca

Corpora can be a valuable language-learning tool for both teachers and learners. In the past couple of decades, a steadily growing number of corpus-based dictionaries, grammars and textbooks have become available and have been hugely successful in the teaching of English as a Foreign Language. Little progress has been made, however, in developing corpus-based pedagogic materials for languages other than English.

The aim of the present workshop is to introduce people working with the Portuguese language in educational settings to a number of corpus resources and tools that Linguateca has created and made available to the general public over the past decade. These include the AC/DC project (free access to large quantities of parsed Portuguese text, including CETEMPúblico, a 180-million-word corpus of text from the Portuguese daily newspaper Público, and CETENFolha, drawn from the Brazilian daily newspaper Folha de São Paulo), COMPARA (a 3-million-word bidirectional parallel corpus of Portuguese and English), the Floresta Sintá(c)tica (an expanding 1.5-million-word syntactic treebank for Portuguese) and the Corpógrafo (a web-based platform for building and managing your own corpus).

Although most of the above were originally conceived for research in natural language processing, their usefulness in education cannot be overlooked. The scant availability of ready-made, corpus-based learner dictionaries, grammars and textbooks for the teaching of Portuguese makes it all the more important to disseminate resources such as these. Language teachers who learn how to use them can create their own corpus-based pedagogic materials to supplement areas that are particularly lacking, such as information on collocations.

This 6-hour, full-day workshop, which will be conducted in Portuguese, will begin with a brief introduction to Linguateca's corpus resources and practical demonstrations of their applications in language teaching. Later in the day, participants will have the opportunity to try out these resources hands-on and create some of their own materials for teaching Portuguese. Participants who wish to do so will be able to share their corpus-based Portuguese language pedagogic materials on Linguateca's website.


Workshop authors

Ana Frankenberg-Garcia is Auxiliary Professor at ISLA, Lisbon, where she teaches English language and translation, and a senior researcher at Linguateca, FCCN, where she is joint leader of the COMPARA project. She holds a PhD in Applied Linguistics from Edinburgh University and her research interests include the use of corpora for language learning and translation studies, parallel corpora, corpus usability and user behaviour, learner autonomy, crosslinguistic influence and second language writing. See also http://adamastor.linguateca.pt/COMPARA/equipa/Ana/AnaHome.html

Belinda Maia is an Associate Professor at the Faculdade de Letras da Universidade do Porto where she is responsible for teaching and research in the areas of contrastive linguistics, translation, information technology applied to translation, and terminology. She is the supervisor of the Polo CLUP/FLUP of the Linguateca project. See also http://web.letras.up.pt/bhsmaia/belinda/index.htm

Cláudia Freitas obtained her PhD in Linguistics in 2007, with a thesis on the extraction of semantic relations from corpora. From 2002 to 2007, she taught grammar and writing skills to Brazilian students at Pontifícia Universidade Católica, Rio de Janeiro. In 2007, she joined Linguateca, working primarily on the Floresta Sintáctica project. See also http://eden.dei.uc.pt/~freitas/

Diana Santos has worked with Portuguese corpora for the last 18 years. She led the development of the first corpus browser for Portuguese in 1992, completed her PhD in corpus-based contrastive studies in 1996 and, within the scope of Linguateca, was involved in the creation of AC/DC, COMPARA, CETEMPúblico, CETENFolha and many other corpora for the processing of Portuguese. See also http://www.linguateca.pt/Diana/diana.html

Guy Aston, MA (Oxon), MSc (Edinburgh), PhD (London), is Professor of English Linguistics at the Advanced School of Interpreters and Translators of the University of Bologna, where he teaches English for interpreters, computer-aided translation and English-Italian liaison interpreting. His research interests include corpus linguistics, contrastive pragmatics, conversational analysis and autonomous language learning.

José M. Alcaraz-Calero is a Computer Science Engineer at the Computer Science School, University of Murcia. He has contributed widely to corpus-based language research and has implemented solutions for several projects at UMU, including automatic semantic tagging of Spanish based on word sense disambiguation (WSD) and intelligent tagging algorithms for the CUMBRE corpus of Spanish. He is currently completing his PhD in autonomic computing, networks and ontologies.

Lou Burnard is Assistant Director of Oxford University Computing Services, and group manager for the Information and Support Group, one of its four major divisions. He set up the Oxford Text Archive in 1976, has been European editor of the Text Encoding Initiative since 1989, and is responsible for Oxford University's participation in the British National Corpus Project.

Michael Barlow is Associate Professor in the Department of Applied Language Studies and Linguistics at the University of Auckland, New Zealand, where he teaches courses on Corpus Linguistics and CALL. He is the author of the text analysis programs MonoConc, ParaConc and Collocate, and the co-author of two corpus-based ESL texts: Phrasal Verbs and Business Phrasal Verbs.

Pascual Pérez-Paredes began his collaboration with the English Department at the University of Murcia, Spain, in 1996. After a research stay at the University of Texas at Austin, he completed his PhD in Applied Linguistics in 1999. He currently teaches CALL and Applied Linguistics. He is also a Sworn/Official Translator. His main interests are quantitative research on register variation, the compilation and use of language corpora, and the implementation of Information and Communication Technologies in Foreign Language Teaching/Learning. He is a member of the Research Group Lingüística Aplicada Computacional, Enseñanza de Lenguas y Lexicografía (LACELL). Pascual Pérez-Paredes directs a research project on orality funded by the SENECA agency. He is also the coordinator of a MINERVA project funded by the European Commission: SACODEYL (225836-CP-1-2005-1-ES-MINERVA-M SACODEYL). At the moment, he is involved in corpus-based international projects such as LINDSEI (UCL) [http://cecl.fltr.ucl.ac.be/] and ICCI (TUFS) [http://www.tufs.ac.jp/insidetufs/doc/08012802.pdf].

Simon Smith is a long-standing EFL teacher with experience in private and university settings in Taiwan and China. BA Chinese & Linguistics, Leeds, 1998; MSc Machine Translation, Manchester/UMIST, 1999; PhD Statistical Language Modelling, Birmingham, 2004; postdoc at the Institute of Linguistics, Academia Sinica, under Huang Chu-ren. Assistant Professor, English Language Center, Ming Chuan University. Interested in corpus linguistics techniques in ELT, with various publications in this area. See also http://www.mcu.edu.tw/department/app-lang/elcenter/english/facultyandstaff/faculty-staff/simonsmith.html

Ute Römer is currently Director of the Applied Corpus Linguistics Unit at the University of Michigan English Language Institute where she manages the MICASE (Michigan Corpus of Academic Spoken English) and MICUSP (Michigan Corpus of Upper-level Student Papers) projects. Ute received her PhD in English linguistics from the University of Hanover, Germany, in 2004. Her primary research interests and areas in which she has published include corpus linguistics, phraseology, and the application of corpora in language learning and teaching. Ute's current research focus is on the creation of evaluative meaning in academic writing and on how corpus tools and methods can be used to identify meaningful units in specialized discourses. More information about Ute's research interests and a list of her publications can be found at http://www.uteroemer.com.