This workshop provides a meeting ground for scholars involved in the creation of corpora of spoken language or with a more general interest in the representation of spoken data based on audio/video recordings. The workshop addresses the need to harmonise corpus-building methods by developing or utilising internationally recognised standards in corpus linguistics or best-practice guidelines for the transcription and annotation of spoken data.
The aim is to facilitate the exchange of experience from large-scale, coordinated corpus-building efforts as well as from small-scale, local initiatives. This includes accounts of, on the one hand, the practicalities encountered in corpus compilation, transcription and annotation, and, on the other hand, how annotation decisions are grounded in linguistic theory. We hope this will stimulate a fruitful discussion of whether and how cross-corpus comparison is hampered by a lack of uniformity in annotation schemes and procedures, what solutions corpus builders recommend at different annotation levels, practical experience with existing standards or de facto standards (e.g. COBUILD/NERC, TEI, XCES), methods for testing and improving inter-annotator agreement, etc. Relevant topics include, but are not restricted to:
- Corpus design (techniques for capturing and linking text and audio/video data; ensuring consistency in transcription; ensuring inter-annotator agreement)
- Orthographic transcription (transcription of non-standard vocabulary, slang, swearing, neologisms; standardised vs. idiosyncratic orthography; standardised representation of pauses, backchannels and hesitation phenomena)
- Annotation of syntactic features (the relevance and reliability of part-of-speech tagging for (informal/messy) conversational data; syntactic parsing of speech; the capability of parsers and taggers to handle non-standard forms and neologisms)
- Annotation of prosodic, phonetic, or acoustic features (standardised vs. in-house annotation schemes, simple vs. detailed prosodic annotation; the relevance and reliability of phonetic annotation)
- Pragmatic or gestural annotation (standardised/in-house systems for annotation of speech act information, discourse functions, pragmatic markers, quotatives, anaphora and deixis; gestural annotation schemes)
We invite papers that discuss specific corpus initiatives dealing with any of the above topics, or that report on corpus-based case studies which illustrate or problematise the need for methodological harmonisation and standardisation in the field.
The workshop will be organised as a series of thematic slots consisting of 15-minute papers followed by joint discussions.
Abstracts of 300-400 words should be submitted by e-mail to all three convenors: [log in to unmask], [log in to unmask] and [log in to unmask]. Notification of acceptance will be sent out in late February 2013.
Workshop convenors: Gisle Andersen (NHH-NO), John Kirk (QUB-UK), Susan Lee Nacey (HiHm-NO)