Task Description

TASKS

The task is concerned with intra-document coreference resolution for six different languages: Catalan, Dutch, English, German, Italian and Spanish. The core of the task is to identify which noun phrases (NPs) in a text refer to the same discourse entity.
Data are provided for both statistical training and evaluation. The coreference chains are extracted from manually annotated corpora: the AnCora corpora for Catalan and Spanish, the OntoNotes and ARRAU corpora for English, the TüBa-D/Z for German, the KNACK corpus for Dutch, and the LiveMemories corpus for Italian. The data are additionally enriched with morphological, syntactic and semantic information (such as gender, number, constituents, dependencies and predicates). Great effort has been devoted to providing the participants with a common and relatively simple data representation for all the languages.

Two tasks are proposed for each of the languages:

Full task. Detection of full coreference chains, composed of named entities, pronouns, and full noun phrases.
CANCELLED Subtask. Pronominal resolution, i.e., finding the antecedents of the pronouns in the text.

We strongly encourage participation in the full task, but participants may limit themselves to the second task, which is a subtask of the full task.

Participation in only one language or a subset of the languages is also allowed, although we are aiming at systems that address the full multilingual task.

GOALS

The main goal is to perform and evaluate coreference resolution for Catalan, Dutch, English, German, Italian and Spanish with the help of other layers of linguistic information and using different evaluation metrics (MUC, B-CUBED, CEAF and BLANC).

The multilingual context will make it possible to study the portability of coreference resolution systems across languages.
- To what extent is it possible to implement a general system that is portable to all six languages?
- How much language-specific tuning is necessary?
- Are there significant differences between Germanic and Romance languages? And between languages of the same family?
The additional layers of annotation will make it possible to study how helpful morphology, syntax and semantics are for resolving coreference relations.
- How much preprocessing is needed?
- How much does the quality of the preprocessing modules (perfect linguistic input vs. noisy automatic input) affect the performance of state-of-the-art coreference resolution systems?
- Is morphology more helpful than syntax? Or semantics? Or is syntax more helpful than semantics?
The use of four different evaluation metrics will make it possible to compare the advantages and drawbacks of the widely used MUC, B-CUBED and CEAF measures, as well as the newly proposed BLANC measure.
- Do all of them provide the same ranking?
- Are they correlated?
- Can systems be optimized under all four metrics at the same time?
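
To make the contrast between metrics concrete, here is a minimal Python sketch of two of them, MUC and B-CUBED. It assumes the commonly cited link-based (MUC) and mention-based (B-CUBED) definitions, and it assumes that key and response contain exactly the same mentions, with singletons listed explicitly. It is not the official task scorer; the function names and the toy chains are illustrative only.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Chains = List[Set[str]]  # each chain is a set of mention identifiers


def _index(chains: Chains) -> Dict[str, FrozenSet[str]]:
    """Map every mention to the chain that contains it."""
    return {m: frozenset(chain) for chain in chains for m in chain}


def _muc_side(chains: Chains, other: Chains) -> float:
    """Fraction of links in `chains` that are preserved by `other`."""
    other_of = _index(other)
    preserved = total = 0
    for chain in chains:
        # Number of pieces `other` splits this chain into; a mention that is
        # missing from `other` counts as its own singleton piece.
        parts = {other_of.get(m, frozenset([m])) for m in chain}
        preserved += len(chain) - len(parts)
        total += len(chain) - 1
    return preserved / total if total else 0.0


def muc(key: Chains, response: Chains) -> Tuple[float, float]:
    """Return (precision, recall) under the link-based MUC definition."""
    return _muc_side(response, key), _muc_side(key, response)


def b_cubed(key: Chains, response: Chains) -> Tuple[float, float]:
    """Return (precision, recall) averaged over mentions (B-CUBED)."""
    key_of, resp_of = _index(key), _index(response)
    if key_of.keys() != resp_of.keys():
        raise ValueError("this sketch assumes identical mention sets")
    n = len(key_of)
    precision = sum(len(key_of[m] & resp_of[m]) / len(resp_of[m]) for m in key_of) / n
    recall = sum(len(key_of[m] & resp_of[m]) / len(key_of[m]) for m in key_of) / n
    return precision, recall


if __name__ == "__main__":
    # Toy example: the response wrongly merges two gold entities into one.
    key = [{"a", "b", "c"}, {"d", "e"}]
    response = [{"a", "b", "c", "d", "e"}]
    print(muc(key, response))      # (0.75, 1.0)
    p, r = b_cubed(key, response)
    print(round(p, 2), r)          # 0.52 1.0
```

In this toy case both measures give perfect recall, but MUC penalizes the erroneous merge far less than B-CUBED does, which is the kind of divergence the questions above are meant to probe.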

EVALUATION

See the description of the four different evaluation settings in the Evaluation section.

We invite, and strongly encourage, participants to submit the results of their systems run in ALL FOUR EVALUATION SCENARIOS (closed vs. open, gold-standard vs. regular) and for ALL SIX LANGUAGES. This is the only way to gain insight into the effect of additional layers of annotation within and across coreference resolution systems, as well as into the portability of systems across languages. Nonetheless, participants may restrict themselves to any subset of the evaluation scenarios and/or languages.
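
As a rough illustration of what a complete submission covers, the sketch below enumerates every combination of language, resource setting and input setting. The labels are taken from the text above; the variable and function names are hypothetical and do not correspond to any official submission format.

```python
from itertools import product
from typing import List, Tuple

LANGUAGES = ["Catalan", "Dutch", "English", "German", "Italian", "Spanish"]
RESOURCE_SETTINGS = ["closed", "open"]        # closed vs. open
INPUT_SETTINGS = ["gold-standard", "regular"]  # gold-standard vs. regular


def all_runs() -> List[Tuple[str, str, str]]:
    """List every (language, resource setting, input setting) combination."""
    return list(product(LANGUAGES, RESOURCE_SETTINGS, INPUT_SETTINGS))


if __name__ == "__main__":
    runs = all_runs()
    print(len(runs))  # 24 = 6 languages x 4 scenarios
    for language, resources, inputs in runs:
        print(f"{language}: {resources}, {inputs}")
```

A participant covering only some languages or scenarios would simply submit the corresponding subset of this grid.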
