In conjunction with the 5th International Natural Language Generation Conference (INLG 2008), June 12-14, 2008 Salt Fork, Ohio, USA.
Following the success of the Pilot NLG Challenge on Attribute Selection for Generating Referring Expressions (ASGRE) in September 2007, we are organising a second NLG Challenge, the Referring Expression Generation Challenge (REG 2008), to be presented and discussed during a special session at INLG 2008. While the ASGRE Challenge focused on attribute selection for definite references, the REG Challenge expands the scope of the original to include both attribute selection and realisation, while introducing a new task involving references to named entities in context.
The REG Challenge has eight submission tracks and two different data sets. It maintains the ASGRE Challenge's emphasis on openness to alternative task definitions and evaluation methods, and will involve both automatic and task-based evaluation.
Over the past few years, the need for comparative, quantitatively evaluated results has been increasingly felt in the field of NLG. Following a number of discussion sessions at NLG meetings, a workshop dedicated to the topic was held with NSF support in Arlington, Va., US, in April 2007. At this workshop, a decision was taken to organise a pilot shared task evaluation challenge, focussing on the area of GRE because of the broad consensus that has arisen among researchers on the nature and scope of this problem. The First NLG Challenge on Attribute Selection for Generating Referring Expressions (ASGRE) was held in Copenhagen in September 2007 in conjunction with the UCNLG+MT Workshop. It was highly successful, both in terms of participation and in the variety and quality of submissions received (see the report [pdf]). With 18 initial registrations, and final submissions from six teams comprising 13 researchers, who submitted outputs from 22 different systems, it is safe to say that community interest was substantial.
Several aspects of the ASGRE Challenge were intended to promote an approach to shared-task evaluation where community interests feed directly into the nature and evaluation of tasks. This was done in order to counteract a potential narrowing of scope, where a shared task, rather than reflecting community interests, shapes those interests in a single direction. The most important aspects were:
- A wide range of evaluation criteria, involving both automatic and task-based, intrinsic and extrinsic methods.
- An Open Category Track which enabled researchers to submit reports describing novel approaches involving the shared dataset, while opting out of the competitive element.
- An Evaluation Methods Track for submissions with novel proposals for evaluation of the shared task.
- Self-evaluation: participants computed scores for the development data set, using code supplied by the organisers.
The REG Challenge continues this tradition of openness, inclusion and low competitiveness. For details on how to participate, please see the Call for Participation.
- Task 1 (TUNA): Attribute selection for distinguishing descriptions.
- Task 2 (TUNA): Realisation of referring expressions from a given semantic representation.
- Task 3 (TUNA): Attribute selection and realisation combined.
- TUNA Open Track: Any work involving the TUNA data.
- TUNA Evaluation Methods: Any work involving evaluation of Tasks 1-3.
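To make Task 1 concrete, the sketch below illustrates greedy attribute selection in the spirit of Dale and Reiter's Incremental Algorithm. The toy domain, entities and preference order are illustrative only and are not taken from the TUNA corpus or the challenge materials.

```python
# Minimal sketch of greedy attribute selection (in the spirit of
# Dale & Reiter's Incremental Algorithm). The domain, entities and
# preference order below are illustrative, not from the TUNA data.

def select_attributes(target, distractors, preference_order):
    """Greedily pick attributes of `target` until no distractor
    matches all selected attribute-value pairs."""
    selected = {}
    remaining = list(distractors)
    for attr in preference_order:
        if attr not in target:
            continue
        value = target[attr]
        # Keep the attribute only if it rules out at least one distractor.
        if any(d.get(attr) != value for d in remaining):
            selected[attr] = value
            remaining = [d for d in remaining if d.get(attr) == value]
        if not remaining:
            break
    return selected

# Toy domain: describe e1 so it cannot be confused with e2 or e3.
e1 = {"type": "chair", "colour": "red", "size": "large"}
e2 = {"type": "chair", "colour": "blue", "size": "large"}
e3 = {"type": "table", "colour": "red", "size": "small"}

print(select_attributes(e1, [e2, e3], ["type", "colour", "size"]))
# -> {'type': 'chair', 'colour': 'red'}
```

Here "type" alone rules out the table, and "colour" then rules out the blue chair, so "size" is never needed; a distinguishing description would be realised as, e.g., "the red chair".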
- Task 4 (GREC-2): Given a referent, a discourse context and a list of possible referring expressions, select the referring expression most appropriate in the context.
- GREC Open Track: Any work involving the GREC data.
- GREC Evaluation Methods: Any work involving evaluation of Task 4.
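As a rough illustration of what Task 4 asks a system to do, the sketch below implements a naive selection heuristic (pronoun after an unambiguous recent mention, otherwise the full name). Both the function and the heuristic are hypothetical and are not the challenge's own baseline or data format.

```python
# Illustrative sketch of GREC-style referring-expression selection.
# The heuristic (pronoun after an unambiguous mention in the previous
# sentence, otherwise the full name) is a naive assumption for
# illustration, not the challenge's baseline or data format.

def choose_reference(candidates, mentioned_last_sentence, competitor_intervenes):
    """`candidates` maps RE types ('name', 'pronoun', ...) to strings."""
    if mentioned_last_sentence and not competitor_intervenes:
        return candidates.get("pronoun", candidates["name"])
    return candidates["name"]

candidates = {"name": "Ada Lovelace", "pronoun": "she",
              "description": "the mathematician"}
print(choose_reference(candidates, True, False))   # -> she
print(choose_reference(candidates, False, False))  # -> Ada Lovelace
```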
- 17 Oct 2007: INLG'08 First Call for Papers, including announcement of the REG Challenge
- 03 Jan 2008: REG Challenge 2008 First Call for Participation; preliminary registration opens; sample data available
- 12 Feb 2008: Release of training and development data sets for all tasks
- 17 Mar 2008: Test data becomes available
- 17 Mar - 07 Apr 2008: Test data submission period: participants can download the test data at any time, but must submit a system report first and must submit outputs within 48 hours
- 07 Apr 2008: Final deadline for submission of test data outputs
- 07 Apr - 10 May 2008: Evaluation period
- 12 Jun 2008: REG Challenge meeting at INLG'08
Anja Belz, NLTG, University of Brighton, UK
Albert Gatt, Computing Science, University of Aberdeen, UK
Eric Kow, NLTG, University of Brighton, UK
REG Challenge homepage:
REG Challenge email:
Last modified: 2008-02-23 11:43