In my view, a specific problem with evaluating and comparing NLG systems is the diversity and richness of the input specifications such systems need. In realistic corpora, we typically find only pairs of data material and texts, connected by mere functional associations; the purpose of the text and the intended presentation regularities are left implicit. In the long run, I think we should aim at enriching input specifications with meta-data, incrementally filling in situation descriptions in the sense of Dale and Reiter. For such specifications to be meaningful, agreement about vocabulary and/or ontological ingredients is essential. Examples of specifications associated with agreement requirements are:
- meta-data describing numerical and other "real" data, possibly expressed as logical expressions composed of WordNet entries;
- explicitly expressed regularities about the functional behavior of the system, such as the ordering conventions involved and the selection functions, all operating on meta-data and using some agreed NLG specification vocabulary (e.g., "prior to");
- specifications such as "fully detailed" or "selective", whose interpretation is left to the generators in view of the applicable regularity specifications.
I hope that a "generation specification" vocabulary will slowly evolve along these lines.
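To make the three kinds of specification more concrete, here is a minimal Python sketch of what an enriched input specification might look like. Everything in it is hypothetical and illustrative, not an agreed vocabulary: the class names (`DataItem`, `OrderingRule`, `GenerationSpec`), the WordNet-style concept labels (plain strings; no WordNet lookup is performed), and the policy chosen for "selective", which keeps the k items with the largest absolute values. The "prior to" rules are satisfied with a topological sort (Kahn's algorithm) over concept labels.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class DataItem:
    name: str      # identifier of the datum
    value: float   # the "real" numerical data
    concept: str   # meta-data label; a WordNet-style sense name (hypothetical, not looked up)


@dataclass
class OrderingRule:
    first: str     # concept to be presented ...
    second: str    # ... prior to this concept
    relation: str = "prior to"


@dataclass
class GenerationSpec:
    items: list
    rules: list = field(default_factory=list)
    detail: str = "fully detailed"  # or "selective"


def order_items(spec):
    """Order items so that every 'prior to' rule holds (Kahn's topological sort over concepts)."""
    concepts = []
    for it in spec.items:
        if it.concept not in concepts:
            concepts.append(it.concept)
    indeg = {c: 0 for c in concepts}
    succ = {c: [] for c in concepts}
    for r in spec.rules:
        if r.relation != "prior to":
            raise ValueError(f"unknown relation: {r.relation}")
        # Rules mentioning concepts absent from the data are ignored.
        if r.first in succ and r.second in indeg:
            succ[r.first].append(r.second)
            indeg[r.second] += 1
    queue = deque(c for c in concepts if indeg[c] == 0)
    rank = {}
    while queue:
        c = queue.popleft()
        rank[c] = len(rank)
        for s in succ[c]:
            indeg[s] -= 1
            if indeg[s] == 0:
                queue.append(s)
    if len(rank) != len(concepts):
        raise ValueError("ordering rules are cyclic")
    return sorted(spec.items, key=lambda it: rank[it.concept])


def select_items(spec, ordered, k=2):
    """Interpret the detail directive; 'selective' keeps the k largest |value| (an arbitrary choice)."""
    if spec.detail == "fully detailed":
        return ordered
    if spec.detail == "selective":
        top = sorted(ordered, key=lambda it: abs(it.value), reverse=True)[:k]
        keep = {id(it) for it in top}
        # Preserve the order established by the ordering rules.
        return [it for it in ordered if id(it) in keep]
    raise ValueError(f"unknown detail directive: {spec.detail}")


spec = GenerationSpec(
    items=[
        DataItem("rain", 12.0, "precipitation.n.01"),
        DataItem("temp", 21.5, "temperature.n.01"),
        DataItem("wind", 3.0, "wind.n.01"),
    ],
    rules=[OrderingRule("temperature.n.01", "precipitation.n.01")],
    detail="selective",
)
print([it.name for it in select_items(spec, order_items(spec))])  # prints ['temp', 'rain']
```

The design choice worth noting is that ordering and selection operate only on the meta-data labels and the directive strings, never on the surface text, so two generators sharing the vocabulary could be handed the same specification; how "selective" is actually interpreted remains each generator's own decision.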