6. System Implementation and Evaluation: A Discussion

In general, it is essential to empirically evaluate theories and the systems that purportedly implement them. Evaluations not only help others understand the strengths and limitations of various hypotheses and systems, but in many cases also facilitate comparisons between competing claims. However, NLG evaluations are considered difficult (Hovy and Meeter, 1990). NLG systems can be evaluated at many different levels, some of them orthogonal to each other. Our case is no exception. There are at least three different and equally important questions that one could investigate further:
A large part of the work we have discussed in this paper is system independent and applicable to any automatic graphic design system. Perhaps the most surprising aspect of our current implementation is how far one can get with such a simple architecture. We made certain simplifying decisions initially in order to get a prototype implemented, and surprisingly few of these simplifying assumptions proved problematic down the line. An example is our pipelined architecture. Most NLG researchers agree that the various modules in an NLG system need to be strongly interconnected, with bi-directional communication and control, and need to use shared data structures. We started off with a pipelined architecture and were surprised to find that the simplification was problematic in only one situation (which we were able to get around by planning appropriately). A pipelined approach has several advantages in our case: not only is it easy to design, implement and test each module independently, it is also easy to extend the functionality of any individual module without significantly affecting the others. While such a simplified architecture will certainly not suffice for all generation tasks, this is a strong argument for trying the minimal approach first to see where it falls short and why.

Over the last two years, this system has been used to generate captions for several hundred figures in different domains (housing sales, Napoleon's march of 1812, logistics transportation, scheduling, etc.). Porting the system from one domain to another usually requires only specifying the lexicon for the new domain (e.g., "battle," "troops," etc.). The fact that the captions generated in each of these quite different domains are deemed useful and natural by users is testimony to the effectiveness of the caption generation mechanism currently in place.
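The pipelined design and lexicon-only porting described above can be illustrated with a minimal sketch. This is not the system's actual code: the stage names (`select_content`, `make_realizer`, `join_sentences`), the `Mapping` type, and the sentence template are all hypothetical, chosen only to show one-way data flow between independently testable modules and a lexicon that is the sole domain-specific input.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Hypothetical data type: the paper does not specify these structures.
@dataclass
class Mapping:
    attribute: str  # data attribute, e.g. "troops"
    grapheme: str   # graphical property that encodes it, e.g. "line thickness"

Lexicon = Dict[str, str]  # domain attribute -> noun phrase for that domain

def select_content(mappings: List[Mapping]) -> List[Mapping]:
    """Stage 1 (assumed): keep only attributes that are actually encoded."""
    return [m for m in mappings if m.grapheme]

def make_realizer(lexicon: Lexicon) -> Callable[[List[Mapping]], List[str]]:
    """Stage 2 (assumed): build a realizer bound to one domain lexicon."""
    def realize(mappings: List[Mapping]) -> List[str]:
        # Fall back to the raw attribute name when the lexicon has no entry.
        return [
            f"The {m.grapheme} shows {lexicon.get(m.attribute, m.attribute)}."
            for m in mappings
        ]
    return realize

def join_sentences(sentences: List[str]) -> str:
    """Stage 3 (assumed): assemble the final caption text."""
    return " ".join(sentences)

def run_pipeline(stages: List[Callable[[Any], Any]], data: Any) -> Any:
    """Pass data through each stage in order; no stage sees a later one."""
    for stage in stages:
        data = stage(data)
    return data

def generate_caption(mappings: List[Mapping], lexicon: Lexicon) -> str:
    stages = [select_content, make_realizer(lexicon), join_sentences]
    return run_pipeline(stages, mappings)

if __name__ == "__main__":
    march_lexicon = {"troops": "the number of troops"}
    print(generate_caption([Mapping("troops", "line thickness")], march_lexicon))
```

In this sketch, moving to a new domain means passing a different lexicon dictionary to `generate_caption`; the stages themselves are unchanged, and each can be tested or extended in isolation.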
The system has two shortcomings that will be addressed in future work: (1) The caption generation system, as described here, cannot, in general, modify the graphics designed by SAGE when the caption requires it. There are several cases where this capability would be extremely useful, but the caption generation system described here was designed to work after SAGE had designed and rendered the graphic. Coordination currently occurs in one specialized case: when the caption generator presents an example, it can request that the graphemes corresponding to the tuple values used in the example be highlighted in the picture. (2) The system does not, as yet, analyze the data set for interesting patterns or clusters of data points. As a result, it cannot generate captions of the sort "this chart shows that sales were flat throughout 1995, but rose sharply in 1996." To do so, the system will need a clustering analysis module that can be used by the caption generator.