
What Makes a Good Narrative?



People like to see information presented as stories or narratives, and they comprehend and recall narratives better than lists of facts.   Perhaps this is a consequence of the fact that our language abilities evolved when we were hunter-gatherers living in small tribes, who swapped stories rather than facts while sitting around a fire at night.  At any rate, narratives are often (usually? almost always?) the best way to present information to people.

But what characteristics does a text need to have in order to be a good narrative?   We don't have a good understanding of this, especially when we're talking about non-fictional narratives which are intended to communicate important information, rather than fictional narratives whose main goal is entertainment.  Below I list some sources and experiences which have influenced my thinking about non-fictional narratives, and conclude with some thoughts on how such narratives could be generated by an NLG system.

Much of the below has been influenced by discussions with my PhD student Craig Thomson.  But if you see something you disagree with, blame me, don't blame Craig!

Labov

One text which influenced my thinking on narratives is chapter 9 of Labov's book Language in the Inner City, in which Labov analyses "narratives of personal experience".  He points out that a "minimal" narrative can be constructed simply by listing events in time order, but "good" narratives include additional content, such as an abstract (summary of key points), orientation (time, place, context), evaluation (why this is important), and result (outcome).  Labov also describes how syntactic/linguistic constructs are used in narratives.  For example, we can use the past-perfect tense to indicate places where events are not described in temporal order.

In a sense, Labov shows two points on the narrative spectrum: a “minimal” narrative which is pretty basic in terms of both linguistics and content, and a “good” narrative with much more sophisticated structures.

Babytalk – continuity

When we evaluated the Babytalk system (generating summaries of clinical data for babies in neonatal ICU), we discovered a number of issues related to narrative which decreased text quality (Reiter et al 2008).  For example, users complained about paragraphs such as the one below

There were 3 failed attempts to insert a peripheral venous line at 13:53. TcPO2 suddenly decreased to 8.1. SaO2 suddenly increased to 92. TcPO2 suddenly decreased to 9.3.

Users didn't like the fact that TcPO2 suddenly decreased to 9.3 *after* it had suddenly decreased to 8.1; this did not make sense to them.  In fact, after TcPO2 fell to 8.1, it had (slowly) increased to 19, before suddenly falling again to 9.3.   The problem here is that a slow increase in TcPO2 is medically less important than a sudden fall in TcPO2, which is why Babytalk mentioned the sudden falls in TcPO2 but not the slow increase.   But regardless of the medical rationale, omitting the slow increase confused our readers; it needs to be mentioned to create a good narrative, even if it is not medically important.   I call this problem continuity.

Based on this, I experimented with adding a module to Babytalk which explicitly maintained a "mental model" of the situation described by the narrative, so that we could detect cases where this kind of thing might happen.  E.g., if the model recorded that the user thought TcPO2 was 8.1, because this is what he was last told about TcPO2, then the system would know that it couldn't subsequently say that TcPO2 fell to 9.3 without giving additional information and explanation.  A nice idea, but not easy to get to work in practice.   But maybe something we need in order to generate high-quality narratives.
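To make this concrete, here is a minimal sketch of what such a reader-model might look like.  The class and method names are my own illustrative inventions, not Babytalk's actual design, and a real module would need much richer reasoning about trends and timing.

```python
# A minimal sketch of a reader-model for continuity checking.
# All names here (ReaderModel, check, report) are illustrative
# assumptions, not Babytalk's actual design.

class ReaderModel:
    """Tracks the last value the reader was told for each data channel."""

    def __init__(self):
        self.last_reported = {}  # channel name -> last value stated in the text

    def check(self, channel, new_value, direction):
        """Return a warning if a planned statement contradicts the reader's model.

        E.g. saying a channel "fell to 9.3" is confusing if the reader
        was last told it was 8.1, since 9.3 is higher than 8.1.
        """
        old = self.last_reported.get(channel)
        if old is not None:
            if direction == "fell" and new_value > old:
                return (f"{channel} was last reported as {old}; mention the "
                        f"intervening rise before saying it fell to {new_value}")
            if direction == "rose" and new_value < old:
                return (f"{channel} was last reported as {old}; mention the "
                        f"intervening fall before saying it rose to {new_value}")
        return None

    def report(self, channel, new_value):
        """Record that the text has now told the reader this value."""
        self.last_reported[channel] = new_value


model = ReaderModel()
model.report("TcPO2", 8.1)                # "TcPO2 suddenly decreased to 8.1"
print(model.check("TcPO2", 9.3, "fell"))  # flags the unmentioned slow rise to 19
```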

Sambaraju – causality

Later I worked with a discourse analyst, Rahul Sambaraju, who compared Babytalk texts to human texts (Sambaraju et al 2011) (see also McKinlay et al 2010).   Rahul pointed out a number of ways in which the computer-generated texts were inferior to the human-written texts from a narrative perspective, including in particular poorer causal connectivity and structure.   Good narratives have a causal structure and present events in a way which highlights causal relationships; the humans did this much better than the NLG system (the details are complex; interested readers can see section 4 of Rahul's paper).   Babytalk, incidentally, did try to highlight causal relationships, but its understanding of these was much worse than the humans'.  In other words, the key system-building challenge was perhaps not so much communicating causal relations as identifying these relations in the first place.

Kintsch

In his book Comprehension: A Paradigm for Cognition, Kintsch presents a model of text comprehension which effectively suggests guidelines for producing easy-to-understand texts.  Although Kintsch does not explicitly focus on narrative, many of the principles he suggests seem related to the above-mentioned insights.   For example, Kintsch says that texts will be easier to read if neighbouring sentences talk about related things, in part because this maximises the use of humans' short-term working memory, and minimises the need to retrieve new information from long-term memory.

My above-mentioned PhD student, Craig Thomson, tried to implement some of these principles within an NLG system (Thomson et al 2018).  It was not easy, in part because you need to be able to predict which concepts are related, which Craig did using Word2Vec.   One thing that emerged from this work is that if we want to measure comprehension, we can't just ask subjects to rate texts on a Likert scale; we need to measure comprehension, and perhaps recall, directly.
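As a rough illustration of the relatedness idea (and emphatically not Craig's actual implementation), the sketch below greedily orders events so that adjacent ones mention similar concepts, using off-the-shelf word vectors via gensim:

```python
# A toy reconstruction of the relatedness idea, not the actual system
# from Thomson et al 2018: order events so that adjacent sentences
# mention similar concepts, using pretrained word vectors.

import gensim.downloader as api

wv = api.load("glove-wiki-gigaword-50")  # any pretrained word vectors would do

def greedy_order(concepts):
    """Greedily place each next concept to maximise similarity with its
    predecessor, approximating Kintsch's advice that neighbouring
    sentences should talk about related things."""
    remaining = list(concepts)
    ordered = [remaining.pop(0)]
    while remaining:
        nxt = max(remaining, key=lambda c: wv.similarity(ordered[-1], c))
        remaining.remove(nxt)
        ordered.append(nxt)
    return ordered

# One (hypothetical) key concept per event to be narrated.
print(greedy_order(["heart", "ventilator", "pulse", "oxygen", "lungs"]))
```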

Narrative generation

So, what does this mean for NLG systems, in particular for document planning?   In most current NLG systems, narrative structure is specified manually by developers for the target application.  There is also research on learning narrative structure from corpora, but again what is learnt is specific to one application.   Maybe I’m being idealistic, but I’d love to see a more generic and theoretically justified model for generating narrative.

So here is an initial suggestion for such a model, at a very abstract level.  I assume that the task is to generate a narrative which communicates a set of events.

  1. Characterise the space of allowable narratives in the genre, perhaps as a finite-state or Markov model (where the nodes are specified at the content level).   This allows us to capture genre-dependent conventions, which are important in many contexts.  E.g., biographies usually start with the person's birth, even if they later mention earlier events such as the parents' birth dates.
  2. Find the narrative in this space which does the best job of showing causal structure (Sambaraju) and grouping events that involve similar entities (Kintsch), while otherwise keeping events in time order (Labov).  There are probably trade-offs here, so perhaps define a scoring/preference function over the search space (see the sketch after this list).
  3. Add continuity information (Babytalk) to the narrative if needed, perhaps based on simulating the reader’s mental model of the situation.
  4. Add high-level content, such as an abstract (Labov), to the narrative.
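To make step 2 slightly more concrete, here is a toy sketch of the kind of scoring/preference function I have in mind.  The events, weights, and scoring terms are illustrative assumptions on my part, not a worked-out proposal:

```python
# A toy sketch of step 2: search over orderings of a few events, scoring
# each ordering for causal adjacency (Sambaraju), shared entities between
# neighbours (Kintsch), and temporal order (Labov). The events, weights,
# and scoring terms are illustrative assumptions, not a worked-out proposal.

from dataclasses import dataclass
from itertools import permutations

@dataclass(frozen=True)
class Event:
    time: int
    entities: frozenset
    causes: frozenset = frozenset()  # times of the events that caused this one

def score(ordering, w_causal=1.0, w_entity=0.5, w_time=0.25):
    s = 0.0
    for prev, cur in zip(ordering, ordering[1:]):
        if prev.time in cur.causes:
            s += w_causal   # a cause stated immediately before its effect
        if prev.entities & cur.entities:
            s += w_entity   # adjacent events mention the same entities
        if prev.time <= cur.time:
            s += w_time     # otherwise, prefer temporal order
    return s

events = [
    Event(1, frozenset({"line"})),
    Event(2, frozenset({"TcPO2"}), causes=frozenset({1})),
    Event(3, frozenset({"SaO2"})),
    Event(4, frozenset({"TcPO2"}), causes=frozenset({2})),
]

# Exhaustive search works for a handful of events; a real system would need
# beam search or similar, constrained by the genre model from step 1.
best = max(permutations(events), key=score)
print([e.time for e in best])
```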

I realise this is crude and full of problems, but it would be great if we could get something like this to work!

