A Mutation Carol 2

December 15, 2022

Ghosts of creations past and citations not present

Domenico Amalfitano, Ana Paiva, Alexis Inquel, Luis Pinto, Anna Rita Fasolino, and René Just are the authors of an article in this month’s Communications of the ACM. Their article is on the program testing method called mutation.

Today we discuss how far back citations should go.

When I opened this month’s CACM hardcopy and saw the title “How do Java Mutation Tools Differ?,” I looked at the article’s references. Like most researchers, I am proud of my work and enjoy seeing it cited. Their article leads with the following as a major reference:

[33] Jeff Offutt. “A Mutation Carol: Past, Present, and Future.” Information and Software Technology 53, 10 (2011), 1098–1107.

This paper is cited twice, sandwiched around a mention of a 2019 survey. The second time is for a definition of mutation analysis as, “the use of well-defined rules defined on syntactic descriptions to make systematic changes to the syntax or to objects developed from the syntax.” There is only one citation dated before 2001, a 1992 paper by Offutt.
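
For readers who have not met mutation before, the quoted definition can be made concrete with a minimal sketch. It is not taken from the CACM article or from Offutt's paper: the function `total_price`, the rule `SwapMultToAdd`, and the two test suites are all hypothetical names chosen for illustration. The rule makes one systematic change to the syntax tree (every `*` becomes `+`), and a test suite "kills" the resulting mutant if it passes on the original program but fails on the mutant.

```python
import ast

# Hypothetical program under test.
SOURCE = """
def total_price(unit_price, quantity):
    return unit_price * quantity
"""


class SwapMultToAdd(ast.NodeTransformer):
    """Hypothetical mutation rule: replace every `*` with `+`."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # mutate nested expressions first
        if isinstance(node.op, ast.Mult):
            node.op = ast.Add()
        return node


def load(tree):
    """Compile a syntax tree and return the function it defines."""
    namespace = {}
    exec(compile(tree, "<mutation-sketch>", "exec"), namespace)
    return namespace["total_price"]


def weak_suite(f):
    # 2 * 2 and 2 + 2 agree, so this test cannot tell the mutant apart.
    return f(2, 2) == 4


def strong_suite(f):
    # 3 * 5 and 3 + 5 differ, so this test detects the mutant.
    return f(2, 2) == 4 and f(3, 5) == 15


original = load(ast.parse(SOURCE))
mutant_tree = ast.fix_missing_locations(SwapMultToAdd().visit(ast.parse(SOURCE)))
mutant = load(mutant_tree)

# A suite kills the mutant if it passes on the original but fails on the mutant.
weak_kills = weak_suite(original) and not weak_suite(mutant)
strong_kills = strong_suite(original) and not strong_suite(mutant)
print("weak suite kills mutant:", weak_kills)
print("strong suite kills mutant:", strong_kills)
```

In this sketch the mutant survives the weak suite and is killed by the strong one, which is the kind of signal mutation analysis uses to judge how good a set of test data is.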

What isn’t cited is anything from the more distant past, before the Internet, before Seinfeld and The Simpsons. In particular, not this:

Richard A. DeMillo, Richard J. Lipton, and Fred G. Sayward. “Hints on Test Data Selection: Help for the Practicing Programmer.” IEEE Computer 11, 4 (1978), 34–41.

Bringing Past to Present

I was shocked, then upset, and then amazed. This isn’t plagiarizing, but there is still a sense of using someone else’s ideas as currency without giving credit. Or maybe mutation testing is now coin-of-the-realm? Whatever, I felt scrooged—or rather, “ghosted.”

My disorientation was alleviated upon looking at the “Mutation Carol” paper after a prompt from Ken. It not only cites the 1978 paper but, with echoes of the Charles Dickens story, takes me all the way back to my school days:

“Legend has it that the first ideas of mutation analysis were postulated in 1971 in a class term paper by Richard Lipton [2]. Depending on who we ask, his professor, Dave Parnas, either thought mutation was a bad idea or a reasonably clever idea that was not worthy of a PhD dissertation. The first research project was started in the late 1970s by DeMillo (Georgia Tech), Lipton (Princeton), and Sayward (Yale).”

The term paper that Offutt referred to is:

Richard J. Lipton. “Fault diagnosis of computer programs.” Technical Report, Student Report, Carnegie Mellon University, 1971.

I am not suggesting people should cite that. But a second kind of source to cite in an applied survey is the first implementation. This source was far from tiny: Tim Budd’s PhD dissertation titled Mutation Analysis in 1980 from Yale University. Well, Offutt references Budd copiously in his next paragraphs, besides citing papers by him and others. And Offutt should know—he was a PhD student of DeMillo later in the 1980s.

I guess—letting my heart soften a little here—the authors of the CACM paper figured that their major reference [33] sufficed for the record, all the more since its author was in the originators’ circle. But readers may not look at paper 33. Holding hardcopy, one cannot. This leads to a wider question.

Citation Proprieties

The question is, (when) should one cite a paper that one hasn’t actually consulted? Ken calls this “Transitive Citation.” Is it a vice? Here are some considerations:

The citation could be based on memory of having read the paper in the past. Even if you didn’t look at the paper while doing your current project, it may be material to your knowledge.
Supposing one never read the stem paper—that a survey or monograph sufficed—the stem paper may still be more accessible or concisely informative for readers.
The transitively cited papers may be used to set a context or tell a story. This is one reason many in computer science, especially for conference papers, use the “alpha” style of citation, like [DLS78] for the above paper. It takes less space than writing out the author names and spares readers who recognize the tag the interruption of going to the references.
On the other hand, it may be that the contents of the stem paper have attained the status of common knowledge that does not need to be cited.

What is common knowledge can be tricky because it depends on the scope of the audience. It is not just what they know but how readily they can find sources. For instance, the above link from the San José State University Library lists the following as examples that need not be cited:

Abraham Lincoln was the 16th President of the United States.
Sacramento is the capital of California.
A genome is all the DNA in an organism, including its genes.

The second item might not be known by a non-American (or even by an American) and the third presupposes memory of secondary school science. Yet the point is that they were established long ago and can be “found in many sources” as the link states. But in our case, the presence of many sources—when they have a unique and agreed lower bound—comes back to our original questions.

Open Problems

What should we do about this paper? Did they violate basic citation rules? Or is it okay, since our initial creation of the mutation method is well known to all those who work in the area and the impressive panoply of tools covered in their article goes well beyond origins?

What do you think?
