2

The Myers Difference Algorithm

 1 year ago
source link: https://www.nathaniel.ai/myers-diff/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

Visualizing Diffs

The Myers difference algorithm

In 1986, Eugene Myers published An O(ND) Difference Algorithm and Its Variations, which unified the problems of finding the longest common subsequence of two sequences (the LCS of "driftwood" and "artwork" is "two") and finding the shortest edit script for transforming one sequence into another. Myers showed that these problems were equivalent to finding the shortest path over an "edit graph."

His algorithm improved the popular diff utility, a data comparison tool that displays the smallest set of line-by-line deletions and insertions to transform one file into another. Figure 1 contains an example of diff output, itself called a "diff", when given two similar poems: Colin Morton's 1981 "Empty Bottles" and David Morgan's 2011 plagiarism "Monkey Stops Whistling."

In the output, lines prefixed by a "+" are insertions from the destination file, lines prefixed by a "-" are deletions from the source file, and lines lacking a "+" or "-" prefix are lines shared by both sequences.

Figure 1 - Excerpted "diff" of Colin Morton's 1981 poem and David Morgan's plagiarism

- Empty Bottles - Colin Morton (1981)

+ Monkey Stops Whistling - David Morgan (2011)

- line up all the empty bottles

+ Stand to attention all the empty bottles

the long-necked beer bottles from the antique stores

the wine bottles and pop bottles left on beaches

steam off the labels and line the bottles up the green ones with

the brown black yellow and clear ones

- line up

+ Stand to attention all the empty bottles

the beer bottles whose labels have been torn off by

neurotic fingers

and the bottles sent back by the breweries because they have

- cockroaches or dead mice at the bottom

- line up

+ cockroaches or dead bluebottles at the bottom

+ Stand to attention all the empty bottles

the bottles afloat on all the seas those with messages in

them and those without any

- and the bottles with methyl hydrate-soaked cotton in them

- used by schoolkids for killing insects

line up the bottle that killed Malcolm Lowry with the bottle that...

To form an edit graph, Myers arranged the source sequence left-to-right along an x-axis and the destination sequence top-to-bottom along a y-axis with an orthogonal grid of horizontal and vertical edges projected from them. Starting in the upper-left and seeking the bottom-right, each move along the graph's edges toward a vertex would correspond to an edit instruction on the source sequence. Horizontal moves would be deletions from the source element; vertical moves would be insertions of the destination element. Where source and destination elements of the sequence are equivalent, diagonal movement is permitted.

Figure 2 - Edit graph with source sequence on x-axis and destination sequence on y-axis

source deletions → ← destination insertions

Every trace across this directed, acyclic edit graph is a valid solution. Myers proposed a type of breadth-first search to discover the shortest traversal in either O(ND) space and time or, using a proposed refinement, in O(N) space and O(NlgN+D2) time where D is the size of the minimum edit script.

Figure 3 - Slowed, looped animation of Myers's search fanning out across the edit graph

source deletions → ← destination insertions

d a r i f t w o o d


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK