The Myers Difference Algorithm
source link: https://www.nathaniel.ai/myers-diff/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
Visualizing Diffs
The Myers difference algorithm
In 1986, Eugene Myers published An O(ND) Difference Algorithm and Its Variations, which unified the problems of finding the longest common subsequence of two sequences (the LCS of "driftwood" and "artwork" is "two") and finding the shortest edit script for transforming one sequence into another. Myers showed that these problems were equivalent to finding the shortest path over an "edit graph."
His algorithm improved the popular diff
utility, a data comparison tool that displays the smallest set of line-by-line deletions and insertions to transform one file into another. Figure 1 contains an example of diff
output, itself called a "diff", when given two similar poems: Colin Morton's 1981 "Empty Bottles" and David Morgan's 2011 plagiarism "Monkey Stops Whistling."
In the output, lines prefixed by a "+" are insertions from the destination file, lines prefixed by a "-" are deletions from the source file, and lines lacking a "+" or "-" prefix are lines shared by both sequences.
Figure 1 - Excerpted "diff" of Colin Morton's 1981 poem and David Morgan's plagiarism
- Empty Bottles - Colin Morton (1981)
+ Monkey Stops Whistling - David Morgan (2011)
- line up all the empty bottles
+ Stand to attention all the empty bottles
the long-necked beer bottles from the antique stores
the wine bottles and pop bottles left on beaches
steam off the labels and line the bottles up the green ones with
the brown black yellow and clear ones
- line up
+ Stand to attention all the empty bottles
the beer bottles whose labels have been torn off by
neurotic fingers
and the bottles sent back by the breweries because they have
- cockroaches or dead mice at the bottom
- line up
+ cockroaches or dead bluebottles at the bottom
+ Stand to attention all the empty bottles
the bottles afloat on all the seas those with messages in
them and those without any
- and the bottles with methyl hydrate-soaked cotton in them
- used by schoolkids for killing insects
line up the bottle that killed Malcolm Lowry with the bottle that...
To form an edit graph, Myers arranged the source sequence left-to-right along an x-axis and the destination sequence top-to-bottom along a y-axis with an orthogonal grid of horizontal and vertical edges projected from them. Starting in the upper-left and seeking the bottom-right, each move along the graph's edges toward a vertex would correspond to an edit instruction on the source sequence. Horizontal moves would be deletions from the source element; vertical moves would be insertions of the destination element. Where source and destination elements of the sequence are equivalent, diagonal movement is permitted.
Figure 2 - Edit graph with source sequence on x-axis and destination sequence on y-axis
Every trace across this directed, acyclic edit graph is a valid solution. Myers proposed a type of breadth-first search to discover the shortest traversal in either O(ND) space and time or, using a proposed refinement, in O(N) space and O(NlgN+D2) time where D is the size of the minimum edit script.
Figure 3 - Slowed, looped animation of Myers's search fanning out across the edit graph
d a r i f t w o o d
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK