Skip to content

Commit 353d117

Browse files
Rewrite applyPatch (#533)
* Add stubs of three new tests * Flesh out one test * Add more stubs * First draft of new applyPatch; seems to be fucked currently * Fix some bugs in my work * Fix more bugs * Fix another bug * Fix another bug * Fix a bad test * Fix another bug * Fix a test bug * Make distance-iterator behaviour match its documentation * More tests * New test * rearrange planned tests a bit * Add a test * Add failing test - a bug, I think? * Fix bug and fix test * Add another test case * Add more tests * Delete redundant test * Add another test * Add another test * Placate eslint * Add a silly additional test to placate istanbul * Fix something silly I did, and further improve coverage metric in doing so * Add more tests to pump up coverage metric further * Eliminate for...of loops to placate the coverage checker * Add yet more tests to reach 100% coverage * Add docs * Tweak docs * Tweak docs * Add release notes
1 parent 0e7c20c commit 353d117

File tree

5 files changed

+994
-89
lines changed

5 files changed

+994
-89
lines changed

README.md

Lines changed: 10 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -119,13 +119,21 @@ Broadly, jsdiff's diff functions all take an old text and a new text and perform
119119

120120
* `Diff.applyPatch(source, patch[, options])` - attempts to apply a unified diff patch.
121121

122-
If the patch was applied successfully, returns a string containing the patched text. If the patch could not be applied (because some hunks in the patch couldn't be fitted to the text in `source`), returns false.
122+
Hunks are applied first to last. `applyPatch` first tries to apply the first hunk at the line number specified in the hunk header, and with all context lines matching exactly. If that fails, it tries scanning backwards and forwards, one line at a time, to find a place to apply the hunk where the context lines match exactly. If that still fails, and `fuzzFactor` is greater than zero, it increments the maximum number of mismatches (missing, extra, or changed context lines) that there can be between the hunk context and a region where we are trying to apply the patch such that the hunk will still be considered to match. Regardless of `fuzzFactor`, lines to be deleted in the hunk *must* be present for a hunk to match, and the context lines *immediately* before and after an insertion must match exactly.
123+
124+
Once a hunk is successfully fitted, the process begins again with the next hunk. Regardless of `fuzzFactor`, later hunks must be applied later in the file than earlier hunks.
125+
126+
If a hunk cannot be successfully fitted *anywhere* with fewer than `fuzzFactor` mismatches, `applyPatch` fails and returns `false`.
127+
128+
If a hunk is successfully fitted but not at the line number specified by the hunk header, all subsequent hunks have their target line number adjusted accordingly. (e.g. if the first hunk is applied 10 lines below where the hunk header said it should fit, `applyPatch` will *start* looking for somewhere to apply the second hunk 10 lines below where its hunk header says it goes.)
129+
130+
If the patch was applied successfully, returns a string containing the patched text. If the patch could not be applied (because some hunks in the patch couldn't be fitted to the text in `source`), `applyPatch` returns false.
123131
124132
`patch` may be a string diff or the output from the `parsePatch` or `structuredPatch` methods.
125133
126134
The optional `options` object may have the following keys:
127135
128-
- `fuzzFactor`: Number of lines that are allowed to differ before rejecting a patch. Defaults to 0.
136+
- `fuzzFactor`: Maximum Levenshtein distance (in lines deleted, added, or subtituted) between the context shown in a patch hunk and the lines found in the file. Defaults to 0.
129137
- `autoConvertLineEndings`: If `true`, and if the file to be patched consistently uses different line endings to the patch (i.e. either the file always uses Unix line endings while the patch uses Windows ones, or vice versa), then `applyPatch` will behave as if the line endings in the patch were the same as those in the source file. (If `false`, the patch will usually fail to apply in such circumstances since lines deleted in the patch won't be considered to match those in the source file.) Defaults to `true`.
130138
- `compareLine(lineNumber, line, operation, patchContent)`: Callback used to compare to given lines to determine if they should be considered equal when patching. Defaults to strict equality but may be overridden to provide fuzzier comparison. Should return false if the lines should be rejected.
131139

release-notes.md

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -28,6 +28,10 @@
2828
- [#521](https://github.com/kpdecker/jsdiff/pull/521) **the `callback` option is now supported by `structuredPatch`, `createPatch
2929
- [#529](https://github.com/kpdecker/jsdiff/pull/529) **`parsePatch` can now parse patches where lines starting with `--` or `++` are deleted/inserted**; previously, there were edge cases where the parser would choke on valid patches or give wrong results.
3030
- [#530](https://github.com/kpdecker/jsdiff/pull/530) **Added `ignoreNewlineAtEof` option` to `diffLines`**
31+
- [#533](https://github.com/kpdecker/jsdiff/pull/533) **`applyPatch` uses an entirely new algorithm for fuzzy matching.** Differences between the old and new algorithm are as follows:
32+
* The `fuzzFactor` now indicates the maximum [*Levenshtein* distance](https://en.wikipedia.org/wiki/Levenshtein_distance) that there can be between the context shown in a hunk and the actual file content at a location where we try to apply the hunk. (Previously, it represented a maximum [*Hamming* distance](https://en.wikipedia.org/wiki/Hamming_distance), meaning that a single insertion or deletion in the source file could stop a hunk from applying even with a high `fuzzFactor`.)
33+
* A hunk containing a deletion can now only be applied in a context where the line to be deleted actually appears verbatim. (Previously, as long as enough context lines in the hunk matched, `applyPatch` would apply the hunk anyway and delete a completely different line.)
34+
* The context line immediately before and immediately after an insertion must match exactly between the hunk and the file for a hunk to apply. (Previously this was not required.)
3135

3236
## v5.2.0
3337

src/patch/apply.js

Lines changed: 195 additions & 71 deletions
Original file line numberDiff line numberDiff line change
@@ -29,104 +29,228 @@ export function applyPatch(source, uniDiff, options = {}) {
2929
hunks = uniDiff.hunks,
3030

3131
compareLine = options.compareLine || ((lineNumber, line, operation, patchContent) => line === patchContent),
32-
errorCount = 0,
3332
fuzzFactor = options.fuzzFactor || 0,
34-
minLine = 0,
35-
offset = 0,
33+
minLine = 0;
3634

37-
removeEOFNL,
38-
addEOFNL;
35+
if (fuzzFactor < 0 || !Number.isInteger(fuzzFactor)) {
36+
throw new Error('fuzzFactor must be a non-negative integer');
37+
}
38+
39+
// Special case for empty patch.
40+
if (!hunks.length) {
41+
return source;
42+
}
43+
44+
// Before anything else, handle EOFNL insertion/removal. If the patch tells us to make a change
45+
// to the EOFNL that is redundant/impossible - i.e. to remove a newline that's not there, or add a
46+
// newline that already exists - then we either return false and fail to apply the patch (if
47+
// fuzzFactor is 0) or simply ignore the problem and do nothing (if fuzzFactor is >0).
48+
// If we do need to remove/add a newline at EOF, this will always be in the final hunk:
49+
let prevLine = '',
50+
removeEOFNL = false,
51+
addEOFNL = false;
52+
for (let i = 0; i < hunks[hunks.length - 1].lines.length; i++) {
53+
const line = hunks[hunks.length - 1].lines[i];
54+
if (line[0] == '\\') {
55+
if (prevLine[0] == '+') {
56+
removeEOFNL = true;
57+
} else if (prevLine[0] == '-') {
58+
addEOFNL = true;
59+
}
60+
break;
61+
}
62+
prevLine = line;
63+
}
64+
if (removeEOFNL) {
65+
if (lines[lines.length - 1] == '') {
66+
lines.pop();
67+
} else if (!fuzzFactor) {
68+
return false;
69+
}
70+
} else if (addEOFNL) {
71+
if (lines[lines.length - 1] != '') {
72+
lines.push('');
73+
} else if (!fuzzFactor) {
74+
return false;
75+
}
76+
}
3977

4078
/**
41-
* Checks if the hunk exactly fits on the provided location
79+
* Checks if the hunk can be made to fit at the provided location with at most `maxErrors`
80+
* insertions, substitutions, or deletions, while ensuring also that:
81+
* - lines deleted in the hunk match exactly, and
82+
* - wherever an insertion operation or block of insertion operations appears in the hunk, the
83+
* immediately preceding and following lines of context match exactly
84+
*
85+
* `toPos` should be set such that lines[toPos] is meant to match hunkLines[0].
86+
*
87+
* If the hunk can be applied, returns an object with properties `oldLineLastI` and
88+
* `replacementLines`. Otherwise, returns null.
4289
*/
43-
function hunkFits(hunk, toPos) {
44-
for (let j = 0; j < hunk.lines.length; j++) {
45-
let line = hunk.lines[j],
46-
operation = (line.length > 0 ? line[0] : ' '),
47-
content = (line.length > 0 ? line.substr(1) : line);
48-
49-
if (operation === ' ' || operation === '-') {
50-
// Context sanity check
51-
if (!compareLine(toPos + 1, lines[toPos], operation, content)) {
52-
errorCount++;
53-
54-
if (errorCount > fuzzFactor) {
55-
return false;
90+
function applyHunk(
91+
hunkLines,
92+
toPos,
93+
maxErrors,
94+
hunkLinesI = 0,
95+
lastContextLineMatched = true,
96+
patchedLines = [],
97+
patchedLinesLength = 0,
98+
) {
99+
let nConsecutiveOldContextLines = 0;
100+
let nextContextLineMustMatch = false;
101+
for (; hunkLinesI < hunkLines.length; hunkLinesI++) {
102+
let hunkLine = hunkLines[hunkLinesI],
103+
operation = (hunkLine.length > 0 ? hunkLine[0] : ' '),
104+
content = (hunkLine.length > 0 ? hunkLine.substr(1) : hunkLine);
105+
106+
if (operation === '-') {
107+
if (compareLine(toPos + 1, lines[toPos], operation, content)) {
108+
toPos++;
109+
nConsecutiveOldContextLines = 0;
110+
} else {
111+
if (!maxErrors || lines[toPos] == null) {
112+
return null;
56113
}
114+
patchedLines[patchedLinesLength] = lines[toPos];
115+
return applyHunk(
116+
hunkLines,
117+
toPos + 1,
118+
maxErrors - 1,
119+
hunkLinesI,
120+
false,
121+
patchedLines,
122+
patchedLinesLength + 1,
123+
);
124+
}
125+
}
126+
127+
if (operation === '+') {
128+
if (!lastContextLineMatched) {
129+
return null;
130+
}
131+
patchedLines[patchedLinesLength] = content;
132+
patchedLinesLength++;
133+
nConsecutiveOldContextLines = 0;
134+
nextContextLineMustMatch = true;
135+
}
136+
137+
if (operation === ' ') {
138+
nConsecutiveOldContextLines++;
139+
patchedLines[patchedLinesLength] = lines[toPos];
140+
if (compareLine(toPos + 1, lines[toPos], operation, content)) {
141+
patchedLinesLength++;
142+
lastContextLineMatched = true;
143+
nextContextLineMustMatch = false;
144+
toPos++;
145+
} else {
146+
if (nextContextLineMustMatch || !maxErrors) {
147+
return null;
148+
}
149+
150+
// Consider 3 possibilities in sequence:
151+
// 1. lines contains a *substitution* not included in the patch context, or
152+
// 2. lines contains an *insertion* not included in the patch context, or
153+
// 3. lines contains a *deletion* not included in the patch context
154+
// The first two options are of course only possible if the line from lines is non-null -
155+
// i.e. only option 3 is possible if we've overrun the end of the old file.
156+
return (
157+
lines[toPos] && (
158+
applyHunk(
159+
hunkLines,
160+
toPos + 1,
161+
maxErrors - 1,
162+
hunkLinesI + 1,
163+
false,
164+
patchedLines,
165+
patchedLinesLength + 1
166+
) || applyHunk(
167+
hunkLines,
168+
toPos + 1,
169+
maxErrors - 1,
170+
hunkLinesI,
171+
false,
172+
patchedLines,
173+
patchedLinesLength + 1
174+
)
175+
) || applyHunk(
176+
hunkLines,
177+
toPos,
178+
maxErrors - 1,
179+
hunkLinesI + 1,
180+
false,
181+
patchedLines,
182+
patchedLinesLength
183+
)
184+
);
57185
}
58-
toPos++;
59186
}
60187
}
61188

62-
return true;
189+
// Before returning, trim any unmodified context lines off the end of patchedLines and reduce
190+
// toPos (and thus oldLineLastI) accordingly. This allows later hunks to be applied to a region
191+
// that starts in this hunk's trailing context.
192+
patchedLinesLength -= nConsecutiveOldContextLines;
193+
toPos -= nConsecutiveOldContextLines;
194+
patchedLines.length = patchedLinesLength;
195+
return {
196+
patchedLines,
197+
oldLineLastI: toPos - 1
198+
};
63199
}
64200

201+
const resultLines = [];
202+
65203
// Search best fit offsets for each hunk based on the previous ones
204+
let prevHunkOffset = 0;
66205
for (let i = 0; i < hunks.length; i++) {
67-
let hunk = hunks[i],
68-
maxLine = lines.length - hunk.oldLines,
69-
localOffset = 0,
70-
toPos = offset + hunk.oldStart - 1;
71-
72-
let iterator = distanceIterator(toPos, minLine, maxLine);
73-
74-
for (; localOffset !== undefined; localOffset = iterator()) {
75-
if (hunkFits(hunk, toPos + localOffset)) {
76-
hunk.offset = offset += localOffset;
206+
const hunk = hunks[i];
207+
let hunkResult;
208+
let maxLine = lines.length - hunk.oldLines + fuzzFactor;
209+
let toPos;
210+
for (let maxErrors = 0; maxErrors <= fuzzFactor; maxErrors++) {
211+
toPos = hunk.oldStart + prevHunkOffset - 1;
212+
let iterator = distanceIterator(toPos, minLine, maxLine);
213+
for (; toPos !== undefined; toPos = iterator()) {
214+
hunkResult = applyHunk(hunk.lines, toPos, maxErrors);
215+
if (hunkResult) {
216+
break;
217+
}
218+
}
219+
if (hunkResult) {
77220
break;
78221
}
79222
}
80223

81-
if (localOffset === undefined) {
224+
if (!hunkResult) {
82225
return false;
83226
}
84227

85-
// Set lower text limit to end of the current hunk, so next ones don't try
86-
// to fit over already patched text
87-
minLine = hunk.offset + hunk.oldStart + hunk.oldLines;
88-
}
228+
// Copy everything from the end of where we applied the last hunk to the start of this hunk
229+
for (let i = minLine; i < toPos; i++) {
230+
resultLines.push(lines[i]);
231+
}
89232

90-
// Apply patch hunks
91-
let diffOffset = 0;
92-
for (let i = 0; i < hunks.length; i++) {
93-
let hunk = hunks[i],
94-
toPos = hunk.oldStart + hunk.offset + diffOffset - 1;
95-
diffOffset += hunk.newLines - hunk.oldLines;
233+
// Add the lines produced by applying the hunk:
234+
for (let i = 0; i < hunkResult.patchedLines.length; i++) {
235+
const line = hunkResult.patchedLines[i];
236+
resultLines.push(line);
237+
}
96238

97-
for (let j = 0; j < hunk.lines.length; j++) {
98-
let line = hunk.lines[j],
99-
operation = (line.length > 0 ? line[0] : ' '),
100-
content = (line.length > 0 ? line.substr(1) : line);
239+
// Set lower text limit to end of the current hunk, so next ones don't try
240+
// to fit over already patched text
241+
minLine = hunkResult.oldLineLastI + 1;
101242

102-
if (operation === ' ') {
103-
toPos++;
104-
} else if (operation === '-') {
105-
lines.splice(toPos, 1);
106-
/* istanbul ignore else */
107-
} else if (operation === '+') {
108-
lines.splice(toPos, 0, content);
109-
toPos++;
110-
} else if (operation === '\\') {
111-
let previousOperation = hunk.lines[j - 1] ? hunk.lines[j - 1][0] : null;
112-
if (previousOperation === '+') {
113-
removeEOFNL = true;
114-
} else if (previousOperation === '-') {
115-
addEOFNL = true;
116-
}
117-
}
118-
}
243+
// Note the offset between where the patch said the hunk should've applied and where we
244+
// applied it, so we can adjust future hunks accordingly:
245+
prevHunkOffset = toPos + 1 - hunk.oldStart;
119246
}
120247

121-
// Handle EOFNL insertion/removal
122-
if (removeEOFNL) {
123-
while (!lines[lines.length - 1]) {
124-
lines.pop();
125-
}
126-
} else if (addEOFNL) {
127-
lines.push('');
248+
// Copy over the rest of the lines from the old text
249+
for (let i = minLine; i < lines.length; i++) {
250+
resultLines.push(lines[i]);
128251
}
129-
return lines.join('\n');
252+
253+
return resultLines.join('\n');
130254
}
131255

132256
// Wrapper that supports multiple file patches via callbacks.

src/util/distance-iterator.js

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -18,7 +18,7 @@ export default function(start, minLine, maxLine) {
1818
// Check if trying to fit beyond text length, and if not, check it fits
1919
// after offset location (or desired location on first iteration)
2020
if (start + localOffset <= maxLine) {
21-
return localOffset;
21+
return start + localOffset;
2222
}
2323

2424
forwardExhausted = true;
@@ -32,7 +32,7 @@ export default function(start, minLine, maxLine) {
3232
// Check if trying to fit before text beginning, and if not, check it fits
3333
// before offset location
3434
if (minLine <= start - localOffset) {
35-
return -localOffset++;
35+
return start - localOffset++;
3636
}
3737

3838
backwardExhausted = true;

0 commit comments

Comments
 (0)