Skip to content

Array diff bug #30

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
yhack opened this issue Jul 18, 2014 · 11 comments
Closed

Array diff bug #30

yhack opened this issue Jul 18, 2014 · 11 comments

Comments

@yhack
Copy link

yhack commented Jul 18, 2014

Great program, but I think I found a bug. These steps will reproduce it:

src = [1,2,3]
des = [3,1,4,2]
Create a patch from these then apply the patch to src. It will produce an index out of bounds exception.

@kxepal
Copy link
Contributor

kxepal commented Jul 18, 2014

EWORKSFORME:

>>> import jsonpatch
>>> src = [1, 2, 3]
>>> dst = [3, 1, 2, 4]
>>> patch = jsonpatch.make_patch(src, dst)
>>> patch
[{"path": "/0", "from": "/2", "value": 3, "op": "move"}, {"path": "/3", "value": 4, "op": "add"}]
>>> patch.apply(src)
[3, 1, 2, 4]

@kxepal
Copy link
Contributor

kxepal commented Jul 18, 2014

Ouch...it should be [3, 1, 4, 2]. Ha-ha. My bad.
The issue is in _optimize method which produces quite awkward patch for your case:

[{"value": 1, "op": "replace", "path": "/1"}, {"value": 2, "op": "move", "from": "/0", "path": "/3"}, {"value": 4, "op": "add", "path": "/2"}]

Note, that it tries to move 2 from 0 element, but...there was no such value ever.

without optimization jsonpatch produces more verbose, but semicorrect patch (at least it applies without problems):

[{"op": "remove", "value": 2, "path": "/1"}, {"op": "remove", "value": 2, "path": "/0"}, {"op": "add", "value": 1, "path": "/1"}, {"op": "add", "value": 4, "path": "/2"}, {"op": "add", "value": 2, "path": "/3"}]

I'll take a look tomorrow on fresh head. Btw, how about make jsonpatch validate which value it tries to move/remove? The last patch really believe that there is value 2 both on 0 and 1 positions, but that's not a true.

@kxepal
Copy link
Contributor

kxepal commented Jul 25, 2014

Short update. I'm going to fix the way array patch been generated since there is another issue in whole algorithm: it uses to find out the longest common sequence for both arrays and build patch around it, but completely fails to provide reasonable patch for the case like this issue has: multiple short ranges of common values.

@stefankoegl
Copy link
Owner

@kxepal, any update on this?

@kxepal
Copy link
Contributor

kxepal commented Sep 26, 2014

@stefankoegl sorry, will take a second look on weekends. First attempt to solve the issue became a mess and ugly so I abandon it for a while until better idea will come. May be will have better luck with second sight.

@alexdutton
Copy link

Here's another example. One list is a permutation of the other:

>>> a = ['23232812', '23232808', '23232807', '23232809', '23232811', '23232805', '23232810', '23232814', '23232806', '23232813']
>>> b = ['23232805', '23232806', '23232807', '23232808', '23232809', '23232810', '23232811', '23232812', '23232813', '23232814']
>>> import jsonpatch
>>> jsonpatch.match_patch(a, b).apply(a) == b
False

@nxsofsys
Copy link

nxsofsys commented Dec 8, 2014

Hello, you may try my small utility - https://github.com/nxsofsys/jsondiff
Here is some results of diffs it makes, related to issues in this thread, to apply patch I used jsonpatch:

src:
['23232812', '23232808', '23232807', '23232809', '23232811',
 '23232805', '23232810', '23232814', '23232806', '23232813']
dst:
['23232805', '23232806', '23232807', '23232808', '23232809',
 '23232810', '23232811', '23232812', '23232813', '23232814']
diff:
[{'from': u'/1', 'op': 'move', 'path': u'/2'},
 {'from': u'/5', 'op': 'move', 'path': u'/0'},
 {'from': u'/5', 'op': 'move', 'path': u'/6'},
 {'from': u'/1', 'op': 'move', 'path': u'/6'},
 {'from': u'/8', 'op': 'move', 'path': u'/1'},
 {'from': u'/8', 'op': 'move', 'path': u'/9'}]
result == dst:
True

src:
{'data': [{'foo': 1}, {'bar': [1, 2, 3]}, {'baz': {'1': 1, '2': 2}}]}
dst:
{'data': [{'foo': [42]}, {'bar': []}, {'baz': {'boo': 'oom!'}}]}
diff:
[{'op': 'remove', 'path': u'/data/0'},
 {'op': 'add', 'path': u'/data/0', 'value': {'foo': [42]}},
 {'op': 'remove', 'path': u'/data/1'},
 {'op': 'add', 'path': u'/data/1', 'value': {'bar': []}},
 {'op': 'remove', 'path': u'/data/2'},
 {'op': 'add', 'path': u'/data/2', 'value': {'baz': {'boo': 'oom!'}}}]
result == dst:
True

src:
{'foo': [4, 1, 2, 3]}
dst:
{'foo': [1, 2, 3, 4]}
diff:
[{'from': u'/foo/0', 'op': 'move', 'path': u'/foo/3'}]
result == dst:
True

src:
[1, 2, 3]
dst:
[3, 1, 4, 2]
diff:
[{'from': u'/2', 'op': 'move', 'path': u'/0'},
 {'op': 'add', 'path': u'/2', 'value': 4}]
result == dst:
True

Moreover, my tool makes diffs really faster! Making diff of two arrays with 20000 random numbers is no problem. But some bugs exist I think =)
Patch maked by func 'make'.

@domdom
Copy link

domdom commented May 20, 2015

So this is still an issue, and I think it's all about how _optimize combines operations like remove and add as replace and move operations.
{'op' : 'remove', 'path' : '/0', 'value' : 'some matched value'}
...
{'op' : 'add', 'path' : '/1', 'value' : 'some matched value'}
At first it may seem logical to replace these two commands with a single move command, but in order to do that properly you have to consider all the commands that alter the state of this array in between.

Let's say you replace the first command with a 'move' and delete the add:
{'op' : 'move', 'from' : '/0', 'path' : '/1'}
...
<deleted>
But that path '/1' could actually be wrong. Any add or remove operations that take place between these two will alter what the correct location is supposed to be at the time of the move.

@apinkney97
Copy link
Contributor

@domdom is correct: any operation that potentially changes the indexes of a range of elements (ie add, remove, move, copy) needs to be treated with caution when optimising.

I think the only "trivial" optimisation is a remove directly followed by an add to the same path, which can be substituted for a replace.

@apinkney97
Copy link
Contributor

Incidentally, this is the patch that gets produced for [1, 2, 3] -> [3, 1, 4, 2] if you don't call _optimize on it:

                                              [1, 2, 3]
{"op": "remove", "path": "/1", "value": 2}    [1, 3]
{"op": "remove", "path": "/0", "value": 2}    [3]
{"op": "add", "path": "/1", "value": 1}       [3, 1]
{"op": "add", "path": "/2", "value": 4}       [3, 1, 4]
{"op": "add", "path": "/3", "value": 2}       [3, 1, 4, 2]

do we know why the second remove has "value": 2? Shouldn't it be 1?

My hand-made json patch for this change would be

                                              [1, 2, 3]
{"op": "move", "from": "/2", "path": "/0"}    [3, 1, 2]
{"op": "add", "path": "/2", "value": 4}       [3, 1, 4, 2]

which I arrived at by noticing that 1 and 2 are in the same order in the resultant list. I don't know how I'd put that down programatically (least of all in a nice generic way that would work in all cases).

velislavgerov added a commit to velislavgerov/python-json-patch that referenced this issue Jul 26, 2017
velislavgerov added a commit to velislavgerov/python-json-patch that referenced this issue Jul 26, 2017
stefankoegl added a commit that referenced this issue Sep 10, 2017
@stefankoegl
Copy link
Owner

This issue seems to be fixed at least in current master, when looking at the example posted above.

>>> import jsonpatch
>>> a = ['23232812', '23232808', '23232807', '23232809', '23232811', '23232805', '23232810', '23232814', '23232806', '23232813']
>>> b = ['23232805', '23232806', '23232807', '23232808', '23232809', '23232810', '23232811', '23232812', '23232813', '23232814']
>>> jsonpatch.make_patch(a, b).apply(a) == b
True

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants