Details
-
Improvement
-
Resolution: Unresolved
-
Major
-
None
-
15.10
-
Unknown
-
Description
At the moment, we're using MyersDiff from jrcs to compute the difference between two revisions for storing a diff. However, this algorithm needs time and space O(ND) where N is the sum of the lengths of the compared document revisions and D is the length of a minimum edit script between them. Therefore, in the worst case, storing a revision can be quadratic in the size of the document. See also XWIKI-21224 for a similar issue with slow diff algorithms.
When storing a revision, there is no need to store a human-readable diff. Therefore, we should instead use an algorithm that needs linear time and space like XDelta for storing the difference between two revisions. Unfortunately, I couldn't find a maintained Java implementation. Bentley-McIlroy as implemented by Google's Open VCDiff seems an alternative that has a Java port that doesn't look completely abandoned (an update a year ago). See also https://ably.com/blog/practical-guide-to-diff-algorithms for a discussion of different diff algorithms.
An alternative could be to limit the time spent on computing the diff and if it takes too long, just store a full revision.
Attachments
Issue Links
- relates to
-
XWIKI-22613 Updating the history can take a very long time when there are a lot of objects
- Open