Uploaded image for project: 'XWiki Platform'
  1. XWiki Platform
  2. XWIKI-21680

Improve the performance of storing diffs for document revisions

    XMLWordPrintable

Details

    • Improvement
    • Resolution: Unresolved
    • Major
    • None
    • 15.10
    • Old Core
    • Unknown

    Description

      At the moment, we're using MyersDiff from jrcs to compute the difference between two revisions for storing a diff. However, this algorithm needs time and space O(ND) where N is the sum of the lengths of the compared document revisions and D is the length of a minimum edit script between them. Therefore, in the worst case, storing a revision can be quadratic in the size of the document. See also XWIKI-21224 for a similar issue with slow diff algorithms.

      When storing a revision, there is no need to store a human-readable diff. Therefore, we should instead use an algorithm that needs linear time and space like XDelta for storing the difference between two revisions. Unfortunately, I couldn't find a maintained Java implementation. Bentley-McIlroy as implemented by Google's Open VCDiff seems an alternative that has a Java port that doesn't look completely abandoned (an update a year ago). See also https://ably.com/blog/practical-guide-to-diff-algorithms for a discussion of different diff algorithms.

      An alternative could be to limit the time spent on computing the diff and if it takes too long, just store a full revision.

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              MichaelHamann Michael Hamann
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated: