XWiki Platform / XWIKI-13939

Serializing solr indexer job progress on big wikis takes way too much memory

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 8.4.3
    • Fix Version/s: 8.4.4, 7.4.6, 9.0-rc-1
    • Component/s: Search - Solr
    • Labels:
      None
    • Difficulty:
      Unknown
    • Documentation:
      N/A
    • Documentation in Release Notes:
      N/A

      Description

      Seems to be the root cause of what takes myxwiki.org down most of the time.

      Pretty much all the memory goes into a map XStream uses to remember everything it is serializing. In practice the map contains tons of duplicate com.thoughtworks.xstream.io.path.Path instances, most of them totally useless. In my case it's the Solr indexer job; not sure if this is specific to that job or related to job progress in general.

      Looks like a bug: from what I understand this map is there to remember unique objects, and there is no reason for it to contain so many duplicates. Or maybe I'm not reading YourKit properly and they are not really duplicates; it's just that the progress usually contains an insane number of objects and XStream is really not designed to handle this properly.

      The worst part is that this whole map is totally useless in the case of job progress, since all serialized objects are unique; there is no need for the reference optimization XStream applies by default to produce smaller XML.
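
      A minimal sketch of what skipping that optimization could look like, assuming we control the XStream instance used for job status serialization (the class below is a placeholder; only the XStream calls themselves are standard API):

      import com.thoughtworks.xstream.XStream;

      public class ProgressXStreamSketch
      {
          // Build an XStream instance that keeps no marshalling history, i.e. no
          // reference-tracking map full of com.thoughtworks.xstream.io.path.Path
          // entries. Safe here because every serialized progress object is unique;
          // note that NO_REFERENCES would fail on circular references.
          public static XStream create()
          {
              XStream xstream = new XStream();
              xstream.setMode(XStream.NO_REFERENCES);
              return xstream;
          }
      }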


          Activity

          Thomas Mortagne added a comment (edited)

          The last-resort quick fix is to not serialize the progress at all (this is what I did on myxwiki.org in a patch for now), since we are not really using it much yet in deserialized job statuses (the performance tree is not displayed in the job displayer, for example).

          It seems to me the easiest fix here is actually to get rid of XStream, at least for the progress serialization, since it causes way more issues than it solves, and serializing job progress (at least DefaultJobProgress) is not complex.
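
          A sketch of that last-resort fix, assuming the progress is held in a field named "progress" on the serialized status class (the field name and the exact status class are assumptions; omitField itself is standard XStream API):

          import com.thoughtworks.xstream.XStream;
          import org.xwiki.job.DefaultJobStatus;

          public class SkipProgressSketch
          {
              // Configure XStream to simply not write the progress tree at all when
              // serializing a job status, as in the myxwiki.org patch described above.
              public static XStream create()
              {
                  XStream xstream = new XStream();
                  // "progress" is an assumed field name; the real field may sit in a
                  // parent class such as AbstractJobStatus, in which case that class
                  // would be passed to omitField instead.
                  xstream.omitField(DefaultJobStatus.class, "progress");
                  return xstream;
              }
          }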

          Thomas Mortagne added a comment (edited)

          So the issue comes from a broken progress tree caused by a combination of XCOMMONS-1142 and the Solr index job not protecting its steps enough.

          The database requests produce unrelated new steps in the middle of the job and make it lose its way: all the following steps end up as children of the previous one. In the case of myxwiki.org we are talking about 215 wikis, with a database step produced once every 100 documents. In practice it means XStream quickly ends up duplicating String[800+] arrays like crazy (XStream recreates each complete path from scratch for each item instead of building a tree, which would not have this memory issue)...

          Thomas Mortagne added a comment (edited)

          Note that the same applies to other code like DefaultWikiCopier#copyDocuments. Basically, startStep can be used alone only when the step code is not going to produce any new steps (see the sketch below).

          I will create a JIRA issue for each one.
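
          A minimal sketch of that "protect the step" pattern (a placeholder component, not the actual DefaultWikiCopier code; the JobProgressManager methods are used as I understand that API):

          import java.util.List;

          import javax.inject.Inject;

          import org.xwiki.job.event.status.JobProgressManager;

          public class CopyDocumentsSketch
          {
              @Inject
              private JobProgressManager progress;

              public void copyDocuments(List<String> documents)
              {
                  this.progress.pushLevelProgress(documents.size(), this);

                  try {
                      for (String document : documents) {
                          this.progress.startStep(this);

                          // The body of this step triggers code that reports its own
                          // progress (a database request for example), so it gets its
                          // own nested level; without it those unrelated steps attach
                          // to the surrounding tree and every following step becomes a
                          // child of the previous one.
                          this.progress.pushLevelProgress(1, this);
                          try {
                              copyDocument(document);
                          } finally {
                              this.progress.popLevelProgress(this);
                          }

                          this.progress.endStep(this);
                      }
                  } finally {
                      this.progress.popLevelProgress(this);
                  }
              }

              private void copyDocument(String document)
              {
                  // Placeholder for the actual copy logic.
              }
          }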


            People

            • Assignee: Thomas Mortagne
            • Reporter: Thomas Mortagne
            • Votes: 0
            • Watchers: 1
