Uploaded image for project: 'XWiki Platform'
  1. XWiki Platform
  2. XWIKI-18817

Missing database indexes for the query used by Solr indexing to get all documents

    XMLWordPrintable

Details

    • Bug
    • Resolution: Unresolved
    • Major
    • None
    • 12.10.6
    • Old Core, Search - Solr
    • None
    • MariaDB 10.3.27, Debian 10
    • Unknown

    Description

      After upgrading the flavor version on an XWiki instance using Extension Manager, I have restarted the instance and the Solr Index Job started doing its work, however, at the same time, I have noticed the mysqld process going to 60-70% and staying there for a long while.

      show full processlist was always displaying the following entry, with the limit updating as the job was slowly progressing through the 200K documents that I had in my wiki:

      | 199 | xwiki       | localhost:40818     | xwiki | Query       |    2 | Creating sort index                                                   | select xwikidocum0_.XWD_WEB as col_0_0_, xwikidocum0_.XWD_NAME as col_1_0_, xwikidocum0_.XWD_LANGUAGE as col_2_0_, xwikidocum0_.XWD_VERSION as col_3_0_ from xwikidoc xwikidocum0_ order by xwikidocum0_.XWD_WEB, xwikidocum0_.XWD_NAME, xwikidocum0_.XWD_LANGUAGE limit 60200, 100 |    0.000 |
      

      Execution of the indexing job took over 2 hours and all of that was in order to realize that it did not have to index anything (probably the changes fro my upgrade were indexed on the spot):

      2021-07-06 22:53:35,168 [solr/indexer job group daemon thread - org.xwiki.search.solr.internal.job.IndexerJob@540d56f1([solr, indexer])] INFO  o.x.s.s.i.j.IndexerJob         - Starting job of type [solr.indexer] with identifier [[solr, indexer]] 
      ...
      2021-07-07 01:04:59,293 [solr/indexer job group daemon thread - org.xwiki.search.solr.internal.job.IndexerJob@540d56f1([solr, indexer])] INFO  o.x.s.s.i.j.IndexerJob         - 0 documents added, 0 deleted and 0 updated during the synchronization of the Solr index. 
      2021-07-07 01:04:59,360 [solr/indexer job group daemon thread - org.xwiki.search.solr.internal.job.IndexerJob@540d56f1([solr, indexer])] INFO  o.x.s.s.i.j.IndexerJob         - Finished job of type [solr.indexer] with identifier [[solr, indexer]]
      

      Having a better look at the query, I noticed that it was triggering a filesort instead of using indexes:

      explain select xwikidocum0_.XWD_WEB as col_0_0_, xwikidocum0_.XWD_NAME as col_1_0_, xwikidocum0_.XWD_LANGUAGE as col_2_0_, xwikidocum0_.XWD_VERSION as col_3_0_ from xwikidoc xwikidocum0_ order by xwikidocum0_.XWD_WEB limit 156300, 100;
      +------+-------------+--------------+------+---------------+------+---------+------+--------+----------------+
      | id   | select_type | table        | type | possible_keys | key  | key_len | ref  | rows   | Extra          |
      +------+-------------+--------------+------+---------------+------+---------+------+--------+----------------+
      |    1 | SIMPLE      | xwikidocum0_ | ALL  | NULL          | NULL | NULL    | NULL | 182534 | Using filesort |
      +------+-------------+--------------+------+---------------+------+---------+------+--------+----------------+
      

      Running the query as it is, for a batch of 100 results took:

      100 rows in set (2.483 sec)

      Individual indexes exist for all the clauses in the select and in the order by, but when you have more than 1 clauses and indexes for the order by, no single one is selected and mysql/mariadb turns to a filesort instead, as it ends up being "more efficient" than using the individual indexes.

      If we add an explicit index for the order by clauses, we get a significant improvement:

      CREATE INDEX solr_iterate_all_documents ON xwikidoc (XWD_WEB, XWD_NAME, XWD_LANGUAGE, XWD_VERSION);
      
      explain select xwikidocum0_.XWD_WEB as col_0_0_, xwikidocum0_.XWD_NAME as col_1_0_, xwikidocum0_.XWD_LANGUAGE as col_2_0_, xwikidocum0_.XWD_VERSION as col_3_0_ from xwikidoc xwikidocum0_ order by xwikidocum0_.XWD_WEB limit 156300, 100;
      +------+-------------+--------------+-------+---------------+-----------------------------+---------+------+--------+-------------+
      | id   | select_type | table        | type  | possible_keys | key                         | key_len | ref  | rows   | Extra       |
      +------+-------------+--------------+-------+---------------+-----------------------------+---------+------+--------+-------------+
      |    1 | SIMPLE      | xwikidocum0_ | index | NULL          | solr_iterate_all_documents | 2319    | NULL | 156400 | Using index |
      +------+-------------+--------------+-------+---------------+-----------------------------+---------+------+--------+-------------+
      

      ...which can be observed when executing the now indexed query:

      100 rows in set (0.067 sec)
      

      So the conclusion is that this index is vital to increasing the performance of the indexing operation, specially for wikis with a large number of documents. In my case, it would have resulted in an execution that would have been 1 hour faster, i.e. 50% performance gain by improving the get documents query, executed in each batch of 100 documents.

      This could even be considered a solution/improvement for XWIKI-16755.

      The automatic adding of this missing Solr index could probably be handled as part of, or the same way as XWIKI-11660.

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              enygma Eduard Moraru
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated: