Loading...

XML

Word

Printable

Details

Type: Bug
Resolution: Duplicate
Priority: Major
Fix Version/s: None
Affects Version/s: 12.10.6
Component/s: Old Core, Search - Solr
Labels:
None
Environment:
MariaDB 10.3.27, Debian 10

Difficulty:
Unknown
Documentation:
N/A
Documentation in Release Notes:
N/A
Similar issues:

Description

After upgrading the flavor version on an XWiki instance using Extension Manager, I have restarted the instance and the Solr Index Job started doing its work, however, at the same time, I have noticed the mysqld process going to 60-70% and staying there for a long while.

show full processlist was always displaying the following entry, with the limit updating as the job was slowly progressing through the 200K documents that I had in my wiki:

| 199 | xwiki       | localhost:40818     | xwiki | Query       |    2 | Creating sort index                                                   | select xwikidocum0_.XWD_WEB as col_0_0_, xwikidocum0_.XWD_NAME as col_1_0_, xwikidocum0_.XWD_LANGUAGE as col_2_0_, xwikidocum0_.XWD_VERSION as col_3_0_ from xwikidoc xwikidocum0_ order by xwikidocum0_.XWD_WEB, xwikidocum0_.XWD_NAME, xwikidocum0_.XWD_LANGUAGE limit 60200, 100 |    0.000 |

Execution of the indexing job took over 2 hours and all of that was in order to realize that it did not have to index anything (probably the changes fro my upgrade were indexed on the spot):

2021-07-06 22:53:35,168 [solr/indexer job group daemon thread - org.xwiki.search.solr.internal.job.IndexerJob@540d56f1([solr, indexer])] INFO  o.x.s.s.i.j.IndexerJob         - Starting job of type [solr.indexer] with identifier [[solr, indexer]] 
...
2021-07-07 01:04:59,293 [solr/indexer job group daemon thread - org.xwiki.search.solr.internal.job.IndexerJob@540d56f1([solr, indexer])] INFO  o.x.s.s.i.j.IndexerJob         - 0 documents added, 0 deleted and 0 updated during the synchronization of the Solr index. 
2021-07-07 01:04:59,360 [solr/indexer job group daemon thread - org.xwiki.search.solr.internal.job.IndexerJob@540d56f1([solr, indexer])] INFO  o.x.s.s.i.j.IndexerJob         - Finished job of type [solr.indexer] with identifier [[solr, indexer]]

Having a better look at the query, I noticed that it was triggering a filesort instead of using indexes:

explain select xwikidocum0_.XWD_WEB as col_0_0_, xwikidocum0_.XWD_NAME as col_1_0_, xwikidocum0_.XWD_LANGUAGE as col_2_0_, xwikidocum0_.XWD_VERSION as col_3_0_ from xwikidoc xwikidocum0_ order by xwikidocum0_.XWD_WEB limit 156300, 100;
+------+-------------+--------------+------+---------------+------+---------+------+--------+----------------+
| id   | select_type | table        | type | possible_keys | key  | key_len | ref  | rows   | Extra          |
+------+-------------+--------------+------+---------------+------+---------+------+--------+----------------+
|    1 | SIMPLE      | xwikidocum0_ | ALL  | NULL          | NULL | NULL    | NULL | 182534 | Using filesort |
+------+-------------+--------------+------+---------------+------+---------+------+--------+----------------+

Running the query as it is, for a batch of 100 results took:

100 rows in set (2.483 sec)

Individual indexes exist for all the clauses in the select and in the order by, but when you have more than 1 clauses and indexes for the order by, no single one is selected and mysql/mariadb turns to a filesort instead, as it ends up being "more efficient" than using the individual indexes.

If we add an explicit index for the order by clauses, we get a significant improvement:

CREATE INDEX solr_iterate_all_documents ON xwikidoc (XWD_WEB, XWD_NAME, XWD_LANGUAGE, XWD_VERSION);

explain select xwikidocum0_.XWD_WEB as col_0_0_, xwikidocum0_.XWD_NAME as col_1_0_, xwikidocum0_.XWD_LANGUAGE as col_2_0_, xwikidocum0_.XWD_VERSION as col_3_0_ from xwikidoc xwikidocum0_ order by xwikidocum0_.XWD_WEB limit 156300, 100;
+------+-------------+--------------+-------+---------------+-----------------------------+---------+------+--------+-------------+
| id   | select_type | table        | type  | possible_keys | key                         | key_len | ref  | rows   | Extra       |
+------+-------------+--------------+-------+---------------+-----------------------------+---------+------+--------+-------------+
|    1 | SIMPLE      | xwikidocum0_ | index | NULL          | solr_iterate_all_documents | 2319    | NULL | 156400 | Using index |
+------+-------------+--------------+-------+---------------+-----------------------------+---------+------+--------+-------------+

...which can be observed when executing the now indexed query:

100 rows in set (0.067 sec)

So the conclusion is that this index is vital to increasing the performance of the indexing operation, specially for wikis with a large number of documents. In my case, it would have resulted in an execution that would have been 1 hour faster, i.e. 50% performance gain by improving the get documents query, executed in each batch of 100 documents.

This could even be considered a solution/improvement for XWIKI-16755.

The automatic adding of this missing Solr index could probably be handled as part of, or the same way as XWIKI-11660.

Attachments

Issue Links

duplicates

XWIKI-15166 SOLR reindexation job cause serious perfomance issues on large database

Closed

is related to

XWIKI-11660 Automatically create remaining database indexes

Open

relates to

XWIKI-22926 Make it possible to configure the batch size in the Solr index synchronization on startup

Closed

Activity

People

Assignee:: Michael Hamann

Reporter:: Eduard Moraru

Votes:: 0 Vote for this issue

Watchers:: 2 Start watching this issue

Dates

Created:: 07/Jul/21 02:13

Updated:: 14/Mar/25 09:20

Resolved:: 14/Mar/25 09:20

Date of First Response:: 26/Nov/24 7:21 PM