This job is triggered at XWiki startup and does two things:
- removes from the Solr index the entries that correspond to XWiki documents that no longer exist in the database
- indexes the XWiki documents that are not already indexed (or for which a newer version exists in the database)
The goal is to synchronize the Solr index with the XWiki database.
This synchronization is costly when the Solr index or the XWiki database is big because:
- for each document entry in the Solr index we need to check if a corresponding XWiki document exists in the database
- for each document in the XWiki database we need to check if the Solr index has a corresponding entry (for the current version of the XWiki document).
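The two checks above can be sketched in Java as follows. This is only an illustration, not the actual job code: the class and method names are hypothetical, documents are reduced to a reference string and a version string, and "a newer version exists" is simplified to "the versions differ". Both sides are held in plain in-memory maps to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch; the names are illustrative and not part of XWiki's API. */
final class IndexSyncSketch {
    /**
     * @param indexed document reference -> version stored in the Solr index
     * @param stored document reference -> current version in the XWiki database
     * @return the actions needed to bring the index in line with the database
     */
    static List<String> plan(Map<String, String> indexed, Map<String, String> stored) {
        List<String> actions = new ArrayList<>();
        // Visit the references from both sides in sorted order so the plan is deterministic.
        Set<String> references = new TreeSet<>(indexed.keySet());
        references.addAll(stored.keySet());
        for (String reference : references) {
            String indexedVersion = indexed.get(reference);
            String storedVersion = stored.get(reference);
            if (storedVersion == null) {
                // Indexed, but the document no longer exists in the database.
                actions.add("DELETE " + reference);
            } else if (!storedVersion.equals(indexedVersion)) {
                // Not indexed yet, or indexed with a different version (simplification).
                actions.add("INDEX " + reference);
            }
            // Otherwise the index entry is already up to date.
        }
        return actions;
    }
}
```

For example, with index entries A (1.1), B (2.1), C (1.1) and database documents A (1.1), B (2.2), D (1.1), the plan re-indexes B, deletes C and indexes D.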
In practice, we need to iterate over the entire Solr index and the entire XWiki database at startup. Even though this job runs in a daemon thread with minimum priority, it still pushes the CPU to 100% for a while, depending on the size of the Solr index and of the XWiki database. Moreover, the queries on both sides are not paginated, so a considerable part of the Solr index and of the XWiki database is currently loaded into memory.
We can improve this by:
- paginating the queries and yielding the CPU after each batch
- streaming the Solr query results
- improving the Solr queries by disabling faceting and highlighting
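The first improvement could look roughly like the sketch below. It is written under assumptions: `BatchedScanSketch`, `scan` and `fetchPage` are hypothetical names, and `fetchPage` stands in for whatever paginated query (Solr or database) returns at most `limit` entries starting at `offset`. Only one batch is held in memory at a time, and `Thread.yield()` is used as a scheduling hint between batches; a short sleep would be another option.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Consumer;

/** Hypothetical sketch; the names are illustrative and not part of XWiki's API. */
final class BatchedScanSketch {
    /**
     * @param fetchPage hypothetical paginated query: (offset, limit) -> at most limit entries
     * @param batchSize maximum number of entries loaded per query
     * @param handler called once for each entry
     * @return the total number of entries processed
     */
    static <T> int scan(BiFunction<Integer, Integer, List<T>> fetchPage, int batchSize,
        Consumer<? super T> handler) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        int offset = 0;
        while (true) {
            // Load a single batch instead of the whole result set.
            List<T> page = fetchPage.apply(offset, batchSize);
            page.forEach(handler);
            offset += page.size();
            if (page.size() < batchSize) {
                // A short (or empty) page means there is nothing left to read.
                return offset;
            }
            // Hint to the scheduler that other threads may run before the next batch.
            Thread.yield();
        }
    }
}
```

With 7 entries and a batch size of 3, this issues three queries (3, 3 and 1 entries) and yields twice.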