When the page hierarchy is based on the page reference (the default since XWiki 7.2), a page can have two types of child nodes:
- (non-terminal) nested pages (i.e. nested spaces)
- terminal pages
Take for instance the page A.WebHome. It can have:
- A.B.WebHome (nested page)
- A.C (terminal page in the A space)
The tree needs to display both B and C under A, so these two types of child nodes must be mixed and sorted by name. The problem is that the information about each of these types is stored in a separate table:
- XWikiSpace (xwikispace)
- XWikiDocument (xwikidoc)
Nested pages are documents too, so they are found in the documents table, but it is not enough to look there because paths in the page hierarchy can have holes. The fact that a page A.B.C.WebHome exists doesn't mean that A.B.WebHome exists, but the tree still needs to show B as a child of A in order to be able to reach C. That's why the tree also needs to check the spaces table.
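The hole case can be made concrete with a small Java sketch. It is only an illustration working on in-memory document names, not the actual tree implementation: the class and method names are hypothetical, and it assumes page names contain no escaped dots. Looking only for existing `<name>.WebHome` documents misses B, while deriving the child spaces from every document path finds it.

```java
import java.util.Collection;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper, for illustration only.
final class HierarchyHoles {
    // Nested pages of parentSpace whose own WebHome document exists.
    static SortedSet<String> childHomePages(String parentSpace, Collection<String> docs) {
        String prefix = parentSpace + ".";
        String suffix = ".WebHome";
        SortedSet<String> names = new TreeSet<>();
        for (String fullName : docs) {
            if (fullName.startsWith(prefix) && fullName.endsWith(suffix)
                    && fullName.length() > prefix.length() + suffix.length()) {
                String name = fullName.substring(prefix.length(), fullName.length() - suffix.length());
                // A remaining dot means the document lives deeper in the hierarchy.
                if (!name.contains(".")) {
                    names.add(name);
                }
            }
        }
        return names;
    }

    // Child spaces of parentSpace derived from every document path, holes included.
    static SortedSet<String> childSpaces(String parentSpace, Collection<String> docs) {
        String prefix = parentSpace + ".";
        SortedSet<String> names = new TreeSet<>();
        for (String fullName : docs) {
            if (!fullName.startsWith(prefix)) {
                continue;
            }
            String rest = fullName.substring(prefix.length());
            int dot = rest.indexOf('.');
            // No dot left means a terminal page (e.g. "A.C") or the parent's own WebHome.
            if (dot > 0) {
                names.add(rest.substring(0, dot));
            }
        }
        return names;
    }
}
```

With the documents A.WebHome, A.C, A.D.WebHome and A.B.C.WebHome, the first method returns only D, while the second returns B and D, which is the information the spaces table provides.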
The solution that has been implemented performs a kind of union between the spaces table and the documents table. Since HQL doesn't support unions, the current query does a Cartesian product between the spaces table and the documents table and then keeps only the rows that are either child spaces or child terminal pages. This is very costly: it can take up to 3 minutes on a large wiki. It clearly doesn't scale.
What options do we have?
- Perform the union on the Java side. This means executing two queries, one to get the child spaces and one to get the child terminal pages, and then sorting the combined results. This is probably faster than what we have right now, but it has two drawbacks:
  - It complicates the pagination. We'll have to re-fetch the child spaces and pages from the database each time the user clicks "More ...".
  - It can still be costly if the number of child spaces or pages is large. Take for instance the XWiki space where we keep the user profiles. If your wiki has 50k users then expanding the XWiki node will be expensive.
- Use Solr. The downside is that the tree will not always be up to date, for instance when you delete the index after an upgrade. The good part is that it's going to be fast, because we won't have to load the documents from the database and we won't have to re-render the page titles, as they are already indexed. The problem is that we currently index space information on the document rows. To handle the holes in the page hierarchy we may have to add a new type of row to the Solr index to hold space information.
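As a rough illustration of the first option, the Java-side union could look like the sketch below. The class and method names are hypothetical, and it assumes each of the two queries returns its names already sorted (for instance through an ORDER BY clause) with the same case-sensitive ordering used by `String.compareTo`. It merges the two lists and returns one page of results.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
final class ChildNodeUnion {
    // Merges two sorted name lists and returns the entries in [offset, offset + limit).
    static List<String> page(List<String> childSpaces, List<String> terminalPages, int offset, int limit) {
        List<String> result = new ArrayList<>();
        int i = 0;
        int j = 0;
        int skipped = 0;
        while (result.size() < limit && (i < childSpaces.size() || j < terminalPages.size())) {
            String next;
            boolean takeSpace = j >= terminalPages.size()
                || (i < childSpaces.size() && childSpaces.get(i).compareTo(terminalPages.get(j)) <= 0);
            if (takeSpace) {
                next = childSpaces.get(i++);
            } else {
                next = terminalPages.get(j++);
            }
            // Entries before the offset still have to be walked through before they can be dropped.
            if (skipped < offset) {
                skipped++;
            } else {
                result.add(next);
            }
        }
        return result;
    }
}
```

The sketch also shows where the pagination drawback comes from: to serve the page starting at `offset`, each list must contain at least the first `offset + limit` rows of its query, so the amount of data fetched grows with every click on "More ...".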