Our company is deploying a new XEM instance, using version 4.2-milestone-2, with Apache, Tomcat and MySQL.
We have configured Apache to forward SSL client certificate authentication as if it were HTTP Basic Authentication.
The forwarded REMOTE_USER parameter is filled with the SSL client certificate property UID (in case you are wondering, we had to apply two patches to Debian's latest Apache to make this work, but it works without any glitches).
To allow a user to connect to a given subwiki, we simply check the allowed set of certificates in Apache and create the XWiki user profile. For our administrators the profile is global (xwiki). For our users the profile is local (subwiki).
I hope it's clear enough. Will explain more if needed.
Everything seemed to work flawlessly, until one and only one of our administrators discovered an apparent bug.
When he visits the administration page Main.XWikiPreferences and navigates to the Users section, in one and only one of the subwikis, he gets a login prompt.
Any subsequent page request, in any wiki, redirects him to the login page.
The only workaround is to restart Tomcat.
We checked, and the REMOTE_USER parameter is still sent to XWiki. The problem seems instead related to how the XWikiUsers object is cached, or retrieved: it gets corrupted somewhere.
This is the kind of error we saw in the logs (DEBUG activated):
After some investigation, we traced the problem to XWikiAuthServiceImpl.java:checkPassword(). We added some debugging statements, like this:
This is the output when the problem happens:
It seems that the application retrieves the correct user page xwiki:XWiki.1008, but there is no XWikiUsers object inside.
We tried to create a new subwiki, named "bug", and replace the database content with the one from "cgn". The bug does not happen on "bug".
We tried to remove the "cgn" subwiki and recreate it. The bug still happens.
We had some accents in the Pretty name for "cgn", we removed them. The bug still happens.
The Pretty name of "cgn" is quite long, but we have another wiki with a longer name and the bug does not happen.
Everything seems to point to the fact that the database/wiki name, "cgn", is causing this. But I find no reference to this word within the code.
There also is absolutely nothing that helps us understand why all this only happens with the administrator with UID 1008 and not with others, like 1017.
I tried to bypass checkPassword() with a return true, since it is not necessary for our setup, but the XWikiUsers object is obviously needed in other places and, while the login page is not shown anymore, any other page is broken.
This is my first time debugging Java at this level, but we really have to solve this problem ASAP. I hope some developer will hop in and help us help you.
I dumped the JVM heap just after opening Main.WebHome, after opening XWiki.XWikiPreferences, and after opening the Users section of the latter, each time as the affected user in the affected wiki.
Then I loaded the dumps into Eclipse Memory Analyzer and played with OQL a little, in order to find the user's XWiki.XWikiUsers object each time.
The result is that I found, all three times, a BaseObject containing all the information about the user "XWiki.1008". I then found, in the following order, the correct DocumentReference and XWikiDocument.
The latter, of course, contains a reference to an xObjects object which, after following a double-chained array, links us back to the BaseObject/XWiki.XWikiUsers object.
This last link is what the application seems to miss the third time, and for every request after that.
The code does:
And the result is null.
Instead, manually, I find the object.
This leads me to think that the "object parsing" bit, run by XWikiDocument.getObject(), is the real problem. It is no longer able to find and extract the corresponding BaseObject.
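A lookup like this can return null for two different reasons: the lookup logic itself fails, or the document instance being queried holds no objects at all. The following is a minimal sketch of how the two cases can be told apart. It uses hypothetical stand-in classes (`SketchDocument`, `SketchObject`) rather than XWiki's real `XWikiDocument` and `BaseObject`, and it assumes the objects are stored as a map from class name to a list of objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a wiki object attached to a document.
final class SketchObject {
    final String className;

    SketchObject(String className) {
        this.className = className;
    }
}

// Hypothetical stand-in for a document holding objects grouped by class name.
final class SketchDocument {
    // Assumed layout: class name -> list of objects (a slot may be null).
    private final Map<String, List<SketchObject>> objects = new TreeMap<>();

    void addObject(SketchObject obj) {
        objects.computeIfAbsent(obj.className, k -> new ArrayList<>()).add(obj);
    }

    // Returns the first non-null object of the given class, or null if none exists.
    SketchObject getObject(String className) {
        List<SketchObject> list = objects.get(className);
        if (list == null) {
            return null;
        }
        for (SketchObject obj : list) {
            if (obj != null) {
                return obj;
            }
        }
        return null;
    }

    // Total number of non-null objects; logging this next to a null lookup
    // shows whether the whole document is empty or only the lookup failed.
    int objectCount() {
        int count = 0;
        for (List<SketchObject> list : objects.values()) {
            for (SketchObject obj : list) {
                if (obj != null) {
                    count++;
                }
            }
        }
        return count;
    }
}
```

If `getObject()` returns null while `objectCount()` is greater than zero, the lookup is at fault; if both come back empty, the instance being queried never had the objects, or lost them.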
I'm currently looking in this direction. Please feel free to hop in at any time and give me some help. Thank you.
Edited 2012-08-29 #2
I added some more debugging output to checkPassword():
The result during the first load of XWiki.XWikiPreferences:
And the following is a second load of the same page:
The loaded document is the same one, but all the objects have disappeared.
Edited 2012-08-29 #3
Ok, now it's getting quite esoteric.
I've looked everywhere in the platform code, and the only class modifying the xObjects attribute is XWikiDocument itself, as it should be, since the attribute is private.
I planted a debugging line at the beginning of every function in that class that interacts with the attribute.
This is the result:
I tried to understand what's going on.
- When XWiki.XWikiPreferences is called, the list of objects gets cloned, probably into another, identical document object.
- The cloning seems to succeed, as we see the five addXObject() running with the number of objects incrementing.
- Some time after that, in fact a little after the second call to XWikiAuthServiceImpl.checkPassword(), the referenced document has an empty xObjects attribute.
- Any following use of the xwiki:XWiki.1008 document shows that xObjects is empty.
- I have no way to tell which document object is referenced at a given time (I found at least 6 xwiki:XWiki.1008 document objects in the heap dumps), since all the identifying functions are based on the private function getLocalKey(), which is based on a few identical parameters.
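One way to tell same-key instances apart in the logs is to tag every debug line with the JVM identity hash of the instance. The following is a minimal sketch under that assumption, using a hypothetical `TracedDocument` class rather than the real XWikiDocument. Identity hash codes are not guaranteed to be unique, but they are usually distinct enough to correlate log lines.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a document class whose instances can share one key.
final class TracedDocument {
    private final String key; // for example "xwiki:XWiki.1008"
    private final List<String> objects = new ArrayList<>();

    TracedDocument(String key) {
        this.key = key;
    }

    // Tag built from the key plus the identity hash of this instance.
    // The identity hash stays the same for the lifetime of the instance.
    String instanceTag() {
        return key + "@" + Integer.toHexString(System.identityHashCode(this));
    }

    // Example of a traced mutator: logs which instance changed and its new size.
    void addObject(String name) {
        objects.add(name);
        System.out.println("addObject " + instanceTag() + " size=" + objects.size());
    }

    int size() {
        return objects.size();
    }
}
```

With such a tag in each debugging line, a log entry reporting an empty object list can be matched against the instance that was populated earlier, which shows whether they are the same object or two copies of the same document.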
I must say, I feel a bit lost.
Thanks to many contributors on IRC, further debugging traced the problem to a new, empty user document being stored in the cache. This is something that should not happen.
Now that I have a workaround, the problem is less urgent for us.
I still would like to solve it though, so please feel free to ask if you have any proposition or need any other information.
Here is the workaround, in XWikiCacheStore.loadXWikiDoc():
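For illustration only, a guard of this general kind might look like the sketch below. It is not the actual patch: `CachedDoc` and `GuardedCacheStore` are hypothetical stand-ins, not XWiki classes, and the sketch assumes that a "new" document (one not found in the backing store) is what ends up wrongly cached.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a loaded document.
final class CachedDoc {
    final String key;
    final int objectCount;
    final boolean isNew; // true when the document was not found in the backing store

    CachedDoc(String key, int objectCount, boolean isNew) {
        this.key = key;
        this.objectCount = objectCount;
        this.isNew = isNew;
    }
}

// Hypothetical cache placed in front of a slower backing store.
final class GuardedCacheStore {
    private final Map<String, CachedDoc> cache = new HashMap<>();
    private final Function<String, CachedDoc> backingStore;

    GuardedCacheStore(Function<String, CachedDoc> backingStore) {
        this.backingStore = backingStore;
    }

    CachedDoc load(String key) {
        CachedDoc cached = cache.get(key);
        // Guard: a cached "new" (empty) entry is treated as a miss and reloaded.
        if (cached != null && !cached.isNew) {
            return cached;
        }
        CachedDoc fresh = backingStore.apply(key);
        if (fresh.isNew) {
            // Never keep an empty placeholder in the cache.
            cache.remove(key);
        } else {
            cache.put(key, fresh);
        }
        return fresh;
    }

    // Test hook that simulates the bug by placing an entry directly in the cache.
    void putRaw(CachedDoc doc) {
        cache.put(doc.key, doc);
    }
}
```

The trade-off is that lookups for documents that really do not exist hit the backing store every time, so a guard like this hides the symptom at some cost in cache efficiency without explaining why the empty document was cached in the first place.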
Edited 2012-08-30 #2
Here is a stack trace, taken just after an empty user document object was saved in the cache as a direct consequence of this bug: http://pastebin.com/vptb9AfY