Currently we use String.hashCode() to generate a document id from a key built from the fullName and the language (for translated documents only) of a document, concatenated with ':'. This causes 2 critical issues:
- String.hashCode() is a bad hash for small differences, especially in small strings: it clashes even on 2-character strings! (see XWIKI-4396). Moreover, it is only a 32-bit code while we have room for 64 bits.
- the original (not translated) document "space.name:fr" clashes with the translated document "space.name" in "fr" (in the cache also, see
In the current architecture and model, fully fixing XWIKI-4396 is really not so easy, but we can at least improve the situation by using a better hash algorithm and a better key as input.
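Both issues can be reproduced with a few lines of Java. This is only a sketch: the class OldKeyClashDemo and the helper oldId are hypothetical names, and the key construction is an assumption reconstructed from the description above, not a copy of the actual code.

```java
class OldKeyClashDemo {
    // Assumed reconstruction of the current scheme: the fullName alone for an
    // original document, fullName + ':' + language for a translated one,
    // hashed with String.hashCode().
    static int oldId(String fullName, String language) {
        String key = language.isEmpty() ? fullName : fullName + ":" + language;
        return key.hashCode();
    }

    public static void main(String[] args) {
        // Issue 1: two different 2-character names get the same 32-bit id.
        System.out.println(oldId("Aa", "") + " " + oldId("BB", "")); // prints "2112 2112"

        // Issue 2: an original document named "space.name:fr" and the "fr"
        // translation of "space.name" produce the same key, hence the same id.
        System.out.println(oldId("space.name:fr", "") == oldId("space.name", "fr")); // prints "true"
    }
}
```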
The patch proposed in XWIKI-6169 for the cache key (from Caleb) uses the key 8:fullname2:fr (where 8 is the length of fullname and 2 is the length of the language), which would solve the key clash issue. Using an improved hash algorithm like MD5 (or SHA-x), and keeping the lower 64 bits, will surely reduce the likelihood of a collision.
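A minimal sketch of how the length-prefixed key and the 64-bit MD5-based id could fit together. The names DocumentIdSketch, key and id are hypothetical. Two points are assumptions and not taken from the patch: an untranslated document is given an empty language (so its key ends with "0:"), and "lower 64 bits" is taken to mean the last 8 bytes of the digest read big-endian.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class DocumentIdSketch {
    // Length-prefixed key, e.g. ("fullname", "fr") -> "8:fullname2:fr".
    // Assumption: untranslated documents pass an empty language.
    static String key(String fullName, String language) {
        return fullName.length() + ":" + fullName + language.length() + ":" + language;
    }

    // 64-bit id: MD5 of the UTF-8 encoded key, keeping the last 8 bytes
    // of the 16-byte digest (our assumed reading of "lower 64 bits").
    static long id(String fullName, String language) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(key(fullName, language).getBytes(StandardCharsets.UTF_8));
            long id = 0;
            for (int i = 8; i < 16; i++) {
                id = (id << 8) | (digest[i] & 0xFF);
            }
            return id;
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(key("fullname", "fr"));    // prints "8:fullname2:fr"
        System.out.println(key("space.name:fr", "")); // prints "13:space.name:fr0:"
        System.out.println(key("space.name", "fr"));  // prints "10:space.name2:fr"
    }
}
```

With the length prefixes, the two documents that clash today map to different keys, so they are hashed from different inputs. The ids can still collide in principle, since any 64-bit hash can, but no longer systematically.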
more here: http://markmail.org/thread/q5qn4w5jrulcqqiw
Moving ids from 32 bits to 64 bits will also reduce the risk of collision during the migration. Note that the migration should be mandatory; otherwise we will create duplicates of the minimal set of documents under the new keys, just as when the database is empty!
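To give a rough feeling for the gain from 32 to 64 bits, the sketch below uses the standard birthday approximation p ≈ 1 - exp(-n(n-1) / 2^(b+1)) for n documents and b-bit ids. It assumes an ideally uniform hash, which String.hashCode() is not, so the real 32-bit situation is worse than the estimate. The class name and the document count of 100000 are illustrative assumptions.

```java
class CollisionEstimate {
    // Birthday approximation of the probability that at least two of n
    // uniformly distributed ids of the given bit width are equal.
    static double collisionProbability(long n, int bits) {
        double pairs = n * (n - 1) / 2.0;
        double space = Math.pow(2, bits);
        // -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x.
        return -Math.expm1(-pairs / space);
    }

    public static void main(String[] args) {
        long documents = 100_000; // illustrative wiki size, an assumption
        System.out.println("32 bits: " + collisionProbability(documents, 32));
        System.out.println("64 bits: " + collisionProbability(documents, 64));
    }
}
```

Under these assumptions, a wiki of 100000 documents is more likely than not to contain at least one collision with 32-bit ids, while with 64-bit ids the probability stays below one in a billion.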