Details
- Improvement
- Resolution: Fixed
- Critical
- None
- None
- Unit
- Unknown
Description
Currently we use String.hashCode() to generate the document id from a key built from the fullName of a document and, for translated documents only, its language, concatenated with ':'. This causes 2 critical issues:
- String.hashCode() is a bad hash for small differences, especially in small strings: it clashes even on 2-char strings! (see XWIKI-4396). Moreover, it is only a 32-bit code while we have room for 64 bits.
- an original (not translated) document "space.name:fr" clashes with the translated document "space.name" in "fr" (in the cache also, see XWIKI-6169)
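Both issues can be reproduced in a few lines. This is only a sketch: `legacyKey` is a hypothetical helper that mimics the key format described above (fullName, plus ':' and the language for translations), not the actual XWiki code.

```java
final class HashClashDemo {
    // Hypothetical helper: mimics the key format described in this issue.
    // Untranslated documents use the fullName alone; translations append ":" + language.
    static String legacyKey(String fullName, String language) {
        if (language == null || language.isEmpty()) {
            return fullName;
        }
        return fullName + ":" + language;
    }

    public static void main(String[] args) {
        // Issue 1: String.hashCode() clashes on 2-char strings.
        // 'A'*31 + 'a' = 65*31 + 97 = 2112 and 'B'*31 + 'B' = 66*31 + 66 = 2112.
        System.out.println("Aa".hashCode() + " / " + "BB".hashCode());

        // Issue 2: two different documents produce the very same key.
        String original = legacyKey("space.name:fr", null); // untranslated document
        String translated = legacyKey("space.name", "fr");  // "fr" translation
        System.out.println(original.equals(translated));
    }
}
```

Since the two keys are equal strings, no hash algorithm can separate them; the key format itself has to change.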
In the current architecture and model, fully fixing XWIKI-4396 is really not so easy, but we may at least improve the situation, by using a better hash algorithm and a better key in input.
Following the patch proposed for the cache key in XWIKI-6169 (from Caleb), a key such as 8:fullname2:fr (where 8 is the length of fullname and 2 is the length of the language) would solve the key clash issue. Using a better hash algorithm like MD5 (or SHA-x), and keeping its lower 64 bits, will surely reduce the likelihood of a collision.
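A minimal sketch of this proposal, under stated assumptions: the class and method names are hypothetical, the key is encoded as UTF-8, and "lower 64 bits" is taken to mean the last 8 bytes of the 16-byte MD5 digest. The actual fix may differ on each of these points.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class DocumentIdSketch {
    // Length-prefixed key, e.g. ("fullname", "fr") gives "8:fullname2:fr".
    // The length prefixes make the boundary between name and language unambiguous.
    static String uniqueKey(String fullName, String language) {
        StringBuilder key = new StringBuilder();
        key.append(fullName.length()).append(':').append(fullName);
        if (language != null && !language.isEmpty()) {
            key.append(language.length()).append(':').append(language);
        }
        return key.toString();
    }

    // 64-bit id built from the last 8 bytes of the MD5 digest of the key
    // (assumption: this is what "lower 64 bits" means here).
    static long id64(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                .digest(key.getBytes(StandardCharsets.UTF_8));
            long id = 0L;
            for (int i = 8; i < 16; i++) {
                id = (id << 8) | (digest[i] & 0xFFL);
            }
            return id;
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a mandatory algorithm on every Java platform.
            throw new IllegalStateException("MD5 is not available", e);
        }
    }
}
```

With this key format, the untranslated document "space.name:fr" gives 13:space.name:fr while the "fr" translation of "space.name" gives 10:space.name2:fr, so the two no longer share a key.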
more here: http://markmail.org/thread/q5qn4w5jrulcqqiw
Moving ids from 32 bits to 64 bits will also reduce the risk of collision during the migration. Note that the migration should be mandatory; otherwise we will create duplicates of the minimal set of documents with the new keys, as happens when the database is empty!
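To give an order of magnitude for the 32-bit versus 64-bit difference, here is a rough estimate using the standard birthday approximation. It assumes ids are uniformly distributed, which String.hashCode() is not, so the real 32-bit situation is worse. The class name and the document count below are illustrative only.

```java
final class BirthdayBound {
    // Approximate probability of at least one collision among n ids
    // drawn uniformly at random from 2^bits possible values:
    // p ~= 1 - exp(-n(n-1) / 2^(bits+1))
    static double collisionProbability(long n, int bits) {
        double pairs = (double) n * (n - 1) / 2.0;
        double space = Math.pow(2.0, bits);
        // -expm1(-x) computes 1 - exp(-x) accurately even for tiny x.
        return -Math.expm1(-pairs / space);
    }

    public static void main(String[] args) {
        long documents = 100_000L; // illustrative wiki size
        System.out.println("32 bits: " + collisionProbability(documents, 32));
        System.out.println("64 bits: " + collisionProbability(documents, 64));
    }
}
```

For 100,000 documents this gives roughly 0.69 with 32-bit ids, against about 2.7e-10 with 64-bit ids.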
Issue Links
- depends on: XWIKI-7006 Accessing a store that is not migrated to the latest data version should not be allowed (Closed)
- is duplicated by: XWIKI-3719 Cache the database name in the Hibernate-based storage (Closed)
- relates to: XWIKI-6169 Cache incoherency from collisions in cache key generation. (Closed)
- relates to: XWIKI-4396 Duplicated document id (Closed)