Details
Description
When the security cache is near full capacity, adding a new entry triggers LRU eviction, and the cascading removal of the evicted entry's child nodes may remove the newly added entry itself.
This looks like an unlucky coincidence (and the eviction logic could be improved), but it happens more often than one might expect.
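To make the self-eviction scenario concrete, here is a minimal sketch of an LRU cache in which evicting a node also removes its children. The class `CascadingLRUCache`, its methods and the entry keys are hypothetical and are not the actual security cache implementation. The sketch only models how a new child entry can be evicted along with its parent. It does not model the deadlock described below.

```python
from collections import OrderedDict


class CascadingLRUCache:
    """Hypothetical LRU cache: evicting a node also removes its child nodes."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # key -> parent key (or None), oldest first
        self.children = {}            # parent key -> set of child keys

    def __contains__(self, key):
        return key in self.entries

    def get(self, key):
        # A cache hit marks the entry as most recently used.
        if key in self.entries:
            self.entries.move_to_end(key)
            return True
        return False

    def put(self, key, parent=None):
        # The new entry is linked to its parent before room is made for it.
        self.entries[key] = parent
        self.entries.move_to_end(key)
        if parent is not None:
            self.children.setdefault(parent, set()).add(key)
        # Evict least recently used entries until the cache fits again.
        while len(self.entries) > self.capacity:
            self._remove(next(iter(self.entries)))

    def _remove(self, key):
        if key not in self.entries:
            return
        parent = self.entries.pop(key)
        if parent is not None and parent in self.children:
            self.children[parent].discard(key)
        # Cascade: every child of the removed node is removed as well.
        for child in list(self.children.pop(key, ())):
            self._remove(child)


if __name__ == "__main__":
    cache = CascadingLRUCache(capacity=2)
    cache.put("group:admins")
    cache.put("user:alice")
    # "group:admins" is now the LRU entry, so adding its child evicts the
    # parent, and the cascade removes the entry that was just added.
    cache.put("acl:doc1", parent="group:admins")
    print(list(cache.entries))  # prints ['user:alice']
```

In this sketch the outcome depends only on whether the parent happens to be the least recently used entry at insertion time: if `group:admins` had been read just before the `put`, `user:alice` would have been evicted instead and the new entry would have survived.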
Unfortunately, this triggers a deadlock in the cache container, which recovers only after 10 seconds. During these 10 seconds, the whole cache is write-locked, which is very bad for performance. More importantly, the interrupted eviction leaves the cache in a corrupted state, where nodes that are no longer in the cache are still referenced from nodes that remain in the cache but should have been evicted.
This is the main issue. However, while investigating it, I came to suspect that a race condition could also lead a single thread to make a wrong decision about user memberships. This is far less visible and less persistent, but it could happen at any time and could inject wrong decisions into the cache for a while. This race condition was introduced when the links between cache entries, initially used only as a means of eviction, were reused to implement an internal group cache. Since this is closely related, it will be fixed as well.