The following test added to DefaultHTMLCleanerTest shows the problem:
The HTML cleaner should preserve double XML encoding in attribute values (the use case being attributes whose value is itself XML). Currently, the following:
ends up as:
which breaks any code that later tries to parse the XML stored in the attribute value.
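The sketch below does not call the HTML cleaner. It only uses the JDK's DOM parser to illustrate why the second level of encoding matters. The attribute name `data`, the `<value>` payload and the class and method names are hypothetical. Parsing the outer element decodes entities once. With double encoding, what remains is well-formed XML. If one level is lost, the stored XML contains a bare `<` and no longer parses.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class DoubleEncodingDemo {

    // Parses a string as XML; wraps any parse failure in an unchecked exception.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML: " + xml, e);
        }
    }

    // Reads the hypothetical "data" attribute of the outer element (one level of
    // entity decoding), parses that value as XML and returns its root's text content.
    static String innerText(String outerXml) {
        String attributeValue = parse(outerXml).getDocumentElement().getAttribute("data");
        return parse(attributeValue).getDocumentElement().getTextContent();
    }

    // False if either the outer document or the XML held in "data" fails to parse.
    static boolean attributeHoldsWellFormedXml(String outerXml) {
        try {
            innerText(outerXml);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Double encoding: "&amp;lt;" decodes to "&lt;", so the attribute value
        // is the well-formed XML "<value>a &lt; b</value>".
        String doubleEncoded = "<div data=\"&lt;value&gt;a &amp;lt; b&lt;/value&gt;\"/>";
        // Single encoding: the attribute value becomes "<value>a < b</value>",
        // whose bare "<" makes it malformed.
        String singleEncoded = "<div data=\"&lt;value&gt;a &lt; b&lt;/value&gt;\"/>";

        System.out.println(innerText(doubleEncoded));
        System.out.println(attributeHoldsWellFormedXml(singleEncoded));
    }
}
```

Running `main` prints `a < b` and then `false`.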