Apr. 24, 2018: HtmlCleaner release 2.22
202 Test suite fails with Java 9: Transformer changed behaviour
200 Adds null end of the DOCTYPE when there is no DOCTYPE
199 "whitespace: pre" CSS property not taken into account
198 Make XPath methods protected to allow extension
193 MathML equations without a namespace prefix are breaking paragraphs
192 Support HTML5 inline content without namespaces
191 Infinite loop on time b li
190 NullPointerException in HtmlCleaner.makeTree
189 Unclosed CDATA sections result in odd behaviour with large documents
May 11, 2017: HtmlCleaner release 2.21
Bug fix 188 Regression in 2.20, "prune"-tags do not get removed anymore
Thanks to Markus Schlegel for their help with this release.
May 2, 2017: HtmlCleaner release 2.20
Enhancement - add ability to output to an Ant property from HtmlCleaner.
Enhancement 186 Add strict error checking flag as optional DomSerializer constructor argument
Bug fix 185 Unclosed CDATA can cause ArrayIndexOutOfBoundsException
Bug fix 175 HtmlCleaner generates invalid attribute names from bad HTML
Bug fix 125 Html elements are doubled sometimes
Bug fix 57 Use first attribute if duplicated
Bug fix 51 setUseCdataForScriptAndStyle should apply on HtmlSerializer
Thanks to Gintas Grigelionis, Michael Ryan, Philipp Jeitner, legrass and Ivan Bondarenko for their help with this release.
Note there is an algorithm change in this release (see bug 175): by default, attribute names are now converted into valid XML attribute names. You can change this behaviour with two new cleaner properties: allowInvalidAttributeNames and invalidAttributeNamePrefix.
February 7, 2017: HtmlCleaner release 2.19
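To make the attribute-name change more concrete, here is a minimal, self-contained sketch of one way a name could be turned into a valid XML attribute name. It is an illustration only and not HtmlCleaner's actual implementation: the class AttributeNameSketch, the method toValidName, the ASCII-only character rules and the "attr-" prefix are all assumptions made for this example. The prefix parameter only plays a role similar to what invalidAttributeNamePrefix suggests.

```java
public class AttributeNameSketch {

    // Simplified, ASCII-only approximation of the XML rule for the first
    // character of a name (the real XML Name production also allows many
    // non-ASCII characters).
    static boolean isNameStart(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_' || c == ':';
    }

    // Simplified, ASCII-only approximation of the rule for later characters.
    static boolean isNameChar(char c) {
        return isNameStart(c) || (c >= '0' && c <= '9') || c == '-' || c == '.';
    }

    // Hypothetical helper: drops characters that cannot appear in a name,
    // then prepends the given prefix if the result is empty or does not
    // start with a valid first character.
    static String toValidName(String raw, String prefix) {
        StringBuilder sb = new StringBuilder();
        for (char c : raw.toCharArray()) {
            if (isNameChar(c)) {
                sb.append(c);
            }
        }
        if (sb.length() == 0 || !isNameStart(sb.charAt(0))) {
            sb.insert(0, prefix);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "attr-" is an arbitrary prefix chosen for this sketch.
        System.out.println(toValidName("data\"value", "attr-"));
        System.out.println(toValidName("1st", "attr-"));
    }
}
```

Under these assumptions, a name containing a stray quote simply loses the quote, while a name starting with a digit gets the prefix so that it begins with a valid character. The library's own rules may differ; see bug 175 for the behaviour actually implemented.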
Bug fix 183 Real world html causes clean() to eat all available memory
Bug fix 180 duplicate tags added at the end (script,body,html)
Bug fix 173 Infinite loop and OOM if uppercase P tag with xmlns
Bug fix 172 Infinite loop occurs when MathML tags are present
Bug fix 169 Several issues with CDATA blocks
Bug fix 168 DomSerializer doesn't seem to take into account the namespacesAware configuration
Enhancement 167 Make it easier to extend DomSerializer
Bug fix 166 Bad handling of <p> inside <ul> in HTML 5
Bug fix 164 Font tag is not known anymore
Enhancement 159 Add back in Utils.fullUrl()
Bug fix 158 NullPointerException in HtmlCleaner.saveToLastOpenTag
Thanks to Code Buddy, Haadar, Martin Denham, Tibor Dimitriu, Vincent Massol, Guillaume Delhumeau, and Rob Decker for their help with this release.
Note we have a small algorithm change in this release (see bug 166) to help make cleaning of lists and tables more sensible by inserting missing LIs and TDs in the first instance rather than moving invalid content outside; this should improve the quality of cleaned HTML, but YMMV. Please give your feedback on this change and report any bugs!
November 2, 2016: HtmlCleaner release 2.18
Bug fix 179 java -jar option does not work
Thanks to Card Package for their help with this release.
October 19, 2016: HtmlCleaner release 2.17
Bug fix 178 java.lang.ClassCastException: java.util.ArrayList cannot be cast to org.htmlcleaner.TagNode
Bug fix 176 Crash: IllegalArgumentException in convertToUnicode
Bug fix 165 Invalid HTML characters are not removed
Thanks to Code Buddy and Matthew Fulgo for their help with this release.