XWiki Commons / XCOMMONS-1164

Upgrade to HTML Cleaner 2.22



    • Task
    • Resolution: Fixed
    • Major
    • 10.6-rc-1
    • 9.0
    • Dependency Upgrades
    • None
    • Unit
    • N/A


      We're currently using HtmlCleaner 2.16.

      Release notes: http://htmlcleaner.sourceforge.net/release.php

      Apr. 24, 2018: HtmlCleaner release 2.22
          202 Test suite fails with Java 9: Transformer changed behaviour
          200 Adds null end of the DOCTYPE when there is no DOCTYPE
          199 "white-space: pre" CSS property not taken into account
          198 Make XPath methods protected to allow extension
          193 MathML equations without a namespace prefix are breaking paragraphs
          192 Support HTML5 inline content without namespaces
          191 Infinite loop on time b li
          190 NullPointerException in HtmlCleaner.makeTree
          189 Unclosed CDATA sections result in odd behaviour with large documents
      May 11, 2017: HtmlCleaner release 2.21
          Bug fix 188 Regression in 2.20: "prune" tags are no longer removed
          Thanks to Markus Schlegel for their help with this release.
      May 2, 2017: HtmlCleaner release 2.20
          Enhancement: add the ability to output to an Ant property from HtmlCleaner
          Enhancement 186 Add strict error checking flag as an optional DomSerializer constructor argument
          Bug fix 185 Unclosed CDATA can cause ArrayIndexOutOfBoundsException
          Bug fix 175 HtmlCleaner generates invalid attribute names from bad HTML
          Bug fix 125 HTML elements are sometimes doubled
          Bug fix 57 Use first attribute if duplicated
          Bug fix 51 setUseCdataForScriptAndStyle should apply to HtmlSerializer
          Thanks to Gintas Grigelionis, Michael Ryan, Philipp Jeitner, legrass and Ivan Bondarenko for their help with this release.
          Note that this release changes the algorithm (see bug 175): by default, we now try to change attribute names into valid XML attribute names. You can change this behaviour with two new cleaner properties: allowInvalidAttributeNames and invalidAttributeNamePrefix.
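The 2.20 attribute-name change (bug 175) hinges on what counts as a valid XML attribute name. The sketch below is a hypothetical helper, not HtmlCleaner's code; the names `AttributeNameSanitizer`, `sanitize` and the `x_` prefix are illustrative assumptions. It checks an ASCII-only subset of the XML 1.0 `Name` production (non-ASCII letters and the `:` namespace separator are deliberately left out), replaces each invalid character with `_`, and prepends a prefix only when the result still cannot start a name (for example a leading digit). That last step is roughly the role the release notes suggest for `invalidAttributeNamePrefix`.

```java
import java.util.Objects;

/** Hypothetical sketch, NOT HtmlCleaner's implementation: ASCII-only XML name repair. */
final class AttributeNameSanitizer {
    private AttributeNameSanitizer() {}

    // Simplified NameStartChar: ASCII letter or '_' (':' omitted to sidestep namespaces).
    static boolean isNameStartChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
    }

    // Simplified NameChar: any start char, an ASCII digit, '-' or '.'.
    static boolean isNameChar(char c) {
        return isNameStartChar(c) || (c >= '0' && c <= '9') || c == '-' || c == '.';
    }

    static boolean isValidName(String name) {
        if (name == null || name.isEmpty() || !isNameStartChar(name.charAt(0))) {
            return false;
        }
        for (int i = 1; i < name.length(); i++) {
            if (!isNameChar(name.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    /**
     * Returns the name unchanged if it is already valid. Otherwise replaces every invalid
     * character with '_' and, if the result still cannot start a name, prepends the prefix.
     */
    static String sanitize(String name, String prefix) {
        Objects.requireNonNull(name, "name");
        if (!isValidName(prefix)) {
            throw new IllegalArgumentException("prefix must itself be a valid name: " + prefix);
        }
        if (isValidName(name)) {
            return name;
        }
        StringBuilder sb = new StringBuilder(name.length());
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            sb.append(isNameChar(c) ? c : '_');
        }
        String result = sb.toString();
        if (result.isEmpty() || !isNameStartChar(result.charAt(0))) {
            result = prefix + result; // e.g. a leading digit, '-' or '.'
        }
        return result;
    }

    public static void main(String[] args) {
        for (String raw : new String[] {"class", "1st", "on\"click", "-foo"}) {
            System.out.println(raw + " -> " + sanitize(raw, "x_"));
        }
    }
}
```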
      Feb. 7, 2017: HtmlCleaner release 2.19
          Bug fix 183 Real-world HTML causes clean() to eat all available memory
          Bug fix 180 Duplicate tags added at the end (script, body, html)
          Bug fix 173 Infinite loop and OOM if uppercase P tag with xmlns
          Bug fix 172 Infinite loop occurs when MathML tags are present
          Bug fix 169 Several issues with CDATA blocks
          Bug fix 168 DomSerializer doesn't seem to take into account the namespacesAware configuration
          Enhancement 167 Make it easier to extend DomSerializer
          Bug fix 166 Bad handling of <p> inside <ul> in HTML 5
          Bug fix 164 Font tag is no longer recognized
          Enhancement 159 Add back in Utils.fullUrl()
          Bug fix 158 NullPointerException in HtmlCleaner.saveToLastOpenTag
          Thanks to Code Buddy, Haadar, Martin Denham, Tibor Dimitriu, Vincent Massol, Guillaume Delhumeau, and Rob Decker for their help with this release.
          Note that this release includes a small algorithm change (see bug 166) that makes cleaning of lists and tables more sensible: missing LIs and TDs are now inserted in the first instance, rather than invalid content being moved outside. This should improve the quality of cleaned HTML, but YMMV. Please give your feedback on this change and report any bugs!
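The bug 166 change (insert a missing LI instead of moving invalid content out of the list) can be pictured with a toy tree. This is a hypothetical model, not HtmlCleaner's implementation: `ListRepairSketch`, `Node` and `insertMissingListItems` are illustrative names, text nodes are ignored, only lists are handled (not the TD case), and each stray child gets its own `<li>`, whereas the real cleaner may group or place content differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch, NOT HtmlCleaner's algorithm: wrap stray list children in <li>. */
final class ListRepairSketch {
    private ListRepairSketch() {}

    /** Minimal element model: a tag name plus child elements (text nodes are left out). */
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }

        /** Appends a child and returns this node so trees can be built fluently. */
        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** Recursively wraps every non-li child of a ul/ol in a new li, modifying the tree in place. */
    static Node insertMissingListItems(Node node) {
        boolean isList = node.name.equals("ul") || node.name.equals("ol");
        List<Node> repaired = new ArrayList<>(node.children.size());
        for (Node child : node.children) {
            insertMissingListItems(child); // repair deeper levels first
            if (isList && !child.name.equals("li")) {
                repaired.add(new Node("li").add(child)); // keep the content inside the list
            } else {
                repaired.add(child);
            }
        }
        node.children.clear();
        node.children.addAll(repaired);
        return node;
    }

    /** Renders tag names only, e.g. "ul(li,li(p))", so the result is easy to inspect. */
    static String render(Node node) {
        if (node.children.isEmpty()) {
            return node.name;
        }
        StringBuilder sb = new StringBuilder(node.name).append('(');
        for (int i = 0; i < node.children.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(render(node.children.get(i)));
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        // A <p> directly inside <ul>, the situation named in bug 166.
        Node ul = new Node("ul").add(new Node("li")).add(new Node("p"));
        System.out.println(render(insertMissingListItems(ul))); // prints ul(li,li(p))
    }
}
```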
      Nov. 2, 2016: HtmlCleaner release 2.18
          Bug fix 179 java -jar option does not work
          Thanks to Card Package for their help with this release.
      Oct. 19, 2016: HtmlCleaner release 2.17
          Bug fix 178 java.lang.ClassCastException: java.util.ArrayList cannot be cast to org.htmlcleaner.TagNode
          Bug fix 176 Crash: IllegalArgumentException in convertToUnicode
          Bug fix 165 Invalid HTML characters are not removed
          Thanks to Code Buddy and Matthew Fulgo for their help with this release.


      Assignee: Vincent Massol (vmassol)
      Reporter: Vincent Massol (vmassol)