  1. XWiki Platform
  2. XWIKI-7871

PDF export fails on some pages because of JTidy messing up CDATA sections

    Description

      To reproduce, put this code in a wiki page:

      = Some plain text =
      
      test
      
      == Invalid HTML ==
      
      {{html clean="false"}}
      this is invalid html <br>
      {{/html}}
      
      == CDATA section ==
      
      {{html clean="false"}}
      <script type="text/javascript">
      //<![CDATA[
      var test = 245;
      var bool1 = test < 200;
      //]]>
      </script>
      {{/html}}
      

      Exporting this page to PDF fails because of the invalid HTML.

      Cleaning the invalid HTML during PDF export is JTidy's job, and it usually does this well. In this case, however, JTidy damages the HTML further because of the < in the CDATA section, which it does not handle properly.
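
      To illustrate why the CDATA section matters, here is a minimal sketch that uses only the JDK's XML parser, not JTidy. The class name CdataCheck and the helper parses are made up for this example. It checks the script from the reproduction page with and without its CDATA markers: the bare < is only acceptable to an XML parser while the CDATA section is intact.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of XWiki or JTidy.
class CdataCheck {
    /** Returns true if the given string parses as well-formed XML. */
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The script from the reproduction page, CDATA section intact.
        String withCdata = "<script type=\"text/javascript\">//<![CDATA[\n"
            + "var test = 245;\nvar bool1 = test < 200;\n//]]></script>";
        // The same script once the CDATA markers are lost.
        String withoutCdata = "<script type=\"text/javascript\">\n"
            + "var test = 245;\nvar bool1 = test < 200;\n</script>";
        System.out.println("with CDATA: " + parses(withCdata));
        System.out.println("without CDATA: " + parses(withoutCdata));
    }
}
```

      A cleaner that drops or breaks the CDATA markers therefore turns a parseable script into one that the next XML parsing step rejects.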

      In the PDF export code, after running JTidy, we parse the resulting XHTML again to fix the ids. If this parsing fails, we return the initial HTML as the "cleaned XHTML" and send it to the xhtml2fo.xsl transformation that outputs the PDF. That transformation then fails because the HTML is still invalid.
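
      The flow described above can be sketched roughly as follows. This is not the actual XWiki export code: the class CleanFallbackSketch, the method cleanOrFallback and the cleaner parameter (a stand-in for the JTidy call) are assumptions made for illustration, and the id-fixing step is left out. It only shows how a failed re-parse causes the original, still invalid HTML to be passed on.

```java
import java.io.StringReader;
import java.util.function.UnaryOperator;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch of the "clean, re-parse, fall back" flow.
class CleanFallbackSketch {
    /** Returns true if the given string parses as well-formed XML. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // keep parse errors off stderr
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    /**
     * Runs the cleaner (stand-in for JTidy), then re-parses its output.
     * If the re-parse fails, the ORIGINAL html is returned unchanged,
     * which is the behavior that lets invalid HTML reach the XSL step.
     */
    static String cleanOrFallback(String html, UnaryOperator<String> cleaner) {
        String cleaned = cleaner.apply(html);
        return isWellFormed(cleaned) ? cleaned : html;
    }

    public static void main(String[] args) {
        String html = "<p>this is invalid html <br></p>";
        // A cleaner that works: closes the <br> element.
        System.out.println(cleanOrFallback(html, s -> s.replace("<br>", "<br/>")));
        // A cleaner that makes things worse: its output does not parse,
        // so the original invalid HTML comes back.
        System.out.println(cleanOrFallback(html, s -> "<p>a < b</p>"));
    }
}
```

      With this shape, a single cleaning failure anywhere on the page (here, the CDATA section) discards the fixes made elsewhere, which matches the failure seen with the reproduction page.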

      I tested on a local instance with the latest release of JTidy, jtidy-r938, available at http://repo2.maven.org/maven2/net/sf/jtidy/jtidy/r938/, and it works fine: it handles CDATA properly. The fix would therefore be to upgrade JTidy, but only after analyzing how the upgrade affects the other places where JTidy is used in XWiki.

          People

            sdumitriu Sergiu Dumitriu
            lucaa Anca Luca