XWiki Platform / XWIKI-7871

PDF export fails on some pages because of JTidy messing up CDATA sections

    Details

    • Difficulty:
      Unknown
    • Documentation:
      N/A
    • Documentation in Release Notes:
      N/A

      Description

      To reproduce, put this code in a wiki page:

      = Some plain text =
      
      test
      
      == Invalid HTML ==
      
      {{html clean="false"}}
      this is invalid html <br>
      {{/html}}
      
      == CDATA section ==
      
      {{html clean="false"}}
      <script type="text/javascript">
      //<![CDATA[
      var test = 245;
      var bool1 = test < 200;
      //]]>
      </script>
      {{/html}}
      

      Exporting this page to PDF fails because of the invalid HTML.

      Cleaning up invalid HTML during PDF export is JTidy's job, and it usually does it well. In this case, however, JTidy makes the HTML worse, because it does not properly ignore the < inside the CDATA section.
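The issue does not say exactly what JTidy does to the CDATA markers. As a minimal sketch, assuming the simplest kind of damage (the markers are lost), the class below uses only the JDK's XML parser to show why the < matters: the script from the reproduction parses as well-formed XML while it is wrapped in a CDATA section, and stops parsing once the markers are gone. The class name and the WITHOUT_CDATA string are hypothetical illustrations, not XWiki or JTidy code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical demo class: checks XML well-formedness with the JDK parser only.
class CdataWellFormedness {

    // The script from the reproduction, with its CDATA markers intact.
    static final String WITH_CDATA =
        "<div><script type=\"text/javascript\">//<![CDATA[\n"
        + "var test = 245;\nvar bool1 = test < 200;\n//]]></script></div>";

    // Assumed damaged output: the same script with the CDATA markers dropped.
    static final String WITHOUT_CDATA =
        "<div><script type=\"text/javascript\">\n"
        + "var test = 245;\nvar bool1 = test < 200;\n</script></div>";

    // Returns true if the string parses as well-formed XML, false otherwise.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Throw on errors instead of letting the default handler print to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            // A bare "<" in character data is a fatal well-formedness error.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("with CDATA:    " + isWellFormed(WITH_CDATA));
        System.out.println("without CDATA: " + isWellFormed(WITHOUT_CDATA));
    }
}
```

The // prefixes keep the CDATA markers harmless for a browser running the script as JavaScript, while the CDATA section itself is what keeps test < 200 harmless for an XML parser.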

      In the PDF export code, after running JTidy, we parse the resulting XHTML again to fix the ids. If this parsing fails, we return the initial HTML as the "cleaned" XHTML, which is then passed to the xhtml2fo.xsl transformation that produces the PDF. That transformation fails because the HTML is invalid.
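The fallback described above can be sketched as follows. This is a hypothetical reconstruction using only JDK APIs, not the actual XWiki export code: the class and method names are illustrative, and fixIds is an empty placeholder because the issue does not say how ids are fixed. The point it shows is that a parse failure on JTidy's output is swallowed, and the original, possibly invalid HTML is what reaches the XSL-FO step.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical sketch of the post-JTidy step; names are illustrative only.
class PdfExportCleanupSketch {

    static String cleanForPdf(String initialHtml, String tidiedXhtml) {
        try {
            Document doc = parseStrict(tidiedXhtml);
            fixIds(doc);
            return serialize(doc);
        } catch (SAXException e) {
            // The fallback from the issue: the original HTML is returned as the
            // "cleaned" XHTML, so the failure only surfaces later, in xhtml2fo.xsl.
            return initialHtml;
        }
    }

    // Placeholder: the issue does not describe the id fixing, so this does nothing.
    static void fixIds(Document doc) {
    }

    // Parses the string as XML, throwing SAXException on any well-formedness error.
    static Document parseStrict(String xml) throws SAXException {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Serializes the DOM back to a string without an XML declaration.
    static String serialize(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the fallback returns the input unchanged, a JTidy bug like the CDATA one shows up as an XSL-FO transformation error rather than at the step that actually failed.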

      I tested on a local instance with the latest JTidy release, jtidy-r938 (available at http://repo2.maven.org/maven2/net/sf/jtidy/jtidy/r938/), and it works fine: it handles CDATA sections properly. The fix would therefore be to upgrade JTidy, but only after analyzing how the upgrade affects the other places where JTidy is used in XWiki.

            People

            • Assignee:
              sdumitriu Sergiu Dumitriu
              Reporter:
              lucaa Anca Luca
            • Votes:
              0
              Watchers:
              0