XWiki Platform / XWIKI-13932

Bad float in PDF might cause PDF attachment to not be indexed by SOLR

    Details

    • Difficulty:
      Unknown
    • Documentation:
      N/A
    • Documentation in Release Notes:
      N/A

      Description

      The error generated by the indexer is:

      Exception in thread "main" java.io.IOException: Error expected floating point number actual='0.-262'
      	at org.apache.pdfbox.cos.COSFloat.<init>(COSFloat.java:81)
      	at org.apache.pdfbox.cos.COSNumber.get(COSNumber.java:115)
      	at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:939)
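
The token '0.-262' puts the minus sign after the decimal point, and no standard number grammar accepts that. The minimal illustration below does not call PDFBox, and the class StrictFloatCheck is hypothetical. It only shows that the JDK's own BigDecimal parser rejects the token with a NumberFormatException, which is why a strict float parser fails on such a PDF:

```java
import java.math.BigDecimal;

public class StrictFloatCheck {
    // Returns true if the JDK's strict BigDecimal parser accepts the token.
    static boolean isStrictlyParseable(String token) {
        try {
            new BigDecimal(token);
            return true;
        } catch (NumberFormatException e) {
            // Thrown for tokens such as "0.-262" (sign after the decimal point).
            return false;
        }
    }

    public static void main(String[] args) {
        for (String token : new String[] {"0.262", "-0.262", "0.-262"}) {
            System.out.println(token + " -> " + isStrictlyParseable(token));
        }
    }
}
```
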
      

      This issue is PDFBOX-3500, which is fixed in pdfbox-2.0.4. It is a special case of PDFBOX-3369 that was not covered by the fix in pdfbox-2.0.2. Currently, tika-1.14, used in XWiki 9.x, depends on pdfbox-2.0.3, but overriding that dependency with pdfbox-2.0.4 seems to work perfectly. XWiki 8.2.2 still uses tika-1.13, which depends on pdfbox-2.0.1 and is therefore fully affected by PDFBOX-3369. Again, overriding the dependency with pdfbox-2.0.4 seems to work perfectly.
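
Because PDFBOX-3500 is fixed in pdfbox-2.0.4, that version accepts this malformed token instead of failing. The sketch below shows one way such lenient parsing can work. It is not PDFBox's actual code, and the names LenientPdfFloat, parse and MISPLACED_MINUS are hypothetical. It assumes that a token such as '0.-262' (or '0.00-35095424') is a negative number whose minus sign was misplaced, so '0.-262' is read as -0.262:

```java
import java.math.BigDecimal;
import java.util.regex.Pattern;

public class LenientPdfFloat {
    // Hypothetical pattern: "0." followed by optional zeros, then a misplaced '-' and digits,
    // e.g. "0.-262" or "0.00-35095424".
    private static final Pattern MISPLACED_MINUS = Pattern.compile("0\\.0*-\\d+");

    // Parses a PDF numeric token, repairing a misplaced minus sign only as a fallback.
    static float parse(String token) {
        try {
            // Well-formed numbers take the strict path.
            return new BigDecimal(token).floatValue();
        } catch (NumberFormatException e) {
            if (MISPLACED_MINUS.matcher(token).matches()) {
                // Assumption: "0.-262" means -0.262, so drop the '-' and negate the result.
                String repaired = "-" + token.replaceFirst("-", "");
                return new BigDecimal(repaired).floatValue();
            }
            // Any other malformed token is still rejected.
            throw new IllegalArgumentException(
                "Error expected floating point number actual='" + token + "'", e);
        }
    }

    public static void main(String[] args) {
        for (String token : new String[] {"1.5", "0.-262", "0.00-35095424"}) {
            System.out.println(token + " -> " + parse(token));
        }
    }
}
```

Falling back only when strict parsing fails keeps well-formed numbers on the normal path and limits the repair to one narrowly matched pattern. A token such as '1.-5' is still rejected.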

            People

            • Assignee:
              Thomas Mortagne (tmortagne)
              Reporter:
              Denis Gervalle (softec)
            • Votes:
              0
              Watchers:
              2
