  1. XWiki Platform
  2. XWIKI-13932

Bad float in PDF might cause PDF attachment to not be indexed by SOLR

Details

    • Unknown
    • N/A
    • N/A

    Description

      The error generated by the indexer is:

      Exception in thread "main" java.io.IOException: Error expected floating point number actual='0.-262'
      	at org.apache.pdfbox.cos.COSFloat.<init>(COSFloat.java:81)
      	at org.apache.pdfbox.cos.COSNumber.get(COSNumber.java:115)
      	at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:939)
      

      This issue is PDFBOX-3500, which is fixed in pdfbox-2.0.4. It is a special case of PDFBOX-3369 that wasn't fixed in pdfbox-2.0.2. Currently, tika-1.14, used in 9.x, depends on pdfbox-2.0.3, but replacing that dependency with pdfbox-2.0.4 seems to work perfectly. For 8.2.2, tika is still at version 1.13, which depends on pdfbox-2.0.1 and is fully affected by PDFBOX-3369. Again, replacing the dependency with pdfbox-2.0.4 seems to work perfectly.
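
      To illustrate why a token such as '0.-262' is rejected, here is a minimal Java sketch. It is not the PDFBox code: the class name BadFloatSketch and the methods parseStrict and parseLenient are hypothetical, and the lenient repair rests on the assumption that the intended value is the negative number obtained by moving the misplaced minus sign to the front.

```java
import java.math.BigDecimal;
import java.util.regex.Pattern;

// Hypothetical illustration only; names and repair rule are assumptions, not PDFBox API.
final class BadFloatSketch {
    // Matches "0." followed by optional zeros, a misplaced '-', then digits,
    // e.g. "0.-262" or "0.00-5".
    private static final Pattern MISPLACED_MINUS = Pattern.compile("^0\\.0*-\\d+$");

    // Strict parsing: BigDecimal only accepts a sign at the start (or in the exponent),
    // so "0.-262" raises NumberFormatException.
    static BigDecimal parseStrict(String token) {
        return new BigDecimal(token);
    }

    // Lenient parsing: on failure, repair the misplaced-minus pattern (assumed to mean
    // a negative value) and rethrow for anything else.
    static BigDecimal parseLenient(String token) {
        try {
            return new BigDecimal(token);
        } catch (NumberFormatException e) {
            if (MISPLACED_MINUS.matcher(token).matches()) {
                return new BigDecimal("-" + token.replaceFirst("-", ""));
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        try {
            parseStrict("0.-262");
        } catch (NumberFormatException e) {
            System.out.println("strict parse rejected 0.-262");
        }
        System.out.println(parseLenient("0.-262"));
    }
}
```

      A strict parser fails on the whole document as soon as it meets one such token, which is consistent with the attachment not being indexed at all. A lenient parser trades strictness for the ability to index slightly malformed files.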

          People

            tmortagne Thomas Mortagne
            softec Denis Gervalle
