XWiki Platform / XWIKI-15236

Upgrade to Tika 1.18



    • Type: Task
    • Resolution: Fixed
    • Priority: Major
    • Fix Version: 10.4-rc-1
    • Affects Version: 10.3
    • Component: Dependency Upgrades
    • Labels: None
    • Unknown
    • N/A


      See https://github.com/apache/tika/blob/1.18/CHANGES.txt

         * Upgrade Jackson to 2.9.5 (TIKA-2634).
         * Add support for brotli (TIKA-2621).
         * Upgrade PDFBox to 2.0.9 and include new jbig2-imageio
           from org.apache.pdfbox (TIKA-2579 and TIKA-2607).
         * Support for TIFF images in PDF files (TIKA-2338)
         * Detection of full encrypted 7z files (TIKA-2568)
         * Various new mimes and typo fixes in tika-mimetypes.xml
           via Andreas Meier (TIKA-2527).
         * Revert to listenForAllRecords=false in ExcelExtractor
           via Grigoriy Alekseev (TIKA-2590)
         * Add workaround to identify TIFFs that might confuse
           commons-compress's tar detection via Daniel Schmidt
         * Ignore non-IANA supported charsets in HTML meta-headers
           during charset detection in HTMLEncodingDetector
           via Andreas Meier (TIKA-2592)
         * Add detection and parsing of zstd (if user provides
           com.github.luben:zstd-jni) via Andreas Meier (TIKA-2576)
         * Allow for RFC822 detection for files starting with "dkim-"
           and/or "x-" via Andreas Meier (TIKA-2578 and TIKA-2587)
         * Extract xlsx files embedded in OLE objects within PPT and PPTX
           via Brian McColgan (TIKA-2588).
         * Extract files embedded in HTML and javascript inside HTML
           that are stored in the Data URI scheme (TIKA-2563).
         * Extract text from grouped text boxes in PPT (TIKA-2569).
         * Extract language metadata item from PDF files via Matt Sheppard (TIKA-2559)
         * RFC822 with multipart/mixed, first text element should be treated
           as the main body of the email, not an attachment (TIKA-2547).
         * Swap out com.tdunning:json for com.github.openjson:openjson to avoid
           jar conflicts (TIKA-2556).
         * No longer hardcode HtmlParser for XML files in tika-server (TIKA-2551).
         * Require Java 8 (TIKA-2553).
         * Add a parser for XPS (TIKA-2524).
         * Mime magic for Dolby Digital AC3 and EAC3 files
         * Fixed bug where TesseractOCRParser ignores configured ImageMagickPath,
           and set rotation script to ignore Python warnings (TIKA-2509)
         * Upgrade geo-apis to 3.0.1 (TIKA-2535).
         * Added local Docker image build using dockerfile-maven-plugin to allow
           images to be built from source (TIKA-1518).
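
      One of the entries above (TIKA-2563) concerns files embedded in HTML through the Data URI scheme. To illustrate what such an embedded payload looks like, here is a minimal Python sketch that splits a data URI into its media type and decoded bytes. It is not Tika's implementation; the function name `decode_data_uri` is hypothetical, and the handling is deliberately simplified (for example, parameters other than a trailing `;base64` are left in the media type string).

```python
import base64
from urllib.parse import unquote_to_bytes


def decode_data_uri(uri):
    """Split a data URI into (media_type, payload_bytes).

    Illustrative helper only; the name and behavior are assumptions,
    not part of Tika or XWiki.
    """
    if not uri.startswith("data:"):
        raise ValueError("not a data URI")

    # Everything before the first comma is the header, the rest is the data.
    header, sep, data = uri[len("data:"):].partition(",")
    if not sep:
        raise ValueError("data URI has no comma separator")

    # A trailing ";base64" marks a base64-encoded payload.
    is_base64 = header.endswith(";base64")
    if is_base64:
        header = header[: -len(";base64")]

    # RFC 2397 default when the media type is omitted.
    media_type = header or "text/plain;charset=US-ASCII"

    # Non-base64 payloads are percent-encoded.
    payload = base64.b64decode(data) if is_base64 else unquote_to_bytes(data)
    return media_type, payload
```

      A caller could pass the `src` attribute of an `<img>` tag, such as `data:text/plain;base64,SGVsbG8=`, and receive the media type together with the raw bytes, which is the kind of embedded resource the changelog entry describes extracting.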




            Assignee: Thomas Mortagne (tmortagne)
            Reporter: Thomas Mortagne (tmortagne)