• Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 10.3
    • Fix Version/s: 10.4-rc-1
    • Component/s: Dependency Upgrades



         * Upgrade Jackson to 2.9.5 (TIKA-2634).
         * Add support for brotli (TIKA-2621).
         * Upgrade PDFBox to 2.0.9 and include new jbig2-imageio
           from org.apache.pdfbox (TIKA-2579 and TIKA-2607).
         * Support for TIFF images in PDF files (TIKA-2338)
         * Detection of fully encrypted 7z files (TIKA-2568)
         * Various new mimes and typo fixes in tika-mimetypes.xml
           via Andreas Meier (TIKA-2527).
         * Revert to listenForAllRecords=false in ExcelExtractor
           via Grigoriy Alekseev (TIKA-2590)
         * Add workaround to identify TIFFs that might confuse
           commons-compress's tar detection via Daniel Schmidt
         * Ignore non-IANA supported charsets in HTML meta-headers
           during charset detection in HTMLEncodingDetector
           via Andreas Meier (TIKA-2592)
         * Add detection and parsing of zstd (if user provides
           com.github.luben:zstd-jni) via Andreas Meier (TIKA-2576)
         * Allow for RFC822 detection for files starting with "dkim-"
           and/or "x-" via Andreas Meier (TIKA-2578 and TIKA-2587)
         * Extract xlsx files embedded in OLE objects within PPT and PPTX
           via Brian McColgan (TIKA-2588).
         * Extract files embedded in HTML and JavaScript inside HTML
           that are stored in the Data URI scheme (TIKA-2563).
         * Extract text from grouped text boxes in PPT (TIKA-2569).
         * Extract language metadata item from PDF files via Matt Sheppard (TIKA-2559)
         * RFC822 with multipart/mixed, first text element should be treated
           as the main body of the email, not an attachment (TIKA-2547).
         * Swap out com.tdunning:json for com.github.openjson:openjson to avoid
           jar conflicts (TIKA-2556).
         * No longer hardcode HtmlParser for XML files in tika-server (TIKA-2551).
         * Require Java 8 (TIKA-2553).
         * Add a parser for XPS (TIKA-2524).
         * Mime magic for Dolby Digital AC3 and EAC3 files
         * Fixed bug where TesseractOCRParser ignores configured ImageMagickPath,
           and set rotation script to ignore Python warnings (TIKA-2509)
         * Upgrade geo-apis to 3.0.1 (TIKA-2535).
         * Added local Docker image build using dockerfile-maven-plugin to allow
           images to be built from source (TIKA-1518).
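
To illustrate what the Data URI item (TIKA-2563) refers to: a data URI carries an embedded file inline, as "data:<media type>;base64,<payload>". The sketch below is not Tika's implementation. It is a minimal, standalone illustration of decoding one base64 data URI into a media type and raw bytes. The class name DataUriSketch and its Payload type are hypothetical, and non-base64 (percent-encoded) payloads are deliberately left unsupported.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, for illustration only; not part of Tika.
public class DataUriSketch {

    /** Media type and decoded bytes of one data URI. */
    public static final class Payload {
        public final String mediaType;
        public final byte[] bytes;

        Payload(String mediaType, byte[] bytes) {
            this.mediaType = mediaType;
            this.bytes = bytes;
        }
    }

    /** Decodes a base64 data URI such as "data:text/plain;base64,SGVsbG8=". */
    public static Payload decode(String uri) {
        if (!uri.startsWith("data:")) {
            throw new IllegalArgumentException("Not a data URI");
        }
        int comma = uri.indexOf(',');
        if (comma < 0) {
            throw new IllegalArgumentException("Missing ',' separator");
        }
        // Everything between "data:" and the first comma describes the payload.
        String header = uri.substring("data:".length(), comma);
        String suffix = ";base64";
        if (!header.endsWith(suffix)) {
            // Assumption of this sketch: only base64 payloads are handled.
            throw new UnsupportedOperationException("Only base64 payloads are handled");
        }
        String mediaType = header.substring(0, header.length() - suffix.length());
        if (mediaType.isEmpty()) {
            // Default media type defined by RFC 2397.
            mediaType = "text/plain;charset=US-ASCII";
        }
        byte[] bytes = Base64.getDecoder().decode(uri.substring(comma + 1));
        return new Payload(mediaType, bytes);
    }

    public static void main(String[] args) {
        Payload p = decode("data:text/plain;base64,SGVsbG8=");
        System.out.println(p.mediaType + " -> "
                + new String(p.bytes, StandardCharsets.UTF_8));
    }
}
```

An extractor would typically hand the decoded bytes and media type to whatever parser handles that type; the sketch stops at decoding.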




            • Assignee:
              tmortagne Thomas Mortagne


              • Created: