Details
-
Task
-
Resolution: Fixed
-
Major
-
10.3
-
None
Description
See https://github.com/apache/tika/blob/1.18/CHANGES.txt
* Upgrade Jackson to 2.9.5 (TIKA-2634).
* Add support for brotli (TIKA-2621).
* Upgrade PDFBox to 2.0.9 and include new jbig2-imageio from org.apache.pdfbox (TIKA-2579 and TIKA-2607).
* Support for TIFF images in PDF files (TIKA-2338)
* Detection of full encrypted 7z files (TIKA-2568)
* Various new mimes and typo fixes in tika-mimetypes.xml via Andreas Meier (TIKA-2527).
* Revert to listenForAllRecords=false in ExcelExtractor via Grigoriy Alekseev (TIKA-2590)
* Add workaround to identify TIFFs that might confuse commons-compress's tar detection via Daniel Schmidt (TIKA-2591)
* Ignore non-IANA supported charsets in HTML meta-headers during charset detection in HTMLEncodingDetector via Andreas Meier (TIKA-2592)
* Add detection and parsing of zstd (if user provides com.github.luben:zstd-jni) via Andreas Meier (TIKA-2576)
* Allow for RFC822 detection for files starting with "dkim-" and/or "x-" via Andreas Meier (TIKA-2578 and TIKA-2587)
* Extract xlsx files embedded in OLE objects within PPT and PPTX via Brian McColgan (TIKA-2588).
* Extract files embedded in HTML and javascript inside HTML that are stored in the Data URI scheme (TIKA-2563).
* Extract text from grouped text boxes in PPT (TIKA-2569).
* Extract language metadata item from PDF files via Matt Sheppard (TIKA-2559)
* RFC822 with multipart/mixed, first text element should be treated as the main body of the email, not an attachment (TIKA-2547).
* Swap out com.tdunning:json for com.github.openjson:openjson to avoid jar conflicts (TIKA-2556).
* No longer hardcode HtmlParser for XML files in tika-server (TIKA-2551).
* Require Java 8 (TIKA-2553).
* Add a parser for XPS (TIKA-2524).
* Mime magic for Dolby Digital AC3 and EAC3 files
* Fixed bug where TesseractOCRParser ignores configured ImageMagickPath, and set rotation script to ignore Python warnings (TIKA-2509)
* Upgrade geo-apis to 3.0.1 (TIKA-2535).
* Added local Docker image build using dockerfile-maven-plugin to allow images to be built from source (TIKA-1518).
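
To illustrate what the Data URI item (TIKA-2563) refers to, here is a minimal Java sketch of splitting a `data:` URI into its media type and decoded bytes. It is not Tika's implementation: the class `DataUriSketch`, the holder `DataUri`, and the helper `percentDecode` are hypothetical names introduced for illustration, and the sketch assumes an ASCII payload and handles only the optional `;base64` flag.

```java
import java.io.ByteArrayOutputStream;
import java.util.Base64;

public class DataUriSketch {

    /** Hypothetical holder for the two parts of a decoded data URI. */
    public static final class DataUri {
        public final String mediaType;
        public final byte[] bytes;

        DataUri(String mediaType, byte[] bytes) {
            this.mediaType = mediaType;
            this.bytes = bytes;
        }
    }

    /** Splits "data:[<mediatype>][;base64],<data>" and decodes the payload. */
    public static DataUri parse(String uri) {
        if (!uri.startsWith("data:")) {
            throw new IllegalArgumentException("not a data URI");
        }
        int comma = uri.indexOf(',');
        if (comma < 0) {
            throw new IllegalArgumentException("data URI has no comma separator");
        }
        String header = uri.substring("data:".length(), comma);
        String payload = uri.substring(comma + 1);

        boolean base64 = header.endsWith(";base64");
        String mediaType = base64
                ? header.substring(0, header.length() - ";base64".length())
                : header;
        if (mediaType.isEmpty()) {
            // Default media type when the URI omits one (RFC 2397).
            mediaType = "text/plain;charset=US-ASCII";
        }

        byte[] bytes = base64
                ? Base64.getDecoder().decode(payload)
                : percentDecode(payload);
        return new DataUri(mediaType, bytes);
    }

    /** Decodes %XX escapes; assumes the remaining characters are ASCII. */
    static byte[] percentDecode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '%' && i + 2 < s.length()) {
                out.write(Integer.parseInt(s.substring(i + 1, i + 3), 16));
                i += 2; // skip the two hex digits just consumed
            } else {
                out.write(c);
            }
        }
        return out.toByteArray();
    }
}
```

Calling `DataUriSketch.parse("data:text/plain;base64,SGVsbG8=")` yields the media type `text/plain` and the bytes of `Hello`. In a real extraction pipeline those bytes would then be handed to a parser as an embedded document.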