Details

Type: Task
Resolution: Fixed
Priority: Major
Fix Version/s: 10.4-rc-1
Affects Version/s: 10.3
Component/s: Dependency Upgrades
Labels: None
Difficulty: Unknown
Documentation: N/A
Documentation in Release Notes: http://www.xwiki.org/xwiki/bin/view/ReleaseNotes/Data/XWiki/10.4RC1/#HUpgrades
Description

See https://github.com/apache/tika/blob/1.18/CHANGES.txt

   * Upgrade Jackson to 2.9.5 (TIKA-2634).

   * Add support for brotli (TIKA-2621).

   * Upgrade PDFBox to 2.0.9 and include new jbig2-imageio
     from org.apache.pdfbox (TIKA-2579 and TIKA-2607).

   * Support for TIFF images in PDF files (TIKA-2338)
   
   * Detection of fully encrypted 7z files (TIKA-2568)

   * Various new MIME types and typo fixes in tika-mimetypes.xml
     via Andreas Meier (TIKA-2527).

   * Revert to listenForAllRecords=false in ExcelExtractor
     via Grigoriy Alekseev (TIKA-2590)

   * Add workaround to identify TIFFs that might confuse
     commons-compress's tar detection via Daniel Schmidt
     (TIKA-2591)

   * Ignore non-IANA supported charsets in HTML meta-headers
     during charset detection in HTMLEncodingDetector
     via Andreas Meier (TIKA-2592)

   * Add detection and parsing of zstd (if user provides
     com.github.luben:zstd-jni) via Andreas Meier (TIKA-2576)

   * Allow for RFC822 detection for files starting with "dkim-"
     and/or "x-" via Andreas Meier (TIKA-2578 and TIKA-2587)

   * Extract xlsx files embedded in OLE objects within PPT and PPTX
     via Brian McColgan (TIKA-2588).

   * Extract files embedded in HTML and javascript inside HTML
     that are stored in the Data URI scheme (TIKA-2563).

   * Extract text from grouped text boxes in PPT (TIKA-2569).

   * Extract language metadata item from PDF files via Matt Sheppard (TIKA-2559)

   * For RFC822 messages with multipart/mixed content, treat the first text
     element as the main body of the email, not as an attachment (TIKA-2547).

   * Swap out com.tdunning:json for com.github.openjson:openjson to avoid
     jar conflicts (TIKA-2556).

   * No longer hardcode HtmlParser for XML files in tika-server (TIKA-2551).

   * Require Java 8 (TIKA-2553).

   * Add a parser for XPS (TIKA-2524).

   * Mime magic for Dolby Digital AC3 and EAC3 files

   * Fixed bug where TesseractOCRParser ignores configured ImageMagickPath,
     and set rotation script to ignore Python warnings (TIKA-2509)

   * Upgrade geo-apis to 3.0.1 (TIKA-2535).

   * Added local Docker image build using dockerfile-maven-plugin to allow
     images to be built from source (TIKA-1518).
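TIKA-2563 concerns files embedded in HTML as Data URIs (RFC 2397), where the whole file is inlined as data:[<mediatype>][;base64],<data>. The Java sketch below only illustrates what decoding such a URI involves; it is not Tika's implementation, and the class DataUriDecoder and its methods are hypothetical names that rely on the JDK alone.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Locale;

/**
 * Hypothetical helper (not part of Tika): decodes an RFC 2397 URI of the form
 * data:[<mediatype>][;base64],<data>
 */
public final class DataUriDecoder {

    private DataUriDecoder() {
    }

    /** Returns the part between "data:" and the first comma, e.g. "image/png;base64". */
    private static String header(String uri) {
        if (!uri.regionMatches(true, 0, "data:", 0, 5)) {
            throw new IllegalArgumentException("Not a data URI: " + uri);
        }
        int comma = uri.indexOf(',');
        if (comma < 0) {
            throw new IllegalArgumentException("Missing ',' in data URI");
        }
        return uri.substring(5, comma);
    }

    /** Media type declared in the URI; falls back to text/plain (simplified RFC 2397 default). */
    public static String mediaType(String uri) {
        String header = header(uri);
        int semi = header.indexOf(';');
        String type = semi >= 0 ? header.substring(0, semi) : header;
        return type.isEmpty() ? "text/plain" : type;
    }

    /** Decodes the payload: base64 if ";base64" is declared, percent-encoding otherwise. */
    public static byte[] decode(String uri) {
        String header = header(uri);
        String payload = uri.substring(uri.indexOf(',') + 1);
        if (header.toLowerCase(Locale.ROOT).endsWith(";base64")) {
            return Base64.getDecoder().decode(payload);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < payload.length(); i++) {
            char c = payload.charAt(i);
            if (c == '%' && i + 2 < payload.length()) {
                // "%XX" encodes one byte as two hex digits
                out.write(Integer.parseInt(payload.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                out.write(c); // assumes the unencoded payload characters are plain ASCII
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String uri = "data:text/plain;base64,SGVsbG8=";
        System.out.println(mediaType(uri));                                     // text/plain
        System.out.println(new String(decode(uri), StandardCharsets.US_ASCII)); // Hello
    }
}
```

Once decoded, the bytes and the declared media type are what a parser would hand on for detection and extraction of the embedded document.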

People

Assignee: Thomas Mortagne

Reporter: Thomas Mortagne

Votes: 0

Watchers: 1

Dates

Created: 30/Apr/18 18:45

Updated: 11/May/18 15:44

Resolved: 30/Apr/18 18:48