Details

Type: Task
Resolution: Fixed
Priority: Major
Fix Version/s: 11.6-rc-1
Affects Version/s: 11.4
Component/s: Dependency Upgrades
Labels: None
Difficulty: Unknown
Documentation: N/A
Documentation in Release Notes: https://www.xwiki.org/xwiki/bin/view/ReleaseNotes/Data/XWiki/11.6RC1/#HUpgrades

Description

See https://dist.apache.org/repos/dist/release/tika/CHANGES-1.21.txt

   * Add optional AUTO mode to OCR'ing of PDFs.  If tesseract is installed
     and on the path, and this option is selected programmatically
     or via TikaConfig(), the PDFParser will use heuristics to decide
     whether or not to run OCR per page on PDFs. (TIKA-2749)

   * The ZipContainerDetector's default behavior was changed to run
     streaming detection up to its markLimit.  Users can get the
     legacy behavior (spool-to-file/rely-on-underlying-file-in-TikaInputStream)
     by setting markLimit=-1. The POIFSContainerDetector requires an underlying file;
     it will try to spool the file to disk; if the file's length is > markLimit,
     it will not attempt detection; set markLimit to -1 for legacy behavior (TIKA-2849).

   * Upgrade PDFBox to 2.0.14 (TIKA-2834).

   * Add CSV detection and replace TXTParser with TextAndCSVParser;
     users can turn off CSV detection by excluding the TextAndCSVParser
     and adding back the TXTParser via tika-config (TIKA-2833).

   * Add a CSVParser.  CSV detection is currently based solely on filename
     and/or information conveyed via Metadata (TIKA-2826).

   * General upgrades: asm, bouncycastle, commons-codec, commons-lang3, cxf,
     guava, h2, httpcomponents, jackcess, junrar, Lucene, mime4j, opennlp, parso,
     sqlite-jdbc (provided), zstd-jni (provided) (TIKA-2824)

   * Bundle xerces2 with tika-parsers (TIKA-2802).

   * Upgrade jaxb to 2.3.2 (TIKA-2819).

   * Upgrade jackson to 2.9.8 (TIKA-2717).

   * Update tika-eval's common tokens lists (TIKA-2822).

   * Handle bad tags in tika-eval more robustly (TIKA-2810).

   * Add reports for tags in tika-eval (TIKA-2809).

   * Extract text from SDT element within textboxes in .docx files (TIKA-2807).

   * Try to handle truncated OOXML files more robustly (TIKA-2765).
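
For the markLimit change (TIKA-2849), the sketch below illustrates what "streaming detection up to its markLimit" means in terms of the plain `java.io` mark/reset contract. It is not Tika's implementation: the class `MarkLimitSketch` and the helpers `peek` and `startsWithZipMagic` are hypothetical names, the check only looks at the ZIP local-file-header magic bytes, and the legacy `markLimit=-1` spool-to-file path is not modeled.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical illustration only; not part of Tika.
class MarkLimitSketch {

    /**
     * Reads at most markLimit bytes from the stream, then resets it so a
     * later parser sees the stream from the start again.
     */
    static byte[] peek(InputStream in, int markLimit) throws IOException {
        if (markLimit < 0 || !in.markSupported()) {
            throw new IllegalArgumentException(
                "needs a mark-supporting stream and markLimit >= 0");
        }
        in.mark(markLimit);
        try {
            byte[] buf = new byte[markLimit];
            int total = 0;
            // read() may return fewer bytes than requested, so loop.
            while (total < markLimit) {
                int n = in.read(buf, total, markLimit - total);
                if (n < 0) {
                    break; // end of stream before reaching markLimit
                }
                total += n;
            }
            return Arrays.copyOf(buf, total);
        } finally {
            in.reset(); // valid because no more than markLimit bytes were read
        }
    }

    /** True if the peeked bytes begin with the ZIP magic "PK\003\004". */
    static boolean startsWithZipMagic(byte[] head) {
        return head.length >= 4
            && head[0] == 'P' && head[1] == 'K'
            && head[2] == 3 && head[3] == 4;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {'P', 'K', 3, 4, 0, 0, 0, 0};
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(data));
        System.out.println(startsWithZipMagic(peek(in, 4)));
    }
}
```

The point of the bound is that detection never consumes more than markLimit bytes, so the caller can still hand the same stream to a parser afterwards. A detector that needs more than that, as the changelog notes for POIFSContainerDetector, has to fall back to a file on disk or skip detection.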

Issue Links

depends on: XCOMMONS-1155 Upgrade to Guava 28.0-jre (Closed)

People

Assignee: Thomas Mortagne

Reporter: Thomas Mortagne

Votes: 0

Watchers: 1

Dates

Created: 03/Jun/19 16:32

Updated: 19/Jul/19 15:40

Resolved: 02/Jul/19 15:55