Details
Task
Resolution: Fixed
Major
6.2-milestone-2
None
Description
http://www.apache.org/dist/tika/CHANGES-1.6.txt
Release 1.6 - 08/31/2014
* Parse output should indicate which Parser was actually used
(TIKA-674).
* Use the forbidden-apis Maven plugin to check for unsafe Java
operations (TIKA-1387).
* Created an ExternalTranslator class to interface with command
line Translators (TIKA-1385).
* Created a MosesTranslator as a subclass of ExternalTranslator
that calls the Moses Decoder machine translation program (TIKA-1385).
* Created the tika-example module. It will have examples of how to
use the main Tika interfaces (TIKA-1390).
* Upgraded to Commons Compress 1.8.1 (TIKA-1275).
* Upgraded to POI 3.11-beta1 (TIKA-1380).
* Tika now extracts SDTCell content from tables in .docx files (TIKA-1317).
* Tika now supports detection of the Persian/Farsi language
(TIKA-1337).
* The Tika Detector interface is now exposed through the JAX-RS
server (TIKA-1336).
* Tika now has support for parsing binary Matlab files as part of
our larger effort to increase the number of scientific data formats
supported. (TIKA-1327)
* The Tika Server URLs for the unpacker resources have been changed,
to bring them under a common prefix (TIKA-1324). The mapping is
/unpacker/{id} -> /unpack/{id}
/all/{id} -> /unpack/all/{id}
* Added a module and a core Tika interface for translating text between
languages, and added a default implementation that calls Microsoft's
translate service (TIKA-1319)
* Added a Translator implementation that calls Lingo24's Premium
Machine Translation API (TIKA-1381)
* Made RTFParser's list handling slightly more robust against corrupt
list metadata (TIKA-1305)
* Fixed bug in CLI json output (TIKA-1291/TIKA-1310)
* Added ability to turn off image extraction from PDFs (TIKA-1294).
Users must now turn on image extraction explicitly via the PDFParserConfig.
* Upgraded to PDFBox 1.8.6 (TIKA-1290, TIKA-1231, TIKA-1233, TIKA-1352)
* Zip Container Detection for DWFX and XPS formats, which are OPC
based (TIKA-1204, TIKA-1221)
* Added a user-facing welcome page to the Tika Server, which
explains what the server is and gives a very brief summary of what is
available (TIKA-1269)
* Added Tika Server endpoints to list the available mime types,
Parsers and Detectors, similar to the --list-<foo> methods on
the Tika CLI App (TIKA-1270)
* Improvements to NetCDF and HDF parsing to mimic the output of
ncdump and extract text dimensions and spatial and variable
information from scientific data files (TIKA-1265)
* Extract attachments from RTF files (TIKA-1010)
* Support Outlook Personal Folders File Format *.pst (TIKA-623)
* Added mime entries for additional Ogg based formats (TIKA-1259)
* Updated the Ogg Vorbis plugin to v0.4, which adds detection for a wider
range of Ogg formats, and parsers for more Ogg Audio ones (TIKA-1113)
* PDF: Images in PDF documents can now be extracted as embedded resources.
(TIKA-1268)
* Fixed RuntimeException thrown for certain Word Documents (TIKA-1251).
* CLI: TikaCLI now has another option: --list-parser-details-apt, which outputs
the list of supported parsers in APT format. This is used to generate the list
on the formats page (TIKA-411).
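
The unpacker URL change listed above (TIKA-1324) means that clients written against an earlier Tika Server need their request paths updated. The following is a minimal sketch of that rewrite, based only on the two mappings given in the changelog. The class name UnpackerPathMigration and the method migrate are hypothetical helpers for illustration and are not part of Tika; the sketch also assumes the path is passed without host or query string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tika: rewrites pre-1.6 unpacker paths
// to the common /unpack prefix described in TIKA-1324.
final class UnpackerPathMigration {

    // Old form: /unpacker/{id}
    private static final Pattern OLD_UNPACKER = Pattern.compile("^/unpacker/(.+)$");
    // Old form: /all/{id}
    private static final Pattern OLD_ALL = Pattern.compile("^/all/(.+)$");

    private UnpackerPathMigration() {
    }

    // Returns the 1.6-style path, or the input unchanged if it matches
    // neither of the two old forms.
    static String migrate(String path) {
        Matcher unpacker = OLD_UNPACKER.matcher(path);
        if (unpacker.matches()) {
            return "/unpack/" + unpacker.group(1);
        }
        Matcher all = OLD_ALL.matcher(path);
        if (all.matches()) {
            return "/unpack/all/" + all.group(1);
        }
        return path;
    }

    public static void main(String[] args) {
        System.out.println(migrate("/unpacker/42"));
        System.out.println(migrate("/all/42"));
        System.out.println(migrate("/unpack/42"));
    }
}
```

Paths that already use the new prefix pass through unchanged, so the helper can be applied to a mixed set of old and new client configurations.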