Details
Task
Resolution: Fixed
Major
6.2-milestone-2
None
Description
http://www.apache.org/dist/tika/CHANGES-1.6.txt
Release 1.6 - 08/31/2014

* Parse output should indicate which Parser was actually used (TIKA-674).
* Use the forbidden-apis Maven plugin to check for unsafe Java operations (TIKA-1387).
* Created an ExternalTranslator class to interface with command line Translators (TIKA-1385).
* Created a MosesTranslator as a subclass of ExternalTranslator that calls the Moses Decoder machine translation program (TIKA-1385).
* Created the tika-example module. It will have examples of how to use the main Tika interfaces (TIKA-1390).
* Upgraded to Commons Compress 1.8.1 (TIKA-1275).
* Upgraded to POI 3.11-beta1 (TIKA-1380).
* Tika now extracts SDTCell content from tables in .docx files (TIKA-1317).
* Tika now supports detection of the Persian/Farsi language (TIKA-1337).
* The Tika Detector interface is now exposed through the JAX-RS server (TIKA-1336).
* Tika now has support for parsing binary Matlab files as part of our larger effort to increase the number of scientific data formats supported (TIKA-1327).
* The Tika Server URLs for the unpacker resources have been changed, to bring them under a common prefix (TIKA-1324). The mapping is
      /unpacker/{id} -> /unpack/{id}
      /all/{id}      -> /unpack/all/{id}
* Added module and core Tika interface for translating text between languages and added a default implementation that calls Microsoft's translate service (TIKA-1319).
* Added a Translator implementation that calls Lingo24's Premium Machine Translation API (TIKA-1381).
* Made RTFParser's list handling slightly more robust against corrupt list metadata (TIKA-1305).
* Fixed bug in CLI json output (TIKA-1291/TIKA-1310).
* Added ability to turn off image extraction from PDFs (TIKA-1294). Users must now turn on this capability via the PDFParserConfig.
* Upgrade to PDFBox 1.8.6 (TIKA-1290, TIKA-1231, TIKA-1233, TIKA-1352).
* Zip Container Detection for DWFX and XPS formats, which are OPC-based (TIKA-1204, TIKA-1221).
* Added a user-facing welcome page to the Tika Server, which says what it is, and a very brief summary of what is available (TIKA-1269).
* Added Tika Server endpoints to list the available mime types, Parsers and Detectors, similar to the --list-<foo> methods on the Tika CLI App (TIKA-1270).
* Improvements to NetCDF and HDF parsing to mimic the output of ncdump and extract text dimensions and spatial and variable information from scientific data files (TIKA-1265).
* Extract attachments from RTF files (TIKA-1010).
* Support Outlook Personal Folders File Format *.pst (TIKA-623).
* Added mime entries for additional Ogg-based formats (TIKA-1259).
* Updated the Ogg Vorbis plugin to v0.4, which adds detection for a wider range of Ogg formats, and parsers for more Ogg Audio ones (TIKA-1113).
* PDF: Images in PDF documents can now be extracted as embedded resources (TIKA-1268).
* Fixed RuntimeException thrown for certain Word Documents (TIKA-1251).
* CLI: TikaCLI now has another option: --list-parser-details-apt, which outputs the list of supported parsers in APT format. This is used to generate the list on the formats page (TIKA-411).
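
Of the changes listed, the Tika Server URL change (TIKA-1324) is the one most likely to break existing callers, since any client still requesting /unpacker/{id} or /all/{id} has to move to the /unpack prefix. The sketch below shows one way a client could rewrite old paths during the upgrade. It is only an illustration: UnpackerPathMapper and toNewPath are hypothetical names that are not part of Tika, and the code assumes the two mappings quoted in the changelog are the only ones that matter.

```java
// Hypothetical helper, NOT part of Tika: rewrites pre-1.6 Tika Server
// unpacker paths to the 1.6 layout described in TIKA-1324.
final class UnpackerPathMapper {
    private static final String OLD_UNPACKER = "/unpacker/";
    private static final String OLD_ALL = "/all/";

    private UnpackerPathMapper() {}

    /**
     * Returns the 1.6 path for a pre-1.6 unpacker path. Paths that match
     * neither old prefix (including paths already in the new layout) are
     * returned unchanged.
     */
    static String toNewPath(String path) {
        if (path.startsWith(OLD_UNPACKER)) {
            // /unpacker/{id} -> /unpack/{id}
            return "/unpack/" + path.substring(OLD_UNPACKER.length());
        }
        if (path.startsWith(OLD_ALL)) {
            // /all/{id} -> /unpack/all/{id}
            return "/unpack/all/" + path.substring(OLD_ALL.length());
        }
        return path;
    }

    public static void main(String[] args) {
        // "doc1" and "/other/doc1" are arbitrary illustrative values.
        System.out.println(toNewPath("/unpacker/doc1"));
        System.out.println(toNewPath("/all/doc1"));
        System.out.println(toNewPath("/other/doc1"));
    }
}
```

Because new-style paths pass through untouched, the helper can be applied to a mix of old and new paths without double-rewriting them.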