XWiki Platform / XWIKI-10956

Upgrade to Tika 1.6






      Release 1.6 - 08/31/2014
        * Parse output should indicate which Parser was actually used
        * Use the forbidden-apis Maven plugin to check for unsafe Java
          operations (TIKA-1387).
        * Created an ExternalTranslator class to interface with command
          line Translators (TIKA-1385).
        * Created a MosesTranslator as a subclass of ExternalTranslator
          that calls the Moses Decoder machine translation program (TIKA-1385).
        * Created the tika-example module. It will have examples of how to
          use the main Tika interfaces (TIKA-1390).
        * Upgraded to Commons Compress 1.8.1 (TIKA-1275).
        * Upgraded to POI 3.11-beta1 (TIKA-1380).
        * Tika now extracts SDTCell content from tables in .docx files (TIKA-1317).
        * Tika now supports detection of the Persian/Farsi language.
        * The Tika Detector interface is now exposed through the JAX-RS
          server (TIKA-1336).
        * Tika now has support for parsing binary Matlab files as part of
          our larger effort to increase the number of scientific data formats
          supported (TIKA-1327).
        * The Tika Server URLs for the unpacker resources have been changed,
          to bring them under a common prefix (TIKA-1324). The mapping is
          /unpacker/{id} -> /unpack/{id}
          /all/{id}      -> /unpack/all/{id}
        * Added a module and core Tika interface for translating text between
          languages and added a default implementation that calls Microsoft's
          translate service (TIKA-1319)
        * Added a Translator implementation that calls Lingo24's Premium
          Machine Translation API (TIKA-1381)
        * Made RTFParser's list handling slightly more robust against corrupt
          list metadata (TIKA-1305)
        * Fixed bug in CLI json output (TIKA-1291/TIKA-1310)
        * Added ability to turn off image extraction from PDFs (TIKA-1294).
          Users must now turn on this capability via the PDFParserConfig.
        * Upgrade to PDFBox 1.8.6 (TIKA-1290, TIKA-1231, TIKA-1233, TIKA-1352)
        * Zip Container Detection for DWFX and XPS formats, which are OPC
          based (TIKA-1204, TIKA-1221)
        * Added a user-facing welcome page to the Tika Server, which
          says what the server is and gives a very brief summary of what
          is available.
        * Added Tika Server endpoints to list the available mime types,
          Parsers and Detectors, similar to the --list-<foo> methods on
          the Tika CLI App (TIKA-1270)
        * Improvements to NetCDF and HDF parsing to mimic the output of
          ncdump and extract text dimensions and spatial and variable
          information from scientific data files (TIKA-1265)
        * Extract attachments from RTF files (TIKA-1010)
        * Support Outlook Personal Folders File Format *.pst (TIKA-623)
        * Added mime entries for additional Ogg based formats (TIKA-1259)
        * Updated the Ogg Vorbis plugin to v0.4, which adds detection for a wider
          range of Ogg formats, and parsers for more Ogg Audio ones (TIKA-1113)
        * PDF: Images in PDF documents can now be extracted as embedded resources.
        * Fixed RuntimeException thrown for certain Word Documents (TIKA-1251).
        * CLI: TikaCLI now has another option: --list-parser-details-apt, which outputs
          the list of supported parsers in APT format. This is used to generate the list
          on the formats page (TIKA-411).
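
Callers of the Tika Server unpacker resources will need to follow the path change listed under TIKA-1324 above. A small rewrite helper is one way this could be handled. This is only a sketch: the class name UnpackerPathMigration and the method migrate are hypothetical (not part of Tika or XWiki), and it assumes the two prefix mappings quoted in the release note are the complete set.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Tika: rewrites pre-1.6 unpacker paths
// to the Tika 1.6 layout described in the TIKA-1324 release note.
public class UnpackerPathMigration {

    // Old prefix -> new prefix, copied from the release note.
    private static final Map<String, String> PREFIXES = new LinkedHashMap<>();
    static {
        PREFIXES.put("/unpacker/", "/unpack/");
        PREFIXES.put("/all/", "/unpack/all/");
    }

    // Returns the 1.6 path for a pre-1.6 unpacker path.
    // Paths that match neither old prefix are returned unchanged.
    public static String migrate(String oldPath) {
        for (Map.Entry<String, String> entry : PREFIXES.entrySet()) {
            String oldPrefix = entry.getKey();
            if (oldPath.startsWith(oldPrefix)) {
                return entry.getValue() + oldPath.substring(oldPrefix.length());
            }
        }
        return oldPath;
    }

    public static void main(String[] args) {
        System.out.println(migrate("/unpacker/doc1")); // /unpack/doc1
        System.out.println(migrate("/all/doc1"));      // /unpack/all/doc1
    }
}
```

Paths already in the new form, such as /unpack/doc1, match neither old prefix and pass through untouched, so applying the helper twice is harmless.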




            tmortagne Thomas Mortagne
            vmassol Vincent Massol