Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 6.2-milestone-2
    • Fix Version/s: 6.3-milestone-1
    • Component/s: Dependency Upgrades
    • Labels:
      None
    • Difficulty:
      Unknown
    • Documentation:
      N/A

      Description

      http://www.apache.org/dist/tika/CHANGES-1.6.txt

      Release 1.6 - 08/31/2014
      
      * Parse output now indicates which Parser was actually used
        (TIKA-674).
      
        * Use the forbidden-apis Maven plugin to check for unsafe Java
          operations (TIKA-1387).
      
        * Created an ExternalTranslator class to interface with command
          line Translators (TIKA-1385).
      
        * Created a MosesTranslator as a subclass of ExternalTranslator
          that calls the Moses Decoder machine translation program (TIKA-1385).
      
        * Created the tika-example module. It will have examples of how to
          use the main Tika interfaces (TIKA-1390).
      
        * Upgraded to Commons Compress 1.8.1 (TIKA-1275).
      
        * Upgraded to POI 3.11-beta1 (TIKA-1380).
      
        * Tika now extracts SDTCell content from tables in .docx files (TIKA-1317).
      
      * Tika now supports detection of the Persian/Farsi language
        (TIKA-1337).
        
      * The Tika Detector interface is now exposed through the JAX-RS
        server (TIKA-1336).
      
      * Tika now supports parsing binary MATLAB files, as part of our
        larger effort to increase the number of supported scientific data
        formats (TIKA-1327).
      
      * The Tika Server URLs for the unpacker resources have been changed
        to bring them under a common prefix (TIKA-1324). The mapping is:
        /unpacker/{id} -> /unpack/{id}
        /all/{id}      -> /unpack/all/{id}
      
      * Added a module and a core Tika interface for translating text between
        languages, with a default implementation that calls Microsoft's
        translation service (TIKA-1319).
      
      * Added a Translator implementation that calls Lingo24's Premium
        Machine Translation API (TIKA-1381).
      
        * Made RTFParser's list handling slightly more robust against corrupt
          list metadata (TIKA-1305)
      
      * Fixed a bug in the CLI JSON output (TIKA-1291/TIKA-1310).
      
      * Added the ability to turn off image extraction from PDFs (TIKA-1294).
        Image extraction is now off by default; users must enable it via
        PDFParserConfig.
      
      * Upgraded to PDFBox 1.8.6 (TIKA-1290, TIKA-1231, TIKA-1233, TIKA-1352).
      
      * Added Zip container detection for the DWFX and XPS formats, which are
        OPC-based (TIKA-1204, TIKA-1221).
      
      * Added a user-facing welcome page to the Tika Server that describes
        what it is and gives a very brief summary of what is available
        (TIKA-1269).
      
      * Added Tika Server endpoints to list the available MIME types,
        Parsers and Detectors, similar to the --list-<foo> options of
        the Tika CLI app (TIKA-1270).
      
        * Improvements to NetCDF and HDF parsing to mimic the output of
          ncdump and extract text dimensions and spatial and variable
          information from scientific data files (TIKA-1265)
      
        * Extract attachments from RTF files (TIKA-1010)
      
      * Added support for the Outlook Personal Folders File Format (*.pst) (TIKA-623).
        
        * Added mime entries for additional Ogg based formats (TIKA-1259)
      
        * Updated the Ogg Vorbis plugin to v0.4, which adds detection for a wider
          range of Ogg formats, and parsers for more Ogg Audio ones (TIKA-1113)
      
        * PDF: Images in PDF documents can now be extracted as embedded resources.
          (TIKA-1268)
      
        * Fixed RuntimeException thrown for certain Word Documents (TIKA-1251).
      
      * CLI: TikaCLI has a new option, --list-parser-details-apt, which outputs
        the list of supported parsers in APT format. This is used to generate the list
        on the formats page (TIKA-411).
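For clients that still call the pre-1.6 unpacker URLs, the TIKA-1324 mapping amounts to a simple prefix rewrite. The following is a minimal Python sketch, assuming only the request path needs translating; the function name migrate_unpacker_path is hypothetical and not part of Tika Server.

```python
# Hypothetical client-side helper (not part of Tika): rewrite pre-1.6 Tika
# Server unpacker paths to the common /unpack prefix from TIKA-1324.
# Only the two mappings listed in the changelog are handled.

OLD_UNPACKER = "/unpacker/"
OLD_ALL = "/all/"


def migrate_unpacker_path(path: str) -> str:
    """Map an old unpacker resource path to its Tika 1.6 equivalent."""
    if path.startswith(OLD_UNPACKER):
        # /unpacker/{id} -> /unpack/{id}
        return "/unpack/" + path[len(OLD_UNPACKER):]
    if path.startswith(OLD_ALL):
        # /all/{id} -> /unpack/all/{id}
        return "/unpack/all/" + path[len(OLD_ALL):]
    # Any other path (including ones already using /unpack) is left unchanged.
    return path


if __name__ == "__main__":
    for old in ("/unpacker/doc123", "/all/doc123", "/tika"):
        print(f"{old:18} -> {migrate_unpacker_path(old)}")
```

Paths that already use the new prefix, or that belong to other endpoints, pass through unchanged.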
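The ExternalTranslator entry (TIKA-1385) describes wrapping a command-line translator such as the Moses decoder. The general pattern, piping text to an external process and reading the result back, can be sketched in Python as below. This is not Tika's ExternalTranslator API: the helper name translate_with_external_command and the UTF-8 stdin/stdout convention are assumptions for illustration, and the demo uses a stand-in command rather than Moses.

```python
import subprocess
import sys
from typing import Sequence


def translate_with_external_command(text: str, command: Sequence[str]) -> str:
    """Send text to a command-line translator on stdin and return its stdout.

    Hypothetical helper: the command and its I/O convention (UTF-8 text in
    on stdin, translated text out on stdout) are assumptions.
    """
    result = subprocess.run(
        list(command),
        input=text,
        capture_output=True,
        encoding="utf-8",  # decode stdout/stderr as UTF-8 text
        check=True,  # raise CalledProcessError if the translator exits non-zero
    )
    return result.stdout


if __name__ == "__main__":
    # Stand-in "translator" that upper-cases its input, so the sketch runs
    # without Moses installed.
    fake_translator = [
        sys.executable,
        "-c",
        "import sys; sys.stdout.write(sys.stdin.read().upper())",
    ]
    print(translate_with_external_command("hello world", fake_translator))
```

Raising on a non-zero exit status keeps a failed external translation from being silently returned as empty text.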
      

            People

            • Assignee: tmortagne Thomas Mortagne
            • Reporter: vmassol Vincent Massol
            • Votes: 0
            • Watchers: 1
