  1. XWiki Platform
  2. XWIKI-12735

Upgrade to Tika 1.11

Details

    • Type: Task
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: 7.3-rc-1
    • Affects Version/s: 7.3-milestone-1
    • Component/s: Dependency Upgrades
    • None
    • Unknown
    • N/A

    Description

      Release 1.11 - 10/18/2015
      
        * Java7 API support for allowing java.nio.file.Path as method arguments
          was added to Tika and to ParsingReader, TikaFileTypeDetector, and to
          Tika Config (TIKA-1745, TIKA-1746, TIKA-1751).
      
        * MIME support was added for WebVTT: The Web Video Text Tracks Format
          files (TIKA-1772).
      
        * MIME magic improved to ensure emails are detected as message/rfc822
          (TIKA-1771).
      
        * Upgrade to Jackcess Encrypt 2.1.1 to avoid binary incompatibility
          with Bouncy Castle (TIKA-1736).
        
        * Make div and other markup more consistent between PPT and 
          PPTX (TIKA-1755).
      
        * Parse multiple authors from MSOffice's semi-colon delimited
          author field (TIKA-1765).
        
        * Include CTAKESConfig.properties within tika-parsers resources 
          by default (TIKA-1741).
        
        * Prevent infinite recursion when processing inline images
          in PDF files by limiting extraction of duplicate images
          within the same page (TIKA-1742).
      
        * Upgrade to POI 3.13-final (via Andreas Beeker) (TIKA-1707).
      
        * Upgraded tika-batch to use Path throughout (TIKA-1747 and
          TIKA-1754).
      
        * Upgraded to Path in TikaInputStream (via Yaniv Kunda) (TIKA-1744).
      
        * Changed default content handler type for "/rmeta" in tika-server
          to "xml" to align with "-J" option in tika-app.  
          Clients can now specify handler types via PathParam. (TIKA-1716).
      
        * The fantastic GROBID (or Grobid) GeneRation Of BIbliographic Data
          for machine learning from PDF files is now integrated as a 
          Tika parser (TIKA-1699, TIKA-1712).
      
        * The ability to specify the Tesseract Config Path was added
          to the OCR Parser (TIKA-1703).
      
        * Upgraded to ASM 5.0.4 (TIKA-1705).
      
        * Corrected Tika Config XML detector definition explicit loading 
          of MimeTypes (TIKA-1708).
      
        * In Tika Parsers, Batch, Server, App and Examples, use Apache
          Commons IO instead of inlined ex-Commons classes, and the Java 7
          Standard Charset definitions (TIKA-1710).
      
        * Upgraded to Commons Compress 1.10, which enables zlib compressed
          archives support (TIKA-1718).
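
For readers wondering what the multi-author change (TIKA-1765) amounts to, here is a minimal sketch of splitting a semicolon-delimited author field into separate values. It uses only the Java standard library. The class and method names (AuthorFieldSplitter, splitAuthors) are hypothetical and are not taken from Tika's source, so treat this as an illustration of the idea rather than Tika's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: illustrates the idea behind TIKA-1765 only.
public class AuthorFieldSplitter {

    // Splits a raw author field on ';', trims each entry,
    // and drops entries that are empty after trimming.
    public static List<String> splitAuthors(String rawField) {
        List<String> authors = new ArrayList<>();
        if (rawField == null) {
            return authors;
        }
        for (String part : rawField.split(";")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                authors.add(name);
            }
        }
        return authors;
    }

    public static void main(String[] args) {
        // Prints: [Alice Example, Bob Example]
        System.out.println(splitAuthors("Alice Example; Bob Example;; "));
    }
}
```

Code that previously read the author metadata as one string may need to handle several values after the upgrade, which is worth checking on the XWiki side.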
      
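The inline-image fix (TIKA-1742) is described as limiting extraction of duplicate images within the same page. The sketch below shows only the bookkeeping such a limit could rely on: remembering which image identifiers have already been seen on a page and skipping repeats. The class name, method name, and the use of plain strings as image identifiers are assumptions made for illustration and do not reflect how Tika's PDF parser is written.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: per-page de-duplication of image identifiers.
public class PageImageDeduplicator {

    // Returns the identifiers to extract for one page, keeping the first
    // occurrence of each identifier and skipping later repeats.
    public static List<String> imagesToExtract(List<String> imageIdsOnPage) {
        Set<String> seen = new HashSet<>();
        List<String> toExtract = new ArrayList<>();
        for (String id : imageIdsOnPage) {
            // Set.add returns false when the id was already present.
            if (seen.add(id)) {
                toExtract.add(id);
            }
        }
        return toExtract;
    }
}
```

Because the set is created per call, the same image appearing on a different page would still be extracted there, which matches the "within the same page" wording of the release note.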

          People

            tmortagne Thomas Mortagne