Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 7.4
    • Fix Version/s: 9.0-rc-1
    • Component/s: Dependency Upgrades
    • Labels:
      None
    • Difficulty:
      Unknown
    • Documentation:
      N/A

      Description

      See http://www.apache.org/dist/tika/CHANGES-1.14.txt

      Release 1.14 - 10/19/2016
      
        * Extract all headers from MSG/RFC822 (TIKA-2122).
      
        * Upgrade metadata-extractor to 2.9.1 (TIKA-2113).
      
        * Extract PDF DocInfo metadata into separate keys to prevent
          overwriting by XMP metadata (TIKA-2057).
      
        * Re-enable fileUrl for tika-server (TIKA-2081).  If you choose
          to use this feature, beware of the security vulnerabilities!
          See: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2015-3271
      
        * Add Tesseract's hOCR output format as an option, via Eric Pugh
          (TIKA-2093)
      
        * Extract macros from MSOffice files (TIKA-2069).
      
        * Maintain passed-in mime in TXTParser (TIKA-2047).
      
        * Upgrade to POI 3.15 (TIKA-2013).
      
        * Upgrade to PDFBox 2.0.3 (TIKA-2051).
      
        * Fix hyperlinks with formatting in DOC and DOCX (TIKA-1255
          and TIKA-2078)
      
        * Tika now is integrated with the Tensorflow library from Google 
          and it can use its Inception v3 image classification model to 
          identify objects in images (TIKA-1993).
      
        * Parser configuration is now type-safe and parameters for parsers
          can have assigned types (TIKA-1508, TIKA-1986).
      
        * Prevent OOM/permanent hang on some corrupt CHM files (TIKA-2040).
      
        * Upgrade ICU4J charset detection components to fix multithreading
          bug (TIKA-2041).
      
        * Upgrade to Jackcess 2.1.4 (TIKA-2039).
      
        * Maintain more significant digits in cells of "General" format
          in XLS and XLSX (TIKA-2025).
      
        * Avoid mark/reset issues when extracting or detecting embedded resources
          in RFC822 emails (TIKA-2037).
      
        * Improve accuracy of Tesseract for better extraction of numeric
          and alphanumeric text from images (TIKA-2021, TIKA-2031).
      
        * Improve extraction of embedded documents from PPT, PPTX and XLSX
          (TIKA-2026).
      
        * Add parser for applefile (AppleSingle) (TIKA-2022).
      
        * Add mime types, mime magic and/or globs for:
           * Endnote Import File (TIKA-2011)
           * DJVU files (TIKA-2009)
           * MS Owner File (TIKA-2008)
           * Windows Media Metafile (TIKA-2004)
           * iCal and vCalendar (TIKA-2006)
           * MBOX (TIKA-2042)
           * Stata DTA (TIKA-2064)
      
        * Add configurable maximum threshold for number of events extracted
          from the XMP Media Management Schema in JempboxExtractor (TIKA-1999).
      
        * Integrate TesseractOCR with full page image rendering for PDFs (TIKA-1994).
      
        * Add mime detection (via Nick C) and a parser for DBF files (TIKA-1513).
        
        * Add mime detection and parsers for MSOffice 2003 XML Word
          and Excel formats (TIKA-1958).
      
        * Extract hyperlinks from PPT, PPTX, XLSX (TIKA-1454).
      
        * Upgrade to Commons Compress 1.12 (supports progress on TIKA-1358)
      

        Activity

        Vincent Massol added a comment -

        This could be interesting for us:

        Parser configuration is now type-safe and parameters for parsers can have assigned types (TIKA-1508, TIKA-1986).
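
To make the idea of typed parser parameters concrete, here is a minimal, self-contained sketch of what "parameters with assigned types" can mean in practice: each parameter carries a declared type name, and its raw string value is converted and validated when the configuration is loaded, not when a parser first reads it. This is not Tika's API. The class `TypedParamSketch`, the `convert` method, the type names and the `maxEvents` parameter are all hypothetical and chosen only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only: none of these names come from Tika.
final class TypedParamSketch {

    // Maps a declared type name to a converter from the raw string value.
    private static final Map<String, Function<String, Object>> CONVERTERS =
            new LinkedHashMap<>();

    static {
        CONVERTERS.put("int", Integer::valueOf);
        CONVERTERS.put("long", Long::valueOf);
        CONVERTERS.put("bool", TypedParamSketch::parseBool);
        CONVERTERS.put("string", s -> s);
    }

    // Strict boolean parsing: anything other than "true" or "false" is rejected.
    private static Object parseBool(String raw) {
        if ("true".equals(raw)) {
            return Boolean.TRUE;
        }
        if ("false".equals(raw)) {
            return Boolean.FALSE;
        }
        throw new IllegalArgumentException("Not a boolean: " + raw);
    }

    // Converts a raw value according to its declared type, failing fast on
    // an unknown type name or a value that does not match the type.
    static Object convert(String name, String type, String raw) {
        Function<String, Object> converter = CONVERTERS.get(type);
        if (converter == null) {
            throw new IllegalArgumentException(
                    "Unknown type '" + type + "' for parameter " + name);
        }
        try {
            return converter.apply(raw.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                    "Parameter " + name + " is not a valid " + type + ": " + raw, e);
        }
    }

    public static void main(String[] args) {
        // "maxEvents" is a made-up parameter name used for the demo.
        Object max = convert("maxEvents", "int", "20");
        System.out.println(max + " (" + max.getClass().getSimpleName() + ")");
    }
}
```

The benefit for a consumer is that a mistyped value fails at configuration time with a message naming the parameter, instead of surfacing later as a parsing error inside a parser.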


          People

          • Assignee:
            Vincent Massol
            Reporter:
            Vincent Massol
          • Votes:
            0
            Watchers:
            1

            Dates

            • Created:
              Updated:
              Resolved: