Issue Details

Key: XCONTRIB-68
Type: New Feature
Status: Closed
Resolution: Duplicate
Priority: Major
Assignee: Wang Ning
Reporter: Vincent Massol
Votes: 0
Watchers: 1
XWiki Contrib

Convert Office documents

Created: 12/Jul/08 17:21   Updated: 09/Nov/09 17:43
Component/s: OfficeImporter
Affects Version/s: None
Fix Version/s: None

File Attachments: 1. File Import.xar (2 kB)


Date of First Response: 22/Jul/08 17:56
Resolution Date: 31/Dec/08 14:22


Description
The project is to create an XWiki plugin that converts office documents (MS Word .doc, MS Excel, OpenOffice .odt) to XWiki syntax and inserts the result into an XWiki page. The intermediate step of the conversion turns the office document into clean, tidy HTML. An XHTML parser then converts that HTML to XWiki syntax.

See http://dev.xwiki.org/xwiki/bin/view/Design/OfficeImporter for more details.
The code is currently committed here: http://svn.xwiki.org/svnroot/sandbox/xwiki-plugin-officeimporter/
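
As a rough illustration of the pipeline described above (office document to HTML, HTML cleanup, HTML to XWiki syntax), the three stages can be chained as plain functions. This is only a sketch: ImportPipelineSketch and the three stage parameters are hypothetical names, not the plugin's actual API, and each stage would be backed by a real converter, cleaner, and parser.

```java
import java.util.function.Function;

final class ImportPipelineSketch {
    /**
     * Chains the three stages into a single function.
     * All three stage implementations are supplied by the caller (hypothetical).
     */
    static Function<byte[], String> pipeline(
            Function<byte[], String> officeToHtml,
            Function<String, String> cleanHtml,
            Function<String, String> htmlToWikiSyntax) {
        // Office bytes -> raw HTML -> cleaned HTML -> wiki syntax
        return officeToHtml.andThen(cleanHtml).andThen(htmlToWikiSyntax);
    }
}
```

Keeping the stages separate means the HTML cleaning and the HTML-to-wiki conversion can each be tested on their own.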



Wang Ning added a comment - 22/Jul/08 17:56
I added an office import application for XWiki.
How to use it:
1. Install the office converter plugin:
http://dev.xwiki.org/xwiki/bin/view/Design/OfficeConverter#HQuickStart
2. Install this application:
http://dev.xwiki.org/xwiki/bin/view/Design/OfficeConverter#HUsetheofficeconverterapplication

Vincent Massol added a comment - 13/Aug/08 22:28 - edited
Hi Wang.

Some review comments (We discussed some of them on IRC but I'm noting them here for the record):

  • You should fix the tests so that they are real tests, i.e. they must verify the output of the conversion.
  • We shouldn't use JDOM for the filtering. Since version 2.0, htmlcleaner has some nice and efficient methods for filtering the cleaned HTML.
  • We should move the HTML cleaning code into the rendering-api module and add tons of tests for it. (I can help)
  • HTML to wiki conversion should work. This is the biggest limitation right now, since it hasn't been started and it is the hard part of the project.
  • Some code should be componentized (I can help)

Thanks
-Vincent


Vincent Massol added a comment - 22/Aug/08 14:48
Review on 2008-08-22:
  • Code needs to use xwiki-xml/ HTMLCleaner
  • OfficeConverter.java should have only one method that outputs a stream. Client code can do the conversion to other formats (string, array of bytes, etc).
  • Missing Javadoc (for example for OfficeConverter.java)
  • Missing @version and @since tags
  • XWiki coding style not applied to some files (for example pom.xml, which uses 4 spaces instead of 2; same for the office converter exception class, which has braces at the end of a line for a method)
  • There should be no @author tags
  • Sentences should start with an uppercase letter (in Javadoc).
  • Unused code. For example:
	public static final String DEFAULTCONVERTERTYPE = "localhost";
	public static final String SERVICECONVERTERTYPE = "service";

Actually the office converter factory should be removed altogether since it's not used.

  • Use a component for OfficeConverter/DefaultOfficeConverter
  • Needs better error messages. Need to use "[" and "]" in error messages when displaying values. Best practice is actually to use String.format().
  • AbstractOfficeConverter not required since there's now only 1 method in OfficeConverter
  • TODO leftover like:
        // TODO automatically start the open office server
        // soffice -headless -accept="socket,host=127.0.0.1,port=8100;urp;" -nofirststartwizard
  • Provide configuration parameters in components.xml for the DefaultOfficeConverter component: server name/IP and port
  • The OfficeServerConnection class is not required. The code for connecting to the office server should be moved to DefaultOfficeConverter, in its initialize() method (it should implement Initializable).
  • The JDOM-to-string method in Util should be removed; use XMLUtils.toString() instead where required.
  • Util should be removed in general and methods moved elsewhere.
  • There should be an output type for wiki syntax. We need to design this so that it's independent of the wiki syntax used and relies on the rendering system for that.
  • I'll review the plugin later on when these issues have been fixed.
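
Two of the points above (a single stream-returning method, and error messages that wrap values in "[" and "]" via String.format()) could look roughly like the following. This is a minimal sketch under assumptions: the convert() signature, the readAll() helper, the connectionError() helper, and the message wording are all hypothetical and are not taken from the plugin's code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class OfficeConverterSketch {
    /** Hypothetical single-method converter: the result is only exposed as a stream. */
    interface OfficeConverter {
        InputStream convert(InputStream officeDocument) throws IOException;
    }

    /** Client-side conversion of the output stream to a String (UTF-8 is an assumption). */
    static String readAll(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Error message built with String.format(), with the displayed values in brackets. */
    static String connectionError(String host, int port) {
        return String.format("Failed to connect to the office server at [%s:%d]", host, port);
    }
}
```

With the converter returning only a stream, callers that need a string or a byte array do that conversion themselves, as readAll() does here.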

Thanks!


Vincent Massol added a comment - 31/Dec/08 14:22
This is now a duplicate of XAOFFICE-1