
Key: XWIKI-49
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Sergiu Dumitriu
Reporter: Xavier MOGHRABI
Votes: 2
Watchers: 4
XWiki Core

PDF generation failed with vietnamese characters encoded in UTF-8

Created: 16/Jun/05 11:27   Updated: 14/Aug/07 22:29
Component/s: None
Affects Version/s: 0.9.793
Fix Version/s: 1.0 RC2

Environment: Linux Debian 3.0, JVM Sun 1.5.03, Tomcat 5.5.9

Date of First Response: 13/Sep/05 19:51
Resolution Date: 14/Aug/07 20:18


Description
I have an XWiki installation that contains Vietnamese characters encoded in UTF-8.
I can see the characters without problems in my browser. In the file ApplicationResources_vn.properties the characters are written as \uXXXX escapes.

However, it seems I cannot generate a PDF when the page contains Vietnamese characters; Tomcat throws an exception. You can test it yourself at
http://www.cosgov.org
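As background on the \uXXXX escapes mentioned above, here is a minimal sketch of how java.util.Properties decodes them when loading a file. The class name, the key "language" and the sample value are hypothetical and chosen only for illustration; they are not taken from the XWiki resource file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical demo class; not part of XWiki.
final class EscapedPropertiesDemo {
    // Parse .properties content; \uXXXX escapes are decoded by Properties.load.
    static Properties parse(String content) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // The doubled backslashes keep the escapes literal in the Java string,
        // so the parser sees the same text a .properties file would contain.
        Properties props = parse("language=Ti\\u1EBFng Vi\\u1EC7t\n");
        System.out.println(props.getProperty("language"));
    }
}
```

Because the escapes are plain ASCII, the resource file itself is readable under any encoding; the encoding problem only appears later, when the rendered page is processed for PDF export.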



Artem Melentev added a comment - 13/Sep/05 19:51
I found this in com.xpn.xwiki.pdf.impl.PdfExportImpl:
..
public PdfExportImpl() {
...
tidy.setCharEncoding(Configuration.LATIN1); // needs to be Configuration.UTF8
}
With that change the PDF is generated.
But the UTF-8 characters in the resulting PDF were replaced by '#'. Maybe special fonts are needed?


Mustafa Tan added a comment - 12/Oct/06 14:51
I have the same problem with Turkish characters.

Sergiu Dumitriu added a comment - 24/Apr/07 04:30
Almost fixed in 2973. Indeed, the problem was this line:
tidy.setCharEncoding(Configuration.LATIN1);
I upgraded to the latest JTidy (0.8-SNAPSHOT), which allows setting an encoding by name.
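The effect of a Latin-1 setting can be illustrated without JTidy. The following is a minimal sketch using only the JDK; the class and method names are hypothetical, and it shows the general encoding limitation rather than the actual code path inside PdfExportImpl.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of XWiki or JTidy.
final class CharsetRoundTrip {
    // Encode the text to bytes in the given charset, then decode it back.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String vietnamese = "Ti\u1EBFng Vi\u1EC7t";
        // UTF-8 can represent every character, so the text survives.
        System.out.println(roundTrip(vietnamese, StandardCharsets.UTF_8));
        // ISO-8859-1 has no mapping for U+1EBF or U+1EC7; getBytes
        // substitutes the charset's replacement byte for each of them.
        System.out.println(roundTrip(vietnamese, StandardCharsets.ISO_8859_1));
    }
}
```

Any stage of the export pipeline that is pinned to ISO-8859-1 loses characters in this way, which is why the encoding has to be UTF-8 at every step.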

Some characters are replaced by # because FOP does not have a font containing those chars. We need to add such a font and configure FOP to use it.

Another problem occurs when XWiki is configured to use ISO-8859-1 and the document contains XML character references (&#xyz;). It seems that these references are lost somewhere in the process.
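For context, numeric character references are how characters outside ISO-8859-1 can be carried in a Latin-1 document. A minimal sketch of producing them follows; the class and method names are hypothetical, and this is not the code XWiki or JTidy uses.

```java
// Hypothetical helper; illustrates numeric character references only.
final class CharRefEscaper {
    // Replace every code point outside ISO-8859-1 (above U+00FF)
    // with a decimal reference of the form &#NNNN;.
    static String escapeForLatin1(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (cp <= 0xFF) {
                out.appendCodePoint(cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeForLatin1("Vi\u1EC7t"));
    }
}
```

If a later processing step drops or mangles these references, the characters disappear from the output even though the Latin-1 byte stream itself is valid.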


Vincent Massol added a comment - 24/Apr/07 09:12
Sergiu, are we going to close this issue for RC2?

I'm worried that we've committed code for it in the 1.0 branch, and if we don't close it we won't have any JIRA issue representing the code we have committed.

Thanks
-Vincent


Sergiu Dumitriu added a comment - 24/Apr/07 13:06
Well, it does output a PDF (no more stack traces). But the non-latin1 characters are missing from the output. We could close this bug as fixed, and make a new one regarding the missing chars.

Vincent Massol added a comment - 24/Apr/07 13:22

"We could close this bug as fixed, and make a new one regarding the missing chars."

Sounds best to me.


Catalin Hritcu added a comment - 14/Aug/07 20:18
This should have been closed earlier. New issue is XWIKI-1609.