Description
Ideally this is the job of htmlcleaner. I have asked about it in their forum: http://sourceforge.net/forum/forum.php?thread_id=2890619&forum_id=637245. Until there is an answer, I think we have to work around it ourselves. To check my claim, consider the following HTML document:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>I AM YOUR DOCUMENT TITLE REPLACE ME</title>
<meta http-equiv="content-type" content="application/xhtml+xml;charset=utf-8" />
<meta http-equiv="content-style-type" content="text/css" />
</head>
<body>
<table>
<tbody>
<tr/>
</tbody>
</table>
</body>
</html>
If this HTML is submitted to http://validator.w3.org/check, the validator reports an error for the <tr/> tag, because XHTML 1.0 Strict requires every row to contain at least one <th> or <td>. We can either convert <tr/> into <tr><td/></tr> or strip it entirely. Stripping is the more logical choice because empty rows are not rendered in HTML.
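
As a rough illustration of the "strip it" option, here is a minimal sketch in Python using only the standard library's xml.etree.ElementTree. It is not htmlcleaner code and does not use its API. The function name strip_empty_rows and the shortened sample document are assumptions made for this example, and "empty" is taken to mean "has no child elements".

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def strip_empty_rows(root):
    """Remove every <tr> with no child elements; return how many were removed.

    Hypothetical helper for illustration only, not part of htmlcleaner.
    """
    tr_tag = f"{{{XHTML_NS}}}tr"
    removed = 0
    # Snapshot the tree first so removing nodes does not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == tr_tag and len(child) == 0:
                parent.remove(child)
                removed += 1
    return removed


# Shortened sample (assumption): one empty row and one row with a cell.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
<body><table><tbody><tr/><tr><td>kept</td></tr></tbody></table></body>
</html>"""

root = ET.fromstring(SAMPLE)
removed_count = strip_empty_rows(root)
remaining_rows = root.findall(f".//{{{XHTML_NS}}}tr")
```

One caveat: in the document from this issue the empty row is the only row, so removing it leaves an empty <tbody>. The XHTML 1.0 Strict DTD rejects that as well, since <tbody> must contain at least one <tr>. A real fix would probably have to prune containers that become empty too.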