Basically, I've gotten to the point where:
I convert from PDF to Word with no images, because that preserves the formatting better than converting PDF directly to HTML.
After that, I add all of the images back into the file and convert it to HTML using Word's built-in converter.
Unfortunately, MS Word 2007 (my shop is cheap and hasn't upgraded, not that upgrading would necessarily make a difference) doesn't handle browsers newer than IE6 and has no support for Mozilla.
On top of that, it can't handle data laid out in multiple columns.
So now it looks like I'll have to go page by page, book by book (about 35 pages per book), and hand-code everything into HTML.