How to Convert a PDF to HTML With Ubuntu
by Ben Lingenfelter
There are several ways to try converting a PDF file to HTML. Keep in mind that the finished product will probably not look as good as the original. The Portable Document Format is not easily converted: HTML does not handle text and images in quite the same way that PDF files do, especially complex ones. Still, here are a few ways to attempt it.
The easiest way is to go to the Adobe Web site and upload your PDF. Probably because of the flood of software being marketed to do this very thing, Adobe offers the conversion for free. All you have to do is fill in a few blanks, click a button, and off you go.
Another way is to use a handy tool called ImageMagick, which is easy to find in Synaptic. Install it, open your PDF with it from the "Open With" menu, and use "Save As" to save it as HTML. The only catch is that you can only convert one page at a time.
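If you would rather work from the terminal, ImageMagick's convert command can write HTML too, and a loop can handle the pages one after another. This is a minimal sketch, assuming ImageMagick and Ghostscript are installed and that your ImageMagick security policy allows reading PDFs (some Ubuntu releases disable this by default). The function name pdf_pages_to_html and the DRY_RUN switch are illustrative, not part of ImageMagick. ImageMagick's HTML output is essentially the page rendered as an image with a client-side image map, so it may also write companion image files.

```shell
#!/bin/sh
# Sketch: convert a PDF to one HTML file per page with ImageMagick.
# Assumptions: ImageMagick ("convert") and Ghostscript are installed, and
# the local ImageMagick security policy allows reading PDF files.
# Set DRY_RUN=echo to print the commands instead of running them.

pdf_pages_to_html() {
  pdf="$1"      # input file, e.g. report.pdf
  pages="$2"    # number of pages in the PDF
  base="${pdf%.pdf}"
  i=0
  while [ "$i" -lt "$pages" ]; do
    # "file.pdf[N]" asks ImageMagick for the single page N (zero-based)
    ${DRY_RUN:-} convert "${pdf}[$i]" "${base}-page$((i + 1)).html"
    i=$((i + 1))
  done
}
```

For example, you could read the page count with pdfinfo (part of poppler-utils) and pass it in: `pages=$(pdfinfo report.pdf | awk '/^Pages:/ {print $2}')`, then `pdf_pages_to_html report.pdf "$pages"`. Here report.pdf is a placeholder name.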
A third way is to use a small command-line program called pdftohtml. Open a terminal and make sure the poppler-utils package, which provides pdftohtml, is installed:
sudo apt-get install poppler-utils
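Before running the install command, you can check whether pdftohtml is already available. This small sketch relies on the shell's built-in command -v; the helper name have_tool is only illustrative. It prints a hint rather than installing anything itself.

```shell
#!/bin/sh
# Check whether pdftohtml (from poppler-utils) is already on the PATH.

have_tool() {
  # command -v succeeds only if the name resolves to a runnable command
  command -v "$1" >/dev/null 2>&1
}

if have_tool pdftohtml; then
  echo "pdftohtml found at $(command -v pdftohtml)"
else
  echo "pdftohtml not found; install it with: sudo apt-get install poppler-utils"
fi
```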
Once the package is installed, use the cd command to move to the directory that contains your PDF file. Then type:
pdftohtml -c [filename].pdf [filename].html
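Because pdftohtml runs from the terminal, it is easy to convert a whole folder at once. This is a minimal sketch assuming poppler-utils is installed; convert_all_pdfs and DRY_RUN are illustrative names, and each output file is named after its PDF. With -c, pdftohtml may write extra files (such as background images) alongside each HTML file, so a separate folder per document can keep things tidy.

```shell
#!/bin/sh
# Sketch: run "pdftohtml -c" on every PDF in a directory (default: the
# current one). Assumes poppler-utils is installed.
# Set DRY_RUN=echo to print the commands instead of running them.

convert_all_pdfs() {
  dir="${1:-.}"
  for pdf in "$dir"/*.pdf; do
    # If nothing matches, the pattern stays literal; skip it.
    [ -e "$pdf" ] || continue
    # -c asks for "complex" output, which tries to keep the page layout
    ${DRY_RUN:-} pdftohtml -c "$pdf" "${pdf%.pdf}.html"
  done
}
```

Run `convert_all_pdfs ~/Documents` to convert every PDF in that folder, or `DRY_RUN=echo convert_all_pdfs` to preview the commands for the current directory first.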
The result isn't much different from what the Adobe Web site gives you, but you'll be supporting open source software by using it!
- The only other way (and the best) is to copy the text (usually you can just copy and paste it) and the images into your favorite HTML editor, such as NVU, or even into an OpenOffice.org text document. Once everything is positioned the way you want, you can "save as" HTML or XHTML. You'll get a better finished product, but you'll almost be reinventing the wheel. If you use the GIMP, you can open the PDF and save it as an image. Even the text becomes part of the image, so it can no longer be selected or searched, but you could then insert the whole thing into an HTML document.
- It's often not a one-step process, and it's not always pretty, but these methods work. Unless your PDFs are very complex, you should get a usable result.
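The image-based GIMP approach can also be scripted. As a rough sketch, you could swap in pdftoppm (another poppler-utils tool) for the GIMP to render each page as a PNG, then wrap those images in a page with a small helper. The helper name images_to_html is hypothetical, and it assumes the file names and title contain no characters that need HTML escaping. As with the GIMP method, the text ends up inside the images.

```shell
#!/bin/sh
# Sketch: build a bare-bones HTML page that shows one image per PDF page.
# Assumes the file names and title contain no characters that need HTML
# escaping (such as <, > or &).

images_to_html() {
  title="$1"
  shift
  printf '<!DOCTYPE html>\n<html>\n<head><title>%s</title></head>\n<body>\n' "$title"
  for img in "$@"; do
    # One <img> per page image, in the order given on the command line
    printf '<p><img src="%s" alt="%s"></p>\n' "$img" "$title"
  done
  printf '</body>\n</html>\n'
}
```

For example, `pdftoppm -png -r 150 report.pdf page` renders each page of report.pdf (a placeholder name) at 150 dpi to files named like page-1.png and page-2.png (the numbers are zero-padded for longer documents, which keeps them in order), and `images_to_html report page-*.png > report.html` writes the page that displays them.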