How to Extract Data From a PDF With JavaScript

By Jim Campbell

Portable Document Format (PDF) files are a standard way to publish read-only documents online. A PDF's author can set a permissions password that restricts editing, so readers cannot easily change the document without it. JavaScript can be used to open a PDF file and read its content. You can then add the data to a database, write it to a Web page or reuse it in another PDF file.

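Create the document object that points to the PDF file. You create a document variable so that you can read or edit the file. The following code uses the jsPDF library, which must be loaded before the code runs, to create a variable for the "pdffile.pdf" file. Note that current jsPDF releases document the constructor's arguments as page orientation, unit and format rather than a file name, so verify this call against the version you use: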

var doc = new jsPDF("pdffile.pdf");
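Before handing a file to any PDF library, it can help to confirm that the file really is a PDF. The sketch below is not part of jsPDF. "looksLikePdf" is a hypothetical helper that checks for the "%PDF-" signature found at the start of a PDF file, and it assumes the file's bytes are already in memory as a Uint8Array.

```javascript
// Hypothetical helper (not a jsPDF API): checks for the "%PDF-" signature
// at the start of the file's bytes.
function looksLikePdf(bytes) {
  var signature = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-"
  if (!bytes || bytes.length < signature.length) {
    return false;
  }
  for (var i = 0; i < signature.length; i++) {
    if (bytes[i] !== signature[i]) {
      return false;
    }
  }
  return true;
}

// Made-up sample bytes spelling "%PDF-1.4"; a real file would be read
// from disk or from a file input.
var sampleBytes = new Uint8Array([0x25, 0x50, 0x44, 0x46, 0x2d, 0x31, 0x2e, 0x34]);
console.log(looksLikePdf(sampleBytes));
```

A check like this only confirms the signature. It does not validate the rest of the file.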

Read the data in the PDF file. You use the "output" function available from the new "doc" object to read in content. It returns the document's data as a string. Add the following code to extract data:

var content = doc.output();
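What extraction involves at the file level can be illustrated without any library. Inside a PDF, page text is drawn by operators such as "Tj", which take a string in parentheses. The sketch below uses a hypothetical "extractLiteralText" function to pull those strings out of PDF source text. It assumes an uncompressed content stream with simple literal strings. Most real PDFs compress their streams and use font-specific encodings, so a full PDF parser is needed in practice.

```javascript
// Hypothetical helper: collects literal strings shown with the Tj operator.
// Assumes uncompressed content and simple "(text) Tj" sequences only.
function extractLiteralText(pdfSource) {
  // Match "(...)" followed by Tj, allowing backslash-escaped characters inside.
  var pattern = /\(((?:\\.|[^\\()])*)\)\s*Tj/g;
  var results = [];
  var match;
  while ((match = pattern.exec(pdfSource)) !== null) {
    // Undo the escapes for parentheses and backslashes.
    results.push(match[1].replace(/\\([()\\])/g, "$1"));
  }
  return results;
}

// Made-up content stream for illustration.
var sampleStream =
  "BT /F1 12 Tf 72 720 Td (Hello, PDF) Tj 0 -14 Td (Line \\(two\\)) Tj ET";
console.log(extractLiteralText(sampleStream));
```

The function returns one array entry per "Tj" operator, in the order the strings appear in the source.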

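Send the results to the browser. To test that the function was a success, return the extracted data as a PDF download. The following code uses the response object of Jaxer, a server-side JavaScript platform, to set the download headers and send the content: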

Jaxer.response.addHeader('Content-Disposition', 'attachment; filename=pdffile.pdf');

Jaxer.response.addHeader('Content-Type', 'application/pdf');

Jaxer.response.setContents(content);
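Jaxer is an older platform, so as an illustration, the same download response might look like this in Node.js. "sendPdfDownload" is a hypothetical helper. It assumes "res" behaves like a Node.js http.ServerResponse and "pdfBytes" is a Buffer holding the PDF data.

```javascript
// Hypothetical helper: sends PDF data as a file download.
// Assumes `res` behaves like Node's http.ServerResponse and
// `pdfBytes` is a Buffer.
function sendPdfDownload(res, pdfBytes, filename) {
  res.writeHead(200, {
    "Content-Type": "application/pdf",
    "Content-Disposition": 'attachment; filename="' + filename + '"',
    "Content-Length": pdfBytes.length
  });
  res.end(pdfBytes);
}
```

Inside an http.createServer callback, you would call sendPdfDownload(res, pdfBuffer, "pdffile.pdf"), where "pdfBuffer" is whatever PDF data your application has produced.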
