How to Read a DOC File Using PHP

By James Highland

PHP programmers often need to push the language beyond its usual territory. PHP typically runs on Linux servers, but visitors to PHP websites usually work on Windows or Macintosh systems. A website that accepts Microsoft Word uploads from these users may need to extract the text of each file so that a PHP script can email or process it. Microsoft Word files, which end in the DOC extension, use a binary format that PHP has no built-in function to read. With some preparation, an external utility can bridge the two environments.

Step 1

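Verify with your hosting provider that PHP is available for your website and that scripts are allowed to run external programs with the "shell_exec" function, which this method relies on. Most servers support PHP, but some plans require an account upgrade, and some shared hosts disable "shell_exec" for security reasons. You cannot read a DOC file with this method without access to the PHP interpreter and permission to execute programs on the server.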
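If you would rather check from a script than ask the host, the following is a minimal sketch. The helper name "isFunctionUsable" is made up for this example; it relies only on the standard PHP functions "function_exists" and "ini_get".

```php
<?php
// Hypothetical helper: returns true when the named function exists and is
// not listed in the disable_functions directive of php.ini.
function isFunctionUsable(string $name): bool
{
    $disabled = array_map('trim', explode(',', (string) ini_get('disable_functions')));
    return function_exists($name) && !in_array($name, $disabled, true);
}

echo 'PHP version: ', PHP_VERSION, PHP_EOL;
echo 'shell_exec usable: ', isFunctionUsable('shell_exec') ? 'yes' : 'no', PHP_EOL;
```

Upload the script, open it in a browser, and delete it once you have the answer.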

Step 2

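Download the Antiword MS Word document reader utility (see References). Antiword is a free, open-source command-line program that converts DOC files to plain text. It does not extend PHP itself; a PHP script runs it as an external program and captures its output.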

Step 3

Unzip the Antiword archive to extract its contents.

Step 4

Upload the entire set of Antiword files to the web server. Place them in the "bin" directory of the hosting account, a common location for utilities and other executable programs, and make sure the "antiword" file has execute permission. Note the full path to the program on the server, because the PHP script must call it by that path.
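A quick way to confirm the upload worked is to ask PHP whether it can see and run the program. This is a sketch only: the function name "checkBinary" is invented for the example, and the path "/usr/local/bin/antiword" is an assumption that should be changed to wherever the files were actually placed.

```php
<?php
// Hypothetical helper: describes why a program cannot be run,
// or returns null when it looks usable.
function checkBinary(string $path): ?string
{
    if (!is_file($path)) {
        return "not found: $path";
    }
    if (!is_executable($path)) {
        return "not executable (try chmod 755): $path";
    }
    return null;
}

// Assumed install location; adjust to match your server.
$problem = checkBinary('/usr/local/bin/antiword');
echo $problem === null ? 'Antiword looks ready to run' : $problem, PHP_EOL;
```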

Step 5

Call the Antiword program from any PHP script that needs to read a DOC file. A single statement does the work. Type "$content = shell_exec('/usr/local/bin/antiword ' . escapeshellarg($filename));" where "$filename" holds the full path to the DOC file and "/usr/local/bin/antiword" is replaced by the location where you installed the program, if different. The "escapeshellarg" function quotes the file name so that spaces or special characters in an uploaded name cannot alter the command. The text of the DOC file is returned in the variable "$content". Variable names, which begin with the "$" symbol, are yours to choose.
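For scripts that read more than one document, the call can be wrapped in a small function. The sketch below is one possible arrangement, not the only one: the names "buildAntiwordCommand" and "readDocText" and the sample upload path are invented for illustration, and the default program location is the one used in this step. It quotes both the program path and the file name with the standard "escapeshellarg" function, so that an unusual file name is never interpreted by the shell.

```php
<?php
// Builds the shell command. escapeshellarg() quotes each part so that spaces
// or shell metacharacters in an uploaded file name cannot alter the command.
function buildAntiwordCommand(string $binary, string $docPath): string
{
    return escapeshellarg($binary) . ' ' . escapeshellarg($docPath);
}

// Returns the document text, or null if the file cannot be read or
// Antiword produced no output.
function readDocText(string $docPath, string $binary = '/usr/local/bin/antiword'): ?string
{
    if (!is_readable($docPath)) {
        return null;
    }
    $output = shell_exec(buildAntiwordCommand($binary, $docPath));
    return is_string($output) && $output !== '' ? $output : null;
}

$content = readDocText('/path/to/uploads/report.doc'); // hypothetical path
if ($content === null) {
    echo 'Could not read the document', PHP_EOL;
} else {
    echo $content;
}
```

Returning null for every failure keeps the calling code simple: one comparison tells the script whether there is any text to work with.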

Step 6

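Process the "$content" variable as needed. Once Antiword has converted the document, its text is available to PHP as an ordinary string, although images and most formatting are not preserved. The text can be emailed or stored in a database, for example. Check that "$content" is not empty before using it, because "shell_exec" returns null when the command fails or produces no output.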
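As an illustration of further processing, the sketch below tidies the text and counts its words before it is passed along. The sample string stands in for real Antiword output, the function name "normalizeText" is invented for the example, and the email address in the commented-out line is a placeholder.

```php
<?php
// Collapses runs of spaces and line breaks into single spaces, which is
// convenient because the converted text arrives with hard line wraps.
function normalizeText(string $text): string
{
    return trim(preg_replace('/\s+/', ' ', $text));
}

// Sample stand-in for the text returned by Antiword.
$content = "Quarterly   report\n\nSales rose in\nthe third quarter.\n";

$clean = normalizeText($content);
echo $clean, PHP_EOL;
echo 'Words: ', str_word_count($clean), PHP_EOL;

// To email the text instead, something like this could follow
// (placeholder recipient):
// mail('editor@example.com', 'Uploaded document', wordwrap($clean, 70));
```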