How to Read a DOC File Using PHP
By James Highland
PHP programmers often need to handle files that PHP cannot read on its own. PHP usually runs on Linux servers, while the visitors of PHP websites mostly work on Windows or Macintosh systems. A website that accepts uploaded Microsoft Word files from these users may need to extract the text of each file in order to email or process it with PHP. But Microsoft Word files, which end in the DOC extension, use a binary format that neither Linux nor PHP reads natively. Bridging these two computer environments is possible with some preparation.
Step 1
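Verify with your host provider that PHP access is available for your website. Most servers support PHP, but occasionally this service requires an account upgrade. Also confirm that the host allows PHP to run external programs with the "shell_exec" function, because some shared hosts disable it. You cannot read a DOC file with the method described here unless both the PHP interpreter and "shell_exec" are available.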
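A quick way to confirm both points is to run a short script on the server. The following is a minimal sketch, not a definitive check: the helper name "can_use_shell_exec" is made up for this example, and it only inspects the "disable_functions" setting, so a host may still restrict external programs in other ways.

```php
<?php
// Hypothetical helper (not part of PHP): reports whether shell_exec() appears usable.
// Pass a comma-separated list to test against, or leave it null to read php.ini.
function can_use_shell_exec(?string $disabledList = null): bool
{
    // Hosts commonly block functions through the disable_functions ini directive.
    $disabledList = $disabledList ?? (string) ini_get('disable_functions');
    $disabled = array_map('trim', explode(',', $disabledList));

    return function_exists('shell_exec') && !in_array('shell_exec', $disabled, true);
}

echo 'PHP version: ' . PHP_VERSION . PHP_EOL;
echo can_use_shell_exec()
    ? "shell_exec() appears to be available\n"
    : "shell_exec() is disabled; ask your host about enabling it\n";
```

If the script reports that the function is disabled, the remaining steps will not work until the host changes that setting.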
Step 2
Download the Antiword MS Word document reader utility (see References). Antiword is an open-source command-line program that converts DOC files to plain text. It does not extend PHP itself; instead, PHP runs it as an external program and captures its output.
Step 3
Unzip the Antiword archive to extract its contents.
Step 4
Upload all of the Antiword files to the web server. Place the files in the "bin" directory of the host account, a common location for utilities and other executable programs. Make sure the Antiword program file has execute permission, or PHP will not be able to run it.
Step 5
Call the Antiword program in any PHP script designed to read a DOC Microsoft Word document. The work is done by a single function call. Type "$content = shell_exec('/usr/local/bin/antiword ' . escapeshellarg($filename));" where "$filename" holds the full path of the DOC document. The "escapeshellarg" function quotes the file name so that spaces or special characters in an uploaded name cannot alter the command. The text of the DOC file will be read into the variable "$content". These variable names, beginning with the "$" symbol, are customizable.
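The call in this step can be wrapped in a small function that also handles failure. This is only a sketch under stated assumptions: the helper names "build_antiword_command" and "read_doc_text" and the sample upload path are invented for illustration, and the default binary location is the one used in this article. It passes the file name through "escapeshellarg" so that a name chosen by a visitor is treated as a single argument.

```php
<?php
// Hypothetical helper: builds the command line that runs Antiword on one file.
// The default binary path is the one used in this article; yours may differ.
function build_antiword_command(string $filename, string $binary = '/usr/local/bin/antiword'): string
{
    // escapeshellarg() quotes each piece so spaces or shell metacharacters in an
    // uploaded file name cannot change the command that is executed.
    return escapeshellarg($binary) . ' ' . escapeshellarg($filename);
}

// Hypothetical helper: returns the document text, or null if nothing could be read.
function read_doc_text(string $filename, string $binary = '/usr/local/bin/antiword'): ?string
{
    // Stop early if the document or the program is missing.
    if (!is_readable($filename) || !is_executable($binary)) {
        return null;
    }

    $content = shell_exec(build_antiword_command($filename, $binary));

    // shell_exec() does not return a string when the command fails or prints nothing.
    return is_string($content) && $content !== '' ? $content : null;
}

// Placeholder path for illustration only.
$content = read_doc_text('/path/to/uploads/resume.doc');
echo $content === null ? "Could not read the document\n" : $content;
```

Returning null on failure lets the calling script show an error message instead of emailing or storing an empty string.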
Step 6
Process the "$content" variable as desired to manipulate the contents of the DOC file. Once the DOC is read by PHP, the full text of the file is available for any form of further activity. The contents can be emailed or stored to a database, for example.
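As one illustration of such processing, the sketch below tidies the extracted text and gathers a few details that could go into an email or a database row. The function name "summarize_doc_text", the array keys, and the sample text are assumptions made for this example. The preview is cut by bytes with "substr", which may split a character if the text is in a multi-byte encoding.

```php
<?php
// Hypothetical helper: tidies extracted text before it is emailed or stored.
function summarize_doc_text(string $content, int $previewLength = 200): array
{
    // Normalize line endings, then squeeze runs of blank lines down to one.
    $text = str_replace(["\r\n", "\r"], "\n", $content);
    $text = trim((string) preg_replace("/\n{3,}/", "\n\n", $text));

    return [
        'text'    => $text,
        'words'   => str_word_count($text),
        'preview' => substr($text, 0, $previewLength),
    ];
}

// Sample text standing in for the output of Antiword.
$content = "Quarterly Report\r\n\r\n\r\n\r\nSales rose in the third quarter.\r\n";
$summary = summarize_doc_text($content);

echo $summary['words'] . " words\n";
echo $summary['preview'] . "\n";
```

From here the "text" value could be passed to a mail function or a database insert, depending on what the site needs.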
References
Tips
- Server configurations vary between hosting providers. The "bin" directory for your account may reside in a different location than the one in the example code, so its path may be something other than "/usr/local/bin". If so, adjust the path in the "shell_exec" command to match your server account.
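If you are unsure where the program ended up, a script can test a few likely locations before calling it. This is a rough sketch: the function name "find_antiword" is made up, and the candidate paths are examples only, not locations that any particular host is known to use.

```php
<?php
// Hypothetical helper: returns the first candidate path that is an executable file.
function find_antiword(array $candidates): ?string
{
    foreach ($candidates as $path) {
        if (is_file($path) && is_executable($path)) {
            return $path;
        }
    }

    return null;
}

// Example locations only; replace them with the paths your host actually uses.
$antiword = find_antiword([
    '/usr/local/bin/antiword',
    '/usr/bin/antiword',
    __DIR__ . '/bin/antiword',
]);

echo $antiword !== null
    ? "Found Antiword at $antiword\n"
    : "Antiword not found; check the path with your host\n";
```

The path that is found can then replace "/usr/local/bin/antiword" in the "shell_exec" command.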
Writer Bio
James Highland started writing professionally in 1998. He has written for the New York Institute of Finance and Chron.com. He has an extensive background in financial investing and has taught computer programming courses for two New York companies. He has a Bachelor of Arts in film production from Indiana University.