Extra characters when using PHP to upload a Word file

Hi everyone,

I am new here although I am not new to PHP!

I am developing a site where our users upload a text document (most likely a Word doc, as we do not want to restrict the file type), and the text content is extracted and added to a MySQL database.
I have tried this with the field type set to BLOB and to TEXT, and I have the same problem with both. The PHP upload itself works fine: the file is on the server and fully readable.

Basically, we all know that Word adds extra characters to every file it generates, so the content going into the database, and then being displayed on the website, contains lots of extra characters before and after the actual text.
I have tried changing the page charset, but I am unsure exactly what it should be, and I don’t even know if this will work.

If anyone can help me out, it would be very much appreciated!

Word does not just add extra characters: depending on which version of Word you are using, the file might as well be a binary file.
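To see which kind of file an upload actually is, you can look at its first few bytes. This is only a rough sketch: the function name sniff_word_format is made up for illustration. It relies on two well-known signatures: legacy binary .doc files are OLE2 compound files starting with D0 CF 11 E0 A1 B1 1A E1, and .docx files are ZIP archives starting with "PK".

```php
<?php
// Hypothetical helper (not part of PHP): guesses the container type
// of an uploaded file from its first bytes.
function sniff_word_format(string $path): string
{
    // Read at most the first 8 bytes of the file.
    $head = (string) file_get_contents($path, false, null, 0, 8);

    // OLE2 compound file signature: legacy binary Word (.doc).
    if (strncmp($head, "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1", 8) === 0) {
        return 'doc';
    }
    // ZIP local file header: possibly a .docx (other ZIP files match too).
    if (strncmp($head, "PK\x03\x04", 4) === 0) {
        return 'zip';
    }
    return 'other';
}
```

A result of 'zip' only tells you the file is a ZIP archive, so it still needs checking before you treat it as a Word document.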

Newer versions keep “copies” of the document in a format readable by previous versions as well, for backward compatibility. So trying to parse a Word document for just its text would, I think, be quite a daunting task. You might want to search for a Word to Text converter. I just found one at http://obninsk-doc2text-converter.obninchive.org/
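One case that is easier: .docx files (Word 2007 and later) are ZIP archives with the main text stored as XML in word/document.xml. Here is a minimal sketch for that case only, assuming PHP's zip extension (ZipArchive) is installed. The function name docx_to_text is made up for illustration, and stripping tags like this is crude: it ignores headers, footnotes, and tables layout. It will not read the older binary .doc format, so a converter is still needed for those.

```php
<?php
// Hypothetical helper (not part of PHP): returns the plain text of a
// .docx file, or null if the file is not a readable .docx.
// Assumes the zip extension (ZipArchive) is available.
function docx_to_text(string $path): ?string
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        return null; // not a ZIP archive, e.g. a legacy binary .doc
    }
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($xml === false) {
        return null; // a ZIP archive, but not a Word document
    }

    // End each paragraph with a newline, then drop all XML tags.
    $xml  = str_replace('</w:p>', "\n", $xml);
    $text = strip_tags($xml);

    // Decode XML entities such as &amp; back into plain characters.
    return trim(html_entity_decode($text, ENT_QUOTES | ENT_XML1, 'UTF-8'));
}
```

You would call it on the uploaded file's path and insert the returned string into your TEXT column, falling back to some other handling when it returns null.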

You could also search Google yourself.
