A little while back I came across a PDF file that I wanted to convert to text, in an automated fashion.
After a fair bit of searching I found pdftotext, which does exactly what I needed. I ran it on my home PC (Ubuntu) and it worked a treat, but when I migrated my files to a CentOS server it no longer worked so nicely.
In the end I figured out I needed to add the ‘-raw’ option (which keeps the text in content stream order) on CentOS to get it to create a file similar to the one I got on Ubuntu.
The syntax I used is as follows:
pdftotext -enc Latin1 -raw {filename}
Say your filename is abc123.pdf: you will now have an additional file, abc123.txt. Brilliant!
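Since the whole point was automation, here is a rough sketch of how the same command could be run over a whole directory of PDFs. The function name convert_pdfs is my own invention for illustration, not something that ships with pdftotext, and it assumes the same options suit every file.

```shell
# Hypothetical helper: convert every .pdf in a directory to .txt.
# Usage: convert_pdfs /path/to/pdfs   (defaults to the current directory)
convert_pdfs() {
    dir="${1:-.}"
    for pdf in "$dir"/*.pdf; do
        # If there are no PDFs the pattern stays unexpanded, so skip it.
        [ -e "$pdf" ] || continue
        # With no output name given, pdftotext writes abc123.txt next to abc123.pdf.
        pdftotext -enc Latin1 -raw "$pdf" || echo "failed: $pdf" >&2
    done
}
```

Failures are reported on stderr rather than stopping the loop, so one bad PDF does not hold up the rest.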
I installed it on Ubuntu using apt-get, and on CentOS using yum, easy peasy.
This post was written by Shaun on July 21, 2011
