
CATS won't recognize antiword, htmltotext, unrtf, or pdf2txt

Posted: 02 Oct 2010, 20:27
by zoomiest
Hello all,
I am in the process of moving my long-running CATS 9.1 installation from an Ubuntu server box to a VPS running CentOS. The correct text parsing tools had to be sleuthed out, but I think I found all of the right ones.

However, when I go through the installation wizard, this CATS install doesn't recognize the resume indexing tool binaries. For each tool (antiword, pdftotext, unrtf, and html2text), it says that the binary is not in the /usr/bin/ directory, but they are all there! (ugh!)

What could cause this? How can I *make* the app recognize these tools? Any help would be appreciated!

Background:
  • CentOS release 5.3
  • UnRTF from here
  • html2text from here
  • pdf2txt from here
  • antiword from here
  • I used the opencats 9.1a download

Re: CATS won't recognize antiword, htmltotext, unrtf, or pdf2txt

Posted: 05 Oct 2010, 23:52
by RussH
Hi Zoomiest...

This is odd, since it is pretty similar to my setup. Basic check first: does your PATH include /usr/bin/?

If you go to your home directory and type html2text, does it execute?
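For reference, both checks can be run in one go. This is only a rough sketch: `check_tool` is a helper name made up for this example, and it reports what your own shell session sees, which is not necessarily what the installer sees.

```shell
#!/bin/sh
# check_tool is a hypothetical helper for this example, not part of CATS.
# It reports where (or whether) a tool resolves on the current PATH.
check_tool() {
    tool_path=$(command -v "$1" 2>/dev/null) || tool_path=""
    if [ -n "$tool_path" ] && [ -x "$tool_path" ]; then
        echo "$1: found at $tool_path"
        return 0
    fi
    echo "$1: not found on PATH"
    return 1
}

echo "PATH is: $PATH"
missing=0
for tool in antiword pdftotext html2text unrtf; do
    check_tool "$tool" || missing=1
done
echo "missing=$missing"
```

If every tool shows up under /usr/bin/ here, the PATH of your login shell is probably not the problem.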

Also, does the config.php in your CATS directory look like this?
/* Text parser settings. Remember to use double backslashes (\\) to represent
 * one backslash (\). On Windows, installing in C:\antiword\ is
 * recommended, in which case you should set ANTIWORD_PATH (below) to
 * 'C:\\antiword\\antiword.exe'. Windows Antiword will have problems locating
 * mapping files if you install it anywhere but C:\antiword\.
 */
define('ANTIWORD_PATH', "/usr/bin/antiword");
define('ANTIWORD_MAP', '8859-1.txt');

/* XPDF / pdftotext settings. Remember to use double backslashes (\\) to represent
 * one backslash (\).
 * http://www.foolabs.com/xpdf/
 */
define('PDFTOTEXT_PATH', "/usr/bin/pdftotext");

/* html2text settings. Remember to use double backslashes (\\) to represent
 * one backslash (\). 'html2text' can be found at:
 * http://www.mbayer.de/html2text/
 */
define('HTML2TEXT_PATH', "/usr/bin/html2text");

/* UnRTF settings. Remember to use double backslashes (\\) to represent
 * one backslash (\). 'unrtf' can be found at:
 * http://www.gnu.org/software/unrtf/unrtf.html
 */
define('UNRTF_PATH', "/usr/local/bin/unrtf");
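
To compare the paths in config.php against what is actually on disk, something like the following might help. It is a sketch only: `check_config_paths` is a made-up helper name, and it assumes the `define('..._PATH', "...");` lines look like the ones above. It also runs as your shell user, which may differ from the account the web server runs as.

```shell
#!/bin/sh
# check_config_paths is a hypothetical helper for this example, not part of CATS.
# It prints each *_PATH value defined in a config file and whether that file
# is executable for the user running this script.
# Usage: check_config_paths /path/to/cats/config.php
check_config_paths() {
    config_file=$1
    sed -n "s/^define('\([A-Z0-9_]*_PATH\)', *[\"']\([^\"']*\)[\"']);.*/\1 \2/p" "$config_file" |
    while read -r name path; do
        if [ -x "$path" ]; then
            echo "$name OK $path"
        else
            echo "$name MISSING $path"
        fi
    done
}
```

Any line reported as MISSING points at a path in config.php that does not match where the binary was installed.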

Re: CATS won't recognize antiword, htmltotext, unrtf, or pdf2txt

Posted: 16 Oct 2010, 15:46
by zoomiest
I have tested antiword at the command line, and it did execute.
Yes, my config.php does look like that.

Status:
The issues came up while setting up a new CentOS VPS account. I couldn't fix them, and I had never seen anything like it before. So I closed the VPS account (at webserve.ca) and am now in the middle of setting up another server locally (I am a control freak).