General discussion of OpenCATS

By alexukie
#22
Hey guys,

So I've never got around to installing Sphinx. I was wondering if anyone still has instructions / tips / life experience on how to do this. Cheers. And does it really speed things up that much? Mine takes about 10-20 seconds to do a search.

Alex
By RussH
#25
'Fraid not - I posted the 'sphinx for cats' tarball into the downloads section, though, and there are instructions in the tarball:
$Id: INSTALL 2672 2007-07-12 15:18:01Z andrew $

Sphinx_for_CATS Installation Guide
Last Revised: July 12, 2007
_____________________________________________________________________

Sphinx for CATS Installation Procedure


Preamble:
NOTE: This document is not yet complete. Please visit the forums at
http://www.catsone.com/ if additional help is needed.


Index:
A) Requirements
B) Introduction
C) New Installation - Windows
D) New Installation - Unix/Linux


A) Requirements

* Linux, FreeBSD or Windows NT-based (2000, XP, Vista) Operating System
* CATS (open source applicant tracking system) installed

B) Introduction

Sphinx is a full-text search engine, distributed under GPL version 2.
With CATS, Sphinx can dramatically improve the speed of database
text searches (for example, candidate resume searches). This package
includes the following:

i) A copy of Sphinx designed to work with CATS.

ii) A CATS module designed to take advantage of Sphinx.

iii) An automated installer to setup Sphinx and CATS configuration
files and install all other necessary files.

This installer is a Bash script that will run under most UNIX-like
operating systems as well as Cygwin, a Linux-like environment for
Microsoft Windows.

C) New Installation - Windows

To install Sphinx for CATS using Microsoft Windows, you first need to
download and install a program called Cygwin, which is a Linux-like
environment. You can download the latest copy of Cygwin directly from
their website at http://www.cygwin.com/

Installation Steps:

1) Download and install Cygwin. Run Cygwin.

2) Download "sphinx_for_cats.tar.gz" and extract it into your CATS
directory. It is possible to use Cygwin to do this:

cd /cygdrive/c/path/to/cats
tar -zxvf /cygdrive/c/path/to/sphinx_for_cats.tar.gz
cd sphinx_for_cats

3) Run the automated installer to complete the installation.

./install.sh

The automated installer will install Sphinx, the CATS module, all
configuration files and it will start the Sphinx searchd service.

Once the automated installer has completed it will create a file
called "install_service.com" inside your new Sphinx installation
directory (it will tell you where that is).

To install Sphinx as a service (this is necessary or Sphinx will
not run after a computer restart) run this file as per the installer's
instructions.

D) New Installation - Unix/Linux

You will need shell access or a terminal on the Unix/Linux machine
you are running CATS on to use the automated installer.

Note: You should be logged on as a user with write access to the
CATS folder. It is NOT SUGGESTED to be logged on as root.

Installation Steps:

1) Download "sphinx_for_cats.tar.gz" and extract it into your
CATS directory.

cd /path/to/cats
tar -zpxvf /path/to/sphinx_for_cats.tar.gz
cd sphinx_for_cats

2) Make the automated installer executable, then run it to complete
the installation.

chmod a+x install.sh
./install.sh

The automated installer will install Sphinx, the CATS module, all
configuration files and it will start the Sphinx searchd service.
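
Once the installer reports that searchd has been started, a quick connection test can confirm that something is actually listening. The following is only a minimal sketch, not part of the original INSTALL file. It assumes bash (it relies on bash's /dev/tcp redirection) and it assumes port 3312, which is taken from the "Connection to localhost:3312 failed" error reported later in this thread; your Sphinx configuration may use a different port. The function name port_open is made up for illustration.

```shell
#!/usr/bin/env bash
# Return 0 if a TCP connection to host:port succeeds, non-zero otherwise.
# Relies on bash's /dev/tcp redirection, which plain sh does not provide.
port_open() {
    local host=$1 port=$2
    # The subshell opens the connection on fd 3 and closes it on exit.
    (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

# ASSUMPTION: 3312 comes from the error message quoted in this thread;
# use the port from your own Sphinx configuration if it differs.
if port_open localhost 3312; then
    echo "searchd appears to be accepting connections on localhost:3312"
else
    echo "nothing is accepting connections on localhost:3312"
fi
```

If the second message is printed, searchd is probably not running (or is listening on another port), which would be consistent with a connection error from the CATS search page.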

If you use an init.d compatible service the installer will attempt to
install a startup script called "searchd". In most cases, this will
be unsuccessful (as you shouldn't be logged on as root).

If you have an init.d compatible system, follow these instructions
to install Sphinx as a startup service after you have run the
automated installer:

1) Log on as root

su

2) Copy the "searchd" init script (created in the current directory)
to /etc/rc.d/init.d (or wherever your init.d scripts go on your
system)

cp searchd /etc/rc.d/init.d

3) Run chkconfig to run Sphinx on startup.

chkconfig --add searchd
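
Step 2 above leaves the init.d location open ("or wherever your init.d scripts go"). As a rough sketch only, and not part of the original INSTALL file, the snippet below prints the first directory that exists from a list of candidates. The two candidate paths are assumptions about common layouts, and the helper name find_initd_dir is made up for illustration.

```shell
# Print the first directory from the argument list that exists.
# Returns non-zero if none of the candidates is a directory.
find_initd_dir() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# ASSUMPTION: these candidate locations are common defaults only;
# adjust the list for your distribution.
initd_dir=$(find_initd_dir /etc/rc.d/init.d /etc/init.d) || initd_dir=""
if [ -n "$initd_dir" ]; then
    echo "Copy the searchd script to: $initd_dir"
else
    echo "No init.d directory found; check your system's documentation."
fi
```

After running chkconfig --add, "chkconfig --list searchd" should show the runlevels in which the service is enabled.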

_____________________________________________________________________

Copyright (C) 2006 - 2007 Cognizo Technologies, Inc.
Let me know how it goes!
By MePHiSTY
#26
Sphinx will save you time...

I have a 2 GB index (240,000 resumes), and 2 seconds is a bad time.

Sphinx 0.9.8 is close to being released...
By alexukie
#27
Also I just realized that Russ sent me this link about Sphinx:
http://www.notsofaqs.com/catsdoc/doku.php

It covers the following (it applies to 0.6.1 only, so I don't know if it will work in 0.9.1):
* Sphinx Integration with CATS
* Automatically Parse and Add Resumes to CATS
* Add PDF, RTF, HTM Converters to CATS
* Bug Fixes/Mods to 0.6.1


Alex
By RussH
#30
Yep - I did wonder about that from HelpHand.. he had resume parsing working early on, and then the CATSONE guys introduced it as a 'paid for' feature later on.. Would be worthwhile digging into that one!
By asimbaig
#91
RussH wrote:Yep - I did wonder about that from HelpHand.. he had resume parsing working early on, and then the CATSONE guys introduced it as a 'paid for' feature later on.. Would be worthwhile digging into that one!
This is factually incorrect. Helphand introduced us to Sphinx. He signed an ICL (Individual Contributor License) agreement and gave us some code, maybe less than 200 lines of code (I can't remember). He NEVER worked on resume parsing. Period.

We introduced resume parsing much, much later, in Fall 07. It was written from the ground up and had a web service architecture (XML, SOAP) and some very, very difficult mathematical models. The parsing infrastructure, codenamed "resfly" (http://www.resfly.com), is over 30k lines of code and growing. That's what was introduced as a "paid" feature.

I respect open source, attribution, integrity. We don't take code from the community and introduce it as a commercial feature. Every single commercial feature had "hours" of discussions, just short of blood baths, within the CATS team. We took incredible care in making sure we don't take someone else's work and call it ours.

Asim
952-232-0880 x101
asim@catsone.com
By akandels
#114
RussH wrote:Yep - I did wonder about that from HelpHand.. he had resume parsing working early on, and then the CATSONE guys introduced it as a 'paid for' feature later on.. Would be worthwhile digging into that one!
I take offense to this completely unfounded accusation, as I wrote every single line of code for the Resfly parsing engine, which is enormous and was based on NO existing product by anyone -- simply several mathematical models and some theories of my own that I started in my own time as a skunk works project and later hacked into C++ and finally PHP. I worked and continue to work very hard on this project, and this statement is extremely insulting.

The perl script written by HelpHand "guessed" names so long as your file names matched the candidate name. Maybe an email here and there, but whose? The candidate's, or some reference they listed? And no offense to Helphand, but if you actually review his feature, it was extremely simple and resembles a basic automated script at best.

Resfly parsing is a HUGE project that can detect names, US addresses and phone numbers, email, skill sets, education, experience, etc. from the entire document regardless of format, filename, or placement within the file. It parses the webpages sent from our toolbar from sites like Monster, and it includes document conversion so that resume previews include markup like bold and color. It literally took months to develop and still isn't, and likely never will be, fully complete.

There are very few commercial parsers out there as good ones are extremely difficult to write. Companies like Sovren charge hundreds of thousands of dollars for SaaS parsing or up to 25c or more per resume parsed. Comparing a perl script that took at best an hour or two to write to a full fledged parsing solution is like comparing a skateboard to a jet airplane.
By RussH
#118
Andrew,

I'm obviously no coder or I'd have realised that it's difficult to create a natural language parser. If the code had been open, I'd have realised when I saw your 100,000 lines compared to that available from HelpHand.

Thanks for coming back to the community - and I see you've helped some people already. That's what we're here for.
By cdsmerybuck
#267
A big thank you for bringing the CATS forum back. Now that I've got access, I'm trying to install Sphinx. It seems to have installed OK, but as soon as I run a search I get a fatal error: Sphinx Error: Connection to localhost:3312 failed.

Can anyone point me in the right direction to fix this?

Thanks in advance
