Character encoding issues using OCR (namely: docx2txt, odt2txt)

Moderators: RussH, cptr13

6 posts

Re: Character encoding issues using OCR (namely: docx2txt, odt2txt) #5920

By MarcinP - 19 Dec 2024, 11:57

I've just tried the chat to no avail:

case DOCUMENT_TYPE_DOCX:
    $this->_rawOutput = mb_convert_encoding($this->docx2text($fileName), 'UTF-8', 'auto');
    if ($this->_rawOutput == null) {
        return false;
    }
    $this->_linesArray = explode("\n", $this->_rawOutput);
    $this->_linesString = $this->_rawOutput;
    return true;

case DOCUMENT_TYPE_ODT:
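    // Note: PHP variable names are case-sensitive, so this must be $fileName (as in the DOCX case), not $filename.
    $this->_rawOutput = mb_convert_encoding($this->odt2text($fileName), 'UTF-8', 'auto');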
    if ($this->_rawOutput == null) {
        return false;
    }
    $this->_linesArray = explode("\n", $this->_rawOutput);
    $this->_linesString = $this->_rawOutput;
    return true;

That does nothing.
I'll keep looking.
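
One possible reason the change above has no visible effect (a guess, not confirmed in this thread): when the source encoding is 'auto', mb_convert_encoding() picks from a detection list that depends on the mbstring.language setting, so text that is already UTF-8, or text in an encoding outside that list, can come back unchanged or wrongly converted. Below is a minimal sketch that checks for valid UTF-8 first and otherwise converts from one explicitly named encoding. The helper name toUtf8 and the ISO-8859-2 fallback are assumptions for illustration and are not part of OpenCATS:

```php
<?php
// Hypothetical helper (not part of OpenCATS): normalise extractor output to UTF-8.
// $fallback is an assumption; set it to the encoding your documents really use.
function toUtf8(string $text, string $fallback = 'ISO-8859-2'): string
{
    // Already valid UTF-8 (this includes plain ASCII): return it untouched,
    // because converting it a second time would double-encode the text.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }

    // Not valid UTF-8: convert from the explicitly named fallback encoding
    // instead of relying on 'auto' detection.
    $converted = mb_convert_encoding($text, 'UTF-8', $fallback);

    // Keep the original bytes if the conversion failed.
    return $converted === false ? $text : $converted;
}
```

In the snippet above this would stand in for the mb_convert_encoding(..., 'UTF-8', 'auto') call, for example toUtf8($this->docx2text($fileName)). Single-byte encodings cannot be told apart reliably by detection, which is why the fallback is passed in explicitly.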

Re: Character encoding issues using OCR (namely: docx2txt, odt2txt) #6032

By clogstweed - 27 Feb 2025, 08:48

Though I had some success, I still can't sort out the encoding when uploading data to a candidate profile. Mind you, when I check each of the OCR tools directly on the server, they all run well.
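
If the tools behave on the command line but the text is wrong once it reaches the candidate profile, it may help to find the stage at which the bytes stop being valid UTF-8. A small diagnostic sketch follows; the function name describeEncoding and its return labels are made up for illustration and are not part of OpenCATS:

```php
<?php
// Hypothetical diagnostic helper (not part of OpenCATS): classify a string
// so it can be logged after each processing stage.
function describeEncoding(string $text): string
{
    if ($text === '') {
        return 'empty';
    }

    // No bytes at or above 0x80: pure ASCII, so encoding is not the issue here.
    if (!preg_match('/[\x80-\xFF]/', $text)) {
        return 'ascii';
    }

    // High bytes present: either well-formed UTF-8 or some other encoding.
    return mb_check_encoding($text, 'UTF-8') ? 'utf-8' : 'not-utf-8';
}
```

Logging the result right after the extractor call and again just before the data is saved should show whether the extractor output is already wrong when run by the web server, or whether it gets damaged later on.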