Re: Character encoding issues using OCR (namely: docx2txt, odt2txt)

Posted: 19 Dec 2024, 11:57
by MarcinP
I've just tried the chat to no avail:
Code:
case DOCUMENT_TYPE_DOCX:
    $this->_rawOutput = mb_convert_encoding($this->docx2text($fileName), 'UTF-8', 'auto');
    if ($this->_rawOutput == null) {
        return false;
    }
    $this->_linesArray = explode("\n", $this->_rawOutput);
    $this->_linesString = $this->_rawOutput;
    return true;

case DOCUMENT_TYPE_ODT:
    $this->_rawOutput = mb_convert_encoding($this->odt2text($fileName), 'UTF-8', 'auto');
    if ($this->_rawOutput == null) {
        return false;
    }
    $this->_linesArray = explode("\n", $this->_rawOutput);
    $this->_linesString = $this->_rawOutput;
    return true;

That does nothing.
I'll keep looking.

Re: Character encoding issues using OCR (namely: docx2txt, odt2txt)

Posted: 27 Feb 2025, 08:48
by clogstweed
Though I had some success, I can't manage to sort out the encoding when uploading data to a candidate profile. Mind you, when I check the OCR tools on the server, they all run well.