Latest OCR software combines Thai, European (Latin/Roman), Hebrew, Chinese/Japanese/Korean (CJK), Cyrillic, Greek, and Armenian languages



ABBYY FineReader Engine First to Combine Extensive Language Support in a Single, Comprehensive Software Development Kit 

ABBYY, a world leader in the development of document recognition, data capture and linguistic technologies, has announced the newest edition of the ABBYY FineReader Engine Software Development Kit (SDK), which is the first on the market to combine OCR (optical character recognition) for Thai, European (Latin / Roman), Hebrew, Chinese / Japanese / Korean (CJK), Cyrillic, Greek, and Armenian languages in a single SDK.

The new version, which delivers breakthrough features such as the ability to export documents to PDF/Archive (PDF/A) format, provides an intelligent approach for tuning the speed and accuracy ratio for OCR and PDF conversion to give developers unmatched choice and flexibility.

PDF/A Export

FineReader Engine allows documents to be exported to the newly developed searchable PDF/A format for long-term electronic preservation of page-oriented documents. The format provides reliable data exchange in enterprise and government environments and promises to become the primary document storage standard. It has already been widely adopted by national archives, records management divisions and agencies, state ministry archives and other influential organizations.

"ABBYY is dedicated to continuing to evolve its technologies to support the latest standards," said Alexander Rylov, ABBYY Chief Product Manager. "With ABBYY's rich linguistic and artificial intelligence expertise along with the addition of the PDF/A archiving format to FineReader Engine, ABBYY reinforces its status as a clear leader providing the most advanced SDK for document conversion and data capture."

Thai and Hebrew OCR

The newest edition of ABBYY FineReader Engine now processes documents in Thai and Hebrew by means of its proprietary linguistic technology and advanced OCR.

The Thai language, used by over 70 million people, is one of the most difficult for accurate natural language processing. It has about 80 characters including consonants, vowels, diacritics, and numerals. Thai text can occupy up to four vertical levels, with vowels written behind, over, under or around the consonants and diacritics located over and under the basic characters. In addition, there are no spaces between words. ABBYY's unique technology detects characters and separates text strings from each other to provide reliable recognition, and is 50% more accurate than competing Thai OCR.
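The stacking of Thai vowels and tone marks over and under consonants can be illustrated with Python's standard `unicodedata` module. This is only a minimal sketch of how Unicode classifies these characters, not a description of ABBYY's recognition technology; the helper name `thai_clusters` and the sample phrase are assumptions chosen for illustration.

```python
import unicodedata

def thai_clusters(text):
    """Group each base character with the non-spacing marks stacked on it.

    Unicode assigns category "Mn" (non-spacing mark) to the Thai vowels and
    tone marks that are drawn above or below a consonant.
    """
    clusters = []
    for ch in text:
        if unicodedata.category(ch) == "Mn" and clusters:
            clusters[-1] += ch      # attach the mark to the preceding base
        else:
            clusters.append(ch)     # start a new cluster
    return clusters

# "thi ni" ("here"): six code points written with no space between syllables
sample = "\u0e17\u0e35\u0e48\u0e19\u0e35\u0e48"
for cluster in thai_clusters(sample):
    print([unicodedata.name(ch) for ch in cluster])
```

Each printed cluster contains one consonant followed by a vowel and a tone mark that sit above it, which is why a recognizer cannot treat a Thai text line as a single row of glyphs.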

Meanwhile, Hebrew, used by approximately 9 million people worldwide, is written from right to left using the Hebrew alphabet, while numbers are written in the opposite direction, from left to right (most Hebrew texts today use European digits). In addition, texts in Hebrew often include words in English and other left-to-right languages. ABBYY FineReader Engine's bi-directional recognition overcomes the challenge of processing Hebrew texts for OCR in both directions simultaneously within a single document.
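The mix of directions described above is visible in the Unicode character database, which assigns every character a bidirectional class. The following minimal sketch uses Python's standard `unicodedata.bidirectional` to split a mixed string into directional runs. It illustrates the text property only and makes no claim about how FineReader Engine works internally; the function name `bidi_runs` and the sample string are assumptions.

```python
import unicodedata
from itertools import groupby

def bidi_runs(text):
    """Split text into runs of consecutive characters sharing a bidi class.

    Common classes: "R" (right-to-left letter, e.g. Hebrew), "L" (left-to-right
    letter), "EN" (European number), "WS" (whitespace).
    """
    return [
        (bidi_class, "".join(chars))
        for bidi_class, chars in groupby(text, key=unicodedata.bidirectional)
    ]

# A Hebrew word, European digits and a Latin word, in logical (typing) order
sample = "\u05e9\u05dc\u05d5\u05dd 13 OCR"
for bidi_class, run in bidi_runs(sample):
    print(bidi_class, repr(run))
```

The Hebrew word is reported as class `R`, the digits as `EN` and the Latin word as `L`, so a single line of text already contains three runs that must be ordered differently for display and for output.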

"This is much more than just support for two additional OCR languages," said Alexander Rylov. "ABBYY is proud to have reached a new level in developing recognition technologies in order to overcome the most challenging recognition hurdles, such as those posed by Thai and Hebrew."

Other Featured Enhancements

Featured upgrades to ABBYY FineReader Engine also include:

  • Extended CJK export to PDF and RTF: Expanded export capabilities for Chinese/Japanese/Korean (CJK) documents to PDF and RTF, with retention of vertical text and complex layouts.
  • Tuning of PDF conversion accuracy and speed balance: Developers can now select one of four modes for balancing PDF conversion accuracy and speed to match their specific processing requirements.
  • Balanced recognition mode for OCR: In addition to the Accurate and Fast modes, the new Balanced recognition mode provides a middle ground between recognition speed and accuracy. These pre-defined modes allow developers to quickly select the quality and speed ratio that best suits their project's requirements.
  • Support for EAN 13 supplemental barcode and MICR CMC-7 font: The EAN 13 supplemental barcode remains the standard in the publishing industry for encoding ISBN numbers on books. CMC-7 processing provides high accuracy when recognizing bank checks and institutional payment remittances.
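As an illustration of the ISBN encoding mentioned in the last item, an application that receives a recognized barcode value could verify it with the standard EAN-13 check digit. The sketch below covers only the 13-digit main symbol, not the supplemental add-on, and is not part of the FineReader Engine API; the function names are assumptions.

```python
def ean13_check_digit(first12: str) -> int:
    """Compute the EAN-13 check digit from the first 12 digits.

    Digits are weighted 1, 3, 1, 3, ... from the left; the check digit
    brings the weighted sum up to the next multiple of 10.
    """
    if len(first12) != 12 or not (first12.isascii() and first12.isdigit()):
        raise ValueError("expected exactly 12 ASCII digits")
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return (10 - total % 10) % 10

def is_valid_ean13(code: str) -> bool:
    """Return True if a 13-digit string carries a correct check digit."""
    if len(code) != 13 or not (code.isascii() and code.isdigit()):
        return False
    return ean13_check_digit(code[:12]) == int(code[12])

# ISBN 978-0-306-40615-7 written as a 13-digit EAN
print(is_valid_ean13("9780306406157"))  # True
print(is_valid_ean13("9780306406158"))  # False
```

A check of this kind catches single misread digits before a barcode value is passed on to a catalogue or ordering system.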
     

About ABBYY FineReader Engine

ABBYY's platform for computer intelligence technologies includes all the technologies needed for developing state-of-the-art data capture, document conversion, archiving, and content/document management systems. The FineReader Engine SDK enables developers to build applications of any scale and complexity: from client-oriented solutions to server-based distributed projects.

 
Availability and Pricing

ABBYY FineReader Engine is available through ABBYY's network of reseller partners. FineReader Engine is sold via a flexible, modular licensing policy. Developers may select the best combination of tools and pricing options for their project: they can choose only the functions and features they need. Pricing varies according to the number of pages processed.

A special time-limited trial version is also available for testing. Information on licensing models, pricing, and other technical information is available from your local ABBYY office.


About ABBYY Software House

ABBYY (ABBYY Software House) develops linguistic and artificial intelligence (AI) software, providing a full line of document recognition, conversion and data capture technologies and products. ABBYY's products include: FineReader OCR systems, a family of end-user programs and development tools for recognition of printed text, tables and forms; FormReader, an ICR program for recognition and processing of hand-printed forms; and ABBYY FlexiCapture Studio, a software tool that helps to extract data from semi-structured forms and documents. Companies that license ABBYY recognition technologies include Anoto (C-Technologies), Autonomy (Verity/Cardiff), Banctec, BenQ, DICOM (Kofax, LCI, Neurascript), EMC (Captiva, SWT, Documentum), EPSON, Freedom Scientific, Fujitsu, Hewlett-Packard, Kurzweil, Microtek, NewSoft, Notable Solutions, Stellent, Panasonic, ReadSoft, Samsung Electronics, Saperion, Siemens Nixdorf, Sumitomo Electric Systems, and Toshiba. ABBYY is headquartered in Moscow with offices in Ukraine (Kiev), the USA (Fremont, CA), the UK (Bishop's Stortford, England), Germany (Munich) and Japan (Tokyo).

 
