ABBYY Software House has announced FineReader Engine 8.0, the latest platform release of its powerful recognition SDK. By integrating full page recognition, field-level recognition, PDF conversion and data capture capabilities in one SDK, Engine 8.0 essentially provides a single source for developers to integrate ABBYY's technology into a variety of DMS/ECM applications including: document/content processing, classification, indexing, archiving, document/PDF conversion, forms processing, and data capture from semi-structured forms and documents.
For the first time, FineReader Engine 8.0 addresses key new audiences with major field-level recognition enhancements, making it an ideal platform for applications such as keyword indexing and document classification, control and verification systems, and data extraction from different documents by intelligent analysis (cheques, invoices, passports). These features, combined with enhanced PDF conversion and new customisation tools for added developer support, make FineReader Engine 8.0 the most accurate and comprehensive software development kit for document conversion and data capture. Unlike any other toolkit in its class, FineReader Engine includes all the key functionality needed to support today's DMS and ECM applications.
ABBYY FineReader Engine 8.0 supports 189 OCR languages and 91 ICR languages, OMR, plus 1D and 2D barcodes. The new version delivers an overall boost in recognition accuracy, enhancements to field-level recognition, new document analysis tools, and new features such as full-text index preprocessing, making it applicable to a range of tasks. It also provides programming-specific tools to aid developers in creating accurate and efficient applications, such as external Voting API support (for solutions with multiple engines) and lower-level access for on-the-fly recognition tuning. Developers can also take advantage of the database of code samples, complete with sample images and benchmark data for common use cases. ABBYY also offers professional services and works closely with its developer community to help achieve the optimal balance between speed and accuracy for each particular application.
Overall recognition enhancements
- OCR accuracy enhancement. ABBYY FineReader Engine 8.0 delivers a significant increase in overall recognition accuracy, with up to 30 percent accuracy improvement for "difficult-to-read" images such as faxes and documents scanned at low resolution.
- Fast Mode for ICR. FineReader Engine also offers an option that makes field-level ICR up to two times faster.
- Adaptive image pre-processing for camera images. The new technology applies different processing algorithms and corrects specific image distortions typically seen in digital camera images. This provides an improvement of up to 40% in digital camera OCR (compared to previous versions of the technology).
Field-level/zonal recognition improvements
FineReader Engine 8.0 contains a complete set of field-level recognition functions that use OCR, ICR, OMR or barcode recognition to extract text or data from specified zones or snippets of images. Special enhancements in 8.0 improve accuracy and speed on small fields/zones.
These improvements include:
- Fast mode ICR, performing ICR up to two times faster.
- Better text extraction from the fields, even when text is overlapped with field lines.
- Detection of in-field spacing, accurately recognising fields where spaces are allowed. Version 8.0 also includes dictionaries which may contain word combinations with spaces.
- Intelligent processing of blocks with intersecting parts and lines, recognising only the text (words and symbols) that is completely located within the block borders, without spending time on recognising non-relevant text blocks.
- Text block despeckle, with the ability to specify the size of white or black "garbage".
- Voting API, supplying word-level and character-level hypotheses for subsequent voting scenarios.
- "On-the-fly" recognition tuning, allowing integrators to influence hypothesis choice by inserting additional ranking criteria during the recognition process.
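The text block despeckle item in the list above can be illustrated with a small sketch. This is not ABBYY's implementation or API: the function name `despeckle`, the `max_speck_size` parameter and the 0/1 pixel grid are assumptions made purely for illustration. The idea is to find connected groups of "garbage" pixels and erase those no larger than a caller-specified size.

```python
from collections import deque

def despeckle(image, max_speck_size, speck_value=1):
    """Erase small connected groups of pixels from a binary image.

    image: list of rows, each a list of 0 (white) or 1 (black).
    max_speck_size: groups with at most this many pixels are erased.
    speck_value: 1 to remove black specks, 0 to remove white specks.
    All names here are hypothetical, not part of any ABBYY API.
    """
    if not image:
        return []
    rows, cols = len(image), len(image[0])
    result = [row[:] for row in image]
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != speck_value or seen[r][c]:
                continue
            # Breadth-first search collects one 4-connected component.
            component = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and image[ny][nx] == speck_value
                            and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Flip components that are small enough to count as garbage.
            if len(component) <= max_speck_size:
                for y, x in component:
                    result[y][x] = 1 - speck_value
    return result
```

With a size limit of 1, an isolated single pixel is erased while a 2x2 block of text pixels is left untouched.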
Full page recognition/document (PDF) conversion features
With significant technology enhancements, ABBYY FineReader Engine 8.0 offers higher performance and a recognition rate up to twice as fast when converting source PDF files. With extensive functions for both PDF input and output, version 8.0 also provides developers with powerful new tools to create PDF conversion applications (including PDF to a variety of formats, or image to searchable PDF).
Enhanced PDF conversion (PDF input)
- More accurate and up to 2 times faster PDF processing
When processing PDF files, ABBYY FineReader Engine determines whether or not text is embedded, examines the integrity of the text layer, and analyses internal information within the source PDF files (i.e. annotations, meta-data, text objects, font dictionaries and content streams). Using all this information, it decides whether to extract the text or apply OCR. It examines each block individually and selects the most appropriate method to apply to each block. This process ensures more accurate and faster PDF conversion.
- Extraction of internal PDF links and hyperlinks
- Compliance with security settings of source PDF files
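The per-block choice between text extraction and OCR described above can be sketched as follows. This is only an illustration of the decision logic as the announcement describes it; the `PdfBlock` type, its two flags and the function names are hypothetical and do not reflect the actual FineReader Engine API or the full set of signals it analyses.

```python
from dataclasses import dataclass

@dataclass
class PdfBlock:
    """Hypothetical summary of what is known about one block of a PDF page."""
    has_text_layer: bool      # is text embedded for this block?
    text_layer_intact: bool   # did the integrity check of that text pass?

def choose_method(block):
    """Return "extract" when the embedded text can be trusted, else "ocr"."""
    if block.has_text_layer and block.text_layer_intact:
        return "extract"
    return "ocr"

def plan_page(blocks):
    """Decide block by block, so one page can mix both methods."""
    return [choose_method(block) for block in blocks]
```

Deciding per block rather than per file means that a page with a sound text layer and one scanned illustration only pays the cost of OCR for the illustration.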
Enhanced PDF output
- PDF security settings and encryption support. ABBYY FineReader Engine 8.0 supports open and permission passwords for output PDF files, allowing users to restrict printing, editing, or extraction of file content, making it well-suited for professionals working in government ministries and other organisations demanding high security. It also supports RC4-based encryption from 40-bit to 128-bit, and AES (Advanced Encryption Standard)-based 128-bit encryption.
- Tagged PDF. In addition to output to a variety of searchable PDFs and image-only PDFs, version 8.0 now creates Tagged PDFs that allow text to be reflowed to fit different page or screen widths. This makes it easy to generate PDF files that are optimised for viewing on handheld devices and accessible to the screen readers typically used by the visually impaired community.
- Meta-data for PDF files. It is possible to add the following meta information during the PDF Export: bookmarks, hyperlinks, and document properties.
Document analysis for full text indexing
This feature supports automatic detection and recognition of text on an image including the text embedded in pictures, charts, and diagrams. Document Analysis for Full Text Indexing provides exhaustive information on text that is vital for further document index building. This makes FineReader Engine 8.0 truly indispensable for indexing solutions (for building an index in/for DMS, CMS, archiving systems).
Data capture from semi-structured forms and documents
The new ABBYY FineReader Engine offers semi-structured form and document processing through support for the latest ABBYY FlexiCapture Studio 1.5 tool. This makes form and semi-structured document processing even more accurate and minimises the number of adjustments required for each project.
New features supported by FlexiCapture Studio 1.5 include:
- Table element support, enabling proper reading of tables in documents, providing easy extraction of line-item details. Ideal for processing invoices and other financial documents.
- Specialised numerical element support, adding new "Phone" and "Currency" element types that streamline the description of this type of data on the form and thus increase capture quality.
- Texture filtering, offering enhanced pre-processing technologies that screen out irrelevant texture that may affect recognition quality.
- Multiple language selection for pre-recognition, enabling the pre-selection of mixed-language combinations, for example English-German, for easier processing of multiple language documents.
Development platform function enhancement
Sample code for maximum performance and efficiency
The new SDK is supplied with a database of common Engine Usage Samples which help to tune FineReader Engine for each particular project in the most appropriate way. This is a set of "ready-to-load" profiles with the optimal balance of speed and accuracy. The profiles are designed for particular tasks such as field-level recognition, archiving with imaging and indexing (e.g. searchable PDFs), full-page conversion to RTF and HTML, etc. The database also contains sample images and benchmarks.
External voting algorithm support
When FineReader Engine is used as one of the participating engines in a third-party application, it supplies recognition alternatives (or hypotheses) with the relevant confidence levels for characters, words and inter-character separation. This information helps developers design an efficient and accurate voting algorithm. For example, when recognising "O", FineReader Engine may return three hypotheses: "0" (zero) with 60% confidence, capital "O" with 80%, and capital "C" with 10%. Another example, for inter-character separation: "m" can have the hypotheses "m", "rn", or "in".
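As a rough illustration of how such hypotheses might feed a voting scheme, the sketch below sums the confidence that each participating engine assigns to each candidate and picks the highest total. The summing rule, the function name `vote` and the data layout are assumptions chosen for simplicity; they are not the Voting API itself, and the second engine's figures are invented for the example.

```python
from collections import defaultdict

def vote(engine_results):
    """Pick the candidate with the highest summed confidence.

    engine_results: one list per engine, each holding
    (text, confidence_percent) hypotheses. Hypothetical layout.
    Returns (text, total) or None when no hypotheses were supplied.
    """
    totals = defaultdict(int)
    for hypotheses in engine_results:
        for text, confidence in hypotheses:
            totals[text] += confidence
    if not totals:
        return None
    return max(totals.items(), key=lambda item: item[1])

# Confidences from the example in the text, plus an invented second engine.
finereader = [("O", 80), ("0", 60), ("C", 10)]
other_engine = [("0", 70), ("O", 40)]
winner = vote([finereader, other_engine])
```

Here the zero wins with a total of 130 against 120 for the capital "O", even though FineReader Engine alone ranked the letter first. A production voting algorithm would likely weight engines differently or normalise their confidence scales.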
On-the-fly core recognition tuning
The version 8.0 SDK gives developers the access and ability to manipulate the engine at core level during the recognition process. The FineReader recognition engine generates hypotheses (or recognition alternatives) and allows developers to influence or fine-tune the procedure of setting the confidence level for each hypothesis (or selecting the best hypothesis) using their own specific ranking criteria.
"Our developer customers want to use FineReader Engine to enhance their ISV applications with document conversion and data capture capabilities that deliver the optimal balance between accuracy and speed," explained Alex Rylov, Chief product manager for ABBYY's technology licensing products.
"FineReader Engine 8.0 delivers a powerful combination of core technologies, and builds upon that by delivering productivity tools such as diagnostic tools, pre-defined samples for the popular processing scenarios, and a Voting API and recognition tuning. We give our customers the tools they need to significantly influence their productivity while our technical teams can work closely with theirs to help them achieve their ideal levels of performance -- whatever the application."
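The idea of inserting an additional ranking criterion can be sketched as a re-ranking step over a list of hypotheses. This is a simplified stand-in, not the SDK's tuning interface: `rerank`, `numeric_field_bonus` and the 30-point bonus are hypothetical, and the real engine applies such criteria during recognition rather than afterwards.

```python
def rerank(hypotheses, criterion):
    """Adjust each hypothesis by a caller-supplied criterion and re-sort.

    hypotheses: list of (text, confidence_percent) pairs.
    criterion: function mapping text to a confidence adjustment.
    Both names are illustrative, not part of any ABBYY API.
    """
    adjusted = [(text, confidence + criterion(text))
                for text, confidence in hypotheses]
    # Highest adjusted confidence first.
    return sorted(adjusted, key=lambda pair: pair[1], reverse=True)

def numeric_field_bonus(text):
    """Example criterion: the field is known to hold digits only."""
    return 30 if text.isdigit() else 0

ranked = rerank([("O", 80), ("0", 60), ("C", 10)], numeric_field_bonus)
```

With the bonus applied, the zero rises to 90 and overtakes the capital "O" at 80, which is the kind of outcome an integrator would want for a field known to contain only numbers.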
Input/output formats support for all types of functions
ABBYY FineReader Engine supports a variety of input image formats (including BMP, PCX, DCX, JPEG, PNG, TIF and PDF) and document saving formats (including DOC, RTF, PDF, HTML, PPT, TXT, XLS, DBF, and three types of XML). The new version also supports two new input formats, GIF and DjVu, which are very useful for web publishing, online archiving, spam filtering and other web-related tasks.