CHAPTER 8 RECOGNIZING DOCUMENTS

INTRODUCTION

To recognize documents, Readiris applies linguistics during the recognition phase. As a result, Readiris recognizes text, tables, graphics, barcodes and handprinted text in all kinds of documents. Readiris even copes with complex columnized documents, low-quality documents, faxes, dot matrix printouts, and badly scanned or copied documents whose characters are too light or too dark.

Readiris supports 125 languages: all American and European languages, including the Central-European, Baltic and Cyrillic languages as well as Greek and Turkish. Optionally, Readiris can read Hebrew documents and four Asian languages: Japanese, Simplified Chinese, Traditional Chinese and Korean. Readiris even copes with mixed alphabets: the software detects “Western” words that occur in Greek, Cyrillic, Hebrew and Asian documents, where many untranscribable proper names, brand names etc. are written in Western characters.

Readiris is based on the most advanced recognition technologies. Font-independent text recognition is complemented by self-learning techniques. The system is able to learn new characters and words through contextual and linguistic analysis. This means that the OCR accuracy of the recognition system will improve as it goes along.

Besides that, Readiris has a user verification function. When activated, this function (Interactive learning) not only flags characters the recognition system isn't sure of but also lets you increase the system's accuracy. All solutions you confirm are memorized temporarily during recognition, which increases the system's speed and confidence as you go along. This powerful learning tool also allows you to train Readiris on special characters such as mathematical symbols and dingbats, and to handle distorted fonts.

 

The interactive learning results can also be stored permanently in font dictionaries for future use.

 

SELECTING THE DOCUMENT LANGUAGE

Readiris offers OCR in 125 languages. Readiris supports all American and European languages including the Central-European, Cyrillic and Baltic languages, as well as Greek and Turkish.

Readiris Pro Asian and Readiris Corporate Asian additionally recognize documents in Japanese, Simplified Chinese, Traditional Chinese, Korean and Hebrew.

In order for Readiris to recognize a document, the document language must be specified.

 

To do so:

Click the globe button on the main toolbar and select the language of your choice in the Primary language list.

Important: select the document language before executing page analysis when you are dealing with Asian or Hebrew documents. Specific page analysis routines are used for these documents.

Tip: if you want to recognize documents in multiple languages, select the language with the largest character set. E.g. if a document contains both English and French text, select French as the document language. This way, the accented characters will be recognized correctly.
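
The tip above amounts to a superset check on character sets. The following Python sketch illustrates the idea only; it is not Readiris code. The `CHARSETS` table and the function `pick_primary_language` are hypothetical, and the character inventories are simplified assumptions.

```python
# Simplified, hypothetical character inventories (assumptions for illustration).
BASIC_LATIN = set("abcdefghijklmnopqrstuvwxyz")
CHARSETS = {
    "English": BASIC_LATIN,
    "French": BASIC_LATIN | set("àâæçéèêëîïôœùûüÿ"),
    "German": BASIC_LATIN | set("äöüß"),
}


def pick_primary_language(languages, charsets=CHARSETS):
    """Return the language whose character set covers all the others.

    Returns None when no single language in the list covers the rest.
    """
    for candidate in languages:
        if all(charsets[other] <= charsets[candidate] for other in languages):
            return candidate
    return None


print(pick_primary_language(["English", "French"]))  # French
```

In this sketch a French and German mix returns None, because neither simplified character set contains the other.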

Recognition can also be limited to a Numeric character set to optimally recognize tables and figures. Readiris then recognizes only the numerals 0-9 and a limited series of symbols.


To activate numeric mode, select Numeric at the top of the Primary language list.

 

Recognizing documents with mixed languages

Readiris also allows you to enable mixed character sets. That way Readiris switches languages in the middle of a sentence automatically and recognizes English words (proper names etc.) that occur in "exotic" languages.

Click the globe button on the main toolbar and select the required language combination in the Primary language list.

Note: when processing Asian or Hebrew documents, mixed character sets are used automatically.

 

Selecting the language per page

When specific pages use a different language than the overall document, you don't need to define a secondary language. You can apply a different language to those pages.

Select the pages in the drawer, Ctrl-click them and use the Language command to assign those pages a language other than the overall document language.


Pages with a different language than the overall language are marked in red in the drawer.


Note: the tooltip of each page in the drawer indicates which language applies to that page.
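
The per-page behavior described above can be pictured as a lookup with a fallback to the document language. This is a minimal Python sketch under that assumption; the function `page_languages` and its arguments are hypothetical and do not correspond to any Readiris interface.

```python
def page_languages(page_count, document_language, overrides):
    """List (page number, language, is_override) for every page.

    `overrides` maps page numbers to a language that differs from the
    overall document language; pages without an entry fall back to it.
    """
    result = []
    for page in range(1, page_count + 1):
        language = overrides.get(page, document_language)
        result.append((page, language, language != document_language))
    return result


# Pages flagged True are the ones that would be marked in the drawer.
for page, language, is_override in page_languages(3, "English", {2: "German"}):
    print(page, language, is_override)
```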

 

DEFINING THE DOCUMENT CHARACTERISTICS

Besides the document language, other document characteristics such as the Font type and Character pitch play an important role in the recognition process.

Font type

Readiris distinguishes between "regular" and dot matrix printed documents. Dot matrix symbols (of the 9-pin type) are made up of isolated, separate dots.

Recognizing dot matrix documents requires special segmentation and recognition techniques, which must be activated.

To select the font type:

  •  On the Settings menu, point to Font type.
  •  The font type is set to Automatic by default.

That way, Readiris recognizes "25 pin" or "NLQ" (Near Letter Quality) dot matrix, or other "normal" printing.

  •  To recognize only dot matrix printed documents, click Dot matrix.

Readiris will recognize so-called "draft" or "9 pin" dot matrix printed documents. 

Character pitch

The character pitch is the number of characters per inch in a typeface. The character pitch can either be fixed, in which case all characters have the same width, or proportional, in which case the characters have different widths.

 

To select the character pitch:

  •  On the Settings menu, point to Character Pitch.
  •  The character pitch is set to Automatic by default.
  •  Click Fixed if all characters of the typeface have the same width.
    This is often the case in old typewriter documents.
  •  Click Proportional if the characters of the typeface have different widths. Virtually all fonts in newspapers, magazines and books are proportional.

Important: these document characteristics do not apply to Asian or to Hebrew documents.
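
To make the character pitch distinction concrete, the Python sketch below classifies a run of glyph widths as fixed or proportional and computes characters per inch. It is a toy illustration, not the detection method Readiris uses; the function names, the pixel widths and the `tolerance` parameter are all assumptions.

```python
def classify_pitch(glyph_widths, tolerance=1):
    """Return 'fixed' if all glyph widths agree within `tolerance`, else 'proportional'."""
    if not glyph_widths:
        raise ValueError("need at least one glyph width")
    spread = max(glyph_widths) - min(glyph_widths)
    return "fixed" if spread <= tolerance else "proportional"


def characters_per_inch(char_count, line_width_inches):
    """Pitch of a fixed-width line: number of characters divided by line width."""
    return char_count / line_width_inches


print(classify_pitch([12, 12, 13, 12]))   # fixed (typewriter-like widths)
print(classify_pitch([5, 12, 18, 9]))     # proportional
print(characters_per_inch(80, 8.0))       # 10.0
```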

USING INTERACTIVE LEARNING

Readiris offers an interactive learning function. By means of Interactive learning you can train the recognition system on fonts and character shapes, and correct the OCR results if necessary. During interactive learning, any characters the recognition system isn't sure of are displayed in a preview window, in combination with their parent word and the proposed solution.

Interactive learning can substantially enhance the accuracy of the recognition system and is particularly useful when recognizing distorted, defaced forms. Interactive learning can also be used to train Readiris on special symbols it is unable to recognize initially, such as mathematical and scientific symbols and dingbats.

 

To enable interactive learning:

  •  On the Learn menu, click Interactive Learning.
  •  Click the Recognize + Save button to recognize the document.
    Readiris enters the interactive learning phase.

The characters the recognition system isn't sure of are displayed.


If the results are correct:

o  Click the Learn button to save the result as sure.

The learning results are temporarily stored in the computer memory, for the duration of the recognition. Readiris will no longer display the learned characters when OCRing the rest of the document.

When a new document is OCRed, the learning results are erased.

To save learning results permanently, use a font dictionary. For more information, see the section Using font dictionaries.

o  Click Finish to save all solutions the software offers.

If the results are incorrect:

o  Type in the correct characters and click the Learn button.

 

Note: if your documents contain special characters, click the command Special Characters on the Edit menu, then double-click the characters you want to insert.

or

o  Click Don't learn to save the result as unsure.

Use this command for damaged characters which could be confused with other characters if learned.

E.g. the number 1 and the letter I, which have an identical form in many fonts.

o  Click Delete to delete characters from the output.

Use this button to prevent document noise from appearing in the output file.

o  Click Undo to correct mistakes.

Readiris keeps track of the last 32 operations.

o  Click Abort to abort interactive learning.

All learning results will be deleted. Next time you click Recognize + Save, interactive learning will start again.
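
The temporary nature of the learning results and the 32-step Undo can be modeled with a small in-memory structure. The Python sketch below is a hypothetical model for illustration; the class `LearningSession` and the notion of a "shape id" are assumptions, not Readiris internals.

```python
from collections import deque


class LearningSession:
    """Toy model of one interactive learning pass (not Readiris code)."""

    UNDO_LIMIT = 32  # the manual states that the last 32 operations are tracked

    def __init__(self):
        self.learned = {}  # shape id -> confirmed character, kept in memory only
        self.history = deque(maxlen=self.UNDO_LIMIT)  # older entries drop off

    def learn(self, shape_id, character):
        """Confirm a character for a shape and remember how to undo it."""
        self.history.append((shape_id, self.learned.get(shape_id)))
        self.learned[shape_id] = character

    def undo(self):
        """Revert the most recent learn operation; return False if none is left."""
        if not self.history:
            return False
        shape_id, previous = self.history.pop()
        if previous is None:
            self.learned.pop(shape_id, None)
        else:
            self.learned[shape_id] = previous
        return True

    def abort(self):
        """Discard all learning results, as Abort does."""
        self.learned.clear()
        self.history.clear()
```

A bounded `deque` is used here so that operations older than the limit are silently forgotten, which matches the idea of tracking only the most recent operations.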

USING FONT DICTIONARIES

 

When scanning many documents of the same type, font quality and printing quality, you may not want to repeat the learning process every time. Font dictionaries are useful in that case: they contain the font information learned during interactive learning and can substantially improve the recognition results.

Note that font dictionaries are limited to 500 shapes. It is recommended that you create separate dictionaries for specific applications.
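
The difference between temporary learning results and a permanent, size-limited font dictionary can be sketched as follows. This Python example is illustrative only: the class `FontDictionary`, the string shape ids and the JSON file format are assumptions, since the actual Readiris dictionary format is not described here. Only the 500-shape limit comes from the text above.

```python
import json

MAX_SHAPES = 500  # limit stated in the manual


class FontDictionary:
    """Hypothetical persistent store of learned shapes (not the Readiris format)."""

    def __init__(self):
        self.shapes = {}  # shape id (string) -> character

    def add(self, shape_id, character):
        """Store a learned shape, refusing new shapes once the limit is reached."""
        if shape_id not in self.shapes and len(self.shapes) >= MAX_SHAPES:
            raise OverflowError("font dictionary is full (500 shapes)")
        self.shapes[shape_id] = character

    def save(self, path):
        """Write the dictionary to disk so it survives the current session."""
        with open(path, "w", encoding="utf-8") as handle:
            json.dump(self.shapes, handle, ensure_ascii=False)

    @classmethod
    def load(cls, path):
        """Read a previously saved dictionary back into memory."""
        dictionary = cls()
        with open(path, encoding="utf-8") as handle:
            for shape_id, character in json.load(handle).items():
                dictionary.add(shape_id, character)
        return dictionary
```

Keeping one small dictionary per document type, as recommended above, also keeps each one well under the shape limit.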

 

To create a new font dictionary:

  •  On the Learn menu, click New Dictionary.
  •  Click Interactive Learning on the Learn menu to activate it.
  •  Click Recognize + Save to recognize the document.
  •  Readiris enters the interactive learning phase. Use the buttons of the dialog box to save characters in the font dictionary.
  •  When the recognition is completed, click Save to save the document.
  •  Then return to the Learn menu and click Save Dictionary to save it.
  •  Enter the name of the dictionary and click Save.

 

To use an existing font dictionary:

 

  •  On the Learn menu, click Open Dictionary.
  •  Select the dictionary you want to use and click Open.
  •  Click Recognize + Save to recognize the document.

 

 

Scanning Pens Ltd is part of the TJSC Group. Copyright © 2002-2011 Scanning Pens Ltd.