Provides: OCR (Optical Character Recognition)
Developer: IRIS
Minimum Requirements: Mac OS X v10.3 or later, 110MB free hard disk space, TWAIN compliant scanner
Retail Price: $129.99
There are several pleasures with this release of ReadIris OCR software, starting with some nice improvements coupled with the fact that Iris Software released these updates on the Mac before the PC. That notwithstanding, there are a few glitches in an otherwise outstanding program.
ReadIris is OCR (Optical Character Recognition) software that allows users to scan a printed page or open a PDF and turn any text on that page into standard, editable text. That is, you can take a page of text on a paper, scan it, run it through ReadIris, and work on that text in MS Word, Excel or any other computer program that handles text. For more on the OCR process and past OCR reviews, please see either (or both) of two reviews: here from 2000 and here from 2002. In addition, you can read my previous review of ReadIris 9.
Of all of the new features in this release, the most important one to me is that ReadIris can now understand super- and sub-scripting. It may not seem like much, but for scientific work, this is fundamental. This was a big hole in the features of the previous version, and its addition is a big boost. The other two big new features are that ReadIris can now "read" bar codes and (in a very limited fashion) can read hand-printed text. More on both of these later.
Superficially, ReadIris now has a "brushed aluminum" look (or not, as you can turn this off in the Preferences) and has somewhat updated icons in the tool bar. The full window looks as seen below after having scanned an article that appeared in a journal:

The icons across the top can be personalized by removing and/or re-ordering them. If you choose to re-set them, this can be done by clicking on the Customize icon (to the right, out of view in the image above).
There is significant redundancy throughout the program for controls, a feature I've always appreciated in any program. On the left-hand side are the Process controls divided into three sections. The top (which looks like a yellow package with a red ribbon) is for complete automatic processing and can be "On" or "Off." (The red ribbon is an "X" because this feature is currently in the "Off" mode.) If you are working with very standard consistent pages of text, this works great. If there are any variations where the human head is needed to make decisions, turn this off.
Below that are the Acquire buttons. The top button shows EPSON (I have an Epson scanner), which allows direct scanning into ReadIris (this requires TWAIN support). The other option on this button is to "Open" files such as a PDF, TIFF, JPEG or other electronic image document. The "Text" button's alter ego is a "Graphic" option (are you saving text or graphics?). The last icon is for tools, which are coupled with some of the icons across the top of the window. As seen below, this sets up four operations to run automatically. Page Deskewing applies a minor rotation so that the text is horizontal if the page was not perfectly placed on the scanner. Detect Page Orientation auto-rotates the page 90 or 180 degrees as needed. Page Analysis auto-selects the text and sections and determines what's a graphic, text, table, or other possible selection. Lastly, Despeckling automatically removes artifacts from a scan. While this may seem good, it can also remove the dots above "i" or "j" letters and/or periods (.). The difference between this automatic option and the "Adjust" icon across the top of the window is that the latter, in addition to despeckling, also lets you lighten or darken an image and set its contrast.
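To see why despeckling can eat the dot on an "i," here is a minimal sketch of one common approach: erase every connected blob of ink below a size threshold. This is not ReadIris's actual method, which is not documented here; the function name `despeckle`, the `min_pixels` parameter, and the tiny text "image" are all assumptions made up for illustration.

```python
from collections import deque

def despeckle(image, min_pixels):
    """Erase connected blobs of ink smaller than min_pixels.

    image is a list of equal-length strings: '#' is ink, '.' is paper.
    Illustrative only; not the algorithm any particular OCR product uses.
    """
    rows, cols = len(image), len(image[0])
    grid = [list(row) for row in image]
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != "#" or seen[r][c]:
                continue
            # Flood-fill (4-connected) to collect one blob of ink.
            blob, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == "#" and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Blobs under the threshold are treated as dirt and erased.
            if len(blob) < min_pixels:
                for y, x in blob:
                    grid[y][x] = "."
    return ["".join(row) for row in grid]

page = [
    "..#....",  # the dot of an "i"
    ".......",
    "..#...#",  # top of the stem, plus a stray speck at the far right
    "..#....",
    "..#....",
]
cleaned = despeckle(page, min_pixels=2)
```

In this toy page the stray speck and the dot of the "i" are both single pixels, so any threshold that removes one removes the other. That is the trade-off to keep in mind before leaving automatic despeckling on.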

The other group is the "Recognize" set. On the top is an earth icon, and it shows English. This is one of the greatest strengths of ReadIris: its language capabilities. ReadIris boasts it can read 123 languages, including Hebrew and various Asian languages (there are extra costs for the latter two language sets).
You can self-set any zone. This is done by selecting the "Zones" icon on the top of the window and selecting the type of zone you want ReadIris to set for that region of the page. Thus, for example, if there is text on the page that you want to remain as a graphic, you can select "Graphic Zones" and drag out a rectangle around that text. You might wish to do this if there were some complex mathematical formula on the page that you knew you could not reproduce in your word processing program. By maintaining this as a graphic, you would not have to worry about losing the structure of the formula.

Likewise, if there is part of the page that is set up as a table but without any lines for the column or rows, you could set that part of the page as a "Table Zone" so that it would be converted into a Table as opposed to simple text.
The Sort icon, just to the left of the Zones icon in the image above, is used to re-establish the order in which the paragraphs will be created. The zones are numbered in the order they are created. Thus, if you have a graphic on the top of a page that you had re-set with a different zone type, after resetting it would be placed on the bottom of the page unless you re-numbered the paragraphs. When you click on the Sort icon, all zones lose their numbers. You simply click on each zone in turn until you have re-established the zone order in which you want them to be recognized.
Once the article has been opened in ReadIris (and if you have the zones automatically set), unless you have absolutely straight text pages to convert, you are wise to review each page and see how the zones were established. As can be seen below, ReadIris determined that the graph lines in the top of this image were of a table, so it boxed that region in a "Table Zone." Similarly, it took the numeric axis identification data and marked each one of them as a separate text zone. For my purposes, I want to consider this entire region in red as a single Graphic Zone. Unfortunately, that means one has to select each separate zone and, using the Contextual Menu (or a right-click), select "Delete Zone." There is no way to select multiple zones at one time, so each zone has to be selected and deleted individually.

Another problem with automatic zone setting can be seen in the image below. There, you can see that the text block on the right has merged into the table on the left and the legend for that graph. Fortunately, when you click on any zone, resizing handles appear on the mid-points and corners. Thus, I had to drag and re-set the Text Zone so that it was solely around the text block on the right. You do have to be a bit careful, as it is not hard to jiggle the zone a bit off its original location. As you can see below, the zone is created just at the very edge of the text. If you happen to move the zone off just a hair, the text within will not be "in the zone" and errors will be created because you will not have fully shaped letters for analysis.

ReadIris will merge each line of text into its respective paragraph, but occasionally there are glitches that will need to be repaired. As seen below, a paragraph of text is wrapped around a graphic. The curious thing is that when this is processed, a new paragraph will be started on the word "component" and not a continuation of the paragraph ending with the word "every."

There are two divisions when processing ReadIris: the first is identifying the material to be processed, the second is how the material will be processed. To the right of the Recognize button can be seen an Earth. Here, you can establish what language you want to process. ReadIris can recognize over 120 languages. There is also "Numeric," used for reading spreadsheets of numeric data. Below that is the Learning icon (a graduation hat). This is used when you have a unique font that ReadIris is not familiar with. As you "teach" ReadIris each of the letters, your need to interact will decrease. Lastly, there is the Output Format icon.
Output can either be RTF (Rich Text Format), PDF, HTML (for web sites) or Unicode (a 16 bit character space, good for multiple character types on the web and also good for straight text with no formatting whatsoever). If, for Layout, you select "Create body text," you get text with no images. If you select "Retain word and paragraph formatting," all of the images will be placed in the zone order they were set. If you select "Recreate source document," ReadIris does an amazing job of recreating the original document. However, if you plan on doing any reformatting of this document (like selecting a smaller font size or any action that will change the layout structure of the original) you will have much grief. This is because each paragraph will be placed in its own text box in Word and any text flow is not possible. Because of this, selecting "Retain word and paragraph formatting" is the most practical unless you have no reason to alter the text after you convert the document.

Please note the small selection on the bottom-middle of the image above where it says "Page Sizes..." Select this early. ReadIris is a Belgian product, and the default page setting is A4. When you go to print, you will be told your printer is out of paper. What it is out of (if you are in the US) is A4 paper. This can be changed per document in the Page Setup... selection (under the File menu), but it is much more efficient to reset the page size in this button.
What's missing in ReadIris 11 that existed in ReadIris 9 is the ability to set default fonts. While ReadIris does a good job recreating the fonts in any given document, you can't control what the font will end up as when you are creating general body text.
Once all the zones are established, the user clicks on the "Recognize" button on the bottom left region of the window. Wisely, ReadIris does not provide a "way-station" for the text prior to transferring the data into a word processing program. That is, in some OCR programs, after the text is recognized, the user is presented with the complete text within the OCR program for correction. It's been my experience that no OCR program could perform spell corrections as well as a formal word processing program. With ReadIris, text export is via RTF, and you can open this up in Word or any word processing program for final correction.
Despite all of the problems I've mentioned so far, at this point, ReadIris does two things wonderfully: it processes the text at blazing speed, and it's amazingly accurate. There are the occasional problems with the letter "i" being read as "l," and I found that despite ReadIris doing a good job on sub- and super-scripting, negative exponents would occasionally not have the hyphen super-scripted (e.g., 10⁻⁴ would be read as 10-⁴). Similarly, I also noted that endnote numbers were not consistent (e.g., "...according to Rosenblatt3, 4 ." would end up as "...according to Rosenblatt3, 4 .").
Another problem area is hyphens at the end of a line of text when the line breaks end up different from those in the original text. This is extremely common if you are taking text from (for example) narrow columns such as newspaper columns. When you bring text like this into Word, all hyphens at the end of lines show up as Optional Hyphens in the middle of lines. These look different than standard hyphens, as they have a foot on them. You can see one of these in the image below. The good news is that these only show up if you have the paragraph symbol set to display.

[If you want to globally remove Optional Hyphens in Word, bring up the Find dialog and click on the Replace tab. In the Find field, type in "^-" (without the quotation marks). In the replace field, type in nothing. Now press the "Replace All" button. The good news is that now any necessary hyphens will show up as misspelled words and are easy to repair.]
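If you end up with the text outside of Word, the same cleanup can be scripted. The sketch below assumes the optional hyphens arrive in the exported plain text as the Unicode soft hyphen character (U+00AD); that is an assumption about how your export behaves, not something ReadIris documents, and the function name `strip_soft_hyphens` is made up for illustration.

```python
SOFT_HYPHEN = "\u00ad"  # assumption: optional hyphens are exported as U+00AD

def strip_soft_hyphens(text: str) -> str:
    """Remove every soft (optional) hyphen, leaving real hyphens alone."""
    return text.replace(SOFT_HYPHEN, "")

sample = "each compo\u00adnent of a well-known sys\u00adtem"
print(strip_soft_hyphens(sample))
```

As with the Word approach, ordinary hyphens such as the one in "well-known" are untouched, and any word that genuinely needed its hyphen will still turn up in a spell check.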
However, when I talk about ReadIris being accurate, I mean it (assuming you are working with a good-quality scan): a ten-page document might end up with fewer than five errors. Keep in mind, though, that if the document has any super- or sub-scripted text, the issues I previously mentioned will introduce additional errors.
One of the two big features for ReadIris 11 is the ability to read bar codes. This has been around in the ReadIris world with their IrisPen (see my review) for some time, and now it has been brought into ReadIris. (If you get the corporate version, you can also read business cards directly into your PIM programs, just as you can with CardIris (see my review).)
The big big feature of ReadIris turns out to be a bit of a letdown. What people have wanted for years is the ability for OCR programs to read handwriting. Sadly, ReadIris cannot do that at all. Hand-printing, on the other hand, it can, but with major caveats. To perform this, you need to first print out special printing sheets that ReadIris provides in both PDF and Word format. These sheets, in either portrait or landscape view, consist of rows and columns of yellow, orange, or blue squares. Once these are printed out, the user must print capital letters within the little boxes. It is from these that ReadIris can read hand-printing.
Perhaps you have already had to fill in special boxes when filling out forms. It's not fun, and this isn't either. Accuracy isn't very good (but it's not fair to judge based on my hand printing), but suffice it to say that this is a work in progress.
I also have to cover one other aspect in my review, and that is the issue of obtaining updates and getting support. If you go to the ReadIris site, there is no mention as to which version is the most recent version, nor is there anything to download to update your program. If you go to versiontracker.com and you click on the Download button, you are brought to the ReadIris site. If you contact ReadIris support, they will tell you to go to versiontracker.com to get the update.
To properly obtain updates, what you are supposed to do is go to the Help menu and select "Check for Updates." When I did this with version 11.0.1, I was told there were no updates. When I contacted support, it took about four days of cross e-mails before I finally was provided an ftp link to obtain the 11.0.3 version which did successfully get the "Check for Updates" to work so I could update to 11.0.5.
Over the years, I've gone through a lot of software and have gone through many more updates. To have the program provide updates from within the program is not truly rare, but when the system breaks down (as it did for me), the whole process fails miserably. Because of this, ReadIris has about the worst update process I've encountered, and their support had me go through more hoops than I've normally had to go through to get assistance.
In short, there is much to like and much to be frustrated with in ReadIris 11. On one hand, it is the only OCR package for the Mac right now, so I don't want to slap its hand that much. But of the three main new features (bar code reading, hand-printing reading and super- and sub-script reading), the first is only practical if the bar code can be placed on a flat scanner, the hand-printing is only practical if you have the special sheets to work with and you have the time to carefully print, and the super- and sub-scripting is almost guaranteed to require fixing. When you couple that with their poor support for updating the program, it leaves you frustrated. I also implore ReadIris to let the user select more than one zone at a time, preferably by marqueeing around a group of zones.
All that notwithstanding, when it comes to basic OCR, ReadIris is blazingly fast and incredibly accurate.
Applelinks Rating

___________ Gary Coyne has been a scientific glassblower for over 30 years. He's been using Macs since 1985 (his first was a fat Mac) and has been writing reviews of Mac software and hardware since 1995.