Wednesday, July 07, 2004

OS X Odyssey 591 - Trapeze 1.0.4 PDF To RTF, ASCII Or Plain Text Conversion Utility

Trapeze is a drag-and-drop text extraction utility that converts PDF files to editable RTF, ASCII or plain text, with options for white space stripping, paragraph rewrapping and page break marking. Additional features include: advanced text encoding (e.g. smart quotes, bullet icons, copyright and trademark symbols, etc.), targeted RTF optimization (TextEdit or Microsoft Word), and support for encrypted (password protected) PDFs.

Trapeze's reformatting engine is capable of high-quality RTF conversion (in some cases a near facsimile of the original) and faithful plain text reproduction.

I receive a lot of press releases and other research materials in PDF format these days, and while OS X's excellent Preview PDF application has made PDFs a lot less of a pain than they used to be, cutting and pasting from PDFs is still cumbersome. Trapeze sounded like a promising solution.

When you launch Trapeze, a Drop window appears into which you can drag and drop PDF files from the Finder. The Drop window also allows you to set the desired conversion format and options (you can also set a preference to prevent the Drop window from being shown).

Drag and drop a PDF file from the Finder into the Drop window to begin the conversion process. If the Drop window is not visible, select Convert from the File menu to specify which PDF file to convert. If you drop (or specify) multiple files, they will be converted in a batch.

To convert files from outside Trapeze, drag and drop a PDF file from the Finder onto the Trapeze application icon. If Trapeze is not running, it will launch, display a Popup window, and quit automatically when finished. While a conversion is in progress, a dialog box will display its status.
Conversion options are located in either the Drop window or the Popup window.

The Strip formatting white space option limits any formatting white space (spaces and new lines) to two consecutive characters. This is useful if you are interested in the PDF extraction of text content rather than the document layout. This option will result in the leftward collapse of tables, right and center justified text and text columns. Consequently this option is not suitable for PDF documents with multi-column pages.

The Rewrap paragraphs option removes the hard return (newline) at the end of each line in a text block that is identified by Trapeze to be a paragraph. This is useful if you intend to repurpose the text from a PDF document into a new word processing document. Note that Trapeze's algorithm for rewrapping paragraphs is somewhat conservative to avoid inadvertent rewrapping of lists, tables and footnotes. This option is only applicable if Strip formatting white space is on.

The Mark page breaks option adds a visible page break between the pages of a converted file. This is useful if formatting has been stripped from the converted file or if the output format does not have an inherent page boundary (i.e. ASCII and plain text). This option is only applicable to RTF conversion if Strip formatting white space is on.

Trapeze extracts text from a PDF file, but not images. In some PDFs, the text you see on the page may be a vector drawing or even a raster image (e.g. from a scanned PDF), and the actual text is not embedded in the file. In these cases, only an OCR (Optical Character Recognition) tool can deduce the text represented by the image. Trapeze does not have OCR capabilities.

The reason for separate TextEdit and Word RTF optimizations is that TextEdit does not support some of the layout properties in the RTF specification, which results in substantial layout differences (compared with Word), which are compensated for in the optimization process.
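For readers curious what the Strip formatting white space and Rewrap paragraphs options amount to, here is a rough Python sketch of the two behaviors as described above. It is not Trapeze's actual algorithm, which is not published in this article. The function names, the choice to cap runs of spaces and runs of newlines separately, and the simple "skip anything that looks like a list" rule are all assumptions made for illustration.

```python
import re

# Hypothetical pattern for lines that look like list items ("- x", "* x", "1. x").
LIST_ITEM = re.compile(r"^\s*(?:[-*\u2022]|\d+[.)])\s+")


def strip_formatting_whitespace(text, limit=2):
    """Cap runs of spaces, and runs of newlines, at `limit` characters.

    Assumption: spaces and newlines are capped independently.
    """
    text = re.sub(r" {%d,}" % (limit + 1), " " * limit, text)
    return re.sub(r"\n{%d,}" % (limit + 1), "\n" * limit, text)


def rewrap_paragraphs(text):
    """Join the lines of each blank-line-separated block into one line.

    Blocks containing anything that looks like a list item are left
    untouched, a crude stand-in for the conservative behavior the
    article describes.
    """
    out = []
    for block in text.split("\n\n"):
        lines = block.split("\n")
        if any(LIST_ITEM.match(line) for line in lines):
            out.append(block)  # leave lists alone
        else:
            out.append(" ".join(l.strip() for l in lines if l.strip()))
    return "\n\n".join(out)
```

Calling `rewrap_paragraphs(strip_formatting_whitespace(text))` mirrors the dependency noted above: rewrapping is only applied to text whose formatting white space has already been stripped. A real converter would also need to recognize tables and footnotes, which this sketch does not attempt.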
Here is what a press release I received this week looks like in Preview:

[Screenshot: the press release in Preview]

Here's what the text looked like when manually cut and pasted into Tex Edit Plus:

[Screenshot: text pasted into Tex Edit Plus]

Here is a Trapeze Plain text conversion:

[Screenshot: Trapeze Plain text conversion]

And here's what a Plain text conversion looked like with the "Strip formatting white space" option selected, the configuration I found most useful for my purposes.

[Screenshot: Plain text conversion with Strip formatting white space]

If one is planning to print the document and wants to retain the text formatting, the RTF (TextEdit) conversion did a very decent job:

[Screenshot: RTF (TextEdit) conversion]

Whether Trapeze is worth $29.95 will depend upon how useful the ability to convert PDFs to editable files is to you.

System requirements: Mac OS X 10.3 or later

Trapeze is demoware. An unregistered copy will convert the first three pages of any PDF. A single user license for Trapeze sells for $29.95.

For more information, visit: http://www.mesadynamics.com/trapeze.htm