REALbasic University: Column 109

Debugging: Part Four

Last time we finished our "interactive" style of programming example. Today we're going to take a different approach and plan carefully. The result will give you a comparison of how debugging changes depending on the style of programming you're doing.

The Plan-Ahead Method

When I brought up this debugging topic, you'll remember I talked about two styles of programming: interactive and plan-ahead. For the last couple of lessons we explored writing an HTML stripper routine using the "dive in without testing the water" interactive approach. Today we'll write the same routine again, but this time we're going to think about it first and do most of our debugging in our heads (or on paper).

In the case of our HTML stripper routine, we've got a distinct advantage having just written it one way: that makes it much easier for us to know what we're getting into and what kinds of problems we'll encounter. For instance, we know from the beginning that our algorithm should handle unusual HTML like empty tags and REALbasic comparison operators such as <=.

In real life, you may not have that advantage. But as you gain experience programming, you'll find many situations repeat themselves. For instance, I've written that "strip HTML" routine on a number of occasions for various programs, so it was much easier for me to dive in and write it. Of course I made mistakes and there were flaws in my approach which had to be repaired at the end, but in the end it worked and it wasn't that bad of a process. If I'd begun with minimal experience and no real idea of where I was going, it might have taken a little longer at the beginning, and I might have even gone down a dead-end path or two.

What is most common is that you're faced with writing a routine you haven't written in years (maybe decades) and while you remember the vague steps, the specifics are lost in time. Still, even that vague knowledge can give you a huge step up in the process.

In the plan-ahead style of programming, we don't begin with code, but with a plan -- an algorithm. For those of you used to interactive programming, this can seem boring or tedious (and for a simple task like stripping HTML it probably is), but it's a critical part of the process and in many cases will save you many hours of later aggravation.

Our first task is to look at several algorithms and select which one we'll use. Then we test it and refine it -- all within our mind -- and if it seems to work, then we implement it in code.

Some Algorithms

Let's begin with the "brute force" approach we used in our interactive example. Brute force means we solve the problem not with intelligence or cleverness, but by using the incredibly fast processing power of the computer.

For example, a brute force approach to find a person's name in a ten million person database would be to look through every name in the database and stop when we find the name we're looking for. A more intelligent (though more complex to implement) approach would be to index the database (with a hash table or binary tree or something similar) and cut the search time to a tiny fraction.
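As a rough illustration of the difference, here is a minimal sketch. It is written in Python rather than REALbasic, purely so it is easy to run, and the function names and sample data are invented for this example. The first function checks every record in turn; the second builds a dictionary (which Python implements as a hash table) so a later lookup does not have to scan the list.

```python
def find_brute_force(names, target):
    """Check every name in order; return its position, or -1 if absent."""
    for position, name in enumerate(names):
        if name == target:
            return position
    return -1


def build_index(names):
    """Map each name to its first position (a dict is a hash table)."""
    index = {}
    for position, name in enumerate(names):
        index.setdefault(name, position)
    return index


names = ["Ada", "Grace", "Linus", "Marc"]  # stand-in for the big database
index = build_index(names)

print(find_brute_force(names, "Linus"))  # walks the list until it matches
print(index.get("Linus", -1))            # a single hash lookup
```

Building the index costs one full pass up front, so it only pays off when you search the same data more than once.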

In the case of Strip HTML, our brute force approach is to simply look for < and > combinations, assume they are tags, and remove them and their enclosing text from the passed string. Granted, that would work, and it might be our best method, but let's brainstorm other approaches before we make our final selection.
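The brute force idea can be sketched in a few lines. This is Python rather than REALbasic, purely so the example is easy to run, and `strip_tags_brute_force` is a name invented for this illustration, not the routine from the earlier columns.

```python
def strip_tags_brute_force(text):
    """Delete everything from each '<' through the next '>'."""
    pieces = []
    position = 0
    while True:
        start = text.find("<", position)
        if start == -1:
            break
        end = text.find(">", start + 1)
        if end == -1:
            break  # no closing '>', so leave the rest of the text alone
        pieces.append(text[position:start])
        position = end + 1
    pieces.append(text[position:])
    return "".join(pieces)


print(strip_tags_brute_force("<p>This is our first <b>paragraph</b>.</p>"))
print(strip_tags_brute_force("if a < b and c > d then"))
```

The second call shows the weakness: the routine treats `< b and c >` as a tag and eats part of the code.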

The main problem with the original brute force method is that it blindly finds "unfinished" tags, such as the < and > comparison operators in the REALbasic code that's in our sample HTML file. So why not reverse the process? We could first search for all occurrences of "<>", "<=", and ">=" and replace them with HTML entities. Then we could use a bare-bones brute force search to find and delete all HTML tags. When we're done, we simply change the HTML entities back to < and > symbols.
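Here is one way that three-step idea might look, sketched in Python rather than REALbasic so it can be run as-is. The operator list and all function names are assumptions made for this example.

```python
OPERATORS = ["<>", "<=", ">="]  # assumed list of operators to protect


def protect_operators(text):
    """Swap comparison operators for HTML entities so they survive stripping."""
    for operator in OPERATORS:
        entity = operator.replace("<", "&lt;").replace(">", "&gt;")
        text = text.replace(operator, entity)
    return text


def delete_tags(text):
    """Bare-bones brute force: drop everything from '<' through the next '>'."""
    pieces = []
    position = 0
    while True:
        start = text.find("<", position)
        end = text.find(">", start + 1) if start != -1 else -1
        if start == -1 or end == -1:
            break
        pieces.append(text[position:start])
        position = end + 1
    pieces.append(text[position:])
    return "".join(pieces)


def strip_html(text):
    protected = protect_operators(text)
    without_tags = delete_tags(protected)
    # Turn the entities back into < and > symbols.
    return without_tags.replace("&lt;", "<").replace("&gt;", ">")


print(strip_html("<p>if a <= b and c >= d then</p>"))
```

Two limits of this sketch are worth noting: a lone < or > (as in `a < b`) is still unprotected, and any `&lt;` or `&gt;` entities already in the page are also turned into symbols at the end.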

An alternative approach could be to search for specific HTML tags. Here I'm talking about the full tag: stuff like <b> and <i> and <em>. While searching for the full tag would eliminate all the problems with broken, empty, or bad tags, HTML is a complex language growing more complex all the time. The number of tags HTML supports now is huge, and many include optional attributes that make identifying the tag difficult. For instance, for RBU I use Cascading Style Sheets, which let me turn an ordinary <b> into something like this: <b class="mystyle">. Obviously searching for <b> is simple, but finding the latter involves parsing the extra attribute, which would be complicated.

This algorithm could work for a small subset of HTML; for instance, if you knew exactly which tags were going to be used and you wanted to delete them. I've done that kind of thing in the past, where I needed a simple markup language to indicate bold or italic in a help file, and thus my "strip HTML" routine did not need to be very complicated.
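For the small-subset case, a sketch might look like the following. It is Python rather than REALbasic, for ease of running, and the tag list and function name are made up for illustration.

```python
KNOWN_TAGS = ["b", "i"]  # assumed subset: bold and italic only


def strip_known_tags(text, tag_names=KNOWN_TAGS):
    """Remove only the exact opening and closing forms of the listed tags."""
    for name in tag_names:
        for tag in ("<" + name + ">", "</" + name + ">"):
            text = text.replace(tag, "")
    return text


print(strip_known_tags("Press <b>Return</b> to <i>continue</i>."))
print(strip_known_tags('<b class="mystyle">styled</b>'))
```

The second call shows the limitation described above: the opening tag with the extra `class` option is not recognized, so it is left in the text.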

But what other methods could we use to delete HTML tags? How about an object oriented approach? What would that be?

Well, OOP suggests objects, right? So we could break our HTML string down into multiple objects. Each object would be a tagged object. For instance, if this were our HTML:

  
<p>This is our first <b>paragraph</b>.</p>

That entire paragraph would be stored as a paragraph object (a <p> tag). Inside that object would be another object, a <b> tag, which would hold the word paragraph.

Imagine if this were extended to an entire file. There'd be an HTML object, which holds the entire text (the <html> tag), and inside of it a series of objects that represent all the parts of the file. There'd be a <head> object, a <body> object, etc. Additional characteristics of a tag -- such as a stylesheet or color or width -- could be properties added to the tag object.

The actual tags themselves would be removed during this process, as each object would be a container storing only the text enclosed by the tag. The tag itself would be remembered by the object type (a <b> tag object is a bold object, etc.). Reconstructing the file without the tags would then be a simple matter of traversing the series of objects that make up the file, adding each text bit to a string until we run out of objects.
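To make the idea concrete, here is a small sketch of such an object tree. It is in Python rather than REALbasic, and it hands the job of splitting the text into tags and data to Python's standard `html.parser.HTMLParser` instead of parsing by hand. The `TagNode` and `TreeBuilder` classes are hypothetical names for this example, and the sketch assumes every tag is properly closed.

```python
from html.parser import HTMLParser


class TagNode:
    """One tag object: its name, its attributes, and what it encloses."""

    def __init__(self, name, attributes=None):
        self.name = name
        self.attributes = dict(attributes or [])
        self.children = []  # a mix of plain strings and TagNode objects

    def text(self):
        """Walk the enclosed objects in order and join their text."""
        parts = []
        for child in self.children:
            parts.append(child if isinstance(child, str) else child.text())
        return "".join(parts)


class TreeBuilder(HTMLParser):
    """Builds TagNode objects; assumes balanced opening and closing tags."""

    def __init__(self):
        super().__init__()
        self.root = TagNode("document")
        self.open_nodes = [self.root]  # stack of tags not yet closed

    def handle_starttag(self, tag, attrs):
        node = TagNode(tag, attrs)
        self.open_nodes[-1].children.append(node)
        self.open_nodes.append(node)

    def handle_endtag(self, tag):
        if len(self.open_nodes) > 1:
            self.open_nodes.pop()

    def handle_data(self, data):
        self.open_nodes[-1].children.append(data)


builder = TreeBuilder()
builder.feed('<p>This is our first <b class="mystyle">paragraph</b>.</p>')
builder.close()
print(builder.root.text())
```

Here the `<p>` object holds three children (a string, the `<b>` object, and another string), and the `class` option ends up as a property on the `<b>` object rather than in the text.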

While complicated to program, this particular method would be extremely powerful, as we'd come close to actually understanding the HTML (it'd be easy to extend this system to one that would interpret and draw an HTML page). This would be ideal in a program that needed that feature: the strip HTML routine would simply gather the tagless text into a single string. Of course, this is overkill if you don't need to parse the HTML.

One big drawback to this method is that it requires every tag to have both a beginning and an ending tag. While XML is strict and requires balanced tags, HTML is more flexible. For instance, you don't have to end paragraphs with </p> if you don't want to -- the web browser will figure out what you meant. If your HTML stripper routine will see files from unknown sources (i.e., sources you can't control), it could be fed files with unbalanced tags, which would break your routine. While you could program around common HTML errors, that would be extra work.

Are there any other ways we could strip the HTML? I can think of one. What we're really doing is finding HTML tags and replacing them with nothing, right? So couldn't we use REALbasic's Regular Expressions (regex) feature to locate tags and delete them? Surely the regex engine is powerful enough to allow us to get specific about the tags we delete. It's worth examining, to be sure.
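As a first look at what such a pattern might be, here is a sketch using Python's `re` module (not REALbasic's regex feature, whose syntax may differ in details). The pattern is an assumption chosen for this example: it only treats something as a tag if the `<` (or `</`) is immediately followed by a letter.

```python
import re

# "<" or "</", then a letter, then anything except angle brackets, then ">".
TAG_PATTERN = re.compile(r"</?[A-Za-z][^<>]*>")


def strip_tags_regex(text):
    """Replace everything that looks like a tag with nothing."""
    return TAG_PATTERN.sub("", text)


print(strip_tags_regex('<p>if a <= b then <b class="mystyle">go</b></p>'))
print(strip_tags_regex("x <> y and c > d"))
```

Because `<=`, `<>`, and `< ` do not start with a letter after the bracket, they are left alone. Code written without spaces, such as `a<b and c>d`, would still fool this pattern.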

Well, that's enough for this week. Next time we'll pick an algorithm and implement it.

Next Week

We pick a method and implement it.

Letters

This week's letter comes from France where a reader wants to create a background drawing grid like in Photoshop:

I'm a new French programmer and I need help in a graphic software I try to program.

My question is about how to draw a setable grid like in Photoshop and how I can be able to have different drawing style as scale in pixels, decimal inches, 8th inches, and centimeter.

Thanks to answer to me because I pull very hard my hair..

Excuse me for my little english.

Interesting question. Though it's beyond a letter answer to provide you with a completed project, I don't want you to lose all your hair! So I made a start on it for you. I've created a gridCanvas class (a subclass of canvas) which includes properties that let you specify the grid settings (frequency, color, zoom level, etc.). It draws a grid within itself based on these settings.

The demonstration project lets you zoom in and out and change the grid frequency.

Of course this canvas does not support drawing or snap-to-grid features like Photoshop -- you'll have to add those yourself if that's what you have in mind.
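The heart of a grid like this is working out where each line falls. The sketch below is not the gridCanvas code from the project file; it is a hypothetical Python function that assumes "frequency" means grid lines per inch and that the screen is treated as 72 dots per inch.

```python
def grid_line_positions(size_in_pixels, lines_per_inch, zoom=1.0, dpi=72):
    """Return the pixel offsets of the grid lines along one axis."""
    spacing = dpi / lines_per_inch * zoom  # pixels between adjacent lines
    positions = []
    count = 0
    while count * spacing <= size_in_pixels:
        positions.append(round(count * spacing))
        count += 1
    return positions


print(grid_line_positions(144, 2))            # two lines per inch
print(grid_line_positions(144, 2, zoom=2.0))  # same grid, zoomed in
```

You would call this once for the width and once for the height, then draw a vertical or horizontal line at each returned offset.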

As for changing measurement systems, that's mostly a matter of mathematics. The demo is based on inches (using 72 dots per inch), but you could easily change that. One of the best methods I've found for converting between measurement systems, depending on the accuracy you need, is to use the system Adobe PageMaker uses internally.

PageMaker (the page layout software) supports a number of different measurement systems, including rare types like ciceros. But inside everything is converted to and from what PageMaker's engineers call twips.

A twip is 1/1440th of an inch. Because PageMaker uses twips as its internal measurement system, nothing in PageMaker can be more accurate than 1/1440th of an inch. Generally that's plenty accurate, but in some situations you might need more accuracy, in which case the twip system is not for you.

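The advantage of the twip system is that each of the major measurement systems works out to a whole number of twips per unit (the inch, pica, and point figures are exact; the millimeter and cicero figures are rounded):

        Inches  Millimeters  Picas  Points  Ciceros
Twips   1440    56           240    20      256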

This makes conversion between measurement systems easy. If there's a line that's 5.25 inches long, for instance, that's exactly 7560 twips. Divide that by 56 and you get 135 millimeters (13.5 centimeters); that figure is approximate, because 56 is rounded from about 56.7 twips per millimeter, and the true length is 133.35 millimeters. Divide by 240 and you've got 31.5 picas (378 points -- 12 points to a pica).

The user never has to see or know about twips: they're just an internal measuring system you use to ease conversion. Twips are so fine-grained that you can work in whole numbers instead of fractions, which is excellent. If you base your screen image on 72 dots per inch (which you should, even though the actual screen probably isn't), you can easily scale an image to any measurement system or zoom level. Just convert its actual size in the current measurement system back to twips, then convert twips to pixels by dividing by 1440 and multiplying by 72. So our 5.25" line is 378 pixels (which sharp-eyed readers will note is the same as points -- the 72 DPI screen standard was designed to match the points measurement system). Multiply that by your zoom level to get the final image size.
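The arithmetic above can be sketched as a few small functions. This is Python rather than REALbasic, the function names are invented for this example, and the factors are the twips-per-unit values from the table in this column.

```python
TWIPS_PER_UNIT = {
    "inches": 1440,
    "millimeters": 56,  # table value; the exact figure is about 56.69
    "picas": 240,
    "points": 20,
    "ciceros": 256,
}
SCREEN_DPI = 72  # the assumed screen resolution used in the column


def to_twips(value, unit):
    """Convert a measurement to whole twips."""
    return round(value * TWIPS_PER_UNIT[unit])


def from_twips(twips, unit):
    """Convert twips back to the given unit."""
    return twips / TWIPS_PER_UNIT[unit]


def twips_to_pixels(twips, zoom=1.0):
    """Twips to inches, inches to 72 DPI pixels, then apply the zoom."""
    return twips / 1440 * SCREEN_DPI * zoom


length = to_twips(5.25, "inches")
print(length)                         # 7560
print(from_twips(length, "picas"))    # 31.5
print(from_twips(length, "points"))   # 378.0
print(twips_to_pixels(length))        # 378.0
print(twips_to_pixels(length, 2.0))   # 756.0
```

Storing every length as twips means a change of measurement system only affects what is displayed, not the stored data.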

Hopefully this will get you started. Write again if you run into a block. Click here to grab the project file.


About the Column
REALbasic University is a weekly instructional column on programming with REALbasic and is brought to you by REALbasic Developer, the magazine for REALbasic programmers.

Each week we answer select reader questions, and we're always open to ideas for future columns. Send your questions to . (Keep your questions simple and specific. General queries like "How do I write my own web browser?" will be neglected.) Your question won't be answered immediately, but will be answered in a future column. (If you don't want your correspondence published, just be sure to indicate that when you write. Otherwise it's fair game.)

About the Author
Marc Zeedar is an author, philosopher, graphic designer, photographer, film director, soccer fanatic, and programmer (among other things). He writes for MacOpinion, runs his own software company, Stone Table Software, which sells the revolutionary Z-Write word processor, and is Publisher and Editor of REALbasic Developer. He lives in Northern California with his cats, Mischief and Mayhem, and is rapidly running out of free time.



REALbasic University contents ©2001-2004 by Marc Zeedar and REALbasic Developer. All Rights Reserved.
