REALbasic University: Column 098

OOP University: Part Twenty-Two

Last time we explored several potential RBlog data structures. Today we'll evaluate those in order to see which is best for our needs.

But first, let's solve the "unique key" problem with Data Structure #4.

Finalizing Data Structure #4

Your "homework" after the last lesson was to try to figure out how to overcome the unique key requirement for REALbasic's dictionary class. This was a problem because if we use each weblog entry's date as the key, we'd be limited to a single posting per day. If we used a date's totalSeconds property as the key, it would be unique since no two posts would have the same date and exact time, but it would be difficult to search for a particular record because we'd only be able to find it if we knew the exact time it was posted.
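To make the trade-off concrete, here is a minimal sketch that uses Python's dict as a stand-in for REALbasic's dictionary class. The sample posts and the helper names (date_key, total_seconds) are made up for illustration, and the 1904 starting point for totalSeconds is an assumption.

```python
from datetime import datetime

# Hypothetical sample posts: two on the same day.
posts = [
    (datetime(2003, 5, 20, 9, 15), "Morning post"),
    (datetime(2003, 5, 20, 17, 40), "Evening post"),
]

def date_key(when):
    """Format a date the way the column writes it, e.g. "5/20/2003"."""
    return f"{when.month}/{when.day}/{when.year}"

# Assumption: totalSeconds counts seconds from January 1, 1904.
EPOCH = datetime(1904, 1, 1)

def total_seconds(when):
    return int((when - EPOCH).total_seconds())

# Keyed by date only: the second post replaces the first.
by_date = {}
for when, title in posts:
    by_date[date_key(when)] = title

# Keyed by exact time: both posts survive...
by_seconds = {}
for when, title in posts:
    by_seconds[total_seconds(when)] = title

# ...but knowing only the day is not enough to find either of them.
midnight = total_seconds(datetime(2003, 5, 20))
found = by_seconds.get(midnight)  # no post was made at exactly 00:00:00
```

The date-keyed dictionary ends up with one entry, while the time-keyed one keeps both posts but can't answer "what was posted on May 20?" without scanning every key.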

What we need is a way to quickly look up entries by date, yet still be able to separate them later by time. Since a dictionary can only store one item per key (the key must be unique), are we forced to abandon the dictionary approach?

The answer is no. You see, a dictionary can contain any kind of data (its type is variant for that reason). So let's store an array of weblog entries inside a single dictionary element!

Create an object class called dayEntriesClass and have it contain all the entries for a single day. Like this:

  
dim theTime, theTitle as string
dim i, numEntries as integer
dim aDay as dayEntriesClass

if findByDate.hasKey("5/20/2003") then
  // The aDay object contains an array of
  // entries for a specific date ("5/20/2003"
  // in this case). It comes straight from the
  // dictionary, so there is no need to create
  // a new dayEntriesClass instance first.
  aDay = findByDate.value("5/20/2003")

  numEntries = aDay.count
  for i = 1 to numEntries
    // Here we retrieve the details of each
    // field for each record.
    theTime = aDay.entries(i).time
    theTitle = aDay.entries(i).title
    // ... etc.
  next // i
end if

Hopefully this code is clearer than a lot of complicated diagrams. Basically we'll need to create a dayEntriesClass class which contains an array, entries, of entryClass. This way there can be unlimited entries on the same day, but each will have a unique time.
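The code above only reads entries. For the other half, adding them, here is a rough sketch in Python (a dict standing in for REALbasic's dictionary class). The Entry and DayEntries classes mirror entryClass and dayEntriesClass, but their fields and the add_entry helper are assumptions for illustration, not the final RBlog design.

```python
from dataclasses import dataclass, field

@dataclass
class Entry:
    """Stand-in for entryClass; only two illustrative fields."""
    time: str
    title: str

@dataclass
class DayEntries:
    """Stand-in for dayEntriesClass: all the entries for one day."""
    entries: list = field(default_factory=list)

find_by_date = {}

def add_entry(date_key, entry):
    """File the entry under its date, creating the day object on first use."""
    day = find_by_date.get(date_key)
    if day is None:
        day = DayEntries()
        find_by_date[date_key] = day
    day.entries.append(entry)

add_entry("5/20/2003", Entry("09:15", "Morning post"))
add_entry("5/20/2003", Entry("17:40", "Evening post"))
add_entry("5/21/2003", Entry("08:00", "Next day"))
```

Two posts on 5/20/2003 now share a single dictionary key, and nothing gets overwritten.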

Since entry topics (categories) will also have duplicates, we can organize the findByTopic dictionary object the same way. Each item in findByTopic will point to a topicEntriesClass object which will point to an array structure of entryClass objects.

Complicated? Sure, but look at the advantages: we'll have one central dictionary object as our main data repository. Yet we can instantly get a list of objects by date or subject. It's the best of both worlds!
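Here is one way the "one repository, several look-up lists" idea could be sketched, again in Python as a stand-in for REALbasic. The entry id used as the repository key, the Entry fields, and the add_entry helper are all hypothetical. The point is only that both indexes hold links to the same object rather than copies.

```python
from collections import defaultdict

class Entry:
    """Illustrative stand-in for entryClass."""
    def __init__(self, date_key, time, topic, title):
        self.date_key = date_key
        self.time = time
        self.topic = topic
        self.title = title

entries = {}                        # central repository, keyed by a made-up id
find_by_date = defaultdict(list)    # date  -> links to entries
find_by_topic = defaultdict(list)   # topic -> links to entries

def add_entry(entry_id, entry):
    """Store the entry once, then link it into both indexes."""
    entries[entry_id] = entry
    find_by_date[entry.date_key].append(entry)
    find_by_topic[entry.topic].append(entry)

add_entry(1, Entry("5/20/2003", "09:15", "News", "Morning post"))
add_entry(2, Entry("5/20/2003", "17:40", "REALbasic", "Evening post"))

# Editing the entry through one index is visible through the other,
# because both lists refer to the very same object.
find_by_date["5/20/2003"][0].title = "Morning post (edited)"
```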

And if we build some sorting ability within our objects, we can easily keep those lists sorted in reverse chronological order (by time, for the findByDate list) and alphabetically (for the findByTopic list). Later, when we're ready to publish our data as HTML pages, we can simply traverse the list grabbing each entry and converting it to HTML.

Are we done?

So far Data Structure #4, while being the most complicated, also seems to have the most advantages. But have we pushed it far enough? Could the structure be made more efficient or more flexible?

The answer to that second question is yes. It's always a good idea to reevaluate solutions before you implement them, to make sure they are as good as they can be.

In the case of Data Structure #4, there are two areas of inefficiency.

First, while our new array addition solves the unique key requirement of the dictionary class, we know that arrays must be searched sequentially, which is slow. For a day's worth of entries, however, that's not a problem: there should never be more than a few entries for a single day anyway. But the subject (category) array is a different story. Over years of publishing, the number of posts for a single topic could be in the thousands. So now we can find the start of a subject list quickly, but then we must search each entry within that subject sequentially!

The solution to this is to expand on what we've already done. Instead of an array inside the subject object, why not use another dictionary? Each entry would use its date as its unique key, just like we use within the findByDate object. Now we're back to quick searches: finding a subject is instant, as it's a dictionary, and once we've got that we can quickly search for particular dates within another dictionary object. Entries within that dictionary object, since they're limited to a single day each, would point to an array structure of posts for that day.
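A small sketch of that dictionary-within-a-dictionary arrangement, in Python as a stand-in for REALbasic. The add_entry and lookup helpers and the sample data are invented for illustration, and entries are reduced to bare titles to keep it short.

```python
# topic -> (date -> list of that day's posts on the topic)
find_by_topic = {}

def add_entry(topic, date_key, title):
    """File a post under its topic first, then under its date."""
    by_date = find_by_topic.setdefault(topic, {})
    by_date.setdefault(date_key, []).append(title)

def lookup(topic, date_key):
    """Two dictionary look-ups, then a short list for a single day."""
    return find_by_topic.get(topic, {}).get(date_key, [])

add_entry("News", "5/20/2003", "Morning post")
add_entry("News", "5/20/2003", "Evening post")
add_entry("News", "5/21/2003", "Next day")
add_entry("REALbasic", "5/20/2003", "Dictionary tricks")

todays_news = lookup("News", "5/20/2003")
```

Even if "News" grows to thousands of posts, the only sequential search left is through the handful of posts made on one day.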

Another problem with #4 is that we've hard-coded it for specific objects. That's easy to see when you look at the similarities between findByDate and findByTopic: even though they're almost the same, we must create custom object classes for each object. Doesn't that seem awkward?

Plus, if we decided to add a new search parameter such as Author, we'd want our Author look-ups to be as fast as our date and topic look-ups. Right now, though, that'd mean creating several new object classes. Instead, how about a generic class that can be customized for whatever we need?

Originally we had a class for findByDate and a class for findByTopic. What we need is a generic class, like findByField. FindByField wouldn't be built with a particular field in mind -- the actual field characteristics (data type, size, etc.) and how that field is structured (array, dictionary, etc.) can be specified dynamically while the program is running.
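As a rough illustration of what "generic" could mean here, the Python sketch below (a stand-in for REALbasic) builds one index class and tells each instance at run time which field to index. The FindByField name comes from the column, but its constructor, methods, and the dictionary-based entries are assumptions, and it covers only the "which field" part, not the choice of internal structure.

```python
class FindByField:
    """One index class for any field; the field is chosen at run time."""

    def __init__(self, key_func):
        self.key_func = key_func   # extracts the indexed value from an entry
        self.index = {}            # value -> list of matching entries

    def add(self, entry):
        self.index.setdefault(self.key_func(entry), []).append(entry)

    def find(self, key):
        return self.index.get(key, [])

# Three indexes from the same class, including the new Author look-up.
by_date = FindByField(lambda e: e["date"])
by_topic = FindByField(lambda e: e["topic"])
by_author = FindByField(lambda e: e["author"])

sample = [
    {"date": "5/20/2003", "topic": "News", "author": "Marc", "title": "Morning post"},
    {"date": "5/20/2003", "topic": "REALbasic", "author": "Erick", "title": "Evening post"},
    {"date": "5/21/2003", "topic": "News", "author": "Marc", "title": "Next day"},
]
for entry in sample:
    for index in (by_date, by_topic, by_author):
        index.add(entry)
```

Adding the Author search took one extra line rather than a new set of classes.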

We could also add in a built-in sorting mechanism so that fields that include an array would know how to sort themselves and keep the content sorted. That way when a new entry is added, it would be inserted into the appropriate (sorted) location in the array. We'd use a generic sorting mechanism like the kind Matt Neuburg describes in REALbasic Developer (October/November 2002, page 34) which allows different data types to be sorted since each data type has its own comparison operator.

This would allow a date's list of entries to be sorted by time (in reverse chronological order), making exporting them extremely quick.
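The sketch below shows the general idea of a list that keeps itself sorted, with the comparison supplied separately for each kind of data. It is written in Python as a stand-in for REALbasic and is not the mechanism from the magazine article. The insert_sorted helper and its comes_before parameter are names invented here.

```python
def insert_sorted(items, new_item, comes_before):
    """Insert new_item so items stays ordered according to comes_before.

    comes_before(a, b) returns True when a belongs ahead of b.
    A binary search finds the slot, so no re-sort is needed afterwards.
    """
    lo, hi = 0, len(items)
    while lo < hi:
        mid = (lo + hi) // 2
        if comes_before(new_item, items[mid]):
            hi = mid
        else:
            lo = mid + 1
    items.insert(lo, new_item)

# Reverse chronological order for one day's posts (zero-padded "HH:MM").
def later_first(a, b):
    return a[0] > b[0]

day = []
for post in [("09:15", "Morning"), ("17:40", "Evening"), ("12:00", "Noon")]:
    insert_sorted(day, post, later_first)

# Alphabetical order, ignoring case, for topic names.
def alphabetical(a, b):
    return a.lower() < b.lower()

topics = []
for name in ["Zebra", "apple", "Mango"]:
    insert_sorted(topics, name, alphabetical)
```

The same insert routine serves both lists, and only the comparison function changes.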

Just so this complex object structure is clear, here's a diagram illustrating the basic concept:

This represents the basic classes and core data structures we'd need. We'd create dynamic variations of findByFieldClass, for instance, to hold the search indexes of the various fields (date, topic, author, etc.).

Evaluate the Solutions

Now let's move to step three in program design and analyze and pick our best choice for program structure. First we'll figure out the advantages and disadvantages of each approach.

Data Structure #1: The String Array

Advantages:

  • Fast searching within record.
  • Simple to program.
  • Addition, deletion, reorganization easy with dynamic array structure in REALbasic.
  • Saving to disk requires little conversion.

Disadvantages:

  • Inherently linear, making finding a particular record a slow sequential search.
  • Difficult to expand or change as field order is fixed and critical.
  • Structure adds overhead of converting date/time object to-and-from a string.
  • Tab-delimited fields require excessive use of slow nthField function.
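For comparison, a string-array structure might look something like this Python sketch. The field order and sample records are assumptions, since the exact layout isn't spelled out here, and nth_field is a 0-based stand-in for REALbasic's 1-based nthField.

```python
# Assumed field order; the real layout may differ.
DATE, TIME, TOPIC, TITLE = range(4)

records = [
    "5/20/2003\t09:15\tNews\tMorning post",
    "5/20/2003\t17:40\tREALbasic\tEvening post",
    "5/21/2003\t08:00\tNews\tNext day",
]

def nth_field(record, n):
    """Return field n (0-based) of a tab-delimited record."""
    return record.split("\t")[n]

def find_by_date(date_key):
    """Sequential search: every record is split and inspected."""
    return [r for r in records if nth_field(r, DATE) == date_key]

titles = [nth_field(r, TITLE) for r in find_by_date("5/20/2003")]
```

Notice how everything depends on the field positions: insert a new field in the middle and every constant after it has to change.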

Data Structure #2: The Dictionary

Advantages:

  • Lightning fast search for a particular record if you know its date/time.
  • Simple to program.
  • Addition, deletion, reorganization easy with dictionary structure.

Disadvantages:

  • Fast find only works if you know the exact time stamp -- there's no way to search by date only.
  • No way to quickly retrieve records by subject -- we must sequentially search all records.
  • Requires frequent conversion between date object and double.

Data Structure #3: The Data Object Approach

Advantages:

  • Object-oriented design allows future data structure changes.

Disadvantages:

  • Retrieving a record requires sequential search through all records.
  • By using an array to hold the data objects, this isn't much different from Option #1.
  • Even though we're using OOP principles, the result is not particularly reusable since our objects are so specific to this project. We'd need to redesign this with more generic data objects if we wanted to make this structure reusable.

Data Structure #4: The Multiple List Idea

Advantages:

  • Quick find by date or subject.
  • Object-oriented design allows future growth.
  • Minimal redundancy since we only store links to original data object.
  • Once created, easy to work with via the interface we set up for it. For example, we can hide all the private dictionary structure manipulations and such within public data access methods so our external code doesn't know anything about the actual data structure.

Disadvantages:

  • Complex to program.
  • Though based on OOP principles, the initial result is not particularly reusable since our objects are so specific to this project. The redesign with generic data objects solves this, making the structure reusable.
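The "hide it behind an interface" advantage might look like the following Python sketch (a stand-in for REALbasic). The Weblog class and its method names are hypothetical. What matters is that outside code calls add, titles_on, and titles_about and never touches the dictionaries directly.

```python
class Weblog:
    """Public methods on the outside, private dictionaries on the inside."""

    def __init__(self):
        self._by_date = {}    # private: date  -> list of entries
        self._by_topic = {}   # private: topic -> list of entries

    def add(self, date_key, topic, title):
        """Create one entry and link it into both private indexes."""
        entry = {"date": date_key, "topic": topic, "title": title}
        self._by_date.setdefault(date_key, []).append(entry)
        self._by_topic.setdefault(topic, []).append(entry)
        return entry

    def titles_on(self, date_key):
        return [e["title"] for e in self._by_date.get(date_key, [])]

    def titles_about(self, topic):
        return [e["title"] for e in self._by_topic.get(topic, [])]

blog = Weblog()
blog.add("5/20/2003", "News", "Morning post")
blog.add("5/20/2003", "REALbasic", "Evening post")
blog.add("5/21/2003", "News", "Next day")
```

If we later swap the topic list for a nested dictionary, only the inside of Weblog changes and the calling code stays the same.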

Conclusion

Obviously, for this project, Data Structure #4 is ideal. However, it does require a large amount of up-front work to create the data structure. Some users -- for instance, someone who would rarely post to the weblog -- might decide that speed and efficiency aren't the top priority and that one of the slower but simpler solutions is therefore best.

But that's the whole point of brainstorming and evaluating multiple data strategies for your projects: it's up to you to find the best structure that meets the needs of your program, and gives you the flexibility required for the future.

Next Week

More on program design.

News

The next issue of REALbasic Developer is being printed right now and it's packed with some terrific articles. In fact, there's so much stuff we had to leave out the interview feature! (Don't worry: it'll be back next issue.)

Here's a sneak preview of what's coming up in the June/July 2003 issue:

First up, Erick Tejkowski is back with another excellent QuickTime article. He explains how you can manipulate QuickTime video and audio tracks within REALbasic. It's a must read for anyone interested in multimedia.

Next, Charles Yeomans explains one of REALbasic's best-kept secrets, control binding. Control binding lets RB do the programming for you. With control binding you can link two controls -- a listBox to an editField, for instance -- without writing a single line of code! Unfortunately control binding hasn't been documented very well, but Charles has fixed that with this excellent article.

Finally, Joe Strout writes about getting the most out of Quesa, the 3D library, by using declares to push it beyond the norm.

In this issue, I wrote the Postmortem myself, detailing writing and selling my Z-Write word processor. I made plenty of mistakes, so please learn from my experience!

Of course we've also got all our regular columnists, reviews of cool products, and more. If you haven't subscribed yet, what are you waiting for?

Letters

Today we've got a non-technical question from Richard who writes:

Will the REALBasic University Lessons be available in PDF Format? I think that this would be better than printing them or trying to read them from your site on screen.

Richard E. Meyeroff

You're not the first to ask this. One method is to print the pages to PDF yourself, either using Mac OS X's "Save as PDF" feature or using a third-party driver in Classic. That wouldn't necessarily improve readability, but it would give you an off-line archive of the articles which might be valuable.

I've thought of providing PDFs, and it is a possibility, but here's the problem. Applelinks is an advertising-supported site, of course, so we need you to read the pages, see the ads, and click the ad links. That's how the site survives. If we provided the columns in PDF format, there'd potentially be a reduction in ad revenue.

Also, it would be a lot of work to go through and convert nearly 100 articles into PDF format. Without a financial incentive, it's probably not worth my time.

That said, I have thought of a couple of ideas. For instance, would readers be willing to pay for a PDF edition of RBU? Either a small one-time fee for all the previous issues, and/or a subscription fee to automatically be mailed the new columns? If so, how much would you be willing to pay? $5? $10? $20?

Another concept I'm looking at is to actually print all the RBU columns in a book. If there were sufficient interest in that, it would give me an incentive to go back and revise the old columns, updating them for Mac OS X, Windows, and REALbasic 5. Would you pay $30 for a 300-page RBU book? Or would a shorter, cheaper, "best of RBU" book be better? Would you prefer the book in PDF format?

Let me know what you think of any of these ideas (or suggest your own) by sending mail to rbu@stonetablesoftware.com: if there's enough interest, something like this might actually happen!


About the Column
REALbasic University is a weekly instructional column on programming with REALbasic and is brought to you by REALbasic Developer, the magazine for REALbasic programmers.

Each week we answer select reader questions, and we're always open to ideas for future columns. Send your questions to . (Keep your questions simple and specific. General queries like "How do I write my own web browser?" will be neglected.) Your question won't be answered immediately, but will be answered in a future column. (If you don't want your correspondence published, just be sure to indicate that when you write. Otherwise it's fair game.)

About the Author
Marc Zeedar is an author, philosopher, graphic designer, photographer, film director, soccer fanatic, and programmer (among other things). He writes for MacOpinion, runs his own software company, Stone Table Software, which sells the revolutionary Z-Write word processor, and is Publisher and Editor of REALbasic Developer. He lives in Northern California with his cats, Mischief and Mayhem, and is rapidly running out of free time.

See the REALbasic University Archives


REALbasic University contents ©2001-2004 by Marc Zeedar and REALbasic Developer. All Rights Reserved.
