REALbasic University: Column 098
OOP University: Part Twenty-Two

Last time we explored several potential RBlog data structures. Today we'll evaluate them to see which best fits our needs. But first, let's solve the "unique key" problem with Data Structure #4.
Finalizing Data Structure #4

Your "homework" after the last lesson was to figure out how to overcome the unique key requirement of REALbasic's dictionary class. This was a problem because if we use each weblog entry's date as the key, we'd be limited to a single posting per day. If we used a date's totalSeconds property as the key, it would be unique, since no two posts would have the same date and exact time, but it would be difficult to search for a particular record because we could only find it if we knew the exact time it was posted.

What we need is a way to quickly look up entries by date, yet still be able to separate them later by time. Since a dictionary can only store one item per key (the key must be unique), are we forced to abandon the dictionary approach?

The answer is no. You see, a dictionary can contain any kind of data (its type is variant for that reason). So let's store an array of weblog entries inside a single dictionary element! Create an object class called dayEntriesClass and have it contain all the entries for a single day. Like this:
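The idea can be sketched roughly as follows. This is an illustrative Python sketch rather than REALbasic, and the names Entry, DayEntries, add_entry, and the field names are assumptions standing in for entryClass and dayEntriesClass:

```python
from dataclasses import dataclass, field
from datetime import date, datetime


@dataclass
class Entry:
    """Stand-in for entryClass; the field names are assumed."""
    posted: datetime
    topic: str
    text: str


@dataclass
class DayEntries:
    """Stand-in for dayEntriesClass: every entry posted on one day."""
    entries: list = field(default_factory=list)


# One dictionary element per calendar day; the value holds that day's entries.
find_by_date = {}


def add_entry(entry):
    """File an entry under its date, creating the day's container if needed."""
    day = find_by_date.setdefault(entry.posted.date(), DayEntries())
    day.entries.append(entry)


add_entry(Entry(datetime(2003, 5, 1, 9, 30), "REALbasic", "Morning post"))
add_entry(Entry(datetime(2003, 5, 1, 17, 5), "Mac OS X", "Evening post"))
add_entry(Entry(datetime(2003, 5, 2, 8, 0), "REALbasic", "Next day"))

# Two posts share the May 1 key without colliding.
print(len(find_by_date[date(2003, 5, 1)].entries))  # prints 2
```

The dictionary key stays unique (one per day), while the array inside each element absorbs any number of posts for that day.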
Hopefully this code is clearer than a lot of complicated diagrams. Basically we'll need to create a dayEntriesClass class which contains an array, entries, of entryClass. This way there can be unlimited entries on the same day, but each will have a unique time.

Since entry topics (categories) will also have duplicates, we can organize the findByTopic dictionary object the same way. Each item in findByTopic will point to a topicEntriesClass object which will point to an array structure of entryClass objects.

Complicated? Sure, but look at the advantages: we'll have one central dictionary object as our main data repository. Yet we can instantly get a list of objects by date or subject. It's the best of both worlds!

And if we build some sorting ability within our objects, we can easily keep those lists sorted in reverse chronological order (by time, for the findByDate list) and alphabetically (for the findByTopic list). Later, when we're ready to publish our data as HTML pages, we can simply traverse the list grabbing each entry and converting it to HTML.
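As a rough illustration of the two look-up lists sharing one set of entries, here is a minimal Python sketch (not REALbasic). The entry keys "posted", "topic", and "text", the sample posts, and the helper publish_order are all assumptions made for the example:

```python
from datetime import datetime

# Sample entries; each is a plain dict with assumed keys.
entries = [
    {"posted": datetime(2003, 5, 1, 9, 30), "topic": "REALbasic", "text": "A"},
    {"posted": datetime(2003, 5, 1, 17, 5), "topic": "Design", "text": "B"},
    {"posted": datetime(2003, 5, 2, 8, 0), "topic": "REALbasic", "text": "C"},
]

find_by_date = {}   # day   -> list of entries posted that day
find_by_topic = {}  # topic -> list of entries filed under that topic

for e in entries:
    # Both indexes point at the same entry objects; nothing is copied.
    find_by_date.setdefault(e["posted"].date(), []).append(e)
    find_by_topic.setdefault(e["topic"], []).append(e)

# Keep each day's list in reverse chronological order (newest first).
for day_list in find_by_date.values():
    day_list.sort(key=lambda e: e["posted"], reverse=True)


def publish_order():
    """Yield entries newest day first, newest time first within a day."""
    for day in sorted(find_by_date, reverse=True):
        yield from find_by_date[day]


print([e["text"] for e in publish_order()])  # prints ['C', 'B', 'A']
print(sorted(find_by_topic))                 # prints ['Design', 'REALbasic']
```

Traversing publish_order() is the kind of single pass that could later feed an HTML export.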
Are we done?

So far Data Structure #4, while being the most complicated, also seems to have the most advantages. But have we pushed it far enough? Could the structure be made more efficient or more flexible?

The answer is yes. It's always a good idea to reevaluate solutions before you implement them: make sure they are as good as they can be. In the case of Data Structure #4, there are two areas of inefficiency.

First, while our new array addition solves the unique key requirement of the dictionary class, we know that arrays must be searched sequentially, which is slow. For a day's worth of entries, however, that's not a problem: there should never be more than a few entries for a single day anyway. But the subject (category) array is a different story. Over years of publishing, the number of posts for a single topic could be in the thousands. So now we can find the start of a subject list quickly, but then we must search each entry within that subject sequentially!

The solution to this is to expand on what we've already done. Instead of an array inside the subject object, why not use another dictionary? Each entry would use its date as its unique key, just like we use within the findByDate object. Now we're back to quick searches: finding a subject is instant, as it's a dictionary, and once we've got that we can quickly search for particular dates within another dictionary object. Entries within that dictionary object, since they're limited to a single day each, would point to an array structure of posts for that day.

Another problem with #4 is that we've hard-coded it for specific objects. That's easy to see when you look at the similarities between findByDate and findByTopic: even though they're almost the same, we must create custom object classes for each object. Doesn't that seem awkward? Plus, if we decided to add a new search parameter such as Author, we'd want our Author look-ups to be as fast as our date and topic look-ups. Right now, though, that'd mean creating several new object classes.

Instead, how about a generic class that can be customized for whatever we need? Originally we had a class for findByDate and a class for findByTopic. What we need is a generic class, like findByField. FindByField wouldn't be built with a particular field in mind -- the actual field characteristics (data type, size, etc.) and how that field is structured (array, dictionary, etc.) can be specified dynamically while the program is running.

We could also add a built-in sorting mechanism so that fields that include an array would know how to sort themselves and keep the content sorted. That way when a new entry is added, it would be inserted into the appropriate (sorted) location in the array. We'd use a generic sorting mechanism like the kind Matt Neuburg describes in REALbasic Developer (October/November 2002, page 34), which allows different data types to be sorted since each data type has its own comparison operator. This would allow a date's list of entries to be sorted by time (in reverse chronological order), making exporting them extremely quick.

Just so this complex object structure is clear, here's a diagram illustrating the basic concept:

[Diagram: the basic classes and core data structures]

This represents the basic classes and core data structures we'd need. We'd create dynamic variations of findByFieldClass, for instance, to hold the search indexes of the various fields (date, topic, author, etc.).
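To make the generic idea concrete, here is a minimal Python sketch (not REALbasic, and not the sorting mechanism from the cited article). FieldIndex is a hypothetical stand-in for findByFieldClass; its methods, the key functions, the entry keys, and the sample posts and author names are all assumptions made for illustration:

```python
import bisect
from datetime import date, datetime


class FieldIndex:
    """Hypothetical stand-in for findByFieldClass: indexes entries by any field."""

    def __init__(self, key_of, sort_key=None):
        self.key_of = key_of      # function: entry -> look-up key
        self.sort_key = sort_key  # function: entry -> ordering value, or None
        self.buckets = {}         # look-up key -> list of entries

    def add(self, entry):
        """Insert the entry at its sorted position within its bucket."""
        bucket = self.buckets.setdefault(self.key_of(entry), [])
        if self.sort_key is None:
            bucket.append(entry)
            return
        keys = [self.sort_key(e) for e in bucket]
        bucket.insert(bisect.bisect_right(keys, self.sort_key(entry)), entry)

    def find(self, key):
        return self.buckets.get(key, [])


def day_of(entry):
    return entry["posted"].date()


def newest_first(entry):
    """Ordering value that places later posts earlier in a bucket."""
    return -entry["posted"].timestamp()


posts = [
    {"posted": datetime(2003, 5, 1, 9, 30), "topic": "REALbasic", "author": "Marc", "text": "A"},
    {"posted": datetime(2003, 5, 1, 17, 5), "topic": "Design", "author": "Guest", "text": "B"},
    {"posted": datetime(2003, 5, 2, 8, 0), "topic": "REALbasic", "author": "Marc", "text": "C"},
]

by_date = FieldIndex(day_of, newest_first)
by_author = FieldIndex(lambda e: e["author"], newest_first)
by_topic = {}  # topic -> FieldIndex keyed by day (a dictionary inside a dictionary)

for post in posts:
    by_date.add(post)
    by_author.add(post)
    if post["topic"] not in by_topic:
        by_topic[post["topic"]] = FieldIndex(day_of, newest_first)
    by_topic[post["topic"]].add(post)

print([p["text"] for p in by_date.find(date(2003, 5, 1))])  # prints ['B', 'A']
print([p["text"] for p in by_author.find("Marc")])          # prints ['C', 'A']
print(by_topic["REALbasic"].find(date(2003, 5, 2))[0]["text"])  # prints C
```

Adding the Author look-up took one line and no new class, which is the flexibility the generic approach is after. The topic index shows the two-level look-up: first the topic, then the day within that topic.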
Evaluate the Solutions

Now let's move to step three in program design: analyzing our options and picking the best choice for program structure. First we'll figure out the advantages and disadvantages of each approach.
Data Structure #1: The String Array

Advantages:

Disadvantages:

Data Structure #2: The Dictionary

Advantages:

Disadvantages:

Data Structure #3: The Data Object Approach

Advantages:

Disadvantages:

Data Structure #4: The Multiple List Idea

Advantages:

Disadvantages:
Conclusion

Obviously, for this project, Data Structure #4 is ideal. However, it does require a large amount of overhead to initially create the data structure. Some users -- for instance, someone who would rarely post to the weblog -- might decide that speed and efficiency aren't the top priority and that one of the slower but simpler solutions is therefore best.

But that's the whole point of brainstorming and evaluating multiple data strategies for your projects: it's up to you to find the structure that best meets the needs of your program and gives you the flexibility required for the future.
Next Week

More on program design.
News

The next issue of REALbasic Developer is being printed right now and it's packed with some terrific articles. In fact, there's so much stuff we had to leave out the interview feature! (Don't worry: it'll be back next issue.)

Here's a sneak preview of what's coming up in the June/July 2003 issue:

First up, Erick Tejkowski is back with another excellent QuickTime article. He explains how you can manipulate QuickTime video and audio tracks within REALbasic. It's a must-read for anyone interested in multimedia.

Next, Charles Yeomans explains one of REALbasic's best-kept secrets, control binding. Control binding lets RB do the programming for you. With control binding you can link two controls -- a listBox to an editField, for instance -- without writing a single line of code! Unfortunately control binding hasn't been documented very well, but Charles has fixed that with this excellent article.

Finally, Joe Strout writes about getting the most out of Quesa, the 3D library, by using declares to push it beyond the norm.

In this issue, I wrote the Postmortem myself, detailing writing and selling my Z-Write word processor. I made plenty of mistakes, so please learn from my experience!

Of course we've also got all our regular columnists, reviews of cool products, and more. If you haven't subscribed yet, what are you waiting for?
Letters

Today we've got a non-technical question from Richard, who writes:
You're not the first to ask this. One method is to print the pages to PDF yourself, either using Mac OS X's "Save as PDF" feature or using a third-party driver in Classic. That wouldn't necessarily improve readability, but it would give you an off-line archive of the articles which might be valuable.

I've thought of providing PDFs, and it is a possibility, but here's the problem. Applelinks is an advertising-supported site, of course, so we need you to read the pages, see the ads, and click the ad links. That's how the site survives. If we provided the columns in PDF format, there'd potentially be a reduction in ad revenue.

Also, it would be a lot of work to go through and convert nearly 100 articles into PDF format. Without a financial incentive, it's probably not worth my time.

That said, I have thought of a couple of ideas. For instance, would readers be willing to pay for a PDF edition of RBU? Either a small one-time fee for all the previous issues, and/or a subscription fee to automatically be mailed the new columns? If so, how much would you be willing to pay? $5? $10? $20?

Another concept I'm looking at is to actually print all the RBU columns in a book. If there were sufficient interest in that, it would give me an incentive to go back and revise the old columns, updating them for Mac OS X, Windows, and REALbasic 5. Would you pay $30 for a 300-page RBU book? Or would a shorter, cheaper, "best of RBU" book be better? Would you prefer the book in PDF format?

Let me know what you think of any of these ideas (or suggest your own) by sending mail to rbu@stonetablesoftware.com: if there's enough interest, something like this might actually happen!

About the Column

REALbasic University is a weekly instructional column on programming with REALbasic and is brought to you by REALbasic Developer, the magazine for REALbasic programmers. Each week we answer select reader questions, and we're always open to ideas for future columns. Send your questions to . (Keep your questions simple and specific. General queries like "How do I write my own web browser?" will be neglected.) Your question won't be answered immediately, but will be answered in a future column. (If you don't want your correspondence published, just be sure to indicate that when you write. Otherwise it's fair game.)

About the Author

See the REALbasic University Archives
REALbasic University contents ©2001-2004 by Marc Zeedar and REALbasic Developer. All Rights Reserved.