| |||||||||||||||||||||||||||||
|
| |||||||||||||||||||||||||||||
Print This Article REALbasic University: Column 097
OOP University: Part Twenty-OneLast week we began learning about how to design object-oriented programs. I revealed the three steps in program design:
To give us something practical to think about, I proposed a fictitious project, RBlog, a weblog program. After we'd defined the requirements, I gave you the homework assignment of coming up with at least three completely different approaches to a data structure for this project.
RBlog Data StructuresAs I've mentioned many times, a program's data structure is the heart of the program. As much as you might think a program is its user interface, that is nothing but a layer sitting on top of the data structure. Everything the user does must be translated into some way in which the program manipulates the data. A data structure needs to be organized in a way that facilitates its anticipated needs. This can be any combination of sorting, fast searching, additions, deletions, or redefinitions (expansion or customization). Remember, everything in programming is a trade-off. For example, it takes time to search data. However, if you have an index of that data, you can search the data much faster (that's how Sherlock searches the contents of your entire hard drive in seconds). But maintaining that index will slow down other program operations such as adding or deleting data (again, think about Sherlock generating its index). Every choice you make is compromise. Your job is to pick the best data structure for your particular program. With RBlog's needs in mind, what data structures can we brainstorm for it?
Data Structure #1: The String ArrayLet's start with the obvious: a simple array. It's linear, and thus won't search particularly fast (every element must be examined during a search), but it's easy to program. Since each weblog entry is made up of only a few elements (date/time, title, category, and text), rather than incorporate a complex object for each entry, let's just store the whole mess in a string. We can separate each field with a tab character, easily broken apart with the nthField function. This has the advantage of making a simple inStr() search check the entire record -- title, subject, date, and text -- in one go. Not too bad, but not especially sophisticated. Let's play with some other ideas.
Data Structure #2: The DictionaryREALbasic includes the powerful dictionary object, a special data structure that's similar to a collection, except that a dictionary has a hash table (an index) which makes finding elements much faster. Since REALbasic maintains a dictionary's hash table, the structure is easy to use and incredibly fast. Each entry in a dictionary consists of two parts, a value and a key. The key is used to uniquely identify the element (it could be a record's id in a database, for instance). The value is whatever data is being stored. Both key and value are of type variant, which means they can be any type of data you'd like (a string, a number, an object, etc.). For instance, in RBlog, we could use a date's totalSeconds property as the key. Since no two postings will have the exact same date and time, the totalSeconds property will be unique to each post. And by using the date/time as the index for our dictionary, we can quickly find any posting by date/time.
Data Structure #3: The Data Object ApproachAnother idea is to go with the more conventional object-oriented approach, and create a series of data structure objects to hold our data. The data objects can be stored in a master object as an array. To properly encapsulate our data, we can include a data manipulation interface in the master object, so our outside code doesn't know anything about the actual data structure. This gives us the flexibility to later change the data structure without breaking our program. Picture a structure something like this: ![]() As you can see, dbClass holds methods to manipulate the data, giving our external code an interface to manipulate the data (there would need to be many more routines than I show here). The entryClass structure holds the data for a single record of data. dbClass contains an array of entryClass objects which hold all the program's data.
Data Structure #4: The Multiple List IdeaSo far our ideas aren't bad, but they aren't inspired either. They also only satisfy a portion of our requirements. For example, one of our needs is a fast way to look up an entry by date/time. But since RBlog also needs to generate web pages by category (topic), it'd be ideal if we could have a fast way to look them up quickly by subject as well. What would be best if we had a data structure that could be organized multiple ways at same time! Of course that's absurd, right? Storing the data in multiple ways would be redundant and wasteful. But wait a second. Think about our specific needs. We've already determined that our database will be of limited size -- we're guessing 11,000 records after ten years, and that's on the high side -- so maybe a little redundancy for such a small database wouldn't be so bad. Let's at least explore the option. How would we go about implementing a multiple-storage database? First, we know that REALbasic's speedy dictionary class gives us near-instant searching for a particular record. So let's try to base a search system around that kind of structure. If we want fast searches for multiple kinds of data we'd need a separate dictionary for each type of data:
Second, let's assume we organize our data using a structure similar to Option #3: array of data objects. Then we could just copy those data objects into the dictionary objects for quick finding. Wait a second -- why copy? Why not just include the original entry's index number as a link? That way the object doesn't need to be copied, but we've got a reference to it and can still access it. Then our dictionary objects only need to store a few numbers instead of duplicate objects! Okay, this is sounding interesting, but already I see a couple big problems. Let's say we've got an array of data objects indexed as 1 through 4 like this:
Our findByDate dictionary object might look like this:
This looks good. If we search like this:
we should get an instant true/false result. If true, we know there's a matching entry and we can retrieve it like this:
The variable theIndex would then contain the index of the entry in our data array. Sounds great, except there are two big problems. First, the key in a dictionary must be unique (otherwise only the first one is found). If we use the date as the key, that's not unique as there might be several posts on the same day. Unfortunately, knowing the specific time of a post could be difficult. For instance, it'd be easy enough to iterate through dates and ask "Any postings for this date?" But if we had to that for every second of every day that would be ridiculous! Our second problem is if we rearrange the array structure of the data by inserting or deleting, the index numbers change and our findByDate object contains inaccurate links. For example, if we delete the third item in the array (the "5/22/2003" date), item 4 becomes item 3 (the last item in the array). But findByDate.value("5/23/2003") will still return 4 because it doesn't know that the entry's index has changed. Now we could solve for the second problem by rebuilding our findByDate dictionary every time we modify the array, but that's a lot of overhead. A simpler solution is just to make sure each entry has a unique id that never changes and instead of linking to the array's index, link to the id. Then that presents yet another problem -- we need a way to look up entries by id. Wait a second! Why not just use a dictionary object for our main data structure instead of an array? It would still let us add and delete data like an array, but every entry would be indexed by a unique key, and finding a specific record by id would be instantaneous. That still leaves us with the first problem, however. I've devised a solution, but how would you solve it? I'll let you think about it until next time and then you can compare your ideas with mine. Let that be your homework assignment.
Next WeekWe finish designing RBlog's data structure and evaluate the solutions we've generated. LettersToday we've got a great OOP question from Paul Rushton. He writes:
Terrific question, Paul -- thanks for submitting it! Scott Forbes' "Doing Undo" article (in RBD issue 1.2) is a terrific example of OOP. For those who haven't read the article, let me briefly explain. The idea is to create a flexible, reusable undo system you can just drop into all your projects. With this system a checkbox would know how to undo itself, as would reordering items in a listbox. Thus the ultimate "undoable" object is a generic object that knows how to undo itself. But what's great about the article is Scott doesn't just hand us all the answers. Instead he walks us through various undo system concepts, similar to what I'm doing right now without our various data structure options. Scott demonstrates each method of implementation, then reveals the pitfalls of each. An example of this: Scott points out that in a single-level undo system, if each object remembers its previous state, all but one are wasting memory (since only one object will be chosen to undo). Scott then comes up with a better system with a master control object (a "director") that keeps track of the undo info for each object. For this to work, each object, when an action happens, saves that action in a particular format (an UndoableAction class) and passes that object to the director. Later, if the user chooses to undo that action, the director object simply sends that UndoableAction object back to the sending object (checkbox, listbox, whatever) with the command to undo the action. Of course for this to work the director needs to know two pieces of information. One, it needs the UndoableAction object so it can save it. Two, it needs to know what object sent it -- otherwise who would it send it to? Since that info is always required, we store it right inside the UndoableAction object as a property called sender. Then the director object can just look up sender to know where to return the UndoableAction object! But just because sender is a property of UndoableAction, that's no guarantee that you'll remember to set it, right? So Scott uses a constructor method that requires the sending object as its parameter. This means when you create an UndoableAction object, you are required to tell it which object generated the action (usually the current object):
Make sense? A constructor is simply an initialization method. It gets called when an object is created. A destructor is the opposite: it gets called when an object is destroyed. This allows you initialize a new object and clean up an object before it goes away. In REALbasic, most objects have Open and Close events: those are their constructor and destructor methods. In REALbasic, you can create a constructor method by naming it the same as the class. For instance, if testClass has a method called testClass, that method gets called when a new object is created. Download testclass.rb to see this in action. To create a destructor in REALbasic, you precede the name of the method (again, the class' name) with a tilde (~) as in ~testClass. That method is called when the object is destroyed. Destructors don't support parameters.
I'll admit that not being an OOP expert I didn't even know REALbasic supported constructors and destructors (I don't know where they are documented -- I found them in Matt's book). In the past I've created my own init method when I had an object that needed initialization. But of course that means whatever creates the object must remember to call the init routine. Using a constructor is far superior as the initialization routine cannot be forgotten. About the Column REALbasic University is a weekly instructional column on programming with REALbasic and is brought to you by REALbasic Developer, the magazine for REALbasic programmers. Each week we answer select reader questions, and we're always open to ideas for future columns. Send your questions to . (Keep your questions simple and specific. General queries like "How do I write my own web browser?" will be neglected.) Your question won't be answered immediately, but will be answered in a future column. (If you don't want your correspondence published, just be sure to indicate that when you write. Otherwise it's fair game.) About the Author See the REALbasic University Archives
REALbasic University contents ©2001-2004 by Marc Zeedar and REALbasic Developer. All Rights Reserved.
| |||||||||||||||||||||||||||||