An Effective Information Retrieval System Should Express Several Key Elements



Assignment #1 Candybars 1

An effective information retrieval system should express several key elements. First, the records within the system should be consistent in terms of content and format. This allows users to navigate the database and develop expectations as to what the database contains. In order to achieve this consistency, it is necessary for the information system developer to design and implement rules that govern the creation of records. Second, the developer of the information system must ensure that the database records contain information that will be relevant to the database’s audience. In order to determine the relevance of various informational components, the database designer must have a coherent concept of the database rationale. This is accomplished through a statement of purpose that defines the key characteristics of the database by outlining the purpose of the database and predicting, at least to some degree, how the user will use the database. A database that is both consistent in its records and clear in its purpose will facilitate the efficient retrieval of information.

Upon further analysis of our Candy Bar database, we realized that it would be necessary to expand and clarify our rules. The beta testing team found the rules that were provided to be somewhat vague. For instance, they questioned whether the color field in the database referred to the color of the entire candy bar wrapper or only the color of the front of the candy bar wrapper. This rule would need to be elucidated in order for the team to determine the correct value for the field. Further, they wanted clarification on whether or not the manufacturer’s name should be included in the name of the Candy Bar. Finally, they were unclear as to whether the filling field referred to the ingredients in the candy bar as noted on the label or the main contents of the candy bar as noted on the front of the packaging.
Obviously, it would be necessary to reinforce the database rules by clarifying the guidelines for determining the appropriate values for a given field.

The beta testing team was also somewhat unsure as to who the intended audience for such a database would be. In order to determine what information should be contained in the records, they needed to know how and why the database would be used. For instance, if the intended audience were people with food allergies, it would be important to provide allergen information in the database. In such an instance, it would be appropriate to list allergens such as peanuts or dairy products when they are used as ingredients. However, if the intended audience were late night snackers, it might be more useful to include information such as the average cost of the candy bar or its main filling or contents. On the other hand, if the intended audience were dieters, it would be appropriate to include information such as calorie and fat content in the records. Obviously, it is impossible to know how any given user will attempt to use a specific database. However, a clear and concise statement of purpose informs the user of the kind of information contained in the database and the purpose of the database.

The rules presented to the beta-testing team were as follows:


1. Candy Name: Write the name of the candy bar that is on the front of the package.
2. Main Color: Write the main color or colors of the packaging.
3. Manufacture: The name of the company manufacturing the candy, usually found on the back of the package. If it is part of a division, include the main manufacturer if possible.
4. Number of pieces: Is the candy one big piece, two, or more? If more than two, label as other.
5. Filling: Write the fillings, if any, that are in the candy.
6. Calories: Amount of calories per serving, not the fat calories.
7. Weight in grams: Weight of the candy in grams.
8. Shape: The main shape of the candy: rectangular, round, or other.
9. Chocolate: What kind of chocolate, if applicable, does the candy have? Milk, dark, or white.
10. Note: Any other information concerning the candy that may be important to the searcher.

Overall, these rules were effective despite the fact that they were not extremely specific. When the rules were written, we made some assumptions about the ease in finding the appropriate information and how the terms used in each rule would be interpreted. After the beta-testing, it was clear that the rules needed to be more precise and that fewer assumptions should be made about the people using the data structure and how they would interpret the terms used. A response that may seem obvious to the developer may actually be unclear to the user and lead to ambiguity. This became apparent when comparing the beta test to the initial data-structure. Most of the responses were the same, but there were a few differences due mainly to the lack of specificity in the rules. The rules below are the ones that were particularly unclear to the group doing the beta testing.

1) One of the problems that arose was that of the candy name. The rules stated that the name to be used was the one on the front of the candy package. However, some packages have more than one name on the front, and sometimes the manufacturer's name is also on the front of the package. In order to ensure consistency in the records, the rule corresponding to the candy name would have to be rewritten to be more specific.

Candy Name: Write the name of the candy that is on the front of the package. If the candy is a variety of another candy with a similar name, include the variety. For instance, if the candy comes in both "original" form and "extra crunchy" form, then either "original" or "extra crunchy" would be included in the record as a part of the candy name. Include the manufacturer's name if it is a major part of the name.

2) Another rule that needed clarification is the rule concerning the color of the packaging. The beta-testing group questioned whether the color referred to the entire package or simply the front of the package. The rule would be rewritten to read:

Main Color: Write the main color or colors from the front of the package. Choose the predominant color on the package. If the package is multi-colored, choose the two main colors. If possible, choose a color or colors that are similar to those on the validation list. Do not include shades of the colors such as dark or light.

4) This field concerns the number of pieces of candy within the package. The original rule limited the data to 1, 2, 4, or other. However, the beta testing team raised the question of whether ‘other’ was too broad a choice. ‘Other’ could mean 5, 10, or 100, and the searcher would have no way of knowing approximately how much candy was contained in the package. In order to eliminate ambiguity, the validation list in the data structure would need to be amended, and so the rule would also need to change.

Number of pieces: Is the candy one big piece, two, or more? If more than two, choose the corresponding range from the validation list. If it is unclear how many pieces are in the package, use your best estimate.

5) The fillings of the candy also presented a problem because the rules were unclear as to where the information for this field could be found. Some of the information could be taken from the ingredient section on the package or simply from the front of the package. The rule would have to specify which fillings were being requested.

Filling: Write the fillings, if any, that are in the candy. Use the information listed on the front of the package. Use the validation list and include all the fillings that are applicable from the list.

6) The other group was unclear as to what to put in the calorie field if the information was unavailable. They assumed that if the information was not listed on the candy, the appropriate choice was ‘unknown’. The revised rule states this explicitly.

Calories: Amount of calories per serving, not the fat calories. Information is usually listed on the back of the package. If the information is unavailable, ‘unknown’ is an appropriate value for the field.
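Because the calorie field may hold either a number or the value ‘unknown’, anything that reads the field has to handle both cases. The following Python sketch shows one way to do so. The helper name parse_calories is hypothetical and is not part of the textbase software; the comparison ignores capitalization because both UNKNOWN and unknown appear in the two teams' records.

```python
from typing import Optional


def parse_calories(raw: str) -> Optional[int]:
    """Return calories per serving as an int, or None when the value is 'unknown'.

    Hypothetical helper for illustration only.
    """
    value = raw.strip()
    if value.lower() == "unknown":
        return None
    # int() raises ValueError for anything that is not a whole number,
    # which flags entries that follow neither form allowed by the rule.
    return int(value)
```

A caller can then treat None as "not listed on the package" instead of comparing strings in every search.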

10) For the note field the rule was again too broad and so the beta testing group was unclear as to what to include in the field. They decided to include allergy information and more specific information about the ingredients. The rule for the note would need to specify what to include here.

Note: Any other information concerning the ingredients found in the candy.

The beta group also included nutritional information in the note field, namely the grams of fat in the candy. Because fat content is information that many searchers would be interested in, we decided to add another field.

Fat: The amount of fat in grams found in the candy. Information is usually located on the back of the package under total fat. Write only the total fat, not the saturated fat or trans fat. If the information is unavailable, the value of the field is ‘unknown.’

Comparison of the records entered by the two teams

FIELD             RECORDS FROM CANDYBAR TEAM    RECORDS FROM BETA-TESTING TEAM

candy_no          1                             1
candy_name        HEATH                         Heath
main_color        BROWN                         BROWN
manufacture       HERSHEY                       HERSHEY
no_of_pieces      1                             1
filling           TOFFEE                        TOFFEE
calories          200                           210
weight_in_grams   39                            39
shape             RECTANGULAR                   RECTANGULAR
chocolate         MILK                          MILK
note              (none)                        Fat content = 12g.

candy_no          2                             2
candy_name        ROLO                          Rolo
main_color        BROWN                         BROWN
manufacture       HERSHEY                       HERSHEY
no_of_pieces      other                         other
filling           CARAMEL                       CARAMEL
calories          UNKNOWN                       unknown
weight_in_grams   48                            48
shape             ROUND                         ROUND
chocolate         MILK                          MILK
note              Chewy                         Fat content = unknown

candy_no          3                             3
candy_name        SNICKERS                      Snickers Cruncher
main_color        YELLOW                        YELLOW
manufacture       MARS                          MARS
no_of_pieces      1                             1
filling           NUTS, RICE                    CARAMEL, NUTS, RICE
calories          UNKNOWN                       240
weight_in_grams   44.2                          44.2
shape             RECTANGULAR                   RECTANGULAR
chocolate         MILK                          MILK
note              cruncher                      Fat content = 14g. Contains peanuts.

candy_no          4                             4
candy_name        Hershey's Cookies n' Cream    Hershey's Cookies'n'Creme
main_color        WHITE                         WHITE
manufacture       HERSHEY                       HERSHEY
no_of_pieces      1                             1
filling           OTHER                         OTHER
calories          230                           230
weight_in_grams   43                            43
shape             RECTANGULAR                   RECTANGULAR
chocolate         OTHER                         OTHER
note              (none)                        Fat content = 12g. Contains chocolate cookie bits. Chocolate type = white.

candy_no          5                             5
candy_name        Twix                          Twix
main_color        YELLOW                        OTHER
manufacture       MARS                          MARS
no_of_pieces      2                             2
filling           CARAMEL, OTHER                CARAMEL, OTHER
calories          280                           280
weight_in_grams   56.7                          56.7
shape             RECTANGULAR                   RECTANGULAR
chocolate         MILK                          MILK
note              (none)                        Fat content = 14g. Allergy information: may contain peanuts.

candy_no          6                             6
candy_name        Almond M&M's                  S'mores
main_color        YELLOW                        BROWN
manufacture       MARS                          HERSHEY
no_of_pieces      other                         1
filling           NUTS                          OTHER
calories          200                           240
weight_in_grams   37.1                          46
shape             ROUND                         RECTANGULAR
chocolate         MILK                          MILK
note              (none)                        Fat content = 11g. Allergy information: may contain peanuts.

candy_no          7                             7
candy_name        Hershey's S'mores             M&M's Almond
main_color        BROWN                         OTHER
manufacture       HERSHEY                       MARS
no_of_pieces      1                             other
filling           OTHER                         NUTS
calories          240                           200
weight_in_grams   46                            37.1
shape             RECTANGULAR                   OTHER
chocolate         MILK                          MILK
note              (none)                        Fat content = 11g. Beige packaging. Contains almonds. Oval shape.

candy_no          8                             8
candy_name        Take 5                        Milky Way
main_color        RED                           BROWN
manufacture       HERSHEY                       MARS
no_of_pieces      2                             1
filling           PRETZELS, CARAMEL, OTHER      CARAMEL, OTHER
calories          220                           270
weight_in_grams   42                            270
shape             RECTANGULAR                   RECTANGULAR
chocolate         MILK                          MILK
note              (none)                        Fat content = 10 g. Contains nougat. Allergy information: may contain peanuts.

candy_no          9                             9
candy_name        Almond Joy                    Almond Joy
main_color        BLUE, WHITE                   BLUE, WHITE
manufacture       HERSHEY                       HERSHEY
no_of_pieces      2                             2
filling           COCONUT, NUTS                 COCONUT, NUTS
calories          220                           220
weight_in_grams   45                            45
shape             RECTANGULAR                   RECTANGULAR
chocolate         MILK                          MILK
note              (none)                        Fat content = 12g.

candy_no          10                            10
candy_name        Butterfinger                  Butterfinger
main_color        YELLOW, BLUE                  BLUE, YELLOW
manufacture       NESTLE                        NESTLE
no_of_pieces      1                             1
filling           NUTS                          OTHER
calories          270                           270
weight_in_grams   59.5                          59.5
shape             RECTANGULAR                   RECTANGULAR
chocolate         OTHER                         MILK
note              (none)                        Fat content = 11g. Contains ground peanuts and cocoa.

candy_no          11                            11
candy_name        Milky Way                     Take 5
main_color        BROWN                         RED
manufacture       MARS                          HERSHEY
no_of_pieces      1                             2
filling           CARAMEL, OTHER                CARAMEL, NUTS, OTHER, PRETZELS
calories          270                           220
weight_in_grams   58.1                          42
shape             RECTANGULAR                   RECTANGULAR
chocolate         MILK                          MILK
note              (none)                        Fat content = 11g. Contains peanut butter.

candy_no          12                            12
candy_name        Milk Chocolate M&M's          Snickers
main_color        BROWN                         BROWN
manufacture       MARS                          MARS
no_of_pieces      other                         1
filling           OTHER                         CARAMEL, NUTS
calories          240                           280
weight_in_grams   47.9                          58.7
shape             ROUND                         RECTANGULAR
chocolate         MILK                          MILK
note              (none)                        Fat content = 14g. Contains peanuts.

candy_no          13                            13
candy_name        Snickers                      M&M's
main_color        BROWN                         BROWN
manufacture       MARS                          MARS
no_of_pieces      1                             other
filling           NUTS, CARAMEL                 OTHER
calories          280                           240
weight_in_grams   58.7                          47.9
shape             RECTANGULAR                   ROUND
chocolate         MILK                          MILK
note              (none)                        Fat content = 10g. Allergy information: may contain peanuts.

candy_no          14                            14
candy_name        100 Grand                     100 Grand
main_color        RED                           RED
manufacture       NESTLE                        NESTLE
no_of_pieces      2                             2
filling           CARAMEL, RICE                 CARAMEL, RICE
calories          190                           190
weight_in_grams   42.5                          42.5
shape             RECTANGULAR                   RECTANGULAR
chocolate         MILK                          MILK
note              (none)                        Fat content = 8g. Allergy information: may contain peanuts.
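Discrepancies between the two teams' records, such as the differing calorie values for Heath, can also be found mechanically. The Python sketch below is a hypothetical helper (differing_fields is not part of the textbase software) that compares two records field by field, ignoring capitalization and treating a missing field as empty. The sample values are the record 1 entries shown above.

```python
def differing_fields(team: dict, beta: dict) -> dict:
    """Map each field whose values differ to a (team_value, beta_value) pair.

    Values are compared case-insensitively, so HEATH and Heath count as equal.
    A field missing from one record is treated as an empty string.
    """
    differences = {}
    for field in sorted(set(team) | set(beta)):
        team_value = str(team.get(field, "")).strip()
        beta_value = str(beta.get(field, "")).strip()
        if team_value.lower() != beta_value.lower():
            differences[field] = (team_value, beta_value)
    return differences


# Record 1 (Heath) as entered by each team.
team_heath = {
    "candy_name": "HEATH", "main_color": "BROWN", "manufacture": "HERSHEY",
    "no_of_pieces": "1", "filling": "TOFFEE", "calories": "200",
    "weight_in_grams": "39", "shape": "RECTANGULAR", "chocolate": "MILK",
}
beta_heath = dict(team_heath, candy_name="Heath", calories="210",
                  note="Fat content = 12g.")

heath_differences = differing_fields(team_heath, beta_heath)
print(heath_differences)
```

For this record the only fields reported are calories and note, which matches the kind of difference the revised rules are meant to reduce.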

Textbase Structure

Textbase: C:\MLIS202\candy\candy
Created: 2/16/2005 4:58:19 PM
Modified: 2/22/2005 7:24:58 PM

Field Summary:

1. candy_no: Automatic Number(next avail=15, increm=1), Term
2. candy_name: Text, Term & Word
   Validation: required
3. main_color: Text, Term & Word
   Validation: valid-list
4. manufacture: Text, Term & Word
   Validation: single-only, valid-list
5. no_of_pieces: Text, Term & Word
   Validation: single-only, valid-list
6. filling: Text, Term & Word
   Validation: valid-list
7. calories: Text, Term & Word
8. weight_in_grams: Number, Term
9. shape: Text, Term & Word
   Validation: single-only, valid-list
10. chocolate: Text, Term & Word
   Validation: single-only, valid-list
11. note: Text, Term & Word

Log file enabled, showing 'candy_no'
Leading articles: a an the
Stop words: a an and by for from in of the to

Textbase Defaults:
   Default indexing mode: SHARED IMMEDIATE
   Default sort order:
Textbase passwords:
   Master password = ''
   0 Access passwords:
   No Silent password

Validation lists





No_of_pieces: 1 2 4 other
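The validation settings in the Field Summary amount to three kinds of checks: a value is required, only a single value is allowed, or the value must come from a validation list. The following Python sketch illustrates those checks under stated assumptions; it is not how the textbase software itself works. The names FIELD_RULES, VALID_LISTS, and check_record are hypothetical, only the no_of_pieces list is filled in because it is the only list reproduced here, and matching against the list is assumed to ignore capitalization.

```python
# Validation flags copied from the Field Summary.
FIELD_RULES = {
    "candy_name": {"required": True},
    "main_color": {"valid_list": True},
    "manufacture": {"single_only": True, "valid_list": True},
    "no_of_pieces": {"single_only": True, "valid_list": True},
    "filling": {"valid_list": True},
    "shape": {"single_only": True, "valid_list": True},
    "chocolate": {"single_only": True, "valid_list": True},
}

# Only the no_of_pieces list is known; fields without an entry here
# are not checked against a list in this sketch.
VALID_LISTS = {"no_of_pieces": ["1", "2", "4", "other"]}


def check_record(record: dict) -> list:
    """Return a list of problems; each field maps to a list of string values."""
    problems = []
    for field, rules in FIELD_RULES.items():
        values = record.get(field, [])
        if rules.get("required") and not values:
            problems.append(f"{field}: a value is required")
        if rules.get("single_only") and len(values) > 1:
            problems.append(f"{field}: only one value is allowed")
        allowed = VALID_LISTS.get(field)
        if rules.get("valid_list") and allowed is not None:
            allowed_lower = {item.lower() for item in allowed}
            for value in values:
                if value.lower() not in allowed_lower:
                    problems.append(
                        f"{field}: '{value}' is not on the validation list")
    return problems
```

For example, check_record({"candy_name": ["Twix"], "no_of_pieces": ["2"]}) returns an empty list, while a record with no_of_pieces set to 3 is reported because 3 is not among 1, 2, 4, and other.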


Even though the structure and validation lists were adequate to maintain the collection, after reviewing the data from both teams, our group agreed to modify the data structure to describe the field value more specifically.

1. It is better to have one more field (attribute) to describe fat content. New Field #13 = fat_content_in_grams: Number, Term

2. The validation list of Chocolate should be DARK, MILK, NONE, WHITE, OTHER.
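Because the beta-testing team recorded fat content in the note field, existing notes could in principle be used to fill the new field. The following Python sketch assumes the notes follow the pattern seen in the beta-testing records (for example "Fat content = 12g."). The names CHOCOLATE_VALUES, is_valid_chocolate, and fat_from_note are hypothetical, and this is only one possible way to apply the revised structure.

```python
import re
from typing import Optional

# Revised validation list for the chocolate field.
CHOCOLATE_VALUES = ("DARK", "MILK", "NONE", "WHITE", "OTHER")

# Matches notes such as "Fat content = 12g." or "Fat content = 10 g."
_FAT_PATTERN = re.compile(r"Fat content\s*=\s*(\d+(?:\.\d+)?)\s*g", re.IGNORECASE)


def is_valid_chocolate(value: str) -> bool:
    """True when the value is on the revised list (capitalization ignored)."""
    return value.strip().upper() in CHOCOLATE_VALUES


def fat_from_note(note: str) -> Optional[float]:
    """Extract grams of fat from a note, or None when no number is given."""
    match = _FAT_PATTERN.search(note)
    return float(match.group(1)) if match else None
```

A note reading "Fat content = unknown" yields None, so the record would still need the value checked against the package.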

The statement of purpose was originally written as:

The purpose for this database is to create a general catalog of chocolate bars of a standard serving size (one standard-sized candybar) based on type of chocolate, filling, caloric content and predominant wrapper coloring. This catalog will also be able to be sorted by the major manufacturers and their general description on the wrapper/packaging.

The statement of purpose for the chocolate bar catalog, as originally written, covered all of the questions that were asked and was specific enough for the two teams to produce very similar output. The main difference was in the contents of the notes field. The statement of purpose, in that instance, was too broad. It needs more information about the target audience to focus what is included there. The term “general catalog” is too broad and gave the other group difficulty in determining what kind of information would be useful to the user. A more focused description would be “a catalog of chocolate bars for people on a diet”. The catalog already contains a calorie field, and other nutrition fields could be added if the database became more health-oriented. The notes field could then be used to record any nutritional warnings on the wrapper.

The statement of purpose was re-written as:

The purpose of this database is to create a general catalog of chocolate bars of a standard serving size. The records contain information on the type of chocolate used in the candy bar, the filling contained in the candy bar, the caloric content of the candy bar and the predominant wrapper coloring. The records will also contain information on the manufacturer of the candy and the general description on the wrapper/packaging. The database is intended to be used by people who are interested in choosing a candy bar as a snack and who may have some dietary restrictions but are also concerned about how the candy bar will taste.

By clarifying our rules and expanding our statement of purpose, we believe that the database will be more consistent and more useful for the intended audience.

