1. Global Knowledgebase Search on Browser Text


Author: Eric Xu Date: 2004/03/02

Protégé Search

Protégé currently has many search tools that help users find specific frames in a knowledgebase (e.g. the binoculars, the query tab, the string search tab). However, each of these tools covers only one type of search, and some searches are not covered by any of them. The purpose of this project is to define the different types of searches in a Protégé-based knowledgebase, to develop a Protégé search API that lets programmers perform the most common searches, and, as an add-on, to develop a user interface in Protégé that lets users perform the most common searches without dealing with several different search interfaces.

Use Cases

1. Global Knowledgebase Search on Browser Text e.g. search for frames whose browser text matches “*vaccination*of*” in knowledgebase “library”.

2. Global Knowledgebase Search on Any Slot Value e.g. search for frames that have slot values matching “*vaccination*” in knowledgebase “library”. (String Search Tab)

3. Class Search for Instances e.g. search for instances that are of type “Criterion” and whose browser text is “age” in the current knowledgebase.

4. ClassTree Search for Classes e.g. search for classes whose browser text matches “*nut” in class tree “SNOMED_CT”. (Class Tab Binocular Search)

5. InstanceTree Search for Instances e.g. in instanceTree “Newspaper”, search for instances whose browser text matches “*day” and whose “sections” slot has class “Sports” as a value. This search can be done at only 1 level, or recursively.

6. Inverse InstanceTree Search for Instances through a Named Slot e.g. search for instances that reference instance “Student A” through slot “teaches”. This search can be done at only 1 level, or recursively.

Design Rationale

We view every search query as having two parts: the search context and the conditional clause.

The search context defines the space in which the search is performed. It can be a knowledgebase, which consists of all its frames, or a set of frames related by references through particular slots. We do not allow cross-knowledgebase searches and, because of the scope of this project, we do not handle search spaces that fall outside this definition. However, many contexts can indeed be viewed as a set of frames related by references (e.g. the class tree in the left panel of the class tab is a context whose frames are related through the :DIRECT-SUBCLASSES and :DIRECT-SUPERCLASSES slots). We introduce the Instance Tree (a reference relationship) and the reverse Instance Tree (a being-referenced relationship) to describe these contexts.

The search condition further narrows the result space. Here we make the restriction that search conditions can only be placed on the result frame’s slots; we call them slot value conditions. An example is "find all frames whose slot salary has a value greater than 100000 in some context (e.g. a knowledgebase or an instance tree)". This restriction does rule out some types of searches, which are discussed in the "Limitations and Capabilities" section.

The returned search results are always a collection of frames.

Search API

Our search API follows our design rationale. We expect its users to be UI programmers who are familiar with Protégé and the Protégé API. We provide four search classes representing four types of searches, each with a different search context. For any of these searches, you can set slot value conditions. Instead of returning a collection of frames as the search result, each search returns a lazy Iterator. The iterator is lazy because it does only the work needed to find the next element, and only when next() or hasNext() is called. We chose lazy iterators over functions that return Collections of result frames because we want to give users more control over how much searching is done: the search can be stopped at any time, for example by imposing a maximum search time or a maximum number of returned frames.
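As an illustration of the control a lazy iterator gives the caller, the following is a minimal sketch of consuming any java.util.Iterator under a result cap and a time budget. The class BoundedSearch and its take method are hypothetical helpers written for this example; they are not part of the search API, and the demo uses a plain list iterator in place of a real search iterator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper, not part of the search API described in this document.
final class BoundedSearch {

    /**
     * Pulls results from a lazy iterator until it is exhausted, maxResults
     * items have been collected, or the time budget has elapsed.
     * The clock is checked between results only, so a single slow
     * hasNext() call can still overrun the budget.
     */
    static <T> List<T> take(Iterator<T> results, int maxResults, long maxMillis) {
        long start = System.nanoTime();
        long budgetNanos = maxMillis * 1_000_000L;
        List<T> found = new ArrayList<>();
        while (found.size() < maxResults
                && System.nanoTime() - start < budgetNanos
                && results.hasNext()) {
            found.add(results.next());
        }
        return found;
    }

    public static void main(String[] args) {
        // A plain list iterator stands in for a search iterator here.
        Iterator<String> results = Arrays.asList("a", "b", "c", "d").iterator();
        System.out.println(take(results, 2, 1000)); // collects at most two results
    }
}
```

Because the iterator is only advanced inside the loop, no work is spent on results beyond the cap.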

When designing the classes, we tried hard to make this API extensible. We describe the extension points along with each part of the API.

Main Classes of the Search API

This part defines the four main search classes we provide.

o abstract class LazySearchIterator (implements java.util.Iterator)
    o class KnowledgeBaseSearchIterator
    o class ReferencerSearchIterator
    o class ReferenceSearchIterator
    o class ReferenceWithSlotValueSearchIterator

As shown above, all the search classes extend LazySearchIterator. To do a search, follow these steps:

1. Declare a SlotValueCondition. This step may be skipped if you do not want to add any search condition.

2. Create a search iterator instance using the SlotValueCondition you defined in step 1. Depending on the type of search (the search context), you need to choose the right iterator class. The ReferenceSearchIterator searches in an Instance Tree context, whereas the ReferencerSearchIterator searches in a reverse Instance Tree context. The ReferenceWithSlotValueSearchIterator is more complicated and is explained in the examples that follow.

3. After you create the iterator instance, treat it as a regular iterator and use a while loop to get the results one by one.

Below are some examples of how to use the search API. Please ignore the construction of the SlotValueCondition for now; it is described in detail later.

/**
 * KnowledgeBaseSearchIterator
 * Given a knowledge base, search for frames whose slot values
 * satisfy the condition specified in the “condition” parameter.
 */

For example: search for frames whose browser text matches “*vaccination of*” in the current knowledgebase

SlotValueCondition condition = new SlotValueSimpleConditionOnBrowserText(
    StringValueComparator.MATCHES, "*vaccination of*");
KnowledgeBaseSearchIterator search = new KnowledgeBaseSearchIterator(kb, condition);
while (search.hasNext()) {
    Frame frame = (Frame) search.next();
    System.out.println(frame);
}

/**
 * ReferenceSearchIterator
 * Given an instance, do a breadth-first search for frames
 * (1) that are referenced by treeRoot directly or indirectly through
 *     slots in the linkSlots set and
 * (2) whose slot values satisfy the condition specified in the
 *     condition parameter.
 *
 * @param treeRoot
 * @param linkSlots
 * @param topLevel search is done at the top level or at all levels
 * @param condition
 */

For example: search for classes whose name is “age” in class tree “SNOMED_CT”. (Class Tab Binocular Search)

SlotValueCondition condition = new SlotValueSimpleConditionOnAnySlot(
    SymbolValueComparator.MATCHES, "age", false);

Vector linkedSlots = new Vector();
Slot slot1 = kb.getSlot(":DIRECT-SUBCLASSES");
linkedSlots.add(slot1);

Instance treeRoot = kb.getInstance("SNOMED_CT");

ReferenceSearchIterator search = new ReferenceSearchIterator(
    treeRoot, linkedSlots, false, condition);
while (search.hasNext()) {
    Frame frame = (Frame) search.next();
    System.out.println(frame.getBrowserText());
}
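To make the traversal behind this kind of search concrete, here is a small sketch of a breadth-first search over references, with a flag that restricts it to the top level. It is not the implementation of ReferenceSearchIterator: frames are replaced by strings, the link slots by a plain adjacency map, and the class ReferenceBfsSketch is hypothetical. Two details are assumptions made for this sketch: the root itself is never reported, and each frame is reported at most once even if it is reachable along several paths.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical stand-in: strings play the role of frames, and the map
// plays the role of "values of the link slots" for each frame.
final class ReferenceBfsSketch {

    /**
     * Breadth-first search from root along the given links. Returns, in
     * visit order, the reachable nodes that satisfy the condition.
     * When topLevel is true, only direct references of root are examined.
     */
    static List<String> search(Map<String, List<String>> links, String root,
                               boolean topLevel, Predicate<String> condition) {
        List<String> matches = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        visited.add(root);
        Deque<String> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            for (String next : links.getOrDefault(current, Collections.emptyList())) {
                if (!visited.add(next)) {
                    continue; // already seen: guards against cycles and duplicates
                }
                if (condition.test(next)) {
                    matches.add(next);
                }
                if (!topLevel) {
                    queue.add(next); // descend only when searching all levels
                }
            }
        }
        return matches;
    }
}
```

The visited set matters in practice because reference graphs between instances, unlike class trees, can contain cycles.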

/**
 * ReferencerSearchIterator
 * Given an instance, do a breadth-first search for frames
 * (1) that reference treeRoot directly or indirectly through
 *     slots in the linkSlots set and
 * (2) whose slot values satisfy the condition specified in the
 *     condition parameter.
 *
 * @param treeRoot
 * @param linkSlots
 * @param topLevel search is done at the top level or at all levels
 * @param condition
 */

For example: search for instances that reference “kidney” through slot “connected to”. This search can be done at only 1 level, or recursively.

Vector linkedSlots = new Vector();
Slot slot1 = kb.getSlot("connected to");
linkedSlots.add(slot1);

Instance treeRoot = kb.getInstance("kidney");

boolean topLevel = false;

ReferencerSearchIterator search = new ReferencerSearchIterator(
    treeRoot, linkedSlots, topLevel, null);
while (search.hasNext()) {
    Frame frame = (Frame) search.next();
    System.out.println(frame.getBrowserText());
}

/**
 * ReferenceWithSlotValueSearchIterator
 * Given an instance, first find the set of frames that are
 * referenced directly or indirectly through slots in the
 * “linkSlots” set, and then find frames
 * (1) that are referenced directly through the "targetSlot"
 *     by these frames and
 * (2) whose slot values satisfy the condition specified in the
 *     “condition” parameter.
 * We provide this search class because it handles searches of the
 * type “find all direct or indirect simple instances of class A”.
 */

For example: search for instances that are of type “Criterion” and whose browser text is “age” in the current knowledgebase.

SlotValueCondition condition = new SlotValueSimpleConditionOnBrowserText(
    StringValueComparator.MATCHES, "age");

Vector linkedSlots = new Vector();
Slot slot1 = kb.getSlot(":DIRECT-SUBCLASSES");
linkedSlots.add(slot1);

Slot targetSlot = kb.getSlot(":DIRECT-INSTANCES");

Instance treeRoot = kb.getCls("Criterion");

ReferenceWithSlotValueSearchIterator search = new ReferenceWithSlotValueSearchIterator(
    treeRoot, linkedSlots, targetSlot, false, condition);
while (search.hasNext()) {
    Frame frame = (Frame) search.next();
    System.out.println(frame.getBrowserText());
}


If you need to define your own search context, you can add another search iterator class that extends LazySearchIterator. The only method you need to implement is searchNext(), which returns the next search result. You do not need to implement next() or hasNext(); they are already handled by the LazySearchIterator class.
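The division of labour between searchNext() and next()/hasNext() can be sketched as follows. This is not the source of LazySearchIterator: the class LazyIteratorSketch is a hypothetical stand-in, and the convention that searchNext() returns null once the search space is exhausted is an assumption made for this sketch. PrefixSearchIterator is an equally hypothetical custom context that searches a list of names.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for LazySearchIterator; subclasses supply searchNext(). */
abstract class LazyIteratorSketch<T> implements Iterator<T> {
    private T lookahead;        // next result, once it has been found
    private boolean exhausted;  // true after searchNext() has returned null

    /** Returns the next result, or null when nothing is left (assumed convention). */
    protected abstract T searchNext();

    @Override
    public boolean hasNext() {
        // Search only when asked, and only if no result is already buffered.
        if (lookahead == null && !exhausted) {
            lookahead = searchNext();
            exhausted = (lookahead == null);
        }
        return lookahead != null;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T result = lookahead;
        lookahead = null;
        return result;
    }
}

/** Example custom context: a fixed list of non-null names, filtered by prefix. */
final class PrefixSearchIterator extends LazyIteratorSketch<String> {
    private final Iterator<String> source;
    private final String prefix;

    PrefixSearchIterator(List<String> names, String prefix) {
        this.source = names.iterator();
        this.prefix = prefix;
    }

    @Override
    protected String searchNext() {
        while (source.hasNext()) {
            String candidate = source.next();
            if (candidate.startsWith(prefix)) {
                return candidate;
            }
        }
        return null; // search space exhausted
    }
}
```

With this split, a subclass only describes how to find one more result; buffering and end-of-search handling live in the base class.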

Helper Classes

interface SlotValueCondition

o class SlotValueComplexCondition (implements SlotValueCondition)
o class SlotValueSimpleCondition (implements SlotValueCondition)
    o class SlotValueSimpleConditionOnAnySlot
    o class SlotValueSimpleConditionOnBrowserText
    o class SlotValueSimpleConditionOnSingleSlot

A SlotValueSimpleCondition is a condition clause that has no AND or OR operator (e.g. "a ^ b" is a SlotValueComplexCondition, but "a" by itself is a simple condition).

public interface SlotValueCondition {
    public boolean isSimpleCondition();

    /**
     * Tests whether the slots of the given frame satisfy this
     * SlotValueCondition.
     *
     * @param frame the frame to be tested
     * @return true if the frame satisfies the condition; false if it
     *         does not, or if the condition does not apply to the frame
     */
    public boolean isSatisfied(Frame frame);
}

The constructor of a SlotValueComplexCondition takes an operator and a collection of operands, each of which is a SlotValueCondition.
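How an operator and a collection of operands can be evaluated is sketched below. All names in it are hypothetical stand-ins: Condition mirrors only the isSatisfied method of SlotValueCondition and tests a plain String instead of a Frame, the Relation enum mirrors the AND and OR operators, and Complex is not the real SlotValueComplexCondition. The treatment of an empty operand list (AND is satisfied, OR is not) is an assumption of this sketch.

```java
import java.util.List;

// Hypothetical sketch; none of these types belong to the search API.
final class ComplexConditionSketch {

    /** Stand-in for SlotValueCondition, testing a String instead of a Frame. */
    interface Condition {
        boolean isSatisfied(String frameText);
    }

    /** Stand-in for the AND and OR operators. */
    enum Relation { AND, OR }

    /** Combines operand conditions with a single AND or OR operator. */
    static final class Complex implements Condition {
        private final Relation operator;
        private final List<Condition> operands;

        Complex(Relation operator, List<Condition> operands) {
            this.operator = operator;
            this.operands = operands;
        }

        @Override
        public boolean isSatisfied(String frameText) {
            for (Condition operand : operands) {
                boolean ok = operand.isSatisfied(frameText);
                if (operator == Relation.AND && !ok) {
                    return false; // one failing operand decides AND
                }
                if (operator == Relation.OR && ok) {
                    return true;  // one passing operand decides OR
                }
            }
            // No operand decided early: AND held for all, OR held for none.
            return operator == Relation.AND;
        }
    }
}
```

Because the operands are themselves conditions, a Complex instance can be nested inside another one to express clauses such as "a AND (b OR c)".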

There are three types of SlotValueSimpleCondition. As their names say, SlotValueSimpleConditionOnBrowserText is a condition on the browser text, and SlotValueSimpleConditionOnSingleSlot is a condition on a single slot. SlotValueSimpleConditionOnAnySlot tests the given String value against every slot: it is satisfied when the frame has a slot with a value that matches the string, or a slot with a value whose browser text matches the string. Please look up the API documentation and the examples above to find out how to construct SlotValueConditions. If you need to define a new kind of SlotValueSimpleCondition, extend that class and implement the constructor and the isSatisfied(Frame) method.

Some other classes are defined to help construct SlotValueSimpleConditions.

o class ValueComparator
    o class BooleanValueComparator
    o class ClsValueComparator
    o class FloatValueComparator
    o class InstanceValueComparator
    o class IntegerValueComparator
    o class StringValueComparator
    o class SymbolValueComparator

The …ValueComparator.MATCHES operator handles wildcard matches, but not regular expression matches. Use the operator type that corresponds to the slot type when constructing SlotValueConditions.
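The difference between wildcard and regular expression matching can be illustrated with a short sketch in which "*" stands for any run of characters and every other character is taken literally. This is not how the comparator classes are implemented; WildcardSketch is a hypothetical class, and matching is case-sensitive here, which is an assumption.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of '*' wildcard matching; not the comparator's own code.
final class WildcardSketch {

    /** Builds a regex in which '*' matches any run of characters (possibly empty). */
    static Pattern compile(String wildcard) {
        // The limit -1 keeps trailing empty parts, so a trailing '*' is not lost.
        String[] parts = wildcard.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            // Quote the literal text so characters such as '.' or '(' stay literal.
            regex.append(Pattern.quote(parts[i]));
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    /** Returns true if the whole text matches the wildcard pattern. */
    static boolean matches(String wildcard, String text) {
        return compile(wildcard).matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("*vaccination*of*", "Influenza vaccination of adults"));
        System.out.println(matches("*nut", "nutmeg"));
    }
}
```

Under these rules a pattern without any "*" only matches text that is identical to it.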

o class ValueSlotType //provides static members to indicate direct or own slots
o class Relation //defines AND and OR operators

Examples

Test.java in the search package provides some examples and is runnable on the newspaper_extend project. More examples follow.

1. Global Knowledgebase Search on Browser Text e.g. search for frames whose browser text matches “*vaccination*of*” in the current knowledgebase

SlotValueCondition condition = new SlotValueSimpleConditionOnBrowserText(
    StringValueComparator.MATCHES, "*vaccination*of*");
KnowledgeBaseSearchIterator search = new KnowledgeBaseSearchIterator(kb, condition);
while (search.hasNext()) {
    Frame frame = (Frame) search.next();
    System.out.println(frame);
}

2. Global Knowledgebase Search on Any Slot Value e.g. search for frames that have any slot value matching “*vaccination*” in the current knowledgebase. (similar to String Search Tab)

SlotValueCondition condition = new SlotValueSimpleConditionOnAnySlot(
    StringValueComparator.MATCHES, "*vaccination*");
KnowledgeBaseSearchIterator search = new KnowledgeBaseSearchIterator(kb, condition);
while (search.hasNext()) {
    Frame frame = (Frame) search.next();
    System.out.println(frame);
}

3. Class Search for Instances e.g. search for instances that are of type “Criterion” and whose browser text is “age” in the current knowledgebase.

SlotValueCondition condition = new SlotValueSimpleConditionOnBrowserText(
    StringValueComparator.MATCHES, "age");

Vector linkedSlots = new Vector();
Slot slot1 = kb.getSlot(":DIRECT-SUBCLASSES");
linkedSlots.add(slot1);

Slot targetSlot = kb.getSlot(":DIRECT-INSTANCES");

Instance treeRoot = kb.getCls("Criterion");

ReferenceWithSlotValueSearchIterator search = new ReferenceWithSlotValueSearchIterator(
    treeRoot, linkedSlots, targetSlot, false, condition);
while (search.hasNext()) {
    Frame frame = (Frame) search.next();
    System.out.println(frame.getBrowserText());
}

4. ClassTree Search for Classes e.g. search for classes whose name is “age” in class tree “SNOMED_CT”. (Class Tab Binocular Search but with a condition)

SlotValueCondition condition = new SlotValueSimpleConditionOnAnySlot(
    StringValueComparator.IS, "age", false);

Vector linkedSlots = new Vector();
Slot slot1 = kb.getSlot(":DIRECT-SUBCLASSES");
linkedSlots.add(slot1);

Instance treeRoot = kb.getInstance("SNOMED_CT");

ReferenceSearchIterator search = new ReferenceSearchIterator(
    treeRoot, linkedSlots, false, condition);
while (search.hasNext()) {
    Frame frame = (Frame) search.next();
    System.out.println(frame.getBrowserText());
}

5. InstanceTree Search for Instances e.g. search for instances whose “sections” slot has class “Sports” as its only value and that are directly referenced by instance “Newspaper”.

SlotValueCondition condition = new SlotValueSimpleConditionOnSingleSlot(
    kb.getSlot("sections"), ValueSlotType.OWN,
    ClsValueComparator.IS, kb.getCls("Sports"), true);

Instance treeRoot = kb.getInstance("Newspaper");

ReferenceSearchIterator search = new ReferenceSearchIterator(
    treeRoot, null, true, condition);
while (search.hasNext()) {
    Frame frame = (Frame) search.next();
    System.out.println(frame.getBrowserText());
}

6. Inverse InstanceTree Search for Instances through a Named Slot e.g. search for instances that reference “kidney” through slot “connected to”, directly or indirectly.

Vector linkedSlots = new Vector();
Slot slot1 = kb.getSlot("connected to");
linkedSlots.add(slot1);

Instance treeRoot = kb.getInstance("kidney");

boolean topLevel = false;
ReferencerSearchIterator search = new ReferencerSearchIterator(
    treeRoot, linkedSlots, topLevel, null);
while (search.hasNext()) {
    Frame frame = (Frame) search.next();
    System.out.println(frame.getBrowserText());
}

Limitations and Capabilities

Limitations

o No relationships between different slots in a SlotValueCondition (e.g. we cannot construct a search query to find all rectangles whose length > 2 * width), as can be done with a PAL query.
o No joins across classes, i.e. no search whose condition compares slot values of instances from two classes (e.g. search for instances of class Rectangle such that there exists some instance of Square where the length of the rectangle is greater than the length of the square), as can be done with a PAL query.
o It is not a path-search interface. All search results are collections of frames.

Capabilities

o It is strong at reference searching. References can be traced recursively along specified slots in instance trees. This cannot be done with any search tool currently available in Protégé, including OWL and PAL.
o It also handles global knowledgebase searches, and it is a superset of all currently available Protégé search tools except for the limitations mentioned above.
