Suffix Trees for Fast Sensor Data Forwarding

Suffix Trees for Fast Sensor Data Forwarding Jui-Chieh Wub Hsueh-I Lubc Polly Huangac Department of Electrical Engineeringa Department of Computer Science and Information Engineeringb Graduate Institute of Networking and Multimediac National Taiwan University [email protected], [email protected], [email protected] Abstract— In data-centric wireless sensor networks, data are alleviates the effort of node addressing and address reconfig- no longer sent by the sink node’s address. Instead, the sink uration in large-scale mobile sensor networks. node sends an explicit interest for a particular type of data. The source and the intermediate nodes then forward the data Forwarding in data-centric sensor networks is particularly according to the routing states set by the corresponding interest. challenging. It involves matching of the data content, i.e., This data-centric style of communication is promising in that it string-based attributes and values, instead of numeric ad- alleviates the effort of node addressing and address reconfigura- dresses. This content-based forwarding problem is well studied tion. However, when the number of interests from different sinks in the domain of publish-subscribe systems. It is estimated increases, the size of the interest table grows. It could take 10s to 100s milliseconds to match an incoming data to a particular in [3] that the time it takes to match an incoming data to a interest, which is orders of mangnitude higher than the typical particular interest ranges from 10s to 100s milliseconds. This transmission and propagation delay in a wireless sensor network. processing delay is several orders of magnitudes higher than The interest table lookup process is the bottleneck of packet the propagation and transmission delay. forwarding delay and the most time consuming part is the Consider the MicaZ [4] and TinyOS [5] sensor network repeated string matching involved. Motivated to enable fast sensor data forwarding, we study and evaluate the use of an development platform. The packet size limit is 36 bytes. The efficient data structure, suffix tree (ST), for fast string matching wireless radio transmits at 100s kbps. The transmission delay in resource-limited sensor networks. Results of our experiments is thus at the scale of 1s milliseconds. Assume 10 meter radio show that ST is faster than the state of the art by up to 29% range and 2 ∗ 108 m/s propagation speed The propagation for identifying a match and 48% for identifying a non-match. delay can be found at the scale of 0.01s microseconds. The Further with a novel space optimization scheme, the memory space requirement for ST is reduced by up to 24% which makes processing delay is evidently the bottleneck of the per hop ST feasible to run on resource-limited sensor nodes. forwarding delay. The interest lookup delay is contributed by 2 levels of I. INTRODUCTION matching - interest and predicate matching. Each interest may consist of multiple predicates. At the higher level, the On the address-centric Internet, communication nodes are system needs to identify, among various interests, a particular numerically addressed, for example, 140.112.42.220.A source interest that matches the incoming data. At the lower level, node sends data by the destination node’s address. Forwarding, the system verifies whether a predicate in an interest matches also known as the routing table lookup problem, involves how, given the destination address of an incoming data packet, each intermediate router locates a matching entry in the routing table. From the matching entry, the router identifies the network interfaces (or ports) towards the next hops that the data packet should be forwarded further. Efficient algorithms and data structures such as [1] are proposed to speed up the number matching, which lead to the design of very high-speed IP switches today. Motivated to achieve high-speed forwarding for sensor networks, we seek efficient algorithms and data structures to speed up string matching for data-centric sensor networks. In data-centric wireless sensor networks [2], nodes are no longer addressed. Data do not carry the destination address. Instead, each sink node sends an explicit interest through the network to draw in a particular type of data. The intermediate nodes in turn disseminate the data based on the data content rather than the destination node’s address. This data-centric style of communication is particularly promising for that it Fig. 1. An example of interest matching the incoming data. Illustrated in Fig 1 are 3 example interests we implement three string matching algorithms and evaluate composed by a number of predicates. Interest 1 is looking for how well they will perform in practice. The remainder of this anything related to the nslab group. Interest 2 looks for data paper is organized as follows. The related work is presented about a faculty member whose name is polly, and interest 3 next in Section II. We describe then in Section III the string looks for data about all 2nd year and above master students. matching algorithms. Next in Section IV, we provide an The incoming data in Fig 1 matches both interest 1 and 2. analytical comparison of the algorithms. In Section V, VI, and Much of the recent work [6][7][8][9] has focus on the VII we detail the experimental setup, results, and our findings strategies of structuring interests or content types to enable on how the algorithms perform in practice. fast interest matching. Their objective is to reduce the number of predicate matching required. Our work complements these II. RELATED WORK earlier studies in that we focus on improving the efficiency of A. Data-Centric Communication individual predicate matching. In the traditional IP network, nodes communicate to each Predicate matching involves attribute matching and value other by the fixed IP addresses. This communication model matching. Attribute matching is essentially an exact string is proven, by the daily operation of Internet, effective in matching problem. One common practice to speed up attribute supporting applications running on static and full-fledged com- matching is to fix the bit position of all possible attributes in puters. For mobile and resource-limited sensor networks, how the data packet as well as the routing table. This method, to configure and reconfigure node addresses in the precense of although simplifies the attribute matching process, will be node dynamics poses a great challenge. This problem is first memory and bandwidth consuming when the number of differ- raised and addressed in one of the pioneer work on sensor ent data types is high. When there are different sensors to be networks [2]. In that, the authors propose the data-centric added to the network, the system will not be easily extendable communication paradigm. without changing the packet format and interest table data In data-centric communication, digital information are dis- structure. Value matching is also a string matching problem, seminated based on the feature/attribute/content of the infor- when the data type is string. Depending on the operator of mation itself, not the addresses of issuers or receivers. In the predicate, value matching may require exact or sub-string the first data-centric routing mechanism for wireless sensor matching. In essence, the efficiency of predicate is determined networks [12], sinks send explicit interest packets to set up by the efficiency of the string matching algorithm used. routing states at the intermediate nodes. These interest specific For efficient string matching, prior work [13] suggests the routing states in turn draw in the data of interest for the sinks. use of ternary search tree (TST). It is a string matching Such dissemination scheme relies on well-defined naming algorithm with O(|P | + log(N)) time complexity and O(|S|) system to describe data attributes and sink interests. The space complexity. P denotes the input string, typically an corresponding naming system and the filter-based forwarding attribute or value string in the incoming data. S denotes the mechanism are detailed in [6]. The string maching problem, training word set which concatenates all the strings appear although recognized as the performance bottleneck, is not in the entire interest table. N denotes the total number of addressed. strings in the training set. To speed up the string matching process further, we propose and evaluate the use of suffix B. Publish/Subscribe Systems tree (ST), a linear time string matching algorithm that can In [13][14], the authors design a set of efficient forwarding be easily extended to perform efficient prefix, suffix and and routing mechanisms for content-based data dissemina- substring matching. ST has an O(|P |) time and O(|S|) space tion. The notion of content-based communication is essen- complexity [10][11]. tially the same as data-centric communication. The mech- Although the large-scale performance of TST and ST’s anisms proposed, although descends from the literature of memory requirement is the same, we find, in real implemen- publish/subscribe systems, are applicable to sensor networks. tations, the amount of memory required by ST is significantly There are two different kinds of publish/subscribe systems: higher than that of TST. This is a serious problem for sensor channel-based and content-based. In both systems, multiple nodes in which the memory space is very limited. To tackle users may subscribe to the data of interest. In channel- the problem, we further propose a scheme to optimize the based systems, the users subscribe to a particular channel and memory consumption for ST. the corresponding data broker pushes particular data to the To observe how the algorithms will perform in practice, channel from which the subscribing users receive the data. In we implement TST and ST, as well as a simple hash-based content-based systems, the concept of channel is refined as method for comparison.

Suffix Trees for Fast Sensor Data Forwarding

Compressed Suffix Trees with Full Functionality

Suffix Trees

Search Trees

Lecture 1: Introduction

Design and Analysis of Data Structures

Suffix Trees and Suffix Arrays in Primary and Secondary Storage Pang Ko Iowa State University

Final Review

Finger Search Trees

Lecture Notes of CSCI5610 Advanced Data Structures

String Algorithm

Pointer-Machine Algorithms for Fully-Online Construction of Suffix

Analysis of Algorithms and Data Structures for Text Indexing