A DISTRIBUTED APPROACH TO MULTICAST SESSION DISCOVERY MDNS - A GLOBALLY SCALABLE MULTICAST SESSION DIRECTORY ARCHITECTURE

By

PIYUSH HARSH

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2010

© 2010 Piyush Harsh

I dedicate this to my parents who always supported my decisions and all my teachers who made me the person I am today.

ACKNOWLEDGMENTS

I would like to extend my gratitude to my adviser, Dr. Richard Newman, who has been more of a father figure and a friend to me than just an adviser. His thoughts on life and the numerous discussions I have had with him over the years on almost everything under the sun have helped me become the person I am today. Special thanks to Dr. Randy Chow, who guided me and took time out of his very busy schedule when Dr. Newman took a sabbatical break. His meticulous approach to scientific inquiry and his knowledge of how the system works were an eye opener.

I would also like to thank all the friends I made during my stay at the University of Florida, including Pio Saqui, Jenny Saqui, InKwan Yu, Mahendra Kumar and others (you know who you are) for keeping me sane and grounded. All of you have been a very pleasant distraction. You all will always be in my heart and mind. I would like to thank all of the CISE office staff, especially Mr. John Bowers, for taking care of administrative details concerning my enrollment and making sure things proceeded smoothly for me. I would like to thank the CISE administrators with whom I had numerous discussions on the intricacies of managing a large . Notable among them are Alex M. Thompson and Dan Eicher. Lastly, I would like to acknowledge the UF Office of Research, ACM, the College of Engineering, and UF Student Government for providing me with numerous travel grants for attending conferences held all over the world.

TABLE OF CONTENTS

ACKNOWLEDGMENTS ...... 4

LIST OF TABLES ...... 8

LIST OF FIGURES ...... 9

ABSTRACT ...... 12

CHAPTER

1 GENERAL INTRODUCTION ...... 14
   1.1 IP Multicast ...... 14
      1.1.1 Why Multicast? ...... 15
      1.1.2 Requirements for Enabling/Using Multicast ...... 18
         1.1.2.1 Multicast addressing ...... 18
         1.1.2.2 Multicast routing ...... 20
         1.1.2.3 IGMP/MLD: Internet group management protocol ...... 21
         1.1.2.4 Users perspective: low usability ...... 24
         1.1.2.5 ISPs perspective: network complexity ...... 25
   1.2 What This Dissertation Tries to Solve ...... 26
   1.3 Conclusion ...... 27

2 TOWARD SEAMLESS MULTICAST SESSION DISCOVERY ...... 28
   2.1 Design Goals ...... 28
   2.2 Distributed Hash Table ...... 29
      2.2.1 Records Structure ...... 29
      2.2.2 DHT Hierarchy Construction ...... 31
      2.2.3 DHT Operations ...... 34
         2.2.3.1 Addition of a domain ...... 34
         2.2.3.2 Removal of a domain ...... 35
         2.2.3.3 Addition of session record ...... 36
         2.2.3.4 Deletion of a session record ...... 37
      2.2.4 DHT Stability ...... 38
   2.3 Supporting Multicast Session Discovery ...... 39
      2.3.1 Database Design ...... 39
         2.3.1.1 Global records database ...... 39
         2.3.1.2 Local records database ...... 41
         2.3.1.3 Geo-tagged database ...... 41
      2.3.2 Associated Algorithms ...... 44
         2.3.2.1 Session registration ...... 44
         2.3.2.2 Session search ...... 46
         2.3.2.3 Recovering from parent failures ...... 49
   2.4 Related Work ...... 52
      2.4.1 Multicast Session Search Strategies ...... 52
         2.4.1.1 mbone sdr - session directory ...... 52
         2.4.1.2 Multicast session announcements on top of SSM ...... 53
         2.4.1.3 The next generation IP multicast session directory ...... 54
         2.4.1.4 Harvest ...... 54
         2.4.1.5 Layered transmission and caching for the multicast session directory service ...... 55
         2.4.1.6 Towards multicast session directory services ...... 55
         2.4.1.7 IDG: information discovery graph ...... 55
      2.4.2 Peer-2-peer DHT Schemes ...... 56
         2.4.2.1 ...... 56
         2.4.2.2 OpenDHT ...... 57
         2.4.2.3 Tapestry ...... 57
         2.4.2.4 Pastry ...... 58
         2.4.2.5 ...... 58
         2.4.2.6 CAN: a scalable content addressable network ...... 59
   2.5 Conclusion ...... 59

3 TACKLING USABILITY ...... 61
   3.1 IP Unicast vs Multicast ...... 61
   3.2 Domain Name Service ...... 62
      3.2.1 DNS Hierarchy ...... 62
      3.2.2 DNS Name Resolution ...... 63
      3.2.3 DNS Records ...... 63
   3.3 URL Registration Server ...... 64
      3.3.1 URS Internals ...... 65
      3.3.2 mDNS Name Resolution ...... 66
      3.3.3 Additional Usage ...... 67
   3.4 Conclusion ...... 68

4 BRINGING USABILITY AND SESSION DISCOVERY TOGETHER ...... 69
   4.1 Revisiting Objectives ...... 69
   4.2 Integrating ‘mDNS’ DHT and URL Scheme ...... 70
      4.2.1 A Complete Picture ...... 70
      4.2.2 System Setup in Various Network Environment ...... 72
   4.3 Use of Caching ...... 74
   4.4 Domain Specific Search ...... 76
   4.5 Managing Faults ...... 76
      4.5.1 Failure in Portions of DNS Infrastructure ...... 76
      4.5.2 Failure of URS ...... 77
      4.5.3 Failure of MSD Server ...... 78
   4.6 Goals Achieved ...... 78
      4.6.1 Global and Distributed Design ...... 78
      4.6.2 Existence in Present Network Environment ...... 79
      4.6.3 Real Time Session Discoverability ...... 79
      4.6.4 Ability to Perform a Multi-Parameter Search ...... 79
      4.6.5 Fairness in Workload Distribution ...... 80
      4.6.6 Plug-n-Play Design With Low System Administrator Overhead ...... 80
      4.6.7 Partial and Phased Deployment ...... 80
      4.6.8 Self Management ...... 81
      4.6.9 Multicast Mode Independence ...... 81
   4.7 Looking Back - High Level Assessment of the ‘mDNS’ Service Framework ...... 81
   4.8 Conclusion ...... 82

5 ARCHITECTURE VALIDATION: SIMULATION AND ANALYSIS ...... 83
   5.1 Introduction ...... 83
   5.2 Simulation Environment and Strategy Description ...... 83
      5.2.1 Starting the Simulation ...... 85
      5.2.2 Validity ...... 86
      5.2.3 Simulation Domain Hierarchy Setup ...... 86
   5.3 Simulation Results ...... 87
      5.3.1 Latency Experiment Results ...... 95
   5.4 Qualitative Analysis and Comparison ...... 101
      5.4.1 Geo-Tagged Database - Complexity Analysis ...... 101
      5.4.2 Hash-Based Keyword Routing - Fairness Analysis ...... 103
      5.4.3 A Comparison with Other DHT Schemes ...... 105
   5.5 Conclusion ...... 107

6 CONCLUDING REMARKS ...... 108

APPENDIX: SIMULATION CONFIGURATION PARAMETERS ...... 113

REFERENCES ...... 127

BIOGRAPHICAL SKETCH ...... 134

LIST OF TABLES

1-1 IANA assigned multicast addresses (few examples) ...... 19

3-1 Common DNS record types ...... 64

4-1 Typical cache structure ...... 75

5-1 Partial simulation data for scenario 1 hierarchy for permutation list [10, 4, 5, 6, 1, 2, 7, 8, 9, 3] ...... 89

5-2 Partial simulation data for scenario 2 hierarchy for permutation list [10, 4, 5, 6, 1, 2, 7, 8, 9, 3] ...... 92

5-3 Partial simulation data for scenario 3 hierarchy for permutation list [10, 4, 5, 6, 1, 2, 7, 8, 9, 3] ...... 93

5-4 Latency measurements summary ...... 96

5-5 DHT feature comparison ...... 106

LIST OF FIGURES

1-1 Data transmission in unicast v multicast ...... 15
1-2 Perceived data rate in unicast v multicast ...... 16
1-3 Bandwidth requirements vs number of recipients in unicast and multicast ...... 17
1-4 Multicast address format in IPv6 ...... 19
1-5 IGMP v3 packet format - membership query ...... 22
1-6 IGMP v3 packet format - membership report ...... 23
2-1 Local and global session records structure ...... 30
2-2 A general domain hierarchy ...... 32
2-3 Example routing table structure ...... 35
2-4 Steps in DHT domain addition ...... 36
2-5 DHT record insertion example ...... 37
2-6 Global sessions database design ...... 40
2-7 Geo-tagged database design ...... 42
2-8 Screenshot - session registration tool ...... 45
2-9 Session registration ...... 46
2-10 Session search ...... 47
2-11 Parent node failure recovery strategy ...... 51
3-1 Location and names of DNS root servers [source: ICANN] ...... 63
3-2 Typical steps in ‘mDNS’ URI name resolution ...... 67
4-1 A typical mDNS domain components ...... 70
4-2 A typical mDNS hierarchy in ASM network ...... 72
4-3 A mDNS hierarchy in mixed network operation mode ...... 73
5-1 Screenshot - mDNS auto simulator program ...... 84
5-2 Screenshot - mDNS latency measurement tool ...... 85
5-3 Various network topologies chosen for simulation ...... 87
5-4 Average hash skew - scenario 1 ...... 88
5-5 Skew standard deviation - scenario 1 ...... 88
5-6 Average control bandwidth - scenario 1 ...... 90
5-7 Control bandwidth standard deviation - scenario 1 ...... 90
5-8 Average route switches - scenario 1 ...... 90
5-9 Route switch standard deviation - scenario 1 ...... 90
5-10 Average route stabilization time - scenario 1 ...... 91
5-11 Route stabilization time standard deviation - scenario 1 ...... 91
5-12 Average hash skew - scenario 2 ...... 91
5-13 Skew standard deviation - scenario 2 ...... 91
5-14 Average control bandwidth - scenario 2 ...... 94
5-15 Control bandwidth standard deviation - scenario 2 ...... 94
5-16 Average route switches - scenario 2 ...... 94
5-17 Route switch standard deviation - scenario 2 ...... 94
5-18 Average route stabilization time - scenario 2 ...... 95
5-19 Route stabilization time standard deviation - scenario 2 ...... 95
5-20 Average hash skew - scenario 3 ...... 95
5-21 Skew standard deviation - scenario 3 ...... 95
5-22 Average control bandwidth - scenario 3 ...... 96
5-23 Control bandwidth standard deviation - scenario 3 ...... 96
5-24 Average route switches - scenario 3 ...... 96
5-25 Route switch standard deviation - scenario 3 ...... 96
5-26 Average route stabilization time - scenario 3 ...... 97
5-27 Route stabilization time standard deviation - scenario 3 ...... 97
5-28 Summary chart for latency experiments ...... 97
5-29 Range chart for latency experiments ...... 98
5-30 Median Latency ...... 98
5-31 Average Latency ...... 98
5-32 Average of weighted scores - scenario 1 ...... 99
5-33 Standard deviation of weighted scores - scenario 1 ...... 99
5-34 Average of weighted scores - scenario 2 ...... 99
5-35 Standard deviation of weighted scores - scenario 2 ...... 99
5-36 Average of weighted scores - scenario 3 ...... 100
5-37 Standard deviation of weighted scores - scenario 3 ...... 100

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

A DISTRIBUTED APPROACH TO MULTICAST SESSION DISCOVERY
MDNS - A GLOBALLY SCALABLE MULTICAST SESSION DIRECTORY ARCHITECTURE

By

Piyush Harsh

August 2010

Chair: Richard Newman
Major: Computer Engineering

This dissertation addresses the issue of multicast session discovery by an end user. IP multicast has tremendous network bandwidth utilization benefits over conventional data transmission strategies. Use of multicast could prove cost effective for many Content Distribution Networks (CDN). From an end user perspective, accessing a live stream using multicast will result in better video reception quality than unicast transmission, a limitation imposed largely by the limited line bandwidth being shared among several competing data streams. Still, multicast deployment in the Internet is very sparse. One of the reasons is low user demand due to lower usability compared to IP unicast. The supporting network infrastructure that was deployed after standardization of the TCP protocol helped tremendously in improving the usability of IP unicast. The Domain Name Service (DNS) infrastructure allowed users to access target hosts using a Fully Qualified Domain Name (FQDN) string instead of dotted decimal IP addresses [1]. Since unicast IP addresses were allotted in a regulated manner, and because of the longevity of the assignments, it became easier to search for and locate resources on the Internet. Lack of such infrastructure support has deprived multicast of similar usability from an end user perspective. More importantly, the shared nature of multicast addresses, the short life of address use, and frequent reuse from the common pool make it difficult for end users to search for and discover content.

This dissertation provides a distributed hierarchical architecture that efficiently addresses some of the usability issues raised above. The tree hierarchy, closely co-located with the DNS infrastructure, allows the presented scheme to assign Universal Resource Identifiers (URIs) to multicast streams that an end user can bookmark. The proposed scheme automatically re-maps the URIs to the correct session parameters in case those parameters change in the future. The Distributed Hash Table (DHT) approach for search and discovery of multicast sessions presented in this dissertation uses a tree hierarchy, which is more suitable for the task at hand. Many live multicast streams are not replicated, so there is a need to locate the source of the data, and therefore the search scheme required is somewhat traditional in nature. The relative instability of many multicast streams and associated session parameters makes many traditional P2P DHT schemes unsuitable for the problems addressed in this work.

Simulation results and analytical comparison of the proposed scheme with existing approaches are presented towards the end of this dissertation. A detailed discussion of why several of the existing DHT schemes for keyword search and Session Announcement Protocol (SAP) / Session Discovery Protocol (SDP) based multicast session discovery schemes are unsuitable for the identified problem is presented as well.

CHAPTER 1
GENERAL INTRODUCTION

1.1 IP Multicast

Network traffic in the Internet can be broadly classified into connection oriented or connectionless streams. The three communication paradigms that the internet protocol (IP) supports are unicast, anycast [2], and multicast [3][4]. Unicast allows point to point communication between networked hosts. In IP unicast, the source and destination addresses identify unique nodes in the global network. In the anycast model, more than one host can be associated with a fixed anycast address; the paradigm it supports is a one-to-at-least-one model. The network routers try to deliver the data to at least one of the hosts associated with that anycast address. IP multicast lies at the other end of the spectrum. It allows for one to many (SSM) [5] or many to many (ASM) [6] transmission paradigms. The multicast transmission paradigm is partly determined by the distribution tree that the core network uses for data distribution among interested recipient hosts.

A Rendezvous Point (RP) [7] based distribution tree generally allows Any Source Multicast (ASM) to operate. In an RP based distribution tree, a network host interested in receiving group communication joins the distribution tree at one of the nearest leaf nodes. The source sends data to the RP node and the data is disseminated down to all the interested recipients. In the ASM model, the data source needs to locate the RP node in order to send group data. The sender is not required to join the multicast group in order to send the data. Since many hosts can send data to the same multicast group simply by transmitting data through the RP node, the model is named “Any Source Multicast”.

In Source Specific Multicast (SSM), which is sometimes also referred to as single source multicast, the data distribution tree is rooted at the data source. Now, in addition to finding out the multicast group that a recipient node is interested in joining, it must also find out the source node IP address in order to join the correct data distribution tree. SSM is significantly different from ASM in that only the source node at which the distribution tree is rooted is allowed to transmit data along the tree to the interested recipient hosts, and thus the name “Single Source Multicast”.

1.1.1 Why Multicast?

IP multicast offers tremendous bandwidth benefits to the source as well as better quality of service (QoS) perception to the end users. In multicast, the core network does data stream replication along the branches in the distribution tree. Along any path from the root to the leaf node, there exists just one data stream. Compare this strategy using IP multicast to the data distribution using IP unicast where data stream replication must be done at the source itself. The bandwidth requirement at the source in unicast increases linearly with the number of subscribers interested in receiving the data stream.

Figure 1-1. Data transmission in unicast v multicast

In Figure 1-1, the source node has to replicate the data stream 4 times to support 4 recipient hosts. There is a higher bandwidth load on intermediate sections of the core network as well. Compare this with the case where the source node is transmitting data using multicast: the sender just provides one data stream and the core network replicates the data efficiently along the branches of the distribution tree. The overall network load is also much lower in the case of multicast as compared to the unicast case. Use of multicast makes economic sense where all the recipients are interested in receiving the same data stream. Live events broadcast to subscribers using multicast should be preferable to the unicast delivery model.

Network bandwidth is a shared resource; available bandwidth in the core network is generally shared among all competing traffic. The ongoing debate on net neutrality is in favor of maintaining this unbiased sharing of core network bandwidth. Assuming that the competing data streams receive a fair share of the link bandwidth, the overall stream data rate will be governed by the bandwidth it receives at the bottleneck link along the route from source to destination node. In such a scenario, multicast can play a big role in improving the perceived QoS at the receiving host.

Figure 1-2. Perceived data rate in unicast v multicast

Let us assume that the bottleneck link has a capacity of 200 Kbps, as shown in Figure 1-2. In the unicast scenario, there are 4 data streams sharing that bottleneck link, and therefore, assuming a fair share of bandwidth, each stream gets a 50 Kbps rate. Even though the recipient node's immediate network may be capable of transmitting data at a much higher rate, the unicast model becomes a limitation on the QoS perception at the recipient's end. In contrast, since there is just 1 stream along the bottleneck link, and since the network replicates the stream as and when required along the branches of the distribution tree, each recipient node receives the stream at 200 Kbps, thereby improving the QoS perception at recipient nodes.

Multicast also offers tremendous cost benefits to content providers. A provider transmitting data using multicast can potentially serve a large subscriber base using a small server farm, as the bandwidth requirement would be fairly constant regardless of the number of subscribers. In contrast, with unicast, the bandwidth requirement at the source (content provider) grows linearly with the number of subscribers. A popular content provider may potentially need to manage a large server farm and purchase larger bandwidth at a premium from its Internet Service Provider (ISP).

Figure 1-3. Bandwidth requirements vs number of recipients in unicast and multicast

Figure 1-3 shows the bandwidth requirements as the number of recipients grows for the unicast versus multicast modes of data transmission at the content host (sender). The figure assumes that the base data stream transmission is at 100 Kbps and the core network has limitless bandwidth. Multicast can also provide monetary benefits to Content Distribution Networks (CDN).
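The linear versus constant growth shown in Figure 1-3 can be summarized with a minimal sketch (Python, assuming the figure's 100 Kbps base stream rate; the function names are illustrative and not part of any mDNS component):

BASE_STREAM_KBPS = 100

def unicast_source_bandwidth(recipients: int) -> int:
    # The sender must emit one copy of the stream per recipient.
    return BASE_STREAM_KBPS * recipients

def multicast_source_bandwidth(recipients: int) -> int:
    # The sender emits a single stream regardless of audience size;
    # the core network replicates it along the distribution tree.
    return BASE_STREAM_KBPS if recipients > 0 else 0

for n in (1, 10, 100, 1000):
    print(n, unicast_source_bandwidth(n), multicast_source_bandwidth(n))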

Clearly, multicast offers tremendous bandwidth savings and monetary benefits to content providers, end users and core as well as fringe ISPs. Let us now find out how multicast can be enabled in the network and the hardware and software requirements before one can use it.

1.1.2 Requirements for Enabling/Using Multicast

For end users to use multicast, a network must support the following mechanisms -

1. addressing
2. routing capabilities
3. ability for end users to join
4. must be usable from the users' perspective
5. lower complexity of deployment for ISPs

The top 3 requirements are primary for multicast capability to even exist in the network, but the last 2 points are very important for actual deployment and usage by end users. Let us discuss each of these requirements in brief now.

1.1.2.1 Multicast addressing

Any communication medium requires a form of addressing that helps a message to be routed from the source to the intended recipient. In IP unicast, every node is assigned a unique address, and that helps the data packets to be efficiently routed to the destination address. Because multicast operates with the group communication paradigm, the intended recipients could be several hosts in the Internet. Therefore the multicast destination addresses are not mapped to any single host. Unlike unicast, where address assignment is done in a controlled manner by the ISPs or the organizations, address allotment for IP multicast groups is not possible in a similar fashion. The existence and composition of a group are not known a priori. The duration for which a particular group is active is also very difficult to ascertain beforehand.

Table 1-1. IANA assigned multicast addresses (few examples)

Address (IPv4)   Address (IPv6)           Usage                  Scope
224.0.0.1        FF02:0:0:0:0:0:0:1       All Node Addresses     Link Local
224.0.0.2        FF02:0:0:0:0:0:0:2       All Multicast Routers  Link Local
                 FF01:0:0:0:0:0:0:1       All Node Addresses     Node Local
                 FF05:0:0:0:0:0:1:3       All DHCP Servers       Site Local
                 FF0X:0:0:0:0:0:0:130     UPnP                   All Scopes

Keeping all these restrictions in view, the Internet Assigned Numbers Authority (IANA) adopted a somewhat relaxed attitude towards multicast addresses.

IANA assigned the old class D address space for multicast group addressing. All addresses in this range have the 1110 prefix as the first 4 bits of the IPv4 address. Therefore, IP multicast addresses range from 224.0.0.0 to 239.255.255.255. Multicast addresses in IPv6 are identified by the first octet of the address being set to 0xFF. In the early days, a multicast data packet's scope was determined by Time to Live (TTL) scoping rules. Over time, TTL scoping was found to be confusing to implement and manage in deployed networks. As IP multicast gained some traction, IANA started to manage the address space more efficiently. Table 1-1 shows some of the addresses that have been assigned by IANA along with their intended purpose and valid scopes.

In IPv4, the range 239.0.0.0 to 239.255.255.255 has been reserved as Administratively Scoped [8] multicast addresses. Data transmitted on these groups is not allowed to cross administrative domain boundaries. For IPv6 this range is defined as FFx4::/16. FFxE::/16 is defined as global scope, i.e. data packets addressed to this address range

are eligible to be routed over the public internet. Figure 1-4 shows the general format of IP multicast addresses in IPv6.

Figure 1-4. Multicast address format in IPv6
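The address classification described above can be checked programmatically; the short sketch below uses Python's standard ipaddress module (shown purely for illustration, it is not part of the mDNS architecture):

import ipaddress

# Administratively scoped IPv4 multicast range described above.
ADMIN_SCOPED_V4 = ipaddress.ip_network("239.0.0.0/8")

for text in ("224.0.0.1", "239.1.2.3", "10.0.0.1", "FF05::1:3", "2001:db8::1"):
    addr = ipaddress.ip_address(text)
    admin_scoped = addr.version == 4 and addr in ADMIN_SCOPED_V4
    print(f"{text:15} multicast={addr.is_multicast} admin_scoped_v4={admin_scoped}")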

The gradual deployment of IPv6 will solve the shortage of IP multicast addresses in IPv4. A significant issue with IPv4 was the high probability of address clashes among active multicast sessions due to the lack of a managed address allocation scheme [9][10][11][12][13][14][15]. And with SSM slowly replacing the ASM model of multicast in the Internet (or at least we are hoping that will be the case), the address clash problem will be taken care of. Let us take a brief look at what is required for multicast data packets to be routed in the network.

1.1.2.2 Multicast routing

Routing of data packets in multicast is generally done via shared distribution trees. In contrast to unicast, where the packet forwarding decision is based on the destination address, almost every multicast routing protocol makes use of some form of reverse path forwarding (RPF) [16] check. For example, if the incoming data packet's distribution is done along a source tree (as in the case of SSM), the RPF check algorithm checks whether the packet arrived on the interface that is along the reverse path to the source from that router. If yes, then the packet is forwarded on all interfaces along which one or more recipient host(s) can be reached. If not, the packet fails the RPF check and is dropped.

The RPF check table could be built using a separate multicast reachability table, as is done in the case of the Distance Vector Multicast Routing Protocol (DVMRP), or using the IP unicast forwarding table, as is done in Protocol Independent Multicast (PIM).
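A simplified sketch of the RPF check just described is given below (Python; the forwarding table, prefixes, and interface names are hypothetical placeholders, and a real router would of course implement this in the forwarding plane):

import ipaddress

# Hypothetical unicast routing table: destination prefix -> outgoing interface.
UNICAST_ROUTE = {
    "10.1.0.0/16": "eth0",
    "10.2.0.0/16": "eth1",
}

def rpf_interface(source_ip: str):
    # Interface this router would use to reach the source (the reverse path).
    src = ipaddress.ip_address(source_ip)
    for prefix, iface in UNICAST_ROUTE.items():
        if src in ipaddress.ip_network(prefix):
            return iface
    return None

def forward_multicast(source_ip: str, arrival_iface: str, downstream_ifaces: list) -> list:
    # Forward only if the packet arrived on the reverse path toward the source.
    if arrival_iface != rpf_interface(source_ip):
        return []                      # fails the RPF check: drop
    return [i for i in downstream_ifaces if i != arrival_iface]

print(forward_multicast("10.1.5.9", "eth0", ["eth2", "eth3"]))  # forwarded
print(forward_multicast("10.1.5.9", "eth1", ["eth2", "eth3"]))  # dropped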

Many competing intra-network multicast routing protocols exist in networks today, including DVMRP [17][18], PIM (both dense mode (DM) [19] and sparse mode (SM) [20]), Multicast Open Shortest Path First (MOSPF) [21], and Core Based Trees (CBT) [22], to name a few. Because PIM uses unicast forwarding tables for the RPF check, it is easier to implement and is rapidly gaining acceptance among internet vendors.

PIM-SM has emerged as the protocol of choice for implementing multicast routing in modern routers. PIM-SM, along with Multiprotocol Extensions for BGP (MBGP) [23], has become the protocol combination of choice for building inter-network multicast. Another upcoming protocol similar to MBGP is the Border Gateway Multicast Protocol (BGMP) [24], which helps interconnect separate networks using a global bi-directional shared tree.

PIM-SM makes use of a shared tree for initial data delivery to receivers. These shared trees are rooted at nodes referred to as Rendezvous Points (RP). In a network, several RPs could exist, and each RP could be configured to act as the distribution tree root for several multicast groups. There exist several vendor specific protocols, e.g. Cisco's Auto-RP [7], that allow all PIM-SM routers to learn the Group-to-RP mappings. Similar vendor specific protocols exist for RP discovery.

Assuming that an RP in a domain would eventually know about all the active sources in that domain, there needs to be a way to find out about sources in other domains. This is achieved using the Multicast Source Discovery Protocol (MSDP) [25]. Using MSDP, an RP is peered with another RP in some other domain. MSDP helps form a network among MSDP-peered RPs. These peered RPs exchange multicast source information with each other. Thus, in time, an RP in one domain is able to discover multicast sources in external domains.

Enabling global IP multicast thus involves a plethora of complex protocols to be implemented in multicast capable routers. The charm of SSM is that it does away with much of the complexity necessitated by the multiple-source support in ASM mode. Let us examine next how an end user joins or leaves a multicast group.

1.1.2.3 IGMP/MLD: Internet group management protocol

IGMP/MLD [26][27][28][29][6][30][31] allows an end user to indicate to the first hop multicast capable router its interest in receiving multicast packets belonging to a particular group. The very first version of IGMP was the “Host Membership Protocol” written by Dr. Steve Deering as part of his doctoral research. The IGMP protocol is used by IPv4 capable hosts, whereas Multicast Listener Discovery (MLD) provides similar functionality for IPv6 capable hosts.

The most recent version of IGMP is version 3 [31], whereas the latest MLD protocol stands at version 2 [29]. IGMP version 2 [30] added low host leave latency to IGMP version 1 [6], and IGMP version 3 added source filtering capabilities to version 2. MLD version 1 [27] provides similar functionality to IGMP v2, and MLD v2 [29] allows for similar functions as IGMP v3. Both IGMP v3 and MLD v2 are designed to be backward compatible with earlier versions.

Figure 1-5. IGMP v3 packet format - membership query

Figure 1-5 shows the format of a membership query message that is sent by the multicast capable routers to query the membership status on the neighboring interfaces. Hosts can contact the neighboring routers notifying them of their multicast reception state or any changes to it using a membership report message. The format of IGMP v3

membership report message is shown in Figure 1-6.

A multicast capable router uses the membership query message to find out if any active listener is present on any of its neighboring interfaces. In Figure 1-5, the “Max Resp Code” field denotes the maximum time allowed before sending the response to the query, “S” is the “Suppress Router-Side Processing” flag, QQIC is the “Querier's Query Interval Code” and QRV is the “Querier's Robustness Variable” value. The “Group Address” field is set to 0 to send a general query, or is set to a specific multicast IP address if a group specific query or a group and source specific query has to be sent. If the “Type” field is not 0x11 or 0x22, the packet must be processed for backward compatibility depending on whether one of these values is present instead:

1. 0x12: Version 1 Membership Report
2. 0x16: Version 2 Membership Report
3. 0x17: Version 2 Leave Group
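To make the field layout in Figure 1-5 concrete, the following sketch decodes the fixed 12-byte portion of an IGMPv3 membership query as specified in RFC 3376 (error handling and the trailing source address list are omitted; this is illustrative only):

import struct
import socket

def parse_igmpv3_query(packet: bytes) -> dict:
    # Fixed part: Type, Max Resp Code, Checksum, Group Address, Resv|S|QRV, QQIC, Number of Sources.
    msg_type, max_resp_code, checksum, group, flags, qqic, n_sources = \
        struct.unpack("!BBH4sBBH", packet[:12])
    return {
        "type": hex(msg_type),                     # 0x11 for a membership query
        "max_resp_code": max_resp_code,
        "checksum": checksum,
        "group_address": socket.inet_ntoa(group),  # 0.0.0.0 indicates a general query
        "S": (flags >> 3) & 0x1,                   # Suppress Router-Side Processing flag
        "QRV": flags & 0x7,                        # Querier's Robustness Variable
        "QQIC": qqic,                              # Querier's Query Interval Code
        "number_of_sources": n_sources,
    }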

Figure 1-6. IGMP v3 packet format - membership report

Using these IGMP query and IGMP report messages, the hosts and the routers are able to determine on which neighboring interfaces they should forward multicast packets and which interfaces must be removed from the multicast forwarding list for any given multicast address. Hosts can also take the initiative and, using IGMP report messages, notify the neighboring routers to start forwarding data packets for the multicast group from which they are interested in receiving data. The routers take appropriate action to join the distribution tree (RP based or source specific) if they have not already done so for some other host for that same multicast group. Now that we have briefly seen the essential building blocks for enabling native multicast in the network, let us try to understand the reasons for its lack of mass deployment and user acceptance in the Internet.

1.1.2.4 Users perspective: low usability

In the case of unicast, because of the longevity of the address assignments to hosts, it has been possible to assign aliases to IP addresses and set up a global name resolution system to resolve these aliases to the associated IP addresses. Universal Resource Locators (URLs) are the address aliases that have made the Internet more usable for an average user. DNS [32] is the name resolution service that maps FQDNs to appropriate IP addresses and in turn makes the use of URLs possible. As most of the resources on the Internet are stable resources with long term availability, end users can bookmark FQDNs and URLs for future use.

Long term stability of resources and their availability has another benefit. These resources can be indexed by search engines using web crawlers. This allows users to locate content over the Internet using keyword searches. Keyword searches provided by search engines like Yahoo and Google, along with the use of URLs and FQDNs, have helped improve the usability of the web for an average user.

As documented earlier, lack of prior knowledge of group composition and the time and duration of a group’s existence along with no explicit restriction on the use of a group address other than the general classification enforced by the IANA presents several challenges that are absent in the case of unicast.

1. Unstable group addresses: because there is no long term stability associated with the multicast addresses assigned to user groups, these addresses cannot be used with the current DNS scheme. DNS has been designed with the stability of FQDNs and associated IP addresses as a factor that allows DNS entries to be cached. With the inherent instability of multicast addresses, DNS in its current format cannot be used.

2. Lack of a standard content discovery mechanism: as multicast contents have transient life cycles with varied duration and availability, it becomes almost impractical for modern search engines to crawl the multicast content space and maintain crawler data for contents whose availability is at best uncertain. There is a lack of a standard service that would allow end users to locate multicast contents.

Traditionally, users get information about the time and duration of popular multicast groups through Usenet groups and through emails from friends. Clearly, these discovery mechanisms are not scalable if multicast is to become a user driven technology. There has to be a standard and scalable service that allows end users to locate existing multicast sessions in almost real time. Improving usability will ensure the next wave of user acceptance for multicast technology. Let us examine now why ISPs have been reluctant to deploy multicast.

1.1.2.5 ISPs perspective: network complexity

Compared to unicast, where the core routers have to perform periodic routing table updates and packet forwarding, a multicast router has to execute numerous protocols. We have already seen in 1.1.2.2 what it takes to make multicast work in a modern network. Supporting the ASM mode of operation is especially complex, as the responsibility of source discovery rests with the routers (configured as RPs). The two-step PIM-SM protocol, where hosts initially get data via shared distribution trees rooted at RPs and later switch to the shortest path tree (SPT), adds complexity. With the addition of source filtering in IGMP v3 [31] and MLD v2 [29], the receiver gains the capability to specifically denote a set of sources from which it is interested in getting the multicast data. Network researchers agree that single source multicast, with a strict unidirectional data flow from the source to interested hosts, is much easier to implement. If, further, source discovery is made a user prerogative, the network is released from the added burden of running the MSDP protocol and maintaining a list of active senders. Furthermore, if the source is known right from the start, PIM-SM can begin directly with the second phase, building the SPT towards the source. Hence there would be no need to maintain an RP rooted distribution tree in the network.

It is thus easy to see why network operators and router vendors are reluctant to implement and deploy ASM multicast in the network. But with IGMP v3 becoming a standard and SSM research maturing, the network layer responsibilities for IP multicast will be reduced significantly. If sufficient user demand exists for IP multicast (provided usability concerns raised in 1.1.2.4 are addressed), ISPs should have no difficulty in enabling IP multicast in their networks. Pricing incentives for content hosts and CDN service providers could also spur multicast growth.

1.2 What This Dissertation Tries to Solve

Newer routers have multicast capability built in, largely due to the fact that, with SSM-only support, the protocol complexity to be supported at the network layer comes down significantly. Still, network administrators are reluctant to turn on multicast support in the routers installed in their domains. The reason is the lack of user demand for the technology. Cases are coming to light where influential players are encouraging ISPs to adopt IP multicast. The recent BBC multicast trial in the UK [33] and multicast streaming of digital TV in Spain are a few examples. True global deployment of native multicast will come only if ISPs are pressured to deploy multicast by rising end-user multicast demand.

This dissertation work started with the question - “Why is there low end user demand for IP multicast?”. The lack of a seamless mechanism to search for multicast content was one of the factors. This dissertation presents a tree-based DHT approach that solves the multicast session search issue in a globally scalable manner. Geo-tagging

was introduced in the session record structure to allow for more advanced search dimensionality. Another major factor for the lower user demand is the lower usability of multicast technology as compared to unicast. A complementary support architecture is proposed in this text that enables URLs to be assigned to multicast streams and co-exists seamlessly with the current DNS architecture. It is referred to as “mDNS” in this text, and the intended usage of this name is to emphasize DNS-like name resolution capability for transient multicast streams. The aim of this work is to herald an era of true end user multicast use. If proper infrastructure support is provided, one can imagine several interesting use cases emerging from it. Real-time, truly scalable citizen iReporting, which would instantly provide a video feed to millions of viewers worldwide, could be one such use case. Disaster preparedness and disaster management could be made easier. All this would require an architecture that allows new multicast sessions to be made discoverable to others in real time. The proposed architecture achieves this very goal.

Some skeptics may ask - ‘Why not use Google or Yahoo search to achieve session discovery?’. The answer for that can be found in section 1.1.2.4.

1.3 Conclusion

This chapter described in brief the basic building blocks of IP multicast. It gave arguments for the use of IP multicast. The benefits that multicast provides to content providers, end users, core and fringe ISPs, and other service enablers like CDNs were also discussed. It described addressing for multicast, provided a brief overview of multicast routing and of shared as well as dedicated distribution trees, and described IGMP in some detail. IGMP allows end users to join or leave multicast groups. It also discussed the reasons why multicast has not taken off in the same manner as unicast, even though in many data delivery situations it performs better and has a better price/bandwidth ratio. The following chapters delve deeper into the issues raised here and their solutions and validation.

CHAPTER 2
TOWARD SEAMLESS MULTICAST SESSION DISCOVERY

2.1 Design Goals

In order to make multicast content seamlessly discoverable by end users, there is a need for a globally scalable and distributed architecture that provides the required services. Further, one of the important requirements in our design was real time discoverability of sessions. Many multicast sessions may not be planned ahead of time. For such session content to be delivered to the target audience, there are two possible solutions: one where the content creator notifies its target demographic of the session's existence using out of band mechanisms such as mass emailing or posting on IRC channels and message boards, the other where the directory architecture itself supports instant searchability of the session by end users. The major design goals and supported features of the proposed architecture are -

1. global scalability and distributed design - the proposed architecture must be globally scalable and should be distributed in nature to minimize bottlenecks
2. ability to exist in the current network environment - a proposal that necessitates a major network hardware and software overhaul will most likely not get deployed because of monetary constraints, so coexistence with the existing environment becomes important
3. real time discoverability of sessions - even those sessions that are created at a moment's notice should be immediately discoverable by users
4. multi-parameter search - the proposal must provide end users a multi-dimensional and full search capability
5. fairness in workload distribution - the design must be fair with respect to workload and resource requirements on the participating parties
6. plug-n-play design with lower system administrators' involvement - the proposal must require minimal involvement on the system administrators' part; network administrators are already overworked
7. ability of partial/in-phase deployment - the deployment should be useful even if deployed on a very small scale
8. self managing structure in the face of topographical changes - the architecture should be self managing in the face of dynamic topographical changes
9. session geo-tagging to support location based search
10. multicast mode independence - ability to exist in ASM or SSM networks or even in non multicast environments

As immediate session search visibility is a major design goal, the system is not a crawler based architecture like traditional search engines. Since the multicast space is not well organized, the use of crawlers is not even feasible. The system is a registration based design in which the content host (source) registers the session only with its domain-local system component. The system then makes the session visible globally. The design details and architecture components are described next.

2.2 Distributed Hash Table

Secure hashing [34] is a reasonable tool that achieves equitable distribution of data over multiple sites. In a Distributed Hash Table (DHT) scheme, the record’s keyword is hashed. The hash value determines where the actual record will be stored for retrieval later. In “mDNS” search architecture, each data site or Multicast Session Directory (MSD) manages two types of records. These are called -

• Domain local session database records
• Global session database records

Each site or MSD maintains three databases -

• Locally Scoped Sessions Database
• Globally Scoped Sessions Database
• Geographical Database

These data sites are linked to one another in a tree hierarchy with a single root node. The DHT hash space is divided among all the participants in the tree overlay. The algorithms managing the hash space distribution and redistribution in the face of topology changes are discussed later in this chapter.

2.2.1 Records Structure

Figure 2-1 shows the components that make up an administratively scoped multicast session (local session) record and a globally scoped multicast session record. The importance of some of the data elements will be revealed in the next two chapters.

A brief explanation of the various fields follows next -

Figure 2-1. Local and global session records structure

• Expiration Time: time after which session records may be purged from the MSD database
• Start Time: time before which a session may not exist
• Keyword: one of possibly several keywords used to tag this particular session
• URS Identifier: unique session identifier registered with the URS server (see Chapter 3)
• Channel IP: multicast session IP
• Channel Port: multicast session port
• Source IP: if the network type is SSM, this IP identifies the content host machine
• Fail Over Unicast IP: backup unicast stream source IP address (optional)
• Fail Over Unicast Port: backup unicast stream port (optional)
• Channel Scope: multicast stream scope (global/local)
• Geographical Common Name: common name of the place associated with the stream
• Latitude: latitude value associated with the session
• Longitude: longitude value associated with the session
• Network Type: multicast compatibility of the session's hosting network (ASM/SSM)
• Stream Type: identifies the nature and type of the multicast stream
• Preferred Application: identifies the suggested application to be used to access the stream type
• CLI Arguments: Command Line Interface (CLI) arguments to be supplied to the preferred application
• Mime Type: MIME type of the stream data; it must be one of the IANA registered MIME types
• URI: URL string of the URL Registration Server (URS) in the domain of the original MSD server
• Inversion Flag: denotes whether the keyword hash validity for storage is determined using the regular hashed value or the bit-inverted hashed value

Some of the terms, such as URS, will be explained in great detail in Chapter 3. The hashing strategy can be any secure hashing algorithm. For the proof of concept experiment we used MD5 [35].
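For illustration, the record could be represented in memory roughly as follows (Python; field names follow Figure 2-1, but the types and defaults are assumptions and the on-the-wire encoding is not specified here):

from dataclasses import dataclass
from typing import Optional

@dataclass
class MulticastSessionRecord:
    expiration_time: int                       # clock ticks after which the record may be purged
    start_time: int                            # time before which the session may not exist
    keyword: str                               # one of possibly several keywords tagging the session
    urs_identifier: str                        # unique id registered with the URS (Chapter 3)
    channel_ip: str
    channel_port: int
    source_ip: Optional[str] = None            # required only for SSM networks
    failover_unicast_ip: Optional[str] = None
    failover_unicast_port: Optional[int] = None
    channel_scope: str = "global"              # "global" or "local"
    geographical_common_name: str = ""
    latitude: float = 0.0
    longitude: float = 0.0
    network_type: str = "ASM"                  # "ASM" or "SSM"
    stream_type: str = ""
    preferred_application: str = ""
    cli_arguments: str = ""
    mime_type: str = ""
    uri: str = ""                              # URL of the URS in the originating domain
    inversion_flag: bool = False               # stored under the bit-inverted keyword hash?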

2.2.2 DHT Hierarchy Construction

The system administrator of a domain has to manually configure a few parameters in the URL Registration Server (URS) hosted within his domain. These parameters are -

• PMCAST - multicast address of the channel used to receive communication from the parent domain
• CMCAST - multicast address for a domain to send communication to its children domains
• IGMP version support
• URL string of the parent domain, if no parent URL exists, then set to ‘void’.
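A hypothetical example of these values (only the parameter names come from the list above; the configuration format and the addresses shown are made up for illustration):

URS_BOOTSTRAP_PARAMETERS = {
    "PMCAST": "239.10.1.1",                  # channel on which the parent domain sends to this domain
    "CMCAST": "239.10.1.2",                  # channel on which this domain sends to its children
    "IGMP_VERSION": 3,
    "PARENT_URL": "urs.parent.example.edu",  # set to "void" if this domain has no parent
}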

The URS will be described in more detail in Chapter 3. The MSD servers in each domain retrieve the above parameters during the bootstrap phase. The values of these parameters allow an MSD server to join an existing hierarchy. Multiple MSD servers can be started in the same domain to improve local redundancy. Every MSD server is equipped to execute a common leader election protocol. The elected leader becomes the ‘designated’ MSD server. Such a server in a domain will be referred to as MSDd.

Figure 2-2. A general domain hierarchy

Each domain reports the count of domains in the subtree rooted at itself to its parent domain. Figure 2-2 shows a general domain hierarchy. The number listed next to each node in the hierarchy denotes the domain count that particular domain sends to its parent. Soon the root node finds out the total count and the count distribution along the direction towards each of its children domains. This information allows the root node to partition the overall hash space into equal (almost equal, as the range is composed of discrete values and not a continuous line) chunks to be assigned to all the participating domains. The hash space allotment details travel from the root node to all the leaves and all intermediate nodes. As the space allotment information trickles down through the tree, each intermediate node makes a decision on how to further subdivide the space among itself and its children domains. Depending on the total domain count, the hash space is divided using only the ‘n’ most significant bits (MSB) of the entire hash space, where 2^n ≥ Count. For example, if there are 8 domains in the hierarchy, and the MD5 hash algorithm is used to generate the hash value (128 bits), only the first 3 MSB bits are used for hash space division. Algorithm 1 shows the hash space division process performed at the root node.

An algorithm very similar to Algorithm 1 is performed at each intermediate node in the hierarchy when it has to allot subspaces to its children. The only change is that MSB is set to the value received from the parent node rather than computed, and the START and END values are set to the hash range start and end values received. As and when the hash distribution is propagated down, the nodes update their routing tables, which allow a node to route a particular hash value towards the target domain in the hierarchy. Figure 2-3 shows an example of the hash space assignment to domains A and G in the hierarchy as well as the DHT routing table construction at the two nodes, node A and node G, using Algorithm 1. Now that we have seen how the DHT hierarchy is created in the proposed architecture, let us discuss the operations permitted in the DHT hierarchy. Typically, any P2P DHT scheme allows insertion and removal of participating peers as well as addition and deletion of data records. Each of these 4 operations is discussed next.

Algorithm 1: Hash Space Division Algorithm (at the root node)

begin
    COUNT_domain ← (sum of counts reported by each child subdomain) + 1
    MSB ← 1
    while 2^MSB ≤ COUNT_domain do
        MSB ← MSB + 1
    end
    START ← 0
    END ← 2^MSB − 1
    RANGE ← END − START + 1
    CURRENT ← START
    (float) RATIO ← RANGE ÷ COUNT_domain
    NUMBER2ALLOT ← (long) RATIO
    (float) FRACTION_remaining ← RATIO − (long) RATIO
    if CURRENT + NUMBER2ALLOT − 1 ≥ CURRENT then
        assign range [CURRENT, CURRENT + NUMBER2ALLOT − 1] to self
    end
    CURRENT ← CURRENT + NUMBER2ALLOT
    foreach child domain i from 1 to the number of children do
        NUMBER2ALLOT ← (long)(RATIO × COUNT_i)
        FRACTION_remaining ← FRACTION_remaining + RATIO × COUNT_i − NUMBER2ALLOT
        while FRACTION_remaining ≥ 1.0 do
            NUMBER2ALLOT ← NUMBER2ALLOT + 1
            FRACTION_remaining ← FRACTION_remaining − 1
        end
        if CURRENT + NUMBER2ALLOT − 1 ≥ CURRENT then
            assign range [CURRENT, CURRENT + NUMBER2ALLOT − 1] to CHILD_i
        end
        CURRENT ← CURRENT + NUMBER2ALLOT
    end
end
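For readers who prefer running code, the following is a compact restatement of Algorithm 1 (Python; it assumes each child has already reported its subtree's domain count, and it returns the contiguous slice of the truncated hash space given to this node and to each child):

def divide_hash_space(child_counts):
    total = sum(child_counts) + 1            # +1 accounts for this node's own domain
    msb = 1
    while 2 ** msb <= total:                 # number of significant bits actually used
        msb += 1
    space = 2 ** msb
    ratio = space / total
    fraction = ratio - int(ratio)
    current = 0
    assignment = {}

    share = int(ratio)                       # this node's own share
    assignment["self"] = (current, current + share - 1)
    current += share

    for i, count in enumerate(child_counts):
        share = int(ratio * count)
        fraction += ratio * count - share
        while fraction >= 1.0:               # hand accumulated fractional space to this child
            share += 1
            fraction -= 1.0
        assignment[f"child_{i}"] = (current, current + share - 1)
        current += share
    return assignment

# Example: three children whose subtrees contain 3, 2 and 2 domains (8 domains in total).
print(divide_hash_space([3, 2, 2]))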

2.2.3 DHT Operations

2.2.3.1 Addition of a domain

When a new “mDNS” domain decides to participate in the hierarchy, the system administrator in that domain configures the necessary parameters in the URS stationed in that domain. The MSDd retrieves the necessary connectivity parameters from the URS and starts reporting itself to the parent MSD host. The overall domain count gets updated at the root after a few periodic update cycles. The root reassigns the hash space, and the hash space allotment is updated as it percolates down from the root to the leaf nodes. If any record stored at a particular node now does not lie within its assigned hash space, that record is migrated to the correct new destination using the hash-routing table maintained at each node. Frequent addition of domains to the overall hierarchy can lead to frequent hash space reassignments and frequent migration of records. The stability of the hierarchy routing infrastructure and record locations is improved using the domain-count reporting strategy described in 2.2.4. Every node also knows the root node's unicast connection details, which it may use in case of a dead parent in order to locate the appropriate grafting point in the tree. The strategy is described in later chapters.

Figure 2-3. Example routing table structure

2.2.3.2 Removal of a domain

Removal of a domain from the hierarchy results in the parent node no longer receiving the periodic heartbeat messages from that node. After a predetermined timeout, the parent will update its child count, and this updated count will be propagated towards the root. This leads to hash space reassignment from the root towards all the leaves. The stabilization strategy is described in 2.2.4.

Figure 2-4. Steps in DHT domain addition

2.2.3.3 Addition of session record

Each multicast session record contains a list of keywords used to tag the session. Inclusion of these keywords allows users to search for such sessions in the future. Each keyword is hashed using the chosen hashing algorithm (MD5 in our case). Addition of a record can be initiated at any domain in the hierarchy. Using the hash value, the record is routed to the target domain that manages the hash space in which this particular hash value lies. Routing is done in accordance with the routing table maintained at every node in the DHT tree. The target domain stores the record in its database. Only the appropriate number of significant bits of the hash value is used for routing. Further, in order to improve domain failure masking, a second copy (shadow) of the record is routed using the bit-inverted hash value to another destination in the overall hierarchy. Figure 2-5 shows an example case where a record and its shadow copy are routed through the DHT structure to the appropriate target domains. The routing is done using the keyword hash of the record. The hash value shown in the figure is arbitrary and is provided for clarity only; routing in the figure is based on the first 4 bits of the hash.

Figure 2-5. DHT record insertion example
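A small sketch of the keyword hashing and bit inversion behind Figure 2-5 (Python, using MD5 as in the proof of concept and a 4-bit routing prefix as in the figure; the keyword is made up):

import hashlib

HASH_BITS = 128                              # MD5 output size

def keyword_hash(keyword: str) -> int:
    return int.from_bytes(hashlib.md5(keyword.encode("utf-8")).digest(), "big")

def routing_prefix(hash_value: int, msb: int = 4) -> int:
    # Only the 'msb' most significant bits are used for DHT routing.
    return hash_value >> (HASH_BITS - msb)

h = keyword_hash("jazz-concert")
shadow = h ^ ((1 << HASH_BITS) - 1)          # bit-inverted hash for the shadow copy
# The shadow's routing prefix is the complement of the original's, so the
# two copies land in different parts of the hierarchy.
print(routing_prefix(h), routing_prefix(shadow))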

2.2.3.4 Deletion of a session record

Explicit removal of a session record is not permitted in the architecture. Every record has a set expiration time, which is interpreted as the number of clock ticks into the future from the time the record was inserted. This provides protection from local time discrepancies at remote nodes. Each site runs a periodic garbage collector that removes the records past their expiration time.

2.2.4 DHT Stability

The hash space distribution is dynamic in nature; it changes with changes in the domain hierarchy. We expect the hierarchy to become stable over time, but even a single addition or removal of a domain from the overall hierarchy can lead to DHT space reshuffling. This could lead to system wide record migration in order for the databases to get realigned with the new space assignment. In order to reduce this system wide data movement every time a domain is added or removed, each domain runs the simple count reporting and space reassignment procedure specified in Algorithm 2.

Algorithm 2: Node Count Reporting Algorithm

begin
    set values for α and β
    COUNT_a ← current count
    COUNT_b ← current count
    begin periodic behavior
        listen for count reports from children
        update COUNT_b
        if ((COUNT_b − COUNT_a) ÷ COUNT_a) × 100 ≤ α then
            do nothing
        else if α < ((COUNT_b − COUNT_a) ÷ COUNT_a) × 100 ≤ β then
            do proportional hash space reassignment among children domains and self
            update routing table
        else
            report COUNT_b to parent node
            COUNT_a ← COUNT_b
        end
        if a new hash space assignment arrives from the parent then
            create new routing table
        end
    end
end
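The decision rule in Algorithm 2 can be restated compactly as follows (Python; alpha and beta are the domain-local percentage thresholds, count_a is the last count reported upward, and count_b is the freshly aggregated count):

def on_periodic_count_update(count_a: int, count_b: int, alpha: float, beta: float) -> str:
    change_pct = (count_b - count_a) / count_a * 100
    if change_pct <= alpha:
        return "do nothing"
    elif change_pct <= beta:
        return "reassign hash space among self and children, update routing table"
    else:
        return "report new count to parent (and set count_a = count_b)"

print(on_periodic_count_update(20, 21, alpha=5, beta=15))   # 5% change: within alpha, ignored
print(on_periodic_count_update(20, 23, alpha=5, beta=15))   # 15% change: local reassignment
print(on_periodic_count_update(20, 30, alpha=5, beta=15))   # 50% change: escalate to parent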

The values of α and β are set such that α > 0 and α ≤ β. The choice of values need not be fixed universally; it is left as a domain level decision to be made by the system administrators. The optimal values for α and β depend on the overall topology of the DHT hierarchy and are discussed in Chapter 5. Whenever a domain's hash space assignment changes, the MSDd runs the data realignment process that migrates the stored records to the correct remote sites if needed. If more than one MSD server exists in a domain, the sync process must bring the other MSD servers in sync with the MSDd store. Now that the DHT scheme used in the architecture has been specified, let us see how it is employed in seamless multicast session registration by content providers and how search by end users is supported in the structure.

2.3 Supporting Multicast Session Discovery

The main software component that facilitates session discovery in the “mDNS” architecture is the MSD server. Every participating domain must host at least one MSD server. If multiple MSD servers exist for redundancy reasons, one must be selected as the designated server so that it can take part in “mDNS” operations. The others can exist in standby mode in case the primary fails. In order to support search strategies that differentiate between domain local searches, global searches, and domain specific searches, each MSD server must maintain three types of record databases. Let us see them in detail.

2.3.1 Database Design

The database design has been left open to implementors in the IETF RFC that we proposed [36]. In this section we describe the design we implemented for the proof of concept version that was developed.

2.3.1.1 Global records database

The Global Records Database is used to store globally scoped multicast records whose associated keyword hash values lie in the range that is managed by this domain. Globally scoped multicast sessions whose keyword hash values after bit-inversion lie in the hash space managed by this domain are also stored in the ‘Global Records

Database’. The data structure components used in this database are shown in Figure 2-6. We

have used a keyword meta-store to speed up the search process. The meta-store is a

39 hash structure that uses sorted linked-lists for overflow management. The meta-store is used to find out quickly if the keyword being searched has any corresponding session records in the main database in the first place.

Figure 2-6. Global sessions database design

The main database is a three level structure. The key to be searched is first hashed to find the location in the hash table. Then the overflow linked list of target keys is traversed to locate the desired key. The third level is the linked list of all the session records associated with the requested search keyword. The actual session record structure for a globally scoped multicast session was shown earlier in Figure 2-1.
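A toy sketch of this lookup path (Python; the meta-store gives a fast negative answer, and dictionaries stand in for the hash table and its sorted overflow lists, which are abstracted away here):

from collections import defaultdict
import hashlib

class GlobalRecordsDatabase:
    def __init__(self, buckets: int = 1024):
        self.buckets = buckets
        self.meta_store = set()                              # keywords known to have records
        self.main = defaultdict(lambda: defaultdict(list))   # bucket -> keyword -> [session records]

    def _bucket(self, keyword: str) -> int:
        return int(hashlib.md5(keyword.encode("utf-8")).hexdigest(), 16) % self.buckets

    def insert(self, keyword: str, record: dict) -> None:
        self.meta_store.add(keyword)
        self.main[self._bucket(keyword)][keyword].append(record)

    def search(self, keyword: str) -> list:
        if keyword not in self.meta_store:                   # quick existence check
            return []
        return self.main[self._bucket(keyword)].get(keyword, [])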

2.3.1.2 Local records database

The local records database construction is very similar to the global records

database. The difference is that it stores all the session records whose registration request originates from within its own domain. The sessions are stored irrespective of

whether they are administratively scoped or globally scoped.

2.3.1.3 Geo-tagged database

Each multicast session record is geo-tagged based on the location data that the session creator provides during the session registration phase. The location data normally would be the location where the content originates, but in some cases it can also depend on the nature of the content itself. The inclusion of geographical information allows end users to fine tune a multicast search by using proximity as a search criterion in addition to the keyword search parameters. Figure 2-7 shows the idea behind the database construction. It shows the earth coordinate system and a schematic representation of the geo-tagged database. Earth geographic locations can be addressed precisely using latitude and longitude coordinates. Latitudes vary from −90° to +90° along the south-north corridor. Similarly, longitudes vary from −180° to +180° along the west-east corridor. Latitudes are parallel to each other and are equidistant: every degree of separation between latitudes equals 110.9 km in ground distance. The distance relationship between longitudes is not that straightforward because they converge at the poles. This relationship is further complicated because the earth is not a perfect sphere.

$$\frac{\pi}{180^{\circ}} \times \cos\phi \times \sqrt{\frac{a^{4}\cos^{2}\phi + b^{4}\sin^{2}\phi}{(a\cos\phi)^{2} + (b\sin\phi)^{2}}}$$   (2–1)

Equation (2–1) shows the east-west distance for every degree change in longitude at latitude φ, with a = 6,378,137 m and b = 6,356,752.3 m.
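As a quick numerical check of Equation (2–1), the small sketch below evaluates it in Python; it assumes a and b are given in meters, and the helper name is illustrative.

import math

def km_per_degree_longitude(lat_deg, a=6378137.0, b=6356752.3):
    """East-west ground distance (km) for one degree of longitude at latitude
    lat_deg, following Equation (2-1); a and b are in meters."""
    phi = math.radians(lat_deg)
    r = math.sqrt((a**4 * math.cos(phi)**2 + b**4 * math.sin(phi)**2) /
                  ((a * math.cos(phi))**2 + (b * math.sin(phi))**2))
    return (math.pi / 180.0) * math.cos(phi) * r / 1000.0

print(km_per_degree_longitude(0))    # ~111.32 km at the equator
print(km_per_degree_longitude(60))   # roughly half of the equatorial value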

Figure 2-7. Geo-tagged database design

Under the current grid map, where the major lines are latitudes and longitudes each 1° apart, the earth can be mapped into a 180x360 grid space. Since almost 70% of the earth's surface is covered by water, 70% of the grid locations would naturally map to water bodies. Of the remaining 30% of landmass, research shows only about 50% of the land area is inhabited by humans. Therefore, we foresee only about 15% of the grid locations ever being used to group multicast sessions according to their geo-tags, so a sparse-matrix implementation of the planetary grid seems reasonable.

Since a 1° x 1° grid cell at the equator represents an area of 111.3 x 110.9 km², it might be necessary to further subdivide the area into smaller zones. The grid subdivision or node branching factor “k” determines how a larger grid area is subdivided. The depth of the tree and the choice of “k” depend on the final areal resolution desired. For instance, if an areal resolution of at most 5 x 5 km² is desired and the branching factor is 2, i.e., k = 2, then a tree of height 5 would result in an areal resolution of 3.48 x 3.47 km² at the equatorial plane. In general, the areal resolution at depth “n” for branching factor “k” is governed by the equations below:

$$\frac{110.9}{k^{n}}\ \text{km}$$   (2–2)

$$\frac{\pi}{180^{\circ}} \times \cos\phi \times \sqrt{\frac{a^{4}\cos^{2}\phi + b^{4}\sin^{2}\phi}{(a\cos\phi)^{2} + (b\sin\phi)^{2}}} \times \frac{1}{k^{n}}\ \text{km}$$   (2–3)

Equation (2–2) governs the north-south resolution at tree depth “n”, and equation (2–3) governs the east-west resolution at the same depth at latitude φ. Any session that gets stored in either the “global” or the “local” database also keeps a corresponding geo-reference in the “geo-tagged” database. These references are maintained at the correct grid location in the level 0 structure and in the correct leaf linked-list of the tree rooted at the corresponding level 0 grid position. The “garbage-collector” thread, while removing stale sessions from the “global” and “local” databases, removes the corresponding references from the “geo-tagged” database as well. Maintaining this additional structure allows new service paradigms to be supported that were previously not possible. A few such services, such as real-time ‘iReporting’ and support for geo-specific and proximity search criteria, have already been mentioned earlier. Next let us look at the key algorithms needed to support a seamless user search experience and discuss the modalities of the supported operations.
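Before turning to those algorithms, the following small sketch evaluates Equations (2–2) and (2–3) to confirm the resolution figures quoted above; the function name and defaults are illustrative assumptions.

import math

def resolution_at_depth(lat_deg, k, n, a=6378137.0, b=6356752.3):
    """Approximate (north-south, east-west) cell size in km at tree depth n
    with branching factor k, per Equations (2-2) and (2-3)."""
    phi = math.radians(lat_deg)
    r = math.sqrt((a**4 * math.cos(phi)**2 + b**4 * math.sin(phi)**2) /
                  ((a * math.cos(phi))**2 + (b * math.sin(phi))**2))
    ew_deg_km = (math.pi / 180.0) * math.cos(phi) * r / 1000.0
    return 110.9 / k**n, ew_deg_km / k**n

# At the equator, a k = 2 tree of height 5 gives roughly 3.47 x 3.48 km cells.
print(resolution_at_depth(0, k=2, n=5))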

2.3.2 Associated Algorithms

Algorithms play a pivotal role in a distributed system. They act as the glue that binds separate components seamlessly together and provide location and failure transparency to end users. In this section we present the key algorithms that aim to make the user’s multicast session search experience a seamless one. In the architecture based on the DHT scheme discussed above, a content provider (a.k.a. multicast session creator) must register its session with a local MSD server. Users generally present search queries to the local MSD server, although in some cases they can do a domain-specific search. All of this works well as long as the hierarchy remains connected; in the rare cases where an upstream domain leaves the hierarchy (gracefully or abruptly), the overall system has to cope with the situation as best as possible. Let us see these algorithms and protocols in some detail now.

2.3.2.1 Session registration

Every domain that participates in the “mDNS” service hierarchy has at least one MSD server hosted in it. Any session that is created or hosted from within that domain must register its details with the MSDd server of that domain. Figure 2-8 shows a screenshot of the session registration tool implemented as part of the ‘proof of concept’ demonstration of this service.

As part of the registration data, the content host must provide a valid location, a list of keywords that closely describe the session content, the scope of the session, and other associated parameters. On receiving the registration request, the MSD server creates a session record for every keyword specified in the request and stores those records in the ‘Local Records Database’ regardless of the scope of the session. If the session scope is global, the MSDd server creates a ‘remote-register’ [36] protocol message for each keyword and routes the request to remote domains using the locally maintained hash routing table. If any such request routes back to ‘self’, the MSD server creates a session record for that keyword and stores it in the ‘Global Records Database’.

Figure 2-8. Screenshot - session registration tool

Figure 2-9 shows a simple registration scenario with the session scope set as ‘global’.

Figure 2-9. Session registration
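The registration flow just described can be sketched roughly as follows. The helper names (bit_invert, route, send, and the 'remote-register' tuple) are illustrative assumptions and are not the wire format defined in the proposed RFC [36]; the sketch merely shows local storage plus hash-routed forwarding of globally scoped records and their bit-inverted shadow copies.

def bit_invert(h, bits=32):
    """Bitwise inversion of a keyword hash, used to place the shadow copy."""
    return ~h & ((1 << bits) - 1)

def register_session(local_db, global_db, route, self_domain, send,
                     keywords, record, scope):
    """Sketch of MSDd registration handling; all names are illustrative."""
    for kw in keywords:
        local_db.setdefault(kw, []).append(record)        # always kept locally
        if scope != "global":
            continue
        h = hash(kw) & 0xFFFFFFFF
        owners = {route(h), route(bit_invert(h))}          # primary owner + shadow owner
        for owner in owners:
            if owner == self_domain:
                global_db.setdefault(kw, []).append(record)
            else:
                send(owner, ("remote-register", kw, record))

# toy usage: a fake routing function and send callback
local, glob, sent = {}, {}, []
register_session(local, glob,
                 lambda h: "self" if h % 2 == 0 else "other-domain",
                 "self", lambda dom, msg: sent.append((dom, msg)),
                 ["gators", "football"], {"group": "233.252.0.3:5000"}, "global")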

2.3.2.2 Session search

The “mDNS” architecture supports global as well as domain-specific search. In a domain-specific search, the end user uses the ‘mDNS’ URL of the target domain to pass the search query to the target MSDd server in that domain. That domain runs the search query only on the ‘Local Session Records’ database at that site and returns only the globally scoped multicast sessions among the candidates back to the requesting end user. The domain-specific search support uses the URS for ‘mDNS’ URL name resolution. More details on the URS are provided in Chapter 3.

The other kind of search that ‘mDNS’ supports is the global search function. The end user presents the search query to the local MSDd server. Depending on the search criteria, the ‘mDNS’ hierarchy routes and processes the query, and the candidate session details are sent back to the requesting end user. In order to reduce the processing load on the servers, search aggregation is left to the end user’s search tool. The details of this type of search are given next.

Figure 2-10. Session search

Figure 2-10 shows the general scheme behind global search support in the architecture presented. The end user makes the session search query to the domain-local MSDd server. The MSD server parses the query, and if the scope of the search is ‘administrative’ only, then only the ‘Local Session Records’ database is searched and the matching sessions are returned to the requesting party.

However, things get somewhat more complicated in the case of a global search. A naive approach would be to flood the query to all participating domains; because of the DHT tree structure and keyword routing using hash values, the search is more efficient than that. The MSDd parses the query string and transforms a single search query into multiple ‘msd-probe’ protocol messages [36], one for each unique keyword present in the search query. The probes are sent asynchronously and in parallel in order to reduce possible delays due to timeouts and various other network-related artifacts.

Algorithm 3: MSD search algorithm

begin
    {Incoming: search query from an end user in the self domain}
    if search scope set as administrative then
        {query Local Records database only, filter administratively scoped sessions}
        {if needed, cross-reference the Geo-DB database and return the candidates as the result}
    end
    if search scope set as global then
        foreach keyword in search query do
            if keyword found in redirection cache then
                {send ‘redirect’ with the necessary connection details to the search client}
            else
                {send ‘msd-probe’ with the target keyword towards the target remote MSD}
                {route ‘msd-probe’ using the keyword routing table}
                {wait for ‘msd-probe-reply’ from remote servers}
            end
        end
        if ‘msd-probe-response’ message is received then
            {enter the keyword and the MSD connectivity information into the redirection-cache or inversion-cache}
            {send ‘redirect’ with the new connectivity details to the search client}
        end
        // after receiving a ‘redirect’ message, the search client is expected to initiate the ‘ext-search’ protocol exchange with the target MSD server
        // clients can send ‘invalidate’ back to the MSD server if the remote server no longer maintains session records for the requested keyword; in that case the MSD invalidates the stale cache entry for that keyword and sends ‘msd-probe’ again to refresh the state entry
        // if the client decides that the remote MSD server is down, it can ask the MSD in its local domain for backup server details, which the server finds by sending ‘msd-probe’ with bit inversion set to TRUE
    end
end

Algorithm 3 shows what happens when a search query is received at the MSDd from a search client in the same domain. Let us take a look at what happens when the search query comes from an external search client. In that case, only globally scoped sessions are searched, because sending administratively scoped session details to a search client in a remote domain is pointless. Algorithm 4 shows what happens when an ‘ext-search’ message is received at the target MSDd server.

Algorithm 4: MSD external search algorithm

begin
    {Incoming: search query ‘ext-search’ from an end user in an external domain}
    // only globally scoped sessions are returned
    set keyword_hash ← hash(keyword)
    if keyword_hash lies within the assigned hash space then
        {search the ‘Global Session Records’ database for candidate sessions}
        {if needed, cross-reference with the ‘Geo-DB’ database}
        {send the qualifying session records as the search response back to the remote client}
    else
        {return an ‘ext-search-invalid’ message to the remote client}
    end
end

For the search operation to work properly, the ‘mDNS’ hierarchy must remain connected. But domains may end their participation in the ‘mDNS’ architecture or may crash for an unknown duration. Some of these domains could have child domains below them, and their failure would leave the child nodes with no mechanism to forward messages to the next higher layer using the keyword-routing scheme. Let us now look at a failure recovery strategy that deals with this very problem.

2.3.2.3 Recovering from parent node failures

A parent node periodically sends a heartbeat message to all its children over CMCAST, the child multicast channel, or via unicast if some children are not able to receive multicast messages from it. If a child node was initially subscribed to receive the parent’s communication over PMCAST, the parent multicast channel (PMCAST at the child node is the same as CMCAST at the parent’s node), and does not receive any parent messages for a set number of consecutive timeouts, it tries to contact the parent node by unicast to ask it to switch its communication to unicast. If this process fails, or if the child was already subscribed to receive the parent’s communication over a unicast channel and did not get any heartbeat message for a set number of consecutive timeouts, it initiates the parent node failure recovery algorithm. As part of the parent’s heartbeat messages, the child node learns the hash space assigned to the parent node. It uses this knowledge to find a still-alive ancestor node higher up in the hierarchy. In the face of node failures, one might ask: how would one recover the stored records now rendered inaccessible by the loss of the failed node? The shadow copy, stored at the location determined by bit-inverting the keyword hash, allows the end user to locate the otherwise inaccessible records stored at the failed node. Even with possible hash space reassignments later (in case the node failure is substantially prolonged), the shadow copy will, with high probability, still exist in the hierarchy. Additionally, each session registration may have to be refreshed with a set periodicity by the session originators. This would allow session registration details to be reinserted and remain discoverable by end users despite node failures and topological changes in the global hierarchy. This approach is consistent with the recovery approach adopted by several popular DHT schemes [37][38][39], but it has not yet been incorporated in the proof-of-concept implementation or in our proposed IETF RFC [36]. Figure 2-11 shows the sequence of events after a node failure leading to temporary grafting of the child node at an appropriate ancestor node in the hierarchy. Algorithm 5 describes what happens in a parent node failure situation.
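To make the shadow-copy idea concrete, the sketch below shows how a lookup could fall back to the bit-inverted hash location when the primary owner of a keyword is unreachable; the helper names and the 32-bit hash width are assumptions for illustration only.

def bit_invert(h, bits=32):
    """Bitwise inversion of a keyword hash, giving the shadow-copy location."""
    return ~h & ((1 << bits) - 1)

def find_keyword_owner(keyword, route, is_alive):
    """route(h) maps a hash value to the domain owning that part of the hash
    space; is_alive(domain) is the caller's reachability check."""
    h = hash(keyword) & 0xFFFFFFFF
    primary = route(h)
    if is_alive(primary):
        return primary
    return route(bit_invert(h))   # the shadow location likely survives the failure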

If the hierarchy root domain fails, then each of the child nodes has no temporary ‘graft’ option. After some period each of them will assume root responsibilities and the hierarchy will deteriorate into a disconnected forest. It is essential to provide sufficient redundancy at the root level, in the form of multiple backup MSD servers running at any given time, to prevent such a scenario from being realized. In a rare scenario, simultaneous URS and MSDd failures can also result in a failed domain even if that domain has multiple backup MSD servers. Such a scenario should be prevented at the root level at least.

Figure 2-11. Parent node failure recovery strategy

Algorithm 5: Node failure recovery algorithm

begin
    {no parent heartbeat received for ‘n’ consecutive timeouts}
    if subscribed to receive parent’s communication over PMCAST then
        {send request to the parent to receive communication over unicast}
        if connection re-established then
            {proceed to function normally}
        else
            {proceed to the else section of the outer ‘if’}
        end
    else
        {send [tracer] request to the root node}
        {upon receiving graft details, initiate grafting}
        {periodically keep pinging the original parent node}
        if original parent comes online later then
            {detach from the temporary graft location}
            {resume regular operations}
        end
    end
end

2.4 Related Work

We have so far seen the construction and maintenance of the tree DHT structure and how such a structure aids seamless multicast session search and discovery by an end user. Let us take a brief look at some of the other competing multicast search strategies and peer-to-peer DHT schemes.

2.4.1 Multicast Session Search Strategies

2.4.1.1 mbone sdr - session directory

Traditionally “sdr” [40] has been used to create as well as broadcast multicast session information to all interested parties. “sdr” uses SDP and SAP for packaging and transmitting this multicast session information on a well-known globally scoped multicast channel, sap.mcast.net (224.2.127.254). But “sdr” has numerous limitations. The bandwidth restrictions enforced on SAP cause significant delays in session information reaching remote hosts. Also, every receiver must constantly listen to periodic announcements on sap.mcast.net, and since “sdr” clients multicast the session details for the duration of the sessions, the approach generates a tremendous amount of traffic in the network. Clearly the “sdr” approach is not scalable as the number of content providers increases.

Another problem with “sdr” and its underlying SAP implementation is caused by announcement bursts. The delay between burst cycles is greater than the multicast routing state timeout period. This is caused by the default bandwidth restriction of 4000 bps in SAP. It leads to unnecessary control packets being sent in the network, recreating the already timed-out multicast distribution tree in the core network.

2.4.1.2 Multicast session announcements on top of SSM

In their published work titled “Multicast session announcements on top of SSM” [41], the authors tried to address some of the issues in “sdr”. They proposed a multi-tier mesh of relay proxy servers to announce multicast sessions using SSM to interested recipients. In their approach, every network operator that provides SSM service also runs a SAS (Session Announcement Server). They propose relaxing the bandwidth limit of SAP in local networks to a higher value. Further, each such SAS server links to a level 2 SAS server that runs in the core network, and every level 2 SAS server is interconnected in a mesh fashion with the others. Such an extensive mesh could cause significant network traffic in the core network as the number of deployed level 2 SAS servers increases, although the authors assume that only a few level 2 SAS servers would be needed in their scheme. Regardless, their scheme remains a push-based scheme and suffers from the limitations of SAP. There still remains a significant delay in session information being disseminated to remote hosts (albeit a much smaller delay compared to sdr). Their scheme also transmits the complete session details to every SAS server in the hierarchy on a periodic basis, causing unnecessary network traffic. The administrative burden is increased in this scheme as well, as every level 2 SAS server must be fed the connection details of every other level 2 SAS server.

2.4.1.3 The next generation IP multicast session directory

In their published work titled “The next generation IP Multicast Session Directory” [42], the authors analyzed the drawbacks of “sdr”. They analyzed the announcement delays caused by bandwidth restrictions in SAP and found that, on average, the minimum announcement interval equals 5 minutes. Taking into account packet loss over the internet, they conjectured that users could potentially need to wait 10 or more minutes to build the session list. They proposed an architecture using an SDP proxy that is online for much longer intervals of time and builds the session list over a period. End users can then directly contact the nearest SDP proxy to get the session list. The problem still lies in the delay involved for a newly created session to be discovered by a remote user: announcements from one SDP proxy to another are still rate limited. By the time a short-duration session comes online and is discovered by a remote user, that session could potentially already be over. SDP proxies are not suitable for transient and short-lived sessions.

2.4.1.4 Harvest

The Harvest [43] architecture and protocol suite was developed by researchers to enhance information gathering and indexing from disparate sources over the internet in order to reduce network load. Although the original purpose of Harvest was very different, the architecture and the protocol suite can be modified slightly in order to serve as a multicast session discovery architecture. The Harvest architecture uses multiple “Gatherers” that reside close to the information sources and interface with multiple “Broker” applications that provide a standard query interface. It uses replicated caches based on eventual consistency. Although a modified Harvest would not suffer from the bandwidth restrictions of SAP, the eventual consistency model could cause problems for short-lived sessions created without preplanning, and it could render several sessions undiscoverable for significant durations of time. Further, replicated caches would result in duplicates being created, wasting resources.

2.4.1.5 Layered transmission and caching for the multicast session directory service

In this published work [44], the authors, in an effort to enable layered multimedia transmission to receivers with varying capabilities, proposed modifications to “sdr”: a two-stage session directory service with a persistent server that caches SAP announcements and an ephemeral client that contacts the server to get the session list, thereby reducing the long latency normally associated with “sdr”. The usual problems with “sdr” still persist; users have to browse through a long session list in order to find a session of interest.

2.4.1.6 Towards multicast session directory services

In their article titled “Towards Multicast Session Directory Services” [45], the authors reflected on the limitations of a session directory based on the Session Description Protocol and the Session Announcement Protocol. They argued that although the “sdr” approach is not scalable, session discovery can be made better by standardizing additional attributes in SDP so that sessions can be organized and indexed in a separate server that would provide a Multicast Session Directory Service (MSDS) to end users. These MSDS servers can then disseminate information to the end users on a single well-known multicast channel or on multiple theme-based multicast channels.

2.4.1.7 IDG: information discovery graph

The Information Discovery Graph (IDG) [46], developed as part of the Semantic Multicast project, strives to provide a self-organizing, hierarchical distributed cache where multimedia sources register their content information and topical managers intelligently determine where in the hierarchy to store the session information. Their approach still makes use of SAP-like periodic announcements, which wastes bandwidth, even though these announcements mainly carry the content managers' hierarchy information. It is still not clear how IDG enables end users to perform multicast session searches based on multiple keywords. How additional network hardware is automatically commissioned if the workload increases at a particular manager has also not been specified. The authors have identified several ongoing and future research areas pertaining to IDG, and once those are incorporated it could turn out to be a viable alternative to the above-mentioned approaches. Our proposal goes several steps further and even allows users to perform searches based on geo-specific criteria and to bookmark their favorite sessions just as one can bookmark a popular webpage today.

2.4.2 Peer-2-peer DHT Schemes

Distributed Hash Table (DHT) schemes allow a faster and structured lookup of resources in a distributed peer-to-peer network. Some of the more popular DHT schemes are based on a circular arrangement of host nodes, e.g., Chord [47], Pastry [37], and Bamboo [48]; distributed mesh arrangements as in Tapestry [49]; hierarchical structures such as Kademlia [38]; and spatial DHTs as in CAN [39], where routing is done using cartesian coordinates. P2P DHT schemes allow for scalability, self-organization, and fault tolerance. Yet they may suffer from issues resulting from churn [50] and non-transitivity [51] in connections among participating nodes. Researchers have even proposed unstructured DHT [52] overlays that provide the benefits of structured DHTs. Let us now briefly look at some of these DHT schemes. In Chapter 5 we will discuss our reasons for developing our own DHT scheme and explain why we chose not to use existing DHT schemes for the mDNS architecture.

2.4.2.1 Chord

Chord [47] is a distributed P2P architecture that allows users to store keys and associated data in an overlay. Given a key, it provides a service that maps that key onto an existing node in the overlay. Each node maintains information about routing keys to the appropriate node in its local finger table. Finger tables are constructed based on local interaction among participating nodes, and a node need not know the global state of the overall system. The routing table (finger table) size in a stable state grows as O(log N) for an N-node overlay. The routing is based on the successor relationship; as long as a node knows its predecessor in the key space, it can compute which keys are mapped onto it.

2.4.2.2 OpenDHT

OpenDHT [53] is a free and shared DHT deployment that can be used by a multitude of applications. The design goals focus on adequate control over the storage allocation mechanism, so that each user/application gets its fair share of storage, and on a fairly general API, so that the overlay can be used by a broad spectrum of applications. It provides persistent storage semantics based somewhat on the Palimpsest shared public storage system [54]. The implementation provides a simple put/get-based API for simple application development and a more sophisticated API set called ReDiR. The main focus in this DHT scheme is starvation prevention and fair allocation of storage to applications. Competing applications’ keys are stored using unique namespaces assigned to each application. Keyword routing in OpenDHT is tree based and is done hierarchically. Details can be found in [55].

2.4.2.3 Tapestry

Tapestry [49][56] is a DHT overlay where routing is done according to the digits in the node address. At each routing step, the message is routed to a node whose address has a longer matching address prefix than the current node. The routing scheme is very similar to the scheme presented by Plaxton [57], with added support for a dynamic node environment. The authors propose using salts to store objects at multiple roots, thus improving the availability of data in their scheme. They use neighbor maps to incrementally route messages to the destination ID digit by digit. The neighbor map in Tapestry has space complexity O(log_b N), where ‘b’ is the base of the node IDs. The Tapestry scheme uses several backpointers to notify neighbors of node additions or deletions. Several successful applications that use Tapestry for message routing have been developed. Notable among them are OceanStore [58], a wide-area persistent distributed storage system meant to scale to the globe, and Bayeux [59], an application-level multicast protocol.

2.4.2.4 Pastry

Pastry [37] is an application layer overlay developed in collaboration between Rice University and Microsoft Research. Each node in Pastry is assigned a unique 128-bit ID that indicates its position in the circular ID space. Every Pastry node maintains a routing table, a neighborhood set, and a leaf set that help the overlay deal with intermittent node failures. The neighborhood set contains a predefined number of nodes that are closest to the given node based on some set proximity criterion, whereas the leaf set contains nodes whose nodeIDs are closest to the current node’s ID. The neighborhood set is not used in routing but is used to guarantee locality bounds in routing. The routing table has ⌈log_{2^b} N⌉ rows with 2^b − 1 entries in each row; ‘b’ is a configuration parameter typically set to 4 by the authors. The routing scheme is prefix based and is very similar to the one adopted by Tapestry [49][56]. Several successful applications have been developed that use Pastry as their routing base. Notable among them are PAST [60] and SCRIBE [61]. A global bootstrapping service [62] for any application layer overlay has also been proposed that uses Pastry as its routing base.

2.4.2.5 Kademlia

Kademlia [38] has several attractive features compared to other application overlays. It minimizes the number of configuration messages that nodes must exchange in order to find out about each other. It uses the XOR of nodeIDs as a measure of distance between two nodes. Because of the symmetric nature of XOR, nodes participating in a Kademlia overlay learn useful routing information from the keyword queries they receive; other DHTs lack this ability. Additionally, Kademlia uses a unified routing algorithm from beginning to end regardless of the proximity of intermediate nodes to the target node, which simplifies the routing algorithm quite significantly. Nodes are treated as leaves in a binary tree where each node’s position in the tree is determined by the shortest unique prefix of its ID. Routing of queries to their destination proceeds because each node knows at least one node in each of a series of successively smaller subtrees that do not contain the node itself. This results in a query being routed in a logarithmic number of steps. The routing table in Kademlia is arranged in k-buckets of nodes whose distance from the node lies between 2^i and 2^{i+1}, for 0 ≤ i < 160. ‘k’ is a design parameter which the authors chose as 20. The routing table itself is logically arranged as a binary tree whose leaves are k-buckets. Each k-bucket covers some range of the ID space and together they cover the entire 160-bit ID space.

2.4.2.6 CAN: a scalable content addressable network

CAN [39] is an overlay whose node space is a d-dimensional coordinate space. The coordinate space at any time is completely partitioned among all participating N nodes. Each key in CAN is mapped to a coordinate in the CAN coordinate space and thus to the node managing the zone within which that coordinate lies. Routing is done by forwarding the message to the neighboring node whose coordinate is closest to the destination coordinate. A CAN node maintains a coordinate routing table that holds the virtual coordinate zone of each of its immediate neighbors only. If there are ‘n’ nodes that divide the whole coordinate space into n equal zones, then the average routing path length in CAN is (d/4)(n^{1/d}) hops, and individual nodes maintain 2d neighbors for a d-dimensional coordinate space. The authors propose using multiple ‘realities’ along with multiple peers in each zone and multiple hash functions for routing optimizations and for improving the overall availability of data in their scheme.

2.5 Conclusion

This chapter provided a detailed discussion of the DHT scheme and the keyword routing scheme that form the backbone of multicast session search and discovery in “mDNS”. The scheme presented adapts to changes in topology, with the major goals of distributing storage evenly across all participating domains and reducing the routing delays to reach the target destination. The simulation results and analysis of the structure are presented later in Chapter 5. This chapter also presented the necessary algorithms for failure recovery, searches, and other operations supported in “mDNS”.

As soon as a multicast session is registered at any domain, its details are routed to the appropriate destination in the hierarchy using keyword routing, thereby making the session immediately discoverable by end users. The use of geo-coding adds an extra dimension to search that proves useful to end users who may be interested in finding content that is either locally hosted or on a regionally significant topic.

The architecture, implemented as an application-layer overlay, achieves independence from lower-layer details and is incrementally deployable. Even if an ‘mDNS’ domain is not linked to the global hierarchy, it can still provide valuable directory services to the domain’s local end users. In conjunction with the URS it can allow users to search and bookmark their popular multicast contents for later viewing.

CHAPTER 3 TACKLING USABILITY

3.1 IP Unicast vs Multicast

One of the prominent reasons for the higher deployment of, and end-user demand for, IP unicast as compared to multicast is the ease of use associated with unicast. IP unicast has unique IP addresses assigned to networked hosts, and for many hosts on the Internet that provide some form of service to end users, these addresses are assigned long term (static). That allows mnemonics to be assigned to dotted-decimal IP addresses. With a global translation service, an end user needs to know only the mnemonic to access a desired resource such as an HTML page or an email in-box. This global name translation between mnemonics (or URLs / domain names) and dotted-decimal IP addresses is provided by the Domain Name Service (DNS) [32]. The use of domain names and URLs has made the Internet more usable for end users. Most of the content made available to end users on the Internet is static and is made available long term; a web page hosted somewhere will most likely be found at the same location for many weeks to come. This quasi-permanence of the data allows search engines like Yahoo and Google to crawl the web and index the content. These web indexes can be searched by end users to locate the content they desire. The existence of web indexes and the DNS service have, without argument, made the Internet a much more usable technology today compared to its early days.

The scenario for IP multicast is totally different. Group addresses for content delivery to interested users are not permanent. Further, the content streams transmitted over such multicast groups are typically very dynamic in nature. Generally speaking, IP multicast traffic is not crawler friendly, and the transient nature of the sessions makes indexing almost impossible. Content discovery similar to that provided by web search engines, which would allow end users to locate a session of interest, is non-existent in the Internet. The approach provided in Chapter 2 addresses that.

Can a usable mnemonic be assigned to a transient multicast address? There are several challenges that have to be tackled before such a solution can become feasible. Since there is no directive for the use of certain groups of multicast addresses, a content provider using multicast technology could change the group address at will. The name translation service must address such a scenario so that the mnemonic directs end users to the most up-to-date session details. In this chapter let us look in detail at the proposal that aims to solve the issue of mnemonic assignment to multicast groups. Such a scheme, along with content search and discovery capability for multicast contents, can help improve multicast usability significantly.

3.2 Domain Name Service

The Domain Name Service is the hierarchical distributed database that provides name resolution service to end-user applications such as email clients, browsers, and several others. Mnemonics such as URLs and domain names are preferred by users over dotted-decimal IP addresses. To a network router, however, a mnemonic provides no aid in determining where to route a data packet; the mnemonic hides any location information. Routers prefer IP addresses since they are interpreted hierarchically from left to right to determine the direction in which to forward the packet, thus bringing it closer to the destination. DNS provides the translation service from mnemonics to IP addresses, thereby allowing the human-usable addresses to be mapped to the router-usable format. Let us look at the details of the DNS service.

3.2.1 DNS Hierarchy

At the very top of the hierarchy are the 13 root servers named A through M; they are named letter.root-servers.net. Below the root level lie several top level domain (TLD) servers. These include TLD servers for com, org, edu, and all the country TLDs. Below the TLDs are various organizations’ or operators’ authoritative or non-authoritative DNS servers. Figure 3-1 shows the location and names of the 13 DNS root servers. These servers are replicated for security and redundancy reasons.

Figure 3-1. Location and names of DNS root servers [source: ICANN]

3.2.2 DNS Name Resolution

DNS name resolution begins when an end-user application sends a resolution request to the local DNS client. The initial DNS server connection information is, in many cases, fed automatically to the client’s machine through DHCP [63][64]. The client-side DNS server asks the root servers for the address of the respective TLD DNS server. The TLD DNS server has an entry that points to the authoritative DNS server for the domain that has to be resolved. The local DNS client then queries the authoritative DNS server and gets the IP address for the mnemonic (domain name) to be resolved. DNS name resolution is done using both recursive and iterative resolution. Resolution proceeds iteratively until the query reaches the authoritative name server, and if there are local servers below that, the resolution proceeds recursively until the address record is located and sent back to the requesting DNS client. A DNS server maintains several record types in its internal database. Let us take a brief look at some of the common records that are stored as part of the DNS database.

3.2.3 DNS Records

DNS records are identified by their record types. Table 3-1 shows the most common record types. Now that we have seen the basics of a DNS server, let us see in detail how a URS is designed and how it achieves its intended goals.

Table 3-1. Common DNS record types
Record Type   Name              Value
A             hostname          IP address
NS            domain            authoritative DNS server name
CNAME         alias hostname    canonical hostname
MX            alias hostname    canonical name of mail server

3.3 URL Registration Server

The URL Registration Server’s (URS) main task is to ensure uniqueness among all registered session identifiers within a particular domain. It also acts as a bootstrapping device for the MSD servers running in that domain. System administrators need to set only a few configurable parameters in the URS; the rest of the components in ‘mDNS’ are self-configuring. Just as a DNS server can be replicated for security and redundancy, so can a URS in a domain. The DNS server in a particular domain has an ‘A’ record for the URS; the name ‘mcast’ is used. A typical DNS zone file entry may look something like this -

; cons.cise.ufl.edu
$TTL 604800
@       IN  SOA  ns1.cons.cise.ufl.edu. (
                 2006020249 ; Serial
                 604800     ; Refresh
                 86400      ; Retry
                 2419200    ; Expire
                 604800 )   ; Negative Cache TTL
;
@       IN  NS   ns1
        IN  MX   10 mail
        IN  A    128.227.170.50
;
mcast   IN  A    128.227.170.43
;

Considering the above example DNS settings, one can access the URS using the URL string mcast.cons.cise.ufl.edu. If multiple URS servers are maintained at a domain, the DNS server load-balancing feature might be used to handle high-traffic situations. Let us now take a look at the URS components.

3.3.1 URS Internals

Every URS maintains a URS records database. The record member elements are very similar to the MSD ‘Local Session Record’ structure shown in Chapter 2, with a few minor differences. The elements are listed here (a structural sketch follows the list) -

• Expiration Time: Time after which session records may be purged from the URS database
• URS Identifier: Unique session identifier registered with the URS server (see Chapter 4)
• Channel IP: Multicast session IP
• Channel Port: Multicast session port
• Source IP: if the network type is SSM, this IP identifies the content host machine
• Fail Over Unicast IP: backup unicast stream source IP address (optional)
• Fail Over Unicast Port: backup unicast stream port (optional)
• Channel Scope: Multicast stream scope (global/local)
• Geographical Common Name: Common name of the place associated with the stream
• Latitude: Latitude value associated with the session
• Longitude: Longitude value associated with the session
• Network Type: multicast compatibility of the session’s hosting network (ASM/SSM)
• Stream Type: Identifies the nature and type of the multicast stream
• Preferred Application: Identifies the suggested application to be used to access the stream type
• CLI Arguments: Command Line Interface (CLI) arguments to be supplied to the preferred application
• Mime Type: MIME type of the stream data; it must be one of the IANA registered MIME types
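A structural sketch of such a record, with field names mirroring the elements above, is given below; the types and defaults are illustrative assumptions, not part of the specification.

from dataclasses import dataclass
from typing import Optional

@dataclass
class URSRecord:
    """Sketch of a URS record; field names mirror the elements listed above."""
    expiration_time: int                      # epoch ms after which the record may be purged
    urs_identifier: str                       # unique within the registering domain
    channel_ip: str
    channel_port: int
    source_ip: Optional[str] = None           # required only for SSM sessions
    failover_unicast_ip: Optional[str] = None
    failover_unicast_port: Optional[int] = None
    channel_scope: str = "global"              # 'global' or 'local'
    geo_common_name: str = ""
    latitude: float = 0.0
    longitude: float = 0.0
    network_type: str = "ASM"                  # 'ASM' or 'SSM'
    stream_type: str = ""
    preferred_application: str = ""
    cli_arguments: str = ""
    mime_type: str = ""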

This record aids in the ‘mDNS’ URI name resolution process. A URS maintains records only for sessions created in its own domain, and the uniqueness of the ‘URS Identifier’ value is enforced only with respect to that domain. That is, no two sessions created in that domain and registered with the URS will have the same ‘URS Identifier’. The content provider in a domain is required to register the session details, along with a unique URS identifier, with the URS in his/her domain. If the session’s connection parameters change in the future, he/she is required to update the URS record immediately. This update process can be automated in the session management tool provided to the content creator, so that if any session parameter changes, the tool automatically updates the local URS.

3.3.2 mDNS Name Resolution

Assuming that each domain has a DNS server running and a valid FQDN assigned to the network domain, one can construct a unique URI for every multicast session that has a unique identifier registered with the URS. The URI is relative to the URS server’s URI. For example, if the FQDN for a domain is dom1.somenetwork.example, and the URS server has an ‘A’ entry in the DNS server with value ‘mcast’, then the URI of the URS becomes mcast.dom1.somenetwork.example. Furthermore, if a multicast session creator has registered a unique identifier ‘channelx’ with this URS server, the URI for his/her multicast stream in this architecture becomes - mcast.dom1.somenetwork.example/channelx

In every participating domain there must be a URS installed and operational. Now let us see how, when an end user accesses a bookmarked multicast session, the architecture is able to resolve the URI and let the user access the multicast stream. Figure 3-2 shows the steps involved.

Let us say the user is trying to access a multicast video stream that has the ‘mDNS’ URL mcast.ufl.edu/gators. The base URL string mcast.ufl.edu is resolved using the standard DNS name resolution algorithm. The name resolves to the URS operating in the ufl.edu domain. The end user’s client software then requests from the URS the record associated with the identifier gators. The content provider has already registered the session details with the ‘URS-identifier’ set to gators, so the URS locates the relevant record and sends it back to the end user. This record has all the parameters needed by the multicast stream receiver to join the relevant session. If the record gators is not found at the target URS, the name resolution fails; in that case a ‘Resource Not Found’ type error message is displayed at the end user’s machine.
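The resolution steps of Figure 3-2 can be sketched as follows; query_urs stands in for the URS request/response exchange, whose wire format is not shown here, and the error handling is illustrative.

import socket

def resolve_mdns_url(url, query_urs):
    """Sketch of 'mDNS' URL resolution for a URL such as mcast.ufl.edu/gators.
    DNS resolves the base name to the domain's URS; the URS is then asked for
    the record registered under the identifier."""
    base, _, identifier = url.partition("/")
    urs_ip = socket.gethostbyname(base)       # standard DNS resolution of 'mcast.<domain>'
    record = query_urs(urs_ip, identifier)    # ask the URS for, say, 'gators'
    if record is None:
        raise LookupError("Resource Not Found: " + url)
    return record                             # contains group IP, port, scope, etc.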

Figure 3-2. Typical steps in ‘mDNS’ URI name resolution

3.3.3 Additional Usage

As mentioned earlier, a URS is also used as a bootstrapping device. The system administrator configures a few parameters while setting up the URS. These parameters include -

• PMCAST: the parent’s multicast communication channel
• CMCAST: the children’s multicast communication channel
• supported IGMP version
• ‘mDNS’ parent domain’s URL string

MSD servers use these parameters to set up the communication channels necessary to join the ‘mDNS’ service hierarchy. The parent domain’s URL string is needed in case a fail-over configuration has to be set up when communication with the parent or children domains over multicast channels fails. Apart from this, a URS may sometimes also act as an MSD leader election facilitator.
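For illustration, the administrator-set bootstrap parameters might be expressed as a simple configuration like the one below; the addresses, port, and key names are made-up examples (233.252.0.0/24 is the documentation multicast range), since the architecture does not mandate any particular configuration format.

# Illustrative URS bootstrap parameters (all values are examples only)
URS_CONFIG = {
    "PMCAST": "233.252.0.17:9875",          # parent's multicast channel (the parent's CMCAST)
    "CMCAST": "233.252.0.18:9875",          # channel used to reach the children domains
    "IGMP_VERSION": 3,                       # multicast group management support in this network
    "PARENT_DOMAIN_URL": "mcast.parentnet.example",
}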

3.4 Conclusion

The main goal of the URS is to enable users to access and bookmark their favorite multicast sessions using mnemonics similar to the domain names and web site URLs with which they are already familiar. The URS enforces uniqueness among the registered identifiers, thus enabling the architecture to support ‘mDNS’ URLs. The URL strings are created relative to the URL associated with the URS. This whole scheme is closely intertwined with the existing DNS hierarchy and depends on the DNS name resolution protocol for most of the resolution process. Along with the DHT scheme unveiled in Chapter 2, the capability to assign URLs to multicast streams, even the most transient ones, can help lift the usability roadblock associated with multicast. This can ease adoption by end users and support seamless integration into modern networks on the same scale as unicast. In Chapter 4 we will focus on overall system integration and see how the MSD and URS work together to achieve the tasks identified initially.

CHAPTER 4 BRINGING USABILITY AND SESSION DISCOVERY TOGETHER

In the previous two chapters, Chapter 2 and Chapter 3, we have seen two different issues facing IP multicast acceptance by an average end user: first, the ability to locate relevant multicast streams, and second, a convenient mechanism to remember, bookmark, and access a favorite stream in the future. Chapter 2 dealt with a structured proposal that allows an average user to locate a multicast stream along lines similar to keyword-based web searches. Chapter 3 proposed a mechanism that allows multicast streams to be assigned mnemonic names just like web-page URLs and domain names. The use of mnemonics greatly improves the recallability of stream names compared to the unusable network IP addresses typically assigned to such streams.

Those two chapters dealt with the issues in isolation from each other. In this chapter we present the complete system architecture that merges the resources described in Chapters 2 and 3 into a seamless global system that improves the overall usability of multicast technology.

4.1 Revisiting Objectives

Before we delve deeper into the integration of search and the assignment of URLs to multicast sessions, let us revisit the goals with which this dissertation started. We want to develop a structured, globally scalable, distributed service architecture that allows end users to seamlessly search and discover desired multicast sessions and to bookmark sessions for later use in such a way that, even if the multicast parameters change later on, the user bookmarks remain valid. From a content host perspective, the system aims to minimize the latency between session creation and its discoverability by users. In Chapter 2 we discussed a tree DHT scheme, described how keyword routing is done within the DHT structure, and showed how such a scheme allows fast session searches. In Chapter 3 we described a scheme that leverages the DNS hierarchy in assigning URLs to multicast sessions. Combining the schemes described in Chapters 2 and 3 into a seamless global hierarchy achieves the overall major goals set out initially. The rest of this chapter is focused on describing the integration of the two solutions.

4.2 Integrating ‘mDNS’ DHT and URL Scheme

Now that we have looked at the MSD server and URS in isolation in Chapters 2 and 3, let us see how a combined ‘mDNS’ domain setup looks and analyze the overall global hierarchy that emerges.

4.2.1 A Complete Picture

Figure 4-1. Typical mDNS domain components

Figure 4-1 shows the typical components that make up an ‘mDNS’ domain. Each domain has a DNS server set up and one or more replicated URSs. If the URS is replicated for load balancing purposes, this is achieved via the DNS load-balancing feature.

The URS port number is assumed to be fixed, well known, and possibly an IANA-assigned value. The domain can also have one or more MSD servers running; other MSD servers, if present, are there for fault tolerance. One out of the possible MSDs is selected as the designated MSD server of that domain. Communication among MSD servers running in the same domain depends on the kind of multicast supported in the network. If ASM mode is supported, then intra-domain MSD servers communicate over the MSD-LOCAL-MCAST administratively scoped channel. This channel is also assumed to be well known and possibly IANA assigned. If only SSM multicast mode is supported, the communication among intra-domain MSD servers reverts to unicast to the URS as the send channel, relayed back over an SSM channel to all MSD servers in that domain. The channel would then become (URS-IP, MSD-LOCAL-MCAST) [using (S,G) notation].

As mentioned in Chapter 3, the URS acts as a bootstrapping mechanism for MSD servers. The system administrator needs to configure PMCAST, CMCAST, the network’s IGMP support, and the parent domain’s URL at the time of URS startup. PMCAST is the globally scoped multicast group on which this domain receives communication from its parent domain. If ASM mode is supported, then any communication to the parent can also be sent using this channel; otherwise the upstream communication must be done through unicast. The PMCAST value of a particular domain is the same as the CMCAST value in the parent domain.

CMCAST is the globally scoped multicast group over which a domain sends communication to its children domains. If ASM mode is supported, then the child nodes can communicate back to this domain over the same group; otherwise they must use unicast to communicate upstream. The CMCAST value of a particular domain is the same as the PMCAST value in any of its child domains.

Apart from the hard-coded configuration parameters, the URS also maintains several soft-state parameters. Important among them are -

• IP address of the parent domain’s MSDd server
• IP address of the MSDd server in the self-domain

These learned values are updated as the situation warrants, and they help in proper fault recovery and other functioning of the system.

4.2.2 System Setup in Various Network Environments

Depending on the nature of the multicast supported in the network, the ‘mDNS’ global hierarchy takes on different configurations. Figure 4-2 shows the communication overlay in the ASM network scenario. Since the CMCAST channel of a parent domain is the same as the PMCAST channel of its child domains, and communication is allowed along both parent-to-child and child-to-parent paths, a parent and all its children domains join a common multicast channel for communicating with each other.

Figure 4-2. A typical mDNS hierarchy in an ASM network

The ‘mDNS’ structure is capable of operating in a mixed multicast environment as well. A network domain that supports both the ASM and SSM multicast modes of operation, supports both (S, G) and (*, G) joins, and deploys any required supporting protocols such as MSDP and RP-discovery can act as glue between networks that support only the SSM or only the ASM mode of operation. Such a network that can act as glue is said to be operating in a hybrid environment. Figure 4-3 shows a scenario where two hybrid multicast networks connect disparate multicast networks.

Figure 4-3. An mDNS hierarchy in mixed network operation mode

The domain’s URS helps decide which multicast mode the MSD server will operate in. The inclusion of the parent domain’s URL string allows the URS to contact the parent domain’s URS and get the relevant network support information. In cases where communication between parent and child is not possible using multicast, a unicast link is set up between the two domains after a preset communication timeout (soft-state refresh). Thus in scenarios where no hybrid network type exists and there is no consistent network support for multicast, the communication hierarchy degenerates gracefully to unicast links between parent and children domains. Let us now see in some detail how caching is used in ‘mDNS’.

4.3 Use of Caching

The delay incurred in resolving the target MSD connection details before sending a ‘redirect’ to the search client can be significantly reduced if the target MSD information for a popular keyword is cached locally. Let us justify why MSD connection details are suitable for caching.

In mDNS, once the hash-space allocation and hash-routing construction phase stabilizes, the MSD connection details become stable as well. Unless many domains join and leave the mDNS hierarchy in an arbitrary fashion, the hierarchy as well as hash space allotment remains stable. One way a target MSD may change even if the hierarchy itself is stable, is if the designated MSD server fails. In this case if a backup MSD server is running, it will soon become the designated MSD server (after a fresh leader election) and thus the IP address will change. But we expect such cases to be very rare. These arguments make MSD connection details an excellent candidate for caching.

With caches in place, when an end-user requests a keyword search for multicast sessions, the domain-local MSD server checks the cache. If there is a cache hit, it immediately sends the cached connection details for the target MSD server to the requesting end-user. The end-user tries to connect to the remote target MSD server; if this succeeds, the delay incurred is reduced significantly. If it fails, most likely because the connection information in the target domain has changed (due to primary MSD server failure), or because the target domain is no longer responsible for the keyword due to more recent hash space reassignments (likely caused by network topology changes), the end-user prompts the domain-local MSD server to invalidate the stale entry. The original two-pass protocol is then used, which refreshes the stale entry, and the process continues from there.

Table 4-1. Typical cache structure
keyword    access time       freq   score       ip:port
gators     1249607331102     534    247.00424   abc:q
football   1249607331102     61     57.804245   def:w
nfl        1249607377712     712    318.67035   ghi:e
beach      1249607339173     11     37.884953   jkl:r

Caching Strategy

In order to maximize the benefits of the Least Recently Used (LRU) [65][66] and Least Frequently Used (LFU) [65] caching strategies, we used a hybrid caching strategy. Table 4-1 shows typical cache entries under our hybrid strategy.

Assume the current system time is 1249607590678 and the timeout value is 3600000 milliseconds (1 hour). The table entries above correspond to values calculated using α = 0.4. The score component for any cache entry is computed as

$$\text{score} = \alpha \times (\text{freq}) + (1 - \alpha) \times \left[\frac{\text{timeout} - (t_{\text{curr}} - t_{\text{last-access}})}{60000}\right], \qquad 0 \le \alpha \le 1$$

If α equals 1 then the cache operates under the LFU scheme, and if α is 0, the cache operates as an LRU cache. The cache entry with the lowest score is selected for replacement (if needed). In the above formula, timeout is a configurable parameter that determines the soft timeout for any cache entry, t_last-access is the time when the cache entry was last accessed or updated (not counting the current access that modifies the entry), and t_curr is the current system time when the entry is being accessed. After computing the score, t_last-access is replaced by the t_curr value and the freq value is updated. The use of caches should reduce the query time significantly compared to a situation where no cache is used.
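A minimal sketch of this scoring and replacement policy follows; the cache capacity, the clock source, and the data layout are illustrative assumptions, while α and the timeout match the example above.

import time

ALPHA = 0.4                 # 1.0 -> pure LFU, 0.0 -> pure LRU
TIMEOUT_MS = 3_600_000      # soft timeout per cache entry (1 hour)

class RedirectionCache:
    """Sketch of the hybrid LFU/LRU cache scoring described above."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = {}       # keyword -> [last_access_ms, freq, score, addr]

    def _now(self):
        return int(time.time() * 1000)

    def _score(self, last_access, freq, now):
        return ALPHA * freq + (1 - ALPHA) * (TIMEOUT_MS - (now - last_access)) / 60000.0

    def get(self, keyword):
        now = self._now()
        e = self.entries.get(keyword)
        if e is None:
            return None
        e[2] = self._score(e[0], e[1], now)   # score uses the previous access time
        e[0], e[1] = now, e[1] + 1            # then last access and freq are updated
        return e[3]

    def put(self, keyword, addr):
        if keyword not in self.entries and len(self.entries) >= self.capacity:
            victim = min(self.entries, key=lambda k: self.entries[k][2])
            del self.entries[victim]          # evict the lowest-scoring entry
        now = self._now()
        self.entries[keyword] = [now, 1, self._score(now, 1, now), addr]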

Now that we have seen the global system integration in some detail, let us understand how a domain-specific search is supported in ‘mDNS’. Later we will find out in what situations the ‘mDNS’ services to an end user may deteriorate or even fail completely.

4.4 Domain Specific Search

We talked about session searches based on keywords and geographical constraints in Chapter 2. It was mentioned earlier that ‘mDNS’ supports domain-specific search as well. A domain-specific search is used to locate sessions that are hosted in, or originate from within, a desired domain. A domain-specific search query can come from users both external and internal to the domain, and this can easily be inferred by the search subsystem using the remote IP address together with its own network mask and IP address. If a query comes from an external client, only candidate sessions whose scope is global are returned. In the case of a local query, both global and administratively scoped candidate sessions can be returned.

A user must specify which domain to search using the ‘mDNS’ URL string. The domain URL is first resolved using the DNS and URS name resolution algorithm described in Chapter 3. Once the URL has been resolved, the client can query the URS of the remote domain and get the remote MSDd connection details. It can then query the MSD directly, providing the search string, and wait for any results to be returned.
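Put together, a domain-specific search could be sketched as below; fetch_msd_addr and search_msd stand in for the URS and MSD protocol exchanges, which are assumptions here rather than the actual message formats.

import socket

def domain_specific_search(target_domain_url, query, fetch_msd_addr, search_msd):
    """Sketch: resolve the target domain's URS via DNS, ask it for the MSDd
    connection details, then send the search string straight to that MSD."""
    urs_ip = socket.gethostbyname(target_domain_url)   # e.g. mcast.ufl.edu
    msd_ip, msd_port = fetch_msd_addr(urs_ip)          # URS returns the MSDd endpoint
    # the remote MSD searches only its Local Session Records database and, for an
    # external client, returns only globally scoped candidate sessions
    return search_msd(msd_ip, msd_port, query)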

4.5 Managing Faults

MSD servers, the URS, and the DNS infrastructure all play significant roles in the overall operation of session discovery and usability. A failure in any component can lead to degradation in service as perceived by the end users. Let us now take a closer look at the kinds of failures and their impact on the overall quality of service.

4.5.1 Failure in Portions of DNS Infrastructure

DNS infrastructure plays an important role in ‘mDNS’ URL resolution. Although DNS is a distributed service with redundancy built in at higher layers, let us consider failure scenarios and their implications now.

• scenario 1 - failure at the root level: root level DNS servers are highly replicated for load balancing and redundancy reasons, and since the TLDs are very frequently visited, it is very likely that the TLD DNS server connectivity details are cached at lower level DNS servers. In most name resolutions the root servers are bypassed entirely. Failure at the root level should not impact the ‘mDNS’ resolution process terribly.

• scenario 2 - failure at the TLDs: failure in the TLD could cause problems in the name resolution process unless the authoritative DNS details are cached at lower level DNS servers, in which case the TLD can be bypassed. The caching at DNS servers depends on the frequency of visits to a particular domain; typically a DNS cache entry is set to expire in 48 hours. Therefore, unless a particular domain is visited often, the entry will not be present in the local DNS server and the ‘mDNS’ name resolution process will fail.

• scenario 3 - failure along the resolution path: failure of a DNS server along the resolution path would also disrupt the name resolution process. But many domains maintain primary and secondary DNS details, so an alternate resolution path can typically be used. If the paths converge before the failed link in the resolution chain, the overall resolution process will fail as well.

• scenario 4 - failure of the authoritative DNS server: this will most likely lead to failure, as the IP mapping of the URS is maintained as an A record at the authoritative DNS server.

DNS failures are generally very rare. In the past there have been some TLD poisoning attacks, but they were largely ineffective because of caching and the replication of the TLD DNS infrastructure.

4.5.2 Failure of URS

The immediate implication of a URS failure at a particular domain is that all the session ‘mDNS’ URLs registered at that domain's URS become inaccessible. The resolution process, which should ideally resolve to the multicast session parameters so that the end user can start accessing the multicast stream, can no longer complete. Such a failure, however, does not affect resolution of multicast sessions registered in any other ‘mDNS’ domain. URS failures can be tackled easily through replication, which can also serve as a load balancing strategy by using the IP rotation feature of the administrative DNS server.

URS failure can also affect normal ‘mDNS’ functionality in another way. The URS maintains the parent domain's URL string and is therefore able to query the parent's URS for details such as the IP address and port of the MSDd server in the parent's domain. These details might be needed if the MSDd server is unable to receive communication from the parent domain over multicast and, as a fall-back option, tries to switch over to an IP unicast channel with the parent domain. If the domain's URS fails, and the MSDd fails after the URS failure, the backup MSD server that assumes the role of the MSDd will not be able to establish a unicast channel with the parent domain. The new MSDd server would then soft-state timeout and initiate the ‘mDNS’ domain failure algorithm to find a suitable grafting location with some ancestor higher up in the hierarchy tree.

4.5.3 Failure of MSD Server

Failure of an MSD server can impact normal ‘mDNS’ operation if there are no backup MSD servers to take up the responsibility of the failed MSDd server. If that happens, all children domains will soft timeout and initiate the domain failure recovery algorithm to find a suitable grafting point at some ancestor domain higher up in the hierarchy. Failure of an MSDd server does not affect ‘mDNS’ URL resolution as long as the domain's URS is operational. The session records stored in the MSD database will become inaccessible, but globally scoped multicast session records will likely remain searchable, since a shadow copy is saved at another location in the overall hierarchy. All administratively scoped multicast session records, however, become unsearchable for end users within the failed ‘mDNS’ domain.

4.6 Goals Achieved

In Chapter 2 we gave a list of design goals that the service architecture proposed in this dissertation intends to achieve. Let us see how far these have been achieved.

4.6.1 Global Scalability and Distributed Design

Many of the earlier proposals for multicast session discovery, including ‘sdr’, had a flat structure. The session record details were propagated to every sdr client active in the Internet. This worked fine when the overall number of sessions was small. If multicast is to gain the same level of acceptance as unicast, the number of multicast sessions will increase exponentially, in which case sdr and similar proposals will fail to scale. The ‘mDNS’ proposal uses a distributed tree DHT structure to distribute the session records over a small number of nodes that can be queried for the desired multicast records. The architecture scales well and has a low (communication) maintenance overhead.

4.6.2 Existence in Present Network Environment

Network components are expensive and extremely difficult to upgrade. Network administrators generally do not want to change the core routing infrastructure, as it is a cumbersome task. Any new proposal that aims to be deployed quickly must be able to work in the existing network environment; changes in the network stack are especially difficult to effect. The ‘mDNS’ architecture is an overlay structure implemented entirely in the application layer. The proposal requires no change in the existing network stack and needs no network hardware upgrades.

4.6.3 Real Time Session Discoverability

sdr and several similar proposals, because of the SAP/SDP bandwidth limitations, had a very slow rate of session record propagation to remote sdr clients. From a content provider's perspective, this could be frustrating. In the proposed architecture, the session details are routed over the DHT hierarchy immediately, making them searchable and thus discoverable by end users right away. This service feature is critical for sessions that are extremely transient in nature. ‘mDNS’ achieves this requirement.

4.6.4 Ability to Perform a Multi-Parameter Search

Every multicast session in the proposed scheme can be tagged with up to 10 descriptive keywords and can additionally be geo-tagged. The proposed architecture allows an end user to perform a multi-parameter search and supports all major boolean operators for combining search parameters. The scheme also allows users to narrow down search results based on their geographical preferences.

4.6.5 Fairness in Workload Distribution

The proposed architecture is a collaboration among several independent administrative domains. In a collaborative environment, it becomes important to distribute responsibilities evenly. ‘mDNS’ achieves fair hash-space allotment to participating domains, and the division is periodically updated to reflect changes in the global topology. We must concede that equitable hash space distribution does not guarantee fair workload distribution: skewed popularity of some keywords over others increases the database load on the domain assigned the hash space to which a popular keyword routes. Further, the communication overhead increases as one goes up the tree. Regardless, workload distribution was kept as a goal during the design phase.

4.6.6 Plug-n-Play Design With Low System Administrator Overhead

In ‘mDNS’, the system administrator of a domain needs to set only a few parameters in the URS. The other components do not need any administrator involvement. The ‘mDNS’ hierarchy is self-adaptive to changes in topology, detects failures, and is equipped to recover from intermittent failures on its own. Reducing system administrators' involvement in the management of the global hierarchy was a major design goal.

4.6.7 Partial and Phased Deployment

A new architecture cannot be expected to be universally deployed over a short time. The success of the proposal depends on the value added to the Internet even when deployed at a very small scale. As user demand grows, larger deployments will happen, and they should seamlessly integrate with the existing deployed infrastructure. ‘mDNS’ can be deployed in phases. A stand-alone deployment in a domain provides search and bookmark-ability of sessions in that domain, as well as domain-specific search capability to global users. With gradual network-wide deployment, ‘mDNS’ domains can link up seamlessly and manage the DHT on their own.

4.6.8 Self Management

Network topology can change gradually and sometimes rapidly. If an architecture is capable of self-management and resource realignment in the face of infrastructure changes, user load can be redistributed and the user experience will degrade or improve gradually, giving a sense of service stability. The ‘mDNS’ architecture uses soft-state protocols to keep track of changes in network topology.

4.6.9 Multicast Mode Independence

Multicast is currently in a transitional state, migrating from ASM toward SSM mode, so any system using multicast for optimized communication must be able to operate in both ASM and SSM networks. ‘mDNS’ has that capability: depending on the underlying network type, it subscribes to the appropriate groups. If communication over multicast is not possible at all, the components are capable of switching to IP unicast in order to send and receive important messages.

The ‘mDNS’ architectural outline and specifics provided in this dissertation achieve most of the goals that were identified early on. Let us now assess how well those goals and services have been met and identify shortcomings and areas of improvement in the architecture.

4.7 Looking Back - High Level Assessment of the ‘mDNS’ Service Framework

One of the design goals of ‘mDNS’ was equitable workload distribution. In the proposal presented in this dissertation, we have achieved, within acceptable limits, equal hash space division among participating domains. We recognize, however, that the actual data stored at each domain will be skewed depending on the popularity of the keywords used to tag sessions. The communication load is also not equal: nodes close to the root of the DHT have to process more routing messages. The DHT soft-state maintenance algorithm requires some control messages to be sent between parent and children domains at periodic intervals; this overhead is not a function of the level in the tree but of the fan-out factor at that node. Once the ‘mDNS’ structure stabilizes, the use of redirect and indirection caches at every MSD should greatly reduce the routing burden at nodes closer to the root of the DHT hierarchy.

Reliability in the face of domain failures could be improved. Storing a shadow session record at a different location alleviates the issue somewhat, but a more robust domain failure safeguard algorithm could be designed. The authentication and security aspects of inter-domain communication, especially validation and verification of control messages, still have to be worked out. However, with the transition from ASM to SSM, some of the more flagrant security issues in IP multicast, such as spurious cross traffic and an unhindered sender policy in which a sender need not be a member of the multicast group it sends data to, should take care of themselves.

4.8 Conclusion

This chapter described the integration of the DHT hierarchy and the URS based URL scheme into the larger picture of the ‘mDNS’ architecture. It described how ‘mDNS’ is capable of coexisting in both ASM and SSM multicast environments. Various failure scenarios were visited and analyzed in some detail. Toward the end, the chapter revisited the design goals listed in Chapter 2 and argued how the overall architecture achieves those goals. We also briefly described some areas of improvement in the proposed architecture.

CHAPTER 5
ARCHITECTURE VALIDATION: SIMULATION AND ANALYSIS

5.1 Introduction

In this chapter we present some experiments that we performed with the ‘mDNS’ architecture. We describe our simulation strategy and justify our choice of that strategy. Detailed simulation results are presented both in raw data format and as graphical interpretations of the data. We then analyze and compare the presented scheme with other DHT schemes along relevant lines. Message complexity and workload analysis at various levels in the DHT tree are also presented.

5.2 Simulation Environment and Strategy Description

In order to test various system parameters and performance benchmarks, we developed a simulation strategy that allowed us to run multiple instances of the ‘mDNS’ software in a simulated domain hierarchy on a single host machine. We developed a simulator application comprising a virtual DNS server implementation and the interfacing of that virtual DNS with the actual MSD and URS implementations of the ‘mDNS’ components. The unmanned simulation controller developed for this purpose took the domain startup order sequence and the delay value list. It also took the starting and ending values for the α and β parameters that govern the DHT stability algorithm described in Chapter 2. Figure 5-1 shows a screenshot of the auto simulator program. The auto simulator program started the required number of virtual DNS applications with appropriate configuration parameters and started the URS and MSD server for each virtual DNS server instance, thus creating the desired number of simulated ‘mDNS’ domains. The virtual DNS server parameters were set so as to link the domains according to the simulation domain hierarchy scheme. The virtual DNS software was capable of domain URL translation in an iterative manner. It also supported basic protocol handling that allowed other programs to query certain simulation parameters over TCP/IP sockets.

Figure 5-1. Screenshot - mDNS auto simulator program

The host environment for running the simulation was a Windows machine with the following configuration:

CPU: Intel Core2 Quad Q6600 @ 2.40 GHz
Memory: 5120 MB
OS: Windows 7 Professional 64 bit

We also developed a session discovery latency measurement tool that allowed us to measure the latency between session registration and its discovery by end users. The tool allowed us to start up to 10 connected virtual domains and took a list of keys as an input file. It measured the latency between session registration and discovery times in milliseconds. Figure 5-2 shows a screenshot of the tool.

Figure 5-2. Screenshot - mDNS latency measurement tool

5.2.1 Starting the Simulation

Using multiple instances of the DNS simulator and setting appropriate values for PMCAST, CMCAST, and MSD-LOCAL-MCAST allows us to simulate a particular domain. By using pointers to the parent DNS and children domains' DNS servers in the simulation, we were able to simulate a connected hierarchical arrangement of network domains.

We then ran URS servers on the port numbers matching the corresponding entries in the DNS simulators. We started our MSD servers by pointing them to the ports on which our DNS simulators were running; this enables the MSD servers to find out about the URS servers, which they then query to locate the details of the PMCAST, CMCAST, and MSD-LOCAL-MCAST channels. From here the MSD servers and URS servers execute exactly as if they were installed and running in a real network domain. An alternate approach would have been to use multiple instances of virtual machines (VMs) to simulate each domain. That approach is highly taxing in terms of host machine resources; each VM consumes significant resources, and hence one is limited to running only a few instances on any given machine. Using our strategy, we are able to simulate 10 - 15 domains on our test machine without incurring a significant performance penalty.

5.2.2 Validity

Since the type of simulation data we seek to measure, e.g., the number of domain hops, the number of keyword routing table updates, etc., does not depend on the finer details of a network simulation such as the physical layer, data links, and lower level interconnections, our simulation strategy does not invalidate our data. For this reason, and because our software is implemented as an overlay in the application layer, we do not need a full featured network simulator such as ns2 [67]. The multicast session directory (MSD) software was modified to simulate inter-domain network latency for the session discovery latency measurement experiments; inter-domain latencies were set randomly between 25 ms and 100 ms.

5.2.3 Simulation Domain Hierarchy Setup

We performed our simulation with three different types of domain hierarchy to see the effect of network topology on optimal component parameters. Figure 5-3 shows the three domain topologies used in the simulation data collection.

Figure 5-3. Various network topologies chosen for simulation

The figure above shows scenario 1, a somewhat balanced domain arrangement in the hierarchy. Scenarios 2 and 3 show the two extremes of the domain linkage scheme: scenario 2 is the two-level case where there is only one parent domain (a flat arrangement, a tree of height two), and scenario 3 is the other extreme where all the domains are linked in linear order (a tree of height 10). In the figure, the direction of an arrow shows the relationship “is a child of”, e.g., A → B means A is a child of B.

For all three scenarios we configured the simulation controller to start the virtual domains according to the permutation [10, 4, 5, 6, 1, 2, 7, 8, 9, 3] with inter-domain delay values [5, 5, 5, 10, 30, 600, 5, 5, 300, 30]. Each entry of the permutation names the next domain to start, and that domain's number indexes into the delay list to find how long to wait before starting the following domain. The simulation controller therefore acts as follows: it first starts virtual domain 10, looks at the 10th entry in the delay list, finds the value 30, waits for 30 seconds before starting virtual domain 4, and so on, as sketched below. Another set of values used in our simulation was the domain startup permutation [10, 1, 4, 5, 2, 3, 6, 7, 8, 9] with delay values [5, 5, 5, 5, 5, 540, 5, 5, 5, 5].
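For clarity, here is a minimal sketch of that start-up sequencing. The function start_virtual_domain is a hypothetical placeholder for launching one simulated domain's virtual DNS, URS, and MSD processes; the permutation and delay values are the experiment values quoted above (for a quick dry run, shrink the delays).

import time

def run_startup_sequence(permutation, delays, start_virtual_domain):
    for i, domain in enumerate(permutation):
        start_virtual_domain(domain)
        if i < len(permutation) - 1:          # no wait after the last domain
            time.sleep(delays[domain - 1])    # delay chosen by the domain number (1-based)

permutation = [10, 4, 5, 6, 1, 2, 7, 8, 9, 3]
delays = [5, 5, 5, 10, 30, 600, 5, 5, 300, 30]
run_startup_sequence(permutation, delays, lambda d: print(f"starting domain {d}"))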

5.3 Simulation Results

Table 5-1 shows a partial list of measured system parameter values for the scenario 1 hierarchy using the domain startup permutation list [10, 4, 5, 6, 1, 2, 7, 8, 9, 3] and inter-domain startup delay values [5, 5, 5, 10, 30, 600, 5, 5, 300, 30]. Each row shows the β and α values used for the simulation run, together with the values of several measured parameters, such as the average hash space assignment skew among domains, the average control bandwidth usage per participating domain in the overall hierarchy, the average number of routing table switches per domain before the routing system stabilized, the average time it took for the overall hierarchy to stabilize, and the overall hash-skew represented as a percentage value. The table presents only a partial list of values; the complete list can be accessed from the project site [68]. Each experiment was run three times and the values represent the averages across these runs. The tables also give the standard deviation values across these runs.

Figure 5-4. Average hash skew - scenario 1
Figure 5-5. Skew standard deviation - scenario 1

We collected the total number of routing table updates that any domain underwent before stabilizing, the time taken for routes to be stabilized, and measured the hash-space assignment ‘skew’ and control bandwidth used up among the participating domains. Hash ‘skew’ for each domain is measured using this formula:

\mathrm{Hash}_{skew} = \left| \mathrm{Hash}^{frac}_{assigned} - \frac{1}{P_{domains}} \right|
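Reading the formula as the absolute difference between a domain's assigned fraction of the hash space and the ideal equal share 1/P, with P the number of participating domains, the metric can be illustrated as follows. This reading is an interpretation of the reconstructed formula above, not code from the simulator, and the example values are illustrative.

def hash_skew(assigned_fraction: float, num_domains: int) -> float:
    # absolute deviation from the ideal equal share 1/P
    return abs(assigned_fraction - 1.0 / num_domains)

# Example: with 10 domains the ideal share is 0.1; a domain holding 13% of the
# hash space has a skew of about 0.03, the magnitude seen in the SKEW column.
print(hash_skew(0.13, 10))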

Table 5-1. Partial simulation data for scenario 1 hierarchy for permutation list [10, 4, 5, 6, 1, 2, 7, 8, 9, 3]

BETA  ALPHA  SKEW  ST-DEV  CBW    ST-DEV  R-SWITCH  ST-DEV  ST-TIME  ST-DEV  SCORE  ST-DEV
0.1   0.1    0.03  0       8.446  0       5.2       0       649.70   0       6.449  0
0.3   0.1    0.03  0       8.443  0.001   5.3       0       649.70   0       6.499  0
0.3   0.3    0.04  0       8.424  0.003   3.4       0       495.97   0.153   4.98   0.001
0.5   0.1    0.03  0       8.444  0       5.3       0       649.77   0.058   6.499  0
0.5   0.3    0.03  0       8.446  0       4.267     0.058   496.53   1.012   5.216  0.024
0.5   0.5    0.03  0       8.446  0.001   4.2       0       496.00   0       5.18   0
0.7   0.1    0.03  0       8.447  0.001   4.5       0       459.63   0.115   5.148  0.001
0.7   0.3    0.04  0       8.425  0       3.4       0       496.03   0.058   4.98   0
0.7   0.5    0.03  0       8.441  0.01    4.3       0       460.23   0.924   5.051  0.005
0.7   0.7    0.04  0       8.433  0.001   2.967     0.058   459.67   0.115   4.582  0.028
0.9   0.1    0.03  0       8.446  0.003   4.5       0       459.33   0.635   5.147  0.003
0.9   0.3    0.04  0       8.425  0       3.4       0       496.03   0.058   4.98   0
0.9   0.5    0.03  0       8.448  0.001   4.3       0       459.67   0.058   5.048  0
0.9   0.7    0.04  0       8.434  0.001   3         0       459.67   0.058   4.598  0
0.9   0.9    0.04  0       8.434  0       3         0       459.70   0       4.599  0
1.1   0.1    0.03  0       8.433  0.012   4.9       0       687.37   1.935   6.487  0.01
1.1   0.3    0.04  0       8.422  0.001   3.9       0       499.03   0.058   5.245  0
1.1   0.5    0.04  0       8.424  0.001   3.4       0       498.10   0.872   4.991  0.004
1.1   0.7    0.04  0       8.425  0       3.3       0       496.03   0.058   4.93   0
1.1   0.9    0.04  0       8.423  0.002   3.3       0       497.00   0.866   4.935  0.004
1.1   1.1    0.04  0       8.424  0.001   3.1       0       496.57   0.896   4.833  0.004
1.3   0.1    0.03  0       8.44   0       4.9       0       686.33   0.153   6.482  0.001
1.3   0.3    0.04  0       8.422  0.001   3.9       0       499.07   0.058   5.245  0
1.3   0.5    0.04  0       8.425  0.001   3.4       0       497.60   0.173   4.988  0.001
1.3   0.7    0.04  0       8.425  0       3.3       0       497.03   0.723   4.935  0.004
1.3   0.9    0.04  0       8.424  0       3.3       0       496.03   0.058   4.93   0
1.3   1.1    0.04  0       8.425  0       3.1       0       496.07   0.058   4.83   0
1.3   1.3    0.04  0       8.425  0       3.1       0       496.03   0.058   4.83   0
1.5   0.1    0.03  0       8.43   0.018   4.867     0.058   687.70   2.598   6.472  0.016
1.5   0.3    0.04  0       8.422  0       3.9       0       499.07   0.058   5.245  0

Figure 5-6. Average control bandwidth - scenario 1
Figure 5-7. Control bandwidth standard deviation - scenario 1

Figure 5-8. Average route switches - scenario 1
Figure 5-9. Route switch standard deviation - scenario 1

Figure 5-4 shows the average hash-skew plot per domain for simulation hierarchy type 1. Figure 5-5 shows the standard deviation in the hash skew values computed over three experimental runs. Figure 5-6 shows the average control bandwidth usage per domain in the network to maintain the domain hierarchy for simulation hierarchy type 1. Figure 5-7 shows the standard deviation in the control bandwidth used per domain for three runs of the simulation. Figure 5-8 shows the average routing table switches per domain for different values of α and β for hierarchy type 1. Figure 5-9 shows the standard deviation in routing switches for different values of α and β among three experimental runs for hierarchy 1. Figure 5-10 shows the routing table stabilization time for different values of α and β for the same domain hierarchy structure.

Figure 5-10. Average route stabilization time - scenario 1
Figure 5-11. Route stabilization time standard deviation - scenario 1

Table 5-2 shows the partial data values for experiments done on domain hierarchy setup type 2, with domain starting order permutation list [10, 4, 5, 6, 1, 2, 7, 8, 9, 3] and inter-domain startup delay values [5, 5, 5, 10, 30, 600, 5, 5, 300, 30]. Table 5-3 shows the partial data values for domain hierarchy scenario 3 with the same domain startup order and delay parameters as before.

Figure 5-12. Average hash skew - scenario 2
Figure 5-13. Skew standard deviation - scenario 2

Table 5-2. Partial simulation data for scenario 2 hierarchy for permutation list [10, 4, 5, 6, 1, 2, 7, 8, 9, 3]

BETA  ALPHA  SKEW   ST-DEV  CBW    ST-DEV  R-SWITCH  ST-DEV  ST-TIME  ST-DEV  SCORE  ST-DEV
0.1   0.1    0.030  0.0     7.864  0.026   2.933     0.058   512.40   0.000   4.629  0.029
0.3   0.1    0.030  0.0     7.879  0.000   3.000     0.000   512.40   0.000   4.662  0.000
0.3   0.3    0.035  0.0     7.893  0.022   2.300     0.000   327.93   0.115   3.490  0.001
0.5   0.1    0.030  0.0     7.866  0.024   3.000     0.000   512.37   0.058   4.662  0.000
0.5   0.3    0.035  0.0     7.904  0.004   2.433     0.321   345.23   29.849  3.643  0.309
0.5   0.5    0.035  0.0     7.906  0.000   2.300     0.000   327.97   0.058   3.490  0.000
0.7   0.1    0.030  0.0     7.874  0.005   2.933     0.058   512.37   0.058   4.628  0.029
0.7   0.3    0.035  0.0     7.893  0.022   2.300     0.000   327.93   0.058   3.490  0.000
0.7   0.5    0.035  0.0     7.889  0.024   2.233     0.058   328.00   0.000   3.457  0.029
0.7   0.7    0.035  0.0     7.889  0.024   2.233     0.058   328.00   0.000   3.457  0.029
0.9   0.1    0.030  0.0     7.879  0.000   3.000     0.000   512.43   0.058   4.662  0.000
0.9   0.3    0.035  0.0     7.906  0.000   2.300     0.000   327.97   0.058   3.490  0.000
0.9   0.5    0.035  0.0     7.893  0.022   2.300     0.000   328.00   0.000   3.490  0.000
0.9   0.7    0.035  0.0     7.879  0.018   2.267     0.058   328.00   0.000   3.473  0.029
0.9   0.9    0.035  0.0     7.904  0.004   2.267     0.058   327.97   0.058   3.473  0.029
1.1   0.1    0.030  0.0     7.877  0.003   2.967     0.058   512.40   0.000   4.645  0.029
1.1   0.3    0.035  0.0     7.893  0.022   2.300     0.000   328.00   0.000   3.490  0.000
1.1   0.5    0.035  0.0     7.906  0.000   2.300     0.000   328.00   0.000   3.490  0.000
1.1   0.7    0.035  0.0     7.904  0.003   2.267     0.058   328.00   0.000   3.473  0.029
1.1   0.9    0.035  0.0     7.903  0.005   2.267     0.058   327.97   0.058   3.473  0.029
1.1   1.1    0.035  0.0     7.906  0.000   2.300     0.000   328.00   0.000   3.490  0.000
1.3   0.1    0.030  0.0     7.868  0.023   3.100     0.000   514.00   0.000   4.720  0.000
1.3   0.3    0.030  0.0     7.879  0.003   2.967     0.058   512.37   0.058   4.645  0.029
1.3   0.5    0.030  0.0     7.854  0.023   3.000     0.000   512.43   0.058   4.662  0.000
1.3   0.7    0.030  0.0     7.851  0.053   3.000     0.000   512.30   0.173   4.661  0.001
1.3   0.9    0.030  0.0     7.872  0.009   2.900     0.100   511.60   0.985   4.608  0.054
1.3   1.1    0.030  0.0     7.877  0.004   2.933     0.058   512.40   0.000   4.629  0.029
1.3   1.3    0.030  0.0     7.904  0.003   2.400     0.000   603.23   0.058   4.816  0.000
1.5   0.1    0.030  0.0     7.866  0.021   3.067     0.058   514.00   0.000   4.703  0.029
1.5   0.3    0.030  0.0     7.861  0.020   3.000     0.000   515.87   6.004   4.679  0.030

Table 5-3. Partial simulation data for scenario 3 hierarchy for permutation list [10, 4, 5, 6, 1, 2, 7, 8, 9, 3]

BETA  ALPHA  SKEW  ST-DEV  CBW    ST-DEV  R-SWITCH  ST-DEV  ST-TIME  ST-DEV  SCORE   ST-DEV
0.1   0.1    0.03  0.0     7.450  0.055   9.000     0.200   1049.37  17.398  10.347  0.148
0.3   0.1    0.03  0.0     7.490  0.002   8.733     0.115   1047.97  38.352  10.207  0.224
0.3   0.3    0.03  0.0     7.531  0.012   7.700     0.100   1081.53  17.908  9.858   0.047
0.5   0.1    0.03  0.0     7.500  0.072   7.600     0.100   1084.17  68.490  9.821   0.362
0.5   0.3    0.03  0.0     7.512  0.088   7.667     0.115   1003.90  46.687  9.453   0.289
0.5   0.5    0.03  0.0     7.510  0.011   7.433     0.058   1042.93  34.845  9.531   0.185
0.7   0.1    0.04  0.0     7.453  0.018   7.400     0.100   1072.60  71.899  9.863   0.334
0.7   0.3    0.04  0.0     7.410  0.030   7.400     0.100   1038.43  32.102  9.692   0.148
0.7   0.5    0.04  0.0     7.423  0.049   7.267     0.115   993.37   67.946  9.400   0.397
0.7   0.7    0.04  0.0     7.468  0.068   6.500     0.000   1033.60  71.125  9.218   0.356
0.9   0.1    0.04  0.0     7.463  0.052   7.433     0.058   1036.37  28.491  9.699   0.170
0.9   0.3    0.04  0.0     7.453  0.020   7.400     0.100   1045.97  25.007  9.730   0.076
0.9   0.5    0.04  0.0     7.442  0.020   7.333     0.115   1045.27  39.495  9.693   0.236
0.9   0.7    0.04  0.0     7.477  0.077   6.600     0.100   1061.50  19.459  9.408   0.051
0.9   0.9    0.04  0.0     7.475  0.043   6.533     0.058   1047.37  28.625  9.304   0.136
1.1   0.1    0.16  0.0     7.560  0.049   3.767     0.058   528.17   9.292   7.724   0.075
1.1   0.3    0.16  0.0     7.586  0.045   3.833     0.115   527.43   5.316   7.754   0.041
1.1   0.5    0.16  0.0     7.542  0.058   3.733     0.058   533.03   5.862   7.732   0.030
1.1   0.7    0.16  0.0     7.578  0.034   3.733     0.058   522.43   6.924   7.679   0.015
1.1   0.9    0.16  0.0     7.596  0.034   3.733     0.058   527.53   14.858  7.704   0.051
1.1   1.1    0.18  0.0     7.581  0.053   1.367     0.058   264.57   1.973   5.606   0.039
1.3   0.1    0.16  0.0     7.535  0.054   3.833     0.058   523.27   9.372   7.733   0.064
1.3   0.3    0.16  0.0     7.571  0.022   3.767     0.115   525.27   1.242   7.710   0.055
1.3   0.5    0.16  0.0     7.555  0.082   3.767     0.058   534.97   2.219   7.758   0.030
1.3   0.7    0.16  0.0     7.600  0.072   3.733     0.058   524.07   11.324  7.687   0.081
1.3   0.9    0.16  0.0     7.559  0.080   3.767     0.058   532.33   9.911   7.745   0.059
1.3   1.1    0.18  0.0     7.669  0.040   1.367     0.058   265.13   2.836   5.609   0.037
1.3   1.3    0.18  0.0     7.633  0.040   1.333     0.058   267.90   3.936   5.606   0.044
1.5   0.1    0.16  0.0     7.567  0.044   3.767     0.058   541.23   11.039  7.790   0.084
1.5   0.3    0.16  0.0     7.558  0.020   3.800     0.100   528.87   4.600   7.744   0.072

Figure 5-14. Average control bandwidth - scenario 2
Figure 5-15. Control bandwidth standard deviation - scenario 2

Figure 5-16. Average route switches - scenario 2
Figure 5-17. Route switch standard deviation - scenario 2

Figure 5-12 shows the average hash-skew per domain for different values of α and β for simulation scenario 2. Similarly, Figure 5-14 shows the plot of average control bandwidth in bytes/second for simulation scenario 2, Figure 5-16 depicts the plot of average routing table switches, and Figure 5-18 shows the stabilization time (in seconds) for the routing flux to subside for simulation scenario 2. Figures 5-20, 5-22, 5-24, and 5-26 show the same figure types for simulation scenario 3. Our simulation runs for each type cover α and β values between 0.1 and 2.0 with step size 0.2, with α ≤ β.

Figure 5-18. Average route stabilization time - scenario 2
Figure 5-19. Route stabilization time standard deviation - scenario 2

Figure 5-20. Average hash skew - scenario 3
Figure 5-21. Skew standard deviation - scenario 3

5.3.1 Latency Experiment Results

We performed latency measurement experiments with 1 to 5 domains using the arrangement shown in hierarchy scenario 3 of Figure 5-3. The data values presented in Table 5-4 are in milliseconds. Figure 5-28 shows all the parameters represented as horizontal bars; the x-axis denotes the time in milliseconds. Figure 5-29 shows the maximum, minimum, and average latency values for experiments conducted with 1 through 5 domains; the x-axis shows the number of domains and the y-axis shows time in milliseconds.

Figure 5-22. Average control bandwidth - scenario 3
Figure 5-23. Control bandwidth standard deviation - scenario 3

Figure 5-24. Average route switches - scenario 3
Figure 5-25. Route switch standard deviation - scenario 3

Table 5-4. Latency measurements summary

Domains  Latency max  Latency min  Latency avg  Latency std-dev  Latency median
1        1060         503          548.64       113.81           508
2        4736         503          1198.08      994.92           865
3        4904         503          1315         1034.11          934
4        4937         503          1493.92      1181.18          980.5
5        4990         504          1713.22      1217             1088

Figure 5-26. Average route stabilization time - scenario 3
Figure 5-27. Route stabilization time standard deviation - scenario 3

Figure 5-28. Summary chart for latency experiments

Figures 5-30 and 5-31 show the median and the average latency values in milliseconds; the x-axis shows the number of domains. The significant jump in discovery latency from one domain to more is due to the ‘MSDPROBE’ and ‘REDIRECT’ protocol steps involved in domain-external searches, as compared to the domain-local search that applies when simulating with just one domain. In the experiments we performed, session registration was performed at a randomly chosen domain and search initiation was done immediately after registration, again at a randomly chosen domain/node. These two random selections were independent of each other.

Figure 5-29. Range chart for latency experiments

Figure 5-30. Median latency
Figure 5-31. Average latency

Results Interpretation

For easy interpretation of the results, we combined three measurements, namely the number of route updates, the time taken to stabilize, and the hash ‘skew’, using weights of 0.5, 0.3 and 0.2 to arrive at a weighted score; a lower weighted score represents better performance. Before the linear combination, each data value was normalized to fall in the same approximate range of magnitudes. This was done to prevent any one element from biasing the weighted score. For example, hash-skew was generally of magnitude 10^-2, stabilization time (in seconds) of magnitude 10^2, and the routing update count of the order of 10^1, so before computing the weighted score we multiplied the hash-skew by 100 to convert it into a percentage and divided the stabilization time by 60 to convert it into minutes.
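The weighting and normalization just described can be restated in a short sketch; it is an illustrative reconstruction of the scoring formula from the text, not the original analysis script.

def weighted_score(route_updates: float, stabilization_time_s: float, hash_skew: float) -> float:
    skew_pct = hash_skew * 100.0            # ~1e-2 magnitude -> percent
    stab_min = stabilization_time_s / 60.0  # seconds -> minutes
    return 0.5 * route_updates + 0.3 * stab_min + 0.2 * skew_pct

# Example: the first row of Table 5-1 (5.2 route switches, 649.70 s, skew 0.03)
print(weighted_score(5.2, 649.70, 0.03))    # approximately 6.449, matching the SCORE column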

Figure 5-32. Average of weighted scores - scenario 1
Figure 5-33. Standard deviation of weighted scores - scenario 1

Figure 5-34. Average of weighted scores - scenario 2
Figure 5-35. Standard deviation of weighted scores - scenario 2

Figure 5-36. Average of weighted scores - scenario 3
Figure 5-37. Standard deviation of weighted scores - scenario 3

Figure 5-32 shows the average weighted scaled scores for simulation scenario 1 for three experimental runs. The figure has been drawn using weights of 0.5 for routing table switches, 0.3 for route stabilization time, and 0.2 for hash-skew value. Figure 5-34 shows the average weighted scaled scores for scenario 2 and Figure 5-36 shows the averages of weighted scaled scores for simulation scenario 3. The x-axis represents β values from 0.1 to 2.0, y-axis represents α ranging from 0.1 to β, and the z-axis represents the scaled weighted score.

Looking at the weighted score plots, one can see that for the scenario 1 simulation setup the best performance is achieved if α, β ∈ [1.8 − 2.0] with α ≤ β. For scenario 2, the optimal system performance is achieved for α ∈ [0.4 − 1.0] and β ∈ [1.8 − 2.0], to report some of the values. For scenario 3 the system performed better with α, β ∈ [1.2 − 2.0] with α ≤ β. Considering the simulation results, it is clear that the choice of α and β depends on the network topology. A system administrator is free to choose values of his or her liking, although it is advisable to follow the common selection guidelines for the full hierarchy: to maintain global routing table stability, a relatively high value of β is suggested, and for routing table stability at the subtree level, a higher value of α is advised.

One drawback of choosing high values for α and β is the possibility of somewhat uneven hash-space assignments across participating nodes. Hence, in order to maintain the balance between workload and routing stability, a compromise value of α and β must be found. It is important to clarify that routing instability does not mean the routing tables will be unstable forever. Addition or removal of a domain to or from the hierarchy results in hash space reassignments, which lead to routing table updates; once the hierarchy becomes stable, the routing tables become stable as well. Still, it is desirable to reduce this period of instability while the whole hierarchy is reorganizing, as it can have an adverse impact on the quality of service.

The session discovery latency experiments demonstrate the strength of our proposal. Compared to previous approaches [46][44][40], where session discovery by interested receivers could take anywhere from a few minutes to a few hours (ignoring other troubles in such schemes), the latency in our scheme is on the order of milliseconds to a few seconds. These results are for a keyword usage pattern that rendered caching ineffective; the big jump in the median and average latency values is due to the extra protocol delay incurred by the MSDPROBE and REDIRECT steps involved in inter-domain searches in the cache miss situation. With the effects of caching kicking in, the average query latency will come down significantly.

5.4 Qualitative Analysis and Comparison

In this section, let us look at a qualitative complexity analysis of some of the components used in the ‘mDNS’ system.

5.4.1 Geo-Tagged Database - Complexity Analysis

Let us begin by analyzing the search complexity of the mDNS scheme, especially when the “geo-tagged” database is used. This complexity depends on many factors, such as the sparse matrix representation format for the planetary grid, the branching factor “k” of each grid location as one traverses down the tree rooted at that position, the desired search radius “r”, and the database areal resolution parameter “d”, which along with “k” determines the tree depth. Where a session originates is generally not known a priori, so we implemented the sparse matrix as an array of linked lists. Given the geographic coordinates, the row index is computed from the integral part of the latitude degree in O(1) time. Since the linked list of occupied longitudes for a given latitude can contain at most 360 entries, a linear traversal to find the correct longitude location in the list also takes O(1) time.

The grid's vertical (north-south) resolution at tree depth n is

\frac{110.9}{k^{n}} \ \text{km}    (5–1)

and its horizontal (east-west) resolution at latitude φ is

\frac{\pi}{180^{\circ}} \times \cos\phi \times \sqrt{\frac{a^{4}\cos^{2}\phi + b^{4}\sin^{2}\phi}{(a\cos\phi)^{2} + (b\sin\phi)^{2}}} \times \frac{1}{k^{n}} \ \text{km}    (5–2)

Using equations (5–1) and (5–2) and the specified values of “k” and “d”, the maximum depth “h” of the tree rooted at the linked-list entry representing a (latitude, longitude) pair can be computed from d ≥ 110.9 / k^h, i.e., h = ⌈log_k(110.9 / d)⌉. Because session references are maintained only at the leaf node linked-lists, one must traverse the complete tree height regardless of the search radius specified. The depth h' at which the grid's vertical resolution is at least the search diameter 2 × r is h' = ⌊log_k(110.9 / (2 × r))⌋. At this depth, the grid's horizontal resolution can be computed from equation (5–2) by replacing “n” with h'. Because the horizontal east-west distance decreases as one moves toward the poles, the number of sub-grids N that must be traversed laterally in the east-west direction in our sparse-matrix representation is

N = \left\lceil \frac{2r}{\frac{\pi}{180^{\circ}} \times \cos\phi \times \sqrt{\frac{a^{4}\cos^{2}\phi + b^{4}\sin^{2}\phi}{(a\cos\phi)^{2} + (b\sin\phi)^{2}}} \times \frac{1}{k^{h'}}} \right\rceil    (5–3)

Hence the number of leaf-node linked lists that must be traversed to find candidate session records is N_leafgrids = N × k^(h−h').

The actual geo-search complexity depends on the length of the linked lists, O(list), rooted at the tree leaves in our sparse-matrix representation. Hence the search complexity can be approximated by

C \times \left( N \times k^{\lceil \log_{k}(110.9/d) \rceil - \lfloor \log_{k}(110.9/(2r)) \rfloor} \times O(\text{list}) \right)    (5–4)

where C is a constant that varies between 1 and 4 depending on the proximity of the search criterion (read: coordinates) to the grid edges or corners of the target quadrant at tree height h'. The search complexity could be reduced greatly by replacing the leaf linked-lists with hash tables using perfect hashing functions.
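To show how these quantities interact, the following is an illustrative numeric sketch of equations (5–1) through (5–3) under assumed WGS-84 semi-axes (a = 6378.137 km, b = 6356.752 km); the k, d, r, and latitude values in the example call are arbitrary, not values used in the dissertation.

import math

A_KM, B_KM = 6378.137, 6356.752   # assumed equatorial / polar radii (km)
LAT_DEG_KM = 110.9                # north-south span of one degree of latitude

def lon_degree_km(phi_deg: float) -> float:
    """East-west span of one degree of longitude at latitude phi (eq. 5-2 with n = 0)."""
    phi = math.radians(phi_deg)
    r_geocentric = math.sqrt((A_KM**4 * math.cos(phi)**2 + B_KM**4 * math.sin(phi)**2) /
                             ((A_KM * math.cos(phi))**2 + (B_KM * math.sin(phi))**2))
    return math.pi / 180.0 * math.cos(phi) * r_geocentric

def geo_grid_counts(k: int, d_km: float, r_km: float, phi_deg: float):
    h = math.ceil(math.log(LAT_DEG_KM / d_km, k))               # full tree depth
    h_prime = math.floor(math.log(LAT_DEG_KM / (2 * r_km), k))  # depth where cell >= 2r
    n_lateral = math.ceil(2 * r_km / (lon_degree_km(phi_deg) / k**h_prime))  # eq. 5-3
    n_leaf = n_lateral * k**(h - h_prime)                       # leaf linked-lists to visit
    return h, h_prime, n_lateral, n_leaf

# Example: branching factor 4, 1 km resolution, 10 km search radius, latitude 29.65 N
print(geo_grid_counts(k=4, d_km=1.0, r_km=10.0, phi_deg=29.65))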

5.4.2 Hash-Based Keyword Routing - Fairness Analysis

Although it is nearly impossible to know a priori the relative popularity of keywords used in session searches, let us assume for the time being that every keyword is equally likely to be searched. Further, because routing is done using the MD5 hash of these keywords, the cryptographic nature of the hashing function makes every hash value in the entire hash space equally likely to be routed. Keeping these assumptions in mind, let us analyze the hash space distribution among the participating mDNS MSD servers and the search and routing workload placed on them. Suppose the root node has k child domains and the sum of the MSD-designate counts at the root node is N. Denote the node count reported to the root by each child node i by n_i. Thus

\sum_{i=1}^{k} n_i = N    (5–5)

Since the 128-bit MD5 hash is used, the keyword hash space that must be distributed among the participating nodes is 2^128. As we use prefix routing in mDNS, let the number of significant bits needed to route appropriately be “m”. Therefore

2^{m} \ge N \quad \text{or} \quad m \ge \log_{2} N    (5–6)

Without loss of generality, assume that the root node does not itself participate as an MSD server (it may be a TLD server); the share of hash space allotted to each child domain at the root level is then (n_i / N) × 2^m × 2^(128−m). Therefore, for child domain i,

\text{share}_i = \frac{n_i}{N} \times 2^{128}    (5–7)

Further, each child node reallocates its assigned hash space among its children and itself; the space must be divided equally into n_i shares, and thus each participating domain's designated-MSD server share comes out to

\text{share}_{MSD} = \frac{n_i}{N} \times 2^{128} \div n_i = \frac{2^{128}}{N}    (5–8)

This, of course, is valid provided the domain hierarchy remains stable over time. As new domains may be added and some domains may leave the mDNS hierarchy over time, the above equitable distribution might be violated for short durations. This situation should not arise frequently and, we conjecture, would mostly occur during the bootstrapping process. This minor turbulence in the stable equitable distribution occurs because of the way Algorithm 2 (see Chapter 2) has been designed to minimize routing instability and reduce frequent routing flux.

Let us analyze the workload due to the routing of search and registration requests to the appropriate MSD servers. Clearly, a node higher up in the tree hierarchy must carry out more routing responsibilities than a node located close to the leaf domains. For any node node_i in the routing tree with m children domains, the routing load at node_i becomes

\text{load}_i = \frac{1}{N} \times \left( \sum_{j=1}^{m} \text{count}_j + 1 \right) \times 100\%    (5–9)

where count_j is the MSD count propagated to node_i from its child_j sub-domain. This of course assumes a stable and equitable hash space distribution.

Now, if every keyword is equally likely to be searched over a long period of time, then the workload on an mDNS node node_i over a duration of time “t” becomes

\text{workload}_i = t \times \text{rate}_{query} \times \text{probability}_{range}    (5–10)

\text{probability}_{range} = \frac{\text{share}_{MSD}}{2^{128}}    (5–11)

Now using equation (5–8) in equation (5–11), we get -

\text{workload}_i = t \times \text{rate}_{query} \times \frac{1}{N}    (5–12)

which shows that the search-related workload is also generally equitable, provided the keywords are searched with equal likelihood. Over short durations some keywords will be more popular than others; whether the trend evens out over significantly longer periods of time remains to be seen.
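The fairness quantities above can be summarized in a short sketch; the functions restate equations (5–8), (5–9), and (5–12), and the example inputs are illustrative rather than measured values.

HASH_SPACE = 2**128  # MD5 hash space

def per_msd_share(total_msd_count: int) -> int:
    return HASH_SPACE // total_msd_count                  # eq. 5-8

def routing_load_fraction(child_counts, total_msd_count: int) -> float:
    return (sum(child_counts) + 1) / total_msd_count      # eq. 5-9, as a fraction

def expected_workload(duration_s: float, query_rate: float, total_msd_count: int) -> float:
    return duration_s * query_rate * (1.0 / total_msd_count)   # eq. 5-12

# Example: 10 participating MSDs; a node whose children report 3 and 2 subtree MSDs
# routes (3 + 2 + 1)/10 = 60% of the traffic passing its level, while each MSD sees
# roughly 1/10 of all queries over time.
print(per_msd_share(10))
print(routing_load_fraction([3, 2], 10))
print(expected_workload(3600, 5.0, 10))   # 5 queries/s for an hour -> 1800 per MSD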

5.4.3 A Comparison with Other DHT Schemes

Let us now argue in favor of our hierarchical DHT overlay scheme, and the unsuitability of other DHTs, for the mDNS architecture. One of the design goals of mDNS has been the ability to assign long-term URLs to the multicast streams registered with the service. That necessitated a close design correspondence with the existing DNS infrastructure. Design criteria such as the ability to filter out administratively scoped sessions from being sent to requesting users external to the domain led us to design the system along the domain hierarchy. Doing so made the search algorithm and the session database management efficient and simple, but this choice led to a design that deviates from a typical P2P design. The mDNS DHT overlay is not a P2P design: in a typical peer-to-peer architecture, every node is assumed to have similar responsibility and share the same workload, and all nodes in a DHT based P2P scheme are typically at the same level. Therefore, the mDNS DHT is not a peer-to-peer design, although it incorporates several design principles found in a typical P2P DHT design.

Table 5-5. DHT feature comparison

DHT scheme   Routing table size                                                  Average hop count
Chord        m                                                                   O(log N)
Tapestry     b × log_b N                                                         O(log_b N)
Pastry       (2^b − 1) × ⌈log_{2^b} N⌉ + 2 × L                                   ⌈log_{2^b} N⌉
Kademlia     k-bucket lists for 0 ≤ i ≤ 160, covering distances 2^i to 2^{i+1}   ⌈log_2 N⌉
CAN          2 × d                                                               (d/4) × n^{1/d}
mDNS         2 + fan-out factor (k)                                              O(log_k n)

The mDNS DHT deviates from other DHT designs in several ways; some of the differences are the following:

• The mDNS overlay mirrors the domain hierarchy, whereas other schemes employ flat structuring in which overlay node IDs are generally assigned randomly on a circular hash space ring.
• Generally, in DHT based overlays, participating nodes that are neighbors in the node ID space may be very far apart in actual network distance. Neighboring nodes in mDNS, because the overlay mirrors the domain hierarchy of the Internet, are likely to be close in actual network distance.
• In a typical DHT based overlay, all participating nodes assume similar responsibilities and workloads because the nodes are in a flat arrangement. Even in Kademlia [38], which employs a binary tree arrangement, all participating nodes are leaves of the tree. In contrast, in mDNS the nodes higher up typically have a larger message routing burden than the leaf nodes, and the root node manages the overall hash space allotment and its subsequent maintenance.

Similar to other DHT overlays that have a constant Relative Delay Penalty (RDP) factor [69], and because of the nature of the mDNS hierarchy in which neighboring domains are likely to be network neighbors as well, we conjecture that the mDNS RDP will be within a constant factor of the actual network routing path length. Let us compare the various DHT schemes with respect to their routing table sizes, average routing hop counts, and logical node placement strategies; Table 5-5 shows the comparison. Among the compared DHT schemes, Chord [47] and Pastry [37] place nodes on a logical ID space ring, Tapestry [49][56] arranges nodes in a logical graph, Kademlia [38] constructs a binary prefix tree with nodes as leaves, CAN [39] arranges participating nodes in a d-dimensional coordinate space, and mDNS constructs a k-ary tree hierarchy of participating domains, where ‘k’ is the typical number of children domains attached at a node.

In Table 5-5, ‘m’ represents the number of bits in a Chord node ID. ‘N’ represents the number of participating nodes for Chord and Pastry, but for Tapestry it represents the size of the namespace with base ‘b’. ‘n’ denotes the actual number of participating nodes in the Kademlia, CAN, and mDNS entries. For Pastry, ‘b’ is the number of bits used for the base of the node ID representation, with b = 4 signifying a base 16 representation, and ‘L’ denotes the size of the leaf set and proximity neighbors list.

5.5 Conclusion

The ‘mDNS’ framework described in this dissertation allows for easy discovery of multicast sessions and improved usability through the URL assignment capability offered by the architecture. Session discovery is based on the distributed tree DHT structure, which depends on the internal parameters α and β. In this chapter we presented our simulation scheme and described the setup in detail. We performed experiments over a range of values of α and β to find the ranges of these parameters that yield better stability and better overall system performance.

We presented an analytical assessment of session search complexity when the geo-tagged database is used, along with arguments supporting the fairness claim made in this dissertation with respect to the participating domains in the overall ‘mDNS’ hierarchy. We also presented the search latency experiment results and their interpretation, and a comparative analysis of various popular P2P architectures and the architecture presented in this dissertation.

CHAPTER 6
CONCLUDING REMARKS

IP multicast is a very efficient mechanism for transmitting live data and video streams to large groups of recipients. Compared to unicast, it offers tremendous bandwidth savings to content providers and reduces traffic in the core network, freeing expensive bandwidth for other services. The bandwidth savings become especially valuable and noticeable over thin trans-Atlantic data pipes. From an end user's perspective, multicast improves perceived quality of service because instead of multiple data streams competing for a congested link, there is a single data stream, which therefore receives a larger share of the congested link's bandwidth.

Even though multicast has numerous benefits over unicast for data transmission in various scenarios, its end user demand and network deployment remain sparse. Unlike unicast, where the source and destination addresses are unique and generally somewhat stable, multicast addresses are usually assigned to a group for a short term. Group addressing is typically flat and offers routers no clue about the direction of message transmission, so data forwarding is typically done using RPF checks and a shared distribution tree. Construction of the shared distribution tree and source discovery, including multicast routing, require the network layer to implement several complex protocols such as MSDP, BGMP, Cisco-RP, and PIM-SM.

This increased complexity in the network layer, and therefore increased network management complexity, deters system administrators from native multicast deployment. Furthermore, the lack of scalable, real-time global multicast session discovery support in the Internet and the lack of usability prevent end users from tapping into the benefits of multicast.

With the increasing deployment of IGMPv3 at the network edges, end users have gained the capability to filter and subscribe to the specific sources they are interested in. SSM reduces network layer complexity and thereby increases its acceptance by system administrators. Deploying a global mechanism for session discovery and improving usability will help this technology gain acceptance among end users and increase adoption by system administrators everywhere.

For unicast, there exist several search engines like Google and Yahoo that maintain indexes of the web content hosted on the Internet today. Why can we not leverage these engines for multicast content discovery as well? Multicast streams are short lived, and the data transmitted over them are not available for long. Moreover, group addressing is not stable and has no global hierarchical structure like that of unicast addresses. These constraints make any crawler based technology extremely difficult to use, and the inherent delays associated with web crawlers make it almost impossible for short duration, transient, unplanned sessions to ever be indexed and therefore discoverable by the target audience. To improve session discovery, this dissertation proposes a tree DHT hierarchy. The tree DHT structure allows for efficient search routing. The DHT proposed in this dissertation tries to achieve equal hash-space assignments among participating domains, is self-managing in the face of changes to the network topology, and is capable of recovering from intermittent domain failures. The use of the domain count reporting thresholds α and β imparts stability to the global routing structure, and the use of geo-tagged databases allows an extra search criterion, namely geographical searches.

The URS design and its close placement alongside the DNS infrastructure allow multicast sessions to be assigned long-term ‘mDNS’ URLs that an end user can bookmark for later use. Even if the session parameters change later, the URLs remain valid and map to the correct session. The existence of a URS allows a domain-specific search to be performed from anywhere in the Internet; this in fact makes even a stand-alone deployment in a single domain useful to end users. This ability for incremental deployment in the global Internet is an asset of the design. The URS also aids in failure recovery and keeps track of the MSDd server in its domain, and it acts as a bootstrapping device for other MSD servers deployed in its domain. This minimizes the domain system administrator's involvement in the management of the ‘mDNS’ components.

The integration of the tree-DHT scheme and the URS allows for improved session search capabilities in the network. Since the system is registration based and is capable of distributing the registration data to the appropriate site in the hierarchy in real time, even the most transient sessions become discoverable by end users. We performed detailed simulation runs to test and discover optimal parameter settings for the DHT structure in different topology scenarios. For the scenario 1 simulation setup, the simulation data suggest that the best performance is achieved if α, β ∈ [1.8 − 2.0] with α ≤ β. For scenario 2, the optimal system performance is achieved for α ∈ [0.4 − 1.0] and β ∈ [1.8 − 2.0]. For scenario 3 the system performed better with α, β ∈ [1.2 − 2.0] with α ≤ β. It is clear that the choice of α and β depends on the network topology. A system administrator is free to choose a value of his/her liking, although it is advisable to follow the common selection guidelines for the full hierarchy: to maintain global routing table stability, a relatively high value of β is suggested, and for routing table stability at the subtree level, a higher value of α is advised.

New Services

Consider a scenario where you live in a particular part of a country and love listening to jazz music. You travel often, and while on the go you would love to continue listening to your favorite genre of music. You perform a geo-location sensitive search to find a multicast [70] session that broadcasts jazz music from a location near you.

Picture another scenario: emergency channels multicast from a region hit by a natural disaster would generally be more effective in providing real time relief information to residents in that area. Information such as where to go for clean drinking water, a bag of ice, and medical aid, or the dissemination of casualty information, can be updated in real time by emergency workers present at the disaster sites rather than by someone sitting at a faraway location. Geo-tagging such multicast sessions would help people discover relevant sessions faster and with higher accuracy. In some cases, multicast sessions could also foster better inter-agency coordination, enabling agencies to orchestrate an efficient relief program in the affected areas. Geo-tagged multicast sessions could also herald an era of real-time yet discoverable citizen news reporting by eyewitnesses at news sites. Consider a scenario where a major traffic pileup has occurred on I-95: a few eyewitnesses at the accident site may start a live video feed using their camera phones (modern cell phones are packing in more and more compute power), using 3G [71] or GPRS [72], register the multicast session with descriptive keywords such as I95, pileup, and accident, and let the whole world watch the news as it unfolds.

Raging California wildfires may make county officials issue voluntary evacuations. Homeowners who decide to move out are always on their toes to find out the status of their homes. A few daredevils who decide to stay back could start a video feed of their surroundings; geo-tagging the session with the relevant location would make such sessions discoverable with more accuracy, and homeowners who vacated could learn the status of that area. Furthermore, network traffic sourced from a nearby location is generally more reliable and impervious to network vagaries. Link capacities and traffic profiles have a tremendous impact on the quality of sessions with a larger hop count; therefore one would usually want to get content from sessions hosted at a location nearby.

These are a few scenarios among many that suggest that geo-tagging of multicast sessions could have a significant impact on the way people consider using multicast in the future. Not only would multicast be a viable alternative for transmitting live broadcasts on the Internet, it would also become more appealing to the general masses and would help create consumer demand for multicast services. It would also enable various new services to be offered, a few of which are envisioned above, that could not have been offered before.

Future research directions

One immediate research problem is to rebalance the tree-DHT structure if it becomes severely imbalanced; an imbalanced structure results in increased delay and routing complexity. The major research challenge is how to achieve this without significant administrator involvement. Is there a possibility of using static configuration that would allow administrators to specify rules on which domains to join and which to avoid? How should the root responsibilities be migrated in the balancing algorithm so that there is no service disruption?

An organic follow-on research task would be to investigate the possibility of a shared multicast distribution tree in a mobile network with nodes joining and leaving at will. The true potential of ‘mDNS’ can be realized if mobile users can multicast sessions on the fly and such extremely transient sessions remain discoverable by other users in the global Internet. Hopefully this dissertation is a starting point in reviving multicast research interest in the scientific community.

APPENDIX
SIMULATION CONFIGURATION PARAMETERS

The simulations are run using the custom implemented DNS tool. This tool takes in a domain-setup file that contains all the necessary parameters, such as the PMCAST, CMCAST, and URS IP and port values, and other associated parameters that allow the simulation of a connected hierarchy of domains. Below are the ‘Virtual DNS Tool’ configuration files for each domain in the three connected hierarchies used for experimentation. The DNS settings files should have the extension ‘.mds’. The domain numbers correspond to the numbers shown in the hierarchy diagram in Figure 5-3. A sketch of how such a file can be parsed follows.
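As an aid to reading the listings, the following is a minimal sketch (an assumption about the file layout, not the original tool's code) of how a ‘.mds’ domain-setup file of this form could be parsed: one "Key:Value" parameter per line, plus "@ name host:port" lines for child domains.

def parse_mds(path: str):
    params, children = {}, {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            if line.startswith("@"):
                _, name, addr = line.split()     # child domain: "@ dom00 127.0.0.1:2006"
                host, port = addr.split(":")
                children[name] = (host, int(port))
            else:
                key, _, value = line.partition(":")  # ordinary "Key:Value" parameter
                params[key] = value
    return params, children

# Example (hypothetical file name): params, children = parse_mds("level0.mds")
# params["DomainName"], params["URSPort"], children["dom00"], ...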

Anycast:127.0.0.1
ParentDNS:127.0.0.1
ParentDNSPort:0
ChildMulticast:239.0.0.100
ParentMulticast:239.0.0.101
URSIP:127.0.0.1
SelfPort:2000
CMPort:2001
PMPort:2002
URSPort:2003
MCAST_LOCAL_IP:239.0.0.102
MCASTPort:2004
ANYCASTPort:2005
DomainName:level0.testdomain.info
@ dom00 127.0.0.1:2006
@ dom01 127.0.0.1:2011
@ dom02 127.0.0.1:2016

Scenario 1 - domain 10

Anycast:127.0.0.1
ParentDNS:127.0.0.1
ParentDNSPort:2000
ChildMulticast:239.0.0.103
ParentMulticast:239.0.0.100
URSIP:127.0.0.1
SelfPort:2006
CMPort:2007
PMPort:2001
URSPort:2008
MCAST_LOCAL_IP:239.0.0.104
MCASTPort:2009
ANYCASTPort:2010
DomainName:dom00.level0.testdomain.info
@ dom000 127.0.0.1:2021
@ dom001 127.0.0.1:2026

Scenario 1 - domain 1

Anycast:127.0.0.1
ParentDNS:127.0.0.1
ParentDNSPort:2000
ChildMulticast:239.0.0.105
ParentMulticast:239.0.0.100
URSIP:127.0.0.1
SelfPort:2011
CMPort:2012
PMPort:2001
URSPort:2013
MCAST_LOCAL_IP:239.0.0.106
MCASTPort:2014
ANYCASTPort:2015
DomainName:dom01.level0.testdomain.info

Scenario 1 - domain 4

Anycast:127.0.0.1
ParentDNS:127.0.0.1
ParentDNSPort:2000
ChildMulticast:239.0.0.107
ParentMulticast:239.0.0.100
URSIP:127.0.0.1
SelfPort:2016
CMPort:2017
PMPort:2001
URSPort:2018
MCAST_LOCAL_IP:239.0.0.108
MCASTPort:2019
ANYCASTPort:2020
DomainName:dom02.level0.testdomain.info
@ dom020 127.0.0.1:2016

Scenario 1 - domain 5

Anycast:127.0.0.1
ParentDNS:127.0.0.1
ParentDNSPort:2006
ChildMulticast:239.0.0.109
ParentMulticast:239.0.0.103
URSIP:127.0.0.1
SelfPort:2021
CMPort:2022
PMPort:2007
URSPort:2023
MCAST_LOCAL_IP:239.0.0.110
MCASTPort:2024
ANYCASTPort:2025
DomainName:dom000.dom00.level0.testdomain.info

Scenario 1 - domain 2

Anycast:127.0.0.1 ParentDNS:127.0.0.1 ParentDNSPort:2006 ChildMulticast:239.0.0.111 ParentMulticast:239.0.0.103 URSIP:127.0.0.1 SelfPort:2026 CMPort:2027 PMPort:2007 URSPort:2028 MCAST_LOCAL_IP:239.0.0.112 MCASTPort:2029 ANYCASTPort:2030 DomainName:dom001.dom00.level0.testdomain.info

Scenario 1 - domain 3

Anycast:127.0.0.1 ParentDNS:127.0.0.1 ParentDNSPort:2016 ChildMulticast:239.0.0.113 ParentMulticast:239.0.0.107 URSIP:127.0.0.1

SelfPort:2031 CMPort:2032 PMPort:2017 URSPort:2033 MCAST_LOCAL_IP:239.0.0.114 MCASTPort:2034 ANYCASTPort:2035 DomainName:dom020.dom02.level0.testdomain.info @ dom0201 127.0.0.1:2036 @ dom0200 127.0.0.1:2041 @ dom0202 127.0.0.1:2046

Scenario 1 - domain 6

Anycast:127.0.0.1 ParentDNS:127.0.0.1 ParentDNSPort:2031 ChildMulticast:239.0.0.115 ParentMulticast:239.0.0.113 URSIP:127.0.0.1 SelfPort:2036 CMPort:2037 PMPort:2032 URSPort:2038 MCAST_LOCAL_IP:239.0.0.116 MCASTPort:2039 ANYCASTPort:2040 DomainName:dom0200.dom020.dom02.level0.testdomain.info

Scenario 1 - domain 7

Anycast:127.0.0.1 ParentDNS:127.0.0.1 ParentDNSPort:2031 ChildMulticast:239.0.0.117 ParentMulticast:239.0.0.113 URSIP:127.0.0.1 SelfPort:2041 CMPort:2042 PMPort:2032 URSPort:2043 MCAST_LOCAL_IP:239.0.0.118 MCASTPort:2044

ANYCASTPort:2045 DomainName:dom0201.dom020.dom02.level0.testdomain.info

Scenario 1 - domain 8

Anycast:127.0.0.1 ParentDNS:127.0.0.1 ParentDNSPort:2031 ChildMulticast:239.0.0.119 ParentMulticast:239.0.0.113 URSIP:127.0.0.1 SelfPort:2046 CMPort:2047 PMPort:2032 URSPort:2048 MCAST_LOCAL_IP:239.0.0.120 MCASTPort:2049 ANYCASTPort:2050 DomainName:dom0202.dom020.dom02.level0.testdomain.info

Scenario 1 - domain 9

The ‘Virtual DNS’ settings for the scenario 2 domain hierarchy are presented next. The domain numbers are as shown in the hierarchy figure presented earlier in Chapter 5.

Anycast:127.0.0.1 ParentDNS:127.0.0.1 ParentDNSPort:0 ChildMulticast:239.0.0.100 ParentMulticast:239.0.0.101 URSIP:127.0.0.1 SelfPort:2000 CMPort:2001 PMPort:2002 URSPort:2003 MCAST_LOCAL_IP:239.0.0.102 MCASTPort:2004 ANYCASTPort:2005 DomainName:level0.testdomain.info @ dom00 127.0.0.1:2006 @ dom01 127.0.0.1:2011 @ dom02 127.0.0.1:2016 @ dom000 127.0.0.1:2021

@ dom001 127.0.0.1:2026 @ dom020 127.0.0.1:2031 @ dom0200 127.0.0.1:2036 @ dom0201 127.0.0.1:2041 @ dom0202 127.0.0.1:2046

Scenario 2 - domain 10

Anycast:127.0.0.1 ParentDNS:127.0.0.1 ParentDNSPort:2000 ChildMulticast:239.0.0.103 ParentMulticast:239.0.0.100 URSIP:127.0.0.1 SelfPort:2006 CMPort:2007 PMPort:2001 URSPort:2008 MCAST_LOCAL_IP:239.0.0.104 MCASTPort:2009 ANYCASTPort:2010 DomainName:dom00.level0.testdomain.info

Scenario 2 - domain 1

Anycast:127.0.0.1 ParentDNS:127.0.0.1 ParentDNSPort:2000 ChildMulticast:239.0.0.105 ParentMulticast:239.0.0.100 URSIP:127.0.0.1 SelfPort:2011 CMPort:2012 PMPort:2001 URSPort:2013 MCAST_LOCAL_IP:239.0.0.106 MCASTPort:2014 ANYCASTPort:2015 DomainName:dom01.level0.testdomain.info

Scenario 2 - domain 2

Anycast:127.0.0.1

ParentDNS:127.0.0.1 ParentDNSPort:2000 ChildMulticast:239.0.0.107 ParentMulticast:239.0.0.100 URSIP:127.0.0.1 SelfPort:2016 CMPort:2017 PMPort:2001 URSPort:2018 MCAST_LOCAL_IP:239.0.0.108 MCASTPort:2019 ANYCASTPort:2020 DomainName:dom02.level0.testdomain.info

Scenario 2 - domain 3

Anycast:127.0.0.1 ParentDNS:127.0.0.1 ParentDNSPort:2000 ChildMulticast:239.0.0.109 ParentMulticast:239.0.0.100 URSIP:127.0.0.1 SelfPort:2021 CMPort:2022 PMPort:2001 URSPort:2023 MCAST_LOCAL_IP:239.0.0.110 MCASTPort:2024 ANYCASTPort:2025 DomainName:dom000.level0.testdomain.info

Scenario 2 - domain 4

Anycast:127.0.0.1 ParentDNS:127.0.0.1 ParentDNSPort:2000 ChildMulticast:239.0.0.111 ParentMulticast:239.0.0.100 URSIP:127.0.0.1 SelfPort:2026 CMPort:2027 PMPort:2001 URSPort:2028

MCAST_LOCAL_IP:239.0.0.112 MCASTPort:2029 ANYCASTPort:2030 DomainName:dom001.level0.testdomain.info

Scenario 2 - domain 5

Anycast:127.0.0.1 ParentDNS:127.0.0.1 ParentDNSPort:2000 ChildMulticast:239.0.0.113 ParentMulticast:239.0.0.100 URSIP:127.0.0.1 SelfPort:2031 CMPort:2032 PMPort:2001 URSPort:2033 MCAST_LOCAL_IP:239.0.0.114 MCASTPort:2034 ANYCASTPort:2035 DomainName:dom020.level0.testdomain.info

Scenario 2 - domain 6

Anycast:127.0.0.1 ParentDNS:127.0.0.1 ParentDNSPort:2000 ChildMulticast:239.0.0.115 ParentMulticast:239.0.0.100 URSIP:127.0.0.1 SelfPort:2036 CMPort:2037 PMPort:2001 URSPort:2038 MCAST_LOCAL_IP:239.0.0.116 MCASTPort:2039 ANYCASTPort:2040 DomainName:dom0200.level0.testdomain.info

Scenario 2 - domain 7

Anycast:127.0.0.1 ParentDNS:127.0.0.1

ParentDNSPort:2000 ChildMulticast:239.0.0.117 ParentMulticast:239.0.0.100 URSIP:127.0.0.1 SelfPort:2041 CMPort:2042 PMPort:2001 URSPort:2043 MCAST_LOCAL_IP:239.0.0.118 MCASTPort:2044 ANYCASTPort:2045 DomainName:dom0201.level0.testdomain.info

Scenario 2 - domain 8

Anycast:127.0.0.1 ParentDNS:127.0.0.1 ParentDNSPort:2000 ChildMulticast:239.0.0.119 ParentMulticast:239.0.0.100 URSIP:127.0.0.1 SelfPort:2046 CMPort:2047 PMPort:2001 URSPort:2048 MCAST_LOCAL_IP:239.0.0.120 MCASTPort:2049 ANYCASTPort:2050 DomainName:dom0202.level0.testdomain.info

Scenario 2 - domain 9

The ‘Virtual DNS Tool’ parameters for the scenario 3 domain hierarchy are given next. Refer to Figure 5.3 for more details.

Anycast:127.0.0.1 ParentDNS:127.0.0.1 ParentDNSPort:0 ChildMulticast:239.0.0.100 ParentMulticast:239.0.0.101 URSIP:127.0.0.1 SelfPort:2000 CMPort:2001

PMPort:2002 URSPort:2003 MCAST_LOCAL_IP:239.0.0.102 MCASTPort:2004 ANYCASTPort:2005 DomainName:level0.testdomain.info @ dom00 127.0.0.1:2006

Scenario 3 - domain 10

Anycast:127.0.0.1 ParentDNS:127.0.0.1 ParentDNSPort:2000 ChildMulticast:239.0.0.103 ParentMulticast:239.0.0.100 URSIP:127.0.0.1 SelfPort:2006 CMPort:2007 PMPort:2001 URSPort:2008 MCAST_LOCAL_IP:239.0.0.104 MCASTPort:2009 ANYCASTPort:2010 DomainName:dom00.level0.testdomain.info @ dom01 127.0.0.1:2011

Scenario 3 - domain 1

Anycast:127.0.0.1 ParentDNS:127.0.0.1 ParentDNSPort:2006 ChildMulticast:239.0.0.105 ParentMulticast:239.0.0.103 URSIP:127.0.0.1 SelfPort:2011 CMPort:2012 PMPort:2007 URSPort:2013 MCAST_LOCAL_IP:239.0.0.106 MCASTPort:2014 ANYCASTPort:2015 DomainName:dom01.dom00.level0.testdomain.info @ dom02 127.0.0.1:2016

Scenario 3 - domain 4

Anycast:127.0.0.1 ParentDNS:127.0.0.1 ParentDNSPort:2011 ChildMulticast:239.0.0.107 ParentMulticast:239.0.0.105 URSIP:127.0.0.1 SelfPort:2016 CMPort:2017 PMPort:2012 URSPort:2018 MCAST_LOCAL_IP:239.0.0.108 MCASTPort:2019 ANYCASTPort:2020 DomainName:dom02.dom01.dom00.level0.testdomain.info @ dom000 127.0.0.1:2021

Scenario 3 - domain 5

Anycast:127.0.0.1 ParentDNS:127.0.0.1 ParentDNSPort:2016 ChildMulticast:239.0.0.109 ParentMulticast:239.0.0.107 URSIP:127.0.0.1 SelfPort:2021 CMPort:2022 PMPort:2017 URSPort:2023 MCAST_LOCAL_IP:239.0.0.110 MCASTPort:2024 ANYCASTPort:2025 DomainName:dom000.dom02.dom01.dom00.level0.testdomain.info @ dom001 127.0.0.1:2026

Scenario 3 - domain 2

Anycast:127.0.0.1 ParentDNS:127.0.0.1 ParentDNSPort:2021 ChildMulticast:239.0.0.111

ParentMulticast:239.0.0.109 URSIP:127.0.0.1 SelfPort:2026 CMPort:2027 PMPort:2022 URSPort:2028 MCAST_LOCAL_IP:239.0.0.112 MCASTPort:2029 ANYCASTPort:2030 DomainName:dom001.dom000.dom02.dom01.dom00.level0.testdomain.info @ dom020 127.0.0.1:2031

Scenario 3 - domain 3

Anycast:127.0.0.1 ParentDNS:127.0.0.1 ParentDNSPort:2026 ChildMulticast:239.0.0.113 ParentMulticast:239.0.0.111 URSIP:127.0.0.1 SelfPort:2031 CMPort:2032 PMPort:2027 URSPort:2033 MCAST_LOCAL_IP:239.0.0.114 MCASTPort:2034 ANYCASTPort:2035 DomainName:dom020.dom001.dom000.dom02.dom01.dom00.level0.testdomain.info @ dom0200 127.0.0.1:2036

Scenario 3 - domain 6

Anycast:127.0.0.1 ParentDNS:127.0.0.1 ParentDNSPort:2031 ChildMulticast:239.0.0.115 ParentMulticast:239.0.0.113 URSIP:127.0.0.1 SelfPort:2036 CMPort:2037 PMPort:2032 URSPort:2038 MCAST_LOCAL_IP:239.0.0.116

MCASTPort:2039 ANYCASTPort:2040 DomainName:dom0200.dom020.dom001.dom000.dom02.dom01.dom00.level0.testdomain.info @ dom0201 127.0.0.1:2041

Scenario 3 - domain 7

Anycast:127.0.0.1 ParentDNS:127.0.0.1 ParentDNSPort:2036 ChildMulticast:239.0.0.117 ParentMulticast:239.0.0.115 URSIP:127.0.0.1 SelfPort:2041 CMPort:2042 PMPort:2037 URSPort:2043 MCAST_LOCAL_IP:239.0.0.118 MCASTPort:2044 ANYCASTPort:2045 DomainName:dom0201.dom0200.dom020.dom001.dom000.dom02.dom01.dom00.level0.testdomain.info @ dom0202 127.0.0.1:2046

Scenario 3 - domain 8

Anycast:127.0.0.1 ParentDNS:127.0.0.1 ParentDNSPort:2041 ChildMulticast:239.0.0.119 ParentMulticast:239.0.0.117 URSIP:127.0.0.1 SelfPort:2046 CMPort:2047 PMPort:2042 URSPort:2048 MCAST_LOCAL_IP:239.0.0.120 MCASTPort:2049 ANYCASTPort:2050 DomainName:dom0202.dom0201.dom0200.dom020.dom001.dom000.dom02.dom01.dom00.level0.testdomain.info

Scenario 3 - domain 9

For the session discovery latency experiments, we used the following list of keywords for session registration. Each keyword generated one registration request, and the same keyword was then used for the session search that immediately followed the registration request.

gator, hindi, rediff, football, soccer, movies, audio, songs, picture, piyush, amrita, table, dinner, restaurant, match, base, tyre, car, couch, potato, refrigerator, shelf, motorcycle, sweater, shirt, dress, purse, mobile, watch, clock, top, jacket, coat, idol, deity, kitchen, market, mall, road, footpath, spectacle, television, knife, board, onion, jalapeno, beer, time, mouse, telefone, pen, cover, case, copy, book, pencil, light, bulb, fan, tape, suitcase, paper, garland, garden, flower, carpet, tie, necklace, lens, camera, battery, cake, icing, sugar, milk, egg, water, envelope, drawer, cheque, belt, shoe, slipper, scanner, cards, rocket, shuttle, tennis, ball, legs, hands, fingers, nail, toe, hammer, srew, plier, match-stick, gun, fun, park, swing, slope, ranch, grass, bike, helmet, gear, gloves, batter, pillow, quilt, tissue, mop, broom, cargo, sweet, perfume, frangrance, meat, butter, salt, tea, coffee, ground, boil, receipt, plastic, floor, wire, number, frown, torch, rope, tent, camp, row, boat, tide, river, stream, ocean, mountain, mushroom, fungi, algae, ferns, leaf, bud, eggplant, cucumber, radish, mustard, honey, oil, pan, spatula, mixer, dough, juice, cook, cookie, spice, walnut, cinnamon, eat, jump, hop, run, play, alligator, turtle, fish, snake, slime, moss, bullet, cannon, lamp, medicine, vitamin, cholera, disease, hospital, doctor, nurse, patient, foot, malaria, scalp, ear, throat, drink, force, hair, long, dictionary, speaker, album, mirror, lip-stick, petroleum, gasoline, flourine, asbestos, arsenic, mild, wild, animal, deep, blue, whale, dolphin, puppy, birds, aquarium, radium, mars, planet, solar, sun, rays, ozone, atmosphere, aeroplane, flight, orange, pretzel, dance, salsa, latino, pepper, good, sauce, scream, shout, yell, radio, next, rock, guitar, saxophone, castle, stairs, porch, patio, change, pool, fry, saute, grind, burn, churn, turn, garbage, dust-bin, bun, noodles, rice, ring, police, jeep, truck, bus, children, school, nursery, animation, alien, combat, challenge, whip, leash, cream, pie, hat, bat, door, kid, prank, switch, blanket, death, fear, insect, net, mosquito, robot, laser, robot, hello, greet, smile, grin, strap, breeze, wind, air, gale, hurricane, storm, rain, current, ship, yatch, enough

Data was collected using up to five domains connected according to the scenario 3 hierarchy. The ‘Virtual DNS’, MSD, and URS server parameters were set up according to the configuration details provided earlier for domains 10, 1, 4, 5, and 2.
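The measurement procedure described above can be summarized with the following illustrative Python sketch. The message formats of the registration and search requests are not reproduced in this appendix, so register_session, search_session, and measure_discovery_latency below are hypothetical stand-ins for the actual MSD/URS interactions; only the register-then-immediately-search timing pattern is taken from the experiment description.

import time

def register_session(keyword):
    # Hypothetical stand-in: send one session registration request for this keyword.
    pass

def search_session(keyword):
    # Hypothetical stand-in: issue a session search for the keyword and wait for the answer.
    pass

def measure_discovery_latency(keywords):
    # Time each search; each keyword generates one registration request, and the
    # search for the same keyword immediately follows the registration.
    latencies = []
    for kw in keywords:
        register_session(kw)
        start = time.monotonic()
        search_session(kw)
        latencies.append(time.monotonic() - start)
    return latencies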

BIOGRAPHICAL SKETCH

Piyush Harsh was born into a well-educated and scientifically oriented family. His father, an M.D., has been the biggest influence on him and instilled scientific curiosity in him from early childhood. He was always a good student, excelling in his studies through high school. His hard work paid off when he got the chance to study at the Indian Institute of Technology, Roorkee, where he graduated with a bachelor's degree in Computer Science and Technology in Spring 2003. To further his scientific training, he accepted a full scholarship from the University of Florida, came to the US, and joined the Ph.D. program in the Department of Computer Science and Engineering in Fall 2003. He was the first in his family ever to travel abroad for higher education.

Under the guidance of Dr. Richard Newman (his adviser) and his Ph.D. committee members, especially Dr. Randy Chow, he was involved in numerous scientific projects. During his stay at the University of Florida, he worked in the fields of security, computer networks, and cognitive computing. Lately his research interest has focused on bio-inspired network models, including ways to adapt models of the human brain into future network design. When he is not doing research, he enjoys outdoor activities, including long-distance trail biking and hiking in nature reserves. He believes in the preservation of the environment and aspires to be an active participant in this noble cause in the near future.
