
The Second Life of a Sensor: Integrating Real-world Experience in Virtual Worlds using Mobile Phones

Mirco Musolesi∗, Emiliano Miluzzo∗, Nicholas D. Lane∗, Shane B. Eisenman†, Tanzeem Choudhury∗, and Andrew T. Campbell∗
∗Dartmouth College, USA    †Columbia University, USA

Abstract

Virtual world simulators like Second Life represent the latest and most successful frontier of online services for entertainment and business. People have virtual lives in these worlds using personal avatars. Bridging real life and these virtual worlds together is challenging, but enables new application scenarios for these systems. Existing work focuses on the representation of objects inside virtual worlds. We believe that the overall experience will be improved if we can reproduce not only inanimate and passive objects like buildings and beer steins, but also the subjects in these virtual environments. In order to have a seamless integration between the physical and virtual worlds, and to maximize the scale of adoption, the system should rely on devices that people already use every day. For this reason, we believe that mobile phones are a natural and ideal conduit.

In this paper, we argue that the sensors embedded in commercial mobile phones can be used to infer real-world activities, which in turn can be reproduced in virtual settings. We discuss the challenges related to the implementation of such a system from both algorithmic and systems points of view. We present the design of a prototype integrating a platform for inferring activities using mobile phones and Second Life. To the best of our knowledge, this is the first time that sensor data from everyday mobile phones, reflecting a person's activity in the physical world, has been expressed in virtual worlds such as Second Life.

Categories and Subject Descriptors
C.3 [Communication/Networking and Information Technology]: Special-Purpose and Application-Based Systems—Ubiquitous Computing

General Terms
Design, Measurement, Experimentation, Human Factors

Keywords
Sensor systems, mobile phones, Second Life, virtual worlds, activity recognition

1 Introduction

Virtual world simulators represent one of the latest and most successful frontiers of online entertainment. Among them, Second Life [8] is one of the most popular Internet-based virtual worlds in terms of subscribers, with over 500,000 active users. A client program called the Second Life Viewer allows its users (called residents) to interact with each other through mobile avatars. Inside the game, residents can socialize and participate in individual and group activities. An avatar's appearance may be customized across a wide range of physical attributes. Avatars may be completely invented, without any relation to the physical aspect of the human user, or can be made to closely resemble the human whom they represent. However, in the current system, there is still a divide between the virtual world and the real one. Specifically, a user may be able to recreate his appearance, but he cannot automatically mirror/translate his current actions and movements in his real life to the virtual life of his avatar.

Our vision is to bridge the current divide between real worlds and virtual worlds by having real-world activities (e.g., sitting, walking, running, cooking) and other aspects of the human represented in real time in the virtual world. Our aim is to go a step further than simply reproducing spaces (e.g., rooms, buildings, public spaces [7, 11]) and the objects with which humans interact. We focus instead on providing a virtual representation of humans, their surroundings as sensed by the humans themselves, and interactions with the members of their social networks (e.g., fetched from external sources such as Facebook [3]).
There are existing virtual worlds that map to the real one with great accuracy in terms of geographical locations and local features. Complex virtual representations of real geographical places exist, such as the ones provided by Google Earth [4] and Microsoft Virtual Earth [9]. In Google Earth it is possible to insert 3-D representations of buildings and artifacts by means of Google Sketch. We believe that the next step will be representing real-world activities in this kind of virtual setting, whereby personal avatars will be able to move inside these artificial spaces following the real positions and activities of people.

Sensors have been used to support presence in virtual environments since the first experiments on Virtual Reality Theaters [2]. In most of these systems, individuals carry special wearable devices like helmets and wristbands with embedded sensors measuring their movements. These are special-purpose external devices used to map users' actions. More recently, the integration of sensors and Second Life is discussed as future work in [7], where the authors propose to reproduce the entire floor of a building using data collected by means of the Plug [7] platform. However, the focus of [7] is on the reproduction of a working space and not on the activities of people moving inside it (i.e., it is focused on places and objects and not on subjects). Moreover, [7] is based on a non-standard hardware platform produced in small quantities and not available to the general public.

Starting from this technological scenario, the contribution of this paper can be summarized as follows:

• We discuss the systems challenges and the design issues in integrating real-world people-centric sensor systems and virtual worlds;

• We present a system that is able to reproduce human activities, and not only inanimate objects, in a virtual world for the first time;

• We discuss the design of a preliminary prototype based on the integration of a system for activity inference based on mobile phones and Second Life.

In Section 2 we present an overview of the main challenges in implementing systems supporting real activity inference algorithms for virtual worlds based on data extracted using mobile phones. In Section 3 we discuss our ongoing work on the CenceMe platform [1] that is the basis for the practical implementation of the proposed system, and present preliminary results that are at the root of the proposed architecture. Section 4 concludes the paper, summarizing the contribution of our work.

2 System Design Challenges

In this section, we discuss the key issues in detecting real-world activities by means of commercial mobile phones and seamlessly integrating this information into virtual world environments.

[Figure 1 appears here. Caption: Figure 1. Architecture of the system bridging CenceMe and Second Life. Components shown: CenceMe Clients, Cellular Network / Wi-Fi Connection, CenceMe Back-end, Second Life Bridge Component, Second Life Servers.]

2.1 Activity Recognition and Visualization

Activity inference, mapping and visualization are key aspects of the proposed system. Activity inference consists of the extraction of high-level activities (e.g., sitting, running, walking or talking) from time series of raw data sampled from sensors (e.g., accelerometer, microphone). There is significant interest in the research community in designing high accuracy activity recognition algorithms [6, 12]. Many user-specified visualization policies for inferred activity may exist. Some users might want to map their physical activities in the real world into different virtual ones. For example, a user may want to specify that the action of running corresponds to flying in the virtual world. The other interesting aspect is information accuracy. Some users may prefer to discard the information if the inference is not sufficiently accurate given predefined quantitative thresholds. Alternatively, a default activity (e.g., standing) can be preset, which will be visualized if the system is not able to perform activity recognition with sufficient accuracy. This process can be based on the definition of specific classifier thresholds on the Receiver Operating Characteristic (ROC) curves [13]. Generally, the accuracy of these measurements is related to the limitations of currently available embedded sensors. We believe activity recognition accuracy will only improve in the future as more powerful sensors become available for use with more effective inference algorithms.

We also consider the possibility of information flow from virtual worlds to the phones. Of particular interest is researching the best methods for reporting and representing in the real world the changes and interactions that occur in the virtual world. An example of a very simple interaction is the opening of an Internet chat window when avatars start a conversation in Second Life. More complex interaction dynamics can be envisaged, and improvements in phone user interfaces are likely to open new research opportunities.
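To make the visualization policies above concrete, the following is a minimal sketch of a user-defined activity mapping combined with a minimum-confidence policy and a preset default activity. It is illustrative only and not part of the CenceMe implementation; all names (ACTIVITY_MAP, CONFIDENCE_THRESHOLD, DEFAULT_ACTIVITY) are our own assumptions. The threshold plays the role of a classifier operating point chosen on the ROC curve.

    # Illustrative sketch (not CenceMe code): map an inferred physical
    # activity to a virtual-world animation, honoring a user-defined
    # mapping and a minimum-confidence policy. All names are hypothetical.

    ACTIVITY_MAP = {           # user-specified physical -> virtual mapping
        "sitting": "yoga-floating",
        "standing": "standing",
        "running": "flying",
    }
    CONFIDENCE_THRESHOLD = 0.7     # e.g., a point chosen on the ROC curve
    DEFAULT_ACTIVITY = "standing"  # preset fallback for uncertain inferences

    def select_virtual_activity(inferred_activity, confidence):
        # Discard low-confidence inferences and fall back to the default.
        if confidence < CONFIDENCE_THRESHOLD:
            return DEFAULT_ACTIVITY
        # Unmapped activities also fall back to the default.
        return ACTIVITY_MAP.get(inferred_activity, DEFAULT_ACTIVITY)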
2.2 Intermittent Connectivity

Cell phones may experience intermittent connectivity due to limits in network coverage. Virtual avatars should be set to a meaningful state (such as the latest known state with a question mark appearing above the avatar) during intervals when no new updates are received. A more complex solution may involve the use of prediction techniques to determine the current state and location given the sequence of previous states. Delay-tolerant mechanisms can also be adopted to temporarily store user status locally on the phone in case of disconnection. The time series of state data can later be input to the virtual world. Research is needed to determine how best (if at all) to represent such a delayed burst of physical state information in the virtual world. We believe that strict real-time action rendering may not be necessary for most users. However, this is an interesting aspect that should be analyzed in detail for different activities and classes of users by means of experiments and surveys.
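The delay-tolerant mechanism described above can be made concrete with a small buffer that stores timestamped states locally and replays them in order once connectivity returns. This is a minimal Python sketch under assumed interfaces — the transport object and its is_connected()/send() methods are hypothetical, not part of CenceMe.

    # Illustrative sketch (hypothetical interfaces): buffer inferred states
    # locally while the phone is disconnected, then replay them in order
    # to the virtual world when connectivity returns.

    import time
    from collections import deque

    class DelayTolerantUploader:
        def __init__(self, transport):
            self.transport = transport   # assumed to expose is_connected()/send()
            self.pending = deque()       # locally stored (timestamp, state) pairs

        def report(self, state):
            self.pending.append((time.time(), state))
            self.flush()

        def flush(self):
            while self.pending and self.transport.is_connected():
                timestamp, state = self.pending.popleft()
                # The time series is uploaded in order; whether a delayed
                # burst is replayed or collapsed to the latest state is a
                # policy decision left to the virtual-world side.
                self.transport.send({"t": timestamp, "state": state})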
2.3 Scalability

Given the potentially high number of users, designing for scalability is important. To reduce the number of status updates flowing into the Second Life visualization, the status/activity of the virtual avatar is updated only when the inferred status/activity of the human user changes. This, in turn, is dependent on how often the activity classifier is executed. By leveraging historical activity traces, we propose to set the classifier execution frequency proportional to the degree of variability of the user's past behavior. A sensor data push model from the mobile phone to the server is used. We believe the mobile phone is best positioned to manage the trade-off between limited local storage, data upload opportunity, and user expectations of application responsiveness.

Therefore, our default approach is to run activity recognition algorithms on the mobile device to reduce communication costs and also to reduce the computational burden on the server. However, we recognize that some classification algorithms may be too computationally intensive for handheld devices, and we propose to run a portion (possibly all) of the classifier on a back-end server in these cases. This may be particularly appropriate during the training phase of a classification model, or when data sourced from multiple users support the feature vectors input to the classifier. Low-fidelity classifiers requiring fewer resources can be used on mobile phones in the absence of connectivity or to reduce transmission costs.

2.4 Data Sharing and Group Activities

The focus of existing work on sensor-based human activity inference is on the analysis and processing of information collected by devices carried by or attached to a single individual [6, 12]. This inference process exploits only the information extracted by the local sensors. We believe that the system should leverage the sharing of raw or processed data among different people in physical proximity by means of short-range radio communication. This mechanism may also allow for a more accurate inference of group activities such as talking or moving together, improving their rendering in the virtual space [14]. Moreover, by means of this mechanism, individuals will be able to update not only their own status, but also those of their neighbors.

2.5 External Sensing Devices

In addition to sourcing data from phones, the activity recognition process may be augmented by means of external devices connected to the phones through the Bluetooth radio. These devices may be equipped with additional sensors in order to improve the accuracy of activity classifiers running on the phone, or to extend the set of actions that can be inferred. Having a small sensing module external to the phone (e.g., attached to the ankle) allows for more accurate activity inference (e.g., running) compared with using phone-embedded sensors, which may be ill-positioned (e.g., in the pocket).

For this purpose, we developed a badge-sized device, the BlueCel, integrating a set of sensors including accelerometer, light, temperature and Galvanic Skin Response (GSR) sensors. The GSR sensor can be used to infer emotional signals such as stress level. In our vision, the system should not rely on these devices, but on off-the-shelf platforms (i.e., the sensors integrated in the mobile phones). At the same time, it should be able to use them in a seamless and opportunistic manner when available.

2.6 Context Reflection

Virtual worlds are an excellent medium to represent environmental data. There is growing interest in displaying sensor data using common and popular media such as Web-based portals (e.g., the SensorMap project [10], the BikeNet system [1]). People-centric sensing is an ideal way of collecting data for these systems in real time. Data can be collected and displayed in virtual worlds such as Second Life in a powerful and captivating manner by changing the appearance of an environment according to the sensed data. For example, high humidity may trigger the visualization of rain or misty conditions. Pre-processing of the raw environmental data may be performed in the back-end. Users may also actively participate in the data collection process through the acquisition of multimedia content such as recordings of sounds and voices, photos and videos to be integrated in the virtual worlds. While these aspects are very promising, they are outside the scope of this work.

2.7 Privacy and Social Implications

To manage their exposure, users should be able to disconnect from the virtual worlds at any time or be able to interrupt the sampling of their activities. The actual positions and actions of users in the virtual world may be disclosed according to pre-configured user policies (e.g., only to one's listed buddies imported from other websites like Facebook [3]). The problem of location privacy is even more important in this class of systems, where personal information is exposed to a potentially large set of people. Additional privacy issues are related to the collection of environmental data by means of people carrying devices and, above all, data collected about them. In [5] the authors discuss important privacy and security challenges related to people-centric sensing.
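As an illustration of the pre-configured disclosure policies mentioned in Section 2.7, the following Python sketch gates status visibility on a per-user policy and on whether the user has interrupted sampling. The policy vocabulary and the data structure are hypothetical assumptions, not part of the CenceMe design.

    # Illustrative sketch (hypothetical policy format): disclose a user's
    # position/activity only to viewers allowed by the pre-configured
    # policy, e.g., buddies imported from an external social network.

    from dataclasses import dataclass, field

    @dataclass
    class PrivacySettings:
        buddies: set = field(default_factory=set)  # e.g., imported from Facebook
        policy: str = "buddies-only"               # or "everyone", "nobody"
        sampling_paused: bool = False              # user interrupted sampling

    def visible_status(user: PrivacySettings, viewer: str, status: str):
        # Nothing is disclosed while sampling is interrupted.
        if user.sampling_paused or user.policy == "nobody":
            return None
        if user.policy == "everyone" or viewer in user.buddies:
            return status
        return None  # withheld from everyone else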
3 System Prototype Implementation

In this section, we discuss the overall architecture of the system for the integration of activity information into virtual worlds such as Second Life and map services such as Google Earth. We first describe ongoing work related to the CenceMe [1] platform, which acts as the core service for the retrieval and the analysis of the sensed data in our architecture.

Bridging CenceMe and Virtual Worlds. We give a brief overview of the CenceMe platform in order to understand the key issues that underpin the proposed architecture; for more detailed information on CenceMe refer to [1]. CenceMe is a general personal sensing system that enables members of social networks to share their sensing presence, extracted by means of mobile phones, with their buddies. CenceMe is able to provide information about the personal status of its users and their surroundings. Similarly, by mining the sensed data, CenceMe can extract patterns and features of importance in a user's life routines, such as the current status in terms of activity (e.g., sitting, walking, standing, dancing, meeting friends) and physical or logical location (e.g., at the gym, at a coffee shop, at work, or at other significant places for the individual).

We have integrated the CenceMe back-end and the Second Life servers by means of a connector component, developed for the Second Life client, that receives inputs from the CenceMe back-end. The logical architecture of the system is shown in Figure 1. The Second Life Linux client is available as open-source software [8] in an alpha version release. CenceMe can be used to interconnect virtual worlds and social networking systems such as Facebook that have become extremely popular. Information can be fetched from social network sites and displayed in virtual environments (e.g., presence information about buddies/groups can be automatically reflected in Second Life). Virtual social events can also be organized between members of the same social networking community. Instead of using basic instant messaging, members may see what their friends and colleagues are doing (playing sports, walking, etc.) and their current geographical position by means of a three-dimensional visualization provided by a virtual environment rendering engine like the one used by Second Life. This may replace the current simple but rather cryptic idle/away/online status icons and text messages.

There is currently no publicly available API to create and move characters in map services such as Google Earth or Microsoft Live Search Maps. A proposal for merging Google Earth and Second Life exists [11] as far as inanimate objects are concerned. We plan to integrate CenceMe and these tools as soon as an API is made available.

Design and Preliminary Evaluation of the Prototype. We designed a preliminary prototype of a component bridging Second Life and CenceMe to better understand the limitations of our system design. The integration is based on the exploitation of virtual CenceMe objects that the Second Life avatar can wear or carry. This object alters the avatar animation to reflect the actual activity state of a human user carrying a cell phone that is running the CenceMe sensing application. The CenceMe object is implemented as a standard Second Life object that we program to communicate with the CenceMe infrastructure via XML-encoded requests and responses exchanged over HTTP. Second Life CenceMe object behavior is written in LSL (Linden Script Language [8]). The interactions with the CenceMe infrastructure use the same CenceMe external application extension APIs as other application plug-ins (i.e., Facebook, Pidgin) that have been constructed (as described in [1]). Objects acquire a new state by polling the CenceMe infrastructure at a rate that can be set by users (within ranges defined by CenceMe administrators). The avatar can disconnect from CenceMe by taking off the object.

The interface between the CenceMe system and Second Life largely uses off-the-shelf technologies. Java servlets hosted within a Tomcat server offer a set of APIs, made available via XML-RPC, for retrieving information about user activities from a MySQL database. The communication paradigm that we adopt between the Second Life and CenceMe platforms is pull-based, i.e., periodic requests are issued by a script running in the Second Life platform to the CenceMe server. Our implementation uses this API both to retrieve inferred activities about CenceMe users and to retrieve a specification that defines how these inferences are to be represented. A library of potential Second Life activities is maintained (e.g., walking, teleporting, yoga-floating) to which the user may choose to map specific real-world activities. These relationships are imported and applied to all incoming data. Real actions can be mapped to different actions in the virtual world (e.g., running mapped to flying).
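To make the pull-based exchange concrete, the following Python sketch mirrors what a periodic poller would do against the CenceMe back-end: issue an HTTP request and parse an XML answer carrying the latest inferred activity and its virtual-world counterpart. The endpoint URL and the response schema here are hypothetical; the real back-end exposes its API via XML-RPC served by Java servlets, as described above.

    # Illustrative sketch (hypothetical endpoint and XML schema) of the
    # pull-based exchange between the virtual-world side and the back-end.

    import urllib.request
    import xml.etree.ElementTree as ET

    BACKEND_URL = "http://cenceme.example.org/activity"  # hypothetical

    def poll_latest_activity(user_id):
        # Periodic request issued from the virtual-world side.
        with urllib.request.urlopen(BACKEND_URL + "?user=" + user_id) as resp:
            doc = ET.fromstring(resp.read())
        # Assumed response format:
        #   <status><activity>running</activity><virtual>flying</virtual></status>
        return doc.findtext("activity"), doc.findtext("virtual")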
Each virtual object within Second Life retrieves data independently from the CenceMe server. Each object is implemented using the LSL language, which can be used to specify object behavior within the Second Life environment. The code is compiled by Second Life servers into bytecode with a maximum memory usage limit of 16KB, and the bytecode is executed by virtual machines running on these servers. LSL is an event-oriented language and is well suited to implementing state-machine style solutions. In our case, it is used to implement reactions to changes in real-world activities by triggering modifications of the activities displayed in the virtual world.

In Figure 2, we show a fragment of the LSL implementation of the objects. The flow of execution is as follows: once the object is attached to the avatar, a periodic timer is initiated; in our experiment, the period of the timer was 0.5 seconds. The body of the timer causes the HTTP request to be made (see the timer() body in Figure 2). An XML-RPC based request for both the latest inferred action and the user-defined mapping from CenceMe inferred activities to Second Life actions is sent. An event handler for the returning response (see the http_response() block) first calls parseResponse(), which processes the incoming response, returning a list of potentially invoked avatar actions. This output is then provided to a second function, induceAction(), which determines which avatar action should be displayed.

    induceAction(list outputMapping) {
        ...
        if (body == MAPPING_D) {
            llStartAnimation(PHY_ACTIVITY_D);
            state = PHY_ACTIVITY_D;
        } else if (body == MAPPING_G) {
            llStartAnimation(PHY_ACTIVITY_G);
            state = PHY_ACTIVITY_G;
        }
        ...
        else {
            llStartAnimation(DEFAULT_ACTIVITY);
            state = DEFAULT_ACTIVITY;
        }
    }

    default {
        timer() {
            http_request_id = llHTTPRequest(cme_backend, parameter_list, default_request);
        }
        http_response(key request_id, integer status, list metadata, string body) {
            outputMapping = parseResponse(body);
            induceAction(outputMapping);
        }
    }

Figure 2. Code of the LSL script triggering the mapping of the activities
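For readers unfamiliar with LSL, the following Python analogue sketches the two steps performed by the script in Figure 2: parsing the response body into an activity-to-animation mapping, and selecting the animation to trigger. The response encoding shown is an assumption for illustration only; the actual exchange uses XML over HTTP as described above.

    # Illustrative Python analogue of parseResponse()/induceAction() from
    # Figure 2. The "key=value" encoding below is a hypothetical stand-in
    # for the real XML payload.

    def parse_response(body):
        # e.g., body = "running=flying;standing=standing"
        mapping = {}
        for pair in body.strip().split(";"):
            if "=" in pair:
                real, virtual = pair.split("=", 1)
                mapping[real.strip()] = virtual.strip()
        return mapping

    def induce_action(mapping, inferred_activity, default="standing"):
        # Fall back to the default animation for unmapped activities.
        return mapping.get(inferred_activity, default)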

As described above, we use a clothing metaphor to provide an intuitive way for people to configure and control the mapping of their actions. In Figure 3, we show a number of screenshots of an avatar that carries a CenceMe virtual object (in this case, a pager; see inset). In the figures, we show three states set by the CenceMe infrastructure and translated according to a user-defined mapping: sitting → yoga-floating (left), standing → standing (center), and running → flying (right). Details about the activity classifiers can be found in [1]. Alternative CenceMe clothing and accessories include significant places [1] t-shirts that display a logo graphic representing a location of significance in the user's life (i.e., the home, gym, pizzeria, etc.), and virtual mood rings that display the inferred emotional state of the individual.

[Figure 3 appears here. Caption: Figure 3. Second Life display of near-real-time inferred activities injected from CenceMe. Three states set by the CenceMe infrastructure are translated according to a user-defined mapping: sitting → yoga-floating (left), standing → standing (center), and running → flying (right).]

We also present some preliminary performance results for the integrated system. In Figure 4, we show the average delay experienced between performing the activity in the real world and displaying the inferred activity in Second Life. The experiment consists of a sequence of activities performed by users in repeated sessions at different times of the day (at noon, 4pm, and 8pm ET). We average the data over 15 sessions. The sequence of activities is standing still → walking → running → standing still. The transitions between states are reported as step curves. We observe an average delay of about 30 seconds in reporting the activity changes. This delay is due to the activity recognition process on the phones, the transmission of the inferred activity to the back-end, its retrieval from the CenceMe database, and its display in Second Life. The delay is mostly caused by the periodic inference process running on the phones and the uploading of this information to the CenceMe back-end. The delay due to the transmission from the CenceMe back-end to the Second Life servers, and the display of the activity information in Second Life, is negligible.

[Figure 4 appears here. Caption: Figure 4. Delay Measurements. The plot shows real-world and virtual-world transitions among the states standing still, walking, and running over time (0–180 seconds).]

We are augmenting our current implementation to: investigate controlling movement in the geographical space; define appropriate privacy policies; and develop mechanisms to address periods when the avatar becomes disconnected from the physical user, or when inferred user state accuracy falls below an acceptable bound.

4 Conclusion

In this paper, we have proposed to bridge real and virtual worlds using commercial mobile phones equipped with standard sensors. We have discussed a number of key issues that need to be tackled to realize our vision. We presented a preliminary implementation of this architecture, demonstrating a live system integration of CenceMe and Second Life. To the best of our knowledge, this is the first time that sensor data from everyday mobile phones, reflecting a person's activity in the physical world, has been expressed in Second Life.

Acknowledgment

This work is supported in part by Intel Corp., Nokia, NSF NCS-0631289, ARO W911NF-04-1-0311, and the Institute for Security Technology Studies (ISTS) at Dartmouth College. ISTS support is provided by the U.S. Department of Homeland Security under award 2006-CS-001-000001, and by award 0NANB6D6130 from the U.S. Department of Commerce. The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of any supporting agency.

5 References

[1] A. T. Campbell, S. B. Eisenman, N. D. Lane, E. Miluzzo, R. Peterson, H. Lu, X. Zheng, M. Musolesi, K. Fodor, and G.-S. Ahn. The rise of people-centric sensing. IEEE Internet Computing Special Issue on Mesh Networks, June/July 2008.

[2] C. Cruz-Neira, D. J. Sandin, and T. A. DeFanti. Surround-screen projection-based virtual reality: the design and implementation of the CAVE. In Proceedings of SIGGRAPH '93, pages 135–142, New York, NY, USA, 1993. ACM Press.

[3] Facebook. Facebook Website. http://www.facebook.com.
[4] Google. Google Sketch. http://sketchup.google.com/.

[5] P. C. Johnson, A. Kapadia, D. Kotz, and N. Triandopoulos. People-centric urban sensing: Security challenges for the new paradigm. Technical Report TR2007-586, Dartmouth College, Department of Computer Science, February 2007.

[6] J. Lester, T. Choudhury, and G. Borriello. A practical approach to recognizing physical activities. In Proceedings of PERVASIVE'06, May 2006.

[7] J. Lifton, M. Feldmeier, Y. Ono, C. Lewis, and J. A. Paradiso. A platform for ubiquitous sensor deployment in occupational and domestic environments. In Proceedings of IPSN'07, April 2007.

[8] Linden Research Inc. Second Life. http://secondlife.com/.

[9] Microsoft. Microsoft Virtual Earth. http://www.microsoft.com/virtualearth/default.aspx.

[10] S. Nath, J. Liu, and F. Zhao. SensorMap for Wide-Area Sensor Webs. Computer, 40(7):90–93, 2007.

[11] W. Roush. Second Earth. Technology Review, July/August 2007.

[12] S. Wang, W. Pentney, A.-M. Popescu, T. Choudhury, and M. Philipose. Common Sense Based Joint Training of Human Activity Recognizers. In Proceedings of IJCAI'07, January 2007.

[13] I. H. Witten and E. Frank. Data Mining. Morgan Kaufmann Publishers, 2005.

[14] D. Wyatt, T. Choudhury, and J. Bilmes. Conversation Detection and Speaker Segmentation in Privacy Sensitive Situated Speech Data. In Proceedings of Interspeech 2007, August 2007.