Contextual Presentation

Petteri Nurmi

University of Helsinki – Department of Computer Science

[email protected]

1. Table of Contents

1. Table of Contents
2. Abstract
3. Introduction
4. Content Adaptation
   4.1. Introduction
   4.2. The Content Adaptation Process
   4.3. Challenges in Content Adaptation
   4.4. Approaches to Content Adaptation
   4.5. Device Independence
5. Personalization
   5.1. Introduction
   5.2. Collaborative methods
   5.3. Content-based personalization
   5.4. Prediction methods for mobile applications
6. Applications
   6.1. Introduction
   6.2. Aura
   6.3. ParcTAB
   6.4. GroupLens
   6.5. Amazon
   6.6. SPOT Watch
7. References
8. Method references
9. Techniques references
10. Pictures index
11. Index

2. Abstract

3. Introduction

What is contextual presentation? In [Schilit-94] different categories such as proximate selection were given, but this offers a very limited view of the possibilities of using contextual information. Generally speaking, contextual presentation deals with the use of context-dependent information in applications and devices.

How can contextual presentation help? In this paper we consider only two aspects of contextual presentation, namely content adaptation and personalization. Content adaptation deals with the problem of providing different presentations of the same data depending on the device and network capabilities. Personalization deals with learning from the user. In the simplest case this means providing information the user might find interesting. A more complex possibility is to learn the dynamics of the user's behaviour and use this to help the user.

Contextual presentation is not a field of study in its own right at the moment. Instead, most research results go hand in hand with ubiquitous applications and devices. That is why we selected a large set of devices and applications and also offer a practical view of the problem at hand.

The paper is organized as follows. In section 4 we discuss what content adaptation is and why it is important. Section 5 describes various aspects of personalization based on the categorization by [Hirsch-02]. Section 6 deals with some interesting applications such as Aura [], ParcTAB [] and Amazon [].

4. Content Adaptation and Device Independence

4.1. Introduction

The amount of information on the Internet has grown rapidly. Most of the content is designed for desktop computers and is not suitable for devices with small screens, limited colour depth and so on. Web content designers usually use rich-media content on their web pages, which poses another problem: what if the bandwidth is limited? This problem is sometimes described as the "World Wide Wait" problem. Content adaptation tries to solve both problems by transforming the web content into a more suitable form that takes into account the user's personal preferences, the device capabilities and environmental parameters such as the available bandwidth.

Device independence deals with the problem of making content accessible from different kinds of devices. A simple scenario would be an office worker who is browsing the web in his office. Later in the evening he wants to visit a page he visited during the daytime, but now he only has a mobile phone offering web access. The page should still be available, and the quality of the transformed page should be such that the page remains readable.

4.2. The Content Adaptation Process

The content adaptation process starts when a client requests a web page. The order in which the different phases of the process occur is almost identical in every approach, so we give here a more general model of the adaptation process. Figure 5.1 illustrates the process.

Figure 5.1 The content adaptation process

After the request arrives at the server, we either determine the client's capabilities ourselves or send the requested contents to an external entity that queries the client's capabilities and then performs the transcoding operations. The problem arising here is what parameters we need and how to send them. One framework for delivering the parameters is CC/PP (Composite Capability/Preference Profile [CCPP]). CC/PP makes it possible to get the necessary device capabilities from the device vendors, and so it reduces the amount of data the client needs to send.

The parameters to use can be described in terms of user preferences, device capabilities and network parameters. Important user preferences include colour, timing, scaling and so on. Device capabilities can include the remaining energy, buffer size, network adapter speed etc. Network parameters such as bandwidth and round-trip time can be estimated from the sent headers.
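As a concrete illustration of how these three parameter groups might be represented, here is a minimal Python sketch; all class and field names (and the default values) are assumptions made for this example, not part of CC/PP or any cited framework.

```python
from dataclasses import dataclass

@dataclass
class UserPreferences:
    prefer_colour: bool = True      # user accepts colour content
    max_load_time_s: float = 5.0    # acceptable timing for a page load
    allow_scaling: bool = True      # images may be scaled down

@dataclass
class DeviceCapabilities:
    screen_width: int = 176         # pixels
    screen_height: int = 208
    colour_depth: int = 16          # bits per pixel
    remaining_energy: float = 0.6   # fraction of battery left

@dataclass
class NetworkParameters:
    bandwidth_kbps: float = 64.0    # estimated from headers / probing
    rtt_ms: float = 300.0           # estimated round-trip time

@dataclass
class AdaptationContext:
    user: UserPreferences
    device: DeviceCapabilities
    network: NetworkParameters
```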

After we have the content we want to transform and the parameters we need to take into account, we select a set of transformations to apply, and the resulting content is sent to the client. Typical transformation algorithms can be categorized as follows [Teeuw-01].

 Information abstraction: reduce bandwidth requirements but preserve the information that has the highest value to the user (e.g. by compression techniques).

 Modality transformation: offer different modes for the content, e.g. transforming video data into image data.

 Data transcoding: convert the data into a different (and possibly more suitable) format, for example transforming JPEG pictures into GIF pictures.

 Data priority: remove irrelevant data and preserve the data that is relevant and/or interesting to the user.

 Purpose classification: allow priority levels for the different objects on a page and order the objects by relevance. If the full content cannot be shown, remove the data that is redundant.

Figure 5.2 illustrates the usage of a section-outlining algorithm. The algorithm transforms the different section headers into hyperlinks that allow the user to retrieve the content of a section, while the actual text of the sections is removed.

Figure 5.2 Example of a transformation algorithm – Section outlining
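To give a feel for this kind of transform, the following Python sketch implements a crude section-outlining pass over an HTML string: it keeps only the headings, turns each into a link to a hypothetical per-section URL, and drops the body text. It assumes sections are delimited by <h2> tags; a real transcoder would work on a parsed document tree rather than on regular expressions.

```python
import re

def outline_sections(html: str, page_url: str) -> str:
    """Replace section bodies with hyperlinks to the sections.

    Assumes each section starts with an <h2>...</h2> heading; everything
    between two headings is treated as that section's body and removed.
    """
    headings = re.findall(r"<h2[^>]*>(.*?)</h2>", html, flags=re.S | re.I)
    links = []
    for i, title in enumerate(headings):
        clean = re.sub(r"<[^>]+>", "", title).strip()
        # Hypothetical convention: the full section is served at ?section=<i>
        links.append(f'<li><a href="{page_url}?section={i}">{clean}</a></li>')
    return "<ul>\n" + "\n".join(links) + "\n</ul>"

if __name__ == "__main__":
    page = "<h2>News</h2><p>long text...</p><h2>Weather</h2><p>more text...</p>"
    print(outline_sections(page, "http://example.org/page"))
```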

The transformation algorithms face the problem of selecting which information is relevant. Semantic analysis of the document is too slow and too error-prone. In [Wu-99] the need for context-categorization techniques is discussed. A trivial solution would be to assign priorities to the different objects and let these priority levels guide the transformation process.

After the content is delivered to the user, the user has the possibility to alter his/her profile. The network parameters can also change, so we cannot cache the parameter values; instead the process is started from the beginning the next time the user requests a page.

4.3. Challenges in Content Adaptation

The main challenge is to implement an architecture that provides web content to all devices with web capabilities, makes it possible to adapt to the different environmental and device-based limitations, and takes into account what the user wants. The system should be able to respond to changes in the environment, and it should offer some means of providing quality of service (QoS) based services.

Another important issue relates to privacy and copyright. The process should not raise new privacy-related threats, and the content providers should have some means of controlling the quality and the content that is delivered.

4.4. Approaches to Content Adaptation

The basic question is where the transcoding of the content is done. The possible solutions are to use client-side scripting, proxy-based transcoding, content-transforming intermediate servers or author-side transcoding.

Client-side scripting uses JavaScript ([JavaScript]) or other client-side scripting languages to perform some transcoding of the web content. Another client-side method is to implement the transcoding process in the web browser or to build "plug-in" applications that first transcode the data and then give the result to the web browser. This approach has some serious drawbacks:

 The full content is delivered to the client every time. If the bandwidth is limited, the transcoding process can only take into account the capabilities of the device.

 The number of different devices with web access is large, thus implementing a general authoring program is next to impossible.

 Small devices have limited memory and/or limited processing capabilities, which greatly restricts the complexity of the scripts.

Proxy-based transcoding is an intermediate solution. The device sends the request to a web proxy that handles the request, transcodes the page and returns the resulting page to the client. In [Han-98] a dynamic adaptation system is described that does image transcoding on a web proxy. The system is dynamic in the sense that it estimates the workload of the server and tries to perform transcoding only if there is enough computational capacity available. If the server does not have enough resources, the image is left out, which degrades the quality of the content but minimizes timeouts due to server overload.
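The following Python sketch mimics the flavour of such a load-aware decision: transcode an image only when the proxy has spare capacity and the client link is slow, otherwise pass it through or drop it. The thresholds and the function name are illustrative assumptions, not the actual policy of [Han-98].

```python
def decide_image_action(server_load: float,
                        bandwidth_kbps: float,
                        image_size_kb: float) -> str:
    """Return 'pass', 'transcode' or 'drop' for one image.

    server_load:    current CPU utilisation of the proxy, 0.0-1.0
    bandwidth_kbps: estimated downstream bandwidth of the client
    image_size_kb:  size of the original image

    The thresholds below are illustrative assumptions, not values from [Han-98].
    """
    transfer_time_s = image_size_kb * 8.0 / bandwidth_kbps
    if transfer_time_s < 2.0:
        return "pass"          # fast enough, no adaptation needed
    if server_load < 0.8:
        return "transcode"     # spare capacity: shrink/recompress the image
    return "drop"              # proxy overloaded: sacrifice the image

print(decide_image_action(server_load=0.5, bandwidth_kbps=32, image_size_kb=120))
# -> 'transcode'
```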

The main problems in proxy-based adaptation are the server overload problem discussed above and problems with copyright. The latter means that the resulting content may be unacceptable to the content provider. The following example illustrates the problem. Consider a web service that uses banner ads. The proxy may leave these banners out of the final version, but for the provider the visibility of the banners is important because it gets money from them.

The next approach discussed here is the usage of a specialized content-transformation server. The situation is illustrated in figure 5.2 c). This approach is not used in practice because of the serious problems it has. The following list should make it clear why this method is never used.

 Breaks end-to-end security; for example, hash signatures can no longer be used because the content is altered in between.

 The quality degradation may not be acceptable to the content provider.

 Security threats in the communications (man-in-the-middle etc.).

The final approach discussed here is the usage of a content-adaptation engine at the provider side. This approach is used for example in [Lum-02]. When the request arrives at the provider's web server, the context parameters (device capabilities, user preferences, network parameters) and the requested page are given to a content-decision engine that decides the optimal transformations and forms the resulting content. This is then sent to the client. The main advantage of this approach is that it allows the content provider to fully control the resulting content. The disadvantages are that it is expensive to have one's own content adaptation engine and that learning to adapt is nearly impossible to implement for the following reasons (a simple sketch of such a decision engine is given after this list):

 The content provider can’t learn the user’s global behaviour.

 The devices may not have enough storage and/or computational capacity for performing behavioural learning.
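To make the idea of a content-decision engine concrete, here is a minimal Python sketch that maps a few context parameters to a list of transformations. The rules, names and thresholds are assumptions for illustration only and do not reproduce the engine of [Lum-02].

```python
def decide_transformations(ctx: dict) -> list:
    """Pick transformations from the context parameters.

    ctx is expected to contain 'screen_width' (px), 'colour_depth' (bits)
    and 'bandwidth_kbps'; the rules below are illustrative only.
    """
    transforms = []
    if ctx.get("bandwidth_kbps", 1000) < 64:
        transforms.append("compress_images")       # information abstraction
    if ctx.get("screen_width", 1024) < 320:
        transforms.append("section_outlining")     # purpose classification
    if ctx.get("colour_depth", 24) <= 8:
        transforms.append("reduce_palette")        # data transcoding
    return transforms

print(decide_transformations({"screen_width": 176, "colour_depth": 8,
                              "bandwidth_kbps": 32}))
# ['compress_images', 'section_outlining', 'reduce_palette']
```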

Figure 5.2 The different approaches to content adaptation

a) Client-side scripting

b) Proxy-based transcoding

c) Intermediate server based transcoding

d) Provider (author) based transcoding

4.5. Device Independence

Device independence can be achieved by providing languages that allow device independence and have support for content adaptation. According to [Lemlouma-02] such a language does not exist. When speaking of device independence, the following techniques are usually mentioned: XHTML, XML, XSL and CSS ([XHTML] [XML] [XSL] [CSS]). The usage of this kind of languages would make independence on both sides possible. Because there exist many different devices with different capabilities, it is impossible (or at least too expensive) for the content provider to design the contents separately for every configuration. Device-independent languages use device-independent methods for showing the content on the client side, but they also allow the content provider to use conditional structures to retain some control over the result. A simple example is a web application that chooses the style sheet to use according to some client-side capabilities.

5. Personalization as a Method for Contextual Presentation

5.1. Introduction

In [Hirsch-02] personalization tasks are categorized into three main tasks: content-based prediction, collaborative methods and using the past to predict the future.

Consider an online news portal that has news from various categories. When a user is browsing the online content, he is usually interested in only a few categories, and the preferences for these categories vary from "highly interesting" to "somewhat interesting". Content-based prediction is the task of learning these interesting categories and using the extracted information to make it easier to access the categories the user might find interesting. Methods for content-based prediction are discussed in section 5.3. These methods are quite new, so that section only presents two research papers that deal with this subject [] [].

Collaborative methods are used to "automate the word of mouth". Consider for example a movie portal that allows people to rate different items. When a user reviews a film that he liked, he would probably like to get recommendations about films that might interest him/her. Collaborative methods look at the most similar users and make recommendations based on the items those other users have found interesting. Collaborative methods are discussed in section 5.2.

Using the past to predict the future means that history information is used to predict what the user (or application) might do next. In everyday computer use, users need to carry out monotonous action sequences. The sequences vary depending on the user, so no general model can be built to ease the situation. Instead, we can record the action sequences, try to predict the next actions and offer shortcuts that are easily accessible. Research so far has not used prediction methods very much, but they are surely a promising technique for the future. In section 5.4 some possible methods for prediction in mobile applications are discussed.

5.2. Collaborative methods

Collaborative filtering is widely used on commercial web sites such as Amazon and IMDB (Internet Movie Database). The benefits of collaborative filtering methods are mutual: the users are provided with information about items that might interest them, and the enterprises get a simple marketing method for trying to increase their sales.

Collaborative filtering has three main challenges that are listed below:

 Quality of recommendations.

 Computational cost of recommendation algorithm.

 The algorithms should be complete, meaning that every purchased/rated item in the database should be recommended at some point.

The quality of recommendations means simply that the users should be satisfied with the recommendations they get from the site. If the users are dissatisfied, they will be disappointed and will not use the system anymore. This easily leads to a situation where the algorithms only offer recommendations in which they have very strong confidence, i.e. the system recommends only items that it considers very interesting (probability > 0.9, say). This kind of scenario usually means that only a limited set of items is used for recommendations. Some web sites have very many (millions of) customers and products. Going through such a large dataset in real time is impossible, so the algorithms must be divided into offline and online parts, where the time complexity of the online part should be as small as possible.

The problem with many collaborative methods is that they easily lead to monotonous behaviour, as the system recommends the items with the most ratings/purchases. This leaves a set of products completely out of the picture and can lead to a situation where some users do not get any recommendations because they have only bought/rated rare items. This kind of situation should be avoided at all costs.

Figure 5.1 Making recommendations – overview of the process

A simplified overview of the recommendation process is shown in figure 5.1. The first phase of the process is to use information retrieval techniques [] to build a vector-space model of the customers and items. In clustering methods [] the first phase is to cluster the data and then perform the vector-space modelling. Because the data sets are sparse, dimensionality reduction [] techniques can be used to reduce the space requirements of the algorithms. Dimensionality reduction techniques remove coordinates that are irrelevant because of general measurement noise. One such technique is principal component analysis [].

The second phase of the process is to find the most similar users. For this phase some similarity metric is used. The most commonly used metrics ([][][]) can be seen in figure 5.2.

Figure 5.2 Similarity metrics
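Two metrics that are commonly used for this purpose are cosine similarity and Pearson correlation. The following Python sketch computes both over two users' rating vectors of equal length; the example ratings are invented.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def pearson_correlation(a, b):
    """Pearson correlation between two rating vectors of equal length."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    var_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (var_a * var_b) if var_a and var_b else 0.0

u1 = [5, 3, 0, 1]   # ratings of the same four items by two users
u2 = [4, 0, 0, 1]
print(cosine_similarity(u1, u2), pearson_correlation(u1, u2))
```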

The actual recommendation phase is quite easy once the similar users have been found. Usually this is done by calculating a summarizing vector of the similar users' group. This summarizing vector can be visualized as a bar graph, as seen in figure 5.3.

Figure 5.3 Summarizing vectors

The summarizing vector is normalized. The values of its coordinates can then be thought of as probabilities of interest, so the top-N values are selected and recommended.
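Continuing with the same setting, a rough sketch of the summarizing and top-N selection step could look as follows; the neighbour selection is assumed to have been done already, and a rating of 0 is taken to mean "not rated".

```python
def recommend_top_n(active_user, neighbours, n=3):
    """Average the neighbours' rating vectors, normalize, and return the
    indices of the top-n items the active user has not rated yet."""
    num_items = len(active_user)
    summary = [sum(u[i] for u in neighbours) / len(neighbours)
               for i in range(num_items)]
    total = sum(summary) or 1.0
    probabilities = [v / total for v in summary]      # normalized "interest"
    candidates = [i for i in range(num_items) if active_user[i] == 0]
    candidates.sort(key=lambda i: probabilities[i], reverse=True)
    return candidates[:n]

active = [5, 0, 0, 1, 0]
neighbours = [[4, 2, 0, 1, 5], [5, 1, 3, 0, 4]]
print(recommend_top_n(active, neighbours, n=2))   # [4, 1]
```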

If no pre-clustering of the data is used, the whole customer database has to be compared with the user. In the worst case this takes O(MN) time, where M is the number of items and N is the number of customers. None of the computations can be done offline, so if the site has a large customer and/or product base, these methods cannot be used. The quality of the recommendations is usually quite good, but they tend to be drawn from only a restricted set of items.

If pre-clustering is used, the data is first clustered [] offline and summarizing vectors that represent each group are formed. When the actual online computation is performed, the user vector is compared with the cluster vectors and the items are selected from within the clusters. The clustering can be done by assigning each user to only one cluster or by allowing users to belong to multiple clusters with some confidence value. The good thing about clustering is that it makes the computations more efficient, as the calculation can be divided into offline and online parts. The major drawback is that the quality of the recommendations is usually much worse than with standard collaborative filtering methods. The quality can be improved by a more fine-grained clustering, but this makes the online computations more time-consuming. Clustering also tends to recommend only frequent items, as the rare items get mixed up with the general measurement noise. Interestingly, none of the papers discussed the possibility of using multiple-cluster membership and then performing collaborative filtering within the most similar customers.

One way of forming recommendations is to use search-based methods. These usually search for popular items from the same author, the same actors and so on. This is the simplest way of doing the process and also the worst. The recommendations tend to be very general, and if the product base is very large this can lead to very large result sets. The good point of this approach is that it pushes all programming issues onto the programmers of the underlying database management system.

For large datasets and real-time recommendations, another way of calculating the recommendations must be used: item-based recommendation algorithms [][]. Amazon is an example of a site that uses this kind of method. The idea of the process is quite similar to user-based recommendations; this is illustrated in figure 5.4.

Figure 5.4 item-based recommendation algorithms – the process

Basically the process consists of first finding sets of items that tend to appear together. This data is used to calculate a similar-items table. The table construction phase consists of iterating through the items that occurred together with item i and calculating a similarity value using some similarity metric (see figure 5.2). When the user purchases or rates an item, the item similarity tables are used to find matches for that item, and from these the top-k items are recommended. This easily (and usually) leads to general recommendations, but the good thing is that the online calculations can be done very fast. At Amazon the offline calculation takes O(N^2 M) time in the worst case, so this complexity could be optimized. In general, item-based recommendation is a simple association-rule mining process [], so existing algorithms could be modified to support different similarity metrics, providing better offline performance and the possibility to customize the metrics depending on the task.

5.3. Content-based personalization

Content-based personalization is quite a new topic, so no general overview can be given. Currently the methods are based on clustering and probabilistic modelling. We present here two different research projects as introductory material on this topic.

The first project [] was done by Microsoft Research in 2000. First the data was clustered, and then the clusters were used to build simple Markov processes from the data. Two kinds of models were constructed. First-order Markov models were constructed for each cluster. For example, consider a situation where users request pages from the weather category after they have read sports news. This forms a simple two-state Markov process, and the observed frequencies can be used as transition probabilities. The other model was to construct unordered Markov processes, where the order of the visits does not matter and only the visited categories are of interest.
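Such a first-order model can be estimated simply by counting category transitions in the browsing sessions of a cluster and normalizing the counts into probabilities. The Python sketch below illustrates this idea generically; it is not the actual Microsoft Research implementation, and the session data is invented.

```python
from collections import defaultdict

def first_order_model(sessions):
    """Estimate P(next category | current category) from category sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            counts[current][nxt] += 1
    model = {}
    for current, nxt_counts in counts.items():
        total = sum(nxt_counts.values())
        model[current] = {nxt: c / total for nxt, c in nxt_counts.items()}
    return model

# Sessions of one user cluster: sports news are often followed by weather.
sessions = [["sports", "weather", "economy"],
            ["sports", "weather"],
            ["economy", "sports", "weather"]]
print(first_order_model(sessions)["sports"])   # {'weather': 1.0}
```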

The second project [] discussed here was done at the University of Helsinki, and it used a rather complex Bayesian model. The clustering was done in two ways, clustering both the pages and the users, and this data was used to construct a Bayesian network []. The variables of the network are illustrated in figure 5.5. The model gives the probability that a certain user belonging to a certain group will request a view that contains a given article from a given cluster of pages, and this information was used to customize the views for different users.

Figure 5.5 a two-way clustered Bayesian model

At the moment content-based personalization is an emerging technology that will probably be used widely in the future. People would prefer this kind of solution if it worked well enough, and thus commercial sites are interested in this kind of application. Accurate models for demographic and content-dependent personalization are difficult to build, which is why only a few sites can offer this kind of service. If a generic application could be built, the market possibilities would be huge.

5.4. Prediction methods for mobile applications

Predicting user behaviour in mobile applications leads to a better user experience, as the user does not need to repeat monotonous action sequences over and over again. No paper that considers models for this kind of application was found, so we only present some possibilities.

A simple method would be to use data mining techniques [][]. After a certain amount of action sequences has been collected, the log data of action sequences is used to generate association rules [] and their confidence levels. This method poses, for example, the following problems:

 Are the memory requirements too big? This depends on the amount of data that is stored and the number of sequences.

 How often would the mining (and updating) be performed?

 This offers only periodic learning.

The next possible method, which extends the previous model, is to use Bayes' rule, P(A | B) = P(B | A) P(A) / P(B), to update the probabilities after the first clustering phase. This allows us to replace the log data with the probabilities, which eases the memory requirements. Some possible problems:

 If the user does not use a shortcut key provided by the system, how is the updating controlled?

 Memory requirements?

Hidden Markov Models [] are quite good at modelling simple action sequences. With an HMM the next states can be predicted, and shortcut keys to the various possible following states can be offered. This is probably the best model for many situations, as it is reasonably simple to implement and does not require large amounts of memory. The interesting problem is how to learn the characteristics of the process. This can be seen as a Markov Decision Process (MDP) [], and reinforcement learning techniques can be used.
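As a simplified stand-in for a full HMM, the following Python sketch uses a plain first-order Markov chain over observed actions: it updates its counts online and proposes the k most likely next actions as shortcut candidates. The action names are invented for illustration.

```python
from collections import defaultdict

class ShortcutPredictor:
    """Suggests shortcut keys for the most likely next actions.

    A plain first-order Markov chain is used here as a simplified stand-in
    for an HMM; counts are updated online after every observed action.
    """
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))
        self.previous = None

    def observe(self, action: str) -> None:
        if self.previous is not None:
            self.counts[self.previous][action] += 1
        self.previous = action

    def suggest(self, k: int = 2):
        followers = self.counts[self.previous]
        return sorted(followers, key=followers.get, reverse=True)[:k]

p = ShortcutPredictor()
for a in ["open_mail", "reply", "send", "open_mail", "reply", "send",
          "open_mail", "delete"]:
    p.observe(a)
p.observe("open_mail")
print(p.suggest())   # ['reply', 'delete'] – 'reply' has followed 'open_mail' most often
```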

The last model discussed here is the use of Bayesian networks []. With Bayesian networks [] the system does not necessarily need to learn the distribution for the initial state, as user studies can be used to form an initial distribution, which is then updated depending on the user's actions.

6. Applications and Devices

6.1. Introduction

From the different applications we tried to select those that offer some practical insight into the theories presented above. Aura [] and ParcTAB [] are more ambitious projects, of which the first was done by Carnegie Mellon University and the second by Xerox. These offer a wider view of how contextual presentation can be used.

GroupLens [] and Amazon [] are typical examples of how personalization is and can be used. A more ambitious project that is not discussed here is the Lumière project [] by Microsoft. GroupLens is a web site that offers recommendations for NetNews, and Amazon is a web bookstore that offers recommendations for items to buy.

For a simple example of how location can be used to deliver content, we briefly discuss the commercial SPOT Watch project [], which was done in collaboration with Microsoft and is based on using radio signals to transmit information and a simple algorithm to filter the data depending on the location.

6.2. Aura

Project Aura is a research project at Carnegie Mellon University. The main goal is to provide a framework that supports effective use of resources and minimizes the need for user distraction in a pervasive computing environment.

The main idea is to divide the environment into four different components that each have their own specific tasks. An architectural overview of Aura is shown in figure 9.1.

Figure 9.1 Components of Aura in a certain environment

Every environment has two static components, the environment manager and the context observer. The dynamic components are the task manager and the service suppliers.

For the user, tasks are represented as collections of abstract services. An example of this is "edit text + watch video". First the task manager negotiates a configuration with the environment manager. After the negotiation phase the environment manager returns a handle to a service supplier. Using this handle the task manager can access the supplier that offers the required service.

When a user requests a certain kind of service, the environment manager looks through its database of service suppliers and selects the most appropriate one. The simplest form of selection can be illustrated by the following example. Assume that the user is using Linux and requests text editing. The environment manager then selects XEmacs, but if the user were using Windows it would have chosen Microsoft Word. The architecture allows more sophisticated control using XML [XML]. The service suppliers are basically different applications that are wrapped to the Aura API according to some parameters.
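In its simplest form, the platform-based selection described above could be little more than a lookup table from (abstract service, platform) to a registered supplier. The Python sketch below only illustrates this flavour of logic; it is not the actual Aura environment manager, and the registry contents are invented.

```python
# Hypothetical supplier registry: (abstract service, platform) -> application
SUPPLIERS = {
    ("text_editing", "linux"):     "XEmacs",
    ("text_editing", "windows"):   "Microsoft Word",
    ("video_playback", "linux"):   "mplayer",
    ("video_playback", "windows"): "Windows Media Player",
}

def select_supplier(service: str, platform: str) -> str:
    """Return a handle (here: just a name) to the most appropriate supplier."""
    try:
        return SUPPLIERS[(service, platform)]
    except KeyError:
        raise LookupError(f"no supplier for {service} on {platform}")

print(select_supplier("text_editing", "linux"))    # XEmacs
print(select_supplier("text_editing", "windows"))  # Microsoft Word
```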

What if the environment changes? We said that the task manager is responsible for reconfiguration, but we need some way to inform it that changes have occurred. This is done using the context observer. The context observer gets its information from the different sensors and uses this information to notify the task managers about changes in the environment.

6.3. ParcTAB

6.4. GroupLens

6.5. Amazon

Amazon is a typical example of an Internet web site that offers recommendations to the user. The algorithm Amazon uses is an item-to-item collaborative filtering algorithm (section 5.2). According to [Linden-03], Amazon had over 29 million customers and several million data items in January 2003. Because Amazon is a web site, it has to calculate the recommendations in real time, so offline processing is needed. The offline phase consists of building similar-items tables by finding items that customers tend to buy together. This is a frequent-itemset data mining problem, and existing algorithms offer effective means of finding these sets. The similarity between items is calculated using the cosine similarity metric. Because of this comprehensive offline calculation phase, the recommendations can be provided in real time.
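To make the offline phase more concrete, the following Python sketch builds a similar-items table from purchase baskets using cosine similarity over binary co-purchase vectors. It is a toy illustration of item-to-item collaborative filtering, not Amazon's actual algorithm; the expensive pairwise loop corresponds to the offline cost discussed above.

```python
import math
from collections import defaultdict

def build_similar_items(baskets, top_k=2):
    """baskets: list of sets of item ids bought together by one customer."""
    # Represent every item as the set of customers who bought it.
    buyers = defaultdict(set)
    for customer, basket in enumerate(baskets):
        for item in basket:
            buyers[item].add(customer)

    items = list(buyers)
    table = defaultdict(list)
    for i, a in enumerate(items):              # pairwise: the expensive part
        for b in items[i + 1:]:
            common = len(buyers[a] & buyers[b])
            if common == 0:
                continue
            # cosine similarity of the binary "who bought it" vectors
            sim = common / math.sqrt(len(buyers[a]) * len(buyers[b]))
            table[a].append((sim, b))
            table[b].append((sim, a))
    return {item: [b for _, b in sorted(pairs, reverse=True)[:top_k]]
            for item, pairs in table.items()}

baskets = [{"book_a", "book_b"}, {"book_a", "book_b", "book_c"}, {"book_c"}]
print(build_similar_items(baskets)["book_a"])   # ['book_b', 'book_c']
```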

6.6. SPOT Watch

FM radio signals can be used to send different kinds of data such as traffic forecasts, movie times, traffic alerts and advertisements. Figure 9.2 illustrates a simple scenario where proximity content is delivered to a wristwatch.

Figure 9.2

The SPOT Watch architecture listens to different radio stations and recognizes from the signal data those that send SPOT data. This recognition can be done by pre-programming the system to listen only to certain frequencies. The other way is to use some form of identification pattern in the signal data. Once the radio stations are known, an intensity vector of the signal strengths is extracted. Some filtering method can be applied to reduce the effects of noise.

The SPOT Watch uses the RightSPOT algorithm to infer the current location. The system does not try to determine the exact location but rather the area or neighbourhood where the user is. The locations of the radio transmitters are known in advance, so the strengths of the signals can be used to estimate where the user is. RightSPOT uses Bayesian inference to calculate conditional probabilities for the areas. This is illustrated in figure 9.3.

Figure 9.3
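The following Python sketch shows the general flavour of such an inference: given previously measured mean signal-strength profiles for a few areas, it scores an observed strength vector against each area with a naive Gaussian likelihood (uniform prior) and normalizes to get area probabilities. The profile values, the noise model and the area names are assumptions for illustration; the actual RightSPOT classifier is described in [Krumm-03].

```python
import math

# Hypothetical mean signal strengths (arbitrary units) of three FM stations
# as measured in three neighbourhoods beforehand.
AREA_PROFILES = {
    "downtown": [0.9, 0.2, 0.5],
    "suburb":   [0.4, 0.7, 0.3],
    "campus":   [0.6, 0.5, 0.8],
}
SIGMA = 0.15   # assumed measurement noise (standard deviation)

def area_probabilities(observed):
    """Posterior over areas with a uniform prior and independent Gaussian noise."""
    likelihoods = {}
    for area, profile in AREA_PROFILES.items():
        log_l = -sum((o - p) ** 2 for o, p in zip(observed, profile)) / (2 * SIGMA ** 2)
        likelihoods[area] = math.exp(log_l)
    total = sum(likelihoods.values())
    return {area: l / total for area, l in likelihoods.items()}

observed = [0.85, 0.25, 0.45]
posterior = area_probabilities(observed)
print(max(posterior, key=posterior.get))   # downtown (maximum likelihood area)
```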

From the probabilities a histogram is built. The area with the largest probability is the most probable (maximum likelihood) location at the moment, so it is selected. This information is used to filter the SPOT data from the radio signals.

7. References

[1] [Bickmore-97] Timothy W. Bickmore and Bill N. Schilit, Digestor: Device-Independent Access to the World Wide Web, Proceedings of the 6th World Wide Web Conference (WWW 6), 1997, pages 655–663.

[2] [Gerasimov-00] Vadim Gerasimov and Walter Bender, Things that talk: Using sound for device-to-device and device-to-human communication, IBM Systems Journal Vol. 39 Nos 3&4, 2000.

[3] [Han-98] Richard Han, Pravin Bhagwat, Richard LaMaire, Todd Mummert, Veronique Perret and Jim Rubas, Dynamic Adaptation in an Image Transcoding Proxy for Mobile Web Browsing, IEEE Personal Communications Magazine, December 1998.

[4] [Horvitz-99] Eric Horvitz, Lumiere Project: Bayesian Reasoning for Automated Assistance, Microsoft Research 1999.

[5] [Lemlouma-02] Tayeb Lemlouma and Nabil Layaïda, Device Independent Principles for Adapted Content Delivery, OPERA Project, 2002.

[6] [Linden-03] Greg Linden, Brent Smith and Jeremy York, Amazon.com Recommendations: Item-to-item collaborative filtering.

[7] [Lum-02] Wai Yip Lum and Francis C.M. Lau, A Context-Aware De- cision Engine for Content Adaptation, Pervasive-Computing 5:41-49, 2002.

[8] [Koll-01] Siva Kollipara, Rohit Sah, Srinivasan Badrinarayanan and Rabee Alshemali, SENSE: A Toolkit for Stick-e Frameworks, December 2001.

[9] [Krumm-03] John Krumm and Eric Horvitz, RightSPOT: A Novel Sense of Location for a Smart Personal Object, Microsoft Research Paper, Ubicomp 2003, Seattle.

[10] [Madhav-03] Anil Madhavapeddy, David Scott and Richard Sharp, Context-Aware Computing with Sound, Ubicomp 2003, Seattle.

[11] [Sousa-02] João Pedro Sousa and David Garlan, Aura: An Architectural Framework for User Mobility in Ubiquitous Computing Environments, Proceedings of the 3rd Working IEEE/IFIP Conference on Software Architecture, August 2002.

[12] [Teeuw-01] Wouter Teeuw, Content Adaptation, Telematica Institut.

[13] [Wu-99] Jon C.S. Wu, Eric C.N. Hsi, Warner ten Kate and Peter M.C. Chen, A Framework for Web Content Adaptation, Philips Research Paper, 1999.

8. References to used techniques

[1] [CCPP] Composite Capability/Preference Profiles (CC/PP): Structure and Vocabularies, W3C Working Draft, 28 July 2003.