

Modern Web and Video Technologies Survey for New Interactions

Jimmy Nyström, Nicklas Nyström and Peter Parnes
Luleå University of Technology

Index Terms—Compeit, Videoconferencing, Augmented Reality, Tangible interaction, 3D Video, Stereopsis, Motion Parallax, Mediated Gaze

[email protected] [email protected] [email protected]

CONTENTS

1 Introduction
2 Videoconferencing Systems
 2.1 Skype Group Video Calling
 2.2 Google+ Hangouts
  2.2.1 Google+ Hangouts API
 2.3 Adobe Connect
 2.4 Microsoft Avatar Kinect
3 WebRTC
 3.1 Standardization
 3.2 Deployment
 3.3 WebRTC Frameworks
  3.3.1 Licode
 3.4 WebRTC Support
4 Augmented Reality
 4.1 AR with visual input
  4.1.1 Video see-through
  4.1.2 Optical see-through
  4.1.3 Projective displays
 4.2 AR Frameworks
  4.2.1 OpenCV
  4.2.2 ALVAR
  4.2.3 Javascript AR Libraries
  4.2.4 KinectFusion
5 3D Video
 5.1 3D-Based Mediated Gaze
 5.2 Stereopsis and Motion Parallax
 5.3 3D Reconstruction
6 Browser Graphics
 6.1 WebGL
  6.1.1 ThreeJS
  6.1.2 BabylonJS
  6.1.3 Unity3D
  6.1.4 CSS3
 6.2 Vector Graphics
7 Tangible Interaction
 7.1 Tangible Interaction in Entertainment and Education
 7.2 TangibleAPI
8 Hardware
 8.1 Oculus Rift
 8.2 3D printed VR headsets
 8.3 Microsoft PixelSense
 8.4 Narrative
 8.5 Microsoft Kinect
 8.6 Leap Motion
 8.7 Arduino
 8.8 Smart Toys
  8.8.1 Sphero
  8.8.2 Sifteo cubes
  8.8.3 Bo and Yana
  8.8.4 Oriboo
9 Conclusions
References
Appendix A: Publication
 A.0.1 3D and Augmented and Virtual Reality
 A.0.2 Group Communication
 A.0.3 Distributed Applications and Cloud Computing
 A.0.4 Other Web Related Journals

Abstract—This document is the result of a survey of current technology and research in areas relating to an EU project called COMPEIT. Its purpose is to act as a starting point and reference for the development of the COMPEIT platform. The survey covers current videoconferencing systems, augmented reality technology, 3D video creation and visualization, WebRTC and WebGL frameworks, and hardware that could be of interest to the project.

1 INTRODUCTION

This project was done as part of an advanced course in computer science, and is closely related to the COMPEIT project described below. Its educational goals were to give us insight into state of the art research and technology in the fields relating to the COMPEIT project, and knowledge of available tools and frameworks. The work done was also meant to provide a starting point for further research in the subject area.

From the COMPEIT project description:

COMPEIT creates a web-based system for highly interactive, personalised, shared media experiences. Research and development will link content-delivery networks with tools for enhancing mediated presence. COMPEIT takes the view that Internet-based distribution will transform traditional broadcasting towards higher levels of interactivity and integration with virtual, mixed and augmented reality, enabled by advanced web technologies and the proliferation of audio/video/tangible devices. The project addresses Quality of Experience in flexible, interactive media production and consumption systems designed for professional collaboration and shared leisure activities. It introduces the next step in interactive broadcasting systems by focusing on technologies that enrich social connections, improve the feeling of being together in one shared space and enhance collaboration. Modular software will be developed based on low-cost, easily accessible web technologies (e.g. HTML5, WebRTC, WebGL), leveraging cloud-based software access and distribution. Three key domains are identified for improving quality of experience:

• Spatial Connectedness
• Social Connectedness
• Information Connectedness

COMPEIT develops and validates three key web services:

• Shared Experience with Tangible Interaction enables audiences to experience enhanced live media together, complemented by tangible and interactive games
• A Broadcast Presence Studio to enrich live media with various types of web-based content
• Mixed-Reality Interaction, an advanced WebRTC-based service where content generated by the Broadcast Presence Studio service can be mixed into the viewer's physical environment using ambient devices

Prototypes will be validated in an Experience Lab and via an online media distribution platform, complemented by a new national hospital which integrates real-time mediated communication to distributed care centres, private homes and classrooms.

2 VIDEOCONFERENCING SYSTEMS

Videoconferencing can be a great tool for mediating presence and has several benefits compared to physical meetings, especially when they can replace long business trips:

• Less traveling means less carbon dioxide emissions from transportation
• Cheaper
• Time saving
• Allows for greater flexibility

This section lists and discusses videoconferencing systems that are currently used.

2.1 Skype Group Video Calling

Skype Group Video Calling was launched in January 2011. It lets up to 10 people participate in a video chat session.

Every user can see all other users' videos and hear their voices simultaneously, and not just the active speaker. The video windows of active speakers are either highlighted or enlarged. Users can be added to the group conversation on the fly[28]. Figure 1 shows a Skype session in action.

Fig. 1. A Skype group video calling session

2.2 Google+ Hangouts

Google+ Hangouts has many similarities with its Skype counterpart: it was launched in 2011, allows up to ten people to participate in a group video chat, all users' videos are visible at the bottom of the window, and the participant who is currently speaking is shown in an enlarged video window. Participants can also watch Youtube clips or work on Google documents together while video chatting. It has a casual invite model. When a user starts a video chat, this is announced to friends, a circle of acquaintances, or to the public, depending on what the user wants. The public has adopted Hangouts in many creative ways, by creating events like concerts, online classes and writing groups.[28]

2.2.1 Google+ Hangouts API

The Google+ Hangouts API lets developers build videoconferencing apps that run inside a Hangout. The API provides real-time functionality, such as synchronization of data between users and - of course - audio and video communication. This could be interesting to look into when developing the COMPEIT platform, partly to see how their API is structured but also for rapid prototyping.

2.3 Adobe Connect

Adobe Connect is a Flash-based web conferencing platform for web meetings, eLearning and webinars. Meeting rooms are organized into pods performing specific roles, like chat, whiteboard or note. The following features are available:

• Unlimited and customizable meeting rooms
• Multiple meeting rooms per user
• Breakout sessions within a meeting
• VoIP
• Audio and video conferencing
• Meeting recording
• Screen sharing
• Notes, chat, and whiteboards
• User management, administration, and reporting
• Polling
• Central content library
• Collaboration Builder SDK

2.4 Microsoft Avatar Kinect

Microsoft Avatar Kinect was released as part of Kinect Fun Labs in 2011. It lets users socialize in themed virtual environments. Each user is represented by an avatar which mirrors the user's movements and facial expressions through the use of a Microsoft Kinect (see Figure 2). Up to eight people can participate in a session. The focus is on creating content from your own living room. The group sessions can be recorded, edited and shared on Facebook. Users can choose between 24 different virtual environments for their sessions to take place in, and the activities are filmed by a virtual producer. Usability studies have shown that Avatar Kinect resonates more with younger users. Also, users are more inclined to use it to express themselves creatively than as a video chat system.[28]

Both Skype and Hangouts can show different video streams in the main window based on what is going on in the conversation, but the idea of a virtual producer takes this one step further. Showing different viewing angles depending on the situation is easier in a virtual environment, but the feature could be translated to regular videoconferencing if more cameras were incorporated in the session.

Fig. 2. Microsoft's Avatar Kinect

3 WEBRTC

WebRTC is an open-source project enabling web applications to incorporate audio, video and data Real-Time Communication (RTC) directly in the browser without relying on external plugin software. The three major components of the Javascript API are:

1) MediaStream (getUserMedia) - camera and microphone access
2) PeerConnection - sending and receiving media
3) DataChannels - sending non-media data directly between browsers

WebRTC is a central component of the COMPEIT project as the communications platform will be built on top of it. A minimal usage sketch of these three components is given below.
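The following sketch shows the three components in plain JavaScript. It is written against the current, promise-based shape of the API; in 2013 the same calls were vendor-prefixed (e.g. webkitGetUserMedia) and callback-based. The object named signaling is a hypothetical, application-defined channel (for example a WebSocket) for delivering the session description to the other peer - WebRTC deliberately leaves signaling to the application.

    // Minimal WebRTC sketch: media capture, peer connection, data channel.
    var pc = new RTCPeerConnection({ iceServers: [] });

    // 3) DataChannel: non-media data sent directly between browsers.
    var chat = pc.createDataChannel('chat');
    chat.onopen = function () { chat.send('hello'); };

    // 1) MediaStream: ask the user for camera and microphone access.
    navigator.mediaDevices.getUserMedia({ video: true, audio: true })
      .then(function (stream) {
        // 2) PeerConnection: attach local media and create an offer.
        stream.getTracks().forEach(function (t) { pc.addTrack(t, stream); });
        return pc.createOffer();
      })
      .then(function (offer) { return pc.setLocalDescription(offer); })
      .then(function () {
        // Deliver the offer to the remote peer over the app's own channel.
        signaling.send(JSON.stringify({ sdp: pc.localDescription }));
      });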

3.1 Standardization

The standardization process of WebRTC is ongoing, with work being split between several working groups in the World Wide Web Consortium (W3C) and the Internet Engineering Taskforce (IETF), responsible for the Javascript API and the protocol definitions respectively. The standard is still in flux, in part due to an ongoing discussion on which video codec should be Mandatory to Implement (MTI). Two codecs, VP8 and H.264, are proposed as MTI, but consensus has at the time of writing not yet been reached. VP8 is (allegedly) free of licensing fees while H.264 uses patented technology. On the other hand, H.264 is widely used on the web and many mobile devices contain H.264 decoder hardware, which can significantly reduce power consumption.

WebRTC was originally a Google initiative and lacks support from many of the large competing corporations: Apple is seemingly uninterested in the standard and their browser, Safari, offers no support for WebRTC. Microsoft has even proposed an alternative standard called CU-RTC-WEB. The constantly evolving standard causes problems for developers, as every new version of a browser handles WebRTC slightly differently and WebRTC applications risk breaking with every browser update. Using a WebRTC framework to work around this problem is discussed in Section 3.3.

3.2 Deployment

When deciding how to deploy a WebRTC-based videoconferencing system or developing a communications platform, the following needs to be considered:

• Peer-to-peer communication or Multipoint Control Unit?
 – With P2P, the server only needs to initiate communication, but it puts high bandwidth demands on the users if there are many people in the conference (each user has to stream audio/video/data to every other user).
 – With an MCU, each user connects to a single server, demanding a lot less bandwidth. Of course, this option puts a lot more strain on the server.
 – Which one to use depends on the use-case. P2P is best suited for video chats with a limited number of participants (n <= 4). With a larger number of participants, the upstream bandwidth requirements become too great for the typical user, in which case an MCU may be a better option.
• Scalability - what happens if the number of users suddenly increases tenfold?
 – Deploying a system on a cloud platform allows it to scale automatically. New (virtual) servers can be set to start and stop as needed depending on certain metrics, such as bandwidth or CPU consumption, or the number of simultaneous connections to the server.
• Availability - strategic placement of the servers is important: for example, Luleå may not be the best place for a media relay server if most users are situated in Israel.
• TURN server? The typical internet user sits behind a network address translator (NAT) and/or a firewall, which means that they do not necessarily know their public IP address. The STUN protocol allows computers to set up peer-to-peer like connections by revealing their respective public IPs. However, depending on the type and configuration of the NATs and firewalls, this is not always possible. In such cases, data can be relayed by a third party server using the TURN protocol. A robust communications solution should use STUN with TURN relay servers as fallback when peer-to-peer is not possible (see the configuration sketch after this list).
• Cost - an autoscaling, MCU based videoconferencing system can be very costly when traffic is high.
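In the browser, the STUN/TURN fallback described above boils down to the server list handed to the peer connection; the browser's ICE implementation then tries direct candidates first and only falls back to the relay. The hostnames and credentials below are illustrative placeholders, not real servers.

    // ICE configuration with a STUN server and a TURN relay as fallback.
    var pc = new RTCPeerConnection({
      iceServers: [
        { urls: 'stun:stun.example.org:3478' },   // discover our public IP
        {
          urls: 'turn:turn.example.org:3478',     // relay of last resort
          username: 'demo',
          credential: 'secret'
        }
      ]
    });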
3.3 WebRTC Frameworks

The WebRTC API in modern browsers is constantly and rapidly evolving. Code that works today is likely to be broken tomorrow. One way to alleviate this problem is to build applications using a higher-level client side API with a more stable interface, where the underlying code is maintained by other developers. Several such frameworks are described below. Even if the partners of the COMPEIT project end up building their own WebRTC framework, providing exactly the functionality that is needed, it might be a good idea to start prototyping using an existing framework; less work will be required to get started, and the partners will learn what features are useful and which ones are in need of improvement or missing.

3.3.1 Licode

Licode (previously called Lynckia) is an open source WebRTC communications platform, which can act as a Multipoint Control Unit as well as enable peer-to-peer connections between users. Licode provides virtual conference rooms for sharing of video, audio and data through JavaScript APIs for both server and client side applications. Since 2013, it can be deployed with autoscaling in the cloud, for example using Amazon AWS. Licode was used in our master's thesis project[31]. The platform was relatively easy to use and maintain. The versions of Licode that we have tested so far are somewhat unreliable, due to problems with browser interoperability causing the MCU to crash occasionally. However, the large community of developers works continuously to keep up with the changing standard.

3.3.2 OpenTok/Mantis

Like Licode, OpenTok/Mantis is scalable and cloud based. It dynamically adapts the bitrate of the encoded stream to the current network conditions. It is highly interoperable, supporting Google Chrome and Firefox as well as Internet Explorer through Chrome Frame. It also works with Chrome on Android, and it can be run in native applications on iOS. Unlike Licode, it is not free of charge.

3.3.3 Weemo

Weemo offers cloud-based real-time communication services through JavaScript APIs as well as SDKs for Android and iOS. If a browser without WebRTC support is used, Weemo users can communicate via WebSockets Secure or HTTPS Long Poll to a lightweight communications driver. Weemo's own cloud is used, and the service is not free of charge.

3.3.4 EasyRTC

EasyRTC is another open source WebRTC platform based on Node.js. Aside from featuring video and audio communication, it supports messaging using both RTCDataChannel and WebSockets (via the Socket.io module). It won the Best WebRTC Tools Award at the WebRTC Expo and Conference in November 2012, as well as the Editor's Choice Award in June 2013.[42] EasyRTC is suitable for fast development.

3.3.5 SimpleWebRTC

SimpleWebRTC is comprised of a set of independent modules, each providing a specific feature. At SimpleWebRTC.com, sandbox servers are available for development and testing. For production, it is possible to build your own signaling server based on their Signalmaster module.

3.4 WebRTC Support

WebRTC is currently supported in the following browsers:

• PC: Google Chrome 23, Mozilla Firefox 22, Opera 12
• Mac OS: Google Chrome 23, Mozilla Firefox 22, Opera 12
• Android: Google Chrome 28 (enabled by default since 29), Mozilla Firefox 24, Opera Mobile 12

As WebRTC still lacks support on many combinations of platforms/browsers, developing a framework that works for everyone requires effort, and interoperability between browsers poses further challenges.

4 AUGMENTED REALITY

Mixed-Reality Interaction is one of the web services that will be developed in the COMPEIT project. It will be an advanced WebRTC-based service where content generated by the Broadcast Presence Studio service can be mixed into the viewer's physical environment using ambient devices. Augmented reality (AR) enhances our perception of reality by making virtual objects appear to exist in the same space as the real world. AR is gaining adoption, and will likely continue to do so, supporting us in fields like education, maintenance, design, reconnaissance and more.[22] AR systems have the following properties:

• They combine real and virtual objects in a real environment
• They align real and virtual objects with each other
• They run interactively, in three dimensions, and in real time

AR can be used to enhance any of our senses with additional information, but the ones that are commonly applied are sight, sound and touch. Audio in AR systems is usually delivered through mono, stereo or surround sound headphones and loudspeakers. True 3D audio can be found in some virtual environments. AR systems based on sight are described in detail below.

4.1 AR with visual input

Visual AR can be divided into three methods: video see-through, optical see-through and projective displays. In video see-through, virtual objects are displayed on a live video feed of reality. In optical see-through, the AR objects are displayed as an overlay on top of the real world by means of lenses or low-power lasers drawing directly on the retina. Lastly, the AR overlay can be projected onto the real world. There is a fourth method of visual AR which is not yet available for the masses: true 3-dimensional displays. Such systems have been created; [3] developed a true 3D system which could display 1000 points per second using plasma in the air as early as the year 2000. The currently used methods are described in some detail below.

4.1.1 Video see-through

Real objects can be completely removed from view. It is easy to match the brightness and contrast of virtual and real objects. Also, perception delays of the real and virtual world are easily matched. The disadvantages of this method: the real world gets a limited resolution, and users may get disoriented due to the fact that the camera is offset from the user's eyes. Video see-through is used commercially, for example in Nintendo's handheld gaming console Nintendo 3DS (see Figure 3).

Fig. 3. An AR application running on a Nintendo 3DS

4.1.2 Optical see-through

The resolution of the real world is unchanged. Optical see-through displays are also safer than video see-through, since users can still see if the power fails. However, occlusion of real objects becomes more difficult, as the light of the real object is combined with the virtual image. This problem can be overcome; [5] solved it by using an overlay that can be opacified per pixel. Optical see-through displays usually have problems with low brightness and contrast on both the virtual and the real world.

However, retinal scanning displays, where the AR images are drawn directly on the retina, solve this problem. Thanks to their low power-consumption, these can also be suitable for outdoor use. [17] developed such a system which featured full color binocular AR vision with dynamic refocus.

4.1.3 Projective displays

On the positive side, projected AR displays do not require any eye-wear, and they do not cause any problems when users' eyes refocus. However, the projectors need to be recalibrated whenever the environment changes (this can be automated). Also, projective displays are limited to indoor environments, since projected images tend to have a low brightness and contrast. Occlusion of real objects by virtual ones becomes difficult. This can be dealt with by using head-worn projectors and covering surfaces with retro-reflective material, which reflects the projected images directly back to the light source (close to the user's head) with a minimum of scattering. This method is illustrated in Figure 4. Figure 5 shows how it works.

Fig. 4. Projection display used to camouflage an object coated with retro-reflective material

Fig. 5.

One example of a system using projective displays is CAVE (Cave Automatic Virtual Environment). It is a virtual reality room where stereoscopic images are projected onto the walls. Using 3D glasses, which are synchronized with the alternating projector images, these walls become 3D displays. Speakers are placed around the room, producing surround sound.[38]

4.2 AR Frameworks

4.2.1 OpenCV

OpenCV (Open Source Computer Vision Library) is an open source library for computer vision and machine learning, providing a common infrastructure for computer vision applications. It has interfaces for C, C++, Java, MATLAB and Python and supports Android, Linux, Mac OS and Windows. It has more than 2500 optimized algorithms, including both classic and state-of-the-art computer vision algorithms for identifying objects and faces, classifying human actions in videos, tracking camera movements, tracking moving objects, extracting 3D models of objects, producing 3D point clouds from stereo cameras, following eye movements, establishing markers (to be overlaid by AR), and more. OpenCV is released under the BSD license, and is widely used in companies, research groups and by government bodies.[33]

4.2.2 ALVAR

ALVAR is a VR and AR library developed by VTT Technical Research Centre of Finland. It is built using OpenCV, and designed to be flexible and easy to work with. It offers high-level tools and methods as well as interfaces for all the low-level methods. ALVAR features include:

• Marker based tracking
 – accurate marker pose estimation
 – two types of square matrix markers
 – recovering from occlusions
• Using multiple markers for pose detection
 – the marker setup coordinates can be set manually
 – or they can be automatically deduced by autocalibration
• Markerless tracking
 – feature-based (tracking features from the environment)
 – template-based (matching against predefined images or objects)
• Other
 – hiding markers from view
 – tools for calibrating cameras
 – several methods for tracking optical flow
 – distorting/undistorting points, projecting points
 – finding exterior orientation using point-sets
 – Kalman library and several other filters

It is released under the GNU Lesser General Public License, version 2.1 or later, and is currently available on Windows and Linux.[35]

4.2.3 Javascript AR Libraries

JSAruco and JSARToolkit are two Javascript libraries for augmented reality applications. They both provide the web browser with functionality for tracking markers (of the type seen in Figure 6) in video streams. Each code represents an integer between zero and 1023. JSAruco is a lightweight library for marker detection and 3D pose estimation, while JSARToolkit offers some more advanced features. We used JSAruco for calibration when developing the mediated sketching table in The Tangibles project [40]. Detecting markers in a projected image proved difficult, but under good lighting conditions the marker detection works quite well.

Fig. 6. Fiducial marker used for augmented reality
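As an illustration, the sketch below runs JSAruco's detector over webcam frames. It assumes the js-aruco library (the AR global) is loaded and that "video" and "overlay" are hypothetical element IDs for a playing webcam video and a canvas of the same size; the detector works on canvas pixel data.

    // Detect fiducial markers in webcam frames with js-aruco (sketch).
    var video = document.getElementById('video');
    var canvas = document.getElementById('overlay');
    var ctx = canvas.getContext('2d');
    var detector = new AR.Detector();

    function tick() {
      requestAnimationFrame(tick);
      if (video.readyState !== video.HAVE_ENOUGH_DATA) return;
      ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
      var imageData = ctx.getImageData(0, 0, canvas.width, canvas.height);
      var markers = detector.detect(imageData); // each has .id and .corners
      markers.forEach(function (m) {
        // Draw the outline of each detected marker.
        ctx.beginPath();
        m.corners.forEach(function (c, i) {
          i === 0 ? ctx.moveTo(c.x, c.y) : ctx.lineTo(c.x, c.y);
        });
        ctx.closePath();
        ctx.strokeStyle = 'red';
        ctx.stroke();
      });
    }
    tick();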
4.2.4 KinectFusion

KinectFusion[25] enables geometry-aware augmented reality, where virtual objects can interact with the reconstructed scene in real-time. This is illustrated in Figure 7. Even small and arbitrarily shaped parts of the reconstructed scene interact with the virtual objects with compelling results. A moving Kinect is used for the reconstruction. The reconstructed model is highly detailed; the DELL logo engraving is less than 1 mm deep.

Fig. 7. Interactive simulation of physics on reconstructed 3D model

5 3D VIDEO

Using 3D models in videoconferences allows for super realism: for example manipulating the perceived line of sight of participants to create reciprocal gaze, creating a sense of depth by viewpoint-yoked rendering, or placing the participants in a virtual world. Several attempts have been made to develop videoconferencing systems with three-dimensional representations of the participants, presented either directly[31] or in a virtual world[8]. This section discusses methods of creating 3D models of physical objects from camera images and different ways of using such models to enhance videoconferencing.

5.1 3D-Based Mediated Gaze

Gaze, or where someone is looking, is a vital part of body language. It is helpful for giving feedback and expressing feelings and attitudes[13]. The ability to see where someone else is looking over a video link is called mediated gaze. In a multi-party videoconference, mutual gaze is necessary to determine who the other participants are looking at, and whom they are addressing. It also helps direct turn-taking in conversation. Most current videoconferencing systems do not allow for mediated gaze. It is possible that this may have been a limiting factor in their social acceptance[11]. A typical videoconferencing system uses a webcam placed above or below the screen. This means that when you look directly at someone's eyes on the screen on such a system, to them it will appear as if you are looking away slightly.

In our joint master's thesis, an attempt was made to achieve mediated gaze by displaying 3D video of the other user, with the view rotated to compensate for the camera/screen offset[31]. This was done directly in the browser using WebGL. True mediated gaze was never achieved because of a too low resolution (160x120 pixels). In [10], the authors attempted to solve this using a software approach. Users were filmed, and the video of the face was texture-mapped onto a simple 3D head model. Furthermore, their eyes were replaced with synthetic eyes. This way, head orientation and gaze direction could be properly conveyed.

5.2 Stereopsis and Motion Parallax

The human brain determines depth mainly through stereopsis; the two slightly different images received by the eyes are compared, and the disparity is used to calculate distances. The brain also uses another depth cue, called motion parallax. When observing moving objects, those that are closer seem to move faster than those that are far away. This difference in speed is used to determine the distance to objects.

Stereoscopic displays are common nowadays; most (high-end) TVs sold today are 3D-capable. They create a sense of 3D by presenting different images to each eye, normally requiring the viewer to wear a pair of glasses. Active shutter 3D systems alternatingly show two different viewpoints of an image, while the glasses worn by the viewer block one eye at a time. Such a pair of glasses can be seen in Figure 8. This requires the glasses to be synchronized with the TV in order to block the correct eye at the right time.

Fig. 8. Active shutter 3D glasses

Polarized 3D systems project the two viewpoints simultaneously, and polarized glass in the glasses lets through different images to each eye. This method is used in 3D cinema, since it does not require any synchronization between the projector and the glasses, which would be impractical with so many viewers. Anaglyph 3D is the classic method of encoding the two viewpoint images in different colors, typically red and cyan. The main advantage of this method is that it is very simple and can be reproduced on any screen, even printed on paper. Autostereoscopic systems display 3D images without the use of special glasses. The upper screen of the Nintendo 3DS is an example of such a display; it uses a so called parallax barrier to display different images to each eye on the same display. A nice 3D effect is achieved, but the viewing angle and distance is critical.

The fact that the images presented to each eye are predetermined by the media being displayed on the screen can be disturbing to a viewer, as (s)he might expect the focus to shift when looking at something in the background of the image, or the viewpoint to change when (s)he moves around.

Though not nearly as effective as stereopsis[19], motion parallax produces a consistent, reliable and unambiguous impression of depth[1]. A video screen that changes its viewpoint based on a user's line of sight to create motion parallax can be used to take advantage of this phenomenon. This can be achieved on a non-3D display by using head- and/or eye-tracking to estimate the position of the viewer and render content on the screen accordingly[31][12]. A drawback of this approach is that it limits the number of viewers to one person (at least on a normal display), since the rendered content can only be adjusted to one viewpoint at any given time.
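This idea can be sketched by combining the headtrackr.js library (mentioned in Section 8.5) with a WebGL framework camera. The library estimates the head position relative to the webcam, in centimeters; feeding that into the virtual camera produces the parallax effect. The element IDs, the divisor used to scale centimeters into scene units, and the Three.js-style camera and scene objects are all assumptions of this sketch.

    // Motion parallax on a normal display: move the virtual camera with
    // the viewer's head. Assumes a Three.js `camera` and `scene` exist
    // and headtrackr.js is loaded; "video"/"buffer" are placeholder IDs.
    var videoInput = document.getElementById('video');
    var canvasInput = document.getElementById('buffer');
    var htracker = new headtrackr.Tracker();
    htracker.init(videoInput, canvasInput);
    htracker.start();

    // headtrackr reports head position in cm relative to the webcam.
    document.addEventListener('headtrackingEvent', function (event) {
      camera.position.set(-event.x / 20, event.y / 20, event.z / 20);
      camera.lookAt(scene.position); // keep looking at the scene center
    });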
5.3 3D Reconstruction

Constructing 3D models of physical objects can be done in different ways, but the most common approach is view synthesis - that is, using images of a subject from several different viewpoints to generate a 3D model.

One such method is to extract the subject's contour from the different viewpoints and intersect them to create a "visual hull" (IBVH) [4]. Researchers at HP Labs developed a videoconferencing system called Coliseum that places live 3D models of participants in a virtual world. They used five synchronized cameras to capture different views of the user and a method based on IBVH called Image-Based Photo Hulls [9] to generate 3D models from the images. One of the papers published on the system [12] describes their solutions to many of the challenges involved in generating and streaming 3D content in real-time.

Another method of generating 3D models is to compare the relative movement of pixels between images from different views to calculate a disparity map. The disparity map translates into a 3D point cloud that can be used to reconstruct the 3D surface of the subject. A depth sensor, such as Microsoft's Kinect, can be used to obtain a depth map (point cloud) directly. The Point Cloud Library[43] provides an extensive set of state-of-the-art algorithms for image and point cloud processing, such as 3D surface reconstruction, feature estimation and registration (combining several point clouds into one).

Several other approaches have been investigated. For example, a physical environment can be reconstructed as a computer model using passive cameras[27], online images[20][16] or from unordered 3D points[15], [26].

In [25], the authors created a system called KinectFusion. It is a real-time 3D reconstruction and interaction system which performs 3D tracking, reconstruction, rendering and interaction. This is achieved through the use of a Microsoft Kinect and novel extensions to the GPU pipeline developed by the authors. The 3D scene is built in real-time as the user moves around with the Kinect camera. The system tracks the Kinect, continually fusing live depth data into the global 3D model of the environment. The resulting reconstructions are of a high quality, considering the low quality of the input data and the real-time requirements. Any surface that has been reconstructed with KinectFusion can be turned into a multi-touch surface. Figure 9 shows KinectFusion in action.

Fig. 9. Left: Raw Kinect data. Right: Reconstruction

It is also possible to extract 3D information from a single 2D image. New 3D-capable Blu-ray players can automatically convert a 2D movie to 3D, although the results are not always accurate. [39] developed a learning-based 2D to 3D conversion algorithm using lookups in a repository of reference data. Their algorithm runs relatively fast and gives reasonable results. How well it adapts to a real-time environment, such as a videoconference, remains to be seen. A survey of technologies for converting 2D to 3D can be found in [14].
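The disparity-map approach rests on a simple relation for a rectified stereo pair: depth z = f * b / d, where f is the focal length in pixels, b the baseline between the cameras, and d the per-pixel disparity. The constants below are illustrative calibration values, not from any particular rig.

    // Depth from disparity for a rectified stereo pair: z = f * b / d.
    const FOCAL_LENGTH_PX = 700; // assumed focal length in pixels
    const BASELINE_M = 0.06;     // assumed 6 cm between the two cameras

    function depthFromDisparity(disparityPx) {
      if (disparityPx <= 0) return Infinity; // zero disparity = infinitely far
      return (FOCAL_LENGTH_PX * BASELINE_M) / disparityPx;
    }

    // A pixel that shifts 21 px between the views lies about 2 m away:
    console.log(depthFromDisparity(21)); // 700 * 0.06 / 21 = 2.0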

Autodesk's free 123D Catch app (for iPhone and iPad) lets you take up to 40 photos of an object and upload them to the cloud, where a digital 3D model of the object is automatically constructed for you.

6 BROWSER GRAPHICS

Modern browsers are becoming as capable as native applications when it comes to rendering graphics. This section describes recent developments in browser graphics.

6.1 WebGL

WebGL is a technology that will be important to the COMPEIT project. It is a royalty-free, low-level JavaScript API for 2D and 3D graphics, enabling GPU accelerated rendering in the browser without the use of plugins. It is based on OpenGL for Embedded Systems 2.0 (OpenGL ES 2.0), which in turn is a subset of the Open Graphics Library (OpenGL). At the time of writing, WebGL is supported by several major browsers. WebGL lets the programmer write programs, consisting of a vertex shader and a fragment shader, that run on the GPU. Such a program can be set up to perform processing on each pixel of an input image to produce virtually any result. It can be used to perform advanced image processing to create a huge variety of interesting effects on images or video streams. For example, chroma keying can be used to replace the background in a video, placing a person in completely new surroundings. Displacement maps and photo filters can be applied to make a video appear distorted or give it a new look. A nice showcase of photo filters applied to a live video stream (using WebGL and WebRTC) can be found at http://webcamtoy.com/app/

Drawing a 3D model with WebGL is a bit more complicated and involves a lot of boilerplate code and a lot of linear algebra. Luckily, there are frameworks available to abstract away some of the details and complexity of WebGL, making it easier and less time-consuming to develop new applications. The Khronos Group, which develops the WebGL specification, maintains a page for "user contributions" (http://www.khronos.org/webgl/wiki/User_Contributions) where most of the available WebGL frameworks can be found, along with a number of example projects. Some of them are discussed in the sections below.

6.1.1 ThreeJS

Three.js is one of the most widely used WebGL frameworks, with a large, active community. The API documentation is incomplete, but a large number of examples are provided to help developers get started. Three.js was developed before the introduction of WebGL. Because of this, it sports some features that its competition lacks; in addition to its WebGL functionality, it can be used with SVG and the features of HTML5's canvas element. It can even use regular canvas rendering as fallback in cases where WebGL is not supported[32]. Figure 10 shows an example image rendered with Three.js. A minimal Three.js scene is sketched after Section 6.1.2.

Fig. 10. A night sky rendered using Three.js

6.1.2 BabylonJS

Babylon is a relatively new framework that is not featured on the above mentioned list. Three.js was built to provide a wide range of GPU accelerated 3D graphics features. Babylon.js, on the other hand, is primarily aimed at game development. It focuses more on traditional game engine features like collision detection, antialiasing and custom lighting[32]. Figure 11 shows a screenshot from a virtual café demo on the BabylonJS example page[47]. The demo could provide some inspiration for implementing a virtual space for video meetings.

Fig. 11. Windows café, from the BabylonJS examples page
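To show how much boilerplate a framework like Three.js removes compared to raw WebGL, here is a minimal, self-contained sketch of a rotating cube. It assumes only that the three.js script (the THREE global) has been loaded; everything else is standard browser API.

    // A minimal Three.js scene: a rotating cube rendered with WebGL.
    var scene = new THREE.Scene();
    var camera = new THREE.PerspectiveCamera(
      75,                                     // vertical field of view
      window.innerWidth / window.innerHeight, // aspect ratio
      0.1, 1000                               // near and far clip planes
    );
    camera.position.z = 3;

    var renderer = new THREE.WebGLRenderer();
    renderer.setSize(window.innerWidth, window.innerHeight);
    document.body.appendChild(renderer.domElement);

    var cube = new THREE.Mesh(
      new THREE.BoxGeometry(1, 1, 1),
      new THREE.MeshNormalMaterial() // colors faces by their normals
    );
    scene.add(cube);

    (function animate() {
      requestAnimationFrame(animate);
      cube.rotation.x += 0.01;
      cube.rotation.y += 0.01;
      renderer.render(scene, camera);
    })();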

6.1.3 Unity3D

Unity 3D is a game engine complete with an IDE. It is cross-platform, supporting development for Android, BlackBerry 10, Flash, iOS, Linux, OS X, PlayStation 3, Xbox 360, Wii U and Windows Phone 8. It comes with tools for easy multiplatform publishing. Games built with Unity3D can also be deployed on the web, but this requires installation of their Web Player plugin.

6.1.4 CSS3

CSS3 allows you to apply 3D transforms to HTML elements on a webpage. While this is mostly used for fancy transition effects, it can actually be used to create an interactive 3D environment[48]. This is very complicated though, and not a recommended approach.

6.2 Vector Graphics

Another interesting option for drawing interactive graphics in the browser is using a vector graphics library. The main competitors in the field are Paper.js, Processing.js and Raphaël. They each come with their pros and cons (an extensive comparison can be found in [49]), but the one that is most likely to be of use in the COMPEIT project is Paper.js. It has been used for online collaborative drawing (by a company called iDipity), and supports mouse and keyboard events and hit testing, allowing for creation of beautiful and interactive web content.

7 TANGIBLE INTERACTION

One of the three key web services that will be developed in the COMPEIT project is Shared Experience with Tangible Interaction. It will enable audiences to experience enhanced live media together, complemented by tangible and interactive games.

Tangible user interfaces (TUIs) are a branch of this research which has recently gained more widespread adoption. A TUI lets users interact with a digital system by touching or moving a physical object which represents part of the system. One purpose of this is to create a more intuitive interaction: TUIs can take advantage of users' skills and experience from working directly with physical objects. An example of a simple and intuitive TUI is Apple's iPod (at least one of its functions); you can shuffle the playlist by simply shaking the device.

Tangible interaction has both strengths and weaknesses. Tangibles have an inviting quality compared to classic, single-mouse graphical interfaces; they yield a significantly higher number of visitors when used in a museum context (especially among girls)[21].

Tangible interaction is sometimes coupled with AR technology. In 2001, [6] built an application called MagicBook, which let users interact with a computer system using a normal book. When looking at the pages through an AR display, animated virtual 3D objects would appear. There are also implementations of tangible interfaces using tabletops as a basis for interaction[30]. Studies in this area increasingly investigate mixed technologies, where a few tangible input devices are used on a multi-touch table. [18] offers some thoughts and design lessons concerning such systems:

• The creation of a novel and engaging user experience must be balanced with what makes sense for the envisioned tasks of the system; incorporating physical objects is pointless if users would rather use shortcuts and learned conventions that reduce the amount of physical work needed.
• When modelling actions for which no obvious physical tool exists, it may be wise to rely on established digital conventions (or create new ones).

• Are there already existing physical artifacts which are designed specifically for the task (such as a pen for drawing)? If that is the case, it may be valuable to investigate ways of incorporating them into the design.
• Controlling a 3D world on a 2D surface (as in [31]) can create design conundrums. It encourages the use of gestures to manipulate virtual objects, but the lack of a third dimension makes many such actions impossible. Designing a gestural vocabulary and finding a good balance between actions that are seen as physical work and those that provide shortcuts that users expect in the digital world becomes a challenge.

The last of these points is discussed in [37], where the authors explored how traditional 2D multi-touch interaction transitions to 3D objects. In their work, gestures that act on, with, or like something else (like swiping to unlock) are referred to as metaphorical, while gestures that should ostensibly have the same effect on a table with physical objects are called physical. They concluded that designers should take into consideration that users intuitively use the gestures they are accustomed to (metaphorical). If physical gestures are to be used, there needs to be some form of guidance for the users.

7.1 Tangible Interaction in Entertainment and Education

There are several examples of systems making use of tangible interaction in entertainment and education: A toy called MusicBlocks lets children create musical scores by placing colored blocks into it. Another toy, called SonicTiles, allows children to play with the alphabet.[23] Nintendo Wii is a recent example of the market potential of tangible interaction; the controller, which is shaped like a TV remote control, is the main input device for the video game console. Using accelerometer and optical sensor technology, it can be used to interact with items on screen through gesturing and pointing. An input device with similar functionality, called Playstation Move, is available for Sony's video game console Playstation 3.

Many museums sport interactives combining hands-on interaction with digital displays, which can be interpreted as TUIs; at the Vienna Haus der Musik, visitors can roll dice to automatically generate a waltz. At the Glasgow Science Museum, an exhibition about DNA featured exhibits which let users tangibly manipulate strands of DNA to understand how different selections affect genes[23].

7.2 TangibleAPI

TangibleAPI was developed as a master's thesis project at LTU. Its purpose is to make computer-connected tangible devices accessible from a web browser in a general way. The devices that are connected need to have drivers written for them to expose their functionality to the TangibleAPI server.

8 HARDWARE

The COMPEIT project description states that Internet-based distribution will transform traditional broadcasting towards higher levels of interactivity and integration with virtual, mixed and augmented reality, enabled by the proliferation of audio/video/tangible devices. The following is a list of such devices, in particular ones that may be of interest for the COMPEIT project, both for augmented reality and tangible interaction purposes.

8.1 Oculus Rift

Oculus Rift is a virtual reality head-mounted display (HMD) developed by Oculus VR. It displays virtual worlds in true stereopsis 3D. It sports a 7 inch screen which is partitioned into two displays, one for each eye (see Figure 12). The user's head is tracked using a gyroscope inside the HMD, and the virtual camera is moved accordingly. A consumer version is expected to be released in the near future. A development kit, complete with SDK, can be ordered for 300 USD [46].

An observation: the Oculus Rift has many possibilities, but also some inherent drawbacks. While it may be a good tool for virtual reality applications, virtual meetings with real people will be problematic. The head-mounted display is very large, covering most of the user's face. This means that in order for users to see each other's expressions, avatars must be used instead of real video.
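For completeness, here is a rough sketch of how side-by-side stereo for an HMD can be rendered in the browser with Three.js (Section 6.1.1): the scene is drawn twice per frame, once per eye, into the two screen halves. The `scene` and `renderer` objects are assumed from the earlier Three.js sketch, the 64 mm eye separation is an assumed typical interpupillary distance, and a production renderer would additionally use asymmetric view frusta and the lens distortion correction an HMD SDK applies.

    // Side-by-side stereo: one camera per eye, offset by half the IPD.
    var EYE_SEPARATION = 0.064; // assumed 64 mm interpupillary distance
    var aspect = (window.innerWidth / 2) / window.innerHeight;
    var leftEye  = new THREE.PerspectiveCamera(90, aspect, 0.1, 1000);
    var rightEye = new THREE.PerspectiveCamera(90, aspect, 0.1, 1000);
    leftEye.position.x  = -EYE_SEPARATION / 2;
    rightEye.position.x =  EYE_SEPARATION / 2;

    function renderStereo() {
      var w = window.innerWidth / 2, h = window.innerHeight;
      renderer.setScissorTest(true);
      renderer.setViewport(0, 0, w, h); // left half of the screen
      renderer.setScissor(0, 0, w, h);
      renderer.render(scene, leftEye);
      renderer.setViewport(w, 0, w, h); // right half of the screen
      renderer.setScissor(w, 0, w, h);
      renderer.render(scene, rightEye);
    }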

Fig. 12. Oculus Rift - separate images are shown to the user's left and right eyes

Fig. 13. Microsoft PixelSense

8.2 3D printed VR headsets

It is possible to create your own 3D virtual reality headset using a 3D printer and a smartphone; the smartphone is used as a display, and the rest of the parts are 3D printed. The phone's display is divided into two parts, one for each eye, similar to the Oculus Rift. There are JavaScript libraries for designing web applications compatible with such systems. VR2GO Mobile Viewer is one such 3D printable headset, which comes with a software package for creating immersive virtual and augmented reality experiences.

8.3 Microsoft PixelSense

Microsoft PixelSense is a platform for high-definition multitouch displays, which can be placed horizontally, like tables. The implementation that is currently being sold is called SUR40. It can detect 52 simultaneous points of contact made by fingers, tags or blobs. Visual data is also available to its applications. Several people can use the device and its applications simultaneously, standing on all sides. The Samsung SUR40 can be bought with a starting price of 8,400 USD. Figure 13 shows a Microsoft PixelSense.

The PixelSense can be a great tool for digital collaboration, so it might be valuable to make the COMPEIT platform compatible with the PixelSense. However, it is not aimed at a mainstream audience, and so it should not be a high priority.

8.4 Narrative

Narrative is a small, wearable life-logging camera which automatically snaps a picture every 30 seconds. The Narrative app makes the collected images searchable and shareable. It is being developed by a Swedish company, funded through Kickstarter. The user needs that Narrative addresses, according to its creators, are the following:

• Re-experience - reliving moments from one's own and other people's lives
• Surprise - having pictures that wouldn't have been taken
• Being present - avoid disturbing a "magic" moment
• Life improvement - using data to observe and change behaviour
• Preservation - documenting our children's lives
• Control - knowing the pictures are stored safely
• Convenience - having an automatic service manage the pictures

Either way, many people today feel a need to share almost everything that happens around them on some social network, be it Facebook or Twitter or something else. Taken to its logical extreme, this trend requires a piece of hardware that can take photographs of everything. Narrative could be used in some way for the media production aspect of the COMPEIT project. If so, great consideration must be given to the ethical implications of constantly recording users' activities.

8.5 Microsoft Kinect

Kinect is an input device for Microsoft's Xbox 360 video game console and Windows PCs. It features a regular RGB video camera as well as a depth sensor and microphone. An infrared laser beam is dispersed through a prism to create a grid of IR dots on the environment in front of the Kinect. A camera captures the infrared light bouncing back to the Kinect, and the information is used to construct a three-dimensional view of the surrounding environment.

A new version of Kinect is to be shipped together with the upcoming video game console Xbox One. It features several improvements: it uses a 1080p wide-angle camera, and it can detect a user's facial expression and numerous new portions of the user's skeleton, analyzing the weight put on each limb as well as speed and angle of body parts. Up to six users can be tracked at the same time.[34] In our master's thesis, a Kinect was used in a web application in several different ways, the main two being ad hoc touch surfaces and streaming 3D video.

While a depth sensor can make valuable data available to the application, relying on relatively uncommon hardware and third-party browser extensions excludes a large number of potential users. Image processing methods on regular webcam video can to some extent be used to gain the same input data. A library called HeadTrackr.js, for example, can be used to track a user's head orientation and position in 3D (a sketch of this approach is given in Section 5.2). This could possibly be used as a fallback for users who do not have access to a depth sensor, resulting in a more graceful degradation. Below is a list of browser plugins that let web applications access a depth sensor.

8.6 Leap Motion

Leap Motion is an input device for real-time hand gesture recognition. It offers greater accuracy and lower latency than comparable controllers in the same price range, like Microsoft's Kinect; it achieves sub-millimeter precision in (almost) real-time.[36] The Leap Motion can be ordered for 701.99 Swedish Kronor. An SDK and sample applications are available for download for free, and the Leap Motion website hosts an active developer community.[45]

While the Leap is really fast and accurate, actually using it can be a little awkward. Keeping your hands in the air and waving them around is fun for a while, but as your arms get tired you start longing for the good old mouse and keyboard pretty quickly. Another use case that feels more suitable is 3D scanning: imagine having a Leap in front of your keyboard and being able to scan physical objects by just holding them up in front of the screen. This is already possible using a Kinect, but the extra precision and speed of the Leap may produce better results.

8.7 Arduino

Arduinos are open-source, programmable electronic circuits that can be connected to sensors and actuators to do basically anything. They can connect to the internet, which makes remote control via a web browser possible. Allowing remote participants in a videoconference to control aspects of the physical environment adds a whole new dimension of presence, and could/should be investigated in some way in the COMPEIT project.
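One hedged sketch of such browser-controlled hardware uses Node.js with the johnny-five library (an assumption; any Firmata-compatible board and library would do): an HTTP request from a remote conference participant toggles an LED in the physical room. The port and URL paths are illustrative.

    // Browser-controlled Arduino via Node.js and johnny-five (sketch).
    var five = require("johnny-five");
    var http = require("http");

    var board = new five.Board(); // connects to the Arduino over USB/Firmata

    board.on("ready", function () {
      var led = new five.Led(13); // LED on digital pin 13
      http.createServer(function (req, res) {
        if (req.url === "/led/on") led.on();
        if (req.url === "/led/off") led.off();
        res.end("ok");
      }).listen(8080); // e.g. GET http://host:8080/led/on from the web app
    });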

8.8 Smart Toys

Finally, there are several so called smart toys available on the market, which may be of interest.

8.8.1 Sphero

Spheros are robotic balls made by a company called Orbotix, which can be connected to a smartphone or a computer via Bluetooth. The connected device can tell the Sphero to move with a specified speed and direction, or to light up in different colors. The Sphero can in turn send reports to the connected device about its movements. The newest version of Sphero is faster and smarter than the original; it can roll at speeds of seven feet per second. Sphero Original is sold for 109.99 USD, and Sphero 2.0 is sold for 129.99 USD. These can be made accessible to the browser via the TangibleAPI.

Spheros were used in a student project called The Tangibles, where a "mediated sketching table" was developed[40]. Each table (optionally) had a Sphero that started blinking when the user was invited to join a chat room. A lot of time was spent getting the system to communicate with the tangible object, but they weren't paid much attention in the user tests that were performed. A more interesting use case may be to allow remote users to control the Sphero via the chat room, which would give them a real, physical presence.

8.8.2 Sifteo cubes

Sifteo are intelligent cubes running 32-bit ARM-based CPUs. They can connect to a computer running SiftRunner software via a wireless USB radio with a 20 foot range. Using a built-in three-axis accelerometer, they can detect movements like shaking, rotating and flipping. They are also capable of sensing and reacting to other nearby Sifteo cubes, and the display acts as a button (see Figure 14). Each cube has 8 Mbytes of flash memory, and runs on rechargeable batteries.[29] Sifteo cubes can be made accessible to the browser via the TangibleAPI.

Fig. 14. Three Sifteo cubes

In the Tangibles project mentioned above, Sifteo cubes were used as another input device beside the Sphero ball. As with the Sphero, they added little to the experience, as they merely acted as buttons replacing some keyboard and mouse functionality. This is not a problem with the toys themselves, but rather with us not coming up with anything very useful for them to do.

8.8.3 Bo and Yana

Bo and Yana are toy robots developed by a company called Play-i. Children as young as 5-8 are supposed to be able to use them to learn computer programming. The robots are programmed using a smartphone, through a basic touch-screen interface which makes use of music, stories and animations. To make it usable by such young individuals, it does not make use of a keyboard. The robots communicate through Bluetooth 4.0, and can be used individually or together[50].

Patients at a children's hospital are one of the target demographics for the COMPEIT platform, meaning that a fun and easy-to-use interface will be needed. Easily programmable robot toys may very well be perfect for this purpose.

8.8.4 Oriboo

Oriboo is another smart toy aimed at children, relevant for the same reasons as Bo and Yana above. It is a robotic ball, like the Sphero, but with cameras, a small display and speakers.

9 CONCLUSIONS

The COMPEIT project touches upon many large research fields. This report gives an overview of some of them. Areas that have been discussed include modern videoconferencing systems, augmented and virtual reality, 3D video, tangible interaction, browser graphics and communication, and the available hardware that may be of interest for the COMPEIT project.

Modern videoconferencing systems still lack virtual eye contact.
This is a serious drawback to the medium, and one that will be dealt with. As is discussed in Section 5.1, this can be done both through hardware and software solutions. New features such as 3D avatars and automatic control of camera movements are emerging in virtual video chat systems. It might be interesting to integrate such functionality with real live videoconferencing technology, for example by adding several cameras in the same location.

Augmented reality will be used to enhance the user experience. Section 4 describes modern AR technologies and frameworks which will be useful to the project. Past experience has shown that tracking AR markers on a display or in a projected image can be very difficult [40]. Designing an environment for mixed reality needs to take this into consideration.

New web technologies enable rich applications directly in the browser, without the need for external software. The COMPEIT project will largely rely on such technologies. For example, advances are made in the field of 3D video, and new web APIs such as WebGL make it possible to take advantage of these advances using only a browser. There are many different ways to produce and share 3D content, some of which have been discussed in this paper. Streaming live depth and video data from a Kinect was investigated and proven feasible in our master's thesis work, but only for a limited number of participants. Algorithms implemented in the Point Cloud Library can help produce better results at a lower bandwidth cost and should be investigated. Generating 3D content from multiple camera images is another possibility, but setting up cameras and calibrating them requires a lot of work from the user. It should be possible to build a system that automatically generates a 3D representation of a room from any available video input, be it smartphones or surveillance cameras.

The COMPEIT platform will be built on top of WebRTC, a new standard for real-time browser communication. There are many frameworks available designed to ease the design process when working with WebRTC. These are of great interest to the project even if the COMPEIT partners decide to develop their own solution, especially for fast prototyping. Another good option for rapid prototyping is the Google+ Hangouts API, as it can be combined with external Javascript libraries and provides highly reliable media streaming and data synchronization out of the box.

Tangible devices will be used to further enhance the user experience, providing an intuitive interaction with the system. Smart toys can add value to the user experience, but it is important to get the design right, otherwise they may end up being a distraction rather than adding immersion. Simply replacing the mouse and keyboard with something the user is unfamiliar with just for the sake of it is rarely successful. Section 7 lists important lessons for the design of such user interfaces.

REFERENCES

[1] B. Rogers and M. Graham, Motion parallax as an independent cue for depth perception, 1979
[2] S. Acker and S. Levitt, Designing videoconference facilities for improved eye contact, Journal of Broadcasting and Electronic Media, 31(2): 181-191, 1987
[3] D. Schmalstieg, A. Fuhrmann and G. Hesina, Bridging multiple user interface dimensions with augmented reality, 2000
[4] W. Matusik, C. Buehler, R. Raskar, S. J. Gortler and L. McMillan, Image-based visual hulls, in SIGGRAPH, pages 369-374, 2000
[5] G. Klinker, D. Stricker and D. Reiners, Augmented reality for exterior construction applications, 2001
[6] M. Billinghurst, H. Kato and I. Poupyrev, The MagicBook - Moving Seamlessly between Reality and Virtuality, 2001
[7] S. Thrun, Robotic Mapping: A Survey, 2002
[8] P. Kauff and O. Schreer, An Immersive 3D Video-conferencing System Using Shared Virtual Team User Environments, 2002
[9] G. Slabaugh, R. Schafer and M. Hans, Image-Based Photo Hulls, 1st International Symposium on 3D Processing, Visualization, and Transmission, 2002, pp. 704-708
[10] J. Gemmell and D. Zhu, Implementing gaze-corrected videoconferencing, Proceedings of CIIT, 2002
[11] R. Vertegaal and Y. Ding, Explaining Effects of Eye Gaze on Mediated Group Conversations: Amount or Synchronization?, 2002
[12] H. H. Baker et al., Computation and performance issues in Coliseum: an immersive videoconferencing system, Proceedings of the eleventh ACM international conference on Multimedia, ACM, 2003
[13] J. Congote, I. Barandiaran, J. Barandiaran, T. Montserrat, J. Quelen, C. Ferran, P. Mindan, O. Mur, F. Tarres and O. Ruiz, Real-time depth map generation architecture for 3D, 2005
[14] Q. Wei, Converting 2d to 3d: A survey, 2005
[15] M. Kazhdan, M. Bolitho and H. Hoppe, Poisson surface reconstruction, 2006
[16] P. Merrell et al., Real-time visibility-based fusion of depth maps, 2007
[17] P. Shirley, K. Sung, E. Brunvand, A. Davis, S. Parker and S. Boulos, Fast ray tracing and the potential effects on graphics and gaming courses, 2008
[18] D. S. Kirk, A. Sellen, S. Taylor, N. Villar and S. Izadi, Putting the physical into the digital: Issues in Designing hybrid Interactive Surfaces, 2009

REFERENCES

[1] B. Rogers and M. Graham, Motion parallax as an independent cue for depth perception, 1979.
[2] S. Acker and S. Levitt, Designing videoconference facilities for improved eye contact, Journal of Broadcasting and Electronic Media, 31(2):181-191, 1987.
[3] D. Schmalstieg, A. Fuhrmann and G. Hesina, Bridging multiple user interface dimensions with augmented reality, 2000.
[4] W. Matusik, C. Buehler, R. Raskar, S. J. Gortler and L. McMillan, Image-based visual hulls, in SIGGRAPH, pp. 369-374, 2000.
[5] G. Klinker, D. Stricker and D. Reiners, Augmented reality for exterior construction applications, 2001.
[6] M. Billinghurst, H. Kato and I. Poupyrev, The MagicBook - Moving Seamlessly between Reality and Virtuality, 2001.
[7] S. Thrun, Robotic Mapping: A Survey, 2002.
[8] P. Kauff and O. Schreer, An Immersive 3D Video-conferencing System Using Shared Virtual Team User Environments, 2002.
[9] G. Slabaugh, R. Schafer and M. Hans, Image-Based Photo Hulls, 1st International Symposium on 3D Processing, Visualization, and Transmission, pp. 704-708, 2002.
[10] J. Gemmell and D. Zhu, Implementing gaze-corrected videoconferencing, Proceedings of CIIT, 2002.
[11] R. Vertegaal and Y. Ding, Explaining Effects of Eye Gaze on Mediated Group Conversations: Amount or Synchronization?, 2002.
[12] H. H. Baker et al., Computation and performance issues in Coliseum: an immersive videoconferencing system, Proceedings of the eleventh ACM International Conference on Multimedia, ACM, 2003.
[13] J. Congote, I. Barandiaran, J. Barandiaran, T. Montserrat, J. Quelen, C. Ferran, P. Mindan, O. Mur, F. Tarres and O. Ruiz, Real-time depth map generation architecture for 3D, 2005.
[14] Q. Wei, Converting 2D to 3D: A survey, 2005.
[15] M. Kazhdan, M. Bolitho and H. Hoppe, Poisson surface reconstruction, 2006.
[16] P. Merrell et al., Real-time visibility-based fusion of depth maps, 2007.
[17] P. Shirley, K. Sung, E. Brunvand, A. Davis, S. Parker and S. Boulos, Fast ray tracing and the potential effects on graphics and gaming courses, 2008.
[18] D. S. Kirk, A. Sellen, S. Taylor, N. Villar and S. Izadi, Putting the Physical into the Digital: Issues in Designing Hybrid Interactive Surfaces, 2009.
[19] S. P. McKee and D. G. Taylor, The precision of binocular and monocular depth judgments in natural settings, Smith-Kettlewell Eye Research Institute, San Francisco, CA, USA, 2010.
[20] J. Frahm et al., Building Rome on a cloudless day, 2010.
[21] M. S. Horn, E. T. Solovey, R. J. Crouser and R. J. K. Jacob, Comparing the Use of Tangible and Graphical Programming Interfaces for Informal Science Education, 2009.
[22] D. W. F. van Krevelen and R. Poelman, A Survey of Augmented Reality Technologies, Applications and Limitations, 2010.
[23] O. Shaer and E. Hornecker, Tangible User Interfaces: Past, Present, and Future Directions, 2010.
[24] B. Buxton, Sketching User Experiences: Getting the Design Right and the Right Design, 2010.
[25] S. Izadi et al., KinectFusion: Real-time 3D Reconstruction and Interaction Using a Moving Depth Camera, 2011.
[26] K. Zhou, M. Gong, X. Huang and B. Guo, Data-parallel octrees for surface reconstruction, 2011.
[27] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, 2011.
[28] J. Tang et al., Social telepresence bakeoff: Skype Group Video Calling, Google+ Hangouts, and Microsoft Avatar Kinect, 2012.
[29] L. Garber, Tangible User Interfaces: Technology You Can Touch, 2012.
[30] T. J. de Greef, C. Gullström, L. Handberg, H. T. Nefs and P. Parnes, Shared mediated workspaces, 2012.
[31] J. Nyström and N. Nyström, Mediating Presence, Master's thesis, Luleå University of Technology, 2013.
[32] Three.js and Babylon.js: a Comparison of WebGL Frameworks, https://developers.google.com/speed/webp/docs/webp_study, 2013.
[33] OpenCV.org, http://opencv.org, 2013.
[34] IGN.com, http://www.ign.com/articles/2013/05/22/xbox-ones-kinect-is-legitimately-awesome, 2013.
[35] virtual.vtt.fi, http://virtual.vtt.fi/virtual/proj2/multimedia/alvar/index.html.
[36] F. Weichert, D. Bachmann, B. Rudak and D. Fisseler, Analysis of the Accuracy and Robustness of the Leap Motion Controller, 2013.
[37] S. Buchanan, B. Floyd, W. Holderness and J. J. LaViola, Towards User-Defined Multi-Touch Gestures for 3D Objects, 2013.
[38] Y. A. G. V. Boas, Overview of Virtual Reality Technologies.
[39] J. Konrad, M. Wang, P. Ishwar, C. Wu and D. Mukherjee, Learning-Based, Automatic 2D-to-3D Image and Video Conversion.
[40] The Tangibles Project, http://the-tangibles.blogspot.se/, 2012.
[41] github.com/kig/JSARToolKit, https://github.com/kig/JSARToolKit.
[42] easyRTC.com, http://www.easyrtc.com.
[43] PointClouds.org, http://pointclouds.org/.
[44] Google Scholar Metrics, http://scholar.google.com/citations?view_op=top_venues.
[45] LeapMotion.com, https://www.leapmotion.com/.
[46] OculusVR.com, http://www.oculusvr.com.
[47] babylonJS.com, http://www.babylonjs.com/.
[48] Keith Clark experiment, http://www.keithclark.co.uk/labs/css3-fps-new/.
[49] Web-Drawing Throwdown: Paper.js Vs. Processing.js Vs. Raphael, http://coding.smashingmagazine.com/2012/02/22/web-drawing-throwdown-paper-processing-raphael/.
[50] Bo and Yana, http://mashable.com/2013/10/28/playi/.

APPENDIX A: PUBLICATION

The COMPEIT project covers a wide range of research areas, which means there are many suitable targets for publication. Below is a list of conferences and journals relating to those areas. The numbers in parentheses next to each venue show its h5-index (the h-index for articles published in the venue during the past five years) and its h5-median, taken from Google Scholar [44].
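As a small illustration of how these metrics are defined, the snippet below computes an h-index from a list of per-article citation counts; the h5 variants simply restrict the input to articles from the last five years. The function name and sample data are ours, for illustration only.

```javascript
// The h-index of a venue is the largest h such that at least h of its
// articles have h or more citations each; the h5-median is the median
// citation count among those h articles.
function hIndex(citations) {
  const sorted = [...citations].sort((a, b) => b - a); // descending
  let h = 0;
  while (h < sorted.length && sorted[h] >= h + 1) h++;
  return h;
}

console.log(hIndex([10, 8, 5, 4, 3])); // 4: four articles cited >= 4 times
```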

A.0.1 3D and Augmented and Virtual Reality
• IEEE Transactions on Image Processing (71, 107)
• European Conference on Computer Vision (66, 101)
• IEEE Transactions on Circuits and Systems for Video Technology (48, 66)
• ACM SIGGRAPH/Eurographics Symposium on Computer Animation (25, 40)
• ACM Transactions on Graphics (72, 94)
• IEEE Transactions on Visualization and Computer Graphics (48, 66)
• Journal of Visual Communication and Image Representation (21, 37)
• Virtual Reality, Multimedia Tools and Applications (24, 34)
• CHI - Conference on Human Factors in Computing Systems
• ITS - ACM International Conference on Interactive Tabletops and Surfaces (24, 35)
• UIST - ACM Symposium on User Interface Software and Technology (36, 66)
• TEI - Tangible and Embedded Interaction (22, 28)
• SIGGRAPH Annual Conference on Computer Graphics (16, 27)

A.0.2 Group Communication
• ACM International Conference on Supporting Group Work (17, 30)
• Group Processes & Intergroup Relations (Social Psychology journal) (22, 29)
• International Journal of Computer-Supported Collaborative Learning (26, 482)
• International Symposium on Collaborative Technologies and Systems (15, 193)
• IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises (14, 254)
• Conference of the Center for Advanced Studies on Collaborative Research (13, 245)
• Collaborative Computing (11, 136)
• International Conference on Intelligent Networking and Collaborative Systems (INCoS) (8, 97)
• International Conference on Interactive Collaborative Learning (ICL) (4, 5)

A.0.3 Distributed Applications and Cloud Computing
• IEEE International Conference on Cloud Computing (CLOUD) (30, 49)
• IEEE International Conference on Cloud Computing Technology and Science (CloudCom) (29, 40)
• P2P, Parallel, Grid, Cloud and Internet Computing (7, 12)
• ACM International Symposium on High Performance Distributed Computing (27, 44)
• IEEE International Conference on Cloud Computing and Intelligence Systems (6, 6)
• International Conference on Cloud and Green Computing (2, 3)
• IEEE Transactions on Parallel and Distributed Systems (51, 65)
• IEEE International Symposium on Parallel & Distributed Processing (45, 59)
• Distributed Computing (19, 29)
• ACM/IEEE International Conference on Distributed Smart Cameras (19, 26)
• International Symposium on Distributed Computing (DISC) (18, 28)
• arXiv Distributed, Parallel, and Cluster Computing (cs.DC) (34, 80)
• Journal of Parallel and Distributed Computing (33, 44)
• International Conference on Distributed Computing Systems (ICDCS) (32, 44)
• ACM Symposium on Principles of Distributed Computing (PODC) (29, 42)
• Euromicro International Conference on Parallel, Distributed and Network-Based Processing (20, 34)
• IEEE International Conference on Distributed Computing in Sensor Systems (19, 30)

A.0.4 Other Web Related Journals
• International World Wide Web Conferences (WWW) (87, 148)
• European Physical Journal Web of Conferences (13, 24)
• International Journal of Web and Grid Services (11, 18)
• Web Information Systems and Technologies (11, 14)
• Web Semantics: Science, Services and Agents on the World Wide Web (38, 79)
• IEEE International Conference on Web Services (31, 45)
• IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (22, 27)
• International Conference on Web Engineering (ICWE) (19, 25)
• International Conference on Internet and Web Applications and Services (18, 21)
• Conference on Web Accessibility (16, 21)
• Web Information Systems Engineering (WISE) (16, 21)
• European Conference on Web Services (16, 19)
• World Wide Web (15, 25)
• International Journal of Web Based Communities (10, 19)
• ACM International Conference on Web Search and Data Mining (48, 78)
• International Conference on The Semantic Web (42, 63)
• Extended Semantic Web Conference (ESWC) (41, 58)
• International Conference on Information Integration and Web-based Applications & Services (IIWAS) (14, 20)