Kinect Identity: Technology and Experience
ENTERTAINMENT COMPUTING

Tommer Leyvand, Casey Meekhof, Yi-Chen Wei, Jian Sun, and Baining Guo, Microsoft

COMPUTER, April 2011. Published by the IEEE Computer Society. 0018-9162/11/$26.00 © 2011 IEEE

Kinect Identity, a key component of Microsoft’s Kinect for the Xbox 360, combines multiple technologies and careful user interaction design to achieve the goal of recognizing and tracking player identity.

Controller-less immersion has been the Holy Grail of game designers and developers for many years now. One of the primary challenges is how to seamlessly track and successfully recognize an individual’s identity during game play to ensure a smooth and natural user interface.

So far, no perfect solution exists. Nevertheless, the identity technology behind Microsoft’s Kinect for the Xbox 360 provides enhanced gaming and entertainment experiences by combining multiple technologies based on the use of RGB cameras, depth-sensing, and careful user interaction design.

IDENTITY TRACKING TECHNIQUES

The Kinect system tracks identity in two ways:

• biometric sign-in, where the system learns a player’s appearance over time and signs that person in when he or she is in view; and
• session, in which the system remembers who’s who during a particular game session, for example, player 1 versus player 2, along with their scores.

To maximize the chances of creating a successful identity tracking system, Kinect’s developers experimented with a set of independent identification technologies, selecting a set that, when combined, created a complete picture. Each of the selected technologies needed to be robust (at least for short-term changes), non-CPU and memory intensive, and as independent as possible from the others.

The final set consisted of three techniques: face recognition, clothing color tracking, and height estimation. We can break down the most important of these, facial recognition, into three subtasks:

• determining the face’s location and size;
• aligning the face to “normal” coordinates, that is, head straight up, facing the camera; and
• extracting the facial signature to capture the face’s microstructure.

Adding clothing color and the player’s height to facial recognition technology fuses multiple characteristics to form identity recognition.

At runtime, the game asks the system to remember a new player, so the system gathers prints, or signatures, for each of the unidentified skeleton’s characteristics (face, clothing, height), each of which provides a response of “positive” (for example, shirt color matches the known print), “negative” (stored shirt color is blue but unidentified shirt color is green), or “unknown” (too close to call for this particular characteristic).

When the game attempts to determine whether a new skeleton is already known, the system runs through all existing candidates and produces a “truth table” that gives a recommendation for each characteristic. In the example shown in Table 1, when the face characteristic is compared, #1, #3, and #4 are candidates. However, since #1 and #3 are both positive, they are treated as unknowns. Comparing color eliminates #3 and #4 as options because of the negative responses. Although #1 is left as an option, the process continues because a positive response is still missing. Finally, when height is compared, #1 has a positive response. Since it is the only positive response in the candidate set, a successful identification has been made.

Table 1. A truth table for identifying a new player in a game.

Characteristic    Enrolled ID #1    Enrolled ID #2    Enrolled ID #3    Enrolled ID #4
Face              Positive          Negative          Positive          Unknown
Clothing color    Unknown           Unknown           Negative          Negative
Height            Positive          Positive          Unknown           Unknown

For the system to successfully identify the currently unidentified skeleton as a previously enrolled entity, the following must hold true:

• At least one positive response must be returned, and no negative responses are allowed (except for a very strong face recognition match).
• Starting from the face, then clothing color, and finally height, only one candidate can be positive; in the case of multiple matches, the system deduces that the match can’t be fully trusted, so it treats it as unknown.

The system then processes the results of the truth table to produce the final result. Interestingly, some characteristics, such as height, are excellent for rejecting a clear mismatch, but by themselves, these characteristics aren’t accurate enough to accept a match: many people are the same height, so height by itself isn’t enough to identify a person.

CHANGES OVER TIME

Of course, identification technology also must be able to adapt to changes in physical appearance over time. Session identity has a short time frame, and players don’t tend to change their clothes, rearrange their hairstyle, grow a beard, or decide to switch between wearing contacts or glasses in the middle of a game, but they do change their facial expressions, strike different poses, or change the lighting in that span. Biometric identity is even more challenging because appearance will almost certainly change across different gaming sessions (players are likely to wear different clothes, perhaps someone got a haircut between sessions), as will lighting conditions (playing in the afternoon or at night).

To work around these issues, the system associates prints with as many environmental details as possible, such as where they were captured and what the lighting levels were like. The system then takes these into account when matching a new player against a set of known players. For biometric sign-in identity, Kinect Identity relies only on facial recognition. It asks players to move around the play space to capture the various local environmental details and then reruns the tool to capture more data if the players significantly alter their appearance or the lighting changes.

As part of remembering a person and trying to recognize a new skeleton, the system captures a series of frames to procure more of the possible variations, usually when the conditions are suitable or, better still, optimal, for recognition. For example, it’s better to skip frames when the player isn’t facing the sensor or when the face is occluded. Kinect also gives players feedback and an opportunity to fix the capture conditions, which greatly improves the speed and accuracy of identification.

Naturally, players might take off a layer of clothing during a single gaming session, which will produce a false negative if the new layer of clothing is a different color than the system previously matched with that player. If the returned response is always to reject a clothing color mismatch, then a false negative will always be produced when someone takes off a piece of clothing during a gaming session. However, never rejecting in this scenario would discard a lot of valuable information. This is another reason why relying on other characteristics is so crucial to identity in Kinect: face and height will provide a positive response and override the clothing color mismatch (under certain circumstances).

How does Kinect differentiate between identical twins? The truth table has a better chance of succeeding if it knows that it’s dealing with two people who look exactly the same as opposed to knowing only one twin and matching the other incorrectly. If Kinect sees them concurrently, or if they have two separate Kinect identity profiles, the system stands a better chance of telling the two apart. This is further enhanced during a single session, when clothing differences can help differentiate between the two. But keep in mind that if a human has a difficult time telling identical twins apart, so will Kinect!

CHALLENGES

One of the most challenging aspects of developing Kinect Identity involved accuracy, both measuring and regressing, when making changes to the code or algorithm. Accuracy is extremely important because the environmental and personal conditions in a working environment aren’t representative of the customer base.

Kinect’s developers focused on two requirements for measuring accuracy: the data should be representative of real-world environments, and they needed a lot of it. For the latter requirement, developers used various beta programs to collect as much data as possible: the data-capturing tool itself, tools for tagging the ground-truth, and testing tools to train the algorithm. To make things even more challenging, some changes that directly affected the image frame (camera settings such as exposure or zoom) invalidated the old dataset and required gathering new data with [...]

[...] implementing a good user experience around it is equally, if not more, challenging. How do you design an experience around a system that’s never quite sure if it’s right? The best way to do so is to design how everything should work, assuming identity doesn’t exist, and then look for shortcuts: areas that can be sped up using identity.

The “wow” factor is very important for Kinect, so its developers carefully considered the options for handling positive results (the system believes it has matched the skeleton to a known user), negative results (the system doesn’t recognize the person), and the failed operation (the system is unable to complete). Because the positive result has the greatest potential to wow, developers put the most emphasis on it by using a “trust but verify” approach: instead of [...]

[...] communicating to the player which approach is currently in use. The key is clear and consistent feedback. A common pattern that tends to work is determining identity prior to starting game play to review the active profiles and then letting users jump in and take control without changing their identity. An important decision in Kinect’s development was to treat profile selection as an identity operation, meaning that whenever a user selects a profile in the sign-in dialog, the player’s identity is associated with that profile for the session.

As with any system, it isn’t just one or two parts that make Kinect great, but rather the fusion of a variety of technologies, techniques, methodologies, and ideas into one design.