AR / VR: Opinions on the State of the Industry, by Christopher Grayson < [email protected] >

Contents:

• Where VR went wrong, why this matters to AR, and how to set it right
• Optically transparent displays
• Smartglasses: a general purpose AR interface
• Other notable developments: consumer face recognition & mesh-networks
• Events:
  - Private launch party, VR / AR style: Monday, Feb. 27th, WeWork Civic Center, San Francisco. Request an invite. Christopher Grayson will be speaking.
  - Augmented World Expo: Wednesday, May 31st to Friday, June 2nd, Santa Clara Convention Center. The largest AR & VR event of any kind, in the world.
• Conclusions

Where VR went wrong, why this matters to AR, and how to set it right.

Where VR content went off course

Timeline (2011-2019), marker: Oculus Rift ships to the consumer (2016)

Consumer Stereoscopic (class of 2011)

For CES 2011, stereoscopic cameras were a thing. Many were introduced, but they are no longer available; the trend peaked prematurely. The problem in 2011 was that there were no good 3D content consumption devices: VR headsets were still a few years away. You could take stereoscopic 3D images and videos, but there were very few ways to view them. The personal viewers that existed were little more than glorified Victorian-era stereoscopes.


Timeline markers: Oculus Kickstarter (2012), Oculus DK1 ships (2013), Oculus ships to the consumer (2016)

Oculus & The 2nd Coming of VR

Oculus launched their Kickstarter in 2012; by the time they came to market in 2016, many other pretenders had joined the fray.

The focus shifted to gaming, where 360° content is the norm. With the gaming industry in the driver's seat, and with 360° the standard approach, it was taken as a given that 360° was best for all things (and even treated as more important than depth). Stereoscopic cameras left the market just before headsets arrived.


Consumer 360° (The Dead End)

Then the consumer market was flooded with 360º cameras, most of them ball shaped. This category was a huge distraction.

A good test for gadget success in the mass consumer market is, "How will a smartphone eat this?" In the short run, new gadgets come along as standalone devices. They typically achieve mass market success if they are "pocketable," and if so, they are only viable up until they are absorbed into the smartphone. Phone? Music player? Digital assistant / day planner? Camera? All now merely features or apps on our smartphones. Ball shaped 360º cameras are neither pocketable nor a form factor easily absorbed into a smartphone … and most do not even capture in stereoscopic, so the "VR" experience is an inside-the-cylinder effect.

On 360° cinema: "It's nonsense, you are looking forward, and sometimes left and right, but not behind you. It's really a waste of pixels." —Greg Madison (source: Fast Company)

What is 360° video good for?
• Real estate
• Entertainment events: sports, music
… but not for UX, cinematic, or UGC 3D video content.

…and why it matters to AR (to go forward, take one step back)


Who will show leadership?

"Camera-Through" AR: The stereoscopic camera in a phone was the right direction, but the lenses need to be placed at the proper pupillary distance, approx. 60mm, to match human scale. Vintage examples:

To go forward, AR needs to take one step back. A stereoscopic phone camera turns any phone-based VR headset into an AR headset via camera-through, just as AR was performed in pre-2011 AR & VR headsets. With the size of the iPhone's hardware market, if Apple adopted this approach, it would both flood the market with UGC VR content and create a transitional stage from smartphone to AR smartglasses.

Pictured: Vuzix camera-through AR; camera-through attachments shown on NVIS & Sensics VR headsets in years past.
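To see why the lens baseline should match the approx. 60mm pupillary distance, consider the standard stereo disparity relation. This is a toy sketch of my own (the function name and the example numbers are mine, not from the report):

```python
# Toy illustration (mine, not from the report): why lens baseline should
# match the ~60 mm human inter-pupillary distance. For a point at depth Z,
# on-sensor horizontal disparity is approximately focal_length * baseline / Z,
# so a narrower-than-human baseline proportionally flattens depth cues.

def disparity_mm(baseline_mm: float, focal_mm: float, depth_mm: float) -> float:
    """On-sensor horizontal disparity (mm) for a point at depth_mm."""
    return focal_mm * baseline_mm / depth_mm

human_scale = disparity_mm(baseline_mm=60.0, focal_mm=4.0, depth_mm=2000.0)
narrow = disparity_mm(baseline_mm=15.0, focal_mm=4.0, depth_mm=2000.0)
print(human_scale, narrow)  # the narrow rig produces 1/4 the disparity
```

A rig with a quarter of the human baseline delivers a quarter of the depth signal, which is why close-set phone lenses make the world read as miniaturized.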

Had the industry not lost focus and taken a distracting detour with 360° cameras, stereoscopic camera smartphones could have been both a boon for UGC VR content and the basis for camera-through AR.

Course correction
If a player steps up to show some leadership, this can still happen. Apple should have done it with the introduction of the iPhone 7. I've lost confidence that Apple is going to show industry leadership. Given the introduction of both the Surface Studio and HoloLens, I'm most inclined to view Microsoft as the innovation leader among large tech companies today. As a Mac user since the 80s, I don't say that casually and would love to be proven wrong. I write with unwavering conviction: if just one major handset maker stepped up and introduced a smartphone with a stereoscopic camera*, capable of UGC VR video and camera-through AR, it would do more to propel both the VR & AR industries forward than another dozen me-too headsets, or anything happening in the AR smartglasses space.

Notable: Lucid is principally a VR video editing software company. CEO Han Jin says they introduced their stereoscopic camera, LucidCam, because there were no good consumer stereoscopic cameras on the market. LucidCam is a reference model. Han also agrees with Greg Madison of Unity, and this author, that 360º video is, in most cases, a distraction.

* It has to happen at scale, so it has to be from a major handset maker; a startup simply cannot do it. In fact, Apple or Samsung may be the only two players with enough market share to impact the market, though in their absence another handset maker could make a big splash by moving fast.

To the Detractors: Don't Let the Perfect Be the Enemy of the Good

I receive a lot of industry resistance to my advocacy of stereoscopy in smartphone cameras. It comes in the form of two arguments, both almost entirely from those in the gaming space:

• Those who say 180º stereoscopic capture for UGC is not VR, that to be VR it must be 360º capture, and anything less should not be tolerated.
• Those who say only six degrees of freedom of movement through volumetric space is VR, and anything less should not be tolerated.

To both I say: Sony sold 9.7M PS4s in the holiday quarter of 2016 (1). In the same quarter Apple sold 78.3M iPhones (2) and the total Android market sold 350.2M (3).

Notable: The hand gesture tracking of Occipital's volumetric mapping tech, as demonstrated, is similar to that which Microsoft uses in the HoloLens. Apple acquired PrimeSense. The Occipital Bridge headset combines their Structure Sensor and some impressive software, creating a camera-through inside-out mapping AR experience. The Structure Sensor is also available stand-alone for iOS.

Sources: (1) Sony Corporation's final calendar quarter is their Q3 fiscal quarter, hence quarters are reported here as the holiday 2016 quarter. (2) Apple reported their Q4 2016 iPhone sales. (3) IDC Worldwide Quarterly Mobile Phone Tracker, Feb. 1, 2017.

Volumetric broadcast is the holy grail.

Don't let the Perfect be the Enemy of the Good: yes, of course, volumetric. There are use cases where 360° is acceptable. There are use cases where 180° is preferable. There are use cases where directed stereoscopic works best … but broadcasting volumetrically captured content is coming. Everyone in the industry knows it is the metric against which all other VR content will be measured.

Between startups, gaming, industrial products in the engineering and medical field, and in special effects, there are too many hardware players in the space to mention here. The software companies to watch are those that can take the point cloud data from capture devices, convert it to polygons of acceptable resolution for display, and compress them enough to push through a pipe … and eventually do all of this on the fly so that the content can be streamed at an acceptable frame rate. When content is volumetrically scanned, it is first a point cloud model that must then be converted into polygons, the facets that comprise an object's surface. A complex shape can comprise millions of polygons, fine for special effects post production, but far too large to stream.

Notable: 8i is currently leading the industry in volumetric capture. They've just raised a $27M Series B led by Time Warner Investments. They're the hottest thing in volumetric capture at this time. I expect their main competition to come from the Hollywood special effects industry. It should be no surprise that they're based in LA.

Notable: Simplygon. Microsoft just keeps winning. They recently acquired Simplygon, maker of compression software that substantially reduces the polygon count of 3D models.

Cappasity is a small startup out of Russia that relocated to Silicon Valley. They recently pivoted from game avatars to focus on the apparel industry and the multi-billion-dollar problem of online fit. They are an Intel partner, but their software works with other capture hardware as well.
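As a toy illustration of the first stage of that pipeline (shrinking a point cloud before meshing and compression), here is a minimal voxel-grid downsampling sketch. This is my own example, not any vendor's pipeline; real systems use far more sophisticated meshing and codecs:

```python
# Minimal sketch (my own toy, not any vendor's pipeline): voxel-grid
# downsampling, a standard first step for shrinking a point cloud.
# Snap points to a coarse 3D grid, keep one averaged point per occupied cell.

from collections import defaultdict

def voxel_downsample(points, voxel=0.05):
    """Merge all points falling in the same (voxel x voxel x voxel) cell."""
    cells = defaultdict(list)
    for x, y, z in points:
        cells[(int(x // voxel), int(y // voxel), int(z // voxel))].append((x, y, z))
    # Represent each occupied cell by the centroid of its points.
    return [tuple(sum(axis) / len(pts) for axis in zip(*pts)) for pts in cells.values()]

cloud = [(0.01, 0.01, 0.00), (0.02, 0.00, 0.01), (1.00, 1.00, 1.00)]
print(len(voxel_downsample(cloud)))  # 2: the first two points share a cell
```

Coarser voxels trade surface detail for a smaller payload, which is exactly the resolution-versus-pipe tension described above.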

Optical see-through displays: Waveguides

Today, there are only three companies that matter in waveguides. Why? Because there are only three who have shown they can go all the way to manufacturing. Hardware is hard. Near-eye optics are very hard. Waveguides are harder still. Going from an engineering concept to a functional prototype is exceptionally difficult. Taking that IP from prototype to a scalable, manufacturable product is much more difficult than filing patents and issuing press releases (it's clearly more difficult than raising money).

Nokia
Nokia's surface relief waveguides are used by Microsoft in their HoloLens device. Their IP is also licensed by Vuzix, who raised money from Intel to build a manufacturing plant in New York for their M3000. Nokia has not only shown that they can scale to manufacturing, but their Vuzix deal shows that their manufacturing process itself can be replicated. The physics of their surface relief design hits a wall at about 30° FOV.

Lumus
Lumus' design is the simplest of the waveguides on the market. They have nonetheless shown this design can be mass produced and are providing waveguides to both Daqri and Atheer. Lumus has shown they can do more than technology, they can do business: they've worked with Optinvent and Meta, and collaborated with Himax. Lumus recently closed $45M in additional funding from HTC, Quanta and Shanda Group.

DigiLens
DigiLens has a much lower profile in the consumer space than either Nokia or Lumus. They've principally been a supplier / partner of Rockwell Collins in the avionics space. In addition to their classified work for the U.S. military, they're the supplier of Rockwell Collins' waveguide display systems for Embraer Legacy 450 & 500 cockpit displays. They have most recently entered the consumer space with BMW's smart helmet. They have an aggressive road-map to increase FOV. While Nokia's and Lumus' offerings are passive lenses, DigiLens' waveguides are active — a liquid crystal based structure that is electrically switchable. They recently closed $22M in additional funding from Sony / Foxconn.

But what about …
I continue to be bullish on DigiLens. I'm also watching TruLife Optics and Akonia Holographics to see if they can take their designs to manufacturing. My skepticism of Magic Leap does not stem from their recent PR problems, but from their ability to take their ambitious designs to manufacture (or even to prototype?). Kayvan Mirza of Optinvent recently wrote a Magic Leap analysis that is recommended reading.

Addendum

Journey Technologies
In the course of writing this report, I was contacted by the founder of Journey Technologies of Beijing. She boasted that they have a design "like Lumus," with a 36° FOV and 1280x720 resolution. She claims they can manufacture 500 units per month, and can "easily" ramp up to 1,000 per month with their existing manufacturing facility. She included a photo of their optical unit that indeed resembled Lumus' design. The Chinese are masters of reverse engineering, and the Lumus design is very simple, making them vulnerable to commoditization.

Optical see-through displays: Others

There are other near-eye-optic see-through display systems besides waveguides.

Meta 2
Meta is notable for their exceptionally wide 90° FOV, the widest on the market. It is a beautiful display. I compare the Meta 2 display to the top-of-the-line graphic workstation displays of the late 90s. Even as flat-panel displays came onto the market in the early aughts, a nice CRT still had better color and higher resolution, and a 21" ViewSonic was a larger screen than anything available in an LCD … but clearly the writing was on the wall. To achieve Meta's incredible field of view, they project a panel onto a Pepper's ghost style combiner, a semi-reflective transparent screen that allows the viewer to see their environment through the reflection. These will ultimately become obsolete as waveguides grow in FOV. Until then, the Meta 2 is excellent for UI development in AR.

ODG R9
One notable exception to the waveguide trend in low profile displays is ODG (Osterhout Design Group), whose R9 has a magnified OLED micro-display projecting into an outward beam-splitter, onto a reflector/combiner, all in a slim(-ish) form factor, featuring both a higher resolution and a 50° FOV that rivals that of current waveguides. It is also worth mentioning that they don't suffer the display artifacts of current generation waveguides, including halo/glow and rainbowing image distortion. Just as with the inferior image quality of early generation flat-panels, these problems will be solved in time.*

* You can read a good counter argument against waveguides at Karl Guttag's blog, "Magic Leap & Hololens: Waveguide Ego Trip?"

Smartglasses: a general purpose AR interface

Waveguides + Eye-Tracking + Depth Sensors + Computer Vision + Electroencephalogram (EEG)

Eye-Tracking

Eye-Tracking alone can substantially improve image quality in AR & VR by enabling foveated rendering: rendering in highest resolution only that portion of the display where the user is directly looking, putting less burden on the GPU and allowing for higher frame rates. Coupled with depth sensors and computer vision, things get much more interesting. If the device both knows where the user is looking and has an understanding of the environment (i.e., knows what the user is looking at), we only need to add intent (a user command). It is in this context that I will make the case for Eye-Tracking + EEG.
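The foveated-rendering idea can be sketched in a few lines. This is my own toy illustration (the thresholds and the degrees-per-pixel figure are invented; real systems derive them from eye-tracker accuracy and display optics):

```python
# Sketch of a foveated-rendering policy (my own toy numbers, not from the
# report): shade at full resolution only near the gaze point, and drop
# resolution with angular distance (eccentricity) from it.

import math

def shading_rate(pixel, gaze, degrees_per_pixel=0.05):
    """Fraction of full resolution to shade a pixel at, given gaze position."""
    eccentricity = math.hypot(pixel[0] - gaze[0], pixel[1] - gaze[1]) * degrees_per_pixel
    if eccentricity < 5.0:    # foveal region: full resolution
        return 1.0
    if eccentricity < 20.0:   # near periphery: half resolution
        return 0.5
    return 0.25               # far periphery: quarter resolution

print(shading_rate((960, 540), gaze=(960, 540)))  # 1.0 right at the gaze point
```

Since the fovea covers only a few degrees of the visual field, most pixels fall into the cheap tiers, which is where the GPU savings come from.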

Voice Command is Overrated
Voice as a command interface is convenient in the privacy of one's home, or in an automobile. It loses its appeal in an office or any public environment.

Where People Use Voice Assistants (users of smartphone-based voice assistants who use them in the following locations):
• In the car: 51%
• At home: 39%
• In public: 6%
• At the office: 1%
Based on a survey of 500 consumers in the United States. Source: Business Insider / Creative Strategies, Creative Commons CC BY-ND.

Waveguides + Eye-Tracking + Sensors + CV + EEG for A | B

Voice is for

Real-time Translation: see language translation by .

Talk to anyone in any language, and they hear your translation in real time in any language. Listen to anyone speaking any language, and hear them in the language of your choice.


International Language translation

Waveguides + Eye-Tracking + Sensors + CV + EEG for A | B + Speaker + Microphone

Electroencephalogram (EEG)

"EEG based brain computer interface has demonstrated its capability to control a device, such as controlling the flight of a model helicopter. The EEG based binary input device shall become available in the near future depending on the market needs." —Dr. Bin He, University of Minnesota

A | B UI analogy: an A | B selection via EEG is derivative of a two-button mouse, with the user's eyes as the cursor.

EEG as Command: EEG still has a substantially slower response time than that of a human finger on a mouse. On the following page Dr. Geoff Mackellar, CEO of Emotiv, gives a more nuanced and cautious analysis.

Notables in the consumer EEG device space include interXon (maker of Muse), Personal Neuro (maker of Spark), and Emotiv (maker of Epoc+ and Insight). All are currently marketed as wellness products with various meditative features. Emotiv has an SDK for third-party developers, some of whom are already experimenting with using EEG for command.

NOTABLE: Safilo's SMITH brand of eye-frames has partnered with interXon to introduce the Smith Lowdown Focus, Mpowered by Muse. The form factor is proven possible, though the functionality is, at this time, still limited to Muse's meditation app.

EEG as Command … with a Caveat

EEG still has a substantially slower response time than that of a human finger on a mouse. Dr. Geoff Mackellar, CEO of Emotiv, gives a more nuanced and cautious analysis:

"There are several issues in calculating latency for any response. The Sensory Detection Time is the basic brain response time, typically around 210-260ms depending on age, for an unfamiliar but expected task. This is the point at which the brain has decided to act and starts to initiate the motor signal in response, which takes around 100ms to execute. Sensory Detection Time and the motor delay are both reduced for highly trained tasks, so for example athletes and very experienced gamers can reduce the overall reaction time to specific kinds of events as a result of habituation.

"Emotiv offers a direct mental command system based on a user-trained set of reproducible mental patterns which are related by a machine-learning classifier system as belonging to each of the commands. In this case the classification is made at 250ms intervals, based on analysis of the most recent second of data. Latency in this case depends both on the computational side, with an expected delay of around 250ms, and the user's ability to form the mental state, which depends on the user's experience level and ranges from 50ms to several seconds. These delays occur end-on with the ordinary Sensory Detection Time, where the subject must form the intention to act before starting the process.

"Additional latencies must also be taken into account for EEG systems. Firstly, the signal processing chain usually includes finite impulse response filters to remove signal artefacts such as the 50/60Hz line hum present almost everywhere. This is picked up from the electrical mains supply and contributes very large artefacts into the data stream. Typical filters have an inherent latency of 50-60 milliseconds. Computational effects also introduce latency, where a series of calculations must be made and these are generally applied to a retrospective signal, often in the range of 0.25-5 seconds. In general, EEG-based command is not fast enough to compete with direct motor signals except in cases where the subject's motor system is compromised." —Dr. Geoff Mackellar, CEO, Emotiv

While EEG-based response may not be fast enough to compete with motor signals "in general," it is still the opinion of this author that the trade-offs in privacy for voice command and in social acceptance for gestural interfaces will make EEG the winning UI in most consumer use cases.
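To make the quoted figures concrete, here is a rough latency-budget tally. The arithmetic is mine, and the specific midpoints chosen within Dr. Mackellar's stated ranges are my assumptions:

```python
# Back-of-the-envelope tally (my arithmetic; midpoints within the ranges
# quoted above are my choices) comparing an EEG command path with an
# ordinary finger-on-mouse motor response, in milliseconds.

eeg_path_ms = {
    "sensory_detection": 235,       # "typically around 210-260ms" (midpoint)
    "mental_state_formation": 300,  # "50ms to several seconds" (optimistic)
    "classifier_delay": 250,        # classification made at 250ms intervals
    "line_noise_filter": 55,        # FIR filter latency, "50-60 milliseconds"
}
mouse_path_ms = {
    "sensory_detection": 235,       # same basic brain response time
    "motor_execution": 100,         # "around 100ms to execute"
}

print(sum(eeg_path_ms.values()), sum(mouse_path_ms.values()))  # 840 vs 335
```

Even with an optimistic mental-state figure, the EEG path is roughly 2-3x slower than the motor path, which is consistent with Mackellar's caveat.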

Other things worth mentioning

Exhibit A

Something to Prove
At this early stage, Microsoft and Intel should not be considered late to the game. Intel lost out on the smartphone market to ARM, and aims not to make that same mistake again. Cutting deals with Google, Recon, Vuzix and Luxottica, they've all but locked down the processor market for AR smartglasses, and with Project Alloy they can be expected to go after the VR market just as aggressively. Microsoft has chosen to follow their previous path of success in the desktop / laptop market by extending Windows 10 into headsets, relying on their Windows 10 VR hardware partners to flood the market with low-cost VR headsets. Microsoft and Intel are very much in the game.

Exhibit B

The Sleeping Unicorn
While included on the waveguides page, DigiLens deserves closer examination: they have proven they can not only make active waveguides, they can mass produce them, and at a lower cost than others can make passive lenses. Starting in military industrial, DigiLens has diversified into aviation with Rockwell Collins for Embraer, then into motorcycle helmets with BMW, and is now pursuing automotive partnerships. Their technology is the market leader in waveguides, yet until a recent investment from Sony / Foxconn they had never even been mentioned in TechCrunch.

Don't Believe the Hype
Let's put DigiLens into context: Magic Leap has raised over a billion dollars at a $4.5B valuation, yet has not thus far been able to demonstrate that they can make technology that DigiLens has already brought to manufacturing. There is a world of difference between being able to do a one-off in a lab environment vs. scaling to manufacturing at a market-acceptable price. Some will point out that Magic Leap's secret sauce is adaptive focus. But if they cannot yet even manufacture active waveguides, who will get to adaptive focus first? Only a few companies have shown that they can even manufacture passive waveguides, and DigiLens alone has demonstrated that they can manufacture active waveguides. TruLife Optics of the UK claims to have produced an active lens in a lab, yet the entirety of their reassurance that they can take that to manufacturing is a single sentence on their website that reads, "TruLife Optics is exploring manufacturing options and is confident that our optic will be available in quantity for companies and or individuals wishing to incorporate them into their products." Really? Prove it. Akonia Holographics of Colorado spent a decade (unsuccessfully) trying to develop holographic storage. While their team may possess some transferable expertise in holographic technology and they're worth watching, DigiLens has spent the same decade developing holographic waveguides and succeeded.

My prediction: DigiLens is a sleeping unicorn and will likely exit via a bidding war that may include Apple, among others.

Exhibit C

goTenna entered the market via a crowdfunded campaign billed as a P2P antenna, purportedly aimed at wilderness hikers: keep in contact with your companions when outside the range of cell towers. They executed their go-to-market strategy brilliantly. Though their antenna currently works for text-messaging, they are often referred to as the new "walkie-talkie."

goTenna is not the new walkie-talkie. goTenna is a mesh-network. When incorporated into every connected device, mesh-networks will disrupt the global telecommunications industry and upend the state surveillance apparatus. Mesh-networks are coming.

Jott, a messaging app, uses low energy Bluetooth, or a router within 100 feet of each user, to create a mesh-network among users to send P2P messages. It is favored by many elementary and junior high school kids, who typically have less control over their own data plans. While Jott's user base is still relatively small, I predict that either Jott, or an app like it, could ride the same "privacy" wave that is currently catapulting the use of Signal.

If many are turning to apps like Signal for their end-to-end encryption, consider the privacy advantages of a P2P platform where the internet itself is bypassed altogether.
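The core mesh-network mechanic, every node relaying for its in-range neighbors, can be sketched in a few lines. This is a toy flooding model of my own; it is not Jott's or goTenna's actual protocol:

```python
# Toy model (mine, not Jott's or goTenna's actual protocol) of the core
# mesh-network mechanic: a message floods outward as each node relays it
# to its in-range neighbors, with no tower or ISP in the path.

def flood(adjacency, origin):
    """Return the set of nodes a message reaches via neighbor relaying."""
    reached, frontier = {origin}, [origin]
    while frontier:
        node = frontier.pop()
        for neighbor in adjacency.get(node, []):
            if neighbor not in reached:   # relay each message only once
                reached.add(neighbor)
                frontier.append(neighbor)
    return reached

# Devices within radio range (~100 ft for Bluetooth LE) form the edges.
mesh = {"ann": ["bo"], "bo": ["ann", "cy"], "cy": ["bo"], "dee": []}
print(sorted(flood(mesh, "ann")))  # ['ann', 'bo', 'cy']; 'dee' is out of range
```

The de-duplication check is what keeps a real mesh from melting down: each node forwards a given message at most once, so coverage grows with density rather than traffic exploding.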

Face recognition: Why Mesh-networks Matter to AR

The wrong way to do face recognition
In the conventional framework, the observed loses all agency over their own identity: a database of face profiles is stored in the cloud, and the object of the observer's gaze is referenced against the cloud database.

In recent years, particularly in public spaces in first-world high-density urban environments, and at international points of public transit, being surveilled by the state is the new normal. However, there has up to now been an unwritten social contract that, at least among strangers, people imagine themselves to be anonymous in the crowd. Discomfort arises with the breaking of this social contract.

Doing it wrong
At Mobile World Congress in 2010 & 2011, multiple smartphone-based augmented reality face recognition mobile app proof-of-concepts were demonstrated. None were ever introduced to market. In 2014 Google banned face recognition apps from the app store, citing privacy concerns. In December 2016 Blippar announced their intention to introduce face recognition as a new feature in the upcoming release of their iPhone app; more notably, it has been reported* that Blippar intends to encourage users to contribute face-and-name pairs to their database, without seeking consent from the individuals featured. Though the announcement was made over two months ago, no such update has yet appeared in the iOS app store. It is my speculation that Apple has withheld approval.

* As reported in Newsweek.

The right way to do face recognition
In the proposed framework, the observed retains agency over their own identity. Instead of a cloud database, the user maintains an encrypted version of their own face profile locally on their own device. Via a local area mesh network, users who run the app can then set preferences as to who they share their key with. Individual observers or groups can be blocked. No database. (The cloud is often overrated, but I digress.)
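The proposed key-sharing model can be sketched as follows. This is a hypothetical illustration of mine: the class and method names are invented, and a real system would need genuine encryption plus an actual mesh transport.

```python
# Hypothetical sketch of the proposed access model (all names are mine).
# The point: the profile key lives only on the owner's device, and an
# observer receives it only if the owner's sharing preferences allow.

import secrets

class OwnFaceProfile:
    """Encrypted face profile kept locally; the owner controls key sharing."""

    def __init__(self):
        self.key = secrets.token_hex(16)  # decryption key, never uploaded
        self.allowed = set()              # observers the owner shares with
        self.blocked = set()              # observers/groups the owner denies

    def share_with(self, observer):
        self.allowed.add(observer)

    def block(self, observer):
        self.blocked.add(observer)
        self.allowed.discard(observer)

    def key_for(self, observer):
        """Answered over the local mesh; no cloud database is consulted."""
        if observer in self.allowed and observer not in self.blocked:
            return self.key
        return None  # to this observer, the face stays anonymous

me = OwnFaceProfile()
me.share_with("colleague")
me.block("stranger")
print(me.key_for("colleague") == me.key, me.key_for("stranger"))
```

Because the grant decision runs on the owner's device, revocation and blocking are immediate and need no third party, which is exactly the agency the cloud model takes away.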

As recently proposed at MIT Media Lab

In my recent lecture at MIT Media Lab ("The Case for Consumer Face Recognition in Augmented Reality," by Christopher Grayson, MIT Media Lab, AR in Action), I made my case for face recognition as the killer app for consumer augmented reality.

The case for consumer face recognition
The case for consumer face recognition is not to identify strangers that you don't know, but to identify and be recognized by the people whom you do know.

Dunbar's Number: 150, the approximate number of people with whom the typical person can maintain meaningful relationships. One of Dunbar's non-eponymous numbers: 1,500, the approximate number of names that the typical person can associate with a face; this includes friends, family, and all notable contemporary and historical figures.

Two use cases for consumer face recognition, one medical and one modern:

Medical: for treating Alzheimer's and other aphasia-inducing expressive language deficits.

Cognitive Enhancement: The work of anthropologist and evolutionary psychologist Robin Dunbar suggests that the evolution of human tribal relationships has placed cognitive limits on our ability to associate names with faces at about 1,500, yet it is now becoming common to manage social networks of substantially larger numbers.

Why Microsoft's LinkedIn Acquisition Also Matters to AR

Given Dunbar's research, and given that the business model of a social network follows a variation of Metcalfe's Law, LinkedIn's business would greatly benefit from face recognition enhanced personal network management. Add to this Microsoft's own industry leadership in AR (including their Face API), and Microsoft is the candidate best positioned to take the lead on this implementation. I have advocated for this consumer face recognition framework to the LinkedIn UI/UX leadership, as well as to members of Microsoft's AR team.
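A quick numeric sketch of the Metcalfe/Dunbar argument (my arithmetic, using the standard n(n-1)/2 pairwise-connection count and the Dunbar figures cited above):

```python
# Quick numbers (my arithmetic) behind the Metcalfe/Dunbar argument:
# network value scales with the n*(n-1)/2 possible pairwise connections,
# so tools that push recall past Dunbar-scale limits add value super-linearly.

def pairwise_connections(n: int) -> int:
    """Metcalfe-style count of possible connections among n users."""
    return n * (n - 1) // 2

dunbar_relationships = 150   # meaningful relationships one can maintain
dunbar_name_to_face = 1500   # names one can associate with a face

print(pairwise_connections(dunbar_relationships))   # 11175
print(pairwise_connections(dunbar_name_to_face)
      // pairwise_connections(dunbar_relationships))
```

Ten times the contacts yields roughly a hundred times the possible connections, which is the super-linear upside of machine-assisted recall for a network like LinkedIn.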

Upcoming VR/AR events Spring 2017


THIS IS AN INVITE ONLY PARTY - DATE AND LOCATION WILL ONLY BE SHARED WITH THOSE WHO REQUEST AN INVITATION

CLICK HERE TO REQUEST YOUR INVITATION February 27, 2017

TALK: BLIND SPOT
Fashion eye-frames, waveguides & low-power processors: the coming battle to own consumer AR smartglasses.

Speaking: Christopher Grayson
Monday, Feb. 27th, Civic Center, San Francisco. BUY TICKETS.

San Francisco, I am in you.

The Largest AR & VR Event in the World

Augmented World Expo, May 31 to June 2, Santa Clara Convention Center

Super Early Bird pricing until February 28.

Conclusions

While the media treats VR as the new gaming console, I see both VR and AR as globally transformative technologies — something as big as the invention of the telephone, on par with the invention of the internet itself. VR and AR are the technologies the internet has been waiting for.

Virtual Reality — the ability to place two or more people from anywhere in the world into the same virtual space, potentially infinite in size, with a sense of presence — will sweep away national borders, challenge the global political landscape, and possibly even change the way we think of humanity itself.

This report has been a collection of observations, ideas, and opinions, loosely connected by the common theme of virtual and augmented reality. If you like what I've had to say, I'd like to hear from you, and see how we can change the world together.

Christopher Grayson: Entrepreneur, Creative Director, Futurist

Represented by: for custom reports by Christopher Grayson, contact Jay Shiel, +1 212-984-8500, [email protected]

Christopher Grayson < [email protected] >

This report is available for download at:

by < [email protected] >

Christopher Grayson BY-SA Creative Commons Attribution Share-Alike 2.5

All product names, logos, and brands are property of their respective owners.