Big Data Bites
Editor: Jimmy Lin, [email protected]

Searching from Mars

Jimmy Lin, Charles L.A. Clarke, and Gaurav Baruah • University of Waterloo

Published by the IEEE Computer Society • IEEE Internet Computing, January/February 2016, pp. 78–82 • 1089-7801/16/$33.00 © 2016 IEEE

“Big Data Bites” is a regular department in IEEE Internet Computing that aims to deliver thought-provoking and potentially controversial ideas about all aspects of Big Data. Interested in contributing? Drop me a line! —Jimmy Lin

How would you search from Mars? No, seriously.

The Mars to Stay concept describes a series of related proposals for establishing a permanent colony on Mars (see marstostay.com and www.mars-one.com).1 Instead of landing astronauts there with the intention of bringing them home after a short visit, the plan is to send them to become the first colonists. Such missions would be far less expensive, since we wouldn’t need to bring along fuel for the return voyage (and manufacturing fuel on Mars is risky). Far from being a cuckoo idea, this approach to exploration has the support of many — including Elon Musk, the founder of SpaceX (and co-founder of PayPal and Tesla Motors),2 and Buzz Aldrin, the second human to set foot on the Moon.3 Scientists and engineers have worked out many of the details, and the quite surprising conclusion is that this plan requires no new technological breakthroughs (that is, it’s doable with present chemical rockets) and is economically feasible (to the extent that a single wealthy individual could bankroll the entire endeavor). Oh, and there’s no lack of volunteers willing to go on this one-way trip.1

A survey of mission plans is beyond the scope of this column, as is a discussion of numerous challenges ranging from sustaining life (air, food, shelter, and so on) to maintaining social structures (dealing with conflict and long-term isolation) and even issues such as cost recovery (for example, there are proposals to turn the entire endeavor into a reality show). In short, there are lots of smart people thinking about these issues, on which we have no expertise. However, we do know a thing or two about search and Big Data, and on that we will most happily speculate.

How would you search from Mars? And more generally, how would you use the Web from Mars? To make the scenario more concrete, let’s look roughly 10 years into the future, and let’s assume that “searching” and “the Web” will still feel at least somewhat familiar to a user from today. Of course, technology will have advanced dramatically, but the point is that we’ll likely still be searching with something that looks like a Web search engine, engaging with friends on something that looks like a social network, and purchasing items through something that looks like an e-commerce site. Just for rhetorical convenience, we’ll refer to brands that everyone is familiar with today, so when we say “Facebook,” we really mean “Facebook, or whatever social networking service we’ll all be using in 10 years.”

Why Search from Mars?

The first question is, why? More precisely, from an information retrieval perspective, what’s the task model? Mars missions, at least in the short term, will require substantial ground support on Earth, so our fearless Martian colonists will have access to the best minds on Earth to help with their problems. Plus, the missions will likely have been planned out in sufficient detail that responses to most survival-critical challenges will already have been mapped out. Thus, searching from Mars will likely not be a “Houston, we have a problem” need.

We anticipate that Martian colonists will be using the Web much in the same way we do today — reading the news, interacting with friends, watching highlights from yesterday’s game, searching for information related to a leisurely pursuit, accessing adult entertainment, and so on. For convenience, we’ll call this casual Web use. One initial reaction might be: What are the colonists doing wasting time on Facebook? On the contrary, these activities are critical to the psychological and emotional health of the colonists. They will continue to have strong ties to Earth, having left behind family and friends, and sustaining these connections will be important to their overall well-being. It seems silly and a waste of resources to call up ground support to ask for score updates from a football game or to obtain a new vegetable stew recipe. Although intermediated interactions have been the norm in human space missions throughout history, it’s hard to imagine how such an approach is sustainable for a permanent colony. Indeed, we’re already moving away from such rigid interactions: for example, personal Internet use is possible on the International Space Station today. Thus, we want to be able to search from Mars.

Another category of information needs will likely be scholarly search. An important goal of Mars missions is to advance science, so our colonists will require access to all of the scientific literature on Earth. For example, the colonists might want to publish about breakthroughs in hydroponics, and thus would need the Internet in exactly the same way that an Earth-bound scientist would: looking up related work, reading papers, interacting with peers, and so on. Although it might be possible to have an Earth-side co-author handle all these interactions, this would be awkward and frustrating for the colonists, not to mention contrary to the workflows of modern science. For these reasons, too, we want to be able to search from Mars.

Our goal is to make searching from Mars, and Web use in general, as close to the experience on Earth as possible. This contrasts with alternative approaches built on the idea of “slow search”4 and asynchronous search models.5 It will be a while before we have Amazon Prime on Mars, but it’s perhaps reasonable to expect that a colonist could purchase (small) personal items from Amazon and have them delivered on the next supply rocket (estimated delivery time: eight months). Furthermore, the colonists will want to buy presents for friends and family. Although the latest holographic display might be too large for shipping to Mars, it still makes a great Father’s Day gift for a parent back on Earth. Such transactions, as well as purchasing the latest Kindle release or the digital plans for the latest gadget (for 3D printing), shouldn’t be any more difficult on Mars than on Earth. One possible organization of Mars missions would be modeled after the military, where the colonists are permanently “on-duty” and paid a salary, so having disposable income is entirely plausible.

The Constraints

How do we replicate on Mars the complete “Web experience” on Earth? Before sketching out the solution, let’s first lay out the constraints and resources. In terms of the latter, the Mars colony likely wouldn’t be self-sufficient for a while, so we anticipate substantial ground support and continued investment, including regular cargo supply rockets from Earth.

What about the constraints? Mars is sufficiently far away that communication latencies are problematic. Depending on the relative positions of the two planets, radio signals take roughly 4 to 24 minutes to travel between Earth and Mars, so we need to cope with a round-trip latency of between 8 and 48 minutes.6 That means a Skype call between Earth and Mars is out of the question, and we’re not likely to figure out faster-than-light communication anytime soon (perhaps ever).

There exist laser-based communication technologies that could achieve good bandwidth between Earth and Mars. The Lunar Laser Communications Demonstration achieved a 622-Mbps downlink and a 20-Mbps uplink between the Earth and the Moon,7 so something similar for Mars is technologically feasible. If we need more bandwidth, we simply build more satellites, and thus it’s reasonable to count on substantial (but not infinite) bandwidth between Earth and Mars.

Getting physically from Earth to Mars is more complicated than aiming a rocket at Mars: both planets are in orbits around the sun (at different velocities), and we need to take into account the sun’s gravitational influence. This is accomplished with transfer orbits — injecting a vehicle into an orbit around the sun that intersects the orbit of Mars at the right time, when the planet is there. This approach requires two separate changes in velocity (delta-v’s): first, from low-Earth orbit into the transfer orbit, and then out of the transfer orbit once we arrive at Mars.

There are a variety of options with different trade-offs. The Hohmann transfer orbit is the most efficient in terms of fuel, but it depends on a particular configuration of the planets, such that a launch window only opens up about once every two years.8 Bi-elliptic transfer orbits take more time; they save fuel over a Hohmann transfer only when the ratio of the two orbital radii is large (roughly 12 or more), which is not the case for Earth and Mars.8 Conjunction-class transfers are our best bet with current or near-future rocket technology: they’ll get us to Mars in between 120 and 270 days,9 which is in line with the historical record: missions over the last half century have taken between 150 and 300 days to reach Mars.10 As an aside, if we can overcome the political objections to using nuclear rockets (for example, as outlined in the Orion project), it might be possible to cut the travel time to around two months.11

Finally, it’s also worth mentioning the concept of a cycler, which is a vehicle in a special orbit around the sun that encounters Earth and Mars on a regular basis.12 Instead of being inserted into Martian orbit, a cycler keeps flying in an endless loop around the sun — payloads “hitch” a ride on the cycler and then “get off” at the right time. Cyclers are attractive in that they can rely on gravity-assist flybys to maintain or alter their trajectories, and thus require minimal fuel once the initial orbit is established.
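The transit-time and launch-window figures above follow from Kepler’s third law. The sketch below is a back-of-the-envelope check under stated assumptions, not a mission-design tool: it assumes circular, coplanar orbits, takes Earth’s orbital radius as 1 AU and Mars’s as about 1.524 AU, and ignores the planets’ own gravity and the departure and arrival phases. The function and constant names are ours, chosen for illustration.

```python
# Back-of-the-envelope Earth-to-Mars orbital timing (not a mission-design tool).
# Assumes circular, coplanar orbits and ignores the planets' own gravity.

DAYS_PER_YEAR = 365.25
EARTH_AU = 1.0      # Earth's orbital radius
MARS_AU = 1.524     # approximate mean orbital radius of Mars

def orbital_period_years(a_au: float) -> float:
    """Kepler's third law for a Sun-centered orbit: T [years] = a [AU] ** 1.5."""
    return a_au ** 1.5

def hohmann_transfer_days(r1_au: float, r2_au: float) -> float:
    """Flight time of a Hohmann transfer: half the period of the transfer ellipse."""
    a_transfer = (r1_au + r2_au) / 2.0   # semi-major axis of the transfer ellipse
    return 0.5 * orbital_period_years(a_transfer) * DAYS_PER_YEAR

def synodic_period_days(r1_au: float, r2_au: float) -> float:
    """Time until the same Earth-Mars alignment (and thus launch window) recurs."""
    t1, t2 = orbital_period_years(r1_au), orbital_period_years(r2_au)
    return DAYS_PER_YEAR / abs(1.0 / t1 - 1.0 / t2)

if __name__ == "__main__":
    print(f"Hohmann transfer: ~{hohmann_transfer_days(EARTH_AU, MARS_AU):.0f} days")
    print(f"Launch windows every ~{synodic_period_days(EARTH_AU, MARS_AU):.0f} days")
```

Under these assumptions a Hohmann transfer takes about 259 days, inside the 120-to-270-day range quoted for conjunction-class transfers, and the Earth–Mars alignment recurs roughly every 780 days (about 26 months), which is where the “about once every two years” launch window comes from.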

The downside of cyclers, of course, is that transit is limited to periodic windows.

It’s the User Model, Stupid!

So, how do we actually do it? Before sketching our solution, we note that our focus is on application-level challenges, as opposed to lower-level advances in computing technologies: for example, we simply assume that computing equipment will have been hardened to withstand the stresses of space flight and that technologies such as delay-tolerant networking13 have already been deployed.

Human missions to Mars will likely be preceded by multiple robotic cargo missions that transport shelter, equipment, and supplies. Part of the cargo would simply be a copy of the Web — in other words, we start with the first interplanetary sneakernet. Obviously, we can’t send everything, but as Andrew Tanenbaum once said, “Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.” Even with today’s technology (16-Tbyte solid-state drives), a petabyte will weigh less than four kilograms. Of course, this is just one copy, and we need redundancy (not to mention space-hardening), so there will be added bulk, but physically shipping part of the Web to Mars is entirely feasible. By the time the mission actually lifts off, we would expect another order-of-magnitude improvement in storage technology: either 10 times more capacity or a tenth of the weight.

What should we include? A copy of all books and scientific articles that have ever been published would be a good start (that’ll be relatively compact). We’d need some webpages too: for this, we’ll ask Google to donate a crawl (plus index) of the portion of the Web that would be most valuable to the colonists. How would Google know which portion that is? Well, Google already has the colonists’ interaction logs (they have been searching on Earth too, no?). These data, coupled with data from millions more users and perhaps some manual curation, could be leveraged to build a long-term predictive model of user interests and information consumption — and from that, Google could extract the portion of the Web that any individual or group is likely to use. This could be set up as a relatively straightforward machine learning problem, something like (digital) bin packing, where the goal is to maximize the pages’ overall expected utility.

Next: because the colonists will need entertainment, we do the same for Netflix. That is, based on the viewing history and preferences of each colonist, it’s possible to assemble a playlist, subject to a storage budget, that maximizes “viewing pleasure” (predicted “star ratings” would be a start, but hopefully the recommender systems community will have invented something better). Leaving aside intellectual property issues (with the entertainment content and everything else), gathering all these data is straightforward. We’re not lawyers, of course, but even the intellectual property issues might not be that thorny if we simply think of this approach as the logical extension of edge caching — content distribution networks already do something along these lines today.

Okay, we’ve got a rocket hurtling through space with a sizeable chunk of the Web on it. Here’s where we encounter the first problem: the voyage to Mars is a long one (especially for cargo ships, which are likely to take longer, more fuel-efficient transfer orbits), and by the time our cache of the Web lands on Mars, it’s already stale.

Not to worry: this is a challenge that all Web search engine companies already deal with today, namely, the problem of crawl prioritization. The Web is, of course, dynamic and constantly changing, but some parts change more frequently than others. Google must identify those parts and recrawl them, along with new content, subject to constraints such as minimizing bandwidth consumption and not generating excessive load on remote Web servers. Fortunately, this problem is already being solved today.

So, we could (ask Google to) keep track of all the content in transit and capture any updates, continue to refine the predictive user models on Earth with new data, and beam the “diffs” over to Mars. We would send some empty storage in a robotic cargo mission prior to the large sneakernet delivery, and this storage could hold the “patches” that are applied when “the Web” lands. The reasonable bandwidth we’d expect to Mars, coupled with this temporary storage (which the colonists might later use to store new data), would ensure that the colonists arrive at a fresh cache of the Web.

Our colonists have arrived on Mars. The biggest enemy now is latency: the worst possible experience is for a colonist to issue a search query or click on a link and have the response be, “Sorry, this content doesn’t exist on Mars. Please stand by while it’s being fetched from Earth. Estimated time of delivery: 24 minutes.”

We envision solving this problem using the same type of technology that was deployed to create the sneakernet delivery to begin with. Each colonist would have an avatar on Earth that represents his or her user model. This avatar would continuously receive a stream of interaction and other data from Mars and, based on these data, predictively fetch relevant portions of the Web on Earth, package up the content, and beam the material over to Mars to update or replace the cache there. For example, the avatar might observe from the colonist’s personal diary that she’s contemplating growing the first Martian bonsai, and proactively fetch webpages related to the subject. The next morning, when the colonist starts searching about bonsai, the pages are already there — the search experience is seamless, and she has no idea that the pages were only delivered last night.
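The sneakernet arithmetic is easy to check. The sketch below counts how many 16-Tbyte drives a petabyte needs (assuming decimal units: 1 PB = 10^15 bytes, 1 TB = 10^12 bytes) and what per-drive mass the “less than four kilograms” claim implies; the column gives no drive weight, so the mass budget is derived rather than assumed. The `replicas` parameter is a hypothetical stand-in for the redundancy the column mentions.

```python
import math

PETABYTE = 10**15              # decimal units, as drive vendors use
DRIVE_BYTES = 16 * 10**12      # 16-Tbyte solid-state drive, as in the text

def drives_needed(payload_bytes: int, drive_bytes: int, replicas: int = 1) -> int:
    """Whole drives needed to hold `replicas` full copies of the payload."""
    return replicas * math.ceil(payload_bytes / drive_bytes)

def max_grams_per_drive(total_budget_kg: float, n_drives: int) -> float:
    """Heaviest each drive may be for the whole shipment to meet the mass budget."""
    return total_budget_kg * 1000 / n_drives

n = drives_needed(PETABYTE, DRIVE_BYTES)
print(n, round(max_grams_per_drive(4.0, n), 1))            # 63 63.5
print(drives_needed(PETABYTE, DRIVE_BYTES, replicas=3))    # 189 (hypothetical 3x redundancy)
print(drives_needed(PETABYTE, 10 * DRIVE_BYTES))           # 7 (after a 10x capacity gain)
```

So one copy fits on 63 drives, and the four-kilogram figure holds as long as each drive weighs under roughly 63 grams; triple redundancy triples the count, while the expected tenfold capacity gain shrinks a single copy to seven drives.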
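Choosing which pages to ship is described above as something like bin packing. With a single storage budget and a per-page expected utility, the problem as stated is, strictly speaking, a 0/1 knapsack problem. The sketch below solves small instances exactly with dynamic programming; the `Page` type, the integer size units, and the utility scores are hypothetical stand-ins for what a predictive user model would produce. At Web scale one would more likely rank pages by expected utility per unit of storage and fill the budget greedily, trading optimality for speed.

```python
from typing import NamedTuple

class Page(NamedTuple):
    url: str
    size: int        # storage cost in integer units (e.g., GB), assumed
    utility: float   # expected utility from a hypothetical user model

def select_pages(pages: list[Page], budget: int) -> tuple[float, list[str]]:
    """Exact 0/1 knapsack: maximize total utility with total size <= budget."""
    # best[c] holds (utility, chosen URLs) for capacity c using the pages seen so far.
    best: list[tuple[float, list[str]]] = [(0.0, [])] * (budget + 1)
    for page in pages:
        # Walk capacities downward so each page is taken at most once.
        for c in range(budget, page.size - 1, -1):
            prev_utility, prev_urls = best[c - page.size]
            candidate = prev_utility + page.utility
            if candidate > best[c][0]:
                best[c] = (candidate, prev_urls + [page.url])
    return best[budget]

pages = [Page("a.example/hydroponics", 3, 5.0),
         Page("b.example/bonsai", 4, 6.0),
         Page("c.example/stew-recipes", 2, 3.0)]
print(select_pages(pages, budget=6))   # (9.0, ['b.example/bonsai', 'c.example/stew-recipes'])
```

A greedy pass by utility density would take the hydroponics page first (5/3 per unit) and stop at a total of 8.0, which is why an exact solver can be worth its cost for small, high-value selections such as a personal cache.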
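Beaming over the “diffs” amounts to comparing what is on the ship with what Earth has now. The sketch below works at whole-page granularity: Earth keeps only a SHA-256 fingerprint of each page in transit, and the patch lists pages to add or overwrite plus URLs to delete. A production system would likely use finer-grained deltas; the function names and the dictionary-of-pages representation are assumptions for illustration.

```python
import hashlib

def fingerprint(body: bytes) -> str:
    """Content hash of a page; Earth keeps these for everything in transit."""
    return hashlib.sha256(body).hexdigest()

def compute_patch(shipped: dict[str, str], current: dict[str, bytes]) -> dict:
    """shipped: URL -> fingerprint of the copy on the cargo ship.
    current: URL -> latest crawled body on Earth.
    Returns pages to add or overwrite, plus URLs that have disappeared."""
    upsert = {url: body for url, body in current.items()
              if shipped.get(url) != fingerprint(body)}
    delete = sorted(url for url in shipped if url not in current)
    return {"upsert": upsert, "delete": delete}

def apply_patch(cache: dict[str, bytes], patch: dict) -> None:
    """Run on Mars when the cache lands: bring it up to date in place."""
    for url in patch["delete"]:
        cache.pop(url, None)
    cache.update(patch["upsert"])

# Usage: the ship left with three pages; Earth has since changed one,
# added one, and dropped one.
on_ship = {"news": b"v1", "wiki": b"same", "old": b"gone soon"}
earth_now = {"news": b"v2", "wiki": b"same", "new": b"fresh"}
patch = compute_patch({u: fingerprint(b) for u, b in on_ship.items()}, earth_now)
apply_patch(on_ship, patch)
print(sorted(patch["upsert"]), patch["delete"], on_ship == earth_now)  # ['new', 'news'] ['old'] True
```

Only the unchanged page stays off the link, which is the point: the bandwidth budget goes to what actually changed during the voyage.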
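Why the avatar matters can be put in one line of arithmetic. If a cached page is served locally (treated here as zero wait) and every miss costs a full Earth–Mars round trip, the mean wait per request is (1 − h) × RTT for a hit rate h. This is a simplifying model we introduce for illustration: it ignores transmission time and local processing, and it plugs in the column’s 4-to-24-minute one-way delays.

```python
def expected_wait_minutes(hit_rate: float, one_way_minutes: float) -> float:
    """Mean wait per request: hits cost ~0, each miss costs one full round trip."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    round_trip = 2.0 * one_way_minutes
    return (1.0 - hit_rate) * round_trip

for one_way in (4, 24):                 # best- and worst-case one-way delays from the text
    for hit_rate in (0.90, 0.99):
        wait = expected_wait_minutes(hit_rate, one_way)
        print(f"one-way {one_way:>2} min, hit rate {hit_rate:.0%}: {wait:.2f} min")
```

Even at a 99% hit rate, every individual miss is still an 8-to-48-minute wait, so the avatar’s job is to push the hit rate as close to 1 as it can, consistent with the column’s point that it may be profligate with bandwidth.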


Prefetching by the predictive models could be supplemented by further sneakernet deliveries from Earth, piggybacked on regular cargo missions or new waves of colonists: the updated Netflix catalog, for example. Considering that much content (particularly TV shows) becomes available on Netflix only after a substantial delay, adding the transit time doesn’t seem like a particularly big deal. It seems straightforward to weigh the benefits and costs of beaming versus rocketing bits over to Mars in order to select the appropriate delivery method.

Each colonist’s avatar could also proxy websites for transactions such as Amazon purchases: from the Martian point of view, the experience would be seamless, but a final confirmation would arrive only after the actions have been relayed and applied “for real” on Earth. This would take some amount of software engineering, but seems eminently doable. It might be interesting to consider how you might trade equities from Mars, but let’s leave aside asynchronous transactions for the remainder of this piece.

Back to search: the predictive user models in the avatars would have a different objective than the ones used to “bin pack” the initial cache. Whereas the latter are optimized to maximize expected utility per unit of storage, the avatar’s primary goal is to hide latency, and hence it might be more profligate in its use of bandwidth. However, the underlying principles are the same — we’d probably still be using some form of machine learning. All of this can be accomplished with today’s technology.

It’s interesting to speculate what sources of data the avatars could bring to bear on the prediction problem. Naturally, we would expect all the types of interaction data that Web search engines already capture today: queries, clicks, dwell times, and so on (yawn). Our colonists likely would be under constant (non-intrusive) physiological monitoring (heart rate, cortisol levels, amount of physical exertion, and so on) for health and safety reasons, which might provide interesting sources of signal. The avatar would also have access to all personal communications (such as voice/video messages and email), other personal files (for example, diaries), as well as official mission logs and reports. In addition to human-generated data, we would expect a multitude of sensor data, ranging from environmental monitors to the output of scientific experiments.

In short, it would be relatively easy to capture all data coming in, going out, and being generated on Mars — thus creating the potential for the ultimate Big Brother. We can justify gathering all these data, but doing so also creates interesting data privacy issues. Just because an avatar could take advantage of all these data doesn’t mean it should. The avatar of each colonist should have tight security safeguards and be kept logically distinct — from other colonists and from eavesdroppers on Earth. However, it isn’t hard to construct scenarios where information inadvertently leaks across avatars.

The scenario described above isn’t fanciful science fiction; it’s already here today. Every user interaction with an online service is already being logged. Personal communication is already being monitored and captured, for example, by cell phone companies and Web-based email services. Online calendars, airline e-tickets, and GPS keep track of where we are and what we’re doing nearly all the time. With the advent of cloud storage, our personal files are already “out there,” and the same goes for personal physiological data through the proliferation of fitness and wearable devices. The only significant difference today is that all these data are gathered in silos, whereas on Mars, all data would be conveniently accessible (an important distinction that probably leaves many companies today salivating).

In terms of machine learning, the biggest difference between what’s being done today and the rich user models necessary to support a seamless Web experience from Mars is the optimization objective. Somewhat oversimplifying, most Internet companies today focus on short-term prediction — query typeahead and ad prediction are two great examples. The first strives to save users a few keystrokes, and the second attempts to predict the next interaction (clicking an ad). In contrast, user models to support searching from Mars need to capture longer-term user interests, potentially over months and even years (in the case of scientific research being conducted on Mars). This represents an interesting direction in information retrieval and machine learning research, and recent work on modeling longer-term user engagement suggests that industry has begun to move in this direction.14

In summary, what does it take to make searching from Mars work? It’s the user model, stupid!

Despite the technological and economic feasibility of colonizing Mars, there are presently no concrete plans to actually do it. This doesn’t mean, however, that we should just sit idle waiting for Elon Musk to deliver. In fact, searching from a rural village in the developing world exactly parallels searching from Mars: instead of a cache of the Web on Mars, we have a cache of the Web at the Internet access point shared by the villagers. Internet connectivity in the developing world is often intermittent and poor in quality: we can use the techniques proposed here for hiding latency from Mars to create a more seamless user experience for rural villagers in India (for example). We can even replace the “robotic cargo ship” with a “FedEx delivery of hard drives,” and the sneakernet concept still applies.15,16 Yes, we can build the search-from-Mars experience today to improve search on Earth. Let’s do it!
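The column suggests weighing the benefits and costs of beaming versus rocketing bits. A first cut at that decision compares transmission time with the wait for the next cargo arrival. The sketch assumes, hypothetically, that a Mars link matches the Lunar Laser Communications Demonstration’s 622-Mbps downlink and that the next cargo delivery lands about eight months (roughly 240 days) out; `duty_cycle` is an invented knob for the fraction of link time this traffic may use. Propagation delay (minutes) is ignored next to transfer times measured in days.

```python
SECONDS_PER_DAY = 86_400

def beam_days(payload_bytes: float, link_bps: float, duty_cycle: float = 1.0) -> float:
    """Days needed to transmit the payload at the given average share of the link."""
    return payload_bytes * 8 / (link_bps * duty_cycle) / SECONDS_PER_DAY

def choose_delivery(payload_bytes: float, link_bps: float,
                    days_until_cargo: float, duty_cycle: float = 1.0) -> tuple[str, float]:
    """Pick whichever route gets the bits to Mars sooner, with its time in days."""
    t_beam = beam_days(payload_bytes, link_bps, duty_cycle)
    if t_beam <= days_until_cargo:
        return "beam", t_beam
    return "rocket", days_until_cargo

LINK_BPS = 622e6     # assumption: LLCD-class downlink, Earth to Mars
CARGO_DAYS = 240     # assumption: next cargo delivery about eight months out

print(choose_delivery(10e12, LINK_BPS, CARGO_DAYS))                   # 10-TB catalog update
print(choose_delivery(1e15, LINK_BPS, CARGO_DAYS))                    # 1 PB, whole link
print(choose_delivery(1e15, LINK_BPS, CARGO_DAYS, duty_cycle=0.5))    # 1 PB, half the link
```

Under these assumptions a 10-TB update beams over in about a day and a half; a full petabyte would tie up the entire link for roughly 149 days, still faster than the cargo run, but at half the link share (about 298 days) the rocket wins.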
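The avatar-as-proxy idea for purchases is essentially optimistic execution: accept the order locally, show it as pending, and reconcile when Earth answers. The sketch below is a hypothetical Mars-side proxy; the class, its method names, and the message format are invented for illustration, and the actual relay to the avatar (for example, over a delay-tolerant network) is left out. Replies are handled idempotently on the assumption that a store-and-forward link may deliver more than one reply for the same order.

```python
import itertools
from dataclasses import dataclass
from enum import Enum

class Status(Enum):
    PENDING = "pending"        # accepted on Mars, not yet applied on Earth
    CONFIRMED = "confirmed"    # the avatar completed it "for real" on Earth
    REJECTED = "rejected"      # Earth-side failure, e.g., item unavailable

@dataclass
class Order:
    order_id: int
    item: str
    status: Status = Status.PENDING
    note: str = ""

class MartianProxy:
    """Hypothetical Mars-side proxy that answers instantly and reconciles later."""

    def __init__(self) -> None:
        self._next_id = itertools.count(1)
        self.orders: dict[int, Order] = {}
        self.outbox: list[tuple[int, str]] = []   # messages waiting to go to Earth

    def place_order(self, item: str) -> Order:
        order = Order(next(self._next_id), item)
        self.orders[order.order_id] = order
        self.outbox.append((order.order_id, item))  # to be relayed to the avatar
        return order                                # colonist sees PENDING at once

    def on_reply(self, order_id: int, ok: bool, note: str = "") -> None:
        """Apply Earth's verdict once; later replies for the same order are ignored."""
        order = self.orders[order_id]
        if order.status is not Status.PENDING:
            return
        order.status = Status.CONFIRMED if ok else Status.REJECTED
        order.note = note

proxy = MartianProxy()
gift = proxy.place_order("holographic display, ship to Earth address")
print(gift.status)                               # Status.PENDING
proxy.on_reply(gift.order_id, ok=True, note="delivered by Father's Day")
proxy.on_reply(gift.order_id, ok=False)          # a second reply for this order is ignored
print(gift.status)                               # Status.CONFIRMED
```

Keeping the first verdict is a deliberately simple policy for the sketch; a real system would need a richer way to resolve conflicting replies.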
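The contrast between short-term prediction and longer-term interests can be made concrete with a time-decayed interest profile, a common and simple technique used here purely as an illustration, not as the authors’ proposal. Each interaction adds 0.5^(age/half-life) to its topic’s score, so the half-life sets the model’s time horizon. The topics and timestamps in the usage example are invented.

```python
from collections import defaultdict

def interest_profile(events: list[tuple[str, float]], now: float,
                     half_life: float) -> dict[str, float]:
    """events: (topic, time in days). Each event's weight halves every `half_life` days."""
    scores: dict[str, float] = defaultdict(float)
    for topic, t in events:
        scores[topic] += 0.5 ** ((now - t) / half_life)
    return dict(scores)

# Hypothetical history: steady hydroponics research months ago, one score check yesterday.
events = [("hydroponics", 0), ("hydroponics", 10), ("hydroponics", 20), ("football", 99)]

for half_life in (1, 90):   # a short-term horizon vs. a months-long one
    profile = interest_profile(events, now=100, half_life=half_life)
    top = max(profile, key=profile.get)
    print(f"half-life {half_life:>2} days -> top interest: {top}")
```

With a one-day half-life the profile is dominated by yesterday’s football score; with a 90-day half-life the sustained hydroponics research comes out on top, which is the kind of longer-term signal the column argues a Mars-ready user model must capture.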

Acknowledgments

We thank Doug Oard and Stephen Green for their helpful comments on earlier drafts.

References

1. N. Angier, “A One-Way Trip to Mars? Many Would Sign Up,” New York Times, 8 Dec. 2014; www.nytimes.com/2014/12/09/science/a-one-way-trip-to-mars-many-would-sign-up.html.
2. E. Howell, “SpaceX’s Elon Musk to Reveal Mars Colonization Ideas This Year,” Space.com, 9 Jan. 2015; www.space.com/28215-elon-musk-spacex-mars-colony-idea.html.
3. B. Aldrin, “The Call of Mars,” New York Times, 13 June 2013; www.nytimes.com/2013/06/14/opinion/global/buzz-aldrin-the-call-of-mars.html.
4. J. Teevan et al., “Slow Search: Information Retrieval without Time Constraints,” Proc. 7th Ann. Symp. Human-Computer Interaction and Information Retrieval, 2013, article no. 1.
5. W. Thies et al., “Searching the World Wide Web in Low-Connectivity Communities,” Proc. 11th Int’l World Wide Web Conf., 2002; www2002.org/CDROM/alternate/714/.
6. T. Ormston, “Time Delay between Mars and Earth,” blog, 5 Aug. 2012; http://blogs.esa.int/mex/2012/08/05/time-delay-between-mars-and-earth.
7. B.S. Robinson et al., “The Lunar Laser Communications Demonstration,” Proc. Int’l Conf. Space Optical Systems and Applications, 2011; doi:10.1109/SMC-IT.2009.57.
8. D.A. Vallado, Fundamentals of Astrodynamics and Applications, Springer-Verlag, 2007.
9. P.D. Wooster et al., “Mission Design Options for Human Mars Missions,” The Int’l J. Mars Science and Exploration, vol. 3, 2007, pp. 12–28; doi:10.1555/mars.2007.0002.
10. F. Cain, “How Long Does It Take to Get to Mars?,” Universe Today, 9 May 2013; www.universetoday.com/14841/how-long-does-it-take-to-get-to-mars.
11. G.R. Schmidt, J.A. Bonometti, and P.J. Morton, “Nuclear Pulse Propulsion — Orion and Beyond,” Proc. AIAA/ASME/SAE/ASEE Joint Propulsion Conf. and Exhibit, 2000; http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/20000096503.pdf.
12. R.P. Russell and C.A. Ocampo, “Systematic Method for Constructing Earth-Mars Cyclers Using Free-Return Trajectories,” J. Guidance, Control, and Dynamics, vol. 27, no. 3, 2004, pp. 321–335; doi:10.2514/1.1011.
13. K. Fall, “A Delay-Tolerant Network Architecture for Challenged Internets,” Proc. 2003 Conf. Applications, Technologies, Architectures, and Protocols for Computer Comm., 2003, pp. 27–34.
14. M. Lalmas, “Evaluating the Search Experience: From Retrieval Effectiveness to User Engagement,” keynote talk, Conf. and Labs of the Evaluation Forum, 2015; http://clef2015.clef-initiative.eu/CLEF2015/keynotes.php.
15. A. Pentland, R. Fletcher, and A. Hasson, “DakNet: Rethinking Connectivity in Developing Nations,” Computer, vol. 37, no. 1, 2004, pp. 78–83.
16. A. Seth et al., “Low-Cost Communication for Rural Internet Kiosks Using Mechanical Backhaul,” Proc. 12th Ann. Int’l Conf. Mobile Computing and Networking, 2006, pp. 334–345.

Jimmy Lin holds the David R. Cheriton Chair in the David R. Cheriton School of Computer Science at the University of Waterloo. His research lies at the intersection of information retrieval and natural language processing, with a particular focus on Big Data and large-scale distributed infrastructure for text processing. Lin has a PhD in electrical engineering and computer science from MIT. Contact him at [email protected].

Charles L.A. Clarke is a professor in the David R. Cheriton School of Computer Science at the University of Waterloo. His research interests include information retrieval, Web search, Web data mining, text data mining, and software tools. Clarke has a PhD in computer science from the University of Waterloo. Contact him at claclark@plg.uwaterloo.ca.

Gaurav Baruah is a PhD student in the David R. Cheriton School of Computer Science at the University of Waterloo. His research interests include information retrieval, search engine evaluation, and data mining. Baruah has a master’s degree in computer science and engineering from the Indian Institute of Technology, Guwahati. Contact him at [email protected].
