TechWare: Mobile Media Search Resources
[best of THE WEB]
Zhu Liu and Michiel Bacchiani

Please send suggestions for Web resources of interest to our readers, proposals for columns, as well as general feedback, by e-mail to Dong Yu (“Best of the Web” associate editor) at [email protected].

Digital Object Identifier 10.1109/MSP.2011.941095
Date of publication: 15 June 2011
IEEE SIGNAL PROCESSING MAGAZINE [142] JULY 2011 1053-5888/11/$26.00©2011IEEE

In this issue, “Best of the Web” focuses on mobile media search, which has enjoyed rapid growth in consumer demand and hence receives a lot of attention from content providers and device manufacturers. The combination of a large increase in computational power and an always-on broadband connection available everywhere at all times makes for an ideal device for all our social, business, and entertainment needs. In addition, such devices provide a wide array of multimedia sensors such as a high-quality camera, a touch screen, audio-capturing abilities, and a global positioning system (GPS). This makes the devices not only suited for multimedia consumption but also a powerful platform for multimedia content generation. The relatively small form factor of the mobile devices in comparison to desktop machines also brings challenges as it is much more difficult to provide textual input to the device. On the other hand, the wide availability of the multimedia sensors makes additional input modalities more natural and many applications make good use of the added opportunities this provides. As a result, the importance of multimedia search has risen considerably given the explosion of devices attempting to locate content and the explosion of content generation itself. In addition, the spectrum of the problem of multimedia search has widened given the additional input modalities now commonly available through the sensor rich clients.

This column first reviews the recent progress in the field of mobile media search and then discusses the enabling media indexing and query technologies. Some useful online links and resources will be listed for the interested readers for further research. Unless otherwise noted, the resources are free.

MOBILE MEDIA SEARCH

Not being mined, a diamond is simply an ordinary stone buried in the dirt. That is also true for multimedia content. With the growing availability of low-cost multimedia capture devices (video, image, and sound) and the availability of easy-to-use video editing software, shooting a video clip and publishing it online is no longer a complicated task. For example, in the last year, video material uploaded to the YouTube video sharing service has increased from about 20 h/min to about 35 h/min. This explosion of content makes the problem of multimedia search both more pressing as well as challenging. As a result, the area of multimedia search has received a lot of attention as it moved from a text-only retrieval problem to a much richer and multifaceted problem. It brought in several other research fields like speech recognition, image/video processing, natural language understanding, computer vision, and music identification to better understand the content being indexed. In addition, the group-based content generation and consumption has added user signals that can be used in retrieval (e.g., a video might “go viral”). And the many sensor-rich clients make the use of multimodal input a more realistic usage setting. Users might use speech to input a text query, or they might use their cameras to input an image-based query. Touch screens make multimodal consumption of complex results an obvious presentation method. The swift change in the problem setting, the size and nature of the content indexed, and the proliferation of devices used to engage in the content have created a new and exciting field of multimedia search.

The momentum of this technology is not only evident in the many products produced by industry in this space, it is also evident in the academic community. Recently, the “Mobile Media Search: Has Media Search Finally Found Its Perfect Platform?” panel series at the ACM International Multimedia 2009 Conference and ICASSP 2009 addressed this field. In these two panels, leading experts in the field from both industry and academia shared their opinions on mobile media search.

In the following two sections, we introduce two dimensions of mobile media search applications and systems: multimodal query interfaces and multimedia retrieved results.

MULTIMODAL QUERY IN MOBILE MEDIA SEARCH

The dominant search interface both on desktop as well as mobile devices is based on text-query input. Such queries might be keywords, key phrases, or natural language questions. Alternatively, some search engines provide a hierarchical categorization of the available content and provide the users an interface to navigate that hierarchy. Contrary to desktops, input of a text query can be quite challenging on a mobile device due to its small form factor. On the other hand, unlike desktops, mobile devices are equipped with a large array of sensors making alternate input modalities a widely available option. This section focuses on the alternate interfaces mobile devices provide. Specifically, we highlight voice-based, music-based, image-based, and multimodal-based query input methods.

VOICE-BASED SEARCH

A major shortcoming of mobile devices is the effort needed to input a textual query to the device. Several companies are addressing this issue by providing a speech recognition-based solution. This in itself is a hard speech recognition problem. Queries have a very large vocabulary. They tend to be short, meaning that a language model provides a flat prior to the search problem. The user population of this technology is very large. And the most common use of the technology is on-the-go, likely in a noisy environment like on a city street or in an airport. The application also provides modeling opportunities as the query context might be available from the text input location (the text input field and surrounding Web page) and the profile of the user is likely consistent as most mobile devices are owned and operated by a single user.

■ Vlingo [www.vlingo.com (mobile app)]: As a voice recognition app, Vlingo takes voice commands to send e-mail and text messages, dial the phone, update Twitter and Facebook statuses, and search for information on the Web.
■ Google [www.google.com/mobile/voice-search (integral to Android and as an app)]: Voice search allows users to input search queries by voice. In addition, the app allows dictation of a […]

[…] social network updates. The Dragon search app searches a variety of top Web sites using spoken queries.

MUSIC-BASED SEARCH

Often while on the go, we hear a catchy tune or a song that we recognize but can’t quite place. This presents a search problem in a modality that isn’t covered well by classical text-based information retrieval. One could query based on part of the lyrics but no such option exists for instrumental music, and for many songs the lyrics that one might remember are not very descriptive. Given the availability of a microphone on all mobile devices, capturing a snippet of the song becomes a possibility and several companies provide a music-based query search. In fact, improvements in the robustness of the technology allow the users in cases to simply hum the tune as the input query to the search and successfully retrieve the tune in question.

■ Shazam [www.shazam.com (mobile app)]: Shazam identifies recorded music by listening to the song for about 5–7 s. After recognizing the song, it returns the song name, artist, album, link to buy the song, and YouTube video if available.
■ SoundHound [www.soundhound.com (free and paid mobile apps)]: Similar to Shazam, SoundHound also allows users to search music by singing/humming the song or speaking a title or band name. Lyrics, artist, and other relevant information are retrieved.
■ MusicID [musicid2.com (paid mobile service)]: MusicID identifies music by hearing it. Additional features include viewing lyrics and artist biographies and downloading songs.

[…] response (QR) barcodes or written text that can be automatically analyzed and transformed to a query. A barcode read might lead to a URL or a product search. A text snippet might be a query in itself or be the input to machine translation. In other cases, the image-based query can be a complex scene such as a photo of a product or a characteristic building. Several companies provide image-based query processing as apps on mobile devices since they universally are equipped with high-quality cameras, making the image capture an easily accessible modality to feed such searches.

■ Google [www.google.com/mobile/goggles (mobile app)]: The Goggles app uses camera input to search different kinds of objects and places, including text, landmarks, books, contact info, artwork, wine, and logos.
■ SnapTell [www.snaptell.com (mobile app)]: The SnapTell app finds information about any book, DVD, CD, or video game using a picture of its cover or the UPC/European article number (EAN) barcode image.
■ Kooaba [www.kooaba.com (mobile app)]: The kooaba visual search app recognizes media covers, including books, CDs, DVDs, games, newspapers, and magazines, and provides relevant product information.

MULTIMODAL-BASED SEARCH

Finally, some queries are most naturally expressed by a combination of modalities. Location is easily expressed by a GPS or gesturing on a map whereas a named location is best described by a text query. Furthermore, in cases where multiple locations are the results of a