Basic Database Searching

LIS 663 / Dr. Péter Jacsó

Term Paper
Basic Database Searching
By HaiYing Wang
University of Hawaii at Manoa
Library and Information Science Program
Dec. 1st, 2003

I. Topic Analysis

I like to begin my searches with Google. Google searches quickly and retrieves highly relevant information from millions of resources with a single query. It lets me build my search strategy in natural language and also offers suggestions about my queries. That is why I am so interested in the OneSearch feature of DialogWeb. The topic of my search is therefore “cross database searching.” I aim to find information on the technical aspects of the topic, that is, why cross-database searching can be implemented and how it is executed.

The first three articles that I read about the topic are:

i. Jacsó, Péter. Cross-searching electronic journal archives. Information Today 19(6) (June 2002): 34.
ii. Jacsó, Péter. Cross-database searching on the Web with term mapping from multiple thesauri. In Proceedings of the 20th National Online Meeting, New York, NY, 18-20 May 1999 (M.E. Williams, ed.): 217-25.
iii. Tennant, Roy. The right solution: Federated search tools. Library Journal 128(11) (June 15, 2003): 28.

These three articles taught me that there are three main techniques in cross-database searching: query mapping, metadata search, and the Z39.50 protocol/standard. I initially defined two concept groups and corresponding terms as follows:

Concept groups                      Corresponding terms
Cross-database searching            Cross database searching
                                    Simultaneous database searching
                                    Multiple database searching
Information retrieval techniques    Query mapping
                                    Metadata
                                    Z39.50

Because the Z39.50 protocol and metadata search are very new topics, the publication year can be limited to 1990 to the present. I prefer journal articles because their full text is easier to find than that of conference papers or research reports.
The language is limited to English only.

II. Six Nominated Databases Chosen in DIALINDEX

Because my topic is related to computer and information science, I began the DIALINDEX search by setting the files to two subject categories: CompSci and InfoSci (b 411; sf compsci,infosci). This selected 26 files. Based on the two concept groups mentioned above, I tested several queries. According to the posting numbers for each test query, I took the following 10 files/databases as nominees.

File   Database Name
1      ERIC (1966-present)
2      INSPEC (1969-present)
6      NTIS (1964-present)
8      Ei Compendex® (1970-2003)
34     SciSearch® - a Cited Reference Science Database (1990-present)
144    PASCAL (1973-2003)
148    Gale Group Trade & Industry Database(TM) (1976-present)
202    Information Science & Technology Abstracts (1966-present)
233    Internet & Personal Computing Abstracts(TM) (1981-present)
438    Library Literature and Information Science (1984-present)

After studying the File Description and Subject Coverage in the blue sheets of these 10 files, I eliminated File 438 first because it is an index-only database, although its subject coverage is perfect for the topic. File 148 went second because it has too much information about business products and its document types are journal articles and newsletter/newspaper articles only. File 233 was eliminated because it indexes and abstracts only 90 journals, so its only document type is journal articles.

Eliminating File 34 was the most difficult decision I made. This file is the most famous citation database from the Institute for Scientific Information (ISI) and had over 12,104,390 records as of November 2003. Judged by its posting numbers in each test query, it should have been kept. However, when I used File 414 (Dialog Journal Name Finder) to look up the journal name “Library Journal,” I found that none of Files 2, 6, 8, and 34 includes this important journal.
Compared with the 6 remaining files (Files 1, 2, 6, 8, 144, 202), File 34 has only 2 document types, journal articles and book reviews, and an abstract is not always available for each record. Although it has a descriptor index, the descriptors are not consistent because they are taken from the authors' keywords.

A comparison of the 6 nominated files/databases (Files 1, 2, 6, 8, 144, 202) shows clearly that their time coverage is almost the same, the geographic coverage of each file is international, and the document types of each file include journal articles, conference papers, books, reports, and theses/dissertations (except File 8: Ei Compendex). Three basic indexes are available for all 6 files: TI (word indexing), AB (word indexing), and DE (word and phrase indexing).

III. Two Best Databases Chosen

The 6 nominated files/databases mentioned above fall into 2 clusters: Files 2, 6, and 8 lean toward engineering, and Files 1, 144, and 202 lean toward general science. I therefore tried to choose the best database from each cluster. Starting from the test queries for the concept groups executed in Section II of this paper, I formed the following queries by including synonyms and term variations and by widening the proximity operators.

                        Items by file
Set     All    202    144      8      6      2      1   Term searched
S1       53     16      8      4      1     18      6   CROSS()DATABASE()SEARCH?
S2      423     95     83     45     39    120     41   (CROSS OR MULTI OR MULTIPLE OR SIMULTAN?)(2N)(DATABASE? OR DATA()BASE?)(2N)SEARCH?
S3      603     75     90    103     57    242     36   INFORMATION()RETRIEVAL()TECHNI?
S4     9818   1869   1553   1731   1451   1568   1646   INFORMATION(2N)RETRIEVAL?(2N)(TECHNI? OR RESEARCH OR METHOD?)
S5    16396    533   1325   2612    286  11457    183   (QUERY OR QUERIES)(2N)(PROCESS? OR HANDL? OR MAP?)
S6     1582    110    402    275     55    604    136   (METADATA OR META()DATA OR META)(2N)(SEARCH? OR ACCESS? OR APPROACH?)
S7      727    199    207     41     10    226     44   Z39()50
S8    18607    834   1920   2916    350  12228    359   S5 OR S6 OR S7
S9      376    109     36     60     10    125     36   S4 AND S8
S10       2      0      2      0      0      0      0   S2 AND S9
S11     797    204    117    105     49    245     77   S2 OR S9
S12     797      -      -      -      -      -      -   ID (sorted in duplicate order)
S13     170      -      -      -      -      -      -   IDO S11 (duplicates only)
S14     701    200     78     76     33    237     77   RD S11 (unique items)
S15     667    195     72     75     29    219     77   S14/ENG
S16     508    156     71     53     13    169     46   S15/1990:2003

(All = total across the six files; a dash means the log gives no per-file count.)

The query log above shows that Files 2 and 202 have the two highest posting numbers in Sets 2, 9, and 11.
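The comparison behind this observation can be checked with a short script. What follows is only a minimal Python sketch and was not part of the original DialogWeb session: the names POSTINGS, TOTALS, and top_files are illustrative, and the numbers are copied by hand from the query log above for Sets 2, 9, and 11.

```python
# Postings per Dialog file for the three sets discussed in the text,
# copied from the query log above (keys are Dialog file numbers).
POSTINGS = {
    "S2": {202: 95, 144: 83, 8: 45, 6: 39, 2: 120, 1: 41},
    "S9": {202: 109, 144: 36, 8: 60, 6: 10, 2: 125, 1: 36},
    "S11": {202: 204, 144: 117, 8: 105, 6: 49, 2: 245, 1: 77},
}

# Totals across all six files, as reported in the same log.
TOTALS = {"S2": 423, "S9": 376, "S11": 797}


def top_files(set_name, n=2):
    """Return the n file numbers with the most postings for one set."""
    counts = POSTINGS[set_name]
    return sorted(counts, key=counts.get, reverse=True)[:n]


for name in POSTINGS:
    # Sanity check: the per-file postings add up to the reported total.
    assert sum(POSTINGS[name].values()) == TOTALS[name]
    print(name, top_files(name))
```

For each of the three sets the script lists File 2 first and File 202 second, which matches the ranking read off the log by eye.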