Analysis of Query Keywords of Sports-Related Queries Using Visualization and Clustering

Jin Zhang and Dietmar Wolfram
School of Information Studies, University of Wisconsin–Milwaukee, Milwaukee, WI 53201. E-mail: {jzhang, dwolfram}@uwm.edu

Peiling Wang
School of Information Sciences, College of Communication and Information, University of Tennessee at Knoxville, Knoxville, TN 37996–0341. E-mail: [email protected]

The authors investigated 11 sports-related query keywords extracted from a public search engine query log to better understand sports-related information seeking on the Internet. After the query log contents were cleaned and query data were parsed, popular sports-related keywords were identified, along with frequently co-occurring query terms associated with the identified keywords. Relationships among each sports-related focus keyword and its related keywords were characterized and grouped using multidimensional scaling (MDS) in combination with traditional hierarchical clustering methods. The two approaches were synthesized in a visual context by highlighting the results of the hierarchical clustering analysis in the visual MDS configuration.

the user’s request includes the client Internet Protocol (IP) address, request date/time, page requested, HTTP code, bytes served, user agent, referrer, and so on. The data are kept in a standard format in a transaction log file (Hallam-Baker & Behlendorf, 2008).

Although a transaction log comprises rich data, including browsing times and traversal paths, it is the queries directly submitted by users that have attracted the most research attention. Query data contain keywords that reflect users’ wide-ranging information needs. Query logs have been analyzed from a variety of sources with different emphases and