Augmenting Text Mining Approaches with Social Network Analysis

Full citation: Lee, C.W., Licorish, S.A., Savarimuthu, B.T.R., & MacDonell, S.G. (2016) Augmenting text mining approaches with social network analysis to understand the complex relationships among users’ requests: a case study of the Android operating system, in Proceedings of the 49th Hawaii International Conference on System Sciences (HICSS2016). Koloa HI, USA, AIS, pp.1144-1153. doi:10.1109/HICSS.2016.145 Augmenting Text Mining Approaches with Social Network Analysis to Understand the Complex Relationships among Users’ Requests: a Case Study of the Android Operating System Chan Won Lee, Sherlock A. Licorish, Bastin Tony Roy Savarimuthu and Stephen G. MacDonell Department of Information Science, University of Otago PO Box 56, Dunedin 9016, New Zealand [email protected], [email protected], [email protected], [email protected] needs as the main basis for system development. A Abstract subsequent study by Norman and Draper also supports the Text mining approaches are being used increasingly for critical role that users play in developing and maintaining business analytics. In particular, such approaches are now high quality systems [2]. Later approaches to delivering central to understanding users’ feedback regarding applications have thus looked to bridge the gaps between systems delivered via online application distribution developers and users [3]. Such a strategy is also relevant to platforms such as Google Play. In such settings, large contemporary methods of systems development and volumes of reviews of potentially numerous apps and deployment, which are driven by growing interest in virtual 1 systems means that it is infeasible to use manual communities (e.g., Python development community ) and mechanisms to extract insights and knowledge that could online application distribution platforms (OADPs, e.g., 2 3 inform product improvement. In this context of identifying Apple’s AppStore and Google Play ). The burgeoning software system improvement options, text mining adoption of mobile devices means that users are now 4 techniques are used to reveal the features that are integrally involved in the development of apps . In many mentioned most often as being in need of correction (e.g., cases users themselves are the architects of apps [4]. In GPS), and topics that are associated with features addition, given the ubiquitous nature of virtual perceived as being defective (e.g., inaccuracy of GPS). communities and social media, multiple communication Other approaches may supplement such techniques to channels that facilitate rapid communication between provide further insights for online communities and systems developers and target users are now available [5]. solution providers. In this work we augment text mining These channels enable (and even empower) users to approaches with social network analysis to demonstrate describe their experiences while using apps, express the utility of using multiple techniques. Our outcomes satisfaction or dissatisfaction after using apps, rate apps and suggest that text mining approaches may indeed be request new app features. supplemented with other methods to deliver a broader Thus, app owners are able to extract potentially meaningful range of insights. insights from users’ reviews with a view to improving the features they deliver to end-users. This process may not be 1. INTRODUCTION straightforward, however, as large volumes of reviews can make this activity very challenging [6]. Thus, there is Understanding the needs and expectations of users has long growing research effort aimed at aiding this process, and played an important role in maximizing the usefulness and overcoming the need for extensive human involvement in usability of delivered software systems. Gaining such an the knowledge extraction process [7]. In particular, understanding therefore features prominently in researchers have demonstrated that there is utility in contemporary business analytics, just as it has for decades. various text mining approaches (e.g., those that reveal In their seminal work on usability [1], for instance, Gould which features are said to be most in need of correction [8]) and Lewis suggested that the number one determinant for and modelling techniques (e.g., those that reveal the topics producing successful systems is the consideration of users’ raised in relation to particular features [9]). Such 1 https://www.python.org/community/ 4 We use the term ‘app’ to describe a program or software product that 2 itunes.apple.com/en/genre/ios/id36?mt=8 is frequently delivered (especially to mobile devices) via an online 3 play.google.com/store/apps?hl=en platform. 1 approaches may be complemented by other techniques, to game was associated with update and levels, where the reveal not only what is most problematic or topical, but also authors inferred that users were interested in games having how issues might be nested, or how issues concerning more levels) [8]. While Iacob and Harrison [8] used a certain features may influence others. For instance, linguistic rule and LDA, others have used part-of-speech coupling between software modules A and B may mean tagging and sentiment analysis to achieve a similar goal, that issues in module A may also affect module B. where nouns were extracted to study mentioned features Revealing such relationships through the study of users’ and users’ sentiments associated with such features [9]. complaints and requests could inform maintenance One extremely large and successful community that has strategies. We provide a demonstration of the use of received considerable text mining attention is that of the complementary techniques in this work, investigating the Android OS. Studies have examined where bugs are most Android OS5 as a case study. prevalent in the Android architecture [14], the quality of In the next section (Section 2) we provide our study various types of bug reports [15], top user concerns and the background, and we develop our hypotheses to address our attention they receive from the Android community [6], and objectives in Section 3. We then describe our study setting the scale and severity of Android security threats as in Section 4. In Section 5 we present our results, and in reported by its stakeholders [16]. With Android devices Section 6 we discuss our findings and outline their now leading mobile device sales [17], it is fitting that the implications. We then consider the threats to the validity of text mining community would look to understand this the work in Section 7, before providing concluding remarks platform and provide improvement advice. That said, we in Section 8. believe that the methods used for extracting knowledge from Android issue logs, and those used for understanding end-users’ reviews logged on OADPs in general, could be 2. BACKGROUND usefully supplemented by approaches from the field of Understanding the content in online reviews has become complex networks [18], and in particular, social network integral to enhancing users’ experiences, their satisfaction, analysis [19], to discover useful patterns in this and ultimately, maintaining a competitive advantage for the unstructured data. We outline two hypotheses to software provider [10]. However, gaining such operationalize such an approach in the next section. understandings is challenging due to the labor-intensive nature of the insight extraction process from very large 3. HYPOTHESES DEVELOPMENT volumes of free text. In particular, and as mentioned above, the abundance of mobile devices and web-based Previous studies have tended to employ text mining communication channels means that it can be infeasible to approaches in order to identify requests for specific feature manually extract insights from massive volumes of enhancements [8, 9, 20]. However, due to the customer feedback [11]. While structured data such as interconnected nature of software systems’ components, rankings (or likes) and pricing details are easily analyzed, dependencies between modules may mean that issues in complexity is inherent, and rampant, in unstructured data one module could be interconnected with others, thus or free text (e.g., customer reviews and opinions) [7]. creating a web of potentially complex issues that may only Acknowledgment of this complexity has led to increasing be detected if the relationships among user requests or interest in the potential utility of text mining. complaints are explored. In fact, those issues that are most frequently reported in isolation may not be as important to Text mining is the extraction of information and knowledge the community as those issues that have a wider effect from (very) large volumes of textual data using automated through dependencies and other connections. Social techniques [12]. Approaches typically provide contextual network theoretical constructs, often used in social and knowledge that may inform future project improvements, behavioral sciences research to study the relationships help individuals and organizations to avoid previous between individuals and groups, have indeed noted that mistakes, and lead to improvements in customer service. popularity (or count) as a measure may not adequately Thus, text mining has the potential to make individuals and signify the importance or influence of an entity in a group, organizations aware of previously unknown facts and whereas closeness measures may be more indicative of problem areas regarding a product [13]. these characteristics in network connectivity

Augmenting Text Mining Approaches with Social Network Analysis

A Network Approach to Define Modularity of Components In

Practical Applications of Community Detection

Social Network Analysis with Content and Graphs

Networks in Cognitive Science

Multi-Layered Network Embedding

An Ecological Approach to Software Supply Chain Risk Management

Exponential Random Graph Models An

Networktoolbox

Patterns in Syntactic Dependency Networks

Evaluating Network Analysis and Agent Based Modeling for Investigating the Stability of Commercial Air Carrier Schedules

Fast Cross-Layer Dependency Inference on Multi-Layered Networks

Mining Social Dependencies in Dynamic Interaction Networks