Mining Facebook: Analyzing Fan Pages, Examining Friendships, and More
Total Page:16
File Type:pdf, Size:1020Kb
CHAPTER 2 Mining Facebook: Analyzing Fan Pages, Examining Friendships, and More In this chapter, we’ll tap into the Facebook platform through its (Social) Graph API and explore some of the vast possibilities. Facebook is arguably the heart of the social web and is somewhat of an all-in-one wonder, given that more than half of its 1 billion users1 are active each day updating statuses, posting photos, exchanging messages, chatting in real time, checking in to physical locales, playing games, shopping, and just about anything else you can imagine. From a social web mining standpoint, the wealth of data that Facebook stores about individuals, groups, and products is quite exciting, because Facebook’s clean API presents incredible opportunities to synthesize it into information (the world’s most precious commodity), and glean valuable insights. On the other hand, this great power commands great responsibility, and Facebook has instrumented the most sophisticated set of online privacy controls that the world has ever seen in order to help protect its users from exploit. It’s worth noting that although Facebook is self-proclaimed as a social graph, it’s been steadily transforming into a valuable interest graph as well, because it maintains rela‐ tionships between people and the things that they’re interested in through its Facebook pages and “Likes” feature. In this regard, you may increasingly hear it framed as a “social interest graph.” For the most part, you can make a case that interest graphs implicitly exist and can be bootstrapped from most sources of social data. As an example, Chap‐ ter 1 made the case that Twitter is actually an interest graph because of its asymmetric “following” (or, to say it another way, “interested in”) relationships between people and other people, places, or things. The notion of Facebook as an interest graph will come 1. Internet usage statistics show that the world’s population is estimated to be approximately 7 billion, with the estimated number of Internet users being almost 2.5 billion. 45 www.it-ebooks.info up throughout this chapter, and we’ll return to the idea of explicitly bootstrapping an interest graph from social data in Chapter 7. The remainder of this chapter assumes that you have an active Facebook account, which is required to gain access to the Facebook APIs. Although there are plenty of fun things that you can do to analyze public information, you may find that it’s a little bit more fun if you are analyzing data from your own social network, so it’s worth adding a few friends if you are new to Facebook. Always get the latest bug-fixed source code for this chapter (and every other chapter) online at http://bit.ly/MiningTheSocialWeb2E. Be sure to also take advantage of this book’s virtual machine experience, as described in Appendix A, to maximize your enjoyment of the sample code. 2.1. Overview As this is the second chapter in the book, the concepts we’ll cover are a bit more complex than those in Chapter 1, but should still be highly accessible for a very broad audience. In this chapter you’ll learn about: • Facebook’s Social Graph API and how to make API requests • The Open Graph protocol and its relationship to Facebook’s Social Graph • Analyzing likes from Facebook pages and from Facebook friends • Techniques such as clique analysis for analyzing social graphs • Visualizing social graphs with the D3 JavaScript library 2.2. Exploring Facebook’s Social Graph API The Facebook platform is a mature, robust, and well-documented gateway into what may be the most comprehensive and well-organized information store ever amassed, both in terms of breadth and depth. It’s broad in that its user base represents about one- seventh of the entire living population, and it’s deep with respect to the amount of information that’s known about any one of its particular users. Whereas Twitter features an asymmetric friendship model that is open and predicated on following other users without any particular consent, Facebook’s friendship model is symmetric and requires a mutual agreement between users to gain visibility into one another’s interactions and activities. 46 | Chapter 2: Mining Facebook: Analyzing Fan Pages, Examining Friendships, and More www.it-ebooks.info Furthermore, whereas virtually all interactions except for private messages between users on Twitter are public statuses, Facebook allows for much more finely grained privacy controls in which friendships can be organized and maintained as lists with varying levels of visibility available to a friend on any particular activity. For example, you might choose to share a link or photo only with a particular list of friends as opposed to your entire social network. As a social web miner, the only way that you can access a Facebook user’s account data is by registering an application and using that application as the entry point into the Facebook developer platform. Moreover, the only data that’s available to an application is whatever the user has explicitly authorized it to access. For example, as a developer writing a Facebook application, you’ll be the user who’s logging into the application, and the application will be able to access any data that you explicitly authorize it to access. In that regard, as a Facebook user you might think of an application a bit like any of your Facebook friends, in that you’re ultimately in control of what the application can access and you can revoke access at any time. The Facebook Platform Policies docu‐ ment is a must-read for any Facebook developer, as it provides the comprehensive set of rights and responsibilities for all Facebook users as well as the spirit and letter of the law for Facebook developers. If you haven’t already, it’s worth taking a moment to review Facebook’s developer policies and to bookmark the Facebook Developers home page, since it is the definitive entry point into the Facebook platform and its documentation. Keep in mind that as a developer mining your own account, you may not have a problem allowing your own application to access all of your account data. Beware, however, of aspiring to develop a successful hosted application that requests access to more than the minimum amount of data necessary to complete, because it’s quite likely that a user will not recognize or trust your application to command that level of privilege (and rightly so). Although we’ll programmatically access the Facebook platform later in this chapter, Facebook provides a number of useful developer tools, including a Graph API Explorer app that we’ll be using for initial familiarization with the Social Graph. The app provides an intuitive and turnkey way of querying the Social Graph, and once you’re comfortable with how the Social Graph works, translating queries into Python code for automation and further processing comes quite naturally. Although we’ll work through the Graph API as part of the discussion, you may benefit from an initial review of the well-written “Getting Started: The Graph API” document as a comprehensive preamble. 2.2. Exploring Facebook’s Social Graph API | 47 www.it-ebooks.info In addition to the Graph API, you may also encounter the Facebook Query Language (FQL) and what is now referred to as the Legacy REST API. Be advised that although FQL is still very much alive and will be briefly introduced in this chapter, the Legacy REST API is in depreca‐ tion and will be phased out soon. Do not use it for any development that you do with Facebook. 2.2.1. Understanding the Social Graph API As its name implies, Facebook’s Social Graph is a massive graph data structure repre‐ senting social interactions and consisting of nodes and connections between the nodes. The Graph API provides the primary means of interacting with the Social Graph, and the best way to get acquainted with the Graph API is to spend a few minutes tinkering around with the Graph API Explorer. It is important to note that the Graph API Explorer is not a particularly special tool of any kind. Aside from being able to prepopulate and debug your access token, it is an ordinary Facebook app that uses the same developer APIs that any other developer application would use. In fact, the Graph API Explorer is handy when you have a par‐ ticular OAuth token that’s associated with a specific set of authorizations for an appli‐ cation that you are developing and you want to run some queries as part of an explor‐ atory development effort or debug cycle. We’ll revisit this general idea shortly as we programmatically access the Graph API. Figures 2-1 through 2-4 illustrate a progressive series of Graph API queries that result from clicking on the plus (+) symbol and adding connections and fields. There are a few items to note about this particular query: Access token The access token that appears in the application is an OAuth token that is provided as a courtesy for the logged-in user; it is the same OAuth token that your application would need to access the data in question. We’ll opt to use this access token throughout this chapter, but you can consult Appendix B for a brief overview of OAuth, including details on implementing an OAuth flow for Facebook in order to retrieve an access token. As mentioned in Chapter 1, if this is your first encounter with OAuth, it’s probably sufficient at this point to know that the protocol is a social web standard that stands for Open Authorization.