Business Intelligence Features

Contents

Articles

• Features
• Business intelligence
• Online analytical processing
• Analytics
• Data mining
• Process mining
• Complex event processing
• Business performance management
• Benchmarking
• Text mining
• Predictive analytics
• Prescriptive analytics
• Decision support system
• Competitive intelligence
• Data warehouse
• Data mart
• Forrester Research
• Information management
• Business process
• Business Intelligence portal
• Information extraction
• Granularity
• Mashup (web application hybrid)
• Green computing
• Social network
• Data
• Mobile business intelligence
• Composite application
• Cloud computing
• Multi-touch

References
• Article Sources and Contributors
• Image Sources, Licenses and Contributors
• Article Licenses

Features

Business intelligence

Business intelligence (BI) is defined as the ability for an organization to take all its capabilities and convert them into knowledge. This produces large amounts of information that can lead to the development of new opportunities. Identifying these opportunities, and implementing an effective strategy, can provide a competitive market advantage and long-term stability within the organization's industry.[1]

BI technologies provide historical, current and predictive views of business operations. Common functions of business intelligence technologies are reporting, online analytical processing, analytics, data mining, process mining, complex event processing, business performance management, benchmarking, text mining, predictive analytics and prescriptive analytics. Business intelligence aims to support better business decision-making. Thus a BI system can be called a decision support system (DSS).[2]

Though the term business intelligence is sometimes used as a synonym for competitive intelligence (because they both support decision making), BI uses technologies, processes, and applications to analyze mostly internal, structured data and business processes, while competitive intelligence gathers, analyzes and disseminates information with a topical focus on company competitors. If understood broadly, business intelligence can include the subset of competitive intelligence.[3]

History

In a 1958 article, IBM researcher Hans Peter Luhn used the term business intelligence. He defined intelligence as: "the ability to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal."[4]

Business intelligence as it is understood today is said to have evolved from the decision support systems (DSS) which began in the 1960s and developed throughout the mid-1980s. DSS originated in the computer-aided models created to assist with decision making and planning. From DSS, data warehouses, executive information systems, OLAP and business intelligence came into focus beginning in the late 1980s.

In 1989, Howard Dresner (later a Gartner Group analyst) proposed "business intelligence" as an umbrella term to describe "concepts and methods to improve business decision making by using fact-based support systems."[2] It was not until the late 1990s that this usage was widespread.[5]

Business intelligence and data warehousing

Often BI applications use data gathered from a data warehouse or a data mart. However, not all data warehouses are used for business intelligence, nor do all business intelligence applications require a data warehouse. In order to distinguish between the concepts of business intelligence and data warehouses, Forrester Research often defines business intelligence in one of two ways.

Using a broad definition: "Business Intelligence is a set of methodologies, processes, architectures, and technologies that transform raw data into meaningful and useful information used to enable more effective strategic, tactical, and operational insights and decision-making."[6] When using this definition, business intelligence also includes technologies such as data integration, data quality, data warehousing, master data management, text and content analytics, and many others that the market sometimes lumps into the Information Management segment. Therefore, Forrester refers to data preparation and data usage as two separate, but closely linked, segments of the business intelligence architectural stack.

Forrester defines the latter, narrower business intelligence market as "referring to just the top layers of the BI architectural stack such as reporting, analytics and dashboards."[7]

Business intelligence and business analytics

Thomas Davenport has argued that business intelligence should be divided into querying, reporting, OLAP, an "alerts" tool, and business analytics. In this definition, business analytics is the subset of BI based on statistics, prediction, and optimization.[8]

Applications in an enterprise

Business intelligence can be applied to the following business purposes, in order to drive business value:
1. Measurement – a program that creates a hierarchy of performance metrics (see also Metrics Reference Model) and benchmarking that informs business leaders about progress towards business goals (business process management).
2. Analytics – a program that builds quantitative processes for a business to arrive at optimal decisions and to perform business knowledge discovery. Frequently involves data mining, process mining, statistical analysis, predictive analytics, predictive modeling, business process modeling, complex event processing and prescriptive analytics.
3. Reporting/enterprise reporting – a program that builds infrastructure for strategic reporting to serve the strategic management of a business, not operational reporting. Frequently involves executive information systems and OLAP.
4. Collaboration/collaboration platform – a program that gets different areas (both inside and outside the business) to work together through data sharing and electronic data interchange.
5. Knowledge management – a program to make the company data driven through strategies and practices to identify, create, represent, distribute, and enable adoption of insights and experiences that are true business knowledge. Knowledge management leads to learning management and regulatory compliance.

In addition to the above, business intelligence can also provide a proactive approach, such as an alarm function that alerts end users immediately. There are many types of alerts: for example, if some business value exceeds a threshold, the figure is shown in red in the report and the business analyst is alerted, and sometimes an alert mail is sent to the user as well. This end-to-end process requires data governance, which should be handled by experts. A minimal sketch of such a threshold alert is shown below.
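
As a rough illustration of the alerting idea just described, the following Python sketch checks a few metrics against thresholds. The metric names, threshold values, and the send_alert helper are hypothetical and not taken from any particular BI product.

```python
# Minimal sketch of a threshold-based BI alert (hypothetical metrics and helper).
METRIC_THRESHOLDS = {"monthly_churn_rate": 0.05, "open_support_tickets": 500}

def check_metrics(current_values, thresholds=METRIC_THRESHOLDS):
    """Return the metrics whose current value exceeds the configured threshold."""
    return {name: value
            for name, value in current_values.items()
            if name in thresholds and value > thresholds[name]}

def send_alert(breaches):
    # Stand-in for notification; a real system would flag the figure in red on
    # the report and e-mail the responsible analyst.
    for name, value in breaches.items():
        print(f"ALERT: {name} = {value} exceeds threshold {METRIC_THRESHOLDS[name]}")

if __name__ == "__main__":
    send_alert(check_metrics({"monthly_churn_rate": 0.07, "open_support_tickets": 120}))
```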

Prioritization of business intelligence projects

It is often difficult to provide a positive business case for business intelligence initiatives, and often the projects must be prioritized through strategic initiatives. Here are some hints to increase the benefits of a BI project.
• As described by Kimball,[9] you must determine the tangible benefits, such as the eliminated cost of producing legacy reports.
• Enforce access to data for the entire organization.[10] In this way even a small benefit, such as a few minutes saved, will make a difference when it is multiplied by the number of employees in the entire organization.
• As described by Ross, Weill & Robertson for Enterprise Architecture,[11] consider letting the BI project be driven by other business initiatives with excellent business cases. To support this approach, the organization must have enterprise architects who are able to detect suitable business projects.

Success factors of implementation

Before implementing a BI solution, it is worth taking several factors into consideration. According to Kimball et al., these are the three critical areas that you need to assess within your organization before getting ready to do a BI project:[12]
1. The level of commitment and sponsorship of the project from senior management
2. The level of business need for creating a BI implementation
3. The amount and quality of business data available.

Business sponsorship

The commitment and sponsorship of senior management is, according to Kimball et al., the most important criterion for assessment.[13] This is because strong management backing helps overcome shortcomings elsewhere in the project. However, as Kimball et al. state: “even the most elegantly designed DW/BI system cannot overcome a lack of business [management] sponsorship”.[14]

It is important that management personnel who participate in the project have a vision and an idea of the benefits and drawbacks of implementing a BI system. The best business sponsor should have organizational clout and should be well connected within the organization. It is ideal that the business sponsor is demanding but also able to be realistic and supportive if the implementation runs into delays or drawbacks. The management sponsor also needs to be able to assume accountability and to take responsibility for failures and setbacks on the project.

Support from multiple members of management ensures that the project will not fail if one person leaves the steering group. However, having many managers work together on the project can also mean that several different interests attempt to pull the project in different directions, for example if different departments want to put more emphasis on their own usage. This issue can be countered by an early and specific analysis of the business areas that will benefit the most from the implementation. All stakeholders in the project should participate in this analysis in order for them to feel ownership of the project and to find common ground.

Another management problem that should be addressed before the start of implementation is an overly aggressive business sponsor: a management individual who gets carried away by the possibilities of using BI and starts wanting the DW or BI implementation to include several different sets of data that were not included in the original planning phase. Since implementations of extra data will most likely add many months to the original plan, it is a good idea to make sure that the person from management is aware of the consequences of such additions.

Business needs

Because of the close relationship with senior management, another critical thing that must be assessed before the project begins is whether or not there is a business need and whether there is a clear business benefit to doing the implementation.[15] The needs and benefits of the implementation are sometimes driven by competition and the need to gain an advantage in the market. Another reason for a business-driven approach to implementation of BI is the acquisition of other organizations that enlarge the original organization: it can sometimes be beneficial to implement DW or BI in order to create more oversight.

Amount and quality of available data

Without good data, it does not really matter how good the management sponsorship or business-driven motivation is. Without the proper data, or with too little quality data, any BI implementation will fail. Before implementation it is a good idea to do data profiling; this analysis will be able to describe the “content, consistency and structure [..]”[15] of the data. This should be done as early as possible in the process and, if the analysis shows that your data is lacking, it is a good idea to put the project on the shelf temporarily while the IT department figures out how to do proper data collection.

When planning for business data and business intelligence requirements, it is always advisable to consider the specific scenarios that apply to a particular organization, and then select the business intelligence features best suited to those scenarios. More often than not, scenarios revolve around distinct business processes, each of which is built on one or more data sources. These sources are used by features that present that data as information to knowledge workers, who subsequently act on that information. The business needs of the organization for each business process adopted correspond to the essential steps of business intelligence. These essential steps include, but are not limited to:
1. Go through business data sources in order to collect needed data
2. Convert business data to information and present it appropriately
3. Query and analyze the data
4. Act on the collected data

User aspect

Some considerations must be made in order to successfully integrate the usage of business intelligence systems in a company. Ultimately the BI system must be accepted and utilized by the users in order for it to add value to the organization.[16][17] If the usability of the system is poor, the users may become frustrated and spend a considerable amount of time figuring out how to use the system, or may not be able to really use the system. If the system does not add value to the users' mission, they simply will not use it.[17]

In order to increase user acceptance of a BI system, it may be advisable to consult the business users at an early stage of the DW/BI lifecycle, for example at the requirements gathering phase.[16] This can provide insight into the business process and what the users need from the BI system. There are several methods for gathering this information, such as questionnaires and interview sessions. When gathering requirements from the business users, the local IT department should also be consulted in order to determine to which degree it is possible to fulfill the business's needs based on the available data.[16] Taking a user-centered approach throughout the design and development stage may further increase the chance of rapid user adoption of the BI system.[17]

Besides focusing on the user experience offered by the BI applications, adding an element of competition may also motivate users to utilize the system. Kimball[16] suggests implementing a function on the Business Intelligence portal website where reports on system usage can be found. By doing so, managers can see how well their departments are doing and compare themselves to others, and this may spur them to encourage their staff to utilize the BI system even more. In a 2007 article, H. J. Watson gives an example of how the competitive element can act as an incentive.[18] Watson describes how a large call centre implemented performance dashboards for all the call agents, with monthly incentive bonuses tied to the performance metrics. Furthermore, the agents could see how their own performance compared to the other team members. The implementation of this type of performance measurement and competition significantly improved the performance of the agents.

Other elements that may increase the success of BI include involving senior management to make BI a part of the organizational culture, and providing the users with the necessary tools, training and support.[18] By offering user training, more people may actually use the BI application.[16] Providing user support is necessary in order to maintain the BI system and assist users who run into problems.[17] User support can be incorporated in many ways, for example by creating a website. The website should contain useful content and tools for finding the necessary information. Furthermore, helpdesk support can be used. The helpdesk can be manned by, for example, power users or the DW/BI project team.[16]

Marketplace

There are a number of business intelligence vendors, often categorized into the remaining independent "pure-play" vendors and the consolidated "megavendors" which have entered the market through a recent trend of acquisitions in the BI industry.[19] Some companies adopting BI decide to pick and choose from different product offerings (best-of-breed) rather than purchase one comprehensive integrated solution (full-service).[20]

Industry-specific

Specific considerations for business intelligence systems have to be taken in some sectors, such as governmental banking regulations. The information collected by banking institutions and analyzed with BI software must be protected from some groups or individuals, while being fully available to other groups or individuals. Therefore, BI solutions must be sensitive to those needs and be flexible enough to adapt to new regulations and changes to existing law.

Semi-structured or unstructured data

Businesses create a huge amount of valuable information in the form of e-mails, memos, notes from call-centers, news, user groups, chats, reports, web pages, presentations, image and video files, and marketing material. According to Merrill Lynch, more than 85% of all business information exists in these forms. These information types are called either semi-structured or unstructured data. However, organizations often only use these documents once.[21]

The management of semi-structured data is recognized as a major unsolved problem in the information technology industry.[22] According to projections from Gartner (2003), white collar workers will spend anywhere from 30 to 40 percent of their time searching, finding and assessing unstructured data. BI uses both structured and unstructured data, but the former is easy to search, while the latter contains a large quantity of the information needed for analysis and decision making.[22][23] Because of the difficulty of properly searching, finding and assessing unstructured or semi-structured data, organizations may not draw upon these vast reservoirs of information, which could influence a particular decision, task or project. This can ultimately lead to poorly informed decision making.[21]

Therefore, when designing a business intelligence/DW solution, the specific problems associated with semi-structured and unstructured data must be accommodated as well as those for structured data.[23]

Unstructured data vs. semi-structured data

Unstructured and semi-structured data have different meanings depending on their context. In the context of relational database systems, unstructured data refers to data that cannot be stored in columns and rows. It must be stored in a BLOB (binary large object), a catch-all data type available in most relational database management systems. But many of these data types, like e-mails, word processing text files, PPTs, image files, and video files, conform to a standard that offers the possibility of metadata. Metadata can include information such as author and time of creation, and this can be stored in a relational database. Therefore, it may be more accurate to talk about this as semi-structured documents or data,[22] but no specific consensus seems to have been reached.

Problems with semi-structured or unstructured data

There are several challenges to developing BI with semi-structured data. According to Inmon & Nesavich,[24] some of them are:
1. Physically accessing unstructured textual data – unstructured data is stored in a huge variety of formats.
2. Terminology – among researchers and analysts, there is a need to develop a standardized terminology.
3. Volume of data – as stated earlier, up to 85% of all data exists as semi-structured data. Couple that with the need for word-to-word and semantic analysis.
4. Searchability of unstructured textual data – a simple search on some data, e.g. apple, results in links where there is a reference to that precise search term. Inmon & Nesavich (2008)[24] give an example: “a search is made on the term felony. In a simple search, the term felony is used, and everywhere there is a reference to felony, a hit to an unstructured document is made. But a simple search is crude. It does not find references to crime, arson, murder, embezzlement, vehicular homicide, and such, even though these crimes are types of felonies.”

The use of metadata

To solve problems with searchability and assessment of data, it is necessary to know something about the content. This can be done by adding context through the use of metadata.[21] Many systems already capture some metadata (e.g. filename, author, size, etc.), but more useful would be metadata about the actual content – e.g. summaries, topics, people or companies mentioned. Two technologies designed for generating metadata about content are automatic categorization and information extraction.

Future

A 2009 Gartner paper predicted[25] these developments in the business intelligence market:
• Because of lack of information, processes, and tools, through 2012, more than 35 percent of the top 5,000 global companies will regularly fail to make insightful decisions about significant changes in their business and markets.
• By 2012, business units will control at least 40 percent of the total budget for business intelligence.
• By 2012, one-third of analytic applications applied to business processes will be delivered through coarse-grained application mashups.

A 2009 Information Management special report predicted the top BI trends: "green computing, social networking, data visualization, mobile BI, predictive analytics, composite applications, cloud computing and multitouch."[26] Other business intelligence trends include the following:[27]
• Third-party SOA-BI products increasingly address ETL issues of volume and throughput.
• Cloud computing and Software-as-a-Service (SaaS) are ubiquitous.
• Companies embrace in-memory processing, 64-bit processing, and pre-packaged analytic BI applications.
• Operational applications have callable BI components, with improvements in response time, scaling, and concurrency.
• Near or real time BI analytics is a baseline expectation.
• Open source BI software replaces vendor offerings.

Other lines of research include the combined study of business intelligence and uncertain data.[28][29] In this context, the data used is not assumed to be precise, accurate and complete. Instead, data is considered uncertain and therefore this uncertainty is propagated to the results produced by BI.

According to a study by the Aberdeen Group, there has been increasing interest in Software-as-a-Service (SaaS) business intelligence over the past years, with twice as many organizations using this deployment approach as one year ago – 15% in 2009 compared to 7% in 2008. An article by InfoWorld's Chris Kanaracus points out similar growth data from research firm IDC, which predicts the SaaS BI market will grow 22 percent each year through 2013 thanks to increased product sophistication, strained IT budgets, and other factors.[30]

References

[1] Rud, Olivia (2009). Business Intelligence Success Factors: Tools for Aligning Your Business in the Global Economy. Hoboken, N.J.: Wiley & Sons. ISBN 978-0-470-39240-9.
[2] D. J. Power (10 March 2007). "A Brief History of Decision Support Systems, version 4.0" (http://dssresources.com/history/dsshistory.html). DSSResources.COM. Retrieved 10 July 2008.
[3] Kobielus, James (30 April 2010). "What's Not BI? Oh, Don't Get Me Started....Oops Too Late...Here Goes...." (http://blogs.forrester.com/james_kobielus/10-04-30-what's_not_bi_oh_don't_get_me_startedoops_too_latehere_goes). "“Business” intelligence is a non-domain-specific catchall for all the types of analytic data that can be delivered to users in reports, dashboards, and the like. When you specify the subject domain for this intelligence, then you can refer to “competitive intelligence,” “market intelligence,” “social intelligence,” “financial intelligence,” “HR intelligence,” “supply chain intelligence,” and the like."
[4] H. P. Luhn (1958). "A Business Intelligence System" (http://www.research.ibm.com/journal/rd/024/ibmrd0204H.pdf). IBM Journal 2 (4): 314. doi:10.1147/rd.24.0314.
[5] Power, D. J. "A Brief History of Decision Support Systems" (http://dssresources.com/history/dsshistory.html). Retrieved 1 November 2010.
[6] Evelson, Boris (21 November 2008). "Topic Overview: Business Intelligence" (http://www.forrester.com/rb/Research/topic_overview_business_intelligence/q/id/39218/t/2).
[7] Evelson, Boris (29 April 2010). "Want to know what Forrester's lead data analysts are thinking about BI and the data domain?" (http://blogs.forrester.com/boris_evelson/10-04-29-want_know_what_forresters_lead_data_analysts_are_thinking_about_bi_and_data_domain).
[8] Henschen, Doug (4 January 2010). "Analytics at Work: Q&A with Tom Davenport" (http://www.informationweek.com/news/software/bi/222200096) (interview).
[9] Kimball et al., 2008: 29.
[10] "Are You Ready for the New Business Intelligence?" (http://content.dell.com/us/en/enterprise/d/large-business/ready-business-intelligence.aspx). Dell.com. Retrieved 19 June 2012.
[11] Jeanne W. Ross, Peter Weill, David C. Robertson (2006). Enterprise Architecture As Strategy, p. 117. ISBN 1-59139-839-8.
[12] Kimball et al., 2008: 298.
[13] Kimball et al., 2008: 16.
[14] Kimball et al., 2008: 18.
[15] Kimball et al., 2008: 17.
[16] Kimball.
[17] Swain Scheps (2008). Business Intelligence For Dummies. ISBN 978-0-470-12726-0.
[18] Watson, Hugh J.; Wixom, Barbara H. (2007). "The Current State of Business Intelligence". Computer 40 (9): 96. doi:10.1109/MC.2007.331.
[19] Pendse, Nigel (7 March 2008). "Consolidations in the BI industry" (http://www.bi-verdict.com/fileadmin/FreeAnalyses/consolidations.htm). The OLAP Report.
[20] Imhoff, Claudia (4 April 2006). "Three Trends in Business Intelligence Technology" (http://www.b-eye-network.com/view/2608).
[21] Rao, Ramana (2003). "From unstructured data to actionable intelligence" (http://www.ramanarao.com/papers/rao-itpro-2003-11.pdf). IT Professional 5 (6): 29. doi:10.1109/MITP.2003.1254966.
[22] Blumberg, R. & S. Atre (2003). "The Problem with Unstructured Data" (http://soquelgroup.com/Articles/dmreview_0203_problem.pdf). DM Review: 42–46.
[23] Negash, S. (2004). "Business Intelligence" (http://site.xavier.edu/sena/info600/businessintelligence.pdf). Communications of the Association of Information Systems 13: 177–195.
[24] Inmon, B. & A. Nesavich, "Unstructured Textual Data in the Organization", from Managing Unstructured Data in the Organization, Prentice Hall, 2008, pp. 1–13.
[25] "Gartner Reveals Five Business Intelligence Predictions for 2009 and Beyond" (http://www.gartner.com/it/page.jsp?id=856714). gartner.com. 15 January 2009.
[26] Campbell, Don (23 June 2009). "10 Red Hot BI Trends" (http://www.information-management.com/specialreports/2009_148/business_intelligence_data_vizualization_social_networking_analytics-10015628-1.html). Information Management.
[27] Wik, Philip (11 August 2011). "Service-Oriented Architecture and Business Intelligence" (http://www.servicetechmag.com/I53/0811-2). Information Management.
[28] Rodriguez, Carlos; Daniel, Florian; Casati, Fabio; Cappiello, Cinzia (2010). "Toward Uncertain Business Intelligence: The Case of Key Indicators". IEEE Internet Computing 14 (4): 32. doi:10.1109/MIC.2010.59.
[29] Rodriguez, C.; Daniel, F.; Casati, F.; Cappiello, C. (2009). "Computing Uncertain Key Indicators from Uncertain Data" (http://mitiq.mit.edu/ICIQ/Documents/IQ Conference 2009/Papers/3-C.pdf). ICIQ'09: 106–120.
[30] "SaaS BI growth will soar in 2010 | Cloud Computing" (http://infoworld.com/d/cloud-computing/saas-bi-growth-will-soar-in-2010-511). InfoWorld (1 February 2010). Retrieved 17 January 2012.

Bibliography

• Ralph Kimball et al. The Data Warehouse Lifecycle Toolkit (2nd ed.). Wiley. ISBN 0-470-47957-4.

External links

• Chaudhuri, Surajit; Dayal, Umeshwar; Narasayya, Vivek (August 2011). "An Overview Of Business Intelligence Technology" (http://cacm.acm.org/magazines/2011/8/114953-an-overview-of-business-intelligence-technology/fulltext). Communications of the ACM 54 (8): 88–98. doi:10.1145/1978542.1978562. Retrieved 26 October 2011.

Online analytical processing

In computing, online analytical processing, or OLAP (/ˈoʊlæp/), is an approach to swiftly answering multi-dimensional analytical (MDA) queries.[1] OLAP is part of the broader category of business intelligence, which also encompasses relational reporting and data mining.[2] Typical applications of OLAP include business reporting for sales, marketing, management reporting, business process management (BPM),[3] budgeting and forecasting, financial reporting and similar areas, with new applications emerging, such as agriculture.[4] The term OLAP was created as a slight modification of the traditional database term OLTP (Online Transaction Processing).[5]

OLAP tools enable users to interactively analyze multidimensional data from multiple perspectives. OLAP consists of three basic analytical operations: consolidation (roll-up), drill-down, and slicing and dicing.[6] Consolidation involves the aggregation of data that can be accumulated and computed in one or more dimensions. For example, all sales offices are rolled up to the sales department or sales division to anticipate sales trends. In contrast, drill-down is a technique that allows users to navigate through the details. For instance, users can view the sales by individual products that make up a region's sales. Slicing and dicing is a feature whereby users can take out (slice) a specific set of data from the OLAP cube and view (dice) the slices from different viewpoints. A small illustration of these operations on a toy data set is given below.

Databases configured for OLAP use a multidimensional data model, allowing for complex analytical and ad-hoc queries with a rapid execution time.[7] They borrow aspects of navigational databases, hierarchical databases and relational databases.
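
The sketch below imitates roll-up, drill-down, and slice-and-dice on a tiny in-memory data set using pandas. It is purely illustrative: the column names and figures are made up, and this is not how any particular OLAP engine is implemented.

```python
# Illustrative roll-up / drill-down / slice-and-dice on a tiny sales "cube" (made-up data).
import pandas as pd

sales = pd.DataFrame({
    "region":  ["East", "East", "West", "West"],
    "product": ["Pens", "Paper", "Pens", "Paper"],
    "month":   ["Jan", "Jan", "Jan", "Feb"],
    "amount":  [120.0, 80.0, 150.0, 60.0],
})

# Roll-up (consolidation): aggregate individual products up to the region level.
rollup = sales.groupby("region")["amount"].sum()

# Drill-down: navigate back into the detail, e.g. sales by region *and* product.
drilldown = sales.groupby(["region", "product"])["amount"].sum()

# Slice: fix one dimension (month == "Jan"); dice: view the slice from other angles.
jan_slice = sales[sales["month"] == "Jan"]
dice = jan_slice.pivot_table(values="amount", index="region",
                             columns="product", aggfunc="sum")

print(rollup, drilldown, dice, sep="\n\n")
```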

Overview of OLAP systems

The core of any OLAP system is an OLAP cube (also called a 'multidimensional cube' or a hypercube). It consists of numeric facts called measures which are categorized by dimensions. The measures are placed at the intersections of the hypercube, which is spanned by the dimensions as a vector space. The usual interface to manipulate an OLAP cube is a matrix interface, like pivot tables in a spreadsheet program, which performs projection operations along the dimensions, such as aggregation or averaging.

The cube metadata is typically created from a star schema or snowflake schema of tables in a relational database. Measures are derived from the records in the fact table and dimensions are derived from the dimension tables.

Each measure can be thought of as having a set of labels, or metadata, associated with it. A dimension is what describes these labels; it provides information about the measure. A simple example would be a cube that contains a store's sales as a measure, and Date/Time as a dimension. Each sale has a Date/Time label that describes more about that sale. Any number of dimensions can be added to the structure, such as Store, Cashier, or Customer, by adding a foreign key column to the fact table. This allows an analyst to view the measures along any combination of the dimensions. For example:

Sales Fact Table
+-------------+---------+
| sale_amount | time_id |
+-------------+---------+
|     2008.10 |    1234 |
+-------------+---------+

Time Dimension
+---------+-------------------+
| time_id | timestamp         |
+---------+-------------------+
|    1234 | 20080902 12:35:43 |
+---------+-------------------+

(The time_id value 1234 in the fact table is a foreign key pointing to the corresponding row of the time dimension.)
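
The same fact/dimension layout can be reproduced in a few lines of Python; this is only a sketch of the foreign-key relationship (column names follow the example above), not a description of any specific OLAP server's storage.

```python
# Sketch of the star-schema example above: a fact table keyed to a time dimension.
import pandas as pd

sales_fact = pd.DataFrame({"sale_amount": [2008.10], "time_id": [1234]})
time_dim = pd.DataFrame({"time_id": [1234], "timestamp": ["20080902 12:35:43"]})

# Measures come from the fact table; the dimension table supplies the labels
# (here the timestamp) via the time_id foreign key.
cube_view = sales_fact.merge(time_dim, on="time_id")
print(cube_view)
```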

Multidimensional databases

Multidimensional structure is defined as “a variation of the relational model that uses multidimensional structures to organize data and express the relationships between data”.[8] The structure is broken into cubes, and the cubes are able to store and access data within the confines of each cube. “Each cell within a multidimensional structure contains aggregated data related to elements along each of its dimensions”.[9] Even when data is manipulated it remains easy to access and continues to constitute a compact database format. The data still remains interrelated. Multidimensional structure is quite popular for analytical databases that use online analytical processing (OLAP) applications (O'Brien & Marakas, 2009). Analytical databases use these structures because of their ability to deliver answers to complex business queries swiftly. Data can be viewed from different angles, which gives a broader perspective of a problem than other models.[10]

Aggregations

It has been claimed that for complex queries OLAP cubes can produce an answer in around 0.1% of the time required for the same query on OLTP relational data.[11][12] The most important mechanism in OLAP which allows it to achieve such performance is the use of aggregations. Aggregations are built from the fact table by changing the granularity on specific dimensions and aggregating up data along these dimensions. The number of possible aggregations is determined by every possible combination of dimension granularities. The combination of all possible aggregations and the base data contains the answers to every query which can be answered from the data.[13]

Because usually there are many aggregations that can be calculated, often only a predetermined number are fully calculated; the remainder are solved on demand. The problem of deciding which aggregations (views) to calculate is known as the view selection problem. View selection can be constrained by the total size of the selected set of aggregations, the time to update them from changes in the base data, or both. The objective of view selection is typically to minimize the average time to answer OLAP queries, although some studies also minimize the update time. View selection is NP-complete. Many approaches to the problem have been explored, including greedy algorithms, randomized search, genetic algorithms and the A* search algorithm.
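
As a rough illustration of why the number of possible aggregations grows with the dimensions, the snippet below enumerates every subset of three dimensions of a toy fact table and precomputes a group-by for each. This is a toy sketch only; it is not one of the view-selection algorithms mentioned above, and the table and column names are invented.

```python
# Toy enumeration of all possible aggregations (group-bys) of a small fact table.
from itertools import combinations
import pandas as pd

facts = pd.DataFrame({
    "store":   ["A", "A", "B", "B"],
    "product": ["Pens", "Paper", "Pens", "Paper"],
    "day":     ["Mon", "Tue", "Mon", "Tue"],
    "amount":  [10, 20, 30, 40],
})
dimensions = ["store", "product", "day"]

# Every subset of dimensions defines one possible aggregation (2**3 = 8 here,
# including the grand total for the empty subset).
aggregations = {}
for r in range(len(dimensions) + 1):
    for dims in combinations(dimensions, r):
        if dims:
            aggregations[dims] = facts.groupby(list(dims))["amount"].sum()
        else:
            aggregations[dims] = facts["amount"].sum()

print(f"{len(aggregations)} possible aggregations precomputed")
```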

Types

OLAP systems have been traditionally categorized using the following taxonomy.[14]

Multidimensional

MOLAP is the 'classic' form of OLAP and is sometimes referred to as just OLAP. MOLAP stores data in optimized multi-dimensional array storage, rather than in a relational database. Therefore it requires the pre-computation and storage of information in the cube - the operation known as processing.

Relational

ROLAP works directly with relational databases. The base data and the dimension tables are stored as relational tables, and new tables are created to hold the aggregated information. It depends on a specialized schema design. This methodology relies on manipulating the data stored in the relational database to give the appearance of traditional OLAP's slicing and dicing functionality. In essence, each action of slicing and dicing is equivalent to adding a "WHERE" clause in the SQL statement.
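
The correspondence between a slice or dice and a WHERE clause can be shown with a tiny SQL-generating helper. The table and column names are invented for illustration, and no particular ROLAP product works exactly this way; a production implementation would also use bound query parameters rather than string interpolation.

```python
# Sketch: translating slice/dice selections into a SQL WHERE clause (invented schema).
def rolap_query(measure, group_by, slices):
    """Build a SQL statement for an aggregate over a relational fact table."""
    where = " AND ".join(f"{col} = '{val}'" for col, val in slices.items())
    return (
        f"SELECT {', '.join(group_by)}, SUM({measure}) AS total "
        f"FROM sales_fact "
        f"WHERE {where} "
        f"GROUP BY {', '.join(group_by)};"
    )

# Slicing on month and dicing by region/product simply adds predicates:
print(rolap_query("sale_amount", ["region", "product"], {"month": "2008-01"}))
```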

Hybrid

There is no clear agreement across the industry as to what constitutes "Hybrid OLAP", except that a database will divide data between relational and specialized storage. For example, for some vendors, a HOLAP database will use relational tables to hold the larger quantities of detailed data, and use specialized storage for at least some aspects of the smaller quantities of more-aggregate or less-detailed data.

Comparison

Each type has certain benefits, although there is disagreement about the specifics of the benefits between providers.
• Some MOLAP implementations are prone to database explosion, a phenomenon causing vast amounts of storage space to be used by MOLAP databases when certain common conditions are met: a high number of dimensions, pre-calculated results and sparse multidimensional data.
• MOLAP generally delivers better performance due to specialized indexing and storage optimizations. MOLAP also needs less storage space compared to ROLAP because the specialized storage typically includes compression techniques.[15]
• ROLAP is generally more scalable.[15] However, large volume pre-processing is difficult to implement efficiently so it is frequently skipped. ROLAP query performance can therefore suffer tremendously.
• Since ROLAP relies more on the database to perform calculations, it has more limitations in the specialized functions it can use.
• HOLAP encompasses a range of solutions that attempt to mix the best of ROLAP and MOLAP. It can generally pre-process swiftly, scale well, and offer good function support.

Other types

The following acronyms are also sometimes used, although they are not as widespread as the ones above:
• WOLAP – Web-based OLAP
• DOLAP – Desktop OLAP
• RTOLAP – Real-Time OLAP

APIs and query languages

Unlike relational databases, which had SQL as the standard query language and widespread APIs such as ODBC, JDBC and OLEDB, there was no such unification in the OLAP world for a long time. The first real standard API was the OLE DB for OLAP specification from Microsoft, which appeared in 1997 and introduced the MDX query language. Several OLAP vendors - both server and client - adopted it. In 2001 Microsoft and Hyperion announced the XML for Analysis specification, which was endorsed by most of the OLAP vendors. Since this also used MDX as a query language, MDX became the de facto standard.[16] Since September 2011, LINQ can be used to query SSAS OLAP cubes from Microsoft .NET.[17]

Products

History

The first product that performed OLAP queries was Express, which was released in 1970 (and acquired by Oracle in 1995 from Information Resources).[18] However, the term did not appear until 1993, when it was coined by Edgar F. Codd, who has been described as "the father of the relational database". Codd's paper[1] resulted from a short consulting assignment which Codd undertook for former Arbor Software (later Hyperion Solutions, and in 2007 acquired by Oracle), as a sort of marketing coup. The company had released its own OLAP product, Essbase, a year earlier. As a result, Codd's "twelve laws of online analytical processing" were explicit in their reference to Essbase. There was some ensuing controversy, and when Computerworld learned that Codd was paid by Arbor, it retracted the article.

The OLAP market experienced strong growth in the late 1990s, with dozens of commercial products going into the market. In 1998, Microsoft released its first OLAP server, Microsoft Analysis Services, which drove wide adoption of OLAP technology and moved it into the mainstream.

Product comparison

See: Comparison of OLAP Servers.

Market structure

Below is a list of the top OLAP vendors in 2006, with figures in millions of US dollars.[19]

+--------------------------------+----------------+----------------------+
| Vendor                         | Global revenue | Consolidated company |
+--------------------------------+----------------+----------------------+
| Microsoft Corporation          | 1,806          | Microsoft            |
| Hyperion Solutions Corporation | 1,077          | Oracle               |
| Cognos                         | 735            | IBM                  |
| Business Objects               | 416            | SAP                  |
| MicroStrategy                  | 416            | MicroStrategy        |
| SAP AG                         | 330            | SAP                  |
| Cartesis SA                    | 210            | SAP                  |
| Applix                         | 205            | IBM                  |
| Infor                          | 199            | Infor                |
| Oracle Corporation             | 159            | Oracle               |
| Others                         | 152            | Others               |
+--------------------------------+----------------+----------------------+
| Total                          | 5,700          |                      |
+--------------------------------+----------------+----------------------+

Microsoft was the only vendor that continuously exceeded the industry's average growth during 2000–2006. Since the above data was collected, Hyperion has been acquired by Oracle, Cartesis by Business Objects, Business Objects by SAP, Applix by Cognos, and Cognos by IBM.[20]

Bibliography

• Daniel Lemire (December 2007). "Data Warehousing and OLAP - A Research-Oriented Bibliography" [21].
• Erik Thomsen (1997). OLAP Solutions: Building Multidimensional Information Systems, 2nd Edition. John Wiley & Sons. ISBN 978-0-471-14931-6.
• Ling Liu and Tamer M. Özsu (Eds.) (2009). Encyclopedia of Database Systems [22], 4100 p., 60 illus. ISBN 978-0-387-49616-0.
• O'Brien, J. A., & Marakas, G. M. (2009). Management Information Systems (9th ed.). Boston, MA: McGraw-Hill/Irwin.

References

[1] Codd, E. F.; Codd, S. B.; Salley, C. T. (1993). "Providing OLAP (On-line Analytical Processing) to User-Analysts: An IT Mandate" (http://www.fpm.com/refer/codd.html). Codd & Date, Inc. Retrieved 5 March 2008.
[2] Deepak Pareek (2007). Business Intelligence for Telecommunications (http://books.google.com/?id=M-UOE1Cp9OEC). CRC Press. 294 pp. ISBN 0-8493-8792-2. Retrieved 18 March 2008.
[3] Apostolos Benisis (2010). Business Process Management: A Data Cube To Analyze Business Process Simulation Data For Decision Making (http://www.google.com/products?q=9783639222166). VDM Verlag Dr. Müller e.K. 204 pp. ISBN 978-3-639-22216-6.
[4] Abdullah, Ahsan (2009). "Analysis of mealybug incidence on the cotton crop using ADSS-OLAP (Online Analytical Processing) tool". Computers and Electronics in Agriculture 69 (1): 59–72. doi:10.1016/j.compag.2009.07.003.
[5] "OLAP Council White Paper" (http://www.symcorp.com/downloads/OLAP_CouncilWhitePaper.pdf) (PDF). OLAP Council. 1997. Retrieved 18 March 2008.
[6] O'Brien & Marakas, 2011, pp. 402–403.
[7] Hari Mailvaganam (2007). "Introduction to OLAP - Slice, Dice and Drill!" (http://www.dwreview.com/OLAP/Introduction_OLAP.html). Data Warehousing Review. Retrieved 18 March 2008.
[8] O'Brien & Marakas, 2009, p. 177.
[9] O'Brien & Marakas, 2009, p. 178.
[10] Williams, C.; Garza, V. R.; Tucker, S.; Marcus, A. M. (24 January 1994). "Multidimensional models boost viewing options". InfoWorld 16 (4).
[11] MicroStrategy, Incorporated (1995). "The Case for Relational OLAP" (http://www.cs.bgu.ac.il/~onap052/uploads/Seminar/Relational OLAP Microstrategy.pdf) (PDF). Retrieved 20 March 2008.
[12] Surajit Chaudhuri and Umeshwar Dayal (1997). "An overview of data warehousing and OLAP technology" (http://doi.acm.org/10.1145/248603.248616). SIGMOD Record (ACM) 26 (1): 65. doi:10.1145/248603.248616. Retrieved 20 March 2008.
[13] Gray, Jim; Chaudhuri, Surajit; Layman, Andrew; Reichart; Venkatrao; Pellow; Pirahesh (1997). "Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals" (http://citeseer.ist.psu.edu/gray97data.html). J. Data Mining and Knowledge Discovery 1 (1): 29–53. Retrieved 20 March 2008.
[14] Nigel Pendse (27 June 2006). "OLAP architectures" (http://www.olapreport.com/Architectures.htm). OLAP Report. Retrieved 17 March 2008.
[15] Bach Pedersen, Torben; S. Jensen, Christian (December 2001). "Multidimensional Database Technology" (http://ieeexplore.ieee.org/iel5/2/20936/00970558.pdf) (PDF). Distributed Systems Online (IEEE): 40–46. ISSN 0018-9162.
[16] Nigel Pendse (23 August 2007). "Commentary: OLAP API wars" (http://www.olapreport.com/Comment_APIs.htm). OLAP Report. Retrieved 18 March 2008.
[17] "SSAS Entity Framework Provider for LINQ to SSAS OLAP" (http://www.agiledesignllc.com/Products).
[18] Nigel Pendse (23 August 2007). "The origins of today's OLAP products" (http://olapreport.com/origins.htm). OLAP Report. Retrieved 27 November 2007.
[19] Nigel Pendse (2006). "OLAP Market" (http://www.olapreport.com/market.htm). OLAP Report. Retrieved 17 March 2008.
[20] Nigel Pendse (7 March 2008). "Consolidations in the BI industry" (http://www.olapreport.com/consolidations.htm). Retrieved 18 March 2008.
[21] http://www.daniel-lemire.com/OLAP/
[22] http://www.springer.com/computer/database+management+&+information+retrieval/book/978-0-387-49616-0

Analytics

Analytics is the discovery and communication of meaningful patterns in data. Especially valuable in areas rich with recorded information, analytics relies on the simultaneous application of statistics, computer programming and operations research to quantify performance. Analytics often favors data visualization to communicate insight.

The most common application of analytics is to business data, in order to describe, predict and improve business performance. Specifically, arenas within analytics include enterprise decision management, retail analytics, marketing and web analytics, predictive science, credit risk analysis, and fraud analytics. Since analytics can require extensive computation (see Big Data), the algorithms and software used for analytics harness the most current methods in computer science, statistics, and mathematics.[1]

[Figure: A sample dashboard. Tools like this help businesses identify trends and make decisions.]

Analytics vs Analysis

Analytics is a two-sided coin. On one side, it uses descriptive and predictive models to gain valuable knowledge from data - data analysis. On the other, analytics uses this insight to recommend action or to guide decision making - communication. Thus, analytics is not so much concerned with individual analyses or analysis steps, but with the entire methodology. There is a pronounced tendency to use the term analytics in business settings (e.g. text analytics vs. the more generic text mining) to emphasize this broader perspective.

Examples

Marketing optimization

Marketing has evolved from a creative process into a highly data-driven process. Marketing organizations use analytics to determine the outcomes of campaigns or efforts and to guide decisions for investment and consumer targeting. Demographic studies, customer segmentation, conjoint analysis and other techniques allow marketers to use large amounts of consumer purchase, survey and panel data to understand and communicate marketing strategy.

Web analytics allows marketers to collect session-level information about interactions on a website. Those interactions provide the web analytics information systems with the information to track the referrer, search keywords, IP address, and activities of the visitor. With this information, a marketer can improve the marketing campaigns, site creative content, and information architecture.
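
As a small illustration of the session-level data just mentioned, the snippet below pulls the visitor IP address, referrer, and search keywords out of a single fabricated web-server log line in the common combined log format. Real web analytics systems do far more, and the log line and URL here are invented.

```python
# Sketch: extracting IP, referrer and search keywords from one fabricated access-log line.
from urllib.parse import urlparse, parse_qs

log_line = ('203.0.113.7 - - [18/Sep/2012:06:03:44 +0000] "GET /pricing HTTP/1.1" 200 512 '
            '"http://www.example-search.com/search?q=business+intelligence" "Mozilla/5.0"')

ip = log_line.split()[0]
referrer = log_line.split('"')[3]                 # the second quoted field in this format
keywords = parse_qs(urlparse(referrer).query).get("q", [])

print(ip, referrer, keywords)
```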

Portfolio analysis

A common application of business analytics is portfolio analysis. In this, a bank or lending agency has a collection of accounts of varying value and risk. The accounts may differ by the social status (wealthy, middle-class, poor, etc.) of the holder, the geographical location, the net value, and many other factors. The lender must balance the return on the loan with the risk of default for each loan. The question is then how to evaluate the portfolio as a whole.

The least risky loan may be to the very wealthy, but there are a very limited number of wealthy people. On the other hand, there are many poor that can be lent to, but at greater risk. Some balance must be struck that maximizes return and minimizes risk. The analytics solution may combine time series analysis with many other issues in order to make decisions on when to lend money to these different borrower segments, or decisions on the interest rate charged to members of a portfolio segment to cover any losses among members in that segment.
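
A crude way to see the return-versus-risk balance described above is to compute a risk-adjusted expected return per borrower segment. The segments, loan amounts, interest rates, and default probabilities below are invented for illustration and are not drawn from any real portfolio model.

```python
# Toy portfolio view: expected return per segment after accounting for default risk.
segments = [
    # (segment, amount lent, interest rate, probability of default)
    ("wealthy",      1_000_000, 0.05, 0.01),
    ("middle-class", 2_500_000, 0.08, 0.04),
    ("poor",         1_200_000, 0.15, 0.12),
]

for name, principal, rate, p_default in segments:
    # Expected gain from interest minus expected loss of principal on default.
    expected_return = principal * rate * (1 - p_default) - principal * p_default
    print(f"{name:12s} expected return: {expected_return:12,.0f}")
```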

Challenges

In the industry of commercial analytics software, an emphasis has emerged on solving the challenges of analyzing massive, complex data sets, often when such data is in a constant state of change. Such data sets are commonly referred to as big data. Whereas once the problems posed by big data were only found in the scientific community, today big data is a problem for many businesses that operate transactional systems online and, as a result, amass large volumes of data quickly.[2]

The analysis of unstructured data types is another challenge getting attention in the industry. Unstructured data differs from structured data in that its format varies widely and cannot be stored in traditional relational databases without significant effort at data transformation.[3] Sources of unstructured data, such as email, the contents of word processor documents, PDFs, geospatial data, etc., are rapidly becoming a relevant source of business intelligence for businesses, governments and universities.[4] For example, the discovery in Britain that one company was illegally selling fraudulent doctor's notes in order to assist people in defrauding employers and insurance companies[5] is an opportunity for insurance firms to increase the vigilance of their unstructured data analysis. The McKinsey Global Institute estimates that big data analysis could save the American health care system $300 billion per year and the European public sector €250 billion.[6]

These challenges are the current inspiration for much of the innovation in modern analytics information systems, giving birth to relatively new machine analysis concepts such as complex event processing, full text search and analysis, and even new ideas in presentation.[7] One such innovation is the introduction of grid-like architecture in machine analysis, allowing increases in the speed of massively parallel processing by distributing the workload to many computers all with equal access to the complete data set.[8]

References

[1] Kohavi, Rothleder and Simoudis (2002). "Emerging Trends in Business Analytics". Communications of the ACM 45 (8): 45–48.
[2] Naone, Erica. "The New Big Data" (http://www.technologyreview.com/computing/38397/). Technology Review, MIT. Retrieved 22 August 2011.
[3] Inmon, Bill (2007). Tapping Into Unstructured Data. Prentice-Hall. ISBN 978-0-13-236029-6.
[4] Wise, Lyndsay. "Data Analysis and Unstructured Data" (http://www.dashboardinsight.com/articles/business-performance-management/data-analysis-and-unstructured-data.aspx). Dashboard Insight. Retrieved 14 February 2011.
[5] "Fake doctors' sick notes for Sale for £25, NHS fraud squad warns" (http://www.telegraph.co.uk/news/uknews/2626120/Fake-doctors-sick-notes-for-sale-on-web-for-25-NHS-fraud-squad-warns.html). The Telegraph. Retrieved August 2008.
[6] "Big Data: The next frontier for innovation, competition and productivity as reported in Building with Big Data" (http://www.economist.com/node/18741392). The Economist. 26 May 2011. Archived (http://web.archive.org/web/20110603031738/http://www.economist.com/node/18741392) from the original on 3 June 2011. Retrieved 26 May 2011.
[7] Ortega, Dan. "Mobililty: Fueling a Brainier Business Intelligence" (http://www.itbusinessedge.com/cm/community/features/guestopinions/blog/mobility-fueling-a-brainier-business-intelligence/?cs=47491). IT Business Edge. Retrieved 21 June 2011.
[8] Khambadkone, Krish. "Are You Ready for Big Data?" (http://www.infogain.com/company/perspective-big-data.jsp). InfoGain. Retrieved 10 February 2011.

External links

• INFORMS' bi-monthly, digital magazine on the analytics profession (http://analyticsmagazine.com/)

Data mining

Data mining (the analysis step of the "Knowledge Discovery in Databases" process, or KDD),[1] a field at the intersection of computer science and statistics,[2][3][4] is the process that attempts to discover patterns in large data sets. It utilizes methods at the intersection of artificial intelligence, machine learning, statistics, and database systems.[2] The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use.[2] Aside from the raw analysis step, it involves database and data management aspects, data preprocessing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating.[2]

The term is a buzzword, and is frequently misused to mean any form of large-scale data or information processing (collection, extraction, warehousing, analysis, and statistics), but is also generalized to any kind of computer decision support system, including artificial intelligence, machine learning, and business intelligence. In the proper use of the word, the key term is discovery, commonly defined as "detecting something new". Even the popular book "Data mining: Practical machine learning tools and techniques with Java"[5] (which covers mostly machine learning material) was originally to be named just "Practical machine learning", and the term "data mining" was only added for marketing reasons.[6] Often the more general terms "(large scale) data analysis" or "analytics" – or, when referring to actual methods, artificial intelligence and machine learning – are more appropriate.

The actual data mining task is the automatic or semi-automatic analysis of large quantities of data to extract previously unknown interesting patterns such as groups of data records (cluster analysis), unusual records (anomaly detection) and dependencies (association rule mining). This usually involves using database techniques such as spatial indexes. These patterns can then be seen as a kind of summary of the input data, and may be used in further analysis or, for example, in machine learning and predictive analytics. For example, the data mining step might identify multiple groups in the data, which can then be used to obtain more accurate prediction results by a decision support system. Neither the data collection, data preparation, nor result interpretation and reporting are part of the data mining step, but they do belong to the overall KDD process as additional steps.

The related terms data dredging, data fishing, and data snooping refer to the use of data mining methods to sample parts of a larger population data set that are (or may be) too small for reliable statistical inferences to be made about the validity of any patterns discovered. These methods can, however, be used in creating new hypotheses to test against the larger data populations.

Etymology

In the 1960s, statisticians used terms like "data fishing" or "data dredging" to refer to what they considered the bad practice of analyzing data without an a-priori hypothesis. The term "data mining" appeared around 1990 in the database community. At the beginning of the century, there was a phrase "database mining"™, trademarked by HNC, a San Diego-based company (now merged into FICO), to pitch their Data Mining Workstation;[7] researchers consequently turned to "data mining". Other terms used include data archaeology, information harvesting, information discovery, knowledge extraction, etc. Gregory Piatetsky-Shapiro coined the term "Knowledge Discovery in Databases" for the first workshop on the same topic (1989), and this term became more popular in the AI and machine learning community. However, the term data mining became more popular in the business and press communities.[8] Currently, the terms data mining and knowledge discovery are used interchangeably.

Background

The manual extraction of patterns from data has occurred for centuries. Early methods of identifying patterns in data include Bayes' theorem (1700s) and regression analysis (1800s). The proliferation, ubiquity and increasing power of computer technology has dramatically increased data collection, storage, and manipulation ability. As data sets have grown in size and complexity, direct "hands-on" data analysis has increasingly been augmented with indirect, automated data processing, aided by other discoveries in computer science, such as neural networks, cluster analysis, genetic algorithms (1950s), decision trees (1960s), and support vector machines (1990s). Data mining is the process of applying these methods with the intention of uncovering hidden patterns[9] in large data sets. It bridges the gap from applied statistics and artificial intelligence (which usually provide the mathematical background) to database management by exploiting the way data is stored and indexed in databases to execute the actual learning and discovery algorithms more efficiently, allowing such methods to be applied to ever larger data sets.

Research and evolution

The premier professional body in the field is the Association for Computing Machinery's Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD). Since 1989 they have hosted an annual international conference and published its proceedings,[10] and since 1999 they have published a biannual academic journal titled "SIGKDD Explorations".[11]

Computer science conferences on data mining include:
• CIKM Conference – ACM Conference on Information and Knowledge Management
• DMIN Conference – International Conference on Data Mining
• DMKD Conference – Research Issues on Data Mining and Knowledge Discovery
• ECDM Conference – European Conference on Data Mining
• ECML-PKDD Conference – European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
• EDM Conference – International Conference on Educational Data Mining
• ICDM Conference – IEEE International Conference on Data Mining
• KDD Conference – ACM SIGKDD Conference on Knowledge Discovery and Data Mining
• MLDM Conference – Machine Learning and Data Mining in Pattern Recognition
• PAKDD Conference – The annual Pacific-Asia Conference on Knowledge Discovery and Data Mining
• PAW Event – Predictive Analytics World
• SDM Conference – SIAM International Conference on Data Mining (SIAM)
• SSTD Symposium – Symposium on Spatial and Temporal Databases
• WSDM Conference – ACM Conference on Web Search and Data Mining

Data mining topics are also present at many data management and database conferences, such as the ICDE Conference, the SIGMOD Conference, and the International Conference on Very Large Data Bases.

Process The Knowledge Discovery in Databases (KDD) process is commonly defined with the stages: (1) Selection, (2) Pre-processing, (3) Transformation, (4) Data Mining, and (5) Interpretation/Evaluation.[1] There are, however, many variations on this theme, such as the Cross Industry Standard Process for Data Mining (CRISP-DM), which defines six phases: (1) Business Understanding, (2) Data Understanding, (3) Data Preparation, (4) Modeling, (5) Evaluation, and (6) Deployment, or a simplified process such as (1) pre-processing, (2) data mining, and (3) results validation.
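As a concrete illustration of the simplified three-stage process, the following minimal Python sketch walks through pre-processing, data mining, and results validation on a tiny, made-up table; the data, column names, and the choice of a decision tree are illustrative assumptions, not part of the KDD or CRISP-DM definitions.

```python
# Minimal sketch of the simplified process: (1) pre-processing,
# (2) data mining, (3) results validation. The in-memory table and
# column names are illustrative assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

raw = pd.DataFrame({
    "age":     [23, 45, 31, 52, 40, 27, 61, 36, None, 48],
    "spend":   [120, 340, 210, 500, 390, 150, 620, 280, 200, 450],
    "churned": [1, 0, 1, 0, 0, 1, 0, 1, 0, 0],
})

# (1) Pre-processing: drop observations with missing data.
data = raw.dropna()

# (2) Data mining: learn a classification model from a training partition.
X, y = data[["age", "spend"]], data["churned"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = DecisionTreeClassifier(max_depth=2).fit(X_train, y_train)

# (3) Results validation: check the learned patterns on held-out data.
print("held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))
```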

Pre-processing Before data mining algorithms can be used, a target data set must be assembled. As data mining can only uncover patterns actually present in the data, the target data set must be large enough to contain these patterns while remaining concise enough to be mined within an acceptable time limit. A common source for data is a data mart or data warehouse. Pre-processing is essential for analyzing multivariate data sets before data mining. The target set is then cleaned: data cleaning removes the observations containing noise and those with missing data.
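The cleaning step can be made concrete with a short sketch. The function below drops observations with missing values and then filters obvious noise with a robust median-based rule; the specific threshold and the toy table are assumptions made for illustration only.

```python
# Sketch of the cleaning step: remove observations with missing data,
# then drop values that deviate wildly from the column median
# (a robust MAD-based rule; the factor of 3 is an illustrative choice).
import pandas as pd

def clean(target, numeric_cols):
    cleaned = target.dropna(subset=numeric_cols)   # remove missing data
    keep = pd.Series(True, index=cleaned.index)
    for col in numeric_cols:
        med = cleaned[col].median()
        mad = (cleaned[col] - med).abs().median()  # median absolute deviation
        if mad > 0:
            keep &= (cleaned[col] - med).abs() <= 3 * 1.4826 * mad
    return cleaned[keep]

# Hypothetical sensor table with one missing and one noisy observation.
frame = pd.DataFrame({"temp": [20.1, 20.3, None, 19.8, 95.0],
                      "rpm":  [1500, 1480, 1520, 1495, 1510]})
print(clean(frame, ["temp", "rpm"]))   # missing and noisy rows are removed
```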

Data mining Data mining involves six common classes of tasks:[1]
• Anomaly detection (outlier/change/deviation detection) – the identification of unusual data records that might be interesting, or might be data errors requiring further investigation.
• Association rule learning (dependency modeling) – searches for relationships between variables. For example, a supermarket might gather data on customer purchasing habits. Using association rule learning, the supermarket can determine which products are frequently bought together and use this information for marketing purposes. This is sometimes referred to as market basket analysis.
• Clustering – the task of discovering groups and structures in the data that are in some way or another "similar", without using known structures in the data (see the sketch after this list).
• Classification – the task of generalizing known structure to apply to new data. For example, an e-mail program might attempt to classify an e-mail as "legitimate" or as "spam".
• Regression – attempts to find a function which models the data with the least error.
• Summarization – providing a more compact representation of the data set, including visualization and report generation.
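The clustering task above can be illustrated with a brief, hedged sketch using scikit-learn's k-means implementation; the synthetic "customer segments" and the choice of two clusters are assumptions for the example only.

```python
# Illustrative sketch of the clustering task: group similar records
# without using known labels. Data and the choice of k are assumptions.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two synthetic "customer segments": low spend / few visits vs. high / many.
segment_a = rng.normal(loc=[20, 2], scale=[5, 1], size=(50, 2))
segment_b = rng.normal(loc=[80, 10], scale=[10, 2], size=(50, 2))
purchases = np.vstack([segment_a, segment_b])

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(purchases)
print("cluster sizes:", np.bincount(model.labels_))
print("cluster centres:\n", model.cluster_centers_)
```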

Results validation The final step of knowledge discovery from data is to verify that the patterns produced by the data mining algorithms occur in the wider data set. Not all patterns found by the data mining algorithms are necessarily valid. It is common for the data mining algorithms to find patterns in the training set which are not present in the general data set. This is called overfitting. To overcome this, the evaluation uses a test set of data on which the data mining algorithm was not trained. The learned patterns are applied to this test set and the resulting output is compared to the desired output. For example, a data mining algorithm trying to distinguish "spam" from "legitimate" emails would be trained on a training set of sample e-mails. Once trained, the learned patterns would be applied to the test set of e-mails on which it had not been trained. The accuracy of the patterns can then be measured from how many e-mails they correctly classify. A number of statistical methods may be used to evaluate the algorithm, such as ROC curves. If the learned patterns do not meet the desired standards, then it is necessary to re-evaluate and change the pre-processing and data mining steps. If the learned patterns do meet the desired standards, then the final step is to interpret the learned patterns and turn them into knowledge.
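A minimal sketch of this train/test discipline, using scikit-learn and a synthetic stand-in for the spam example (the generated data set and classifier choice are assumptions), might look as follows.

```python
# Sketch of the validation step: train on one partition, evaluate on a
# held-out test set, and summarise performance with accuracy and ROC AUC.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score

# Synthetic stand-in for labelled "spam"/"legitimate" e-mails.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Apply the learned patterns to data the algorithm was not trained on.
pred = clf.predict(X_test)
score = clf.predict_proba(X_test)[:, 1]
print("accuracy:", accuracy_score(y_test, pred))
print("ROC AUC :", roc_auc_score(y_test, score))
```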

Standards There have been some efforts to define standards for the data mining process, for example the 1999 European Cross Industry Standard Process for Data Mining (CRISP-DM 1.0) and the 2004 Java Data Mining standard (JDM 1.0). Development on successors to these processes (CRISP-DM 2.0 and JDM 2.0) was active in 2006, but has stalled since. JDM 2.0 was withdrawn without reaching a final draft. For exchanging the extracted models – in particular for use in predictive analytics – the key standard is the Predictive Model Markup Language (PMML), which is an XML-based language developed by the Data Mining Group (DMG) and supported as exchange format by many data mining applications. As the name suggests, it only covers prediction models, a particular data mining task of high importance to business applications. However, extensions to cover (for example) subspace clustering have been proposed independently of the DMG.[12]
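As one possible route to producing a PMML document from code, the sketch below uses the third-party sklearn2pmml package; the package, its requirement for a Java runtime, and the iris example are assumptions outside the PMML specification itself, and many data mining tools export PMML directly instead.

```python
# Hedged sketch: exporting a trained model to PMML with the third-party
# sklearn2pmml package (requires a Java runtime; package and example
# are assumptions, not part of the PMML standard).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline

X, y = load_iris(return_X_y=True)
pipeline = PMMLPipeline([("classifier", DecisionTreeClassifier(max_depth=3))])
pipeline.fit(X, y)

# Writes an XML document conforming to the PMML schema, which other
# PMML-aware applications can import for scoring.
sklearn2pmml(pipeline, "iris_tree.pmml")
```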

Notable uses

Games Since the early 1960s, with the availability of oracles for certain combinatorial games, also called tablebases (e.g. for 3x3-chess with any beginning configuration, small-board dots-and-boxes, small-board hex, and certain endgames in chess, dots-and-boxes, and hex), a new area for data mining has been opened: the extraction of human-usable strategies from these oracles. Current pattern recognition approaches do not seem to fully acquire the high level of abstraction required to be applied successfully. Instead, extensive experimentation with the tablebases – combined with an intensive study of tablebase answers to well-designed problems, and with knowledge of prior art (i.e. pre-tablebase knowledge) – is used to yield insightful patterns. Berlekamp (in dots-and-boxes, etc.) and John Nunn (in chess endgames) are notable examples of researchers doing this work, though they were not – and are not – involved in tablebase generation.

Business Data mining in customer relationship management applications can contribute significantly to the bottom line. Rather than randomly contacting a prospect or customer through a call center or sending mail, a company can concentrate its efforts on prospects that are predicted to have a high likelihood of responding to an offer. More sophisticated methods may be used to optimize resources across campaigns so that one may predict to which channel and to which offer an individual is most likely to respond (across all potential offers). Additionally, sophisticated applications could be used to automate mailing. Once the results from data mining (potential prospect/customer and channel/offer) are determined, this "sophisticated application" can either automatically send an e-mail or a regular mail. Finally, in cases where many people will take an action without an offer, "uplift modeling" can be used to determine which people have the greatest increase in response if given an offer. Data clustering can also be used to automatically discover the segments or groups within a customer data set.
Businesses employing data mining may see a return on investment, but they also recognize that the number of predictive models can quickly become very large. Rather than using one model to predict how many customers will churn, a business could build a separate model for each region and customer type (a brief sketch of this per-segment approach follows below). Then, instead of sending an offer to all people that are likely to churn, it may only want to send offers to loyal customers. Finally, the business may want to determine which customers are going to be profitable over a certain window in time, and only send the offers to those that are likely to be profitable. In order to maintain this quantity of models, the business needs to manage model versions and move toward automated data mining.
Data mining can also be helpful to human resources (HR) departments in identifying the characteristics of their most successful employees. Information obtained – such as universities attended by highly successful employees – can help HR focus recruiting efforts accordingly. Additionally, Strategic Enterprise Management applications help a company translate corporate-level goals, such as profit and margin share targets, into operational decisions, such as production plans and workforce levels.[13]
Another example of data mining, often called market basket analysis, relates to its use in retail sales. If a clothing store records the purchases of customers, a data mining system could identify those customers who favor silk shirts over cotton ones. Although explaining such relationships may be difficult, taking advantage of them is easier. The example deals with association rules within transaction-based data. Not all data are transaction based, and logical or inexact rules may also be present within a database.
Market basket analysis has also been used to identify the purchase patterns of the Alpha Consumer. Alpha Consumers are people that play a key role in connecting with the concept behind a product, then adopting that product, and finally validating it for the rest of society. Analyzing the data collected on this type of user has allowed companies to predict future buying trends and forecast supply demands.
Data mining is a highly effective tool in the catalog marketing industry. Catalogers have a rich history of customer transactions for millions of customers dating back a number of years. Data mining tools can identify patterns among customers and help identify the most likely customers to respond to upcoming mailing campaigns. Data mining for business applications is a component which needs to be integrated into a complex modeling and decision-making process.
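The "separate model for each region and customer type" idea mentioned above can be sketched in a few lines: group the customer table by segment and fit one small model per group. The synthetic data, column names, and choice of logistic regression are illustrative assumptions.

```python
# Sketch of the "one model per segment" idea: fit a separate churn
# classifier for each (region, customer type) group.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 600
frame = pd.DataFrame({
    "region": rng.choice(["north", "south"], n),
    "segment": rng.choice(["retail", "business"], n),
    "tenure": rng.integers(1, 60, n),
    "monthly_spend": rng.normal(50, 15, n),
})
# Synthetic churn signal: short tenure and low spend churn more often.
frame["churned"] = ((frame["tenure"] < 12) & (frame["monthly_spend"] < 45)).astype(int)

models = {}
for key, group in frame.groupby(["region", "segment"]):
    X, y = group[["tenure", "monthly_spend"]], group["churned"]
    if y.nunique() < 2:          # skip degenerate segments with one class only
        continue
    models[key] = LogisticRegression().fit(X, y)

print("models trained for segments:", sorted(models))
```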
Reactive business intelligence (RBI) advocates a "holistic" approach that integrates data mining, modeling, and interactive visualization into an end-to-end discovery and continuous innovation process powered by human and automated learning.[14] In the area of decision making, the RBI approach has been used to mine knowledge that is progressively acquired from the decision maker, and then self-tune the decision method accordingly.[15]
An example of data mining related to an integrated-circuit production line is described in the paper "Mining IC Test Data to Optimize VLSI Testing."[16] In this paper, the application of data mining and decision analysis to the problem of die-level functional testing is described. Experiments mentioned demonstrate the ability to apply a system of mining historical die-test data to create a probabilistic model of patterns of die failure. These patterns are then utilized to decide, in real time, which die to test next and when to stop testing. This system has been shown, based on experiments with historical test data, to have the potential to improve profits on mature IC products.

Science and engineering In recent years, data mining has been used widely in the areas of science and engineering, such as bioinformatics, genetics, medicine, education and electrical power engineering.
In the study of human genetics, sequence mining helps address the important goal of understanding the mapping relationship between the inter-individual variations in human DNA sequence and the variability in disease susceptibility. In simple terms, it aims to find out how changes in an individual's DNA sequence affect the risk of developing common diseases such as cancer, which is of great importance to improving methods of diagnosing, preventing, and treating these diseases. The data mining method that is used to perform this task is known as multifactor dimensionality reduction.[17]
In the area of electrical power engineering, data mining methods have been widely used for condition monitoring of high voltage electrical equipment. The purpose of condition monitoring is to obtain valuable information on, for example, the status of the insulation (or other important safety-related parameters). Data clustering techniques – such as the self-organizing map (SOM) – have been applied to vibration monitoring and analysis of transformer on-load tap-changers (OLTCs). Using vibration monitoring, it can be observed that each tap change operation generates a signal that contains information about the condition of the tap changer contacts and the drive mechanisms. Obviously, different tap positions will generate different signals. However, there was considerable variability amongst normal condition signals for exactly the same tap position. SOM has been applied to detect abnormal conditions and to hypothesize about the nature of the abnormalities.[18] (A simplified stand-in for this approach is sketched below.)
Data mining methods have also been applied to dissolved gas analysis (DGA) in power transformers. DGA, as a diagnostics for power transformers, has been available for many years. Methods such as SOM have been applied to analyze generated data and to determine trends which are not obvious to the standard DGA ratio methods (such as the Duval Triangle).[18]
Another example of data mining in science and engineering is found in educational research, where data mining has been used to study the factors leading students to choose to engage in behaviors which reduce their learning,[19] and to understand the factors influencing university student retention.[20] A similar example of social application of data mining is its use in expertise finding systems, whereby descriptors of human expertise are extracted, normalized, and classified so as to facilitate the finding of experts, particularly in scientific and technical fields. In this way, data mining can facilitate institutional memory.
Other examples of applying data mining methods are the mining of biomedical data facilitated by domain ontologies,[21] mining clinical trial data,[22] and traffic analysis using SOM.[23]
In adverse drug reaction surveillance, the Uppsala Monitoring Centre has, since 1998, used data mining methods to routinely screen for reporting patterns indicative of emerging drug safety issues in the WHO global database of 4.6 million suspected adverse drug reaction incidents.[24] Recently, similar methodology has been developed to mine large collections of electronic health records for temporal patterns associating drug prescriptions to medical diagnoses.[25]
Data mining has also been applied to software artifacts within the realm of software engineering (see Mining Software Repositories).
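The tap-changer monitoring described above relies on self-organizing maps; as a simplified stand-in (not the SOM method itself), the sketch below clusters normal-condition vibration signatures with k-means and flags new signals that fall far from every cluster centre. The synthetic signatures and the distance threshold are assumptions for illustration.

```python
# Simplified stand-in for SOM-based condition monitoring: cluster normal
# vibration signatures and flag signals far from every cluster centre.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
normal = rng.normal(loc=0.0, scale=1.0, size=(200, 8))      # normal tap-change signatures
model = KMeans(n_clusters=5, n_init=10, random_state=0).fit(normal)

# Threshold: maximum distance of any training signature to its nearest centre.
train_dist = np.min(model.transform(normal), axis=1)
threshold = train_dist.max()

new_signal = rng.normal(loc=4.0, scale=1.0, size=(1, 8))    # suspiciously shifted signal
is_abnormal = np.min(model.transform(new_signal), axis=1) > threshold
print("abnormal condition suspected:", bool(is_abnormal[0]))
```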

Human rights Data mining of government records – particularly records of the justice system (i.e. courts, prisons) – enables the discovery of systemic human rights violations in connection with the generation and publication of invalid or fraudulent legal records by various government agencies.[26][27]

Medical data mining In 2011, the Supreme Court of the United States ruled in Sorrell v. IMS Health, Inc. that pharmacies may share information with outside companies. The practice was held to be protected under the First Amendment of the Constitution as "freedom of speech."[28]

Spatial data mining Spatial data mining is the application of data mining methods to spatial data. The end objective of spatial data mining is to find patterns in data with respect to geography. So far, data mining and Geographic Information Systems (GIS) have existed as two separate technologies, each with its own methods, traditions, and approaches to visualization and data analysis. Particularly, most contemporary GIS have only very basic spatial analysis functionality. The immense explosion in geographically referenced data occasioned by developments in IT, digital mapping, remote sensing, and the global diffusion of GIS emphasizes the importance of developing data-driven inductive approaches to geographical analysis and modeling.
Data mining offers great potential benefits for GIS-based applied decision-making. Recently, the task of integrating these two technologies has become of critical importance, especially as various public and private sector organizations possessing huge databases with thematic and geographically referenced data begin to realize the huge potential of the information contained therein. Among those organizations are:
• offices requiring analysis or dissemination of geo-referenced statistical data
• public health services searching for explanations of disease clustering
• environmental agencies assessing the impact of changing land-use patterns on climate change
• geo-marketing companies doing customer segmentation based on spatial location.

Challenges Geospatial data repositories tend to be very large. Moreover, existing GIS datasets are often splintered into feature and attribute components that are conventionally archived in hybrid data management systems. Algorithmic requirements differ substantially for relational (attribute) data management and for topological (feature) data management.[29] Related to this is the range and diversity of geographic data formats, which present unique challenges. The digital geographic data revolution is creating new types of data formats beyond the traditional "vector" and "raster" formats. Geographic data repositories increasingly include ill-structured data, such as imagery and geo-referenced multi-media.[30]
There are several critical research challenges in geographic knowledge discovery and data mining. Miller and Han[31] offer the following list of emerging research topics in the field:
• Developing and supporting geographic data warehouses (GDWs): Spatial properties are often reduced to simple aspatial attributes in mainstream data warehouses. Creating an integrated GDW requires solving issues of spatial and temporal data interoperability – including differences in semantics, referencing systems, geometry, accuracy, and position.
• Better spatio-temporal representations in geographic knowledge discovery: Current geographic knowledge discovery (GKD) methods generally use very simple representations of geographic objects and spatial relationships. Geographic data mining methods should recognize more complex geographic objects (i.e. lines and polygons) and relationships (i.e. non-Euclidean distances, direction, connectivity, and interaction through attributed geographic space such as terrain). Furthermore, the time dimension needs to be more fully integrated into these geographic representations and relationships.
• Geographic knowledge discovery using diverse data types: GKD methods should be developed that can handle diverse data types beyond the traditional raster and vector models, including imagery and geo-referenced multimedia, as well as dynamic data types (video streams, animation).

Sensor data mining Wireless sensor networks can be used for facilitating the collection of data for spatial data mining for a variety of applications such as air pollution monitoring.[32] A characteristic of such networks is that nearby sensor nodes monitoring an environmental feature typically register similar values. This kind of data redundancy, due to the spatial correlation between sensor observations, inspires techniques for in-network data aggregation and mining. By measuring the spatial correlation between data sampled by different sensors, a wide class of specialized algorithms can be designed to perform spatial data mining more efficiently.[33]
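A rough sketch of exploiting spatial correlation for in-network aggregation: estimate pairwise correlation between sensors' readings and merge streams that are highly redundant. The synthetic readings and the 0.9 correlation threshold are assumptions for illustration.

```python
# Sketch of correlation-based aggregation: sensors whose readings are
# highly correlated with a reference sensor are averaged into one stream.
import numpy as np

rng = np.random.default_rng(0)
base = rng.normal(size=500)
readings = np.vstack([
    base + rng.normal(scale=0.1, size=500),   # sensor 0
    base + rng.normal(scale=0.1, size=500),   # sensor 1 (redundant with 0)
    rng.normal(size=500),                     # sensor 2 (independent)
])

corr = np.corrcoef(readings)
redundant_with_0 = [i for i in range(1, len(readings)) if corr[0, i] > 0.9]
aggregate = readings[[0] + redundant_with_0].mean(axis=0)  # one stream instead of several
print("sensors aggregated with sensor 0:", redundant_with_0)
```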

Visual data mining As analog data sources have been turned into digital form, large data sets have been generated, collected, and stored. Visual data mining applies visualization techniques to discover the statistical patterns, trends and information hidden in such data, in order to build predictive models. Studies suggest visual data mining is faster and much more intuitive than traditional data mining.[34][35]

Music data mining Data mining techniques, and in particular co-occurrence analysis, have been used to discover relevant similarities among music corpora (radio lists, CD databases) for the purpose of classifying music into genres in a more objective manner.[36]

Surveillance Data mining has been used in U.S. government programs aimed at stopping terrorism, including the Total Information Awareness (TIA) program, Secure Flight (formerly known as the Computer-Assisted Passenger Prescreening System (CAPPS II)), Analysis, Dissemination, Visualization, Insight, Semantic Enhancement (ADVISE),[37] and the Multi-state Anti-Terrorism Information Exchange (MATRIX).[38] These programs have been discontinued due to controversy over whether they violate the 4th Amendment to the United States Constitution, although many programs that were formed under them continue to be funded by different organizations or under different names.[39] In the context of combating terrorism, two particularly plausible methods of data mining are "pattern mining" and "subject-based data mining".

Pattern mining "Pattern mining" is a data mining method that involves finding existing patterns in data. In this context patterns often means association rules. The original motivation for searching association rules came from the desire to analyze supermarket transaction data, that is, to examine customer behavior in terms of the purchased products. For example, an association rule "beer ⇒ potato chips (80%)" states that four out of five customers that bought beer also bought potato chips. In the context of pattern mining as a tool to identify terrorist activity, the National Research Council provides the following definition: "Pattern-based data mining looks for patterns (including anomalous data patterns) that might be associated with terrorist activity — these patterns might be regarded as small signals in a large ocean of noise."[40][41][42] Pattern Mining includes new areas such a Music Information Retrieval (MIR) where patterns seen both in the temporal and non temporal domains are imported to classical knowledge discovery search methods. Data mining 23

Subject-based data mining "Subject-based data mining" is a data mining method involving the search for associations between individuals in data. In the context of combating terrorism, the National Research Council provides the following definition: "Subject-based data mining uses an initiating individual or other datum that is considered, based on other information, to be of high interest, and the goal is to determine what other persons or financial transactions or movements, etc., are related to that initiating datum."[41]

Knowledge grid Knowledge discovery "On the Grid" generally refers to conducting knowledge discovery in an open environment using grid computing concepts, allowing users to integrate data from various online data sources, as well as to make use of remote resources, for executing their data mining tasks. The earliest example was the Discovery Net,[43][44] developed at Imperial College London, which won the "Most Innovative Data-Intensive Application Award" at the ACM SC02 (Supercomputing 2002) conference and exhibition, based on a demonstration of a fully interactive distributed knowledge discovery application for a bioinformatics application. Other examples include work conducted by researchers at the University of Calabria, who developed a Knowledge Grid architecture for distributed knowledge discovery, based on grid computing.[45][46]

Reliability/Validity Data mining can be misused, and can also unintentionally produce results which appear significant but which do not actually predict future behavior and cannot be reproduced on a new sample of data. See Data snooping, Data dredging.

Challenges In four annual surveys of data miners,[47] data mining practitioners consistently identify three key challenges that they face more than any others, specifically (a) dirty data, (b) explaining data mining to others, and (c) unavailability of data/difficult access to data. In the 2010 survey data miners also shared their experiences in overcoming these particular challenges.[48]

Privacy concerns and ethics Some people believe that data mining itself is ethically neutral.[49] It is important to note that the term "data mining" has no ethical implications, but it is often associated with the mining of information in relation to peoples' behavior (ethical and otherwise). To be precise, data mining is a statistical method that is applied to a set of information (i.e. a data set). Associating these data sets with people is an extreme narrowing of the types of data that are available in today's technological society. Examples could range from a set of crash test data for passenger vehicles, to the performance of a group of stocks. These types of data sets make up a great proportion of the information available to be acted on by data mining methods, and rarely have ethical concerns associated with them. However, the ways in which data mining can be used can in some cases and contexts raise questions regarding privacy, legality, and ethics.[50] In particular, data mining government or commercial data sets for national security or law enforcement purposes, such as in the Total Information Awareness Program or in ADVISE, has raised privacy concerns.[51][52]
Data mining requires data preparation which can uncover information or patterns which may compromise confidentiality and privacy obligations. A common way for this to occur is through data aggregation. Data aggregation involves combining data together (possibly from various sources) in a way that facilitates analysis (but that also might make identification of private, individual-level data deducible or otherwise apparent).[53] This is not data mining per se, but a result of the preparation of data before – and for the purposes of – the analysis. The threat to an individual's privacy comes into play when the data, once compiled, cause the data miner, or anyone who has access to the newly compiled data set, to be able to identify specific individuals, especially when the data were originally anonymous.
It is recommended that an individual is made aware of the following before data are collected:[53]
• the purpose of the data collection and any (known) data mining projects
• how the data will be used
• who will be able to mine the data and use the data and their derivatives
• the status of security surrounding access to the data
• how collected data can be updated.
In America, privacy concerns have been addressed to some extent by the US Congress via the passage of regulatory controls such as the Health Insurance Portability and Accountability Act (HIPAA). HIPAA requires individuals to give their "informed consent" regarding information they provide and its intended present and future uses. According to an article in Biotech Business Week, "'[i]n practice, HIPAA may not offer any greater protection than the longstanding regulations in the research arena,' says the AAHC. More importantly, the rule's goal of protection through informed consent is undermined by the complexity of consent forms that are required of patients and participants, which approach a level of incomprehensibility to average individuals."[54] This underscores the necessity for data anonymity in data aggregation and mining practices.
Data may also be modified so as to become anonymous, so that individuals may not readily be identified.[53] However, even "de-identified"/"anonymized" data sets can potentially contain enough information to allow identification of individuals, as occurred when journalists were able to find several individuals based on a set of search histories that were inadvertently released by AOL.[55]

Software

Free open-source data mining software and applications
1. Carrot2: Text and search results clustering framework.
2. Chemicalize.org: A chemical structure miner and web search engine.
3. ELKI: A university research project with advanced cluster analysis and outlier detection methods written in the Java language.
4. GATE: A natural language processing and language engineering tool.
5. JHepWork: Java cross-platform data analysis framework developed at Argonne National Laboratory.
6. KNIME: The Konstanz Information Miner, a user-friendly and comprehensive data analytics framework.
7. NLTK (Natural Language Toolkit): A suite of libraries and programs for symbolic and statistical natural language processing (NLP) for the Python language.
8. Orange: A component-based data mining and machine learning software suite written in the Python language.
9. R: A programming language and software environment for statistical computing, data mining, and graphics. It is part of the GNU project.
10. RapidMiner: An environment for machine learning and data mining experiments.
11. UIMA: The UIMA (Unstructured Information Management Architecture) is a component framework for analyzing unstructured content such as text, audio and video – originally developed by IBM.
12. Weka: A suite of machine learning software applications written in the Java programming language.
13. ML-Flex: A software package that enables users to integrate with third-party machine-learning packages written in any programming language, execute classification analyses in parallel across multiple computing nodes, and produce HTML reports of classification results.
In 2010, the open source R language overtook other tools to become the application used by more data miners (43%) than any other, according to a well-known annual survey.[47]

Commercial data-mining software and applications
• Microsoft Analysis Services: data mining software provided by Microsoft
• SAS: Enterprise Miner – data mining software provided by the SAS Institute.
• STATISTICA: Data Miner – data mining software provided by StatSoft.
• Oracle Data Mining: data mining software by Oracle.
• LIONsolver: an integrated software application for data mining, business intelligence, and modeling that implements the Learning and Intelligent OptimizatioN (LION) approach.
According to Rexer's Annual Data Miner Survey in 2010, IBM SPSS Modeler, STATISTICA Data Miner, and R received the strongest satisfaction ratings.[47]

Marketplace surveys Several researchers and organizations have conducted reviews of data mining tools and surveys of data miners. These identify some of the strengths and weaknesses of the software packages. They also provide an overview of the behaviors, preferences and views of data miners. Some of these reports include:
• Annual Rexer Analytics Data Miner Surveys[47]
• Forrester Research 2010 Predictive Analytics and Data Mining Solutions report[56]
• Gartner 2008 "Magic Quadrant" report[57]
• Haughton et al.'s 2003 Review of Data Mining Software Packages in The American Statistician[58]
• Robert A. Nisbet's 2006 Three Part Series of articles "Data Mining Tools: Which One is Best For CRM?"[59]
• 2011 Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery[60]

References

[1] Fayyad, Usama; Piatetsky-Shapiro, Gregory; Smyth, Padhraic (1996). "From Data Mining to Knowledge Discovery in Databases" (http://www.kdnuggets.com/gpspubs/aimag-kdd-overview-1996-Fayyad.pdf). Retrieved 17 December 2008.
[2] "Data Mining Curriculum" (http://www.sigkdd.org/curriculum.php). ACM SIGKDD. 2006-04-30. Retrieved 2011-10-28.
[3] Clifton, Christopher (2010). "Encyclopædia Britannica: Definition of Data Mining" (http://www.britannica.com/EBchecked/topic/1056150/data-mining). Retrieved 2010-12-09.
[4] Hastie, Trevor; Tibshirani, Robert; Friedman, Jerome (2009). "The Elements of Statistical Learning: Data Mining, Inference, and Prediction" (http://www-stat.stanford.edu/~tibs/ElemStatLearn/). Retrieved 2012-08-07.
[5] Witten, Ian H.; Frank, Eibe; Hall, Mark A. (30 January 2011). Data Mining: Practical Machine Learning Tools and Techniques (3 ed.). Elsevier. ISBN 978-0-12-374856-0.
[6] Bouckaert, Remco R.; Frank, Eibe; Hall, Mark A.; Holmes, Geoffrey; Pfahringer, Bernhard; Reutemann, Peter; Witten, Ian H. (2010). "WEKA Experiences with a Java open-source project". Journal of Machine Learning Research 11: 2533–2541. "the original title, "Practical machine learning", was changed [...] The term "data mining" was [added] primarily for marketing reasons."
[7] Mena, Jesús (2011). Machine Learning Forensics for Law Enforcement, Security, and Intelligence. Boca Raton, FL: CRC Press (Taylor & Francis Group). ISBN 9-781-4398-6069-4.
[8] Piatetsky-Shapiro, Gregory; Parker, Gary (2011). "Lesson: Data Mining, and Knowledge Discovery: An Introduction" (http://www.kdnuggets.com/data_mining_course/x1-intro-to-data-mining-notes.html). Introduction to Data Mining. KD Nuggets. Retrieved 30 August 2012.
[9] Kantardzic, Mehmed (2003). Data Mining: Concepts, Models, Methods, and Algorithms. John Wiley & Sons. ISBN 0-471-22852-4. OCLC 50055336.
[10] Proceedings (http://www.kdd.org/conferences.php), International Conferences on Knowledge Discovery and Data Mining, ACM, New York.
[11] SIGKDD Explorations (http://www.kdd.org/explorations/about.php), ACM, New York.
[12] Günnemann, Stephan; Kremer, Hardy; Seidl, Thomas (2011). "An extension of the PMML standard to subspace clustering models". Proceedings of the 2011 workshop on Predictive markup language modeling – PMML '11. pp. 48. doi:10.1145/2023598.2023605. ISBN 9781450308373.
[13] Monk, Ellen; Wagner, Bret (2006). Concepts in Enterprise Resource Planning, Second Edition. Boston, MA: Thomson Course Technology. ISBN 0-619-21663-8. OCLC 224465825.
[14] Battiti, Roberto; Brunato, Mauro (February 2011). Reactive Business Intelligence. From Data to Models to Insight (http://www.reactivebusinessintelligence.com/). Reactive Search Srl, Italy. ISBN 978-88-905795-0-9.
[15] Battiti, Roberto; Passerini, Andrea (2010). "Brain-Computer Evolutionary Multi-Objective Optimization (BC-EMO): a genetic algorithm adapting to the decision maker" (http://rtm.science.unitn.it/~battiti/archive/bcemo.pdf). IEEE Transactions on Evolutionary Computation 14 (15): 671–687. doi:10.1109/TEVC.2010.2058118.
[16] Fountain, Tony; Dietterich, Thomas; Sudyka, Bill (2000). "Mining IC Test Data to Optimize VLSI Testing" (http://web.engr.oregonstate.edu/~tgd/publications/kdd2000-dlft.pdf). In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM Press. pp. 18–25.
[17] Zhu, Xingquan; Davidson, Ian (2007). Knowledge Discovery and Data Mining: Challenges and Realities. New York, NY: Hershey. p. 18. ISBN 978-1-59904-252-7.
[18] McGrail, Anthony J.; Gulski, Edward; Allan, David; Birtwhistle, David; Blackburn, Trevor R.; Groot, Edwin R. S. "Data Mining Techniques to Assess the Condition of High Voltage Electrical Plant". CIGRÉ WG 15.11 of Study Committee 15.
[19] Baker, Ryan S. J. d. "Is Gaming the System State-or-Trait? Educational Data Mining Through the Multi-Contextual Application of a Validated Behavioral Model". Workshop on Data Mining for User Modeling 2007.
[20] Superby Aguirre, Juan Francisco; Vandamme, Jean-Philippe; Meskens, Nadine. "Determination of factors influencing the achievement of the first-year university students using data mining methods". Workshop on Educational Data Mining 2006.
[21] Zhu, Xingquan; Davidson, Ian (2007). Knowledge Discovery and Data Mining: Challenges and Realities. New York, NY: Hershey. pp. 163–189. ISBN 978-1-59904-252-7.
[22] Zhu, Xingquan; Davidson, Ian (2007). Knowledge Discovery and Data Mining: Challenges and Realities. New York, NY: Hershey. pp. 31–48. ISBN 978-1-59904-252-7.
[23] Chen, Yudong; Zhang, Yi; Hu, Jianming; Li, Xiang (2006). "Traffic Data Analysis Using Kernel PCA and Self-Organizing Map". IEEE Intelligent Vehicles Symposium.
[24] Bate, Andrew; Lindquist, Marie; Edwards, I. Ralph; Olsson, Sten; Orre, Roland; Lansner, Anders; de Freitas, Rogelio Melhado (June 1998). "A Bayesian neural network method for adverse drug reaction signal generation" (http://dml.cs.byu.edu/~cgc/docs/atdm/W11/BCPNN-ADR.pdf). European Journal of Clinical Pharmacology 54 (4): 315–321 (http://www.ncbi.nlm.nih.gov/pubmed/9696956).
[25] Norén, G. Niklas; Bate, Andrew; Hopstadius, Johan; Star, Kristina; Edwards, I. Ralph (2008). "Temporal Pattern Discovery for Trends and Transient Effects: Its Application to Patient Records". Proceedings of the Fourteenth International Conference on Knowledge Discovery and Data Mining (SIGKDD 2008), Las Vegas, NV, pp. 963–971.
[26] Zernik, Joseph (2010). "Data Mining as a Civic Duty – Online Public Prisoners' Registration Systems" (http://www.scribd.com/doc/38328591/). International Journal on Social Media: Monitoring, Measurement, Mining 1: 84–96.
[27] Zernik, Joseph (2010). "Data Mining of Online Judicial Records of the Networked US Federal Courts" (http://www.scribd.com/doc/38328585/). International Journal on Social Media: Monitoring, Measurement, Mining 1: 69–83.
[28] http://articles.latimes.com/2011/jun/24/nation/la-na-court-drugs-20110624
[29] Healey, Richard G. (1991). "Database Management Systems". In Maguire, David J.; Goodchild, Michael F.; Rhind, David W. (eds.), Geographic Information Systems: Principles and Applications. London, GB: Longman.
[30] Camara, Antonio S.; Raper, Jonathan (eds.) (1999). Spatial Multimedia and Virtual Reality. London, GB: Taylor and Francis.
[31] Miller, Harvey J.; Han, Jiawei (eds.) (2001). Geographic Data Mining and Knowledge Discovery. London, GB: Taylor & Francis.
[32] doi:10.3390/s8063601
[33] Ma, Yajie; Guo, Yike; Tian, Xiangchuan; Ghanem, Moustafa (2011). "Distributed Clustering-Based Aggregation Algorithm for Spatial Correlated Sensor Networks". IEEE Sensors Journal 11 (3): 641. doi:10.1109/JSEN.2010.2056916.
[34] Zhao, Kaidi; Liu, Bing; Tirpark, Thomas M.; Weimin, Xiao. "A Visual Data Mining Framework for Convenient Identification of Useful Knowledge" (http://dl.acm.org/citation.cfm?id=1106390).
[35] Keim, Daniel A. "Information Visualization and Visual Data Mining" (http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.135.7051).
[36] Pachet, François; Westermann, Gert; Laigre, Damien (2001). "Musical Data Mining for Electronic Music Distribution" (http://www.csl.sony.fr/downloads/papers/2001/pachet01c.pdf). Proceedings of the 1st WedelMusic Conference, Firenze, Italy, pp. 101–106.
[37] Government Accountability Office (February 2007). Data Mining: Early Attention to Privacy in Developing a Key DHS Program Could Reduce Risks. GAO-07-293. Washington, DC.
[38] Secure Flight Program report (http://www.msnbc.msn.com/id/20604775/), MSNBC.
[39] "Total/Terrorism Information Awareness (TIA): Is It Truly Dead?" (http://w2.eff.org/Privacy/TIA/20031003_comments.php). Electronic Frontier Foundation (official website). 2003. Retrieved 2009-03-15.
[40] Agrawal, Rakesh; Mannila, Heikki; Srikant, Ramakrishnan; Toivonen, Hannu; Verkamo, A. Inkeri (1996). "Fast discovery of association rules". In Advances in Knowledge Discovery and Data Mining. MIT Press. pp. 307–328.
[41] National Research Council (2008). Protecting Individual Privacy in the Struggle Against Terrorists: A Framework for Program Assessment. Washington, DC: National Academies Press.
[42] Haag, Stephen; Cummings, Maeve; Phillips, Amy (2006). Management Information Systems for the Information Age. Toronto: McGraw-Hill Ryerson. p. 28. ISBN 0-07-095569-7. OCLC 63194770.
[43] Ghanem, Moustafa; Guo, Yike; Rowe, Anthony; Wendel, Patrick (2002). "Grid-based knowledge discovery services for high throughput informatics". Proceedings 11th IEEE International Symposium on High Performance Distributed Computing. pp. 416. doi:10.1109/HPDC.2002.1029946. ISBN 0-7695-1686-6.
[44] Ghanem, Moustafa; Curcin, Vasa; Wendel, Patrick; Guo, Yike (2009). "Building and Using Analytical Workflows in Discovery Net". Data Mining Techniques in Grid Computing Environments. pp. 119. doi:10.1002/9780470699904.ch8. ISBN 9780470699904.
[45] Cannataro, Mario; Talia, Domenico (January 2003). "The Knowledge Grid: An Architecture for Distributed Knowledge Discovery" (http://grid.deis.unical.it/papers/pdf/CACM2003.pdf). Communications of the ACM 46 (1): 89–93. doi:10.1145/602421.602425. Retrieved 17 October 2011.
[46] Talia, Domenico; Trunfio, Paolo (July 2010). "How distributed data mining tasks can thrive as knowledge services" (http://grid.deis.unical.it/papers/pdf/CACM2010.pdf). Communications of the ACM 53 (7): 132–137. doi:10.1145/1785414.1785451. Retrieved 17 October 2011.
[47] Rexer, Karl; Allen, Heather; Gearan, Paul (2010a). 2010 Data Miner Survey Summary (http://www.rexeranalytics.com/Data-Miner-Survey-Results-2010.html), presented at Predictive Analytics World, October 2010.
[48] Rexer, Karl; Allen, Heather; Gearan, Paul (2010b). 2010 Data Miner Survey – Overcoming Data Mining's Top Challenges (http://www.rexeranalytics.com/Overcoming_Challenges.html).
[49] Seltzer, William. The Promise and Pitfalls of Data Mining: Ethical Issues (http://www.amstat.org/committees/ethics/linksdir/Jsm2005Seltzer.pdf).
[50] Pitts, Chip (15 March 2007). "The End of Illegal Domestic Spying? Don't Count on It" (http://www.washingtonspectator.com/articles/20070315surveillance_1.cfm). Washington Spectator.
[51] Taipale, Kim A. (15 December 2003). "Data Mining and Domestic Security: Connecting the Dots to Make Sense of Data" (http://www.stlr.org/cite.cgi?volume=5&article=2). Columbia Science and Technology Law Review 5 (2). OCLC 45263753. SSRN 546782.
[52] Resig, John; Teredesai, Ankur (2004). "A Framework for Mining Instant Messaging Services" (http://citeseer.ist.psu.edu/resig04framework.html). Proceedings of the 2004 SIAM DM Conference.
[53] Think Before You Dig: Privacy Implications of Data Mining & Aggregation (http://www.nascio.org/publications/documents/NASCIO-dataMining.pdf), NASCIO Research Brief, September 2004.
[54] Biotech Business Week Editors (June 30, 2008). "BIOMEDICINE; HIPAA Privacy Rule Impedes Biomedical Research". Biotech Business Week. Retrieved 17 November 2009 from LexisNexis Academic.
[55] AOL search data identified individuals (http://www.securityfocus.com/brief/277), SecurityFocus, August 2006.
[56] Kobielus, James (1 July 2008). The Forrester Wave: Predictive Analytics and Data Mining Solutions, Q1 2010 (http://www.forrester.com/rb/Research/wave&trade;_predictive_analytics_and_data_mining_solutions,/q/id/56077/t/2). Forrester Research.
[57] Herschel, Gareth (1 July 2008). Magic Quadrant for Customer Data-Mining Applications (http://mediaproducts.gartner.com/reprints//vol5/article3/article3.html). Gartner Inc.
[58] Haughton, Dominique; Deichmann, Joel; Eshghi, Abdolreza; Sayek, Selin; Teebagy, Nicholas; Topi, Heikki (2003). "A Review of Software Packages for Data Mining" (http://www.jstor.org/pss/30037299). The American Statistician 57 (4): 290–309.
[59] Nisbet, Robert A. (January 2006). "Data Mining Tools: Which One is Best for CRM? Part 1" (http://www.information-management.com/specialreports/20060124/1046025-1.html). Information Management Special Reports.
[60] Mikut, Ralf; Reischl, Markus (September/October 2011). "Data Mining Tools" (http://onlinelibrary.wiley.com/doi/10.1002/widm.24/abstract). Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 1 (5): 431–445. doi:10.1002/widm.24. Retrieved October 21, 2011.

Further reading
• Cabena, Peter; Hadjnian, Pablo; Stadler, Rolf; Verhees, Jaap; and Zanasi, Alessandro (1997); Discovering Data Mining: From Concept to Implementation, Prentice Hall, ISBN 0-13-743980-6
• Feldman, Ronen; and Sanger, James; The Text Mining Handbook, Cambridge University Press, ISBN 978-0-521-83657-9
• Guo, Yike; and Grossman, Robert (editors) (1999); High Performance Data Mining: Scaling Algorithms, Applications and Systems, Kluwer Academic Publishers
• Hastie, Trevor; Tibshirani, Robert; and Friedman, Jerome (2001); The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, ISBN 0-387-95284-5
• Liu, Bing (2007); Web Data Mining: Exploring Hyperlinks, Contents and Usage Data, Springer, ISBN 3-540-37881-2
• Murphy, Chris (16 May 2011). "Is Data Mining Free Speech?". InformationWeek (UMB): 12.
• Nisbet, Robert; Elder, John; Miner, Gary (2009); Handbook of Statistical Analysis & Data Mining Applications, Academic Press/Elsevier, ISBN 978-0-12-374765-5
• Poncelet, Pascal; Masseglia, Florent; and Teisseire, Maguelonne (editors) (October 2007); "Data Mining Patterns: New Methods and Applications", Information Science Reference, ISBN 978-1-59904-162-9
• Tan, Pang-Ning; Steinbach, Michael; and Kumar, Vipin (2005); Introduction to Data Mining, ISBN 0-321-32136-7
• Theodoridis, Sergios; and Koutroumbas, Konstantinos (2009); Pattern Recognition, 4th Edition, Academic Press, ISBN 978-1-59749-272-0
• Weiss, Sholom M.; and Indurkhya, Nitin (1998); Predictive Data Mining, Morgan Kaufmann
• Witten, Ian H.; Frank, Eibe; Hall, Mark A. (30 January 2011). Data Mining: Practical Machine Learning Tools and Techniques (3 ed.). Elsevier. ISBN 978-0-12-374856-0. (See also Free Weka software)
• Ye, Nong (2003); The Handbook of Data Mining, Mahwah, NJ: Lawrence Erlbaum

External links

• Data Mining Software (http://www.dmoz.org/Computers/Software/Databases/Data_Mining/) at the Open Directory Project

Process mining

Process mining is a process management technique that allows for the analysis of business processes based on event logs. The basic idea is to extract knowledge from event logs recorded by an information system. Process mining aims to support such analysis by providing techniques and tools for discovering process, control, data, organizational, and social structures from event logs.[1]

Overview Process mining techniques are often used when no formal description of the process can be obtained by other approaches, or when the quality of existing documentation is questionable. For example, the audit trails of a workflow management system, the transaction logs of an enterprise resource planning system, and the electronic patient records in a hospital can be used to discover models describing processes, organizations, and products. Moreover, such event logs can also be compared with some prior model to see whether the observed reality conforms to a prescriptive or descriptive model. Contemporary management trends such as BAM (Business Activity Monitoring), BOM (Business Operations Management), and BPI (Business Process Intelligence) illustrate the interest in supporting the diagnosis functionality in the context of Business Process Management technology (e.g., Workflow Management Systems but also other process-aware information systems).

Application According to relevant sources,[2] process mining follows the options known from business process engineering and goes beyond them by feeding results back into business process modeling:
• process analysis filters, orders and compresses logfiles for further insight into how process operations are connected.
• process design may be supported by feedback from process monitoring, which basically means action or event logging.
• process enactment uses results from process mining, based on logging, to trigger further process operations.

Classification There are three classes of process mining techniques. This classification is based on whether there is a prior model and, if so, how it is used.
• Discovery: There is no a priori model; instead, a process model is constructed from the low-level events in an event log (a minimal sketch of this idea follows this list). For example, the alpha algorithm is a didactically motivated approach whose authors note its limited analytical capability for large volumes of event data.[3] There exist many techniques to automatically construct process models (e.g., in terms of a Petri net) based on some event log.[3][4][5][6][7] Recently, process mining research has also started to target other perspectives (e.g., data, resources, time, etc.). For example, the technique described in (Aalst, Reijers, & Song, 2005)[8] can be used to construct a social network.
• Conformance analysis: There is an a priori model. This model is compared with the event log, and discrepancies between the log and the model are analyzed. For example, there may be a process model indicating that purchase orders of more than 1 million euro require two checks. Another example is the checking of the so-called "four-eyes" principle. Conformance checking may be used to detect deviations or to enrich the model; an example is the extension of a process model with performance data, i.e., some a priori process model is used to project the bottlenecks on. Another example is the decision miner described in (Rozinat & Aalst, 2006b)[9], which takes an a priori process model and analyzes every choice in the process model. For each choice the event log is consulted to see which information is typically available the moment the choice is made. Then classical data mining techniques are used to see which data elements influence the choice. As a result, a decision tree is generated for each choice in the process.
• Extension: There is an a priori model as well. This model is extended with a new aspect or perspective, i.e., the goal is not to check conformance but to enrich the model. An example is the extension of a process model with performance data, i.e., some prior process model is dynamically annotated with performance data (e.g., bottlenecks are shown by coloring parts of the process model). See the book Process Mining: Discovery, Conformance and Enhancement of Business Processes by Wil van der Aalst for details.
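A minimal, library-free sketch of the discovery idea: derive the directly-follows relation, the raw material used by algorithms such as the alpha algorithm, from a toy event log. The log and activity names are assumptions; real toolkits such as ProM implement far more complete discovery algorithms.

```python
# Minimal illustration of process discovery: count the directly-follows
# relation between activities in a toy event log. Each trace lists the
# activities recorded for one case.
from collections import Counter

event_log = [
    ["register", "check credit", "approve", "ship"],
    ["register", "check credit", "reject"],
    ["register", "approve", "ship"],
]

directly_follows = Counter()
for trace in event_log:
    for a, b in zip(trace, trace[1:]):
        directly_follows[(a, b)] += 1

for (a, b), count in sorted(directly_follows.items()):
    print(f"{a} -> {b}: {count}")
```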

Software for process mining A software framework for the evaluation of process mining algorithms has been developed at the Eindhoven University of Technology by Wil van der Aalst and others, and is available as an open source toolkit.
• Process Mining[10]
• ProM Framework[11]
• ProM Import Framework[12]
Process mining functionality is also offered by the following commercial vendors:[13]
• Celonis Discovery,[14] a Process Mining and Business Intelligence platform for ERP systems like SAP or Oracle
• Celonis Orchestra,[15] a Process Mining and Business Intelligence platform for IT Service Management systems like BMC Remedy or HP Service Manager
• Futura Reflect,[16] a Process Mining and Process Intelligence suite developed by Futura Technology
• Interstage Automated Process Discovery,[17] a Process Mining service offered by Fujitsu, Ltd. as part of the Interstage Integration Middleware Suite.
• BPMone,[18] offering both basic process mining functionality as well as a more comprehensive process mining module as part of the Pallas Athena BPMone software suite.
• Nitro,[19] a tool by Fluxicon[20] for easily converting CSV and XLS event logs for ProM.[21]
• ARIS Process Performance Manager,[22] a Process Mining and Process Intelligence tool offered by Software AG as part of the Process Intelligence Solution.
• QPR ProcessAnalyzer,[23] a tool for Automated Business Process Discovery, and QPR ProcessAnalysis,[24] a service based on the aforementioned, offered by QPR Software Plc[25]
• Disco,[26] a complete process mining software by Fluxicon.[27]

Related links
• Business Process Discovery
• Business Process Management
• Workflow
• Scientific workflow system
• Sequence mining
• Data mining

References

[1] Process mining website (http://www.processmining.org). Accessed April 18th, 2011.
[2] Process Mining: Discovery, Conformance and Enhancement of Business Processes, Springer Verlag, Berlin (ISBN 978-3-642-19344-6). (http://springer.com/978-3-642-19344-6)
[3] Aalst, W. van der; Weijters, A.; Maruster, L. (2004). "Workflow Mining: Discovering Process Models from Event Logs". IEEE Transactions on Knowledge and Data Engineering 16 (9): 1128–1142.
[4] Agrawal, R.; Gunopulos, D.; Leymann, F. (1998). "Mining Process Models from Workflow Logs". In Sixth International Conference on Extending Database Technology. pp. 469–483.
[5] Cook, J.; Wolf, A. (1998). "Discovering Models of Software Processes from Event-Based Data". ACM Transactions on Software Engineering and Methodology 7 (3): 215–249.
[6] Datta, A. (1998). "Automating the Discovery of As-Is Business Process Models: Probabilistic and Algorithmic Approaches". Information Systems Research 9 (3): 275–301.
[7] Weijters, A.; Aalst, W. van der (2003). "Rediscovering Workflow Models from Event-Based Data using Little Thumb". Integrated Computer-Aided Engineering 10 (2): 151–162.
[8] Aalst, W. van der; Beer, H.; Dongen, B. van (2005). "Process Mining and Verification of Properties: An Approach based on Temporal Logic". In R. Meersman & Z. T. et al. (Eds.), On the Move to Meaningful Internet Systems 2005: CoopIS, DOA, and ODBASE: OTM Confederated International Conferences (Vol. 3760, pp. 130–147). Springer-Verlag, Berlin.
[9] Rozinat, A.; Aalst, W. van der (2006a). "Conformance Testing: Measuring the Fit and Appropriateness of Event Logs and Process Models". In C. Bussler et al. (Ed.), BPM 2005 Workshops (Workshop on Business Process Intelligence) (Vol. 3812, pp. 163–176). Springer-Verlag, Berlin.
[10] Process Mining (http://www.processmining.org)
[11] ProM Framework (http://prom.sourceforge.net)
[12] ProM Import Framework (http://promimport.sourceforge.net)
[13] Bloor Research: Automating Business Process Discovery (http://www.bloorresearch.com/analysis/11670/automating-business-process-discovery.html)
[14] Celonis Discovery (http://www.celonis.de/en/discovery.html)
[15] Celonis Orchestra (http://www.celonis.de/en/orchestra.html)
[16] Futura Reflect (http://www.futuratech.nl/site/index.php?option=com_content&view=article&id=72&Itemid=54)
[17] Interstage Automated Process Discovery (http://www.fujitsu.com/global/services/software/interstage/abpd)
[18] Pallas Athena Process Analysis (http://www.pallas-athena.com/business-process-management-software-products/process-mining.html)
[19] Nitro (http://fluxicon.com/nitro)
[20] Fluxicon (http://fluxicon.com)
[21] http://prom.sf.net
[22] (http://www.softwareag.com/corporate/products/aris_platform/aris_controlling/aris_process_performance/overview/default.asp)
[23] QPR ProcessAnalyzer (http://www.qpr.com/products/qpr-processanalyzer.htm)
[24] QPR ProcessAnalysis (http://www.qpr.com/services/qpr-process-analysis.htm)
[25] QPR Software Plc (http://www.qpr.com)
[26] Disco (http://fluxicon.com/disco)
[27] Fluxicon (http://fluxicon.com)

Further reading
• Aalst, W. van der (2011). Process Mining: Discovery, Conformance and Enhancement of Business Processes. Springer Verlag, Berlin (ISBN 978-3-642-19344-6).
• Aalst, W. van der, Dongen, B. van, Herbst, J., Maruster, L., Schimm, G., & Weijters, A. (2003). Workflow Mining: A Survey of Issues and Approaches. Data and Knowledge Engineering, 47 (2), 237–267.
• Aalst, W. van der, Reijers, H., & Song, M. (2005). Discovering Social Networks from Event Logs. Computer Supported Cooperative Work, 14 (6), 549–593.
• Jans, M., van der Werf, J.M., Lybaert, N., & Vanhoof, K. (2011). A business process mining application for internal transaction fraud mitigation. Expert Systems with Applications, 38 (10), 13351–13359.
• Dongen, B. van, Medeiros, A., Verbeek, H., Weijters, A., & Aalst, W. van der (2005). The ProM framework: A New Era in Process Mining Tool Support. In G. Ciardo & P. Darondeau (Eds.), Application and Theory of Petri Nets 2005 (Vol. 3536, pp. 444–454). Springer-Verlag, Berlin.
• Dumas, M., Aalst, W. van der, & Hofstede, A. ter (2005). Process-Aware Information Systems: Bridging People and Software through Process Technology. Wiley & Sons.
• Grigori, D., Casati, F., Castellanos, M., Dayal, U., Sayal, M., & Shan, M. (2004). Business Process Intelligence. Computers in Industry, 53 (3), 321–343.
• Grigori, D., Casati, F., Dayal, U., & Shan, M. (2001). Improving Business Process Quality through Exception Understanding, Prediction, and Prevention. In P. Apers, P. Atzeni, S. Ceri, S. Paraboschi, K. Ramamohanarao, & R. Snodgrass (Eds.), Proceedings of the 27th International Conference on Very Large Data Bases (VLDB'01) (pp. 159–168). Morgan Kaufmann.
• IDS Scheer (2002). ARIS Process Performance Manager (ARIS PPM): Measure, Analyze and Optimize Your Business Process Performance (whitepaper).
• Ingvaldsen, J.E., & Gulla, J.A. (2006). Model Based Business Process Mining. Journal of Information Systems Management, Vol. 23, No. 1, Special Issue on Business Intelligence, Auerbach Publications.
• zur Muehlen, M. (2004). Workflow-based Process Controlling: Foundation, Design and Application of workflow-driven Process Information Systems. Logos, Berlin.
• zur Muehlen, M., & Rosemann, M. (2000). Workflow-based Process Monitoring and Controlling – Technical and Organizational Issues. In R. Sprague (Ed.), Proceedings of the 33rd Hawaii International Conference on System Science (HICSS-33) (pp. 1–10). IEEE Computer Society Press, Los Alamitos, California.
• Rozinat, A., & Aalst, W. van der (2006b). Decision Mining in ProM. In S. Dustdar, J. Faideiro, & A. Sheth (Eds.), International Conference on Business Process Management (BPM 2006) (Vol. 4102, pp. 420–425). Springer-Verlag, Berlin.
• Sayal, M., Casati, F., Dayal, U., & Shan, M. (2002). Business Process Cockpit. In Proceedings of the 28th International Conference on Very Large Data Bases (VLDB'02) (pp. 880–883). Morgan Kaufmann.
• Huser, V., & Starren, J.B. (2009). EHR Data Pre-processing Facilitating Process Mining: an Application to Chronic Kidney Disease. AMIA Annu Symp Proc 2009. link (http://independent.academia.edu/VojtechHuser/Papers/990979/EHR_Data_Pre-processing_Facilitating_Process_Mining_an_Application_to_Chronic_Kidney_Disease)

External links

• Process mining research (http://www.processmining.org) at Eindhoven University of Technology, the Netherlands.
• Process mining research (http://processmining.ugent.be) at Ghent University, Belgium.

Complex event processing

Event processing is a method of tracking and analyzing (processing) streams of information (data) about things that happen (events),[1] and deriving a conclusion from them. Complex event processing, or CEP, is event processing that combines data from multiple sources[2] to infer events or patterns that suggest more complicated circumstances. The goal of complex event processing is to identify meaningful events (such as opportunities or threats)[3] and respond to them as quickly as possible. These events may be happening across the various layers of an organization as sales leads, orders or customer service calls. Or, they may be news items,[4] text, social media posts, stock market feeds, traffic reports, weather reports, or other kinds of data.[1] An event may also be defined as a "change of state," when a measurement exceeds a predefined threshold of time, temperature, or other value. Analysts suggest that CEP will give organizations a new way to analyze patterns in real-time, and help the business side communicate better with IT and service departments.[5] The vast amount of information available about events is sometimes referred to as the event cloud.[1]

Conceptual description
Among thousands of incoming events, a monitoring system may for instance receive the following three from the same source:
1. church bells ringing.
2. the appearance of a man in a tuxedo with a woman in a flowing white gown.
3. rice flying through the air.
From these events the monitoring system may infer a complex event: a wedding. CEP as a technique helps discover complex events by analyzing and correlating other events:[6] the bells, the man and woman in wedding attire and the rice flying through the air.
CEP relies on a number of techniques,[7] including:
• Event-pattern detection
• Event abstraction
• Modeling event hierarchies
• Detecting relationships (such as causality, membership or timing) between events
• Abstracting event-driven processes
Commercial applications of CEP include algorithmic stock-trading,[8] the detection of credit-card fraud, business activity monitoring, and security monitoring.[9]
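To make the inference step concrete, the following is a minimal Python sketch of detection-oriented pattern matching for the wedding example above; the event names and the set-based matching rule are illustrative assumptions, not the behavior of any particular CEP engine.

    # Minimal sketch: infer the complex event "wedding" once all three simple
    # events from the example have been observed from the same source.
    # Event names are illustrative assumptions.
    observed_events = {"church_bells_ringing", "couple_in_wedding_attire", "rice_flying"}
    wedding_pattern = {"church_bells_ringing", "couple_in_wedding_attire", "rice_flying"}

    if wedding_pattern <= observed_events:  # every constituent event was detected
        print("complex event inferred: wedding")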

Related concepts
CEP is used in Operational Intelligence (OI) solutions to provide insight into business operations by running query analysis against live feeds and event data. OI solutions use real-time data to collect and correlate against historical data to provide insight into and analysis of the current situation. Multiple sources of data can be combined from different organizational silos to provide a common operating picture that uses current information. Wherever real-time insight has the greatest value, OI solutions can be applied to deliver the information needed. In network management, systems management, application management and service management, people usually refer instead to event correlation. Like CEP engines, event correlation engines (event correlators) analyze a mass of events, pinpoint the most significant ones, and trigger actions. However, most of them do not produce new inferred events. Instead, they relate high-level events with low-level events.[10]

Inference engines, e.g. rule-based reasoning engines, typically produce inferred information in artificial intelligence. However, they do not usually produce new information in the form of complex (i.e., inferred) events.

Example
A more systemic example of CEP involves a car, some sensors and various events and reactions. Imagine that a car has several sensors—one that measures tire pressure, one that measures speed, and one that detects if someone sits on a seat or leaves a seat.
In the first situation, the car is moving and the pressure of one of the tires moves from 45 psi (pound per square inch) to 41 psi over 15 minutes. As the pressure in the tire is decreasing, a series of events containing the tire pressure is generated. In addition, a series of events containing the speed of the car is generated. The car's Event Processor may detect a situation whereby a loss of tire pressure over a relatively long period of time results in the creation of the "lossOfTirePressure" event. This new event may trigger a reaction process to note the pressure loss in the car's maintenance log, and alert the driver via the car's portal that the tire pressure has reduced.
In the second situation, the car is moving and the pressure of one of the tires drops from 45 psi to 20 psi in 5 seconds. A different situation is detected—perhaps because the loss of pressure occurred over a shorter period of time, or perhaps because the difference in values between each event was larger than a predefined limit. The different situation results in a new event "blowOutTire" being generated. This new event triggers a different reaction process to immediately alert the driver and to initiate onboard computer routines to assist the driver in bringing the car to a stop without losing control through skidding.
In addition, events that represent detected situations can also be combined with other events in order to detect more complex situations. For example, in the final situation the car was moving normally but suffers a blown tire which results in the car leaving the road, striking a tree, and the driver being thrown from the car. A series of different situations are rapidly detected. The combination of "blowOutTire", "zeroSpeed" and "driverLeftSeat" within a very short space of time results in a new situation being detected: "occupantThrownAccident". Even though there is no direct measurement that can determine conclusively that the driver was thrown, or that there was an accident, the combination of events allows the situation to be detected and a new event to be created to signify the detected situation. This is the essence of a complex (or composite) event. It is complex because one cannot directly detect the situation; one has to infer or deduce that the situation has occurred from a combination of other events.
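A minimal Python sketch of the tire-pressure scenario follows; the event names ("lossOfTirePressure", "blowOutTire"), thresholds and time windows are illustrative assumptions taken from the narrative above, not the behavior of any specific event processor.

    from dataclasses import dataclass

    @dataclass
    class PressureEvent:
        time_s: float  # seconds since monitoring started
        psi: float     # measured tire pressure

    def derive_situation(events):
        """Derive a complex event from a window of simple pressure readings."""
        if len(events) < 2:
            return None
        drop = events[0].psi - events[-1].psi
        span = events[-1].time_s - events[0].time_s
        if drop >= 20 and span <= 5:          # large drop within a few seconds
            return "blowOutTire"
        if drop >= 3 and span >= 15 * 60:     # gradual drop over many minutes
            return "lossOfTirePressure"
        return None

    slow_leak = [PressureEvent(t * 60, 45 - t * (4 / 15)) for t in range(16)]  # 45 -> 41 psi over 15 min
    blow_out = [PressureEvent(t, 45 - t * 5) for t in range(6)]                # 45 -> 20 psi in 5 s

    print(derive_situation(slow_leak))  # lossOfTirePressure
    print(derive_situation(blow_out))   # blowOutTire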

Types
Most CEP solutions and concepts can be classified into two main categories:
1. Computation-oriented CEP
2. Detection-oriented CEP
A computation-oriented CEP solution is focused on executing on-line algorithms as a response to event data entering the system. A simple example is to continuously calculate an average based on the data in the inbound events. Detection-oriented CEP is focused on detecting combinations of events, called event patterns or situations. A simple example of detecting a situation is to look for a specific sequence of events.
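The two styles can be illustrated with a short Python sketch; the window size, event values and sequence below are illustrative assumptions.

    from collections import deque

    # Computation-oriented: continuously maintain an average over inbound events.
    class RunningAverage:
        def __init__(self, window=3):
            self.values = deque(maxlen=window)

        def on_event(self, value):
            self.values.append(value)
            return sum(self.values) / len(self.values)

    avg = RunningAverage()
    print([avg.on_event(v) for v in (10, 20, 30, 40)])  # [10.0, 15.0, 20.0, 30.0]

    # Detection-oriented: report whether the events contain a given pattern in order.
    def contains_sequence(events, pattern):
        it = iter(events)
        return all(any(e == p for e in it) for p in pattern)

    print(contains_sequence(["login", "browse", "add_to_cart", "checkout"],
                            ["login", "checkout"]))  # True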

Integrating CEP with Business Process Management
Of course, rarely does the application of a new technology exist in isolation. A natural fit for CEP has been with business process management, or BPM.[11] BPM focuses on end-to-end business processes, in order to continuously optimize them and align them with the operational environment. However, the optimization of a business does not rely solely upon its individual, end-to-end processes. Seemingly disparate processes can affect each other significantly. Consider this scenario: In the aerospace industry, it is good practice to monitor breakdowns of vehicles to look for trends (determine potential weaknesses in manufacturing processes, material, etc.). Another separate process monitors current operational vehicles' life cycles and decommissions them when appropriate. One use for CEP is to link these separate processes, so that when the initial process (breakdown monitoring) discovers a malfunction based on metal fatigue (a significant event), an action can be created to exploit the second process (life cycle) to issue a recall on vehicles using the same batch of metal discovered as faulty in the initial process.
The integration of CEP and BPM must exist at two levels, both at the business awareness level (users must understand the potential holistic benefits of their individual processes) and also at the technological level (there needs to be a method by which CEP can interact with BPM implementation). Computation-oriented CEP's role can arguably be seen to overlap with Business Rule technology.
For example, customer service centers are using CEP for click-stream analysis and customer experience management. CEP software can factor real-time information about millions of events (clicks or other interactions) per second into business intelligence and other decision-support applications. These "recommendation applications" help agents provide personalized service based on each customer's experience. The CEP application may collect data about what customers on the phone are currently doing, or how they have recently interacted with the company in other various channels, including in-branch, or on the Web via self-service features, instant messaging and email. The application then analyzes the total customer experience and recommends scripts or next steps that guide the agent on the phone, and hopefully keep the customer happy.[12]
Another example of CEP in practice is in the healthcare industry. The HyReminder system, developed by the Worcester Polytechnic Institute and UMass Medical School, continually tracks healthcare workers for hygiene compliance (e.g. sanitizing hands and wearing masks), reminding them to perform hygiene when appropriate to prevent the spread of infectious disease. Each worker wears an RFID badge that displays a green (safe), yellow (warning) or red (violation) light, depending on what behavior the RFID chip has observed.[13]

CEP in the Financial Services Industry
The financial services industry was an early adopter of CEP technology, using complex event processing to structure and contextualize available data so that it could inform trading behavior, specifically algorithmic trading, by identifying opportunities or threats that indicate traders (or automatic trading systems) should buy or sell.[14] Algorithmic trading is a growing trend in competitive financial markets. CEP is expected to continue to help financial institutions improve their algorithms and be more efficient. Recent improvements in CEP technologies have made it more affordable, helping smaller firms to create trading algorithms of their own and compete with larger firms.[3] CEP has evolved from an emerging technology to an essential platform in many capital markets. The technology's most consistent growth has been in banking, serving fraud detection, online banking, and multichannel marketing initiatives.[15] Today, a wide variety of financial applications use CEP, including profit, loss, and risk management systems, order and liquidity analysis, quantitative trading and signal generation systems, and others.

Integrating CEP with Time Series Databases
A time series database is a software system that is optimized for the handling of data organized by time. Time series are finite or infinite sequences of data items, where each item has an associated timestamp and the sequence of timestamps is non-decreasing. Elements of a time series are often called ticks. The timestamps are not required to be strictly ascending (merely non-decreasing) because the time resolution of some systems, such as financial data sources, is finite (milliseconds, microseconds or even nanoseconds), so several consecutive events may carry equal timestamps. Time series data provides a historical context to the analysis typically associated with complex event processing. This can apply to any vertical industry such as finance,[16] and cooperatively with other technologies such as BPM as described elsewhere in this document.
Consider the scenario in finance where there is a need to understand historic price volatility to determine statistical thresholds of future price movements. This is helpful for both trade models and transaction cost analysis. The ideal case for CEP analysis is to view historical time series and real-time streaming data as a single time continuum. What happened yesterday, last week or last month is simply an extension of what is occurring today and what may occur in the future. An example may involve comparing current market volumes to historic volumes, prices and volatility for trade execution logic. Or the need to act upon live market prices may involve comparisons to benchmarks that include sector and index movements, whose intra-day and historic trends gauge volatility and smooth outliers.
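The "single time continuum" idea can be sketched in a few lines of Python: historical ticks and live ticks are treated as one non-decreasing stream (equal timestamps allowed) and live values are compared to a benchmark computed from history. The values, threshold and tick layout are illustrative assumptions.

    from itertools import chain

    historical = [(1, 100.0), (2, 100.5), (2, 100.5), (3, 101.0)]  # (timestamp, price); equal stamps allowed
    live_feed = [(4, 101.2), (5, 99.0)]

    hist_avg = sum(price for _, price in historical) / len(historical)

    # Treat history and the live feed as one continuous stream of ticks.
    for ts, price in chain(historical, live_feed):
        if abs(price - hist_avg) > 1.0:  # illustrative volatility threshold
            print(f"t={ts}: price {price} deviates from historical average {hist_avg:.2f}")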

Academic research
• Stream Mill Miner [17] (UC Los Angeles)
• Aurora [18] (Brandeis University, Brown University and MIT)
• Borealis [19] (Brandeis University, Brown University and MIT)
• Cayuga [20] (Cornell University)
• ETALIS [21] (Forschungszentrum Informatik Karlsruhe and Stony Brook University)
• Global Sensor Networks [22] (EPFL)
• NiagaraST [23] (Portland State University)
• PIPES [24] (University of Marburg)
• SASE [25] (UC Berkeley/UMass Amherst)
• STREAM [26] (Stanford University)
• Telegraph [27] (UC Berkeley)
• epZilla [28] (University of Moratuwa)
• ACAIA.org [29] Acoustic Event Detection (AED) (Acoustic Computing for AI/AmI Applications)

References

[1] Luckham, David C. (2012). Event Processing for Business: Organizing the Real-Time Enterprise (http://ee.stanford.edu/~luckham/). Hoboken, New Jersey: John Wiley & Sons, Inc. p. 3. ISBN 978-0-470-53485-4.
[2] Schmerken, Ivy (May 15, 2008), Deciphering the Myths Around Complex Event Processing (http://www.wallstreetandtech.com/data-latency/207800335), Wall Street & Technology.
[3] Bates, John, John Bates of Progress explains how complex event processing works and how it can simplify the use of algorithms for finding and capturing trading opportunities (http://fixglobal.com/category/free-tag/aite-group), Fix Global Trading, retrieved May 14, 2012.
[4] Crosman, Penny (May 18, 2009), Aleri, Ravenpack to Feed News into Trading Algos (http://www.wallstreetandtech.com/articles/217500395), Wall Street & Technology.
[5] McKay, Lauren (August 13, 2009), Forrester Gives a Welcoming Wave to Complex Event Processing (http://www.destinationcrm.com/Articles/CRM-News/Daily-News/Forrester-Gives-a-Welcoming-Wave-to-Complex-Event-Processing-55492.aspx), Destination CRM.
[6] D. Luckham, "The Power of Events: An Introduction to Complex Event Processing in Distributed Enterprise Systems", Addison-Wesley, 2002.
[7] O. Etzion and P. Niblett, "Event Processing in Action", Manning Publications, 2010.
[8] Complex Event Processing for Trading (http://fixglobal.com/content/secrets-revealed-trading-tools-uncover-hidden-opportunities), FIXGlobal, June 2011.
[9] Details of commercial products and use cases (http://www.complexevents.com).
[10] J.P. Martin-Flatin, G. Jakobson and L. Lewis, "Event Correlation in Integrated Management: Lessons Learned and Outlook", Journal of Network and Systems Management, Vol. 17, No. 4, December 2007.
[11] C. Janiesch, M. Matzner and O. Müller: "A Blueprint for Event-Driven Business Activity Management", Lecture Notes in Computer Science, 2011, Volume 6896/2011, 17–28, doi:10.1007/978-3-642-23059-2_4.
[12] Kobielus, James (September 2008), Really Happy in Real Time (http://www.destinationcrm.com/Articles/Columns-Departments/Connect/Really-Happy-in-Real-Time-50530.aspx), Destination CRM.
[13] Wang, Di; Rundensteiner, Elke A.; Ellison III, Richard T. (September 2011), Active Complex Event Processing over Event Streams (http://www.vldb.org/pvldb/vol4/p634-wang.pdf), Proceedings of the VLDB Endowment, Seattle, Washington.
[14] The Rise of Unstructured Data in Trading (http://www.aitegroup.com/Reports/ReportDetail.aspx?recordItemID=489), Aite Group, October 29, 2008.
[15] Complex Event Processing: Beyond Capital Markets (http://www.aitegroup.com/Reports/ReportDetail.aspx?recordItemID=870), Aite Group, November 16, 2011.
[16] "Time Series in Finance" (http://cs.nyu.edu/shasha/papers/jagtalk.html). Retrieved May 16, 2012.
[17] http://yellowstone.cs.ucla.edu/projects/index.php/SMM
[18] http://www.cs.brown.edu/research/aurora
[19] http://www.cs.brown.edu/research/borealis/public/
[20] http://www.cs.cornell.edu/database/cayuga
[21] http://code.google.com/p/etalis
[22] http://sourceforge.net/projects/gsn/
[23] http://datalab.cs.pdx.edu/niagara/
[24] http://dbs.mathematik.uni-marburg.de/Home/Research/Projects/PIPES
[25] http://sase.cs.umass.edu/
[26] http://www-db.stanford.edu/stream
[27] http://telegraph.cs.berkeley.edu/
[28] http://epzilla.org/
[29] http://ACAIA.org/

External links

• Separating the Wheat from the Chaff (http://www.rfidjournal.com/article/view/1196) – article about CEP as applied to RFID, appeared in RFID Journal
• The Event Processing Technical Society (http://www.ep-ts.com)
• Complex event processing: still on the launch pad (http://www.computerworld.com.au/index.php/id;918976660;fp;4;fpid;1398720840) – in Computerworld
• Processing Flows of Information: From Data Stream to Complex Event Processing (http://home.dei.polimi.it/margara/papers/survey.pdf) – survey article on data stream and complex event processing systems
• S4 (http://s4.io/) – distributed event processing framework from Yahoo

Business performance management

Business performance management is a set of management and analytic processes that enable the management of an organization's performance to achieve one or more pre-selected goals. Synonyms for "business performance management" include "corporate performance management (CPM)"[1] and "enterprise performance management".[2][3] Business performance management is contained within approaches to business process management.[4]
Business performance management has three main activities:
1. selection of goals,
2. consolidation of measurement information relevant to an organization's progress against these goals, and
3. interventions made by managers in light of this information with a view to improving future performance against these goals.
Although presented here sequentially, typically all three activities will run concurrently, with interventions by managers affecting the choice of goals, the measurement information monitored, and the activities being undertaken by the organization.
Because business performance management activities in large organizations often involve the collation and reporting of large volumes of data, many software vendors, particularly those offering business intelligence tools, market products intended to assist in this process. As a result of this marketing effort, business performance management is often incorrectly understood as an activity that necessarily relies on software systems to work, and many definitions of business performance management explicitly suggest software as being a definitive component of the approach.[5] This interest in business performance management from the software community is sales-driven: "The biggest growth area in operational BI analysis is in the area of business performance management."[6]
Since 1992, business performance management has been strongly influenced by the rise of the balanced scorecard framework. It is common for managers to use the balanced scorecard framework to clarify the goals of an organization, to identify how to track them, and to structure the mechanisms by which interventions will be triggered. These steps are the same as those found in BPM, and as a result the balanced scorecard is often used as the basis for business performance management activity within organizations.
In the past, owners have sought to drive strategy down and across their organizations, transform these strategies into actionable metrics and use analytics to expose the cause-and-effect relationships that, if understood, could give insight into decision-making.

History
Reference to non-business performance management occurs in Sun Tzu's The Art of War. Sun Tzu claims that to succeed in war, one should have full knowledge of one's own strengths and weaknesses as well as those of one's enemies. Lack of either set of knowledge might result in defeat. Parallels between the challenges in business and those of war include:
• collecting data - both internal and external
• discerning patterns and meaning in the data (analyzing)
• responding to the resultant information
Prior to the start of the Information Age in the late 20th century, businesses sometimes took the trouble to laboriously collect data from non-automated sources. As they lacked computing resources to properly analyze the data, they often made commercial decisions primarily on the basis of intuition.
As businesses started automating more and more systems, more and more data became available. However, collection often remained a challenge due to a lack of infrastructure for data exchange or due to incompatibilities between systems. Reports on the data gathered sometimes took months to generate. Such reports allowed informed long-term strategic decision-making. However, short-term tactical decision-making often continued to rely on intuition.
In 1989 Howard Dresner, a research analyst at Gartner, popularized "business intelligence" (BI) as an umbrella term to describe a set of concepts and methods to improve business decision-making by using fact-based support systems. Performance management builds on a foundation of BI, but marries it to the planning-and-control cycle of the enterprise - with enterprise planning, consolidation and modeling capabilities.
Increasing standards, automation, and technologies have led to vast amounts of data becoming available. Data warehouse technologies have allowed the building of repositories to store this data. Improved ETL and enterprise application integration tools have increased the timely collecting of data. OLAP reporting technologies have allowed faster generation of new reports which analyze the data. As of 2010, business intelligence has become the art of sieving through large amounts of data, extracting useful information and turning that information into actionable knowledge.

Definition and scope Business performance management consists of a set of management and analytic processes, supported by technology, that enable businesses to define strategic goals and then measure and manage performance against those goals. Core business performance management processes include financial planning, operational planning, business modeling, consolidation and reporting, analysis, and monitoring of key performance indicators linked to strategy. Business performance management involves consolidation of data from various sources, querying, and analysis of the data, and putting the results into practice.

Methodologies
Various methodologies for implementing business performance management exist. The discipline gives companies a top-down framework by which to align planning and execution, strategy and tactics, and business-unit and enterprise objectives. These methodologies include the Six Sigma strategy, balanced scorecard, activity-based costing (ABC), Total Quality Management, economic value-add, integrated strategic measurement and Theory of Constraints. The balanced scorecard is the most widely adopted performance management methodology. Methodologies on their own cannot deliver a full solution to an enterprise's CPM needs. Many pure-methodology implementations fail to deliver the anticipated benefits due to lack of integration with fundamental CPM processes.

Metrics and key performance indicators
Some of the areas from which bank management may gain knowledge by using business performance management include:
• customer-related numbers:
• new customers acquired
• status of existing customers
• attrition of customers (including breakup by reason for attrition)
• turnover generated by segments of the customers - possibly using demographic filters
• outstanding balances held by segments of customers and terms of payment - possibly using demographic filters
• collection of bad debts within customer relationships
• demographic analysis of individuals (potential customers) applying to become customers, and the levels of approval, rejections and pending numbers
• delinquency analysis of customers behind on payments
• profitability of customers by demographic segments and segmentation of customers by profitability
• campaign management
• real-time dashboard on key operational metrics
• overall equipment effectiveness
• clickstream analysis on a website
• key product portfolio trackers
• marketing-channel analysis
• sales-data analysis by product segments
• callcenter metrics
Though the above list describes what a bank might monitor, it could refer to a telephone company or to a similar service-sector company. Items of generic importance include:
1. consistent and correct KPI-related data providing insights into operational aspects of a company
2. timely availability of KPI-related data
3. KPIs designed to directly reflect the efficiency and effectiveness of a business
4. information presented in a format which aids decision-making for management and decision-makers
5. ability to discern patterns or trends from organized information
Business performance management integrates the company's processes with CRM or ERP. Companies should become better able to gauge customer satisfaction, control customer trends and influence shareholder value.
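As a simple illustration of the "measure and intervene" loop behind such KPI lists, the Python sketch below checks a few hypothetical KPIs against targets; the metric names, figures and targets are invented for the example and would in practice come from the reporting layer.

    # Hypothetical KPI figures and targets; real values would come from the
    # organization's data warehouse or reporting systems.
    kpis = {
        "new_customers_acquired": {"actual": 420, "target": 500, "higher_is_better": True},
        "customer_attrition_rate": {"actual": 0.06, "target": 0.05, "higher_is_better": False},
        "bad_debt_collection_rate": {"actual": 0.92, "target": 0.90, "higher_is_better": True},
    }

    for name, k in kpis.items():
        meets = k["actual"] >= k["target"] if k["higher_is_better"] else k["actual"] <= k["target"]
        print(f"{name}: {'on track' if meets else 'needs intervention'}")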

Application software types
People working in business intelligence have developed tools that ease the work of business performance management, especially when the business-intelligence task involves gathering and analyzing large amounts of unstructured data. Tool categories commonly used for business performance management include:
• OLAP — online analytical processing, sometimes simply called "analytics" (based on dimensional analysis and the so-called "hypercube" or "cube")
• scorecarding, dashboarding and data visualization
• data warehouses
• document warehouses
• text mining
• DM — data mining
• BPO — business performance optimization
• EPM — enterprise performance management
• EIS — executive information systems
• DSS — decision support systems
• MIS — management information systems
• SEMS — strategic enterprise management software

Design and implementation
Questions asked when implementing a business performance management program include:
• Goal-alignment queries: Determine the short- and medium-term purpose of the program. What strategic goal(s) of the organization will the program address? What organizational mission/vision does it relate to? A hypothesis needs to be crafted that details how this initiative will eventually improve results / performance (i.e. a strategy map).
• Baseline queries: Assess current information-gathering competency. Does the organization have the capability to monitor important sources of information? What data is being collected and how is it being stored? What are the statistical parameters of this data, e.g., how much random variation does it contain? Is this being measured?
• Cost and risk queries: Estimate the financial consequences of a new BI initiative. Assess the cost of the present operations and the increase in costs associated with the BPM initiative. What is the risk that the initiative will fail? This risk assessment should be converted into a financial metric and included in the planning.
• Customer and stakeholder queries: Determine who will benefit from the initiative and who will pay. Who has a stake in the current procedure? What kinds of customers / stakeholders will benefit directly from this initiative? Who will benefit indirectly? What quantitative / qualitative benefits follow? Is the specified initiative the best or only way to increase satisfaction for all kinds of customers? How will customer benefits be monitored? What about employees, shareholders, and distribution channel members?
• Metrics-related queries: Information requirements need operationalization into clearly defined metrics. Decide which metrics to use for each piece of information being gathered. Are these the best metrics and why? How many metrics need to be tracked? If this is a large number (it usually is), what kind of system can track them? Are the metrics standardized, so they can be benchmarked against performance in other organizations? What are the industry standard metrics available?
• Measurement methodology-related queries: Establish a methodology or a procedure to determine the best (or acceptable) way of measuring the required metrics. How frequently will data be collected? Are there any industry standards for this? Is this the best way to do the measurements? How do we know that?
• Results-related queries: Monitor the BPM program to ensure that it meets objectives. The program itself may require adjusting. The program should be tested for accuracy, reliability, and validity. How can it be demonstrated that the BI initiative, and not something else, contributed to a change in results? How much of the change was probably random?

References [1] "Introducing the CPM Suites Magic Quadrant", Lee Geishecker and Frank Buytendijk, 2 October 2002, www.gartner.com, M-17-4718

[2] Frolick, Mark N.; Thilini R. Ariyachandra (Winter 2006). "Business performance management: one truth" (http:/ / snyfarvu. farmingdale.

edu/ ~schoensr/ bpm. pdf) (PDF). Information Systems Management (www.ism-journal.com): 41–48. . Retrieved 2010-02-21. "Business Performance Management (BPM) [...] is also known and identified by other names, such as corporate performance management and enterprise performance management."

[3] "Technology-enabled Business Performance Management: Concept, Framework, and Technology" (http:/ / www. mbaforum. ir/ download/

335_Full_sanamojdeh. pdf). 3rd International Management Conference. 2005-12-20. p. 2. . Retrieved 2010-02-21. "Confusion also arises because industry experts can not agree what to call BPM, let alone how to define it, META Group and IDC use the term 'Business Performance Management', Gartner Group prefers 'Corporate Performance Management', and others favor 'Enterprise Performance Management'." [4] vom Brocke, J. & Rosemann, M. (2010), Handbook on Business Process Management: Strategic Alignment, Governance, People and Culture

(http:/ / www. bpm-handbook. com) (International Handbooks on Information Systems). Berlin: Springer

[5] BPM Mag, What is BPM? (http:/ / www. bpmmag. net/ magazine/ article. html?articleID=13966)

[6] The Next Generation of Business Intelligence: Operational BI (http:/ / www. information-management. com/ issues/ 20050501/ 1026064-1.

html) White, Colin (May 2005). "The Next Generation of Business Intelligence: Operational BI" (http:/ / www. information-management.

com/ issues/ 20050501/ 1026064-1. html?zkPrintable=true). Information Management Magazine. . Retrieved 2010-02-21. "The biggest growth area in operational BI analysis is in the area of business performance management (BPM)."

Further reading
• Mosimann, Roland P., Patrick Mosimann and Meg Dussault, The Performance Manager. 2007 ISBN 978-0-9730124-1-5
• Buytendijk, Frank, Performance Leadership. McGraw-Hill, 2008 ISBN 0071599649
• Dresner, Howard, The Performance Management Revolution: Business Results Through Insight and Action. 2007 ISBN 978-0-470-12483-3
• Dresner, Howard, Profiles in Performance: Business Intelligence Journeys and the Roadmap for Change. 2009 ISBN 978-0-470-40886-5
• Cokins, Gary, Performance Management: Integrating Strategy Execution, Methodologies, Risk, and Analytics. 2009 ISBN 978-0-470-44998-1
• Cokins, Gary, Performance Management: Finding the Missing Pieces (to Close the Intelligence Gap). 2004 ISBN 978-0-471-57690-7
• Paladino, Bob, Five Key Principles of Corporate Performance Management. 2007 ISBN 978-0-470-00991-8
• Wade, David and Ronald Recardo, Corporate Performance Management. Butterworth-Heinemann, 2001 ISBN 0-87719-386-X
• Coveney, Michael, Strategy to the Max. FSN Publishing Limited, 2010 ISBN 978-0-9555900-6-1

External links

• "Giving the Boss the Big Picture: A dashboard pulls up everything the CEO needs to run the show" (http:/ / www.

businessweek. com/ magazine/ content/ 06_07/ b3971083. htm). BusinessWeek. Bloomberg L.P.. 2006-02-13. Retrieved 2010-02-22.

• Business Finance: Bred Tough: The Best-of-Breed, 2009 (http:/ / businessfinancemag. com/ article/ bred-tough-best-breed-2009-0729) (July 2009) Benchmarking 42 Benchmarking

Benchmarking is the process of comparing one's business processes and performance metrics to industry bests or best practices from other industries. Dimensions typically measured are quality, time and cost. In the process of benchmarking, management identifies the best firms in their industry, or in another industry where similar processes exist, and compares the results and processes of those studied (the "targets") to its own results and processes. In this way, managers learn how well the targets perform and, more importantly, the business processes that explain why these firms are successful.
The term benchmarking was first used by cobblers to measure people's feet for shoes. They would place someone's foot on a "bench" and mark it out to make the pattern for the shoes. Benchmarking is used to measure performance using a specific indicator (cost per unit of measure, productivity per unit of measure, cycle time of x per unit of measure or defects per unit of measure) resulting in a metric of performance that is then compared to others.
Also referred to as "best practice benchmarking" or "process benchmarking", this process is used in management and particularly strategic management, in which organizations evaluate various aspects of their processes in relation to best practice companies' processes, usually within a peer group defined for the purposes of comparison. This then allows organizations to develop plans on how to make improvements or adapt specific best practices, usually with the aim of increasing some aspect of performance. Benchmarking may be a one-off event, but is often treated as a continuous process in which organizations continually seek to improve their practices.

Benefits and use
In 2008, a comprehensive survey[1] on benchmarking was commissioned by The Global Benchmarking Network, a network of benchmarking centers representing 22 countries. Over 450 organizations responded from over 40 countries. The results showed that:
1. Mission and Vision Statements and Customer (Client) Surveys are the most used of 20 improvement tools (by 77% of organizations), followed by SWOT analysis (72%) and Informal Benchmarking (68%). Performance Benchmarking was used by 49% and Best Practice Benchmarking by 39%.
2. The tools that are likely to increase in popularity the most over the next three years are Performance Benchmarking, Informal Benchmarking, SWOT, and Best Practice Benchmarking. Over 60% of organizations that are not currently using these tools indicated they are likely to use them in the next three years.

Collaborative benchmarking
Benchmarking, originally described by Rank Xerox, is usually carried out by individual companies. Sometimes it may be carried out collaboratively by groups of companies (e.g. subsidiaries of a multinational in different countries). One example is that of the Dutch municipally-owned water supply companies, which have carried out a voluntary collaborative benchmarking process since 1997 through their industry association. Another example is the UK construction industry, which has carried out benchmarking since the late 1990s, again through its industry association and with financial support from the UK Government.

Procedure
There is no single benchmarking process that has been universally adopted. The wide appeal and acceptance of benchmarking has led to the emergence of benchmarking methodologies. One seminal book is Boxwell's Benchmarking for Competitive Advantage (1994).[2] The first book on benchmarking, written and published by Kaiser Associates,[3] is a practical guide and offers a seven-step approach. Robert Camp (who wrote one of the earliest books on benchmarking in 1989)[4] developed a 12-stage approach to benchmarking.

The 12 stage methodology consists of:
1. Select subject
2. Define the process
3. Identify potential partners
4. Identify data sources
5. Collect data and select partners
6. Determine the gap
7. Establish process differences
8. Target future performance
9. Communicate
10. Adjust goal
11. Implement
12. Review and recalibrate
The following is an example of a typical benchmarking methodology:
• Identify problem areas: Because benchmarking can be applied to any business process or function, a range of research techniques may be required. They include informal conversations with customers, employees, or suppliers; exploratory research techniques such as focus groups; or in-depth marketing research, quantitative research, surveys, questionnaires, re-engineering analysis, process mapping, quality control variance reports, financial ratio analysis, or simply reviewing cycle times or other performance indicators. Before embarking on comparison with other organizations it is essential to know the organization's function and processes; base lining performance provides a point against which improvement effort can be measured.
• Identify other industries that have similar processes: For instance, if one were interested in improving hand-offs in addiction treatment one would identify other fields that also have hand-off challenges. These could include air traffic control, cell phone switching between towers, transfer of patients from surgery to recovery rooms.
• Identify organizations that are leaders in these areas: Look for the very best in any industry and in any country. Consult customers, suppliers, financial analysts, trade associations, and magazines to determine which companies are worthy of study.
• Survey companies for measures and practices: Companies target specific business processes using detailed surveys of measures and practices used to identify business process alternatives and leading companies. Surveys are typically masked to protect confidential data by neutral associations and consultants.
• Visit the "best practice" companies to identify leading edge practices: Companies typically agree to mutually exchange information beneficial to all parties in a benchmarking group and share the results within the group.
• Implement new and improved business practices: Take the leading edge practices and develop implementation plans which include identification of specific opportunities, funding the project and selling the ideas to the organization for the purpose of gaining demonstrated value from the process.

Costs
The three main types of costs in benchmarking are:
• Visit Costs - This includes hotel rooms, travel costs, meals, a token gift, and lost labor time.
• Time Costs - Members of the benchmarking team will be investing time in researching problems, finding exceptional companies to study, visits, and implementation. This will take them away from their regular tasks for part of each day so additional staff might be required.
• Benchmarking Database Costs - Organizations that institutionalize benchmarking into their daily procedures find it is useful to create and maintain a database of best practices and the companies associated with each best practice.

The cost of benchmarking can substantially be reduced through utilizing the many internet resources that have sprung up over the last few years. These aim to capture benchmarks and best practices from organizations, business sectors and countries to make the benchmarking process much quicker and cheaper.

Technical/product benchmarking The technique initially used to compare existing corporate strategies with a view to achieving the best possible performance in new situations (see above), has recently been extended to the comparison of technical products. This process is usually referred to as "technical benchmarking" or "product benchmarking". Its use is well-developed within the automotive industry ("automotive benchmarking"), where it is vital to design products that match precise user expectations, at minimal cost, by applying the best technologies available worldwide. Data is obtained by fully disassembling existing cars and their systems. Such analyses were initially carried out in-house by car makers and their suppliers. However, as these analyses are expensive, they are increasingly being outsourced to companies who specialize in this area. Outsourcing has enabled a drastic decrease in costs for each company (by cost sharing) and the development of efficient tools (standards, software).

Types
• Process benchmarking - the initiating firm focuses its observation and investigation of business processes with a goal of identifying and observing the best practices from one or more benchmark firms. Activity analysis will be required where the objective is to benchmark cost and efficiency; increasingly applied to back-office processes where outsourcing may be a consideration.
• Financial benchmarking - performing a financial analysis and comparing the results in an effort to assess your overall competitiveness and productivity.
• Benchmarking from an investor perspective - extending the benchmarking universe to also compare to peer companies that can be considered alternative investment opportunities from the perspective of an investor.
• Performance benchmarking - allows the initiator firm to assess their competitive position by comparing products and services with those of target firms.
• Product benchmarking - the process of designing new products or upgrades to current ones. This process can sometimes involve reverse engineering, which is taking apart competitors' products to find strengths and weaknesses.
• Strategic benchmarking - involves observing how others compete. This type is usually not industry specific, meaning it is best to look at other industries.
• Functional benchmarking - a company will focus its benchmarking on a single function to improve the operation of that particular function. Complex functions such as Human Resources, Finance and Accounting and Information and Communication Technology are unlikely to be directly comparable in cost and efficiency terms and may need to be disaggregated into processes to make valid comparison.
• Best-in-class benchmarking - involves studying the leading competitor or the company that best carries out a specific function.
• Operational benchmarking - embraces everything from staffing and productivity to office flow and analysis of procedures performed.[5]
• Energy benchmarking - the process of collecting, analysing and relating energy performance data of comparable activities with the purpose of evaluating and comparing performance between or within entities.[6] Entities can include processes, buildings or companies. Benchmarking may be internal between entities within a single organization, or - subject to confidentiality restrictions - external between competing entities.

Tools
Benchmarking software can be used to organize large and complex amounts of information. Software packages can extend the concept of benchmarking and competitive analysis by allowing individuals to handle such large and complex amounts of information or strategies. Such tools support different types of benchmarking (see above) and can reduce the above costs significantly.

Metric benchmarking
Another approach to making comparisons involves using more aggregative cost or production information to identify strong and weak performing units. The two most common forms of quantitative analysis used in metric benchmarking are data envelopment analysis (DEA) and regression analysis. DEA estimates the cost level an efficient firm should be able to achieve in a particular market. In infrastructure regulation, DEA can be used to reward companies/operators whose costs are near the efficient frontier with additional profits. Regression analysis estimates what the average firm should be able to achieve. With regression analysis, firms that performed better than average can be rewarded while firms that performed worse than average can be penalized. Such benchmarking studies are used to create yardstick comparisons, allowing outsiders to evaluate the performance of operators in an industry. Advanced statistical techniques, including stochastic frontier analysis, have been used to identify strong and weak performers in industries, including applications to schools, hospitals, water utilities, and electric utilities.[7]
One of the biggest challenges for metric benchmarking is the variety of metric definitions used among companies or divisions. Definitions may change over time within the same organization due to changes in leadership and priorities. The most useful comparisons can be made when metric definitions are common between compared units and do not change, so improvements can be verified.
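As a rough illustration of the regression-analysis approach (not DEA, and not any specific regulatory study), the Python sketch below fits a simple cost-versus-output line across a few hypothetical units and flags those whose actual cost sits above the predicted average; the data and unit names are invented for the example.

    def fit_line(xs, ys):
        """Ordinary least squares for a single explanatory variable."""
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
        return slope, my - slope * mx

    # Hypothetical units: name -> (output, cost)
    units = {"A": (100, 210), "B": (200, 390), "C": (150, 340)}
    xs, ys = zip(*units.values())
    slope, intercept = fit_line(xs, ys)

    for name, (output, cost) in units.items():
        expected = slope * output + intercept
        verdict = "above predicted average cost" if cost > expected else "at or below predicted average cost"
        print(f"{name}: actual {cost}, predicted {expected:.1f} -> {verdict}")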

References

[1] http://www.globalbenchmarking.org/news-items/findings_global_survey_on_business_imporvement_and_benchmarking.pdf
[2] Boxwell, Robert J Jr (1994). Benchmarking for Competitive Advantage (http://www.amazon.com/Benchmarking-Competitive-Advantage-Robert-Boxwell/dp/0070068992). New York: McGraw-Hill. pp. 225. ISBN 0-07-006899-2.
[3] Beating the Competition: A Practical Guide to Benchmarking (http://www.kaiserassociates.com/cs/first_book_on_benchmarking). Washington, DC: Kaiser Associates. 1988. pp. 176. ISBN 978-1-56365-018-5.
[4] Camp, R. (1989). The Search for Industry Best Practices that Lead to Superior Performance. Productivity Press.
[5] Benchmarking: How to Make the Best Decisions for Your Practice (http://www.nuesoft.com/news-events/medical-practice-benchmarking.html)
[6] prEN 16231:2011 Energy Efficiency Benchmarking Methodology, Brussels: CEN, 2011, p. 5 (Definition 3.2)
[7] Body of Knowledge on Infrastructure Regulation (http://www.regulationbodyofknowledge.org/04/narrative/2/), "Incentive Regulation: Basic Forms of Regulation"

Text mining

Text mining, sometimes alternately referred to as text data mining, roughly equivalent to text analytics, refers to the process of deriving high-quality information from text. High-quality information is typically derived through the devising of patterns and trends through means such as statistical pattern learning. Text mining usually involves the process of structuring the input text (usually parsing, along with the addition of some derived linguistic features and the removal of others, and subsequent insertion into a database), deriving patterns within the structured data, and finally evaluation and interpretation of the output. 'High quality' in text mining usually refers to some combination of relevance, novelty, and interestingness. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity relation modeling (i.e., learning relations between named entities). Text analysis involves information retrieval, lexical analysis to study word frequency distributions, pattern recognition, tagging/annotation, information extraction, data mining techniques including link and association analysis, visualization, and predictive analytics. The overarching goal is, essentially, to turn text into data for analysis via application of natural language processing (NLP) and analytical methods. A typical application is to scan a set of documents written in a natural language and either model the document set for predictive classification purposes or populate a database or search index with the information extracted.
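As a toy illustration of "structuring the input text and deriving patterns" (not a description of any particular product), the Python sketch below turns two short documents into word counts and surfaces the most frequent content words; the stop-word list and documents are invented for the example.

    import re
    from collections import Counter

    docs = [
        "Text mining derives high-quality information from text.",
        "Structured text data supports pattern discovery and evaluation of the results.",
    ]
    stop_words = {"the", "of", "and", "from"}

    # "Structuring": tokenize each document into lowercase words, dropping stop words.
    counts = Counter(
        word
        for doc in docs
        for word in re.findall(r"[a-z-]+", doc.lower())
        if word not in stop_words
    )

    # A very simple "pattern": the most frequent remaining terms.
    print(counts.most_common(3))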

Text mining and text analytics
The term text analytics describes a set of linguistic, statistical, and machine learning techniques that model and structure the information content of textual sources for business intelligence, exploratory data analysis, research, or investigation.[1] The term is roughly synonymous with text mining; indeed, Ronen Feldman modified a 2000 description of "text mining"[2] in 2004 to describe "text analytics."[3] The latter term is now used more frequently in business settings while "text mining" is used in some of the earliest application areas, dating to the 1980s,[4] notably life-sciences research and government intelligence.
The term text analytics also describes the application of text analytics to respond to business problems, whether independently or in conjunction with query and analysis of fielded, numerical data. It is a truism that 80 percent of business-relevant information originates in unstructured form, primarily text.[5] These techniques and processes discover and present knowledge – facts, business rules, and relationships – that is otherwise locked in textual form, impenetrable to automated processing.

History
Labor-intensive manual text mining approaches first surfaced in the mid-1980s, but technological advances have enabled the field to advance during the past decade. Text mining is an interdisciplinary field that draws on information retrieval, data mining, machine learning, statistics, and computational linguistics. As most information (common estimates say over 80%)[5] is currently stored as text, text mining is believed to have a high commercial potential value. Increasing interest is being paid to multilingual data mining: the ability to gain information across languages and cluster similar items from different linguistic sources according to their meaning.
The challenge of exploiting the large proportion of enterprise information that originates in "unstructured" form has been recognized for decades.[6] It is recognized in the earliest definition of business intelligence (BI), in an October 1958 IBM Journal article by H.P. Luhn, A Business Intelligence System, which describes a system that will: "...utilize data-processing machines for auto-abstracting and auto-encoding of documents and for creating interest profiles for each of the 'action points' in an organization. Both incoming and internally generated documents are automatically abstracted, characterized by a word pattern, and sent automatically to appropriate action points."

Yet as management information systems developed starting in the 1960s, and as BI emerged in the '80s and '90s as a software category and field of practice, the emphasis was on numerical data stored in relational databases. This is not surprising: text in "unstructured" documents is hard to process. The emergence of text analytics in its current form stems from a refocusing of research in the late 1990s from algorithm development to application, as described by Prof. Marti A. Hearst in the paper Untangling Text Data Mining:[7] For almost a decade the computational linguistics community has viewed large text collections as a resource to be tapped in order to produce better text analysis algorithms. In this paper, I have attempted to suggest a new emphasis: the use of large online text collections to discover new facts and trends about the world itself. I suggest that to make progress we do not need fully artificial intelligent text analysis; rather, a mixture of computationally-driven and user-guided analysis may open the door to exciting new results. Hearst's 1999 statement of need fairly well describes the state of text analytics technology and practice a decade later.

Text analysis processes
Subtasks — components of a larger text-analytics effort — typically include:
• Information retrieval or identification of a corpus is a preparatory step: collecting or identifying a set of textual materials, on the Web or held in a file system, database, or content management system, for analysis.
• Although some text analytics systems limit themselves to purely statistical methods, many others apply more extensive natural language processing, such as part of speech tagging, syntactic parsing, and other types of linguistic analysis.
• Named entity recognition is the use of gazetteers or statistical techniques to identify named text features: people, organizations, place names, stock ticker symbols, certain abbreviations, and so on. Disambiguation — the use of contextual clues — may be required to decide whether, for instance, "Ford" refers to a former U.S. president, a vehicle manufacturer, a movie star (Glenn or Harrison?), a river crossing, or some other entity.
• Recognition of pattern-identified entities: features such as telephone numbers, e-mail addresses, and quantities (with units) can be discerned via regular expression or other pattern matches (a minimal sketch follows this list).
• Coreference: identification of noun phrases and other terms that refer to the same object.
• Relationship, fact, and event extraction: identification of associations among entities and other information in text.
• Sentiment analysis involves discerning subjective (as opposed to factual) material and extracting various forms of attitudinal information: sentiment, opinion, mood, and emotion. Text analytics techniques are helpful in analyzing sentiment at the entity, concept, or topic level and in distinguishing opinion holder and opinion object.[8]
• Quantitative text analysis is a set of techniques stemming from the social sciences where either a human judge or a computer extracts semantic or grammatical relationships between words in order to find out the meaning or stylistic patterns of, usually, a casual personal text for the purpose of psychological profiling etc.[9]
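As referenced in the pattern-identified-entities item above, here is a minimal Python sketch of that step using regular expressions; the example text and the patterns themselves are simplified illustrations, not production-grade validators.

    import re

    text = "Contact sales at sales@example.com or call +1 555-0100 for a quote."

    emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.-]+", text)
    phones = re.findall(r"\+?\d[\d\s-]{6,}\d", text)

    print(emails)  # ['sales@example.com']
    print(phones)  # ['+1 555-0100']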

Applications
The technology is now broadly applied for a wide variety of government, research, and business needs. Applications can be sorted into a number of categories by analysis type or by business function. Using this approach to classifying solutions, application categories include:
• Enterprise Business Intelligence/Data Mining, Competitive Intelligence
• E-Discovery, Records Management
• National Security/Intelligence
• Scientific discovery, especially Life Sciences
• Sentiment Analysis Tools, Listening Platforms
• Natural Language/Semantic Toolkit or Service
• Publishing
• Automated ad placement
• Search/Information Access
• Social media monitoring

Security applications
Many text mining software packages are marketed for security applications, especially monitoring and analysis of online plain text sources such as Internet news, blogs, etc. for national security purposes.[10] Text mining also plays a role in the study of text encryption/decryption.

Biomedical applications A range of text mining applications in the biomedical literature has been described.[11] One online text mining application in the biomedical literature is GoPubMed.[12] GoPubmed was the first semantic search engine on the Web. Another example is PubGene that combines biomedical text mining with network visualization as an Internet service.[13]

Software applications
Text mining methods and software are also being researched and developed by major firms, including IBM and Microsoft, to further automate the mining and analysis processes, and by different firms working in the area of search and indexing in general as a way to improve their results. Within the public sector, much effort has been concentrated on creating software for tracking and monitoring terrorist activities.[14]

Online media applications
Text mining is being used by large media companies, such as the Tribune Company, to disambiguate information and to provide readers with greater search experiences, which in turn increases site "stickiness" and revenue. Additionally, on the back end, editors are benefiting by being able to share, associate and package news across properties, significantly increasing opportunities to monetize content.

Marketing applications Text mining is starting to be used in marketing as well, more specifically in analytical customer relationship management. Coussement and Van den Poel (2008)[15] apply it to improve predictive analytics models for customer churn (customer attrition).[16]

Sentiment analysis Sentiment analysis may involve analysis of movie reviews to estimate how favorable a review is for a movie.[17] Such an analysis may need a labeled data set or a labeling of the affectivity of words. Resources for the affectivity of words and concepts have been made for WordNet[18] and ConceptNet,[19] respectively. Text has been used to detect emotions in the related area of affective computing.[20] Text-based approaches to affective computing have been used on multiple corpora, such as student evaluations, children's stories and news stories.
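A minimal sketch of lexicon-based sentiment scoring in Python, in the spirit of the word-affectivity resources mentioned above; the tiny word list and its scores are invented purely for illustration, and real systems rely on much larger labeled resources.

    # Tiny illustrative polarity lexicon; real systems use much larger resources.
    lexicon = {"excellent": 1.0, "good": 0.5, "boring": -0.5, "terrible": -1.0}

    def sentiment_score(review):
        """Average the polarity of the known words in a review."""
        words = review.lower().split()
        scores = [lexicon[w] for w in words if w in lexicon]
        return sum(scores) / len(scores) if scores else 0.0

    print(sentiment_score("An excellent film with a good cast"))   # positive score
    print(sentiment_score("Boring plot and terrible acting"))      # negative score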

Academic applications
The issue of text mining is of importance to publishers who hold large databases of information needing indexing for retrieval. This is especially true in scientific disciplines, in which highly specific information is often contained within written text. Therefore, initiatives have been taken, such as Nature's proposal for an Open Text Mining Interface (OTMI) and the National Institutes of Health's common Journal Publishing Document Type Definition (DTD), that would provide semantic cues to machines to answer specific queries contained within text without removing publisher barriers to public access. Academic institutions have also become involved in the text mining initiative:
• The National Centre for Text Mining (NaCTeM) is the first publicly funded text mining centre in the world. NaCTeM is operated by the University of Manchester[21] in close collaboration with the Tsujii Lab,[22] University of Tokyo.[23] NaCTeM provides customised tools and research facilities and offers advice to the academic community. It is funded by the Joint Information Systems Committee (JISC) and two of the UK Research Councils (EPSRC & BBSRC). With an initial focus on text mining in the biological and biomedical sciences, research has since expanded into the social sciences.
• In the United States, the School of Information at the University of California, Berkeley is developing a program called BioText to assist biology researchers in text mining and analysis.

Software and applications Text mining computer programs are available from many commercial vendors and open source projects.

Commercial
• AeroText – a suite of text mining applications for content analysis. Content used can be in multiple languages.
• Attensity – hosted, integrated and stand-alone text mining (analytics) software that uses natural language processing technology to address collective intelligence in social media and forums; the voice of the customer in surveys and emails; customer relationship management; e-services; research and e-discovery; risk and compliance; and intelligence analysis.
• Autonomy – text mining, clustering and categorization software.
• Basis Technology – provides a suite of text analysis modules to identify language, enable search in more than 20 languages, extract entities, and efficiently search for and translate entities.
• Clarabridge – text analytics (text mining) software, including natural language processing (NLP), machine learning, clustering and categorization. Provides SaaS, hosted and on-premise text and sentiment analytics that enables companies to collect, listen to, analyze, and act on the Voice of the Customer (VOC) from both external sources (Twitter, Facebook, Yelp!, product forums, etc.) and internal sources (call center notes, CRM, enterprise data warehouse, BI, surveys, emails, etc.).
• Endeca Technologies – provides software to analyze and cluster unstructured text.
• Expert System S.p.A. – suite of semantic technologies and products for developers and knowledge managers.
• Fair Isaac – leading provider of decision management solutions powered by advanced analytics (includes text analytics).
• General Sentiment – social intelligence platform that uses natural language processing to discover affinities between the fans of brands and the fans of traditional television shows in social media. Also offers stand-alone text analytics over a social knowledge base covering billions of topics dating back to 2004.
• IBM LanguageWare – the IBM suite for text analytics (tools and runtime), comprising text analysis libraries and customization software.
• IBM SPSS – provider of IBM SPSS Modeler and IBM SPSS Text Analytics (now called IBM SPSS Modeler Premium),[25] which contains advanced NLP-based text analysis capabilities (multi-lingual sentiment, event and fact extraction) that can be used in conjunction with predictive modeling; Text Analytics for Surveys provides the ability to categorize survey responses using NLP-based capabilities for further analysis or reporting. Rated as the second (17%) and fourth (7%) most used text mining software, respectively, by Rexer's Annual Data Miner Survey in 2010.[24]
• Inxight – provider of text analytics, search, and unstructured visualization technologies (Inxight was bought by Business Objects, which was bought by SAP AG in 2008).
• Language Computer Corporation – text extraction and analysis tools, available in multiple languages.
• Lexalytics – provider of a text analytics engine used in social media monitoring, voice of customer, survey analysis, and other applications.
• LexisNexis – provider of business intelligence solutions based on an extensive news and company information content set. LexisNexis acquired DataOps to pursue search.
• Mathematica – provides built-in tools for text alignment, pattern matching, clustering and semantic analysis.
• SAS – SAS Text Miner and Teragram; commercial text analytics, natural language processing, and taxonomy software used for information management. SAS Text Miner was rated as the third most used text mining software (9%) by Rexer's Annual Data Miner Survey in 2010.[24]
• StatSoft – provides STATISTICA Text Miner as an optional extension to STATISTICA Data Miner, for predictive analytics solutions. Rated as the top used text mining software (19%) by Rexer's Annual Data Miner Survey in 2010.[24]
• Sysomos – provider of a social media analytics software platform, including text analytics and sentiment analysis of online consumer conversations.
• WordStat – content analysis and text mining add-on module of QDA Miner for analyzing large amounts of unstructured text data.
• Thomson Data Analyzer – enables complex analysis of patent information, scientific publications and news.

Open source
• Carrot2 – text and search results clustering framework.
• GATE – General Architecture for Text Engineering, an open-source toolbox for natural language processing and language engineering.
• OpenNLP – natural language processing toolkit.
• Natural Language Toolkit (NLTK) – a suite of libraries and programs for symbolic and statistical natural language processing (NLP) for the Python programming language (see the sketch after this list).
• RapidMiner with its Text Processing Extension – data and text mining software. Rated as the fifth most used text mining software (6%) by Rexer's Annual Data Miner Survey in 2010.[24]
• Unstructured Information Management Architecture (UIMA) – a component framework for analyzing unstructured content such as text, audio and video, originally developed by IBM.
• The programming language R provides a framework for text mining applications in the package tm.
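As a brief illustration of what such open-source toolkits provide, the following sketch uses NLTK to tokenize and part-of-speech tag a sentence; it assumes NLTK is installed, and the download calls fetch the tokenizer and tagger models used by most NLTK versions.

    import nltk

    # One-time downloads of the tokenizer and tagger models (names used by most NLTK versions).
    nltk.download("punkt")
    nltk.download("averaged_perceptron_tagger")

    sentence = "Text mining turns unstructured documents into structured data."
    tokens = nltk.word_tokenize(sentence)   # split into word tokens
    tagged = nltk.pos_tag(tokens)           # attach part-of-speech tags

    print(tagged)   # e.g. [('Text', 'NNP'), ('mining', 'NN'), ...]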

Implications Until recently, websites most often used text-based searches, which only found documents containing specific user-defined words or phrases. Now, through use of a semantic web, text mining can find content based on meaning and context (rather than just by a specific word). Additionally, text mining software can be used to build large dossiers of information about specific people and events. For example, large datasets based on data extracted from news reports can be built to facilitate social network analysis or counter-intelligence. In effect, the text mining software may act in a capacity similar to an intelligence analyst or research librarian, albeit with a more limited scope of analysis. Text mining is also used in some email spam filters as a way of determining the characteristics of messages that are likely to be advertisements or other unwanted material.

Notes

[1] Defining Text Analytics (http://intelligent-enterprise.informationweek.com/blog/archives/2007/02/defining_text_a.html)
[2] KDD-2000 Workshop on Text Mining (http://www.cs.cmu.edu/~dunja/CFPWshKDD2000.html)
[3] Text Analytics: Theory and Practice (http://www.ir.iit.edu/cikm2004/tutorials.html#T2)
[4] Hobbs, Walker, and Amsler, Natural Language Access to Structured Text, 1982 (http://www.aclweb.org/anthology/C/C82/C82-1020.pdf)
[5] Unstructured Data and the 80 Percent Rule (http://www.clarabridge.com/default.aspx?tabid=137&ModuleID=635&ArticleID=551)
[6] http://www.b-eye-network.com/view/6311
[7] http://people.ischool.berkeley.edu/~hearst/papers/acl99/acl99-tdm.html
[8] http://www.clarabridge.com/default.aspx?tabid=137&ModuleID=635&ArticleID=722
[9] http://dingo.sbs.arizona.edu/~mehl/eReprints/Text%20analysis%20Handbook.pdf
[10] Alessandro Zanasi: Virtual Weapons for Real Wars: Text Mining for National Security (http://extra.springer.com/2009/978-3-540-88180-3/papers/0053/00530053.pdf), in E. Corchado et al. (Eds.): CISIS 2008, ASC 53, pp. 53–60, Springer 2009.
[11] K. Bretonnel Cohen & Lawrence Hunter (January 2008). "Getting Started in Text Mining" (http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.0040020). PLoS Computational Biology 4 (1): e20. doi:10.1371/journal.pcbi.0040020. PMC 2217579. PMID 18225946.
[12] GoPubMed: exploring PubMed with the Gene Ontology, A. Doms and M. Schroeder, 2005, http://nar.oxfordjournals.org/content/33/suppl_2/W783.long
[13] Tor-Kristian Jenssen, Astrid Lægreid, Jan Komorowski & Eivind Hovig (2001). "A literature network of human genes for high-throughput analysis of gene expression" (http://www.nature.com/ng/journal/v28/n1/abs/ng0501_21.html). Nature Genetics 28 (1): 21–28. doi:10.1038/ng0501-21. PMID 11326270. Summary: Daniel R. Masys (2001). "Linking microarray data to the literature". Nature Genetics 28 (1): 9–10. doi:10.1038/ng0501-9. PMID 11326264.
[14] Texor (http://vetsky.narod2.ru/catalog/texor/)
[15] Academic Papers about Analytical Customer Relationship Management (http://www.textmining.UGent.be)
[16] Kristof Coussement and Dirk Van den Poel (forthcoming 2008). "Integrating the Voice of Customers through Call Center Emails into a Decision Support System for Churn Prediction" (http://www.textmining.ugent.be). Information and Management.
[17] Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan (2002). "Thumbs up? Sentiment Classification using Machine Learning Techniques" (http://www.cs.cornell.edu/home/llee/papers/sentiment.pdf). Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP). pp. 79–86.
[18] Alessandro Valitutti, Carlo Strapparava, Oliviero Stock (2005). "Developing Affective Lexical Resources" (http://www.psychnology.org/File/PSYCHNOLOGY_JOURNAL_2_1_VALITUTTI.pdf). Psychology Journal 2 (1): 61–83.
[19] Erik Cambria, Robert Speer, Catherine Havasi and Amir Hussain (2010). "SenticNet: a Publicly Available Semantic Resource for Opinion Mining" (http://www.aaai.org/ocs/index.php/FSS/FSS10/paper/download/2216/2617.pdf). Proceedings of AAAI CSK. pp. 14–18.
[20] Rafael A. Calvo, Sidney K. D'Mello (2010). "Affect Detection: An Interdisciplinary Review of Models, Methods, and their Applications" (http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=5520655). IEEE Transactions on Affective Computing 1 (1): 18–37.
[21] The University of Manchester (http://www.manchester.ac.uk)
[22] Tsujii Laboratory (http://www-tsujii.is.s.u-tokyo.ac.jp/index.html)
[23] The University of Tokyo (http://www.u-tokyo.ac.jp/index_e.html)
[24] Rexer Analytics 4th Annual Data Miner Survey – 2010 (http://www.rexeranalytics.com/Data-Miner-Survey-Results-2010.html)
[25] IBM – SPSS – Software Products (http://www. .com/software)

References
• Ananiadou, S. and McNaught, J. (Editors) (2006). Text Mining for Biology and Biomedicine. Artech House Books. ISBN 978-1-58053-984-5
• Bilisoly, R. (2008). Practical Text Mining with Perl. New York: John Wiley & Sons. ISBN 978-0-470-17643-6
• Feldman, R., and Sanger, J. (2006). The Text Mining Handbook. New York: Cambridge University Press. ISBN 978-0-521-83657-9
• Indurkhya, N., and Damerau, F. (2010). Handbook of Natural Language Processing, 2nd Edition. Boca Raton, FL: CRC Press. ISBN 978-1-4200-8592-1
• Kao, A., and Poteet, S. (Editors). Natural Language Processing and Text Mining. Springer. ISBN 1-84628-175-X
• Konchady, M. Text Mining Application Programming (Programming Series). Charles River Media. ISBN 1-58450-460-9
• Manning, C., and Schutze, H. (1999). Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press. ISBN 978-0-262-13360-9
• Miner, G., Elder, J., Hill, T., Nisbet, R., Delen, D. and Fast, A. (2012). Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications. Elsevier Academic Press. ISBN 978-0-12-386979-1
• McKnight, W. (2005). "Building business intelligence: Text data mining in business intelligence". DM Review, 21–22.
• Srivastava, A., and Sahami, M. (2009). Text Mining: Classification, Clustering, and Applications. Boca Raton, FL: CRC Press. ISBN 978-1-4200-5940-3

External links

• Marti Hearst: What Is Text Mining? (http://people.ischool.berkeley.edu/~hearst/text-mining.html) (October 2003)
• Automatic Content Extraction, Linguistic Data Consortium (http://projects.ldc.upenn.edu/ace/)
• Automatic Content Extraction, NIST (http://www.itl.nist.gov/iad/894.01/tests/ace/)
• Academic, Open Source, and Industrial tools, Alias-i (http://alias-i.com/lingpipe/web/competition.html)

Predictive analytics

Predictive analytics encompasses a variety of statistical techniques from modeling, machine learning, data mining and game theory that analyze current and historical facts to make predictions about future events.[1][2] In business, predictive models exploit patterns found in historical and transactional data to identify risks and opportunities. Models capture relationships among many factors to allow assessment of risk or potential associated with a particular set of conditions, guiding decision making for candidate transactions. Predictive analytics is used in actuarial science,[3] marketing,[4] financial services,[5] insurance, telecommunications,[6] retail,[7] travel,[8] healthcare,[9] pharmaceuticals[10] and other fields. One of the most well known applications is credit scoring,[1] which is used throughout financial services. Scoring models process a customer’s credit history, loan application, customer data, etc., in order to rank-order individuals by their likelihood of making future credit payments on time. A well-known example would be the FICO score.

Definition Predictive analytics is an area of statistical analysis that deals with extracting information from data and using it to predict future trends and behavior patterns. The core of predictive analytics relies on capturing relationships between explanatory variables and the predicted variables from past occurrences, and exploiting them to predict future outcomes. It is important to note, however, that the accuracy and usability of results will depend greatly on the level of data analysis and the quality of assumptions.

Types Generally, the term predictive analytics is used to mean predictive modeling, "scoring" data with predictive models, and forecasting. However, people are increasingly using the term to describe related analytical disciplines, such as descriptive modeling and decision modeling or optimization. These disciplines also involve rigorous data analysis, and are widely used in business for segmentation and decision making, but have different purposes and the statistical techniques underlying them vary.

Predictive models Predictive models analyze past performance to assess how likely a customer is to exhibit a specific behavior in the future in order to improve marketing effectiveness. This category also encompasses models that seek out subtle data patterns to answer questions about customer performance, such as fraud detection models. Predictive models often perform calculations during live transactions, for example, to evaluate the risk or opportunity of a given customer or transaction, in order to guide a decision. With advancement in computing speed, individual agent modeling systems can simulate human behavior or reaction to given stimuli or scenarios. The new term for animating data specifically linked to an individual in a simulated environment is avatar analytics.

Descriptive models Descriptive models quantify relationships in data in a way that is often used to classify customers or prospects into groups. Unlike predictive models that focus on predicting a single customer behavior (such as credit risk), descriptive models identify many different relationships between customers or products. Descriptive models do not rank-order customers by their likelihood of taking a particular action the way predictive models do. Descriptive models can be used, for example, to categorize customers by their product preferences and life stage. Descriptive modeling tools can be utilized to develop further models that can simulate a large number of individualized agents and make predictions.

Decision models Decision models describe the relationship between all the elements of a decision — the known data (including results of predictive models), the decision, and the forecast results of the decision — in order to predict the results of decisions involving many variables. These models can be used in optimization, maximizing certain outcomes while minimizing others. Decision models are generally used to develop decision logic or a set of business rules that will produce the desired action for every customer or circumstance.

Applications Although predictive analytics can be put to use in many applications, we outline a few examples where predictive analytics has shown positive impact in recent years.

Analytical customer relationship management (CRM) Analytical customer relationship management is a frequent commercial application of predictive analysis. Methods of predictive analysis are applied to customer data to pursue CRM objectives, which center on having a holistic view of the customer no matter where their information resides in the company or which department is involved. CRM uses predictive analysis in applications for marketing campaigns, sales, and customer services, to name a few. These tools are required in order for a company to posture and focus its efforts effectively across the breadth of its customer base. Companies must analyze and understand the products that are in demand or have the potential for high demand, predict customers' buying habits in order to promote relevant products at multiple touch points, and proactively identify and mitigate issues that could cause them to lose customers or reduce the ability to gain new ones.

Clinical decision support systems Experts use predictive analysis in health care primarily to determine which patients are at risk of developing certain conditions, like diabetes, asthma, heart disease and other lifetime illnesses. Additionally, sophisticated clinical decision support systems incorporate predictive analytics to support medical decision making at the point of care. A working definition has been proposed by Robert Hayward of the Centre for Health Evidence: "Clinical Decision Support Systems link health observations with health knowledge to influence health choices by clinicians for improved health care."

Collection analytics Every portfolio has a set of delinquent customers who do not make their payments on time. The financial institution has to undertake collection activities on these customers to recover the amounts due. A lot of collection resources are wasted on customers who are difficult or impossible to recover. Predictive analytics can help optimize the allocation of collection resources by identifying the most effective collection agencies, contact strategies, legal actions and other strategies for each customer, thus significantly increasing recovery while reducing collection costs.

Cross-sell Often corporate organizations collect and maintain abundant data (e.g. customer records, sales transactions), because exploiting hidden relationships in the data can provide a competitive advantage. For an organization that offers multiple products, predictive analytics can help analyze customers' spending, usage and other behavior, leading to efficient cross sales, or selling additional products to current customers.[2] This directly leads to higher profitability per customer and stronger customer relationships.

Customer retention With the number of competing services available, businesses need to focus efforts on maintaining continuous consumer satisfaction, rewarding consumer loyalty and minimizing customer attrition. Businesses tend to respond to customer attrition on a reactive basis, acting only after the customer has initiated the process to terminate service. At this stage, the chance of changing the customer's decision is very small. Proper application of predictive analytics can lead to a more proactive retention strategy. By frequent examination of a customer's past service usage, service performance, spending and other behavior patterns, predictive models can determine the likelihood of a customer terminating service sometime in the near future.[6] An intervention with lucrative offers can increase the chance of retaining the customer. Silent attrition, the behavior of a customer to slowly but steadily reduce usage, is another problem that many companies face. Predictive analytics can also predict this behavior, so that the company can take proper actions to increase customer activity.

Direct marketing When marketing consumer products and services, there is the challenge of keeping up with competing products and consumer behavior. Apart from identifying prospects, predictive analytics can also help to identify the most effective combination of product versions, marketing material, communication channels and timing that should be used to target a given consumer. The goal of predictive analytics is typically to lower the cost per order or cost per action.

Fraud detection Fraud is a big problem for many businesses and can be of various types: inaccurate credit applications, fraudulent transactions (both offline and online), identity thefts and false insurance claims. These problems plague firms of all sizes in many industries. Some examples of likely victims are credit card issuers, insurance companies,[11] retail merchants, manufacturers, business-to-business suppliers and even services providers. A predictive model can help weed out the “bads” and reduce a business's exposure to fraud. Predictive modeling can also be used to identify high-risk fraud candidates in business or the public sector. Nigrini developed a risk-scoring method to identify audit targets. He describes the use of this approach to detect fraud in the franchisee sales reports of an international fast-food chain. Each location is scored using 10 predictors. The 10 scores are then weighted to give one final overall risk score for each location. The same scoring approach was also used to identify high-risk check kiting accounts, potentially fraudulent travel agents, and questionable vendors. A reasonably complex model was used to identify fraudulent monthly reports submitted by divisional controllers.[12] The Internal Revenue Service (IRS) of the United States also uses predictive analytics to mine tax returns and identify tax fraud.[11] Recent advancements in technology have also introduced predictive behavior analysis for web fraud detection. This type of solution utilizes heuristics in order to study normal web user behavior and detect anomalies indicating fraud attempts.

Portfolio, product or economy-level prediction Often the focus of analysis is not the consumer but the product, portfolio, firm, industry or even the economy. For example, a retailer might be interested in predicting store-level demand for inventory management purposes. Or the Federal Reserve Board might be interested in predicting the unemployment rate for the next year. These types of problems can be addressed by predictive analytics using time series techniques (see below). They can also be addressed via machine learning approaches which transform the original time series into a feature vector space, where the learning algorithm finds patterns that have predictive power.[13][14]

Risk management Risk management techniques are applied to predict and benefit from a future scenario. The capital asset pricing model (CAPM) "predicts" the best portfolio to maximize return; probabilistic risk assessment (PRA), when combined with mini-Delphi techniques and statistical approaches, yields accurate forecasts; and RiskAoA is a stand-alone predictive tool.[15] These are three examples of approaches that can extend from project to market, and from near to long term. Underwriting (see below) and other business approaches identify risk management as a predictive method.

Underwriting Many businesses have to account for risk exposure due to their different services and determine the cost needed to cover the risk. For example, auto insurance providers need to accurately determine the amount of premium to charge to cover each automobile and driver. A financial company needs to assess a borrower’s potential and ability to pay before granting a loan. For a health insurance provider, predictive analytics can analyze a few years of past medical claims data, as well as lab, pharmacy and other records where available, to predict how expensive an enrollee is likely to be in the future. Predictive analytics can help underwrite these quantities by predicting the chances of illness, default, bankruptcy, etc. Predictive analytics can streamline the process of customer acquisition by predicting the future risk behavior of a customer using application level data.[3] Predictive analytics in the form of credit scores have reduced the amount of time it takes for loan approvals, especially in the mortgage market where lending decisions are now made in a matter of hours rather than days or even weeks. Proper predictive analytics can lead to proper pricing decisions, which can help mitigate future risk of default.

Technology and Big Data influences on Predictive Analytics Big Data is a term used to describe data sets so large and complex that they become awkward to work with using traditional database management tools. The volume, variety and velocity of Big Data have introduced challenges across the board for capture, storage, search, sharing, analysis, and visualization. Examples of big data sources include web logs, RFID and sensor data, social networks, Internet search indexing, call detail records, military surveillance, and complex data in astronomic, biogeochemical, genomics, and atmospheric sciences. Thanks to technological advances in computer hardware (faster CPUs, cheaper memory, and MPP architectures) and new technologies such as Hadoop, MapReduce, and in-database and text analytics for processing Big Data, it is now feasible to collect, analyze, and mine massive amounts of structured and unstructured data for new insights.[11] Today, exploring Big Data and using predictive analytics is within reach of more organizations than ever before.

Statistical techniques The approaches and techniques used to conduct predictive analytics can broadly be grouped into regression techniques and machine learning techniques.

Regression Models Regression models are the mainstay of predictive analytics. The focus lies on establishing a mathematical equation as a model to represent the interactions between the different variables in consideration. Depending on the situation, there is a wide variety of models that can be applied while performing predictive analytics. Some of them are briefly discussed below.

Linear regression model The linear regression model analyzes the relationship between the response or dependent variable and a set of independent or predictor variables. This relationship is expressed as an equation that predicts the response variable as a linear function of the parameters. These parameters are adjusted so that a measure of fit is optimized. Much of the effort in model fitting is focused on minimizing the size of the residual, as well as ensuring that it is randomly distributed with respect to the model predictions. The goal of regression is to select the parameters of the model so as to minimize the sum of the squared residuals. This is referred to as ordinary least squares (OLS) estimation and results in best linear unbiased estimates (BLUE) of the parameters provided the Gauss-Markov assumptions are satisfied. Once the model has been estimated we would be interested to know whether the predictor variables belong in the model, i.e. whether the estimate of each variable's contribution is reliable. To do this we can check the statistical significance of the model's coefficients, which can be measured using the t-statistic. This amounts to testing whether the coefficient is significantly different from zero. How well the model predicts the dependent variable based on the value of the independent variables can be assessed by using the R² statistic. It measures the predictive power of the model, i.e. the proportion of the total variation in the dependent variable that is "explained" (accounted for) by variation in the independent variables.
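A minimal sketch of ordinary least squares fitting with NumPy; the data are synthetic, generated solely to illustrate the estimation step and the R² computation described above.

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic data: y = 2 + 3*x + noise
    n = 100
    x = rng.uniform(0, 10, n)
    y = 2.0 + 3.0 * x + rng.normal(0, 1, n)

    # Design matrix with an intercept column; least squares minimizes the sum of squared residuals.
    X = np.column_stack([np.ones(n), x])
    beta_hat, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
    print("Estimated intercept and slope:", beta_hat)

    # R-squared: proportion of variation in y explained by the model.
    y_hat = X @ beta_hat
    r_squared = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
    print("R-squared:", r_squared)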

Discrete choice models Multivariate regression (above) is generally used when the response variable is continuous and has an unbounded range. Often the response variable may not be continuous but rather discrete. While mathematically it is feasible to apply multivariate regression to discrete ordered dependent variables, some of the assumptions behind the theory of multivariate linear regression no longer hold, and there are other techniques such as discrete choice models which are better suited for this type of analysis. If the dependent variable is discrete, some of those superior methods are logistic regression, multinomial logit and probit models. Logistic regression and probit models are used when the dependent variable is binary.

Logistic regression In a classification setting, assigning outcome probabilities to observations can be achieved through the use of a logistic model, which is basically a method that transforms information about the binary dependent variable into an unbounded continuous variable and estimates a regular multivariate model (see Allison's Logistic Regression for more information on the theory of logistic regression). The Wald and likelihood-ratio tests are used to test the statistical significance of each coefficient b in the model (analogous to the t-tests used in OLS regression; see above). A test assessing the goodness-of-fit of a classification model is the "percentage correctly predicted".
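A brief sketch of fitting a logistic model for a binary outcome, assuming the scikit-learn library is available; the data are synthetic, and the accuracy shown is the simple "percentage correctly predicted" on the training data.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)

    # Synthetic binary outcome: the probability of "1" rises with the predictor.
    x = rng.normal(0, 1, (200, 1))
    p = 1 / (1 + np.exp(-(0.5 + 2.0 * x[:, 0])))      # true logistic relationship
    y = (rng.uniform(size=200) < p).astype(int)

    model = LogisticRegression().fit(x, y)
    print("Estimated intercept:", model.intercept_, "coefficient:", model.coef_)

    # "Percentage correctly predicted" as a simple goodness-of-fit check.
    print("Accuracy:", model.score(x, y))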

Multinomial logistic regression An extension of the binary logit model to cases where the dependent variable has more than 2 categories is the multinomial logit model. In such cases collapsing the data into two categories might not make good sense or may lead to loss in the richness of the data. The multinomial logit model is the appropriate technique in these cases, especially when the dependent variable categories are not ordered (for example, colors like red, blue and green). Some authors have extended multinomial regression to include feature selection/importance methods such as Random multinomial logit.

Probit regression Probit models offer an alternative to logistic regression for modeling categorical dependent variables. Even though the outcomes tend to be similar, the underlying distributions are different. Probit models are popular in social sciences like economics. A good way to understand the key difference between probit and logit models is to assume that there is a latent variable z. We do not observe z but instead observe y, which takes the value 0 or 1. In the logit model we assume that the latent variable's error term follows a logistic distribution; in the probit model we assume that it follows a standard normal distribution. Note that in social sciences (e.g. economics), probit is often used to model situations where the observed variable y is continuous but takes values between 0 and 1.
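In standard textbook notation (a sketch of the usual formulation, not taken from this article), the latent-variable setup can be written as

    z_i = x_i^{\top}\beta + \varepsilon_i, \qquad
    y_i = \begin{cases} 1 & \text{if } z_i > 0, \\ 0 & \text{otherwise,} \end{cases}

where the probit model assumes \varepsilon_i \sim N(0, 1), so that P(y_i = 1 \mid x_i) = \Phi(x_i^{\top}\beta), and the logit model assumes \varepsilon_i follows a standard logistic distribution, so that P(y_i = 1 \mid x_i) = 1 / (1 + e^{-x_i^{\top}\beta}).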

Logit versus probit The Probit model has been around longer than the logit model. They behave similarly, except that the logistic distribution tends to be slightly flatter tailed. One of the reasons the logit model was formulated was that the probit model was computationally difficult due to the requirement of numerically calculating integrals. Modern computing however has made this computation fairly simple. The coefficients obtained from the logit and probit model are fairly close. However, the odds ratio is easier to interpret in the logit model. Practical reasons for choosing the probit model over the logistic model would be: • There is a strong belief that the underlying distribution is normal • The actual event is not a binary outcome (e.g., bankruptcy status) but a proportion (e.g., proportion of population at different debt levels).

Time series models Time series models are used for predicting or forecasting the future behavior of variables. These models account for the fact that data points taken over time may have an internal structure (such as autocorrelation, trend or seasonal variation) that should be accounted for. As a result, standard regression techniques cannot be applied to time series data, and methodology has been developed to decompose the trend, seasonal and cyclical components of the series. Modeling the dynamic path of a variable can improve forecasts since the predictable component of the series can be projected into the future. Time series models estimate difference equations containing stochastic components. Two commonly used forms of these models are autoregressive (AR) models and moving average (MA) models. The Box-Jenkins methodology (1976) developed by George Box and G.M. Jenkins combines the AR and MA models to produce the ARMA (autoregressive moving average) model, which is the cornerstone of stationary time series analysis. ARIMA (autoregressive integrated moving average) models, on the other hand, are used to describe non-stationary time series. Box and Jenkins suggest differencing a non-stationary time series to obtain a stationary series to which an ARMA model can be applied. Non-stationary time series have a pronounced trend and do not have a constant long-run mean or variance. Box and Jenkins proposed a three-stage methodology comprising model identification, estimation and validation. The identification stage involves determining whether the series is stationary and whether seasonality is present by examining plots of the series and its autocorrelation and partial autocorrelation functions. In the estimation stage, models are estimated using non-linear time series or maximum likelihood estimation procedures. Finally, the validation stage involves diagnostic checking, such as plotting the residuals to detect outliers and evidence of model fit. In recent years time series models have become more sophisticated and attempt to model conditional heteroskedasticity, with models such as ARCH (autoregressive conditional heteroskedasticity) and GARCH (generalized autoregressive conditional heteroskedasticity) models frequently used for financial time series. In addition, time series models are also used to understand inter-relationships among economic variables represented by systems of equations using VAR (vector autoregression) and structural VAR models.
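A minimal Box-Jenkins-style sketch, assuming a recent version of the statsmodels library is available; the series is synthetic (a random walk with drift), so the fitted ARIMA(1,1,1) model is purely illustrative.

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    rng = np.random.default_rng(2)

    # Synthetic non-stationary series: a random walk with drift.
    y = np.cumsum(0.3 + rng.normal(0, 1, 200))

    # ARIMA(1,1,1): first-difference the series, then fit an ARMA(1,1) model,
    # in the spirit of the Box-Jenkins approach described above.
    result = ARIMA(y, order=(1, 1, 1)).fit()
    print(result.summary())

    # Project the predictable component of the series forward.
    print(result.forecast(steps=5))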

Survival or duration analysis Survival analysis is another name for time-to-event analysis. These techniques were primarily developed in the medical and biological sciences, but they are also widely used in the social sciences, such as economics, as well as in engineering (reliability and failure time analysis). Censoring and non-normality, which are characteristic of survival data, generate difficulty when trying to analyze the data using conventional statistical models such as multiple linear regression. The normal distribution, being a symmetric distribution, takes positive as well as negative values, but duration by its very nature cannot be negative, and therefore normality cannot be assumed when dealing with duration/survival data. Hence the normality assumption of regression models is violated. The assumption is that if the data were not censored they would be representative of the population of interest. In survival analysis, censored observations arise whenever the dependent variable of interest represents the time to a terminal event and the duration of the study is limited in time. An important concept in survival analysis is the hazard rate, defined as the probability that the event will occur at time t conditional on surviving until time t. Another concept related to the hazard rate is the survival function, which can be defined as the probability of surviving to time t. Most models try to model the hazard rate by choosing the underlying distribution depending on the shape of the hazard function. A distribution whose hazard function slopes upward is said to have positive duration dependence, a decreasing hazard shows negative duration dependence, whereas constant hazard is a process with no memory, usually characterized by the exponential distribution. Some of the distributional choices in survival models are: F, gamma, Weibull, log-normal, inverse normal, exponential, etc. All these distributions are for a non-negative random variable. Duration models can be parametric, non-parametric or semi-parametric. Some of the models commonly used are the Kaplan-Meier estimator (non-parametric) and the Cox proportional hazards model (semi-parametric).
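In standard notation, for an event time T with density f(t), the survival function and hazard rate discussed above are

    S(t) = P(T > t), \qquad
    h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \frac{f(t)}{S(t)}.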

Classification and regression trees
Classification and regression trees (CART) is a non-parametric decision tree learning technique that produces either classification or regression trees, depending on whether the dependent variable is categorical or numeric, respectively. Decision trees are formed by a collection of rules based on variables in the modeling data set:
• Rules based on variables' values are selected to get the best split to differentiate observations based on the dependent variable.
• Once a rule is selected and splits a node into two, the same process is applied to each "child" node (i.e. it is a recursive procedure).
• Splitting stops when CART detects no further gain can be made, or some pre-set stopping rules are met. (Alternatively, the data are split as much as possible and then the tree is later pruned.)
Each branch of the tree ends in a terminal node. Each observation falls into one and exactly one terminal node, and each terminal node is uniquely defined by a set of rules. A very popular method for predictive analytics is Leo Breiman's Random forests or derived versions of this technique like Random multinomial logit.
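A minimal sketch of growing a classification tree, assuming scikit-learn is available; the public iris dataset stands in for real modeling data, and max_depth serves as a simple pre-set stopping rule.

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    # A small public dataset stands in for real modeling data.
    data = load_iris()
    X, y = data.data, data.target

    # Grow a classification tree; max_depth acts as a simple pre-set stopping rule.
    tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

    # Each path from the root to a terminal node is a rule defined by variable splits.
    print(export_text(tree, feature_names=list(data.feature_names)))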

Multivariate adaptive regression splines Multivariate adaptive regression splines (MARS) is a non-parametric technique that builds flexible models by fitting piecewise linear regressions. An important concept associated with regression splines is that of a knot. A knot is where one local regression model gives way to another and thus is the point of intersection between two splines. In multivariate adaptive regression splines, basis functions are the tool used for generalizing the search for knots. Basis functions are a set of functions used to represent the information contained in one or more variables. The MARS model almost always creates the basis functions in pairs. The multivariate adaptive regression spline approach deliberately overfits the model and then prunes it back to get to the optimal model. The algorithm is computationally very intensive, and in practice we are required to specify an upper limit on the number of basis functions.

Machine learning techniques Machine learning, a branch of artificial intelligence, was originally employed to develop techniques to enable computers to learn. Today, since it includes a number of advanced statistical methods for regression and classification, it finds application in a wide variety of fields including medical diagnostics, credit card fraud detection, face and speech recognition and analysis of the stock market. In certain applications it is sufficient to directly predict the dependent variable without focusing on the underlying relationships between variables. In other cases, the underlying relationships can be very complex and the mathematical form of the dependencies unknown. For such cases, machine learning techniques emulate human cognition and learn from training examples to predict future events. A brief discussion of some of these methods used commonly for predictive analytics is provided below. A detailed study of machine learning can be found in Mitchell (1997).

Neural networks Neural networks are nonlinear, sophisticated modeling techniques that are able to model complex functions. They can be applied to problems of prediction, classification or control in a wide spectrum of fields such as finance, cognitive psychology/neuroscience, medicine, engineering, and physics. Neural networks are used when the exact nature of the relationship between inputs and output is not known. A key feature of neural networks is that they learn the relationship between inputs and output through training. There are three types of training used by different networks: supervised training, unsupervised training and reinforcement learning, with supervised being the most common one. Some examples of neural network training techniques are backpropagation, quick propagation, conjugate gradient descent, projection operator, Delta-Bar-Delta, etc. Some well-known network architectures are multilayer perceptrons, Kohonen self-organizing networks, Hopfield networks, etc.
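A minimal sketch of a small multilayer perceptron trained in a supervised fashion, assuming scikit-learn is available; the classification problem is synthetic and the network size is arbitrary.

    from sklearn.datasets import make_classification
    from sklearn.neural_network import MLPClassifier

    # Synthetic classification problem in place of real input/output data.
    X, y = make_classification(n_samples=300, n_features=10, random_state=0)

    # A small multilayer perceptron trained by gradient-based optimization
    # (gradients computed via error backpropagation).
    net = MLPClassifier(hidden_layer_sizes=(20, 10), max_iter=1000, random_state=0)
    net.fit(X, y)

    print("Training accuracy:", net.score(X, y))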

Radial basis functions A radial basis function (RBF) is a function which has built into it a distance criterion with respect to a center. Such functions can be used very efficiently for interpolation and for smoothing of data. Radial basis functions have been applied in the area of neural networks where they are used as a replacement for the sigmoidal transfer function. Such networks have 3 layers, the input layer, the hidden layer with the RBF non-linearity and a linear output layer. The most popular choice for the non-linearity is the Gaussian. RBF networks have the advantage of not being locked into local minima as do the feed-forward networks such as the multilayer perceptron. Predictive analytics 61

Support vector machines Support Vector Machines (SVM) are used to detect and exploit complex patterns in data by clustering, classifying and ranking the data. They are learning machines that are used to perform binary classifications and regression estimations. They commonly use kernel based methods to apply linear classification techniques to non-linear classification problems. There are a number of types of SVM such as linear, polynomial, sigmoid etc.

Naïve Bayes Naïve Bayes, based on Bayes' conditional probability rule, is used for performing classification tasks. Naïve Bayes assumes the predictors are statistically independent, which makes it an effective classification tool that is easy to interpret. It is best employed when faced with the "curse of dimensionality", i.e. when the number of predictors is very high.
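A minimal naïve Bayes sketch, assuming scikit-learn is available; Gaussian naïve Bayes is used here only as one common variant, with the iris dataset standing in for real data.

    from sklearn.datasets import load_iris
    from sklearn.naive_bayes import GaussianNB

    X, y = load_iris(return_X_y=True)

    # Gaussian naïve Bayes: predictors are treated as conditionally independent given the class.
    clf = GaussianNB().fit(X, y)
    print("Training accuracy:", clf.score(X, y))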

k-nearest neighbours The nearest neighbour algorithm (kNN) belongs to the class of pattern recognition statistical methods. The method does not impose a priori any assumptions about the distribution from which the modeling sample is drawn. It involves a training set with both positive and negative values. A new sample is classified by calculating the distance to the nearest neighbouring training case. The sign of that point will determine the classification of the sample. In the k-nearest neighbour classifier, the k nearest points are considered and the sign of the majority is used to classify the sample. The performance of the kNN algorithm is influenced by three main factors: (1) the distance measure used to locate the nearest neighbours; (2) the decision rule used to derive a classification from the k nearest neighbours; and (3) the number of neighbours used to classify the new sample. It can be proved that, unlike other methods, this method is universally asymptotically convergent, i.e.: as the size of the training set increases, if the observations are independent and identically distributed (i.i.d.), regardless of the distribution from which the sample is drawn, the predicted class will converge to the class assignment that minimizes misclassification error. See Devroye et al.
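A minimal hand-rolled k-nearest-neighbour classifier in NumPy; the two-class training set is invented purely for illustration, and Euclidean distance with majority voting stands in for the distance measure and decision rule discussed above.

    import numpy as np

    def knn_predict(X_train, y_train, x_new, k=3):
        """Classify x_new by majority vote among its k nearest training points."""
        distances = np.linalg.norm(X_train - x_new, axis=1)   # Euclidean distances
        nearest = np.argsort(distances)[:k]                   # indices of the k closest points
        votes = y_train[nearest]
        values, counts = np.unique(votes, return_counts=True)
        return values[np.argmax(counts)]

    # Tiny illustrative training set with two classes.
    X_train = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
    y_train = np.array([0, 0, 1, 1])

    print(knn_predict(X_train, y_train, np.array([0.1, 0.2]), k=3))   # expected class: 0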

Geospatial predictive modeling Conceptually, geospatial predictive modeling is rooted in the principle that the occurrences of events being modeled are limited in distribution. Occurrences of events are neither uniform nor random in distribution – there are spatial environment factors (infrastructure, sociocultural, topographic, etc.) that constrain and influence where the locations of events occur. Geospatial predictive modeling attempts to describe those constraints and influences by spatially correlating occurrences of historical geospatial locations with environmental factors that represent those constraints and influences. Geospatial predictive modeling is a process for analyzing events through a geographic filter in order to make statements of likelihood for event occurrence or emergence.

Tools Historically, using predictive analytics tools—as well as understanding the results they delivered—required advanced skills. However, modern predictive analytics tools are no longer restricted to IT specialists. As more organizations adopt predictive analytics into decision-making processes and integrate it into their operations, they’re creating a shift in the market toward business users as the primary consumers of the information. Business users want tools they can use on their own. Vendors are responding by creating new software that removes the mathematical complexity, provides user-friendly graphic interfaces and/or builds in short cuts that can, for example, recognize the kind of data available and suggest an appropriate predictive model.[16] Predictive analytics tools have become sophisticated enough to adequately present and dissect data problems, so that any data-savvy information worker can utilize them to analyze data and retrieve meaningful, useful results.[2] For example, modern tools present findings using simple charts, graphs, and scores that indicate the likelihood of possible outcomes.[17]

There are numerous tools available in the marketplace that help with the execution of predictive analytics. These range from those that need very little user sophistication to those that are designed for the expert practitioner. The difference between these tools is often in the level of customization and heavy data lifting allowed. Notable predictive analytic toolset vendors include:[18]
• SAS
• IBM
• Information Builders, Inc.
• SPSS
• SAP BusinessObjects
• Predictive Analysis
• StatSoft, Inc.
• KXEN
• FICO
• Acxiom
• Teradata
• PROS
• TIBCO Spotfire
• Fuzzy Logix
• Pervasive Software
• Zementis
• Alpine Data Labs
• Apara Software [19]

PMML In an attempt to provide a standard language for expressing predictive models, the Predictive Model Markup Language (PMML) has been proposed. Such an XML-based language provides a way for the different tools to define predictive models and to share these between PMML-compliant applications. PMML 4.0 was released in June 2009.

References

[1] Nyce, Charles (2007), Predictive Analytics White Paper (http://www.aicpcu.org/doc/predictivemodelingwhitepaper.pdf), American Institute for Chartered Property Casualty Underwriters/Insurance Institute of America, p. 1.
[2] Eckerson, Wayne (May 10, 2007), Extending the Value of Your Data Warehousing Investment (http://tdwi.org/articles/2007/05/10/predictive-analytics.aspx?sc_lang=en), The Data Warehouse Institute.
[3] Conz, Nathan (September 2, 2008), "Insurers Shift to Customer-focused Predictive Analytics Technologies" (http://www.insurancetech.com/business-intelligence/210600271), Insurance & Technology.
[4] Fletcher, Heather (March 2, 2011), "The 7 Best Uses for Predictive Analytics in Multichannel Marketing" (http://www.targetmarketingmag.com/article/7-best-uses-predictive-analytics-modeling-multichannel-marketing/1#), Target Marketing.
[5] Korn, Sue (April 21, 2011), "The Opportunity for Predictive Analytics in Finance" (http://www.hpcwire.com/hpcwire/2011-04-21/the_opportunity_for_predictive_analytics_in_finance.html), HPC Wire.
[6] Barkin, Eric (May 2011), "CRM + Predictive Analytics: Why It All Adds Up" (http://www.destinationcrm.com/Articles/Editorial/Magazine-Features/CRM---Predictive-Analytics-Why-It-All-Adds-Up-74700.aspx), Destination CRM.
[7] Das, Krantik; Vidyashankar, G.S. (July 1, 2006), "Competitive Advantage in Retail Through Analytics: Developing Insights, Creating Value" (http://www.information-management.com/infodirect/20060707/1057744-1.html), Information Management.
[8] McDonald, Michèle (September 2, 2010), "New Technology Taps 'Predictive Analytics' to Target Travel Recommendations" (http://www.travelmarketreport.com/technology?articleID=4259&LP=1), Travel Market Report.
[9] Stevenson, Erin (December 16, 2011), "Tech Beat: Can you pronounce health care predictive analytics?" (http://www.times-standard.com/business/ci_19561141), Times-Standard.
[10] McKay, Lauren (August 2009), "The New Prescription for Pharma" (http://www.destinationcrm.com/articles/Web-Exclusives/Web-Only-Bonus-Articles/The-New-Prescription-for-Pharma-55774.aspx), Destination CRM.
[11] Schiff, Mike (March 6, 2012), BI Experts: Why Predictive Analytics Will Continue to Grow (http://tdwi.org/Articles/2012/03/06/Predictive-Analytics-Growth.aspx?Page=1), The Data Warehouse Institute.
[12] Nigrini, Mark (June 2011). Forensic Analytics: Methods and Techniques for Forensic Accounting Investigations (http://www.wiley.com/WileyCDA/WileyTitle/productCd-0470890460.html). Hoboken, NJ: John Wiley & Sons Inc. ISBN 978-0-470-89046-2.
[13] Dhar, Vasant (April 2011). "Prediction in Financial Markets: The Case for Small Disjuncts" (http://dl.acm.org/citation.cfm?id=1961191). ACM Transactions on Intelligent Systems and Technologies 2 (3).
[14] Dhar, Vasant; Chou, Dashin and Provost, Foster (October 2000). "Discovering Interesting Patterns in Investment Decision Making with GLOWER – A Genetic Learning Algorithm Overlaid With Entropy Reduction" (http://dl.acm.org/citation.cfm?id=593502). Data Mining and Knowledge Discovery 4 (4).
[15] https://acc.dau.mil/CommunityBrowser.aspx?id=126070
[16] Halper, Fran (November 1, 2011), "The Top 5 Trends in Predictive Analytics" (http://www.information-management.com/issues/21_6/the-top-5-trends-in-redictive-an-alytics-10021460-1.html), Information Management.
[17] MacLennan, Jamie (May 1, 2012), 5 Myths about Predictive Analytics (http://tdwi.org/articles/2012/05/01/5-predictive-analytics-myths.aspx), The Data Warehouse Institute.
[18] Kalakota, Ravi (March 18, 2012), Making Money on Predictive Analytics – Tools, Consulting and Content (http://practicalanalytics.wordpress.com/2012/03/18/making-money-on-predictive-analytics-tools-consulting-and-content/), Practical Analytics.
[19] http://www.aparasw.com/index.php/en
• Agresti, Alan (2002). Categorical Data Analysis. Hoboken: John Wiley and Sons. ISBN 0-471-36093-7.
• Coggeshall, Stephen; Davies, John; Jones, Roger; and Schutzer, Daniel, "Intelligent Security Systems," in Freedman, Roy S., Flein, Robert A., and Lederman, Jess, Editors (1995). Artificial Intelligence in the Capital Markets. Chicago: Irwin. ISBN 1-55738-811-3.
• Devroye, L.; Györfi, L.; Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. New York: Springer-Verlag.
• Enders, Walter (2004). Applied Time Series Econometrics. Hoboken: John Wiley and Sons. ISBN 0-521-83919-X.
• Greene, William (2000). Econometric Analysis. Prentice Hall. ISBN 0-13-013297-7.
• Guidère, Mathieu; Howard, N.; Argamon, Sh. (2009). Rich Language Analysis for Counterterrorism. Berlin, London, New York: Springer-Verlag. ISBN 978-3-642-01140-5.
• Mitchell, Tom (1997). Machine Learning. New York: McGraw-Hill. ISBN 0-07-042807-7.
• Tukey, John (1977). Exploratory Data Analysis. New York: Addison-Wesley. ISBN 0-201-07616-0.

Prescriptive analytics

Prescriptive analytics automatically synthesizes big data, mathematical sciences, business rules, and machine learning to make predictions and then suggests decision options to take advantage of the predictions. Prescriptive analytics is the third phase of business analytics (BA) which includes descriptive, predictive and prescriptive analytics.[1][2] The most traditional of business analytics is descriptive analytics, and it accounts for the majority of all business analytics today. Descriptive analytics looks at past performance and understands that performance by mining historical data to look for the reasons behind past success or failure. Almost all management reporting such as sales, marketing, operations, and finance, uses this type of post-mortem analysis. The next phase is predictive analytics. This is when historical performance data is combined with rules, algorithms, and occasionally external data to determine the probable future outcome of an event or a likelihood of a situation occurring. The final phase is prescriptive analytics.[3] Prescriptive analytics goes beyond predicting future outcomes by also suggesting actions to benefit from the predictions and showing the decision maker the implications of each decision option.[4] Prescriptive analytics not only anticipates what will happen and when it will happen, but also why it will happen. Further, prescriptive analytics can suggest decision options on how to take advantage of a future opportunity or mitigate a future risk and illustrate the implication of each decision option. In practice, prescriptive analytics can continually and automatically process new data to improve prediction accuracy and provide better decision options.

History Prescriptive analytics has been around since 2003-2004. The technology behind prescriptive analytics synergistically combines data, business rules, and mathematical models. The data inputs to prescriptive analytics may come from multiple sources: internal, such as inside a corporation; and external, also known as environmental data. The data may also be structured, which includes numerical and categorical data, as well as unstructured data, such as text, images, audio, and video data, including big data. Business rules define the business process and include constraints, preferences, policies, best practices, and boundaries. Mathematical models are techniques derived from mathematical sciences and related disciplines including applied statistics, machine learning, operations research, and natural language processing.

Applications in healthcare Multiple factors are driving healthcare providers to dramatically improve business processes and operations as the United States healthcare industry embarks on the necessary migration from a largely fee-for-service, volume-based system to a fee-for-outcome, value-based system. Prescriptive analytics is playing a key role in helping to improve performance in a number of healthcare areas. Prescriptive analytics can benefit healthcare strategic planning by using analytics to leverage operational and usage data combined with data on external factors such as economic data, population demographic trends and population health trends, to more accurately plan for future capital investments such as new facilities and equipment utilization, as well as to understand the trade-offs between adding additional beds and expanding an existing facility versus building a new one.[5] In provider-payer negotiations, providers can improve their negotiating position with health insurers by developing a robust understanding of future service utilization. By accurately predicting utilization, providers can also better allocate personnel.

Applications in energy and utilities Natural gas prices fluctuate dramatically depending upon supply, demand, econometrics, geopolitics, and weather conditions. Gas producers, pipeline transmission companies and utility firms have a keen interest in more accurately predicting gas prices so that they can lock in favorable terms while hedging downside risk. Prescriptive analytics can accurately predict prices by modeling internal and external variables simultaneously and also provide decision options and show the impact of each decision option.

Applications in telecommunications and cable Telecom and cable companies have large field service organizations to install, repair and resolve issues at customer sites, both in homes and at businesses. Mastering field service operations is critical because field services affect customer satisfaction, cost and churn. Prescriptive analytics enables telecom and cable providers to dramatically improve the effectiveness and efficiency of field service operations.

Further reading
• Davenport, Thomas H., Kalakota, Ravi, Taylor, James, Lampa, Mike, Franks, Bill, Shapiro, Jeremy, Cokins, Gary, Way, Robin, King, Joy, Schafer, Lori, Renfrow, Cyndy and Sittig, Dean, Predictions for Analytics in 2012 [6], International Institute for Analytics (December 15, 2011).
• Laney, Douglas and Kart, Lisa (March 20, 2012). Emerging Role of the Data Scientist and the Art of Data Science [7], Gartner.
• McCormick Northwestern Engineering, Prescriptive analytics is about enabling smart decisions based on data [8].
• Business Analytics Information Event [9], I2SDS and Department of Decision Sciences, School of Business, The George Washington University (February 10, 2011).
• "The Difference Between Operations Research and Business Analysis" [10], OR Exchange / Informs (April 2011).
• Horner, Peter and Basu, Atanu, Analytics and the Future of Healthcare [11], Analytics (January/February 2012).
• Ghosh, Rajib, Basu, Atanu and Bhaduri, Abhijit, From ‘Sick’ Care to ‘Health’ Care [12], Analytics (July/August 2011).
• Fischer, Eric, Basu, Atanu, Hubele, Joachim and Levine, Eric, TV ads, Wanamaker’s Dilemma & Analytics [13], Analytics (March/April 2011).
• Basu, Atanu and Worth, Tim, Predictive Analytics: Practical Ways to Drive Customer Service, Looking Forward [14], Analytics (July/August 2010).
• Jain, Rajeev, Basu, Atanu, and Levine, Eric, Putting Major Energy Decisions through their Paces: A Framework for a Better Environment through Analytics [15], OR/MS Today (December 2010).
• Bhaduri, Abhijit and Basu, Atanu, Predictive Human Resources: Can Math Help Improve HR Mandates in an Organization? [16], OR/MS Today (October 2010).
• Brown, Scott, Basu, Atanu and Worth, Tim, Predictive Analytics in Field Service: Practical Ways to Drive Field Service, Looking Forward [17], Analytics (November/December 2010).
• Pease, Andrew, Bringing Optimization to the Business [18], SAS Global Forum 2012, Paper 165-2012 (2012).

External links • INFORMS' bi-monthly, digital magazine on the analytics profession [19] • Northwestern University Master of Science in Analytics [8] • The George Washington University [20] • AYATA [21] • Prescriptive Analytics Overview [22] • Memon, Jai "Why Data Matters: Moving Beyond Prediction" [23] IBM.

References [1] Evans, James R. and Lindner, Carl H. (March 2012). "Business Analytics: The Next Frontier for Decision Sciences". Decision Line 43 (2). [2] Lustig, Irv, Dietrich, Brenda, Johnson, Christer, and Dziekan, Christopher (Nov/Dec 2010). "The Analytics Journey". Analytics. [3] Haas, Peter J., Maglio, Paul P., Selinger, Patricia G., and Tan, Wang-Chiew (2011). "Data is Dead…Without What-If Models". Proceedings of the VLDB Endowment 4 (12). [4] Stewart, Thomas R., and McMillan, Claude, Jr. (1987). "Descriptive and Prescriptive Models for Judgment and Decision Making: Implications for Knowledge Engineering". NATO ASI Series, Expert Judgment and Expert Systems, F35: 314–318. [5] Foster, Roger (May 2012). "Big data and public health, part 2: Reducing Unwarranted Services". Government Health IT.

[6] http://iianalytics.com/wp-content/uploads/2011/12/2012-IIA-Predictions-Brief-Final.pdf
[7] http://www.parabal.com/uploads/docs/Greenplum/Emerging%20Role%20of%20the%20Data%20Scientist%20and%20the%20Art%20of%20Data%20Science.pdf
[8] http://www.analytics.northwestern.edu/analytics-examples/prescriptive-analytics.html
[9] http://business.gwu.edu/decisionsciences/i2sds/pdf/Program%20in%20BA%20presentation.pdf
[10] http://www.or-exchange.com/questions/1344/difference-between-operations-research-and-business-analysis
[11] http://www.analytics-magazine.org/januaryfebruary-2012/503-analytics-a-the-future-of-healthcare
[12] http://www.analytics-magazine.org/septemberoctober-2011/402-health-care-a-analytics.html
[13] http://www.analytics-magazine.org/march-april-2011/278-predictive-analytics-tv-ads-wanamakers-dilemma-a-analytics.html
[14] http://www.analytics-magazine.org/july-august-2010/128-predictive-analytics-game-changer.html
[15] https://www.ayata.com/pdfs/Predictive-Analytics-In-Clean-Energy.pdf
[16] http://www.ayata.com/company/resources?start=6
[17] http://www.analytics-magazine.org/november-december-2010/98-predictive-analytics-in-field-service
[18] http://support.sas.com/resources/papers/proceedings12/165-2012.pdf
[19] http://analyticsmagazine.com/
[20] http://business.gwu.edu/decisionsciences/i2sds/businessanalytics.cfm
[21] http://www.ayata.com

[22] http://www.youtube.com/watch?v=3l__n5zlVRQ&feature=plcp

[23] http://www.youtube.com/watch?v=VtETirgVn9c

Decision support system

A decision support system (DSS) is a computer-based information system that supports business or organizational decision-making activities. DSSs serve the management, operations, and planning levels of an organization and help people make decisions about problems that may be rapidly changing and not easily specified in advance. Decision support systems can be fully computerized, human-powered, or a combination of both.

DSSs include knowledge-based systems. A properly designed DSS is an interactive software-based system intended to help decision makers compile useful information from a combination of raw data, documents, and personal knowledge, or business models, to identify and solve problems and make decisions. (An example of such a system is the decision support system for the John Day Reservoir.)

Typical information that a decision support application might gather and present includes:
• inventories of information assets (including legacy and relational data sources, cubes, data warehouses, and data marts),
• comparative sales figures between one period and the next,
• projected revenue figures based on product sales assumptions.
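As a small illustration of the last two items, the snippet below computes a period-over-period sales comparison and a projected revenue figure. All figures and growth assumptions are invented for the example; a real DSS would pull them from the underlying databases and data marts.

# Invented figures and growth assumptions, for illustration only.
sales = {"Q1": 1_200_000, "Q2": 1_350_000}        # revenue per period

# Comparative sales figures between one period and the next
change = sales["Q2"] - sales["Q1"]
pct = 100 * change / sales["Q1"]
print(f"Q2 vs Q1: {change:+,} ({pct:+.1f}%)")

# Projected revenue based on product sales assumptions (units x price x growth)
assumptions = {"widget": (10_000, 25.0, 1.05),    # units, unit price, growth factor
               "gadget": (4_000, 80.0, 0.98)}
projection = sum(units * price * growth for units, price, growth in assumptions.values())
print(f"projected next-period revenue: {projection:,.0f}")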

History According to Keen (1978),[1] the concept of decision support has evolved from two main areas of research: the theoretical studies of organizational decision making done at the Carnegie Institute of Technology during the late 1950s and early 1960s, and the technical work on interactive computer systems, mainly carried out at the Massachusetts Institute of Technology in the 1960s. The concept of DSS is considered to have become an area of research of its own in the middle of the 1970s, before gaining in intensity during the 1980s. In the middle and late 1980s, executive information systems (EIS), group decision support systems (GDSS), and organizational decision support systems (ODSS) evolved from the single-user and model-oriented DSS. According to Sol (1987),[2] the definition and scope of DSS has been migrating over the years. In the 1970s DSS was described as "a computer based system to aid decision making". In the late 1970s the DSS movement started focusing on "interactive computer-based systems which help decision-makers utilize data bases and models to solve ill-structured problems". In the 1980s DSS was expected to provide systems "using suitable and available technology to improve effectiveness of managerial and professional activities", and by the end of the 1980s DSS faced a new challenge: the design of intelligent workstations.[2] In 1987, Texas Instruments completed development of the Gate Assignment Display System (GADS) for United Airlines. This decision support system is credited with significantly reducing travel delays by aiding the management of ground operations at various airports, beginning with O'Hare International Airport in Chicago and Stapleton Airport in Denver, Colorado.[3][4] Beginning in about 1990, data warehousing and on-line analytical processing (OLAP) began broadening the realm of DSS. As the turn of the millennium approached, new Web-based analytical applications were introduced. The advent of increasingly capable reporting technologies has seen DSS start to emerge as a critical component of management design; examples of this can be seen in the intense amount of discussion of DSS in the education environment. DSS also have a weak connection to the user interface paradigm of hypertext. Both the University of Vermont PROMIS system (for medical decision making) and the Carnegie Mellon ZOG/KMS system (for military and business decision making) were decision support systems which were also major breakthroughs in user interface research. Furthermore, although hypertext researchers have generally been concerned with information overload, certain researchers, notably Douglas Engelbart, have focused on decision makers in particular.

Taxonomies As with the definition, there is no universally accepted taxonomy of DSS either; different authors propose different classifications. Using the relationship with the user as the criterion, Haettenschwiler[5] differentiates passive, active, and cooperative DSS. A passive DSS is a system that aids the process of decision making but cannot bring out explicit decision suggestions or solutions. An active DSS can bring out such decision suggestions or solutions. A cooperative DSS allows the decision maker (or their advisor) to modify, complete, or refine the decision suggestions provided by the system before sending them back to the system for validation. The system in turn improves, completes, and refines the suggestions of the decision maker and sends them back for validation; the whole process then repeats until a consolidated solution is generated. Another taxonomy for DSS has been created by Daniel Power. Using the mode of assistance as the criterion, Power differentiates communication-driven DSS, data-driven DSS, document-driven DSS, knowledge-driven DSS, and model-driven DSS.[6]
• A communication-driven DSS supports more than one person working on a shared task; examples include integrated tools like Microsoft's NetMeeting or Groove.[7]
• A data-driven DSS or data-oriented DSS emphasizes access to and manipulation of a time series of internal company data and, sometimes, external data.
• A document-driven DSS manages, retrieves, and manipulates unstructured information in a variety of electronic formats.
• A knowledge-driven DSS provides specialized problem-solving expertise stored as facts, rules, procedures, or in similar structures.[6]
• A model-driven DSS emphasizes access to and manipulation of a statistical, financial, optimization, or simulation model. Model-driven DSS use data and parameters provided by users to assist decision makers in analyzing a situation; they are not necessarily data-intensive. Dicodess is an example of an open source model-driven DSS generator.[8]
Using scope as the criterion, Power[9] differentiates enterprise-wide DSS and desktop DSS. An enterprise-wide DSS is linked to large data warehouses and serves many managers in the company. A desktop, single-user DSS is a small system that runs on an individual manager's PC.

Components

Three fundamental components of a DSS architecture are:[5][6][10][11][12] 1. the database (or knowledge base), 2. the model (i.e., the decision context and user criteria), and 3. the user interface. The users themselves are also important components of the architecture.[5][12]
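A minimal structural sketch of those three components follows. The discount scenario, threshold, and data are invented for illustration; in a real DSS the knowledge base would be a database, the model would encode the decision context and user criteria, and the user interface would be interactive rather than printed output.

# A structural sketch only: the knowledge base is a plain dict, the model applies a
# user criterion (an invented discount threshold), and the "user interface" prints.
knowledge_base = {"orders": [("acme", 12_000), ("zenco", 3_500), ("acme", 8_200)]}

def model(db, customer, discount_threshold=10_000):
    """Decision model: apply the user's criterion to data from the knowledge base."""
    total = sum(value for name, value in db["orders"] if name == customer)
    return {"customer": customer, "total": total,
            "recommend_discount": total >= discount_threshold}

def user_interface(result):
    """Presentation layer: show the model's suggestion to the decision maker."""
    verdict = "offer discount" if result["recommend_discount"] else "no discount"
    print(f"{result['customer']}: total {result['total']:,} -> {verdict}")

user_interface(model(knowledge_base, "acme"))    # acme total 20,200 -> offer discount
user_interface(model(knowledge_base, "zenco"))   # zenco total 3,500 -> no discount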

Development Frameworks

DSS systems are not entirely different from other systems and require a structured approach. Such a framework includes people, technology, and the development approach.[10] DSS technology levels (of hardware and software) may include:
1. The actual application that will be used by the user. This is the part of the application that allows the decision maker to make decisions in a particular problem area; the user can act upon that particular problem.
2. The generator: a hardware/software environment that allows people to easily develop specific DSS applications. This level makes use of CASE tools or systems such as Crystal, AIMMS, Analytica and iThink.
3. Tools: lower-level hardware and software, including DSS generators, special languages, function libraries and linking modules.
An iterative developmental approach allows the DSS to be changed and redesigned at various intervals. Once the system is designed, it will need to be tested and revised where necessary to achieve the desired outcome.

Classification There are several ways to classify DSS applications; not every DSS fits neatly into one of the categories, but may be a mix of two or more architectures. Holsapple and Whinston[13] classify DSS into the following six frameworks: text-oriented DSS, database-oriented DSS, spreadsheet-oriented DSS, solver-oriented DSS, rule-oriented DSS, and compound DSS. A compound DSS is the most popular classification for a DSS; it is a hybrid system that includes two or more of the five basic structures described by Holsapple and Whinston.[13] The support given by DSS can be separated into three distinct, interrelated categories:[14] personal support, group support, and organizational support. DSS components may be classified as:
1. Inputs: factors, numbers, and characteristics to analyze
2. User knowledge and expertise: inputs requiring manual analysis by the user
3. Outputs: transformed data from which DSS "decisions" are generated
4. Decisions: results generated by the DSS based on user criteria
DSSs which perform selected cognitive decision-making functions and are based on artificial intelligence or intelligent agents technologies are called intelligent decision support systems (IDSS).

The nascent field of Decision engineering treats the decision itself as an engineered object, and applies engineering principles such as Design and Quality assurance to an explicit representation of the elements that make up a decision.

Applications As mentioned above, it is theoretically possible to build such systems in any knowledge domain. One example is the clinical decision support system for medical diagnosis. Other examples include a bank loan officer verifying the credit of a loan applicant, or an engineering firm that has bids on several projects and wants to know whether it can be competitive with its costs. DSS is extensively used in business and management. Executive dashboards and other business performance software allow faster decision making, identification of negative trends, and better allocation of business resources. With a DSS, information from across an organization can be presented in summarized form, such as charts and graphs, which helps management make strategic decisions. A growing area of DSS application, concepts, principles, and techniques is agricultural production and marketing for sustainable development. For example, the DSSAT4 package,[15][16] developed through financial support of USAID during the 1980s and 1990s, has allowed rapid assessment of several agricultural production systems around the world to facilitate decision-making at the farm and policy levels. There are, however, many constraints to the successful adoption of DSS in agriculture.[17] DSS are also prevalent in forest management, where the long planning time frame imposes specific requirements. All aspects of forest management, from log transportation and harvest scheduling to sustainability and ecosystem protection, have been addressed by modern DSSs. A specific example concerns the Canadian National Railway system, which tests its equipment on a regular basis using a decision support system. A problem faced by any railroad is worn-out or defective rails, which can result in hundreds of derailments per year. Under a DSS, CN managed to decrease the incidence of derailments at the same time other companies were experiencing an increase.

Benefits
1. Improves personal efficiency
2. Speeds up the process of decision making
3. Increases organizational control
4. Encourages exploration and discovery on the part of the decision maker
5. Speeds up problem solving in an organization
6. Facilitates interpersonal communication
7. Promotes learning or training
8. Generates new evidence in support of a decision
9. Creates a competitive advantage over competitors
10. Reveals new approaches to thinking about the problem space
11. Helps automate managerial processes

References [1] Keen, P. G. W. (1978). Decision support systems: an organizational perspective. Reading, Mass., Addison-Wesley Pub. Co. ISBN 0-201-03667-3 [2] Henk G. Sol et al. (1987). Expert systems and artificial intelligence in decision support systems: proceedings of the Second Mini Euroconference, Lunteren, The Netherlands, 17–20 November 1985. Springer, 1987. ISBN 90-277-2437-7. p.1-2. [3] Efraim Turban, Jay E. Aronson, Ting-Peng Liang (2008). Decision Support Systems and Intelligent Systems. p. 574.

[4] "Gate Delays at Airports Are Minimised for United by Texas Instruments' Explorer" (http:/ / www. cbronline. com/ news/ gate_delays_at_airports_are_minimised_for_united_by_texas_instruments_explorer). Computer Business Review. 1987-11-26. . [5] Haettenschwiler, P. (1999). Neues anwenderfreundliches Konzept der Entscheidungsunterstützung. Gutes Entscheiden in Wirtschaft, Politik und Gesellschaft. Zurich, vdf Hochschulverlag AG: 189-208. [6] Power, D. J. (2002). Decision support systems: concepts and resources for managers. Westport, Conn., Quorum Books. [7] Stanhope, P. (2002). Get in the Groove: building tools and peer-to-peer solutions with the Groove platform. New York, Hungry Minds [8] Gachet, A. (2004). Building Model-Driven Decision Support Systems with Dicodess. Zurich, VDF. [9] Power, D. J. (1997). What is a DSS? The On-Line Executive Journal for Data-Intensive Decision Support 1(3). [10] Sprague, R. H. and E. D. Carlson (1982). Building effective decision support systems. Englewood Cliffs, N.J., Prentice-Hall. ISBN 0-13-086215-0 [11] Haag, Cummings, McCubbrey, Pinsonneault, Donovan (2000). Management Information Systems: For The Information Age. McGraw-Hill Ryerson Limited: 136-140. ISBN 0-07-281947-2 [12] Marakas, G. M. (1999). Decision support systems in the twenty-first century. Upper Saddle River, N.J., Prentice Hall. [13] Holsapple, C.W., and A. B. Whinston. (1996). Decision Support Systems: A Knowledge-Based Approach. St. Paul: West Publishing. ISBN 0-324-03578-0 [14] Hackathorn, R. D., and P. G. W. Keen. (1981, September). "Organizational Strategies for Personal Computing in Decision Support Systems." MIS Quarterly, Vol. 5, No. 3.

[15] DSSAT4 (pdf) (http://www.aglearn.net/resources/isfm/DSSAT.pdf)

[16] The Decision Support System for Agrotechnology Transfer (http://www.icasa.net/dssat/) [17] Stephens, W. and Middleton, T. (2002). Why has the uptake of Decision Support Systems been so poor? In: Crop-soil simulation models in developing countries. 129-148 (Eds R.B. Matthews and William Stephens). Wallingford: CABI.

Further reading
• Delic, K.A., Douillet, L. and Dayal, U. (2001) "Towards an architecture for real-time decision support systems: challenges and solutions" (http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=938098).
• Diasio, S., Agell, N. (2009) "The evolution of expertise in decision support technologies: A challenge for organizations," cscwd, pp. 692–697, 13th International Conference on Computer Supported Cooperative Work in Design, 2009. http://www.computer.org/portal/web/csdl/doi/10.1109/CSCWD.2009.4968139
• Gadomski, A.M. et al. (2001) "An Approach to the Intelligent Decision Advisor (IDA) for Emergency Managers", Int. J. Risk Assessment and Management, Vol. 2, Nos. 3/4.
• Gomes da Silva, Carlos; Clímaco, João; Figueira, José. European Journal of Operational Research.
• Ender, Gabriela; E-Book (2005–2011) about the OpenSpace-Online Real-Time Methodology: knowledge-sharing, problem-solving, results-oriented group dialogs about topics that matter, with extensive conference documentation in real time. Download: http://www.openspace-online.com/OpenSpace-Online_eBook_en.pdf
• Jiménez, Antonio; Ríos-Insua, Sixto; Mateos, Alfonso. Computers & Operations Research.
• Jintrawet, Attachai (1995). A Decision Support System for Rapid Assessment of Lowland Rice-based Cropping Alternatives in Thailand. Agricultural Systems 47: 245-258.
• Matsatsinis, N.F. and Y. Siskos (2002), Intelligent support systems for marketing decisions, Kluwer Academic Publishers.
• Power, D. J. (2000). Web-based and model-driven decision support systems: concepts and issues. In proceedings of the Americas Conference on Information Systems, Long Beach, California.
• Reich, Yoram; Kapeliuk, Adi. Decision Support Systems, Nov 2005, Vol. 41 Issue 1, p1-19.
• Sauter, V. L. (1997). Decision support systems: an applied managerial approach. New York, John Wiley.
• Silver, M. (1991). Systems that support decision makers: description and analysis. Chichester; New York, Wiley.

• Sprague, R. H. and H. J. Watson (1993). Decision support systems: putting theory into practice. Englewood Cliffs, N.J., Prentice Hall.

Competitive intelligence

A broad definition of competitive intelligence is the action of defining, gathering, analyzing, and distributing intelligence about products, customers, competitors and any aspect of the environment needed to support executives and managers in making strategic decisions for an organization. Key points of this definition:
1. Competitive intelligence is an ethical and legal business practice, as opposed to industrial espionage, which is illegal.
2. The focus is on the external business environment.[1]
3. There is a process involved in gathering information, converting it into intelligence and then utilizing it in business decision making. Some CI professionals erroneously emphasize that if the intelligence gathered is not usable (or actionable) then it is not intelligence.
A more focused definition of CI regards it as the organizational function responsible for the early identification of risks and opportunities in the market before they become obvious; experts also call this process early signal analysis. This definition focuses attention on the difference between dissemination of widely available factual information (such as market statistics, financial reports, newspaper clippings), performed by functions such as libraries and information centers, and competitive intelligence, which is a perspective on developments and events aimed at yielding a competitive edge.[2] The term CI is often viewed as synonymous with competitor analysis, but competitive intelligence is more than analyzing competitors; it is about making the organization more competitive relative to its entire environment and stakeholders: customers, competitors, distributors, technologies, macro-economic data and so on.

Historic development The literature associated with the field of competitive intelligence is best exemplified by the detailed bibliographies that were published in the Society of Competitive Intelligence Professionals' refereed academic journal, The Journal of Competitive Intelligence and Management.[3][4][5][6] Although elements of organizational intelligence collection have been a part of business for many years, the history of competitive intelligence arguably began in the U.S. in the 1970s, although the literature on the field pre-dates this time by at least several decades.[6] In 1980, Michael Porter published the study Competitive Strategy: Techniques for Analyzing Industries and Competitors, which is widely viewed as the foundation of modern competitive intelligence. This has since been extended most notably by the pair of Craig Fleisher and Babette Bensoussan, who through several popular books on competitive analysis have added 48 commonly applied competitive intelligence analysis techniques to the practitioner's tool box.[7][8] In 1985, Leonard Fuld published his best-selling book dedicated to competitor intelligence.[9] However, the institutionalization of CI as a formal activity among American corporations can be traced to 1988, when Ben and Tamar Gilad published the first organizational model of a formal corporate CI function, which was then adopted widely by US companies.[10] The first professional certification program (CIP) was created in 1996 with the establishment of The Fuld-Gilad-Herring Academy of Competitive Intelligence in Cambridge, MA. In 1986 the Society of Competitive Intelligence Professionals (SCIP) was founded in the U.S. and grew in the late 1990s to around 6000 members worldwide, mainly in the U.S. and Canada, but with large numbers especially in the UK and Germany. Due to financial difficulties in 2009, the organization merged with Frost & Sullivan under the Frost & Sullivan Institute. SCIP has since been renamed "Strategic & Competitive Intelligence Professionals" to emphasise the strategic nature of the subject, and also to refocus the organisation's general approach, while keeping the existing SCIP brand name and logo. A number of efforts have been made to discuss the field's advances in post-secondary (university) education, covered by several authors including Blenkhorn & Fleisher,[11] Fleisher,[12] Fuld,[13] Prescott,[14] and McGonagle,[15] among others. Although the general view is that competitive intelligence concepts can be readily found and taught in many business schools around the globe, there are still relatively few dedicated academic programs, majors, or degrees in the field, a concern to academics who would like to see it further researched.[12] These issues were widely discussed by over a dozen knowledgeable individuals in a special edition of the Competitive Intelligence Magazine that was dedicated to this topic.[16] In France, a Specialized Master in Economic Intelligence and Knowledge Management was created in 1995 within the CERAM Business School, now SKEMA Business School,[17] in Paris, with the objective of delivering full and professional training in economic intelligence. A Centre for Global Intelligence and Influence was created in September 2011 in the same school. Practitioners and companies, on the other hand, regard professional accreditation as especially important in this field.[18] In 2011, SCIP recognized the Fuld-Gilad-Herring Academy of Competitive Intelligence's CIP certification process as its global, dual-level (CIP-I and CIP-II) certification program. Global developments in competitive intelligence have also been uneven.[19] Several academic journals, particularly the Journal of Competitive Intelligence and Management in its third volume, provided coverage of the field's global development.[20] For example, in 1997 the Ecole de Guerre Economique [21] (School of Economic Warfare) was founded in Paris, France. It is the first European institution that teaches the tactics of economic warfare within a globalizing world. In Germany, competitive intelligence received little attention until the early 1990s; the term "competitive intelligence" first appeared in German literature in 1997. In 1995 a German SCIP chapter was founded, which is now the second largest in Europe in terms of members. In summer 2004 the Institute for Competitive Intelligence was founded, which provides a post-graduate certification program for competitive intelligence professionals. Japan is currently the only country that officially maintains an economic intelligence agency (JETRO), founded by the Ministry of International Trade and Industry (MITI) in 1958. Accepting the importance of competitive intelligence, major multinational corporations such as ExxonMobil, Procter & Gamble, and Johnson & Johnson have created formal CI units. Importantly, organizations execute competitive intelligence activities not only as a safeguard to protect against market threats and changes, but also as a method for finding new opportunities and trends.

Principles Organizations use competitive intelligence to compare themselves to other organizations ("competitive benchmarking"), to identify risks and opportunities in their markets, and to pressure-test their plans against market response (war gaming), all of which enables them to make informed decisions. Most firms today realize the importance of knowing what their competitors are doing and how the industry is changing, and the information gathered allows organizations to understand their strengths and weaknesses. The actual importance of these categories of information to an organization depends on the contestability of its markets, the organizational culture, the personality and biases of its top decision makers, and the reporting structure of competitive intelligence within the company. Strategic intelligence (SI) focuses on the longer term, looking at issues affecting a company’s competitiveness over the course of a couple of years. The actual time horizon for SI ultimately depends on the industry and how quickly it is changing. The general questions that SI answers are ‘Where should we as a company be in x years?’ and ‘What are the strategic risks and opportunities facing us?’ This type of intelligence work involves, among other things, the identification of weak signals and the application of a methodology and process called Strategic Early Warning (SEW), first introduced by Gilad,[22][23][24] followed by Steven Shaker and Victor Richardson,[25] Alessandro Comai and Joaquin Tena,[26][27] and others. According to Gilad, 20% of the work of competitive intelligence practitioners should be dedicated to strategic early identification of weak signals within a SEW framework.

Tactical intelligence focuses on providing information designed to improve shorter-term decisions, most often with the intent of growing market share or revenues; generally, it is the type of information needed to support the sales process in an organization. It investigates various aspects of a product's or product line's marketing:
• Product - what are people selling?
• Price - what price are they charging?
• Promotion - what activities are they conducting to promote this product?
• Place - where are they selling this product?
• Other - sales force structure, clinical trial design, technical issues, etc.
With the right amount of information, organizations can avoid unpleasant surprises by anticipating competitors’ moves and decreasing response time. Examples of competitive intelligence research are evident in daily newspapers and magazines such as the Wall Street Journal, Business Week and Fortune. Major airlines change hundreds of fares daily in response to competitors’ tactics. They use information to plan their own marketing, pricing, and production strategies. Resources such as the Internet have made gathering information on competitors easy; with a click of a button, analysts can discover future trends and market requirements. However, competitive intelligence is much more than this, as the ultimate aim is to lead to competitive advantage. Because the Internet is mostly public domain material, information gathered there is less likely to result in insights that will be unique to the company. In fact, there is a risk that information gathered from the Internet will be misinformation and mislead users, so competitive intelligence researchers are often wary of using such information. As a result, although the Internet is viewed as a key source, most CI professionals should spend their time and budget gathering intelligence using primary research: networking with industry experts, attending trade shows and conferences, and talking with their own customers and suppliers. Where the Internet is used, it is to gather sources for primary research as well as information on what the company says about itself and its online presence (in the form of links to other companies, its strategy regarding search engines and online advertising, mentions in discussion forums and on blogs, etc.). Also important are online subscription databases and news aggregation sources, which have simplified the secondary source collection process. Social media sources are also becoming important, providing potential interviewee names as well as opinions and attitudes, and sometimes breaking news (e.g. via Twitter). Organizations must be careful not to spend too much time and effort on old competitors without noticing the emergence of new competitors. Knowing more about competitors allows a business to grow and succeed. The practice of competitive intelligence is growing every year, and most companies and business students now realize the importance of knowing their competitors. According to Arjan Singh and Andrew Beurschgens, in their 2006 article in the Competitive Intelligence Review, there are four stages in the development of a competitive intelligence capability within a firm, ranging from "stick fetching", where a CI department is purely reactive, to "world class", where it is completely integrated in the decision-making process.

Competitive Intelligence in Central-Eastern Europe Competitive Intelligence for business purposes is still a new concept in the Central and Eastern European region. The communist era was one in which free market "competitiveness" was almost unknown and "intelligence" was associated with the security services. Competitive Intelligence about Central Eastern Europe and Russia is an important issue for larger international companies. The same questions and issues as everywhere (costs, competition, clients, people, process, products and strategy) are valid, important and need to be understood. Using standard CI methodologies without adjusting research processes to the constraints, challenges and characteristics of the region will not work.[28]

Distinguishing competitive intelligence from similar fields Competitive intelligence depends on the intelligence cycle, the basic principle of national intelligence activity; the website of the CIA[29] provides a comprehensive explanation of this key principle. It is a five-step process aimed at creating value from the intelligence activity, mainly for decision-makers. It took CI some years to recognize that operating in the business field to serve the corporation with a better understanding of external threats and opportunities[30] involves numerous constraints, mainly ethical and legal, which are far less relevant when operating for governments. This process of CI emerging since the 1980s and building up its strengths is described by Prescott.[31] Competitive intelligence is often confused with, or viewed as overlapping with, related fields like environmental scanning, business intelligence, and marketing research, just to name a few.[32] Some have questioned whether the name "competitive intelligence" is even a satisfactory one to apply to the field.[32] In a 2003 book chapter, Fleisher compares and contrasts competitive intelligence with business intelligence, competitor intelligence, knowledge management, market intelligence, marketing research, and strategic intelligence.[33] The argument put forth by former SCIP President and CI author Craig Fleisher[33] suggests that business intelligence has two forms: in its narrower (contemporary) form it has more of an information technology and internal focus than competitive intelligence, while its broader (historical) definition is actually more encompassing than the contemporary practice of CI. Knowledge management (KM), when not properly implemented (it needs an appropriate taxonomy to be up to the best standards in the domain), is also viewed as a heavily information-technology-driven organizational practice that relies on data mining, corporate intranets, and mapping organizational assets, among other things, in order to make knowledge accessible to organizational members for decision making. CI shares some aspects of genuine KM, which is ideally based on human intelligence and experience and supports more sophisticated qualitative analysis, creativity, and prospective views; KM is essential for effective innovation. Market intelligence (MI) is industry-targeted intelligence that is developed on real-time (i.e., dynamic) aspects of competitive events taking place among the 4Ps of the marketing mix (i.e., pricing, place, promotion, and product) in the product or service marketplace in order to better understand the attractiveness of the market.[34] A time-based competitive tactic, MI insights are used by marketing and sales managers to hone their marketing efforts so as to respond more quickly to consumers in a fast-moving, vertical (i.e., industry) marketplace. Craig Fleisher suggests it is not distributed as widely as some forms of CI, which are also distributed to other (non-marketing) decision-makers.[33] Market intelligence also has a shorter-term time horizon than many other intelligence areas and is usually measured in days, weeks, or, in some slower-moving industries, a handful of months.
Marketing research is a tactical, methods-driven field that consists mainly of neutral primary research drawing on customer data in the form of beliefs and perceptions, gathered through surveys or focus groups and analyzed through the application of statistical research techniques.[35] In contrast, CI typically draws on a wider variety of sources (i.e., both primary and secondary), from a wider range of stakeholders (e.g., suppliers, competitors, distributors, substitutes, media, and so on), and seeks not just to answer existing questions but also to raise new ones and to guide action.[33] In a 2001 article, Ben Gilad and Jan Herring lay down a set of basic prerequisites that define the unique nature of CI and distinguish it from other information-rich disciplines such as market research or business development. They show that a common body of knowledge and a unique set of applied tools (Key Intelligence Topics, Business War Games, Blindspots analysis) make CI clearly different, and that while other sensory activities in the commercial firm focus on one category of players in the market (customers or suppliers or acquisition targets), CI is the only integrative discipline calling for a synthesis of the data on all High Impact Players (HIP).[18] In a later article,[2] Gilad focuses his delineation of CI more forcefully on the difference between information and intelligence. According to Gilad, the commonality among many organizational sensory functions, whether called market research, business intelligence or market intelligence, is that in practice they deliver facts and information, not intelligence. Intelligence, for Gilad, is a perspective on facts, not the facts themselves. Uniquely among other corporate functions, competitive intelligence has a specific perspective on external risks and opportunities affecting the firm’s overall performance, and as such it is part of an organization's risk management activity, not its information activities.

Ethics Ethics has been a long-held issue of discussion amongst CI practitioners.[32] Essentially, the questions revolve around what is and is not allowable in terms of CI practitioners' activity. A number of excellent scholarly treatments have been generated on this topic, most prominently addressed through Society of Competitive Intelligence Professionals publications.[36] The book Competitive Intelligence Ethics: Navigating the Gray Zone provides nearly twenty separate views about ethics in CI, as well as another ten codes used by various individuals or organizations.[36] Combined with the over two dozen scholarly articles or studies found within the various CI bibliographic entries,[37][5][6][38] it is clear that no shortage of study has gone into better classifying, understanding and addressing CI ethics. Competitive information may be obtained from public or subscription sources, from networking with competitor staff or customers, from disassembly of competitor products or from field research interviews. Competitive intelligence research is distinguishable from industrial espionage, as CI practitioners generally abide by local legal guidelines and ethical business norms.[39]

References [1] Haag, Stephen. Management Information Systems for the Information Age. Third Edition. McGraw-Hill Ryerson, 2006. [2] Gilad, Ben. "The Future of Competitive Intelligence: Contest for the Profession's Soul", Competitive Intelligence Magazine, 2008, 11(5), 22 [3] Dishman, P., Fleisher, C.S., and V. Knip. "Chronological and Categorized Bibliography of Key Competitive Intelligence Scholarship: Part 1 (1997-2003), Journal of Competitive Intelligence and Management, 1(1), 16-78. [4] Fleisher, Craig S., Wright, Sheila, and R. Tindale. "Bibliography and Assessment of Key Competitive Intelligence Scholarship: Part 4 (2003-2006), Journal of Competitive Intelligence and Management, 2007, 4(1), 32-92. [5] Fleisher, Craig S., Knip, Victor, and P. Dishman. "Bibliography and Assessment of Key Competitive Intelligence Scholarship: Part 2 (1990-1996), Journal of Competitive Intelligence and Management, 2003, 1(2), 11-86. [6] Knip, Victor, P. Dishman, and C.S. Fleisher. "Bibliography and Assessment of Key Competitive Intelligence Scholarship: Part 3 (The Earliest Writings-1989), Journal of Competitive Intelligence and Management, 2003, 1(3), 10-79. [7] Fleisher, Craig S. and Babette E. Bensoussan. Strategic and Competitive Analysis: Methods and Techniques for Analyzing Business Competition. Prentice Hall, Upper Saddle River, 2003. [8] Fleisher, Craig S. and Babette E. Bensoussan. Business and Competitive Analysis: Effective Application of New and Classic Methods, FT Press, 2007. [9] Fuld, Leonard M. Competitor Intelligence: How to Get It, How to Use It.. NY: Wiley, 1985. [10] Gilad, Ben and Tamar Gilad. The Business Intelligence System. NY: American Management Association, 1988. [11] Blenkhorn, D. and C.S. Fleisher (2003). "Teaching CI to three diverse groups: Undergraduates, MBAs, and Executives," Competitive Intelligence Magazine, 6(4), 17-20. [12] Fleisher, C.S. (2003). "Competitive Intelligence Education: Competencies, Sources and Trends," Information Management Journal, March/April, 56-62. [13] Fuld, 2006 [14] Prescott, J. (1999). "Debunking the Academic Abstinence Myth of Competitive Intelligence," Competitive Intelligence Magazine, 2(4). [15] McGonagle, J. (2003). "Bibliography: Education in CI," Competitive Intelligence Magazine, 6(4), 50. [16] (Competitive Intelligence Magazine, 2003, 6(4), July/August)

[17] http://www.skema-bs.fr/faculte-recherche/centre-intelligence-economique-et-influence [18] Gilad, Ben and Jan Herring. "CI Certification - Do We Need It?", Competitive Intelligence Magazine, 2001, 4(2), 28-31. [19] Blenkhorn, D. and C.S. Fleisher. Competitive Intelligence and Global Business. Westport, CT: Praeger, 2005. [20] Journal of Competitive Intelligence and Management, volume 2, numbers 1-3.

[21] http://www.ege.fr/ [22] Gilad, Ben (2001). "Industry Risk Management: CI's Next Step", Competitive Intelligence Magazine, 4 (3), May–June. [23] Gilad, Ben. Early Warning. NY: American Management Association, 2003. [24] Gilad, Ben (2006). "Early Warning Revisited", Competitive Intelligence Magazine, 9(2), March–April. [25] Shaker, Steven and Richardson, Victor (2004). "Putting the System Back into Early Warning". Competitive Intelligence Magazine, 7(3), May–June.

[26] Comai, Alessandro and Tena, Joaquin (2007). "Early Warning Systems for your Competitive Landscape", Competitive Intelligence Magazine, 10(3), May–June. [27] Comai, Alessandro and Tena, Joaquin (2006). "Mapping and Anticipating the Competitive Landscape", Emecom Ediciones, Barcelona, Spain.

[28] Competitive Intelligence in CE (http://www.pmrconsulting.com/a3/surviving-competitive-inteligence-projects-in-central-and-eastern-europe-and-russia)

[29] https://www.cia.gov/kids-page/6-12th-grade/who-we-are-what-we-do/the-intelligence-cycle.html [30] Barnea, A. (2010), "Creating More Value to the Competitive Intelligence Function", Competitive Intelligence Magazine, Vol. 13, No. 1, January/March. [31] Prescott, J. (1999), "The Evolution of Competitive Intelligence, Designing a Process for Action", APMP, Spring. [32] Fleisher, Craig S. and David Blenkhorn. Controversies in Competitive Intelligence: The Enduring Issues. Westport, CT: Praeger, 2003. [33] Fleisher, Craig S. (2003). "Should the Field be Called 'Competitive Intelligence?'" pp. 56-69 in Fleisher, Craig S. and David Blenkhorn [eds.], Controversies in Competitive Intelligence: The Enduring Issues. Westport, CT: Praeger, 2003. [34] Skyrme, D.J. (1989). "The Planning and Marketing of the Market Intelligence Function," Marketing Intelligence and Planning, 7(1/2), 5-10. [35] Sharp, S. (2000). "Truth or Consequences: 10 Myths that Cripple Competitive Intelligence," Competitive Intelligence Magazine, 3(1), 37-40. [36] Competitive Intelligence Foundation (2006). Competitive Intelligence Ethics: Navigating the Gray Zone. D. Fehringer and Hohhof, B. [Eds], Alexandria, VA: Competitive Intelligence Foundation. [37] Knip, Fleisher, & Dishman, 2003.

[38] Ethics in Competitive Intelligence, University of Ottawa (http://wiki.telfer.uottawa.ca/ci-wiki/index.php/Ethics_in_Competitive_Intelligence#.22The_Bible_of_BI_Ethics.22)
[39] Society of Competitive Intelligence Policy Analysis on Competitive Intelligence and the Economic Espionage Act, written by Richard Horowitz, Esq. (http://rhesq.com/CI/SCIP EEA Policy Analysis.pdf)

Data warehouse

In computing, a data warehouse (DW or DWH) is a database used for reporting and data analysis. The data stored in the warehouse are uploaded from the operational systems (such as marketing, sales, etc.). The data may pass through an operational data store for additional operations before they are used in the DW for reporting.

The typical ETL-based data warehouse uses staging, integration, and access layers to house its key functions. The staging layer or staging database stores raw data extracted from each of the disparate source data systems. The integration layer integrates the disparate data sets by transforming the data from the staging layer, often storing this transformed data in an operational data store (ODS) database. The integrated data are then moved to yet another database, often called the data warehouse database, where the data is arranged into hierarchical groups, often called dimensions, and into facts and aggregate facts. The combination of facts and dimensions is sometimes called a star schema. The access layer helps users retrieve data.[1]
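A minimal sketch of that staging-integration-access flow follows. The source records, field names, and the tiny star schema are all invented for illustration; real warehouses use ETL tooling and a relational database rather than in-memory Python structures.

# Invented source records and a tiny star schema, illustrating the layered flow.
# Staging layer: raw extracts from two disparate source systems, kept as-is
staging = {
    "crm": [{"cust": "C1", "name": "Acme "}, {"cust": "C2", "name": "Zenco"}],
    "pos": [{"cust": "C1", "sku": "A-100", "amount": "19.90", "day": "2012-09-01"},
            {"cust": "C2", "sku": "B-200", "amount": "45.00", "day": "2012-09-01"}],
}

# Integration layer: cleanse and conform the staged data (an ODS-like store)
ods_customers = {r["cust"]: r["name"].strip() for r in staging["crm"]}
ods_sales = [{"cust": r["cust"], "sku": r["sku"],
              "amount": float(r["amount"]), "day": r["day"]} for r in staging["pos"]]

# Warehouse database: facts keyed to a dimension (a minimal star schema)
dim_customer = {key: {"name": name} for key, name in ods_customers.items()}
fact_sales = [{"cust_key": s["cust"], "day": s["day"], "amount": s["amount"]}
              for s in ods_sales]

# Access layer: a simple aggregate query for reporting
totals = {}
for f in fact_sales:
    name = dim_customer[f["cust_key"]]["name"]
    totals[name] = totals.get(name, 0.0) + f["amount"]
print(totals)   # {'Acme': 19.9, 'Zenco': 45.0}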

A data warehouse constructed from integrated data source systems does not require ETL, staging databases, or operational data store databases. The integrated data source systems may be considered to be a part of a distributed operational data store layer. Data federation methods or data virtualization methods may be used to access the distributed integrated source data systems to consolidate and aggregate data directly into the data warehouse database tables. Unlike the ETL-based data warehouse, the integrated source data systems and the data warehouse are all integrated since there is no transformation of dimensional or reference data. This integrated data warehouse architecture supports the drill down from the aggregate data of the data warehouse to the transactional data of the integrated source data systems. Data warehouses can be subdivided into data marts. Data marts store subsets of data from a warehouse.

This definition of the data warehouse focuses on data storage. The main source of the data is cleaned, transformed, cataloged and made available for use by managers and other business professionals for data mining, online analytical processing, market research and decision support (Marakas & O'Brien 2009). However, the means to retrieve and analyze data, to extract, transform and load data, and to manage the data dictionary are also considered essential components of a data warehousing system. Many references to data warehousing use this broader context. Thus, an expanded definition for data warehousing includes business intelligence tools, tools to extract, transform and load data into the repository, and tools to manage and retrieve metadata.

Benefits of a data warehouse A data warehouse maintains a copy of information from the source transaction systems. This architectural complexity provides the opportunity to:
• Maintain data history, even if the source transaction systems do not.
• Integrate data from multiple source systems, enabling a central view across the enterprise. This benefit is always valuable, but particularly so when the organization has grown by merger.
• Improve data quality, by providing consistent codes and descriptions, flagging or even fixing bad data.
• Present the organization's information consistently.
• Provide a single common data model for all data of interest regardless of the data's source.
• Restructure the data so that it makes sense to the business users.
• Restructure the data so that it delivers excellent query performance, even for complex analytic queries, without impacting the operational systems.
• Add value to operational business applications, notably customer relationship management (CRM) systems.

History The concept of data warehousing dates back to the late 1980s,[2] when IBM researchers Barry Devlin and Paul Murphy developed the "business data warehouse". In essence, the data warehousing concept was intended to provide an architectural model for the flow of data from operational systems to decision support environments. The concept attempted to address the various problems associated with this flow, mainly the high costs associated with it. In the absence of a data warehousing architecture, an enormous amount of redundancy was required to support multiple decision support environments. In larger corporations it was typical for multiple decision support environments to operate independently. Though each environment served different users, they often required much of the same stored data. The process of gathering, cleaning and integrating data from various sources, usually from long-term existing operational systems (usually referred to as legacy systems), was typically in part replicated for each environment. Moreover, the operational systems were frequently reexamined as new decision support requirements emerged. Often new requirements necessitated gathering, cleaning and integrating new data from "data marts" that were tailored for ready access by users. Key developments in the early years of data warehousing were:
• 1960s — General Mills and Dartmouth College, in a joint research project, develop the terms dimensions and facts.[3]
• 1970s — ACNielsen and IRI provide dimensional data marts for retail sales.[3]
• 1970s — Bill Inmon begins to define and discuss the term Data Warehouse.
• 1975 — Sperry Univac introduces MAPPER (MAintain, Prepare, and Produce Executive Reports), a database management and reporting system that includes the world's first 4GL. It was the first platform specifically designed for building Information Centers (a forerunner of contemporary Enterprise Data Warehousing platforms).
• 1983 — Teradata introduces a database management system specifically designed for decision support.
• 1983 — Sperry Corporation's Martyn Richard Jones defines the Sperry Information Center approach, which, while not being a true DW in the Inmon sense, did contain many of the characteristics of DW structures and process as defined previously by Inmon, and later by Devlin. It was first used at the TSB England & Wales.
• 1984 — Metaphor Computer Systems, founded by David Liddle and Don Massaro, releases Data Interpretation System (DIS). DIS was a hardware/software package and GUI for business users to create a database management and analytic system.
• 1988 — Barry Devlin and Paul Murphy publish the article An architecture for a business and information system [4] in IBM Systems Journal, where they introduce the term "business data warehouse".
• 1990 — Red Brick Systems, founded by Ralph Kimball, introduces Red Brick Warehouse, a database management system specifically for data warehousing.
• 1991 — Prism Solutions, founded by Bill Inmon, introduces Prism Warehouse Manager, software for developing a data warehouse.
• 1992 — Bill Inmon publishes the book Building the Data Warehouse.[5]
• 1995 — The Data Warehousing Institute, a for-profit organization that promotes data warehousing, is founded.
• 1996 — Ralph Kimball publishes the book The Data Warehouse Toolkit.[6]
• 2000 — Daniel Linstedt releases the Data Vault, enabling real-time, auditable data warehouses.

Dimensional vs. Normalized approach for storage of data There are two leading approaches to storing data in a data warehouse: the dimensional approach and the normalized approach. The dimensional approach, whose supporters are referred to as "Kimballites", follows Ralph Kimball's approach, which holds that the data warehouse should be modeled using a dimensional model/star schema. The normalized approach, also called the 3NF model, whose supporters are referred to as "Inmonites", follows Bill Inmon's approach, which holds that the data warehouse should be modeled using an E-R model/normalized model. In a dimensional approach, transaction data are partitioned into "facts", which are generally numeric transaction data, and "dimensions", which are the reference information that gives context to the facts. For example, a sales transaction can be broken up into facts such as the number of products ordered and the price paid for the products, and into dimensions such as order date, customer name, product number, order ship-to and bill-to locations, and salesperson responsible for receiving the order. A key advantage of a dimensional approach is that the data warehouse is easier for the user to understand and to use. Also, the retrieval of data from the data warehouse tends to operate very quickly. Dimensional structures are easy for business users to understand, because the structure is divided into measurements/facts and context/dimensions. Facts are related to the organization's business processes and operational system, whereas the dimensions surrounding them contain context about the measurement (Kimball, Ralph 2008). The main disadvantages of the dimensional approach are: 1. in order to maintain the integrity of facts and dimensions, loading the data warehouse with data from different operational systems is complicated, and 2. it is difficult to modify the data warehouse structure if the organization adopting the dimensional approach changes the way in which it does business. In the normalized approach, the data in the data warehouse are stored following, to a degree, database normalization rules. Tables are grouped together by subject areas that reflect general data categories (e.g., data on customers, products, finance, etc.). The normalized structure divides data into entities, which creates several tables in a relational database. When applied in large enterprises the result is dozens of tables that are linked together by a web of joins. Furthermore, each of the created entities is converted into separate physical tables when the database is implemented (Kimball, Ralph 2008). The main advantage of this approach is that it is straightforward to add information into the database. A disadvantage of this approach is that, because of the number of tables involved, it can be difficult for users both to: 1. join data from different sources into meaningful information, and 2. access the information without a precise understanding of the sources of data and of the data structure of the data warehouse. Both normalized and dimensional models can be represented in entity-relationship diagrams, as both contain joined relational tables; the difference between the two models is the degree of normalization. These approaches are not mutually exclusive, and there are other approaches. Dimensional approaches can involve normalizing data to a degree (Kimball, Ralph 2008). In Information-Driven Business (Wiley 2010),[7] Robert Hillard proposes an approach to comparing the two approaches based on the information needs of the business problem. The technique shows that normalized models hold far more information than their dimensional equivalents (even when the same fields are used in both models), but this extra information comes at the cost of usability. The technique measures information quantity in terms of information entropy and usability in terms of the Small Worlds data transformation measure.[8]
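The toy sketch below contrasts the two modeling styles on the same (invented) product and sales data: the dimensional version answers a category-level question from one wide dimension, while the normalized version reaches the same answer through extra joins. Table names and values are illustrative only.

# Invented tables and values; the same content modeled two ways.
# Dimensional (star schema): one wide, denormalized product dimension
dim_product = {1: {"sku": "A-100", "name": "Widget",
                   "category": "Hardware", "brand": "Acme"}}
fact_sales = [{"product_key": 1, "order_date": "2012-09-01", "qty": 3, "price": 20.0}]

# Normalized (3NF): the same information split into entity tables joined by keys
brands     = {10: {"brand": "Acme"}}
categories = {20: {"category": "Hardware"}}
products   = {1: {"sku": "A-100", "name": "Widget", "brand_id": 10, "category_id": 20}}
orders     = [{"product_id": 1, "order_date": "2012-09-01", "qty": 3, "price": 20.0}]

# "Revenue for the Hardware category": one lookup in the dimensional model ...
rev_dim = sum(f["qty"] * f["price"] for f in fact_sales
              if dim_product[f["product_key"]]["category"] == "Hardware")

# ... but an extra hop through the product table in the normalized model
rev_norm = sum(o["qty"] * o["price"] for o in orders
               if categories[products[o["product_id"]]["category_id"]]["category"]
               == "Hardware")
print(rev_dim, rev_norm)   # 60.0 60.0 -- same content, different structure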

Top-down versus bottom-up design methodologies

Bottom-up design

Ralph Kimball, a well-known author on data warehousing,[9] is a proponent of an approach to data warehouse design which he describes as bottom-up.[10] In the bottom-up approach, data marts are first created to provide reporting and analytical capabilities for specific business processes. It is important to note, though, that in the Kimball methodology the bottom-up process is the result of an initial business-oriented top-down analysis of the relevant business processes to be modelled.

Data marts contain, primarily, dimensions and facts. Facts can contain atomic data and, if necessary, summarized data. A single data mart often models a specific business area such as "Sales" or "Production." These data marts can eventually be integrated to create a comprehensive data warehouse. The integration of data marts is managed through the implementation of what Kimball calls "a data warehouse bus architecture".[11] The data warehouse bus architecture is primarily an implementation of "the bus", a collection of conformed dimensions and conformed facts, which are dimensions that are shared (in a specific way) between facts in two or more data marts.

The integration of the data marts in the data warehouse is centered on the conformed dimensions (residing in "the bus") that define the possible integration "points" between data marts. The actual integration of two or more data marts is then done by a process known as "drill across". A drill-across works by grouping (summarizing) the data along the keys of the (shared) conformed dimensions of each fact participating in the "drill across", followed by a join on the keys of these grouped (summarized) facts.

Maintaining tight management over the data warehouse bus architecture is fundamental to maintaining the integrity of the data warehouse. The most important management task is making sure dimensions among data marts are consistent. In Kimball's words, this means that the dimensions "conform".

Some consider it an advantage of the Kimball method that the data warehouse ends up being "segmented" into a number of logically self-contained (up to and including the bus) and consistent data marts, rather than a big and often complex centralized model. Business value can be returned as quickly as the first data marts can be created, and the method lends itself well to an exploratory and iterative approach to building data warehouses. For example, the data warehousing effort might start in the "Sales" department, by building a Sales data mart. Upon completion of the Sales data mart, the business might then decide to expand the warehousing activities into, say, the "Production" department, resulting in a Production data mart. The requirement for the Sales data mart and the Production data mart to be integrable is that they share the same bus: that is, that the data warehousing team has made the effort to identify and implement the conformed dimensions in the bus, and that the individual data marts link to that information from the bus. Note that this does not require 100% awareness from the outset of the data warehousing effort; no master plan is required upfront. The Sales data mart is good as it is (assuming that the bus is complete), and the Production data mart can be constructed virtually independently of the Sales data mart (but not independently of the bus).

If integration via the bus is achieved, the data warehouse, through its two data marts, will not only be able to deliver the specific information that the individual data marts are designed to deliver (in this example either "Sales" or "Production" information), but can also deliver integrated Sales-Production information, which often is of critical business value. Such integration can be achieved in a flexible and iterative fashion; a small sketch of the drill-across mechanism follows below.
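The following sketch illustrates the drill-across mechanism described above, again using the sqlite3 module from the Python standard library. The two fact tables and the conformed date dimension (fact_sales, fact_production, dim_date) are hypothetical names invented for this illustration; the essential point is only that each fact table is first summarized to the shared conformed dimension and the summaries are then joined on its keys.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date        (date_key INTEGER PRIMARY KEY, month TEXT);  -- conformed dimension shared by both marts
CREATE TABLE fact_sales      (date_key INTEGER, units_sold  INTEGER);     -- Sales data mart
CREATE TABLE fact_production (date_key INTEGER, units_built INTEGER);     -- Production data mart

INSERT INTO dim_date        VALUES (20120101, '2012-01'), (20120201, '2012-02');
INSERT INTO fact_sales      VALUES (20120101, 10), (20120101, 5), (20120201, 7);
INSERT INTO fact_production VALUES (20120101, 12), (20120201, 9);
""")

# Drill across: summarize each fact table along the conformed dimension key,
# then join the two summaries on that key to obtain integrated Sales-Production rows.
rows = conn.execute("""
    SELECT d.month, s.total_sold, p.total_built
    FROM (SELECT date_key, SUM(units_sold)  AS total_sold  FROM fact_sales      GROUP BY date_key) s
    JOIN (SELECT date_key, SUM(units_built) AS total_built FROM fact_production GROUP BY date_key) p
         ON s.date_key = p.date_key
    JOIN dim_date d ON d.date_key = s.date_key
""").fetchall()
print(rows)   # [('2012-01', 15, 12), ('2012-02', 7, 9)]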

Top-down design

Bill Inmon, one of the first authors on the subject of data warehousing, has defined a data warehouse as a centralized repository for the entire enterprise.[11] Inmon is one of the leading proponents of the top-down approach to data warehouse design, in which the data warehouse is designed using a normalized enterprise data model. "Atomic" data, that is, data at the lowest level of detail, are stored in the data warehouse. Dimensional data marts containing data needed for specific business processes or specific departments are created from the data warehouse. In the Inmon vision, the data warehouse is at the center of the "Corporate Information Factory" (CIF), which provides a logical framework for delivering business intelligence (BI) and business management capabilities. Inmon states that the data warehouse is:
• Subject-oriented: The data in the data warehouse is organized so that all the data elements relating to the same real-world event or object are linked together.
• Non-volatile: Data in the data warehouse are never over-written or deleted — once committed, the data are static, read-only, and retained for future reporting.
• Integrated: The data warehouse contains data from most or all of an organization's operational systems, and these data are made consistent.
• Time-variant: Whereas an operational system stores only the current value of a data element, the data warehouse retains historical values, so that changes over time can be reported.
The top-down design methodology generates highly consistent dimensional views of data across data marts since all data marts are loaded from the centralized repository. Top-down design has also proven to be robust against business changes. Generating new dimensional data marts against the data stored in the data warehouse is a relatively simple task. The main disadvantage of the top-down methodology is that it represents a very large project with a very broad scope. The up-front cost for implementing a data warehouse using the top-down methodology is significant, and the duration of time from the start of the project to the point that end users experience initial benefits can be substantial. In addition, the top-down methodology can be inflexible and unresponsive to changing departmental needs during the implementation phases.[11]

Hybrid design

Data warehouse (DW) solutions often resemble the hub and spokes architecture. Legacy systems feeding the DW/BI solution often include customer relationship management (CRM) and enterprise resource planning (ERP) solutions, generating large amounts of data. To consolidate these various data models, and to facilitate the extract transform load (ETL) process, DW solutions often make use of an operational data store (ODS). The information from the ODS is then parsed into the actual DW. To reduce data redundancy, larger systems will often store the data in a normalized way. Data marts for specific reports can then be built on top of the DW solution.

It is important to note that the DW database in a hybrid solution is kept in third normal form to eliminate data redundancy. A normal relational database, however, is not efficient for business intelligence reports where dimensional modelling is prevalent. Small data marts can shop for data from the consolidated warehouse and use the filtered, specific data for the fact tables and dimensions required. The DW effectively provides a single source of information from which the data marts can read, creating a highly flexible solution from a BI point of view. The hybrid architecture allows a DW to be replaced with a master data management solution where operational, not static, information could reside. The Data Vault Modeling components follow the hub and spokes architecture. This modeling style is a hybrid design, consisting of best-of-breed practices from both third normal form and the star schema. The Data Vault model is not a true third normal form, and breaks some of the rules that 3NF dictates be followed. It is, however, a top-down architecture with a bottom-up design. The Data Vault model is geared to be strictly a data warehouse. It is not geared to be end-user accessible and, when built, still requires the use of a data mart or star schema based release area for business purposes.

Data warehouses versus operational systems

Operational systems are optimized for preservation of data integrity and speed of recording of business transactions through use of database normalization and an entity-relationship model. Operational system designers generally follow the Codd rules of database normalization in order to ensure data integrity. Codd defined five increasingly stringent rules of normalization. Fully normalized database designs (that is, those satisfying all five Codd rules) often result in information from a business transaction being stored in dozens to hundreds of tables. Relational databases are efficient at managing the relationships between these tables. The databases have very fast insert/update performance because only a small amount of data in those tables is affected each time a transaction is processed. Finally, in order to improve performance, older data are usually periodically purged from operational systems.

Evolution in organization use

These terms refer to the level of sophistication of a data warehouse:
• Offline operational data warehouse: Data warehouses in this stage of evolution are updated on a regular time cycle (usually daily, weekly or monthly) from the operational systems, and the data are stored in an integrated, reporting-oriented data store.
• Offline data warehouse: Data warehouses at this stage are updated from data in the operational systems on a regular basis, and the data warehouse data are stored in a data structure designed to facilitate reporting.
• On-time data warehouse: Online integrated data warehousing represents the real-time stage; data in the warehouse is updated for every transaction performed on the source data.
• Integrated data warehouse: These data warehouses assemble data from different areas of business, so users can look up the information they need across other systems.[12]

Sample applications

Some of the applications data warehousing can be used for are:
• Agriculture[13]
• Call record analysis
• Churn prediction for telecom subscribers, credit card users, etc.
• Decision support
• Financial forecasting
• Insurance fraud analysis
• Logistics and inventory management
• Trend analysis

References

[1] Patil, Preeti S.; Srikantha Rao; Suryakant B. Patil (2011). "Optimization of Data Warehousing System: Simplification in Reporting and Analysis" (http://www.ijcaonline.org/proceedings/icwet/number9/2131-db195). IJCA Proceedings on International Conference and Workshop on Emerging Trends in Technology (ICWET) (Foundation of Computer Science) 9 (6): 33–37.
[2] "The Story So Far" (http://www.computerworld.com/databasetopics/data/story/0,10801,70102,00.html). 2002-04-15. Retrieved 2008-09-21.
[3] Kimball 2002, pg. 16
[4] http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5387658
[5] Inmon, Bill (1992). Building the Data Warehouse. Wiley. ISBN 0-471-56960-7.
[6] Kimball, Ralph (1996). The Data Warehouse Toolkit. Wiley. ISBN 0-471-15337-0.
[7] Hillard, Robert (2010). Information-Driven Business. Wiley. ISBN 978-0-470-62577-4.
[8] http://mike2.openmethodology.org/wiki/Small_Worlds_Data_Transformation_Measure
[9] Kimball 2002, pg. 310
[10] "The Bottom-Up Misnomer" (http://www.kimballgroup.com/html/articles_search/articles2003/0309IE.html?TrkID=IE200309_2). 2003-09-17. Retrieved 2012-02-14.
[11] Ericsson 2004, pp. 28–29
[12] "Data Warehouse" (http://www.tech-faq.com/data-warehouse.html).
[13] Abdullah, Ahsan (2009). "Analysis of mealybug incidence on the cotton crop using ADSS-OLAP (Online Analytical Processing) tool". Computers and Electronics in Agriculture 69 (1): 59–72. doi:10.1016/j.compag.2009.07.003.

Further reading
• Davenport, Thomas H. and Harris, Jeanne G. Competing on Analytics: The New Science of Winning (2007) Harvard Business School Press. ISBN 978-1-4221-0332-6
• Ganczarski, Joe. Data Warehouse Implementations: Critical Implementation Factors Study (2009) VDM Verlag. ISBN 3-639-18589-7, ISBN 978-3-639-18589-8
• Kimball, Ralph and Ross, Margy. The Data Warehouse Toolkit, Second Edition (2002) John Wiley and Sons, Inc. ISBN 0-471-20024-7
• Linstedt, Graziano, Hultgren. The Business of Data Vault Modeling, Second Edition (2010) Dan Linstedt. ISBN 978-1-4357-1914-9
• William Inmon. Building the Data Warehouse (2005) John Wiley and Sons. ISBN 978-8-1265-0645-3

External links

• Ralph Kimball articles (http://www.kimballgroup.com/html/articles.html)

• International Journal of Computer Applications (http://www.ijcaonline.org/archives/number3/77-172)

Data mart

A data mart is the access layer of the data warehouse environment that is used to get data out to the users. The data mart is a subset of the data warehouse that is usually oriented to a specific business line or team. In some deployments, each department or business unit is considered the owner of its data mart, including all the hardware, software and data.[1] This enables each department to use, manipulate and develop its data any way it sees fit, without altering information inside other data marts or the data warehouse. In other deployments where conformed dimensions are used, this business unit ownership will not hold true for shared dimensions like customer, product, etc. The related term spreadmart describes the situation that occurs when one or more business analysts develop a system of linked spreadsheets to perform a business analysis, then grow it to a size and degree of complexity that makes it nearly impossible to maintain.

Design schemas
• Star schema - fairly popular design choice; enables a relational database to emulate the analytical functionality of a multidimensional database
• Snowflake schema
• Data Mart Architecture Pattern (EA Reference Architecture)

Reasons for creating a data mart
• Easy access to frequently needed data
• Creates collective view by a group of users
• Improves end-user response time
• Ease of creation
• Lower cost than implementing a full data warehouse
• Potential users are more clearly defined than in a full data warehouse
• Contains only business essential data and is less cluttered.

Dependent data mart

According to the Inmon school of data warehousing, a dependent data mart is a logical subset (view) or a physical subset (extract) of a larger data warehouse, isolated for one of the following reasons:
• A need for a special data model or schema: e.g., to restructure for OLAP
• Performance: to offload the data mart to a separate computer for greater efficiency or to obviate the need to manage that workload on the centralized data warehouse.
• Security: to separate an authorized data subset selectively
• Expediency: to bypass the data governance and authorizations required to incorporate a new application into the Enterprise Data Warehouse
• Proving ground: to demonstrate the viability and ROI (return on investment) potential of an application prior to migrating it to the Enterprise Data Warehouse

• Politics: a coping strategy for IT (Information Technology) in situations where a user group has more influence than funding or is not a good citizen on the centralized data warehouse.
• Politics: a coping strategy for consumers of data in situations where a data warehouse team is unable to create a usable data warehouse.

According to the Inmon school of data warehousing, tradeoffs inherent with data marts include limited scalability, duplication of data, data inconsistency with other silos of information, and inability to leverage enterprise sources of data. The alternative school of data warehousing is that of Ralph Kimball. In his view, a data warehouse is nothing more than the union of all the data marts. This view helps to reduce costs and provides fast development, but can create an inconsistent data warehouse, especially in large organizations. Therefore, Kimball's approach is more suitable for small-to-medium corporations.[2]

References

[1] Data Mart Does Not Equal Data Warehouse (http://www.information-management.com/infodirect/19991120/1675-1.html)
[2] Paulraj Ponniah. Data Warehousing Fundamentals for IT Professionals. Wiley, 2010, pp. 29–32. ISBN 0470462078.

Bibliography

• Inmon, William (2000-07-18). "Data Mart Does Not Equal Data Warehouse" (http://csis.bits-pilani.ac.in/faculty/goel/Data Warehousing/Articles/Data Marts/dataWarehouse_com Article_DM VS DW.htm). DMReview.com.

Forrester Research

Forrester Research, Inc.
• Type: Public
• Traded as: NASDAQ: FORR [1]
• Industry: Market research
• Founded: 1983, by George F. Colony
• Headquarters: 60 Acorn Park Drive, Cambridge, MA
• Employees: 1,068 (as of September 30, 2008)
• Website: Forrester Research [2]

Forrester Research is an independent technology and market research company that provides advice on the existing and potential impact of technology to its clients and the public. Forrester Research has five research centers in the US: Cambridge, Massachusetts; New York, New York; San Francisco, California; Washington, D.C.; and Dallas, Texas. It also has four European research centers in Amsterdam, Frankfurt, London, and Paris. The firm has 27 sales locations worldwide. It offers a variety of services including syndicated research on technology as it relates to business, quantitative market research on consumer technology adoption as well as enterprise IT spending, research-based consulting and advisory services, events, workshops, teleconferences, and executive peer-networking programs.

History

Forrester Research was founded in July 1983 by George Forrester Colony, now chairman of the board and chief executive officer, in Cambridge, Massachusetts. The company's first report, "The Professional Automation Report," was published in November 1983. In November 1996, Forrester announced its initial public offering of 2,300,000 shares. In February 2000, the company announced its secondary public offering of 626,459 shares. The company acquired Fletcher Research, a British Internet research firm, in November 1999; FORIT GmbH, a German market research and consulting firm, in October 2000; and Giga Information Group, a Cambridge, Massachusetts-based information technology research and consulting company, in February 2003.[3] In July 2008, Forrester announced the acquisition of New York City-based JupiterResearch.[4] Forrester acquired Strategic Oxygen on December 1, 2009, and Springboard Research on May 12, 2011.

Leadership
• George F. Colony, Chairman of the Board, President, and CEO
• Charles Rutstein, Chief Operating Officer
• Tom Pohlmann, Chief Marketing & Strategy Officer
• Michael Doyle, Chief Financial Officer
• Elizabeth Lemons, Chief People Officer
• Gail S. Mann, Chief Legal Officer and Secretary
• Steven Peltzman, Chief Business Technology Officer
• Dennis van Lingen, Managing Director, Marketing & Strategy Client Group; Chief Europe, Middle East, & Africa Officer
• Ellen Daley, Managing Director, Business Technology Client Group

Competitors
• AMR Research (bought by Gartner)
• Aberdeen Group
• ABI Research
• Basex
• Berg Insight
• Canalys
• CMS Watch
• Cutter Consortium
• The 451 Group
• Gartner
• GigaOM Pro
• The Gilbane Group
• GfK
• Gleanster Research
• IDC
• Info-Tech Research Group
• IANS (formerly the Institute of Applied Network Security)
• Leading Edge Forum
• Nemertes Research
• Ovum Ltd.
• Yankee Group

References

[1] http://www.nasdaq.com/symbol/forr
[2] http://www.forrester.com
[3] BNet, "Forrester Research to acquire Giga Information Group for 0.74 times revenue" (http://findarticles.com/p/articles/mi_qa3755/is_200302/ai_n9221953/)
[4] Forbes Magazine, "Forrester Buys JupiterResearch For $23 Million" (http://www.forbes.com/technology/2008/07/31/forrester-buys-jupiter-research-tech-cx_pco_0731paidcontent.html)

External links

• Forrester Research (http://www.forrester.com/)

Information management

Information management (IM) is the collection and management of information from one or more sources and the distribution of that information to one or more audiences. This sometimes involves those who have a stake in, or a right to, that information. Management means the organization of and control over the structure, processing and delivery of information. Throughout the 1970s this was largely limited to files, file maintenance, and the life cycle management of paper-based files, other media and records. With the proliferation of information technology starting in the 1970s, the job of information management took on a new dimension, and also began to include the field of data maintenance. No longer was information management a simple job that could be performed by almost anyone. An understanding of the technology involved, and the theory behind it, became necessary. As information storage shifted to electronic means, this became more and more difficult. By the late 1990s, when information was regularly disseminated across computer networks and by other electronic means, network managers, in a sense, became information managers. Those individuals found themselves responsible for increasingly complex tasks, hardware and software. With the latest tools available, information management has become a powerful resource and a large expense for many organizations. In short, information management entails organizing, retrieving, acquiring, securing and maintaining information. It is closely related to and overlapping with the practice of data management.

Information management concepts

Following the behavioral science theory of management, mainly developed at Carnegie Mellon University and prominently represented by Barnard, Richard M. Cyert, March and Simon, most of what goes on in service organizations is actually decision making and information processes. The crucial factor in the information and decision process analysis is thus individuals' limited ability to process information and to make decisions under these limitations. According to March and Simon,[1] organizations have to be considered as cooperative systems with a high level of information processing and a vast need for decision making at various levels. They also claimed that there are factors that would prevent individuals from acting strictly rationally, contrary to what has been proposed and advocated by classic theorists. Instead of using the model of the economic man, as advocated in classic theory, they proposed the administrative man as an alternative, based on their argumentation about the cognitive limits of rationality. While the theories developed at Carnegie Mellon clearly filled some theoretical gaps in the discipline, March and Simon[1] did not propose a certain organizational form that they considered especially feasible for coping with cognitive limitations and bounded rationality of decision-makers. Through their own argumentation against
normative decision-making models, i.e., models that prescribe how people ought to choose, they also abandoned the idea of an ideal organizational form. In addition to the factors mentioned by March and Simon, there are two other considerable aspects, stemming from environmental and organizational dynamics. Firstly, it is not possible to access, collect and evaluate all environmental information relevant for taking a certain decision at a reasonable price, i.e., time and effort.[2] In other words, following a national economic framework, the transaction cost associated with the information process is too high. Secondly, established organizational rules and procedures can prevent the taking of the most appropriate decision, i.e., a sub-optimal solution is chosen in accordance with organizational rank structure or institutional rules, guidelines and procedures,[3][4] an issue that has also been brought forward as a major critique against the principles of bureaucratic organizations.[5]

According to the Carnegie Mellon School and its followers, information management, i.e., the organization's ability to process information, is at the core of organizational and managerial competencies. Consequently, strategies for organization design must aim at improved information processing capability. Jay Galbraith[6] has identified five main organization design strategies within two categories:
1. Reduction of information processing needs
   1. Environmental management
   2. Creation of slack resources
   3. Creation of self-contained tasks
2. Increasing the organizational information processing capacity
   1. Creation of lateral relations
   2. Vertical information systems

Environmental management. Instead of adapting to changing environmental circumstances, the organization can seek to modify its environment. Vertical and horizontal collaboration, i.e. cooperation or integration with other organizations in the industry value system, are typical means of reducing uncertainty. An example of reducing uncertainty in relation to the prior or demanding stage of the industry system is the concept of Supplier-Retailer collaboration or Efficient Customer Response.

Creation of slack resources. In order to reduce exceptions, performance levels can be reduced, thus decreasing the information load on the hierarchy. These additional slack resources, required to reduce information processing in the hierarchy, represent an additional cost to the organization. The choice of this method clearly depends on the alternative costs of other strategies.

Creation of self-contained tasks. Achieving a conceptual closure of tasks is another way of reducing information processing. In this case, the task-performing unit has all the resources required to perform the task. This approach is concerned with task (de-)composition and interaction between different organizational units, i.e. organizational and information interfaces.

Creation of lateral relations. In this case, lateral decision processes are established that cut across functional organizational units. The aim is to apply a system of decision subsidiarity, i.e. to move decision power to the process, instead of moving information from the process into the hierarchy for decision-making.

Investment in vertical information systems. Instead of processing information through the existing hierarchical channels, the organization can establish vertical information systems.
In this case, the information flow for a specific task (or set of tasks) is routed in accordance with the applied business logic, rather than the hierarchical organization. Following the lateral relations concept, it also becomes possible to employ an organizational form that is different from the simple hierarchical structure. The matrix organization aims at bringing together the functional and product departmental bases and achieving a balance in information processing and decision making between the vertical (hierarchical) and the horizontal (product or project) structure. The creation of a matrix organization can also be considered as management's response to a persistent or permanent demand for adaptation to environmental dynamics, instead of the response to episodic demands.

References
[1] March, James G. and Simon, Herbert A. (1958), Organizations, John Wiley & Sons
[2] Hedberg, Bo (1981), "How organizations learn and unlearn", in: Nyström, P.C. & Starbuck, W.H., Handbook of Organizational Design, Oxford University Press
[3] Mackenzie K.D. (1978), Organizational Structures, AHM Publishing Corporation
[4] Mullins, L.J (1993), Management and Organizational Behaviours, 3rd ed., Pitman Publishing
[5] Wigand, Rolf T., Picot, Arnold and Reichwald, Ralf (1997), Information, Organization and Management: Expanding Markets and Corporate Boundaries, Wiley & Sons
[6] Galbraith, Jay R. (1977), Organization Design, Addison-Wesley, p. 49 ff.

External links

• TLCG Information Server (http://www.thelondonconsulting.com/products/information-server)
• Information Management Paper (http://www.managing-information.org.uk/summary.htm)
• "Information Management Body of Knowledge" (http://www.imbok.org) (Offers a 170pp freely-distributable, broadly-based examination of IM issues)
• Information Management papers from IPL (http://www.ipl.com/services/businessconsulting/resources/#businessconsulting-papers)

Business process

A business process or business method is a collection of related, structured activities or tasks that produce a specific service or product (serve a particular goal) for a particular customer or customers. It often can be visualized with a flowchart as a sequence of activities with interleaving decision points or with a Process Matrix as a sequence of activities with relevance rules based on the data in the process.

Overview

There are three types of business processes:
1. Management processes, the processes that govern the operation of a system. Typical management processes include "Corporate Governance" and "Strategic Management".
2. Operational processes, processes that constitute the core business and create the primary value stream. Typical operational processes are Purchasing, Manufacturing, Advertising and Marketing, and Sales.
3. Supporting processes, which support the core processes. Examples include Accounting, Recruitment, Call center, Technical support.

A business process begins with a mission objective and ends with achievement of the business objective. Process-oriented organizations break down the barriers of structural departments and try to avoid functional silos. A business process can be decomposed into several sub-processes,[1] which have their own attributes but also contribute to achieving the goal of the super-process. The analysis of business processes typically includes the mapping of processes and sub-processes down to activity level. Business processes are designed to add value for the customer and should not include unnecessary activities. The outcome of a well designed business process is increased effectiveness (value for the customer) and increased efficiency (lower costs for the company).

Business Processes can be modeled through a large number of methods and techniques. For instance, the Business Process Modeling Notation is a Business Process Modeling technique that can be used for drawing business processes in a workflow.

History

Adam Smith

One of the most significant people in the 18th century to describe processes was Adam Smith in his famous (1776) example of a pin factory. Inspired by an article in Diderot's Encyclopédie, Smith described the production of a pin in the following way:

"One man draws out the wire, another straights it, a third cuts it, a fourth points it, a fifth grinds it at the top for receiving the head: to make the head requires two or three distinct operations: to put it on is a particular business, to whiten the pins is another ... and the important business of making a pin is, in this manner, divided into about eighteen distinct operations, which in some manufactories are all performed by distinct hands, though in others the same man will sometime perform two or three of them."

Smith also first recognized how the output could be increased through the use of labor division. Previously, in a society where production was dominated by handcrafted goods, one man would perform all the activities required during the production process, while Smith described how the work was divided into a set of simple tasks, which would be performed by specialized workers. Labor division in Smith's example resulted in productivity increasing by 24,000 percent (sic), i.e. the same number of workers made 240 times as many pins as they had been producing before the introduction of labor division. It is worth noting that Smith did not advocate labor division at any price and per se. The appropriate level of task division was defined through experimental design of the production process. In contrast to Smith's view, which was limited to the same functional domain and comprised activities that are in direct sequence in the manufacturing process, today's process concept includes cross-functionality as an important characteristic. Following his ideas, the division of labor was adopted widely, while the integration of tasks into a functional, or cross-functional, process was not considered as an alternative option until much later.

Other definitions

In the early 1990s, US corporations, and subsequently companies all over the world, started to adopt the concept of reengineering in an attempt to re-achieve the competitiveness that they had lost during the previous decade. A key characteristic of Business Process Reengineering (BPR) is the focus on business processes. Davenport (1993)[2] defines a (business) process as:

"a structured, measured set of activities designed to produce a specific output for a particular customer or market. It implies a strong emphasis on how work is done within an organization, in contrast to a product focus's emphasis on what. A process is thus a specific ordering of work activities across time and space, with a beginning and an end, and clearly defined inputs and outputs: a structure for action. ... Taking a process approach implies adopting the customer's point of view. Processes are the structure by which an organization does what is necessary to produce value for its customers."

This definition contains certain characteristics a process must possess. These characteristics are achieved by a focus on the business logic of the process (how work is done), instead of taking a product perspective (what is done). Following Davenport's definition of a process we can conclude that a process must have clearly defined boundaries, input and output, that it consists of smaller parts, activities, which are ordered in time and space, that there must be a receiver of the process outcome - a customer - and that the transformation taking place within the process must add customer value.

Hammer & Champy's (1993)[3] definition can be considered as a subset of Davenport's. They define a process as:

"a collection of activities that takes one or more kinds of input and creates an output that is of value to the customer."

As we can note, Hammer & Champy have a more transformation-oriented perception, and put less emphasis on the structural component - process boundaries and the order of activities in time and space.

Rummler & Brache (1995)[4] use a definition that clearly encompasses a focus on the organization's external customers, when stating that:

"a business process is a series of steps designed to produce a product or service. Most processes (...) are cross-functional, spanning the 'white space' between the boxes on the organization chart. Some processes result in a product or service that is received by an organization's external customer. We call these primary processes. Other processes produce products that are invisible to the external customer but essential to the effective management of the business. We call these support processes."

The above definition distinguishes two types of processes, primary and support processes, depending on whether a process is directly involved in the creation of customer value, or concerned with the organization's internal activities. In this sense, Rummler and Brache's definition follows Porter's value chain model, which also builds on a division of primary and secondary activities. According to Rummler and Brache, a typical characteristic of a successful process-based organization is the absence of secondary activities in the primary value flow that is created in the customer oriented primary processes. The characteristic of processes as spanning the white space on the organization chart indicates that processes are embedded in some form of organizational structure. Also, a process can be cross-functional, i.e. it ranges over several business functions.

Finally, let us consider the process definition of Johansson et al. (1993).[5] They define a process as:

"a set of linked activities that take an input and transform it to create an output. Ideally, the transformation that occurs in the process should add value to the input and create an output that is more useful and effective to the recipient either upstream or downstream."

This definition also emphasizes the constitution of links between activities and the transformation that takes place within the process. Johansson et al. also include the upstream part of the value chain as a possible recipient of the process output. Summarizing the four definitions above, we can compile the following list of characteristics for a business process:
1. Definability: It must have clearly defined boundaries, input and output.
2. Order: It must consist of activities that are ordered according to their position in time and space.
3. Customer: There must be a recipient of the process' outcome, a customer.
4. Value-adding: The transformation taking place within the process must add value to the recipient, either upstream or downstream.
5. Embeddedness: A process cannot exist in itself; it must be embedded in an organizational structure.
6. Cross-functionality: A process regularly can, but not necessarily must, span several functions.

Frequently, a process owner, i.e. a person responsible for the performance and continuous improvement of the process, is also considered as a prerequisite.

Importance of the Process Chain

Business processes comprise a set of sequential sub-processes or tasks, with alternative paths depending on certain conditions as applicable, performed to achieve a given objective or produce given outputs. Each process has one or more needed inputs. The inputs and outputs may be received from, or sent to, other business processes, other organizational units, or internal or external stakeholders. Business processes are designed to be operated by one or more business functional units, and emphasize the importance of the "process chain" rather than the individual units. In general, the various tasks of a business process can be performed in one of two ways: 1) manually and 2) by means of business data processing systems such as ERP systems. Typically, some process tasks will be manual, while some will be computer-based, and these tasks may be sequenced in many ways. In other words, the data and information that are being handled through the process may pass through manual or computer tasks in any given order.

Policies, Processes and Procedures

The above improvement areas are equally applicable to policies, processes and detailed procedures (sub-processes/tasks). There is a cascading effect of improvements made at a higher level on those made at a lower level. For instance, if a recommendation to replace a given policy with a better one is made with proper justification and accepted in principle by business process owners, then corresponding changes in the consequent processes and procedures will follow naturally in order to enable implementation of the policies.

Manual / Administrative vs. Computer System-Based Internal Controls

Internal controls can be built into manual / administrative process steps and / or computer system procedures. It is advisable to build in as many system controls as possible, since these controls, being automatic, will always be exercised because they are built into the design of the business system software. For instance, an error message preventing an entry of a received raw material quantity exceeding the purchase order quantity by greater than the permissible tolerance percentage will always be displayed and will prevent the system user from entering such a quantity. However, for various reasons such as practicality, the need to be "flexible" (whatever that may signify), lack of business domain knowledge and experience, difficulties in designing/writing software, cost of software development/modification, the incapability of a computerised system to provide controls, etc., all internal controls otherwise considered to be necessary are often not built into business systems and software. In such a scenario, the manual, administrative process controls outside the computer system should be clearly documented, enforced and regularly exercised. For instance, while entering data to create a new record in a material system database's item master table, the only internal control that the system can provide over the item description field is not to allow the user to leave the description blank – in other words, configure item description as a mandatory field. The system obviously cannot alert the user that the description is wrongly spelt, inappropriate, nonsensical, etc. In the absence of such a system-based internal control, the item creation process must include a suitable administrative control through the detailed checking, by a responsible officer, of all fields entered for the new item, by comparing a print-out taken from the system with the item data entry sheet, and ensuring that any corrections in the item description (and other similar fields where no system control is possible) are promptly carried out. Last but not least, the introduction of effective manual, administrative controls usually requires an overriding periodic check by a higher authority to ensure that such controls are exercised in the first place.
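As a minimal sketch of the kind of system-based control described above, the following Python function rejects a goods receipt whose quantity exceeds the purchase order quantity by more than a permissible tolerance. The function name, parameters and the 5% default tolerance are hypothetical, invented only for this illustration.

def check_receipt_quantity(po_quantity, received_quantity, tolerance_pct=5.0):
    """Return (ok, message); reject receipts exceeding the PO quantity beyond the tolerance."""
    allowed_maximum = po_quantity * (1 + tolerance_pct / 100.0)
    if received_quantity > allowed_maximum:
        return False, ("Received quantity {} exceeds purchase order quantity {} "
                       "by more than {}% - entry rejected.".format(received_quantity, po_quantity, tolerance_pct))
    return True, "Receipt accepted."

# Example: a receipt of 106 units against a 100-unit purchase order is rejected at a 5% tolerance.
print(check_receipt_quantity(100, 106))   # (False, '... entry rejected.')
print(check_receipt_quantity(100, 104))   # (True, 'Receipt accepted.')

In a real ERP system such a rule would sit in the data-entry layer; the sketch only shows the check itself.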

Information Reports as an Essential Base for Executing Business Processes

Business processes must include up-to-date and accurate information reports to ensure effective action. An example of this is the availability of purchase order status reports for supplier delivery follow-up as described in the section on effectiveness above. There are numerous examples of this in every possible business process. Another example from production is the process of analysis of line rejections occurring on the shop floor. This process should include systematic periodical analysis of rejections by reason, and present the results in a suitable information report that pinpoints the major reasons, and trends in these reasons, for management to take corrective actions to control rejections and keep them within acceptable limits. Such a process of analysis and summarisation of line rejection events is clearly superior to a process which merely inquires into each individual rejection as it occurs. Business process owners and operatives should realise that process improvement often occurs with introduction of appropriate transaction, operational, highlight, exception or M.I.S. reports, provided these are consciously used for day-to-day or periodical decision-making. With this understanding would hopefully come the willingness to invest time and other resources in business process improvement by introduction of useful and relevant reporting systems.

Supporting theories and concepts

Frederick Winslow Taylor developed the concept of scientific management. The concept contains aspects on the division of labor being relevant to the theory and practice around business processes. The business process related aspects of Taylor's scientific management concept are discussed in the article on Business Process Reengineering.

Span of control

The span of control is the number of sub-ordinates a supervisor manages within a structural organization. Introducing a business process concept has a considerable impact on the structural elements of the organization and thus also on the span of control. Large organizations that are not organized as markets need to be organized in smaller units - departments - which can be defined according to different principles.

Information management concepts

Information management, and the organization design strategies related to it, are a theoretical cornerstone of the business process concept.

References

[1] Anderson, Chris. What are the Top Ten Core Business Processes? (http://www.bizmanualz.com/blog/strategy/what-are-the-ten-core-business-processes.html), Bizmanualz, July 22nd, 2009.
[2] Thomas Davenport (1993). Process Innovation: Reengineering work through information technology. Harvard Business School Press, Boston
[3] Michael Hammer and James Champy (1993). Reengineering the Corporation: A Manifesto for Business Revolution, Harper Business
[4] Rummler & Brache (1995). Improving Performance: How to manage the white space on the organizational chart. Jossey-Bass, San Francisco
[5] Henry J. Johansson et al. (1993). Business Process Reengineering: BreakPoint Strategies for Market Dominance. John Wiley & Sons

Further reading
• Hall, J.M. and Johnson, M.E. (2009, March), "When Should Process Be Art, Not Science", Harvard Business Review, 58–65
• Paul Harmon (2007). Business Process Change: 2nd Ed, A Guide for Business Managers and BPM and Six Sigma Professionals. Morgan Kaufmann
• E. Obeng and S. Crainer (1993). Making Re-engineering Happen. Financial Times Prentice Hall
• Howard Smith and Peter Fingar (2003). Business Process Management: The Third Wave, MK Press
• Slack et al., edited by David Barnes (2000). The Open University, Understanding Business: Processes

External links

• BPTrends Website (http://www.bptrends.com) A free webzine that publishes articles on all aspects of business process.

Business Intelligence portal

Business Intelligence portal or BI portal is the primary access interface for DW/BI (Data Warehouse and Business Intelligence) applications. When using a DW/BI application, the BI portal is the user's first impression of the DW/BI system. The BI portal, which is typically a browser application, is where the user of the DW/BI application has access to all the functionality of the DW/BI system, whether it is reports or other analytical functionality. The BI portal must be implemented in such a way that it is easy for the users of the DW/BI application to call on the functionality of the application.[1] The following components should be integrated in the portal:
• Usable

User should easily find what they need in the BI tool.

• Content Rich

The portal is not just a report printing tool; it should contain more functionality such as advice, help, support information and documentation.

• Clean

The portal should be designed so it is easily understandable and not so complex as to confuse the users.

• Current

The portal should be updated regularly.

• Interactive

The portal should be implemented in a way that makes it easy for users to use its functionality and encourages them to use the portal. Scalability and customization give users the means to fit the portal to their individual needs.

• Value Oriented

It is very important that the user has the feeling that the DW/BI application is a valuable resource that

is worth working on.

The BI portal's main functionality is to provide a navigation system for the DW/BI application. This means that the portal has to be implemented in a way that gives the user access to all the functions of the DW/BI application. This is normally no easy task because DW/BI applications can have a very large and complex structure. The most common way to design the portal is to custom fit it to the business processes of the organization for which the DW/BI application is designed; in that way the portal can best fit the needs and requirements of its users.[2]

The BI portal needs to be easy to use and understand and, if possible, have the same or a similar look and feel as other applications or web content of the organization the DW/BI application is designed for.

References
[1] Kimball, Ralph (2008). The Data Warehouse Lifecycle Toolkit (2nd ed.).
[2] The Microsoft Data Warehouse Toolkit. Wiley Publishing (2006).

External links

• bi-dw.info (http://www.bi-dw.info/)
• MyBusiness Portal (http://www.1300bpo.com/mybusiness_portal/mybusiness_portal.html)
• Ultimate Strategy Information Aggregator (http://www.strategator.com)

Information extraction

Information extraction (IE) is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents. In most cases this activity concerns processing human language texts by means of natural language processing (NLP). Recent activities in multimedia document processing, like automatic annotation and content extraction out of images/audio/video, could be seen as information extraction. Due to the difficulty of the problem, current approaches to IE focus on narrowly restricted domains. An example is the extraction from news wire reports of corporate mergers, denoted by a formal relation such as MergerBetween(company1, company2, date), from an online news sentence such as: "Yesterday, New-York based Foo Inc. announced their acquisition of Bar Corp." A broad goal of IE is to allow computation to be done on the previously unstructured data. A more specific goal is to allow logical reasoning to draw inferences based on the logical content of the input data. Structured data is semantically well-defined data from a chosen target domain, interpreted with respect to category and context.

History

Information extraction dates back to the late 1970s in the early days of NLP.[1] An early commercial system from the mid 1980s was JASPER, built for Reuters by the Carnegie Group with the aim of providing real-time financial news to financial traders.[2] Beginning in 1987, IE was spurred by a series of Message Understanding Conferences (MUC). MUC is a competition-based conference that focused on the following domains:
• MUC-1 (1987), MUC-2 (1989): Naval operations messages.
• MUC-3 (1991), MUC-4 (1992): Terrorism in Latin American countries.
• MUC-5 (1993): Joint ventures and microelectronics domain.
• MUC-6 (1995): News articles on management changes.
• MUC-7 (1998): Satellite launch reports.
Considerable support came from DARPA, the US defense agency, which wished to automate mundane tasks performed by government analysts, such as scanning newspapers for possible links to terrorism.

Present significance

The present significance of IE pertains to the growing amount of information available in unstructured form. Tim Berners-Lee, inventor of the world wide web, refers to the existing Internet as the web of documents[3] and advocates that more of the content be made available as a web of data.[4] Until this transpires, the web largely consists of unstructured documents lacking semantic metadata. Knowledge contained within these documents can be made more accessible for machine processing by means of transformation into relational form, or by marking-up with XML tags. An intelligent agent monitoring a news data feed requires IE to transform unstructured data into something that can be reasoned with. A typical application of IE is to scan a set of documents written in a natural language and populate a database with the information extracted.[5]

Tasks and subtasks

Applying information extraction to text is linked to the problem of text simplification in order to create a structured view of the information present in free text. The overall goal is to create more easily machine-readable text to process the sentences. Typical subtasks of IE include:
• Named entity extraction, which could include:
• Named entity recognition: recognition of known entity names (for people and organizations), place names, temporal expressions, and certain types of numerical expressions, employing existing knowledge of the domain or information extracted from other sentences. Typically the recognition task involves assigning a unique identifier to the extracted entity. A simpler task is named entity detection, which aims to detect entities without having any existing knowledge about the entity instances. For example, in processing the sentence "M. Smith likes fishing", named entity detection would denote detecting that the phrase "M. Smith" does refer to a person, but without necessarily having (or using) any knowledge about a certain M. Smith who is (or might be) the specific person whom that sentence is talking about.
• Coreference resolution: detection of coreference and anaphoric links between text entities. In IE tasks, this is typically restricted to finding links between previously-extracted named entities. For example, "International Business Machines" and "IBM" refer to the same real-world entity. If we take the two sentences "M. Smith likes fishing. But he doesn't like biking", it would be beneficial to detect that "he" is referring to the previously detected person "M. Smith".
• Relationship extraction: identification of relations between entities, such as:
• PERSON works for ORGANIZATION (extracted from the sentence "Bill works for IBM.")
• PERSON located in LOCATION (extracted from the sentence "Bill is in France.")

• Semi-structured information extraction, which may refer to any IE that tries to restore some kind of information structure that has been lost through publication, such as:
• Table extraction: finding and extracting tables from documents.
• Comments extraction: extracting comments from the actual content of an article in order to restore the link between the author and each sentence.
• Language and vocabulary analysis
• Terminology extraction: finding the relevant terms for a given corpus
• Audio extraction
• Template-based music extraction: finding relevant characteristics in an audio signal taken from a given repertoire; for instance, time indexes of occurrences of percussive sounds can be extracted in order to represent the essential rhythmic component of a music piece.[6]
Note that this list is not exhaustive, that the exact meaning of IE activities is not commonly accepted, and that many approaches combine multiple subtasks of IE in order to achieve a wider goal. Machine learning, statistical analysis and/or natural language processing are often used in IE. IE on non-text documents is becoming an increasingly common topic in research, and information extracted from multimedia documents can now be expressed in a high level structure as it is done on text. This naturally leads to the fusion of extracted information from multiple kinds of documents and sources.

World Wide Web applications

IE has been the focus of the MUC conferences. The proliferation of the Web, however, intensified the need for developing IE systems that help people to cope with the enormous amount of data that is available online. Systems that perform IE from online text should meet the requirements of low cost, flexibility in development and easy adaptation to new domains. MUC systems fail to meet those criteria. Moreover, linguistic analysis performed for unstructured text does not exploit the HTML/XML tags and layout format that are available in online text. As a result, less linguistically intensive approaches have been developed for IE on the Web using wrappers, which are sets of highly accurate rules that extract a particular page's content. Manually developing wrappers has proved to be a time-consuming task, requiring a high level of expertise. Machine learning techniques, either supervised or unsupervised, have been used to induce such rules automatically. Wrappers typically handle highly structured collections of web pages, such as product catalogues and telephone directories. They fail, however, when the text type is less structured, which is also common on the Web. Recent effort on adaptive information extraction motivates the development of IE systems that can handle different types of text, from well-structured to almost free text - where common wrappers fail - including mixed types. Such systems can exploit shallow natural language knowledge and thus can be also applied to less structured text.

Approaches

Three standard approaches are now widely accepted:
• Hand-written regular expressions (perhaps stacked)
• Using classifiers
• Generative: naïve Bayes classifier
• Discriminative: maximum entropy models
• Sequence models
• Hidden Markov model
• CMMs/MEMMs

• Conditional random fields (CRF) are commonly used in conjunction with IE for tasks as varied as extracting information from research papers[7] to extracting navigation instructions.[8] Numerous other approaches exist for IE including hybrid approaches that combine some of the standard approaches previously listed.
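As a toy illustration of the first of these approaches (hand-written regular expressions), and of the relationship-extraction subtask discussed earlier, the following Python sketch pulls PERSON-works-for-ORGANIZATION and acquisition relations out of two example sentences. The patterns are deliberately naive and invented for this example; they are not taken from any of the systems cited in this article.

import re

text = ("Yesterday, New-York based Foo Inc. announced their acquisition of Bar Corp. "
        "Bill works for IBM.")

# Naive, hand-written patterns for two relation types.
works_for   = re.compile(r"([A-Z][a-z]+) works for ([A-Z][A-Za-z]+)")
acquisition = re.compile(r"([A-Z][A-Za-z]+ (?:Inc|Corp)\.) announced (?:their|its) acquisition of ([A-Z][A-Za-z]+ (?:Inc|Corp)\.)")

print(works_for.findall(text))     # [('Bill', 'IBM')]
print(acquisition.findall(text))   # [('Foo Inc.', 'Bar Corp.')]

Real systems replace such brittle patterns with the classifier- and sequence-model approaches listed above, but the input/output shape of the task is the same.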

Free or open source software and services
• ReVerb [9] is an open source unsupervised relation extraction system from the University of Washington
• GExp [10] is a rule-based open source information extraction toolkit
• General Architecture for Text Engineering (GATE), which is bundled with a free information extraction system
• OpenCalais, an automated information extraction web service from Thomson Reuters (free limited version)
• Machine Learning for Language Toolkit (Mallet) is a Java-based package for a variety of natural language processing tasks, including information extraction.
• Apache Tika [11] provides an information extraction framework to parse textual content and metadata for several document formats[12]
• DBpedia Spotlight is an open source tool in Java/Scala (and free web service) that can be used for named entity recognition and name resolution.
• See also CRF implementations

References
[1] Andersen, Peggy M.; Hayes, Philip J.; Huettner, Alison K.; Schmandt, Linda M.; Nirenburg, Irene B.; Weinstein, Steven P. "Automatic Extraction of Facts from Press Releases to Generate News Stories". CiteSeerX: 10.1.1.14.7943 (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.14.7943).
[2] Cowie, Jim; Wilks, Yorick. "Information Extraction". CiteSeerX: 10.1.1.61.6480 (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.61.6480).
[3] "Linked Data - The Story So Far" (http://tomheath.com/papers/bizer-heath-berners-lee-ijswis-linked-data.pdf).
[4] "Tim Berners-Lee on the next Web" (http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html).
[5] R. K. Srihari, W. Li, C. Niu and T. Cornell, "InfoXtract: A Customizable Intermediate Level Information Extraction Engine", Journal of Natural Language Engineering (http://journals.cambridge.org/action/displayIssue?iid=359643), Cambridge U. Press, 14(1), 2008, pp. 33-69.
[6] A. Zils, F. Pachet, O. Delerue and F. Gouyon, "Automatic Extraction of Drum Tracks from Polyphonic Music Signals" (http://www.csl.sony.fr/downloads/papers/2002/ZilsMusic.pdf), Proceedings of WedelMusic, Darmstadt, Germany, 2002.
[7] Peng, F.; McCallum, A. (2006). "Information extraction from research papers using conditional random fields". Information Processing & Management 42: 963. doi:10.1016/j.ipm.2005.09.002.
[8] Shimizu, Nobuyuki; Hass, Andrew (2006). "Extracting Frame-based Knowledge Representation from Route Instructions" (http://www.cs.albany.edu/~shimizu/shimizu+haas2006frame.pdf).
[9] http://reverb.cs.washington.edu/
[10] http://code.google.com/p/graph-expression/
[11] http://tika.apache.org/
[12] Content Extraction with Tika (http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Content-Extraction-Tika)

External links

• MUC (http://www.itl.nist.gov/iaui/894.02/related_projects/muc/)
• ACE (http://projects.ldc.upenn.edu/ace/) (LDC)
• ACE (http://www.itl.nist.gov/iad/894.01/tests/ace/) (NIST)
• Alias-i "competition" page (http://alias-i.com/lingpipe/web/competition.html): a listing of academic and industrial toolkits for natural language information extraction.
• Gabor Melli's page on IE (http://www.gabormelli.com/RKB/Information_Extraction_Task): detailed description of the information extraction task.
• CRF++: Yet Another CRF toolkit (http://crfpp.sourceforge.net/)
• A Survey of Web Information Extraction Systems (http://www.csie.ncu.edu.tw/~chia/pub/iesurvey2006.pdf): a comprehensive survey.
• An information extraction framework (http://www.tdg-seville.info/Hassan/Research): a framework to develop and compare information extractors.

Granularity

Granularity is the extent to which a system is broken down into small parts, either the system itself or its description or observation. It is the extent to which a larger entity is subdivided. For example, a yard broken into inches has finer granularity than a yard broken into feet. Coarse-grained systems consist of fewer, larger components than fine-grained systems; a coarse-grained description of a system regards large subcomponents while a fine-grained description regards smaller components of which the larger ones are composed. The terms granularity, coarse, and fine are relative, used when comparing systems or descriptions of systems. An example of increasingly fine granularity: a list of nations in the United Nations, a list of all states/provinces in those nations, a list of all counties in those states, etc. The terms fine and coarse are used consistently across fields, but the term granularity itself is not. For example, in investing, more granularity refers to more positions of smaller size, while photographic film that is more granular has fewer and larger chemical "grains".

Physics A fine-grained description of a system is a detailed, low-level model of it. A coarse-grained description is a model where some of this fine detail has been smoothed over or averaged out. The replacement of a fine-grained description with a lower-resolution coarse-grained model is called coarse graining. (See for example the second law of thermodynamics)

Molecular dynamics In molecular dynamics, coarse graining consists of replacing an atomistic description of a biological molecule with a lower-resolution coarse-grained model that averages or smooths away fine details. Coarse-grained models have been developed for investigating the longer time- and length-scale dynamics that are critical to many biological processes, such as lipid membranes and proteins. These concepts apply not only to biological molecules but also to inorganic molecules. Coarse graining may simply remove certain degrees of freedom (e.g. vibrational modes between two atoms), or it may replace the two atoms entirely with a single particle representation. The extent to which systems may be coarse grained is bounded only by the accuracy in the dynamics and structural properties one wishes to replicate. This modern area of research is in its infancy, and although it is commonly used in biological modeling, the analytic theory behind it is poorly understood.

Computing In parallel computing, granularity means the amount of computation in relation to communication, i.e., the ratio of computation to the amount of communication. Fine-grained parallelism means individual tasks are relatively small in terms of code size and execution time; data is transferred among processors frequently, in amounts of one or a few memory words. Coarse-grained is the opposite: data are communicated infrequently, after larger amounts of computation. The finer the granularity, the greater the potential for parallelism and hence speed-up, but also the greater the overheads of synchronization and communication.[1] To attain the best parallel performance, the best balance between load and communication overhead needs to be found. If the granularity is too fine, performance can suffer from the increased communication overhead; if it is too coarse, performance can suffer from load imbalance.
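The following sketch is one way to picture this trade-off, assuming Python's multiprocessing module: the same work is distributed either one item at a time (fine-grained, frequent communication) or in large blocks (coarse-grained, infrequent communication). The workload and chunk sizes are arbitrary illustrations.

```python
# Sketch of the granularity trade-off in parallel computing: the same work is
# split either into many tiny tasks (fine-grained, frequent communication) or
# into a few large chunks (coarse-grained, infrequent communication).
from multiprocessing import Pool

def expensive(x):
    # Stand-in for a non-trivial per-item computation.
    return sum(i * i for i in range(x % 1000))

if __name__ == "__main__":
    data = list(range(20_000))
    with Pool(processes=4) as pool:
        # Fine-grained: one item per task, so workers exchange data often.
        fine = pool.map(expensive, data, chunksize=1)
        # Coarse-grained: items shipped in blocks of 2,000, so communication
        # is rare relative to computation (but load balancing is coarser).
        coarse = pool.map(expensive, data, chunksize=2_000)
    assert fine == coarse  # same result; only the overhead profile differs
```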

Reconfigurable computing and supercomputing In reconfigurable computing and in supercomputing these terms refer to the data path width. The use of processing elements about one bit wide, such as the configurable logic blocks (CLBs) in an FPGA, is called fine-grained computing or fine-grained reconfigurability, whereas the use of wide data paths, such as 32-bit resources like microprocessor CPUs or data-stream-driven data path units (DPUs), for instance in a reconfigurable datapath array (rDPA), is called coarse-grained computing or coarse-grained reconfigurability.

Data granularity The granularity of data refers to the size into which data fields are subdivided. For example, a postal address can be recorded, with coarse granularity, as a single field:
1. address = 200 2nd Ave. South #358, St. Petersburg, FL 33701-4313 USA
or with fine granularity, as multiple fields:
1. street address = 200 2nd Ave. South #358
2. city = St. Petersburg
3. postal code = FL 33701-4313
4. country = USA
or with even finer granularity:
1. street = 2nd Ave. South
2. address number = 200
3. suite/apartment number = #358
4. city = St. Petersburg
5. state = FL
6. postal-code = 33701
7. postal-code-add-on = 4313
8. country = USA
Finer granularity has overheads for data input and storage. This manifests itself in a higher number of objects and methods in the object-oriented programming paradigm, or in more subroutine calls for procedural programming and parallel computing environments. It does, however, offer benefits in flexibility of data processing, since each data field can be treated in isolation if required (as sketched below). A performance problem caused by excessive granularity may not reveal itself until scalability becomes an issue.
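A minimal sketch of the same address at coarse and fine granularity is shown below; the field names and the grouping step are illustrative only, not a prescribed schema.

```python
# The same address captured at two levels of granularity. Field names here
# are only illustrative; a real schema would follow local postal standards.
coarse = {
    "address": "200 2nd Ave. South #358, St. Petersburg, FL 33701-4313 USA",
}

fine = {
    "street": "2nd Ave. South",
    "address_number": "200",
    "suite": "#358",
    "city": "St. Petersburg",
    "state": "FL",
    "postal_code": "33701",
    "postal_code_add_on": "4313",
    "country": "USA",
}

# Finer granularity costs more fields to capture and store, but each part can
# be processed in isolation, e.g. grouping records by state:
records = [fine]
by_state = {}
for r in records:
    by_state.setdefault(r["state"], []).append(r)
print(sorted(by_state))  # ['FL']
```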

Credit portfolio risk management In credit portfolio risk modeling, granularity refers to the number of exposures in the portfolio. The higher the granularity, the more positions there are in the credit portfolio, providing a higher degree of size diversification, which in turn reduces concentration risk. This is colloquially known as "not putting all your eggs in one basket".

Photographic film In photography, granularity is a measure of film grain. It is measured using a particular standard procedure but in general a larger number means the grains of silver are larger and there are fewer grains in a given area.

Business In modern U.S. business lingo, granularity refers to "specificity" or "hierarchical ranking". For example, granularity has been written about in the book, The Granularity of Growth: Making choices that drive enduring company performance. Its authors, Patrick Viguerie, Sven Smit, and Mehrdad Baghai say that there’s a problem with the broad-brush way that many companies describe their business opportunities. They argue that growth opportunities best emerge from a finer-than-usual understanding of market segments, their needs, and the capabilities required to serve them well.

References

[1] FOLDOC (http://foldoc.doc.ic.ac.uk/foldoc/foldoc.cgi?granularity)
• de Pablo, J. J. (2011). "Coarse-grained simulations of macromolecules: From DNA to nanocomposites" (http://www.annualreviews.org/doi/pdf/10.1146/annurev-physchem-032210-103458). Annu. Rev. Phys. Chem. 62: 555. doi:10.1146/annurev-physchem-032210-103458.

Mashup (web application hybrid)

A mashup, in web development, is a web page or web application that uses and combines data, presentation or functionality from two or more sources to create new services. The term implies easy, fast integration, frequently using open application programming interfaces (APIs) and data sources to produce enriched results that were not necessarily the original reason for producing the raw source data. The main characteristics of a mashup are combination, visualization, and aggregation, which make existing data more useful for both personal and professional use. To be able to permanently access the data of other services, mashups are generally client applications or hosted online. In recent years, more and more Web applications have published APIs that enable software developers to easily integrate data and functions instead of building them by themselves. Mashups can be considered to have an active role in the evolution of social software and Web 2.0. Mashup composition tools are usually simple enough to be used by end-users. They generally do not require programming skills; instead they support visual wiring of GUI widgets, services and components together. These tools therefore contribute to a new vision of the Web, where users are able to contribute.

History The history of the mashup can be traced by first understanding the broader context of the history of the Web. Under the Web 1.0 business model, companies stored consumer data on portals and updated them regularly. They controlled all the consumer data, and the consumer had to use their products and services to get the information. With the advent of Web 2.0, a new proposition was created using Web standards that were commonly and widely adopted across traditional competitors and that unlocked the consumer data. At the same time, mashups emerged, allowing the mixing and matching of competitors' APIs to create new services. The term is not formally defined by any standard-setting body.[1] The first mashups used mapping services or photo services to combine these services with data of any kind and thereby create visualizations of the data.[2] In the beginning, most mashups were consumer-based, but more recently the mashup has come to be seen as a concept that is also useful to enterprises. Business mashups can combine existing internal data with external services to create new views on the data. Mashups are on the rise: a statistic from Programmable Web found in 2009 that three new mashups had been registered every single day for the previous two years.[3]

Types of mashup There are many types of mashup, such as business mashups, consumer mashups, and data mashups.[4] The most common type of mashup is the consumer mashup, aimed at the general public.
• Business (or enterprise) mashups define applications that combine their own resources, applications and data with other external Web services.[2] They focus data into a single presentation and allow for collaborative action among businesses and developers. This works well for an agile development project, which requires collaboration between the developers and the customer (or a customer proxy, typically a product manager) for defining and implementing the business requirements. Enterprise mashups are secure, visually rich Web applications that expose actionable information from diverse internal and external information sources.
• Consumer mashups combine data from multiple public sources in the browser and organize it through a simple browser user interface.[5] (e.g. Wikipediavision combines Google Maps and a Wikipedia API)
• Data mashups, in contrast to consumer mashups, combine similar types of media and information from multiple sources into a single representation. The combination of all these resources creates a new and distinct Web service that was not originally provided by either source.

By API type Mashups can also be categorized by the basic API type they use but any of these can be combined with each other or embedded into other applications.

Data types • Indexed data (documents, weblogs, images, videos, shopping articles, jobs ...) used by Metasearch engines • Cartographic and geographic data: Geolocation software, Geovisualization • Feeds, podcasts: News aggregators

Functions • Data converters: language translators, speech processing, URL shorteners... • Communication: email, instant messaging, notification... • Visual data rendering: Information visualization, diagrams • Security related: electronic payment systems, ID identification... • Editors

Mashup enabler In technology, a mashup enabler is a tool for transforming incompatible IT resources into a form that allows them to be easily combined in order to create a mashup. Mashup enablers allow powerful techniques and tools (such as mashup platforms) for combining data and services to be applied to new kinds of resources. An example of a mashup enabler is a tool for creating an RSS feed from a spreadsheet (which cannot easily be used to create a mashup directly); a minimal sketch of this spreadsheet-to-RSS idea follows below. Many mashup editors include mashup enablers, for example Presto Mashup Connectors,[6] Convertigo Web Integrator[7] or Caspio Bridge. Mashup enablers have also been described as "the service and tool providers that make mashups possible".
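The sketch below turns a spreadsheet export (CSV) into a minimal RSS 2.0 feed using only the Python standard library. The column names ('title', 'link', 'description') and the feed layout are assumptions made for the example, not the behavior of any of the products named above.

```python
# Sketch of a "mashup enabler": turning a spreadsheet export (CSV) into an
# RSS feed so it can be consumed by feed-based mashup tools. The column
# names 'title', 'link' and 'description' are assumptions for illustration.
import csv
import io
import xml.etree.ElementTree as ET

def csv_to_rss(csv_text, channel_title="Spreadsheet feed"):
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    for row in csv.DictReader(io.StringIO(csv_text)):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = row.get("title", "")
        ET.SubElement(item, "link").text = row.get("link", "")
        ET.SubElement(item, "description").text = row.get("description", "")
    return ET.tostring(rss, encoding="unicode")

if __name__ == "__main__":
    sheet = "title,link,description\nQ3 report,http://example.com/q3,Quarterly numbers\n"
    print(csv_to_rss(sheet))
```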

History Early mashups were developed manually by enthusiastic programmers. However, as mashups became more popular, companies began creating platforms for building mashups, which allow designers to visually construct mashups by connecting together mashup components. Mashup editors have greatly simplified the creation of mashups, significantly increasing the productivity of mashup developers and even opening mashup development to end-users and non-IT experts. Standard components and connectors enable designers to combine mashup resources in all sorts of complex ways with ease. Mashup platforms, however, have done little to broaden the scope of resources accessible by mashups and have not freed mashups from their reliance on well-structured data and open libraries (RSS feeds and public APIs). Mashup enablers evolved to address this problem, providing the ability to convert other kinds of data and services into mashable resources.

Web resources Of course, not all valuable data is located within organizations. In fact, the most valuable information for business intelligence and decision support is often external to the organization. With the emergence of rich internet applications and online Web portals, a wide range of business-critical processes (such as ordering) are becoming available online. Unfortunately, very few of these data sources syndicate content in RSS format, and very few of these services provide publicly accessible APIs. Mashup editors therefore solve this problem by providing enablers or connectors.

Data integration challenges There are a number of challenges to address when integrating data from different sources. The challenges can be classified into four groups: text/data mismatch, object identifiers and schema mismatch, abstraction level mismatch, data accuracy.[8]

Text–data mismatch A large portion of data is described in text. Human language is often ambiguous - the same company might be referred to in several variations (e.g. IBM, International Business Machines, and Big Blue). The ambiguity makes cross-linking with structured data difficult. In addition, data expressed in human language is difficult to process via software programs. One of the functions of a data integration system is to overcome the mismatch between documents and data.[8]

Object identity and separate schemata Structured data is available in a plethora of formats. Lifting the data to a common data format is thus the first step. But even if all data is available in a common format, in practice sources differ in how they state what is essentially the same fact. The differences exist both at the level of individual objects and at the schema level. As an example of a mismatch at the object level, consider the following: the SEC uses a so-called Central Index Key (CIK) to identify people (CEOs, CFOs), companies, and financial instruments, while other sources, such as DBpedia (a structured-data version of Wikipedia), use URIs to identify entities. In addition, each source typically uses its own schema and idiosyncrasies for stating what is essentially the same fact. Thus, methods have to be in place for reconciling different representations of objects and schemata.
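One common way to reconcile such identifiers is an explicit mapping table between the identifier schemes, as in the hedged sketch below; the CIK value, the DBpedia URI, and the record fields are invented for illustration.

```python
# Illustrative reconciliation of object identifiers across sources via an
# explicit mapping table. The CIK value and DBpedia URI shown are examples,
# not authoritative data.
CIK_TO_DBPEDIA = {
    "0000012345": "http://dbpedia.org/resource/Example_Corporation",
}

def merge_records(sec_record, dbpedia_records):
    """Join an SEC-style record (keyed by CIK) with DBpedia-style records (keyed by URI)."""
    uri = CIK_TO_DBPEDIA.get(sec_record["cik"])
    enrichment = dbpedia_records.get(uri, {})
    return {**sec_record, **enrichment, "uri": uri}

if __name__ == "__main__":
    sec = {"cik": "0000012345", "filing": "10-K"}
    dbp = {"http://dbpedia.org/resource/Example_Corporation": {"industry": "Software"}}
    print(merge_records(sec, dbp))
```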

Abstraction levels Data sources provide data at incompatible levels of abstraction or classify their data according to taxonomies pertinent to a certain sector. Since data is being published at different levels of abstraction (e.g. person, company, country, or sector), data aggregated for the individual viewpoint may not match data e.g. from statistical offices. Also, there are differences in geographic aggregation (e.g. region data from one source and country-level data from another). A related issue is the use of local currencies (USD vs. EUR) which have to be reconciled in order to make data from disparate sources comparable and amenable for analysis.

Data quality Data quality is a general challenge when automatically integrating data from autonomous sources. In an open environment the data aggregator has little to no influence on the data publisher. Data is often erroneous, and combining data often aggravates the problem. Especially when performing reasoning (automatically inferring new data from existing data), erroneous data has a potentially devastating impact on the overall quality of the resulting dataset. Hence, a challenge is how data publishers can coordinate in order to fix problems in the data, or blacklist sites which do not provide reliable data. Methods and techniques are needed to: check integrity and accuracy; highlight, identify and sanity-check corroborating evidence; assess the probability that a given statement is true; equate weight differences between market sectors or companies; act as clearing houses for raising and settling disputes between competing (and possibly conflicting) data providers; and interact with messy, erroneous Web data of potentially dubious provenance and quality. In summary, errors in signage, amounts, labeling, and classification can seriously impede the utility of systems operating over such data.

Mashups versus portals Mashups and portals are both content aggregation technologies. Portals are an older technology designed as an extension to traditional dynamic Web applications, in which the process of converting data content into marked-up Web pages is split into two phases: generation of markup "fragments" and aggregation of the fragments into pages. Each markup fragment is generated by a "portlet", and the portal combines them into a single Web page. Portlets may be hosted locally on the portal server or remotely on a separate server. Portal technology defines a complete event model covering reads and updates. A request for an aggregate page on a portal is translated into individual read operations on all the portlets that form the page ("render" operations on local, JSR 168 portlets or "getMarkup" operations on remote, WSRP portlets). If a submit button is pressed on any portlet on a portal page, it is translated into an update operation on that portlet alone (processAction on a local portlet or performBlockingInteraction on a remote, WSRP portlet). The update is then immediately followed by a read on all portlets on the page. Portal technology is about server-side, presentation-tier aggregation. It cannot be used to drive more robust forms of application integration such as two-phase commit. Mashups differ from portals in the following respects:

• Classification. Portal: older technology, an extension to the traditional Web server model using a well-defined approach. Mashup: uses newer, loosely defined "Web 2.0" techniques.
• Philosophy/approach. Portal: approaches aggregation by splitting the role of the Web server into two phases: markup generation and aggregation of markup fragments. Mashup: uses APIs provided by different content sites to aggregate and reuse the content in another way.
• Content dependencies. Portal: aggregates presentation-oriented markup fragments (HTML, WML, VoiceXML, etc.). Mashup: can operate on pure XML content and also on presentation-oriented content (e.g., HTML).
• Location dependencies. Portal: traditionally, content aggregation takes place on the server. Mashup: content aggregation can take place either on the server or on the client.
• Aggregation style. Portal: "salad bar" style, where aggregated content is presented side-by-side without overlaps. Mashup: "melting pot" style, where individual content may be combined in any manner, resulting in arbitrarily structured hybrid content.
• Event model. Portal: read and update event models are defined through a specific portlet API. Mashup: CRUD operations are based on REST architectural principles, but no formal API exists.
• Relevant standards. Portal: portlet behavior is governed by the JSR 168, JSR 286 and WSRP standards, although portal page layout and portal functionality are undefined and vendor-specific. Mashup: base standards are XML interchanged as REST or Web Services; RSS and Atom are commonly used, and more specific mashup standards such as EMML are emerging.

The portal model has been around longer and has had greater investment and product research. Portal technology is therefore more standardized and mature. Over time, increasing maturity and standardization of mashup technology will likely make it more popular than portal technology, because it is more closely associated with Web 2.0 and, more recently, with service-oriented architectures (SOA).[9] New versions of portal products are expected to eventually add mashup support while still supporting legacy portlet applications. Mashup technologies, in contrast, are not expected to provide support for portal standards.

Business mashups Mashup uses are expanding in the business environment. Business mashups are useful for integrating business and data services, as business mashup technologies provide the ability to develop new integrated services quickly, to combine internal services with external or personalized information, and to make these services tangible to the business user through user-friendly Web browser interfaces.[10] Business mashups differ from consumer mashups in the level of integration with business computing environments, security and access control features, governance, and the sophistication of the programming tools (mashup editors) used. Another difference between business mashups and consumer mashups is a growing trend of offering business mashups in software-as-a-service (SaaS) form. Many of the providers of business mashup technologies have added SOA features.

Architectural aspects of mashups The architecture of a mashup is divided into three layers:
• Presentation / user interaction: the user interface of the mashup. Technologies used include HTML/XHTML, CSS, JavaScript, and Asynchronous JavaScript and XML (Ajax).
• Web services: the product's functionality can be accessed using API services. Technologies used include XMLHTTPRequest, XML-RPC, JSON-RPC, SOAP, and REST.
• Data: handling the data, i.e. sending, storing and receiving it. Technologies used include XML, JSON, and KML.
Architecturally, there are two styles of mashups: Web-based and server-based. Whereas Web-based mashups typically use the user's Web browser to combine and reformat the data, server-based mashups analyze and reformat the data on a remote server and transmit the data to the user's browser in its final form (a minimal server-based sketch follows below).[11] Mashups appear to be a variation of a façade pattern:[12] a software engineering design pattern that provides a simplified interface to a larger body of code (in this case the code that aggregates the different feeds with different APIs). Mashups can be used with software provided as a service (SaaS). After several years of standards development, mainstream businesses are starting to adopt service-oriented architectures (SOA) to integrate disparate data by making them available as discrete Web services. Web services provide open, standardized protocols as a unified means of accessing information from a diverse set of platforms (operating systems, programming languages, applications). These Web services can be reused to provide completely new services and applications within and across organizations, providing business flexibility.
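The sketch below shows the server-based style in miniature: a small aggregation layer fetches JSON from two Web services and merges the results into a single document for the presentation layer. The endpoint URLs and field names are placeholders, not real APIs.

```python
# Minimal server-side mashup sketch: fetch JSON from two (hypothetical)
# services and merge them into one structure for the browser. The endpoint
# URLs and field names are placeholders, not real APIs.
import json
import urllib.request

def fetch_json(url):
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)

def build_mashup(weather_url, events_url):
    weather = fetch_json(weather_url)   # e.g. {"city": ..., "temp_c": ...}
    events = fetch_json(events_url)     # e.g. [{"name": ..., "venue": ...}, ...]
    # The mashup layer combines the two feeds into a single JSON document.
    return {"weather": weather, "events": events}

if __name__ == "__main__":
    combined = build_mashup(
        "http://api.example.com/weather?city=Vienna",
        "http://api.example.com/events?city=Vienna",
    )
    print(json.dumps(combined, indent=2))
```

A Web-based (client-side) mashup would do the same aggregation in the browser instead of on the server, trading server load for exposure of the source APIs to the client.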

References

[1] "Enterprise Mashups: The New Face of Your SOA" (http:/ / soa. sys-con. com/ node/ 719917). http:/ / soa. sys-con. com/ : SOA WORLD MAGAZINE. . Retrieved 2010-03-03. "The term mashup isn't subject to formal definition by any standards-setting body."

[2] Holmes, Josh. "Enterprise Mashups" (http:/ / msdn. microsoft. com/ en-us/ architecture/ bb906060. aspx). MSDN Architecture Journal. MSDN Architecture Center. .

[3] "Enterprise Mashups: The New Face of Your SOA" (http:/ / soa. sys-con. com/ node/ 719917). http:/ / soa. sys-con. com/ : SOA WORLD MAGAZINE. . Retrieved 2010-03-03. "One popular mashup site, Programmable Web, reports that three new mashups have been registered every single day for the last two years."

[4] Sunilkumar Peenikal (2009). "Mashups and the enterprise" (http:/ / www. mphasis. com/ pdfs/ Mashups_and_the_Enterprise. pdf). MphasiS - HP. .

[5] "Enterprise Mashups: The New Face of Your SOA" (http:/ / soa. sys-con. com/ node/ 719917). http:/ / soa. sys-con. com/ : SOA WORLD MAGAZINE. . Retrieved 2010-03-03. "A consumer mashup is an application that combines data from multiple public sources in the browser and organizes it through a simple browser user interface."

[6] http:/ / www. jackbe. com/ products/ connectors. php

[7] http:/ / www. convertigo. com/ en/ overview/ features/ web. html Mashup (web application hybrid) 108

[8] E. Curry, A. Harth, and S. O’Riain, “Challenges Ahead for Converging Financial Data,” (http:/ / sw. deri. org/ 2009/ 09/ financial-data/ ) in Proceedings of the XBRL/W3C Workshop on Improving Access to Financial Data on the Web, 2009.

[9] Digna, Larry (2007). "Gartner: The future of portals is mashups, SOA, more aggregation" (http:/ / blogs. zdnet. com/ BTL/ ?p=4912). ZDNET. .

[10] Holt, Adams (2009). "Executive IT Architect, Mashup business scenarios and patterns" (http:/ / www. ibm. com/ developerworks/ lotus/

library/ mashups-patterns-pt1/ ). IBM DeveloperWorks. .

[11] Bolim, Michael (2005). "End-User Programming for the Web, MIT MS thesis, 2.91 MB PDF" (http:/ / bolinfest. com/

Michael_Bolin_Thesis_Chickenfoot. pdf). pp. 22–23. . [12] Design Patterns: Elements of Resuable Object-Oriented Software (ISBN 0-201-63361-2) by Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides

External links

• Why Mashups = (REST + 'Traditional SOA') * Web 2.0 (http://blog.sherifmansour.com/?p=187)
• Mashups Part I: Bringing SOA to the People (http://www.soamag.com/I18/0508-1.asp)
• Mashups Part II: Why SOA Architects Should Care (http://www.soamag.com/I21/0808-1.asp)

Green computing

Green computing, green IT or ICT Sustainability refers to environmentally sustainable computing or IT. In the article "Harnessing Green IT: Principles and Practices", San Murugesan defines the field of green computing as "the study and practice of designing, manufacturing, using, and disposing of computers, servers, and associated subsystems — such as monitors, printers, storage devices, and networking and communications systems — efficiently and effectively with minimal or no impact on the environment."[1] The goals of green computing are similar to those of green chemistry: reduce the use of hazardous materials, maximize energy efficiency during the product's lifetime, and promote the recyclability or biodegradability of defunct products and factory waste. Many corporate IT departments have green computing initiatives to reduce the environmental impact of their IT operations.[2] Research continues into key areas such as making the use of computers as energy-efficient as possible, and designing algorithms and systems for efficiency-related computer technologies.

Origins

In 1992, the U.S. Environmental Protection Agency launched Energy Star, a voluntary labeling program that is designed to promote and recognize energy-efficiency in monitors, climate control equipment, and other technologies. This resulted in the widespread adoption of sleep mode among consumer electronics. Concurrently, the Swedish organization TCO Development launched the TCO Certification program to promote low magnetic and electrical emissions from CRT-based computer displays; this program was later expanded to include criteria on energy consumption, ergonomics, and the use of hazardous materials in construction.[3]

Regulations and industry initiatives The Organisation for Economic Co-operation and Development (OECD) has published a survey of over 90 government and industry initiatives on "Green ICTs", i.e. information and communication technologies, the environment and climate change. The report concludes that initiatives tend to concentrate on greening ICTs themselves rather than on their actual use to tackle global warming and environmental degradation. In general, only 20% of initiatives have measurable targets, with government programs tending to include targets more frequently than business associations.[4]

Government Many governmental agencies have continued to implement standards and regulations that encourage green computing. The Energy Star program was revised in October 2006 to include stricter efficiency requirements for computer equipment, along with a tiered ranking system for approved products.[5][6] There are currently 26 US states that have established state-wide recycling programs for obsolete computers and consumer electronics equipment.[7] The statutes either impose an "advance recovery fee" for each unit sold at retail or require the manufacturers to reclaim the equipment at disposal. In 2009, the American Recovery and Reinvestment Act (ARRA) was signed into law by President Obama. The bill allocated over $90 billion to be invested in green initiatives (renewable energy, smart grids, energy efficiency, etc.). In January 2010, the U.S. Energy Department granted $47 million of the ARRA money towards projects that aim to improve the energy efficiency of data centers. The projects will provide research in three areas: optimizing data center hardware and software, improving the power supply chain, and data center cooling technologies.[8]

Industry
• Climate Savers Computing Initiative (CSCI) is an effort to reduce the electric power consumption of PCs in active and inactive states.[9] The CSCI provides a catalog of green products from its member organizations, and information for reducing PC power consumption. It was started on 2007-06-12. The name stems from the World Wildlife Fund's Climate Savers program, which was launched in 1999.[10] The WWF is also a member of the Computing Initiative.[9]
• The Green Electronics Council offers the Electronic Product Environmental Assessment Tool (EPEAT) to assist in the purchase of "greener" computing systems. The Council evaluates computing equipment on 51 criteria - 23 required and 28 optional - that measure a product's efficiency and sustainability attributes. Products are rated Gold, Silver, or Bronze, depending on how many optional criteria they meet. On 2007-01-24, President George W. Bush issued Executive Order 13423, which requires all United States Federal agencies to use EPEAT when purchasing computer systems.[11][12]
• The Green Grid is a global consortium dedicated to advancing energy efficiency in data centers and business computing ecosystems. It was founded in February 2007 by several key companies in the industry – AMD, APC, Dell, HP, IBM, Intel, Microsoft, Rackable Systems, SprayCool, Sun Microsystems and VMware. The Green Grid has since grown to hundreds of members, including end-users and government organizations, all focused on improving data center infrastructure efficiency (DCIE).
• The Green500 list rates supercomputers by energy efficiency (megaflops/watt), encouraging a focus on efficiency rather than absolute performance.
• Green Comm Challenge is an organization that promotes the development of energy conservation technology and practices in the field of Information and Communications Technology (ICT).
• The Transaction Processing Performance Council (TPC) Energy specification augments the existing TPC benchmarks by allowing for optional publication of energy metrics alongside performance results.[13]
• SPEC Power is the first industry-standard benchmark that measures power consumption in relation to performance for server-class computers.

Approaches Murugesan's definition of green computing, quoted above, covers the designing, manufacturing, using, and disposing of computers, servers, and associated subsystems with minimal or no impact on the environment.[1] He lays out four paths along which he believes the environmental effects of computing should be addressed:[1] green use, green disposal, green design, and green manufacturing. Green computing can also develop solutions that offer benefits by "aligning all IT processes and practices with the core principles of sustainability, which are to reduce, reuse, and recycle; and finding innovative ways to use IT in business processes to deliver sustainability benefits across the enterprise and beyond".[14] Modern IT systems rely upon a complicated mix of people, networks, and hardware; as such, a green computing initiative must cover all of these areas as well. A solution may also need to address end-user satisfaction, management restructuring, regulatory compliance, and return on investment (ROI). There are also considerable fiscal motivations for companies to take control of their own power consumption; "of the power management tools available, one of the most powerful may still be simple, plain, common sense."[15]

Product longevity Gartner maintains that the PC manufacturing process accounts for 70% of the natural resources used in the life cycle of a PC.[16] More recently, Fujitsu released a life cycle assessment (LCA) of a desktop computer showing that manufacturing and end of life account for the majority of its ecological footprint.[17] Therefore, the biggest contribution to green computing is usually to prolong the equipment's lifetime. Another report from Gartner recommends to "Look for product longevity, including upgradability and modularity."[18] For instance, manufacturing a new PC makes a far bigger ecological footprint than manufacturing a new RAM module to upgrade an existing one.

Data center design Data center facilities are heavy consumers of energy, accounting for between 1.1% and 1.5% of the world's total energy use in 2010 [1]. The U.S. Department of Energy estimates that data center facilities consume 100 to 200 times as much energy as standard office buildings.[19] Energy-efficient data center design should address all of the energy use aspects included in a data center: from the IT equipment to the HVAC equipment to the actual location, configuration and construction of the building. The U.S. Department of Energy specifies five primary areas on which to focus energy-efficient data center design best practices:[20]
• Information technology (IT) systems
• Environmental conditions
• Air management
• Cooling systems
• Electrical systems
Additional energy-efficient design opportunities specified by the U.S. Department of Energy include on-site electrical generation and recycling of waste heat.[19] Energy-efficient data center design should help to better utilize a data center's space, and increase performance and efficiency.

Software and deployment optimization

Algorithmic efficiency The efficiency of algorithms has an impact on the amount of computer resources required for any given computing function, and there are many efficiency trade-offs in writing programs. While algorithmic efficiency does not have as much impact as some other approaches, it is still an important consideration. A study by a physicist at Harvard estimated that the average Google search released 7 grams of carbon dioxide (CO₂).[21] However, Google disputes this figure, arguing instead that a typical search produces only 0.2 grams of CO₂.[22] More recently, an independent study by GreenIT.fr [23] demonstrated that Windows 7 + Office 2010 require 70 times more memory (RAM) than Windows 98 + Office 2000 to write exactly the same text or send exactly the same e-mail as ten years earlier.[24]
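As a small, self-contained illustration of the point (not taken from the studies cited above), the snippet below performs the same membership queries against a list and against a set; the asymptotically better data structure does far less work for the same answer, and less work generally means less energy.

```python
# Illustration of algorithmic efficiency: the same membership queries run
# against a list (linear scan per lookup) and a set (hashed lookup). Doing
# less work for the same answer generally also means using less energy.
import time

items = list(range(20_000))
queries = list(range(0, 20_000, 10))

def time_lookups(container):
    start = time.perf_counter()
    hits = sum(1 for q in queries if q in container)
    return hits, time.perf_counter() - start

if __name__ == "__main__":
    hits_list, t_list = time_lookups(items)        # O(n) per query
    hits_set, t_set = time_lookups(set(items))     # O(1) average per query
    assert hits_list == hits_set
    print(f"list: {t_list:.4f}s  set: {t_set:.4f}s")
```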

Resource allocation Algorithms can also be used to route data to data centers where electricity is less expensive. Researchers from MIT, Carnegie Mellon University, and Akamai have tested an energy allocation algorithm that successfully routes traffic to the location with the cheapest energy costs. The researchers project up to a 40 percent savings on energy costs if their proposed algorithm were to be deployed. However, this approach does not actually reduce the amount of energy being used; it reduces only the cost to the company using it. Nonetheless, a similar strategy could be used to direct traffic to rely on energy that is produced in a more environmentally friendly or efficient way. A similar approach has also been used to cut energy usage by routing traffic away from data centers experiencing warm weather; this allows computers to be shut down to avoid using air conditioning.[25] Larger server centers are sometimes located where energy and land are inexpensive and readily available. Local availability of renewable energy, climate that allows outside air to be used for cooling, or locating them where the heat they produce may be used for other purposes could be factors in green siting decisions.
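A toy version of such cost-aware routing might look like the following; the data center names, prices, and capacities are invented, and a real scheduler would also weigh latency, carbon intensity, and cooling conditions.

```python
# Toy version of cost-aware request routing: send each request to the data
# center with the cheapest current electricity price that still has capacity.
# The prices and capacities below are invented for illustration.
datacenters = [
    {"name": "us-east", "price_per_kwh": 0.09, "capacity": 2},
    {"name": "us-west", "price_per_kwh": 0.07, "capacity": 1},
    {"name": "eu-north", "price_per_kwh": 0.05, "capacity": 1},
]

def route(requests):
    assignments = []
    for req in requests:
        available = [dc for dc in datacenters if dc["capacity"] > 0]
        cheapest = min(available, key=lambda dc: dc["price_per_kwh"])
        cheapest["capacity"] -= 1
        assignments.append((req, cheapest["name"]))
    return assignments

if __name__ == "__main__":
    print(route(["req-1", "req-2", "req-3"]))
    # [('req-1', 'eu-north'), ('req-2', 'us-west'), ('req-3', 'us-east')]
```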

Virtualizing Computer virtualization refers to the abstraction of computer resources, such as the process of running two or more logical computer systems on one set of physical hardware. The concept originated with the IBM mainframe operating systems of the 1960s, but was commercialized for x86-compatible computers only in the 1990s. With virtualization, a system administrator can combine several physical systems into virtual machines on one single, powerful system, thereby unplugging the original hardware and reducing power and cooling consumption. Virtualization can assist in distributing work so that servers are either busy or put in a low-power sleep state. Several commercial companies and open-source projects now offer software packages to enable a transition to virtual computing. Intel Corporation and AMD have also built proprietary virtualization enhancements to the x86 instruction set into each of their CPU product lines, in order to facilitate virtual computing.
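The consolidation benefit can be sketched as a simple packing problem: place lightly loaded workloads onto as few physical hosts as possible so the remaining machines can be powered off. The first-fit heuristic and utilization figures below are illustrative assumptions, not a description of any particular hypervisor.

```python
# First-fit sketch of server consolidation through virtualization: pack
# lightly loaded workloads onto as few physical hosts as possible so the
# remaining hosts can be powered down. Utilization figures are invented.
def consolidate(vm_loads, host_capacity=0.8):
    """Assign each VM (fractional CPU load) to the first host with room."""
    hosts = []  # each host is a list of assigned loads
    for load in sorted(vm_loads, reverse=True):
        for host in hosts:
            if sum(host) + load <= host_capacity:
                host.append(load)
                break
        else:
            hosts.append([load])
    return hosts

if __name__ == "__main__":
    vms = [0.10, 0.25, 0.05, 0.30, 0.15, 0.20]  # six underutilized servers
    hosts = consolidate(vms)
    print(f"{len(vms)} workloads fit on {len(hosts)} hosts: {hosts}")
```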

Terminal servers Terminal servers have also been used in green computing. When using the system, users at a terminal connect to a central server; all of the actual computing is done on the server, but the end user experiences the operating system on the terminal. These can be combined with thin clients, which use up to one eighth the energy of a normal workstation, resulting in a decrease of energy costs and consumption. There has been an increase in using terminal services with thin clients to create virtual labs. Examples of terminal server software include Terminal Services for Windows and the Linux Terminal Server Project (LTSP) for the Linux operating system.

Power management The Advanced Configuration and Power Interface (ACPI), an open industry standard, allows an operating system to directly control the power-saving aspects of its underlying hardware. This allows a system to automatically turn off components such as monitors and hard drives after set periods of inactivity. In addition, a system may hibernate, where most components (including the CPU and the system RAM) are turned off. ACPI is a successor to an earlier Intel-Microsoft standard called Advanced Power Management, which allows a computer's BIOS to control power management functions. Some programs allow the user to manually adjust the voltages supplied to the CPU, which reduces both the amount of heat produced and electricity consumed. This process is called undervolting. Some CPUs can automatically undervolt the processor, depending on the workload; this technology is called "SpeedStep" on Intel processors, "PowerNow!"/"Cool'n'Quiet" on AMD chips, LongHaul on VIA CPUs, and LongRun with Transmeta processors.

Data center power Data centers, which have been criticized for their extraordinarily high energy demand, are a primary focus for proponents of green computing.[26] Data centers can potentially improve their energy and space efficiency through techniques such as storage consolidation and virtualization. Many organizations are starting to eliminate underutilized servers, which results in lower energy usage.[27] The U.S. federal government has set a minimum 10% reduction target for data center energy usage by 2011.[26] With the aid of a self-styled ultraefficient evaporative cooling technology, Google Inc. has been able to reduce its energy consumption to 50% of that of the industry average.[26]

Operating system support The dominant desktop operating system, Microsoft Windows, has included limited PC power management features since Windows 95.[28] These initially provided for stand-by (suspend-to-RAM) and a monitor low-power state. Further iterations of Windows added hibernate (suspend-to-disk) and support for the ACPI standard. Windows 2000 was the first NT-based operating system to include power management. This required major changes to the underlying operating system architecture and a new hardware driver model. Windows 2000 also introduced Group Policy, a technology that allowed administrators to centrally configure most Windows features. However, power management was not one of those features. This is probably because the power management settings design relied upon a connected set of per-user and per-machine binary registry values,[29] effectively leaving it up to each user to configure their own power management settings. This approach, which is not compatible with Windows Group Policy, was repeated in Windows XP. The reasons for this design decision by Microsoft are not known, and it has resulted in heavy criticism.[30] Microsoft significantly improved this in Windows Vista[31] by redesigning the power management system to allow basic configuration by Group Policy. The support offered is limited to a single per-computer policy. The most recent release, Windows 7, retains these limitations but does include refinements for more efficient use of operating system timers, processor power management,[32][33] and display panel brightness. The most significant change in Windows 7 is in the user experience: the prominence of the default High Performance power plan has been reduced with the aim of encouraging users to save power. There is a significant market in third-party PC power management software offering features beyond those present in the Windows operating system.[34][35][36] Most products offer Active Directory integration and per-user/per-machine settings, with the more advanced offering multiple power plans, scheduled power plans, anti-insomnia features and enterprise power usage reporting. Notable vendors include 1E NightWatchman,[37][38] Data Synergy PowerMAN (Software),[39] Faronics Power Save[40] and Verdiem SURVEYOR.[41]

Power supply Desktop computer power supplies (PSUs) are in general 70–75% efficient,[42] dissipating the remaining energy as heat. An industry initiative called 80 PLUS certifies PSUs that are at least 80% efficient; typically these models are drop-in replacements for older, less efficient PSUs of the same form factor.[43] As of July 20, 2007, all new Energy Star 4.0-certified desktop PSUs must be at least 80% efficient.[44]

Storage Smaller form factor (e.g., 2.5-inch) hard disk drives often consume less power per gigabyte than physically larger drives.[45][46] Unlike hard disk drives, solid-state drives store data in flash memory or DRAM. With no moving parts, power consumption may be reduced somewhat for low-capacity flash-based devices.[47][48] In a recent case study, Fusion-io, a manufacturer of solid-state storage devices, managed to reduce the energy use and operating costs of MySpace data centers by 80% while increasing performance speeds beyond what had been attainable via multiple hard disk drives in RAID 0.[49][50] In response, MySpace was able to retire several of their servers. As hard drive prices have fallen, storage farms have tended to increase in capacity to make more data available online. This includes archival and backup data that would formerly have been saved on tape or other offline storage. The increase in online storage has increased power consumption. Reducing the power consumed by large storage arrays, while still providing the benefits of online storage, is a subject of ongoing research.[51]

Video card A fast GPU may be the largest power consumer in a computer.[52] Energy-efficient display options include: • No video card - use a shared terminal, shared thin client, or desktop sharing software if display required. • Use motherboard video output - typically low 3D performance and low power. • Select a GPU based on low idle power, average wattage, or performance per watt.

Display CRT monitors typically use more power than LCD monitors. They also contain significant amounts of lead. LCD monitors typically use a cold-cathode fluorescent bulb to provide light for the display. Some newer displays use an array of light-emitting diodes (LEDs) in place of the fluorescent bulb, which reduces the amount of electricity used by the display.[53] Fluorescent back-lights also contain mercury, whereas LED back-lights do not.

Materials recycling Recycling computing equipment can keep harmful materials such as lead, mercury, and hexavalent chromium out of landfills, and can also replace equipment that otherwise would need to be manufactured, saving further energy and emissions. Computer systems that have outlived their particular function can be re-purposed, or donated to various charities and non-profit organizations.[54] However, many charities have recently imposed minimum system requirements for donated equipment.[55] Additionally, parts from outdated systems may be salvaged and recycled through certain retail outlets[56][57] and municipal or private recycling centers. Computing supplies, such as printer cartridges, paper, and batteries, may be recycled as well.[58] A drawback to many of these schemes is that computers gathered through recycling drives are often shipped to developing countries where environmental standards are less strict than in North America and Europe.[59] The Silicon Valley Toxics Coalition estimates that 80% of the post-consumer e-waste collected for recycling is shipped abroad to countries such as China and Pakistan.[60] As of 2011, the collection rate of e-waste is still very low, even in the most ecologically responsible countries: in France, for example, e-waste collection between 2006 and 2009 ran at only about 14% of the electronic equipment sold.[61] The recycling of old computers raises an important privacy issue. Old storage devices still hold private information, such as emails, passwords, and credit card numbers, which can be recovered simply by using software available freely on the Internet. Deletion of a file does not actually remove the file from the hard drive. Before recycling a computer, users should remove the hard drive, or hard drives if there is more than one, and physically destroy it or store it somewhere safe. There are some authorized hardware recycling companies to whom the computer may be given for recycling, and they typically sign a non-disclosure agreement.[62]

Telecommuting Teleconferencing and telepresence technologies are often implemented in green computing initiatives. The advantages are many: increased worker satisfaction, reduction of greenhouse gas emissions related to travel, and increased profit margins as a result of lower overhead costs for office space, heat, lighting, etc.[63] The savings are significant: the average annual energy consumption for U.S. office buildings is over 23 kilowatt-hours per square foot, with heat, air conditioning and lighting accounting for 70% of all energy consumed.[64] Other related initiatives, such as hotelling, reduce the square footage per employee as workers reserve space only when they need it.[65] Many types of jobs, such as sales, consulting, and field service, integrate well with this technique. Voice over IP (VoIP) reduces the telephony wiring infrastructure by sharing the existing Ethernet copper. VoIP and phone extension mobility have also made hot desking more practical.

Education and certification

Green computing programs Degree and postgraduate programs provide training in a range of information technology concentrations along with sustainable strategies, in an effort to educate students in how to build and maintain systems while reducing their negative impact on the environment. The Australian National University (ANU) offers "ICT Sustainability" as part of its information technology and engineering masters programs.[66] Athabasca University offers a similar course, "Green ICT Strategies",[67] adapted from the ANU course notes.[68] In the UK, Leeds Metropolitan University offers an MSc Green Computing program in both full- and part-time access modes.[69]

Green computing certifications Some certifications demonstrate that an individual has specific green computing knowledge, including:
• Green Computing Initiative - GCI offers the Certified Green Computing User Specialist (CGCUS), Certified Green Computing Architect (CGCA) and Certified Green Computing Professional (CGCP) certifications.[70]
• CompTIA Strata Green IT is designed for IT managers to show that they have good knowledge of green IT practices and methods and why it is important to incorporate them into an organization.
• Information Systems Examination Board (ISEB) Foundation Certificate in Green IT is appropriate for showing an overall understanding and awareness of green computing and where its implementation can be beneficial.
• Singapore Infocomm Technology Federation (SiTF) Singapore Certified Green IT Professional is an industry-endorsed professional-level certification offered with SiTF-authorized training partners. Certification requires completion of a four-day instructor-led core course, plus a one-day elective from an authorized vendor.[71]
• Australian Computer Society (ACS): the ACS offers a certificate for "Green Technology Strategies" as part of the Computer Professional Education Program (CPEP). Award of a certificate requires completion of a 12-week e-learning course, with written assignments.[72]

Blogs and Web 2.0 resources There are many blogs and other user-created references that can be used to gain more insight into green computing strategies, technologies and business benefits. Students in management and engineering courses have also helped to raise awareness of green computing.[73]

References [1] San Murugesan, “Harnessing Green IT: Principles and Practices,” IEEE IT Professional, January–February 2008, pp 24-33.

[2] E. Curry, B. Guyon, C. Sheridan, and B. Donnellan, “Developing a Sustainable IT Capability: Lessons From Intel’s Journey,” (http:/ / www.

edwardcurry. org/ publications/ MISQE_SustainableIT_Intel_2012. pdf) MIS Quarterly Executive, vol. 11, no. 2, pp. 61–74, 2012.

[3] "TCO takes the initiative in comparative product testing" (http:/ / www. boivie. se/ index. php?page=2& lang=eng). 2008-05-03. . Retrieved 2008-05-03. [4] Full report: OECD Working Party on the Information Economy. "Towards Green ICT strategies: Assessing Policies and Programmes on ICTs

and the Environment" (http:/ / www. oecd. org/ dataoecd/ 47/ 12/ 42825130. pdf). . Summary: OECD Working Party on the Information

Economy. "Executive summary of OECD report" (http:/ / www. oecd. org/ dataoecd/ 46/ 18/ 43044065. pdf). .

[5] Jones, Ernesta (2006-10-23). "EPA Announces New Computer Efficiency Requirements" (http:/ / yosemite. epa. gov/ opa/ admpress. nsf/

a8f952395381d3968525701c005e65b5/ 113b0c0647fee41585257210006474f1!OpenDocument). U.S. EPA. . Retrieved 2007-09-18.

[6] Gardiner, Bryan (2007-02-22). "How Important Will New Energy Star Be for PC Makers?" (http:/ / www. pcmag. com/ article2/

0,1759,2097558,00. asp). PC Magazine. . Retrieved 2007-09-18.

[7] "State Legislation on E-Waste" (http:/ / www. e-takeback. org/ docs open/ Toolkit_Legislators/ state legislation/ state_leg_main. htm). Electronics Take Back Coalition. 2008-03-20. . Retrieved 2008-03-08.

[8] "Secretary Chu Announces $47 Million to Improve Efficiency in Information Technology and Communications Sectors" (http:/ / www.

energy. gov/ news2009/ 8491. htm) (Press release). U.S. Department of Energy. 2010-01-06. . Retrieved 2010-10-30. [9] "Intel and Google Join with Dell, EDS, EPA, HP, IBM, Lenovo, Microsoft, PG&E, World Wildlife Fund and Others to Launch Climate

Savers Computing Initiative" (http:/ / www. climatesaverscomputing. org/ program/ press. html) (Press release). Business Wire. 2007-06-12. . Retrieved 2007-12-11.

[10] "What exactly is the Climate Savers Computing Initiative?" (http:/ / www. climatesaverscomputing. org/ program/ index. html). Climate Savers Computing Initiative. 2007. . Retrieved 2007-12-11.

[11] "President Bush Requires Federal Agencies to Buy EPEAT Registered Green Electronic Products" (http:/ / www. epeat. net/ Docs/ Bush

Requires EPEAT (1-24-07). pdf) (PDF) (Press release). Green Electronics Council. 2007-01-24. . Retrieved 2007-09-20.

[12] "Executive Order: Strengthening Federal Environmental, Energy, and Transportation Management" (http:/ / georgewbush-whitehouse.

archives. gov/ news/ releases/ 2007/ 01/ 20070124-2. html) (Press release). The White House: Office of the Press Secretary. 2007-01-24. . Retrieved 2007-09-20.

[13] "Energy benchmarks: a detailed analysis (e-Energy 2006)" (http:/ / portal. acm. org/ citation. cfm?doid=1791314. 1791336). ACM. ISBN 978-1-4503-0042-1. Meikel Poess, Raghunath Nambiar, Kushagra Vaid, John M. Stephens, Jr., Karl Huppler, Evan Haines. . [14] Donnellan, Brian and Sheridan, Charles and Curry, Edward (Jan/Feb 2011). "A Capability Maturity Framework for Sustainable Information

and Communication Technology" (http:/ / ieeexplore. ieee. org/ lpdocs/ epic03/ wrapper. htm?arnumber=5708282). IEEE IT Professional 13 (1): 33–40. .

[15] "The common sense of lean and green IT" (http:/ / www. deloitte. co. uk/ TMTPredictions/ technology/

Green-and-lean-it-data-centre-efficiency. cfm). Deloitte Technology Predictions. .

[16] InfoWorld July 06, 2009; http:/ / www. infoworld. com/ d/ green-it/

used-pc-strategy-passes-toxic-buck-300?_kip_ipx=1053322433-1267784052& _pxn=0

[17] GreenIT.fr Feb. 2011; http:/ / www. greenit. fr/ article/ materiel/ pc-de-bureau/ quelle-est-l-empreinte-carbone-d-un-ordinateur-3478

[18] Simon Mingay, Gartner: 10 Key Elements of a 'Green IT' Strategy; http:/ / www. onsitelasermedic. com/ pdf/ 10_key_elements_greenIT. pdf. [19] “Best Practices Guide for Energy-Efficient Data Center Design”, prepared by the National Renewable Energy Laboratory for the U.S.

Department of Energy, Federal Energy Management Program, March 2011. (http:/ / www1. eere. energy. gov/ femp/ pdfs/

eedatacenterbestpractices. pdf)

[20] Koomey, Jonathon. “Growth in data center electricity use 2005 to 2010,” Oakland, CA: Analytics Press. August 1. (http:/ / www.

analyticspress. com/ datacenters. html)

[21] "Research reveals environmental impact of Google searches." (http:/ / www. foxnews. com/ story/ 0,2933,479127,00. html). Fox News. 2009-01-12. .

[22] "Powering a Google search" (http:/ / googleblog. blogspot. com/ 2009/ 01/ powering-google-search. html). Official Google Blog. Google. . Retrieved 2009-10-01.

[23] http:/ / www. greenit. fr

[24] "Office suite require 70 times more memory than 10 years ago." (http:/ / www. greenit. fr/ article/ logiciels/ logiciel-la-cle-de-l-obsolescence-programmee-du-materiel-informatique-2748). Green-IT.fr. 2010-05-24. . Retrieved 2010-05-24. Green computing 116

[25] Rear don, Marguerite (August 18, 2009). "Energy-aware Internet routing coming soon" (http:/ / news. cnet. com/

8301-11128_3-10312408-54. html). . Retrieved August 19, 2009. [26] Kurp, Patrick."Green Computing," Communications of the ACM51(10):11.

[27] "Is Green IT Over?" (http:/ / content. dell. com/ us/ en/ enterprise/ d/ large-business/ green-it-over. aspx). Dell.com. . Retrieved 07-11-21.

[28] "Windows 95 Power Management" (http:/ / msdn. microsoft. com/ en-us/ library/ ms810046. aspx). .

[29] "Windows power-saving options are, bizarrely, stored in HKEY_CURRENT_USER" (http:/ / www. liv. ac. uk/ csd/ greenit/ powerdown). .

[30] "How Windows XP Wasted $25 Billion of Energy" (http:/ / www. treehugger. com/ files/ 2006/ 11/ how_windows_xp. php). 2006-11-21. . Retrieved 2005-11-21.

[31] "Windows Vista Power Management Changes" (http:/ / download. microsoft. com/ download/ 5/ b/ 9/

5b97017b-e28a-4bae-ba48-174cf47d23cd/ CPA075_WH06. ppt). .

[32] "Windows 7 Processor Power Management" (http:/ / www. microsoft. com/ whdc/ system/ pnppwr/ powermgmt/ ProcPowerMgmtWin7. mspx). .

[33] "Windows 7 Timer Coalescing" (http:/ / www. microsoft. com/ whdc/ system/ pnppwr/ powermgmt/ TimerCoal. mspx). .

[34] "Power Management Software for Windows Workstations" (http:/ / www. windowsitpro. com/ article/ buyers-guide/

Power-Management-Software-for-Windows-Workstations-Buyers-Guide. aspx). .

[35] "Energy Star Commercial Packages List" (http:/ / www. energystar. gov/ index. cfm?c=power_mgt. pr_power_mgt_comm_packages). .

[36] The Headmasters' and Headmistresses' Conference. "HMC: A Practical Guide to Sustainable Building for Schools" (http:/ / www.

hmcsustainability. org. uk/ energy. html). .

[37] "PC Power Management Solutions" (http:/ / itmanagersinbox. com/ 1399/ pc-power-management-solutions). .

[38] "Why use software NightWatchman to turn your PCs off?" (http:/ / features. techworld. com/ green-it/ 3546/

why-use-software-nightwatchman-to-turn-your-pcs-off/ ). .

[39] "University of Oxford Low Carbon Project: Energy and the networked computing environment" (http:/ / projects. oucs. ox. ac. uk/

lowcarbonict/ conferences/ conf-2. htm#providers). .

[40] "Forrester Study: Total Economic Impact of Faronics Power Save" (http:/ / www. faronics. com/ Faronics/ Documents/

Study_PowerSave_Forrester_TEI_EN. pdf). .

[41] "1E upgrades NightWatchman, seeks to bring powermanagement to SMEs: Competitive landscape" (http:/ / www. 1e. com/ Downloads/

Articles/ Published/ 451 - 1E - Market Development-020309. pdf). .

[42] Schuhmann, Daniel (2005-02-28). "Strong Showing: High-Performance Power Supply Units" (http:/ / www. tomshardware. com/ 2005/ 02/

28/ strong_showing/ page38. html). Tom's Hardware. . Retrieved 2007-09-18. [43] 80 PLUS

[44] "Computer Key Product Criteria" (http:/ / www. energystar. gov/ index. cfm?c=computers. pr_crit_computers). Energy Star. 2007-07-20. . Retrieved 2007-09-17.

[45] Mike Chin (8 March 2004). "IS the Silent PC Future 2.5-inches wide?" (http:/ / www. silentpcreview. com/ article145-page1. html). . Retrieved 2008-08-02.

[46] Mike Chin (2002-09-18). "Recommended Hard Drives" (http:/ / www. silentpcreview. com/ article29-page2. html). . Retrieved 2008-08-02.

[47] Super Talent's 2.5" IDE Flash hard drive - The Tech Report - Page 13 (http:/ / techreport. com/ articles. x/ 10334/ 13)

[48] Power Consumption - Tom's Hardware: Conventional Hard Drive Obsoletism? Samsung's 32 GB Flash Drive Previewed (http:/ / www.

tomshardware. com/ reviews/ conventional-hard-drive-obsoletism,1324-5. html)

[49] http:/ / community. fusionio. com/ media/ p/ 458/ download. aspx

[50] (http:/ / www. forbes. com/ feeds/ businesswire/ 2009/ 10/ 13/ businesswire130131961. html)

[51] IBM chief engineer talks green storage (http:/ / searchstorage. techtarget. com/ news/ article/ 0,289142,sid5_gci1255635,00.

html?track=NL-52& ad=592882& asrc=EM_NLN_1637134& uid=6241617), SearchStorage - TechTarget

[52] http:/ / www. xbitlabs. com/ articles/ video/ display/ power-noise. html X-bit labs: Faster, Quieter, Lower: Power Consumption and Noise Level of Contemporary Graphics Cards

[53] "Cree LED Backlight Solution Lowers Power Consumption of LCD Displays" (http:/ / news. thomasnet. com/ fullstory/ 464080). 2005-05-23. . Retrieved 2007-09-17.

[54] Reuse your electronics through donation » Earth 911 (http:/ / earth911. org/ blog/ 2007/ 04/ 12/

reuse-your-electronics-through-donation-to-schools-charities-and-nonprofit-organizations/ )

[55] Delaney, John (2007-09-04). "15 Ways to Reinvent Your PC" (http:/ / www. pcmag. com/ article2/ 0,1895,2170255,00. asp). PC Magazine 26 (17). .

[56] "Staples Launches Nationwide Computer and Office Technology Recycling Program" (http:/ / investor. staples. com/ phoenix.

zhtml?c=96244& p=irol-newsArticle& ID=1004542& highlight=). Staples, Inc.. 2007-05-21. . Retrieved 2007-09-17.

[57] "Goodwill Teams with Electronic Recyclers to Recycle eWaste" (http:/ / earth911. org/ blog/ 2007/ 08/ 15/

goodwill-teams-with-electronic-recyclers-to-recycle-ewaste/ ). Earth 911. 2007-08-15. . Retrieved 2007-09-17. [58] Refilled ink cartridges, paper recycling, battery recycling

[59] Segan, Sascha (2007-10-02). "Green Tech: Reduce, Reuse, That's It" (http:/ / www. pcmag. com/ article2/ 0,2704,2183977,00. asp). PC Magazine 26 (19): 56. . Retrieved 2007-11-07. [60] Royte, Elizabeth (2006). Garbage Land: On the Secret Trail of Trash. Back Bay Books. pp. 169–170. ISBN 0-316-73826-3. Green computing 117

[61] "e-waste take-back still very low" (http:/ / www. greenit. fr/ article/ materiel/ recyclage/ deee-l-europe-revoit-ses-objectifs-a-la-hausse-3482). GreenIT.fr. 2011-02-09. . Retrieved 2011-02-09.

[62] MySecureCyberspace » Green Computing and Privacy Issues (http:/ / www. mysecurecyberspace. com/ articles/ features/

green-computing-and-privacy-issues. html)

[63] E. Curry, B. Guyon, C. Sheridan, and B. Donnellan, “Developing an Sustainable IT Capability: Lessons From Intel’s Journey,” (http:/ /

www. edwardcurry. org/ publications/ MISQE_SustainableIT_Intel_2012. pdf) MIS Quarterly Executive, vol. 11, no. 2, pp. 61–74, 2012.

[64] "EPA Office Building Energy Use Profile" (http:/ / www. epa. gov/ cleanenergy/ documents/ sector-meeting/ 4bi_officebuilding. pdf) (PDF). EPA. 2007-08-15. . Retrieved 2008-03-17.

[65] "What Is Green IT?" (http:/ / energypriorities. com/ entries/ 2007/ 06/ what_is_green_it_data_centers. php). .

[66] "ICT Sustainability Course" (http:/ / studyat. anu. edu. au/ courses/ COMP7310;details. html). Australian National University. 2012-01-01. . Retrieved 2012-02-05.

[67] "Green ICT Strategies (Revision 1)" (http:/ / scis. athabascau. ca/ graduate/ comp635syllabus. php). Athabasca University. 2011-09-06. . Retrieved 2012-02-05.

[68] "ICT Sustainability: Assessment and Strategies for a Low Carbon Future" (http:/ / www. tomw. net. au/ ict_sustainability/ ). Tomw Communications Pty, Limited. 2011-09-06. . Retrieved 2012-02-05.

[69] "MSc Green Computing Course" (http:/ / courses. leedsmet. ac. uk/ greencomputing_msc). Leeds Metropolitan University. 2012-01-01. . Retrieved 2012-04-26.

[70] "Green Computing Initiative Certification" (http:/ / www. sgauge. com/ greenci/ ). Green Computing Initiative. . Retrieved 2012-02-05.

[71] "Singapore Certified Green IT Professional" (http:/ / www. sitf. org. sg/ sitfacademy/ Programme_SCGP. html). Singapore Infocomm Technology Federation. 2012. . Retrieved 2012-02-05.

[72] "Green Technology Strategies Course" (http:/ / www. acs. org. au/ cpeprogram/ index. cfm?action=show& conID=greenict). Australian Computer Society. 2011-11-29. . Retrieved 2012-02-05.

[73] "Green Computing Blog by students of [[IE Business School (http:/ / ygreenit. wordpress. com/ about/ )]"]. Wordpress. . Retrieved 2012-06-12.

External links

• Green IT Factsheet (http://css.snre.umich.edu/css_doc/CSS09-07.pdf) by the University of Michigan's Center for Sustainable Systems (http://www.css.snre.umich.edu)

Social network

A social network is a social structure made up of a set of actors (such as individuals or organizations) and the dyadic ties between these actors. The social network perspective provides a clear way of analyzing the structure of whole social entities.[1] The study of these structures uses social network analysis to identify local and global patterns, locate influential entities, and examine network dynamics. Social networks and their analysis form an inherently interdisciplinary academic field that emerged from social psychology, sociology, statistics, and graph theory. Georg Simmel authored early structural theories in sociology emphasizing the dynamics of triads and "web of group affiliations."[2] Jacob Moreno is credited with developing the first sociograms in the 1930s to study interpersonal relationships. These approaches were mathematically formalized in the 1950s, and theories and methods of social networks became pervasive in the social and behavioral sciences by the 1980s.[1][3] Social network analysis is now one of the major paradigms in contemporary sociology, and is also employed in a number of other social and formal sciences. Together with other complex networks, it forms part of the nascent field of network science.[4][5]

Overview

A social network is a theoretical construct useful in the social sciences to study relationships between individuals, groups, organizations, or even entire societies (social units; see differentiation). The term is used to describe a social structure determined by such interactions. The ties through which any given social unit connects represent the convergence of the various social contacts of that unit. This theoretical approach is, necessarily, relational. An axiom of the social network approach to understanding social interaction is that social phenomena should be primarily conceived and investigated through the properties of relations between and within units, instead of the properties of these units themselves. Thus, one common criticism of social network theory is that individual agency is often ignored,[6] although this may not be the case in practice (see agent-based modeling). Precisely because many different types of relations, singular or in combination, form these network configurations, network analytics are useful to a broad range of research enterprises. In social science, these fields of study include, but are not limited to, anthropology, biology, communication studies, economics, geography, information science, organizational studies, social psychology, sociology, and sociolinguistics.
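The actor-and-tie structure described above maps directly onto a graph data structure. The following minimal sketch (not from the article; actor names and ties are invented for illustration) represents actors and dyadic ties with plain Python collections and asks a simple relational question of the data:

```python
# Minimal sketch: a social network as an undirected graph of actors and dyadic
# ties, built with only the Python standard library.
from itertools import combinations

# Hypothetical actors and ties, chosen purely for illustration.
actors = {"Ana", "Ben", "Chloe", "Dmitri", "Eva"}
ties = {("Ana", "Ben"), ("Ana", "Chloe"), ("Ben", "Chloe"), ("Chloe", "Dmitri")}

# Adjacency structure: each actor maps to the set of actors tied to it.
adjacency = {a: set() for a in actors}
for a, b in ties:
    adjacency[a].add(b)
    adjacency[b].add(a)

# A relational question: which pairs of actors share at least one common contact?
for a, b in combinations(sorted(actors), 2):
    common = (adjacency[a] & adjacency[b]) - {a, b}
    if common:
        print(f"{a} and {b} share contacts: {sorted(common)}")
```

The point of the sketch is the relational framing: the interesting properties (shared contacts, reachability, clustering) live in the ties, not in attributes of the individual actors.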

History
Some of the ideas of the social network approach are found in writings going back to the ancient Greeks. In the late 1800s, both Émile Durkheim and Ferdinand Tönnies foreshadowed the idea of social networks in their theories and research of social groups. Tönnies argued that social groups can exist as personal and direct social ties that either link individuals who share values and belief (Gemeinschaft, German, commonly translated as "community") or impersonal, formal, and instrumental social links (Gesellschaft, German, commonly translated as "society").[7] Durkheim gave a non-individualistic explanation of social facts, arguing that social phenomena arise when interacting individuals constitute a reality that can no longer be accounted for in terms of the properties of individual actors.[8] Georg Simmel, writing at the turn of the twentieth century, pointed to the nature of networks and the effect of network size on interaction and examined the likelihood of interaction in loosely knit networks rather than groups.[9]

Major developments in the field can be seen in the 1930s, when several groups in psychology, anthropology, and mathematics were working independently.[6][10][11] In psychology, in the 1930s, Jacob L. Moreno began systematic recording and analysis of social interaction in small groups, especially classrooms and work groups (see sociometry). In anthropology, the foundation for social network theory is the theoretical and ethnographic work of Bronislaw Malinowski,[12] Alfred Radcliffe-Brown,[13][14] and Claude Lévi-Strauss.[15] A group of social anthropologists associated with Max Gluckman and the Manchester School, including John A. Barnes,[16] J. Clyde Mitchell and Elizabeth Bott Spillius,[17][18] often are credited with performing some of the first fieldwork from which network analyses were performed, investigating community networks in southern Africa, India and the United Kingdom.[6] Concomitantly, British anthropologist S.F. Nadel codified a theory of social structure that was influential in later network analysis.[19] In sociology, the early (1930s) work of Talcott Parsons set the stage for taking a relational approach to understanding social structure.[20][21] Later, drawing upon Parsons' theory, the work of sociologist Peter Blau provided a strong impetus for analyzing the relational ties of social units with his work on social exchange theory.[22][23][24] By the 1970s, a growing number of scholars worked to combine the different tracks and traditions. One group consisted of sociologist Harrison White and his students at the Harvard University Department of Social Relations. Also independently active in the Harvard Social Relations department at the time were Charles Tilly, who focused on networks in political and community sociology and social movements, and Stanley Milgram, who developed the "six degrees of separation" thesis.[25] Mark Granovetter[26] and Barry Wellman[27] are among the former students of White who elaborated and championed the analysis of social networks.[28][29][30][31]

Levels of analysis

In general, social networks are self-organizing, emergent, and complex, such that a globally coherent pattern appears from the local interaction of the elements that make up the system.[33][34] These patterns become more apparent as network size increases. However, a global network analysis of, for example, all interpersonal relationships in the world is not feasible and is likely to contain so much information as to be uninformative. Practical limitations of computing power, ethics, and participant recruitment and payment also limit the scope of a social network analysis.[35][36] The nuances of a local system may be lost in a large network analysis, hence the quality of information may be more important than its scale for understanding network properties (figure: self-organization of a network, based on Nagler, Levina, & Timme, 2011[32]). Thus, social networks are analyzed at the scale relevant to the researcher's theoretical question. Although levels of analysis are not necessarily mutually exclusive, there are three general levels into which networks may fall: micro-level, meso-level, and macro-level.

Micro level
At the micro-level, social network research typically begins with an individual, snowballing as social relationships are traced, or may begin with a small group of individuals in a particular social context.
Dyadic level: A dyad is a social relationship between two individuals. Network research on dyads may concentrate on the structure of the relationship (e.g. multiplexity, strength), social equality, and tendencies toward reciprocity/mutuality.
Triadic level: Add one individual to a dyad, and you have a triad. Research at this level may concentrate on factors such as balance and transitivity, as well as social equality and tendencies toward reciprocity/mutuality.[35]
Actor level: The smallest unit of analysis in a social network is an individual in their social setting, i.e., an "actor" or "ego". Ego network analysis focuses on network characteristics such as size, relationship strength, density, centrality, prestige and roles such as isolates, liaisons, and bridges.[37] Such analyses are most commonly used in the fields of psychology or social psychology, ethnographic kinship analysis or other genealogical studies of relationships between individuals.
Subset level: Subset levels of network research problems begin at the micro-level, but may cross over into the meso-level of analysis. Subset level research may focus on distance and reachability, cliques, cohesive subgroups, or other group actions or behavior.
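Two of the ego-network characteristics mentioned above, size and density, are easy to compute once the network is stored as an adjacency mapping. A minimal sketch follows; the adjacency data are hypothetical and only illustrate the calculation:

```python
# Minimal sketch: ego-network size (number of alters) and density (share of
# possible ties among the alters that actually exist), for one focal actor.
def ego_network_density(adjacency, ego):
    alters = adjacency[ego]          # actors directly tied to the ego
    n = len(alters)
    if n < 2:
        return n, 0.0
    # Count undirected ties among the alters themselves (ego excluded).
    ties_among_alters = sum(
        1 for a in alters for b in alters
        if a < b and b in adjacency[a]
    )
    possible = n * (n - 1) / 2
    return n, ties_among_alters / possible

adjacency = {
    "Ana": {"Ben", "Chloe", "Dmitri"},
    "Ben": {"Ana", "Chloe"},
    "Chloe": {"Ana", "Ben"},
    "Dmitri": {"Ana"},
}
size, density = ego_network_density(adjacency, "Ana")
print(size, round(density, 2))   # 3 alters; 1 of 3 possible ties -> density 0.33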

Meso level In general, meso-level theories begin with a population size that falls between the micro- and macro-levels. However, meso-level may also refer to analyses that are specifically designed to reveal connections between micro- and macro-levels. Meso-level networks are low density and may exhibit causal processes distinct from interpersonal micro-level networks.[38] Organizations: Formal organizations are social groups that distribute tasks for a collective goal.[39] Network research on organizations may focus on either intra-organizational or inter-organizational ties in terms of formal or informal relationships. Intra-organizational networks themselves often contain multiple levels of analysis, especially in larger organizations with multiple branches, franchises or semi-autonomous departments. In these cases, research is often conducted at a workgroup level and organization level, focusing on the interplay between the two structures.[40]

Randomly-distributed networks: Exponential random graph models of social networks became state-of-the-art methods of social network analysis in the 1980s. This framework has the capacity to represent social-structural effects commonly observed in many human social networks, including general degree-based structural effects as well as reciprocity and transitivity, and, at the node level, homophily and attribute-based activity and popularity effects, as derived from explicit hypotheses about dependencies among network ties. Parameters are given in terms of the prevalence of small subgraph configurations in the network and can be interpreted as describing the combinations of local social processes from which a given network emerges. These probability models for networks on a given set of actors allow generalization beyond the restrictive dyadic independence assumption of micro-networks, allowing models to be built from theoretical structural foundations of social behavior.[41]

Scale-free networks: A scale-free network is a network whose degree distribution follows a power law, at least asymptotically. In network theory a scale-free ideal network is a random network with a degree distribution that unravels the size distribution of social groups.[42] Specific characteristics of scale-free networks vary with the theories and analytical tools used to create them; however, in general, scale-free networks have some common characteristics. One notable characteristic of a scale-free network is the relative commonness of vertices with a degree that greatly exceeds the average. The highest-degree nodes are often called "hubs", and may serve specific purposes in their networks, although this depends greatly on the social context. Another general characteristic of scale-free networks is the clustering coefficient distribution, which decreases as the node degree increases; this distribution also follows a power law.[43] The Barabási model of network evolution is an example of a scale-free network. (Figure: examples of a random network and a scale-free network, each with 32 nodes and 32 links; note the "hubs" in the scale-free diagram.)
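The hub-dominated degree distribution described above can be inspected directly. The sketch below assumes the third-party networkx library is available and uses its preferential-attachment generator; the parameter values are arbitrary and only illustrative:

```python
# Illustrative sketch: generate a Barabási-Albert (preferential-attachment)
# graph and inspect its degree distribution, whose heavy tail produces "hubs".
from collections import Counter
import networkx as nx

G = nx.barabasi_albert_graph(n=1000, m=2, seed=42)   # scale-free-style network
degrees = [d for _, d in G.degree()]

print("max degree:", max(degrees), "mean degree:", sum(degrees) / len(degrees))
print("average clustering coefficient:", round(nx.average_clustering(G), 3))

# Degree histogram: many low-degree vertices, a few very-high-degree hubs.
for degree, count in sorted(Counter(degrees).items())[:5]:
    print(f"degree {degree}: {count} nodes")
```

In a run like this the maximum degree is typically many times the mean, which is the "relative commonness of vertices with a degree that greatly exceeds the average" noted in the text.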

Macro level
Rather than tracing interpersonal interactions, macro-level analyses generally trace the outcomes of interactions, such as economic or other resource transfer interactions over a large population.
Large-scale networks: Large-scale network is a term somewhat synonymous with "macro-level" as used, primarily, in the social and behavioral sciences and in economics. Originally, the term was used extensively in the computer sciences (see large-scale network mapping).
Complex networks: Most larger social networks display features of social complexity, which involves substantial non-trivial features of network topology, with patterns of complex connections between elements that are neither purely regular nor purely random (see complexity science, dynamical system and chaos theory), as do biological and technological networks. Such complex network features include a heavy tail in the degree distribution, a high clustering coefficient, assortativity or disassortativity among vertices, community structure, and hierarchical structure. In the case of agency-directed networks these features also include reciprocity, triad significance profile (TSP; see network motif), and other features. In contrast, many of the mathematical models of networks that have been studied in the past, such as lattices and random graphs, do not show these features.[44]

Theoretical Links

Imported Theories Various theoretical frameworks have been imported for the use of social network analysis. The most prominent of these are Graph Theory, Balance Theory, Social Comparison Theory, and more recently, the Social identity approach.[45]

Indigenous Theories
Few complete theories have been produced from social network analysis. Two that have are Structural Role Theory and Heterophily Theory. The basis of Heterophily Theory was the finding in one study that more numerous weak ties can be important in seeking information and innovation, as cliques have a tendency to have more homogeneous opinions as well as share many common traits. This homophilic tendency was the reason for the members of the cliques to be attracted together in the first place. However, being similar, each member of the clique would also know more or less what the other members knew. To find new information or insights, members of the clique have to look beyond the clique to their other friends and acquaintances. This is what Granovetter called "the strength of weak ties".[46]

Research clusters

Communications Communication Studies are often considered a part of both the social sciences and the humanities, drawing heavily on fields such as sociology, psychology, anthropology, information science, biology, political science, and economics as well as rhetoric, literary studies, and semiotics. Many communications concepts describe the transfer of information from one source to another, and can thus be conceived of in terms of a network.

Community In J.A. Barnes' day, a "community" referred to a specific geographic location and studies of community ties had to do with who talked, associated, traded, and attended church with whom. Today, however, there are extended "online" communities developed through telecommunications devices and social network services. Such devices and services require extensive and ongoing maintenance and analysis, often using network science methods. Community development studies, today, also make extensive use of such methods.

Complex Networks Complex networks require methods specific to modelling and interpreting social complexity and complex adaptive systems, including techniques of dynamic network analysis.

Criminal networks
In criminology and urban sociology, much attention has been paid to the social networks among criminal actors. For example, Andrew Papachristos has studied gang murders as a series of exchanges between gangs. Murders can be seen to diffuse outwards from a single source, because weaker gangs cannot afford to kill members of stronger gangs in retaliation, but must commit other violent acts to maintain their reputation for strength.

Diffusion of innovations
Diffusion of ideas and innovations studies focus on the spread and use of ideas from one actor to another or from one culture to another. This line of research seeks to explain why some become "early adopters" of ideas and innovations, and links social network structure with facilitating or impeding the spread of an innovation.

Demography In demography, the study of social networks has led to new sampling methods for estimating and reaching populations that are hard to enumerate (for example, homeless people or intravenous drug users.) For example, respondent driven sampling is a network-based sampling technique that relies on respondents to a survey recommending further respondents.
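The recruitment process behind respondent-driven sampling can be simulated in a few lines. The following sketch is hypothetical (network, seed, coupon count, and wave count are all invented) and only illustrates the wave-by-wave, contact-referral mechanics described above:

```python
# Hypothetical sketch of respondent-driven (snowball-style) sampling: seed
# respondents recruit a limited number of contacts, wave by wave.
import random

def snowball_sample(adjacency, seeds, coupons=2, waves=3, rng_seed=0):
    rng = random.Random(rng_seed)
    sampled, frontier = set(seeds), list(seeds)
    for _ in range(waves):
        next_frontier = []
        for person in frontier:
            contacts = [c for c in adjacency.get(person, ()) if c not in sampled]
            for recruit in rng.sample(contacts, min(coupons, len(contacts))):
                sampled.add(recruit)
                next_frontier.append(recruit)
        frontier = next_frontier
    return sampled

network = {"s1": ["a", "b", "c"], "a": ["d"], "b": ["d", "e"],
           "c": [], "d": ["f"], "e": [], "f": []}
print(sorted(snowball_sample(network, seeds=["s1"])))
```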

Economic sociology The field of sociology focuses almost entirely on networks of outcomes of social interactions. More narrowly, economic sociology considers behavioral interactions of individuals and groups through social capital and social "markets". Sociologists, such as Mark Granovetter, have developed core principles about the interactions of social structure, information, ability to punish or reward, and trust that frequently recur in their analyses of political, economic and other institutions. Granovetter examines how social structures and social networks can affect economic outcomes like hiring, price, productivity and innovation and describes sociologists’ contributions to analyzing the impact of social structure and networks on the economy.[47]

Health care
Analysis of social networks is increasingly incorporated into health care analytics, not only in epidemiological studies but also in models of patient communication and education, disease prevention, mental health diagnosis and treatment, and in the study of health care organizations and systems.[48]

Human ecology Human ecology is an interdisciplinary and transdisciplinary study of the relationship between humans and their natural, social, and built environments. The scientific philosophy of human ecology has a diffuse history with connections to geography, sociology, psychology, anthropology, zoology, and natural ecology.[49][50]

Language/Linguistics
Studies of language and linguistics, particularly evolutionary linguistics, focus on the development of linguistic forms and the transfer of changes, sounds or words from one language system to another through networks of social interaction. Social networks are also important in language shift, as groups of people add and/or abandon languages to their repertoire.

Organizational Studies
Research in this cluster examines formal or informal organizational relationships, organizational communication, economics, economic sociology, and other resource transfers. Social networks have also been used to examine how organizations interact with each other, characterizing the many informal connections that link executives together, as well as associations and connections between individual employees at different organizations.[51] Intra-organizational networks have been found to affect organizational commitment,[52] organizational identification,[37] and interpersonal citizenship behaviour.[53]

Social capital
Social capital is a sociological concept which refers to the value of social relations and the role of cooperation and confidence in achieving positive outcomes. The term refers to the value one can get from one's social ties. For example, newly arrived immigrants can make use of their social ties to established migrants to acquire jobs they may otherwise have trouble getting (e.g., because of lack of knowledge of the local language). Studies show that there is a positive relationship between social capital and the intensity of social network use.[54]

References
[1] Wasserman, Stanley; Faust, Katherine (1994). "Social Network Analysis in the Social and Behavioral Sciences". Social Network Analysis: Methods and Applications. Cambridge University Press. pp. 1–27. ISBN 978-0-521-38269-4.
[2] Scott, W. Richard; Davis, Gerald F. (2003). "Networks In and Around Organizations". Organizations and Organizing. Pearson Prentice Hall. ISBN 0-13-195893-3.
[3] Freeman, Linton (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press. ISBN 1-59457-714-5.
[4] Borgatti, Stephen P.; Mehra, Ajay; Brass, Daniel J.; Labianca, Giuseppe (2009). "Network Analysis in the Social Sciences". Science 323 (5916): 892–895. doi:10.1126/science.1165821.
[5] Easley, David; Kleinberg, Jon (2010). "Overview". Networks, Crowds, and Markets: Reasoning about a Highly Connected World. Cambridge University Press. pp. 1–20. ISBN 978-0-521-19533-1.
[6] Scott, John P. (2000). Social Network Analysis: A Handbook (2nd edition). Thousand Oaks, CA: Sage Publications.
[7] Tönnies, Ferdinand (1887). Gemeinschaft und Gesellschaft, Leipzig: Fues's Verlag. (Translated, 1957, by Charles Price Loomis as Community and Society, East Lansing: Michigan State University Press.)
[8] Durkheim, Emile (1893). De la division du travail social: étude sur l'organisation des sociétés supérieures, Paris: F. Alcan. (Translated, 1964, by Lewis A. Coser as The Division of Labor in Society, New York: Free Press.)
[9] Simmel, Georg (1908). Soziologie, Leipzig: Duncker & Humblot.
[10] For a historical overview of the development of social network analysis, see: Carrington, Peter J. & Scott, John (2011). "Introduction" (http://books.google.com/books?id=2chSmLzClXgC&pg=PA1). The Sage Handbook of Social Network Analysis. SAGE. p. 1. ISBN 978-1-84787-395-8.
[11] See also the diagram in Scott, John (2000). Social Network Analysis: A Handbook (http://books.google.com/books?id=Ww3_bKcz6kgC&pg=PA8). SAGE. p. 8. ISBN 978-0-7619-6339-4.
[12] Malinowski, Bronislaw (1913). The Family Among the Australian Aborigines: A Sociological Study. London: University of London Press.
[13] Radcliffe-Brown, Alfred Reginald (1930). The social organization of Australian tribes. Sydney, Australia: University of Sydney Oceania monographs, No. 1.
[14] Radcliffe-Brown, A.R. (1940). "On social structure". Journal of the Royal Anthropological Institute, 70: 1-12.
[15] Lévi-Strauss, Claude ([1947] 1967). Les structures élémentaires de la parenté. Paris: La Haye, Mouton et Co. (Translated, 1969, by J. H. Bell, J. R. von Sturmer, and R. Needham as The Elementary Structures of Kinship, Boston: Beacon Press.)
[16] Barnes, John (1954). "Class and Committees in a Norwegian Island Parish." Human Relations, (7): 39-58.
[17] Freeman, Linton C. and Barry Wellman (1995). "A note on the ancestral Toronto home of social network analysis." Connections, 18(2): 15-19.
[18] Savage, Mike (2008). "Elizabeth Bott and the formation of modern British sociology." The Sociological Review, 56(4): 579–605.
[19] Nadel, S.F. (1957). The Theory of Social Structure. London: Cohen and West.
[20] Parsons, Talcott ([1937] 1949). The Structure of Social Action: A Study in Social Theory with Special Reference to a Group of European Writers. New York, NY: The Free Press.
[21] Parsons, Talcott (1951). The Social System. New York, NY: The Free Press.
[22] Blau, Peter (1956). Bureaucracy in Modern Society. New York: Random House, Inc.
[23] Blau, Peter (1960). "A Theory of Social Integration." The American Journal of Sociology, (65)6: 545-556 (May).
[24] Blau, Peter (1964). Exchange and Power in Social Life.
[25] Bernie Hogan. "The Networked Individual: A Profile of Barry Wellman" (http://www.semioticon.com/semiotix/semiotix14/sem-14-05.html).
[26] Granovetter, Mark (2007). "Introduction for the French Reader," Sociologica 2: 1–8.
[27] Wellman, Barry (1988). "Structural analysis: From method and metaphor to theory and substance." Pp. 19-61 in B. Wellman and S. D. Berkowitz (eds.) Social Structures: A Network Approach, Cambridge, UK: Cambridge University Press.
[28] Mullins, Nicholas (1973). Theories and Theory Groups in Contemporary American Sociology. New York: Harper and Row.
[29] Tilly, Charles, ed. (1974). An Urban World. Boston: Little Brown.
[30] Mark Granovetter, "Introduction for the French Reader," Sociologica 2 (2007): 1–8.
[31] Wellman, Barry (1988). "Structural Analysis: From Method and Metaphor to Theory and Substance." Pp. 19-61 in Social Structures: A Network Approach, edited by Barry Wellman and S.D. Berkowitz. Cambridge: Cambridge University Press.
[32] Nagler, Jan, Anna Levina and Marc Timme (2011). "Impact of single links in competitive percolation." Nature Physics, 7: 265-270.
[33] Newman, Mark, Albert-László Barabási and Duncan J. Watts (2006). The Structure and Dynamics of Networks (Princeton Studies in Complexity). Oxford: Princeton University Press.
[34] Wellman, Barry (2008). "Review: The development of social network analysis: A study in the sociology of science." Contemporary Sociology, 37: 221-222.
[35] Kadushin, C. (2012). Understanding Social Networks: Theories, Concepts, and Findings. Oxford: Oxford University Press.
[36] Granovetter, M. (1976). "Network sampling: Some first steps." 81: 1287-1303.
[37] Jones, C. & Volpe, E.H. (2011). "Organizational identification: Extending our understanding of social identities through social networks." Journal of Organizational Behavior, 32: 413-434.
[38] Hedström, Peter, Rickard Sandell, and Charlotta Stern (2000). "Mesolevel Networks and the Diffusion of Social Movements: The Case of the Swedish Social Democratic Party." American Journal of Sociology, 106(1): 145–72. (http://www.nuffield.ox.ac.uk/users/hedstrom/ajs3.pdf)
[39] Riketta, M. & Nienber, S. (2007). "Multiple identities and work motivation: The role of perceived compatibility between nested organizational units." British Journal of Management, 18: S61-77.
[40] Riketta, M. & Nienber, S. (2007). "Multiple identities and work motivation: The role of perceived compatibility between nested organizational units." British Journal of Management, 18: S61-77.
[41] Cranmer, Skyler J. and Bruce A. Desmarais (2011). "Inferential Network Analysis with Exponential Random Graph Models." Political Analysis, 19(1): 66-86.
[42] Moreira, André A., Demétrius R. Paula, Raimundo N. Costa Filho, José S. Andrade, Jr. (2006). "Competitive cluster growth in complex networks". Physical Review E 73 (6). arXiv:cond-mat/0603272. doi:10.1103/PhysRevE.73.065101.
[43] Barabási, Albert-László (2003). Linked: How everything is connected to everything else and what it means for business, science, and everyday life. New York, NY: Plum.
[44] Strogatz, Steven H. (2001). "Exploring complex networks." Nature, 410: 268-276.
[45] Kilduff, M., Tsai, W. (2003). Social Networks and Organisations. Sage Publications.
[46] Granovetter, M. (1973). "The strength of weak ties." 78: 1360-1380.
[47] Granovetter, Mark (2005). "The Impact of Social Structure on Economic Outcomes." The Journal of Economic Perspectives, 19(1): 33-50. (http://www.jstor.org/stable/4134991)
[48] Levy, Judith and Bernice Pescosolido (2002). Social Networks and Health. Boston, MA: JAI Press.
[49] Crona, Beatrice and Klaus Hubacek (eds.) (2010). "Special Issue: Social network analysis in natural resource governance." Ecology and Society, 48. (http://www.ecologyandsociety.org/issues/view.php?sf=48)
[50] Ernstson, Henrich (2010). "Reading list: Using social network analysis (SNA) in social-ecological studies." Resilience Science. (http://rs.resalliance.org/2010/11/03/reading-list-using-social-network-analysis-sna-in-social-ecological-studies/)
[51] Podolny, J.M. & Baron, J.N. (1997). "Resources and relationships: Social networks and mobility in the workplace." American Sociological Review, 62(5): 673-693.
[52] Lee, J. & Kim, S. (2011). "Exploring the role of social networks in affective organizational commitment: Network centrality, strength of ties, and structural holes." The American Review of Public Administration, 41(2): 205-223.
[53] Bowler, W.M. & Brass, D.J. (2011). "Relational correlates of interpersonal citizenship behaviour: A social network perspective." Journal of Applied Psychology, 91(1): 70-82.
[54] Sebastián, Valenzuela; Namsu Park, Kerk F. Kee (2009). "Is There Social Capital in a Social Network Site?: Facebook Use and College Students' Life Satisfaction, Trust, and Participation". Journal of Computer-Mediated Communication 14 (4): 875-901; Hua Wang and Barry Wellman, "Social Connectivity in America: Changes in Adult Friendship Network Size from 2002 to 2007," American Behavioral Scientist 53 (8): 1148-69, 2010. DOI: 10.1177/0002764209356247.

Further reading
• Wellman, Barry; Berkowitz, S.D. (1988). Social Structures: A Network Approach. Structural Analysis in the Social Sciences. Cambridge University Press. ISBN 0-521-24441-2.
• Scott, John (1991). Social Network Analysis: A Handbook. SAGE. ISBN 978-0-7619-6338-7.
• Wasserman, Stanley; Faust, Katherine (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press. ISBN 978-0-521-38269-4.
• Barabási, Albert-László (2003). Linked: How everything is connected to everything else and what it means for business, science, and everyday life. Plum. ISBN 978-0-452-28439-5.
• Freeman, Linton C. (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press. ISBN 1-59457-714-5.
• Barnett, George A. (2011). Encyclopedia of Social Networks. SAGE. ISBN 978-1-4129-7911-5.
• Kadushin, Charles (2012). Understanding Social Networks: Theories, Concepts, and Findings. Oxford University Press. ISBN 978-0-19-537946-4.
• Rainie, Lee and Barry Wellman (2012). Networked: The New Social Operating System. MIT Press. ISBN 978-0-262-01719-0.

External links

Organizations

• International Network for Social Network Analysis (http://www.insna.org/)

Peer-reviewed journals

• Social Networks (http://www.sciencedirect.com/science/journal/03788733)
• Network Science (http://journals.cambridge.org/action/displaySpecialPage?pageId=3656)
• Journal of Social Structure (http://www.cmu.edu/joss/content/articles/volindex.html)
• Journal of Mathematical Sociology (http://www.tandfonline.com/toc/gmas20/current)

Textbooks and educational resources

• Networks, Crowds, and Markets (http://www.cs.cornell.edu/home/kleinber/networks-book/) (2010) by D. Easley & J. Kleinberg
• Introduction to Social Networks Methods (http://faculty.ucr.edu/~hanneman/nettext/) (2005) by R. Hanneman & M. Riddle
• Social Network Analysis Instructional Web Site (http://www.analytictech.com/networks/) by S. Borgatti

Data sets

• Pajek's list of lists of datasets (http://pajek.imfm.si/doku.php?id=data:urls:index)
• UC Irvine Network Data Repository (http://networkdata.ics.uci.edu/index.html)
• Stanford Large Network Dataset Collection (http://snap.stanford.edu/data/)
• M.E.J. Newman datasets (http://www-personal.umich.edu/~mejn/netdata/)
• Pajek datasets (http://vlado.fmf.uni-lj.si/pub/networks/data/)
• Gephi datasets (http://wiki.gephi.org/index.php?title=Datasets#Social_networks)
• KONECT - Koblenz network collection (http://konect.uni-koblenz.de/networks)

Data visualization

Data visualization is the study of the visual representation of data, meaning "information that has been abstracted in some schematic form, including attributes or variables for the units of information".[1]

A data visualization of Wikipedia as part of the World Wide Web, demonstrating hyperlinks

Overview

According to Friedman (2008) the "main goal of data visualization is to communicate information clearly and effectively through graphical means. It doesn’t mean that data visualization needs to look boring to be functional or extremely sophisticated to look beautiful. To convey ideas effectively, both aesthetic form and functionality need to go hand in hand, providing insights into a rather sparse and complex data set by communicating its key-aspects in a more intuitive way. Yet designers often fail to achieve a balance between form and function, creating gorgeous data visualizations which fail to serve their main purpose — to communicate information".[2] Indeed, Fernanda Viegas and Martin M. Wattenberg have suggested that an ideal visualization should not merely communicate clearly, but stimulate viewer engagement and attention.[3]

Data visualization is closely related to information graphics, information visualization, scientific visualization, and statistical graphics. In the new millennium, data visualization has become an active area of research, teaching, and development. According to Post et al. (2002), it has united the fields of scientific and information visualization.[4] As demonstrated by Brian Willison, data visualization has also been linked to enhancing agile software development and customer engagement.[5] KPI Library has developed the “Periodic Table of Visualization Methods”, an interactive chart displaying various data visualization methods.[6] It details six types of visualization methods: data, information, concept, strategy, metaphor and compound.
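The "communicate clearly and effectively" goal quoted above is easiest to see in a small example. The following sketch uses the matplotlib library with made-up data (region names and revenue figures are invented), favoring direct labels over decorative elements:

```python
# Minimal sketch (assumed data): a plain bar chart with direct value labels,
# aiming at clarity rather than decoration.
import matplotlib.pyplot as plt

regions = ["North", "South", "East", "West"]   # hypothetical categories
revenue = [4.2, 3.1, 5.6, 2.4]                 # hypothetical values ($M)

fig, ax = plt.subplots(figsize=(5, 3))
ax.bar(regions, revenue, color="steelblue")
ax.set_ylabel("Revenue ($M)")
ax.set_title("Quarterly revenue by region")
for i, value in enumerate(revenue):
    ax.text(i, value, f"{value:.1f}", ha="center", va="bottom")  # label each bar
fig.tight_layout()
plt.show()
```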

Data visualization scope
There are different approaches to the scope of data visualization. One common focus is on information presentation, as in Friedman (2008). Along these lines, Friendly (2008) identifies two main parts of data visualization: statistical graphics and thematic cartography.[1] In this vein, the "Data Visualization: Modern Approaches" (2007) article gives an overview of seven subjects of data visualization:[7]
• Mindmaps
• Displaying news
• Displaying data
• Displaying connections
• Displaying websites
• Articles & resources
• Tools and services
All these subjects are closely related to graphic design and information representation. From a computer science perspective, on the other hand, Frits H. Post (2002) categorized the field into a number of sub-fields:[4]
• Visualization algorithms and techniques
• Volume visualization
• Information visualization
• Multiresolution methods
• Modelling techniques
• Interaction techniques and architectures

Related fields

Data acquisition Data acquisition is the sampling of the real world to generate data that can be manipulated by a computer. Sometimes abbreviated DAQ or DAS, data acquisition typically involves acquisition of signals and waveforms and processing the signals to obtain desired information. The components of data acquisition systems include appropriate sensors that convert any measurement parameter to an electrical signal, which is acquired by data acquisition hardware.

Data analysis
Data analysis is the process of studying and summarizing data with the intent to extract useful information and develop conclusions. Data analysis is closely related to data mining, but data mining tends to focus on larger data sets, with less emphasis on making inference, and often uses data that was originally collected for a different purpose. In statistical applications, some people divide data analysis into descriptive statistics, exploratory data analysis, and inferential statistics (or confirmatory data analysis), where EDA focuses on discovering new features in the data, and CDA on confirming or falsifying existing hypotheses. Types of data analysis are:
• Exploratory data analysis (EDA): an approach to analyzing data for the purpose of formulating hypotheses worth testing, complementing the tools of conventional statistics for testing hypotheses. It was so named by John Tukey.
• Qualitative data analysis (QDA) or qualitative research: the analysis of non-numerical data, for example words, photographs, observations, etc.
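As a concrete illustration of the exploratory step described above, the sketch below uses the pandas library on a small, made-up table (column names and values are hypothetical): summarize first, then look for patterns worth testing later:

```python
# Minimal EDA sketch (hypothetical data): describe, slice, and look for
# candidate relationships before any confirmatory analysis.
import pandas as pd

df = pd.DataFrame({
    "region": ["North", "South", "East", "East", "West", "North"],
    "units":  [120, 95, 180, 160, 70, 140],
    "price":  [9.99, 10.49, 9.49, 9.79, 11.25, 9.89],
})

print(df.describe())                          # descriptive statistics
print(df.groupby("region")["units"].mean())   # a quick exploratory cut
print(df[["units", "price"]].corr())          # candidate relationship to test later
```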

Data governance
Data governance encompasses the people, processes and technology required to create a consistent, enterprise view of an organisation's data in order to:
• Increase consistency and confidence in decision making
• Decrease the risk of regulatory fines
• Improve data security
• Maximize the income generation potential of data
• Designate accountability for information quality

Data management Data management comprises all the academic disciplines related to managing data as a valuable resource. The official definition provided by DAMA is that "Data Resource Management is the development and execution of architectures, policies, practices, and procedures that properly manage the full data lifecycle needs of an enterprise." This definition is fairly broad and encompasses a number of professions that may not have direct technical contact with lower-level aspects of data management, such as relational database management.

Data mining Data mining is the process of sorting through large amounts of data and picking out relevant information. It is usually used by business intelligence organizations, and financial analysts, but is increasingly being used in the sciences to extract information from the enormous data sets generated by modern experimental and observational methods. It has been described as "the nontrivial extraction of implicit, previously unknown, and potentially useful information from data"[8] and "the science of extracting useful information from large data sets or databases."[9] In relation to enterprise resource planning, according to Monk (2006), data mining is "the statistical and logical analysis of large sets of transaction data, looking for patterns that can aid decision making".[10]
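One common data-mining step consistent with the description above, "sorting through large amounts of data and picking out relevant information," is grouping similar records. The sketch below is illustrative only: it assumes the scikit-learn library is installed and uses invented customer figures:

```python
# Illustrative sketch (made-up data): group similar customer records with
# k-means clustering, one of many pattern-finding techniques used in mining.
import numpy as np
from sklearn.cluster import KMeans

# Each row: [annual spend, number of transactions] for a hypothetical customer.
X = np.array([[200, 5], [220, 6], [2500, 40], [2700, 38], [90, 2], [2600, 45]])

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
for customer, label in zip(X, model.labels_):
    print(customer, "-> cluster", label)   # low-spend vs. high-spend groups
```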

Data transforms
Data transformation is the process of converting data, whether real-time or offline, from one format to another, usually in an automated way. Standards and protocols provide the specifications and rules, and the transformation typically occurs in a processing pipeline as part of aggregation, consolidation, or interoperability work. The primary users are systems integration organizations and compliance personnel.
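A minimal format-to-format transform of the kind described above can be done with the standard library alone; the sketch below (sample CSV content is invented) reads CSV records, normalizes types, and emits JSON:

```python
# Minimal sketch: CSV in, JSON out, with type normalization during the
# transform step. Uses only the Python standard library.
import csv, io, json

csv_text = "id,region,revenue\n1,North,4.2\n2,South,3.1\n"

records = list(csv.DictReader(io.StringIO(csv_text)))
for r in records:
    r["id"] = int(r["id"])           # normalize numeric fields
    r["revenue"] = float(r["revenue"])

print(json.dumps(records, indent=2))
```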

Data visualization software

Software | Type | Targeted Users | License
Amira | GUI/Code Data Visualisation | Scientists | Proprietary
Avizo | GUI/Code Data Visualisation | Engineers and Scientists | Proprietary
Cave5D | Virtual Reality Data Visualization | Scientists | Open Source
Data Desk | GUI Data Visualisation | Statisticians | Proprietary
DAVIX | Operating System with data tools | Security Consultants | Various
Dundas Data Visualization, Inc. | GUI Data Visualisation | Business Managers | Proprietary
ELKI | Data mining visualizations | Scientists and Teachers | Open Source
Eye-Sys | GUI/Code Data Visualisation | Engineers and Scientists | Proprietary
Ferret Data Visualization and Analysis | Gridded Datasets Visualisation | Oceanographers and meteorologists | Open Source
Trendalyzer | Data Visualisation | Teachers | Proprietary
Tulip | GUI Data Visualization | Researchers and Engineers | Open Source
Gephi | GUI Data Visualisation | Statisticians | Open Source
GGobi | GUI Data Visualisation | Statisticians | Open Source
Grapheur | GUI Data Visualisation | Business Users | Proprietary
 | Data visualization package for R | Programmers | Open Source
Mondrian | GUI Data Visualisation | Statisticians | Open Source
IBM OpenDX | GUI/Code Data Visualisation | Engineers and Scientists | Open Source
IDL (programming language) | Code Data Visualisation | Programmers | Many
IDL (programming language) | Programming Language | Programmers | Open Source
InetSoft | Company | Many | Proprietary
Instantatlas | GIS Data Visualisation | Analysts, researchers, statisticians and GIS professionals | Proprietary
MeVisLab | GUI/Code Data Visualisation | Engineers and Scientists | Proprietary
Panopticon Software | Enterprise application, SDK, Rapid Development Kit (RDK) | Capital Markets, Telecommunications, Energy, Government | Proprietary
ParaView | GUI/Code Data Visualisation | Engineers and Scientists | BSD
Processing (programming language) | Programming Language | Programmers | GPL
protovis | Library / Toolkit | Programmers | BSD
Smile (software) | GUI/Code Data Visualisation | Engineers and Scientists | Proprietary
Spotfire | GUI Data Visualisation | Business Users | Proprietary
StatSoft | Company of GUI/Code Data Visualisation Software | Engineers and Scientists | Proprietary
Tableau Software | GUI Data Visualisation | Business Users | Proprietary
TinkerPlots | GUI Data Visualisation | Students | Proprietary
Tom Sawyer Software | Data Visualization and Social Network Analysis Applications | Capital Markets, Telecommunications, Energy, Government; Business Users, Engineers, and Scientists | Proprietary
Trade Space Visualizer | GUI/Code Data Visualisation | Engineers and Scientists | Proprietary
Visifire | Library | Programmers | Was Open Source, now Proprietary
Vis5D | GUI Data Visualization | Scientists | Open Source
VisAD | Java/Jython Library | Programmers | Open Source
VisIt | GUI/Code Data Visualisation | Engineers and Scientists | BSD
VTK | C++ Library | Programmers | Open Source
Yoix | Programming Language | Programmers | Open Source
Visual.ly | Company | Creative Tools: Data curation and visualization | Proprietary

References

[1] Michael Friendly (2008). "Milestones in the history of thematic cartography, statistical graphics, and data visualization" (http://www.math.yorku.ca/SCS/Gallery/milestone/milestone.pdf).
[2] Vitaly Friedman (2008). "Data Visualization and Infographics" (http://www.smashingmagazine.com/2008/01/14/monday-inspiration-data-visualization-and-infographics/), in: Graphics, Monday Inspiration, January 14, 2008.
[3] Fernanda Viegas and Martin Wattenberg, "How To Make Data Look Sexy", CNN.com, April 19, 2011. http://articles.cnn.com/2011-04-19/opinion/sexy.data_1_visualization-21st-century-engagement?_s=PM:OPINION
[4] Frits H. Post, Gregory M. Nielson and Georges-Pierre Bonneau (2002). Data Visualization: The State of the Art. Research paper, TU Delft, 2002. (http://visualisation.tudelft.nl/publications/post2003b.pdf)
[5] Brian Willison, "Visualization Driven Rapid Prototyping", Parsons Institute for Information Mapping, 2008.
[6] http://www.visual-literacy.org/periodic_table/periodic_table.html
[7] "Data Visualization: Modern Approaches" (http://www.smashingmagazine.com/2007/08/02/data-visualization-modern-approaches/), in: Graphics, August 2, 2007.
[8] W. Frawley and G. Piatetsky-Shapiro and C. Matheus (Fall 1992). "Knowledge Discovery in Databases: An Overview". AI Magazine: pp. 213–228. ISSN 0738-4602.
[9] D. Hand, H. Mannila, P. Smyth (2001). Principles of Data Mining. MIT Press, Cambridge, MA. ISBN 0-262-08290-X.
[10] Ellen Monk, Bret Wagner (2006). Concepts in Enterprise Resource Planning, Second Edition. Thomson Course Technology, Boston, MA. ISBN 0-619-21663-8.

Further reading
• Chandrajit Bajaj, Bala Krishnamurthy (1999). Data Visualization Techniques.
• William S. Cleveland (1993). Visualizing Data. Hobart Press.
• William S. Cleveland (1994). The Elements of Graphing Data. Hobart Press.
• Alexander N. Gorban, Balázs Kégl, Donald Wunsch, and Andrei Zinovyev (2008). Principal Manifolds for Data Visualization and Dimension Reduction (http://www.springer.com/math/cse/book/978-3-540-73749-0). LNCSE 58. Springer.
• John P. Lee and Georges G. Grinstein (eds.) (1994). Database Issues for Data Visualization: IEEE Visualization '93 Workshop, San Diego (http://portal.acm.org/toc.cfm?id=646122&type=proceeding&coll=GUIDE&dl=GUIDE&CFID=35087769&CFTOKEN=59542343).
• Peter R. Keller and Mary Keller (1993). Visual Cues: Practical Data Visualization.
• Frits H. Post, Gregory M. Nielson and Georges-Pierre Bonneau (2002). Data Visualization: The State of the Art (http://visualisation.tudelft.nl/publications/post2003b.pdf).
• Stewart Liff and Pamela A. Posey (2007). Seeing is Believing: How the New Art of Visual Management Can Boost Performance Throughout Your Organization. AMACOM, New York. ISBN 978-0-8144-0035-7.
• Stephen Few (2009). Fundamental Differences in Analytical Tools - Exploratory, Custom, or Customizable (http://www.perceptualedge.com/articles/visual_business_intelligence/differences_in_analytical_tools.pdf).

External links

• Yau, Nathan (2011). Visualize This: The FlowingData Guide to Design, Visualization, and Statistics (http://book.flowingdata.com/). Wiley. pp. 384.
• Milestones in the History of Thematic Cartography, Statistical Graphics, and Data Visualization (http://www.math.yorku.ca/SCS/Gallery/), an illustrated chronology of innovations by Michael Friendly and Daniel J. Denis.
• Peer-reviewed definition of Data Visualization (http://www.interaction-design.org/encyclopedia/data_visualization_for_human_perception.html) with commentaries.
• We Love Infographics (http://weloveinfographics.info/)

Mobile business intelligence

Mobile Business Intelligence (Mobile BI or Mobile Intelligence) refers to the distribution of business data to mobile devices such as smartphones and tablet computers. Business intelligence (BI) refers to computer-based techniques used in spotting, extracting, and analyzing business data, such as sales revenue by product or department or the associated costs and incomes.[1] Although mobile computing has been prevalent for over a decade, Mobile BI has gained momentum only recently. This shift has been driven in part by the move from the wired world to a wireless one, with smartphones ushering in a new era of mobile computing, especially in the field of BI.[2] According to the Aberdeen Group, many companies are rapidly adopting mobile BI in response to market pressures such as the need for greater efficiency in business processes, improved employee productivity (e.g., less time spent looking for information), better and faster decision making, better customer service, and real-time, bi-directional data access to support decisions anytime and anywhere.[3]

History

Information delivery to mobile devices
The predominant method for accessing BI information is to use a Web browser on a personal computer to connect to BI applications, which in turn request data from databases. Starting in the late 1990s, BI systems offered alternatives for receiving data, including email and mobile devices.

Static data push Initially, mobile devices such as pagers and mobile phones received pushed data using a short message service (SMS) or text messages. These applications were designed for specific mobile devices, contained minimal amounts of information, and provided no data interactivity. As a result, the early mobile BI applications were expensive to design and maintain while providing limited informational value, and garnered little interest.

Data access via a mobile browser
The mobile browser on a smartphone, a handheld computer integrated with a mobile phone,[4] provided a means to read simple tables of data. The small screen space, immature mobile browsers, and slow data transmission could not provide a satisfactory BI experience. Accessibility and bandwidth may be perceived as issues when it comes to mobile technology, but BI solutions provide advanced functionality to predict and outperform such potential challenges. While Web-based mobile BI solutions provide little to no control over the processing of data in a network, managed BI solutions for mobile devices only utilize the server for specific operations. In addition, local reports are compressed both during transmission and on the device, permitting greater flexibility for storage and receipt of these reports. Within a mobile environment, users capitalize on easy access to information because the mobile application operates within a single authoring environment that permits access to all BI content (respecting existing security) regardless of language or locale. Furthermore, the user will not need to build and maintain a separate mobile BI deployment. In addition, mobile BI requires much less bandwidth for functionality. Mobile BI promises a small report footprint on memory, encryption during transmission as well as on the device, and compressed data storage for offline viewing and use.[5]
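The compression idea mentioned above, shrinking report payloads before transmission and for on-device storage, can be sketched with the standard library. The report structure and field names below are hypothetical, not from any particular BI product:

```python
# Hypothetical sketch: serialize a report payload and gzip-compress it before
# sending it to a mobile client; decompress on the device for offline viewing.
import gzip, json

report = {"report": "weekly_sales",
          "rows": [{"region": "North", "revenue": 4.2}] * 200}

raw = json.dumps(report).encode("utf-8")
compressed = gzip.compress(raw)
print(f"raw: {len(raw)} bytes, compressed: {len(compressed)} bytes")

# On the device, the payload is decompressed and parsed back into the report.
restored = json.loads(gzip.decompress(compressed).decode("utf-8"))
assert restored == report
```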

Mobile client application

In 2002, Research in Motion released the first BlackBerry smartphone optimized for wireless email use. Wireless email proved to be the "killer app" that accelerated the popularity of the smartphone market. By the mid-2000s, Research in Motion's BlackBerry had solidified its hold on the smartphone market with both corporate and governmental organizations. The BlackBerry smartphones eliminated the obstacles to mobile business intelligence: the BlackBerry offered a consistent treatment of data across its many models, provided a much larger screen for viewing data, and allowed user interactivity via the thumbwheel and keyboard. BI vendors re-entered the market with offerings spanning different mobile operating systems (BlackBerry, Windows, Symbian) and data access methods. The two most popular data access options were:
• to use the mobile browser to access data, similar to a desktop computer, and
• to create a native application designed specifically for the mobile device.[6]
Research in Motion has continued to lose market share to Apple and Android smartphones. In the first three months of 2011, Google's Android OS gained 7 points of market share, while RIM's market share collapsed, dropping almost 5 points.[7]

Purpose-built Mobile BI apps

Apple quickly set the standard for mobile devices with the introduction of the iPhone. In its first three years, Apple sold over 33.75 million units.[8] Similarly, in 2010 Apple sold over 1 million iPads in just under three months.[9] Both devices feature an interactive touchscreen display that is the de facto standard on many mobile phones and tablet computers. In 2008, Apple released an SDK with which developers can build applications that run natively on the iPhone and iPad rather than as Safari-based web applications. These native applications can give the user a robust, easier-to-read and easier-to-navigate experience. The Apple App Store now has over 250,000 apps, giving Apple mobile devices a decided advantage. Others were quick to replicate Apple's success: the Android Market now has over 200,000 apps available for mobile devices running the Android operating system. More importantly, the advent of the Apple iPhone has radically changed the way people use data on their mobile devices, and this includes mobile BI. Business intelligence applications can be used to transform reports and data into mobile dashboards and have them instantly delivered to any iPhone or iPad.[10] Google Inc.'s Android has overtaken Apple Inc.'s iOS in the rapidly growing arena of app downloads. In the second quarter of 2011, 44% of all apps downloaded from app marketplaces across the web were for Android devices while 31% were for Apple devices, according to data from ABI Research. The remaining apps were for various other mobile operating systems, including BlackBerry and Windows Phone 7.[11] Mobile BI applications have evolved from being a client application for viewing data to purpose-built applications designed to provide the information and workflows needed to quickly make business decisions and take action.

Web Applications vs. Device-Specific Applications for Mobile BI

In early 2011, as the mobile BI software market started to mature and adoption started to grow at a significant pace in both small and large enterprises, most vendors adopted either a purpose-built, device-specific application strategy (e.g., iPhone or Android apps downloaded from iTunes or the Android Market) or a web application strategy (browser-based, works on most devices without an application being installed on the device). This debate continues, and there are benefits and drawbacks to both methods.[12] One potential solution is the wider adoption of HTML5 on mobile devices, which will give web applications many of the characteristics of dedicated applications while still allowing them to work on many devices without an installed application.

Microsoft has announced its mobile BI strategy. Microsoft plans to support browser-based applications such as Reporting Services and PerformancePoint on iOS in the first half of 2012 and touch-based applications on iOS and Android by the second half of 2012. Recent moves suggest the company recognizes that it is not the only player in the technology ecosystem; rather than attempting to squelch competition or dismiss new technology developments, it has decided to make its technology accessible to a wider audience.[13] There are many mobile devices and platforms available today, and the list, along with platform support, is constantly growing. There are hundreds of models available, with multiple hardware and software combinations, so the enterprise must select a device carefully. The target devices will affect the mobile BI design itself, because the design for a smartphone differs from that for a tablet: screen size, processor, memory, and so on all vary. The mobile BI program must account for the lack of device standardization from the providers by constantly testing devices against the mobile BI apps. Some best practices can nevertheless be followed. For example, a smartphone is a good candidate for operational mobile BI, whereas for analytics and what-if analysis, tablets are the better option. Hence, the selection or availability of the device plays a big role in the implementation.[14]

Demand

Gartner analyst Ted Friedman believes that mobile delivery of BI is all about practical, tactical information needed to make immediate decisions: "The biggest value is in operational BI — information in the context of applications — not in pushing lots of data to somebody's phone."[15] Accessing the Internet through a mobile device such as a smartphone is also known as the mobile Internet or mobile Web. IDC expects the US mobile workforce to increase by 73% in 2011.[16] Morgan Stanley reports that the mobile Internet is ramping up faster than its predecessor, the desktop Internet, enabling companies to deliver knowledge to their mobile workforce to help them make more profitable decisions.[17] Michael Cooney from Gartner has identified bring-your-own-technology at work as becoming the norm, not the exception. By 2015, media tablet shipments will reach around 50% of laptop shipments and Windows 8 will likely be in third place behind Android and Apple. The net result is that Microsoft's share of the client platform, be it PC, tablet or smartphone, will likely be reduced to 60%, and it could fall below 50%.[18]

Business Benefits

In its latest Magic Quadrant for Business Intelligence Platforms, Gartner examines whether the platform enables users to "fully interact with BI content delivered to mobile devices." The phrase "fully interact" is the key. The ability to send alerts embedded in email or text messages, or links to static content in email messages, hardly represents sophistication in mobile analytics. For users to benefit from mobile BI, they must be able to navigate dashboards and guided analytics comfortably, or as comfortably as the mobile device will allow; this is where devices with high-resolution screens and touch interfaces (like the iPhone and Android-based phones) have a clear edge over, say, earlier editions of the BlackBerry. It is equally important to take a step back and define the purpose and adoption patterns: which business users stand to benefit the most from mobile analytics, and what, exactly, is their requirement? Mobile analytics is not needed simply to send a few alerts or summary reports to handhelds; without interactivity, mobile BI is indistinguishable from merely informative email or text messages.[19]

Applications

Similar to consumer applications, which have shown ever-increasing growth over the past few years, constant demand for anytime, anywhere access to BI is driving growth in custom mobile application development.[20] Businesses have also started adopting mobile solutions for their workforce, and these are fast becoming key components of core business processes.[21] In an Aberdeen survey conducted in May 2010, 23% of participating companies indicated that they already have a mobile BI app or dashboard in place, while another 31% indicated that they plan to implement some form of mobile BI in the next year.[22]

Definitions

Mobile BI applications can be defined and segmented as follows:
• Mobile Browser Rendered App: Almost any mobile device enables Web-based, thin-client, HTML-only BI applications. However, these apps are static and provide little data interactivity. Data is viewed just as it would be over a browser on a personal computer. Little additional effort is required to display data, but mobile browsers typically support only a small subset of the interactivity of a desktop web browser.
• Customized App: A step up from this approach is to render each (or all) reports and dashboards in a device-specific format; in other words, provide information tailored to the screen size, optimize usage of screen real estate, and enable device-specific navigation controls. Examples include the thumb wheel or thumb button for BlackBerry, up/down/left/right arrows for Palm, and gestural manipulation for iPhone. This approach requires more effort than the previous one but no additional software.
• Mobile Client App: The most advanced option, the client app provides full interactivity with the BI content viewed on the device. In addition, this approach provides periodic caching of data, which can be viewed and analyzed even offline (a minimal caching sketch follows below).[23]
Companies across all verticals, from retail[24] to even non-profit organizations,[25] are realizing the value of purpose-specific mobile applications suited for their mobile workforce.
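As an illustration of the offline-caching behaviour described for the mobile client app category, the following Python sketch fetches a report from a hypothetical BI endpoint and falls back to a locally cached copy when the device has no connectivity. The URL, cache file name, and token handling are assumptions made for the example, not features of any particular product.

```python
import json
import os

import requests  # widely used third-party HTTP client

CACHE_FILE = "sales_dashboard_cache.json"                # local copy for offline viewing
REPORT_URL = "https://bi.example.com/api/reports/sales"  # hypothetical BI endpoint


def fetch_report(session_token: str) -> dict:
    """Fetch the latest report; fall back to the cached copy when offline."""
    try:
        resp = requests.get(
            REPORT_URL,
            headers={"Authorization": f"Bearer {session_token}"},
            timeout=10,
        )
        resp.raise_for_status()
        report = resp.json()
        with open(CACHE_FILE, "w") as fh:   # refresh the offline cache
            json.dump(report, fh)
        return report
    except requests.RequestException:
        if os.path.exists(CACHE_FILE):      # no connectivity: serve the cached data
            with open(CACHE_FILE) as fh:
                return json.load(fh)
        raise
```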

Development

Developing a native mobile BI app poses challenges, especially concerning data display rendering and user interactivity. Mobile BI app development has traditionally been a time-consuming and expensive effort requiring businesses to justify the investment for the mobile workforce. Mobile workers do not only require texting and alerts; they need information customized for their line of work, which they can interact with and analyze to gain deeper insight.[20]

Custom-coded Mobile BI Apps

Mobile BI applications are often custom-coded apps specific to the underlying mobile operating system. For example, iPhone apps require coding in Objective-C, while Android apps require coding in Java. In addition to the user functionality of the app, the app must be coded to work with the supporting server infrastructure required to serve data to the mobile BI app. While custom-coded apps offer near-limitless options, the specialized software coding expertise and infrastructure can be expensive to develop, modify, and maintain.
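To make the supporting server infrastructure concrete, here is a minimal, hypothetical sketch (using the Flask web framework, chosen only for brevity) of the kind of endpoint a custom-coded mobile BI app might call for its data; the route, port, and KPI values are illustrative assumptions rather than any vendor's actual API.

```python
from flask import Flask, jsonify  # lightweight web framework, used here only for brevity

app = Flask(__name__)

# Hypothetical in-memory figures; a real deployment would query the BI platform or warehouse.
KPI_SUMMARY = {"revenue": 1250000, "units_sold": 8400, "region": "EMEA"}


@app.route("/api/kpi/summary")
def kpi_summary():
    """Serve a small, mobile-friendly payload to the native client app."""
    return jsonify(KPI_SUMMARY)


if __name__ == "__main__":
    # TLS termination and authentication would sit in front of this in production.
    app.run(port=8080)
```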

Fixed-form Mobile BI Applications

Business data can be displayed in a mobile BI client (or web browser) that serves as a user interface to existing BI platforms or other data sources, eliminating the need for new master sources of data and specialized server infrastructure. This option offers fixed and configurable data visualizations such as charts, tables, trends, KPIs, and links, and can usually be deployed quickly using existing data sources. However, the data visualizations are not limitless and cannot always be extended beyond what is available from the vendor.

Graphical Tool-developed Mobile BI Apps

Mobile BI apps can also be developed using the graphical, drag-and-drop development environments of BI platforms. The advantages include the following:
1. Apps can be developed without coding,
2. Apps can be easily modified and maintained using the BI platform's change management tools,
3. Apps can use any range of data visualizations and are not limited to just a few,
4. Apps can incorporate specific business workflows, and
5. The BI platform provides the server infrastructure.
Using graphical BI development tools can allow faster mobile BI app development when a custom application is required.

Security Considerations for Mobile BI Apps

High adoption rates and reliance on mobile devices make safe mobile computing a critical concern.[26] The Mobile Business Intelligence Market Study found that security is the number one issue (63%) for organizations.[27] A comprehensive mobile security solution must provide security at these levels:[28]
• Device
• Transmission
• Authorization, Authentication, and Network Security

Device Security

A senior analyst at the Burton Group research firm recommends that the best way to ensure data cannot be tampered with is not to store it on the client (mobile) device. Then there is no local copy to lose if the mobile device is stolen, and the data can reside on servers within the data center, with access permitted only over the network.[29] Most smartphone manufacturers provide a complete set of security features, including full-disk encryption, email encryption, and remote management, which includes the ability to wipe contents if the device is lost or stolen.[28] Also, some devices have embedded third-party antivirus and firewall software, such as RIM's BlackBerry.[28]

Transmission Security

Transmission security refers to measures that are designed to protect data from unauthorized interception, traffic analysis, and imitative deception.[30] These measures include Secure Sockets Layer (SSL), iSeries Access for Windows, and virtual private network (VPN) connections.[31] A secure data transmission should enable the identities of the sender and receiver to be verified using a cryptographic shared-key system, and should protect the data from being modified by a third party while it crosses the network. This can be done using AES or Triple DES over an encrypted SSL tunnel.[28]
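For illustration, the sketch below uses Python's standard ssl module to open a certificate-verified TLS connection to a hypothetical BI server, the kind of transport-level protection (an "encrypted SSL tunnel") described above; the host name is an assumption for the example.

```python
import socket
import ssl

BI_HOST = "bi.example.com"  # hypothetical mobile BI server

# The default context verifies the server certificate against the system trust store
# and negotiates a modern protocol version, protecting data in transit from interception.
context = ssl.create_default_context()

with socket.create_connection((BI_HOST, 443), timeout=10) as sock:
    with context.wrap_socket(sock, server_hostname=BI_HOST) as tls:
        print("Negotiated protocol:", tls.version())
        print("Peer certificate subject:", tls.getpeercert().get("subject"))
```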

Authorization, Authentication, and Network Security

Authorization refers to the act of specifying access rights to control access of information to users.[32] Authentication refers to the act of establishing or confirming the user as true or authentic.[33] Network security refers to all the provisions and policies adopted by the network administrator to prevent and monitor unauthorized access, misuse, modification, or denial of the computer network and network-accessible resources.[34] Mobility adds unique security challenges: as data travels beyond the enterprise firewall into unknown territory, ensuring that it is handled safely is of paramount importance. To this end, proper authentication of user connections, centralized access control (such as an LDAP directory), and encrypted data transfer mechanisms can be implemented.[35]

Role of BI for Securing Mobile Apps

To ensure high security standards, BI software platforms must extend their authentication options and policy controls to the mobile platform. Business intelligence software platforms need to ensure a secure, encrypted keychain for storage of credentials. Administrative control of password policies should allow creation of security profiles for each user and seamless integration with centralized security directories to reduce administration and maintenance of users.
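As a hedged sketch of the "secure encrypted keychain" idea, the snippet below uses the third-party Python cryptography package to encrypt a credential before it is stored on the device; in a real deployment the key would live in the platform's keystore rather than in application code, and the credential shown is fictitious.

```python
from cryptography.fernet import Fernet  # third-party 'cryptography' package

# In practice the key itself would live in the platform keystore or secure enclave,
# never in application code; it is generated inline here only for the example.
key = Fernet.generate_key()
keychain = Fernet(key)

# Encrypt the (fictitious) credential before persisting it on the device.
token = keychain.encrypt(b"analyst@example.com:s3cret-password")

# Decrypt only at the moment the app authenticates against the BI server.
user, password = keychain.decrypt(token).decode().split(":", 1)
```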

Products

A number of BI vendors and niche software vendors offer mobile BI solutions. Some notable examples include:
• JReport
• PHOCAS Business Intelligence
• MyBI SaaS
• Business Objects
• Cognos
• Strategy Companion
• Information Builders
• InetSoft
• MicroStrategy
• QlikView
• SAP
• TARGIT Business Intelligence
• Transpara
• Yellowfin Business Intelligence
• SurfBI [36]
• Roambi
• Trendslide [37]

References

[1] BusinessDictionary.com (http://www.businessdictionary.com/)
[2] Business Intelligence for Intelligent Businesses (http://www.information-management.com/specialreports/2008_89/10001705-1.html?pg=1)
[3] Mobile Business Intelligence: Best-in-Class Secrets to Success (http://intelligent-enterprise.informationweek.com/channels/enterprise_applications/development/showArticle.jhtml;jsessionid=VCS3TZ0FEKCO5QE1GHRSKHWATMY32JVN?articleID=212300110)
[4] Smartphone definition (http://www.pcmag.com/encyclopedia_term/0,2542,t=Smartphone&i=51537,00.asp)
[5] http://www.information-management.com/specialreports/2008_111/10002254-1.html?pg=2
[6] Intelligence on the Go: Mobile Delivery of BI (http://intelligent-enterprise.informationweek.com/channels/business_intelligence/showArticle.jhtml;jsessionid=QJ2ILLXEXQ0SDQE1GHOSKHWATMY32JVN?articleID=207801077&cid=RSSfeed_IE_News)
[7] http://articles.businessinsider.com/2011-04-02/tech/30089528_1_android-phones-google-s-android-smartphone-market
[8] Apple Reports Second Quarter Results (http://www.apple.com/pr/library/2009/04/22results.html)
[9] Apple Sells 1 Million iPads (http://www.cnbc.com/id/36911690)
[10] Henschen, Doug (April 15, 2010). "Mobile BI Apps Target iPad" (http://www.informationweek.com/news/software/systems_management/showArticle.jhtml?articleID=224400267). InformationWeek.
[11] http://www.internetretailer.com/2011/10/26/android-surpasses-apple-app-download-share
[12] Eckerson, Wayne, "Architecting for Mobile BI" (http://www.b-eye-network.com/blogs/eckerson/archives/2011/04/architecting_fo.php), BeyeNetwork, April 21, 2011.
[13] http://www.cmswire.com/cms/mobile/microsoft-announces-mobile-business-intelligence-strategy-013091.php
[14] http://www.information-management.com/newsletters/mobile_BI_integration_apps_data_management-10020527-1.html?pg=3
[15] Business Intelligence Goes Mobile (http://www.computerworld.com/s/article/9179090/Business_intelligence_goes_mobile?taxonomyId=9&pageNumber=1)
[16] Worldwide Mobile Worker Population 2007–2011 Forecast (http://img.en25.com/web/CitrixOnline/IDC_MobileWorker_excerpt_0_0.pdf)
[17] Morgan Stanley - Internet Trends (http://www.morganstanley.com/institutional/techresearch/pdfs/Internet_Trends_041210.pdf)
[18] http://www.cio.com/article/692031/Gartner_the_Top_10_Strategic_Technology_Trends_for_2012?source=ctwartcio
[19] http://www.informationweek.com/news/software/bi/229300446
[20] iPhone, Need for Features Driving Plans for Mobile BI App Development (http://searchcio.techtarget.com/news/1376914/IPhone-need-for-features-driving-plans-for-mobile-BI-app-development)
[21] Mobile Line-of-Business Applications for the Midsize Business: An ROI Analysis (http://download.microsoft.com/download/A/9/F/A9F105FF-3C62-44B1-A1AE-71E267C31CFE/LOB_for_Midsize_Business_ROI_IDC_White_Paper.pdf)
[22] Mobile BI - A Bonanza for many Businesses (http://www.itbusiness.ca/it/client/en/home/News.asp?id=58375)
[23] Not all BI Applications are Created Equal (http://blogs.forrester.com/business_process/2009/11/not-all-mobile-bi-applications-are-created-equal.html)
[24] Intelligence on the Go: Mobile Delivery of BI (http://intelligent-enterprise.informationweek.com/channels/business_intelligence/customer_intelligence/showArticle.jhtml;jsessionid=ZIXKX1N5D25C1QE1GHOSKHWATMY32JVN?articleID=207801077&pgno=2)
[25] Mobile BI can be Quick (http://napps.networkworld.com/news/2010/071410-mobile-bi-can-be-quick.html)
[26] Mobile Security Needs Rise With Smartphone Sales (http://gigaom.com/2010/09/09/mobile-security-needs-rise-with-smartphone-sales/)
[27] Business Intelligence Blog by Howard Dresner (http://businesssintelligence.blogspot.com/)
[28] Successful Mobile Deployments Require Robust Security (http://na.blackberry.com/ataglance/get_the_facts/Successful_Mobile_Deployments.pdf)
[29] Mobile Security Definition and Solutions (http://www.cio.com/article/40360/Mobile_Security_Definition_and_Solutions?page=1&taxonomyId=3061)
[30] http://www.answers.com/topic/transmission-security
[31] IBM: Transmission security options (http://publib.boulder.ibm.com/infocenter/iseries/v5r4/index.jsp?topic=/rzaj4/rzaj45zhcryptointro.htm)
[32] Wikipedia Article - Authorization
[33] Wikipedia Article - Authentication
[34] Definition of Network Security (http://www.deepnines.com/secure-web-gateway/definition-of-network-security)
[35] Expertstown.com - Mobile Business Intelligence (http://www.expertstown.com/mobile-business-intelligence)
[36] http://www.surfbi.com/
[37] http://www.trendslide.com/

Composite application

In computing, the term composite application expresses a perspective of software engineering that defines an application built by combining multiple existing functions into a new application. The technical concept can be compared to mashups; however, composite applications use business sources of information (e.g., existing modules or even Web services), while mashups usually rely on web-based, and often free, sources. It is wrong to assume that composite applications are by definition part of a service-oriented architecture (SOA); composite applications can be built using any technology or architecture. A composite application consists of functionality drawn from several different sources. The components may be individual selected functions from within other applications, or entire systems whose outputs have been packaged as business functions, modules, or web services. Composite applications often incorporate orchestration of "local" application logic to control how the composed functions interact with each other to produce the new, derived functionality. For composite applications that are based on SOA, WS-CAF is a Web services standard for composite applications.[1]

External links

• On Estimating the Security Risks of Composite Software Services [2] Research paper (PDF).
• Visual Composite Applications: Magnifying Data Return on Investment (ROI) [3] White Paper (PDF).
• IBM DeveloperWorks Composite Application Blog [4]
• Magic Quadrant for Application Infrastructure for SOA Composite Application Projects [5] Gartner
• Composite application guidance from patterns & practices [6]

References

[1] OASIS Web Services Composite Application Framework (WS-CAF) TC (http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=ws-caf)
[2] http://research.ihost.lv/Yin.pdf
[3] http://www.idvsolutions.com/documents/Visual%20Composite%20Applications_white%20paper%206.pdf
[4] http://www.ibm.com/developerworks/blogs/page/CompApps
[5] http://mediaproducts.gartner.com/reprints/microsoft/vol3/article5/article5.html
[6] http://msdn.microsoft.com/en-us/library/cc707819.aspx

Cloud computing

Cloud computing is the use of computing resources (hardware and software) that are delivered as a service over a network (typically the Internet). The name comes from the use of a cloud-shaped symbol as an abstraction for the complex infrastructure it contains in system diagrams. Cloud computing entrusts remote services with a user's data, software and computation.

There are many types of public cloud computing:[1]
• Infrastructure as a service (IaaS)
• Platform as a service (PaaS)
• Software as a service (SaaS)
• Storage as a service (STaaS)
• Security as a service (SECaaS)
• Data as a service (DaaS)
• Test environment as a service (TEaaS)
• Desktop as a service (DaaS)
• API as a service (APIaaS)

The business model, IT as a service (ITaaS), is used by in-house, enterprise IT organizations that offer any or all of the above services. Using software as a service, users also rent application software and databases. The cloud providers manage the infrastructure and platforms on which the applications run. End users access cloud-based applications through a web browser or a lightweight desktop or mobile app, while the business software and user's data are stored on servers at a remote location. Proponents claim that cloud computing allows enterprises to get their applications up and running faster, with improved manageability and less maintenance, and enables IT to more rapidly adjust resources to meet fluctuating and unpredictable business demand.[2][3] Cloud computing relies on sharing of resources to achieve coherence and economies of scale, similar to a utility (like the electricity grid) over a network.[4] At the foundation of cloud computing is the broader concept of converged infrastructure and shared services.

History

The origin of the term cloud computing is obscure, but it appears to derive from the practice of using drawings of stylized clouds to denote networks in diagrams of computing and communications systems. The word cloud is used as a metaphor for the Internet, based on the standardized use of a cloud-like shape to denote a network on telephony schematics and later to depict the Internet in computer network diagrams as an abstraction of the underlying infrastructure it represents. The cloud symbol was used to represent the Internet as early as 1994.[5][6] In the 1990s, telecommunications companies, which had previously offered primarily dedicated point-to-point data circuits, began offering virtual private network (VPN) services with comparable quality of service but at a much lower cost.

By switching traffic to balance utilization as they saw fit, they were able to use their overall network bandwidth more effectively. The cloud symbol was used to denote the demarcation point between that which was the responsibility of the provider and that which was the responsibility of the users. Cloud computing extends this boundary to cover servers as well as the network infrastructure.[7] The underlying concept of cloud computing dates back to the 1950s, when large-scale mainframe computers became available in academia and corporations, accessible via thin clients and terminal computers. Because it was costly to buy a mainframe, it became important to find ways to get the greatest return on the investment in them, allowing multiple users to share both physical access to the computer from multiple terminals and the CPU time, eliminating periods of inactivity; this became known in the industry as time-sharing.[8] As computers became more prevalent, scientists and technologists explored ways to make large-scale computing power available to more users through time sharing, experimenting with algorithms to provide the optimal use of the infrastructure, platform and applications with prioritized access to the CPU and efficiency for the end users.[9] John McCarthy opined in the 1960s that "computation may someday be organized as a public utility." Almost all the modern-day characteristics of cloud computing (elastic provision, provided as a utility, online, illusion of infinite supply), the comparison to the electricity industry and the use of public, private, government, and community forms were thoroughly explored in Douglas Parkhill's 1966 book, The Challenge of the Computer Utility. Other scholars have shown that cloud computing's roots go all the way back to the 1950s, when scientist Herb Grosch (the author of Grosch's law) postulated that the entire world would operate on dumb terminals powered by about 15 large data centers.[10] Because of the expense of these powerful computers, many corporations and other entities could avail themselves of computing capability only through time sharing, and several organizations, such as GE's GEISCO, IBM subsidiary The Service Bureau Corporation, Tymshare (founded in 1966), National CSS (founded in 1967 and bought by Dun & Bradstreet in 1979), Dial Data (bought by Tymshare in 1968), and Bolt, Beranek and Newman, marketed time sharing as a commercial venture. The ubiquitous availability of high-capacity networks, low-cost computers and storage devices, as well as the widespread adoption of hardware virtualization, service-oriented architecture, and autonomic and utility computing, have led to tremendous growth in cloud computing.[11][12][13] After the dot-com bubble, Amazon played a key role in the development of cloud computing by modernizing its data centers, which, like most computer networks, were using as little as 10% of their capacity at any one time, just to leave room for occasional spikes. Having found that the new cloud architecture resulted in significant internal efficiency improvements whereby small, fast-moving "two-pizza teams" could add new features faster and more easily, Amazon initiated a new product development effort to provide cloud computing to external customers, and launched Amazon Web Services (AWS) on a utility computing basis in 2006.[14][15] In early 2008, Eucalyptus became the first open-source, AWS API-compatible platform for deploying private clouds.
In early 2008, OpenNebula, enhanced in the RESERVOIR European Commission-funded project, became the first open-source software for deploying private and hybrid clouds, and for the federation of clouds.[16] In the same year, efforts were focused on providing quality-of-service guarantees (as required by real-time interactive applications) to cloud-based infrastructures, in the framework of the IRMOS European Commission-funded project, resulting in a real-time cloud environment.[17] By mid-2008, Gartner saw an opportunity for cloud computing "to shape the relationship among consumers of IT services, those who use IT services and those who sell them"[18] and observed that "organisations are switching from company-owned hardware and software assets to per-use service-based models" so that the "projected shift to computing... will result in dramatic growth in IT products in some areas and significant reductions in other areas."[19] On March 1, 2011, IBM announced the Smarter Computing framework to support Smarter Planet.[20] Among the various components of the Smarter Computing foundation, cloud computing is a critical piece. In 2012, Dr. Biju John and Dr. Souheil Khaddaj described the cloud as a virtualized, semantic source of information: "Cloud computing is a universal collection of data which extends over the internet in the form of resources (such as information hardware, various platforms, services etc.) and forms individual units within the virtualization environment. Held together by infrastructure providers, service providers and the consumer, then it is semantically accessed by various users."

Similar systems and concepts

Cloud computing shares characteristics with:
• Autonomic computing — computer systems capable of self-management.[21]
• Client–server model — client–server computing refers broadly to any distributed application that distinguishes between service providers (servers) and service requesters (clients).[22]
• Grid computing — "a form of distributed and parallel computing, whereby a 'super and virtual computer' is composed of a cluster of networked, loosely coupled computers acting in concert to perform very large tasks."
• Mainframe computer — powerful computers used mainly by large organizations for critical applications, typically bulk data processing such as census, industry and consumer statistics, police and secret intelligence services, enterprise resource planning, and financial transaction processing.[23]
• Utility computing — the "packaging of computing resources, such as computation and storage, as a metered service similar to a traditional public utility, such as electricity."[24][25]
• Peer-to-peer — a distributed architecture without the need for central coordination, with participants being at the same time both suppliers and consumers of resources (in contrast to the traditional client–server model).
• Cloud gaming — also called on-demand gaming, a way of delivering games to computers; the gaming data is stored on the provider's servers, so that gaming is independent of the client computers used to play the game.

Characteristics

Cloud computing exhibits the following key characteristics:
• Agility improves with users' ability to re-provision technological infrastructure resources.
• Application programming interface (API) accessibility to software that enables machines to interact with cloud software in the same way the user interface facilitates interaction between humans and computers. Cloud computing systems typically use REST-based APIs (see the sketch after this list).
• Cost is claimed to be reduced, and in a public cloud delivery model capital expenditure is converted to operational expenditure.[26] This is purported to lower barriers to entry, as infrastructure is typically provided by a third party and does not need to be purchased for one-time or infrequent intensive computing tasks. Pricing on a utility computing basis is fine-grained, with usage-based options, and fewer IT skills are required for implementation (in-house).[27] The e-FISCAL project's state-of-the-art repository[28] contains several articles looking into cost aspects in more detail, most of them concluding that cost savings depend on the type of activities supported and the type of infrastructure available in-house.
• Device and location independence[29] enable users to access systems using a web browser regardless of their location or what device they are using (e.g., PC, mobile phone). As infrastructure is off-site (typically provided by a third party) and accessed via the Internet, users can connect from anywhere.[27]
• Virtualization technology allows servers and storage devices to be shared and utilization to be increased. Applications can be easily migrated from one physical server to another.
• Multitenancy enables sharing of resources and costs across a large pool of users, thus allowing for: centralization of infrastructure in locations with lower costs (such as real estate, electricity, etc.); peak-load capacity increases (users need not engineer for highest possible load levels); and utilisation and efficiency improvements for systems that are often only 10–20% utilised.[14]
• Reliability is improved if multiple redundant sites are used, which makes well-designed cloud computing suitable for business continuity and disaster recovery.[30]

• Scalability and elasticity via dynamic ("on-demand") provisioning of resources on a fine-grained, self-service basis in near real-time, without users having to engineer for peak loads.[31][32]
• Performance is monitored, and consistent, loosely coupled architectures are constructed using web services as the system interface.[27]
• Security could improve due to centralization of data, increased security-focused resources, etc., but concerns can persist about loss of control over certain sensitive data and the lack of security for stored kernels.[33] Security is often as good as or better than under other traditional systems, in part because providers are able to devote resources to solving security issues that many customers cannot afford.[34] However, the complexity of security is greatly increased when data is distributed over a wider area or a greater number of devices, and in multi-tenant systems shared by unrelated users. In addition, user access to security audit logs may be difficult or impossible. Private cloud installations are in part motivated by users' desire to retain control over the infrastructure and avoid losing control of information security.
• Maintenance of cloud computing applications is easier, because they do not need to be installed on each user's computer and can be accessed from different places.
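To illustrate the REST-style API access and elasticity characteristics mentioned in the list above, here is a small Python sketch that asks a hypothetical cloud provider's API to resize a server group; the base URL, path, token, and payload fields are assumptions for the example and do not correspond to any specific provider.

```python
import requests  # REST-style calls are the typical machine interface to cloud services

API_BASE = "https://api.cloud.example.com/v1"  # hypothetical provider endpoint
API_TOKEN = "replace-with-a-real-token"

# Ask the provider to grow a server group to five instances: the kind of programmatic
# re-provisioning behind the agility and elasticity characteristics described above.
resp = requests.post(
    f"{API_BASE}/server-groups/web-tier/resize",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json={"desired_instances": 5},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```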

On-demand self-service

On-demand self-service allows users to obtain, configure and deploy cloud services themselves using cloud service catalogues, without requiring the assistance of IT.[35][36] This feature is listed by the National Institute of Standards and Technology (NIST) as a characteristic of cloud computing.[37] The self-service requirement of cloud computing prompts infrastructure vendors to create cloud computing templates, which are obtained from cloud service catalogues. Manufacturers of such templates or blueprints include Hewlett-Packard (HP), which calls its templates HP Cloud Maps,[38] RightScale,[39] and Red Hat, which calls its templates CloudForms.[40] The templates contain predefined configurations used by consumers to set up cloud services. The templates or blueprints provide the technical information necessary to build ready-to-use clouds.[39] Each template includes specific configuration details for different cloud infrastructures, with information about servers for specific tasks such as hosting applications, databases, websites and so on.[39] The templates also include predefined Web services, the operating system, the database, security configurations and load balancing.[40] Cloud consumers use cloud templates to move applications between clouds through a self-service portal. The predefined blueprints define all that an application requires to run in different environments. For example, a template could define how the same application could be deployed in cloud platforms based on Amazon Web Services, VMware or Red Hat.[41] The user organisation benefits from cloud templates because the technical aspects of cloud configurations reside in the templates, letting users deploy cloud services with a push of a button.[42][43] Cloud templates can also be used by developers to create a catalog of cloud services.[44]
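The following sketch shows, purely as an illustration, what a self-service template or blueprint might look like as structured data and how it could be submitted to a hypothetical self-service portal; the field names, sizes, and endpoint are invented for the example and are not taken from HP Cloud Maps, RightScale, or CloudForms.

```python
import requests

# Hypothetical blueprint: the sort of predefined configuration a catalogue entry
# might carry (servers, operating system, database, load balancing).
WEB_APP_TEMPLATE = {
    "name": "three-tier-web-app",
    "servers": [
        {"role": "web", "count": 2, "os": "ubuntu-20.04", "size": "small"},
        {"role": "db", "count": 1, "os": "ubuntu-20.04", "size": "medium",
         "software": ["postgresql"]},
    ],
    "load_balancer": {"protocol": "https", "port": 443},
}


def deploy(template: dict, portal_url: str, token: str) -> str:
    """Submit the template to a hypothetical self-service portal; return a deployment id."""
    resp = requests.post(
        f"{portal_url}/deployments",
        headers={"Authorization": f"Bearer {token}"},
        json=template,
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["deployment_id"]
```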

Service models

Cloud computing providers offer their services according to three fundamental models:[4][45] infrastructure as a service (IaaS), platform as a service (PaaS), and software as a service (SaaS), where IaaS is the most basic and each higher model abstracts from the details of the lower models.

Infrastructure as a service (IaaS)

In this most basic cloud service model, cloud providers offer computers, as physical or more often as virtual machines, and other resources. The virtual machines are run as guests by a hypervisor, such as Xen or KVM. Management of pools of hypervisors by the cloud operational support system leads to the ability to scale to support a large number of virtual machines. Other resources in IaaS clouds include images in a virtual machine image library, raw (block) and file-based storage, firewalls, load balancers, IP addresses, virtual local area networks (VLANs), and software bundles.[46] IaaS cloud providers supply these resources on demand from their large pools installed in data centers. For wide area connectivity, the Internet can be used or, in carrier clouds, dedicated virtual private networks can be configured (Amies, Alex; Sluiman, Harm; Tong, Qiang Guo (July 2012). "Infrastructure as a Service Cloud Concepts" [47]. Developing and Hosting Applications on the Cloud. IBM Press. ISBN 978-0-13-306684-5).

To deploy their applications, cloud users then install operating system images on the machines as well as their application software. In this model, it is the cloud user who is responsible for patching and maintaining the operating systems and application software. Cloud providers typically bill IaaS services on a utility computing basis, that is, cost will reflect the amount of resources allocated and consumed. IaaS refers not to a machine that does all the work, but simply to a facility given to businesses that offers users the leverage of extra storage space in servers and data centers. Examples of IaaS include: Amazon CloudFormation (and underlying services such as Amazon EC2), Rackspace Cloud, Terremark and Google Compute Engine.
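Utility-style IaaS billing, paying only for the resources actually allocated and consumed, can be illustrated with a short calculation; the hourly and storage rates below are made-up figures, not any provider's actual price list.

```python
# Hypothetical rates; real providers publish their own price lists.
HOURLY_RATES = {"small": 0.05, "medium": 0.10, "large": 0.40}  # USD per instance-hour
STORAGE_RATE = 0.10                                            # USD per GB-month


def monthly_bill(instances, storage_gb):
    """Utility-style bill: pay only for the hours actually run and the storage held."""
    compute = sum(HOURLY_RATES[size] * hours for size, hours in instances)
    return round(compute + STORAGE_RATE * storage_gb, 2)


# Two small VMs run all month (~730 h each) plus one large VM for a 48-hour batch job.
print(monthly_bill([("small", 730), ("small", 730), ("large", 48)], storage_gb=200))
```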

Platform as a service (PaaS)

In the PaaS model, cloud providers deliver a computing platform typically including an operating system, programming language execution environment, database, and web server. Application developers can develop and run their software solutions on a cloud platform without the cost and complexity of buying and managing the underlying hardware and software layers. With some PaaS offerings, the underlying computer and storage resources scale automatically to match application demand, so that the cloud user does not have to allocate resources manually. Examples of PaaS include: Amazon Elastic Beanstalk, Cloud Foundry, Heroku, EngineYard, Mendix, Microsoft Azure and OrangeScape.

Software as a service (SaaS)

In this model, cloud providers install and operate application software in the cloud, and cloud users access the software from cloud clients. The cloud users do not manage the cloud infrastructure and platform on which the application is running. This eliminates the need to install and run the application on the cloud user's own computers, simplifying maintenance and support. What makes a cloud application different from other applications is its elasticity: this can be achieved by cloning tasks onto multiple virtual machines at run-time to meet changing work demand.[48] Load balancers distribute the work over the set of virtual machines. This process is inconspicuous to the cloud user, who sees only a single access point. To accommodate a large number of cloud users, cloud applications can be multitenant, that is, any machine serves more than one cloud-user organization. It is common to refer to special types of cloud-based application software with a similar naming convention: desktop as a service, business process as a service, test environment as a service, communication as a service. The pricing model for SaaS applications is typically a monthly or yearly flat fee per user,[49] so price is scalable and adjustable if users are added or removed at any point.[50] Examples of SaaS include: Google Apps, innkeypos, Quickbooks Online, Limelight Video Platform, .com and Microsoft Office 365.
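The elasticity mechanism described above, cloning the application onto more virtual machines and letting a load balancer spread requests across them, can be sketched in a few lines of Python; the instance names and round-robin policy are simplifying assumptions for illustration.

```python
import itertools


class RoundRobinBalancer:
    """Distribute incoming requests over a (growable) pool of application instances."""

    def __init__(self, instances):
        self.instances = list(instances)
        self._cycle = itertools.cycle(self.instances)

    def route(self, request_id):
        target = next(self._cycle)
        return f"request {request_id} -> {target}"

    def scale_out(self, new_instance):
        """Clone the application onto another virtual machine when demand grows."""
        self.instances.append(new_instance)
        self._cycle = itertools.cycle(self.instances)


balancer = RoundRobinBalancer(["vm-1", "vm-2"])
print(balancer.route("r1"))
print(balancer.route("r2"))
balancer.scale_out("vm-3")  # elasticity: add capacity at run time
print(balancer.route("r3"))
```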

Cloud clients

Users access cloud computing using networked client devices, such as desktop computers, laptops, tablets and smartphones. Some of these devices (cloud clients) rely on cloud computing for all or a majority of their applications, so as to be essentially useless without it. Examples are thin clients and the browser-based . Many cloud applications do not require specific software on the client and instead use a web browser to interact with the cloud application. With Ajax and HTML5, these Web user interfaces can achieve a similar or even better look and feel than native applications. Some cloud applications, however, support specific client software dedicated to these applications (e.g., virtual desktop clients and most email clients). Some legacy applications (line-of-business applications that until now have been prevalent in thin-client Windows computing) are delivered via a screen-sharing technology.

Deployment models

Public cloud

Public cloud applications, storage, and other resources are made available to the general public by a service provider. These services are free or offered on a pay-per-use model. Generally, public cloud service providers like Amazon AWS, Microsoft and Google own and operate the infrastructure and offer access only via the Internet (direct connectivity is not offered).[27]

Community cloud

Community cloud shares infrastructure between several organizations from a specific community with common concerns (security, compliance, jurisdiction, etc.), whether managed internally or by a third party and hosted internally or externally. The costs are spread over fewer users than a public cloud (but more than a private cloud), so only some of the cost-savings potential of cloud computing is realized.[4]

Hybrid cloud

Hybrid cloud is a composition of two or more clouds (private, community or public) that remain unique entities but are bound together, offering the benefits of multiple deployment models.[4] By utilizing "hybrid cloud" architecture, companies and individuals are able to obtain degrees of fault tolerance combined with locally immediate usability without dependency on internet connectivity. Hybrid cloud architecture requires both on-premises resources and off-site (remote) server-based cloud infrastructure. Hybrid clouds lack some of the flexibility, security and certainty of in-house applications,[51] but combine the flexibility of in-house applications with the fault tolerance and scalability of cloud-based services.

Private cloud

Private cloud is cloud infrastructure operated solely for a single organization, whether managed internally or by a third party and hosted internally or externally.[4] Undertaking a private cloud project requires a significant level and degree of engagement to virtualize the business environment, and it requires the organization to reevaluate decisions about existing resources. Done right, it can have a positive impact on a business, but every step in the project raises security issues that must be addressed in order to avoid serious vulnerabilities.[52] Private clouds have attracted criticism because users "still have to buy, build, and manage them" and thus do not benefit from less hands-on management,[53] essentially "[lacking] the economic model that makes cloud computing such an intriguing concept".[54][55]

Architecture

Cloud architecture,[56] the systems architecture of the software systems involved in the delivery of cloud computing, typically involves multiple cloud components communicating with each other over a loose coupling mechanism such as a messaging queue. Elastic provision implies intelligence in the use of tight or loose coupling as applied to mechanisms such as these and others.
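As a minimal sketch of the loose coupling described above, the following Python example lets a producer and a consumer component communicate only through a message queue, so neither needs to know about the other; the job names and the use of an in-process queue (rather than a distributed message broker) are simplifications for illustration.

```python
import queue
import threading

# The queue decouples the producer (front end) from the consumer (worker): each side
# only knows about the queue, the loose-coupling mechanism described above.
work_queue = queue.Queue()


def front_end():
    for job in ("render-report", "export-pdf", "refresh-cache"):
        work_queue.put(job)       # publish work without knowing who will process it
    work_queue.put("STOP")        # sentinel telling the worker to shut down


def worker():
    while True:
        job = work_queue.get()
        if job == "STOP":
            break
        print("processing", job)


threading.Thread(target=front_end).start()
consumer = threading.Thread(target=worker)
consumer.start()
consumer.join()
```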

The Intercloud

The Intercloud[57] is an interconnected global "cloud of clouds"[58][59] and an extension of the Internet "network of networks" on which it is based.[60][61][62]

Cloud engineering

Cloud engineering is the application of engineering disciplines to cloud computing. It brings a systematic approach to the high-level concerns of commercialisation, standardisation, and governance in conceiving, developing, operating and maintaining cloud computing systems. It is a multidisciplinary method encompassing contributions from diverse areas such as systems, software, web, performance, information, security, platform, risk, and quality engineering.

Issues

Privacy

The cloud model has been criticised by privacy advocates for the greater ease with which the companies hosting the cloud services can control, and thus monitor at will, lawfully or unlawfully, the communication and data stored between the user and the host company. Instances such as the secret NSA program that, working with AT&T and Verizon, recorded over 10 million phone calls between American citizens cause uncertainty among privacy advocates about the greater powers cloud computing gives to telecommunication companies to monitor user activity.[63] Using a cloud service provider (CSP) can complicate privacy of data because of the extent to which virtualization for cloud processing (virtual machines) and cloud storage are used to implement cloud services.[64] Owing to CSP operations, customer or tenant data may not remain on the same system, in the same data center, or even within the same provider's cloud, which can lead to legal concerns over jurisdiction. While there have been efforts (such as US-EU Safe Harbor) to "harmonise" the legal environment, providers such as Amazon still cater to major markets (typically the United States and the European Union) by deploying local infrastructure and allowing customers to select "availability zones."[65] Cloud computing poses privacy concerns because the service provider may access the data that is on the cloud at any point in time; it could accidentally or deliberately alter or even delete information.[66] The postage and delivery services company Pitney Bowes launched Volly, a cloud-based digital mailbox service, to leverage its communication management assets, and faced the technical challenge of providing strong data security and privacy. It addressed the concern by applying customized, application-level security, including encryption.[67]

Compliance

In order to obtain compliance with regulations such as FISMA, HIPAA, and SOX in the United States, the Data Protection Directive in the EU and the credit card industry's PCI DSS, users may have to adopt community or hybrid deployment modes that are typically more expensive and may offer restricted benefits. This is how Google is able to "manage and meet additional government policy requirements beyond FISMA"[68][69] and Rackspace Cloud or QubeSpace are able to claim PCI compliance.[70] Many providers also obtain a SAS 70 Type II audit, but this has been criticised on the grounds that the hand-picked set of goals and standards determined by the auditor and the auditee are often not disclosed and can vary widely.[71] Providers typically make this information available on request, under non-disclosure agreement.[72][73] Customers in the EU contracting with cloud providers outside the EU/EEA have to adhere to the EU regulations on export of personal data.[74] U.S. federal agencies have been directed by the Office of Management and Budget to use a process called FedRAMP (Federal Risk and Authorization Management Program) to assess and authorize cloud products and services. Federal CIO Steven VanRoekel issued a memorandum to federal agency Chief Information Officers on December 8, 2011 defining how federal agencies should use FedRAMP. FedRAMP consists of a subset of NIST Special Publication 800-53 security controls specifically selected to provide protection in cloud environments. A subset has been defined for the FIPS 199 low categorization and the FIPS 199 moderate categorization. The FedRAMP program has also established a Joint Accreditation Board (JAB) consisting of Chief Information Officers from DoD, DHS and GSA. The JAB is responsible for establishing accreditation standards for third-party organizations that will perform the assessments of cloud solutions. The JAB will also review authorization packages and may grant provisional authorization (to operate). The federal agency consuming the service still retains final responsibility and the final authority to operate.[75]

Legal

As can be expected with any revolutionary change in the landscape of global computing, certain legal issues arise, ranging from trademark infringement and security concerns to the sharing of proprietary data resources.

Open source

Open-source software has provided the foundation for many cloud computing implementations, prominent examples being the Hadoop framework[76] and VMware's Cloud Foundry.[77] In November 2007, the Free Software Foundation released the Affero General Public License, a version of GPLv3 intended to close a perceived legal loophole associated with free software designed to be run over a network.[78]

Open standards

Most cloud providers expose APIs that are typically well-documented (often under a Creative Commons license[79]) but also unique to their implementation and thus not interoperable. Some vendors have adopted others' APIs, and there are a number of open standards under development, with a view to delivering interoperability and portability.[80]

Security

As cloud computing achieves increased popularity, concerns are being voiced about the security issues introduced through adoption of this new model. The effectiveness and efficiency of traditional protection mechanisms are being reconsidered, as the characteristics of this innovative deployment model can differ widely from those of traditional architectures.[81] An alternative perspective on the topic of cloud security is that this is but another, although quite broad, case of "applied security", and that similar security principles to those that apply in shared multi-user mainframe security models apply to cloud security.[82] The relative security of cloud computing services is a contentious issue that may be delaying its adoption.[83] Physical control of private cloud equipment is more secure than having the equipment off-site and under someone else's control; physical control and the ability to visually inspect the data links and access ports are required in order to ensure data links are not compromised. Issues barring the adoption of cloud computing are due in large part to the private and public sectors' unease surrounding the external management of security-based services. It is the very nature of cloud computing-based services, private or public, that promotes external management of provided services. This delivers a great incentive to cloud computing service providers to prioritize building and maintaining strong management of secure services.[84] Security issues have been categorised into sensitive data access, data segregation, privacy, bug exploitation, recovery, accountability, malicious insiders, management console security, account control, and multi-tenancy issues. Solutions to various cloud security issues vary, from cryptography, particularly public key infrastructure (PKI), to the use of multiple cloud providers, standardisation of APIs, and improving virtual machine support and legal support.[81][85][86] Cloud computing offers many benefits, but it is also vulnerable to threats. As the use of cloud computing increases, it is highly likely that more criminals will try to find new ways to exploit vulnerabilities in the system. There are many underlying challenges and risks in cloud computing that increase the threat of data being compromised. To help mitigate the threat, cloud computing stakeholders should invest heavily in risk assessment to ensure that the system encrypts to protect data, establishes a trusted foundation to secure the platform and infrastructure, and builds higher assurance into auditing to strengthen compliance. Security concerns must be addressed in order to establish trust in cloud computing technology.

Sustainability

Although cloud computing is often assumed to be a form of "green computing", there is no published study to substantiate this assumption.[87] The siting of the servers affects the environmental effects of cloud computing: in areas where climate favors natural cooling and renewable electricity is readily available, the environmental effects will be more moderate. (The same holds true for "traditional" data centers.) Thus countries with favorable conditions, such as Finland,[88] Sweden and Switzerland,[89] are trying to attract cloud computing data centers. Energy efficiency in cloud computing can result from energy-aware scheduling and server consolidation.[90] However, in the case of clouds distributed over data centers with different sources of energy, including renewable sources, a small compromise on energy consumption reduction could result in a large reduction in carbon footprint.[91]
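Server consolidation, one of the energy-saving techniques mentioned above, can be sketched as a simple first-fit packing of workloads onto as few hosts as possible; the CPU-load figures and unit capacity are illustrative assumptions.

```python
def consolidate(loads, capacity=1.0):
    """First-fit consolidation: pack workloads onto as few servers as possible so
    idle machines can be powered down (one source of energy savings)."""
    servers = []     # summed load on each powered-on server
    placement = []   # which server each workload landed on
    for load in loads:
        for i, used in enumerate(servers):
            if used + load <= capacity:
                servers[i] += load
                placement.append(i)
                break
        else:
            servers.append(load)            # power on a new server only when necessary
            placement.append(len(servers) - 1)
    return servers, placement


# Six VMs with fractional CPU demands fit onto three hosts instead of six.
print(consolidate([0.6, 0.3, 0.5, 0.2, 0.4, 0.1]))
```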

Abuse

As with privately purchased hardware, customers can purchase the services of cloud computing for nefarious purposes. These include password cracking and launching attacks using the purchased services.[92] In 2009, a banking trojan illegally used the popular Amazon service as a command-and-control channel that issued software updates and malicious instructions to PCs that were infected by the malware.[93]

Research

Many universities, vendors and government organisations are investing in research around the topic of cloud computing:[94][95]
• In October 2007, the Academic Cloud Computing Initiative (ACCI) was announced as a multi-university project designed to enhance students' technical knowledge to address the challenges of cloud computing.[96]
• In April 2009, UC Santa Barbara released the first open-source platform-as-a-service, AppScale, which is capable of running Google App Engine applications at scale on a multitude of infrastructures.
• In April 2009, the St Andrews Cloud Computing Co-laboratory (StACC) was launched, focusing on research in the important new area of cloud computing. Unique in the UK, StACC aims to become an international centre of excellence for research and teaching in cloud computing and will provide advice and information to businesses interested in using cloud-based services.[97]
• In October 2010, the TClouds (Trustworthy Clouds) project was started, funded by the European Commission's 7th Framework Programme. The project's goal is to research and inspect the legal foundation and architectural design needed to build a resilient and trustworthy cloud-of-clouds infrastructure. The project also develops a prototype to demonstrate its results.[98]
• In December 2010, the TrustCloud research project[99][100] was started by HP Labs Singapore to address transparency and accountability of cloud computing via detective, data-centric approaches[101] encapsulated in a five-layer TrustCloud Framework. The team identified the need for monitoring data life cycles and transfers in the cloud,[99] leading to the tackling of key cloud computing security issues such as cloud data leakages, cloud accountability and cross-national data transfers in transnational clouds.
• In July 2011, the High Performance Computing Cloud (HPCCLoud) project was kicked off, aiming to explore the possibilities of enhancing performance of scientific applications running in cloud environments and to develop the HPCCLoud Performance Analysis Toolkit; it was funded by the CIM-Returning Experts Programme under the coordination of Prof. Dr. Shajulin Benedict.
• In June 2011, the Telecommunications Industry Association developed a Cloud Computing White Paper to analyze the integration challenges and opportunities between cloud services and traditional U.S. telecommunications standards.[102]
• In 2011, FEMhub launched NCLab, a free SaaS application for science, technology, engineering and mathematics (STEM). NCLab has more than 10,000 users as of July 2012.

References

[1] Monaco, Ania (7 June 2012). "A View Inside the Cloud" (http://theinstitute.ieee.org/technology-focus/technology-topic/a-view-inside-the-cloud). theinstitute.ieee.org (IEEE). Retrieved August 21, 2012.
[2] Baburajan, Rajani. "The Rising Cloud Storage Market Opportunity Strengthens Vendors" (http://it.tmcnet.com/channels/cloud-storage/articles/211183-rising-cloud-storage-market-opportunity-strengthens-vendors.htm). infoTECH, 2011-08-24. Retrieved 2011-12-02.
[3] Oestreich, Ken. "Converged Infrastructure" (http://www.thectoforum.com/content/converged-infrastructure-0). CTO Forum, 2010-11-15. Retrieved 2011-12-02.
[4] "The NIST Definition of Cloud Computing" (http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf). National Institute of Standards and Technology. Retrieved 24 July 2011.
[5] Figure 8, "A network 70 is shown schematically as a cloud", US Patent 5,485,455, column 17, line 22, filed Jan 28, 1994.
[6] Figure 1, "the cloud indicated at 49 in Fig. 1.", US Patent 5,790,548, column 5, lines 56–57, filed April 18, 1996.
[7] "July, 1993 meeting report from the IP over ATM working group of the IETF" (http://mirror.switch.ch/ftp/doc/ietf/ipatm/atm-minutes-93jul.txt). CH: Switch. Retrieved 2010-08-22.
[8] Strachey, Christopher (June 1959). "Time Sharing in Large Fast Computers". Proceedings of the International Conference on Information Processing, UNESCO. paper B.2.19: 336–341.
[9] Corbató, Fernando J. "An Experimental Time-Sharing System" (http://larch-www.lcs.mit.edu:8001/~corbato/sjcc62/). SJCC Proceedings. MIT. Retrieved 3 July 2012.
[10] Ryan; Falvey; Merchant (October 2011). "Regulation of the Cloud in India" (http://ssrn.com/abstract=1941494). Journal of Internet Law 15 (4).
[11] "Cloud Computing: Clash of the clouds" (http://www.economist.com/displaystory.cfm?story_id=14637206). The Economist. 2009-10-15. Retrieved 2009-11-03.
[12] "Gartner Says Cloud Computing Will Be As Influential As E-business" (http://www.gartner.com/it/page.jsp?id=707508). Gartner. Retrieved 2010-08-22.
[13] Gruman, Galen (2008-04-07). "What cloud computing really means" (http://www.infoworld.com/d/cloud-computing/what-cloud-computing-really-means-031). InfoWorld. Retrieved 2009-06-02.
[14] "Jeff Bezos' Risky Bet" (http://www.businessweek.com/magazine/content/06_46/b4009001.htm). Business Week.
[15] "Amazon's early efforts at cloud computing partly accidental" (http://itknowledgeexchange.techtarget.com/cloud-computing/2010/06/17/amazons-early-efforts-at-cloud-computing-partly-accidental/). IT Knowledge Exchange, Tech Target, 2010-06-17.
[16] B Rochwerger, J Caceres, RS Montero, D Breitgand, E Elmroth, A Galis, E Levy, IM Llorente, K Nagin, Y Wolfsthal, E Elmroth, J Caceres, M Ben-Yehuda, W Emmerich, F Galan. "The RESERVOIR Model and Architecture for Open Federated Cloud Computing", IBM Journal of Research and Development, Vol. 53, No. 4 (2009).
[17] D Kyriazis, A Menychtas, G Kousiouris, K Oberle, T Voith, M Boniface, E Oliveros, T Cucinotta, S Berger. "A Real-time Service Oriented Infrastructure", International Conference on Real-Time and Embedded Systems (RTES 2010), Singapore, November 2010.
[18] Keep an eye on cloud computing (http://www.networkworld.com/newsletters/itlead/2008/070708itlead1.html), Amy Schurr, Network World, 2008-07-08, citing the Gartner report "Cloud Computing Confusion Leads to Opportunity". Retrieved 2009-09-11.
[19] Gartner Says Worldwide IT Spending On Pace to Surpass $3.4 Trillion in 2008 (http://www.gartner.com/it/page.jsp?id=742913), Gartner, 2008-08-18. Retrieved 2009-09-11.
[20] "Launch of IBM Smarter Computing" (https://www-304.ibm.com/connections/blogs/IBMSmarterSystems/date/201102?lang=en_us). Retrieved 1 March 2011.
[21] "What's In A Name? Utility vs. Cloud vs Grid" (http://www.datacenterknowledge.com/archives/2008/Mar/25/whats_in_a_name_utility_vs_cloud_vs_grid.html). Datacenterknowledge.com. Retrieved 2010-08-22.
[22] "Distributed Application Architecture" (http://java.sun.com/developer/Books/jdbc/ch07.pdf). Sun Microsystems. Retrieved 2009-06-16.
[23] "Sun CTO: Cloud computing is like the mainframe" (http://itknowledgeexchange.techtarget.com/mainframe-blog/sun-cto-cloud-computing-is-like-the-mainframe/). Itknowledgeexchange.techtarget.com. 2009-03-11. Retrieved 2010-08-22.
[24] "It's probable that you've misunderstood 'Cloud Computing' until now" (http://portal.acm.org/citation.cfm?id=1496091.1496100&coll=&dl=ACM&CFID=21518680&CFTOKEN=18800807). TechPluto. Retrieved 2010-09-14.
[25] Danielson, Krissi (2008-03-26). "Distinguishing Cloud Computing from Utility Computing" (http://www.ebizq.net/blogs/saasweek/2008/03/distinguishing_cloud_computing/). Ebizq.net. Retrieved 2010-08-22.
[26] "Recession Is Good For Cloud Computing – Microsoft Agrees" (http://www.cloudave.com/link/recession-is-good-for-cloud-computing-microsoft-agrees). CloudAve. Retrieved 2010-08-22.
[27] "Defining "Cloud Services" and "Cloud Computing"" (http://blogs.idc.com/ie/?p=190). IDC. 2008-09-23. Retrieved 2010-08-22.
[28] "e-FISCAL project state of the art repository" (http://www.efiscal.eu/state-of-the-art).
[29] Farber, Dan (2008-06-25). "The new geek chic: Data centers" (http://news.cnet.com/8301-13953_3-9977049-80.html). CNET News. Retrieved 2010-08-22.
[30] King, Rachael (2008-08-04). "Cloud Computing: Small Companies Take Flight" (http://www.businessweek.com/technology/content/aug2008/tc2008083_619516.htm). Businessweek. Retrieved 2010-08-22.
[31] "Defining and Measuring Cloud Elasticity" (http://digbib.ubka.uni-karlsruhe.de/volltexte/1000023476). KIT Software Quality Department. Retrieved 13 August 2011.
[32] "Economies of Cloud Scale Infrastructure" (http://www.youtube.com/watch?v=nfDsY3f4nVI). Cloud Slam 2011. Retrieved 13 May 2011.
[33] "Encrypted Storage and Key Management for the cloud" (http://www.cryptoclarity.com/CryptoClarityLLC/Welcome/Entries/2009/7/23_Encrypted_Storage_and_Key_Management_for_the_cloud.html). Cryptoclarity.com. 2009-07-30. Retrieved 2010-08-22.
[34] Mills, Elinor (2009-01-27). "Cloud computing security forecast: Clear skies" (http://news.cnet.com/8301-1009_3-10150569-83.html). CNET News. Retrieved 2010-08-22.
[35] Perera, David. "The real obstacle to federal cloud computing" (http://www.fiercegovernmentit.com/story/real-obstacle-federal-cloud-computing/2012-07-12). Fierce Government IT, July 12, 2012.
[36] MSV, Janakiram. "Top 10 reasons why startups should consider cloud" (http://cloudstory.in/2012/07/top-10-reasons-why-startups-should-consider-cloud/). Cloud Story, July 20, 2012.
[37] Mell, Peter; Grance, Timothy. "The NIST definition of cloud computing" (http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf). Special Publication 800-145, National Institute of Standards and Technology.
[38] Waters, John K. "HP's turn-key private cloud" (http://adtmag.com/blogs/watersworks/2010/08/hps-turn-key-private-cloud.aspx). Application Development Trends, August 30, 2010.
[39] Babcock, Charles. "RightScale launches app store for infrastructure" (http://www.informationweek.com/news/cloud-computing/infrastructure/229900165). InformationWeek, June 3, 2011.
[40] Jackson, Joab. "Red Hat launches hybrid cloud management software" (http://www.techworld.com.au/article/426861/red_hat_launches_hybrid_cloud_management_software). Computerworld Techworld, June 6, 2012.
[41] Brown, Rodney. "Spinning up the instant cloud" (www.cloudecosystem.com/author.asp?section_id=1873&doc_id=242031). CloudEcosystem, April 10, 2012.
[42] Riglian, Adam. "VIP Art Fair picks OpDemand over RightScale for IaaS management" (http://searchcloudapplications.techtarget.com/news/2240111861/VIP-Art-Fair-picks-OpDemand-over-RightScale-for-IaaS-management). SearchCloudApplications.com, December 1, 2011.
[43] Samson, Ted. "HP advances public cloud as part of ambitious hybrid cloud strategy" (http://www.infoworld.com/t/hybrid-cloud/hp-advances-public-cloud-part-of-ambitious-hybrid-cloud-strategy-190524). InfoWorld, April 10, 2012.
[44] "HP Cloud Maps can ease application automation" (http://www.siliconindia.com/shownews/HP_Cloud_Maps_can_Ease_Application_Automation-nid-103866-cid-7.html). SiliconIndia, January 24, 2012.
[45] Voorsluys, William; Broberg, James; Buyya, Rajkumar (February 2011). "Introduction to Cloud Computing" (http://media.johnwiley.com.au/product_data/excerpt/90/04708879/0470887990-180.pdf). In R. Buyya, J. Broberg, A. Goscinski. Cloud Computing: Principles and Paradigms. New York, USA: Wiley Press. pp. 1–44. ISBN 978-0-470-88799-8.
[46] Amies, Alex; Sluiman, Harm; Tong, Qiang Guo; Liu, Guo Ning (July 2012). "Infrastructure as a Service Cloud Concepts" (http://www.ibmpressbooks.com/bookstore/product.asp?isbn=9780133066845). Developing and Hosting Applications on the Cloud. IBM Press. ISBN 978-0-13-306684-5.
[47] http://www.ibmpressbooks.com/bookstore/product.asp?isbn=9780133066845
[48] Hamdaqa, Mohammad. A Reference Model for Developing Cloud Applications (http://www.stargroup.uwaterloo.ca/~mhamdaqa/publications/A REFERENCEMODELFORDEVELOPINGCLOUD APPLICATIONS.pdf).
[49] Chou, Timothy. Introduction to Cloud Computing: Business & Technology (http://www.scribd.com/doc/64699897/Introduction-to-Cloud-Computing-Business-and-Technology).
[50] "HVD: the cloud's silver lining" (http://www.intrinsictechnology.co.uk/FileUploads/HVD_Whitepaper.pdf). Intrinsic Technology. Retrieved 30 August 2012.
[51] Stevens, Alan (June 29, 2011). "When hybrid clouds are a mixed blessing" (http://www.theregister.co.uk/2011/06/29/hybrid_cloud/). The Register. Retrieved March 28, 2012.
[52] "Is a Private Cloud Really More Secure?" (http://content.dell.com/us/en/enterprise/d/large-business/private-cloud-more-secure.aspx). Dell.com. Retrieved 07-11-12.
[53] Foley, John. "Private Clouds Take Shape" (http://www.informationweek.com/news/services/business/showArticle.jhtml?articleID=209904474). InformationWeek. Retrieved 2010-08-22.
[54] Haff, Gordon (2009-01-27). "Just don't call them private clouds" (http://news.cnet.com/8301-13556_3-10150841-61.html). CNET News. Retrieved 2010-08-22.
[55] "There's No Such Thing As A Private Cloud" (http://www.informationweek.com/cloud-computing/blog/archives/2009/01/theres_no_such.html). InformationWeek. 2010-06-30. Retrieved 2010-08-22.
[56] "Building GrepTheWeb in the Cloud, Part 1: Cloud Architectures" (http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1632&categoryID=100). Developer.amazonwebservices.com. Retrieved 2010-08-22.
[57] Bernstein, David; Ludvigson, Erik; Sankar, Krishna; Diamond, Steve; Morrow, Monique (2009-05-24). Blueprint for the Intercloud – Protocols and Formats for Cloud Computing Interoperability (http://www2.computer.org/portal/web/csdl/doi/10.1109/ICIW.2009.55). IEEE Computer Society. pp. 328–336. doi:10.1109/ICIW.2009.55.
[58] "Kevin Kelly: A Cloudbook for the Cloud" (http://www.kk.org/thetechnium/archives/2007/11/a_cloudbook_for.php). Kk.org. Retrieved 2010-08-22.
[59] "Intercloud is a global cloud of clouds" (http://samj.net/2009/06/intercloud-is-global-cloud-of-clouds.html). Samj.net. 2009-06-22. Retrieved 2010-08-22.
[60] "Vint Cerf: Despite Its Age, The Internet is Still Filled with Problems" (http://www.readwriteweb.com/archives/vint_cerf_despite_its_age_the.php?mtcCampaign=2765). Readwriteweb.com. Retrieved 2010-08-22.
[61] "SP360: Service Provider: From India to Intercloud" (http://blogs.cisco.com/sp/comments/from_india_to_intercloud/). Blogs.cisco.com. Retrieved 2010-08-22.
[62] Canada (2007-11-29). "Head in the clouds? Welcome to the future" (http://www.theglobeandmail.com/servlet/story/LAC.20071129.TWLINKS29/TPStory/Business). Toronto: Theglobeandmail.com. Retrieved 2010-08-22.
[63] Cauley, Leslie (2006-05-11). "NSA has massive database of Americans' phone calls" (http://www.usatoday.com/news/washington/2006-05-10-nsa_x.htm). USATODAY.com. Retrieved 2010-08-22.
[64] Winkler, Vic (2011). Securing the Cloud: Cloud Computer Security Techniques and Tactics (http://www.elsevier.com/wps/find/bookdescription.cws_home/723529/description#description). Waltham, MA, USA: Elsevier. p. 60. ISBN 978-1-59749-592-9.
[65] "Feature Guide: Amazon EC2 Availability Zones" (http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1347&categoryID=112). Amazon Web Services. Retrieved 2010-08-22.
[66] "Cloud Computing Privacy Concerns on Our Doorstep" (http://cacm.acm.org/magazines/2011/1/103200-cloud-computing-privacy-concerns-on-our-doorstep/fulltext).
[67] "Cloud Enables New Business Opportunity" (http://smartenterpriseexchange.com/community/execution/blog/2011/06/13/cloud-enables-new-business-opportunity/fulltext).
[68] "FISMA compliance for federal cloud computing on the horizon in 2010" (http://searchcompliance.techtarget.com/news/article/0,289142,sid195_gci1377298,00.html). SearchCompliance.com. Retrieved 2010-08-22.
[69] "Google Apps and Government" (http://googleenterprise.blogspot.com/2009/09/google-apps-and-government.html). Official Google Enterprise Blog. 2009-09-15. Retrieved 2010-08-22.
[70] "Cloud Hosting is Secure for Take-off: Mosso Enables The Spreadsheet Store, an Online Merchant, to become PCI Compliant" (http://www.rackspace.com/cloud/blog/2009/03/05/cloud-hosting-is-secure-for-take-off-mosso-enables-the-spreadsheet-store-an-online-merchant-to-become-pci-compliant/). Rackspace. 2009-03-14. Retrieved 2010-08-22.
[71] "Amazon gets SAS 70 Type II audit stamp, but analysts not satisfied" (http://searchcloudcomputing.techtarget.com/news/article/0,289142,sid201_gci1374629,00.html). SearchCloudComputing.com. 2009-11-17. Retrieved 2010-08-22.
[72] "Assessing Cloud Computing Agreements and Controls" (http://wistechnology.com/articles/6954/). WTN News. Retrieved 2010-08-22.
[73] "Cloud Certification From Compliance Mandate to Competitive Differentiator" (http://www.youtube.com/watch?v=wYiFdnZAlNQ). Cloudcor. Retrieved 2011-09-20.
[74] "How the New EU Rules on Data Export Affect Companies in and Outside the EU | Dr. Thomas Helbing – Kanzlei für Datenschutz-, Online- und IT-Recht" (http://www.thomashelbing.com/en/how-new-eu-rules-data-export-affect-companies-and-outside-eu). Dr. Thomas Helbing. Retrieved 2010-08-22.
[75] "FedRAMP" (http://www.gsa.gov/portal/category/102371). U.S. General Services Administration. 2012-06-13. Retrieved 2012-06-17.
[76] "Open source fuels growth of cloud computing, software-as-a-service" (http://www.networkworld.com/news/2008/072808-open-source-cloud-computing.html). Network World. Retrieved 2010-08-22.
[77] "VMware Launches Open Source PaaS Cloud Foundry" (http://www.cmswire.com/cms/information-management/vmware-launches-open-source-paas-cloud-foundry-010941.php). Simpler Media Group, Inc. 2011-04-21. Retrieved 2012-08-01.
[78] "AGPL: Open Source Licensing in a Networked Age" (http://redmonk.com/sogrady/2009/04/15/open-source-licensing-in-a-networked-age/). Redmonk.com. 2009-04-15. Retrieved 2010-08-22.
[79] GoGrid Moves API Specification to Creative Commons (http://www.gogrid.com/company/press-releases/gogrid-moves--specification-to-creativecommons.php).
[80] "Eucalyptus Completes Amazon Web Services Specs with Latest Release" (http://ostatic.com/blog/eucalyptus-completes-amazon-web-services-specs-with-latest-release). Ostatic.com. Retrieved 2010-08-22.
[81] Zissis, Dimitrios; Lekkas (2010). "Addressing cloud computing security issues" (http://www.sciencedirect.com/science/article/pii/S0167739X10002554). Future Generation Computer Systems. doi:10.1016/j.future.2010.12.006.
[82] Winkler, Vic (2011). Securing the Cloud: Cloud Computer Security Techniques and Tactics (http://www.elsevier.com/wps/find/bookdescription.cws_home/723529/description#description). Waltham, MA, USA: Syngress. pp. 187, 189. ISBN 978-1-59749-592-9.
[83] "Are security issues delaying adoption of cloud computing?" (http://www.networkworld.com/news/2009/042709-burning-security-cloud-computing.html). Network World. Retrieved 2010-08-22.
[84] "Security of virtualization, cloud computing divides IT and security pros" (http://www.networkworld.com/news/2010/022210-virtualization-cloud-security-debate.html). Network World. 2010-02-22. Retrieved 2010-08-22.
[85] Armbrust, M.; Fox, A.; Griffith, R.; Joseph, A.; Katz, R.; Konwinski, A.; Lee, G.; Patterson, D.; Rabkin, A.; Zaharia, M. (2010). "A view of cloud computing". Communications of the ACM 53 (4): 50–58. doi:10.1145/1721654.1721672.
[86] Anthes, G. "Security in the cloud". Communications of the ACM 53 (11). doi:10.1145/1839676.1839683.
[87] James Urquhart (January 7, 2010). "Cloud computing's green paradox" (http://news.cnet.com/8301-19413_3-10428065-240.html). CNET News. Retrieved March 12, 2010. "...there is some significant evidence that the cloud is encouraging more compute consumption"
[88] Finland – First Choice for Siting Your Cloud Computing Data Center (http://www.fincloud.freehostingcloud.com/). Retrieved 4 August 2010.
[89] Swiss Carbon-Neutral Servers Hit the Cloud (http://www.greenbiz.com/news/2010/06/30/swiss-carbon-neutral-servers-hit-cloud). Retrieved 4 August 2010.
[90] Berl, Andreas, et al. Energy-Efficient Cloud Computing (http://comjnl.oxfordjournals.org/content/53/7/1045.short). The Computer Journal, 2010.
[91] Farrahi Moghaddam, Fereydoun, et al. Low Carbon Virtual Private Clouds (http://ieeexplore.ieee.org/search/srchabstract.jsp?tp=&arnumber=6008718). IEEE Cloud 2011.
[92] Alpeyev, Pavel (2011-05-14). "Amazon.com Server Said to Have Been Used in Sony Attack" (http://www.bloomberg.com/news/2011-05-13/sony-network-said-to-have-been-invaded-by-hackers-using-amazon-com-server.html). Bloomberg. Retrieved 2011-08-20.
[93] Goodin, Dan (2011-05-14). "PlayStation Network hack launched from Amazon EC2" (http://www.theregister.co.uk/2011/05/14/playstation_network_attack_from_amazon/). The Register. Retrieved 2012-05-18.
[94] "Cloud Net Directory" (http://www.cloudbook.net/directories/research-clouds). Cloudbook.net. Retrieved 2010-08-22.
[95] "National Science Foundation Awards Millions to Fourteen Universities for Cloud Computing Research" (http://www.nsf.gov/news/news_summ.jsp?cntn_id=114686). NSF News, Nsf.gov. Retrieved 2011-08-20.
[96] Rich Miller (2008-05-02). "IBM, Google Team on an Enterprise Cloud" (http://www.datacenterknowledge.com/archives/2008/05/02/ibm-google-team-on-an-enterprise-cloud/). DataCenterKnowledge.com. Retrieved 2010-08-22.
[97] "StACC – Collaborative Research in Cloud Computing" (http://www.cs.st-andrews.ac.uk/stacc). University of St Andrews Department of Computer Science. Retrieved 2012-06-17.
[98] "Trustworthy Clouds: Privacy and Resilience for Internet-scale Critical Infrastructure" (http://www.tclouds-project.eu). Retrieved 2012-06-17.
[99] Ko, Ryan K. L.; Jagadpramana, Peter; Lee, Bu Sung (2011). "Flogger: A File-centric Logger for Monitoring File Access and Transfers within Cloud Computing Environments" (http://www.hpl.hp.com/techreports/2011/HPL-2011-119.pdf). Proceedings of the 10th IEEE International Conference on Trust, Security and Privacy of Computing and Communications (TrustCom-11).
[100] Ko, Ryan K. L.; Jagadpramana, Peter; Mowbray, Miranda; Pearson, Siani; Kirchberg, Markus; Liang, Qianhui; Lee, Bu Sung (2011). "TrustCloud: A Framework for Accountability and Trust in Cloud Computing" (http://www.hpl.hp.com/techreports/2011/HPL-2011-38.pdf). Proceedings of the 2nd IEEE Cloud Forum for Practitioners (IEEE ICFP 2011), Washington DC, USA, July 7–8, 2011.
[101] Ko, Ryan K. L.; Kirchberg, Markus; Lee, Bu Sung (2011). "From System-Centric Logging to Data-Centric Logging – Accountability, Trust and Security in Cloud Computing" (http://www.hpl.hp.com/people/ryan_ko/RKo-DSR2011-Data_Centric_Logging.pdf). Proceedings of the 1st Defence, Science and Research Conference 2011 – Symposium on Cyber Terrorism, IEEE Computer Society, 3–4 August 2011, Singapore.
[102] "Publication Download" (http://www.tiaonline.org/market_intelligence/publication_download.cfm?file=TIA_Cloud_Computing_White_Paper). Tiaonline.org. Retrieved 2011-12-02.

External links

• Cloud Computing Portal and Scholarly Resources on Academic Room (http://www.academicroom.com/physical-sciences/information-technology/cloud-computing)
• The NIST Definition of Cloud Computing (http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf). Peter Mell and Timothy Grance, NIST Special Publication 800-145 (September 2011). National Institute of Standards and Technology, U.S. Department of Commerce.
• Guidelines on Security and Privacy in Public Cloud Computing (http://nvlpubs.nist.gov/nistpubs/sp/2011/sp800-144.pdf). Wayne Jansen and Timothy Grance, NIST Special Publication 800-144 (December 2011). National Institute of Standards and Technology, U.S. Department of Commerce.
• Cloud Computing – Benefits, risks and recommendations for information security (http://www.enisa.europa.eu/activities/risk-management/files/deliverables/cloud-computing-risk-assessment). Daniele Catteddu and Giles Hogben, European Network and Information Security Agency, 2009.

Multi-touch

In computing, multi-touch refers to a touch-sensing surface's (trackpad or touchscreen) ability to recognize the presence of two or more points of contact with the surface. This plural-point awareness is often used to implement advanced functionality such as pinch-to-zoom or activating predefined programs.

For disambiguation or marketing classification, some companies further break down the definitions of multi-touch. For example, 3M defines multi-touch as a touch screen's ability to register three or more distinct positions.[1]

History of multi-touch
The use of touchscreen technology to control electronic devices pre-dates both multi-touch technology and the personal computer. Early synthesizer and electronic instrument builders such as Hugh Le Caine and Bob Moog experimented with touch-sensitive capacitance sensors to control the sounds made by their instruments.[2] IBM began building the first touch screens in the late 1960s, and in 1972 Control Data released the PLATO IV computer, a terminal used for educational purposes that employed single-touch points in a 16x16 array as its user interface. One of the early implementations of mutual-capacitance touchscreen technology was developed at CERN in 1977,[4][5] based on the capacitance touch screens developed from 1972 by Danish electronics engineer Bent Stumpe. This technology was used to develop a new type of human-machine interface (HMI) for the control room of the Super Proton Synchrotron particle accelerator.

In a handwritten note dated 11 March 1972, Stumpe presented his proposed solution – a capacitive touch screen with a fixed number of programmable buttons presented on a display.[3] The screen was to consist of a set of capacitors etched into a film of copper on a sheet of glass, each capacitor being constructed so that a nearby flat conductor, such as the surface of a finger, would increase the capacitance by a significant amount. The capacitors were to consist of fine lines etched in copper on a sheet of glass – fine enough (80 μm) and sufficiently far apart (80 μm) to be invisible (CERN Courier, April 1974, p. 117). In the final device, a simple lacquer coating prevented the fingers from actually touching the capacitors. (Figure: prototypes of the x-y mutual capacitance multi-touch screens developed at CERN.)
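As a rough illustration of how such an x-y capacitance grid can report several simultaneous contacts, the sketch below scans one frame of capacitance changes and keeps the cells that exceed a threshold and are local maxima. The grid values, threshold and function name are illustrative assumptions; real controllers also filter noise and track points across frames.

    # Minimal sketch: multi-touch detection on an x-y capacitance grid.

    def find_touches(grid, threshold=50):
        """Return (row, col) cells whose capacitance change exceeds the
        threshold and is a local maximum, i.e. roughly one point per finger."""
        touches = []
        rows, cols = len(grid), len(grid[0])
        for r in range(rows):
            for c in range(cols):
                v = grid[r][c]
                if v < threshold:
                    continue
                neighbours = [grid[i][j]
                              for i in range(max(0, r - 1), min(rows, r + 2))
                              for j in range(max(0, c - 1), min(cols, c + 2))
                              if (i, j) != (r, c)]
                if all(v >= n for n in neighbours):
                    touches.append((r, c))
        return touches

    frame = [[0, 0, 0, 0],
             [0, 80, 10, 0],
             [0, 5, 0, 70],
             [0, 0, 0, 0]]
    print(find_touches(frame))  # two separate contact points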

Multi-touch technology began in 1982, when the University of Toronto's Input Research Group developed the first human-input multi-touch system. The system used a frosted-glass panel with a camera placed behind the glass. When one or more fingers pressed on the glass, the camera detected the action as one or more black spots on an otherwise white background, allowing it to be registered as an input. Since the size of a dot depended on pressure (how hard the person was pressing on the glass), the system was somewhat pressure-sensitive as well.[2] In 1983, Bell Labs at Murray Hill published a comprehensive discussion of touch-screen based interfaces.[6] In 1984, Bell Labs engineered a touch screen that could change images with more than one hand. In 1985, the University of Toronto group, including Bill Buxton, developed a multi-touch tablet that used capacitance rather than bulky camera-based optical sensing systems.[2] A breakthrough occurred in 1991, when Pierre Wellner published a paper on his multi-touch "Digital Desk", which supported multi-finger and pinching motions.[7][8]

Various companies expanded upon these inventions at the beginning of the twenty-first century. Fingerworks developed various multi-touch technologies between 1999 and 2005, including Touchstream keyboards and the iGesture Pad; several studies of this technology were published in the early 2000s by Alan Hedge, professor of human factors and ergonomics at Cornell University.[9][10][11] Apple acquired Fingerworks and its multi-touch technology in 2005. Mainstream exposure to multi-touch technology came in 2007 with the popularity of the iPhone, and Apple stated that it 'invented multi touch' as part of the iPhone announcement.[12] Both the function and the term, however, predate the announcement and the patent requests, except in the area of capacitive mobile screens, which did not exist before Fingerworks/Apple's technology (Fingerworks filed patents in 2001–2005;[13] subsequent multi-touch refinements were patented by Apple[14]). Apple was the first to introduce multi-touch on a mobile device.[15]

Microsoft's table-top touch platform, Microsoft PixelSense, which started development in 2001, interacts with both the user's touch and their electronic devices. Similarly, in 2001, Mitsubishi Electric Research Laboratories (MERL) began development of a multi-touch, multi-user system called DiamondTouch, also based on capacitance but able to differentiate between multiple simultaneous users (or rather, the chairs in which they are seated or the floor pads on which they are standing); DiamondTouch became a commercial product in 2008. Small-scale touch devices are rapidly becoming commonplace, with the number of touch-screen telephones expected to increase from 200,000 shipped in 2006 to 21 million in 2012.[16]

Some of the first devices to support multi-touch were:
• Mitsubishi DiamondTouch (2001)
• Apple iPhone (January 9, 2007)
• Microsoft PixelSense (May 29, 2007)
• NORTD labs' open source system CUBIT (2007)
• Elan eFinger

Brands and manufacturers

Apple has retailed and distributed numerous products using multi-touch technology, most prominently its iPhone smartphone and iPad tablet. Apple also holds several patents related to the implementation of multi-touch in user interfaces.[17] It attempted to register "Multi-touch" as a trademark in the United States, but the request was denied by the United States Patent and Trademark Office because it considered the term generic.[18]

Multi-touch sensing and processing occur via an ASIC sensor that is attached to the touch surface. Usually, separate companies make the ASIC and the screen that together form a touch screen; conversely, a trackpad's surface and ASIC are usually manufactured by the same company. In recent years, large companies have expanded into the growing multi-touch industry, with systems designed for everything from the casual user to multinational organizations. (Figure: a virtual keyboard on an iPad.)

It is now common for laptop manufacturers to include multi-touch trackpads on their laptops, and tablet computers respond to touch input rather than traditional stylus input; multi-touch is supported by many recent operating systems. A few companies focus on large-scale surface computing rather than personal electronics, building either large multi-touch tables or wall surfaces. These systems are generally used by government organizations, museums, and companies as a means of information or exhibit display.

Implementations
Multi-touch has been implemented in several different ways, depending on the size and type of interface. The most popular forms are mobile devices, tablets, touch tables and walls. Both touch tables and touch walls project an image through acrylic or glass, and then back-light the image with LEDs.

Types[19]
• Capacitive technologies
  • Surface Capacitive Technology or Near Field Imaging (NFI)
  • Projected Capacitive Touch (PST)
    • Mutual capacitance
    • Self-capacitance
  • In-cell: Capacitive
• Resistive technologies
  • Analog Resistive
  • Digital Resistive or In-Cell: Resistive
• Optical technologies
  • Optical Imaging or Infrared technology
  • Rear Diffused Illumination (DI)[20]
  • Infrared Grid Technology (opto-matrix) or Digital Waveguide Touch (DWT)™ or Infrared Optical Waveguide
  • Frustrated Total Internal Reflection (FTIR)
  • Diffused Surface Illumination (DSI)
  • Laser Light Plane (LLP)
  • In-Cell: Optical
• Wave technologies
  • Surface Acoustic Wave (SAW)
  • Bending Wave Touch (BWT)
  • Dispersive Signal Touch (DST)
  • Acoustic Pulse Recognition (APR)
• Force-Sensing Touch Technology

Optical touch technology functions when a finger or an object touches the surface, causing light to scatter; the reflection is caught by sensors or cameras that send the data to software, which dictates the response to the touch depending on the type of reflection measured. Touch surfaces can also be made pressure-sensitive by the addition of a pressure-sensitive coating that flexes differently depending on how firmly it is pressed, altering the reflection.[21] Handheld technologies use a panel that carries an electrical charge. When a finger touches the screen, the touch disrupts the panel's electrical field; the disruption is registered and sent to the software, which then initiates a response to the gesture.[22] In the past few years, several companies have released products that use multi-touch. In an attempt to make the expensive technology more accessible, hobbyists have also published methods of constructing DIY touchscreens.[23]
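The optical approach described above can be illustrated with a short sketch: bright blobs in a camera frame are grouped by flood fill, their centroids become touch points, and blob size serves as a rough stand-in for pressure. The frame data, threshold and function name are assumptions for illustration, not a specific product's processing pipeline.

    # Minimal sketch of camera-based (optical) touch detection: group bright
    # pixels into connected blobs and report centroid and size per blob.

    def detect_blobs(frame, threshold=128):
        """Return a list of (centroid, size) for each bright blob in the frame."""
        rows, cols = len(frame), len(frame[0])
        seen = [[False] * cols for _ in range(rows)]
        blobs = []
        for r in range(rows):
            for c in range(cols):
                if frame[r][c] < threshold or seen[r][c]:
                    continue
                stack, pixels = [(r, c)], []
                seen[r][c] = True
                while stack:  # flood-fill one connected blob
                    y, x = stack.pop()
                    pixels.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and not seen[ny][nx] and frame[ny][nx] >= threshold):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                cy = sum(p[0] for p in pixels) / len(pixels)
                cx = sum(p[1] for p in pixels) / len(pixels)
                blobs.append(((cy, cx), len(pixels)))  # size ~ contact area / pressure
        return blobs

    frame = [[0, 200, 0, 0],
             [0, 180, 0, 0],
             [0, 0, 0, 210],
             [0, 0, 0, 0]]
    print(detect_blobs(frame))  # a firmer press (size 2) and a light tap (size 1)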

Multi-touch gestures
Multi-touch gestures are standardized motions used to interact with multi-touch devices. They are tightly integrated into many Apple products and into laptop and desktop computing systems, and they are part of most modern smartphones and multi-touch tablets, including the iPhone, iPad, Android phones and tablets, and some BlackBerry devices. Common gestures are listed below; a sketch of how a pinch or spread gesture can be recognised follows the list.

Common multi-touch gestures:
• Tap
• Double tap
• Long press
• Scroll
• Pan
• Flick
• Two-finger tap
• Two-finger scroll
• Pinch
• Spread
• Rotate
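As a rough illustration of how a toolkit might distinguish the pinch and spread gestures above, the sketch below compares the distance between two tracked touch points at the start and end of a movement. The coordinates, tolerance and function name are illustrative assumptions; real gesture recognizers work on streams of timestamped touch events and apply hysteresis before reporting a gesture.

    # Minimal sketch of pinch/spread classification from two touch points.
    import math

    def classify_two_finger_gesture(start, end, tolerance=0.1):
        """Compare finger spacing at the start and end of a two-finger move."""
        d0 = math.dist(start[0], start[1])
        d1 = math.dist(end[0], end[1])
        if d1 > d0 * (1 + tolerance):
            return "spread", d1 / d0   # zoom-in factor
        if d1 < d0 * (1 - tolerance):
            return "pinch", d1 / d0    # zoom-out factor
        return "none", 1.0

    # Two fingers moving apart are reported as a spread (zoom in).
    print(classify_two_finger_gesture(((100, 100), (140, 100)),
                                      ((80, 100), (180, 100))))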

Popular culture references
Popular culture has also portrayed potential uses of multi-touch technology in the future, including several installments of the Star Trek franchise. The television series CSI: Miami introduced both surface and wall multi-touch displays in its sixth season. Another television series, NCIS: Los Angeles, makes use of multi-touch surfaces and wall panels as part of an initiative to go digital. Another form of multi-touch computer was seen in the film The Island, where the professor, played by Sean Bean, has a multi-touch desktop to organize files, based on an early version of Microsoft PixelSense (http://www.istartedsomething.com/20070611/microsoft-surface-the-island/). Multi-touch technology can also be seen in the James Bond film Quantum of Solace, where MI6 uses a touch interface to browse information about the criminal Dominic Greene.[24] In an episode of the television series The Simpsons, Lisa Simpson travels to the underwater headquarters of Apple to visit Steve Jobs, who is shown performing multiple multi-touch hand gestures on a large touch wall. A device similar to the Microsoft PixelSense was seen in the 1982 Disney sci-fi film Tron; it took up an executive's entire desk and was used to communicate with the Master Control computer. The interface used to control the alien ship in the 2009 film District 9 features similar technology.[25] Microsoft's PixelSense was also used in the 2008 film The Day the Earth Stood Still.[26] In the 2002 film Minority Report, Tom Cruise uses a set of gloves that resemble a multi-touch interface to browse through information.[27]

References

[1] "What is Multitouch" (http:/ / solutions. 3m. com/ wps/ portal/ 3M/ en_US/ TouchTopics/ Home/ Terminology/ WhatIsMultitouch/ ). . Retrieved 30 May 2010.

[2] Buxton, Bill. "Multitouch Overview" (http:/ / www. billbuxton. com/ multitouchOverview. html)

[3] The first capacitative touch screens at CERN (http:/ / cerncourier. com/ cws/ article/ cern/ 42092), CERN Courrier, 31 March 2010, , retrieved 2010-05-25

[4] Bent STUMPE (16 March 1977), A new principle for x-y touch system (http:/ / cdsweb. cern. ch/ record/ 1266588/ files/ StumpeMar77. pdf), CERN, , retrieved 2010-05-25

[5] Bent STUMPE (6 February 1978), Experiments to find a manufacturing process for an x-y touch screen (http:/ / cdsweb. cern. ch/ record/

1266589/ files/ StumpeFeb78. pdf), CERN, , retrieved 2010-05-25

[6] Nakatani, L. H., John A Rohrlich; Rohrlich, John A. (1983). "Soft Machines: A Philosophy of User-Computer Interface Design" (http:/ / doi.

acm. org/ 10. 1145/ 800045. 801573). Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI’83): 12–15. doi:10.1145/800045.801573. . Retrieved 2009-01-28.

[7] Wellner, Pierre. 1991. The Digital Desk. YouTube video (http:/ / www. youtube. com/ watch?v=S8lCetZ_57g)

[8] Pierre Wellner's papers (http:/ / www. informatik. uni-trier. de/ ~ley/ db/ indices/ a-tree/ w/ Wellner:Pierre. html) via DBLP [9] Westerman, W., Elias J.G. and A.Hedge (2001) Multi-touch: a new tactile 2-d gesture interface for human-computer interaction Proceedings of the Human Factors and Ergonomics Society 45th Annual Meeting, Vol. 1, 632-636. [10] Shanis, J. and Hedge, A. (2003) Comparison of mouse, touchpad and multitouch input technologies. Proceedings of the Human Factors and Ergonomics Society 47th Annual Meeting, Oct. 13-17, Denver, CO, 746-750. [11] Thom-Santelli, J. and Hedge, A. (2005) Effects of a multitouch keyboard on wrist posture, typing performance and comfort. Proceedings of the Human Factors and Ergonomics Society 49th Annual Meeting, Orlando, Sept. 26-30, HFES, Santa Monica, 646-650.

[12] Steve Jobs (2006). "And Boy Have We Patented It" (http:/ / www. businessinsider. com/ and-boy-have-we-patented-it-2010-3). . Retrieved 2010-05-14. "And we have invented a new technology called Multi-touch"

[13] "US patent 7,046,230 "Touch pad handheld device"" (http:/ / patft. uspto. gov/ netacgi/ nph-Parser?Sect1=PTO2& Sect2=HITOFF& p=1&

u=/ netahtml/ PTO/ search-bool. html& r=4& f=G& l=50& co1=AND& d=PTXT& s1="Touch+ Pad+ Handheld+ Device". TI. & OS=TTL/

"Touch+ Pad+ For+ Handheld+ Device"& RS=TTL/ "Touch+ Pad+ For+ Handheld+ Device"). .

[14] Jobs et al. "Touch Screen Device, Method, and Graphical User Interface for Determining Commands by Applying Heuristics" (http:/ /

appft1. uspto. gov/ netacgi/ nph-Parser?Sect1=PTO2& Sect2=HITOFF& p=1& u=/ netahtml/ PTO/ search-bool. html& r=3& f=G& l=50&

co1=AND& d=PG01& s1="touch+ screen+ device". TTL. & s2=apple. AS. & OS=TTL/ "touch+ screen+ device"+ AND+ AN/ apple&

RS=TTL/ "touch+ screen+ device"+ AND+ AN/ apple). . [15] Neumayr, Tom, and Kerris, Natalie, 2007 "Apple Unveils iPod touch Revolutionary Multi-touch Interface & Built-in Wi-Fi Wireless

Networking" (http:/ / www. apple. com/ pr/ library/ 2007/ 09/ 05touch. html) accessdate=21 November 2010

[16] Wong, May. 2008. Touch-screen phones poised for growth (http:/ / www. usatoday. com/ tech/ products/ 2007-06-21-1895245927_x. htm) . Retrieved April 2008.

[17] Heater, Brian (27 January 2009). "Key Apple Multi-Touch Patent Tech Approved" (http:/ / www. pcmag. com/ article2/ 0,2817,2339661,00. asp#fbid=lGomcx1lHVv). PCmag.com. . Retrieved 27 September 2011.

[18] Golson, Jordan. "Apple Denied Trademark for Multi-Touch" (http:/ / www. macrumors. com/ 2011/ 09/ 26/

apple-denied-trademark-for-multi-touch/ ). MacRumors. . Retrieved 27 September 2011.

[19] Knowledge base:Multitouch technologies. Digest author: Gennadi Blindmann (http:/ / www. multi-touch-solution. com/ en/

knowledge-base-en/ )

[20] http:/ / wiki. nuigroup. com/ Diffused_Illumination

[21] Scientific American. 2008. "How It Works: Multitouch Surfaces Explained" (http:/ / www. scientificamerican. com/ article. cfm?id=how-it-works-touch-surfaces-explained) . Retrieved January 9, 2010.

[22] Brandon, John. 2009. "How the iPhone Works (http:/ / www. computerworld. com/ s/ article/ 9138644/ How_the_iPhone_works)

[23] "http:/ / www. humanworkshop. com/ index. php?modus=e_zine& sub=articles& item=99 "DIY Multi-touch screen"

[24] 2009. " Quantum of Solace Multitouch UI" (http:/ / garry. posterous. com/ quantum-of-solaces-multitouch-ui-video-wall-g)

[25] "District 9 - Ship UI " (http:/ / www. youtube. com/ watch?v=YZoa1jHSpBw)

[26] Garofalo, Frank Joseph. "User Interfaces For Simultaneous Group Collaboration Through Multi-Touch Devices" (http:/ / docs. lib. purdue.

edu/ cgi/ viewcontent. cgi?article=1019& context=techdirproj). Purdue University. p. 17. . Retrieved 3 June 2012.

[27] http:/ / gizmodo. com/ 229464/ minority-report-touch-interface-for-real Multi-touch 160

External links

• Multi-Touch Systems that I Have Known and Loved (http://www.billbuxton.com/multitouchOverview.html) – An overview by researcher Bill Buxton of Microsoft Research, formerly at the University of Toronto and Xerox PARC.
• The Unknown History of Pen Computing (http://users.erols.com/rwservices/pens/penhist.html) contains a history of pen computing, including touch and gesture technology, from approximately 1917 to 1992.
• Annotated bibliography of references to pen computing (http://rwservices.no-ip.info:81/biblio.html)
• Multi-Touch Interaction Research @ NYU (http://cs.nyu.edu/~jhan/ftirtouch/)
• Camera-based multi-touch for wall-sized displays (http://www.cs.uit.no/~daniels/multi-touch/index.html)
• David Wessel Multitouch (http://cnmat.berkeley.edu/system/files/attachments/p41-wessel.pdf)
• Jeff Han's Multi Touch Screen's chronology archive (http://multi-touchscreen.com/)
• Force-Sensing, Multi-Touch, User Interaction Technology (http://www.motorola.com/Business/US-EN/Technology_Licensing/Proprietary+Technology+Licensing/Force-Sensing+Touch+Technology)
• LCD In-Cell Touch by Geoff Walker and Mark Fihn (http://www.walkermobile.com/March_2010_ID_LCD_In_Cell_Touch.pdf)
• Touch technologies for large-format applications by Geoff Walker (http://www.walkermobile.com/Touch_Technologies_for_Large_Format_Applications.pdf)
• Video: Surface Acoustic Wave Touch Screens (http://www.youtube.com/watch?v=hwjMWhpxWqo)
• Video: How 3M™ Dispersive Signal Technology Works (http://www.youtube.com/watch?v=YEaNmEZJ1Bk)
• Video: Introduction to mTouch Capacitive Touch Sensing (http://www.youtube.com/watch?v=JVRuDY4X88M)


Predictive analytics Source: http://en.wikipedia.org/w/index.php?oldid=513163651 Contributors: Ahyeek, Allens, Andrewpmk, Angoss, Apdevries, Arpabr, Atama, Avani, Baccyak4H, Bateni, Batra, Bellemichelle, BizAnalyst, BlaineKohl, Boxplot, Calair, Chire, Chrisguyot, CommodiCast, Cookiehead, Crysb, Cuttysc, Cyricx, DARTH SIDIOUS 2, Dancter, DeadEyeArrow, Dherman652, Dmitry St, Dominic, Dontdoit, Doug Bell, Download, Drakedirect, Dvdpwiki, EAderhold, Edward, Ekotkie, Ethansdad, Eudaemonic3, Fbooth, GESICC, Giftlite, Gilliam, Glenn Maddox, Gregheth, GuyRo, Hherbert, Howie Goodell, HughJorgan, IRP, Into The Fray, Isthisthingworking, JHunterJ, Jackverr, Jeremy Kolb, Jfroelich, Jlamro, JonHarder, Jssgator, Kai-Hendrik, Karada, Kku, Kmettler, Kuru, Leighblackall, Lisasolomonsalford, Lpgeffen, Luke145, MC Wapiti, MER-C, Maralia, Marek69, Mcld, Melcombe, Michael Hardy, Mikeono, Mr. Darcy, MrOllie, Nosperantos, Oleg Alexandrov, Onorem, Peter.borissow, Pgr94, Phy31277, Qwfp, Rajah, Ralf Klinkenberg, Ralftgehrig, Ramkumar.krishnan, Raspabill, Requestion, Ronz, Rpm698, Rubicon, S2rikanth, SOMart, SWAdair, SchreiberBike, Scientio, Selain03, SpaceFlight89, SpikeToronto, Stefanomazzalai, Stephen Milborrow, Stephen Turner, Sterdeus, Sudheervaishnav, Sunsetsky, Sweet2, Synecticsgroup, TLAN38, Talgalili, Talyian, The Anome, Thirdharringtonskier, Thorwald, Triplestop, Trusilver, Vaheterdu, Van helsing, Vianello, Vikx1, Vrenator, WhartonCAI, Zvika, Zzuuzz, Σ, 180 anonymous edits

Prescriptive analytics Source: http://en.wikipedia.org/w/index.php?oldid=508702227 Contributors: Bearcat, BlaineKohl, Melcombe, Ottawahitech

Decision support system Source: http://en.wikipedia.org/w/index.php?oldid=510898541 Contributors: -Midorihana-, 10metreh, AJackl, Academic Challenger, AdultSwim, Alansohn, Alexsung59, Aligilman, Alkarex, Andreas Kaufmann, Andrefalcao, Anshuk, Anticipation of a New Lover's Arrival, The, Arpabr, Arthur Rubin, Atlant, B2stafford, Beardo, Berny, BitStrat, Boxplot, Bruce1ee, CalcMan, Camw, Can't sleep, clown will eat me, Chowbok, CommodiCast, Coolcaesar, Crysb, DXBari, Damieng, DePiep, DeadEyeArrow, Disaas, Discospinster, Dlb19338, ESkog, Echuck215, Ehamberg, Ekdamcheap, Emailtonaved, Ettrig, Extremecircuitz, EyeSerene, Falcon Kirtaran, Fieldday-sunday, G.Voelcker, Gachet, Geni, Giftlite, Gmarakas, Guillaume2303, GuySh, Hauberg, Hbent, Henry Delforn (old), Hiner, Informavores, Isaacdealey, J. M., J04n, Jackacon, Jauerback, Jdthood, JonHarder, Kduhaime, KenKendall, KrakatoaKatie, Kuru, Kushalbiswas777, Landroni, LizardJr8, MER-C, Mark Renier, Masgatotkaca, Maurreen, Mdd, Metalray, Michael Hardy, MooNFisH, MuZemike, Mydogategodshat, Newbyguesses, Nickkretz, Ohnoitsjamie, Peter.C, Piano non troppo, Picapica, Power, Prazan, Q uant, Qwertyus, RC, RInstitute, Raymondwinn, RobertG, Robfergusonjr, Ronz, Rsocol, ST47, Salvar, Sanketholey, Sarnholm, SchreiberBike, Scriberius, Seglea, Simoes, Sindidziwa, StefSybo, Stevage, Stevietheman, Supten, Taichi, TauntingElf, The Transhumanist, Thobach, ThomasHofmann, Titoxd, Tomchiukc, Topiarydan, Trbdavies, TyA, Ummit, Van helsing, Veinor, WadeSimMiser, Walk&check, WaltBusterkeys, Wavelength, WorldlyWebster, Wtmitchell, Xyzzyplugh, 351 anonymous edits

Competitive intelligence Source: http://en.wikipedia.org/w/index.php?oldid=509214577 Contributors: 206209nyc, A360, AbsolutDan, Ananthnats, Anotherklaus, Arikjohnson, Beeblebrox, Benaron, Betacommand, Billinghurst, Bmusician, Boing! said Zebedee, British2004, Bunnyhop11, Calabraxthis, Campsean, Cherubino, Chris the speller, Ciadvantage, Cimas, Cinnamon42, Clicketyclack, CommodiCast, Companalyst, Corp Vision, Crysb, Curious Blue, Dancter, Danwikimax, Democracyisborn, Eastlaw, EdBever, Edcrop, Edward, EneMsty12, Erianna, Ewlyahoocom, Exit2DOS2000, Expertci, Fliptoplid, Forceblue, Fvasconcellos, GRBerry, Gadfium, Gavin.collins, Gdo01, George2001hi, Gorongo, Group7, Hstavrinides, Hu12, Jeffhane100, Jeffswartz, Jgiandoni, Joelm, JohnEdit21, Jojalozzo, JosefBar, JzG, Kalle Wirsch68, Kuru, LilHelpa, Linkfinder2012, Lode Runner, Lrrios78, Luna Santin, Maralia, Mark Bosley, Maurreen, McGeddon, Melab-1, Methods101, Mhpolak, Michael Hardy, Michaeldsuarez, Millmoss, Mitrius, MrOllie, Nhwiii, Nimalan.dj, Nono64, Okosa, OlEnglish, Olegwiki, Paddingtonbaer, Papertigers, Pernambuco, Pion, Porterfan1, Prescottje, RHaworth, Renesis, Richard Marshall, Rjwilmsi, Ronz, Ruy Pugliesi, Ryu ushida, S.K., Saganaki-, Samohyl Jan, Samuraiemily, Sarnholm, דוד שי, Saxton.Ryan, Sbeletre, Shoeofdeath, Stefanomione, Thingg, TomStar81, Transportnagar, Untwirl, Verdrawn, Vh88, VsevolodKrolikov, Wragge, Ydriuf, Ysterpen, Zzuuzz, 203 anonymous edits

Data warehouse Source: http://en.wikipedia.org/w/index.php?oldid=511969642 Contributors: 99magred, ABMH, ALargeElk, Aaron Brenneman, Ace of Spades, ActivExpression, Aetherfukz, AgentCDE, Aitias, Allstarecho, Alqayoom, Altenmann, Amorim Parga, Andy Dingley, Aneah, Angusmca, Ankit Maity, AnmaFinotera, Ansumang, Antonrojo, Apolitano, Arcann, Arch dude, Ari1974, Arielco, Arpabr, Asif earning, Asparagus, Atlantas, Atomician, Atw1996, AutumnSnow, Avjoska, BanyanTree, Bdebbarma, Beardo, Berny, Bertport, Betacommand, Bgwhite, BigDunc, Binksternet, Blanchardb, Blue520, Bmotoc, Bobrayner, Bogey97, Brc4, BryantAvey, Butros, C960657, CGerrard, Camw, Carmichael, Catchsandy, Chowbok, Chromatikoma, Chuq, Cobi, Cometstyles, Conversion script, Coolelle, Corruptcopper, Crysb, Crystalattice, Cybercobra, DARTH SIDIOUS 2, DBigXray, DNkYpNcH, DWBIGuru, Dan100, Dancter, Darth Zephyr, DatabaseDiva, Dawriter, DePiep, Denisarona, DhirajGupta, Diannaa, Diego Moya, Dina, Disaas, Dividing, Dmccreary, Doc9871, Donaldpugh, Download, Dpm64, Drbreznjev, Dthvt, Długosz, Edward, Edwardmking, Elonka, Enimer, Er.piyushkp, Ericd, ErinRea, Ethansdad, Evert r, Falcon Kirtaran, Fbooth, Finlay McWalter, Foxj, Foxyliah, Fred Bradstadt, Fudoreaper, Funandtrvl, Fyyer, Fæ, G716, Gabrielinux, Gachet, Gaius Cornelius, Gene Fellner, Ghewgill, Gilliam, Giulio.orru, GreenInker, Greenlightwilly, GregorB, Grjuedes, Gscshoyru, Gskrzypczynski, Gurch, Gzkn, HJ Mitchell, Hadal, Hanifbbz, Hectorpa, Hhultgren, Ianemitchell, Imdeadlymon, Intgr, Itmanning, IvanLanin, Ixfd64, JCLately, JForget, Jackfork, Jarble, Jaxl, Jay, Jburtenshaw, Jdrlundgren, Je ne détiens pas la vérité universelle, Jenks1987, Jgroove, Jguthaaz, JoeSmack, John Vandenberg, Joinarnold, Jomsviking, Jonwynne, JoyMundy, Joyous!, Jusdafax, Jwissick, Kadambarid, Kellylautt, Kingpin13, Kingpomba, Kiwi137, Kjkolb, Kku, Klausness, Kuru, Kwesterh, L Kensington, Lahiru k, Laurentseries, LcawteHuggle, Leandrod, Les boys, Lightmouse, Lingliu07, Llla32, Loonquawl, Lordzilla, Loren.wilton, Lostraven, MHPSM, MHargraves, MK8, Madman2001, Madmonky, Mahuna2, MainFrame, Makro, Malcolmxl5, Mandarax, Manning Bartlett, Marcika, Mark Renier, Mark94502, Martarius, MartinsB, Maurreen, Mav, McSly, Mdd, MeekMark, Megras, Meyer, Mhooreman, Michael Devore, Michael Hardy, Michal Jurosz, Mike Rosoft, Mikeblas, Mindmatrix, Minervasolutions, Minimac, Minna Sora no Shita, Modify, Mpolichany, MrOllie, Mustafaisonline, Mwaci2, MyWorld, Mydogategodshat, Neilc, NishithSingh, Nitin ravin, Nixdorf, Nn123645, Noah Salzman, Nraden, Nrahimian, Oaf2, Object01, Ojigiri, OliverKlozoff, Ondertitel, Ordoon, Oxymoron83, PC78, Philip Trueman, Piano non troppo, Pietrow, Pisceswzh, Playmobilonhishorse, Pnm, Propheticone, Qst, Quinsareth, R'n'B, RHaworth, RJaguar3, RScheiber, Ramesh l, Ranjran, RayGates, Raysonho, Rednblu, Reedy, Rhoegg, Ringbang, Article Sources and Contributors 163

Rjanag, Rjwilmsi, Rknasc, Robert K S, RocklegendPoker, Roenbaeck, RomanEmbacher, Rrabins, Rschmertz, Ryan Rasmussen, S, SJP, ST47, Sabed Aashour, Sae1962, Salvio giuliano, Samroar, Sandeep4tech, Sarnholm, Savan3147754, Schul253, Scurra, Sean D Martin, SebastianHelm, Sesamevoila, Sfingram, Shortbusmedia, Shubinator, Shwetank99, Sidhekin, SimpleTheory, Siroxo, Sixstone, Sleske, Slowbro, SlubGlub, Snowolf, SoCalSuperEagle, SoSaysChappy, Soumyasch, Soundmind43, Spoxox, SqlPac, Srperez, Stefmol, Stephenpace, Stevag, Steve G, Steverapaport, Strongsauce, Studerby, Stvltvs, Subash.chandran007, Swatjester, Swintrob, TFinn734, THEN WHO WAS PHONE?, Tamarandom, Tapir Terrific, TechPurism, Techteachermayank, Telugucherla, Tgeller, The Thing That Should Not Be, Tide rolls, Tom Morris, Troels Arvin, Trusilver, Tvarnoe, Two Companions, Uršul, Utcursch, Vaibhaovaibhav, Vald, Van helsing, Vertium, Vincent jonsson, Vinod Alangaram, Vipinhari, Viridae, Volphy, WH98, Walk&check, Wanderingstew, Wavelength, Wedge3rd, WetherMan, WikHead, Wiki Raja, Wikiolap, Wile E. Heresiarch, Will Beback, Willscraper, Wknight94, WojPob, Writerguy71, Wtmitchell, Xlaran, Yasth, ZimZalaBim, Zundark, 1048 anonymous edits

Data mart Source: http://en.wikipedia.org/w/index.php?oldid=509652928 Contributors: Adrignola, Andy Dingley, Arcann, Beardo, Betacommand, Bijee, Bonadea, Canadian-Bacon, Chipofbags, Ckw888, Crysb, DARTH SIDIOUS 2, DePiep, Docu, Doniago, Dustmachine, Electron9, Elsendero, Eyhk, Fearlessunit, Funnyfarmofdoom, GilCahana, GregorB, Gscshoyru, Gurch, Isomorphic, Jay, Joejimbo, Josh5555, KeyStroke, Kingpin13, Kuru, [email protected], Ldalle, M7, Madman2001, Mark Renier, Materialscientist, McSly, Michael Hardy, Michel Jansen, MilerWhite, Morpheios Melas, Mpramsey, Mwdsmith, Mwtoews, NishithSingh, NortyNort, Nuno Tavares, Opsix, Phileas, Pne, Puckly, Rahulsrivastav2607, Rhillard, Ringbang, Robert K S, Rockbiter, Sadeq, Sarfa, Sarnholm, SqlPac, TFinn734, TechPurism, Turk oğlan, Tzeh, Volphy, Wyatt915, 124 anonymous edits

Forrester Research Source: http://en.wikipedia.org/w/index.php?oldid=508369527 Contributors: Adrius42, Ahunt, Anowitz, Aranel, AzaToth, Bobaffett, BostonRed, Bryan Derksen, Cms-8, DMG413, DavidWBrooks, Davidp, Difu Wu, Editrixie, Edward, ElBenevolente, ElectroPro, Eurobas, Falcon8765, Flowerparty, HHammond, Jeffrey Henning, Jfire, Jmcquivey, Johnpacklambert, Kmcisaac, Kurttttt, League fool, Lescarolnyc, Linguatech, Maxx9997, Millmoss, Mizchalmers, Mmelanson, Mworth, Norm, Nyttend, Ohnoitsjamie, Pearle, Perohanych, Pgan002, Pjb007, PlayCold, R. fiend, Rhobite, Sfmammamia, Shortride, Solphusion, Steveatehs, Steven Walling, Tallbonez, Tawker, TexasAndroid, Vkaplo, Wackymacs, Wererooster, Wookiepower2, Wuthering123, 66 anonymous edits

Information management Source: http://en.wikipedia.org/w/index.php?oldid=510061260 Contributors: 7, AIMSinfo, AJackl, Agendum, AndyB, BrockLi, Bumm13, Carrielle, ChrisCork, Danim, Dawnseeker2000, Delaszk, Dfrg.msc, El grimley, Evert r, Felixlover4ever, Glendac, Glenn, Gurchzilla, Hadal, Harkonlucas, Heidijane, Hintss, InverseHypercube, Ja 62, Jarble, JennyRad, Jinhyuk, JoseREMY, Jpbowen, Julcia, Kai a simon, Kku, L mahonsmith, Lead Farmer, Lepti, LibLord, Lietviki, Lupin, Malcolmxl5, Mark.s.patrick, Markcowan, Maxim Masiutin, Mjbinfo, MrOllie, NeilN, Nighthawk2050, Northamerica1000, O1eqlsfun, Ohconfucius, Oicumayberight, Oli Filth, Pawl Kennedy, Pheasantplucker, Poddly, Prof 7, Ronz, Rudowsky, SchreyP, SiobhanHansa, Sjhnumber1, Someguy1221, SouthLake, Stan3000, Strikers 2, SudoGhost, The Thing That Should Not Be, TheDataMaster, TrisDG, Tthheeppaarrttyy, Useight, V8Tbird, Veridia, Versageek, Versus22, Vespristiano, Veyklevar, Walexino, Wavelength, Westyfield, Wiffy, Wstankich, Ycsinger, ﺩﺍﻟﺒﺎ, 160 anonymous edits

Business process Source: http://en.wikipedia.org/w/index.php?oldid=510142295 Contributors: AdjustShift, Alansohn, Alemily, Anetman, Ap, Ash, Atlantia, Barcelova, Bidnessman, Binksternet, Bináris, Blauerflummi, Boly38, Bpmbooks, Buckaroobowie, CalumH93, Canterbury Tail, D00p, Dancter, DanielPenfield, Deadboyman12, Deiz, Dreispt, Dysepsion, Ehheh, Erielhonan, Erkan Yilmaz, Ewlyahoocom, Fabrictramp, Faradayplank, FritzSolms, Garima.rai30, Gbeekhof, Goflow6206, Gotheek, Heirpixel, Hmu111, Hughch, Iain marcuson, Ihcoyc, Inwind, Iskatel, IvanLanin, Jajis, Jarble, Jayen466, JohnCD, Jwitcom, JzG, K.Nevelsteen, KPH2293, Kaarenoergaard, Kai a simon, Khaliddb, Kkonwalin, Kku, Kuru, Lafeber, Levineps, Lingliu07, Lycurgus, M4gnum0n, MairAW, Mark Renier, Materialscientist, Mdd, Michael J. Cunningham, Mkoval, Mmalbayrakoglu, MrOllie, Muhandes, Oashi, Olff, Paul Harmon, Paullewis4372, Pepzos, PlatypeanArchcow, Plinkit, Popguru, Pvanerk, Rblaesing, Reach Out to the Truth, Rforsyth, Rich Farmbrough, RichardVeryard, Robofish, Rockero, Ryan shell, Shajikv, Sharon Crowell,

SimonP, Storace, Subtractive, Teammetz, Three-quarter-ten, Tom harrison, Tomas e, Tony D. Abel, UncleDouggie, WissensDürster, Woohookitty, Wortech tom, Zchriz, Zigger, Zzuuzz, 雁太郎, 162 anonymous edits

Business Intelligence portal Source: http://en.wikipedia.org/w/index.php?oldid=490958402 Contributors: Agustf, Fram, Gaius Cornelius, Iulia e, Paulrigby, Plastikspork, 3 anonymous edits

Information extraction Source: http://en.wikipedia.org/w/index.php?oldid=503924921 Contributors: Al Maghi, Alexander Wilks, Animorphus, Astronautguo, Bangla11, Belmond, Bidoll, Bkkbrad, CharlesGillingham, DBigXray, Dasterner, David Eppstein, Disooqi, Dmolla, Dongxun, Dreftymac, Dtunkelang, Edward, Falazar, Fran.sansalone, Francesco sclano, Gdupont, George1975, Gmelli, HamishCunningham, Hosszuka, Icognitiva, Intgr, Jamelan, Jandalhandler, John of Reading, Jojalozzo, JonHarder, Khazakistyle, Kku, Kuru, Lawrence87, Lawrykid, Lucyinthesky45, Mark Renier, MaxEnt, Michael Hardy, Mike Schwartz, Mladifilozof, Natecull, Nevalicori, OlEnglish, Omerod, Owen, Pablomendes, Phil Boswell, Plastikspork, Pushpinder12, Robertbowerman, Ronbarak, Ronz, Rubengra, ScottDavis, Searchtools, SebastianHellmann, Sebastjanmm, Sfrancoeur, Shaalank, Solipsist, Spencerk, Supersun511, Texterp, The Anome, The Transhumanist, Tiffany9027, Will Beback, Yiyeguhu, 99 anonymous edits

Granularity Source: http://en.wikipedia.org/w/index.php?oldid=502763049 Contributors: 16@r, 28bytes, Ashishmeena, Ben.c.roberts, BenProspero, BlueDevil, Charles Matthews, Clales, DBooth, Dicklyon, EdJohnston, Eric Soloveitzik, Formerly the IP-Address 24.22.227.53, FrankTobia, Gbchaosmaster, George100, Giac, Gimboid13, Gulmammad, Jacob grace, Jheald, Kaihsu, Laser813, Lectonar, LeeHunter, Luiscantero, Madmedea, Materialscientist, Michael Hardy, MrOllie, Niceguyedc, Paleorthid, Patrick, Pigsonthewing, Platonides, RJFJR, RJGuisandez, Robertvan1, Rp, Rwdh, SWAdair, The Thing That Should Not Be, Vigneshb4u, Vsmith, 80 anonymous edits

Mashup (web application hybrid) Source: http://en.wikipedia.org/w/index.php?oldid=510024608 Contributors: 16@r, Aaronbeaton, Aervanath, Aftabm, Akheirol, Alamaison, Alik Kirillovich, AmitKumar Agrawal, Amy.C, Andrew c, Aranda, Ashleymarcwright, Aude, Babajobu, Beland, Bhy, Black Falcon, Blinklmc, [email protected], Bobet, Boyprose, Brevity, Bstraehle, BusinessMashups, C3o, CapitalR, CatherineMunro, Cedders, Cfulghum, Chessofnerd, Chris the speller, ChrisGualtieri, Clayoquot, Closedmouth, CockroachTeaParty, Copycow, CovenantD, Crzydmnd, Ctkeene, Cy21, Dancter, Danielfonseca, DataSurfer, David Latapie, Deeptext, DhananSekhar, Diligent Terrier, Dipskinny, Discospinster, Disdero, Dkitchin, Dlyons493, Doc glasgow, Doubleyouyou, Dougher, Dsnelling, Eboily, Eclecticos, EdJogg, EddyPauwels, Edward, Ehheh, Elc66, Elygre, Emurphy42, EnsGabe, Eric.k.herberholz, ErixSamson, EurekaLott, Falcon Kirtaran, Fenwayguy, Ferran.cabrer, Forenti, Fortunevieyra, Frap, Freddymay, Funandtrvl, Fyyer, Garagefire, Genecyber, Gggrzesiek, Ghettoblaster, Gnudeep, Godricwu, Goldameir, GraemeL, Grainger.jim, Greenyoda, H@r@ld, Henry W. Schmitt, Hgfernan, Hiles8500, Hu12, Hutcher, Huygens 25, Hymek, Ian Moody, Iankennedy, IdaApps, Inger elis, Innovateb42l8, Interiot, Intgr, Ironholds, Ivan007, J.delanoy, JHunterJ, Jackbecorp, Jamelan, JamesLWilliams2010, Jasonjoh, Jatkins, Jbergerot, Jcrupi, Jeffreyarcand, Jevon, Jh27607, Jheiss, Jluuwindsorca, Jm34harvey, Jmalc, Johnny B, JonathanPMarsh, Jpbowen, Jtq4u, Julia555, Justavo, Justsee, Kauczuk, Kcmaguire33, KellyCoinGuy, KenFehling, Kephir, Kglafond, Kgoarany, King of Hearts, KlausRoder, Koavf, Kozuch, Kshawaerworthycom, L Kensington, LadyHawk89, Larryaasen, Liberatus, Lincolnite, Loelsner, Lord Matt, Lordsatri, Ltfhenry, Lubutu, MC10, MacTed, Macaldo, Madhura Vadvalkar, Madmardigan53, Magister Mathematicae, Maicole, Mapping101, Marcika, Mark Renier, Mashpope, Mashupman, Master690, Materialscientist, MaxDunn, MaxVeers, Mbblake, Mbc362, Mennonot, Michael Slone, Microsofkid, MikeGogulski, Miketwardos, Mjarrar, Mlaffs, Mlucagilbert, Moejorris, Mojei, Mondoblu, Monisiqbal, Monkeythumpa, Moreschi, MrOllie, Msbloves, Mschevrolet, Mugunth Kumar, Nagy, Neilc, Nicelife, NigelR, Nigelwalsh, NorbertEder, Northgrove, Not a dog, Nrcjersey, NuclearWarfare, ONEder Boy, Odinmetatech, Ohnoitsjamie, Omegatron, Orconero2, P3davis, Partybob, Paycoguy, Peak, Peciv, Perkinss, Peugf, Phanhaitrieu, Phauly, Phil Boswell, Pholmstr, Piast93, Pictowrit, Pilaftank, Pindakaas, Pingloo, Pissant, Plateoftheday, Polf91, PolyGlot, Pro bug catcher, Prodego, Purple40, Qllach, RHaworth, RSStockdale, RadicalFayth, Raphaelsand, RedWolf, Renatko, Rene.bonvanie, ReyBrujo, Rhobite, Rich Janis, [email protected], RichardVeryard, Rjwilmsi, Robertcrowe, Ronz, Royboycrashfan, Rsazima, Rufflos, Sae1962, Salamurai, SamJohnston, Sbhoi, Seewhere.net, Seifip, Serenity007, Sergei Kolomiets, Shadow1, Shoeofdeath, SiobhanHansa, Sleepyhead81, Smartse, Sobaka, Some jerk on the Internet, Sostby, Spacesoon, Squids and Chips, Starfishjones, StuffOfInterest, Stuii, Subdolous, Suboucha, Swallowbooks, Synchronism, TMsk, Talis paul, Tangotango, Tarmo, Tassedethe, TechGuy08, Telecomtom, Tertulia, ThatWikiGuy, The Anome, The Rambling Man, ThePedanticPrick, Thetrickyt, Thobach, Tijuana Brass, Toddfast, Toddlevy, Toddst1, Tony1, Toussaint, Trevj, TurntableX, Tyrell perera, Urbanor, Useight, Vdegroot, Vdzhuvinov, Verify101, Vipinhari, Visvadinu, W.F.Galway, WLU, Wafulz, Walden, Wayne Slam, Weathermaps, Webchat, Weibifan, WereSpielChequers, 
Whughes98144, Wikistudio, Wisteriabumps, Wombat97, Wowlookitsjoe, Yogendra.joshi, Yousou, Yudis asnar, Yug, ZOverLord, Ziyang1226, Zohar.ma, 955 anonymous edits

Green computing Source: http://en.wikipedia.org/w/index.php?oldid=511974170 Contributors: 28bytes, Adolphus79, Aiyizo, Akira Tomosuke, Alan Liefting, Alansohn, Aleph Infinity, Aliavni, Alpha 4615, Ameliorate!, Analyst10, Anand akela, Andreig84, ArnoldReinhold, Aspevents, BWirth, Behun, Beland, BenignEditor, Bill.albing, BloomingtonSky, Bruce404, C777, CFilm, Cdamcke, Church of emacs, Ckatz, CoupDeMistral, Cut1664, Cwolfsheep, DARTH SIDIOUS 2, Danderson68, David44357, Davidobrisbane, Dmalis, Dman727, Doctorfluffy, Dpanda, Dr Aaij, Drphilharmonic, Drstuey, DuBois, Dwheeler, EagleOne, Electronicdatagirl, Fionacampbelly, Frap, Friend of the Facts, Furrykef, Gaius Cornelius, Galoubet, Geek2003, Geordieminor, GorillaWarfare, Haakon, Harryzilber, History2007, Hnobley, Hpclover, Hu12, IanGodfrey1E, Iridescent, Irishguy, Itgov, IvanLanin, J.delanoy, JDBravo, Jarble, JasonWoof, Jbabe42, Jeh, Jennavecia, Jimdiskman, Jobbon, Johner, Johnfos, Johnuniq, Jonkerz, Joren, Jorfer, Julesd, Kardan, Kdlarkowski, Kelson, Kerri33029, Kozuch, L57 Lives, Lanuova, Larsrein, Laustin1, Lca css, LemonTwinkle, Lemurtree, Lubutu, M1ss1ontomars2k4, M8829, MER-C, MHPSM, MMuzammils, Mac, MacJarvis, Machtfuernacht, Mahjongg, Mandarax, Markandrewberry, Materialscientist, Matsuiny2004, Matthewbulat, Mdd, Mgmcginn, Michelle.naquin, Mild Bill Hiccup, Mindmatrix, Mion, Mmxx, MrOllie, Mrshaba, Msileinad, MyTigers, Mykhal, NOrbeck, NapoliRoma, Natandoron9, Nglg, Notesfromtheroad, NuclearWarfare, Oli Filth, Oxymoron83, Parkertrail, Pearle, Peatcher, Pedro, PerryTachett, Petehopton, Peter Campbell, Philip Trueman, Pinethicket, Piotrus, Plasticup, Plouin, Plugwash, Pmnewsham, Prof Colin, Proxar, Ptoone, Pål Jensen, QuadRobin, Rabarberski, Razarlazer, Rednatski, Research2009, Rfsmit, RichardVeryard, Richardtj, Rjwilmsi, Ronambiar, Ronz, SGN128, Samuel Grant, Sanding06, Saoshyant, Schapel, Schulzgp1, Scoid, Seanlw, Shuffdog, Simesa, SimonP, SiobhanHansa, Skier Dude, Slwri, Srleffler, Suffusion of Yellow, SuperHamster, Superm401, SwisterTwister, Synchronism, Tabby, Tbhotch, Terrafirmex, The Thing That Should Not Be, Theshado13, Thumperward, Tobalola, Tom Worthington, Towel401, Txuspe, UncleDouggie, Usedciscodotcom, Uzume, Vellvisher, Verismic, Viretse, Vortexrealm, Wadamja, Wavelength, Webdc, Who, Widefox, Willaf, Wiretse,

Woohookitty, Xionbox, Xrm0, Yamaguchi先生, Ysangkok, Yuriz, Yworo, Zippyclown, Zodon, Zzuuzz, 360 anonymous edits

Social network Source: http://en.wikipedia.org/w/index.php?oldid=513016555 Contributors: Anna Frodesiak, Apparition11, Artimis Cat, Axcurtis, Barek, Bellagio99, Bibid, Bjankuloski06en, BobDohse, Bonadea, Brad7777, Daivhp, Danno uk, DarwinPeacock, Editor2020, Ekren, ElKevbo, Eyeofyou, Funandtrvl, Groupuscule, Jake Wartenberg, Jim1138, KLongAus, Klilidiplomus, KuboF, Lotje, Macedonian, Madcoverboy, Meclee, Mike Restivo, Mindmatrix, Modotam, Morphh, Mssclanz, Mysterytrey, Neonworm, NewEnglandYankee, Nihiltres, Pete unseth, Singhal.aman007, SudoGhost, Svlberg, Tbhotch, Tide rolls, Tungsten, Uday.gautam6, Versageek, Vertium, Victoria A Allen, 77 anonymous edits

Data visualization Source: http://en.wikipedia.org/w/index.php?oldid=511789213 Contributors: Aaronzat, Abdel.a.saleh, Agor153, Ashash2288, Bcgrossmann, Blaz.zupan, BrunoGebarski, CMorty, Chartball, Chire, CinagroErunam, Clemenza16, CommonsDelinker, Cristian Opazo, Danlev, David Eppstein, Deeptext, Dekisugi, Delectate, Deville, Drphilharmonic, EpicSystems, Ericmelse, Ernestolago, Fengzhichun, FinnFitzsimons, GreenRoot, Ground Zero, Hanchuanpeng, Hgamboa, Hu12, Hughrheinsohn, Infografica, Ivanpetrov, JamesXinzhiLi, Jowa fan, Kazhe, Kellylautt, Kencf0618, Kosh21, Kuru, Levenen, Linafuko, Linforest, M2Ys4U, MainFrame, Mandarax, Martin58474, Mdd, Melcombe, MilerWhite, Nlan1igse, Ohspite, PeterWilfahrt, PhnomPencil, PrimaVista, Protonk, Qwfp, Qwyrxian, Rmontroy, Robiminer, Ronz, SanFranArt, SchreiberBike, Srylesmor, Stealth500, Stephroj, Stevemorr, Sweetmoose6, Tagresta, Talgalili, Teryx, Tony1, Vaccaricarlo, Vandit.P, Vsmith, Whibbard, Whouk, Wing61, Xmlgal, Yug, Yukimonster, Zeliboba7, ZweiOhren, 92 anonymous edits

Mobile business intelligence Source: http://en.wikipedia.org/w/index.php?oldid=505029251 Contributors: Ajaics, Alexandra31, Alibabala, BD2412, Bearcat, Beroneous, Chowbok, Christiedavis08, Crysb, David.monks, Dylan72986, Flaticida, GoingBatty, Jevansen, Joel7687, Jorgenev, Joshua.pierce84, Kirkcunningham, Lnlena13, Mkblackstone, Muhandes, Obiwankenobi, Pavoman, Pidgeyoki, Pnm, Rhylton, Rollins83, Romio1983, Rovervoyageur, Sayno2junk, Sjcbellevue, SurfBI, Tekgeek09, Timespinnerw, Truonghoangduong, Vcasesb, Versageek, Vibhunath, Woohookitty, Yunshui, Δ, 27 anonymous edits

Composite application Source: http://en.wikipedia.org/w/index.php?oldid=415498993 Contributors: 2514stewart, Chomsnl1, Compapp, Dawynn, Drbreznjev, Elonka, Gjbarber, Glenn4pr, Glennblock, I already forgot, Jfire, Jpbowen, Linkspamremover, Movednaval, Murrayflynn, NetManage, Nigelwalsh, Patrickyong, Peter Campbell, Radagast83, Ramprasad83, RandyKolb, RichardVeryard, Ronz, Simonmartin74, Tiago simoes, Timo Kouwenhoven, Tyrell perera, Visvadinu, 44 anonymous edits

Cloud computing Source: http://en.wikipedia.org/w/index.php?oldid=512967475 Contributors: 1337H4XX0R, 1exec1, 21hattongardens, 28bytes, 623des, 842U, 9Engine, A. B., A.K.Karthikeyan, A.Ward, A520, A8UDI, A930913, Aaditya025, Aavindraa, Abb615, AbdullahHaydar, Abligh, Abrahami, Aburreson, Acandus, Acather96, Achimew, Acrooney, AcuteSys, Adam Hauner, Aditya.smn, AdityaTandon, Adjectivore, Adlawlor, Aflagg99, Agemoi, Ahunt, Ajraddatz, Akropp, Alankc, Alanmonteith, Alansohn, Alex5678, Alexamies, Alexlange, Allenmwnc, Allens, Aloak1, Amartya71, Ameliorate!, American Eagle, Amoebat, Amore proprio, AnOddName, Anamika.search, Anandcv, Anderson, Andersoniooi, Andromani, Andy Dingley, Andyg511, Aneesh1981, Angrywikiuser, Anikingos, Anna Frodesiak, Anne.naimoli, Anonymous Dissident, Ansumang, AoV2, ArglebargleIV, Arnavrox, Ashakeri3596, Ashley Pomeroy, Ashleymcneff, Asieo, AstralWiki, Auntof6, Aurifier, Avermapub, Avillaba, Avlnet, Avneralgom, Avoided, Axorc, B Fizz, BD2412, BDavis27, BaldPark, Balvord, Bamtelim, Barmijo, Beanilkumar, Beer2beer, Beetstra, Bejnar, Belfry, Beltman R., Bencey, Bentogoa, Bernd in Japan, Berny68, Betswiki, Bidgee, Bigfella1123, Biglovinb, Bijoysr, Bikeborg, Bill.albing, Biscuittin, Bkengland, Blaxthos, Blvrao, Bob bobato, [email protected], BobGourley, Bodhran, Bonadea, Bongwarrior, BostonRed, Bovineone, Bped1985, Bpgriner, BrennaElise, BrianWren, Brianwc, Bruce404, Bshende, Bubba73, Bulldog73, Bumm13, Burrows01, Bus stop, Businesstecho, ByM4k5, C. A. Russell, CPieras, Cajetan da kid, Calabe1992, CanadianLinuxUser, Candace Gillhoolley, Capricorn42, Carl presscott, Casieg, Casliber, CastAStone, Cbdorsett, Cemgurkok, Certitude1, Chadastrophic, Chaheel Riens, Chanakal, ChangChienFu, Chapmagr, Chintan4u, Chishtigharana, Chmyr, Chris the speller, Chris55, Christian75, Chriswriting, Chuahcsy, CitizenB, Ckatz, ClamDip, CliffC, CloudComputing, Cloudcoder, Cloudcto, Cloudfest, Cloudnitc, Cloudreviewer, Cloudxtech, Clovis Sangrail, Clutterpad, Cmartell, Colleenhaskett, Colonies Chris, ConcernedVancouverite, Cornelius383, Courcelles, Cpl Syx, Crackerspeanut12, Craig.Coward, Cronos4d, Crysb, Csabo, Cst17, Curtbeckmann, Cybercool10, Czarkoff, D rousslan, DARTH SIDIOUS 2, DBigXray, DJ Clayworth, DSP-user, DVdm, Dadomusic, Dan6hell66, DanStern, Dancter, Danielchalef, Dannymjohnson, Darktemplar, Darkyeffectt, DarrenOP, DaudSharif, Davewild, David H Braun (1964), DavidLam, Davidogm, Dbake, Dbrisinda, Declan Clam, Deepalja, Deicool, Deineka, DeleoB, DellTechWebGuy, Denisarona, Derekvicente, Dethomas, Devinek22, DexDor, Dfxdeimos, Diego Moya, Digestor81, Dimos2k, Dinferno, Dinhthaihoang, Dlohcierekim, Dlohcierekim's sock, Dlowenberger, Dlrohrer2003, Dmcelroy1, Dmichaud, Doctorfree, Dogbertwp, Dogsbody, Dominik92, DoriSmith, Dougher, Dpsd 02011, Dr.K., DrFurey, DragonflyReloaded, Drphilharmonic, DuineSidhe, Dyepoy05, E Wing, E0steven, EDC5370, EEldridge1, ESkog, EagerToddler39, Earthlyreason, Ebhakt, Eco30, Edchi, Eddiev11, Edeskonline, Editorfry, Eeekster, Ego White Tray, Ehheh, Einasmadi, ElKevbo, Elassint, Elasticprovisioner, Elauminri, Elbarcino, Eleven even, EliyahuStern, Elminster Aumar, EmilyEgnyte, Emrekenci, Enc1234, Eniagrom, Epbr123, Er.madnet, Eric, EricEnfermero, EricTheRed20, Estahl, Esthon, Ethicalhackerin2010, Ethoslight, Eusbls, Evagarfer, Everything counts, Evil saltine, EwokiWiki, Explorer09, Falcon8765, FatalError, Favonian, FayssalF, Fb2ts, Fcalculators, Feinoha, Felipebm, Felixchu, Fergus Cloughley, Fieldday-sunday, Figaronline, Figureskatingfan, Finngall, Firsfron, 
Fishtron, Five-toed-sloth, ForthOK, Francis.w.usher, Frap, Frdfm, Freddymay, Frozen Wind, Fvw, Fylbecatulous, Fzk chc, Fæ, GRevolution824, GVnayR, Gamber34, Gandalf44, Garima.rai30, Gary King, Gawali.jitesh, GcSwRhIc, Ggafraser, GhostModern, Giftlite, Ginsengbomb, Gkorland, Glane23, Glass Sword, Gmm24, GorillaWarfare, Gotocloud, Gozzilli, Grafen, Gram123, Gratridge, Greensburger, GregorB, Grim Littlez, Guillaume2303, Guliolopez, Gxela, HJ Mitchell, Haakon, Hallows AG, HamburgerRadio, HandThatFeeds, HangingCurve, Happyinmaine, Hardduck, Haroldpolo, Harryboyles, Haseo9999, Hatfields, Helotrydotorg, Hemapi, HenryLarsen, Heracles31, Heroeswithmetaphors, Heron, Heyitscory, Hfoxwell, History2007, Hogne, Hu12, Hughesjp, Hutch8700, Huygens 25, IANYL, IDrivebackup, IHSscj, Iamdavinci, Iamthenewno2, Ianlovesgolf, Ianmolynz, Identity20, Imeson, Iminvinciblekris, Imllorente, Imnotminkus, Imtiyaz 81, Imtiyazali4all, Indigokk, Info20072009, IronGargoyle, Ispabierto, Iste Praetor, Itangalo, Itzkishor, IvanLanin, J.delanoy, J04n, JDC321, JForget, JLRedperson, JLaTondre, JMJeditor, JaGa, Jack Bauer00, Jack007, Jackoboss, Jacob Poon, Jakeburns99, Jakemoilanen, Jamalystic, JamesBWatson, JamesLWilliams2010, JanaGanesan, Jandalhandler, Janniscui, Jarble, Jarrad Lewis, Jasapir, Javaeu, Jay-Sebastos, Jayvd, Jbadger88, Jbekuykendall, Jbucket, Jburns2009, Jdlh, Jean.julius, Jef4444, Jeff G., Jeh, Jerichochang97, Jerryobject, Jesse.gibbs.elastra, Jhodge88, Jht4060, Jim1138, JimDelRossi, Jimmi Hugh, Jimsimwiki, Jineshvaria, Jinlye, Jkelleyy, Jldupont, Jleedev, Jmc, Jmillerfamily, Joe1689, JoeBulsak, Johananl, John254, JohnCD, JohnJamesWIlson, JohnnyJohnny, Johnpltsui, Johnuniq, Jojalozzo, Jon.ssss, Jonathan.robie, JonathonReinhart, Jorunn, Jose Icaza, JoshDuffMan, JoshuaZ, Jpgordon, Jtschoonhoven, Julesd, Juliano, Juliashapiro, Jusdafax, Justinc, Jwt1801, JzG, Kawana, Keilana, Kenny Strawn, Kessler, Keynoteworld, Kgfleischmann, Kibbled bits, Kieransimkin, Kimdino, Kkh03, Klilidiplomus, Kmsmgill, Knodir, Knoxi171, Kompere, KotetsuKat, Kozuch, Kpalsson, Krenair, Kridjsss, Kristiewells, Kronostar, Kubanczyk, Kunitakako, Kuru, Kvng, Kyerussell, L Kensington, L200817s, Lainagier, Lairdp, Lamro, LandoSr, LarryEFast, Larrymcp, Laser813, Latha as, Latticelattuce, Lavanyalava, Lbecque, Learn2009, Lee.kinser, Leifnsn, Leinsterboy, LemonairePaides, Lerichard, Letdorf, Li Yue Depaul, Linuxbabu, Lionheartf, Lipun4u, LittleBenW, Littlewild, Liujiang, Liyf, LizardJr8, Lmbhull, Lmp90, Loadbang, Logan, Lone boatman, Lord Roem, Lorddunvegan, Lordfaust, Lotje, LuchoX, Luigi Baldassari, Luk, Lysy, M.O.X, MJ94, MPSTOR, Maasmiley, Mabdul, Macecanuck, Machilde, Macholl, Mahjongg, MainFrame, Makru, Malke 2010, Malleus Fatuorum, Manasprakash79, Mandarax, Manishekhawat, Manop, Manusnake, Mariolina, Mark Arsten, Mark Renier, MarkCPhinn, Markoo3, Markqu, Marokwitz, MarshallWilensky, Martin.ashcroft, Martinwguy, Materialscientist, Mathematrucker, Mathglot, Mathonius, Mattg82, Maurice Carbonaro, MaxedoutnjNJITWILL, Mayur, Mbblake, Mckaysalisbury, Mdd, Meitar, Metapsyche, Metatinara, Mheikkurinen, Mhodapp, Michael Hardy, Michaelmas1957, MichealH, Miguelcaldas, Mike Rosoft, Mild Bill Hiccup, MillinKilli, Millycylan, Mindmatrix, Miracle Pen, Miym, Mkdonqui, Mlavannis, Mmanie, Mobilecloud, Mohamed Magdy, Momoricks, Monkey Bounce, Monkeyfunforidiots, Moonriddengirl, Mortense, Moswento, Motyka, Mpcirba, Mr little irish, MrKG, MrMarmite, MrOllie, Msfitzgibbonsaz, Msileinad, Muhandes, Mwarren us, Mwmaxey, Mychalmccabe, Nabeez, Nadeemamin, 
Nakakapagpabagabag, Nasnema, Nasserjune, Natishalom, NatterJames, Naturelover007, Navywings, NawlinWiki, Nbarth, NeD80, Neko-chan, Nelliejellynoonaa, Nesjo, Neustradamus, NewEnglandYankee, Nfr-Maat, Nick Number, Nightscream, Nikoschance, Nimbus prof, Niri.M, Njcaus, Njkool, Nnemo, No1can, NocturneNoir, Nogburt, NonNobisSolum, Nookncranny, Nopetro, North wiki, Northamerica1000, Now3d, Nurnware, Ny156uk, Nyllo, OSborn, Obdurodon, Ohnoitsjamie, Ohspite, Oleg Alexandrov, Olegwiki, OllieFury, Oluropo, Om23, OmbudsTech, Onmytoes4eva, Optatus, Orange Suede Sofa, Os connect, OsamaBinLogin, OsamaK, Oskilian, Otto ter Haar, Ourhistory153, PCHS-NJROTC, PJH4-NJITWILL, PMstdnt, Padmaja cool, Paj mccarthy, Palapa, Panoramak, Parveson, Patrickwooldridge, PaulQSalisbury, Paulmmn, PavelSolin, PcCoffee, Pea.hamilton, Pegordon, Person1988, Pete maloney, Peterduffell, Pgan002, Pgj1997, Phazoni, Phil.gagner, PhilJackson, Philg88, Philopappos86, Piandcompany, Pinethicket, Piperh, Pisapatis, Pleft, Pmell, Pmj, Pmorinreliablenh, Pointillist, Poliverach, Pontificalibus, Popnose, Porqin, Pottersson, Ppachwadkar, Priyo123, Pumpmeup, Qarakesek, Qoncept, Quaeler, Quantling, QuentinUK, Qusayfadhel, R39132, RHaworth, RLSnow, RS-HKG, Rakesh india, Ramnathkc, Ramu50, Randhirreddy, Rangoon11, Ras67, Raushan Shahi, RayAYang, Razorflame, Ready, RealWorldExperience, Realtimer, Reaper Eternal, Rebuker, Recognizance, Reconsider the static, Reservoirhill, Reshadipoor, RetiredWikipedian789, Rich Farmbrough, Rich Janis, Richard Harvey, Richard.McGuire88, Richramos, Rickyphyllis, Rjwilmsi, Rms77, Rnb, Robert.Harker, Robert.ohaver, Robofish, Robscheele, Rolf-Peter Wille, Ronen.hamias, Ronz, Rosiestep, Rossumcapek, Rotiro, Rsiddharth, Rursus, Rutger72, Ruud Koot, Rw2, Ryanrs, SBunce, SC, SPACEBAR, Sakaal, Salix alba, SamJohnston, Samuraiguy, San2011, Sandipk singh, Sanjiv swarup, Sanmurugesan, Sanspeur, Sapenov, Sarathc, SarekOfVulcan, Sariman, SchreyP, Scientus, Scott A Herbert, ScottSteiner, Scottonsocks, Scwlong, Sean R Fox, Seaphoto, Sebastiangarth, Sfiteditor, Sfoskett, Shadowjams, Shajulin, Shakeelrashed, Shan.rajad23, Sharktopus, Shawnconaway, Shaysom09, Shierro, Shinerunner, Shirt58, Shoaibnz, ShortBus, Shri ram r, Shubhi choudhary, Siddii, Silver seren, Simonmartin74, Sissyneck, Sivanesh, Sjames1, Sk8trboi199, Skarebo, Skizzik, Skovigo, Skrewler, Skumarvimal, Skyrail, Slady, Slwri, Smileyranger, Smjg, Snaxe920, Snideology, Snowolf, Snydeq, SoWhy, SocialRadiusOly, Softdevusa, Songjin, Sp33dyphil, SpecMode, Spiritia, Spojrzenie, Sponsion, Sql er2, SqueakBox, Srastogis, Ssabihahmed, Staceyeschneider, Stbrodie1, StefanB sv, Stephan Leeds, Stephenb, SteveLoughran, Steven Walling, Stevenro, Steveozone, Stevevictor134, Stickee, Stickyboi, StrategicBlue, Stuart.clayton.22, Stuartyeates, Subbumv, SudoGhost, Sumdeus, SunnySideOfStreet, Superbeecat, Superm401, Surajpandey10, SuzanneIAM, Sweerek, Swolfsg, Syp, TJK4114, Taicl13, Tangurena, TastyPoutine, Tatatemc, Tbhotch, Tdp610, Techman224, Technobadger, TechyOne, Teles, Terrillja, Tgeairn, ThaddeusB, The High Fin Sperm Whale, The Thing That Should Not Be, The wub, TheNightFly, Thecheesykid, Themfromspace, Thesurfpup, Thewebartists013, Thexman20, ThomasTrappler, Thorwald, Thumperward, Tiburondude, Tide rolls, TimFreeman701, Timwayne, Tinton5, Tisane, Tkdcoach, Tmguru, Tobias Bergemann, Tobych, Tom Morris, Tom6, Tomhubbard, Tony1, TonyHagale, Tonyshan, Torturedgenius, Tpbradbury, Tramen12, Tree Hugger, Trusilver, Trödel, Tsmitty31, Tungsten, TutterMouse, Tuwa, TylerFarell, Tylerskf, 
Ugarit, Ummelgroup, Uncle Dick, UncleBubba, UncleDouggie, Understated1, Undiscovered778, Uozef, Utcursch, ValStepanova, Valenciano, Valentina Ochoa, Vasiliy Faronov, Vasq0123, Veerapatr.k, Vegaswikian, Velella, Vermtt, VernoWhitney, Vertium, VicoSystems1, Victor falk, Vicweast, Vigilius, VijayKrishnaPV, Vipinhari, Viralshah0704, Vishwaraj.anand00, Vmops, Vondruska, Vpfaiz, Vpolavia, Vranak, Vrenator, Wackywace, Waggers, Walshga, Wasami007, Washburnmav, Watal, Wavelength, WeisheitSuchen, Weternuni, Whitehatpeople, Whitehouseseo, WikHead, Wiki13, WikiLaurent, WikiPuppies, WikiScrubber, Wikid77, Wikipelli, Willit63, Windchaser, Winterst, Wizzard, Wlouth, WookieInHeat, Wooptoo, WootFTW, Wretman, Wtmitchell, Wtsao, Wtshymanski, Xe7al, Xiler, Xinconnu, Xxyt65r, Y2l2, YUL89YYZ, Yaraman, Yaris678, Ycagen, Yerpo, Yinchunxiang, Yonidebest, YoungManBlues, Yuchan0803, Yworo, ZX81, Zachnh, Zevnik, Zfish118, anonymous edits דוד שי, Zhanghaisu, Zundark, Zzuuzz, Çalıştay, Σ, 2579

Multi-touch Source: http://en.wikipedia.org/w/index.php?oldid=513230055 Contributors: A Generic User, ALH, Agadez, Ah29, Airplaneman, Akronym, Aladdin Sane, Alan Liefting, Alanburchill, Alex.tan, Alexanderaelberts, Andyso, Anthony-Carlier, Antimatter15, AppleProductFan, Aymen ka, Baba786, Bearcat, Beetstra, BenFrantzDale, Bencey, Betjangee, Bigbluefish, Bill G. Evans, Bitlera2005, BlacKeNinG, Blacksurface, Blaxthos, Bramboot, Brianlucas, Bubuka, Cambridgeblue1971, Capricorn42, CaseyPenk, Caseymrm, Cerupcat, Charlestephen, Chowbok, ChrisHodgesUK, Chuq, Ckatz, Cocoaguy, CodenameCueball, ConradKilroy, CraigD1993, Crookedfoot, Crysb, Curly Turkey, DMacks, Dancter, Darrell Greenwood, Datagutten, Dcalfine, Deepfry3, Defyingapple, DenisRS, DennisIsMe, Denpick, Destin, Dfd123, Diego Moya, Dougher, Dougie WII, Dr. Crash, Dragice, Dreaded Walrus, East718, Edinburghmac, Eloil, Epson291, Eptin, Erutuon, Evildeathmath, Evren TANRIKULU, Excirial, Farshadachak, Floemuc, FlyingggFish, Fresheneesz, Fru1tbat, GGShinobi, Gbleyan, Gblindmann, Gco, Geekstuff, GermanX, Gfarnam, Glenn, GodFather1921, GoingBatty, GorillaWarfare, Gronky, Guus van Gestel, Guy Harris, Gyro Copter, Haakon, Hanii Puppy, Holden15, Homam.Alghorani, IGeek, Illegal Article Sources and Contributors 165

Operation, Interframe, Intgr, Irish spring69, Itsmelah, Ivucica, Ixmat, Jagged 85, JamesHaigh, Jebba, Jerome Charles Potts, Jesse Viviano, Jevansen, Jewenet, Jim1138, Jj137, Jkeller87, Jmrowland, JoTp, JoeJohnson2, Joeybabe, JohnSawyer, Julesd, Justanother, Kalban, Kaosrok, Katharineamy, Kedadi, KennethMPennington, Kenny Strawn, Khukri, Kieff, KillerCommz, Kitfaaace, Kockgunner, Kool2bchilln, KristinMiller, Kvng, Lbecque, Lester, Levendt1, Lexischemen, Lhommemagique, LightSpeed, Lindsey10, Lmsilva, LtWv, Lucello, M gol, MER-C, Mac, Madkayaker, Madlovba, Magioladitis, Magneei, Mark Arsten, Martarius, MartinKraemer, Mathmo, MaxTheMan36, Maxí, Mephiles602, Mepler, Mergatroid212, Michael tarr, Mindmatrix, Mingzhe0908, Miquonranger03, Mmmtouch, Moment Deuterium, Mr pig face, MrOllie, Multitouch, MySchizoBuddy, Mygerardromance, Mzajac, Nanika88, Naturalui, Navwikiadroit, Nealmcb, Nickoutram, Ninjagecko, Nixdorf, Nmehta0, Nopetro, Normanmargolus, Notedgrant, Nwk, Oaragones, Oatmealr101, Old Guard, Ondenc, Orphan Wiki, OverlordQ, POTA, Patrick DeepPlaya, Paultaele, PenComputingPerson, Peter S., Phaelex, PhilKnight, Philu, Pineapple fez, Pisharov, Platypus222, Pnm, Prasadrs, PsyberS, RW Marloe, Raghavkvp, RainbowCrane, Rasmasyean, Realhiph, Redsoxsux132, Ribtor, Rich Farmbrough, Ricktt, RiyaKeralore, Robbosher, Robertinventor, Robinsw, Rockysmile11, Roguegeek, Ronz, Rosieate, Rsrchandran, Russer3232, SF007, Sabramovich, ScottyWZ, SeanAhern, Secret Saturdays, Seresin, Shantavira, Shaun Ewing, Sifi899, Siliconov, Simplyakm, Sj, Slark, Sorek73, Sourcetech, Spellcast, Spirarel, Stamatron, StaticGull, Stephen, Stepho-wrs, SteveSims, Steven Walling, Stumppi23, Swamp Ig, Tanner65, Tbonge, Tgeairn, TheJames, Thecurran91, Theoboyd, Theroadislong, ThibaudG, Thomas-pluralvonglas, Thumperward, TiagoTiago, Tikitpok, Tilmonen, Timneu22, Tiptyper, Tktktk, Tmcw, Tokori, Totakeke423, TouchInternational, Toussaint, TrevorLSciAct, Ttrygve, Unimath, UseAndLearn, Utcursch, UweLX, ViperSnake151, Vishahu, W00fd06, WKW, Wallen-zhou, Warpflyght, Wasbuxton, Washburnmav, Wikitürkçe, Willtron, Woodega, Wwmarket, Wyatt915, Xavcam, Xlotlu, YUL89YYZ, Yahya Abdal-Aziz, Z897623, Zedenstein, Δ, 636 anonymous edits Image Sources, Licenses and Contributors 166 Image Sources, Licenses and Contributors

File:Loudspeaker.svg Source: http://en.wikipedia.org/w/index.php?title=File:Loudspeaker.svg License: Public Domain Contributors: Bayo, Gmaxwell, Gnosygnu, Husky, Iamunknown, Mirithing, Myself488, Nethac DIU, Omegatron, Rocket000, Shanmugamp7, The Evil IP address, Wouterhagens, 22 anonymous edits File:Google_Analytics_Sample_Dashboard.jpg Source: http://en.wikipedia.org/w/index.php?title=File:Google_Analytics_Sample_Dashboard.jpg License: Public Domain Contributors: bloggingehow File:Decision Support System for John Day Reservoir.jpg Source: http://en.wikipedia.org/w/index.php?title=File:Decision_Support_System_for_John_Day_Reservoir.jpg License: Public Domain Contributors: USGS: Project contact Michael J. Parsley, U.S. Geological Survey File:Drought Mitigation Decision Support System.png Source: http://en.wikipedia.org/w/index.php?title=File:Drought_Mitigation_Decision_Support_System.png License: Public Domain Contributors: NASA Image:Data warehouse overview.JPG Source: http://en.wikipedia.org/w/index.php?title=File:Data_warehouse_overview.JPG License: Public Domain Contributors: Hhultgren File:George Colony in 2011.jpg Source: http://en.wikipedia.org/w/index.php?title=File:George_Colony_in_2011.jpg License: Creative Commons Attribution 2.0 Contributors: Magnus Höij from Stockholm, Sweden File:Energy Star logo.svg Source: http://en.wikipedia.org/w/index.php?title=File:Energy_Star_logo.svg License: Public Domain Contributors: Fred J, Rocket000, Tetris L, 1 anonymous edits File:Barabasi Albert model.gif Source: http://en.wikipedia.org/w/index.php?title=File:Barabasi_Albert_model.gif License: Creative Commons Attribution-Sharealike 3.0 Contributors: Horváth Árpád File:Network self-organization stages.png Source: http://en.wikipedia.org/w/index.php?title=File:Network_self-organization_stages.png License: Creative Commons Attribution-Sharealike 3.0 Contributors: User:Takemori39 File:Social-network.svg Source: http://en.wikipedia.org/w/index.php?title=File:Social-network.svg License: Public Domain Contributors: User:Wykis File:Social Red.jpg Source: http://en.wikipedia.org/w/index.php?title=File:Social_Red.jpg License: Creative Commons Attribution-Share Alike Contributors: Usuario:Daniel TenerifeDaniel Tenerife File:Scale-free network sample.png Source: http://en.wikipedia.org/w/index.php?title=File:Scale-free_network_sample.png License: GNU Free Documentation License Contributors: It Is Me Here File:Diagram of a social network.jpg Source: http://en.wikipedia.org/w/index.php?title=File:Diagram_of_a_social_network.jpg License: Public Domain Contributors: Katharinewillis. 
File:WorldWideWebAroundWikipedia.png Source: http://en.wikipedia.org/w/index.php?title=File:WorldWideWebAroundWikipedia.png License: GNU Free Documentation License Contributors: User:Chris 73 File:Kencf0618FacebookNetwork.jpg Source: http://en.wikipedia.org/w/index.php?title=File:Kencf0618FacebookNetwork.jpg License: Creative Commons Attribution-Sharealike 3.0 Contributors: User:Kencf0618 File:Cloud computing.svg Source: http://en.wikipedia.org/w/index.php?title=File:Cloud_computing.svg License: Creative Commons Attribution-Sharealike 3.0 Contributors: Sam Johnston File:Cloud computing layers.png Source: http://en.wikipedia.org/w/index.php?title=File:Cloud_computing_layers.png License: unknown Contributors: User:CommonsHelper2 Bot, User:Jan Luca, User:Magnus Manske File:Cloud computing types.svg Source: http://en.wikipedia.org/w/index.php?title=File:Cloud_computing_types.svg License: Creative Commons Attribution-Sharealike 3.0 Contributors: Sam Joton File:CloudComputingSampleArchitecture.svg Source: http://en.wikipedia.org/w/index.php?title=File:CloudComputingSampleArchitecture.svg License: Creative Commons Attribution-Share Alike Contributors: Sam Johnston, Australian Online Solutions Pty Ltd Image:Multitouch screen.svg Source: http://en.wikipedia.org/w/index.php?title=File:Multitouch_screen.svg License: Creative Commons Sharealike 1.0 Contributors: User:Willtron file:CERN-Stumpe Capacitance Touchscreen.jpg Source: http://en.wikipedia.org/w/index.php?title=File:CERN-Stumpe_Capacitance_Touchscreen.jpg License: Creative Commons Attribution Contributors: Maximilien Brice File:Apple iPad Event03.jpg Source: http://en.wikipedia.org/w/index.php?title=File:Apple_iPad_Event03.jpg License: Creative Commons Attribution 2.0 Contributors: matt buchanan File:Gestures Tap.png Source: http://en.wikipedia.org/w/index.php?title=File:Gestures_Tap.png License: Creative Commons Attribution-Sharealike 3.0 Contributors: User:GRPH3B18 File:Gestures Double Tap.png Source: http://en.wikipedia.org/w/index.php?title=File:Gestures_Double_Tap.png License: Creative Commons Attribution-Sharealike 3.0 Contributors: User:GRPH3B18 File:Gestures Long Press.png Source: http://en.wikipedia.org/w/index.php?title=File:Gestures_Long_Press.png License: Creative Commons Attribution-Sharealike 3.0 Contributors: User:GRPH3B18 File:Gestures Scroll.png Source: http://en.wikipedia.org/w/index.php?title=File:Gestures_Scroll.png License: Creative Commons Attribution-Sharealike 3.0 Contributors: User:GRPH3B18 File:Gestures Pan.png Source: http://en.wikipedia.org/w/index.php?title=File:Gestures_Pan.png License: Creative Commons Attribution-Sharealike 3.0 Contributors: User:GRPH3B18 File:Gestures Flick.png Source: http://en.wikipedia.org/w/index.php?title=File:Gestures_Flick.png License: Creative Commons Attribution-Sharealike 3.0 Contributors: User:GRPH3B18 File:Gestures Two Finger Tap.png Source: http://en.wikipedia.org/w/index.php?title=File:Gestures_Two_Finger_Tap.png License: Creative Commons Attribution-Sharealike 3.0 Contributors: User:GRPH3B18 File:Gestures Two Finger Scroll.png Source: http://en.wikipedia.org/w/index.php?title=File:Gestures_Two_Finger_Scroll.png License: Creative Commons Attribution-Sharealike 3.0 Contributors: User:GRPH3B18 File:Gestures Pinch.png Source: http://en.wikipedia.org/w/index.php?title=File:Gestures_Pinch.png License: Creative Commons Attribution-Sharealike 3.0 Contributors: User:GRPH3B18 File:Gestures Unpinch.png Source: http://en.wikipedia.org/w/index.php?title=File:Gestures_Unpinch.png License: 
Creative Commons Attribution-Sharealike 3.0 Contributors: User:GRPH3B18 File:Gestures Rotate.png Source: http://en.wikipedia.org/w/index.php?title=File:Gestures_Rotate.png License: Creative Commons Attribution-Sharealike 3.0 Contributors: User:GRPH3B18

License

Creative Commons Attribution-Share Alike 3.0 Unported: http://creativecommons.org/licenses/by-sa/3.0/