Data Mining Industry Report

Section 1: General information about the data mining industry

Professionals in the data mining industry are in demand now and will be for years to come, simply because data mining is one of the primary building blocks of the customer relationship management revolution. Employment prospects will be particularly attractive if you combine the statistical techniques you learned in school with the "data detective" skills that can only come from extensive, in-the-trenches experience.

The data mining analyst is the person who understands the information contained in the data and can evaluate whether the output of the analytical or mining stages truly makes sense in the specific business domain. The proper use of predictive models must be carefully integrated into the actual business processes so the models can be properly evaluated and updated. The various data mining tools have certain strengths and certain weaknesses. The tool and its use must be properly matched to the expertise of the user and that person’s objectives in using the tool.

Different levels of analysis:

• Artificial neural networks: Non-linear predictive models that learn through training and resemble biological neural networks in structure.
• Genetic algorithms: Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of natural evolution.
• Decision trees: Tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset. Specific decision tree methods include Classification and Regression Trees (CART) and Chi Square Automatic Interaction Detection (CHAID). CART and CHAID are decision tree techniques used for classification of a dataset. They provide a set of rules that you can apply to a new (unclassified) dataset to predict which records will have a given outcome. CART segments a dataset by creating 2-way splits, while CHAID segments using chi square tests to create multi-way splits. CART typically requires less data preparation than CHAID.
• Nearest neighbor method: A technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset (where k ≥ 1). Sometimes called the k-nearest neighbor technique.
• Rule induction: The extraction of useful if-then rules from data based on statistical significance.
• Data visualization: The visual interpretation of complex relationships in multidimensional data. Graphics tools are used to illustrate data relationships.
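To make the nearest neighbor method concrete, here is a minimal sketch in Python. It assumes Euclidean distance and a simple majority vote, which is only one way to combine the classes of the k nearest records; the function name `knn_classify` and the customer records are hypothetical and do not come from any particular tool.

```python
from collections import Counter
import math

def knn_classify(history, record, k=3):
    """Classify `record` by majority vote among its k nearest labelled records."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Sort the historical records by Euclidean distance to the new record.
    by_distance = sorted(history, key=lambda item: math.dist(item[0], record))
    # Count the class labels of the k closest records and return the most common.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical history: (age, purchases per year) -> responded to a past offer?
history = [
    ((25, 2), "no"),
    ((30, 3), "no"),
    ((35, 4), "no"),
    ((45, 12), "yes"),
    ((50, 15), "yes"),
    ((55, 11), "yes"),
]

prediction = knn_classify(history, (48, 13), k=3)
print(prediction)
```

In practice the fields would usually be rescaled first so that one field with a large numeric range does not dominate the distance.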

There are two basic kinds of data miners.

One has just enough programming proficiency to execute the statistical steps, or procedures, required for analytical projects. The other is able to manipulate data in complex and sophisticated ways. Consider, for example, a predictive modeling project where the data miner wishes to create a derived field to act as a potential predictor variable and where the analysis file must be manipulated in a complex way to achieve this end.

A data miner with weak programming skills either will have to forgo this variable or depend on others for assistance. A data miner with strong skills, however, will meet this challenge with ease.
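As a rough illustration of what creating a derived field can involve, the sketch below collapses a transaction file into one record per customer and derives three candidate predictor variables. The file layout, field names, as-of date and the function `derive_customer_fields` are all hypothetical, chosen only for the example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction file: (customer_id, purchase_date, amount).
transactions = [
    ("C001", date(2011, 1, 15), 40.0),
    ("C001", date(2011, 3, 2), 60.0),
    ("C002", date(2010, 11, 20), 25.0),
    ("C001", date(2011, 5, 30), 80.0),
    ("C002", date(2011, 2, 10), 35.0),
]

def derive_customer_fields(rows, as_of):
    """Collapse transaction rows into one record per customer with derived fields."""
    grouped = defaultdict(list)
    for customer_id, purchased_on, amount in rows:
        grouped[customer_id].append((purchased_on, amount))

    derived = {}
    for customer_id, purchases in grouped.items():
        last_purchase = max(day for day, _ in purchases)
        total_spend = sum(amount for _, amount in purchases)
        derived[customer_id] = {
            # Recency: days between the most recent purchase and the as-of date.
            "days_since_last_purchase": (as_of - last_purchase).days,
            "order_count": len(purchases),
            "average_order_value": total_spend / len(purchases),
        }
    return derived

customer_fields = derive_customer_fields(transactions, as_of=date(2011, 6, 30))
for customer_id, fields in sorted(customer_fields.items()):
    print(customer_id, fields)
```

The same reshaping could be done in SAS or SQL; what matters is being able to move from one row per transaction to one row per customer.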

Often, analysts with weak programming skills work at companies with large statistical staffs, proprietary data mining systems and rigid processes. Some analysts equate data mining with pushing buttons in the prescribed sequence indicated by the company manual. They have little appreciation of the "eureka moments" that occur when laborious digging unearths a paradigm-shifting fact or market segment.

You do not want to become one of these individuals.

Section 2: Factors to consider which will influence your career

2.1 The size of the company

At a small company, you will have a greater effect on the organization. However, opportunities for growth might be more limited. Also, there may be fewer chances to latch on to experienced mentors who will push you to achieve your potential.

2.2 Your ability to communicate effectively

Effective communication and an understanding of direct marketing are both essential. As an ambitious data miner, you must develop an appreciation of how the results of analytical projects are leveraged by marketers and fit into the company’s overall strategy. The astute analyst soon will realize that it takes much more than just statistics and programming virtuosity to break into the elite of the industry.

It will be critical to think of yourself as a quantitatively grounded direct marketer rather than as just a technician. As you evolve into a well-rounded business professional, you must develop the ability to communicate clearly and concisely. This will allow you to work effectively with experienced professionals in marketing, sales and business development, many of whom will have MBAs and more years of experience than you have.

Do not be discouraged if you find it difficult to master the business and communications side of direct marketing. It is understandable because you are accustomed to focusing on numbers and code.

Section 3: Types of career paths for data mining analysts

Typically, data miners follow one of two very different career paths.

Some remain on the technical side and eventually either move up the ranks to manage an entire staff of analysts or transition into the related field of data warehousing and processing. Others evolve into generalists and become senior-level marketers or strategy consultants.

Regardless of the path you choose, you must first establish a solid foundation of statistical techniques and programming skills. Strive to develop deep expertise in core analytical techniques such as clustering and predictive modeling. You should also work to become an excellent programmer in widely used analytical packages such as SAS. Below is a list of some types of data mining job opportunities.

3.1 Junior Data Mining Analyst

A junior analyst searches for the appropriate data and provides sufficient material to be analyzed. Junior analysts also prepare statistical diagrams and flowcharts.

Duties and responsibilities

1. Understand internal client business needs and how data mining fits into their success: work with team members to identify business issues and research needs; support business planning efforts.
2. Gain proficiency with data sources, tools (such as SAS), and methodologies: work with team members to identify research objectives and applicable methodologies. Gain hands-on experience with all appropriate data sources.
3. Implement projects and report findings: analyze database information in response to data analysis requests. Activities include, but are not limited to:
• Provide business intelligence through data analysis and standard reporting, both ongoing and ad hoc.
• Compile, interpret, and analyze data. Deliver work output, including written reports and presentations, to clients and team members as needed to help facilitate business decision making.
• Conduct other data analysis as requested, including business metrics, client profiles, behavior statistics, campaign results and other data mining research.
• Aid in data management by actively monitoring data sources to ensure completeness and accuracy.
• Support sales and marketing efforts by building processes to analyze, measure and track the results of those efforts.

Administrative
• Maintain project activity logs, regularly provide project updates, seek client and team member feedback about the effectiveness and quality of work output, and proactively participate in team meetings.

3.2 Senior Data Mining Analyst

A senior analyst has to consult and communicate with the client. He/she has to prepare the final report and present it.

Duties and responsibilities

Identify cost of business opportunities via data mining. Investigate outliers and research potential operational efficiencies.

Works in a highly collaborative role with the network contract manager, regional directors and provider relations to review cost of industry trends. Operates as the analytical leader for a geographic region to understand delivery of care, spend, etc.

Able to interview business experts to understand operational challenges and develop potential solutions. Conducts independent analysis of high complexity under minimal supervision and guidance.

Clearly communicates analytical results in presentations, abstracts, graphs or summaries with minor editing and input from manager to various levels of management.

Using SAS and other tools, designs, builds and enhances data systems and analysis methods so that they serve complex, high-level reporting requirements.

Critically reviews and revises existing analytical processes for efficiency.

Assesses business risks associated with analytical processes and data systems and develops strategies to mitigate risks.

Manages projects and develops work plans, which may involve coordinating the activities of lower-level analysts and collaborating with other analytical teams. Provides technical guidance to less experienced staff.

3.3 Data Analysis Project Manager

Duties include project organization and methodology. The project manager has to audit the reports and make sure that they fulfill the needs of the client.

Duties and responsibilities

• Lead and develop the full scale of project plans and their execution.
• Take responsibility for more than one cross-company project at a time.
• Define the project scope of work, financial plan, goals and deliverables.
• Manage all aspects of the project business plan and budget.
• Manage the operational, financial and technological aspects of projects based on timelines and work plans.
• Identify resource requirements, assign responsibilities and coordinate project staff, directly and indirectly, to ensure successful completion of the project.
• Track project deliveries using project management tools.
• Manage the design of the project documents used to monitor project performance and data stored.
• Report on project progress and communicate relevant information to superiors.
• Resolve, trace and escalate critical issues to minimize project risk factors.
• Prepare the QA procedure of the project.
• Direct, supervise, support and coordinate the project staff.
• Communicate intensively with clients, subcontractors and vendors to establish cordial and effective working relationships.
• Follow up with clients to verify satisfaction.

3.4 Data Warehouse Analyst

If you think you would like eventually to branch out beyond data mining but want to remain on the technical side of the business, you will be in an ideal position to transition into the exploding field of data warehousing and processing. As a successful analyst, you will have honed your logic and data detective skills. Also, you will have become an accomplished programmer.

Stories of data warehousing disasters circulate throughout this industry. You will be in an ideal position to avoid these pitfalls, and your employer and clients will recognize this.

An estimated 30,000 new jobs will be created in the direct marketing industry in the next five years. Many will be in data warehousing and processing. Do not be concerned if you need additional training to learn a new programming language, for example. There is such a shortage of experienced personnel that many employers will contribute toward, or even pay all of, your tuition.

Duties and responsibilities

• Understand the business users’ requirements for information and communicate them to the rest of the data warehouse team;
• Lead and conduct interviewing task;
• Lead interview documentation;
• Assist DW data analyst in analyzing existing reports and identifying iteration metrics;
• Lead preparation of data warehouse requirements document;
• Assist data analyst in mapping task;
• Analyze existing reports;
• Lead the identification and documentation of business metrics;
• Determine systems of record with the assistance of appropriate source system experts;
• Help identify potential sources of data for the data warehouse;
• Oversee testing of data acquisition processes and their implementation into production;
• Act as consultant to the ETL and front-end programmers.

Depending on how technical a business analyst is, he or she may also:
• Help data modelers prepare models, and
• Review models to ascertain that requirements are met.

3.5 Marketing Analyst

The marketing analyst is the data mining analyst who thoroughly understands, from a business perspective, what the client wants to accomplish and who assists in translating those business objectives into technical requirements to be used in the subsequent development of the data mining model(s).

Duties and responsibilities

As a marketing analyst, you'll gather consumer information and examine buying trends to create marketing plans for companies. One of your primary job duties in this career is to design surveys that identify consumer preferences and prospective markets for products. You'll conduct these surveys over the phone, on the Internet, through the mail and in focus groups. A marketing analyst usually oversees a team that helps with the surveying process. Once this research has been completed, you'll evaluate the feedback and organize it into reports for company use. You'll also advise your employer on what products will be most beneficial to produce, as well as on the design, distribution and promotion of these products. With the information you provide, your employer is able to target the most profitable markets in order to generate the maximum amount of revenue possible.

3.6 Data Mining Research Analyst

The Research Analytics team supports the Research department and executive management with strategic planning, business and market intelligence, and data mining/modeling services.

Duties and responsibilities

• Perform market data research and analysis to identify and resolve data issues using advanced data mining techniques.
• Develop proprietary data mining tools and applications.
• Develop predictive models.

3.7 Data Mining Analyst Consultant

As a data mining analyst consultant, you help your clients develop quantitative models and strategies that give them a sound process for approaching a problem. With these models and strategies, your clients can solve problems quickly and effectively, meet challenges, and capitalize on possible opportunities.

Duties and responsibilities

Modeling and Forecasting

Build predictive models using advanced statistical techniques, making use of the high-volume data available to the client (a bank, for example).

Business Strategy

Businesses are under tremendous pressure to generate revenues and increase profit. As a consultant, you can advise on a portfolio of initiatives that drives long-term performance.

Market Research solutions

Market research enables companies to identify opportunities for growth and understand how to position themselves most effectively in the market so as to take full advantage of those opportunities.

Miscellaneous solutions

Converting business data into highly effective and insightful reports and presentations.
• Customer design and support
• Customer reporting
• Development of automation tools

Section 4: Types of industries which hire data mining analysts

Data mining jobs are found primarily in the technology, finance, healthcare and pharmaceutical fields. They can range from social media and digital media analysts who focus on enterprise-level data mining to PhD-level quantitative analysts who mine millions of data units for investment banks and hedge funds. In the pharmaceutical industry, data mining analyst jobs tend to focus on statistical work involving analysis of pharmaceutical marketing information and sales.

The ability to effectively cultivate product development capabilities is an important skill for anyone considering data mining jobs in the technology field. Particularly in the Internet realm, jobs in data mining are highly valued. Professionals in these positions support the immense data mining work that must be in effect for a consumer-facing technology company to succeed.

Many search engine companies and technology companies that build on search and web crawler technologies, such as social media analytics firms, offer critical data mining job opportunities for those who are qualified. Work with web analytics platforms and with databases built using Structured Query Language (SQL) constitutes the bulk of the data mining jobs found in companies offering search engine technology.

Finance firms all over the world are also places where people with data mining skills are in increasingly high demand. In finance, data mining professionals, or quants as they are more commonly called, are charged with creating better ways to visualize prediction curves, valuation models, and other important aspects of financial quantitative analysis. The data mining job description for quants typically involves a great deal of programming work in C++, a popular computer programming language used in banking and enterprise information technology systems. In addition, a quantitative professional in finance or banking must have a strong grasp of Visual Basic for Applications (VBA) to use in Excel modeling and analysis.

Although healthcare is not thought of as a particularly quantitative field, healthcare firms and large pharmaceutical companies often present opportunities for data mining jobs. Using statistics to predict future sales or to calculate the amount of risk involved in a product launch or a branding change are some of the tasks that quantitative analysts in pharmaceutical companies are required to do. Usually a master's or PhD degree in mathematics, statistics, economics, or another quantitative discipline is required for this position.

Quantitative analysts for pharmaceutical firms provide much needed insight into which drugs perform best on the market. Their work can also demonstrate why one product performs better than another. Through analyzing product distribution channels as well as constructing financial valuation models, the pharmaceutical quantitative analyst is able to use data mining techniques to serve the firm's interests.

Below is a list of industries that employ data mining analysts.

• Casinos
• Communications
• Education
• Financial Services - especially banking, fraud detection, credit scoring, investment/stocks
• Government/Military/Security/Anti-terrorism
• Health Care Providers
• Health Insurance
• Hotels
• Insurance
• Life Sciences
• Manufacturing
• Media Advertising
• Oil & Gas
• Retail
• Social Policy/Survey Analysis
• Travel & Transportation
• Utilities
• Web usage mining

Section 5: Examples of the duties and responsibilities of data mining analysts

Data mining is primarily used today by companies with a strong consumer focus - retail, financial, communication, and marketing organizations. It enables these companies to determine relationships among "internal" factors such as price, product positioning, or staff skills, and "external" factors such as economic indicators, competition, and customer demographics. It also enables them to determine the impact on sales, customer satisfaction, and corporate profits. Finally, it enables them to "drill down" into summary information to view detailed transactional data.

With data mining, a retailer could use point-of-sale records of customer purchases to send targeted promotions based on an individual's purchase history. By mining demographic data from comment or warranty cards, the retailer could develop products and promotions to appeal to specific customer segments.

For example, Blockbuster Entertainment mines its video rental history database to recommend rentals to individual customers. American Express can suggest products to its cardholders based on analysis of their monthly expenditures.

WalMart is pioneering massive data mining to transform its supplier relationships. WalMart captures point-of-sale transactions from over 2,900 stores in 6 countries and continuously transmits this data to its massive 7.5 terabyte Teradata data warehouse. WalMart allows more than 3,500 suppliers to access data on their products and perform data analyses. These suppliers use this data to identify customer buying patterns at the store display level. They use this information to manage local store inventory and identify new merchandising opportunities. In 1995, WalMart computers processed over 1 million complex data queries.

The National Basketball Association (NBA) is exploring a data mining application that can be used in conjunction with image recordings of basketball games. The Advanced Scout software analyzes the movements of players to help coaches orchestrate plays and strategies. For example, an analysis of the play-by-play sheet of the game played between the New York Knicks and the Cleveland Cavaliers on January 6, 1995 reveals that when Mark Price played the Guard position, John Williams attempted four jump shots and made each one! Advanced Scout not only finds this pattern, but explains that it is interesting because it differs considerably from the average shooting percentage of 49.30% for the Cavaliers during that game.

By using the NBA universal clock, a coach can automatically bring up the video clips showing each of the jump shots attempted by Williams with Price on the floor, without needing to comb through hours of video footage. Those clips show a very successful pick-and-roll play in which Price draws the Knicks' defense and then finds Williams for an open jump shot.
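The comparison that makes the Williams pattern interesting can be reproduced with simple arithmetic. The sketch below is not a description of how Advanced Scout works internally, which this report does not cover; the function name `pattern_lift` is hypothetical, and the inputs are the figures quoted above.

```python
def pattern_lift(made, attempted, baseline_pct):
    """Return the pattern's shooting percentage and its gap from the baseline, in points."""
    pattern_pct = 100.0 * made / attempted
    return pattern_pct, pattern_pct - baseline_pct

# Figures quoted in the text: Williams made 4 of 4 jump shots with Price at guard,
# against a team average of 49.30% for that game.
pattern_pct, gap = pattern_lift(made=4, attempted=4, baseline_pct=49.30)
print(f"{pattern_pct:.2f}% vs 49.30% baseline: {gap:+.2f} points")
```

With only four attempts the sample is small, which is one reason a coach would want to review the video clips before drawing conclusions.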

These general forms illustrate what data mining can do.

Anomaly detection: In a large data set it is possible to get a picture of what the data tends to look like in a typical case. Statistics can be used to determine if something is notably different from this pattern. For instance, the IRS could model typical tax returns and use anomaly detection to identify specific returns that differ from this for review and audit.
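One simple way to decide that a value is "notably different" is a modified z-score built on the median and the median absolute deviation (MAD). The sketch below uses that approach as an assumption; it is not a description of any tax agency's method, and the deduction amounts, the 3.5 threshold and the function name `flag_anomalies` are hypothetical.

```python
import statistics

def flag_anomalies(values, threshold=3.5):
    """Return values whose modified z-score (median/MAD based) exceeds the threshold."""
    median = statistics.median(values)
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        return []  # No spread to measure against; nothing can be flagged.
    return [v for v in values if abs(0.6745 * (v - median) / mad) > threshold]

# Hypothetical deduction amounts claimed on otherwise similar returns, in dollars.
deductions = [4200, 3900, 4500, 4100, 4300, 3800, 4400, 19500]
flagged = flag_anomalies(deductions)
print(flagged)
```

The median-based score is used here because a single extreme value inflates the ordinary mean and standard deviation, which can hide the very record you are looking for.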

Association learning: This is the type of data mining that drives the Amazon recommendation system. For instance, this might reveal that customers who bought a cocktail shaker and a cocktail recipe book also often buy martini glasses. These types of findings are often used for targeting coupons/deals or advertising. Similarly, this form of data mining (albeit a quite complex version) is behind Netflix movie recommendations.

Cluster detection: One type of pattern recognition that is particularly useful is recognizing distinct clusters or sub-categories within the data. Without data mining, an analyst would have to look at the data and decide on a set of categories which they believe captures the relevant distinctions between apparent groups in the data. This would risk missing important categories. With data mining it is possible to let the data itself determine the groups.

This is one of the black-box types of algorithm that are hard to understand. But in a simple example - again with purchasing behavior - we can imagine that the purchasing habits of different hobbyists would look quite different from each other: gardeners, fishermen and model airplane enthusiasts would all be quite distinct. Machine learning algorithms can detect all of the different subgroups within a dataset that differ significantly from each other.
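The hobbyist example can be sketched with k-means, one common clustering technique chosen here purely for illustration. Two assumptions are worth stating: the number of groups (k = 3) is supplied by the analyst rather than discovered, and the centroids are seeded with the first k records so the run is repeatable. The spending figures and the function name `k_means` are hypothetical.

```python
import math

def k_means(points, k, iterations=20):
    """Group points into k clusters with plain k-means (Lloyd's algorithm)."""
    # Assumption: seed the centroids with the first k points to keep the run deterministic.
    centroids = [points[i] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to the nearest centroid.
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster; keep it in place if the cluster is empty.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return clusters

# Hypothetical annual spend per customer: (garden supplies, fishing gear, model kits).
customers = [
    (500, 20, 0), (10, 450, 5), (0, 10, 300),
    (450, 40, 10), (30, 400, 0), (15, 0, 350),
    (520, 10, 5), (50, 380, 10), (5, 20, 280),
]

clusters = k_means(customers, k=3)
for group in clusters:
    print(group)
```

No labels such as "gardener" are given to the algorithm; the groups emerge from the spending patterns, and it is up to the analyst to interpret what each group represents.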

Classification: If an existing structure is already known, data mining can be used to classify new cases into these pre-determined categories. Learning from a large set of pre-classified examples, algorithms can detect persistent systemic differences between items in each group and apply these rules to new classification problems. Spam filters are a great example of this - large sets of emails that have been identified as spam have enabled filters to notice differences in word usage between legitimate and spam messages, and classify incoming messages according to these rules with a high degree of accuracy.
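A word-usage classifier of this kind can be sketched with naive Bayes, a technique assumed here for illustration rather than one the report attributes to any real spam filter. The six training messages are hypothetical and far too few for real use, and the function names `train` and `classify` are chosen for this example.

```python
import math
from collections import Counter

def train(messages):
    """Count word occurrences per class from (text, label) pairs."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    class_counts = Counter()
    for text, label in messages:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, class_counts

def classify(text, word_counts, class_counts):
    """Return the class with the higher log-probability under naive Bayes."""
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    total_messages = sum(class_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        # Start from the log prior, then add one log-likelihood per word.
        score = math.log(class_counts[label] / total_messages)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Laplace (add-one) smoothing keeps unseen words from zeroing the score.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical pre-classified examples ("ham" means legitimate mail).
training = [
    ("win free prize now", "spam"),
    ("free money offer now", "spam"),
    ("claim your free prize", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project report attached", "ham"),
    ("lunch on monday", "ham"),
]

word_counts, class_counts = train(training)
print(classify("free prize offer", word_counts, class_counts))
print(classify("monday project meeting", word_counts, class_counts))
```

The pre-classified examples play the role of the "large sets of emails that have been identified as spam"; the more of them there are, the more reliable the word statistics become.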

Regression: Data mining can be used to construct predictive models based on many variables. Facebook, for example, might be interested in predicting future engagement for a user based on past behavior. Factors like the amount of personal information shared, number of photos tagged, friend requests initiated or accepted, comments, likes etc. could all be included in such a model. Over time, this model could be honed to include or weight things differently as Facebook compares how the predictions differ from observed behavior. Ultimately these findings could be used to guide design in order to encourage more of the behaviors that seem to lead to increased engagement over time.
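The idea can be shown in miniature with ordinary least squares on a single predictor. This is a deliberate simplification of the many-variable models described above; the data are hypothetical, not real engagement figures, and the names `fit_line`, `comments` and `sessions` are chosen only for this sketch.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical users: comments posted last month vs. sessions this month.
comments = [0, 5, 10, 15, 20]
sessions = [4, 9, 13, 20, 24]

intercept, slope = fit_line(comments, sessions)

# Residuals show how the predictions differ from observed behavior.
residuals = [y - (intercept + slope * x) for x, y in zip(comments, sessions)]
predicted = intercept + slope * 12  # forecast for a user with 12 comments

print(f"sessions = {intercept:.2f} + {slope:.2f} * comments")
print(f"forecast for 12 comments: {predicted:.2f}")
```

Inspecting the residuals is the small-scale version of comparing predictions with observed behavior: large or systematic residuals suggest that a variable is missing or is weighted poorly.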

The patterns detected and structures revealed by descriptive data mining are then often applied to predict other aspects of the data. Amazon offers a useful example of how descriptive findings are used for prediction. The (hypothetical) association between cocktail shaker and martini glass purchases, for instance, could be used, along with many other similar associations, as part of a model predicting the likelihood that a particular user will make a particular purchase. This model could match all such associations with a user's purchasing history, and predict which products they are most likely to purchase. Amazon can then serve ads based on what that user is most likely to buy.
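The strength of an association like the cocktail shaker example is commonly summarized by two numbers: support (how often the items appear together across all baskets) and confidence (how often the consequent appears when the antecedent does). The sketch below computes both for hypothetical baskets; it illustrates the measures only and does not describe Amazon's actual system. The function name `rule_metrics` is an assumption.

```python
def rule_metrics(baskets, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    # Baskets containing every antecedent item, then those that also contain the consequent.
    with_antecedent = [b for b in baskets if antecedent <= b]
    with_both = [b for b in with_antecedent if consequent <= b]
    support = len(with_both) / len(baskets)
    confidence = len(with_both) / len(with_antecedent) if with_antecedent else 0.0
    return support, confidence

# Hypothetical purchase histories, one set of items per customer.
baskets = [
    {"cocktail shaker", "recipe book", "martini glasses"},
    {"cocktail shaker", "recipe book", "martini glasses", "ice bucket"},
    {"cocktail shaker", "recipe book"},
    {"recipe book", "novel"},
    {"martini glasses", "coasters"},
    {"cocktail shaker", "recipe book", "martini glasses"},
    {"novel"},
    {"ice bucket", "coasters"},
]

support, confidence = rule_metrics(
    baskets, {"cocktail shaker", "recipe book"}, {"martini glasses"}
)
print(f"support={support:.3f}, confidence={confidence:.2f}")
```

A predictive model of the kind described above could treat the confidence of each matching rule as one input when ranking the products a user is most likely to buy.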

Section 6: Career Outlook for data mining analysts

The U.S. Bureau of Labor Statistics (BLS) projected that data communication analyst positions would increase 53% from 2008 to 2018. Data analysts in business settings are reported to have had a median salary of $55,053 in 2011, according to Salary.com. The website also reports that experience as a data analyst might slightly increase annual salaries. With less than one year of experience, data analysts had a salary of $51,681-$55,532 in 2011, while four or more years of experience raised the salary range to $53,704-$56,764 per year. With experience and specialization, a data mining analyst can earn $120,000 per year.

The use of competitive intelligence by data scientists can pay big dividends to businesses that invest in these services. A May 2011 study by McKinsey Global Institute suggests that retailers analyzing large data sets to their fullest could increase operating margins by 60 percent and the health care industry could reduce annual costs by 8 percent or $200 billion.

However, the study also warns there is a significant shortage of qualified workers to analyze these data sets adequately. According to the report, a shortfall of about 140,000 to 190,000 individuals with analytical expertise is projected by 2018. The study also predicts a need for an additional 1.5 million managers and analysts by that same date to fully engage the true potential of the currently available data.

While it may be conventional wisdom that data is growing exponentially, the actual amount of that growth can be staggering to consider. A 2003 study conducted by the University of California, Berkeley found that worldwide information production increased 30 percent each year from 1999 until 2002. In 2010, then-Google CEO Eric Schmidt turned heads at the Techonomy Conference when he said people currently create as much data every two days as was previously created in all of history up to 2003.

Section 7: Consultant

You may opt to freelance or establish a consulting firm that delivers analytical and statistical solutions and expert consulting services to identify new insights, drive strategic decisions, and create measurable results for your clients. To show clients your value, consider providing a ‘proof-of-concept’. This helps ensure that the most important business objectives are being met and that the investment in data mining is made in the most cost-effective manner. The proof-of-concept period is used to answer the following questions.

· What is data mining?
· What do the data mining tools really do?
· How should my raw operational data be structured to be compatible with data mining?
· Which data mining tool, or suite of tools, is best suited to meet my business objectives?
· Is there hard evidence that can be generated by mining my data that shows that my company should invest in data mining and deploy it in my business?

The proof-of-concept process is as follows.

1. Define the business objectives. Start with at most three objectives in order to focus the study.
2. Identify the corporate data that contains information related to those business objectives.
3. Create a sample data set that contains all relevant information.
4. Identify a domain expert(s) to work with a group experienced in knowledge discovery systems.
5. Install the data in a facility that has the computational power to handle the size of the data being examined and which has a suite of knowledge discovery tools suitable to meet the business objectives.
6. The domain expert(s) works with the data mining expert(s) to determine which data mining tool(s) are best suited to meet the business objectives.
7. Extract relationships and patterns from the business data set.
8. The domain expert(s) works with the data mining expert(s) to determine which patterns and relationships are really relevant to the business objectives. Experience in the CDI on a number of data mining projects has shown that surprising results may occur at this stage. Underlying assumptions about how a business works, how the market works, or how the customer behaves may change.
9. Develop models to predict how data mining results can assist in meeting business objectives.
10. The company then decides what level of investment to make in data mining consistent with their business plan.

At this point, a company will have significant evidence of how data mining can be employed to achieve a competitive advantage, training in data mining, and the skeleton of a development plan for using data mining in a cost-effective manner.

Section 8: See Emerging Trends & Opportunities PDF

Section 9: Data Mining Analyst Job Boards

1. KDnuggets http://www.kdnuggets.com/jobs/
2. Analytic Talent http://www.analytictalent.com/
3. iCrunchData http://www.icrunchdata.com/Statistician-Jobs.aspx
4. StatVista http://www.statvista.com/jobs/default.asp

Summary:

There's now an intellectual consensus in business that the only way to run an enterprise is to use analytics with data scientists to find opportunities. Because of the immense opportunity for strategic insight buried in all that data, corporations now have an unlimited demand for people with a background in quantitative analysis.
