An Approach for Predicting Hype Cycle Based on Machine Learning
Total Page:16
File Type:pdf, Size:1020Kb
An Approach for Predicting Hype Cycle Based on Machine Learning Zhijun Ren Shuo Xu Institute of Scientific and Technical Information of China, Institute of Scientific and Technical Information of China, Beijing 100038, China Beijing 100038, China The China Patent Information Center, the State [email protected] Intellectual Property Office, Beijing 100088, China [email protected] Xiaodong Qiao Hongqi Han Institute of Scientific and Technical Information of China, Institute of Scientific and Technical Information of China, Beijing 100038, China Beijing 100038, China [email protected] [email protected] Kai Zhang The China Patent Information Center, the State Intellectual Property Office, Beijing 100088, China [email protected] ABSTRACT Some scholars believe that hype means advertised in publicity Analyzing mass information and supporting insight based on and exaggerated propaganda[1], and the best measure is the analysis results are very important work but it needs much expectation, which means people’s expectation for technology effort and time. Therefore, in this paper, we propose an innovation. Enterprises should use hype cycle to target the approach for predicting hype cycle based on machine learning emerging technologies, and use the concept of digital business for effective, systematic, and objective information analysis transformation to predict future business trends. and future forecasting of science and IT field. Additionally, Gartner believes that hype cycle is a qualitative we execute a comparative evaluation between the suggested decision-making tool, like other management methods, it model and Hype Cycle for Big Data, 2013 for validating the relies mainly on the judgment of experts. And to complete the suggested model and generally used for information analysis hype cycle assessment and prediction of a set of technologies and forecasting. in a certain field, we need to use a variety of evaluation methods. Other scholars have explored how to measure Keywords expectations for innovation, and quantitative indicators are mainly the number of participants[2], the number or the ratio Hype Cycle; Data Mining; Technology Predict; KNN of the technological innovation documents[3], patent statistical data[4] and the search flow of Google and other search engines[5]. When using the network measurement 1. OVERVIEW method, other tools may need to be supplemented: USPTO Hype Cycle is a conceptual model widely used by Gartner, patents, news reports of Google news archive and the official Inc., which can reveal the basic laws of the evolution of website market share of a certain product, etc., these methods technological innovation chain and is a powerful tool to get mainly create the hype cycle by adopting artificial methods. the overall grasp of technological innovation development The InSciTe[6,7] developed by Korea Institute of Science and trend, to get the objective assessment of the maturity of Technology Information adopts the decision tree and technological innovation, to make the reasonable choice on statistical feature analysis method, based on Gartner research, innovative intervention time and to seek the technological to provide the technical life cycle diagram and adopts the innovation late-mover advantage. emerging technologies discovering model to provide the key technologies on the life cycle diagram. In the particular judgment process, there might be problems of force compliance in certain stage. For example, a technology is during the plateau of productivity between the year from 2000 Copyright © 2015 for the individual papers by the papers' authors. to 2005, but its data between 2006 to 2007 is in line with the Copying permitted for private and academic purposes. stage of slope of enlightenment, so the force compliance is This volume is published and copyrighted by its editors. needed to judge the technology life cycle of the last two years[1]. Related reference didn’t include how specific Published at Ceur-ws.org technology term coordinates are obtained. Proceedings of the Second International Workshop on Patent Mining and its Applications (IPAMIN). May 27–28, 2015, Beijing, China. Therefore, the paper uses papers and patent information to realize emerging technology discovering method by machine learning; then acquires some features by feature selection and types of organizations make technology deployed. uses machine learning algorithm to classify and locate the Data from Internet between 2001 and 2014 about Gartner coordinates, and produces the hype cycle according to the Hype Cycle is manually collected on terminology, stage and prediction. coordinate to be used as training data. The paper is structured as follows. The prediction frame of technology maturity is described in Chapter 2. The learning method and model of technology maturity is introduced in 3.2 Feature Calculation Chapter 3. The prediction method and model of technology Technical features are the foundation of technology life cycle maturity is illustrated in Chapter 4. Experiment and analysis discovering model. The technical features calculation uses the are conducted in Chapter 5 and conclusion and forecasting are papers and patents as data, and uses paper made in the last chapter. S(Pp) {Pp , Pp ,, Pp } index 1 2 n , patent index S(Pt) {Pt , Pt ,, Pt } 2. TECHNOLOGY MATURITY 1 2 n and combined index of PREDICTION FRAME S(Ppt) {Ppt , Ppt ,, Ppt } paper and patent 1 2 n as The model of approach for predicting hype cycle based on calculation objects. It can study the interaction and exclusion machine learning includes the learning part and the prediction between papers and patents, and explore the rule of part. The learning model mainly relates to data training. The development between science and technology. data acquiring and learning includes acquiring training data, data annotation and feature calculation. The prediction model Paper index includes paper growth means identification and discovering method of emerging AN k AN k1 technology in certain field and process to discover technology Pp Pp PpGrowthRate k1 system innovation by producing hype cycle and predict AN Pp maturity and discover an emerging technology, as well as to rate , paper relative make sure the input ratio of partial innovation and overall k k1 NPp NPp innovation. It includes the term selection, feature calculation, PpRelativeGrowthRate k1 stage classification, technology position and visible NPp information module. The hype cycle prediction module is growth rate , paper k show in Fig. 1. A Pp Pp AuthorRate AAk1 author rate Pp , paper author growth Ak Ak1 Pp Pp Pp AuthorGrowthRate Ak1 rate Pp , paper institution I k Pp Pp InstitutionRate AI k1 rate Pp , and paper institution growth I k I k1 Pp Pp Pp InstitutionGrowthRate I k1 rate Pp . Patent index includes patent growth AN k AN k1 Pt Pt Pt GrowthRate AN k1 rate Pt , patent relative N k N k1 Pt Pt Pt RelativeGrowthRate N k1 growth rate Pt , inventor k I Pt Pt InventorRate k1 AI Pt Fig.1 Technology maturity prediction frame rate , inventor growth AI k AI k1 Pt Pt Pt 3. TECHNOLOGY MATURITY InventorGrowthRate AI k1 LEARNING rate Pt , application Ak Pt Pt 3.1 Data Acquirement ApplicantRate AAk1 rate Pt , and application growth Since 1995, Gartner began to pay attention to the hype and AAk AAk1 disillusionment along with every appearance of new Pt Pt Pt technologies and innovations, to track the trends of the ApplicantGrowthRate AAk1 technology life cycle, to study the common pattern between rate Pt them, in order to provide guidance of when and where all Combined index of paper and patent includes paper and patent extracted from patents and papers, TLCD model can relative growth determine the stage of the emerging technology adopting the rate five stages of classification of TSKNN algorithm, and the (N k N k1) (N k N k1) specific stages refer to the Gartner's Hype cycle, which Ppt Pp Pp Pt Pt includes Technology Trigger, Peak of Inflated, Expectations RelativeGrowthRate N k1 N k1 Trough of Disillusionment, Slope of Enlightenment, Plateau Pt Pp , of Productivity. k NPp SKNN algorithm is an improvement of KNN algorithm. KNN Ppt algorithm, by computing the distance between the training RatioRate N k paper and patent ratio rate Pt , paper and point from training set and test point from test set, considers patent people growth rate the closest distance having the most similar feature and can be k k1 k k1 classified into the same group, obtaining test markers (A A ) (I I ) characteristic points and the same tag feature training points. Ppt Pp Pp Pt Pt PeopleGrowthRate Ak1 I k1 SKNN mainly considers the time sequence issue of Pp Pt , terminologies to be classified, so the data of next year need to paper and patent people ration rate be larger than the data of last stage, in order to avoid the force classification problem. The specific algorithm is as follows. Ak Ppt Pp (1) Redescribe training technology terminology and feature PeopleRatioRate I k vector, according to feature set. Pt , paper and patent institution growth rate (2) When the technology terminology feature vector reaches, k k1 k k1 18 features should be calculated respectively to establish (I I ) (A A ) feature vector according to age. Ppt Pp Pp Pt Pt InstitutionGrowthRate I k1 Ak1 (3) Select K technical terminologies which are most similar to Pp Pt , new technical year that is to be calculated from the training paper and patent institution ration rate technical terminology set following the formula below, I k Ppt Pp M InstitutionRatioRate k W ik W jk (1) A k 1 Pt Sim(ti ,t j ) . M M ( W 2 )( W 2 ) ik jk k 1 k 1 4. TECHNOLOGY MATURITY PREDICTION (4) Among K neighbors of new terminology, weight of each classification is calculated respectively as follows, 4.1 Terminology Extraction Technology terminology refers to terms used in a certain field, which means concepts, features or relationships in the field. In (2) this paper, terminology extraction is based on keywords of p(x,C j ) Sim(x,di )y(di ,C j ) papers and templates.