Bayesian Count Data Modeling for Finding Technological Sustainability

sustainability

Article Bayesian Count Data Modeling for Finding Technological Sustainability

Sunghae Jun Department of Big Data and Statistics, Cheongju University, Chungbuk 28503, Korea; [email protected]; Tel.: +82-10-7745-5677 Received: 11 August 2018; Accepted: 7 September 2018; Published: 8 September 2018

Abstract: Technology developments change society, and society demands new and innovative technology developments. We analyze technology to understand society and technology itself. Much research related to technology analysis has been introduced in various fields. Most of it has been on patent analysis. This is because detailed and accurate results of research and development are patented. In this paper, we study a new patent analysis method based on the count data model and Bayesian regression analysis. Using the count data model, we analyzed the technological keywords extracted from the collected patent documents. We used the prior distribution of Bayesian statistics to reflect the experience and knowledge of the relevant technological experts in the analysis model. Moreover, we applied the proposed model to find sustainable technologies. Finding and developing sustainable technologies is an important activity for companies and research institutes to maintain their technological competitiveness. To illustrate how our modeling could be applied to real domains, we carried out a case study using the patent documents related to artificial intelligence.

Keywords: count data; Bayesian regression; technological sustainability; Poisson probability distribution; patent analysis

1. Introduction A company with sustainable technology can maintain technological competitiveness in the marketplace [1,2]. Most companies have tried to find their sustainable areas for technological innovation and new product development. Therefore, sustainable technology is an important issue in the management of technology (MOT) [3]. Many academics, research institutes, and companies have studied sustainable technologies. Recently, Kim et al. (2018) published a statistical method for sustainable technology analysis [4]. They considered Bayesian inference and social network analysis for the proposed method and applied their research to the technology domain related to artificial intelligence (AI). They also used the IPC (international patent classification) codes extracted from patent documents as input data for sustainable technology analysis. The IPC is a hierarchical system of technologies for the classification of patents [5]. For example, the IPC code G06F represents electric digital data processing technology [6]. In general, IPC codes cover a wide range of technologies. Thus, it is difficult for us to grasp the detailed technological structure of a specific technology field. In order to overcome this problem, we propose a technology analysis method using patent keywords. The keywords are extracted from patent documents related to specific technology through text mining techniques [7]. Therefore, the technology keyword can represent a more detailed description of a specific technology field than the IPC code for sustainable technology analysis. In addition, we propose a statistical modeling using Bayesian count data analysis for understanding sustainability of a given technology domain. The count of event is the number of times an event occurs [8]. In this paper, each patent keyword is an event, and we analyzed the count data of patent keywords. We considered the Poisson probability distribution for the proposed statistical patent analysis model, because the count

Sustainability 2018, 10, 3220; doi:10.3390/su10093220 www.mdpi.com/journal/sustainability Sustainability 2018, 10, 3220 2 of 12 data of patent keywords are nonnegative integer values [9]. We also combined the Poisson count model with Bayesian regression analysis to build Bayesian count data modeling for finding technological sustainability. In this paper, we looked for sustainable technologies from the predictive keywords that have the most significant impact on the dependent keywords that represent the target technology. For example, if AI (artificial intelligence) is the target technology, ‘artificial’ and ‘intelligence’ become response keywords (variables), and other keywords, except ‘artificial’ and ‘intelligence’, such as ‘learning’, ‘visual’, and ‘language’, are predictive variables. Therefore, we carried out Bayesian count data modeling to find technological sustainability. To show the validity of our modeling, we performed a case study using the patent documents related to AI. The remainder of this paper is organized as follows: In Section2, we show the research background related to our study. We explain the proposed modeling for finding technological sustainability in Section3. The next section illustrates the result of our case study. In the Conclusions section, we conclude our research and describe our future work related to this paper.

2. Sustainable Technology and Patent Analysis In this paper, sustainable technology means a technology that can sustain a company’s technological competitiveness. Therefore, it is important for a company to know what its sustainable technologies are. In the MOT field, many companies and institutes have tried to find their sustainable technologies. Many research results on sustainable technology analysis have been published in academia [1,2,4,10,11]. Sustainable technology analysis has been made using diverse analytical methods in various technological fields. However, research on this field is still lacking. Companies and research institutions demand more sophisticated and feasible methodologies for sustainable technology analysis. Most methods for sustainable technology analysis rely on patent analysis. This is because patents contain accurate and vast results on the research and development of technology. This is due to the exclusive right to use the technology granted to the inventor. For this reason, many studies related to patent analysis have been made [12–16]. In patent analysis, we should transform the collected patent documents into structured data consisting of keywords, IPC codes, citations, etc. for statistical analysis. In the preprocessing process of patent data, we used the R data language and its ‘tm’ package [7,17]. Using the structured patent data, we performed the proposed modeling for finding technological sustainability.

3. Finding Technological Sustainability Using Bayesian Count Data Modeling To find the sustainability of a given technology field, we need the structured data applied to Bayesian count data modeling. Therefore, to prepare the structured data, we extracted the patent keywords from the collected patent documents using text mining techniques in Figure1. In our text mining process, first of all, we needed to determine the target technology for statistical analysis. Next, we used a retrieved equation to collect the patent documents related to target technology from the patent databases in the world. It is impossible to analyze the searched patent document data directly, because the data are not suitable for inputting data for statistical analysis, including Bayesian count data modeling. Thus, we tried to make structured data for statistical analysis. The first step in creating structured data is to create a corpus collecting and interpreting text documents. Based on the created corpus, we transformed the patent documents into plain texts, and cleaned the text data by eliminating whitespace, removing step-words (“and”, “for”, “in”, “is”, etc.), stemming, and filtering. Finally, we made the document–keyword matrix as structured data for Bayesian count data modeling. The matrix consisted of patents (rows) and keywords (columns), and its elements were the frequency (count) values of keywords occurring in each patent document. Next, we built a methodology using the Bayesian count data model to analyze the structured data for finding sustainable technology. Sustainability 2018, 10, x FOR PEER REVIEW 2 of 11 considered the Poisson probability distribution for the proposed statistical patent analysis model, because the count data of patent keywords are nonnegative integer values [9]. We also combined the Poisson count model with Bayesian regression analysis to build Bayesian count data modeling for finding technological sustainability. In this paper, we looked for sustainable technologies from the predictive keywords that have the most significant impact on the dependent keywords that represent the target technology. For example, if AI (artificial intelligence) is the target technology, ‘artificial’ and ‘intelligence’ become response keywords (variables), and other keywords, except ‘artificial’ and ‘intelligence’, such as ‘learning’, ‘visual’, and ‘language’, are predictive variables. Therefore, we carried out Bayesian count data modeling to find technological sustainability. To show the validity of our modeling, we performed a case study using the patent documents related to AI. The remainder of this paper is organized as follows: In Section 2, we show the research background related to our study. We explain the proposed modeling for finding technological sustainability in Section 3. The next section illustrates the result of our case study. In the Conclusions section, we conclude our research and describe our future work related to this paper.

2. Sustainable Technology and Patent Analysis In this paper, sustainable technology means a technology that can sustain a company's technological competitiveness. Therefore, it is important for a company to know what its sustainable technologies are. In the MOT field, many companies and institutes have tried to find their sustainable technologies. Many research results on sustainable technology analysis have been published in academia [1,2,4,10,11]. Sustainable technology analysis has been made using diverse analytical methods in various technological fields. However, research on this field is still lacking. Companies and research institutions demand more sophisticated and feasible methodologies for sustainable technology analysis. Most methods for sustainable technology analysis rely on patent analysis. This is because patents contain accurate and vast results on the research and development of technology. This is due to the exclusive right to use the technology granted to the inventor. For this reason, many studies related to patent analysis have been made [12–16]. In patent analysis, we should transform the collected patent documents into structured data consisting of keywords, IPC codes, citations, etc. for statistical analysis. In the preprocessing process of patent data, we used the R data language and its ‘tm’ package [7,17]. Using the structured patent data, we performed the proposed modeling for finding technological sustainability.

3. Finding Technological Sustainability Using Bayesian Count Data Modeling To find the sustainability of a given technology field, we need the structured data applied to SustainabilityBayesian count2018, 10 data, 3220 modeling. Therefore, to prepare the structured data, we extracted the patent3 of 12 keywords from the collected patent documents using text mining techniques in Figure 1.

Figure 1. Text mining process for creating structure data for Bayesian count data modeling.

In our research, we considered Bayesian count data modeling with a Poisson distribution for ﬁnding technological sustainability. The Poisson probability distribution is the most popular model for count data. If the random variable Y is distributed to Poisson with parameter λ, its distribution is deﬁned as follows [9]: Y ∼ Poisson(λ).

e−λλy P(Y = y|λ) = , y = 0, 1, 2, . . . (1) y! where the expectation E(Y) and variance of Y are equal to parameter λ. In addition, the likelihood function of Poisson random variable Y is as follows [18,19]:

n 1 yi −λ ∑ yi −nλ f(y1, y2,..., yn|λ) = ∏ λ e ∝ λ e (2) i=1 yi!

Equation (4) is in the form of λce−dλ, and this is Gamma distribution with parameters c and d. We can therefore select Gamma distribution as conjugate prior for the Poisson parameter. In this paper, the frequency of each patent keyword extracted from patent document data is a Poisson random variable with parameter λi as follows:

frequency of keywordi ∼ Poisson(λi), i = 1, 2, . . . , m (3) where m is the number of all keywords. In our modeling, we deﬁne the frequency (count) of ith keyword, the occurring keyword as yi, and represent the data set as follows [9]:

yi ∼ Poisson(λi),E(yi) = Var(yi) = λi, i = 1, 2, . . . , m (4)

Using this data set, we performed the generalized linear model (GLM) with Poisson probability distribution, no predictors, and log link function as follows:

log (λi) = β0 or λi = exp (β0), i = 1, 2, . . . , m (5)

We determined the response and predictor variables (keywords) by the results of the GLM. The parameter from this constant GLM model with Poisson distribution is similar to the mean of Poisson random variable. In this paper, therefore, we used the Poisson mean value by MLE (maximum likelihood estimator) instead of the constant GLM model. In addition, we considered the Sustainability 2018, 10, 3220 4 of 12 full GLM model using all predictor variables as well as the constant model. In our study, we used variables with large coefficient values as response variables and those with small coefficient values as explanatory variables. In Figure2, Bayesian modeling has increasingly been used in diverse data analysis areas, such as regression and classification. This is one of two approaches to statistics. We started Bayesian count dataSustainability modeling 2018, from 10, x FOR the PEER following REVIEW expression [18]: 4 of 11

PP((yy||θθ))PP((θθ)) P(Pθ(|θy|)y) == (6)(6) PP((yy)) wherewhereθ θis is the the model model parameter, parameter, and andy is 푦 the is response the response variable variable to be predicted.to be predicted.P(θ) and P(θP)( θand|y) areP(θ the|y) priorare the and prior posterior and probabilities posterior probabilities of parameter ofθ respectively. parameter Pθ(y |respectively.θ) represents theP(y| likelihoodθ) represents function the oflikelihoody given θ function, respectively. of 푦 Additionally,given θ, respectively. P(y) is calculated Additionally by the, followingP(y) is calculated integration by [19 the]: following integration [19]: Z P(y) = P(y|θ)P(θ)dθ (7) P(y) = ∫ P(y|θ)P(θ)dθ (7)

FigureFigure 2.2. DeterminationDetermination ofof responseresponse variablevariable byby generalizedgeneralized linearlinear modelmodel (GLM)(GLM) results.results.

UsingUsing Bayesian Bayesian modeling, modeling, we we determined determine thed modelthe model parameter parameter of posterior of posterior distribution. distribution. Because weBecause were interestedwe were interested in the mean in of the the mean parameters, of the weparameters had to select, we aha priord to distribution select a prior to begindistribution Bayesian to modeling.begin Bayesian In general, modeling. noninformative In general, n oroninformative informative priorsor informative can be used priors for can the priorbe used distribution for the prior in Bayesiandistribution modeling in Bayesian [9,19 ].modeling In our research, [9,19]. In we our used research, the informative we used the prior informative to get the prior updated to get result the forupdated the parameter result for estimation. the parameter However, estimation. we needed However to carry, we outneeded Bayesian to carry computing out Bayesian such ascomputing Markov Chainsuch as Monte Markov Carlo Chain (MCMC) Monte forCarlo using (MCMC) the informative for using priorthe informative [19]. To alleviate prior [19]. the To computational alleviate the burden,computational we were burden, able to we use were conjugate able to prior. use conjugate prior. InIn this this paper, paper, we we denoted denoted thethe response response variable variable as as YY,, and and the the explanatory explanatory variables variables as as (x x x ) p (푥11,, 푥22,...,, … , 푥푝p) , wherewhere is p theis number the number of explanatory of explanatory variables. variables. We selected W thee select keywordsed the representing keywords targetrepresenting technology target as response technology variables, as response and the variable remainings, and keywords, the remaining except the keywords response, except variables, the wereresponse used variables as explanatory, were used variables. as explanatory For example, variables. if our For target example, technology if our target was AI, technology we selected was theAI, keywordswe selected ‘Artiﬁcial’ the keywords and ‘Intelligence’ ‘Artificial’ and as ‘Intelligence’ response variables. as response For Bayesian variables. count For Bayesian data modeling, count data we constructedmodeling, we the construct Poisson regressioned the Poisson model regression with Gamma model distribution with Gamma as prior. distribution The Poisson as regression prior. The modelPoisson is regression deﬁned as model follows is [ 9defined,20]: as follows [9,20]: 퐸(Y|푥) = exp(푥 훽 )exp(푥 훽 ) ⋯ exp(푥 훽 ) E(Y|x) = exp (x1β11)1exp (x22β22) ··· exp (x푝pβ푝 p ) (8)(8) where β is the regression parameter. Additionally, an informative Gamma prior for 휆 is as follows: 푒−푏휆휆푎−1푏푎 P(λ) = (9) Γ(푎) 푎 푎 where Γ(∙) is Gamma function, and E(휆) and Var(휆) are and , respectively. This expression 푏 푏2 is used for the likelihood in Bayesian count data modeling. Thus, using the likelihood and prior distributions, we show the posterior distribution as follows:

푒−푏휆휆푎−1푏푎 P(λ|Y) = exp(−푛휆 + 푛푌̅ log 휆 − ∑푛 log 푌 !) × (10) 푖=1 푖 Γ(푎) Sustainability 2018, 10, 3220 5 of 12 where β is the regression parameter. Additionally, an informative Gamma prior for λ is as follows:

e−bλλa−1ba P(λ) = (9) Γ(a)

(·) ( ) ( ) a a where Γ is Gamma function, and E λ and Var λ are b and b2 , respectively. This expression is used for the likelihood in Bayesian count data modeling. Thus, using the likelihood and prior distributions, we show the posterior distribution as follows:

n e−bλλa−1ba P(λ|Y) = exp (−nλ + nY log λ − log Y !) × (10) ∑ i ( ) i=1 Γ a

SustainabilityWe can ignore 2018, 10, x the FOR terms PEER REVIEW not involving λ, so we yield the proportional result of posterior5 of 11 distribution as follows: We can ignore the terms not involving 휆, so we yield the proportional result of posterior distribution as follows: P(λ|Y) ∝ exp(−(n + b)λ + (nY + a − 1) log λ) (11) P(λ|Y) ∝ 푒푥푝(−(푛 + 푏)휆 + (푛푌̅ + 푎 − 1)log 휆) (11) This expression represents the kernel of the Gamma distribution with parameters (n + b) and (nY + a).T Inhis addition,expression by represents the characteristic the kernel of of Gamma the Gamma distribution, distribution the with posterior parameters mean and(푛 + variance푏) and of ̅ (푛n푌Y+a푎). In addition,nY+a by the characteristic of Gamma distribution, the posterior| mean and variance λ are n+b andn푌̅+푎 2 , respectively.n푌̅+푎 In the Bayesian Poisson regression case, Yi xi is distributed Poisson of 휆 are (n +andb) , respectively. In the Bayesian Poisson regression case, 푌 |푥 is distributed 푛+푏 0(푛+푏)2 푖 푖 with mean λi = exp(xi β), where β is the parameter vector of Poisson regression. In our research, (Y|x) Poisson with mean 휆 = exp(푥 ′훽), where 훽 is the parameter vector of Poisson regression. In our is shown as (response keyword푖 |푖 explanatory keywords). Therefore, we get the Bayesian count data research, (Y|x) is shown as (response keyword | explanatory keywords). Therefore, we get the modeling as follows: Bayesian count data modeling as follows: Yi|xi ∼ Poisson(λi) (12) 푌 |푥 ~푃표푖푠푠표푛(휆 ) (12) 0 푖 푖 푖 2 λi = exp(xi β) + αi, αi ∼ Normal(0, σ ) (13) 휆 = 푒푥푝(푥 ′훽) + 훼 , 훼 ~푁표푟푚푎푙(0, 휎2) (13) P푖(β) : Gamma푖 prior푖 푖 density of β (14) P(β): Gamma prior densityn of β (14) λ|y1, y2,..., yn ∼ Gamma(c + ∑ yi, d + n) (15) 푛i=1 λ|푦1, 푦2, … , 푦푛~퐺푎푚푚푎(푐 + ∑푖=1 푦푖 , 푑 + 푛) (15) where c and d are deﬁned in Equation (4). The posterior distribution is computed by multiplying given where c and d are defined in Equation (4). The posterior distribution is computed by multiplying priorgiven and likelihoodprior and likelihood based on based data, andon data, the calculatedand the calculated posterior posterior is used is as used a prior as a at prior the nextat the modeling. next Usingmodeling. this modeling, Using this we modeling, built a technology we built a technology structure tostructure understand to understand the target thetechnology target technology from the viewpointfrom the of viewpoint sustainability of sustainability in Figure3 .in Figure 3.

FigureFigure 3. 3.Technology Technology structure structure ofof Bayesian count count data data modeling. modeling.

with parameter 휆푖 . From the final result of Bayesian count data modeling, we got the regression coefficients β between response and explanatory variables (keywords). Using β, we built a technological structure of a target domain for sustainable technology management. This description is related to Figure 3. That is, we selected the keywords with high impact using the Poisson parameter, which is mean value of Poisson random variable (keyword). Next, we extracted the final predictor keywords by comparing probability value (p-value) of the regression parameters. Therefore, we selected the meaningful predictor variables in two steps. First, the predictor variables with large Poisson parameters were selected. Next, the predictor variables with relatively large regression coefficients were chosen. In order to make the scale of variables equal, we standardized variables Sustainability 2018, 10, 3220 6 of 12

In our modeling, all keywords except response were used as explanatory keywords. In Figure3, the keyword at the beginning of the arrow indicates the explanatory variable, and the keyword at the end of the arrow indicates the response variable. Additionally, each keyword is distributed to Poisson with parameter λi. From the final result of Bayesian count data modeling, we got the regression coefficients β between response and explanatory variables (keywords). Using β, we built a technological structure of a target domain for sustainable technology management. This description is related to Figure3. That is, we selected the keywords with high impact using the Poisson parameter, which is mean value of Poisson random variable (keyword). Next, we extracted the final predictor keywords by comparing probability value (p-value) of the regression parameters. Therefore, we selected the meaningful predictor variables in two steps. First, the predictor variables with large PoissonSustainability parameters 2018, 10, x FOR were PEER selected. REVIEW Next, the predictor variables with relatively large regression6 of 11 coefficients were chosen. In order to make the scale of variables equal, we standardized variables beforebefore proceedingproceeding withwith BayesianBayesian countcount datadata modeling.modeling. BayesianBayesian countcount datadata modelingmodeling isis basedbased onon thethe followingfollowing conceptconcept inin FigureFigure4 4..

FigureFigure 4.4. Concept of Bayesian count datadata modeling.modeling.

InIn thisthis paper,paper, wewe triedtried toto combinecombine thethe expert’sexpert’s subjectivesubjective knowledgeknowledge andand objectiveobjective resultresult fromfrom patentpatent data data analysis. analysis. That That is, the is, prior the prior represented represent theed domain the domain knowledge knowledge of experts, of and experts, the likelihood and the denotedlikelihood the denote objectived the data objective based data on patent based documents. on patent documents. The result ofThe multiplying result of multiplying prior and likelihood prior and islikeliho posterior;od is weposterior used this; we as use a predictived this as a modelpredictive for ﬁndingmodel for technological finding technological sustainability. sustainability. Therefore, weTherefore, used the w priore used probability the prior probability distribution distribution to reﬂect theto reflect experience the experience and knowledge and knowledge of the relevant of the technologyrelevant technology experts in the experts model. in The the collected model. patent The collected data were paten representedt data were by the represented likelihood function. by the Welikelihood got the posterior function. distribution We got the by posterior multiplying distribution the prior distribution by multiplying and likelihoodthe prior distributionfunction. Finally, and theselikelihood Bayesian function. probability Finally, distributions these Bayesian were probability applied to distributions the count data were regression applied to for the the count Bayesian data countregression data model. for the UsingBayesia thisn approach, count data we model. expect Using an improved this approach, performance we expectof patent an technology improved analysisperformance for sustainable of patent technology.technology Ouranalysis model for is sustainable applicable totechnology. multivariate Our response model vectoris applicable as well asto univariatemultivariate response response variables. vector as well as univariate response variables.

푦푚×1 = 푥푚×(푝+1)훽(푝+1)×1 + 휀푚×1 (16) ym×1 = xm×(p+1)β(p+1)×1 + εm×1 (16) This is because, depending on the technology field, there may be more than one response This is because, depending on the technology field, there may be more than one response variable. variable. For example, in the AI technology field, the response vector is defined as follows: For example, in the AI technology field, the response vector is defined as follows: 푦1 푎푟푡푖푓푖푐푖푎푙 (푦!) = ( ) ! (17) y 2 푖푛푡푒푙푙푖푔푒푛푐푒arti f icial 1 = (17) We illustrate how this research couldy2 be appliedintelligence to practical problems through a case study in the next section. We illustrate how this research could be applied to practical problems through a case study in the next4. Case section. Study

4. CaseTo Studyshow how this research could be applied to a practical problem, we performed a case study using the patent documents related to artificial intelligence (AI) technology. We collected the patents To show how this research could be applied to a practical problem, we performed a case study applied and registered by 2016 from the WIPSON [21]. The total number of collected and valid using the patent documents related to artiﬁcial intelligence (AI) technology. We collected the patents patents was 11,973 cases. First of all, we consulted experts on AI and extracted the keywords related applied and registered by 2016 from the WIPSON [21]. The total number of collected and valid patents to AI from the collected patent document data [22]. Next, using text mining techniques, we built was 11,973 cases. First of all, we consulted experts on AI and extracted the keywords related to AI structured patent data for performing our case study [7,17]. Our structured patent data are shown in from the collected patent document data [22]. Next, using text mining techniques, we built structured Figure 5. patent data for performing our case study [7,17]. Our structured patent data are shown in Figure5.

Figure 5. Structured patent data.

The row and column of this data matrix are patents and keywords related to AI, and each cell of this matrix represents the occurrence frequency of each keyword on a patent. In this case study, we selected the keywords ‘Artificial’ and ‘Intelligence’ as dependent variables. The remaining keywords, except dependent variables, were used as predictor variables. That is, the structured data contain the keywords in Table 1 and the keywords ‘Artificial’ and ‘Intelligence’, and each element of the Sustainability 2018, 10, x FOR PEER REVIEW 6 of 11

before proceeding with Bayesian count data modeling. Bayesian count data modeling is based on the following concept in Figure 4.

Figure 4. Concept of Bayesian count data modeling.

In this paper, we tried to combine the expert’s subjective knowledge and objective result from patent data analysis. That is, the prior represented the domain knowledge of experts, and the likelihood denoted the objective data based on patent documents. The result of multiplying prior and likelihood is posterior; we used this as a predictive model for finding technological sustainability. Therefore, we used the prior probability distribution to reflect the experience and knowledge of the relevant technology experts in the model. The collected patent data were represented by the likelihood function. We got the posterior distribution by multiplying the prior distribution and likelihood function. Finally, these Bayesian probability distributions were applied to the count data regression for the Bayesian count data model. Using this approach, we expect an improved performance of patent technology analysis for sustainable technology. Our model is applicable to multivariate response vector as well as univariate response variables.

푦푚×1 = 푥푚×(푝+1)훽(푝+1)×1 + 휀푚×1 (16) This is because, depending on the technology field, there may be more than one response variable. For example, in the AI technology field, the response vector is defined as follows:

푦1 푎푟푡푖푓푖푐푖푎푙 ( ) = ( ) (17) 푦2 푖푛푡푒푙푙푖푔푒푛푐푒 We illustrate how this research could be applied to practical problems through a case study in the next section.

4. Case Study To show how this research could be applied to a practical problem, we performed a case study using the patent documents related to artificial intelligence (AI) technology. We collected the patents applied and registered by 2016 from the WIPSON [21]. The total number of collected and valid patents was 11,973 cases. First of all, we consulted experts on AI and extracted the keywords related to AI from the collected patent document data [22]. Next, using text mining techniques, we built Sustainabilitystructured 2018patent, 10, 3220data for performing our case study [7,17]. Our structured patent data are shown7 of 12in Figure 5.

Figure 5. Structured patent data.

TheThe row and column of this data matrix are patents and keywords related to AI, and each cell of this matrix represents the occurrence frequency of each keyword on a patent. In this case study, we selected the keywords ‘Artificial’ ‘Artificial’ and ‘Intelligence’ as dependent variables. The The remaining remaining keywords keywords,, except dependent variables,variables, were used as predictor variables. ThatThat is, the st structuredructured data contain the keywords in in Table Table1 and 1 and the keywordsthe keywords ‘Artificial’ ‘Artificial’ and ‘Intelligence’, and ‘Intelligence’, and each and element each of element the structured of the data represents the occurrence frequency of each keyword on the AI patents. Based on the structured data, we classified the AI technology as follows:

Table 1. Hierarchical structure of AI technology.

Sub-Technology Patent Keywords Learning Learning, inference, ontology, representation, analysis, data Behavior Behavior, awareness, situation, sentiment, mind, spatial, collaborative Language, natural, understanding, morphological, dialogue, sentence, corpus, Language voice, speech, conversation, interface Vision Vision, ﬁgure, object, video, image Neuro Neuro, network, computing, feedback, pattern, recognition, cognitive

In Table1, we divided AI technology to ﬁve subtechnologies as follow: Learning, behavior, language, vision, and neuro. In addition, we showed the patent keywords belonging to each subtechnology. We used this technology tree to retrieve the AI patents and analyze them. First, we estimated the Poisson parameters for the patent keywords using a maximum likelihood estimator (MLE) by the frequency values of the keywords. Table2 shows the estimates of Poisson parameters for all patent keywords.

Table 2. Estimates of Poisson parameters for all keywords.

Keyword λ Keyword λ Keyword λ analysis 0.0287 image 0.2745 pattern 0.1506 awareness 0.0008 inference 0.0019 recognition 0.0211 behavior 0.0258 interface 0.017 representation 0.0062 cognitive 0.0001 language 0.0426 sentence 0.0103 collaborative 0.0001 learning 0.0058 sentiment 0.0016 computing 0.001 mind 0.0004 situation 0.006 conversation 0.0035 morphological 0.0004 spatial 0.0966 corpus 0.0123 natural 0.0011 speech 0.5527 data 0.8316 network 0.2668 understanding 0.0005 dialogue 0.0009 neuro 0.0014 video 0.5114 feedback 0.0287 object 1.351 vision 0.0123 ﬁgure 0.0007 ontology 0.0044 voice 0.0153

We can compare the relative frequency between the patent keywords in Table2. These estimates are the MLE (maximum likelihood estimate) for Poisson parameters of AI keywords [18]. Figure6 illustrates the MLEs for the Poisson parameters of all patent keywords. Sustainability 2018, 10, 3220 8 of 12

Sustainability 2018, 10, x FOR PEER REVIEW 8 of 11

Sustainability 2018, 10, x FOR PEER REVIEW 9 of 11

ontology −2.0103 −0.0028 −1.0065 pattern 0.1729 0.0155 0.0942 recognition −0.003 0.0214 0.0092 representation −1.247 −0.008 −0.6275

sentence −3.2219 −0.0033 −1.6126 sentimentFigureFigure 6.6. MLEsMLEs− 1.695of all patent patent−0.0006 keywords. keywords. −0.8478 situation −2.1258 −0.0019 −1.0638 In thisIn this figure, figure, we we found found the the MLEs MLEsspatial ofof ‘object’,‘object’,0.1356 ‘data’, ‘data’, ‘speech’,0.0144 ‘speech’, ‘video’, ‘video’,0.075 ‘image’, ‘image’, ‘network’, ‘network’, ‘pattern’, ‘pattern’, ‘spatial’,‘spatial’, ‘language’, ‘language’, ‘analysis’, ‘analysis’, feedback’, speech feedback’, ‘behavior’, ‘behavior’,−1.2228 ‘recognition’,− 0.0131 ‘recognition’, − ‘interface’,0.618 ‘interface’, ‘voice’, ‘voice’, ‘corpus’, ‘corpus’, ‘vision’, understanding −0.546 −0.0037 −0.2749 and‘vision’, ‘learning’ and are ‘learning’ relatively arelarger relatively than larger other than keywords. other keywords. Using the Using results the in result Figures in6, weFigure determined 6, we determined the patent keywords videothat affect AI−0.5061 technology. −0.0072 A keyword −0.2567 with a larger MLE value will the patent keywords that affect AIvision technology. −3.2688 A keyword −0.0032 with − a1.636 larger MLE value will have more have more impact on AI technology. We also carried out Bayesian count data modeling on the impact on AI technology. We also carriedvoice out− Bayesian3.6855 − count0.0027 data−1.8441 modeling on the structured patent structured patent data matrix. Table 3 shows the results of the modeling. data matrix. Table3 shows the results of the modeling. We performed the Bayesian regression models by Gaussian, as well as Poisson distributions. To We performed the BayesianTable regression 3. Model models parameters by and Gaussian, weight. as well as Poisson distributions. To comparecompare the the weights weights ofof allall keywordskeywords equally, equally, we we standardized standardized the the scale scale of each of each variable. variable. In addition,In addition , the weight in Table 3 is the averageKeyword value Poissonof Poisson Gaussian and Gaussian Weight parameters. In this paper, we the weight in Table3 is the average value of Poisson and Gaussian parameters. In this paper, we selected selected the keywords with largeranalysis weight 0.1023 values for0.0176 finding technological0.0599 sustainability in AI the keywords with larger weight values for finding technological sustainability in AI technology. Using technology. Using the result awareness of Table 3, we− 0.663 show the− 0.0013 patent keyword−0.3321 ranking that influences AI the resulttechnology of Table in Figure3, we show7. thebehavior patent keyword0.1062 ranking 0.0103 that influences0.0582 AI technology in Figure7. cognitive −0.2173 −0.0016 −0.1094 collaborative −0.1971 0 −0.0985 computing 0.0867 0.2221 0.1544 conversation −1.73 −0.0009 −0.8654 corpus −4.5122 −0.0009 −2.2566 data −0.2869 −0.0065 −0.1467 dialogue −0.882 −0.0007 −0.4414 feedback −5.6046 −0.0046 −2.8046 figure −0.632 −0.0002 −0.3161 image −16.8687 −0.0159 −8.4423 inference −1.1521 −0.0013 −0.5767 interface 0.167 0.0284 0.0977 language −0.2675 0.002 −0.1327 learning 0.1336 0.05 0.0918 mind −0.5577 −0.0004 −0.2791 morphological −0.5222 −0.0012 −0.2617 natural −0.8831 −0.0016 −0.4424 network −0.3976 −0.0099 −0.2038 neuro −1.5059 −0.0007 −0.7533 object −0.3076 −0.0085 −0.158 FigureFigure 7. Patent7. Patent keyword keyword ranking ranking that influences influences artificia artificiall intelligence. intelligence.

The keywords in the 1st Group have a greater impact on AI technology than in the 2nd Group or the 3rd Group. Using the experimental results of this paper, we made the following technological structure for sustainable AI technology. In this paper, we divided the AI technology into five subtechnologies (learning, behavior, language, vision, neuro) in Table 1. In Figure 8, each subtechnology contains the keywords that can represent its technology. For example, the keywords of learning, inference, ontology, representation, analysis, and data describe the learning technology for AI. Each keyword is represented by bold or underlined types, depending on its importance from the results of Tables 2 and 3. The keywords in bold type are those that have an impact on AI technology from the results of Poisson MLEs. Additionally, the keywords with underlined lettering affect AI technology, based on the Bayesian regression model. Therefore, we knew that the subtechnologies related to learning, behavior, language, and neuro influence the sustainability of AI technology. However. we found that the subtechnology of vision has a relatively small effect on the technological sustainability of AI compared to other subtechnologies. In this paper, we concluded that the four technologies related to Sustainability 2018, 10, 3220 9 of 12

Table 3. Model parameters and weight.

Keyword Poisson Gaussian Weight analysis 0.1023 0.0176 0.0599 awareness −0.663 −0.0013 −0.3321 behavior 0.1062 0.0103 0.0582 cognitive −0.2173 −0.0016 −0.1094 collaborative −0.1971 0 −0.0985 computing 0.0867 0.2221 0.1544 conversation −1.73 −0.0009 −0.8654 corpus −4.5122 −0.0009 −2.2566 data −0.2869 −0.0065 −0.1467 dialogue −0.882 −0.0007 −0.4414 feedback −5.6046 −0.0046 −2.8046 ﬁgure −0.632 −0.0002 −0.3161 image −16.8687 −0.0159 −8.4423 inference −1.1521 −0.0013 −0.5767 interface 0.167 0.0284 0.0977 language −0.2675 0.002 −0.1327 learning 0.1336 0.05 0.0918 mind −0.5577 −0.0004 −0.2791 morphological −0.5222 −0.0012 −0.2617 natural −0.8831 −0.0016 −0.4424 network −0.3976 −0.0099 −0.2038 neuro −1.5059 −0.0007 −0.7533 object −0.3076 −0.0085 −0.158 ontology −2.0103 −0.0028 −1.0065 pattern 0.1729 0.0155 0.0942 recognition −0.003 0.0214 0.0092 representation −1.247 −0.008 −0.6275 sentence −3.2219 −0.0033 −1.6126 sentiment −1.695 −0.0006 −0.8478 situation −2.1258 −0.0019 −1.0638 spatial 0.1356 0.0144 0.075 speech −1.2228 −0.0131 −0.618 understanding −0.546 −0.0037 −0.2749 video −0.5061 −0.0072 −0.2567 vision −3.2688 −0.0032 −1.636 voice −3.6855 −0.0027 −1.8441

In this paper, we divided the AI technology into ﬁve subtechnologies (learning, behavior, language, vision, neuro) in Table1. In Figure8, each subtechnology contains the keywords that can represent its technology. For example, the keywords of learning, inference, ontology, representation, analysis, and data describe the learning technology for AI. Each keyword is represented by bold or underlined types, depending on its importance from the results of Tables2 and3. The keywords in bold type are those that have an impact on AI technology from the results of Poisson MLEs. Additionally, the keywords with underlined lettering affect AI technology, based on the Bayesian regression model. Therefore, we knew that the subtechnologies related to learning, behavior, language, and neuro inﬂuence the sustainability of AI technology. However. we found that the subtechnology of vision has a relatively small effect on the technological sustainability of AI compared to other subtechnologies. In this paper, we concluded that the four technologies related to ‘learning data’, ‘behavior spatial’, ‘language interface’, and ‘recognition pattern’ are important to continue the sustainability for AI technology. Sustainability 2018, 10, x FOR PEER REVIEW 10 of 11

‘learningSustainability data’,2018 ‘behavior, 10, 3220 spatial’, ‘language interface’, and ‘recognition pattern’ are important10 of 12 to continue the sustainability for AI technology.

FigureFigure 8. 8.TechnologicalTechnological structure structure forfor sustainable sustainable AI. AI.

5. Discussi5. Discussionon Since most statistical analyses are performed based on given or observed data, there is no factor Since most statistical analyses are performed based on given or observed data, there is no factor that can reflect subjective opinions of experts with domain knowledge of data in the model building that canprocess. reflect However, subjective in the opinions Bayesian of count experts model, with the domain analyst knowledge can determine of thedata prior in the distribution model building and process.various However, attempts in canthe beBayesian made basedcount on model, the expert’s the analyst experience can determine in determining the prior this distribution distribution. and variousThrough attempts this, can we can be made build a based final onmodel the with expert high’s experience performance. in determining Using Bayesian this count distribution. data Throughmodeling this, based we can on the build prior a distributionfinal model from with domain high performance.experts, we can Using generalize Bayesian our results count to data modelingvarious based technological on the fields,prior such distribution as bio, materials, from domain smart cars, experts, etc. we can generalize our results to various technologicalOur research contributesfields, such to as quantitative bio, materials, approaches smart car to findings, etc. sustainable technology areas in diverseOur research technologies. contributes This paper to quantitative focuses on objective approaches analysis to finding through patentsustainable data analysis technology compared areas in diversewith technol existingogies. methods This based paper on focuses qualitative on objective methods. analysis Particularly, through the implication patent data of thisanalysis study compared is that withthe existing subjective method experiencess based ofon experts qualitative are added method to thes. Particularly, quantitative data the implication analysis. of this study is that the subjective6. Conclusions experiences of experts are added to the quantitative data analysis.

6. ConclusionsWe proposed Bayesian count data modeling to find technological sustainability. To know the sustainable technology in a given technological field is very important to improve the technological competitionWe proposed of aBayesian company count and adata nation. modeling Much researchto find technological has been conducted sustainability. to find sustainable To know the sustainabletechnologies. technology Most of in it wasa given carried technological out using statistical field is modelsvery important that do not to consider improve the the characteristic technological competitionof count ofdata a fromcompany patent and documents. a nation. However, Much research most structured has been patent conducted data have to find a count sustainable data technologies.structure. Most In order of it to was solve carried this discrepancy out using statistical problem, Bayesian models that count do data not modeling consider isthe proposed characteristic in of countthis study. data Infrom addition, patent we documents. applied the domainHowever knowledge, most structured of experts topatent prior data distribution have a of count model data parameters in the Bayesian regression model. To show the validity of proposed modeling and illustrate structure. In order to solve this discrepancy problem, Bayesian count data modeling is proposed in how our approach could be applied to practical problems, we carried out a case study using patent this study. In addition, we applied the domain knowledge of experts to prior distribution of model documents related to AI technology. In the case study, we found the subtechnologies for the sustainable parametertechnologiess in the of AI.Bayesian They were regression learning from model. data, To spatial show behavior, the validity interface of of proposed language, and modeling pattern and illustraterecognition how our technologies. approach could Therefore, be applied we should to practical concentrate problem our researchs, we carried and development out a case onstudy these using patentsubtechnologies documents related to keep to the AI sustainability technology. of In AI the technology. case study, we found the subtechnologies for the sustainableOur technolo researchgies can beof appliedAI. They to were the research learning and from development-related data, spatial behavior, planning interface of companies of language, and and research pattern institutes. recognition Additionally, technologies. this research Therefore, will contribute we should to diverse concentrate technological our fields research as well and developmentas AL technology. on these Insubtechnologies this research, we to considered keep the sustainability only the patent of keywords AI technology. extracted from patent documentsOur research for patentcan be technology applied to analysis the research using statistical and development modeling. Our-related future planning work will of use companies more and research institutes. Additionally, this research will contribute to diverse technological fields as well as AL technology. In this research, we considered only the patent keywords extracted from patent documents for patent technology analysis using statistical modeling. Our future work will use more diverse elements as well as keywords, such as citations and claims, to find sustainable technologies for specific technological domains. Sustainability 2018, 10, 3220 11 of 12 diverse elements as well as keywords, such as citations and claims, to find sustainable technologies for specific technological domains.

Author Contributions: S.J. designed this study and collected the data for the experiment. He also preprocessed the data and selected valid patents and analyzed the data to show the validity of the study, and wrote the paper and performed all the research steps. Funding: This research received no external funding. Conﬂicts of Interest: The author declared no conﬂict of interest.

References

1. Park, S.; Jun, S. Statistical Technology Analysis for Competitive Sustainability of Three Dimensional Printing. Sustainability 2017, 9, 1142. [CrossRef] 2. Choi, J.; Jun, S.; Park, S. A patent analysis for sustainable technology management. Sustainability 2016, 8, 688. [CrossRef] 3. Roper, A.T.; Cunningham, S.W.; Porter, A.L.; Mason, T.W.; Rossini, F.A. Banks, Forecasting and Management of Technology; John Wiley & Sons: Hoboken, NJ, USA, 2011. 4. Kim, J.; Jun, S.; Jang, D.; Park, S. Sustainable Technology Analysis of Artificial Intelligence Using Bayesian and Social Network Models. Sustainability 2018, 10, 115. [CrossRef] 5. WIPO. World Intellectual Property Organization. Available online: www.wipo.org (accessed on 15 May 2018). 6. WIPO IPC. International Patent Classification (IPC), World Intellectual Property Organization. Available online: http://www.wipo.int/classifications/ipc/en (accessed on 15 May 2018). 7. Feinerer, I.; Hornik, K. Package ‘tm’ Ver. 0.7-5, Text Mining Package, CRAN of R Project. 2018. Available online: https://cran.r-project.org/web/packages/tm/tm.pdf (accessed on 1 August 2018). 8. Hilbe, J.M. Modeling Count Data; Cambridge University Press: Cambridge, UK, 2014. 9. Cameron, A.C.; Trivedi, P.K. Regression Analysis of Count Data; Cambridge University Press: New York, NY, USA, 2013. 10. Kim, S.; Jang, D.; Jun, S.; Park, S. A novel forecasting methodology for sustainable management of defense technology. Sustainability 2015, 7, 16720–16736. [CrossRef] 11. Park, S.; Lee, S.; Jun, S. A network analysis model for selecting sustainable technology. Sustainability 2015, 7, 13126–13141. [CrossRef] 12. Jun, S.; Park, S. Examining technological innovation of Apple using patent analysis. Ind. Manag. Data Syst. 2013, 113, 890–907. [CrossRef] 13. Kim, J.; Jun, S. Graphical causal inference and copula regression model for Apple keywords by text mining. Adv. Eng. Inform. 2015, 29, 918–929. [CrossRef] 14. Jun, S.; Park, S. Examining technological competition between BMW and Hyundai in the Korean car market. Technol. Anal. Strateg. Manag. 2016, 28, 156–175. [CrossRef] 15. Grimaldi, M.; Cricelli, L.; Rogo, F. Valuating and analyzing the patent portfolio: The patent portfolio value index. Eur. J. Innov. Manag. 2018, 21, 174–205. [CrossRef] 16. Kim, J.; Jun, S.; Jang, D.; Park, S. An Integrated Social Network Mining for Product-based Technology Analysis of Apple. Ind. Manag. Data Syst. 2017, 117, 2417–2430. [CrossRef] 17. R Development Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria; Available online: http://www.R-project.org (accessed on 1 April 2018). 18. Ross, S.M. Introduction to Probability and Statistics for Engineers and Scientists, 4th ed.; Elsevier: Seoul, Korea, 2012. 19. Gelman, A.; Carlin, J.B.; Stern, H.S.; Dunson, D.B.; Vehtari, A.; Rubin, D.B. Bayesian Data Analysis, 3rd ed.; Chapman & Hall/CRC Press: Boca Raton, FL, USA, 2013. 20. Hogg, R.V.; McKean, J.M.; Craig, A.T. Introduction to Mathematical Statistics, 8th ed.; Pearson: Upper Saddle River, NJ, USA, 2018. Sustainability 2018, 10, 3220 12 of 12

21. WIPSON. WIPS Corporation. Available online: http://www.wipson.com (accessed on 15 May 2018). 22. KISTA, Korea Intellectual Property Strategy Agency. Available online: http://www.kista.or.kr (accessed on 16 June 2018).

© 2018 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).