
Identifying AI talents among LinkedIn members: a machine learning approach

Thomas Roca, PhD, LinkedIn Economic Graph ([email protected])

June 7, 2019

Abstract

The debate around Artificial Intelligence (AI)-induced automation and the labor force is extremely vivid, but rarely based on strong empirical evidence. How fast is AI diffusing in industry and services? Is AI already impacting the job market? A few academic studies propose frameworks to estimate the impact of the fourth industrial revolution on the labor market (Frey and Osborne [7], OECD [3], etc.), but they reach no consensus, providing a wide range of outcomes. The public debate is fueled with guesstimates filling the vacuum left by academia. Indeed, data is scarce when it comes to Artificial Intelligence workers and the actual implementation of AI systems within companies. Tech enthusiasts argue that AI will empower workers, complementing their skills and allowing people to focus on analysis, critical thinking, ingenuity or human-to-human interactions. The truth is that we have no comprehensive statistics describing AI diffusion within companies and societies, and current estimates are based on qualitative data and perception surveys from astonishingly small samples. What's more, the European Union and governments across Europe are currently devising their own AI strategies, craving more objective data and insights about the AI skills needed and about the progress of, and hindrances to, AI diffusion. We suggest that LinkedIn insights can provide answers to some of these questions. By scrutinizing AI talents, their skills, the industries they work for, but also the vacancies advertised on our platform, we can provide the first building block of a more comprehensive piece of research to better inform the debate about the impact of AI on the labor market. Identifying AI specialists among the hundreds of petabytes of the LinkedIn database is a "needle in a haystack" type of problem. To distinguish AI specialists who actually implement AI from other members with a mere mention of AI in their profile, I built a text classifier using Scikit-learn and NLTK. The model was trained using AI-related job offer data from different countries. This paper describes the methodology I developed and the challenges I faced to transform LinkedIn big data into actionable statistics for policy makers. The study covers 44 countries¹ and a data-visualization portal allows making the best of these results.

1 Introduction

How can we identify specific profiles among the hundreds of millions of LinkedIn members? LinkedIn Economic Graph thrives on skills: around 50 thousand of them are listed by LinkedIn, and they constitute one of the main signals for identifying professions or trends. Artificial Intelligence (AI) skills, for example, can be used to track the diffusion of AI across industries [16]. But the noise can be loud around skills for which demand is high. Some users may add "trendy" skills to their profiles without having any work experience or training related to them. On the other hand, some people may work in the broad AI ecosystem (e.g. AI recruiters, AI sales representatives, etc.) without being the AI practitioners we are looking for. Searching for keywords in profile sections can therefore lead to mis-identification of certain profiles, especially for those related to a field rather than an occupation. This is the case for Artificial Intelligence.

In this paper, we propose a machine learning approach to identify such profiles, and suggest training a binary text classifier on job offers posted on the platform rather than on actual profiles. We suggest that this approach avoids manually labeling the training dataset, granted the assumption that job offers posted by recruiters are more "ideal-typical", or simply provide a more consistent triptych of "job title, job description, associated skills" than the ones that can be found among members' profiles.

1.1 Related work

Estimates of the AI worker population can be found on the web, but the methodologies used to construct those numbers are rarely detailed, or, when they are, their authors duly acknowledge that they are working assumptions based on existing, biased data sources. Chinese tech giant Tencent published the 2017 Global AI Talent White Paper [19], estimating the AI practitioner population at approximately 300,000 individuals [21]. In 2017, The New York Times, citing Jean-François Gagné of Element AI, used an estimate of approximately 10,000 AI experts worldwide [14]. This estimate is the one taken up by the European Commission's European Political Strategy Centre in its March 2018 Strategic Note The Age of Artificial Intelligence: Towards a European Strategy for Human-Centric Machines [5]. Since then, Element AI has published the Global AI Talent Report 2018 [8], in which the AI talent population estimate was updated to 30,000 AI experts, based not only on academic publications and conferences but also on crawling LinkedIn. Element AI clearly explains the limitations of its estimates and the biases of the data sources, for instance the over-representation of the English-speaking and Western world both in the LinkedIn population and in academic conferences.

Aware of those limitations, our research does not seek to provide an estimate of AI talents worldwide, although our machine learning model does come with an estimate for this population and a score associated with this prediction. Our objective is simply to identify the largest possible sample of AI talents.

On the demand side, literature exists analyzing the emergence of data science and Artificial Intelligence in the software and IT industry. Qualitative research such as that done by Kim, M. et al. in 2016 [10] helps in understanding the demand for those relatively new roles, the skills and background needed for such positions, but also the motivations of data scientists. Estimates of AI-related job offers are easier to find, as companies have been specializing in scraping the web to identify Data Science and Analytics job offers. For example, Burning Glass, IBM and BHEF published a 2017 report [13] analyzing such job offers from various sources; it provides comprehensive insights about the magnitude of the demand in the United States, the types of roles, and the requirements and salaries for those positions. Nevertheless, this report has a broader scope than our analysis of the demand side, as it includes data analytics and engineering. It also has a more limited breadth, as our study covers not only the demand but also the supply side, and goes beyond the United States.

Drawing upon research on AI skills done by LinkedIn Economic Graph², we propose a new framework to provide a sound and comprehensive description of the AI labor force: the skills AI talents own, the industries they work in, their education, etc. By scrutinizing AI-related job offers, we were also able to identify skills gaps between demand and supply, providing a more complete picture of the AI labor force.

To our knowledge, there is currently no academic literature covering this specific topic. However, the methodology we implemented, in the field of Natural Language Processing, is well covered by the computer science literature. For surveys on Natural Language Processing and machine learning models for binary text classification, see [11], [9], [20], [12] and [2].

1.2 Identifying AI talents in the LinkedIn database: a machine learning approach

Artificial Intelligence is not a job title; it is a field. There is no such standardized category within the LinkedIn database. Thus, to identify AI practitioners among all our members, we need to scrutinize their skills and employment. Querying the LinkedIn database with keywords such as "Artificial Intelligence", "Machine learning" or "Deep learning", we would capture not only AI talents but also sales people specialized in selling AI technology, recruiters, public speakers and advocates, and many variations of those profiles we simply cannot foresee. Indeed, it is difficult to come up with a clear set of rules to properly query the database for AI talents, i.e. the ones who actually implement AI.

Machine learning is usually a good approach when confronted with a "needle in a haystack" type of problem. My assumption is that, by exposing a text classifier to enough AI talent profiles, I will be able to filter and refine the results of an initial, more generic query for AI talents on the LinkedIn database. For this study I used Python 3.5, Scikit-Learn and the NLTK library. The quality of the predictions of machine learning models primarily lies in the quality of the data used to train them. The next section describes the approach I used for building a proper training dataset.

2 Building the training dataset

Manually labeling thousands of profiles proved extremely time consuming, and potentially leads to encoding our own biases into the model. After having started with this approach, and having scrutinized the skills, job titles and descriptions of the most prominent AI workers, I decided to resort to the AI vacancies advertised on the platform. The job database contains job titles, position descriptions and the skills associated with them. Our assumption is that the jobs (title, description, skills) containing AI-related keywords will likely provide an accurate description of AI talents' profiles while containing less noise than LinkedIn members' profiles. I assume that the very nature of job advertising data allows the use of string search for building a relatively pure training dataset, and that by harvesting enough of those postings, my training dataset will contain enough variation to capture most AI talents in their respective areas of expertise (computer vision, natural language processing, robotics, knowledge reasoning, etc.). I thus built "synthetic profiles" gathering the job titles, position descriptions and skills extracted from the job table³, using a string search for: ["Artificial Intelligence", "AI", "Computational Intelligence", "Reinforcement Learning", "Machine Learning", "Deep learning", "neural networks"], and their translations into local languages when needed.

However, to be able to detect non-AI practitioners in the member database, I need to have similar categories in my training dataset. Therefore, I also need to find job offers advertising for non-AI practitioners which nevertheless contain certain AI-related keywords. To identify those "false positives", I looked⁴ into job titles and checked whether, besides AI-related keywords, they also contained one of a list of non-AI role keywords.

¹ Argentina; Australia; Austria; Belgium; Brazil; Bulgaria; Canada; Chile; Colombia; Croatia; Cyprus; Czech Republic; Denmark; Estonia; Finland; France; Germany; Greece; Hong Kong SAR China; Hungary; Ireland; Italy; Latvia; Lithuania; Luxembourg; Malta; Mexico; Netherlands; New Zealand; Norway; Philippines; Poland; Portugal; Romania; Singapore; Slovak Republic; Slovenia; South Africa; Spain; Sweden; Switzerland; Thailand; United Kingdom; United States.
² See How artificial intelligence is already impacting today's jobs.
³ For the USA, Great Britain and Australia, since January 2017, for the "AI" class.
⁴ For the USA, France, Great Britain, Germany and Australia, since June 2016, for the "non-AI" class. It is more challenging to find non-AI practitioners in the job database when querying for AI keywords; for this reason, I expanded both the time and geographic coverage for this class.
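The two-class construction described above can be sketched in a few lines of Python. Everything here is illustrative: the posting fields ("title", "description", "skills") are hypothetical stand-ins for LinkedIn's job table schema, and both keyword lists are abbreviated versions of the ones given in the text.

```python
import re

# Abbreviated stand-ins for the keyword lists given in the text.
AI_KEYWORDS = [
    "artificial intelligence", "ai", "computational intelligence",
    "reinforcement learning", "machine learning", "deep learning",
    "neural networks",
]
NON_AI_TITLE_KEYWORDS = [
    "account manager", "sales", "recruiter", "business analyst",
    "evangelist", "human resource", "customer success", "speaker",
]

def _contains(text, keywords):
    # Word-boundary match, so that e.g. "ai" does not match inside "maintain".
    return any(re.search(r"\b" + re.escape(kw) + r"\b", text) for kw in keywords)

def label_posting(posting):
    """Label a job posting "AI", "non-AI", or None (unusable for training)."""
    text = " ".join(
        [posting["title"], posting["description"], " ".join(posting["skills"])]
    ).lower()
    if not _contains(text, AI_KEYWORDS):
        return None  # no AI-related keyword anywhere: not part of either class
    if _contains(posting["title"].lower(), NON_AI_TITLE_KEYWORDS):
        return "non-AI"  # AI keywords present, but a sales/HR-type title
    return "AI"
```

Postings labeled "AI" or "non-AI" would then be concatenated into the synthetic profiles described above, while postings returning None are discarded.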

The list of non-AI role keywords was: ["account manager", "practice leader", "Account Manager", "Sales", "Practice Leader", "Product Manager", "Recruiter", "Business Analyst", "Sales Engineer", "Evangelist", "Representative", "Digital Analyst", "human resource", "customer success", "speaker"]. These keywords are the ones I identified as the most prominent in non-hard-skilled AI-related profiles. I assumed that, overall, the two categories of job offers, "AI" and "non-AI", would provide enough variation to help build a sound binary classifier.

2.1 Standardization issues and user-supplied information

Standardization in a free-text environment is a huge challenge. Standardization coverage varies by language: it is strongest for English-language profiles. To make sure to grasp as much information as possible, I decided to use both standardized and user-supplied information. I resorted to a translation API to automate the translation of the resulting synthetic profiles. Using both standardized and user-supplied information, I must make sure no duplicated information remains in the final profiles.

2.2 Data preparation and ingestion

Models do not understand words: we need to transform text into numbers, extracting features that characterize the text input (the member's information). Several techniques exist to extract features from text. Common feature extraction methods consist in counting unique words (Count Vectorizer, a.k.a. bag of words) or counting words next to each other (N-grams). It is also possible to count the frequency of words (Term Frequency-Inverse Document Frequency, TF-IDF), the amount of punctuation or capitalized letters, or the length of the text (a use case being, for example, spam detection). When features are extracted from raw text, distinctions are made between each and every character: spaces, commas, uppercase letters and punctuation can thus end up influencing the outcome, although in our case those do not carry meaningful information for distinguishing AI from non-AI practitioners. To reduce the noise in our texts and prevent an inflation of the number of features, I followed these steps:

- text passed to lower case;
- stop words removed (see NLTK stop words);
- punctuation and special characters removed;
- job descriptions turned into keywords (using the Rake library).

With 30% of "non-AI" profiles in the training dataset, our training set is a bit unbalanced; extra care will be needed while evaluating the models. Confusion matrices are a good way to evaluate misclassification on unbalanced data. See figure 3 in the appendix.

3 Binary text classification with Scikit-Learn

3.1 Vectorizers and model benchmarking

Deep neural networks are gaining momentum in text classification. TensorFlow proposes a text classifier using deep neural networks, but I ultimately decided not to resort to deep learning, to allow better explainability of our results (the differences in performance were small enough to feel good about this trade-off). One of the main differences between machine learning and deep learning is the control over feature extraction. With deep learning, the model also chooses the best way to extract features, while in traditional machine learning the developer has control over it. This has important consequences for model interpretability and transparency.

Support Vector Machines (SVM), together with Logistic Regression (LR), are the go-to models for binary classification in supervised machine learning settings. SVM draws its name from the data points (the support vectors) which support the margins between which the hyperplane is drawn (i.e. they are the data points closest to the other class, the hardest to classify). Logistic Regression operates to maximize the

Figure 1: Flow chart: building the training dataset

likelihood that a given data point is well classified (i.e. Maximum Likelihood Estimation). SVM properties are considered better suited for generalization and scalability [18], while LR provides "calibrated probabilities that can be interpreted as confidence in a decision" [18]. NB: we will use this feature of LR in Platt calibration, see sub-section 4.3. We used a linear Support Vector Classifier, a generalization of SVM. For more information on SVM and its implementation in Scikit-learn, see [1].

Figure 2: SVM explained [4]

For a discussion of Support Vector Machines vs. Logistic Regression, refer to [18], [6], [15] and [4].

3.2 Testing vectorizers

I tested both TF-IDF and Count Vectorizers as feature extractors, together with different models (Logistic Regression, Random Forest, Naïve Bayes, Perceptron, Support Vector Machine, etc.). Our intuition that bags of words would provide an efficient vectorizer was confirmed (a trade-off between execution time and performance). Indeed, the nature of the data, mostly lists of words (skills, job titles, and keyword extractions of job descriptions), makes the weighting procedure used in TF-IDF less necessary, and skipping it saves a fair amount of computing time. Performance-wise, the bag-of-words approach even provides slightly better accuracy for selected models. A Support Vector Machine model (Support Vector Classifier with linear kernel) with a bag-of-words vectorizer provided the best results, with 99.7% accuracy on our testing set and 8 synthetic profiles misclassified (see figure 5 in the appendix), slightly outperforming Logistic Regression (see figure 6 in the appendix).

3.3 Cross validation

For training and testing purposes, the training dataset is split into two subsets. I use a 20/80 ratio (i.e. 20% of the training set is kept for testing). To benchmark the performance of alternative models and make sure the results are consistent across the whole dataset, I performed a cross-validation analysis using 5 iterations. This procedure compares the performance of the different models using a new, randomly generated testing/training split for each iteration (see figure 9 in the appendix). Note that the testing set contains 20% of the training dataset, i.e. 2,858 observations.

3.4 Tuning hyper-parameters

When fitting a Support Vector Machine classifier, two hyper-parameters are considered important: the kernel (the "kernel trick") and the penalty parameter C of the error term. The first allows specifying whether the data are linearly separable, and the second allows fine-tuning how closely the classifier fits the training data. To fine-tune those two parameters I used GridSearchCV; the outcome shows that the C parameter had little influence on the results and that the data are linearly separable (see figure 7 in the appendix). As a result, I kept the default value for the C parameter and used the linear kernel.

3.5 Most informative features

By inspecting the most informative features driving the classification, we can observe how the model behaves and correct for features we do not want, for example company names. See figure 8 in the appendix.

4 Implementation

Once the model has been trained, I can use it to classify actual profiles (from pseudonymised and concatenated profile sections: title, current or last position description, skill set). For the AI in the Labour Market series, I attempt to build AI talent pools for 44 countries.
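As an illustration, the cleaning steps listed in section 2.2 can be sketched with the standard library alone. The stop-word list below is a tiny stand-in for NLTK's English list, and plain token filtering stands in for the Rake keyword extraction used in the paper.

```python
import re

# Abbreviated stand-in for NLTK's English stop-word list.
STOP_WORDS = {"a", "an", "and", "are", "as", "for", "in", "is",
              "of", "on", "the", "to", "with"}

def clean(text):
    """Apply the cleaning steps of section 2.2 to one synthetic profile."""
    text = text.lower()                      # 1. pass to lower case
    text = re.sub(r"[^a-z0-9\s]", " ", text) # 2. drop punctuation/special chars
    tokens = [tok for tok in text.split()    # 3. drop stop words
              if tok not in STOP_WORDS]
    return " ".join(tokens)
```

In the actual pipeline, NLTK's stopwords corpus and the Rake library would replace these stand-ins, with translation to English happening beforehand.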
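The benchmarking of sections 3.1-3.5 can be miniaturized as follows: a bag-of-words CountVectorizer feeding a linear Support Vector Classifier, with the most informative features read off the fitted coefficients. The eight "synthetic profiles" below are invented for illustration; the real training set is the job-offer data described in section 2, and the real run uses 5 folds rather than the 2 used here to keep each toy fold populated.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

# Invented synthetic profiles: title + description keywords + skills.
profiles = [
    "machine learning engineer deep learning python tensorflow",
    "research scientist neural networks computer vision",
    "data scientist reinforcement learning pytorch",
    "nlp engineer deep learning text mining",
    "account manager ai products sales crm quota",
    "recruiter ai talent sourcing hiring interviews",
    "sales engineer machine learning platform demos crm",
    "ai evangelist public speaking marketing events",
]
labels = ["AI"] * 4 + ["non-AI"] * 4

vectorizer = CountVectorizer()          # bag of words, as retained in 3.2
X = vectorizer.fit_transform(profiles)

clf = LinearSVC(C=1.0)                  # linear kernel, default C (cf. 3.4)
scores = cross_val_score(clf, X, labels, cv=2)  # cross validation (cf. 3.3)
clf.fit(X, labels)

# Most informative features (cf. 3.5): the largest-magnitude coefficients of
# the separating hyperplane. Positive weights lean towards classes_[1],
# i.e. "non-AI"; negative weights lean towards "AI".
vocab = {idx: tok for tok, idx in vectorizer.vocabulary_.items()}
order = np.argsort(clf.coef_[0])
print("AI-leaning:", [vocab[i] for i in order[:3]])
print("non-AI-leaning:", [vocab[i] for i in order[-3:]])
```

GridSearchCV (section 3.4) would wrap the same estimator to sweep C and the kernel choice before settling on the defaults.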
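Looking ahead to the prediction step (section 4.3), Platt-calibrated probabilities can be obtained in Scikit-learn by fitting SVC(probability=True, kernel='linear') and thresholding predict_proba. The 0.95 cut-off below is the paper's choice; the training texts are invented, and on such a tiny sample the calibrated probabilities are not meaningful, so no candidate may actually clear the threshold.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import SVC

ai_texts = [
    "machine learning engineer deep learning python",
    "research scientist neural networks vision",
    "data scientist reinforcement learning pytorch",
    "nlp engineer deep learning text mining",
]
non_ai_texts = [
    "account manager ai products sales crm",
    "recruiter ai talent sourcing hiring",
    "sales engineer machine learning demos crm",
    "ai evangelist public speaking marketing",
]
# Repeat the toy texts so the internal cross-validation used by Platt
# calibration has enough data in each fold.
train_texts = ai_texts * 3 + non_ai_texts * 3
labels = ["AI"] * 12 + ["non-AI"] * 12

vec = CountVectorizer()
X = vec.fit_transform(train_texts)

# probability=True turns on Platt calibration: an internal logistic fit
# mapping the SVM outputs to probabilities.
clf = SVC(kernel="linear", probability=True, random_state=0).fit(X, labels)

candidates = [
    "deep learning engineer neural networks python",
    "ai account manager enterprise sales",
]
proba = clf.predict_proba(vec.transform(candidates))
ai_col = list(clf.classes_).index("AI")

THRESHOLD = 0.95  # the paper's cut-off; raw counts would need a score range
ai_pool = [text for text, p in zip(candidates, proba[:, ai_col])
           if p >= THRESHOLD]
```

With the real, strongly left-skewed score distribution (figure 12 in the appendix), most true positives sit well above the cut-off.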

4.1 Data ingestion

The classifier described in this paper is designed to distinguish AI talents within a subset of members who possess the "AI keywords" we mentioned before. I started with a broad keyword search on the database for the given countries. This query should be broad enough: both specific and generic. To build it, I looked for specific AI libraries: keras, pytorch, scikit-learn, tensorflow, theano, CNTK, caffe (taking case and spelling variations into account). I also used generic terms referring to AI: "artificial intelligence, computational intelligence, reinforcement learning, machine learning, deep learning, neural networks", and their translations into local languages. NB: these keywords are the result of an AI taxonomy effort led by the Economic Graph team. A member ID ends up in my bucket if at least one of those keywords appears in one of these profile sections: position summary, user-supplied title, standardized skill name, user-supplied skills. This bucket will contain non-AI practitioners, sales people, HR, AI advocates, etc.; that is where the text classifier comes in, to distinguish AI from non-AI practitioners. NB: for members' job positions, I looked at the "final active" or "current" position, not their complete work history.

4.2 Preparing the data for prediction

Once the skills, position description and title of potential AI talents are collected, this data must be cleaned and prepared to feed the classifier. I translated user-supplied information into English using the Bing translator API. Duplicates between standardized and user-supplied skills were removed, as were stop words and non-alphanumeric characters. For the position summary, I also used keyword extraction (Rake library) and translated the result into English. At the end of this process, I end up with synthetic profiles: the concatenation of title, last/current position description (as keywords) and skill set, with the corresponding member IDs.

4.3 Prediction

Binary classification sorts the population into two groups. A Support Vector Machine, in particular, classifies observations into two groups taking the value +1 or -1; SVM does not come with a probability or degree-of-certainty metric. Nevertheless, we can resort to Platt calibration to compute the probability distribution associated with each class. Platt calibration was initially designed by John C. Platt for SVM. It uses a logistic regression model to "map the SVM outputs into probabilities" [17]; the resulting probabilities can be used as confidence levels. In Scikit-learn, you can pass this option in the parameters of your SVC: SVC(probability=True, kernel='linear'). More information on model calibration is available in the Scikit-learn documentation. NB: Scikit-learn relies on the liblinear library for linear SVC, which uses coordinate descent as its optimization program.

Once this score is computed, I decided to use a threshold of 0.95, excluding from the AI talent pool all observations for which the prediction score is below 0.95. This is arbitrary, and if I were to provide raw counts, I would provide them as a range with associated scores. However, this is not my aim here; for this study, the threshold discussion is not so crucial. The AI talents series will only provide aggregates and rankings, which should be consistent for a wide range of thresholds. Furthermore, the score distribution is clearly left-skewed (see figure 12 in the appendix). To put it differently, in this classification problem we are more interested in precision than in recall: we aim at building a sample of the population as representative of AI talents as possible, even if it implies that true positives close to the threshold are excluded.

5 Discussion

Our objective was to go beyond keywords, refining a pre-selection of potential AI talents extracted from our database using generic AI-related keywords. In this subset, we identified that many of the resulting potential AI talents were actually not AI practitioners. Our intuition was that a machine learning model can better identify the features able to distinguish between "AI practitioners" and "non-AI practitioners". However, I resort to keywords to build my training set, and the method I propose only has added value if the features generated by the classifier capture a different subset of members than a simple query on the database using the keywords that helped build the training set. The first distinction to make here is that the training set was built not from members' profiles but from job offers advertised on the platform. Although vacancies contain sections similar to members' profiles (skills, job title, job description), the way they are formulated is different: job offers tend to provide a more "consistent" triptych of job title, job description and associated skills.

Would the keywords used to build the training set be a good predictor of the subset identified by the classifier on top of the broader initial query? Probably, as overlaps exist: some of the most informative features used by the model naturally reflect those keywords. But the features built by the model and influencing the classification go way beyond the few keywords used to select the job offers that feed the training set. Furthermore, the weights of those features and their interplay, pushing in both directions to make the final prediction, would be very difficult to make explicit and encode, for example, in a rule-based algorithm. Figure 11 in the appendix shows how different the resulting talent pools are when identified via the keywords used to build the training set versus via our SVM classifier.

Our intuition was also that a machine learning model would be easier to implement, less arbitrary and more agile, as the features identified by the model also reflect the labor market and will evolve with it when the study is reproduced. Indeed, the way to characterize AI practitioners evolves over time, as the industry evolves. The approach is more agile also because, using a predictive model, we can easily play with the threshold (the confidence level) to loosen the criterion and allow the "net" to be wider, capturing a broader population. This is particularly interesting for us at LinkedIn, as we would like to identify less specialized members who could be trained and up-skilled to meet the demand for AI talents.

Ultimately, the metric we lack to properly evaluate the robustness of the method is the actual accuracy of the model when classifying members' profiles; but for this, we would need enough labeled profiles, and if we had those, we would simply train the model that way.

6 Descriptive statistics and visualization

Once AI talents are identified, we can get more information about them: skills, industries, inferred seniority, gender, location, company, education, diplomas, etc. As mentioned previously, LinkedIn Economic Graph policy is not to publish raw counts; considering our methodology, raw counts would only be a prediction range with associated confidence scores. For this research, we simply want the best possible sample from which to construct descriptive statistics. I computed those and built charts and maps (using Leaflet and Highcharts), such as AI talents' distribution by administrative level⁵, by industry⁶, and by seniority group, gender, skills, universities, companies, etc. Averages for the European Union and worldwide were computed as weighted averages (using the share of AI talents by country). These insights will be gathered on a data-visualization portal hosted on Azure, to help policy makers make the best of them.

⁵ See figure 13 in the appendix.
⁶ See figure 14 in the appendix.

6.1 Data limitations and things to keep in mind when analyzing these results

- LinkedIn's market penetration varies from one country to another, both in terms of membership and of job advertising. In some countries, the number of AI-related job offers for the period considered can be small (this should be taken into account when comparing countries).
- Gender balance: not all countries have the same gender representation among

LinkedIn members. In some countries women can outnumber men on LinkedIn, and vice versa.
- Geographic information: not all countries have the same geographic information associated with members.

The AI talents in the Labor Market series only assesses Artificial Intelligence prevalence within LinkedIn's population. Given LinkedIn's strong presence in the IT and computer science community, the information we share as rankings, distributions and aggregates is likely a reliable representation of the AI labour force for the given countries. Nevertheless, we do not claim statistical representativeness for our sample. These results shall be understood as estimates derived from the LinkedIn population.

7 Conclusion

Using Artificial Intelligence vacancies published on LinkedIn, we were able to build synthetic profiles describing AI talents and to train a binary text classifier to distinguish AI practitioners within a broader population of AI-related profiles extracted from the LinkedIn database. Doing so, we saved a considerable amount of time by not having to manually label profiles to build our training dataset. This approach could be used to identify less specialized members who could be trained and up-skilled to meet the demand for AI talents, or even be scaled up and used to eventually match different kinds of job offers and talent pools on LinkedIn. For the latter, the remaining challenge would be to properly label the "not what we are looking for" class. A track to explore could consist in building a user interface that proposes to recruiters to identify a few types of profiles they would like to avoid, constructing this second class within a few iterations.

References

[1] Support Vector Machines. Scikit-learn 0.20.3 documentation, section 1.4.
[2] Charu C. Aggarwal and ChengXiang Zhai. A survey of text classification algorithms.
[3] M. Arntz, T. Gregory and U. Zierahn. The risk of automation for jobs in OECD countries: A comparative analysis, 2016.
[4] R. Berwick. An Idiot's Guide to Support Vector Machines (SVMs).
[5] European Political Strategy Centre. The Age of Artificial Intelligence: Towards a European Strategy for Human-Centric Machines, 2018.
[6] Tristan Fletcher. Support Vector Machines Explained, 2008.
[7] Carl Benedikt Frey and Michael A. Osborne. The future of employment: How susceptible are jobs to computerisation?, 2013.
[8] J.F. Gagné. Global AI Talent Report 2018, 2018.
[9] Rajni Jindal, Ruchika Malhotra and Abha Jain. Techniques for text classification: Literature review and current trends.
[10] Miryung Kim, Thomas Zimmermann, Robert DeLine and Andrew Begel. The emerging role of data scientists on software development teams. In Proceedings of the 38th International Conference on Software Engineering (ICSE '16), pages 96-107, Austin, Texas, 2016. ACM Press.
[11] Roshan Kumari and Saurabh Kr. Machine learning: A review on binary classification. International Journal of Computer Applications, 160(7):11-15, Feb. 2017.
[12] Dapeng Liu and Yan Li. A roadmap for Natural Language Processing research in information systems.
[13] W. Markow, S. Braganza and B. Taska. The Quant Crunch: How the demand for data science skills is disrupting the job market, 2017.
[14] C. Metz. Tech giants are paying huge salaries for scarce A.I. talent. The New York Times, 2017.
[15] Andrew Ng. CS229 lecture notes: Support Vector Machines.
[16] Igor Perisic. How artificial intelligence is already impacting today's jobs, 2018. https://economicgraph.linkedin.com/blog/how-artificial-intelligence-is-already-impacting-todays-jobs.
[17] John C. Platt. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In Advances in Large Margin Classifiers, pages 61-74. MIT Press, 1999.
[18] Kevin Swersky. Support Vector Machines vs Logistic Regression.
[19] Tencent. 2017 Global AI Talent White Paper, 2017.
[20] M. Thangaraj and M. Sivakami. Text classification techniques: A literature review. Interdisciplinary Journal of Information, Knowledge, and Management, 13:117-135, 2018.
[21] J. Vincent. Tencent says there are only 300,000 AI engineers worldwide, but millions are needed. The Verge, 2017.

8 Appendix

Figure 3: Class distribution in training dataset

Figure 4: Machine learning versus deep learning processes

Figure 5: Confusion matrix: Linear Support Vector Machine with Bags of Words vectorizer

Figure 6: Confusion matrix: Logistic Regression with Bags of Words vectorizer

Figure 7: Fine Tuning hyper parameters

Figure 8: Most informative features

Figure 9: Benchmarking and Cross validation

Figure 10: Flow chart training the classifier

Figure 11: Comparing the size of AI talents pools identified with alternative methods: ratio Keywords search vs. Machine learning (SVM)

Figure 12: Score distribution for Prediction of AI talents in the US

Figure 13: AI talents distribution by administrative levels in Finland

Figure 14: AI talents distribution by industries
