Understanding Job-Skill Relationships using Big Data and Neural Networks Abhinav Maurya Carnegie Mellon University Pittsburgh, PA – 15213
[email protected] ABSTRACT Our proposed model is inspired by topic modeling where Nonlinear vector space embeddings have shown great promise latent topics occur throughout the documents of a corpus in many applications such as similarity search, analogy map- in varying proportions, which enables their identification to pings, dataset visualization, etc. In this paper, we propose characterize the text corpus. Similarly, in our model, vari- a simple distribution-to-distribution regression model based ation in the co-occurrence of jobs and skills can be used on neural networks which provides (i) job2vec: interpretable to disentangle which skills are associated with which jobs. embeddings of jobs in terms of associated skills, (ii) skill2vec: However, unlike unsupervised topic models such as LDA [1], converse embeddings of skills in terms of the jobs that re- we cast our problem of identifying job-skill associations as quire them, and (iii) SkillRank: a mechanism for ranking a supervised distribution-to-distribution regression, where skills associated with each job that can be recommended the input is the empirical distribution over job titles and to members aspiring for that job. Due to the simplicity of the output is the corresponding empirical distribution over our model, it has no hyperparameters that need tuning ex- skills for that person. Moreover, unlike unsupervised topic cept for the number of stochastic gradient descent iterations models such as LDA and supervised variants such as [4], our which is easily determined using the early stopping criterion.