Available online at www.sciencedirect.com Available online at www.sciencedirect.com ScienceDirect AvailableScienceDirect online at www.sciencedirect.com Procedia Computer Science 00 (2018) 000–000
Procedia Computer Science 00 (2018) 000–000 www.elsevier.com/locate/procedia ScienceDirect www.elsevier.com/locate/procedia Procedia Computer Science 127 (2018) 82–91
The First International Conference On Intelligent Computing in Data Sciences The First International Conference On Intelligent Computing in Data Sciences Data science in light of natural language processing: An overview Data science in light of natural language processing: An overview Imad Zerouala* and Abdelhak Lakhouajaa Imad Zerouala* and Abdelhak Lakhouajaa aFaculty of Sciences, Mohamed First University, Av Med VI BP 717, Oujda 60000, Morocco aFaculty of Sciences, Mohamed First University, Av Med VI BP 717, Oujda 60000, Morocco
Abstract Abstract The focus of data scientists is essentially divided into three areas: collecting data, analyzing data, and inferring information from data.The focus Each ofone data of thesescientists tasks is requires essentially special divided personnel, into three takes areas: time, collecting and costs data, money. analyzing Yet, the data, next and an dinferring the fastidious informat stepion is fromhow data.to turn Each data one into of products. these tasks Therefore, requires this special field personnel, grabs the attentiontakes time, of andmany costs research money. groups Yet, thein academia next and asthe well fastidious as industry step .is In how the tolast turn decades, data into data products.-driven approachesTherefore, this came field into grabs existence the attention and gained of many more research popularity groups because in academia they requireas well asmuch industry less .human In the lasteffort. decades, Natural data Language-driven Processingapproaches (NLP)came intois strongly existence among and gainedthe fields more influenced popularity by becausedata. The they growth require of datamuch is lessbehind human the performanceeffort. Natural improvement Language Processing of most (NLP) NLP isapplications strongly among such theas fieldsmachine influenced translation by data. and The automatic growth ofspeech data isrecogn behindition. the Consequently,performance improvement many NLP applications of most NLPare frequently applications moving such from as rulemachine-based translationsystems and and knowledge automatic-based speech methods recogn to ition.data- drivenConsequently, approaches. many However, NLP applications collected dataare frequentlythat are based moving on undefined from rule design-based criteria systems or and on technicallyknowledge -unsuitablebased methods forms to wil datal be- udrivenseless. approaches. Also, they However, will be neglectedcollected dataif the that size are isbased not onenough undefined to perform design criteriathe required or on technicallyanalysis and unsuitable to infer formsthe accurate will be information.useless. Also, The they chief will purpose be neglected of this overviewif the size is tois shednot someenough lights to performon the vital the rolerequired of data analysis in various and fields to infer and thegive accurate a better uinformation.nderstanding The of chiefdata inpurpose light ofof NLP.this overview Expressly, is toit sheddescribes some whatlights happen on the vitalto data role during of data its in life various-cycle: fields building, and give processing, a better analyzing,understanding and ofexploring data in phases.light of NLP. Expressly, it describes what happen to data during its life-cycle: building, processing, analyzing, and exploring phases.
© 2018 The Authors. Published by Elsevier B.V. © 2018 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/). Selection and © 2018 The Authors. Published by Elsevier B.V. Thispeer-review is an open under access responsibility article under of theInternational CC BY-NC Neural-ND licenseNetwork (http://creativecommons.org/licenses/by Society Morocco Regional Chapter. -nc-nd/3.0/). Selection andThis peer is an-review open access under articleresponsibility under the of CCInternational BY-NC-ND Neural license Network (http://creativecommons.org/licenses/by Society Morocco Regional Chapter.-nc -nd/3.0/). Selection andKeywords: peer-review Data science; under Naturalresponsibility language of processing; International Data Neuraldriven approches; Network Corpora;Society MoroccoMachine learning Regional Chapter. Keywords: Data science; Natural language processing; Data driven approches; Corpora; Machine learning
1. Introduction 1. Introduction Recently, a variety of Natural Language Processing (NLP) applications are based on data-driven methods such as neuralRecently, networks a variety and Hidden of Natural Markow Language Models Processing (HMMs) (NLP) [1]. Since applications the progress are based in most on data of -thesedriven methods methods is such driven as neural networks and Hidden Markow Models (HMMs) [1]. Since the progress in most of these methods is driven
* Corresponding author. Tel.: +212618417420. *E -Correspondingmail address: [email protected] author. Tel.: +212618417420. E-mail address: [email protected] 1877-0509© 2018 The Authors. Published by Elsevier B.V. This1877 -is0509© an open 2018 access The Authors. article Publishedunder the by CC Elsevier BY-NC B.V.-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/). Selection andThis peer is an-review open access under articleresponsibility under the of CCInternational BY-NC-ND Neural license Network (http://creativecommons.org/licenses/by Society Morocco Regional Chapter.-nc -nd/3.0/). Selection and peer-review under responsibility of International Neural Network Society Morocco Regional Chapter.
1877-0509 © 2018 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/). Selection and peer-review under responsibility of International Neural Network Society Morocco Regional Chapter. 10.1016/j.procs.2018.01.101
10.1016/j.procs.2018.01.101 1877-0509 Available online at www.sciencedirect.com Available online at www.sciencedirect.com ScienceDirect Imad Zeroual et al. / Procedia Computer Science 127 (2018) 82–91 83 ScienceDirect Imad Zeroual / Procedia Computer Science 00 (2018) 000–000 Procedia Computer Science 00 (2018) 000–000
Procedia Computer Science 00 (2018) 000–000 www.elsevier.com/locate/procedia www.elsevier.com/locate/procedia The First International Conference On Intelligent Computing in Data Sciences The First International Conference On Intelligent Computing in Data Sciences Data science in light of natural language processing: An overview Data science in light of natural language processing: An overview Imad Zerouala* and Abdelhak Lakhouajaa Imad Zerouala* and Abdelhak Lakhouajaa a Faculty of Sciences, Mohamed First University, Av Med VI BP 717, Oujda 60000, Morocco aFaculty of Sciences, Mohamed First University, Av Med VI BP 717, Oujda 60000, Morocco dictionaries such as the “Dictio Abstract Scottish Tongue”, the “Middle English Dictionary”, the “Dictionary of Old English”, and the “Oxford English Abstract Dictionary” The focus of data scientists is essentially divided into three areas: collecting data, analyzing data, and inferring information from data.The focus Each ofone data of thesescientists tasks is requires essentially special divided personnel, into three takes areas: time, collecting and costs data, money. analyzing Yet, the data, next and an dinferring the fastidious informat stepion is fromhow todata. turn Each data one into of products. these tasks Therefore, requires this special field personnel, grabs the attentiontakes time, of andmany costs research money. groups Yet, thein academia next and asthe well fastidious as industry step .is In how the Nation lastto turn decades, data into data products.-driven approachesTherefore, this came field into grabs existence the attention and gained of many more research popularity groups because in academia they requireas well asmuch industry less .human In the effort.last decades, Natural data Language-driven Processingapproaches (NLP)came intois strongly existence among and gainedthe fields more influenced popularity by becausedata. The they growth require of datamuch is lessbehind human the may not need to become a part of the learners’ output or the teacher performanceeffort. Natural improvement Language Processing of most (NLP) NLP isapplications strongly among such theas fieldsmachine influenced translation by data. and The automatic growth ofspeech data isrecogn behindition. the performance improvement of most NLP applications such as machine translation and automatic speech recognition. Consequently, many NLP applications are frequently moving from rule-based systems and knowledge-based methods to data- drivenConsequently, approaches. many However, NLP applications collected dataare frequentlythat are based moving on undefined from rule design-based criteria systems or and on technicallyknowledge -unsuitablebased methods forms to wil datal be- udrivenseless. approaches. Also, they However, will be neglectedcollected dataif the that size are isbased not onenough undefined to perform design criteriathe required or on technicallyanalysis and unsuitable to infer formsthe accurate will be information.useless. Also, The they chief will purpose be neglected of this overviewif the size is tois shednot someenough lights to performon the vital the rolerequired of data analysis in various and fields to infer and thegive accurate a better uinformation.nderstanding The of chiefdata inpurpose light ofof NLP.this overview Expressly, is toit sheddescribes some whatlights happen on the vitalto data role during of data its in life various-cycle: fields building, and give processing, a better analyzing,understanding and ofexploring data in phases.light of NLP. Expressly, it describes what happen to data during its life-cycle: building, processing, analyzing, and exploring phases. © 2018 The Authors. Published by Elsevier B.V. This© 2018 is an The open Authors. access Published article under by Elsevierthe CC BY B.V.-NC -ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/). Selection DOI andThis peer is an-review open access under articleresponsibility under the of CCInternational BY-NC-ND Neural license Network (http://creativecommons.org/licenses/by Society Morocco Regional Chapter.-nc -nd/3.0/). Selection andKeywords: peer-review Data science; under Naturalresponsibility language of processing; International Data Neuraldriven approches; Network Corpora;Society MoroccoMachine learning Regional Chapter. Keywords: Data science; Natural language processing; Data driven approches; Corpora; Machine learning 1. Introduction 1. Introduction Recently, a variety of Natural Language Processing (NLP) applications are based on data-driven methods such as 2. Life-cycle of data in NLP neuralRecently, networks a variety and Hidden of Natural Markow Language Models Processing (HMMs) (NLP) [1]. Since applications the progress are based in most on data of -thesedriven methods methods is such driven as neural networks and Hidden Markow Models (HMMs) [1]. Since the progress in most of these methods is driven * Corresponding author. Tel.: +212618417420. E* -Correspondingmail address: [email protected] author. Tel.: +212618417420. E-mail address: [email protected] 1877-0509© 2018 The Authors. Published by Elsevier B.V. Manning This1877 -is0509© an open 2018 access The Authors. article Publishedunder the by CC Elsevier BY-NC B.V.-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/). Selection andThis peer is an-review open access under articleresponsibility under the of CCInternational BY-NC-ND Neural license Network (http://creativecommons.org/licenses/by Society Morocco Regional Chapter.-nc -nd/3.0/). Selection and peer-review under responsibility of International Neural Network Society Morocco Regional Chapter. 2.1. Data design, sources, and format