
Journal of Personalized Medicine

Review

How Do Machines Learn? Artificial Intelligence as a New Era in Medicine

Oliwia Koteluk 1,†, Adrian Wartecki 1,†, Sylwia Mazurek 2,3,*,‡, Iga Kołodziejczak 4,‡ and Andrzej Mackiewicz 2,3

1 Faculty of Medical Sciences, Chair of Medical Biotechnology, Poznan University of Medical Sciences, 61-701 Poznan, Poland; oliwiakoteluk@.com (O.K.); [email protected] (A.W.)
2 Department of Cancer Immunology, Chair of Medical Biotechnology, Poznan University of Medical Sciences, 61-701 Poznan, Poland; [email protected]
3 Department of Cancer Diagnostics and Immunology, Greater Poland Cancer Centre, 61-866 Poznan, Poland
4 Postgraduate School of Molecular Medicine, Medical University of Warsaw, 02-091 Warsaw, Poland; [email protected]
* Correspondence: [email protected]; Tel.: +48-61-885-06-67
† These authors contributed equally.
‡ These authors contributed equally.

Abstract: With an increased amount of medical data generated every day, there is a strong need for reliable, automated evaluation tools. With high hopes and expectations, machine learning has the potential to revolutionize many fields of medicine, helping to make faster and more correct decisions and improving current standards of treatment. Today, machines can analyze, learn, communicate, and understand processed data, and they are increasingly used in health care. This review explains different machine learning models and the general process of building and training an algorithm. Furthermore, it summarizes the most useful machine learning applications and tools in different branches of medicine and health care (radiology, pathology, pharmacology, infectious diseases, personalized decision making, and many others). The review also addresses the futuristic prospects and threats of applying artificial intelligence as an advanced, automated medicine tool.

Citation: Koteluk, O.; Wartecki, A.; Mazurek, S.; Kołodziejczak, I.; Mackiewicz, A. How Do Machines Learn? Artificial Intelligence as a New Era in Medicine. J. Pers. Med. 2021, 11, 32. https://doi.org/10.3390/jpm11010032

Keywords: machine learning; artificial intelligence; medicine; decision making; personalized medicine; data processing; personalized treatment

Received: 25 November 2020; Accepted: 5 January 2021; Published: 7 January 2021

Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

Living in the big data era, with billions of terabytes of data generated every year, it might be challenging for humans to process all the information. However, Artificial Intelligence (AI) can lend a helping hand. In the past, machines gained an advantage over humans in physical work, where automation contributed to industry and agriculture's rapid development. Nowadays, machines are gaining an advantage over humans in typically human cognitive skills like analyzing and learning. Moreover, their communication and understanding skills are improving quickly. There are numerous examples where AI already achieves much better results than humans in data analysis [1–3].

AI focuses on exploiting calculation techniques with advanced investigative and prognostic facilities to process all data types, which allows for decision-making and the mimicking of human intelligence. Such computational systems usually operate on large amounts of data and often integrate different types of input. AI is a broader field of science, and one of the most significant branches of AI in medicine is machine learning (ML). ML means understanding and processing information from a given dataset by the algorithm, namely the machine. The word "learning" stands here for the machine's ability to become more effective with training experience. Such a machine can quickly draw novel conclusions from the data that may be missed by humans. Machines' potential increases year by year, making them more autonomous. However, human interference is still necessary, and humans have the final word about whether or not to take particular actions. At least for now.


Will it change in the future? Will we let AI perform actions by itself, or will it remain only a human tool? One thing is unquestionable: we must start accustoming ourselves to living alongside machines that are beginning to equal or even surpass people at analyzing and deciding.

2. How Do Machines Learn?

The machine learning process is very similar to the learning mechanisms and biochemical principles of the human brain. All the decisions a human makes result from billions of neurons that analyze images, sounds, smells, structures, and movements, recognize patterns, and continuously calculate and evaluate options. Machines can also analyze and calculate similar data, including smell sensing by the electronic nose [4].

2.1. The Main Components of the Machine Learning Process

ML algorithms are methods to perform calculations and data analysis [5]. They require inputs (see Table 1, Glossary), i.e., all data presented to the ML algorithm for analysis, e.g., patients' genome sequencing data. The ML algorithm's outcome is called the output; for instance, a prediction of a patient's susceptibility to cancer. A simple analysis usually does not require large amounts of data to obtain a high-accuracy prognosis. A more advanced analysis requires more input [6–8]. Although the relationship between inputs and outputs is more complex than this, generally, providing more inputs should yield more accurate outcomes.
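To make the input/output vocabulary concrete, the minimal Python sketch below (an illustrative assumption, not the code of any cited study) trains a classifier on synthetic labeled inputs and asks it for outputs on new inputs; the random feature matrix stands in for real data such as genome sequencing results.

```python
# Hypothetical sketch of the input -> output relationship using scikit-learn;
# the synthetic features stand in for, e.g., genomic data, and the label for
# a made-up "susceptibility" outcome.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))            # inputs: 200 patients x 10 features
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # output labels (synthetic)

model = RandomForestClassifier(random_state=0).fit(X, y)  # learn from labeled inputs
print(model.predict(X[:5]))               # outputs for the first five inputs
```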

Table 1. Glossary.

Element of ML: Description

Artificial agent: An independent program that acts on signals received from its environment to meet designated goals. Such an agent is autonomous because it can perform without the intervention of a human or any other system.

Clustering: Grouping data points with similar features which differ from other data points containing exceedingly different properties.

Environment: A task or a problem that needs to be resolved by the agent. The environment interacts with the agent by executing each received action and sending its current state and reward, linked with the agent's undertaken actions.

Feature: An individual quantifiable attribute of the presented event, such as the input color or size.

Hyperparameters: Parameters that cannot be estimated from training data and are optimized outside the model. They can be tuned manually in order to get the best possible results.

Input: A piece of information or data provided to the machine as pictures, numbers, text, sounds, or other types.

Label: A description of the input or output; for example, an x-ray of lungs may have the label "lung".

Layer: The most prominent structure in deep learning. Each layer consists of nodes called neurons, which are connected, together creating a neural network. The connections between neurons are weighted, and in consequence, the processed signal is increased or decreased.

Output: Predicted data generated by a machine learning model in association with the given input.

Reward: Information from the environment (or supervisor) to an agent about the action's correctness. The reward can be positive or negative, depending on whether the action was correct. It allows the agent to draw conclusions about behavior in a particular state.

In comparison to ML, AI acts in response to the environment to meet defined goals. According to Turing's test, AI must be able to communicate, embody its memory, draw conclusions, and adapt to new circumstances [9]. A good example is Siri or Alexa, where the AI performs different tasks such as voice recognition, number dialing, and information searching to fulfill the user's requests [10,11]. AI gives a machine cognitive ability and therefore is more complicated than ML [12].

2.2. Machine Learning Models

There are three principal learning models in ML: supervised learning, unsupervised learning, and reinforcement learning, which differ depending on the type of data input. Different learning schemes require specific algorithms (Figure 1, Table 2).

Figure 1. Machine learning models and algorithms. Machine learning is a subfield of artificial intelligence science that enables the machine to become more effective with training experience. Three principal learning models are supervised learning, unsupervised learning, and reinforcement learning. Learning models differ depending on the input data type and require various algorithms.

Table 2. Examples of specific algorithms suited for different learning models.

Algorithm (Prediction): Description [References]

Algorithms Applied in Supervised Learning

Probabilistic model (classification): Probability distributions are applied to represent all uncertain, unnoticed quantities (including structural, parametric, and noise-related aspects) and their relation to current data. [13]

Logistic regression (classification): Predicts probability by comparison to a logit function or decision trees, where the algorithm divides data according to the essential attributes, making these groups extremely distinct. [14]

Naïve Bayes classifier (classification): Assumes that a feature's presence in a class is unrelated to any other element's presence. [15]

Support vector machine (classification): The algorithm finds the hyperplane with the greatest distance of points from both classes. [16]

Simple linear regression (value prediction): Estimates the relationship of one independent to one dependent variable using a straight line. [17]

Multiple linear regression (value prediction): Estimates the relationship of at least two independent variables to one dependent variable using a hyperplane. [18]

Polynomial regression (value prediction): A kind of linear regression in which the relationship between the independent and dependent variables is projected as an n-degree polynomial. [19]

Decision tree (classification or value prediction): A non-parametric algorithm that constructs a classification or regression model in the form of a tree structure. It splits a data set into ever smaller subsets while gradually expanding the associated decision tree. [15]

Random forest (classification or value prediction): A set consisting of several decision trees, from which the dominant class (classification) or expected average (regression) of the individual trees is determined. [20]

Algorithms Applied in Unsupervised Learning

K-means (clustering): Clusters are formed by the proximity of data points to the middle of the cluster, i.e., the minimal distance of data points to the center. [21]

DBSCAN (clustering): Clusters points within nearby neighborhoods (high-density regions), outlying those that are comparatively far away (low-density regions). [21]

Algorithms Applied in Reinforcement Learning

Markov decision process: A mathematical approach where the sets of states and rewards are finite. The probability of movement into a new state is influenced by the previous one and the selected action. The likelihood of the transition from state "a" to state "b" is defined together with a reward for taking a particular action. [21]

Q-learning: Discovers an optimum policy that maximizes the reward over all subsequent steps launched from the present state. Here, the agent acts randomly, exploring and discovering new states, or exploits provided information on the possibility of initiating an action in the current state. [20]

The supervised model requires described data for learning; hence an input with extracted features is linked to its output label (Figure 2) [22]. Therefore, after training, the algorithm can make predictions on non-labeled data. The output is generated by data classification or value prediction (Figure 1, Table 2). Classification is based on assigning elements into groups with previously defined features, whereas a value is predicted based on calculations over the training data [15].

Contrarily, in unsupervised learning, the machine tries to find patterns and correlations between examples presented in randomized order that are not labeled, categorized, or classified (Figure 3) [23]. The main unsupervised data mining methods are clustering, association rules, and dimensionality reduction (DR) [24–26]. The difference between clustering and classification is that grouping is not based on predefined features. Consequently, an algorithm must assemble data by characteristics that differentiate them from other groups of objects. Besides data clustering, unsupervised learning allows detecting anomalies, meaning identifying things that lie outside the other data points and differ from them [27]. A clustering example is sketched below.

Association rules mining aims to find common features and dependencies in a large dataset [26]. For example, Scicluna et al. classified sepsis patients based on the association of its endotypes with leukocyte counts and differentials [28]. This study allowed predicting a patient's prognosis and mortality by characterizing blood leucocyte genome-wide expression profiles. Such classification would allow the identification of patient endotypes in clinical practice.
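The following minimal, hypothetical sketch illustrates the clustering idea: k-means groups unlabeled points by similarity alone, with no predefined classes (synthetic data, assumed setup).

```python
# Illustrative unsupervised clustering: k-means discovers groups in unlabeled data.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0, 1, (50, 2)),   # one synthetic group of points
                  rng.normal(5, 1, (50, 2))])  # a second, distant group

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(data)
print(labels[:5], labels[-5:])                 # cluster assignments found by the algorithm
```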

Figure 2. Supervised learning model. In the supervised method, learning begins on a labeled dataset, where input data is linked to its output label. The algorithm is then validated on a different, non-labeled dataset, not presented to the machine previously.

All data types, ranging from MRI scans to digital photographs or speech signals, are usually characterized by high dimensionality [29]. The data dimensions denote the number of features measured for every single observation. DR decreases the number of data features by selecting important attributes or combining traits. Concerning unsupervised learning, DR is used to improve algorithm performance, mainly by employing the bias/variance tradeoff and thus alleviating overfitting [30]. Post-genomic data can serve as a good model for DR. Those data are often high-dimensional, contain more variables than samples, have a high degree of noise, and may include many missing values. The use of unsupervised learning would reduce the number of dimensions (e.g., variables), limiting the data set to only those variables with, e.g., the highest variance [31].
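As an illustration of DR, the sketch below applies principal component analysis (PCA, described further below) to a synthetic "post-genomic" setting with more variables than samples; the toy data are an assumption for illustration, not a real dataset.

```python
# Dimensionality-reduction sketch: PCA keeps a few directions of highest variance.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 500))        # 30 samples, 500 variables
X[:, 0] *= 10                         # inject one high-variance direction

pca = PCA(n_components=5).fit(X)      # 500 dimensions -> 5 principal components
print(pca.transform(X).shape)         # (30, 5)
print(pca.explained_variance_ratio_.round(3))
```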


Figure 3. Unsupervised learning model. The algorithm extracts features itself from the unknown input data without training. Hence the algorithm can cluster data with a similar component, which differentiates the data from other objects or groups.

DR is performed through two categories of methods: feature selection and feature extraction. Feature selection takes a subset of features from the original data that are the most relevant for a specific issue [32]. Feature extraction removes redundant and irrelevant features from the original data set, resulting in more relevant data for analysis [33]. The major difference between these two methods is that feature selection chooses a subset of the original traits, while feature extraction produces new features distinct from the original ones. A wide range of linear and nonlinear DR methods is used to displace excessive features [34].

One of the most broadly used unsupervised learning methods for DR of large-scale unlabeled data is principal component analysis (PCA) [35]. The PCA method's main aim is to determine all uncorrelated features, called principal components (PC). PCA can be used in various applications such as image and speech processing, robotic sensor data, visualization, exploratory data analysis, and as a data preprocessing step before building models [33,35].

Besides supervised and unsupervised models, some models cannot be classified strictly into these categories. In the first one, semi-supervised learning, a labeled training set is supported by an immense amount of unlabeled data during the training process. The main goal of including the unlabeled data in the model is improving the classifier [36]. What is more, it has been shown that using semi-supervised models can improve the generalizability of risk prediction when compared to supervised ones [37]. Another approach, named self-supervised learning, generates supervisory signals automatically from the data itself [38].
It between is achieved these by two presenting methods an is unlabeledthat feature data selection set, hiding chooses part a ofsubset input of signals original fromtraits, the and model, feature and extraction asking the produces algorithm new to features fill in the distinct missing from information the original [39]. ones. Presented A wide methodsrange of eliminate linear and the nonlinear often-occurring DR methods problem, is used which to displace is the lack excessive of an adequate features amount[34]. of labeledOne data.of the They most are broadly especially used useful unsupervised when working learning with method deepslearning for DR of algorithms large-scale andunlabeled are gaining data moreis principal and more component popularity. analysis (PCA) [35]. The PCA method’s main aim is toIn determine the reinforcement all uncorrelated learning method,features thecalled algorithm principal learns components by trial-and-error (PC). PCA process, can be continuallyused in various receiving applications such [40]. as The imageartificial and speech agent reacts processing, to its environment robotic sensorsignals data, representingvisualization, the exploratory environment’s data stateanalysis, (Figure and4 ).a data The p actionsreprocessing performed step before by the building agent influencesmodels [33,35] the state. of the environment. The foremost goal is to make decisions that guarantee the maximum reward. When the machine makes a correct decision, the supervisor Besides supervised and unsupervised models, some models cannot be classified gives a reward for the last taken action in the form of an assessment, for example, 1 for strictly into these categories. In the first one, semi-supervised learning labeled training set



Figure 4. Reinforcement learning model. An agent in its current state performs an action, which influences the state of the environment. The environment gives back information about its changed state to the agent. The supervisor interprets and rates the action, providing a reward for a correct decision.

2.3. Deep Learning

Artificial neural networks (ANNs) are a subset of ML, where the model consists of numerous layers: functions connected just like neurons and acting in parallel. ANNs that contain more than one hidden layer are referred to as "deep" [43]. Deep learning (DL) is built on interlinked multi-level algorithms, creating neural-like networks [6]. In other words, DL is a collection of complex functions that automatically discover relationships in raw data. Such a set is created by extracting higher abstraction from the data [44]. DL can also be categorized into supervised, semi-supervised, unsupervised, as well as reinforcement learning [45].

The main advantage of this method is that DL is capable of feature extraction with no human intervention. DL exploits a structure imitating the neuronal structure of the human brain (Figure 5). The structure consists of one input layer, some hidden layers, and one output layer, wherein neurons (also called nodes) are connected with other layers' neurons. These connections are assigned a weight, which is calculated during the training process. The algorithm has to determine the best approximate output at each layer to get the desired final result [40,44,46].

The most straightforward neural network is called feedforward. The term feedforward means that the information flows from input neurons through some estimation functions to generate the output. This DL operation provides no feedback between layers. Despite that, the backpropagation algorithm is often used with feedforward neural networks.
It is a precise adjustment of neural network weights based on the error rate obtained in the previous training session. It allows for the calculation of the loss function gradient with respect to all the weights in the network. Proper weight tuning reduces the error level and increases the model's reliability by increasing its generalization [47].

One of the most common deep neural networks is the convolutional neural network (CNN). It consists of the convolution layer, the activation layer, the pooling layer, and the fully-connected (classification) layer. The convolution layer comprises filters that extract and expand the number of features (parameters), represented as maps that characterize the input. The activation layer (which is mostly nonlinear) is composed of an activation function; it takes a generated map of features and creates an activation map as its output. Then, the pooling layer is applied to reduce the spatial dimensions, hence improving computational performance and lowering the likelihood of overfitting. Having processed the input through several such sets of layers, the classification occurs. The final output of the CNN, in the form of a vector, serves as the input for the classification layer, where the algorithm produces the classifier, e.g., tumor or normal [40,45,47]. CNNs are used to analyze images, and in medicine they are most helpful, for example, in radiology [40]. There are various other applications [48–51], described in Section 3, Application of Machine Learning in Medicine.
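As a concrete illustration of these four layer types, the PyTorch sketch below stacks a convolution, an activation, a pooling layer, and a fully-connected classification layer producing, e.g., tumor-vs-normal scores. It is a schematic toy under assumed input sizes, not a clinically validated architecture.

```python
# Schematic CNN: convolution -> activation -> pooling -> fully-connected classifier.
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),  # convolution layer: feature maps
            nn.ReLU(),                                  # activation layer
            nn.MaxPool2d(2),                            # pooling layer: spatial reduction
        )
        self.classifier = nn.Linear(8 * 32 * 32, n_classes)  # fully-connected layer

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))  # flattened vector -> class scores

scores = TinyCNN()(torch.randn(4, 1, 64, 64))  # 4 grayscale 64x64 images
print(scores.shape)                            # torch.Size([4, 2])
```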


Figure 5. Layers of a deep feedforward (feedforward neural) network. The term feedforward means that the information flows from the input through some estimation functions to generate the output. Elsewhere exists a feedback neural network, where the information about the output is fed back to the model. Each layer represents a different function: the input layer (L1) with the first function, the hidden layers (L2, L3, ..., LX) with the next functions. The depth of the model reflects the connection chain length. In the last output layer, the output value should match the input value approximated by earlier functions.
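The structure in Figure 5, together with one backpropagation step, can be sketched as follows (PyTorch, synthetic data; the layer widths are arbitrary assumptions): the loss function gradient is computed with respect to every weight, and the weights are tuned to reduce the error.

```python
# Feedforward network plus one training step: forward pass, backpropagation,
# and weight update.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(10, 32), nn.ReLU(),   # input layer (L1) -> first hidden layer (L2)
    nn.Linear(32, 16), nn.ReLU(),   # second hidden layer (L3)
    nn.Linear(16, 1),               # output layer
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x, target = torch.randn(8, 10), torch.randn(8, 1)
loss = nn.functional.mse_loss(model(x), target)  # forward pass: information flows forward only
loss.backward()                                  # backpropagation: gradients for all weights
optimizer.step()                                 # weight tuning reduces the error level
print(float(loss))
```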

Although the simple CNN architecture may look as described, there are many variations and improvements. One of them is the fully convolutional network (FCN), which has convolutional layers instead of fully-connected layers. In contrast to the CNN, the FCN naturally handles inputs of any size and allows for pixel-wise prediction. In order to do this, the FCN yields output with input-like spatial dimensions. Such upsampling can be achieved by using deconvolution layers [52]. Therefore, the FCN is a well-suited option for semantic segmentation, especially in medical imaging. Ronneberger et al. created a U-Net that goes over 2D pictures [53]. The u-shape is created by the upsampling part, where there are many feature channels, which allow the network to propagate context information to higher-resolution layers. The U-Net comprises a contracting and an expanding path. The convolution and the pooling layers in the contracting path extract advanced features and downsize the feature maps. Later, the expansion path, consisting of different convolution ("up-convolution") and upsampling layers, restores the original map size. In addition, after each upsampling, the feature map from the same level of the contracting path is concatenated to provide the feature localization information. At the final layer, a 1x1 convolution is used to map each component feature vector to the desired number of classes. Similar but yet different is the V-Net, presented by Milletari et al. [54]. In comparison to the U-Net, the V-Net learns a residual function at each stage and examines 3D pictures using volumetric filters. Both networks performed outstandingly, being named state-of-the-art in medical image segmentation [55].
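The upsampling mechanism mentioned above can be sketched in one line of PyTorch: a deconvolution (transposed convolution) layer doubles the spatial size of a feature map, the building block FCNs and U-Net use to restore input-like dimensions for pixel-wise prediction (channel counts here are arbitrary assumptions).

```python
# "Up-convolution" sketch: a transposed convolution doubles spatial dimensions.
import torch
import torch.nn as nn

feature_map = torch.randn(1, 8, 16, 16)                     # downsized feature map
up = nn.ConvTranspose2d(in_channels=8, out_channels=4,
                        kernel_size=2, stride=2)            # deconvolution layer
print(up(feature_map).shape)                                # torch.Size([1, 4, 32, 32])
```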

A noteworthy CNN variation is the region-based CNN (R-CNN). R-CNN can find and classify objects in an image by combining proposals of rectangular regions with a CNN. R-CNN is an algorithm that consists of two detection stages. The first stage identifies a subset of regions in the image that may contain an object, explicitly, region proposals. In the second stage, these regions are classified by the CNN layers, outputting the classified region of interest (ROI) and background [56]. This solution is successfully applied to tumor diagnosis from contours [57]. However, there are also more complex R-CNN subtypes. The fast R-CNN is more efficient owing to sharing the computations for overlapping regions. The fast R-CNN differs from R-CNN because, as input, it takes the entire image and a set of object proposals. Then, several convolutional and pooling layers produce the feature map. Given each ROI proposal, the pooling layer extracts a fixed-length feature vector from the feature map. All feature vectors are provided to a combination of fully connected layers and finally split into two output layers [58]. Additionally, even more advanced R-CNNs have been implemented. Instead of using an additional algorithm to generate proposal regions, the faster R-CNN uses a region proposal network. It is made up of convolutional layers and efficiently predicts region proposals, so the calculation is even faster [59]. Moreover, there is another R-CNN variant, namely Mask R-CNN. This complex method extends the faster R-CNN by adding a branch to predict segmentation masks for every ROI, together with the existing branches for classification and bounding-box regression. The mask branch is a small FCN applied to each individual ROI, predicting the segmentation mask in a pixel-to-pixel manner [60].

Other interesting examples of DL methods are the recurrent neural network (RNN) and its variant, the long short-term memory network (LSTM). As distinct from the previously described neural networks, the RNN forms cycles in its structure. Such a network design enables recycling of its limited computational resources, thus performing more complex computations [61]. What is more, by using recurrent connections, a kind of memory is created so that the RNN can learn from the information processed so far [62]. However, RNNs may face the vanishing gradient problem encountered, e.g., during backpropagation [63]. Thus, variations of the RNN were created, like the LSTM. In the LSTM, the recurrent hidden layer of the RNN is replaced by a memory cell. It enables better reproduction of long-time dependencies [62].
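A recurrent layer can be sketched as follows: an LSTM consumes a sequence step by step while carrying a memory cell that preserves long-range dependencies (PyTorch, random toy sequence; the dimensions are illustrative assumptions).

```python
# LSTM sketch: per-step outputs plus a final hidden state and memory cell.
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=6, hidden_size=12, batch_first=True)
sequence = torch.randn(4, 20, 6)           # 4 sequences, 20 time steps, 6 features
outputs, (hidden, cell) = lstm(sequence)   # outputs at every step + memory cell
print(outputs.shape, hidden.shape, cell.shape)
```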

2.4. Machine Learning Process

The very first step of the learning process is data preparation (Figure 6). When working on big datasets, the data will likely be unclean, i.e., incomplete, inconsistent, or corrupt. A good algorithm-based analysis requires a high-quality dataset without any anomalies or duplicates [64]. A good practice is to randomize inputs in order to exclude the influence of order on learning. What is more, it is best to split data into three sets: training data, validation data, and test data [65]. This technique is termed the "lock box approach" and is a very effective practice in the learning process, commonly used in neuroscience [66]. Separate datasets allow tuning some hyperparameters on validation data before testing the algorithm on the other datasets [67]. A sketch of this splitting scheme follows below.

After having the data processed, the next step is selecting the algorithm and the learning model. The most common learning model is the supervised one [64,68]. Sometimes, the choice of an appropriate algorithm and learning scheme depends on the type of data, e.g., categorical or numerical, and on what task needs to be automated. Supervised learning requires labeled data. In the case of an insufficient quantity of labeled data, unsupervised, semi-supervised, or self-supervised learning may be used [36–39,69]. The accuracy, the size of the training data set, the training time, and the number of parameters and features need consideration when selecting the algorithm.

During the training phase, the algorithm processes the training data. The outcome has to match the previously marked output. When a mistake occurs, the model is corrected, and another iteration is tested [70].
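The "lock box approach" can be sketched with scikit-learn: shuffle, then split into training, validation, and test sets, with the test set locked away until the very end (synthetic data; the 60/20/20 ratio is an assumption for illustration).

```python
# Train/validation/test splitting sketch ("lock box approach").
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
X, y = rng.normal(size=(1000, 20)), rng.integers(0, 2, size=1000)

X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=0)    # lock away the test set
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0)       # 0.25 of the 80% -> 20% validation
print(len(X_train), len(X_val), len(X_test))              # 600 200 200
```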


Figure 6. Machine learning process. The machine learning process starts with model build-up. Data need to be preprocessed and split into training, validation, and test sets in this step. The next stage is the training phase, during which parameters are adjusted on the training dataset. Then, during the optimization phase, hyperparameters are tuned on the validation dataset. After the last model adjustments, the trained algorithm processes the final test dataset, and the model performance results are examined.

TheAfter validation having dataset the data is toprocessed, determine the the next best step tuning is selecting of hyperparameters the algorithm during and the learn- theing optimization model. The phase most [6 ,65common]. If the validationlearning model error is is high, the the supervised supervisor one presents [64,68] more. Sometimes, datathe to choice the algorithm of an appropriate and regulates algorithm parameters. and Sometimes, a learning buildingscheme adepends whole new on modelthe type of data, mighte.g., becategorical required. Ifor the numerical, validation and and trainingwhat task sets it with needs the to same be normalautomated. distribution The supervised perform well, then it is likely for our machine learning model to perform effectively also learning requires labeled data. In the case of an insufficient quantity of labeled data, un- on the test set [71]. supervisedThe final phaselearning, is applying semi-supervised, a test set to or the self trained-supervised model andlearning checking may the be perfor-used [36–39,69]. manceThe accuracy, results. A testsize setof mustthe trai containning datadata instancesset, training not presentedtime, and to the the number algorithm of inparameters theand training features and need optimization consideration phase [65 when,66]. Testingselecting the the model algorithm. on the previously applied data canDuring result in a obtainingtraining phase, inflated the performance algorithm scores proceeds [67]. the training data. The outcome has to match the previously marked output. When the mistake occurs, the model is corrected, and another iteration is tested [70]. The validation dataset is to determine the best tuning of hyperparameters during the optimization phase [6,65]. If the validation error is high, the supervisor presents more data to the algorithm and regulates parameters. Sometimes, building a whole new model


2.5. Examples of Machine Learning in Everyday Life

ML is a universal tool applied in many fields, and often we are not even aware that we use it daily. In the cyber-security sector, ML is used to protect the user and becomes more resilient with every known attack. The Ravelin company can detect fraud using an ML algorithm that continually analyzes normal customer behavior [72]. When it spots suspicious signals, like copying and pasting information or resizing the window, the algorithm can block the transaction or flag it for review [73]. AI such as IBM Watson is trained on billions of data artifacts from different sources like blogs. Afterward, the AI concludes the relationships between threats such as malicious files or mistrustful IPs, limiting the time needed for analysis and improving the reaction to a threat by up to 60 times [3].

Considering daily use, Netflix is an excellent example of a successful ML application. Behind its achievement stands personalization, where the platform, based on the user's activity, recommends titles and visuals suited to them. Additionally, it helps the company predict what content is worth investing in [74]. A terrific example of AI in ordinary routine is Waymo's self-driving car, trained on 3D maps that point out information like road profiles, crosswalks, traffic lights, or stop signs [3,74,75]. The sensors and software scan the car's surroundings, and thus it can distinguish and predict traffic users' movement based on their speed and trajectory [76].

3. Application of Machine Learning in Medicine

Techniques based on ML started to step into medicine in the 1970s, but over time, the possibilities for their use began to multiply [77,78]. The first ML-based diagnostic system was approved by the U.S. Food and Drug Administration (FDA) in 2018 [79]. The system implements "in silico clinical trials", which helps develop more efficient clinical trial strategies. It allows investigators to detect safety and effectiveness signals earlier in the new drug development process and contributes to cost reduction [80]. With many hopes and expectations, ML has the capacity to revolutionize many fields of medicine, helping to make faster and more correct decisions and improving current standards of treatment. The potential applications of ML in general medicine are summarized in Table 3.

Table 3. Potential applications of machine learning in medicine.

Branch of Medicine | Application | Description | ML Method | References
Radiology | Image reconstruction | High resolution and quality images | Deep neural network | [81]
Radiology | Image analysis | Faster and more accurate analysis | Convolutional neural network | [82]
Pathology | In silico labeling | No need for cell/tissue staining; faster and cheaper analysis | Deep neural network | [83]
Nephrology | Prediction of organ injury | Detection of kidney injury up to 48 h in advance, which enables early treatment | Deep neural network | [84]
Nephrology | Image analysis and diagnosis | Polycystic kidneys segmentation | Convolutional neural network | [48]
Cardiology | Personalized decision making | Early detection of abdominal aortic aneurysm | Agnostic learning | [85]
Cardiology | Personalized decision making | Improvement of ML techniques for cardiovascular disease risk prediction | Principal component analysis and random forest | [86]
Cardiology | Personalized decision making | Mortality risk prediction model in patients with a heart attack | Decision tree | [87]
Nutrition, Diabetology | Personalized decision making | More accurate, personalized postmeal glucose response prediction | Boosted decision tree | [88]
Transplantology | Computer-aided diagnosis | Estimation of global glomerulosclerosis before kidney transplantation | Convolutional neural network | [49]
Pharmacology | Studying drug mechanisms of action | New mechanisms of antibiotic action | White-box machine learning | [89]
Pharmacology | Predicting compounds reactivity | Automated tool for reactivity screening | Support vector machine | [90]
Pharmacology | Ligands screening | Faster screening of compounds that bind to the target | Support vector machine | [91]
Pharmacology | Compounds screening | Screening for new antibacterial molecules | Deep neural network | [92]
Pharmacology | De novo drug design | Generation of libraries of novel, potentially therapeutic compounds with desired properties | Reinforcement neural network | [93]
Psychiatry | Image analysis and diagnosis | MRI image analysis and fast diagnoses of schizophrenia | Support vector machine | [94]
Neurology | Image analysis and diagnosis | MRI image analysis and diagnoses of autism spectrum disorder | Naïve Bayes, support vector machine, random forest, extremely randomized trees, adaptive boosting, gradient boosting with decision tree base, logistic regression, neural network | [95]
Neurology | Prognosis of the course of the disease | Prediction of progression of disability in multiple sclerosis patients | Decision tree, logistic regression, support vector machine | [96]
Neurology | Diagnosis support | Mild and moderate Parkinson's disease detection and rating | Artificial neural network | [97]
Neurology | Diagnosis support | Blepharospasm detection and rating | Artificial neural network | [98]
Dentistry | Personalized decision making | Determination of optimal bone age for orthodontal treatment | k-nearest neighbors, naïve Bayes, decision tree, neural network, support vector machine, random forest, logistic regression | [99]
Emergency medicine | Personalized decision making | Prediction of septic shock in the emergency department | Support vector machine, gradient-boosting machine, random forest, multivariate adaptive regression splines, least absolute shrinkage and selection operator, ridge regression | [100]
Surgery | Personalized decision making | Prediction of the amount of lost blood during surgery | Random forest | [101]
Infectious diseases | Estimation of epidemic trend | Prediction of the number of confirmed cases, deaths, and recoveries during the coronavirus outbreak | Neural network | [102]
Infectious diseases | The evolutionary history of viruses | Classification of novel pathogens and determination of the origin of the viruses | Supervised learning with digital signal processing (MLDSP) | [103]
Infectious diseases | Diagnoses of infectious diseases | Early diagnoses of COVID-19 | Convolutional neural network, support vector machine, random forest, and multilayer perceptron | [104]
Oncology | Patients screening | Indicating increased risk of colorectal cancer, early cancer detection | Decision tree | [105,106]
Oncology | Cancer research | New cancer driver genes and mutations discovery | Random forest | [107,108]
Oncology | Cancer subtypes classification | Three-level classification model of gliomas | Support vector machine, decision tree | [109]
Oncology | Image analysis and cancer diagnosis | Prediction of gene expression | Deep learning | [50]
Oncology | Improvement of image analysis | Tumor microenvironment components classification in colorectal cancer histological images | 1-nearest neighbor, support vector machine, decision tree | [110]
Oncology | Cancer development prevention | Gut microbiota analysis in search of biomarkers of neoplasms | Convolutional neural network, support vector machine, random forest, and multilayer perceptron | [111]
Oncology | Tolerability of cancer therapies | Identification of microbial signatures affecting gastrointestinal drug toxicity | Naïve Bayes, support vector machine, random forest, extremely randomized trees, adaptive boosting, gradient boosting with decision tree base, logistic regression, neural network | [111,112]
Oncology | Image analysis and prognosis of the course of the disease | Predicting hepatocellular carcinoma patients' survival after tumor resection based on histological slides | Deep learning | [51]
Oncology | Treatment response prediction | Prediction of therapy outcomes in EGFR variant-positive non-small cell lung cancer patients | Deep learning | [113]
Oncology | Image analysis | Tumor microenvironment components identification | Support vector machine | [114]

3.1. Imaging in Medicine

With an increased number of images taken every day, e.g., magnetic resonance imaging (MRI), computer tomography (CT), or X-rays, there is a strong need for a reliable, automated image evaluation tool. An interesting example is a tool created by Kermany et al., which, when adequately trained, has the potential for numerous applications in medical imaging [82]. It uses a neural network to analyze optical coherence tomography (OCT) images of the retina, allowing the diagnosis of macular degeneration or diabetic retinopathy with high accuracy and sensitivity. Moreover, this model could also indicate the cause of bacterial or viral pediatric pneumonia, making it a universal radiological tool. ML also allows creating images with better quality. In reconstructing a noisy image, the automated transform by manifold approximation (AUTOMAP) framework is used to obtain better resolution and quality [81]. As more details can be recognized, the diagnosis can be faster and more accurate.

The accuracy of imaging and its assessment is essential, especially in detecting and diagnosing abnormalities in the development of the fetus. Prenatal diagnosis of fetal abnormalities has markedly benefited from the advances in ML. ML algorithms have been widely used to predict the risk of chromosomal abnormalities (i.e., aneuploidy, trisomy 21) or preterm births. The latest technological advances in ML also improve the diagnosis of fetal acidemia or hypoxia based on CTG analysis [115].

ML also progresses in imaging methods. The in silico staining technique provides an excellent solution to microscopy problems, such as the need for additional staining to visualize some cells or tissue structures [83]. Based on patterns invisible to the human eye, the algorithm can accurately predict the cell nuclei's location and size and cell viability, or recognize neurons among mixed cell populations.

Recent advances in DL-based techniques have enabled reading more information from various images. It is now possible to improve the transplantation process by using a CNN [49]. The approach created by Altini et al. analyzes kidney histological slides and determines the global glomerulosclerosis (the ratio between sclerotic glomeruli and the overall number of glomeruli), which is one of the necessary steps in the pre-transplantation process. By using DL, it can be assessed faster and with high accuracy, and therefore has the potential to quicken the whole transplantation process. Using automatic semantic segmentation in patients with autosomal dominant polycystic kidney disease enables noninvasive disease monitoring [48].

The introduction of the latest ML techniques also enables predicting less obvious information from microscopic section images. Two interesting examples determine RNA expression [50] and predict patient survival after tumor resection [51]. Schmauch et al. created the HE2RNA model, which correctly predicted the transcriptome of different cancer types, detected molecular and cellular modifications within cancer cells, and was able to spatialize genes differentially expressed specifically by T cells or B cells [50]. A different study developed two CNN models that could predict survival from histological slides after the surgical resection of hepatocellular carcinoma. Both models outperformed a composite score incorporating all baseline variables associated with survival [51].

3.2. Personalized Decision Making

Fast and personalized decisions are crucial in almost every field of medical sciences. Moreover, detecting and predicting life-threatening conditions before their full clinical manifestation is a highly significant issue. Cardiology's main goals focus on developing tools predicting cardiovascular disease risk [86] and the mortality rate in heart failure patients [87]. AI can also be applied for prognosis in acute kidney injury [84]. Physicians can be informed about the injury before changes detectable with current methods occur. The AI uses a recurrent neural network trained on big datasets of over 700,000 adult patients. It can predict kidney function deterioration up to 48 h in advance, giving some extra time to improve the patient's condition.

A promising direction for AI is the individualized prediction of genetic disease occurrence based on the patient's genome screening.

Integrating genomic data with parameters such as lifestyle or previous conditions established a tool that may be used in the early screening of abdominal aortic aneurysm [85].

Another useful tool, created for personalized nutrition, processes data (e.g., blood tests, gut microbiome profile, physical activity, or dietary habits) and predicts the postprandial blood glucose level [88]. The evaluation indicated a high correlation between the predicted and measured glycemic response, indicating the high fidelity of the ML application. Such an approach may be the beginning of a personalized nutrition era for programming diets in other metabolic disorders.

As data suggest, the microbiome is strictly related to cancer, affecting the natural course of tumorigenesis. Specific microbial signatures promote cancer development and affect many aspects of cancer therapies, such as the treatment's efficacy or safety. Hence, ML-driven gut microbiota analysis seems to be extremely useful in oncology to prevent cancer development, make an appropriate diagnosis, and finally treat cancer [111].

Early diagnosis is a crucial but often challenging task. Here once again, ML proves useful. It is now possible to detect abnormalities in a patient's handwriting. Using an ANN, the algorithm can determine whether a person may be affected by Parkinson's disease and how much the disease has already developed [97]. Often the symptoms of a particular condition are subtle and therefore difficult to observe. That is the case with blepharospasm, which is caused by orbicularis oculi muscle contractions and, in the most problematic cases, may result in complete closure of the eyelids and blindness. Based on an ANN, AI software was created to aid diagnosis making [98]. It analyzes recorded videos, recognizes facial landmarks, and can detect even subtle blinks and movements around the eye area, which are necessary for diagnosing this dystonia.

3.3. Drug Design

The traditional approach to new drug design is based on numerous wet-lab experiments and is costly and time-consuming. A solution to these problems is to combine traditional synthesis methods with ML techniques [90]. Granda et al. applied an algorithm that analyzes the obtained experimental data and classifies reagents as reactive or non-reactive quickly and with high precision. This approach is the beginning of an automated tool for chemical discovery and can contribute to the development of new therapeutic compounds.

Screening large datasets of compounds to find ligands of target proteins is a very long part of the drug design process, even with ML. A fast compound-screening tool, which uses a traditional support vector machine and a graphics processing unit (GPU), was created to face this challenge. The GPU divides all the data into small parts and analyzes the subsets simultaneously, which shortens screening time; a multi-GPU setup might reduce this time even more [91]. Applying a deep neural network made it possible to screen over 107 million molecules and identify a new antibiotic [92]. This compound, named halicin, differs in structure from previously known antibiotics and exhibits broad-spectrum activity in a mouse model, including against pan-resistant bacteria.

Another big problem in the field of pharmacology is identifying a compound's mechanism of action. Yang et al. proposed a "white-box" ML approach that could identify the mechanisms of action of new drugs and antibiotics, contribute to overcoming antibiotic resistance, and aid the design of new therapeutics [89].
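For the screening step, Stokes et al. trained a directed message-passing neural network [92]; the sketch below swaps in a simpler Morgan-fingerprint random forest to illustrate the same screen-and-rank workflow. The file names and column names are illustrative assumptions.

```python
# Hedged sketch of virtual screening in the spirit of Stokes et al. [92],
# with a fingerprint-based classifier standing in for their graph network.
import numpy as np
import pandas as pd
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestClassifier

def fingerprint(smiles, n_bits=2048):
    """Turn a SMILES string into a fixed-length Morgan fingerprint vector."""
    mol = Chem.MolFromSmiles(smiles)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=n_bits)
    return np.array(fp)

# Assumed training file: columns "smiles" and "inhibits" (growth inhibition 0/1)
train = pd.read_csv("growth_inhibition.csv")
X = np.stack([fingerprint(s) for s in train["smiles"]])
clf = RandomForestClassifier(n_estimators=500).fit(X, train["inhibits"])

# Rank a large virtual library by predicted probability of antibacterial activity
library = pd.read_csv("virtual_library.csv")   # assumed: a "smiles" column
scores = clf.predict_proba(
    np.stack([fingerprint(s) for s in library["smiles"]]))[:, 1]
print(library.assign(score=scores).nlargest(10, "score"))
```

The design choice that matters here is the screen-then-verify loop: the model only prioritizes candidates, and the top-ranked molecules still go to the wet lab for confirmation, as halicin did.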

3.4. Infectious Diseases

Almost all global media in the first part of 2020 were dominated by information about the SARS-CoV-2 outbreak. With a rapidly increasing number of cases and COVID-19-related deaths, there is a strong need for tools to diagnose quickly, estimate epidemic trends, and determine viruses' evolutionary history. Taking all of these needs into account, ML comes in handy. Combined ML techniques, such as neural networks, support vector machines, random forests, and multilayer perceptrons, were used to create a tool for the rapid, early detection of SARS-CoV-2 patients. This algorithm analyzes computed tomography (CT) chest scans together with clinical information such as leucocyte count, symptomatology, age, sex, and travel and exposure history [104].
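The sketch below shows one plausible way to fuse CT-image features with tabular clinical variables in a single network. It illustrates the general class of multimodal models rather than the exact architecture of [104]; all layer sizes and feature counts are assumptions.

```python
# Hedged sketch of a joint image + clinical-data classifier; sizes are assumed.
import torch
import torch.nn as nn
from torchvision import models

class CovidNet(nn.Module):
    """Fuses CT-slice features from a CNN with tabular clinical features."""
    def __init__(self, n_clinical=6):
        super().__init__()
        cnn = models.resnet18(weights=None)
        cnn.fc = nn.Identity()              # keep the 512-d image embedding
        self.cnn = cnn
        self.clinical = nn.Sequential(nn.Linear(n_clinical, 32), nn.ReLU())
        self.head = nn.Linear(512 + 32, 2)  # positive vs. negative

    def forward(self, ct_slice, clinical):
        z = torch.cat([self.cnn(ct_slice), self.clinical(clinical)], dim=1)
        return self.head(z)

model = CovidNet()
ct = torch.randn(4, 3, 224, 224)   # a batch of CT slices
clin = torch.randn(4, 6)           # leucocyte count, age, sex, ...
print(model(ct, clin).shape)       # torch.Size([4, 2])
```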

ML techniques are also beneficial for virologists and epidemiologists. Supervised learning combined with digital signal processing was used for the rapid classification of novel pathogens [103]. The authors created an alignment-free tool that analyzes viral genomic sequences, enabling them to track the evolutionary history of viruses and detect their origin.

Modeling epidemic trends is significant from the point of view of public health and the health care system. A combination of ML algorithms and mathematical models can reliably predict the number of confirmed cases, deaths, and recoveries at the peak of an epidemic several months in advance. What is more, it can estimate the number of additional hospitalizations, which gives hospitals and health care facilities time to prepare [102].
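Returning to the alignment-free classifier [103]: the authors map genomic sequences to numeric signals and apply digital signal processing. The sketch below substitutes a related alignment-free representation, k-mer frequency vectors fed to a linear SVM, to show why no sequence alignment is needed; the toy sequences and labels are assumptions.

```python
# Hedged sketch of alignment-free sequence classification; a k-mer/SVM
# stand-in for the signal-processing approach of [103].
from collections import Counter
from itertools import product
import numpy as np
from sklearn.svm import LinearSVC

K = 4
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]

def kmer_vector(seq):
    """Normalized k-mer frequency vector; no alignment is needed."""
    counts = Counter(seq[i:i + K] for i in range(len(seq) - K + 1))
    total = max(sum(counts.values()), 1)
    return np.array([counts[k] / total for k in KMERS])

# Toy stand-ins; real inputs would be full viral genomes with taxon labels
genomes = ["ATGCGTACGTTAGC" * 50, "TTTTACGCGCGATA" * 50, "ATGCGTACGTAAGC" * 50]
labels = ["lineage_A", "lineage_B", "lineage_A"]

X = np.stack([kmer_vector(g) for g in genomes])
clf = LinearSVC().fit(X, labels)
print(clf.predict(X))
```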

4. Challenges and Prospects

It may seem that the revolution in biotechnology and information technology already enables us to apply these fields in very advanced ways in everyday life. In a sense, this is true, but we must be aware that this revolution is just beginning and will move faster and faster. We must not forget that, in addition to approaching human cognitive skills, machines have a significant advantage over humans: they can be networked. How is this beneficial? Take an "AI doctor" as an example. Networked AI doctors could easily and rapidly exchange information, be updated, and learn from each other. In contrast, it is impossible to update the knowledge of every single human doctor in the world. Sometimes this knowledge might be life-saving, for example, newly discovered symptoms and treatments of a rapidly spreading disease like COVID-19. Therefore, networked AI doctors' abilities can be as valuable as those of numerous experienced human doctors of different specializations.

Some people fear that one mistake of a networked AI doctor could result in fatal consequences for thousands of patients worldwide within a few minutes. However, connected AI doctors could make their own independent decisions while considering the other AI doctors' opinions. For example, a single patient living in a small village in Siberia or Tibet could benefit from comparing diagnoses coming from a thousand AI doctors [116–118]. AI could provide more accurate, faster, and cheaper health care for everyone. This vision is very futuristic but possible.

For now, ML enables human doctors to save time, hospitals to save money, and patients to receive highly personalized and more accurate treatment. However, the progressing implementation of ML in medicine has many technical and ethical limitations. The main technical issue that ML needs to overcome is the number of potential manipulations of input data that can influence the system's decisions. For example, an action as simple as adding a few extra pixels or rotating an image can lead to misdiagnosis, such as the misclassification of a cancer as malignant or benign [79]. Researchers worldwide are deliberately trying to trick trained ML models in various ways in order to improve them [119].
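One well-studied manipulation of this kind is the fast gradient sign method, sketched below on a stand-in model and image: every pixel is nudged slightly in the direction that most increases the model's loss, which can flip the prediction while the change stays invisible to a human. This illustrates the general attack family discussed in [79], not any specific medical system.

```python
# Hedged sketch of a fast-gradient-sign (FGSM) perturbation; the model and
# "scan" are random stand-ins, so outputs are illustrative only.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=None).eval()
image = torch.rand(1, 3, 224, 224, requires_grad=True)   # stand-in "scan"
true_label = torch.tensor([0])                           # e.g., "benign"

# Gradient of the loss with respect to the input pixels
loss = nn.CrossEntropyLoss()(model(image), true_label)
loss.backward()

epsilon = 0.01                                           # imperceptible budget
adversarial = (image + epsilon * image.grad.sign()).clamp(0, 1).detach()

# The two predictions may now disagree despite near-identical inputs
print(model(image).argmax(1), model(adversarial).argmax(1))
```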

The good performance of ML models is strongly connected with the amount of data used in the training process: the larger the dataset, the better the model is trained. This creates a need for a significant amount of good-quality data, which is not always easily accessible. On the other hand, knowing the weaknesses of AI creates an opening for hackers to control it and influence its outcomes. Fortunately, machines do not yet make essential decisions that may affect human health or even life without human supervision.

The introduction of ML into health care requires many ethical and legal issues to be solved [120]. There are reasoned concerns that AI may mimic human biases and have a propensity for discrimination. However, machines would mimic human prejudice and favoritism only if their creators incorporated them into the algorithm or the training data. Another significant threat is the uncontrolled creation of algorithms that perform in an unethical way. Private IT companies that want to produce medical systems will have to balance their profits against patients' well-being. Given the above risks, it will be necessary for governmental authorities to create a legal framework for the approval of ML-based systems and precautions to identify potential mistakes, biases, and abuse.

Nowadays, when AI is all around and humans interact with it regularly, we may begin to perceive a mind in the machines. Recent results suggest that most people report various emotions when interacting with a system that uses ML [121]. The majority of people feel surprised or amazed by AI's extraordinary outputs and its anthropomorphic qualities. However, AI-based systems can also arouse negative emotions such as discomfort, disappointment, confusion, or even fear. One thing is certain: ML models used in health care will need to earn patients' and doctors' trust.

Some may argue that we will never let machines make their own decisions, but we already have in many fields. What is more alarming, many of us do not even know about it. Popular music applications decide which songs or artists to recommend to match our taste, and how often we need a random surprise to stay satisfied with the application. Everything we like and watch, how many times we go back to the same picture, and how much time we spend on particular pictures is analyzed by algorithms. Based on all the gathered information, the algorithms recommend movies, posts, friends, and advertisements. Moreover, the algorithms assess our likelihood of joining a particular group or organization. This sounds scary, but actions performed by machines already influence our decisions and lives daily.

There is no doubt that, if ethically and adequately trained, ML improves medicine and health care. Nevertheless, it also leaves many unanswered questions. Should physicians have better knowledge about the construction and limitations of these tools? Are we able to trust the machines with our health and life? Will we allow the machines to think entirely by themselves? Will algorithms still require medical or bioinformatic supervision? Who will bear the blame for ML mistakes? For now, we are sure that artificial intelligence is not only the future but also the present.

Author Contributions: Conceptualization, S.M. and I.K.; investigation, O.K., A.W. and I.K.; writing—original draft preparation, O.K. and A.W.; writing—review and editing, S.M. and A.M.; visualization, S.M., O.K. and A.W.; supervision, S.M. All authors have read and agreed to the published version of the manuscript. O.K. and A.W. contributed equally. S.M. and I.K. contributed equally.

Funding: This research received no external funding.

Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.

Data Availability Statement: No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Acknowledgments: We would like to thank Poznan University of Medical Sciences (Poznan, Poland) and Greater Poland Cancer Centre (Poznan, Poland) for supporting this work.

Conflicts of Interest: The authors declare no conflict of interest.

References

1. Ernest, N.; Carroll, D. Genetic Fuzzy based Artificial Intelligence for Unmanned Combat Aerial Vehicle Control in Simulated Air Combat Missions. J. Def. Manag. 2016, 6. [CrossRef]
2. Morando, M.M.; Tian, Q.; Truong, L.T.; Vu, H.L. Studying the Safety Impact of Autonomous Vehicles Using Simulation-Based Surrogate Safety Measures. J. Adv. Transp. 2018, 2018, 6135183. [CrossRef]
3. Palmer, C.; Angelelli, L.; Linton, J.; Singh, H.; Muresan, M. Cognitive Cyber Security Assistants–Computationally Deriving Cyber Intelligence and Course of Actions; AAAI: Menlo Park, CA, USA, 2016.
4. Karakaya, D.; Ulucan, O.; Turkan, M. Electronic Nose and Its Applications: A Survey. Int. J. Autom. Comput. 2020, 17, 179–209. [CrossRef]
5. Stulp, F.; Sigaud, O. Many regression algorithms, one unified model: A review. Neural Netw. 2015, 69, 60–79. [CrossRef]
6. Rajkomar, A.; Dean, J.; Kohane, I. Machine learning in medicine. N. Engl. J. Med. 2019, 380, 1347–1358. [CrossRef]
7. Chandrasekaran, V.; Jordan, M.I. Computational and statistical tradeoffs via convex relaxation. Proc. Natl. Acad. Sci. USA 2013, 110, E1181–E1190. [CrossRef]
8. Gahlot, S.; Yin, J.; Shankar, M. Data Optimization for Large Batch Distributed Training of Deep Neural Networks. arXiv 2020, arXiv:2012.09272.
9. Yampolskiy, R.V. Turing Test as a defining feature of AI-completeness. Stud. Comput. Intell. 2013, 427, 3–17. [CrossRef]
10. Aron, J. How innovative is Apple's new voice assistant, Siri? New Sci. 2011, 212, 24. [CrossRef]
11. Soltan, S.; Mittal, P.; Poor, H.V. BlackIoT: IoT Botnet of High Wattage Devices Can Disrupt the Power Grid. In Proceedings of the 27th USENIX Security Symposium, Baltimore, MD, USA, 15–17 August 2018; pp. 33–47.
12. Gudwin, R.R. Evaluating intelligence: A Computational Semiotics perspective. In Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, Nashville, TN, USA, 8–11 October 2000; Volume 3, pp. 2080–2085.
13. Ghahramani, Z. Probabilistic Machine Learning and Artificial Intelligence. Nature 2015, 521, 452–459. [CrossRef]
14. Cramer, J.S. The Origins of Logistic Regression. SSRN Electron. J. 2003, 119, 167–178. [CrossRef]
15. Neelamegam, S.; Ramaraj, E. Classification algorithm in Data mining: An Overview. Int. J. P2P Netw. Trends Technol. 2013, 3, 369–374.
16. Cortes, C.; Vapnik, V. Support-Vector Networks. Mach. Learn. 1995, 20, 273–297. [CrossRef]
17. Zou, K.H.; Tuncali, K.; Silverman, S.G. Correlation and simple linear regression. Radiology 2003, 227, 617–622. [CrossRef]
18. Multiple Linear Regression. In The Concise Encyclopedia of Statistics; Springer: New York, NY, USA, 2008; pp. 364–368.
19. Polynomial Regression. In Applied Regression Analysis; Springer: Berlin/Heidelberg, Germany, 2006; pp. 235–268.
20. Ho, T.K. Random decision forests. In Proceedings of the International Conference on Document Analysis and Recognition (ICDAR), IEEE Computer Society, Montreal, QC, Canada, 14–16 August 1995; Volume 1, pp. 278–282.
21. Schubert, E.; Sander, J.; Ester, M.; Kriegel, H.P.; Xu, X. DBSCAN revisited, revisited: Why and how you should (still) use DBSCAN. ACM Trans. Database Syst. 2017, 42, 1–21. [CrossRef]
22. Libbrecht, M.W.; Noble, W.S. Machine learning applications in genetics and genomics. Nat. Rev. Genet. 2015, 16, 321–332. [CrossRef]
23. Lloyd, S.; Mohseni, M.; Rebentrost, P. Quantum algorithms for supervised and unsupervised machine learning. arXiv 2013, arXiv:1307.0411.
24. Wang, M.; Sha, F.; Jordan, M.I. Unsupervised Kernel Dimension Reduction. Adv. Neural Inf. Process. Syst. 2010, 2, 2379–2387.
25. Caron, M.; Bojanowski, P.; Joulin, A.; Douze, M. Deep Clustering for Unsupervised Learning of Visual Features. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 132–149.
26. Cios, K.J.; Swiniarski, R.W.; Pedrycz, W.; Kurgan, L.A. Unsupervised Learning: Association Rules. In Data Mining; Springer: New York, NY, USA, 2007; pp. 289–306.
27. Bifet, A.; Gavaldà, R.; Holmes, G.; Pfahringer, B. Clustering. In Machine Learning for Data Streams: With Practical Examples in MOA; MIT Press: Cambridge, MA, USA, 2018; pp. 149–163. ISBN 9780262346047.
28. Scicluna, B.P.; van Vught, L.A.; Zwinderman, A.H.; Wiewel, M.A.; Davenport, E.E.; Burnham, K.L.; Nürnberg, P.; Schultz, M.J.; Horn, J.; Cremer, O.L.; et al. Classification of patients with sepsis according to blood genomic endotype: A prospective cohort study. Lancet Respir. Med. 2017, 5, 816–826. [CrossRef]
29. Ayesha, S.; Hanif, M.K.; Talib, R. Overview and comparative study of dimensionality reduction techniques for high dimensional data. Inf. Fusion 2020, 59, 44–58. [CrossRef]
30. Clark, J.; Provost, F. Unsupervised dimensionality reduction versus supervised regularization for classification from sparse data. Data Min. Knowl. Discov. 2019, 33, 871–916. [CrossRef]
31. Handl, J.; Knowles, J.; Kell, D.B. Computational cluster validation in post-genomic data analysis. Bioinformatics 2005, 21, 3201–3212. [CrossRef] [PubMed]
32. Lazar, C.; Taminau, J.; Meganck, S.; Steenhoff, D.; Coletta, A.; Molter, C.; De Schaetzen, V.; Duque, R.; Bersini, H.; Nowé, A. A survey on filter techniques for feature selection in gene expression microarray analysis. IEEE/ACM Trans. Comput. Biol. Bioinform. 2012, 9, 1106–1119. [CrossRef]
33. Yang, J.; Wang, H.; Ding, H.; An, N.; Alterovitz, G. Nonlinear dimensionality reduction methods for synthetic biobricks' visualization. BMC Bioinform. 2017, 18, 47. [CrossRef] [PubMed]
34. Zhu, M.; Xia, J.; Yan, M.; Cai, G.; Yan, J.; Ning, G. Dimensionality Reduction in Complex Medical Data: Improved Self-Adaptive Niche Genetic Algorithm. Comput. Math. Methods Med. 2015, 2015, 794586. [CrossRef] [PubMed]
35. Peng, C.; Chen, Y.; Kang, Z.; Chen, C.; Cheng, Q. Robust principal component analysis: A factorization-based approach with linear complexity. Inf. Sci. 2020, 513, 581–599. [CrossRef]
36. Cheplygina, V.; de Bruijne, M.; Pluim, J.P.W. Not-so-supervised: A survey of semi-supervised, multi-instance, and transfer learning in medical image analysis. Med. Image Anal. 2019, 54, 280–296. [CrossRef]
37. Chi, S.; Li, X.; Tian, Y.; Li, J.; Kong, X.; Ding, K.; Weng, C.; Li, J. Semi-supervised learning to improve generalizability of risk prediction models. J. Biomed. Inform. 2019, 92, 103117. [CrossRef]
38. Chen, L.; Bentley, P.; Mori, K.; Misawa, K.; Fujiwara, M.; Rueckert, D. Self-supervised learning for medical image analysis using image context restoration. Med. Image Anal. 2019, 58, 101539. [CrossRef]
39. Doersch, C.; Zisserman, A. Multi-task Self-Supervised Visual Learning. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2051–2060.
40. Choy, G.; Khalilzadeh, O.; Michalski, M.; Do, S.; Samir, A.E.; Pianykh, O.S.; Geis, J.R.; Pandharipande, P.V.; Brink, J.A.; Dreyer, K.J. Current applications and future impact of machine learning in radiology. Radiology 2018, 288, 318–328. [CrossRef]
41. François-Lavet, V.; Henderson, P.; Islam, R.; Bellemare, M.G.; Pineau, J. An introduction to deep reinforcement learning. Found. Trends Mach. Learn. 2018, 11, 219–354. [CrossRef]
42. Silver, D.; Hubert, T.; Schrittwieser, J.; Antonoglou, I.; Lai, M.; Guez, A.; Lanctot, M.; Sifre, L.; Kumaran, D.; Graepel, T.; et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 2018, 362, 1140–1144. [CrossRef]
43. Hinton, G.E.; Osindero, S.; Teh, Y.W. A fast learning algorithm for deep belief nets. Neural Comput. 2006, 18, 1527–1554. [CrossRef] [PubMed]
44. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [CrossRef]
45. Alom, M.Z.; Taha, T.M.; Yakopcic, C.; Westberg, S.; Sidike, P.; Nasrin, M.S.; Hasan, M.; Van Essen, B.C.; Awwal, A.A.S.; Asari, V.K. A State-of-the-Art Survey on Deep Learning Theory and Architectures. Electronics 2019, 8, 292. [CrossRef]
46. Marblestone, A.H.; Wayne, G.; Kording, K.P. Toward an integration of deep learning and neuroscience. Front. Comput. Neurosci. 2016, 10, 94. [CrossRef] [PubMed]
47. Abiodun, O.I.; Jantan, A.; Omolara, A.E.; Dada, K.V.; Mohamed, N.A.E.; Arshad, H. State-of-the-art in artificial neural network applications: A survey. Heliyon 2018, 4, e00938. [CrossRef] [PubMed]
48. Bevilacqua, V.; Brunetti, A.; Cascarano, G.D.; Guerriero, A.; Pesce, F.; Moschetta, M.; Gesualdo, L. A comparison between two semantic deep learning frameworks for the autosomal dominant polycystic kidney disease segmentation based on magnetic resonance images. BMC Med. Inform. Decis. Mak. 2019, 19, 1–12. [CrossRef]
49. Altini, N.; Cascarano, G.D.; Brunetti, A.; Marino, F.; Rocchetti, M.T.; Matino, S.; Venere, U.; Rossini, M.; Pesce, F.; Gesualdo, L.; et al. Semantic Segmentation Framework for Glomeruli Detection and Classification in Kidney Histological Sections. Electronics 2020, 9, 503. [CrossRef]
50. Schmauch, B.; Romagnoni, A.; Pronier, E.; Saillard, C.; Maillé, P.; Calderaro, J.; Kamoun, A.; Sefta, M.; Toldo, S.; Zaslavskiy, M.; et al. A deep learning model to predict RNA-Seq expression of tumours from whole slide images. Nat. Commun. 2020, 11, 1–15. [CrossRef]
51. Saillard, C.; Schmauch, B.; Laifa, O.; Moarii, M.; Toldo, S.; Zaslavskiy, M.; Pronier, E.; Laurent, A.; Amaddeo, G.; Regnault, H.; et al. Predicting survival after hepatocellular carcinoma resection using deep-learning on histological slides. Hepatology 2020, 72, 2000–2013. [CrossRef] [PubMed]
52. Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 39, 640–651.
53. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Berlin/Heidelberg, Germany, 2015; Volume 9351, pp. 234–241.
54. Milletari, F.; Navab, N.; Ahmadi, S.-A. V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation. In Proceedings of the 2016 Fourth International Conference on 3D Vision (3DV), Stanford, CA, USA, 25–28 October 2016; pp. 565–571.
55. Pinheiro, G.R.; Voltoline, R.; Bento, M.; Rittner, L. V-net and u-net for ischemic stroke lesion segmentation in a small dataset of perfusion data. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Berlin/Heidelberg, Germany, 2019; Volume 11383 LNCS, pp. 301–309.
56. Nugaliyadde, A.; Wong, K.W.; Parry, J.; Sohel, F.; Laga, H.; Somaratne, U.V.; Yeomans, C.; Foster, O. RCNN for Region of Interest Detection in Whole Slide Images; Springer: Cham, Switzerland, 2020; pp. 625–632. ISBN 9783030638221.
57. Brunetti, A.; Carnimeo, L.; Trotta, G.F.; Bevilacqua, V. Computer-assisted frameworks for classification of liver, breast and blood neoplasias via neural networks: A survey based on medical images. Neurocomputing 2019, 335, 274–298. [CrossRef]
58. Girshick, R. Fast R-CNN; IEEE: New York, NY, USA, 2015; ISBN 978-1-4673-8391-2.
59. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [CrossRef] [PubMed]
60. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 386–397. [CrossRef]
61. Kriegeskorte, N.; Golan, T. Neural network models and deep learning. Curr. Biol. 2019, 29, R231–R236. [CrossRef] [PubMed]
62. Lyu, C.; Chen, B.; Ren, Y.; Ji, D. Long short-term memory RNN for biomedical named entity recognition. BMC Bioinform. 2017, 18, 462. [CrossRef] [PubMed]
63. Navamani, T.M. Efficient Deep Learning Approaches for Health Informatics. In Deep Learning and Parallel Computing Environment for Bioengineering Systems; Elsevier: Amsterdam, The Netherlands, 2019; pp. 123–137.
64. Zhang, S.; Zhang, C.; Yang, Q. Data preparation for data mining. Appl. Artif. Intell. 2003, 17, 375–381. [CrossRef]
65. Chicco, D. Ten quick tips for machine learning in computational biology. BioData Min. 2017, 10, 35. [CrossRef]
66. Powell, M.; Hosseini, M.; Collins, J.; Callahan-Flintoft, C.; Jones, W.; Bowman, H.; Wyble, B. I Tried a Bunch of Things: The Dangers of Unexpected Overfitting in Classification. bioRxiv 2016, 119, 456–467. [CrossRef]
67. Boulesteix, A.-L. Ten Simple Rules for Reducing Overoptimistic Reporting in Methodological Computational Research. PLoS Comput. Biol. 2015, 11, e1004191. [CrossRef]
68. Grégoire, G. Simple linear regression. In EAS Publications Series; EDP Sciences: Les Ulis, France, 2015; Volume 66, pp. 19–39.
69. Tarca, A.L.; Carey, V.J.; Chen, X.; Romero, R.; Drăghici, S. Machine Learning and Its Applications to Biology. PLoS Comput. Biol. 2007, 3, e116. [CrossRef] [PubMed]
70. Models for Machine Learning; IBM Developer: Armonk, NY, USA, 2017.
71. Mehta, P.; Bukov, M.; Wang, C.H.; Day, A.G.R.; Richardson, C.; Fisher, C.K.; Schwab, D.J. A high-bias, low-variance introduction to machine learning for physicists. Phys. Rep. 2019, 810, 1–124.
72. Online Payment Fraud. Available online: https://www.ravelin.com/insights/online-payment-fraud#thethreepillarsoffraudprotection (accessed on 19 November 2020).
73. Baker, J. Using Machine Learning to Detect Financial Fraud. Bus. Stud. Sch. Creat. Work 2019, 6. Available online: https://jayscholar.etown.edu/busstu/6 (accessed on 19 November 2020).
74. Wei, J.; He, J.; Chen, K.; Zhou, Y.; Tang, Z. Collaborative Filtering and Deep Learning Based Hybrid Recommendation for Cold Start Problem; IEEE: New York, NY, USA, 2016. [CrossRef]
75. Technology—Waymo. Available online: https://waymo.com/tech/ (accessed on 19 November 2020).
76. Brynjolfsson, E.; Rock, D.; Syverson, C.; Abrams, E.; Agrawal, A.; Autor, D.; Benzell, S.; Gans, J.; Goldfarb, A.; Goolsbee, A.; et al. Artificial Intelligence and the Modern Productivity Paradox: A Clash of Expectations and Statistics; NBER Working Paper Series; National Bureau of Economic Research: Cambridge, MA, USA, 2017.
77. Chu, K.C.; Feldmann, R.J.; Shapiro, B.; Hazard, G.F.; Geran, R.I. Pattern Recognition and Structure-Activity Relation Studies. Computer-Assisted Prediction of Antitumor Activity in Structurally Diverse Drugs in an Experimental Mouse Brain Tumor System. J. Med. Chem. 1975, 18, 539–545. [CrossRef] [PubMed]
78. Shortliffe, E.H. Computer-Based Medical Consultations: MYCIN; Elsevier: New York, NY, USA, 1976; ISBN 978-0444569691.
79. Finlayson, S.G.; Bowers, J.D.; Ito, J.; Zittrain, J.L.; Beam, A.L.; Kohane, I.S. Adversarial attacks on medical machine learning. Science 2019, 363, 1287–1289. [CrossRef] [PubMed]
80. FDA's Comprehensive Effort to Advance New Innovations: Initiatives to Modernize for Innovation. Available online: https://www.fda.gov/news-events/fda-voices/fdas-comprehensive-effort-advance-new-innovations-initiatives-modernize-innovation (accessed on 6 January 2021).
81. Zhu, B.; Liu, J.Z.; Cauley, S.F.; Rosen, B.R.; Rosen, M.S. Image reconstruction by domain-transform manifold learning. Nature 2018, 555, 487–492. [CrossRef] [PubMed]
82. Kermany, D.S.; Goldbaum, M.; Cai, W.; Valentim, C.C.S.; Liang, H.; Baxter, S.L.; McKeown, A.; Yang, G.; Wu, X.; Yan, F.; et al. Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning. Cell 2018, 172, 1122–1131.e9. [CrossRef] [PubMed]
83. Christiansen, E.M.; Yang, S.J.; Ando, D.M.; Javaherian, A.; Skibinski, G.; Lipnick, S.; Mount, E.; O'Neil, A.; Shah, K.; Lee, A.K.; et al. In Silico Labeling: Predicting Fluorescent Labels in Unlabeled Images. Cell 2018, 173, 792–803.e19. [CrossRef] [PubMed]
84. Tomašev, N.; Glorot, X.; Rae, J.W.; Zielinski, M.; Askham, H.; Saraiva, A.; Mottram, A.; Meyer, C.; Ravuri, S.; Protsyuk, I.; et al. A clinically applicable approach to continuous prediction of future acute kidney injury. Nature 2019, 572, 116–119. [CrossRef]
85. Li, J.; Pan, C.; Zhang, S.; Spin, J.M.; Deng, A.; Leung, L.L.K.; Dalman, R.L.; Tsao, P.S.; Snyder, M. Decoding the Genomics of Abdominal Aortic Aneurysm. Cell 2018, 174, 1361–1372. [CrossRef]
86. Jamthikar, A.; Gupta, D.; Khanna, N.N.; Saba, L.; Araki, T.; Viskovic, K.; Suri, H.S.; Gupta, A.; Mavrogeni, S.; Turk, M.; et al. A low-cost machine learning-based cardiovascular/stroke risk assessment system: Integration of conventional factors with image phenotypes. Cardiovasc. Diagn. Ther. 2019, 9, 420–430. [CrossRef]
87. Adler, E.D.; Voors, A.A.; Klein, L.; Macheret, F.; Braun, O.O.; Urey, M.A.; Zhu, W.; Sama, I.; Tadel, M.; Campagnari, C.; et al. Improving risk prediction in heart failure using machine learning. Eur. J. Heart Fail. 2019, 22, 139–147. [CrossRef] [PubMed]
88. Zeevi, D.; Korem, T.; Zmora, N.; Israeli, D.; Rothschild, D.; Weinberger, A.; Ben-Yacov, O.; Lador, D.; Avnit-Sagi, T.; Lotan-Pompan, M.; et al. Personalized Nutrition by Prediction of Glycemic Responses. Cell 2015, 163, 1079–1094. [CrossRef] [PubMed]
89. Yang, J.H.; Wright, S.N.; Hamblin, M.; McCloskey, D.; Alcantar, M.A.; Schrübbers, L.; Lopatkin, A.J.; Satish, S.; Nili, A.; Palsson, B.O.; et al. A White-Box Machine Learning Approach for Revealing Antibiotic Mechanisms of Action. Cell 2019, 177, 1649–1661.e9. [CrossRef] [PubMed]
90. Granda, J.M.; Donina, L.; Dragone, V.; Long, D.L.; Cronin, L. Controlling an organic synthesis robot with machine learning to search for new reactivity. Nature 2018, 559, 377–381. [CrossRef]
91. Jayaraj, P.B.; Jain, S. Ligand based virtual screening using SVM on GPU. Comput. Biol. Chem. 2019, 83, 107143. [CrossRef]
92. Stokes, J.M.; Yang, K.; Swanson, K.; Jin, W.; Cubillos-Ruiz, A.; Donghia, N.M.; MacNair, C.R.; French, S.; Carfrae, L.A.; Bloom-Ackerman, Z.; et al. A Deep Learning Approach to Antibiotic Discovery. Cell 2020, 180, 688–702.e13. [CrossRef]
93. Popova, M.; Isayev, O.; Tropsha, A. Deep reinforcement learning for de novo drug design. Sci. Adv. 2018, 4, eaap7885. [CrossRef]
94. Lei, D.; Pinaya, W.H.L.; Young, J.; van Amelsvoort, T.; Marcelis, M.; Donohoe, G.; Mothersill, D.O.; Corvin, A.; Vieira, S.; Huang, X.; et al. Integrating machining learning and multimodal neuroimaging to detect schizophrenia at the level of the individual. Hum. Brain Mapp. 2019, 41, 1119–1135. [CrossRef]
95. Mellema, C.; Treacher, A.; Nguyen, K.; Montillo, A. Multiple Deep Learning Architectures Achieve Superior Performance Diagnosing Autism Spectrum Disorder Using Features Previously Extracted from Structural and Functional MRI. Proc. IEEE Int. Symp. Biomed. Imaging 2019, 2019, 1891–1895. [CrossRef]
96. Law, M.T.; Traboulsee, A.L.; Li, D.K.; Carruthers, R.L.; Freedman, M.S.; Kolind, S.H.; Tam, R. Machine learning in secondary progressive multiple sclerosis: An improved predictive model for short-term disability progression. Mult. Scler. J. Exp. Transl. Clin. 2019, 5, 2055217319885983. [CrossRef] [PubMed]
97. Cascarano, G.D.; Loconsole, C.; Brunetti, A.; Lattarulo, A.; Buongiorno, D.; Losavio, G.; Di Sciascio, E.; Bevilacqua, V. Biometric handwriting analysis to support Parkinson's Disease assessment and grading. BMC Med. Inform. Decis. Mak. 2019, 19, 252. [CrossRef] [PubMed]
98. Trotta, G.F.; Pellicciari, R.; Boccaccio, A.; Brunetti, A.; Cascarano, G.D.; Manghisi, V.M.; Fiorentino, M.; Uva, A.E.; Defazio, G.; Bevilacqua, V. A neural network-based software to recognise blepharospasm symptoms and to measure eye closure time. Comput. Biol. Med. 2019, 112, 103376. [CrossRef] [PubMed]
99. Kök, H.; Acilar, A.M.; İzgi, M.S. Usage and comparison of artificial intelligence algorithms for determination of growth and development by cervical vertebrae stages in orthodontics. Prog. Orthod. 2019, 20, 41. [CrossRef] [PubMed]
100. Kim, J.; Chang, H.L.; Kim, D.; Jang, D.H.; Park, I.; Kim, K. Machine learning for prediction of septic shock at initial triage in emergency department. J. Crit. Care 2020, 55, 163–170. [CrossRef] [PubMed]
101. Stehrer, R.; Hingsammer, L.; Staudigl, C.; Hunger, S.; Malek, M.; Jacob, M.; Meier, J. Machine learning based prediction of perioperative blood loss in orthognathic surgery. J. Craniomaxillofac. Surg. 2019, 47, 1676–1681. [CrossRef]
102. Liu, Z.; Huang, S.; Lu, W.; Su, Z.; Yin, X.; Liang, H.; Zhang, H. Modeling the trend of coronavirus disease 2019 and restoration of operational capability of metropolitan medical service in China: A machine learning and mathematical model-based analysis. Glob. Health Res. Policy 2020, 5, 1–11. [CrossRef]
103. Randhawa, G.S.; Soltysiak, M.P.M.; El Roz, H.; de Souza, C.P.E.; Hill, K.A.; Kari, L. Machine learning using intrinsic genomic signatures for rapid classification of novel pathogens: COVID-19 case study. PLoS ONE 2020, 15, e0232391. [CrossRef]
104. Mei, X.; Lee, H.-C.; Diao, K.-Y.; Huang, M.; Lin, B.; Liu, C.; Xie, Z.; Ma, Y.; Robson, P.M.; Chung, M.; et al. Artificial intelligence-enabled rapid diagnosis of patients with COVID-19. Nat. Med. 2020, 26, 1224–1228. [CrossRef]
105. Hornbrook, M.C.; Goshen, R.; Choman, E.; O'Keeffe-Rosetti, M.; Kinar, Y.; Liles, E.G.; Rust, K.C. Early Colorectal Cancer Detected by Machine Learning Model Using Gender, Age, and Complete Blood Count Data. Dig. Dis. Sci. 2017, 62, 2719–2727. [CrossRef]
106. Kinar, Y.; Kalkstein, N.; Akiva, P.; Levin, B.; Half, E.E.; Goldshtein, I.; Chodick, G. Development and validation of a predictive model for detection of colorectal cancer in primary care by analysis of complete blood counts: A binational retrospective study. J. Am. Med. Inform. Assoc. 2016, 23, 879–890. [CrossRef] [PubMed]
107. Ellrott, K.; Bailey, M.H.; Saksena, G.; Covington, K.R.; Kandoth, C.; Stewart, C.; Hess, J.; Ma, S.; Chiotti, K.E.; McLellan, M.; et al. Scalable Open Science Approach for Mutation Calling of Tumor Exomes Using Multiple Genomic Pipelines. Cell Syst. 2018, 6, 271–281.e7. [CrossRef] [PubMed]
108. Bailey, M.H.; Tokheim, C.; Porta-Pardo, E.; Sengupta, S.; Bertrand, D.; Weerasinghe, A.; Colaprico, A.; Wendl, M.C.; Kim, J.; Reardon, B.; et al. Comprehensive Characterization of Cancer Driver Genes and Mutations. Cell 2018, 173, 371–385.e18. [CrossRef] [PubMed]
109. Lu, C.F.; Hsu, F.T.; Hsieh, K.L.C.; Kao, Y.C.J.; Cheng, S.J.; Hsu, J.B.K.; Tsai, P.H.; Chen, R.J.; Huang, C.C.; Yen, Y.; et al. Machine learning–based radiomics for molecular subtyping of gliomas. Clin. Cancer Res. 2018, 24, 4429–4436. [CrossRef] [PubMed]
110. Kather, J.N.; Weis, C.A.; Bianconi, F.; Melchers, S.M.; Schad, L.R.; Gaiser, T.; Marx, A.; Zöllner, F.G. Multi-class texture analysis in colorectal cancer histology. Sci. Rep. 2016, 6, 27988. [CrossRef] [PubMed]
111. Cammarota, G.; Ianiro, G.; Ahern, A.; Carbone, C.; Temko, A.; Claesson, M.J.; Gasbarrini, A.; Tortora, G. Gut microbiome, big data and machine learning to promote precision medicine for cancer. Nat. Rev. Gastroenterol. Hepatol. 2020, 17, 635–648. [CrossRef] [PubMed]
112. Louise Pouncey, A.; James Scott, A.; Leslie Alexander, J.; Marchesi, J.; Kinross, J. Gut microbiota, chemotherapy and the host: The influence of the gut microbiota on cancer treatment. Ecancermedicalscience 2018, 12, 868. [CrossRef]
113. Song, J.; Wang, L.; Ng, N.N.; Zhao, M.; Shi, J.; Wu, N.; Li, W.; Liu, Z.; Yeom, K.W.; Tian, J. Development and Validation of a Machine Learning Model to Explore Tyrosine Kinase Inhibitor Response in Patients With Stage IV EGFR Variant–Positive Non–Small Cell Lung Cancer. JAMA Netw. Open 2020, 3, e2030442. [CrossRef]
114. Linder, N.; Konsti, J.; Turkki, R.; Rahtu, E.; Lundin, M.; Nordling, S.; Haglund, C.; Ahonen, T.; Pietikäinen, M.; Lundin, J. Identification of tumor epithelium and stroma in tissue microarrays using texture analysis. Diagn. Pathol. 2012, 7, 22. [CrossRef]
115. Garcia-Canadilla, P.; Sanchez-Martinez, S.; Crispi, F.; Bijnens, B. Machine Learning in Fetal Cardiology: What to Expect. Fetal Diagn. Ther. 2020, 47, 363–372. [CrossRef]
116. Vashist, S.; Schneider, E.; Luong, J. Commercial Smartphone-Based Devices and Smart Applications for Personalized Healthcare Monitoring and Management. Diagnostics 2014, 4, 104–128. [CrossRef] [PubMed]
117. Nedungadi, P.; Jayakumar, A.; Raman, R. Personalized Health Monitoring System for Managing Well-Being in Rural Areas. J. Med. Syst. 2018, 42, 22. [CrossRef] [PubMed]
118. Barrios, M.; Jimeno, M.; Villalba, P.; Navarro, E. Novel Data Mining Methodology for Healthcare Applied to a New Model to Diagnose Metabolic Syndrome without a Blood Test. Diagnostics 2019, 9, 192. [CrossRef] [PubMed]
119. Heaven, D. Why deep-learning AIs are so easy to fool. Nature 2019, 574, 163–166. [CrossRef]
120. Char, D.S.; Shah, N.H.; Magnus, D. Implementing machine learning in health care – addressing ethical challenges. N. Engl. J. Med. 2018, 378, 981–983. [CrossRef]
121. Shank, D.B.; Graves, C.; Gott, A.; Gamez, P.; Rodriguez, S. Feeling our way to machine minds: People's emotions when perceiving mind in artificial intelligence. Comput. Hum. Behav. 2019, 98, 256–266. [CrossRef]