
nanomaterials

Article
Development of Single-Molecule Electrical Identification Method for Cyclic Adenosine Monophosphate Signaling Pathway

Yuki Komoto 1,2 , Takahito Ohshiro 1 and Masateru Taniguchi 1,*

1 Institute of Science and Industrial Research, Osaka University, 8-1 Mihogaoka, Ibaraki, Osaka 567-0047, Japan; [email protected] (Y.K.); [email protected] (T.O.)
2 Artificial Intelligence Research Center, The Institute of Scientific and Industrial Research, Osaka University, 8-1 Mihogaoka, Ibaraki, Osaka 567-0047, Japan
* Correspondence: [email protected]; Tel.: +81-6-6879-8447

Abstract: Cyclic adenosine monophosphate (cAMP) is an important research target because it activates protein kinases, and its signaling pathway regulates the passage of ions and molecules inside a cell. To detect the chemical reactions related to the cAMP intracellular signaling pathway, cAMP, adenosine triphosphate (ATP), adenosine monophosphate (AMP), and adenosine diphosphate (ADP) should be selectively detected. This study utilized single-molecule quantum measurements of these adenosine family molecules to detect their individual electrical conductance using nanogap devices. As a result, cAMP was electrically detected at the single-molecule level, and its signal was successfully discriminated from those of ATP, AMP, and ADP using the developed machine learning method. The discrimination accuracies of a single cAMP signal from AMP, ADP, and ATP were found to be 0.82, 0.70, and 0.72, respectively. These values indicated a 99.9% accuracy when detecting more than ten signals. Based on an analysis of the feature values used for the machine learning analysis, it is suggested that this discrimination was due to the structural difference between the ribose of the phosphate site of cAMP and those of ATP, ADP, and AMP. This method will be of assistance in detecting and understanding the intercellular signaling pathways for small molecular second messengers.

Keywords: DNA; cyclic AMP; single-molecule detection; nanogap; second messenger

Citation: Komoto, Y.; Ohshiro, T.; Taniguchi, M. Development of Single-Molecule Electrical Identification Method for Cyclic Adenosine Monophosphate Signaling Pathway. Nanomaterials 2021, 11, 784. https://doi.org/10.3390/nano11030784

Academic Editors: Kerstin Leopold and Andrew Pike

Received: 12 February 2021
Accepted: 16 March 2021
Published: 19 March 2021

Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Nanomaterials 2021, 11, 784. https://doi.org/10.3390/nano11030784 https://www.mdpi.com/journal/nanomaterials

1. Introduction

Second messengers are important for the transduction of intracellular signals because they are released by the cell in response to exposure to extracellular signaling molecules and trigger physiological changes at the cellular level, such as proliferation, differentiation, migration, survival, apoptosis, and depolarization [1–3]. Therefore, measuring the molecular behavior of second messengers is important for understanding the signal transduction events that occur inside cells. Among these second messengers, cyclic adenosine monophosphate (cAMP) is an important research target because it activates protein kinases, which are used to regulate the passage of ions and small molecules such as Ca²⁺ through ion channels [4–6]. cAMP is produced from adenosine triphosphate (ATP) by the enzyme adenylate cyclase, which is activated by G proteins, and is subsequently hydrolyzed to adenosine monophosphate (AMP) by phosphodiesterases. Because the chemical properties and structures of these related molecules (i.e., ATP, AMP, and adenosine diphosphate (ADP)) are very similar to those of cAMP, cAMP must be selectively detected in order to follow the chemical reactions related to its intracellular signaling pathway.

Fluorescence optical imaging using optically synthesized molecular or protein probes has been used as a selective detection method for cAMP. These probes are composed of two functional parts that selectively bind to cAMP and have the ability to detect a fluorescent or luminescent signal inside cells [7–10]. However, in order to simultaneously detect multiple targets consisting of cAMP-related molecules such as ATP, cAMP, and ADP to understand the signal transduction of the cAMP pathway, each target requires a separate probe. In addition, because these probes are bound to the individual target molecules during recognition, the detection conditions are somewhat different from the physiological conditions, and it is therefore preferable to measure each of these targets directly. Recently, single-molecule detection nanotechnologies, such as nanopore and nanogap devices, have been developed to the point that they are promising approaches with potential applications within the life science fields [11–16]. Among them, one approach is single-molecule quantum detection, which is based on the following detection principle. When sample molecules pass between nanogap electrodes that are used as sensors, a tunnel phenomenon induces an electron transfer through the molecules, resulting in the detection of the individual electrical conductivity due to the electronic state of each of the molecules [14–17]. To date, we have reported the identification of various biomolecules such as DNA, RNA, and amino acids using single-molecule quantum detection methods [18–22].

The current study investigated cAMP and its related molecules using the single-molecule quantum detection method (Figure 1a), and succeeded in electrically detecting cAMP at the single-molecule level. The developed machine learning method was also successfully used to discriminate the cAMP signals from those of ATP, AMP, and ADP (Figure 1b), which are molecules that are similar to cAMP and coexist with it under physiological conditions. The single cAMP signal discrimination accuracies from AMP, ADP, and ATP were found to be 0.82, 0.70, and 0.72, respectively. These values indicated a 99.9% accuracy when detecting more than 10 signals. Furthermore, it was found that the features that significantly contributed to the discrimination by the developed machine learning method were affected by the interaction of the electrodes and target molecules. This result indicated that the structural difference from the ribose of the phosphate site of cAMP enabled it to be distinguished from ATP, ADP, and AMP. Thus, the single-molecule quantum measurement method has shown the potential for identifying a single molecule of cAMP and visualizing the information transmission system in the body that involves cAMP.


Figure 1. (a) Schematic image of single-molecule measurement of a second messenger, e.g., cyclic adenosine monophosphate (cAMP), using nanogap electrodes. (b) Chemical structures of four targeted molecules: cyclic adenosine monophosphate (cAMP), adenosine triphosphate (ATP), adenosine monophosphate (AMP), and adenosine diphosphate (ADP). (c) Device and measurement system. The mechanically controllable break-junction (MCBJ) system was used for the formation of nanogap electrodes.

2. Materials and Methods

2.1. Single-Molecule Electrical Measurement of Sample Nucleotide Solution

5′-Adenylic acid (AMP), 3′,5′-cyclic adenosine monophosphate (cAMP), adenosine 5′-trihydrogen diphosphate (ADP), and adenosine 5′-tetrahydrogen triphosphate (ATP) were purchased from Sigma-Aldrich (St. Louis, MO, USA) and used without further purification. Their chemical structures are shown in Figure 1b. These were used to prepare 10 µM aqueous solutions. The prepared sample solutions were dropped (20 µL for each measurement) at the center of a sensor plate that contained the fabricated nanogap electrodes. The nanogap electrodes were constructed using the mechanically controllable break junction (MCBJ) nanofabrication method (Figure 1c). The procedures for fabricating MCBJs are detailed elsewhere [23–25]. In this study, a nanogap device was fabricated as follows. First, a silicon substrate was electrically insulated by applying a thin polyimide layer. A nano-gold junction was fabricated on the substrate using electron-beam lithography. Next, a SiO2 layer was deposited using chemical vapor deposition. A nanochannel pattern was then superimposed on the nano-gold junctions by electron-beam lithography. This pattern was developed, and dry etching was performed to form nanochannels. A cover made of polydimethylsiloxane (PDMS) was attached to the silicon substrate. The PDMS cover had been prepared in advance with a microchannel connecting the hole for introducing the sample solution to the nanochannel of the sensor. The PDMS was purchased from Toray Dow Corning. Finally, the PDMS cover and silicon substrate were treated with ozone plasma and then bonded. The nanogap electrode devices were used for single-molecule measurements of the sample nucleotide solutions. The gap size was set to 0.65 nm and finely tuned by the piezoelectric element during all the measurements; the gap distance was maintained by feedback control of the piezo actuators. The gap distance was estimated using the following equation for the direct tunneling current:

$$I = \mathrm{const}\cdot\exp\left(-\frac{4\pi}{h}\sqrt{2mw}\,l\right) \tag{1}$$

where h, m, w, and l represent the Planck constant, electron mass, work function of the gold electrode, and gap distance, respectively. An electron mass of 9.1 × 10⁻³¹ kg was used for m, and the work function of Au(111) (5.1 eV) was used for w [19]. After every 1 h period of current–time (I–t) measurements, the MCBJ sample was replaced with a new one. The current across the electrodes was amplified by a custom-built logarithmic current amplifier and recorded at 10 kHz and 100 kHz using an NI PXIe-4081 digital multimeter (National Instruments, Austin, TX, USA) and NI PXI-5922 (National Instruments, Austin, TX, USA) under a DC bias voltage of 100 mV. More than three sets of measurements were performed using different gold-gap sensors.
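To give a feel for Equation (1), the following minimal Python sketch evaluates the exponential prefactor (4π/h)√(2mw) with the values of m and w quoted above, and converts a current ratio into the corresponding change in gap distance. The function names, the value of the Planck constant, and the eV-to-joule conversion are assumptions added for illustration; they are not part of the original analysis code.

```python
import math

# Standard physical constants (SI units); not quoted in the article.
PLANCK_H = 6.62607015e-34      # Planck constant, J*s
EV_TO_JOULE = 1.602176634e-19  # 1 eV in joules

# Parameters quoted in the text.
ELECTRON_MASS = 9.1e-31        # kg
WORK_FUNCTION_EV = 5.1         # eV, Au(111)


def decay_constant_per_nm(mass_kg=ELECTRON_MASS, work_function_ev=WORK_FUNCTION_EV):
    """Return (4*pi/h) * sqrt(2*m*w) from Equation (1), in nm^-1."""
    w_joule = work_function_ev * EV_TO_JOULE
    beta_per_m = 4.0 * math.pi / PLANCK_H * math.sqrt(2.0 * mass_kg * w_joule)
    return beta_per_m * 1e-9


def gap_change_nm(current_ratio, beta_per_nm=None):
    """Gap change l2 - l1 (nm) implied by a current ratio I2/I1 under Equation (1)."""
    if beta_per_nm is None:
        beta_per_nm = decay_constant_per_nm()
    return -math.log(current_ratio) / beta_per_nm


beta = decay_constant_per_nm()
print(f"decay constant: {beta:.1f} nm^-1")
print(f"gap widening for a tenfold current drop: {gap_change_nm(0.1):.3f} nm")
```

Because the unknown constant in Equation (1) cancels in a ratio, the sketch only yields relative gap changes; an absolute gap estimate would additionally need a reference current at a known distance.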

2.2. Single-Pickup and Machine Learning Method for Single Analysis

A signal was judged to start when the current rose above the estimated baseline by more than a threshold of six times the "noise level". The signal was then judged to have ended when, after the start, a data point appeared whose height above the estimated baseline was below the noise level. The baseline was defined as the mode of the histogram of the last 2000 data points at each time. The noise level was defined as the mode of the standard deviation of 200 data points obtained by dividing the latest 10,000 data points at each time into 500 equal parts.

A machine learning analysis was performed using Python 3.6. First, positive and unlabeled data classification (PUC) was performed to remove "blank signals", which were also observed in blank solutions because of the migration of gold atoms from the electrode and contamination, as reported in previous studies [26–30]. The PUC was an appropriate algorithm for blank signal removal. The PUC algorithm is based on the method of Elkan and Noto [31]. The inner classifier was the Gaussian naïve Bayes classifier from the scikit-learn library, version 0.21.3 [32]. After blank signal removal, the signals were trained and classified using supervised machine learning with the random forest classifier from the scikit-learn library, version 0.21.3 [32]. A 10-fold cross-validation was performed, where all the data of the single-molecule signals were partitioned into 10 sub-datasets, and 10 classifications were performed, with each of the sub-datasets used in turn as test data and the other sub-datasets used as training data. The average ratios of the 10 classifications are shown in the confusion matrices.

3. Results

3.1. Single-Molecule Electrical Detection of cAMP, ATP, AMP, and ADP

First, the cAMP solution was investigated using the nanogap electrodes (Figure 1a: schematics). The current–time (I–t) profiles for the blank and cAMP solutions are shown in Figure 2a,b, respectively. Further current–time traces of cAMP measurements are shown in Figure S1 in the Supplementary Materials. The cAMP molecules were detected as characteristic pulse current signals because increases in the tunnel current were induced by the cAMP molecules translocating through the nanogap electrodes. A characteristic current signal was detected, as shown in Figure 2c. The signal's peak value was defined as Ip, and the signal duration time was defined as td. Compared to the cAMP signals in Figure 2b, the average Ip of the signals observed in the blank was lower, decreasing from 33 pA (Figure 2b) to 18 pA (Figure 2a). In addition, the signal frequency decreased significantly (Figure 2d). The duration time of the blank signals was found to be 0.3 ms. In the previous study, the blank signal intensity was around 10 pA, and the duration time was around 0.2 ms. These statistical signal values, i.e., the signal intensities and duration times, were thus very similar to those of the blank signals obtained in buffer aqueous solutions in previous studies [19]. These results suggested that the current signals obtained in the cAMP solution were typical molecular signals for cAMP molecules.
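The signal pick-up rule of Section 2.2, together with the definitions of Ip and td given above, can be sketched in a few lines of Python. This is a simplified illustration, not the authors' code: it assumes that the baseline and the noise level have already been estimated and are constant over the trace, and the function names and the toy trace are hypothetical.

```python
def pick_signals(trace, baseline, noise_level, threshold_factor=6.0):
    """Return (start, end) index pairs of detected signals; end is exclusive.

    A signal starts when the height above the baseline exceeds
    threshold_factor * noise_level, and ends at the first later point whose
    height above the baseline is below noise_level. A signal that is still
    open at the end of the trace is discarded.
    """
    events = []
    start = None
    for i, current in enumerate(trace):
        height = current - baseline
        if start is None:
            if height > threshold_factor * noise_level:
                start = i
        elif height < noise_level:
            events.append((start, i))
            start = None
    return events


def peak_and_duration(trace, event, baseline, sampling_rate_hz):
    """Return (Ip, td): peak height above baseline and duration in seconds."""
    start, end = event
    peak = max(trace[start:end]) - baseline
    duration_s = (end - start) / sampling_rate_hz
    return peak, duration_s


# Toy trace in pA with baseline 0 pA and noise level 1 pA (hypothetical values).
trace = [0.0, 1.0, 0.0, 10.0, 12.0, 11.0, 0.5, 0.0, 9.0, 8.0, 0.0]
events = pick_signals(trace, baseline=0.0, noise_level=1.0)
for event in events:
    ip, td = peak_and_duration(trace, event, baseline=0.0, sampling_rate_hz=10_000)
    print(event, ip, td)
```

In a real measurement the baseline and noise level drift, so they would be re-estimated from a trailing window at each time point, as the text describes.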

Figure 2. Current–time (I–t) profiles for signals obtained in water (a) and cAMP aqueous solution (b). (c) Enlargement of signal shown in (b) obtained in cAMP aqueous solution. (d) Comparison of signal frequency values for blank and cAMP. The signal frequencies are clearly different, with the signal frequency of the signals obtained in the cAMP aqueous solution (right: cAMP) significantly larger than that of the signals obtained in water (left: blank).

Next, single-molecule measurements were made of ATP, AMP, and ADP. To evaluate the differences among these four types of molecular signals, the peak current (Ip) values were statistically compared. Figure 3a–d show the current signal histograms. The average Ip values of cAMP, AMP, ADP, and ATP at a bias of 100 mV were 32 ± 18, 27 ± 13, 41 ± 21, and 36 ± 15 pA, respectively. These results demonstrated that the differences in the conductance values were small. Similarly, the duration (td) values were statistically compared. Figure 3e–h show the signal duration time histograms. The average signal duration times were 4.6, 4.6, 4.1, and 5.3 ms for cAMP, AMP, ADP, and ATP, respectively. These td values were also similar, although the ATP signals had a slightly longer average td. Thus, statistical analyses of these signals showed that a single feature parameter (i.e., Ip or td) would not be sufficient for their discrimination. However, multiple feature parameter analysis has potential for improving the discrimination accuracy of cAMP, ATP, AMP, and ADP. Figure 3i–l show two-dimensional plots of Ip and td, and a dependency on the sample species appears in these plots. The histograms of cAMP and AMP are similar, but a difference was observed, as shown in Figure S2 in the Supplementary Materials. These results indicated that information about the entire waveforms of the molecular signals, or several characteristic feature parameters extracted from the signals, is needed for accurate discrimination.

Figure 3. Statistical analysis results for obtained signals. In these analyses, at least 1000 signals were utilized. The occurrence in the histogram represents normalized signal numbers. (a–d) Current histograms at bias voltage of 100 mV for cAMP (a), AMP (b), ADP (c), and ATP (d). (e–h) Duration-time histograms of cAMP (e), AMP (f), ADP (g), and ATP (h). (i–l) Two-dimensional plots of Ip and td for cAMP (i), AMP (j), ADP (k), and ATP (l), and dependency of sample species.

3.2. Discrimination of cAMP, ATP, AMP, and ADP Using Machine Learning Method

To improve the discrimination of cAMP, ATP, AMP, and ADP using a multiple feature parameter analysis, a machine learning-based classification method was applied, with multidimensional feature parameters prepared using each of the individual signals. The scheme of this machine learning method is shown in Figure 4a,b. In the first step, each signal was obtained by the single-molecule electrical measurement. In the second step, the single-molecule signal was extracted based on the criteria described in the materials and methods section. This study used the following 13 feature parameters: the peak value of the signal current (Ip), average signal value (Iave), duration time (td), and ten-dimensional shape factors (Sn = In/Ip (n = 1, 2, …, 10)). The values of Sn were defined as follows. The current–time (I–t) profile of a signal region was separated into ten time regions, and then the average current value (blue circles) for each of the time regions (I1, I2, I3, I4, I5, I6, I7, I8, I9, and I10) was calculated (Figure 4c). Each of these current values was normalized by the maximum current value of the signal (Ip), and the values in each region were defined as the shape factors, i.e., S1, S2, S3, S4, S5, S6, S7, S8, S9, and S10. In the third step, each of the extracted signals was converted into feature parameter data (Ip, Iave, td, Sn (n = 1, 2, …, 10)). The dataset of the feature parameter data was split into two datasets: the training data and test data. In the fourth step, a machine learning classifier learned the feature parameters from the training data, as shown in Figure 4b. In the fifth step, the classifier was utilized to identify each of the chemical species in the test data signals for cAMP, ATP, AMP, and ADP, and the discrimination accuracies were evaluated as F-measure values. In the final step, the accuracy was validated by performing a 10-fold cross-validation, as described in the materials and methods section.
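The 13 feature parameters defined above (Ip, Iave, td, and S1 to S10) can be computed from one extracted signal as in the following minimal sketch. It assumes that the signal is a list of baseline-subtracted current samples with at least ten points and a positive peak; the function name and the equal-width index split into ten time regions are illustrative assumptions rather than the authors' implementation.

```python
def extract_features(signal, sampling_rate_hz):
    """Return [Ip, Iave, td, S1, ..., S10] for one baseline-subtracted signal."""
    n = len(signal)
    if n < 10:
        raise ValueError("need at least ten samples to form ten time regions")
    i_p = max(signal)                # peak current Ip
    if i_p <= 0:
        raise ValueError("peak current must be positive")
    i_ave = sum(signal) / n          # average current Iave
    t_d = n / sampling_rate_hz       # duration time td in seconds
    shape_factors = []
    for k in range(10):
        lo = k * n // 10
        hi = (k + 1) * n // 10
        region = signal[lo:hi]
        region_mean = sum(region) / len(region)   # In for this time region
        shape_factors.append(region_mean / i_p)   # Sn = In / Ip
    return [i_p, i_ave, t_d] + shape_factors


# Toy signal: a linear ramp of 20 samples recorded at 10 kHz (hypothetical data).
features = extract_features([float(v) for v in range(1, 21)], sampling_rate_hz=10_000)
print(len(features), features[:3])
```

Normalizing by Ip makes the shape factors describe only the waveform, so two signals with the same shape but different amplitudes share the same S1 to S10.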

Figure 4. (a) Flow of data acquisition and analysis. (b) Scheme for machine learning analysis. (c) Definitions of the 13 feature parameters of signals used for machine learning analysis. These 13 feature parameters included the following: the peak value of the current signal (Ip), average signal value (Iave), duration time (td), and ten-dimensional shape factors (Sn = In/Ip (n = 1, 2, …, 10)). The average current value (blue circle) for each of the time regions (I1, I2, I3, I4, I5, I6, I7, I8, I9, and I10) was calculated. Each of the current values was normalized by the maximum current value of the signal (Ip), and the values in each region were defined as shape factors (S1, S2, S3, S4, S5, S6, S7, S8, S9, and S10). (d) Evaluation results for discrimination accuracies of cAMP, ADP, AMP, and ATP. To calculate the confusion matrix for the signals, at least one thousand signals were obtained from the experiments, and all the signals were used for the evaluation. The prediction ratios in the confusion matrix represent the numbers of signals predicted to be the molecules normalized by the total signals of the true molecules. The diagonal terms of the matrix indicate the discrimination accuracies of the molecules. The classifier was trained with 10,000 signals.
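The bookkeeping behind the confusion matrices, i.e., the partition into 10 sub-datasets and the normalization of each row by the number of signals of the true molecule, can be sketched as follows. The classifier itself is deliberately left out; the study used the random forest classifier from scikit-learn, whereas this sketch uses only the Python standard library. The function names and the toy labels are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def normalized_confusion_matrix(true_labels, predicted_labels, classes):
    """Rows: true class; columns: predicted class; each row sums to 1."""
    counts = {t: {p: 0 for p in classes} for t in classes}
    for t, p in zip(true_labels, predicted_labels):
        counts[t][p] += 1
    matrix = []
    for t in classes:
        total = sum(counts[t].values())
        matrix.append([counts[t][p] / total if total else 0.0 for p in classes])
    return matrix


# Each fold serves once as test data; the remaining folds form the training data.
folds = k_fold_indices(25, k=10)
for test_fold in folds:
    train = [i for fold in folds if fold is not test_fold for i in fold]
    assert not set(train) & set(test_fold)

# Toy predictions for two species (hypothetical labels).
classes = ["cAMP", "AMP"]
true_labels = ["cAMP", "cAMP", "AMP", "AMP"]
predicted_labels = ["cAMP", "AMP", "AMP", "AMP"]
matrix = normalized_confusion_matrix(true_labels, predicted_labels, classes)
print(matrix)
```

The diagonal entries of the row-normalized matrix are the per-molecule discrimination accuracies referred to in the caption; averaging the matrices over the 10 folds gives the reported ratios.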

These feature parameters (Ip, Iave, td, and Sn) were used to perform classifications with the supervised machine learning method for all the different discrimination combinations of cAMP, ADP, AMP, and ATP. The evaluation results are shown in Figure 4d. The discrimination accuracy value of 0.28 for cAMP was poor, while those for the ADP, AMP, and ATP signals showed high accuracy.

In single-molecule detection, the obtained signals include not only the target sample signals but also signals generated by fluctuations of the molecule or the electrode, as shown in Figure 2a,d. We call these signals blank signals. The blank signals are not Johnson (thermal) noise or shot noise of the measuring instruments generated by the electrical measurement, but actual current signals. In previous reports, these blank signals adversely affected the discrimination of molecules [26,27,29], so the reduction of blank signals is a key step for discrimination with high accuracy. To address the issue of the blank signals, the signals of the sample targets, i.e., cAMP, ADP, AMP, and ATP, had to be extracted from the obtained signals.

In order to remove the blank signals, the PUC method was applied to this signal discrimination. In the PUC method, the signals obtained from the blank solution (control experiments) were used as "positive" signals. On the other hand, the signal data of the sample solution were regarded as "unlabeled" data because the signals obtained from the sample solution contained both blank signals and the sample molecule signals. The scheme is shown in Figure 5a,b. In the first step, single-molecule measurements were performed for the solvent (blank). The signal data of the blank were regarded as positive data. In the second step, each signal was converted into feature parameter values (Ip, Iave, td, and Sn). In the third step, the machine learning classifier learned the feature parameters of the blank signals. In the fourth step, single-molecule measurements were performed on the sample solutions, and these signal data were regarded as unlabeled data. In the fifth step, the sample molecule signals were discriminated from the blank signals using the machine learning classifier of blank signals learned in the third step. In the final step, the extracted sample data (cAMP, ADP, AMP, and ATP) were used for the subsequent classification by the supervised machine learning method.
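The blank-removal steps above can be illustrated with a small positive-unlabeled sketch in the spirit of the Elkan and Noto method. It is a toy version under several assumptions: only one feature (a peak current) is used, a hand-written one-feature Gaussian naive Bayes model stands in for the scikit-learn classifier, and the data are synthetic values drawn from hypothetical distributions. The function names are illustrative.

```python
import math
import random
import statistics


def gaussian_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def fit_labeled_vs_unlabeled(positive_train, unlabeled):
    """Return g(x) = P(labeled | x) from a one-feature Gaussian naive Bayes fit."""
    pos_mean, pos_std = statistics.mean(positive_train), statistics.pstdev(positive_train)
    unl_mean, unl_std = statistics.mean(unlabeled), statistics.pstdev(unlabeled)
    prior_pos = len(positive_train) / (len(positive_train) + len(unlabeled))

    def g(x):
        p_pos = prior_pos * gaussian_pdf(x, pos_mean, pos_std)
        p_unl = (1.0 - prior_pos) * gaussian_pdf(x, unl_mean, unl_std)
        return p_pos / (p_pos + p_unl)

    return g


def blank_probabilities(blank_signals, sample_signals, holdout_fraction=0.25):
    """Estimate P(blank | x) for every sample signal as g(x) / c."""
    n_holdout = max(1, int(len(blank_signals) * holdout_fraction))
    train, holdout = blank_signals[:-n_holdout], blank_signals[-n_holdout:]
    g = fit_labeled_vs_unlabeled(train, sample_signals)
    # c estimates P(labeled | blank) from held-out blank signals.
    c = statistics.mean([g(x) for x in holdout])
    return [min(1.0, g(x) / c) for x in sample_signals]


# Synthetic peak currents in pA (hypothetical distributions, not measured data).
rng = random.Random(0)
blank = [rng.gauss(18.0, 4.0) for _ in range(200)]            # blank solution ("positive")
sample = ([rng.gauss(18.0, 4.0) for _ in range(100)]          # blank signals in the sample
          + [rng.gauss(35.0, 6.0) for _ in range(100)])       # molecule signals in the sample
probs = blank_probabilities(blank, sample)
kept = [x for x, p in zip(sample, probs) if p < 0.5]          # signals kept as molecule signals
print(len(kept), "of", len(sample), "sample signals kept")
```

The division by c is what distinguishes this from an ordinary two-class fit: the unlabeled set itself contains blank signals, so the raw score g(x) underestimates the probability that a signal is a blank signal.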

Figure 5. (a) Flow of data acquisition and analysis with positive and unlabeled data classification (PUC) method. In the PUC method, the signals obtained from the blank solution (control experiments) were used as positive signals. On the other hand, the signal data of the sample solution were regarded as unlabeled data because these signals were composed of both blank signals and the sample molecule signals. (b) Schematic of machine learning analysis using PUC method. (c) Evaluation results for discrimination accuracies of cAMP, ADP, AMP, and ATP using the PUC method. Compared to the discrimination accuracy value of 0.25 for cAMP without the PUC method, the average discrimination accuracy was also improved to 0.50. The classifier was trained with 1000 signals. (d) Relation between probability of accurate prediction by majority vote and number of signals. The accuracy was set to 0.5 for four classes. (e–j) Evaluation of discrimination accuracy of cAMP from other molecules, i.e., AMP, ATP, and ADP. The two-target discrimination accuracy values were found to be 0.70 for AMP/cAMP (e), 0.69 for ADP/cAMP (f), 0.75 for ATP/cAMP (g), 0.69 for AMP/ATP (h), 0.74 for ATP/AMP (i), and 0.69 for ADP/ATP (j).

Figure 5c shows the evaluation results for the cAMP, ADP, AMP, and ATP discrimination accuracies using the PUC method. Compared to the discrimination accuracy value of 0.25 for cAMP without the PUC method (Figure 4d), the discrimination accuracy of cAMP was significantly improved to 0.44. Together with the other discrimination accuracies of 0.50 for AMP, 0.49 for ADP, and 0.59 for ATP, the average discrimination accuracy was also improved to 0.50. For instance, when more than ten signals with a discrimination accuracy of 0.50 were obtained, a 99% accuracy is obtained (Figure 5d). In the case of identification with only two types, the discrimination accuracy values were found to be 0.70 for AMP/cAMP, 0.69 for ADP/cAMP, 0.75 for ADP/cAMP, 0.69 for AMP/ATP, 0.74 for ATP/cAMP, and 0.69 for ADP/ATP (Figure 5e–j). The obtained accuracy represents that the specific molecular species were identified with a certain accuracy for each signal.
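The way a majority vote over several signals raises the identification probability can be explored with a short Monte Carlo sketch. It assumes that every signal is classified independently with the same per-signal accuracy, that wrong calls are spread uniformly over the remaining classes, and that ties are broken at random. These assumptions, the function name, and the trial count are illustrative, so the printed values need not reproduce Figure 5d.

```python
import random
from collections import Counter


def majority_vote_accuracy(n_signals, per_signal_accuracy=0.5, n_classes=4,
                           n_trials=20000, seed=0):
    """Estimate the probability that a majority vote names the true class.

    Assumptions: independent signals, wrong calls spread uniformly over the
    other classes, ties for the top vote count broken at random.
    """
    rng = random.Random(seed)
    wrong_classes = list(range(1, n_classes))  # class 0 is the true class
    correct = 0
    for _ in range(n_trials):
        votes = Counter()
        for _ in range(n_signals):
            if rng.random() < per_signal_accuracy:
                votes[0] += 1
            else:
                votes[rng.choice(wrong_classes)] += 1
        top = max(votes.values())
        winners = [c for c, v in votes.items() if v == top]
        if rng.choice(winners) == 0:
            correct += 1
    return correct / n_trials


for n in (1, 5, 10, 20, 40):
    print(n, round(majority_vote_accuracy(n), 3))
```

Under these assumptions the estimated probability rises with the number of signals, which is the qualitative behavior described for Figure 5d.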

Therefore, in order to determine the ratio of sample molecules in a mixed solution, the discrimination error of each signal can be determined from the obtained confusion matrix, such as that in Figure 5c. In previous reports, we discriminated ethylated from guanosine and estimated the mixture ratio [27], and also identified each sample in a mixed solution containing , , and noradrenaline and estimated a mixture ratio comparable to that of the prepared solution [26]. Therefore, this method will be applicable to understanding cAMP-related reactions, such as enzymatic reactions, at the single-molecule level. However, we also found that the determined mixture ratio deviates somewhat depending on the sample species. Since the discrimination errors were due to the difference between the concentration of the bulk solution and that near the sensor gap, further development of an active mass transport function in solution, such as electrophoresis [30], would be needed.
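One way a confusion matrix can be used to correct an observed mixture ratio is sketched below. If confusion[i][j] is the probability that a signal of true class i is predicted as class j, the predicted fractions are a linear mixture of the true fractions, and the true fractions follow from solving that linear system. The two-class matrix and the mixture values are hypothetical, and this is an illustration rather than the procedure of [26,27].

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def estimate_mixture(confusion, predicted_fractions):
    """Recover true class fractions from predicted class fractions.

    confusion[i][j] = P(predicted class j | true class i), so
    predicted_j = sum_i true_i * confusion[i][j].
    """
    n = len(confusion)
    transposed = [[confusion[i][j] for i in range(n)] for j in range(n)]
    return solve_linear(transposed, predicted_fractions)


# Hypothetical two-class example (rows: true cAMP, true ATP).
confusion = [[0.75, 0.25],
             [0.25, 0.75]]
true_mix = [0.3, 0.7]
observed = [sum(true_mix[i] * confusion[i][j] for i in range(2)) for j in range(2)]
estimated = estimate_mixture(confusion, observed)
print("observed:", observed, "estimated:", [round(x, 3) for x in estimated])
```

The correction amplifies counting noise when the confusion matrix is close to singular, that is, when the per-signal accuracy approaches chance level.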

4. Discussion

This molecular recognition method using machine learning (Figure 5a,b) had at least two advantages compared to the previous discrimination method, which analyzed only a single parameter such as Ip or td. The first advantage was the reduction of the blank signals using the PUC method before signal discrimination. The discrimination accuracy was significantly improved after PUC (Figure 5c) compared to that before PUC (Figure 4d). In addition, after PUC, the average Ip values of the cAMP, AMP, ADP, and ATP signals were 120 ± 18 pA, 100 ± 24 pA, 104 ± 13 pA, and 148 ± 22 pA, respectively. Because the intensity of blank signals is known to be small, the difference in the currents with and without PUC was attributed to the blank signal reduction achieved by the PUC method. Therefore, the PUC method allowed the target signals to be extracted from the obtained signals, which made it possible to better detect slight differences in the chemical structures compared to conventional methods.

The second advantage was that by using multiple parameters (Ip, Iave, td, and Sn (n = 1, 2, ..., 10)) for each of the signals, this analysis significantly improved the discrimination accuracy and the understanding of the differences in the physical behaviors of each molecule. In this analysis, the maximum current value (Ip), average current value (Iave), signal duration (td), and signal shape factor of each region (Sn) were used for supervised machine learning. A comparison of several classification algorithms, including support vector machine (SVM) and extreme gradient boosting (XGBoost), showed that the random forest algorithm provided the best discrimination accuracy (Figure 6a). In an ensemble learning method such as random forest, some feature parameters are randomly selected for each classification tree, and then classification is performed using the ensemble. If only a specific feature parameter is important for discrimination, while the other parameters are not, the identification accuracy of a random forest becomes low. Therefore, the good discrimination accuracy obtained with the random forest method indicated that all the prepared features used here were involved in the discrimination.

Next, to evaluate each feature parameter’s contribution to the discrimination, the discrimination accuracy was determined after changing the combination of feature parameters used. Among the four sets of parameter combinations, the best combination was found to be the full set of the maximum current value (Ip), average current value (Iave), signal duration (td), and shape factors (Sn). This confirmed that all of the prepared features used here were useful for the discrimination. Importantly, among the three types of features, (Ip, Iave), (Ip, Iave, td), and the shape factors (Sn), the shape factors were found to have the highest discrimination accuracy of 0.44, compared to 0.40 for the maximum and average values of the current and the signal duration (Ip, Iave, td), and 0.39 for the maximum and average values of the current (Ip, Iave) (Figure 6b). It is suggested that the shape factors were important because the translocation behaviors inside the nanogap electrodes differed depending on the size and polarization of the molecule, resulting in characteristic signal behaviors.
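The comparison of feature sets can be mimicked on synthetic data. In the sketch below, a nearest-centroid classifier stands in for the random forest, only one shape factor (S1) is included, and the class means are invented so that S1 separates the two classes best. None of the numbers relate to the measured signals; the point is only the procedure of re-evaluating accuracy for each feature subset.

```python
import random


def nearest_centroid_accuracy(train, test, feature_names):
    """Accuracy of a nearest-centroid classifier restricted to `feature_names`."""
    centroids = {}
    for label in {lab for _, lab in train}:
        rows = [x for x, lab in train if lab == label]
        centroids[label] = {
            f: sum(r[f] for r in rows) / len(rows) for f in feature_names
        }
    hits = 0
    for x, label in test:
        predicted = min(
            centroids,
            key=lambda c: sum((x[f] - centroids[c][f]) ** 2 for f in feature_names),
        )
        hits += predicted == label
    return hits / len(test)


def make_signals(label, means, n, rng):
    """Draw n synthetic feature vectors with unit-variance Gaussian features."""
    return [({f: rng.gauss(mu, 1.0) for f, mu in means.items()}, label)
            for _ in range(n)]


rng = random.Random(0)
means_a = {"Ip": 0.0, "Iave": 0.0, "td": 0.0, "S1": 0.0}  # hypothetical class A
means_b = {"Ip": 1.0, "Iave": 0.5, "td": 0.2, "S1": 1.5}  # hypothetical class B
train = make_signals("A", means_a, 300, rng) + make_signals("B", means_b, 300, rng)
test = make_signals("A", means_a, 300, rng) + make_signals("B", means_b, 300, rng)

feature_sets = {
    "Ip, Iave": ["Ip", "Iave"],
    "Ip, Iave, td": ["Ip", "Iave", "td"],
    "S1": ["S1"],
    "all": ["Ip", "Iave", "td", "S1"],
}
scores = {name: nearest_centroid_accuracy(train, test, names)
          for name, names in feature_sets.items()}
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

With real data, each subset would be evaluated with the same classifier and cross-validation scheme as the full feature set so that the accuracies remain comparable.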

Figure 6. (a) Comparison of F-measure values for support vector machine (SVM), extreme gradient boosting (XGBoost), and random forest. The hyperparameters of SVM were determined by a grid search to have the best F-measure. All the algorithms succeeded in identifying with an F-measure of 0.42 or higher. Among several classification algorithms, including SVM and XGBoost, the random forest algorithm had the best discrimination accuracy. Its good discrimination accuracy indicated that all the prepared features (Ip, Iave, td, S1, S2, S3, S4, S5, S6, S7, S8, S9, and S10) used here were involved in the discrimination. An evaluation was made of the contributions to the discrimination of the different feature parameters. (b) Comparison of four sets of feature parameters, i.e., the maximum current value (Ip), average current value (Iave), td, and shape factors (Sn). Among the four sets of the parameter combinations, the best combination was found to be maximum current value, average current value, and shape factors (Ip, Iave, td, Sn). This confirmed that all the prepared features used here were important for the discrimination. In addition, among three sets of feature parameters, Ip, td, and the shape factors, the shape factors (Sn) were found to provide the highest discrimination accuracy. This suggested that the shape factors were important. (c) Estimation of the importance of each feature parameter. The random forest classifier could evaluate the importance of each feature, which was an index of how much the discrimination accuracy decreased when each of the feature parameter was omitted. Among these indexes, two feature parameters, i.e., the maximum current value (Ip) and first shape factor (S1), were found to be the most important.

Finally, the importance of each feature parameter was estimated to quantitatively investigate which feature was more effective for discrimination. The random forest classifier could evaluate the importance of each feature [33], which was an index of how much the discrimination accuracy decreased when that feature parameter was omitted. Figure 6c shows the importance index for each feature parameter. Among these indexes, the two most important feature parameters were found to be the maximum current value (Ip) and first shape factor (S1).

For the maximum current value (Ip), an increase in the signal current was induced by electron transmission via a single molecule at the nanogap electrodes. It is well known that the current intensity is proportional to the transmission of a single-molecule junction, τ, as represented in the following equation [34–36].

$$\tau = \frac{4\Gamma_L \Gamma_R}{\varepsilon^2 + (\Gamma_L + \Gamma_R)^2} \qquad (2)$$

Here, ε is the conductive molecular orbital alignment from the gold Fermi level, and ΓL and ΓR denote the coupling values between the molecule and the left and right electrodes, respectively. A change in the molecular orbital position causes a change in the single-molecule conductance. It has been previously reported that the difference in conductivity is closely related to the highest occupied molecular orbital (HOMO) level [15,16,19,26,37,38]. In this study, all of the molecules investigated (cAMP, ATP, ADP, and AMP) have an adenine structure, which is the main local electron density site and related to the HOMO level of the molecule. The electron density of the structure is influenced by the position of the phosphate group, which is an electron-withdrawing functional group. For instance, the phosphate group of cAMP approaches the vicinity of adenine because of two covalent bonds between the phosphate and ribose. Thus, differences in the sizes of the molecules reflect those at each of the HOMO levels, resulting in signal intensity differences.
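The dependence of the transmission on the level alignment in Equation (2) can be checked numerically. The sketch below evaluates τ for hypothetical values of ε, ΓL, and ΓR, and converts it to a current under the additional assumption of the low-bias relation I = G0·τ·V, where G0 = 2e²/h is the conductance quantum. The parameter values are illustrative and are not fitted to the measured molecules.

```python
G0_SIEMENS = 7.748e-5  # conductance quantum 2e^2/h in siemens


def transmission(epsilon, gamma_l, gamma_r):
    """Single-level transmission of Equation (2).

    All three arguments must share one energy unit (for example eV).
    """
    return 4.0 * gamma_l * gamma_r / (epsilon ** 2 + (gamma_l + gamma_r) ** 2)


def low_bias_current(epsilon, gamma_l, gamma_r, bias_volts):
    """Current in amperes, assuming the linear-response relation I = G0 * tau * V."""
    return G0_SIEMENS * transmission(epsilon, gamma_l, gamma_r) * bias_volts


# Hypothetical parameters: weak, symmetric coupling and two level alignments.
gamma = 0.001  # eV
for epsilon in (1.0, 1.2):  # eV, distance of the orbital from the Fermi level
    tau = transmission(epsilon, gamma, gamma)
    current_pa = low_bias_current(epsilon, gamma, gamma, bias_volts=0.1) * 1e12
    print(f"epsilon = {epsilon} eV: tau = {tau:.2e}, I = {current_pa:.1f} pA")
```

In this weak-coupling regime, τ scales approximately as 1/ε², so a molecular orbital that lies closer to the Fermi level gives a larger current, in line with the role of the HOMO level discussed above.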

The electronic structure of molecules, such as the HOMO level, is one of the important parameters determining the signal characteristics. In addition to the electronic structure of molecules, it was also found that fluctuations of sample molecules inside the electrodes, i.e., structural motion of the DNA and its environment, are closely related to the signal characteristics [15,16]. In this study, among various potential candidates, we utilized td and Sn (n = 1, 2, ..., 10) as feature parameters for the machine learning because the characteristic fluctuation, which should be influenced by the molecular orientation and interaction inside the nanogap, would induce a characteristic signal shape and signal duration. Among the shape factors Sn (n = 1, 2, ..., 10), the first shape factor (S1) corresponded to the first of the ten separated regions (Figure 4c), i.e., the initial rising current of each signal. The difference in S1 was due to the initial interaction between the molecule and the gold electrodes. The cAMP molecules are smaller than the ATP, AMP, and ADP molecules because the phosphate group of cAMP is covalently bound to the hydroxyl group of the ribose. Therefore, a large difference in the initial state of attraction to the electrode was considered to exist. In summary, it was shown that using numerous features with machine learning made it possible to determine slight differences between the molecules, which assisted in improving the identification accuracy of the single-molecule measurements.
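A possible conversion of one signal into the feature parameters is sketched below. The exact definition of the shape factors is not restated in this section, so the sketch assumes that Sn is the mean current in the n-th of ten equal time regions, normalized by Ip. The function name, this normalization, and the sampling interval are assumptions made for illustration.

```python
def extract_features(currents, sample_interval):
    """Convert one signal into Ip, Iave, td, and ten shape factors S1..S10.

    `currents` holds baseline-subtracted current samples of a single signal
    and `sample_interval` is the time between samples. Assumptions: at least
    ten samples, a positive peak current, and Sn defined as the mean current
    of the n-th of ten equal regions divided by Ip.
    """
    n = len(currents)
    if n < 10:
        raise ValueError("need at least ten samples to form ten regions")
    ip = max(currents)
    if ip <= 0:
        raise ValueError("peak current must be positive")
    features = {
        "Ip": ip,                    # maximum current
        "Iave": sum(currents) / n,   # average current
        "td": n * sample_interval,   # signal duration
    }
    for k in range(10):
        start, stop = k * n // 10, (k + 1) * n // 10
        region = currents[start:stop]
        features[f"S{k + 1}"] = (sum(region) / len(region)) / ip
    return features


# Hypothetical signal: a linear ramp of 20 samples taken every 0.1 ms.
signal = [float(i) for i in range(1, 21)]
features = extract_features(signal, sample_interval=1e-4)
print({name: round(value, 4) for name, value in features.items()})
```

Under this definition, S1 summarizes the initial rising part of the signal, which is the region associated above with the first contact between the molecule and the electrodes.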

5. Conclusions

A single-molecule quantum measurement method was applied to cAMP and succeeded in detecting cAMP for the first time. Furthermore, this study succeeded in discriminating the cAMP signals from those of ATP, AMP, and ADP, which are molecules similar to cAMP. The discrimination accuracies of a single cAMP signal from AMP, ADP, and ATP were found to be 0.82, 0.70, and 0.72, respectively. These values indicated a 99.9% accuracy when detecting more than ten signals. Furthermore, the features that contributed to the identification were determined. Because the features were affected by the interaction with the electrodes, the structural difference caused by the ribose of the phosphate site of cAMP was considered to be the reason for the differences compared to the ATP, ADP, and AMP signals. In this study, as a proof of concept, we demonstrated cAMP detection and its discrimination from other similar molecules. When multiple gap devices are developed and cultured cell samples are mounted on the device, it should become possible to visualize the cAMP-related intercellular signaling pathway.

Supplementary Materials: The following are available online at https://www.mdpi.com/2079-4991/11/3/784/s1, Figure S1: Current–time profiles for signals obtained cAMP aqueous solution, Figure S2: Distribution of the first component for AMP and cAMP. The first component is 0.29Ip + 0.05Id + 0.09Iave + 0.23S1 + 0.30S2 + 0.31S3 + 0.31S4 + 0.30S5 + 0.30S6 + 0.31S7 + 0.30S8 + 0.29S9 + 0.33S10.

Author Contributions: Conceptualization, Y.K., T.O. and M.T.; methodology, T.O.; software, Y.K.; validation, Y.K., T.O. and M.T.; formal analysis, Y.K.; investigation, T.O. and Y.K.; resources, T.O.; data curation, Y.K.; writing—original draft preparation, T.O.; writing—review and editing, T.O. and Y.K.; visualization, Y.K.; supervision, T.O. and M.T.; project administration, T.O. and M.T.; funding acquisition, M.T., T.O. and Y.K. All authors have read and agreed to the published version of the manuscript.

Funding: This research was supported by KAKENHI Grant No. 19H00852, No. 18K14091, and JST-CREST Grant Number JPMJCR1666, and the New Energy and Industrial Technology Development Organization (NEDO), Grant project New Industry Creation New Technology leading research program (14004).

Data Availability Statement: The data presented in this study are available on request from the corresponding authors.

Conflicts of Interest: The authors declare no conflict of interest. The funders had no role in the study design; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

1. Municio, A.M.; Miras-Portugal, M.T. Cell Signal Transduction, Second Messengers, and Protein in Health and Disease; Springer: Boston, MA, USA, 1994.
2. Pollard, T.D.; Earnshaw, W.C.; Lippincott-Schwartz, J.; Johnson, G. (Eds.) Second Messengers. In Cell , 3rd ed.; Elsevier Inc.: Philadelphia, PA, USA, 2017.
3. Wouters, F.S.; Verveer, P.J.; Bastiaens, P.I.H. Imaging inside cells. Trends Cell Biol. 2001, 11, 203–211.
4. Chou, S.-H.; Guiliani, N.; Lee, V.; Römling, U. Microbial Cyclic Di-Nucleotide Signaling; Springer: Cham, Switzerland, 2020.
5. Bender, A.T.; Beavo, J.A. : Molecular regulation to clinical use. Pharmacol. Rev. 2006, 58, 488–520.
6. Beavo, J.A.; Brunton, L.L. Cyclic nucleotide research—Still expanding after half a century. Nat. Rev. Mol. Cell Biol. 2002, 3, 710–718.
7. Greenwald, E.C.; Mehta, S.; Zhang, J. Genetically Encoded Fluorescent Biosensors Illuminate the Spatiotemporal Regulation of Signaling Networks. Chem. Rev. 2018, 118, 11707–11794.
8. Adams, S.R.; Harootunian, A.T.R.; Buechler, Y.J.; Taylor, S.S.; Tsien, R.Y. Fluorescence ratio imaging of cyclic AMP in single cells. Nature 1991, 349, 694–697.
9. Zhang, J.; Campbell, R.E.; Ting, A.Y.; Tsien, R.Y. Creating new fluorescent probes for cell biology. Nat. Rev. Mol. Cell Biol. 2002, 3, 906–918.
10. Zaccolo, M.; De Giorgi, F.; Cho, C.Y.; Feng, L.X.; Knapp, T.; Negulescu, P.A.; Taylor, S.S.; Tsien, R.Y.; Pozzan, T. A genetically encoded, fluorescent indicator for cyclic AMP in living cells. Nat. Cell Biol. 2000, 2, 25–29.
11. Branton, D.; Deamer, D.W.; Marziali, A.; Bayley, H.; Benner, S.A.; Butler, T.; Di Ventra, M.; Garaj, S.; Hibbs, A.; Huang, X.H. The potential and challenges of nanopore sequencing. Nat. Biotechnol. 2008, 26, 1146–1153.
12. Shendure, J.; Balasubramanian, S.; Church, G.M.; Gilbert, W.; Rogers, J.; Schloss, J.A.; Waterston, R.H. DNA sequencing at 40: Past, present and future. Nature 2017, 550, 345–353.
13. Shi, W.Q.; Friedman, A.K.; Baker, L.A. Nanopore Sensing. Anal. Chem. 2017, 89, 157–188.
14. Tokeshi, M. Applications of Microfluidic Systems in Biology and Medicine; Springer: Singapore, 2019.
15. Zwolak, M.; Di Ventra, M. Electronic signature of DNA via transverse transport. Nano Lett. 2005, 5, 421–424.
16. Lagerqvist, J.; Zwolak, M.; Di Ventra, M. Fast DNA sequencing via transverse electronic transport. Nano Lett. 2006, 6, 779–782.
17. Di Ventra, M.; Taniguchi, M. Decoding DNA, RNA and peptides with quantum tunnelling. Nat. Nanotechnol. 2016, 11, 117.
18. Tsutsui, M.; Taniguchi, M.; Yokota, K.; Kawai, T. Identifying single nucleotides by tunnelling current. Nat. Nanotechnol. 2010, 5, 286–290.
19. Ohshiro, T.; Tsutsui, M.; Matsubara, K.; Furuhashi, M.; Taniguchi, M.; Kawai, T. Single-Molecule Electrical Random Resequencing of DNA and RNA. Sci. Rep. 2012, 2, 501–507.
20. Ohshiro, T.; Tsutsui, M.; Yokota, K.; Taniguchi, M. Quantitative analysis of DNA with single-molecule sequencing. Sci. Rep. 2018, 8, 8517.
21. Ohshiro, T.; Tsutsui, T.; Yokota, K.; Furuhashi, M.; Taniguchi, M.; Kawai, T. Single-Molecule Proteomic Analysis of Post-translational Modification. Nat. Nanotechnol. 2014, 9, 835–840.
22. Ohshiro, T.; Komoto, Y.; Konno, M.; Koseki, J.; Asai, A.; Ishii, H.; Taniguchi, M. Direct Analysis of Incorporation of an Anticancer Drug into DNA at Single-Molecule Resolution. Sci. Rep. 2019, 9, 3886.
23. Tsutsui, M.; Shoji, K.; Taniguchi, M.; Kawai, T. Formation and self-breaking mechanism of stable atom-sized junctions. Nano Lett. 2008, 8, 345–349.
24. Agrait, N.; Yeyati, A.L.; van Ruitenbeek, J.M. Quantum properties of atomic sized conductors. Phys. Rep. 2003, 377, 81–279.
25. Michaelson, H.B. The work function of the elements and its periodicity. J. Appl. Phys. 1977, 48, 4729.
26. Komoto, Y.; Ohshiro, T.; Yoshida, T.; Tarusawa, E.; Yagi, T.; Washio, T.; Taniguchi, M. Time-resolved detection in mouse brain tissue using an artificial intelligence-nanogap. Sci. Rep. 2020, 10, 11244.
27. Komoto, Y.; Ohshiro, T.; Taniguchi, M. Detection of -derived cancer marker by single-molecule quantum sequencing. Chem. Commun. 2020, 56, 14299–14302.
28. Taniguchi, M.; Ohshiro, T.; Komoto, Y.; Takaai, T.; Yoshida, T.; Washio, T. High-precision single-molecule identification based on single-molecule information within a noisy matrix. J. Phys. Chem. C 2019, 123, 15867.
29. Komoto, Y.; Ohshiro, T.; Taniguchi, M. Length discrimination of homo-oligomeric nucleic acids with single-molecule measurement. Anal. Sci. 2021, 37, 513–518.
30. Ohshiro, T.; Komoto, Y.; Taniguchi, M. Single-Molecule Counting of Nucleotide by Electrophoresis with Nanofluid integrated nano-gap devices. Micromachines 2020, 11, 982.
31. Elkan, C.; Noto, K. Learning classifiers from only positive and unlabeled data. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, NV, USA, 24–27 August 2008; pp. 213–220.
32. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825.
33. Landauer, R. Electrical resistance of disordered one-dimensional lattices. Philos. Mag. 1970, 21, 863.
34. Breiman, L.; Friedman, J.H.; Olshen, R.A.; Stone, C.J. Classification and Regression Trees; Routledge: Boca Raton, FL, USA, 1984.
35. Kim, Y.; Pietsch, T.; Erbe, A.; Belzig, W.; Scheer, E. Benzenedithiol: A Broad-Range Single-Channel Molecular Conductor. Nano Lett. 2011, 11, 3734.
36. Komoto, Y.; Fujii, S.; Nakamura, H.; Tada, T.; Nishino, T.; Kiguchi, M. Resolving metal-molecule interfaces at single-molecule junctions. Sci. Rep. 2016, 6, 26606.
37. Frisenda, R.; Perrin, M.L.; Valkenier, H.; Hummelen, J.C.; van der Zant, H.S.J. Statistical analysis of single-molecule breaking traces. Phys. Status Solidi B 2013, 250, 2431.
38. Furuhata, T.; Ohshiro, T.; Akimoto, G.; Ueki, R.; Taniguchi, M.; Sando, S. Chemical Labeling-Assisted Detection of Modifications by Quantum Tunneling-Based Single Molecule Sensing. ACS Nano 2019, 13, 5028.