Article

An Intelligent Baby Monitor with Automatic Sleeping Posture Detection and Notification

Tareq Khan

School of Engineering, Eastern Michigan University, Ypsilanti, MI 48197, USA; [email protected]

Abstract: Artificial intelligence (AI) has brought lots of excitement to our day-to-day lives. Some examples are spam email detection, language translation, etc. Baby monitoring devices are being used to send video data of the baby to the caregiver’s smartphone. However, automatic understanding of the data is not implemented in most of these devices. In this research, AI and image processing techniques were developed to automatically recognize unwanted situations that the baby was in. The monitoring device automatically detected: (a) whether the baby’s face was covered due to sleeping on the stomach; (b) whether the baby threw off the blanket from the body; (c) whether the baby was moving frequently; (d) whether the baby’s eyes were opened due to awakening. The device sent notifications and generated alerts to the caregiver’s smartphone whenever one or more of these situations occurred. Thus, the caregivers were not required to monitor the baby at regular intervals; they were notified when their attention was required. The device was developed using NVIDIA’s Jetson Nano microcontroller. A night vision camera and Wi-Fi connectivity were interfaced. Deep learning models for pose detection and for face and landmark detection were implemented in the microcontroller. A prototype of the monitoring device and the smartphone app were developed and tested successfully for different scenarios. Compared with general baby monitors, the proposed device gives more peace of mind to the caregivers by automatically detecting unwanted situations.

Keywords: human pose detection; face detection; deep learning; cloud messaging; hypertext transfer protocol (HTTP) server; Jetson Nano

Citation: Khan, T. An Intelligent Baby Monitor with Automatic Sleeping Posture Detection and Notification. AI 2021, 2, 290–306. https://doi.org/10.3390/ai2020018

Academic Editor: Giovanni Diraco

Received: 27 April 2021; Accepted: 15 June 2021; Published: 18 June 2021

1. Introduction

Smart baby monitoring devices are being used to obtain and send video and audio data of the baby to the caregiver’s smartphone, but most of these devices are unable to recognize or understand the data. In this project, a novel baby monitoring device is developed which automatically recognizes the undesired and harmful postures of the baby by image processing and sends an alert to the caregiver’s smartphone—even if the phone is in sleep mode. Deep learning-based object detection algorithms are implemented in the hardware, and a smartphone app is developed. The overall system is shown in Figure 1.

The research problem addressed in this paper is to develop methods to automatically detect (a) whether the baby’s face is covered due to sleeping on the stomach; (b) whether the baby has thrown off the blanket from the body; (c) whether the baby is moving frequently; (d) whether the baby’s eyes are opened due to awakening. One of the challenges of deep learning models is running them in embedded systems where resources such as memory and speed are limited. The methods must work in an embedded system with low latency. The system should also work in both day and night conditions. The detection methods should not be biased and should be inclusive of all races of babies.

The objective of this study is to develop a baby monitoring device that will automatically recognize the harmful postures of the baby by image processing and send an alert to the caregiver’s smartphone. The work will be considered successful when the proposed baby monitor can automatically detect the targeted harmful postures and send a notification to the smartphone. Experiments with different postures will be conducted and the latency of the detection algorithms will be measured.



Figure 1. The overall smart baby monitoring system: (a) baby sleeping; (b) smart baby monitoring device automatically detects harmful postures such as face covered, thrown off the blanket, frequently moving, or awake; (c) image data are sent to the smartphone through the Internet with the message of any harmful situation; (d) a real-time video of the baby is shown in the smartphone, and notifications and alerts are generated whenever the caregiver receives a message of the harmful situation of the baby.

The needs and significance of the proposed system are mentioned below:
• About 1300 babies died due to sudden infant death syndrome (SIDS), about 1300 deaths were due to unknown causes, and about 800 deaths were caused by accidental suffocation and strangulation in bed in 2018 in the USA [1]. Babies are at higher risk for SIDS if they sleep on their stomachs, as it causes them to breathe less air. The best and only position for a baby to sleep is on the back—which the American Academy of Pediatrics recommends through the baby’s first year [2]. Sleeping on the back improves airflow. To reduce the risk of SIDS, the baby’s face should be uncovered, and body temperature should be appropriate [3]. The proposed baby monitor will automatically detect these harmful postures of the baby and notify the caregiver. This will help to reduce SIDS.
• Babies—especially four months or older—move frequently during sleep and can throw off the blanket from their body [4]. The proposed system will alert when the baby is moving frequently and also whether the blanket is removed. Thus, it helps to keep the baby warm.
• Babies may wake up in the middle of the night due to hunger, pain, or just to play with the parent. There is an increasing call in the medical community to pay attention to parents when they say their babies do not sleep [5]. The smart baby monitor detects whether the baby’s eyes are open and sends an alert. Thus, it helps the parents know when the baby is awake even if he/she is not crying.
• When a baby sleeps in a different room, the caregivers need to check the sleeping condition of the baby at regular intervals. Parents lose an average of six months’ sleep during the first 24 months of their child’s life. Approximately 10% of parents manage to get only 2.5 h of continuous sleep each night. Over 60% of parents with babies aged less than 24 months get no more than 3.25 h of sleep each night. A lack of sleep can affect the quality of work and driving; create mental health problems, such as anxiety disorders and depression; and cause physical health problems, such as obesity, high blood pressure, diabetes, and heart disease [6]. The proposed smart device will automatically detect the situations when the caregiver’s attention is required and generate alerts. Thus, it will reduce the stress of checking the baby at regular intervals and help the caregiver to have better sleep.
• The proposed baby monitor can send video and alerts using the Internet even when the parent/caregiver is out of the home Wi-Fi network. Thus, the parent/caregiver can monitor the baby with the smartphone while at work, the grocery store, the park, etc.
• Do smart devices make us lazy? It surely depends on the ethical and responsible use of technology. Smart devices allow humans to have more time for creative work by automating routine tasks [7,8].

There are commercial baby monitoring devices such as the MBP36XL baby monitor by Motorola [9] and the DXR-8 video baby monitor by Infant Optics [10] available in the market that can only send video and audio data but are unable to automatically recognize harmful postures of the baby. The Nanit Pro smart baby monitor [11] can monitor the breathing of the baby; however, the baby must wear a special breathing dress—which is an overhead. A recent Lollipop baby monitor [12] can automatically detect crying sounds and a baby crossing a certain boundary from the crib. The Cubo Ai smart baby monitor [13] can detect faces covered due to sleeping on the back; however, it cannot detect a blanket removed, frequent moving, and awake or sleep state from the eyes. The detailed algorithms used in these commercial baby monitors are not publicly available. The proposed work embraces an open science approach, and the methods are described in detail for the researchers to repeat the experiments.

The rest of the paper is organized as follows. In Section 2, materials and methods are discussed with the proposed detection algorithms and the prototype development. Results are discussed in Section 3. In Section 4, the discussion and the limitations of the study are presented. Finally, Section 5 presents the conclusion.

2. Materials and Methods

The steps taken to develop the detection algorithms of harmful and undesired sleeping postures from image data and the prototype development of the smart baby monitor are briefly shown in Figure 2. They are described below.

Figure 2. Flowchart showing the steps for developing the detection algorithms and the prototype.

2.1. Detection Algorithms

The experimental setup shown in Figure 3 is used to develop the alerting situation detection algorithms. A night-vision camera [14] is interfaced with an NVIDIA Jetson Nano microcontroller [15]. Realistic baby dolls [16–20] of both genders and different races—Asian, Black, Caucasian, Hispanic—were put under the camera during experiments. Both a daylight condition and a night vision condition—where the doll is illuminated by infrared light—were taken into consideration. A brief description of the detection of the four alerting situations—face covered, blanket not covering the body, frequently moving, and awake—is given below.

Figure 3. Experimental setup: a night-vision camera (a) is interfaced with an NVIDIA Jetson Nano microcontroller (b). Monitor, wireless keyboard, and wireless mouse (c) were connected with the Jetson Nano. Realistic baby dolls (d) of both genders and different races were put under the camera during experiments.

2.1.1. Detection of Face Covered and Blanket Removed

The nose of the baby is detected from the image to decide whether the face is covered due to sleeping on the stomach or for other reasons. To detect a blanket removed, the visibility of the lower body parts such as the hip, knee, and ankle is detected. The pseudocode for face covered and blanket removed detection is shown in Figure 4.

Figure 4. Pseudocode for face covered and blanket removed detection.
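The decision logic of Figure 4 can be summarized in a few lines of Python. The sketch below is illustrative only: it assumes a pose estimator (such as the Densenet 121-based model described next) has already returned the set of body-part names detected with sufficient confidence, and the keypoint names are placeholders rather than the paper’s exact labels.

    # Minimal sketch of the Figure 4 logic. Keypoint names are assumptions.
    LOWER_BODY_PARTS = {"left_hip", "right_hip", "left_knee", "right_knee",
                        "left_ankle", "right_ankle"}

    def face_covered(detected_parts: set) -> bool:
        # If the nose is not among the detected parts, the face is taken as covered.
        return "nose" not in detected_parts

    def blanket_removed(detected_parts: set) -> bool:
        # If any lower-body part (hip, knee, ankle) is visible, the blanket
        # is taken as removed.
        return bool(LOWER_BODY_PARTS & detected_parts)

    # Example: nose hidden, hips visible -> both alerts raised.
    parts = {"left_hip", "right_hip", "left_shoulder"}
    print(face_covered(parts), blanket_removed(parts))  # True True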

FigureFigurePose 4. 4.Pseudocode detectionPseudocode fortechniques a for face a covered face [21–23] covered and are blanket used and removedto blanket detect detection. theremoved body parts. detection. Densenet 121 [24] is used as the backbone network for feature extraction. The features are then fed into two- branchPose multi-stage detection techniquestransposed [ 21convolution–23] are used networks. to detect These the body branch parts. networks Densenet simultane- 121 [24] isously used Posepredict as the detection the backbone heatmap techniques network and Part for Affinity [21–23] feature Field extraction.are (PAF)used matrices.to The detect features The the model arebody then was parts. fed trained into Densenet 121 [24] two-branchisfor used 160 epochs as multi-stagethe with backbone the transposed COCO network [25] convolution dataset. for feature The networks. COCO extraction. dataset These branchis Thea large features networks dataset simulta-arecontain- then fed into two- neouslybranching 1.5 million predict multi-stage object the heatmap instances transposed and and Part 80 Affinity convolutioncategories Field of (PAF)objects. networks. matrices. It has images These The model of branch 250,000 was trained people;networks simultane- forouslyof them, 160 epochspredict 56,165 with peoplethe the heatmap COCOhave labeled [25 and] dataset. key Part points TheAffinity COCOsuch asField dataset nose, (PAF) eye, is a largeear, matrices. etc. dataset [26]. containing The model was trained 1.5 million object instances and 80 categories of objects. It has images of 250,000 people; of them,for 160 56,165 epochs people with have the labeled COCO key points[25] dataset. such as nose,The eye,COCO ear, etc.dataset [26]. is a large dataset contain- ing 1.5The million heatmap object is a matrix instances that stores and 80 the categories confidence of aobjects. certain It pixel has containing images of a 250,000 people; certainof them, part. 56,165 There people are 18 heatmapshave labeled associated key points with each such one as of nose, the body eye, parts. ear, PAFsetc. [26]. are matrices that give information about the position and orientation of pairs. A pair is a connection between parts. They come in couples: for each part, there is a PAF in the ‘x’ direction and a PAF in the ‘y’ direction. Once the candidates for each one of the AI 2021, 2 294

body parts are found, they are then connected to form pairs guided by the PAFs. The line integral along the segment connecting each couple of part candidates are computed over the corresponding PAFs (x and y) for that pair. A line integral measures the effect of a PAF among the possible connections between part candidates. It gives each connection a score, that is saved in a weighted bipartite graph. The weighted bipartite graph shows all possible connections between candidates of two parts and holds a score for every connection. Then, the connections are searched that maximize the total score; that is, solving the assignment problem using a greedy algorithm. The last step is to transform these detected connections into the skeletons of a person. Finally, a collection of human sets is found, where each human is a set of parts, and where each part contains its relative coordinates.
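For concreteness, the following Python/NumPy sketch shows how one candidate connection can be scored with a discretized line integral over the PAF maps. The function name, sampling count, and array layout are assumptions for illustration, not the paper’s exact implementation.

    import numpy as np

    def connection_score(paf_x, paf_y, p1, p2, n_samples=10):
        # Score one candidate connection between two part candidates by a
        # discretized line integral over the PAF x/y maps.
        # paf_x, paf_y: H x W arrays; p1, p2: (x, y) pixel coordinates.
        p1 = np.asarray(p1, dtype=float)
        p2 = np.asarray(p2, dtype=float)
        direction = p2 - p1
        length = np.linalg.norm(direction)
        if length < 1e-6:
            return 0.0
        direction /= length                 # unit vector along the candidate limb
        score = 0.0
        for t in np.linspace(0.0, 1.0, n_samples):
            x, y = np.rint(p1 + t * (p2 - p1)).astype(int)
            # Dot product of the PAF vector at the sample with the limb direction.
            score += paf_x[y, x] * direction[0] + paf_y[y, x] * direction[1]
        return score / n_samples

Each couple of part candidates is scored this way, and the highest-scoring connections are then kept greedily so that each candidate is used at most once.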

2.1.2. Frequent Moving Detection

The motion of the baby is detected by image processing [27,28]. To detect motion, the captured image is first converted to grayscale, as color information is not required for motion detection. Then, a Gaussian blur [29] is applied to smooth the image. In the Gaussian blur operation, the image is convolved with a Gaussian filter. The Gaussian filter is a low-pass filter that removes high-frequency camera noise. Then, an absolute difference image is calculated by subtracting the image from a previously captured image. The previous image was captured, grayscaled, blurred, and saved one second before. The difference image contains larger values where motion is detected, and smaller values where no or insignificant motion is detected. The image is then thresholded [30] to make it a binary (black and white) image. This converts the motion regions to white and the non-motion background regions to black. Then, the white regions are enlarged by dilation [31] and contours [32] are drawn around them. Then, the area of each contour is calculated. If the area of any contour is larger than a threshold area, then it indicates a transient motion. This thresholding prevents small movements from being considered a transient motion.

The term frequent moving is defined as there being at least one transient motion in each of three consecutive blocks of time. A block of time is declared to be 10 s. Whenever a transient motion is detected, a block movement flag is set to true. A first-in-first-out (FIFO) buffer of size three is used to store the last three block movement flags. After every 10 s, an item is removed from the FIFO and the block movement flag is put in the FIFO; the block movement flag is then reset to false. If all the entries of the FIFO are true, then a frequent moving flag is set to true. A minimal sketch of this procedure is shown below.
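The sketch uses OpenCV for the transient motion test and a deque for the three-block FIFO. The blur kernel size, the binarization threshold, and the minimum contour area are assumed tuning values, not figures taken from the paper.

    import cv2
    from collections import deque

    MIN_CONTOUR_AREA = 500                            # assumed tuning value
    fifo = deque([False, False, False], maxlen=3)     # last three 10 s block flags
    block_moved = False

    def transient_motion(prev_gray, frame):
        # Grayscale and blur the current frame, then compare it with the frame
        # prepared one second earlier; returns the prepared frame and a motion flag.
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        gray = cv2.GaussianBlur(gray, (21, 21), 0)    # low-pass: removes camera noise
        if prev_gray is None:
            return gray, False
        diff = cv2.absdiff(prev_gray, gray)           # larger where motion occurred
        _, binary = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
        binary = cv2.dilate(binary, None, iterations=2)   # enlarge the white regions
        contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        moved = any(cv2.contourArea(c) > MIN_CONTOUR_AREA for c in contours)
        return gray, moved

    def end_of_block():
        # Called every 10 s: push the block flag, reset it, and report frequent
        # moving when all three stored blocks contained motion.
        global block_moved
        fifo.append(block_moved)          # deque drops the oldest flag automatically
        block_moved = False
        return all(fifo)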

2.1.3. Awake Detection

To detect whether the baby is awake or asleep, the eye landmarks from the image are processed. The flowchart for awake detection is shown in Figure 5.

Figure 5. Flowchart for awake detection (EAR = eye aspect ratio).

The face of the baby is detected using the Multi-Task Cascaded Convolutional Neural Network (MTCNN) [33]. It is a deep learning-based method. It can detect not only faces but also landmarks such as the locations of the two eyes, nose, and mouth. The model has a cascade structure with three networks. First, the image is rescaled to a range of different sizes. Then, the first model (Proposal Network or P-Net) proposes candidate facial regions. Additional processing such as non-maximum suppression (NMS) is also used to filter the candidate bounding boxes. Then, the second model (Refine Network or R-Net) filters the bounding boxes, and the third model (Output Network or O-Net) proposes facial landmarks. These three models are trained for face classification, bounding box regression, and facial landmark localization, respectively. It was found that reducing the brightness and increasing the contrast of the image gives better face detection for both day and night light conditions. Therefore, the brightness and the contrast of the image are adjusted before passing it through the MTCNN. The eye landmarks detected by the MTCNN only provide the location of the eyes and cannot be used to detect whether the eyes are open or closed.

Once the face bounding box is detected, the region of interest (ROI) is then passed to a facial landmark detector [34–36]. In this method, regression trees are trained using a gradient boosting algorithm with labeled datasets [37] to detect the x and y locations of 68 points on the face, such as on the mouth, eyebrows, eyes, nose, and jaw. On each eye, six locations are detected, as shown in Figure 6. The eye aspect ratio (EAR′) is then calculated using Equation (1):

EAR′ = (BF + CE)/(2 × AD)    (1)

Figure 6. Eye landmarks. Two points (B and C) on the upper eyelid, two points (F and E) on the lower eyelid, and two points (A and D) on the two corners of the eye.

When an eye is open, the EAR′ will be larger; when an eye is closed, the EAR′ will be smaller. The average of the left and right eye EAR′ values is used to find the final EAR. If the EAR is larger than a threshold, then eye open is detected. The threshold is set to 0.25. While awake, the baby may blink, and the eye gets closed for a short amount of time. It is not desirable to change the status to sleeping for blinking, as this is misleading. It is implemented such that if the eye is closed consecutively for a defined number of loop cycles, then sleep is detected. In the same way, if the eye is opened consecutively for a defined number of loop cycles, then an awake state is detected.
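The EAR computation of Equation (1) and the blink debouncing can be sketched as follows. The landmark ordering follows the labels A–F of Figure 6, and the number of consecutive loop cycles is an assumed value, since the paper does not state it.

    from math import dist                  # Python 3.8+

    EAR_THRESHOLD = 0.25
    CONSEC_CYCLES = 5                      # assumed; the paper does not give the count
    open_count = closed_count = 0
    awake = False

    def ear(pts):
        # Eye aspect ratio per Equation (1); pts = (A, B, C, D, E, F) as labeled
        # in Figure 6: A and D are the corners, B/C upper lid, F/E lower lid.
        A, B, C, D, E, F = pts
        return (dist(B, F) + dist(C, E)) / (2.0 * dist(A, D))

    def update_awake_state(left_eye, right_eye):
        # Average the two eyes, then require several consecutive open (or closed)
        # cycles before switching state, so a blink does not flip the status.
        global open_count, closed_count, awake
        avg = (ear(left_eye) + ear(right_eye)) / 2.0
        if avg > EAR_THRESHOLD:            # eye open in this loop cycle
            open_count, closed_count = open_count + 1, 0
            if open_count >= CONSEC_CYCLES:
                awake = True
        else:                              # eye closed in this loop cycle
            closed_count, open_count = closed_count + 1, 0
            if closed_count >= CONSEC_CYCLES:
                awake = False
        return awake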

2.2. Prototype Development

The proposed system consists of the smart baby monitor device and the smartphone app. They are briefly described below.

2.2.1. Smart Baby Monitor Device

The smart baby monitor device is placed above the baby’s crib. It takes images of the baby, detects harmful or undesired situations, and sends a notification to the caregiver’s smartphone. The hardware and the software parts of this device are briefly described below.

Hardware: The single-board computer—NVIDIA® Jetson Nano™ [15]—is used as the main processing unit. It is a small-size, low-power embedded platform where neural network models can run efficiently for applications such as image classification, object detection, segmentation, etc. It contains a quad-core ARM A57 microprocessor running at 1.43 GHz, 4 GB of RAM, a 128-core Maxwell graphics processing unit (GPU), a micro SD card slot, USB ports, and other built-in hardware peripherals. A night-vision camera [14] is interfaced with the Jetson Nano using a USB. When the surrounding light is enough, such as in the daytime, it captures color images. This camera has a built-in light sensor and infrared (IR) LEDs. When the surrounding light is low, the IR LEDs automatically turn on and it captures grayscale images. To connect with the Internet wirelessly, a Wi-Fi adaptor [38] is connected to the USB port of the Jetson Nano. A 110 V AC to 5 V 4 A DC adapter is used as the power supply. A cooling fan with pulse width modulation (PWM)-based speed control is placed above the microprocessor. The hardware block diagram is shown in Figure 7.

Figure 7. Block diagram of the baby monitor hardware.

Software: Linux4Tegra (L4T)—a version of the Ubuntu operating system (OS)—is installed on a 32 GB SD card of the Jetson Nano board. The application software is developed in the Python language and the necessary packages are installed.

The device connects to the home router using Wi-Fi to access the Internet. To stream real-time video to the smartphone and to receive commands from a smartphone that is outside of the home Wi-Fi network, the device must be accessible from outside of the home network. A Hypertext Transfer Protocol (HTTP) server, known as Flask [39], is implemented in the device. The device’s private IP is made static and port forwarding [40] is configured. Thus, the server can be accessed from outside of the home network using its public IP and the port number.

The smartphone app sends commands using HTTP GET requests to the baby monitor device to start, stop, and configure settings such as enabling or disabling one or more detections. Depending upon the words contained in the Uniform Resource Locator (URL) of the GET requests, callback functions are executed to start, to stop, and to configure settings of the device. A separate thread is used that captures VGA (640 × 480) images and stores them in a global object with thread locking [41]. This thread continues to capture images after it receives the start command. To stream video, a separate thread reads the captured image from the global object considering thread locking, compresses the image to JPEG, adds headers to the encoded stream as an image object, and then sends it to the HTTP client, i.e., to the smartphone. Making a separate thread for streaming video solves the latency problem that would be caused by the detection algorithms in a single loop-based program.

After the device receives the start command, a separate thread reads the captured image, considering thread locking, to detect the harmful and undesired situations. Depending upon the detection alerts requested by the user, it continuously executes the requested detection algorithms—such as face covered, blanket removed, moving, or awake—as discussed in Section 2.1. To reduce the inference time of MTCNN face detection and Densenet121-based body part detection on the Jetson Nano, NVIDIA TensorRT [42,43] is used. TensorRT includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for deep learning inference applications. TensorRT provides INT8 and FP16 optimizations, and the reduced precision significantly reduces the inference latency.
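A minimal Flask sketch of the threading scheme described above is given below: one daemon thread captures VGA frames into a lock-protected global object, and a generator-based route streams them as JPEG parts. The route names, port, and boundary string are assumptions for illustration.

    import threading
    import time

    import cv2
    from flask import Flask, Response

    app = Flask(__name__)
    lock = threading.Lock()
    latest_frame = None                    # global object shared between threads

    def capture_loop():
        # Continuously grab VGA frames into the lock-protected global object.
        global latest_frame
        cam = cv2.VideoCapture(0)
        cam.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
        cam.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)
        while True:
            ok, frame = cam.read()
            if ok:
                with lock:
                    latest_frame = frame

    def mjpeg_stream():
        # Read the shared frame under the lock, JPEG-compress it, and emit it
        # with multipart headers so the client renders a continuous video.
        while True:
            with lock:
                frame = None if latest_frame is None else latest_frame.copy()
            if frame is None:
                time.sleep(0.05)
                continue
            ok, jpg = cv2.imencode(".jpg", frame)
            if ok:
                yield (b"--frame\r\nContent-Type: image/jpeg\r\n\r\n"
                       + jpg.tobytes() + b"\r\n")

    @app.route("/start")                   # callback selected by the word in the URL
    def start():
        threading.Thread(target=capture_loop, daemon=True).start()
        return "started"

    @app.route("/video")
    def video():
        return Response(mjpeg_stream(),
                        mimetype="multipart/x-mixed-replace; boundary=frame")

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=8000)

Running the capture loop, the detection loop, and the streaming generator in separate threads is what keeps the video latency independent of the detection time, as described above.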

If a change occurs in any of the requested detection results, a message containing the results of the detections is sent to the user’s smartphone using Firebase Cloud Messaging (FCM) [44]. FCM can send a message of a maximum of 4 KB at no cost using the Internet to a client app. Using FCM, a smartphone app can be immediately notified whenever new data are available to sync. The message is sent using a server key, which is generated from the cloud server where the smartphone app is registered.
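On the device side, pushing a status message reduces to a single HTTP POST. The sketch below uses the legacy FCM HTTP API that was current when the paper was written, with placeholder credentials; the field names in the data payload are assumptions.

    import requests

    FCM_URL = "https://fcm.googleapis.com/fcm/send"   # legacy HTTP API endpoint
    SERVER_KEY = "..."                     # server key placeholder
    DEVICE_TOKEN = "..."                   # registration token of the smartphone app

    def send_status(face_covered, blanket_removed, moving, awake):
        # Push the current detection status as a small FCM data message (<= 4 KB).
        payload = {
            "to": DEVICE_TOKEN,
            "data": {
                "face_covered": face_covered,
                "blanket_removed": blanket_removed,
                "moving": moving,
                "awake": awake,
            },
        }
        headers = {"Authorization": "key=" + SERVER_KEY,
                   "Content-Type": "application/json"}
        resp = requests.post(FCM_URL, json=payload, headers=headers, timeout=10)
        resp.raise_for_status()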

2.2.2. Smartphone App

The Android platform was used to develop the smartphone app. The first screen of the app contains a WebView object [45] to show the real-time video of the baby; a toggle button to start and to stop the baby monitor device; and labels for displaying alerts, the last detection status update time, and the connection status with the baby monitor device over the Internet. It also contains a button to clear the alerts manually; the button is visible only when there is at least one alert present.

The app contains a settings menu for configuration. It contains check boxes for enabling or disabling real-time video and the four detection alerts such as face covered, blanket removed, moving, or awake; textboxes for the baby monitor device’s public IP and port number; and checkboxes for enabling or disabling voice and vibration alerts. To make an HTTP request to the baby monitor device from the smartphone app, the public IP of the device is required. If the smartphone is connected to the same Wi-Fi network as the device, then the public IP of the Wi-Fi network can be auto-filled by pressing a button that sends an HTTP request to https://ipecho.net/plain (accessed on 17 June 2021), which responds with the IP from where the request was made. Once the user exits from the settings menu by pressing the back button, the app saves the data in the settings.dat file and sends an HTTP GET request to the device using the settings word and a binary string indicating the requested detections in the URL. If the baby monitor device is online, then it responds with a success message, and the connection status in the app with the device is shown as connected.

When the user presses the start/stop toggle button, the app sends an HTTP GET request to the device containing the word start or stop, respectively, in the URL. After starting, the URL for the WebView is set as the public IP of the device with the port number, and thus it shows the real-time video of the baby in the app. Whenever the detection status is changed in the device, a new FCM message containing the current status arrives in the smartphone app. The flowchart in Figure 8 shows the actions taken by the smartphone whenever the FCM message is received. In this way, the data are synchronized between the device in the home and the user’s smartphone—at whatever place the user may be in the world.

When it receives an FCM message, a callback function is called and the app saves the current detection status—such as is face covered, is blanket removed, is moving, or is awake—with the current date/time information in the status.dat file. The app then generates smartphone notifications, voice alerts, and vibration alerts, depending on the last detection status and alert settings. The voice alert is generated using text-to-speech and it speaks out loud the harmful situation such as “Alert: Face Covered”, “Alert: Blanket Removed”, etc. The voice alerts and the phone vibration alerts are repeated using a timer. If the FCM message contains no alerts, then the timer is disabled, speech is turned off, and the notifications are cleared. The first screen label of the app is then updated to show the current alert status of the baby. Once the user is aware of the alerts, the user may click a clear alert button to manually stop the alerts.
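The public-IP auto-fill described in the settings menu above reduces to a single request to the echo endpoint. The app itself is written for Android, but the idea is shown here in Python for consistency with the device-side code:

    import requests

    # The endpoint simply echoes the caller's public IP address as plain text.
    public_ip = requests.get("https://ipecho.net/plain", timeout=10).text.strip()
    print(public_ip)                       # e.g., used to fill the device IP field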


Figure 8. Alert generation after FCM message arrives.

3. Results

The detection algorithm results and the prototype results are discussed below.

3.1. Detection Algorithm Results

The face covered and blanket removed detection results are shown in Figures 9 and 10, respectively. They show the locations of the detected body parts and the skeleton in green color on different realistic baby dolls in different light conditions. In Figure 11, the different processing steps are shown for moving detection. Figure 12 shows the face and eye landmark detection for awake detection in green color for eye open and eye closed situations in two different light conditions. Here, the EAR is 0.43 in Figure 12a when the eye is open, and the EAR is 0.17 in Figure 12b when the eye is closed.

Figure 9. (a–d) captured images of dolls in the day and night conditions; (e–h) nose detected (with other body parts), indicating face is not covered; (i–l) nose undetected, indicating face is covered.

Figure 10. (a–d) captured images of dolls with a blanket in the day and night conditions; (e–h) lower body parts not detected, indicating blanket is not removed; (i–l) lower body parts (such as hips) detected, indicating blanket is removed; (m–p) nose undetected and lower body parts detected, indicating face is covered and blanket is removed.

Figure 11. (a) the previous frame; (b) previous frame grayscaled and blurred; (c) the current frame; (d) current frame grayscaled and blurred; (e) absolute difference image between (b) and (d); (f) black and white binary image using thresholding; (g) image after dilation; (h) contours shown in red.

Figure 12. Face and eye landmark detection: (a) eye opened in day (EAR = 0.43); (b) eye closed in day (EAR = 0.17); (c) eye opened in night (EAR = 0.42); (d) eye closed in night (EAR = 0.19).

3.2. Prototype Results

A prototype of the proposed smart baby monitor device and smartphone app has been developed and tested successfully. A photograph of the prototype device is shown in Figure 13. The physical dimension of the Jetson Nano board is 69 mm × 45 mm, and the total power consumption of the proposed device is measured to be around 24.5 watts [46]. Different harmful and unwanted situations such as face covered due to sleeping on the stomach, blanket removed, frequently moving, and awake were created with baby dolls in both daylight and night environments, and the proposed baby monitor was able to detect, send, and generate alerts in the smartphone. Some screenshots of the smartphone app are shown in Figure 14. The smartphone was taken out of the range of the home Wi-Fi and was connected with the Internet using the cellular network. In this scenario, the video stream was received, and the alerts were also generated successfully in the smartphone.

Figure 13. Photograph of the smart baby monitor device: (a) closeup view of the device—(1) Jetson Nano microcontroller board, (2) USB Wi-Fi dongle, (3) DC power adaptor, (4) power and reset switches; (b) bird’s-eye view of the setup—(5) camera with night vision capability, (6) doll with blanket.

Figure 14. Screenshots of the smartphone app: (a) the first screen of the app showing the live video stream, last status update date and time, start/stop toggle button, and connection status; (b) settings window for configuring the device’s public IP and port, video stream, and detection alerts; (c) voice and vibration alert generated when the baby’s face is covered; (d) alert generated in night light condition when the baby threw off the blanket, moved frequently, and opened the eyes due to being awake.

Table 1 shows the latency for detecting body parts for the face covered and blanket removed detection, moving detection, awake detection, and when all of these detections are enabled—for video streaming enabled and disabled cases. Here, it is seen that the detection times are fast (much less than a second) due to the implementation of NVIDIA TensorRT [42,43]. Though video streaming is performed using a separate thread, it is seen that the detection latencies are slightly lower when video streaming is turned off. After the detection, the proposed device sends a notification using FCM to the smartphone. The notification generally reaches within a second.

detection, the proposed device sends a notification using FCM to the smartphone. The notification generally reaches within a second. AI 2021, 2, FOR PEER REVIEW 13

Table 1. Latency of detection algorithms on the Jetson Nano device.

Video Streaming    Detection                                             Latency (s)
Yes                Body parts (for face covered and blanket removed)     0.1096
                   Moving                                                0.0001
                   Awake                                                 0.0699
                   All (Body parts + Moving + Awake)                     0.1821
No                 Body parts (for face covered and blanket removed)     0.1091
                   Moving                                                0.0001
                   Awake                                                 0.6820
                   All (Body parts + Moving + Awake)                     0.1807
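For illustration, per-detection latencies such as those in Table 1 can be obtained by bracketing each detection call with a monotonic timer. The following is a minimal Python sketch, not the measurement code used in the experiments; the stand-in detector name is an assumption:

```python
import time

def timed(fn, *args):
    """Return (result, elapsed_seconds) for one detection call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

# Example with a stand-in detector; the real call would run the TensorRT model.
detect_body_parts = lambda frame: []
_, latency = timed(detect_body_parts, None)
print(f"Body parts latency: {latency:.4f} s")
```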
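Similarly, the separate-thread video streaming and the FCM notification described above might be structured as sketched below. This is a hedged illustration only: the helper names are placeholders, and the legacy FCM HTTP endpoint shown is one possible way to send the push message (the paper only states that FCM [44] is used).

```python
import threading
import time

import requests

FCM_URL = "https://fcm.googleapis.com/fcm/send"   # legacy FCM HTTP endpoint
SERVER_KEY = "<FCM server key>"                   # placeholder credential
DEVICE_TOKEN = "<caregiver app token>"            # placeholder device token

def stream_video():
    """Placeholder for the HTTP video streaming loop (runs on its own thread)."""
    ...

def run_detections(frame=None):
    """Stub standing in for the body parts + moving + awake checks."""
    return []  # list of alert messages, e.g., ["Face covered!"]

def send_alert(title, body):
    # Push notification to the smartphone; typically arrives within a second.
    requests.post(
        FCM_URL,
        headers={"Authorization": f"key={SERVER_KEY}",
                 "Content-Type": "application/json"},
        json={"to": DEVICE_TOKEN, "notification": {"title": title, "body": body}},
        timeout=5,
    )

# Streaming runs on a separate thread, so it only slightly affects detection latency.
threading.Thread(target=stream_video, daemon=True).start()

while True:
    for message in run_detections():
        send_alert("Baby Monitor", message)
    time.sleep(0.03)  # roughly frame-rate cadence
```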

Along with baby dolls, the proposed detection methods were also applied to some baby images available online, as shown in Figure 15. Here, it is seen that the proposed methods can successfully detect the no alert situation as in Figure 15a, the face covered and blanket removed conditions as in Figure 15b, the blanket removed condition only as in Figure 15c,d, the eye closed condition as in Figure 15e, and the awake condition as in Figure 15f. Frequent moving detection does not need human pose or face detection; thus, its results do not depend on whether a doll or a real baby is used.



Figure 15. Detection methods applied on real baby images: (a) baby sleeping on the back with the blanket on—no alert; (b) baby sleeping on the stomach causing the nose to be undetected—face covered alert; hip, knee, and ankle are visible—blanket removed alert; (c,d) baby sleeping on the back with the nose visible; hip, knee, and ankle are visible—blanket removed alert; (e) baby sleeping, showing eye landmarks (EAR = 0.19); (f) baby awake, showing eye landmarks (EAR = 0.35)—baby awake alert.

4. Discussion

Table 2 shows the comparison of the proposed work with other works. To date, no baby monitor work has been found in the literature that can detect a thrown-off blanket or awakening due to opened eyes. This research focused on detecting alarmable situations from image data only. Implementing detections from audio data and other wearable sensors is planned for the future.

To detect the nose for face covered detection, the body parts detection method described in [22,23] is used. Other possible nose detection methods are the MTCNN [33] and the facial landmark detector in [34]. One problem with [34] is that if any of the facial landmarks are covered, then none of the facial landmarks can be detected. As the body parts detection method can be used for both face covered and blanket removed detection, this method is preferred over the other options.

The logic definition of blanket removed detection, isBlanketRemoved, as shown in Figure 3, can be made configurable by the user from the smartphone. Users may configure logical OR/AND among the visibility of the hip, knee, and ankle according to their needs, as simply OR-ing these body parts may not be suitable for everyone. During the experiments, it was found that the detections of some body parts are sometimes missed in a few frames, especially during night light conditions. This detection fluctuation is common in most object detection algorithms. Training the model with more grayscale baby images might improve the detections in night conditions.

Frequent moving is defined as at least one transient motion occurring in each of three consecutive blocks of time. The duration of a single block of time and the number of consecutive blocks can be made configurable by the smartphone app; thus, the user can choose these values according to their needs.

For awake detection, the MTCNN [33] face detector is used. One limitation of the available face detectors is that they sometimes miss the face if it is not aligned vertically. Another option is the Haar cascade [47–49] face detector, which runs fast. However, it was found during the experiments that it misses the face often, especially if the image contains a side face or is taken in low light conditions. It is also possible to combine frequent moving and awake detection using logical AND, so that the user is notified only when the baby is frequently moving while awake.
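As an illustration of the configurable logic described above, the following Python sketch combines the OR/AND blanket rule, the three-consecutive-blocks moving rule, and the optional AND of moving and awake. The function names and parameterization are assumptions, not the device's actual code:

```python
from typing import Dict, Iterable

def is_blanket_removed(visible: Dict[str, bool], mode: str = "or") -> bool:
    """Blanket removed rule over hip/knee/ankle visibility.

    mode="or" alerts if any of the three parts is exposed; mode="and"
    alerts only if all of them are (user-selectable from the app)."""
    parts = [visible["hip"], visible["knee"], visible["ankle"]]
    return any(parts) if mode == "or" else all(parts)

def is_frequently_moving(motions_per_block: Iterable[int],
                         consecutive_blocks: int = 3) -> bool:
    """True if at least one transient motion occurred in each of
    `consecutive_blocks` consecutive blocks of time."""
    counts = list(motions_per_block)
    return any(all(c >= 1 for c in counts[i:i + consecutive_blocks])
               for i in range(len(counts) - consecutive_blocks + 1))

def restless_and_awake(moving: bool, awake: bool) -> bool:
    # Logical AND combination, so the alert fires only when both hold.
    return moving and awake

# Examples: hip exposed with OR mode fires; three consecutive motion blocks fire.
print(is_blanket_removed({"hip": True, "knee": False, "ankle": False}))  # True
print(is_frequently_moving([1, 2, 1]))                                   # True
```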
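The EAR values reported in Figure 15e,f follow the standard eye aspect ratio computed from the six eye landmarks returned by the facial landmark detector [34–36]. A minimal sketch is shown below; the 0.25 threshold is an illustrative choice between the closed-eye (0.19) and open-eye (0.35) examples, not a value stated in the paper:

```python
import numpy as np

def eye_aspect_ratio(eye: np.ndarray) -> float:
    """eye: (6, 2) array of landmarks p1..p6 in dlib's ordering.

    EAR = (|p2 - p6| + |p3 - p5|) / (2 * |p1 - p4|): small when the
    eye is closed, larger when open."""
    a = np.linalg.norm(eye[1] - eye[5])  # |p2 - p6|
    b = np.linalg.norm(eye[2] - eye[4])  # |p3 - p5|
    c = np.linalg.norm(eye[0] - eye[3])  # |p1 - p4|
    return (a + b) / (2.0 * c)

EAR_OPEN_THRESHOLD = 0.25  # illustrative: between 0.19 (closed) and 0.35 (open)

def looks_awake(left_eye: np.ndarray, right_eye: np.ndarray) -> bool:
    ear = (eye_aspect_ratio(left_eye) + eye_aspect_ratio(right_eye)) / 2.0
    return ear > EAR_OPEN_THRESHOLD
```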

Table 2. Comparison with other works.

Feature                      Motorola [9]   Infant Optics [10]   Nanit [11]   Lollipop [12]   Cubo Ai [13]   Proposed Work
Live Video                   Yes            Yes                  Yes          Yes             Yes            Yes
Boundary Cross Detection     No             No                   No           Yes             Yes            No
Cry Detection                No             No                   No           Yes             Yes            No
Breathing Monitoring         No             No                   Yes          No              No             No
Face Covered Detection       No             No                   No           No              Yes            Yes
Blanket Removed Detection    No             No                   No           No              No             Yes
Frequent Moving Detection    No             No                   Yes          No              No             Yes
Awake Detection from Eye     No             No                   No           No              No             Yes

In this work, the focus is given to detections from image data only. It is planned to process audio data and detect cry sounds in the future. Another future work is to use a thermal camera to read the baby's body temperature and generate alerts based on those data. Privacy and security are a concern for this type of device. To make the system more secure, it is planned to use POST requests instead of GET requests and also to encrypt the data; a sketch of this change is given below.

The models, such as DenseNet121 for pose detection, the MTCNN for face detection, and regression trees for facial landmark detection, are trained with human images from datasets such as COCO [25] and 300-W [37]. As these datasets contain human images of various postures with complex backgrounds, it can be expected that the models trained with these images will work with real humans and babies. The experiments carried out with the online baby images in Figure 15 indicate the system's applicability to real subjects.
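As a rough sketch of that planned hardening, a status route in the Flask server [39] can accept POST instead of GET, keeping parameters out of URLs and server logs. The route name, payload fields, and token check below are illustrative assumptions, not the deployed code:

```python
from flask import Flask, request, abort, jsonify

app = Flask(__name__)
SHARED_TOKEN = "<pre-shared secret>"  # placeholder; stored securely in practice

# POST keeps parameters in the request body, unlike GET query strings.
@app.route("/status", methods=["POST"])
def status():
    payload = request.get_json(silent=True) or {}
    if payload.get("token") != SHARED_TOKEN:
        abort(401)  # reject callers without the shared secret
    return jsonify(face_covered=False, blanket_removed=False,
                   moving=False, awake=False)

if __name__ == "__main__":
    # Serving behind HTTPS (ssl_context or a reverse proxy) would encrypt the data.
    app.run(host="0.0.0.0", port=5000)
```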

When tested on real subjects, however, new challenges might arise, such as the baby sucking its thumb while sleeping, only a side view of the face being visible, alignment issues, etc. To solve these problems, a new dataset of baby images with complex sleeping postures may need to be developed and labelled, and the models then retrained on it using transfer learning [50]. It is planned to test the system with real baby subjects in the future after institutional review board (IRB) approval.
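That retraining step could follow the usual transfer learning recipe of freezing a pretrained backbone and training a new head on the labelled baby images. The PyTorch sketch below uses DenseNet121 with an illustrative two-class head for concreteness; the actual pose model's keypoint heads would be swapped and retrained analogously:

```python
import torch
import torchvision

# Start from an ImageNet-pretrained DenseNet121 backbone (transfer learning [50]).
model = torchvision.models.densenet121(pretrained=True)

# Freeze the pretrained feature extractor so only the new head is trained.
for param in model.features.parameters():
    param.requires_grad = False

# Replace the head; 2 illustrative outputs (e.g., awake vs. asleep).
model.classifier = torch.nn.Linear(model.classifier.in_features, 2)

optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-4)
loss_fn = torch.nn.CrossEntropyLoss()

# One illustrative training step on a dummy batch standing in for the new dataset.
images = torch.randn(4, 3, 224, 224)
labels = torch.tensor([0, 1, 0, 1])
optimizer.zero_grad()
loss = loss_fn(model(images), labels)
loss.backward()
optimizer.step()
```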

5. Conclusions

In this paper, an intelligent baby monitor device is developed that automatically detects face covered, blanket removed, frequent moving, and awake conditions using deep learning and image processing algorithms. The algorithms are implemented in a microcontroller-based system interfaced with a camera, and the results show that they run successfully in real time with low latency. The device implements an HTTP server and sends alerts to the caregiver's smartphone using cloud messaging whenever one or more of these unwanted situations occur. Though some of the recently available baby monitors implement some automatic detection features, the proposed work contributes a new study on detecting the blanket being removed from the baby's body and on awake detection by analyzing eye images. Applying this new and useful knowledge in baby monitors can bring more peace of mind to caregivers. The prototype of the baby monitor device and the smartphone app have been tested successfully with images and dolls of different races in both day and night light conditions.

Author Contributions: Conceptualization, methodology, software, validation, analysis, investigation, resources, writing—original draft preparation, writing—review and editing, visualization, supervision, project administration, funding acquisition, by T.K. The author has read and agreed to the published version of the manuscript.

Funding: This research was funded by the Faculty Research Fellowship (FRF) award of Eastern Michigan University.

Acknowledgments: The author would like to thank Aditya Annavarapu for collecting sleeping baby images.

Conflicts of Interest: The author declares no conflict of interest.

References

1. U.S. Department of Health & Human Services. Sudden Unexpected Infant Death and Sudden Infant Death Syndrome. Available online: https://www.cdc.gov/sids/data.htm/ (accessed on 22 February 2021).
2. The Bump. Is It Okay for Babies to Sleep on Their Stomach? Available online: https://www.thebump.com/a/baby-sleeps-on-stomach/ (accessed on 22 February 2021).
3. Illinois Department of Public Health. SIDS Fact Sheet. Available online: http://www.idph.state.il.us/sids/sids_factsheet.htm (accessed on 22 February 2021).
4. How Can I Keep My Baby Warm at Night? Available online: https://www.babycenter.in/x542042/how-can-i-keep-my-baby-warm-at-night (accessed on 4 March 2021).
5. 5 Reasons Why Your Newborn Isn't Sleeping at Night. Available online: https://www.healthline.com/health/parenting/newborn-not-sleeping (accessed on 4 March 2021).
6. Nordqvist, C. New Parents Have 6 Months Sleep Deficit During First 24 Months of Baby's Life. Medical News Today, 25 July 2010. Available online: https://www.medicalnewstoday.com/articles/195821.php/ (accessed on 22 February 2021).
7. Belyh, A. The Future of Human Work is Imagination, Creativity, and Strategy. CLEVERISM, 25 September 2019. Available online: https://www.cleverism.com/future-of-human-work-is-imagination-creativity-and-strategy/ (accessed on 22 February 2021).
8. Janssen, C.P.; Donker, S.F.; Brumby, D.P.; Kun, A.L. History and future of human-automation interaction. Int. J. Hum. Comput. Stud. 2019, 131, 99–107. [CrossRef]
9. MOTOROLA. MBP36XL Baby Monitor. Available online: https://www.motorola.com/us/motorola-mbp36xl-2-5-portable-video-baby-monitor-with-2-cameras/p (accessed on 3 June 2021).
10. Infant Optics. DXR-8 Video Baby Monitor. Available online: https://www.infantoptics.com/dxr-8/ (accessed on 3 June 2021).
11. Nanit Pro Smart Baby Monitor. Available online: https://www.nanit.com/products/nanit-pro-complete-monitoring-system?mount=wall-mount (accessed on 3 June 2021).
12. Lollipop Baby Monitor with True Crying Detection. Available online: https://www.lollipop.camera/ (accessed on 22 February 2021).

13. Cubo Ai Smart Baby Monitor. Available online: https://us.getcubo.com (accessed on 1 June 2021).
14. Day and Night Vision USB Camera. Available online: https://www.amazon.com/gp/product/B00VFLWOC0 (accessed on 3 March 2021).
15. Jetson Nano Developer Kit. Available online: https://developer.nvidia.com/embedded/jetson-nano-developer-kit (accessed on 3 March 2021).
16. 15" Realistic Soft Body Baby Doll with Open/Close Eyes. Available online: https://www.amazon.com/gp/product/B00OMVPX0K (accessed on 3 March 2021).
17. Asian 20-Inch Large Soft Body Baby Doll. Available online: https://www.amazon.com/JC-Toys-Asian-Baby-20-inch/dp/B074N42T3S (accessed on 3 March 2021).
18. African American 20-Inch Large Soft Body Baby Doll. Available online: https://www.amazon.com/gp/product/B01MS9SY16 (accessed on 3 March 2021).
19. Caucasian 20-Inch Large Soft Body Baby Doll. Available online: https://www.amazon.com/JC-Toys-Baby-20-inch-Soft/dp/B074JL7MYM (accessed on 3 March 2021).
20. Hispanic 20-Inch Large Soft Body Baby Doll. Available online: https://www.amazon.com/JC-Toys-Hispanic-20-inch-Purple/dp/B074N4C6J7 (accessed on 3 March 2021).
21. NVIDIA AI IOT TRT Pose. Available online: https://github.com/NVIDIA-AI-IOT/trt_pose (accessed on 8 March 2021).
22. Xiao, B.; Wu, H.; Wei, Y. Simple baselines for human pose estimation and tracking. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018.
23. Cao, Z.; Simon, T.; Wei, S.; Sheikh, Y. Realtime Multi-person 2D Pose Estimation Using Part Affinity Fields. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1302–1310.
24. Huang, G.; Liu, Z.; van der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2261–2269.
25. COCO Dataset. Available online: https://cocodataset.org (accessed on 8 March 2021).
26. Faber, M. How to Analyze the COCO Dataset for Pose Estimation. Available online: https://towardsdatascience.com/how-to-analyze-the-coco-dataset-for-pose-estimation-7296e2ffb12e (accessed on 8 March 2021).
27. Rosebrock. Basic Motion Detection and Tracking with Python and OpenCV. Available online: https://www.pyimagesearch.com/2015/05/25/basic-motion-detection-and-tracking-with-python-and-opencv/ (accessed on 18 March 2021).
28. Opencv-Motion-Detector. Available online: https://github.com/methylDragon/opencv-motion-detector (accessed on 18 March 2021).
29. Gedraite, E.S.; Hadad, M. Investigation on the effect of a Gaussian Blur in image filtering and segmentation. In Proceedings of the ELMAR-2011, Zadar, Croatia, 14–16 September 2011; pp. 393–396.
30. Image Thresholding. Available online: https://docs.opencv.org/master/d7/d4d/tutorial_py_thresholding.html (accessed on 18 March 2021).
31. Dilation. Available online: https://docs.opencv.org/3.4/db/df6/tutorial_erosion_dilatation.html (accessed on 18 March 2021).
32. Contours. Available online: https://docs.opencv.org/master/d4/d73/tutorial_py_contours_begin.html (accessed on 18 March 2021).
33. Zhang, K.; Zhang, Z.; Li, Z.; Qiao, Y. Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks. IEEE Signal Process. Lett. 2016, 23, 1499–1503. [CrossRef]
34. Kazemi, V.; Sullivan, J. One millisecond face alignment with an ensemble of regression trees. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 1867–1874.
35. Dlib Shape Predictor. Available online: http://dlib.net/imaging.html#shape_predictor (accessed on 1 April 2021).
36. Rosebrock. Facial Landmarks with Dlib, OpenCV, and Python. Available online: https://www.pyimagesearch.com/2017/04/03/facial-landmarks-dlib-opencv-python (accessed on 1 April 2021).
37. Facial Point Annotations. Available online: https://ibug.doc.ic.ac.uk/resources/facial-point-annotations (accessed on 1 April 2021).
38. Edimax 2-in-1 WiFi and Bluetooth 4.0 Adapter. Available online: https://www.sparkfun.com/products/15449 (accessed on 7 April 2021).
39. Flask Server. Available online: https://flask.palletsprojects.com/en/1.1.x (accessed on 7 April 2021).
40. Yatritrivedi; Fitzpatrick, J. How to Forward Ports on Your Router. Available online: https://www.howtogeek.com/66214/how-to-forward-ports-on-your-router/ (accessed on 7 April 2021).
41. Thread-Based Parallelism. Available online: https://docs.python.org/3/library/threading.html (accessed on 7 April 2021).
42. NVIDIA-TensorRT. Available online: https://developer.nvidia.com/tensorrt (accessed on 1 March 2021).
43. TensorRT Demos. Available online: https://github.com/jkjung-avt/tensorrt_demos (accessed on 1 March 2021).
44. Firebase Cloud Messaging. Available online: https://firebase.google.com/docs/cloud-messaging (accessed on 7 April 2021).
45. WebView. Available online: https://developer.android.com/guide/webapps/webview (accessed on 13 April 2021).
46. Jetson's Tegrastats Utility. Available online: https://docs.nvidia.com/jetson/l4t/index.html#page/Tegra%20Linux%20Driver%20Package%20Development%20Guide/AppendixTegraStats.html (accessed on 21 April 2021).
47. Viola, P.; Jones, M. Rapid object detection using a boosted cascade of simple features. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2001, Kauai, HI, USA, 8–14 December 2001.

48. Viola, P.; Jones, M. Robust real-time face detection. Int. J. Comput. Vis. 2004, 57, 137–154. [CrossRef]
49. Lienhart, R.; Maydt, J. An extended set of Haar-like features for rapid object detection. In Proceedings of the International Conference on Image Processing, Rochester, NY, USA, 22–25 September 2002.
50. Tan, C.; Sun, F.; Kong, T.; Zhang, W.; Yang, C.; Liu, C. A Survey on Deep Transfer Learning. In Artificial Neural Networks and Machine Learning—ICANN 2018; Kůrková, V., Manolopoulos, Y., Hammer, B., Iliadis, L., Maglogiannis, I., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2018; Volume 11141. [CrossRef]