
Concept Paper

Remote Eye-Tracking for Cognitive Telerehabilitation and Interactive School Tasks in Times of COVID-19

Giancarlo Iannizzotto 1,*,†, Andrea Nucita 1,†, Rosa Angela Fabio 2,†, Tindara Caprì 2,† and Lucia Lo Bello 3,†

1 Department of Cognitive Sciences, Psychological, Educational and Cultural Studies, University of Messina, 98122 Messina, Italy; [email protected]
2 Department of Clinical and Experimental Medicine, University of Messina, 98122 Messina, Italy; [email protected] (R.A.F.); [email protected] (T.C.)
3 Department of Electrical, Electronics and Computer Engineering, University of Catania, 95125 Catania, Italy; [email protected]
* Correspondence: [email protected]
† These authors contributed equally to this work.

Received: 19 May 2020; Accepted: 29 May 2020; Published: 1 June 2020

Abstract: In the attempt to mitigate the effects of the COVID-19 lockdown, most countries have recently authorized and promoted the adoption of e-learning and remote teaching technologies, often with the support of teleconferencing platforms. Unfortunately, not all students can benefit from the adoption of such a surrogate of their usual school. We were asked to devise a way to allow a community of children affected by the Rett genetic syndrome, and thus unable to communicate verbally, in writing or by gestures, to actively participate in remote rehabilitation and special education sessions by exploiting eye-gaze tracking. As not all subjects can access commercial eye-tracking devices, we investigated new ways to facilitate access to eye gaze-based interaction for this specific case. The adopted communication platform is a videoconferencing software, so all we had at our disposal was a live video stream of the child. As a solution to the problem, we developed a software application (named SWYG) that runs only at the "operator" side of the communication, alongside the videoconferencing software, and does not require the installation of any other software on the child's computer. The preliminary results obtained are very promising and the software is ready to be deployed to a wider user base. While this paper is being written, several children are finally able to communicate with their caregivers from home, without relying on expensive and cumbersome devices.

Keywords: video-based eye-gaze tracking; video conferencing; remote interactive education; telerehabilitation; COVID-19

1. Introduction

Since its outbreak, the COVID-19 pandemic has produced disruptive effects on the school education of millions of children all around the world. The loss of learning time for students of every age varies from country to country, according to the local patterns of contagion and the consequent government actions; however, the effects will be severe everywhere. In the attempt to mitigate such effects, most countries have authorized and promoted the adoption of e-learning and remote teaching technologies, often with the support of teleconferencing platforms such as Microsoft Teams, Zoom, Cisco Webex, Google Meet and others. With the support of such platforms, both teachers and students have gradually and partially recovered their previous habits, by meeting daily in "virtual classrooms" where they can talk to and see each other and interact almost as in person. If, on the one hand, the effectiveness of such a surrogate of the "normal" school is still to be

assessed and is widely criticized, on the other hand, it is undoubtedly an effective emergency solution that could be promptly deployed thanks to the widespread availability of its enabling technologies.

Unfortunately, not all students can benefit from the adoption of teleconferencing platforms as a surrogate of their usual school. Children and young students with disabilities face severe barriers that, in several cases, prevent them from accessing any kind of remote education at all [1]. Among children with disabilities, those with Multiple Disabilities (MD) face the worst difficulties and, at the same time, need remote education and interventions more than others, as they cannot receive the usual help from caregivers and professionals. People affected by MDs have severe difficulty communicating their needs, freely moving their body to access and engage their world and learning abstract concepts and ideas. The lockdown is heavily affecting their lives, as it prevents them from receiving the sparkle of light that comes from their periodic meetings with caregivers and, in some cases, even their beloved ones. Moreover, rehabilitation interventions are in general more effective during the early stages after the onset of the disease [2,3], therefore the ability to continue the cognitive rehabilitation and special education interventions becomes crucial.

Rett Syndrome (RTT) is a complex genetic disorder caused by mutations in the gene encoding the methyl CpG binding protein 2 (MECP2) [4] that induces severe multiple disabilities. As Rett Syndrome is a rare disease, affecting 1:10,000 girls by age 12 (almost all RTT subjects are female), there are few specialized centers and therapists, and accessing the needed treatments was already a major challenge for the subjects and their families before the lockdown. RTT subjects suffer from severe impairments in physical interaction that, in some cases, leave them with only the ability to communicate and interact by means of their eye gaze [5]. As a consequence, participating in online, interactive educational activities is particularly difficult for such subjects. On the other hand, it has been demonstrated that technology-mediated cognitive rehabilitation and special education approaches are very effective in treating RTT subjects [6,7].

The influence of digital media on Typically Developing (TD) children has been extensively investigated in the last few years, due to the need for an adequate balance in their usage with respect to other activities [6,8–13]. However, in the case of RTT subjects, studies examining the effectiveness of multimedia technologies have demonstrated very positive effects on the cognitive, communicative and motivational abilities of subjects [14]. Furthermore, some studies have found that subjects with RTT have prerequisites underlying Theory of Mind (ToM) abilities, such as gaze control and pointing [15–17]. Other studies have demonstrated that ToM is not impaired in RTT [18]. For all these theoretical reasons, we think that RTT subjects may benefit from the possibility of actively joining social school interactions, as such interactions offer opportunities for games, surprises, learning and so on.
However, as the interaction between the subject and the digital platform must be mostly based on eye gaze direction estimation, specialized hardware must be acquired and installed on the subject's side. Such specialized hardware can be expensive and introduces a further degree of complexity in the already-hindered familiar routines. If, on the one hand, innovative software and hardware platforms are being developed to bring, in the near future, more and more effective rehabilitation and educational services [19,20], on the other hand, the current lockdown due to the COVID-19 pandemic and, more in general, the non-uniform geographical distribution and "physical" availability of the needed rehabilitation and special educational services have lately produced an actual and immediate emergency. In the case of TD subjects, teleconferencing platforms and applications could be easily adopted mainly because the hardware and the technological infrastructure supporting them were already available to most of the families. However, this is not the case for RTT and, in general, for many MD subjects, because such teleconferencing applications do not support eye-gaze tracking.

In this paper we describe the motivations, design, implementation and early results of a remote eye-tracking architecture that aims at enhancing the most common teleconferencing and video chatting applications with the ability to remotely estimate the direction of the eye gaze of the users. The proposed approach enables the user on one side of the communication channel (the operator) to automatically estimate the direction of the eye gaze of the other user (the subject), thus enabling eye gaze-based interaction, without requiring the subject to install additional software or hardware. Moreover, the data acquired through this software can also be saved and constitute the basis for machine learning applications and advanced analysis techniques to provide decision-making support to therapists, educators or researchers [21].

2. Materials and Methods

The target population for this work is a community of about 300 girls diagnosed with RTT in Italy and their families. From this community, we took into consideration those subjects who only communicate with eye gaze, thus sampling from about 270 subjects. The targeted subjects cannot actively participate in remote education and rehabilitation programs and, therefore, are at risk of not receiving their needed treatments during the lockdown.

We propose a solution named SWYG (Speak With Your Gaze) that requires only the operator to install some additional software (no special hardware is needed) and use it together with the preferred Video Conferencing System (VCS), while the subjects are not required to install any additional software or hardware.

SWYG (this word also means "silent" in Afrikaans) sits at the side of the VCS, acquires the content of the video conferencing window, analyses it in near real-time and visualizes the direction of the eye gaze of the person appearing in the window. As a result, the operator using it can easily and promptly receive feedback from the eye gaze of the subject (Figure 1).

Figure 1. With SWYG (Speak With Your Gaze), the operator receives an automatic indication of the eye gaze direction of the subject. Note that the subject is actually looking to her right.

SWYG is composed of four modules: the Video Acquisition Module (VAM); the Face Detection and Analysis Module (FDAM); the Eye-Gaze Tracking Module (EGTM) and the Visualization Module (VM).

The VAM identifies the window of the VCS through its application name and acquires its content (i.e., the video). It is based on the well-established "Screen Capture Lite" open source screencasting library, freely available online [22].
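As an illustration of the VAM's role, the following minimal Python sketch grabs the screen region occupied by the VCS window and yields frames to the downstream modules. It uses the mss library only as a stand-in for the C++ "Screen Capture Lite" library actually used by SWYG, and the window geometry is a hardcoded placeholder, since locating a window by its application name is platform-specific and not shown here.

```python
import numpy as np
import mss

# Placeholder geometry for the VCS window; the real VAM locates the window
# by its application name through Screen Capture Lite [22].
VCS_WINDOW = {"left": 100, "top": 100, "width": 1280, "height": 720}

def capture_vcs_frames():
    """Yield BGR frames of the assumed VCS window region, ready for the FDAM."""
    with mss.mss() as sct:
        while True:
            shot = sct.grab(VCS_WINDOW)                # raw BGRA screenshot
            yield np.asarray(shot)[:, :, :3].copy()    # drop the alpha channel
```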

The FDAM receives the video stream from the VAM and processes it, detecting the largest face in each frame and estimating its orientation. Only the largest face is considered because other persons might be present during the session at the subject's place and, moreover, the video from the VCS often shows the user currently selected in a larger frame, and several other users in smaller frames (Figure 2).

Figure 2. Most Video Conferencing Systems (VCS) show several users at a time on the screen.

In the FDAM, face detection is performed by exploiting the very effective and efficient face detector from the OpenCV library [23]. The face orientation is then estimated according to the same approach described in [16,24], and the eyes are detected by leveraging on the DLib machine learning library [25]. The face position and orientation, as well as the eyes' positions relative to the face, are then passed to the EGTM module that performs the eye-gaze estimation.
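The FDAM stage can be sketched as follows. This is a hedged illustration rather than the authors' exact code: it uses OpenCV's stock Haar-cascade face detector, the dlib 68-landmark shape predictor (the model file shape_predictor_68_face_landmarks.dat must be downloaded separately), and a generic solvePnP head-pose estimate as a stand-in for the orientation approach of [16,24].

```python
import cv2
import dlib
import numpy as np

# Assumed model files: OpenCV ships the Haar cascade; the dlib landmark
# predictor is an external download (shape_predictor_68_face_landmarks.dat).
FACE_CASCADE = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
LANDMARKS = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

# Generic 3D face model points (nose tip, chin, eye corners, mouth corners),
# a common stand-in for the head-pose approach of [16,24].
MODEL_3D = np.float32([
    (0.0, 0.0, 0.0), (0.0, -330.0, -65.0),
    (-225.0, 170.0, -135.0), (225.0, 170.0, -135.0),
    (-150.0, -150.0, -125.0), (150.0, -150.0, -125.0)])
LANDMARK_IDS = [30, 8, 36, 45, 48, 54]  # matching dlib landmark indices

def analyze_frame(frame_bgr):
    """Detect the largest face, estimate head orientation and locate the eyes."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = FACE_CASCADE.detectMultiScale(gray, 1.2, 5)
    if len(faces) == 0:
        return None
    # Keep only the largest face: the subject shown in the main VCS frame.
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])
    rect = dlib.rectangle(int(x), int(y), int(x + w), int(y + h))
    shape = LANDMARKS(gray, rect)
    pts = np.float32([(shape.part(i).x, shape.part(i).y) for i in LANDMARK_IDS])
    # Approximate pinhole camera: focal length ~ image width, center at image center.
    fh, fw = gray.shape
    cam = np.float32([[fw, 0, fw / 2], [0, fw, fh / 2], [0, 0, 1]])
    ok, rvec, tvec = cv2.solvePnP(MODEL_3D, pts, cam, None,
                                  flags=cv2.SOLVEPNP_ITERATIVE)
    # Eye bounding boxes from the landmark groups of the two eyes (36-41, 42-47).
    eyes = []
    for ids in (range(36, 42), range(42, 48)):
        ex = [shape.part(i).x for i in ids]
        ey = [shape.part(i).y for i in ids]
        eyes.append((min(ex), min(ey), max(ex) - min(ex), max(ey) - min(ey)))
    return {"face": (x, y, w, h), "rvec": rvec, "tvec": tvec, "eyes": eyes}
```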
As SWYG must run in near real-time on a general purpose, average-level notebook, we had to take into consideration strict computational power constraints. In particular, the EGTM should not require the support of a powerful Graphics Processing Unit (GPU) or specialized hardware for Deep Neural Network computation. We therefore developed our own software solution for eye-gaze tracking, that is still under final testing and will be open-sourced soon. Our EGTM is devised to work both in natural and near-infrared illumination and does not rely on special hardware.

The main difficulties that had to be overcome were the low resolution of the input eye image, the non-uniform illumination and the dependency of the eye gaze direction on both the head and camera orientations. We leveraged on the accurate head orientation and eye position estimations produced by the FDAM to solve the first and the last issues, and on an accurate local histogram equalization to deal with non-uniform illumination. A preliminary version of the tracker is demonstrated in Figure 3, where the face orientation and the positions of the left and right pupil are shown.
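The paper does not specify which local histogram equalization is used; as one plausible sketch, OpenCV's CLAHE (Contrast Limited Adaptive Histogram Equalization) equalizes small tiles of the cropped eye image independently, which compensates for shadows and uneven lighting better than a single global equalization.

```python
import cv2

def normalize_eye_patch(eye_gray, clip_limit=2.0, tile_grid=(4, 4)):
    """Locally equalize a cropped grayscale eye image (illustrative only)."""
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tile_grid)
    return clahe.apply(eye_gray)
```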

Figure 3. Eye-gaze tracking and head orientation estimation. Figure 3. Eye-gaze tracking and head orientation estimation. Figure 3. Eye-gaze tracking and head orientation estimation. Once the pose (i.e., position and spatial orientation) of the head is estimated and the eyes are Once the pose (i.e., position and spatial orientation) of the head is estimated and the eyes are detected, each eye is analyzed, and the relative pupil orientation is determined by applying a correlation detected,Once eachthe pose eye (i.e.,is analyzed, position and spatialthe relative orientat pupilion) orientationof the head isis determinedestimated and by theapplying eyes are a matching algorithm enhanced with a Bayesian approach that constraints the position of the pupil in detected,correlation each matching eye is algorithm analyzed, enha andnced the withrelative a Bayesian pupil orientation approach thatis determined constraints bythe applyingposition ofa the eye image. Figure4 illustrates the very simple algorithm adopted. Each eye image undergoes correlationthe pupil in matching the eye image. algorithm Figure enha 4 ncedillustrates with ath Bayesiane very simple approach algorithm that constraints adopted. Each the position eye image of histogram equalization and gray levels filtering in order to improve the image quality, then a correlation theundergoes pupil in histogram the eye image. equalization Figure 4and illustrates gray levels the veryfiltering simple in orderalgorithm to improve adopted. the Each image eye quality, image matching filter is applied, that produces an image where the points that have higher probability of undergoesthen a correlation histogram matching equalization filter isand applied, gray levels that producesfiltering in an order image to whereimprove the the points image that quality, have being the centers of the eye pupil have higher brightness. This likelihood is then weighted by a 2D thenhigher a correlationprobability matchingof being thefilter centers is applied, of the theyeat pupilproduces have an higher image brightness. where the Thispoints likelihood that have is Gaussian mask determined, for each frame and each eye, according to the size and shape of the eye higherthen weighted probability by aof 2D being Gaussian the centers mask ofdetermined, the eye pupil for eachhave framehigher and brightness. each eye, This according likelihood to the is then weighted by a 2D Gaussian mask determined, for each frame and each eye, according to the

image in that frame. Such a mask represents a "prior" that constrains the position of the pupil in the eye according to its size and shape. Finally, the point that features the Maximum A Posteriori (MAP) probability of being the center of the pupil is selected.
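The following sketch mirrors the pipeline of Figure 4 under stated assumptions: the correlation matching filter is approximated with OpenCV's normalized cross-correlation against a pupil template (the actual template and gray level filtering used by SWYG are not specified in the paper), the prior is a 2D Gaussian centered in the eye image, and the pupil is taken at the maximum of the resulting posterior.

```python
import cv2
import numpy as np

def locate_pupil(eye_gray, template, sigma_scale=0.25):
    """Illustrative MAP pupil localization following the Figure 4 pipeline.

    `template` is assumed to be a small grayscale patch of a dark pupil
    (e.g., a synthetic dark disc).
    """
    # 1. Histogram equalization to compensate residual illumination effects.
    eye = cv2.equalizeHist(eye_gray)
    # 2. Correlation matching: brighter points correspond to a higher
    #    likelihood of being the pupil center.
    likelihood = cv2.matchTemplate(eye, template, cv2.TM_CCOEFF_NORMED)
    likelihood = (likelihood - likelihood.min()) / (np.ptp(likelihood) + 1e-9)
    # 3. 2D Gaussian prior centered on the eye image, with a spread tied to
    #    the eye's size and shape, constraining where the pupil can be.
    h, w = likelihood.shape
    ys, xs = np.mgrid[0:h, 0:w]
    sy, sx = sigma_scale * h, sigma_scale * w
    prior = np.exp(-(((xs - w / 2) ** 2) / (2 * sx ** 2)
                     + ((ys - h / 2) ** 2) / (2 * sy ** 2)))
    # 4. MAP selection: pick the point maximizing likelihood * prior.
    posterior = likelihood * prior
    y, x = np.unravel_index(np.argmax(posterior), posterior.shape)
    # Offset by half the template size to map back to eye-image coordinates.
    return x + template.shape[1] // 2, y + template.shape[0] // 2
```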

Figure 4. The eye-gaze tracking process pipeline.

The orientations of the two eyes are then averaged in order to obtain a more reliable estimate of the direction of the eye gaze. An ad-hoc calibration procedure is applied in order to map the averaged direction onto the screen of the device. As the subject cannot be asked to collaborate in a lengthy calibration procedure, in our case only 3 calibration points are used, i.e., center, left, right. Each subject is invited to fixate each calibration point for a few seconds, in order to acquire a set of discrimination parameters. The obtained accuracy, initially tested over about 100 tests with 8 different non-RTT subjects, is 7 degrees of visual angle on average, which is by no means comparable with that of modern hardware-oriented gaze trackers [26]; however, we judged it acceptable for our purposes.
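To make the three-point calibration concrete, here is a minimal sketch assuming the averaged gaze direction is reduced to a horizontal angle: the angles recorded at the left, center and right fixations are mapped to screen coordinates by piecewise-linear interpolation. The function and parameter names are hypothetical, and the actual "discrimination parameters" of SWYG are not detailed in the paper.

```python
import numpy as np

def build_horizontal_calibration(theta_left, theta_center, theta_right, screen_w):
    """Return a mapping from a horizontal gaze angle (degrees) to a screen x
    coordinate, built from three fixations (left, center, right)."""
    angles = np.array([theta_left, theta_center, theta_right], dtype=float)
    targets = np.array([0.0, screen_w / 2.0, float(screen_w)])
    order = np.argsort(angles)          # np.interp requires increasing x values
    angles, targets = angles[order], targets[order]

    def angle_to_x(theta):
        return float(np.interp(theta, angles, targets))

    return angle_to_x

# Hypothetical example: fixations measured at -12° (left), 0° (center) and
# +11° (right) on a 1920-pixel-wide shared screen.
to_x = build_horizontal_calibration(-12.0, 0.0, 11.0, 1920)
print(to_x(-5.0))   # a gaze angle of -5° maps to a point left of center
```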

The VM is a simple graphical interface that opens a window and shows in near real-time the video of the subject with the information regarding the gaze direction. This is the video that the operator uses to receive the eye gaze-based feedback from the subject (Figure 5).

Figure 5. The Visualization Module showing the output of SWYG.

3. Preliminary Experiments

We conducted a preliminary experimental campaign on a sample of 12 subjects, in their late primary and secondary school ages, all female. Subjects were selected among those who could also access a commercial eye-tracking device, which was used as a reference ground truth for our experiments. The videoconferencing software adopted by the involved families was Cisco Webex, acquired within a different ongoing remote rehabilitation project [19]. The commercial eye tracker available to the subjects was the Tobii 4C eye-tracking USB bar.

The families of the subjects were preliminarily asked for written informed consent on the use of images from the testing sessions for scientific purposes (including publication).

The recruited subjects were already trained in eye gaze-based communication, as they frequently used an eye tracker during their rehabilitation sessions before the COVID-19 lockdown.

The experiments aimed at evaluating the reliability of the tracker in discriminating the response intended by the subject while answering a question by means of her eye gaze.

We ran 12 sessions, one for each subject. Each session started with a short briefing for the family, explaining the aim of the experiment and providing instructions on the setup. Up to 10 min were then devoted to locating a suitable setting for the session: a correct illumination, reasonably smooth and without deep shadows on the face of the subject, and a scene with only one face in the foreground were preferred. Then, a short presentation started, showing a short graphical story with some verbal comments from the operator. The presentation was run on the operator's PC and its window was shared with the subject through the Webex software. Finally, 8 questions were asked within a 10 min session of questions. Therefore, in total 8 × 12 = 96 questions were posed, equally distributed among the 12 subjects. The subjects were requested to answer each question by means of their eye gaze. Figure 6 is a screenshot of the user interface on the subject's side, with a question to be answered; the trace of the reference Tobii eye tracker is shown as a circular translucid shape on the screen.

Figure 6. The user interface on the subject's side.

For the sake of our application, the computer screen was divided into 9 zones (three rows by three columns), so the required resolution for the eye tracker is 9 × 9. The obtained results are very encouraging: provided that the illumination conditions are favorable and that only the subject is in the foreground of the video, the correspondence between the results of the SWYG eye tracker and the Tobii eye tracker is about 98%, i.e., for 94 questions out of 96, SWYG estimated the same answer as the Tobii tracker.

Of course, these are only preliminary results and we still have to refine several aspects of our approach; however, thanks to those results, we decided to run more experiments with more subjects as soon as the lockdown comes to an end. In the meanwhile, the software will be used during the next months in support of the rehabilitation and special education services to the subjects involved in the project.
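As a sketch of how such an agreement figure can be computed, the snippet below maps gaze points to one of the 9 screen zones and counts how often SWYG and the reference tracker select the same zone. The coordinates and screen size are hypothetical examples, not data from the study.

```python
import numpy as np

def gaze_to_zone(x, y, screen_w, screen_h):
    """Map a gaze point (screen pixels) to one of the 9 zones (3 rows x 3 columns)."""
    col = min(int(3 * x / screen_w), 2)
    row = min(int(3 * y / screen_h), 2)
    return row * 3 + col

# Hypothetical per-question gaze estimates from SWYG and from the reference
# Tobii tracker on a 1920x1080 screen; agreement is the fraction of questions
# for which both trackers select the same zone.
swyg_points = [(300, 200), (960, 540), (1700, 900)]
tobii_points = [(280, 230), (940, 560), (1650, 940)]
agree = [gaze_to_zone(*p, 1920, 1080) == gaze_to_zone(*q, 1920, 1080)
         for p, q in zip(swyg_points, tobii_points)]
print(f"agreement: {100 * np.mean(agree):.1f}%")
```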

4. Discussion

We were recently asked to devise a way to allow a community of girls diagnosed with the Rett syndrome, and thus unable to communicate verbally, in writing or by gestures, to actively participate in remote rehabilitation and special education sessions. As the hardware devices needed for eye tracking were not available for all subjects, we investigated new ways to facilitate access to eye gaze-based interaction for the specific case of simplified communication, i.e., communication through direct selection of one out of a small number of predefined choices. As the adopted communication platform was a videoconferencing software, all we had at our disposal was a live video stream of the subject.

We considered the following use case. A cognitive therapist or special education teacher (the operator) is in charge of a group of RTT subjects distributed over a possibly large geographical area. The families cannot afford the expenses and commuting time needed to periodically reach the operator, thus adopting a teleconferencing platform would be both economically and logistically convenient. However, all or most subjects lack verbal or written communication abilities. Moreover, most subjects also lack fine motor control and therefore cannot communicate by gestures. As a consequence, only eye-gaze interaction can be used to communicate with the operator. If each family had an eye-tracking device, the simplest solution would be to share the operator's desktop with each subject and show the subject very simple questions with three or four options. The subject could then choose the preferred option by simply fixating their gaze on it. Unfortunately, not all the families have an eye-tracking device at home and some of them can only rely on a tablet or a smartphone, so using specialized software is not feasible. Without eye-gaze tracking, the subjects cannot send their feedback to the operator, who therefore has no way to estimate the subjects' degree of comprehension of the content being administered or to receive responses to his/her questions.

Eye gaze technology has been investigated for more than 20 years now [26]. Most commercially available products rely on specialized hardware devices able to detect the eyes and eye direction in real-time, but those devices are often expensive and do not support smartphones or tablets (except some very recent and quite expensive ones). As a consequence, an inclusive approach must leverage on common webcams and on algorithms to detect the face of the subject and his/her eye gaze direction [27–32]. Moreover, to the best of our knowledge, current video conferencing platforms do not support software-based eye-gaze analysis, so the computer vision algorithms must be applied externally to the video conferencing software (VCS), yet work on the same video stream managed by that software. As the VCS is usually not open source, modifying it is not a viable solution. Furthermore, most families are not willing to install yet more software on their devices.

As a solution to the problem, we developed software (named SWYG) that only runs at the "operator" side of the communication, i.e., it does not require other software to be installed on the subject's computer. Moreover, it does not require powerful hardware at the operator's side and works with virtually any videoconferencing software, on Windows, Linux and Apple operating systems. Finally, it does not rely on any "cloud" or client-server data analysis and storage services, so it does not introduce further issues with privacy and personal data management. The preliminary results obtained are very promising and the software is ready to be deployed to a wider user base.
While this paper is being written, several subjects are finally able to communicate with their caregivers from home, without relying on expensive and cumbersome devices. We are now planning to extend the SWYG platform to allow for a larger range of interaction channels in order to support different groups of disabled users as well. In particular, we are investigating remote gesture recognition [33,34] and remote monitoring for the assistance of elderly subjects [35]. In combination with software-defined networking [36], the proposed approach can also be deployed at the local and campus level to develop a real-time video communication and "rapid" surveillance architecture by relying on existing infrastructures and video conferencing platforms.

Author Contributions: All authors have jointly, and on an equal basis, collaborated on this research work. All authors have read and agreed to the published version of the manuscript.

Funding: This research received no external funding.

Conflicts of Interest: The authors declare no conflict of interest.

References

1. Global Education Monitoring Report. How Is the Coronavirus Affecting Learners with Disabilities? 30 March 2020. Available online: https://gemreportunesco.wordpress.com/2020/03/30/how-is-the-coronavirus-affecting-learners-with-disabilities/ (accessed on 10 May 2020).
2. Guralnick, M.J. Early Intervention for Children with Intellectual Disabilities: An Update. J. Appl. Res. Intellect. Disabil. 2016, 30, 211–229. [CrossRef] [PubMed]
3. Gangemi, A.; Caprì, T.; Fabio, R.A.; Puggioni, P.; Falzone, A.; Martino, G. Transcranial Direct Current Stimulation (tDCS) and Cognitive Empowerment for the functional recovery of diseases with chronic impairment and genetic etiopathogenesis. In Advances in Genetic Research; Nova Science Publisher: New York, NY, USA; Volume 18, pp. 179–196. ISBN 978-1-53613-264-9.
4. Neul, J.; Kaufmann, W.E.; Glaze, D.G.; Christodoulou, J.; Clarke, A.J.; Bahi-Buisson, N.; Leonard, H.; Bailey, M.E.S.; Schanen, N.C.; Zappella, M.; et al. Rett syndrome: Revised diagnostic criteria and nomenclature. Ann. Neurol. 2010, 68, 944–950. [CrossRef] [PubMed]
5. Vessoyan, K.; Steckle, G.; Easton, B.; Nichols, M.; Siu, V.M.; McDougall, J. Using eye-tracking technology for communication in Rett syndrome: Perceptions of impact. Augment. Altern. Commun. 2018, 34, 230–241. [CrossRef] [PubMed]
6. Fabio, R.A.; Magaudda, C.; Caprì, T.; Towey, G.E.; Martino, G. Choice behavior in Rett syndrome: The consistency parameter. Life Span Disabil. 2018, 21, 47–62.
7. Fabio, R.A.; Caprì, T.; Nucita, A.; Iannizzotto, G.; Mohammadhasani, N. Eye-gaze digital games improve motivational and attentional abilities in Rett syndrome. J. Spéc. Educ. Rehabil. 2019, 19, 105–126. [CrossRef]
8. Caprì, T.; Gugliandolo, M.C.; Iannizzotto, G.; Nucita, A.; Fabio, R.A. The influence of media usage on family functioning. Curr. Psychol. 2019, 1, 1–10. [CrossRef]
9. Dinleyici, M.; Çarman, K.B.; Ozturk, E.; Dagli, F.S.; Guney, S.; Serrano, J.A. Media Use by Children, and Parents' Views on Children's Media Usage. Interact. J. Med. Res. 2016, 5, e18. [CrossRef] [PubMed]
10. Caprì, T.; Santoddi, E.; Fabio, R.A. Multi-Source Interference Task paradigm to enhance automatic and controlled processes in ADHD. Res. Dev. Disabil. 2019, 97, 103542. [CrossRef]
11. Cingel, D.P.; Krcmar, M. Predicting Media Use in Very Young Children: The Role of Demographics and Parent Attitudes. Commun. Stud. 2013, 64, 374–394. [CrossRef]
12. McDaniel, B.T.; Radesky, J.S. Technoference: Parent Distraction With Technology and Associations With Child Behavior Problems. Child Dev. 2017, 89, 100–109. [CrossRef]
13. Mohammadhasani, N.; Caprì, T.; Nucita, A.; Iannizzotto, G.; Fabio, R.A. Atypical Visual Scan Path Affects Remembering in ADHD. J. Int. Neuropsychol. Soc. 2019, 1–10. [CrossRef] [PubMed]
14. Fabio, R.A.; Caprì, T. Automatic and controlled attentional capture by threatening stimuli. Heliyon 2019, 5, e01752. [CrossRef]
15. Djukic, A.; Rose, S.A.; Jankowski, J.J.; Feldman, J.F. Rett Syndrome: Recognition of Facial Expression and Its Relation to Scanning Patterns. Pediatr. Neurol. 2014, 51, 650–656. [CrossRef] [PubMed]
16. Fabio, R.A.; Caprì, T.; Iannizzotto, G.; Nucita, A.; Mohammadhasani, N. Interactive Avatar Boosts the Performances of Children with Attention Deficit Hyperactivity Disorder in Dynamic Measures of Intelligence. Cyberpsychol. Behav. Soc. Netw. 2019, 22, 588–596. [CrossRef]
17. Baron-Cohen, S.; Ring, H. A model of the mindreading system: Neuropsychological and neurobiological perspectives. In Children's Early Understanding of Mind: Origins and Development; Lewis, C., Mitchell, P., Eds.; Lawrence Erlbaum Associates: Hillsdale, MI, USA, 1994; pp. 183–207.
18. Castelli, I.; Antonietti, A.; Fabio, R.A.; Lucchini, B.; Marchetti, A. Do Rett syndrome persons possess theory of mind? Some evidence from not-treated girls. Life Span Disabil. 2013, 16, 157–168.
19. Caprì, T.; Fabio, R.A.; Iannizzotto, G.; Nucita, A. The TCTRS Project: A Holistic Approach for Telerehabilitation in Rett Syndrome. Electronics 2020, 9, 491. [CrossRef]
20. Benham, S.; Gibbs, V. Exploration of the Effects of Telerehabilitation in a School-Based Setting for At-Risk Youth. Int. J. Telerehabil. 2017, 9, 39–46. [CrossRef]
21. Nucita, A.; Bernava, G.; Giglio, P.; Peroni, M.; Bartolo, M.; Orlando, S.; Marazzi, M.C.; Palombi, L. A Markov Chain Based Model to Predict HIV/AIDS Epidemiological Trends. Appl. Evol. Comput. 2013, 8216, 225–236. [CrossRef]

22. Screen Capture Lite. Available online: https://github.com/smasherprog/screen_capture_lite (accessed on 14 May 2020).
23. Itseez. Open Source Computer Vision Library. Available online: https://github.com/itseez/opencv (accessed on 31 May 2020).
24. Iannizzotto, G.; Lo Bello, L.; Nucita, A.; Grasso, G.M. A Vision and Speech Enabled, Customizable, Virtual Assistant for Smart Environments. In Proceedings of the 2018 11th International Conference on Human System Interaction (HSI), Gdańsk, Poland, 4–6 July 2018; pp. 50–56.
25. King, D.E. Dlib-ml: A machine learning toolkit. J. Mach. Learn. Res. 2009, 10, 1755–1758.
26. Kar, A.; Corcoran, P. Performance Evaluation Strategies for Eye Gaze Estimation Systems with Quantitative Metrics and Visualizations. Sensors 2018, 18, 3151. [CrossRef]
27. Kar, A.; Corcoran, P. A Review and Analysis of Eye-Gaze Estimation Systems, Algorithms and Performance Evaluation Methods in Consumer Platforms. IEEE Access 2017, 5, 16495–16519. [CrossRef]
28. Iannizzotto, G.; La Rosa, F. Competitive Combination of Multiple Eye Detection and Tracking Techniques. IEEE Trans. Ind. Electron. 2010, 58, 3151–3159. [CrossRef]
29. Crisafulli, G.; Iannizzotto, G.; La Rosa, F. Two competitive solutions to the problem of remote eye-tracking. In Proceedings of the 2009 2nd Conference on Human System Interactions, Catania, Italy, 21–23 May 2009; pp. 356–362.
30. Marino, S.; Sessa, E.; Di Lorenzo, G.; Lanzafame, P.; Scullica, G.; Bramanti, A.; La Rosa, F.; Iannizzotto, G.; Bramanti, P.; Di Bella, P. Quantitative Analysis of Pursuit Ocular Movements in Parkinson's Disease by Using a Video-Based Eye Tracking System. Eur. Neurol. 2007, 58, 193–197. [CrossRef] [PubMed]
31. Zhang, X.; Sugano, Y.; Fritz, M.; Bulling, A. MPIIGaze: Real-World Dataset and Deep Appearance-Based Gaze Estimation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 41, 162–175. [CrossRef]
32. Lemley, J.; Kar, A.; Drimbarean, A.; Corcoran, P. Convolutional Neural Network Implementation for Eye-Gaze Estimation on Low-Quality Consumer Imaging Systems. IEEE Trans. Consum. Electron. 2019, 65, 179–187. [CrossRef]
33. Iannizzotto, G.; La Rosa, F. A Modular Framework for Vision-Based Human Computer Interaction. In Image Processing; IGI Global: Hershey, PA, USA, 2013; pp. 1188–1209.
34. Iannizzotto, G.; La Rosa, F.; Costanzo, C.; Lanzafame, P. A Multimodal Perceptual User Interface for Collaborative Environments. In Proceedings of the 13th International Conference on Image Analysis and Processing, ICIAP 2005, Cagliari, Italy, 6–8 September 2005; Volume 3617, pp. 115–122.
35. Cardile, F.; Iannizzotto, G.; La Rosa, F. A vision-based system for elderly patients monitoring. In Proceedings of the 3rd International Conference on Human System Interaction, Rzeszów, Poland, 13–15 May 2010; pp. 195–202.
36. Leonardi, L.; Ashjaei, M.; Fotouhi, H.; Bello, L.L. A Proposal Towards Software-Defined Management of Heterogeneous Virtualized Industrial Networks. In Proceedings of the 2019 IEEE 17th International Conference on Industrial Informatics (INDIN), Helsinki, Finland, 22–25 July 2019; Volume 1, pp. 1741–1746.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).