Explainable Federated Learning for Taxi Travel Time Prediction

Jelena Fiosina, Institute of Informatics, Clausthal Technical University, Julius-Albert Str. 4, D-38678 Clausthal-Zellerfeld, Germany (https://orcid.org/0000-0002-4438-7580)

Keywords: FCD Trajectories, Traffic, Travel Time Prediction, Federated Learning, Explainability.

Abstract: Transportation data are geographically scattered across different places, detectors, companies, or organisations and cannot be easily integrated under data privacy and related regulations. The federated learning approach helps process these data in a distributed manner while taking privacy concerns into account. The federated learning architecture is based mainly on deep learning, which is often more accurate than other models. However, deep-learning-based models are intransparent black-box models, which should be explained for both users and developers. Although various model explanation methods have been studied extensively, solutions for explaining federated models are still scarce. We propose an explainable horizontal federated learning approach, which enables processing of the distributed data while adhering to their privacy, and investigate how state-of-the-art model explanation methods can explain it. We demonstrate this approach for predicting travel time on real-world floating car data from Brunswick, Germany. The proposed approach is general and can be applied for processing data in a federated manner for other prediction and classification tasks.

1 INTRODUCTION

Most real-world data are geographically scattered across different places, companies, or organisations and, unfortunately, cannot be easily integrated under data privacy and related regulations. This is especially topical for the transportation domain, in which ubiquitous traffic sensors create a world-wide network of interconnected, uniquely addressable, cooperating objects, which enables the exchange and sharing of information. With the increase in the amount of traffic, a large volume of decentralised data becomes available.

Fuelled by the large amount of data collected in various domains and the high available computing power, analytical procedures and statistical models for data interpretation and processing rely on methods of artificial intelligence (AI). In recent years, large progress in AI has been achieved. These data-driven methods replace complex analytical procedures by multiple calculations. They are easily applicable and, in most cases, more accurate, considering their machine learning (ML) ancestry. Accuracy and interpretability are two dominant features of successful predictive models. However, more accurate black-box models are not sufficiently explainable and transparent. This feature of AI-driven systems complicates user acceptance and can be troublesome even for model developers.

Therefore, contemporary AI technologies should be capable of processing these data in a decentralised manner, according to the data privacy regulations. Moreover, the algorithms should be maximally transparent, which makes the decision-making process user-centric.

Federated learning is a distributed ML approach, which enables model training on a large corpus of decentralised data (Konecný et al., 2016). It has three major advantages: 1) it does not need to transmit the original data to the cloud, 2) the computational load is distributed among the participants, and 3) it assumes synchronisation of the distributed models with a server for more accurate models. The main assumption is that the federated model should be parametric (e.g., deep learning) because the algorithm synchronises the models by synchronising their parameters. A known limitation of deep learning is that the neural networks inside it are unexplainable black-box models. Numerous model-agnostic and model-specific (e.g., Integrated Gradients, DeepLIFT) methods for the explanation of black-box models are available (Kraus et al., 2020). Distributed versions of these methods exist, which allow them to be executed on various processes on graphics processing units (GPUs) (https://captum.ai/).




However, to the best of our knowledge, studies on the explanation of geographically distributed federated deep learning models are lacking.

The mentioned challenges are topical in the transportation domain, in which the generation and processing of big data are necessary. We propose a privacy-preserving explainable federated model, which achieves an accuracy comparable to that of the centralised approach on the considered real-world dataset. We predict the Brunswick taxi travel time based on floating car data trajectories obtained from different taxi service providers, which should remain private. The proposed model makes predictions for the stated problem and allows a joint learning process over different users, processing the data stored by each of them without exchanging their raw data, but only parameters, as well as providing joint explanations about variable importance.

We address several research questions. 1) Which is the most accurate ML prediction method for the given data in a centralised setting? We identify the best hyper-parameters for each method. 2) Under which conditions is federated learning effective? We distribute the dataset among various providers and analyse after which point the distributed, non-synchronised models lose their accuracy and federated learning becomes beneficial. We define an optimal synchronisation plan for parameter exchange, identifying the hyper-parameters and the frequency of parameter exchange that is acceptable and beneficial. 3) Do existing black-box explanation methods successfully explain federated learning models? We investigate how the state-of-the-art explainability methods can explain federated models.

The rest of the paper is organised as follows. Section 2 describes the state of the art. Section 3 describes the proposed explainable federated deep learning concept and parameter synchronisation mechanisms. Section 4 introduces the available data, presents the experimental setup, and provides insights into the data preprocessing step. Section 5 presents the model validation and experimental results.

2 STATE OF THE ART

2.1 Distributed Data Analysis

The large amount of data generated in the contemporary transportation domain requires the application of state-of-the-art methods of distributed/decentralised data analysis as well as the investigation of novel approaches capable of processing distributed data, often with data privacy requirements.

When data centralisation is available, accurate prediction models can be developed, which address the big data challenge through smart 'artificial' partitioning and parallelisation of data and computation within a cloud-based architecture or powerful supercomputers (Fiosina and Fiosins, 2017).

Often, data should be physically and logically distributed without the transmission of big information volumes and without the need to store, manage, and process massive datasets in one location. This approach enables data analysis with smaller datasets. However, scaling it up requires novel methods to efficiently support the coordinated creation and maintenance of decentralised data models. Specific decentralised architectures (e.g., multi-agent systems (MAS)) should be implemented to support the decentralised data analysis, which requires a coordinated suite of decentralised data models, including parameter/data exchange protocols and synchronisation mechanisms among the decentralised data models (Fiosina et al., 2013a).

The MAS-based representation of transportation networks helps overcome the limitations of centralised data analyses, which will enable autonomous vehicles to make better and safer routing decisions (Dotoli et al., 2017). Various cloud-based architectures for intelligent transportation systems have been proposed (Khair et al., 2021). A cloud-based architecture focusing on decentralised big transportation data analyses was presented in (Fiosina et al., 2013a).

Travel time is an important parameter of transportation networks, whose accurate prediction helps to reduce delays and transport delivery costs, improves reliability through better route selection, and increases the service quality of commercial delivery by bringing goods within the required time window (Ciskowski et al., 2018). A centralised deep-learning-based approach to travel-time estimation for the ride-sharing problem was discussed in (Al-Abbasi et al., 2019).

Often, a proper travel-time forecasting model needs pre-processing such as data filtering and aggregation. Travel-time aggregation models (non-parametric, semi-parametric) for decentralised data clustering and the corresponding coordination and parameter exchange algorithms were researched in (Fiosina et al., 2013b). Travel-time estimation and forecasting using decentralised multivariate linear and kernel-density regression with corresponding parameter/data exchange was proposed in (Fiosina et al., 2013a).


Initial requirements and ideas for the development of decentralised data analysis methods in the transportation domain operating with big data flows have been identified (Fiosina et al., 2013b). The impact of incorporating decentralised data analysis methods into MAS-based applications, taking into account individual user routing preferences, has been assessed (Fiosina and Fiosins, 2014).

2.2 Federated Learning

Federated learning (Konecný et al., 2016) was proposed by Google and continues the research line of distributed data analyses, focusing mainly on the development of privacy-preserving ML models for physically distributed data. When the isolated dataset of each company cannot produce an accurate model, the mechanism of federated learning enables access to more data and better training of models. Federated learning enables different devices to collaboratively learn a shared prediction model while keeping all training data on the device, without the need to store the data in the cloud. The main difference between federated learning and distributed learning lies in the assumptions on the properties of the local datasets: distributed learning originally aims to parallelise computing power, whereas federated learning originally aims to train on heterogeneous datasets (Konecný et al., 2016). The approach may use a central server that orchestrates the different steps of the algorithm and acts as a reference clock, or it may be peer-to-peer, where no such central server exists. In this study, we use a central server for the aggregation, while local nodes perform local training (Yang et al., 2019).

The general principle consists of training local models on local data samples and exchanging parameters (e.g., the weights of a deep neural network) between these local models at some frequency to generate a global model (Bonawitz et al., 2019). To ensure good task performance of the final central ML model, federated learning relies on an iterative process broken down into an atomic set of client-server interactions referred to as a federated learning round (Yang et al., 2019). Each round consists of transmitting the current global model state to the participating nodes, training local models on these nodes to produce a set of potential model updates at each node, and aggregating and processing these local updates into a single global update that is applied to the global model.

We consider $N$ data owners $\{F_i\}_{i=1}^N$ who wish to train an ML model by consolidating their respective data $\{D_i\}_{i=1}^N$. A centralised approach uses all data together, $D = \cup_{i=1}^N D_i$, to train a model $M_\Sigma$. A federated system is a learning process in which the data owners collaboratively train a model $M_{FD}$ such that no data owner $F_i$ exposes its data $D_i$ to the others. In addition, the accuracy of $M_{FD}$, denoted as $V_{FD}$, should be very close to the performance $V_\Sigma$ of $M_\Sigma$ (Yang et al., 2019). Each row of the matrix $D_i$ represents a sample, and each column represents a feature. Some datasets may also contain label data. The features $X$, the labels $Y$, and the sample IDs $I$ constitute the complete training dataset $(I, X, Y)$. The feature and sample spaces of the data parties may not be identical. We classify federated learning into horizontal, vertical, and federated transfer learning based on the data distribution among the parties. In horizontal or sample-based federated learning, the datasets share the same feature space but differ in samples. In vertical or feature-based federated learning, the datasets share the same sample ID space but differ in feature space. In federated transfer learning, the datasets differ in both sample and feature space, having only small intersections.

Figure 1: Architecture of horizontal federated learning.

In this study, we develop a joint privacy-preserving model for travel time forecasting of different service providers based on the horizontal federated learning architecture. The training process of such a system is independent of specific ML algorithms. All participants share the final model parameters (Figure 1).
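To make the synchronisation step concrete, the following is a minimal sketch of one federated learning round in plain PyTorch. It is an illustration only, not the PySyft-based implementation used later in this paper; the helper names (fed_average, federated_round) are hypothetical, and the unweighted mean corresponds to the simplest form of parameter averaging described above.

```python
import copy
import torch

def fed_average(state_dicts):
    """Server side: average every parameter over all participants' models."""
    avg = copy.deepcopy(state_dicts[0])
    for key in avg:
        avg[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
    return avg

def federated_round(local_models, local_loaders, loss_fn, lr=0.02):
    """One round: local training on each owner's data, then averaging and broadcast."""
    for model, loader in zip(local_models, local_loaders):
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        for x, y in loader:                    # raw data never leaves its owner
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    global_state = fed_average([m.state_dict() for m in local_models])
    for model in local_models:                 # broadcast the averaged parameters
        model.load_state_dict(global_state)
```

In a real deployment, only the state dictionaries travel to the server, while the training loops run on the participants' machines.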


2.3 Explainable AI

Conventional ML methods, such as linear regression, decision trees, and support vector machines, are interpretable in nature. Typically, highly accurate complex deep-learning-based black-box models are favoured over less accurate but more interpretable conventional ML models. Extensive studies have been carried out in the past few years to design techniques that make AI methods more explainable, interpretable, and transparent to developers and users (Molnar, 2020). Joining such methods in hybrid systems (e.g., ensembling) further increases their explainability and the obtained accuracy (Holzinger, 2018). AI for explaining decisions in MAS was discussed in (Kraus et al., 2020).

Both model-agnostic and model-specific explanation methods have been reported (Molnar, 2020). Model-agnostic methods, such as LIME and Shapley values, are applicable to any model. However, they require a large number of computations and often are not applicable to the big datasets used in deep learning (Molnar, 2020). The Shapley values method was applied in (Wang, 2019) to interpret a vertical federated learning model. Model-specific methods are more suitable for deep learning; they focus on only one type of model and are more computationally effective (Molnar, 2020), e.g., Integrated Gradients (Sundararajan et al., 2017) and DeepLIFT (Shrikumar et al., 2017; Ancona et al., 2018). These methods have an additive nature, which enables computing them in a distributed manner across processors, machines, or GPUs. For example, the explainability scores can be calculated on the participants' local data and then accumulated at the server (https://captum.ai/).

In this study, we focus on Integrated Gradients, which represents the integral of the gradients with respect to the inputs along the path from a given baseline to the input.

3 EXPLAINABLE FEDERATED LEARNING

We propose a federated model explaining strategy and illustrate it on a travel time prediction problem. Our aim is to describe the application of state-of-the-art explainability methods to federated learning, while maintaining data privacy. We apply the federated learning architecture and explainability methods to the focal problem and consider what information should be exchanged and how often. Moreover, the application of each explainability method to a concrete task only produces baseline results because the interpretation of the results is specific to the particular task or application at hand (Fiosina et al., 2019).

Let $N$ participants $\{F_i\}_{i=1}^N$ own datasets $\{D_i\}_{i=1}^N$ as previously defined. For the federated learning process, each participant $F_i$ divides its dataset $D_i = D_i^{TR} \cup D_i^{TE}$ into a training set $D_i^{TR}$ and a test set $D_i^{TE}$. We train the models on $D_i^{TR}$ and calculate attribution scores on $D_i^{TE}$. $\{M_i\}_{i=1}^N$ are the participants' local models, while $M_{FD}$ is the federated model. As we consider learning on batches, $M_i^{epoch,batch}$ is the local model of participant $F_i$ for the current epoch and batch of data, while $M_{FD}^{epoch,batch}$ is the current federated model. $w_i^{epoch,batch}$ are the current parameters of the model $M_i^{epoch,batch}$: $w_i^{epoch,batch} = w(M_i^{epoch,batch})$. The training process is described in Algorithm 1. Note that the synchronisation does not have to take place in every batch, so a logical variable synchronisation supervises this process. FedAverage() is a parameter synchronisation procedure on the server side, which, in the simplest case, computes the average of each parameter over all local models. We start the variable explanation process when the federated training process is finished and a copy of the common federated model $M_i$ of each $F_i$ is locally available. The scoring algorithm is one of the explainability methods mentioned above.

Figure 2: Explainable federated learning architecture.

Algorithm 1: Explainable federated learning training process.

Result: Trained $M_{FD}$ model
Define initial $w_i$ for $M_i$, epoch = 1;
while the loss function does not converge do
    foreach batch of data do
        foreach $F_i$ in parallel do
            Train($M_i$, $D_i^{TR,batch}$);
            if synchronisation then $F_i$ sends $w_i$ to the server;
        if synchronisation then
            Server averages the parameters and broadcasts them: $w_{FD}$ = FedAverage($w_i$);
            foreach $F_i$ in parallel do
                $F_i$ receives the updated parameters from the server and updates its model: $M_i = M_{FD}$;
    epoch = epoch + 1;
The training process is over; the last obtained $M_{FD}$ is the final model $M_i$ of each $F_i$;
foreach participant $F_i$ in parallel do
    foreach instance $j$ of the $D_i^{TE}$ dataset do
        Calculate attribution scores: $sc_{i,j}$ = ScoringAlgorithm($M_i$, $D_{i,j}^{TE}$);
    $F_i$ calculates its average scores and sends the result to the server: $sc_{i*} = \sum_j sc_{i,j} / |D_i^{TE}|$;
Server aggregates the participant scores and broadcasts the result: $sc_{FD} = \sum_i sc_{i*} \cdot |D_i^{TE}| / |\cup_i D_i^{TE}|$;
foreach participant $F_i$ in parallel do
    $F_i$ updates its attribution scores: $sc_{i*} = sc_{FD}$;
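The score-aggregation part of Algorithm 1 is easy to express in code. Below is a minimal sketch, assuming each participant has already computed an attribution matrix (instances x features) on its local test set; only per-feature averages and test set sizes are sent to the server, never the raw data. The function names are illustrative, and the denominator assumes disjoint test sets, so that $|\cup_i D_i^{TE}| = \sum_i |D_i^{TE}|$.

```python
import numpy as np

def local_average_scores(attributions):
    """Participant side: sc_i* = sum_j sc_ij / |D_i^TE|.
    `attributions` has shape (n_local_test_instances, n_features)."""
    return attributions.mean(axis=0), attributions.shape[0]

def federated_scores(local_results):
    """Server side: sc_FD = sum_i sc_i* |D_i^TE| / |U_i D_i^TE|,
    a size-weighted average of the participants' mean scores."""
    means = np.array([mean for mean, _ in local_results])         # (N, n_features)
    sizes = np.array([size for _, size in local_results], float)  # (N,)
    return (means * sizes[:, None]).sum(axis=0) / sizes.sum()
```

After the broadcast, every participant overwrites its local scores with the common sc_FD, so all parties share one consistent explanation.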


4 EXPERIMENTAL SETUP

We predict the Brunswick taxi travel time based on floating car data (FCD) trajectories obtained from two different taxi service providers. The data were available for the period of January 2014 to January 2015. Each raw data item contains a car identifier, a time moment, a geographical point, and a status (free, drives to client, drives with client, etc.). The data were received approximately every 40 s. We consider the map of Brunswick and its surroundings as a box with coordinates in the latitude range of 51.87°-52.62° and the longitude range of 10.07°-11.05° (Figure 3).

To evaluate whether the travel times depend on weather conditions, we used the corresponding historical data about rain, wind, temperature, atmospheric pressure, etc. However, good-quality weather data for the given region were available only until the end of September 2014, so we reduced our trajectory dataset accordingly.

We developed a multi-step data pre-processing pipeline. We constructed a script to transform the data into trajectories according to time points, locations, and car identifiers. The raw trajectories were then analysed to determine their correctness. We split trajectories with long stay periods into shorter trips. Round trips with the same source and destination were separated into two trips. Some noisy, unrealistic data with probably incorrect global positioning system (GPS) signals, incorrect status, or unrealistic average speeds (less than 13 km/h or more than 100 km/h) were also removed. After this cleaning, the number of trajectories was 542,066.

Figure 3: Road network of Brunswick and surroundings.

We then connected the trajectories with OpenStreetMap and obtained a routable graph. Additionally, we divided the map into grids of different sizes (e.g., 200 m x 200 m) to determine whether this aggregation can improve our forecasts. Therefore, we knew to which zone the start and end points of each trip belong. Moreover, we found the nearest road graph node to the source and destination of each trip and calculated the shortest-path distance. To improve the prediction model and filter the noisy data, we calculated the distance between the start and end points of each trip according to the FCD trajectory. If the distances of the shortest-path trajectory and the FCD trajectory were considerably different, we analysed the trip more closely and divided it into a couple of more realistic trips, excluding false GPS signal places.

We stored the raw data, trip data, weather data, and graph data obtained from OpenStreetMap (roads and nodes) in a PostgreSQL database. For the visual presentation of trips, their sources and destinations, as well as the graph representation of the road network of Brunswick, we used QGIS (https://www.qgis.org/) (Figure 3). We predicted the travel time using different methods (Table 1) and found the corresponding best hyper-parameters by grid search.

The forecasting was based on the following factors: the coordinates of the start and end zones (200 m x 200 m), the FCD distance, the weekday and hour transformed with sine and cosine, as well as the temperature, air pressure, and rain. We forecasted the travel time in seconds. We divided the dataset into training (80%) and test (20%) sets. The dataset was normalised with a MinMax scaler before the application of the above methods. We used the mean squared error (MSE) as the efficiency criterion and 5-fold cross-validation for model comparison. An MSE of .0010 corresponds to an error of about 5 min, while an MSE of .0018 corresponds to about 7.5 min. We used the Python programming language, PyTorch for the deep learning models, the PySyft library (https://pysyft.readthedocs.io/) for the federated learning (Ryffel et al., 2018), and the captum library (https://captum.ai/) to interpret the models.
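As an illustration of the feature construction described above, the cyclic encoding of hour and weekday and the MinMax normalisation might look as follows; the exact column names and the scaling of the target are assumptions, not the paper's actual code.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

def add_cyclic_features(trips: pd.DataFrame) -> pd.DataFrame:
    """Encode hour (0-23) and weekday (0-6) as sine/cosine pairs so that
    23:00/00:00 and Sunday/Monday stay close in feature space."""
    trips = trips.copy()
    trips["hour_sin"] = np.sin(2 * np.pi * trips["hour"] / 24)
    trips["hour_cos"] = np.cos(2 * np.pi * trips["hour"] / 24)
    trips["wday_sin"] = np.sin(2 * np.pi * trips["weekday"] / 7)
    trips["wday_cos"] = np.cos(2 * np.pi * trips["weekday"] / 7)
    return trips.drop(columns=["hour", "weekday"])

# Scale all columns (features and the travel time in seconds) to [0, 1],
# which is the scale on which the MSE values below are reported:
# scaled = MinMaxScaler().fit_transform(add_cyclic_features(trips_df))  # trips_df is hypothetical
```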


Table 1: Optimal model hyper-parameters.

    Regression:          Linear (none); Ridge (α = 0.09); Lasso (α = 1e-9)
    XGBoost:             colsample_bytree = 0.7; learning rate = 0.12; max depth = 9; α = 15;
                         n_estimators = 570
    Random forest:       num_trees = 100; max depth and min samples per leaf not restricted
    Deep learning:       fully connected perceptron with 2 hidden layers (64-100 neurons); ReLU
                         activation function; 0.2 dropout between hidden layers; SGD optimiser;
                         MSE loss function; NN batch size = 128; epochs = 800; learning rate = 0.02
    Federated learning:  synchronisation on each 2nd batch; NN batch size proportional to the size
                         of each provider's dataset; sum of all providers' NN batch sizes = 128

5 EXPERIMENTS

5.1 Alternative Predictors

We identified the best ML prediction model (Table 2). For a single data provider (centralised approach), the best results were obtained by the XGBoost and random forest methods (.00097 and .0010). Conventional regression methods such as linear, Lasso, and Ridge regression provided the same inaccurate results. Deep learning exhibited a slightly lower performance than the best models. The application of federated deep learning was impossible because of the single data owner.

Table 2: MSE of travel time prediction with different ML methods.

    Model                              Number of data providers
                                       1        4       8       16      32
    Linear, Lasso, Ridge regression    .0019    .0019   .0019   .0020   .0020
    XGBoost                            .00097   .0011   .0012   .0012   .0013
    Random forest                      .0010    .0011   .0012   .0012   .0013
    Deep learning                      .0011    .0012   .0013   .0014   .0015
    Federated deep learning            —        .0011   .0011   .0011   .0011

Our next task was to determine under which conditions federated deep learning is effective. Despite the fact that the XGBoost and random forest methods provided the most accurate results for the centralised approach, federated learning can be implemented only on parametric models such as deep learning. Unfortunately, it was unknown which data belonged to which provider. Thus, we randomly distributed the data among the providers, which led to the assumption of identically distributed and equally sized local datasets; in federated learning, this is often not true. We then executed the various ML models locally on each provider without synchronisation. Finally, we analysed after which point the distributed, non-synchronised models lose their accuracy and federated learning outperforms the other locally executed ML methods.

The remaining columns of Table 2 show the MSE of travel time prediction on distributed data. The average accuracy of all models except federated deep learning was reduced. The federated approach led to the same result as the centralised deep learning. Therefore, the accuracy of the federated model was comparable to those of XGBoost and random forest. Starting at eight data providers, federated learning became beneficial because it does not lose its accuracy; with more data providers, its benefits became more evident. The MSE of the federated model's prediction remained constant at .0011.

Another important parameter that influences the accuracy of the federated model is the batch size. In our centralised deep learning model, the optimal solution was obtained with a batch size of 128 or smaller. The experiments show that, as the batch size increases, the computing speed increases but the accuracy of the model decreases. This implies that, to obtain the same accuracy with federated learning, we have to distribute the batches proportionally among the providers. Accordingly, with eight data providers with equally sized datasets, the batch size of each provider will be 128/8 = 16.
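For illustration, the XGBoost configuration of Table 1 could be instantiated with the scikit-learn wrapper of the xgboost package as sketched below. Mapping the reported α to the L1 term reg_alpha is our assumption, as is the use of cross_val_score for the 5-fold comparison.

```python
from xgboost import XGBRegressor
from sklearn.model_selection import cross_val_score

model = XGBRegressor(
    colsample_bytree=0.7,  # fraction of features sampled per tree (Table 1)
    learning_rate=0.12,
    max_depth=9,
    reg_alpha=15,          # assumed to be the alpha of Table 1
    n_estimators=570,
)
# 5-fold cross-validated MSE on the normalised data (X, y assumed prepared):
# mse = -cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error").mean()
```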

5.2 Synchronisation of Models

We investigated the effect of the synchronisation frequency on the accuracy of the federated model. The accuracy decreased with a step-wise decrease in the synchronisation frequency (Table 3), and we investigated this tendency.


With synchronisation performed in each batch or even each second batch, the accuracy remained the same as that of the centralised approach. With rarer synchronisation, the accuracy decreased.

Table 3: MSE of travel time prediction for different synchronisation frequencies.

    Synchronisation each n-th batch    Av. MSE for eight providers
    1, 2                               .0011
    3, 4                               .0012
    5                                  .0013

5.3 Explaining the Federated Model

In this section, we compare the variable importance results for the local, federated, and centralised approaches. We chose the Integrated Gradients explainability method as the explainability scoring algorithm because of its simplicity and speed. Moreover, the baseline in this algorithm was taken equal to the average value of each feature.

Figure 4 contains the variable importance calculated with the federated model for each of the eight data providers locally using their test data. Although the main tendency in variable importance remains the same for all eight providers, the locally obtained results differ from the importance scores calculated with all test data. This may lead to inaccurate explanations by some local providers, especially with small test set sizes. The proposed importance score averaging mechanism (Algorithm 1) allows this inaccuracy to be avoided without transmitting the local test sets.

Figure 4: Explainability of individual models.

Figure 5 presents the variable importances calculated for the centralised and federated learning approaches using aggregated test datasets (centralised) or an aggregation of scores (federated), which led to the same results. We observe that, without raw data transfer, our approach allows a more accurate calculation of variable importance than each provider can obtain using only its local test set.

Figure 5: Explainability of the federated vs the centralised approach.

Our next task was to investigate which parameters have the biggest influence on the results. According to Figures 4 and 5, the most important variable for all the models was the FCD distance, which was expected. The next most important variables were the zones' coordinates and the sine and cosine of the travelling hour and the day of the week. The division of the map into zones improved the predictions. However, contrary to our expectations, almost all weather parameters do not significantly influence the predictions.
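On the participant side, the local attribution step of Algorithm 1 can be sketched with captum as follows. The feature-wise mean baseline follows the description above; the function name and the absence of a target index (a single-output regression network) are assumptions of this sketch.

```python
import torch
from captum.attr import IntegratedGradients

def local_attributions(model: torch.nn.Module, X_test: torch.Tensor) -> torch.Tensor:
    """Integrated Gradients w.r.t. the inputs, using the feature-wise average
    of the local test set as the baseline, averaged over all test instances."""
    ig = IntegratedGradients(model)
    baseline = X_test.mean(dim=0, keepdim=True)   # average value of each feature
    attributions = ig.attribute(X_test, baselines=baseline)
    return attributions.mean(dim=0)               # sc_i*: one score per feature
```

The resulting vector is what each provider sends to the server for the weighted aggregation of Algorithm 1.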


6 CONCLUSION

We analysed Brunswick taxi FCD trajectories for travel time prediction with different prediction methods. We identified XGBoost as the best prediction model for the centralised approach, which predicted the corresponding travel time with an MSE of .00097. In the case of distributed data providers, starting at eight providers with equal distributions of data sources, the federated approach outperformed the XGBoost and random forest methods when they were executed locally by each data provider. With the synchronisation executed on each second batch, the corresponding federated deep learning model did not lose accuracy compared to the centralised model. We proposed an approach to explain the horizontal federated learning model with explainability methods without transferring raw data to the central server. This enabled a more accurate determination of the most important prediction variables of the federated model.

We illustrated the considered approach on a travel time prediction problem using a horizontal federated learning architecture. For simplicity of calculation, in our experiments all providers' local datasets were of the same size and the data were identically distributed across all datasets, which is often not true in real-world problems. We plan to consider unequally sized and non-identically distributed datasets in our future experiments. We assume that the participants with smaller datasets will benefit more from the accurate prediction model. A particularly beneficial case could arise if each taxi's dataset is confidential and should be processed locally (self-interested ride providers), as in (Ramanan et al., 2020).

Although the proposed explainable federated approach does not depend on the explainability method, only one explainability method (Integrated Gradients) was analysed. We plan to compare different explainability methods and their combinations in future work. We believe that this approach can be successfully implemented for more complex prediction and classification tasks with more complex deep learning and federated learning architectures. For example, we aim to implement explainable federated learning for traffic demand forecasting models.

ACKNOWLEDGEMENTS

The research was funded by the Lower Saxony Ministry of Science and Culture under grant number ZN3493 within the Lower Saxony "Vorab" of the Volkswagen Foundation and supported by the Center for Digital Innovations (ZDIN).

REFERENCES

Al-Abbasi, A. O., Ghosh, A., and Aggarwal, V. (2019). DeepPool: Distributed model-free algorithm for ride-sharing using deep reinforcement learning. IEEE Transactions on Intelligent Transportation Systems, 20(12):4714-4727.

Ancona, M., Ceolini, E., Öztireli, C., and Gross, M. (2018). Towards better understanding of gradient-based attribution methods for deep neural networks. In Proc. of the 6th Int. Conf. on Learning Representations, ICLR.

Bonawitz, K., Eichner, H., Grieskamp, W., Huba, D., Ingerman, A., Ivanov, V., Kiddon, C., Konecný, J., Mazzocchi, S., McMahan, H. B., Overveldt, T. V., Petrou, D., Ramage, D., and Roselander, J. (2019). Towards federated learning at scale: System design. CoRR, abs/1902.01046.

Ciskowski, P., Drzewinski, G., Bazan, M., and Janiczek, T. (2018). Estimation of travel time in the city using neural networks trained with simulated urban traffic data. In DepCoS-RELCOMEX 2018, volume 761 of Advances in Intelligent Systems and Computing, pages 121-134. Springer.

Dotoli, M., Zgaya, H., Russo, C., and Hammadi, S. (2017). A multi-agent advanced traveler information system for optimal trip planning in a co-modal framework. IEEE Transactions on Intelligent Transportation Systems, 18(9):2397-2412.

Fiosina, J. and Fiosins, M. (2014). Resampling based modelling of individual routing preferences in a distributed traffic network. Int. Journal of AI, 12(1):79-103.

Fiosina, J. and Fiosins, M. (2017). Distributed nonparametric and semiparametric regression on Spark for big data forecasting. Applied Comp. Int. and Soft Computing, 2017:13.

Fiosina, J., Fiosins, M., and Bonn, S. (2019). Explainable deep learning for augmentation of sRNA expression profiles. Journal of Computational Biology. (to appear).

Fiosina, J., Fiosins, M., and Müller, J. (2013a). Big data processing and mining for next generation intelligent transportation systems. Jurnal Teknologi, 63(3).

Fiosina, J., Fiosins, M., and Müller, J. (2013b). Decentralised cooperative agent-based clustering in intelligent traffic clouds. In Proc. of the 11th German Conf. on MAS Technologies, volume 8076 of LNAI, pages 59-72. Springer.

Holzinger, A. (2018). Explainable AI (ex-AI). Informatik Spektrum, 41:138-143.

Khair, Y., Dennai, A., and Elmir, Y. (2021). A survey on cloud-based intelligent transportation systems, ICAIRES 2020. In AI and Renewables Towards an Energy Transition, volume 174 of LNNS. Springer.

Konecný, J., McMahan, H. B., Ramage, D., and Richtárik, P. (2016). Federated optimization: Distributed machine learning for on-device intelligence. CoRR, abs/1610.02527.

Kraus, S., Azaria, A., Fiosina, J., Greve, M., Hazon, N., Kolbe, L., Lembcke, T., Müller, J., Schleibaum, S., and Vollrath, M. (2020). AI for explaining decisions in multi-agent environments. In AAAI-2020. AAAI Press.

Molnar, C. (2020). Interpretable Machine Learning. A Guide for Making Black Box Models Explainable. Lulu.com.

Ramanan, P., Nakayama, K., and Sharma, R. (2020). BAFFLE: Blockchain based aggregator free federated learning. CoRR, abs/1909.07452.

Ryffel, T., Trask, A., Dahl, M., Wagner, B., Mancuso, J., Rueckert, D., and Passerat-Palmbach, J. (2018). A generic framework for privacy preserving deep learning. CoRR, abs/1811.04017.

Shrikumar, A., Greenside, P., and Kundaje, A. (2017). Learning important features through propagating activation differences. In Proc. of ML Research, volume 70, pages 3145-3153, Int. Convention Centre, Sydney, Australia. PMLR.

Sundararajan, M., Taly, A., and Yan, Q. (2017). Axiomatic attribution for deep networks. In Proc. of the 34th Int. Conf. on ML, volume 70 of ICML'17, pages 3319-3328. JMLR.org.

Wang, G. (2019). Interpret federated learning with Shapley values. CoRR, abs/1905.04519.

Yang, Q., Liu, Y., Chen, T., and Tong, Y. (2019). Federated machine learning: Concept and applications. ACM Trans. Intell. Syst. Technol., 10(2).
