
RESULT PAPER

Artificial Intelligence (AI) in Security Aspects of Industrie 4.0

Imprint

Publisher:
Federal Ministry for Economic Affairs and Energy (BMWi)
Public Relations Division
11019 Berlin
www.bmwi.de

Editorial responsibility:
Plattform Industrie 4.0
Bertolt-Brecht-Platz 3
10117 Berlin

Design: PRpetuum GmbH

Status: February 2019

Image credits: Gorodenkoff – Fotolia (title), ipopba – iStockphoto (p. 5, p. 6), matejmo – iStockphoto (p. 13, p. 17), monsitj – iStockphoto (p. 23)

You can obtain this and other brochures from:
Federal Ministry for Economic Affairs and Energy (BMWi)
Public Relations
Email: [email protected]
www.bmwi.de

Central ordering service:
Tel.: +49 30 182722721
Fax: +49 30 18102722721

This brochure is published as part of the public relations work of the Federal Ministry for Economic Affairs and Energy. It is distributed free of charge and is not intended for sale. The distribution of this brochure at campaign events or at information stands run by political parties is prohibited, and political party-related information or advertising shall not be inserted in, printed on, or affixed to this publication.


Contents

Introduction
Management Summary
1. Artificial Intelligence: Definition and Categories
1.1 Historical development – Phases of artificial intelligence
1.2 Current methods and fields of use of machine learning
1.3 Visions about future artificial intelligence and defining the relevant industrial areas
1.4 Core security challenges
2. AI Assistance for Security Concepts
2.1 Identification and authentication procedures with AI assistance
2.2 Detection of anomalies in data streams
2.3 Detection of malicious software
3. New Attack Vectors Using AI and Mechanisms of Defence
3.1 Use of AI for attacks
3.1.1 Cyberattacks on office IT
3.1.2 Cyberattacks on production OT
3.1.3 Cyberattacks on the AI system used
3.2 Use of GAN technologies for deliberately bypassing security systems
4. Outlook
4.1 Concluding remarks
4.2 Recommended actions
5. References
6. Appendix
6.1 Example: Border controls using AI
6.2 Explanation of ‘Learning to protect communications with adversarial neural cryptography’
Authors

Introduction

This paper addresses the area of IT security in Industrie 4.0 in particular and not IT security in general. It presents the current state of applications of artificial intelligence. It does not discuss those concepts of artificial intelligence that have become outdated and have therefore lost their relevance. Neither does it speculate about futuristic scenarios that do not yet exist in the field of artificial intelligence today. In other words, it is not a science fiction report but a presentation of what exists today and of its developmental potential. It presents the subject matter in enough technical detail to enable the reader to understand at least the theoretical workings of modern AI concepts and their impact on security systems. The authors describe how a generative adversarial network (GAN) functions, what role intrusion detection systems (IDS) and intrusion prevention systems (IPS) play in the area of security, and how these GANs are deliberately being used to deceive the latter systems.

Management Summary

This paper discusses the current state of knowledge regarding the topic of artificial intelligence (AI) in security aspects of Industrie 4.0 in the early part of 2019. It is directed at those specialists and decision-makers in the industrial and political spheres who want to develop a basic understanding of AI technologies and how they can be applied in the securing of networked systems. It illustrates that although the dynamic developments taking place in AI in this area are capable of generating a diversity of economic usages, they are also a source of new security risks for which new defence strategies must be developed.

The economic relevance of artificial intelligence today is almost exclusively in the area of machine learning (ML). This paper will therefore focus on this area of artificial intelligence. All of the technologies discussed in this paper are components of machine learning. For a clearer understanding of the subject matter presented here, a short history of artificial intelligence is given, starting with the time the phrase was coined at the Dartmouth Workshop. This is followed by short descriptions of the methods of the three most important categories of ML technologies today, namely supervised learning, unsupervised learning, and reinforcement learning. The emphasis here is on the basic concepts underlying modern neural network technologies, which characterise the area of machine learning today.

The next section contains a brief summary of the central security requirements of Industrie 4.0. Here too the goal is to elucidate the subject area in order to avoid misunderstandings. Discussions about industrial IT security are often based on mistaken understandings of the difference between security and safety, between industrial markets and other vertical markets, and between business matters and consumer matters.

Chapter 2 discusses how artificial intelligence is used to assist security concepts. AI-assisted identification and authentication procedures are explained. Technologies that have become commonplace in such things as border controls and access controls are still not found in many industrial applications today. Another important topic is the detection of anomalies in data streams, i.e. a kind of monitoring of the overall healthiness of the network activities of companies and business associations.

AI can also help detect malware if it has not yet been able to take effect. Different kinds of pattern recognition can help detect and ward off risks at early points in time. The application of such technologies is obviously highly specialised work that can normally only be performed with the help of knowledgeable and experienced software and service providers. The intention of this paper is simply to point out the technologies that are currently available on the market today.

Chapter 3 discusses the darker sides of AI in security aspects. Cyberattacks on commercial enterprises are carried out for a variety of reasons. They are often referred to under the three headings of industrial espionage, industrial sabotage, and data theft. The use of artificial intelligence in such attacks has unfortunately experienced an especially dynamic upturn in the recent past, i.e. since around 2013. This paper distinguishes between cyberattacks on office IT, on production OT, and the special ways that attacks can be made on AI systems themselves. It has become clear that unchecked automation is a new source of risk in the security area as well and that the elimination of the human factor is something that must be planned very carefully. If it is not, then something that was meant to save costs can lead to serious financial loss.

The second part of Chapter 3 is devoted to an entirely new security risk. This risk stems from a form of neural network technology that was first created in 2014 in the area of unsupervised learning. An article was published in September 2018 in which it was clearly demonstrated that this technology has the potential to at least partially disable the AI-based security mechanisms of malware detection. It is called generative adversarial network technology, or simply GAN technology. Its actual purpose is, in a game-theoretic equilibrium between two neural networks, to generate new data that can no longer be distinguished, via the pattern-recognition function of the second network, from the original samples in the training dataset. And if an intrusion detection system (IDS/IPS) is understood as one of these networks, or if one of the networks comes under the control of the other network, then the second network can generate malware that has the same disastrous effects as other malware from a training dataset but that is not classified as harmful by the IDS. And because it is not classified as harmful, no defence mechanisms are activated by the IDS. The persons responsible for security in business enterprises should keep abreast of this completely new form of risk.

Chapter 4 contains concluding remarks and forward-looking statements. Stakeholders in the industrial sector, i.e. manufacturers, integrators, and operators, and decision-makers in the political sphere are given recommendations on the measures that can be taken.

This paper is the product of the work of a new subgroup of Working Group 3 of Plattform Industrie 4.0 (Security of Networked Systems) devoted to the topic of AI. The authors are of the opinion that the topic of artificial intelligence in the industrial sector, and therefore in the area of industrial IT security as well, is going to be around for a long time and that this paper will have to be updated at short intervals. We also would like to emphasise that this publication must be understood as part of a bigger project of Plattform Industrie 4.0 that deals with the relevance of AI for Industrie 4.0 as a whole. This project is organised by the working group Technological and Application Scenarios. Our contribution focuses on security aspects and from this perspective is meant to represent a partial aspect of the general discussion.

1. Artificial Intelligence: Definition and Categories

1.1 Historical development – Phases of artificial intelligence

There is no commonly accepted definition of ‘intelligence’. What we have at best are negative statements of what ‘intelligence’ is not. The expression ‘artificial intelligence’ was coined by McCarthy, one of the participants of the legendary workshop on AI at Dartmouth College in New Hampshire in 1956, which today is regarded as the dawn of the era of AI. Artificial intelligence at that time was primarily understood as a kind of mechanical reasoning ability. During the early heydays of research activities, the languages LISP and PROLOG were created, as was the concept of the perceptron, which was seen as the core of the replication of the biological brain. Once it became clear in the early 1970s that the expectations of what AI was capable of achieving in the short term had been far too high, enthusiasm for the topic faded rapidly, as did the majority of state financing for research on it. McCarthy wrote that it would take not 10 years for machines to acquire simple human skills but rather 500, due to the technological breakthrough needed in the area of mechanical processing power. The first ‘AI winter’ had begun. Financial sources dried up and scientific journals categorically refused to publish anything on the topic of AI.

In 1980 a new ‘AI spring’ began, albeit with a new positioning. This time there were hopes of generating interest in a combination of a basic form of the reasoning capacity and a pool of expert knowledge in specific topics. These ‘expert systems’, which were to be able to accumulate more knowledge than any group of individual persons, were the central concept of artificial intelligence at that time. But this budding stage ended in disappointment as well, and the second ‘AI winter’ began, which lasted from the end of the 1980s to the beginning of the 1990s. Because this new market of expert systems was unable to live up to the commercial expectations that had been placed on it, new companies collapsed. But despite this, a variety of concepts outside the actual focus of this phase were developed further in significant ways. One in particular was the creation of a concept of multilayer networks of perceptrons. In 1986, Rumelhart, Hinton, and Williams applied the well-known operation from the 1960s, i.e. the backpropagation of errors of estimation for parameter optimisation, to multilayer neural networks according to established methods of numerical mathematics and thereby founded the concept of deep learning [1]. With a new type of neural network known as the convolutional neural network (CNN) for spatially oriented (image) data (which was largely developed by LeCun in 1989), it became possible to recognise handwritten postal codes, and with this came the automation of postal sorting in the USA [2]: it was a sensation. The way was open for the next spring of AI.

The third phase of artificial intelligence began in the mid-1990s. This phase has lasted to the present day and will presumably continue for a long time to come. It is based on ripened and realistic expectations of how AI can be used, and it has one dominant focus: machine learning (ML). Machine learning is concerned with finding solutions to a broad spectrum of real and economically relevant problems. It uses the construction of approximation functions with parameter constellations of almost any degree of complexity, whose values are derived from sufficiently large quantities of sample data. In stark contrast to the orientations of artificial intelligence in the first two phases, there are special areas today where the mechanical cognitive skills of ML are already superior to human capacities, a superiority proven in a wide array of objective comparisons.

Applications that use AI-based algorithms to recognise patterns in data are commonplace occurrences for billions of people today in the age of the mobile Internet. They are one of the main factors in the success of many businesses. Today’s AI stems in part from the unsuccessful efforts of earlier phases. But it is the exponential increase of computational capacities in IT systems of all kinds – from smartphones right through to online mega-datacentres – coupled with the availability of these at extremely reduced prices that is responsible for its success. The technological breakthrough that McCarthy expected to take another 500 years has become a reality much earlier than expected. Computing speed has increased a millionfold. And in the IT industry, the race is now on to increase machine capacities as fast as possible. Application-specific computer technology will increase the performance of systems engineering for AI by an additional three-digit factor within the next five years, and at today’s prices. The main objective of artificial intelligence today and in the foreseeable future is, and will be, the defining of the parameters of complex approximations based on principles of probability theory. Through this, the estimated allocation of new observations to defined categories or value areas will become increasingly precise, reliable, fast, and cheap.

1.2 Current methods and fields of use of machine learning

Machine learning today comprises three areas: supervised learning, unsupervised learning, and reinforcement learning. Supervised learning is by far the most commercially relevant discipline. Algorithms that accord with the principle of supervised learning are based on approximation functions with deliberately chosen nonlinearities and large numbers of parameters that are configured in application-specific architectures. They require sufficiently large quantities of data with labelled content – for example RGB raster images with an alphanumeric labelling of the particular theme of the image’s content in relation to predefined categories (e.g. 320 x 320 JPG/’common blackbird’). In a learning process, the system then processes the input data on the basis of the currently existing parameter values and compares the result with the label. Based on the deviations ascertained by this, the parameters are then adapted.

In contrast to the concepts and goals in the earlier phases of artificial intelligence, machine learning is not primarily concerned with the recognition of input data. It is instead concerned with the generalising of the learning content from new and not yet known data that fit into the classes specified in the training. For example, the system learns what an Alsatian dog looks like and not only what the animals in the related training images look like. It is not the individual animal that is being recognised but its membership in a particular class. The system makes a prediction in the form of: ‘With an 87.3% probability, this is an image of an Alsatian dog.’
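
The training loop described above can be made concrete with a few lines of Python (a minimal sketch using only numpy; the data, feature dimensions, and learning rate are invented for illustration): the system predicts with its current parameters, compares the prediction with the label, and adapts the parameters in proportion to the deviation.

    import numpy as np

    # Toy labelled dataset: each row is a feature vector, each label is 0 or 1.
    rng = np.random.default_rng(0)
    features = rng.normal(size=(200, 4))
    labels = (features[:, 0] + 0.5 * features[:, 1] > 0).astype(float)

    weights = np.zeros(4)
    bias = 0.0
    learning_rate = 0.1

    def predict(x):
        # Logistic output: an estimated probability of belonging to class 1.
        return 1.0 / (1.0 + np.exp(-(x @ weights + bias)))

    for epoch in range(100):
        prediction = predict(features)
        deviation = prediction - labels                  # compare the result with the label
        weights -= learning_rate * features.T @ deviation / len(labels)   # adapt the parameters
        bias -= learning_rate * deviation.mean()

    sample = np.array([1.2, 0.4, -0.3, 0.0])
    print(f"Estimated class-1 probability: {predict(sample):.1%}")

The same principle of predicting, comparing with the label, and adjusting the parameters carries over to deep networks with millions of parameters.
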
Supervised learning is divided into different procedures for different types of data: continuous as opposed to discrete data, spatial data (patterns, images), and sequential data (language/speech, text, audio/video). The analysis of continuous data using stochastic methods is traceable to the work of Gauss in 1801 and is known under the name ‘linear regression’. It is not suited for dealing with discrete problems. Such tasks are concerned with the classification of observations in classes. McFadden and Heckman received the Nobel Prize in Economic Sciences in 2000 for their development of the methods of ‘logistic regression’, a principle of classification related to linear regression. From today’s perspective, this procedure can be seen as a borderline case of a neural network with no hidden layers, i.e. only the input and output layer.

The common kinds of neural networks for supervised learning are:

• multilayer perceptrons (MLPs) for data with unspecified structure,

• convolutional neural networks (CNNs) for 2D/3D pattern recognition, and

• recurrent neural networks (RNNs) for sequential data analyses.

An MLP is made up of layers of perceptrons that are fully connected with each other in a feed-forward way, from the input layer to the output layer. A network with only one hidden layer is referred to as flat; all other structures are ‘deep neural networks’. Increasing the number of parameters through wider layers (more perceptrons) or through additional layers (more depth) only generates better results if it is done using a special design in advanced network architectures. Although it is generally desirable to limit the number of parameters, there are now also many optimisation procedures for the learning process in all types of networks with which earlier problems are being overcome.

CNN was the first of such advanced architectures. Using a variation of the mathematical operation of convolution, an alternative to the fully connected layers of the MLP was created in the 1990s. It delivers significantly better results for spatial pattern recognition (images, videos, components from these) and allows for configurations with more layers. Remarkable advances in image recognition were made possible through deep learning: machines learned to see. The error rate in the ImageNet competition (1,000,000 samples of images from 1,000 categories, carried out annually from 2010 to 2017) was 28% at the beginning. The eight-layer CNN AlexNet of the University of Toronto reduced this value to 16% in 2012. All of the subsequent winners were CNNs, and three important variations were created (VGGNet, Inception, ResNet) with which the error rate was reduced to around 2% (the human rate in the same benchmark is around 5%). The number of layers grew to 152. CNNs are integrated today in practically all digital cameras, including smartphones.

RNNs make up a special group of architectures for handling sequential data (text, audio, language/speech, video) [3]. The two leading operations are long short-term memory (LSTM) and gated recurrent unit (GRU). LSTM was developed at the Technical University of Munich (Schmidhuber, Hochreiter) from the research work on overcoming the vanishing gradient problem, a problem that has to be overcome in all neural networks [4]. The adjusting of the parameters (weights) of the connection of one network layer to the next higher one is achieved through the gradual shifting of it out of the output layer in the direction of the gradient of the error. This is the steepest descent direction, so the shifting reduces the error in a very efficient way. In order to calculate the vector of the derivatives (of the gradient) of the error in the direction of the weights, the chain rule of differential calculus has to be applied to the lower layers. In this operation, products of very small numbers are created, which for deeper layers ‘disappear’ in comparison to the random initialisation of the weights. Their relative size is too small to alter the weights. The only results are round-off errors and no adjustment takes place. This is why in simple MLPs ‘learning’ often stops as early as the third layer. Particularly in cases involving sequential problems, for example when the relationships between words in lengthy texts have to be recognised, other architectures reach their limits. Almost every smartphone today contains apps with LSTM or GRU.
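
The effect of the chain rule described above can be illustrated numerically with a short Python sketch (the per-layer derivative of 0.1 is an assumed stand-in value, not taken from a real network):

    import numpy as np

    # Assume each layer contributes a derivative factor of about 0.1 via the chain rule.
    layer_derivative = 0.1
    gradient = 1.0
    for depth in range(1, 11):
        gradient *= layer_derivative
        print(f"gradient reaching layer {depth} below the output: {gradient:.1e}")

    # After a few layers the gradient is orders of magnitude smaller than a typical
    # randomly initialised weight (here assumed to be about 0.01), so the weight
    # update effectively vanishes and only round-off errors remain.
    typical_weight = 0.01
    print("update negligible:", gradient < 1e-6 * typical_weight)
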
From the computing load that was already generated by AI-related inferences in Google computing centres in 2014, it can be inferred that the capacities of cloud computing centres will foreseeably have to double in order to meet the world’s needs. In order to limit the costs needed for this, an application-specific additional processor (TPU, tensor processing unit) for standard servers was developed and produced [5]. The successor technology (TPU2), with which the learning process is also being accelerated, is currently being introduced. The composition of the inferences pursuant to which TPU1 was developed was based to 61% on MLPs (5, 20M), 29% on LSTMs (58, 52M), and 5% on CNNs (16, 8M) – the values in brackets show the most frequent number of layers and weights (of network parameters to be learned). Therefore, because of inferences alone, supervised learning is a massive burden on modern cloud processing centres.

In comparison to the above, the other branches of ML play a more subsidiary role in terms of their significance for the IT market. Unsupervised learning is based on data without labels. Its function is to structure this data according to similarities or to generate new data that fits into these structures. A variety of algorithms are used for structuring data. The most common one is the k-means algorithm, which has been successfully in use for a long time. It is a relatively simple algorithm that iteratively recognises clusters, the number of which is externally prescribed. For generating new data, a new algorithm was developed five years ago under the name GAN (generative adversarial network). Its significance grew rapidly and it has even given rise to a whole new class of methods [6]. A GAN generates a game-theoretic equilibrium between two adversarial AI systems. One is trained to distinguish ‘true’ data from ‘faked’ data. The other transforms noise into new data that is presented as ‘true’. The probability distribution for the construction of new data from the noise is then iteratively adjusted until the new ‘faked’ data can no longer be distinguished from the ‘true’ data.

Reinforcement learning (RL) stems from the methods of operations research (OR) carried out in the 1950s, which is also known under the name ‘dynamic programming’. The basic principle of this method rests on the Bellman equations, pursuant to which an optimal decision is reached in that the current state is converted into a tomorrow state of which it is already known that it will be optimal tomorrow. In this way, the path of decisions can be optimised iteratively based on the end perspective, and this also under the stochastic scenario of a Markov process. RL is therefore not concerned with the recognition, structuring, or creative enlargement of data records but with the generation of strategies and plans for attaining goals and the associated acquisition of cumulative or final rewards in the process sequence.

AI only comes into play in this connection when the volume of possible states and optional actions of such a process is too big to realistically be able to solve the problem using conventional mathematical methods. For example, the states and optional actions of the game Go comprise more elements than the planet Earth possesses atoms. One approach to solving these kinds of tasks is to deal with the actual problem using the approximation methods available to technologies for dealing with neural networks. ‘Q-learning’ is a method with which the application of the established OR methods of ‘value iteration’ and ‘policy iteration’ can be extended to problems of greater complexity.
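
As an illustration of how the Bellman principle is turned into Q-learning in practice, the following Python sketch performs the standard tabular Q-learning update on a tiny, invented decision process (states, actions, rewards, and all parameters are made up for the example):

    import numpy as np

    n_states, n_actions = 5, 2
    q_table = np.zeros((n_states, n_actions))
    alpha, gamma = 0.1, 0.9            # learning rate and discount factor

    def step(state, action):
        """Toy environment: action 1 moves towards the goal state 4, which pays a reward."""
        next_state = min(state + action, n_states - 1)
        reward = 1.0 if next_state == n_states - 1 else 0.0
        return next_state, reward

    rng = np.random.default_rng(1)
    for episode in range(500):
        state = 0
        while state != n_states - 1:
            # Mostly act greedily, but explore occasionally.
            action = rng.integers(n_actions) if rng.random() < 0.2 else int(q_table[state].argmax())
            next_state, reward = step(state, action)
            # Bellman update: today's value moves towards the reward plus the value
            # of the best state reachable tomorrow.
            q_table[state, action] += alpha * (reward + gamma * q_table[next_state].max()
                                               - q_table[state, action])
            state = next_state

    print("Learned policy per state:", q_table.argmax(axis=1))
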

In addition to this, RL is an area of extremely advanced methods for problems with special characteristics, which were first created within reinforcement learning.

1.3 Visions about future artificial intelligence and defining the relevant industrial areas

Industry up to now has regarded AI primarily as a new kind of technology to be used for automating simple routine tasks. It is already being used in this way in many occupational areas, and new applications are coming into use all the time. But in many areas, AI has achieved performance levels that far exceed those of human beings. There are new areas in which machines are not only working more cost-efficiently but are also reaching new levels of quality in intellectual tasks. Deep neural networks with sometimes up to nine-digit parameter figures are able to recognise complex patterns in high-definition images not only faster than human beings but also with more precision in some applications. Even the performance of goal-oriented tasks is an area that tests have shown will eventually be taken over by intelligent machines. With reinforcement learning, AlphaGo crushingly defeated every Go world champion in three tournaments. In only four hours of training time, AlphaZero also crushingly defeated the world’s leading chess computer system. Since 1997, chess has been regarded as a strategy game that human beings can no longer win against the machines. With machine learning, modern AI now even wins against the leading machines programmed by human beings.

New fields of application are coming into existence in all areas where ‘superhuman skills’ are being created through AI: in industry, private life, administration, research, the military, and other areas.

But the vision that AI has had from the beginning is conceptually much wider. AI is still unable to process context-related problems if the bounds of a single database controllable with LSTM have been exceeded. The ability to do this would require machines that could create, at least in a specific area, their own world view representative of such a context. For example, today’s machine translation is basically just a transfer of vocabulary. Even making the correct correlation to the pronoun ‘it’ in an English sentence is an insolvable task if contextual knowledge is needed. ‘The trophy did not fit into the suitcase – it was too large.’ To make the correct correlation here, you have to know that small things fit into large things. The machine translator would choose the same correlation if the second half of the sentence read ‘it was too small’. One of the sentences would be translated incorrectly. The contextual meaning is the same.

The absence of this ability can be observed with small children. They do not have to look at 5000 pictures of cats to be able to recognise cats. The human learning algorithm seems in fact to be more ‘self-supervised’ in nature. The brain repeats the learning steps as many times as needed until the concept has been understood and then turns its attention to other interesting aspects of the world and continues on with the learning process. Although it is uncertain when such a learning technique will be able to be created, there is no reason to doubt that this developmental level of AI will in fact be reached. At that point in time there will be two kinds of AI: statistical parameter learning within today’s contextual boundaries and ‘general’ AI (also referred to as ‘strong’ AI).

Some features of general AI would be interesting for continuous learning – the self-driving automobile would not repeat the same accident twice. This will also be the phase of AI where ethical questions will become very important. For industrial applications, however, general AI is of little help. What the industrial area strives for is a controlled environment. No one is interested in what operations the machines are learning in a ‘self-supervised’ way or what new discoveries they will surprise the company with tomorrow.

General AI is more a topic for philosophical discussions or for the realm of science fiction. It also lends itself to research activities, the topic itself being so interesting. A public discussion is currently underway about what is referred to as ‘technological singularity’ – whereby the world will be taken over by artificial intelligence and human beings will no longer be able to understand this world because it goes beyond their intellectual capacities. Such issues are not the subject matter of this article.

Along with AI applications for actual geometric product data and tolerable error deviations from technical control points, AI algorithms could also be used for such things as the automated detection of the most minute material defects in profiled sheeting, especially if today’s quality expectations continue to get higher. The same applies to the automated detection of processing defects, for example in pressure die-cast parts.

The use of AI in the industrial context should be approached with a sound sense of reality. Experience in the area of highly automated use in the industrial setting has shown [7] that AI applications in production processes should take place under the control of ‘human intelligence’ according to the ‘human in the loop’ principle, for example for ensuring that human operators are able to intervene at all times. Human knowledge and human experience are still regarded as crucial factors for success.

A good example of this is a case where assembly-line robots had been in use for welding certain bottom plates in automobiles. The robots were later removed and the work once again carried out successfully by skilled human workers after it had been discovered that the welding seams produced by purely mechanical means did not achieve the exact, millimetre precision required. During this special process, human beings are very quickly able to recognise and immediately correct the tiniest deviations in welding seams.

The interesting point here was that a production line, once it had been configured to perform as a fully automated process, remained at the same developmental level. Neither robots nor AI can improve existing processes. New developments and improvements can only be achieved by human beings, which is why they should always be the focal point of such tasks.

What also came to light in the course of the conversion of the aforementioned production process was that the knowledge of the production processes crucial to the company’s success was seriously lacking throughout the whole company. The workers had not been provided with sufficient opportunities to integrate the knowledge they needed about the automated processes. A complicated search for the relevant experts across a number of continents then had to be undertaken. These experts then functioned as the nucleus for the transfer of the knowledge needed, and it was only through them that the comeback of the production specialists was possible.

Any meaningful automation of industrial processes is only possible through the interaction between AI systems and human, interdisciplinary expertise.

1.4 Core security challenges

The ways in which malicious attacks can be carried out on industrial systems are increasing. One undeniable reason for this is the increased networking in conjunction with Industrie 4.0. Formerly isolated facilities are now being connected via communication networks across national borders, and cooperation activities along supply chains are increasingly becoming automated. This creates additional targets for attacks, which means a much larger area in which vulnerabilities can be found and in which value chains and the businesses associated with them can be invaded and harmed. Potential attackers are also becoming more skilful through the wide availability of attack technologies and know-how. Attacks can even be commercially ordered from criminal providers in the Darknet.

The expansion of networked communication via a large number of participating devices in a variety of security domains (within a value-added network) requires a great deal of trust between the cooperation partners. The inclusion of business partners in automated processes creates a challenge for the coordination of security processes. For instance, attacks using social engineering must be prevented similarly and comprehensively throughout the entire security domains. Weak security structures of one of the participating partners automatically endanger the security of all the others.

Dynamic and flexible Industrie 4.0 architectures cannot be fully defined, as their configuration is able to adapt to the requirements. This is a case where AI could be applied. Examples of such cases include AI-assisted, self-organising warehouse management, or the dynamic modification of production process chains in accordance with the capacity utilisation of machines or with production costs. These require agile and learning-capable security solutions. A static detection of anomalies that, for example, works on the basis of learned production processes would often issue incorrect error warnings in the above cases due to the fact that their workflows can fundamentally change within short periods of time. In such cases, additional metadata has to be included in the training process. The underlying learning processes must themselves also be protected from manipulation. A manipulation of the learning process in the above examples would enable attackers to manipulate the production process chain in such a way that the production of products would no longer be possible.

The required security architectures and mechanisms must also be able to keep abreast of the long life-cycles of production equipment and machinery. Because the skills of the attackers are also improving in line with the technological developments, things such as cryptographic-based security mechanisms may need to be adapted (e.g. through patches) to the actual changing security requirements.

In addition to the physical implementation of an Industrie 4.0 production environment, there is also its so-called ‘digital twin’¹ (see the Glossary of Plattform Industrie 4.0). A digital twin contains in principle the entire data associated with the production and the product. Production processes are planned, simulated, controlled, and also monitored in the digital twin. Many of these functions are performed on platforms (clouds, office IT) that are physically separate from the production facilities. The two worlds communicate constantly with each other and they must have the same security levels in order to prevent the compromising of the overall system through successful attacks. Such a security architecture must take into account both machine-machine communication as well as human-machine communication in equal measure, which places higher demands on the complexity and requirements of the security mechanisms.

In order to understand such potential communication problems, one should for example be aware of the fact that human sense organs do not work with the same data formats and degrees of precision that sensors and processing units of machines work with. Human beings generally see images in 3 x 8-bit RGB or in 8-bit greyscale. A neural network learns with 16- or 32-bit floating-point arithmetic. By adding noise to the more exact format, images can be generated that look identical to human beings when viewed on computer screens or in print but that machines see as radically different. In this way, attack scenarios can be created that are very difficult to detect in the separate human and machine processes.
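
A minimal Python sketch of this effect (with invented values): noise far below the 8-bit quantisation step is invisible once the image is rendered for a human viewer, yet it changes the floating-point input that the neural network actually processes.

    import numpy as np

    rng = np.random.default_rng(42)
    image_8bit = rng.integers(0, 256, size=(8, 8), dtype=np.uint8)    # what a human is shown

    # The network works on a floating-point representation of the same image.
    image_float = image_8bit.astype(np.float32) / 255.0
    noise = rng.normal(scale=5e-4, size=image_float.shape).astype(np.float32)
    perturbed_float = np.clip(image_float + noise, 0.0, 1.0)

    # Rendered back to 8 bits for display or print, the perturbation disappears almost everywhere ...
    rendered = np.round(perturbed_float * 255).astype(np.uint8)
    print("pixels that changed for the human viewer:", int((rendered != image_8bit).sum()))

    # ... but the floating-point model input has changed in nearly every pixel.
    print("pixels that changed for the model:", int((perturbed_float != image_float).sum()))
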
All of the threats known to Industrie 3.x are relevant for Industrie 4.0 as well. Instances of massive cyberattacks on a regular basis aimed at industrial espionage, industrial sabotage, and data theft have been identified there, as shown in the 2018 report of the Federal Office for Information Security [8] and in the survey of the members of the Mechanical Engineering Industry Association (VDMA) at the end of 2017 [9]. Along with malware, ransomware, and DDoS, the report of the Federal Office for Information Security also lists malicious software that enables special attacks on industrial control systems (ICS). According to the VDMA study, one third of cyberattacks in 2017 resulted in production and operational shutdowns. At the heart of these are often monetary interests, as the NotPetya case has shown.

Additional information on the aforementioned challenges is available in the following publications of Plattform Industrie 4.0:

• Secure Communication for Industrie 4.0 [10]

• Secure cross-company communication [11]

• Secure identities [12]

In order to meet these new challenges, additional or more extensive security measures are needed. These must focus on Industrie 4.0 components and on their internal security and external communication behaviours.

The issues involved in meeting these challenges are of an organisational as well as a technical nature.

An example of an organisational security challenge is the coordinating of across-the-board security measures within an Industrie 4.0 value-added network. Such a coordination can involve the collecting of the data needed for the security system from sources outside the confines of the company, including from parts of the Internet.

In the case of new security challenges of a technical nature, AI can be very beneficial for detecting and assessing anomalies within an Industrie 4.0 value-added network, for example. This is because many of the workflows within a value-added network are automated (i.e. without human intervention) and therefore are often very similar to each other. In other words, statistical variance declines and deviant behaviour becomes easier to detect. Another security challenge is created by the fact that the attackers of Industrie 4.0 infrastructures are also using AI, for example to develop suitable methods of analysing pass-through data once such an infrastructure has been invaded.

¹ www.plattform-i40.de/I40/Navigation/DE/Service/Glossar/Functions/glossar.html?cms_lv2=157706

2. AI Assistance for Security Concepts

Artificial intelligence is a basic technology that can be used to automate routine tasks and to increase speed and precision in processes. It can also contribute to the solving of those tasks for which no algorithmic solutions previously existed. Some forms of automatable routine work can be quite complex and require highly skilled expertise. Remarkable examples of pattern recognition are found in the medical field, the quality assurance area, system controls, and similar fields of application. In the security area, assistance systems can provide support to skilled personnel, can completely take over certain tasks, can improve the performance of processes, and with machine learning can open up task areas that were not yet accessible by programmed algorithms. AI therefore opens up new perspectives for overcoming the problem of a shortage of skilled labour and for improving protection.

2.1 Identification and authentication procedures with AI assistance

With the introduction of the electronic passport on 1 November 2005, German travellers became acquainted with the use of biometric data for the identification of persons. The access systems in industrial facilities can be seen as analogous to these. Some companies, banks, and especially high-security institutions in Germany use access procedures based on biometric technologies, for example to get into a building or to access specific parts of a building. This can be done using a one-factor authentication: a central database, previous registration of the authorised person, and then a 1:n data comparison. Or it can be done using a two-factor authentication, such as possession of an identity card plus one individual feature, for example a fingerprint. The identity card in such cases is usually a pointer to a data record in a database and a 1:1 data matching is made.

For more than five years already, certain Japanese banks have been using fingerprint recognition for their automated teller machines in addition to entering a PIN. And beginning this year, some are even using facial recognition with an audio greeting of the customer. After the electronic debit card is inserted in the ATM, the customer can withdraw his money.

A supermarket chain in Hamburg conducted a pilot study at the cash registers pursuant to which a confirmation of the purchase was made using the customer’s fingerprint together with a direct debit authorisation. With the underlying AI system, this technology appears to be considerably faster than paying by cash, with a debit card, or with a smartphone.

Biometric technologies for identifying persons are based on photographs and automated comparisons with image data. The electronic image data record in the passport is compared with the photograph taken by the camera at the e-gate (automated border control gate). AI systems are used to compare the stored facial image and the photograph taken by the camera. It can also be compared with the photographs in a wanted file or with a ‘no-fly’ list that some countries issue in regularly updated forms.

In addition to such AI applications geared to individual personal data, behaviour-based models are also being used for AI systems. These include such things as a person’s gait or typing style on a computer keyboard or laptop. These clearly verify that something is a person and not a machine, and the authenticity of the person. Another example is computer-assisted handwriting recognition, for example on contracts or deeds. For training these AI systems, large data records are essential for the application and for high-quality recognition rates.

In the case of photographs and video recordings, AI-based algorithms could be used

(a) for recognising and localising faces and other features searched for in an image (matches are marked with coloured frames), and

(b) for identifying persons through the recognition of learned individual characteristics and features.
The analysis of recordings from video surveillance cameras can be accomplished within seconds. Although this is generally not a form of authentication usable for judicial evidentiary purposes, it can significantly increase the number of identification procedures carried out. Suspicious persons can be identified in large crowds of people, their behaviour can be assessed, and they can be linked to objects being carried.

In some large urban centres in Europe, both AI procedures are being used to surveil public areas and facilities, sometimes using several thousand video recording devices.
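
In simplified form, the 1:1 verification and 1:n identification procedures described in this section can be sketched in Python as follows (the embeddings, the similarity threshold, and the database are invented for illustration; a real system would obtain the feature vectors from a trained face or fingerprint recognition network):

    import numpy as np

    def similarity(a, b):
        """Cosine similarity between two biometric feature vectors (embeddings)."""
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    THRESHOLD = 0.9   # illustrative decision threshold

    rng = np.random.default_rng(7)
    database = {f"person_{i}": rng.normal(size=128) for i in range(1000)}   # enrolled templates

    live_capture = database["person_42"] + rng.normal(scale=0.05, size=128)  # e.g. camera at the e-gate

    # 1:1 verification: the identity card points to exactly one stored template.
    claimed_template = database["person_42"]
    print("1:1 match:", similarity(live_capture, claimed_template) > THRESHOLD)

    # 1:n identification: search the whole database for the best-matching template.
    best_id, best_score = max(((pid, similarity(live_capture, t)) for pid, t in database.items()),
                              key=lambda item: item[1])
    print("1:n best candidate:", best_id, "accepted:", best_score > THRESHOLD)
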

London and Stockholm, for example, use automated recording systems to record motor vehicle licence plates for the collection of city tolls from the drivers responsible for them. These systems are also based on artificial intelligence.

Transferring these ideas to Industrie 4.0, sensor data recorded by machines could be used, for example, to depict the individual characteristics of a machine and its operational state. Things such as noise and/or resonance measurements, temperature values in the operations, servo controls, path controls, and other data from production machines could make use of AI algorithms to calculate impending maintenance work or replacement dates. The three overall purposes for which AI algorithms can be used are:

• making predictions,

• planning preventative measures, and

• initiating measures.
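
As a simple illustration of the first of these purposes, the following Python sketch (all sensor values, the drift, and the maintenance limit are invented) fits a linear trend to simulated temperature readings and estimates when a maintenance threshold will be reached:

    import numpy as np

    # Simulated hourly bearing-temperature readings (in °C) drifting slowly upwards.
    rng = np.random.default_rng(3)
    hours = np.arange(200)
    temperature = 60.0 + 0.05 * hours + rng.normal(scale=0.5, size=hours.size)

    # Fit a linear trend and predict when the (invented) maintenance limit is reached.
    slope, intercept = np.polyfit(hours, temperature, 1)
    MAINTENANCE_LIMIT = 80.0
    hours_until_limit = (MAINTENANCE_LIMIT - intercept) / slope - hours[-1]

    print(f"estimated drift: {slope:.3f} °C per hour")
    print(f"predicted maintenance due in roughly {hours_until_limit:.0f} operating hours")
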
Regardless of the progress being made to the arsenal of defence mechanisms, modern IDPS-AI tools should not be The three main types on the market today are the host- seen as the universal solution to cyberattacks, and espe- based intrusion detection systems (HIDS), network-based cially not for all time to come. They should instead be seen intrusion detection systems (NIDS), and hybrid intrusion as a milestone in the ‘hare and the hedgehog’ race between detection systems (HIDS). Whereas HIDSs analyse infor- increasingly intelligent attackers and defenders. mation from log files, kernel files, and databases, NIDS are used to monitor data packages in the network, and the hybrid forms combine the two principles in a single tool. 16 2. AI ASSISTANCE FOR SECURITY CONCEPTS

2.3 Detection of malicious software

Another area of application for artificial intelligence is the detection of malicious software at the earliest point in time after an invasion.

Machine learning methods can be used to detect malware on single devices or in networks. There are two ways of doing this. The first involves the monitoring of the system for detecting anomalies either

(a) through the network activities via a surveillance server or a surveillance service in the network, or

(b) through the analysis of the characteristics of the individual device, for example performance indicators of the hardware and statistics on various processes.

The second involves analysing potential malware using classification algorithms based on ML for identifying possible malignancies. Although the different kinds of malware have different kinds of ‘signatures’ that are used for categorisation purposes, the patterns between only slightly modified forms of malware are fuzzy and therefore not easily detectable using the classic methods. ML provides a way of also identifying these initially fuzzy differences in malware patterns. This can be done using either a static or a dynamic analysis of an executable file. Such AI application mechanisms are usually used by the manufacturers of security systems to increase the reliability and currentness of security updates.

Early detection mechanisms make use of the fact that in most cases malicious software, on account of its complexity, is not being completely rewritten. At least code fragments are being reused in many cases. Modern ML methods can enable learning processes with which suspicious code can be detected by AI, even in those cases where attempts were made to modify the outer appearance or the impact pattern. AI is able to detect potential malicious software in this way, so that its quality can be subsequently analysed by specialists. In such cases, AI functions as an instrument for the automated analysis of a large volume of candidates and suspicious occurrences.

In addition to conducting code and impact analyses in a controlled environment, other characteristics in relation to the spreading paths of potential malware can also be learned. Through which path did the software enter the network? Could its origin be connected to other known sources of risk? AI-assisted malware detection is capable of inferring many suspicious occurrences from learned experiences, which makes it a valuable assistance system for designing and updating security systems. This kind of AI application is also an example of a situation in which human labour is being assisted but not made redundant by artificial intelligence. This is because the classification of software as harmful remains a very sensitive task that harbours the potential for major financial loss. Comprehensible decision-making and financial accountability are matters that still require human participation today.
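
The classification approach described above can be sketched as follows (assuming scikit-learn is available; the static features, the data, and the decision threshold are invented for illustration and would in practice come from static or dynamic analysis of executable files):

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(1)
    # Illustrative static features per file:
    # [entropy of the code section, number of imported APIs, uses a packer (0/1), file size in KB]
    benign = np.column_stack([rng.normal(5.5, 0.5, 500), rng.normal(120, 30, 500),
                              rng.binomial(1, 0.05, 500), rng.normal(900, 300, 500)])
    malicious = np.column_stack([rng.normal(7.4, 0.4, 500), rng.normal(40, 15, 500),
                                 rng.binomial(1, 0.7, 500), rng.normal(300, 150, 500)])

    X = np.vstack([benign, malicious])
    y = np.array([0] * 500 + [1] * 500)            # 0 = harmless, 1 = malicious

    classifier = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

    unknown_file = np.array([[7.1, 55, 1, 350]])   # slightly modified variant of known malware
    probability = classifier.predict_proba(unknown_file)[0, 1]
    print(f"estimated probability of being malicious: {probability:.0%}")
    # Anything above a (human-defined) threshold is handed to specialists for analysis
    # rather than blocked automatically; the final classification remains a human decision.
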

3. New Attack Vectors Using AI and Mechanisms of Defence

3.1 Use of AI for attacks

Cyberattacks carried out on commercial enterprises are usually referred to under the three headings of industrial espionage, industrial sabotage, and data theft. The goals pursued by them vary. Some are aimed at obtaining a company’s confidential information, such as its latest technical developments on a machine or a product, while others have monetary interests as their goals, as was seen in the NotPetya case. Cyberattacks carried out with AI assistance are more accurate, more precise, and more effective at bypassing control systems, as discussed in section 2.1 above. Attacks that combine human and computer-assisted methods make use of the various data sources and communication systems in the office IT and the production OT to identify vulnerabilities in order to plan and implement more effective cyberattacks.

The following discusses three areas where cyberattacks are carried out using artificial intelligence, namely cyberattacks on office IT, on production OT, and on the AI systems used.

3.1.1 Cyberattacks on office IT

The majority of cyberattacks are made via e-mails and Internet applications. Conference systems may also be a way for malware to penetrate the office IT in future, as Deep Locker was able to show [13]. How Deep Locker works will be explained at the end of this section. In principle there are two basic types of AI-assisted cyberattacks:

• attacks that are more technical in nature, and

• attacks on organisational structures.

There is some overlapping between these two, or the differences between them are not easy to distinguish.

The more simple kinds of attacks include phishing attacks, which send out massive quantities of e-mails containing links to various kinds of malware. One of the most well-known attacks is WannaCry. The more intelligent kinds of attacks include spear phishing attacks, which send personalised e-mails containing links to things such as Trojans with backdoor functions. They are also used for zero-day attacks. Zero-day vulnerabilities are software security flaws known to the software manufacturer but for which no remedial patches are available and which can be misused by attackers. The more advanced and longer-lasting threats are usually the so-called advanced persistent threats (APT). Individual persons can be targeted for attacks through long-lasting social engineering methods that are combined with subsequent technological attacks.

The more complex forms of attacks on office IT are increasingly making use of AI methods. Very advanced forms of cyberattacks imitate the user behaviour of persons in key positions. They use such things as voice imitation in telephone calls or the form of address used at the beginning and the salutation at the end of an e-mail. This often involves e-mails from CEOs to their internal employees, which is now referred to as ‘CEO fraud’.

Deep Locker attacks are particularly menacing. Based on a variety of previous observations of the victim, usually through publicly available information, secret and individual cipher keys are created and are then used for the encryption of malware and as secret triggers for later attacks via a correspondingly trained deep neural network (DNN). After the target system has been compromised with the infiltrated DNN and with the practically unidentifiable encrypted malicious code, the secret trigger sits on the target system waiting for its signal. This is reminiscent of the famous ‘sleeper’, who remains undisclosed for a long time waiting for a specific event to occur, and when it does, he attacks. For example, if a certain face is detected via a camera in the target system at a certain time and place, then the attack can begin.

A trigger can also be a predetermined action in a social network or during a particular online conference. An experiment demonstrated how WannaCry ransomware could be hidden in a video conference application in such a way that it was not detected by anti-virus programs or sandbox mechanisms.

Because secret triggers can be sitting and waiting in nearly every kind of data stream, they are currently regarded as being impossible to detect. And once such malware has been activated, the detection of the attack is probably too late.

3.1.2 Cyberattacks on production OT

According to a study published by the VDMA at the end of 2017, based on a survey of companies in Germany, a third of all successful cyberattacks on companies in Germany lead to production or operational shutdowns. The attackers often pursue financial goals, for example through extortion or ransom demands.

But there are also new forms of attack targeted at obtaining production information. One example is cleaning robots that were rented to clean production halls during shift changes. These networked, digital aids are capable of performing undesirable espionage tasks on the side. By means of AI-assisted controls, desired locations can be approached, spied on, and analysed with the help of the incorporated sensors, for example cameras that are supposedly there for orientation purposes.

3.1.3 Cyberattacks on the AI system used

There is a danger of an existing AI system, for example an intelligent IDPS in a company, being deliberately manipulated via a cyberattack. The sensor data of a machine can be modified before it reaches and is processed by the AI system. With manipulated input data, the AI algorithm can make incorrect decisions or predictions. Because the source code of AI systems is sometimes known or disclosed, attackers can also try to modify the AI algorithm itself and thereby deliberately influence results.

AI-based systems must be protected from attacks in the same way that traditional systems are. In order to do this, it is crucial to first understand which parts of AI systems are vulnerable to attacks. An additional threat since 2013 is a new form of unsupervised learning (GAN, discussed in section 3.2). It involves the ability to design adversarial neural networks in such a way that the trained detection abilities of another given network can be deliberately destroyed.

The following section contains a short overview of the latest research results regarding the potential security vulnerabilities of AI-based systems, especially ML systems. The current discussion focuses primarily on manipulations of and attacks on non-industrial applications, for example on identification/border controls, image recognition for autonomous driving, or voice recognition. But such attacks can theoretically be applied to every kind of ML application that analyses detection and classification patterns in a quantity of data points (e.g. sensor data). Whether and how these could be exploited in concrete industrial applications depends on the respective industrial environment, scenario, and threat model.

One way of attacking ML systems that classify large quantities of data (e.g. image recognition) is by manipulating the input data. This takes place once the training of the ML system is complete, i.e. when the ML system itself is static and no longer changes.

The input data can be manipulated in a number of ways so that erroneous classifications are made (a minimal sketch of such a perturbation follows the list):

• Through the placement of specially calculated digital artefacts (stickers, graffiti), an attacker can deliberately cause the erroneous classification of traffic signs [16]. An example here are stop signs that were erroneously interpreted as direction signs by the procedures being tested. The most interesting point here is that the characteristics that are clear for human beings – e.g. the octagonal form of the sign – were obviously not appropriately weighted by the ML system used.

• Other kinds of attacks generate similar errors, for example where ML systems for facial recognition are deceived by eye glasses [15].

• It is possible to deliberately generate synthetic image data that an ML system thinks it is classifying correctly [14]. Where the human eye only sees abstract patterns or noise, the ML system sees and classifies the images as animals, fruit, or technical devices.

• An attacker can superimpose digital artefacts or noise on correct input data, which results in erroneous classifications by the ML system. An example of a manipulation using noise is when an image of a panda is, due to the noise, erroneously classified by the system as a gibbon [17], even though to human perception the pictures remain identical images of a panda.
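To make the noise-based manipulation above more concrete, the following is a minimal sketch of a gradient-based perturbation in the spirit of [17] (the fast gradient sign method), assuming PyTorch. The model, image, and label shown are invented placeholders, not any specific industrial or commercial system.

# Minimal FGSM-style sketch, assuming PyTorch; all objects below are placeholders.
import torch
import torch.nn as nn

def fgsm_perturb(model: nn.Module, image: torch.Tensor, label: torch.Tensor,
                 epsilon: float = 0.01) -> torch.Tensor:
    """Return a copy of `image` with a small adversarial perturbation added."""
    image = image.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(image), label)
    loss.backward()
    # One signed gradient step: visually negligible, but it can flip the predicted class.
    return (image + epsilon * image.grad.sign()).detach()

# Illustrative usage with a stand-in classifier and a random "image".
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
image = torch.rand(1, 3, 32, 32)
label = torch.tensor([0])
adversarial_image = fgsm_perturb(model, image, label)

The size of epsilon controls how visible the change is; for small values the perturbed image looks unchanged to a human observer, while the classifier's prediction can nevertheless flip.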

But as the above examples show, this does not apply when deliberately calculated manipulations are made. Added to this is the fact that some manipulations are extremely difficult for human users to detect and the resulting errors are not readily comprehensible. This can make counter-measures, analyses, and forensics more difficult.

Another way of attacking ML systems is through the manipulation of the training data. The ML system's behaviour is thereby flawed from the outset, and attackers can manipulate the system's behaviour as they see fit.

Online translation systems are tools that are widely used, and their translation results are increasingly being relied on without verification. A leading translation service has been working as a neural machine translation system since 2016 [18] in a large variety of different languages.

Little is known about the vulnerabilities or side effects of this system. If, for some languages, the internal neural network was created on the basis of small amounts of unusual training material, then under certain circumstances the translation results can be incorrect and devoid of meaning. The results are similar to the bizarre patterns that the Google DeepDream generator recognises in images and accentuates [19].

Because the training data and the AI models are implemented in so-called black boxes, it is nearly impossible to assess the quality of this material. And Google's approach is to use as much training material as possible. The models are designed to produce results no matter what, as long as they have some semblance to human language.

If this system is offered unusual input for translation, the translation produced will appear as running text but will have no connection to the input. It is possible that, for the training of the system, material was used that was inadequately verified in terms of its quality and that was based on religious scriptures – such as the Bible, the Koran, the Tanakh, the Torah, and others – due to their availability in multiple languages.

If Google Translate is confronted with the senseless sequence 'daba da ba du da bada ba du' in Somali, then the algorithm presumably resorts to sentences in the aforementioned training data. The translation of the above into English is: 'At the end of the day, it is too late.' The authors are not aware of any comments on this from Google.

Another point is that these AI systems allow people to make 'improvement suggestions' online. Weighting filters are supposed to prevent improper manipulations, but as soon as another AI based on GAN technology (see section 3.2) begins to influence such a system, the credibility – that is, the quality – of systems whose behaviour is practically no longer verifiable becomes questionable.

The German DeepL translation service [20] chose a different approach here. In addition to its own, improved algorithms, it also placed fundamental emphasis on high-quality source data. The training data stems primarily from official documents taken from the Internet that were translated by professional translation services over a period of a good ten years. Afterwards, the translations were subjected to a quality assurance check by a community of volunteer professionals in a mammoth global effort. The basis of this credible network currently comprises over a billion high-quality training examples. It was chosen as the best translation system by prominent European and international native-speaking journalists.

Another aspect concerns the confidentiality of the training data. Training data can be retrieved by attackers from ML systems [21]. The actual goal of an AI system is, for example, to train a neural network in such a way that abstractions can be made from the training data: it is not supposed to merely recognise the input data but to learn the underlying concepts instead. The study cited above examines a procedure that, on the basis of a training phase using text data (e.g. e-mails), is supposed to generate suggestions for the completion of strings (words or sentences). Secrets in the form of credit-card numbers are hidden in the training database. The ML system learns these secrets, and an attacker who can later use the system as a black box to create texts of her own is able to retrieve the credit-card numbers from the neural network. Although the amount of information stored in the neural network is insufficient for storing the entire text database, the secrets are reproduced in their exact form. The training of the neural network is discontinued before overfitting occurs. The described example shows that the secrets are already learned at a relatively early training stage (few iterations/epochs), before the point at which training is usually brought to an end.
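As a purely illustrative stand-in for the neural procedure examined in [21], the following toy sketch shows the underlying effect: a model trained on text containing a 'canary' secret reproduces that secret verbatim when prompted with its prefix. A trivial character n-gram table is used here instead of a neural network, and the corpus and credit-card number are invented.

# Toy illustration of training-data memorisation; not the method from [21].
from collections import defaultdict

secret = "my credit card number is 4485 2921 7741 0293"
corpus = ("please find the quarterly report attached. " * 50) + secret + ". kind regards"

ORDER = 8  # context length in characters
counts = defaultdict(lambda: defaultdict(int))
for i in range(len(corpus) - ORDER):
    counts[corpus[i:i + ORDER]][corpus[i + ORDER]] += 1

def complete(prefix: str, length: int) -> str:
    """Greedily extend `prefix` with the most likely next character."""
    out = prefix
    for _ in range(length):
        followers = counts.get(out[-ORDER:])
        if not followers:
            break
        out += max(followers, key=followers.get)
    return out

# An "attacker" who can only query the trained model recovers the digits verbatim.
print(complete("my credit card number is ", 19))

In a real language model the memorisation is statistical rather than a literal table lookup, which is precisely why [21] proposes measuring it, but the attacker's interface – feed in a prefix, read out the most likely continuation – is the same.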

In light of the above examples, the following points are the minimum of what must be observed with regard to the security of ML systems:

• The integrity and authenticity of the input data are of fundamental importance, as in the case of classic algorithms. The robustness/abstraction of ML procedures vis-à-vis fuzzy input data does not help prevent deliberate manipulations.

• The integrity and authenticity of the training data are also important. It must be transparent to the users of the AI system which data the provider of the AI system used for the training. At least the criteria used for selecting the training data must be disclosed; on this point, see the DeepL example above. It is not surprising that the trust requirements for training data and validation data are similar to the requirements placed on the selection and methodology behind statistical results, and this should always be understood as a crucial human filter.

• The human user must always be able to discern and verify whether the AI system is continuing to behave in conformity with the rules. Findings indicative of a possibly low rate of errors in AI results must always be based on the objectifiable ability to recognise errors. Such an ability declines in general in the face of the increasing abstraction of the training data and application fields. The distinction between AI results and so-called 'human common sense' with respect to the consistency and veracity of the results becomes increasingly difficult, however, and can even negate the verifiability of the results, as will be shown in section 3.2 (GAN) below.

• When choosing the training data, confidentiality requirements must be observed. This is because there is no way of ruling out a reconstruction of the exact input data from the ML system in addition to the desired abstraction/metadata. This applies especially when processing personal data in light of the European General Data Protection Regulation (GDPR).

3.2 Use of GAN technologies for deliberately bypassing security systems

The following is a presentation of a concrete process whereby an intrusion detection system (IDS) or an intrusion prevention system (IPS) is disabled by an adversarially designed neural network. The specific kind of IDS/IPS technology is irrelevant: such systems are usually AI-assisted, but they could also be based on any other method. The idea is that the attacking network continues learning the behaviour of the IDS/IPS until it is capable of generating malicious software that the IDS no longer recognises as malicious. This process is also an expression of the aforementioned asymmetry between attackers and security systems. An attacker can use the most current version of the security system itself to develop and test his own attack network while remaining invisible to the security system. It is therefore crucial to individualise IDS/IPSs through the use of policy configurations that distinctly differ from the standard settings. This makes it much more difficult for attackers to exploit such asymmetries.

GAN (generative adversarial network) is a technology from the area of unsupervised learning, and although it was developed only five short years ago, it has received a great deal of attention. In the category of generative AI algorithms, GAN is a pioneer technology for direct operations with an implicit density function: it enables the generation of new examples of a known ground structure without an explicit estimation of the underlying probability distribution. The remarkable thing about this method is the quality of the data generated, which often are no longer recognisable as faked. The method is, however, not yet of general use, and many aspects of GAN are currently being researched.

The basic idea of GANs is to allow two competing neural networks, a generator and a discriminator, to attain a game-theoretic equilibrium (Nash equilibrium). Using supervised learning, the discriminator learns to distinguish 'true' elements of a known dataset from new elements generated by the generator out of statistical noise. During the course of the game, the generator receives information about the methods used by the discriminator to sustain its ability to discriminate between true and synthetic elements. The generator uses this information to improve the parameters of its own element-generating function. It is mathematically provable that, in the equilibrium, true and synthetic elements are no longer distinguishable for the discriminator: in the stable final state of the game, the probability that an element is true – for both true and synthetic samples – is 0.5. The generated elements possess attributes as if they stemmed from the original dataset although they are not found in it. There are many mind-boggling examples in which GANs learn to generate images of handwritten numbers, pieces of clothing, furniture, faces, or paintings, or sound sequences of specific musical styles, that appear 'true' to human beings.

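A minimal training loop of this kind, assuming PyTorch, can make the two-player game concrete. The toy one-dimensional 'true' dataset, the network sizes, and the learning rates are illustrative assumptions, not a recipe used in any of the cited work.

# Minimal GAN sketch on a toy 1-D dataset; all sizes and values are illustrative.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))                # noise -> sample
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())  # sample -> P(true)
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(3000):
    real = torch.randn(128, 1) * 0.5 + 4.0        # the "known dataset": samples around 4.0
    noise = torch.randn(128, 8)

    # Discriminator step: label real samples 1 and generated samples 0.
    fake = G(noise).detach()
    loss_d = bce(D(real), torch.ones(128, 1)) + bce(D(fake), torch.zeros(128, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator step: try to make the discriminator output 1 for generated samples.
    loss_g = bce(D(G(noise)), torch.ones(128, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

# Near the equilibrium described above, both outputs drift towards 0.5.
real_score = D(torch.randn(64, 1) * 0.5 + 4.0).mean().item()
fake_score = D(G(torch.randn(64, 8))).mean().item()
print(real_score, fake_score)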
The highly acclaimed paper 'IDSGAN: Generative Adversarial Networks for Attack Generation against Intrusion Detection' [22] describes the new risks that intrusion detection systems (IDS) are exposed to through GAN technology. The basic concept is that the generator modifies the malware through the addition of noise without altering its function. This modification process continues until a given IDS no longer recognises the code written for the attack and therefore refrains from initiating a defence mechanism. The difficulty that attackers have to overcome is that, although an IDS can be licensed for use on the market, training the GAN with this IDS in the role of the discriminator is not possible because its internal program mechanisms cannot be observed. The IDS therefore cannot serve as the source of the information that the generator needs in order to deceive it.

For this reason, an alternative discriminator is designed as a neural network that learns to imitate the activities of the IDS. The IDS delivers the labels 'harmless' or 'dangerous', from which the new discriminator learns to classify the malware generated by the generator as harmful. It is a supervised learning process in which the IDS delivers labels on whose basis the weights in the discriminator network are calibrated so as to imitate the behaviour of the IDS. This information – which is created in the new discriminator through its learning to behave exactly like the IDS that is regarded as a black box – can now be passed on to the generator.

The gap in the flow of information for the GAN is thereby closed. The development of the imitated defence behaviour of the new discriminator, which receives its information about harmfulness from the IDS, enables the generator to attain the Nash equilibrium characteristic of GANs in the game against the IDS. It is now able to generate malicious software that the IDS regards as harmless and therefore refrains from defending against.

It stands to reason that this form of conceptual AI misuse contemplated by the authors of the IDSGAN article is not limited to IDSs. Other forms of security systems for detecting abnormal states in respect of data, system behaviour, or other similar disturbances are also threatened by GAN-based attack scenarios using similar patterns: as soon as the code of a discriminator is technically available, or can be observed as a black box in such a way that an imitation of it can be generated by a neural network, then the informational basis needed to deliberately deceive the security system exists. It is completely irrelevant here whether the system itself is based on AI or has been conventionally programmed.

GAN technology is used in such a case to improperly employ another principle, a topic that has long been the subject of discussions on autonomous systems: a neural network can learn to act autonomously if, for a sufficiently long period of time, it observes the activities of an individual to be imitated and has the same external information that this individual has. Using this principle, it is very easy to develop autonomous systems that learn to act from role models. This is not an attractive idea for systems such as autonomous driving systems, however, because the network would also learn to imitate the driver's mistakes. But this is irrelevant in the case at hand, because it is only the learning process of the GAN discriminator that is derived from the IDS, which is treated as a black box. The momentary detection ability of the IDS is learned as a side effect. The quality of the ability acquired in this way is irrelevant, because all that IDSGAN is concerned with is finding a way to selectively destroy this ability through deliberate deception.

There are currently no promising universal methods for effectively combatting attack scenarios based on GAN technology. It is therefore advisable not to blindly trust automated systems that detect anomalies and intrusions of malicious software, to individualise IDS/IPS using supplementary policies as comprehensively as possible in order to make black-box attacks more difficult, and to employ as many security mechanisms as possible.
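The information flow described in this section – black-box labels from the security system, a surrogate discriminator that imitates them, and a generator optimised against that surrogate – can be sketched schematically, assuming PyTorch. Everything below is an abstract illustration: query_ids is an invented placeholder verdict function, the inputs are random feature vectors with no real malware semantics, and all sizes and hyperparameters are arbitrary.

# Schematic sketch of the surrogate-discriminator idea; purely illustrative.
import torch
import torch.nn as nn

FEATURES = 32

def query_ids(samples: torch.Tensor) -> torch.Tensor:
    """Placeholder for the black-box verdict: 1.0 = flagged, 0.0 = passed."""
    return (samples.sum(dim=1, keepdim=True) > 0).float()

generator = nn.Sequential(nn.Linear(FEATURES + 8, 64), nn.ReLU(), nn.Linear(64, FEATURES))
surrogate = nn.Sequential(nn.Linear(FEATURES, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)
opt_s = torch.optim.Adam(surrogate.parameters(), lr=1e-3)
bce = nn.BCELoss()

base_samples = torch.randn(256, FEATURES) + 1.0   # abstract stand-in feature vectors

for step in range(2000):
    noise = torch.randn(256, 8)
    modified = generator(torch.cat([base_samples, noise], dim=1))

    # 1) The surrogate discriminator is fitted to the black-box labels.
    labels = query_ids(modified.detach())
    loss_s = bce(surrogate(modified.detach()), labels)
    opt_s.zero_grad(); loss_s.backward(); opt_s.step()

    # 2) The generator is optimised against the surrogate so that its outputs
    #    receive the verdict "passed" from the imitated system.
    loss_g = bce(surrogate(generator(torch.cat([base_samples, noise], dim=1))), torch.zeros(256, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

Note that nothing in this sketch preserves the function of the modified samples; the constraint in [22] that the modification must not alter the malware's behaviour is exactly the part that makes real attacks hard, and it is omitted here.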

4. Outlook

4.1 Concluding remarks

• Impressive developments in 2016 have shown that, with the help of the new discipline of 'AI cryptography', novel forms of highly dynamic cryptographic procedures that no longer adhere to any known crypto standards are possible. The method in [23] learns how to protect communication using 'adversarial neural cryptography'. More details on this are found in the appendix.

• Increased risk exposure through foreseeable progress in the area of AI in light of advancements in existing technologies:

The dramatic improvement of the performance and price/performance ratio of hardware designed for numerical computation beginning in 1995 was one of the main success factors that made progress in the area of AI possible at all. This was not, however, a development aimed at artificial intelligence itself. It was a by-product of the new performance requirements of the booming market for 3D graphics systems. The numerical computation for the coordinate transformations needed for the interactive manipulation of 3D models was structurally very similar to the computational requirements of the tensor mathematics of neural networks. Supercomputer performance based on GPUs (graphics processing units) was fundamental to the spectacular success of machine learning. There is now a large market for application-specific hardware in the area of tensor processing units, with new data formats and new computational architectures. These have reached the market in 2019 and are profoundly accelerating learning and inference processes and making them increasingly cheaper. Particularly the large cloud mega-computing centres are in fierce competition with each other to offer such performance, which makes it widely and inexpensively available in almost unlimited volumes. Only the implementation of existing procedures – both positive and negative – will profit from this. A new class of infrastructure will also become available on which the efficient development of modern ML procedures can be furthered. The training periods needed for complex neural networks will shrink from weeks to hours. Parameter figures will increase from the three-digit million range to the two-digit billion range, thereby exceeding the capacity limits of biological brain cells.

• Other foreseeable technology-related security risks related to AI: quantum computing and its current preliminary stages:

Encryption is a core technology for IT security in almost all fields of application. The enormous increase in hardware performance over the last 30 years has made it necessary to lengthen cipher keys on account of the ability to break codes through prime factorisation. Recent years have witnessed the development of a whole new kind of computer system: quantum computers. These systems are based on effects described by quantum mechanics, whose existence was disputed for a long period of time. It has in the meantime become possible (at least rudimentarily), albeit with a great deal of effort, to build computer systems based on such effects and with which the existence of these effects has become provable. Such systems can only exist at temperatures close to absolute zero (a few millikelvins) and in an environment free from outside influences (for example tremors of any kind). At this point in time, they can only function for short periods of time and with high rates of error.

• The problems that such systems can solve are very special in nature. These systems are based on the concept of qubits, which are capable of having the value of true and false simultaneously, but which remain constant in groups. Prime factorisation and combinatorial optimisation are problems that can be mastered on such systems. Today's quantum computers are only able to reach capacities of a few qubits, so only very small problems can be addressed with them. But larger systems are expected to become realisable within the next few years – systems that will be capable of solving large problems at nearly infinite speed. These could pose an existential threat to today's encryption methods. Except for the use of considerably longer cipher keys, no successor technology is on the horizon, and there is also no plan for the upgrading of existing security systems.

4.2 Recommended actions

General

• Structured cooperation in the area of industrial security between the operators, manufacturers, and integrators is much more important for exhibiting resilience to new kinds of attack methods than has been presumed up to now.

The effectiveness of classic IDS/IPS is being immensely compromised by the attack methods described above. What is needed is the supplementary upgrading of security technologies during the transition from Industrie 3.x to Industrie 4.0 environments, and the development of better defence strategies against these maximally invasive forms of attack for those in the 3.x production sector who are not currently planning a transition. Along with the existing security issues concerning the securing of supply chains now come the threats and defence mechanisms discussed above, all of which must be brought to the attention of those in positions of accountability.

• Integrators are delivering novel forms of AI assistance in their particular facilities and machinery, relying thereby on the integrity of the products delivered to them. Criteria must be developed to enable the quality of the AI training data used, and its relevance in the particular case, to be determined and measured.

• The education of the participants must be given high priority. What is needed here is the development of expertise in assessing the usability of open source software and open source training data. The certification of open source material by trustworthy institutions can reinforce this process.

• AI-based attacks can
– be based on patterns (images, language/speech, text, etc.) deliberately designed to deceive detection systems,
– understand, through learned imitation, how to temporarily bypass security systems, and
– be used on security systems of all kinds, AI-based as well as conventional systems.

Security systems or defence mechanisms should therefore comprise multiple, independent measures in order to increase the probability of detecting deliberate deceptions. But in light of the progress being made in AI-based attack methods, residual risks still remain. These can only be combatted with the help of qualified and specialised personnel.

For manufacturers

• What is needed here is core knowledge in the development and use of AI assistance. Intensive educational measures and keeping constantly abreast of current research developments are essential in light of the high speed at which innovation is happening today.

• It is crucial for businesses to understand just how important training data is for their success.

For operators

• AI assistance is fundamentally a positive performance element. But with every implementation, it must always be possible to identify the statements created via AI assistance. This allows such statements to be verified using other means and therefore any deceptions of the AI to be uncovered (see the panda example or the passport controls at airports). The verification of AI decisions could be performed with another system that is provably based on an orthogonal metric.

• For example, the ability to externally influence image recognition during quality assurance tasks should be prevented through appropriate organisational measures; the wearing of eye glasses, for instance, should be prohibited at access controls.

• AI assistance should be subjected to special quality controls. In particular, the data used for training should be critically reviewed in terms of its technical suitability and in terms of legal aspects (intellectual property).

For governments

• Wherever it is needed, governments should ensure that small and medium-sized enterprises are provided with adequate information on the opportunities and risks of artificial intelligence in industrial security. The SMEs themselves are, however, ultimately responsible for this.

• Considering the far-reaching political scope of security and industrial issues, suitable support measures should be initiated to promote cooperation between operators, manufacturers, and integrators, tailored especially to their security requirements. They could work together to come up with suitable ways of combatting the new threats, test these out in common field experiments, and come up with practical applications for them.

5. References

[1] Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323, 533–536.

[2] LeCun, Y., Jackel, L., Boser, B., and Denker, J. (1989). Handwritten digit recognition: Applications of neural network chips and automatic learning. IEEE Communications Magazine, 27(11), 41–46.

[3] Cho, Kyunghyun; van Merrienboer, Bart; Gulcehre, Caglar; Bahdanau, Dzmitry; Bougares, Fethi; Schwenk, Holger; Bengio, Yoshua (2014). “Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation”. arXiv:1406.1078

[4] Hochreiter, S., Schmidhuber, J. (1997), Long short-term memory, Neural Computation 9 (8), 1735–1780.

[5] Jouppi, Norman et al. (2017), In-Datacenter Performance Analysis of a Tensor Processing Unit, ACM Digital Library.

[6] Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio (2014), Generative Adversarial Networks, arXiv:1406.2661 [stat.ML].

[7] www.tagesspiegel.de/themen/reportage/kuenstliche-intelligenz-toyota-feuert-die-roboter/23821418.html

[8] www.bsi.bund.de/DE/Publikationen/Lageberichte/lageberichte_node.html

[9] www.vdma.org/documents/15012668/22538766/Grafik_PI_Industrial_Security_2017-11-29_1512390672976.pdf/b94c55dc-5b8f-44f1-ad03-1b7628499e21

[10] www.plattform-i40.de/I40/Redaktion/DE/Downloads/Publikation/sichere-kommunikation-i40.pdf?__blob=publicationFile&v=6

[11] www.plattform-i40.de/I40/Redaktion/DE/Downloads/Publikation/sichere-unternehmensuebergreifende-kommunikation.pdf?__blob=publicationFile&v=10

[12] www.plattform-i40.de/I40/Redaktion/DE/Downloads/Publikation/sichere-identitaeten.pdf?__blob=publicationFile&v=11

[13] https://securityintelligence.com/deeplocker-how-ai-can-power-a-stealthy-new-breed-of-malware/

[14] Anh Nguyen et al., "Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images", https://arxiv.org/abs/1412.1897

[15] Mahmood Sharif et al., "Accessorize to a Crime: Real and Stealthy Attacks on State-of-the-Art Face Recognition", www.cs.cmu.edu/~sbhagava/papers/face-rec-ccs16.pdf

[16] Kevin Eykholt et al., "Robust Physical-World Attacks on Deep Learning Models", CVPR 2018, https://arxiv.org/abs/1707.08945

[17] Ian J. Goodfellow et al., "Explaining and Harnessing Adversarial Examples", https://arxiv.org/pdf/1412.6572.pdf

[18] www.blog.google/products/translate/higher-quality-neural-translations-bunch-more-languages/

[19] https://vimeo.com/132462576

[20] https://deepl.com/

[21] Nicholas Carlini et al., "The Secret Sharer: Measuring Unintended Neural Network Memorization & Extracting Secrets", https://arxiv.org/pdf/1802.08232.pdf

[22] Zilong Lin, Yong Shi, Zhi Xue (2018), IDSGAN: Generative Adversarial Networks for Attack Generation against Intrusion Detection, arXiv:1809.02077 [cs.CR].

[23] Martín Abadi and David G. Andersen (2016), Learning to protect communications with adversarial neural cryptography, https://arxiv.org/pdf/1610.06918v1.pdf

[24] http://deeplearning.stanford.edu/tutorial/supervised/OptimizationStochasticGradientDescent/

6. Appendix

6.1 Example: Border controls using AI

More than 200 eGates are in use in Germany today and more than 1,000 in Europe. More than 130 countries are now issuing biometric passports, and more than 50 countries are using automated border controls, mostly at airports but at country borders as well. More than 700,000 travellers cross the border between China and Hong Kong or Macau each day, predominantly via eGates. Even the bridge between Singapore and Malaysia counts more than 100,000 border crossings daily, most of which are performed using biometric recognition. These systems allow for significantly speedier workflows and eliminate the factor of fatigue on the part of human workers. While a border guard is able to recall only around 10 to 20 pictures of wanted persons, computers can store and process several thousand images.

Since 2013, all patients in Turkey have been authenticated, in hospitals for example, using the pattern of the veins in the palms of their hands. This serves not only to verify persons but also to prevent patients from being mixed up in large hospital complexes.

The way these systems work is that the person's face is first stored as a data record in the electronic passport, followed by two flat fingerprints, which passports have contained since 2009. In addition to this, automated border controls (so-called eGates) with previously stored iris image data were tested at Frankfurt Airport in 2008 with volunteer users in the EasyPass Registered Traveller Programme of the German Federal Police. After 2013, these were replaced by facial recognition systems and installed in four other major German airports. Four key requirements can be met with these:

• repetitive, monotonous tasks can be performed by computer-assisted systems in consistent quality,

• the speed with which the work can be executed using a computer-assisted system is significantly faster than when performed by a human being,

• in comparison to human beings, computer-assisted systems are able to register even the tiniest discrepancies in the biometric elements, and

• in conjunction with a database, computer-assisted systems can protect the biometric elements from being accessed by unauthorised parties.

These key requirements from the area of identification and authentication processes can be transferred to the processes of Industrie 4.0, which would mean that, with the help of AI,

• monotonous and repetitive tasks could be performed at an at least consistent level of quality,

• clock speeds and therefore productivity could be increased,

• quality could be improved through a more reliable recognition of product attributes, and

• intellectual property could be protected by preventing third parties from obtaining access to crucial product attributes.

6.2 Explanation of 'Learning to protect communications with adversarial neural cryptography'

In the Brain Project, two instances of artificial intelligence were created. After the conclusion of a training phase, they created their own secure crypto system to communicate with each other. They were thereby able to defend themselves against attacks from a third AI instance, even though the third party was able to 'listen' to this communication from the very start. The model is based, among other things, on an optimisation pursuant to SGD (stochastic gradient descent) [24].

In the underlying technology, neural networks learn to use secret cipher keys to keep other neural networks from obtaining information and to ensure confidentiality vis-à-vis adversaries. The neural networks are not prescribed any specific cryptographic algorithms but are trained end to end. They learn fundamental forms of encryption and decryption as well as ways of using them to achieve confidentiality goals. After around 7,000 communication steps, the two AI instances had synchronised themselves so well that the third AI system achieved no more successful hits, neither by using a pretrained DNN nor by guessing. After a further 6,000 communication steps, the researchers were of the opinion that an external instance would no longer be able to 'crack' the permanently changing security criteria. It is uncertain at this time whether this form of AI cryptography can be classified as strong cryptography according to the classifications currently in effect. Before this technology can be marketed in the certifiable, commercial area, new testing and certification schemes must therefore be developed.
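A heavily simplified sketch of this set-up, assuming PyTorch, is shown below: 'Alice' encrypts a bit vector with a shared key, 'Bob' decrypts it using the same key, and 'Eve' tries to reconstruct the plaintext from the ciphertext alone. The tiny fully connected networks, the bit length, and the loss weighting are illustrative simplifications of the architecture and objective used in [23].

# Simplified adversarial-neural-cryptography sketch; sizes and losses are illustrative.
import torch
import torch.nn as nn

N = 16  # number of plaintext/key/ciphertext bits, represented as -1/+1 values

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim), nn.Tanh())

alice = mlp(2 * N, N)   # (plaintext, key)  -> ciphertext
bob   = mlp(2 * N, N)   # (ciphertext, key) -> plaintext estimate
eve   = mlp(N, N)       # ciphertext only   -> plaintext estimate

opt_ab = torch.optim.Adam(list(alice.parameters()) + list(bob.parameters()), lr=1e-3)
opt_e = torch.optim.Adam(eve.parameters(), lr=1e-3)
l1 = nn.L1Loss()

for step in range(5000):
    plaintext = torch.randint(0, 2, (256, N)).float() * 2 - 1
    key = torch.randint(0, 2, (256, N)).float() * 2 - 1

    # 1) Eve is trained to reconstruct the plaintext from the ciphertext alone.
    ciphertext = alice(torch.cat([plaintext, key], dim=1)).detach()
    loss_eve = l1(eve(ciphertext), plaintext)
    opt_e.zero_grad(); loss_eve.backward(); opt_e.step()

    # 2) Alice and Bob are trained together: Bob should reconstruct the plaintext,
    #    while Eve should do no better than random guessing (an L1 error of about 1.0).
    ciphertext = alice(torch.cat([plaintext, key], dim=1))
    loss_bob = l1(bob(torch.cat([ciphertext, key], dim=1)), plaintext)
    loss_adv = (1.0 - l1(eve(ciphertext), plaintext)) ** 2
    opt_ab.zero_grad()
    (loss_bob + loss_adv).backward()
    opt_ab.step()

The point of the sketch is the shape of the objective: Bob's reconstruction error is minimised while Eve's is pushed towards the level of random guessing, without any cipher being specified by hand.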

Estimation: Independent of a controlled or controllable use, the ability to use this technology in an uncontrolled manner already exists and is a challenge for those responsible for security. As soon as malware begins to use this internal communication method with its own components, such as with external control centres, a firewall can be shown, in the form of a masquerade, a valid protocol in which AI-encrypted information is presented as syntactically valid content and which the IDS/IPS system does not recognise as an attack or cannot decrypt. The same applies, for example, to new hash functions that can only be generated and verified by the two participating AI instances or that do not have to adhere to any known format. It can already be presumed that, similar to GAN, this form of 'proprietary security' has a great deal of potential for development in certain sectors.

AUTHORS Markus Heintel, AG | Dr Detlef Houdeau, AG | Dr Wolfgang Klasen, Siemens AG | Dr Bernd Kosch, Technology Solutions GmbH | Dr Michael Schmitt, SAP SE | Thomas Walloschke, Fujitsu Technology Solutions GmbH | Dr Thomas Wille, NXP Semiconductors Germany GmbH

This publication is a result of the sub-working group of Working Group 3 of the Platform Industrie 4.0 (Security of networked systems). www.plattform-i40.de