ALICE security research site at University of Frankfurt

9th ALICE Tier-1/Tier-2 Workshop

Andres Gomez Ramirez, Udo Kebschull IRI - Goethe University Frankfurt ALICE UF Grid site

[email protected] UF Grid Site

➢ Security research, not production data processing. ➢ PhD thesis: “Deep Learning and Isolation Based Security for Intrusion Detection and Prevention in Grid Computing”. ➢ 5 Ubuntu 16.04 nodes. ➢ Centos 6 containers. ➢ CernVM-FS installed in the hosts and shared as a volume inside Docker containers.

Andrés Gómez – 9th ALICE Tier-1/Tier-2 Workshop Slide 2 Motivation: Grid Security Challenges

➢ Users can execute any application: arbitrary code execution by design. ➢ Payloads are frequently executed directly on host . ➢ Network sections are shared. ➢ Hundreds of thousands of jobs running simultaneously. ➢ Expensive to have many security experts monitoring the Grid. ➢ Similar to Cloud computing.

Andrés Gómez – 9th ALICE Tier-1/Tier-2 Workshop Slide 3 Proposed Solutions

Containers to execute payloads. ➢ Network isolation with virtual networks. ➢ Isolation to extract better payload behavior data from the host and from the network. ➢ Automated Intrusion Detection and Prevention. ➢ Deep Learning to enable this automation.

Andrés Gómez – 9th ALICE Tier-1/Tier-2 Workshop Slide 4 Security by Isolation: Linux Containers

/ Container / Container / Container / /boot /boot /boot /boot /dev Squid /dev rrdTool /dev Nagios /dev /etc /etc /etc /etc / /kernfs /kernfs /kernfs /home /home PHP /home Perl Python /home /media /media /media /media /opt /opt /opt /mnt /proc /proc /proc /opt /run Apache /run Cherokee /run Lighttpd GStreamer core /proc /srv /srv /srv /root /sys /sys /sys /run /usr MySQL /usr MariaDB /usr Drizzle /srv /var /var /var /sys /usr -nspawn /var d systemd user D-Bus n PulseAudio systemd logind journald networkd IPC u

PID1 session o daemon s e v i s u

l System Call Interface (SCI) c x e

Render Memory Virtual File Network Nodes & KMS scheduler manager System interface DRM

System Graphics Display Hardware CPU Storage Network GPU RAM RAM controller

Andrés Gómez – 9th ALICE Tier-1/Tier-2 Workshop Slide 5 Grid Job Execution and Network Isolation

Worker node Worker node

Pilot job Bob's grid Alice's grid Bob's grid job job Virtual network job Container Pilot Pilot Container Eve's grid job job Alice's grid job Eve's grid job job

Container Pilot job

Andrés Gómez – 9th ALICE Tier-1/Tier-2 Workshop Slide 6 Behavior Monitoring for the Grid

Working Node

Bob's grid Alice's grid Job Job Virtual network Container Pilot Pilot Container Classification: Job Job • Normal Eve's grid Job • Malicious

Container Pilot Job Hidden Input Output

Grid jobs word2vec monitoring embedding data vectors

Andrés Gómez – 9th ALICE Tier-1/Tier-2 Workshop Slide 7 ALICE Grid Security Monitoring

➢ Linux Containers ➢ Deep Learning ➢ Convolutional Neural Networks ➢ Recurrent Neural Networks ➢ Generative method for improving training ➢ Grid Jobs – normal vs malware

Andrés Gómez – 9th ALICE Tier-1/Tier-2 Workshop Slide 8 Grid job classification with Convolutional Neural Network Trace log

Word2vec

Output class Input

Dense connected layer with dropout and softmax single output word embedding Convolutional layer vectors multiple filters

Andrés Gómez – 9th ALICE Tier-1/Tier-2 Workshop Slide 9 Training data generation with Recurrent Neural Networks

Input layer x 1 x 2 ... x N

IO W W HH

Hidden layer h 1 h 2 ... h M

W HO

Output layer o 1 o 2 ... o L

Andrés Gómez – 9th ALICE Tier-1/Tier-2 Workshop Slide 10 Arhuaco: proof-of-concept implementation

Administrator Worker node workstation

Grid job Grid job

Virtual network Container Pilot Pilot Container job job

Grid job

Container Pilot job

Source Source interface 1 interface 2 VoBOX Worker node (Sysdig) (Bro) Monitoring Execution Grid job Grid job engine interface Virtual network Virtual network Container Container Pilot Pilot Container Feature Distributed job job container Eve's grid extraction job engine Classification Generation Pilot job

Response engine

Worker node Arhuaco

Grid job Grid job

Virtual network Container Pilot Pilot Container job job

Grid job

Container Pilot job

Andrés Gómez – 9th ALICE Tier-1/Tier-2 Workshop Slide 11 Arhuaco: proof-of-concept Implementation

➢ Linux Containers: Docker, Docker Swarm ➢ Deep Learning: Keras, Theano, TensorFlow, Python 3. ➢ Data collection: System Calls - Sysdig, Network connection - The zeek Network analysis tool. ➢ Grid Middelware - ALICE AliEn.

Andrés Gómez – 9th ALICE Tier-1/Tier-2 Workshop Slide 12 Evaluation: Grid Jobs vs Malware

Andrés Gómez – 9th ALICE Tier-1/Tier-2 Workshop Slide 13 Evaluation: Grid Jobs vs Malware

Andrés Gómez – 9th ALICE Tier-1/Tier-2 Workshop Slide 14 Results: Impact of Isolation over performance

1600 Grid Jobs in total

Andrés Gómez – 9th ALICE Tier-1/Tier-2 Workshop Slide 15 Results: Grid job classification – normal vs malware

➢ CNN: Convolutional Neural Network ➢ SVM: Support Vector Machine ➢ FPR: False Positive rate ➢ ACC: Accuracy

Andrés Gómez – 9th ALICE Tier-1/Tier-2 Workshop Slide 16 Results: training data generation results

➢ TPR: Sensitivity or True Positive Rate ➢ SPC: Specificity ➢ FPR: False Positive rate ➢ ACC: Accuracy

Andrés Gómez – 9th ALICE Tier-1/Tier-2 Workshop Slide 17 Conclusions

➢ Docker containers can be used to isolate and extract behavior information from Grid jobs without big performance impact. ➢ Deep learning is highly effective to identify “malicious” Grid jobs. ➢ CNNs with Word2Vec preprocessing provides improved accuracy than traditional SVM. ➢ Synthetic generated data can improve the training process.

Andrés Gómez – 9th ALICE Tier-1/Tier-2 Workshop Slide 18 Future work

➢ Thesis submitted. ➢ Exploring options: Arhuaco as source and/or commercial tool. ➢ UF Grid site probably will not continue operations. ➢ Future research topics of interest: differential privacy, adversarial machine learning training, privacy preserving intrusion detection.

Andrés Gómez – 9th ALICE Tier-1/Tier-2 Workshop Slide 19 Thank you!

Questions? Appendix: CNN results

System Calls Network connections

Andrés Gómez – 9th ALICE Tier-1/Tier-2 Workshop Slide 21 Appendix: System call results

CNN CNN vs SVM

Andrés Gómez – 9th ALICE Tier-1/Tier-2 Workshop Slide 22 Appendix: Network traces results

CNN CNN vs SVM

Andrés Gómez – 9th ALICE Tier-1/Tier-2 Workshop Slide 23 CERN Security Operations Center

Andrés Gómez – Frankfurt University Slide 24