Gonzalez Vijay Chidambaram Jivko Sinapov Peter Stone University of Texas at Austin

CC-Log: Drastically Reducing Storage Requirements for Robots Using Classification and Compression Santiago Gonzalez Vijay Chidambaram Jivko Sinapov Peter Stone University of Texas at Austin Abstract cal storage or streaming data as it is produced to a remote server are not feasible solutions. Sub-sampling the sen- Modern robots collect a wealth of rich sensor data during sor data at a fixed reduced rate runs the risk of missing their operation. While such data allows interesting anal- data essential for debugging or analysis. ysis and sophisticated algorithms, it is simply infeasible to store all the data that is generated. However, collecting We tackle this problem in the context of the Building- only samples of the data greatly minimizes the usefulness Wide Intelligence (BWI) project at the University of of the data. We present CC-LOG, a new logging system Texas at Austin [4]. The robot associated with the built on top of the widely-used Robot Operating Sys- project, the BWIBot, is used for research for human- tem that uses a combination of classification and com- robot interaction, multi-robot interaction, and planning pression techniques to reduce storage requirements. Ex- in cohabited environments. The BWIBot has sensors periments using the Building-Wide Intelligence Robot, a such as the Kinect and the odometer which produce ten mobile autonomous mobile platform capable of operat- to hundreds of MB of data per minute. ing for long periods of time in human-inhabited environ- We present a system, CC-LOG, that combines event ments, showed that our proposed system can reduce stor- classification and compression to drastically reduce the age requirements by more than an order of magnitude. storage used for logging (x3). First, instead of logging Our results indicate that there is significant unrealized all events, we use a Support Vector Machine (SVM) to potential in optimizing infrastructure commonly used in classify events as anomalous or not, and only log anoma- robotics applications and research. lous events (along with a few events preceding and suc- ceeding the anomalous event). Second, we leverage the 1 Introduction high inherent redundancy of the JSON format of the logs, using well-known loseless compression schemes such as In recent times, we have witnessed an unprecedented LZMA [7] to reduce the size of the logs significantly. surge in sensors and IoT devices. This has been accom- We use the Gazebo robot simulation framework [8] to panied by an exponential increase in the amount of data collect data from several runs of the robot (as it is lo- being generated. Sensors often generate small amounts gistically difficult to collect enough data on the shared of data, but at high frequency (e.g., at 100 Hz). Prior re- physical robot to evaluate our system). Compression research has looked at how to efficiently store and process duces the logged data to 10% of its original size, and these streams of data in sensor networks [1–3]. our classification scheme our event classification scheme However, one such source of sensor data has been further reduces the amount of data logged (x4). In our largely unexplored: robotics. Modern robotic platforms evaluation, the classifier does not produce any false neg- seek to understand natural language commands and per- atives (i.e., missing data we want to log). CC-LOG was form a wide variety of tasks. To achieve this, robots in- developed as a package in the widely-used Robot Op- clude a large number of sensors. For example, structured erating System [5] to facilitate integration into different infrared depth sensors (such as the Microsoft Kinect) and robot platforms. LIDAR scanners are used for localization and naviga- We believe our work on CC-LOG is a strong indica- tion in a number of robots [4]. Other sensors used in- tor of the untapped opportunities in the field of robotics. clude rotary encoders, inertial measurement sensors, and Analyzing robotics infrastructure from a systems view- standard cameras. Such sensors produce data frequently point can yield significant improvements in efficiency (e.g., depth sensors produce 30 GB in 15 seconds), and (x6). Sadly, the different areas of research in computer logging the sensor data allows post-hoc analysis and de- science have become silos, with very little cross-over of bugging [5, 6]. ideas. By presenting this work in HotStorage, we hope to However, the dizzying rate of data production poses alleviate this problem a little by showing there are inter- a problem for limited storage and short battery life of esting systems challenges and opportunities for fruitful robots. Due to these restrictions, logging all data on lo- collaboration in robotics. 2 Background LOG accomplishes this through the use of a unique slid- ing window structure and a classifier. We first provide some background on the BWIBot and Anomalous events typically include the sudden and the Robot Operating System. The BWIBot version extreme acceleration or sudden twisting to a large degree. 2 (also called the Segbot) contains a desktop com- Note that both acceleration and twisting are not anoma- puter powered by an Intel i7-4790T/i7-6700T processor, lous in the general case; only when they deviate from placed in HD-Plex H1.S fanless case, with 6 Gigabit normal behavior. Since anomalous events typically are Ethernet Network Interfaces, along with 4 USB3 and 2 not instantaneous, CC-LOG takes recent history into ac- USB2 interfaces. A 20 inch touchscreen is mounted at count when classifying data. Additionally, while anoma- a human-operable height to serve as the primary user in- lous events are interesting in and of themselves, it may terface with the robot. The BWIBot is equipped with be desirable to save data for a period of time following several sensors including a PointGrey BlackFly GigE the event. Such data can help roboticists see how, and if camera and the Kinect. The BWIBot is built on top of the robot recovers from anomalies. the Segway RMP 110 mobile base which, coupled with We first describe how anomalies are detected, how Simultaneous Localization and Mapping (SLAM), pro- we use windows to log anomalous events, and how log vides sufficiently accurate odometry estimates for robust events are compressed. navigation. The robot is controlled and programmed using the Anomaly Detection. To classify feature vectors as Robot Operating System (ROS) [5]. ROS is structured as anomalous or nominal, we elected to use a one-class a group of communicating processes called nodes. While Support Vector Machine [9] (SVM) with a nonlinear ra- nodes need to communicate with each other using a com- dial basis function (RBF) kernel [10]. SVMs function by mon format, each node can be written in a different lan- attempting to find the maximally-separating hyperplane guage. This allows developers to easily extend ROS in between two sets of data using linear programming tech- the language of their choice. Nodes and sensors can send niques. Since most datasets are not linearly separable, messages to a topic that other nodes can subscribe to. kernel functions can be used to map data into a high- Typically, ROS processes the raw data from sensors and dimensional feature space. In our case, we have chosen publishes a topic with formatted sensor data, which other to use a Gaussian RBF kernel. One-class SVMs, specif- nodes consume. For example, the odometry sensor topic ically, are a popular distribution density estimation tech- publishes several KB of data every 10 milliseconds (100 nique; largely due to their simplicity and wide applica- Hz). The raw odometry data is processed by ROS and bility to different domains. Within the CC-LOG system, published in JSON format. The JSON format has a lot of we use the Scikit-learn library [11] (built atop libsvm) to redundancy that increases the storage requirements. provide SVM functionality. Machine learning classifiers require a feature vector: a vector of values that have been a meaningful dependence 3 Design on the set to be classified. For example, for classification based on odometry data, the x-component of the robot’s velocity is a useful feature. Feature selection is impor- Goal. In this work, we focus on efficiently logging the tant in many types of machine learning. A sub-optimal BWIBot odometry data (but our techniques are general selection of features can result in an inaccurate classifier. and will work for other sensor data as well). We con- Feature vectors are constructed as a concatenated array sider that the logging data is used mainly for debugging of recently seen features. and analysis that considers anomalous events to be of in- terest. We note that sometimes the log data is used to CC-LOG uses pose position, pose orientation, linear obtain statistics about the robot (e.g., how many kilome- twist, and angular twist data for our feature vectors. We ters has the robot covered over the past year?). For such deliberately limited the size of our feature set to reduce cases, other techniques (such as statistical sampling) can the size of the training and testing datasets. Selecting be used so that samples are logged to allow statistics ag- feature vectors happens at setup time, even before the gregation with enough accuracy; we do not handle it in first event has been logged. this work. Logging Windows. To help in debugging and analysis, We present CC-LOG, a new logging module in the CC-LOG logs the events around each detected anomaly Robot Operating System [5] that drastically reduces the (i.e., what happened just before and after the anomaly storage requirements for logging sensor data. At a high was detected). To implement this, CC-LOG uses a log- level, CC-LOG processes data from subscribed topics, ging window.

Load more