Classifying the Behaviors of Boid Swarms

Zihao Liu, Ryan McArdle, Gianni Orlando
Institute for Artificial Intelligence, University of Georgia, Athens, Georgia, USA
[email protected], [email protected], [email protected]
9 December 2020

I. ABSTRACT

First presented by Craig Reynolds in 1987, a boid is a class of bird-like computational objects intended to simulate the flocking behavior that can be found in a number of animals. While the concept of the boid has been well established and implemented for some time now in order to generate simulations of flocking swarms, the project of classifying swarms of animals and identifying when they are exhibiting certain behaviors is a newer one, without as much success as of yet. We utilize the Swarm Behavior Data Set uploaded by researchers at the University of New South Wales to the UCI Machine Learning Repository in order to train a classifier to recognize swarm behaviors labeled as Aligned, Flocking, and Grouped, with the hope that classification of boid behavior will translate nicely into the classification of their counterparts within the animal kingdom. We explore logistic regression, random forests, support vector machines, and neural networks. We find that classifying the Aligned and Grouped behaviors is rather trivial, and we narrow our focus onto the single-label classification problem with respect to Flocking. Noting issues of generalizability along the way, we transform the data set in such a way that models trained on the set should more readily and reliably be able to classify other examples of swarm behavior. We find the greatest success with the neural network approach trained on our transformed data set, obtaining a maximum accuracy of 88%. We recognize that this seems to be a theoretical ceiling for the accuracy on this data set due to the nature of its creation, and we suggest that the problem with classifying Flocking may be an issue of proper collection of training data rather than optimization of an effective model.

II. INTRODUCTION

First presented by Craig Reynolds in 1987, a boid is a class of bird-like computational objects intended to simulate the flocking behavior that can be found in a number of animals such as birds, fish, and humans [1]. The distributed behavioral model means that each instance of the boid class within the totality, which we will refer to as a swarm in order to avoid confusion of terms, determines its behavior individually at each timestep. Each boid acts only upon its immediate surroundings and neighbors in order to achieve a number of goals. To achieve separation, the boid averages a repulsive force that is calculated against each of its neighbors. To achieve alignment, the boid calculates a force which will adjust its velocity towards the average velocity of its neighbors. To achieve cohesion, the boid calculates a force that moves it towards the centroid, or average position, of its neighbors. The behavior which emerges from an appropriate balance of these forces allows a swarm of boids to simulate natural flocking behavior in a tractable way, allowing one to define behavior over the entire class rather than for each instance. The boid definition is also reliable in that it generates behaviors that other distributive methods fail to achieve, such as allowing a swarm to split to avoid colliding with an obstacle and recombine once it has passed the obstacle.
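To make these three rules concrete, the following minimal sketch (our own Python illustration, not code from Reynolds or from the data set's simulator; the radii and the equal weighting of the forces are assumptions) computes the three steering forces for a single boid, given the swarm's positions and velocities as (n, 2) arrays.

```python
import numpy as np

def steering_forces(pos, vel, i, r_align_coh=50.0, r_sep=5.0):
    """Separation, alignment, and cohesion forces for boid i.

    pos, vel: (n, 2) arrays of positions and velocities.
    The radii are hypothetical; the data set's simulator uses its own values.
    """
    offsets = pos - pos[i]                    # displacement from boid i to every boid
    dist = np.linalg.norm(offsets, axis=1)
    near = (dist > 0) & (dist < r_align_coh)  # neighbors for alignment/cohesion
    too_close = (dist > 0) & (dist < r_sep)   # neighbors for separation

    # Separation: average repulsive force pointing away from each close neighbor.
    separation = -offsets[too_close].mean(axis=0) if too_close.any() else np.zeros(2)
    # Alignment: steer toward the average velocity of the neighbors.
    alignment = vel[near].mean(axis=0) - vel[i] if near.any() else np.zeros(2)
    # Cohesion: steer toward the centroid (average position) of the neighbors.
    cohesion = pos[near].mean(axis=0) - pos[i] if near.any() else np.zeros(2)
    return separation, alignment, cohesion
```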
While boids have been well established for some time now as a means to simulate flocking swarms, the classification of swarms of animals to identify when they are or are not flocking is a newer task without as much success as of yet. Although the community has long understood that simulation of flocking is achieved through the alignment and cohesion of individuals, little work has been done to distinguish these characteristics in observation [2]. Much of the modern analytical work concerning coordinated animal movement focuses on dissecting individual roles, such as the relations of individuals to influential neighbors, rather than the exhibition of macroscopic behavior [3]. Researchers at the University of Tokyo have recently touched on the difficulty of classifying flocking behavior, especially in large groups, as flocks tend to “generate and collapse spontaneously,” allowing for various modes of flocking [4].

The goal of this project is to develop a classifier which is reliably successful at classifying three different behaviors of swarms of boids, formalized in our data set as ‘Aligned,’ ‘Grouped,’ and ‘Flocking,’ with the hope that classification of these boid behaviors will translate nicely into the classification of their counterparts within the animal kingdom. While these classifications come from a survey with many participants, discussed further in section III, one can roughly understand ‘Aligned’ to describe boids with matching velocity, ‘Grouped’ to describe boids collected together spatially, and ‘Flocking’ to describe some more complex concept that may not be as readily summarized, but which implies that the boid-swarm is behaving as one might expect a flock of natural animals to behave.

In the sections that follow, we discuss the data set as provided to us and explore initial models including logistic regression and random forests. Finding reasonable performance and identifying a significant concern with the generalizability of the models, we focus our classification problem onto the ‘Flocking’ target label and perform a statistical transformation on the data set. This transformation reduces the overall size of the data and models while also allowing us to represent the problem in a way that allows other swarms to be classified by our models as well. We explore the use of support vector machines and neural networks on this reduced data set, finding similar success and identifying an apparent limit to the accuracy of classification on the ‘Flocking’ behavior provided by this data set.
III. DATA SET

In order to train our classifier, we utilize the Swarm Behavior Data Set uploaded by researchers at the University of New South Wales to the UCI Machine Learning Repository [5], [6]. The data set provides three different .csv files which contain the same 24,016 temporal instances of boid-swarms, with each .csv classifying for one of the three behaviors in the associated class label. These class labels were determined by a publicly available survey in which participants use three sliders to rate the extent to which they consider a short sample of a swarm of boids to be flocking, grouped, or aligned [7]. Our understanding is that these slider values are averaged over all participants, and the results are rounded to either 0 or 1 for negative and positive labels, respectively.

There are 2,400 attributes for each instance; 12 attributes each for the 200 boids in the swarm. For each boid, the attributes include a position vector, a velocity vector, an alignment force vector, a separation force vector, a cohesion force vector, the number of nearby boids within the alignment/cohesion radius, and the number of nearby boids within the separation radius (each vector is split into x and y dimensions and thus accounts for two attributes). Unfortunately, these instances are sorted by the x-value of the first boid, and as such, all temporal information relating the instances or their ordering has been lost. While this is helpful for being able to feed arbitrary instances of swarms to a classifier for training, it limits our ability to observe the behavior of the swarms linearly in time or to implement any time-series classification methods.

Here we further discuss some statistics of the data set. In this section, the presented statistics consider the associated attribute over every instance and every boid, e.g., the mean number of boids within the separation radius, 1.79, shown in Table I, is the average of roughly 4,800,000 cells (1 attribute × 200 boids × ∼24,000 instances).
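As a sketch of how one of these files can be loaded and unpacked (the filename and the per-boid column-naming convention, inferred from the ‘xC142’-style attribute name discussed in section VI, are assumptions to be checked against the actual file header):

```python
import numpy as np
import pandas as pd

# Assumed filename; each of the three files carries a different class label.
df = pd.read_csv("Swarm_Behaviour_Grouped.csv")
X, y = df.iloc[:, :-1], df.iloc[:, -1]        # 2,400 attributes and the 0/1 label

# Reassemble the 12 attributes of one boid, here boid 142; the attribute
# prefixes below are assumptions based on names like 'xC142'.
b = 142
prefixes = ["x", "y", "xVel", "yVel", "xA", "yA", "xS", "yS", "xC", "yC", "nAC", "nS"]
boid = {p: X[f"{p}{b}"] for p in prefixes}

# The speed statistic reported in Table I is implicit in the velocity vectors.
speed = np.hypot(boid["xVel"], boid["yVel"])
```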
A. Scalar Values

We first consider the scalar attributes associated with each boid. To paint a clearer picture of the data and obtain a better understanding of the boids’ behaviors, we calculate a speed attribute. This attribute is unused in the training of the models, as the speed is inherent within the velocity vectors, but it is presented here for the sake of human understanding. We see that the speed of the boids ranges from ∼2 to 20 spatial units per timestep. The mean of 9.16 and standard deviation of 3.55, along with a median not far from the mean, suggest that the speed of the boids is near normally distributed over this range without any significant outliers. This is what one might expect for the speed of individuals and serves as a sanity check that the data set represents at least relatively normative swarm behavior.

Scalar Value Statistics
Attribute          Min.   Max.     Mean    Std. Dev.   Median
Speed              1.99   20.00    9.16    3.55        8.74
Num. Align./Coh.   0.00   171.00   23.68   31.38       9.00
Num. Sep.          0.00   108.00   1.79    6.43        0.00

TABLE I: Statistics describing the scalar quantities included in the data. Speed has been calculated for human evaluation of the data set and is not used in the training data.

We next consider the number of boids within the radius of alignment and cohesion, which is involved in determining the magnitude of the forces which drive the boids to align and group. We see here a much greater standard deviation of 31.38 among the data points, with the median value 9.00 being far less than the mean of 23.68. This suggests that there are a number of outlier instances for which these numbers are abnormally high, which we will indeed see in subsection III-C, where we plot sample instances of swarm behaviors in which boids aggressively congregate on a small handful of points.

The number of boids within the radius of separation shows a somewhat similar distribution. This number is involved in determining the force that prevents the boids from colliding into one another and attempting to occupy the same space. We see that the mean value 1.79 is quite near the minimum of 0.00, but the standard deviation of 6.43 suggests that there is a skew present from some instances that are much greater than the mean, especially coupled with the recognition that the median value is 0.00. This also suggests the instances of aggressive congregation referenced above and discussed later.

B. Vector Values

We now consider the vector data, for which each vector is presented as both an x and a y component, both in the data set and in our analysis. We first note that the values of a given statistic for each axis of an attribute are quite comparable and therefore have little dependence on the axis, i.e., the boids’ motion and relevant forces both exhibit little dimensional preference. We also note that the mean and median in each axis of a given attribute are relatively near zero, further supporting that there is no preference for any direction within a given dimension.

We now focus on the force vectors of alignment, separation, and cohesion in each dimension. We address first the mean and median of each of these attributes.

Vector Value Statistics
Attribute       Min.       Max.      Mean    Std. Dev.  Median
Velocity: x     −19.95     19.83     −1.41   5.94       −1.13
Velocity: y     −19.99     19.78     0.22    7.69       −0.25
Alignment: x    −1.07      1.10      −0.16   0.39       0.00
Alignment: y    −1.12      1.10      0.13    0.58       0.00
Separation: x   −1.04e06   3.69e06   12.49   6.01e03    0.00
Separation: y   −1.05e06   4.19e06   20.14   7.69e03    0.00
Cohesion: x     −2.68      2.68      −0.05   0.55       0.00
Cohesion: y     −2.68      2.68      0.09    0.61       0.00

TABLE II: Statistics describing the vector quantities included in the data.

Given a common Newtonian understanding of forces, each force on some boid should be matched with an equal yet opposite force on some other boid in each of its dimensions. We see this behavior in the median values, suggesting that there are (at least roughly, accounting for possible error) just as many positive values as there are negative, with the median value being some instance for which there is no force of the given type affecting the boid. This pattern is broken when we consider the mean values, as the same argument would lead one to expect a mean of 0.00 for each dimension. However, we propose that the relatively small deviations from 0.00 in these calculations are a result of rounding errors accumulating over the 4.8 million cells, which would affect mean values but not median values.

We next compare the three kinds of force vectors and immediately address the large disparity in the ranges of values. While the magnitudes of the alignment and cohesion forces are on the order of 10^0, the separation vectors can be on the order of 10^6. This is a very significant difference, and we consider a couple of explanations for this disparity.

The first explanation considers that the separation vector is given much more weight than the other forces, which could be reasonable for modeling the importance of avoiding collisions in flocking animals. Given that there is only one force responsible for keeping the boids away from each other, as opposed to the two forces promoting gathering, one would expect that an increase in the scale of the separation force would be necessary. However, this does not seem to account for the sheer magnitude of this disparity.

The other explanation is that there are simply massive outliers in the data for the separation vectors.
There are a number of instances in the data in which the boids are aggressively congregating on local cohesion target points, as though the weights with which the separation and perhaps alignment vectors influence the boids’ behavior have been minimized. The result of this weighting would be boids that are quite accepting of getting very near to each other and calculating massive (though ultimately inconsequential) values for their repulsive separation vectors. A single-instance sample of this behavior can be seen in Fig. 1. We suspect that this explanation gets to the root of the issue, and we consider these effects later in the work when we implement our transformation of the data set.

We otherwise note that the alignment and cohesion vectors seem relatively well behaved, with a comparable sort of normal distribution over their range of values and no preference for any dimension or direction, except that the magnitude of the cohesion vector is roughly twice that (in range alone, not mean) of the alignment vector. However, it is possible that this difference in magnitude may be corrected for in the weight given to each of these vectors by the boid simulation when considering their effect on the acceleration of the boids.

C. Class Values

We now turn our attention to the classes presented in the data. As there are three different classes provided by the data, we will first consider the data from the perspective of three different classification problems and then from the perspective of a multilabeled classification problem. In order to obtain a more intuitive understanding of the class labels and what sort of behavior they are referencing, the reader is directed towards Figs. 1-4 for a sampling of boid-swarm behaviors.

Fig. 1: A sample of a swarm labeled as Not Aligned, Flocking, and Grouped. The boids in this instance are congregating onto 3 distinct points as if through gravitational attraction. They seem to be disregarding the alignment and separation goals.

Fig. 2: A sample of a swarm labeled as Aligned, Not Flocking, and Grouped. These boids are flying in a single file pattern. They seem to be ignoring the separation goal while focusing most heavily on alignment.

Fig. 3: A sample of a swarm labeled as Aligned, Flocking, and Not Grouped. These boids are flying in the same direction, but are relatively spread out over the space.

Fig. 4: A sample of a swarm labeled as Aligned, Flocking, and Grouped. These boids are flying in the same direction while staying tightly compacted in one region of the y-axis.

Single Class Statistics
Class      Negative   Positive   Percent Positive
Aligned    16511      7505       31.25%
Flocking   12008      12008      50.00%
Grouped    15010      9006       37.50%

TABLE III: Statistics describing the distribution of classes included in the data. Most instances are classified as not representing the behavior of concern.

The percentages for each class are based upon the 24,016 instances of data available. We note that, for each class, a majority (or exactly half, in the case of flocking) of instances do not represent a positive instance of the class. This seems reasonable, as one should not expect most instances to represent the target emergent behaviors.

This information is useful, but when we consider the multilabeled classification problem, we are able to get a much richer understanding of the instances contained in the data.

Multi-Label Statistics
Class Triple (A, F, G)   Number of Instances   Percent of Instances
(0,0,0)                  10316                 42.95%
(0,0,1)                  264                   01.10%
(0,1,0)                  3193                  13.30%
(0,1,1)                  2738                  11.40%
(1,0,0)                  0                     00.00%
(1,0,1)                  1428                  05.95%
(1,1,0)                  1501                  06.25%
(1,1,1)                  5476                  22.80%

TABLE IV: Statistics describing the classes included in the data when considered as a 3-label problem, allowing for a deeper understanding of the relationships between classes. Note in line 5 that ‘Aligned’ is never classified alone; another target, and often both, are always positively classified as well (lines 5-8). Similarly, ‘Flocking’ is often associated with the other targets (lines 3, 4, 7, and 8), but noting line 3, it is not wholly dependent upon them.

We note that 42.95% of the instances exhibit none of the behaviors for which we classify, while 22.80% exhibit all three. Notably, any instance for which the swarm is aligned also exhibits at least one of the other behaviors, and roughly 65% of the time it exhibits both of the other behaviors. However, none of the available relationships between the classes seem definite enough that classification of any one of the classes can be made based upon classification of the other classes.

IV. INITIAL MODELING

For our initial exploration of modeling our data set, we implement both logistic regression and random forest classification methods.

A. Logistic Regression

In the logistic regression case, we approach the problem as a single-label classification problem, focusing on the class of grouped behavior. After initial experimentation with a logistic regression model, we find that this method very easily overfits the data set, providing near perfect accuracy. Highly suspicious of this performance, we decide to reduce the number of training instances available to the model, and we settle upon a training set that is 5% of the size of the overall data, testing on the other 95%. In order to help mitigate the loss in generalizable accuracy that comes with this reduction in training data, we decide to undersample the most populated class, ‘Not Grouped,’ so that the classifier is trained on an equal number of positively and negatively classified instances. Following initial success with default parameters, we seek to improve performance through optimization of the regularization parameter and threshold cutoff value, finding the defaults to provide the preferred balance in performance.
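A sketch of this training procedure, assuming X and y hold the 2,400 attributes and the ‘Grouped’ labels from the corresponding .csv (any hyperparameters beyond those named above are scikit-learn defaults):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hold out 95% of the data, training on only 5%.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.05, stratify=y, random_state=0)

# Undersample the majority 'Not Grouped' class down to a 1:1 ratio.
pos = y_train[y_train == 1].index
neg = y_train[y_train == 0].sample(n=len(pos), random_state=0).index
idx = pos.union(neg)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train.loc[idx], y_train.loc[idx])
print(clf.score(X_test, y_test))
```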

B. Random Forest

We next perform an initial test using a decision tree model on the single-label classification problem, but we quickly find that an F1 score of 1.0 is achieved. As this is a relatively simple formulation of the problem, and these results provide no room for improving the model or gaining an understanding of the data and underlying behaviors, we expand both the problem and the employed model.

We therefore approach the classification as a multilabel classification problem, for which an instance of a swarm of boids is classified using the permutations of length-3 binary vectors in which each bit represents aligned, flocking, and grouped behavior, and we implement a random forest classifier. Because of the increased complexity of this problem, we use a more standard training set size of 80% of the data set and do not perform the undersampling. We perform two grid searches, the first seeking to optimize the number of estimators in the forest and the second to optimize the maximum depth and the minimum number of samples for splitting an internal node for the individual trees.
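The two grid searches can be sketched as follows, where Y is the (n, 3) array of Aligned/Flocking/Grouped labels; the candidate grids are illustrative rather than the exact values we swept:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.8, random_state=0)

# First search: number of trees (random forests handle multilabel Y natively).
search1 = GridSearchCV(RandomForestClassifier(random_state=0),
                       {"n_estimators": [10, 25, 50, 100]}, cv=3)
search1.fit(X_train, Y_train)

# Second search: tree depth and internal-split threshold, keeping the best forest size.
search2 = GridSearchCV(
    RandomForestClassifier(n_estimators=search1.best_params_["n_estimators"],
                           random_state=0),
    {"max_depth": [9, 13, 17], "min_samples_split": [2, 10, 50]}, cv=3)
search2.fit(X_train, Y_train)
print(search2.best_params_, search2.score(X_test, Y_test))  # subset accuracy
```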

V. INITIAL RESULTS

We now present the results for our chosen models and their performance on the classification problem.

A. Logistic Regression

We found our initial accuracy of 95% very promising and looked to improve the results further. We sought to test whether decreased regularization had any effect on the recall score, yet all values returned similar results, with a decrease in performance around values of 100. We decided to continue with a strong regularization value of 0.01 for the remaining tests. We further experimented by manipulating the threshold values of the classifier and comparing the resulting confusion matrices; however, due to a lack of significant improvement in accuracy (note Fig. 5), we decided upon a 0.5 threshold value. This resulted in a recall score of ∼0.99 and an F1 score of ∼0.96, and the confusion matrix can be seen in Fig. 6.

Fig. 5: The ROC curve for a logistic regression over the grouped data set. Noting the shape of the curve, the validation value of the classifier is not very sensitive to the cutoff value.

Fig. 6: The confusion matrix for a logistic regression over the grouped data set with a cutoff threshold of 0.5.
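The threshold experiment amounts to sweeping the cutoff applied to the predicted probability of the positive class, along the lines of the following sketch (clf, X_test, and y_test as in section IV-A; the cutoff values shown are illustrative):

```python
from sklearn.metrics import confusion_matrix, f1_score, recall_score

probs = clf.predict_proba(X_test)[:, 1]      # probability of 'Grouped'
for cutoff in (0.3, 0.5, 0.7):
    preds = (probs >= cutoff).astype(int)
    print(cutoff, recall_score(y_test, preds), f1_score(y_test, preds))
    print(confusion_matrix(y_test, preds))
```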

B. Random Forest

Our initial 3-label decision tree model resulted in a reasonable 86% accuracy score, with a weighted average F1 score of 0.94 and a weighted average recall of 0.94. We felt these results could easily be improved by expanding to the random forest ensemble method and employing a number of decision trees. An initial random forest with default parameters returned a slightly improved 88% accuracy score, as expected. Our first grid search found that 50 trees would be the optimal size of the forest, which results in a significant increase in accuracy to 98.6%. Our second grid search returned a less significant improvement in performance; however, we found that the optimal values for the depth and split parameters were 13 and 50, respectively, which resulted in an accuracy of 98.8%.

Fig. 7: The confusion matrix for the random forest on the alignment class. Note the perfect performance, without any false positives or negatives.

Fig. 8: The confusion matrix for the random forest on the flocking class. Note the presence of over 600 misclassified instances.

Fig. 9: The confusion matrix for the random forest on the grouped class. Note the perfect performance, without any false positives or negatives.

VI. INITIAL DISCUSSION

We found that our random forest classifier outperformed the logistic regression classifier by a significant margin, even when approaching the problem through a more complex formulation. We attribute this improved performance to the increased emphasis on the influence of the features in the random forest model, where the logistic regression creates a simple linear relationship. It is interesting to note that scaling the data before passing it into the models for training had no impact on performance for either model. The significant range in the values used, particularly in the positional and separation vector data, seemed to have little ability to confuse either model.

See Figs. 7-9 for the confusion matrices for each label, which reveal significant information about the problem and our classifier’s performance. We note that in the cases of both aligned and grouped behavior, our classifier was able to achieve 100% success in classification, and our classifier only erred in classifying flocking behavior. This suggests that aligned and grouped behaviors are in a sense less complex and easier to model than flocking. We propose that alignment can easily be classified for by considering the difference in velocity vectors between the boids, or similarly the magnitude of the alignment force vectors. We also propose that grouping can easily be classified using the number of nearby boids or the magnitude of the cohesion force vectors. We note that there does not seem to be a similarly simple way to summarize flocking behavior as a relationship between a small handful of attributes.

A comment on the attributes found to be most influential for our random forest, plotted in Fig. 10, seems in order. A single, seemingly arbitrary attribute was identified as roughly 4 times more important than the next most important attribute for classification, with the rest trailing at a small percentage of the importance. This attribute, ‘xC142,’ describes boid-142’s cohesion vector along the x-axis. The particularity and seeming arbitrariness of this attribute’s importance appears to be a quirk of this data set and is therefore highly limiting for generalization to other sets. Consequently, it may be important to reduce the dimensionality of the data set so that classification is done on aggregate, macroscopic data (means of vector values and numbers of nearby neighbors, standard deviations of these values, etc.), rather than based on the data points of each individual boid.

Fig. 10: The top 10 most important features for the random forest model trained on the original data. A seemingly arbitrary boid’s x-axis cohesion force is an order of magnitude more important than most other features for classification via decision tree.

Because of the discussed concerns regarding overreliance on particular attributes and the limitations to generalizing the models, the remainder of the work seeks to further develop potential models with concerns of generalizability at the forefront. We begin by manipulating the data set in such a way that these individual attributes about particular boids are removed, while trying to maintain important information about the behavior of the flock as a whole. We then use this data to train more complex models in the hopes of improving our performance, particularly on the flocking target class.

VII. DATA SET REDUCTION

In order to reduce the complexity of our models and improve generalizability, we perform a dimensionality reduction on our data set. Rather than applying recursive feature elimination or a similar feature selection method to identify which attributes are most useful for classification, we instead recognize that our data as presented is fundamentally not well suited for generalizability. The data set’s reliance on information about individual boids allows models to overfit on the behaviors of particular boids, which leads to a reliance on attributes that speak less to an understanding of the flocking behavior and more to an exploitation of the peculiarities of the training data set. This exploitative tendency is amplified if care is not taken during training to limit the number of testing instances on which the model is fit. Further, the presence of an extreme value for an attribute in a test classification instance could dramatically affect the classification if the model is overly reliant on that attribute.

In order to address these issues, we calculate statistical information about the swarm for each time-step instance, rather than retaining the information about each individual boid. We then train our classifiers on this aggregate statistical data from the perspective that it is the behavior of the swarm as a unit that should be classified upon, rather than the totality of the behaviors of individual boids.

A. The Reduced Data

For each instance of our data set, we take each of the kinds of attributes, e.g. the number of boids within the separation radius, for the 200 boids and calculate five statistics over the swarm for this attribute: minimum, maximum, mean, standard deviation, and median. We then save these calculated statistics as our new instance. As such, we reduce the number of attributes for each instance from 2,400 (12 attributes times 200 boids) down to only 60 (12 attributes times 5 statistics).

Original and Unreduced Swarm Statistics
Attribute     Min.       Max.      Mean     Std. Dev.  Median
Speed         1.99       20.00     9.16     3.55       8.74
Pos.: x       −1.42e3    1.42e3    −55.95   799.57     −106.83
Pos.: y       −1.02e3    1.02e3    −6.44    561.49     −29.77
Vel.: x       −19.95     19.83     −1.41    5.94       −1.13
Vel.: y       −19.99     19.78     0.22     7.69       −0.25
Align.: x     −1.07      1.10      −0.16    0.39       0.00
Align.: y     −1.12      1.10      0.13     0.58       0.00
Sep.: x       −1.04e06   3.69e06   12.49    6.01e03    0.00
Sep.: y       −1.05e06   4.19e06   20.14    7.69e03    0.00
Coh.: x       −2.68      2.68      −0.05    0.55       0.00
Coh.: y       −2.68      2.68      0.09     0.61       0.00
# Ali./Coh.   0.00       171.00    23.68    31.38      9.00
# Sep.        0.00       108.00    1.79     6.43       0.00

TABLE V: Statistics describing the original representation of the data, in which each boid is explicitly represented at each timestep. This table combines the data from Tables I and II.

We present in Table VI a sample of statistics calculated over our newly reduced data to give an idea of the dimensionality reduction and how this changes the distribution of the data. The statistics presented refer to the mean value columns of the attributes listed along the left axis; similar tables could be produced for each of the other four statistic functions used to reduce the data.

Statistics Describing the Mean Values of Each Attribute
Attribute     Min.      Max.     Mean     Std. Dev.  Median
Speed         0.00      11.71    5.82     2.80       6.23
Pos.: x       −639.50   925.49   −55.95   197.92     −27.77
Pos.: y       −573.79   525.04   −6.44    116.28     −3.83
Vel.: x       −9.13     6.67     −1.41    2.95       −0.78
Vel.: y       −8.01     10.86    0.22     5.57       −1.01
Align.: x     −0.98     0.90     −0.12    0.31       0.00
Align.: y     −0.98     1.04     0.13     0.53       0.00
Sep.: x       −5.91e3   1.85e4   12.49    403.32     0.00
Sep.: y       −5.19e3   2.13e4   20.14    544.53     0.00
Coh.: x       −0.71     0.26     −0.05    0.13       0.00
Coh.: y       −0.41     0.95     0.09     0.28       0.00
# Ali./Coh.   0.41      121.68   23.68    26.86      12.47
# Sep.        0.00      50.07    1.79     5.05       0.22

TABLE VI: A sample of statistics describing the reduced data about the swarm. Presented (along the top) are various statistics calculated over all time-steps for the listed attributes (along the side), averaged (mean) over all boids in the swarm.

Compared to the original statistics, reproduced in Table V, we first note that the mean values in Table VI are the same, as would be expected from the operations performed, i.e. taking a mean of means. Beyond this, there are notable changes to the distributions of the data. The ranges presented by the minimum and maximum values are significantly reduced, along with the standard deviations. We note that by averaging over the entire flock, we are unable to reach the same sort of extreme values that could be achieved by just a single boid. This reduces the influence of extreme outliers, which were noted as a potentially concerning aspect of the original data set that could also negatively impact generalizability. However, the influence of these outliers is not fully removed, as they will still have an impact on the standard deviation, minimum, and maximum values.

While the application of this transformation to the data does fundamentally forget information about the swarm, it is still implicitly able to supply a robust description of the behavior of the swarm, including its average direction and speed, how densely packed it is, etc. We have maintained all of the information that we expect to be relevant in classifying boid flight behaviors, but have presented it in a more compact and general way.

Overall, supposing that similar success is achieved by training on this reduced data set, we suspect this to be a very desirable transformation. The dimensionality has been reduced 40-fold, and the data itself is less sensitive to outliers and is ignorant of the explicit behaviors of individual boids beyond their effects on the statistical analysis. The data therefore now focuses on the behavior of the swarm as a whole, which is the actual target behavior.

We can also see in Fig. 11 that the various important features in this data, as extracted from a naively trained random forest model, have much more comparable importance values. Unlike with the earlier data (see again Fig. 10), the new model does not rely too explicitly on any one feature. This is very promising for achieving a reliable and generalizable, rather than exploitative, model.

Fig. 11: The top 10 most important features for a naive random forest model trained on the transformed data. Feature importances are largely within one order of magnitude, and the model therefore is not exploiting any particular features.
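The transformation itself is a small amount of code. The sketch below assumes the 2,400 columns can be reshaped so that each row groups the 12 attributes of one boid; the real column ordering of the .csv should be checked before relying on the reshape.

```python
import numpy as np

STATS = (np.min, np.max, np.mean, np.std, np.median)

def reduce_instance(row):
    """Collapse one 2,400-value instance (200 boids x 12 attributes)
    into 60 swarm-level statistics."""
    per_boid = np.asarray(row, dtype=float).reshape(200, 12)
    return np.concatenate([[stat(per_boid[:, j]) for stat in STATS]
                           for j in range(12)])

# X is the (24016, 2400) attribute matrix; X_reduced has shape (24016, 60).
X_reduced = np.apply_along_axis(reduce_instance, 1, X.to_numpy())
```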

VIII. FURTHER MODELING

For this project, we expand the tested models to explore both a neural network and a Support Vector Machine (SVM) approach to classification.

A. Neural Network

We implement two different neural networks and train one on our original data set and the other on our reduced data set in order to judge the efficacy of the data reduction.

We initially experiment with the possibility of applying convolutional and dropout layers to the original data set in search of signs of improvement. Our expectation is that convolution could help the network simplify inputs through a pooling method in order to better process the large number of inputs, and that dropout layers would help to limit overfitting on a small number of exploitable attributes. While our investigation was unsuccessful, further exploration guided by a stronger intuition about appropriate neural architectures may prove more fruitful. Following our lack of success, we decide upon a more conventional funnel-shaped dense neural network structure. Rectified linear activation and Adam optimization are used throughout due to the generalizability and power of both over short testing time-frames.

The model we decide upon for the original data set has an initial dense layer of 256 nodes to process the 2,400 original attributes. This input layer is then funneled through two hidden dense layers of size 128 and 64, which then determine the flocking classification encoded into a single-node output layer.
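A sketch of this architecture in Keras (our reconstruction; the loss and any training details beyond ReLU and Adam are assumptions), with the reduced-data variant described in section IX built by the same helper:

```python
from tensorflow import keras

def funnel_net(n_inputs, hidden):
    """Funnel-shaped dense network with ReLU activations and a
    single sigmoid output node for the binary flocking label."""
    layers = [keras.layers.Input(shape=(n_inputs,))]
    layers += [keras.layers.Dense(width, activation="relu") for width in hidden]
    layers += [keras.layers.Dense(1, activation="sigmoid")]
    model = keras.Sequential(layers)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

original_net = funnel_net(2400, (256, 128, 64))  # original attribute data
reduced_net = funnel_net(60, (124, 64))          # reduced data (see section IX)
```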

B. Support Vector Machine

We next test a Support Vector Machine classification technique on the reduced data set, expecting that the high dimensionality of the original training data would excessively complicate the modeling process. We perform a handful of grid searches in order to isolate ideal parameters for training, and we compare the method to a handful of other comparable classification methods without performing any optimization on the others.

IX. FURTHER RESULTS

We now present the results of our experiments and models.

A. Neural Network

Depicted in Fig. 12, the fairly simple architecture trained on the original data set provided a rather unstable and steadily decreasing accuracy over time during the training process. This seems to imply that the network was not efficiently learning from the training data and extracting relevant feature relations among the boids’ qualities, and it even seems to have further confused itself the longer that it attempted to learn from the data.

Fig. 12: The accuracy over time plot for a neural network trained over the original data set. Notice the chaotic behavior and overall downward trend.

Fig. 13: The accuracy over time plot for a neural network trained over our reduced data set. Notice the steady rise in both training and validation accuracy over time.

For the reduced data set, we apply a very similar network structure. This model begins with only 124 nodes, which is quite sufficient to process the 60 input attributes, with a single hidden layer of size 64 feeding into the single-node output layer for classification.

We see in Fig. 13 a dramatic improvement in performance. Not only has the accuracy over time plot stabilized, but both the training and validation accuracy exhibit steady growth over time. With validation and training accuracies reaching 88% and a healthy and consistent curvature to the plot, it does appear that the feature reduction has a significant effect on the ability to accurately model the data when using a neural network model.
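Accuracy-over-time plots such as Figs. 12 and 13 can be produced from the Keras training history, roughly as follows (X_red_train and y_train are hypothetical names for the reduced training split; the epoch count and batch size are assumptions):

```python
import matplotlib.pyplot as plt

history = reduced_net.fit(X_red_train, y_train, validation_split=0.2,
                          epochs=100, batch_size=64, verbose=0)
plt.plot(history.history["accuracy"], label="training")
plt.plot(history.history["val_accuracy"], label="validation")
plt.xlabel("epoch"); plt.ylabel("accuracy"); plt.legend(); plt.show()
```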

B. Support Vector Machine

Following our grid search, we settle upon largely default parameters, excepting a regularization value of 1000, which provides a model accuracy around 88%. While this is a fairly satisfactory accuracy output, it is comparable to that of the other models that have been evaluated, and further evaluation shows this method lags behind the others.

Fig. 14: An ROC curve representing the SVM model found by our grid search. This model shows a higher sensitivity to the cutoff value than our logistic regression curve (Fig. 5).

In Fig. 15, we compare the ROC curves of the optimized SVM against a number of naively trained and unoptimized models. Note that the SVM has a largely linear curve and a significantly lower area under the curve compared to any of the other evaluated methods. Although it was able to achieve a similar classification accuracy following optimization, it is clearly not a method that is particularly well suited to the classification problem, and its performance speaks to success through exploitation of the data set rather than a well-developed understanding.

Fig. 15: An ROC curve comparing the performance of a range of classifiers. Our optimized SVM is colored in dark blue, with the lowest AUC. The ensemble method showed the best performance of the lot.
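The grid search and ROC comparison can be sketched as below (the candidate grid is illustrative; only the regularization value C = 1000 is taken from our results above):

```python
from sklearn.metrics import RocCurveDisplay
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# X_red_train/X_red_test are the 60-feature splits; y holds the flocking label.
search = GridSearchCV(SVC(), {"C": [1, 10, 100, 1000],
                              "kernel": ["rbf", "linear"]}, cv=3)
search.fit(X_red_train, y_train)
print(search.best_params_, search.score(X_red_test, y_test))

# ROC curve for the optimized model, as in Figs. 14 and 15.
RocCurveDisplay.from_estimator(search.best_estimator_, X_red_test, y_test)
```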

X. FINAL DISCUSSION

We have seen that even with a reduction in the dimensionality of the data set by roughly 97%, we are able to achieve quite promising performance classifying for the flocking behavior. While this section of the work has not focused on the aligned and grouped classes explored earlier, the triviality of that problem is expected to continue with the reduced data set.

It should be noted of our neural networks that these are only initial results using simple architectures. More testing could be performed in an attempt to build more complex models that could successfully represent the original data set and possibly optimize the model for the reduced data set. Further work could also focus on preventing overfitting, as the validation accuracy for the reduced model is not as stable as the training accuracy, and the generalizability of our models has been our primary concern throughout this process.

It is important to note that regardless of the method or data set used, we often find our models optimizing around an 88% success rate when classifying on the flocking label. We referenced earlier wisdom from the literature that the classification of flocking behavior is known to be difficult, and throughout our work we have been surprised at the immediate and reasonably consistent success that we have had in producing viable results. However, noticing this upper limit on the success of classifying for the flocking behavior, we suggest that the difficulty in the classification problem may not originate so much in the problem itself, but rather in the creation of a data set that can accurately train classifiers on the distinction between flocking and not flocking behavior.

The data set was produced by an openly available survey in which anyone who is interested in submitting their opinions is able to do so and ultimately influence the classification with which each of these instances is labeled. This method inherently introduces significant complications and concerns into the data set. Not only does it require classifying on a somewhat arbitrary judgment of behavior, there are also a number of people’s opinions that must be reconciled into the single classification label, and these may perhaps even include intentionally misleading survey responses. We also do not have access to the number of participants, especially unique participants, that submitted answers for the survey. It is quite possible that the data set has been constructed on too small of a population with differing views about what it means to be flocking, providing a difficult-to-overcome upper limit on classification due to this disagreement.

XI. CONCLUSIONS

From our experimentation, we have found that a number of different models may be viable options for classifying on the flocking behavior provided by this data set, but also that random forest and well-designed neural network models are likely the most promising. The failings of the SVM could be in part due to the manner in which the data set was reduced, but the aggregation of attributes was found to be critical in the case of the neural network, since training upon the original data set was found to be too unstable to be acceptable. Once the bulky original data was statistically aggregated, the stability of training was vastly improved, leading to a superior network with a reduction in training parameters and an increase in accuracy.

While multiple approaches seemed to be viable for modeling the data in this work, our exploration did seem to confirm the existence of an upper bound on the accuracy of flocking classification. This bound is present in the models explored prior to the statistical transformation of the data, notably the random forest classifier, which achieved a 98.8% success rate on all classification labels but only 88% on flocking, so the bound does not seem to be a consequence of the data reduction but rather something more inherent to the data set itself.

Perhaps work could be done on identifying those instances which are incorrectly classified, determining whether it is consistently the same instances that err and whether they share some commonality. If they could be identified as outlier instances, perhaps they could be removed from the training set, and the data could be pruned such that it more accurately represents a distinction between flocking and not flocking behavior.

Further exploration into viable statistical transformations of the original data could also prove fruitful. We have not performed a feature elimination process on our statistical features, but rather chose them as a first approximation of valuable statistics for describing the behavior of a swarm. Eliminating the less relevant features from our transformed data set could help isolate the kinds of features that are effective in this problem and provide information about what additional statistical aggregations could be viable.

In future work, it may be valuable to test the models trained on this data set against other data sets from other boid simulations and perform a human analysis to determine how accurately they classify not only new data but perhaps a new kind of boid behavior that they have not seen before, e.g., boids with a different weighting of their alignment, separation, and cohesion forces.

The development of a similar data set while keeping in mind the statistical concerns raised by the survey method could be valuable as well. For instance, having the survey be taken by a controlled study group who are given certain definitions or descriptions of the behaviors which the participants are to classify could return more reliable results than allowing the survey to be openly taken on the internet with no real consensus as to what is meant by each of the behavior labels.

Beyond improving the reliability of the data set, we are also interested in the possibility that this two-dimensional data set could somehow be used to train a three-dimensional classification model which is able to classify real bird flocking instances. It seems within reason that the two-dimensional data could be interpreted as projections of three-dimensional data, each missing one dimension, and that this perspective could be used to extrapolate a three-dimensional model. However, the construction of a new three-dimensional data set through a new survey seems a more viable and reliable path forward for increasing the spatial dimensionality.

REFERENCES

[1] C. W. Reynolds, “Flocks, herds, and schools: A distributed behavioral model,” 1987.
[2] T. Oboshi, S. Kato, A. Mutoh, and H. Itoh, “A simulation study on the form of fish schooling for escape from predator,” 01 2003.
[3] L. Jiang, L. Giuggioli, A. Perna, R. Escobedo, V. Lecheval, C. Sire, Z. Han, and G. Theraulaz, “Identifying influential neighbors in animal flocking,” PLOS Computational Biology, vol. 13, pp. 1–32, 11 2017.
[4] N. Maruyama, D. Saito, Y. Hashimoto, and T. Ikegami, “Dynamic organization of flocking behaviors in a large-scale boids model,” Journal of Computational Social Science, vol. 2, no. 1, pp. 77–84, 2019.
[5] D. Dua and C. Graff, “UCI machine learning repository,” 2017.
[6] S. Abpeikar, K. Kasmarik, M. Barlow, and M. Khan, “Human perception of swarming: data set,” 2020.
[7] S. Abpeikar, K. Kasmarik, M. Barlow, and M. Khan, “Human perception of swarming,” 2020.