DATA FARMING: METHODS FOR THE PRESENT, OPPORTUNITIES FOR THE FUTURE

Susan M. Sanchez

SEED Center for Data Farming, Operations Research Department, Naval Postgraduate School

ISIM 2017 Research Workshop, Durham, U.K.

Department of Defense Distribution Statement: Approved for public release; distribution is unlimited

Data Mining vs. Data Farming

• Miners seek valuable buried nuggets
  - Miners have no control over what’s there or how hard it is to separate it out
  - Data Mining seeks valuable information buried within massive amounts of data
• Farmers cultivate to maximize yield
  - Farmers manipulate the environment to their advantage: pest control, irrigation, fertilizer, etc.
  - Data Farming manipulates simulation models to advantage with designed experimentation

One way of thinking of big data: any data set that pushes against the limits of currently available analysis technology

Correlation ≠ Causation

“Wall Streeters have the fastest computers, most sophisticated software and biggest databases money can buy, and yet many failed to see the 2008 crash coming. The hope that Big Data will make economics and other social sciences truly scientific – that is, precise and predictive – remains, for now, a fantasy.”¹

“Harnessing vast quantities of data rather than a small portion, and privileging more data of less exactitude, opens the door to new ways of understanding. It leads society to abandon its time-honored preference for causality, and in many instances tap the benefits of correlation.”²

Simulators don’t have to choose!

Correlation = 0.947

¹ Horgan, J., 2014. “So Far, Big Data is Small Potatoes”
² Mayer-Schonberger, V. and K. Cukier, 2013. “Big Data: A Revolution That Will Transform How We Live, Work, and Think”
³ Vigen, T., 2014. “Spurious Correlations,” www.tylervigen.com

Large-scale computational experiments are transformative

“Petaflop machines like Roadrunner have the potential to fundamentally alter science and engineering…[allowing scientists to] perform experiments that would previously have been impractical.” The New York Times, June 9, 2008

Experimentation is hard: “2^100 is forever” (Maj Gen Jasper Welch)

Even with today’s most powerful computers, brute force exploration of 100 variables at 2 levels for a simulation that runs in one second would take many times the age of the universe… so we need to be smart!

Moore’s Law is not enough!

The “curse of dimensionality” cannot be solved by hardware alone.
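The brute-force claim can be checked with a few lines of arithmetic. The sketch below is illustrative only: the age of the universe is taken as roughly 13.8 billion years, and the "petaflop" case makes the deliberately generous (and unrealistic) assumption that one whole simulation run costs a single machine operation.

```python
# Rough arithmetic for a full factorial: 100 two-level factors, 1 second per run.
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 13.8e9  # approximate value, used only for scale


def brute_force_years(n_factors, levels=2, seconds_per_run=1.0):
    """Years needed to run every combination of factor levels once."""
    n_runs = levels ** n_factors
    return n_runs * seconds_per_run / SECONDS_PER_YEAR


n_runs = 2 ** 100
years_serial = brute_force_years(100)
ratio_to_universe = years_serial / AGE_OF_UNIVERSE_YEARS

# Generous assumption: a petaflop machine completes 1e15 runs per second.
years_petaflop = brute_force_years(100, seconds_per_run=1e-15)

print(f"design points:            {n_runs:.3e}")
print(f"years at 1 run/second:    {years_serial:.3e}")
print(f"multiples of universe age: {ratio_to_universe:.3e}")
print(f"years at 1e15 runs/second: {years_petaflop:.3e}")
```

Even under the generous petaflop assumption the full factorial still takes tens of millions of years, which is the point of the slide: faster hardware does not remove the need for efficient designs.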

Petaflop = 1 quadrillion ops/second. Cost of “Roadrunner” = $133 million.

Data farming helps overcome the curse of dimensionality… With large-scale efficient experimental designs, we generate “better big data” and regularly study hundreds of factors for longer-running simulations in hours, days, or weeks on high-performance computing clusters…
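As a rough illustration of how a design can cover many factors with far fewer runs than a full factorial, here is a minimal sketch of a plain random Latin hypercube. This is a swapped-in textbook technique, not necessarily the designs used in the work described here, and the function names and the 257-point, 100-factor sizes are hypothetical choices for the example.

```python
import numpy as np


def latin_hypercube(n_points, n_factors, seed=0):
    """Random Latin hypercube on the unit cube.

    Each factor's range is cut into n_points equal strata, and every
    stratum is sampled exactly once per factor.
    """
    rng = np.random.default_rng(seed)
    # One independent random ordering of the strata for each factor.
    strata = np.column_stack(
        [rng.permutation(n_points) for _ in range(n_factors)]
    )
    # Uniform position inside each stratum.
    jitter = rng.random((n_points, n_factors))
    return (strata + jitter) / n_points


def scale_design(unit_design, lows, highs):
    """Map a unit-cube design onto the actual factor ranges."""
    lows = np.asarray(lows, dtype=float)
    highs = np.asarray(highs, dtype=float)
    return lows + unit_design * (highs - lows)


# 100 factors explored with 257 design points instead of 2**100.
design = latin_hypercube(n_points=257, n_factors=100)
scaled = scale_design(design, lows=np.zeros(100), highs=np.full(100, 10.0))
print(design.shape)
```

A random Latin hypercube gives good one-dimensional coverage of every factor but does not control correlations between columns; designs built for that purpose would be preferred in practice.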

Simulation is different

[Figure: Number of Factors (horizontal axis) vs. Response Surface Complexity (vertical axis), with regions labeled “Physical Experiments”, “Efficient R5 FF and CCD”, and “Simulation Experiments”.]


Simulation is different

What is? What if? What matters? What could be? What should be? How might we get there?

“think big!”: factors, features, flexibility

Analyst may be more “expensive” than the data
• Don’t focus on keeping the #design points or #reps small, if you can make runs in a reasonable time

Make few assumptions
• Retain flexibility in the design and analysis, “explore” the output to gain understanding: we advocate space-filling designs

You can always double check!
• Confirmation runs of your simulation can be made to see how well your metamodels perform at previously untested design points

Statistical significance is NOT practical importance
• You may throw out many metamodel terms with low p-values without sacrificing explanatory power or affecting the decision

SEED student impact…

Saving money: Major Chris Nannini, USA
“Analysis Of The Assignment Scheduling Capability For Unmanned Aerial Vehicles (ASC-U) Simulation Tool.” M.S. in Operations Research

“If the Division Commanders want a UAV at their level and have nothing else, we ought to give it to them.”
– LTG Scott Wallace, Commanding General, U.S. Army V Corps, cited in Sinclair (2005)

“The UAV modeling…harvested $6 billion in savings and 6,000 to 10,000 billets, that’s a brigade’s worth of soldiers. Over 20 years that allowed us to avoid a cost of $20 billion.”
– Michael F. Bauman (2007), Director of the United States Army Training and Doctrine Command Analysis Center.

New methods: LTC Tom Cioppa, USA
“Efficient Nearly Orthogonal and Space-Filling Experimental Designs for High-Dimensional Complex Models” Ph.D. in Operations Research

Helping the Fleet: LT Chad Kaiser, USN
“Air Defense Against UAS Kamikaze Saturation Attack” M.S. in Operations Research

U.S. Navy TACBUL AD 09-01, “USN Surface Weapon System Capabilities and Limitations against low slow flyers (Helicopters/Small Aircraft/UAVs).”

Graphs from a few examples

• STORM
  – U.S. Navy campaign analysis model
  – ~40MB of input data spread over 150 input files
  – A single replication takes hours to complete and yields tens or hundreds of GBs of output data (mix of database fields, large flat files)
  – Graphs shown come from a notional training scenario
  – See “Improving U.S. Navy Campaign Analyses with Big Data” by Morgan, Schramm, Smith, Lucas, McDonald, Sanchez, Sanchez, & Upton (2017), forthcoming in Interfaces

• Fleet management model (MATLAB)
  – Australia is using it for its naval helicopter fleet (30-year lifetime)
  – Exploring how results depend on different policies
  – Graphs shown are based on notional data
  – Marlow, Sanchez, & Sanchez (2015 MODSIM, 2017 forthcoming)

Morgan et al.: Improving U.S. Navy Campaign Analyses with Big Data, Interfaces, Articles in Advance, pp. 1–17 (page 8)

…scenario to be readily customized for other entities and conditions in other campaigns.

Visual Summary Tools
The new quick-look dashboard (Figure 2) informs the user how often objectives are met in each instance and is the starting point for exploring the response space. It displays scores of output measures across dozens of runs at a glance. Each row describes a campaign objective specified by the user. Hyperlinks allow researchers to dynamically access other analytic artifacts described below. In this example, Blue carrier losses = 0 is defined as success, whereas Blue carrier losses ≥ 1 is defined as failure. Each cell contains the number of occurrences of the condition for that replication. The green or red color indicates if the threshold condition was met (green) or not met (red).

A similar outlier dashboard (not shown) presents analysts with a color-coded map that identifies runs in which discordant data occur for user-specified outcomes.

It is also informative to see how the key metrics relate to each other. The correlation plot of key metrics (Figure 3) can be used to examine a user-specified subset of the metrics. This figure shows two very strong positive correlations and one very strong negative correlation. A few potential insights into this notional scenario follow. For example, the positive correlation between the number of Blue aircraft lost and the number of Red advanced surface-to-air missile (SAM) sites destroyed suggests that destroying Red SAM sites, an important objective, comes at a cost. Additionally, there is a negative correlation between the number of amphibious ship losses and the amount of time Blue has air supremacy. One possibility is that the longer Blue has air supremacy, the more protection the amphibious ships receive. We also observe that the length of time Blue is able to achieve and hold air supremacy is positively correlated with Red carrier losses. These last two correlations are consistent with the conventional wisdom about the importance of achieving early air supremacy.

Condition, Event, and Resource Heatmaps
What, when, and where certain events and conditions took place is critical to understanding a simulated…

Multiple responses…

Figure 2. (Color online) The Quick-Look Dashboard Shows, in Aggregate, How Often the User-Defined Success Metrics Are Met

[Figure: QuickLook Dashboard. Columns are replications, identified by replication number; rows are the success metrics; each cell shows the metric’s value at end of run. Each criteria name links to that metric’s partition tree analysis. Color indicates which user-specified thresholds for success are met (legend: Fail, Pass, In jeopardy). Metrics shown: Blue_Carrier_Losses, Blue_SurfaceShip_Losses, Blue_Sub_Losses, Blue_Amphib_Losses, BlueAirSupremacy, BlueAirSuperiority, RedAirSupremacy, RedAirSuperiority, carthageESGHasEnteredMed, carthageCBGHasEnteredMed, carthageESGHasArrivedOffRome, expeditionaryOpsHaveBegun, carthageESGHasReturnedFromBeach, gibralterMinesweepingHasStarted, seabaseIsComplete, cruisersHaveArrivedAtSeabase, seabaseNWScreenIsComplete, seabaseNEScreenIsComplete, isRedCarrierDead, seabaseSWScreenIsComplete, isESGInPort.]

Notes. The figure also shows the worst or best performance against these metrics. The replication numbers are not in numeric order because a clustering algorithm groups red cells together, making the dashboard easier to read by presenting less of a checkerboard display.

21 responses indicate whether or not the naval campaign has gone well for the notional Carthage empire (“Blue” side)

Morgan et al.: Improving U.S. Navy Campaign Analyses with Big Data, Interfaces, Articles in Advance, pp. 1–17 (page 12)

Delving deeper

Figure 6. (Color online) Resource Heatmap: The Horizontal Axis Represents Time (i.e., Days in the Campaign)

[Figure: heatmap of the number of replications in which the inventory level = 0% for the Blue A2A missile; rows are surface asset names, the horizontal axis is day, and color shows the count.]

Notes. The maximum count in the legend indicates the number of replications (in this case, 50). The title of the graphic indicates which threshold value is being used. The user is provided with two heatmaps for each resource type: one to correspond with a “> 50%” threshold, and one to correspond with an “= 0%” threshold. The heatmap displays the status of resources for just the Blue air-to-air (A2A) missile, where the threshold corresponds to “= 0%.” Here, the rows correspond to Blue assets that carry A2A missiles. The highest saturation of green corresponds to a resource level never (over all the replications) being equal to 0%. The highest saturation of red corresponds to a resource level always (over all the replications) being equal to 0%. The colors in between reflect variability.

Figure 7. (Color online) Multidimensional Scaling Depiction of the Separation Between Two Clusters, Where the Clusters Are Determined by WITTW Key Metrics

[Figure: “Replications by cluster”, a scatterplot of the replications on Coordinate 1 vs. Coordinate 2 (each from –1,000 to 1,000), marked by cluster (1 or 2).]

We can look at correlations (over all or by clusters) of the responses:

                                   (1)     (2)     (3)     (4)     (5)     (6)     (7)
(1) SubLosses
(2) SurfaceShipLosses              0.07
(3) AmphibLosses                  −0.06    0.23
(4) CarrierLosses                  0.03    0.23    0.12
(5) C2_RedAdvSAMSitesDead_count    0.14   −0.03    0.1     0.27
(6) ACLosses                       0.15   −0.2     0.06    0.21    0.62
(7) C2_isRedCarrierDead_count      0.12   −0.04   −0.11    0.01    0.08    0.18
(8) C2_BlueAirSupremacy_count     −0.08   −0.15   −0.35    0       0.06    0.19    0.5

…nodes are the line numbers, from the naval orders file, which correspond to each plan of the plan sequence. For easy reference, we provide the mapping of unit plan phase text (e.g., “TRANSIT TO CENTRAL MED”) to line numbers in a separate table. The pink nodes represent conditions that cause the unit to shift from one phase of execution to another. For example, there is an “UNTIL” condition at line number 876 that causes the Carthage South CSG to shift from Phase A to Phase B of its unit plan sequence. The yellow nodes exist to assign unique condition numbers to every user-defined condition (function) in the naval orders file, because a condition is often referenced by more than one unit. The grayscale nodes that appear next to the pink condition nodes are execution nodes, and contain information about the firing of the different phases of the unit plan. For example, in Figure 9, the execution node for Phase A (line number 873 in the orders file) contains 0.00 followed by 50. This means that this plan phase executed in all 50 replications, with a median firing time of zero (scenario start). The execution node for Phase E (line number 909), however, fired only five out of 50 times, with a median execution time of 16.88. In this example, Phase D fired only once out of the 50 replications; it was skipped during four out of five runs where Phase E fired. The color gray corresponds with the proportion of times (over the replications) that the phase executed, with darker colors corresponding to higher proportions.

Metamodel-Based Approaches
A common approach to help analysts glean insights from simulation output involves fitting statistical metamodels. Metamodels are mathematical models that encapsulate the observed behavior of the simulation by mapping inputs to outputs. While simulation runs may require a massive setup effort and have long run times,…

delving deeper…

[Figure, left panel: “WITTW conditions heatmap”. Rows are conditions (for example seabaseIsComplete, isRedCarrierDead, BlueAirSupremacy), the horizontal axis is Day (0 to 20), and color shows the count of replications (0 to 50). Annotations mark rows that “always fail”, are “sometimes met”, or are “always met”.]

[Figure, right panel: “Number of Reps which Inventory Level = 0% for Blue A2A Missile”. Rows are surface asset names (Casablanca Naval Station, Carthage South Carrier Strike Group, Carthage North Carrier Strike Group, Carthage Naval Base, Carthage Expeditionary Strike Group, Carthage CLF 2, Carthage CLF 1, Cardiff Sub Base, Blue Bay of Biscay Sea Base, Anglo Republic Carrier Strike Group), the horizontal axis is Day (0 to 20), and color shows the count of replications (0 to 50).]

Heat maps can show conditions, events, or resources over time, rather than just end-of-run results.
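A minimal sketch of the aggregation behind such a conditions heat map, assuming the simulation output has already been reduced to a hypothetical boolean array indexed by (replication, condition, day). The array layout, function names, and toy data are illustrative assumptions, not the actual output format of the model.

```python
import numpy as np


def condition_day_counts(met):
    """Count, for each (condition, day) cell, the replications where it is met.

    met: array-like of shape (n_reps, n_conditions, n_days), truthy when the
    condition holds on that day in that replication.
    """
    return np.asarray(met, dtype=bool).sum(axis=0)


def classify_cells(counts, n_reps):
    """Label each cell the way the heat map annotations do."""
    labels = np.full(counts.shape, "sometimes met", dtype=object)
    labels[counts == 0] = "always fail"
    labels[counts == n_reps] = "always met"
    return labels


# Toy data: 3 replications, 2 conditions, 4 days (hypothetical values).
met = [
    [[0, 0, 1, 1], [0, 0, 0, 0]],
    [[0, 1, 1, 1], [0, 0, 0, 0]],
    [[0, 0, 1, 1], [0, 0, 0, 1]],
]
counts = condition_day_counts(met)
labels = classify_cells(counts, n_reps=3)
print(counts)
```

The count matrix is what a plotting library would color; the labels reproduce the “always fail”, “sometimes met”, and “always met” distinction for individual cells.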

delving deeper

[Figure: two side-by-side panels titled “Naval C2 Plan Sequence Execution”, one for Cluster 1 (time is the median over 30 reps) and one for Cluster 2 (time is the median over 20 reps). Rows are Blue and Red units (for example Carthage South Carrier Strike Group, Blue Bay of Biscay Sea Base, SWEMP Rome Carrier Strike Group); the horizontal axis is Time (0 to 25); letters A through E mark when each phase of a unit’s plan fired.]

How often are different rule conditions triggered? These side-by-side graphs show median times and how often different phases of the plans fired, by cluster.
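The per-phase summaries in these graphs (how often a phase fired and its median firing time) can be computed along the following lines. This is a sketch under an assumed data layout: a dictionary mapping each phase to one firing time per replication, with `None` when the phase never fired. The names and numbers are hypothetical.

```python
from statistics import median


def summarize_phase_firings(firing_times):
    """Summarize how often and when each plan phase fired.

    firing_times: dict mapping phase name -> list with one entry per
    replication (firing time, or None if the phase never fired).
    """
    summary = {}
    for phase, times in firing_times.items():
        fired = [t for t in times if t is not None]
        summary[phase] = {
            "n_fired": len(fired),
            "n_reps": len(times),
            "median_time": median(fired) if fired else None,
        }
    return summary


# Hypothetical firing times for one unit across four replications.
example = {
    "A": [0.0, 0.0, 0.0, 0.0],
    "B": [3.5, 4.0, None, 4.5],
    "C": [None, None, None, None],
}
result = summarize_phase_firings(example)
for phase, stats in result.items():
    print(phase, stats)
```

Running the same summary separately on the replications in each cluster would give the inputs for a side-by-side comparison.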

‘Carthage South Carrier Strike Group’ HAS_PLAN_SEQUENCE ‘Carthage Med Plan’

Phase   Code line #   Median time   Reps executed   UNTIL condition   Condition
        (HAS_PLAN)    (EXECUTED)    (of 50)         line # (LEADS_TO) (FUNCTION_OF)
A       873           0.00          50              876               COND−10
B       879           3.89          50              885               COND−11
C       888           6.78          46              894               COND−12
D       900           16.26         1               906               COND−13
E       909           16.88         5

• Code line #s are shown for phases A, B, C, D, E
• Pink boxes show code line #s for triggering the next event
• Gray boxes show median time (top) and number of replications where condition triggered (bottom): the latter also corresponds to shading

Multiple responses: fleet management

Ideal: both responses small

Identify design points that satisfy multiple criteria, then use partition trees / … important factors and thresholds

[Figure: scatterplot of two mean responses with axis ticks running up to 100; both axis labels are truncated to “Mean(%yr”.]

Partition for Good_descrip
RSquare: 0.604   N: 512   Number of Splits: 6

All Rows (Count 512, G^2 318.59625, LogWorth 21.575029)
  max_emb_hrs_day>=7.26027397 (Count 201, G^2 213.77387, LogWorth 7.6549705)
    bathtub height<0.85127202 (Count 59, G^2 81.774417, LogWorth 3.4908626)
      squadron_sharing_heuristic(need) (Count 31, G^2 37.351296, LogWorth 2.5629927)
        daily_increase_allowance>=0.30136986 (Count 24, G^2 18.084968)
        daily_increase_allowance<0.30136986 (Count 7, G^2 5.7416285)
      squadron_sharing_heuristic(none) (Count 28, G^2 31.490768)
    bathtub height>=0.85127202 (Count 142, G^2 99.989005, LogWorth 14.526981)
      bathtub duration<2.19178082 (Count 36, G^2 49.461234, LogWorth 2.4000123)
        unsched_maint_freq_vals(1)>=1.00596869 (Count 20, G^2 24.434572)
        unsched_maint_freq_vals(1)<1.00596869 (Count 16, G^2 12.056645)
      bathtub duration>=2.19178082 (Count 106, G^2 0)
  max_emb_hrs_day<7.26027397 (Count 311, G^2 33.818051)
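To make the tree output easier to read, here is a minimal sketch of how one split threshold might be chosen for a good/not-good response. It assumes G^2 is the usual likelihood-ratio (deviance) statistic, −2 Σ n_k ln(n_k / n), which is an assumption about the software’s definition rather than something stated on the slide; the function names and toy data are hypothetical.

```python
import math


def g2(labels):
    """Deviance-style G^2 of a node with a 0/1 response (0.0 for a pure node)."""
    n = len(labels)
    total = 0.0
    for k in (0, 1):
        n_k = sum(1 for y in labels if y == k)
        if n_k:
            total -= 2.0 * n_k * math.log(n_k / n)
    return total


def best_split(x, y):
    """Return (threshold, G^2 reduction) for the best split 'x < t' vs 'x >= t'."""
    parent = g2(y)
    best_t, best_gain = None, 0.0
    values = sorted(set(x))
    # Candidate thresholds are midpoints between adjacent distinct values.
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi < t]
        right = [yi for xi, yi in zip(x, y) if xi >= t]
        gain = parent - g2(left) - g2(right)
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain


# Toy factor and response: design points with x >= 5 are all "good" (1).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 0, 1, 1, 1, 1]
threshold, gain = best_split(x, y)
print(threshold, gain)
```

Under this definition a node containing only one class has G^2 of zero, which is consistent with the 106-row leaf in the tree; applying the search recursively over all factors would grow a full tree.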

ARTeMIS: Automated Red Teaming Multiobjective Innovation Seeker

Successive generations of ARTeMIS reveal the Pareto frontier of non-dominated solutions. The absence of points in the lower left indicates that a moderate cost must be incurred in order to reduce loss.

Genes represent factors of interest, such as UAV speed or search pattern
Initial parents are chosen to give diversity in input space
Offspring are mutants of their parents

[Figure: loss due to not disrupting Red vs. cost of new UAV capabilities (notional); ideal direction: low loss, low cost]

“loss” in the robust sense: minimizing average (squared deviation from target) across the noise space
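The two ideas on this slide, robust loss and non-dominated solutions, can be sketched as follows. This is only the filtering step, not the ARTeMIS evolutionary search itself; the function names and the candidate values are hypothetical, and both objectives are assumed to be minimized.

```python
def robust_loss(outputs, target):
    """Average squared deviation from target across noise-space replications."""
    return sum((y - target) ** 2 for y in outputs) / len(outputs)

def pareto_front(points):
    """Return the non-dominated (loss, cost) pairs when both are minimized."""
    front = []
    for p in points:
        # q dominates p if it is no worse on both objectives and better on one.
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and (q[0] < p[0] or q[1] < p[1])
            for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(front)

# Hypothetical candidates: (loss, cost) for five design points.
candidates = [(4.0, 0.2), (3.0, 0.5), (3.5, 0.6), (1.0, 0.9), (1.5, 1.0)]
front = pareto_front(candidates)
```

In this toy set, no candidate combines low loss with low cost, so the frontier trades one against the other, which is the pattern the slide describes.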

Future simulation clients:
• Complex problems
 - Avoid Type III errors, embrace computational tractability
• Complex questions
 - Big simulation data leads to new and interesting insights
• Comfort with computerized and computer-based decisions
 - more professionals, new tools for simulation community, greater range of applications

Future simulation methods:
• Continual processing
 - Simulation study as a process, not an end state
 - Leverage unused computing cycles
• Changing areas of research emphasis
 - Multi-objective, adaptive procedures, parallel computing, large-scale simulation experiments, exploration/optimization on adaptively updated metamodels
• Causal computerized decision making
 - Simulation: the gold standard for model-driven big data and inferential decision making within big data analytics

Future simulation models
• Living models
 - Automated links between real-world data capture and simulation modeling environments
 - Models / confederations of models that evolve over time
 - Listener event graph objects (LEGOs)

Future research / application by data scientists
• Anticipate changes, suggest “what if?”
 - Massive sensitivity analysis of metamodels might indicate what to watch for, suggest interventions for direct experimentation
• Methods that leverage structure
 - Data from designed experiments has some different characteristics than observational data
• Smarter computational agents
 - Intelligent agents search through model-driven data sets, identifying important factors and interesting features

find resources, or add yourself to our mailing list http://harvest.nps.edu

bonus

and 1. The details of the packages will be explained later in this paper. The predictions given by laGPE smoothly interpolate between the observations. We can see that mlegpE exhibits mean reversion, where it predicts the observed points correctly, but then quickly reverts to the mean away from those points. At the other extreme, Dice2 oversmooths, causing its predictions to be far from the observed points. Furthermore, the predictions of the standard error across the range of x values will also be significantly different. Thus, even for a simple data set, we can obtain very different results.

(a) d = 4, n = 40 (b) d = 4, n = 80

Figure 1: Comparison of Gaussian process fits from three software packages, laGPE, mlegpE, and Dice2, on one-dimensional data. The black points are the input data given to each package to fit a GP model. The lines are the predicted mean over the interval [0,1] for each package, showing significant differences.
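To see how one data set can yield such different predicted means, the following minimal sketch fits a one-dimensional GP in pure Python with two different hyperparameter settings. It does not call any of the packages compared here. The Gaussian correlation function, the plug-in sample mean, the `theta` and `nugget` values, and the three data points are all assumptions chosen for illustration; a very short correlation range is one plausible route to mean reversion, and a large nugget is one plausible route to oversmoothing.

```python
import math

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def gp_predictor(xs, ys, theta, nugget=0.0):
    """Return a function giving the GP predicted mean at a new point."""
    mu = sum(ys) / len(ys)  # assumption: plug in the sample mean for mu

    def corr(a, b):
        # Gaussian correlation; larger theta means a shorter correlation range.
        return math.exp(-theta * (a - b) ** 2)

    big_r = [[corr(a, b) + (nugget if i == j else 0.0)
              for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    weights = solve(big_r, [y - mu for y in ys])

    def predict(x):
        return mu + sum(w * corr(x, xi) for w, xi in zip(weights, xs))

    return predict

# Hypothetical data on [0, 1].
xs = [0.0, 0.5, 1.0]
ys = [0.0, 1.0, 0.0]

reverting = gp_predictor(xs, ys, theta=2000.0)                # short range
oversmoothed = gp_predictor(xs, ys, theta=5.0, nugget=10.0)   # large nugget
```

With the short range, `reverting` reproduces the observation at x = 0.5 but returns to the mean of 1/3 by x = 0.25. With the large nugget, `oversmoothed` stays well below the observed value at x = 0.5.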

2. GP fitting

2.1. Model

A Gaussian process is characterized such that the output from any set of input points has a multivariate normal distribution. If we have n inputs in d dimensions, then the ith input is $x_i = (x_{i1}, \dots, x_{id})^T$. These are stored in the rows of the n by d input matrix X. The output is one dimensional, $y = (y_1, \dots, y_n)^T$. Following Sacks et al. (1989), the surface is modeled as a mean, µ, plus a Gaussian process, z, which is a function of $x_i$, as follows:
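Written out from the definitions in this paragraph, the model would take a form like the one below. This is a sketch of the usual constant-mean formulation; that z has zero mean, and the symbol $\mathbf{1}$ for a vector of ones, are assumptions not stated in the text.

```latex
\[
  y_i = \mu + z(x_i), \qquad i = 1, \dots, n,
\]
\[
  \text{so that } y = \mu \mathbf{1} + \bigl(z(x_1), \dots, z(x_n)\bigr)^T
  \text{ has a multivariate normal distribution.}
\]
```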


(c) d = 8, n = 80 (d) d = 8, n = 160
Figure 2: Borehole 4-D and 8-D comparison. All four plots are on the same scale.

(a) n = 80 (b) n = 160
Figure 3: Run times (seconds) for Borehole 8-D with n = 80 and n = 160, both with m = 2000. There are enormous differences among the packages, but the relative speeds of the packages are similar.

Linux cluster, except for JMP which was run on a personal Dell laptop running Windows. The relative run times for each package are the same for both sample sizes, and the same pattern is found on other test functions as well. GPfit is by far the slowest, taking over 15 minutes per replicate for n = 160. JMP is the next slowest, taking two minutes per replicate, but this data is unreliable since it was run on a different computer. The next slowest is mlegp, taking about eight minutes, with GPy only slightly faster. The fastest packages were DiceKriging, laGP, sklearn, and DACE, which only took a handful of seconds. Thus we see that there is a massive difference in the run times, with a factor of over 1000 between the fastest and the slowest packages performing the same task. The times shown in this plot are for the borehole function, but the relative times are similar for the other functions. In particular, GPfit and JMP are extremely slow, mlegp is also very slow, and the rest are much faster. Therefore when one is choosing a package, it may be necessary to consider not only the model options and capability, but also how quickly it runs. Run times must also be considered in the context of the data being used. If the data comes from a simulation model that takes hours per observation, then the difference of a minute may be negligible.

5.2. The OTL Circuit Function

Ben-Ari and Steinberg (2007) use a test function that describes an output transformerless (OTL) push-pull circuit. There are six input parameters, five for resistors (Rb1, Rb2,
