
Neuro-Fuzzy Dynamic Programming for Decision-Making and Resource Allocation during Wildland Fires

A thesis submitted to the

Graduate School

of the University of Cincinnati

in partial fulfillment of the

requirements for the degree of

Master of Science

in the School of Aerospace Systems

of the College of Engineering and Applied Science

by

Nicholas P. Hanlon

B.S., Georgetown College, 2001

M.B.A., Northern Kentucky University, 2006

Committee Chair: Kelly Cohen, Ph.D.

Abstract

Fire is a natural agent of change for our planet's survival and has the capability to cause devastating effects (economic, societal, environmental, etc.) when it encroaches into our daily lives. In the midst of a wildland fire, incident commanders are bombarded with massive amounts of data, accurate or not, and must make real-time decisions on how to allocate available resources to extinguish the fire with minimal damage.

The scenario is modeled as an attacker-defender style game, such that the defender (resources with fire retardants) is protecting its assets (homes, businesses, power plants, etc.) while the attacker (wildland fires) is attempting to deliver maximum destruction to those assets. The problem can be formulated in terms of optimal control theory, utilizing the gold standard of optimization, Dynamic Programming (DP), to exhaustively search the solution space for the minimized cost. However, its drawback is directly related to its method of finding the optimal solution: the exhaustive search. The amount of processing time to compute the minimum cost increases exponentially with the complexity of the system. For this reason, the DP approach is generally executed offline for real-world applications. Due to the large solution space of a wildland fire scenario, offline execution of DP is problematic, as resource allocation decisions must be made in real time.

The current research effort seeks to present a new and unique control algorithm, based on Neuro-Fuzzy Dynamic Programming (NFDP), that can nearly replicate the DP algorithm's results yet execute in real time and remain robust to uncertainties. An artificial neural network provides the approximate cost-to-go function for the DP, fulfilling the need for real-time execution. The neural network is trained by approximate policy iteration using Monte Carlo simulations. Since our sensors may provide inaccurate or incomplete data about the environment, a fuzzy logic component is integrated to provide robustness in the system. The problem is also extended to include multiple layers of defense, as opposed to a single-layer attempt to eliminate the incoming threat. The multi-layered defense requires a unique approach in the NFDP algorithm for calculating future expected costs, since a fire must successfully elude three layers of defense to constitute an attack on an asset.

Four control methodologies are examined in this research: a greedy-based heuristic, DP, NDP (Neuro-Dynamic Programming), and NFDP. DP and the heuristic are used as benchmark cases; the premise of the heuristic approach is to protect the highest-valued assets at all costs. The control methodologies are compared based on three parameters: processing time, remaining asset health, and scalability. The processing time quantifies the requirement of real-time decisions. The asset health is a measure of how well the defender protected its assets from the attacker. Scalability measures how well the algorithm scales with increased complexity. With proper adjustments to the architecture and training techniques of the artificial neural network and fine-tuning of the fuzzy controller parameters, NFDP demonstrates its ability to perform real-time decision-making, obtains near-optimal results in the presence of uncertainty in the sensor data, and scales well with increased complexity.


Acknowledgements

First and foremost, I would like to thank my family, especially my wife, Dr. Jaime Dann Hanlon, for her never-ending love, support, and great patience at all times. She gave me the confidence to pursue this degree, and the journey would never have been possible without her encouragement.

I would like to express my sincere gratitude to my advisor, Dr. Kelly Cohen, for his support, guidance, and the wealth of knowledge he provided for this thesis, especially in the area of fuzzy logic. I thoroughly enjoyed our meetings, whether academic or personal in nature, and I look forward to our future work. I would also like to thank Dr. Manish Kumar for his invaluable assistance and guidance on the thesis. Dr. Kumar was always kindly available for questions and helped provide direction in the areas of Dynamic Programming and Neural Networks.

I am also grateful to the committee members, Dr. Bruce Walker and Dr. Grant Schaffner, for taking the time to review my work and provide feedback. Finally, I would like to thank Dr. Benjamin Tyler and Dr. Praveen Chawla of Edaptive Computing, Inc. Dr. Tyler and Dr. Chawla provided the means for this project to exist and were an immense help in the formulation of the control algorithms.


Table of Contents

Acknowledgements
Table of Contents
List of Figures
Chapter 1: Introduction
Chapter 2: Literature Survey
    Dynamic Programming
    Artificial Neural Networks
    Fuzzy Logic
    Previous Research
    Reasoning Behind NFDP
Chapter 3: Problem Formulation and Scenario Description
    Vector Assignment
    Uncertainty Analysis
    Figures of Merit
    Software and Platform
    Syscape
Chapter 4: Benchmark Methods
    Greedy-Based Heuristic
    Dynamic Programming
Chapter 5: NDP Formulation
    Neural Network Architecture
    Challenges
Chapter 6: NFDP Formulation
    Fuzzy Inference System
    Challenges
Chapter 7: Simulation Results
    Sensitivity to Engagement Probability
    Uncertainty Analysis
    Scalability
Chapter 8: Conclusions & Recommendations for Future Work
Bibliography


List of Figures

Figure 1: Estimate of Fires by Type in the United States (9)
Figure 2: Average Loss per Structure Fire in the United States (9)
Figure 3: Number of Wildland Fires and Acres Burned from 1960 to 2008 (10)
Figure 4: Damage-Time Function (14)
Figure 5: Neuron Components and Synapse Structure (17)
Figure 6: The Neuron Model (18)
Figure 7: 10-Input, 2-Output ANN with 1-Hidden Layer (18)
Figure 8: Fuzzy Controller
Figure 9: Taxonomy of Optimization Approaches
Figure 10: Single-Layer Defense
Figure 11: Three-Layer Defense
Figure 12: Receding Horizon
Figure 13: Schematic of First Attack Wave
Figure 14: Primary Features of Edaptive's Product – Syscape (29)
Figure 15: State Transitions with Substates
Figure 16: The Standard Reinforcement-Learning Model (24)
Figure 17: Fully Connected ANN with 7 Input Neurons, 8 Hidden Neurons, & 1 Output Neuron
Figure 18: ANN Training Process
Figure 19: Block Diagram of Fuzzy Logic System
Figure 20: Fuzzy Inputs: S* (left) and DA (right) Membership Functions
Figure 21: No Uncertainty Results - Remaining Asset Health vs Execution Time
Figure 22: Fire Error Results - Remaining Asset Health vs Execution Time
Figure 23: Breakup Results - Remaining Asset Health vs Execution Time
Figure 24: False Alarm Results - Remaining Asset Health vs Execution Time
Figure 25: Comparison of Results for Uncertainty Scenarios
Figure 26: Scalability to System Complexity
Figure 27: Scalability to System Complexity (sans DP)


Chapter 1: Introduction

Wildland fire, a natural agent of change and one of the basic environmental factors on our planet, is an essential tool in regulating complex forest ecosystems, causing both destruction and birth in plant and animal life in an effort to ensure diversity. These complex ecosystems seek a point of criticality, a state of readiness in which the correct fuel accumulation is primed for ignition so that fire can fulfill its global role in our planet's continual survival. This state of readiness provides the balance between destruction and rebirth. The absence of naturally occurring fires causes fuel sources to accumulate to hazardous levels; the severity and intensity of the resulting fire cause utter destruction, minimizing the benefits that promote plant and animal diversity. A prescribed burn, a method of mimicking the natural occurrence of fire, attempts to restore the natural fire regime and recondition the ecosystem. The intensity of wildland fires is dictated by a mixture of variables such as fuel accumulation, humidity, wind speed and direction, and dryness. Once the fire is ignited, a column of smoke and heat rises miles into the atmosphere, creating a void below which rapidly funnels more oxygen into the space, further fueling the fire. The repeated cycle of air movement creates gale-force winds which can blow fire up to half a mile in distance, hurtling over any fire barrier and starting a new spot fire (1). The human species and nature are not isolated systems but coupled, each playing a vital role in the future of the other.

So the looming question becomes: Why is this important? A fire creates havoc on society and leaves detrimental effects; these uncontrolled fires destroy structures and ultimately cost human lives. In October 1991, 25 lives were lost and 2,900 structures were destroyed in a California wildfire. In October 2003, 275,000 acres burned in California, costing 15 lives and leveling 2,400 structures. Although technology and management have improved, a recent fire in March 2006 in Texas burned 907,245 acres, destroyed 80 structures, and claimed 12 lives (2). The environmental impact of a fire, or the lack thereof, wreaks havoc on an ecosystem for many years. The by-product of smoke impacts the air quality and respiratory health of anything within the vicinity. The economic impact only increases as the size of the fire increases. Any burn, whether human-initiated or naturally occurring, has dramatic effects that require extensive amounts of resources if not monitored or controlled. Millions of tax dollars are used to fund governmental agencies to manage and control wildland fires (3). An estimated $10 billion in fire suppression and resources was used to fight over 90,000 wildfires in 2000 (4).

Emergency situations, namely uncontrolled wildland fires, are undoubtedly complex events within a partially known environment. It is cumbersome to obtain a precise mathematical model of the spatio-temporal behavior of a wildland fire. Nevertheless, in the event that a fire is deemed hazardous, real-time decision-making for resource allocation and control strategy is required, although we possess only partial information and an inaccurate model. Using terminology borrowed from control systems, the resources available for fire protection include both "sensors," which enable information gathering, and "actuators," which actively suppress the fire and limit its growth. For example, available resources include ground crews, ground vehicles, UAVs (unmanned aerial vehicles), satellites, aerial vehicles, etc. Some resources, such as aerial vehicles, may act as sensors (NASA's Ikhana UAV and the Global Hawk UAV) to detect fire intensity and direction, while others provide fire suppression (C-130 tankers and helicopters).

During wildland fires, decision-makers attempt to maintain an accurate perception of the environment, known as situation awareness. Sensors provide data to the system to understand what is currently going on in the field in spite of the inherent uncertainty and incomplete information. Ideally, complete information is needed to update the system continuously, but the data collected by sensors are added to the system at discrete time periods and may include missing pieces of information, which adds to the complexity of the system. Based on the situational awareness and a model of the environmental and geographical factors, we predict the growth of the fire. A well-accepted fire growth simulation model, FARSITE, developed by the Department of Agriculture, aids in the prediction of the fire based on the perception of the environment (5). Decisions and resource allocation can be made based on the fire model, and the process is repeated until the danger has been eliminated.

The challenge is: 'Given a set of spatially separate fires and a number of resources to suppress the fire, how do we make decisions and allocate our resources optimally to limit the damage in terms of assets destroyed?' This scenario can be formulated in terms of optimal control theory, and the basic approach to solving the problem is simply searching the solution space for the optimized cost. Dynamic Programming (6) is an exhaustive search algorithm that will search the entire solution space for the minimum cost. However, its drawback is directly related to its method of searching for the optimal solution, namely the exhaustive search. The amount of processing time to compute the minimum cost increases with the complexity of the system. Therefore, the dynamic programming approach is executed offline. This creates a problem, as a wildland fire is a complex, dynamic system that is constantly changing and requires real-time decision-making.

A different approach is required to fulfill the requirements of real-time decision-making and resource allocation while still attempting to achieve the results of dynamic programming. The current research effort seeks to present a new and unique control algorithm, based on Neuro-Fuzzy Dynamic Programming (NFDP), that can nearly replicate the dynamic programming algorithm's results while remaining robust and adaptive enough to execute in real time.

The NFDP approach capitalizes on two strategies, namely artificial neural nets and fuzzy logic, which originated from the Artificial Intelligence / Soft Computing community. A detailed perspective and literature survey of these tools are provided in the next chapter. Neural nets mimic the human brain, using neurons and synapses to map inputs to outputs. Neural nets must be trained and validated prior to use; however, they provide real-time results in an efficient time period as compared to dynamic programming. The neural nets, introduced in the form of a reinforcement learning framework, are trained to replicate exactly or very nearly the same solution space as the dynamic programming, but in a fraction of the time. In a perfect world, we would have pure knowledge of the system; however, this is not always the case. The sensors that provide information about the environment may introduce errors or noise into the system, which alters the true perception of the system. Additionally, we have some inherent uncertainties concerning the direction and intensity of the wind, the ability of the fire to jump lines, the possibility of new "hot spots," and the reliability and availability of the resources. The fuzzy logic component provides robustness to the system in the presence of any uncertainty.

The layout of this MS thesis includes five main sections. The second chapter, the literature survey, includes the impact of fires and the desire for a robust, real-time execution algorithm; a brief description of Dynamic Programming, Artificial Neural Networks, and Fuzzy Logic; and current applications of NDP and NFDP. Chapter 3 details the problem formulation and scenario description. Chapter 4 lists the benchmark methods for data comparison. Chapters 5 and 6 detail the NDP and NFDP algorithm approaches, respectively. Chapter 7 displays the simulation results for the four algorithms studied (greedy heuristic, DP, NDP, and NFDP) along with the uncertainty analysis. Finally, Chapter 8 lists the conclusions and recommendations for future research.


Chapter 2: Literature Survey

Chapter 2 provides a background on the economic impact of general fires and those occurring within the United States. The major challenges of incident commanders are discussed to show the need for intervention to reduce the effect of unintended wildland fires. Next, the three primary control methods are explored: dynamic programming, artificial neural networks, and fuzzy logic. Finally, recent applications of NDP and FDP illustrate the current use of these methodologies.

General Fire Statistics in the U.S.

General fires include a wide range of fires categorized by major property class, such as residential structure; non-residential structure; outdoor rubbish; outside with value; grass, brush, or forest; highway vehicle; other vehicle; and miscellaneous. The information listed is reported by municipal fire departments; it does not include statistics from private fire brigades or state or federal firefighting authorities (7). In 2008, fire departments responded to an estimated 1,451,500 fires, corresponding to a fire department responding to a call every 22 seconds. For comparison, a motor vehicle theft occurred every 33 seconds during the same year in the United States (8). Unfortunately, 3,320 individuals, not including firefighters, lost their lives and 16,705 individuals were injured during 2008. This corresponds to a civilian death every 158 minutes and a civilian injury every 31 minutes. It is estimated that these fires caused 15.5 billion dollars in direct property damage (9).

Figure 1 and Figure 2 display the number of fires and the average loss per structure fire in the US, respectively, over the time span of 1977 to 2008. The graphs illustrate the inverse relationship between the total number of fires in the US and the economic impact over the past 32 years.


Figure 1: Estimate of Fires by Type in the United States (9)

Figure 2: Average Loss per Structure Fire in the United States (9)

Wildland Fires in the U.S.

The National Interagency Fire Center (NIFC) is the collection of federal agencies and organizations that serves as the support center for wildland firefighting. Because of the multiple relationships that collectively make up the NIFC, decisions are made through an interagency cooperation concept, since there is no single direct leader (10). The NIFC has been tracking wildland fire statistics within the United States since 1960. Figure 3 illustrates the number of wildland fires and acres burned during the 50 years of data collection by the NIFC. A trend line has been included in the graphs using a polynomial function of order 3. Wildland fires have followed trends similar to general fires in the United States. After a sharp increase into the mid-1980s, the total number of wildland fires decreased dramatically and has remained steady since then. However, the total number of acres burned has steadily risen.

Figure 3: Number of Wildland Fires and Acres Burned from 1960 to 2008 (10)

The National Fire Protection Association (NFPA) categorizes large-loss fires as any fire resulting in at least $10 million of property damage, and it designated 35 of the estimated 1,451,500 fires reported during the 2008 fire season as large-loss fires. Although these large-loss fires were a minuscule share (roughly 0.002 percent) of all general fires, they accounted for 15.3 percent of the total property damage. Four of the 35 large-loss fires were wildland fires, yet they contributed 40 percent of all the economic impact. While the number of wildland fires is minuscule in comparison to general fires, their total economic impact is great.

The core strategies and tactics for fighting wildland fires have not changed. Murray Taylor (11), a former firefighter, states:

…the basic strategy of fighting a wildfire has not changed in decades. Firefighters try to encircle the fire and cut it off from fuel sources — to starve the fire — while dousing it from the air with flame retardants. It is labor-intensive, exhausting and often dangerous work…Fundamentally, the way we go about putting out fires is the same as it was 70 years ago. Yes, we have better tools, the fire engines are bigger, the crews are better trained and the aircraft are more modern. But we're dealing with Mother Nature, and she dances a mean boogie.

Fire Growth and Propagation

The intensity and severity of a wildland fire is the result of a mixture of variables such as fuel accumulation, humidity, wind speed and direction, and dryness. Once the fire is ignited, a column of smoke and heat rises miles into the atmosphere, creating a void below, which rapidly funnels more oxygen into the space, further fueling the fire. The repeated cycle of air movement creates gale-force winds which can blow fire embers up to half a mile in distance, hurtling over any fire barrier and starting a new spot fire (1).

Fire Escalation Procedures and Challenges

The NIFC adheres to a three-tier plan as a wildland fire escalates in size and intensity. Initially, local fire departments and agencies are responsible for resource allocation and decisions; additional local agencies may be called upon to provide extra crew and equipment. If the fire cannot be suppressed with the local resources, the Geographic Area Coordination Center (GACC) is contacted, and the GACC dispatches a Type 2 Incident Management Team (IMT) to assume responsibility for resource allocation and supplement the local agencies with additional crews and equipment from other agencies within the geographical area. In the event that the emergency exceeds the capabilities of the GACC, the National Interagency Coordination Center (NICC) is contacted, and the NICC dispatches a Type 1 Incident Management Team. The Type 1 IMT comprises the most experienced and trained personnel, with the authority to call upon national resources from various agencies (12).


There are five primary types of Incident Management Teams, ranging from small local groups (Type 5) with minimal training to national-level groups (Type 1) with extensive training and experience. The Incident Commander is the acting officer who oversees all operations during the emergency, maintains situational awareness, and requests resources as needed. However, due to the dynamic nature of the situation, incident commanders often are required to make hurried decisions based on incomplete and inaccurate information. These quick decisions are based on instinct, learned from repeated challenges and fine-tuned over time. Although such decisions are adequate for Type 4 and Type 5 IMTs, they often fall short of effective decision-making and resource allocation at the national level as incident commanders move up in rank. Type 1 and Type 2 incident commanders are bombarded with data that is incomprehensible for the human brain to manage, resulting in less-than-desirable performance (13).

For instance, suppose an incident commander has four possible resources and must allocate three of them to small-scale fires. The number of permutations equates to 24 different possible outcomes:

$P(4, 3) = \frac{4!}{(4-3)!} = 24$    (1)

Simply doubling the number of resources and decisions required for a larger-scale fire, an incident commander must make the optimal choice from 20,160 different possible outcomes:

$P(8, 6) = \frac{8!}{(8-6)!} = 20160$    (2)
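As a simple illustration of this combinatorial growth, the short Java sketch below computes the number of ordered assignments P(n, k) = n!/(n-k)! for the two cases above; the class and method names are illustrative only.

// Sketch: ordered assignments P(n, k) = n! / (n - k)!, showing how quickly
// the number of allocation choices grows as resources and decisions double.
public class AssignmentCount {

    static long permutations(int n, int k) {
        long count = 1;
        for (int i = 0; i < k; i++) {
            count *= (n - i);   // multiply n * (n-1) * ... * (n-k+1)
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(permutations(4, 3));  // 24
        System.out.println(permutations(8, 6));  // 20160
    }
}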

New methods, other than instinct and brute force, are needed to analyze the extensive amounts of data that incident commanders are required to comprehend. Dynamic Programming is a classical method of finding the optimal, strategic control action based on a sequence of interrelated decisions. However, the power of the dynamic programming method for finding the optimal control actions, the exhaustive search of the solution space, is the "Achilles' heel" for the wildland application, causing it to fall victim to Bellman's "curse of dimensionality."

Figure 4: Damage-Time Function (14)

Agoston Restas (14) of the Szendro Fire Department depicts the relationship between the acreage burned and time in the Damage-Time Function illustrated in Figure 4. The destruction caused by a fire can be modeled as an exponential curve that diverges toward infinity, and the rise of the curve is governed by two factors: the area of the existing fire and the velocity of the fire spread. The amount of acreage burned can therefore be reduced exponentially by small reductions in time.
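As a minimal illustration of this leverage (the notation here is assumed and is not Restas's actual function), an exponential damage model makes the effect of a small time saving explicit:

$A(t) \approx A_0 e^{\lambda t}, \qquad \frac{A(t - \Delta t)}{A(t)} = e^{-\lambda \Delta t}$

where $A_0$ is the initial fire area, $\lambda$ is a growth rate governed by the spread velocity, and $\Delta t$ is the time saved by an earlier decision; even a modest $\Delta t$ yields a multiplicative reduction in burned acreage.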

In most large-scale applications, the DP approach is executed offline due to the time commitment of the algorithm. However, an online approach is required for large-scale, time-constrained applications, a need that the classical DP method cannot fulfill. Therefore, we introduce Neuro-Dynamic Programming (NDP) to effectively overcome the burden of execution time by allowing the system to learn behavior through simulation and improve upon itself by reinforcement learning.

Finally, another inherent challenge is the presence of inaccurate and incomplete information in the system. The control actions at each stage of NDP are dependent upon knowledge of the system's state at each discrete time period. Introducing fuzzy logic adds an element of robustness to the system. In Chapter 6, the Neuro-Fuzzy Dynamic Programming (NFDP) formulation is described, and the results in Chapter 7 illustrate real-time decision-making and resource allocation at near-optimal performance while maintaining robustness to handle noise and uncertainties in the system.

Dynamic Programming

An incident commander's (IC) thought process during an emergency situation can be modeled as a closed-loop feedback control system: the IC makes a decision (actuation) concerning resource allocation given situational awareness (processed sensor information), applies the new decision, and repeats the process until the emergency has been resolved. However, strategic decisions at each stage cannot be made blind to future events; decisions must be made based on the current state and future predictions. Thus, ICs seek to optimize the usage of their resources such that the undesired situation is returned to a desired state with minimum cost. Dynamic Programming (DP) lends itself well to multi-stage decisions where there is a tradeoff between the current state's cost and future states' costs (15). In addition, DP is considered the "gold standard" with respect to performance because it allows us to obtain the optimal solution, albeit at a huge computational cost.

The underlying theory of DP is Bellman's Principle of Optimality: an optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision (16). Therefore, we make two assumptions. First, our model is a discrete-time dynamic system of the form

$x_{k+1} = f_k(x_k, u_k)$    (3)

where
$x_k$ : the current state of the system
$x_{k+1}$ : the next state of the system (subsequent to state $x_k$)
$u_k$ : the control or decision variable
$f_k$ : the function describing the system and how the state is updated given the control

Second, the cost function is summative for all stages. The transition from state $x_k$ to state $x_{k+1}$ using control $u_k$ does not constitute a free move but incurs a cost

$g(x_k, u_k, x_{k+1})$    (4)

where
$g(x_k, u_k, x_{k+1})$ : the cost associated with moving from state $x_k$ to state $x_{k+1}$ given control $u_k$
$g$ : the function used to calculate the cost

The discrete-time system will transition from the current state $x_k$ to the subsequent state $x_{k+1}$ with some probability $p_{x_k x_{k+1}}(u_k)$ for the control $u_k$. The optimal feedback control policy provides the control $u_k$ given the current state $x_k$. The principle of optimality shows that we will have the minimum cost if we know the optimal cost for all remaining states starting from state $x_{k+1}$. Therefore, let $\hat{J}(x_k)$ represent the optimal cost-to-go of all remaining states starting at state $x_k$:

$\hat{J}(x_k) = \min_{u_k} E\left[\, g(x_k, u_k, x_{k+1}) + \hat{J}(x_{k+1}) \,\right]$    (5)

The optimal control at each state is formulated by minimizing Equation (5). The feedback control policy is based on the summative cost of the current state and the optimal expected cost of all subsequent states (16).

The formulation of the $\hat{J}(x)$ function depends on the "life expectancy" of the system (16). Finite horizon problems dictate that the cost accumulates over a fixed number of stages, as opposed to infinite horizon problems, where costs accumulate indefinitely. There are multiple methods to approach infinite horizon problems, such as stochastic shortest path, discounted, and average cost per stage. Stochastic shortest path problems are similar to finite horizon problems in that a termination stage will be reached under the optimal policy and the termination stage incurs no cost. However, the number of stages in the problem is random and can change based on the control policy. Therefore, the goal is to find the minimal expected cost to reach the termination stage. Discounted problems are special cases of stochastic shortest path problems, where the difference is the preference in the timing of the cost: current stage costs have a greater impact on the control policy than future expected costs. Sometimes neither the stochastic shortest path nor the discounted formulation is suitable, because the termination stage has an inherent cost or all stages are weighed the same. In that case, the average cost per stage is a feasible option and is exactly as named: optimizing the average cost per stage in an infinite horizon problem (16). For purposes of this thesis, the problem is formulated as a stochastic shortest path problem; thus the discounted and average cost per stage formulations are not explored any further.

An advantage of DP is the number of required calculations as compared to other methods such as direct enumeration. DP avoids many needless calculations based on the principle of optimality; paths that are known not to be optimal are not considered. The TSP (Traveling Salesman Problem) provides an excellent example: for an N-city tour, direct enumeration must examine on the order of (N-1)! possible tours, whereas DP requires only on the order of N^2 2^N operations. For small-scale problems or situations where real-time results are not needed and the optimal cost can be computed offline, DP is usually the method of choice. However, the drawback is the computational requirement as the number of states and controls increases, where DP falls prey to Bellman's "curse of dimensionality." For high-dimensional systems, a real-time solution is required and we are generally prepared to trade off a sub-optimal performance for computational speed (16).

Approximate DP

The bottleneck of the DP algorithm is the calculation of the optimal cost-to-go function; thus, replacing the optimal cost-to-go function $\hat{J}(x)$ with an approximation $\tilde{J}(x, r)$ provides the near-optimal performance required. The function $\tilde{J}$ is referred to as the scoring function (or approximate cost-to-go function), and the output $\tilde{J}(x, r)$ is referred to as the score (or approximate cost-to-go) at state $x$. The variable $r$ is a vector of parameters defined for the system. By rewriting Bellman's equation with the new scoring function, Equation (5) becomes

$\tilde{J}(x_k, r) = \min_{u_k} E\left[\, g(x_k, u_k, x_{k+1}) + \tilde{J}(x_{k+1}, r) \,\right]$    (6)

Now minimizing the right-hand side of the equation provides the near-optimal control given the current state $x_k$. The evaluation of $\tilde{J}(x, r)$ may be characterized as a lookup table or a compact representation. When the dimension of the vector $r$ is large, the lookup table is the preferred method; a table stores the output of $\hat{J}(x)$ for all states $x$, and the score is then retrieved from the table rather than computed. On the other hand, when the dimension of the vector $r$ is small, the compact representation stores only the vector $r$ and the scoring function $\tilde{J}(x, r)$; in this case, the score is computed rather than retrieved from a table.

Neural networks are one general method to approximate the cost-to-go function $\hat{J}(x)$. The parameter vector $r$ represents the synaptic weights of the neural net. Given some state and the synaptic weights of the net, the neural network provides the score $\tilde{J}(x, r)$, an approximation of the optimal cost-to-go $\hat{J}(x)$. The key ingredient of the neural net approach is finding the parameter vector $r$ that minimizes the error between the exact cost-to-go $\hat{J}(x)$ and the approximate cost-to-go $\tilde{J}(x, r)$ (16).
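A minimal sketch of this fitting step is shown below; a simple linear architecture (score = r · phi(x)) stands in for the neural network to keep the example short, and the feature map, sampled states, target costs, and step size are all illustrative assumptions rather than the thesis implementation.

// Sketch: fit parameter vector r so that a linear scoring function
// Jtilde(x, r) = r . phi(x) matches sampled cost-to-go targets Jhat(x).
public class ScoringFunctionFit {

    static double[] r = new double[3];                  // parameter vector

    static double[] phi(double[] state) {               // hypothetical feature map
        return new double[] { 1.0, state[0], state[1] };
    }

    static double score(double[] state) {               // Jtilde(x, r)
        double[] f = phi(state);
        double s = 0.0;
        for (int i = 0; i < r.length; i++) s += r[i] * f[i];
        return s;
    }

    public static void main(String[] args) {
        double[][] states  = { {1, 0}, {0, 1}, {2, 2} }; // sampled states
        double[]   targets = { 3.0,    4.0,    9.0   };  // Jhat(x) obtained from DP or simulation
        double step = 0.05;
        for (int iter = 0; iter < 2000; iter++) {
            for (int k = 0; k < states.length; k++) {
                double err = score(states[k]) - targets[k];   // Jtilde - Jhat
                double[] f = phi(states[k]);
                for (int i = 0; i < r.length; i++) r[i] -= step * err * f[i];
            }
        }
        System.out.println(java.util.Arrays.toString(r));
    }
}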

Artificial Neural Networks

The human brain is a highly complex, nonlinear, parallel computer comprised of nearly 10 billion neurons and 60 trillion synapses. The neuron is a specialized cell that transmits an electrochemical signal, and the synapse is the brain's highway, carrying electrochemical signals from the axon of one neuron to the dendrite of another neuron. Each individual neuron is comprised of an input structure (dendrites), the cell body, and an output structure (axon). When a neuron is activated, an electrochemical signal is fired as an output through the axon, which in turn becomes the input through the dendrites for the next neuron, as illustrated in Figure 5.

Figure 5: Neuron Components and Synapse Structure (17)

Whether the neuron fires an electrochemical signal depends on whether the total strength of the signal received into the cell body through the dendrites exceeds the threshold level. Learning occurs in the human brain through changes in the synapses that affect how one neuron influences neighboring neurons. Although the processing speed of a single neuron is five to six orders of magnitude slower than a silicon logic gate, the massive number and organization of the neurons permit the human brain to operate at rates and efficiencies that the modern-day computer cannot match (18).

The artificial neural network (ANN) is designed to mimic the architecture and processing of the biological neural network (see Figure 6). A neuron $j$ can receive multiple inputs $x_i$, with each input associated with a strength or "weight" $w_{ji}$. The weighted sum of the inputs into neuron $j$, minus the threshold level $\theta_j$, dictates the activation of the neuron, which is sent through a transfer function to produce the output signal.

Figure 6: The Neuron Model (18)

The neuron is modeled mathematically as

$v_j = \sum_{i} w_{ji} x_i - \theta_j$    (7)

and

$y_j = \varphi(v_j)$    (8)

The activation function, $\varphi(\cdot)$, is used to contain the neuron output to within a specified amplitude range, accomplished by using activation functions that are fixed, linear, or nonlinear (such as sigmoid).
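A minimal sketch of Equations (7) and (8) for a single artificial neuron with a sigmoid activation; the weights, threshold, and inputs are illustrative values only.

// Sketch of a single neuron: weighted sum of inputs minus threshold (Eq. 7),
// passed through a sigmoid activation function (Eq. 8).
public class NeuronSketch {

    static double sigmoid(double v) {
        return 1.0 / (1.0 + Math.exp(-v));
    }

    static double neuronOutput(double[] x, double[] w, double threshold) {
        double v = -threshold;
        for (int i = 0; i < x.length; i++) {
            v += w[i] * x[i];          // weighted sum of the inputs
        }
        return sigmoid(v);             // activation bounds the output to (0, 1)
    }

    public static void main(String[] args) {
        double[] inputs  = { 0.5, 1.0, -0.25 };
        double[] weights = { 0.8, -0.3, 0.6 };
        System.out.println(neuronOutput(inputs, weights, 0.1));
    }
}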

The architecture of an artificial neural network defines the arrangement of neurons in the network. Single-layer networks have one layer of inputs and one layer of outputs. Multilayered networks extend the single-layer network with one or more hidden layers of neurons located between the input and output layers. The chosen structure of the ANN is usually dictated by the learning algorithm used to train the network. Fully connected networks have every neuron connected to every neuron in the next forward layer; partially connected networks contain some missing connections between neurons. Figure 7 illustrates a fully connected, multi-layered, feed-forward ANN, in which information is propagated through the layers from input to output.


Figure 7: 10-Input, 2-Output ANN with 1-Hidden Layer (18)

ANNs attempt to mirror the performance of the human brain in two ways: first, they acquire knowledge through a learning process, and second, they store that knowledge in the synaptic weights of the network (the parameter vector $r$).

Neuro-Dynamic Programming (NDP) is the combination of the dynamic programming algorithm with an artificial neural network serving as the architecture for approximating the optimal cost-to-go function. Bertsekas (16) states that "the methods [i.e. NDP] … allow systems to 'learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through reinforcement mechanism'." In other words, running simulations allows the system to observe its own behavior. Likewise, iterative learning algorithms such as backward propagation permit the neural network to improve upon itself (reinforcement learning), thus minimizing the error between the exact cost-to-go $\hat{J}(x)$ and the approximate cost-to-go $\tilde{J}(x, r)$ so that the system operates at near-optimal performance.
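The structural sketch below illustrates this simulate-then-improve loop: episodes are played under a policy that is greedy with respect to the current approximation, and the observed costs-to-go become training targets. The interfaces and names are illustrative placeholders (the simulator and approximator are deliberately left abstract), not the thesis code.

import java.util.List;

// Sketch of the simulation-based training loop behind NDP: run episodes under
// the current approximate policy, record the observed costs-to-go, and use
// them as targets to update the approximator.
public class NdpTrainingSketch {

    // One visited state together with the cost-to-go observed from that state
    // to the end of the simulated episode (a Monte Carlo sample of Jhat).
    static class Sample {
        final double[] state;
        final double observedCostToGo;
        Sample(double[] state, double observedCostToGo) {
            this.state = state;
            this.observedCostToGo = observedCostToGo;
        }
    }

    interface CostApproximator {
        double score(double[] state);                      // Jtilde(x, r)
        void update(double[] state, double targetCost);    // adjust r toward the target
    }

    interface Simulator {
        // Plays one episode with a policy greedy w.r.t. the current approximator.
        List<Sample> runEpisode(CostApproximator approx);
    }

    static void train(Simulator sim, CostApproximator approx, int episodes) {
        for (int e = 0; e < episodes; e++) {
            for (Sample s : sim.runEpisode(approx)) {
                approx.update(s.state, s.observedCostToGo);   // reinforcement-style improvement
            }
        }
    }
}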

Fuzzy Logic

Conventional control theory is based on mathematical models of differential equations that result in a discrete, binary value of 0 or 1 (true or false). However, these models lack the human intuition that processes are not always 0 or 1. Rather than the binary logic of conventional controllers, fuzzy logic controllers are multi-valued, based on degrees of membership, which better represent the state of the process. Fuzzy sets thus become an ideal tool for generating the control rules of complex systems (19).

Lewis (20) states that fuzzy control is the union of three different disciplines: conventional control theory, artificial intelligence, and fuzzy set theory (in particular, approximate reasoning and linguistic variables). Although fuzzy logic does not require a mathematical model of the plant, the detailed structure and evaluation criteria remain the same as in conventional control theory. Engineers must still understand the system behaviors and collect the expert knowledge to build the linguistic rules of the system.

Fuzzy controllers are composed of four main elements (21):

1. Rule Base: The rule base is a series of If-Then-Else statements, linguistic in nature, that describes the expert's knowledge of the system.

2. Inference Mechanism: The inference mechanism serves two main purposes:

   a. Matching: Determine to what extent each rule is applicable in the system. There is a tendency to characterize fuzzy logic as probability, yet from a 'time' point of view these two concepts are different: probability measures the likelihood that an event will occur in the future, whereas fuzzy logic measures the ambiguity of past events.

   b. Inference Step: Conclude which actions to take based on the rules determined to apply at the given time.

3. Fuzzification Interface: Fuzzification converts the crisp, numeric input values into fuzzy sets to which the inference mechanism can then apply the rules.

4. Defuzzification Interface: Defuzzification evaluates the rule base to create a fuzzy set that is then converted into crisp output values. Although defuzzification methods are numerous, two commonly used methods are the Center of Gravity (COG) and the Center-Average; a short COG sketch follows this list.
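A minimal sketch of the Center of Gravity (COG) defuzzification step, assuming the aggregated output fuzzy set has already been sampled at discrete points; the sample values are illustrative.

// Sketch of Center of Gravity (COG) defuzzification: the crisp output is the
// membership-weighted average of the sampled points of the output fuzzy set.
public class CogDefuzzification {

    static double centerOfGravity(double[] points, double[] membership) {
        double weightedSum = 0.0;
        double totalWeight = 0.0;
        for (int i = 0; i < points.length; i++) {
            weightedSum += points[i] * membership[i];
            totalWeight += membership[i];
        }
        return weightedSum / totalWeight;   // crisp output value
    }

    public static void main(String[] args) {
        // Illustrative aggregated output set sampled over [0, 4].
        double[] points     = { 0.0, 1.0, 2.0, 3.0, 4.0 };
        double[] membership = { 0.1, 0.6, 1.0, 0.4, 0.0 };
        System.out.println(centerOfGravity(points, membership));
    }
}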

Figure 8: Fuzzy Controller

Previous Research

Dimitri Bertsekas has been a pioneer in the development of NDP applications. Bertsekas et al. (22) tackled a complex, dynamic problem of resource allocation for the Theater Missile Defense (TMD) problem, categorized as a dynamic programming/Markovian decision problem. The scenario is set up such that an attacker has a pre-defined number of ballistic missiles and missile launchers, and the defender has a pre-defined number of interceptors and interceptor launchers. The goal of the attacker is to inflict the maximum damage on the assets of the defender, opposed by the defender's goal of minimizing the damage to those assets. It is assumed that the attacker will launch ballistic missiles in discrete time waves during a finite window of time. Bertsekas states that "the TMD interceptor allocation problem is a dynamic decision problem, since a series of decisions must be made over an extended period of time, with the consequences of a given decision affecting the subsequent decisions."

Bertsekas et al. solved the problem by Dynamic Programming, but due to the large number of states and the complexity, it could not compute the control policy in real time. Therefore, to overcome the real-time decision-making issue, they experimented with different approximation architectures (a neural network/multilayer perceptron architecture and a linear architecture using feature extraction) to aid in the approximation of future states, as well as a greedy-based heuristic. They experimented with 24 test cases varying from situations of "overwhelmed defender" to "overwhelmed attacker".

The general trend of the results showed Dynamic Programming as the superior algorithm in regard to remaining asset health, with the approximation architectures achieving nearly optimal results. Although the computation time to execute was not recorded, Bertsekas et al. state that the NDP methods took approximately 10 minutes to 2 hours, whereas the exact DP took as much as 36 hours. The greedy heuristic method was at or near optimal in some of the cases, but the data may be offset because the training parameters (step sizes, number of simulation runs, etc.) were fine-tuned for one of the cases. Bertsekas et al. (22) concluded that

A. The performance of the NDP algorithms was highly dependent on tuning of the training parameters.

B. None of the various NDP methods proved predominant.

C. The NDP methods scale well with the complexity and size of the problem. To further test the complexity of the problem, Bertsekas et al. suggest introducing phases of the incoming ballistic missiles, i.e. boost phase, midcourse phase, and terminal phase, which requires more decision making.


The Bertsekas TMD problem laid the foundation for this thesis, in which different aspects are expanded or added. Bertsekas et al. made a few assumptions in their model that this thesis explores further. They assumed complete and accurate knowledge of the environment and a constant engagement probability of 0.9. This thesis explores how sensitive the DP and NDP algorithms are to various engagement probabilities, as well as incorporating several different uncertainty factors. They also concluded by suggesting the introduction of ballistic phases to improve decision-making; analogously, this thesis investigates a multi-layered approach in its decision-making.

Fuzzy systems have been utilized for solving optimization problems in conjunction with dynamic programming. Senthil Kumar and Palanisamy designed a fuzzy dynamic programming method for unit commitment of a power plant, based on the Neyveli Thermal Power Station, India (23). The unit commitment problem (UCP) is the optimization of the on/off states of generating units to minimize operating costs given specified constraints over a predetermined time frame. Thus, the goal of the problem was to find the minimum cost given the load demand on the power plant and the spinning reserve requirements. The fuzzy dynamic programming (FDP) algorithm was based on a 24-hour time period at the power plant with 1-hour time steps. At each time step, the production cost was calculated based on the fuzzy inference system. The FDP was compared to the standard DP algorithm for four different generating-unit power systems.

System Units   Method   Total Cost, PU   CPU Time, s
7              DP       1                180
               FDP      0.9964029        158
10             DP       1                246
               FDP      0.937094         221
20             DP       1                502
               FDP      0.9437873        472
26             DP       1                1849
               FDP      0.9557328        1827

Table 1: Cost and CPU Time Results (23)


The FDP total cost is expressed as a fraction of the DP cost. In each of the four cases, Kumar and Palanisamy found that FDP is comparable with the conventional DP method. In addition, they found that the CPU time grew linearly as the number of system units increased, thus showing favorable results for scalability as the system grows in complexity.

Reasoning Behind NFDP

Exact mathematical optimization methods have a solid foundation and a proven ability to provide the optimal results desired in the wildland fire application. However, they lack scalability as the complexity of the system increases and cannot provide the optimal results in real time. Thus, the approximate methods act in a supporting role to approximate future states. The benefit of computing results in real time with approximate methods comes with the drawback of producing only near-optimal results; this inherent cost is well worth the benefit of real-time results. Another drawback of the mathematical optimization methods is their limited ability to handle uncertainty. Heuristic-based approaches do not require knowledge of system models and are versatile under system uncertainty. Therefore, the union of optimization methods, Neuro-Fuzzy Dynamic Programming, is selected to utilize the benefits of both the heuristic algorithms and the mathematical optimization. Figure 9 illustrates the hybrid of the mathematical and heuristic algorithms.


Figure 9: Taxonomy of Optimization Approaches


Chapter 3: Problem Formulation and Scenario Description

The resource allocation problem concerning wildland fire, which is the main focus of this thesis, is modeled as an attacker-defender style game, such that the defender is defending its assets while the attacker is attempting to destroy those assets, based on the approach developed by Bertsekas et al. (22) for the TMD problem. The assets of the system are any resources of economic value that we are striving to protect, e.g., structures (governmental, commercial, and private), agriculture, land, etc. Each asset is assigned a property value (level of importance), giving preference to protecting one asset group over another. The "attack vector" is comprised of the wildland fire hotspots that are burning through the landscape. The "defense vector" is comprised of the collection of resources available to protect the assets, e.g., ground crews/vehicles, aerial vehicles, etc. The components of the two vectors are described in more detail later in the chapter.

Although the realistic situation is a continuous system, in our scenario we assume a discrete time model. At least one fire attack will occur at each time step to ensure that the simulation terminates in finite time due to the destruction of all assets or the elimination of all fires. Proving this special case implies that the solution is also valid for time periods in which no attacking fire occurs. We assume that the attacking fires at each time step are independent of one another and that the asset attacked by a fire is selected with equal probability. Since the likelihood of a successful attacking fire or defense measure is not 100 percent, we assign the attacking fire and the defense a probability of fulfilling its intended purpose. $p_d$ denotes the probability of a fire suppression resource successfully defending an asset from an oncoming fire. Likewise, $p_a$ represents the probability of an attacking fire successfully causing damage to an asset.

Each asset is assigned an initial health value $H_i$, which is decremented by one unit each time a fire successfully reaches the target. $h_i$ represents the remaining health of asset $i$, and the asset is completely destroyed once $h_i = 0$. The end goal is the maximum health of the remaining assets by the end of the simulation, calculated as:

$J = \sum_{i=1}^{n} v_i \frac{h_i}{H_i}$    (9)

where
$n$ : number of assets
$v_i$ : property value of asset $i$
$H_i$ : total health of asset $i$
$h_i$ : remaining health of asset $i$
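A minimal sketch of the remaining-asset-health computation in Equation (9); the asset values and health levels used here are illustrative.

// Sketch of Equation (9): value-weighted remaining asset health.
public class AssetHealth {

    static double remainingAssetHealth(double[] value, double[] totalHealth, double[] health) {
        double sum = 0.0;
        for (int i = 0; i < value.length; i++) {
            sum += value[i] * (health[i] / totalHealth[i]);   // weight each asset by its property value
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] value       = { 5.0, 3.0, 1.0 };   // property values
        double[] totalHealth = { 2.0, 2.0, 2.0 };   // initial health of each asset
        double[] health      = { 2.0, 1.0, 2.0 };   // remaining health after an attack wave
        System.out.println(remainingAssetHealth(value, totalHealth, health));
    }
}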

Vector Assignment

Three vectors (reduced state vector, attack vector, defense vector) are used to store all the necessary data of the scenario and are updated after each discrete time step.

Reduced State Vector

$\bar{x} = (h_1, h_2, \ldots, h_n, m, f)$    (10)

where
$n$ : number of assets
$h_i$ : remaining health of the $i$th asset
$m$ : number of fire suppression resources
$f$ : number of fires

Attack Vector

$a = (a_1, a_2, \ldots, a_n)$    (11)

where
$n$ : number of assets
$a_i$ : number of fires attacking the $i$th asset

Subject to: $\sum_{i=1}^{n} a_i \le f$

Defense Vector

$u = (u_1, u_2, \ldots, u_n)$    (12)

where
$n$ : number of assets
$u_i$ : number of fire suppression resources defending the $i$th asset

Subject to: $\sum_{i=1}^{n} u_i \le m$

It is assumed that the attacker has partial knowledge of the reduced state vector. The attacker generates an attack vector based only on those assets in the reduced state vector whose remaining health is greater than zero; the exact health of the asset is not vital data, only that the asset is still a viable target for attack.
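A minimal sketch of how these three vectors might be held in code for the single-layer case; the class and field names are illustrative, not the thesis implementation.

// Sketch of the single-layer scenario state: reduced state vector, attack
// vector, and defense vector, updated after each discrete time step.
public class ScenarioState {

    int[] remainingHealth;   // h_i, remaining health of each asset
    int resourcesLeft;       // m, fire suppression resources still available
    int firesLeft;           // f, fires still expected

    int[] attackVector;      // a_i, fires attacking each asset this step
    int[] defenseVector;     // u_i, resources assigned to defend each asset

    ScenarioState(int numAssets, int resources, int fires) {
        remainingHealth = new int[numAssets];
        attackVector = new int[numAssets];
        defenseVector = new int[numAssets];
        resourcesLeft = resources;
        firesLeft = fires;
    }

    // The defense allocation may not exceed the resources still available.
    boolean defenseIsFeasible() {
        int total = 0;
        for (int u : defenseVector) total += u;
        return total <= resourcesLeft;
    }

    public static void main(String[] args) {
        ScenarioState s = new ScenarioState(3, 5, 10);
        s.defenseVector[0] = 2;
        System.out.println(s.defenseIsFeasible());   // true: 2 <= 5
    }
}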

Single-Layer Approach

Figure 10: Single-Layer Defense


The simplified version of the scenario is a single-layer defense used to counter incoming fires attacking an asset. If the fire suppression is unsuccessful in defending the asset, based on $p_d$, then the fire diminishes the health of the asset by one unit with probability $p_a$.

Three-Layer Approach

In order to more closely mirror the wildland fire environment, the scenario is extended to include multiple layers of defense to impede an attack. Given that a fire suppression attempt is unsuccessful, the fire moves into the next layer, where additional fire retardants may be deployed. In this case, a fire must successfully elude multiple attempts to effectively cause damage to the asset. Figure 11 depicts the three-layered defense approach. An asset is attacked by a fire, and a defense vector is selected to eliminate the attack in the first layer. If the defense cannot eliminate the threat, the surviving attacking fire moves into layer 2, and so forth. A fire that successfully navigates through all three layers reaches its intended asset.

Figure 11: Three-Layer Defense

The three-layer approach provides a more realistic model of a wildland fire. The approach calls for defense resources to be available at each layer of defense, and these must be accounted for in the defense vector and the reduced state vector. The defense vector maintains the number of defense resources deployed at each layer, and the reduced state vector maintains the number of defense resources available for use at each layer.


Reduced State Vector

$\bar{x} = (h_1, \ldots, h_n, m_1, m_2, m_3, f)$    (13)

where
$n$ : number of assets
$h_i$ : remaining health of the $i$th asset
$m_\ell$ : number of remaining fire retardant resources at layer $\ell$
$f$ : number of remaining fires

Attack Vector

$a^\ell = (a_1^\ell, a_2^\ell, \ldots, a_n^\ell)$    (14)

where
$n$ : number of assets
$a_i^\ell$ : number of fires attacking the $i$th asset at layer $\ell$

Subject to: $\sum_{i=1}^{n} a_i^\ell \le f$

Defense Vector

$u^\ell = (u_1^\ell, u_2^\ell, \ldots, u_n^\ell)$    (15)

where
$n$ : number of assets
$u_i^\ell$ : number of fire retardant resources defending the $i$th asset at layer $\ell$

Subject to: $\sum_{i=1}^{n} u_i^\ell \le m_\ell$


The decision-making processes for the defense vector at each layer are independent of one another. Each layer is supplied with the updated reduced state vector and attack vector.

Receding Horizon Approach

The problem takes the three-layer approach one step further to mirror the real-world situation. In a finite-horizon model (24), an agent forgoes current rewards to optimize its reward over the next $h$ discrete steps. In subsequent steps, the agent makes decisions over the remaining $h-1$, $h-2$, and so on, steps until the reward is one step away. This approach assumes that the agent knows how far away the horizon is for its decision-making. A modification of the finite horizon is the receding horizon: the agent continuously makes decisions based on a horizon that always appears $h$ discrete steps away. The receding horizon is ideal for the wildland fire because the terminal stage of the scenario is unknown. The three-layer approach is extended to a cyclical path, and the algorithm makes decisions based on a three-step horizon (the three-layer approach), as illustrated in Figure 12. The receding horizon creates a virtually continuous environment in which attacks are initiated at discrete steps, and the simulation continues until one of two conditions is satisfied: the total number of fires is exhausted or the total number of resources is depleted.

Figure 12: Receding Horizon
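A minimal sketch of the receding-horizon simulation loop described above, assuming hypothetical helper routines for igniting fires and choosing a layer-by-layer defense; the logic (including deterministic fire elimination) is simplified for illustration and is not the thesis implementation.

// Sketch of the receding-horizon loop: at every time step new fires are
// ignited, the defender allocates resources layer by layer, and the loop
// ends when either the fires or the defense resources are exhausted.
public class RecedingHorizonSketch {

    static final int LAYERS = 3;

    // Hypothetical helpers, standing in for the scenario and control policy.
    static int igniteFires(int firesLeft) { return Math.min(firesLeft, 2); }
    static int defendLayer(int layer, int attacks, int resourcesLeft) {
        return Math.min(attacks, Math.min(1, resourcesLeft));
    }

    public static void main(String[] args) {
        int firesLeft = 10;
        int resourcesLeft = 15;
        while (firesLeft > 0 && resourcesLeft > 0) {
            int attacks = igniteFires(firesLeft);
            firesLeft -= attacks;
            for (int layer = 1; layer <= LAYERS && attacks > 0; layer++) {
                int deployed = defendLayer(layer, attacks, resourcesLeft);
                resourcesLeft -= deployed;
                attacks -= deployed;   // simplification: each deployment stops one fire
            }
            // any fires still alive after layer 3 would now damage their assets
        }
    }
}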


Wildland Fire Predictions

The predictive services (25) program is a unit under the guidance of the National Predictive Services Subcommittee, located throughout the United States at various NICC and GACC establishments. The program "was developed to provide decision support information needed to be more proactive in anticipating significant fire activity and determining resource allocation needs…[and] consists of three primary functions: fire weather, fire danger/fuels, and intelligence/resource status information (25)." It provides daily outlooks up to 3-month seasonal trend forecasts and aids in short- and long-term strategies for resource allocation. The predictive services function allows us to make some assumptions for the wildland simulation and construct initial inventories for the attackers and defenders:

- The total number of wildland fires for the scenario is estimated based on the 3-month seasonal trend forecasts.
- The daily outlooks provide the maximum number of fires that can be created at each discrete time step of the simulation.
- The total number of defense resources available for use over the length of the simulation is set for each layer.
- The maximum number of defense resources available for use at each layer at each time step is also set.

Example: First Wave of Attack (1 Time Step)

                          Attacker   Defender
                                     Layer 1   Layer 2   Layer 3
Total available           10         5         5         5
Maximum per time step     4          1         2         2
Probabilities             $p_a = 1.0$ (fire damages asset), $p_d = 0.9$ (defense eliminates fire)

Table 2: Initial Inventories for Attacker and Defender


The attacker has targeted three distinct assets. The probability of a fire causing damage to an asset is $p_a = 1.0$; the probability of a defense resource eliminating an attack threat is $p_d = 0.9$. The assets are assigned initial property values and total health values. Before the first wave of attacks, the remaining asset health of Assets 1, 2, and 3 is calculated from Equation (9) as

Asset 1 Asset 2 Asset 3

(16)

In the first wave of attack, three fires are ignited, targeting each of the three assets. In the first layer, the control policy deploys one defense resource to defend its highest-valued asset (the other two fires move freely on to the second layer), and the defense resource is not able to eliminate the threat. In the second layer, the control policy deploys two more defense resources to protect its two highest-valued assets. The attacking fire targeting the highest-valued asset is eliminated. In the third layer, two defense resources attempt to stop the two remaining attacking fires and successfully defend the lowest-valued asset. Since $p_a = 1.0$, the attacking fire that gets through successfully decrements the health of the middle asset by 1 unit. The remaining asset health is

(17)

Figure 13 illustrates the sequence of defense measures used to defend the three assets.

Figure 13: Schematic of First Attack Wave


The figure is shown as a three-layer approach for illustrative purposes but adheres to the receding horizon method. The control policy used for the example is random and is not representative of any of the four algorithms used in this thesis.

A total of five defense measures were used during the first attack wave, decrementing the number of defense resources available at each respective layer. The environment produced three attacking fires, two reaching the assets and decrementing the value of each by one unit. The initial inventory is reduced to 8 after the first attack wave. The scenario continues with further attacks in new waves until the remaining asset health or the total number of fires is exhausted.

Uncertainty Analysis

Ideally, the information gathered would be complete and accurate, simplifying many of the constraints imposed on the problem. However, we must handle uncertainty within our system, since error may be introduced due to the reliability of sensors. Three different uncertainty cases are explored:

1. Fire Error Percentage: The fire error percentage is the percentage increase in the number of fires over the estimated prediction. The control algorithms utilize the estimated number of fires to allocate resources throughout the simulation, so the increase in fires ultimately affects the results of the resource allocation.

2. Breakup Percentage: The breakup percentage is the possibility that a fire "jumps" and creates additional hotspots, which occurs only at the third layer. An additional fire is added to the attack vector accordingly.

3. False Alarm Percentage: The false alarm percentage is the possibility that a fire hotspot is rendered harmless in the third layer. The fire is removed from the attack vector accordingly. A short sketch of how the breakup and false alarm cases perturb the attack vector follows this list.
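A minimal sketch of how the breakup and false alarm percentages might perturb the third-layer attack vector during simulation; the random draws, rates, and names are illustrative, not the thesis implementation.

import java.util.Random;

// Sketch: perturb the layer-3 attack vector with breakups (a fire jumps and
// spawns an extra hotspot) and false alarms (a reported fire is harmless).
public class UncertaintySketch {

    static final Random RNG = new Random();

    static int[] perturbLayerThree(int[] attackVector, double breakupPct, double falseAlarmPct) {
        int[] perturbed = attackVector.clone();
        for (int i = 0; i < perturbed.length; i++) {
            for (int fire = 0; fire < attackVector[i]; fire++) {
                if (RNG.nextDouble() < breakupPct) {
                    perturbed[i]++;               // fire jumps: one extra hotspot on this asset
                }
                if (RNG.nextDouble() < falseAlarmPct && perturbed[i] > 0) {
                    perturbed[i]--;               // false alarm: remove a reported fire
                }
            }
        }
        return perturbed;
    }

    public static void main(String[] args) {
        int[] attacks = { 1, 2, 0 };
        System.out.println(java.util.Arrays.toString(perturbLayerThree(attacks, 0.10, 0.05)));
    }
}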

Figures of Merit

The figures of merit for the four control methodologies (greedy-based heuristic, DP, NDP, NFDP) are based on three parameters: execution time, remaining asset health, and scalability. The execution time quantifies the requirement for real-time decisions; a faster execution time equates to a quicker reaction in finding the control policy. The remaining asset health is a measure of how well the control algorithm performed in protecting its assets from the attacker; higher remaining asset health indicates a more successful control algorithm. Therefore, a promising algorithm is one with a fast execution (sufficient for real-time decision-making) and a high remaining asset health. Finally, we examine the scalability of the system based on the complexity of its initial configuration.

Five scenarios are simulated, each with a unique combination of initial attacker and defender inventories. The scenarios include a variety of situations in which each entity has a distinct advantage and in which the entities are equal. The same five scenarios are repeated with uncertainty in the system, as listed in the Uncertainty Analysis section.

Software and Platform

The algorithms were coded in the Java (version 1.6.0_14) programming language within the NetBeans IDE (version 6.5.1). Java is an object-oriented, class-based, high-level programming language that can be executed on any Java Virtual Machine (JVM). Java code is compiled into Java bytecode rather than platform-specific machine code, which makes the language robust and portable for any computer architecture, e.g., Windows, Macintosh, UNIX, and Linux. Therefore, developers rely on the "write once, run anywhere" idea. This portability is known to hamper the performance of the Java platform, but advances in virtual machines and compilers have improved the performance to within the same category as platform-dependent languages (26). The NetBeans IDE is an open-source, free software package that provides a graphical user interface for code development and the key Java platform to execute the algorithms (27).

Syscape

In conjunction with Edaptive Computing, Inc., the Java code can be implemented into Edaptive’s product Syscape (28), pictured in Figure 14.

Figure 14: Primary Features of Edaptive's Product – Syscape (29)

Syscape is a highly flexible and customizable framework technology that can perform system analysis on the structure and behavior of systems-of-systems to assist in decision-making. It is capable of capturing complex systems “that integrate multiple, independent and self-contained systems to provide a capability greater than the sum of its constituent parts” (29). Syscape permits users to enter simulation properties, such as resource organization and constraints, and to model processes and rules for the agents of the system.


Chapter 4: Benchmark Methods

The DP and greedy-based heuristic algorithms are used to benchmark the performance results and were designed to strictly adhere to the Bertsekas approach. Although Bertsekas also presents the NDP methodology, that method is described in the next chapter, as we made various implementation changes to the training of the neural network and to the integration of the multi-layer approach.

Greedy-Based Heuristic

The premise of the greedy-based heuristic approach is to give preference to the highest valued assets at all times; remaining assets are protected based on resource availability and property value.

The algorithm strictly adhered to the design of Bertsekas with the exception of the multi-layered approach, in which resource allocation decisions at each layer are made independently of one another.

Once a fire is detected in a particular layer, the algorithm inventories the current state of the assets, prioritizing them by property value. The heuristic matches one-for-one every attack with a defense for the current highest property-valued assets. If the number of remaining defense resources is greater than the expected number of remaining fires, then the surplus of defense resources is used on lower-valued assets in decreasing order, again matching every attack with one defense. Otherwise, no further defense allocation is made.
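The following is a minimal sketch of this allocation logic, assuming a simplified representation in which each asset has a property value and a number of incoming fires; the names (Asset, allocateGreedy) and the modern Java syntax are illustrative assumptions, not the thesis implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative greedy allocation: always defend the highest-valued asset one-for-one,
// and spend the surplus on lower-valued assets only while defenses exceed the
// expected number of remaining fires.
public class GreedyHeuristicDemo {

    record Asset(String name, int propertyValue, int incomingFires) {}

    static List<String> allocateGreedy(List<Asset> assets, int defenses, int expectedRemainingFires) {
        List<Asset> byValue = new ArrayList<>(assets);
        byValue.sort(Comparator.comparingInt(Asset::propertyValue).reversed());

        List<String> plan = new ArrayList<>();
        boolean highestValued = true;
        for (Asset a : byValue) {
            for (int f = 0; f < a.incomingFires(); f++) {
                // Lower-valued assets are only defended out of surplus resources.
                if (!highestValued && defenses <= expectedRemainingFires) {
                    return plan;
                }
                if (defenses == 0) {
                    return plan;
                }
                plan.add("defend " + a.name());
                defenses--;
            }
            highestValued = false;
        }
        return plan;
    }

    public static void main(String[] args) {
        List<Asset> assets = List.of(
                new Asset("power plant", 15, 2),
                new Asset("business", 10, 1),
                new Asset("home", 5, 1));
        System.out.println(allocateGreedy(assets, 3, 2));
    }
}
```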

Dynamic Programming

The problem is cast as a Markovian decision process, such that $p(\bar{x}' \mid \bar{x}, a, d)$ represents the probability that the new state will be $\bar{x}'$ given the current reduced state $\bar{x}$, attack vector $a$, and defense vector $d$. A Markov chain describes the transition probabilities of the system, the probability $p(a \mid \bar{x})$ that attack $a$ will occur given the current state $\bar{x}$. Since the problem is set up as a stochastic shortest path problem, $\hat{J}(\bar{x})$ can be formulated as:

$$\hat{J}(\bar{x}) = \sum_{a} p(a \mid \bar{x}) \, J(\bar{x}, a) \qquad (18)$$

where $J(\bar{x}, a)$ is the optimal expected long-term cost starting at state $(\bar{x}, a)$ and $p(a \mid \bar{x})$ is the conditional probability that the next attack is $a$ given the current state $\bar{x}$. With the reduced optimal cost $\hat{J}(\bar{x})$ at reduced state $\bar{x}$, Bellman’s equation can be rewritten as

$$\hat{J}(\bar{x}) = \sum_{a} p(a \mid \bar{x}) \min_{d} \left[ g(\bar{x}, a, d) + \hat{J}(\bar{x}') \right] \qquad (19)$$

where

$$g(\bar{x}, \bar{x}') = \sum_{i} \left( c_i(\bar{x}) - c_i(\bar{x}') \right) \qquad (20)$$

Equation (20) represents the one time step cost to transition from state $\bar{x}$ to state $\bar{x}'$. The goal is to find the defense vector $d$ that minimizes the expected long-term cost given the current state $\bar{x}$ and attack vector $a$. Since we know that the system will terminate in finite time (based on the previous assumptions that at least one attacking fire occurs at each time point and that either the attacking fires are exhausted or the assets are destroyed), Bellman’s equation will converge to a unique solution, the reduced optimal cost $\hat{J}(\bar{x})$, for all states $\bar{x}$.

The DP approach is complicated by the use of multiple layers of defense. The transition from state $\bar{x}$ to state $\bar{x}'$ inherently includes substates that result from the engagement of defense resources against the attack at different layers but do not incur a cost; as previously noted, DP requires the transition cost to be summative, and a fire eluding a defense at any layer other than the final layer has not yet reached an asset. The defense allocation is the composite of all the substate defenses selected; the layer defense allocations depend on previous layer engagement results and the expectations of future layer engagements. The defense vector selected at each layer is the composite that minimizes the expected cost at the end of the layer sequence. Figure 15 shows the substates that transition from state $\bar{x}$ to state $\bar{x}'$.

Figure 15: State transitions with Substates

DP lacks the ability to make good decisions in the absence of a cost to transition through the substates. To overcome this issue, the cost to transition from state $\bar{x}$ to state $\bar{x}'$ is distributed to the appropriate substates as if each were a one-layer approach. The control action selected is then based on the summation of the substate costs plus the optimal cost-to-go, adhering to the constraint that cost is summative. The DP algorithm is updated as:

$$\hat{J}(\bar{x}) = \sum_{a} p(a \mid \bar{x}) \min_{d} \left[ \sum_{k=1}^{3} g_k(\bar{x}, a, d_k) + \hat{J}(\bar{x}') \right] \qquad (21)$$

Theoretically, equation (22) can be solved by classical methods by iterating over the equation such that the generated sequence $J_t(\bar{x})$ converges to the optimal cost $\hat{J}(\bar{x})$ for all states $\bar{x}$:

$$J_{t+1}(\bar{x}) = \sum_{a} p(a \mid \bar{x}) \min_{d} \left[ g(\bar{x}, a, d) + J_t(\bar{x}') \right] \qquad (22)$$


However, the solution by exact methods is computationally expensive due to the large number of states. To overcome this drawback, a series of Monte Carlo simulations is performed on scenario models to collect empirical data for the expected engagement result, given the current state $\bar{x}$, attack vector $a$, and defense vector $d$. This essentially allows the DP to learn from observing its own behavior and eliminates the need to perform probabilistic analysis to create the Markov chain of transition probabilities. Bertsekas’ TMD work is extended by decreasing the engagement probability to examine the sensitivity of the DP algorithm. We expect the performance of the DP algorithm to decrease provided that the number of Monte Carlo simulations does not change between the various engagement probabilities: a lower engagement probability permits more possible transition states compared to a system that is (nearly) deterministic. For the DP to be trained properly, it must explore as many paths as possible to obtain an accurate expected value. This is not the case for deterministic systems, where the expected value to transition from state $\bar{x}$ to $\bar{x}'$ is static. To compensate for the need for the DP to train more, the number of Monte Carlo simulations is increased for the lower engagement probabilities.
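As an illustration of this sampling idea, the sketch below estimates the expected outcome of a single engagement by repeated simulation instead of an explicit Markov chain; the engagement model (each assigned defense independently stops its fire with a fixed probability) and all names are simplifying assumptions, not the thesis implementation.

```java
import java.util.Random;

// Illustrative Monte Carlo estimate of the expected number of fires that leak
// through an engagement, given an engagement (kill) probability per defense.
public class EngagementMonteCarloDemo {

    static double expectedLeakers(int fires, int defenses, double engagementProbability,
                                  int samples, Random rng) {
        double total = 0.0;
        for (int s = 0; s < samples; s++) {
            int leakers = 0;
            for (int f = 0; f < fires; f++) {
                boolean defended = f < defenses;                 // one-for-one assignment
                boolean stopped = defended && rng.nextDouble() < engagementProbability;
                if (!stopped) {
                    leakers++;
                }
            }
            total += leakers;
        }
        return total / samples;                                  // empirical expected value
    }

    public static void main(String[] args) {
        Random rng = new Random(7);
        // With a 0.9 engagement probability the system is nearly deterministic,
        // so relatively few samples already give a stable estimate.
        System.out.printf("p=0.9: %.3f leakers%n", expectedLeakers(3, 3, 0.9, 15, rng));
        // Lower probabilities spread the outcomes over more states and need more samples.
        System.out.printf("p=0.7: %.3f leakers%n", expectedLeakers(3, 3, 0.7, 60, rng));
    }
}
```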


Chapter 5: NDP Formulation

The Neuro Dynamic Programming (NDP) algorithm is formulated based on the concept of reinforcement learning, the idea of “an agent that must learn behavior through trial-and-error interactions with a dynamic environment” (24).

Figure 16: The Standard Reinforcement-Learning Model (24)

The agent is provided the current state of the environment via perception and performs an action that changes the state of the environment. The subtle difference between the figure above and a general closed feedback system is the environment variable fed into the agent, providing both an input signal $i$ to the agent and a reinforcement signal $r$, a scalar value that numerically values the state transition. The input is how the agent perceives the environment, either with complete state knowledge or with some added noise in its perception. The overall goal of the agent is to select a control policy that maximizes the long-run summation of the reinforcement signal. Therefore, the “learning” is accomplished through systematic trial-and-error algorithms, such as Artificial Neural Networks (ANNs) (24).
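A minimal sketch of this interaction loop is shown below; the Agent and Environment interfaces are hypothetical placeholders, not the thesis code, and serve only to illustrate the state-action-reinforcement cycle just described.

```java
// Illustrative reinforcement-learning loop: the agent perceives the state,
// acts, and receives a scalar reinforcement signal from the environment.
public class ReinforcementLoopDemo {

    interface Environment {
        double[] state();                 // input signal i (possibly noisy perception)
        double step(int action);          // apply the action, return reinforcement r
        boolean terminated();
    }

    interface Agent {
        int selectAction(double[] state);
        void observe(double[] state, int action, double reinforcement);
    }

    static double runEpisode(Agent agent, Environment env) {
        double totalReinforcement = 0.0;
        while (!env.terminated()) {
            double[] s = env.state();
            int a = agent.selectAction(s);
            double r = env.step(a);       // environment transitions to a new state
            agent.observe(s, a, r);       // learning update (e.g. ANN weight adjustment)
            totalReinforcement += r;
        }
        return totalReinforcement;        // the agent tries to maximize this long-run sum
    }
}
```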

The NDP algorithm focuses on approximating the reduced optimal cost function $\hat{J}(\bar{x})$ from equation (19) by utilizing ANNs.


$$\tilde{J}(\bar{x}, r) = \sum_{a} p(a \mid \bar{x}) \min_{d} \left[ g(\bar{x}, a, d) + \tilde{J}(\bar{x}', r) \right]$$

The approximation produces near-optimal results as compared to the DP algorithm but can significantly decrease the execution time of the algorithm; these two ideas are explored later in the chapter. A suitable approximation $\tilde{J}(\bar{x}, r)$ replaces the optimal expected cost-to-go $\hat{J}(\bar{x})$ of the DP algorithm. The variable $r$ represents the vector of parameters used in conjunction with the future state to approximate the cost-to-go function (22). In this case, the vector of parameters is the synaptic weights and thresholds of the artificial neural network.

Neural Network Architecture

Aydin Gürel (30) developed a feed-forward neural network in Java to approximate a simple function using batch training. The source code was acquired under the GPLv3 license, which allows free use and modification of the source code. Based on the soundness of the design and the GPLv3 license agreement, this neural network provided the foundation of the approximation method used to further develop the NDP algorithm. The structure of the ANN was modified to a fully-connected, feed-forward network with a single hidden layer of eight neurons, designed specifically for the scenario described in Chapter 3. Figure 17 illustrates the architecture and how the reduced state vector information is passed into the input neurons.


Figure 17: Fully Connected ANN with 7 Input Neurons, 8 Hidden Neurons, & 1 Output Neuron

All the neurons in the hidden layer and the neuron in the output layer utilize a symmetrical sigmoid function (31) for the activation function

$$f(z) = \frac{1 - e^{-z}}{1 + e^{-z}} \qquad (23)$$

where $z$ represents the input into the neuron. The sigmoid function is a commonly used activation function as it is continuous and differentiable (31); this becomes important for the learning algorithm described later in the chapter. The weight matrix $w$ is a 64 x 1 matrix, with each element representing a synapse connection in the ANN.

$$w = \begin{bmatrix} w_1 & w_2 & \cdots & w_{64} \end{bmatrix}^{T} \qquad (24)$$
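A minimal sketch of a forward pass through such a 7-8-1 network is given below, assuming the symmetric sigmoid takes the common bipolar form $f(z) = (1 - e^{-z})/(1 + e^{-z})$; the array layout, the bias handling, and all names are illustrative assumptions, not Gürel's original code.

```java
import java.util.Random;

// Illustrative forward pass of a fully connected 7-8-1 feed-forward network
// with a symmetric sigmoid activation on the hidden and output neurons.
public class FeedForwardDemo {

    static final int INPUTS = 7, HIDDEN = 8;

    final double[][] wInputHidden = new double[HIDDEN][INPUTS];  // 56 synapses
    final double[] wHiddenOutput = new double[HIDDEN];           // 8 synapses (64 total)
    final double[] hiddenBias = new double[HIDDEN];
    double outputBias;

    FeedForwardDemo(Random rng) {
        for (int h = 0; h < HIDDEN; h++) {
            for (int i = 0; i < INPUTS; i++) wInputHidden[h][i] = rng.nextDouble() - 0.5;
            wHiddenOutput[h] = rng.nextDouble() - 0.5;
            hiddenBias[h] = rng.nextDouble() - 0.5;
        }
        outputBias = rng.nextDouble() - 0.5;
    }

    // Symmetric (bipolar) sigmoid: continuous and differentiable, range (-1, 1).
    static double sigmoid(double z) {
        return (1.0 - Math.exp(-z)) / (1.0 + Math.exp(-z));
    }

    // Approximate cost-to-go for a reduced state encoded as 7 input values.
    double costToGo(double[] reducedState) {
        double[] hidden = new double[HIDDEN];
        for (int h = 0; h < HIDDEN; h++) {
            double z = hiddenBias[h];
            for (int i = 0; i < INPUTS; i++) z += wInputHidden[h][i] * reducedState[i];
            hidden[h] = sigmoid(z);
        }
        double z = outputBias;
        for (int h = 0; h < HIDDEN; h++) z += wHiddenOutput[h] * hidden[h];
        return sigmoid(z);
    }

    public static void main(String[] args) {
        FeedForwardDemo net = new FeedForwardDemo(new Random(1));
        System.out.println(net.costToGo(new double[]{1, 0, 2, 1, 5, 10, 15}));
    }
}
```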


The neural network is trained using Approximate Policy Iteration with Monte Carlo Simulations by iterating a predefined number of times over the following four steps; a condensed sketch of this training loop follows the four steps. Figure 18 shows an illustrated view of the cyclical training process.

Figure 18: ANN Training Process

1. Neural Network Approximator

The weight matrix $w$ is initialized with random weights and the neural network generates values for the cost-to-go function $\tilde{J}(\bar{x}, r)$.

2. Policy Update

The optimal control policy is obtained based on the generated cost-to-go values from the ANN, by the following equation:

$$\bar{\mu}(\bar{x}, a) = \arg\min_{d} \left[ g(\bar{x}, a, d) + \tilde{J}(\bar{x}', r) \right] \qquad (25)$$

3. Monte Carlo Simulations

An initial reduced state is randomly created and all possible attack vectors are generated for that state. The policy from step 2 generates the defense vector for every $(\bar{x}, a)$ pair. The simulation continues the scenario, generating defense vectors for every $(\bar{x}, a)$ pair until a terminating state is reached. Sample costs are calculated from the equation:

$$\hat{c}(\bar{x}) = \sum_{a} p(a \mid \bar{x}) \left[ g\big(\bar{x}, a, \bar{\mu}(\bar{x}, a)\big) + \hat{c}(\bar{x}') \right] \qquad (26)$$

The combination of the reduced state and its respective sample cost represents the training set used for training the neural network. The Monte Carlo simulations are executed multiple times such that a sufficient training set is generated.

4. Neural Network Training

The collection of input-output pairs from step 3 is used to batch train the neural network with the goal of minimizing the error between the output of the neural network and the sample costs. Training is the means by which the algorithm learns from its own behavior. The learning algorithm uses back-propagation with the gradient descent method; since the gradient descent method requires differentiating the activation function to minimize the error function, the sigmoid function guarantees continuity and differentiability (31). The neural network is trained for up to 500 passes over the batch training data; training is terminated early if an error ratio tolerance of 0.15 is met for 10 consecutive training steps.
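The following is a condensed sketch of the four-step cycle, assuming hypothetical interfaces (CostToGoNetwork, ScenarioSimulator, TrainingPair) that stand in for the thesis code; it only illustrates the control flow of approximate policy iteration with Monte Carlo simulations, with the sample cost realized as the accumulated stage costs of each rollout.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative control flow of approximate policy iteration using Monte Carlo simulations.
public class ApproximatePolicyIterationDemo {

    interface CostToGoNetwork {
        double evaluate(double[] reducedState);                       // step 1: J~(x, r)
        void batchTrain(List<TrainingPair> samples, int maxEpochs, double errorTolerance);
    }

    interface ScenarioSimulator {
        double[] randomInitialState();
        boolean isTerminal(double[] state);
        // Greedy policy update (step 2): pick the defense vector minimizing
        // one-step cost plus approximated cost-to-go of the resulting state.
        int[] greedyDefense(double[] state, int[] attack, CostToGoNetwork net);
        int[] sampleAttack(double[] state);
        double stageCost(double[] state, int[] attack, int[] defense);
        double[] nextState(double[] state, int[] attack, int[] defense);
    }

    record TrainingPair(double[] reducedState, double sampleCost) {}

    static void train(CostToGoNetwork net, ScenarioSimulator sim,
                      int iterations, int rolloutsPerIteration) {
        for (int it = 0; it < iterations; it++) {
            List<TrainingPair> samples = new ArrayList<>();
            for (int m = 0; m < rolloutsPerIteration; m++) {           // step 3: Monte Carlo rollouts
                double[] state = sim.randomInitialState();
                List<double[]> visited = new ArrayList<>();
                List<Double> costs = new ArrayList<>();
                while (!sim.isTerminal(state)) {
                    int[] attack = sim.sampleAttack(state);
                    int[] defense = sim.greedyDefense(state, attack, net);
                    visited.add(state);
                    costs.add(sim.stageCost(state, attack, defense));
                    state = sim.nextState(state, attack, defense);
                }
                // Sample cost of each visited state = accumulated stage costs to termination.
                double costToGo = 0.0;
                for (int t = visited.size() - 1; t >= 0; t--) {
                    costToGo += costs.get(t);
                    samples.add(new TrainingPair(visited.get(t), costToGo));
                }
            }
            net.batchTrain(samples, 500, 0.15);                        // step 4: batch back-propagation
        }
    }
}
```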

After completion of neural network training, the NDP algorithm is primed for production use.

Although it takes time to train the ANN, the process listed above is completed offline so that the parameter vector $r$ is adjusted accordingly; subsequently, the ANN becomes a ‘fixed network’ (17) once simulation runs are executed. Since the ANN is trained offline, the training time is not included in the execution time of the simulation.


Challenges

Individuals who employ ANNs must make a series of nontrivial decisions about the parameters of the neural network that can ultimately affect the outcome of the simulation. There are no formal rules or guidelines for the optimal neural net architecture, i.e. the number of hidden layers, the number of neurons, or the correct activation function. Duin (32) states that ANNs are often larger than necessary and can be downgraded to a smaller design without significant change in behavior, but he continues that ‘some redundancy in a network is advantageous for good training’ (32). Decisions on the ANN architecture are usually based on previous experience and knowledge of the application at hand, accompanied by minor modifications during testing. Adjustments in the error ratio tolerance have an inverse relationship with respect to training time and error in the neural net. Tuning the learning rate and momentum rate parameters for optimal convergence is usually done by trial-and-error or through a random search (31). Overtraining an ANN is also a common pitfall of neural networks, similar to selecting too large a degree of polynomial for curve fitting. The goal of the neural network is to generalize the mapping from inputs to outputs, but overtraining may cause the network to fit the noise and errors present in the training set.

NDP overcomes Bellman’s curse of dimensionality by approximating the cost-to-go function. The computational burden of DP is reduced since high-dimensional cost-to-go functions are approximated by low-dimensional functions, resulting in quicker execution. The drawback is that results obtained via NDP are not optimal since the optimal cost-to-go function is approximated. If the ANN is trained properly, it is possible to achieve results that are near-optimal. In certain cases, as in wildland fire scenarios, forgoing optimal results for near-optimal results so that decisions can be made in real-time is an acceptable tradeoff.


Chapter 6: NFDP Formulation

Removing the assumptions about the number of fires and their characteristics introduces a new complexity to the problem: uncertainty in the inputs that the prior algorithms, DP and NDP, struggle to cope with during simulations. As the NICC provides predictions for wildland fires, we are reminded that these are solely predictions; changes in weather conditions and sensor reliability may feed inaccurate and incomplete data into the simulation. The NFDP algorithm seeks to minimize the impact of the uncertainty by providing robustness in the system and scalability through local decision-making.

The NDP algorithm uses just one trained ANN to compute the approximate cost-to-go $\tilde{J}(\bar{x}, r)$ in its formulation. The NFDP approach extends this formulation and utilizes a neural network at each layer, i.e. $\tilde{J}_k(\bar{x}, r_k)$ where $k$ is the layer. The one step cost defined in equation (20) is modified to estimate the contribution of each step cost by utilizing a fuzzy logic controller. Expert knowledge is incorporated into the controller through linguistic reasoning to estimate the cost based on the two parameters described below.

$$g_k(\bar{x}, \bar{x}') = \sum_{i} \bar{V}_{k,i} \left( c_i(\bar{x}) - c_i(\bar{x}') \right) \qquad (27)$$

where $\bar{V}_{k,i}$ is the normalized weight defined as

$$\bar{V}_{k,i} = \frac{V_{k,i}}{\sum_{j} V_{k,j}} \qquad (28)$$

and $V_{k,i}$ is the output of the fuzzy logic controller, defined in more detail below.

The defense resources must be used effectively given that the estimate of the maximum number of predicted fires may be inaccurate. The rule-base for the fuzzy inference system is based on the rules of engagement by Naveh et al. (33) in the Strategic Defense Initiative and the terminal-phase Arrow interceptor system. The framework is based on the theater missile defense problem, and the transition to the wildland fire scenario is mostly transparent as the concepts of the two are similar. Naveh states that the interceptor allocation is based on the defended asset (DA) requirement per asset and the specific layer, and that the classification of each is fuzzy in nature. In addition, a few discrete layers without the relative location of the missile within the layer may cause suboptimal allocation of interceptors. These concepts hold true for the wildland fires, and the fuzzy inputs are the required defense level per defended asset (DA) and the normalized distance of the fire along its trajectory (S*). The output of the fuzzy inference system is the value $V_k$ (for each asset) used in equation (27).

Fuzzy logic is a tool to handle the uncertainties in the system; it is able to smooth out control actions based on the idea of many-valued logic, resulting in system robustness.

Fuzzy Inference System

Figure 19 is a schematic of the multi-input, single-output Sugeno Fuzzy Inference System (FIS).

Figure 19: Block Diagram of Fuzzy Logic System

The fuzzification stage calculates the degree of membership of the crisp input values: the asset level of defense (DA) and the normalized distance (S*). Each input uses a combination of sigmoid and Gaussian distributions for its fuzzy sets, and the configuration of the input membership functions consists of five fuzzy sets for each:

S* = {boost, boost-mid, midcourse, mid-term, terminal}


DA = {low, low-medium, medium, medium-high, high}

Figure 20 shows the MATLAB depictions of the membership functions for the two fuzzy inputs, S* and DA.

Figure 20: Fuzzy Inputs: S* (left) and DA (right) Membership Functions

The second stage of the FIS applies the heuristic rules, a series of IF-THEN statements of the format:

IF normalized distance is boost AND level of defense is high THEN output is medium

Within a rule, the AND conjunction evaluates the minimum of the antecedent membership values. Every rule is a unique combination of the fuzzy sets of the two inputs, forming a total of 25 rules.

| DA \ S* | Boost | Boost-Mid | Midcourse | Mid-Term | Terminal |
| Low | Zero | Very Light | Light | Slightly Medium | Medium |
| Low-Medium | Very Light | Light | Slightly Medium | Medium | Very Medium |
| Medium | Light | Slightly Medium | Medium | Very Medium | Slightly Heavy |
| Medium-High | Slightly Medium | Medium | Very Medium | Slightly Heavy | Heavy |
| High | Medium | Very Medium | Slightly Heavy | Heavy | Very Heavy |

Table 3: Fuzzy Rule Base for Calculation of Weights, Vk

The inference system is designed as a Sugeno style, so the outputs of the rule base are crisp values, denoted in Table 4.

| Output Function Name | Value (zj) |
| Zero | 0 |
| Very Light | 0.2 |
| Light | 0.25 |
| Slightly Medium | 0.5 |
| Medium | 0.75 |
| Very Medium | 1 |
| Slightly Heavy | 1.5 |
| Heavy | 2 |
| Very Heavy | 3 |

Table 4: Output Membership Values

Finally, the defuzzification stage converts the rule base results into a final crisp output value. The output level of each rule is weighted by the firing strength of the rule. The final output of the FIS is the weighted average of all rule outputs, computed as:

$$V = \frac{\sum_{j=1}^{N} w_j \, z_j}{\sum_{j=1}^{N} w_j} \qquad (29)$$

where $N$ is the total number of rules, $w_j$ is the weight (firing strength) of rule $j$, and $z_j$ is the output level of rule $j$.
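A minimal sketch of this two-input Sugeno evaluation is shown below; it uses only Gaussian membership functions with assumed centers and spreads, and a rule-output table chosen to mirror the monotone pattern of Table 3 with the values of Table 4, so none of these parameters are the tuned controller values.

```java
// Illustrative Sugeno FIS evaluation for the two inputs S* and DA:
// fuzzify with Gaussian membership functions, AND rule antecedents with min,
// and defuzzify with the weighted average of crisp rule outputs.
public class SugenoFisDemo {

    static double gaussian(double x, double center, double sigma) {
        double d = (x - center) / sigma;
        return Math.exp(-0.5 * d * d);
    }

    // Five fuzzy sets per input, evenly spread over [0, 1] (assumed centers/spreads).
    static double[] fuzzify(double x) {
        double[] centers = {0.0, 0.25, 0.5, 0.75, 1.0};
        double[] mu = new double[5];
        for (int i = 0; i < 5; i++) mu[i] = gaussian(x, centers[i], 0.12);
        return mu;
    }

    // Crisp rule outputs z[i][j] for DA set i and S* set j (assumed, Table 3/4 pattern).
    static final double[][] Z = {
        {0.00, 0.20, 0.25, 0.50, 0.75},
        {0.20, 0.25, 0.50, 0.75, 1.00},
        {0.25, 0.50, 0.75, 1.00, 1.50},
        {0.50, 0.75, 1.00, 1.50, 2.00},
        {0.75, 1.00, 1.50, 2.00, 3.00}
    };

    static double evaluate(double defenseLevelDA, double normalizedDistanceS) {
        double[] muDA = fuzzify(defenseLevelDA);
        double[] muS = fuzzify(normalizedDistanceS);
        double weightedSum = 0.0, weightTotal = 0.0;
        for (int i = 0; i < 5; i++) {
            for (int j = 0; j < 5; j++) {
                double firing = Math.min(muDA[i], muS[j]);   // AND = minimum
                weightedSum += firing * Z[i][j];
                weightTotal += firing;
            }
        }
        return weightTotal == 0.0 ? 0.0 : weightedSum / weightTotal;  // equation (29)
    }

    public static void main(String[] args) {
        System.out.printf("V = %.3f%n", evaluate(0.8, 0.9));  // high DA, near-terminal S*
    }
}
```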

Challenges

Without a mathematical model of the system, tuning the parameters of the FIS can be a trial-and-error process. The typical approach is to start with a small number of membership functions and rules and, once satisfied with the general outcome, fine-tune by expanding the membership functions and rules. The controller was initially designed with only three membership functions for each input and four output membership functions; this configuration had a rule base of 9 rules.

There are a few different methods used to fine-tune the fuzzy controller. First, the type of membership functions (Gaussian, sigmoid, triangular, etc.) and the parameters of their distributions can be changed. Second, the output membership functions can be adjusted similarly to the input membership functions. Third, modifications can be made to the rule base. The original controller was designed using rectangular membership functions and was adjusted to a combination of sigmoid and Gaussian membership functions. Likewise, the center point and standard deviation σ parameters of the sigmoid and Gaussian membership functions were modified. Since the output membership functions were constants, the output membership values were altered slightly. Finally, modifications were made to the rule base by changing the output results associated with the inputs into the system. Once the fuzzy controller was validated with the test data, it was expanded to five membership functions for each input and nine output membership functions, subsequently generating a rule base of 25 rules. As the original controller provided a solid basis, the expansion further fine-tuned the controller.


Chapter 7: Simulation Results

The goal of the thesis is to evaluate three different aspects: sensitivity to engagement probability, allocation performance under different levels of uncertainty, and scalability of the different approaches to system complexity. Cases 2-4 are neutral scenarios where the numbers of defense resources and fires are nearly identical; in Cases 1 and 5, the fires and the defense, respectively, have a distinct advantage.

| Case | Fires | Defense Resources, Layer 1 | Defense Resources, Layer 2 | Defense Resources, Layer 3 |
| 1 | 15, 3 | 3, 1 | 3, 1 | 3, 2 |
| 2 | 15, 4 | 5, 1 | 5, 1 | 5, 2 |
| 3 | 15, 5 | 5, 1 | 5, 2 | 5, 2 |
| 4 | 15, 3 | 3, 1 | 5, 1 | 8, 2 |
| 5 | 15, 3 | 7, 1 | 7, 1 | 7, 2 |

Table 5: Test Case Scenarios

The simulations were based on three assets with initial property values of {5, 10, 15} and total health values of {5, 5, 5}. The ideal algorithm extinguishes all fires and scores a perfect value of 30 units.

Sensitivity to Engagement Probability

The system would be considered deterministic if the engagement probability of the defense resources were 1.0. An area of interest is how performance is impacted by varying degrees of engagement probability. Since the Dynamic Programming approach provides the optimal, benchmark results, only the DP approach was used for this evaluation. In all cases, the scenarios were simulated under the same conditions with three different engagement probabilities: 0.9, 0.8, and 0.7.


| Case | 0.9 | 0.8 | 0.7 |
| 1 | 21.80 | 19.80 | 16.80 |
| 2 | 27.50 | 23.80 | 24.10 |
| 3 | 27.80 | 24.80 | 23.60 |
| 4 | 28.10 | 27.10 | 24.30 |
| 5 | 29.50 | 28.50 | 27.40 |
| Average | 26.94 | 24.80 | 23.24 |

Table 6: Sensitivity – Remaining Asset Health

As expected, scenarios with lower, more stochastic engagement probabilities resulted in poorer performance. To counter the decreased engagement probability, the number of Monte Carlo simulations was increased to allow the DP to learn better from observing its own behavior.

The adverse effect of achieving similar results by increasing the number of Monte Carlo simulations was the increased execution time, as is evident in Table 7.

| Case | 0.9 | 0.8 | 0.7 |
| 1 | 375 | 978 | 2,742 |
| 2 | 5,243 | 16,651 | 41,739 |
| 3 | 9,770 | 40,346 | 111,641 |
| 4 | 4,780 | 17,286 | 37,460 |
| 5 | 31,193 | 90,281 | 216,938 |
| Average | 10,272 | 33,108 | 82,504 |

Table 7: Sensitivity – Execution Time (s)

Due to the large amount of time required to run simulations for the 0.8 and 0.7 engagement probabilities, and since the purpose of the thesis is to show how artificial intelligence and fuzzy logic can provide robustness, the remainder of the results focus solely on the 0.9 engagement probability.


Uncertainty Analysis

The uncertainty analysis is subdivided into four sections: no uncertainty, fire error, breakup, and false alarm. Each section displays the results of the simulations in tabular form for the remaining asset health and execution time, followed by a graphical representation of the outcomes.

No Uncertainty

The no uncertainty analysis assumes complete knowledge and accurate information of the system.

| Case | Heuristic | NDP | NFDP | DP |
| 1 | 18.15 | 18.60 | 18.25 | 21.50 |
| 2 | 23.93 | 24.68 | 25.29 | 27.50 |
| 3 | 23.17 | 25.38 | 24.50 | 28.40 |
| 4 | 24.43 | 27.19 | 25.46 | 28.40 |
| 5 | 28.32 | 29.20 | 29.04 | 28.80 |
| Average | 23.60 | 25.01 | 24.51 | 26.92 |

Table 8: No Uncertainty – Remaining Asset Health

| Case | Heuristic | NDP | NFDP | DP |
| 1 | 0.055 | 0.015 | 0.078 | 86 |
| 2 | 0.065 | 0.071 | 0.055 | 1,790 |
| 3 | 0.035 | 0.179 | 0.066 | 2,851 |
| 4 | 0.043 | 0.072 | 0.094 | 1,495 |
| 5 | 0.047 | 0.190 | 0.094 | 9,601 |
| Average | 0.049 | 0.105 | 0.077 | 3,165 |

Table 9: No Uncertainty – Execution Time (s)


Figure 21: No Uncertainty Results - Remaining Asset Health vs Execution Time

Fire Error

The fire error analysis increases the total number of fires in the simulation by 20%. All resource allocation decisions are made based on the original estimation.

| Case | Heuristic | NDP | NFDP | DP |
| 1 | 16.40 | 16.10 | 15.49 | 16.00 |
| 2 | 22.10 | 22.31 | 23.67 | 23.80 |
| 3 | 21.87 | 22.16 | 23.47 | 21.60 |
| 4 | 23.34 | 23.35 | 23.94 | 22.40 |
| 5 | 26.99 | 28.00 | 27.89 | 28.60 |
| Average | 22.14 | 22.38 | 22.89 | 22.48 |

Table 10: 20% Fire Error – Remaining Asset Health

| Case | Heuristic | NDP | NFDP | DP |
| 1 | 0.068 | 0.016 | 0.094 | 13 |
| 2 | 0.037 | 0.095 | 0.067 | 1,414 |
| 3 | 0.034 | 0.195 | 0.072 | 2,624 |
| 4 | 0.045 | 0.089 | 0.079 | 1,376 |
| 5 | 0.068 | 0.016 | 0.078 | 10,613 |
| Average | 0.050 | 0.082 | 0.078 | 3,208 |

Table 11: 20% Fire Error – Execution Time (s)


Figure 22: Fire Error Results - Remaining Asset Health vs Execution Time

Breakup Percentage

The breakup increases the number of fires at the third layer by 50%, the idea being that a fire jumps and creates new hotspots.

| Case | Heuristic | NDP | NFDP | DP |
| 1 | 15.49 | 12.50 | 15.20 | 17.40 |
| 2 | 20.26 | 20.29 | 22.16 | 23.70 |
| 3 | 20.46 | 20.42 | 22.15 | 23.10 |
| 4 | 21.51 | 21.94 | 19.89 | 22.90 |
| 5 | 26.20 | 25.20 | 27.54 | 27.80 |
| Average | 20.79 | 20.07 | 21.39 | 22.98 |

Table 12: 50% Breakup – Remaining Asset Health

| Case | Heuristic | NDP | NFDP | DP |
| 1 | 0.063 | 0.010 | 0.094 | 124 |
| 2 | 0.096 | 0.065 | 0.082 | 1,654 |
| 3 | 0.049 | 0.168 | 0.107 | 2,848 |
| 4 | 0.052 | 0.072 | 0.061 | 1,809 |
| 5 | 0.068 | 0.016 | 0.078 | 11,709 |
| Average | 0.065 | 0.066 | 0.084 | 3,629 |

Table 13: 50% Breakup – Execution Time (s)


Figure 23: Breakup Results - Remaining Asset Health vs Execution Time

False Alarm Percentage

The false alarm decreases the number of fires by 50% in the third layer, the idea being that a fire fizzles out naturally.

| Case | Heuristic | NDP | NFDP | DP |
| 1 | 23.81 | 26.50 | 24.28 | 25.60 |
| 2 | 26.96 | 29.27 | 28.06 | 29.06 |
| 3 | 26.96 | 28.87 | 28.00 | 29.00 |
| 4 | 27.39 | 29.30 | 28.83 | 29.60 |
| 5 | 29.44 | 30.00 | 29.62 | 30.00 |
| Average | 26.91 | 28.79 | 27.76 | 28.65 |

Table 14: 50% False Alarm – Remaining Asset Health

| Case | Heuristic | NDP | NFDP | DP |
| 1 | 0.067 | 0.015 | 0.078 | 111 |
| 2 | 0.062 | 0.111 | 0.070 | 1,543 |
| 3 | 0.059 | 0.218 | 0.097 | 3,394 |
| 4 | 0.047 | 0.067 | 0.046 | 1,573 |
| 5 | 0.569 | 0.016 | 0.078 | 11,570 |
| Average | 0.161 | 0.085 | 0.074 | 3,638 |

Table 15: 50% False Alarm – Execution Time (s)


Figure 24: False Alarm Results - Remaining Asset Health vs Execution Time

Summary

Table 16 and Table 17 display the results averaged over all cases for each scenario, along with the average over the three uncertainty scenarios.

| Scenario | Heuristic | NDP | NFDP | DP |
| No Uncertainty | 23.60 | 25.01 | 24.51 | 26.92 |
| Fire Error 20% | 22.14 | 22.38 | 22.89 | 22.48 |
| Breakup 50% | 20.79 | 20.07 | 21.39 | 22.98 |
| False Alarm 50% | 26.91 | 28.79 | 27.76 | 28.65 |
| Average Results of Uncertainty Cases | 23.28 | 23.75 | 24.01 | 24.70 |

Table 16: Summary – Remaining Asset Health

| Scenario | Heuristic | NDP | NFDP | DP |
| No Uncertainty | 0.049 | 0.069 | 0.077 | 3,165 |
| Fire Error 20% | 0.050 | 0.082 | 0.078 | 3,208 |
| Breakup 50% | 0.065 | 0.066 | 0.084 | 3,629 |
| False Alarm 50% | 0.161 | 0.085 | 0.074 | 3,638 |
| Average Results of Uncertainty Cases | 0.092 | 0.078 | 0.079 | 3,492 |

Table 17: Summary – Execution Time (s)


Figure 25: Comparison of Results for Uncertainty Scenarios

Scalability

The final metric of interest is the scalability of the approaches, shown in Figure 26. The DP execution time increases by a couple of orders of magnitude as the complexity of the system increases, in contrast to the other methods.

Figure 26: Scalability to System Complexity; execution time in seconds (log scale) versus test case for the Heuristic, NDP, NFDP, and DP algorithms


The DP results are removed from the following figure since their large swing in execution time overwhelms the graph and distorts the ability to examine the results of the other methods. Additionally, another test case was added to evaluate the scalability of more complex scenarios (the DP algorithm was subsequently excluded from this test case).

| Case | Fires | Defense Resources, Layer 1 | Defense Resources, Layer 2 | Defense Resources, Layer 3 |
| 6 | 22, 4 | 8, 2 | 8, 2 | 8, 2 |

Table 18: Additional Test Case for Scalability

Figure 27 provides a better representation of the execution time versus system complexity for the heuristic, NDP, and NFDP algorithms.

Figure 27: Scalability to System Complexity (sans DP); execution time in seconds (log scale) versus test cases 1-6 for the Heuristic, NDP, and NFDP algorithms


Chapter 8: Conclusions & Recommendations for Future Work

There are three key areas of interest in this thesis: sensitivity to various engagement probabilities, ability to handle uncertainty, and scalability. Bertsekas et al. (22) used an engagement probability of 0.9 for the TMD problem, a probability that can be considered realistic for the purposes of the problem. It is difficult to find literature or any agreed-upon probability for extinguishing a fire with fire retardants, although it will understandably be anything less than 1.0. The results of the thesis are based on the 0.9 engagement probability for two reasons: first, to compare the benchmark cases with Bertsekas, and second, to provide a solid foundation for implementing a fuzzy logic component.

To supplement a study of lower probabilities of engagement, the thesis explored the results solely for Dynamic Programming at probabilities of 0.9, 0.8, and 0.7. The expected outcome is a decrease in performance (remaining asset health) at lower engagement probabilities, since there is a higher chance that a fire escapes a defense; the results subsequently show this decrease in performance. Initial tests of the sensitivity without changing the amount of learning performed (i.e. the number of Monte Carlo simulations) resulted in a larger depreciation of results, such that the simple greedy-based heuristic outperformed the DP, which intuitively should not occur.

The DP minimizes the expected value of the one step cost plus the optimal cost-to-go. The 0.9 probability case does not require an extensive number of Monte Carlo simulations to find the expected value: state $\bar{x}$ transitions to $\bar{x}'$ based on the control policy, and given that the 0.9 case is nearly deterministic, it is not necessary to execute numerous Monte Carlo simulations since there will not be many differing states. A total of 15 Monte Carlo simulations was found to be suitable to calculate the expected value. However, as the engagement probability decreases, the number of possible states increases and 15 Monte Carlo simulations do not sufficiently explore all the additional states for a suitable expected value. To overcome this drawback, the number of Monte Carlo simulations is increased to 30 and 60 for the 0.8 and 0.7 cases, respectively. The increase permits the DP to explore more subsequent states for a better estimation of the expected value. The results in Table 6 show the expected, smaller decrease in remaining asset health across the various engagement probabilities. The drawback of increasing the number of Monte Carlo simulations is seen in Table 7: the execution time needed to improve the DP learning greatly increases the time to reach a suitable solution, an unacceptable byproduct of the lower engagement probabilities.

The second metric is the performance of the various algorithms under varying levels of uncertainty, along with the benchmark case of perfect knowledge of the system. The no uncertainty case ensures the design sufficiently correlates with the work by Bertsekas et al. and further provides a solid foundation for comparing results of the uncertainty cases. The sensitivity analysis provided insightful information on the execution time of the DP approach. The NDP and NFDP algorithms have substantially faster execution times, but these do not reflect the required offline training performed by means of a DP simulation. To expedite the process of gathering results for the DP, NDP, and NFDP, the DP algorithm was tested with fewer Monte Carlo simulations; it was found that seven Monte Carlo simulations to calculate the expected value improved the execution time with minimal impact on the results.

The DP provided the optimal average result for all cases1 at a huge computational cost, two orders of magnitude longer than the other approaches. On the other hand, the greedy-based heuristic, using linguistic rules, had the best execution time since it simply evaluates the current state of the system without any mathematical optimization or dependence on future states; subsequently, this approach results in the worst performance. The performance of the NDP and NFDP in remaining asset health and execution time was similar. Each had a quick execution time by approximating the cost-to-go function but suffered poorer performance for that approximation. However, the decrease in remaining asset health was minimal for the reward of quicker execution time. Excluding the NFDP algorithm, since this option was not explored in the Bertsekas TMD paper, the results of the greedy-based heuristic, NDP, and DP algorithms were consistent with and solidified the prior work.

1 The DP was optimal in all cases with 15 Monte Carlo simulations. The decrease in Monte Carlo simulations decreased the performance of the final case.

Three uncertainty cases were explored in the thesis: an increase in the number of fires in the simulation, the probability of fires breaking up in the third layer and creating new fires, and the probability of a fire being rendered harmless naturally in the third layer. This information is unknown to the optimization algorithms prior to their execution. In two of the three uncertainty scenarios, the DP algorithm failed to handle the unknown information and produced weaker remaining asset health performance. Essentially, the DP used its defense resources on lower-valued assets according to its own expectation of the number of remaining fires, resulting in poor decision-making. However, its strong performance in the false alarm scenario was able to overcome these deficiencies and give the best overall remaining asset health performance for the uncertainty scenarios. In all three cases, though, the execution time of the DP again eliminates the approach as a feasible real-time decision tool.

The objective of the NDP algorithm was to reduce the computational cost of DP by training an artificial neural network to approximate the optimal cost-to-go function; the NDP was trained without any knowledge of uncertainty. NDP results were affected by the uncertainty, but not nearly to the same extent as DP. The NFDP performed the best of the real-time decision-making algorithms for the uncertainty scenarios. The fuzzy logic component incorporated into the one step cost function provided robustness in the presence of uncertainty. By using linguistic reasoning, the one step cost is weighted by the fuzzy controller to aid in the decision-making, effectively leading to better decisions. Additionally, whereas NDP makes decisions on the collective effort of the three layers of defense, NFDP decouples local decision-making, permitting a more consistent execution time across the different cases even as system complexity increases. The results of the uncertainty analysis call for two additional areas of recommended research: first, how the NDP and NFDP would perform under lower engagement probabilities with the same uncertainty analysis; second, introducing another uncertainty where the engagement probability itself is uncertain and may vary in a range of plus or minus 0.1. The control algorithms would be trained based on an assumed value but would encounter various engagement probabilities during simulation.

The final metric is the scalability of the algorithms to system complexity due to increased defense resources and fires in the scenario. The increased number of resources and fires requires more resource allocation tasks and decisions to be made. The cases are ordered in terms of increasing complexity (cases 2 through 4 can be considered roughly the same in terms of complexity). The DP algorithm fared the worst of the algorithms, with increases of orders of magnitude in time. The DP explores all feasible states to find the optimal control policy, prompting a significant increase in execution time to explore those possible states. This is inherently true for the training of the NDP and NFDP algorithms as well, but since those algorithms are trained offline, the training time is not reflected in the final online simulation time. The execution times of NDP and NFDP remained relatively constant since cost-to-go values are computed using approximation methods. The NFDP appears to fare slightly better in scalability since decisions are made locally at each layer, whereas NDP still relies on the collective decision-making between the layers, requiring more time and decisions. It is recommended to execute additional test cases with sizable increases in complexity to statistically confirm this conclusion.

The motivation for this thesis was its intended use as a decision support system during wildland fires. FARSITE (5) is a commonly used fire growth and behavior simulation tool used to simulate wildland fires and evaluate their characteristics. In addition, users can make decisions on various defense mechanisms and evaluate their effectiveness at the conclusion of the simulation. What FARSITE lacks is an innovative decision support system to complement the simulation tool. The decision algorithms defined in this thesis, in particular NFDP, would allow users to execute simulations utilizing this tool and make optimal or near-optimal decisions on the allocation of their resources.

This thesis lays the groundwork for the decision-making tool but would require further enhancements for a FARSITE integration. First, this work is developed as a linear system and requires an extension to the spatial-temporal FARSITE environment; the optimization algorithm must therefore account for the spatial position of its resources and for timing. Second, this approach uses a three-layered defense (a three step horizon), while the FARSITE model permits a larger horizon from which decisions are made. The DP algorithm has already shown its inability to provide real-time execution, while the NDP and NFDP approaches were promising algorithms. However, the change to a spatial-temporal environment and a larger horizon requires more possible control options to evaluate and a further look ahead, rendering the NDP an infeasible option. Recall that the NDP approach's scalability may become an issue as the complexity of the scenario increases, since NDP relies on the collective choices made across the layers.

It is envisioned that the algorithm integration is formulated as a feedback loop. The FARSITE simulation progresses one time step and the NFDP algorithm makes a control decision given its perception of the environment, estimating the cost of one time step plus the approximated cost-to-go. Since decisions are made locally at each layer, the decision is made in real-time and the control action is fed back into the FARSITE model. This process continues until an ending state is reached.
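A minimal sketch of this envisioned loop is given below; the FarsiteModel and NfdpController interfaces are entirely hypothetical placeholders (FARSITE exposes no such Java API) and only illustrate the step-decide-apply cycle described above.

```java
// Illustrative sketch of the envisioned FARSITE/NFDP feedback loop.
// Both interfaces are hypothetical placeholders, not real FARSITE or thesis APIs.
public class FarsiteFeedbackLoopDemo {

    interface FarsiteModel {
        double[] advanceOneTimeStep();             // simulate fire growth for one step
        void applyDefenseAction(int[] allocation); // e.g. retardant drops for this step
        boolean fireContainedOrAssetsLost();
    }

    interface NfdpController {
        // One-step cost plus approximated cost-to-go, decided locally at each layer.
        int[] decide(double[] perceivedEnvironment);
    }

    static void run(FarsiteModel farsite, NfdpController nfdp) {
        while (!farsite.fireContainedOrAssetsLost()) {
            double[] perception = farsite.advanceOneTimeStep();
            int[] allocation = nfdp.decide(perception);   // real-time local decision
            farsite.applyDefenseAction(allocation);       // feed the control action back
        }
    }
}
```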

Given such a large solution space, it can be overwhelming for a user to find the optimal allocation of resources. In addition, the knowledge captured during simulation is retained by the user, and it can be difficult to transfer that knowledge to other users. The decision support tool extracts that knowledge and allows the user to focus on other tasks. Thus, users are able to simulate a fire and utilize the decision-making algorithms to aid in the allocation of resources. Additionally, as FARSITE is a visual simulation tool, this feature allows users to visually watch and evaluate the effectiveness of the control algorithm.


Bibliography

1. Walch, Brian. TIME. TIME. [Online] November 5, 2007. [Cited: June 10, 2010.] pg 14-17. http://www.time.com/time/classroom/glenspring2008/pdfs/Nation.pdf.

2. Fire Information - Wildland Fire Statistics. National Interagency Fire Center. [Online] [Cited: January 7, 2010.] http://www.nifc.gov/fire_info/historical_stats.htm.

3. The Ecology of Fire. Fire & Aviation Management. [Online] [Cited: January 7, 2010.] http://www.fs.fed.us/fire/fireuse/rxfire/ecology/index.html.

4. Mandel, J., et al. A Note on Dynamic Data Driven Wildfire Modeling. SpringerLink. [Online] May 2004. [Cited: October 10, 2008.] http://www.springerlink.com/content/vxwtuulbgnd21hg3.

5. Missoula Fire Sciences Laboratory. FARSITE. FireModels.org Fire Behavior and Fire Danger Software. [Online] July 20, 2009. http://firemodels.fire.org/content/view/112/143/.

6. Kirk, Donald E. Optimal Control Theory - An Introduction. s.l. : Dover Publications, Inc., 1970.

7. Ahrens, Marty. Overall Fire Statistics. National Fire Protection Association. [Online] http://www.nfpa.org/assets/files//PDF/OS.Trends.pdf.

8. Federal Bureau of Investigation. Motor Vehicle Theft - Crime in the United States 2008. Federal Bureau of Investigation - Uniform Crime Reports. [Online] September 2009. [Cited: February 20, 2010.] http://www.fbi.gov/ucr/cius2008/offenses/property_crime/index.html.

9. Karter Jr., Michael J. Fire Loss in the United States 2008. National Fire Protection Assocation. [Online] September 2009. [Cited: February 9, 2010.] http://www.nfpa.org/assets/files/PDF/OS.fireloss.pdf.

10. National Interagency Fire Center. National Interagency Fire Center. [Online] [Cited: February 1, 2010.] http://www.nifc.gov.

11. Nielson, John. Fabled Santa Ana Winds Fuel Wildfires in California. NPR. [Online] [Cited: March 2, 2010.] http://www.npr.org/templates/story/story.php?storyId=15584420.

12. National Interagency Fire Center. Wikipedia. [Online] January 28, 2010. [Cited: March 22, 2010.] http://en.wikipedia.org/wiki/National_Interagency_Fire_Center.

13. Crooks, Ken. Data-Analysis Program Aids in Resource Allocation. Fire Chief. [Online] December 1, 2009. [Cited: March 10, 2010.] http://firechief.com/technology/ar/resource-allocation-analysis-200912/.

14. Forest Fire Management Supporting by UAV Based Air Reconnaissance Results of Szendro Fire Department, Hungary. Restas, Agoston. Corte-Ajaccio : IEEE, 2006. Environment Identities and Mediterranean Area, 2006. pp. 73-77.


15. Bertsekas, Dimitri P. Dynamic Programming and Optimal Control. 3rd Edition. Belmont : Athena Scientific, 2005. Vol. 1.

16. Bertsekas, Dimitri P. and Tsitsiklis, John N. Neuro-Dynamic Programming. Belmont : Athena Scientific, 1996.

17. Stergiou, Christos and Siganos, Dimitrios. Neural Networks. Imperial College - Department of Computing. [Online] [Cited: September 19, 2010.] http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.html.

18. Haykin, Simon. Neural Networks. New York : Macmillan College Publishing, 1994. ISBN 0-02-352761-7.

19. Gupta, Madan M. Fuzzy sets and systems. McGraw's-Hill's AccessScience Encyclopedia of Science & Technology Online. [Online] McGraw-Hill. [Cited: May 09, 2010.] http://www.accessscience.com. DOI 10.1036/1097-8542.276950.

20. Lewis III, Harold W. The Foundations of Fuzzy Control. New York : Plenum Press, 1997.

21. Passino, Kevin M. and Yurkovich, Stephen. Fuzzy Control. Menlo Park : Addison Wesley Longman, Inc., 1998.

22. Missile Defense and Interceptor Allocation by Neuro-Dynamic Programming. Bertsekas, Dimitri P., et al. 1, January 2000, IEEE Transactions on Systems, Man and Cybernetics, Vol. 30, pp. 42-51.

23. A Hybrid Fuzzy Dynamic Programming Approach to Unit Commitment. Kumar, S Senthil and Palanisamy, V. March 2008, IE(I) Journal-EL, Vol. 88, pp. 3-9.

24. Reinforcement Learning: A Survey. Kaelbling, Leslie P, Littman, Michael L and Moore, Andrew W. s.l. : AI Access Foundation and Morgan Kaufmann Publishers, 1996, Journal of Artificial Intelligence Research, Vol. 4, pp. 237-285.

25. National Interagency Coordination Center. GACC >Predictive Services. National Interagency Fire Center. [Online] http://www.predictiveservices.nifc.gov/predictive.htm.

26. About Java Technologies. Oracle. [Online] [Cited: August 3, 2010.] http://www.sun.com/java/about/.

27. Netbeans. Netbeans. [Online] [Cited: August 3, 2010.] http://netbeans.org.

28. Edaptive Syscape: Engineer's Electronic Napkin. Edaptive Computing, Inc. [Online] [Cited: August 3, 2010.] http://www.edaptive.com/syscape.php.

29. Welcome to Edpative Computing, Inc. Edaptive Computing, Inc. [Online] [Cited: August 3, 2010.] http://www.edaptive.com.

30. Gurel, Aydin. Feed Forward Neural Network in Java. Aydın Gürel's Home Page. [Online] http://www.ncorpus.com/aydingurel/.


31. Rojas, Raul. Neural Networks A Systematic Introduction. Berlin : Springer, 1996.

32. Duin, Robert P.W. Learned from Neural Networks. Department of Applied Physics, Delft University of Technology.

33. Naveh, Ben-Zion, Levy, E and Cohen, Kelly. Theater Ballistic Missile Defense Architecture Development. [book auth.] Ben-Zion Naveh and Azriel Lorber. Theater Ballistic Missile Defense. Reston : AIAA, 2001, 6, pp. 77-97.

34. Ecological Society of America. ESA - Education and Diversity. [Online] Winter 2002. [Cited: January 7, 2010.] http://www.esa.org/education_diversity/pdfDocs/fireecology.pdf.

35. Badger, Stephen G. Large-Loss Fires in the United States - 2008. National Fire Protection Association. [Online] http://www.nfpa.org/assets/files//PDF/LargeLoss.pdf.

36. Interagency Aviation Training. SEAT 3: SEAT Firefighting Tactics. Interagency Aviation Training. [Online] [Cited: March 2, 2010.] https://www.iat.gov/Training/modules/seat/seat3.html.

37. Wildland Home Safety. Ebbets Pass Fire District. [Online] May 7, 2007. [Cited: October 25, 2010.] http://www.epfd.org/wildland%20home%20safety.htm.

38. Finney, Mark A. Rocky Mountain Research Station. US Forest Research and Development. [Online] March 2004. [Cited: February 20, 2009.] http://www.fs.fed.us/rm/pubs/rmrs_rp004.html.
