Machine Learning Applications for Data Center Optimization

Jim Gao, Google

Abstract

The modern data center (DC) is a complex interaction of multiple mechanical, electrical, and controls systems. The sheer number of possible operating configurations and nonlinear interdependencies make it difficult to understand and optimize energy efficiency. We develop a neural network framework that learns from actual operations data to model plant performance and predict PUE within a range of 0.004 ± 0.005 (mean absolute error ± 1 standard deviation), or 0.4% error for a PUE of 1.1. The model has been extensively tested and validated at Google DCs. The results demonstrate that machine learning is an effective way of leveraging existing sensor data to model DC performance and improve energy efficiency.

1. Introduction

The rapid adoption of Internet-enabled devices, coupled with the shift from consumer-side computing to SaaS and cloud-based systems, is accelerating the growth of large-scale data centers (DCs). Driven by significant improvements in hardware affordability and the exponential growth of Big Data, the modern Internet company must deliver personalized user experiences with minimal downtime. Meanwhile, popular hosting services such as Google Cloud Platform and Amazon Web Services have dramatically reduced upfront capital and operating costs, allowing companies with smaller IT resources to scale quickly and efficiently across millions of users.

These trends have resulted in the rise of large-scale DCs and their corresponding operational challenges. One of the most complex challenges is power management. Growing energy costs and environmental responsibility have placed the DC industry under increasing pressure to improve its operational efficiency. According to Koomey, DCs comprised 1.3% of global energy usage in 2010 [1].
At this scale, even relatively modest efficiency improvements yield significant cost savings and avert millions of tons of carbon emissions. While it is well known that Google and other major Internet companies have made significant strides towards improving their DC efficiency, the overall pace of PUE reduction has slowed given diminishing returns and the limitations of existing cooling technology [2]. Furthermore, best practice techniques such as hot air containment, water side economization, and extensive monitoring are now commonplace in large-scale DCs [3]. Figure 1 shows Google's historical PUE performance, from an annualized fleet-wide PUE of 1.21 in 2008 to 1.12 in 2013, due to the implementation of best practices and natural progression down the learning curve [4]. Note the asymptotic decline of the trailing twelve-month (TTM) PUE graph.

Fig 1. Historical PUE values at Google.

The application of machine learning algorithms to existing monitoring data provides an opportunity to significantly improve DC operating efficiency. A typical large-scale DC generates millions of data points across thousands of sensors every day, yet this data is rarely used for applications other than monitoring. Advances in processing power and monitoring capabilities create a large opportunity for machine learning to guide best practice and improve DC efficiency. The objective of this paper is to demonstrate a data-driven approach for optimizing DC performance in the sub-1.10 PUE era.

2. Methodology

2.1 General Background

Machine learning is well-suited for the DC environment given the complexity of plant operations and the abundance of existing monitoring data. The modern large-scale DC has a wide variety of mechanical and electrical equipment, along with their associated setpoints and control schemes. The interactions between these systems and various feedback loops make it difficult to accurately predict DC efficiency using traditional engineering formulas.
For example, a simple change to the cold aisle temperature setpoint will produce load variations in the cooling infrastructure (chillers, cooling towers, heat exchangers, pumps, etc.), which in turn cause nonlinear changes in equipment efficiency. Ambient weather conditions and equipment controls will also impact the resulting DC efficiency. Using standard formulas for predictive modeling often produces large errors because they fail to capture such complex interdependencies.

Furthermore, the sheer number of possible equipment combinations and their setpoint values makes it difficult to determine where the optimal efficiency lies. In a live DC, it is possible to meet the target setpoints through many possible combinations of hardware (mechanical and electrical equipment) and software (control strategies and setpoints). Testing each and every feature combination to maximize efficiency would be infeasible given time constraints, frequent fluctuations in the IT load and weather conditions, as well as the need to maintain a stable DC environment.

To address these challenges, a neural network is selected as the mathematical framework for training DC energy efficiency models. Neural networks are a class of machine learning algorithms that mimic cognitive behavior via interactions between artificial neurons [6]. They are advantageous for modeling intricate systems because they do not require the user to predefine the feature interactions in the model, which would impose assumed relationships on the data. Instead, the neural network searches for patterns and interactions between features to automatically generate a best-fit model. Common applications for this branch of machine learning include speech recognition, image processing, and autonomous software agents. As with most learning systems, model accuracy improves over time as new training data is acquired.

2.2 Model Implementation

A generic three-layer neural network is illustrated in Figure 2.
In this study, the input matrix x is an (m x n) array where m is the number of training examples and n is the number of features (DC input variables), including the IT load, weather conditions, number of chillers and cooling towers running, equipment setpoints, etc. The input matrix x is multiplied by the first model parameters matrix θ^{(1)} to produce the hidden state matrix a [6]. In practice, a acts as an intermediary state that interacts with the second parameters matrix θ^{(2)} to calculate the output h_θ(x) [6]. The size and number of hidden layers can be varied to model systems of varying complexity.

Note that h_θ(x) is the output variable of interest and can represent a range of metrics that we wish to optimize. PUE is selected here to represent DC operational efficiency, with recognition that the metric is a ratio and not indicative of total facility-level energy consumption. Other examples include using server utilization data to maximize machine productivity, or equipment failure data to understand how the DC environment impacts reliability. The neural network will search for relationships between data features to generate a mathematical model that describes h_θ(x) as a function of the inputs. Understanding the underlying mathematical behavior of h_θ(x) allows us to control and optimize it.

Fig. 2 Three-layer neural network.

Although linear independence between features is not required, selecting linearly independent features can significantly reduce the model training time, as well as the chances of overfitting [8]. Additionally, linear independence can simplify model complexity by limiting the number of inputs to only those features fundamental to DC performance. For example, the DC cold aisle temperature may not be a desirable input for predicting PUE because it is a consequence of variables more fundamental to DC control, such as the cooling tower leaving condenser water temperature and chilled water injection setpoints.
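The matrix form described above can be sketched in code. The following is a minimal NumPy illustration, not the paper's actual model: the layer sizes, feature count, and random data are made up for demonstration, and the network maps an (m x n) input matrix through one hidden layer to a single output per example.

```python
import numpy as np

def sigmoid(z):
    """Activation g(z): maps nodal input values to the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, theta1, theta2):
    """Three-layer forward pass: input -> hidden -> output h_theta(x).

    x       (m, n)           m training examples, n DC input features
    theta1  (hidden, n + 1)  weights mapping layer 1 to layer 2
    theta2  (1, hidden + 1)  weights mapping layer 2 to the output
    """
    m = x.shape[0]
    a1 = np.hstack([np.ones((m, 1)), x])      # append bias unit x_0 = 1
    a2 = sigmoid(a1 @ theta1.T)               # hidden state matrix a
    a2 = np.hstack([np.ones((m, 1)), a2])     # append bias unit a_0 = 1
    return sigmoid(a2 @ theta2.T)             # h_theta(x): one value per example

# Hypothetical shapes: 5 examples of 3 features (e.g. IT load, wet bulb
# temperature, number of chillers running), 4 hidden nodes, 1 output.
rng = np.random.default_rng(0)
x = rng.random((5, 3))
theta1 = rng.uniform(-1, 1, (4, 4))
theta2 = rng.uniform(-1, 1, (1, 5))
print(forward(x, theta1, theta2).shape)   # (5, 1)
```

Because g(z) maps into (0, 1), a target such as PUE would in practice be normalized to that range before training; that scaling step is omitted here for brevity.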
The process of training a neural network model can be broken down into five steps, each of which is covered in greater detail below [7]:

(1) Randomly initialize the model parameters θ.
(2) Implement the forward propagation algorithm.
(3) Compute the cost function J(θ).
(4) Implement the back propagation algorithm.
(5) Repeat steps 2 - 4 until convergence or the desired number of iterations.

2.2.1 Random Initialization

Random initialization is the process of randomly assigning θ values between [-1, 1] before starting model training. To understand why this is necessary, consider the scenario in which all model parameters are initialized at 0. The inputs into each successive layer in the neural network would then be identical, since they are multiplied by θ. Furthermore, since the error is propagated backwards from the output layer through the hidden layers, any changes to the model parameters would also be identical [7]. We therefore randomly initialize θ with values between [-1, 1] to avoid the formation of unstable equilibriums [7].

2.2.2 Forward Propagation

Forward propagation refers to the calculation of successive layers, since the value of each layer depends upon the model parameters and layers before it. The model output h_θ(x) is computed through the forward propagation algorithm, where a_j^{(l)} represents the activation of node j in layer l, and θ^{(l)} represents the matrix of weights (model parameters) mapping layer l to layer l + 1:

a_1^{(2)} = g(θ_{10}^{(1)} x_0 + θ_{11}^{(1)} x_1 + θ_{12}^{(1)} x_2 + θ_{13}^{(1)} x_3)
a_2^{(2)} = g(θ_{20}^{(1)} x_0 + θ_{21}^{(1)} x_1 + θ_{22}^{(1)} x_2 + θ_{23}^{(1)} x_3)
a_3^{(2)} = g(θ_{30}^{(1)} x_0 + θ_{31}^{(1)} x_1 + θ_{32}^{(1)} x_2 + θ_{33}^{(1)} x_3)
a_4^{(2)} = g(θ_{40}^{(1)} x_0 + θ_{41}^{(1)} x_1 + θ_{42}^{(1)} x_2 + θ_{43}^{(1)} x_3)
h_θ(x) = a_1^{(3)} = g(θ_{10}^{(2)} a_0^{(2)} + θ_{11}^{(2)} a_1^{(2)} + θ_{12}^{(2)} a_2^{(2)} + θ_{13}^{(2)} a_3^{(2)} + θ_{14}^{(2)} a_4^{(2)})

Bias units (nodes with a value of 1) are appended to each non-output layer to introduce a numeric offset within each layer [6]. In the equations above, θ_{10}^{(1)} represents the weight between the appended bias unit x_0 and the hidden layer element a_1^{(2)}. The purpose of the activation function g(z) is to mimic biological neuron firing within a network by mapping the nodal input values to an output within the range (0, 1).
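The five training steps can be sketched end to end. The following is an illustrative toy example, not the paper's implementation: the data, target, layer sizes, learning rate, and iteration count are all made up, and a simple squared-error cost with batch gradient descent stands in for whatever cost and optimizer the authors used.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(42)

# Toy data: m = 50 examples, n = 3 features, one continuous target in (0, 1).
x = rng.random((50, 3))
y = x.mean(axis=1, keepdims=True)           # made-up target for illustration
m = len(x)

# Step 1: randomly initialize parameters in [-1, 1] to break symmetry.
theta1 = rng.uniform(-1, 1, (4, 4))         # 3 inputs + bias -> 4 hidden nodes
theta2 = rng.uniform(-1, 1, (1, 5))         # 4 hidden nodes + bias -> 1 output

alpha = 0.5                                 # learning rate (arbitrary choice)
costs = []
for _ in range(2000):                       # Step 5: repeat steps 2 - 4
    # Step 2: forward propagation.
    a1 = np.hstack([np.ones((m, 1)), x])
    a2 = np.hstack([np.ones((m, 1)), sigmoid(a1 @ theta1.T)])
    h = sigmoid(a2 @ theta2.T)

    # Step 3: squared-error cost J(theta).
    costs.append(np.mean((h - y) ** 2) / 2)

    # Step 4: back propagation -- deltas use the sigmoid derivative g(z)(1 - g(z)).
    d3 = (h - y) * h * (1 - h)                          # output-layer delta
    d2 = (d3 @ theta2)[:, 1:] * a2[:, 1:] * (1 - a2[:, 1:])  # hidden-layer delta
    theta2 -= alpha * d3.T @ a2 / m
    theta1 -= alpha * d2.T @ a1 / m

print(f"cost: {costs[0]:.4f} -> {costs[-1]:.4f}")   # cost should fall steadily
```

Note how zero initialization would make every hidden-node delta in d2 identical, which is exactly the unstable equilibrium that random initialization in [-1, 1] avoids.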
