Energy Consumption Predictor
Modeling and Predicting Energy Consumption: Predicting Energy Usage for the ASHRAE Kaggle Competition
Kevin Pham (kpham123), Vikram Shanker (vshanker), Vikas Yadav (vikasy)
CS221 P-Poster

Problem

Identifying and reducing energy consumption is one of many necessary avenues in tackling climate change, and a large share of energy consumption goes into heating, cooling, and powering buildings. The goal of our project is to build a model that accurately predicts the pre-improvement energy consumption of buildings that have received energy improvements, based on their use case, size, age, and local weather. The predicted usage can be compared with the actual usage to quantify the value added by specific improvements, creating a greater understanding of how these improvements affect energy usage.

Dataset

The data set consists of three types of data: (1) building information, (2) local weather, and (3) energy usage. The building data, for 1,449 buildings, consists of each building's location, area, age, floor count, and usage type. The weather and energy usage data are time series covering one year, sampled hourly; the total size is ~1.5 GB. The data source is a Kaggle competition scored by RMSLE [1].

- The weather data contains air and dew temperatures, wind speed and direction, cloud coverage, and precipitation.
- The building and weather data sets have missing values, which we address with standard dropping and imputation methods:
  - The cloud coverage and precipitation columns are dropped because more than 50% of their values are missing.
  - Missing floor counts are replaced by 1, the mode (more than 90% of buildings), and missing ages are replaced by the median.

Choosing Features

We performed EDA by computing the cross-correlation of all features with each other. This showed that the weather data has a much lower correlation with the meter reading, and our current models deprioritize and/or remove those features during training. The meter reading is highly correlated with building area and slightly correlated with floor count. The correlation matrix also shows that air temperature is highly correlated with dew temperature and that wind direction is correlated with wind speed, so we reduce the feature dimension by dropping dew temperature and wind direction.

We convert our single categorical variable, "primary_use", with a one-hot encoder. The timestamp field of hourly measurements across 2016 is decomposed into "month", "weekday", and "hour" fields. However, this encoding does not capture that weather patterns in December are quite similar to those in January, so we use trigonometric functions to map each time feature onto the unit circle, defining two features per time feature, e.g. "month_sin" and "month_cos". We also ran into issues with the dataset size (basic manipulations raised MemoryError); after importing, we downcast 64-bit data types to the smallest numpy type that can still represent the information. (This preprocessing is sketched in code below.)

Hybrid RF + Gradient Boosting

To explore models in SKLearn, a variety of linear and decision-based approaches were trained with default hyperparameters on a subset of the data, and the MSE distributions after 5-fold cross-validation were compared in box plots (figure omitted here). A support vector regressor was not considered due to its poor runtime on large numbers of data points. Basic linear regression, ElasticNet (a combination of Ridge and Lasso penalties that reduces model complexity and the number of features), and Huber regression (less susceptible to outliers) perform the worst. A single decision tree is also mediocre, but the innovations of random forests, extra trees, and gradient boosting are promising. Random forests (and their cousin, Extra Trees) are easier to train but are generally outperformed by modern boosting algorithms that are carefully tuned to avoid overfitting. To balance this tradeoff, our chosen approach trains two models concurrently, (1) Extra Trees and (2) LightGBM, and averages their predictions (sketched below).

Gradient boosting algorithms have seen recent progress; the top proposed algorithms are XGBoost, LightGBM, and, recently, CatBoost. LightGBM is chosen for its fast training time, as CatBoost seems to outperform it only with very large numbers of categorical variables. Plotting the feature importances of the trained model (figure omitted here) shows that meter type, building_id, and air_temperature are prioritized. Of interest, square_feet and building_id are mapped 1:1, so square_feet's high importance is possibly due to its large comparative magnitude, making it a candidate for normalization.

Using the default parameters of ExtraTrees and LightGBM, the model achieved an RMSLE of 1.5, placing 2444/2933 on the leaderboard at the time of writing. Training on a Surface Book 2 took 2.25 hours with n_jobs=2, i.e. using 2 cores hyperthreaded into 4 threads. Further work is needed to explore parallelization and Google Cloud for training speedup.

Neural Net

The neural network (still a work in progress) will use two topologies. The fundamental idea behind the first is that the building features that affect energy consumption are independent of the weather features, and that the time of day is likewise independent, so each group is placed in its own hidden layer (sketched below). The second topology will be fully connected, and we will explore its performance with different numbers of hidden nodes.
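To make the dropping, imputation, and time-feature encoding from the Dataset and Choosing Features sections concrete, here is a minimal sketch. It assumes pandas DataFrames with the competition's column names (primary_use, floor_count, year_built, cloud_coverage, precip_depth_1_hr, dew_temperature, wind_direction, timestamp); our actual pipeline may differ in details such as which frame carries the timestamp.

```python
import numpy as np
import pandas as pd

def preprocess(buildings, weather):
    """Sketch of the preprocessing described above; column names assumed."""
    # Drop columns missing >50% of values, plus the members of the highly
    # correlated pairs (dew temperature, wind direction).
    weather = weather.drop(columns=["cloud_coverage", "precip_depth_1_hr",
                                    "dew_temperature", "wind_direction"])

    # Impute: floor count -> mode (1); age (year_built here) -> median.
    buildings["floor_count"] = buildings["floor_count"].fillna(1)
    buildings["year_built"] = buildings["year_built"].fillna(
        buildings["year_built"].median())

    # One-hot encode the single categorical variable.
    buildings = pd.get_dummies(buildings, columns=["primary_use"])

    # Decompose the timestamp, then map each cyclic field onto the unit
    # circle so that December and January end up adjacent.
    ts = pd.to_datetime(weather["timestamp"])
    for name, values, period in [("month", ts.dt.month, 12),
                                 ("weekday", ts.dt.weekday, 7),
                                 ("hour", ts.dt.hour, 24)]:
        weather[name + "_sin"] = np.sin(2 * np.pi * values / period)
        weather[name + "_cos"] = np.cos(2 * np.pi * values / period)

    # Downcast 64-bit columns to the smallest dtype preserving the values.
    for df in (buildings, weather):
        for col in df.select_dtypes(include=["float64"]).columns:
            df[col] = pd.to_numeric(df[col], downcast="float")
        for col in df.select_dtypes(include=["int64"]).columns:
            df[col] = pd.to_numeric(df[col], downcast="integer")
    return buildings, weather
```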
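The two-model averaging scheme from the Hybrid RF + Gradient Boosting section could look roughly like the following. The data here is a synthetic stand-in for the preprocessed features, and both models use default parameters as described, so this is a sketch rather than our exact training script.

```python
import numpy as np
from lightgbm import LGBMRegressor
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.metrics import mean_squared_log_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed feature matrix and meter readings.
rng = np.random.default_rng(0)
X = rng.random((1000, 10))
y = rng.random(1000) * 100  # non-negative, like meter readings

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Train the two models with default parameters, as in the poster.
et = ExtraTreesRegressor(n_jobs=2, random_state=0).fit(X_train, y_train)
gbm = LGBMRegressor(random_state=0).fit(X_train, y_train)

# Average the predictions; clip at zero because RMSLE is undefined for
# negative values.
pred = np.clip((et.predict(X_val) + gbm.predict(X_val)) / 2, 0, None)
print("validation RMSLE:", np.sqrt(mean_squared_log_error(y_val, pred)))
```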
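Since the network is still in progress, the following is only one plausible Keras realization of the first (grouped) topology; the input split sizes and layer widths are hypothetical.

```python
from tensorflow.keras import Model, layers

# Hypothetical feature-group sizes: building, weather, and time features
# feed separate hidden layers, reflecting the independence assumption above.
n_building, n_weather, n_time = 20, 4, 6

b_in = layers.Input(shape=(n_building,), name="building")
w_in = layers.Input(shape=(n_weather,), name="weather")
t_in = layers.Input(shape=(n_time,), name="time")

# Each group gets its own hidden layer before the groups are combined.
b_h = layers.Dense(16, activation="relu")(b_in)
w_h = layers.Dense(8, activation="relu")(w_in)
t_h = layers.Dense(8, activation="relu")(t_in)

merged = layers.concatenate([b_h, w_h, t_h])
out = layers.Dense(1, activation="relu")(merged)  # meter readings are >= 0

model = Model(inputs=[b_in, w_in, t_in], outputs=out)
model.compile(optimizer="adam", loss="msle")  # MSLE matches the RMSLE score
model.summary()
```

The second, fully connected topology would simply concatenate all inputs before the first hidden layer instead of splitting them into groups.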
Next Steps

More Data Exploration
- Identify outliers and/or trends that could yield more features.

Gradient Boosting Model
- Explore sensitivity to data scaling/normalization.
- Hyperparameter search (built-in, RandomizedCV, 'hyperopt').
- Model training speedup (parallelization?).

Neural Net Model
- Finish the implementation.
- Explore topologies (number of hidden nodes, whether the network is fully connected).

Link to Video

shorturl.at/euxIZ

References

[1] https://www.kaggle.com/c/ashrae-energy-prediction/data
[2] https://www.energy.gov/eere/buildings/about-building-energy-modeling
[3] https://arxiv.org/pdf/1607.06332.pdf
[4] https://www.researchgate.net/publication/316653386_Modeling_energy_consumption_in_residential_buildings_A_bottom-up_analysis_based_on_occupant_behavior_pattern_clustering_and_stochastic_simulation