National Election Prediction


Lei Xu, University of Wisconsin—Madison, ECE/CS 539: Introduction to Artificial Neural Networks and Fuzzy Systems

ABSTRACT The presidential election is always a hot topic in this country, and with a new round of elections approaching, I am curious about how individuals choose their president. Is there any regularity we can find to predict a person's choice based on his or her characteristics? For my project, I construct a multi-layer perceptron artificial neural network to predict an individual's vote based on four main features: race, age, education level, and income. To make the prediction, we first need to treat it as a classification task and train the artificial neural network. The configuration of the neural network also plays an essential role, so discovering the best-performing ANN structure is a key part of the project. Finally, if I can access sufficiently detailed election data, I will be able to predict the final result of the election with the trained artificial neural network.

PROBLEM STATEMENT It is extremely difficult, if not impossible, for a politician to estimate whether he or she can win the coming election. A politician may be able to predict the ballot in each individual region, but the final outcome is always hard to foresee, since it also depends heavily on the opponent's campaign. In my project, I simplify the problem into a classification problem: use four features of an individual voter to judge that voter's election choice, and use an artificial neural network to process the voter data and generate the result.

BACKGROUND 1. About the election The United States presidential election of 2016, scheduled for Tuesday, November 8, 2016, will be the 58th quadrennial U.S. presidential election. Voters will select presidential electors who in turn will elect a new president and vice president through the Electoral College. The series of presidential primary elections and caucuses is taking place between February 1 and June 14, 2016, staggered among the 50 states, the District of Columbia and U.S. territories. This nominating process is also an indirect election, where voters cast ballots for a slate of delegates to a political party's nominating convention, who then in turn elect their party's presidential nominee. The 2016 Republican National Convention will take place from July 18 to July 21, 2016 in Cleveland, Ohio. The 2016 Democratic National Convention will take place from July 25 to July 28, 2016 in Philadelphia, Pennsylvania. Businessman and reality television personality Donald Trump became the presumptive nominee of the Republican Party on May 3, 2016, after the suspensions of Ted Cruz's and John Kasich's campaigns, respectively, and his win in the Indiana primary. He is expected to face the as yet undetermined nominee of the Democratic Party in the general election, presumably either Hillary Clinton or Bernie Sanders.

2. Multilayer perceptron A multi-layer perceptron is a type of feed-forward neural network of threshold units. Multi-layer perceptrons are composed of an input layer of neurons, successive layers of intermediate units, and a layer of output neurons. The output of each layer is connected to the input of the next layer. A synaptic weight is associated with each unique connection between neurons in neighboring layers. Each neuron is itself associated with a hyperplane and classifies its input based on which side of the hyperplane the input falls; this classification is then passed on to the neurons in the next layer. To be used for classification, the weights and activation functions of each neuron must be calibrated so that when feature vectors are presented to the input layer of neurons, the correct classification vector is produced by the output neurons.
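To make the layer-by-layer computation concrete, the following is a minimal MATLAB sketch of a single forward pass through one hidden layer; the variable names (x, W, sigmoid) and the feature encoding are illustrative assumptions, not the project code, and in practice the features would be normalized first.

sigmoid = @(v) 1 ./ (1 + exp(-v));   % unipolar sigmoid activation
x = [1 0 60 14 3.3458];              % example input: bias node plus 4 raw features (hypothetical encoding)
W = 2*rand(length(x), 10) - 1;       % random weights from the 5 input nodes to 10 hidden neurons
hidden = sigmoid(x * W);             % each hidden neuron thresholds its weighted sum of the inputs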

3. Back-propagation Backpropagation is a common method of training artificial neural networks, used in conjunction with an optimization method such as gradient descent. The method calculates the gradient of a loss function with respect to all the weights in the network. The gradient is fed to the optimization method, which in turn uses it to update the weights in an attempt to minimize the loss function. Backpropagation requires a known, desired output (here, an individual's election result) for each input value (the four-dimensional feature vector) in order to calculate the loss function gradient. It is therefore usually considered a supervised learning method. It is a generalization of the delta rule to multi-layered feedforward networks, made possible by using the chain rule to iteratively compute gradients for each layer.
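A minimal, self-contained sketch of one such update for a single-layer network is shown below; the names (x, target, W) and values are hypothetical, and the multi-layer case simply repeats the same chain-rule step backwards through each layer, as the appendix code does.

sigmoid      = @(v) 1 ./ (1 + exp(-v));
sigmoid_drev = @(a) a .* (1 - a);        % derivative of the sigmoid expressed via its own output
x      = [0 60 14 3.3458];               % one hypothetical unscaled feature vector (race, age, educate, income)
target = [1 0];                          % desired output: [1 0] = Clinton
W = 2*rand(4, 2) - 1;                    % single-layer weights, 4 inputs -> 2 outputs
y = sigmoid(x * W);                      % forward pass
delta = (target - y) .* sigmoid_drev(y); % chain rule: output error times activation derivative
W = W + 0.15 * (x' * delta);             % delta-rule weight update with learning rate 0.15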

4. CROSS VALIDATION Cross-validation is a model validation technique for assessing how the results of a statistical analysis will generalize to an independent data set. It is mainly used in settings where the goal is prediction and one wants to estimate how accurately a predictive model will perform in practice. In a prediction problem, a model is usually given a dataset of known data on which training is run (the training dataset) and a dataset of unknown (or first-seen) data against which the model is tested (the testing dataset). Cross-validation involves partitioning a sample of data into complementary subsets: a training set and a testing set. To reduce variability, multiple rounds of cross-validation are performed using different partitions, and the validation results are averaged over the rounds. In summary, cross-validation combines (averages) measures of fit (prediction error) to correct for the optimistic nature of training error and derive a more accurate estimate of model prediction performance.
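The fold bookkeeping itself needs only plain MATLAB indexing; the following is a minimal sketch assuming 2000 samples split into 4 folds, with random placeholder data standing in for the turnout set and the actual training/evaluation steps left as comments.

nSamples = 2000; nFolds = 4;
data   = rand(nSamples, 4);                       % placeholder feature vectors
labels = randi([0 1], nSamples, 1);               % placeholder 0/1 vote labels
order   = randperm(nSamples);                     % shuffle the sample order once
foldIdx = ceil((1:nSamples) / (nSamples/nFolds)); % fold number of each shuffled position
foldErr = zeros(1, nFolds);
for k = 1:nFolds
    testRows  = order(foldIdx == k);              % held-out fold
    trainRows = order(foldIdx ~= k);              % remaining folds used for training
    % ... train the MLP on data(trainRows,:), labels(trainRows) ...
    % ... evaluate on data(testRows,:) and store the measured error ...
    foldErr(k) = NaN;                             % placeholder for the test error of fold k
end
% averagedError = mean(foldErr);                  % the cross-validated estimate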

IMPLEMENTATION Data Currently, I decided to use the turnout data set from https://vincentarelbundock.github.io/Rdatasets/doc/Zelig/turnout.html, which contains individual-level turnout data and pools several American National Election Surveys conducted during the 1992 presidential election year.

Example:

race    age    educate    income    vote
white   60     14         3.3458    1
white   51     10         1.8561    0
white   24     12         0.6304    0

In the last column, the value 1 represents the choice “Bill Clinton” and 0 represents the choice “George Bush”.

Feature vectors The features analyzed in this project are race, age, education background, and income (thousand dollars per month). The reason is simple: they are the data accessible from the data set, and they also contribute to one's final vote decision to some extent. However, we cannot deny that other, more essential factors, such as political stance or occupation, play a larger role in one's decision. At first I will use this data set as the samples; a replacement will be made in the future if a more suitable data set can be found.

Each feature vector contains 4 features and 1 label, giving a total of 5 positions. An example feature vector layout is as follows:

Race    Age    Educate    Income    Label
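A minimal loading-and-encoding sketch in MATLAB is shown below; it assumes the turnout data has been downloaded locally as 'turnout.csv' with columns named race, age, educate, income, and vote (the file name, column names, and race encoding are assumptions about the downloaded file, not part of the project code).

T = readtable('turnout.csv');                          % read the downloaded table
raceNum  = double(strcmp(string(T.race), 'white'));    % hypothetical encoding: 1 = white, 0 = others
features = [raceNum, T.age, T.educate, T.income];      % one 4-feature row per voter
labels   = T.vote;                                     % 1 = Clinton, 0 = Bush, as described above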

Model For this project, the model I plan to use is a Multi-Layer Perceptron (MLP) with 4 inputs, 1 output layer, and 2 hidden layers of 50 neurons each. In each neuron I use a sigmoidal activation function, with the alpha value set to 0.1 and the momentum set to 0.9 at first. With the 2000 sample data points, I use an n-way cross-validation method to tune the network, for example choosing 1500 samples as the training data and the remaining 500 samples as the testing data. The MLP is applied to predict the vote choice of an individual (Clinton or Bush). The output labels correspond as follows:

[1 0] = Clinton
[0 1] = Bush
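As a concrete starting point, the configuration block of the appendix script could be set up as follows to mirror this model; this is a minimal sketch assuming the appendix code is reused for the turnout data, and whether these values train well remains an experimental question.

nbrOfNeuronsInEachHiddenLayer = [50 50];  % two hidden layers with 50 neurons each
nbrOfOutUnits = 2;                        % [1 0] = Clinton, [0 1] = Bush
unipolarBipolarSelector = 0;              % unipolar sigmoid activation
learningRate = 0.1;                       % alpha value from the model description
enable_learningRate_momentum = 1;         % enable momentum backpropagation
momentum_alpha = 0.9;                     % momentum value from the model description
enable_resilient_gradient_descent = 0;    % plain momentum backpropagation instead of Rprop
nbrOfEpochs_max = 5000;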

RESULT

Hidden layers    Learning rate (η)    MSE
[10 10]          0.15                 0.020619
[10 10]          0.01                 0
[10 10]          0.10                 0
[10 10 10]       0.15                 0.1856
[5 5]            0.15                 0

The results above show that the performance of the ANN, measured by mean square error (MSE), varies with the network configuration. Generally the simpler structures did a better job, because a more complex ANN structure may suffer from an over-fitting problem. In statistics and machine learning, one of the most common tasks is to fit a "model" to a set of training data, so as to be able to make reliable predictions on general untrained data. In overfitting, a statistical model describes random error or noise instead of the underlying relationship. Overfitting occurs when a model is excessively complex, such as having too many parameters relative to the number of observations. A model that has been overfit has poor predictive performance, as it overreacts to minor fluctuations in the training data.

DISCUSSION AND FUTURE WORK Due to the simplicity of the data, it is hard for me to generalize a final conclusion for the prediction problem, because the types of samples are limited and many of them are very similar yet yield different vote results. My project is almost finished, but it is far from perfect. The current data set is a little simple, and I would like to try a more challenging and complex data set and construct a more complex MLP in the future.
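With a richer data set, one simple check against the over-fitting problem noted above would be to measure the error on a held-out test split alongside the training MSE. A minimal sketch is given below; it assumes the network has already been trained on the 1500-sample training split only and that the EvaluateNetwork helper of the appendix project is on the MATLAB path.

perm     = randperm(length(Samples(:,1)));
trainIdx = perm(1:1500);                 % training split (used for training elsewhere)
testIdx  = perm(1501:end);               % 500 held-out samples
nbrWrong = 0;
for s = testIdx
    outputs = EvaluateNetwork(Samples(s,:), NodesActivations, Weights, unipolarBipolarSelector);
    [~, predictedCol] = max(outputs);    % column 1 = Clinton, column 2 = Bush
    actualCol = 2 - TargetClasses(s);    % label 1 -> column 1, label 0 -> column 2
    nbrWrong = nbrWrong + (predictedCol ~= actualCol);
end
testErrorRate = nbrWrong / length(testIdx);  % error on unseen voters; compare with the training MSE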

REFERENCES
[1] King, Gary, Michael Tomz, and Jason Wittenberg (2000). "Making the Most of Statistical Analyses: Improving Interpretation and Presentation," American Journal of Political Science, vol. 44, pp. 341–355.
[2] http://heraqi.blogspot.com.eg/2015/11/mlp-neural-network-with-backpropagation.html
[3] http://neuralnetworksanddeeplearning.com/chap3.html
[4] Professor Hu, Lecture Slides.
[5] Michael Nielsen, "Neural Networks and Deep Learning," http://neuralnetworksanddeeplearning.com/about.html

Appendix: Matlab code

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Multilayer Perceptron (MLP) Neural Network Function using MATLAB:       %
% An implementation for Multilayer Perceptron Feed Forward Fully          %
% Connected Neural Network with a sigmoid activation function. The        %
% training is done using the Backpropagation algorithm with options for   %
% Resilient Gradient Descent, Momentum Backpropagation, and Learning      %
% Rate Decrease. The training stops when the Mean Square Error (MSE)      %
% reaches zero or a predefined maximum number of epochs is reached.       %
%                                                                         %
% Four example data for training and testing are included with the       %
% project. They are generated by SharkTime Sharky Neural Network          %
% (http://sharktime.com/us_SharkyNeuralNetwork.html)                      %
%                                                                         %
% Copyright (C) 9-2015 Hesham M. Eraqi. All rights reserved.              %
% [email protected]                                                   %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%% Clear Variables, Close Current Figures, and Create Results Directory
clc; clear all; close all;
mkdir('Results//'); %Directory for Storing Results

%% Configurations/Parameters
dataFileName = 'sharky.spirals.points'; %sharky.linear.points - sharky.circle.points - sharky.wave.points - sharky.spirals.points
nbrOfNeuronsInEachHiddenLayer = [10]; %linear:[4] - circle:[10] - wave,spirals:[10 10]
nbrOfOutUnits = 2;
unipolarBipolarSelector = 0; %0 for Unipolar, -1 for Bipolar
learningRate = 0.15;
nbrOfEpochs_max = 5000;

enable_resilient_gradient_descent = 1; %1 for enable, 0 for disable
learningRate_plus = 1.2;
learningRate_negative = 0.5;
deltas_start = 0.9;
deltas_min = 10^-6;
deltas_max = 50;

enable_decrease_learningRate = 0; %1 for enable decreasing, 0 for disable
learningRate_decreaseValue = 0.0001;
min_learningRate = 0.05;

enable_learningRate_momentum = 0; %1 for enable, 0 for disable
momentum_alpha = 0.05;

draw_each_nbrOfEpochs = 100;

%% Read Data
importedData = importdata(dataFileName, '\t', 6);
Samples = importedData.data(:, 1:length(importedData.data(1,:))-1);
TargetClasses = importedData.data(:, length(importedData.data(1,:)));
TargetClasses = TargetClasses - min(TargetClasses);
ActualClasses = -1*ones(size(TargetClasses));

%% Calculate Number of Input and Output Nodes
nbrOfInputNodes = length(Samples(1,:)); %=Dimension of Any Input Samples
% nbrOfOutUnits = ceil(log2(length(unique(TargetClasses)))) + 1; %Ceil(Log2( Number of Classes ))
nbrOfLayers = 2 + length(nbrOfNeuronsInEachHiddenLayer);
nbrOfNodesPerLayer = [nbrOfInputNodes nbrOfNeuronsInEachHiddenLayer nbrOfOutUnits];

%% Adding the Bias as Nodes with a fixed Activation of 1
nbrOfNodesPerLayer(1:end-1) = nbrOfNodesPerLayer(1:end-1) + 1;
Samples = [ones(length(Samples(:,1)),1) Samples];

%% Calculate TargetOutputs %TODO needs to be general for any nbrOfOutUnits
TargetOutputs = zeros(length(TargetClasses), nbrOfOutUnits);
for i=1:length(TargetClasses)
    if (TargetClasses(i) == 1)
        TargetOutputs(i,:) = [1 unipolarBipolarSelector];
    else
        TargetOutputs(i,:) = [unipolarBipolarSelector 1];
    end
end

%% Initialize Random Weights Matrices
Weights = cell(1, nbrOfLayers); %Weights connecting bias nodes with previous layer are useless, but to make code simpler and faster
Delta_Weights = cell(1, nbrOfLayers);
ResilientDeltas = Delta_Weights; % Needed in case that Resilient Gradient Descent is used
for i = 1:length(Weights)-1
    Weights{i} = 2*rand(nbrOfNodesPerLayer(i), nbrOfNodesPerLayer(i+1))-1; %RowIndex: From Node Number, ColumnIndex: To Node Number
    Weights{i}(:,1) = 0; %Bias nodes weights with previous layer (Redundant step)
    Delta_Weights{i} = zeros(nbrOfNodesPerLayer(i), nbrOfNodesPerLayer(i+1));
    ResilientDeltas{i} = deltas_start*ones(nbrOfNodesPerLayer(i), nbrOfNodesPerLayer(i+1));
end
Weights{end} = ones(nbrOfNodesPerLayer(end), 1); %Virtual Weights for Output Nodes
Old_Delta_Weights_for_Momentum = Delta_Weights;
Old_Delta_Weights_for_Resilient = Delta_Weights;

NodesActivations = cell(1, nbrOfLayers);
for i = 1:length(NodesActivations)
    NodesActivations{i} = zeros(1, nbrOfNodesPerLayer(i));
end
NodesBackPropagatedErrors = NodesActivations; %Needed for Backpropagation Training Backward Pass

zeroRMSReached = 0;
nbrOfEpochs_done = 0;

%% Iterating all the Data
MSE = -1 * ones(1,nbrOfEpochs_max);
for Epoch = 1:nbrOfEpochs_max

    for Sample = 1:length(Samples(:,1))
        %% Backpropagation Training
        %Forward Pass
        NodesActivations{1} = Samples(Sample,:);
        for Layer = 2:nbrOfLayers
            NodesActivations{Layer} = NodesActivations{Layer-1}*Weights{Layer-1};
            NodesActivations{Layer} = Activation_func(NodesActivations{Layer}, unipolarBipolarSelector);
            if (Layer ~= nbrOfLayers) %Because bias nodes don't have weights connected to previous layer
                NodesActivations{Layer}(1) = 1;
            end
        end

        % Backward Pass Errors Storage
        % (As gradient of the bias nodes are zeros, they won't contribute to previous layer errors nor delta_weights)
        NodesBackPropagatedErrors{nbrOfLayers} = TargetOutputs(Sample,:)-NodesActivations{nbrOfLayers};
        for Layer = nbrOfLayers-1:-1:1
            gradient = Activation_func_drev(NodesActivations{Layer+1}, unipolarBipolarSelector);
            for node=1:length(NodesBackPropagatedErrors{Layer}) % For all the Nodes in current Layer
                NodesBackPropagatedErrors{Layer}(node) = sum( NodesBackPropagatedErrors{Layer+1} .* gradient .* Weights{Layer}(node,:) );
            end
        end

        % Backward Pass Delta Weights Calculation (Before multiplying by learningRate)
        for Layer = nbrOfLayers:-1:2
            derivative = Activation_func_drev(NodesActivations{Layer}, unipolarBipolarSelector);
            Delta_Weights{Layer-1} = Delta_Weights{Layer-1} + NodesActivations{Layer-1}' * (NodesBackPropagatedErrors{Layer} .* derivative);
        end
    end

    %% Apply resilient gradient descent or/and momentum to the delta_weights
    if (enable_resilient_gradient_descent) % Handle Resilient Gradient Descent
        if (mod(Epoch,200)==0) %Reset Deltas
            for Layer = 1:nbrOfLayers
                ResilientDeltas{Layer} = learningRate*Delta_Weights{Layer};
            end
        end
        for Layer = 1:nbrOfLayers-1
            mult = Old_Delta_Weights_for_Resilient{Layer} .* Delta_Weights{Layer};
            ResilientDeltas{Layer}(mult > 0) = ResilientDeltas{Layer}(mult > 0) * learningRate_plus; % Sign didn't change
            ResilientDeltas{Layer}(mult < 0) = ResilientDeltas{Layer}(mult < 0) * learningRate_negative; % Sign changed
            ResilientDeltas{Layer} = max(deltas_min, ResilientDeltas{Layer});
            ResilientDeltas{Layer} = min(deltas_max, ResilientDeltas{Layer});

            Old_Delta_Weights_for_Resilient{Layer} = Delta_Weights{Layer};

            Delta_Weights{Layer} = sign(Delta_Weights{Layer}) .* ResilientDeltas{Layer};
        end
    end
    if (enable_learningRate_momentum) %Apply Momentum
        for Layer = 1:nbrOfLayers
            Delta_Weights{Layer} = learningRate*Delta_Weights{Layer} + momentum_alpha*Old_Delta_Weights_for_Momentum{Layer};
        end
        Old_Delta_Weights_for_Momentum = Delta_Weights;
    end
    if (~enable_learningRate_momentum && ~enable_resilient_gradient_descent)
        for Layer = 1:nbrOfLayers
            Delta_Weights{Layer} = learningRate * Delta_Weights{Layer};
        end
    end

    %% Backward Pass Weights Update
    for Layer = 1:nbrOfLayers-1
        Weights{Layer} = Weights{Layer} + Delta_Weights{Layer};
    end

    % Resetting Delta_Weights to Zeros
    for Layer = 1:length(Delta_Weights)
        Delta_Weights{Layer} = 0 * Delta_Weights{Layer};
    end

    %% Decrease Learning Rate
    if (enable_decrease_learningRate)
        new_learningRate = learningRate - learningRate_decreaseValue;
        learningRate = max(min_learningRate, new_learningRate);
    end

    %% Evaluation
    for Sample = 1:length(Samples(:,1))
        outputs = EvaluateNetwork(Samples(Sample,:), NodesActivations, Weights, unipolarBipolarSelector);

        bound = (1+unipolarBipolarSelector)/2;
        if (outputs(1) >= bound && outputs(2) < bound) %TODO: Not generic role for any number of output nodes
            ActualClasses(Sample) = 1;
        elseif (outputs(1) < bound && outputs(2) >= bound)
            ActualClasses(Sample) = 0;
        else
            if (outputs(1) >= outputs(2))
                ActualClasses(Sample) = 1;
            else
                ActualClasses(Sample) = 0;
            end
        end
    end

    MSE(Epoch) = sum((ActualClasses-TargetClasses).^2)/(length(Samples(:,1)));
    if (MSE(Epoch) == 0)
        zeroRMSReached = 1;
    end

    %% Visualization
    if (zeroRMSReached || mod(Epoch,draw_each_nbrOfEpochs)==0)
        % Draw Mean Square Error
        subplot(2,1,2);
        MSE(MSE==-1) = [];
        plot([MSE(1:Epoch)]);
        ylim([-0.1 0.6]);
        title('Mean Square Error');
        xlabel('Epochs');
        ylabel('MSE');
        grid on;

        saveas(gcf, sprintf('Results//fig%i.png', Epoch),'jpg');
        pause(0.05);
    end

    display([int2str(Epoch) ' Epochs done out of ' int2str(nbrOfEpochs_max) ' Epochs. MSE = ' num2str(MSE(Epoch)) ' Learning Rate = ' ...
        num2str(learningRate) '.']);

    nbrOfEpochs_done = Epoch;
    if (zeroRMSReached)
        saveas(gcf, sprintf('Results//Final Result for %s.png', dataFileName),'jpg');
        break;
    end
end
display(['Mean Square Error = ' num2str(MSE(nbrOfEpochs_done)) '.']);
