
10-601, Project Phase 1 Report

Group Name: DEADLINE

Team Members: Zhitao Pei (zhitaop), Sean Hao (xinhao)

Random Forest

Environment: Weka 3.6.11

Data: Full dataset

Parameters:

 200 trees, 400 features, seed 1, unlimited maximum tree depth

Accuracy: Training takes about half an hour and achieves an accuracy of 39.886%.

Explanation:

We chose a random forest because it usually performs well compared with other classifiers. Decision trees were among the best classifiers in the ranking shown in class. A random forest is an ensemble of decision trees, which reduces variance and gives a better result than a single decision tree. Most errors occur on images that are hard to tell apart based on the grid alone.
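The benefit of combining many trees can be illustrated with a small calculation. The sketch below is not Weka's implementation. It assumes a simplified binary task in which every tree votes independently and is correct with the same probability (0.6 here is a hypothetical value), which real, correlated trees on a multi-class problem do not satisfy. Under those assumptions it computes how often a majority vote is correct.

```python
from math import comb


def majority_vote_accuracy(n_trees: int, p_correct: float) -> float:
    """Probability that a majority vote over n_trees voters is correct.

    Assumptions (for illustration only): a binary task, independent trees,
    each correct with probability p_correct. Ties are broken by a fair coin.
    """
    total = 0.0
    for k in range(n_trees + 1):
        # Binomial probability that exactly k trees vote for the right class.
        prob_k = comb(n_trees, k) * p_correct**k * (1 - p_correct) ** (n_trees - k)
        if 2 * k > n_trees:
            total += prob_k          # clear majority is correct
        elif 2 * k == n_trees:
            total += 0.5 * prob_k    # tie: correct half of the time
    return total


if __name__ == "__main__":
    # 200 matches the number of trees used in this report.
    for n in (1, 11, 51, 200):
        print(n, "trees ->", round(majority_vote_accuracy(n, 0.6), 4))
```

Under these idealized assumptions the vote becomes more reliable as trees are added. In practice the gain is smaller because the trees in a forest are correlated.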

Multilayer Perceptron

Environment: Weka 3.6.11

Parameters:

 Hidden Layer: 3, Learning Rate: 0.3, Momentum: 0.2, Training Time: 500, Validation Threshold: 20

Accuracy: 27.448%

Explanation:

I chose a neural network because I treat the features as independent, since they are the pixels of a picture. To capture the relationships between those pixels, a good approach is to weight the different features and combine them to produce a result. A multilayer perceptron matches this idea well. However, training a multilayer perceptron takes a very long time once the hidden layer has many nodes, so I limited the hidden layer to only 3 nodes. This hurts accuracy, but it is a trade-off against training time. In the next phase, I will try different neural network structures to reduce training time and improve model accuracy.
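The idea of weighting pixels and combining them, and the cost of adding hidden nodes, can be sketched as follows. This is a minimal illustration, not Weka's code. It assumes a fully connected network with one sigmoid hidden layer and bias terms, and the input size (1024 pixels) and class count (10) are hypothetical placeholders rather than the dimensions of our dataset.

```python
import math
import random


def count_weights(n_inputs: int, n_hidden: int, n_outputs: int) -> int:
    """Trainable weights in a one-hidden-layer network, biases included."""
    return (n_inputs + 1) * n_hidden + (n_hidden + 1) * n_outputs


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights):
    """Weighted sum plus bias, then sigmoid, for each node in a layer.

    Each row of `weights` holds one weight per input followed by a bias.
    """
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + row[-1])
        for row in weights
    ]


def forward(pixels, w_hidden, w_output):
    """Combine weighted pixels in the hidden layer, then in the output layer."""
    return layer(layer(pixels, w_hidden), w_output)


if __name__ == "__main__":
    n_pixels, n_classes = 1024, 10  # hypothetical sizes
    for n_hidden in (3, 30, 300):
        print(n_hidden, "hidden nodes ->",
              count_weights(n_pixels, n_hidden, n_classes), "weights")

    rng = random.Random(1)
    pixels = [rng.random() for _ in range(n_pixels)]
    w_hidden = [[rng.uniform(-0.05, 0.05) for _ in range(n_pixels + 1)]
                for _ in range(3)]
    w_output = [[rng.uniform(-0.05, 0.05) for _ in range(3 + 1)]
                for _ in range(n_classes)]
    print(len(forward(pixels, w_hidden, w_output)), "output scores")
```

The weight count grows roughly linearly with the number of hidden nodes, and every training epoch has to update all of those weights. That is consistent with the long training times seen for larger hidden layers.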