Space Analysis
Factors: shortest_transit, num_transit, num_business
In this section, we look at three spatial variables and explore how they contribute to the attractiveness of events. We use cluster analysis, classification, and hypothesis testing.
Classification
We implemented five classification methods to predict whether an event is popular using only the three space-related variables. To evaluate these models, we split the dataset into 80% training data and 20% test data and calculated the accuracy of each trained model on the test data. At first glance, the accuracy of every model is better than a random guess, which is encouraging. However, a closer look at the confusion matrix of each model shows that the precision for unattractive events is quite low relative to the precision for attractive events. We attribute this gap to class imbalance: our dataset contains far more unattractive events than attractive events. Among the five classifiers, SVM returned relatively high precision for both categories, so we conclude that SVM performed best.
This result aligns with the classification output using 19 variables.
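To make the difference between accuracy and per-class precision concrete, here is a minimal sketch in plain Python. The labels are made up for illustration (1 = attractive, 0 = unattractive) and are not taken from our dataset. The helper names precision_per_class and accuracy are hypothetical and not part of our pipeline.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision_per_class(y_true, y_pred):
    """For each label: correct predictions of that label / all predictions of that label."""
    counts = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    labels = sorted(set(y_true) | set(y_pred))
    result = {}
    for label in labels:
        predicted = sum(counts[(t, label)] for t in labels)
        result[label] = counts[(label, label)] / predicted if predicted else float("nan")
    return result

# Made-up labels, illustration only: 1 = attractive, 0 = unattractive.
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 1, 0, 0, 1]

acc = accuracy(y_true, y_pred)
prec = precision_per_class(y_true, y_pred)
print(acc)   # 0.8
print(prec)  # {0: 0.5, 1: 0.875}
```

In this toy case the overall accuracy looks reasonable, yet half of the "unattractive" predictions are wrong. This is why we read the confusion matrix rather than relying on accuracy alone.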
Linear Regression
H0: None of the space-related factors has a relationship with how many times an event is repeated
HA: At least one space-related factor has a relationship with how many times an event is repeated
From the OLS Regression Results, we observe that among the three space-related variables, only shortest_transit and num_business are statistically significant for predicting recursive_count (the number of times an event is repeated); num_transit is not.
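The per-variable significance read from an OLS table can be sketched by hand as follows. This is an illustration under stated assumptions, not our actual analysis: the data are simulated, the function ols_wald is a hypothetical helper, and the p-values use a normal approximation to the t distribution, which is reasonable only for large samples.

```python
import math
import numpy as np

def ols_wald(X, y):
    """Least-squares fit with intercept; returns coefficients, standard errors, p-values."""
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])   # prepend an intercept column
    xtx_inv = np.linalg.inv(design.T @ design)
    beta = xtx_inv @ design.T @ y
    resid = y - design @ beta
    dof = n - design.shape[1]                   # residual degrees of freedom
    sigma2 = resid @ resid / dof                # unbiased residual variance
    se = np.sqrt(sigma2 * np.diag(xtx_inv))
    t_stats = beta / se
    # Two-sided p-values, normal approximation (assumption: large n).
    p_values = np.array([math.erfc(abs(t) / math.sqrt(2)) for t in t_stats])
    return beta, se, p_values

# Simulated data, illustration only. The columns merely stand in for
# shortest_transit, num_transit, num_business; the second has no true effect.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))
y = 1.0 + 0.5 * X[:, 0] + 0.0 * X[:, 1] - 0.3 * X[:, 2] + rng.normal(size=n)

beta, se, p_values = ols_wald(X, y)
names = ["intercept", "shortest_transit", "num_transit", "num_business"]
for name, b, p in zip(names, beta, p_values):
    print(f"{name:>17}: coef={b:+.3f}  p={p:.4f}")
```

A variable is called statistically significant when its p-value falls below the chosen threshold (commonly 0.05).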
Logistic Regression
- For a similar interpretation and statement of hypotheses, please refer to the Predictive Analysis page.
H0: None of the space-related factors has a relationship with the probability of attractive events
HA: At least one space-related factor has a relationship with the probability of attractive events
From the Logistic Regression Results, we observe that among the three space-related variables, only num_business and num_transit are statistically significant for predicting the probability of attractive events; shortest_transit is not.
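For readers who want to see where the significance figures in a logistic regression table come from, the following sketch fits the model by Newton-Raphson and computes Wald z-tests. It rests on stated assumptions: the data are simulated, logit_wald is a hypothetical helper, and the column names only stand in for our variables.

```python
import math
import numpy as np

def logit_wald(X, y, n_iter=25):
    """Logistic regression via Newton-Raphson; returns coefficients and Wald p-values."""
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])   # prepend an intercept column
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-design @ beta))           # predicted probabilities
        gradient = design.T @ (y - p)                      # score vector
        info = design.T @ (design * (p * (1 - p))[:, None])  # information matrix
        beta = beta + np.linalg.solve(info, gradient)      # Newton step
    p = 1.0 / (1.0 + np.exp(-design @ beta))
    cov = np.linalg.inv(design.T @ (design * (p * (1 - p))[:, None]))
    z = beta / np.sqrt(np.diag(cov))                       # Wald z-statistics
    p_values = np.array([math.erfc(abs(v) / math.sqrt(2)) for v in z])
    return beta, p_values

# Simulated data, illustration only. The first column has no true effect.
rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=(n, 3))
logits = -0.5 + 0.0 * X[:, 0] + 0.8 * X[:, 1] + 0.6 * X[:, 2]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logits))).astype(float)

beta, p_values = logit_wald(X, y)
names = ["intercept", "shortest_transit", "num_transit", "num_business"]
for name, b, p in zip(names, beta, p_values):
    print(f"{name:>17}: coef={b:+.3f}  p={p:.4f}")
```

Each coefficient is on the log-odds scale, so a positive significant coefficient means the variable raises the estimated probability that an event is attractive.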
Conclusion
Based on the above analysis, only num_business and num_transit are statistically significant for predicting the probability of attractive events, while only shortest_transit and num_business are statistically significant for predicting recursive_count. num_business is the only variable that is significant in both models.