Wednesday, March 31, 2021

What Might Happen If Citizens of All Ages Had Voted in the 2019 Canadian Federal Election?


        We already know that the Liberal Party won the 2019 federal election. I am interested in predicting which party would have won if citizens of all ages in Canada had voted in the 2019 Canadian Federal Election. Models will be built on the online survey results of the 2019 Canadian Election Study. Next, the models will be applied to the 2016 census data to find which party would receive the most votes. Finally, the predicted result will be compared with the actual 2019 election result to see whether there is any difference.

        In the model, "age", "province", "education" and "gender" are the independent variables, and "vote choice" is the only response variable. The outcome variable includes eight vote choices, and their distributions are shown in Figure 1. It reveals that "Liberal Party" was the most popular choice, "Conservative Party" was the second most popular, and "People's Party" received the fewest votes. Next, these variables are used to build a logistic regression, and the model is then applied to the 2016 census data to predict what would have happened to the election result if citizens of all ages had voted.

 


        The survey data consists of many multiple-choice questions, and we convert each of them into several binary problems using the one-vs-rest method. That is, if the outcome has n levels, then n binary outcomes are obtained, one for each vote choice. In this project, the first binary outcome is whether or not the respondent chose "Liberal Party", and the other vote choices work in the same way.
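        The one-vs-rest step can be sketched in a few lines of Python. This is only an illustration: the function name one_vs_rest_labels and the sample responses are made up here and are not taken from the survey data.

```python
def one_vs_rest_labels(choices):
    """Turn a list of multi-class labels into one 0/1 label list per class."""
    levels = sorted(set(choices))
    return {level: [1 if c == level else 0 for c in choices] for level in levels}


# Hypothetical survey responses, for illustration only.
votes = ["Liberal Party", "Conservative Party", "Liberal Party", "People's Party"]
binary_outcomes = one_vs_rest_labels(votes)

# Each party now has its own binary outcome: 1 = chose it, 0 = chose another.
print(binary_outcomes["Liberal Party"])  # [1, 0, 1, 0]
```

        Each of these 0/1 lists would then serve as the response variable of its own logistic regression.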

        For each binary question, a logistic regression model is trained using the equation below. A categorical variable with n levels is represented by n − 1 dummy variables. One logistic regression model is trained for each vote choice and then applied to the census data to obtain the predicted probability of choosing the corresponding party.
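        For reference, the standard form of a binary logistic regression is sketched here. The symbols are assumed notation and may differ from the fitted model: p is the probability of choosing the party in question, x_1, ..., x_k are the predictors (including the dummy variables), and β_0, ..., β_k are the coefficients.

```latex
\log\left(\frac{p}{1 - p}\right) = \beta_0 + \sum_{j=1}^{k} \beta_j x_j
```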



        After that, each combination of feature values has six predicted probabilities, one for each party. We then use the softmax method to rescale them so that each value lies between 0 and 1 and together they sum to 1. The formula of the softmax method is shown below.
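        A minimal sketch of the softmax step in Python, where each input z_i is mapped to exp(z_i) divided by the sum of exp(z_j) over all parties. The six input values are hypothetical and only stand in for the one-vs-rest probabilities of a single group.

```python
import math


def softmax(scores):
    """Map a list of real numbers to positive weights that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow; result is unchanged
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical one-vs-rest probabilities for six parties in one group.
raw = [0.40, 0.30, 0.15, 0.08, 0.05, 0.02]
normalized = softmax(raw)
print(normalized)
```

        Because the inputs here are already between 0 and 1, softmax keeps their ranking but compresses the differences between them. Dividing each probability by their sum is a simpler alternative that also gives values adding up to 1.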


        One of the model results is shown below. All features are significant at the 0.1 significance level, but at the 0.05 level the variable "education" is no longer significant. As for the coefficients, take "gender" as an example: it has three levels (Male, Female, Others). Female is treated as the reference level, and the other two are compared with it. The coefficient of "genderMale" is -0.34731, which corresponds to an odds ratio of exp(−0.34731) = 0.7066. In other words, holding the other variables fixed, the odds of choosing this party for males are about 0.71 times the odds for females.
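        The odds-ratio arithmetic can be checked with a short Python snippet. The variable names are only illustrative; the coefficient is the one reported above.

```python
import math

coef_gender_male = -0.34731  # coefficient of "genderMale" reported above

# Exponentiating a logistic regression coefficient gives the odds ratio
# relative to the reference level (Female).
odds_ratio = math.exp(coef_gender_male)
print(round(odds_ratio, 4))  # 0.7066

# Equivalent statement as a percentage change in the odds.
percent_change = (odds_ratio - 1) * 100
print(round(percent_change, 1))  # -29.3
```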





        As the plot above shows, "Liberal Party" has the highest predicted probability, approximately 0.26, and "Conservative Party" has the second highest. Thus, we suggest that the Liberal Party would have won the election if citizens of all ages in Canada had voted in the 2019 Canadian Federal Election. Our predicted result is exactly the same as the actual result of the 2019 Canadian Federal Election.




Monday, March 22, 2021

Predicting the 2020 American Presidential Election

        Last semester, I was interested in the result of the 2020 American presidential election, so I built a logistic regression to see how predictor variables such as race, sex, work class, education level, state and age affect the probability of a person voting for Trump in the 2020 American Presidential Election.

The model is:

        Next, we applied the model to the census data to predict the proportion of Trump voters in each group. We took the raw survey data from the Voter Study Group website and the census data from the IPUMS USA website. There are six variables in the model: race, sex, age, education level, work class, and state. With the predicted values, we weighted the proportion voting for Trump in each group by that group's population, summed these values, and divided the sum by the total population size. The result, 44.424%, is the predicted proportion voting for Trump.
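        The weighting step described above is a population-weighted average, which can be sketched in Python as follows. The function name poststratify and the three cells are hypothetical; the real analysis would have one cell for every combination of the six variables.

```python
def poststratify(cells):
    """Population-weighted average of per-cell predicted proportions.

    cells: list of (predicted_proportion, population_count) pairs.
    """
    total_population = sum(n for _, n in cells)
    weighted_sum = sum(p * n for p, n in cells)
    return weighted_sum / total_population


# Hypothetical cells, for illustration only: (proportion voting Trump, population).
cells = [(0.60, 1000), (0.40, 3000), (0.50, 1000)]
estimate = poststratify(cells)
print(round(estimate, 3))  # 0.46
```

        Larger groups pull the estimate toward their own predicted proportion, which is why the second cell dominates in this toy example.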

        In conclusion, the estimated proportion voting for Donald Trump is approximately 44.424%, and the estimated proportion voting for Joe Biden is approximately 55.576%. Based on the analysis, the proportion voting for Trump is lower than the proportion voting for Biden. Therefore, Trump was predicted to be less likely to win.

        Now we know that Biden won the election. My prediction correctly anticipated the result.

Monday, March 8, 2021

Uses of Statistics in Real Life

        We usually see statistics examples on quizzes or tests, where we are asked to calculate the answer to a question. Why are we studying statistics? What is the role of statistics in real life?





1) Medical Study

Statistics are used behind all medical studies. Statistics help doctors keep track of where a baby should be in his/her mental development. Physicians also use statistics to examine the effectiveness of treatments.

2) Weather Forecasts

Statistics are very important for observation, analysis and mathematical prediction models. Weather forecast models are built using statistics that compare prior weather conditions with current weather to forecast future weather conditions.

3) Survey Design

Statistical methods can help the government or other institutions collect and analyze data, which supports their future decision making.

4) Stock Market

The stock market also uses statistical computer models for stock analysis. Stock analysts get information about the economy using statistical concepts.

5) Consumer Goods

Retailers use statistics to keep track of everything they sell and to monitor their inventory. Leading retailers worldwide use statistics to calculate which products to ship to each store and when.

Sunday, February 14, 2021

The History of Statistics

        As an undergraduate student at UofT, I am taking statistics as one of my majors. By taking statistics courses at school, I have learned how to build a model, how to interpret the results, and how to write code and reports. Yes, we have learned a lot in class, but do you know the history of statistics?





        The term "statistics" is ultimately derived from the New Latin statisticum collegium ("council of state") and the Italian word statista ("statesman" or "politician"). The subject of statistics arose from the interplay of mathematical concepts and the needs of several applied sciences, including astronomy, geodesy, experimental psychology, genetics, and sociology. Statistics was created in the 18th century in response to the needs of industrializing states.

        By the 18th century, the term "statistics" designated the systematic collection of demographic and economic data by states. In the early 19th century, the meaning of "statistics" broadened to include the discipline concerned with the collection, summary, and analysis of data. Today, data is collected and statistics are computed and widely distributed in government, business, most of the sciences and sports, and many other fields. 
