At left, candidate Roy Moore (R). At right, candidate Doug Jones (D) - Image from Yellowhammer News

A fascinating political race has been unfolding before our very eyes since early 2017, when Jeff Sessions vacated his U.S. Senate seat in Alabama to become Attorney General under newly elected U.S. President Donald Trump.

The seat vacated by Sessions came open at a time of stark political polarization, including strong backlash against the Trump administration amid a series of controversial policy decisions. However, all early indicators pointed to the seat remaining safely Republican, with Trump having won the state by a hefty 28.3% margin and both of Alabama's Senate seats having been held by Republicans since 1997. Despite the many indicators of another impending Republican blowout in the Cotton State, many eyes remained trained on the race.

The hopes of a Republican blowout took a major hit in early November, when several allegations of sexual misconduct were leveled against Republican candidate Roy Moore. The fallout was swift. Since November 9th, polls between Moore and Democratic candidate Doug Jones have indicated that the race is practically a tossup.

But can a Senate race in Alabama actually be a tossup? We will let our Bayesian analysis answer that question.

A Two-Pronged Analysis

Our analysis takes shape in two forms. First, a polls-only method called the Dirichlet-Multinomial model that is minimally involved computationally and instead relies on basic Bayesian principles and a conjugate Dirichlet prior. Three submodels arise from this method:

  1. An unweighted polls model, in which every poll victory counts equally toward a candidate's probability of winning.
  2. A weighted polls model, in which poll victories count for more the closer they fall to election day.
  3. A vote-share model, in which each candidate's polled vote share, rather than a simple poll victory, updates the posterior.

Second, we conduct a Markov Chain Monte Carlo (MCMC) hierarchical logistic regression to better inform our prior understanding of the election and use its results in a Beta-Binomial model, then weight the output by information about each poll. To do this, we:

  1. Fit a Beta-Binomial model of Moore's support with a vague prior as a baseline.
  2. Fit a logistic regression on past Senate and Presidential elections to estimate the fundamental level of Republican support in Alabama.
  3. Feed those fundamentals into the Beta-Binomial model as an informed prior, weighting each poll by its recency, sample size, and pollster rating.

We conduct these two separate analyses because they are unique methods with complementary results. The Dirichlet-Multinomial model allows us to compute the probability of a certain candidate winning in a computationally simple manner, while the MCMC logistic regression and Beta-Binomial models allow us to compute Bayesian credible intervals and rigorously incorporate prior knowledge about the state of Alabama.

Dirichlet-Multinomial

The Dirichlet-Multinomial model provides a comforting level of simplicity in the complex world of modeling elections.

First, consider a simple Dirichlet distribution with shape parameters \(\alpha_1 = 2\), \(\alpha_2 = 2\), \(\alpha_3 = 1\).

\[\text{Figure 1: Dirichlet(2,2,1) Demonstration}\]

Figure 1 is simply a density for three variables, or outcomes, \(x_1\), \(x_2\), and \(z\). Other characteristics of the Dirichlet make it a useful distribution for modeling probabilities:

  1. Every component falls between 0 and 1.
  2. The components always sum to 1, so they behave like probabilities of mutually exclusive outcomes.
  3. It is the conjugate prior of the Multinomial distribution, which makes posterior updates analytically simple.

In this model, our polls data take a multinomial form in which there are three options:

  1. Republican candidate wins the poll
  2. Democratic candidate wins the poll
  3. Other candidate wins the poll

Dirichlet-Multinomial Model Overview

\[X_i|\theta \sim Multi(\theta)\]

\[\theta \sim Dir(\alpha)\]

The Dirichlet distribution can describe our understanding of the probability of an event occurring. In our case, there are three possible events, only one of which can occur. Thus, the generalized probability vector that describes this situation can be defined as:

\[ p = (p_1, p_2, p_3) \]

Where \(p_1 = \text{p(Republican winning)}\), \(p_2 = \text{p(Democrat winning)}\), and \(p_3 = \text{p(Other winning)}\).

The Dirichlet is a wonderful distribution for this analysis because it models mutually exclusive events, such as a certain candidate winning an election. Thus, \(p_1 + p_2 + p_3 = 1\) always.

Our goal is to obtain a probability model that tells us how likely each possible vector \(p\) of events is after incorporating some data (likelihood) to inform the model and obtain a posterior (more on incorporating the likelihood later). In order to do this, we set up a Dirichlet prior that expresses our initial understanding of the race, possibly a fairly uninformative description:

\[p = (1/3, 1/3, 1/3)\]

Now that we have decided on prior probabilities, our next goal would be to select prior Dirichlet parameters that accurately model this probability vector as the most likely outcome.

Our first prior could look something like this:

\[\theta \sim Dir(\alpha)\] where \(\alpha\) is a vector of shape parameters: \[ \alpha = (\alpha_1, \alpha_2, \alpha_3)\\ \]

Under these shape parameters, each component \(\theta_i\) of the Dirichlet random vector has expected value:

\[E(\theta_i) = \frac{\alpha_i}{\sum_k \alpha_k}\]

The expected value of each component must reflect the desired probability that we identified in \(p\). Let’s set the \(\alpha\) parameters as such:

\[ \alpha = (1, 1, 1) \]

These shape parameters create a distribution under which the expected probability of a Republican, Democrat, or Other candidate winning is:

\[E(\theta_1) = E(\theta_2) = E(\theta_3) = \frac{1}{1+1+1} = \frac{1}{3} = p_1 = p_2 = p_3\]
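
As a quick sanity check, we can simulate from this prior with rdirichlet() from the MCMCpack package (already loaded in the Code Appendix) and confirm that the component means land near 1/3; a minimal sketch:

library(MCMCpack)

set.seed(2017)
priorDraws <- rdirichlet(10000, alpha = c(1, 1, 1)) #10000 draws from Dir(1,1,1)
colMeans(priorDraws)     #each component mean is approximately 1/3
rowSums(priorDraws)[1:5] #every draw sums to exactly 1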

Now, to better inform this model beyond the fairly arbitrary shape parameters on the Dirichlet prior, we will incorporate our polls data to model the posterior probability of each candidate winning. This will reveal a new vector of updated shape parameters:

\[\alpha' = (\alpha_1', \alpha_2', \alpha_3')\]

Normally, Markov chains would be necessary to simulate an intractable posterior, but why deal with an intractable posterior if we don’t have to?

If we model our polls data as Multinomial, the posterior can be easily derived without Markov chains. The Dirichlet is a conjugate prior of the Multinomial, so if we can represent our polls data as a multinomial we will be able to compute the exact posterior distribution.

We can represent our polls data as such:

\[X \sim Multi(N, k)\]

Where \(N\) is the number of polls that have been conducted on the race and \(k\) is a vector of counts of polls won by a candidate: \[k = (k_{1}, k_{2}, k_{3})\]

And \(k_{1} = \text{Number of polls won by the R candidate}\), \(k_{2} = \text{Number of polls won by D candidate}\) and \(k_{3} = \text{Number of polls won by Other candidate}\).

For example, the notation below represents the outcomes of 10 separate polls in which only one of three candidates can win. Our data indicate that the Republican candidate won 5 of 10 polls, the Democratic candidate won 4 of 10 polls, and the Other candidate won 1 of 10 polls.

\[X \sim Multi(10, (5,4,1))\]

The Dirichlet-Multinomial model allows us to systematically combine multinomial data, like the notation above, with a Dirichlet prior and obtain a Dirichlet posterior that better reflects our data.

Derivation of Dirichlet-Multinomial posterior:

\[\begin{split} f(\theta|X) & \propto f(X|\theta)f(\theta) \\ & \propto \prod_{i=1}^{d}\theta_i^{k_i} \cdot \prod_{i=1}^{d}\theta_i^{\alpha_{i}-1} \\ & = \prod_{i=1}^{d}\theta_i^{k_i + \alpha_i - 1} \ \text{ for} \ \theta_i \in [0,1], \ k_i \in \{0,1,\dots,N\}, \ \alpha_i > 0 \end{split}\]

Above is the kernel of the density for the posterior distribution \(\theta|X \sim Dir(k_1 + \alpha_1, \dots, k_d + \alpha_d)\), where \(d\) represents the number of possible events in the probability simplex (3 in our case).

Dirichlet-Multinomial Alabama

In a Bayesian analysis, our likelihood (data) can be fed into the model all at once or sequentially. Both methods will result in the same outcome given the same prior. Below you will find a step-by-step explanation of how the data-dump and sequential algorithms work. Then, we will computationally put these algorithms into action via some interactive visualizations.

  1. A Polls Data Dump

\[\theta|\alpha \sim Dir(1,1,1)\]

\[X|\theta \sim Multi(29, (21, 8, 0))\]

\[\theta|X \sim Dir([1+21], [1+8], [1+0])\\ \sim Dir(22, 9, 1)\]

By definition of the expected value for the Dirichlet, we can easily calculate the \(\textbf{updated}\) expected probability of winning for each candidate:

\[E(\theta_1') = \frac{22}{22+9+1} = .6875\] \[E(\theta_2') = \frac{9}{22+9+1} = .2813\]

\[E(\theta_3') = \frac{1}{22+9+1} = .0312\]

Thus, our prior understanding that Moore and Jones had equal chances of winning the race has changed to \(68.75\)% for Moore, \(28.13\)% for Jones, and \(3.12\)% for a third candidate.
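
Because the prior is conjugate, the entire data dump reduces to one line of vector arithmetic; a minimal sketch in R:

priorAlpha <- c(1, 1, 1)  #Dir(1,1,1) prior
pollWins   <- c(21, 8, 0) #polls won by Moore, Jones, Other
postAlpha  <- priorAlpha + pollWins #posterior is Dir(22, 9, 1)
postAlpha / sum(postAlpha)          #posterior means: 0.6875, 0.28125, 0.03125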

  2. Sequential

This process would take too long to demonstrate in its entirety, but understanding the sequential analysis is essential to moving forward with the Alabama election Dirichlet-Multinomial model.

\[\theta|\alpha \sim Dir(1,1,1)\]

\[X_1|\theta \sim Multi(1, (1, 0, 0))\]

Thus, the posterior for poll \(1\) is:

\[\theta|X \sim Dir([1+1], [1+0], [1+0])\\ \sim Dir(2, 1, 1)\]

\[X_{2}|\theta \sim Multi(1, (0, 1, 0))\]

Thus, the posterior for poll \(2\) is

\[\theta|X \sim Dir([2+0], [1+1], [1+0])\\ \sim Dir(2, 2, 1)\]

This process continues sequentially through each remaining poll in the dataset (ties leave the posterior unchanged). Since we used the same initial prior and the same data, we would end up with the same results as in the data dump.
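
A minimal sketch of the sequential loop, using a hypothetical vector of poll winners (1 = Moore, 2 = Jones, 3 = Other); note that the order of the updates does not change the final posterior:

priorAlpha <- c(1, 1, 1)
winners <- c(1, 2, 1, 1) #hypothetical winners of the first four polls
for (w in winners) {
  priorAlpha[w] <- priorAlpha[w] + 1 #each poll adds 1 to its winner's parameter
}
priorAlpha / sum(priorAlpha) #posterior means after the sequential updates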

The process used above for the Alabama Senate election example is a simple model and makes several general assumptions. A more rigorous approach would:

  1. Develop a weighting method to reflect that winning polls early in the race is not as important as winning polls later in the race.

  2. Utilize multiple modeling methods to produce both maximum likelihood estimators and credible intervals.

Let’s look at this sequential polls model in more detail.

Alabama Senate Special Election Models

Sequential analysis of Alabama using Dirichlet

Let’s run this same Dirichlet-Multinomial model for Alabama, but in a more computationally efficient and reasonable way. We will calculate the probability of each candidate winning after each poll is submitted, record it in a posterior matrix, and plot the results.

Step-by-step, this is what the algorithm below does:

  1. Begins with prior Dirichlet shape parameters \(\alpha_1, \alpha_2, \alpha_3\).
  2. Starting with the first poll, follows this logic: if Moore wins the poll, add 1 to \(\alpha_1\); if Jones wins, add 1 to \(\alpha_2\); if the poll is a tie, carry the previous posterior forward.
  3. Records each candidate's posterior mean probability of victory after every update and plots the results.

\[\text{Figure 2: Unweighted Dirichlet-Multinomial Alabama Polls Posterior}\]

Simple Dirichlet Polls Interactive App

Shiny apps don’t work in static documents. See Figure 2 above with a \(Dir(1,1,1)\) prior and a \(Dir(3,1,.1)\) prior respectively.

Set your own prior \(\alpha\) parameters and see how the model reacts given the polls data.

Each person’s prior understanding of Roy Moore’s chance of winning will vary slightly, but this model clearly gives Moore the upper hand. This unweighted polls model is \(\textbf{data-centric}\), meaning that the data impacts the posterior very quickly.

Weighted Dirichlet Polls Model

Polls taken closer to election day may be a better indicator of current voter sentiment and thus may more accurately predict the election. We can tweak the previous model to make a poll victory more valuable to a candidate closer to election day. Instead of any victory adding 1 to the respective prior shape parameter, polls will be weighted based on their proximity to election day. This is the next logical progression in predicting the Alabama special election outcome using the Dirichlet-Multinomial model.

One difference between this model and the previous: instead of simply adding 1 to shape parameter \(i\) when candidate \(i\) wins a poll, we add \(\frac{1}{n}\), where \(n\) is the number of days remaining until the election on that poll's end date.
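
For intuition, here is roughly what that increment looks like at a few points in the race (a worked example, not part of the appendix code):

1/90 #a poll roughly three months out adds only ~0.011 to the winner's parameter
1/30 #a poll a month out adds ~0.033
1/2  #a poll two days out adds 0.5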

Computationally, we obtain the final posterior probability beginning with a very uninformative prior \(Dir(1,1,1)\).

##    pWinMoore pWinJones pWinOther       Date
## 32 0.4698965 0.3456376 0.1844659 2017-12-10

As you can see, this model is ineffective when given an uninformative prior. Clearly, we know that on December 10th, two days before election day, the Other candidate does not have an 18% chance of victory. Thus, our prior is essential to the effectiveness of this model.

Try tuning the shape parameters below and notice how even the slightest adjustments have a much bigger impact on the posterior probabilities for each candidate than in the unweighted model.

Here is the Alabama model with this weighting scheme applied.

\[\text{Figure 3: Weighted Dirichlet-Multinomial Alabama Polls Posterior}\]

Weighted Dirichlet Polls Interactive App

Shiny apps don’t work in static documents. See Figure 3 above with a \(Dir(1,1,1)\) prior and a \(Dir(1.5,1,.1)\) prior respectively.

Change the prior \(\alpha\) shape parameters and see how the model reacts. Notice that a slight change in a prior parameter has a significant impact on the model. This is a product of the weighting scheme, which places very little emphasis on the data in the beginning. Thus, the prior shape parameters are more impactful to the model’s performance.

Unweighted Posterior Parameters for Predicted Vote Share

Finally, instead of simply adding 1 to the Dirichlet shape parameter of the candidate who won a particular poll, we add the percent of votes obtained by each candidate in that particular poll to the shape parameters.

For example:

\[\text{Prior:} \ \ Dir(1,1,1)\]

\[\text{Data: Poll 1 Outcome} \rightarrow \text{Moore 44, Jones 40, Other 16} \sim Multi(1, (.44, .40, .16))\]

\[\text{Posterior} \rightarrow Dir(1+.44, 1+.40, 1+.16) \sim Dir(1.44, 1.40, 1.16) \]

Posterior probabilities after the first poll:

\[E(\theta_1') = \frac{1.44}{4} = .36\] \[E(\theta_2') = \frac{1.40}{4} = .35\]

\[E(\theta_3') = \frac{1.16}{4} = .29\]

As in the previous algorithm, the updated shape parameters become the new prior and each following poll is incorporated sequentially.
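
A minimal sketch of one vote-share update in R, mirroring the worked example above:

priorAlpha <- c(1, 1, 1)
poll1 <- c(.44, .40, .16) #Moore, Jones, Other vote shares in poll 1
priorAlpha <- priorAlpha + poll1 #posterior is Dir(1.44, 1.40, 1.16)
priorAlpha / sum(priorAlpha)     #posterior means: 0.36, 0.35, 0.29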

\[\text{Figure 4: Unweighted Dirichlet-Multinomial Alabama Vote Share Posterior}\]

Interactive App for Posterior Prediction of Vote Share

Shiny apps don’t work in static documents. See Figure 4 above with a \(Dir(1,1,1)\) prior and a \(Dir(3,1,.1)\) prior respectively.

Update the prior shape parameters \(\alpha\) and see how the model predictions change with your prior. This model clearly reflects the shrinking polls margin between Moore and Jones as election day approached. However, it is limited in that the “Other” candidate is awarded too high of a probability of victory.

This method has clear limitations as well. First, since we are utilizing a conjugate prior method and not approximating the posterior via more complex MCMC methods, we do not produce credible intervals for the vote-share predictions; the model gives us only a point estimate (the posterior mean) of each candidate's vote share. It also awards too much vote share to the Other candidate. Finally, we are not accounting for any outside factors that may influence the outcome or provide a better picture of the political climate entering the election.

Now, we enter the computationally-heavy area of MCMC posterior parameter approximation. We will account for a few outside factors to better inform our prior, instead of always beginning with an uninformative prior, and compute credible intervals for candidate vote share. This should provide a clearer and more thorough picture of this special election.

MCMC Vote Share Modeling

We can also look at this election as a two-man race. There aren’t any credible third-party candidates on the ballot, and the two write-ins listed on Ballotpedia’s website are not prominent political figures with organized campaigns. While it is true that some voters might vote for write-in candidates, these voters usually make only a marginal impact. Even in the cases where write-ins do matter, we have no way of knowing before the election who will write in votes or how many write-ins there will be. That is why, in the following models, we will make the assumption that one hundred percent of votes will go to either Roy Moore or Doug Jones.

This assumption raises an important question though: how do you allocate the polling responses of undecided voters? Unfortunately, there isn’t a good answer. You might deduce that a large share of these voters are Republicans who cannot decide between the controversial Roy Moore and the Democrat Doug Jones. Even so, we can imagine multiple scenarios for these voters. Maybe most of them will reluctantly fall in line with their party and vote for Moore. Maybe last-minute advertising highlighting Moore’s scandals will push them toward Jones. In any case, we have no good way of knowing what undecided voters will do, so we ultimately allocate them evenly to each candidate. While this is by no means a perfect assumption of how undecided voters will act, it favors neither candidate.
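
Concretely, the even split enters the data when we compute each poll's count of Moore supporters, as in the wrangling step in the Code Appendix. For example, the Monmouth poll (Moore 46, Jones 46, Undecided/Other 8, sample size 546):

Moore <- 46; Undecided.Other <- 8; Sample.Size <- 546
MooreTotal <- round((Moore + Undecided.Other/2)/100 * Sample.Size)
MooreTotal #273, exactly half of the 546 respondents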

Beta-Binomial

Model 1: Vague Prior

For our first model, we will start with a Beta-Binomial and a simple, uninformed prior. That is, we assume that Roy Moore has an equal probability of winning 0% of the vote, 100% of the vote, and everything in between. This kind of prior lets the data do most of the talking, which could be useful if the polls are incredibly accurate, but it also ignores information about our prior understanding of the state of politics that might be helpful in designing a model. The specifics of the model are described and illustrated below.

We choose the Beta-Binomial model because the Beta distribution lets us distribute Roy Moore’s projected support over values between 0 and 1, while the Binomial distribution lets us model the number of Roy Moore supporters in a poll as depending on the poll’s sample size and Roy Moore’s baseline support.

\[\begin{split} Y_{i} & = \text{Number of voters supporting Republican candidate Roy Moore in poll } i \\ N_i & = \text{Sample size of poll } i \\ \theta_i & = \text{Percentage of support for Republican candidate Roy Moore at time of poll } i \\ \end{split}\]

\[\begin{split} Y_{i}|\theta_i & \sim \text{Bin}(N_i,\theta_i) \\ \theta_i & \sim \text{Beta}(1, 1) \\ \end{split}\]

MCMC simulations

## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 32
##    Unobserved stochastic nodes: 32
##    Total graph size: 160
## 
## Initializing model

Unweighted Analysis

After our rjags simulation, we generate a most likely percentage of votes for Roy Moore given each poll, and then take the average of that percentage across all polls.

Note that in each of the following visualizations, the red and blue vertical lines represent the actual two-party vote share in the election.

## $SummaryStatistics
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.4524  0.5066  0.5182  0.5182  0.5300  0.5972 
## 
## $CredibleInterval
##      2.5%     97.5% 
## 0.4840963 0.5520304 
## 
## $ProbabilityOfVictory
## [1] 0.8536
## 
## $Visualization

\[\text{Figure 5: Unweighted Uninformed Posterior}\]

This model predicts that Roy Moore has an 85.4% chance of victory, and that he will most likely earn 51.8% of the two party vote, with a 95% chance of earning between 48.4% and 55.2% of the vote. These results are visualized in Figure 5. We can see that this isn’t a very good method because this prediction weights the results from polls conducted in September the same as those taken in the week before the election. This wouldn’t matter if the polls were steady throughout the months, but this was not the case, so the prediction fails to account for a lot of valuable information.

Weighted Analysis

That is why in the following simulations we weight polls by three different factors. For one thing, we know that polls taken close to election day are a lot more useful than polls taken all the way back in September. For this reason, it makes sense to weight each poll based on when it was conducted relative to election day. To do this we use an exponential decay function to generate a weight for each poll as follows:

\[W_{1i} = \large{e^{-D_i/10}}\]

where \(D_i\) is the number of days between poll \(i\)'s end date and election day. (We reserve \(N_i\) for poll \(i\)'s sample size below.)
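
To see how quickly this decays, a worked example:

exp(-2/10)  #a poll two days out gets weight ~0.82
exp(-30/10) #a poll a month out gets weight ~0.05
exp(-90/10) #a September poll gets weight ~0.0001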

This, however, is not the only factor we want to incorporate. We also know that larger polls tend to be more reliable. After a certain point, though, increasingly large sample sizes do not translate to much better polls. To create a weight respecting this information, we took the logarithm of each poll’s sample size and divided it by the mean log sample size across all polls to normalize the weights around the value 1. This gives larger polls a greater influence, and smaller polls a lesser influence, without giving any one poll too much or too little power.

\[W_{2i} = \frac{\log(N_i)}{(\sum_{j=1}^{32}\log(N_j))/32}\]

Finally, we know that certain pollsters are historically more reliable than others. FiveThirtyEight has compiled a list of pollster ratings in the form of letter grades, and we used those to add an additional weight to each poll. We assigned a percentage to each letter grade, and for any pollster that had no rating, we assigned a numerical weight of 0.5.

Grade    A+   A    A-   B+   B    B-   C+   C    C-   D+   D    D-   F    NA
Weight   1    .97  .93  .90  .87  .83  .80  .77  .73  .70  .67  .63  .60  .50

You can find FiveThirtyEight’s Pollster Ratings here.
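
The three weights are then multiplied together for each poll, exactly as in the Code Appendix. For a hypothetical B-rated poll of about 1,000 respondents taken three days before the election:

dayWeight    <- exp(-3/10) #recency weight, ~0.74
sizeWeight   <- 1          #log-sample-size weight is ~1 for a typical-size poll
ratingWeight <- 0.87       #FiveThirtyEight grade B
dayWeight * sizeWeight * ratingWeight #combined weight, ~0.64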

Using these weights, we calculate a weighted posterior distribution for the expected vote share for Roy Moore. To do this, we first calculate a weighted mean of the expected vote share across all polls for each step of the MCMC simulation. We then calculate the standard deviation of expected vote share for each poll, and calculate the weighted mean of these standard deviations. We calculate this weighted standard deviation because just taking the weighted average of the polls shrinks the range of the posterior distribution on the variable \(\theta\), and we want to preserve the spread of \(\theta\) from the original simulations. Finally, for each step of the simulation, we draw one value from a normal distribution centered at the weighted expected vote share, with the standard deviation calculated above. Even though Roy Moore’s support is beta distributed, we draw from the normal because these weighted means are roughly normally distributed.
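
A condensed sketch of that procedure, assuming model_samples is the draws-by-polls matrix and weights is the combined weight vector defined in the Code Appendix:

#weighted mean of theta across polls, one value per MCMC draw
wMeans <- apply(model_samples, 1, weighted.mean, w = weights)
#weighted mean of the per-poll posterior standard deviations
wSD <- weighted.mean(apply(model_samples, 2, sd), w = weights)
#draw one prediction per MCMC step to preserve the spread of theta
predictions <- rnorm(length(wMeans), mean = wMeans, sd = wSD)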

## $SummaryStatistics
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.4351  0.4946  0.5063  0.5063  0.5181  0.5803 
## 
## $CredibleInterval
##      2.5%     97.5% 
## 0.4719687 0.5404040 
## 
## $ProbabilityOfVictory
## [1] 0.6417333
## 
## $Visualization

\[\text{Figure 6: Weighted Uninformed Posterior}\]

We see from Figure 6 that this is a much better model. It predicts Roy Moore has a 64.2% chance of winning the election, and that he will most likely earn 50.6% of the two-party vote, with a 95% chance of earning between 47.2% and 54.0% of the vote.

One potential limitation of this model, though, is that it has a vague prior. It doesn’t take into account anything we know about the state of Alabama and its recent history as a Republican stronghold.

Logistic Regression

To account for this problem, we will use a logistic regression model on past Senate Elections to determine the predicted base level of support for Republicans in a state. We will use data from the 2016 Senate election, the 2010 Senate election, and the 2012 Presidential election to build a model that can determine the fundamentals in any state.

We use a logistic regression model here so we can break down our understanding of Republican support into specific components based on past presidential and Senate elections. This model transforms those components onto a scale of 0 to 1 that represents the baseline level of support for Republican Senate candidates in a state.

Model Specification

\[\begin{split} Y_i = & \text{ Republican Senate candidate's total votes in state } i \text{ in 2016} \\ N_i = & \text{ Total Senate votes in state } i \text{ in 2016}\\ \theta_i = & \text{ Republican candidate's support in state } i \\ X_{1i} = & \text{ Margin of Republican Senate vote in state } i \text{ over national Republican Senate vote in 2010} \\ X_{2i} = & \text{ Margin of Republican Presidential vote in state } i \text{ over national Republican Presidential vote in 2012} \\ \end{split}\]

\[\begin{split} Y_i|\theta_i,\beta_0,\beta_1,\beta_2 & \sim \text{Bin}(N_i, \theta_i) \\ \log(\frac{\theta_i}{1-\theta_i})|\beta_0,\beta_1,\beta_2 & = \beta_0 + \beta_1X_{1i} + \beta_2X_{2i} \\ \beta_0 & \sim \text{ N}(0,100^2) \\ \beta_1 & \sim \text{ N}(0,100^2) \\ \beta_2 & \sim \text{ N}(0,100^2) \end{split}\]

Note that we choose the hyperpriors to be intentionally vague so they can converge to one of a large range of values. Our variables \(X_{1i}\) and \(X_{2i}\) measure how Republican support in a specific state differs from Republican support nationally, for Senate races and Presidential races respectively.

MCMC Simulation

## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 29
##    Unobserved stochastic nodes: 3
##    Total graph size: 241
## 
## Initializing model

Alabama Election

With this model built, we can use data from the 2016 Presidential election and the 2016 Senate election in this model to calculate the fundamentals in Alabama.

\[\begin{split} \text{mean}(\beta_0) &= -0.135 \\ \text{mean}(\beta_1) &= 0.032 \\ \text{mean}(\beta_2) &= 0.015 \\ \\ X_{1\text{Alabama}} &= 21.5 \\ X_{2\text{Alabama}} &= 16.0 \\ \\ \log(\frac{\theta_\text{Alabama}}{1-\theta_\text{Alabama}}) &= \beta_0 + \beta_1*X_{1\text{Alabama}} + \beta_2*X_{2\text{Alabama}} \\ \\ \theta_\text{Alabama} &= 0.690 \end{split}\]
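
Numerically, plugging the posterior-mean coefficients into the inverse logit reproduces this figure (the same arithmetic the Code Appendix applies to the full posterior samples):

b0 <- -0.135; b1 <- 0.032; b2 <- 0.015 #posterior means of the coefficients
x1 <- 21.5; x2 <- 16.0 #Alabama's 2016 Senate and Presidential margins
logitTheta <- b0 + b1*x1 + b2*x2 #0.793
1/(1 + exp(-logitTheta))         #~0.69, Moore's fundamental support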

The fundamentals model predicts Roy Moore will win 68.98% of the vote, with a narrow 95% credible interval from 68.96% to 69% of the vote, and a 100% probability of victory. We see that on its own, this isn’t a good model, because it doesn’t incorporate all the information about this specific race. Rather, it only takes into account past data about Alabama elections. So we don’t want to use the fundamentals model to try to predict the election. Rather, we want these fundamentals to inform our prior knowledge about Republican support in the state of Alabama, and then use this to build a model and make a prediction.

Beta-Binomial model informed by Alabama fundamentals

Here is our new model where the parameter \(\alpha\) is based on this fundamental level of Republican support in Alabama, and it informs our prior understanding of \(\theta\) to predict Roy Moore will receive about 70% of the votes, before incorporating any polling data.

\[\begin{split} Y_{i} & = \text{Number of voters supporting Republican candidate Roy Moore in poll } i \\ N_i & = \text{Sample size of poll } i \\ \theta_i & = \text{Percentage of support for Republican candidate Roy Moore at time of poll } i \\ \alpha & = \text{Fundamental support for Republican candidates in Alabama} \end{split}\]

\[\begin{split} Y_{i}|\theta_i,\alpha & \sim \text{Bin}(N_i,\theta_i) \\ \theta_i|\alpha & \sim \text{Beta}(100*\alpha, 100*(1- \alpha)) \\ \alpha & \sim \text{N}(0.689794, 0.0001129983^2) \end{split}\]
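
The scaling by 100 controls how tightly the prior concentrates around the fundamentals; with \(\alpha \approx 0.69\), the implied prior on each \(\theta_i\) covers roughly:

alpha <- 0.689794
qbeta(c(0.025, 0.975), 100*alpha, 100*(1 - alpha)) #prior 95% interval, roughly (0.60, 0.78)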

Note that to produce our predictions, we will use the same weighting process as described above.

MCMC Simulation

## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 32
##    Unobserved stochastic nodes: 33
##    Total graph size: 201
## 
## Initializing model
## $SummaryStatistics
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.4606  0.5141  0.5252  0.5252  0.5364  0.5948 
## 
## $CredibleInterval
##      2.5%     97.5% 
## 0.4927260 0.5573121 
## 
## $ProbabilityOfVictory
## [1] 0.9389667
## 
## $Visualization

\[\text{Figure 7: Informed Weighted Posterior}\]

This model gives Roy Moore a 93.9% chance of victory, predicting he will most likely earn 52.5% of the two-party vote, with a 95% chance that he will earn between 49.3% and 55.7% of the vote. We see from Figure 7 that this model was off by a lot. In fact, the actual two-party vote shares fell outside the 95% credible interval.

Conclusions

In retrospect, the Alabama Senate election was very hard to predict. In the first place, too many factors in special elections are unknown and unpredictable. Turnout is always expected to be much lower, but we can never guarantee who will turn out. This throws off the fundamentals model, which assumes a composition of the electorate similar to that of past elections. In this election, for example, Jones earned close to the same number of votes that Hillary Clinton earned in 2016, but Moore earned less than half as many votes as Donald Trump did in the same year. So while the fundamentals model assumed close to 70% baseline support for Republicans, only about half of the Republicans who voted in the last election actually showed up to vote for Moore, so the real “fundamentals” in this election should have been closer to a 50-50 split. The problem is that we had no way of knowing this before election day. This is probably why our model with an uninformed prior performed the best: prior to election day, we had little reliable information about how people would vote, aside from polling data. This isn’t to say that an uninformed model is always better, but in the case of a competitive special election, we might be better off assuming we don’t know what’s going to happen in the first place. This fundamentals model would likely perform better in a normal Senate election than in a special election, and it is useful because it can model the fundamental level of Republican support in any state.

Next Steps

The way we built our logistic regression model, we weren’t able to include parameters to account for interesting variables like the results of the generic congressional ballot, or presidential approval ratings at the time of an election, because these variables did not change over time. If we built a model that incorporated data from other previous Senate elections, we would have been able to include these variables that are otherwise static for a single election.

We also could revisit some of our simplifying assumptions: allocating undecided voters evenly, ignoring write-in candidates, and assuming that the special-election electorate would resemble past electorates.

Code Appendix

Set up all packages that will be used.

library(readr)
library(dplyr)
library(tidyr)
library(ggplot2)
library(rjags)
library(MCMCpack)
library(shiny)
library(rsconnect)

For Dirichlet-Multinomial

Reading in data for Alabama

Alabama_2017_Senate_Polls <- read.csv("Alabama 2017 Senate Polls.csv", stringsAsFactors = FALSE)

Add New Polls Into Alabama Data (came out after we gathered initial data)

#Add new polls to Alabama

Alabama_2017_Senate_Polls[10,] <-  c("Raycom News Network", 51, 40, 9, 2, 2200, "11/6")

Alabama_2017_Senate_Polls[11,] <- c("Decision Desk HQ", 46, 46, 8, 4.3, 515, "11/9")

Alabama_2017_Senate_Polls[12,] <- c("JMC Analytics", 42, 46, 11, 4.1, 575, "11/11")

Alabama_2017_Senate_Polls[13,] <- c("Emerson College", 55, 45, 0, 3.9, 600, "11/11")

Alabama_2017_Senate_Polls[14,] <- c("FOX 10 News/Strategy", 49, 43, 8, 2, 3000, "11/13")

Alabama_2017_Senate_Polls[15,] <- c("National Republican Senatorial Committee", 39, 51, 10, "N/A", 500, "11/13")

Alabama_2017_Senate_Polls[16,] <- c("Fox News", 42, 50, 8, 3.5, 649, "11/15")

Alabama_2017_Senate_Polls[17,] <- c("Gravis Marketing", 42, 47, 11, 3.5, 628, "11/15")

Alabama_2017_Senate_Polls[18,] <- c("Change Research", 43, 46, 11, "N/A", 2090, "11/16")

Alabama_2017_Senate_Polls[19,] <- c("Raycom News Network/Strategy", 47, 45, 8, 2, 3000, "11/20")

Alabama_2017_Senate_Polls[20,] <- c("Sky Research", 47, 40, 14, 3.1, 1059, "11/21")

Alabama_2017_Senate_Polls[21,] <- c("Change Research", 49, 47, 7, 2.3, 1868, "11/27")

Alabama_2017_Senate_Polls[22,] <- c("JMC Analytics", 48, 43, 9, 3.8, 650, "11/28")

Alabama_2017_Senate_Polls[23,] <- c("Washington Post/Schar School", 47, 50, 3, 4.5, 739, "11/30")

Alabama_2017_Senate_Polls[24,] <- c("CBS News/YouGov", 49, 43, 8, 4.8, 766, "12/1")

Alabama_2017_Senate_Polls[25,] <- c("Emerson", 49, 46, 5, 4.3, 500, "12/2")

Alabama_2017_Senate_Polls[26,] <- c("Big League/Gravis", 44, 48, 8, 2.7, 1276, "12/3")

Alabama_2017_Senate_Polls[27,] <- c("WBRC-TV/Strategy Research", 50, 43, 7, 2, 3200, "12/4")

Alabama_2017_Senate_Polls[28,] <- c("Trafalgar Group", 51, 46, 3, 2.6, 1419, "12/7")

Alabama_2017_Senate_Polls[29,] <- c("Gravis", 49, 45, 6, 2.8, 1254, "12/8")

Alabama_2017_Senate_Polls[30,] <- c("Emerson", 53, 44, 3, 3.9, 600, "12/9")

Alabama_2017_Senate_Polls[31,] <- c("Monmouth", 46, 46, 8, 4.2, 546, "12/9")

Alabama_2017_Senate_Polls[32,] <- c("Fox News", 40, 50, 10, 3, 1127, "12/10")

Cleaning the Alabama polls data

#Clean Alabama data


colnames(Alabama_2017_Senate_Polls)[2] <- "Moore"
colnames(Alabama_2017_Senate_Polls)[3] <- "Jones"
colnames(Alabama_2017_Senate_Polls)[7] <- "EndDate"


  
  
Alabama_2017_Senate_Polls$RMargin <- as.numeric(Alabama_2017_Senate_Polls$Moore) - as.numeric(Alabama_2017_Senate_Polls$Jones)

Alabama_2017_Senate_Polls$Moore <- as.numeric(Alabama_2017_Senate_Polls$Moore)
Alabama_2017_Senate_Polls$Jones <- as.numeric(Alabama_2017_Senate_Polls$Jones)
Alabama_2017_Senate_Polls$Undecided.Other <- as.numeric(Alabama_2017_Senate_Polls$Undecided.Other)

#Order the data by end date (ascending)

Alabama_2017_Senate_Polls <- Alabama_2017_Senate_Polls[order(as.Date(Alabama_2017_Senate_Polls$EndDate, "%m/%d")), ]

#Calculate days to election

Alabama_2017_Senate_Polls$DaysToElection <- as.numeric(as.Date("12/12", "%m/%d") - as.Date(Alabama_2017_Senate_Polls$EndDate, "%m/%d"))

Code to produce static visualization of Dirichlet distribution

#function plots used to create Dirichlet distribution (not Shinyapp) taken from: https://www.math.wustl.edu/~victor/classes/ma322/r-eg-35.txt


########  MULTIPLE DIRICHLET PLOTS as a function of shape parameters

dirichlet3persp <- function(alpha1=2, alpha2=2, alpha3=2) {
  alpha<-c(alpha1,alpha2,alpha3)
  x1 <- seq(0,1, by=0.01)
  x2 <- seq(0,1, by=0.01)
  z <- matrix(0, nrow=length(x1), ncol=length(x2))
  # Fill z by looping over all valid (x1,x2) pairs, putting in
  # x3=1-x1-x2 as the third variable in ddirichlet():
  for(i in 1:length(x1)) {
     for(j in 1:length(x2) ) {
       if( x1[i]+x2[j] < 1) {
     x <- c(x1[i], x2[j], 1-x1[i]-x2[j])  # so x1+x2+x3=1     
     z[i,j] <- ddirichlet(x,alpha);
       } else { 
     z[i,j] <- 0
       }
     }
  }
  persp(x1,x2,z)
}

dirichlet3contour <- function(alpha1=2, alpha2=2, alpha3=2) {
  alpha<-c(alpha1,alpha2,alpha3)
  x1 <- seq(0,1, by=0.01)
  x2 <- seq(0,1, by=0.01)
  z <- matrix(0, nrow=length(x1), ncol=length(x2))
  # Fill z by looping over all valid (x1,x2) pairs, putting in
  # x3=1-x1-x2 as the third variable in ddirichlet():
  for(i in 1:length(x1)) {
     for(j in 1:length(x2) ) {
       if( x1[i]+x2[j] < 1) {
     x <- c(x1[i], x2[j], 1-x1[i]-x2[j])  # so x1+x2+x3=1     
     z[i,j] <- ddirichlet(x,alpha);
       } else { 
     z[i,j] <- 0
       }
     }
  }
  contour(x1,x2,z)
}


### 
# Examples of use:
dirichlet3persp(2,2,1)

#dirichlet3contour(12,5,32)

Code for the unweighted Dirichlet-Multinomial Alabama polls function and Shinyapp

dirMultAL <- function(a1,a2,a3,ymax=1){

priorAlpha <- c(a1, a2, a3)

posterior <- matrix(0, nrow=length(Alabama_2017_Senate_Polls$Poll), ncol=4)

for(i in 1:length(Alabama_2017_Senate_Polls$Poll)){
  if(Alabama_2017_Senate_Polls$RMargin[i] > 0){
    alphaR <- priorAlpha[1] + 1
    alphaSum <- alphaR + priorAlpha[2] + priorAlpha[3]
    postMeanR <- (alphaR)/(alphaSum)
    posterior[i,1] <- postMeanR
    posterior[i,2] <- (priorAlpha[2]/alphaSum)
    posterior[i,3] <- 1 - (posterior[i,1] + posterior[i,2])
    #Update the prior for sequential analysis
    priorAlpha[1] <- alphaR
  }
  
  if(Alabama_2017_Senate_Polls$RMargin[i] < 0){
    alphaD <- priorAlpha[2] + 1
    alphaSum <- priorAlpha[1] + alphaD + priorAlpha[3]
    postMeanD <- (alphaD)/(alphaSum)
    posterior[i,1] <- (priorAlpha[1]/alphaSum)
    posterior[i,2] <- postMeanD
    posterior[i,3] <- 1 - (posterior[i,1] + posterior[i,2])
    #Update the prior for sequential analysis
    priorAlpha[2] <- alphaD
  }
  
  if(Alabama_2017_Senate_Polls$RMargin[i] == 0){
    #Tied poll: carry the previous posterior forward (assumes poll 1 is not tied)
  
    posterior[i,1] <- posterior[i-1,1]
    posterior[i,2] <- posterior[i-1,2]
    posterior[i,3] <- posterior[i-1,3]
    
  }
  
  
  
}


#Clean data for visualization

posterior <- as.data.frame(posterior)

posterior$pWinMoore <- as.numeric(posterior$V1)
posterior$pWinJones <- as.numeric(posterior$V2)
posterior$pWinOther <- as.numeric(posterior$V3)
posterior$Date <- as.Date(Alabama_2017_Senate_Polls$EndDate, "%m/%d")

posterior <- posterior[,-c(1,2,3,4)]

#Visualize

ggplot(posterior, aes(x=Date, y=pWinMoore))+
  geom_smooth(color="Red")+
  geom_smooth(aes(y=pWinJones), color="Blue")+
  geom_smooth(aes(y=pWinOther), color="Black")+
  coord_cartesian(ylim=c(0, ymax))+ #apply the user-specified y-axis limit
  labs(title="2017 Alabama Senate Special Election - MLE Dirichlet Polls", y="Probability of Victory")


}

#dirMultAL(1,1,1)

#Stiffer model

#dirMultAL(.45*10, .45*10, .1*10)
#build the server
server1 <- function(input, output) {
    output$winProbability <- renderPlot({

      dirMultAL(input$alpha1, input$alpha2, input$alpha3, input$ymax)

    })
    
}

#build the user interface
ui1 <- fluidPage(
    sidebarLayout(
        sidebarPanel(
            h4("Tune the Dirichlet shape priors:"),
            sliderInput("alpha1", "Republican shape parameter - alpha 1", min = 0, max = 5, value = 1, step=.01),
            sliderInput("alpha2", "Democratic shape parameter - alpha 2", min = 0, max = 5, value = 1, step=.01),
            sliderInput("alpha3", "Other shape parameter - alpha 3", min = 0, max = 5, value = 1, step=.01),
            h4("Specify the limit of the y-axis:"),
            sliderInput("ymax", "", min = 0, max = 1, value = 1)
        ),
        mainPanel(
            h4("Probability of Victory for each candidate:"),
            plotOutput("winProbability")
        )
    )
)


shinyApp(ui = ui1, server = server1)

Code for weighted Dirichlet-Multinomial Alabama polls and Shinyapp

wDirMultAL <- function(a1,a2,a3,ymax=1){

priorAlpha <- c(a1, a2, a3)

posterior <- matrix(0, nrow=length(Alabama_2017_Senate_Polls$Poll), ncol=4)

for(i in 1:length(Alabama_2017_Senate_Polls$Poll)){
  if(Alabama_2017_Senate_Polls$RMargin[i] > 0){
    alphaR <- priorAlpha[1] + (1*(1/Alabama_2017_Senate_Polls$DaysToElection[i]))
    alphaSum <- alphaR + priorAlpha[2] + priorAlpha[3]
    postMeanR <- (alphaR)/(alphaSum)
    posterior[i,1] <- postMeanR
    posterior[i,2] <- (priorAlpha[2]/alphaSum)
    posterior[i,3] <- 1 - (posterior[i,1] + posterior[i,2])
    #Update the prior for sequential analysis
    priorAlpha[1] <- alphaR
  }

  if(Alabama_2017_Senate_Polls$RMargin[i] < 0){
    alphaD <- priorAlpha[2] + (1*(1/Alabama_2017_Senate_Polls$DaysToElection[i]))
    alphaSum <- priorAlpha[1] + alphaD + priorAlpha[3]
    postMeanD <- (alphaD)/(alphaSum)
    posterior[i,1] <- (priorAlpha[1]/alphaSum)
    posterior[i,2] <- postMeanD
    posterior[i,3] <- 1 - (posterior[i,1] + posterior[i,2])
    #Update the prior for sequential analysis
    priorAlpha[2] <- alphaD
  }

  if(Alabama_2017_Senate_Polls$RMargin[i] == 0){
    #Tied poll: carry the previous posterior forward (assumes poll 1 is not tied)

    posterior[i,1] <- posterior[i-1,1]
    posterior[i,2] <- posterior[i-1,2]
    posterior[i,3] <- posterior[i-1,3]

  }



}


#Clean data for visualization

posterior <- as.data.frame(posterior)

posterior$pWinMoore <- as.numeric(posterior$V1)
posterior$pWinJones <- as.numeric(posterior$V2)
posterior$pWinOther <- as.numeric(posterior$V3)
posterior$Date <- as.Date(Alabama_2017_Senate_Polls$EndDate, "%m/%d")

posterior <- posterior[,-c(1,2,3,4)]


#Visualize

ggplot(posterior, aes(x=Date, y=pWinMoore))+
  geom_smooth(color="Red")+
  geom_smooth(aes(y=pWinJones), color="Blue")+
  geom_smooth(aes(y=pWinOther), color="Black")+
  coord_cartesian(ylim=c(0, ymax))+ #apply the user-specified y-axis limit
  labs(title="2017 Alabama Senate Special Election - Weighted MLE Dirichlet Polls", y="Probability of Victory")


}


#wDirMultAL(1,1,1)
#build the server
server2 <- function(input, output) {
    output$winProbability <- renderPlot({

      wDirMultAL(input$alpha1, input$alpha2, input$alpha3, input$ymax)

    })
}

#build the user interface
ui2 <- fluidPage(
    sidebarLayout(
        sidebarPanel(
            h4("Tune the Dirichlet shape priors:"),
            sliderInput("alpha1", "Republican shape parameter - alpha 1", min = 0, max = 5, value = 1, step=.01),
            sliderInput("alpha2", "Democratic shape parameter - alpha 2", min = 0, max = 5, value = 1, step=.01),
            sliderInput("alpha3", "Other shape parameter - alpha 3", min = 0, max = 5, value = 1, step=.01),
            h4("Specify the limit of the y-axis:"),
            sliderInput("ymax", "", min = 0, max = 1, value = 1)
        ),
        mainPanel(
            h4("Probability of Victory for each candidate:"),
            plotOutput("winProbability")
        )
    )
)


shinyApp(ui = ui2, server = server2)

Code for Dirichlet-Multinomial vote share Alabama Polls model and Shinyapp

library(ggplot2)

dirMultALMargin <- function(a1,a2,a3,ymax=1){

priorAlpha <- c(a1, a2, a3)

posterior <- matrix(0, nrow=length(Alabama_2017_Senate_Polls$Poll), ncol=4)

for(i in 1:length(Alabama_2017_Senate_Polls$Poll)){
    alphaR <- priorAlpha[1] + (Alabama_2017_Senate_Polls$Moore[i]/100)
    alphaD <- priorAlpha[2] + (Alabama_2017_Senate_Polls$Jones[i]/100)
    alphaO <- priorAlpha[3] + (Alabama_2017_Senate_Polls$Undecided.Other[i]/100)
    alphaSum <- alphaR + alphaD + alphaO
    postMeanR <- (alphaR)/(alphaSum)
    postMeanD <- (alphaD)/(alphaSum)
    postMeanO <- (alphaO)/(alphaSum)
    posterior[i,1] <- postMeanR
    posterior[i,2] <- postMeanD
    posterior[i,3] <- postMeanO
    #Update the prior for sequential analysis
    priorAlpha[1] <- alphaR
    priorAlpha[2] <- alphaD
    priorAlpha[3] <- alphaO
  }
  
  
  
  



#Clean data for visualization

posterior <- as.data.frame(posterior)

posterior$pWinMoore <- as.numeric(posterior$V1)
posterior$pWinJones <- as.numeric(posterior$V2)
posterior$pWinOther <- as.numeric(posterior$V3)
posterior$Date <- as.Date(Alabama_2017_Senate_Polls$EndDate, "%m/%d")

posterior <- posterior[,-c(1,2,3,4)]

#Visualize

ggplot(posterior, aes(x=Date, y=pWinMoore))+
  geom_smooth(color="Red")+
  geom_smooth(aes(y=pWinJones), color="Blue")+
  geom_smooth(aes(y=pWinOther), color="Black")+
  coord_cartesian(ylim=c(0, ymax))+ #apply the user-specified y-axis limit
  labs(title="2017 Alabama Senate Special Election - MLE Dirichlet Polls", y="Predicted Vote Share")


}

#dirMultALMargin(1,1,1)
#build the server
server3 <- function(input, output) {
    output$winProbability <- renderPlot({

      dirMultALMargin(input$alpha1, input$alpha2, input$alpha3, input$ymax)

    })
}

#build the user interface
ui3 <- fluidPage(
    sidebarLayout(
        sidebarPanel(
            h4("Tune the Dirichlet shape priors:"),
            sliderInput("alpha1", "Republican shape parameter - alpha 1", min = 0, max = 5, value = 1, step=.01),
            sliderInput("alpha2", "Democratic shape parameter - alpha 2", min = 0, max = 5, value = 1, step=.01),
            sliderInput("alpha3", "Other shape parameter - alpha 3", min = 0, max = 5, value = 1, step=.01),
            h4("Specify the limit of the y-axis:"),
            sliderInput("ymax", "", min = 0, max = 1, value = 1)
        ),
        mainPanel(
            h4("Probability of Victory for each candidate:"),
            plotOutput("winProbability")
        )
    )
)


shinyApp(ui = ui3, server = server3)

For MCMC models

Read in and wrangle polling data

#Save the cleaned Alabama_2017_Senate_Polls data as a new .csv for additional cleaning
#write.csv(Alabama_2017_Senate_Polls, "Alabama.csv")

# Read in polling data
Alabama <- read_csv("Alabama.csv")

# Wrangle polling data
ALPolls <- Alabama %>%
  mutate(MooreTotal = round((Moore + Undecided.Other/2)/100*Sample.Size)) %>%
  dplyr::select(Poll, MooreTotal, Sample.Size, DaysToElection, EndDate) %>%
  arrange(DaysToElection)

# Add in polls since 12/4/17
NewPolls <- data.frame(Poll = c("FOX News", "Emerson", "Monmouth", "Trafalgar", "Gravis"), MooreTotal = c(507,327,273,745,652), Sample.Size = c(1127,600,546,1419,1254), DaysToElection = c(2,3,3,5,4))
ALPolls <- NewPolls %>% bind_rows(ALPolls)

Function to analyze rjags simulation

analyze_results <- function(model_samples, weighted = FALSE, weight = NA) {
  ## Preliminary calculations
  # Weighted Analysis
  if (weighted) {
    # Calculate weighted mean of theta for each simulation across all polls
    model_samples1 <- rep(0,30000)
    for(i in 1:length(model_samples1)) {
      model_samples1[i] <- weighted.mean(model_samples[i,], w=weight)
    }
    
    # Calculate standard deviation for each poll across all simulations
    model_sample_sds <- rep(0, ncol(model_samples))
    for(i in 1:length(model_sample_sds)) {
      model_sample_sds[i] <- sd(model_samples[,i])
    }
    # Calculate weighted mean of all standard deviations
    model_sample_sd <- weighted.mean(model_sample_sds, w=weight)
    
    
  # Unweighted Analysis  
  } else {
    # Calculate mean theta for each simulation across all polls
    model_samples1 <- model_samples %>% rowMeans()
  
    # Calculate standard deviation for each poll across all simulations
    model_sample_sds <- rep(0, ncol(model_samples))
    for(i in 1:length(model_sample_sds)) {
      model_sample_sds[i] <- sd(model_samples[,i])
    }
  
    # Calculate mean of all standard deviations
    model_sample_sd <- mean(model_sample_sds)
  
  }
  ## Simulate Election results
  set.seed(2017)
  model_predictions <- rep(0,30000)
  for(i in 1:length(model_predictions)) {
    model_predictions[i] <- rnorm(1, model_samples1[i], model_sample_sd)
  }
  
  
  # Summary statistics and hypothesis testing
  ## Summary statistics including mean vote share for Roy Moore
  sumstat <- summary(model_predictions)
  
  ## 95% Credible Interval for Roy Moore's vote share
  CI <- quantile(model_predictions, c(0.025, 0.975))
  
  ## Roy Moore's probability of victory
  prob <- mean(model_predictions>=.5)
  
  # Visualizations
  model_predictions1 <- data.frame(Moore=model_predictions, Jones=1-model_predictions) %>%
  gather("Candidate", "VoteShare", Moore, Jones)
  
  #The actual two party vote share
  JonesShare <- 671151/(671151 + 650436)
  MooreShare <- 1-JonesShare

  vis <- ggplot(model_predictions1, aes(x = VoteShare, fill = Candidate)) + 
    geom_density() +
    geom_vline(xintercept = JonesShare, color="blue") +
    geom_vline(xintercept = MooreShare, color="red") +
    facet_wrap(~Candidate, nrow=2) + 
    scale_fill_manual(values = c("blue", "red"))+
    xlab("Predicted Vote Share") +
    ylab("Density")

  
  return(list(SummaryStatistics = sumstat, CredibleInterval = CI, ProbabilityOfVictory = prob, Visualization = vis))
}

Define poll weights

# Use exponential decay to weight by days to election
dayWeights <- exp(-ALPolls$DaysToElection/10)

# Weight by normalized logarithm of sample size
sampleSizeWeights <- log(ALPolls$Sample.Size)/mean(log(ALPolls$Sample.Size))

# Weight by FiveThirtyEight Pollster ratings
# A+ = 1
# A = .97
# A- = .93
# B+ = .90
# B = .87
# B- = .83
# C+ = .80
# C = .77
# C- = .73
# D+ = .70
# D = .67
# D- = .63
# F = .6
# Unknown = .5


pollRatingWeights <- c(.87, .87, 1, .77, .83, .5, .83, .87, .87, 1, .77, .5, 0.5, .5, .5, .83, .87, 0.5, 0.5, .73, .87, 0.5, .5, 0.5, 0.5, 0.5, 0.87, 0.5, 0.77, 0.5, 0.87, 0.87)

# Combine weights
weights <- dayWeights*sampleSizeWeights*pollRatingWeights

Vague fundamentals rjags

vague_beta_bin_model <- "model{
  for(i in 1:length(Y)) {
    #Data
    Y[i] ~ dbin(theta[i], N[i])

    #Prior
    theta[i] ~ dbeta(1,1)
  }
}"

vague_beta_bin_jags <- jags.model(textConnection(vague_beta_bin_model),
                         data=list(Y=ALPolls$MooreTotal, N=ALPolls$Sample.Size),
                         inits=list(.RNG.name="base::Wichmann-Hill", .RNG.seed=2017))

vague_beta_bin_sim <- coda.samples(vague_beta_bin_jags, variable.names=c("theta"), n.iter=30000)

vague_beta_bin_samples <- data.frame(vague_beta_bin_sim[[1]])

Unweighted vague beta-binomial results output

analyze_results(vague_beta_bin_samples)

Weighted vague beta-binomial results output

analyze_results(vague_beta_bin_samples, weighted = TRUE, weight = weights)

Data cleaning for logistic regression for more informed prior (this data is what informs our prior more rigorously)

Senate2016 <- read_csv("Senate2016Data.csv")
Senate2016_1 <- Senate2016 %>%
  mutate(Total2016 = Republican2016 + Democrat2016, DemShare2010 = Democrat2010/(Democrat2010+Republican2010)*100, DemMargin2010 = DemShare2010 - DemNat2010, DemMargin2012 = DemPres2012 - DemPresPop2012, RepShare2010 = Republican2010/(Democrat2010+Republican2010)*100, RepMargin2010 = RepShare2010 - RepNat2010, RepMargin2012 = RepPres2012 - RepPresPop2012)

Model specification rjags for logistic regression

fundamentals_model <- "model{
  for(i in 1:length(Y)) {
    #Data
    Y[i] ~ dbin(theta[i], N[i])

    #Link: model the log-odds of Republican support as a linear function of the margins
    logit(theta[i]) <- beta0 + beta1*X1[i] + beta2*X2[i]
  }

  #Hyper Priors
  beta0 ~ dnorm(0, 1/100^2)
  beta1 ~ dnorm(0, 1/100^2)
  beta2 ~ dnorm(0, 1/100^2)
}"

fundamentals_jags <- jags.model(textConnection(fundamentals_model),
                         data=list(Y=Senate2016_1$Republican2016, N=Senate2016_1$Total2016, X1=Senate2016_1$RepMargin2010, X2=Senate2016_1$RepMargin2012),
                         inits=list(.RNG.name="base::Wichmann-Hill", .RNG.seed=2017))

fundamentals_sim <- coda.samples(fundamentals_jags, variable.names=c("beta0", "beta1", "beta2", "theta"), n.iter=10000)

fundamentals_samples <- data.frame(fundamentals_sim[[1]])

Calculating Alabama fundamentals with the approximated coefficients

#Calculate Alabama Senate Margin over National Vote for Republicans
ALSenRep2016 <- 63.9
NatSenRep2016 <- 42.4
RepSenMargin2016 <- ALSenRep2016 - NatSenRep2016

#Calculate Alabama Presidential Margin over National Vote for Donald Trump
ALPresRep2016 <- 62.1
NatPresRep2016 <- 46.1
RepPresMargin2016 <- ALPresRep2016 - NatPresRep2016

#Calculate Fundamentals for Alabama
logtheta <- fundamentals_samples$beta0 + RepSenMargin2016*fundamentals_samples$beta1 + RepPresMargin2016*fundamentals_samples$beta2
theta <- 1/(1+exp(-logtheta))

#Summary Statistics
mean(theta)
sd(theta)
quantile(theta, c(0.025, 0.975))
mean(theta>.5)

Informed beta-binomial model using summary stats from simulated coefficients in logistic regression

ALwithFundamentals_model <- "model{
  for(i in 1:length(Y)) {
    #Data
    Y[i] ~ dbin(theta[i], N[i])

    #Prior
    theta[i] ~ dbeta(100*alpha, 100 - 100*alpha)
  }
  
  #Hyperprior
  alpha ~ dnorm(0.689794, 1/0.0001129983^2)
}"

ALwithFundamentals_jags <- jags.model(textConnection(ALwithFundamentals_model),
                         data=list(Y=ALPolls$MooreTotal, N=ALPolls$Sample.Size),
                         inits=list(.RNG.name="base::Wichmann-Hill", .RNG.seed=2017))

ALwithFundamentals_sim <- coda.samples(ALwithFundamentals_jags, variable.names=c("alpha", "theta"), n.iter=30000)

ALwithFundamentals_samples <- data.frame(ALwithFundamentals_sim[[1]])

Cleaning rjags samples for analysis

ALwithFundamentals_samples1 <- ALwithFundamentals_samples %>% dplyr::select(-alpha)

Analyze results from the new model with informed hyperpriors

analyze_results(ALwithFundamentals_samples1, weighted = TRUE, weight = weights)

References

2010 Senate Race - Election Results by State | NBC News. (2010, November 2). Retrieved December 13, 2017, from http://elections.nbcnews.com/ns/politics/2010/all/senate/

2012 Presidential Race - Election Results by State | NBC News. (2012, November 6). Retrieved December 13, 2017, from http://elections.nbcnews.com/ns/politics/2012/all/president/#.WjHHhUqnHIU

2016 Election Results: Senate Live Map by State, Real-Time Voting Updates. (2016, November 8). Retrieved December 13, 2017, from https://www.politico.com/mapdata-2016/2016-election/results/map/senate/

Alabama 2017 Special Election Polls. (2017, December 12). Retrieved December 13, 2017, from https://www.realclearpolitics.com/epolls/2017/senate/al/alabama_senate_special_election_moore_vs_jones-6271.html

Generic Congressional Vote 2016. (n.d.). Retrieved December 13, 2017, from https://www.realclearpolitics.com/epolls/other/2016_generic_congressional_vote-5279.html

Obama Presidential Approval Rating. (n.d.). Retrieved December 13, 2017, from https://www.realclearpolitics.com/epolls/other/president_obama_job_approval-1044.html#polls

United States Senate special election in Alabama, 2017. (n.d.). Retrieved December 13, 2017, from https://ballotpedia.org/United_States_Senate_special_election_in_Alabama,_2017

\(\textbf{Packages Used}\)

  1. readr
  2. dplyr
  3. tidyr
  4. ggplot2
  5. rjags
  6. MCMCpack
  7. shiny
  8. rsconnect