#### Research Question

In this assignment, we return to the regression model from Assignment 2. There, we regressed our dependent variable, “adopt05”, or the percent of broadband adoption in 2004 among our sample, on various independent variables and looked at how much of its variation each one explained. With these models, we were able to find which independent variables likely explain more of the variability in broadband adoption among the wider population. We also examined the residuals of these models, or the portion of the variance that was not explained. What we were not able to do is determine whether any spatial dependence contributes to that unexplained residual variance.

When measuring spatial autocorrelation, we are looking at whether the value of a measure at one location is more similar to values of that same measure nearby than to values farther away. In other words, are similar values clustered together, or are they distributed randomly? One way to determine whether a measure is clustered or random is by using Moran’s I. To test the significance of that clustering, we need another test that compares the observed value of Moran’s I either to the value we would expect if the values were random, or to values computed from simulations of random data, in order to see how likely it is that we would get the observed Moran’s I if the values were random.
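For reference, the usual textbook definition of the global Moran’s I can be written as follows. The symbols are not taken from the assignment itself: $n$ is assumed to be the number of areal units, $y_i$ the value of the measure in unit $i$, $\bar{y}$ its mean, and $w_{ij}$ the spatial weight linking units $i$ and $j$.

$$
I = \frac{n}{\sum_{i}\sum_{j} w_{ij}} \cdot \frac{\sum_{i}\sum_{j} w_{ij}\,(y_i - \bar{y})(y_j - \bar{y})}{\sum_{i} (y_i - \bar{y})^2}
$$

Positive values indicate that similar values tend to sit next to each other, while negative values indicate that neighbors tend to be dissimilar. When the weights are row-standardized and every unit has at least one neighbor, $\sum_{i}\sum_{j} w_{ij} = n$, so the leading factor equals 1.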

#### Methods

In order to test spatial autocorrelation, we first need to create a network that defines who the spatial neighbors are for this comparison. In this analysis, we define neighbors using the rook’s case: counties that share a boundary, but not those that touch only at a corner.

After we define who the neighbors are, we can create a spatial weights matrix, which determines the weight of influence each neighbor has based on the total number of neighbors. The matrix is created by assigning a 1 to each neighbor and a 0 to each non-neighbor, and then dividing each 1 by that county’s number of neighbors. The lagged means can then be calculated by multiplying the observed value of each neighbor by that neighbor’s weight and summing the results. Each county then has a lagged mean of its neighbors, which can be compared to the value of the county itself to see what the relationship is.

In this way, we will first test for spatial dependency in the dependent variable, “adopt05”. Then we will use the spatial lag and spatial error models to test the regression of “adopt05” on “medage”.

While Moran’s I can indicate whether values are clustered, it does not by itself tell us whether that clustering is significant. The Monte Carlo test compares the observed Moran’s I to values computed under complete spatial randomness, 10,000 times, in order to estimate the probability of seeing a Moran’s I as large as ours if the values were random. If the Monte Carlo test is statistically significant, we can conclude that it is very unlikely that the spatial distribution is random. Rather, we could say with fairly high certainty that the values of our dependent variable are affected by those of their neighbors. The strength of that effect is represented by the rho value in the output of our spatial autoregression model. We can also use the Local Moran’s I to understand the clustering of particular high or low values.
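The weighting, lagging, and permutation steps described above can be sketched in a few lines. This is a minimal, standard-library Python illustration rather than the R workflow used for the analysis: the 4×4 grid, the values in `y`, and every function name are hypothetical, and it runs 999 permutations instead of 10,000 to keep the example fast.

```python
import random

def rook_neighbors(nrow, ncol):
    """Rook's case on a regular grid: cells sharing an edge, not a corner."""
    nbrs = []
    for r in range(nrow):
        for c in range(ncol):
            cell = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrow and 0 <= cc < ncol:
                    cell.append(rr * ncol + cc)
            nbrs.append(cell)
    return nbrs

def row_standardize(neighbors):
    """Give each of a unit's k neighbors the weight 1/k; non-neighbors get 0."""
    n = len(neighbors)
    W = [[0.0] * n for _ in range(n)]
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            W[i][j] = 1.0 / len(nbrs)
    return W

def lagged_means(W, y):
    """Weighted sum of neighbor values for each unit (its spatial lag)."""
    return [sum(w * v for w, v in zip(row, y)) for row in W]

def morans_i(W, y):
    """Global Moran's I for values y under weights W."""
    n = len(y)
    mean = sum(y) / n
    z = [v - mean for v in y]
    s0 = sum(sum(row) for row in W)
    num = sum(W[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(v * v for v in z)
    return (n / s0) * num / den

def monte_carlo_p(W, y, nsim=999, seed=0):
    """One-sided pseudo p-value: share of random shuffles with I >= observed."""
    rng = random.Random(seed)
    observed = morans_i(W, y)
    shuffled = list(y)
    at_least = 0
    for _ in range(nsim):
        rng.shuffle(shuffled)  # break any spatial pattern
        if morans_i(W, shuffled) >= observed:
            at_least += 1
    # Common convention: count the observed arrangement as one of the draws.
    return observed, (at_least + 1) / (nsim + 1)

# Hypothetical toy data: high values in the two left columns, low on the right.
neighbors = rook_neighbors(4, 4)
W = row_standardize(neighbors)
y = [10.0, 10.0, 1.0, 1.0] * 4
lag = lagged_means(W, y)
observed, p_value = monte_carlo_p(W, y)
print("lagged means:", lag)
print("Moran's I:", observed, "pseudo p-value:", p_value)
```

Because the toy values are deliberately split into a high half and a low half, the observed Moran’s I is strongly positive and few random shuffles match it, which is the pattern a significant Monte Carlo test reflects.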

#### Results

The map above shows how much influence the lagged means (the weights of the neighbors times their values for “adopt05”) exert on the dependent variable of each county. There seems to be a much stronger spatial influence in the central parts of the state, especially in the north central areas. From what we saw in the map of the dependent variable above, it seems that the areas where the percent of broadband adoption is greater also have a large influence on their neighbors. Considering the physical infrastructure necessary to expand broadband networks, it isn’t surprising that the places where it is found are clustered together. Whether that clustering is significant rather than random will be shown by the Monte Carlo simulation of Moran’s I.

Global Moran’s I for regression residuals

Model: lm(formula = reg.eq1, data = KYMap)
Weights: listw1

Moran I statistic standard deviate = 4.1204, p-value = 0.00001891
Alternative hypothesis: greater
Sample estimates:

| Observed Moran I | Expectation | Variance |
| --- | --- | --- |
| 0.226727978 | -0.010564525 | 0.003316559 |

The global Moran’s I measures global spatial association by analyzing the region as a whole. The test can tell us whether there is clustering by comparing the observed Moran’s I to its expected value. However, this alone cannot tell us where the clustering is happening, or how significant the individual clusters are.
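As a check on how the reported figures fit together, the standard deviate can be recomputed from the sample estimates. This assumes the usual form of the statistic, the observed value minus its expectation, divided by the square root of the variance:

$$
z = \frac{I - E[I]}{\sqrt{\operatorname{Var}(I)}} = \frac{0.226727978 - (-0.010564525)}{\sqrt{0.003316559}} \approx \frac{0.237292503}{0.0575896} \approx 4.12
$$

This agrees with the reported standard deviate of 4.1204. The p-value is then the upper-tail probability of a standard normal at that value, which matches the “greater” alternative hypothesis.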

Monte-Carlo simulation of Moran I