---
title: "Location Based Services Exercise 23/24"
author: "Erik Neller"
date: "`r Sys.Date()`"
output: pdf_document
---

# Moran's I

Moran's I is a measure of spatial autocorrelation, i.e. of clustering in spatial data, defined as

$$
I = \frac{N}{W} \frac{\sum_{i=1}^N \sum_{j=1}^N w_{ij}(x_i-\overline{x})(x_j - \overline{x})}
{\sum_{i=1}^N (x_i - \overline{x})^2}
$$

where

- $N$ is the number of spatial units indexed by $i$ and $j$
- $x$ is the variable of interest
- $\overline{x}$ is the mean of $x$
- $w_{ij}$ are the elements of a matrix of spatial weights that denote adjacency
- $W = \sum_{i=1}^N \sum_{j=1}^N w_{ij}$ is the sum of all $w_{ij}$

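
As a minimal sketch of the definition above, the formula can be evaluated by hand in base R. The four units on a line, their values, and the binary adjacency matrix `toy_w` are made-up illustration data, not part of the exercise data set.

```{r}
# Toy data (assumed for illustration): four units on a line, 1 - 2 - 3 - 4
toy_x <- c(1, 2, 3, 4)

# Binary adjacency weights: w_ij = 1 if units i and j are direct neighbors
toy_w <- matrix(0, nrow = 4, ncol = 4)
toy_w[cbind(1:3, 2:4)] <- 1
toy_w[cbind(2:4, 1:3)] <- 1

toy_n <- length(toy_x)
toy_dev <- toy_x - mean(toy_x) # deviations x_i - mean(x)
toy_wsum <- sum(toy_w)         # W, the sum of all weights

# Numerator: sum_i sum_j w_ij * dev_i * dev_j; denominator: sum_i dev_i^2
toy_I <- (toy_n / toy_wsum) * sum(toy_w * outer(toy_dev, toy_dev)) / sum(toy_dev^2)
print(toy_I)
```

Because neighboring units carry similar values in this toy example, the result is positive ($I = 1/3$).
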
Moran's I can be considered agnostic to time-series stationarity, as the calculation makes no assumptions about the temporal behavior of the underlying data.
The deviations from the global mean $\overline{x}$ are calculated for a single snapshot in time and weighted by $w_{ij}$,
resulting in a value that typically lies in the range $[-1, 1]$.

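
To illustrate the lower end of that range, consider a hypothetical ring of $N = 4$ units with binary weights, where each unit is adjacent to its two neighbors, and alternating values $x = (1, 0, 1, 0)$. This configuration is an assumption for illustration, not the exercise data. Every pair of neighbors then deviates from the mean in opposite directions:

$$
\begin{aligned}
\overline{x} &= \tfrac{1}{2}, \qquad W = 8, \qquad \sum_{i=1}^N (x_i - \overline{x})^2 = 4 \cdot \tfrac{1}{4} = 1 \\
\sum_{i=1}^N \sum_{j=1}^N w_{ij}(x_i-\overline{x})(x_j - \overline{x}) &= 8 \cdot \left(-\tfrac{1}{4}\right) = -2 \\
I &= \frac{4}{8} \cdot \frac{-2}{1} = -1
\end{aligned}
$$
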
## Sources

- [https://doi.org/10.2307/2332142](http://www.stat.ucla.edu/~nchristo/statistics_c173_c273/moran_paper.pdf)
- https://en.wikipedia.org/wiki/Moran%27s_I

## Calculation

```{r}
library(spdep) # spatial weights and Moran's I
library(sf)    # st_as_sf()
library(dplyr)

# ISO 3166-1 alpha-2 codes of the European countries considered
european_iso2 <- c(
  "AL", "AD", "AT", "BY", "BE", "BA", "BG", "HR", "CY", "CZ", "DK", "EE", "FI", "FR",
  "DE", "GR", "HU", "IS", "IE", "IT", "XK", "LV", "LI", "LT", "LU", "MT", "MD", "MC",
  "ME", "NL", "MK", "NO", "PL", "PT", "RO", "RU", "SM", "RS", "SK", "SI", "ES", "SE",
  "CH", "UA", "GB", "VA")

cities <- read.csv("worldcities.csv")
capitals <- cities %>%
  subset(capital == "primary") %>%
  subset(iso2 %in% european_iso2)

gdp <- read.csv("flat-ui__data-Mon Jan 12 2026.csv")
result <- merge(capitals, gdp, by.x = "iso3", by.y = "Country.Code", all.x = TRUE)

# Keep the most recent year per country (one row per iso3), then make the
# variable of interest numeric and drop rows without a value. This happens
# before the weights are built so that weights and data have the same length.
result <- result %>%
  group_by(iso3) %>%
  slice(which.max(Year)) %>%
  ungroup() %>%
  mutate(gdp = as.numeric(Value)) %>%
  filter(!is.na(gdp))

# Convert the result data frame to an sf object (WGS 84 longitude/latitude)
result_sf <- st_as_sf(result, coords = c("lng", "lat"), crs = 4326)

# Create a spatial weights matrix using k-nearest neighbors;
# knearneigh() expects a numeric matrix, longlat = TRUE uses great-circle distances
k <- 5 # number of nearest neighbors
coordinates <- as.matrix(result[, c("lng", "lat")])
knn_nb <- knn2nb(knearneigh(coordinates, k = k, longlat = TRUE))
weights <- nb2listw(knn_nb, style = "W") # row-standardised weights

# Calculate Moran's I
n <- nrow(result_sf)
s0 <- Szero(weights)
moran_result <- moran(result_sf$gdp, listw = weights, n = n, S0 = s0)
print(moran_result)

# Perform a Monte Carlo permutation test
set.seed(123) # for reproducibility
moran_mc_result <- moran.mc(result_sf$gdp, listw = weights, nsim = 999)
print(moran_mc_result)

# Export the data used in the analysis
towrite <- result[, c("lat", "lng", "Value", "city", "iso3")]
write.csv(towrite, file = "gdp.csv")
```
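
The idea behind the Monte Carlo step can be sketched in base R: the observed values are shuffled across locations many times, Moran's I is recomputed for each shuffle, and the observed statistic is ranked among the simulated ones. The helper `toy_moran()`, the ring-shaped weights, and the simulated values are assumptions for illustration; this is not the internal implementation of `moran.mc()`.

```{r}
# Hypothetical helper: Moran's I for values x and a weights matrix w
toy_moran <- function(x, w) {
  dev <- x - mean(x)
  (length(x) / sum(w)) * sum(w * outer(dev, dev)) / sum(dev^2)
}

set.seed(1) # for reproducibility
ring_n <- 20

# Assumed toy weights: units on a ring, each adjacent to its two neighbors
ring_w <- matrix(0, ring_n, ring_n)
ring_idx <- 1:ring_n
ring_next <- c(2:ring_n, 1)
ring_w[cbind(ring_idx, ring_next)] <- 1
ring_w[cbind(ring_next, ring_idx)] <- 1

# Simulated values with a smooth trend along the ring (clustered by construction)
ring_x <- sin(2 * pi * ring_idx / ring_n) + rnorm(ring_n, sd = 0.1)

# Observed statistic and its distribution under random permutations
obs_I <- toy_moran(ring_x, ring_w)
sim_I <- replicate(999, toy_moran(sample(ring_x), ring_w))

# Pseudo p-value: share of statistics (including the observed one)
# that are at least as large as the observed one
p_value <- (sum(sim_I >= obs_I) + 1) / (length(sim_I) + 1)
print(c(I = obs_I, p = p_value))
```

A small pseudo p-value means that few random rearrangements of the values produce a statistic as large as the observed one.
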

# Interpretation

A Moran's I close to 0 indicates low spatial autocorrelation, meaning little clustering in the underlying data. The GDP values therefore do not appear to be spatially clustered.
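
As a reference point for "close to 0", a standard result is that under the null hypothesis of no spatial autocorrelation the expected value of Moran's I is slightly negative rather than exactly zero:

$$
E[I] = \frac{-1}{N - 1}
$$

For an illustrative $N = 40$ (an assumed number, not the count of countries in the data above), this gives $-1/39 \approx -0.026$.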