---
title: "Location Based Services Exercise 23/24"
author: "Erik Neller"
date: "`r Sys.Date()`"
output: pdf_document
---

# Moran's I

A measure of spatial autocorrelation (clustering) for spatial data, defined as

$$
I = \frac{N}{W} \frac{\sum_{i=1}^N \sum_{j=1}^N w_{ij}(x_i-\overline{x})(x_j - \overline{x})}{\sum_{i=1}^N (x_i - \overline{x})^2}
$$

where

- $N$ is the number of spatial units indexed by $i$ and $j$
- $x$ is the variable of interest
- $\overline{x}$ is the mean of $x$
- $w_{ij}$ are the elements of a matrix of spatial weights that encode adjacency
- $W = \sum_{i=1}^N \sum_{j=1}^N w_{ij}$ is the sum of all $w_{ij}$

Moran's I is agnostic to temporal stationarity: the calculation makes no assumptions about how the underlying data behave over time. The deviations from the global mean $\overline{x}$ are computed for a single snapshot in time and weighted by $w_{ij}$. The resulting value usually lies between $-1$ and $1$ (the exact bounds depend on the weights matrix). Positive values indicate that similar values cluster in space, negative values indicate that neighboring values tend to be dissimilar.

## Sources

- Moran, P. A. P. (1950). Notes on Continuous Stochastic Phenomena. *Biometrika*, 37(1/2), 17-23. <https://doi.org/10.2307/2332142> ([PDF](http://www.stat.ucla.edu/~nchristo/statistics_c173_c273/moran_paper.pdf))
- <https://en.wikipedia.org/wiki/Moran%27s_I>

## Calculation

```{r}
library(spdep) # Moran's I, k-nearest-neighbor weights
library(sf)    # spatial points (st_as_sf)
library(dplyr)

# ISO 3166-1 alpha-2 codes of European countries (XK = Kosovo)
european_iso2 <- c(
  "AL", "AD", "AT", "BY", "BE", "BA", "BG", "HR", "CY", "CZ", "DK", "EE",
  "FI", "FR", "DE", "GR", "HU", "IS", "IE", "IT", "XK", "LV", "LI", "LT",
  "LU", "MT", "MD", "MC", "ME", "NL", "MK", "NO", "PL", "PT", "RO", "RU",
  "SM", "RS", "SK", "SI", "ES", "SE", "CH", "UA", "GB", "VA")

# World cities dataset: keep only the national (primary) capitals of European countries
cities <- read.csv('worldcities.csv')
capitals <- cities %>%
  filter(capital == "primary") %>%
  filter(iso2 %in% european_iso2)

# GDP per country and year, joined to the capitals via the ISO3 country code
gdp <- read.csv('flat-ui__data-Mon Jan 12 2026.csv')
result <- merge(capitals, gdp, by.x = 'iso3', by.y = 'Country.Code', all.x = TRUE)

# Keep only the most recent year with a GDP value for each country;
# capitals without any GDP record are dropped here
result <- result %>%
  filter(!is.na(Value)) %>%
  group_by(iso3) %>%
  slice(which.max(Year)) %>%
  ungroup()

# Capital coordinates (longitude, latitude) as a numeric matrix for knearneigh()
coordinates <- result %>%
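  select(lng, lat) %>%
  as.matrix()

# Convert the result data frame to an sf object (WGS84 longitude/latitude)
result_sf <- st_as_sf(result, coords = c("lng", "lat"), crs = 4326)

# Ensure the variable of interest is numeric. The weights below are built from the
# same rows, so the GDP vector and the weights must have equal length and no NAs
result_sf$gdp <- as.numeric(result_sf$Value)
stopifnot(nrow(coordinates) == nrow(result_sf), !anyNA(result_sf$gdp))

# Spatial weights from the k nearest neighbors, using great-circle distances
# because the coordinates are in degrees; style = "W" row-standardizes the weights
k <- 5 # number of nearest neighbors
knn_nb <- knn2nb(knearneigh(coordinates, k = k, longlat = TRUE))
weights <- nb2listw(knn_nb, style = "W")

# Calculate Moran's I
n <- length(result_sf$gdp)
s0 <- Szero(weights)
moran_result <- moran(result_sf$gdp, listw = weights, n = n, S0 = s0)
print(moran_result)

# Monte Carlo permutation test of Moran's I
set.seed(123) # for reproducibility
moran_mc_result <- moran.mc(result_sf$gdp, listw = weights, nsim = 999)
print(moran_mc_result)

# Export coordinates, GDP, city and country code
towrite <- result[, c('lat', 'lng', 'Value', 'city', 'iso3')]
write.csv(towrite, file = 'gdp.csv', row.names = FALSE)
```

# Interpretation

A Moran's I close to 0 (more precisely, close to its expected value $-1/(N-1)$ under the null hypothesis of no spatial autocorrelation) indicates weak spatial autocorrelation, i.e. little clustering in the underlying data. The GDP of the European capitals does not appear to be spatially clustered.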
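To make the formula from the first section concrete, the following base-R sketch computes $I$ directly from its definition on a small toy example, which can help sanity-check the spdep results by hand. The helper names `morans_i` and `perm_p_value` and the five-point data are hypothetical, chosen only for illustration. The permutation test follows the same idea as `moran.mc()` (shuffling the values over the locations) but is not its implementation.

```{r}
# Hypothetical helper: Moran's I straight from the definition
# I = N / W * sum_ij w_ij (x_i - xbar)(x_j - xbar) / sum_i (x_i - xbar)^2
morans_i <- function(x, w) {
  stopifnot(is.matrix(w), nrow(w) == length(x), ncol(w) == length(x))
  n <- length(x)
  z <- x - mean(x)  # deviations from the global mean
  W <- sum(w)       # sum of all weights
  (n / W) * sum(w * outer(z, z)) / sum(z^2)
}

# Toy example (assumption): five units on a line, each adjacent to its direct neighbors
w <- matrix(0, 5, 5)
for (i in 1:4) {
  w[i, i + 1] <- 1
  w[i + 1, i] <- 1
}

x_trend <- c(1, 2, 3, 4, 5)       # smooth gradient: neighbors are similar
x_alternating <- c(1, 5, 1, 5, 1) # neighbors are dissimilar

morans_i(x_trend, w)       # 0.5
morans_i(x_alternating, w) # -1
-1 / (length(x_trend) - 1) # expected value without autocorrelation: -0.25

# Row-standardized weights, analogous to style = "W" in nb2listw()
morans_i(x_trend, w / rowSums(w))

# Hypothetical permutation test: shuffle x over the locations and count how
# often the shuffled I is at least as large as the observed one
perm_p_value <- function(x, w, nsim = 999) {
  observed <- morans_i(x, w)
  simulated <- replicate(nsim, morans_i(sample(x), w))
  (sum(simulated >= observed) + 1) / (nsim + 1)
}
set.seed(1)
perm_p_value(x_trend, w)
```

The gradient yields a clearly positive $I$ and the alternating pattern a negative one; both are judged against the expected value $-1/(N-1)$ rather than against exactly 0, which matters for small $N$.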