ex23-24 finished

2026-01-13 08:09:35 +01:00
parent 04838a848d
commit 3e6f8476e0
5 changed files with 62284 additions and 37 deletions
--- a/ex23-24.Rmd
+++ b/ex23-24.Rmd
@@ -0,0 +1,80 @@
+---
+title: "Location Based Services Exercise 23/24"
+author: "Erik Neller"
+date: "`r Sys.Date()`"
+output: pdf_document
+---
+# Moran's I
+A measure of spatial autocorrelation, i.e. of how strongly similar values cluster in space, defined as
+$$
+I = \frac{N}{W} \frac{\sum_{i=1}^N \sum_{j=1}^N w_{ij}(x_i-\overline{x})(x_j - \overline{x})}
+{\sum_{i=1}^N (x_i - \overline{x})^2}
+$$
+where
+- $N$ is the number of spatial units indexed by $i$ and $j$
+- $x$ is the variable of interest
+- $\overline{x}$ is the mean of $x$
+- $w_{ij}$ are the elements of a matrix of spatial weights that denote adjacency
+- $W = \sum_{i=1}^N \sum_{j=1}^N w_{ij}$ is the sum of all $w_{ij}$
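+
+As a sanity check of the definition, the formula can be evaluated directly in base R. The sketch below uses a hypothetical helper `morans_i()` and a toy example of four units on a line with binary adjacency weights; neither is part of the exercise data.
+
+```{r}
+# Hypothetical helper: Moran's I straight from the definition
+#   I = (N / W) * sum_ij w_ij (x_i - xbar)(x_j - xbar) / sum_i (x_i - xbar)^2
+morans_i <- function(x, w) {
+  stopifnot(is.matrix(w), nrow(w) == length(x), ncol(w) == length(x))
+  z <- x - mean(x)                    # deviations from the global mean
+  W <- sum(w)                         # sum of all spatial weights
+  cross <- sum(w * outer(z, z))       # sum_ij w_ij z_i z_j
+  (length(x) / W) * cross / sum(z^2)
+}
+
+# Toy example: four units on a line, neighbors share an edge
+w <- matrix(0, nrow = 4, ncol = 4)
+w[cbind(1:3, 2:4)] <- 1               # units 1-2, 2-3 and 3-4 are adjacent
+w <- w + t(w)                         # make the adjacency symmetric
+
+morans_i(c(1, 2, 3, 4), w)  # smooth gradient: positive I of 1/3
+morans_i(c(1, 0, 1, 0), w)  # perfect alternation: I = -1
+```
+
+A smooth gradient yields positive autocorrelation, while alternating high and low values yield negative autocorrelation.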
+
+Moran's I may be considered agnostic to time-series stationarity, since the calculation makes no assumptions about the temporal behavior of the underlying data:
+the deviations from the global mean $\overline{x}$ are computed for a single snapshot in time and weighted by $w_{ij}$.
+The result typically lies between $-1$ (dispersion) and $1$ (clustering), although the exact bounds depend on the weights matrix.
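+
+With row-standardized weights, as produced by `nb2listw(..., style = "W")` in the calculation, every row of $w$ sums to 1. Hence $W = N$ and the prefactor cancels:
+$$
+W = \sum_{i=1}^N \sum_{j=1}^N w_{ij} = \sum_{i=1}^N 1 = N
+\quad\Rightarrow\quad
+I = \frac{\sum_{i=1}^N \sum_{j=1}^N w_{ij}(x_i-\overline{x})(x_j - \overline{x})}{\sum_{i=1}^N (x_i - \overline{x})^2}
+$$
+This assumes every unit has at least one neighbor, which holds for k-nearest-neighbor weights.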
+
+## Sources
+- [doi:10.2307/2332142](https://doi.org/10.2307/2332142) ([PDF](http://www.stat.ucla.edu/~nchristo/statistics_c173_c273/moran_paper.pdf))
+- [Wikipedia: Moran's I](https://en.wikipedia.org/wiki/Moran%27s_I)
+
+## Calculation
+
+```{r}
+library(sf)    # st_as_sf(), st_coordinates()
+library(spdep) # neighbor lists, spatial weights and Moran's I
+library(dplyr)
+
+european_iso2 <- c(
+  "AL", "AD", "AT", "BY", "BE", "BA", "BG", "HR", "CY", "CZ", "DK", "EE", "FI", "FR",
+  "DE", "GR", "HU", "IS", "IE", "IT", "XK", "LV", "LI", "LT", "LU", "MT", "MD", "MC",
+  "ME", "NL", "MK", "NO", "PL", "PT", "RO", "RU", "SM", "RS", "SK", "SI", "ES", "SE",
+  "CH", "UA", "GB", "VA")
+
+# Primary capitals of the European countries
+cities <- read.csv('worldcities.csv')
+capitals <- cities %>% filter(capital == "primary", iso2 %in% european_iso2)
+
+# Attach the GDP time series via the ISO3 country code
+gdp <- read.csv('flat-ui__data-Mon Jan 12 2026.csv')
+result <- merge(capitals, gdp, by.x = 'iso3', by.y = 'Country.Code', all.x = TRUE)
+
+# Keep only the most recent year per country
+result <- result %>%
+  group_by(iso3) %>%
+  slice(which.max(Year)) %>%
+  ungroup()
+
+# Drop capitals without a GDP value *before* building the weights,
+# so that the weights and the GDP vector refer to the same capitals
+result_sf <- result %>%
+  mutate(gdp = as.numeric(Value)) %>%
+  filter(!is.na(gdp)) %>%
+  st_as_sf(coords = c("lng", "lat"), crs = 4326)
+
+# Spatial weights from the k nearest neighbors (great-circle distances on lon/lat)
+k <- 5  # number of nearest neighbors
+knn_nb <- knn2nb(knearneigh(st_coordinates(result_sf), k = k, longlat = TRUE))
+weights <- nb2listw(knn_nb, style = "W")  # row-standardized weights
+
+# Calculate Moran's I
+n <- length(knn_nb)
+s0 <- Szero(weights)
+moran_result <- moran(result_sf$gdp, listw = weights, n = n, S0 = s0)
+print(moran_result)
+
+# Monte Carlo permutation test
+set.seed(123)  # for reproducibility
+moran_mc_result <- moran.mc(result_sf$gdp, listw = weights, nsim = 999)
+print(moran_mc_result)
+
+# Export coordinates and GDP of the capitals
+towrite <- result[, c('lat', 'lng', 'Value', 'city', 'iso3')]
+write.csv(towrite, file = 'gdp.csv')
+
+```
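+
+The weights built by `knearneigh()`/`knn2nb()` and `nb2listw(style = "W")` can be pictured as a dense matrix in which each capital gives weight $1/k$ to each of its $k$ nearest neighbors and 0 to all other capitals. The sketch below builds such a matrix in base R with a hypothetical helper `knn_weights()`; for simplicity it uses planar Euclidean distances and ignores ties, so it illustrates the structure rather than replicating spdep's computation.
+
+```{r}
+# Hypothetical helper: dense row-standardized k-nearest-neighbor weights
+knn_weights <- function(coords, k) {
+  d <- as.matrix(dist(coords))        # planar Euclidean distances
+  diag(d) <- Inf                      # a point is never its own neighbor
+  n <- nrow(coords)
+  w <- matrix(0, nrow = n, ncol = n)
+  for (i in seq_len(n)) {
+    nn <- order(d[i, ])[1:k]          # indices of the k closest points
+    w[i, nn] <- 1 / k                 # row standardization ("W" style)
+  }
+  w
+}
+
+set.seed(1)
+pts <- matrix(runif(20), ncol = 2)    # 10 random points in the unit square
+w_knn <- knn_weights(pts, k = 3)
+rowSums(w_knn)                        # every row sums to 1
+```
+
+Because nearest-neighbor relations are not symmetric, `w_knn` is generally not symmetric either; the definition of Moran's I still applies to such weights.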
+
+# Interpretation
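+A Moran's I close to 0 indicates weak spatial autocorrelation, i.e. little clustering in the underlying data. Strictly, the reference value under the null hypothesis of no spatial autocorrelation is $E[I] = -1/(N-1)$, which approaches 0 as $N$ grows. The GDP of the European capitals therefore does not appear to be spatially clustered; the p-value of the Monte Carlo test (`moran.mc`) indicates whether the observed value differs significantly from a random spatial arrangement.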
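+
+Whether a value near 0 is meaningful is judged against a permutation distribution, which is the idea behind `moran.mc()`: the values are randomly reassigned to the spatial units many times and the observed $I$ is compared with the resulting values. The following is a simplified sketch of that idea, not the exact implementation of `moran.mc()`; the helper `moran_perm_test()`, the toy data and the one-sided p-value (number of permuted values at least as large as the observed one, plus 1, divided by `nsim + 1`) are assumptions for illustration.
+
+```{r}
+# Hypothetical sketch of a Moran's I permutation test (simplified, not spdep's code)
+moran_perm_test <- function(x, w, nsim = 999) {
+  stat <- function(v) {
+    z <- v - mean(v)
+    (length(v) / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
+  }
+  observed <- stat(x)
+  permuted <- replicate(nsim, stat(sample(x)))  # shuffle values across units
+  list(
+    statistic = observed,
+    expected  = -1 / (length(x) - 1),            # E[I] under no autocorrelation
+    perm_mean = mean(permuted),
+    p_value   = (sum(permuted >= observed) + 1) / (nsim + 1)  # one-sided, "greater"
+  )
+}
+
+# Toy data: 20 units on a line with random (spatially unstructured) values
+set.seed(42)
+n_units <- 20
+w_line <- matrix(0, nrow = n_units, ncol = n_units)
+w_line[cbind(1:(n_units - 1), 2:n_units)] <- 1
+w_line <- w_line + t(w_line)
+res <- moran_perm_test(rnorm(n_units), w_line)
+str(res)
+```
+
+A small p-value indicates more clustering than a random placement of the same values would produce; the mean of the permuted values should lie close to $E[I] = -1/(N-1)$.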