ex23-24 finished
This commit is contained in:
80
ex23-24.Rmd
Normal file
@@ -0,0 +1,80 @@
---
title: "Location Based Services Exercise 23/24"
author: "Erik Neller"
date: "`r Sys.Date()`"
output: pdf_document
---
# Moran's I

Moran's I is a measure of spatial autocorrelation (clustering) in spatial data, defined as

$$
I = \frac{N}{W} \frac{\sum_{i=1}^N \sum_{j=1}^N w_{ij}(x_i-\overline{x})(x_j - \overline{x})}
{\sum_{i=1}^{N}(x_i - \overline{x})^2}
$$

where

- $N$ is the number of spatial units indexed by $i$ and $j$
- $x$ is the variable of interest
- $\overline{x}$ is the mean of $x$
- $w_{ij}$ are the elements of a matrix of spatial weights that denote adjacency
- $W = \sum_{i=1}^N \sum_{j=1}^N w_{ij}$ is the sum of all $w_{ij}$
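
As a sanity check on the definition, the following minimal base-R sketch evaluates the formula directly on a toy example. The helper `morans_i`, the four-unit chain and its binary adjacency matrix are assumptions made for illustration only; they are not part of the exercise data.

```r
# Hypothetical helper: Moran's I computed directly from the definition.
# x: numeric vector of length N; w: N x N matrix of spatial weights.
morans_i <- function(x, w) {
  n <- length(x)
  d <- x - mean(x)               # deviations from the global mean
  cross <- sum(w * outer(d, d))  # sum_i sum_j w_ij * d_i * d_j
  (n / sum(w)) * cross / sum(d^2)
}

# Toy data (assumed): four units on a line, each adjacent to its direct neighbours.
w <- matrix(0, nrow = 4, ncol = 4)
for (i in 1:3) {
  w[i, i + 1] <- 1
  w[i + 1, i] <- 1
}

i_trend <- morans_i(c(1, 2, 3, 4), w)        # similar values are adjacent
i_alternating <- morans_i(c(1, 4, 1, 4), w)  # dissimilar values are adjacent

print(i_trend)        # 1/3, positive autocorrelation
print(i_alternating)  # -1, negative autocorrelation
```

The increasing sequence yields a positive value because neighbouring units carry similar values, whereas the alternating sequence places dissimilar values next to each other and reaches $-1$.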

Moran's I can be considered agnostic to time-series stationarity, because the calculation makes no assumptions about the temporal behavior of the underlying data.
The deviation from the global mean $\overline{x}$ is calculated at a single snapshot in time and weighted by $w_{ij}$,
resulting in a value that typically lies within $[-1, 1]$.

## Sources

- [https://doi.org/10.2307/2332142](http://www.stat.ucla.edu/~nchristo/statistics_c173_c273/moran_paper.pdf)
- https://en.wikipedia.org/wiki/Moran%27s_I

## Calculation

```{r}
library(spdep) # spatial weights and Moran's I
library(sf)    # st_as_sf, st_coordinates
library(dplyr)

european_iso2 <- c(
  "AL", "AD", "AT", "BY", "BE", "BA", "BG", "HR", "CY", "CZ", "DK", "EE", "FI", "FR",
  "DE", "GR", "HU", "IS", "IE", "IT", "XK", "LV", "LI", "LT", "LU", "MT", "MD", "MC",
  "ME", "NL", "MK", "NO", "PL", "PT", "RO", "RU", "SM", "RS", "SK", "SI", "ES", "SE",
  "CH", "UA", "GB", "VA")

cities <- read.csv("worldcities.csv")
capitals <- cities %>%
  filter(capital == "primary", iso2 %in% european_iso2)
gdp <- read.csv("flat-ui__data-Mon Jan 12 2026.csv")
result <- merge(capitals, gdp, by.x = "iso3", by.y = "Country.Code", all.x = TRUE)

# Keep only the most recent year for each country (identified by iso3)
result <- result %>%
  group_by(iso3) %>%
  slice(which.max(Year)) %>%
  ungroup()

# Make the variable of interest numeric and drop rows without a value
# BEFORE building the weights, so that weights and data cover the same rows
result_sf <- result %>%
  mutate(gdp = as.numeric(Value)) %>%
  filter(!is.na(gdp)) %>%
  st_as_sf(coords = c("lng", "lat"), crs = 4326)

# Create a spatial weights matrix using k-nearest neighbors.
# knearneigh() expects a coordinate matrix; longlat = TRUE makes it use
# great-circle distances, since the coordinates are longitude/latitude.
k <- 5 # Number of nearest neighbors
coordinates <- st_coordinates(result_sf)
knn_nb <- knn2nb(knearneigh(coordinates, k = k, longlat = TRUE))
weights <- nb2listw(knn_nb, style = "W") # row-standardised weights

# Calculate Moran's I
n <- nrow(result_sf)
s0 <- Szero(weights)
moran_result <- moran(result_sf$gdp, listw = weights, n = n, S0 = s0)
print(moran_result)

# Perform Monte Carlo simulation
set.seed(123) # For reproducibility
moran_mc_result <- moran.mc(result_sf$gdp, listw = weights, nsim = 999)
print(moran_mc_result)

# Export the merged data for later use
towrite <- result[, c("lat", "lng", "Value", "city", "iso3")]
write.csv(towrite, file = "gdp.csv")
```
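
To illustrate the idea behind the Monte Carlo step, the following base-R sketch runs a permutation test by hand: the values are shuffled across locations, Moran's I is recomputed for each shuffle, and the observed value is ranked against the simulated ones. This is a conceptual sketch under stated assumptions, not the implementation used by `spdep`; the helpers `morans_i` and `permutation_p`, the chain of eight units and the seed are all hypothetical.

```r
# Hypothetical helper: Moran's I from the definition (x: values, w: weight matrix).
morans_i <- function(x, w) {
  d <- x - mean(x)
  (length(x) / sum(w)) * sum(w * outer(d, d)) / sum(d^2)
}

# Hypothetical helper: one-sided permutation p-value for positive autocorrelation.
permutation_p <- function(x, w, nsim = 999) {
  observed <- morans_i(x, w)
  simulated <- vapply(
    seq_len(nsim),
    function(s) morans_i(sample(x), w), # shuffle values across locations
    numeric(1)
  )
  # Share of simulations at least as large as the observed value,
  # counting the observed value itself as one of the draws.
  (sum(simulated >= observed) + 1) / (nsim + 1)
}

# Toy data (assumed): eight units on a line with binary adjacency.
n_units <- 8
w <- matrix(0, nrow = n_units, ncol = n_units)
for (i in seq_len(n_units - 1)) {
  w[i, i + 1] <- 1
  w[i + 1, i] <- 1
}

set.seed(42) # For reproducibility
p_value <- permutation_p(seq_len(n_units), w)
print(p_value)
```

A small p-value would indicate that the observed Moran's I is larger than most of the values obtained when the data are randomly reassigned to locations.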

# Interpretation

A Moran's I close to 0 (more precisely, close to its expected value under spatial randomness, $-1/(N-1)$) indicates low spatial autocorrelation, i.e. little clustering in the underlying data. GDP does not seem to be spatially clustered across the European capitals.