Lesson 8: Geographically Weighted Regression

Author
Dr. Kam Tin Seong
Assoc. Professor of Information Systems (Practice)

Affiliation
School of Computing and Information Systems,
Singapore Management University

Published
12 Mar 2023

Content

  • Introducing Regression Modelling

    • Simple Linear Regression
    • Multiple Linear Regression
  • What is Spatial Non-stationarity

  • Introducing Geographically Weighted Regression

    • Weighting functions (kernel)
    • Weighting schemes
    • Bandwidth
  • Interpreting and Visualising

The WHY Questions

  • Why were some condominium units transacted at relatively higher prices than others?

The WHY Questions

Why were condominium units located in the central part of Singapore transacted at relatively higher prices than others?

What is regression analysis?

  • A set of statistical processes for explaining the relationships among variables.

  • The focus is on the relationship between a dependent variable (y) and one or more independent variables (x)

    • Does X affect Y? If so, how?
    • What is the change in Y given a one unit change in X?
  • Estimate outcomes based on the relationships modelled.


A Simple Linear Regression Model

The formula:
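
In conventional notation (the symbols are the standard textbook ones and are assumed here), a simple linear regression model for observation i can be written as:

```latex
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, \dots, n
```

Here \beta_0 is the intercept, \beta_1 is the slope (the change in y for a one unit change in x) and \varepsilon_i is the random error.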


The Least Squares Method

  • The line is chosen so that the sum of the squared vertical deviations (along the y axis) of the points from the line is minimal.
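
As a minimal sketch of the least squares idea, the R code below computes the intercept and slope from the closed-form formulas and compares them with lm(). The data are simulated and purely hypothetical; the variable names are illustrative only.

```r
# Simulated data (hypothetical, for illustration only)
set.seed(123)
x <- runif(50, 0, 10)
y <- 2 + 0.5 * x + rnorm(50, sd = 1)

# Closed-form least squares estimates of slope and intercept
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# The same estimates obtained from lm()
fit <- lm(y ~ x)

# Sum of squared vertical deviations (residuals) of the fitted line
rss <- sum((y - (b0 + b1 * x))^2)

print(c(intercept = b0, slope = b1))
print(coef(fit))
print(rss)
```

Any other line, for example one with a slightly shifted intercept, gives a larger sum of squared deviations than rss.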


Multiple Linear Regression
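
In conventional notation (assumed here, with p explanatory variables), the multiple linear regression model and its least squares estimator in matrix form can be written as:

```latex
y_i = \beta_0 + \sum_{k=1}^{p} \beta_k x_{ik} + \varepsilon_i,
\qquad
\hat{\boldsymbol{\beta}} = (\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{y}
```

Each \beta_k is interpreted as the change in y for a one unit change in x_k, holding the other explanatory variables constant.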


Assessing the goodness of fit
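
A common measure of goodness of fit is the coefficient of determination. In standard notation (assumed here, with n observations and p explanatory variables), R² and adjusted R² are:

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
R^2_{\text{adj}} = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
```

R² is the proportion of the variation in y explained by the model; the adjusted version penalises the addition of explanatory variables.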


Significance testing in regression


Goodness of fit test


Individual parameter testing
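
A minimal sketch of both tests in R, assuming the usual approach: the overall F-test assesses whether the model as a whole explains y, and a t-test is reported for each individual parameter. The data are simulated and hypothetical; x2 is deliberately given no true effect.

```r
# Simulated data (hypothetical): y depends on x1 but not on x2
set.seed(42)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 + rnorm(n)

fit <- lm(y ~ x1 + x2)
s   <- summary(fit)

# Overall F-test: H0 is that all slope coefficients are zero
f   <- s$fstatistic
f_p <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))

# Individual t-tests: one row per parameter
coefs <- s$coefficients

print(f_p)
print(coefs)
```

The column Pr(>|t|) of the coefficient table holds the p-value of each individual parameter test.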


Assessing individual parameters


Are there redundant explanatory variables?
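
One common way to look for redundant (highly collinear) explanatory variables is the variance inflation factor (VIF). The sketch below computes it by hand on simulated, hypothetical data, on the assumption that this is the kind of check intended here. A VIF above about 10 is a widely used rule of thumb for concern, not a strict cut-off. The olsrr package also offers ols_vif_tol() for the same purpose.

```r
# Simulated predictors (hypothetical): x2 is nearly a copy of x1
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)
x3 <- rnorm(n)
X  <- data.frame(x1, x2, x3)

# VIF_k = 1 / (1 - R2_k), where R2_k comes from regressing
# predictor k on all the other predictors
vif <- sapply(names(X), function(v) {
  f  <- reformulate(setdiff(names(X), v), response = v)
  r2 <- summary(lm(f, data = X))$r.squared
  1 / (1 - r2)
})

print(round(vif, 1))
```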

Assumptions of linear regression models

  • Linearity assumption. The relationship between the dependent variable and independent variables is (approximately) linear.

  • Normality assumption. The residual errors are assumed to be normally distributed.

  • Homogeneity of residuals variance. The residuals are assumed to have a constant variance (homoscedasticity).

  • The residuals are uncorrelated with each other.

    • For example, there should be no serial correlation, as can occur with time series.
  • (Optional) The errors (residuals) are normally distributed and have a 0 population mean.


The linearity assumption



Residuals vs Fitted plot: used to check the linear relationship assumption. A horizontal line without distinct patterns indicates a linear relationship, which is good.
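
A minimal sketch of how such a plot can be produced in base R, using simulated, hypothetical data with a deliberately curved relationship so that the pattern is visible:

```r
# Simulated data (hypothetical): the true relationship is curved
set.seed(7)
x <- runif(100, 0, 3)
y <- exp(x) + rnorm(100, sd = 0.5)

fit <- lm(y ~ x)

# Residuals vs Fitted plot (the first of the standard lm diagnostic plots)
plot(fit, which = 1)

# The same quantities, if a custom plot is preferred
res         <- residuals(fit)
fitted_vals <- fitted(fit)
```

With these data the smoothed line in the plot should bend rather than stay flat, signalling that a straight line is not adequate.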


Demystifying the linearity assumption myth

The myth: We should transform the values of the y variable when they are large.

library(tidyverse)
library(sjPlot)
library(sjmisc)
library(sjlabelled)
library(olsrr)
data1 <- read_delim("data/ex1_data.txt", delim=";")
Rows: 100 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ";"
chr (1): group
dbl (2): x, y

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
data1.lm <- lm(formula=y ~ x, data = data1)
tab_model(data1.lm)
                     y
Predictors           Estimates   CI            p
(Intercept)          1.98        1.66 – 2.30   <0.001
x                    2.35        2.13 – 2.57   <0.001
Observations         100
R² / R² adjusted     0.819 / 0.817
ggplot(data=data1,  
       aes(x=`x`, y=`y`)) +
  geom_point() +
  geom_smooth(method = lm)
`geom_smooth()` using formula = 'y ~ x'


The linearity assumption

Although the values of the dependent variable are rather similar in magnitude to those of the independent variable, the diagnostic plot shows that the linearity assumption has been violated.

ols_plot_obs_fit(data1.lm, print_plot = TRUE)


Data transformation comes to the rescue

data.lm2 <- lm(formula=y ~ exp(x), data = data1)
tab_model(data.lm2)
                     y
Predictors           Estimates   CI            p
(Intercept)          0.51        0.36 – 0.65   <0.001
x [exp]              1.01        0.98 – 1.04   <0.001
Observations         100
R² / R² adjusted     0.977 / 0.977
ggplot(data=data1,  
       aes(x=exp(x), y=`y`)) +
  geom_point() +
  geom_smooth(method = lm)
`geom_smooth()` using formula = 'y ~ x'


The linearity assumption

The diagnostic plot shows that the linearity assumption is now satisfied.

ols_plot_obs_fit(data.lm2, print_plot = TRUE)


The normality assumption

Warning

This is a test on the residuals, not on the dependent variable.
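
A minimal sketch in base R, assuming a Shapiro-Wilk test and a normal Q-Q plot as the checks (other tests are equally possible). Note that both are applied to residuals(fit), not to y. The data are simulated and hypothetical.

```r
# Simulated data (hypothetical)
set.seed(11)
x <- rnorm(80)
y <- 1 + 2 * x + rnorm(80)

fit <- lm(y ~ x)

# Shapiro-Wilk test on the residuals (H0: residuals are normally distributed)
sw <- shapiro.test(residuals(fit))
print(sw)

# Visual check: points close to the reference line suggest normality
qqnorm(residuals(fit))
qqline(residuals(fit))
```

A small p-value would be evidence against normally distributed residuals.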


Checking for serial correlation

The purpose of this test is to check that the residuals of a multiple regression are independent over time, that is, free of serial correlation.
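
One widely used check is the Durbin-Watson statistic, assumed here as an example of such a test. The sketch below computes it by hand on simulated, hypothetical data whose errors are deliberately autocorrelated. Values near 2 suggest no first-order serial correlation; values well below 2 suggest positive serial correlation.

```r
# Simulated data (hypothetical) with AR(1) errors, rho = 0.8
set.seed(3)
n <- 200
x <- rnorm(n)
e <- numeric(n)
e[1] <- rnorm(1)
for (t in 2:n) {
  e[t] <- 0.8 * e[t - 1] + rnorm(1)
}
y <- 1 + 2 * x + e

fit <- lm(y ~ x)
res <- residuals(fit)

# Durbin-Watson statistic: sum of squared successive differences
# of the residuals divided by the residual sum of squares
dw <- sum(diff(res)^2) / sum(res^2)
print(dw)
```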

Spatial Non-stationarity

  • When applied to spatial data, a conventional (global) regression model assumes a stationary spatial process.
    • The same stimulus provokes the same response in all parts of the study region.
    • This assumption is highly untenable for spatial processes.

Why do relationships vary spatially?

  • Sampling variation
    • Nuisance variation, not real spatial non-stationarity.
  • Relationships intrinsically different across space
    • Real spatial non-stationarity.
  • Model misspecification
    • Can significant local variations be removed?

Some definitions

  • Spatial non-stationarity: the same stimulus provokes a different response in different parts of the study region.

  • Global models: statements about processes which are assumed to be stationary and as such are location independent.

  • Local models: spatial decompositions of global models. The results of local models are location dependent, a characteristic we usually anticipate from geographic (spatial) data.


Spatial Autocorrelation assumption

The residuals are assumed to be distributed at random over geographical space.


Test of spatial autocorrelation

To test whether the residuals of the model are spatially autocorrelated, which may indicate that the relationships in the model are non-stationary.

  • lm.morantest() of the spdep package will be used.
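
A minimal sketch of how lm.morantest() might be called. The data and coordinates are simulated and hypothetical, and the choice of six nearest neighbours with row-standardised weights is an assumption for illustration, not a recommendation.

```r
library(spdep)

# Simulated point data (hypothetical)
set.seed(99)
n      <- 100
coords <- cbind(runif(n), runif(n))
x      <- rnorm(n)
y      <- 1 + 2 * x + rnorm(n)

fit <- lm(y ~ x)

# Spatial weights: 6 nearest neighbours, row-standardised (assumed choice)
nb <- knn2nb(knearneigh(coords, k = 6))
lw <- nb2listw(nb, style = "W")

# Moran's I test on the regression residuals
mt <- lm.morantest(fit, lw)
print(mt)
```

A small p-value suggests that the residuals are not distributed at random over space.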

Geographically Weighted Regression (GWR)

  • Local statistical technique to analyze spatial variations in relationships.

  • Spatial non-stationarity is assumed and will be tested.

  • Based on the “First Law of Geography”: everything is related to everything else, but closer things are more related than remote ones.


Geographically Weighted Regression (GWR): The method
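
In the notation commonly used in the GWR literature (assumed here), the model lets every coefficient vary with the location (u_i, v_i) of observation i:

```latex
y_i = \beta_0(u_i, v_i) + \sum_{k=1}^{p} \beta_k(u_i, v_i)\, x_{ik} + \varepsilon_i
```

When the coefficients do not vary with location, this reduces to the ordinary (global) multiple linear regression model.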


Calibration of GWR

  • Local weighted least squares
    • Weights are attached to locations
    • Based on the “First Law of Geography”: everything is related to everything else, but closer things are more related than remote ones
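
In the usual matrix notation (assumed here), the local weighted least squares estimate at location (u_i, v_i) is:

```latex
\hat{\boldsymbol{\beta}}(u_i, v_i) =
\left(\mathbf{X}^{\top}\mathbf{W}(u_i, v_i)\,\mathbf{X}\right)^{-1}
\mathbf{X}^{\top}\mathbf{W}(u_i, v_i)\,\mathbf{y},
\qquad
\mathbf{W}(u_i, v_i) = \operatorname{diag}(w_{i1}, \dots, w_{in})
```

where w_{ij} is the weight given to observation j when the model is calibrated at location i; nearby observations receive larger weights.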

Calibration - Weighting functions
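
Two commonly used weighting functions, given here as assumed examples, are the Gaussian and the bisquare kernels, where d_{ij} is the distance between locations i and j and b is the bandwidth:

```latex
w_{ij}^{\text{Gaussian}} = \exp\!\left(-\frac{1}{2}\left(\frac{d_{ij}}{b}\right)^{2}\right),
\qquad
w_{ij}^{\text{bisquare}} =
\begin{cases}
\left(1 - \left(d_{ij}/b\right)^{2}\right)^{2} & \text{if } d_{ij} < b \\
0 & \text{otherwise}
\end{cases}
```

Both give a weight of 1 at d_{ij} = 0 and decrease with distance; the bisquare kernel drops to exactly zero beyond the bandwidth.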




Calibration - Weighting schemes

  • Determines weights
    • Most schemes tend to be Gaussian or Gaussian-like reflecting the type of dependency found in most spatial processes.
    • It can be either Fixed or Adaptive.
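
A minimal sketch of the difference between the two schemes, assuming a Gaussian kernel and simulated, hypothetical coordinates. A fixed scheme uses the same bandwidth distance everywhere; an adaptive scheme sets the bandwidth at each location to the distance of its k-th nearest neighbour. The values b = 2 and k = 10 are arbitrary.

```r
# Simulated coordinates (hypothetical)
set.seed(5)
n      <- 50
coords <- cbind(runif(n, 0, 10), runif(n, 0, 10))

# Distances from location 1 to every location (including itself)
d <- unname(as.matrix(dist(coords))[1, ])

# Gaussian kernel
gauss <- function(d, b) exp(-0.5 * (d / b)^2)

# Fixed scheme: the same bandwidth distance at every location
w_fixed <- gauss(d, b = 2)

# Adaptive scheme: bandwidth is the distance to the k-th nearest neighbour
k       <- 10
b_adapt <- sort(d)[k + 1]   # +1 skips the location itself (distance 0)
w_adapt <- gauss(d, b = b_adapt)

print(b_adapt)
print(round(head(w_fixed), 3))
print(round(head(w_adapt), 3))
```

In densely sampled areas the adaptive bandwidth shrinks, and in sparse areas it grows, so each local model draws on a similar number of observations.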


Calibration - Determining Bandwidth
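
One common way to determine the bandwidth is leave-one-out cross-validation (CV): for each candidate bandwidth, every observation is predicted from a local model calibrated without it, and the bandwidth with the smallest sum of squared prediction errors is chosen. The sketch below is a hand-rolled illustration on simulated, hypothetical data with a fixed Gaussian kernel; the candidate bandwidths are arbitrary. In practice, the GWmodel package provides bw.gwr() for this task.

```r
# Simulated data (hypothetical): the slope of x increases from west to east
set.seed(2023)
n <- 150
u <- runif(n)
v <- runif(n)
x <- rnorm(n)
beta1 <- 1 + 2 * u
y <- 1 + beta1 * x + rnorm(n, sd = 0.3)

# Pairwise distances between locations
D <- as.matrix(dist(cbind(u, v)))

# Leave-one-out CV score for a fixed Gaussian bandwidth b
cv_score <- function(b) {
  sq_err <- numeric(n)
  for (i in seq_len(n)) {
    w <- exp(-0.5 * (D[i, ] / b)^2)
    w[i] <- 0                          # leave observation i out
    fit_i <- lm(y ~ x, weights = w)    # local weighted least squares
    pred  <- coef(fit_i)[1] + coef(fit_i)[2] * x[i]
    sq_err[i] <- (y[i] - pred)^2
  }
  sum(sq_err)
}

bandwidths <- c(0.1, 0.2, 0.4, 0.8, 1.6)
scores     <- sapply(bandwidths, cv_score)
best_b     <- bandwidths[which.min(scores)]

print(data.frame(bandwidth = bandwidths, cv = scores))
print(best_b)
```

A very large bandwidth makes the local models behave like the global model, while a very small one makes them noisy; CV looks for a compromise between the two.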


GWR Report

  • Package Model

  • Results of Global Regression

  • Results of Geographically Weighted Regression

  • SDF: a SpatialPointsDataFrame


gwr: local R2


gwr: intercept

References

Brunsdon, C., Fotheringham, A.S. and Charlton, M. (1996) “Geographically weighted regression: A method for exploring spatial nonstationarity”. Geographical Analysis, 28(4), 281-298.

Brunsdon, C., Fotheringham, A.S. and Charlton, M. (1999) [“Some Notes on Parametric Significance Tests for Geographically Weighted Regression”](https://onlinelibrary-wiley-com.libproxy.smu.edu.sg/doi/abs/10.1111/0022-4146.00146). Journal of Regional Science, 39(3), 497-524.

Mennis, J. (2006) “Mapping the Results of Geographically Weighted Regression”. The Cartographic Journal, 43(2), 171-179.

Matthews, S.A. and Yang, T.-C. (2012) “Mapping the results of local statistics: Using geographically weighted regression”. Demographic Research, 26, 151-166.