data310_spring2021


Final Project: Impact of Urbanization on Temperature in the Solomon Islands

Due May 12
Research Question

What is the effect of urban sprawl on air temperature in the Solomon Islands?

I chose the Solomon Islands because I wanted to examine a country that is urbanizing rapidly but is small enough to analyze. Uganda currently has the fastest urbanization rate of any country, at 5.7% as of 2020, but it is very large and populous, with 45 million people across 236,040 square kilometers. The Solomon Islands are far more manageable in geographic size and population, with just 652,857 people across 28,400 square kilometers. The country nevertheless has a very high urbanization rate, ranking among the 30 fastest-urbanizing countries, and its rising economic potential is often overlooked because of its small size.

“Although 80% of the population live in rural areas, the Solomon Islands is considered to be one of the world’s fastest urbanizing countries, with an annual urban growth rate of 4.7 percent.” - UN-Habitat, For a Better Urban Future

Understanding the effect of urban sprawl on temperature is important for understanding the impact of human development on the environment. The Solomon Islands are urbanizing quickly, but their economy relies heavily on subsistence agriculture and a growing tourism industry. It is crucial to understand the consequences that urbanization may have, because rising temperatures could damage the natural environment and harm both the agricultural and tourism sectors. Examining whether urban change has an effect on temperature can help determine whether the Solomon Islands need to prepare for environmental consequences of urbanization in these key industries.

Data

For this project I use the WorldPop Urban Change dataset for the Solomon Islands, taking the "Estimated persons per grid square" feature as the independent variable. This is continuous geographic data from 2000 and 2010, collected via land cover mapping. The dependent variable is the World Bank air temperature data, "Air temperature at 2 m above ground level in °C", which is also continuous geographic data and is modeled from meteorological station measurements. I use the GeoBoundaries administrative boundary shapefiles to split the geographic data by ADM3 zone.

Data sources:

Machine Learning Model

Provide the specification for your applied machine learning method that presented the most promise in providing a solution to your problem. Include the section from your python or R script that specifies your model architecture, layers, functional arguments and specifications for compiling and fitting. Provide a brief description of how you implemented your code in practice.

First I split the geospatial data into 183 observations, based on the ADM3 zones:

Then I used a similar strategy from our DHS project, by stacking the urban change and air temperature rasters, cropping to the extent of the ADM0 boundary, and using exact_extract to obtain the sum in each ADM3 zone:

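library(sf)             # read_sf()
library(raster)         # raster(), crop(), mask(), resample(), stack()
library(exactextractr)  # exact_extract()

# GeoBoundaries shapefiles: country outline (ADM0) and ADM3 zones
sol_adm0 <- read_sf("SLB_ADM0_fixedInternalTopology.shp")
sol_adm3 <- read_sf("SLB_ADM3_fixedInternalTopology.shp")

urbchg <- raster("./Solomon_Islands_100m_Urban_change/SLB10urbchg.tif") # 2010 urban change raster
temp <- raster("temperature.tif") # air temperature raster

# Crop each raster to the ADM0 extent, then mask out cells outside the boundary
temp_adm0 <- crop(temp, sol_adm0)
temp_adm0 <- mask(temp_adm0, sol_adm0)
urbchg_adm0 <- crop(urbchg, sol_adm0)
urbchg_adm0 <- mask(urbchg_adm0, sol_adm0)

# Resample urban change onto the temperature grid so the two layers can be stacked
urbchgResamp <- resample(urbchg_adm0, temp_adm0, method = "bilinear")

stacked <- stack(temp_adm0, urbchgResamp)

# Sum and mean of each original raster within every ADM3 zone
temp_adm3 <- exact_extract(temp, sol_adm3, fun = c("sum", "mean"))
urbchg_adm3 <- exact_extract(urbchg, sol_adm3, fun = c("sum", "mean"))

# Sum and mean of both stacked layers within every ADM3 zone
stackadm3 <- exact_extract(stacked, sol_adm3, fun = c("sum", "mean"))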

To explore the data, I plotted the two variables and used a simple linear regression to draw the line of best fit. The plot suggests a positive, roughly linear relationship between urban change and temperature: temperature tends to increase as urbanization increases.

The linear regression summary below also shows a positive relationship: urban change has a coefficient of 26.30 and is statistically significant at the 0.05 level (p = 0.030). The fit is weak, however, with an R-squared of only 0.026.

Residuals:
   Min     1Q Median     3Q    Max 
 -7110  -3435  -1008   2262  27236 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)      3670.29     477.36   7.689 9.16e-13 ***
sum.SLB10urbchg    26.30      12.04   2.184   0.0302 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4440 on 181 degrees of freedom
Multiple R-squared:  0.02569,	Adjusted R-squared:  0.0203 
F-statistic: 4.772 on 1 and 181 DF,  p-value: 0.03022
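
The exploratory fit summarized above can be sketched roughly as follows. This is a minimal illustration rather than the original script: the data frame `zones` and its simulated values are placeholders standing in for the per-zone sums returned by exact_extract, and the column name `sum.temperature` is an assumption made by analogy with `sum.SLB10urbchg` in the summary.

```r
# Placeholder data: one row per ADM3 zone (183 in the real analysis).
# The values are simulated for illustration only; they are NOT the project data.
set.seed(310)
n_zones <- 183
zones <- data.frame(sum.SLB10urbchg = rexp(n_zones, rate = 1 / 20))
zones$sum.temperature <- 3500 + 25 * zones$sum.SLB10urbchg +
  rnorm(n_zones, sd = 4000)

# Simple linear regression of summed temperature on summed urban change.
fit <- lm(sum.temperature ~ sum.SLB10urbchg, data = zones)
print(summary(fit))

# Scatter plot of the two variables with the fitted line of best fit.
plot(zones$sum.SLB10urbchg, zones$sum.temperature,
     xlab = "Urban change (sum per ADM3 zone)",
     ylab = "Air temperature (sum per ADM3 zone)")
abline(fit, col = "red")
```

With 183 zones and a single predictor, the model has 181 residual degrees of freedom, which matches the summary above.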

To build on the simple linear regression, I fit a Support Vector Machine model with a radial basis function kernel to assess the impact of the urban change data (X) on the air temperature measurements (Y) in each administrative zone. This was a very basic model and did not fit well, with a cross-validated R-squared of about 0.08:

> modelSvmRRB
Support Vector Machines with Radial Basis Function Kernel 

147 samples
  1 predictor

Pre-processing: scaled (1), Yeo-Johnson transformation (1) 
Resampling: Cross-Validated (5 fold, repeated 5 times) 
Summary of sample sizes: 118, 118, 117, 118, 117, 119, ... 
Resampling results:

  RMSE      Rsquared    MAE      
  0.220309  0.08397746  0.1664855

Tuning parameter 'sigma' was held constant at a value of
 0.025
Tuning parameter 'C' was held constant at a value of 2
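
The summary above has the format printed by caret's train(). A call along the following lines would produce a summary with the same settings; it is a sketch under stated assumptions, not the original script. The data frame `zones`, the column names `urbchg` and `temp`, the 80/20 split, and the seed are all hypothetical, and the values are simulated placeholders.

```r
library(caret)  # train(), trainControl(); method "svmRadial" also needs kernlab

# Placeholder data: simulated for illustration only, NOT the project data.
set.seed(310)
zones <- data.frame(urbchg = rexp(183, rate = 1 / 20))
zones$temp <- 26 + 0.01 * zones$urbchg + rnorm(183, sd = 0.2)

# Assumed 80/20 train/test split of the ADM3 zones.
train_idx <- createDataPartition(zones$temp, p = 0.8, list = FALSE)
train_df <- zones[train_idx, , drop = FALSE]
test_df  <- zones[-train_idx, , drop = FALSE]

# 5-fold cross-validation repeated 5 times, as reported in the summary.
ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 5)

# Radial-kernel SVM with sigma and C held constant at the reported values.
modelSvmRRB <- train(
  temp ~ urbchg,
  data = train_df,
  method = "svmRadial",
  preProcess = c("scale", "YeoJohnson"),
  trControl = ctrl,
  tuneGrid = data.frame(sigma = 0.025, C = 2)
)
print(modelSvmRRB)
```

Passing a one-row `tuneGrid` is what makes caret report the tuning parameters as "held constant"; supplying several candidate values for `sigma` and `C` would let the cross-validation choose between them instead.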

SVM Radial Predicted Test plot:

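I tried to improve the model by switching to a linear kernel with the cost parameter C set to 1. This slightly increased the R-squared (from 0.084 to 0.106) and decreased both the RMSE (from 0.220 to 0.143) and the mean absolute error (from 0.166 to 0.107). The two models also differ in their pre-processing and cross-validation settings, so the comparison is only approximate: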

> svm_Linear
Support Vector Machines with Linear Kernel 

147 samples
  1 predictor

Pre-processing: centered (1), scaled (1) 
Resampling: Cross-Validated (10 fold, repeated 3 times) 
Summary of sample sizes: 132, 132, 132, 132, 132, 133, ... 
Resampling results:

  RMSE       Rsquared   MAE      
  0.1434152  0.1057846  0.1065911

Tuning parameter 'C' was held constant at a value of 1
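
A linear-kernel fit with the settings reported above, followed by an evaluation on held-out zones, might look like the sketch below. As before, this is an assumption-laden illustration and not the original script: `zones`, `urbchg`, `temp`, the 80/20 split, and the seed are hypothetical, and the data are simulated placeholders.

```r
library(caret)  # train(), postResample(); method "svmLinear" also needs kernlab

# Placeholder data: simulated for illustration only, NOT the project data.
set.seed(310)
zones <- data.frame(urbchg = rexp(183, rate = 1 / 20))
zones$temp <- 26 + 0.01 * zones$urbchg + rnorm(183, sd = 0.2)

# Assumed 80/20 train/test split of the ADM3 zones.
train_idx <- createDataPartition(zones$temp, p = 0.8, list = FALSE)
train_df <- zones[train_idx, , drop = FALSE]
test_df  <- zones[-train_idx, , drop = FALSE]

# 10-fold cross-validation repeated 3 times, as reported in the summary.
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)

# Linear-kernel SVM with the cost parameter C held constant at 1.
svm_Linear <- train(
  temp ~ urbchg,
  data = train_df,
  method = "svmLinear",
  preProcess = c("center", "scale"),
  trControl = ctrl,
  tuneGrid = data.frame(C = 1)
)
print(svm_Linear)

# Predict on the held-out zones and compute RMSE, R-squared, and MAE.
test_pred <- predict(svm_Linear, newdata = test_df)
test_metrics <- postResample(pred = test_pred, obs = test_df$temp)
print(test_metrics)

# Predicted vs. observed values on the test set, with a 1:1 reference line.
plot(test_df$temp, test_pred,
     xlab = "Observed temperature", ylab = "Predicted temperature")
abline(0, 1, lty = 2)
```

Reusing one `trainControl` object and one pre-processing recipe for both kernels would put the radial and linear models on the same footing when their cross-validated metrics are compared.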

SVM Linear Predicted Test plot:

Conclusion

Literature on urbanization and surface temperature: