How to do spatial analysis in R with sf

0
139


Where do you vote? Who are you legislators? What is your ZIP code? These questions have something geospatially in common: The answer involves determining which polygon a point falls within.

Such calculations are often done with specialized GIS software. But they’re also easy to do in R. You need three things:

  1. A way to geocode addresses to find latitude and longitude; 
  2. Shapefiles that outline ZIP code polygon boundaries; and 
  3. The sf package.

For geocoding, I usually use the . It’s free for 2,500 lookups a day and has a , but you need a (free) API key to use it. To get around that bit of complexity for this article, I’ll use the free, open-source Open Street Map . It doesn’t require a key. The tmaptools package has a function, geocode_OSM(), to use that API.

Importing and prepping geospatial data

I’ll be using the sf, tmaptools, tmap, and dplyr packages. If you want to follow along, load each with pacman::p_load() or install any not yet on your system with install.packages(), then load each with library().

For this example, I’ll create a vector with two addresses, our IDG office in Framingham, Massachusetts, and the RStudio office in Boston.

, but it’s a file for the entire country. Only do that if you don’t mind a large data file. 

One place to download a ZCTA file for a single state is . Search for any data by state, such as population, and then  and choose .

I could unzip my downloaded file manually, but it’s easier in R. Here I use base R’s unzip() function on a downloaded file, and unzip it to a project subdirectory named ma_zip_shapefile. That junkpaths = TRUE argument says I don’t want unzip adding another subdirectory based on the name of the zip file.

unzip("data/acs2017_5yr_B01003_86000US02648.zip", 
exdir = "ma_zip_shapefile", junkpaths = TRUE,
overwrite = TRUE)

Geospatial import and analysis with sf

Now finally some geospatial work. I’ll import the shapefile into R using sf’s st_read() function.

Screens shot by Sharon Machlis, IDG

A map of Massachusetts ZIP Code Tabulation Areas with the tmap R package.

And it looks like I do indeed have Massachusetts geometry with polygons that could be ZIP codes.

Next I want to use the geocoded address data. This is currently a plain data frame, but it needs to be converted into an sf geospatial object with the right coordinate system.

We can do that with sf’s st_as_sf() function. (Note: sf package functions that operate on spatial data start with st_, which stands for “spatial” and “temporal.”)

st_as_sf() takes several arguments. In the code below, the first argument is the object to transform—my geocoded addresses. The second argument vector tells the function which columns have the x (longitude) and y (latitude) values. The third sets the coordinate reference system to 4326, so it’s the same as my ZIP code polygons.

point_geo <- st_as_sf(geocoded_addresses, 
coords = c(x = "lon", y = "lat"),
crs = 4326)

Geospatial joins with sf

Now that I’ve set up my two data sets, calculating the ZIP code for each address is easy with sf’s st_join() function. The syntax:

st_join(point_sf_object, polygon_sf_object, join = join_type)

In this example, I want to run st_join() on the geocoded points first and the ZIP code polygons second. It’s a so-called left join format: All points in the first data (geocoded addresses) are included, but only points in the second (ZIP code) data that match. Finally, my join type is st_within, since I want the match to be points within. 

my_results <- st_join(point_geo, zipcode_geo, 
join = st_within)

That’s it! Now if I look at my results by printing out several of the most important columns, you”ll see each address has a ZIP code (in the “name” column). 

print(my_results[,c("query", "name", "geometry")])
# Simple feature collection with 2 features and 2 fields # geometry type: POINT # dimension: XY # bbox: xmin: -71.39105 ymin: 42.31348 xmax: -71.03673 ymax: 42.34806 # epsg (SRID): 4326 # proj4string: +proj=longlat +datum=WGS84 +no_defs # query name geometry # 1 492 Old Connecticut Path, Framingham, MA 01701 POINT (-71.39105 42.31348) # 2 250 Northern Ave., Boston, MA 02210 POINT (-71.03673 42.34806)

Mapping points and polygons with tmap

If you’d like to map the points and polygons, here’s one way to do it with tmap:

tm_shape(zipcode_geo) +
tm_fill() +
tm_shape(my_results) +
tm_bubbles(col = "red", size = 0.25)
Screen shot by Sharon Machlis, IDG

Massachusetts points and polygons mapped with tmap.

Want more R tips? Head to the