Where do you vote? Who are you legislators? What is your ZIP code? These questions have something geospatially in common: The answer involves determining which polygon a point falls within.
Such calculations are often done with specialized GIS software. But they’re also easy to do in R. You need three things:
- A way to geocode addresses to find latitude and longitude;
- Shapefiles that outline ZIP code polygon boundaries; and
- The sf package.
For geocoding, I usually use the . It’s free for 2,500 lookups a day and has a , but you need a (free) API key to use it. To get around that bit of complexity for this article, I’ll use the free, open-source Open Street Map . It doesn’t require a key. The tmaptools package has a function,
geocode_OSM(), to use that API.
Importing and prepping geospatial data
I’ll be using the sf, tmaptools, tmap, and dplyr packages. If you want to follow along, load each with
pacman::p_load() or install any not yet on your system with
install.packages(), then load each with
For this example, I’ll create a vector with two addresses, our IDG office in Framingham, Massachusetts, and the RStudio office in Boston.
, but it’s a file for the entire country. Only do that if you don’t mind a large data file.
One place to download a ZCTA file for a single state is . Search for any data by state, such as population, and then and choose .
I could unzip my downloaded file manually, but it’s easier in R. Here I use base R’s
unzip() function on a downloaded file, and unzip it to a project subdirectory named ma_zip_shapefile. That
junkpaths = TRUE argument says I don’t want unzip adding another subdirectory based on the name of the zip file.
exdir = "ma_zip_shapefile", junkpaths = TRUE,
overwrite = TRUE)
Geospatial import and analysis with sf
Now finally some geospatial work. I’ll import the shapefile into R using sf’s
Screens shot by Sharon Machlis, IDG A map of Massachusetts ZIP Code Tabulation Areas with the tmap R package.
A map of Massachusetts ZIP Code Tabulation Areas with the tmap R package.
And it looks like I do indeed have Massachusetts geometry with polygons that could be ZIP codes.
Next I want to use the geocoded address data. This is currently a plain data frame, but it needs to be converted into an sf geospatial object with the right coordinate system.
We can do that with sf’s
st_as_sf() function. (Note: sf package functions that operate on spatial data start with
st_, which stands for “spatial” and “temporal.”)
st_as_sf() takes several arguments. In the code below, the first argument is the object to transform—my geocoded addresses. The second argument vector tells the function which columns have the x (longitude) and y (latitude) values. The third sets the coordinate reference system to 4326, so it’s the same as my ZIP code polygons.
point_geo <- st_as_sf(geocoded_addresses,
coords = c(x = "lon", y = "lat"),
crs = 4326)
Geospatial joins with sf
Now that I’ve set up my two data sets, calculating the ZIP code for each address is easy with sf’s
st_join() function. The syntax:
st_join(point_sf_object, polygon_sf_object, join = join_type)
In this example, I want to run
st_join() on the geocoded points first and the ZIP code polygons second. It’s a so-called left join format: All points in the first data (geocoded addresses) are included, but only points in the second (ZIP code) data that match. Finally, my join type is
st_within, since I want the match to be points within.
my_results <- st_join(point_geo, zipcode_geo,
join = st_within)
That’s it! Now if I look at my results by printing out several of the most important columns, you”ll see each address has a ZIP code (in the “name” column).
print(my_results[,c("query", "name", "geometry")])
# Simple feature collection with 2 features and 2 fields # geometry type: POINT # dimension: XY # bbox: xmin: -71.39105 ymin: 42.31348 xmax: -71.03673 ymax: 42.34806 # epsg (SRID): 4326 # proj4string: +proj=longlat +datum=WGS84 +no_defs # query name geometry # 1 492 Old Connecticut Path, Framingham, MA 01701 POINT (-71.39105 42.31348) # 2 250 Northern Ave., Boston, MA 02210 POINT (-71.03673 42.34806)
Mapping points and polygons with tmap
If you’d like to map the points and polygons, here’s one way to do it with tmap:
tm_bubbles(col = "red", size = 0.25)
Want more R tips? Head to the