NCDC introduction

Scott Chamberlain

2019-09-26

rnoaa is an R wrapper for many NOAA data types, including data from the National Climatic Data Center (NCDC).

Load rnoaa

library('rnoaa')
library('plyr')

Get info on a station by specifying a datasetid, locationid, and stationid

ncdc_stations(datasetid='GHCND', locationid='FIPS:12017', stationid='GHCND:USC00084289')

Search for data and get a data.frame

out <- ncdc(datasetid='NORMAL_DLY', datatypeid='dly-tmax-normal', startdate = '2010-05-01', enddate = '2010-05-10')
out$data

Note that the value column has strangely large numbers for temperature measurements. By convention, rnoaa doesn’t convert the values returned by the APIs, and some APIs use seemingly odd units.

You have two options here:

  1. Use the add_units parameter on ncdc() to have rnoaa attempt to look up the units. This is a good idea to try first.

  2. Consult the documentation for whichever dataset you’re accessing. For example, GHCND has a README which indicates that TMAX is measured in tenths of degrees Celsius.
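Once you know the units, the conversion itself is simple arithmetic. The sketch below applies it to a small hand-made data frame that stands in for out$data. The values are invented for illustration, and the column names date and value are assumed to match what ncdc() returns, so check them against your own output.

```r
# Hypothetical stand-in for out$data; the values are made up for illustration.
tmax <- data.frame(
  date  = as.Date(c("2010-05-01", "2010-05-02", "2010-05-03")),
  value = c(250, 261, 244)  # assumed to be reported in tenths of a degree
)

# Divide by 10 to turn tenths of a degree into whole degrees.
tmax$value_degrees <- tmax$value / 10
tmax
```

Keeping the raw value column next to the converted one makes it easy to double check the conversion against the dataset documentation.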

See a data.frame with units

As mentioned above, you can use the add_units parameter with ncdc() to ask rnoaa to attempt to look up units for whatever data you ask it to return. Let’s ask rnoaa to add units to some precipitation (PRCP) data:
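A call along the following lines would do that. This is a sketch, not necessarily the exact call behind the output discussed next: the station and date range are borrowed from the combine example later in this document, and the guard assumes your NCDC token is stored in the NOAA_KEY environment variable or the noaakey option.

```r
# Only query the API when rnoaa is installed and a token appears to be set.
# Assumption: the token lives in NOAA_KEY or in the 'noaakey' option.
has_token <- nzchar(Sys.getenv("NOAA_KEY")) || !is.null(getOption("noaakey"))
can_query <- requireNamespace("rnoaa", quietly = TRUE) && has_token

if (can_query) {
  with_units <- rnoaa::ncdc(
    datasetid = "GHCND", stationid = "GHCND:USW00014895", datatypeid = "PRCP",
    startdate = "2010-03-01", enddate = "2010-05-31", limit = 500,
    add_units = TRUE  # ask rnoaa to attempt a units lookup
  )
  print(head(with_units$data))
}
```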

From the above output, we can see that the units for PRCP values are “mm_tenths” which means tenths of a millimeter. You won’t always be so lucky and sometimes you will have to look up the documentation on your own.

Plot data, super simple, but it’s a start

out <- ncdc(datasetid='NORMAL_DLY', stationid='GHCND:USW00014895', datatypeid='dly-tmax-normal', startdate = '2010-01-01', enddate = '2010-12-10', limit = 300)
ncdc_plot(out)

Note that ncdc_plot() plots the values as returned by the API, without any unit conversion. PRCP values, for instance, are in units of tenths of a millimeter, as we found out above.

More on plotting

Example 1

Search for data first, then plot

Default plot

Create 14 day breaks

One month breaks
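The three steps of Example 1 might be sketched as follows. The query is an assumption (it reuses the station and PRCP datatype from the combine example below, with an illustrative date range), and the guard assumes your token is in the NOAA_KEY environment variable or the noaakey option. The breaks and dateformat arguments are passed to ncdc_plot(); see ?ncdc_plot for the accepted values.

```r
# Only query the API when rnoaa is installed and a token appears to be set.
has_token <- nzchar(Sys.getenv("NOAA_KEY")) || !is.null(getOption("noaakey"))
can_query <- requireNamespace("rnoaa", quietly = TRUE) && has_token

if (can_query) {
  # Search for data first (illustrative station and dates)
  out <- rnoaa::ncdc(
    datasetid = "GHCND", stationid = "GHCND:USW00014895", datatypeid = "PRCP",
    startdate = "2010-05-01", enddate = "2010-10-31", limit = 500
  )

  # Default plot
  print(rnoaa::ncdc_plot(out))

  # Create 14 day breaks on the date axis
  print(rnoaa::ncdc_plot(out, breaks = "14 days"))

  # One month breaks, labelled as day/month
  print(rnoaa::ncdc_plot(out, breaks = "1 month", dateformat = "%d/%m"))
}
```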

Example 2

Search for data

Make a plot, with 6 hour breaks, and date format with only hour
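Example 2 might look like the sketch below. The three-day date range and the station are assumptions chosen so that 6 hour breaks are readable, and the guard again assumes your token is in the NOAA_KEY environment variable or the noaakey option.

```r
# Only query the API when rnoaa is installed and a token appears to be set.
has_token <- nzchar(Sys.getenv("NOAA_KEY")) || !is.null(getOption("noaakey"))
can_query <- requireNamespace("rnoaa", quietly = TRUE) && has_token

if (can_query) {
  # Search for data over a short window (illustrative station and dates)
  out_short <- rnoaa::ncdc(
    datasetid = "GHCND", stationid = "GHCND:USW00014895", datatypeid = "PRCP",
    startdate = "2010-05-01", enddate = "2010-05-03", limit = 100
  )

  # 6 hour breaks, with axis labels showing only the hour
  print(rnoaa::ncdc_plot(out_short, breaks = "6 hours", dateformat = "%H"))
}
```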

Combine many calls to the ncdc() function

Search for two sets of data

out1 <- ncdc(datasetid='GHCND', stationid='GHCND:USW00014895', datatypeid='PRCP', startdate = '2010-03-01', enddate = '2010-05-31', limit=500)

out2 <- ncdc(datasetid='GHCND', stationid='GHCND:USW00014895', datatypeid='PRCP', startdate = '2010-09-01', enddate = '2010-10-31', limit=500)

Then combine with a call to ncdc_combine

df <- ncdc_combine(out1, out2)
head(df[[1]]); tail(df[[1]])

Then plot. By default, passing in the combined object plots all of the data together. In this case it looks kind of weird, since a straight line connects two distant dates.

ncdc_plot(df)

But we can pass in each separately, which uses facet_wrap in ggplot2 to plot each set of data in its own panel.

ncdc_plot(out1, out2, breaks="45 days")