Using R To Extract Census Data from Statistics Canada
I’ve noticed that Statistics Canada seems to lag other national statistical agencies in making their data available in user-friendly formats.
There's that Can't-Do-Won't-Do spirit @StatCan_eng is famous for.
— Stephen Gordon (@stephenfgordon) January 15, 2019
Is there another statistics agency whose suggestion for obtaining annual GDP data - the most-used economic indicator - is to do it yourself? https://t.co/fl7DqpvTkl
In my own experience, it has been difficult to get what was seemingly very basic information. By comparison, US census data is readily available in R packages (e.g. census, censusapi, and tidycensus).
Recently I needed to get some 2016 census data for the city of Toronto, and I decided to check in with StatsCan because I had seen some references to new developer services, including cpr2016, a service that returns census data in JSON or XML format. So that got me pretty excited.
But, the help documentation is extremely limited.
I fought with it for a day, needed some help on Stack Overflow and had to reach out to StatsCan’s help desk, but I was able to get what I needed, so I thought I would share it here.
The key is constructing a URL to pass a request for particular data to Statistics Canada. This is the help URL that is provided:
https://www12.statcan.gc.ca/rest/census-recensement/CPR2016.json?lang=E&dguid=2016A000011124&topic=1&notes=0
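Since the whole request lives in the query string, it can help to build the URL from its parts rather than editing it by hand. Here is a minimal sketch in base R. The helper name `build_cpr_url` is made up for illustration, and the defaults simply mirror the help URL above; what other values `topic` and `notes` accept is not something this sketch knows.

```r
# Hypothetical helper (not part of any StatsCan package): assemble a
# CPR2016 request URL from its query parameters.
build_cpr_url <- function(dguid, topic = 1, notes = 0, lang = "E") {
  base <- "https://www12.statcan.gc.ca/rest/census-recensement/CPR2016.json"
  paste0(base,
         "?lang=", lang,
         "&dguid=", dguid,
         "&topic=", topic,
         "&notes=", notes)
}

# Rebuild the example URL from the help page (the DGUID for Canada)
census_url <- build_cpr_url("2016A000011124")
census_url
```

Swapping in a different `dguid` is then a one-argument change once you know the code for the geography you want.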
I tried using rjson::fromJSON() to get it, but ran into this error:
#Uncomment the following line if you do not have rjson installed.
#install.packages('rjson')
library(rjson)
census_url<-'https://www12.statcan.gc.ca/rest/census-recensement/CPR2016.json?lang=E&dguid=2016A000011124&topic=1&notes=0'
fromJSON(census_url)
## Error in fromJSON(census_url): unexpected character 'h'
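So, the error message is that there’s an unexpected character ‘h’ somewhere in the file, which is unfortunate and weird. It’s unfortunate because it shows that somehow R and Statistics Canada are not playing well together right off the top. It’s weird because yesterday, it was returning an unexpected character of /. One likely explanation for the ‘h’: rjson::fromJSON() treats its first argument as the JSON text itself rather than as a location to read from, so it is probably choking on the first letter of ‘https’. Passing the URL through the file argument instead would at least fetch the response, which is presumably where yesterday’s / came from.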
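To make the difference between the two arguments concrete, here is a small offline sketch. It uses a tiny made-up JSON snippet and a temporary file rather than the real service, and `https://example.org/data.json` is just a placeholder string that is never actually fetched.

```r
library(rjson)

# JSON text passed directly as the first argument (json_str)
from_text <- fromJSON('{"COLUMNS":["A","B"],"DATA":[[1,2]]}')

# The same JSON read from a location via the `file` argument
tf <- tempfile(fileext = ".json")
writeLines('{"COLUMNS":["A","B"],"DATA":[[1,2]]}', tf)
from_file <- fromJSON(file = tf)

# A URL passed as the first argument is parsed as if it were JSON text,
# so the parser fails instead of downloading anything
url_as_text <- tryCatch(
  fromJSON("https://example.org/data.json"),
  error = function(e) conditionMessage(e)
)
url_as_text
```

The first two calls should give the same result, while the third only ever sees the characters of the URL.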
When I plunked the test URL into a JSON validator, it pinpointed two forward slashes at the beginning of the returned output as being errors. When I looked around at how JSON files should be formatted, two forward slashes at the beginning did seem to be …not right.
So, with some help at Stack Overflow, I figured out a way to remove those characters.
#Get the output with readLines
tmp <- readLines(census_url)
#Delete the two forward slashes at the start of the response
tmp <- substring(tmp, 3)
head(tmp)
## [1] "{\"COLUMNS\":[\"PROV_TERR_ID\",\"PROV_TERR_NAME_NOM\",\"GEO_UID\",\"GEO_ID\",\"GEO_NAME_NOM\",\"GEO_TYPE\",\"TOPIC_THEME\",\"TEXT_ID\",\"HIER_ID\",\"INDENT_ID\",\"TEXT_NAME_NOM\",\"NOTE_ID\",\"NOTE\",\"T_DATA_DONNEE\",\"T_SYM\",\"M_DATA_DONNEE\",\"M_SYM\",\"F_DATA_DONNEE\",\"F_SYM\"],\"DATA\":[[\"01\",\"Canada\",\"2016A000011124\",\"01\",\"Canada\",null,\"Aboriginal peoples\",24000,\"6.1.1\",0,\"Total - Aboriginal identity for the population in private households - 25% sample data\",79,null,3.4460065E7,null,1.6971575E7,null,1.7488485E7,null],[\"01\",\"Canada\",\"2016A000011124\",\"01\",\"Canada\",null,\"Aboriginal peoples\",24001,\"6.1.1.1\",1,\" Aboriginal identity\",80,null,1673785.0,null,813520.0,null,860265.0,null],[\"01\",\"Canada\",\"2016A000011124\",\"01\",\"Canada\",null,\"Aboriginal peoples\",24002,\"6.1.1.1.1\",2,\" Single Aboriginal responses\",81,null,1629805.0,null,792970.0,null,836835.0,null],[\"01\",\"Canada\",\"2016A000011124\",\"01\",\"Canada\",null,\"Aboriginal peoples\",24003,\"6.1.1.1.1.1\",3,\" First Nations (North American Indian)\",82,null,977235.0,null,471510.0,null,505725.0,null],[\"01\",\"Canada\",\"2016A000011124\",\"01\",\"Canada\",null,\"Aboriginal peoples\",24004,\"6.1.1.1.1.2\",3,\" Métis\",null,null,587545.0,null,289435.0,null,298115.0,null],[\"01\",\"Canada\",\"2016A000011124\",\"01\",\"Canada\",null,\"Aboriginal peoples\",24005,\"6.1.1.1.1.3\",3,\" Inuk (Inuit)\",null,null,65030.0,null,32030.0,null,32995.0,null],[\"01\",\"Canada\",\"2016A000011124\",\"01\",\"Canada\",null,\"Aboriginal peoples\",24006,\"6.1.1.1.2\",2,\" Multiple Aboriginal responses\",83,null,21310.0,null,10165.0,null,11145.0,null],[\"01\",\"Canada\",\"2016A000011124\",\"01\",\"Canada\",null,\"Aboriginal peoples\",24007,\"6.1.1.1.3\",2,\" Aboriginal responses not included elsewhere\",84,null,22670.0,null,10385.0,null,12290.0,null],[\"01\",\"Canada\",\"2016A000011124\",\"01\",\"Canada\",null,\"Aboriginal peoples\",24008,\"6.1.1.2\",1,\" Non-Aboriginal 
identity\",null,null,3.278628E7,null,1.615806E7,null,1.6628225E7,null],[\"01\",\"Canada\",\"2016A000011124\",\"01\",\"Canada\",null,\"Aboriginal peoples\",24009,\"6.1.2\",0,\"Total - Population by Registered or Treaty Indian status for the population in private households - 25% sample data\",85,null,3.4460065E7,null,1.697158E7,null,1.748849E7,null],[\"01\",\"Canada\",\"2016A000011124\",\"01\",\"Canada\",null,\"Aboriginal peoples\",24010,\"6.1.2.1\",1,\" Registered or Treaty Indian\",86,null,820120.0,null,395670.0,null,424445.0,null],[\"01\",\"Canada\",\"2016A000011124\",\"01\",\"Canada\",null,\"Aboriginal peoples\",24011,\"6.1.2.2\",1,\" Not a Registered or Treaty Indian\",null,null,3.3639945E7,null,1.6575905E7,null,1.706404E7,null],[\"01\",\"Canada\",\"2016A000011124\",\"01\",\"Canada\",null,\"Aboriginal peoples\",24012,\"6.1.3\",0,\"Total - Aboriginal ancestry for the population in private households - 25% sample data\",87,null,3.4460065E7,null,1.6971575E7,null,1.748849E7,null],[\"01\",\"Canada\",\"2016A000011124\",\"01\",\"Canada\",null,\"Aboriginal peoples\",24013,\"6.1.3.1\",1,\" Aboriginal ancestry (only)\",88,null,727790.0,null,356970.0,null,370815.0,null],[\"01\",\"Canada\",\"2016A000011124\",\"01\",\"Canada\",null,\"Aboriginal peoples\",24014,\"6.1.3.1.1\",2,\" Single Aboriginal ancestry (only)\",89,null,709235.0,null,347985.0,null,361245.0,null],[\"01\",\"Canada\",\"2016A000011124\",\"01\",\"Canada\",null,\"Aboriginal peoples\",24015,\"6.1.3.1.1.1\",3,\" First Nations (North American Indian) single ancestry\",82,null,573215.0,null,280040.0,null,293180.0,null],[\"01\",\"Canada\",\"2016A000011124\",\"01\",\"Canada\",null,\"Aboriginal peoples\",24016,\"6.1.3.1.1.2\",3,\" Métis single ancestry\",null,null,91255.0,null,45740.0,null,45515.0,null],[\"01\",\"Canada\",\"2016A000011124\",\"01\",\"Canada\",null,\"Aboriginal peoples\",24017,\"6.1.3.1.1.3\",3,\" Inuit single 
ancestry\",null,null,44765.0,null,22210.0,null,22555.0,null],[\"01\",\"Canada\",\"2016A000011124\",\"01\",\"Canada\",null,\"Aboriginal peoples\",24018,\"6.1.3.1.2\",2,\" Multiple Aboriginal ancestries (only)\",90,null,18555.0,null,8985.0,null,9565.0,null],[\"01\",\"Canada\",\"2016A000011124\",\"01\",\"Canada\",null,\"Aboriginal peoples\",24019,\"6.1.3.1.2.1\",3,\" First Nations (North American Indian) and Métis ancestries\",null,null,15140.0,null,7305.0,null,7835.0,null],[\"01\",\"Canada\",\"2016A000011124\",\"01\",\"Canada\",null,\"Aboriginal peoples\",24020,\"6.1.3.1.2.2\",3,\" First Nations (North American Indian) and Inuit ancestries\",null,null,2470.0,null,1205.0,null,1265.0,null],[\"01\",\"Canada\",\"2016A000011124\",\"01\",\"Canada\",null,\"Aboriginal peoples\",24021,\"6.1.3.1.2.3\",3,\" Métis and Inuit ancestries\",null,null,770.0,null,390.0,null,380.0,null],[\"01\",\"Canada\",\"2016A000011124\",\"01\",\"Canada\",null,\"Aboriginal peoples\",24022,\"6.1.3.1.2.4\",3,\" First Nations (North American Indian), Métis and Inuit ancestries\",null,null,170.0,null,85.0,null,85.0,null],[\"01\",\"Canada\",\"2016A000011124\",\"01\",\"Canada\",null,\"Aboriginal peoples\",24023,\"6.1.3.2\",1,\" Aboriginal and non-Aboriginal ancestries\",91,null,1402735.0,null,669710.0,null,733025.0,null],[\"01\",\"Canada\",\"2016A000011124\",\"01\",\"Canada\",null,\"Aboriginal peoples\",24024,\"6.1.3.2.1\",2,\" Single Aboriginal and non-Aboriginal ancestries\",92,null,1347610.0,null,644210.0,null,703400.0,null],[\"01\",\"Canada\",\"2016A000011124\",\"01\",\"Canada\",null,\"Aboriginal peoples\",24025,\"6.1.3.2.1.1\",3,\" First Nations (North American Indian) and non-Aboriginal ancestries\",null,null,881455.0,null,418215.0,null,463235.0,null],[\"01\",\"Canada\",\"2016A000011124\",\"01\",\"Canada\",null,\"Aboriginal peoples\",24026,\"6.1.3.2.1.2\",3,\" Métis and non-Aboriginal 
ancestries\",null,null,441000.0,null,213940.0,null,227065.0,null],[\"01\",\"Canada\",\"2016A000011124\",\"01\",\"Canada\",null,\"Aboriginal peoples\",24027,\"6.1.3.2.1.3\",3,\" Inuit and non-Aboriginal ancestries\",null,null,25155.0,null,12045.0,null,13105.0,null],[\"01\",\"Canada\",\"2016A000011124\",\"01\",\"Canada\",null,\"Aboriginal peoples\",24028,\"6.1.3.2.2\",2,\" Multiple Aboriginal and non-Aboriginal ancestries\",93,null,55125.0,null,25505.0,null,29625.0,null],[\"01\",\"Canada\",\"2016A000011124\",\"01\",\"Canada\",null,\"Aboriginal peoples\",24029,\"6.1.3.2.2.1\",3,\" First Nations (North American Indian), Métis and non-Aboriginal ancestries\",null,null,49325.0,null,22740.0,null,26585.0,null],[\"01\",\"Canada\",\"2016A000011124\",\"01\",\"Canada\",null,\"Aboriginal peoples\",24030,\"6.1.3.2.2.2\",3,\" First Nations (North American Indian), Inuit and non-Aboriginal ancestries\",null,null,3470.0,null,1655.0,null,1815.0,null],[\"01\",\"Canada\",\"2016A000011124\",\"01\",\"Canada\",null,\"Aboriginal peoples\",24031,\"6.1.3.2.2.3\",3,\" Métis, Inuit and non-Aboriginal ancestries\",null,null,2005.0,null,940.0,null,1065.0,null],[\"01\",\"Canada\",\"2016A000011124\",\"01\",\"Canada\",null,\"Aboriginal peoples\",24032,\"6.1.3.2.2.4\",3,\" First Nations (North American Indian), Métis, Inuit and non-Aboriginal ancestries\",null,null,325.0,null,170.0,null,160.0,null],[\"01\",\"Canada\",\"2016A000011124\",\"01\",\"Canada\",null,\"Aboriginal peoples\",24033,\"6.1.3.3\",1,\" Non-Aboriginal ancestry (only)\",94,null,3.2329545E7,null,1.59449E7,null,1.6384645E7,null]]}"
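One caveat with dropping the first two characters unconditionally: if the service ever stops sending the slashes, or the response comes back as more than one line, `substring(tmp, 3)` would eat real content. A slightly more defensive sketch, using a made-up helper name and tiny stand-in responses instead of the live service, might look like this:

```r
# Hypothetical helper: collapse the response to one string and strip a
# leading "//" only when it is actually there.
strip_leading_slashes <- function(lines) {
  txt <- paste(lines, collapse = "")
  sub("^//", "", txt)
}

# Small made-up responses standing in for readLines(census_url)
with_prefix <- c('//{"COLUMNS":["A"],', '"DATA":[[1]]}')
without_prefix <- '{"COLUMNS":["A"],"DATA":[[1]]}'

cleaned_a <- strip_leading_slashes(with_prefix)
cleaned_b <- strip_leading_slashes(without_prefix)
cleaned_a
cleaned_b
```

Both calls should end up with the same clean JSON text, whether or not the prefix was present.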
Now, I won’t print the results here, because it looks messy, but it does work.
#Get from RJSON
out<-fromJSON(tmp)
#structure
str(out)
Now, it still looks like a pretty messy file, and I’m not going to lie, I don’t understand how to extract the data I need from this, but it does appear to be some progress.
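For what it’s worth, the raw output above suggests the response has two parts: a `COLUMNS` array of field names and a `DATA` array with one entry per row. Under that assumption, here is a rough sketch of turning the parsed list into a data frame. The helper name `rows_to_df` is made up, and `out_sketch` is a trimmed-down, hand-built stand-in for `out` (five of the columns, two of the rows), so treat it as a starting point rather than a tested recipe for the full response.

```r
# Hand-built stand-in for the parsed response: JSON nulls become NULL
out_sketch <- list(
  COLUMNS = c("GEO_NAME_NOM", "TEXT_ID", "TEXT_NAME_NOM", "NOTE", "T_DATA_DONNEE"),
  DATA = list(
    list("Canada", 24001, "Aboriginal identity", NULL, 1673785),
    list("Canada", 24003, "First Nations (North American Indian)", NULL, 977235)
  )
)

# Hypothetical helper: one data frame row per DATA entry
rows_to_df <- function(parsed) {
  rows <- lapply(parsed$DATA, function(row) {
    # Swap NULL for NA so every row keeps the same number of fields
    row <- lapply(row, function(x) if (is.null(x)) NA else x)
    as.data.frame(setNames(row, parsed$COLUMNS), stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

census_df <- rows_to_df(out_sketch)
census_df[, c("TEXT_NAME_NOM", "T_DATA_DONNEE")]
```

If the real `out` has the same shape, `rows_to_df(out)` would be the equivalent call, though binding row by row can be slow for large responses.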
The trick is finding out what the geographic codes are to get the data you need.
According to the help page, you can search for geocodes using another URL:
https://www12.statcan.gc.ca/rest/censusapp/CensusGeoService.json?lang={lang}&geolevel={geolevel}&notes={notes}
But, like I said, the documentation is extremely confusing about how to actually use that service, and I had to turn to StatsCan’s help desk to get Toronto’s URL.
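The curly-brace bits in that URL are placeholders that need to be filled in before the request will work. A minimal sketch of doing that in base R follows. The helper `fill_template` is made up, and the value `"CSD"` for `geolevel` is an assumption borrowed from the `Geo1=CSD` parameter in the Toronto profile URL elsewhere in this post, not something confirmed against the service documentation.

```r
# Hypothetical helper: replace each {name} placeholder with its value
fill_template <- function(template, values) {
  for (name in names(values)) {
    template <- gsub(paste0("{", name, "}"),
                     as.character(values[[name]]),
                     template, fixed = TRUE)
  }
  template
}

geo_template <- "https://www12.statcan.gc.ca/rest/censusapp/CensusGeoService.json?lang={lang}&geolevel={geolevel}&notes={notes}"

# geolevel = "CSD" is an assumption, see the note above
geo_url <- fill_template(geo_template,
                         list(lang = "E", geolevel = "CSD", notes = 0))
geo_url
```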
For future reference, this is the URL for the city of Toronto 2016 census profile.
https://www12.statcan.gc.ca/census-recensement/2016/dp-pd/prof/details/page.cfm?Lang=E&Geo1=CSD&Code1=3520005&Geo2=CD&Code2=3520&Data=Count&SearchText=3520005&SearchType=Begins&SearchPR=01&B1=All&TABID=3
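The geographic codes are sitting right in that URL’s query string. As a small sketch, here is one way to pull them out in base R; it only splits the URL text and does not contact StatsCan. Reading `Geo1=CSD` as census subdivision and `Geo2=CD` as census division is my interpretation of the abbreviations.

```r
profile_url <- "https://www12.statcan.gc.ca/census-recensement/2016/dp-pd/prof/details/page.cfm?Lang=E&Geo1=CSD&Code1=3520005&Geo2=CD&Code2=3520&Data=Count&SearchText=3520005&SearchType=Begins&SearchPR=01&B1=All&TABID=3"

# Keep everything after the "?" and split it into key=value pairs
query <- sub("^[^?]*\\?", "", profile_url)
pairs <- strsplit(strsplit(query, "&", fixed = TRUE)[[1]], "=", fixed = TRUE)

# Named character vector: names are the keys, elements are the values
params <- setNames(
  vapply(pairs, function(p) p[2], character(1)),
  vapply(pairs, function(p) p[1], character(1))
)

params[c("Geo1", "Code1", "Geo2", "Code2")]
```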