Covid-19 Data Visualization with R


Data Source

UPDATE (04-02-2020): The JHU dataset changed its name and format. The data for the USA and for the rest of the world are now kept in separate files, and the US data adopted a new format. Some amateur data scientists are in charge of this project.

The data source used for this analysis is the 2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository built by the Center for Systems Science and Engineering, Johns Hopkins University (GitHub: CSSEGISandData/COVID-19, "Novel Coronavirus (COVID-19) Cases, provided by JHU CSSE"). The following applies to the global dataset.

Field description:

  1. Province/State: China – province name; US/Canada/Australia – city name, state/province name; Others – name of the event (e.g., “Diamond Princess” cruise ship); other countries – blank.
  2. Country/Region: country/region name conforming to WHO (will be updated).
  3. Last Update: MM/DD/YYYY HH:mm (24 hour format, in UTC).
  4. Confirmed: the number of confirmed cases. For Hubei Province: from Feb 13 (GMT +8), we report both clinically diagnosed and lab-confirmed cases. For lab-confirmed cases only (Before Feb 17), please refer to who_covid_19_situation_reports. For Italy, diagnosis standard might be changed since Feb 27 to “slow the growth of new case numbers.” (Source)
  5. Deaths: the number of deaths.
  6. Recovered: the number of recovered cases.

    Time series summary (csse_covid_19_time_series)

    This folder contains daily time series summary tables, including confirmed, deaths and recovered. All data are from the daily case report.

    Field description:

    1. Province/State: same as above.
    2. Country/Region: same as above.
    3. Lat and Long: a coordinates reference for the user.
    4. Date fields: M/DD/YYYY (UTC), the same data as MM-DD-YYYY.csv file.
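
Because each date is a separate column, the time series tables are in "wide" format, and most plotting code wants one row per region and date instead. The following is a minimal sketch of that reshaping in base R. The data frame `wide` and its values are made up for illustration, and the date format string is an assumption based on the field description above; if the real headers use a two-digit year, `%y` would be needed instead of `%Y`.

```r
# Toy stand-in for a global time series table (values are made up).
wide <- data.frame(
  `Province/State` = c("", "Hubei"),
  `Country/Region` = c("Italy", "China"),
  Lat  = c(43, 31),
  Long = c(12, 112),
  `1/22/2020` = c(0, 10),
  `1/23/2020` = c(0, 12),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Columns 5 onward hold the daily counts; their names are the dates.
date_cols <- names(wide)[5:ncol(wide)]

# Stack the date columns: one row per (region, date) pair.
long <- data.frame(
  province = rep(wide[["Province/State"]], times = length(date_cols)),
  country  = rep(wide[["Country/Region"]], times = length(date_cols)),
  date     = as.Date(rep(date_cols, each = nrow(wide)), format = "%m/%d/%Y"),
  count    = unlist(wide[date_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

print(long)
```

The same result could be obtained with a reshaping package; base R is used here only to keep the sketch free of dependencies.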

R Packages Needed

Loading Data (UPDATED)

The data preparation code is shamelessly taken from here.

Each dataset has one row per country/region/province/state; `dim()` gives the number of rows and columns. Starting from column 5, each column corresponds to a single day. Here we draw a random sample of 10 rows and look at their first 10 columns.

UID       iso2  iso3  code3  FIPS   Admin2      Province_State  Country_Region  Lat          Long_
84022059  US    USA   840    22059  LaSalle     Louisiana       US              31.67884782  -92.15907765
84016019  US    USA   840    16019  Bonneville  Idaho           US              43.38713372  -111.6161537
84032003  US    USA   840    32003  Clark       Nevada          US              36.21458855  -115.0130241
84036117  US    USA   840    36117  Wayne       New York        US              43.15494365  -77.02976528
84026025  US    USA   840    26025  Calhoun     Michigan        US              42.24633834  -85.00493569
84037173  US    USA   840    37173  Swain       North Carolina  US              35.48665845  -83.48748932
84048065  US    USA   840    48065  Carson      Texas           US              35.40365929  -101.3542669
84049003  US    USA   840    49003  Box Elder   Utah            US              41.52106798  -113.0832816
84054089  US    USA   840    54089  Summers     West Virginia   US              37.65390597  -80.86009693
84080040  US    USA   840    80040  Out of OK   Oklahoma        US              0            0

It shows that the data was last updated on 2020-04-02.
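
The inspection steps described in this section (dimensions, a random preview, and the last update date) can be sketched as follows. This is an illustration only, not the original loading code: the inline CSV and its values are made up, it follows the global layout with four descriptive columns, and the date format string is an assumption that should be adjusted to match the real column headers.

```r
# Tiny inline CSV standing in for a downloaded time series file (made-up values).
csv_text <- "Province/State,Country/Region,Lat,Long,4/1/2020,4/2/2020
,Italy,43,12,100,120
Hubei,China,31,112,200,201
,Spain,40,-4,90,110"

df <- read.csv(text = csv_text, check.names = FALSE, stringsAsFactors = FALSE)

n_rows <- dim(df)[1]   # one row per country/region/province/state
n_cols <- dim(df)[2]   # four descriptive columns plus one column per day

# Random sample of up to 10 rows, showing at most the first 10 columns.
set.seed(1)
rows <- sample(n_rows, min(10, n_rows))
preview <- df[rows, seq_len(min(10, n_cols))]
print(preview)

# The last column header is the most recent date in the table.
last_update <- as.Date(names(df)[n_cols], format = "%m/%d/%Y")
print(last_update)
```

Passing `check.names = FALSE` keeps headers such as `4/2/2020` intact; without it, `read.csv` rewrites them into syntactically valid names and the date can no longer be parsed directly.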

USA State-by-state Map


Animation by Day

The gganimate or gapminder package almost makes this an easy job, but the dataset has city/county rows with duplicate dates, so the following rendering method is used instead.

However, the city/county data stopped updating, according to the GitHub issue "Cities and counties statics going dark" (Issue #1068, CSSEGISandData/COVID-19):

starting 3/10/20, they’re only doing the US by states. Province/State is smallest subcategory they go, and only for some countries (dependencies of some nations and provinces/states of US, China, Canada, Australia, I believe).
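
One way to deal with the duplicate dates mentioned above is to collapse the city/county rows to a single row per state and date before animating. The following is a minimal sketch in base R, not the rendering method used in this post. The data frame `cases`, its column names, and its values are hypothetical.

```r
# Hypothetical long-format data: several county rows share the same
# state and date (values made up for illustration).
cases <- data.frame(
  state  = c("New York", "New York", "New York", "Utah", "Utah"),
  county = c("Wayne", "Kings", "Wayne", "Box Elder", "Box Elder"),
  date   = as.Date(c("2020-03-09", "2020-03-09", "2020-03-10",
                     "2020-03-09", "2020-03-10")),
  confirmed = c(1, 5, 2, 0, 1),
  stringsAsFactors = FALSE
)

# Sum over counties so each (state, date) pair appears exactly once.
by_state <- aggregate(confirmed ~ state + date, data = cases, FUN = sum)
by_state <- by_state[order(by_state$state, by_state$date), ]

# Sanity check: no duplicate (state, date) pairs remain.
stopifnot(!any(duplicated(by_state[c("state", "date")])))
print(by_state)
```

With one row per state and date, each animation frame maps to exactly one value per state, which is the shape frame-by-date animation tools generally expect.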




