Inspired by and adapted from here. A dplyr tutorial is here, or see its own documentation. Another tutorial covers both dplyr and tidyr. The cheatsheet is useful too; install the dataset mentioned in the page's footnote with devtools::install_github("rstudio/EDAWR").
magrittr: Simplifying R code with pipes
%>%, This is not a pipe.
GitHub here, and a tutorial from the author, Magrittr.
Basic piping:
x %>% f is equivalent to f(x)
x %>% f(y) is equivalent to f(x, y)
x %>% f %>% g %>% h is equivalent to h(g(f(x)))
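The equivalences above can be checked directly. This is a minimal sketch, assuming magrittr is installed; the vector `x` and the result names are made up for illustration.

```r
library(magrittr)

x <- c(1, 4, 9)

# x %>% f is equivalent to f(x)
piped_sqrt <- x %>% sqrt

# x %>% f(y) is equivalent to f(x, y)
piped_head <- x %>% head(2)

# x %>% f %>% g %>% h is equivalent to h(g(f(x)))
chained <- x %>% sqrt %>% sum %>% as.character
nested  <- as.character(sum(sqrt(x)))
```

The chained form reads left to right in the order the functions are applied, while the nested form has to be read inside out.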
The argument placeholder
x %>% f(y, .) is equivalent to f(y, x)
x %>% f(y, z = .) is equivalent to f(y, z = x)
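A short sketch of the placeholder, assuming magrittr is installed. The values are arbitrary; `rep()` and `paste()` are used only because their later arguments are easy to inspect.

```r
library(magrittr)

# x %>% f(y, .) is equivalent to f(y, x): the left-hand side lands in the
# position of the dot instead of becoming the first argument.
repeated <- 3 %>% rep("ab", .)            # same as rep("ab", 3)

# x %>% f(y, z = .) is equivalent to f(y, z = x): the dot also works
# for named arguments.
joined <- "-" %>% paste("a", "b", sep = .)  # same as paste("a", "b", sep = "-")
```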
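The “tee” operator
Some right-hand sides are called only for their side effect (plotting, printing), after which you may still want to continue the pipeline. %T>% can be used for this purpose: it works exactly like %>%, except that it returns the left-hand side value rather than the result of the right-hand side operation.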
    rnorm(200) %>%
      matrix(ncol = 2) %T>%
      plot %>%  # plot usually does not return anything.
      colSums
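To see the difference from %>% without random data or a plot window, here is a small deterministic sketch, assuming magrittr is installed. `cat()` stands in for the side-effect step because it prints its input and returns NULL.

```r
library(magrittr)

# With %T>%, cat() prints the vector, and the vector itself (not cat()'s
# NULL return value) is passed on to sum().
with_tee <- c(1, 2, 3) %T>% cat("\n") %>% sum

# With a plain %>%, sum() receives cat()'s return value, NULL, instead.
with_pipe <- c(1, 2, 3) %>% cat("\n") %>% sum
```

Here `with_tee` is 6, while `with_pipe` is 0 because `sum(NULL)` is 0.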
Pipe with exposition of variables
    iris %>%
      subset(Sepal.Length > mean(Sepal.Length)) %$%
      cor(Sepal.Length, Sepal.Width)

    data.frame(z = rnorm(100)) %$%
      ts.plot(z)
Compound assignment pipe operations
    iris$Sepal.Length %<>% sqrt

The %<>% operator works exactly like %>%, except that the pipeline assigns the result back to the left-hand side rather than returning it. It must be the first pipe operator in a longer chain.
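A minimal sketch of the compound assignment pipe on throwaway objects, assuming magrittr is installed. The names `x` and `df` are made up so that the built-in iris data stays unmodified; `add()` is magrittr's alias for `+`.

```r
library(magrittr)

x <- c(1, 4, 9)
x %<>% sqrt                    # same as x <- x %>% sqrt

df <- data.frame(len = c(4, 9, 16))
# %<>% comes first; the rest of the chain uses the ordinary pipe.
df$len %<>% sqrt %>% add(1)    # same as df$len <- sqrt(df$len) + 1
```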
dplyr
dplyr verbs | Description | base equivalent |
---|---|---|
select() | select columns | subset() |
filter() or slice() | filter rows by condition (filter()) or by position (slice()) | |
distinct() | unique rows/observations | unique() |
arrange() | re-order or arrange rows | order() |
mutate() or transmute() | create new columns (transmute() keeps only the new variables) | transform() |
summarise() | summarise data by functions of choice | |
group_by() | allows for group operations in the “split-apply-combine” concept | |
sample_n() and sample_frac() | take a random sample of rows | |
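Several of these verbs chained together might look like the sketch below. It assumes dplyr is installed; the data frame `df` and its columns are hypothetical values chosen so the result is easy to follow by hand.

```r
library(dplyr)

# Hypothetical toy data; the last two rows are exact duplicates.
df <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  x     = c(1, 2, 3, 4, 4),
  y     = c(10, 20, 30, 40, 40)
)

result <- df %>%
  distinct() %>%                      # drop the duplicated row
  filter(x > 1) %>%                   # keep rows where x exceeds 1
  mutate(ratio = y / x) %>%           # add a new column
  group_by(group) %>%                 # split by group ...
  summarise(n = n(),                  # ... apply summaries per group ...
            mean_ratio = mean(ratio)) %>%
  arrange(desc(n))                    # ... and order the combined result
```

Each verb takes a data frame as its first argument and returns a data frame, which is what makes the chain work with %>%.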
select() can be used with:
ends_with() = Select columns that end with a character string
contains() = Select columns that contain a character string
matches() = Select columns that match a regular expression
one_of() = Select columns whose names are in a given group of names
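The helpers listed above can be tried on the built-in iris data. This is a sketch assuming dplyr is installed; note that recent dplyr versions mark one_of() as superseded by any_of() and all_of(), although it still works.

```r
library(dplyr)

# Columns whose names end with "Width"
width_cols <- iris %>% select(ends_with("Width")) %>% names

# Columns whose names contain "etal"
petal_cols <- iris %>% select(contains("etal")) %>% names

# Columns whose names match a regular expression
sepal_cols <- iris %>% select(matches("^S.*th$")) %>% names

# Columns whose names appear in a character vector
picked_cols <- iris %>% select(one_of(c("Species", "Sepal.Length"))) %>% names
```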
tidyr
    devtools::install_github("rstudio/EDAWR")
    library(EDAWR)
    tidyr::separate(storms, date, c("y", "m", "d"))
    tidyr::gather(cases, "year", "n", 2:4)
    tidyr::spread(pollution, size, amount)
gather: Gather columns into rows
spread: Spread rows into columns
separate: Separate one column into several (splitting a single variable into two or more)
unite: Unite several columns into one (merging two or more variables into one)
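The four functions can also be tried without installing EDAWR. The sketch below assumes only tidyr is installed and uses small made-up data frames (`wide` and `dates` are hypothetical, not the EDAWR datasets). In recent tidyr versions gather() and spread() are superseded by pivot_longer() and pivot_wider(), but they remain available.

```r
library(tidyr)

# Hypothetical wide table: one column per year.
wide <- data.frame(
  country = c("FR", "DE"),
  `2011`  = c(7000, 5800),
  `2012`  = c(6900, 6000),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# gather: columns 2:3 become key/value pairs (year, n)
long <- gather(wide, "year", "n", 2:3)

# spread: the inverse, one column per distinct year again
back <- spread(long, year, n)

# separate: split one column into several on the non-alphanumeric separator
dates <- data.frame(date = c("2011-08-03", "2012-10-21"),
                    stringsAsFactors = FALSE)
parts <- separate(dates, date, c("y", "m", "d"))

# unite: merge several columns back into one
rejoined <- unite(parts, "date", y, m, d, sep = "-")
```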