Inspired by and adapted from here. A dplyr tutorial is here, or see its own documentation. Another tutorial covers dplyr and tidyr. The cheatsheet is useful too; install its example datasets as described in the page's footnote: `devtools::install_github("rstudio/EDAWR")`.

## magrittr: Simplifying R code with pipes

`%>%`: this is not a pipe.

GitHub repository here; a tutorial from the author: Magrittr.

### Basic piping

`x %>% f` is equivalent to `f(x)`

`x %>% f(y)` is equivalent to `f(x, y)`

`x %>% f %>% g %>% h` is equivalent to `h(g(f(x)))`
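
A minimal sketch of the three rules above, assuming the magrittr package is installed. The vector `x` and the functions chosen (`sqrt`, `log`, `sum`, `as.character`) are arbitrary illustrations; each piped expression produces the same value as its nested counterpart.

```r
library(magrittr)

x <- c(1, 4, 16)

# x %>% f  is the same as f(x)
r1 <- x %>% sqrt                            # sqrt(x) -> 1 2 4

# x %>% f(y)  is the same as f(x, y): x becomes the first argument
r2 <- x %>% log(2)                          # log(x, 2)

# x %>% f %>% g %>% h  is the same as h(g(f(x))), read left to right
r3 <- x %>% sqrt %>% sum %>% as.character   # as.character(sum(sqrt(x))) -> "7"
```

Reading the chain left to right follows the order in which the functions are applied, which is the main readability gain over the nested form.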

### The argument placeholder

`x %>% f(y, .)` is equivalent to `f(y, x)`

`x %>% f(y, z = .)` is equivalent to `f(y, z = x)`
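
A short sketch of the placeholder, assuming magrittr is installed and using the built-in `iris` data. When `.` appears as a direct argument, the left-hand side goes only there. When `.` appears only inside a nested call, magrittr still inserts the left-hand side as the first argument, which the last line illustrates.

```r
library(magrittr)

# x %>% f(y, .)  is f(y, x): the left-hand side goes where the dot is
msg <- "world" %>% paste("hello", .)        # paste("hello", "world")

# x %>% f(y, z = .)  is f(y, z = x): the dot can fill a named argument
fit <- iris %>% lm(Sepal.Length ~ Sepal.Width, data = .)

# The dot only inside a nested call: the left-hand side is still passed
# first, so this is head(1:10, length(1:10) %/% 2)
first_half <- 1:10 %>% head(length(.) %/% 2)
```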

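### The “tee” operator

`%T>%` works exactly like `%>%`, except it returns the left-hand side value rather than the result of the right-hand side operation. This is useful when a step is called only for its side effect, such as plotting or printing, and the chain should continue with the original value.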

```r
library(magrittr)

rnorm(200) %>%
  matrix(ncol = 2) %T>%
  plot %>% # plot usually does not return anything.
  colSums
```
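
A minimal sketch contrasting the two operators on a small matrix, assuming magrittr is installed. `nrow()` stands in for any step whose return value should be discarded, and `print()` for a typical side-effect step in the middle of a chain.

```r
library(magrittr)

m <- matrix(1:6, ncol = 2)

plain <- m %>% nrow()    # ordinary pipe: the result of nrow(m), i.e. 3
teed  <- m %T>% nrow()   # tee: nrow()'s result is dropped, m itself is returned

# Side effect in the middle of a chain: print m, then continue with m
sums <- m %T>% print() %>% colSums()   # column sums 6 and 15
```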

### Pipe with exposition of variables

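```r
library(magrittr)

# Correlation computed only on rows with above-average sepal length
iris %>%
  subset(Sepal.Length > mean(Sepal.Length)) %$%
  cor(Sepal.Length, Sepal.Width)

# %$% exposes the column z to ts.plot(), which has no data argument
data.frame(z = rnorm(100)) %$%
  ts.plot(z)
```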
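
The exposition pipe `%$%` makes the columns of the left-hand side available by name on the right-hand side, much like base R's `with()`. A minimal sketch on the built-in `mtcars` data, assuming magrittr is installed; the columns `mpg` and `wt` are just an example.

```r
library(magrittr)

# Same as with(mtcars, cor(mpg, wt)) and cor(mtcars$mpg, mtcars$wt)
r <- mtcars %$% cor(mpg, wt)
```

This is mostly useful for functions like `cor()` or `ts.plot()` that take vectors rather than a `data` argument.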

### Compound assignment pipe operations

```r
library(magrittr)

# Replace Sepal.Length with its square root (creates a modified copy of iris)
iris$Sepal.Length %<>% sqrt
```

`%<>%` works exactly like `%>%`, except the pipeline assigns the result back to the left-hand side variable instead of returning it. It must be the first pipe operator in a longer chain.
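
A minimal sketch of the assignment pipe, assuming magrittr is installed; the vectors are made-up examples. The second case shows `%<>%` as the first operator of a longer chain, where the result of the whole chain is assigned back.

```r
library(magrittr)

x <- c(4, 9, 16)
x %<>% sqrt            # same as x <- sqrt(x); x is now 2 3 4

y <- c(3, 1, 2)
y %<>% sort %>% rev    # same as y <- rev(sort(y)); y is now 3 2 1
```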

## dplyr

dplyr verbs | Description | base equivalent |
---|---|---|
select() | select columns | subset() |
filter() or slice() | keep rows by condition (filter) or by position (slice) | |
distinct() | keep unique rows/observations | unique() |
arrange() | re-order rows | order() |
mutate() or transmute() | create new columns; mutate() keeps the existing columns, transmute() keeps only the new ones | transform() |
summarise() | summarise data with functions of choice | |
group_by() | enable group-wise operations in the “split-apply-combine” concept | |
sample_n() and sample_frac() | take a random sample of rows (a fixed number or a fraction) | |
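
A minimal sketch chaining several verbs from the table on the built-in `iris` data, assuming dplyr is installed (dplyr re-exports `%>%`). The filter threshold and the `ratio` column are made up purely for illustration.

```r
library(dplyr)

result <- iris %>%
  filter(Sepal.Length > 5) %>%                               # keep rows
  select(Species, Sepal.Length, Petal.Length) %>%            # keep columns
  mutate(ratio = Petal.Length / Sepal.Length) %>%            # add a column
  group_by(Species) %>%                                      # split by species
  summarise(mean_ratio = mean(ratio), n = n()) %>%           # one row per group
  arrange(desc(mean_ratio))                                  # largest ratio first
```

Each verb takes a data frame as its first argument and returns a data frame, which is what makes them chain cleanly with `%>%`.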

select() can be combined with these helper functions:

ends_with() = select columns whose names end with a given string

contains() = select columns whose names contain a given string

matches() = select columns whose names match a regular expression

one_of() = select columns whose names are in a given character vector
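
A short sketch of the helpers above on the `iris` columns, assuming dplyr is installed. The strings and the regular expression are arbitrary examples; `names()` is used only to show which columns were selected.

```r
library(dplyr)

width_cols  <- iris %>% select(ends_with("Width")) %>% names()
# "Sepal.Width" "Petal.Width"
petal_cols  <- iris %>% select(contains("Petal")) %>% names()
# "Petal.Length" "Petal.Width"
length_cols <- iris %>% select(matches("\\.Length$")) %>% names()
# "Sepal.Length" "Petal.Length"
chosen_cols <- iris %>% select(one_of(c("Species", "Sepal.Length"))) %>% names()
# "Species" "Sepal.Length"
```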

## tidyr

```r
devtools::install_github("rstudio/EDAWR")
library(EDAWR)

tidyr::separate(storms, date, c("y", "m", "d"))
tidyr::gather(cases, "year", "n", 2:4)
tidyr::spread(pollution, size, amount)
```

gather: gather columns into rows (wide to long format)

spread: spread rows into columns (long to wide format)

separate: separate one column into several, i.e. split a single variable into two or more

unite: unite several columns into one, i.e. merge two or more variables into one
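
A self-contained sketch of the four functions, assuming tidyr is installed. It uses small hypothetical data frames shaped like the EDAWR examples instead of the EDAWR datasets themselves, so it runs without installing that package.

```r
library(tidyr)

# Hypothetical toy data (not from EDAWR): one count column per year
wide <- data.frame(country = c("FR", "DE"),
                   `2011` = c(7, 5), `2012` = c(8, 6),
                   check.names = FALSE)

long <- gather(wide, "year", "n", 2:3)   # columns 2:3 become key/value rows
back <- spread(long, year, n)            # and back to one column per year

# Hypothetical dates, split on the non-alphanumeric "-" and re-joined
dates  <- data.frame(date = c("2000-08-03", "2000-08-04"),
                     stringsAsFactors = FALSE)
parts  <- separate(dates, date, c("y", "m", "d"))
joined <- unite(parts, "date", y, m, d, sep = "-")
```

`gather()`/`spread()` and `separate()`/`unite()` are inverse pairs, which the round trips above make visible (up to row order after `spread()`).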

MAY

About the Author:

Beyond 8 hours - Computer, Sports, Family...