R is a free software environment for statistical computing and graphics. The following covers some basics of R data types.
Comparison of vector, list, matrix and data frame
| | vector | list | matrix | data frame |
|---|---|---|---|---|
| creation | c | list | matrix | data.frame |
| same type | Y | N | Y | N (each column has one type; columns may differ) |
| class | class of its elements | list | matrix | data.frame |
| name | names | names | rownames and colnames | names (for column names) |
| dimnames | – | – | dimnames(m) <- list(c("row1", "row2"), c("col1", "col2")) | same as matrix |
| arithmetic | member-wise (recycling rule) | – | combine with rbind or cbind | – |
| index | [] numeric or name | [] list slice; [[]] member (vector) | [row,] [row,column] [,col]; [n] and [[n]] retrieve a member of the deconstructed matrix | [[]] column vector; [] column slice |
Basic Data Types
Character
    x <- as.character(3.14)
    fname <- "Joe"; lname <- "Smith"
    paste(fname, lname)

    [1] "Joe Smith"
To extract a substring, we apply the substr function.
    substr("Mary has a little lamb.", start = 3, stop = 12)
Replace:
    sub("little", "big", "Mary has a little lamb.")

    [1] "Mary has a big lamb."
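One detail worth knowing about the replacement above: sub changes only the first match, while gsub changes every match. A minimal sketch using base R, with a made-up sample sentence:

```r
s <- "a little lamb, a little goat"

first_only  <- sub("little", "big", s)   # replaces the first match only
all_matches <- gsub("little", "big", s)  # replaces every match

print(first_only)
print(all_matches)
```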
Complex
A complex value in R is defined via the pure imaginary value i.
    x <- 1 + 2i
    class(x)

    [1] "complex"
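As a small illustration of working with complex values, the sketch below uses the base R functions Re, Im and Mod, and shows why a negative number is converted with as.complex before taking its square root. The variable names are only for illustration.

```r
z <- 1 + 2i

re <- Re(z)         # real part
im <- Im(z)         # imaginary part
modulus <- Mod(z)   # sqrt(1^2 + 2^2)

# sqrt() of a negative real number gives NaN with a warning;
# converting to complex first gives the imaginary root.
root <- sqrt(as.complex(-1))
print(root)
```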
Vector
Vectors can be combined via the function c. Elements are coerced into the same type if they are not already; below, the numbers become character strings.

    a <- c(1, 2, 6)
    b <- c("a", "b", "c", "d")
    combined <- c(a, b)
    combined

    [1] "1" "2" "6" "a" "b" "c" "d"
Arithmetic operations on vectors are performed member by member.
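A minimal sketch of member-wise arithmetic, with made-up sample vectors: each result member comes from the members at the same position in the operands, and a single number is applied to every member.

```r
a <- c(1, 3, 5, 7)
b <- c(1, 2, 4, 8)

total   <- a + b   # 2 5 9 15
product <- a * b   # 1 6 20 56
scaled  <- 5 * a   # 5 15 25 35

print(total)
print(product)
print(scaled)
```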
Recycling Rule
If two vectors are of unequal length, the shorter one will be recycled in order to match the longer vector.
    u <- c(10, 20, 30)
    v <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
    u + v

    [1] 11 22 33 14 25 36 17 28 39
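The example above works silently because 9 is a multiple of 3. When the longer length is not a multiple of the shorter one, R still recycles but issues a warning. The sketch below, with a made-up vector w, captures that warning with tryCatch rather than letting it print:

```r
u <- c(10, 20, 30)
w <- c(1, 2, 3, 4)   # length 4 is not a multiple of 3

# Capture the warning message instead of letting it print.
msg <- tryCatch(
  { u + w; "no warning" },
  warning = function(cond) conditionMessage(cond)
)
print(msg)

# The sum is still computed: u is recycled as 10, 20, 30, 10.
res <- suppressWarnings(u + w)
print(res)
```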
Vector Index
- We retrieve values from a vector by giving an index inside the single square bracket “[]” operator.
- Negative index: strips the member whose position has the same absolute value as the negative index.
- Out-of-range index: the result is reported as NA.
- Numeric index vector: selects several members at once, such as a[c(2,3,3)].
- Logical index vector: such as a[c(TRUE,FALSE)]; it should normally have the same length as the vector, because a shorter one is recycled.
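The indexing rules listed above can be sketched as follows. The vectors s and v are made-up samples, not from the original text.

```r
s <- c("aa", "bb", "cc", "dd", "ee")

third         <- s[3]            # "cc"
without_third <- s[-3]           # "aa" "bb" "dd" "ee"
out_of_range  <- s[10]           # NA
picked        <- s[c(2, 3, 3)]   # "bb" "cc" "cc"
by_flag       <- s[c(FALSE, TRUE, FALSE, TRUE, FALSE)]   # "bb" "dd"
recycled      <- s[c(TRUE, FALSE)]   # shorter logical index is recycled: "aa" "cc" "ee"

# Named members can be looked up by name.
v <- c(first = "Mary", last = "Smith")
by_name <- v["first"]
print(by_name)
```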
Matrix
A matrix is constructed with the matrix function. It can be combined by rows with rbind or by columns with cbind, transposed with t, and deconstructed into a vector with c.
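A minimal sketch of these matrix operations, using made-up values and illustrative row and column names:

```r
A <- matrix(c(2, 4, 3, 1, 5, 7), nrow = 2, ncol = 3, byrow = TRUE)
#      [,1] [,2] [,3]
# [1,]    2    4    3
# [2,]    1    5    7
dimnames(A) <- list(c("row1", "row2"), c("col1", "col2", "col3"))

B    <- rbind(A, c(6, 2, 0))   # append a row: 3 x 3
C    <- cbind(A, c(9, 8))      # append a column: 2 x 4
At   <- t(A)                   # transpose: 3 x 2
flat <- c(A)                   # deconstructed column by column: 2 1 4 5 3 7
elem <- A["row2", "col3"]      # member at row 2, column 3

print(A)
print(flat)
```

Note that c walks the matrix column by column, which is why the deconstructed vector starts 2, 1 rather than 2, 4.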
List
A list is a generic vector containing other objects.
We retrieve a list slice with the single square bracket “[]” operator, and a member of the list with the double square bracket “[[]]” operator.
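The difference between a slice and a member can be sketched as below. The list x and its member names are made up for illustration.

```r
x <- list(n = c(2, 3, 5), s = c("aa", "bb"), flag = TRUE)

slice  <- x["n"]     # still a list, of length 1, containing n
member <- x[["n"]]   # the numeric vector itself
same   <- x$n        # $ is shorthand for [[ ]] with a name

print(class(slice))
print(class(member))
```

A slice keeps the list wrapper, so arithmetic such as slice * 2 fails, whereas member * 2 works on the underlying vector.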
Dataframe
A data frame is used for storing data tables. It is a list of vectors of equal length.
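To make the "list of vectors of equal length" description concrete, here is a minimal sketch that builds a small data frame from three made-up vectors and inspects it with base R functions:

```r
n <- c(2, 3, 5)
s <- c("aa", "bb", "cc")
b <- c(TRUE, FALSE, TRUE)
df <- data.frame(n, s, b)

is_list     <- is.list(df)         # a data frame is a list underneath
col_lengths <- lengths(df)         # every column has 3 members
col_classes <- sapply(df, class)   # columns may differ in type

print(dim(df))                     # 3 rows, 3 columns
print(col_classes)
```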
Data Frame Column Vector
We reference a data frame column with the double square bracket “[[]]” operator.
    mtcars[[9]]
    mtcars[["am"]]   # or by its name
    mtcars$am        # or with the $ operator
    mtcars[, "am"]   # or with the single bracket and a comma
Data Frame Column Slice
We retrieve a data frame column slice with the single square bracket “[]” operator.
    mtcars[1]
    mtcars["mpg"]            # or by its column name
    mtcars[c("mpg", "hp")]   # or with an index vector
    summary(mtcars)
| mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|
| Min. :10.40 | Min. :4.000 | Min. : 71.1 | Min. : 52.0 | Min. :2.760 | Min. :1.513 | Min. :14.50 | Min. :0.0000 | Min. :0.0000 | Min. :3.000 | Min. :1.000 |
| 1st Qu.:15.43 | 1st Qu.:4.000 | 1st Qu.:120.8 | 1st Qu.: 96.5 | 1st Qu.:3.080 | 1st Qu.:2.581 | 1st Qu.:16.89 | 1st Qu.:0.0000 | 1st Qu.:0.0000 | 1st Qu.:3.000 | 1st Qu.:2.000 |
| Median :19.20 | Median :6.000 | Median :196.3 | Median :123.0 | Median :3.695 | Median :3.325 | Median :17.71 | Median :0.0000 | Median :0.0000 | Median :4.000 | Median :2.000 |
| Mean :20.09 | Mean :6.188 | Mean :230.7 | Mean :146.7 | Mean :3.597 | Mean :3.217 | Mean :17.85 | Mean :0.4375 | Mean :0.4062 | Mean :3.688 | Mean :2.812 |
| 3rd Qu.:22.80 | 3rd Qu.:8.000 | 3rd Qu.:326.0 | 3rd Qu.:180.0 | 3rd Qu.:3.920 | 3rd Qu.:3.610 | 3rd Qu.:18.90 | 3rd Qu.:1.0000 | 3rd Qu.:1.0000 | 3rd Qu.:4.000 | 3rd Qu.:4.000 |
| Max. :33.90 | Max. :8.000 | Max. :472.0 | Max. :335.0 | Max. :4.930 | Max. :5.424 | Max. :22.90 | Max. :1.0000 | Max. :1.0000 | Max. :5.000 | Max. :8.000 |
Data Frame Row Slice
We retrieve rows from a data frame with the single square bracket operator, just as we did with columns. However, in addition to an index vector of row positions, we append an extra comma character. This is important, as the extra comma signals a wildcard match for the second coordinate, the column positions.
    mtcars[c(3, 24), ]                        # with numeric indexing
    mtcars["Camaro Z28", ]                    # or with a name
    mtcars[c("Datsun 710", "Camaro Z28"), ]   # or with a name vector
    L <- mtcars$am == 0
    mtcars[L, ]                               # or with logical indexing