The CoRsica Package for Hockey Analysis in R (0.2: Fundamentals)

EP: This is the third part in what I hope will become a lengthy and informative tutorial series on a pseudo-package I am building for R called coRsica. In this instalment, I’ll discuss some fundamentals of the R language and apply them to our Hello World script.

Review and More
In section 0.1 you were introduced to object classes, syntax rules, functions and some basic mathematical operators. There is still much more ground to cover when it comes to these fundamental concepts, so let’s do it right this time.

You should have saved the hello.R script from the last tutorial, but in case you didn’t, and to make sure everybody is on the same page, I suggest you download the script here.

You’ll notice the syntactically incorrect code on lines 10 and 12 was not removed, but commented out. Chunks of code are sometimes turned into comments this way to save them for future use or to remind the scriptwriter of something. You don’t need to keep these two lines past this point; however, it is important that you understand why somebody might want to.

Run the script once to make sure the string “Hello, world!” is stored in the object my_string and printed out. If you’re not already using it, switch the workspace view to Grid; you’ll see that the Type given to the variable my_string is character. So far, we’ve only dealt with two object classes: numeric and character. Go ahead and assign a negative decimal to a variable x:

> x <- -87.71

Your workspace will confirm that x is indeed an object of class numeric, proving this class is not exclusively for positive integers. Integers in R actually belong to their own class. Whole numbers, however, are not given this distinction by default. To verify this, store a whole number between 1 and 10 in a new variable y:

> y <- 7

The Type shown in your workspace for the object y is the same as for the object x. This may be confusing: while the number 7 is mathematically an integer, R stores the value in y as a numeric (a double-precision number), not as an integer.

EP: I’ll introduce several new functions in this next paragraph in rapid succession. Don’t worry if you can’t remember them all or what they each do. They’re simply a means to explain the distinction between numeric and integer at this point. 

In addition to being shown in your workspace, the class of a given object can be queried using the class() function. Use class() to verify the class of my_string and y:

> class(my_string)
[1] "character"
> class(y)
[1] "numeric"

Here, the function returns a character string naming the class of the argument passed to it (fun fact: you can check that this is true by asking class(class(y))). Another way to check that y is not an integer is by using the is.integer() function:

> is.integer(y)
[1] FALSE

Note the characteristics of the value returned here. The word FALSE in upper case is not quoted, hinting this isn’t a character. Can you think of a way to check what it is?

> class(is.integer(y))
[1] "logical"

We’ve just used our first nested function, so it’s important we discuss it before proceeding. In crude terms, nesting means an element contains one or more elements of a similar type. The is.integer() function above is contained within the class() function. In these cases, R evaluates the innermost function(s) first. This means the result of is.integer(y), which we’ve seen is the value FALSE, is passed as an argument to class().
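If nesting still feels abstract, it may help to see that a nested call gives the same result as storing the inner result first and then passing it along. The sketch below does both. The names nested, inner_result and stepwise are just for illustration.

```r
y <- 7

# Nested: is.integer(y) is evaluated, and its result is passed to class()
nested <- class(is.integer(y))    # "logical"

# The same computation in two explicit steps
inner_result <- is.integer(y)     # FALSE
stepwise <- class(inner_result)   # "logical"

identical(nested, stepwise)       # TRUE

# class() itself returns a character string, so nesting it reports "character"
class(class(y))                   # "character"
```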

The logical, or boolean, class is a binary type of data used to model true-or-false logic. In R, a logical value is either TRUE or FALSE (or NA when the value is missing). Writing every letter in upper case is vital to the syntax, but R will accept T and F as shortcuts. Be aware that, unlike TRUE and FALSE, T and F are ordinary variables that can be overwritten, so the full words are safer. Store the value FALSE in three separate variables, bool1, bool2 and bool3, using three methods:

> bool1 <- is.integer(y)
> bool2 <- FALSE
> bool3 <- F
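To confirm that the three methods really produce the same value, you can compare the variables with identical(). This base R function returns TRUE only when two objects are exactly the same. Here's a quick sketch:

```r
y <- 7

bool1 <- is.integer(y)  # the result of a function call
bool2 <- FALSE          # the literal value
bool3 <- F              # the shortcut

identical(bool1, bool2)  # TRUE
identical(bool2, bool3)  # TRUE
class(bool3)             # "logical"
```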

So, we’ve established that y is not an integer. But what is an integer in R, and can we convert y into one? The answer to the second question is yes, using the as.integer() function. Use it to convert y into an integer and store the result in a new object, my_integer:

> my_integer <- as.integer(y)
> my_integer
[1] 7

If you check the value of my_integer in the workspace, however, you’ll see that it is displayed with an L suffix, as 7L. I’m not entirely sure where the L comes from, but R uses it to mark a number explicitly as an integer. You can also use the L suffix to create a new value of class integer directly, like so:

> new_integer <- 7L
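The following sketch pulls the integer discussion together by contrasting y with its integer counterparts. It uses identical(), which checks both the value and the class. The behaviour of as.integer() on a decimal is included as an extra example.

```r
y <- 7

my_integer  <- as.integer(y)  # convert a numeric to an integer
new_integer <- 7L             # create an integer directly with the L suffix

class(y)                            # "numeric"
class(my_integer)                   # "integer"
identical(my_integer, new_integer)  # TRUE: same value, same class
identical(y, new_integer)           # FALSE: same value, different class

# as.integer() drops the fractional part of a decimal, truncating toward zero
as.integer(-87.71)                  # -87
```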

There are other classes we’ll encounter later on, but for now it is sufficient to simply understand that R is equipped to handle data of various types, and that these classes can be queried and sometimes transformed.

Another concept I rushed through in the previous section is mathematical operators. Recall that we performed addition and subtraction in the console using the + and - symbols. These are operators used to form mathematical expressions, and there are several more to learn. The complete list is below:

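Operator    Description
+           addition
-           subtraction
*           multiplication
/           division
^ or **     exponentiation
x %% y      modulus (x mod y)
x %/% y     integer division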

Let’s start by giving our variables x and y more reasonable values:

> x <- 2
> y <- 5

Next, evaluate the following expressions using our new x and y variables:

> x + y
> y - x
> x * y
> x / y
> y^x
> y %% x
> y %/% x

With the exception of the last two expressions, these represent basic arithmetic. Modulus (%%), on the other hand, may be unfamiliar to many: y mod x gives the remainder obtained from dividing y by x. Integer division, performed using the %/% operator, returns the result of a division after discarding the remainder. A property of these operators is that x == (x %% y) + y * (x %/% y) unless y is equal to zero. No, that double equal sign (==) is not a typo. It’s a new type of operator: a logical operator.
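For reference, here's what the expressions above return with x = 2 and y = 5, along with a quick check of the identity relating %% and %/%. The extra pair a = -7 and b = 3 is an arbitrary choice. It's included to show that the identity still holds for negative numbers.

```r
x <- 2
y <- 5

x + y     # 7
y - x     # 3
x * y     # 10
x / y     # 0.4
y^x       # 25
y %% x    # 1  (5 = 2 * 2 + 1, so the remainder is 1)
y %/% x   # 2  (the whole part of 5 / 2 = 2.5)

# The identity holds for any non-zero divisor
x == (x %% y) + y * (x %/% y)  # TRUE (2 %% 5 is 2 and 2 %/% 5 is 0)
y == (y %% x) + x * (y %/% x)  # TRUE

# With a negative number, R's %% result takes the sign of the divisor
a <- -7
b <- 3
a %% b                         # 2
a %/% b                        # -3
a == (a %% b) + b * (a %/% b)  # TRUE
```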

Logical Expressions
Logical expressions use a set of operators to evaluate whether certain conditions are met, returning a boolean value of either TRUE or FALSE. One such operator is the == symbol, indicating equality. You might think the regular = sign should serve this purpose, or even that they can be used interchangeably. Recall, though, that = is used to assign a value to an object. In R, “exactly equal to” is represented by the double equal sign. The complete set of logical operators is as follows:

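Operator    Description
<           less than
<=          less than or equal to
>           greater than
>=          greater than or equal to
==          exactly equal to
!=          not equal to
!x          not x
x | y       x OR y
x & y       x AND y
isTRUE(x)   test if x is TRUE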

To get a feel for logical expressions, start with a simple one:

> 3 > 2     # Remember the first > is just the console prompt
[1] TRUE

R has evaluated the expression and confirmed that its condition is met. We can do the same with some other operators, using the values stored in x and y:

> # x = 2; y = 5
> x >= y
[1] FALSE
> x != y
[1] TRUE
> x <= y
[1] TRUE

The & and | operators can be used to string together multiple conditions. For instance, [Condition 1] & [Condition 2] will return TRUE if and only if both conditions are met, and FALSE otherwise. [Condition 1] | [Condition 2] will return TRUE if at least one of the conditions is met. As an exercise, create four logical expressions, two of which should return TRUE and two FALSE. At least one of your expressions should contain the & operator, and another should contain the | operator. The expressions below are examples, but writing your own is good practice.

> 7 <= 286
[1] TRUE
> class("hockey") == "numeric"
[1] FALSE
> is.integer(21L) & 65 > 8
[1] TRUE
> 14 >= 16 | class("hockey") == class(TRUE) | 2 + 2 == 5
[1] FALSE
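Here's a small sketch that spells out how & and | behave, along with the ! (not) operator. One detail often trips people up: R evaluates & before |, so adding parentheses is the clearest way to control how conditions are grouped.

```r
# & returns TRUE only when both sides are TRUE
TRUE & TRUE    # TRUE
TRUE & FALSE   # FALSE

# | returns TRUE when at least one side is TRUE
TRUE | FALSE   # TRUE
FALSE | FALSE  # FALSE

# ! negates a logical value
!TRUE          # FALSE

# & is evaluated before |
TRUE | FALSE & FALSE    # TRUE, read as TRUE | (FALSE & FALSE)
(TRUE | FALSE) & FALSE  # FALSE

# Combining comparisons on our variables
x <- 2
y <- 5
x < y & y < 10  # TRUE
x > y | y == 5  # TRUE
```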

There is no substitute for experience, so it’s worth the time to work at this until you feel truly comfortable with the concept. It can be particularly frustrating for those without a computer programming background, but it’s a vital idea to grasp for anybody hoping to progress in R.

Hello Again, World
Before moving on to new material, we can apply what we’ve learned so far to enhance our Hello World script from the last tutorial. I’ll also introduce a handy new function called paste() that can be used to combine strings. First, make sure your hello.R script resembles mine:

[Screenshot: the hello.R script as it stood at the end of section 0.1]

Let’s begin by tidying up a little. Since we’re making edits to the code, we should set the comment on line 2 to the current date. Because we also know by now that the syntax on line 8 is correct, we no longer need the comment:

[Screenshot: hello.R with the date in the line 2 comment updated and the comment on line 8 removed]

Again, this is about forming good habits. As you learn and begin to write more complex code, you will appreciate having learned to keep things clean and organized.

The next thing we’ll do is query the class of our object my_string, using class(). Store the result of that function in a new variable called my_class, making sure to write a comment describing what you’re doing:

[Screenshot: hello.R with my_class assigned the result of class(my_string), plus a descriptive comment]

Next, we’ll use the paste() function to combine the contents of both our objects. Like most R functions, paste() accepts multiple arguments. These arguments are separated by commas and provide additional details on what action you want the function to perform. In this case, each string you wish to combine is passed as a separate argument, like so:

> paste("Sidney", "Crosby")
[1] "Sidney Crosby"

Create a new object called statement, and assign it the string “Hello, world! is a character” using the paste() function. Then, replace my_string with statement in the final print() call:

[Screenshot: hello.R with statement built by paste() and passed to the final print() call]

[1] "Hello, world! is a character"

We can improve this statement, but first I need to introduce additional arguments for both paste() and print(). As you learn R, Google will be your best friend. You’ll be able to pull up documentation for R functions like paste() to get a better understanding of how they can be used, and typing ?paste in the console opens the same documentation inside R. As I mentioned earlier, most R functions accept multiple arguments. Many of these are given default values, meaning you don’t need to specify them when using the function. The documentation for paste() shows that, in addition to one or more objects to be concatenated, it accepts an argument sep with a default value of “ ” (a single space). As its name suggests, this argument supplies the string used to separate each term. The output obtained above confirms that paste() separates each term with a space unless told otherwise. If you had tried to add punctuation, the output would have resembled:

[1] "Hello, world! is a character ."

We’ll prevent this by explicitly setting sep to “”, an empty string. The new statement on line 11 should be:

statement <- paste(my_string, " is a ", my_class, ".", sep = "")
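The sketch below compares the default separator with sep = "" for this statement. It also shows paste0(), a base R shorthand for paste(..., sep = ""), and a custom separator. Neither is needed for our script; they're just there for comparison.

```r
my_string <- "Hello, world!"
my_class <- class(my_string)  # "character"

# The default sep = " " puts a space between every term, even before the "."
paste(my_string, "is a", my_class, ".")
# [1] "Hello, world! is a character ."

# With sep = "", nothing is inserted, so spaces must be written into the strings
statement <- paste(my_string, " is a ", my_class, ".", sep = "")
statement
# [1] "Hello, world! is a character."

# paste0() is equivalent to paste(..., sep = "")
paste0(my_string, " is a ", my_class, ".")

# Any string can serve as the separator
paste("Sidney", "Crosby", sep = "_")  # "Sidney_Crosby"
```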

Note that we’ve had to put spaces on either side of “is a”, since paste() no longer adds them for us. Next, we want to put the original string “Hello, world!” in quotation marks. The printed output is already surrounded by quotation marks, and quotes within quotes are ugly, so we’ll do two things. First, we’ll use the quote argument of print() to remove the quotation marks from the output. Use your friend Google to find out how to do this (my code below shows how if you get stuck). Second, we’ll add single quotes to the statement using paste(). You’ll find that R does not like """; see below for an explanation and a solution.

EP: Unclosed quotation marks or parentheses confuse R. When it sees an opening quotation mark, it treats everything that follows as text until a matching closing quotation mark ends it. This is more advanced material, but if you want R to treat a quotation mark as a literal character, you can escape it with \ (backslash). You can ask for a literal double quote with "\"", but here’s the rub: print() displays strings in their escaped form, so the backslash shows up in its output. Instead, you can use cat(), which writes the raw characters and has the added benefit of not surrounding the output with quotation marks.
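To make the escaping point concrete, here's a small sketch comparing print() and cat() on a string containing escaped double quotes. The variable name quoted is only for illustration.

```r
# The backslash escapes each double quote, so the quotes become part of the string
quoted <- "\"Hello, world!\""

# The escaped quote is stored as a single character; the backslash is not stored
nchar("\"")   # 1
nchar(quoted) # 15: 13 characters of text plus 2 quote marks

print(quoted)            # [1] "\"Hello, world!\""  (print() shows the escaped form)
cat(quoted, "\n", sep = "")  # "Hello, world!"      (cat() writes the raw characters)

# Single quotes inside a double-quoted string need no escaping at all
print("'Hello, world!'") # [1] "'Hello, world!'"
```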

[Screenshot: hello.R with single quotes added through paste() and the quote argument passed to print()]

[1] 'Hello, world!' is a character.
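Below is a sketch of one way to build the quoted statement and print it without the surrounding double quotes. Single quotes can sit inside a double-quoted string without escaping, so paste() can add them directly. This is my own sketch, and the code in the screenshot may differ in its details.

```r
my_string <- "Hello, world!"
my_class <- class(my_string)

# Wrap the original string in single quotes while building the statement
statement <- paste("'", my_string, "' is a ", my_class, ".", sep = "")

print(statement)                 # [1] "'Hello, world!' is a character."
print(statement, quote = FALSE)  # [1] 'Hello, world!' is a character.
```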

Next, to illustrate how to use logical expressions in a real albeit impractical script, I’ll create a new object called is_character and assign to it the result of the expression my_class == “character”. Then, I’ll modify the statement to print “It is TRUE that ‘Hello, world!’ is a character.”:

[Screenshot: the restructured hello.R with the new is_character object]

[1] It is TRUE that 'Hello, world!' is a character.

You’ll notice I’ve restructured my code. This is a matter of preference and you’re not obligated to follow suit. This is closer to my preferred structure, though in time you are likely to develop your own style.

The upper-case TRUE is an eyesore in our printed statement, so we’ll use a new function, tolower(), to convert it to lower-case:

tolower(is_character),  # Line 13
[1] It is true that 'Hello, world!' is a character.  # Output

Voila!
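Putting everything together, the finished script might look something like the sketch below. The layout and comments are my own assumptions rather than a copy of the screenshots. I've also left out the header comments (such as the date) from the original script.

```r
# Store the string and query its class
my_string <- "Hello, world!"
my_class <- class(my_string)             # "character"

# Logical expression: is the object a character?
is_character <- my_class == "character"  # TRUE

# Build the sentence. With sep = "", every space must be written explicitly,
# and tolower() turns the logical TRUE into the string "true"
statement <- paste("It is ", tolower(is_character), " that '", my_string,
                   "' is a ", my_class, ".", sep = "")

# quote = FALSE suppresses the surrounding double quotes
print(statement, quote = FALSE)
# [1] It is true that 'Hello, world!' is a character.
```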

In the next section, I’ll talk about the ubiquitous [1], vectors, matrices and data frames. Then, you’ll apply what you’ve learned to real hockey data.

Author: Emmanuel Perry

Creator and webmaster of corsica.hockey.
