Writing Functions
A function is a self-contained chunk of code which performs a specified task. Think of functions as "mini-scripts" that are written separately from your main script.
Well-written code uses lots of functions. These typically include:
- functions from base R,
- functions from packages you have installed, and
- functions you have written yourself.
It's hard to do anything in R without using some of the built-in functions, but have you written your own functions? If not, it's time to start.
Below we spend some time outlining the two main types of function, why to use functions, and how they are constructed.
To illustrate our examples, we will use a sample data set containing a series of different measurements from replicated algal samples. You can read the data into R directly from the web:
library(tidyverse)
algae <- read_csv("Algal_traits.csv")
(or if you like, first download the data set, Algal_traits.csv). Taking a look, we see a bunch of variables like height, weight etc.
## # A tibble: 60 × 8
##    Location Type      Species height length dryweight wetweight strength
##    <chr>    <chr>     <chr>    <dbl>  <dbl>     <dbl>     <dbl> <chr>
##  1 w1       red.algae a       0.395   2.16     0.956       2.46 2.993355157
##  2 w1       red.algae a       0.0189  1.98     0.0655      1.96 2.756726384
##  3 w1       red.algae a       0.698   4.72     0.200       2.24 2.252360245
##  4 w1       red.algae a       0.139   2.00     0.467       1.53 2.310011661
##  5 w1       red.algae a       0.377   4.41     0.978       2.10 2.334597887
##  6 w2       red.algae a       0.0767  0.572    0.100       1.61 <NA>
##  7 w2       red.algae a       0.933   0.839    0.564       1.75 2.472866529
##  8 w2       red.algae a       0.0617  4.62     0.252       1.72 2.635546813
##  9 w2       red.algae a       0.991   4.08     0.254       1.71 2.521458416
## 10 w2       red.algae a       0.314   2.13     0.125       2.14 2.580392619
## # … with 50 more rows

Types of function
Broadly, there are two main types of function:
First are functions that do something and return an object. These functions take some specified inputs, do some manipulations/operations, then return an object back to you. Examples include mean() (takes the mean of a vector), lm() (fits a linear model), or read.csv() (loads a table of data).
Second are functions that have some external effect on your computer or working environment. These functions do something but don't return any objects. Examples include write.csv() (writes a file to disk), plot() (makes a plot), and library() (loads a package).
For the first type, you'll often save the output in a variable and manipulate it further. For example, let's say we want to calculate the average of the variable height of the samples in the algae data. We can use the function mean():
mean_height <- mean(algae$height)
This code takes the mean of algae$height and stores it in the variable mean_height. We can query the answer by running the variable name:
mean_height
## [1] 0.4590399
We can also run the function without assigning the output to a variable. The output is still returned, this time to the console, where it is printed and then lost.
mean(algae$height)
## [1] 0.4590399
By contrast, output from the second type of function does not need to be assigned to a variable. Moreover, the function doesn't print anything to screen either. E.g.
write.csv(algae, "data.csv")

Why use functions?

So why is it so useful to separate your script into many separate, but cooperating, functions? Why not write one large, long script? There are multiple ways in which writing functions can improve your coding.
Code with functions is easier to read
Writing functions is a good way of organising your analytical methods into self-contained chunks. Generally, code written in this way is much easier to read.
Consider some of the functions that you have already used within R. For example, mean().
This function is already predefined within the R base package, meaning that you didn't have to tell the computer how to compute the mean, and because that programming task has already been done, you can simply use the function in your own script. Imagine if every time you needed a mean you had to write the following:
sum(x) / length(x)
Even this line of script uses two functions: the sum function and the length function. If these weren't available, you would need to write out the full method every time you needed a mean:
(x[1] + x[2] + x[3] + x[4] + x[5]) / 5
Instead, we simply use mean without giving it a second thought.
Importantly, it is much easier to tell what mean(x) is doing than the line above. Reading the code, you know exactly what is happening. Using the full formula, it would be less obvious what was happening every time you wanted to calculate the mean of a different variable.
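The equivalence is easy to check directly. Below is a minimal sketch using a made-up vector x (not the algae data):

```r
# A made-up numeric vector, for illustration only
x <- c(2, 4, 6, 8, 10)

# The built-in function and the written-out method agree
mean(x)            # 6
sum(x) / length(x) # 6
```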
Which raises an important point: functions should have a clear and informative name that tells you what the function does.
Functions quickly increase the ease with which you can read and interpret code.
It is not obvious what the code sqrt(var(algae$height) / length(algae$height)) is doing, whereas it is immediately obvious what the code standard_error(x) is doing.
Organise your workflow
Building on the idea of making code easier to read, functions can help organise your whole workflow and make it easier to follow. Often people have a big, long analysis script, which is difficult to interpret. When you use functions, your analysis script might end up looking much simpler:
data <- read_csv("Algal_traits.csv")
stats_species <- fit_model_species(data)
stats_spatial <- fit_model_spatial(data)
make_plot_species(stats_species)
make_plot_spatial(stats_spatial)
save_output(stats_species)
Here all the functions like fit_model_species() are ones that you've written yourself.
Wow, how much easier is that to engage with than some long script with hundreds of lines?
Reuse code (a.k.a. "Don't repeat yourself")

Not only is using the mean function more informative (it's easier to tell what your line of code is doing), it's also reusable. Once a function is defined, it can be used over and over again, not only within the same script but within other scripts too.
To further highlight this, we will go through an example of writing our own function to calculate the standard error of a bunch of variables. R has built-in functions for the mean of a vector (mean(x)) and standard deviation (sd(x)) but not the standard error. To calculate the standard error,
\[SE_\bar{x} = \sqrt{\frac{var}{n}}\]
we need the variance and the sample size, n. These are relatively easy to calculate using other base functions in R: var will calculate the variance, and length gives the length of the vector and thus the sample size (n).
Let's say we first wanted the standard error of height. This is given by
sqrt(var(algae$height) / length(algae$height))
## [1] 0.04067788
Imagine now that you wanted to calculate the same statistic on a different variable (e.g., dry weight). When faced with wanting to use this piece of code twice, we may be tempted to just copy-and-paste it to a new place, thus having two copies of the above snippet in our code. However, a much more elegant (and beneficial in the long term) approach is to turn it into a function and call that function twice.
If we first define a function for standard error:
standard_error <- function(x) {
  sqrt(var(x) / length(x))
}
we can simply use standard_error like we would any other function.
standard_error(algae$height)
## [1] 0.04067788
standard_error(algae$dryweight)
## [1] 0.02190001

Reduce risk of errors
Wrapping code into functions reduces the chance of making inadvertent errors. Such errors may not cause your code to crash, but may cause the results to be wrong. These types of mistakes are the hardest to find and can render our results meaningless.
There are at least two ways functions reduce the chance of errors.
First, copy and paste leads to errors. Without a function, you may copy and paste code all over the place. For example, if I wanted to calculate the standard error of a bunch of variables (without using our new standard_error function):
sqrt(var(algae$height) / length(algae$height))
## [1] 0.04067788
sqrt(var(algae$dryweight) / length(algae$dryweight))
## [1] 0.02190001
sqrt(var(algae$length) / length(algae$dryweight))
## [1] 0.1824489
Did you notice the mistake? I forgot to change the second variable on the third line! The code will run but give the wrong results. This is less likely if we write:
standard_error(algae$height)
## [1] 0.04067788
standard_error(algae$dryweight)
## [1] 0.02190001
standard_error(algae$length)
## [1] 0.1824489
Second, functions limit the scope of variables and enforce cleanup. When calculating something, it's common to create new variables. As an example, let's say we calculated the standard error as follows:
var_x <- var(algae$height)
n <- length(algae$height)
sqrt(var_x / n)
## [1] 0.04067788
Note you now have two new objects in your environment: var_x and n:
var_x
## [1] 0.0992814
n
## [1] 60
You can get rid of them by running:
rm(var_x, n)
(the function rm() "removes", i.e. deletes, objects from the environment).
But what if you forget? There's a real danger that later you accidentally reuse the variable n or var_x, thinking they're something that they're not. And if they have non-specific names like n, the risk of this happening is high.
If instead you put the code above into a function, as follows, this danger disappears.
standard_error <- function(x) {
  var_x <- var(x)
  n <- length(x)
  sqrt(var_x / n)
}
When you run:
standard_error(algae$height)
## [1] 0.04067788
The result is returned, but the variables var_x and n are nowhere to be seen. That's because they were automatically cleaned up when the function exited.
Any variables created inside a function are automatically cleaned up when the function ends. So using functions leaves us with a nice clean workspace. Moreover, the environment within the function is much safer than the global environment, because we're less likely to grab random variables from elsewhere.
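You can verify this cleanup yourself with the base function exists(), which reports whether a name is defined in an environment. A small sketch, using a toy vector rather than the algae data:

```r
standard_error <- function(x) {
  var_x <- var(x)    # created inside the function
  n <- length(x)     # also created inside the function
  sqrt(var_x / n)
}

standard_error(c(1, 2, 3, 4, 5))
## [1] 0.7071068

# var_x and n existed only while the function was running:
exists("var_x", inherits = FALSE) # FALSE
exists("n", inherits = FALSE)     # FALSE
```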
Help your brain to solve big problems

The best way to solve a large, complex problem is to split it into a series of smaller problems. It's well known that our brains cannot cope with more than about 5-10 bits of information at any one time.
Writing functions allows us to identify a series of smaller problems and solve these one by one, using all of our cognitive ability.
When I look at the function standard_error as defined above, I can think about the operations being performed (addition, division, square root) in isolation from the broader problem I'm solving (studying algae).
As a general rule, a good function does one thing well. If that one thing is complicated, the function may be made up of a bunch of smaller functions (i.e. steps), each doing one thing well.
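As a sketch of this idea, a larger task can be built from single-purpose helpers. The function names below (describe, describe_mean, describe_se) are our own inventions for illustration, not part of the tutorial's data analysis:

```r
# Each helper does one thing well
standard_error <- function(x) sqrt(var(x) / length(x))
describe_mean  <- function(x) paste("mean:", round(mean(x), 3))
describe_se    <- function(x) paste("se:", round(standard_error(x), 3))

# The larger function simply combines the small, well-named steps
describe <- function(x) c(describe_mean(x), describe_se(x))

describe(c(1, 2, 3, 4, 5))
## [1] "mean: 3"   "se: 0.707"
```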
Writing your own functions
Now let's look more closely at the mechanics of writing a function.
The syntax of a function
A role definition has the post-obit form:
function_name <- function(arg1, arg2, ...) { statements # do useful stuff object # render something } function_name: The role's proper noun. Can be any valid text without a space, but you should avert using names that are used elsewhere in R. Check to see if your proper noun is already used every bit a keyword by asking for the assist page ?function_name (no 100% guarantee, simply a good check). Also, aim for names that describe what the function does. A long name like calculate_standard_error is much improve than something brusque and unintuitive like f.
arg1, arg2, …: The arguments of the function. You lot can write a office with any number of arguments, with those beingness any R objects (numeric, strings, characters, data.frames, matrices, other functions).
function body: The lawmaking between the {} is the function body and run every time the function is chosen. This is the code that is doing all the useful stuff and is chosen the office body.
render value: The last line of lawmaking is the object to be returned. Some times you lot'll see people write return(object), though information technology'south plenty to write object.
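To see that the two styles are equivalent, here is a minimal sketch (double_implicit and double_explicit are made-up names):

```r
# Implicit return: the last evaluated expression is returned
double_implicit <- function(x) {
  x * 2
}

# Explicit return(): same behaviour, just more verbose
double_explicit <- function(x) {
  return(x * 2)
}

double_implicit(3) # 6
double_explicit(3) # 6
```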
Using this format, a function to calculate the standard error of the values in the object x would be:
standard_error <- function(x) {
  sqrt(var(x) / length(x))
}
To be able to use the function, you need to run that code in your console. Once defined, we can call the function like we would any other function.
standard_error(algae$height)
## [1] 0.04067788

Default arguments
Let's take a closer look at the function mean. Typing ?mean into the console brings up the relevant help page. Note the structure:
mean(x, trim = 0, na.rm = FALSE, ...)
The first argument x is our vector of numbers. To use the function we need to specify something for x, e.g.
mean(x = algae$height)
or just
mean(algae$height)
The first version makes it explicit that the values in algae$height outside of the function are passed to the variable x within the function. The second version does the same thing, but less explicitly. It works because R maps the first unnamed argument in our function call onto the first unnamed argument in the function definition. So the following will also work:
mean(na.rm = TRUE, x = algae$height)
mean(na.rm = TRUE, algae$height)
But what are those other arguments in the function definition: trim and na.rm? These are optional arguments, with default values set as specified. The function needs a value to run, but unless you specify it, it will use the default.
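These matching rules apply equally to functions you write yourself. A small sketch with a made-up function divide():

```r
divide <- function(numerator, denominator) {
  numerator / denominator
}

divide(10, 2)                           # matched by position: 5
divide(denominator = 2, numerator = 10) # matched by name: 5
divide(denominator = 2, 10)             # 10 fills the first unmatched argument: 5
```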
Try running the mean() function on the strength variable.
mean(algae$strength)
## Warning in mean.default(algae$strength): argument is not numeric or logical:
## returning NA
## [1] NA
Notice we get NA. This is because, by default, the function doesn't know how to deal with missing values (NA is a missing value) and there is one in that column of the data. How you deal with missing values is highly dependent on what you are trying to calculate (see the help module on importing data), but in this case we're happy to remove NAs before calculating the mean. This can be accomplished by setting the argument na.rm to TRUE:
mean(algae$strength, na.rm = TRUE)
## Warning in mean.default(algae$strength, na.rm = TRUE): argument is not numeric
## or logical: returning NA
## [1] NA
The functions mean, var, sd and sum all behave similarly. Without specifying the argument, these functions all use their default value, which in this case is na.rm = FALSE. So these give the same result:
mean(algae$strength)
## Warning in mean.default(algae$strength): argument is not numeric or logical:
## returning NA
## [1] NA
mean(algae$strength, na.rm = FALSE)
## Warning in mean.default(algae$strength, na.rm = FALSE): argument is not numeric
## or logical: returning NA
## [1] NA
But we can override this if that's what we want:
mean(algae$strength, na.rm = TRUE)
## Warning in mean.default(algae$strength, na.rm = TRUE): argument is not numeric
## or logical: returning NA
## [1] NA
You'll notice that many functions have arguments with default values set.
Going back to our new function standard_error, let's add a new argument na.rm so that it behaves like mean and the other functions listed above:
standard_error <- function(x, na.rm = FALSE) {
  sqrt(var(x, na.rm = na.rm) / sum(!is.na(x)))
}
Like the other functions, we've set the default behaviour of na.rm to FALSE.
Now, let's try out our new function on the strength variable with missing data, alternating na.rm = TRUE and na.rm = FALSE.
standard_error(algae$strength)
## Warning in var(x, na.rm = na.rm): NAs introduced by coercion
## [1] NA
standard_error(algae$strength, na.rm = FALSE)
## Warning in var(x, na.rm = na.rm): NAs introduced by coercion
## [1] NA
standard_error(algae$strength, na.rm = TRUE)
## Warning in var(x, na.rm = na.rm): NAs introduced by coercion
## [1] 0.03870419
Within the function, the value for na.rm that is received by the function is passed into the var function. The var function already has an na.rm argument built into it (see the help file ?var), but length does not. We can use the code sum(!is.na(x)) to calculate n. The function is.na will test each value of the vector x to see if it is missing. If it is not missing (the ! means NOT), then it returns TRUE for that position, and by counting the values returned as TRUE with sum, we are effectively counting only values that are not missing.
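The counting trick is easiest to see step by step on a small vector with a missing value (made-up numbers, not the algae data):

```r
x <- c(1.2, NA, 3.4, 5.6)

is.na(x)       # TRUE where a value is missing
!is.na(x)      # flipped: TRUE where a value is present
sum(!is.na(x)) # TRUE counts as 1, so this is n excluding missing values: 3
```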
Functions that extend functions
Let's say you have a script where you continually want to set na.rm = TRUE and get sick of typing this everywhere:
standard_error(algae$height, na.rm = TRUE)
standard_error(algae$strength, na.rm = TRUE)
...
(Besides, we're also repeating ourselves a lot and thus increasing the risk of errors: what if we forget?)
One approach here is to define a new function that builds on our previous function but with the desired behaviour. E.g.
standard_error_narm <- function(x) {
  standard_error(x, na.rm = TRUE)
}
We can now call the new function and get the same result as above with na.rm = TRUE:
standard_error_narm(algae$strength)
## Warning in var(x, na.rm = na.rm): NAs introduced by coercion
## [1] 0.03870419
While the example with standard_error is perhaps a bit trivial, you can take this approach all over the place. For example, a function that makes a style of plot with defaults set just the way you like them.
What's the ... argument for?
Notice the argument ... in the definition of the mean function above? What's that about? The ..., or ellipsis, element in the function definition allows for other arguments to be passed into the function, and passed on to another function called within it, without having to write them all out by name. For example, in the definition of the function standard_error_narm we could instead write:
standard_error_narm <- function(...) {
  standard_error(..., na.rm = TRUE)
}
When you call standard_error_narm defined like this, anything other than na.rm will be passed directly into the next function. This avoids repeating the arguments of one function when defining another.
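To see the forwarding in action outside of standard_error, here is a toy wrapper around paste() that fixes the separator while forwarding everything else through ... (shout is a made-up name):

```r
# shout() forwards any number of arguments to paste(), fixing sep = "_"
shout <- function(...) {
  paste(..., sep = "_")
}

shout("a", "b", "c") # "a_b_c"
shout("x", "y")      # "x_y"
```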
A less trivial example is using plot. I could write a function changing some of the defaults for plot, so that I don't have to keep repeating them:
my_plot <- function(...) {
  plot(..., pch = 16, las = 1, log = "xy")
}

Storing and using functions

Once you get into the habit of writing functions, it's a good idea to keep them together in a separate file. Why? Because otherwise you have these big clunky functions clogging up your script. If you've solved the problem of how to do something, why not stuff it away somewhere you can get at it, but only if needed?
To get your functions out of the way, we recommend keeping all the functions for each project together in a folder called R within your project directory. (For more on project setup see our post on project management.)
To make these functions accessible within your workflow, you then use the function source to read the function files into memory, e.g.
source("R/stats.R")
Often, you may have a series of files:
source("R/data_cleaning.R")
source("R/stats.R")
source("R/plots.R")
It's a matter of preference whether you use a single file or multiple files.
Writing functions to work with pipes %>%
For many of us, pipes have become an essential part of our workflow. (If this is new to you, see our post on using pipes under data manipulation.)
Importantly, you can write functions that work with the pipe operator. All you need to do is set up your function so that the first argument is the object being piped into the function. In fact, our standard_error function already works with pipes, assuming you are passing in x:
algae$height %>% standard_error()
## [1] 0.04067788

Returning multiple arguments
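As a sketch, any data-first function of your own will slot into a pipeline the same way. The example below uses R's built-in |> pipe (available from R 4.1) so that it runs without loading any packages; the magrittr %>% behaves the same here. se_cm is a made-up name, and the vector is invented for illustration:

```r
# A pipe-friendly function: the data is the first argument
se_cm <- function(x) {
  sqrt(var(x) / length(x)) * 100 # standard error, rescaled from m to cm (illustrative)
}

heights <- c(0.395, 0.0189, 0.698, 0.139, 0.377) # made-up heights in metres

se_cm(heights)     # classic call
heights |> se_cm() # identical result, piped
```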
The examples above all return a single item. What if I want to return multiple items from a function? The answer is to return a list object. Lists are helpful because you can bundle together many different items.
For example, we could write a function that returns several statistics of a variable:
summary_stats <- function(x, na.rm = TRUE) {
  list(
    mean = mean(x, na.rm = na.rm),
    var = var(x, na.rm = na.rm),
    n = sum(!is.na(x))
  )
}
If we run this function, we receive an object that has named elements:
height_stats <- summary_stats(algae$height)
names(height_stats)
## [1] "mean" "var"  "n"
height_stats$mean
## [1] 0.4590399
height_stats$var
## [1] 0.0992814
height_stats$n
## [1] 60
In fact many functions do this, e.g. lm() (for fitting a linear model). Fitting a model, we can check that the output is a list, ask for the names of the returned elements, and start calling them by name:
fit <- lm(algae$height ~ algae$dryweight)
is.list(fit)
## [1] TRUE
names(fit)
##  [1] "coefficients"  "residuals"     "effects"       "rank"
##  [5] "fitted.values" "assign"        "qr"            "df.residual"
##  [9] "xlevels"       "call"          "terms"         "model"
fit$coefficients
##     (Intercept) algae$dryweight
##       0.4054402       0.1276447

What makes a good function
Finally, let's recap a few pointers on what makes a good function.
It's short
Ideally each function does one thing well. Often this means lots of short functions. Short functions are extremely useful. Even if the code in the function body is more complex, ideally it still does one thing well.
It does one thing well

It has an intuitive name
Source: https://environmentalcomputing.net/coding-skills/writing-functions/