Helpful bits of code for life
Time: 90 min
Description: This session will cover useful code snippets that are designed to improve your overall experience doing computational research. This will include how to customise your Bash and R environments, useful keyboard shortcuts, and short pieces of code to do common tasks, e.g. reading a directory of files into R.
RStudio is the most popular environment for using R, so learning some of its keyboard shortcuts can make your life much nicer and help prevent typos.
Useful keyboard shortcuts (on macOS, Alt is usually Option and Ctrl is usually Cmd):
- Alt + Shift + K: display common keyboard shortcuts
- Ctrl + Shift + M: insert the magrittr pipe %>%
- Alt + -: insert the assignment arrow <-
- Ctrl + Alt + I: insert an R code chunk in an Rmarkdown document
- Ctrl + .: jump to a file/function
- Ctrl + Shift + .: navigate through open scripts
- Ctrl + Shift + F10: restart R

The following can go in your .Rprofile, which R runs at the start of every session:
# Load helper packages only in interactive sessions, so scripts run
# non-interactively (e.g. with Rscript) are unaffected
# Don't add 'analysis' packages here
if (interactive()) {
  suppressMessages(require(devtools))
  suppressMessages(require(usethis))
  suppressMessages(require(testthat))
}
# set the default CRAN mirror used by install.packages()
options(repos = c(CRAN = "https://cloud.r-project.org/"))
# warn on partial matches
options(
  warnPartialMatchArgs = TRUE,
  warnPartialMatchDollar = TRUE,
  warnPartialMatchAttr = TRUE
)
# fancy quotes are annoying and lead to
# 'copy + paste' bugs / frustrations
options(useFancyQuotes = FALSE)
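To see what the partial-matching and quote options actually change, here is a small sketch (the list `my_list` is purely illustrative). With `warnPartialMatchDollar = TRUE`, `$` still returns the partially matched element but also raises a warning, so a silent typo becomes visible; with `useFancyQuotes = FALSE`, `sQuote()` uses plain ASCII quotes.

```r
# Turn on the warning for partial matching with `$`
options(warnPartialMatchDollar = TRUE)

my_list <- list(value = 1)

# `val` is not an element name, but it partially matches `value`:
# R would return 1 and warn, so catch the warning to inspect it
partial <- tryCatch(my_list$val, warning = function(w) conditionMessage(w))
print(partial)

# With fancy quotes off, sQuote() produces plain ASCII quotes
options(useFancyQuotes = FALSE)
sQuote("x")
```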
Below is the standard template I use for creating an Rmarkdown document.
It creates a floating table of contents, lets you toggle the code on or off, and dates the report with the day it is knitted.
The first code chunk sets my default of echoing all of my code. The second loads the tidyverse, which I almost always use; wrapping library(tidyverse) in suppressMessages() means the usual loading messages about conflicts don't come through into my document.
---
title: a cool title
author: Murray Cadzow
date: "`r Sys.Date()`"
output:
  html_document:
    toc: true
    toc_float: true
    toc_depth: 4
    code_folding: "show" # "hide" if code less important for audience
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
```{r}
suppressMessages(library(tidyverse))
```
Here are a couple of other Rmarkdown tricks that can be useful. The R Markdown Cookbook is a more comprehensive resource for these, though.
In an Rmd file the code chunk option eval takes more than just TRUE or FALSE: if you want to run only some lines within a chunk, you can exclude the others by position using negative numbers:
```{r, eval = c(-1,-3)}
1
3
5 # only this line will evaluate
```
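When knitted, the excluded lines are still shown but commented out with ##, and only the remaining line is evaluated:
## 1
## 3
5 # only this line will evaluate
[1] 5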
Sometimes, such as when creating this workshop, you need to show a code chunk verbatim. The R Markdown Cookbook has an extremely useful section on this: https://bookdown.org/yihui/rmarkdown-cookbook/verbatim-code-chunks.html
To create a verbatim code chunk, add `r ''` after the {r} part of the chunk header.
The knitr package provides kable(), which makes reasonable-looking standard tables with options for custom column names, alignment and rounding.
| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
---|---|---|---|---|---|---|---|---|---|---|---|
Mazda RX4 | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 |
Mazda RX4 Wag | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
Datsun 710 | 22.8 | 4 | 108.0 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |
Hornet 4 Drive | 21.4 | 6 | 258.0 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
Hornet Sportabout | 18.7 | 8 | 360.0 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 |
Valiant | 18.1 | 6 | 225.0 | 105 | 2.76 | 3.460 | 20.22 | 1 | 0 | 3 | 1 |
Duster 360 | 14.3 | 8 | 360.0 | 245 | 3.21 | 3.570 | 15.84 | 0 | 0 | 3 | 4 |
Merc 240D | 24.4 | 4 | 146.7 | 62 | 3.69 | 3.190 | 20.00 | 1 | 0 | 4 | 2 |
Merc 230 | 22.8 | 4 | 140.8 | 95 | 3.92 | 3.150 | 22.90 | 1 | 0 | 4 | 2 |
Merc 280 | 19.2 | 6 | 167.6 | 123 | 3.92 | 3.440 | 18.30 | 1 | 0 | 4 | 4 |
Merc 280C | 17.8 | 6 | 167.6 | 123 | 3.92 | 3.440 | 18.90 | 1 | 0 | 4 | 4 |
Merc 450SE | 16.4 | 8 | 275.8 | 180 | 3.07 | 4.070 | 17.40 | 0 | 0 | 3 | 3 |
Merc 450SL | 17.3 | 8 | 275.8 | 180 | 3.07 | 3.730 | 17.60 | 0 | 0 | 3 | 3 |
Merc 450SLC | 15.2 | 8 | 275.8 | 180 | 3.07 | 3.780 | 18.00 | 0 | 0 | 3 | 3 |
Cadillac Fleetwood | 10.4 | 8 | 472.0 | 205 | 2.93 | 5.250 | 17.98 | 0 | 0 | 3 | 4 |
Lincoln Continental | 10.4 | 8 | 460.0 | 215 | 3.00 | 5.424 | 17.82 | 0 | 0 | 3 | 4 |
Chrysler Imperial | 14.7 | 8 | 440.0 | 230 | 3.23 | 5.345 | 17.42 | 0 | 0 | 3 | 4 |
Fiat 128 | 32.4 | 4 | 78.7 | 66 | 4.08 | 2.200 | 19.47 | 1 | 1 | 4 | 1 |
Honda Civic | 30.4 | 4 | 75.7 | 52 | 4.93 | 1.615 | 18.52 | 1 | 1 | 4 | 2 |
Toyota Corolla | 33.9 | 4 | 71.1 | 65 | 4.22 | 1.835 | 19.90 | 1 | 1 | 4 | 1 |
Toyota Corona | 21.5 | 4 | 120.1 | 97 | 3.70 | 2.465 | 20.01 | 1 | 0 | 3 | 1 |
Dodge Challenger | 15.5 | 8 | 318.0 | 150 | 2.76 | 3.520 | 16.87 | 0 | 0 | 3 | 2 |
AMC Javelin | 15.2 | 8 | 304.0 | 150 | 3.15 | 3.435 | 17.30 | 0 | 0 | 3 | 2 |
Camaro Z28 | 13.3 | 8 | 350.0 | 245 | 3.73 | 3.840 | 15.41 | 0 | 0 | 3 | 4 |
Pontiac Firebird | 19.2 | 8 | 400.0 | 175 | 3.08 | 3.845 | 17.05 | 0 | 0 | 3 | 2 |
Fiat X1-9 | 27.3 | 4 | 79.0 | 66 | 4.08 | 1.935 | 18.90 | 1 | 1 | 4 | 1 |
Porsche 914-2 | 26.0 | 4 | 120.3 | 91 | 4.43 | 2.140 | 16.70 | 0 | 1 | 5 | 2 |
Lotus Europa | 30.4 | 4 | 95.1 | 113 | 3.77 | 1.513 | 16.90 | 1 | 1 | 5 | 2 |
Ford Pantera L | 15.8 | 8 | 351.0 | 264 | 4.22 | 3.170 | 14.50 | 0 | 1 | 5 | 4 |
Ferrari Dino | 19.7 | 6 | 145.0 | 175 | 3.62 | 2.770 | 15.50 | 0 | 1 | 5 | 6 |
Maserati Bora | 15.0 | 8 | 301.0 | 335 | 3.54 | 3.570 | 14.60 | 0 | 1 | 5 | 8 |
Volvo 142E | 21.4 | 4 | 121.0 | 109 | 4.11 | 2.780 | 18.60 | 1 | 1 | 4 | 2 |
library(knitr)
mtcars %>%
  select(1:4) %>%
  head() %>%
  kable(caption = "A table caption",
        col.names = c("MPG", "Cylinders", "Displacement", "Horse Power"))
| | MPG | Cylinders | Displacement | Horse Power |
---|---|---|---|---|
Mazda RX4 | 21.0 | 6 | 160 | 110 |
Mazda RX4 Wag | 21.0 | 6 | 160 | 110 |
Datsun 710 | 22.8 | 4 | 108 | 93 |
Hornet 4 Drive | 21.4 | 6 | 258 | 110 |
Hornet Sportabout | 18.7 | 8 | 360 | 175 |
Valiant | 18.1 | 6 | 225 | 105 |
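The rounding and alignment options mentioned above are the `digits` and `align` arguments of `kable()`. A minimal sketch: `format = "pipe"` is assumed here only so the result is plain Markdown text wherever you run it, and `digits = 0` rounds numeric columns to whole numbers.

```r
library(knitr)

# digits rounds numeric columns; align gives one letter per column
# ("l" = left, "c" = centre, "r" = right)
tab <- kable(head(mtcars[, 1:4]),
             format = "pipe",
             digits = 0,
             align = c("l", "c", "c", "r"),
             col.names = c("MPG", "Cylinders", "Displacement", "Horse Power"))
tab
```

In an Rmarkdown document you can usually leave `format` out and let knitr pick the right output format.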
kableExtra brings in extra table styling, although the formatting of this website prevents the table below from displaying as it would in a normal RMarkdown document. See https://cran.r-project.org/web/packages/kableExtra/vignettes/awesome_table_in_html.html for kableExtra's documentation and examples.
library(kableExtra)
mtcars %>%
  select(1:4) %>%
  head() %>%
  kbl(caption = "A table caption",
      col.names = c("MPG", "Cylinders", "Displacement", "Horse Power")) %>%
  row_spec(0, angle = -45) %>% # row 0 is the header row
  kable_styling(bootstrap_options = "striped")
| | MPG | Cylinders | Displacement | Horse Power |
---|---|---|---|---|
Mazda RX4 | 21.0 | 6 | 160 | 110 |
Mazda RX4 Wag | 21.0 | 6 | 160 | 110 |
Datsun 710 | 22.8 | 4 | 108 | 93 |
Hornet 4 Drive | 21.4 | 6 | 258 | 110 |
Hornet Sportabout | 18.7 | 8 | 360 | 175 |
Valiant | 18.1 | 6 | 225 | 105 |
Read in a directory of files
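library(tidyverse)
# CSV files in the working directory; pattern is a regular expression,
# so "\\.csv$" matches file names ending in .csv
files <- list.files(pattern = "\\.csv$", full.names = TRUE)
# read each file into a list of tibbles, named by its file path
my_csvs <- map(files, read_csv) %>% set_names(files)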
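If you don't want the tidyverse dependency, the same idea works in base R. This is a sketch that first writes two small example files to a temporary directory (the directory and file contents are purely illustrative), then reads every CSV there and stacks them into one data frame with a column recording the source file.

```r
# Example setup: two small CSV files in a temporary directory
csv_dir <- file.path(tempdir(), "csv_example")
dir.create(csv_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, value = c(10, 20)),
          file.path(csv_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, value = c(30, 40)),
          file.path(csv_dir, "b.csv"), row.names = FALSE)

# "\\.csv$" is a regular expression: names ending in .csv
files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)

# Read each file and name the list elements by file name
csv_list <- lapply(files, read.csv)
names(csv_list) <- basename(files)

# Add a `source` column to each data frame, then stack them
combined <- do.call(rbind, Map(function(df, src) data.frame(source = src, df),
                               csv_list, names(csv_list)))
rownames(combined) <- NULL
combined
```

The tidyverse version can be combined in a similar way with `bind_rows(my_csvs, .id = "file")`.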
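Some useful packages:
- here: builds file paths relative to your project's root directory
- janitor: helpers for cleaning data, e.g. clean_names() to tidy column names
- usethis: automates common project and package setup tasks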
library(usethis)
create_project(path = "path/to/new/project") # creates a new rstudio project and opens it
use_r(name = "new_r_script") # creates (or opens) R/new_r_script.R
edit_r_profile() # opens your Rprofile so you can edit it
quickly find out the number of blank entries in a column:
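A minimal sketch, assuming a data frame called `df` with a character column `name` (both hypothetical). Comparing with `""` gives `NA` for missing values, so `na.rm = TRUE` keeps them out of the count.

```r
# Hypothetical example data: a character column with blanks and an NA
df <- data.frame(name = c("a", "", "c", "", NA), stringsAsFactors = FALSE)

# Number of entries that are exactly "" in one column
n_blank <- sum(df$name == "", na.rm = TRUE)
n_blank

# Blank counts for every column at once
colSums(df == "", na.rm = TRUE)
```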
replace blank cells in a data.frame with NA (the !is.na() check avoids problems when the data already contain NA values):
data[!is.na(data) & data == ""] <- NA
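order() returns the positions that would sort a vector, which you can use to index it:
x <- c("b", "c", "a")
order(x)
[1] 3 1 2
x[order(x)]
[1] "a" "b" "c"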
df <- data.frame(x = rep(1:3, each = 2), y = 6:1, z = letters[1:6])
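order() also accepts several vectors, which makes it a base-R way to sort a data frame by more than one column. A minimal sketch using the `df` defined above: sort by `x`, breaking ties with `y`, and put `-` in front of a numeric column to sort it in descending order.

```r
df <- data.frame(x = rep(1:3, each = 2), y = 6:1, z = letters[1:6])

# Sort by x, then by y within each value of x (both ascending)
df_sorted <- df[order(df$x, df$y), ]
df_sorted

# Descending x, still ascending y within each x
df_desc <- df[order(-df$x, df$y), ]
df_desc
```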
format your numbers to a fixed number of decimal places (turns them from numeric to character)
# returns character type of number rounded to 3 decimal places
sprintf('%.3f', 0.123456)
[1] "0.123"
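The format string controls both the decimal places and, optionally, a minimum width. In `"%8.3f"` the `8` pads the result with spaces on the left to at least 8 characters, which gives a genuinely fixed width; `formatC()` does the same with named arguments.

```r
# Always 3 decimal places; the width depends on the number
dp3 <- sprintf("%.3f", 2)
dp3

# 3 decimal places, padded on the left to at least 8 characters
w8 <- sprintf("%8.3f", 3.14159)
w8

# The same with formatC()
w8_c <- formatC(3.14159, digits = 3, format = "f", width = 8)
w8_c
```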
format all numeric columns to 3 decimal places and make into a table
library(tidyverse)
mtcars %>%
  head() %>%
  # a bare ~ formula formats each numeric column in place; wrapping it in
  # list() instead adds new suffixed columns (mpg_1, cyl_1, ...)
  mutate(across(where(is.numeric), ~ sprintf('%.3f', .x))) %>%
  kableExtra::kbl()
| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
---|---|---|---|---|---|---|---|---|---|---|---|
Mazda RX4 | 21.000 | 6.000 | 160.000 | 110.000 | 3.900 | 2.620 | 16.460 | 0.000 | 1.000 | 4.000 | 4.000 |
Mazda RX4 Wag | 21.000 | 6.000 | 160.000 | 110.000 | 3.900 | 2.875 | 17.020 | 0.000 | 1.000 | 4.000 | 4.000 |
Datsun 710 | 22.800 | 4.000 | 108.000 | 93.000 | 3.850 | 2.320 | 18.610 | 1.000 | 1.000 | 4.000 | 1.000 |
Hornet 4 Drive | 21.400 | 6.000 | 258.000 | 110.000 | 3.080 | 3.215 | 19.440 | 1.000 | 0.000 | 3.000 | 1.000 |
Hornet Sportabout | 18.700 | 8.000 | 360.000 | 175.000 | 3.150 | 3.440 | 17.020 | 0.000 | 0.000 | 3.000 | 2.000 |
Valiant | 18.100 | 6.000 | 225.000 | 105.000 | 2.760 | 3.460 | 20.220 | 1.000 | 0.000 | 3.000 | 1.000 |
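A base-R sketch of the same formatting step, without dplyr: `cars[] <- lapply(...)` replaces the columns while keeping the data frame structure and row names, and non-numeric columns are passed through unchanged.

```r
cars <- head(mtcars)

# Format numeric columns to 3 decimal places, leave others as they are
cars[] <- lapply(cars, function(col) {
  if (is.numeric(col)) sprintf("%.3f", col) else col
})
cars
```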
tidy way to transpose a dataframe/tibble
mtcars %>%
  head() %>%
  tibble::rownames_to_column() %>% # only needed when the row names hold data, as in mtcars
  tidyr::pivot_longer(-rowname,
                      names_to = "var",
                      values_to = "value") %>%
  tidyr::pivot_wider(names_from = "rowname",
                     values_from = "value")
# A tibble: 11 x 7
var `Mazda RX4` `Mazda RX4 Wag` `Datsun 710` `Hornet 4 Drive`
<chr> <dbl> <dbl> <dbl> <dbl>
1 mpg 21 21 22.8 21.4
2 cyl 6 6 4 6
3 disp 160 160 108 258
4 hp 110 110 93 110
5 drat 3.9 3.9 3.85 3.08
6 wt 2.62 2.88 2.32 3.22
7 qsec 16.5 17.0 18.6 19.4
8 vs 0 0 1 1
9 am 1 1 1 0
10 gear 4 4 4 3
11 carb 4 4 1 1
# … with 2 more variables: Hornet Sportabout <dbl>, Valiant <dbl>
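For comparison, base R's `t()` transposes in one step. A minimal sketch: `t()` returns a matrix, so every value is coerced to a single type. That is harmless here because mtcars is all numeric, but with mixed column types everything would become character, which is one reason to prefer the pivot approach above.

```r
# Transpose, then convert the resulting matrix back to a data frame
transposed <- as.data.frame(t(head(mtcars)))
transposed
```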
pull out the nth string after a string split
purrr::map_chr(stringr::str_split(string_vec, "pattern"), n) # general form: n is the position of the piece you want
purrr::map_chr(stringr::str_split(c("chr1","chr2","chr3"), "chr"), 2) # you would get back c("1","2","3")
[1] "1" "2" "3"
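A base-R equivalent without purrr or stringr: `strsplit()` returns a list of pieces, and `sapply()` with the `[` function pulls out the nth piece of each element. For an element with fewer than n pieces, `[` returns `NA`.

```r
# strsplit("chr1", "chr") gives c("", "1"), so piece 2 is the number
parts <- sapply(strsplit(c("chr1", "chr2", "chr3"), "chr"), `[`, 2)
parts
```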
Bash is a common Unix shell (the program that interprets the commands you type into a terminal), but on macOS (since Catalina) the default shell is zsh.
To find out which shell you are using enter this command:
echo $0
Creating your .bashrc or .zshrc
Bash profile for login shells
.bashrc is the file that commonly controls your bash setup and can usually be found at ~/.bashrc (the zsh equivalent is ~/.zshrc). Some systems (such as macOS) also have a .bash_profile file, which bash reads for login shells instead of .bashrc. If your system uses .bash_profile, you can make it load .bashrc by putting this line in .bash_profile:
[[ -r ~/.bashrc ]] && . ~/.bashrc
In the .bashrc file it is useful to set a customised prompt, set useful variables (e.g. PATH, which defines where bash looks for installed software), and set up custom commands (aliases) to make common tasks easier.
Custom prompt
Creating your own prompt in bash can be really useful, rather than having a plain $. The prompt is set by the PS1 variable, and http://ezprompt.net provides a nice way of building a prompt and generating the code to add to your .bashrc.
Things you might want to do:
Exporting variables is a useful way of defining environment settings. Often this means setting a variable that tells programs where to look for things. This website has a few examples of bash variables (https://www.thegeekstuff.com/2010/08/bash-shell-builtin-commands/).
It is useful to include the RSTUDIO_PANDOC variable below.
Rmarkdown Pandoc
A useful one on a server is defining where R should look for pandoc when compiling RMarkdown documents.
I have the following in my .bashrc file:
export RSTUDIO_PANDOC=/usr/lib/rstudio/bin/pandoc
The location is likely different on your computer. In R, use the command rmarkdown::find_pandoc() to find out where the RStudio version of pandoc is located.
Setting this in your .bashrc is important because there might be another instance of pandoc on your PATH, which can cause issues if you run R from the command line. RSTUDIO_PANDOC is the environment variable that the rmarkdown package checks if you want to customise which pandoc is used.
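You can check from inside R whether the variable is visible, and set it for the current session only. A minimal sketch: the path is the example from above and may not exist on your machine. R also reads ~/.Renviron at startup, so a line such as RSTUDIO_PANDOC=/usr/lib/rstudio/bin/pandoc there covers R sessions that are not started from your bash shell; usethis::edit_r_environ() opens that file.

```r
# What R currently sees; "" means the variable is not set
Sys.getenv("RSTUDIO_PANDOC")

# Set it for this R session only (example path; adjust for your system)
Sys.setenv(RSTUDIO_PANDOC = "/usr/lib/rstudio/bin/pandoc")
current <- Sys.getenv("RSTUDIO_PANDOC")
current

# Does that directory exist on this machine?
dir.exists(current)
```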
Bash records your history as you go, but if you work across multiple terminal windows it doesn't work the way you would hope: by default each session keeps its own history and writes it to ~/.bash_history when it exits, so commands from one session can be overwritten by another. PROMPT_COMMAND is a bash variable whose contents are run as a command just before each prompt is displayed. The one below timestamps each command (unless you are root), along with the directory it was run in, and appends it to a daily log file. The logs live in ~/.logs/, so create that directory first with mkdir -p ~/.logs.
export PROMPT_COMMAND='if [ "$(id -u)" -ne 0 ]; then echo "$(date "+%Y-%m-%d.%H:%M:%S") $(pwd) $(history 1)" >> ~/.logs/bash-history-$(date "+%Y-%m-%d").log; fi'
If I want to search my logs I can use grep <command> ~/.logs/* and it will tell me every time and directory in which I ran a command, and exactly how I ran it. These log files collect the commands from all bash sessions that load this .bashrc, regardless of how many terminal windows you have open.
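The same search can be done from R, which is handy if you want to filter or tabulate the results. This is a sketch with a hypothetical helper, `search_logs()`, that assumes the file naming used by the PROMPT_COMMAND above (bash-history-YYYY-MM-DD.log in ~/.logs). The demo writes a made-up log file to a temporary directory standing in for ~/.logs.

```r
# Hypothetical helper: return log lines containing `pattern` (literal match)
search_logs <- function(pattern, log_dir = "~/.logs") {
  log_files <- list.files(path.expand(log_dir),
                          pattern = "^bash-history-.*\\.log$",
                          full.names = TRUE)
  lines <- unlist(lapply(log_files, readLines), use.names = FALSE)
  grep(pattern, lines, value = TRUE, fixed = TRUE)
}

# Demo with a temporary directory and invented log lines
demo_dir <- file.path(tempdir(), "logs_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("2024-01-01.10:00:00 /home/me   12  ls -lrth",
             "2024-01-01.10:05:00 /home/me   13  conda activate myenv"),
           file.path(demo_dir, "bash-history-2024-01-01.log"))

hits <- search_logs("conda", log_dir = demo_dir)
hits
```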
Aliases
If there are commands you're always typing out, such as ls -lrth, it can be useful to create an alias that is shorter and easier to type, e.g.
alias ll="ls -lrth"
Aliases written in your .bashrc are available in every new session. They can be very useful, but remember they only exist on machines where you have customised your .bashrc.
Virtual environments
- conda
  - conda create
  - conda activate
  - conda deactivate