Useful code snippets for everyday tasks

Helpful bits of code for life

Murray Cadzow (University of Otago)
2021-07-07

Time: 90 min

Description: This session will cover useful code snippets that are designed to improve your overall experience doing computational research. This will include how to customise your Bash and R environments, useful keyboard shortcuts, and short pieces of code to do common tasks, e.g. reading a directory of files into R.

R

Increasing Efficiency

RStudio is the most popular environment for using R, so learning some of its keyboard shortcuts can make your life much nicer and help prevent typos.

Useful keyboard shortcuts:

Rprofile

# Load helper packages if using interactive session (doesn't alter your environment)
# Don't add 'analysis' packages here
if (interactive()) {
  suppressMessages(require(devtools))
  suppressMessages(require(usethis))
  suppressMessages(require(testthat))
}

# set CRAN
options(repos = c(CRAN = "https://cloud.r-project.org/"))

# warn on partial matches
options(
  warnPartialMatchArgs = TRUE,
  warnPartialMatchDollar = TRUE,
  warnPartialMatchAttr = TRUE
)

# fancy quotes are annoying and lead to
# 'copy + paste' bugs / frustrations
options(useFancyQuotes = FALSE)

RMarkdown

Standard template:

Below is the standard template I use for creating an Rmarkdown document.

It creates a floating table of contents, and lets you toggle the code on or off, and also dates the report for when it is made.

The first code chunk sets my default of echoing all of my code. The second loads the tidyverse, which I pretty much always use; wrapping the call in suppressMessages() means that the usual loading messages about conflicts don't come through into my document.

---
title: a cool title
author: Murray Cadzow
date: "`r Sys.Date()` "
output:
  html_document:
    toc: true
    toc_float: true
    toc_depth: 4
    code_folding: "show" # "hide" if code less important for audience
---

```{r setup, include=FALSE} 
knitr::opts_chunk$set(echo = TRUE)
```

```{r}
suppressMessages(library(tidyverse))
```

Rmarkdown tricks

Here are a couple of other Rmarkdown tricks that can be useful. The RMarkdown cookbook is a more comprehensive resource for these though.

Selective evaluation of lines within a code chunk

In an Rmd document, the eval chunk option takes more than just TRUE or FALSE. If you want to selectively run lines within a code chunk, you can exclude them explicitly with negative indices:

```{r, eval = c(-1,-3)}
1

3

5 # only this line will evaluate
```
## 1

## 3

5 # only this line will evaluate
[1] 5

Creating verbatim code chunks

Sometimes, such as in the creation of this workshop, there is a need to show a code chunk verbatim. The RMarkdown cookbook has an extremely useful section on this: https://bookdown.org/yihui/rmarkdown-cookbook/verbatim-code-chunks.html

To create verbatim code chunks, add `r ''` after the {r} part of the code chunk.

Making better tables

The knitr package provides the kable() function, which makes decent-looking standard tables with options for custom column names, alignments and rounding.

library(knitr)
mtcars %>% kable(caption = "A better table from knitr::kable")

Table 1: A better table from knitr::kable

                      mpg  cyl   disp   hp  drat     wt   qsec  vs  am  gear  carb
Mazda RX4            21.0    6  160.0  110  3.90  2.620  16.46   0   1     4     4
Mazda RX4 Wag        21.0    6  160.0  110  3.90  2.875  17.02   0   1     4     4
Datsun 710           22.8    4  108.0   93  3.85  2.320  18.61   1   1     4     1
Hornet 4 Drive       21.4    6  258.0  110  3.08  3.215  19.44   1   0     3     1
Hornet Sportabout    18.7    8  360.0  175  3.15  3.440  17.02   0   0     3     2
Valiant              18.1    6  225.0  105  2.76  3.460  20.22   1   0     3     1
Duster 360           14.3    8  360.0  245  3.21  3.570  15.84   0   0     3     4
Merc 240D            24.4    4  146.7   62  3.69  3.190  20.00   1   0     4     2
Merc 230             22.8    4  140.8   95  3.92  3.150  22.90   1   0     4     2
Merc 280             19.2    6  167.6  123  3.92  3.440  18.30   1   0     4     4
Merc 280C            17.8    6  167.6  123  3.92  3.440  18.90   1   0     4     4
Merc 450SE           16.4    8  275.8  180  3.07  4.070  17.40   0   0     3     3
Merc 450SL           17.3    8  275.8  180  3.07  3.730  17.60   0   0     3     3
Merc 450SLC          15.2    8  275.8  180  3.07  3.780  18.00   0   0     3     3
Cadillac Fleetwood   10.4    8  472.0  205  2.93  5.250  17.98   0   0     3     4
Lincoln Continental  10.4    8  460.0  215  3.00  5.424  17.82   0   0     3     4
Chrysler Imperial    14.7    8  440.0  230  3.23  5.345  17.42   0   0     3     4
Fiat 128             32.4    4   78.7   66  4.08  2.200  19.47   1   1     4     1
Honda Civic          30.4    4   75.7   52  4.93  1.615  18.52   1   1     4     2
Toyota Corolla       33.9    4   71.1   65  4.22  1.835  19.90   1   1     4     1
Toyota Corona        21.5    4  120.1   97  3.70  2.465  20.01   1   0     3     1
Dodge Challenger     15.5    8  318.0  150  2.76  3.520  16.87   0   0     3     2
AMC Javelin          15.2    8  304.0  150  3.15  3.435  17.30   0   0     3     2
Camaro Z28           13.3    8  350.0  245  3.73  3.840  15.41   0   0     3     4
Pontiac Firebird     19.2    8  400.0  175  3.08  3.845  17.05   0   0     3     2
Fiat X1-9            27.3    4   79.0   66  4.08  1.935  18.90   1   1     4     1
Porsche 914-2        26.0    4  120.3   91  4.43  2.140  16.70   0   1     5     2
Lotus Europa         30.4    4   95.1  113  3.77  1.513  16.90   1   1     5     2
Ford Pantera L       15.8    8  351.0  264  4.22  3.170  14.50   0   1     5     4
Ferrari Dino         19.7    6  145.0  175  3.62  2.770  15.50   0   1     5     6
Maserati Bora        15.0    8  301.0  335  3.54  3.570  14.60   0   1     5     8
Volvo 142E           21.4    4  121.0  109  4.11  2.780  18.60   1   1     4     2

mtcars %>% 
  select(1:4) %>% 
  head() %>% 
  kable(caption = "A table caption", 
        col.names = c("MPG", "Cylinders", "Displacement","Horse Power"))

Table 2: A table caption

                    MPG  Cylinders  Displacement  Horse Power
Mazda RX4          21.0          6           160          110
Mazda RX4 Wag      21.0          6           160          110
Datsun 710         22.8          4           108           93
Hornet 4 Drive     21.4          6           258          110
Hornet Sportabout  18.7          8           360          175
Valiant            18.1          6           225          105

kableExtra brings in extra table styling, although this website's formatting prevents it from displaying as it would in a normal RMarkdown document. Check out https://cran.r-project.org/web/packages/kableExtra/vignettes/awesome_table_in_html.html for the documentation and examples for kableExtra.

library(kableExtra)

mtcars %>% 
  select(1:4) %>% 
  head() %>% 
  kbl(caption = "A table caption", 
      col.names = c("MPG", "Cylinders", "Displacement","Horse Power")) %>% 
  row_spec(0, angle = -45) %>% 
  kable_styling(bootstrap_options = "striped")

Table 3: A table caption

                    MPG  Cylinders  Displacement  Horse Power
Mazda RX4          21.0          6           160          110
Mazda RX4 Wag      21.0          6           160          110
Datsun 710         22.8          4           108           93
Hornet 4 Drive     21.4          6           258          110
Hornet Sportabout  18.7          8           360          175
Valiant            18.1          6           225          105

Workflow advice

Read in a directory of files

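library(tidyverse)
# pattern is a regular expression (not a shell glob), so match a literal ".csv" at the end of the name
files <- list.files(pattern = "\\.csv$", full.names = TRUE)

# read every file; the result is a list with one data frame per file
my_csvs <- map(files, read_csv)

set up projects and scripts with usethis:

library(usethis)

create_project(path = "path/to/new/project") # creates a new rstudio project and opens it
use_r(name = "new_r_script") # creates a new script with the name provided
edit_r_profile() # opens your .Rprofile so you can edit it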

quickly find out the number of missing (NA) entries in a column:

table(is.na(df$colname))

replace blank cells in a data.frame with NA:

data[data == ""] <- NA

Reorder a vector

x <- c("b", "c", "a")

# returns the indices that would put the vector in sorted order
order(x)
[1] 3 1 2
# use those indices to get the sorted vector
x[order(x)]
[1] "a" "b" "c"

df <- data.frame(x = rep(1:3, each = 2), y = 6:1, z = letters[1:6])

format your numbers to a fixed number of decimal places (turns from numeric to character)

# returns character type of number rounded to 3 decimal places
sprintf('%.3f', 0.123456) 

format all numeric columns to 3 decimal places and make into a table

library(tidyverse)
mtcars %>% 
  head() %>% 
  # a bare formula overwrites each column in place; wrapping it in list()
  # would instead add new mpg_1, cyl_1, ... columns alongside the originals
  mutate(across(where(is.numeric), ~ sprintf('%.3f', .x))) %>% 
  kableExtra::kbl()

                   mpg     cyl    disp     hp       drat   wt     qsec    vs     am     gear   carb
Mazda RX4          21.000  6.000  160.000  110.000  3.900  2.620  16.460  0.000  1.000  4.000  4.000
Mazda RX4 Wag      21.000  6.000  160.000  110.000  3.900  2.875  17.020  0.000  1.000  4.000  4.000
Datsun 710         22.800  4.000  108.000  93.000   3.850  2.320  18.610  1.000  1.000  4.000  1.000
Hornet 4 Drive     21.400  6.000  258.000  110.000  3.080  3.215  19.440  1.000  0.000  3.000  1.000
Hornet Sportabout  18.700  8.000  360.000  175.000  3.150  3.440  17.020  0.000  0.000  3.000  2.000
Valiant            18.100  6.000  225.000  105.000  2.760  3.460  20.220  1.000  0.000  3.000  1.000

tidy way to transpose a dataframe/tibble

mtcars %>%
  head() %>% 
  tibble::rownames_to_column() %>% # may or may not be needed
  tidyr::pivot_longer(-rowname,
                      names_to = "var", 
                      values_to = "value") %>% 
  tidyr::pivot_wider(names_from = "rowname", 
                     values_from = "value")
# A tibble: 11 x 7
   var   `Mazda RX4` `Mazda RX4 Wag` `Datsun 710` `Hornet 4 Drive`
   <chr>       <dbl>           <dbl>        <dbl>            <dbl>
 1 mpg         21              21           22.8             21.4 
 2 cyl          6               6            4                6   
 3 disp       160             160          108              258   
 4 hp         110             110           93              110   
 5 drat         3.9             3.9          3.85             3.08
 6 wt           2.62            2.88         2.32             3.22
 7 qsec        16.5            17.0         18.6             19.4 
 8 vs           0               0            1                1   
 9 am           1               1            1                0   
10 gear         4               4            4                3   
11 carb         4               4            1                1   
# … with 2 more variables: Hornet Sportabout <dbl>, Valiant <dbl>

pull out the nth string after a string split

purrr::map_chr(stringr::str_split(string_vec, "pattern"), n) # general form: string_vec, "pattern" and n are placeholders
purrr::map_chr(stringr::str_split(c("chr1","chr2","chr3"), "chr"), 2) # you would get back c("1","2","3")
[1] "1" "2" "3"

Bash

Configuration

.bashrc or .zshrc

Bash is a common UNIX command-line shell, but on macOS the default shell is zsh.

To find out which shell you are using enter this command:

echo $0
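As a small sketch of what that command reports: $0 holds the name of the running shell (or the script name, if you put this in a script), while $SHELL holds the login shell configured for your account. The two can differ, for example if you start bash from inside zsh. The variable name current_shell is just for illustration.

```shell
# $0 is the running shell or script; a leading "-" marks a login shell, so strip it
current_shell="${0#-}"
echo "Running: ${current_shell##*/}"

# $SHELL is the login shell set for your account,
# which is not necessarily the shell you are typing in right now
echo "Login shell: ${SHELL:-unknown}"
```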

Creating your .bashrc or .zshrc

Bash profile for login

.bashrc is the common file that controls your bash setup and can usually be found at ~/.bashrc. Some systems (such as macOS) also have a .bash_profile file. If your system uses .bash_profile, you can make it refer to .bashrc by having this as the contents of .bash_profile:

[[ -r ~/.bashrc ]] && . ~/.bashrc
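If the one-liner looks cryptic, the same idea can be written out in full. This is only a sketch, and the function name source_if_readable is my own invention rather than anything built into bash.

```shell
# Hypothetical helper: run a settings file in the current shell,
# but only when it exists and is readable (-r)
source_if_readable() {
  if [ -r "$1" ]; then
    . "$1"   # "." does the same job as "source"
  fi
}
```

In .bash_profile you would then call it as source_if_readable ~/.bashrc. A missing file is silently skipped rather than producing an error.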

In the .bashrc file it is useful to set a customised prompt, set variables that are useful - e.g. PATH to define where bash looks for installed software - and set up some custom commands (aliases) to make common tasks easier.
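As one example of setting a variable in .bashrc, here is a minimal sketch that adds a personal directory to PATH. The directory ~/bin and the variable name my_bin are assumptions for illustration; use wherever you keep your own tools.

```shell
# Hypothetical directory holding your own scripts and installed tools
my_bin="$HOME/bin"

# Prepend it to PATH only if it is not already there,
# so re-sourcing .bashrc does not keep making PATH longer
case ":$PATH:" in
  *":$my_bin:"*) ;;                      # already on PATH, do nothing
  *) export PATH="$my_bin:$PATH" ;;
esac
```

Putting your directory first means your own versions of tools are found before the system ones.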

Custom prompt

Creating your own prompt in bash can be much more useful than a plain $. http://ezprompt.net provides a nice way of designing your prompt and gives you the code to add to your .bashrc.
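To give a feel for what that code looks like, here is a minimal sketch of a prompt definition. The layout is just one I picked for illustration; these backslash escapes are bash-specific (zsh uses a different syntax).

```shell
# \u = user name, \h = host name, \w = current working directory,
# \$ = "#" when you are root and "$" otherwise
PS1='\u@\h:\w\$ '
```

With this in your .bashrc, a prompt would look something like murray@server:~/projects$ instead of a bare $.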

Things you might want to do:

Exported variables

Exporting variables is a useful way of defining environment settings. Often this means setting a bash variable to tell programs where to look for things. This website has a few examples of bash variables (https://www.thegeekstuff.com/2010/08/bash-shell-builtin-commands/).

It is useful to include the RSTUDIO_PANDOC variables below.

Rmarkdown Pandoc

A useful one on the server is defining where R is going to look for pandoc when compiling RMarkdown documents.

I have the following in my .bashrc file:

export RSTUDIO_PANDOC=/usr/lib/rstudio/bin/pandoc

But the location is likely different on your computer. In R, use the command rmarkdown::find_pandoc() to find out where the RStudio version of pandoc is located.

Setting this in your .bashrc is important because there might be another instance of pandoc available on your PATH, which might cause issues if you run R from the command line. RSTUDIO_PANDOC is the environment variable that the rmarkdown package checks if you want to customise which pandoc is used.
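If you share one .bashrc across several machines, a guarded version can be handy. This is only a sketch: it reuses the path from above (which may well not exist on your machine), and pandoc_dir is just an illustrative name.

```shell
# Path copied from the example above; it is likely different on your computer
pandoc_dir="/usr/lib/rstudio/bin/pandoc"

# Only export the variable when the directory really exists,
# so a stale path is never exported on machines without RStudio
if [ -d "$pandoc_dir" ]; then
  export RSTUDIO_PANDOC="$pandoc_dir"
fi
```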

Better bash history

Bash records your history as it goes, but if you are working across multiple windows it doesn’t behave the way you would hope: the saved history ends up coming from a single session, even if you worked in several. PROMPT_COMMAND is a bash variable whose contents are run as a command just before each prompt is displayed, i.e. after every command you run. This particular one writes each command (unless run as root), with a date and time stamp and the working directory, into a daily log file. The logs live in ~/.logs/, so this directory needs to exist before the command will work; create it with mkdir -p ~/.logs.

export PROMPT_COMMAND='if [ "$(id -u)" -ne 0 ]; then echo "$(date "+%Y-%m-%d.%H:%M:%S") $(pwd) $(history 1)" >> ~/.logs/bash-history-$(date "+%Y-%m-%d").log; fi'
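The one-liner is dense, so here is a sketch of the same idea written as a function, which is easier to read and edit. The function name log_last_command and the optional LOG_DIR override are my own additions for illustration; the log format matches the one-liner.

```shell
# Hypothetical function name; LOG_DIR is an optional override (an assumption),
# falling back to the ~/.logs directory used above
log_last_command() {
  local log_dir="${LOG_DIR:-$HOME/.logs}"
  # Do nothing when running as root
  if [ "$(id -u)" -eq 0 ]; then
    return 0
  fi
  mkdir -p "$log_dir"   # make sure the log directory exists
  # One line per command: timestamp, working directory, last history entry
  printf '%s %s %s\n' "$(date '+%Y-%m-%d.%H:%M:%S')" "$PWD" "$(history 1)" \
    >> "$log_dir/bash-history-$(date '+%Y-%m-%d').log"
}

# Ask bash to run the function just before each prompt is displayed
PROMPT_COMMAND='log_last_command'
```

Because the function creates the directory itself, this version does not need the separate mkdir step.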

If I want to search my logs I can use grep <command> ~/.logs/* and it will tell me all the times and directories I ran a command, and how I ran it. The history in these log files is made up of all commands you run on the computer, regardless of how many terminal windows you have open.
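If you search the logs often, the grep call can be wrapped in a small function. This is a sketch under assumptions: search_logs and the optional LOG_DIR override are names I made up, and it expects the daily .log files described above.

```shell
# Hypothetical helper: search every daily log for a command or phrase
search_logs() {
  # -h hides the file names (each line already carries its own timestamp);
  # -- stops grep treating a search term that starts with "-" as an option
  grep -h -- "$1" "${LOG_DIR:-$HOME/.logs}"/*.log
}
```

For example, search_logs samtools would list every logged samtools command along with when and where it was run.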

Aliases

If you have commands that you’re always typing out such as ls -lrth it can be useful to create an alias for the command that is shorter and easier to type out.

e.g.

alias ll="ls -lrth"

Aliases can be written in your .bashrc so that they are available in new sessions. They can be very useful, but remember they are only available on machines where you have been able to customise your .bashrc.
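A short sketch of managing aliases within a session: defining them, checking what one expands to, and removing one again. The second alias (la) is a made-up example, not one from my own setup.

```shell
# The alias from above, plus a hypothetical second one
alias ll="ls -lrth"
alias la="ls -A"     # hypothetical: also list hidden files

# Print the definition of an alias to check what it expands to
alias ll

# Remove an alias for the rest of the session
unalias la
```

Running alias with no arguments lists every alias currently defined, which is a quick way to see what a machine already has set up.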

Python

Virtual environments

- conda
  - conda create
  - conda activate
  - conda deactivate