Dunedin

11-13 February, 2019

Registration is closed

Timetable

Please check back before the event for updates.

Install instructions for each of the workshop sessions are HERE.

Venue

The event will be at the University of Otago, with each day starting in the Castle 1 lecture theatre.

Talks will be held in the Castle 1 lecture theatre, and workshops will be at the Otago Business School.


Venue (workshops: 11-13 February)

The workshops will be held at the Otago Business School (aka Commerce Building), University of Otago.

Exact rooms for the workshops will be announced at the event, following the first talk of each day.


Workshop sessions

Unlike previous Research Bazaars, this year workshop places will be allocated on a first-come, first-served basis on the day.

These are the intended workshop sessions, but they are subject to change. Make sure to check back before the event.

Unless specified otherwise, a laptop is required.

We have an exciting lineup of workshops, all of which will be of an introductory nature and provide a foundation for further learning.

Workshop timetable

The timetable is now set, but please check back before the event for updates. All workshops within a session run concurrently in separate rooms. Nearly all workshops will require you to bring a laptop.

Skill level description:
To convey the skill level of each workshop, the following terms describe the assumed level of prior knowledge or experience.

Beginner: Someone new to the topic, with minimal prior knowledge beyond the stated pre-requisites.

Post-beginner: Assumes some prior knowledge or experience beyond the specific topic covered in the workshop.

Install instructions for each of the workshop sessions are HERE.

Monday 11th February

Session 1:

Good data organisation is the foundation of any research project. Spreadsheets are commonly used to store data, and we tend to lay out spreadsheets in the ways that we as humans want to work with the data. Computers, however, require data to be organised in particular ways, so to use tools that make computation more efficient, such as the programming languages R or Python, we need to structure our data the way computers need it.

This session will cover the best practices for using spreadsheets with data. This will include:

  • Learning about the "Tidy data" principles
  • How to organise data according to "Tidy data" principles
  • Dealing with dates
  • Exporting data for use with other tools
This session would be useful for:
  • people who collect data
  • people who want to begin analysing data

Targeted skill level: beginner

Software required: a spreadsheet program (e.g. MS Excel or LibreOffice Calc)
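The "tidy data" idea (each variable in its own column, each observation in its own row) can be illustrated with a short sketch. The workshop itself uses spreadsheet software rather than code; the Python below, the helper name `wide_to_long` and the site counts are hypothetical, chosen only to show what reshaping a human-friendly layout into a tidy one, and then exporting it, involves.

```python
import csv
import io

# Hypothetical "human-friendly" layout: one row per site, one column per year.
wide_csv = "site,2017,2018\nA,12,15\nB,7,9\n"


def wide_to_long(text, id_col, var_name, value_name):
    """Reshape a wide CSV string so that each row holds a single observation."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        for column, value in record.items():
            if column == id_col:
                continue
            # One output row per (site, year) pair: each variable gets its own column.
            rows.append({id_col: record[id_col], var_name: column, value_name: int(value)})
    return rows


tidy = wide_to_long(wide_csv, "site", "year", "count")

# Export the tidy rows as CSV text that other tools (R, Python, databases) can read.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["site", "year", "count"])
writer.writeheader()
writer.writerows(tidy)
print(out.getvalue())
```

The printed CSV has a `site,year,count` header followed by one row per site and year: `A,2017,12`, `A,2018,15`, `B,2017,7`, `B,2018,9`.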

Note-taking is a key part of research. Digital notebooks let you not only capture ideas but also enrich them with embedded images and links to reference material, and they let you modify and improve entries while keeping track of previous versions. They can also be used to promote your work. This session will cover creating a simple web-based notebook that could serve as a lab notebook or blog. By the end of this workshop, participants will have covered:
  • Creating a GitHub account
  • Creating a repository to store the notebook
  • Introduction to markdown syntax for formatting
  • Adding entries to the notebook
  • Modifying existing entries
  • Collaborating with multiple authors
This session would be useful for:
  • people who want to create a reproducible and collaborative document
  • people who want to create a digital notebook
  • people who want to collaborate on notes
  • people who write documentation

Targeted skill level: beginner

Software required: web-browser

What is data? What data might I have? Is there an easier way to do what I'm doing? These are some of the questions you may ask during your research, and this session is designed to help you start answering them. Computational research is full of jargon, which makes it difficult to know whether a particular program will solve the problems you have. But being able to put names to your problems is a powerful first step towards solving them. This lesson (based on Library Carpentry) introduces librarians and others to working with data. By the end of the lesson you will be able to:
  • define terms, phrases, and concepts in software development and data science
  • understand which tasks are best performed by a computer
  • identify and use best practices in data structures

No prior knowledge required

Targeted skill level: beginner

Software required: web-browser

This slot is reserved for a more advanced session, yet to be determined.

Session 2:
R is a programming language that is useful for data analysis, and learning R can improve the efficiency and reproducibility of your analyses. This is an introductory session on the R programming language. By the end of the session, participants can expect to understand:
  • basic R syntax
  • the components of the RStudio interface
  • how to compute basic statistics
  • where to find further help for R
Useful for:
  • Data analysis
  • Day 1: data manipulation in R
  • Day 2: functions in R
  • Day 3: data visualisation in R

Targeted skill level: beginner

Software required: R and RStudio

The power of the Unix shell comes from its reproducibility and its ability to automate and scale tasks. This is an introductory session on the Unix command line. By the end of this session, participants can expect to:
  • open the command line
  • understand how to navigate and create files and directories
  • run command-line programs
Useful for:
  • task automation
  • the Day 2 sessions on Make and Docker

Targeted skill level: beginner

Software required: macOS/Linux - Terminal (comes pre-installed). Windows - Git Bash

Have you found yourself repeating the same search, making only small changes each time? Maybe you want to know where all the occurrences of a list of words are in a document. Instead of performing each search individually, there are more efficient approaches, such as writing a single pattern that captures what your search terms have in common.
This session will cover the syntax for creating patterns (regular expressions) for searching text. By the end of the session you can expect to:
  • match text using simple patterns
  • understand basic regular expression syntax
Useful for:
  • People who want to improve their understanding of how search works
  • People who use find/replace

Targeted skill level: beginner

Software required: web-browser
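As a small illustration of one pattern standing in for several searches, the sketch below uses Python's standard `re` module. The session itself may use a different tool, and the sample sentence is made up; the point is only how optional characters let a single regular expression cover spelling and plural variants.

```python
import re

text = "The colour of the sample changed; the color chart was updated. Colours vary."

# One pattern replaces three separate searches ("colour", "color", "Colours"):
# "u?" makes the u optional, "s?" makes a trailing s optional,
# "\b" anchors at word boundaries and IGNORECASE ignores capitalisation.
pattern = re.compile(r"\bcolou?rs?\b", re.IGNORECASE)

matches = [m.group(0) for m in pattern.finditer(text)]
print(matches)  # ['colour', 'color', 'Colours']

# finditer also reports where each match starts, i.e. where the occurrences are.
for m in pattern.finditer(text):
    print(m.start(), m.group(0))
```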

Find and replace is a common text-editing task. This session will cover using regular expressions to manipulate text by matching patterns. By attending this session you can expect to:

  • extract text from files that match patterns
  • find and replace text using patterns
  • rearrange columns in files

pre-requisites: a working knowledge of navigating the file system on the command line and running command-line programs.

Targeted skill level: post-beginner

Software required: macOS/Linux - Terminal (comes pre-installed). Windows - Git Bash

Lesson link

Session 3:
Often the data we have is not in the format, or subset in the way, that we need for analysis. This session is all about manipulating your data into the formats and groupings you need for analysis in R.

By attending this session you can expect to understand how to:

  • subset data based on columns
  • filter rows by conditions
  • create new columns based on other columns
  • create data summaries
  • create columns or summaries by data groupings

pre-requisites: Introduction to R or prior experience with R

Targeted skill level: beginner

Software required: R and RStudio (please also install the tidyverse package)

This session continues directly from Introduction to unix shell.

pre-requisites: Introduction to unix shell

Targeted skill level: beginner

There are many publicly available datasets that can be used for research and to supplement data you may already have. Sites like Data.govt.nz help people discover, learn about and use open data easily, enabling informed decision-making and problem-solving for citizens and businesses alike.

By attending this session you can expect to:

  • Understand the purpose of data.govt.nz
  • Understand what open data is
  • Understand how to browse for open data sets
  • Be familiar with some of the main sources for open data

Targeted skill level: beginner

Software required: web-browser

This session is an unstructured breakout session to provide space to work or discuss together.

Tuesday 12th February

Session 4:
Knowing what you did in the past, and how it differs from what you have now, is a key part of research. Software can automate this, so you can focus on your research instead of manually keeping track of every version of your documents and scripts.
This session will cover using version control with Git to track and manage changes to scripts written in RStudio.

The session will cover:

  • setting up Git
  • adding files to be tracked
  • making and tracking changes
  • reviewing changes

pre-requisites: Introduction to R or prior experience with R

Targeted skill level: beginner

Software required: R and RStudio (please also install the tidyverse package)

This session will cover creating a workflow script to manage dependencies and outputs. After this session you should be able to:
  • understand targets and dependencies
  • create a basic scripted workflow
This session is useful for people who:
  • want to repeat a workflow with changing data

pre-requisites: Introduction to unix shell or equivalent

Targeted skill level: post-beginner

Software required: macOS/Linux - Terminal (comes pre-installed). Windows - Git Bash

This session will cover using OpenRefine for cleaning and tidying data. By the end of the session attendees should expect to:
  • load data into OpenRefine
  • perform basic data cleaning operations
  • export data from OpenRefine
This session would be useful for people who:
  • clean and organise data
  • want to apply the same data cleaning operation to multiple datasets

Targeted skill level: beginner

Software required: OpenRefine

This slot is reserved for a more advanced session, yet to be determined.

Session 5:
This session will be an introduction to creating your own functions (methods) in R. By the end of the session you should:
  • Understand how to create a function
  • Understand how to specify arguments to a function
  • Understand how to return data from a function
Useful for:
  • People who want to specify their own methods for dealing with data

pre-requisites: Introduction to R or prior experience with R

Targeted skill level: beginner

Software required: R and RStudio

This introductory session will cover creating a reproducible computational environment using Docker images. By the end of the session you can expect to:
  • obtain a pre-built Docker image
  • write a Dockerfile to build a custom container image
  • access the docker container and run a command
  • share data between the host and the container

pre-requisites: Introduction to unix shell or equivalent

Targeted skill level: post-beginner

Software required: Docker

Lesson link

This presentation will cover general best-practice principles for the management, storage and sharing of research data. It will include practical tips for improving data management that can be applied immediately, regardless of the type of data. Attendees will come away better prepared to respond to university, employer, funder and/or publisher data requirements.

Targeted skill level: all

Software required: none

This session will cover saving commands used in the Unix shell into scripts for reuse.

pre-requisites: Introduction to unix shell or equivalent

Targeted skill level: beginner

Software required: macOS/Linux - Terminal (comes pre-installed). Windows - Git Bash

Session 6:
This session will cover combining Markdown syntax and R code to create reproducible documents.

suggested prior knowledge: Introduction to R or prior experience with R

Targeted skill level: beginner

Software required: R and RStudio

This session continues directly from Reproducible computational environments using containers.

pre-requisites: Reproducible computational environments using containers

Good data organisation is the foundation of any research project. Spreadsheets are commonly used to store data, and we tend to lay out spreadsheets in the ways that we as humans want to work with the data. Computers, however, require data to be organised in particular ways, so to use tools that make computation more efficient, such as the programming languages R or Python, we need to structure our data the way computers need it.

This session is very similar to "Best practices for data organisation in spreadsheets" on Monday.

This session will cover the best practices for using spreadsheets with data. This will include:

  • Learning about the "Tidy data" principles
  • How to organise data according to "Tidy data" principles
  • Dealing with dates
  • Exporting data for use with other tools
This session would be useful for:
  • people who collect data
  • people who want to begin analysing data

Targeted skill level: beginner

Software required: a spreadsheet program (e.g. MS Excel or LibreOffice Calc)

This slot is reserved for a more advanced session, yet to be determined.

Wednesday 13th February

Session 7:
This session will cover how to make plots from data in R using the ggplot2 package. Participants can expect to learn how to:
  • specify the data to be visualised
  • create scatter plots, line graphs, bar plots, box-and-whisker plots and histograms
  • customise the default themes

pre-requisites: Introduction to R or prior experience with R

Targeted skill level: beginner

Software required: R and RStudio (please also install the tidyverse package)

This session will be an introduction to querying databases. After this session participants can expect to:
  • understand what a relational database is
  • create basic queries for choosing columns
  • create basic queries for filtering data
  • create queries for summarising data based on groups

Targeted skill level: beginner

Software required: SQLite and DB Browser for SQLite (http://sqlitebrowser.org)
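To show what the three kinds of query in this session (choosing columns, filtering rows, summarising by group) look like, here is a sketch using Python's built-in `sqlite3` module against an in-memory SQLite database. The workshop itself works with SQLite directly rather than through Python, and the `surveys` table and its values are hypothetical.

```python
import sqlite3

# In-memory database with a small, made-up table of survey observations.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE surveys (site TEXT, species TEXT, weight REAL)")
conn.executemany(
    "INSERT INTO surveys VALUES (?, ?, ?)",
    [("A", "rat", 40.0), ("A", "mouse", 20.0), ("B", "rat", 44.0), ("B", "rat", 36.0)],
)

# Choosing columns (SELECT) and filtering rows by a condition (WHERE).
heavy = conn.execute(
    "SELECT site, species FROM surveys WHERE weight > 30 ORDER BY site"
).fetchall()

# Summarising by group: number of rows and mean weight per species.
summary = conn.execute(
    "SELECT species, COUNT(*), AVG(weight) FROM surveys GROUP BY species ORDER BY species"
).fetchall()

print(heavy)    # [('A', 'rat'), ('B', 'rat'), ('B', 'rat')]
print(summary)  # [('mouse', 1, 20.0), ('rat', 3, 40.0)]
conn.close()
```

The same `SELECT ... WHERE` and `GROUP BY` statements can be typed unchanged into an SQLite shell or a graphical SQLite browser.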

GitHub provides an online way to collaborate on, and track changes to, plain-text files such as Markdown. Markdown is a simple plain-text syntax for formatting a document, which can then be converted into multiple formats such as HTML, Word or PDF.
This session will cover getting started with GitHub and the basic Markdown syntax for creating formatted documents that can be converted into a website. This will include:
  • Creating a GitHub account
  • Creating a repository
  • Introduction to markdown syntax for formatting
  • Creating a simple webpage with markdown
  • Modifying pages and tracking changes
  • How to use GitHub for collaboration

Targeted skill level: beginner

Software required: web-browser

This will be an unstructured session, with content based on participants' interests.

Session 8:
This session will cover how to take your own functions in R and turn them into R packages to improve maintainability.

suggested prior knowledge: Introduction to R or prior experience with R

Targeted skill level: post-beginner

Software required: R and RStudio

This session will continue directly from Introduction to getting data from databases.

pre-requisites: Introduction to getting data from databases

This session continues directly from Introduction to GitHub.

pre-requisites: Introduction to GitHub

This will be an unstructured session, with content based on participants' interests.


Contact