OER Commons

Unrestricted Use

CC BY

Data Wrangling and Processing for Genomics

A Data Carpentry lesson on using command-line tools to perform quality control, align reads to a reference genome, and identify and visualize between-sample variation.

A lot of genomics analysis is done using command-line tools for three reasons: 1) you will often be working with a large number of files, and working through the command line rather than through a graphical user interface (GUI) allows you to automate repetitive tasks; 2) you will often need more compute power than is available on your personal computer, and connecting to and interacting with remote computers requires a command-line interface; and 3) you will often need to customize your analyses, and command-line tools often enable more customization than the corresponding GUI tools (if a GUI tool exists at all).

In a previous lesson, you learned how to use the bash shell to interact with your computer through a command-line interface. In this lesson, you will apply that knowledge to carry out a common genomics workflow: identifying variants among sequencing samples taken from multiple individuals within a population. We will start with a set of sequenced reads (.fastq files), perform some quality control steps, align those reads to a reference genome, and end by identifying and visualizing variations among these samples.

As you progress through this lesson, keep in mind that, even if you aren’t going to be doing this same workflow in your research, you will be learning important lessons about using command-line bioinformatics tools. What you learn here will enable you to use a variety of bioinformatics tools with confidence and greatly enhance your research efficiency and productivity.

Subject: Applied Science; Computer Science; Genetics; Information Science; Life Science; Mathematics; Measurement and Data
Material Type: Module
Provider: The Carpentries
Author: Adam Thomas; Ahmed R. Hasan; Aniello Infante; Anita Schürch; Dev Paudel; Erin Alison Becker; Fotis Psomopoulos; François Michonneau; Gaius Augustus; Gregg TeHennepe; Jason Williams; Jessica Elizabeth Mizzi; Karen Cranston; Kari L Jordan; Kate Crosby; Kevin Weitemier; Lex Nederbragt; Luis Avila; Peter R. Hoyt; Rayna Michelle Harris; Ryan Peek; Sheldon John McKay; Sheldon McKay; Taylor Reiter; Tessa Pierce; Toby Hodges; Tracy Teal; Vasilis Lenis; Winni Kretzschmar; dbmarchant
Date Added: 08/07/2020

Unrestricted Use

CC BY

Introduction to R for Geospatial Data

The goal of this lesson is to provide an introduction to R for learners working with geospatial data. It is intended as a prerequisite for the R for Raster and Vector Data lesson for learners who have no prior experience using R. This lesson can be taught in approximately 4 hours and covers the following topics: working with R in the RStudio GUI; project management and file organization; importing data into R; an introduction to R’s core data types and data structures; manipulation of data frames (tabular data) in R; an introduction to visualization; and writing data to a file. The R for Raster and Vector Data lesson provides a more in-depth introduction to visualization (focusing on geospatial data) and to working with data structures unique to geospatial data.

Subject: Applied Science; Computer Science; Information Science; Mathematics; Measurement and Data
Material Type: Module
Provider: The Carpentries
Author: Anne Fouilloux; Chris Prener; Claudia Engel; David Mawdsley; Erin Becker; François Michonneau; Ido Bar; Jeffrey Oliver; Juan Fung; Katrin Leinweber; Kevin Weitemier; Kok Ben Toh; Lachlan Deer; Marieke Frassl; Matt Clark; Miles McBain; Naupaka Zimmerman; Paula Andrea Martinez; Preethy Nair; Raniere Silva; Rayna Harris; Richard McCosh; Vicken Hillis; butterflyskip
Date Added: 08/07/2020

Unrestricted Use

CC BY

Project Organization and Management for Genomics

A Data Carpentry Genomics workshop lesson on how to structure your metadata, organize and document your genomics data and bioinformatics workflow, and access data in the NCBI Sequence Read Archive (SRA) database.

Good data organization is the foundation of any research project. It not only sets you up well for an analysis, but it also makes it easier to come back to the project later and share it with collaborators, including your most important collaborator - future you. Organizing a project that includes sequencing involves many components: the experimental setup and conditions metadata, measurements of experimental parameters, sequencing preparation and sample information, the sequences themselves, and the files and workflow of any bioinformatics analysis. So much of the information in a sequencing project is digital, and we need to keep track of our digital records in the same way we keep a lab notebook and sample freezer.

In this lesson, we’ll go through the project organization and documentation that will make an efficient bioinformatics workflow possible. Not only will this make you a more effective bioinformatics researcher, it also prepares your data and project for publication, as grant agencies and publishers increasingly require this information.

In this lesson, we’ll be using data from a study of experimental evolution using E. coli. More information about this dataset is available here. In this study there are several types of files: spreadsheet data from the experiment that tracks the strains and their phenotype over time; spreadsheet data with information on the samples that were sequenced (the names of the samples, how they were prepared and the sequencing conditions); and the sequence data. Throughout the analysis, we’ll also generate files from the steps in the bioinformatics pipeline and documentation on the tools and parameters that we used.

In this lesson you will learn: how to structure your metadata, tabular data and information about the experiment (the metadata is the information about the experiment and the samples you’re sequencing); how to prepare for, understand, organize and store the sequencing data that comes back from the sequencing center; how to access and download publicly available data that may need to be used in your bioinformatics analysis; and the concepts of organizing the files and documenting the workflow of your bioinformatics analysis.

Subject: Business and Communication; Genetics; Life Science; Management
Material Type: Module
Provider: The Carpentries
Author: Amanda Charbonneau; Bérénice Batut; Daniel O. S. Ouso; Deborah Paul; Erin Alison Becker; François Michonneau; Jason Williams; Juan A. Ugalde; Kevin Weitemier; Laura Williams; Paula Andrea Martinez; Peter R. Hoyt; Rayna Michelle Harris; Taylor Reiter; Toby Hodges; Tracy Teal
Date Added: 08/07/2020

Unrestricted Use

CC BY

R for Reproducible Scientific Analysis

This lesson is part of a Software Carpentry workshop and is an introduction to R for non-programmers using gapminder data. The goal of this lesson is to teach novice programmers to write modular code and best practices for using R for data analysis. R is commonly used in many scientific disciplines for statistical analysis and for its array of third-party packages. We find that many scientists who come to Software Carpentry workshops use R and want to learn more. The emphasis of these materials is to give attendees a strong foundation in the fundamentals of R, and to teach best practices for scientific computing: breaking down analyses into modular units, task automation, and encapsulation. Note that this workshop will focus on teaching the fundamentals of the programming language R, and will not teach statistical analysis. The lesson contains more material than can be taught in a day. The instructor notes page has some suggested lesson plans suitable for a one-day or half-day workshop. A variety of third-party packages are used throughout this workshop. These are not necessarily the best, nor are they comprehensive, but they are packages we find useful, and have been chosen primarily for their usability.

Subject: Applied Science; Computer Science; Information Science; Mathematics; Measurement and Data
Material Type: Module
Provider: The Carpentries
Author: Adam H. Sparks; Ahsan Ali Khoja; Amy Lee; Ana Costa Conrado; Andrew Boughton; Andrew Lonsdale; Andrew MacDonald; Andris Jankevics; Andy Teucher; Antonio Berlanga-Taylor; Ashwin Srinath; Ben Bolker; Bill Mills; Bret Beheim; Clare Sloggett; Daniel; Dave Bridges; David J. Harris; David Mawdsley; Dean Attali; Diego Rabatone Oliveira; Drew Tyre; Elise Morrison; Erin Alison Becker; Fernando Mayer; François Michonneau; Giulio Valentino Dalla Riva; Gordon McDonald; Greg Wilson; Harriet Dashnow; Ido Bar; Jaime Ashander; James Balamuta; James Mickley; Jamie McDevitt-Irwin; Jeffrey Arnold; Jeffrey Oliver; John Blischak; Jonah Duckles; Josh Quan; Julia Piaskowski; Kara Woo; Kate Hertweck; Katherine Koziar; Katrin Leinweber; Kellie Ottoboni; Kevin Weitemier; Kiana Ashley West; Kieran Samuk; Kunal Marwaha; Kyriakos Chatzidimitriou; Lachlan Deer; Lex Nederbragt; Liz Ing-Simmons; Lucy Chang; Luke W Johnston; Luke Zappia; Marc Sze; Marie-Helene Burle; Marieke Frassl; Mark Dunning; Martin John Hadley; Mary Donovan; Matt Clark; Melissa Kardish; Mike Jackson; Murray Cadzow; Narayanan Raghupathy; Naupaka Zimmerman; Nelly Sélem; Nicholas Lesniak; Nicholas Potter; Nima Hejazi; Nora Mitchell; Olivia Rata Burge; Paula Andrea Martinez; Pete Bachant; Phil Bouchet; Philipp Boersch-Supan; Piotr Banaszkiewicz; Raniere Silva; Rayna Michelle Harris; Remi Daigle; Research Bazaar; Richard Barnes; Robert Bagchi; Rémi Emonet; Sam Penrose; Sandra Brosda; Sarah Munro; Sasha Lavrentovich; Scott Allen Funkhouser; Scott Ritchie; Sebastien Renaut; Thea Van Rossum; Timothy Eoin Moore; Timothy Rice; Tobin Magle; Trevor Bekolay; Tyler Crawford Kelly; Vicken Hillis; Yuka Takemon; bippuspm; butterflyskip; waiteb5
Date Added: 03/20/2017
