All resources in Educators

Degrees of Freedom in Planning, Running, Analyzing, and Reporting Psychological Studies: A Checklist to Avoid p-Hacking

The designing, collecting, analyzing, and reporting of psychological studies entail many choices that are often arbitrary. The opportunistic use of these so-called researcher degrees of freedom aimed at obtaining statistically significant results is problematic because it enhances the chances of false positive results and may inflate effect size estimates. In this review article, we present an extensive list of 34 degrees of freedom that researchers have in formulating hypotheses, and in designing, running, analyzing, and reporting of psychological research. The list can be used in research methods education, and as a checklist to assess the quality of preregistrations and to determine the potential for bias due to (arbitrary) choices in unregistered studies.

Material Type: Reading

Authors: Coosje L. S. Veldkamp, Hilde E. M. Augusteijn, Jelte M. Wicherts, Marcel A. L. M. van Assen, Marjan Bakker, Robbie C. M. van Aert

Library Carpentry: Tidy data for Librarians

Tidy data for librarians: Library Carpentry’s aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time, and with less pain. The lessons below were designed for those interested in working with library data in spreadsheets.

Material Type: Module

Authors: Alex Volkov, Annelise Sklar, Belinda Weaver, Christopher Erdmann, erikamias, Erin Alison Becker, Francois Michonneau, Jacqueline Frisina, James Baker, Jeffrey Oliver, Jez Cope, Ken Lacey, Niamh Wallace, Phil Reed, Scott Carl Peterson, Serah Anne Njambi Kiburu, Sherry Lake, Thea Atwood, Tim Dennis, yvonnemery

The Unix Shell

Software Carpentry lesson on how to use the shell to navigate the filesystem and write simple loops and scripts. The Unix shell has been around longer than most of its users have been alive. It has survived so long because it’s a power tool that allows people to do complex things with just a few keystrokes. More importantly, it helps them combine existing programs in new ways and automate repetitive tasks so they aren’t typing the same things over and over again. Use of the shell is fundamental to using a wide range of other powerful tools and computing resources (including “high-performance computing” supercomputers). These lessons will start you on a path towards using these resources effectively.

Material Type: Module

Authors: Adam Huffman, Adam James Orr, Adam Richie-Halford, AidaMirsalehi, Alexander Konovalov, Alexander Morley, Alex Kassil, Alex Mac, Alix Keener, Amy Brown, Andrea Bedini, Andrew Boughton, Andrew Reid, Andrew T. T. McRae, Andrew Walker, Ariel Rokem, Armin Sobhani, Ashwin Srinath, Bagus Tris Atmaja, Bartosz Telenczuk, Ben Bolker, Benjamin Gabriel, Bertie Seyffert, Bill Mills, Brian Ballsun-Stanton, BrianBill, Camille Marini, Chris Mentzel, Christina Koch, Colin Morris, Colin Sauze, csqrs, Damien Irving, Dana Brunson, Daniel Baird, Danielle M. Nielsen, Daniel McCloy, Daniel Standage, Dan Jones, Dave Bridges, David Eyers, David McKain, David Vollmer, Dean Attali, Devinsuit, Dmytro Lituiev, Donny Winston, Doug Latornell, Dustin Lang, earkpr, ekaterinailin, Elena Denisenko, Emily Dolson, Emily Jane McTavish, Eric Jankowski, Erin Alison Becker, Ethan P White, Evgenij Belikov, Farah Shamma, Fatma Deniz, Filipe Fernandes, Francis Gacenga, François Michonneau, Gabriel A. Devenyi, Gerard Capes, Giuseppe Profiti, Greg Wilson, Halle Burns, Hannah Burkhardt, Harriet Alexander, Hugues Fontenelle, Ian van der Linde, Inigo Aldazabal Mensa, Jackie Milhans, Jake Cowper Szamosi, James Guelfi, Jan T. Kim, Jarek Bryk, Jarno Rantaharju, Jason Macklin, Jay van Schyndel, Jens vdL, John Blischak, John Pellman, John Simpson, Jonah Duckles, Jonny Williams, Joshua Madin, Kai Blin, Kathy Chung, Katrin Leinweber, Kevin M. Buckley, Kirill Palamartchouk, Klemens Noga, Kristopher Keipert, Kunal Marwaha, Laurence, Lee Zamparo, Lex Nederbragt, Mahdi Sadjadi, Marcel Stimberg, Marc Rajeev Gouw, Maria Doyle, Marie-Helene Burle, Marisa Lim, Mark Mandel, Martha Robinson, Martin Feller, Matthew Gidden, Matthew Peterson, M Carlise, Megan Fritz, Michael Zingale, Mike Henry, Mike Jackson, Morgan Oneka, Murray Hoggett, Nicolas Barral, Nicola Soranzo, Noah D Brenowitz, Noam Ross, Norman Gray, nther, Orion Buske, Owen Kaluza, Patrick McCann, Paul Gardner, Pauline Barmby, Peter R. Hoyt, Peter Steinbach, Philip Lijnzaad, Phillip Doehle, Piotr Banaszkiewicz, Rafi Ullah, Raniere Silva, Rémi Emonet, reshama shaikh, Robert A Beagrie, Ruud Steltenpool, Ry4an Brase, Sarah Mount, Sarah Simpkin, s-boardman, Scott Ritchie, sjnair, Stéphane Guillou, Stephan Schmeing, Stephen Jones, Stephen Turner, Steve Leak, Susan Miller, Thomas Mellan, Tim Keighley, Tobin Magle, Tom Dowrick, Trevor Bekolay, Varda F. Hagh, Victor Koppejan, Vikram Chhatre, Yee Mey

Library Carpentry: OpenRefine

Library Carpentry lesson: an introduction to OpenRefine for Librarians. This Library Carpentry lesson introduces people working in library- and information-related roles to working with data in OpenRefine. At the conclusion of the lesson you will understand what the OpenRefine software does and how to use it to work with data files.

Material Type: Module

Authors: Alexander Mendes, andreamcastillo, Anna Neatrour, Antonin Delpeuch, Betty Rozum, Christina Koch, Christopher Erdmann, Daniel Bangert, dnesdill, Elizabeth Lisa McAulay, Evan Williamson, hauschke, Jamene Brooks-Kieffer, James Baker, Jamie Jamison, Jeffrey Oliver, Katherine Koziar, mhidas, Naupaka Zimmerman, Paul R. Pival, Rémi Emonet, Tim Dennis, Tom Honeyman, Tracy Teal

Library Carpentry: SQL

Library Carpentry: an introduction to SQL for Librarians. This Library Carpentry lesson introduces librarians to relational database management systems using SQLite. At the conclusion of the lesson you will: understand what SQLite does; use SQLite to summarise and link data.

Material Type: Module

Authors: 222064h, Anna-Maria Sichani, Belinda Weaver, Christopher Erdmann, Dan Michael Heggø, David Kane, Elaine Wong, Emanuele Lanzani, Fernando Rios, Jamene Brooks-Kieffer, James Baker, Janice Chan, Jeffrey Oliver, Katrin Leinweber, Kunal Marwaha, mdschleu, orobecca, Reid Otsuji, Ruud Steltenpool, thegsi, Tim Dennis

Library Carpentry: The UNIX Shell

Library Carpentry lesson to learn how to use the Shell. This Library Carpentry lesson introduces librarians to the Unix Shell. At the conclusion of the lesson you will: understand the basics of the Unix shell; understand why and how to use the command line; use shell commands to work with directories and files; use shell commands to find and manipulate data.

Material Type: Module

Authors: Adam Huffman, Alexander Konovalov, Alexander Morley, Alex Kassil, Alex Mendes, Ana Costa Conrado, Andrew Reid, Andrew T. T. McRae, Ariel Rokem, Ashwin Srinath, Bagus Tris Atmaja, Belinda Weaver, Benjamin Bolker, Benjamin Gabriel, BertrandCaron, Brian Ballsun-Stanton, Christopher Erdmann, Christopher Mentzel, colinmorris, Colin Sauze, csqrs, Dan Michael Heggø, Dave Bridges, David McKain, Dmytro Lituiev, earkpr, ekaterinailin, Elena Denisenko, Eric Jankowski, Erin Alison Becker, Evan Williamson, Farah Shamma, Gabriel Devenyi, Gerard Capes, Giuseppe Profiti, Halle Burns, Hannah Burkhardt, hugolio, Ian Lessing, Ian van der Linde, Jake Cowper Szamosi, James Baker, James Guelfi, Jarno Rantaharju, Jarosław Bryk, Jason Macklin, Jeffrey Oliver, jenniferleeucalgary, John Pellman, Jonah Duckles, Jonny Williams, Katrin Leinweber, Kevin M. Buckley, Kunal Marwaha, Laurence, Marc Gouw, Marie-Helene Burle, Marisa Lim, Martha Robinson, Martin Feller, Megan Fritz, Michael Lascarides, Michael Zingale, Michele Hayslett, Mike Henry, Morgan Oneka, Murray Hoggett, Nicolas Barral, Nicola Soranzo, Noah D Brenowitz, Owen Kaluza, Patrick McCann, Peter Hoyt, Rafi Ullah, Raniere Silva, Rémi Emonet, reshama shaikh, Ruud Steltenpool, sjnair, Stéphane Guillou, Stephan Schmeing, Stephen Jones, Stephen Leak, Susan J Miller, Thomas Mellan, Tim Dennis, Tom Dowrick, Travis Lilleberg, Victor Koppejan, Vikram Chhatre, Yee Mey

Library Carpentry: Introduction to Git

Library Carpentry lesson: An introduction to Git. What we will try to do: begin to understand and use Git/GitHub. You will not be an expert by the end of the class. You will probably not even feel very comfortable using Git. This is okay. We want to make a start but, as with any skill, using Git takes practice. Be excellent to each other: if you spot someone in the class who is struggling with something and you think you know how to help, please give them a hand. Try not to do the task for them: instead explain the steps they need to take and what these steps will achieve. Be patient with the instructor and yourself: this is a big group, with different levels of knowledge and different computer systems. This isn’t your instructor’s full-time job (though if someone wants to pay them to play with computers all day they’d probably accept). They will do their best to make this session useful. This is your session. If you feel we are going too fast, then please put up a pink sticky. We can decide as a group what to cover.

Material Type: Module

Authors: 222064h, abracarambar, ajtag, Alexander Gary Zimmerman, Alexander Mendes, Alex Mendes, Amiya Maji, Amy Olex, Andrew Lonsdale, Annika Rockenberger, Begüm D. Topçuoğlu, Belinda Weaver, Benjamin Bolker, Bill McMillin, Brian Moore, butterflyskip, Casey Youngflesh, Christopher Erdmann, Christoph Junghans, cmjt, Dan Michael O. Heggø, David Jennings, DSTraining, Erin Alison Becker, Evan Williamson, Garrett Bachant, Grant Sayer, hdinkel, Ian Lee, Jake Lever, Jamene Brooks-Kieffer, James Baker, James E McClure, James O'Donnell, James Tocknell, Janoš Vidali, Jeffrey Oliver, Jeremy Teitelbaum, Jeyashree Krishnan, João Rodrigues, Joe Atzberger, Jonah Duckles, Jonathan Cooper, jonestoddcm, Katherine Koziar, Katrin Leinweber, Kunal Marwaha, Kurt Glaesemann, Lauren Ko, L.C. Karssen, Lex Nederbragt, Madicken Munk, Maneesha Sane, Marie-Helene Burle, Mark Woodbridge, Martino Sorbaro, Matt Critchlow, Matteo Ceschia, Matthew Bourque, Matthew Hartley, Maxim Belkin, Megan Potterbusch, Michael Torpey, Michael Zingale, Mingsheng Zhang, Nicola Soranzo, Nima Hejazi, Nora McGregor, Oscar Arbeláez, Peace Ossom Williamson, pllim, Raniere Silva, Rayna Harris, Rémi Emonet, Rene Gassmoeller, Richard Barnes, Rich McCue, Ruud Steltenpool, Ryan Wick, Samniqueka Halsey, Samuel Lelièvre, Sarah Stevens, Saskia Hiltemann, Schlauch, Tobias, Scott Bailey, Shari Laster, Simon Waldman, Stefan Siegert, Thea Atwood, Thomas Morrell, Tim Dennis, Tommy Keswick, Tracy Teal, Trevor Keller, TrevorLeeCline, Tyler Crawford Kelly, Tyler Reddy, Umihiko Hoshijima, Veronica Ikeshoji-Orlati, Wes Harrell, William Sacks, Will Usher, Wolmar Nyberg Åkerström, Yuri

Social Science Workshop Overview

Workshop overview for the Data Carpentry Social Sciences curriculum. Data Carpentry’s aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time, and with less pain. This workshop teaches data management and analysis for social science research including best practices for data organization in spreadsheets, reproducible data cleaning with OpenRefine, and data analysis and visualization in R. This curriculum is designed to be taught over two full days of instruction. Materials for teaching data analysis and visualization in Python and extraction of information from relational databases using SQL are in development. Interested in teaching these materials? We have an onboarding video and accompanying slides available to prepare Instructors to teach these lessons. After watching this video, please contact team@carpentries.org so that we can record your status as an onboarded Instructor. Instructors who have completed onboarding will be given priority status for teaching at centrally-organized Data Carpentry Social Sciences workshops.

Material Type: Module

Authors: Angela Li, Erin Alison Becker, Francois Michonneau, Maneesha Sane, Sarah Brown, Tracy Teal

R for Social Scientists

Data Carpentry lesson that is part of the Social Sciences curriculum. This lesson teaches how to analyse and visualise data used by social scientists. Data Carpentry’s aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time, and with less pain. The lessons below were designed for those interested in working with social sciences data in R. This is an introduction to R designed for participants with no programming experience. These lessons can be taught in a day (~6 hours). They start with some basic information about R syntax and the RStudio interface, and move through how to import CSV files, the structure of data frames, how to deal with factors, how to add/remove rows and columns, how to calculate summary statistics from a data frame, and a brief introduction to plotting.

Material Type: Module

Authors: Angela Li, Ben Marwick, Christina Maimone, Danielle Quinn, Erin Alison Becker, Francois Michonneau, Geoffrey LaFlair, Hao Ye, Jake Kaupp, Juan Fung, Katrin Leinweber, Martin Olmos, Murray Cadzow

Automation and Make

A Software Carpentry lesson to learn how to use Make. Make is a tool which can run commands to read files, process these files in some way, and write out the processed files. For example, in software development, Make is used to compile source code into executable programs or libraries, but Make can also be used to: run analysis scripts on raw data files to get data files that summarize the raw data; run visualization scripts on data files to produce plots; and parse and combine text files and plots to create papers. Make is called a build tool: it builds data files, plots, papers, programs or libraries. It can also update existing files if desired. Make tracks the dependencies between the files it creates and the files used to create these. If one of the original files (e.g. a data file) is changed, then Make knows to recreate, or update, the files that depend upon this file (e.g. a plot). There are now many build tools available, all of which are based on the same concepts as Make.
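The dependency rule described above can be illustrated with a minimal Python sketch. It is not taken from the lesson and is not how Make is implemented; it only mimics the timestamp comparison Make uses to decide whether a target is out of date. The function name `needs_rebuild` and the file names are hypothetical.

```python
import os
import tempfile


def needs_rebuild(target, dependencies):
    """Return True if `target` is missing or older than any dependency."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(dep) > target_mtime for dep in dependencies)


# Demonstration with hypothetical file names in a throwaway directory.
with tempfile.TemporaryDirectory() as workdir:
    data = os.path.join(workdir, "results.dat")  # stands in for a data file
    plot = os.path.join(workdir, "results.png")  # stands in for a plot

    for path in (data, plot):
        with open(path, "w") as handle:
            handle.write("placeholder\n")

    # Set explicit timestamps so the outcome does not depend on clock resolution.
    os.utime(data, (1000, 1000))
    os.utime(plot, (2000, 2000))
    up_to_date = not needs_rebuild(plot, [data])  # plot is newer than data

    os.utime(data, (3000, 3000))  # simulate editing the data file
    stale = needs_rebuild(plot, [data])  # plot is now older than data
```

In this sketch the plot counts as up to date until the data file's timestamp moves past it, at which point it is flagged for rebuilding, which is the behaviour the description attributes to Make.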

Material Type: Module

Authors: Adam Richie-Halford, Ana Costa Conrado, Andrew Boughton, Andrew Fraser, Andy Kleinhesselink, Andy Teucher, Anna Krystalli, Bill Mills, Brandon Curtis, David E. Bernholdt, Deborah Gertrude Digges, François Michonneau, Gerard Capes, Greg Wilson, Jake Lever, Jason Sherman, John Blischak, Jonah Duckles, Juan F Fung, Kate Hertweck, Lex Nederbragt, Luiz Irber, Matthew Thomas, Michael Culshaw-Maurer, Mike Jackson, Pete Bachant, Piotr Banaszkiewicz, Radovan Bast, Raniere Silva, Rémi Emonet, Samuel Lelièvre, Satya Mishra, Trevor Bekolay

Databases and SQL

Software Carpentry lesson that teaches how to use databases and SQL. In the late 1920s and early 1930s, William Dyer, Frank Pabodie, and Valentina Roerich led expeditions to the Pole of Inaccessibility in the South Pacific, and then onward to Antarctica. Two years ago, records of their expeditions were found in a storage locker at Miskatonic University. We have scanned and OCR'd the data they contain, and we now want to store that information in a way that will make search and analysis easy. Three common options for storage are text files, spreadsheets, and databases. Text files are easiest to create, and work well with version control, but then we would have to build search and analysis tools ourselves. Spreadsheets are good for doing simple analyses, but they don’t handle large or complex data sets well. Databases, however, include powerful tools for search and analysis, and can handle large, complex data sets. These lessons will show how to use a database to explore the expeditions’ data.

Material Type: Module

Authors: Amy Brown, Andrew Boughton, Andrew Kubiak, Avishek Kumar, Ben Waugh, Bill Mills, Brian Ballsun-Stanton, Chris Tomlinson, Colleen Fallaw, Daniel Suess, Dan Michael Heggø, Dave Welch, David W Wright, Deborah Gertrude Digges, Donny Winston, Doug Latornell, Erin Alison Becker, Ethan Nelson, Ethan P White, François Michonneau, George Graham, Gerard Capes, Gideon Juve, Greg Wilson, Ioan Vancea, Jake Lever, James Mickley, John Blischak, JohnRMoreau@gmail.com, Jonah Duckles, Jonathan Guyer, Joshua Nahum, Kate Hertweck, Kevin Dyke, lorra, Louis Vernon, Luc Small, Luke William Johnston, Maneesha Sane, Mark Stacy, Matthew Collins, Matty Jones, Mike Jackson, Morgan Taschuk, Patrick McCann, Paula Andrea Martinez, Pauline Barmby, Piotr Banaszkiewicz, Raniere Silva, Ray Bell, Rayna Michelle Harris, Rémi Emonet, Rémi Rampin, Seda Arat, Sheldon John McKay, Sheldon McKay, slimlime, Stephen Davison, Thomas Guignard, Trevor Bekolay

R for Reproducible Scientific Analysis

This lesson is part of the Software Carpentry workshops: an introduction to R for non-programmers using gapminder data. The goal of this lesson is to teach novice programmers to write modular code and best practices for using R for data analysis. R is commonly used in many scientific disciplines for statistical analysis and for its array of third-party packages. We find that many scientists who come to Software Carpentry workshops use R and want to learn more. The emphasis of these materials is to give attendees a strong foundation in the fundamentals of R, and to teach best practices for scientific computing: breaking down analyses into modular units, task automation, and encapsulation. Note that this workshop will focus on teaching the fundamentals of the programming language R, and will not teach statistical analysis. The lesson contains more material than can be taught in a day. The instructor notes page has some suggested lesson plans suitable for a one-day or half-day workshop. A variety of third-party packages are used throughout this workshop. These are not necessarily the best, nor are they comprehensive, but they are packages we find useful, and have been chosen primarily for their usability.

Material Type: Module

Authors: Adam H. Sparks, Ahsan Ali Khoja, Amy Lee, Ana Costa Conrado, Andrew Boughton, Andrew Lonsdale, Andrew MacDonald, Andris Jankevics, Andy Teucher, Antonio Berlanga-Taylor, Ashwin Srinath, Ben Bolker, Bill Mills, bippuspm, Bret Beheim, butterflyskip, Clare Sloggett, Daniel, Dave Bridges, David J. Harris, David Mawdsley, Dean Attali, Diego Rabatone Oliveira, Drew Tyre, Elise Morrison, Erin Alison Becker, Fernando Mayer, François Michonneau, Giulio Valentino Dalla Riva, Gordon McDonald, Greg Wilson, Harriet Dashnow, Ido Bar, Jaime Ashander, James Balamuta, James Mickley, Jamie McDevitt-Irwin, Jeffrey Arnold, Jeffrey Oliver, John Blischak, Jonah Duckles, Josh Quan, Julia Piaskowski, Kara Woo, Kate Hertweck, Katherine Koziar, Katrin Leinweber, Kellie Ottoboni, Kevin Weitemier, Kiana Ashley West, Kieran Samuk, Kunal Marwaha, Kyriakos Chatzidimitriou, Lachlan Deer, Lex Nederbragt, Liz Ing-Simmons, Lucy Chang, Luke W Johnston, Luke Zappia, Marc Sze, Marie-Helene Burle, Marieke Frassl, Mark Dunning, Martin John Hadley, Mary Donovan, Matt Clark, Melissa Kardish, Mike Jackson, Murray Cadzow, Narayanan Raghupathy, Naupaka Zimmerman, Nelly Sélem, Nicholas Lesniak, Nicholas Potter, Nima Hejazi, Nora Mitchell, Olivia Rata Burge, Paula Andrea Martinez, Pete Bachant, Phil Bouchet, Philipp Boersch-Supan, Piotr Banaszkiewicz, Raniere Silva, Rayna Michelle Harris, Remi Daigle, Rémi Emonet, Research Bazaar, Richard Barnes, Robert Bagchi, Sam Penrose, Sandra Brosda, Sarah Munro, Sasha Lavrentovich, Scott Allen Funkhouser, Scott Ritchie, Sebastien Renaut, Thea Van Rossum, Timothy Eoin Moore, Timothy Rice, Tobin Magle, Trevor Bekolay, Tyler Crawford Kelly, Vicken Hillis, waiteb5, Yuka Takemon

Plotting and Programming in Python

This lesson, part of the Software Carpentry workshops, is an introduction to plotting and programming in Python for people with little or no previous programming experience. It uses plotting as its motivating example, and is designed to be used in both Data Carpentry and Software Carpentry workshops. This lesson references JupyterLab, but can be taught using a regular Python interpreter as well. Please note that this lesson uses Python 3 rather than Python 2.

Material Type: Module

Authors: Adam Steer, Allen Lee, Andreas Hilboll, Ashley Champagne, Benjamin, Benjamin Roberts, CanWood, Carlos Henrique Brandt, Carlos M Ortiz Marrero, Cephalopd, Cian Wilson, Daniel W Kerchner, Dan Mønster, Daria Orlowska, Dave Lampert, David Matten, Erin Alison Becker, Florian Goth, Francisco J. Martínez, Greg Wilson, ian, Jacob Deppen, Jarno Rantaharju, Jeremy Zucker, Jonah Duckles, Kees den Heijer, Keith Gilbertson, Kyle E Niemeyer, Lex Nederbragt, Logan Cox, Louis Vernon, Lucy Dorothy Whalley, Madeleine Bonsma-Fisher, Mark Phillips, Mark Slater, Maxim Belkin, Michael Beyeler, Mike Henry, mzc9, Narayanan Raghupathy, Nigel Bosch, Olav Vahtras, Pablo Hernandez-Cerdan, Paul Anzel, Phil Tooley, Raniere Silva, Robert Woodward, Ryan Avery, Ryan Gregory James, Sarah M Brown, SBolo, Shyam Dwaraknath, Sourav Singh, Stéphane Guillou, Steven Koenig, Taylor Smith, Thor Wikfeldt, Timothy Warren, Tyler Martin, Vasu Venkateshwaran, Vikas Pejaver

Version Control with Git

This lesson is part of the Software Carpentry workshops that teach how to use version control with Git. Wolfman and Dracula have been hired by Universal Missions (a space services spinoff from Euphoric State University) to investigate if it is possible to send their next planetary lander to Mars. They want to be able to work on the plans at the same time, but they have run into problems doing this in the past. If they take turns, each one will spend a lot of time waiting for the other to finish, but if they work on their own copies and email changes back and forth things will be lost, overwritten, or duplicated. A colleague suggests using version control to manage their work. Version control is better than mailing files back and forth: Nothing that is committed to version control is ever lost, unless you work really, really hard at it. Since all old versions of files are saved, it’s always possible to go back in time to see exactly who wrote what on a particular day, or what version of a program was used to generate a particular set of results. As we have this record of who made what changes when, we know who to ask if we have questions later on, and, if needed, revert to a previous version, much like the “undo” feature in an editor. When several people collaborate in the same project, it’s possible to accidentally overlook or overwrite someone’s changes. The version control system automatically notifies users whenever there’s a conflict between one person’s work and another’s. Teams are not the only ones to benefit from version control: lone researchers can benefit immensely. Keeping a record of what was changed, when, and why is extremely useful for all researchers if they ever need to come back to the project later on (e.g., a year later, when memory has faded). Version control is the lab notebook of the digital world: it’s what professionals use to keep track of what they’ve done and to collaborate with other people. Every large software development project relies on it, and most programmers use it for their small jobs as well. And it isn’t just for software: books, papers, small data sets, and anything that changes over time or needs to be shared can and should be stored in a version control system.

Material Type: Module

Authors: abracarambar, Alexander G. Zimmerman, Amiya Maji, Amy L Olex, Andrew Lonsdale, Annika Rockenberger, Begüm D. Topçuoğlu, Ben Bolker, Bill Sacks, Brian Moore, butterflyskip, Casey Youngflesh, Charlotte Moragh Jones-Todd, Christoph Junghans, David Jennings, Erin Alison Becker, François Michonneau, Garrett Bachant, Grant Sayer, Holger Dinkel, Ian Lee, Jake Lever, James E McClure, James Tocknell, Janoš Vidali, Jeremy Teitelbaum, Jeyashree Krishnan, Jimmy O'Donnell, João Rodrigues, Joe Atzberger, Jonah Duckles, Jonathan Cooper, jonestoddcm, Katherine Koziar, Katrin Leinweber, Kunal Marwaha, Kurt Glaesemann, Lauren Ko, L.C. Karssen, Lex Nederbragt, Madicken Munk, Maneesha Sane, Marie-Helene Burle, Mark Woodbridge, Martino Sorbaro, Matt Critchlow, Matteo Ceschia, Matthew Bourque, Matthew Hartley, Maxim Belkin, Megan Potterbusch, Michael Torpey, Michael Zingale, Mingsheng Zhang, Nicola Soranzo, Nima Hejazi, Oscar Arbeláez, Peace Ossom Williamson, Pey Lian Lim, Raniere Silva, Rayna Michelle Harris, Rémi Emonet, Rene Gassmoeller, Richard Barnes, Rich McCue, Ruud Steltenpool, Samniqueka Halsey, Samuel Lelièvre, Sarah Stevens, Saskia Hiltemann, Schlauch, Tobias, Scott Bailey, Simon Waldman, Stefan Siegert, Thomas Morrell, Tommy Keswick, Traci P, Tracy Teal, Trevor Keller, TrevorLeeCline, Tyler Crawford Kelly, Tyler Reddy, Umihiko Hoshijima, Veronica Ikeshoji-Orlati, Wes Harrell, Will Usher, Wolmar Nyberg Åkerström

Library Carpentry: Introduction to Working with Data (Regular Expressions)

This Library Carpentry lesson introduces librarians and others in library- and information-related roles to working with data using regular expressions. The lesson provides background on the regular expression language and how it can be used to match and extract text and to clean data.
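As a small illustration of matching, extracting, and cleaning with regular expressions, here is a hedged Python sketch. It is not taken from the lesson; the sample records, the pattern, and the helper names `extract_year` and `collapse_whitespace` are invented for this example.

```python
import re

# Hypothetical catalogue entries with inconsistent spacing.
records = [
    "Smith, J.   (1998)  Cataloguing basics",
    "Jones, A. (2004) Metadata   in practice",
]

# Match a four-digit year inside parentheses and capture the digits.
YEAR = re.compile(r"\((\d{4})\)")


def extract_year(record):
    """Return the first parenthesised four-digit year as an int, or None."""
    match = YEAR.search(record)
    return int(match.group(1)) if match else None


def collapse_whitespace(record):
    """Replace every run of whitespace with a single space and trim the ends."""
    return re.sub(r"\s+", " ", record).strip()


years = [extract_year(r) for r in records]
cleaned = [collapse_whitespace(r) for r in records]
```

Here `years` holds the extracted publication years and `cleaned` holds the same records with runs of spaces collapsed, showing extraction and cleaning as two separate uses of a pattern.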

Material Type: Module

Authors: Alexander Mendes, Alex Volkov, Angus Taggart, Belinda Weaver, BertrandCaron, Bianca Peterson, Christopher Edsall, Christopher Erdmann, Chuck McAndrew, Dan Michael Heggø, Dan Michael O. Heggø, Elizabeth Lisa McAulay, fdsayre, Felix Hemme, François Michonneau, James Baker, Janice Chan, Jeffrey Oliver, Jeremy Guillette, Jodi Schneider, Jonah Duckles, Katherine Koziar, Katrin Leinweber, Kunal Marwaha, lsult, Paul R. Pival, PH03N1X007, remerjohnson, Saskia Scheltjens, Shari Laster, Tim Dennis, yvonnemery

Introduction to Cloud Computing for Genomics

Data Carpentry lesson to learn how to work with Amazon AWS cloud computing and how to transfer data between your local computer and cloud resources. The cloud is a fancy name for the huge network of computers that host your favorite websites, stream movies, and let you shop online, but you can also harness all of that computing power for running analyses that would take days, weeks or even years on your local computer. In this lesson, you’ll learn about renting cloud services that fit your analytic needs, and how to interact with one of those services (AWS) via the command line.

Material Type: Module

Authors: Abigail Cabunoc Mayes, Adina Howe, Amanda Charbonneau, ammatsun, Bérénice Batut, Bob Freeman, Brittany N. Lasseigne, PhD, Caryn Johansen, Chris Fields, Darya Vanichkina, David Mawdsley, Erin Becker, François Michonneau, Greg Wilson, Jason Williams, Joseph Stachelek, Kari L. Jordan, PhD, Katrin Leinweber, Maxim Belkin, Michael R. Crusoe, Piotr Banaszkiewicz, Raniere Silva, Rémi Emonet, Renato Alves, Stephen Turner, Taylor Reiter, Thomas Morrell, Tracy Teal, vuw-ecs-kevin, William L. Close

Data Wrangling and Processing for Genomics

Data Carpentry lesson to learn how to use command-line tools to perform quality control, align reads to a reference genome, and identify and visualize between-sample variation. A lot of genomics analysis is done using command-line tools for three reasons: 1) you will often be working with a large number of files, and working through the command-line rather than through a graphical user interface (GUI) allows you to automate repetitive tasks, 2) you will often need more compute power than is available on your personal computer, and connecting to and interacting with remote computers requires a command-line interface, and 3) you will often need to customize your analyses, and command-line tools often enable more customization than the corresponding GUI tools (if in fact a GUI tool even exists). In a previous lesson, you learned how to use the bash shell to interact with your computer through a command line interface. In this lesson, you will be applying this new knowledge to carry out a common genomics workflow - identifying variants among sequencing samples taken from multiple individuals within a population. We will be starting with a set of sequenced reads (.fastq files), performing some quality control steps, aligning those reads to a reference genome, and ending by identifying and visualizing variations among these samples. As you progress through this lesson, keep in mind that, even if you aren’t going to be doing this same workflow in your research, you will be learning some very important lessons about using command-line bioinformatic tools. What you learn here will enable you to use a variety of bioinformatic tools with confidence and greatly enhance your research efficiency and productivity.

Material Type: Module

Authors: Adam Thomas, Ahmed R. Hasan, Aniello Infante, Anita Schürch, dbmarchant, Dev Paudel, Erin Alison Becker, Fotis Psomopoulos, François Michonneau, Gaius Augustus, Gregg TeHennepe, Jason Williams, Jessica Elizabeth Mizzi, Karen Cranston, Kari L Jordan, Kate Crosby, Kevin Weitemier, Lex Nederbragt, Luis Avila, Peter R. Hoyt, Rayna Michelle Harris, Ryan Peek, Sheldon John McKay, Sheldon McKay, Taylor Reiter, Tessa Pierce, Toby Hodges, Tracy Teal, Vasilis Lenis, Winni Kretzschmar

Introduction to the Command Line for Genomics

Data Carpentry lesson to learn to navigate your file system, create, copy, move, and remove files and directories, and automate repetitive tasks using scripts and wildcards with genomics data. Command line interface (OS shell) and graphic user interface (GUI) are different ways of interacting with a computer’s operating system. The shell is a program that presents a command line interface which allows you to control your computer using commands entered with a keyboard instead of controlling graphical user interfaces (GUIs) with a mouse/keyboard combination. There are quite a few reasons to start learning about the shell: For most bioinformatics tools, you have to use the shell. There is no graphical interface. If you want to work in metagenomics or genomics you’re going to need to use the shell. The shell gives you power. The command line gives you the power to do your work more efficiently and more quickly. When you need to do things tens to hundreds of times, knowing how to use the shell is transformative. To use remote computers or cloud computing, you need to use the shell.

Material Type: Module

Authors: Amanda Charbonneau, Amy E. Hodge, Anita Schürch, Bastian Greshake Tzovaras, Bérénice Batut, Colin Davenport, Diya Das, Erin Alison Becker, François Michonneau, Giulio Valentino Dalla Riva, Jessica Elizabeth Mizzi, Karen Cranston, Kari L Jordan, Mattias de Hollander, Mike Lee, Niclas Jareborg, Omar Julio Sosa, Rayna Michelle Harris, Ross Cunning, Russell Neches, Sarah Stevens, Shannon EK Joslin, Sheldon John McKay, Siva Chudalayandi, Taylor Reiter, Tobi, Tracy Teal, Tristan De Buysscher

Project Organization and Management for Genomics

Data Carpentry Genomics workshop lesson to learn how to structure your metadata, organize and document your genomics data and bioinformatics workflow, and access data on the NCBI Sequence Read Archive (SRA) database. Good data organization is the foundation of any research project. It not only sets you up well for an analysis, but it also makes it easier to come back to the project later and share with collaborators, including your most important collaborator: future you. Organizing a project that includes sequencing involves many components. There’s the experimental setup and conditions metadata, measurements of experimental parameters, sequencing preparation and sample information, the sequences themselves, and the files and workflow of any bioinformatics analysis. So much of the information of a sequencing project is digital, and we need to keep track of our digital records in the same way we have a lab notebook and sample freezer. In this lesson, we’ll go through the project organization and documentation that will make an efficient bioinformatics workflow possible. Not only will this make you a more effective bioinformatics researcher, it also prepares your data and project for publication, as grant agencies and publishers increasingly require this information. In this lesson, we’ll be using data from a study of experimental evolution using E. coli. More information about this dataset is available here. In this study there are several types of files: spreadsheet data from the experiment that tracks the strains and their phenotype over time; spreadsheet data with information on the samples that were sequenced (the names of the samples, how they were prepared, and the sequencing conditions); and the sequence data. Throughout the analysis, we’ll also generate files from the steps in the bioinformatics pipeline and documentation on the tools and parameters that we used. In this lesson you will learn: how to structure your metadata, tabular data, and information about the experiment (the metadata is the information about the experiment and the samples you’re sequencing); how to prepare for, understand, organize, and store the sequencing data that comes back from the sequencing center; how to access and download publicly available data that may need to be used in your bioinformatics analysis; and the concepts of organizing the files and documenting the workflow of your bioinformatics analysis.

Material Type: Module

Authors: Amanda Charbonneau, Bérénice Batut, Daniel O. S. Ouso, Deborah Paul, Erin Alison Becker, François Michonneau, Jason Williams, Juan A. Ugalde, Kevin Weitemier, Laura Williams, Paula Andrea Martinez, Peter R. Hoyt, Rayna Michelle Harris, Taylor Reiter, Toby Hodges, Tracy Teal

Genomics Workshop Overview

Workshop overview for the Data Carpentry genomics curriculum. Data Carpentry’s aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time, and with less pain. This workshop teaches data management and analysis for genomics research including: best practices for organization of bioinformatics projects and data, use of command-line utilities, use of command-line tools to analyze sequence quality and perform variant calling, and connecting to and using cloud computing. This workshop is designed to be taught over two full days of instruction. Please note that workshop materials for working with Genomics data in R are in “alpha” development. These lessons are available for review and for informal teaching experiences, but are not yet part of The Carpentries’ official lesson offerings. Interested in teaching these materials? We have an onboarding video and accompanying slides available to prepare Instructors to teach these lessons. After watching this video, please contact team@carpentries.org so that we can record your status as an onboarded Instructor. Instructors who have completed onboarding will be given priority status for teaching at centrally-organized Data Carpentry Genomics workshops.

Material Type: Module

Authors: Amanda Charbonneau, Erin Alison Becker, François Michonneau, Jason Williams, Maneesha Sane, Matthew Kweskin, Muhammad Zohaib Anwar, Murray Cadzow, Paula Andrea Martinez, Taylor Reiter, Tracy Teal