
Reproducibility

Agreement of research results when a study is repeated. Reproducibility, replicability, repeatability, robustness, generalizability, organization, documentation, automation, dissemination, guidance, definitions, and more.

183 affiliated resources

Optimizing Research Collaboration
Unrestricted Use
CC BY

In this webinar, we demonstrate the OSF tools available for contributors, labs, centers, and institutions that support stronger collaborations. The demo includes useful practices such as contributor management, using the OSF wiki as an electronic lab notebook, managing online courses and syllabi with OSF, and more. Finally, we look at how OSF Institutions can provide discovery and intelligence-gathering infrastructure so that you can focus on conducting and supporting exceptional research. The Center for Open Science’s ongoing mission is to provide community and technical resources to support your commitments to rigorous, transparent research practices. Visit cos.io/institutions to learn more.

Subject:
Applied Science
Computer Science
Information Science
Material Type:
Lecture
Provider:
Center for Open Science
Author:
Center for Open Science
Date Added:
08/07/2020
Outcome reporting bias in randomized-controlled trials investigating antipsychotic drugs
Unrestricted Use
CC BY

Recent literature hints that outcomes of clinical trials in medicine are selectively reported. If applicable to psychotic disorders, such bias would jeopardize the reliability of randomized clinical trials (RCTs) investigating antipsychotics and thus their extrapolation to clinical practice. We therefore comprehensively examined outcome reporting bias in RCTs of antipsychotic drugs by a systematic review of prespecified outcomes on ClinicalTrials.gov records of RCTs investigating antipsychotic drugs in schizophrenia and schizoaffective disorder between 1 January 2006 and 31 December 2013. These outcomes were compared with outcomes published in scientific journals. Our primary outcome measure was concordance between prespecified and published outcomes; secondary outcome measures included outcome modifications on ClinicalTrials.gov after trial inception and the effects of funding source and directionality of results on record adherence. Of the 48 RCTs, 85% did not fully adhere to the prespecified outcomes. Discrepancies between prespecified and published outcomes were found in 23% of RCTs for primary outcomes, whereas 81% of RCTs had at least one secondary outcome non-reported, newly introduced, or changed to a primary outcome in the respective publication. In total, 14% of primary and 44% of secondary prespecified outcomes were modified after trial initiation. Neither funding source (P=0.60) nor directionality of the RCT results (P=0.10) impacted ClinicalTrials.gov record adherence. Finally, the number of published safety endpoints (N=335) exceeded the number of prespecified safety outcomes by 5.5-fold. We conclude that RCTs investigating antipsychotic drugs suffer from substantial outcome reporting bias and offer suggestions to both monitor and limit such bias in the future.

Subject:
Applied Science
Health, Medicine and Nursing
Material Type:
Reading
Provider:
Translational Psychiatry
Author:
C. H. Vinkers
C. M. C. Lemmens
J. J. Luykx
M. Lancee
R. S. Kahn
Date Added:
08/07/2020
Plotting and Programming in Python
Unrestricted Use
CC BY

This lesson, part of the Software Carpentry workshops, is an introduction to plotting and programming in Python for people with little or no previous programming experience. It uses plotting as its motivating example, and is designed to be used in both Data Carpentry and Software Carpentry workshops. This lesson references JupyterLab, but can be taught using a regular Python interpreter as well. Please note that this lesson uses Python 3 rather than Python 2.
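
As a taste of the style of work this lesson introduces, here is a minimal sketch using pandas, the library the lesson builds on; the file name and column names are hypothetical stand-ins, not the lesson's own dataset:

    import pandas as pd
    import matplotlib.pyplot as plt

    # Read a table of yearly measurements (hypothetical file and columns).
    df = pd.read_csv("measurements.csv")

    # A quick numeric summary of every column.
    print(df.describe())

    # Plotting is the lesson's motivating example; DataFrame.plot wraps matplotlib.
    df.plot(x="year", y="value")
    plt.show()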

Subject:
Applied Science
Computer Science
Information Science
Mathematics
Measurement and Data
Material Type:
Module
Provider:
The Carpentries
Author:
Adam Steer
Allen Lee
Andreas Hilboll
Ashley Champagne
Benjamin
Benjamin Roberts
CanWood
Carlos Henrique Brandt
Carlos M Ortiz Marrero
Cephalopd
Cian Wilson
Dan Mønster
Daniel W Kerchner
Daria Orlowska
Dave Lampert
David Matten
Erin Alison Becker
Florian Goth
Francisco J. Martínez
Greg Wilson
Jacob Deppen
Jarno Rantaharju
Jeremy Zucker
Jonah Duckles
Kees den Heijer
Keith Gilbertson
Kyle E Niemeyer
Lex Nederbragt
Logan Cox
Louis Vernon
Lucy Dorothy Whalley
Madeleine Bonsma-Fisher
Mark Phillips
Mark Slater
Maxim Belkin
Michael Beyeler
Mike Henry
Narayanan Raghupathy
Nigel Bosch
Olav Vahtras
Pablo Hernandez-Cerdan
Paul Anzel
Phil Tooley
Raniere Silva
Robert Woodward
Ryan Avery
Ryan Gregory James
SBolo
Sarah M Brown
Shyam Dwaraknath
Sourav Singh
Steven Koenig
Stéphane Guillou
Taylor Smith
Thor Wikfeldt
Timothy Warren
Tyler Martin
Vasu Venkateshwaran
Vikas Pejaver
ian
mzc9
Date Added:
08/07/2020
Preparing code and data for computationally reproducible collaboration and publication: a hands-on workshop
Unrestricted Use
CC BY

Computational analyses are playing an increasingly central role in research. Journals, funders, and researchers are calling for published research to include associated data and code. However, many involved in research have not received training in best practices and tools for sharing code and data. This course aims to address this gap in training while also providing those who support researchers with curated best-practices guidance and tools. This course is unique compared to other reproducibility courses due to its practical, step-by-step design. It comprises hands-on exercises to prepare research code and data for computationally reproducible publication. Although the course starts with some brief introductory information about computational reproducibility, the bulk of the course is guided work with data and code. Participants move through preparing research for reuse, organization, documentation, automation, and submitting their code and data to share. Tools that support reproducibility will be introduced (Code Ocean), but all lessons will be platform agnostic.

Level: Intermediate

Intended audience: The course is targeted at researchers and research support staff who are involved in the preparation and publication of research materials. Anyone with an interest in reproducible publication is welcome. The course is especially useful for those looking to learn practical steps for improving the computational reproducibility of their own research.

Subject:
Applied Science
Life Science
Physical Science
Social Science
Material Type:
Activity/Lab
Provider:
April Clyburne-Sherin
Author:
April Clyburne-Sherin
Date Added:
08/08/2019
The Preregistration Challenge: A How To Guide
Unrestricted Use
CC BY

This video shows interested researchers how to get started on their own preregistration as part of the Preregistration Challenge. Learn how to create a new draft, find example preregistrations from different fields, respond to comments from the preregistration review team, and turn your final draft into a formal preregistration. For more information, check out https://www.cos.io/initiatives/prereg-more-information.

Subject:
Education
Material Type:
Lesson
Provider:
Center for Open Science
Date Added:
03/31/2021
Preregistration: Improve Research Rigor, Reduce Bias
Unrestricted Use
CC BY

In this webinar Professor Brian Nosek, Executive Director of the Center for Open Science (https://cos.io), outlines the practice of Preregistration and how it can aid in increasing the rigor and reproducibility of research. The webinar is co-hosted by the Health Research Alliance, a collaborative member organization of nonprofit research funders. Slides available at: https://osf.io/9m6tx/

Subject:
Applied Science
Computer Science
Information Science
Material Type:
Lecture
Provider:
Center for Open Science
Author:
Center for Open Science
Date Added:
08/07/2020
Preregistration Overview page
Unrestricted Use
CC BY

What is Preregistration? When you preregister your research, you're simply specifying your research plan in advance of your study and submitting it to a registry. Preregistration separates hypothesis-generating (exploratory) from hypothesis-testing (confirmatory) research. Both are important. But the same data cannot be used to generate and test a hypothesis, which can happen unintentionally and reduce the credibility of your results. Addressing this problem through planning improves the quality and transparency of your research. This helps you clearly report your study and helps others who may wish to build on it.

Subject:
Applied Science
Life Science
Physical Science
Social Science
Material Type:
Reading
Provider:
Center for Open Science
Author:
Center for Open Science
Date Added:
06/18/2020
Preregistration in Complex Contexts: A Preregistration Template for the Application of Cognitive Models
Unrestricted Use
CC BY

In recent years, open science practices have become increasingly popular in psychology and related sciences. These practices aim to increase rigour and transparency in science as a potential response to the challenges posed by the replication crisis. Many of these reforms -- including the highly influential preregistration -- have been designed for experimental work that tests simple hypotheses with standard statistical analyses, such as assessing whether an experimental manipulation has an effect on a variable of interest. However, psychology is a diverse field of research, and the somewhat narrow focus of the prevalent discussions surrounding and templates for preregistration has led to debates on how appropriate these reforms are for areas of research with more diverse hypotheses and more complex methods of analysis, such as cognitive modelling research within mathematical psychology. Our article attempts to bridge the gap between open science and mathematical psychology, focusing on the type of cognitive modelling that Crüwell, Stefan, & Evans (2019) labelled model application, where researchers apply a cognitive model as a measurement tool to test hypotheses about parameters of the cognitive model. Specifically, we (1) discuss several potential researcher degrees of freedom within model application, (2) provide the first preregistration template for model application, and (3) provide an example of a preregistered model application using our preregistration template. More broadly, we hope that our discussions and proposals constructively advance the debate surrounding preregistration in cognitive modelling, and provide a guide for how preregistration templates may be developed in other diverse or complex research contexts.

Subject:
Applied Science
Life Science
Physical Science
Social Science
Material Type:
Reading
Author:
Nathan Evans
Sophia Crüwell
Date Added:
12/07/2019
Programming with MATLAB
Unrestricted Use
CC BY

The best way to learn how to program is to do something useful, so this introduction to MATLAB is built around a common scientific task: data analysis. Our real goal isn’t to teach you MATLAB, but to teach you the basic concepts that all programming depends on. We use MATLAB in our lessons because: we have to use something for examples; it’s well-documented; it has a large (and growing) user base among scientists in academia and industry; and it has a large library of packages available for performing diverse tasks. But the two most important things are to use whatever language your colleagues are using, so that you can share your work with them easily, and to use that language well.

Subject:
Applied Science
Computer Science
Information Science
Mathematics
Measurement and Data
Material Type:
Module
Provider:
The Carpentries
Author:
Gerard Capes
Date Added:
03/20/2017
Programming with Python
Unrestricted Use
CC BY

The best way to learn how to program is to do something useful, so this introduction to Python is built around a common scientific task: data analysis.

Arthritis Inflammation: We are studying inflammation in patients who have been given a new treatment for arthritis, and need to analyze the first dozen data sets of their daily inflammation. The data sets are stored in comma-separated values (CSV) format: each row holds information for a single patient, and columns represent successive days. The first three rows of our first file look like this:

0,0,1,3,1,2,4,7,8,3,3,3,10,5,7,4,7,7,12,18,6,13,11,11,7,7,4,6,8,8,4,4,5,7,3,4,2,3,0,0
0,1,2,1,2,1,3,2,2,6,10,11,5,9,4,4,7,16,8,6,18,4,12,5,12,7,11,5,11,3,3,5,4,4,5,5,1,1,0,1
0,1,1,3,3,2,6,2,5,9,5,7,4,5,4,15,5,11,9,10,19,14,12,17,7,12,11,7,4,2,10,5,4,2,2,3,2,2,1,1

Each number represents the number of inflammation bouts that a particular patient experienced on a given day. For example, the value 6 at row 3, column 7 of the data set above means that the third patient experienced six bouts of inflammation on the seventh day of the clinical study. So, we want to: calculate the average inflammation per day across all patients, and plot the result to discuss and share with colleagues. To do all that, we’ll have to learn a little bit about programming.
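
A minimal sketch of the analysis the lesson builds toward, using NumPy and Matplotlib as the Carpentries materials do; the file name inflammation-01.csv follows the lesson's naming convention but is an assumption here:

    import numpy
    import matplotlib.pyplot as plt

    # Load the comma-separated inflammation data: one row per patient,
    # one column per day of the study.
    data = numpy.loadtxt(fname="inflammation-01.csv", delimiter=",")

    # Average inflammation per day: the mean down each column
    # (axis=0 collapses the patient dimension).
    daily_mean = numpy.mean(data, axis=0)

    plt.plot(daily_mean)
    plt.xlabel("day of study")
    plt.ylabel("mean inflammation across patients")
    plt.show()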

Subject:
Applied Science
Computer Science
Information Science
Mathematics
Measurement and Data
Material Type:
Module
Provider:
The Carpentries
Author:
Anne Fouilloux
Lauren Ko
Maxim Belkin
Trevor Bekolay
Valentina Staneva
Date Added:
08/07/2020
Programming with R
Unrestricted Use
CC BY

The best way to learn how to program is to do something useful, so this introduction to R is built around a common scientific task: data analysis. Our real goal isn’t to teach you R, but to teach you the basic concepts that all programming depends on. We use R in our lessons because: we have to use something for examples; it’s free, well-documented, and runs almost everywhere; it has a large (and growing) user base among scientists; and it has a large library of external packages available for performing diverse tasks. But the two most important things are to use whatever language your colleagues are using, so you can share your work with them easily, and to use that language well. We are studying inflammation in patients who have been given a new treatment for arthritis, and need to analyze the first dozen data sets of their daily inflammation. The data sets are stored in CSV format (comma-separated values): each row holds information for a single patient, and the columns represent successive days. The first few rows of our first file look like this:

0,0,1,3,1,2,4,7,8,3,3,3,10,5,7,4,7,7,12,18,6,13,11,11,7,7,4,6,8,8,4,4,5,7,3,4,2,3,0,0
0,1,2,1,2,1,3,2,2,6,10,11,5,9,4,4,7,16,8,6,18,4,12,5,12,7,11,5,11,3,3,5,4,4,5,5,1,1,0,1
0,1,1,3,3,2,6,2,5,9,5,7,4,5,4,15,5,11,9,10,19,14,12,17,7,12,11,7,4,2,10,5,4,2,2,3,2,2,1,1
0,0,2,0,4,2,2,1,6,7,10,7,9,13,8,8,15,10,10,7,17,4,4,7,6,15,6,4,9,11,3,5,6,3,3,4,2,3,2,1
0,1,1,3,3,1,3,5,2,4,4,7,6,5,3,10,8,10,6,17,9,14,9,7,13,9,12,6,7,7,9,6,3,2,2,4,2,0,1,1

We want to: load that data into memory, calculate the average inflammation per day across all patients, and plot the result. To do all that, we’ll have to learn a little bit about programming.

Subject:
Applied Science
Computer Science
Information Science
Mathematics
Measurement and Data
Material Type:
Module
Provider:
The Carpentries
Author:
Diya Das
Katrin Leinweber
Rohit Goswami
Date Added:
03/20/2017
Project Organization and Management for Genomics
Unrestricted Use
CC BY

Data Carpentry Genomics workshop lesson to learn how to structure your metadata, organize and document your genomics data and bioinformatics workflow, and access data on the NCBI Sequence Read Archive (SRA) database. Good data organization is the foundation of any research project. It not only sets you up well for an analysis, but it also makes it easier to come back to the project later and share it with collaborators, including your most important collaborator: future you. Organizing a project that includes sequencing involves many components: the experimental setup and conditions metadata, measurements of experimental parameters, sequencing preparation and sample information, the sequences themselves, and the files and workflow of any bioinformatics analysis. So much of the information in a sequencing project is digital, and we need to keep track of our digital records in the same way we keep a lab notebook and sample freezer. In this lesson, we’ll go through the project organization and documentation that make an efficient bioinformatics workflow possible. Not only will this make you a more effective bioinformatics researcher, it also prepares your data and project for publication, as grant agencies and publishers increasingly require this information. In this lesson, we’ll be using data from a study of experimental evolution using E. coli. More information about this dataset is available here. In this study there are several types of files:

- spreadsheet data from the experiment that tracks the strains and their phenotype over time
- spreadsheet data with information on the samples that were sequenced: the names of the samples, how they were prepared, and the sequencing conditions
- the sequence data itself

Throughout the analysis, we’ll also generate files from the steps in the bioinformatics pipeline and documentation on the tools and parameters that we used. In this lesson you will learn:

- how to structure your metadata, tabular data, and information about the experiment (the metadata is the information about the experiment and the samples you’re sequencing)
- how to prepare for, understand, organize, and store the sequencing data that comes back from the sequencing center
- how to access and download publicly available data that may need to be used in your bioinformatics analysis
- the concepts of organizing the files and documenting the workflow of your bioinformatics analysis
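
In the spirit of the lesson's organization advice, the sketch below lays out a hypothetical project skeleton in Python; all directory and file names are illustrative assumptions, not the lesson's prescribed layout:

    from pathlib import Path

    # Hypothetical project skeleton: raw data kept separate from results,
    # with plain-text documentation that travels with the project.
    root = Path("ecoli_evolution_project")
    for subdir in ["data/untrimmed_fastq", "data/trimmed_fastq",
                   "results", "docs", "scripts"]:
        (root / subdir).mkdir(parents=True, exist_ok=True)

    # A README records where metadata and raw data live,
    # for collaborators and for future you.
    (root / "README.md").write_text(
        "Experimental evolution of E. coli.\n"
        "Sample metadata lives in docs/; raw sequence data in "
        "data/untrimmed_fastq/ (treat as read-only).\n"
    )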

Subject:
Business and Communication
Genetics
Life Science
Management
Material Type:
Module
Provider:
The Carpentries
Author:
Amanda Charbonneau
Bérénice Batut
Daniel O. S. Ouso
Deborah Paul
Erin Alison Becker
François Michonneau
Jason Williams
Juan A. Ugalde
Kevin Weitemier
Laura Williams
Paula Andrea Martinez
Peter R. Hoyt
Rayna Michelle Harris
Taylor Reiter
Toby Hodges
Tracy Teal
Date Added:
08/07/2020
PsyTeachR
Conditional Remix & Share Permitted
CC BY-SA

Materials for the University of Glasgow Institute of Neuroscience and Psychology’s undergraduate and MSc methods courses, plus experiences, insights, and materials for teaching R across all undergraduate and postgraduate levels.

Subject:
Psychology
Social Science
Material Type:
Textbook
Provider:
University of Glasgow
Date Added:
06/18/2020
Public Availability of Published Research Data in High-Impact Journals
Unrestricted Use
CC BY

Background: There is increasing interest in making primary data from published research publicly available. We aimed to assess the current status of making research data available in highly cited journals across the scientific literature.

Methods and Results: We reviewed the first 10 original research papers of 2009 published in the 50 original research journals with the highest impact factor. For each journal we documented the policies related to public availability and sharing of data. Of the 50 journals, 44 (88%) had a statement in their instructions to authors related to public availability and sharing of data. However, there was wide variation in journal requirements, ranging from requiring the sharing of all primary data related to the research to just including a statement in the published manuscript that data can be available on request. Of the 500 assessed papers, 149 (30%) were not subject to any data availability policy. Of the remaining 351 papers that were covered by some data availability policy, 208 papers (59%) did not fully adhere to the data availability instructions of the journals they were published in, most commonly (73%) by not publicly depositing microarray data. The other 143 papers that adhered to the data availability instructions did so by publicly depositing only the specific data type as required, making a statement of willingness to share, or actually sharing all the primary data. Overall, only 47 papers (9%) deposited full primary raw data online. None of the 149 papers not subject to data availability policies made their full primary data publicly available.

Conclusion: A substantial proportion of original research papers published in high-impact journals are either not subject to any data availability policies, or do not adhere to the data availability instructions in their respective journals. This empirical evaluation highlights opportunities for improvement.

Subject:
Applied Science
Health, Medicine and Nursing
Material Type:
Reading
Provider:
PLOS ONE
Author:
Alawi A. Alsheikh-Ali
John P. A. Ioannidis
Mouaz H. Al-Mallah
Waqas Qureshi
Date Added:
08/07/2020
Public Data Archiving in Ecology and Evolution: How Well Are We Doing?
Unrestricted Use
CC BY

Policies that mandate public data archiving (PDA) successfully increase accessibility to data underlying scientific publications. However, is the data quality sufficient to allow reuse and reanalysis? We surveyed 100 datasets associated with nonmolecular studies in journals that commonly publish ecological and evolutionary research and have a strong PDA policy. Out of these datasets, 56% were incomplete, and 64% were archived in a way that partially or entirely prevented reuse. We suggest that cultural shifts facilitating clearer benefits to authors are necessary to achieve high-quality PDA and highlight key guidelines to help authors increase their data’s reuse potential and compliance with journal data policies.

Subject:
Biology
Life Science
Material Type:
Reading
Provider:
PLOS Biology
Author:
Dominique G. Roche
Loeske E. B. Kruuk
Robert Lanfear
Sandra A. Binning
Date Added:
08/07/2020
Python for Humanities
Unrestricted Use
CC BY

Python is a general-purpose programming language that is useful for writing scripts to work effectively and reproducibly with data. This is an introduction to Python designed for participants with no programming experience. These lessons can be taught in a day (about 6 hours). They start with basic information about Python syntax and the Jupyter notebook interface, then move through importing CSV files, using the pandas package to work with data frames, calculating summary information from a data frame, and a brief introduction to plotting. The last lesson demonstrates how to work with databases directly from Python.
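
A compressed sketch of the workflow the lessons cover, from CSV import through summaries to a database, using pandas and the standard-library sqlite3 module; the file and table names here are hypothetical:

    import sqlite3
    import pandas as pd

    # Import a (hypothetical) CSV of records into a data frame.
    df = pd.read_csv("records.csv")

    # Summary information for each numeric column.
    print(df.describe())

    # The final lesson works with databases directly from Python;
    # sqlite3 ships with the standard library, so no extra setup is needed.
    conn = sqlite3.connect("records.db")
    df.to_sql("records", conn, if_exists="replace", index=False)
    print(pd.read_sql("SELECT COUNT(*) FROM records", conn))
    conn.close()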

Subject:
Applied Science
Computer Science
Information Science
Mathematics
Measurement and Data
Material Type:
Module
Provider:
The Carpentries
Author:
Iain Emsley
Date Added:
08/07/2020
Questionable research practices among italian research psychologists
Unrestricted Use
CC BY

A survey in the United States revealed that an alarmingly large percentage of university psychologists admitted having used questionable research practices (QRPs) that can contaminate the research literature with false positive and biased findings. We conducted a replication of this study among Italian research psychologists to investigate whether these findings generalize to other countries. All the original materials were translated into Italian, and members of the Italian Association of Psychology were invited to participate via an online survey. The percentages of Italian psychologists who admitted to having used ten questionable research practices were similar to the results obtained in the United States, although there were small but significant differences in self-admission rates for some QRPs. Nearly all researchers (88%) admitted using at least one of the practices, and researchers generally considered a practice possibly defensible if they admitted using it, but Italian researchers were much less likely than US researchers to consider a practice defensible. Participants’ estimates of the percentage of researchers who have used these practices were greater than the self-admission rates, and participants estimated that researchers would be unlikely to admit it. In written responses, participants argued that some of these practices are not questionable and that they have used some practices because reviewers and journals demand them. The similarity of results obtained in the United States, this study, and a related study conducted in Germany suggests that adoption of these practices is an international phenomenon, likely due to systemic features of the international research and publication processes.

Subject:
Psychology
Social Science
Material Type:
Reading
Provider:
PLOS ONE
Author:
Coosje L. S. Veldkamp
Franca Agnoli
Jelte M. Wicherts
Paolo Albiero
Roberto Cubelli
Date Added:
08/07/2020
RStudio Cheatsheets
Unrestricted Use
CC BY

The cheatsheets below make it easy to use some of our favorite packages. Cheatsheets include the following topics:

Python with R and Reticulate Cheatsheet
The reticulate package provides a comprehensive set of tools for interoperability between Python and R. With reticulate, you can call Python from R in a variety of ways including importing Python modules into R scripts, writing R Markdown Python chunks, sourcing Python scripts, and using Python interactively within the RStudio IDE. This cheatsheet will remind you how.

Factors with forcats Cheatsheet
Factors are R’s data structure for categorical data. The forcats package makes it easy to work with factors. This cheatsheet reminds you how to make factors, reorder their levels, recode their values, and more.

Tidy Evaluation with rlang Cheatsheet
Tidy Evaluation (Tidy Eval) is a framework for doing non-standard evaluation in R that makes it easier to program with tidyverse functions. Non-standard evaluation, better thought of as “delayed evaluation,” lets you capture a user’s R code to run later in a new environment or against a new data frame. The tidy evaluation framework is implemented by the rlang package and used by functions throughout the tidyverse.

Deep Learning with Keras Cheatsheet
Keras is a high-level neural networks API developed with a focus on enabling fast experimentation. Keras supports both convolution based networks and recurrent networks (as well as combinations of the two), runs seamlessly on both CPU and GPU devices, and is capable of running on top of multiple back-ends including TensorFlow, CNTK, and Theano.
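
The cheatsheet documents the R interface, but Keras began as a Python API; here is a minimal Python sketch of the define-compile-fit workflow it summarizes (the tiny model and random data are placeholders, not anything from the cheatsheet):

    import numpy as np
    from tensorflow import keras

    # Random placeholder data: 100 samples, 8 features, binary labels.
    x = np.random.rand(100, 8)
    y = np.random.randint(0, 2, size=100)

    # Define a small dense network, then compile and fit it.
    model = keras.Sequential([
        keras.Input(shape=(8,)),
        keras.layers.Dense(16, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    model.fit(x, y, epochs=2, batch_size=32, verbose=0)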

Dates and Times Cheatsheet
Lubridate makes it easier to work with dates and times in R. This lubridate cheatsheet covers how to round dates, work with time zones, extract elements of a date or time, parse dates into R and more. The back of the cheatsheet describes lubridate’s three timespan classes: periods, durations, and intervals; and explains how to do math with date-times.

Work with Strings Cheatsheet
The stringr package provides an easy-to-use toolkit for working with strings, i.e. character data, in R. This cheatsheet guides you through stringr’s functions for manipulating strings. The back page provides a concise reference to regular expressions, a mini-language for describing, finding, and matching patterns in strings.

Apply Functions Cheatsheet
The purrr package makes it easy to work with lists and functions. This cheatsheet will remind you how to manipulate lists with purrr as well as how to apply functions iteratively to each element of a list or vector. The back of the cheatsheet explains how to work with list-columns. With list columns, you can use a simple data frame to organize any collection of objects in R.

Data Import Cheatsheet
The Data Import cheatsheet reminds you how to read in flat files with readr (http://readr.tidyverse.org/), work with the results as tibbles, and reshape messy data with tidyr. Use tidyr to reshape your tables into tidy data, the data format that works the most seamlessly with R and the tidyverse.

Data Transformation Cheatsheet
dplyr provides a grammar for manipulating tables in R. This cheatsheet will guide you through the grammar, reminding you how to select, filter, arrange, mutate, summarise, group, and join data frames and tibbles.

Sparklyr Cheatsheet
Sparklyr provides an R interface to Apache Spark, a fast and general engine for processing Big Data. With sparklyr, you can connect to a local or remote Spark session, use dplyr to manipulate data in Spark, and run Spark’s built in machine learning algorithms.

R Markdown Cheatsheet
R Markdown is an authoring format that makes it easy to write reusable reports with R. You combine your R code with narration written in markdown (an easy-to-write plain text format) and then export the results as an html, pdf, or Word file. You can even use R Markdown to build interactive documents and slideshows.

RStudio IDE Cheatsheet
The RStudio IDE is the most popular integrated development environment for R. Do you want to write, run, and debug your own R code? Work collaboratively on R projects with version control? Build packages or create documents and apps? No matter what you do with R, the RStudio IDE can help you do it faster. This cheatsheet will guide you through the most useful features of the IDE, as well as the long list of keyboard shortcuts built into the RStudio IDE.

Shiny Cheatsheet
If you’re ready to build interactive web apps with R, say hello to Shiny. This cheatsheet provides a tour of the Shiny package and explains how to build and customize an interactive app. Be sure to follow the links on the sheet for even more information.

Data Visualization Cheatsheet
The ggplot2 package lets you make beautiful and customizable plots of your data. It implements the grammar of graphics, an easy-to-use system for building plots. See docs.ggplot2.org for detailed examples.

Package Development Cheatsheet
The devtools package makes it easy to build your own R packages, and packages make it easy to share your R code. Supplement this cheatsheet with r-pkgs.had.co.nz, Hadley’s book on package development.

Subject:
Applied Science
Life Science
Physical Science
Social Science
Material Type:
Student Guide
Provider:
RStudio
Author:
RStudio
Date Added:
08/07/2020
Raiders of the lost HARK: a reproducible inference framework for big data science
Unrestricted Use
CC BY

Hypothesizing after the results are known (HARK) has been disparaged as data dredging, and safeguards including hypothesis preregistration and statistically rigorous oversight have been recommended. Despite potential drawbacks, HARK has deepened thinking about complex causal processes. Some of the HARK precautions can conflict with the modern reality of researchers’ obligations to use big, ‘organic’ data sources—from high-throughput genomics to social media streams. We here propose a HARK-solid, reproducible inference framework suitable for big data, based on models that represent formalization of hypotheses. Reproducibility is attained by employing two levels of model validation: internal (relative to data collated around hypotheses) and external (independent to the hypotheses used to generate data or to the data used to generate hypotheses). With a model-centered paradigm, the reproducibility focus changes from the ability of others to reproduce both data and specific inferences from a study to the ability to evaluate models as representation of reality. Validation underpins ‘natural selection’ in a knowledge base maintained by the scientific community. The community itself is thereby supported to be more productive in generating and critically evaluating theories that integrate wider, complex systems.

Subject:
Applied Science
Health, Medicine and Nursing
Material Type:
Reading
Provider:
Palgrave Communications
Author:
Iain E. Buchan
James S. Koopman
Jiang Bian
Matthew Sperrin
Mattia Prosperi
Mo Wang
Date Added:
08/07/2020