
Data

Information that has been collected through research. Topics include research data management, metadata, data repositories, data citations, data sharing, data reuse, and more.

154 affiliated resources

Poor statistical reporting, inadequate data presentation and spin persist despite editorial advice
Unrestricted Use
CC BY

The Journal of Physiology and British Journal of Pharmacology jointly published an editorial series in 2011 to improve standards in statistical reporting and data analysis. It is not known whether reporting practices changed in response to the editorial advice. We conducted a cross-sectional analysis of reporting practices in a random sample of research papers published in these journals before (n = 202) and after (n = 199) publication of the editorial advice. Descriptive data are presented. There was no evidence that reporting practices improved following publication of the editorial advice. Overall, 76-84% of papers with written measures that summarized data variability used standard errors of the mean, and 90-96% of papers did not report exact p-values for primary analyses and post-hoc tests. Similarly, 76-84% of papers that plotted measures to summarize data variability used standard errors of the mean, and only 2-4% of papers plotted the raw data used to calculate variability. Of papers that reported p-values between 0.05 and 0.1, 56-63% interpreted these as trends or statistically significant. Implied or gross spin was noted incidentally in papers before (n = 10) and after (n = 9) the editorial advice was published. Overall, poor statistical reporting, inadequate data presentation and spin were present before and after the editorial advice was published. While the scientific community continues to implement strategies for improving reporting practices, our results indicate stronger incentives or enforcement is needed.

Subject:
Applied Science
Health, Medicine and Nursing
Material Type:
Reading
Provider:
PLOS ONE
Author:
Annie A. Butler
Joanna Diong
Martin E. Héroux
Simon C. Gandevia
Date Added:
08/07/2020
Preparing code and data for computationally reproducible collaboration and publication: a hands-on workshop
Unrestricted Use
CC BY

Computational analyses are playing an increasingly central role in research. Journals, funders, and researchers are calling for published research to include associated data and code. However, many involved in research have not received training in best practices and tools for sharing code and data. This course aims to address this gap in training while also providing those who support researchers with curated best-practices guidance and tools.

This course is unique compared to other reproducibility courses due to its practical, step-by-step design. It consists of hands-on exercises to prepare research code and data for computationally reproducible publication. Although the course starts with some brief introductory information about computational reproducibility, the bulk of the course is guided work with data and code. Participants move through preparing research for reuse, organization, documentation, automation, and submitting their code and data to share. Tools that support reproducibility will be introduced (Code Ocean), but all lessons will be platform agnostic.

Level: Intermediate
Intended audience: The course is targeted at researchers and research support staff who are involved in the preparation and publication of research materials. Anyone with an interest in reproducible publication is welcome. The course is especially useful for those looking to learn practical steps for improving the computational reproducibility of their own research.
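The "automation" step mentioned above can be as simple as a single entry-point script that re-runs every stage of an analysis in order, so that a fresh copy of the project can be rebuilt with one command. A minimal sketch in Python, with hypothetical script names (this is an illustration, not material from the course):

    # Hypothetical "run everything" entry point; the three script names
    # are placeholders for a project's own cleaning/analysis/plotting steps.
    import subprocess

    for step in ["clean_data.py", "analyze.py", "make_figures.py"]:
        subprocess.run(["python", step], check=True)  # check=True stops on the first failure

Running this one file from an unmodified checkout of the project is a quick test of whether the code and data are ready to share.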

Subject:
Applied Science
Life Science
Physical Science
Social Science
Material Type:
Activity/Lab
Author:
April Clyburne-Sherin
Date Added:
08/08/2019
The Preregistration Challenge: A How To Guide
Unrestricted Use
CC BY

This video shows interested researchers how to get started on their own preregistration as part of the Preregistration Challenge. Learn how to create a new draft, find example preregistrations from different fields, respond to comments from the preregistration review team, and turn your final draft into a formal preregistration. For more information, check out https://www.cos.io/initiatives/prereg-more-information.

Subject:
Education
Material Type:
Lesson
Provider:
Center for Open Science
Date Added:
03/31/2021
Preregistration: Improve Research Rigor, Reduce Bias
Unrestricted Use
CC BY

In this webinar, Professor Brian Nosek, Executive Director of the Center for Open Science (https://cos.io), outlines the practice of Preregistration and how it can aid in increasing the rigor and reproducibility of research. The webinar is co-hosted by the Health Research Alliance, a collaborative member organization of nonprofit research funders. Slides available at: https://osf.io/9m6tx/

Subject:
Applied Science
Computer Science
Information Science
Material Type:
Lecture
Provider:
Center for Open Science
Author:
Center for Open Science
Date Added:
08/07/2020
Pre-results Review in Economics: Lessons Learned from Setting up Registered Reports
Unrestricted Use
CC BY

Hear from Andrew Foster, editor at the Journal of Development Economics, and Irenaeus Wolff, a guest editor for Experimental Economics, as they discuss their experiences with implementing the Registered Reports format, how it was received by authors, and the trends they noticed after adoption. Aleksandar Bogdanoski of BITSS also joins us to explore pre-results review, how to facilitate the process at journals, and best practices for supporting authors and reviewers.

Subject:
Education
Material Type:
Lesson
Provider:
Center for Open Science
Author:
Aleksandar Bogdanoski
Andrew Foster
Irenaeus Wolff
Date Added:
03/31/2021
Programming with MATLAB
Unrestricted Use
CC BY

The best way to learn how to program is to do something useful, so this introduction to MATLAB is built around a common scientific task: data analysis. Our real goal isn’t to teach you MATLAB, but to teach you the basic concepts that all programming depends on. We use MATLAB in our lessons because: we have to use something for examples; it’s well-documented; it has a large (and growing) user base among scientists in academia and industry; and it has a large library of packages available for performing diverse tasks. But the two most important things are to use whatever language your colleagues are using, so that you can share your work with them easily, and to use that language well.

Subject:
Applied Science
Computer Science
Information Science
Mathematics
Measurement and Data
Material Type:
Module
Provider:
The Carpentries
Author:
Gerard Capes
Date Added:
03/20/2017
Programming with Python
Unrestricted Use
CC BY

The best way to learn how to program is to do something useful, so this introduction to Python is built around a common scientific task: data analysis.

Arthritis Inflammation
We are studying inflammation in patients who have been given a new treatment for arthritis, and need to analyze the first dozen data sets of their daily inflammation. The data sets are stored in comma-separated values (CSV) format: each row holds information for a single patient, and the columns represent successive days. The first three rows of our first file look like this:

0,0,1,3,1,2,4,7,8,3,3,3,10,5,7,4,7,7,12,18,6,13,11,11,7,7,4,6,8,8,4,4,5,7,3,4,2,3,0,0
0,1,2,1,2,1,3,2,2,6,10,11,5,9,4,4,7,16,8,6,18,4,12,5,12,7,11,5,11,3,3,5,4,4,5,5,1,1,0,1
0,1,1,3,3,2,6,2,5,9,5,7,4,5,4,15,5,11,9,10,19,14,12,17,7,12,11,7,4,2,10,5,4,2,2,3,2,2,1,1

Each number represents the number of inflammation bouts that a particular patient experienced on a given day. For example, the value 6 at row 3, column 7 of the data set above means that the third patient experienced six bouts of inflammation on the seventh day of the clinical study. So, we want to: calculate the average inflammation per day across all patients, and plot the result to discuss and share with colleagues. To do all that, we'll have to learn a little bit about programming.
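A minimal sketch of the analysis this lesson builds toward, assuming the first data file is saved as inflammation-01.csv (the filename is illustrative; the lesson itself introduces these tools step by step):

    # Average inflammation per day across all patients, then plot it.
    # Assumes "inflammation-01.csv": one row per patient, one column per day.
    import numpy
    import matplotlib.pyplot as plt

    data = numpy.loadtxt(fname="inflammation-01.csv", delimiter=",")
    daily_mean = numpy.mean(data, axis=0)  # axis=0 averages over patients

    plt.plot(daily_mean)
    plt.xlabel("Day of clinical study")
    plt.ylabel("Mean inflammation bouts across patients")
    plt.show()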

Subject:
Applied Science
Computer Science
Information Science
Mathematics
Measurement and Data
Material Type:
Module
Provider:
The Carpentries
Author:
Anne Fouilloux
Lauren Ko
Maxim Belkin
Trevor Bekolay
Valentina Staneva
Date Added:
08/07/2020
Programming with R
Unrestricted Use
CC BY

The best way to learn how to program is to do something useful, so this introduction to R is built around a common scientific task: data analysis. Our real goal isn’t to teach you R, but to teach you the basic concepts that all programming depends on. We use R in our lessons because: we have to use something for examples; it’s free, well-documented, and runs almost everywhere; it has a large (and growing) user base among scientists; and it has a large library of external packages available for performing diverse tasks. But the two most important things are to use whatever language your colleagues are using, so you can share your work with them easily, and to use that language well.

We are studying inflammation in patients who have been given a new treatment for arthritis, and need to analyze the first dozen data sets of their daily inflammation. The data sets are stored in CSV format (comma-separated values): each row holds information for a single patient, and the columns represent successive days. The first few rows of our first file look like this:

0,0,1,3,1,2,4,7,8,3,3,3,10,5,7,4,7,7,12,18,6,13,11,11,7,7,4,6,8,8,4,4,5,7,3,4,2,3,0,0
0,1,2,1,2,1,3,2,2,6,10,11,5,9,4,4,7,16,8,6,18,4,12,5,12,7,11,5,11,3,3,5,4,4,5,5,1,1,0,1
0,1,1,3,3,2,6,2,5,9,5,7,4,5,4,15,5,11,9,10,19,14,12,17,7,12,11,7,4,2,10,5,4,2,2,3,2,2,1,1
0,0,2,0,4,2,2,1,6,7,10,7,9,13,8,8,15,10,10,7,17,4,4,7,6,15,6,4,9,11,3,5,6,3,3,4,2,3,2,1
0,1,1,3,3,1,3,5,2,4,4,7,6,5,3,10,8,10,6,17,9,14,9,7,13,9,12,6,7,7,9,6,3,2,2,4,2,0,1,1

We want to: load that data into memory, calculate the average inflammation per day across all patients, and plot the result. To do all that, we’ll have to learn a little bit about programming.

Subject:
Applied Science
Computer Science
Information Science
Mathematics
Measurement and Data
Material Type:
Module
Provider:
The Carpentries
Author:
Diya Das
Katrin Leinweber
Rohit Goswami
Date Added:
03/20/2017
Project Organization and Management for Genomics
Unrestricted Use
CC BY

Data Carpentry Genomics workshop lesson to learn how to structure your metadata, organize and document your genomics data and bioinformatics workflow, and access data on the NCBI Sequence Read Archive (SRA) database.

Good data organization is the foundation of any research project. It not only sets you up well for an analysis, but it also makes it easier to come back to the project later and share it with collaborators, including your most important collaborator: future you. Organizing a project that includes sequencing involves many components: the experimental setup and conditions metadata, measurements of experimental parameters, sequencing preparation and sample information, the sequences themselves, and the files and workflow of any bioinformatics analysis. So much of the information of a sequencing project is digital, and we need to keep track of our digital records in the same way we keep a lab notebook and sample freezer. In this lesson, we'll go through the project organization and documentation that make an efficient bioinformatics workflow possible. Not only will this make you a more effective bioinformatics researcher, it also prepares your data and project for publication, as grant agencies and publishers increasingly require this information.

In this lesson, we'll be using data from a study of experimental evolution using E. coli. More information about this dataset is available here. In this study there are several types of files:
- Spreadsheet data from the experiment that tracks the strains and their phenotype over time
- Spreadsheet data with information on the samples that were sequenced: the names of the samples, how they were prepared, and the sequencing conditions
- The sequence data

Throughout the analysis, we'll also generate files from the steps in the bioinformatics pipeline and documentation on the tools and parameters that we used.

In this lesson you will learn:
- How to structure your metadata, tabular data, and information about the experiment (the metadata is the information about the experiment and the samples you're sequencing)
- How to prepare for, understand, organize, and store the sequencing data that comes back from the sequencing center
- How to access and download publicly available data that may be needed in your bioinformatics analysis
- How to organize the files and document the workflow of your bioinformatics analysis
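One way to make the organization step concrete is to create the project's directory skeleton up front, keeping raw data separate from pipeline outputs. A hypothetical sketch in Python (the project and directory names are illustrative, not the lesson's prescribed layout):

    # Create a hypothetical project skeleton for a sequencing study;
    # names are placeholders, not prescribed by the lesson.
    from pathlib import Path

    project = Path("ecoli_evolution")
    for subdir in ["docs",            # experiment and sample metadata
                   "data/untrimmed",  # raw reads from the sequencing center
                   "data/trimmed",    # pipeline outputs
                   "scripts",         # analysis code
                   "results"]:        # figures and summary tables
        (project / subdir).mkdir(parents=True, exist_ok=True)

Keeping raw data in its own untouched directory and writing pipeline outputs elsewhere is the core habit this kind of layout supports.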

Subject:
Business and Communication
Genetics
Life Science
Management
Material Type:
Module
Provider:
The Carpentries
Author:
Amanda Charbonneau
Bérénice Batut
Daniel O. S. Ouso
Deborah Paul
Erin Alison Becker
François Michonneau
Jason Williams
Juan A. Ugalde
Kevin Weitemier
Laura Williams
Paula Andrea Martinez
Peter R. Hoyt
Rayna Michelle Harris
Taylor Reiter
Toby Hodges
Tracy Teal
Date Added:
08/07/2020
Public Availability of Published Research Data in High-Impact Journals
Unrestricted Use
CC BY

Background: There is increasing interest in making primary data from published research publicly available. We aimed to assess the current status of making research data available in highly-cited journals across the scientific literature.

Methods and Results: We reviewed the first 10 original research papers of 2009 published in the 50 original research journals with the highest impact factor. For each journal we documented the policies related to public availability and sharing of data. Of the 50 journals, 44 (88%) had a statement in their instructions to authors related to public availability and sharing of data. However, there was wide variation in journal requirements, ranging from requiring the sharing of all primary data related to the research to just including a statement in the published manuscript that data can be made available on request. Of the 500 assessed papers, 149 (30%) were not subject to any data availability policy. Of the remaining 351 papers that were covered by some data availability policy, 208 papers (59%) did not fully adhere to the data availability instructions of the journals they were published in, most commonly (73%) by not publicly depositing microarray data. The other 143 papers that adhered to the data availability instructions did so by publicly depositing only the specific data type as required, making a statement of willingness to share, or actually sharing all the primary data. Overall, only 47 papers (9%) deposited full primary raw data online. None of the 149 papers not subject to data availability policies made their full primary data publicly available.

Conclusion: A substantial proportion of original research papers published in high-impact journals are either not subject to any data availability policies, or do not adhere to the data availability instructions in their respective journals. This empirical evaluation highlights opportunities for improvement.

Subject:
Applied Science
Health, Medicine and Nursing
Material Type:
Reading
Provider:
PLOS ONE
Author:
Alawi A. Alsheikh-Ali
John P. A. Ioannidis
Mouaz H. Al-Mallah
Waqas Qureshi
Date Added:
08/07/2020
Public Data Archiving in Ecology and Evolution: How Well Are We Doing?
Unrestricted Use
CC BY

Policies that mandate public data archiving (PDA) successfully increase accessibility to data underlying scientific publications. However, is the data quality sufficient to allow reuse and reanalysis? We surveyed 100 datasets associated with nonmolecular studies in journals that commonly publish ecological and evolutionary research and have a strong PDA policy. Out of these datasets, 56% were incomplete, and 64% were archived in a way that partially or entirely prevented reuse. We suggest that cultural shifts facilitating clearer benefits to authors are necessary to achieve high-quality PDA and highlight key guidelines to help authors increase their data’s reuse potential and compliance with journal data policies.

Subject:
Biology
Life Science
Material Type:
Reading
Provider:
PLOS Biology
Author:
Dominique G. Roche
Loeske E. B. Kruuk
Robert Lanfear
Sandra A. Binning
Date Added:
08/07/2020
Publication Bias in Psychology: A Diagnosis Based on the Correlation between Effect Size and Sample Size
Unrestricted Use
CC BY

Background: The p value obtained from a significance test provides no information about the magnitude or importance of the underlying phenomenon. Therefore, additional reporting of effect size is often recommended. Effect sizes are theoretically independent of sample size. Yet this may not hold true empirically: non-independence could indicate publication bias.

Methods: We investigated whether effect size is independent of sample size in psychological research. We randomly sampled 1,000 psychological articles from all areas of psychological research. We extracted p values, effect sizes, and sample sizes of all empirical papers, calculated the correlation between effect size and sample size, and investigated the distribution of p values.

Results: We found a negative correlation of r = −.45 [95% CI: −.53; −.35] between effect size and sample size. In addition, we found an inordinately high number of p values just passing the boundary of significance. Additional data showed that neither implicit nor explicit power analysis could account for this pattern of findings.

Conclusion: The negative correlation between effect size and sample size, and the biased distribution of p values, indicate pervasive publication bias in the entire field of psychology.

Subject:
Psychology
Social Science
Material Type:
Reading
Provider:
PLOS ONE
Author:
Anton Kühberger
Astrid Fritz
Thomas Scherndl
Date Added:
08/07/2020
P values in display items are ubiquitous and almost invariably significant: A survey of top science journals
Unrestricted Use
CC BY

P values represent a widely used, but pervasively misunderstood and fiercely contested, method of scientific inference. Display items, such as figures and tables, often contain the main results and are an important source of P values. We conducted a survey comparing the overall use of P values and the occurrence of significant P values in display items of a sample of articles in the three top multidisciplinary journals (Nature, Science, PNAS) in 2017 and in 1997. We also examined the reporting of multiplicity corrections and its potential influence on the proportion of statistically significant P values. Our findings demonstrated substantial and growing reliance on P values in display items, with increases of 2.5 to 14.5 times in 2017 compared to 1997. The overwhelming majority of P values (94%, 95% confidence interval [CI] 92% to 96%) were statistically significant. Methods to adjust for multiplicity were almost non-existent in 1997, but were reported in many articles relying on P values in 2017 (Nature 68%, Science 48%, PNAS 38%). In their absence, almost all reported P values were statistically significant (98%, 95% CI 96% to 99%). Conversely, when any multiplicity corrections were described, 88% (95% CI 82% to 93%) of reported P values were statistically significant. Use of Bayesian methods was scant (2.5%), and articles rarely (0.7%) relied exclusively on Bayesian statistics. Overall, wider appreciation of the need for multiplicity corrections is a welcome evolution, but the rapid growth of reliance on P values and implausibly high rates of reported statistical significance are worrisome.

Subject:
Mathematics
Statistics and Probability
Material Type:
Reading
Provider:
PLOS ONE
Author:
Ioana Alina Cristea
John P. A. Ioannidis
Date Added:
08/07/2020
Python for Humanities
Unrestricted Use
CC BY

Python is a general purpose programming language that is useful for writing scripts to work effectively and reproducibly with data. This is an introduction to Python designed for participants with no programming experience. These lessons can be taught in a day (~6 hours). They start with some basic information about Python syntax and the Jupyter notebook interface, then move through importing CSV files, using the pandas package to work with data frames, calculating summary information from a data frame, and a brief introduction to plotting. The last lesson demonstrates how to work with databases directly from Python.
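A brief sketch of the kind of workflow these lessons cover, assuming a hypothetical file records.csv with a hypothetical "category" column (pandas is the package the lessons name):

    # Import a CSV, summarize it, and plot it, following the arc of these lessons.
    # "records.csv" and the "category" column are illustrative placeholders.
    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.read_csv("records.csv")                 # CSV file into a data frame
    print(df.describe())                            # summary statistics for numeric columns
    df.groupby("category").size().plot(kind="bar")  # counts per category as a bar chart
    plt.show()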

Subject:
Applied Science
Computer Science
Information Science
Mathematics
Measurement and Data
Material Type:
Module
Provider:
The Carpentries
Author:
Iain Emsley
Date Added:
08/07/2020
Questionable and Open Research Practices in Education Research
Unrestricted Use
CC BY

Discussions of how to improve research quality are predominant in a number of fields, including education. But how prevalent is the use of problematic practices, and of the improved practices meant to counter them? This baseline information will be a critical data source as education researchers seek to improve our research practices. In this preregistered study, we replicated and extended previous studies from other fields by asking education researchers about 10 questionable research practices and 5 open research practices. We asked them to estimate the prevalence of the practices in the field, self-report their own use of such practices, and estimate the appropriateness of these behaviors in education research. We made predictions under four umbrella categories: comparison to psychology, geographic location, career stage, and quantitative orientation. Broadly, our results suggest that both questionable and open research practices are part of the typical research practices of many educational researchers. Preregistration, code, and data can be found at https://osf.io/83mwk/.

Subject:
Education
Material Type:
Reading
Author:
Bryan G. Cook
Jaret Hodges
Jonathan Plucker
Matthew C. Makel
Date Added:
08/07/2020
Questionable research practices in ecology and evolution
Unrestricted Use
CC BY

We surveyed 807 researchers (494 ecologists and 313 evolutionary biologists) about their use of Questionable Research Practices (QRPs), including cherry picking statistically significant results, p hacking, and hypothesising after the results are known (HARKing). We also asked them to estimate the proportion of their colleagues that use each of these QRPs. Several of the QRPs were prevalent within the ecology and evolution research community. Across the two groups, we found 64% of surveyed researchers reported they had at least once failed to report results because they were not statistically significant (cherry picking); 42% had collected more data after inspecting whether results were statistically significant (a form of p hacking) and 51% had reported an unexpected finding as though it had been hypothesised from the start (HARKing). Such practices have been directly implicated in the low rates of reproducible results uncovered by recent large scale replication studies in psychology and other disciplines. The rates of QRPs found in this study are comparable with the rates seen in psychology, indicating that the reproducibility problems discovered in psychology are also likely to be present in ecology and evolution.

Subject:
Biology
Ecology
Life Science
Material Type:
Reading
Provider:
PLOS ONE
Author:
Ashley Barnett
Fiona Fidler
Hannah Fraser
Shinichi Nakagawa
Tim Parker
Date Added:
08/07/2020
Raiders of the lost HARK: a reproducible inference framework for big data science
Unrestricted Use
CC BY

Hypothesizing after the results are known (HARK) has been disparaged as data dredging, and safeguards including hypothesis preregistration and statistically rigorous oversight have been recommended. Despite potential drawbacks, HARK has deepened thinking about complex causal processes. Some of the HARK precautions can conflict with the modern reality of researchers’ obligations to use big, ‘organic’ data sources—from high-throughput genomics to social media streams. We here propose a HARK-solid, reproducible inference framework suitable for big data, based on models that represent formalization of hypotheses. Reproducibility is attained by employing two levels of model validation: internal (relative to data collated around hypotheses) and external (independent of the hypotheses used to generate data or of the data used to generate hypotheses). With a model-centered paradigm, the reproducibility focus changes from the ability of others to reproduce both data and specific inferences from a study to the ability to evaluate models as representations of reality. Validation underpins ‘natural selection’ in a knowledge base maintained by the scientific community. The community itself is thereby supported to be more productive in generating and critically evaluating theories that integrate wider, complex systems.

Subject:
Applied Science
Health, Medicine and Nursing
Material Type:
Reading
Provider:
Palgrave Communications
Author:
Iain E. Buchan
James S. Koopman
Jiang Bian
Matthew Sperrin
Mattia Prosperi
Mo Wang
Date Added:
08/07/2020
Registered reports: an early example and analysis
Unrestricted Use
CC BY

The recent ‘replication crisis’ in psychology has focused attention on ways of increasing methodological rigor within the behavioral sciences. Part of this work has involved promoting ‘Registered Reports’, wherein journals peer review papers prior to data collection and publication. Although this approach is usually seen as a relatively recent development, we note that a prototype of this publishing model was initiated in the mid-1970s by parapsychologist Martin Johnson in the European Journal of Parapsychology (EJP). A retrospective and observational comparison of Registered and non-Registered Reports published in the EJP during a seventeen-year period provides circumstantial evidence to suggest that the approach helped to reduce questionable research practices. This paper aims both to bring Johnson’s pioneering work to a wider audience, and to investigate the positive role that Registered Reports may play in helping to promote higher methodological and statistical standards.

Subject:
Applied Science
Information Science
Psychology
Social Science
Material Type:
Reading
Provider:
PeerJ
Author:
Caroline Watt
Diana Kornbrot
Richard Wiseman
Date Added:
08/07/2020
Reproducible and reusable research: are journal data sharing policies meeting the mark?
Unrestricted Use
CC BY

Background: There is wide agreement in the biomedical research community that research data sharing is a primary ingredient for ensuring that science is more transparent and reproducible. Publishers could play an important role in facilitating and enforcing data sharing; however, many journals have not yet implemented data sharing policies, and the requirements vary widely across journals. This study set out to analyze the pervasiveness and quality of data sharing policies in the biomedical literature.

Methods: The online author's instructions and editorial policies for 318 biomedical journals were manually reviewed to analyze each journal's data sharing requirements and characteristics. The data sharing policies were ranked using a rubric to determine whether data sharing was required, recommended, required only for omics data, or not addressed at all. The data sharing method and licensing recommendations were examined, as well as any mention of reproducibility or similar concepts. The data were analyzed for patterns relating to publishing volume, Journal Impact Factor, and the publishing model (open access or subscription) of each journal.

Results: A total of 11.9% of the journals analyzed explicitly stated that data sharing was required as a condition of publication. A total of 9.1% of journals required data sharing but did not state that it would affect publication decisions. Another 23.3% of journals had a statement encouraging authors to share their data but did not require it. A total of 9.1% of journals mentioned data sharing indirectly, and only 14.8% addressed protein, proteomic, and/or genomic data sharing. There was no mention of data sharing in 31.8% of journals. Impact factors were significantly higher for journals with the strongest data sharing policies compared to all other data sharing criteria. Open access journals were not more likely to require data sharing than subscription journals.

Discussion: Our study confirmed earlier investigations which observed that only a minority of biomedical journals require data sharing, and a significant association between higher Impact Factors and journals with a data sharing requirement. Moreover, while 65.7% of the journals in our study that required data sharing addressed the concept of reproducibility, as with earlier investigations, we found that most data sharing policies did not provide specific guidance on the practices that ensure data is maximally available and reusable.

Subject:
Applied Science
Biology
Health, Medicine and Nursing
Life Science
Material Type:
Reading
Provider:
PeerJ
Author:
Jessica Minnier
Melissa A. Haendel
Nicole A. Vasilevsky
Robin E. Champieux
Date Added:
08/07/2020