Information that has been collected through research. Research data management, metadata, data repositories, data citations, data sharing, data reuse, and more.
The objective of this study was to evaluate the nature and extent …
The objective of this study was to evaluate the nature and extent of reproducible and transparent research practices in neurology publications. Methods The NLM catalog was used to identify MEDLINE-indexed neurology journals. A PubMed search of these journals was conducted to retrieve publications over a 5-year period from 2014 to 2018. A random sample of publications was extracted. Two authors conducted data extraction in a blinded, duplicate fashion using a pilot-tested Google form. This form prompted data extractors to determine whether publications provided access to items such as study materials, raw data, analysis scripts, and protocols. In addition, we determined if the publication was included in a replication study or systematic review, was preregistered, had a conflict of interest declaration, specified funding sources, and was open access. Results Our search identified 223,932 publications meeting the inclusion criteria, from which 400 were randomly sampled. Only 389 articles were accessible, yielding 271 publications with empirical data for analysis. Our results indicate that 9.4% provided access to materials, 9.2% provided access to raw data, 0.7% provided access to the analysis scripts, 0.7% linked the protocol, and 3.7% were preregistered. A third of sampled publications lacked funding or conflict of interest statements. No publications from our sample were included in replication studies, but a fifth were cited in a systematic review or meta-analysis. Conclusions Currently, published neurology research does not consistently provide information needed for reproducibility. The implications of poor research reporting can both affect patient care and increase research waste. Collaborative intervention by authors, peer reviewers, journals, and funding sources is needed to mitigate this problem.
Currently, there is a growing interest in ensuring the transparency and reproducibility …
Currently, there is a growing interest in ensuring the transparency and reproducibility of the published scientific literature. According to a previous evaluation of 441 biomedical journals articles published in 2000–2014, the biomedical literature largely lacked transparency in important dimensions. Here, we surveyed a random sample of 149 biomedical articles published between 2015 and 2017 and determined the proportion reporting sources of public and/or private funding and conflicts of interests, sharing protocols and raw data, and undergoing rigorous independent replication and reproducibility checks. We also investigated what can be learned about reproducibility and transparency indicators from open access data provided on PubMed. The majority of the 149 studies disclosed some information regarding funding (103, 69.1% [95% confidence interval, 61.0% to 76.3%]) or conflicts of interest (97, 65.1% [56.8% to 72.6%]). Among the 104 articles with empirical data in which protocols or data sharing would be pertinent, 19 (18.3% [11.6% to 27.3%]) discussed publicly available data; only one (1.0% [0.1% to 6.0%]) included a link to a full study protocol. Among the 97 articles in which replication in studies with different data would be pertinent, there were five replication efforts (5.2% [1.9% to 12.2%]). Although clinical trial identification numbers and funding details were often provided on PubMed, only two of the articles without a full text article in PubMed Central that discussed publicly available data at the full text level also contained information related to data sharing on PubMed; none had a conflicts of interest statement on PubMed. Our evaluation suggests that although there have been improvements over the last few years in certain key indicators of reproducibility and transparency, opportunities exist to improve reproducible research practices across the biomedical literature and to make features related to reproducibility more readily visible in PubMed.
This lesson in part of Software Carpentry workshop and teach novice programmers …
This lesson in part of Software Carpentry workshop and teach novice programmers to write modular code and best practices for using R for data analysis. an introduction to R for non-programmers using gapminder data The goal of this lesson is to teach novice programmers to write modular code and best practices for using R for data analysis. R is commonly used in many scientific disciplines for statistical analysis and its array of third-party packages. We find that many scientists who come to Software Carpentry workshops use R and want to learn more. The emphasis of these materials is to give attendees a strong foundation in the fundamentals of R, and to teach best practices for scientific computing: breaking down analyses into modular units, task automation, and encapsulation. Note that this workshop will focus on teaching the fundamentals of the programming language R, and will not teach statistical analysis. The lesson contains more material than can be taught in a day. The instructor notes page has some suggested lesson plans suitable for a one or half day workshop. A variety of third party packages are used throughout this workshop. These are not necessarily the best, nor are they comprehensive, but they are packages we find useful, and have been chosen primarily for their usability.
Data Carpentry lesson part of the Social Sciences curriculum. This lesson teaches …
Data Carpentry lesson part of the Social Sciences curriculum. This lesson teaches how to analyse and visualise data used by social scientists. Data Carpentry’s aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time, and with less pain. The lessons below were designed for those interested in working with social sciences data in R. This is an introduction to R designed for participants with no programming experience. These lessons can be taught in a day (~ 6 hours). They start with some basic information about R syntax, the RStudio interface, and move through how to import CSV files, the structure of data frames, how to deal with factors, how to add/remove rows and columns, how to calculate summary statistics from a data frame, and a brief introduction to plotting.
Una introducción a R utilizando los datos de Gapminder. El objetivo de …
Una introducción a R utilizando los datos de Gapminder. El objetivo de esta lección es enseñar a las programadoras principiantes a escribir códigos modulares y adoptar buenas prácticas en el uso de R para el análisis de datos. R nos provee un conjunto de paquetes desarrollados por terceros que se usan comúnmente en diversas disciplinas científicas para el análisis estadístico. Encontramos que muchos científicos que asisten a los talleres de Software Carpentry utilizan R y quieren aprender más. Nuestros materiales son relevantes ya que proporcionan a los asistentes una base sólida en los fundamentos de R y enseñan las mejores prácticas del cómputo científico: desglose del análisis en módulos, automatización tareas y encapsulamiento. Ten en cuenta que este taller se enfoca en los fundamentos del lenguaje de programación R y no en el análisis estadístico. A lo largo de este taller se utilizan una variedad de paquetes desarrolados por terceros, los cuales no son necesariamente los mejores ni se encuentran explicadas todas sus funcionalidades, pero son paquetes que consideramos útiles y han sido elegidos principalmente por su facilidad de uso.
Background Sharing research data provides benefit to the general scientific community, but …
Background Sharing research data provides benefit to the general scientific community, but the benefit is less obvious for the investigator who makes his or her data available. Principal Findings We examined the citation history of 85 cancer microarray clinical trial publications with respect to the availability of their data. The 48% of trials with publicly available microarray data received 85% of the aggregate citations. Publicly available data was significantly (p = 0.006) associated with a 69% increase in citations, independently of journal impact factor, date of publication, and author country of origin using linear regression. Significance This correlation between publicly available data and increased literature impact may further motivate investigators to share their detailed research data.
Workshop overview for the Data Carpentry Social Sciences curriculum. Data Carpentry’s aim …
Workshop overview for the Data Carpentry Social Sciences curriculum. Data Carpentry’s aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time, and with less pain. This workshop teaches data management and analysis for social science research including best practices for data organization in spreadsheets, reproducible data cleaning with OpenRefine, and data analysis and visualization in R. This curriculum is designed to be taught over two full days of instruction. Materials for teaching data analysis and visualization in Python and extraction of information from relational databases using SQL are in development. Interested in teaching these materials? We have an onboarding video and accompanying slides available to prepare Instructors to teach these lessons. After watching this video, please contact team@carpentries.org so that we can record your status as an onboarded Instructor. Instructors who have completed onboarding will be given priority status for teaching at centrally-organized Data Carpentry Social Sciences workshops.
This webinar will introduce the integration of JASP Statistical Software (https://jasp-stats.org/) with …
This webinar will introduce the integration of JASP Statistical Software (https://jasp-stats.org/) with the Open Science Framework (OSF; https://osf.io). The OSF is a free, open source web application built to help researchers manage their workflows. The OSF is part collaboration tool, part version control software, and part data archive. The OSF connects to popular tools researchers already use, like Dropbox, Box, Github, Mendeley, and now is integrated with JASP, to streamline workflows and increase efficiency.
Replication is the cornerstone of a cumulative science. However, new tools and …
Replication is the cornerstone of a cumulative science. However, new tools and technologies, massive amounts of data, interdisciplinary approaches, and the complexity of the questions being asked are complicating replication efforts, as are increased pressures on scientists to advance their research. As full replication of studies on independently collected data is often not feasible, there has recently been a call for reproducible research as an attainable minimum standard for assessing the value of scientific claims. This requires that papers in experimental science describe the results and provide a sufficiently clear protocol to allow successful repetition and extension of analyses based on original data. The importance of replication and reproducibility has recently been exemplified through studies showing that scientific papers commonly leave out experimental details essential for reproduction, studies showing difficulties with replicating published experimental results, an increase in retracted papers, and through a high number of failing clinical trials. This has led to discussions on how individual researchers, institutions, funding bodies, and journals can establish routines that increase transparency and reproducibility. In order to foster such aspects, it has been suggested that the scientific community needs to develop a “culture of reproducibility” for computational science, and to require it for published claims. We want to emphasize that reproducibility is not only a moral responsibility with respect to the scientific field, but that a lack of reproducibility can also be a burden for you as an individual researcher. As an example, a good practice of reproducibility is necessary in order to allow previously developed methodology to be effectively applied on new data, or to allow reuse of code and results for new projects. In other words, good habits of reproducibility may actually turn out to be a time-saver in the longer run. We further note that reproducibility is just as much about the habits that ensure reproducible research as the technologies that can make these processes efficient and realistic. Each of the following ten rules captures a specific aspect of reproducibility, and discusses what is needed in terms of information handling and tracking of procedures. If you are taking a bare-bones approach to bioinformatics analysis, i.e., running various custom scripts from the command line, you will probably need to handle each rule explicitly. If you are instead performing your analyses through an integrated framework (such as GenePattern, Galaxy, LONI pipeline, or Taverna), the system may already provide full or partial support for most of the rules. What is needed on your part is then merely the knowledge of how to exploit these existing possibilities.
This article offers a short guide to the steps scientists can take …
This article offers a short guide to the steps scientists can take to ensure that their data and associated analyses continue to be of value and to be recognized. In just the past few years, hundreds of scholarly papers and reports have been written on questions of data sharing, data provenance, research reproducibility, licensing, attribution, privacy, and more—but our goal here is not to review that literature. Instead, we present a short guide intended for researchers who want to know why it is important to “care for and feed” data, with some practical advice on how to do that. The final section at the close of this work (Links to Useful Resources) offers links to the types of services referred to throughout the text.
Journal policy on research data and code availability is an important part …
Journal policy on research data and code availability is an important part of the ongoing shift toward publishing reproducible computational science. This article extends the literature by studying journal data sharing policies by year (for both 2011 and 2012) for a referent set of 170 journals. We make a further contribution by evaluating code sharing policies, supplemental materials policies, and open access status for these 170 journals for each of 2011 and 2012. We build a predictive model of open data and code policy adoption as a function of impact factor and publisher and find higher impact journals more likely to have open data and code policies and scientific societies more likely to have open data and code policies than commercial publishers. We also find open data policies tend to lead open code policies, and we find no relationship between open data and code policies and either supplemental material policies or open access journal status. Of the journals in this study, 38% had a data policy, 22% had a code policy, and 66% had a supplemental materials policy as of June 2012. This reflects a striking one year increase of 16% in the number of data policies, a 30% increase in code policies, and a 7% increase in the number of supplemental materials policies. We introduce a new dataset to the community that categorizes data and code sharing, supplemental materials, and open access policies in 2011 and 2012 for these 170 journals.
Several fields of science are experiencing a "replication crisis" that has negatively …
Several fields of science are experiencing a "replication crisis" that has negatively impacted their credibility. Assessing the validity of a contribution via replicability of its experimental evidence and reproducibility of its analyses requires access to relevant study materials, data, and code. Failing to share them limits the ability to scrutinize or build-upon the research, ultimately hindering scientific progress.Understanding how the diverse research artifacts in HCI impact sharing can help produce informed recommendations for individual researchers and policy-makers in HCI. Therefore, we surveyed authors of CHI 2018–2019 papers, asking if they share their papers' research materials and data, how they share them, and why they do not. The results (N = 460/1356, 34% response rate) show that sharing is uncommon, partly due to misunderstandings about the purpose of sharing and reliable hosting. We conclude with recommendations for fostering open research practices.This paper and all data and materials are freely available at https://osf.io/csy8q
Background: Reproducible research is a foundational component for scientific advancements, yet little …
Background: Reproducible research is a foundational component for scientific advancements, yet little is known regarding the extent of reproducible research within the dermatology literature. Objective: This study aimed to determine the quality and transparency of the literature in dermatology journals by evaluating for the presence of 8 indicators of reproducible and transparent research practices. Methods: By implementing a cross-sectional study design, we conducted an advanced search of publications in dermatology journals from the National Library of Medicine catalog. Our search included articles published between January 1, 2014, and December 31, 2018. After generating a list of eligible dermatology publications, we then searched for full text PDF versions by using Open Access Button, Google Scholar, and PubMed. Publications were analyzed for 8 indicators of reproducibility and transparency—availability of materials, data, analysis scripts, protocol, preregistration, conflict of interest statement, funding statement, and open access—using a pilot-tested Google Form. Results: After exclusion, 127 studies with empirical data were included in our analysis. Certain indicators were more poorly reported than others. We found that most publications (113, 88.9%) did not provide unmodified, raw data used to make computations, 124 (97.6%) failed to make the complete protocol available, and 126 (99.2%) did not include step-by-step analysis scripts. Conclusions: Our sample of studies published in dermatology journals do not appear to include sufficient detail to be accurately and successfully reproduced in their entirety. Solutions to increase the quality, reproducibility, and transparency of dermatology research are warranted. More robust reporting of key methodological details, open data sharing, and stricter standards journals impose on authors regarding disclosure of study materials might help to better the climate of reproducible research in dermatology. [JMIR Dermatol 2019;2(1):e16078]
A study by David Baker and colleagues reveals poor quality of reporting …
A study by David Baker and colleagues reveals poor quality of reporting in pre-clinical animal research and a failure of journals to implement the ARRIVE guidelines. There is growing concern that poor experimental design and lack of transparent reporting contribute to the frequent failure of pre-clinical animal studies to translate into treatments for human disease. In 2010, the Animal Research: Reporting of In Vivo Experiments (ARRIVE) guidelines were introduced to help improve reporting standards. They were published in PLOS Biology and endorsed by funding agencies and publishers and their journals, including PLOS, Nature research journals, and other top-tier journals. Yet our analysis of papers published in PLOS and Nature journals indicates that there has been very little improvement in reporting standards since then. This suggests that authors, referees, and editors generally are ignoring guidelines, and the editorial endorsement is yet to be effectively implemented.
Software Carpentry lesson on how to use the shell to navigate the …
Software Carpentry lesson on how to use the shell to navigate the filesystem and write simple loops and scripts. The Unix shell has been around longer than most of its users have been alive. It has survived so long because it’s a power tool that allows people to do complex things with just a few keystrokes. More importantly, it helps them combine existing programs in new ways and automate repetitive tasks so they aren’t typing the same things over and over again. Use of the shell is fundamental to using a wide range of other powerful tools and computing resources (including “high-performance computing†supercomputers). These lessons will start you on a path towards using these resources effectively.
The CONsolidated Standards Of Reporting Trials (CONSORT) Statement provides a minimum standard …
The CONsolidated Standards Of Reporting Trials (CONSORT) Statement provides a minimum standard set of items to be reported in published clinical trials; it has received widespread recognition within the biomedical publishing community. This research aims to provide an update on the endorsement of CONSORT by high impact medical journals. Methods We performed a cross-sectional examination of the online “Instructions to Authors” of 168 high impact factor (2012) biomedical journals between July and December 2014. We assessed whether the text of the “Instructions to Authors” mentioned the CONSORT Statement and any CONSORT extensions, and we quantified the extent and nature of the journals’ endorsements of these. These data were described by frequencies. We also determined whether journals mentioned trial registration and the International Committee of Medical Journal Editors (ICMJE; other than in regards to trial registration) and whether either of these was associated with CONSORT endorsement (relative risk and 95 % confidence interval). We compared our findings to the two previous iterations of this survey (in 2003 and 2007). We also identified the publishers of the included journals. Results Sixty-three percent (106/168) of the included journals mentioned CONSORT in their “Instructions to Authors.” Forty-four endorsers (42 %) explicitly stated that authors “must” use CONSORT to prepare their trial manuscript, 38 % required an accompanying completed CONSORT checklist as a condition of submission, and 39 % explicitly requested the inclusion of a flow diagram with the submission. CONSORT extensions were endorsed by very few journals. One hundred and thirty journals (77 %) mentioned ICMJE, and 106 (63 %) mentioned trial registration. Conclusions The endorsement of CONSORT by high impact journals has increased over time; however, specific instructions on how CONSORT should be used by authors are inconsistent across journals and publishers. Publishers and journals should encourage authors to use CONSORT and set clear expectations for authors about compliance with CONSORT.
This lesson is part of the Software Carpentry workshops that teach how …
This lesson is part of the Software Carpentry workshops that teach how to use version control with Git. Wolfman and Dracula have been hired by Universal Missions (a space services spinoff from Euphoric State University) to investigate if it is possible to send their next planetary lander to Mars. They want to be able to work on the plans at the same time, but they have run into problems doing this in the past. If they take turns, each one will spend a lot of time waiting for the other to finish, but if they work on their own copies and email changes back and forth things will be lost, overwritten, or duplicated. A colleague suggests using version control to manage their work. Version control is better than mailing files back and forth: Nothing that is committed to version control is ever lost, unless you work really, really hard at it. Since all old versions of files are saved, it’s always possible to go back in time to see exactly who wrote what on a particular day, or what version of a program was used to generate a particular set of results. As we have this record of who made what changes when, we know who to ask if we have questions later on, and, if needed, revert to a previous version, much like the “undo†feature in an editor. When several people collaborate in the same project, it’s possible to accidentally overlook or overwrite someone’s changes. The version control system automatically notifies users whenever there’s a conflict between one person’s work and another’s. Teams are not the only ones to benefit from version control: lone researchers can benefit immensely. Keeping a record of what was changed, when, and why is extremely useful for all researchers if they ever need to come back to the project later on (e.g., a year later, when memory has faded). Version control is the lab notebook of the digital world: it’s what professionals use to keep track of what they’ve done and to collaborate with other people. Every large software development project relies on it, and most programmers use it for their small jobs as well. And it isn’t just for software: books, papers, small data sets, and anything that changes over time or needs to be shared can and should be stored in a version control system.
This webinar will introduce the concept of version control and the version …
This webinar will introduce the concept of version control and the version control features that are built into the Open Science Framework (OSF; https://osf.io). The OSF is a free, open source web application built to help researchers manage their workflows. The OSF is part collaboration tool, part version control software, and part data archive. The OSF connects to popular tools researchers already use, like Dropbox, Box, Github and Mendeley, to streamline workflows and increase efficiency. This webinar will discuss how keeping track of the different file versions is important for efficient reproducible research practices, how version control works on the OSF, and how researchers can view and download previous versions of files.
More researchers are preregistering their studies as a way to combat publication …
More researchers are preregistering their studies as a way to combat publication bias and improve the credibility of research findings. Preregistration is at its core designed to distinguish between confirmatory and exploratory results. Both are important to the progress of science, but when they are conflated, problems arise. In this webinar, we discuss the What, Why, and How of preregistration and what it means for the future of science. Visit cos.io/prereg for additional resources.
No restrictions on your remixing, redistributing, or making derivative works. Give credit to the author, as required.
Your remixing, redistributing, or making derivatives works comes with some restrictions, including how it is shared.
Your redistributing comes with some restrictions. Do not remix or make derivative works.
Most restrictive license type. Prohibits most uses, sharing, and any changes.
Copyrighted materials, available under Fair Use and the TEACH Act for US-based educators, or other custom arrangements. Go to the resource provider to see their individual restrictions.