Today we're going to talk about p-hacking (also called data dredging or …
Today we're going to talk about p-hacking (also called data dredging or data fishing). P-hacking is when data is analyzed to find patterns that produce statistically significant results, even if there really isn't an underlying effect, and it has become a huge problem in science since many scientific theories rely on p-values as proof of their existence! Today, we're going to talk about a few ways researchers have "hacked" their data, and give you some tips for identifying and avoiding these types of problems when you encounter stats in your own lives.
Last week we introduced p-values as a way to set a predetermined …
Last week we introduced p-values as a way to set a predetermined cutoff when testing if something seems unusual enough to reject our null hypothesis - that they are the same. But today we’re going to discuss some problems with the logic of p-values, how they are commonly misinterpreted, how p-values don’t give us exactly what we want to know, and how that cutoff is arbitrary - and arguably not stringent enough in some scenarios.
We're going to finish up our discussion of p-values by taking a …
We're going to finish up our discussion of p-values by taking a closer look at how they can get it wrong, and what we can do to minimize those errors. We'll discuss Type 1 (when we think we've detected an effect, but there actually isn't one) and Type 2 (when there was an effect we didn't see) errors and introduce statistical power - which tells us the chance of detecting an effect if there is one.
Today we’re going to finish up our unit on data visualization by …
Today we’re going to finish up our unit on data visualization by taking a closer look at how dot plots, box plots, and stem and leaf plots represent data. We’ll also talk about the rules we can use to identify outliers and apply our new data viz skills by taking a closer look at how Justin Timberlake’s song lyrics have changed since he went solo.
Today we’re going to begin our discussion of probability. We’ll talk about …
Today we’re going to begin our discussion of probability. We’ll talk about how the addition (OR) rule, the multiplication (AND) rule, and conditional probabilities help us figure out the likelihood of sequences of events happening - from optimizing your chances of having a great night out with friends to seeing Cole Sprouse at IHop!
Today we're going to introduce bayesian statistics and discuss how this new …
Today we're going to introduce bayesian statistics and discuss how this new approach to statistics has revolutionized the field from artificial intelligence and clinical trials to how your computer filters spam! We'll also discuss the Law of Large Numbers and how we can use simulations to help us better understand the "rules" of our data, even if we don't know the equations that define those rules.
There are a lot of events in life that we just can’t …
There are a lot of events in life that we just can’t predict, but just because something is random doesn’t mean we don’t know or can’t learn anything about it. Today, we’re going to talk about how we can extract information from seemingly random events starting with the expected value or mean of a distribution and walking through the first four “moments” - the mean, variance, skewness, and kurtosis.
Today we're going to introduce one of the most flexible statistical tools …
Today we're going to introduce one of the most flexible statistical tools - the General Linear Model (or GLM). GLMs allow us to create many different models to help describe the world - you see them a lot in science, economics, and politics. Today we're going to build a hypothetical model to look at the relationship between likes and comments on a trending YouTube video using the Regression Model. We'll be introducing other popular models over the next few episodes.
Replication (re-running studies to confirm results) and reproducibility (the ability to repeat …
Replication (re-running studies to confirm results) and reproducibility (the ability to repeat an analyses on data) have come under fire over the past few years. The foundation of science itself is built upon statistical analysis and yet there has been more and more evidence that suggests possibly even the majority of studies cannot be replicated. This "replication crisis" is likely being caused by a number of factors which we'll discuss as well as some of the proposed solutions to ensure that the results we're drawing from scientific studies are reliable.
Today we’re going to talk about good and bad surveys. Surveys are …
Today we’re going to talk about good and bad surveys. Surveys are everywhere, from user feedback surveys to telephone polls, and those questionnaires at your doctor's office. Still, with their ease to create and distribute, they're also susceptible to bias and error. So today we’re going to talk about identifying good and bad survey questions, and how groups (or samples) are selected to represent the entire population since it's often just not feasible to ask everyone.
We’ve talked a lot in this series about how often you see …
We’ve talked a lot in this series about how often you see data and statistics in the news and on social media - which is ALL THE TIME! But how do you know who and what you can trust? Today, we’re going to talk about how we, as consumers, can spot flawed studies, sensationalized articles, and just plain poor reporting. And this isn’t to say that all science articles you read on facebook or in magazines are wrong, but that it's valuable to read those catchy headlines with some skepticism.
When collecting data to make observations about the world it usually just …
When collecting data to make observations about the world it usually just isn't possible to collect ALL THE DATA. So instead of asking every single person about student loan debt for instance we take a sample of the population, and then use the shape of our samples to make inferences about the true underlying distribution our data. It turns out we can learn a lot about how something occurs, even if we don't know the underlying process that causes it. Today, we’ll also introduce the normal (or bell) curve and talk about how we can learn some really useful things from a sample's shape - like if an exam was particularly difficult, how often old faithful erupts, or if there are two types of runners that participate in marathons!
As we near the end of the series, we're going look at …
As we near the end of the series, we're going look at how statistics impacts our lives. Today, we're going to discuss how statistics is often used and misused in the courtroom. We're going to focus on three stories in which three huge statistical errors were made: the handwriting analysis of French officer Alfred Dreyfus in 1894, the murder charges of mother Sally Clark in 1998, and the expulsion of student Jonathan Dorfman from UC San Diego in 2011.
We've talked a lot about modeling data and making inferences about it, …
We've talked a lot about modeling data and making inferences about it, but today we're going to look towards the future at how machine learning is being used to build models to predict future outcomes. We'll discuss three popular types of supervised machine learning models: Logistic Regression, Linear discriminant Analysis (or LDA) and K Nearest Neighbors (or KNN).
Today we're going to walk through a couple of statistical approaches to …
Today we're going to walk through a couple of statistical approaches to answer the question: "is coffee from the local cafe, Caf-fiend, better than that other cafe, The Blend Den?" We'll build a two sample t-test which will tell us how many standard errors away from the mean our observed difference is in our tasting experiment, and then we'll introduce a matched pair t-tests which allow us to remove variation in the experiment. All of these approaches rely on the test statistic framework we introduced last episode.
Test statistics allow us to quantify how close things are to our …
Test statistics allow us to quantify how close things are to our expectations or theories. Instead of going on our gut feelings, they allow us to add a little mathematical rigor when asking the question: “Is this random… or real?” Today, we’ll introduce some examples using both t-tests and z-tests and explain how critical values and p-values are different ways of telling us the same information. We’ll get to some other test statistics like F tests and chi-square in a future episode.
Today we're going to discuss how machine learning can be used to …
Today we're going to discuss how machine learning can be used to group and label information even if those labels don't exist. We'll explore two types of clustering used in Unsupervised Machine Learning: k-means and Hierarchical clustering, and show how they can be used in many ways - from book suggestions and medical interventions, to giving people better deals on pizza!
Today we're going to discuss the role of statistics during war. From …
Today we're going to discuss the role of statistics during war. From helping the Allies break Nazi Enigma codes and estimate tank production rates to finding sunken submarines, statistics have and continue to play a critical role on the battlefield.
Today we’re going to talk about why many predictions fail - specifically …
Today we’re going to talk about why many predictions fail - specifically we’ll take a look at the 2008 financial crisis, the 2016 U.S. presidential election, and earthquake prediction in general. From inaccurate or just too little data to biased models and polling errors, knowing when and why we make inaccurate predictions can help us make better ones in the future. And even knowing what we can’t predict can help us make better decisions too.
In our series finale, we're going to take a look at some …
In our series finale, we're going to take a look at some of the times we've used statistics to gaze into our crystal ball, and actually got it right! We'll talk about how stores know what we want to buy (which can sometimes be a good thing), how baseball was changed forever when Paul DePodesta created a record-winning Oakland A's baseball team, and how statistics keeps us safe with the incredible strides we've made in weather forecasting. Statistics are everywhere, and even if you don't remember all the formulae and graphs we've thrown at you in this series, we hope you take with you a better appreciation of the many ways statistics impacts your life, and hopefully we've given your a more math-y perspective on how the world works. Thanks so much for watching DFTBAQ!
No restrictions on your remixing, redistributing, or making derivative works. Give credit to the author, as required.
Your remixing, redistributing, or making derivatives works comes with some restrictions, including how it is shared.
Your redistributing comes with some restrictions. Do not remix or make derivative works.
Most restrictive license type. Prohibits most uses, sharing, and any changes.
Copyrighted materials, available under Fair Use and the TEACH Act for US-based educators, or other custom arrangements. Go to the resource provider to see their individual restrictions.