Strengthening Claims-based Interpretations and Uses of Local and Large-scale Science Assessment Scores (SCILLSS)
Ensuring Rigor in Local Assessment Systems: A Self-Evaluation Protocol
Ensuring Rigor in Local Assessment Systems: A Self-Evaluation Protocol was developed with funding from the US Department of Education under the Enhanced Assessment Grants Program, CFDA 84.368A. The contents do not necessarily represent the policy of the US Department of Education, and no endorsement by the Federal government should be assumed.
All rights reserved. Any or all portions of this document may be reproduced and distributed without prior permission, provided the source is cited as: Strengthening Claims-based Interpretations and Uses of Local and Large-scale Science Assessment Scores Project (SCILLSS). (2017). Ensuring Rigor in Local Assessment Systems: A Self-Evaluation Protocol. Lincoln, NE: Nebraska Department of Education.
Table of Contents
Background
Why evaluate assessments?
What is this protocol designed to do?
Guidelines for Implementing the Self-Evaluation Protocol
Self-Evaluation Protocol, Step One: Articulate your current and planned needs for assessment scores and data
Self-Evaluation Protocol, Step Two: Identify all current and planned assessments
Self-Evaluation Protocol, Step Three: Gather and evaluate the evidence for each assessment
Evidence for Construct Coherence
Evidence for Comparability and Reliability
Evidence for Fairness and Accessibility
Evidence Related to Consequences and Use
Self-Evaluation Protocol, Step Four: Review the evidence across assessments
Self-Evaluation Protocol, Steps One and Two: Identifying Purposes and Assessments Used to Serve those Purposes
Self-Evaluation Protocol, Step Three: Gather and Evaluate the Evidence for Each Assessment
Construct Coherence
Comparability and Reliability
Fairness and Accessibility
Consequences and Use
Self-Evaluation Protocol, Step Four: Summary of Individual Assessment Reviews
Glossary
List of Exhibits
Exhibit 1. Assessment Uses and Associated Stakes
Exhibit 2. Evidence for Construct Coherence
Exhibit 3. Evidence for Comparability and Reliability
Exhibit 4. Evidence for Fairness and Accessibility
Exhibit 5. Evidence Related to Consequences and Use
The Standards for Educational and Psychological Testing (AERA, APA, & NCME, 2014) are referenced throughout this document. The full citation is:
American Educational Research Association (AERA), American Psychological Association (APA), & National Council on Measurement in Education (NCME) Joint Committee on Standards for Educational and Psychological Testing. (2014). Standards for Educational and Psychological Testing. Washington, DC: AERA.
Background
Every US state and school district uses one or more assessments of students’ academic knowledge and skills for a variety of purposes. This self-evaluation protocol is designed to support local educators (including, but not limited to, district test coordinators, curriculum specialists, principals, and teachers) in evaluating each of these assessments as well as their local assessment system as a whole. We suggest using an inclusive process, with multiple individuals contributing as a team and with the understanding that the process may prompt some internal debate on the value and purpose of assessment within your school or district.
Why evaluate assessments?
An assessment system at the school or district level should provide students, teachers, administrators, and other school personnel with an accurate reflection, for a range of purposes, of the key concepts, knowledge, and skills that students have achieved. Each assessment within the local assessment system should yield information that is meaningful and useful for a particular purpose or purposes. The only way to know whether an assessment yields valid and useful information is to evaluate evidence in relation to how its scores are to be interpreted and used. This process, known as validity evaluation, is what this protocol is designed to support.
Addressing questions about the validity and reliability of assessments is an essential obligation of any person or agency using test scores to make judgments about any individual or group. This obligation applies whether a test is teacher-made for a class or produced commercially for large-scale use. What differs are expectations for the nature and degree of evidence necessary to support the interpretations and uses of the test scores. Note that validity and reliability are not characteristics of a test itself: they apply to the scores a test yields and the uses for those scores. A test is not inherently good or bad, but its scores can be used for appropriate or inappropriate purposes.
This notion of validity in relation to scores and score uses is so fundamental that it is the very first standard in the Standards for Educational and Psychological Testing (AERA, APA, & NCME, 2014), the document that guides all educational and psychological assessment practices in the US.
“Standard 1.0. Clear articulation of each intended test score interpretation for a specified use should be set forth, and appropriate validity evidence in support of each intended interpretation should be provided.”
(AERA, APA, & NCME, 2014, p. 23)
For the present purposes, we reflect this concept in foundational questions that underlie this self-evaluation protocol:
For what purpose(s) was the assessment developed? Is the purpose for which you are using the assessment among those purposes for which it was developed?
The protocol begins and ends with a consideration of purpose.
What is this protocol designed to do?
This self-evaluation protocol provides a framework for educators at the school, district, or local system level to use in considering how best to implement an assessment system. It focuses on assessments used throughout the school year, whether state- or district-mandated, developed by an independent test vendor, or selected for use within a school. Scores from some of these assessments may be used as part of official accountability programs; others may inform instruction or yield information for use in assigning grades. All tests that yield scores used for any of these purposes are part of a school’s or district’s assessment system and should be evaluated on a regular basis.
This protocol is meant to support reviews of each assessment in a system and of the local assessment system as a whole. Educators at the school or district level can modify it as needed to suit their context. It may be helpful to consider each assessment from multiple perspectives, such as those of test administrators, teachers, parents, and students. Different stakeholders may hold different views on what scores mean and how they should be used; it may be necessary to determine which interpretations and uses are supported by evidence and which are not.
The SCILLSS Digital Workbook on Educational Assessment Design and Evaluation is a companion resource for those implementing the self-evaluation protocol. The workbook comprises five chapters that together provide state and local educators with a grounding in the principles of high-quality assessment, principles that are critical to the appropriate selection, development, and use of assessments in educational settings. While the digital workbook is not a toolkit for developing assessments, it offers a framework for deciding whether to develop or adopt tests and for evaluating tests currently in use. It can be used on its own or as a resource for those completing the SCILLSS self-evaluation protocols at the local or state level. The five chapters comprising the digital workbook are available on the SCILLSS website.
Guidelines for Implementing the Self-Evaluation Protocol
We recommend four steps in implementing this protocol:
1. Articulate your current and planned needs for assessment scores and data
2. Identify all current and planned assessments
3. Gather and evaluate the evidence for each assessment
4. Review the evidence across assessments
Next, we provide considerations and guidelines for preparing to implement the self-evaluation protocol. We also recommend that your team gather information on each of the assessments administered at the school or district level. This information includes, but is not limited to, assessment purposes and uses, technical manuals, research conducted by publishers/test vendors and/or by outside researchers, and administration manuals.
Digital Workbook on Educational Assessment Design and Evaluation: Creating and Evaluating Effective Educational Assessments
This professional development resource on assessment literacy is divided into chapters and is intended to help educators ensure that the assessments they use provide meaningful information about what students know and can do. It complements the self-evaluation protocol: each chapter provides a deeper understanding of, and examples of evidence for, the corresponding sections of the protocol. The resource can also be used on its own.
Chapter 1: Purposes and Uses of Assessment Scores, Validity, Validity Questions (Adobe Connect Digital Workbook, PPT, Script, MP3)
- Chapter 1.1 Assessment Purposes and Uses
- Chapter 1.2 Validity as the Key Principle of Assessment Quality
- Chapter 1.3 Four Validity Questions to Guide Assessment Development and Evaluation
- Chapter 1.4 Summary and Next Steps
Chapter 2: Construct Coherence (Adobe Connect Digital Workbook, PPT, Script, MP3)
- Chapter 2.1 Review of Key Concepts from Chapter 1
- Chapter 2.2 The Concept of Construct Coherence
- Chapter 2.3 Validity Questions Related to Construct Coherence
Chapter 3: Comparability (Adobe Connect Digital Workbook, PPT, Script, MP3)
- Chapter 3.1 Review of Key Concepts from Chapters 1 and 2
- Chapter 3.2 What is Comparability and Why is it Important?
- Chapter 3.3 What is Reliability/Precision and Why is it Important?
- Chapter 3.4 Validity Questions Related to Comparability and Reliability/Precision
Chapter 4: Fairness and Accessibility (Adobe Connect Digital Workbook, PPT, Script, MP3)
- Chapter 4.1 Review of Key Concepts from Chapters 1, 2, and 3
- Chapter 4.2 What is Fairness and Accessibility and Why is it Important?
- Chapter 4.3 Validity Questions Related to Fairness and Accessibility
Chapter 5: Consequences Associated with Testing (Adobe Connect Digital Workbook, PPT, Script, MP3)
- Chapter 5.1 Review of Key Concepts from Chapters 1, 2, 3, and 4
- Chapter 5.2 Consequences Associated with Testing
- Chapter 5.3 Validity Questions Related to Consequences