Strengthening Claims-based Interpretations and Uses of Local and Large-scale Science Assessment Scores (SCILLSS)
Ensuring Rigor in Local Assessment Systems: A Self-Evaluation Protocol
Ensuring Rigor in Local Assessment Systems: A Self-Evaluation Protocol was developed with funding from the US Department of Education under the Enhanced Assessment Grants Program, CFDA 84.368A. The contents do not necessarily represent the policy of the US Department of Education, and no endorsement by the Federal government should be assumed.
All rights reserved. Any or all portions of this document may be reproduced and distributed without prior permission, provided the source is cited as: Strengthening Claims-based Interpretations and Uses of Local and Large-scale Science Assessment Scores Project (SCILLSS). (2017). Ensuring Rigor in Local Assessment Systems: A Self-Evaluation Protocol. Lincoln, NE: Nebraska Department of Education.
Table of Contents
Background
Why evaluate assessments?
What is this protocol designed to do?
Guidelines for Implementing the Self-Evaluation Protocol
Self-Evaluation Protocol, Step One: Articulate your current and planned needs for assessment scores and data
Self-Evaluation Protocol, Step Two: Identify all current and planned assessments
Self-Evaluation Protocol, Step Three: Gather and evaluate the evidence for each assessment
Evidence for Construct Coherence
Evidence for Comparability and Reliability
Evidence for Fairness and Accessibility
Evidence Related to Consequences and Use
Self-Evaluation Protocol, Step Four: Review the evidence across assessments
Self-Evaluation Protocol, Steps One and Two: Identifying Purposes and Assessments Used to Serve those Purposes
Self-Evaluation Protocol, Step Three: Gather and Evaluate the Evidence for Each Assessment
Construct Coherence
Comparability and Reliability
Fairness and Accessibility
Consequences and Use
Self-Evaluation Protocol, Step Four: Summary of Individual Assessment Reviews
Glossary
List of Exhibits
Exhibit 1. Assessment Uses and Associated Stakes
Exhibit 2. Evidence for Construct Coherence
Exhibit 3. Evidence for Comparability and Reliability
Exhibit 4. Evidence for Fairness and Accessibility
Exhibit 5. Evidence Related to Consequences and Use
The Standards for Educational and Psychological Testing (AERA, APA, & NCME, 2014) are referenced throughout this document. The full citation is:
American Educational Research Association (AERA), American Psychological Association (APA), & National Council on Measurement in Education (NCME) Joint Committee on Standards for Educational and Psychological Testing. (2014). Standards for Educational and Psychological Testing. Washington, DC: AERA.
Background
Every US state and school district uses one or more assessments of students’ academic knowledge and skills for a variety of purposes. This self-evaluation protocol is designed to support local educators (including, but not limited to, district test coordinators, curriculum specialists, principals, and teachers) in evaluating each of these assessments as well as their local assessment system as a whole. We suggest using an inclusive process, with multiple individuals contributing as a team and with the understanding that the process may prompt some internal debate on the value and purpose of assessment within your school or district.
Why evaluate assessments?
An assessment system at the school or district level should provide students, teachers, administrators, and other school personnel with an accurate reflection, for a range of purposes, of the key concepts, knowledge, and skills that students have achieved. Each assessment within the local assessment system should yield information that is meaningful and useful for a particular purpose or purposes. The only way to know whether an assessment yields valid and useful information is to evaluate evidence in relation to how its scores are to be interpreted and used. This process, known as validity evaluation, is what this protocol is designed to support.
Addressing questions about the validity and reliability of assessments is an essential obligation of any person or agency using test scores to make judgments about any individual or group. This obligation applies whether a test is teacher-made for a class or produced commercially for large-scale use. What differs are expectations for the nature and degree of evidence necessary to support the interpretations and uses of the test scores. Note that validity and reliability are not characteristics of a test itself: they apply to the scores a test yields and the uses for those scores. A test is not inherently good or bad, but its scores can be used for appropriate or inappropriate purposes.
This notion of validity in relation to scores and score uses is so fundamental that it is the very first standard in the Standards for Educational and Psychological Testing (AERA, APA, & NCME, 2014), the document that guides all educational and psychological assessment practices in the US.
“Standard 1.0. Clear articulation of each intended test score interpretation for a specified use should be set forth, and appropriate validity evidence in support of each intended interpretation should be provided.”
(AERA, APA, & NCME, 2014, p. 23)
For the present purposes, we reflect this concept in foundational questions that underlie this self-evaluation protocol:
For what purpose(s) was the assessment developed? Is the purpose for which you are using the assessment among those purposes for which it was developed?
The protocol begins and ends with a consideration of purpose.
What is this protocol designed to do?
This self-evaluation protocol provides a framework for educators at the school, district, or local system level to use in considering how best to implement an assessment system. It focuses on assessments used throughout the school year, whether state- or district-mandated, developed by an independent test vendor, or selected for use within a school. Scores from some of these assessments may be used as part of official accountability programs; others may inform instruction or yield information for use in assigning grades. All tests that yield scores used for any of these purposes are part of a school’s or district’s assessment system and should be evaluated on a regular basis.
This protocol is meant to support reviews of each assessment in a system and of the local assessment system as a whole. Educators at the school or district level can modify it as needed to suit their context. It may be helpful to consider each assessment from multiple perspectives, such as those of test administrators, teachers, parents, and students. Different stakeholders may hold different views on what scores mean and how they should be used; it may be necessary to determine which interpretations and uses are supported by evidence and which are not.
The SCILLSS Digital Workbook on Educational Assessment Design and Evaluation is a companion resource for those implementing the self-evaluation protocol. The workbook comprises five chapters that together provide state and local educators with a grounding in the principles of high-quality assessment, principles that are critical to the appropriate selection, development, and use of assessments in educational settings. While the digital workbook is not a toolkit for developing assessments, it offers a framework for deciding whether to develop or adopt tests and for evaluating tests currently in use. It can be used on its own or as a resource for those completing the SCILLSS self-evaluation protocols at the local or state level. The five chapters comprising the digital workbook are available on the SCILLSS website.
Guidelines for Implementing the Self-Evaluation Protocol
We recommend four steps in implementing this protocol:
1. Articulate your current and planned needs for assessment scores and data
2. Identify all current and planned assessments
3. Gather and evaluate the evidence for each assessment
4. Review the evidence across assessments
Next, we provide considerations and guidelines for preparing to implement the self-evaluation protocol. We also recommend that your team gather information on each of the assessments administered at the school or district level. This information includes, but is not limited to, assessment purposes and uses, technical manuals, research conducted by publishers/test vendors and/or by outside researchers, and administration manuals.
Digital Workbook on Educational Assessment Design and Evaluation: Creating and Evaluating Effective Educational Assessments
This professional development resource on assessment literacy is divided into chapters and is intended to help educators ensure that the assessments they use provide meaningful information about what students know and can do. It complements the self-evaluation protocol: each chapter provides a deeper understanding of, and examples of evidence for, the corresponding sections of the protocol. The resource can also be used on its own.
Chapter 1: Purposes and Uses of Assessment Scores, Validity, Validity Questions (Adobe Connect Digital Workbook, PPT, Script, MP3)
- Chapter 1.1 Assessment Purposes and Uses
- Chapter 1.2 Validity as the Key Principle of Assessment Quality
- Chapter 1.3 Four Validity Questions to Guide Assessment Development and Evaluation
- Chapter 1.4 Summary and Next Steps
Chapter 2: Construct Coherence (Adobe Connect Digital Workbook, PPT, Script, MP3)
- Chapter 2.1 Review of Key Concepts from Chapter 1
- Chapter 2.2 The Concept of Construct Coherence
- Chapter 2.3 Validity Questions Related to Construct Coherence
Chapter 3: Comparability (Adobe Connect Digital Workbook, PPT, Script, MP3)
- Chapter 3.1 Review of Key Concepts from Chapters 1 and 2
- Chapter 3.2 What is Comparability and Why is it Important?
- Chapter 3.3 What is Reliability/Precision and Why is it Important?
- Chapter 3.4 Validity Questions Related to Comparability and Reliability/Precision
Chapter 4: Fairness and Accessibility (Adobe Connect Digital Workbook, PPT, Script, MP3)
- Chapter 4.1 Review of Key Concepts from Chapters 1, 2, and 3
- Chapter 4.2 What is Fairness and Accessibility and Why is it Important?
- Chapter 4.3 Validity Questions Related to Fairness and Accessibility
Chapter 5: Consequences Associated with Testing (Adobe Connect Digital Workbook, PPT, Script, MP3)
- Chapter 5.1 Review of Key Concepts from Chapters 1, 2, 3, and 4
- Chapter 5.2 Consequences Associated with Testing
- Chapter 5.3 Validity Questions Related to Consequences