6 Standard Setting
The standard setting process for the Dynamic Learning Maps® (DLM®) Alternate Assessment System in English language arts (ELA) and mathematics was originally conducted following the 2014–2015 administration. Four performance level descriptors (PLDs) were developed to describe performance on the assessment. Cut points were specified during a 4-day standard setting meeting and then reviewed in a follow-up evaluation of impact data.
This chapter provides a brief description of the development of the rationale for the standard setting approach; the policy PLDs; methods, preparation, procedures, and results of the original standard setting meeting and follow-up evaluation of the impact data and cut points; and specification of grade- and subject-specific PLDs, which were developed after approval of the consortium cuts. A more detailed description of the original DLM standard setting activities and results can be found in the 2015 Integrated Model Standard Setting: English Language Arts and Mathematics Technical Report (Karvonen, Clark, et al., 2015) and in the corresponding peer-reviewed academic journal article (Clark et al., 2017).
6.1 Original Standard Setting Process
The 2014–2015 school year was the first fully operational testing year for the DLM assessments in ELA and mathematics. The operational testing window ended on June 12, 2015, and DLM staff conducted standard setting during June 15–18, 2015, in Kansas City, Missouri. The standard setting event included all states administering DLM assessments in 2014–2015 with the purpose of establishing a set of cut points for each of the two testing models. The DLM Technical Advisory Committee (TAC) advised on the standard setting methodology from early design through to the development of grade- and subject-specific PLDs and review of impact data after the event. Although the DLM Governance Board voted on acceptance of final cut points, individual states had the option to adopt the consortium cut points or develop their own independent cut points.
6.1.1 Standard Setting Approach: Rationale and Overview
The approach to standard setting was developed to be consistent with the DLM Alternate Assessment System’s design and to rely on established methods, such as recommended practices for developing, implementing, evaluating, and documenting standard settings (Cizek, 1996; Hambleton et al., 2012) and the Standards on Educational and Psychological Testing (American Educational Research Association et al., 2014). The DLM standard setting approach used DLM mastery classifications and drew from several established methods, including generalized holistic (Cizek & Bunch, 2006) and body of work (Kingston & Tiemann, 2012).
Because the DLM assessments are based on large, fine-grained learning maps and make use of diagnostic classification modeling rather than traditional psychometric methods, the standard setting approach relied on the aggregation of dichotomous classifications of linkage level mastery for each Essential Element (EE) in the blueprint. Drawing from the generalized holistic and body of work methods, the standard setting method used a profile approach to classify student mastery of linkage levels into performance levels (see Clark et al., 2017). Profiles provided a holistic view of student performance by summarizing across the EEs and linkage levels. Cut points were determined by evaluating the total number of mastered linkage levels. Although the number of mastered linkage levels is not an interval scale, the process for identifying the DLM cut points is roughly analogous to assigning a cut point along a scale score continuum.
Before making a final decision whether to use the profile approach, the DLM TAC reviewed a preliminary description of the proposed methods. At the TAC’s suggestion, DLM staff conducted a mock panel process using this profile-based approach to evaluate the feasibility of the rating task and the likelihood of obtaining sound judgments using this method. Figure 6.1 summarizes the complete set of sequential steps included in the DLM standard setting process. This includes steps conducted before, during, and after the on-site meeting during June 2015.
Note. Dark shading represents steps conducted at the standard setting meeting in June 2015.
6.1.2 Policy Performance Level Descriptors
Student results are reported as performance levels, and PLDs are used to inform the interpretation of those scores. The DLM Governance Board developed PLDs through a series of discussions and draft PLD reviews between July and December 2014. Discussion began at the July 2014 governance meeting, attended by governance board members with special education and assessment backgrounds. As part of the discussion, the group reviewed the language used by the general education consortia and in the Common Core State Standards for key features describing performance. Following the meeting, governance board members took draft PLDs back to their states and were responsible for collecting feedback at the state and local levels according to their own state policies and practices for stakeholder involvement. Table 6.1 presents the final version of the policy PLDs. The consortium-level definition of proficiency was at target. Policy PLDs served as anchors for panelists during the standard setting process.
| Performance level | Performance level descriptor |
| --- | --- |
| Emerging | The student demonstrates emerging understanding of and ability to apply content knowledge and skills represented by the Essential Elements. |
| Approaching the Target | The student’s understanding of and ability to apply targeted content knowledge and skills represented by the Essential Elements is approaching the target. |
| At Target | The student’s understanding of and ability to apply content knowledge and skills represented by the Essential Elements is at target. |
| Advanced | The student demonstrates advanced understanding of and ability to apply targeted content knowledge and skills represented by the Essential Elements. |
6.1.3 Profile Development
Prior to the standard setting meeting, DLM staff generated profiles of student learning that summarized linkage level mastery for each assessed EE. First, for each assessed EE and linkage level, we calculated each student’s probability of mastery using the diagnostic classification model (see Chapter 5 of this manual). Maximum uncertainty about mastery occurs when the probability is .5, and certainty increases as the probability approaches 0 or 1; considering the risk of false positives and false negatives, the threshold used to determine mastery classification was set at .8. For each linkage level, all students with a probability greater than or equal to .8 received a linkage level mastery status of 1, or mastered, and all students with a probability lower than .8 received a linkage level mastery status of 0, or not mastered.
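As a minimal sketch, the dichotomous classification rule amounts to thresholding the model's mastery probabilities; the probability values below are invented for illustration, and numpy is used only for convenience:

```python
import numpy as np

# Hypothetical posterior probabilities of linkage level mastery from the
# diagnostic classification model; the values are illustrative only.
posterior = np.array([0.95, 0.82, 0.79, 0.50, 0.12])

THRESHOLD = 0.8  # mastery threshold described in the text

# A probability at or above .8 yields a mastery status of 1; otherwise 0.
mastery_status = (posterior >= THRESHOLD).astype(int)
print(mastery_status)  # [1 1 0 0 0]
```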
The threshold value was applied to student assessment data to create profiles of student mastery, which summarize linkage level mastery by EE. Profiles were created using data for each subject and grade. Each profile listed all the linkage levels for all the EEs from the blueprint, along with the conceptual area for each EE, with shaded boxes indicating the mastered linkage levels. Figure 6.2 provides an example profile for a hypothetical student.
Note. Green shading represents linkage level mastery.
Profiles were available for all students who participated in the fall or spring windows by May 15, 2015 (n = 14,278). The frequency with which each precise profile (i.e., pattern of linkage level mastery) occurred in this population was computed. Based on these results, the three most common profiles were selected for each possible total linkage level mastery value (i.e., total number of linkage levels mastered) for each grade and subject. In instances in which data were not available at a specific linkage level value (e.g., no students mastered exactly 47 linkage levels for a grade and subject), profiles were based on simulated data. To simulate profiles, the DLM test development teams used adjacent profiles for reference and created simulated profiles that represented likely patterns of mastery. Fewer than 10% of all the profiles developed were simulated. Further detail on specific procedures for preparing standard setting profiles may be found in Chapter 1 of the 2015 Integrated Model Standard Setting: English Language Arts and Mathematics Technical Report (Karvonen, Clark, et al., 2015).
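The selection of the most common profiles per total-mastery value can be sketched as follows; the four-linkage-level blueprint, the profiles, and all counts are hypothetical, and this is not the consortium's actual preparation code:

```python
from collections import Counter

# Hypothetical mastery profiles: each tuple is one student's pattern of
# linkage level mastery (1 = mastered) across the linkage levels in a
# grade and subject.
profiles = [
    (1, 1, 0, 0), (1, 1, 0, 0), (1, 0, 1, 0),
    (1, 1, 1, 0), (0, 1, 1, 1), (1, 1, 1, 0),
]

# Count how often each precise profile occurs in the population.
counts = Counter(profiles)

# Group profiles by total linkage levels mastered, then keep the most
# common profiles for each total (three in the actual process).
by_total = {}
for profile, n in counts.items():
    by_total.setdefault(sum(profile), []).append((n, profile))

top_three = {
    total: [p for _, p in sorted(group, reverse=True)[:3]]
    for total, group in by_total.items()
}
```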
6.1.4 Panelists
DLM staff worked with participating states in March 2015 to recruit standard setting panelists. States were responsible for communicating within their state to recruit potential panelists. Recruitment sought panelists with both content knowledge and expertise in the education and outcomes of students with the most significant cognitive disabilities, including educators and school and district administrators. Other subject matter experts, such as higher education faculty or state and regional education staff, were also suggested for consideration. Employers were considered at the high school level only, specifically companies that employ individuals with disabilities.
The 45 panelists who participated in standard setting represented varying backgrounds. Table 6.2 and Table 6.3 summarize their demographic information. Most of the selected panelists were classroom educators. Panelists had a range of years of experience with ELA, mathematics, and working with students with the most significant cognitive disabilities.
Approximately one third of participants had experience with setting standards for other assessments (n = 16). Some panelists already had experience with the DLM assessment, either from writing items (n = 6) or externally reviewing items and testlets (n = 8). Only three panelists reported having less than 1 year or no experience with alternate assessments: one was a classroom educator with 24 years of experience working with students with the most significant cognitive disabilities, one was a special education educator with 10 years of experience working with students with the most significant cognitive disabilities, and one was district staff. Further detail on standard setting volunteers, selection process, and panel composition may be found in Chapter 3 of the 2015 Integrated Model Standard Setting: English Language Arts and Mathematics Technical Report (Karvonen, Clark, et al., 2015).
| Race | n |
| --- | --- |
| American Indian/Alaska Native | 1 |
| Native Hawaiian/Pacific Islander | 1 |

| Years of experience | M | Min | Max |
| --- | --- | --- | --- |
| English language arts | 16.0 | 0 | 40 |
| Students with significant cognitive disabilities | 16.3 | 0 | 41 |
6.1.5 Meeting Procedures
Panelists participated in a profile-based standard setting procedure to make decisions about cut points. The panelists participated in four rounds of activities in which they moved from general to precise recommendations about cut points.
The primary tools of this procedure were range-finding folders and pinpointing folders. The range-finding folders contained profiles of student work that represented the scale range. Pinpointing folders contained profiles for specific areas of the range.
Throughout the procedure, DLM staff instructed panelists to use their best professional judgment and consider all students with the most significant cognitive disabilities to determine which performance level best described each profile. Each panel had at least two, and up to three, grade-level cut points to set.
The subsequent sections provide details of the final procedures, including quality assurance used for determining cut points. Further information regarding all meeting procedures and fidelity of the final procedures to the planned procedures may be found in Chapter 4 and the appendix of the 2015 Integrated Model Standard Setting: English Language Arts and Mathematics Technical Report (Karvonen, Clark, et al., 2015).
6.1.5.1 Panelist Training
Panelists were provided with training both before and during the standard setting workshop. Advance training was available online, on demand, in the 10 days prior to the standard setting workshop. The advance training addressed the following topics:
- Students who take the DLM assessments
- Content of the assessment system, including DLM learning maps, EEs, claims and conceptual areas, linkage levels, and alignment
- Accessibility by design, including the framework for the DLM System’s cognitive taxonomy and strategies for maximizing accessibility of the content; the use of the Personal Needs and Preferences Profile to provide accessibility supports during the assessment; and the use of the First Contact survey to determine linkage level assignment
- Assessment design, including item types, testlet design, and sample items from various linkage levels in both subjects
- An overview of the assessment model, including test blueprints and the timing and selection of testlets administered
- A high-level introduction to two topics that would be covered in more detail during on-site training: the DLM approach to scoring and reporting and the steps in the standard setting process
Additional panelist training was conducted at the standard setting workshop. The purposes of on-site training were twofold: (1) to review advance training concepts that panelists had indicated less comfort with and (2) to complete a practice activity to prepare panelists for their responsibilities during the panel meeting. The practice activity consisted of range finding using training profiles for just a few total linkage levels mastered (e.g., 5, 10, 15, 20). Overall, panelists participated in approximately 8 hours of standard setting training before beginning the practice activity.
6.1.5.2 Range Finding
During the range-finding process, panelists reviewed a limited set of profiles to assign general divisions between the performance levels using a two-round process. The goal of range finding was to locate ranges (in terms of number of linkage levels mastered) in which panelists agreed that approximate cut points should exist.
First, panelists independently evaluated profiles and identified the performance level that best described each profile. Once all panelists completed their ratings, the facilitator obtained the performance level recommendations for each profile by a raise of hands.
After a table discussion of how panelists arrived at their ratings, panelists could adjust their independent ratings if they wished. A second round of ratings was then recorded and shared with the group.
Using the second round of ratings, built-in logistic regression functions were used to calculate the probability of a profile being categorized in each performance level, conditioned on the number of linkage levels mastered, and the most likely cut points for each performance level were identified. In instances in which the logistic regression function could not identify a value (i.e., the group unanimously agreed on the categorization of profiles to performance levels, so there was no variance in the ratings to fit a logistic regression), the approximate cut point was determined as the midpoint between the unanimously rated profiles. For example, if all profiles with 10 linkage levels mastered were unanimously rated at the Emerging performance level, and all profiles with 15 linkage levels mastered were unanimously rated at the Approaching the Target performance level, the approximate cut point was set at 13 (the midpoint of 12.5, rounded up). Chapter 4 of the 2015 Integrated Model Standard Setting: English Language Arts and Mathematics Technical Report (Karvonen, Clark, et al., 2015) provides greater detail on range finding and pinpointing and includes the number of linkage levels per grade and subject.
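The logistic-regression step can be sketched as follows. The report mentions only "built-in logistic regression functions," so the rating data, the use of scipy, and the fitting details below are illustrative assumptions, not the consortium's actual implementation:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical round-two ratings: x is the total linkage levels mastered
# for each rated profile, y is 1 if the panelist placed the profile in
# the higher of the two performance levels, 0 otherwise.
x = np.array([8, 8, 10, 10, 12, 12, 14, 14, 16, 16], dtype=float)
y = np.array([0, 0, 0, 1, 0, 1, 0, 1, 1, 1], dtype=float)

def neg_log_likelihood(beta):
    """Negative log-likelihood of the model p(y=1|x) = sigmoid(b0 + b1*x)."""
    b0, b1 = beta
    z = b0 + b1 * x
    # Stable form of -[y*log(p) + (1-y)*log(1-p)] summed over ratings.
    return np.sum(np.logaddexp(0.0, z) - y * z)

fit = minimize(neg_log_likelihood, x0=[0.0, 0.0])
b0, b1 = fit.x

# The approximate cut point is where the fitted probability of the
# higher level crosses .5, i.e., where b0 + b1*x = 0.
cut_point = round(-b0 / b1)
```

With these symmetric illustrative ratings, the fitted crossing point falls at 12 mastered linkage levels.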
6.1.5.3 Pinpointing
Pinpointing rounds followed range finding. During pinpointing, panelists reviewed additional profiles to refine the cut points. The goal of pinpointing was to pare down to specific cut points in terms of the number of linkage levels mastered within the general ranges determined in range finding, while not relying on conjunctive or compensatory judgments.
First, panelists reviewed profiles for the seven linkage level values including and surrounding the cut point identified during range finding. Next, panelists independently evaluated each profile and assigned it to either the higher or the lower of the two performance levels adjacent to the cut point. Once all panelists completed their ratings, the facilitator obtained the recommendations for each profile by a raise of hands.
After discussion of the ratings, panelists were given the opportunity to adjust their independent ratings in a second round. Using the second round of ratings, built-in logistic regression functions were used to calculate the probability of a profile being categorized in each performance level, conditioned on the number of linkage levels mastered, and the most likely cut points for each performance level were identified. In instances in which the logistic regression function could not identify a value (e.g., the group unanimously agreed on the categorization of profiles to performance levels), psychometricians evaluated the results to determine the final recommended cut point based on panelist recommendations. Chapter 4 of the 2015 Integrated Model Standard Setting: English Language Arts and Mathematics Technical Report (Karvonen, Clark, et al., 2015) provides greater detail on range finding and pinpointing and includes the number of linkage levels per grade and subject.
6.1.5.4 Panelist Evaluations and Panel-Recommended Cut Points
Across all panelist ratings of group-recommended cut points (N = 384), panelists indicated comfort with the recommendation in 97.7% of cases. Table 6.4 provides the panelist comfort ratings of group-recommended cut points. Only 2.3% of responses (n = 9) indicated discomfort with a group-recommended cut point. For 13 of the 17 panels (76.5%; i.e., one panel for each grade and subject), panelists indicated comfort with all three recommended cut points. Most recommendations for a change to a cut point concerned only one of the three cut points for a given panel, and most often, the recommended changes differed from the initial recommendation by only a single linkage level. Chapter 5 of the 2015 Integrated Model Standard Setting: English Language Arts and Mathematics Technical Report (Karvonen, Clark, et al., 2015) provides greater detail on final independent evaluations of panel-recommended cut points.
| Subject | N panelists | N ratings (n panelists × n cut points evaluated) | "Yes" ratings | % agreement |
| --- | --- | --- | --- | --- |
| English language arts | 22 | 177 | 170 | 96 |
6.1.6 Smoothing the Cut Points
To mitigate the effects of sampling error and maintain a coherent system of cut points across grade levels, DLM staff made systematic adjustments to smooth the panel-recommended cut points. The goal of the smoothing process was to achieve a more consistent percentage of students in each performance level across grade levels within each subject. The specific steps applied to each grade and subject can be found in the 2015 Integrated Model Standard Setting: English Language Arts and Mathematics Technical Report (Karvonen, Clark, et al., 2015). The smoothing process followed these steps:
1. For each grade and subject, calculate the cumulative percentage of students at each number of total linkage levels mastered.
2. Perform a probit transformation to convert each cumulative percentage to a z-score.
3. Find the z-score associated with each of the panel-recommended cut points.
4. For each z-score identified in Step 3, calculate a new weighted z-score by assigning 0.5 weight to the current grade's z-score and 0.25 weight to each adjacent grade's z-score. For Grades 3 and 11, which had only one adjacent grade, 0.667 weight was given to the current grade and 0.333 weight to the adjacent grade. For example, when calculating the weighted z-score for the Grade 4 cut point between the Emerging and Approaching performance levels, 0.5 weight was given to the z-score for the Grade 4 Emerging/Approaching cut point, 0.25 weight to the z-score for the Grade 3 Emerging/Approaching cut point, and 0.25 weight to the z-score for the Grade 5 Emerging/Approaching cut point.
5. For each grade and subject, take as the smoothed cut point the number of total linkage levels mastered whose z-score is closest to the weighted average for that cut point.
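The steps above can be sketched in code for a single cut point; the cumulative proportions, grades, and panel-recommended cuts below are invented for illustration:

```python
import numpy as np
from scipy.stats import norm

# Step 1 (illustrative): cumulative proportions of students at or below
# each total linkage levels mastered (index 0, 1, 2, ...) for three
# adjacent grades; all values are made up.
cumulative = {
    3: np.array([0.05, 0.20, 0.45, 0.70, 0.90, 0.99]),
    4: np.array([0.04, 0.18, 0.40, 0.68, 0.88, 0.99]),
    5: np.array([0.06, 0.22, 0.50, 0.75, 0.92, 0.99]),
}

# Step 2: probit transformation converts cumulative proportions to z-scores.
z = {grade: norm.ppf(p) for grade, p in cumulative.items()}

# Step 3: z-score at the panel-recommended cut point for each grade
# (a hypothetical cut of 2 total linkage levels mastered in every grade).
panel_cut = {3: 2, 4: 2, 5: 2}
cut_z = {g: z[g][panel_cut[g]] for g in z}

# Step 4: weighted z-score for Grade 4 (0.5 own grade, 0.25 each neighbor).
weighted_z4 = 0.5 * cut_z[4] + 0.25 * cut_z[3] + 0.25 * cut_z[5]

# Step 5: the smoothed cut is the total linkage levels mastered whose
# Grade 4 z-score is closest to the weighted value.
smoothed_cut4 = int(np.argmin(np.abs(z[4] - weighted_z4)))
```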
For a complete description of the smoothing process, see the 2015 Integrated Model Standard Setting: English Language Arts and Mathematics Technical Report (Karvonen, Clark, et al., 2015).
6.1.7 Results
This section summarizes the panel-recommended and smoothed cut points and presents impact data for the final cut points. Additional detailed results are provided in Chapter 5 of the 2015 Integrated Model Standard Setting: English Language Arts and Mathematics Technical Report (Karvonen, Clark, et al., 2015).
6.1.7.1 Panel-Recommended and Smoothed Cut Points
Table 6.5 displays the cut point recommendations reached by the panelists following the range-finding and pinpointing process.
| Grade | Emerging/Approaching | Approaching/Target | Target/Advanced | Minimum required linkage levels |
| --- | --- | --- | --- | --- |
| **English language arts** | | | | |
As described in section 6.1.6, a smoothing procedure was applied to the panel-recommended cut points to mitigate the effect of sampling error and issues related to a system of cut points across a series of grade levels. Table 6.6 shows the smoothed cut points that were derived from the methods described above.
| Grade | Emerging/Approaching | Approaching/Target | Target/Advanced | Minimum required linkage levels |
| --- | --- | --- | --- | --- |
| **English language arts** | | | | |
6.1.7.2 Final Impact Data
Figure 6.3 and Figure 6.4 display the results of the smoothed cut points in terms of impact for ELA and mathematics, respectively. Chapter 5 of the 2015 Integrated Model Standard Setting: English Language Arts and Mathematics Technical Report (Karvonen, Clark, et al., 2015) reports the frequency distributions for the panel-recommended cut points. Table 6.7 includes the demographic data for students included in the impact data.
| Demographic group | n | % |
| --- | --- | --- |
| Other health impairment | 478 | 3.1 |
| Specific learning disability | 234 | 1.5 |
| Two or more races | 519 | 3.4 |
| Native Hawaiian/Pacific Islander | 60 | 0.4 |
| **English learner (EL) participation** | | |
| Not EL eligible or monitored | 14,947 | 97.6 |
| EL eligible or monitored | 346 | 2.3 |
| **English language arts complexity band** | | |
| **Mathematics complexity band** | | |

† Demographic variables were not required in 2014–2015.
6.1.8 External Evaluation of Standard Setting Process and Results
The DLM TAC chair was on-site for the duration of the standard setting event and reported that the standard setting meeting was well planned and implemented, the staff were helpful to the panelists, and the panelists worked hard to set standards. The full TAC accepted a resolution about the adequacy, quality of judgments, and extent to which the process met professional standards. The TAC chair memorandum and TAC resolution are provided in Appendix L of the 2015 Integrated Model Standard Setting: English Language Arts and Mathematics Technical Report (Karvonen, Clark, et al., 2015).
The panel-recommended cut points, adjusted cut points, and associated impact data for both sets of cut points were presented to the TAC and governance board for review. The TAC supported the DLM smoothing method and resulting adjusted cut points. Following the states’ review process and discussion with DLM staff, the DLM Governance Board voted to accept the DLM-recommended smoothed cut points as the final consortium cut points with no further adjustment.
6.1.9 Grade Level and Subject Performance Level Descriptors
Based on the general approach to standard setting, which relied on mastery profiles to anchor panelists’ content-based judgments, grade- and subject-specific PLDs were not used during standard setting. Instead, they were developed after the consortium approved the final cut points, drawing on example profiles and synthesizing content from the more fine-grained linkage level descriptors; this work was completed after standard setting in 2015.
Standard setting panelists began the process by drafting lists of skills and understandings that they determined were characteristic of specific performance levels after cut points had been established. In general, these draft lists of skills and understandings were based on linkage levels described in the mastery profiles used for standard setting—either separate linkage level statements or syntheses of multiple statements.
These draft lists of important skills were collected and used as a starting point for DLM test development teams as they developed language for grade- and subject-specific descriptions for each performance level in every grade for both ELA and mathematics. The purpose of these content descriptions was to provide information about the knowledge and skills that are typical for each performance level.
Test development teams prepared to draft PLDs by consulting published research related to PLD development (e.g., Perie, 2008) and reviewing PLDs developed for other assessment systems to consider grain size of descriptive language and formats for publication. In addition to the draft lists generated by standard setting panelists, test development teams used the following materials as they drafted specific language for each grade- and subject-specific PLD:
- DLM assessment blueprints
- Cut points set at standard setting for each grade and subject
- Sample mastery profiles from the standard setting event
- Essential Element Concept Maps for each EE included on the blueprint for each grade level
- Linkage level descriptions and associated sections of the DLM learning maps for every EE
- The Standards of Mathematical Practice
Test development teams reviewed the EEs, Essential Element Concept Maps, and linkage level descriptors on the profiles to determine skills and understandings assessed at the grade level. These skills and understandings come from each conceptual area assessed at the specific grade level and vary from one grade to the next. Then, the teams reviewed the draft skill lists created by standard setting panelists and final cut points approved by the consortium. Test development teams then used the sample mastery profiles to consider the types and ranges of student performances that could lead to placement into specific performance levels. Using these multiple sources of information, the test development teams evaluated the placement of skills into each of the four performance levels.
While not an exhaustive list of all the content related to each EE from the DLM learning maps, the synthesis of standard setting panelist judgments and test development team judgments provided the basis for descriptions of the typical performance of students showing mastery at each performance level. As test development teams drafted PLDs for each grade, they reviewed the descriptors in relation to each other and the underlying DLM learning map to ensure that there was differentiation in skills from one grade to the next. In very few cases, in which panelists recommended skill placement that was inconsistent with development of content knowledge as represented in the DLM maps, test development teams adjusted the placement of skills. This was only done in cases in which the original judgment of the panelists was inconsistent with a logical ordering of skill development from one level to the next in a particular grade.
DLM staff prepared initial drafts of the grade- and subject-specific descriptions for Grade 3. Project staff reviewed these drafts internally. Additional drafts were prepared for Grades 4 and 5. The DLM Governance Board reviewed the draft descriptors for Grades 3, 4, and 5 at the December 2015 consortium governance meeting. Project staff asked the governance board to review the progression of descriptors from grade to grade within the four performance levels in Grades 3, 4, and 5 and to provide general feedback on the initial drafts. Feedback from the governance board focused on utility for educators and parents and on structuring the descriptions to make them more user-friendly. The primary responses to governance board feedback were to:
- Review technical language in existing drafts and simplify wherever possible.
- Organize each grade and subject-specific description so that a broad conceptual statement about what students at a performance level typically knew and were able to do was followed by specific skills and understandings shown in bulleted lists.
- Organize descriptions consistently within and across grades so that related skills were described in the same sequence within each level in a grade.
DLM staff delivered drafts of all grade- and subject-specific descriptions to the governance board for review in February 2016. After the review period ended, test development teams responded to feedback received by adjusting technical descriptions, removing any content that exceeded the requirements of EEs in the grade level, simplifying language, and clarifying descriptions of skills and understandings. These adjustments were followed by a full editorial review. Appendix E.1 contains examples of grade level and subject PLDs, and all PLDs for ELA and mathematics are available on the DLM website.
In summary, the performance levels for DLM assessments are determined by applying cut points to the total number of linkage levels mastered within each subject. The cut points were developed by experienced panelists evaluating mastery profiles that summarize the skills and understandings that a student mastered in each subject. Thus, the resulting performance levels are based on the most common profiles of skill mastery that align to the policy PLDs adopted by the DLM Governance Board in 2015. Finally, grade- and subject-specific PLDs that describe the skills most commonly mastered by students who achieve at each performance level were developed based on the content of the EEs and the cut points derived from the standard setting process.
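As an illustration of how cut points translate a total number of mastered linkage levels into a performance level, the following sketch assumes hypothetical cut points (12, 25, 38) and assumes a student exactly at a cut point belongs to the higher level; actual cut points vary by grade and subject:

```python
import bisect

# Hypothetical smoothed cut points for one grade and subject: a student
# moves up a level at 12, 25, and 38 total linkage levels mastered.
CUT_POINTS = [12, 25, 38]
LEVELS = ["Emerging", "Approaching the Target", "At Target", "Advanced"]

def performance_level(total_mastered: int) -> str:
    """Map a total number of mastered linkage levels to a performance level."""
    return LEVELS[bisect.bisect_right(CUT_POINTS, total_mastered)]

print(performance_level(11))  # Emerging
print(performance_level(12))  # Approaching the Target (at the cut point)
```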