Comments on: The basics of test score reliability for educators
https://www.renaissance.com/2014/08/21/the-basics-of-test-score-reliability-for-educators/

By: Catherine Close https://www.renaissance.com/2014/08/21/the-basics-of-test-score-reliability-for-educators/#comment-120 Thu, 28 Aug 2014 17:29:09 +0000 In reply to G Gordon.

I completely agree with you about GE scores. As a matter of fact, fluctuations in GE scores are not unique to the STAR assessments! The GE score is simply used to show what the average student would score at a particular testing time in a specific grade. It doesn’t really tell you what the student knows, and it’s not the best score to look at for graphing trends. The scaled score is better for that purpose, and in STAR, it’s the scaled score that drives the trend line in the Student Progress Monitoring Report.

Sometimes fidelity issues with test administration are to blame for wild fluctuations in a student’s scores. Assuming you’re sure that’s not the issue in your case, there are other potential reasons, such as measurement error, fluctuations in student performance, and regression to the mean. For more about these reasons, you might find this guidance document helpful.
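To make the measurement-error point concrete, here is a rough simulation of my own (the numbers are made up for illustration, and this is not the STAR scoring engine or the report’s actual trend-line method): a hypothetical student’s true scaled score stays flat, each observed score adds random measurement error, and a plain least-squares trend line is fit through the observed scores.

```python
# Illustrative sketch only: constant hypothetical "true" scaled score,
# observed scores perturbed by random measurement error, and a simple
# least-squares trend line fit through the observations.
import random
import statistics

random.seed(42)

TRUE_SCALED_SCORE = 600   # hypothetical constant "true" achievement level
SEM = 35                  # hypothetical standard error of measurement
NUM_TESTS = 8

# Each observed score = true score + random measurement error.
observed = [TRUE_SCALED_SCORE + random.gauss(0, SEM) for _ in range(NUM_TESTS)]

# Ordinary least-squares trend line over test occasions 0..NUM_TESTS-1.
xs = list(range(NUM_TESTS))
x_mean, y_mean = statistics.mean(xs), statistics.mean(observed)
slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, observed))
         / sum((x - x_mean) ** 2 for x in xs))
intercept = y_mean - slope * x_mean

print("Observed scores:", [round(y) for y in observed])
print(f"Trend line: score ≈ {intercept:.1f} + {slope:.1f} × occasion")
# The individual scores bounce around 600 purely from measurement error,
# while the fitted trend stays comparatively flat.
```

The point of the sketch is that individual scores can look like a “heartbeat” even when nothing about the student has changed, which is exactly why the trend line on scaled scores is the more useful thing to watch.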

By: Catherine Close https://www.renaissance.com/2014/08/21/the-basics-of-test-score-reliability-for-educators/#comment-119 Tue, 26 Aug 2014 19:01:00 +0000 In reply to Bryan.

The short answer is no: students who have tested previously with STAR will have slightly more efficient test sessions, but the reliability of these tests should be similar to the reliability for students taking their first STAR test.

To explain using your example: student A is testing for the first time, so STAR does not “know” the student’s achievement level. In this case, the software automatically assigns a starting point that is typical/average for fifth graders. As the student progresses through the test, the software scores each item response and, via the efficient CAT engine, homes in on his achievement level with great precision by the end of the test. In student B’s case, a previous STAR test gives the software a starting achievement level, and that is where the new test begins. So instead of assuming student B is average, we start her off at her estimated achievement level. The difference here is not one of precision (both students will have precise scores) but rather of efficiency, in terms of the number of items needed to fully capture achievement levels. Student A might need a few more items for the software to converge on his achievement level, whereas student B’s achievement estimate is captured sooner because of her prior testing history.

Both STAR Math and STAR Reading contain 34 items per test to ensure that we can reliably measure all students, even a student testing for the first time. At the end of the test, all else being equal, two students with matched achievement levels should score similarly.
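If it helps to picture the mechanics, here is a deliberately simplified sketch of an adaptive test of this general kind. It is not the actual STAR engine: it assumes a one-parameter (Rasch) response model, a basic shrinking-step ability update, and made-up ability values. It just shows that the starting point mainly affects how quickly the estimate settles, not where it ends up.

```python
# Hypothetical, highly simplified CAT sketch (not the STAR algorithm).
import math
import random

random.seed(7)

def rasch_prob(ability, difficulty):
    """Probability of a correct response under the Rasch (1PL) model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def run_cat(true_ability, start_estimate, num_items=34):
    """Administer items targeted at the current estimate and update it."""
    estimate = start_estimate
    for n in range(1, num_items + 1):
        difficulty = estimate                      # target item at current estimate
        p = rasch_prob(true_ability, difficulty)   # simulate the student's response
        response = 1 if random.random() < p else 0
        expected = rasch_prob(estimate, difficulty)
        estimate += (2.0 / n) * (response - expected)   # shrinking-step update
    return estimate

TRUE_ABILITY = 1.2   # hypothetical common ability for students A and B

# Student A starts at the grade-level average (0.0); student B starts near her
# prior estimate (1.1). Both estimates should land in the same neighborhood
# after 34 items; A simply needs more of them to get there.
print("Student A final estimate:", round(run_cat(TRUE_ABILITY, 0.0), 2))
print("Student B final estimate:", round(run_cat(TRUE_ABILITY, 1.1), 2))
```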

By: Bryan https://www.renaissance.com/2014/08/21/the-basics-of-test-score-reliability-for-educators/#comment-118 Mon, 25 Aug 2014 18:20:57 +0000

I’m curious to hear more about test-retest or alternate-forms reliability, given the adaptive nature of the test. When a student retests, does the test simply start from scratch, or does it pick up where the first test ended?

Here’s a real-world example: suppose I have two 5th graders who are at exactly the same skill level. Student A is taking the test for the first time, and student B is taking the test for the 5th time. Assuming they are exactly equal in skill level, how close will their results be? Does a student with a historical record of multiple test data points have more precise data than a similar student taking the test for the first time?

By: Catherine Close https://www.renaissance.com/2014/08/21/the-basics-of-test-score-reliability-for-educators/#comment-117 Mon, 25 Aug 2014 14:33:18 +0000 In reply to Andrea Quinn.

Great question. See my reply to Tristan’s very similar question. Thanks for reading!

By: Catherine Close https://www.renaissance.com/2014/08/21/the-basics-of-test-score-reliability-for-educators/#comment-116 Mon, 25 Aug 2014 14:22:28 +0000 In reply to Tristan.

That is an excellent question, and I could write a full blog post on just that! Briefly, the standard error of measurement (SEM) is how we quantify measurement error. No measurement is without some degree of uncertainty, and that uncertainty is measurement error. As measurement error increases, reliability decreases, so we want the standard error of measurement to be as small as possible to maximize reliability.
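For anyone who wants the textbook relationship spelled out: under classical test theory the SEM, the score standard deviation, and reliability are linked by SEM = SD × √(1 − reliability). Here is a small illustration with made-up numbers (the standard deviation of 100 is hypothetical, not a STAR value):

```python
# Classical test theory relationship between reliability and the SEM:
#     SEM = SD * sqrt(1 - reliability)
# where SD is the standard deviation of the observed scores. This snippet
# simply evaluates that textbook formula for a few hypothetical values.
import math

def standard_error_of_measurement(sd, reliability):
    """SEM under classical test theory."""
    return sd * math.sqrt(1.0 - reliability)

SD = 100  # hypothetical scaled-score standard deviation

for reliability in (0.80, 0.90, 0.95):
    sem = standard_error_of_measurement(SD, reliability)
    print(f"reliability = {reliability:.2f}  ->  SEM ≈ {sem:.1f} scaled-score points")
# As reliability rises, the SEM shrinks: less measurement error in the scores.
```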

By: G Gordon https://www.renaissance.com/2014/08/21/the-basics-of-test-score-reliability-for-educators/#comment-115 Mon, 25 Aug 2014 13:00:03 +0000

If the STAR Reading tests are so reliable, why do I have students whose scores look like a heartbeat when graphed? When a student scores a 12.9 GE, does the next STAR test start that student at that level and then decrease the difficulty of the questions if the student misses some? It concerns me when students’ GE scores jump around: 12.9 to 6.4 to 10.6 to 8.0 and so on.

By: Tristan https://www.renaissance.com/2014/08/21/the-basics-of-test-score-reliability-for-educators/#comment-114 Fri, 22 Aug 2014 14:16:04 +0000

Where does the standard error of measurement fit into this conversation?

By: Andrea Quinn https://www.renaissance.com/2014/08/21/the-basics-of-test-score-reliability-for-educators/#comment-113 Thu, 21 Aug 2014 23:43:43 +0000

Catherine, thanks for your explanations. Where does the standard error of measurement fit into this story? Isn’t that important as well?

By: Catherine Close https://www.renaissance.com/2014/08/21/the-basics-of-test-score-reliability-for-educators/#comment-112 Thu, 21 Aug 2014 19:36:48 +0000 In reply to Randy Hoover.

Thank you, Randy. You make a great point: validity is what we are aiming for, but to get there we first need consistent test scores. Hence, reliability is a necessary (though not sufficient) condition for validity. If you are interested in reading more, I briefly addressed this in my previous blog post, “Understanding the Reliability and Validity of Test Scores.”

By: Randy Hoover https://www.renaissance.com/2014/08/21/the-basics-of-test-score-reliability-for-educators/#comment-111 Thu, 21 Aug 2014 18:00:57 +0000

Well done, but it needs to be explicitly noted that when a test is not valid, reliability is moot.
