Publication Information

Publisher
Kharazmi University
Managing Director
Prof. Fazel Asadi Amjad
Editor-in-Chief
Dr. Hossein Talebzadeh


ISSN: 1735-1634
EISSN: 3115-8560

Indexing Databases

SID
ISC
Civilica
Magiran
Noormags
Google Scholar
Institute for Humanities and Cultural Studies

Search published articles

Showing 4 results for Bias

Rater Bias in Assessing Iranian EFL Learners’ Writing Performance

Mahnaz Saeidi, Mandana Yousefi, Purya Baghayei
Volume 16, Issue 1 (3-2013)

Abstract

Evidence suggests that variability in the ratings of students’ essays results not only from differences in their writing ability but also from certain extraneous sources. In other words, essay ratings can be biased by factors related to the rater, the task, and the situation, or by an interaction among any or all of these factors, which makes the inferences and decisions made about students’ writing ability undependable. The purpose of this study, therefore, was to examine variability in rater judgments as a source of measurement error in the assessment of EFL learners’ essay writing. Thirty-two Iranian sophomore students majoring in English language participated in this study. The learners’ narrative essays were rated by six different raters, and the results were analyzed using many-facet Rasch measurement as implemented in the computer program FACETS. The findings suggest that there are significant differences among raters in their harshness, as well as several cases of bias due to rater-examinee interaction. This study provides a valuable understanding of how effective and reliable rating can be realized, and how the fairness and accuracy of subjective performance assessment can be evaluated.

Investigating the Effect of the Training Program on Raters’ Oral Performance Assessment: A Mixed-Methods Study on Raters’ Think-Aloud Verbal Protocols

Houman Bijani, Mona Khabiri
Volume 20, Issue 1 (4-2017)

Abstract

Although the use of verbal protocols is growing in oral assessment, research on raters’ verbal protocols is rather rare. Moreover, the few existing studies did not use a mixed-methods design. Therefore, this study investigated the possible impacts of rater training on novice and experienced raters’ application of a specified set of standards in rating. To meet this objective, the study made use of verbal protocols produced by 20 raters who scored 300 test takers’ oral performances and analyzed the data both qualitatively and quantitatively. The outcomes demonstrated that after the training program, the raters were able to concentrate more on linguistic, discourse, and phonological features; as a result, their agreement increased, particularly among the inexperienced raters. The analysis of verbal protocols also revealed that training raters to apply a well-defined rating scale can foster its valid and reliable use. Different groups of raters approach the task of rating in different ways, which cannot be explored through purely statistical analysis. Thus, think-aloud verbal protocols can shed light on the less clear aspects of the issue and add to the validity of oral language assessment. Moreover, since the results of this study showed that inexperienced raters can produce protocols of higher quality and quantity in the use of macro and micro strategies to evaluate test takers’ performances, there is no evidence on which decision makers should exclude inexperienced raters solely because of their lack of experience.

The Effect of CLIL on Language Skills and Components: A Meta-Analysis

Seyyed Ali Ostovar-Namaghi, Shiva Nakhaee
Volume 22, Issue 2 (9-2019)

Abstract

Content and Language Integrated Learning (CLIL) has recently been the focus of numerous studies in language education since it aims to overcome the pitfalls of form-focused and meaning-focused instruction by systematically integrating content and language. This meta-analysis aims to synthesize the findings of 22 primary studies that tested the effect of CLIL on language skills and components. Three questions guide the analysis: What is the overall combined effect of CLIL on language skills and components? How do moderators condition the effect of CLIL? To what extent is the overall combined effect conditioned by publication bias? The overall effect size was found to be g = 0.81, which represents a medium effect size on Plonsky and Oswald’s (2014) scale. The results of the moderator analysis show that CLIL has the highest effect on students’ grammar and listening proficiency and at lower levels of education, especially in elementary schools. It also has the highest effect when combined with hotel management as the subject matter. A fail-safe N test of publication bias shows that the significant positive outcome of CLIL cannot be accounted for by publication bias. The findings have clear implications for practitioners, researchers, and curriculum developers.
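
For readers unfamiliar with the two statistics named in this abstract, the following is a minimal sketch of how a Hedges' g effect size and a fail-safe N are commonly computed. It is not the authors' procedure: the abstract does not say which fail-safe N variant was used, so Rosenthal's formula with a one-tailed critical value of 1.645 is an assumption, and the function names `hedges_g` and `fail_safe_n` are hypothetical.

```python
import math


def hedges_g(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Standardized mean difference with Hedges' small-sample correction.

    Arguments are the treatment and control group means, standard
    deviations, and sample sizes of a single primary study.
    """
    df = n_t + n_c - 2
    # Pooled standard deviation across the two groups.
    pooled_sd = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / df)
    d = (mean_t - mean_c) / pooled_sd
    # Approximate correction factor J = 1 - 3 / (4 * df - 1).
    return (1 - 3 / (4 * df - 1)) * d


def fail_safe_n(z_scores, z_crit=1.645):
    """Rosenthal's fail-safe N (assumed variant, one-tailed alpha = .05).

    Estimates how many unpublished null-result studies would be needed to
    bring the combined result down to non-significance.
    """
    k = len(z_scores)
    return sum(z_scores) ** 2 / z_crit ** 2 - k
```

A fail-safe N that is large relative to the number of included studies is usually read as a sign that publication bias alone is unlikely to explain a significant combined effect.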

Construct Validation of a Rating Scale through a Training Program: A Multifaceted Rasch Analysis in Speaking Assessment

Wander Lowie, Houman Bijani, Mohammad Reza Oroji, Zeinab Khalafi, Pouya Abbasi
Volume 26, Issue 2 (9-2023)

Abstract

Performance testing, including the use of rating scales, has become highly widespread in second/foreign language oral assessment. However, few studies have used a pre- and post-training design to investigate the impact of a training program on reducing raters’ biases toward the rating scale categories and thereby increasing their consistency measures. Besides, no study has used MFRM with the facets of test takers’ ability, raters’ severity, task difficulty, group expertise, scale category, and test version all in a single analysis. Twenty EFL teachers rated the oral performances produced by 200 test takers before and after a training program using an analytic rating scale comprising fluency, grammar, vocabulary, intelligibility, cohesion, and comprehension categories. The outcome of the study indicated that MFRM can be used to investigate raters’ scoring behavior and can help improve rater training and validate the functionality of the rating scale descriptors. Training can also result in higher levels of interrater consistency and reduced levels of severity/leniency; it cannot turn raters into duplicates of one another, but it can make them more self-consistent. Training helped raters use the rating scale’s various band descriptors more efficiently, resulting in a reduced halo effect. Finally, the raters showed improved consistency and reduced rater-scale category biases after the training program. The remaining differences in bias measures could probably be attributed to different ways of interpreting the scoring rubrics, which stem from raters’ confusion about the accurate application of the scale.
