Analysis of Multiple-Choice Tests Used for Two Psychology Courses in 2007
Item response theory (IRT) has been used in the analysis of many different kinds of tests and assessments, and has become especially popular in educational and psychological measurement and assessment settings (Embretson & Reise, 2000). Item response theory is especially useful when analysing tests where the possible responses to questions (items) are stipulated. Usually these come in the form of multiple choices per item, from which the respondent or examinee must select the choice he or she deems fit. This paper uses item response theory to analyse two psychology course tests, Personality and Individual Differences and Language, Thinking and Memory, undertaken in the year 2007.
Item response theory was particularly suitable for establishing the relationship between individuals' responses to items on the tests and the particular traits that the tests seek to measure.
In this particular analysis, the key underlying traits being measured are centred on scholastic ability, which in this case includes intelligence, memory and reasoning capability. The tests measure how well the students have grasped and understood the concepts taught in the two courses, as well as their ability to recall information given during the learning process.
The multiple-choice questions for the two courses seek to determine the level of ability, or latent trait. Because the tests are multiple-choice, the examinee has to choose the answer that seems right, which makes them easier to grade reliably (Embretson & Reise, 2000). Different items test knowledge in different areas of the course content, but generally all items test the same latent trait: scholastic ability. The tests have therefore been dichotomously scored and thus consist of binary items. The item information the tests provide varies over different levels of theta. In item response theory, theta is generally taken to represent the ability of the examinee, or the amount of the latent trait being measured that is present in the examinee. The IRT model was chosen for evaluating the multiple-choice tests because, unlike classical test theory (CTT), the method defines a scale for the underlying trait (theta) that is measured by a given set of items. In this evaluation the items are calibrated with respect to the theta scale.
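The dichotomous scoring described above can be sketched in a few lines. This is a minimal illustration, not the actual scoring procedure used in 2007; the item names and answer key are hypothetical.

```python
# Hypothetical answer key for illustration only (not from the 2007 tests).
answer_key = {"Q1": "B", "Q2": "D", "Q3": "A"}

def score_dichotomously(responses, key):
    """Score multiple-choice responses as binary (0/1) items:
    1 if the examinee's choice matches the key, 0 otherwise."""
    return [1 if responses.get(item) == correct else 0
            for item, correct in key.items()]

# One examinee's choices: correct on Q1 and Q3, wrong on Q2.
examinee = {"Q1": "B", "Q2": "C", "Q3": "A"}
print(score_dichotomously(examinee, answer_key))  # [1, 0, 1]
```

The resulting 0/1 vectors per examinee are the binary data to which the IRT model is fitted.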
The method is particularly reliable for evaluating these tests, as the precision required for measurement depends on the level of theta, the latent trait being measured (Mislevy & Bock, 1986).
Evaluation of the tests has to be done in sections, where items are divided into subsets according to the area they are designed to test. In this way, assessment can be done for different areas of the course independently, while still keeping with the general aim, which is to measure scholastic ability in the two courses.
In evaluating the tests from the two courses, the usefulness of the IRT model will be heavily contingent upon the extent to which its assumptions hold. In this evaluation, several assumptions are made. The first is the unidimensionality of the tests: it is assumed that the ability of students to get correct answers depends mainly on their scholastic ability, that this trait accounts for the variation between item responses, and that each individual taking the test possesses some amount of this scholastic ability.
Model fit
The model chosen for the evaluation accurately fits the type, quantity and nature of the data.
The tests are subjected to the three-parameter logistic (3PL) model, because three parameters must be considered. The first is the discrimination parameter, which considers how well the items or questions discriminate between students with high ability and those with lesser ability. Those with high scholastic ability should be able to answer the questions more easily and give more correct responses per item than those with lesser ability. This parameter allows the evaluation to consider how well questions discriminate among individuals with different levels of ability. The second parameter considered in this evaluation is the location, or threshold, which refers to the difficulty of each item relative to ability. The third parameter the evaluation considers is the guessing parameter, which allows for the possibility that some individuals give correct responses to items solely as a result of guessing (Baker, 2001).
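The three parameters combine in the standard 3PL response function, P(theta) = c + (1 - c) / (1 + exp(-a(theta - b))). The sketch below uses illustrative parameter values, not estimates from the actual 2007 tests.

```python
import math

def p_correct_3pl(theta, a, b, c):
    """3PL probability of a correct response at ability theta.
    a: discrimination (slope), b: difficulty (location),
    c: guessing parameter (lower asymptote)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Illustrative values: a sharply discriminating item (a = 2) located at
# average ability (b = 0), with c = 0.25 reflecting four answer choices.
# At theta = b the logistic part is 0.5, so P = 0.25 + 0.75 * 0.5.
print(round(p_correct_3pl(0.0, 2.0, 0.0, 0.25), 3))  # 0.625
```

Note how the guessing parameter floors the curve: even an examinee far below the item's difficulty retains roughly a 25% chance of a correct response.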
Guessed responses would imply that the latent trait being measured did not contribute to the response, so that the item cannot properly indicate the level of the latent trait, scholastic ability. Models were also compared with chi-squared difference tests. All parameters included were considered necessary and an improvement to the fit.
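A chi-squared difference test of this kind compares the deviances (-2 log-likelihood) of two nested models, with degrees of freedom equal to the number of added parameters. The sketch below uses hypothetical deviance values, not figures from the actual analysis, and a closed-form survival function valid for even degrees of freedom.

```python
import math

def chi2_sf_even_df(x, df):
    """Survival function (upper-tail p-value) of a chi-squared
    distribution, using the closed form valid for even df:
    sf(x) = exp(-x/2) * sum_{k < df/2} (x/2)^k / k!"""
    assert df % 2 == 0, "closed form requires even df"
    half = x / 2.0
    return math.exp(-half) * sum(half**k / math.factorial(k)
                                 for k in range(df // 2))

# Hypothetical deviances for nested 2PL and 3PL fits of a 40-item test,
# where the 3PL adds one guessing parameter per item (df = 40).
deviance_2pl, deviance_3pl, extra_params = 4120.6, 4048.2, 40
diff = deviance_2pl - deviance_3pl       # chi-squared statistic
p = chi2_sf_even_df(diff, extra_params)
print(p < 0.05)  # True -> the extra parameters improve the fit
```

A small p-value here is what justifies the statement that the added parameters were "an improvement to the fit".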
The three-parameter logistic (3PL) model takes into consideration the guessing parameter, which the 2PL model ignores, and the discrimination parameter, which is absent in the 1PL model. The number of students considered in these courses might suggest a 2PL model fit; however, this is disregarded because of the importance of the guessing parameter in a college test involving multiple choices. It would be unreasonable to assume that none of the individuals guessed any of the answers (Baker, 2001). Graphs are plotted in which the probability of a correct response is plotted against ability, resulting in item characteristic curves. The sum of the corresponding item information curves yields a test information curve.
The second assumption made in this evaluation is that the model is specified properly, so that the item characteristics are able to reveal the correct theta.
Plotted item characteristic graphs reveal the smooth S-shaped curves indicative of a typical IRT model. Parameters and their standard errors were estimated and the resultant slopes examined. Item characteristic curves are steeper for some items than for others, indicating higher discrimination in those items: they better discriminate among the examinees through the number of people giving correct responses. Item characteristic graphs for certain other items reveal flatter curves, meaning their level of discrimination is relatively low or moderate.
To check the assumption of unidimensionality, factor analysis was conducted, followed by a confirmatory factor analysis. Item loadings were checked with respect to the number of correct responses. Using data from the plotted graphs, it was determined that in all the item response categories, correct responses were mainly given by those individuals who reflected high ability levels. From this it can be inferred that the test items are highly effective in their discrimination overall. The item information provided interfered little with the measurement precision of the latent trait.
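One common way to screen for unidimensionality before a factor analysis is to check whether a single dominant eigenvalue of the inter-item correlation matrix dwarfs the rest. The sketch below is illustrative only: the correlation matrix is a synthetic equal-correlation pattern, not the actual test data.

```python
def largest_eigenvalue(matrix, iters=200):
    """Estimate the dominant eigenvalue of a positive matrix
    by power iteration with infinity-norm scaling."""
    n = len(matrix)
    v = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(abs(x) for x in w)
        v = [x / lam for x in w]
    return lam

# Synthetic 5-item correlation matrix with uniform r = 0.4, a pattern
# consistent with one common factor driving all items.
r = 0.4
corr = [[1.0 if i == j else r for j in range(5)] for i in range(5)]
dominant = largest_eigenvalue(corr)
print(round(dominant, 2))  # 2.6, i.e. 1 + (5 - 1) * 0.4 for this pattern
```

Here the dominant eigenvalue (2.6) is far larger than the remaining ones (each 0.6 for this pattern), the kind of ratio that supports treating the items as measuring a single trait.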
The item difficulty parameter varied across items and across sections of the test. Some items recorded relatively low difficulty compared to others, and entire subsets of the test varied in the difficulty parameter relative to others. The Personality and Individual Differences test, for instance, revealed a lower threshold of difficulty than the Language, Thinking and Memory test, indicating that items in the latter were more complex relative to examinee abilities. Increased item information corresponded to increased item clarity, which resulted in a higher probability of correct responses to those items (Lord, 1980). In such cases, the plotted graphs reflected higher rates of correct responses across the ability continuum. The item characteristic graphs of such items therefore revealed relatively smooth and flatter S-shaped curves, as with increased information the level of discrimination was lowered; in the same regard, the level of difficulty was also lowered.
Conclusion
The two tests were reliable in measuring the latent trait, scholastic ability. An individual's high position on the ability scale directly corresponded to the frequency with which he or she gave correct responses. The tests also proved a good testing tool, as the discrimination ability of items was fair throughout the tests. True evaluation of individuals was therefore actualised.
References
Baker, F.B. (2001). The Basics of Item Response Theory. Wisconsin: ERIC Clearinghouse on Assessment and Evaluation.

Embretson, S.E. & Reise, S.P. (2000). Item Response Theory for Psychologists. New Jersey: Erlbaum.

Lord, F.M. (1980). Applications of Item Response Theory to Practical Testing Problems. New Jersey: Erlbaum.

Mislevy, R.J. & Bock, R.D. (1986). Item Analysis and Test Scoring with Binary Logistic Models. Indiana: Scientific Software, Inc.