Assessment of differential item functioning in personnel selection



A psychological test should accurately estimate the construct it measures and remain insensitive to extraneous factors that could affect the result. The distortion of item responses by such external factors is known as test bias, and minimizing these influences is a primary concern for test developers. Differential Item Functioning (DIF) refers to a relatively recent category of psychometric techniques that aim to identify items that function differently across groups of test-takers. The purpose of this paper is to present, in a non-technical manner, the conceptual foundation of DIF and the most commonly applied techniques: methods based on logistic regression, methods using Item Response Theory models, and the Mantel-Haenszel method. We then discuss the strengths and limitations of each method, the conditions under which each applies, and the main software packages used to conduct such analyses.
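As a rough illustration of one of the techniques named above, the following minimal sketch computes the Mantel-Haenszel common odds ratio for a single dichotomous item, using the total test score as the matching variable. It is not taken from the paper: the function name, the record layout, and the group labels "ref" and "focal" are assumptions made for this example, and a real analysis would normally rely on an established package and add a significance test.

```python
import math
from collections import defaultdict


def mantel_haenszel_odds_ratio(records):
    """Common odds ratio for one item across total-score strata.

    records: iterable of (group, total_score, correct) tuples, where
    group is "ref" or "focal" (hypothetical labels), total_score is the
    matching variable, and correct is 1 (right) or 0 (wrong).
    """
    # One 2x2 table per score level:
    # [ref correct, ref wrong, focal correct, focal wrong]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for group, score, correct in records:
        index = (0 if group == "ref" else 2) + (0 if correct else 1)
        tables[score][index] += 1

    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n

    if denominator == 0:
        raise ValueError("odds ratio undefined: no discordant cells in any stratum")
    return numerator / denominator


def mh_delta(odds_ratio):
    """Rescale the odds ratio to the delta metric, -2.35 * ln(alpha)."""
    return -2.35 * math.log(odds_ratio)


if __name__ == "__main__":
    # Tiny made-up data set: two score levels, eight test-takers in each.
    data = (
        [("ref", 1, 1)] * 3 + [("ref", 1, 0)] * 1
        + [("focal", 1, 1)] * 1 + [("focal", 1, 0)] * 3
        + [("ref", 2, 1)] * 2 + [("ref", 2, 0)] * 2
        + [("focal", 2, 1)] * 2 + [("focal", 2, 0)] * 2
    )
    alpha = mantel_haenszel_odds_ratio(data)
    print(alpha, mh_delta(alpha))
```

An odds ratio near 1 (delta near 0) suggests that, among test-takers with the same total score, both groups are about equally likely to answer the item correctly; values above 1 favor the reference group under this layout.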


Differential item functioning; validity; test bias.







Copyright (c) 2017 Psihologia Resurselor Umane

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Asociatia de Psihologie Industriala si Organizationala
Strada Grigore Moisil, nr. 42, sector 2, București, cod poștal 023796
