1. American Educational Research Association, American Psychological Association, and National Council on Measurement in Education. (1985). Standards for Educational and Psychological Testing. Washington, DC: American Psychological Association.
2. Arends, R. I. (2006). Performance assessment in perspective: History, opportunities, and challenges. In S. Castle & B. D. Shaklee (Eds.), Assessing teacher performance: Performance-based assessment in teacher education (pp. 3–22). Lanham, MD: Rowman & Littlefield Education.
3. Aseltine, J. M., Faryniarz, J. O., & Rigazio-DiGilio, A. J. (2006). A performance-based approach to teacher development and school improvement: Supervision for learning. Alexandria, VA: Association for Supervision and Curriculum Development.
4. Bachman, L. F., & Palmer, A. S. (1996). Language testing in practice: Designing and developing useful language tests (Vol. 1). Oxford University Press.
5. Brennan, R. L., & Johnson, E. G. (1995). Generalizability of performance assessments. Educational Measurement: Issues and Practice, 14(4), 25-27.
6. Brown, A. (2012). Interlocutor and rater training. In G. Fulcher & F. Davidson (Eds.), The Routledge handbook of language testing. New York, NY: Routledge.
7. California Commission on Teacher Credentialing. (2016). CalTPA Handbook. Sacramento: California Commission on Teacher Credentialing.
8. Chung, R. R. (2008). Beyond assessment: Performance assessments in teacher education. Teacher Education Quarterly, 35(1), 7-28.
9. Cizek, G. J. (1996). Standard‐setting guidelines. Educational Measurement: Issues and Practice, 15(1), 13-21.
10. Cizek, G. J., & Bunch, M. B. (2007). Standard setting: A guide to establishing and evaluating performance standards on tests. Thousand Oaks, CA: Sage.
11. Danielson, C. (2011). Enhancing professional practice: A framework for teaching. Association for Supervision and Curriculum Development (ASCD).
12. Danielson, C., & Marquez, E. (1998). A collection of performance tasks and rubrics: High school mathematics. Larchmont, NY: Eye on Education.
13. Darling-Hammond, L. (2010). Evaluating teacher effectiveness: How teacher performance assessments can measure and improve teaching. Center for American Progress.
14. Darling-Hammond, L., & Snyder, J. (2000). Authentic assessment of teaching in context. Teaching and Teacher Education, 16(5), 523-545.
15. Davies, A., Brown, A., Elder, C., Hill, K., Lumley, T., & McNamara, T. (1999). Dictionary of language testing. Cambridge, UK: Cambridge University Press.
16. Delandshere, G., & Arens, S. A. (2003). Examining the quality of the evidence in preservice teacher portfolios. Journal of Teacher Education, 54(1), 57-73.
17. Erdosy, M. U. (2004). Exploring variability in judging writing ability in a second language: A study of four experienced raters of ESL compositions. Princeton, NJ: Educational Testing Service.
18. Fehrmann, M. L., Woehr, D. J., & Arthur, W. (1991). The Angoff cutoff score method: The impact of frame-of-reference rater training. Educational and Psychological Measurement, 51(4), 857-872.
19. Frederiksen, J. R., & Collins, A. (1989). A systems approach to educational testing. Educational Researcher, 18(9), 27-32.
20. Fulcher, G. (1996). Does thick description lead to smart tests? A data-based approach to rating scale construction. Language Testing, 13(2), 208-238.
21. Fulcher, G. (2012). Scoring performance tests. In G. Fulcher & F. Davidson (Eds.), The Routledge handbook of language testing (pp. 378-392). New York, NY: Routledge.
22. Girod, M., & Girod, G. R. (2008). Simulation and the need for practice in teacher preparation. Journal of Technology and Teacher Education, 16(3), 307-337.
23. Haertel, E. H. (2002). Standard setting as a participatory process: Implications for validation of standards-based accountability programs. Educational Measurement: Issues and Practice, 21(1), 16-22.
24. Hamilton, L. (2003). Assessment as a policy tool. Review of Research in Education, 27, 25-68.
25. İşlek, D., & Hürsen, Ç. (2014). The evaluation of students’ views concerning the teacher qualifications for the total quality implementations. Procedia - Social and Behavioral Sciences, 116(2), 4834–4838.
26. Jaeger, R. M. (1991). Establishing standards for teacher certification tests. Educational Measurement: Issues and Practice, 9(4), 15-20.
27. Kane, M. (1994). Validating the performance standards associated with passing scores. Review of Educational Research, 64(3), 425-461.
28. Kane, M., Crooks, T., & Cohen, A. (1999). Validating measures of performance. Educational Measurement: Issues and Practice, 18(2), 5-17.
29. Kiany, G., Karimi, M., & Norouzi, M. (2017). An assessment scheme for ELT performance: An Iranian case of Farhangian University. Journal of Teaching Language Skills. In press.
30. Koirala, H. P., Davis, M., & Johnson, P. (2008). Development of a performance assessment task and rubric to measure prospective secondary school mathematics teachers’ pedagogical content knowledge and skills. Journal of Mathematics Teacher Education, 11(2), 127-138.
31. Marshall, K. (2009). Rethinking teacher evaluation and supervision: How to work smart, build collaboration, and close the achievement gap. San Francisco, CA: Jossey-Bass.
32. Medley, D. M. (1982). Teacher competency testing and the teacher educators. Association of Teacher Educators and the Bureau of Educational Research: University of Virginia.
33. Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (pp. 13-103). Washington, DC: National Council on Measurement in Education and American Council on Education.
34. Navidinia, H., Kiany, G. R., Akbari, R., & Ghafarsamar, R. (2015). EFL teacher performance evaluation in Iranian high schools: Examining the effectiveness of the status quo and setting the groundwork for developing an alternative model. The International Journal of Humanities, 21(4), 27-53.
35. Pecheone, R., & Chung, R. R. (2006). Evidence in teacher education: The performance assessment for California teachers. Journal of Teacher Education, 57(1), 22-36.
36. Ross, S. J. (2012). Claims, evidence, and inference in performance assessment. In G. Fulcher & F. Davidson (Eds.), The Routledge handbook of language testing. New York, NY: Routledge.
37. Sanders, W. L., Wright, S. P., & Horn, S. P. (1997). Teacher and classroom context effects on student achievement: Implications for teacher evaluation. Journal of Personnel Evaluation in Education, 11(1), 57-67.
38. Sandholtz, J. H. (2012). Predictions and performance on the PACT teaching event: Case studies of high and low performers. Teacher Education Quarterly, 39(3), 103-126.
39. Sergiovanni, T. J., & Starratt, R. J. (2002). Supervision: A redefinition (7th ed.). Boston, MA: McGraw Hill.
40. Shinkfield, A. J., & Stufflebeam, D. L. (1995). School professionals’ guide to improving teacher evaluation systems. In Teacher Evaluation (pp. 81-172). Springer Netherlands.
41. Stewart, A. R., Scalzo, J. N., Merino, N., & Nilsen, K. (2015). Beyond the criteria: Evidence of teacher learning in a performance assessment. Teacher Education Quarterly, 42(3), 33.
42. Taut, S., & Sun, Y. (2014). The development and implementation of a national, standards-based, multi-method teacher performance assessment system in Chile. Education Policy Analysis Archives, 22(71).
43. Torgerson, C. W., Macy, S. R., Beare, P., & Tanner, D. E. (2009). Fresno assessment of student teachers: A teacher performance assessment that informs practice. Issues in Teacher Education, 18(1), 63-82.
44. Wilkerson, J. R., & Lange, W. S. (2003). Portfolio, the pied piper of teacher certification assessments: Legal and psychometric issues. Education Policy Analysis Archives, 11(45).