Educational Technology & Society 5 (2) 2002
ISSN 1436-4522

Approaches to Evaluation of Training: Theory & Practice

Deniz Eseryel
Syracuse University, IDD&E, 330 Huntington Hall
Syracuse, New York 13244 USA
Tel: +1 315 443 3703
Fax: +1 315 443 9218
deseryel@mailbox.syr.edu

 

ABSTRACT

There is an ongoing debate in the field of evaluation about which approach best facilitates the processes involved. This article reviews current approaches to the evaluation of training, both in theory and in practice. Particular attention is paid to the complexities associated with evaluation practice and whether these are addressed in the theory. Furthermore, possible means of expediting the performance of evaluations and expanding the range and precision of data collection using automated systems are discussed. The article concludes with recommendations for further research.

Keywords: Automated evaluation, Expert guidance, Training evaluation


Introduction

Evaluation is an integral part of most instructional design (ID) models. Evaluation tools and methodologies help determine the effectiveness of instructional interventions. Despite its importance, there is evidence that evaluations of training programs are often inconsistent or missing (Carnevale & Schulz, 1990; Holcomb, 1993; McMahon & Carter, 1990; Rossi et al., 1979). Possible explanations for inadequate evaluations include: insufficient budget allocated; insufficient time allocated; lack of expertise; blind trust in training solutions; or lack of methods and tools (see, for example, McEvoy & Buller, 1990).

Part of the explanation may be that the task of evaluation is complex in itself. Evaluating training interventions with regard to learning, transfer, and organizational impact involves a number of complexity factors. These complexity factors are associated with the dynamic and ongoing interactions of the various dimensions and attributes of organizational and training goals, trainees, training situations, and instructional technologies.

Evaluation goals involve multiple purposes at different levels. These purposes include evaluation of student learning, evaluation of instructional materials, transfer of training, return on investment, and so on. Attaining these multiple purposes may require the collaboration of different people in different parts of an organization. Furthermore, not all goals may be well-defined and some may change.

Different approaches to the evaluation of training are discussed below, indicating how the complexity factors associated with evaluation are addressed. In addition, ways in which technology can be used to support this process are suggested. In the following section, different approaches to evaluation and their associated models are discussed. Next, recent studies of evaluation practice are presented. In the final section, opportunities for automated evaluation systems are discussed. The article concludes with recommendations for further research.

 

Approaches to Evaluation of Training

Commonly used approaches to educational evaluation have their roots in systematic approaches to the design of training. They are typified by the instructional systems development (ISD) methodologies, which emerged in the USA in the 1950s and 1960s and are represented in the works of Gagné and Briggs (1974), Goldstein (1993), and Mager (1962). Evaluation is traditionally represented as the final stage in a systematic approach, with the purpose being to improve interventions (formative evaluation) or to make a judgment about worth and effectiveness (summative evaluation) (Gustafson & Branch, 1997). More recent ISD models incorporate evaluation throughout the process (see, for example, Tennyson, 1999).

Six general approaches to educational evaluation can be identified (Bramley, 1991; Worthen & Sanders, 1987), as follows:

  • Goal-based evaluation
  • Goal-free evaluation
  • Responsive evaluation
  • Systems evaluation
  • Professional review
  • Quasi-legal

Goal-based and systems-based approaches are predominantly used in the evaluation of training (Phillips, 1991). Various frameworks for the evaluation of training programs have been proposed under the influence of these two approaches. The most influential framework has come from Kirkpatrick (Carnevale & Schulz, 1990; Dixon, 1996; Gordon, 1991; Phillips, 1991, 1997), whose work generated a great deal of subsequent research (Bramley, 1996; Hamblin, 1974; Warr et al., 1978). Kirkpatrick’s model (1959) follows the goal-based evaluation approach and is based on four simple questions that translate into four levels of evaluation, widely known as reaction, learning, behavior, and results. Under the systems approach, on the other hand, the most influential models include the Context, Input, Process, Product (CIPP) model (Worthen & Sanders, 1987); the Training Validation System (TVS) approach (Fitz-Enz, 1994); and the Input, Process, Output, Outcome (IPO) model (Bushnell, 1990).

Table 1 presents a comparison of several systems-based models (CIPP, IPO, and TVS) with a goal-based model (Kirkpatrick’s). Goal-based models such as Kirkpatrick’s four levels may help practitioners think about the purposes of evaluation, ranging from purely technical to covertly political purposes. However, these models do not define the steps necessary to achieve those purposes and do not address ways to utilize the results to improve training. The difficulty for practitioners following such models lies in selecting and implementing appropriate evaluation methods (quantitative, qualitative, or mixed). Because of their apparent simplicity, “trainers jump feet first into using [such] model[s] without taking the time to assess their needs and resources or to determine how they’ll apply the model and the results” (Bernthal, 1995, p. 41). Consequently, many organizations do not use the entire model, and training ends up being evaluated only at the reaction level or, at best, at the learning level. As the level of evaluation goes up, the complexities involved increase; this may explain why only Levels 1 and 2 are commonly used.

 

Kirkpatrick (1959)
1. Reaction: to gather data on participants’ reactions at the end of a training program
2. Learning: to assess whether the learning objectives for the program are met
3. Behavior: to assess whether job performance changes as a result of training
4. Results: to assess costs vs. benefits of training programs, i.e., organizational impact in terms of reduced costs, improved quality of work, increased quantity of work, etc.

CIPP Model (1987)
1. Context: obtaining information about the situation to decide on educational needs and to establish program objectives
2. Input: identifying educational strategies most likely to achieve the desired result
3. Process: assessing the implementation of the educational program
4. Product: gathering information regarding the results of the educational intervention to interpret its worth and merit

IPO Model (1990)
1. Input: evaluation of system performance indicators such as trainee qualifications, availability of materials, appropriateness of training, etc.
2. Process: embraces planning, design, development, and delivery of training programs
3. Output: gathering data resulting from the training interventions
4. Outcomes: longer-term results associated with improvement in the corporation’s bottom line: its profitability, competitiveness, etc.

TVS Model (1994)
1. Situation: collecting pre-training data to ascertain current levels of performance within the organization and defining a desirable level of future performance
2. Intervention: identifying the reason for the gap between present and desirable performance to find out if training is the solution to the problem
3. Impact: evaluating the difference between the pre- and post-training data
4. Value: measuring differences in quality, productivity, service, or sales, all of which can be expressed in terms of dollars

Table 1. Goal-based and systems-based approaches to evaluation

 

On the other hand, systems-based models (e.g., CIPP, IPO, and TVS) seem more useful for thinking about the overall context and situation, but they may not provide sufficient granularity. Systems-based models may not represent the dynamic interactions between the design and the evaluation of training. Few of these models provide detailed descriptions of the processes involved in each step, and none provides tools for evaluation. Furthermore, these models do not address the collaborative nature of evaluation, that is, the different roles and responsibilities that people may play during an evaluation process.

 

Current Practices in Evaluation of Training

Evaluation becomes more important when one considers that while American industries, for example, annually spend up to $100 billion on training and development, not more than “10 per cent of these expenditures actually result in transfer to the job” (Baldwin & Ford, 1988, p. 63). This may be explained by reports indicating that not all training programs are consistently evaluated (Carnevale & Schulz, 1990). The American Society for Training and Development (ASTD) found that 45 percent of surveyed organizations gauged only trainees’ reactions to courses (Bassi & van Buren, 1999). Overall, 93% of training courses were evaluated at Level 1, 52% at Level 2, 31% at Level 3, and 28% at Level 4. These data clearly indicate a bias toward simple and superficial analyses in evaluation practice.

This situation does not seem to be very different in Europe, as evidenced by two European Commission projects that have recently collected data on evaluation practices there. The first is the Promoting Added Value through Evaluation (PAVE) project, funded under the European Commission’s Leonardo da Vinci program in 1999 (Donoghue, 1999). The study examined a sample of organizations (small, medium, and large) that had signaled some commitment to training and evaluation by embarking on the UK’s Investors in People (IiP) standard (Sadler-Smith et al., 1999). Analysis of survey responses from these organizations suggested that formative and summative evaluations were not widely used, whereas immediate and context (needs analysis) evaluations were more common. In the majority of cases, responsibility for evaluation rested with managers, and the most frequently used methods were informal feedback and questionnaires. The majority of respondents claimed to assess the impact of training on employee performance (the ‘learning’ level), while fewer than one-third claimed to assess its impact on the organization (the ‘results’ level). Operational reasons for evaluating training were cited more frequently than strategic ones. However, information derived from evaluations was used mostly for feedback to individuals, less often to revise the training process, and rarely for return-on-investment decisions. There were also some statistically significant effects of organizational size on evaluation practice: small firms were constrained in the extent to which they could evaluate their training by their internal resources, with managers probably responsible for all aspects of training (Sadler-Smith et al., 1999).

The second study was conducted under the Advanced Design Approaches for Personalized Training-Interactive Tools (ADAPTIT) project. ADAPTIT is a European project within the Information Society Technologies programme that provides design methods and tools to guide training designers according to the latest cognitive science and standardisation principles (Eseryel & Spector, 2000). To explore current approaches to instructional design, a series of surveys was conducted across a variety of sectors in Europe, including transport, education, business, and industry. Participants were asked about the activities that take place during the evaluation process, including the interim products produced, such as a list of revisions or an evaluation plan. In general, systematic and planned evaluation was not found in practice, nor was a clear distinction between formative and summative evaluation: formative evaluation did not seem to take place explicitly, while summative evaluation was not fully carried out. The most common evaluation activity was the evaluation of student performance (i.e., assessment), and there was not enough evidence that evaluation results of any type were used to revise the training design (Eseryel et al., 2001). It is important to note that the majority of participants expressed a need for evaluation software to support their practice.

 

Using Computers to Automate the Evaluation Process

For evaluations to have a substantive and pervasive impact on the development of training programs, internal resources and personnel such as training designers, trainers, training managers, and chief personnel will need to become increasingly involved as program evaluators. While using external evaluation specialists has validity advantages, time and budget constraints make this option impractical in most cases. Indeed, the mentality that evaluation is strictly the province of experts often results in there being no evaluation at all. These considerations make a case for the convenience and cost-effectiveness of internal evaluations. However, the obvious concern is whether the internal team possesses the expertise required to conduct the evaluation and, if it does, how the bias of internal evaluators can be minimized. Therefore, just as automated expert systems are being developed to guide the design of instructional programs (Spector et al., 1993), so might such systems be created for instructional evaluations. Training designers’ lack of expertise in evaluation, pressures for increased productivity, and the need to standardize the evaluation process to ensure the effectiveness of training products are some of the factors that may motivate organizations to support evaluation with technology. Such systems might also help minimize the potential bias of internal evaluators.

Ross & Morrison (1997) suggest two categories of functions that automated evaluation systems appear likely to incorporate. The first is automation of the planning process via expert guidance; the second is the automation of the data collection process.

For automated planning through expert guidance, an operational or procedural model can be used during the planning stages to assist the evaluator in planning an appropriate evaluation. The expert program solicits key information from the evaluator and offers recommendations regarding possible strategies. Input information categories for such an expert system include:

  • Purpose of evaluation (formative or summative)
  • Type of evaluation objectives (cognitive, affective, behavioral, impact)
  • Level of evaluation (reaction, learning, behavior, organizational impact)
  • Type of instructional objectives (declarative knowledge, procedural learning, attitudes)
  • Type of instructional delivery (classroom-based, technology-based, mixed)
  • Size and type of participant groups (individual, small group, whole group)

Based on this input, an expert system can provide guidance on possible evaluation design orientations, appropriate collection methods, data analysis techniques, reporting formats, and dissemination strategies. Such expert guidance can be in the form of flexible general strategies and guidelines (weak advising approach). Given the complexities associated with the nature of evaluation, a weak advising approach such as this is more appropriate than a strong approach that would replace the human decision maker in the process. Indeed, weak advising systems that supplement rather than replace human expertise have generally been more successful when complex procedures and processes are involved (Spector et al., 1993).
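To make the weak advising approach concrete, the mapping from evaluator inputs to non-binding recommendations can be sketched as a simple rule base. This is a minimal illustration only: the function name, input vocabulary, and rules below are hypothetical and are not drawn from any published system.

```python
# Hypothetical sketch of a weak-advising expert system for evaluation
# planning. Inputs mirror the categories listed above; the rules and
# recommendation texts are illustrative assumptions.

def advise(purpose, level, delivery):
    """Return non-binding suggestions; the human evaluator decides."""
    suggestions = []
    if purpose == "formative":
        suggestions.append("Collect data iteratively during design and "
                           "feed findings back to the design team.")
    elif purpose == "summative":
        suggestions.append("Plan pre-/post-training measures to judge "
                           "overall worth and effectiveness.")
    if level == "reaction":
        suggestions.append("Use end-of-course questionnaires.")
    elif level == "learning":
        suggestions.append("Use criterion-referenced tests tied to the "
                           "instructional objectives.")
    elif level == "behavior":
        suggestions.append("Use on-the-job observation or supervisor "
                           "ratings some weeks after training.")
    elif level == "organizational impact":
        suggestions.append("Compare organizational indicators (cost, "
                           "quality, output) before and after training.")
    if delivery == "technology-based":
        suggestions.append("Consider embedding automated data collection "
                           "in the delivery platform.")
    return suggestions

# Example: planning a formative evaluation at the learning level
for s in advise("formative", "learning", "technology-based"):
    print("-", s)
```

Because the system only appends suggestions rather than prescribing a single design, it supplements the evaluator's judgment instead of replacing it, which is the essence of the weak advising approach.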

Such a system may also embed automated data collection functions for increased efficiency. The functionality of automated data collection systems may include intelligent test scoring of procedural and declarative knowledge, automated interpretation of individual profiles, and intelligent advice during the process of learning (Bunderson et al., 1989). These applications can improve the ability to diagnose the strengths and weaknesses of a training program in producing the desired outcomes. For formative evaluation in particular, this means that the training program can be dynamically and continuously improved as it is being designed.

Automated evaluation planning and automated data collection systems embedded in a generic instructional design tool may offer an efficient and integrated solution for training organizations. In such a system it would also be possible to provide advice on revising the training materials based on the evaluation feedback. To that end, evaluation data, individual performance data, and revision items can be tagged to the learning objects in a training program. The ADAPTIT instructional design tool is one system that provides such an integrated solution for training organizations (Eseryel et al., 2001).
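The idea of tagging evaluation data and revision items to learning objects can be sketched as a simple data structure. The class, field names, and mastery threshold below are hypothetical illustrations, not the ADAPTIT design.

```python
# Hypothetical data structure linking evaluation feedback to learning
# objects, as an integrated design-and-evaluation tool might do.
from dataclasses import dataclass, field

@dataclass
class LearningObject:
    object_id: str
    objective: str
    evaluation_data: list = field(default_factory=list)  # e.g., test scores
    revision_items: list = field(default_factory=list)   # suggested changes

    def record_score(self, trainee, score, mastery=0.8):
        """Store a score and flag the object for revision when a
        trainee falls below an (assumed) mastery threshold."""
        self.evaluation_data.append((trainee, score))
        if score < mastery:
            self.revision_items.append(
                f"Review materials for '{self.objective}' "
                f"(trainee {trainee} scored {score:.0%})")

# Example: two trainees, one below the assumed mastery threshold
lo = LearningObject("LO-1", "declarative knowledge of safety rules")
lo.record_score("T01", 0.65)
lo.record_score("T02", 0.92)
print(len(lo.revision_items))  # → 1
```

Because performance data and revision suggestions accumulate on the learning object itself, evaluation feedback stays attached to the exact piece of training it concerns, supporting the continuous revision cycle described above.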

 

Conclusion

Different approaches to evaluation of training discussed herein indicate that the activities involved in evaluation of training are complex and not always well-structured. Since evaluation activities in training situations involve multiple goals associated with multiple levels, evaluation should perhaps be viewed as a collaborative activity between training designers, training managers, trainers, floor managers, and possibly others.

There is a need for a unifying model of evaluation theory, research, and practice that accounts for the collaborative nature of, and the complexities involved in, the evaluation of training. None of the available models for training evaluation seems to account for these two aspects. Existing models fall short in comprehensiveness, and they fail to provide tools that guide organizations in establishing their evaluation systems and procedures. Not surprisingly, organizations experience problems in developing consistent evaluation approaches. Only a small percentage of organizations succeed in establishing a sound evaluation process that feeds back into the training design process; evaluation activities are often limited to reaction sheets and student testing, without proper revision of training materials based on the results. Lack of experience in evaluation may be one reason organizations do not evaluate consistently. In that case, an organization may consider hiring an external evaluator, but doing so is costly and time consuming. Given the need to rely on internal resources and personnel, expert system technology can be useful in providing expert support and guidance and in increasing the power and efficiency of evaluation. Such expert systems can be used by external evaluators as well.

Strong, completely automated systems offer apparent advantages, but their development and dissemination lag behind their conceptualization. Future research should focus on the barriers to evaluation of training, how training is evaluated and integrated with training design, and how the collaborative process of evaluation is managed and may be assisted. Such research will help guide efforts toward both a unifying theory of evaluation and the development of automated evaluation systems.

 

References

  • Baldwin, T. T., & Ford, J. K. (1988). Transfer of training: A review and directions for future research. Personnel Psychology, 41(1), 63-105.
  • Bassi, L. J., & van Buren, M. E. (1999). 1999 ASTD state of the industry report. Alexandria, VA: The American Society for Training and Development.
  • Bramley, P. (1996). Evaluating training effectiveness. Maidenhead: McGraw-Hill.
  • Bernthal, P. R. (1995). Evaluation that goes the distance. Training and Development Journal, 49(9), 41-45.
  • Bunderson, C. V., Inouye, D. K., & Olsen, J. B. (1989). The four generations of computerized educational measurement. In R. L. Linn (Ed.). Educational measurement (3rd ed.) (pp. 367-407). New York: Macmillan.
  • Bushnell, D. S. (March, 1990). Input, process, output: A model for evaluating training. Training and Development Journal, 44(3), 41-43.
  • Carnevale, A. P., & Schulz, E.R. (July, 1990). Return on investment: Accounting for training. Training and Development Journal, 44(7), S1-S32.
  • Dixon, N. M. (1996). New routes to evaluation. Training and Development, 50(5), 82-86.
  • Donoghue, F. (1999). Promoting added value through evaluation of training. Dublin: European Commission Leonardo-PAVE Project.
  • Eseryel, D., Schuver-van Blanken, M., & Spector, J. M. (2001). Current practice in designing training for complex skills: Implications for design and evaluation of ADAPT-IT. In C. Montgomerie & J. Vitelli (Eds.), Proceedings of ED-MEDIA 2001: World Conference on Educational Multimedia, Hypermedia, & Telecommunications (pp. 474-479). Tampere, Finland: Association for Advancement of Computing in Education.
  • Eseryel, D., & Spector, J. M. (2000). Assessing adaptive instructional design tools and methods in ADAPT-IT. In M. Crawford & M. Simonson (Eds.), Annual Proceedings of Selected Research and Development Papers Presented at the National Convention of the Association for Educational Communications and Technology (Vol. 1) (pp. 121-129). Denver, CO: Association for Educational Communications and Technology.
  • Fitz-Enz, J. (July, 1994). Yes…you can weigh training’s value. Training, 31(7), 54-58.
  • Gagné, R., & Briggs, L. J. (1974). Principles of instructional design. New York: Holt, Rinehart & Winston.
  • Goldstein, I. (1993). Training in organizations: Needs assessment, development, & evaluation. Monterey, CA: Brooks-Cole.
  • Gordon, J. (August, 1991). Measuring the “goodness” of training. Training, 28(8), 19-25.
  • Gustafson, K. L, & Branch, R. B. (1997). Survey of instructional development models (3rd ed.). Syracuse, NY: ERIC Clearinghouse on Information and Technology.
  • Hamblin, A. C. (1974). Evaluation and control of training. Maidenhead: McGraw-Hill.
  • Holcomb, J. (1993). Make training worth every penny. Del Mar, CA: Wharton.
  • Kirkpatrick, D. L. (1959). Techniques for evaluating training programs. Journal of the American Society of Training Directors, 13, 3-26.
  • Mager, R. F. (1962). Preparing objectives for programmed instruction. San Francisco, CA: Fearon Publishers.
  • McEvoy, G. M., & Buller, P. F. (August, 1990). Five uneasy pieces in the training evaluation puzzle. Training and Development Journal, 44(8), 39-42.
  • McMahon, F. A., & Carter, E. M. A. (1990). The great training robbery. New York: The Falmer Press.
  • Phillips, J. J. (1991). Handbook of training evaluation and measurement methods. (2nd ed.). Houston, TX: Gulf.
  • Phillips, J. J. (July, 1997). A rational approach to evaluating training programs including calculating ROI. Journal of Lending and Credit Risk Management, 79(11), 43-50.
  • Rossi, P.H., Freeman, H. E., & Wright, S. R. (1979). Evaluation: A systematic approach. Beverly Hills, CA: Sage.
  • Ross, S. M., & Morrison, G. R. (1997). Measurement and evaluation approaches in instructional design: Historical roots and current perspectives. In R. D. Tennyson, F. Scott, N. M. Seel, & S. Dijkstra (Eds.), Instructional design: Theory, research and models. (Vol.1) (pp.327-351). Hillsdale, NJ: Lawrence Erlbaum.
  • Sadler-Smith, E., Down, S., & Field, J. (1999). Adding value to HRD: Evaluation, investors in people, and small firm training. Human Resource Development, 2(4), 369-390.
  • Spector, J. M., Polson, M. C., & Muraida, D. J. (1993) (Eds.). Automating instructional design: Concepts and issues. Englewood Cliffs, NJ: Educational Technology Publications, Inc.
  • Tennyson, R. D. (1999). Instructional development and ISD4 methodology. Performance Improvement, 38(6), 19-27.
  • Warr, P., Bird, M., & Rackham, N. (1978). Evaluation of management training. London: Gower.
  • Worthen, B. R., & Sanders, J. R. (1987). Educational evaluation. New York: Longman.
