Approaches to Evaluation of Training: Theory & Practice
Evaluation is an integral part of most instructional design (ID) models. Evaluation tools and methodologies help determine the effectiveness of instructional interventions. Despite its importance, there is evidence that evaluations of training programs are often inconsistent or missing (Carnevale & Schulz, 1990; Holcomb, 1993; McMahon & Carter, 1990; Rossi et al., 1979). Possible explanations for inadequate evaluations include: insufficient budget allocated; insufficient time allocated; lack of expertise; blind trust in training solutions; or lack of methods and tools (see, for example, McEvoy & Buller, 1990).
Part of the explanation may be that the task of evaluation is complex in itself. Evaluating training interventions with regard to learning, transfer, and organizational impact involves a number of complexity factors. These complexity factors are associated with the dynamic and ongoing interactions of the various dimensions and attributes of organizational and training goals, trainees, training situations, and instructional technologies.
Evaluation goals involve multiple purposes at different levels. These purposes include evaluation of student learning, evaluation of instructional materials, transfer of training, return on investment, and so on. Attaining these multiple purposes may require the collaboration of different people in different parts of an organization. Furthermore, not all goals may be well-defined and some may change.
The sections below describe different approaches to the evaluation of training and indicate how each addresses the complexity factors associated with evaluation. They also suggest how technology can be used to support this process. In the following section, different approaches to evaluation and associated models are discussed. Next, recent studies concerning evaluation practice are presented. In the final section, opportunities for automated evaluation systems are discussed. The article concludes with recommendations for further research.
Approaches to Evaluation of Training
Commonly used approaches to educational evaluation have their roots in systematic approaches to the design of training. They are typified by the instructional system development (ISD) methodologies, which emerged in the USA in the 1950s and 1960s and are represented in the works of Gagné and Briggs (1974), Goldstein (1993), and Mager (1962). Evaluation is traditionally represented as the final stage in a systematic approach with the purpose being to improve interventions (formative evaluation) or make a judgment about worth and effectiveness (summative evaluation) (Gustafson & Branch, 1997). More recent ISD models incorporate evaluation throughout the process (see, for example, Tennyson, 1999).
Six general approaches to educational evaluation can be identified (Bramley, 1991; Worthen & Sanders, 1987), as follows:
Goal-based and systems-based approaches are predominantly used in the evaluation of training (Philips, 1991). Various frameworks for evaluation of training programs have been proposed under the influence of these two approaches. The most influential framework has come from Kirkpatrick (Carnevale & Schulz, 1990; Dixon, 1996; Gordon, 1991; Philips, 1991, 1997). Kirkpatrick’s work generated a great deal of subsequent work (Bramley, 1996; Hamblin, 1974; Warr et al., 1978). Kirkpatrick’s model (1959) follows the goal-based evaluation approach and is based on four simple questions that translate into four levels of evaluation. These four levels are widely known as reaction, learning, behavior, and results. On the other hand, under the systems approach, the most influential models include: Context, Input, Process, Product (CIPP) Model (Worthen & Sanders, 1987); Training Validation System (TVS) Approach (Fitz-Enz, 1994); and Input, Process, Output, Outcome (IPO) Model (Bushnell, 1990).
Table 1 presents a comparison of several systems-based models (CIPP, IPO, and TVS) with a goal-based model (Kirkpatrick’s). Goal-based models (such as Kirkpatrick’s four levels) may help practitioners think about the purposes of evaluation, which range from purely technical to covertly political. However, these models do not define the steps necessary to achieve those purposes and do not address the ways to utilize results to improve training. The difficulty for practitioners following such models is in selecting and implementing appropriate evaluation methods (quantitative, qualitative, or mixed). Because of their apparent simplicity, “trainers jump feet first into using [such] model[s] without taking the time to assess their needs and resources or to determine how they’ll apply the model and the results” (Bernthal, 1995, p. 41). As a result, many organizations do not use the entire model, and training ends up being evaluated only at the reaction level or, at best, at the learning level. As the level of evaluation goes up, the complexities involved increase. This may explain why evaluation so often stops at levels 1 and 2.
Table 1. Goal-based and systems-based approaches to evaluation
On the other hand, systems-based models (e.g., CIPP, IPO, and TVS) seem to be more useful for thinking about the overall context and situation, but they may not provide sufficient granularity. Systems-based models may not represent the dynamic interactions between the design and the evaluation of training. Few of these models provide detailed descriptions of the processes involved in each step. None provide tools for evaluation. Furthermore, these models do not address the collaborative process of evaluation, that is, the different roles and responsibilities that people may assume during an evaluation process.
Current Practices in Evaluation of Training
Evaluation becomes more important when one considers that while American industries, for example, annually spend up to $100 billion on training and development, not more than “10 per cent of these expenditures actually result in transfer to the job” (Baldwin & Ford, 1988, p. 63). This can be explained by reports indicating that not all training programs are consistently evaluated (Carnevale & Schulz, 1990). The American Society for Training and Development (ASTD) found that 45 percent of surveyed organizations only gauged trainees’ reactions to courses (Bassi & van Buren, 1999). Overall, 93% of training courses are evaluated at Level One, 52% at Level Two, 31% at Level Three, and 28% at Level Four. These data clearly indicate a bias in evaluation practice toward simple and superficial analysis.
This situation does not seem to be very different in Europe, as is evident from two European Commission projects that have recently collected data on evaluation practices in Europe. The first is the Promoting Added Value through Evaluation (PAVE) project, which was funded under the European Commission’s Leonardo da Vinci program in 1999 (Donoghue, 1999). The study examined a sample of organizations (small, medium, and large) that had signaled some commitment to training and evaluation by embarking on the UK’s Investors in People (IiP) standard (Sadler-Smith et al., 1999). Analysis of the survey responses from these organizations suggested that formative and summative evaluations were not widely used. On the other hand, immediate and context (needs analysis) evaluations were more widely used. In the majority of cases, responsibility for evaluation lay with managers, and the most frequently used methods were informal feedback and questionnaires. The majority of respondents claimed to assess the impact on employee performance (the ‘learning’ level). Less than one-third of the respondents claimed to assess the impact of training on the organization (the ‘results’ level). Operational reasons for evaluating training were cited more frequently than strategic ones. However, information derived from evaluations was used mostly for feedback to individuals, less often to revise the training process, and rarely for return on investment decisions. There were also some statistically significant effects of organizational size on evaluation practice. Small firms are constrained by their internal resources in the extent to which they can evaluate their training, and in such firms managers are probably responsible for all aspects of training (Sadler-Smith et al., 1999).
The second study was conducted under the Advanced Design Approaches for Personalized Training-Interactive Tools (ADAPTIT) project. ADAPTIT is a European project within the Information Society Technologies programme that is providing design methods and tools to guide a training designer according to the latest cognitive science and standardisation principles (Eseryel & Spector, 2000). In an effort to explore the current approaches to instructional design, a series of surveys was conducted in a variety of sectors in Europe, including transport, education, business, and industry. The participants were asked about the activities that take place during the evaluation process and the interim products it produces, such as a list of revisions or an evaluation plan. In general, systematic and planned evaluation was not found in practice, nor was the distinction between formative and summative evaluation. Formative evaluation does not seem to take place explicitly, while summative evaluation is not fully carried out. The most common evaluation activity seems to be the evaluation of student performance (i.e., assessment), and there is not enough evidence that evaluation results of any type are used to revise the training design (Eseryel et al., 2001). It is important to note here that the majority of the participants expressed a need for evaluation software to support their practice.
Using Computers to Automate the Evaluation Process
For evaluations to have a substantive and pervasive impact on the development of training programs, internal resources and personnel such as training designers, trainers, training managers, and chief personnel will need to become increasingly involved as program evaluators. While using external evaluation specialists has validity advantages, time and budget constraints make this option highly impractical in most cases. Thus, the mentality that evaluation is strictly the province of experts often results in there being no evaluation at all. These considerations make a case for the convenience and cost-effectiveness of internal evaluations. However, the obvious concern is whether the internal team possesses the expertise required to conduct the evaluation and, if it does, how the bias of internal evaluators can be minimized. Therefore, just as automated expert systems are being developed to guide the design of instructional programs (Spector et al., 1993), so might such systems be created for instructional evaluations. The lack of evaluation expertise among training designers, pressures for increased productivity, and the need to standardize the evaluation process to ensure the effectiveness of training products are some of the factors that may motivate supporting an organization’s evaluation with technology. Such systems might also help minimize the potential bias of internal evaluators.
Ross & Morrison (1997) suggest two categories of functions that automated evaluation systems appear likely to incorporate. The first is automation of the planning process via expert guidance; the second is the automation of the data collection process.
For automated planning through expert guidance, an operational or procedural model can be used during the planning stages to assist the evaluator in planning an appropriate evaluation. The expert program will solicit key information from the evaluator and offer recommendations regarding possible strategies. Input information categories for the expert system include:
Based on this input, an expert system can provide guidance on possible evaluation design orientations, appropriate collection methods, data analysis techniques, reporting formats, and dissemination strategies. Such expert guidance can be in the form of flexible general strategies and guidelines (weak advising approach). Given the complexities associated with the nature of evaluation, a weak advising approach such as this is more appropriate than a strong approach that would replace the human decision maker in the process. Indeed, weak advising systems that supplement rather than replace human expertise have generally been more successful when complex procedures and processes are involved (Spector et al., 1993).
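As an illustration of the weak advising approach described above, the following is a minimal sketch of a rule-based adviser that turns a few inputs into hedged suggestions and leaves every decision to the human evaluator. The input fields of `EvaluationContext`, the `advise` function, the example methods paired with each level, and the wording of the suggestions are all hypothetical; they are not taken from Ross and Morrison (1997) or from any existing system. Only the four level names come from Kirkpatrick's model as cited in this article.

```python
from dataclasses import dataclass

# Kirkpatrick's four levels, as named in this article.
LEVELS = {1: "reaction", 2: "learning", 3: "behavior", 4: "results"}

# Illustrative pairings only (an assumption, not from the source).
EXAMPLE_METHODS = {
    1: "questionnaires",
    2: "tests of knowledge and skill",
    3: "on-the-job observation",
    4: "organizational performance indicators",
}


@dataclass
class EvaluationContext:
    """Hypothetical input categories solicited from the evaluator."""
    purpose: str                 # "formative" or "summative"
    highest_level: int           # highest Kirkpatrick level to evaluate (1-4)
    budget_is_tight: bool
    evaluator_is_internal: bool


def advise(ctx):
    """Return a list of suggestions; nothing is decided for the evaluator."""
    if ctx.purpose not in ("formative", "summative"):
        raise ValueError("purpose must be 'formative' or 'summative'")
    if ctx.highest_level not in LEVELS:
        raise ValueError("highest_level must be between 1 and 4")

    suggestions = []
    if ctx.purpose == "formative":
        suggestions.append(
            "Consider gathering data while the program is being designed "
            "so that it can be revised."
        )
    else:
        suggestions.append(
            "Consider gathering data after delivery to judge worth "
            "and effectiveness."
        )

    # One suggestion per level, up to the highest level requested.
    for level in range(1, ctx.highest_level + 1):
        suggestions.append(
            f"Level {level} ({LEVELS[level]}): "
            f"consider {EXAMPLE_METHODS[level]}."
        )

    if ctx.budget_is_tight and ctx.highest_level >= 3:
        suggestions.append(
            "Levels 3 and 4 are more complex to evaluate; "
            "consider whether the budget allows them."
        )
    if ctx.evaluator_is_internal:
        suggestions.append(
            "Consider having a second person review the findings "
            "to limit internal bias."
        )
    return suggestions


# Example use: an internal evaluator planning a formative evaluation.
for line in advise(EvaluationContext("formative", 2, True, True)):
    print(line)
```

Every rule only appends a suggestion phrased as "consider"; the evaluator remains free to ignore any of them, which is what distinguishes this style of guidance from a strong system that would make the choice itself.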
Such a system may also embed automated data collection functions for increased efficiency. Functionality of automated data collection systems may involve intelligent test scoring of procedural and declarative knowledge, automation of individual profile interpretations, and intelligent advice during the process of learning (Bunderson et al., 1989). These applications can provide increased ability to diagnose the strengths and weaknesses of the training program in producing the desired outcomes. Especially, for the purposes of formative evaluation this means that the training program can be dynamically and continuously improved as it is being designed.
Automated evaluation planning and automated data collection systems embedded in a generic instructional design tool may be an efficient and integrated solution for training organizations. In such a system it will also be possible to provide advice on revising the training materials based on the evaluation feedback. To that end, evaluation data, individual performance data, and revision items can be tagged to the learning objects in a training program. The ADAPTIT instructional design tool is one of the systems that provide such an integrated solution for training organizations (Eseryel et al., 2001).
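To make the idea of tagging evaluation data to learning objects more concrete, the following sketch attaches performance scores, feedback comments, and revision items to each object and flags objects whose mean score is low. It does not describe the ADAPTIT data model, which is not detailed here; the `LearningObject` class, the `flag_for_revision` function, the 0.0 to 1.0 score scale, and the 0.6 threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class LearningObject:
    """A unit of training material with evaluation data tagged to it."""
    object_id: str
    title: str
    scores: list = field(default_factory=list)     # performance data, 0.0-1.0
    comments: list = field(default_factory=list)   # evaluation feedback
    revisions: list = field(default_factory=list)  # revision items


def flag_for_revision(objects, threshold=0.6):
    """Add a revision item to each object whose mean score is below threshold.

    Objects with no scores are skipped, since there is nothing to judge.
    Returns the ids of the flagged objects.
    """
    flagged = []
    for obj in objects:
        if obj.scores and mean(obj.scores) < threshold:
            obj.revisions.append(
                f"Mean score {mean(obj.scores):.2f} is below "
                f"{threshold:.2f}; review this material."
            )
            flagged.append(obj.object_id)
    return flagged


# Example use with made-up data.
course = [
    LearningObject("lo-1", "Safety briefing", scores=[0.9, 0.8]),
    LearningObject("lo-2", "Fault diagnosis", scores=[0.4, 0.5],
                   comments=["Examples were hard to follow."]),
    LearningObject("lo-3", "Reporting procedure"),
]
print(flag_for_revision(course))
```

Because the scores, comments, and revision items live on the object they concern, a designer can see which specific part of a program the evaluation feedback points to rather than only an overall result.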
The different approaches to evaluation of training discussed herein indicate that the activities involved in evaluation of training are complex and not always well-structured. Since evaluation activities in training situations involve multiple goals associated with multiple levels, evaluation should perhaps be viewed as a collaborative activity between training designers, training managers, trainers, floor managers, and possibly others.
There is a need for a unifying model for evaluation theory, research, and practice that will account for the collaborative nature of and complexities involved in the evaluation of training. None of the available models for training evaluation seem to account for these two aspects of evaluation. Existing models fall short in comprehensiveness, and they fail to provide tools that guide organizations in their evaluation systems and procedures. Not surprisingly, organizations are experiencing problems with respect to developing consistent evaluation approaches. Only a small percentage of organizations succeed in establishing a sound evaluation process that feeds back into the training design process. Evaluation activities are often limited to reaction sheets and student testing, without proper revision of training materials based on evaluation results. Perhaps lack of experience in evaluation is one of the reasons for not evaluating consistently. In this case, the organization may consider hiring an external evaluator, but that will be costly and time consuming. Considering the need for the use of internal resources and personnel in organizations, expert system technology can be useful in providing expert support and guidance and can increase the power and efficiency of evaluation. Such expert systems can be used by external evaluators as well.
Strong, completely automated systems offer apparent advantages, but their development and dissemination lag behind their conceptualization. Future research needs to focus on the barriers to evaluation of training, on how training is being evaluated and how evaluation is integrated with training design, and on how the collaborative process of evaluation is being managed and how those involved may be assisted. This will help guide efforts both to develop a unifying theory of evaluation and to develop automated evaluation systems.