Educational Technology & Society 3(4) 2000
ISSN 1436-4522

Peering Through a Glass Darkly: Integrative evaluation of an on-line course

Josie Taylor
Senior Lecturer, Institute of Educational Technology
The Open University, Walton Hall
Milton Keynes  MK7 6AA United Kingdom
Tel: +44 1908 655965
Fax: +44 1908 653744

Mark Woodman
Professor in Information Technology
Middlesex University, School of Computing Science
Trent Park, Bramley Road
London N14 4YZ United Kingdom
Tel: +44 7887 708 384
Fax: +44 20 8411 5924

Tamara Sumner
Assistant Professor
Dept of Computer Science and Institute of Cognitive Science
Campus Box 430, University of Colorado at Boulder
Boulder, CO 80309-0430 USA
Tel: +1 303 492 2233
Fax: +1 303 492 2844

Canan Tosunoglu Blake
Research Fellow, Institute of Educational Technology
The Open University, Walton Hall
Milton Keynes MK7 6AA United Kingdom
Tel: +44 1908 654966
Fax: +44 1908 653744



In this study we describe a wide-spectrum approach to the integrative evaluation of an innovative introductory course in computing. Since both the syllabus, designed in consultation with industry, and the method of presentation of study materials are new, the course requires close scrutiny. It is presented in the distance mode to a class of around 5,000 students and uses a full range of media: paper, broadcast television, interactive CD-ROM, a Web-oriented programming environment, a Web site and computer conferencing. The evaluation began with developmental testing whilst the course was in production, and then used Web-based and paper-based questionnaires once the course was running. Other sources of data, in the form of observation of computer conferences and an instrumented version of the Smalltalk programming environment, also provide insight into students’ views and behaviour. This paper discusses the ways in which the evaluation study was conducted and the lessons we learnt in the process of integrating all the information at our disposal to satisfy a number of stakeholders.

Keywords: Evaluation, integration, distance education, developmental testing, on-line questionnaires

Introduction and Background

For the academic year beginning in February 1998, after 50 person-years of intensive research and development, the U.K.’s Open University (OU) began the presentation of a new introductory computing course, Computing: An Object-oriented Approach. Its OU code is ‘M206’ and this term is used to refer to the course throughout the paper. The course not only claimed to redefine the curriculum for the teaching of Computer Science, but also deployed a wide variety of media, delivered in ways which took advantage of the (then newly emerging) opportunities that the web was offering. This made it a challenge to the university’s normal course production systems that are geared up to industrial levels of production of primarily print-based materials. OU courses are “knowledge products” delivered through an assortment of media. Traditionally, the assortment has been limited to two or three media, usually paper (in the form of high quality booklets) and broadcast TV programmes or audio and video cassettes and possibly software. In contrast to other distance teaching approaches, the OU’s is not teacher centric. OU academics deliver their expertise to students in the knowledge products; these learning resources are the course – they constitute the teaching to be evaluated.

All these factors also led to M206 being one of the most expensive courses that the University had mounted, so it was regarded as critical that an appropriate level of evaluation be conducted both during production of the course and during its presentation to students.

In this paper, we begin by describing the course, the stakeholders and the evaluation information which was potentially available to us. We then describe the multi-stranded integrative evaluation strategy we adopted and the outcomes of the evaluation. Finally, we discuss issues arising from our experience and the lessons we learnt in the process.


The Course: ‘M206’

As previously mentioned, M206 departs markedly from convention in its approach to introducing computing, and both implicitly and explicitly challenges and extends accepted thinking on the nature and practice of the discipline of Computing Science. Both the syllabus and the method of presentation of study materials were new at the time of presentation. Eschewing any distinction between “computer science” and “software engineering”, the syllabus was industry-oriented: it introduced and thereby defined the discipline as being concerned with the team-based development of complex interactive software systems (not programs).

The OU’s vocational degree courses, such as those in computing, are often the most popular with its students – who are typically aged in their late thirties and in employment. Therefore, when devising the syllabus in the early 1990s, the M206 course team had to assess what it believed to be the technologies for software development relevant to the end of the decade. The team made two crucial decisions in this respect: object technology would be fundamental to the development of complex systems and such systems would often involve network computing, especially the Internet. With an industry-oriented agenda, the course team determined that it would adopt an “objects-first” approach to computing and chose to use the programming language Smalltalk-80, particularly as Smalltalk programming environments lend themselves to tailoring and the production of student-alterable microworlds which were to be extensively used in the pedagogy. Using LearningWorks (Goldberg et al., 1997), which allows a Smalltalk environment to be used as a vehicle for interactive courseware, the course team would go on to design a set of “LearningBooks”, which are software modules presented to the user in a book metaphor, being organised into sections and sections into pages (Woodman et al., 1999).

The course is very practical and a learning-by-doing pedagogy was designed accordingly (Woodman and Holland, 1996). The main course objectives were that, after successfully studying the course, students should:

  1. generally understand computing and software and confidently use their extensive vocabulary;
  2. skilfully use complex practical software;
  3. recognise complex systems, be able to speculate about their parts and to design and complete or extend them;
  4. apply object-oriented analysis and design ideas;
  5. develop small Smalltalk applications including their graphical user interface (GUI);
  6. understand issues concerning large scale software development including those due to group working;
  7. describe concepts of human–computer interaction and be able to analyse, design and implement user interfaces;
  8. appreciate the benefits and risks associated with global networked computing.


Stakeholders in the Evaluation

In any organisation where analysis of teaching quality is important, the number of stakeholders, and their influence and power, affects the evaluation strategy. In our context there are five main groups of stakeholders:

Senior management: The most senior academics in the university have formal responsibility for the curriculum, for the use of educational technology and for student affairs. They must ensure that the course arrives in good shape, that students are able to engage with it appropriately and that all the necessary quality assurance evaluation activities are taking place. This matters more the bigger and more expensive a course is, and M206 was both. For senior management, levels of student satisfaction are appropriate forms of data, as would be a summative evaluation of the overall success of the course. Summative evaluation would address issues such as: is this an appropriate way for future courses to proceed, what lessons were learnt for the future, what was achieved? Because of organisational features of the OU, our evaluation could not take costs into account. Although this aspect would clearly figure highly in the overall assessment by senior managers, we did not have access to relevant information to address it directly.

Institute of Educational Technology (IET): IET is an autonomous OU academic unit whose remit includes a responsibility to routinely collect data on courses, which it does through an annual survey of courses organised by its Student Survey Office (SSO). Reports produced by the SSO are not externally published, but are made widely available internally to encourage the sharing of experience across faculties and courses within the OU. Evaluation data would be collected that tracks the ability of students to perform certain types of activity (e.g. to install all the various pieces of software – such as LearningWorks and the conferencing software – and to arrange Internet provision). This data is normally collected first through developmental testing, which involves close observation of a small number of subjects selected to resemble typical M206 students, and then by questionnaire once the course is in presentation.

Academic Unit: Evaluation of supported open learning in distance education courses is an important distinguishing feature of the OU’s external profile for teaching excellence, particularly in terms of the teaching quality guidelines established by the UK’s national Quality Assurance Agency for Higher Education (QAA). The QAA handbook provides a series of points concerning core aspects which need to be addressed in formal teaching quality reviews. These points identify areas where evidence will be required from the institution about its knowledge of the student experience and the ways in which it has taken students’ views into account in course (re)design and production. Evaluation activities need to encompass all of this in some form; this data is mainly gathered by questionnaire.

Course Team: The course team that develops the course will need various levels of information at different times in its work. To begin with, it will need to be reassured that the individual components of the teaching materials are of high quality and express the pedagogy effectively. At the same time, the way components are combined can affect how effectively students construe teaching points – badly combined components can seriously skew interpretation, making the student’s life more difficult rather than easier. Developmental testing during materials production is therefore an essential activity. Such testing relies upon detailed studies, sometimes observational, sometimes using interviews or questionnaires. When the materials are complete, the course team will need to satisfy itself that the way students at home interpret the materials is in line with expectations – bearing in mind there is no peer group in people’s sitting rooms to help calibrate or moderate their impressions and interpretations. So the course team will then want to be able to lay its hands on data that supports claims made for effective curriculum design, learning outcomes and student satisfaction. This will be partially fulfilled by evaluation studies conducted by questionnaire that will complement other forms of data (e.g. exams and assessment outcomes, progression studies).

Students: Obviously, students have a vested interest in receiving well-designed, well-tested and effective teaching materials. They will want reassurance that they have the opportunity to express their views and that these are seen to be taken on board. We also needed to reassure students that evaluation questionnaires would remain confidential – they must feel confident that it was the materials under scrutiny, not them.

In order to provide appropriate data for all the stakeholders, a variety of evaluation perspectives would need to be presented, some based on qualitative and some based on quantitative data. It was clear to the evaluators that formative developmental testing was going to be critical as a means of establishing that the pedagogy and approach to use of materials was appropriate before the course got anywhere near students. Subsequently, summative evaluation of the course in presentation would provide a check on the efficacy of the developmental testing.


Evaluation Approach

There are many debates in the literature on appropriate methods and paradigms for evaluation (Guba and Lincoln, 1989; Calder, 1994; Hammersley, 1993; Draper, 1996; Oliver, 1998). Our study would have two phases: the first would be the developmental testing required to establish that the pedagogy and use of media were robust. In the second phase, we needed to be able to evaluate the entire course across its first year of presentation and follow it up in the second year, so that we could validate the decisions made and provide useful data regarding student satisfaction levels to the senior managers and the faculty. The evaluation strategy had to encompass both the content of the course and the ability of students to understand the teaching, as well as the way that media had been used to communicate those teaching aims. This was not an easy thing to do, especially as encouraging students to stand back from the media-mix could lead them into adopting a quasi-analytic ‘media critic’ role that focuses on superficial elements of a medium, rather than on the deeper role of that medium in the teaching and learning situation, a phenomenon we had experienced in previous evaluation studies. It can lead students to spend time critiquing, for example, aesthetic aspects of the interface in multimedia systems, rather than the effectiveness of the underlying teaching. This kind of difficulty is, of course, crucially dependent on the way in which questions are posed, but we have consistently found that, when completing questionnaires, students at a distance often interpret questions the way they want them to read. Such misinterpretations are easier to correct in interviews and observation studies. Perhaps more importantly, we also ran the risk of forcing students into a position of unpicking the very learning experience that the course team had spent so much time carefully constructing.

In order to take into account all these issues, as well as the needs of our stakeholders, we turned to integrative evaluation (Draper, 1996) to provide our framework. In this view, the evaluation is aimed at:

...improving teaching and learning by better integration of the [materials] into the overall situation. It is not primarily either formative or summative of the software but surrounding materials and activities.

This is also compatible with the evaluation framework developed by Jones et al. (1996), who rightly acknowledge the importance of context in evaluation and whose concerns were to evaluate the effectiveness and quality of CAL whilst, at the same time, investigating the educational situation as a whole and focussing on the learners.

Consequently, we adopted an approach to the evaluation of a whole course that aimed at understanding how well the media mix works for students in their overall context. In order to do this, five sources of information were available to us:

  1. Developmental testing, using observation studies, interview and questionnaires
  2. Concurrent on-line questionnaires administered via the Web
  3. End of course paper-based questionnaires administered by post
  4. Computer conferences
  5. Data from an instrumented version of the LearningWorks programming environment

Each of the above offers different potential. We will take each in turn and consider its role in the evaluation strategy.


Developmental Testing

As discussed earlier, there was a strong prima facie case for developmental testing because of the course team’s radical syllabus and its plans for use of media. To accomplish this, it was necessary to engage – and pay – a small number of people, relative to the estimated expected course number, willing to take on the role of M206 student and therefore committed to properly studying preliminary learning materials. These were considered preliminary in that the syllabus and pedagogy were still being reviewed and the software and Web site were still being designed and implemented. A combination of longitudinal surveys, open-ended and semi-structured interviews and questionnaires was used (see later discussion). Developmental testing was to be the most important part of the evaluation strategy both because the findings would greatly influence the course in its development and because the function of the later questionnaire studies is to confirm the decisions that were made as a result. We would not expect to make major changes to the course on the basis of the summative evaluation, provided that the developmental testing had gone well.



Questionnaires

After the course has been developed, the numbers involved are much greater, so the approach needs to be streamlined. Traditionally, the main instrument for evaluation studies with large numbers of people is the questionnaire, so this would be appropriate for a course being taken by 5,000 students. Questionnaires have the advantage of cheaply providing extensive coverage of the population under consideration. However, questionnaire evaluation often does not provide deep and meaningful accounts of student learning: questionnaires are (in comparison with observation and interview) relatively blunt instruments, providing data about preferences, trends and patterns of behaviour that can be tracked over time. But by cross-tabulating one question with another we can begin to establish patterns of preference and relationships between materials and, by further cross-tabulating with demographic data, we can find out which aspects of a student’s background make a difference to their success in M206 – for example, gender, previous education, previous courses taken, age and so on.
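
The cross-tabulation described above can be illustrated with a minimal sketch. The records, category names and the function `cross_tabulate` are hypothetical and invented for illustration only; they are not data or tooling from the M206 study.

```python
from collections import Counter

# Hypothetical questionnaire records (not M206 data):
# (prior computing experience, satisfaction with a course component).
responses = [
    ("none", "satisfied"),
    ("none", "dissatisfied"),
    ("some", "satisfied"),
    ("some", "satisfied"),
    ("none", "satisfied"),
    ("extensive", "dissatisfied"),
]


def cross_tabulate(records):
    """Count how often each (row answer, column answer) pair occurs."""
    counts = Counter(records)
    rows = sorted({row for row, _ in records})
    cols = sorted({col for _, col in records})
    # Counter returns 0 for pairs that never occur, so every cell is filled.
    return {row: {col: counts[(row, col)] for col in cols} for row in rows}


table = cross_tabulate(responses)
for row, cells in table.items():
    print(row, cells)
```

Each row of the resulting table shows how one background group distributed its answers, which is the kind of pattern a single question's totals would hide.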

Another crucial role for this kind of data collection is to provide a baseline that can be used as a benchmark for monitoring change – continuous improvement, ideally – and given that M206 is a course specifically designed to be evolutionary, this aspect is important. At the same time, survey data can also provide snapshots of the population under study and how students are responding to the teaching and learning experience. Hence, although the questionnaire data collected would feed into the rolling remake of the course by the course team (and would therefore be formative), it would also provide the summative “snapshot” view required by the senior management.

There were two opportunities available to us for using questionnaire studies: on-line questionnaires, made available to students via the Web and posted at the end of each block of work in the course, and paper questionnaires delivered to students’ homes at the end of the course.

Concurrent on-line questionnaires: Although the Web is an extremely convenient method of delivery for feedback, there is an issue here of sampling. Normally, a questionnaire study would be administered to a balanced sample of the population under scrutiny. In such a case, one routinely checks the return of the questionnaires to ascertain whether or not bias has been introduced to the system by default (i.e. by comparing the demographic information of the respondents with the demographic profile of the population under study). There is always an element of self-selection in the returns of questionnaires, but in the case of M206 we were exposing ourselves to some very specific bias. Our guess was that the feedback questionnaires would be returned by (a) the most computer literate students, who would have found them on the Web quite easily; (b) students who had the (spare) time and energy to complete them and (c) people who had a specific point of view they wanted to communicate (at both ends of the spectrum from “this is brilliant” to “this is awful”). Although we knew that some of the students registering for the course would be highly computer literate, we were also aware that, as an introductory course, M206 would attract a proportion who would be new to computing and who might only have acquired a computer in order to follow the course. So we needed to ensure that we were not just obtaining feedback from a particular sub-set of students, which led us to the paper-based questionnaires.
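
One common way to carry out the routine check described above, comparing respondents' demographics with the population profile, is a chi-square goodness-of-fit statistic. The sketch below is an assumption about how such a check might be done, not the procedure used in this study; the group names, shares and counts are invented, and the function `chi_square_statistic` is hypothetical.

```python
def chi_square_statistic(respondent_counts, population_shares):
    """Pearson goodness-of-fit statistic of respondents against population shares."""
    total = sum(respondent_counts.values())
    statistic = 0.0
    for group, share in population_shares.items():
        expected = share * total  # respondents we would see with no bias
        observed = respondent_counts.get(group, 0)
        statistic += (observed - expected) ** 2 / expected
    return statistic


# Invented figures: share of each group on the course as a whole ...
population_shares = {
    "new to computing": 0.4,
    "some experience": 0.4,
    "highly literate": 0.2,
}
# ... and the number of Web questionnaire returns from each group.
web_respondents = {
    "new to computing": 20,
    "some experience": 40,
    "highly literate": 40,
}

statistic = chi_square_statistic(web_respondents, population_shares)

# Critical value of the chi-square distribution at the 5% level
# with 2 degrees of freedom (three groups minus one).
CRITICAL_VALUE = 5.991
biased = statistic > CRITICAL_VALUE
print(f"chi-square = {statistic:.1f}, biased = {biased}")
```

With these invented returns the highly literate group is over-represented, so the statistic exceeds the critical value and the Web returns would be flagged as unrepresentative.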

End of course paper-based questionnaires: The OU has an annual means of gathering information about courses in a given year – the Annual Courses Survey, briefly mentioned earlier. Approximately 30,000 students in a stratified sample are sent postal questionnaires, the response rate being typically about 65%. Students are quizzed about all aspects of the teaching and delivery of their course. The first part of the questionnaire is the same for all courses and asks standardised questions about use of media in general terms, plus other aspects of the course(s), for example satisfaction levels for delivery of materials, tutorials, assessments, study time spent, etc. This data is for all stakeholders and is prepared by the Student Survey Office for the academic unit, course team and senior managers in the university. The second part of the questionnaire is designed in consultation with the relevant course team to be specific to their course. We obviously wanted to take advantage of the Annual Survey to collect information on M206 – it would provide information about the overall course context which has the value of also allowing us to compare this course with others in the University. We adapted the second part of the questionnaire for M206 to be compatible with our Web questionnaires, but asking about student satisfaction levels over the whole course, rather than over individual chapters.

This paper-based questionnaire was sent to a properly composed sample of the course (constructed to represent the demographics of the whole course) and was posted to students’ homes. By excluding from this sample anyone who had participated in the Web questionnaires we would achieve the balance we were looking for: the demographics of the two cohorts (Web respondents and paper respondents) could be compared with the demographics of the course as a whole to determine whether or not we could claim to have obtained views from an appropriate cross-section of students.
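
The sampling idea above can be sketched as follows. This is one possible reading, under stated assumptions: each stratum's quota is proportional to its size on the whole course, and the quota is filled only from students who did not answer the Web questionnaires. The function `stratified_postal_sample`, the strata and all numbers are hypothetical, not the Student Survey Office's actual procedure.

```python
import random
from collections import defaultdict


def stratified_postal_sample(students, web_respondent_ids, fraction, seed=0):
    """Draw a per-stratum sample sized by whole-course proportions,
    choosing only students who did not answer the Web questionnaires."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    stratum_sizes = defaultdict(int)
    eligible = defaultdict(list)
    for student_id, stratum in students:
        stratum_sizes[stratum] += 1
        if student_id not in web_respondent_ids:
            eligible[stratum].append(student_id)

    sample = {}
    for stratum in sorted(stratum_sizes):
        quota = round(stratum_sizes[stratum] * fraction)
        pool = eligible[stratum]
        # Never ask for more students than remain eligible in the stratum.
        sample[stratum] = rng.sample(pool, min(quota, len(pool)))
    return sample


# Invented course of 100 students: IDs 0-39 are experienced, 40-99 are new.
students = [(i, "experienced" if i < 40 else "new") for i in range(100)]
# Invented Web respondents: the first 20 students, all experienced.
web_respondent_ids = set(range(20))

sample = stratified_postal_sample(students, web_respondent_ids, fraction=0.25)
print({stratum: len(ids) for stratum, ids in sample.items()})
```

Because the quotas follow the whole-course strata sizes, the postal sample keeps the course's demographic balance even though the excluded Web respondents all came from one group.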


Computer Conferences

A further source of information available to us was the computer conferencing system, where students use an on-line conference area to discuss issues with one another and their tutors. Examining the discourse in the open lobbies to conferences would tell us something about concerns students were having. However, once again, caution needs to be exercised because of the potential bias in these discussions. For example, if one student with a particular bias enters the system and posts an inflammatory note, the effects can spread throughout the course very quickly (a variant of the well known e-mail phenomenon of ‘flaming’). Lots of other folk may take the opportunity to jump on the bandwagon with minor quibbles, whilst others may feel the need to leap to the defence of the course and so on. Similarly, people who might otherwise have joined in with discussion could find the volume of messages too great to handle, preferring simply to withdraw rather than deal with it.


Data from an instrumented version of the LearningWorks programming environment

The final source of data is the most ambitious to achieve in the sense that it required considerable research and development. The malleability of Smalltalk programming environments convinced the course team and colleagues from the OU’s Centre for Informatics Education Research, who were not involved with M206, of the feasibility of instrumenting the LearningWorks system to gather data about how students actually use it to learn programming and to learn about object technology concepts. Such instrumentation is not part of the original LearningWorks programming framework (Goldberg et al., 1997) and so an on-going project called AESOP (Thomas et al., 1998) was established to progress the development of this source of data.
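
To make the general idea of instrumenting an environment concrete, the sketch below wraps two stand-in operations so that each use is logged. It is written in Python rather than Smalltalk and does not describe LearningWorks or AESOP; every name in it (`UsageLog`, `instrumented`, `open_page`, `evaluate_code`) is a hypothetical placeholder.

```python
import functools
import time
from collections import Counter


class UsageLog:
    """In-memory record of what a student did in the environment."""

    def __init__(self):
        self.events = []

    def record(self, action, detail=""):
        self.events.append(
            {"timestamp": time.time(), "action": action, "detail": detail}
        )

    def counts(self):
        """Number of logged events per action name."""
        return Counter(event["action"] for event in self.events)


def instrumented(log, action):
    """Wrap a function so that every call is logged before it runs."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            log.record(action, detail=repr(args))
            return func(*args, **kwargs)
        return wrapper
    return decorator


log = UsageLog()


@instrumented(log, "open_page")
def open_page(book, page):
    # Stand-in for turning to a page of an interactive book.
    return f"{book}: page {page}"


@instrumented(log, "evaluate_code")
def evaluate_code(expression):
    # Stand-in for evaluating a student's code; here it only measures length.
    return len(expression)


open_page("Book 1", 1)
evaluate_code("3 + 4")
open_page("Book 1", 2)
print(log.counts())
```

Wrapping operations in this way leaves their behaviour unchanged while building a time-stamped trace that can later be analysed for patterns of use.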


The Evaluation Process

We now examine how each form of data gathering was deployed and what the outcomes were.


Developmental Testing Process and Outcomes

The strategy for the early part of production (1994–5) was to gather data both from surveys of the testers and from open-ended interviews with them (Sumner and Taylor, 1997). Semi-structured interviews were also to be used for insight into the way students might cope with the media mix. Fourteen people, similar to the expected student population, were paid to act as students and to take the course using the preliminary materials. Most of these testers had no prior programming experience of any kind and a few had very little experience of using a computer. The longitudinal surveys in developmental testing looked at whether people were able to learn with the resources as they were being developed. Each person was given the necessary computer hardware and software (much of it in prototype form) for the duration of the testing. As the testers worked through each chapter, they filled in a questionnaire about the resources and the subject matter; the resources were mostly printed material and early versions of LearningWorks and LearningBooks. If testers did not reply for any reason, they were interviewed by telephone and the questionnaire was filled in on their behalf. In practice, the only stakeholder in developmental testing is the course team; unless significant problems arise, the other agencies would not review the results, although they are available to all stakeholders.

Analyses of the testing questionnaires and interview data showed that, overall, things were going fairly well – the tester ‘students’ were able to do the practical programming activities and to answer questions about various object-oriented concepts. In general terms, it was clear that the approach to pedagogy was successful and students were learning what they were meant to. However, it was equally clear that testers were not really making effective use of their computer-based course resources and instead were relying heavily on the paper-based materials. In addition, while they could write Smalltalk code or answer questions about the material, they seemed to have difficulties mapping between the two; that is, they had difficulties drawing the connections between theoretical or conceptual issues and what appeared to be mechanistic activities. Given the testing situation, these problems could stem from the use of preliminary course resources still very much in development. To determine whether these worrisome phenomena arose from the testing situation or from a more fundamental problem, open-ended, semi-structured interviews were also conducted with eight of the testers towards the end of this testing. These interviews focused on understanding testers’ experiences working and learning with the computer-based resources. Testers were asked in detail about how they studied, how they organised their computer-based work and study time and their experiences using particular tools such as the emerging LearningWorks programming environment. They were also asked to show us their notes, diaries, etc. and explain them. These interviews were transcribed and analysed, and we summarise our analysis of developmental testing thus:

  1. Some testers had little confidence in their ability to work and learn with the computer-based resources. Several had long-standing prior anxiety about computers, which was slow to change. Others had experienced setbacks, e.g. a difficult installation or Web session, which had shattered their confidence.
  2. Many testers felt disoriented and had difficulty judging progress through electronic materials.
  3. Nearly all of them were unclear about the role of the various resources in the course and how they should be using them to support their learning.
  4. While they could use resources to carry out specific activities, many had difficulties making connections between practical activities and larger conceptual issues. Consequently, they tended to develop isolated, piecemeal views of their computer resources.
  5. Some testers were relying heavily on reading texts. The interleaving approach adopted in these texts between discussion of programming and hands-on exercises led some to believe they could read about programming rather than actually doing it. Some reported difficulties using the texts side-by-side with the LearningBooks.
  6. Some testers rarely visited the Web site since the materials there duplicated their printed text.

In retrospect, these results are not particularly surprising. Many testers had been selected because they had little or no experience of computers and had been thrown into a new way of working and learning with technology. They assembled and plugged in their hardware, opened and installed a diverse collection of software and had expectations that they should become productive users virtually overnight. Instead, many felt overwhelmed by it all, lost confidence, focused on trying to master isolated bits of technology and thereby lost track of the larger goals for the course.

Within the course team, a sub-group of those people involved in media production (i.e. graphic designers, editors, television producers and educational technologists) formed a Media Group with a special remit to consider the developmental testing data (leaving the computing academics to continue the job of generating the basic course materials). Working with the evaluators, the group concluded that the preliminary materials relied far too heavily on the print component – students were required to read some expository text and were then instructed in what they were expected to do in the programming environment. They would then start up their computer, open LearningWorks and execute the exercises. Upon completion, they would return to the text to read about how to interpret what they had done and what to do next. This entailed a lot of descriptive writing in the main text to set the scene; multiple screen shots to allow students to recognise where they were and what state the machine would be in; and a great deal of detailed step-by-step explanation. This approach not only made the texts daunting to read, but also divorced activities performed on the computer from the context provided by the print materials. This led to testers being unable to recall why they were doing some tasks, as well as being unable to relate them to the “big ideas” contained in the course. A bridge between theory and practice in the course had to be constructed.

Furthermore, developmental testing clearly identified the difficulties that students were having understanding why they were using various media in the course or how one kind of course component related to another – for example, what was email to be used for and how would this be different from using computer conferencing? What contribution did the television programmes make and how did content presented there relate to programming exercises? An explicit architecture for the course was designed that showed students the relationships between media. The architecture was supported by a Course Map CD-ROM that both introduced the media and explained how the course team expected students to use those media. Consistent with this architecture was the need to move the problem-solving contexts out of the text and into the LearningBooks to create a clearer distribution of activities across the resources; this conclusion shaped the way the course team decided to structure and to use LearningBooks. Hence, evaluation during development directly resulted in clear guidance to students, for example by way of setting an agenda for practical work in the HTML – similar to Lemke and Fischer (1990) and Wroblewski et al. (1991) – and to support bridging between practical action and general concepts. Embedded hypertext links to concepts presented in an electronic glossary were provided.

The changes made to the course materials as a result of this formative developmental testing were far-reaching. By clearly moving software development activities into the LearningWorks programming environment, students ought to be able to elaborate a more substantial ‘doing’ context for their programming activities. The print materials were then able to take a longer view, presenting a more reflective perspective, away from the nitty-gritty of programming tasks. The changes were expected to make the relationship between media much clearer, enabling students to be more sophisticated in their use of media (there was even a short tutorial on how to watch television for study purposes!).

However, whilst these changes to the course materials were indicated by the developmental testing data, we had to make assumptions that we did not have time to thoroughly scrutinise at the time. Indeed, the Media Group only had time to take one chapter and rewrite it according to the new design principles (Sumner and Taylor, 1998) but, on testing, we found a greater preference for the style and considerable improvement in performance. Since we did not have time to change all the chapters and re-test them, we would have to trust our instincts and await the data from real students on the course to validate the approach.


Questionnaire Studies

Concurrent on-line evaluation was carried out while the course was running using Web-based questionnaires – one for each of the seven blocks of the course. These questionnaires, to be filled in at the end of each block, asked for feedback on each chapter of the course. Completion of them by students was optional. Problems can arise here due to the process of learning itself. For example, if, having struggled with a concept, a student is immediately asked about the quality of teaching, he or she may give a negative response, possibly reflecting a lack of understanding as well as frustration. However, a fortnight later, when the concept has become integrated with other concepts so that it now makes more sense, the student may, quite naturally, give a more positive response. Therefore, timing can play a critical role in this process.

Similarly, a decision was taken not to break chapters down into their constituent media components in order to discover their effectiveness because we wanted to avoid having the students immediately deconstructing the teaching that had just taken place. Whilst it is acceptable to pick the brains of testers (who, after all, are being paid to do the job), we didn’t want to lead students into a critique of the teaching that might ultimately be harmful to them. We wanted instead for students to reflect on their overall experience, so questions were designed with that in mind. At the same time, of course, we did not want to overload people with complex questions that might take as long to complete as working through the course materials themselves. The questionnaires were, therefore, designed for rapid completion, and took a holistic view of chapters.

By this stage in the evaluation process, the questions were not directed at ascertaining how well students had learned. That function was to be carried out by the various assessment tasks and final examination. Instead we were targeting their experience of learning – “how was it for you? Did we make it as painless as possible, or did we place obstacles in your way?”

There was no active process of sampling for completion of these questionnaires, but we asked students who completed them to provide their university Personal Identification Numbers, so we could examine the demographic composition of the group who returned these. (They were also tagged in the student database as having responded so would not be bothered by other forms of evaluation request.) One of the major problems that arose in this part of the evaluation was that, because of resource constraints and technical problems, not all the electronic questionnaires were available at the optimal time for completion – i.e. at the end of a block. Students then had to sit down and think back over their experience of a month or so before. Whilst from the data collection point of view, this may not be overly problematic, students found it irritating – they had moved on since then – and we think it led to a decline in response levels. The response rate begins high, then rapidly tails off. Due to server difficulties, the questionnaires for the last two blocks of the course were not available at an appropriate time for students to complete, so that data is missing.

Since the M206 study, we have had experience of using on-line questionnaires in other courses where the rapid tail-off is also commonly observed. The tail-off does limit the validity of later results, in principle, although the information we gathered can be triangulated from the paper-based survey.

End-of-course paper-based questionnaires (the course-specific part of the paper-based questionnaire) were designed to complement the data obtained from the on-line evaluation – for example, amongst other things, students were asked to say how much time they were spending on the paper-based and the computer-based activities in the course in relation to the course team estimations of time, which were provided for students in the print materials. The on-line questionnaires asked for this information on a chapter-by-chapter basis. The paper questionnaire, administered at the end of the course, asked for the same information estimated by students on a block-by-block basis, as well as an overall estimation of the whole course. These data were then compared for consistency.
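As an illustration of the kind of consistency check described above, the following minimal Python sketch pools chapter-level on-line ratings into blocks and compares them with block-level paper ratings. It is not the analysis the evaluation team actually ran: the 5-point coding of the time ratings, the tolerance value, the function names and all the figures are assumptions invented for the example.

```python
from statistics import mean

# Assumed coding of the time question (not taken from the study):
# 1 = "nowhere near as much time" ... 3 = "about right" ... 5 = "a lot more" time.
# All ratings below are invented for illustration; they are not M206 data.
online_by_chapter = {
    1: {"block": 1, "ratings": [3, 3, 4, 2]},
    2: {"block": 1, "ratings": [3, 4]},
    3: {"block": 2, "ratings": [5, 4, 5, 4]},
}
paper_by_block = {
    1: [3, 4, 3],
    2: [3, 3, 4],
}


def block_means_from_chapters(chapters):
    """Pool chapter-level ratings into their blocks and average each block."""
    pooled = {}
    for info in chapters.values():
        pooled.setdefault(info["block"], []).extend(info["ratings"])
    return {block: mean(ratings) for block, ratings in pooled.items()}


def compare_sources(online, paper, tolerance=0.5):
    """Return, per block, whether the two sources agree within the tolerance."""
    online_means = block_means_from_chapters(online)
    agreement = {}
    for block, ratings in paper.items():
        if block not in online_means:
            continue  # no on-line data for this block (e.g. questionnaire unavailable)
        agreement[block] = abs(online_means[block] - mean(ratings)) <= tolerance
    return agreement


result = compare_sources(online_by_chapter, paper_by_block)
```

Blocks for which the two sources disagree would be candidates for closer inspection rather than evidence of a problem in themselves, since the two instruments differ in timing and granularity.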


Computer Conferences

The computer conferences proved more difficult to deal with than expected, mostly because of the sheer volume of data items resulting from much greater use of conferencing than predicted. During the first year, the course team set up conferences for social interaction, a metaphorical cafe and rooms for students from each of the thirteen OU regions across the UK, as well as for communal study and discussion. Many students spent a great deal of time in these conference areas and initially they were not very disciplined about where they discussed what. Consequently, discussions of teaching material became scattered and hard to track. Plans were made to have a group of evaluators monitor the conferences but the analysis has also proved an overwhelming task. The volume of messaging amongst 5,000 students was so high that the buffers were flushed more or less on a daily basis. The evaluation team simply did not have the staff resource to keep up with students. (Many students didn’t keep up either!) Some data was preserved, but so far, there is not the human resource to analyse it. Changes to the structure and prescribed use of the conferences in the second year of the course may make it easier to manage this source of data. The much smaller block conferences did prove extremely useful in identifying problem areas for students, though. There we found discussion of the various problems that arose during the year, from the down-time of the computers on which the conference system ran (failures due to the unexpectedly high usage) to the various interpretations of sentences within the teaching or assessment material.


Observing student practice with instrumented Smalltalk

During the first presentation of M206, the observatory was limited to the first four LearningBooks used in the course; this was no mean achievement as several classes needed to be designed and implemented to allow the recording and playback of student interactions with LearningWorks. Consequently, data from the first year is limited.

Early results from AESOP indicate students have in general used the facilities in the LearningWorks environment as the course team intended. However, it is noticeable that they are very forgetful of the protocol of classes – they often cannot remember what message to send an object. This suggests the need for automated assistance for even the limited protocols they must learn. Note that this evaluation tool has a potential benefit for tutors when supporting learners: if a student is having a programming or usage problem, with an instrumented LearningWorks they can “record” what they have done and email the resulting text file (Thomas et al., 1998).  Their tutor can replay the recording, using a system with appropriate facilities, and probably solve the student’s difficulty – an important benefit in distance education.


Outcomes of the Evaluation

The outcomes from developmental testing have been described in detail earlier. In this section we discuss how the data from the various sources outlined earlier were combined into a mosaic of information that allowed us to demonstrate that the majority of students found it straightforward to construe the learning that the course team had designed into the materials. Overall, the majority of students did not suffer from the same level of difficulty that the developmental testers had. Interestingly, though, from the AESOP project, although still in the early stages, it can be seen that some students did exhibit the same lack of confidence with computer systems as shown by the developmental testers; this points to further advice being given to prospective students regarding expected entry skills. Some have failed to grasp the elementary fact that simply repeating a failing task will not lead to progression; explicit teaching on deterministic behaviour might need to be introduced in early practical exercises. So, some further work in this direction is indicated.

That the majority did not suffer from incapacitating difficulty may also be related to the fact that most students (94%) were very familiar with the process of software installation prior to starting M206. This was reflected in the high proportion of them (97%) that completed the software installation themselves without requiring expert assistance. Therefore, our concerns that the on-line questionnaire data may be skewed by over-representing those students who were already computer literate before they began the course proved unfounded. By and large, the on-line and paper-based respondents were very similar in respect of relevant previous computing experience. However, in future presentations of the course this may not be the case, so it needs to be monitored.

Although the samples shared similar computing background, there could still have been differences in the way they responded to questionnaires. The first difference is in timing - the electronic questionnaires will have been completed (more or less) at the time in the course when a block has been completed (despite some delay, as noted earlier). This means the experience would have been fresher in the minds of the students who completed electronic questionnaires than those who completed the end-of-course, post-examination questionnaire for the Courses Survey. However, the overall patterns of response were similar – i.e. chapters that were rated difficult, or taking an undue amount of time, showed up in both datasets. Consequently we felt reasonably confident when we recommended changes to specific chapters.

A further difference is in the grain-size of information. The Courses Survey can provide very useful information to compare courses, but there is no doubt that it is a blunt instrument – for example, it asks students to make one statement regarding each course component across the whole year. On the other hand, the on-line questionnaires elicit information regarding blocks and their chapters. In effect, this asks students to draw comparisons between course components and their deployment in the teaching. This more detailed, fine-grained data at chapter level is useful for the course team to decide where to target their efforts for future presentation. From this information it was clear that some of the course components (i.e. the television programmes) were not well enough integrated into the other materials to be useful. Students are often pushed for time, so if a component does not appear to be as important as others are, it will be marginalised and not studied. This was a pity, as the television programmes provided useful case-study material.

Therefore, although the two data sources – Courses Survey and on-line questionnaires – did not exactly match because of these differences, we did see similar overall patterns in the data, as we had hoped we would.

In summary, from the questionnaire studies we ascertained that:

  1. M206 was well received by students.
  2. The workload was uneven and, in places, much too high.
  3. Students valued the print materials and the computing software highly, but found the analysis and design chapters very hard.
  4. Other course components were not well enough integrated into the course to be as valuable as they should have been.
  5. There were some hiccups and problems with the face-to-face tutorial process, but not to the extent that it led to students withdrawing from the course.
  6. There was an issue of student expectations for the course that needed to be managed through more careful targeting of promotional material – some students clearly thought this was a programming course, which it was never designed to be.

Despite caveats about the overwhelming flood of information from computer conferencing, there is no doubt that the conferences were an illuminating place to look for comments on the course, both from students and tutors. Although the evaluation team was not able to incorporate an analysis of this information into the evaluation strategy, the Course Manager found it a very useful way to stay in touch with students on a day to day basis. Through this means, the course team were able to anticipate many of the findings of the formal evaluation process, enabling them to start making changes to materials before we had formally reported.


Recommendations for the future

This paper has described a wide-spectrum integrative approach to the evaluation of teaching of a radically innovative introductory course in computing. The strategy aims to provide a mosaic of data from which patterns of behaviour can be derived and, to this end, uses five data sources. We believe that the integrative approach to evaluation has proved its worth and it is important to recognise that we have been able to provide data at a variety of levels to satisfy our various stakeholders.

In the first year of M206, Computing: An Object-oriented Approach, we have learned valuable lessons that will allow us to refine our strategy. For the most part the data collection has been successful. Three areas were problematic – all to some extent because of staff resource and technical inefficiencies. The first was associated with the role of the media model as a result of developmental testing; the second was associated with the use of Web-based questionnaires and the third was our underestimation of the use students would make of conferencing. We will address each in turn.


To be effective, recommendations need to be taken through into policy statements

When working with a complex media-mix, we know from the development of OU courses in the past that it is crucial to try to keep each medium in its place. It is important to have a clear media policy that locates each medium in relation to the rest to prevent ‘media bleed’ - where each medium starts to mimic the others, playing to an overall sense of pedagogy, but not playing to each medium’s strength in delivering that pedagogy. Media integration is an important issue to attend to in this situation – how do media relate, how is control passed from one medium to another, how do they complement each other, etc? However, it is insufficient simply to have articulated this policy – it needs to be referred to, updated, and made workable throughout the production period.

In this respect, the M206 course team found the web a difficult medium to handle, partly because it was viewed as a developing technology that could be deployed by the team to great effect - which was true to some extent. However, this interfered with the view of the course as a set of integrated media resources. In the initial phases of development, the course team conceived of web-based delivery in terms of using the web to deliver what was essentially the print medium. The M206 Media Group advised the course team that they needed to transform their teaching approach to make it web-suitable, not translate from print to on-line print, so a model of media use was offered to them. The course team adopted the model, but adhering to it proved difficult. For example, at one stage, parts of the teaching that the editors had removed from the print-based material began to find their way onto the web as ‘optional material’. The website next made inroads into the Course Map – much of the functionality of the Map began to appear on the web, thereby confusing students and undermining the use of the Map. Then students began to be confused because the web was interfering with the conferences, taking over the role of making announcements, holding the latest versions of course documents and so on, which many students found overwhelming. This is clearly a partial story – many factors were at play as the course was rolled out – but it illustrates how one very powerful medium, viewed as an enabling technology, can begin to dominate other teaching media in unexpected ways.

What has this to do with evaluation? The point is that the evaluators made recommendations to the course team as a result of developmental testing, which were adopted, but that was not enough. Such recommendations needed to be taken through into stated policy. Decisions or amendments to this policy should have been well documented, so that changes in personnel did not affect the rollout of the media, or the development process. Regular monitoring to check whether the course was deviating from the media policy would have also helped control the situation described above. This view assumes that the role of the evaluators is to ensure that students receive a well-produced, well-tested course. In the interests of doing that, they adopt an active role in course development, forming an important part of an on-going quality systems process.


Evaluation findings often entail further commitment of resource – ensure those resources are available

Difficulties in seeing how evaluation findings and recommendations can be incorporated into onward development are still being played out in relation to the course. In the view of the evaluators, the single biggest contribution that could be made now to enriching the student experience for M206 is to develop the learning support structures for students and tutors – i.e. the glossary, the Course Map, indexes, Smalltalk reference documents and so on. Such support mechanisms would help students manage their workload more effectively, clearly an area of the course that needs attention. However, disappointingly, given the initial investment in the course, the faculty’s concerns with resource issues have limited the response that can be made in these areas.


Electronic communication opens the flood-gates for feedback – ensure you have controlling mechanisms in place

We were delighted at the prospect of using conference-based information for evaluative purposes, but we severely underestimated how many students would use the facility for all kinds of chitchat and information exchange. This had both positive and negative effects for students, but it also severely stressed our computing resources – the servers could not cope with the volume of traffic. When we realised that we stood no chance of keeping track of the ongoing discourse, we hoped to store the information so we could review it after the course had ended. However, this proved difficult as the buffers needed to be cleared ready for the next day’s activities. This meant we lost the opportunity both to analyse that material and to consider how students’ comments and discussions could have been used as tertiary teaching materials for future presentations of the course.



Conclusions

We have two major conclusions from this work. The first confirms the value of collecting relatively cheap, easy to analyse data from multiple sources which, taken individually, might not tell us much of great value. However, when taken all together and cross-referenced, one illuminating the other, they can be highly informative for a variety of stakeholders, providing useful insights. The analysis of course content and pacing was provided by the on-line questionnaires, enabling the course team to keep track of what was causing problems. This information was buttressed by discussions going on in the conferences where possible, and by the end-of-course evaluation questionnaire.

Similarly, the amount of time students were spending on either their computer-based activity or their paper-based activity was recorded in the on-line questionnaires and validated through the paper-based questionnaire at the end of the course. These two data sources were of different granularity, but they both showed similar patterns, indicating the same areas of difficulty for students, or places in the course that were making excessive demands on students. This began to show us which parts of the course students were avoiding and/or those media components that did not deliver their objectives crisply enough for students to continue using them. For example, as mentioned above, the television programmes were quickly perceived as peripheral to the course because students under pressure did not feel they needed to work through the enriching material.

Subjective data from students is not necessarily a good measure of the success of a course in terms of teaching and learning. However, we can see the interplay between several factors in simple questionnaire studies. For example, the on-line questionnaire for Chapter 9 in the course showed some interesting fluctuation – there was a simultaneous increase in the numbers of students who rated it as taking ‘a lot more’ time than the recommended amount and the number who rated it as taking ‘nowhere near as much time’. So some students found this one a much more time-consuming chapter, whilst others found it much less time-consuming. This contrasts with other chapters, some of which showed an increase in the ‘a lot more time’ proportion, but no increase in the ‘nowhere near as much time’ category, thereby suggesting that they were harder chapters to work through. This observation focused our attention on Chapter 9, a chapter with no particular problems showing up elsewhere in the analysis. It turned out that Chapter 9 was the first time that students engaged with Smalltalk programming as an activity – it was the place where all the prior learning in previous chapters came to bear. Therefore, it sorted the wheat from the chaff – who really had understood the object concepts taught earlier and who needed to spend more time clarifying and revising to embed their earlier learning? So whilst it can be tempting to be sceptical of the value of questionnaire-based subjective data, the information yielded can be quite subtle.
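The distinction drawn here, between chapters where only the ‘a lot more’ category rises and chapters where both extreme categories rise together, can be sketched in a few lines of Python. This is a hypothetical illustration rather than the procedure used in the study: the proportions are invented, and taking the median across chapters as the baseline and 0.05 as the margin are assumptions made for the example.

```python
from statistics import median

# Invented proportions of respondents choosing each extreme category,
# keyed by chapter number. These are illustrative values, not M206 data.
responses = {
    7: {"a_lot_more": 0.10, "nowhere_near": 0.05},
    8: {"a_lot_more": 0.12, "nowhere_near": 0.04},
    9: {"a_lot_more": 0.25, "nowhere_near": 0.15},
    10: {"a_lot_more": 0.28, "nowhere_near": 0.05},
    11: {"a_lot_more": 0.11, "nowhere_near": 0.06},
}


def classify_chapters(responses, margin=0.05):
    """Label each chapter by which extreme categories exceed the baseline.

    The baseline for each category is its median across all chapters
    (an assumption for this sketch); a category counts as raised when it
    exceeds the baseline by more than `margin`.
    """
    base_more = median(r["a_lot_more"] for r in responses.values())
    base_less = median(r["nowhere_near"] for r in responses.values())
    labels = {}
    for chapter, r in responses.items():
        more_up = r["a_lot_more"] > base_more + margin
        less_up = r["nowhere_near"] > base_less + margin
        if more_up and less_up:
            labels[chapter] = "polarised"  # both extremes raised together
        elif more_up:
            labels[chapter] = "harder"     # only 'a lot more' raised
        elif less_up:
            labels[chapter] = "quicker"    # only 'nowhere near as much' raised
        else:
            labels[chapter] = "typical"
    return labels


labels = classify_chapters(responses)
```

A "polarised" label only flags a chapter for attention; as with Chapter 9, explaining the split still depends on knowing what the chapter asks of students.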

Our second major conclusion relates to the use of on-line questionnaires in evaluation, and in particular to the pros and cons underlying the apparent convenience of on-line evaluation methods. As pointed out earlier, the course team had addressed many of the recommendations of the evaluation report before the formal report was produced. Delivering courses on-line facilitates rapid response and feedback from students in a variety of ways – through e-mail, conferencing, and on-line evaluation. Although this is convenient for the course team, this feedback needs to be properly balanced. It has always been the role of evaluators, not just to collect data, but to offer interpretation of that data, to contextualise it and to advise stakeholders on what the appropriate level of response should be. In some senses, this role acts as a buffer between customers and providers that protects each from the worst effects of the other. Good evaluation ensures that students receive well-designed, well-produced, effective teaching materials. Good evaluation also prevents producers of the material from being overwhelmed by critique and criticism from a relatively few students making competing demands to have things changed in this way, or that. In the OU context, there are several dangers, as follows.

Firstly, course teams are not prepared to cope with the volume of information that may come in from a course of 5,000 students. Within that volume, there may be some very loud voices indeed. Whilst evaluators are used to dealing with this kind of situation, course teams and academics can be overwhelmed and de-motivated by it. Secondly, there is a risk of knee-jerk reaction from the course team, who, thanks to electronic presentation, are now easily able to make adjustments and changes to course materials. This has led, in several instances, to students being bombarded with errata in unhelpful ways and to changes being effected that served only a minority of the population rather than the whole. Thirdly, perhaps most importantly, the course team’s own observations of the course as it runs can override the value of considering the outcomes of the properly constituted evaluation study, leading the academics to think that they have taken care of all ‘problems’. Unfortunately, as evaluators, we know that this is unlikely to be the case. From an institutional point of view, it is important to ensure that evaluation findings are reported formally and that lessons learnt are effectively disseminated through the relevant communities. This experience must feed into future development of courses. Only by doing this can we complete the feedback loop that ensures quality, which is, after all, the motivation behind most evaluation studies.



References

  • Calder, J. (1994). Programme Evaluation and Quality, London: Kogan Page.
  • Draper, S., Brown, M. I., Henderson, F. P. & McAteer, E. (1996). Integrative Evaluation: An Emerging Role for Classroom Studies of CAL. Computers and Education, 26 (1-3), 17-32.
  • Goldberg, A., Abell, S. & Leibs, D. (1997). The LearningWorks Delivery and Development Framework. Communications of the ACM, 40 (10), 78–81.
  • Guba, E. & Lincoln, Y. (1989). Fourth Generation Evaluation, London: Sage.
  • Hammersley, M. (1993). Educational Research: current issues, London: Paul Chapman in association with the Open University.
  • Jones, A., Scanlon, E., Tosunoglu, C., Ross, S., Butcher, P., Murphy, P. & Greenberg, J. (1996). Evaluating CAL at the Open University: 15 years on. Computers and Education, 26 (1-3), 5-15.
  • Lemke, A. C. & Fischer, G. (1990). A Cooperative Problem Solving System for User Interface Design. Paper presented at the Eighth National Conference on Artificial Intelligence, 29 July – 3 August, Boston, MA, USA.
  • Oliver, M. (1998). Innovation in the Evaluation of Learning Technology, London: University of North London Press.
  • Sumner, T. & Taylor, J. (1997). Coping with Virtuality: Steps toward a Personal Learning Manager. Paper presented at the UNESCO Workshop on Virtual Learning Environments, 27-29 April, Open University, UK.
  • Sumner, T. & Taylor, J. (1998). New Media, New Practices: Experiences in Open Learning Course Design. Paper presented at CHI 98, April 18–23, Los Angeles, USA.
  • Thomas, P., Macgregor, M. & Martin, M. (1998). AESOP - An Electronic Student Observatory Project. Paper presented at the Frontiers in Education '98 Conference, November 4-7, Tempe, Arizona.
  • Woodman, M., Griffiths, R., Macgregor, M., Holland, S. & Robinson, H. (1999). Exploiting Smalltalk Modules in a Customizable Programming Environment. Paper presented at the International Conference on Software Engineering, 16-22 May, Los Angeles, USA.
  • Woodman, M. & Holland, S. (1996). From Software User To Software Author: An Initial Pedagogy For Introductory Object-Oriented Computing. Paper presented at the SIGCSE/SIGCUE 96 Conference, 2-5 June, Barcelona, Spain.
  • Wroblewski, D., McCandless, T. & Hill, W. (1991). DETENTE: Practical Support for Practical Action. Paper presented at the Conference on Human Factors in Computing Systems (CHI ’91), April 27 – May 2, New Orleans, USA.