Helping Children Learn Vocabulary during Computer-Assisted Oral Reading
This paper addresses a critical component of an indispensable skill: helping children learn vocabulary during computer-assisted oral reading. Why should you read this paper? Literacy matters: the increasing demands of the information economy require higher and higher standards of reading ability from everyone, not just the privileged few. There is a clear need for better tools for literacy development: the United States Department of Education’s National Assessment of Educational Progress reported that 69% of American fourth graders read below desired proficiency; 38% were below even the basic level (Donahue et al., 1999). Vocabulary knowledge plays a critical role in reading by enabling and facilitating comprehension (Snow, Burns, and Griffin, 1998). Using computers to boost vocabulary learning holds promise for offering children frequent, engaging practice with the meanings of words.
We focus on one aspect of vocabulary learning, as follows. First, we focus on learning words during assisted oral reading. Second, we concentrate on initial encounters with words. Third, we subdivide vocabulary learning from initial encounters in text into two stages: encountering new words in text, and learning from those encounters. We demonstrate improvements over baseline computer-assisted oral reading by (a) making sure that all students – not just better students – see new text, and (b) adding information to text so that children can learn more from encounters with words than they would from the original text alone.
Our novel approach builds on a recent advance in computer technology as applied to reading: computer-assisted oral reading. We leverage others’ work by building on a software platform representing years of multidisciplinary endeavor: Project LISTEN’s Reading Tutor, a computer tutor that listens to children read aloud and helps them learn to read (Mostow and Aist, 2001). The Reading Tutor is research software based on years of research and development, but is not (yet) a commercial product (http://www.cs.cmu.edu/~listen). We situate our work in real classrooms at two Pittsburgh-area schools: Fort Pitt Elementary School, in a low-income neighborhood of Pittsburgh, and Centennial Elementary School, in a moderate-to-low income urban neighborhood near Pittsburgh. Computers’ bookkeeping capability enables us to carry out finely detailed in-classroom experiments, with large samples recorded in fine-grained detail.
Our results break new ground in several fields of research. For those interested in computer-assisted oral reading, we demonstrate improvements over Project LISTEN’s baseline system prior to our research. For those working on intelligent tutoring systems, we operationalize a hybrid method for deciding which task to work on next: taking turns. For reading researchers, our experiments illuminate the relative merits of natural text and artificially constructed vocabulary help, and provide an example of automatically generated vocabulary assessment. Furthermore, our follow-on experiments demonstrate that fourth and fifth graders can acquire some word familiarity and word knowledge from as few as 1-2 exposures to a novel word.
The main contribution of the paper is to lay out a framework for studying how to boost children’s vocabulary learning in the context of computer-assisted oral reading, summarize results from several years of empirical research in this area, and draw conclusions and make suggestions for further work.
How can we help children learn new words? We consider two primary methods: direct instruction and learning through reading; and a hybrid: adding information to text.
Direct instruction. Intensive study of specific vocabulary words results in solid knowledge of the taught words, but at a high cost in time. For example, a 1983 study taught fourth graders 104 words over a five-month period, with 75 lessons of approximately 30 minutes each – on average about 21 minutes of instructional time per target word (McKeown et al., 1983). Exposures were during varied tasks: “matching words and definitions, associating a word with a context, creating contexts for words, and comparing and contrasting words to discover relationships” (McKeown et al., 1983). In the high-exposure group of words, students saw 26-40 exposures; even for the low-exposure words, students saw 10-18 exposures – a substantial amount of instructional time. Beck and McKeown (1991) suggest that “the problem that effective instruction takes time can be alleviated by targeting instruction toward the most useful words”. Which words? Second-tier vocabulary (McKeown, 1993), that is, words that are “of high frequency in a mature vocabulary and of broad utility across domains of knowledge” (Beck and McKeown, 1991). Thus direct instruction may still play a role for certain critical words (Zechmeister et al., 1995), but a full-fledged instructional lesson is too time-consuming to use for every new word.
Reading. Children can learn words from written contexts (Nagy et al., 1985, McKeown, 1985, Gipe and Arnold, 1978), but the process is incremental. That is, the amount learned from each exposure may be small, but the net effect is still substantial (Eller, Pappas, and Brown, 1988). Also, readers with better vocabularies learn more from context – because of broader and deeper prior knowledge of words – even though less of the information in the text is new to them than to readers with poorer vocabularies (Shefelbine, 1990).
Reading offers hope for spurring vocabulary growth, if children can be guided to read material that does in fact contain unknown words. Carver (1994) argues that “students must read books above their independent level in order to consistently run into unknown words, that is, about 1, 2, or 3 unknown words for each 100 words of text”. Easier text simply does not contain enough new words to substantially impact children’s vocabulary learning (Carver, 1994).
Is simple exposure to multiple contexts sufficient for all readers to learn new words? Perhaps – or perhaps not. McKeown (1985) studied how high- and low-ability students learn words from context. McKeown’s (1985) study examined 15 fifth-graders who, at the end of fourth grade, had scored between grade equivalent 3.3 and grade equivalent 4.1 on the Vocabulary section of the Stanford Achievement Test (Madden et al., 1973). These low-reading fifth graders had trouble learning words from context both because of incorrect inferences about the meaning of a word from context and because of difficulty in deriving meaning from multiple sentence contexts. Even the 15 higher-ability students in McKeown’s (1985) study, who had scored above grade equivalent 4.8 on the Stanford Vocabulary subtest, had some trouble integrating multiple sentence contexts to derive meaning.
There has been some work aimed at teaching children how to learn words from context, but the major effect may be due to practice at learning new words from context and not due to teaching a specific strategy (Kuhn and Stahl, 1998). Kuhn and Stahl (1998) conclude that “Ultimately, increasing the amount of reading that children do seems to be the most reliable approach to improving their knowledge of word meanings, with or without additional training in learning words from context.” As Schwanenflugel et al. (1997) put it, “… the vast majority of a person's word growth can be accounted for by exposure to words in written and oral contexts, not through direct instruction of some sort, but individual encounters with a word in a natural context are not likely to yield much useful information about that word.”
Adding information to text. Can the context in which a word appears be augmented in some way to make it more useful for learning the word? Typical dictionary definitions may not be written to suit the learner's needs; explanations written to convey the core sense of the word in plain and simple language work better (McKeown, 1993). Presenting context-specific definitions in computer-mediated text has been shown to be helpful for vocabulary acquisition, at least for sixth graders (Reinking and Rickman, 1990). Adding information to text is a hybrid of direct instruction and learning from reading text: first, start with a text to read; second, add brief, targeted instruction about words to the text.
In this paper, we investigate learning words by reading connected text during computer-assisted oral reading.
We now lay out an informal model of the process of learning vocabulary during assisted oral reading. We intend this to be a conceptual framework useful for identifying opportunities to improve vocabulary learning. We will focus here on encountering a word for the first time, and on learning the meaning of a word.
We can characterize how many words a student learns in a day of assisted oral reading as shown in Equation 1.
We define our main claim for this paper as follows, in the context of Equation 1. We can help children learn vocabulary during assisted oral reading by (a) helping them encounter new words, and (b) helping them learn new words they encounter. We aim to help children encounter new words by increasing how much new material students read – not a guaranteed outcome when students have substantial control over their interaction with the software. We aim to help children learn new words they encounter by augmenting text to facilitate better learning than possible with the unaugmented text – not a guaranteed outcome since reading is already a reasonable way to build vocabulary. We verify each of these claims by empirical tests of modifications to Project LISTEN’s Reading Tutor, a computer program that listens to children read aloud and helps them learn to read (Mostow and Aist, 1999).
The remainder of this paper is as follows. First, we present the 1997-98 baseline version of Project LISTEN's Reading Tutor. Next, we summarize how we modified the Reading Tutor to help children encounter new words. Then, we describe how we modified the Reading Tutor to help children learn the meaning of new words. After that, we summarize results relevant to vocabulary learning from a yearlong study of the modified Reading Tutor against classroom instruction and one-on-one human tutoring. We then present further experiments in vocabulary help. Finally, we summarize the contributions of this line of research to date.
We now turn to describing the Reading Tutor.
This paper builds on a larger research project with years of history and publications: Project LISTEN. Project LISTEN’s Reading Tutor listens to children read aloud, and helps them learn to read. A detailed overview of the history of Project LISTEN lies outside the scope of this paper. Mostow and Aist (2001) provide further information. Here we simply inform the reader of enough previous results to set our work in context.
1994 Reading Coach. A predecessor to the Reading Tutor, Project LISTEN’s Reading Coach provided assistance in oral reading (Mostow et al., 1994; see Mostow et al., 1993 for earlier work). In a 1994 study, 34 second graders comprehended a challenging third-grade passage 40% better with Reading Coach assistance than without (Mostow and Aist, 2001), as measured by a comprehension test administered immediately after students had read the passages being tested. In that study, there was no assistive effect for an easier passage.
1996-1997 pilot study. Iterative redesign of the Reading Coach with concurrent usability testing resulted in the 1996 version of the Reading Tutor (Mostow et al., 1995, Mostow, 1996). In a 1996 pilot study reported in Mostow and Aist (2001), the 8 lowest-reading third graders at a low-income urban elementary school (Fort Pitt Elementary) used the 1996 Reading Tutor in a small room under individual supervision by a school aide. The six students who completed the study (one moved away; another was unavailable for post-testing) averaged a 2-year gain in eight months from pre-test to post-test on a school-administered Informal Reading Inventory.
Summer 1997 Reading Clinic. During the summer of 1997, 62 students in grades K-5 used the Reading Tutor during a reading clinic at a low-income urban elementary school (Fort Pitt Elementary). Concurrently, “the Reading Tutor underwent major design revisions of the ‘frame activities’ – logging in and picking a story to read – to enable classroom-based use” (Mostow and Aist, 2001).
1997-1998 formative and controlled studies. As Mostow and Aist (2001) report:
The 1997-1998 study assessed Word Attack, Word Identification, Passage Comprehension, and fluency – but not Word Comprehension.
We now describe the 1997-1998 Reading Tutor, our baseline system.
The Reading Tutor displays one sentence at a time to the student, listens to the student read all or part of the sentence aloud, and responds expressively using recorded human voices. The Reading Tutor lets children read stories from a variety of genres, including nonfiction, fictional narratives, and poems.
The design of the 1997-1998 Reading Tutor focused on independent classroom use. Figure 1 shows a student reading with the Reading Tutor while the teacher worked with the rest of the class. A prototypical student session consisted of the following steps: log on, choose a story to read, read part or all of the story, (perhaps) choose and read more stories, and finally log off.
The core interaction was when the student read a story aloud, with the Reading Tutor’s help (Figure 2). The Reading Tutor responded when it heard mistakes or when the student clicked for help, by playing hints or other help in recorded human voices. The help that the Reading Tutor provided balanced the student’s immediate goal of reading the word or sentence with the longer-term goal of helping the student learn to read (Aist and Mostow, 1997). Help included:
To place the baseline Reading Tutor in its research context, we compare it here to similar software. We focus on software that (a) helps with reading, (b) in a child’s first language, (c) using speech recognition. Readers interested in software outside these constraints may refer to Aist (1999) for an overview of speech recognition in second language learning, and Schacter (1999) for an overview of conventional and software-based reading instruction in a child’s first language. Whines (1999) provides a detailed comparison of some of the systems described below.
The Speech Training Aid (STAR) developed by DRA Malvern adapted automatic speech recognition to help children practice reading single isolated words (Russell et al., 1996). The 1997-98 Reading Tutor listened to children read connected, authentic text.
Talking and Listening Books, also described by Russell et al. (1996), used continuous text but employed word-spotting techniques to listen for a single word at a time.
Let’s Go Read (Edmark, 1997) incorporated speech recognition into a variety of single-phoneme and single-word exercises. The 1997-1998 Reading Tutor focused on assisted reading of authentic text.
Watch Me! Read (IBM, 1998, Williams et al., 2000) adapted speech recognition to teach reading from continuous text, but took a traditional talking-book approach using trade books with attractive pictures and relatively small amounts of text in small fonts. The 1997-1998 Reading Tutor placed primary emphasis on reading text, using child-friendly large fonts and a wide variety of reading materials.
How many words can we expect students to learn from the Reading Tutor? We can conceptualize this problem as a specialization of Equation 1, as follows (Equation 2).
We can split the reading that a student does into two categories: (a) reading with the Reading Tutor, and (b) everything else (outside the scope of this paper). In the case of reading with the Reading Tutor, “how much reading” translates into how many days a student has a session with the computer, and how many minutes each session lasts. How often the Reading Tutor gets used, by whom, and for how long depends on who sets policy for Reading Tutor use, and in any event lies outside the scope of this paper. Therefore, for the purposes of the present discussion we take both the number of days allocated for Reading Tutor use per year and the number of minutes of Reading Tutor use per day as externally determined. How frequently we expect students to read with the Reading Tutor, and for how long each session, have varied across studies and contexts of use.
We focus in this paper on the last two factors in Equation 2: new words seen per story, and new words learned per word seen. First, students must encounter new words. Second, they must learn the meaning of new words when they encounter them. We modified Project LISTEN's Reading Tutor to be more effective at each of these tasks.
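As a concrete illustration of this decomposition, the sketch below multiplies the chain of factors just described (days of use, minutes per day, stories per minute, new words seen per story, and new words learned per new word seen). The exact form of Equation 2 appears in the original figure; every numeric value here is a made-up placeholder for illustration, not measured data.

```python
def words_learned_per_year(days_per_year, minutes_per_day,
                           stories_per_minute, new_words_per_story,
                           words_learned_per_word_seen):
    """Multiplicative chain of factors, in the spirit of Equation 2.
    The first two factors are externally determined (school policy);
    the last two are the targets of the interventions in this paper."""
    return (days_per_year * minutes_per_day * stories_per_minute
            * new_words_per_story * words_learned_per_word_seen)

estimate = words_learned_per_year(
    days_per_year=100,                  # placeholder: set by school policy
    minutes_per_day=20,                 # placeholder: session length
    stories_per_minute=0.1,             # placeholder: ~1 story per 10 minutes
    new_words_per_story=3,              # target of the story choice policy
    words_learned_per_word_seen=0.05,   # target of the vocabulary help
)
print(estimate)  # ~30 words per year under these placeholder values
```

The point of the sketch is that the two rightmost factors are the only ones under the tutor's control, which is why the rest of the paper concentrates on them.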
We next present the improvements we made to the Reading Tutor, along with experiments evaluating their effectiveness. We first present improvements to the Reading Tutor’s story choice policy and then summarize experiments on providing vocabulary help.
The tale of Reading Tutor story choice is one of finding a balance between students’ interests and the Reading Tutor’s educational goals. Children have their own agenda when using software, which may or may not match the desired educational outcome (Hanna et al., 1999). To allow students to read stories that interested them, and to increase students’ interest in what they are reading by maximizing learner control, the 1997-98 Reading Tutor allowed students free choice of any story on the Reading Tutor. Stories available included non-fiction, poems, and fictional narratives. We observed a number of problems with story choice, primarily that some students picked the same story to read many times (at the expense of reading new material), and that some students chose stories that were too easy. Either problem could substantially reduce the number of new words that students encountered. Therefore, we introduced a Take Turns story choice policy for the 1999-2000 Reading Tutor.
Take Turns consisted of three components:
Taking Turns. We can sum up taking turns as follows: “Every day decide randomly whether the student or the Reading Tutor chooses the first story to read, then take turns for the rest of the day.”
Reading Tutor story choice. The 1999-2000 Reading Tutor assigned each student to a recommended reading level (RRL) based on the student’s age, and adjusted the RRL based on the student’s performance. The Reading Tutor tried to pick new stories at the student’s RRL. If no story was available at the RRL, the Reading Tutor chose a harder story.
Student story choice. When it was the student's turn to choose a story, the student was free to choose any Reading Tutor story to read. The student could also choose to write and (optionally) narrate a story.
We also simplified the menu interaction (Aist and Mostow, 2000). The 1997-1998 Reading Tutor required at least two clicks for story choice: one click to select and another click to confirm. The 1999-2000 Reading Tutor required only one click to select a story (Figure 3), but allowed use of the Back button to return to the story choice screen and choose again.
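To make the policy concrete, here is a minimal sketch of Take Turns in Python. The story records, the four-story session length, and the fallback to the easiest harder story are simplified assumptions for illustration, not the Reading Tutor's actual implementation.

```python
import random

def tutor_pick(stories, rrl, already_read):
    """Prefer an unread story at the recommended reading level (RRL);
    if none is available, fall back to the easiest harder story."""
    new = [s for s in stories if s['title'] not in already_read]
    at_level = [s for s in new if s['level'] == rrl]
    if at_level:
        return at_level[0]
    harder = sorted((s for s in new if s['level'] > rrl),
                    key=lambda s: s['level'])
    return harder[0] if harder else None

def take_turns_day(stories, rrl, student_choices, already_read):
    """One day's choices: flip a coin for who picks first, then
    alternate.  student_choices yields the student's free picks
    (any story, possibly a repeat)."""
    turn = random.choice(['student', 'tutor'])
    chosen = []
    for _ in range(4):  # assume, say, four stories in a session
        pick = (next(student_choices, None) if turn == 'student'
                else tutor_pick(stories, rrl, already_read))
        if pick is None:
            break
        chosen.append((turn, pick['title']))
        already_read.add(pick['title'])
        turn = 'tutor' if turn == 'student' else 'student'
    return chosen

# Demo with toy stories (titles and levels are made up):
stories = [{'title': 'A', 'level': 2}, {'title': 'B', 'level': 2},
           {'title': 'C', 'level': 3}, {'title': 'D', 'level': 4}]
day = take_turns_day(stories, rrl=2,
                     student_choices=iter([stories[0], stories[0]]),
                     already_read=set())
print(day)
```

Whoever goes first, the tutor gets roughly half the picks, so even a student who always re-picks the same story still sees new material on the tutor's turns.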
In the spring of 1998, 24 students in grades 2, 4, and 5 at a low-income urban elementary school used the Reading Tutor with a student-only story choice policy. In the fall of 1999, 60 students in grades 2 and 3 at a (different) low- to middle-income urban elementary school used a revised version of the Reading Tutor that took turns picking stories. We used the statistical software package SPSS (version 9.0) to conduct a univariate analysis of variance (ANOVA) with grade as a covariate. The ANOVA revealed a difference in favor of Fall 1999 on the rate of new material encountered, significant at the 90% level (F = 3.25, p = .075): the students who used the Take Turns Reading Tutor in 1999 read 64.1% new sentences out of ~35,000 sentences overall, compared to 60.1% of the ~10,000 sentences read by the students who used the spring 1998 student-only story choice policy. Furthermore, the Reading Tutor’s story choices helped most for those who did not choose new stories themselves: about half of the students picked fewer than half new stories on their own, with some choosing as few as 15% new stories. With the Reading Tutor’s choices included, all students read about 50% or more new stories.
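The percent-new-sentences measure used in this comparison can be computed from a chronological log of the sentences a student read; the log format below is a hypothetical simplification of the Reading Tutor's actual records.

```python
def percent_new_sentences(sentence_log):
    """sentence_log: chronological list of sentence identifiers read.
    A sentence counts as 'new' the first time this student reads it."""
    seen = set()
    new = 0
    for sid in sentence_log:
        if sid not in seen:
            new += 1
            seen.add(sid)
    return 100.0 * new / len(sentence_log) if sentence_log else 0.0

# e.g., a student who rereads one sentence out of five:
print(percent_new_sentences(['s1', 's2', 's3', 's1', 's4']))  # 80.0
```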
Other issues remain in terms of story choice in the Reading Tutor. For example, we might want to encourage students to pick good stories when it’s their turn. In addition, it was possible for a student to spend a lot of time on an activity or story he or she chose – to the exclusion of further Reading Tutor choices that same day. Thus we might want to ensure that the total amount of time spent reading Reading Tutor-chosen stories is approximately equal to the amount of time spent reading student-chosen stories (Jack Mostow, personal communication).
Aist and Mostow (in press) describe this evaluation in more detail.
Reading material that contains new words is a prerequisite for learning new words from text. However, simply reading new and challenging stories may not be sufficient: individual encounters with a word may not contain enough information to learn much about the word. How can text be augmented so that students learn more from an encounter with a word than they otherwise would, without taking too much additional time away from reading the original text?
We decided to explore augmenting text with various kinds of vocabulary assistance, aimed at increasing the number of new words learned per new word seen in Equation 2. In our experiments, we compared augmented text to unaugmented text, because if the augmentation does not help beyond text alone, adding it would probably just waste the student’s time.
By augmenting stories with vocabulary help such as short context-specific explanations or WordNet-derived comparisons to other words, the Reading Tutor can help students learn words better than they would from simply reading the unaugmented stories.
We augmented text with “factoids”: automatically constructed comparisons of a target word to a different word drawn from WordNet (Fellbaum, 1998), an electronic lexical database originally developed by George Miller and colleagues at Princeton. For example, a factoid might compare the word astronaut to its hypernym (more general term) traveler. A four-month study conducted in Fall 1999 compared text alone vs. text plus such factoids as follows. A control trial consisted of a student seeing a target word in a sentence and (on a later day) answering an automatically constructed multiple-choice vocabulary question on the target word. An experimental trial added a “factoid” presented prior to displaying the target word in the sentence (Figure 4). Sometimes the comparison word was the correct answer in the multiple-choice question, and sometimes a different word. In total, over 3000 trials were completed. To explore the effect of factoids on getting the question right, we built logistic regression models (Menard, 1995) using SPSS; Aist (2000, chapter 4) provides details. There was no significant difference overall between experimental and control conditions; however, exploratory analysis identified conditions in which factoids might help. In particular, story plus factoid was more effective than story alone for the 317 trials on single-sense, rare words tested one or two days later, as well as for third graders seeing rare words. (Here, “rare” means occurring fewer than 20 times in the Brown corpus (Kucera and Francis, 1967; Francis and Kucera, 1971).)
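A minimal sketch of factoid generation follows. The hypernym table is a toy stand-in for a WordNet lookup: only the astronaut/traveler pair comes from the paper's example, the other entries are invented for illustration, and the sentence template follows the “can be a kind of” wording used by the Reading Tutor.

```python
# Toy stand-in for WordNet's hypernym relation.
HYPERNYMS = {
    'astronaut': 'traveler',   # from the paper's example
    'limerick': 'poem',        # hypothetical entry for illustration
    'comet': 'object',         # hypothetical entry for illustration
}

def factoid(word):
    """Return a factoid comparing `word` to a more general term,
    or None if no comparison is available for this word."""
    hypernym = HYPERNYMS.get(word.lower())
    if hypernym is None:
        return None
    return f"{word} can be a kind of {hypernym}. Is it here?"

print(factoid('astronaut'))  # astronaut can be a kind of traveler. Is it here?
```

As the sketch makes plain, coverage is limited to whatever words happen to have usable hypernyms, which is exactly the shortcoming discussed next.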
While factoids helped for some words, there were shortcomings. Factoids were provided for words for which it was easy to generate factoids – which are not necessarily the words for which explanations were needed. The factoids were not necessarily the best additional information to provide about a word; for example, the Reading Tutor might have been better off saying that an astronaut was a space traveler – not just a traveler. Finally, some of the words used in the factoids were actually more difficult than the target words – not less difficult, as we would have preferred. Ultimately, we would like a new lexical relation defined: “explains”, where w1 explains w2 if and only if w1 and w2 are synonyms and w1 is an easier word (or phrase) than w2.
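The proposed “explains” relation could be prototyped along these lines, using corpus frequency as a proxy for “easier”; the synonym and frequency tables here are toy data for illustration only.

```python
# Toy synonym and frequency tables (frequencies are invented counts;
# a real implementation might use Brown corpus counts, as in the paper).
SYNONYMS = {'dolorous': ['mournful', 'sad'], 'laconic': ['terse', 'brief']}
FREQUENCY = {'dolorous': 1, 'mournful': 25, 'sad': 900,
             'laconic': 2, 'terse': 15, 'brief': 600}

def explains(w1, w2):
    """w1 explains w2 if they are synonyms and w1 is strictly more
    frequent (hence, by this proxy, easier)."""
    return (w1 in SYNONYMS.get(w2, [])
            and FREQUENCY.get(w1, 0) > FREQUENCY.get(w2, 0))

def best_explainer(word):
    """Pick the most frequent qualifying synonym as the explanation."""
    candidates = [w for w in SYNONYMS.get(word, []) if explains(w, word)]
    return max(candidates, key=FREQUENCY.get) if candidates else None

print(best_explainer('dolorous'))  # sad
```

Frequency is only one possible difficulty proxy; age-of-acquisition norms or graded word lists would be natural alternatives.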
Aist (2001) describes the factoid study in more detail.
So far, we have described two enhancements to computer-assisted oral reading. We first described how we changed the Reading Tutor’s story choice policy to have the computer and the student take turns picking stories. We showed that Take Turns resulted in students reading more new material than they presumably would have on their own. We then discussed how we enriched text with vocabulary assistance in the form of automatically generated factoids like “astronaut can be a kind of traveler. Is it here?” We showed that at least for single-sense rare words tested one or two days later, and for third graders seeing rare words, text augmented with factoids prepared students to answer future multiple-choice questions about words better than did the same text without such assistance.
So, the changes we made improved the baseline 1997-98 Reading Tutor. But how did the new and improved Reading Tutor with Take Turns and factoids compare to other methods of helping children learn to read? Specifically, how did the 1999-2000 Reading Tutor compare to other reading instruction, on measures of vocabulary learning? In this section we summarize relevant parts of a larger 1999-2000 study that we helped design and analyze, but which was carried out primarily by other Project LISTEN team members. This larger study was not intended primarily to evaluate vocabulary assistance, but did contain comparisons of the modified Reading Tutor to other ways of teaching reading.
Here we summarize one aspect – vocabulary learning – of a larger study comparing computerized oral reading tutoring to classroom instruction and one-on-one human tutoring. 144 students in second and third grade were assigned to one of three conditions: (a) classroom instruction, (b) classroom instruction with one-on-one tutoring replacing part of the school day, and (c) computer instruction replacing part of the school day. For second graders, there were no significant differences between treatments in word comprehension gains. For third graders, however, the computer tutor showed an advantage over classroom instruction for gains in word comprehension (p = 0.042, effect size = 0.56) as measured by the Woodcock Reading Mastery Test (American Guidance Service, n.d.). One-on-one human tutoring also showed an advantage over classroom instruction alone (p = 0.039, effect size = 0.72). Computer tutoring and one-on-one human tutoring were not significantly different in terms of word comprehension gains. Aist et al. (2001) provides further details on results from the Word Comprehension measure; Mostow et al. (2001) on the overall study; the analyses described in this paragraph were carried out by Brian Tobin, Jack Mostow, and the present author.
So, students using the Reading Tutor averaged higher gains than their counterparts in human tutor rooms. However, the experimental design precluded a (strong) within-room comparison, so we cannot assign credit unequivocally to the Reading Tutor; teacher effects may have accounted for some or even all of the difference. A more definitive analysis belongs in a separate paper on this year-long controlled study – under preparation as this article went to press.
Follow-on experiments explored ways to make vocabulary assistance even more effective, such as adding short child-friendly explanations to text. We describe these experiments briefly here; more information is presented in the Appendix.
An initial test confirmed that even low-reading students could understand short explanations well enough to do better on immediate multiple-choice questions than without such explanations. The factoid experiment had shown results only for a restricted set of students and words. We wanted to make sure students in our intended population (elementary students at a low-income urban school) could read and understand definitions well enough to make use of the information in them, above and beyond just the original text. The treatment conditions were (a) experimental: text plus short explanations such as “COMET: A big ball of dirty ice and snow in outer space.”, and (b) control: text plus a nonsemantic help sentence such as “COMET: Comet starts with C.” We compared students’ performance on a five-item matching task (on paper, attached to the passages). Results confirmed that explanations held an advantage over nonsemantic help. For the 41 students who had just completed grades 2-5 and who completed the protocol, an analysis of variance (ANOVA) including a term for age showed a significant effect of definitions on the matching task (p = .041). A t-test paired by student, comparing the same student’s responses in the two conditions, showed that the definitions helped: for text that included definitions, students averaged 2.5 items right vs. 1.8 items right for text plus nonsemantic help (p = .007). Thus, students were able to make use of the information in a definition above and beyond the simple effect of an additional exposure.
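The paired comparison reported above can be reproduced in miniature: each student contributes one score per condition, and the t statistic is computed over the per-student differences. The scores below are hypothetical, not the study's data.

```python
import math
from statistics import mean, stdev

def paired_t(with_def, without_def):
    """Paired t statistic: each student contributes one matching-task
    score with definitions and one with nonsemantic help."""
    diffs = [x - y for x, y in zip(with_def, without_def)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

# Hypothetical matching-task scores (0-5) for four students:
print(round(paired_t([3, 2, 3, 2], [2, 1, 2, 2]), 2))  # prints 3.0
```

The pairing removes between-student variability, which is why it can detect a modest mean difference (here 2.5 vs. 1.8 items) that an unpaired comparison might miss.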
A within-subject experiment in summer 2000 measured word familiarity and word knowledge on eight (difficult) words with a paper test given one or two days after exposure to those words in one of four conditions: no exposure, definition alone, children’s limerick alone, or definition plus children’s limerick. The stories were children’s limericks by Edward Lear (19th century): eight limericks, with one target word each. The words were dolorous, laconic, imprudent, innocuous, mendacious, oracular, irascible, and vexatious. The texts thus controlled for many factors, such as genre, author, intended audience, word frequency, part of speech, and general semantic class (adjectives describing personality traits).
An example of a limerick:
We wrote the definitions for the target words to be as syntactically similar as possible. Each definition explained the word in ordinary language, following the advice given in McKeown (1993). For example: “We can say someone is dolorous if they are mournful, or feel really bad.”
Results were as follows. Definitions increased all students’ familiarity with the words, and limericks yielded a strong trend toward increased familiarity. Also, while 2nd and 3rd graders performed essentially at chance on word knowledge, 4th and 5th graders learned enough from reading stories and definitions with the Reading Tutor to perform above chance on word knowledge. This study furthermore ruled out a word recency effect as an explanation, since none of the words in the definitions or limericks appeared as answers on the multiple-choice test. The experiment also shed light on the relationship between word familiarity and word knowledge: the correlation between the two was larger in higher grades. Limericks may have been more effective at strengthening the tie between word familiarity and word knowledge – a direction for future research.
Reading is a key skill in the information economy. Reading is comprehension: making meaning from print. Vocabulary underlies comprehension. We began with computer-assisted oral reading, and proceeded as follows. Improved story choice helped students encounter new material. Factoids comparing words in text to other words helped some students learn words. The Reading Tutor with Take Turns and factoids did better than a classroom control for 3rd graders on vocabulary learning – and performed comparably to one-on-one human tutoring. Finally, follow-on experiments pointed the way toward delivering improved vocabulary assistance.
Let us put these results in perspective. The National Reading Panel (2000) observed that most vocabulary studies show effects only on experimenter-designed measures – not on standardized tests like the Woodcock Reading Mastery Test (WRMT). Such standardized tests measure vocabulary so crudely that it is hard to achieve significant results when evaluating vocabulary growth, and harder still to show differences in growth between treatments. Our research nonetheless achieved significant results on the Word Comprehension subtest of the WRMT. Furthermore, we introduced and used two finer-grained techniques. First, we measured the amount of new material read; second, we had the Reading Tutor administer computer-constructed, in-context vocabulary questions as part of an embedded experiment (Mostow and Aist, 2001; cf. Walker, 2000), encountered in the course of normal Reading Tutor use.
To achieve this goal, we built on a foundation of computer-assisted oral reading: Project LISTEN's Reading Tutor. Then, we developed, incorporated, and evaluated two improvements. First, we made the Reading Tutor take turns picking stories, which not only guaranteed that every student saw ~50% or more new material, but helped most those students who chose the fewest new stories themselves. (Such students were presumably those who needed the most practice in reading new text.) Second, we added automatically generated vocabulary assistance in the form of factoids – short comparisons to other words – and automatically generated vocabulary assessment in the form of multiple-choice questions. The factoids helped students answer multiple-choice questions – but only for third graders seeing rare words, and for single-sense rare words tested one or two days later. The multiple-choice questions explicitly operationalized Nagy et al.'s (1985) criteria for difficult (their Level 3) multiple-choice questions, as discussed at length in Aist (2000). Besides the factoids results, correlating the multiple-choice questions with the Word Comprehension subtest of the Woodcock Reading Mastery Test demonstrated some validity (see Aist, 2000, for details).
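As an illustration of the kind of automatically generated assistance and assessment described above, here is a hypothetical sketch. The word relations, function names, and question format are our own simplified stand-ins, not the Reading Tutor's actual mechanisms or knowledge source.

```python
# Hypothetical sketch of factoid-style assistance and multiple-choice
# assessment. The relation table is hand-supplied for illustration.
RELATED = {
    "dolorous": ("mournful", "synonym"),
    "laconic":  ("talkative", "antonym"),
}

def make_factoid(word):
    """Return a one-line comparison of `word` to a related word."""
    other, relation = RELATED[word]
    if relation == "synonym":
        return f"{word.capitalize()} means about the same as {other}."
    return f"{word.capitalize()} means about the opposite of {other}."

def make_question(word, correct, distractors):
    """Build a simple multiple-choice item: a stem plus answer choices.
    (The Reading Tutor's real items were constructed in context,
    per Nagy et al.'s criteria.)"""
    choices = sorted([correct] + distractors)  # deterministic order for the sketch
    return {"stem": f"Which word is closest in meaning to '{word}'?",
            "choices": choices,
            "answer": correct}
```

A real implementation would draw the relations and distractors from a lexical resource rather than a hand-built table, and would randomize choice order.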
Follow-on experiments pointed the way towards even more effective vocabulary help, by presenting students with in-context explanations. Students who had just finished 2nd through 5th grade gained word familiarity from exposure to words in the Reading Tutor, while 4th and 5th graders gained word knowledge from definitions as well.
Along the way, we used a variety of techniques, on timescales ranging from seconds to minutes to days to months (cf. Newell, 1990's time scale of human behavior). A story took seconds or minutes to choose, and minutes to read. We measured the effects of different story choice policies in the cumulative distribution of story choices over several months. Vocabulary assistance took seconds to construct and present, and seconds to minutes to read. We measured the effects of vocabulary assistance either immediately (as in the comets and meteors experiment) or on a subsequent day (as in the factoids experiment and the limericks experiment). Finally, reading with the Reading Tutor took ~20 minutes/day for an entire school year – and we measured its effects during a yearlong study.
We set out to demonstrate two claims, framed as improvements over factors in Equation 2 (which we repeat here):
First, by taking turns picking stories, an automated tutor that listens to children read aloud did indeed ensure that students read more new material than just their own choices would provide. In fact, students who chose the fewest new stories themselves benefited the most from the Reading Tutor’s story choices – presumably, such students needed the most practice reading new text. Second, by augmenting stories with semantic information about words, an automated reading tutor can help students learn words better than they would from the stories alone. Further experiments shed light on how to present effective vocabulary instruction, using short explanations of words. Finally, the 1999-2000 Reading Tutor with Take Turns and factoids outperformed a classroom control on Word Comprehension gains for third graders – and was even competitive with one-on-one human-assisted oral reading.
This work was supported in part by the National Science Foundation under Grant Nos. REC-9720348 and REC-9979894, and by the author's National Science Foundation Graduate Fellowship and Harvey Fellowship. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the author and do not necessarily reflect the views of the National Science Foundation or the official policies, either expressed or implied, of the sponsors or of the United States Government.
As with all research carried out within the context of a larger project, the present paper was enabled by previous work done by many on Project LISTEN; the project website lists personnel (http://www.cs.cmu.edu/~listen). This paper is a summary version of Aist (2000); we thank anonymous Educational Technology and Society reviewers for their comments, Brian Junker for statistical advice, and Jack Mostow and Brian Tobin for reading and/or commenting on earlier drafts of this paper and of the analyses we presented here. Any remaining problems are of course the sole responsibility of the author.
This paper is based on the work done while the author was with Project LISTEN, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh PA 15213 USA.
Table A.2. Summary of vocabulary help experiments