Educational Technology & Society 5 (2) 2002
ISSN 1436-4522

Helping Children Learn Vocabulary during Computer-Assisted Oral Reading

Gregory Aist
Research Institute for Advanced Computer Science
NASA Ames Research Center
Moffett Field CA 94035-1000 USA



This paper addresses an indispensable skill using a unique method to teach a critical component: helping children learn to read by using computer-assisted oral reading to help children learn vocabulary. We build on Project LISTEN’s Reading Tutor, a computer program that adapts automatic speech recognition to listen to children read aloud, and helps them learn to read. To learn a word from reading with the Reading Tutor, students must encounter the word and learn the meaning of the word in context. We modified the Reading Tutor first to help students encounter new words and then to help them learn the meanings of new words. We then compared the Reading Tutor to classroom instruction and to human-assisted oral reading as part of a yearlong study with 144 second and third graders. The result: Second graders did about the same on word comprehension in all three conditions. However, third graders who read with the 1999 Reading Tutor, modified as described in this paper, performed statistically significantly better on word comprehension gains than other third graders in a classroom control – and even comparably with other third graders who read one-on-one with human tutors.

Keywords: Reading, Children, Spoken dialog systems, Intelligent tutoring systems


This paper addresses an indispensable skill using a unique method to teach a critical component: helping children learn to read by using computer-assisted oral reading to help children learn vocabulary. Why should you read this paper? Literacy matters: The increasing demands of the information economy require higher and higher standards of reading ability from everyone, not just the privileged few. There is a clear need for better tools for literacy development: The United States Department of Education’s National Assessment of Educational Progress reported that 69% of American fourth graders read below desired proficiency; 38% were below even the basic level (Donahue et al., 1999). Vocabulary knowledge plays a critical role in reading, by enabling and facilitating comprehension (Snow, Burns, and Griffin, 1998). Using computers to boost vocabulary learning holds promise for offering children frequent, engaging practice with the meanings of words.

We focus on one aspect of vocabulary learning, as follows. First, we focus on learning words during assisted oral reading. Second, we concentrate on initial encounters with words. Third, we subdivide vocabulary learning from initial encounters in text into two stages: encountering new words in text, and learning from those encounters. We demonstrate improvements over baseline computer-assisted oral reading by (a) making sure that all students – not just better students – see new text, and (b) adding information to text so that children can learn more from encounters with words than they would from the original text alone.

Our novel approach builds on a recent advance in computer technology as applied to reading: computer-assisted oral reading. We leverage others’ work by building on a software platform representing years of multidisciplinary endeavor: Project LISTEN’s Reading Tutor, a computer tutor that listens to children read aloud, and helps them learn to read (Mostow and Aist, 2001). The Reading Tutor is research software based on years of research and development, but is not (yet) a commercial product. We situate our work in real classrooms at two Pittsburgh-area schools: Fort Pitt Elementary School, in a low-income neighborhood of Pittsburgh, and Centennial Elementary School, in a moderate-to-low income urban neighborhood near Pittsburgh. Computers’ bookkeeping capability enables us to carry out finely detailed in-classroom experiments with large samples recorded in fine detail.

Our results break new ground in several fields of research. For those interested in computer-assisted oral reading, we demonstrate improvements over Project LISTEN’s baseline system prior to our research. For those working on intelligent tutoring systems, we operationalize a hybrid method for deciding which task to work on next: taking turns. For reading researchers, our experiments illuminate the relative merits of natural text and artificially constructed vocabulary help, and provide an example of automatically generated vocabulary assessment. Furthermore, our follow-on experiments demonstrate that children as young as fourth and fifth graders can acquire some word familiarity and word knowledge from as few as 1-2 exposures to a novel word.

The main contribution of the paper is to lay out a framework for studying how to boost children’s vocabulary learning in the context of computer-assisted oral reading, summarize results from several years of empirical research in this area, and draw conclusions and make suggestions for further work.


Learning the meaning of a new word

How can we help children learn new words? We consider two primary methods: direct instruction and learning through reading; and a hybrid: adding information to text.

Direct instruction. Intensive study of specific vocabulary words results in solid knowledge of the taught words, but at a high cost in time. For example, a 1983 study taught fourth graders 104 words over a five-month period, with 75 lessons of approximately 30 minutes each – on average about 21 minutes of instructional time per target word (McKeown et al., 1983). Exposures occurred during varied tasks: “matching words and definitions, associating a word with a context, creating contexts for words, and comparing and contrasting words to discover relationships” (McKeown et al., 1983). In the high-exposure group of words, students received 26-40 exposures; even for the low-exposure words, students received 10-18 exposures – a substantial amount of instructional time. Beck and McKeown (1991) suggest that “the problem that effective instruction takes time can be alleviated by targeting instruction toward the most useful words”. Which words? Second-tier vocabulary (McKeown, 1993), that is, words that are “of high frequency in a mature vocabulary and of broad utility across domains of knowledge” (Beck and McKeown, 1991). Direct instruction may thus play a role for certain critical words (Zechmeister et al., 1995), but a full-fledged instructional lesson is too time-consuming to use for every new word.
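The per-word cost quoted for the 1983 study can be checked directly from the lesson counts given above. The worked arithmetic below uses only those figures and comes out in line with the "about 21 minutes" stated in the text.

```latex
\[
\frac{75\ \text{lessons} \times 30\ \text{min/lesson}}{104\ \text{words}}
  = \frac{2250\ \text{min}}{104\ \text{words}}
  \approx 21.6\ \text{min per word}
\]
```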

Reading. Children can learn words from written contexts (Nagy et al., 1985, McKeown, 1985, Gipe and Arnold, 1978), but the process is incremental.  That is, the amount learned from each exposure may be small, but the net effect is still substantial (Eller, Pappas, and Brown, 1988).  Also, readers with better vocabularies learn more from context – because of broader and deeper prior knowledge of words – even though less of the information in the text is new to them than to readers with poorer vocabularies (Shefelbine, 1990).

Reading offers hope for spurring vocabulary growth, if children can be guided to read material that does in fact contain unknown words. Carver (1994) argues that “students must read books above their independent level in order to consistently run into unknown words, that is, about 1, 2, or 3 unknown words for each 100 words of text”. Easier text simply does not contain enough new words to substantially impact children’s vocabulary learning (Carver, 1994).

Is simple exposure to multiple contexts sufficient for all readers to learn new words?  Perhaps – or perhaps not. McKeown (1985) studied how high- and low-ability students learn words from context. McKeown’s (1985) study examined 15 fifth-graders who, at the end of fourth grade, had scored between grade equivalent 3.3 and grade equivalent 4.1 on the Vocabulary section of the Stanford Achievement Test (Madden et al., 1973). These low-reading fifth graders had trouble learning words from context both because of incorrect inferences about the meaning of a word from context and because of difficulty in deriving meaning from multiple sentence contexts. Even the 15 higher-ability students in McKeown’s (1985) study, who had scored above grade equivalent 4.8 on the Stanford Vocabulary subtest, had some trouble integrating multiple sentence contexts to derive meaning.

There has been some work aimed at teaching children how to learn words from context, but the major effect may be due to practice at learning new words from context and not due to teaching a specific strategy (Kuhn and Stahl, 1998). Kuhn and Stahl (1998) conclude that “Ultimately, increasing the amount of reading that children do seems to be the most reliable approach to improving their knowledge of word meanings, with or without additional training in learning words from context.” As Schwanenflugel et al. (1997) put it, “… the vast majority of a person's word growth can be accounted for by exposure to words in written and oral contexts, not through direct instruction of some sort, but individual encounters with a word in a natural context are not likely to yield much useful information about that word.”

Adding information to text. Can the context in which a word appears be augmented in some way to make it more useful for learning the word? Typical dictionary definitions may not be written to suit the learner's needs; explanations written to convey the core sense of the word in plain and simple language work better (McKeown, 1993). Presenting context-specific definitions in computer-mediated text has been shown to be helpful for vocabulary acquisition, at least for sixth graders (Reinking and Rickman, 1990). Adding information to text is a hybrid of direct instruction and learning from reading text: first, start with a text to read; second, add brief, targeted instruction about words to the text.

In this paper, we investigate learning words by reading connected text during computer-assisted oral reading.


Learning vocabulary from assisted oral reading

We now lay out an informal model of the process of learning vocabulary during assisted oral reading. We intend this to be a conceptual framework useful for identifying opportunities to improve vocabulary learning.  We will focus here on encountering a word for the first time, and on learning the meaning of a word.

We can characterize how many words a student learns in a day of assisted oral reading as shown in Equation 1.


Equation 1. New words learned per day of assisted oral reading
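The equation itself is not reproduced in this copy of the text. One reading that is consistent with the two-part claim in the next paragraph (first encounter new words, then learn from those encounters) is sketched below. The exact form and the wording of each factor are assumptions, not the author's original notation.

```latex
\[
\frac{\text{new words learned}}{\text{day}}
  = \frac{\text{new words encountered}}{\text{day}}
  \times
  \frac{\text{new words learned}}{\text{new word encountered}}
\]
```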

We define our main claim for this paper as follows, in the context of Equation 1. We can help children learn vocabulary during assisted oral reading by (a) helping them encounter new words, and (b) helping them learn new words they encounter. We aim to help children encounter new words by increasing how much new material students read – not a guaranteed outcome when students have substantial control over their interaction with the software. We aim to help children learn new words they encounter by augmenting text to facilitate better learning than possible with the unaugmented text – not a guaranteed outcome since reading is already a reasonable way to build vocabulary. We verify each of these claims by empirical tests of modifications to Project LISTEN’s Reading Tutor, a computer program that listens to children read aloud and helps them learn to read (Mostow and Aist, 1999).

The remainder of this paper is as follows. First, we present the 1997-98 baseline version of Project LISTEN's Reading Tutor. Next, we summarize how we modified the Reading Tutor to help children encounter new words. Then, we describe how we modified the Reading Tutor to help children learn the meaning of new words. After that, we summarize results relevant to vocabulary learning from a yearlong study of the modified Reading Tutor against classroom instruction and one-on-one human tutoring. We then present further experiments in vocabulary help. Finally, we summarize the contributions of this line of research to date.

We now turn to describing the Reading Tutor.


Project LISTEN's Reading Tutor

This paper builds on a larger research project with years of history and publications: Project LISTEN. Project LISTEN’s Reading Tutor listens to children read aloud, and helps them learn to read. A detailed overview of the history of Project LISTEN lies outside the scope of this paper. Mostow and Aist (2001) provide further information. Here we simply inform the reader of enough previous results to set our work in context.

1994 Reading Coach. A predecessor to the Reading Tutor, Project LISTEN’s Reading Coach provided assistance in oral reading (Mostow et al., 1994; see Mostow et al., 1993 for earlier work). In a 1994 study, 34 second graders comprehended a challenging third-grade passage 40% better with Reading Coach assistance than without (Mostow and Aist, 2001), as measured by a comprehension test administered immediately after students had read the passages being tested. In that study, there was no assistive effect for an easier passage.

1996-1997 pilot study. Iterative redesign of the Reading Coach with concurrent usability testing resulted in the 1996 version of the Reading Tutor (Mostow et al., 1995; Mostow, 1996). In a 1996 pilot study reported in Mostow and Aist (2001), eight of the lowest-performing third graders at a low-income urban elementary school (Fort Pitt Elementary) used the 1996 Reading Tutor in a small room under individual supervision by a school aide. The six students who completed the study (one moved away; another was unavailable for post-testing) averaged a 2-year gain in eight months from pre-test to post-test on a school-administered Informal Reading Inventory.

Summer 1997 Reading Clinic. During the summer of 1997, 62 students in grades K-5 used the Reading Tutor during a reading clinic at a low-income urban elementary school (Fort Pitt Elementary). Concurrently, “the Reading Tutor underwent major design revisions of the ‘frame activities’ – logging in and picking a story to read – to enable classroom-based use” (Mostow and Aist, 2001).

1997-1998 formative and controlled studies. As Mostow and Aist report (2001):

“During 1997-1998, students in 11 classrooms at an urban elementary school [Fort Pitt Elementary] used the Reading Tutor as part of a formative study to explore use of the Reading Tutor in a regular classroom setting.  In Spring 1998, 63 students [completed the study – out of 72 who started – and] either read with the Reading Tutor, used commercial software, or received conventional instruction including other computer use.  The Reading Tutor group gained significantly more than statistically matched classmates in the conventional instruction group on the Passage Comprehension subtest of the Woodcock Reading Mastery Test, even with only a fraction of the planned daily 20-minute sessions.  No other significant differences were found.” 

The 1997-1998 study assessed Word Attack, Word Identification, Passage Comprehension, and fluency – but not Word Comprehension.

We now describe the 1997-1998 Reading Tutor, our baseline system.

The Reading Tutor displays one sentence at a time to the student, listens to the student read all or part of the sentence aloud, and responds expressively using recorded human voices. The Reading Tutor lets children read stories from a variety of genres, including nonfiction, fictional narratives, and poems.

The design of the 1997-1998 Reading Tutor focused on independent classroom use. Figure 1 shows a student reading with the Reading Tutor while the teacher worked with the rest of the class. A prototypical student session consisted of the following steps: log on, choose a story to read, read part or all of the story, (perhaps) choose and read more stories, and finally log off.

The core interaction was when the student read a story aloud, with the Reading Tutor’s help (Figure 2). The Reading Tutor responded when it heard mistakes or when the student clicked for help, by playing hints or other help in recorded human voices. The help that the Reading Tutor provided balanced the student’s immediate goal of reading the word or sentence with the longer-term goal of helping the student learn to read (Aist and Mostow, 1997). Help included:

  1. Read the entire sentence using a recording of a human narrator’s fluent reading, to model correct reading. While playing the (continuous) recording, the Reading Tutor would highlight each word as it was spoken, which we call word-by-word highlighting.
  2. Read the entire sentence by playing back isolated recordings of a single word at a time, in order to allow students to hear one word read at a time. Because these recordings may be in different voices, we call word-by-word playback “ransom note” help – after the prototypical ransom note made up of words in various fonts cut out from newspaper and pasted together.
  3. Recue a word by playing an excerpt from the sentence narration of the words leading up to that word (along with word-by-word highlighting), in order to prompt the student to try (re-) reading the word. For example: If the text is Jack and Jill went up the hill to fetch a pail of water, the Reading Tutor could recue hill by first reading Jack and Jill went up the out loud, and then underlining the word hill to prompt the student to read it.
  4. Give a rhyming hint that matches both the sound (phoneme sequence) and the letters (grapheme sequence) of the target word, in order to give a hint on how to read the target word, and to expose the student to related words. For example, if the word is hill, give the word fill as a spoken and displayed rhyming hint, but not the word nil because its spelling does not match.
  5. Decompose a word, syllable-by-syllable or phoneme-by-phoneme, to model the process of sounding out words and to call attention to letter-to-sound mappings. For example, say /h/ while highlighting h, then say /i/ while highlighting i, then say /l/ while highlighting ll.
  6. Show a picture for a word, in order to demonstrate word meaning and to increase engagement. For example, if the word is apple, show a drawing of an apple. Fewer than 200 words had pictures in the 1997-1998 version.
  7. Play a sound effect, perhaps to demonstrate word meaning but primarily to increase engagement. For example, if the word is lion, play the roar of a lion. Fewer than 50 words had sound effects in the 1997-98 version; most were names of animals, such as seagulls, tiger, and wolf.
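Two of the help types listed above, the recue (item 3) and the rhyming hint (item 4), are simple enough to sketch in code. The following is a minimal illustration, not the Reading Tutor's implementation: the tiny pronunciation lexicon, the vowel sets, and the function names `recue` and `rhyming_hints` are all assumptions introduced here.

```python
# Hypothetical toy lexicon: word -> phoneme sequence (ARPAbet-style symbols).
PRONUNCIATIONS = {
    "hill": ("HH", "IH", "L"),
    "fill": ("F", "IH", "L"),
    "nil": ("N", "IH", "L"),
    "pail": ("P", "EY", "L"),
}
VOWEL_PHONEMES = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
                  "EY", "IH", "IY", "OW", "OY", "UH", "UW"}
VOWEL_LETTERS = set("aeiou")


def sound_rime(phonemes):
    """Return the phonemes from the first vowel sound onward."""
    for i, phoneme in enumerate(phonemes):
        if phoneme in VOWEL_PHONEMES:
            return tuple(phonemes[i:])
    return tuple(phonemes)


def letter_rime(word):
    """Return the letters from the first vowel letter onward."""
    for i, letter in enumerate(word):
        if letter in VOWEL_LETTERS:
            return word[i:]
    return word


def rhyming_hints(target, lexicon=PRONUNCIATIONS):
    """Words whose sound AND spelling both match the target's ending."""
    target_sound = sound_rime(lexicon[target])
    target_letters = letter_rime(target)
    return sorted(
        word for word, phonemes in lexicon.items()
        if word != target
        and sound_rime(phonemes) == target_sound
        and letter_rime(word) == target_letters
    )


def recue(sentence, target):
    """Split a sentence into the lead-in to read aloud and the word to prompt."""
    words = sentence.split()
    index = words.index(target)  # first occurrence of the target word
    return " ".join(words[:index]), words[index]
```

With this toy lexicon, `rhyming_hints("hill")` keeps fill and rejects nil, whose sound matches but whose spelling does not, mirroring the example in item 4. `recue` applied to the Jack and Jill sentence returns the lead-in "Jack and Jill went up the" together with the word hill.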


Figure 1. A student reads with the Reading Tutor while the teacher teaches the rest of the class (Photo credit: Debra Tobin)

Figure 2. Reading a story in the 1997-1998 Reading Tutor

Comparison of baseline 1997-1998 Reading Tutor and other software

To place the baseline Reading Tutor in its research context, and clarify its differences with respect to similar software, we compare it here to other software. We focus on software that (a) helps with reading, (b) in a child’s first language, (c) using speech recognition. Readers who are interested in software outside these constraints may refer to Aist (1999) for an overview of speech recognition in second language learning, and to Schacter (1999) for an overview of conventional and software-based reading instruction for a child’s first language. Whines (1999) provides a detailed comparison of some of the systems described below.

The Speech Training Aid (STAR) developed by DRA Malvern adapted automatic speech recognition to help children practice reading single isolated words (Russell et al., 1996). The 1997-98 Reading Tutor listened to children read connected, authentic text.

Talking and Listening Books, also described by Russell et al. (1996), used continuous text but employed word-spotting techniques to listen for a single word at a time.

Let’s Go Read (Edmark, 1997) incorporated speech recognition into a variety of single-phoneme and single-word exercises. The 1997-1998 Reading Tutor focused on assisted reading of authentic text.

Watch Me! Read (IBM, 1998, Williams et al., 2000) adapted speech recognition to teach reading from continuous text, but took a traditional talking-book approach using trade books with attractive pictures and relatively small amounts of text in small fonts. The 1997-1998 Reading Tutor placed primary emphasis on reading text, using child-friendly large fonts and a wide variety of reading materials.


Learning vocabulary in the Reading Tutor

How many words can we expect students to learn from the Reading Tutor? We can conceptualize this problem as a specialization of Equation 1, as follows (Equation 2).


Equation 2. Words learned per day on the Reading Tutor (RT)
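The equation itself is not reproduced in this copy of the text. The surrounding discussion names minutes of Reading Tutor use per day as one factor and identifies the last two factors as new words seen per story and new words learned per word seen. A form consistent with those statements is sketched below. The middle factor (stories read per minute) and the overall layout are assumptions, not the author's original notation.

```latex
\[
\frac{\text{new words learned}}{\text{day on RT}}
  = \frac{\text{minutes on RT}}{\text{day}}
  \times \frac{\text{stories read}}{\text{minute}}
  \times \frac{\text{new words seen}}{\text{story}}
  \times \frac{\text{new words learned}}{\text{new word seen}}
\]
```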

We can split the reading that a student does into two categories: (a) reading with the Reading Tutor, and (b) everything else (outside the scope of this paper). In the case of reading with the Reading Tutor, “how much reading” translates into how many days a student has a session with the computer, and how many minutes each session lasts. How often the Reading Tutor gets used by whom for how long depends on who sets policy for Reading Tutor use, and in any event lies outside the scope of this paper. Therefore, for the purposes of the present discussion we will take the number of days allocated for Reading Tutor use per year as externally determined, and likewise the number of minutes of Reading Tutor use per day. How frequently we expect students to read with the Reading Tutor, and for how long each session, have varied for different studies and in different contexts of use.


Goal: Help students encounter new words; help students learn new words once encountered

We focus in this paper on the last two factors in Equation 2: new words seen per story, and new words learned per word seen. First, students must encounter new words. Second, they must learn the meaning of new words when they encounter them. We modified Project LISTEN's Reading Tutor to be more effective at each of these tasks.

We next present the improvements we made to the Reading Tutor, along with experiments evaluating their effectiveness. We first present improvements to the Reading Tutor’s story choice policy and then summarize experiments on providing vocabulary help.


Improving story choice

The tale of Reading Tutor story choice is one of finding a balance between students’ interests and the Reading Tutor’s educational goals. Children have their own agenda when using software, which may or may not match the desired educational outcome (Hanna et al., 1999).  To allow students to read stories that interested them, and to increase students’ interest in what they are reading by maximizing learner control, the 1997-98 Reading Tutor allowed students free choice of any story on the Reading Tutor. Stories available included non-fiction, poems, and fictional narratives. We observed a number of problems with story choice, primarily that some students picked the same story to read many times (at the expense of reading new material), and that some students chose stories that were too easy. Either problem could substantially reduce the number of new words that students encountered. Therefore, we introduced a Take Turns story choice policy for the 1999-2000 Reading Tutor.

Take Turns consisted of three components:

  1. A mechanism to let the Reading Tutor and the student take turns choosing stories.
  2. A mechanism for use when the Reading Tutor picked stories.
  3. A mechanism for allowing students to choose stories.

Taking Turns. We can sum up taking turns as follows: “Every day decide randomly whether the student or the Reading Tutor chooses the first story to read, then take turns for the rest of the day.”
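The quoted rule translates almost directly into code. The sketch below is a minimal illustration under assumptions: the function name `choosers_for_day` and the labels "student" and "tutor" are introduced here, and the number of stories read in a day is passed in rather than discovered as the session unfolds.

```python
import random


def choosers_for_day(num_stories, rng=random):
    """Return who picks each story of the day under a take-turns rule.

    The first chooser is drawn at random; after that the student and the
    tutor alternate for the rest of the day.
    """
    first = rng.choice(["student", "tutor"])
    other = "tutor" if first == "student" else "student"
    return [first if i % 2 == 0 else other for i in range(num_stories)]
```

For a day with four stories, the result is either student, tutor, student, tutor or the reverse, each with equal probability.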

Reading Tutor story choice. The 1999-2000 Reading Tutor assigned each student to a recommended reading level (RRL) based on the student’s age, and adjusted the RRL based on the student’s performance. The Reading Tutor tried to pick new stories at the student’s RRL.  If no story was available at the RRL, the Reading Tutor chose a harder story. 
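The selection rule just described (prefer a new story at the recommended reading level, otherwise move to a harder one) might be sketched as follows. This is an illustration under stated assumptions: the level names and their ordering, the data layout, and the function name `pick_tutor_story` are hypothetical, and the adjustment of the RRL from student performance is not modeled.

```python
# Hypothetical ordered level names, easiest first.
LEVELS = ["A", "B", "C", "D", "E"]


def pick_tutor_story(stories_by_level, already_read, rrl, levels=LEVELS):
    """Pick an unread story at the recommended reading level (RRL).

    If every story at the RRL has been read, try successively harder
    levels. Returns (level, story), or None if nothing new is left.
    """
    start = levels.index(rrl)
    for level in levels[start:]:
        unread = [story for story in stories_by_level.get(level, [])
                  if story not in already_read]
        if unread:
            return level, unread[0]
    return None
```

For example, a student at level C who has already read every level C story would be offered the first unread level D story.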

Student story choice. When it was the student's turn to choose a story, the student was free to choose any Reading Tutor story to read. The student could also choose to write and (optionally) narrate a story.

We also simplified the menu interaction (Aist and Mostow, 2000). The 1997-1998 Reading Tutor required at least two clicks for story choice: one click to select and another click to confirm. The 1999-2000 Reading Tutor required only one click to select a story (Figure 3), but allowed use of the Back button to return to the story choice screen and choose again.


Figure 3. Story choice screen, fall 1999. The Reading Tutor spoke the prompt displayed at the top (here, “Greg, choose a level C story to read”) and then read each item out loud. “More Level C Stories” showed more stories at the current level, level C in this example.

In the spring of 1998, 24 students in grades 2, 4, and 5 at a low-income urban elementary school used the Reading Tutor with a student-only story choice policy. In the fall of 1999, 60 students in grades 2 and 3 at a (different) low- to middle-income urban elementary school used a revised version of the Reading Tutor that took turns picking stories. We used the statistical software package SPSS (version 9.0) to conduct a univariate analysis of variance (ANOVA) with grade as a covariate. The ANOVA showed a difference between the two conditions in the rate of new material encountered, in favor of Fall 1999 and significant at the 90% level (F = 3.25, p = .075). The students who used the Take Turns Reading Tutor in 1999 read 64.1% new sentences out of ~35,000 sentences overall, compared with 60.1% new sentences out of the ~10,000 sentences read by the students who used the spring 1998 Reading Tutor with its student-only story choice policy. Furthermore, the Reading Tutor’s story choices helped the most for those who did not choose new stories themselves: about half of the students picked fewer than half new stories on their own, with some choosing as few as 15% new stories. With the Reading Tutor’s choices included, all students read about 50% or more new stories.
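The "percentage of new sentences" measure can be made concrete with a short sketch. The definition used here is an assumption: a sentence counts as new the first time a given student reads it. The function name `percent_new` and the input format (one student's reading log as an ordered list of sentence identifiers) are likewise hypothetical.

```python
def percent_new(sentences_read):
    """Percentage of a student's sentence readings that were first readings."""
    if not sentences_read:
        return 0.0
    seen = set()
    new_count = 0
    for sentence_id in sentences_read:
        if sentence_id not in seen:  # first time this student sees it
            new_count += 1
            seen.add(sentence_id)
    return 100.0 * new_count / len(sentences_read)
```

A student who rereads the same story many times produces a low value, which is the pattern the Take Turns policy was meant to counteract.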

Other issues remain in terms of story choice in the Reading Tutor. For example, we might want to encourage students to pick good stories when it’s their turn. In addition, it was possible for a student to spend a lot of time on an activity or story he or she chose – to the exclusion of further Reading Tutor choices that same day. Thus we might want to ensure that the total amount of time spent reading Reading Tutor-chosen stories is approximately equal to the amount of time spent reading student-chosen stories (Jack Mostow, personal communication).

Aist and Mostow (in press) describe this evaluation in more detail.


Automatically generated vocabulary assistance and assessment: The factoids experiment

Reading material that contains new words is a requirement for learning new words from reading text.  However, simply reading new and challenging stories may not be sufficient. Why? In particular, individual encounters with a word may not contain enough information to learn much about the word. How can text be augmented so students can learn more from an encounter with a word than they would have otherwise, without taking too much additional time away from reading the original text?

We decided to explore augmenting text with various kinds of vocabulary assistance, aimed at increasing the number of new words learned per new words seen in Equation 2. In our experiments, we compared augmented text to unaugmented text – because if the augmentation does not help beyond text alone, adding augmentation would probably just waste the student’s time.

By augmenting stories with vocabulary help such as short context-specific explanations or WordNet-derived comparisons to other words, the Reading Tutor can help students learn words better than they would from simply reading the unaugmented stories.

We augmented text with “factoids”: automatically constructed comparisons of a target word to a different word drawn from WordNet (Fellbaum, 1998), an electronic lexical database originally developed by George Miller and colleagues at Princeton. For example, a factoid might compare the word astronaut to its hypernym (more general term) traveler. A four-month study conducted in Fall 1999 compared text alone vs. text plus such factoids as follows. A control trial consisted of a student seeing a target word in a sentence and (on a later day) answering an automatically constructed multiple choice vocabulary question on the target word. An experimental trial added a “factoid” presented prior to displaying the target word in the sentence (Figure 4). Sometimes the comparison word was the correct answer in the multiple-choice question, and sometimes a different word. In total, over 3000 trials were completed. In order to explore the effect of factoids on getting the question right, we built logistic regression models (Menard, 1995) using SPSS. (Aist, 2000: chapter 4 provides details.) There was no significant difference overall between experimental and control conditions; however, exploratory analysis identified conditions in which factoids might help. In particular, story plus factoid was more effective than story alone for the 317 trials on single-sense, rare words tested one or two days later, as well as for third graders seeing rare words. (Here, “rare” means occurring less than 20 times in the Brown corpus (Kucera and Francis, 1967, Francis and Kucera, 1971).)
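The pieces of the factoid experiment described above (a hypernym lookup, a rarity threshold, and an automatically built multiple-choice question) can be sketched in a few lines. This is an illustration only: the small dictionaries stand in for WordNet and Brown corpus lookups, every count in them is a made-up placeholder, and the question format and function names are assumptions. The factoid wording follows the astronaut example used in this paper.

```python
import random

# Toy stand-ins for WordNet and Brown corpus lookups. Every entry and count
# below is an illustrative placeholder, not data from the study.
HYPERNYMS = {"astronaut": "traveler"}
BROWN_COUNTS = {"astronaut": 2, "traveler": 30}
RARE_THRESHOLD = 20  # "rare": fewer than 20 occurrences in the Brown corpus


def is_rare(word, counts=BROWN_COUNTS):
    """True if the word occurs fewer than RARE_THRESHOLD times (unseen = 0)."""
    return counts.get(word, 0) < RARE_THRESHOLD


def make_factoid(word, hypernyms=HYPERNYMS):
    """Build a comparison to a more general word, or None if none is known."""
    general = hypernyms.get(word)
    if general is None:
        return None
    return f"{word} can be a kind of {general}. Is it here?"


def make_question(word, correct, distractors, rng):
    """Assemble a multiple-choice item with the options in shuffled order."""
    options = [correct] + list(distractors)
    rng.shuffle(options)
    return {"target": word, "options": options, "answer": correct}
```

In an experimental trial the factoid would be shown before the sentence containing the target word; in a control trial only the sentence would be shown, and the same kind of question would follow on a later day.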

While factoids helped for some words, there were shortcomings. Factoids were provided on words for which it was easy to generate factoids – which are not necessarily the same as the words where explanations were needed. The factoids were not necessarily the best additional information to provide about a word; for example, the Reading Tutor might have been better off saying that an astronaut was a space traveler – not just a traveler. Finally, some of the words used in the factoids were actually more difficult than the target words – not less difficult, as we would have preferred. Ultimately, we would like to have a new lexical relation defined: “explains”, where w1 explains w2 if and only if w1 and w2 are synonyms and w1 is an easier word (or phrase) than w2.
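The proposed "explains" relation can be written as a predicate. The sketch below is one possible operationalization, not a definition from the paper: it assumes that "easier" can be approximated by higher frequency, and the synonym table, the frequency values, and the function name `explains` are hypothetical placeholders.

```python
def explains(w1, w2, synonyms, frequency):
    """True if w1 and w2 are synonyms and w1 is easier than w2.

    Assumption: a word or phrase counts as easier when its frequency is
    higher. Items missing from the frequency table are treated as frequency 0.
    """
    are_synonyms = (w1 in synonyms.get(w2, set())
                    or w2 in synonyms.get(w1, set()))
    is_easier = frequency.get(w1, 0) > frequency.get(w2, 0)
    return are_synonyms and is_easier


# Hypothetical data for illustration only.
SYNONYMS = {"astronaut": {"space traveler"}}
FREQUENCY = {"astronaut": 2, "space traveler": 5}
```

Under these placeholder values, `explains("space traveler", "astronaut", SYNONYMS, FREQUENCY)` holds, while the reverse direction does not, so the relation is asymmetric by construction.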

Aist (2001) describes the factoid study in more detail.


Figure 4. Factoid in popup window

How well did the 1999-2000 Reading Tutor help children learn vocabulary?

So far, we have described two enhancements to computer-assisted oral reading. We first described how we changed the Reading Tutor’s story choice policy to have the computer and the student take turns picking stories. We showed that Take Turns resulted in students reading more new material than they presumably would have on their own. We then discussed how we enriched text with vocabulary assistance in the form of automatically generated factoids like “astronaut can be a kind of traveler. Is it here?” We showed that at least for single-sense rare words tested one or two days later, and for third graders seeing rare words, text augmented with factoids prepared students to answer future multiple-choice questions about words better than did the same text without such assistance.

So, the changes we made improved the baseline 1997-98 Reading Tutor. But how did the new and improved Reading Tutor with Take Turns and factoids compare to other methods of helping children learn to read? Specifically, how did the 1999-2000 Reading Tutor compare to other reading instruction, on measures of vocabulary learning?  In this section we summarize relevant parts of a larger 1999-2000 study that we helped design and analyze, but which was carried out primarily by other Project LISTEN team members. This larger study was not intended primarily to evaluate vocabulary assistance, but did contain comparisons of the modified Reading Tutor to other ways of teaching reading.

Here we summarize one aspect – vocabulary learning – of a larger study comparing computerized oral reading tutoring to classroom instruction and one-on-one human tutoring. 144 students in second and third grade were assigned to one of three conditions: (a) classroom instruction, (b) classroom instruction with one-on-one tutoring replacing part of the school day, and (c) computer instruction replacing part of the school day. For second graders, there were no significant differences between treatments in word comprehension gains. For third graders, however, the computer tutor showed an advantage over classroom instruction for gains in word comprehension (p = 0.042, effect size = 0.56) as measured by the Woodcock Reading Mastery Test (American Guidance Service, n.d.). One-on-one human tutoring also showed an advantage over classroom instruction alone (p = 0.039, effect size = 0.72). Computer tutoring and one-on-one human tutoring were not significantly different in terms of word comprehension gains. Aist et al. (2001) provides further details on results from the Word Comprehension measure; Mostow et al. (2001) on the overall study; the analyses described in this paragraph were carried out by Brian Tobin, Jack Mostow, and the present author.

So, students using the Reading Tutor averaged higher gains than their cohorts in human tutor rooms. However, the experimental design precluded a (strong) within-room comparison, so we cannot assign credit unequivocally to the Reading Tutor. In fact it is possible that teacher effects may have accounted for some or perhaps all of the difference. A more definitive analysis belongs in a separate paper on this year-long controlled study – under preparation as this article went to press.


Further experiments on vocabulary help

Follow-on experiments explored ways to make vocabulary assistance even more effective, such as adding short child-friendly explanations to text. We describe these experiments briefly here; more information is presented in the Appendix.


Can (low-reading elementary) students make use of explanations?: The comets and meteors experiment

An initial test confirmed that even low-reading students could understand short explanations well enough to do better on immediate multiple-choice questions than without such explanations. The factoid experiment had shown results only for a restricted set of students and words. We wanted to make sure students in our intended population (elementary students at a low-income urban school) could read and understand definitions well enough to make use of the information in them above and beyond just the original text. The treatment conditions were (a) experimental condition: text plus short explanations such as “COMET: A big ball of dirty ice and snow in outer space.”, and (b) text plus a nonsemantic help sentence such as “COMET: Comet starts with C.” We compared students’ performance on a five-item matching task (on paper, attached to the passages). Results confirmed that explanations held an advantage over nonsemantic help. For the 41 students who completed the protocol, all of whom had just finished grades 2-5, analysis of variance (ANOVA) including a term for age showed a significant effect of definition on the matching task (p = .041). A t-test paired by student, comparing the same student’s responses in the two conditions, showed that the definition helped: for the text that included definitions, students averaged 2.5 items right vs. 1.8 items right for the text plus nonsemantic help (p = .007). Thus, students were able to make use of the information in a definition above and beyond the simple effect of an additional exposure.
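For readers unfamiliar with the paired analysis, the following sketch shows how a t statistic paired by student is computed. The scores are hypothetical and are not the study’s data; the function name is likewise illustrative.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """t statistic for paired samples: mean difference over its standard error."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Sample standard deviation of the per-student differences (n - 1 denominator).
    sd_diff = statistics.stdev(diffs)
    return mean_diff / (sd_diff / math.sqrt(n))


# Hypothetical matching-task scores (items right out of 5) for five students.
with_definition = [3, 2, 4, 2, 3]
with_nonsemantic_help = [2, 2, 2, 1, 2]
t_value = paired_t_statistic(with_definition, with_nonsemantic_help)
```

The statistic is then compared against a t distribution with n - 1 degrees of freedom. Pairing by student removes between-student variation in reading ability from the comparison, which is why it suits a design where each student sees both kinds of text.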


Can explanations add to natural contexts?: The limericks experiment

A within-subject experiment in summer 2000 measured word familiarity and word knowledge on eight (difficult) words with a paper test given one or two days after exposure to those words in one of four conditions: no exposure, definition alone, children’s limerick alone, or definition plus children’s limerick. The stories were children’s limericks by Edward Lear (19th cent.). There were eight limericks, with one target word each. The words were dolorous, laconic, imprudent, innocuous, mendacious, oracular, irascible, and vexatious. The texts thus controlled for many factors, such as genre, author, intended audience, word frequency, part of speech, and general semantic class (adjectives describing personality traits).

An example of a limerick:

There was an Old Man of Cape Horn,

Who wished he had never been born;

So he sat on a chair,

Till he died of despair,

That dolorous Man of Cape Horn.

We wrote the definitions for the target words to be as syntactically similar as possible. Each definition explained the words in ordinary language, following the advice given in McKeown (1993). For example: We can say someone is dolorous if they are mournful, or feel really bad.

Results were as follows. Definitions increased all students’ familiarity with the words, and limericks yielded a strong trend favoring increased familiarity. Also, while 2nd and 3rd graders performed essentially at chance on word knowledge, 4th and 5th graders learned enough from reading stories and definitions with the Reading Tutor to do better on word knowledge.  This study furthermore ruled out the word recency effect as an explanation, since none of the words in the definitions or limerick showed up as answers on the multiple choice test. This experiment also shed light on the relationship between word familiarity and word knowledge: the correlation between word familiarity and knowledge was larger in higher grades. Limericks may have been more effective at strengthening the tie between word familiarity and word knowledge – a direction for future research.



Reading is a key skill in the information economy. Reading is comprehension: making meaning from print. Vocabulary underlies comprehension. We began with computer-assisted oral reading, and proceeded as follows. Improved story choice helped students encounter new material. Factoids comparing words in text to other words helped some students learn words. The Reading Tutor with Take Turns and factoids did better than a classroom control for 3rd graders on vocabulary learning – and even did comparably with one-on-one human tutoring. Finally, follow-on experiments pointed the way towards delivering improved vocabulary assistance.

Let us put these results in perspective. We note that the National Reading Panel (2000) observed that most vocabulary studies show effects only on experimenter-designed measures – not on standardized tests like the Woodcock Reading Mastery Test (WRMT). Such standardized tests measure vocabulary so crudely that it is hard to achieve significant results when evaluating vocabulary growth, and even more so to show differences in growth between treatments. Our research along this line of inquiry did in fact achieve significant results on the Word Comprehension section of the Woodcock Reading Mastery Test. Furthermore, we introduced and used two finer-grained techniques. First, we measured new material read; second, we had the Reading Tutor administer computer-constructed, in-context vocabulary questions as part of an embedded experiment (Mostow and Aist, 2001; cf. Walker, 2000), encountered in the course of normal Reading Tutor use.

To achieve this goal, we built on a foundation of computer-assisted oral reading: Project LISTEN’s Reading Tutor. Then, we developed, incorporated, and evaluated two improvements. First, we made the Reading Tutor take turns picking stories, which not only guaranteed that every student saw ~50% or more new material, but helped those students most who chose the fewest new stories themselves. (Such students were presumably those who needed the most practice in reading new text.) Second, we added automatically generated vocabulary assistance in the form of factoids – short comparisons to other words – and automatically generated vocabulary assessment in the form of multiple-choice questions. The factoids helped students answer multiple-choice questions – but only for third graders seeing rare words, and for single-sense rare words tested one or two days later. The multiple-choice questions explicitly operationalized Nagy et al.’s (1985) criteria for difficult (their Level 3) multiple choice questions, as discussed at length in Aist (2000). Besides the factoid results, correlating the multiple-choice questions with the Word Comprehension subtest of the Woodcock Reading Mastery Test demonstrated some validity (see Aist, 2000, for details).
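The reasoning behind the ~50% guarantee can be made concrete with a small calculation. The sketch below assumes strict alternation between tutor and student choices and assumes the tutor picks a new story on each of its turns; both are simplifying assumptions for illustration, and the function names are hypothetical.

```python
def expected_new_fraction(student_new_rate, tutor_new_rate=1.0):
    """Expected share of new stories when tutor and student alternate choices.

    student_new_rate: fraction of the student's own picks that are new stories.
    tutor_new_rate: fraction of the tutor's picks that are new (assumed 1.0 here).
    """
    return 0.5 * tutor_new_rate + 0.5 * student_new_rate


def gain_from_taking_turns(student_new_rate, tutor_new_rate=1.0):
    """Increase in new-material share relative to student-only story choice."""
    return expected_new_fraction(student_new_rate, tutor_new_rate) - student_new_rate
```

Under these assumptions a student who never chooses a new story still sees 50% new material, and the gain, 0.5 × (1 − student_new_rate), is largest for students who choose the fewest new stories, which matches the pattern described above.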

Follow-on experiments pointed the way towards even more effective vocabulary help, by presenting students with in-context explanations. Students who had just finished 2nd through 5th grade gained word familiarity from exposure to words in the Reading Tutor, while 4th and 5th graders gained word knowledge from definitions as well.

Along the way, we used a variety of techniques, on timescales ranging from seconds to minutes to days to months (cf. Newell, 1990’s time scale of human behavior). A story took seconds or minutes to choose, and minutes to read. We measured the effects of different story choice policies in the cumulative distribution of story choices over several months. Vocabulary assistance takes seconds to construct and present, and seconds to minutes to read. We measured the effects of vocabulary assistance either immediately (as in the comets and meteors experiment) or on a subsequent day (as in the factoids experiment and the limericks experiment). Finally, reading with the Reading Tutor took ~20 minutes/day for an entire school year – and we measured its effects during a yearlong study.

We set out to demonstrate two claims, framed as improvements over factors in Equation 2 (which we repeat here):


Equation 2. New words learned per day on Reading Tutor

First, by taking turns picking stories, an automated tutor that listens to children read aloud did indeed ensure that students read more new material than just their own choices would provide. In fact, students who chose the fewest new stories themselves benefited the most from the Reading Tutor’s story choices – presumably, such students needed the most practice reading new text.  Second, by augmenting stories with semantic information about words, an automated reading tutor can help students learn words better than they would from the stories alone. Further experiments shed light on how to present effective vocabulary instruction, using short explanations of words. Finally, the 1999-2000 Reading Tutor with Take Turns and factoids outperformed a classroom control on Word Comprehension gains for third graders – and was even competitive with one-on-one human-assisted oral reading.



This work was supported in part by the National Science Foundation under Grant Nos. REC-9720348 and REC-9979894, and by the author’s National Science Foundation Graduate Fellowship and Harvey Fellowship. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the National Science Foundation or the official policies, either expressed or implied, of the sponsors or of the United States Government.

As with all research carried out within the context of a larger project, the present paper was enabled by previous work done by many on Project LISTEN; the project website lists personnel. This paper is a summary version of Aist (2000); we thank anonymous Educational Technology and Society reviewers for their comments, Brian Junker for statistical advice, and Jack Mostow and Brian Tobin for reading and/or commenting on earlier drafts of this paper and of the analyses we presented here. Any remaining problems are of course the sole responsibility of the author.



This paper is based on the work done while the author was with Project LISTEN, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh PA 15213 USA.



  • Aist, G. (1999). Speech recognition in computer assisted language learning. in K. C. Cameron (Ed.), Computer Assisted Language Learning (CALL): Media, Design, and Applications, Lisse: Swets & Zeitlinger, 165-182.
  • Aist, G. (2000). Helping Children Learn Vocabulary during Computer-Assisted Oral Reading. Ph.D. dissertation, Language Technologies Institute, Carnegie Mellon University.
  • Aist, G. (2001). Towards automatic glossarization: automatically constructing and administering vocabulary assistance factoids and multiple-choice assessment. International Journal of Artificial Intelligence in Education, 12(2), 212-231.
  • Aist, G. and Mostow, J. (2000). Improving story choice in a reading tutor that listens. Paper presented at the Fifth International Conference on Intelligent Tutoring Systems (ITS’2000), June 19-23, 2000, Montreal, Canada.
  • Aist, G. S. and Mostow, J. (1997) Adapting human tutorial interventions for a reading tutor that listens: using continuous speech recognition in interactive educational multimedia. Paper presented at the CALL 97: Theory and Practice of Multimedia in Computer Assisted Language Learning conference, September 21-23, 1997, Exeter, UK.
  • Aist, G. S. and Mostow, J.  (in press). Faster, better task choice in a reading tutor that listens.  To appear in Philippe DeCloque, P. and Holland, M. (Eds.), Speech Technology for Language Learning, Lisse: Swets & Zeitlinger Publishers.
  • Aist, G. S., Mostow, J., Tobin, B., Burkhead, P., Corbett, A., Cuneo, A., Junker, B. and Sklar, M. B. (2001) Computer-assisted oral reading helps third graders learn vocabulary better than a classroom control — about as well as human-assisted oral reading.  Paper presented at the Tenth Artificial Intelligence in Education (AI-ED) Conference, May 19-23, 2001, San Antonio, Texas.
  • American Guidance Service. (n.d.) Bibliography for Woodcock Reading Mastery Tests – Revised (WRMT-R).
  • Beck, I. and McKeown, M. (1991) Conditions of vocabulary acquisition. in Barr, R., Kamil, M., Mosenthal, P. and Pearson, P.D. (Eds.), Handbook of Reading Research vol. 2. Mahwah, New Jersey: Lawrence Erlbaum, 789-814.
  • Carver, R.P.  (1994) Percentage of unknown vocabulary words in text as a function of the relative difficulty of the text: Implications for instruction.  Journal of Reading Behavior, 26(4), 413-437.
  • Donahue, P. L., Voelkl, K. E., Campbell, J. R. and Mazzeo, J. (1999) NAEP 1998 Reading Report Card for the Nation and the States. Washington, DC: National Center for Education Statistics.
  • Edmark. (1997) Let’s Go Read.
  • Eller, R.G., Pappas, C.C. and Brown, E.  (1988) The lexical development of kindergarteners: Learning from written context.  Journal of Reading Behavior, 20(1), 5-24.
  • Fellbaum, C. (1998)  WordNet: An Electronic Lexical Database. Cambridge MA: MIT Press.
  • Francis, W. N. and Kucera, H. (1971) Brown Corpus Manual. Providence, RI: Brown University.
  • Gipe, J. P., and Arnold, R. D. (1978) Teaching vocabulary through familiar associations and contexts. Journal of Reading Behavior, 11(3), 281-285.
  • Hanna, L., Risden, K., Czerwinski, M., and Alexander, K. J. (1999) The role of usability research in designing children’s computer products. in Druin, A. (Ed.), The Design of Children’s Technology, San Francisco: Morgan Kaufmann, 3-26.
  • IBM. (1998) Watch Me Read.
  • Kucera, H. and Francis, W. N. (1967) Computational analysis of present-day American English, Providence, RI: Brown University Press.
  • Kuhn, M. R. and Stahl, S. A. (1998) Teaching children to learn word meanings from context: A synthesis and some questions. Journal of Literacy Research, 30(1), 119-138.
  • Lear, E. (19th c.) The Book of Nonsense.
  • Madden, R., Gardner, E. F., Rudman, H. C., Karlsen, B. and Merwin, J. C. (1973) Stanford Achievement Test, New York: Harcourt, Brace, Jovanovich, Inc. Cited in (McKeown 1985).
  • McKeown, M. G. (1985) The acquisition of word meaning from context by children of high and low ability. Reading Research Quarterly, 20(4), 482-496.
  • McKeown, M. G. (1993) Creating effective definitions for young word learners. Reading Research Quarterly, 28(1), 17-31.
  • McKeown, M. G., Beck, I. L., Omanson, R. C., and Perfetti, C. A. (1983). The effects of long-term vocabulary instruction on reading comprehension: A replication. Journal of Reading Behavior, 15(1), 3-18.
  • Menard, S. (1995) Applied Logistic Regression Analysis, Quantitative Applications in the Social Sciences vol. 106, Thousand Oaks, CA: Sage Publications.
  • Mostow, J. (1996) A Reading Tutor that Listens (5-minute video). Presented at the DARPA CAETI Community Conference, November 19-22, 1996, Berkeley, CA.
  • Mostow, J. and Aist, G. (2001) Evaluating tutors that listen. in Forbus, K. and Feltovich, P. (Eds.) Smart Machines in Education: The coming revolution in educational technology, Menlo Park, California: MIT/AAAI Press, 169-234.
  • Mostow, J., Aist, G., Burkhead, P., Corbett, A., Cuneo, A., Eitelman, S., Huang, C., Junker, B., Platz, C., Sklar, M. B. and Tobin, B. (2001) A controlled evaluation of computer- versus human-assisted oral reading. Poster presented at the Tenth Artificial Intelligence in Education (AI-ED) Conference, May 19-23, 2001, San Antonio, Texas.
  • Mostow, J., and Aist, G. (1999) Giving help and praise in a Reading Tutor with imperfect listening – Because automated speech recognition means never being able to say you’re certain. CALICO Journal 16(3), 407-424. Holland, M. (Eds.) Special issue – Tutors that Listen: Speech recognition for Language Learning.
  • Mostow, J., Hauptmann, A. and Roth, S. F.  (1995) Demonstration of a Reading Coach that listens.  Paper presented at the Eighth Annual Symposium on User Interface Software and Technology, November 15-17, 1995, Pittsburgh PA.  Sponsored by ACM SIGGRAPH and SIGCHI in cooperation with SIGSOFT.
  • Mostow, J., Hauptmann, A. G., Chase, L. L. and Roth, S. (1993) Towards a Reading Coach that listens: Automatic detection of oral reading errors. in Fikes, R. and Lehnert, W. (Eds.) Proceedings of the Eleventh National Conference on Artificial Intelligence (AAAI-93), Menlo Park, California: AAAI Press, 392-397.
  • Mostow, J., Roth, S. F., Hauptmann, A. G. and Kane, M.  (1994) A prototype Reading Coach that listens.  Paper presented at the Twelfth National Conference on Artificial Intelligence (AAAI-94), August 1-4, 1994, Seattle WA. Selected as the AAAI-94 Outstanding Paper.
  • Nagy, W. E., Herman, P. A., and Anderson, R. C. (1985) Learning words from context. Reading Research Quarterly, 20(2), 233-253.
  • National Reading Panel. (2000) Teaching Children to Read.
  • Newell, A. (1990) Unified Theories of Cognition. Cambridge MA: Harvard UP.
  • Reinking, D. and Rickman, S. S. (1990)  The effects of computer-mediated texts on the vocabulary learning and comprehension of intermediate-grade learners.  Journal of Reading Behavior, 22(4), 395-411.
  • Russell, M., Brown, C., Skilling, A., Series, R., Wallace, J., Bohnam, B. and Barker, P.  (1996)  Applications of Automatic Speech Recognition to Speech and Language Development in Young Children.  Paper presented at the Fourth International Conference on Spoken Language Processing, October 3-6, 1996, Philadelphia PA.
  • Schacter, J. (1999) Reading Programs that Work: A Review of Programs from Pre-Kindergarten to 4th Grade. Milken Family Foundation.
  • Schwanenflugel, P. J., Stahl, S. A. and McFalls, E. L. (1997) Partial word knowledge and vocabulary growth during reading comprehension. Journal of Literacy Research, 29(4), 531-553.
  • Shefelbine, J. L. (1990) Student factors related to variability in learning word meanings from context. Journal of Reading Behavior, 22(1), 71-97.
  • Snow, C. E., Burns, M. S. and Griffin, P. (Eds.) (1998) Preventing Reading Difficulties in Young Children, Washington DC: National Academy Press.
  • Walker, M. A. (2000) An application of reinforcement learning to dialogue selection strategy in a spoken dialogue system for email. Journal of Artificial Intelligence Research, 12, 387-416.
  • Whines, N. (1999) Unpublished master’s thesis, Master of Arts in Design for Interactive Media, Middlesex University, London.
  • Williams, S.M., Nix, D. and Fairweather, P. (2000) Using Speech Recognition Technology to Enhance Literacy Instruction for Emerging Readers. in B. Fishman and S. O’Connor-Divelbiss (Eds.), Proceedings of the Fourth International Conference of the Learning Sciences, Mahwah, NJ: Erlbaum, 115-120.
  • Zechmeister, E. B., Chronis, A. M., Cull, W. L., D’Anna, C. A. and Healy, N. A.  (1995) Growth of a functionally important lexicon.  Journal of Reading Behavior, 27(2), 201-212.


Appendix: Summary of experiments



| Goal | Aist (2000) Chapter # | Methodology | Key result |
| --- | --- | --- | --- |
| Improve story choice | Chapter 3 | Modify Reading Tutor to take turns with the student at picking stories. Compare to Spring 1998 student-only story choice policy. | Higher percent of new material chosen in Fall 1999 (64.1%), vs. Spring 1998 (60.1%). Reading Tutor helped lower-performing students more. |
| Provide automatically generated vocabulary assistance | Chapter 4 | Supplement stories with WordNet-extracted factoids; look for effect of factoids on answering multiple-choice questions. Compare trials with factoid + context to trials with context alone. | Factoids helped for the 189 trials with single-sense rare words tested one or two days later – significant at 95%, but exploratory. |
| Compare Reading Tutor to other reading instruction | Chapter 5 | Analyze Word Comprehension portion of a larger Project LISTEN study comparing Reading Tutor with classroom instruction, one-on-one human tutoring. | For third graders, Reading Tutor better than classroom control (effect size = 0.56, p = .042) and competitive with one-on-one human-assisted oral reading. |
| Explore ways to improve vocabulary assistance | Chapter 6 | Compare short explanations to nonsemantic assistance. Two texts with teacher-written definitions or nonsemantic assistance (COMET starts with C.) | At least when test is given in back of packet, students perform better on word-to-definition matching task when supplied with definitions (2.5 items right vs. 1.8). |
|  | Chapter 6 | Adapt limericks to vocabulary experiment. Compare no exposure vs. limerick alone vs. definition alone vs. limerick plus definition, all in Reading Tutor. Measure familiarity (“Have you seen this word before?”) and semantics (multiple-choice question on word meaning). | Strong effect of seeing explanations on familiarity. Trend favoring effect of seeing limericks on familiarity. Only 4th and 5th graders learned enough from definition to answer multiple-choice questions better. |

Table A.1. Summary of experimental results



|  | (Aist 2000) Chapter 4: Factoids | (Aist 2000) Chapter 6: Comets and Meteors | (Aist 2000) Chapter 6: Limericks |
| --- | --- | --- | --- |
| Which students? | 60 students in grades 2, 3; Centennial Elementary School; classroom setting | 41 students who had just finished grades 2 through 5; Fort Pitt Elementary School; classroom setting | 29 students in grades 2, 3, 4, 5; Fort Pitt Elementary School; summer reading clinic setting |
| Which target words? | Words for which the Reading Tutor could automatically generate vocabulary assistance | Five domain-specific content words for each topic (10 words total) | Eight domain-independent but very rare adjectives |
| What kind of help? | Comparisons to other words, drawn from WordNet | Definitions written by story author (a teacher) | Experimenter-written context-specific explanations |
| When was help given? | Immediately before sentence containing target word | Immediately before sentence containing target words | Prior to limerick containing target word |
| At whose initiative? | Reading Tutor-selected using experimenter-written constraints | Teacher- (author-) selected words | Experimenter-selected words |
| What kind of text? | Stories already in the Reading Tutor | Two teacher-written nonfiction passages, one about comets and one about meteors | Eight children’s limericks |
| Modality of text | Computer-assisted oral reading | Independent paper-based reading | Computer-assisted oral reading |
| Modality of vocabulary help | Help inserted in yellow pop-up boxes, to be read out loud in computer-assisted oral reading | Definitions inserted seamlessly into text, to be read independently on paper | Explanations inserted seamlessly into text, to be read out loud in computer-assisted oral reading |
| How tested? | Automatically generated multiple-choice questions, administered by the Reading Tutor | Five-item matching test, administered on paper, stapled to the text passages | 4-item multiple-choice questions given on paper, subsequent day: eight word familiarity yes-no questions and eight word knowledge |
| Results | Factoids helped for rare single-sense words tested one or two days later (44.1% correct with factoids vs. 25.8% correct without). Factoids also helped for third graders seeing rare words (42.0% with factoids vs. 36.2% without). | Definitions helped more than nonsemantic assistance on same-day matching task (2.5 items right vs. 1.8 items right). | All students gained familiarity: 59/116 with limerick vs. 49/116; 65/116 with definition vs. 43/116. Only 4th and 5th graders showed increased knowledge, and only for explanations: 13/22 right with limerick vs. 14/22; 17/22 right with definition vs. 10/22. |

Table A.2. Summary of vocabulary help experiments
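The limerick counts in Table A.2 are easier to compare as percentages. The short sketch below converts them; the counts are copied from the table, while the helper names are illustrative. It also checks that the familiarity trial counts are consistent with 29 students each answering about eight words.

```python
# Counts copied from the limericks column of Table A.2: (positive responses, trials).
FAMILIARITY_ALL_STUDENTS = {
    "limerick": (59, 116),
    "no limerick": (49, 116),
    "definition": (65, 116),
    "no definition": (43, 116),
}
KNOWLEDGE_GRADES_4_5 = {
    "limerick": (13, 22),
    "no limerick": (14, 22),
    "definition": (17, 22),
    "no definition": (10, 22),
}


def percent(successes, trials):
    """Proportion expressed as a percentage."""
    return 100.0 * successes / trials


def difference_in_points(table, with_key, without_key):
    """Percentage-point difference between two rows of a count table."""
    return percent(*table[with_key]) - percent(*table[without_key])


# Trials with and without a limerick together cover every (student, word) pair.
total_trials = (FAMILIARITY_ALL_STUDENTS["limerick"][1]
                + FAMILIARITY_ALL_STUDENTS["no limerick"][1])
consistent_with_design = total_trials == 29 * 8

for label, table in [("familiarity", FAMILIARITY_ALL_STUDENTS),
                     ("knowledge, grades 4-5", KNOWLEDGE_GRADES_4_5)]:
    for factor in ("limerick", "definition"):
        delta = difference_in_points(table, factor, "no " + factor)
        print(f"{label}: {factor} vs. none = {delta:+.1f} points")
```

These are descriptive differences only; they do not replace the significance tests reported in Aist (2000).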