Eric Krotkov
Carnegie Mellon University


The Computer Vision course at Carnegie Mellon University (CMU) covers the fundamental techniques and background of machine vision, from basic digital image processing to symbolic image understanding. The course is offered by the Computer Science Department, and its prerequisites include courses in data structures, calculus, and linear algebra. Students are expected to assimilate the theoretical background and to apply it in programming assignments, such as detecting edges in an image, or recognizing a telephone in the image of an office scene.

We have been teaching this course for years, making use of the classic textbooks in the field: Computer Vision (published in 1982) and Robot Vision (published in 1986). We recognize two fundamental problems with the course and the available textbooks:

1) Few opportunities for active, "hands-on" learning. By their nature, the course and textbooks restrict students to passive watching rather than active doing. There are few demonstrations and exercises permitting students to apply, on real images, the concepts and algorithms being taught. There are even fewer opportunities to run the algorithms with different inputs and different parameter values, which is how students discover the features and properties of the algorithms. In addition, students are forced to follow the order imposed by the course instructors and the textbook authors. This ordering induces passive learning, because there is no need for decision-making about pathways through the material.

2) Inadequate imagery. The subject matter of the course is inherently visual, while textbooks are limited to prose and a small number of static images. This problem manifests itself in two principal deficiencies:

a) Few detailed example images. The typical approach followed in the course and textbooks presents algorithms in prose, derives their mathematical properties, and then shows a graphic with an input image and an output result. This is effective, but the density of images showing input, intermediate results, and final result tends to be rather low, probably due to the high cost of working out examples and producing artwork. Since a picture is worth a thousand words, showing an example can convey a concept with an immediate and concrete directness that cannot be rivalled by discourse alone.

b) No time-varying imagery. The course and textbooks each contain perhaps 100 static images. In contrast, during a single minute, a typical computer vision system acquires a time series of nearly 2,000 images. Since motion is a primary visual cue, the restriction to static images is a fundamental limitation.

We have proposed to develop courseware on computer vision with 40 interactive hour-long lessons. This courseware remedies the two fundamental problems through innovative instructional materials that feature interactive learning and multimedia presentation:

1) Interactive materials for "hands-on" learning. It is well established that active involvement in the learning process enables students to learn more effectively and more successfully. We aim for a type of active involvement that is self-determined and self-paced along two dimensions:

a) "Hands-on" demonstrations and exercises. The courseware will provide students with control over parameters used in illustrative examples. For example, in a demonstration of an edge detector the student will first select an image from a database, and then use a slider bar to select the size of the window in the image that is used to compute the edge magnitude and direction. Given these selections, the courseware will then apply the edge detector to the input image and display the computed edges. By experimenting with a number of different window sizes, the student will learn that the smaller the window, the more likely false positives become (image pixels marked as edges but not corresponding to a real edge in the scene).

b) Selecting the order of presentation. The courseware will allow students to follow a self-determined path through the material. For example, the courseware will accommodate both the students who find it more natural to start with theory and then move on to applications, and the students who find it more natural to begin with applications before studying the theory. Further, the courseware will allow students to follow their selected path at their own speed. This allows them to move as quickly or as slowly as they need to in order to develop an understanding of the material.

2) Multimedia presentation with extensive imagery. The courseware materials will consist of digital video, digital audio, text, graphics, and animation. The imagery will include numerous detailed examples illustrating the results of applying computer vision techniques, plus still images and time-varying sequences. This assortment of media will stimulate the attention and interest of many students to a greater extent than do the conventional media such as textbooks, blackboards, and viewgraphs.
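
The edge-detector demonstration described under 1a) can be sketched in a few lines of Python. This is only an illustrative sketch, not the courseware's implementation: the synthetic step-edge image, the difference-of-means gradient estimate, the threshold of 15, and all function names are assumptions introduced here.

```python
import math
import random


def make_step_image(size=32, edge_col=16, step=40.0, sigma=10.0, seed=0):
    """Hypothetical test scene: dark left half, bright right half, plus noise."""
    rng = random.Random(seed)
    return [[(step if x >= edge_col else 0.0) + rng.gauss(0.0, sigma)
             for x in range(size)]
            for _ in range(size)]


def gradient(image, x, y, half):
    """Estimate (gx, gy) at (x, y) from a square window of width 2*half + 1.

    gx is the mean of the window pixels right of centre minus the mean of
    those left of centre; gy does the same for below versus above.
    """
    n = half * (2 * half + 1)  # number of pixels in each half-window
    right = left = below = above = 0.0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            v = image[y + dy][x + dx]
            if dx > 0:
                right += v
            elif dx < 0:
                left += v
            if dy > 0:
                below += v
            elif dy < 0:
                above += v
    return (right - left) / n, (below - above) / n


def detect_edges(image, window, threshold, border):
    """Return {(x, y): (magnitude, direction)} for pixels marked as edges.

    `border` must be at least window // 2 so every window fits in the image.
    """
    half = window // 2
    size = len(image)
    edges = {}
    for y in range(border, size - border):
        for x in range(border, size - border):
            gx, gy = gradient(image, x, y, half)
            magnitude = math.hypot(gx, gy)
            if magnitude > threshold:
                edges[(x, y)] = (magnitude, math.atan2(gy, gx))
    return edges


def count_false_positives(edges, edge_col, margin):
    """Count marked pixels more than `margin` columns away from the true edge."""
    return sum(1 for (x, _y) in edges if abs(x - edge_col) > margin)


if __name__ == "__main__":
    image = make_step_image()
    for window in (3, 7, 11):  # the values a student might pick with the slider
        edges = detect_edges(image, window, threshold=15.0, border=5)
        print(window, count_false_positives(edges, edge_col=16, margin=5))
```

On this noisy step image, the 3-pixel window should report many more false positives than the 7- and 11-pixel windows, because a larger window averages away more of the noise. That is the relationship the student is meant to discover by moving the slider.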

To date we have implemented five hour-long lessons on CD-ROM. In addition, we have conducted two pilot studies of student usage of the courseware. In the process, we developed a number of dynamic examples and "hands-on" exercises involving extensive animation and real-world images. For example, one of these exercises shows how the optical parameters of a lens, namely the aperture diameter, focusing distance, and focal length, affect the images acquired. In this exercise, the student clicks buttons to select values for the three optical parameters, which triggers the display of a real image that was acquired with exactly those parameters. A typical reaction: "Oh! So that's what focal length does: it zooms in and out!" In our experience, this interactive multimedia exercise strongly engages student interest, and illustrates to them the meaning of the optical parameters more effectively than any number of diagrams, equations, and descriptions.
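
The courseware exercise displays real images, so no formula is involved. For readers who want a numerical feel for the same three parameters, the following is a minimal sketch under the idealized thin-lens model, which is a substitution made here and not something the exercise itself uses. The function names and the choice of metres as the unit are assumptions.

```python
def image_distance(focal_length, object_distance):
    """Solve the thin-lens equation 1/f = 1/d + 1/v for the image distance v."""
    if object_distance <= focal_length:
        raise ValueError("object must lie beyond the focal length")
    return focal_length * object_distance / (object_distance - focal_length)


def magnification(focal_length, object_distance):
    """Image size divided by object size for an object in focus."""
    return image_distance(focal_length, object_distance) / object_distance


def blur_circle(focal_length, aperture, focus_distance, object_distance):
    """Diameter of the blur spot for a point at object_distance.

    The sensor sits where objects at focus_distance are sharp; light from a
    point at another distance converges in a different plane, so the cone of
    light from the aperture is cut by the sensor in a disc.
    """
    sensor = image_distance(focal_length, focus_distance)
    sharp = image_distance(focal_length, object_distance)
    return aperture * abs(sensor - sharp) / sharp


if __name__ == "__main__":
    # Longer focal length: larger magnification ("it zooms in").
    print(magnification(0.05, 5.0), magnification(0.10, 5.0))
    # Wider aperture: more blur for an object away from the focusing distance.
    print(blur_circle(0.05, 0.01, 2.0, 5.0), blur_circle(0.05, 0.02, 2.0, 5.0))
    # An object at the focusing distance is sharp regardless of aperture.
    print(blur_circle(0.05, 0.02, 2.0, 2.0))
```

In this model, focal length governs magnification, while aperture diameter and focusing distance together govern how blurred an object at a given distance appears.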

