Automatic Summarization of Student Course Feedback
Wencan Luo†  Fei Liu‡  Zitao Liu†  Diane Litman†
†University of Pittsburgh, Pittsburgh, PA 15260
‡University of Central Florida, Orlando, FL 32716
{wencan, ztliu, litman}@cs.pitt.edu  [email protected]

Proceedings of NAACL-HLT 2016, pages 80–85, San Diego, California, June 12–17, 2016. © 2016 Association for Computational Linguistics

Abstract

Student course feedback is generated daily in both classrooms and online course discussion forums. Traditionally, instructors manually analyze these responses in a costly manner. In this work, we propose a new approach to summarizing student course feedback based on the integer linear programming (ILP) framework. Our approach allows different student responses to share co-occurrence statistics and alleviates sparsity issues. Experimental results on a student feedback corpus show that our approach outperforms a range of baselines in terms of both ROUGE scores and human evaluation.

Prompt: Describe what you found most interesting in today's class
Student Responses:
S1: The main topics of this course seem interesting and correspond with my major (Chemical engineering)
S2: I found the group activity most interesting
S3: Process that make materials
S4: I found the properties of bike elements to be most interesting
S5: How materials are manufactured
S6: Finding out what we will learn in this class was interesting to me
S7: The activity with the bicycle parts
S8: "part of a bike" activity
... (rest omitted, 53 responses in total.)
Reference Summary:
- group activity of analyzing bicycle's parts
- materials processing
- the main topic of this course

Table 1: Example student responses and a reference summary created by the teaching assistant. 'S1'–'S8' are student IDs.

1 Introduction

Instructors love to solicit feedback from students. Rich information from student responses can reveal complex teaching problems, help teachers adjust their teaching strategies, and create more effective teaching and learning experiences. Text-based student feedback is often manually analyzed by teaching evaluation centers in a costly manner. Albeit useful, the approach does not scale well. It is therefore desirable to automatically summarize the student feedback produced in online and offline environments. In this work, student responses are collected from an introductory materials science and engineering course, taught in a classroom setting. Students are presented with prompts after each lecture and asked to provide feedback. These prompts solicit "reflective feedback" (Boud et al., 2013) from the students. An example is presented in Table 1.

In this work, we aim to summarize the student responses. This is formulated as an extractive summarization task, where a set of representative sentences are extracted from student responses to form a textual summary. One of the challenges of summarizing student feedback is its lexical variety. For example, in Table 1, "bike elements" (S4) and "bicycle parts" (S7), and "the main topics of this course" (S1) and "what we will learn in this class" (S6), are different expressions that communicate the same or similar meanings. In fact, we observe that 97% of the bigrams appear only once or twice in the student feedback corpus (§4), whereas in a typical news dataset (DUC 2004), the figure is about 80%. To tackle this challenge, we propose a new approach to summarizing student feedback, which extends the standard ILP framework by approximating the co-occurrence matrix with a low-rank alternative. The resulting system allows sentences authored by different students to share co-occurrence statistics. For example, "The activity with the bicycle parts" (S7) will be allowed to partially contain "bike elements" (S4) although the latter did not appear in the sentence. Experiments show that our approach produces better results on the student feedback summarization task in terms of both ROUGE scores and human evaluation.
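The sparsity statistic above (the fraction of distinct bigrams occurring only once or twice) can be computed with a simple frequency count. A minimal sketch in Python; the toy responses, whitespace tokenization, and threshold are illustrative assumptions, not the paper's exact preprocessing:

```python
from collections import Counter

def bigram_sparsity(sentences, max_count=2):
    """Fraction of distinct bigrams occurring at most `max_count` times."""
    counts = Counter()
    for s in sentences:
        tokens = s.lower().split()
        counts.update(zip(tokens, tokens[1:]))
    rare = sum(1 for c in counts.values() if c <= max_count)
    return rare / len(counts) if counts else 0.0

responses = [
    "the group activity was interesting",
    "the group activity with the bicycle parts",
    "properties of bike elements",
]
print(bigram_sparsity(responses))  # every bigram in this tiny corpus is rare
```

Even this toy corpus shows the effect the authors report: with short, varied responses, nearly every bigram is rare.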
2 ILP Formulation

Let D be a set of student responses that consist of M sentences in total. Let y_j ∈ {0, 1}, j = 1, ..., M indicate whether sentence j is selected (y_j = 1) or not (y_j = 0) in the summary. Similarly, let N be the number of unique concepts in D, and let z_i ∈ {0, 1}, i = 1, ..., N indicate the appearance of concept i in the summary. Each concept i is assigned a weight w_i, often measured by the number of sentences or documents that contain the concept. The ILP-based summarization approach (Gillick and Favre, 2009) searches for an optimal assignment to the sentence and concept variables so that the selected summary sentences maximize coverage of important concepts. The relationship between concepts and sentences is captured by a co-occurrence matrix A ∈ R^{N×M}, where A_ij = 1 indicates that the i-th concept appears in the j-th sentence, and A_ij = 0 otherwise. In the literature, bigrams are frequently used as a surrogate for concepts (Gillick et al., 2008; Berg-Kirkpatrick et al., 2011). We follow this convention and use 'concept' and 'bigram' interchangeably in this paper.

\max_{y,z} \sum_{i=1}^{N} w_i z_i    (1)

\text{s.t.} \quad \sum_{j=1}^{M} A_{ij} y_j \ge z_i    (2)

A_{ij} y_j \le z_i    (3)

\sum_{j=1}^{M} l_j y_j \le L    (4)

y_j \in \{0, 1\}, \quad z_i \in \{0, 1\}    (5)

Two sets of linear constraints are specified to ensure the validity of the ILP: (1) a concept is selected if and only if at least one sentence carrying it has been selected (Eq. 2), and (2) all concepts in a sentence will be selected if that sentence is selected (Eq. 3). Finally, the selected summary sentences are allowed to contain a total of L words or less (Eq. 4), where l_j denotes the number of words in sentence j.
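On a small instance, the program in Eqs. 1–5 can be solved exactly by enumerating sentence subsets. A minimal Python sketch; the toy matrix, weights, lengths, and budget are illustrative, not drawn from the paper's corpus:

```python
from itertools import product

def solve_concept_ilp(A, w, lengths, L):
    """Exhaustively solve Eqs. 1-5: choose sentences (columns of A) whose
    total length is <= L, maximizing the summed weight of covered concepts."""
    N, M = len(A), len(A[0])
    best_y, best_score = None, -1.0
    for y in product([0, 1], repeat=M):
        if sum(l * yj for l, yj in zip(lengths, y)) > L:
            continue  # violates the length budget (Eq. 4)
        # Constraints (2)-(3) force z_i = 1 exactly when some selected
        # sentence contains concept i, so the score is coverage-weighted.
        score = sum(w[i] for i in range(N)
                    if any(A[i][j] and y[j] for j in range(M)))
        if score > best_score:
            best_y, best_score = y, score
    return best_y, best_score

# 3 concepts x 4 sentences (binary co-occurrence), concept weights,
# per-sentence word counts, and a budget of L = 6 words.
A = [[1, 0, 0, 1],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
w = [3.0, 2.0, 1.0]
lengths = [4, 3, 2, 5]
y, score = solve_concept_ilp(A, w, lengths, L=6)
print(y, score)  # the single 5-word sentence covers the two heaviest concepts
```

In practice an off-the-shelf ILP solver replaces this exponential enumeration; the brute force is only meant to make the objective and constraints concrete.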
3 Our Approach

Because of the lexical diversity in student responses, we suspect the co-occurrence matrix A may not establish a faithful correspondence between sentences and concepts. A concept may be conveyed using multiple bigram expressions; however, the current co-occurrence matrix only captures a binary relationship between sentences and bigrams. For example, we ought to give partial credit to "bicycle parts" (S7) given that a similar expression, "bike elements" (S4), appears in the sentence. Domain-specific synonyms may be captured as well. For example, the sentence "I tried to follow along but I couldn't grasp the concepts" is expected to partially contain the concept "understand the", although the latter did not appear in the sentence.

The existing matrix A is highly sparse. Only 2.7% of the entries are non-zero in our dataset (§4). We therefore propose to impute the co-occurrence matrix by filling in missing values. This is accomplished by approximating the original co-occurrence matrix with a low-rank matrix. The low-rankness encourages similar concepts to be shared across sentences. The data imputation process makes two notable changes to the existing ILP framework. First, it extends the domain of A_ij from binary to the continuous scale [0, 1] (Eq. 2), which offers a better sentence-level semantic representation. Second, the binary concept variables (z_i) are also relaxed to the continuous domain [0, 1] (Eq. 5), which allows concepts to be "partially" included in the summary.

Concretely, given the co-occurrence matrix A ∈ R^{N×M}, we aim to find a low-rank matrix B ∈ R^{N×M} whose values are close to A at the observed positions. Our objective function is

\min_{B \in \mathbb{R}^{N \times M}} \frac{1}{2} \sum_{(i,j) \in \Omega} (A_{ij} - B_{ij})^2 + \lambda \|B\|_*    (6)

where Ω represents the set of observed value positions and ‖B‖_* denotes the trace norm of B, i.e., ‖B‖_* = \sum_{i=1}^{r} σ_i, where r is the rank of B and σ_i are the singular values.
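The trace norm in Eq. 6 is simply the sum of singular values, which a quick NumPy check makes concrete (the toy matrix is illustrative):

```python
import numpy as np

# Trace (nuclear) norm ||B||_* = sum of the singular values of B.
B = np.array([[2.0, 0.0],
              [0.0, 3.0]])
trace_norm = np.linalg.svd(B, compute_uv=False).sum()
print(trace_norm)  # singular values are 3 and 2, so ||B||_* = 5.0
```

Penalizing this sum shrinks small singular values toward zero, which is what drives the solution of Eq. 6 toward low rank.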
By defining the projection operator P_Ω as

[P_\Omega(B)]_{ij} = \begin{cases} B_{ij} & (i,j) \in \Omega \\ 0 & (i,j) \notin \Omega \end{cases}    (7)

our objective function (Eq. 6) can be succinctly represented as

\min_{B \in \mathbb{R}^{N \times M}} \frac{1}{2} \|P_\Omega(A) - P_\Omega(B)\|_F^2 + \lambda \|B\|_*    (8)

where ‖·‖_F denotes the Frobenius norm. Following (Mazumder et al., 2010), we optimize Eq. 8 using the proximal gradient descent algorithm. The update rule is

B^{(k+1)} = \mathrm{prox}_{\lambda \rho_k}\big(B^{(k)} + \rho_k (P_\Omega(A) - P_\Omega(B^{(k)}))\big)    (9)

where ρ_k is the step size at iteration k and the proximal function prox_t(B) is defined as the singular value soft-thresholding operator, prox_t(B) = U · diag((σ_i − t)_+) · V^⊤, where B = U diag(σ_1, ..., σ_r) V^⊤ is the singular value decomposition (SVD) of B and (x)_+ = max(x, 0). Since the gradient of ½‖P_Ω(A) − P_Ω(B)‖_F² is Lipschitz continuous with Lipschitz constant 1, we follow (Mazumder et al., 2010) and choose the fixed step size ρ_k = 1.
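With ρ_k = 1, the update in Eq. 9 amounts to: overwrite the observed entries of the current iterate with those of A, take an SVD, and soft-threshold the singular values. A minimal NumPy sketch under assumed toy data (the matrix, mask, λ, and iteration count are all illustrative):

```python
import numpy as np

def soft_impute(A, mask, lam=0.1, iters=200):
    """Proximal gradient for  min_B 0.5*||P(A) - P(B)||_F^2 + lam*||B||_*
    with fixed step size 1 (Eq. 9): a gradient step on the observed
    entries, followed by singular value soft-thresholding."""
    B = np.zeros_like(A, dtype=float)
    for _ in range(iters):
        G = B + mask * (A - B)  # B^(k) + (P_Omega(A) - P_Omega(B^(k)))
        U, s, Vt = np.linalg.svd(G, full_matrices=False)
        B = U @ np.diag(np.maximum(s - lam, 0.0)) @ Vt  # prox_lam(G)
    return B

# Toy 4x4 rank-1 "co-occurrence" matrix with two entries unobserved.
A = np.outer([1.0, 0.5, 0.0, 1.0], [1.0, 0.0, 1.0, 0.5])
mask = np.ones_like(A)
mask[0, 2] = mask[3, 1] = 0.0  # hide these positions from the solver
B = soft_impute(A, mask, lam=0.05)
print(round(float(B[0, 2]), 2))  # imputed value, close to the hidden A[0, 2] = 1.0
```

Because the toy matrix is rank-1, the low-rank prior fills in the hidden entry from the surrounding rows and columns, which is exactly the behavior the imputation step relies on for sharing co-occurrence statistics across sentences.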
4 Dataset

The reference summaries are created by a teaching assistant. She is allowed to create abstract summaries using her own words in addition to selecting phrases directly from the responses. Because summary annotation is costly and recruiting annotators with a proper background is nontrivial, 12 out of the 25 lectures are annotated with reference summaries. There is one gold-standard summary per lecture and question prompt, yielding 36 document-summary pairs. On average, a reference summary contains 30 words, corresponding to 7.9% of the total words in the student responses. 43.5% of the bigrams in the human summaries appear in the responses.

5 Experiments

Our proposed approach is compared against a range of baselines: 1) MEAD (Radev et al., 2004), a centroid-based summarization system that scores sentences based on length, centroid, and position; 2) LEXRANK (Erkan and Radev, 2004), a graph-based summarization approach based on eigenvector centrality; 3) SUMBASIC (Vanderwende et al., 2007), an approach that assumes words occurring frequently in a document cluster have a higher chance of being included in the summary; and 4) BASELINE-ILP (Berg-Kirkpatrick et al., 2011), a baseline ILP framework without data imputation.
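For reference, the frequency-based selection idea behind SUMBASIC can be sketched in a few lines. This is a simplified rendition of the published algorithm, not the authors' implementation: sentences are scored by average word probability, and the probabilities of words in a chosen sentence are squared to down-weight redundant content. The toy responses are illustrative:

```python
from collections import Counter

def sumbasic(sentences, n_select=2):
    """Simplified SumBasic: repeatedly pick the sentence with the highest
    average word probability, then square the probabilities of its words
    so subsequent picks favor uncovered content."""
    tokenized = [s.lower().split() for s in sentences]
    counts = Counter(w for toks in tokenized for w in toks)
    total = sum(counts.values())
    prob = {w: c / total for w, c in counts.items()}
    chosen, candidates = [], list(range(len(sentences)))
    for _ in range(n_select):
        best = max(candidates,
                   key=lambda j: sum(prob[w] for w in tokenized[j]) / len(tokenized[j]))
        chosen.append(sentences[best])
        candidates.remove(best)
        for w in tokenized[best]:
            prob[w] **= 2  # down-weight words already in the summary
        if not candidates:
            break
    return chosen

docs = [
    "the group activity was interesting",
    "the bicycle activity was fun",
    "materials processing was discussed",
]
print(sumbasic(docs, n_select=2))
```

Note how, after the first pick, the down-weighting steers the second pick toward the "materials processing" sentence rather than the lexically similar bicycle sentence — the redundancy control that distinguishes SumBasic from raw frequency ranking.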