Non-Sequential Authoring of Handwritten Video Lectures with Pentimento

Online Submission ID: 549

Figure 1 panels:
(a) User records temporal lecture, runs out of space; fortunately, can write in margin.
(b) Retroactively edits layout; timing is preserved.
(c) Records more derivation; decides extra step is needed.
(d) Moves content to make space for new step.
(e) Moves to insertion time; records new step.
(f) New step properly inserted; finishes derivation.
(g) Wants to change µ; makes space.
(h) Redraws E(x) to replace µ; rest of timing is preserved.

Figure 1: Authoring of a handwritten video lecture using Pentimento. Users record their lecture, in which strokes are drawn over time with a pen-tablet interface while audio is recorded. At any time, they can perform retroactive edits with the flexibility of static 2D vector graphics. However, these operations also preserve temporal information and synchronization with audio. These edits would be challenging with current screen-capture approaches because they lack a sparse and structured space-time representation of the data. Other asynchronous recording scenarios are possible, such as audio first or visuals first, and can be combined iteratively.

Abstract

We facilitate the authoring of handwritten educational video lectures and seek to combine the advantages of vector graphics and video editing. Handwritten videos are currently recorded using video screen capture, which makes editing challenging. We believe that the power of digital content creation is in non-sequential iterative refinement. Teachers should be able to start a video lecture without a perfect script, and they should be able to improve their material over the year. If they want to add an extra line in a derivation, they should not need to restart a video lecture from scratch. We make the editing of audio and visuals as orthogonal as possible by decoupling the notion of time of these two modalities. To maintain synchronization and provide fine-grained control, we rely on a simple retiming data structure that maps from audio time to visual time and is controlled by point-wise correspondences. This also facilitates silence suppression and the corresponding speedup of visuals. We identify and implement four types of retroactive editing operations, which can modify content after recording. The user can 1/ change the layout, 2/ insert content back in time while respecting the other modality when inserting only audio or visuals, 3/ replace audio or visuals while preserving synchronization with the untouched modality, and 4/ shift content in time. Our approach supports a variety of recording scenarios in which audio and visuals can be recorded synchronously or independently.

Keywords: Pen, education, authoring, 2D animation

1 Introduction

Our work facilitates the creation of handwritten video lectures, in the style of the virtual white board popularized by, e.g., the Khan Academy [Khan 2012]. These videos feature text and diagrams captured as the teacher writes, synchronized with a narration. Unfortunately, their authoring currently relies on archaic technology that resembles a typewriter more than a modern word processor. When recording digitally, authors use screen-capture software and draw with a painting program, which makes it difficult to correct or edit content after recording. We want to combine the benefits of 2D vector graphics and non-linear video editing to make non-sequential and iterative authoring easy.

Consider the variance derivation in the teaser (Fig. 1), recorded with our approach. After recording the audio and visuals, the instructor later decided to replace µ by E[x] (Fig. 1f-h). This requires shifting strokes to the right, going back in time, and replacing the (temporal) strokes corresponding to µ by the longer writing E(x), all the while preserving synchronization with the audio. 2D vector graphics applications make this type of spatial editing trivial, but they do not support temporal content synchronized with audio. Video editing software supports editing and synchronization but requires tedious effort because visual and temporal content is dense and unstructured. Similarly, when the lecturer writes an equation too large and runs out of space (Fig. 1a), there is traditionally little that can be done in post-production; they must restart this segment from scratch and composite it spatially in the video editor. The creation of temporal handwritten content should be as easy as the creation of static 2D vector graphics, allowing for non-sequential recording and iterative refinements.

An additional challenge stems from the difficulty of keeping audio and visuals in sync. First, synchronously recording both modalities is cognitively difficult. Second, they have different natural speeds. The writing often needs to catch up with the narration, leading to awkward silences: video editors working on education content have told us that they spend most of their time suppressing silences and speeding up visuals. It is also often useful to shift timing to make sure that a new term gets written at the time when it is uttered. These edits, as well as the asynchronous recording of visuals and audio, are possible using non-linear video editors, but they remain tedious because visuals and audio need to be sliced up finely and the speed of video segments must be adjusted to match the audio. However, we observe that, compared to traditional video editing, the time scale of the visuals in handwritten lectures is more flexible and the recorded writing speed rarely needs to be respected, which provides us with great flexibility. We want to make the synchronization of audio and visuals direct and flexible, and let the user explicitly specify correspondences rather than edit the duration of disconnected segments.

Flexible synchronization is also critical to allow users to edit or record one modality (audio or visuals) without worrying about the other one. Consider the scenario where a user has recorded synchronized audio and visuals, then realizes that she needs to insert new visuals without changing the audio because her diagram is incomplete. Time needs to be made to accommodate the new visuals, but without modifying the audio, which requires speeding up other visuals. It is, however, unclear how far from the insertion time the visuals should be sped up, and the user might want to edit the acceleration later.

Our work makes a step towards the flexible creation of hand-drawn lectures composed of strokes and optional background slides. We leave geometric shapes, typefaced text, and video as future work.

In order to make the editing of different modalities orthogonal, we decouple the time dimension of audio and visuals. To maintain synchronization, we introduce a simple retiming data structure based on a list of correspondences between audio and visual time values. This retiming enables a number of editing operations while respecting synchronization. It also permits flexible synchronization, in particular, the acceleration of handwriting to keep up with narration.

The Flash animation language is often used to display the creation of drawings on the web, e.g. [Sclipo 2008], but the ability to record and edit freehand drawings is limited. In fact, authors often use masks to reveal static drawings or hand-written text, e.g. [Adriatic11 2008]. Vector graphics formats such as SVG have also been extended to the temporal dimension, e.g. [W3C 2011], but there is limited support for editing and audio synchronization. Loviscach et al. developed a technique to automatically generate a handwritten-looking animation [2011]. Other tools focus on pre-recorded clipart, e.g. [Sparkol 2013].

Most handwritten video lectures are screencasts captured during a live session, e.g. [Talbert 2011]. A number of apps, e.g. [Tablo 2012; Showme 2013; Queeky 2013; Everything 2013], and electronic pens or white boards, e.g. [Livescribe 2013], enable the recording of freehand drawing for applications such as education, but they usually do not offer post-editing.

Smart tablets for education. Most work on tablets and sketching for education has focused on use by students, on artificial intelligence to understand input handwriting or sketches, and on feedback or simulation of the depicted configuration, e.g. [Zeleznik et al. 2008; LaViola Jr. and Zeleznik 2004; Hammond and Davis 2003; Alvarado et al. 2001]. In contrast, we focus on teachers; we do not want to embed intelligence in our software and instead focus on the flexibility of editing a recorded handwritten tutorial.

Pen and audio notes. The integration of handwriting and audio is also powerful in the context of white board meetings, e.g. [Pedersen et al. 1993; Moran et al. 1997], and note taking, e.g. [Wilcox et al. 1997; Stifelman et al. 2001]. The focus in this context is usually automatic organization and retrieval, while we cater to users intent on creating tightly synchronized content. However, for the creation of longer lectures, our users could benefit from powerful search and other advanced retrieval to better navigate a time line.
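To make the retiming idea from the introduction concrete, the following is a minimal sketch of a map from audio time to visual time controlled by point-wise correspondences. It is an illustration under stated assumptions, not the system's actual implementation: the class name `Retiming`, its methods, linear interpolation between correspondences, and clamping outside the first and last correspondence are all hypothetical choices that the text above does not specify.

```python
from bisect import bisect_right


class Retiming:
    """Hypothetical audio-to-visual time map built from correspondences.

    Assumption: time between two correspondences is interpolated linearly,
    and queries outside the covered audio range are clamped.
    """

    def __init__(self, correspondences):
        self.points = []  # sorted list of (audio_time, visual_time) pairs
        for audio_t, visual_t in sorted(correspondences):
            self.add(audio_t, visual_t)

    def add(self, audio_t, visual_t):
        """Insert a correspondence, rejecting ones that would reorder visuals."""
        i = bisect_right([a for a, _ in self.points], audio_t)
        before = self.points[i - 1] if i > 0 else None
        after = self.points[i] if i < len(self.points) else None
        if before is not None and (before[0] == audio_t or before[1] > visual_t):
            raise ValueError("correspondence conflicts with an earlier one")
        if after is not None and after[1] < visual_t:
            raise ValueError("correspondence conflicts with a later one")
        self.points.insert(i, (audio_t, visual_t))

    def visual_time(self, audio_t):
        """Return the visual time shown at the given audio time."""
        if not self.points:
            raise ValueError("no correspondences defined")
        audio = [a for a, _ in self.points]
        if audio_t <= audio[0]:
            return self.points[0][1]
        if audio_t >= audio[-1]:
            return self.points[-1][1]
        i = bisect_right(audio, audio_t)
        (a0, v0), (a1, v1) = self.points[i - 1], self.points[i]
        w = (audio_t - a0) / (a1 - a0)
        return v0 + w * (v1 - v0)


# Visuals and audio are in sync for 4 s; the next 6 s of writing are then
# compressed into 2 s of audio, i.e. the handwriting plays three times faster.
retiming = Retiming([(0.0, 0.0), (4.0, 4.0), (6.0, 10.0)])
print(retiming.visual_time(2.0))
print(retiming.visual_time(5.0))
```

In this sketch, adding or moving a single correspondence changes only the playback speed of the two neighboring intervals, which is one way to read the goal of letting the user specify correspondences directly instead of editing the durations of disconnected segments.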
