A Software-Only Video Production Switcher for the Internet MBone
Tina Wong Ketan Mayer-Patel David Simpson Lawrence A. Rowe
Computer Science Division
University of California, Berkeley
ftwong,kpatel,davesimp,[email protected]
T. Wong is supported by a GAANN fellowship.

ABSTRACT

In this paper, we describe the design and implementation of a software video production switcher, vps, that improves the quality of MBone broadcasts. vps is modeled after the broadcast television industry's studio production switcher. It provides special effects processing to incorporate audience discussions, add titles and other information, and integrate stored videos into the presentation. vps is structured to work with other MBone conferencing tools. The ultimate goal is to automate the production of MBone broadcasts.

1 INTRODUCTION

Live programs are produced and broadcast worldwide on the Internet MBone using IP Multicast [1] and the MBone [2] conferencing tools (e.g., vic [8], vat [7], wb [3], sdr [5], etc.). Some examples are the NASA Space Shuttle Missions, conference presentations (e.g., Sixth International WWW Conference), and live music performances. These broadcasts usually have audiences ranging from tens to hundreds of viewers distributed worldwide.

We have broadcast the weekly Berkeley Multimedia and Graphics Seminar on the MBone since early 1995. The seminar is produced using the LBL/UCB MBone tools vic and vat to capture and transmit video and audio streams, respectively. The shared whiteboard tool wb is used to distribute postscript slides. A second wb is used by the broadcast director (hereafter, director) to communicate with participants in order to debug problems with the transmission and monitor video and audio quality. We are also testing a floor control tool qb [6] to facilitate question asking. The current broadcast uses a single camera that mainly focuses on the speaker, and occasionally pans to show other materials such as slides on the overhead projector, a demo running on a workstation, or members of the local audience. This single camera approach is the most common configuration seen in low-budget, small-scale broadcasts on the MBone.

We are working on tools to improve the quality and simplify the production and control of MBone broadcasts. This paper describes the design and implementation of a software-only video production switcher (vps) that can be used to improve the quality of an MBone broadcast. vps is modeled after a studio production switcher [9] used in the broadcast television industry. A studio production switcher is a custom-designed hardware device that provides an array of real-time editing and special effects functions. A director can select one of several picture sources (e.g., cameras, videotapes, and still image displays) to be the output. Other sources are generated by the device by applying special effects processing to one or more streams such as inserting titles into a picture, superimposing one picture on another, chroma-keying, and wiping or fading from one picture to another.

vps will enhance the quality of an MBone broadcast by providing effects available in a hardware switcher. Specifically, we want to display local and remote audience discussions and feedback, add titles and credits, integrate stored analog and digital videos into the presentation, and incorporate special effects to improve the visual images and retain audience attention. Ultimately, our goal is to automate the production process by integrating vps with a broadcast management system [13] that maintains room, equipment and broadcast configurations, observes event schedules, and launches and monitors the MBone tools required to produce a
broadcast. Figure 1 shows how vps fits into the context of a typical MBone broadcast. The Studio MBone is a local domain network connecting the processes required to produce the broadcast. It can support high data rates (e.g., 5 to 30 Mb/s) and good quality video streams (e.g., MJPEG video). The public MBone is the Internet and it runs at a considerably lower speed.

[Figure 1 diagram; recoverable labels: production switcher (vps), drawing tool (wb), video capture (vic), audio capture (vat), floor control (qb), recording tool (mbr), video transcoding (rtpgw), Studio MBone, Internet MBone.]
Figure 1: vps in an MBone Broadcast.

A hardware production switcher is a highly developed technology that could be used in an MBone broadcast. However, this solution has several limitations. First, video sources must be converted to a switch-specific analog format before being passed to the switcher and converted back to a digital format and encapsulated as RTP data [11] after processing so that it can be sent to the MBone. vps avoids these conversions by operating on video streams in the RTP representation. Second, a hardware system is not extensible. vps is designed in a modular manner to allow new effects to be added to the system. Third, vps can be controlled by other software to automate decisions by a director through reactive software heuristic technologies. Finally, a hardware switcher has only one user interface. vps can be operated by many GUIs ranging from a simple interface designed for a speaker to a sophisticated interface designed for a skilled director. Moreover, interfaces can be customized for different users.

This paper describes the design of vps including the GUI interface and the implementation of the current system. It is organized as follows. Section 2 presents an example of vps in use. Section 3 describes the vps software architecture. The implementation of vps is described in section 4. Section 5 talks about future work and section 6 summarizes the paper.

2 AN EXAMPLE SCENARIO

We describe how vps can improve the quality of an MBone broadcast by illustrating its use through an example scenario.

Before beginning the scenario, we first describe the two GUI interfaces to vps: the director's console and the speaker's console, shown in Figures 2 and 3, respectively. The director's console produces the content of the broadcast. The main window of this console has an editor area at the top and a preview area at the bottom. The director uses the editor area to choose a specific effect editor and to configure the parameters of that effect. The preview area shows thumbnails of video sources including the results of applying an effect. The director can click on a thumbnail to see more information about that video. The output window of the director's console shows the current video being broadcast. This window also describes the broadcast multicast session, if applicable. The speaker's console allows the speaker to incorporate stored videos into the lecture. It has a preview area similar to the director's console. The speaker clicks on a thumbnail
to select and bring up a VCR-like player to play back a video.

Figure 2: The Director's Console.

The following shows how vps can be used in this scenario. Suppose a seminar is being conducted on the Berkeley campus. Students on campus attend the seminar in the lecture room, and remote viewers join in virtually by watching the broadcast on the MBone.

Figure 3: The Speaker's Console.

Viewing Sources and Monitoring the Broadcast

Suppose there are two cameras in the seminar room: one focusing on the speaker and one facing the audience. Suppose further that participants at several remote locations also have digital cameras attached to their workstations, and the speaker has several videos to accompany her lecture. [Footnote 1: These videos might be stored on a video file server or replayed on a VCR.] The director uses these videos to produce the content of the broadcast. He previews them in the preview area of the director's console. He also monitors the broadcast with the output window. New video sources can be added at any time during the broadcast. For example, a remote viewer who joins late can still be a vps source and part of the lecture broadcast.

Beginning a Lecture

A short time before the lecture starts, the director switches from a still image that identifies the program to a picture showing the speaker. He uses the cut editor of the director's console to select this picture and switch sources. He then uses the subtitle editor to insert the seminar title and the speaker name onto the picture. After a minute or so, the director removes the titles by switching to the original picture. Figure 4 shows screen shots that illustrate the opening of the lecture.
Figure 4: Producing Opening Phase of Lecture.

Playing Stored Video

At some point in the lecture, the speaker wants to show a video. She uses the speaker's console to select and play the video. Depending on her pace, she can use the VCR-like controls to play, stop, rewind, fast-forward or restart the video. The director's console provides the same playback facilities so the director can assume this task. See Figure 5 for a screen shot of the VCR-like player.

Incorporating Audience Discussions

A local audience member raises his hand to notify the speaker that he has a question. A few remote viewers also indicate their desire to ask a question or comment by entering a request into the floor control tool. The floor moderator signals a remote viewer to ask her question. The director notices that a video of this remote viewer is available because it is being previewed on the director's console. The floor control tool might send a "grant floor" message to all tools. The director's or speaker's
console can display a thumbnail in the preview area when it receives this message if it includes a video source or still image. vps could automatically switch to a stream that showed the speaker and questioner in side-by-side windows. This example illustrates automatic switching. This action can also be invoked manually by using the picture-in-picture (PIP) editor. See Figure 6 for screen shots.

Figure 5: VCR-like Player.

Figure 6: Incorporating Audience Discussions.

Finishing a Lecture

At the end of the lecture, the director uses the fade editor to execute a fade transition from the speaker to a black screen. On the black screen, he uses the subtitle editor to put up acknowledgments and credits thanking the speaker and the people organizing the seminar, along with an advertisement for the next event. Figure 7 shows an example.

3 SOFTWARE ARCHITECTURE

This section describes the vps architecture. vps is composed of multiple processes: a video file server, a broadcaster, one or more effects (fx) processors managed by an fx server, and two user interfaces. These processes exchange control messages and transmit video data to each other using RTP over IP Multicast on the Studio MBone, and receive data from the public MBone. Figure 8 illustrates the processes in this software architecture.

vps is decomposed into multiple processes in order to build a distributed system which can utilize more resources, facilitate future extensions in effects processing, and ease modifications to the user interfaces. The system is implemented with the Continuous Media Toolkit (CMT) [4] which is described in more detail in section 4.

3.1 Studio MBone

The Studio MBone is an RTP session [11] with a single multicast group and two port numbers on which vps processes transmit video data and control messages. [Footnote 2: This multicast address can be chosen through the session directory protocol [5] to avoid conflicting with other multicast groups. Since the Studio MBone spans Berkeley, we only need to allocate a multicast address not in use on campus.] This multicast address is well known to the processes and can be configured as a command-line argument at system startup. One
port number is used for data and the other for control. Administrative scoping or the time-to-live (ttl) field in the RTP session is set to reach all processes in the system. For our broadcasts, the Studio MBone spans our building.

Figure 7: Producing Closing Phase of Lecture.

We designed the system so that control messages serve as an interface among the processes. Consequently, internal changes to a process do not affect other processes as long as these messages remain the same. Control messages provide coordination among vps processes. In the current system, messages flow from the user interfaces to the other processes. They request effects processing from the fx server, configure parameters at the broadcaster, notify the broadcaster to switch to another video source, and control video playback at the video file server. Table 1 lists the control messages used in the current system. Although these messages could be unicast to the appropriate destinations, we believe multicast will be more efficient when the system is integrated with other MBone tools. For example, as the floor control tool grants the floor to
an audience member, it multicasts a message on the Studio MBone so the responsible vps processes can react to the message by switching to the correct video to broadcast and/or requesting effects processing. The Studio MBone is connected to the public MBone by a multicast router. This way, streams from remote participants are automatically passed to the Studio MBone.

Available video streams are source videos and result streams from effects processing. They are multicast on the Studio MBone instead of unicast because multiple processes usually need the same video at one time. For example, the result of an fx processor is needed by the preview area at the director's console, another fx processor for other kinds of processing, and the broadcaster to output to the MBone, all at the same time.

3.2 Stored and Live Video Server

The video file server process serves stored digital videos to other vps processes. Stored video playback is controlled by the user interfaces which send control messages to the server. The server plays a video by multicasting the appropriate streams on the Studio MBone.

Live videos originating from the local studio (e.g., camera feeds from the lecture room) or from other video feeds (e.g., cable or satellite receivers) are "served" by the Studio MBone in the sense that the streams are multicast on the associated RTP session. Live videos from remote participants (e.g., cameras attached to student workstations) are sent on a separate RTP session on the public MBone. These streams are multicast on different addresses so that local data and control messages are not forwarded to the public MBone in order to avoid wasting valuable bandwidth. We distinguish each video source within an RTP session with the unique synchronization source identifier field (ssrc) specified in the RTP header.

3.3 Broadcaster

The broadcaster process multicasts the vps output to the public MBone at the address and port number advertised for the broadcast. The director using the director's console selects a video to be the output and sends a control message to tell the broadcaster to carry out the switching between streams.

3.4 FX Processor and FX Server

An fx processor manipulates one or more video streams to generate special effects. The effects supported by the current vps are fade, mix, picture-in-picture (PIP) and subtitle. The fade and mix effects are implemented in the compressed MJPEG domain which means the streams are not fully decoded before being processed [12]. PIP and subtitle are implemented in the uncompressed YUV domain which means the streams must be fully decoded before being processed. More details about the effects processing algorithms are presented in the next section.

There can be more than one fx processor depending on the computing resources available. The collection of fx processors is managed by the fx server. It communicates with the other vps processes, accepting processing requests from the director's console and assigning them to an fx processor. Fx processors are scheduled using round robin scheduling to ensure load balancing. The result of effects processing is multicast onto the Studio MBone so all vps processes can utilize it. For example, the director's console previews the result before it is switched to the output, and another fx processor uses the result as an input to a different effect.

This design was chosen so the fx server and fx processors can be easily extended without requiring the other to be significantly rewritten or affecting other vps processes. Changes in the load balancing policy in the fx server do not affect the internals of the fx processors. Likewise, modifications to the processors, such as implementing effects processing in the raw or compressed domain, are isolated.

To add a new type of effects processing such as chroma-key, which is common in television weather forecasts, we would need to include the code to implement this processing into the fx processor, extend the control messages so that the director's console can request the effect, and implement a chroma-key editor so the director can control the effect.

3.5 User Interfaces

As described in the scenario, there are two user interfaces in vps. The director's console is the main control center which provides a set of primitives to manipulate vps, and the speaker's console which
[Figure 8 diagram; recoverable labels: fx processors, fx server, vic, broadcaster, rtpgw, video file server, UI (director's console), UI (speaker's console), Studio MBone, Internet MBone; legend: video, control.]
Figure 8: vps Software Architecture.
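Section 3.4 states that the fx server assigns processing requests to fx processors using round robin scheduling. The following is a minimal Python sketch of that policy only. The class name FxServer, the method assign, and the processor names are hypothetical and chosen for illustration; the actual vps code is written in OTcl on top of CMT and is not reproduced here.

```python
from itertools import cycle


class FxServer:
    """Hypothetical stand-in for the fx server's dispatch policy."""

    def __init__(self, processors):
        if not processors:
            raise ValueError("at least one fx processor is required")
        # cycle() yields the processors in order, forever: a round-robin iterator.
        self._next = cycle(processors)

    def assign(self, request):
        """Return (processor, request) for the next processor in turn."""
        return next(self._next), request


server = FxServer(["fx0", "fx1", "fx2"])
requests = ["fade", "mix", "pip", "subtitle"]
assignments = [server.assign(effect)[0] for effect in requests]
```

Because the policy lives entirely in the server object, it could be swapped for a load-aware one without touching the processors, which is the isolation property the section describes.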
sender               receiver            control name   parameters
Director's Console   Fx Server           Processing     FX Name, Fx Params
Director's Console   Video File Server   Playback       File ID, Playback Speed
Speaker's Console    Video File Server   Playback       File ID, Playback Speed
Director's Console   Broadcaster         Switch         Video ID
Director's Console   Broadcaster         Configure      address/port/ttl

Table 1: Control messages.
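The rows of Table 1 can be read as a small routing table keyed by sender and control name. The Python sketch below encodes it that way, purely as an illustration of the interface idea. The identifiers (ROUTES, ControlMessage, the lower-case process and parameter names) are assumptions, as is splitting "address/port/ttl" into three parameters; the paper does not give a wire format.

```python
from dataclasses import dataclass

# (sender, control name) -> (receiver, parameter names), transcribed from Table 1.
# The identifier spellings are hypothetical.
ROUTES = {
    ("director", "Processing"): ("fx_server", ("fx_name", "fx_params")),
    ("director", "Playback"): ("video_file_server", ("file_id", "playback_speed")),
    ("speaker", "Playback"): ("video_file_server", ("file_id", "playback_speed")),
    ("director", "Switch"): ("broadcaster", ("video_id",)),
    ("director", "Configure"): ("broadcaster", ("address", "port", "ttl")),
}


@dataclass(frozen=True)
class ControlMessage:
    sender: str
    control: str
    params: dict

    def receiver(self):
        """Look up the destination process and check the parameter names."""
        try:
            receiver, expected = ROUTES[(self.sender, self.control)]
        except KeyError:
            raise ValueError(f"{self.sender} may not send {self.control}") from None
        if set(self.params) != set(expected):
            raise ValueError(f"{self.control} expects parameters {expected}")
        return receiver


switch = ControlMessage("director", "Switch", {"video_id": 326232628})
destination = switch.receiver()
```

Keeping the message set in one table mirrors the design point in Section 3.1: a process can change internally as long as the entries it sends and receives stay the same.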
allows the lecturer to integrate stored video into the presentation. Two separate interfaces are provided so that the production process and the lecture can be going on in different rooms. They are written in Tcl/Tk [10] and OTcl [14] and are easy to modify to incorporate better UI designs, as we get more experience using them, and new editors when new effects are included.

The thumbnail previews in the director's and speaker's console are "optimized" in the sense that they are updated infrequently to avoid expensive decoding of each frame in the video. The current implementation displays one frame every one hundred frames of the video. When effects processing is requested by the director, the director's console sends a control message to the fx server to request the processing as described above. The resulting stream is sent back on the Studio MBone so the director's console can display it in the preview area. When a transition is requested, the director sends a control message to the broadcaster process to execute the switch.

3.6 Automating the Production Process

Several aspects of the production process can be automated. The director's console can follow a prepared script and send control messages for switching and effects processing at specified times. For example, the script shown in Table 2 can be used to automate a broadcast. The first part of the script defines variables to be used later on; startTime is the advertised starting time of the broadcast (April 15, 1997, 1:00 p.m.), endTime is the end time (3:00 p.m. the same day), and speakerStream is the camera facing the speaker (ssrc 326232628 in the RTP session 234.1.2.3/1234). The second part of the script automates the opening and closing phases
of the broadcast. It first tells the broadcaster process to switch speakerStream to the MBone at startTime. Then, it requests subtitle processing on speakerStream with a text string and assigns the resulting stream to the variable titledStream. At time startTime + 30 seconds, it switches to the stream specified by titledStream. It then switches back to the original video speakerStream at startTime + 60 seconds. Similar actions are executed in the closing phase. A potential problem here is that the estimated times are not always accurate, as the lecture can start late and run overtime. These situations should be accounted for in the design of the automation engine.

4 IMPLEMENTATION AND DISCUSSION

This section discusses our current implementation and related issues.

4.1 Status

vps is implemented using the Continuous Media Toolkit (CMT). CMT is a portable toolkit of reusable objects operating on media streams that simplifies the development of multimedia applications. The toolkit includes video file and playback objects in MJPEG and H.261 formats, communication objects for unicast UDP, multicast RTP, and blocking and non-blocking RPC, synchronization objects to control application behavior, and filter objects that implement effects processing on YUV and MJPEG data. Each process in vps is composed of CMT objects connected by the Tcl scripting language. The vps code is structured into a hierarchy of classes using OTcl, an object-oriented extension to Tcl developed at MIT. The Tk toolkit is used to build the user interface.

The current implementation is a prototype of the described system. We implemented effects processing in the YUV and MJPEG domains. The automation engine is in its design phase and works closely with the broadcast management system described in the introduction. To demonstrate the system's feasibility, this prototype sends video data over the network using unicast UDP and control messages via RPC. The implementation is being updated to use IP Multicast for both video data and control messages. At the writing of this paper, the system has gone through two major revisions. The most recent version has approximately 5000 lines of OTcl code.

4.2 Special Effects Processing

The PIP and subtitle effects are applied in the uncompressed domain, and have YUV streams as inputs and outputs. The mix and fade effects are generated in the compressed JPEG domain; they take MJPEG streams as input and produce output in the same format. The algorithms that manipulate images in the compressed domain are fully described in [12].

To investigate the performance of the current effects implementation, we conducted experiments to determine the latency of each effect on each frame. We also measured the performance of the MJPEG to YUV decoding operation. The MJPEG and YUV streams used in the measurements are CIF (320x240) sized videos, and are served from local disks to isolate the measurements from network overhead. The measurements were carried out on a 200 MHz Pentium Pro with 32MB of memory and 2GB of disk space.

The results are presented in Table 3. In our implementation, the generation of various effects is inexpensive; it takes approximately 15 to 20 ms to process each frame in YUV. The decoding from MJPEG to YUV is more computationally intensive and thus takes on average 45 ms. Adding the decoding times to the processing times, it takes about 65 ms to complete the YUV processing on each frame. When compared to the mix effect implemented in the compressed domain, we see that it takes an average of 65 ms for processing YUV data, but only 20 ms for processing in the compressed domain. Clearly, effects processing in the compressed domain is much faster than converting a stream to YUV, applying the transformation, and converting back to MJPEG [12]. These measurements only account for the raw computation time needed to generate the operations; we did not look into how other bottlenecks in the vps system can affect the performance of effects processing. Other possible I/O bottlenecks may exist in the kernel when the system transmits or receives streams over the network.

The effects offered in the current system are simple; they mainly involve memory copies and/or simple calculations. For more complex effects that require greater computation power, such
set vps [new VPS ...]
set startTime [new Time "4 15 1997" "13:00" GMT]
set endTime [new Time "4 15 1997" "15:00" GMT]
set speakerStream [new LiveStream "234.1.2.3" 1234 326232628]
...
at $startTime "$vps cut $speakerStream"
set introText "Berkeley Graphics ..."
at $startTime + 1 "set titledStream [$vps subtitle $speakerStream $introText]"
at $startTime + 30 "$vps cut $titledStream"
at $startTime + 60 "$vps cut $speakerStream"
...
at $endTime "$vps fade $speakerStream black"
set creditsText "Credits ..."
at $endTime + 1 "set endStream [$vps subtitle black $creditsText]"
at $endTime + 30 "$vps cut $endStream"

Table 2: OTcl script to automate production process.
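Section 3.6 notes that fixed times are fragile because a lecture can start late or run over. One possible way to handle this, sketched below in Python, is to keep each scripted action as an offset from a named anchor (start or end) and resolve the anchors only when the real times are known. This is an illustrative assumption, not the design of the vps automation engine, which the paper describes as still in its design phase. The names SCRIPT and resolve, and the action strings, are hypothetical.

```python
from datetime import datetime, timedelta

# (anchor, offset in seconds, action) triples mirroring the "at" lines of Table 2.
# The action strings are descriptive labels, not executable vps commands.
SCRIPT = [
    ("start", 0, "cut speakerStream"),
    ("start", 1, "subtitle speakerStream introText -> titledStream"),
    ("start", 30, "cut titledStream"),
    ("start", 60, "cut speakerStream"),
    ("end", 0, "fade speakerStream black"),
    ("end", 1, "subtitle black creditsText -> endStream"),
    ("end", 30, "cut endStream"),
]


def resolve(script, start, end):
    """Turn anchor-relative entries into a time-ordered list of (when, action)."""
    anchors = {"start": start, "end": end}
    timeline = [
        (anchors[anchor] + timedelta(seconds=offset), action)
        for anchor, offset, action in script
    ]
    return sorted(timeline, key=lambda item: item[0])


# Advertised times from Table 2.
advertised = resolve(SCRIPT, datetime(1997, 4, 15, 13, 0), datetime(1997, 4, 15, 15, 0))
# Hypothetical late start and overrun: re-resolve instead of editing the script.
late = resolve(SCRIPT, datetime(1997, 4, 15, 13, 10), datetime(1997, 4, 15, 15, 5))
```

With this layout, a director (or a floor-control event) only has to supply the actual start and end times; the relative spacing of the opening and closing actions is preserved.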
Operation    Latency (ms)    Std. dev. (ms)