<<

Automatic software synthesis of dataflow program: An MPEG-4 simple profile decoder case study Ghislain Roquier, Matthieu Wipliez, Mickael Raulet, Jörn W. Janneck, Ian D. Miller, David B. Parlour

To cite this version:

Ghislain Roquier, Matthieu Wipliez, Mickael Raulet, Jörn W. Janneck, Ian D. Miller, et al.. Auto- matic software synthesis of dataflow program: An MPEG-4 simple profile decoder case study. Signal Processing Systems, 2008. SiPS 2008. IEEE Workshop on, Oct 2008, Washington, United States. pp.281 - 286, ￿10.1109/SIPS.2008.4671776￿. ￿hal-00336516￿

HAL Id: hal-00336516 https://hal.archives-ouvertes.fr/hal-00336516 Submitted on 4 Nov 2008

HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. AUTOMATIC SOFTWARE SYNTHESIS OF DATAFLOW PROGRAM: AN MPEG-4 SIMPLE PROFILE DECODER CASE STUDY

Ghislain Roquier, Matthieu Wipliez, Mickael¨ Raulet Jorn¨ W. Janneck, Ian D. Miller, David B. Parlour

IETR/INSA, UMR CNRS 6164 Image and Remote Sensing laboratory Xilinx inc. F-35043 Rennes, France San Jose, CA, U.S.A {groquier, mwipliez, mraulet}@insa-rennes.fr {jorn.janneck,ian.milller,dave.parlour}@xilinx.com

ABSTRACT parallelism. Meanwhile, the growth of video coding tech- The MPEG Reconfigurable Video Coding (RVC) framework nologies leads to solutions that are increasingly complex and is a new standard under development by MPEG that aims at more difficult to design. So far, no effort has been made to providing a unified high-level specification of current MPEG profit from the significant overlap between standards. video coding technologies. In this framework, a decoder is The observation of these drawbacks of current video stan- built as a configuration of video coding modules taken from dard specifications led to the development of the Reconfig- the standard “MPEG toolbox ”. The elements of the urable Video Coding (RVC) standard. The key concept be- library are specified by a textual description that expresses hind this project is to enable the design of decoders at a higher the I/O behavior of each module and by a reference software level of abstraction. An “abstract” model focusing on modu- written using the CAL Actor Language. A decoder configura- larity, concurrency and reusability is a better starting point tion is written in an XML dialect by connecting a set of CAL for any design and implementation . RVC provides modules. Code generators are fundamental supports that en- a high-level description of the MPEG standard using as new able the direct transformation of a high level specification to form of reference software, for each module of the standard efficient hardware and software implementations. This paper library, a specific language called CAL. Once the high-level presents a synthesis tool that from a CAL dataflow program model is available the challenge is then to develop appropri- generates code and an associated SystemC model. Exper- ate tools that implement the design flow and provide the opti- imental results of the RVC Expert’s MPEG-4 Simple Profile mization steps necessary for efficient implementations. This decoder synthesis are reported. The generated code and the paper presents a non-normative (in terms of relation with the associated SystemC model are validated against the original RVC standard) software synthesis tool called Cal2C that from CAL description is simulated using the Open Dataflow a dataflow program generate C code and associated SystemC environment. model in a completely automated process. Index Terms— MPEG RVC, CAL Actor Language, The paper is organized as follows: section 2 introduces dataflow modeling, software synthesis the RVC framework. This is followed by the description of the CAL actor simulator and synthesis tools for RVC in sec- tion 3. Synthesis process of the CAL to C transformation 1. INTRODUCTION are next explained in section 4. Results on the RVC expert’s MPEG-4 SP decoder are reported in section 5. Finally con- A large number of successful MPEG video coding standards clusions and future work are provided in section 6. have been developed since the first MPEG-1 standard in 1988. The process has always aimed at pro- viding appropriate forms of specifications for a wide and easy 2. RVC FRAMEWORK deployment. While at the beginning MPEG-1 and MPEG- 2 were only specified by textual descriptions, starting with The MPEG RVC framework is currently under development MPEG-4 C/C++ descriptions, reference software became the by MPEG as part of MPEG-B and MPEG-C standards. RVC formal specification of the standard. Such descriptions are aims at providing a model of specifying existing or com- composed of non-optimized software packages and face now pletely new MPEG standards at system-level [1]. An abstract many limitations. These monolithic specifications hide the decoder is built as a block diagram in which blocks define inherent parallelism and the dataflow structure of the video processing entities called Functional Units (FUs) and connec- coding algorithms. For efficient implementations the refer- tions represent the data . RVC provides both a normative ence software has to be rewritten to exploit this so-called standard library of FUs and a set of decoder descriptions - – 4 – 3 – 2 is – action 1 an short, conditions: following In the more respects enabled. case it if in otherwise fireable selected, is be action to one actions than the state- among thus order is partial firing Action Finally, (FSM). Machine dependent. State Finite using a constrained as Action further state. be actor current may the selection or values token on depend (called may conditions optionally se- additional and been tokens on has input based it of availability fired, the is on action based an lected con- When it tokens produces. of in- and number its sumes the specifies change action and Each tokens state. output ternal produce tokens, input sume r ysnigdt cle oes hnes uigan During oth- channels. with along interact execution tokens) only (called data may the sending actor modify by An ers nor actor. other access is any neither actor of An state can it parameters. and encapsulated; (input state strongly interfaces internal with ports), entity output computational and a FUs. is of actor library standard lan- CAL RVC A the the of as modules I/O normative the chosen the of specify behaviors been that software has reference the [2] of guage Language Actor CAL The specification FU 2.1. FUs common contain to standards. descriptions across reusability decoder on focuses allowing the mainly modifying RVC by by network. decoder the a of of topology reconfiguration modular is the representation helps a Such and FUs. of networks as pressed [01111001...] BITSTREAM ohge-roiyato epcs() 2 n (3). and (2) (1), respects action higher-priority no the to according fire to FSM; action the enables state current the true; to evaluate present, if ports; clauses, input guard on tokens necessary the are there actions

PARSER as eerdas referred (also cinpriorities action addr addr addr split split split DC DC DC DC DC DC pred pred pred DC DC DC -1 -1 -1 firing ETR DECODING TEXTURE a eue oips a impose to used be can i.1 Fig. Scan Scan Scan cinguards action nato) tcncon- can it action), an -1 -1 -1 PG4Sml rfiedcdrdescription decoder Profile Simple MPEG-4 . cinschedules action pred pred pred AC AC AC -1 -1 -1 Quant- Quant- Quant- ize ize ize which ) -1 -1 -1 DCT DCT DCT -1 -1 -1 eatmtclycnetdt D n ievra ti also is editor It Graphiti versa. the use vice to and possible DDL can to which converted a (NL), automatically in Language be described Network graph- called traditionally network language been a textual have provide Networks not does editor. project ical the this by Moses, superseded to ily been (OpenDF project has environment The it Dataflow Open maintained, monitor values). longer to token no user and being the state allows (actor and execution inter- actor editor, This Moses network the graphical in actors. a used of first network was hierarchical preter that a infrastructure simulate interpreter portable can a by supported is CAL Simulator 3.1. by connected are actors two edge. one when than more represented is only readability, edge are for blocks that one Note other CAL. All in specified They block. actors file). atomic shadowed DDL a separate with a of represented in networks are the described hierarchical them and of are parser (each blocks The actors 2) (Figure 1. DCT Figure inverse shown is Profile description used rep- Simple is graphical decoder MPEG-4 the and macroblock-based instance, - the For network of general resentation actors. more to a parameters net- of pass a part to - a hierarchical be is formed DDL may network actors. work a interconnected is of set decoder a A by connectivity. decoder description the the Description of enables that Decoder dialect the XML an with (DDL), Language described is decoder RVC An Description Decoder RVC 2.2. 3 2 1 http://sourceforge.net/projects/graphiti-editor http://opendf.sourceforge.net/ moses/ http://www.tik.ee.ethz.ch/ OINCOMPENSATION MOTION Addr Addr Addr Buffer Buffer Buffer .SPOTTOSFRRVC FOR TOOLS SUPPORT 3. Interpo- Interpo- Interpo- late late late Add Add Add 3 1 odslyntok nthe in networks display to rjc.Mssfeatures Moses project. 2

o hr) Contrar- short). for MERGE EOE DATADECODED SIGNED

SIGNED

Clip OUT OUT

IN Trans- Trans- IN IN Scale OUT IN 8-pt idct OUT IN OUT IN 8-pt idct OUT IN OUT IN Shift OUT pose pose

Fig. 2. Inverse DCT description

DDL format. 4. CAL2C SOFTWARE SYNTHESIS

4.1. Semantics of CAL dataflow 3.2. Software synthesis The system behavior of a dataflow program is determined by the interactions between actors (i.e. exchange of data tokens). It is important to be able to automatically obtain a concrete Such interactions are governed by a Model of Computation software implementation from a dataflow description. The C (MoC) that defines which policies can be used to language is particularly well-suited as a target language. The fire actors. The CAL language is not related to any particular same code can be compiled on any processor, from embedded dataflow MoCs. Indeed, several forms of dataflow exist to in- DSPs and ARMs to general-purpose microprocessors, which terpret the network from the general dataflow process network considerably eases the task of writing a software synthesis (PN) [4] model with multiple firing rules to the more restric- tool. The interest of having an automatic C software synthesis tive synchronous dataflow (SDF) one [5, 6]. CAL extends the is two-folded. The code obtained can be executed, in which model in [4] by : case it enables a considerably faster simulation of the dataflow program and the ability to debug the program using existing 1 – state within actors, IDEs (Visual Studio, Eclipse CDT). Moreover, the C code 2 – multiple overlapping (non-joinable in the parlance of [4]) description obtained may be a basis for an optimized tailor- firing rules, and made decoder. For these reasons, we created the Cal2C tool 3 – priorities among firing rules. (detailed in section 4) that aims at producing functionally- equivalent, humanly-readable C code from CAL descriptions. CAL actors in the standard library often contain multiple actions and priorities, FSM or guards that lead to state- dependent or conditional execution. Therefore most of the 3.3. Hardware synthesis CAL actors in the library are closer to the PN model which is then chosen (as a first milestone) for developing the tool Another tool is available that converts CAL to HDL [3]. After described in this paper. parsing, CAL actors are instantiated with the actual values for their formal parameters. The result is an XML representation transfor- code C, CAL parser CAL AST CIL of the actor which is then precompiled (transformation and mations generation C++, H analysis steps, including constant propagation, inference parameters and type checking, analysis of data flow through variables...), code parser represented as a sequential program in static single assign- NL NL AST elaboration NL AST generation H ment (SSA) form (making explicit the data dependencies be- tween parts of the program). Fig. 3. Cal2C compilation process Then follows the synthesis stage, which turns the SSA threads into a web of circuits built from a set of basic opera- When the PN model is chosen to interpret CAL networks, tors (arithmetic, logic, flow control, memory accesses and the any environment which supports multithreading may be cho- like). The synthesis stage can also be given directives driving sen for implementation. For instance, it can be done using the unrolling of loops, or the insertion of registers to improve POSIX threads by translating CAL actors into threads and by the maximal clock rate of the generated circuit. replacing connections with FIFOs. The PN model can also The final result is a file containing the circuit im- be executed by a discrete-event (DE) scheduler, such as the plementing the actor, and exposing asynchronous handshake- one provided by SystemC. Actors and FIFOs can be easily style interfaces for each of its ports. These can be connected translated using SystemC classes. Additionally, its simulation either back-to-back or using FIFO buffers into complete sys- environment permits high-level programming, well-adapted tems. The FIFO buffers can be synchronous or asynchronous, to functional verifications. A PN-oriented SystemC model is making it easy to support multi-clock-domain dataflow de- expressed as a network of modules communicating with each signs. other via blocking FIFOs (with an additional way to support in different actors. Another challenge is (2) the difference ac: action AC:[i] ⇒ OUT:[ saturate( o )] var of programming paradigm between the source and the target int(size=SAMPLE_SZ) v = language: CAL allows functional constructs that have no di- ( quant * ( lshift( abs(i), 1) + 1) ) - round, int(size=SAMPLE_SZ) o = rect equivalent in C. The action translation process starts with if i = 0 then 0 else if i < 0 then -v else v end end an Abstract Syntax Tree (AST) issued from the CAL source do count := count + 1; code, and modifies it as needed to meet the previous require- end ments. Function names are prefixed with the actor name to prevent any potential name clashes; actor parameters are re- (a) CAL “ac” action in the “Inversequant” actor placed by their values (when constant) or transformed to local variables otherwise; actions are converted to functions where void Inversequant_ac( struct Inversequant_variables *_actor_variables , input and output patterns become parameters. Finally, actor int i , int *out ) declarations are ordered by dependencies between locals, so { int v, o ; int _call_6, _call_9; that a variable or a function is defined before being used. At int _if_7, _if_8 ; this point, we convert the AST to λ-calculus, apply Damas- _call_6 = Inversequant_fun_abs(_actor_variables, i); v = _actor_variables->quant * ((_call_6 << 1) + 1) Milner W-algorithm [7] to it, and augment the AST with the - _actor_variables->round; type information returned. Types are necessary for correct C if (i == 0) { _if_8 = 0; code generation, and type-dependent transformations: we in- } else { line functions that return lists, and compute list sizes. The if (i < 0) { _if_7 = - v; transformed CAL AST is expressed in the C Intermediate } else { Language (CIL) [8], where CAL functional constructs are re- _if_7 = v; } placed by imperative ones. C code is generated by calling the _if_8 = _if_7; pretty-printer included in the framework. } o = _if_8; The actor code translation process is illustrated on fig- (_actor_variables->count) ++; _call_9 = Inversequant_fun_saturate(_actor_variables, o); ure 4. Figure 4(a) is an action from the Inversequant actor *out = _call_9; of the RVC reference MPEG4-SP decoder, and figure 4(b) is } the translated C code. The resulting code exhibits a function (b) C translation of the “ac” action in the “Inversequant” actor whose name is composed of the actor name and the action name (requirement (1)). Its parameters are the same as the Fig. 4. C translation of a CAL action action’s, with an additional pointer to a structure containing the actor variables. The if expression has been transformed to assignments of temporary variables ( if 7, if 8) (require- token availability). Note that Cal2C does not intend to use ment (2)). As a matter of fact, function calls have also become SystemC as an hardware description language but only as a assignments of temporary variables ( call 6, call 9) because convenient PN modeling and simulation environment. More- CIL semantics requires it. The action output expression is over, this work is a first step to an efficient “pure C” code gen- translated as a pointer parameter whose value is written at eration (closer to an SDF model than a PN one). In short, the the end of the C function. The synthesized C code shown in software synthesis from a network of CAL actors produces figure 4(b) is functionally-equivalent to the CAL code and re- several files as explained in Figure 3. Each NL network is mains humanly-readable, which were two requirements listed translated into a header file where FIFOs, modules and sub- in section 3.2. networks are instantiated and connected. CAL actor transla- tion is done in 2 different parts: the actor code (actions, func- tions, procedures) to express the functionality and the action 4.2.2. Action scheduling scheduler (priorities, FSMs, guards) to control the execution. An action scheduler is created to control the action selection Finally an additional file is created to instantiate the top net- during execution. Priorities, guards, token consumption rates work and to launch SystemC simulation. and FSM have to be translated to this end. Determining the overall order of action execution is required to have a con- 4.2. C code generation: Actors sistent evaluation of actions that can be fired. Priorities are 4.2.1. Actor code translation resolved by sorting actions in a total order and by adding a if-then-else statement around actions wherein the condition Translating the actor code produces a single C file wherein is given by the availability of input tokens and the guards functions, procedures and actions are translated. The C lan- conditions. FSMs are resolved using switch-case statement. guage and compilers impose some limitations on the trans- Finally, the generated file consists of a with an infi- lated code. For example, (1) distinct functions should have nite loop wherein its body consists of the result of the previ- different names to avoid linking problems, even if they are ous transformations and actions are replaced with their corre- sponding C functions. semantic differences between the SystemC implementation For instance, a downsampler by N is illustrated figure and the NL specification: FIFO channels, while implicit in 5(a). It could be written in a simpler manner but this actor en- NL, must be explicitly created in SystemC. Broadcasting data ables to highlight key features of the action scheduling trans- from a source to several sinks is transparent in NL, but re- lation. The synthesized C++ code is illustrated figure 5(b). quires additional logic in SystemC, namely a generic broad- cast module. actor downsampler (N) In ⇒ Out : count := 1; 5. RESULTS ON THE MPEG-4 SIMPLE PROFILE pass: action In: [x] ⇒ Out: [x] end DECODER done: action ⇒ guard count = N do count := 1; end skip: action In: [x] ⇒ do count := count + 1; end In order to certify the correctness of Cal2C code translation, schedule fsm pass: copy ( pass ) --> discard; the first case study is a two-dimensional inverse DCT. The discard ( done ) --> copy; IDCT is a component of several MPEG standard decoders discard ( skip ) --> discard; end and is specified by the new Finite Precision IDCT Specifi- priority cation [9] based on [10]. The algorithm consists of applying done > skip; end one-dimensional IDCT along the row and column axis of an end 8×8 pixel block. The network is composed of 2 input ports, 1 output port and 5 different actors; one actor can be instan- (a) CAL downsampler tiated several times in NL. Incoming tokens from IN port are SIGNED struct downsampler_vars { dequantized data and a token from enables to spec- int count; ify to the “clip” actor if incoming data are signed or not. The int N; }; first row of Table 1 shows the number of files of the respec- void downsampler::process() { tive programs. An actor becomes a C, a C++ and a header file; int fsm_state, _call_6, _call_7,_out_1; struct downsamplerN_vars _actor_vars; a network simply becomes a header file; the additional C++ _actor_variables.count = 1; file is the main. The second row exhibits the corresponding fsm_state = 1; while (1) { source lines of code (SLOC). The testbed consists of apply- switch (fsm_state) { ing a stimulus (streamed by a C-code reference software of case 1: _call_6 = In->get(); MPEG4 SP decoder) to the top network and verifying the re- downsampler_pass(&_actor_vars, _call_6, &_out_1); sponse against an expected result (from the CAL description Out->put(_out_1); fsm_state = 2; simulated compared to a “golden reference” streamed by the break; C-code reference software). case 2: if (_actor_vars.count == _actor_vars.N) { downsampler_done(&_actor_vars); IDCT CAL NL C C++ H fsm_state = 1; Number of files 5 1 5 6 6 } else { _call_7 = In->get(); Code Size (SLOC) 131 25 324 386 107 downsampler_skip(&_actor_vars, _call_7); fsm_state = 2; Table 1. Code size and number of files automatically gener- } break; ated for the IDCT } } } Another synthesized model of a more complex CAL dataflow program simulated with the Open Dataflow environ- (b) Synthesized action scheduler ment also validate the Cal2C tool. The compilation process has been successfully applied to the full MPEG-4 Simple Fig. 5. Action scheduling (FSM) of CAL actor Profile dataflow program written by the MPEG RVC ex- perts (Figure 1). Table 2 shows that synthesized C-software is faster than the simulated CAL dataflow program (20 frames/s 4.3. SystemC code generation: Networks instead of 0.15 frames/s), and close to real- for a QCIF format (25 frames/s). However it remains slower than the Expressing a NL network in SystemC is relatively straight- automatically synthesized hardware description by Cal2HDL forward: actor or network instantiations are transformed to [3]. Using Cal2C has also permitted to correct some actors module instantiations. A SystemC module is declared using which had a behavior depending on one particular code gen- the SC MODULE macro. Its effect is that it creates a C++ erator. Indeed, they made use of nondeterminism allowed by class that can be used inside a SystemC simulation. For con- CAL (for instance, 2 fireable actions without priority relation- venience, in Cal2C each module is translated to a definition ship). The way to solve the nondeterminism (not expressible file (*.h) and an implementation file (*.cpp). There are two in C) is implementation-dependent. MPEG4 SP Speed Code size scheduling of (parts of) standard decoders. decoder kMB/S kSLOC Once this work is done, it may be possible to generate CAL simulator 0.015 3.4 both hardware and software for implementation onto hetero- Cal2C 2 10.4 geneous platforms with processors and programmable logic Cal2HDL 290 4 devices. Moreover, we are in the process of integrating Cal2C within the Open Dataflow framework, which should Table 2. MPEG4SP decoder speed and SLOC various heterogeneous implementations easier to obtain.

The MPEG4 SP dataflow program is composed of 27 7. REFERENCES atomic actors; atomic actors can be instantiated several times, for instance there are 42 actor instantiations in this dataflow [1] C. Lucarz, M. Mattavelli, J. Thomas-Kerr, and J. Jan- program. Its number of SLOC is shown in Table 3. All of neck, “Reconfigurable Media Coding: a new specifica- the generated files are successfully compiled by gcc. For tion model for multimedia coders,” in Proceedings of instance, the “ParserHeader” actor inside the “Parser” net- SIPS’07, Oct. 2007. work is the most complex actor with multiple actions. The [2] J. Eker and J. Janneck, “CAL Language Report,” Tech. translated C-file (with actions and state variables) includes Rep. ERL Technical Memo UCB/ERL M03/48, Univer- 1043 SLOC for actions and 1895 for action scheduling. The sity of California at Berkeley, Dec. 2003. original CAL file contains 962 lines of codes as a comparison. [3] J. Janneck, I. Miller, D. Parlour, G. Roquier, M. Wipliez, MPEG4 SP decoder CAL NL C C++ H and M. Raulet, “Synthesizing hardware from dataflow Number of files 27 9 27 28 36 programs: an MPEG-4 Simple Profile decoder case Code Size (kSLOC) 2.9 0.5 5.8 3,7 0.9 study,” in Proceedings of SiPS’08, 2008.

Table 3. Code size and number of files automatically gener- [4] Edward A. Lee and Thomas M. Parks, “Dataflow Pro- ated for MPEG4 SP decoder cess Networks,” Proceedings of the IEEE, vol. 83, no. 5, pp. 773–801, May 1995. [5] E. A. Lee and D. G. Messerschmitt, “Static schedul- 6. CONCLUSION AND FUTURE WORKS ing of synchronous data flow programs for digital processing,” IEEE Trans. Comput., vol. 36, no. 1, pp. This paper presents a software synthesis tool that enables an 24–35, 1987. automatic translation of dataflow programs written in CAL, showing the rules to translate the I/O behaviors (actions, func- [6] Shuvra S. Bhattacharyya, Praveen K. Murthy, and - tions and procedures), the relevant control structures (priority, ward A. Lee, “Synthesis of embedded software from guard and FSM) as well as the networks. Moreover, the pro- synchronous dataflow specifications,” J. VLSI Signal cess has been successfully applied to automatically translate Processing Systems, vol. 21, no. 2, pp. 151–166, 1999. the MPEG-4 SP decoder. The results obtained so far show the efficiency and soundness of the synthezised code. [7] Luis Damas and Robin Milner, “Principal type-schemes The next milestone of Cal2C concerns the static schedul- for functional programs,” in Proceedings of POPL’82, ing of such dataflow programs. Indeed, its main purpose is to 1982, pp. 207–212. generate efficient embedded software for multiprocessor - [8] G. C. Necula, S. McPeak, S. P. Rahul, and W. Weimer, get. Even if the PN interpretation of the network gives rele- “CIL: An Infrastructure for C Program Analysis and vant results for hardware, it is no longer adapted for software Transformation,” in Proceedings of CC’02, Apr. 2002, when it comes to hard real-time implementation. Liveness pp. 213–228. and memory boundedness are critical issues that cannot be guaranteed. More restrictive dataflow models (like SDF mod- [9] ISO/IEC FDIS 23002-2:2007(E), “Information technol- els) enable static scheduling with efficient code generations. ogy – MPEG video technologies – Part 2: Fixed-point However, only a subset of PN-actors can be scheduled stati- 8x8 inverse discrete cosine transform and discrete co- cally (especially in the RVC standard library). The next step sine transform,” 2007. is to identify static regions that obey SDF restrictions. In- deed, it is sometimes straightforward to schedule actors (for [10] C. Loeffler, A. Ligtenberg, and G. Moschytz, “Practical instance an actor with one action and no guard is SDF). Un- Fast 1-D DCT Algorithms with 11 Multiplications,” in fortunately, several actors in the RVC library are not so triv- Proceedings of ICASSP’89, Feb. 1989. ial. Future work will have to provide analysis of such ac- tors and/or CAL coding guidelines in order to achieve static