Multimed Tools Appl (2017) 76:8195–8226 DOI 10.1007/s11042-016-3448-5

Synthesis of simulation and implementation code for OpenMAX multimedia heterogeneous systems from UML/MARTE models

D. de la Fuente1 & J. Barba1 & J. C. López1 & P. Peñil1 & H. Posadas1 & P. Sánchez 1

Received: 21 January 2015 / Revised: 31 January 2016 / Accepted: 15 March 2016 / Published online: 30 March 2016
© Springer Science+Business Media New York 2016

Abstract The design of multimedia systems is becoming an increasingly challenging task due to the combination of growing functionality and strict performance requirements, along with reduced time-to-market. In this context, the OpenMAX initiative defines a standard interface for the development and interconnection of HW and SW multimedia components. However, the simulation and implementation steps required to obtain the final prototypes of such complex systems are still a challenge. To solve these problems, this paper presents a framework which enables automatic code generation from high-level UML/MARTE models. SystemC and VHDL code is synthesized according to the OpenMAX specification requirements and is integrated with the application SW, derived from task-based system models. The generation of the SystemC executable specification enables easy simulation and verification of multimedia systems. After this verification stage, the framework automatically provides the VHDL code which feeds the final implementation and synthesis stage for the target platform. To demonstrate this approach, a SOBEL-based use case has been implemented with the developed framework.

* D. de la Fuente

1 University of Castilla-La Mancha, Ciudad Real, Spain

Keywords OpenMAX · UML/MARTE · SystemC · VHDL · Automatic code generation

1 Introduction

The design of multimedia embedded systems is a highly competitive field. New multimedia devices typically include a wide set of applications, with an increasing number of data-intensive processing operations needed to fulfil the new standards of audio, video or image quality. Furthermore, the success of a product greatly depends on being the first on the market to provide these new complex functionalities. However, these data-intensive operations require an increase in the computing power of the system, so high-performance features are demanded. For this purpose, some parts of the system may require implementation on heterogeneous platforms, which makes HW/SW integration necessary. However, common HW/SW design flows are far from easy or quick to apply. The lack of standardization in HW/SW integration has led to ad-hoc implementations, requiring in-depth knowledge and great effort from the developers.

A consequence of large, complex systems is the number of design variations that can be considered during the design process. The elements that compose the system can have different properties with implications for their behaviour. These properties can be examined through a design space exploration (DSE) process to obtain the best configuration for each system element, so as to optimize them and meet the performance requirements. Furthermore, current chips integrate multiprocessor systems, usually combining a growing number of general-purpose processors (GPPs) and different types of processing units (Digital Signal Processors and Graphics Processing Units) with configurable devices. In this way, application elements can be implemented either as SW or HW components during the design process.
Therefore, new design approaches should establish a DSE process to find the best configuration of the system elements, according to the specific characteristics which determine the correctness of the global system behaviour. In addition, these new design approaches should enable exploration of the system element mapping, taking advantage of the heterogeneous nature of current platforms in order to achieve the best system implementation according to the available resources.

Different approaches have appeared in order to manage the design of large, complex systems. The adoption of standards supports the development of portable, flexible and reusable designs for embedded systems. At the same time, high-level methodologies also provide powerful solutions for system development and component reuse. The development of electronic system-level (ESL) design methodologies [17] provides a strategy for designing complex systems, in which the initial key activity is specification. SystemC [19, 38] is the most popular specification language in the ESL community and is used to model systems at the functional level. Model-driven development (MDD) methodologies can simplify specifications and make them more understandable, which are major requirements for tackling the design challenge [14]. As an example, the use of standard languages, such as the Unified Modelling Language (UML, [42]), provides easy-to-read and portable specifications.

MDD methodologies are commonly adopted to handle the design of large, complex functionalities. The latest design methodologies start from high-level UML models combined with algorithmic code (e.g. C, C++, Matlab) for the different system components [40]. In these models, the user defines the system functionality and the target platform where this functionality is executed.

The combination of both approaches (standards and high-level methodologies) takes advantage of the potential synergies in order to obtain an improved result. The approach presented here combines the benefits of the OpenMAX standard with a UML-based synthesis solution. The OpenMAX standard [27] is an initiative supported by many important companies such as AMD, ARM and Sony. OpenMAX is based on components and defines a standardized media component interface for audio, video, image and other domains (as defined in the standard itself). The OpenMAX middleware allows developers and platform providers to integrate and communicate with multimedia codecs implemented in hardware or software. Through the use of OpenMAX in embedded systems, developers can reduce the effort required to design multimedia HW/SW systems because: (a) the whole core logic can be reused when targeting a new platform (standardized interfaces); (b) it is not necessary to hand-write the drivers or the code that depends on these components; and (c) communication issues are separated from the processing primitives (the same synchronization protocols, standardized communication mechanisms, etc.).

In order to provide UML with specific semantics to support complex system design, a set of profiles has been developed. In the specific context of embedded systems, the Modelling and Analysis of Real-Time and Embedded Systems profile (MARTE [25]) adds to the UML language the concepts and semantics needed to describe real-time features at high abstraction levels.
The UML Testing Profile [26] enables the definition of models that capture scenarios for system testing.

Following this combined approach, this paper presents an infrastructure for automatic code generation, from UML/MARTE models, for the simulation of OpenMAX multimedia systems and for their later implementation. This infrastructure enables executable specifications to be obtained automatically. These executable models are used to simulate the OpenMAX designs captured in UML/MARTE and to decide which system configuration is the most suitable for a later validation of the system requirements. For the simulation, SystemC [35] has been used in this work since it is a modelling language that is applied to system-level modelling, architectural exploration, performance modelling, software development, functional verification and high-level synthesis. In addition, the automatically generated SystemC executable specification corresponds to a HW OpenMAX Integration Layer (IL) infrastructure [10]. This HW OpenMAX IL infrastructure can be executed and explored, also considering different automatically generated test benches. Finally, the UML/MARTE methodology captures enough information about the target platform to enable automatic VHDL code generation for the implementation of the SW/HW component interconnections, which are allocated in a Field Programmable Gate Array (FPGA).

The paper is organized as follows. Section 2 presents a study of the state of the art. Section 3 provides the context in which this work is developed. In Section 4, the complete design flow is described. All the aspects of the UML/MARTE methodology, which are the main focus of this paper, are presented in Section 5. In Section 6, the OpenMAX-SystemC simulation process is explained. Section 7 then defines the OpenMAX synthesis process for the generation of VHDL code. A case study is presented in Section 8 and, finally, some conclusions are drawn in Section 9.

2 State-of-the-art

In order to facilitate understanding of this section, the related work is described in three main groups. These groups correspond to the three main pillars of this proposal: OpenMAX as the reference standard for integration and multimedia platform modelling, SystemC for executable specifications, and the UML language and MARTE profile as the tools to address the modelling tasks.

2.1 Implantation of OpenMAX standard in real commercial products

The OpenMAX standard [27] is a component interface for integrating and communicating multimedia codecs implemented in hardware or software. OpenMAX has already been adopted in many commercial products. For example, in [22], NVIDIA demonstrated a prototype OpenMAX IL [27] implementation executing on an NVIDIA GeForce 3D handheld GPU to create a flexible, accelerated streaming-media pipeline. Later, NVIDIA developed NVIDIA Tegra 2 [23], presented as the first dual-core processor for faster Web browsing and snappier response time, which supports OpenMAX for media acceleration. Adaptive Digital has an OpenMAX Integration Layer (OpenMAX IL) implementation that is used in TI products such as the Blaze development platform [2]. The DM814x/DM816x DaVinci Digital Media Processors [39], which are highly integrated, programmable platforms, implement the OpenMAX development framework to provide a standardized and user-friendly API.

On the Android platform, multimedia functionality uses a client-server architecture where OpenMAX IL components are owned by media server processes. In order to access the OpenMAX IL functionality from a client process, such as a multimedia application, Android provides an IOMX interface [16].

The GST-OpenMAX project [37] extends the GStreamer multimedia framework with OpenMAX IL in order to enable access to multimedia components in a standardized way. Many component wrappers have been developed for different OpenMAX IL implementations, such as Bellagio [41], TI (OMAP-3430) [33] or Maemo [28]. GST-OpenMAX is already distributed in many embedded platforms, such as Angstrom [4].

VisualOn, an ARM-Connected Community member, developed the VisualOn Media Engine (VOME [45]), a fully OpenMAX-compliant multimedia framework. VOME gives users full flexibility to design and build their customized multimedia features based on the OpenMAX standard.
VOME is fully optimized for ARM-based processors, including ARM9, ARM11, Cortex-A8 and Cortex-A9. VOME is optimized to reduce data flow overheads and integration effort, and it can plug into hardware or software components through OpenMAX interfaces.

These works are relevant to us since they demonstrate the interest of this proposal. However, previous approaches do not consider the HW/SW support and system exploration required to optimize the design of multimedia systems on highly heterogeneous platforms. Thus, the aim of the proposed work is to develop a HW/SW design flow for multimedia systems based on the OpenMAX standard, providing the designer with the capability to design, simulate, explore and generate OpenMAX-based applications that will run on a heterogeneous platform.

For that purpose, a complete system specification methodology is required to enable the execution and verification of multimedia systems. Characterizing a system specification with an executable model helps the developer to obtain a global vision of the system to be built, since the model is based on the objectives of the system. This is where SystemC comes into play.

2.2 SystemC modelling for system specification

Targeting SystemC [13] enables the building of executable, platform-agnostic validation environments. SystemC is a language which has been widely used for system-level and reusable test bench development and, moreover, has already enabled the development of advanced features for supporting verification and debugging. [9] shows how formal models and directed acyclic graphs (DAGs) can be used to build a system on chip using SystemC.

The combination of fast performance estimation techniques with SystemC has enabled fast simulation of a complex system including SW and custom HW parts. In [18], SW parts are simulated with a virtualization environment called Simics, while SystemC is used for modelling custom HW devices. In [18], the SystemC kernel is made a slave of the Simics kernel, and an efficient technique for checkpointing the SystemC custom HW is presented. While in that approach SystemC serves to model HW devices as an integral part of the system model, our proposal is focused on the modelling of the system environment, which lies outside the system.

Modelling a system brings many benefits, among which the following can be highlighted: it helps to capture and organize the knowledge about the system; it allows the early exploration of alternatives; it facilitates the decomposition and modularity of the system; it reduces the number of final mistakes; it facilitates the reuse and maintenance of the system, increasing the productivity of the development team; and it simplifies the documentation process.

2.3 UML/MARTE methodology and modelling for code generation

The application scope of the Unified Modelling Language (UML, [42]) has been extended from its beginnings as an object-oriented software engineering modelling language to cover different domains. In this context, [43] describes the capabilities of UML when applied to the design of electronic systems. Specifically, in order to exploit the benefits of UML as a modelling language, the MARTE profile was created to deal with the modelling of real-time embedded systems.

Several proposals for bridging the gap between UML specifications and SystemC executable models have been published. The first approach to combining UML and SystemC used UML stereotypes for SystemC constructs. This combination was focused on system-on-chip (SoC) design methodologies, such as [8] or [20]. In the work of Bocchio et al. [7], a SoC design flow was proposed based on a SystemC profile used to produce an executable model from a UML specification. Another way of combining UML and SystemC is to use mapping rules [3] to establish an automatic transformation between UML and SystemC. This work provided the first ideas about the mapping between UML elements and SystemC elements.

Regarding the methodologies that generate SystemC from UML/MARTE, [20] presents a UML/MARTE methodology which starts from UML sequence diagrams with MARTE timing constraints and generates a SystemC/TLM specification. The executable specification allows the specified sequence of information exchanged between components to be verified. In [30], a UML/MARTE methodology for specifying systems with heterogeneous communication semantics is presented. Then, a mapping to SystemC is described in order to obtain an executable specification.

Gaspard2 [31] is a design environment for data-intensive applications which enables MARTE description of both the application and the hardware platform, including MPSoC and regular structures. More specifically, [32] presents generic control semantics for the specification of system adaptivity and especially dynamic reconfigurability in SoCs. Dynamic reconfigurability is implemented by generating the code for a dynamically reconfigurable region, which corresponds to a high-level application model translated into a hardware functionality, and by generating the source code of a reconfiguration controller that manages the different implementations related to the hardware resource.

MoPCoM [44] is a methodology that supports UML/MARTE modelling for platform description and for architectural mapping, considering non-functional properties. MoPCoM defines three levels of generation. They include a level called Execution Modelling Level (EML), which targets the generation of models for performance analysis and is suitable for obtaining performance figures used in DSE iterations. Additionally, the work reported in [44] mostly focuses on the Detailed Modelling Level (DML), intended for implementation, by enabling VHDL code generation.
In addition, [15] enables the design of FPGA-based embedded systems, supporting the automatic generation of VHDL descriptions from UML/MARTE models. It establishes a set of mapping rules to translate high-level elements into VHDL constructs, allowing the generation of fully synthesizable descriptions that include the embedded system structure and behaviour.

Other works take UML/MARTE models as input and generate executable code from them. In [32], the complete design flow from high-level MARTE models to code generation, for the implementation of dynamically reconfigurable SoCs, is presented. That paper also includes generic control semantics for the specification of adaptive and dynamically reconfigurable SoCs. The work in [12] describes a component-based modelling methodology based on UML/MARTE and explicitly designed to support design space exploration. This UML/MARTE design exploration methodology is focused on the specification of a set of design solutions for the HW/SW architecture, the characteristics of the HW resources (frequency, etc.) and multiple allocations of the application to HW resources.

According to the previous works, the use of UML/MARTE as a high-level language for embedded systems design is widespread. Additionally, the generation of SystemC executable specifications from UML models is a well-known activity. However, no model-based methodology has been found which specifies a complete design flow that, based on UML/MARTE high-level models, enables SystemC simulation in accordance with the OpenMAX standard. In addition, the design flow presented here enables the system implementation considering HW/SW components according to the application mapping captured in the UML/MARTE model and the OpenMAX standard.

3 Background

Our proposal is based on the use of the OpenMAX standard for the design and implementation of HW/SW multimedia systems. This combination of architecture and middleware enables a flexible and portable implementation of multimedia platforms to be obtained in a short period of time. To reach this goal, the OpenMAX standard was analysed in order to identify which parts should be ported as hardware components of the embedded platform using reconfigurable logic technology.

A typical multimedia application in OpenMAX is composed of a chain of processing elements called OpenMAX components (OMX components from now on). Each component processes the data at its inputs, generating new data at its outputs, which will be consumed by the next component in the chain. Depending on its input/output ports, an OMX component can play the role of source (only output ports), filter (input and output ports) or sink (only input ports). An OMX component implements one or more media processing functions belonging to one of the application domains identified in the standard, such as audio, video or image.

To abstract and unify the access to the functionality embodied in the OMX components, the OpenMAX Integration Layer (IL) is defined. The IL provides the platform with the functionality required to: load and unload OMX components, establish communications between OMX components, manage communication and synchronization by sending commands to the OMX component, configure the OMX component parameters and obtain the necessary resources. Hence, the OpenMAX IL implements some of the most important functionalities in OpenMAX, becoming the core of the middleware. Using this IL layer, a processing chain made of multimedia components can be created. A global view of a multimedia processing chain based on our component architecture is shown in Fig. 1, together with a potential implementation.
In the example of Fig. 1, the B and C OpenMAX components are assigned to a HW implementation. In this platform, SW communication is carried out in compliance with the protocols and mechanisms defined in the OpenMAX reference implementation. The HW components form a sub-chain of components, which exchange buffers in a tunnelled way, without the intervention of a mediator, following the producer-consumer philosophy defined in the standard.

Fig. 1 HW/SW multimedia platform based on HW OpenMAX implementation

Every HW OpenMAX component needs a software wrapper to interact with the rest of the OpenMAX layers, including other SW OpenMAX components. In this way, it is almost straightforward to move certain functionality to HW starting from a SW implementation. This SW wrapper represents the HW component and reproduces the API as proposed in the standard. For some operations, the wrapper behaves as a façade, redirecting invocations to the HW OpenMAX component. From the user's point of view, the use and integration of HW OpenMAX components in the system are completely transparent.

In order to support this mapping, a hardware implementation of the OpenMAX IL was proposed in [5]. The IL elements implemented include, for example, the entities and communication methods defined in the standard and the concept of OMX component, given that it encapsulates the module that actually contains the multimedia function.

The HW OpenMAX implementation has two main parts (Fig. 2): on the one hand, the HW implementation of the multimedia function, called HW Media Core (HMC), and on the other hand, a placeholder for this HMC, called OpenMAX HW Adapter (OHA). The OHA implements a set of OpenMAX primitives for configuration and communication management of the OMX component in HW. A simple, fixed interface is provided between the HMC and the adapter, mainly based on a memory interface. Therefore, the HMC remains independent of the data communication protocols and the memory technologies, increasing the opportunities for reusing it. This interface contains one read port and one write port to the local memories present in the OHA, where input and output buffers are stored. A buffer is the minimum amount of multimedia data to be exchanged between two OpenMAX components.

Fig. 2 HW OpenMAX component architecture

Besides the memory interfaces, the OHA implements the control logic that governs the HMC execution (start and stop signals). Moreover, the HMC provides the OHA with information about its execution status. Depending on this status, the Data Transfer Engine (DTE) overlaps memory operations with the data transmission process.

The proposed platform allows the development of HW/SW multimedia systems. Therefore, a flexible and transparent mechanism to manage the communication is necessary. To meet this requirement, the proposed solution relies on the facilities provided by the Object-Oriented Communication Engine (OOCE, [5]). OOCE is a hybrid middleware for SoCs which provides basic and advanced in-chip communication services to transparently handle communication between the SW and the HW parts of a heterogeneous embedded system. For example, in our case, the communication between components B and C follows the HW-to-HW semantics described in [6]. When the communication takes place between HW and SW components, the buffer exchange convention is implemented through main memory, as illustrated in Fig. 3.

The main problem is the simulation, evaluation and construction of the whole platform since, until now, this has been done manually. This has motivated the development of a design flow which automates the code generation for the simulation and HW synthesis processes.
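The component roles and communication styles described in this section can be illustrated with a minimal Python sketch. It is not part of the framework: the class `OmxComponent`, the helper `link_kinds` and the components A and D are hypothetical names introduced only for illustration, while B and C are taken as HW components as in the example of Fig. 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OmxComponent:
    """Hypothetical stand-in for an OMX component (not an OpenMAX IL type)."""
    name: str
    n_inputs: int
    n_outputs: int
    in_hw: bool = False  # True when the component is mapped to hardware

    @property
    def role(self) -> str:
        # The role follows from the ports, as described in the text.
        if self.n_inputs == 0 and self.n_outputs > 0:
            return "source"
        if self.n_inputs > 0 and self.n_outputs > 0:
            return "filter"
        if self.n_inputs > 0 and self.n_outputs == 0:
            return "sink"
        raise ValueError(f"{self.name}: a component needs at least one port")


def link_kinds(chain):
    """Classify each link between consecutive components of a chain."""
    kinds = []
    for producer, consumer in zip(chain, chain[1:]):
        if producer.in_hw and consumer.in_hw:
            kinds.append("HW-to-HW (tunnelled)")
        elif producer.in_hw or consumer.in_hw:
            kinds.append("HW/SW (through main memory)")
        else:
            kinds.append("SW-to-SW")
    return kinds


# Assumed four-stage chain; only the HW mapping of B and C comes from the text.
chain = [
    OmxComponent("A", n_inputs=0, n_outputs=1),
    OmxComponent("B", n_inputs=1, n_outputs=1, in_hw=True),
    OmxComponent("C", n_inputs=1, n_outputs=1, in_hw=True),
    OmxComponent("D", n_inputs=1, n_outputs=0),
]
roles = [component.role for component in chain]
links = link_kinds(chain)
```

In this sketch only the B-to-C link is tunnelled, whereas the links that cross the HW/SW boundary are the ones that would exchange buffers through main memory.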

4 Proposed design flow

The proposed design flow (Fig. 4) starts with the creation of a UML/MARTE model of the system. The model captures a component-based view of the system, as will be explained in the following sections. From this model, an executable model of the system is automatically generated, which is used for simulation purposes. This model is written in SystemC and comprises a series of entities which characterize the timing of the major platform elements, such as OpenMAX components, buffers, etc. By means of SystemC code generation, the designer obtains several executable system specifications for a set of different system configurations. These system configurations are defined by the potential properties of each component, such as buffer size, data rate, etc. In addition, SystemC code generation enables the evaluation of different component mappings.

Fig. 3 HW/SW and SW/HW communication model

Fig. 4 Proposed OpenMAX design flow

Considering HW or SW mapping, the design process provides essential information about the system, which enables the best system configuration to be identified. The data obtained from the successive simulations are used by the designer to refine the initial UML/MARTE specification.

Finally, the framework automatically generates the VHDL code required for the implementation on the target platform.

The tool developed to support this workflow is based on Eclipse, and Papyrus [29] has been used for the graphical system capture steps. In addition, a code generator has been developed as a set of generation templates written in the standard MTL (Model Transformation Language) [24]; the development has been done with Acceleo [1], a code generation framework fully integrated in Eclipse.
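The set of system configurations explored in this flow can be pictured with the following Python sketch. It is only an illustration under stated assumptions: the parameter names, the candidate values and the function `enumerate_configurations` are hypothetical, and the generation and simulation of the SystemC model for each configuration are not modelled.

```python
from itertools import product

# Hypothetical exploration space; names and values are illustrative only.
buffer_sizes = [256, 512]   # candidate buffer sizes
data_rates = [10, 20]       # candidate data rates (abstract units)
components = ["B", "C"]     # components whose mapping is explored
mappings = ["SW", "HW"]     # possible mapping of each component


def enumerate_configurations():
    """Yield one dictionary per candidate system configuration."""
    per_component = product(mappings, repeat=len(components))
    for size, rate, mapping in product(buffer_sizes, data_rates, per_component):
        yield {
            "buffer_size": size,
            "data_rate": rate,
            "mapping": dict(zip(components, mapping)),
        }


configurations = list(enumerate_configurations())
```

Each dictionary would correspond to one executable specification; the designer would compare the simulation results obtained for them in order to refine the UML/MARTE model.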

5 UML/MARTE modelling

The system modelling methodology used in the proposed flow follows a component-oriented approach [36] and applies the Model-Driven Architecture (MDA) [46] principles to the development of HW/SW embedded systems.

In Component-based Software Engineering (CBSE) [34], the system is built as a composition of application components interacting with each other only through ports, which define the communication mechanism. Based on this approach, the application can be split into a set of clearly separable and reusable blocks, improving the organization of the product as well as its maintainability and modularity. Additionally, the internal behaviour of each component should be taken into account in the specification process. Keeping in mind the goals of CBSE (i.e. composability and compositionality), this methodology is focused on the definition of a component model that enables the exploration and the implementation of OpenMAX systems.

The system under design is specified by a UML/MARTE model before starting the flow. The graphical orientation of UML helps designers to handle large systems easily. However, the UML/MARTE model has to contain all the relevant, essential information about the system in order to enable the synthesis process to be performed. Thus, it is necessary to define a UML/MARTE methodology that combines the benefits of a visual language with the ability to capture large amounts of information. To deal with this, the information contained in a UML/MARTE model is separated into specific concerns, depending on their application area. Each concern is captured in a model view, which is represented using the UML diagrams that best fit the concern.
According to the MDA guide [46], the system model is organized in three different viewpoints: (i) the PIM, which describes the system functions independently of the underlying system platform; (ii) the system PDM, developed in parallel to the application, which depicts the hardware and software resources that form part of the system platform; and (iii) the system architecture model PSM, which provides a description of the allocation of the application components onto the platform’s processing resources. These model viewpoints are mapped to different model views in the methodology.

Nevertheless, the resulting viewpoints are still too complex to be described easily, especially considering the large amount of information required for the automatic code generation. Thus, the proposed UML/MARTE methodology applies the idea of separation of concerns as a solution to create models composed of a larger number of simpler views that are easier to develop and explore. The proposed separation of concerns is achieved by providing distinct system views to the designer.

The ApplicationView includes the definition of the application components and the application structure. The CommunicationView captures the set of communication channels used to interconnect the different application components. The HWResourcesView focuses on the description of the HW platform resources. The SWPlatformView defines the SW components (OSs). The ArchitecturalView specifies the HW/SW platform architecture and the mapping of the application structure onto the HW/SW resources. The VerificationView defines the system environment by specifying the environment components and the interconnections among these environment components and the system.

5.1 PIM: application modelling

The PIM model describes the system as a set of interconnected application components. Application components are modular parts of the system that identify pieces of functionality representing a certain behaviour. In the proposed methodology, the system functionality is divided into a set of intercommunicating OpenMAX components.

5.1.1 Application components

The application components are specified in the <> UML package. Each application component is captured as a UML component specified by the MARTE stereotype <>. Then, a set of features has to be modelled for each OpenMAX RtUnit component: input and output rates, processing time, mode, compression factor, burst length and throughput. The data rates (input or output) denote the rate of the incoming or outgoing data flows. The processing time of the component has to be defined as well. Each OpenMAX component has an associated behavioural mode that determines how it works. The mode defines the way the data are sent: full (the buffers must be full before data are sent) or continuous (the data are sent in a continuous flow while data are being stored in the buffers). The compression factor defines the relationship between the input and the output data streams. The burst length defines the size of the stream used by the component. Finally, each OpenMAX component has an associated memory address in order to identify the component for data transmissions. Moreover, each component has an associated memory size.

The input and output rates are modelled by two UML comments owned by the corresponding RtUnit and specified by the MARTE stereotype <>. In the attribute utility, the “inputRate” or “outputRate” of the application component is annotated. Then, the attribute occKind specifies the way the information is transmitted. Specifically, in order to characterize the rates, the arrival pattern is defined as “openPattern”. Then, this “openPattern” is defined as “arrivalRate”, specifying the rate value. Figure 5 shows the capture of the “inputRate” and “outputRate” specifications. The throughput attribute is captured in a similar way.

The features “factor”, “mode”, “burst length” and “processing time” are captured in a UML constraint specified by the MARTE stereotype <> and owned by the RtUnit component, where the modelling variables $processingTime, $factor, $burstLength and $mode are annotated. In addition to these properties, the OpenMAX component has an associated memory size required for the component.
The memory size associated with the component is annotated in the RtUnit attribute memorySize. All the previous features are modelled at component level, which implies that multiple instances of the same component have the same associated features. However, the memory address feature has to be captured at instance level, since the memory address of each OpenMAX component instance has to be unique. The memory address is captured in UML constraints specified by the MARTE stereotype <>, where the modelling variable $memoryAddress is annotated. Then, by using a UML link, the constraint is associated with the specific OpenMAX component instance (Fig. 8).

Fig. 5 Application component specification
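The split between component-level and instance-level features can be pictured with a small data structure. The following is only a sketch under assumptions: the type and function names (ComponentFeatures, ComponentInstance, addressesAreUnique) are hypothetical and do not come from the framework; only the feature names follow the text.

```cpp
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Hypothetical mirror of the behavioural mode described in the text.
enum class Mode { Full, Continuous };

// Component-level features: shared by every instance of the component.
struct ComponentFeatures {
    double inputRate;       // rate of the incoming data flow
    double outputRate;      // rate of the outgoing data flow
    double processingTime;  // $processingTime
    Mode mode;              // $mode: full or continuous
    double factor;          // $factor: compression factor
    unsigned burstLength;   // $burstLength
    unsigned memorySize;    // RtUnit attribute memorySize
};

// Instance-level feature: every instance carries its own memory address.
struct ComponentInstance {
    std::string name;
    const ComponentFeatures* features;  // shared component-level data
    std::uint32_t memoryAddress;        // $memoryAddress
};

// Returns true when no two instances share a memory address.
bool addressesAreUnique(const std::vector<ComponentInstance>& instances) {
    std::set<std::uint32_t> seen;
    for (const ComponentInstance& inst : instances) {
        if (!seen.insert(inst.memoryAddress).second) return false;
    }
    return true;
}
```

A model checker of this kind could be run before code generation to reject models in which two instances were given the same address.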

5.1.2 Communication mechanisms

The OpenMAX components are interconnected through ports and channels. The ports are specified by the MARTE stereotype «FlowPort» in order to capture the data-flow communication of the components. These component features are modelled by a UML constraint specified by the MARTE stereotype <>, annotating the latency value in the latency attribute of the stereotype. The channels used for the application component interconnection are specified in the «CommunicationView» UML package. In this view, two different kinds of channel are distinguished. The first kind relies on buffers. These buffers are associated with the ports of the application components and are modelled as UML components specified by the MARTE stereotype «StorageResource». The attributes elementSize and resMult of this stereotype define the size of the data stored and the number of data that can be stored, respectively (Fig. 6). A buffer can be local to the application component or shared with other application components. In order to specify this last type of buffer, the MARTE stereotype <> should be used (Fig. 6). However, a single port can have an associated set of buffers. In this case, a UML component specified by the MARTE stereotype «CommunicationEndPoint» is used. In a CommunicationEndPoint component, a set of properties can be defined (Fig. 7). These properties are typed by the previously modelled buffers. The number of buffers of a CommunicationEndPoint port is modelled by using the UML attribute multiplicity. Then, the application FlowPorts are typed by a CommunicationEndPoint component or by a StorageResource component (Fig. 6). This last case represents a port with only one buffer. There is a modelling constraint: all the buffers associated with a CommunicationEndPoint must have the same size and thus the same value of the resMult attribute. The amount of data to be stored has to be the same as well, in order to enable the automatic synthesis.

In addition, the CommunicationView includes the specification of the other channels used for the application component interconnection. These channels are modelled as UML components specified by the MARTE stereotype «CommunicationMedia». Two different communication semantics are considered: either tunnelled or non-tunnelled. To model the tunnelled semantics, the MARTE stereotype <> should be applied to a CommunicationMedia component; the attribute mechanism should be specified as other. For the other communication semantics, no stereotype should be used.
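The modelling constraint on the buffers of a CommunicationEndPoint lends itself to a simple consistency check. The sketch below is an illustration only: Buffer, EndPoint and buffersAreUniform are hypothetical names, while elementSize and resMult follow the attribute names used in the model.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical mirror of a buffer modelled as a StorageResource.
struct Buffer {
    std::size_t elementSize;  // size of each stored datum
    std::size_t resMult;      // number of data the buffer can hold
};

// A port backed by one buffer or by a set of buffers.
struct EndPoint {
    std::vector<Buffer> buffers;
};

// True when every buffer matches the first one in elementSize and resMult.
// An end point without buffers is trivially uniform.
bool buffersAreUniform(const EndPoint& ep) {
    for (const Buffer& b : ep.buffers) {
        if (b.elementSize != ep.buffers.front().elementSize ||
            b.resMult != ep.buffers.front().resMult) {
            return false;
        }
    }
    return true;
}
```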

Fig. 6 Buffer modelling

Fig. 7 Port CommunicationEndPoints

5.1.3 Application system specification

The OpenMAX System is composed of a chain of OpenMAX component instances interconnected through ports. The OpenMAX System is captured in a UML component that acts as the System’s top component included in the ApplicationView. In a composite structure diagram, instances of the OpenMAX components are included and connected by using UML connectors (Fig. 8). The start and the end of the chain are connected to the environment in order to provide and collect the system data by using ports associated with the top System component.

5.2 PDM: HW/SW platform description

The PDM specifies the different HW and SW resources of the target platform, where the PIM components are executed. Firstly, the specification of the HW or SW nature of the application components is considered. On one hand, the «SWPlatformView» includes the SW resources.

Fig. 8 OpenMAX system structure

In this methodology, the SW resources are modelled as UML components specified by the stereotype «OS» [21]. On the other hand, the «HWResourcesView» relies principally on the use of the MARTE stereotype «HwResource». We now focus on the attributes which define an application component that is going to be implemented in the reconfigurable logic fabric of our prototyping platform. Resource usage is modelled using the standard «ResourceUsage» MARTE stereotype together with the UML standard stereotype «File» (see Fig. 9). A bitstream is a binary file that is used to program the FPGA in order to configure the logic gates to implement a custom circuit. The bitstream file provides data about how many resources will be needed (i.e. LUTs, Flip-Flops and Slices). Additional attributes have to be included in the UML property set of the HwResource component. These properties are bufferSize, burstLength, numberOfBuffers and baseAddress. The bufferSize attribute defines the size of the local memories in the component and it is defined as a MARTE NFP_DataSize; the number of buffers supported by the component is indicated by the numberOfBuffers attribute, defined as an integer; the burstLength attribute adds information about the size of the communication channel transactions used to transmit the data and it is defined as a MARTE NFP_DataSize. Finally, the baseAddress attribute specifies the base address of the component in the system and it is defined as an NFP_Hexadecimal, a new non-functional property type defined for that purpose. Each component connected to a shared bus should have a range of specific addresses assigned. In this case, the first address is specified.

5.3 PSM: application allocation

Once the SW and HW resources have been defined, the RtUnit components are mapped to them. In the package «ArchitecturalView», the application instances defined in the PIM are allocated onto HW/SW resources. This application-HW/SW resource mapping is modelled by using UML abstractions specified by the MARTE stereotype «Allocate». Figure 10 shows how this step applies to the application components defined previously in Fig. 8.

5.4 Description of the system environment

Finally, the UML/MARTE modelling methodology enables the capture of the system environment. This environment feeds the system with the data to be processed and collects the result. The system environment is defined in the «VerificationView» package. The VerificationView includes the elements that make up the system environment. In this VerificationView, a set of UML components is specified by means of stereotypes included in the standard UML Testing Profile (UTP). These components are specified by the UTP stereotype «TestComponent». A TestComponent can be considered as a source or a sink depending on whether it only has out flow ports or in flow ports. In the “source” TestComponent, the latency feature has to be defined by the MARTE stereotype <> associated with a UML constraint. All these TestComponents are integrated into a top component which defines the overall environment. This component is specified by the UTP stereotype «TestContext». It defines the structure of the environment and the interconnection of the environment components to the system to be specified. The environment structure is modelled in a UML composite structure diagram associated with this TestContext component (Top_Application component of Fig. 11). This composite structure diagram contains instances of TestComponents and a property typed by the UML component which acts as top System component, included in the ApplicationView, where the structure of the OpenMAX chain is specified. Since the ports that interact with the environment are defined in the UML components included in this model view, this property is specified by the UTP stereotype «SUT» (System Under Test). Each environment component instance should have its own associated memoryAddress (see Fig. 11).

Fig. 9 HW resource modelling

Fig. 10 Application HW/SW resource mapping

Fig. 11 Environment structure

6 OpenMAX simulation process: SystemC generation

Once the system model has been captured, identifying the different configuration possibilities, it is necessary to fix the component properties (Section 5.1.1) before the generation of the final implementation. To reach this goal in an optimal way, the flow proposes performing multiple simulations and analysing the performance estimation results obtained from them. For each possible configuration specified in the model, the tool automatically generates the SystemC code required for the simulation. With this automatic code generation, the designer can easily and efficiently analyse all the possible configuration alternatives and make a decision. In order to launch the simulations, a simulation scenario is generated for each configuration. This scenario contains the system architecture and the system environment, which acts as the test bench (Fig. 12). The system architecture is made up of a chain of SystemC OpenMAX components connected through a Global Communication Media, preserving the general topology and configuration of the actual implementation. The environment injects the input data into the chain. These data go through the components and are collected at the end of the chain. At this point, the time taken to complete the process is recorded together with other relevant parameters such as average buffer processing time, waiting cycles due to congestion or buffer transmission times.
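The idea of generating one simulation scenario per configuration can be illustrated by enumerating the combinations of the explored parameters. This is a minimal sketch, assuming that burst length, number of buffers and mode are the explored properties; Configuration and enumerateConfigurations are hypothetical names, not part of the tool.

```cpp
#include <string>
#include <vector>

// One candidate configuration to be simulated (hypothetical structure).
struct Configuration {
    unsigned burstLength;
    unsigned numberOfBuffers;
    std::string mode;  // "full" or "continuous"
};

// Builds the Cartesian product of the candidate values; each element would
// correspond to one generated simulation scenario.
std::vector<Configuration> enumerateConfigurations(
        const std::vector<unsigned>& burstLengths,
        const std::vector<unsigned>& bufferCounts,
        const std::vector<std::string>& modes) {
    std::vector<Configuration> all;
    for (unsigned burst : burstLengths) {
        for (unsigned buffers : bufferCounts) {
            for (const std::string& mode : modes) {
                all.push_back({burst, buffers, mode});
            }
        }
    }
    return all;
}
```

With two burst lengths, two buffer counts and two modes, eight scenarios would be produced, which shows how quickly the number of simulations grows with each additional explored property.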

Fig. 12 SystemC simulation scenario

According to the characteristics of the implementation of each HW or SW OpenMAX component captured in the UML/MARTE model, the code generator tool automatically produces SystemC simulation code for each component, as illustrated in Fig. 13. Following the standard hierarchical style for SystemC modelling, all subsystems in the OpenMAX component are connected via ports and channels. The functionality of the component is translated into a processing unit which is implemented in the HW Media Core (HMC). Internally, this processing unit contains a generic synchronous pipeline in which the number of stages and the cycle time can be specified, making its temporal behaviour flexible and configurable. In order to make computation and communication independent, a placeholder for this unit, called OpenMAX HW Adapter (OHA), has been implemented. The OHA is responsible for interpreting the OpenMAX primitives. Both elements (HMC and OHA) form the SystemC HW OpenMAX Component. In the same way, the SystemC SW Component consists of a SW Media Core and an OpenMAX SW Adapter. The main difference between a HW and a SW SystemC Component is the time assigned to the SystemC channels. The communication between the component (HW or SW) and the rest of the system is carried out by means of special OOCE adapters to the global communication media (proxies and skeletons), which implement the FSM that controls this process. The automatic generation process extracts the characteristics of the system and its composition from the UML/MARTE specification. The generator produces a set of files in order to simulate the whole system. Based on the same proposed SystemC OpenMAX component model, different templates for HW and SW components are taken from a component library according to their temporal requirements. Each SystemC OpenMAX component is instantiated and connected to the global communication media by the generated test-bench.
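The effect of the two pipeline parameters on the temporal behaviour can be approximated with the textbook formula for an ideal synchronous pipeline. This is an assumption made for illustration: the source does not state how the HMC pipeline is timed, and pipelineTime is a hypothetical helper that ignores stalls and assumes one item enters per cycle.

```cpp
// Time needed for `items` data items to traverse an ideal synchronous
// pipeline: the first item needs `stages` cycles, and every further item
// completes one cycle after the previous one.
double pipelineTime(unsigned stages, unsigned items, double cycleTime) {
    if (items == 0) return 0.0;
    return (stages + items - 1) * cycleTime;
}
```

For example, under these assumptions a 4-stage pipeline with a 10 ns cycle would take 40 ns for a single item and 1030 ns for 100 items.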

Fig. 13 SystemC specification of a HW OpenMAX component

//SW OpenMAX Component parameters
struct SWParameters {
    int baseAddr;    //Each component (HW or SW) has one
    int bufferSize;
    int burst_len;
    int nstages;
    …
};

//Example of HW and SW OpenMAXComponent instances
SWOpenMAXComp SWComp_i("OMXSWComp_i", SWParameters);
HWOpenMAXComp HWComp_i("OMXHWComp_i", HWParameters);

//Test-bench instance
testb testb_i("myTest");

//Bus Adapters declarations and bus instance
master<32> BusMasterAdapter = new BusMasterAdapter<32>("BMA");
slave<32> BusSlaveAdapter = new BusSlaveAdapter<32>("BSA");
bus<32> Bus = new Bus<32>("Bus");

//Bus binding example of Component (HW or SW)
//testb_i must be bound in the same way
BusMasterAdapter->master_sock(Bus->slave_sock);
BusMasterAdapter->prtProx(HWComp_i.prox_xport);
BusSlaveAdapter->prtSkel(HWComp_i.skel_xport);
Bus->master_sock(BusSlaveAdapter->slave_socket);

BusMasterAdapter->master_sock(Bus->slave_sock);
BusMasterAdapter->prtProx(SWComp_i.prox_xport);
BusSlaveAdapter->prtSkel(SWComp_i.skel_xport);
Bus->master_sock(BusSlaveAdapter->slave_socket);

//Start Simulation
sc_start();

As shown in the previous source code fragment, regardless of the component nature (HW or SW), for each instantiation of SystemC OpenMAX Component at least two bus adapters (one master and one slave) are needed. Both adapters are bound to the component adapters (proxy and skeleton) through the exports and to the shared bus (AHB in this case). Internally, a SystemC OpenMAX component is generated as follows:

class HWOpenMAXComp : public sc_core::sc_module {
public:
    // Adapters Port Interface
    sc_export > skel_xport;
    sc_export > prox_xport;
    HMC HMC = new HMC("HMC");    // HW Media Core
    OHA OHA = new OHA("OHA");    // OpenMAX HW Adapter
    // Instances of communication channels
    OA_Skel_ch OA_Skel_ch;
    OA_Prox_ch OA_Prox_ch;
    // Constructor
    SC_HAS_PROCESS(HWOpenMAXComp);
    HWOpenMAXComp(sc_module_name nm, Parameters parameters);
    ~HWOpenMAXComp();
};

The main difference between a SW OpenMAX component and a HW component is the temporal parameterization, which is reflected in the channels’ latency. For example, the time required for a memory access differs between HW and SW components, so in memory channels these times are characterized differently so as to reflect, for example, the overhead introduced by the cache or layers.

MemOut_ch("MemOutCh",SWTime); //or HWTime

The Object Adapter (OA) module is responsible for interpreting the bus transactions and translating them into OpenMAX primitives. It contains one proxy and one skeleton for this task.

//Skeleton reads bus transactions
skel_port->read_fDecode(m_Address, m_dataSize, m_data);
wait(SC_ZERO_TIME);
if ((m_Address - BaseAddr) >= 0) {
    invocationType = m_Address - BaseAddr;
    switch (invocationType) {
        case FILLBUFFER:
            skel_port->fillBuffer();
            break;
        case EMPTYBUFFER:
            skel_port->emptyBuffer(m_data);
            break;
        …
    }
}

//Proxy writes data into the bus (TargetAddr+method, transfer size, data)
proxy_port->writeMaster((SourceAddr+FILLBUFFER), 4, fillData);
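The address arithmetic used by the skeleton and the proxy can be isolated in a few lines of plain C++. The sketch below rests on assumptions: the numeric values of FILLBUFFER and EMPTYBUFFER are not given in the source, and UNKNOWN, decodePrimitive and encodePrimitive are hypothetical names introduced for illustration.

```cpp
#include <cstdint>

// Primitive identifiers used as address offsets. The numeric values are
// assumptions; the source only names the primitives.
enum Primitive : std::uint32_t {
    FILLBUFFER = 0,
    EMPTYBUFFER = 1,
    UNKNOWN = 0xFFFFFFFFu
};

// Skeleton side: the offset from the component base address selects the
// primitive. The comparison is made before subtracting so that an address
// below the base cannot wrap around in unsigned arithmetic.
Primitive decodePrimitive(std::uint32_t address, std::uint32_t baseAddr) {
    if (address < baseAddr) return UNKNOWN;
    switch (address - baseAddr) {
        case FILLBUFFER:  return FILLBUFFER;
        case EMPTYBUFFER: return EMPTYBUFFER;
        default:          return UNKNOWN;
    }
}

// Proxy side: the target address is the base address plus the primitive.
std::uint32_t encodePrimitive(std::uint32_t targetBase, Primitive p) {
    return targetBase + p;
}
```

Encoding a primitive on the proxy side and decoding it on the skeleton side with the same base address returns the original primitive, which is the property the adapters rely on.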

As defined in the standard, in correspondence to the number of input and output buffers, each OpenMAX component will play the role of source, sink or filter in the chain. Based on the design space exploration findings, the parameters of the components are evaluated through a test case. Each possible parameter combination is translated into a SystemC test case which is automatically generated. During the test execution, the different SystemC modules gather statistical information that is relevant for them. This information is registered for further analysis in order to obtain meaningful conclusions about the performance of the system configuration under evaluation. The report presented to the designer includes: total application time, final throughput, average processing time per buffer and per component, average waiting times due to bus congestion, total number of transactions, overload due to synchronization messages, actual input and output buffer rates per component and average waiting time per component due to local memory saturation (not ready to receive or not ready to write). Depending on the results obtained after the simulation of the configuration alternatives, the specific configuration of each component is selected, leading to the VHDL synthesis process.

7 OpenMAX synthesis process: VHDL generation

After the simulation and having selected the best configuration for each component, the VHDL generation process can be started. Depending on the position of the HW component and its role in the chain, different templates have to be generated. Figure 14 illustrates and justifies this need. The figure shows different scenarios in which an OpenMAX application consists of a collection of SW OpenMAX components, some of them having an association with a HW OpenMAX component. As can be seen in the figure, depending on the relative position occupied by the HW OMX components in the SW chain, there are different types of HW OpenMAX component sub-chains, and their implementations show small differences which must be taken into account. For all of them, the VHDL code generation is done automatically.

Fig. 14 Different HW OpenMAX components sub-chains

As mentioned in Section 3, if a HW OpenMAX component interacts with another SW OpenMAX component, this communication must take place through main memory so that the standard buffer exchanging mechanism is left untouched. For that reason, a Native Port Interface (NPI) implementation is proposed because it is the most efficient mechanism in the target prototyping platform. Source and sink HW OpenMAX components can also be connected to a physical device such as input/output media data devices (webcam, monitor, Ethernet, serial, etc.) when they are located at the beginning or end of the processing chain (for example in scenarios 1, 2 and 3). If there is only one component in the middle of the chain, it interacts with two SW components, so it has two NPI interfaces (scenario 4). The communication channel between HW OpenMAX components (typically a shared bus) determines the HW OpenMAX component adapters (proxy and skeleton). For this reason, various adapter templates have been implemented to support communication with several buses (AHB, PLB, OPB, AXI, etc.).

For example, the following source code represents the top HW OpenMAX Component acting as a producer. In this case, the component is placed at the beginning of the processing SW chain and receives its input data via Ethernet. The producer has to send its output data to a SW Component (so it needs a NPI interface) and it is connected to a PLB shared bus in order to receive the configuration parameters and the OpenMAX primitives.

entity eth_npi_prod_omx is
  generic (
    C_PLBV46_AWIDTH        : integer := 32;
    C_PLBV46_DWIDTH        : integer := 64;
    C_BASEADDR             : std_logic_vector := X"FFFFFFFF";
    C_HIGHADDR             : std_logic_vector := X"FFFFFFFF";
    C_PLBV46_NATIVE_DWIDTH : integer := 64;
    C_PI_ADDR_WIDTH        : integer := 32;
    C_PI_DATA_WIDTH        : integer := 64;
    C_PI_BE_WIDTH          : integer := 8
  );
  port (
    -- PLB Common Signals
    PLB_Clk      : in std_logic;
    PLB_Rst      : in std_logic;
    PLB_PAVAlid  : in std_logic;
    PLB_RNW      : in std_logic;
    PLB_BE       : in std_logic_vector(0 to (C_PLBV46_DWIDTH/8)-1);
    PLB_size     : in std_logic_vector(0 to 3);
    PLB_ABus     : in std_logic_vector(0 to C_PLBV46_AWIDTH-1);
    PLB_wrDBus   : in std_logic_vector(0 to C_PLBV46_DWIDTH-1);
    PLB_MAddrAck : in std_logic;
    PLB_MwrDAck  : in std_logic;
    -- PLB Slave Signals
    Sl_addrAck : out std_logic;
    Sl_wrBTerm : out std_logic;
    Sl_wrDAck  : out std_logic;
    Sl_wrComp  : out std_logic;
    -- PLB Master Signals
    M_ABus     : out std_logic_vector(0 to C_PLBV46_AWIDTH-1);
    M_wrDBus   : out std_logic_vector(0 to C_PLBV46_DWIDTH-1);
    M_BE       : out std_logic_vector(0 to (C_PLBV46_DWIDTH/8)-1);
    M_priority : out std_logic_vector(0 to 1);
    M_wrBurst  : out std_logic;
    M_request  : out std_logic;
    M_RNW      : out std_logic;
    M_Msize    : out std_logic_vector(0 to 1);
    M_size     : out std_logic_vector(0 to 3);
    M_type     : out std_logic_vector(0 to 2);
    -- ETHERNET Signals
    tx_dcm_lock          : in std_logic := 'X';
    tx_ack               : out std_logic;
    reset                : in std_logic := 'X';
    host_miim_sel        : in std_logic := 'X';
    mdio_out             : out std_logic;
    mdio_tri             : out std_logic;
    rx_clk0              : in std_logic := 'X';
    host_req             : in std_logic := 'X';
    rx_bad_frame         : out std_logic;
    mdio_in              : in std_logic := 'X';
    tx_clk0              : in std_logic := 'X';
    rx_statistics_valid  : out std_logic;
    tx_statistics_valid  : out std_logic;
    rx_good_frame        : out std_logic;
    tx_underrun          : in std_logic := 'X';
    rx_dcm_lock          : in std_logic := 'X';
    mdc                  : out std_logic;
    tx_start             : in std_logic := 'X';
    host_clk             : in std_logic := 'X';
    pause_req            : in std_logic := 'X';
    host_miim_rdy        : out std_logic;
    rx_data              : out std_logic_vector (63 downto 0);
    rx_data_valid        : out std_logic_vector (7 downto 0);
    tx_statistics_vector : out std_logic_vector (24 downto 0);
    host_opcode          : in std_logic_vector (1 downto 0);
    pause_val            : in std_logic_vector (15 downto 0);
    tx_data              : in std_logic_vector (63 downto 0);
    tx_data_valid        : in std_logic_vector (7 downto 0);
    rx_statistics_vector : out std_logic_vector (28 downto 0);
    host_rd_data         : out std_logic_vector (31 downto 0);
    host_wr_data         : in std_logic_vector (31 downto 0);
    tx_ifg_delay         : in std_logic_vector (7 downto 0);
    xgmii_txc            : out std_logic_vector (7 downto 0);
    xgmii_txd            : out std_logic_vector (63 downto 0);
    xgmii_rxc            : in std_logic_vector (7 downto 0);
    xgmii_rxd            : in std_logic_vector (63 downto 0);
    host_addr            : in std_logic_vector (9 downto 0);
    -- NPI Signals
    XIL_NPI_Addr              : out std_logic_vector(C_PI_ADDR_WIDTH-1 downto 0);
    XIL_NPI_AddrReq           : out std_logic;
    XIL_NPI_AddrAck           : in std_logic;
    XIL_NPI_RNW               : out std_logic;
    XIL_NPI_Size              : out std_logic_vector(3 downto 0);
    XIL_NPI_WrFIFO_Data       : out std_logic_vector(C_PI_DATA_WIDTH-1 downto 0);
    XIL_NPI_WrFIFO_BE         : out std_logic_vector(C_PI_BE_WIDTH-1 downto 0);
    XIL_NPI_WrFIFO_Push       : out std_logic;
    XIL_NPI_RdFIFO_Data       : in std_logic_vector(C_PI_DATA_WIDTH-1 downto 0);
    XIL_NPI_RdFIFO_Pop        : out std_logic;
    XIL_NPI_RdFIFO_Empty      : in std_logic;
    XIL_NPI_RdFIFO_Latency    : in std_logic_vector(1 downto 0);
    XIL_NPI_RdModWr           : out std_logic;
    XIL_NPI_WrFIFO_AlmostFull : in std_logic;
    XIL_NPI_InitDone          : in std_logic
  );
end eth_npi_prod_omx;

The values of the parameters fixed according to the analysis of the simulation results (for example, buffer size, number of buffers, burst length or transmission mode, among others) are set at generation time. These parameters can also be modified at runtime using the OpenMAX configuration primitives. Since a HW OpenMAX Component always has an associated SW component, a “dummy” SW OpenMAX component must be provided by the VHDL generation for each HW component. This SW component is a reduced version of a reference SW OpenMAX component [41] that only performs representational and configuration functions and is free of computing tasks.

A flexible and transparent mechanism to manage the communication between HW and SW components is necessary. In addition, to cope with the temporal and computational requirements, it is essential to reduce the overhead introduced by the integration infrastructure to a minimum. To do so, we rely on the facilities provided by the Object-Oriented Communication Engine (OOCE). OOCE is a hybrid middleware, based on a bus architecture, for SoCs. It provides basic and advanced in-chip communication services to transparently handle communication between the SW and the HW parts of an embedded system. OOCE has been extended with new features and some optimizations in order to achieve the performance levels demanded by multimedia systems. The entire OOCE infrastructure is automatically generated by the OOCE interface compiler, which is fed with a Slice specification obtained from the UML/MARTE model. Elements in the system that are not generated directly by the OOCE compiler, such as internal memories in HW components (*.ngc files) or compiled files, are generated through the Xilinx backend tools (Coregen, XPS and ISE). This generation task is guided by Makefile files.

8 Study case

In this section, we demonstrate the applicability of the approach by applying the design flow presented to the generation of the source code for an image processing chain. The main target of the application is to apply a SOBEL algorithm for edge detection in an image sequence. The implemented SOBEL requires greyscale pictures, so a previous “RGB to Black and White” filter is needed. Finally, the image obtained is shown on a computer monitor. All computation tasks are carried out in an FPGA, in particular a Xilinx Virtex-5 XC5VFX70T model. The prototyping board runs a uClinux operating system with a 3.1 kernel preloaded in the main memory (DDR). The application (illustrated in Fig. 15) consists of four OpenMAX components: (a) the first receives the image via Ethernet, (b) the second converts the image to grey scale, (c) the third applies the SOBEL algorithm and (d) the last component transfers the image to the computer monitor. The reference model was introduced in Fig. 8 (Section 5.1.3). In this model, all components have an associated estimation of the execution time, needed for the simulation phase. After analysing the results of each simulation, the following conclusions have been obtained:

• Using bursts of 64 words instead of 16, the execution time of the system improved by nearly 47.7 %.
• Fixing the burst size to 64 words, the best buffer distribution was four buffers of 4 KB for each component (two at the inputs and the other two at the outputs). In this way, the multiple buffer technique [11] (also called ping-pong buffering) can be applied. With this configuration, the system improved by another 14.37 %.
• Moreover, if the components transfer the buffer contents in packets as soon as they are available in their output memories (avoiding waiting for the buffers to fill), the execution time is reduced by a further 4.18 %.
• The estimated performance is 68.92 frames per second.

Fig. 15 Application flow
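The multiple buffer (ping-pong) technique mentioned in the conclusions can be sketched in software. This is only an illustration of the general technique, not the HW implementation used in the components: PingPongBuffer and its methods are hypothetical, and a real component would fill one local memory while the other is being transferred concurrently.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Two buffers alternate roles: one is filled while the other is drained.
class PingPongBuffer {
public:
    explicit PingPongBuffer(std::size_t capacity) : capacity_(capacity) {}

    // Appends to the buffer currently being filled; returns false when full.
    bool write(unsigned char byte) {
        if (buffers_[fill_].size() == capacity_) return false;
        buffers_[fill_].push_back(byte);
        return true;
    }

    // Swaps the roles and returns the buffer that was just filled, which is
    // now ready to be drained while new data go into the other buffer.
    const std::vector<unsigned char>& swap() {
        fill_ = 1 - fill_;
        buffers_[fill_].clear();  // the new fill buffer starts empty
        return buffers_[1 - fill_];
    }

private:
    std::size_t capacity_;
    std::array<std::vector<unsigned char>, 2> buffers_;
    std::size_t fill_ = 0;
};
```

The producer keeps writing immediately after a swap, so it does not have to wait for the consumer to finish draining, which is the source of the improvement reported for this buffer distribution.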

Taking into account the UML/MARTE model, the VHDL generation stage produces the infrastructure of the FPGA corresponding to Fig. 16. As can be seen in Fig. 16, a SW component, acting as a manager, has been generated for each HW component. In this particular case, the HW components can interact among themselves following the HW to HW communication model [10, 11], so it is not necessary to use the main memory as a storage area. As the shared communication channel is a 64-bit PLB bus, the generator creates the PLB adapters for each component, including the necessary elements to establish the HW/HW and HW/SW communication through the OOCE middleware. Regardless of what the model shows, if the buffers fit in the memories specified in the UML/MARTE model, the HW component templates are generated to use the buffer sizes resulting from the SystemC simulation phase, although some memory can be wasted. In the same way, the OOCE adapters are configured to transmit data in bursts of the size indicated by the simulation. The generated system has been tested with a sequence of images whose size is 640 × 480 pixels and which occupy approximately 300 Kbytes each. When the test finished, the number of processed images per second was 63.64, implying that the error with respect to the simulation was only 8.29 %.
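The reported figures can be cross-checked with a short calculation. The sketch below assumes that the error is taken relative to the measured frame rate and that each pixel occupies one byte; both are assumptions that happen to reproduce the reported 8.29 % and 300 Kbytes, and the function names are hypothetical.

```cpp
#include <cmath>

// Deviation of the simulated frame rate from the measured one, in percent
// of the measured value.
double relativeErrorPercent(double simulatedFps, double measuredFps) {
    return std::fabs(simulatedFps - measuredFps) / measuredFps * 100.0;
}

// Size of one frame in bytes for a given resolution and pixel depth.
unsigned long frameBytes(unsigned width, unsigned height, unsigned bytesPerPixel) {
    return static_cast<unsigned long>(width) * height * bytesPerPixel;
}
```

With 68.92 simulated and 63.64 measured frames per second the deviation is about 8.297 %, and a 640 × 480 frame at one byte per pixel occupies 307,200 bytes, that is, exactly 300 × 1024 bytes.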

Fig. 16 FPGA content after the synthesis

8.1 Lines of code automatically generated

To get an idea of the benefits of using the proposed design flow, the following table shows the lines of code that have been generated automatically instead of being handwritten from scratch.

Generated Elements      HW Component   HW Component   HW Component   HW Component
                        Producer       RGB2BW         SOBEL          SINK
Bus Drivers             381            381            381            381
OOCE Adapters           763            763            763            763
FIFOs                   278            278            278            278
OpenMAX Adapter         815            777            777            672
Top                     271            247            247            239
SW Manager Component    3004           3004           3004           3004
TOTAL                   5512           5450           5450           5337
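The column totals of the table can be verified mechanically. The following sketch simply re-adds the figures listed above; the array and function names are hypothetical.

```cpp
#include <array>
#include <numeric>

// Rows per column: bus drivers, OOCE adapters, FIFOs, OpenMAX adapter,
// top, SW manager component (values copied from the table).
const std::array<int, 6> kProducer{381, 763, 278, 815, 271, 3004};
const std::array<int, 6> kRgb2bw{381, 763, 278, 777, 247, 3004};
const std::array<int, 6> kSobel{381, 763, 278, 777, 247, 3004};
const std::array<int, 6> kSink{381, 763, 278, 672, 239, 3004};

// Sum of the generated lines for one HW component.
int columnTotal(const std::array<int, 6>& column) {
    return std::accumulate(column.begin(), column.end(), 0);
}

// Sum over the four HW components.
int grandTotal() {
    return columnTotal(kProducer) + columnTotal(kRgb2bw) +
           columnTotal(kSobel) + columnTotal(kSink);
}
```

The four columns add up to 5512, 5450, 5450 and 5337 lines, giving 21,749 generated lines in total.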

All the generated infrastructure is focused on communication and integration and, as can be seen in the table, about 22,000 lines of source code were generated, leaving out the Makefile scripts. Therefore, the developers can focus their efforts exclusively on coding the application functionality. In our proposal, we make use of a Xilinx tool called Vivado High-Level Synthesis (HLS). HLS tools take as their input a high-level description of the specific algorithm to implement and generate the RTL description of the FPGA implementation. Modern HLS tools accept untimed C/C++ descriptions as input specifications. These tools give two interpretations to the same C/C++ code: (1) sequential semantics for input/output behaviour; and (2) architecture specification based on C/C++ code and compiler directives. Based on the C/C++ code, compiler directives and target throughput requirements, these HLS tools generate high-performance pipelined architectures. Among other features, HLS tools enable automatic insertion of pipeline stages and resource sharing to reduce FPGA resource utilization. In summary, HLS tools raise the level of abstraction for FPGA design and hide the time-consuming and error-prone RTL design tasks. We have focused on using C++ descriptions, with the goal of leveraging C++ template classes to parameterize blocks in the architecture. The pieces of C++ source code concerning the functional parts of the study case are shown below.

#include "rgb2bw.h"

unsigned char rgb2bw(unsigned char r, unsigned char g, unsigned char b)
{
#pragma AP interface ap_ctrl_hs register port=return
    unsigned char aux;
#if defined(__HLS_SYN__)
    while (1) {
#endif
        aux = (3*r + b + 4*g) >> 3;
        return aux;
#if defined(__HLS_SYN__)
    }
#endif
}

Source C++ code for RGB2BW filter
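The weighting used by the RGB2BW filter can be checked on the host before synthesis. The sketch below repeats the same expression in a plain function named rgbToGrey (a hypothetical name, chosen to avoid clashing with the listing). The weights 3/8, 4/8 and 1/8 sum to one, so the result always fits in 8 bits; they can be read as integer approximations of the usual luma weights (about 0.299, 0.587 and 0.114), although the source does not say how they were chosen.

```cpp
// Grey level computed with the same weights as the listing:
// (3*R + 4*G + 1*B) / 8, with the division done as a right shift.
unsigned char rgbToGrey(unsigned char r, unsigned char g, unsigned char b) {
    return static_cast<unsigned char>((3 * r + b + 4 * g) >> 3);
}
```

Pure white maps to 255 and pure black to 0, while saturated red, green and blue map to 95, 127 and 31 respectively.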

signed char maskSobelX[3][3]={{-1,-2,-1},{0,0,0},{1,2,1}};
signed char maskSobelY[3][3]={{-1,0,1},{-2,0,2},{-1,0,1}};

unsigned char sobelFilter(unsigned char window[3][3])
{
    short i, j;
    // short accumulators: each weighted sum can reach +/-1020,
    // which does not fit in a char
    short auxX = 0;
    short auxY = 0;
    for(i=0; i!=3; i++){
        for(j=0; j!=3; j++){
            auxX = auxX + window[i][j]*maskSobelX[i][j];
            auxY = auxY + window[i][j]*maskSobelY[i][j];
        }
    }
    if (auxX < 0) auxX *= -1;
    if (auxY < 0) auxY *= -1;
    // saturate the gradient magnitude to the 8-bit output range
    return (auxX+auxY > 255) ? 255 : (auxX+auxY);
}

Source C++ code for SOBEL filter

From Vivado (and using the generated Makefile files) the corresponding VHDL code for both filters (RGB2BW and SOBEL) has been generated. As a result, the following table shows the relationship between the code written by the developer and the code generated by Vivado.

Filter     Code written by developer    Code generated from Vivado tool
RGB2BW     13                           426
SOBEL      16                           513
TOTAL      29                           939

The HW OpenMAX component, acting as producer in this case, encapsulates one core to govern the ethernet port. The ethernet implementation follows the same development pattern as used in the above cases. The C/C++ source code determines how to transmit and receive data from the ethernet port.

    #include "rw_utils.h"
    #include "parameters.h"

    void write32to8(STR32BE data32, hls::stream<STR8> &tx) {
        STR8 data8;
        int i;
        data8.last = false;
        if (!data32.last) {
            for (i = 3; i >= 0; i--) {
                data8.data = data32.data(i * 8 + 7, i * 8);
                tx.write(data8);
            }
        }
        else {
            byte validBytes = data32.be[3] + data32.be[2] +
                              data32.be[1] + data32.be[0];
            for (i = 3; i > 4 - validBytes; i--) {
                data8.data = data32.data(i * 8 + 7, i * 8);
                tx.write(data8);
            }
            data8.data = data32.data(i * 8 + 7, i * 8);
            data8.last = true;
            tx.write(data8);
        }
    }

    void forwardPayload32to8(hls::stream<STR32BE> &rx, hls::stream<STR8> &tx) {
        unsigned short size = 0;
        STR32BE data32;
        data32.last = false;
        while (!data32.last) {
            rx.read(data32);
            write32to8(data32, tx);
        }
    }

    void forwardPayload8to32(hls::stream<STR8> &rx, hls::stream<STR32BE> &tx) {
        byte position = 2;
        STR8 data8;
        STR32BE data32;
        rx.read(data8);
        data32.data(31, 24) = data8.data;
        data32.be[3] = 0x8;
        data32.last = false;
        while (!data8.last) {
            rx.read(data8);
            data32.data(position * 8 + 7, position * 8) = data8.data;
            data32.be[position] = 1;
            data32.last = data8.last;
            if (position == 0) {
                tx.write(data32);
                data32.data = 0;
                data32.be = 0;
                position = 3;
            }
            else
                position--;
        }
        if (position) {
            tx.write(data32);
        }
    }

    void eth_rx(hls::stream<STR8> &rx, hls::stream<STR32BE> &tx) {
    #pragma AP interface ap_ctrl_none port=return
        byte ethType_1;
        byte ethType_0;
        STR32BE data32;
        STR8 data8;
        int i;
        if (!rx.empty()) {
            skip8(rx, 6);
            // First 4 bytes of the src MAC
            data32.be = 0xf;
            for (i = 0; i < 4; i++)
                data32.data = (data32.data << 8) | rx.read().data;
            tx.write(data32);
            // Last 2 bytes of src MAC + ethType
            for (i = 0; i < 4; i++)
                data32.data = (data32.data << 8) | rx.read().data;
            tx.write(data32);
            forwardPayload8to32(rx, tx);
        }
    }

    void eth_tx(hls::stream<STR32BE> &rx, hls::stream<STR8> &tx) {
    #pragma AP interface ap_ctrl_none port=return
        STR32BE data32;
        STR8 data8;
        byte ethType_1;
        byte ethType_0;
        data8.last = false;
        if (!rx.empty()) {
            // target MAC: first 4 bytes
            rx.read(data32);
            write32to8(data32, tx);
            // target MAC: last 2 bytes + eth type
            rx.read(data32);
            data8.data = data32.data(31, 24);
            tx.write(data8);
            data8.data = data32.data(23, 16);
            tx.write(data8);
            // src MAC
            sendSequence8(tx, 6, ETH_ADDR, false);
            // eth type
            data8.data = data32.data(15, 8);
            tx.write(data8);
            data8.data = data32.data(7, 0);
            tx.write(data8);
            forwardPayload32to8(rx, tx);
        }
    }
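The byte-lane handling performed by forwardPayload8to32 and write32to8 can be illustrated with a small host-side model. The following is a minimal sketch in plain C++, not the authors' HLS code: it assumes a hypothetical `Word32` structure standing in for `STR32BE`, uses `std::vector` instead of `hls::stream`, and assumes that each bit of `be` marks one valid byte lane, with the first byte of every group placed in bits 31..24. The function names `pack8to32` and `unpack32to8` are likewise illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the listing's STR32BE word: 32 data bits,
// one byte-enable bit per lane, and an end-of-packet flag.
struct Word32 {
    std::uint32_t data;
    std::uint8_t be;   // bit i set => byte lane i (bits 8*i+7..8*i) is valid
    bool last;
};

// Pack a byte sequence into 32-bit words: the first byte of each group
// lands in lane 3 (bits 31..24), the fourth in lane 0 (bits 7..0).
std::vector<Word32> pack8to32(const std::vector<std::uint8_t>& bytes) {
    assert(!bytes.empty());  // a packet carries at least one byte
    std::vector<Word32> words;
    Word32 w{0, 0, false};
    int lane = 3;
    for (std::size_t n = 0; n < bytes.size(); ++n) {
        w.data |= static_cast<std::uint32_t>(bytes[n]) << (lane * 8);
        w.be |= static_cast<std::uint8_t>(1u << lane);
        const bool final_byte = (n + 1 == bytes.size());
        if (lane == 0 || final_byte) {
            // Word is full, or the packet ended on a partial word.
            w.last = final_byte;
            words.push_back(w);
            w = Word32{0, 0, false};
            lane = 3;
        } else {
            --lane;
        }
    }
    return words;
}

// Inverse operation: emit only the enabled lanes, highest lane first.
std::vector<std::uint8_t> unpack32to8(const std::vector<Word32>& words) {
    std::vector<std::uint8_t> bytes;
    for (const Word32& w : words) {
        for (int lane = 3; lane >= 0; --lane) {
            if (w.be & (1u << lane)) {
                bytes.push_back(static_cast<std::uint8_t>(w.data >> (lane * 8)));
            }
        }
    }
    return bytes;
}
```

Under these assumptions, the six bytes 11 22 33 44 55 66 (hexadecimal) are packed into the words 0x11223344 (be = 0xF) and 0x55660000 (be = 0xC, last set), and unpacking those two words returns the original six bytes.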

Using Vivado, the VHDL generated from the above high-level source code totalled 2520 lines. Finally, the consumer HW component was written entirely by hand, so a reusable VGA driver was developed. It reads the RGB components from its input local memory and outputs the values through the VGA interface. In total, there are 729 lines in the source code. In conclusion, the developers’ effort to build this application is reduced significantly, to writing just a few lines of code. Taking this code and the UML/MARTE model as a reference, the proposed framework can generate the rest of the system and also find the best configuration values.

9 Conclusions

This paper presents a UML/MARTE methodology with sufficient modelling capabilities to enable the design of current multimedia systems according to the requirements of the OpenMAX standard. The methodology enables the structure of the system application to be modelled, defining all the structural and communication semantics that completely specify an application component. The high-level modelling methodology then enables the specification of test-benches in SystemC, which are used to drive a design exploration process that finds the best configuration of the system. The automatically generated SystemC executable provides the execution times of the different configurations; based on these simulation results, the best system configuration is selected and defined in the model. Finally, the UML/MARTE methodology establishes a mapping onto the HW or SW resources of the target board and generates the VHDL code required for the final implementation.

Acknowledgments This research was supported by the Spanish Ministry of Economy and Competitiveness under the project REBECCA (TEC2014-58036-C4-1-R), and by the European Regional Development Fund and the Regional Government of Castilla-La Mancha under the project SAND (PEII11-0227-0070).

References

1. Acceleo website. www.acceleo.org. Nov. 2010
2. Adaptive Digital Technologies, Inc. Adaptive digital OpenMAX IL implementation. 2012. http://www.adaptativedigital.com
3. Andersson P, Höst M UML and SystemC a comparison and mapping rules for automatic code generation. FDL’07
4. Angstrom OpenMAX. http://omappedia.org/
5. Barba J, de la Fuente D, Rincón F, Moya F, López JC (2010) Hardware native support for efficient multimedia embedded system. IEEE Trans Consum Electron 56. ISSN 0098–3063
6. Barba J, Rincon F, Dondo JD, Moya F, Villanueva FJ, Villa D, Lopez JC (2007) OOCE: Object-Oriented Communication Engine for soc design. DSD-Euro-Micro Conference on Digital System Design. Lubeck (Germany)
7. Bocchio S, Riccobene E, Rosti A, Scandurra P A SoC design flow based on UML 2.0 and SystemC. In: DAC, Workshop UML-Sock’05
8. Bruschi F, Di Nitto E, Sciuto D SystemC code generation from UML models. Forum on Specification and Design Languages’02
9. Cansell D, Culat JF, Méry D, Proch C (2004) Derivation of SystemC code from abstract system models. In: proc. of FDL’04. Lille. France
10. de la Fuente D, Barba J, Rincón F, Dondo JD, López JC Embedded systems - high performance systems. pp 129–154. ISBN: 978-953-51-0350-9
11. de la Fuente D, Barba J, Dondo J, Rincón F, López JC OpenMAX compliant heterogeneous multimedia embedded platform. In DCIS’12
12. Herrera F, Peñil P, Villar E, Ferrero F, Valencia R (2012) An embedded system modelling methodology for design space exploration. JCE
13. IEEE Std. 1666–2011 (2012) IEEE Standard for Standard SystemC® Language Reference Manual. Available at http://standards.ieee.org/getieee/1666/download/1666-2011.pdf
14. Kopetz H The complexity challenge in embedded system design. In: 11th IEEE ISORC
15. Leite M, Vasconcellos CD, Wehrmeister MA (2014) Enhancing automatic generation of VHDL descriptions from UML/MARTE models. 12th IEEE International Conference on Industrial Informatics (INDIN)
16. Lukas S Staff Engineer, QuIC, Inc. Accessing hardware-accelerated video codecs on Android™. UPLINQ Conference. June, 1–2, 2011. Manchester Grand Hyatt, San Diego CA
17. Martin G, Bailey B, Piziali A (2007) ESL design and verification: a prescription for electronic system level methodology (systems on silicon). March 9, 2007. ISBN-10: 0123735513
18. Monton M, Gladigau J, Haubelt C, Teich J (2010) Checkpoint and restore for SystemC models. In: Borrione D (ed) Advances in Design methods from modelling languages for embedded systems and SoCs. Springer. ISBN 978-90-481-9304-2
19. Müller W, Rosenstiel W, Ruf J (2003) SystemC, methodologies and applications. ISBN 1-4020-7479-4
20. Muller W et al (2010) The SATURN approach to sysML-based HW/SW codesign. IEEE Annual Symposium on VLSI, ISVLSI
21. Nicolás A, Peñil P, Posadas H, Villar E Automatic synthesis over multiple from UML/MARTE models for easy platform mapping and reuse. DSD/SEAA Conference, 2014–08
22. NVIDIA (2006) Demonstrates high definition processor. Las Vegas, Nevada
23. NVIDIA Khronos Apps SDK (2010). http://www.nvidia.com
24. OMG (2008) MOF model to text language
25. OMG. MARTE Profile 1.1 website. http://www.omgmarte.org/. Nov 2014
26. OMG. UML Testing Profile (UTP) 1.1 website http://utp.omg.org/. Nov 2014
27. OpenMAX website. https://www.khronos.org/openmax/. Nov 2014
28. Palojärvi J, Bergström T (2010) Maemo base port. Nokia Corporation
29. Papyrus website. http://www.papyrusuml.org/
30. Peñil P, Medina J, Posadas H, Villar E (2009) Generating heterogeneous executable specifications in SystemC from UML/MARTE models. UML-FM
31. Piel E, Atitallah R, Marquet P, Meftali S, Niar S, Etien A, Dekeyser J-L, Boulet P (2008) Gaspard2: from MARTE to SystemC simulation. In: Proc. of the DATE’08 Workshop on Modeling and Analysis of Real-Time and Embedded Systems with the MARTE UML Profile
32. Quadri IR, Yu H, Gamatié A, Rutten E, Meftali S, Dekeyser J-L (2010) Targeting reconfigurable FPGA based SoCs using the UML MARTE profile: from high abstraction levels to code generation. Int J Embed Syst
33. Rintaluoma T, On2 Technologies. Optimizing H.264 decoder for Cortex-A8 with ARM NEON OpenMax DL implementation. pp 32–37. http://www.iqmagazineonline.com/Archive 27
34. Schmidt DC (2006) Model-driven engineering. IEEE Comput 39(2):25–31
35. SystemC website. http://www.accellera.org/. Nov 2014
36. Szyperski C (2002) Component software: beyond object-oriented programming, 2nd ed. Addison-Wesley Professional
37. The Institution of Electronics and Telecommunications Engineers (2011) IETE technical review. ISSN: 0256–4602
38. The open SystemC initiative www.systemc.org
39. TI software makes development easy for DM8168 and DM8148 DaVinci™ digital media processors. Technology for Innovators 2011. http://www.ti.com/
40. UML website. http://www.omg.org/spec/UML/2.4/. February 2013
41. Urlini G (2007) Bellagio OpenMAX component writer’s guide
42. Vanderperren Y, Mueller W, Dehaene W (2008) UML for electronic systems design: a comprehensive overview. Des Autom Embed Syst 12(4)
43. Vanderperren Y, Mueller W, Dehaene W (2008) UML for electronic systems design: a comprehensive overview. J Des Autom Embed Syst. Springer Verlag
44. Vidal J, de Lamotte F, Gogniat G, Diguet JP, Soulard P (2010) UML design for dynamically reconfigurable multi processor embedded systems (DATE)
45. VisualOn (2011) Enabling home entertainment and mobile multimedia. TI Technology Day. Taipei
46. Yamashita K (2010) Possibility of ESL: a software centric system design for multicore SoC in the upstream phase. Design Automation Conference (ASP-DAC)

David de la Fuente received the Computer Eng. Diploma from the University of Castilla-La Mancha (UCLM) in 2007. Since then he has worked as a Teaching Assistant at the UCLM. He is currently finishing his PhD degree in Computer Science at the UCLM. His current research interests include heterogeneous distributed systems, SoCs, embedded system design and HW/SW integration.

Jesús Barba received the MS and PhD degrees in Computer Engineering from the University of Castilla-La Mancha (UCLM), Spain, in 2001 and 2008 respectively. He has been working as an Associate Professor with the Department of Information and Systems Technology since 2001. His research interests include SoCs, HW/SW integration and reconfigurable systems.

Juan Carlos López received the MS and Ph.D. degrees in Telecommunication (Electrical) Engineering from the Technical University of Madrid in 1985 and 1989, respectively. From September 1990 to August 1992, he was a Visiting Scientist in the Department of Electrical and Computer Engineering at Carnegie Mellon University, Pittsburgh, PA (USA). His research activities center on embedded system design, distributed computing and advanced communication services. From 1989 to 1999, he was an Associate Professor in the Department of Electrical Engineering at the Technical University of Madrid. Currently, Dr. López is a Professor of Computer Architecture at the University of Castilla-La Mancha, where he served as Dean of the School of Computer Science from 2000 to 2008. He is, and has been, a member of several panels of the Spanish National Science Foundation and the Spanish Ministry of Education and Science concerning the Information Technologies research programs. He is a member of the IEEE and the ACM.

P. Peñil received the master's degree in Physics in 2002 and the master's degree in Telecommunication Engineering in 2009 from the University of Cantabria, Santander, Spain. His research interests cover co-design methodologies based on UML and the MARTE profile.

Dr. H. Posadas received the master's degree in Telecommunication Engineering in 2002 from the University of Cantabria, the degree in Informatics from the UNED in 2006, and the Ph.D. in Electronics from the University of Cantabria in 2011. He is currently an assistant teacher at the Electronics Technology, Automatics and Systems Engineering Department. His research interests include co-design methodologies for embedded systems, focusing on high-level modelling, performance estimation, and SW synthesis.

Pablo Sánchez received the Ph.D. degree in Physics (Electronics) from the University of Cantabria, Santander, Spain, in 1991. He is currently an Associate Professor of Electronic Technology with the Department of Electronic Technology, Engineering and Systems of the University of Cantabria. His current research interests include Embedded System Specification, Design and Verification Methodologies, Performance Analysis and Simulation, Embedded System Safety and Security, Functional Verification Techniques and Implementation of Embedded Systems (mainly Vision Algorithms).