POLITECNICO DI MILANO Master of Science in Telecommunication Engineering

Electronics, Information and Bioengineering Department

Visual Search modeling for end-to-end simulation of a Cyber-Physical System

Supervisor: Prof. Marco Marcon Advisor: Eng. Danilo Pietro Pau Eng. Emanuele Plebani

Graduation thesis of: Shen Yun 835888

Academic year 2015 - 2016

Acknowledgements

There are people whom I would like to acknowledge, for their assistance and support during my studies in Politecnico di Milano. I would like to thank all the wonderful teachers, colleagues, family, and friends whom I have been fortunate to interact with during my lifetime.

I would like to take this opportunity to express my sincere gratitude and appreciation to my supervisor at STMicroelectronics, Danilo Pau, for his countless efforts in guiding and encouraging me throughout my studies and work. His friendly attitude has been a very strong support for me in working with him. This work would not have been possible without his guidance and encouragement. I am very grateful to him.

I am also thankful to Prof. Marco Marcon for his valuable advice in monthly meetings and discussions, which was a plus during my research. Without doubt, all these meetings guided me onto a bright path in handling the research and studies.

I would like to give special thanks to Eng. Emanuele Plebani and Eng. Marco Brando Paracchini at STMicroelectronics for the countless help and advice they have given me during my internship at STMicroelectronics.

I also have to thank my colleagues at the university for their tireless help, valuable advice and discussions. We had great and unforgettable times during all these years.

Last but not least, I am very thankful to my family, who have been a continuous source of encouragement and support in all directions during my life.


Abstract

We live in an interconnected digital world. Over the past couple of decades, the Internet of Things has permeated every aspect of modern life, and its impact will keep growing in the coming years. Devices featuring computational cores and all kinds of heterogeneous sensors are all around us and sense us, allowing a deeper interaction with the physical world by collecting, storing and intelligently exchanging information. These systems that link the digital (cyber) with the physical world have become known as Cyber-Physical Systems (CPS). Visual Search (VS) is a Content-Based Image Retrieval (CBIR) application able to retrieve information about a query image by comparing it against a large image database. We can consider VS as an application that can be built upon a very large and distributed CPS, composed of a limitless number of interconnected mobile devices, as many as the number of image sensors available in the world, and servers in the cloud. The goal of this thesis is to build a CPS simulator for the client, the network and the server, and to map the Visual Search application onto it. This CPS simulator needs to simulate, efficiently and accurately, several full operating systems as servers and users, coupled with a network.

Key words: CPS, gem5, OMNeT++, Visual Search, CERTI HLA, Retrieval, CDVS


Contents

Acknowledgements ii

Abstract iv

1 Introduction 1
1.1 Introduction of CPS ...... 1
1.2 Introduction of Visual Search ...... 2
1.3 Motivation ...... 2
1.4 Objectives ...... 3
1.4.1 Objectives of the MVS ...... 3
1.4.2 Objectives of the CPS simulator ...... 4
1.5 Major Contribution of the Thesis ...... 4
1.6 Organisation of the Thesis ...... 5

2 State of the art 7
2.1 State of the art of CPS ...... 7
2.1.1 Background of CPS ...... 7
2.1.2 Application of the CPS ...... 9
2.2 CPS simulation tools ...... 11
2.2.1 Introduction of some Processor-Only Simulation Tools ...... 12
2.2.2 Introduction of some Network-Only Simulation Tools ...... 14
2.2.3 Introduction of some Processing Simulators with a Network Extension ...... 17
2.3 State of the art of Visual Search ...... 19
2.3.1 Background of VS ...... 19
2.3.2 Existing Visual Search Applications ...... 22
2.3.3 Compact Descriptors for Visual Search (CDVS) ...... 24
2.3.4 The functionality and advantage of CDVS ...... 25

3 The Structure of the MVS and CPS simulator 28
3.1 The structure of the Mobile Visual Search ...... 28
3.2 Mapping the Mobile Visual Search on the CPS ...... 31

4 Visual Search 33
4.1 Descriptor Extraction ...... 33
4.2 Retrieval Stage ...... 38

5 Building of the CPS simulator 41
5.1 CPS Processing Subsystem ...... 44
5.1.1 Configuration of the GEM5 system ...... 45
5.1.2 Network model of GEM5 ...... 46
5.2 CPS Network Subsystem ...... 48
5.3 Integration tool: CERTI HLA ...... 49
5.3.1 CERTI HLA architecture ...... 50
5.3.2 CERTI HLA Synchronisation ...... 51

6 Visual Search Evaluation on CPS simulator 55
6.1 Test Scenario 1 ...... 56
6.1.1 Descriptor Extraction on the user side ...... 56
6.1.2 Retrieval on server side ...... 58
6.2 Test Scenario 2 ...... 61
6.3 Test Scenario 3 ...... 64

7 Conclusion and Recommendation 69
7.1 Conclusion ...... 69
7.2 Recommendation ...... 69

Bibliography 71

A Image Retrieval results with only global descriptors 78

B Image Retrieval results with global and local descriptors 80

C Execution time of the simulation in Test scenario 2 82

List of Tables

6.1 Information of simulated ARM CPU and the reference real ARM CPU ...... 57
6.2 Execution time of Descriptor Extraction ...... 57
6.3 Information of simulated X86 CPU and native X86 CPU ...... 59
6.4 Execution time of Retrieval Stage ...... 59
6.5 Network latency of the single client situation ...... 62
6.6 Execution time of client and server ...... 64
6.7 Network latency of the three clients situation ...... 66
6.8 Execution time that clients get the results ...... 67

List of Figures

2.1 WorldSens simulator ...... 19
2.2 An example of Google Goggles ...... 22
2.3 An example of Amazon Flow ...... 24

3.1 UML use case diagram for Mobile Visual Search ...... 29
3.2 UML Sequence diagram for Mobile Visual Search, client encoding ...... 29
3.3 UML Sequence diagram for Mobile Visual Search, server encoding ...... 30
3.4 UML Sequence diagram for Mobile Visual Search, single server with multiple clients ...... 31
3.5 Mobile Visual Search mapped on the CPS simulator ...... 32

4.1 Pipeline of Descriptor Extraction ...... 33
4.2 The workflow of keypoint detection ...... 34
4.3 Block Image LOG Filtering ...... 35
4.4 Global Descriptor aggregation ...... 37
4.5 Diagram of Retrieval algorithm ...... 40

5.1 CPS Processing Subsystem inputs and outputs ...... 41
5.2 CPS Network Subsystem inputs and outputs ...... 43
5.3 CPS simulator Processing Subsystem ...... 45
5.4 GEM5 systems interconnection in the CPS simulator ...... 47
5.5 CPS simulator network subsystem ...... 49
5.6 The CERTI HLA architecture ...... 51
5.7 CERTI HLA Global Synchronization ...... 53
5.8 CERTI HLA Local and Global Synchronisation ...... 54

6.1 Execution time of Descriptor Extraction ...... 58
6.2 Execution time of Retrieval Stage ...... 60
6.3 Execution time of Retrieval Stage (host seconds) ...... 60
6.4 Network topology for single client and single server ...... 61

6.5 Network latency of the single client situation ...... 63
6.6 Execution time of client and server ...... 64
6.7 Network topology for multiple clients with a server ...... 65
6.8 Network latency of the three clients situation ...... 67
6.9 Execution time that clients get the results ...... 68

Chapter 1

Introduction

1.1 Introduction of CPS

A Cyber-Physical System (CPS) is a mixture of cyber components and physical components. More specifically, in 2006 Helen Gill introduced this term to indicate:

“...physical, biological, and engineered systems whose operations are inte- grated, monitored, and/or controlled by a computational core. Components are networked at every scale. Computing is deeply embedded into every phys- ical component, possibly even into materials. The computational core is an embedded system, usually with requirements of real-time responses, and is most often distributed” [19]

CPSs are based on the interaction between the physical and digital worlds. On one hand, they have to deal with problems deriving from manufacturing processes, the behavior of physical materials and the unpredictability of the physical world; on the other hand, they have to deal with the openness of the internet and its risks[28], such as physical disasters caused by cyber-attacks[11]. CPS is similar to the term Internet of Things (IoT): they share the same goal, building large-scale distributed computing systems, and the same core technologies, embedded systems and the Internet[28]. On the other hand, there are some differences between them. IoT is driven by the computer science community and is more focused on networks and open standards, whilst the CPS focus is more on physical systems and their engineering problems[28].

1.2 Introduction of Visual Search

Visual Search (VS) is a Computer Vision task which aims to analyze the actual content of an image and to search for similar content inside a large database of images. In the Mobile Visual Search form, an image is captured by a mobile device and then sent to remote server(s) in a cloud, where the visual recognition is performed by complex computer vision algorithms that compare the user’s image with a set of encoded images on the cloud. This kind of problem was first introduced in the early 90s and it kept growing with the expansion of the Internet. Nowadays mobile devices with cameras are the standard, so this kind of problem has become more topical than ever.

VS is a reliable application that solves the very complex task of searching content among billions of images by analyzing the actual information depicted in the image. Recognizing the most similar image in a very large database could be useful in numerous practical situations. For example, a photo of a street automatically taken by a car could be used to help the driver localize himself in the absence of a GPS signal (e.g. in an urban canyon) by simply searching for the most similar picture in a street view dataset (i.e. Google Street View). Another example could be using a smartphone and a paper shop flyer to shop in on-line stores by simply taking pictures of the wanted products. In general, VS applications could also substitute QR codes in almost every situation in which the latter are used: for example, in a museum one could receive multimedia information about a painting by simply taking a picture of it, an action that one would perform anyway, instead of getting very close to it and taking a picture of the very small QR code beside it.

1.3 Motivation

On the one hand, CPSs offer great benefits and have the potential to dwarf the 20th century IT revolution[77]. The most powerful features of CPS are adaptability and fast innovation: for example, a CPS could gather useful information about objects of the physical world and use it to improve a manufacturing process. The continuous evolution of embedded and ubiquitous computing technologies, in terms of decreasing costs and increasing capabilities, may even lead to the distribution of existing business processes not only to the network itself but also to “network edges”, i.e. the CPS, and can overcome many limitations of existing centralized approaches[77]. Another example of a CPS that would have a significant impact in the imminent future is the rise of “Industry 4.0”, that is, an industrial information revolution comparable to previous revolutions such as the energy revolution (1900) and the digital revolution (1970)[47]. For example, automatic triggers for maintenance processes and processing chains able to control other processes autonomously are only a few of the possible improvements in this field. A further field of application is the evolution of the electricity grid into a Smart Grid[30].

On the other hand, as our reliance on smartphones and tablets as conduits of relevant, real-time information grows, the lines are blurring between the time spent in the physical world and our experiences in, and expectations of, the digital world. Mobile Visual Search is a technology to narrow the gap between physical awareness and digital experience. We can consider VS as an application that can be built upon a very large Cyber-Physical System (CPS) composed of a limitless number of interconnected mobile devices and cloud servers. In particular, this is the kind of CPS mentioned at the beginning of this subsection, based on a sensor able to sample images, coupled with embedded computational intelligence and featuring transmission capabilities over a bandwidth-limited transmission channel.

Last, Visual Search plays a very important role in the age of Big Data. Implementing the Visual Search application on a CPS simulator not only gives us a direct perception of how Mobile Visual Search runs in the real world, but also lets Visual Search be used to measure the network settings and efficiency of the CPS system. Moreover, Mobile Visual Search is a test use case of the COSSIM project (CPS Simulator Framework) funded by the European Commission.

1.4 Objectives

The main objective of this thesis is to simulate the Mobile Visual Search (MVS) application on a CPS simulator.

1.4.1 Objectives of the MVS

• User of MVS Extracts the descriptor (the encoded form of the image) from the query image and sends it to the server.

• Server of MVS Using the received descriptor, retrieves the matching results for the query image.
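The division of labor above can be sketched as a toy example; the hash-based descriptor and all function names here are illustrative assumptions, not the CDVS algorithm described later in this thesis:

```python
import hashlib

def extract_descriptor(image_bytes: bytes, length: int = 16) -> bytes:
    """Toy stand-in for descriptor extraction on the client: reduce
    the raw image to a short, fixed-size encoded form (hypothetical)."""
    return hashlib.sha256(image_bytes).digest()[:length]

def retrieve(query_descriptor: bytes, database: dict) -> list:
    """Toy stand-in for the server retrieval stage: rank database
    entries by similarity (here, the number of matching bytes)."""
    def score(desc: bytes) -> int:
        return sum(a == b for a, b in zip(query_descriptor, desc))
    return sorted(database, key=lambda name: score(database[name]), reverse=True)

# Client side: encode the query image and "send" only the descriptor.
query = extract_descriptor(b"pixels-of-query-image")

# Server side: match the received descriptor against the database.
db = {"duomo.jpg": extract_descriptor(b"pixels-of-query-image"),
      "colosseum.jpg": extract_descriptor(b"other-pixels")}
print(retrieve(query, db)[0])  # best match: "duomo.jpg"
```

The point of the split is that only the compact descriptor travels over the network, not the full image.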

1.4.2 Objectives of the CPS simulator

There are three main parts of the CPS simulator: the Processing Subsystem, the Network Subsystem, and the integration of the two.

• Processing Subsystem Since the client platforms for MVS are smartphones, which for the most part have SoCs comprising multicore processors based on the ARM architecture, and the server nodes are high-performance x86 platforms, the Processing Subsystem should model operating systems based on ARM and x86.

• Network Subsystem The smartphone clients and the cloud are inter- connected by a wireless network. In the main scenario, the wireless links are phone connections over a mobile network. The CPS simulator should support a wireless network.

• Integration of the Network Subsystem and Processing Subsystem Bringing the processing and the network simulators together requires carefully designed communication interfaces and synchronization schemes. This bidirectional interface needs to pass information on the type and timing of events and to provide a common data representation, since data are represented differently in the processing and the network simulators.
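As an illustration only, a common data representation can be pictured as a fixed byte layout that both simulators agree on; the field layout below is an invented example, not the actual COSSIM or CERTI HLA message format:

```python
import struct

# Hypothetical event layout: simulated timestamp (double, seconds),
# event type (unsigned short), payload length (unsigned int), payload.
# "!" selects network byte order with standard sizes and no padding.
HEADER = struct.Struct("!dHI")

def pack_event(sim_time: float, event_type: int, payload: bytes) -> bytes:
    """Serialize an event so both simulators read identical bytes."""
    return HEADER.pack(sim_time, event_type, len(payload)) + payload

def unpack_event(message: bytes):
    """Recover the timing and type information plus the raw payload."""
    sim_time, event_type, length = HEADER.unpack_from(message)
    payload = message[HEADER.size:HEADER.size + length]
    return sim_time, event_type, payload

msg = pack_event(0.125, 1, b"descriptor-bytes")
print(unpack_event(msg))
```

Whatever the real format, the essential property is the one shown: the receiving simulator recovers exactly the event time, type and payload that the sender encoded.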

1.5 Major Contribution of the Thesis

A Mobile Visual Search application is implemented and simulated in a CPS simulator. Chapter 3 gives an overview of the structure of the Visual Search integrated in the CPS simulator. Chapter 4 presents the Visual Search algorithm: the descriptor extraction performed by the client and the retrieval stage performed by the server. Chapter 5 describes the Processing Subsystem (the gem5 simulator) and the Network Subsystem (the OMNeT++ simulator), their configuration and network topology; besides, the integration of the network subsystem and processing subsystem and their synchronisation are presented there. Chapter 6 focuses on a set of quantitative performance metrics: the quality of results compared with the native system, the execution time (both the time recorded by the simulator and the real time elapsed), and the network latency. These metrics are tested in three scenarios: a client (on ARM) extracts the descriptor and retrieves the results; a single server and a single client (the client extracts the descriptor and the server retrieves the results); and, last, a server with multiple clients.

1.6 Organisation of the Thesis

Chapter 2 is an overview of CPS and Visual Search; the state of the art of both, the industrial uses of CPS, and the Visual Search applications existing today will be given. Furthermore, the CPS simulation tools and the Compact Descriptors for Visual Search (CDVS) technology will be discussed.

Chapter 3 will give an overview of the architecture of the Visual Search modelling for end-to-end simulation of CPS. First, the user's and server's functionality in Visual Search will be illustrated with UML diagrams, in which the single user with single server scenario and the multiple users with a single server scenario will be discussed. Last, the interaction diagram of the MVS mapped on the CPS simulator will be given, which gives us a macroscopic perspective on how Visual Search works inside the CPS simulator.

Chapter 4 proposes the Visual Search algorithm, which is based on MPEG CDVS[67] and split into two blocks: the descriptor extraction and the retrieval stage. In the former block, we obtain an encoded form of the query image, represented by the compressed descriptor, which includes the compressed local descriptors and the global descriptor. The latter block compares the query and database image descriptors to determine whether the query image depicts the same object, and gives the top matches in the database.

Chapter 5 will discuss the composition and construction of the CPS simulator in detail. First, the CPS processing subsystem is simulated by the gem5 simulator, which is required to simulate the systems for the server and user of VS; the detailed configuration will be given here. Second, the OMNeT++ network simulation tool is used for network subsystem simulation, and the building of the required network will be discussed. Finally, the integration of the two subsystems through the High Level Architecture (HLA) and their synchronisation by CERTI HLA [66] will be discussed in detail.

Chapter 6 will evaluate the execution time, quality of results and network latency in three scenarios: first, the client and server complete the descriptor extraction and retrieval separately; second, the client communicates with the server over a simulated wired network; last, three clients communicate with a server over the simulated wired network. These three metrics under the three scenarios give us feedback about

the performance of the CPS simulator.

Chapter 7, finally, will give the conclusion and suggest future work to be developed.

Chapter 2

State of the art

2.1 State of the art of CPS

2.1.1 Background of CPS

The core idea of CPSs can be traced back to 1926 with the publication of Nikola Tesla's thoughts on what he called “Teleautomation”. He predicted that “when wireless is perfectly applied the whole earth will be converted into a huge brain” and that the instruments used would be so small that “a man will be able to carry one in his vest pocket”[28]. In 1948 the mathematician Norbert Wiener investigated this idea further and gave it the name “Cybernetics”[28]. The first embedded system was created in 1961 and used for the Apollo missions[28]. Since then, these kinds of systems have been extensively used in applications such as communication systems, aircraft control systems, automotive electronics, home appliances, weapons systems, games and toys, to cite a few examples[77]. In 1988 Mark Weiser, working for Xerox, introduced the term “Ubiquitous Computing” or ubicomp[28]. The core difference between CPSs and embedded systems in general is that the latter are not necessarily network connected and thus are modelled as closed “boxes” that do not expose computing capabilities to the outside world[77]. CPSs are instead embedded systems based on the interaction between the physical and digital worlds.

The most interesting and revolutionary cyber-physical systems are networked. The most widely used networking techniques today introduce a great deal of timing variability and stochastic behavior. Today, embedded systems are often forced to use less widely accepted networking technologies (such as CAN buses in manufacturing systems and FlexRay in automotive applications), and typically must limit the geographic extent of these networks to a confined local area. However, recent advances in time synchronization across networks promise networked platforms that share a common notion of time to a known precision[29].

Operating systems technology is also groaning under the weight of the requirements of embedded systems. RTOSes (Real-Time Operating Systems) are still essentially best-effort technologies. To specify real-time properties of a program, the designer has to step outside the programming abstractions, making operating system calls to set priorities or to set up timers.

Cyber-physical systems by nature will be concurrent. Physical processes are intrinsically concurrent, and their coupling with computing requires, at a minimum, concurrent composition of the computing processes with the physical ones. Even today, embedded systems must react to multiple real-time streams of sensor stimuli and control multiple actuators concurrently. Regrettably, the mechanisms of interaction with sensor and actuator hardware, built for example on the concept of interrupts, are not well represented in programming languages. They have been deemed to be the domain of operating systems, not of software design. Instead, the concurrent interactions with hardware are exposed to programmers through the abstraction of threads.

Threads, however, are notoriously problematic[32][76]. This fact is often blamed on humans rather than on the abstraction. Sutter and Larus[61] observe that “humans are quickly overwhelmed by concurrency and find it much more difficult to reason about concurrent than sequential code. Even careful people miss possible interleavings among even simple collections of partially ordered operations.” The problem will get far worse with extensively networked cyber-physical systems.
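A quick calculation (not from the thesis) shows why interleavings overwhelm human reasoning: two threads with a and b atomic steps admit C(a+b, a) distinct orderings, a number that explodes even for tiny programs:

```python
from math import comb

def interleavings(steps_a: int, steps_b: int) -> int:
    """Number of distinct interleavings of two sequential threads
    with steps_a and steps_b atomic steps each: C(a+b, a)."""
    return comb(steps_a + steps_b, steps_a)

# Even two tiny threads of 5 steps each admit 252 orderings;
# ten steps each already exceed 180,000.
print(interleavings(5, 5), interleavings(10, 10))
```

A programmer who must convince herself that every one of these orderings is safe clearly cannot check them by inspection.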

Besides what is mentioned above, nowadays CPSs are mostly used as sensor networks that provide information extracted from the physical world to a central computational unit. However, with increased communication and the emergence of networks of CPSs, they will be able to cooperate, share information, and generally be active elements of a more complex system[30]. CPSs will also cooperate to augment the computational power of the entire system via cloud virtualization[30]. Moreover, CPS systems are based on social systems, with concepts like division of labor, specialization, multimodal communication models and adaptive behavior, and on biological models, both in software in the form of artificial neural networks and in hardware

with the concepts of system awareness or self-replication[28]. CPS is an interdisciplinary field that includes communication technology (bandwidth and computational power), electronic engineering (embedded system design and miniaturization) and semantic technology (information integration), but it is not restricted to those topics and is also driven by many other fields of the cognitive sciences, such as neuroscience and sociology[28].

2.1.2 Application of the CPS

Applications of CPS arguably have the potential to dwarf the 20th century IT revolution. They include high-confidence medical devices and systems, traffic control and safety, advanced automotive systems, process control, energy conservation, environmental control, avionics, instrumentation, critical infrastructure control (electric power, water resources, and communications systems, for example), distributed robotics (telepresence, telemedicine), defence systems, manufacturing, and smart structures.

It is easy to envision new capabilities, such as distributed micro power generation coupled into the power grid. Electric power grids are designed and engineered to ensure the seamless transport of electrical energy from the place of its production (supply side) to the place of consumption (demand side). These systems form networks that are naturally distributed over large areas, such as a country or a continent, and they encompass multiple nodes of electricity generation and consumption and the infrastructural interconnections between the nodes, such as lines and transformers. In order to ensure a reliable, quality power supply to all consumers distributed over the grid, the operational goals of the grid relate first of all to maintaining grid stability while adhering to the grid codes, i.e. the network specifications for the operation of the grid, such as voltage level references at different transmission (high-voltage, HV) and distribution (low-voltage, LV) lines, power transfer levels for transmission and distribution, and frequency references in the system; further goals are the provision of a connection to the grid, the performance of electricity transmission across the grid, and cross-border transmissions. There are several research projects on electric power grids: Grid+, Meter-ON, REserviceS and so on.

Traffic management represents a highly complex System of Systems coming under increasing demands for additional capacity, greater safety and lower costs while meeting strict environmental regulations. In the automotive sector, an intelligent traffic control cyber-physical system is proposed in

[11], which has three levels: the application of CPS theory to integrate information processing into the transportation process; traffic detection and control information implemented through technical solutions; and the support of modern computing, communication and control technology.

Unmanned Aerial Vehicles (UAVs) are a CPS use case in aerospace. There are numerous UAV activities being undertaken within Europe, with several major large programmes and many smaller programmes investigating and developing UAV technology for military and civilian use. It is not possible to cover all of these in a report, but in this section a few key large programmes are highlighted. In larger UAV programmes, multiple vehicles are operated as part of system-of-systems implementations to gather information or perform tactical missions. The Thales Watchkeeper WK450[65] is a remotely piloted air system (RPAS) for all-weather Intelligence, Surveillance, Target Acquisition and Reconnaissance (ISTAR) which has been developed for use by the British Army, in a 1bn euro contract awarded in 2005 to UAV Tactical Systems (U-TacS), a joint venture between Thales UK and Israeli Elbit Systems. A UAV “system” such as Watchkeeper is not a CPS in itself, since it is managed and operated as a single distributed system; however, Watchkeeper detachments will be deployed and integrated into task forces and force packages, and so will form part of an ad-hoc contingent of CPS.

The smart building, as an application of cyber-physical systems (CPSs), plays an important role in people's everyday lives. In the EU, several projects now focus on the smart building domain. REEB aims to facilitate the creation of a Strategic Research Agenda (SRA) and a supporting Implementation Activity Plan (IAP) for sustainable and energy-efficient smart building construction by establishing and federating dialogues between interactive and complementary communities of practice from the energy, environment, and building construction domains. The central objective of the MSP project is the development of a highly competitive technology and manufacturing platform for the 3D integration of sophisticated components and sensors with CMOS technology. BESOS is an EU Research and Development project funded by the EC in the context of the 7th Framework Programme that proposes the development of an advanced, integrated management system which enables energy efficiency in smart cities from a holistic perspective.

Networked autonomous vehicles could dramatically enhance the effectiveness

of our military and could offer substantially more effective disaster recovery techniques. In communications, cognitive radio could benefit enormously from distributed consensus about available bandwidth and from distributed control technologies. Financial networks could be dramatically changed by precision timing. Large-scale service systems leveraging RFID and other technologies for tracking goods and services could acquire the nature of distributed real-time control systems. Distributed real-time games that integrate sensors and actuators could change the (relatively passive) nature of on-line social interactions. Tight integration of physical devices and distributed computing could make “programmable matter” a reality.

The positive economic impact of any one of these application areas would be enormous. Today's computing and networking technologies, however, may have properties that unnecessarily impede progress towards these applications. For example, the lack of temporal semantics and adequate concurrency models in computing, and today's “best effort” networking technologies, make predictable and reliable real-time performance difficult at best. Many of these applications may not be achievable without substantial changes in the core abstractions.

2.2 CPS simulation tools

There are numerous simulators and emulators that have been developed and implemented mainly for Wireless Sensor Networks (WSNs). Because of the affinity of the WSN field with the CPS one, most of these tools can also be used in the CPS context. The simulators and emulators can be placed in the following three categories:

(i) those that support only processing sub-systems,

(ii) those that support only network sub-systems,

(iii) those that support both processing and network sub-systems.

However, it should be noted that simulators of the third category (i.e. simulators which support both processing and network) either support only very simple CPUs (for example ATMega128) with real networks (wireless IEEE 802.11 and/or 802.15.4 protocols), or very complex CPUs (ARM, MIPS, PowerPC, etc.) with dummy network communication. As a result, none of them can simulate both the processing and network subsystems of an actual CPS application.

2.2.1 Introduction of some Processor-Only Simulation Tools

OVP[50] is a high-performance simulator that can simulate advanced multicore heterogeneous or homogeneous platforms with complex memory hierarchies, cache systems and peripherals. OVP is an instruction-accurate simulator (not cycle accurate) implemented in C. Currently, it can support processor models of the ARC, ARM, MIPS, PowerPC, NEC v850, and OpenRisc families, with many different types of system components including ram, rom, trap and cache, and peripheral models including dma, uart, fifo, etc. However, it does not provide cycle-accurate simulation.

The SimpleScalar[58] tool set is a system software infrastructure used to run modeling applications for program performance analysis, detailed micro-architectural modeling, and hardware-software co-verification. SimpleScalar can execute modeling applications that simulate (cycle-accurately) real programs running on a range of modern processors and systems. However, it cannot model or run an operating system, since the simulation speed is low.

CPU Sim[68] is a Java application that allows users to design simple computer CPUs at the microcode level and to run machine-language or assembly-language programs on those CPUs through simulation. It can be used to simulate a variety of architectures, including accumulator-based, RISC-like, and stack-based (such as the JVM) architectures. However, CPU Sim cannot support very complex CPUs (such as ARM, X86, etc.), and it can run only machine-language or assembly-language programs.

ESCAPE[64] is a PC-based simulation environment aimed at the support of education. The environment can simulate both a microprogrammed architecture and a pipelined architecture with a single pipeline. Both architectures are custom-made, with a certain amount of configurability. However, ESCAPE cannot support very complex CPUs (such as ARM, X86, etc.), and it can be used only for education purposes (due to the simplicity of the supported architectures).

HASE[24] is a Hierarchical computer Architecture design and Simulation Environment which allows for the rapid development and exploration of computer architectures at multiple levels of abstraction, encompassing both hardware and software.

MikroSim [72] is an educational software program for the hardware-non-specific explanation of the general functioning and behaviour of a virtual processor, running on the Microsoft Windows operating system. Devices like miniaturized calculators, microcontrollers, microprocessors and computers can be explained with custom-developed instruction code at the register transfer level, controlled by sequences of micro instructions (microcode). On this basis it is possible to develop an instruction set to control a virtual application board at a higher level of abstraction. MikroSim is available mainly for academic purposes and it supports only micro instructions (microcoding) for a simple virtual CPU, not complex CPUs (such as ARM, x86, etc.).

SESC [55] is a microprocessor architectural simulator, developed primarily by the i-acoma research group at UIUC and various groups at other universities, that models different processor architectures, such as single processors, chip multi-processors and processors-in-memory. It models a full out-of-order pipeline with branch prediction, caches, buses, and every other component of a modern processor necessary for accurate simulation. SESC is an event-driven simulator, but it supports only the MIPS instruction set.

Simics [57] is a full-system simulator used to run unchanged production binaries of the target hardware at high speed. Simics can simulate systems based on Alpha, x86-64, ARM, MIPS, PowerPC, POWER, SPARC-V8 and V9, and x86 CPUs. Many operating systems have been run on the simulated hardware, including MS-DOS, Windows, VxWorks, OSE, Solaris, FreeBSD and Linux. However, Simics is a commercial tool and it is not open source.

Zsim [54] is a fast x86-64 simulator. It was originally written to evaluate ZCache (Sanchez and Kozyrakis, MICRO-44, Dec 2010), hence the name, but it has since outgrown its original purpose. Zsim's main goals are to be fast, simple and accurate, with a focus on simulating memory hierarchies and large, heterogeneous systems. It is parallel and uses dynamic binary translation (DBT) extensively, resulting in speeds of hundreds of millions of instructions per second on a modern multicore host. Unlike conventional simulators, Zsim is organized to scale well (almost linearly) with the simulated core count. However, Zsim cannot support ARM-based architectures.

The GEM5 [18] simulator is a modular platform for computer-system architecture research, encompassing system-level architecture as well as processor microarchitecture. It is a cycle-accurate simulator able to model different CPUs/ISAs and system components (both full-system (OS) and application-only modes are supported). In addition, it is a widely used processing simulator with active development by contributors from both the academic and the industrial sectors. The GEM5 simulation infrastructure is the merger of the best aspects of the M5 [12] and GEMS [38] simulators. M5 provides a highly configurable simulation framework, multiple ISAs and diverse CPU models. GEMS complements these features with a detailed and flexible memory system, including support for multiple cache coherence protocols and interconnect models. Currently, GEM5 supports most commercial ISAs (ARM, ALPHA, MIPS, Power, SPARC and x86), including booting Linux on three of them (ARM, ALPHA and x86). It should be noted that GEM5 does not provide network simulation functionality to simulate frameworks of interconnected systems. However, when using FS (Full System) mode, where devices can be added and used from the OS, GEM5 supports network interface cards that provide hooks for connection to external network simulation tools. Due to the above features, we chose GEM5 as our Processing subsystem simulation tool.
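GEM5 systems are assembled in Python configuration scripts. The fragment below is a hypothetical minimal assembly in the style of the "Learning gem5" examples; object names and port syntax differ between gem5 versions, so treat it as an illustrative configuration fragment rather than a working script.

```python
# Hypothetical minimal gem5 system assembly (syscall-emulation style).
# Object names and port syntax vary across gem5 versions.
import m5
from m5.objects import *

system = System()
system.clk_domain = SrcClockDomain(clock='1GHz',
                                   voltage_domain=VoltageDomain())
system.mem_mode = 'timing'                      # timing-mode memory accesses
system.mem_ranges = [AddrRange('512MB')]

system.cpu = TimingSimpleCPU()                  # simple in-order CPU model
system.membus = SystemXBar()                    # crossbar between CPU and DRAM
system.cpu.icache_port = system.membus.slave    # no caches in this sketch
system.cpu.dcache_port = system.membus.slave
system.cpu.createInterruptController()

system.mem_ctrl = SimpleMemory(range=system.mem_ranges[0])
system.mem_ctrl.port = system.membus.master
system.system_port = system.membus.slave

# ... a workload (SE-mode process, or FS-mode kernel and disk image with
# a network interface card) would be attached here ...
root = Root(full_system=False, system=system)
m5.instantiate()
exit_event = m5.simulate()
print('Exiting @ tick %i because %s' % (m5.curTick(), exit_event.getCause()))
```

In FS mode the same script style is used to add devices such as NICs, which is the hook mentioned above for coupling GEM5 to an external network simulator.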

2.2.2 Introduction of some Network-Only Simulation Tools

NS-2
NS-2 [45] is a popular general-purpose discrete-event network simulator built in an Object-Oriented extension of the Tool Command Language (OTcl) and C++. It provides the most complete support of communication protocol models (such as the wireless IEEE 802.11 and 802.15.4 protocols). However, this simulator has some limitations. Firstly, when compared to other tools, NS-2 has a steep learning curve and requires advanced skills to perform meaningful and repeatable simulations. Another drawback of NS-2 is the lack of available customization: packet formats, energy models, MAC protocols and sensing hardware models all differ from those found in most sensors. NS-2 also lacks an application model. In many network environments this is not a problem, but sensor networks often contain interactions between the application level and the network protocol level.

In addition, since NS-2 was originally targeted at IP networks, not WSNs or CPSs, there are some limitations when it is used for such applications. Firstly, NS-2 can simulate the layered protocols but not application behaviors. However, layered protocols and applications interact and cannot be strictly separated in WSNs/CPSs; in this situation, using NS-2 is inappropriate and it can hardly produce results with adequate accuracy for WSNs/CPSs. Secondly, because NS-2 is designed as a general network simulator, it does not consider some unique characteristics of sensor-based networks. For example, NS-2 cannot simulate the problems of limited available bandwidth and energy resources, typical of the aforementioned applications. Thirdly, NS-2 has scalability issues, while WSN/CPS applications typically involve a large number of nodes [59], [75]. Finally, increasing the number of nodes simulated in NS-2 results in tracing files that are too large to manage and difficult to parse.

J-Sim
J-Sim [27] is an open-source, component-based compositional network simulation environment developed entirely in Java. J-Sim is a truly platform-neutral, extensible and reusable environment. J-Sim also provides a script interface to allow integration with different scripting languages such as Perl, Tcl or Python; in the current release, J-Sim is fully integrated with a Java implementation of the Tcl interpreter (with the Tcl/Java extension), called Jacl. So, similar to NS-2, J-Sim is a dual-language simulation environment in which classes are written in Java (for NS-2, classes are written in C++) and “glued” together using Tcl/Java. It supports energy modeling and has a component-based architecture, but it does not support radio energy consumption. Moreover, only the IEEE 802.11 MAC protocol has been implemented so far in J-Sim, and the J-Sim model defines the generic structure of a node (either an end host or a router) without specifying the type of CPU that can be simulated. Hence, it is not a preferred simulation tool for realistic WSN simulation. Finally, it introduces some additional overhead and some inefficiencies due to the Java programming language [1].

NetSim
NetSim [42] is a popular network simulation and emulation tool used for network design & planning, defense applications and network R&D. Various technologies such as Cognitive Radio, Wireless Sensor Networks, Wireless LAN, WiMAX, TCP, IP, etc. are covered in NetSim. NetSim is a stochastic discrete event simulator developed by Tetcos, in association with the Indian Institute of Science, with the first release in June 2002. However, NetSim is an application that simulates only Cisco Systems' networking hardware and software and is designed to aid the user in learning the Cisco IOS command structure.

NS-3
NS-3 [46], like NS-2, is an open source discrete-event network simulator. NS-3 is considered a replacement of NS-2 (not an extension of it) and as a result is not backwards compatible with NS-2; thus it cannot directly take advantage of the large base of protocols and models that have been developed for NS-2. However, it is a real network simulator capable of simulating networks such as IEEE 802.15.4 Wireless Sensor Networks and IEEE 802.11. It provides significant improvements in performance, scalability and extensibility compared to the NS-2 simulator. Like its predecessor, NS-3 relies on C++ for the implementation of the simulation models. NS-3 no longer uses Tcl scripts to control the simulation, thus avoiding the problems introduced by the combination of C++ and Tcl in NS-2. Instead, network simulations in NS-3 can be implemented in pure C++, while parts of the simulation can optionally be realized using Python as well. However, a power/energy model is not mentioned in the documentation (and is probably not implemented), and NS-3 produces very large trace files.

OMNeT++
OMNeT++ [48] is also a discrete event simulator; however, it is more general than the aforementioned simulators, as it is not designed only for network simulations, thus providing great extensibility. OMNeT++ is a general discrete event, component-based (modular), open-architecture simulation framework that includes the basic machinery and tools to write network simulations. Although it does not provide any components specifically for computer networks, queuing networks or any other domain, it offers “1-click” extensions to various simulation models and frameworks such as the INET framework, Castalia or MiXiM [40] for accurate WSN modeling. Model frameworks are developed and maintained completely independently of the simulation framework, and follow their own release cycles.

The key advantage of OMNeT++ is that it offers great extensibility not only in the classical network simulation domain but also in the physical/environment domain [1] (e.g. node mobility in 3D space). Moreover, its modular and extensible architecture allows it to be seamlessly integrated into a framework that includes a CPS processing subsystem, without compromising future updates and/or backward compatibility. Furthermore, it features a much friendlier graphical user interface compared to its alternatives, which makes tracing and debugging easier.
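All of the network tools discussed above (NS-2, NS-3, OMNeT++) share the same discrete-event core: a clock and a priority queue of timestamped events processed in time order. The following minimal sketch illustrates that principle only; it is not the API of any of these simulators.

```python
import heapq

class DiscreteEventSim:
    """Minimal discrete-event kernel: the conceptual core loop shared by
    NS-2, NS-3, OMNeT++ and similar simulators."""
    def __init__(self):
        self.now = 0.0
        self._queue = []   # heap of (time, seq, callback)
        self._seq = 0      # tie-breaker for events scheduled at the same time

    def schedule(self, delay, callback):
        heapq.heappush(self._queue, (self.now + delay, self._seq, callback))
        self._seq += 1

    def run(self):
        while self._queue:
            self.now, _, callback = heapq.heappop(self._queue)
            callback()

# Toy usage: at t = 0 a node sends a packet with 2 ms propagation delay;
# the delivery event fires at t = 0.002 and is logged.
log = []
sim = DiscreteEventSim()
sim.schedule(0.000, lambda: sim.schedule(0.002, lambda: log.append(sim.now)))
sim.run()
```

Everything a network simulator models (packet transmissions, timer expirations, protocol timeouts) is ultimately expressed as such scheduled events, which is why simulated time can advance much faster or slower than wall-clock time.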

2.2.3 Introduction of some Processing Simulators with a Network Extension

TOSSIM
TOSSIM [62] is an emulator specifically designed for WSNs running on top of TinyOS, an open source operating system targeting embedded low-end systems. TOSSIM is a bit-level discrete event network emulator built in Python, a high-level programming language emphasizing code readability, and C++. It includes models for very simple CPUs (the ATMega128 microcontroller), analog-to-digital converters (ADCs), clocks, timers, flash memories and radio components. The network communication over the wireless channel is abstracted as a directed graph, in which each vertex is a processing node and each edge has a bit error probability. Each processing node has a private piece of state representing what it hears on the radio channel; this abstraction also allows testing under perfect transmission conditions (bit error rate equal to zero).
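The directed-graph radio abstraction described above can be sketched in a few lines; this is an illustrative model of the idea, not TOSSIM's actual implementation. Each edge carries a bit error probability, and setting it to zero reproduces the perfect-transmission test condition.

```python
import random

def transmit(bits, bit_error_prob, rng):
    """Flip each bit independently with the edge's bit error probability."""
    return [b ^ (1 if rng.random() < bit_error_prob else 0) for b in bits]

# Directed graph of the network: edges[(src, dst)] = bit error probability
edges = {('A', 'B'): 0.0,   # perfect edge: bit error rate zero
         ('B', 'C'): 0.1}   # lossy edge: 10% of bits flipped on average

rng = random.Random(42)
frame = [1, 0, 1, 1, 0, 0, 1, 0]
received_ab = transmit(frame, edges[('A', 'B')], rng)  # identical to frame
received_bc = transmit(frame, edges[('B', 'C')], rng)  # may contain flips
```

Because each node only sees what arrives over its incoming edges, per-link loss and asymmetric links fall out of the model naturally.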

However, TOSSIM is designed to simulate behaviors and applications running on top of TinyOS and developed only in the nesC language; it is not designed to simulate any other applications or network protocols.

ATEMU
ATEMU [2] is an emulator of an AVR processor (the MICA2 single chip microcontroller) implemented in C and utilized in WSNs. ATEMU can emulate not only the communication among the sensors, but also every instruction executed in each sensor (the low-level operations of the processor, timers and radio system are all emulated). ATEMU can simulate multiple sensor nodes at the same time, and each sensor node can run a different program. Moreover, it can emulate power consumption and radio channels. However, although ATEMU can give highly accurate (cycle-accurate) results, the simulation time is much longer than that of similar simulation tools, and it supports only MICA2 motes.

Cooja (Contiki OS)
The Cooja [131] simulator is similar to TOSSIM, since its main purpose is to simulate the behavior of an operating system. Cooja is a Java-based simulator developed for simulations of sensor nodes running the Contiki operating system.

The authors of Cooja claim that their simulator can work on different levels, enabling so-called cross-level simulations. For example, ns-2 (networking level) is principally a simulator designed for the network and application levels, without taking hardware properties into account. On the other hand, TOSSIM (operating system level) is intended particularly for simulating the behavior of the operating system TinyOS, while the purpose of Avrora is to simulate at the machine code level (instruction set execution level).

The Cooja simulator is implemented in Java, making the simulator easily extensible, since it also allows the sensor node software to be written in C by using the Java Native Interface. Furthermore, it can execute Contiki programs in two different ways: either by compiling the application code directly for the host CPU, or by compiling it for the MSP430 hardware. Moreover, it can simulate the IEEE 802.15.4 wireless protocol; however, it cannot provide any power estimations, it can simulate only very simple microcontrollers (the MSP430) and it has low efficiency due to the cross-level simulation extensibility [60].

AVRORA
Avrora [4] simulates a network of motes, runs the actual microcontroller programs (rather than models of the software), and runs accurate simulations of the devices as well as of the radio communication. Avrora is an instruction-level simulator built in Java, which closes the gap between TOSSIM and ATEMU. The code in Avrora runs instruction by instruction, which results in a higher simulation speed and better scalability. Avrora provides more accuracy than TOSSIM while it scales at the same level as TOSSIM. Avrora can simulate applications written in different programming languages. However, Avrora practically supports only the ATMega128L microcontroller, and it supports only some features of the IEEE 802.15.4 standard compliant radio chip CC2420, through AvroraZ [3] (an Avrora extension).

WorldSens
WorldSens [73] is an integrated environment for the development and rapid prototyping of wireless sensor network applications. The environment consists of two simulation tools which can be used either independently or in cooperation: WSim and WSNet, as illustrated in Figure 2.1.

The task of WSim is to simulate the hardware behavior and the events that occur in the actual hardware platforms. It relies on cycle-accurate full platform simulation using microprocessor instruction driven timings. The simulator is able to perform a full simulation of hardware events, which allows performance analysis. However, the supported platforms include only the Texas Instruments MSP430f1611 microcontroller unit, including the full instruction set as well as all the peripheral digital blocks (timers, basic clock module, serial ports with UART and SPI modes, etc.).

Figure 2.1: The WorldSens simulator.

WSNet is a modular event-driven wireless network simulator. Its architecture consists of different blocks that model characteristics and properties of the radio medium. The list of available MAC protocols is also relatively rich, containing the IEEE 802.15.4 wireless protocol [59]. During a simulation, the behavior of a block is specified using a model, which is a particular implementation of the block's functionalities. Models are either provided with the simulator or developed by users.

Each node in the WorldSens network is simulated by a WSim instance. The WorldSens simulator runs the native code as deployed on the sensor hardware without any change and emulates all components embedded in the hardware sensor nodes. Thus, all instructions sending commands to the CC1100 are executed and the behavior of the CC1100 is also simulated. When the CC1100 actually transmits a byte, it is transferred to WSNet, which simulates the radio propagation and interference according to its internal models, and finally transmits the data to the simulated CC1100 RF transceivers in the other WSim programs. Finally, WorldSens supports a simple linear energy consumption model. However, it can simulate only very simple microcontrollers (the MSP430), and the official WorldSens site is not available anymore (along with its documentation).

2.3 State of the art of Visual Search

2.3.1 Background of VS

Visual Search (VS), also known as Content-Based Image Retrieval (CBIR) or Query By Image Content (QBIC), is a computer vision task oriented to analyzing the pixel content of an image and searching a large database for images with similar content. This kind of problem was first introduced in the early 90s and it kept on growing with the expansion of the internet. Nowadays, mobile devices with cameras are widespread and this kind of problem is more relevant than ever.

The term CBIR originated around 1992 and the earliest VS systems were developed in the mid-90s, with implementations such as QBIC from IBM [15], the Photobook system from MIT [51] and Virage [5], all based on feature similarity algorithms developed more than a decade before [6]. In particular, QBIC used color histogram features, a moment-based shape feature and a texture descriptor. Photobook used appearance features, texture features and 2D shape features [10]. All those algorithms were based on a discrete approach: every feature is mapped to a binary feature and the occurrence of a feature is treated like the occurrence of a word in a text document, allowing the use of techniques like inverted files and text retrieval metrics [10]. On the other hand, many other methods developed in the same years follow a continuous approach, in which each image is represented by a feature vector and these features are compared using various distance measures; the images with the lowest distances are then ranked highest in the retrieval process. A popular example of this approach is Blobworld [7], in which images are segmented using a method based on Expectation-Maximization (EM). Many other examples are available in the literature [10], with SIMBA [56] and CIRES [26] among them.

In the early 2000s, with the evolution of visual search applications and of the Internet, it became clear that the next step would be understanding the semantics of a query and not simply the underlying low-level computational features. This general problem was called “bridging the semantic gap” [34]. In other words, the problem was to translate the easily computable low-level content-based media features into high-level concepts or terms that would be understandable by the human user. Early examples of content-based retrieval systems which addressed the semantic gap problem in the query interface, indexing and results are the ImageScape search engine [33] and Netra [37]. In those years [39], commercial applications of CBIR included shopping, content filtering, automatic detection of pornographic content [34] and content rights management systems searching trademark databases.

In the second half of the 2000s, smartphones started to spread at a very fast pace: in 2006 the share of smartphone sales in the total mobile device market was only 6.9%, in 2007 it reached 10.6%, and in 2008 the market segment reached 15% worldwide and 19.3% in Europe [20]. The evolution and diffusion of camera-equipped mobile devices shifted Visual Search (PC web search) into Mobile Visual Search. This is not a simple move to a different device, because of the deep architectural differences between the two systems: nowadays smartphones are equipped with high resolution cameras, high quality color displays, real time hardware accelerated 3D graphics, GPS, accelerometers and many other sensors. A new class of augmented reality applications was born [43], which includes browsing the available information about a particular place (e.g. the customer ratings of a particular hotel) or providing services (such as booking hotel rooms), allowing to link the physical world to the digital one [35]. Moreover, the difficulty of typing on the touch screen keyboards of many mobile devices, or of describing with words what one wants to search, made taking a snapshot and using a mobile VS application in many cases easier than a standard search. In the same years in which smartphones were spreading, media services and applications were also created, often based on photo or video sharing. Mobile VS is thus expected to become one of the core functionalities of many applications, such as image-based browsing of personal photo archives [43]. At the same time, there are also several related works on mobile visual search systems. Girod et al. [47][30] proposed a visual search system in 2011. They developed new features and data structures for better feature extraction and retrieval. Since they used the 3G wireless network, one of their objectives was to reduce transmission delay, which may not be a big issue today. Meanwhile, the performance of their system relied on the database they built, which may not be suitable for arbitrary visual searches.
However, their work showed that one could benefit from assigning some computational tasks to the phone. Schroth et al. [47] developed a location recognition system for mobile platforms. They developed a novel feature quantization to overcome the limited computing power of mobile platforms; the main task is to identify the location shown in a picture. The work in [53] also proposed a mobile location search system, which tries to predict the best viewing angle to allow more successful queries. Shen et al. [28] proposed a framework to implement automatic object extraction. By employing the top-retrieved images to accurately localize the object in the query image, their proposed framework significantly improved the retrieval performance. Their work also suggested the importance of extracting the query objects for improving retrieval performance. Most visual retrieval systems are designed for single object retrieval. To enable multiple object recognition, the work in [77] employed a bottom-up search-based approach. In this way, graph cuts can be used to solve the

multi-object recognition problem. Meanwhile, in [14], the authors described an interactive multi-modal visual search system for mobile platforms. They took advantage of the multi-modal feedback from the user to retrieve the best matching images. The mobile VS applications available nowadays differ in many factors, such as the client interface (usage of menus, real time augmented reality, etc.), image processing (feature extraction and compression) and image recognition (nearest neighbor based approach, object recognition based approach, etc.).

2.3.2 Existing Visual Search Applications

As an example, Google's Goggles [21][9], a mobile visual search application, searches an image database with a picture taken by a mobile device. Currently, it supports the search of landmarks, barcodes, books, contact info, artwork, wines and logos. The user sends an image to the server and the image is compared against the contents of the entire database. The image is sent as a JPEG, which means a compression step takes place before sending the image [44].

Figure 2.2: An example of Google Goggles.

Nokia's Point and Find, combining MVS with Augmented Reality, presents information about the elements inside the frame of the camera of the mobile phone. The difference of Point and Find is that it uses an index that allows the user to search the image without consulting the server, which shortens the response time. In average cases, a call to a server may take ten to thirty seconds, while Point and Find responds in a few seconds. oMoby [49], an application developed by IQ Engines, applies augmented reality with image recognition, detecting brands and products within the camera's field of view. An interesting property of IQ Engines is that they use crowdsourcing in a situation where an image cannot be

matched automatically. The unmatched image is sent to the crowdsourcing platform, where it is tagged and properly indexed into the database. Like.com [71], now Google Shopping, effectively monetizes the technology by presenting to the users a selection of products that are similar to the objects that users were looking for, or allows them to search for a product that includes a highlighted part of a different product. Currently, it does not allow users to upload pictures, and therefore the user can only select images from the company database. However, an extension is on the way and soon it will be possible to send image queries and retrieve results. The Kooaba [25] image recognition platform is similar to oMoby in that it is specialized in specific-category object recognition (such as wine labels or clothing). Kooaba has customers such as the wine database Vivino or Switzerland's Ex-Libris (a store for online media). A database of images can be provided, or the integration of an external image database is enabled. Like in the other cases, the image needs to be searched in a database and therefore requires a network connection and substantial time for sending and receiving data. Moodstocks [41], a startup company from France, also provides visual recognition services. It does not supply the customers with a pre-built dataset and requires the customer to integrate his/her own database. The descriptors are computed on the client side and therefore the data to be sent to the server becomes much smaller, resulting in a faster response time [8].

Nowadays, industry-related research on Visual Search follows two orthogonal directions: generalized object recognition and specific object recognition. Companies such as Google, Alibaba and Baidu have currently developed solutions for the first task using Artificial Neural Networks (ANNs). Their goal is to recognize general objects without any restrictions on them, and this frequently overlaps with image classification. On the other hand, other companies like Tencent and Amazon (in particular with the Fire Phone) have developed Visual Search applications able to recognize objects within a restricted database of possible items. For example, a database could contain all the products included in a flyer, books, or all the paintings located in a museum. For this reason, it is important to notice that all the above-mentioned companies are developing their own Visual Search applications independently from one another. This means that they have no urgent need to develop interoperable technology, also because they have no interest in sharing their massive and precious databases, which are the cornerstones of the capitalization of their VS applications into revenues [43].

Figure 2.3: An example of Amazon Flows.

2.3.3 Compact Descriptors for Visual Search (CDVS)

From 2002 to 2010, the Moving Picture Experts Group (MPEG) defined a standard for content based access to multimedia data, MPEG-7, in which a set of descriptors for images was defined [10]. More recently, the growing processing power of mobile CPUs proved that sending an entire image to the cloud is unnecessary, and image processing (feature extraction and compression) could be performed directly on the smartphone. Moreover, other factors such as unstable or limited bandwidth and query transmission latency demonstrate that feature compression plays an important role in mobile VS applications [13]. For this reason, in 2011 the MPEG group started a standard proposal under the name of MPEG Compact Descriptors for Visual Search, which is in the Final Draft stage as of 2015. CDVS introduces the notion of executing the image processing on the device (imager or mobile phone CPU) and then sending only a highly compressed version of the image over the network. With bitrates from 512 bytes to 16 KB supported by the standard in six different profiles, the required bandwidth is significantly reduced compared to JPEG images, and the quality of the search is improved, as the image avoids the intermediate JPEG encoding step, which could introduce noise or artifacts.

2.3.4 The functionality and advantages of CDVS

• Compression Efficiency
Compression is an essential requirement for typical client-server VS architectures: some of the existing MVS applications rely on the transmission of a JPEG compressed query to remote servers. The drawbacks of this approach relate to the huge computational load on the server side, reduced picture quality, bandwidth consumption and latency, in particular for application scenarios with more stringent requirements (e.g. Augmented Reality).

CDVS changes the communication paradigm: a set of local features is extracted from the query image and compressed into a single compact descriptor on the client side; the resulting compact descriptor is then sent to the server to initiate the search. As shown in Figure 3, the overall size of a set of uncompressed local features extracted from an image can be larger than a traditionally compressed JPEG file. In this conceptual figure, the well-known SIFT [36] descriptor is used as an example of how large an uncompressed descriptor can get. However, CDVS drastically decreases the size of the compact visual descriptors, thanks to a scalable and adaptive compression scheme. CDVS supports different sizes of the compact descriptor footprint, spanning from a maximum of 16 KB per image, which is the fully performing operating mode, down to 512 bytes for extremely constrained bandwidth scenarios.

Moreover, CDVS offers a unique top-performing combination of fundamental properties.
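A back-of-envelope calculation makes the bandwidth saving concrete. The numbers below (about 1000 SIFT keypoints per image, each a 128-dimensional descriptor stored as 4-byte floats) are illustrative assumptions, not figures from the standard; only the 512 B to 16 KB operating points come from the text above.

```python
# Illustrative comparison of raw local-feature size vs. CDVS budgets.
n_keypoints = 1000                     # assumed keypoints in a query image
sift_bytes = n_keypoints * 128 * 4     # 128-D float descriptors: 512 000 B

cdvs_profiles = [512, 1024, 2048, 4096, 8192, 16384]  # six operating points
for budget in cdvs_profiles:
    ratio = sift_bytes / budget
    print(f"{budget:>6} B profile: {ratio:7.2f}x smaller than raw SIFT")
# e.g. even the largest 16 KB profile is 31.25x smaller than the raw features
```

Under these assumptions, even the fully performing 16 KB mode transmits over 30x less data than the uncompressed local features, and the 512 B mode three orders of magnitude less.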

• Scalability
Compact descriptors of different sizes can efficiently interoperate: in fact, the same compression scheme is applied in each operating mode, and scalability is guaranteed by a different number of local features embedded into the compact descriptor. A mechanism for ordering the local features according to their relevance is also standardized, aiming at maximizing performance given a certain number of available local descriptors.
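The relevance-ordering mechanism can be sketched as follows. This is a conceptual illustration, not the CDVS specification: the feature size, the scores and the packing function are all invented for the example.

```python
# Sketch of relevance-ordered packing: features are ranked by a relevance
# score and as many as fit the byte budget are kept. Sizes are illustrative.
def pack_descriptor(features, byte_budget, bytes_per_feature=32):
    """features: list of (relevance_score, feature_data) tuples."""
    ranked = sorted(features, key=lambda f: f[0], reverse=True)
    n_kept = byte_budget // bytes_per_feature
    return [data for _, data in ranked[:n_kept]]

feats = [(0.9, 'f1'), (0.2, 'f4'), (0.7, 'f2'), (0.5, 'f3')]
small = pack_descriptor(feats, 64)    # tight budget keeps only the top-2
large = pack_descriptor(feats, 128)   # larger budget keeps all four
```

Because every operating mode ranks features the same way, a small descriptor is a prefix of a larger one (`small` here is a prefix of `large`), which is exactly what lets descriptors of different sizes interoperate.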

• Support for web-scale databases
A global descriptor is also embedded into the compact descriptor: the global descriptor can be matched extremely fast against similar global descriptors, thus providing a means to search extremely large scale (e.g. web scale) datasets in a shorter time, quickly generating a limited set of candidates for further refinement.
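The two-stage search pattern described above can be sketched as follows. Here a Hamming distance between small bit-vectors stands in for the actual CDVS global-descriptor comparison, and the expensive local-feature verification is stubbed out; both are illustrative assumptions.

```python
# Two-stage retrieval sketch: a cheap global-descriptor scan shortlists
# candidates; only the shortlist would then undergo expensive local matching.
def hamming(a, b):
    """Number of differing bits between two integer-coded descriptors."""
    return bin(a ^ b).count('1')

def retrieve(query_global, database, shortlist_size=2):
    # Stage 1: fast global-descriptor scan of the whole database
    ranked = sorted(database, key=lambda item: hamming(query_global, item[1]))
    shortlist = ranked[:shortlist_size]
    # Stage 2: expensive local-feature verification would re-rank the
    # shortlist here (stubbed: keep the global-descriptor order)
    return [name for name, _ in shortlist]

db = [('duomo', 0b10110010), ('colosseum', 0b01001101), ('tower', 0b10110011)]
candidates = retrieve(0b10110110, db)   # ['duomo', 'tower'] shortlisted
```

The point of the pattern is that stage 1 touches every database entry but costs almost nothing per entry, while the costly geometric verification runs only on the handful of shortlisted candidates.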

• Hardware implementation efficiency
The development of the CDVS standard has been driven by strong industrial support from hardware manufacturers, aiming at solutions with very low computational complexity and a small memory footprint, thus facilitating low-power hardware implementations. This resulted in a reference extraction pipeline that will simplify the design work of SoC architects and hardware designers. Moreover, the CDVS simulation framework provides essential tools to easily verify a design implementation. This is a very important task, quite time-, resource- and money-consuming, representing up to 70% of the time needed to bring integrated circuits to full maturity. The CDVS test model designs will make it easier to achieve a SoC design that is fully functional, highly performing and ready for production, with a dramatically reduced effort.

• Generality
CDVS technology targets general-purpose scenarios: therefore, the standard solution is designed to guarantee robustness with any category of data. The technology, relying on local features and geometric verification, can be successfully applied to matching images of any textured rigid object, such as books, CDs, landmarks, printed documents, DVDs, paintings and buildings, without any need for re-training or user-defined parameter optimization.

• Robustness
The very rigorous testing procedures in place, as is traditional praxis in MPEG, ensure an excellent level of performance: in particular, performance is evaluated on extensive datasets containing data of different categories: graphic objects (CDs, DVDs, business cards, magazines, books), landmarks, museum paintings, video frames and common objects, for an overall amount of 30,000 query images. To ensure a continuous benchmark against prominent technologies available in the scientific community, publicly available datasets are also part of the testing dataset. Finally, in order to emulate the search conditions of a typical large scale (web) dataset, an additional 1 million images are used as a distractor set.

• Sufficiency The descriptors are self-contained: no other metadata are necessary to enable search. However, CDVS descriptors can easily be combined with other relevant metadata (e.g. GPS coordinates) aiming at narrowing the scope of the search and improving retrieval efficiency.

Chapter 3

The Structure of the MVS and CPS simulator

In this chapter, we describe the structure of the MVS application case in the CPS simulator and the integration of the whole CPS system.

3.1 The structure of the Mobile Visual Search

In the Mobile Visual Search, the user is expected to use an application on the simulated client system to acquire a photo and send the related information to the server; the server then returns information related to the photo according to the pre-built database. The process can be summarized as follows and is illustrated in Figure 3.1.

• Take a photo with the camera on the mobile device (in the simulation, this step is pre-completed in the native system);

• Send the photo as a query in order to get information about the visual content;

• Receive image information as a result of (successful) query.

Internally, the application on the client side processes the image in order to create an encoded and compressed form suitable for visual processing (the descriptor of the image) and sends the newly created descriptor to a remote server for image retrieval. The server performs a search on a database index and, in case of a successful search, returns an answer (the name of the image) to the client. The different steps of the process are shown in Figure 3.2.

Figure 3.1: UML use case diagram for Mobile Visual Search.

Figure 3.2: UML Sequence diagram for Mobile Visual Search, client encoding.

Alternatively, the mobile client application can send the image directly to the remote server, where the image is then encoded and the search performed, as shown in the Sequence Diagram in Figure 3.3. In this second scenario, a larger amount of data needs to be sent over the network, but some computations are offloaded from the mobile client to the server. Furthermore, in our algorithm, decoding an image buffer received from the client requires the OpenCV library and is very time-consuming in the GEM5 system, the Processing Subsystem simulator we use, so we will not consider this scenario in the real simulation.

Figure 3.3: UML Sequence diagram for Mobile Visual Search, server encoding.

Moreover, with respect to the scenario of a single server with multiple clients, in order to improve the user experience and the efficiency of the server, we use a multithreaded server. Rather than processing the incoming request in the same thread that accepts the client connection and processes the result retrieval, the client connection is handed off to a worker thread that processes the requests from that client: one thread per client, until the client quits and the thread is released. This scenario is illustrated in Figure 3.4.
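The one-thread-per-client pattern described above can be sketched with standard Python sockets. The in-memory `DATABASE` lookup, message format and helper names here are placeholders for illustration only, not the actual CPS implementation, which runs inside the simulated operating systems.

```python
import socket
import threading

# Hypothetical in-memory "retrieval" table standing in for the image database.
DATABASE = {b"descriptor-001": b"eiffel_tower.jpg"}

def handle_client(conn: socket.socket) -> None:
    """Worker thread: serve one client until it disconnects."""
    with conn:
        while True:
            query = conn.recv(4096)
            if not query:               # client closed the connection
                break
            conn.sendall(DATABASE.get(query, b"no-match"))

def serve(host: str = "127.0.0.1", port: int = 0) -> socket.socket:
    """Accept loop: hand each connection off to a dedicated worker thread."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, port))
    srv.listen()

    def accept_loop() -> None:
        while True:
            conn, _addr = srv.accept()
            threading.Thread(target=handle_client, args=(conn,),
                             daemon=True).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return srv
```

Because each worker blocks only on its own client, a slow client does not stall the accept loop or the other connections.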

Figure 3.4: UML Sequence diagram for Mobile Visual Search, single server with multiple clients.

3.2 Mapping the Mobile Visual Search on the CPS.

The diagram mapping the Mobile Visual Search on the CPS simulator is shown in Figure 3.5. The steps of the visual search in the CPS simulator are listed below, while the CPS processing subsystem (gem5), the CPS network subsystem (OMNeT++) and the integration of the two systems will be discussed in detail in the next chapter.

• In the CPS system the visual search consists of two kinds of nodes,

– the user nodes,
– the server node.

• Both kinds of nodes have a network-related part and a processing-related part; these two parts are synchronised by CERTI HLA, which will be discussed in detail in the next chapter.

• At the start of a visual search process, the user takes a picture, which is pre-processed (to obtain the descriptor of the picture) in the simulated client system. This simulated client system is the processing subsystem of the CPS.

Figure 3.5: Mobile Visual Search mapped on the CPS simulator

• During the next step the user node submits the descriptor of the query image to the server nodes using the simulated network. This procedure is handled by the Network subsystem of the CPS.

• After the server node receives the request from the user node, the server searches its database for possible matches of the image descriptor. This part is handled by the processing subsystem of the CPS.

• The last step of the Visual Search is the submission by the server node to the user node of the image names of the possible matches of the query image descriptor. This procedure is handled by the network subsystem of the CPS.

• As shown in Figure 3.5, both CPS subsystems have instantiations of the user node and the server node, but each subsystem handles different aspects of their behaviour: the network communication is handled by the network subsystem, while the user and server behaviour, specifically the image processing and retrieval, is handled by the processing subsystem.

• The dashed lines between the nodes in the processing sub-system show that it is not concerned with the network communication between nodes but only with their processing behaviour. The network communication is completely handled by the network subsystem.

Chapter 4

Visual Search

The Visual Search application implemented in the CPS simulator can be split into two main blocks, Descriptor Extraction and the Retrieval Stage. The clients in the CPS simulator are required to complete the first block; the compressed local descriptors and the global descriptors are then sent to the server. The server in the CPS simulator is required to complete the Retrieval Stage: in the simple model, the server uses only the global descriptor for matching and retrieves the results with a global score alone. In a more complex model, the server also uses the local descriptors for a homography check and then retrieves the results with both global and local scores.

4.1 Descriptor Extraction

The descriptor extraction is the result of image analysis; the descriptor is the encoded and compressed form of the query image used in the retrieval stage. The pipeline of the descriptor extraction is shown in Figure 4.1.

Figure 4.1: Pipeline of Descriptor Extraction

1 Block-based keypoint detection
The keypoint detection is based on the block-based frequency-domain Laplacian of Gaussian (BFLoG). As illustrated in Figure 4.2, it works in the following steps:

Figure 4.2: The workflow of keypoint detection

(a) Generating the pyramid of an input image with several octaves. The image pyramid is used to reduce the complexity and speed up the keypoint detection. The pyramid is generated with O octaves by down-sampling the input image by a step of 2.
(b) For each octave, perform steps (i)–(vi):

i. Decomposing each octave into blocks. Blocks are introduced to reduce the computational complexity. Firstly, blocks significantly reduce the time complexity: on the one hand, by fixing the block size, frequency-domain LoG filters and Gaussian filters can be computed offline, thereby avoiding the online cost of generating frequency-domain filters for different image sizes.

On the other hand, blocks allow a parallel implementation in practice. Secondly, blocks significantly reduce the memory complexity: compared with whole-image filtering, block-wise filtering shrinks the memory cost incurred by the frequency-domain filters (i.e., LoG and Gaussian) and the FFT/IFFT transformation. In the algorithm, a block size of 128 by 128 is chosen.
ii. Performing block-wise filtering to produce LoG response blocks. Frequency-domain block-wise filtering is employed, consisting of three stages, as illustrated in Figure 4.3: Fourier transform of a block image, block filtering by frequency-domain LoG and Gaussian scaled filters, and inverse Fourier transform of the frequency-domain filtered blocks. The LoG 2-D filter is composed of the Gaussian smoothing filter, used to reduce the noise of the image, and the Laplacian filter, used as an edge detector because it is an isotropic measure of the 2nd spatial derivative of the image and is able to emphasize regions of rapid intensity change.

Figure 4.3: Block Image LoG Filtering

iii. Recomposing blocks to produce LoG response images and Gaussian scaled images. Based on the block-wise filtering results of the decomposed block images of an octave, the LoG response blocks and Gaussian scaled blocks are concatenated to form a LoG response image.
iv. Detecting scale-space extrema over the LoG response images. Scale-space extrema detection is done by comparing each pixel's BFLoG response value with those of its 26 surrounding pixels. Scale-space extrema refinement is done by eliminating edge pixels using the Hessian matrix.

v. Refining the extrema output to detect the interest points. Scale-space extrema detection may produce unstable keypoints. First, interpolation is introduced to remove unstable interest points and locate the position of the interest points more accurately; then, the Hessian matrix is applied to remove noisy interest points occurring at edges.
vi. Computing the orientation of each interest point.
(c) Outputting the union of the detected interest points of all octaves, each interest point containing the characteristics of scale, orientation, coordinates and the output of the LoG filtering.
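As an illustration of steps (a) and (b)-i above, the following sketch builds the octave pyramid by factor-2 down-sampling and decomposes an octave into fixed 128 × 128 blocks. This is a simplified NumPy illustration, not the CDVS reference code; in particular, the zero-padding of border blocks is an assumption made here for simplicity.

```python
import numpy as np

def build_octaves(image: np.ndarray, num_octaves: int = 3) -> list:
    """Down-sample the input by a factor of 2 per octave (step = 2)."""
    octaves = [image]
    for _ in range(num_octaves - 1):
        octaves.append(octaves[-1][::2, ::2])
    return octaves

def split_into_blocks(image: np.ndarray, block: int = 128) -> list:
    """Decompose an octave into fixed-size blocks (zero-padded at borders),
    so frequency-domain LoG/Gaussian filters can be precomputed offline
    for a single known block size."""
    h, w = image.shape
    blocks = []
    for r in range(0, h, block):
        for c in range(0, w, block):
            tile = np.zeros((block, block), dtype=image.dtype)
            part = image[r:r + block, c:c + block]
            tile[:part.shape[0], :part.shape[1]] = part
            blocks.append(tile)
    return blocks
```

Fixing the block size is what allows the frequency-domain filters to be computed once offline instead of per image size.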

2 Feature selection
Selection of a limited number of keypoints, in order to identify those that maximize a measure of expected quality for subsequent matching. The feature selection adopts the approach described in [17]. It assigns a positive value to each feature, as a function of its Laplacian of Gaussian (LoG) characteristics, its orientation and its coordinates. We let the n-th feature in an image be denoted by S_n (intended to encompass the LoG characteristics, the orientation and the coordinates). The function value is denoted by r (for keypoint relevance), so a feature has the value r(S_n). The relevancies are then sorted such that r(S_{n_1}) ≥ r(S_{n_2}) ≥ ··· ≥ r(S_{n_N}). Only the first L features n_1, ..., n_L are kept, such that the mean query length remains below the target bit rate.
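The sorting-and-truncation step above amounts to a few lines; this sketch assumes the relevance values r(S_n) have already been computed by the model of [17].

```python
def select_features(features: list, relevance: list, max_count: int) -> list:
    """Keep only the top-`max_count` features, ranked by the relevance
    measure r(S_n) in decreasing order."""
    ranked = sorted(zip(features, relevance), key=lambda p: p[1], reverse=True)
    return [f for f, _ in ranked[:max_count]]
```

In the real encoder, `max_count` (L) is chosen so that the mean query length stays below the target bit rate.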

3 SIFT feature description
The keypoint selection produces a number of keypoints, each characterized by four parameters, namely its position (x, y), its scale σ and its orientation θ. For each keypoint, a local feature (SIFT feature) is extracted from the image region around the keypoint, generating a local descriptor that represents that region. The selected keypoints are described using the Scale Invariant Feature Transform (SIFT) [36]; these local descriptors are widely used and are scale invariant. The feature extraction step is performed using the off-the-shelf SiftGPU library [74].

4 Local Descriptor compression Transform and scalar quantization-based compression of the selected SIFT features.

5 Coordinate coding
The coordinates of the keypoints in an image are encoded by quantization and arithmetic coding.

6 Global Descriptor aggregation
For fast and efficient search, the uncompressed local feature descriptors are aggregated into a compact, discriminative global descriptor named Scalable Compressed Fisher Vector (SCFV). A global descriptor is computed by aggregating the previously extracted local descriptors using Fisher vectors [52]: the probability of observing a single local feature is modeled using a Gaussian Mixture Model (GMM). The Fisher vector is then defined as a normalization of the GMM Fisher information and, for normalization purposes, is computed on a transformation of the local descriptors. First, the RootSIFT descriptors are computed by simply L1-normalizing the original descriptors and then substituting each component with its signed square root; then a PCA (Principal Component Analysis) dimension reduction is performed, which significantly reduces the Fisher vector dimension and effectively removes redundant information. The GMM parameters are obtained from a training dataset. Finally, we obtain a compressed global descriptor. The process is shown in Figure 4.4.

Figure 4.4: Global Descriptor aggregation
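The RootSIFT transformation mentioned above (L1 normalization followed by the signed square root of each component) can be sketched directly in NumPy; for standard non-negative SIFT components the signed root reduces to a plain square root.

```python
import numpy as np

def root_sift(desc: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """L1-normalize each descriptor (rows), then replace every component
    with its signed square root."""
    desc = desc / (np.abs(desc).sum(axis=-1, keepdims=True) + eps)
    return np.sign(desc) * np.sqrt(np.abs(desc))
```

A useful side effect: for non-negative inputs the result is automatically L2-normalized, since the squared components sum to the original L1 norm of 1.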

4.2 Retrieval Stage

The retrieval procedure encompasses a comparison between the global descriptors of the query image and of the reference images in the database, as well as the matching of the local descriptors present in both images. We need to mention that, as a preliminary offline task, a set of local descriptors is extracted from every image in the database, using the same technique described in the Descriptor Extraction stage, to generate a compressed global descriptor and compressed local descriptors, which are stored in the database.

For the matching of the global descriptors, given two images X and Y, the similarity score is a weighted correlation between their global descriptors and can be calculated quickly by (i) using bitwise XOR and POPCNT to compute Hamming distances, and (ii) reading the weights from a small look-up table. If the similarity score exceeds a threshold, the image pair is declared a match, otherwise a non-match.
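The XOR + popcount computation can be sketched as follows. The `global_match` helper and its per-byte weighting are simplified stand-ins invented here for illustration; the actual CDVS scoring uses its own look-up tables over SCFV sub-vectors.

```python
# Small look-up table: population count (number of set bits) per byte value.
POPCOUNT = [bin(i).count("1") for i in range(256)]

def hamming(a: bytes, b: bytes) -> int:
    """Hamming distance between two binary descriptors via XOR + popcount."""
    return sum(POPCOUNT[x ^ y] for x, y in zip(a, b))

def global_match(a: bytes, b: bytes, weights, threshold: float):
    """Hypothetical weighted similarity: higher when per-byte Hamming
    distances are small; `weights` plays the role of the look-up table."""
    score = sum(w * (8 - POPCOUNT[x ^ y]) for x, y, w in zip(a, b, weights))
    return score, score > threshold
```

On modern CPUs the inner popcount is a single POPCNT instruction, which is what makes the global matching stage fast.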

For the local descriptor matching, first the compressed local descriptors and their coordinates are decoded for both the query image and the reference images.

The local descriptors are then compared in the compressed domain using the L1 distance, which is calculated for corresponding groups of four ternary elements using XOR and a small lookup table. When descriptors of different lengths are matched, the scalability of the local descriptors is exploited by reducing the longer descriptors to the subset of elements which appear in the shorter descriptors.
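The group-wise look-up idea can be sketched as follows. The base-3 packing of four ternary elements into a single code is an assumption made here for illustration; the CDVS bitstream uses its own packing, and the table-driven distance is the point being demonstrated.

```python
from itertools import product

# All 81 groups of four ternary elements in {-1, 0, +1}, indexed by a
# hypothetical packed code (their position in this enumeration).
GROUPS = list(product((-1, 0, 1), repeat=4))

# Precomputed 81 x 81 table of L1 distances between any two groups.
L1_TABLE = [[sum(abs(x - y) for x, y in zip(g, h)) for h in GROUPS]
            for g in GROUPS]

def l1_distance(codes_a: list, codes_b: list) -> int:
    """L1 distance between two ternary descriptors given as packed
    group codes: one table read per group of four elements."""
    return sum(L1_TABLE[a][b] for a, b in zip(codes_a, codes_b))
```

Replacing 4 element-wise subtractions with one table read is what makes the compressed-domain comparison cheap.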

The ratio of the closest distance to the next closest distance is used as a distinctiveness criterion to determine the keypoint matches (correspondences) between the two images.

The algorithm we use also includes a two-way keypoint matching tool, whereby matching keypoints are identified in both directions, i.e. by identifying in the query image the keypoints that match the keypoints of the reference image, by identifying in the reference image the keypoints that match the keypoints of the query image, and by retaining the intersection between the two sets. This results in a set of keypoint matches which are consistent in both directions, which in turn yields gains in image matching and retrieval accuracy.

Whenever two or more different keypoints in one image match the same keypoint in the other image, the match that has the smallest distance ratio is kept. This improves the chance of having correct matches among those submitted to the following stage, the geometric consistency check.
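The ratio test and the two-way intersection described above can be sketched as follows. This illustration uses Euclidean distance on uncompressed descriptors rather than the compressed-domain L1 distance of the actual pipeline; the threshold 0.8 is a common choice, not the value mandated by CDVS.

```python
import numpy as np

def one_way(desc_a: np.ndarray, desc_b: np.ndarray,
            max_ratio: float = 0.8) -> dict:
    """Ratio-test matches from A to B: keep a match only when the closest
    distance is clearly smaller than the second-closest one."""
    matches = {}
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        j, k = np.argsort(dists)[:2]
        if dists[j] < max_ratio * dists[k]:
            matches[i] = j
    return matches

def two_way_matches(desc_a, desc_b, max_ratio: float = 0.8) -> set:
    """Keep only the matches found in both directions (set intersection)."""
    ab = one_way(desc_a, desc_b, max_ratio)
    ba = one_way(desc_b, desc_a, max_ratio)
    return {(i, j) for i, j in ab.items() if ba.get(j) == i}
```

The intersection step also resolves many-to-one matches automatically, since a keypoint can be the nearest neighbour of only one keypoint in the other direction.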

The geometric consistency check is performed to determine the number of inliers among the keypoint matches for the two images. If a certain hypothesis test is passed and a weighted sum of the inliers exceeds a threshold, the two images are considered a match. The weights depend on the ratio of the two closest descriptors computed in the keypoint matching stage and privilege stronger matches. Finally, in case of a match, homography estimation is conducted to produce localization information. In this thesis, the geometric consistency check uses the histogram of logarithmic distance ratios (LDR) for pairs of matches. This histogram was introduced in [63] and is referred to here by the acronym DISTRAT (distance ratio coherence).

DISTRAT identifies inliers with high precision; however, in some cases the identified set may still contain one or two outliers. This does not cause problems for image matching, but it may cause incorrect estimates of spatial transformations. The final step is therefore a homography estimation between the query image feature points and the reference ones. The homography is computed rapidly using a direct linear transform (DLT) [23] algorithm performed inside a random sample consensus (RANSAC) framework to exclude outliers.
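The core DLT step can be sketched in NumPy as below; the surrounding RANSAC loop (sampling minimal 4-point subsets and counting inliers) is omitted for brevity, so this sketch assumes the correspondences are already outlier-free.

```python
import numpy as np

def dlt_homography(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Direct linear transform: estimate the homography H (up to scale)
    from >= 4 point correspondences src -> dst, as the null vector of the
    design matrix obtained via SVD."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]  # fix the scale so that H[2, 2] == 1
```

Inside RANSAC, this estimator would be run on random 4-point samples and the H with the largest inlier consensus kept.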

There are two modes that we use in our VS application implemented in the CPS. In the first, the matching images are retrieved with global descriptor matching only, and we obtain a ranked list of images with a global score, where a higher score implies a better match. In the second, we obtain the matching results from both the global descriptor matching and the local descriptor matching: first we have a list of matching images as the result of global descriptor matching, then the local descriptor matching is performed between these images and the query image, and finally we obtain a ranked list of matching images with both global and local scores.

Figure 4.5: Diagram of Retrieval algorithm

Chapter 5

Building of the CPS simulator

For the Processing subsystem, the inputs required to build it and the outputs it must produce are defined as illustrated in Figure 5.1.

Figure 5.1: CPS Processing Subsystem inputs and outputs.

• System Configuration A simulation configuration file. It defines sys- tem components. More specifically:

– number of CPUs (or cores)
– on-chip interconnect
– memory sub-system (incl. caches)
– other devices

• CPU description Characteristics of the CPU cores, including the CPU profile (simple IPC = 1, in-order CPU, out-of-order CPU, pipelined CPU), the ISA and the main architectural features.

• Network Interface Card Characteristics of the network interface card (NIC) of a node, including connectivity with the host CPU, network protocol type and packet processing times. The simulator is expected to offer some readily available NIC descriptions for commonly used interfaces (e.g. Ethernet).

• OS Image and Application Executable An image file of the Operating System to be executed on the simulated platform. The image file shall include the OS kernel and all the libraries and other components that are required. It should include the Visual Search application that the user intends to execute on the simulated platform. The simulator is expected to include a number of preassembled OS images (*nix-based; at present the simulator is not expected to support MS Windows operating systems) that the user can invoke, along with instructions on how to modify them or create entirely new images. The Visual Search server and client in the CPS both use the Ubuntu 12.04 image.

• Statistics A text representation of all of the processing sub-system statistics registered for the simulation. By default the tool provides all statistics of an execution run (clock ticks, real-time execution, cache misses, memory transactions, simulator instruction rate, number of CPU cycles simulated, number of seconds simulated, branch predictor information, etc.).

• Application Output A file which contains the application output from the simulation.

The processing sub-system executes the whole operating system with its peripherals and the application in a cycle-accurate manner and produces statistics about the simulation (such as clock ticks, simulator instruction rate, number of CPU cycles simulated, number of seconds simulated, etc.). The processing sub-system provides a number of different CPU models (from abstract CPU models to fully timed CPU models) to achieve speed vs. accuracy tradeoffs.
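The statistics output described above is plain text; a minimal parser for gem5-style `name value # description` lines can be sketched as follows. It is deliberately simplified: real stats files also contain section markers and distribution entries, which are simply skipped here.

```python
def parse_stats(text: str) -> dict:
    """Parse gem5-style statistics lines of the form
    '<name>  <value>  # <description>' into a name -> float dictionary.
    Lines whose second field is not numeric are ignored."""
    stats = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop the trailing description
        parts = line.split()
        if len(parts) < 2:
            continue
        try:
            stats[parts[0]] = float(parts[1])
        except ValueError:
            pass                              # skip non-numeric entries
    return stats
```

Such a parser is handy for post-processing runs, e.g. comparing `sim_seconds` across CPU models.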

For the Network subsystem, the inputs and outputs are illustrated in Figure 5.2.

Figure 5.2: CPS Network Subsystem inputs and outputs.

• Network & Topology Through Network and Topology description the user will be able to:

◦ Describe the network topology: chain, ring, mesh, etc.
◦ Describe channel definitions:
– Antenna characteristics (for wireless nodes)
– Transmission power
– Bandwidth rate
– Higher-level protocols, e.g. TCP/IP
– Lower-level protocols, e.g. Ethernet, GSM, 802.11.

• Simulation parameters Simulation parameters could offer additional flexibility when considering multiple simulation scenarios.

• Node Behavior description The user can also provide the Node Behavior description. This is used to model intermediate network devices (such as routers or bridges).

• Time Series Time series present the behavior over time and will be provided as series of timestamp pairs with:

◦ end-to-end delay of received packets
◦ packet drops or channel throughput

• Summary Statistics The summary statistics will be:

◦ the number of packets sent

◦ the number of packet drops
◦ the average end-to-end delay of received packets
◦ the peak throughput

The functionality that a Network subsystem should offer includes: (i) building the network model of the simulated system according to the network and topology descriptions provided by the user;

(ii) simulating the system according to the application characteristics through interaction with the processing sub-system that models each network node. In the following we discuss the inputs needed to build the two subsystems; the most important part, the integration of the two systems, is discussed in Section 5.3. The outputs will be illustrated briefly in the next chapter and then presented as the evaluation results. For the integration of processing and network simulation, bringing the two simulators together requires carefully designed communication interfaces and synchronization schemes. This bidirectional interface has to pass information on the type and timing of events and to provide a common data representation, since data are represented differently in the processing and the network simulators. Passing actual data between the two simulators is necessary to support the simulation of real use cases. The High Level Architecture (HLA) [70] is used for this purpose.

5.1 CPS Processing Subsystem

GEM5 is selected as the simulation tool forming the basis of the processing sub-system of the CPS framework, since GEM5 has the following features:

(i) Handling multi-core CPUs, including several kinds of memory hierarchies, cache systems and peripherals (in the Full System model).

(ii) Being cycle-accurate [31].

(iii) Being able to efficiently simulate a complete system with devices and an operating system: an image file of the Operating System can be executed on GEM5, and the image file shall include the OS kernel and all the libraries and other components that are required. The OS image used in our simulation is Ubuntu 12.04 for both the ARM and X86 ISAs.

(iv) Supporting a very broad range of modern CPUs for both the X86 and ARM ISAs.

(v) It cannot simulate network-related functionality by itself, but it provides network interface cards (how the network model of GEM5 is extended will be discussed in the following subsections).

Here a diagram (Figure 5.3) shows the components of the CPS processing subsystem; next we discuss the details of the configuration and the network model of the GEM5 system.

Figure 5.3: CPS simulator Processing Subsystem

5.1.1 Configuration of the GEM5 system

• CPU Model GEM5 provides four different CPU models, AtomicSimple, TimingSimple, InOrder and O3, each of them representing a different point in the speed vs. simulation accuracy trade-off. InOrder is a pipelined in-order CPU model; O3 is a pipelined out-of-order CPU model. AtomicSimple and TimingSimple are non-pipelined CPU models that attempt to fetch, decode, execute and commit a single instruction on every cycle. The AtomicSimple CPU is a minimal, single-IPC CPU which completes all memory accesses immediately. This low overhead makes AtomicSimple a good choice for simulation tasks like fast-forwarding. In the simulation, we will test both the AtomicSimple CPU and O3, the out-of-order model, for the client-side descriptor extraction, and only the AtomicSimple CPU for the server.

• System Mode Each execution-driven CPU model can operate in either of two modes. System-call Emulation (SE) mode emulates most system-level services. Full-System (FS) mode executes both user-level and kernel-level instructions and models a complete system including the OS and devices. In FS mode, GEM5 boots the simulated operating system and the user has to connect to the system (through a console terminal) and use it as a typical Virtual Machine. In this simulation, we simulate the ARM and X86 architectures in GEM5 FS mode so as to support real network cards and a complete TCP/IP protocol stack included in the Operating System kernel with the appropriate drivers, as shown in Figure 5.3.

• Memory System The gem5 simulator includes two different memory system models, Classic and Ruby. The Classic model (from M5) provides a fast and easily configurable memory system, while the Ruby model (from GEMS) provides a flexible infrastructure capable of accurately simulating a wide variety of cache-coherent memory systems. In this simulation, we use the Classic model for speed, because GEM5, as a cycle-accurate simulator, is rather slow.

• ISA The simulator should be able to simulate ARM for the client and X86 for the server.
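A gem5 configuration combining these choices is an ordinary Python script. The fragment below follows the style of gem5's public example scripts; the class names and the master/slave port bindings match gem5 versions of that era but vary in later releases, so treat this as a sketch of a configuration fragment rather than the exact script used in this work.

```python
# Sketch of a gem5 configuration fragment (not the exact thesis script):
# class names and port syntax vary between gem5 versions.
import m5
from m5.objects import (System, SrcClockDomain, VoltageDomain, AddrRange,
                        AtomicSimpleCPU, SystemXBar, SimpleMemory, Root)

system = System()
system.clk_domain = SrcClockDomain(clock="1GHz",
                                   voltage_domain=VoltageDomain())
system.mem_mode = "atomic"                 # pairs with the AtomicSimple model
system.mem_ranges = [AddrRange("512MB")]

system.cpu = AtomicSimpleCPU()             # fastest, least detailed CPU model
system.membus = SystemXBar()               # on-chip interconnect
system.mem_ctrl = SimpleMemory(range=system.mem_ranges[0])

# Port bindings (pre-2021 master/slave naming used here):
system.cpu.icache_port = system.membus.slave
system.cpu.dcache_port = system.membus.slave
system.mem_ctrl.port = system.membus.master

# A workload (SE mode) or a disk/kernel image (FS mode) must still be
# attached before simulating; full_system=True selects FS mode.
root = Root(full_system=False, system=system)
m5.instantiate()
print("Exit reason:", m5.simulate().getCause())
```

The FS-mode scripts used in this work additionally attach the Ubuntu 12.04 disk image, the kernel and the NIC devices.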

5.1.2 Network model of GEM5

In addition to the network interface cards, GEM5 supports networking through a simple Ether device. This Etherlink is a virtual dummy link which emulates a cable over which Ethernet packets are sent and received without any delay. The limitation of a single NIC and a single link type with just two nodes (without any routing or switching) forces us to make some modifications. Since the device models (NICs) are not easy to develop without inside information from their manufacturer, we cannot modify the NICs themselves. In order to support a more complex network, we tap the Ethernet packets from Etherlink and send them to a networking simulator, configuring at the same time the NIC according to the specific network protocol we simulate. The networking simulator is OMNeT++, which is described in the network subsection below.

Figure 5.4: GEM5 systems interconnection in the CPS simulator

Furthermore, the simplistic network model of GEM5 through the Etherlink device has another serious limitation. It only supports the simulation of two networked systems which are identical (for example two identically configured ARM processors with exactly the same peripherals and memory configuration), as the whole simulation is executed within the same thread and no synchronisation primitives between the two systems are provided.

To achieve the synchronisation of the different systems, we employ CERTI HLA. We developed a virtual device named HLA GEM5 and integrated it into the main core of the GEM5 system through Etherlink. The HLA GEM5 device is a wrapper around an RTI Ambassador class, serving to exchange messages over the network with the HLA Server (the RTIG process) via TCP (and UDP) sockets. More specifically, the HLA GEM5 device captures Ethernet packets from the Etherlink devices and sends them to (and accordingly receives them from) the HLA Server (RTIG). The HLA Server forwards these messages to the proper interface of a network simulator that implements all the network-related functionality (the NIC physical layer and the actual network). Figure 5.4 provides a visualisation of the CPS GEM5 interconnection.

5.2 CPS Network Subsystem

The basic idea of the Visual Search simulation is to run on a wireless network; unfortunately, we only realize an Ethernet network with the OMNeT++ simulator. OMNeT++ implements all the functionality of the OSI data link layer, including the ARP protocol (since we use IPv4 at the OSI network layer), and the communication between the processing subsystem and the network subsystem.

Three main parts are needed to build a simulation network of the CPS. First, a robust and seamless communication channel needs to be built between the processing sub-system and the network subsystem; this communication is established through an HLA run-time infrastructure (RTI) wrapper. The user interface library is imported and HLA OMNET (programmed in C++; HLA OMNET is part of the module behaviour description) is integrated into a reference template design as a starting point for all the network simulations. HLA OMNET is a CERTI HLA-compliant wrapper developed to offer a unique interface to each network node simulated in the network subsystem, in order to communicate in a consistent and synchronised way with the processing subsystem; it is a wrapper around the RTI Ambassador class so as to exchange messages over the network, in particular with the HLA server (the RTIG process), via TCP (and UDP) sockets. In other words, HLA OMNET is the counterpart of the HLA GEM5 wrapper mentioned in the previous subsection.

Second, besides the module behaviour description for HLA, we also have the txc.cc file that describes the functionality of OSI layer 2, including the ARP protocol implementation, the CRC (Cyclic Redundancy Check) error-detecting algorithm, and the encapsulation of data transmitted from the GEM5 node to the network node (through HLA) and in the reverse direction.

Third, there is a network topology description (.ned) file; the ned file defines how the submodules (router, switch, client and server) are connected together. These network submodules are imported from simulation kernel libraries; in our simulation we import them from the OMNeT++ INET library.

A simplified configuration diagram of how OMNeT++ is adapted to the CPS simulator is shown in Figure 5.5.

Figure 5.5: CPS simulator network subsystem.

5.3 Integration tool: CERTI HLA

HLA is foremost a general-purpose, reusable software architecture for the development and execution of very large distributed simulation applications [22].

The HLA has wide applicability across a full range of simulation areas, including education and training, analysis, engineering, web-based distributed applications, real-time critical applications and a variety of resolution levels. Thus, the HLA supports interfaces to live participants, such as instrumented platforms and live systems. These widely different application areas indicate the variety of requirements that have been considered in the development and evolution of the HLA. HLA is an initiative to capture the best sides of DIS [69] and ALSP [] and to provide at the same time a standard architecture for software simulation [22].

For this reason, the IEEE standard High Level Architecture (HLA) [16] is used for the interconnection of the processing and networking simulation sub-systems. As mentioned in the network subsystem and processing subsystem sections, we employ CERTI HLA to achieve the synchronisation of the processing subsystem and the network subsystem.

5.3.1 CERTI HLA architecture

CERTI is an HLA RTI. Figure 5.6 shows the resulting layered implementation using CERTI: the lower layers consist of two types of processes, local ones called RTI Ambassadors (RTIA), which as mentioned above are HLA OMNET and HLA GEM5, and a central one called the RTI Gateway (RTIG). These processes are linked with each other using Unix and TCP sockets. The RTIG is of predominant importance, since any form of communication between federates, be it for data exchange or for synchronisation purposes, goes through the RTIG.

Specifically, each subsystem process interacts locally with an RTI Ambassador process (RTIA) through a Unix-domain socket. The RTIA processes exchange messages over the network, in particular with the RTIG process, via TCP (and UDP) sockets, in order to run the various distributed services associated with the RTI. A specific role of the RTIA is to immediately satisfy some federate requests, while other requests require sending network messages or further processing. The RTIA manages memory allocation for the message FIFOs and always listens to both the federate and the network (the RTIG).

On the other hand, the RTI Gateway (RTIG) is a centralisation point in the architecture. Its function is to simplify the implementation of some services. It manages the creation and destruction of federation executions and the publication/subscription of data. It plays a key role in message broadcasting, which has been implemented by an emulated multicast approach: when a message is received from a given RTIA, the RTIG delivers it only to the interested RTIAs, avoiding a broadcast.

Figure 5.6: The CERTI HLA architecture.

5.3.2 CERTI HLA Synchronisation

CERTI HLA provides synchronisation over the network through time management services, which enable deterministic and reproducible distributed simulations. In the processing subsystem and the network subsystem, we use RxPacketTime for per-node synchronisation and SynchTime for global synchronisation. Each federate manages its own logical time and communicates this time to the RTI. The RTI ensures correct coordination of federates by advancing time coherently. CERTI HLA defines Time-Stamp Ordered (TSO) events, which are supposed to occur at specific points in time.

Regulating subsystems generate TSO events (possibly out of time-stamp order) whose timestamps must be no earlier than the current local time plus the lookahead. The lookahead acts as a contract value which guarantees that the subsystem will not generate a TSO event earlier than its current local time plus the lookahead. The details of the synchronisation are as follows:

• Synchronisation per node Each processing subsystem node simulator (gem5) needs to communicate in a consistent way with its counterpart in the network simulator (OMNeT++) in order to exchange data packets efficiently. For example, the processing simulator instance of node 0 has to be synchronised with the representation of node 0 in the network simulator, and likewise the instance of node 1 with the representation of node 1. This type of synchronisation is very important because the same node must seamlessly exchange network data between the two simulators while preserving exactly the same time ordering; that is, the same data packets must go through the same number of "steps" in both simulators. For this reason, one federation is created per node to achieve the same synchronisation time per node, as illustrated in Figure 5.7.

• Global synchronisation The whole CPS system needs to synchronise all nodes simultaneously and periodically, because it supports different types of CPUs with different clock cycles and different network protocols. For this reason, the simulated time in each node can be completely different for the same real time. Unpredictable time drifts between the simulated nodes can surface, resulting in simulation errors (e.g. invalidated network packets). Furthermore, in case of a system crash the simulator must have saved the last state, while in case of vulnerability detection the simulator must have saved more than one previous state. For all the aforementioned reasons, global synchronisation, triggered either automatically or by the user, is necessary to avoid these circumstances, as illustrated in Figure 5.7. Assuming that the communication of a node with the network or the environment is slower (in terms of latency), we can relax the constraint on the global synchronisation time (SynchTime), gaining simulation speed without losing significant simulation accuracy.
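The lookahead contract and the conservative time advance behind these two mechanisms can be sketched as follows. This is an illustrative model, not the CERTI implementation; the millisecond figures are made-up inputs:

```python
# Illustrative conservative time management: a federate may only send a TSO
# event with timestamp >= local_time + lookahead, and the RTI may only grant
# an advance up to the other federates' guarantees (their time + lookahead).

def earliest_tso_timestamp(local_time, lookahead):
    """Smallest timestamp this federate is allowed to put on a TSO event."""
    return local_time + lookahead

def grant_time(requested, others_times, others_lookaheads):
    """Time the RTI can safely grant without violating TSO ordering."""
    bound = min(t + la for t, la in zip(others_times, others_lookaheads))
    return min(requested, bound)

# A node federate at t=10 ms with 1 ms lookahead asks to advance to 20 ms,
# while the network federate is still at 12 ms with a 2 ms lookahead:
print(grant_time(20.0, [12.0], [2.0]))    # granted only up to 14.0 ms
print(earliest_tso_timestamp(10.0, 1.0))  # may not stamp events before 11.0 ms
```

Under this contract no federate can ever receive an event "in its past", which is what makes the distributed run deterministic and reproducible.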

In conclusion, the synchronisation time per node and the global synchronisation time are two different entities that can be separately defined by the user. The first one is mostly determined by the latency of the network interface and does not constrain the simulation speed, while the second one is a trade-off between simulation speed and simulation accuracy.

Figure 5.7: CERTI HLA Global Synchronization.

Figure 5.8: CERTI HLA Local and Global Synchronisation

Chapter 6

Visual Search Evaluation on CPS simulator

In this chapter, three main metrics will be used in evaluation:

• Execution time The execution time will be compared to the result of internal timers in the software code, in order to check both the simulation speed accuracy and the slowdown of the simulation time compared to native execution.

• Quality of the results To test the quality of the results, the accuracy of the search will be compared with the one obtained on the native system with the same set of query images. The algorithms implemented in the CPS simulator are the same as the ones on the native system, so a detailed check of all the algorithmic steps will assess the quality of the results of the CPS simulator itself.

• Network latency As no native methods to measure the latency are available for this test case, only "soft" criteria (usefulness of the results, ease of use of the simulator, time to design) will be assessed.

As mentioned in the previous chapter, we have several outputs from GEM5 and OMNeT++. In the OMNeT++ console we get statistics on the number of packets sent, what kind of messages they are, and the end-to-end delay; however, the data used to evaluate the above metrics come from the GEM5 outputs. From the GEM5 outputs we can also get the end-to-end delay by using the "ping" utility. The following is a description of the GEM5 output files we have used to gather the evaluation information.

• config.ini & config.json Contain a list of every SimObject created for the simulation and the values of its parameters.

• stats.txt A text representation of all of the gem5 statistics registered for the simulation, including the simulated time inside the system, recorded as sim_seconds, and the time elapsed on the real system, recorded as host_seconds.

• etherdump file A file that can be read by Wireshark, recording information about all the packets transmitted over the network.

• testsys.pc.com_1.terminal & testsys.realview.uart.terminal Record what has been running in the simulated operating system; the former is recorded for the X86-based CPU and the latter for the ARM-based CPU.
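As a sketch of how the stats.txt values are used, the following pulls sim_seconds and host_seconds out of a stats dump and computes the slowdown factor (real time divided by simulated time). The two stat names are gem5's; the sample values are fabricated:

```python
import io
import re

# Fabricated two-line excerpt in the stats.txt "name  value  # comment" layout.
SAMPLE_STATS = io.StringIO("""\
sim_seconds                                  0.707000 # Number of seconds simulated
host_seconds                                666.00    # Real time elapsed on the host
""")

def read_stat(text, name):
    """Return the numeric value of one gem5 statistic, or None if absent."""
    m = re.search(rf"^{name}\s+([0-9.]+)", text, re.MULTILINE)
    return float(m.group(1)) if m else None

text = SAMPLE_STATS.read()
sim = read_stat(text, "sim_seconds")
host = read_stat(text, "host_seconds")
print(f"slowdown = {host / sim:.0f}x")   # slowdown = 942x
```

The same ratio, applied to the real measurements, is how the 1500x slowdown figures in the following sections are obtained.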

We will give our evaluation results under three different scenarios.

6.1 Test Scenario 1

This scenario focuses on testing the extraction process of visual search on the client (ARM-based gem5) and the retrieval stage on the server (X86-based gem5). In this scenario, we only need the simulation of the processing subsystem (Chapter 5), without any synchronisation using HLA or network simulation using OMNeT++. The client application is configured to dump intermediate processing results (coordinates and attributes of the key points, feature vectors extracted from the key points) and to save the descriptors computed from the images to file. The dumped data can then be compared to a reference result on a real ARM system (ODROID) to assess the simulation accuracy. On the server side, we use the pre-processed descriptors in the server system and simulate the retrieval stage with different database sizes in two situations: one is retrieval with only the global descriptor, the other is retrieval with both global descriptor and local descriptor matching. Furthermore, the results are compared with the reference ones on a native X86 system.

6.1.1 Descriptor Extraction on the user side

Here we model the client as an ARM operating system. Table 6.1 gives information about the simulated ARM CPU and the reference native ARM CPU. Note that the simulated ARM CPU is an in-order AtomicSimple model without any caches, because it takes more than 10 hours to extract the descriptor on a gem5 ARM-based system with caches.

Table 6.1: Information of simulated ARM CPU and the reference real ARM CPU

Table 6.2: Execution time of Descriptor Extraction

• Quality of results The descriptor extracted in the GEM5 system is identical to the one extracted on the real ARM system (ODROID); the GEM5 simulation on the client side has 100% accuracy.

• Execution time Table 6.2 shows the time to extract the descriptor as recorded by the application and shown on the gem5 terminal, together with the sim_seconds and host_seconds recorded by gem5 (host_seconds is the real time elapsed on the host machine while running the simulation in gem5); the execution time on a real ARM system is also given. Figure 6.1 shows all the execution times except host_seconds, which is too large compared to the others. We can see that the time recorded by the application itself agrees with sim_seconds in the gem5 stats.txt; also, the real time shows a slow-down factor of 1500x compared to the simulated time and exceeds two hours.

Figure 6.1: Execution time of Descriptor Extraction

Descriptor extraction is time-consuming in the gem5 simulator because the query image must be divided into blocks with a block size of 128 B; the image used for testing is 222.9 kB, which gives around 1700 blocks. A Fast Fourier Transform is then applied to each block in order to filter it with a LoG filter in the frequency domain, and the Fast Fourier Transform is itself time-consuming.
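The block arithmetic above can be checked directly. This is a back-of-the-envelope sketch; the O(n log n) operation count is a rough FFT cost model, not a measured figure:

```python
import math

# With 128-byte blocks, the 222.9 kB test image splits into ~1700 blocks,
# each of which undergoes an FFT for frequency-domain LoG filtering.
IMAGE_BYTES = 222_900   # 222.9 kB test image
BLOCK_BYTES = 128

blocks = math.ceil(IMAGE_BYTES / BLOCK_BYTES)
print(blocks)                              # 1742 blocks, i.e. "around 1700"

# A radix-2 FFT over n samples costs on the order of n * log2(n) butterflies:
per_block_ops = BLOCK_BYTES * math.log2(BLOCK_BYTES)
print(f"~{blocks * per_block_ops:,.0f} FFT butterfly operations in total")
```

Even with this crude model, the per-block FFT cost multiplied by the block count makes clear why extraction dominates the simulated run time.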

6.1.2 Retrieval on server side

First, Table 6.3 gives information about the gem5-simulated X86-based CPU and the reference native X86 CPU.

Table 6.3: Information of simulated X86 CPU and native X86 CPU

Table 6.4: Execution time of Retrieval Stage

• Quality of results The global scores and local scores obtained in the GEM5 X86-based system are identical to the ones generated on the native X86 system with the same query image and the same database, and yield the same top matched images. In other words, the X86-based gem5 has 100% accuracy.

• Execution time Using both the local and global descriptors to identify an object, 707 ms of simulated time are required to get the top 25 matched images from a database with 100 images. The simulation takes around 11 minutes of real time, with a slowdown factor of 1500x. When only the much faster global score is used, it takes around 40 ms of simulated time (26 s of real time) to get the top 25 matched images, with a slow-down factor of 65.

Figure 6.2: Execution time of Retrieval Stage

Figure 6.3: Execution time of Retrieval Stage (host seconds)

The local score is more expensive because, besides comparing a single descriptor vector with all the images in the database, it also performs a point-by-point match between local descriptors in the query and candidate images, and an expensive geometric verification through the RANSAC algorithm.
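The cost asymmetry between the two scores can be sketched as a cascade. This is a toy model: the scoring functions and the RANSAC stand-in below are illustrative placeholders, not the CDVS implementation:

```python
import random

# Toy retrieval cascade: the global score is one comparison per database
# image; the local score does a point-by-point key-point match plus a
# (stubbed) RANSAC-style inlier filter, so it is run on a shortlist only.

def global_score(query, image):            # one distance computation per image
    return -abs(query - image["global"])

def local_score(query_kps, image):         # point-by-point match + RANSAC stub
    matches = [(p, q) for p in query_kps for q in image["kps"] if abs(p - q) < 0.1]
    inliers = [m for m in matches if random.random() > 0.5]  # stand-in for RANSAC
    return len(inliers)

def retrieve(query, query_kps, database, shortlist=25):
    ranked = sorted(database, key=lambda im: global_score(query, im), reverse=True)
    top = ranked[:shortlist]               # cheap filter over the whole database
    top.sort(key=lambda im: local_score(query_kps, im), reverse=True)
    return top                             # expensive verification on 25 images only
```

The expensive per-pair work is confined to the 25-image shortlist, which is exactly why the global-only run above is roughly an order of magnitude faster than the full pipeline.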

6.2 Test Scenario 2

Figure 6.4: Network topology for single client and single server

This scenario focuses on a client-server architecture and introduces the network. In this scenario we have a server node (X86 system) and a client node (ARM system) connected by a wired network (Ethernet); the network topology is shown in Figure 6.4. Since it takes more than two hours to extract the descriptor from the query image, in this scenario the extraction stage has been removed and the pre-computed descriptors are copied into the simulated client system and loaded directly from file.

Table 6.5: Network latency of the single client situation

• Quality of results Compared to the retrieval results on a real X86 processor, we obtain the same top 25 images and the same global and local descriptor matching scores. The packet transmission is 100% correct and the gem5 system still has 100% accuracy after integrating with OMNeT++.

• Network latency Figure 6.5 and Table 6.5 give the network latency for different RxPacket times; the network latency is measured by pinging between the two systems. The network latency increases as the RxPacket time increases.

• Execution time The following simulation times were measured with a 10 ms synchronisation time (global synchronisation) and a 10 ms RxPacket time.

We used four different descriptors of the same size to test the simulation time. It takes the server node 1.23 simulated seconds to accept a connection request from the client node and perform the matching process with both local and global descriptors, while on the client side the time between connecting to the server and receiving the results is 1.41 simulated seconds.

It takes the server 0.6 simulated seconds to accept the network client and perform retrieval with only global descriptors, while on the client side the time between connecting to the server and receiving the results is 0.78 simulated seconds. The above data are shown in Figure 6.6.

Figure 6.5: Network latency of the single client situation

The times mentioned above are taken from the etherdump file; we do not give the real time elapsed on the host, because the time recorded in the gem5 stats file is polluted when a network is used for packet transmission. There is considerable variability in the GEM5 real-time and simulated-time recordings, while the simulated times recorded by the application itself are the same. These differences can be seen in the table in Appendix C.
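Recovering elapsed time from the etherdump relies only on the per-packet timestamps of the classic pcap format, which Wireshark also reads. A minimal sketch, with two fabricated one-byte frames standing in for real traffic:

```python
import struct

# Classic pcap stores a (seconds, microseconds) timestamp in each 16-byte
# record header, so request/response time is just last_ts - first_ts.

def pcap_timestamps(data):
    """Yield per-packet timestamps (in seconds) from classic pcap bytes."""
    offset = 24                                    # skip the 24-byte global header
    while offset + 16 <= len(data):
        ts_sec, ts_usec, incl_len, _ = struct.unpack_from("<IIII", data, offset)
        yield ts_sec + ts_usec / 1e6
        offset += 16 + incl_len                    # record header + captured bytes

def make_sample():
    # Global header: magic, v2.4, zone 0, sigfigs 0, snaplen, linktype Ethernet.
    hdr = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)
    pkt = lambda s, us: struct.pack("<IIII", s, us, 1, 1) + b"\x00"
    return hdr + pkt(10, 0) + pkt(11, 410_000)     # fabricated 1-byte frames

ts = list(pcap_timestamps(make_sample()))
print(f"elapsed: {ts[-1] - ts[0]:.2f} s")          # elapsed: 1.41 s
```

Applied to the real dump, the difference between the first request packet and the last response packet gives the per-transaction times reported above.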

Furthermore, the time needed to boot this CPS simulator (X86 and ARM with the OMNeT++ network integrated by CERTI HLA) is about 9 minutes; the ARM-based GEM5 system boots 1 or 2 minutes faster than the X86-based one.

Table 6.6: Execution time of client and server

Figure 6.6: Execution time of client and server

6.3 Test Scenario 3

As Figure 6.7 shows, an X86 server (node0) is connected through switch1 with the ARM node2; both belong to the same Class-C subnet defined by their configuration files, and they share the same gateway (router) at the address 10.0.0.1. We have a similar situation on the other side of the network (node1, node3): those nodes also share a Class-C subnet, using switch2, with base IP 10.0.1.1, which is their gateway address to the rest of the network. All the nodes can ping each other either through their local switch (L2 routing) or through the router that passes the packets to the other side of the network using L3 routing. A single server with multiple clients is used in this scenario.

Figure 6.7: Network topology for multiple clients with a server
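The subnet layout can be expressed with Python's standard ipaddress module. Only the two /24 networks and the gateways 10.0.0.1 and 10.0.1.1 come from the text; the host addresses below are illustrative:

```python
import ipaddress

# Scenario-3 addressing sketch: node0/node2 behind switch1 in 10.0.0.0/24,
# node1/node3 behind switch2 in 10.0.1.0/24. Host numbers are made up.
nodes = {
    "node0": ipaddress.ip_interface("10.0.0.2/24"),   # X86 server
    "node2": ipaddress.ip_interface("10.0.0.3/24"),   # ARM client, same side
    "node1": ipaddress.ip_interface("10.0.1.2/24"),   # ARM client, other side
    "node3": ipaddress.ip_interface("10.0.1.3/24"),   # ARM client, other side
}
GATEWAYS = {"10.0.0.0/24": "10.0.0.1", "10.0.1.0/24": "10.0.1.1"}

def routing(src, dst):
    """L2 via the local switch inside a subnet, L3 via the router otherwise."""
    return "L2" if nodes[src].network == nodes[dst].network else "L3"

print(routing("node0", "node2"))   # L2: same /24, handled by switch1
print(routing("node0", "node1"))   # L3: crosses the router between subnets
```

This same/different-subnet split is what produces the latency asymmetry measured below: the client sharing the server's subnet skips the router hop.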

• Quality of results Compared to a native run on the X86 processor, all the clients get 100% correct results. In other words, the parallel implementation of the server used to manage multiple simultaneous connections works correctly in the CPS simulator, and the pthread-based concurrency in the Visual Search application is correct.

• Network latency As Table 6.7 and Figure 6.8 illustrate, the client in the same subnet as the server has the smallest network latency; the other two clients, in a different subnet from the server, have almost the same network latency.

Table 6.7: Network latency of the three clients situation.

• Execution time Figure 6.9 gives the time for the clients to get the results from the server after sending the request; both local and global descriptors are used for retrieval. Client 1 is node2 in the network topology figure, the node in the same subnet as the server; the other two are node1 and node3, in a different subnet from the server.

Figure 6.8: Network latency of the three clients situation.

Table 6.8: Execution time for the clients to get the results.

Figure 6.9: Execution time for the clients to get the results.

Chapter 7

Conclusion and Recommendation

7.1 Conclusion

It can be observed that the GEM5 and OMNeT++ based CPS simulator has 100% accuracy in the simulation of the X86 and ARM platforms, and that this CPS simulator performs well when handling concurrent processes. With respect to the execution time, descriptor extraction is a major issue in the CPS simulator, especially when we want to simulate a multi-core X86 server and an ARM with caches, which are important for power estimation. With respect to the network latency, we can control it by setting the RxPacket time, a parameter that also influences the local synchronisation. Unfortunately, wireless networks are not yet supported.

7.2 Recommendation

For visual search, we could consider a more complex scenario in which the client first performs visual search against a small database offered by the server; if it succeeds, there is no need to send the descriptor to the server, otherwise the descriptor is sent to the server to complete the visual search. Furthermore, the client could first perform retrieval with only a few global descriptors; after ranking them, if the global score of the first match is much higher than the others, there is no need for the homography check with local descriptors, since local descriptor matching is time-consuming. Otherwise, the client asks for the local descriptors of the images related to the top-ranked global descriptors.
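The proposed early exit could be sketched as follows; the margin threshold is an assumed tuning parameter, not something evaluated in this work:

```python
# Cascade recommendation: run the cheap global ranking first and skip the
# expensive local/homography check when the best global score clearly
# dominates the runner-up. The 1.5x margin is an assumed knob.

def needs_local_check(global_scores, margin=1.5):
    """True when the top global score is not decisive enough on its own."""
    ranked = sorted(global_scores, reverse=True)
    return len(ranked) > 1 and ranked[0] < margin * ranked[1]

# With the Appendix A scores the best match (416.9) dwarfs the runner-up
# (52.5), so the homography check could be skipped for that query:
print(needs_local_check([416.926, 52.4838, 50.9705]))   # False
```

On queries like the Appendix A example the cascade would answer from the global stage alone, reserving local matching and RANSAC for the ambiguous cases.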

For the CPS simulator processing subsystem, it is important to accelerate the gem5 processing speed and to implement a power measurement. For the CPS simulator network subsystem, wireless network simulation is necessary, since Mobile Visual Search is based on wireless networks.

Bibliography

[1] Habib M Ammari. The Art of Wireless Sensor Networks. Springer, 2014.

[2] ATEMU. "http://www.hynet.umd.edu/research/atemu/".

[3] AvroraZ. "http://rijndael.ece.vt.edu/gezel2/".

[4] AVRORA. "http://compilers.cs.ucla.edu/avrora/".

[5] Jeffrey R Bach, Charles Fuller, Amarnath Gupta, Arun Hampapur, Bradley Horowitz, Rich Humphrey, Ramesh C Jain, and Chiao-Fe Shu. Virage image search engine: an open framework for image management. In Electronic Imaging: Science & Technology, pages 76–87. International Society for Optics and Photonics, 1996.

[6] Dana H Ballard and Christopher M Brown. Computer Vision. Prentice-Hall, 1982.

[7] Chad Carson, Serge Belongie, Hayit Greenspan, and Jitendra Malik. Blobworld: Image segmentation using expectation-maximization and its application to image querying. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(8):1026–1038, 2002.

[8] Pierre Chapuis. “For image recognition as a service, what are the advantages and disadvantages of kooaba, IQ Engines and Moodstocks?”. Quora.com. http://www.quora.com/For-image-recognition-as-a-service-what-are-the-advantages-and-disadvantages-of-kooaba-IQ-Engines-nd-Moodstocks, May 2011.

[9] Pierre Chapuis. “What is the technology stack behind Google Goggles?”. Quora.com. https://www.quora.com/Google-Goggles/What-is-the-technology-stack-behind-Google-Goggles, February 2012.

[10] Thomas Deselaers, Daniel Keysers, and Hermann Ney. Features for image retrieval: an experimental comparison. Information Retrieval, 11(2):77–107, 2008.

[11] Peiyuan Dong, Yue Han, Xiaobo Guo, and Feng Xie. A systematic review of studies on cyber physical system security. Int. J. Secur. Appl, 9(1):155–164, 2015.

[12] Nathan L Binkert, Ronald G Dreslinski, Lisa R Hsu, Kevin T Lim, Ali G Saidi, and Steven K Reinhardt. The m5 simulator: Modeling networked systems. Ann Arbor, 1001:48109–2121, 2008.

[13] Ling-Yu Duan, Feng Gao, Jie Chen, Jie Lin, and Tiejun Huang. Compact descriptors for mobile visual search and mpeg cdvs standardization. In 2013 IEEE International Symposium on Circuits and Systems (ISCAS2013), pages 885–888. IEEE, 2013.

[14] Jack Eisenhauer, Paget Donnelly, Mark Ellis, and Michael O'Brien. Roadmap to secure control systems in the energy sector. Energetics Incorporated. Sponsored by the US Department of Energy and the US Department of Homeland Security, 2006.

[15] Christos Faloutsos, Ron Barber, Myron Flickner, Jim Hafner, Wayne Niblack, Dragutin Petkovic, and William Equitz. Efficient and effective querying by image content. Journal of intelligent information systems, 3(3-4):231–262, 1994.

[16] IEEE 1516-2010. Standard for Modeling and Simulation High Level Architecture: Framework and Rules.

[17] Gianluca Francini, Skjalg Lepsøy, and Massimo Balestri. Selection of local features for visual search. Signal Processing: Image Communication, 28(4):311–322, 2013.

[18] gem5. "http://gem5.org/".

[19] Helen Gill. Cyber-physical systems: Beyond es, sns, and scada. In Presentation in the Trusted Computing in Embedded Systems (TCES) Workshop, 2010.

[20] JL Gómez-Barroso, R Compañó, C Feijóo, M Bacigalupo, O Westlund, S Ramos, et al. Prospects of mobile search. EUR 24148 EN. Seville: Institute for Prospective Technological Studies, European Commission, 2010.

[21] Google Goggles. http://www.google.com/mobile/goggles/#text, 2011.

[22] Akram Hakiri, Pascal Berthou, and Thierry Gayraud. Addressing the challenge of distributed interactive simulation with data distribution service. arXiv preprint arXiv:1008.3759, 2010.

[23] Richard Hartley and Andrew Zisserman. Multiple view geometry in computer vision. Cambridge university press, 2003.

[24] HASE. "http://www.icsa.inf.ed.ac.uk/research/groups/hase/ ".

[25] Shushu Inbar. “kooaba, image recognition”. kooaba.com. http://www. kooaba.com/, 2011.

[26] Qasim Iqbal and Jake K Aggarwal. Cires: A system for content-based retrieval in digital image libraries. In Control, Automation, Robotics and Vision, 2002. ICARCV 2002. 7th International Conference on, volume 1, pages 205–210. IEEE, 2002.

[27] J-Sim. "http://www.physiome.org/jsim/".

[28] Sabina Jeschke. Cyber-physical systems-history, presence and future. Industrial Advisory Board, Aachen, Germany, 2013.

[29] Svein Johannessen. Time synchronization in a local area network. IEEE control systems, 24(2):61–69, 2004.

[30] Stamatis Karnouskos. Cyber-physical systems in the smartgrid. In 2011 9th IEEE International Conference on Industrial Informatics, pages 20–23. IEEE, 2011.

[31] Ji Eun Kim and Daniel Mosse. Generic framework for design, modeling and simulation of cyber physical systems. ACM SIGBED Review, 5(1):1, 2008.

[32] Edward A Lee. The problem with threads. Computer, 39(5):33–42, 2006.

[33] Michael S Lew. Next-generation web searches for visual content. Computer, 33(11):46–53, 2000.

[34] Michael S Lew, Nicu Sebe, Chabane Djeraba, and Ramesh Jain. Content-based multimedia information retrieval: State of the art and

challenges. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 2(1):1–19, 2006.

[35] Xu Liu, Jonathan J Hull, Jamey Graham, Jorge Moraleda, and Timothee Bailloeul. Mobile visual search, linking printed documents to digital media. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2010.

[36] David G Lowe. Distinctive image features from scale-invariant keypoints. International journal of computer vision, 60(2):91–110, 2004.

[37] Wei-Ying Ma and Bangalore S Manjunath. Netra: A toolbox for navigating large image databases. Multimedia systems, 7(3):184–198, 1999.

[38] Milo MK Martin, Daniel J Sorin, Bradford M Beckmann, Michael R Marty, Min Xu, Alaa R Alameldeen, Kevin E Moore, Mark D Hill, and David A Wood. Multifacet's general execution-driven multiprocessor simulator (gems) toolset. ACM SIGARCH Computer Architecture News, 33(4):92–99, 2005.

[39] Liam M Mayron. Image retrieval using visual attention. Personality and Individual Differences, 40(5):873–884, 2006.

[40] Mixim. "http://mixim.sourceforge.net".

[41] Moodstocks. http://www.moodstocks.com/how-it-works/, September 2012.

[42] NetSim. "http://www.boson.com/netsim-cisco-network-simulator".

[43] Spiros Nikolopoulos, Stavri G Nikolov, and Ioannis Kompatsiaris. Study on mobile image search. European Communities, 2010.

[44] Nokia. How does point and find work? http://betalabs.nokia.com/trials/nokia-point-and-find/discussion/how-does-point-and-find-work, May 2012.

[45] NS-2. "http://www.isi.edu/nsnam/ns/".

[46] NS-3. "https://www.nsnam.org/".

[47] National Institute of Standards and Technology. Cyber-Physical Systems: Situation Analysis of Current Trends, Technologies and Challenges. 2012.

[48] OMNet++.

[49] oMoby. https://www.iqengines.com/omoby/, September 2012.

[50] OVP. "http://www.ovpworld.org/".

[51] Alex Pentland, Rosalind W Picard, and Stan Sclaroff. Photobook: Content-based manipulation of image databases. International journal of computer vision, 18(3):233–254, 1996.

[52] Florent Perronnin and Christopher Dance. Fisher kernels on visual vocabularies for image categorization. In 2007 IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8. IEEE, 2007.

[53] Ragunathan Raj Rajkumar, Insup Lee, Lui Sha, and John Stankovic. Cyber-physical systems: the next computing revolution. In Proceedings of the 47th Design Automation Conference, pages 731–736. ACM, 2010.

[54] Daniel Sanchez and Christos Kozyrakis. Zsim: fast and accurate microarchitectural simulation of thousand-core systems. ACM SIGARCH Computer Architecture News, 41(3):475–486, 2013.

[55] SESC. "http://iacoma.cs.uiuc.edu/~paulsack/sescdoc/".

[56] Sven Siggelkow, Marc Schael, and Hans Burkhardt. Simba – search images by appearance. In Joint Pattern Recognition Symposium, pages 9–16. Springer, 2001.

[57] Simics. http://www.windriver.com/products/simics/.

[58] Simplescalar. "http://www.simplescalar.com/".

[59] Martin Stehlík. Comparison of simulators for wireless sensor networks. PhD thesis, Masaryk University, 2011.

[60] Harsh Sundani, Haoyue Li, Vijay Devabhaktuni, Mansoor Alam, and Prabir Bhattacharya. Wireless sensor network simulators: a survey and comparisons. International Journal of Computer Networks, 2(5):249–265, 2011.

[61] Herb Sutter and James Larus. Software and the concurrency revolution. Queue, 3(7):54–62, 2005.

[62] TOSSIM. "http://tinyos.stanford.edu/tinyos-wiki/index.php/TOSSIM".

[63] Sam S Tsai, David Chen, Gabriel Takacs, Vijay Chandrasekhar, Ramakrishna Vedantham, Radek Grzeszczuk, and Bernd Girod. Fast geometric re-ranking for image-based retrieval. In 2010 IEEE International Conference on Image Processing, pages 1029–1032. IEEE, 2010.

[64] Jan Van Campenhout, Peter Verplaetse, and Henk Neefs. Escape: Environment for the simulation of computer architectures for the purpose of education. In Proceedings of the 1998 workshop on Computer architecture education, page 9. ACM, 1998.

[65] Yunbo Wang, Mehmet C Vuran, and Steve Goddard. Cyber-physical systems in industrial process control. ACM Sigbed Review, 5(1):12, 2008.

[66] CERTI HLA website.

[67] The Moving Picture Experts Group website. “Compact descriptors for visual search”. http://mpeg.chiariglione.org/standards/mpeg-7/compact-descriptors-visual-search.

[68] Wikipedia. Cpu sim — wikipedia, the free encyclopedia. "https://en.wikipedia.org/w/index.php?title=CPU_Sim&oldid=737578164", 2016. [Online; accessed 3-September-2016].

[69] Wikipedia. Distributed interactive simulation — wikipedia, the free encyclopedia. https://en.wikipedia.org/w/index.php?title=Distributed_Interactive_Simulation&oldid=710617075, 2016. [Online; accessed 18-March-2016].

[70] Wikipedia. High-level architecture — wikipedia, the free encyclopedia. https://en.wikipedia.org/w/index.php?title=High-level_architecture&oldid=718446651, 2016. [Online; accessed 3-May-2016].

[71] Wikipedia. Like.com — wikipedia, the free encyclopedia, 2016. [Online; accessed 9-February-2016].

[72] Wikipedia. Mikrosim — wikipedia, the free encyclopedia, 2016. [Online; accessed 20-July-2016].

[73] WorldSens. "http://wsim.gforge.inria.fr/tutorials/wasp/files/wsim-tutorial.pdf".

[74] Changchang Wu. Siftgpu: A gpu implementation of scale invariant feature transform (sift). 2007.


[75] Fei Yu and Raj Jain. A survey of wireless sensor network simulation tools. Washington University in St. Louis, Department of Science and Engineering, 2011.

[76] Nickolai Zeldovich, Alexander Yip, Frank Dabek, Robert Morris, David Mazieres, and M Frans Kaashoek. Multiprocessor support for event-driven programs. In USENIX Annual Technical Conference, General Track, pages 239–252, 2003.

[77] Na Zhao, Min Chen, Shu-Ching Chen, and Mei-Ling Shyu. In 11th IEEE Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing, ISORC 2008, 2008.

Appendix A

Image Retrieval results with only global descriptors

Retrieval with global descriptors

Index  local score  global score  image in database
46     0            416,926       ../cds/smvs_cd_covers_001.jpg
48     0            52,4838       ../cds/smvs_cd_covers_022.jpg
19     0            50,9705       ../cds/smvs_cd_covers_037.jpg
58     0            49,0349       ../cds/smvs_cd_covers_021.jpg
54     0            46,3411       ../cds/smvs_cd_covers_027.jpg
92     0            44,941        ../cds/smvs_cd_covers_016.jpg
63     0            43,8577       ../cds/smvs_cd_covers_006.jpg
98     0            43,1942       ../cds/smvs_cd_covers_035.jpg
18     0            42,8789       ../cds/smvs_cd_covers_050.jpg
16     0            42,4278       ../cds/smvs_cd_covers_053.jpg
99     0            41,4549       ../cds/smvs_cd_covers_024.jpg
64     0            40,7722       ../cds/smvs_cd_covers_093.jpg
11     0            40,664        ../cds/smvs_cd_covers_076.jpg
86     0            40,4235       ../cds/smvs_cd_covers_041.jpg
59     0            39,7443       ../cds/smvs_cd_covers_059.jpg
23     0            39,6823       ../cds/smvs_cd_covers_009.jpg
31     0            39,6542       ../cds/smvs_cd_covers_031.jpg
29     0            39,6197       ../cds/smvs_cd_covers_048.jpg
44     0            39,5582       ../cds/smvs_cd_covers_088.jpg
13     0            39,1097       ../cds/smvs_cd_covers_052.jpg
91     0            39,1063       ../cds/smvs_cd_covers_081.jpg
12     0            38,642        ../cds/smvs_cd_covers_067.jpg
81     0            38,1758       ../cds/smvs_cd_covers_056.jpg
84     0            37,7245       ../cds/smvs_cd_covers_044.jpg
28     0            37,4837       ../cds/smvs_cd_covers_073.jpg

Appendix B

Image Retrieval results with global and local descriptors


Retrieval with both global and local descriptors

Index  local score  global score  image in database
46     89,1758      416,926       ../cds/smvs_cd_covers_001.jpg
92     1,72339      44,941        ../cds/smvs_cd_covers_016.jpg
48     1,1655       52,4838       ../cds/smvs_cd_covers_022.jpg
19     0,449217     50,9705       ../cds/smvs_cd_covers_037.jpg
58     0,762318     49,0349       ../cds/smvs_cd_covers_021.jpg
54     0,789925     46,3411       ../cds/smvs_cd_covers_027.jpg
63     0,768383     43,8577       ../cds/smvs_cd_covers_006.jpg
98     0,435087     43,1942       ../cds/smvs_cd_covers_035.jpg
18     1,38318      42,8789       ../cds/smvs_cd_covers_050.jpg
16     0,389011     42,4278       ../cds/smvs_cd_covers_053.jpg
99     0,428067     41,4549       ../cds/smvs_cd_covers_024.jpg
64     0,264507     40,7722       ../cds/smvs_cd_covers_093.jpg
11     0,791078     40,664        ../cds/smvs_cd_covers_076.jpg
86     0,703718     40,4235       ../cds/smvs_cd_covers_041.jpg
59     0,630267     39,7443       ../cds/smvs_cd_covers_059.jpg
23     1,27729      39,6823       ../cds/smvs_cd_covers_009.jpg
31     0,654689     39,6542       ../cds/smvs_cd_covers_031.jpg
29     0            39,6197       ../cds/smvs_cd_covers_048.jpg
44     0,960436     39,5582       ../cds/smvs_cd_covers_088.jpg
13     0,440211     39,1097       ../cds/smvs_cd_covers_052.jpg
91     0,976879     39,1063       ../cds/smvs_cd_covers_081.jpg
12     0,897965     38,642        ../cds/smvs_cd_covers_067.jpg
81     1,34167      38,1758       ../cds/smvs_cd_covers_056.jpg
84     0,261721     37,7245       ../cds/smvs_cd_covers_044.jpg
28     0,477224     37,4837       ../cds/smvs_cd_covers_073.jpg

Appendix C

Execution time of the simulation in Test Scenario 2


Execution time of Retrieval with only global descriptors

                                       desc1      desc2       desc3      desc4
Recorded by server (ms)                18,9323    18,94       18,9452    18,9648
server_time_in_stats_simseconds (s)    0,986454   0,86668     0,876468   1,066525
server_time_in_stats_hostseconds (s)   273,24     270,24      300,83     268,4
Recorded by client (ms)                88,694     88,6947     88,7372    88,7
client_time_in_stats_simseconds (s)    0,649673   46,060546   0,656819   157,340547
client_time_in_stats_hostseconds (s)   270,41     1314,39     298,39     2075,92

Execution time of Retrieval with global and local descriptors

                                       desc1      desc2       desc3      desc4
Recorded by server (ms)                640,153    640,059     640,032    642,754
server_time_in_stats_simseconds (s)    2,237363   1,697127    1,457293   1,459743
server_time_in_stats_hostseconds (s)   898,99     811,66      869,88     880,87
Recorded by client (ms)                88,6943    88,6948     88,693     88,6928
client_time_in_stats_simseconds (s)    1,279673   89,500546   1,279674   31,720546
client_time_in_stats_hostseconds (s)   886,16     1954,27     860,57     1667,93
