Catalogue of Technologies

2018

Contents

ISP RAS as an Ecosystem of Innovations
Static Analyzer Svace
BinSide. Static Binary Code Analysis Tool
Dynamic Analyzer Anxiety
Binary Code Analysis Platform Based on QEMU Emulator
ISP Obfuscator
Protosphere Network Traffic Analyzer
Klever: The Technology of Static Verification of GNU C Programs
MicroTESK. Test Programs Generator
Retrascope: HDL-Descriptions Reverse Engineering Tool
AstraVer Toolset. Deductive Verification of Linux Kernel Modules and Security Policy Models
MASIW. Development Tools for Integrated Modular Avionics
Constructivity 4D: The Technology of Indexing, Searching and Analysis of Large Spatio-Temporal Data
Texterra Basic Semantic Analyzer
Talisman Social Media Analysis Technology
Lingvodoc: Virtual Laboratory for Documentation of Endangered Languages
Scinoon: Exploratory Search System
Complex of Solutions

Arutyun Avetisyan, Doctor of Science, Corresponding Member of the Russian Academy of Sciences, Director of ISP RAS.

Ivannikov Institute for System Programming of the Russian Academy of Sciences is the leading competency center in the field of system programming in Russia. ISP RAS experts create high-end technologies that allow the Institute to compete with the R&D centers of international IT corporations and the world’s top scientific research organizations in diverse areas of system programming — source code analysis, verification, data analysis, operating systems, etc.

The Institute’s success is based on an ecosystem that supports the complete development chain, from the generation of ideas and basic research, through technologies and products ready for transfer to customers, to actual deployment, and that allows the training of highly qualified IT experts. Thanks to its founder, Victor Ivannikov, ISP RAS has preserved a scientific school that began to form during the Soviet era and has adapted it to modern realities. Today this school demonstrates its viability and effectiveness under conditions of global competition and high mobility of scientists and ideas.

The key mechanism for retaining advanced positions is focusing on science-intensive innovations which are based on long-term research projects and sustained partnership. Technologies are developed in close integration with the industry, transferred to and used in leading Russian and international companies. Among the long-term partners of the Institute are Samsung, , EMC, HPE, Intel, Nvidia, Rogue Wave, Foundation. Many of them have created joint laboratories with the Institute. ISP RAS performs joint projects with leading research and university centers, such as Cambridge (UK), Carnegie Mellon (USA), INRIA (France), University of Passau (Germany) and others.

The Institute provides its own postgraduate program and hosts the departments of system programming of leading Russian universities: Moscow State University, Moscow Institute for Physics and Technology, and Higher School of Economics. From the very beginning of their educational programs, students and postgraduates are involved in real-life projects. By the time of graduation many students have scientific publications and have become skilled engineers in the field of system programming.

The catalogue opens with the description of the ISP RAS business model, which details the structure of the Institute’s ecosystem and reveals our approaches to education. The main part of the catalogue provides the description of the ISP RAS technologies that are already in use by industry or ready for deployment. Due to the close cooperation with industrial partners and the specifics of commercial projects, the Institute’s products have for a long time been known only to professional circles. The described technologies were demonstrated to the wider public for the first time at the 1-st Research and Development Conference of ISP RAS at the end of 2016.

ISP RAS as an Ecosystem of Innovations

The research activity of ISP RAS is aimed at transferring the results of basic research to industry or to other spheres of use. It means that all the Institute’s activities focus on ensuring that the technologies, software products, methods for solving problems of system programming created at the Institute meet modern requirements and are maximally ready for adoption by the industry.

The Institute’s business model consists of three closely related activities, which together provide a synergistic effect:
• project-oriented basic and applied research in the field of system programming (under contracts with Russian and international companies, the governmental programs, scientific foundations), aimed primarily at creating new technologies;
• innovations - projects to transfer the results of advanced research to the industrial partner companies. An innovative product is impossible without the feedback from the industry;
• education - training students and postgraduates on the basis of modern technologies developed and used in ISP RAS works, involving students in research and industrial projects of the Institute.

The model is well known and is used in research labs of top-ranked universities (Stanford, MIT, Berkeley, Carnegie Mellon), in laboratories of industrial giants (for example, IBM and Intel), and in public research centers such as INRIA (France), Fraunhofer (Germany), and others. When implemented efficiently, this model helps to bridge the gap between science and industry and to train highly qualified researchers and engineers capable of creating and implementing new technologies.

Basic Research

The Institute’s model presents all the elements of the chain from generation of ideas and basic research to technologies and products ready for deployment. At the same time, basic research, applied research and prototyping are necessary elements of an activity that moves in line with the newest technologies. Basic research is also the source of ideas for new joint projects with partners and customers. All the Institute’s innovations are research-based.

Research cooperation

ISP RAS conducts a large number of scientific and educational programs in close cooperation with leading research and university centers in Russia and abroad, which provides a high level of research results. Joint laboratories are one of the forms used at ISP RAS for organizing long-term cooperation. Provided with sustainable funding, they allow flexible planning of the available resources as well as building competencies in emerging areas of system programming. They also help organize the training of young specialists with competencies in areas that are of interest to the partners.

Currently, there are a number of joint laboratories at the Institute with international companies: the lab with Samsung (South Korea) is aimed at compiler technologies in the context of mobile platforms (ARM, Android, Tizen); the lab with Rogue Wave (USA) is for technologies of static code analysis and security vulnerabilities detection; Dell (USA) - Big data analysis and processing; Synchro (UK) - comprehensive planning and nD-modeling of large-scale projects; NVIDIA - CUDA Research Center with parallel computing technologies for modern and promising heterogeneous systems of ultra-high performance. Joint projects are being conducted with leading research and university centers from EU and USA.

A laboratory for solving continuum mechanics problems, based on the technological service-oriented cloud platform FANLIGHT, has been created and evolves rapidly. System and applied software engineers work side-by-side in the laboratory to efficiently implement research projects for the benefit of industrial enterprises. Such a laboratory ensures the integration of science, education and industry at the modern technological level.

Intellectual property

Building on its own technologies and the existing groundwork in basic research, ISP RAS has created a model that allows it either to retain all the intellectual property rights or to transfer them, within the framework of special agreements (for example, with the Free Software Foundation), to the community of free software developers. Taking into account the specifics of the Institute’s business model, an original license has been introduced. Rather than obtaining royalties, its goal is direct investment by the customer into further research aimed at the development of the technology.

Non-exclusive user rights are given to the customer, whereas all exclusive rights are reserved by the Institute. In specific situations, the decision on intellectual property rights management is taken individually, with regard to the prospects for the long-term development of the research direction and the staff of the Institute as a whole. An example of such an exception is a contract with the Foundation for Advanced Studies (FPI), under which all rights must be transferred to the customer (FPI); in addition, the customer will be given the non-exclusive patent rights that belong to ISP RAS and are supposed to be used in this project.

Free software

One of the most important components of the created ecosystem is the wide use of free software (FS): modern system programming can hardly be imagined without it. The Institute considers FS as:
• a tool that provides legitimate free access to a wide variety of modern technologies, including ready-to-use software products, technologies and open standards;
• an opportunity to interact with the global market of products and services, which allows innovative development instead of outsourcing;
• a powerful educational resource: the environment and infrastructure of international FS-projects can be used to train highly qualified experts.

Scientific activity implies openness of a research result and "visibility" of the author of this result, which often comes into conflict with the corporate policy of IT companies. For ISP RAS, openness of research results (in particular, active use of the open source code model) is both a working incentive and an instrument for promoting the Institute and transferring the technologies being developed. Thanks to this openness, young researchers, even when working in a large team, are "visible" in the international community of IT experts. Their contributions bolster their reputation and assets, which are further enhanced by the Institute.

Education

Educational activities are the cornerstone of the ISP RAS ecosystem of innovation.

Integration with leading universities: ISP RAS departments at Moscow State University, Moscow Institute of Physics and Technology, and Higher School of Economics. Every year 50-60 third-year students come to the Institute to work on their B.Sc. theses. In the first year of training at ISP RAS, they listen to lectures of the experts, attend special seminars and get acquainted with the research directions. In the second year, students already participate in real projects on the Institute’s research topics. By the time of graduation, many of the students have become real specialists in their fields of system programming and already have scientific publications.

Postgraduate training program

Studying in the ISP RAS graduate school is both an accumulation of practical experience and a study of new technologies. Moreover, graduate students are actively involved in the teaching process. They conduct seminars and practical classes with students and mentor their course and diploma papers. Having accumulated such experience, a graduate of the degree program heads a small research group, as a rule.

Over the decade the number of employees involved in educational activities has more than doubled. More than a dozen new courses have been developed, while the existing ones have been significantly revised. Educational activities are prioritized, so in order to boost the effectiveness of courses, the Institute follows the strategy of involving in the training process approximately the same number of official teaching staff and additional employees.

Starting from the first year of training at ISP RAS, students receive a scholarship; from the second year, they receive a salary that is now comparable to salaries in high-tech IT companies. Involvement of students and postgraduates in real research projects is a very serious motivation that attracts young professionals to the ecosystem created by the Institute. Research treats a scientific problem or a task to be solved as a real challenge. It requires not only possession of the most advanced technologies, but also solving such problems beyond the boundaries of known methods.

ISP RAS has created two external laboratories: one is based at the Yerevan State University (Armenia) and the other is at Yaroslav-the-Wise Novgorod State University (Russia). The staff of the external labs participates in industrial and research projects with ISP RAS, forming a distributed modern development and research environment, working on joint publications and conference talks, and delivering courses at their universities. Young specialists of the laboratories do a full-time internship at ISP RAS, including semester-long training for the preparation of their final theses.

ISP RAS in figures

With total funding growing from 105 million rubles in 2005 to 674 million rubles in 2016, the share of contract work has consistently been about 70-80% of the total. Within the total volume of contract work, the share of work with Russian organizations has increased from less than 3% in 2005 to a half in 2016. At the same time, the average salary of researchers grew from 26,000 rubles in 2005 to 180,000 rubles in 2016.

The number of staff researchers, including 10 Dr.Sc. and 34 PhD holders, more than doubled from 2005 to 2016. The share of young scientists was just over 50% in 2005, while at the end of 2016 researchers under the age of 40 accounted for about 80% of the total number.

Employees of the Institute took part in more than 300 leading Russian and foreign conferences. Over a thousand scientific articles and ten monographs have been published.

Junior staff members regularly receive scholarships from the Government and the President of the Russian Federation. Also, two young researchers have been awarded medals of the RAS and prizes. Two staff members were awarded State Prizes of the Russian Federation.

ISP RAS publishes a journal named “Programming and Computer Software” that is included in the international science citation indexes Scopus and Web of Science (Core Collection).

Quick facts

The ecosystem created in ISP RAS is based on advanced approaches to the organization of research and development, including the widespread use of open source software and management of intellectual property. It is highly adaptable and dynamic, which enables an adequate response to emerging technological and organizational challenges. It greatly expands the horizons of planning and allows building relationships with customers and partners on a long-term basis, significantly reducing the risks for both parties. This is important, since collaboration with organizations of the real sector of the economy is the key component of the Institute’s ecosystem.

It is impossible to achieve and sustainably maintain a high technological level of research and development without such cooperation.

Since its establishment the Institute has accumulated considerable experience and has obtained world-class results in a number of advanced areas of system programming. The Institute has created its own technologies and instruments, implemented dozens of industrial projects, obtained a number of patents in the ICT domain, and trained hundreds of students and postgraduates; staff members of the Institute have presented more than 50 Dr.Sc. and PhD dissertations and brought out hundreds of publications.

The work of ISP RAS as an effective scientific and production center in the field of system programming has gained worldwide recognition. In particular, it is expressed through the inclusion of ISP RAS scientists in the program committees of many international conferences, their invitations as keynote speakers, and participation in expert groups of international consortia. ISP RAS has created two specialized competence centers, the Linux OS Verification Center and the Center for Parallel and Distributed Computing, which have been successfully integrated into the Russian and world community and into all the activities of the ISP RAS model.

Technologies

STATIC ANALYZER SVACE

Smart detection of errors and vulnerabilities

Svace is an essential tool of the secure software development life cycle, the main static analyzer that is used in Samsung Corp. It detects more than 50 critical error types. Svace supports C, C++, C#, and Java. Svace is registered in the National Software Unified Register, which is kept by the Ministry of Digital Development.

Why Svace?

1 Maximum flexibility and adaptability for Russian customers

Our team is based in Moscow and provides fast, customer-tailored deployment and short support turnaround. Svace has no Russian competitors and offers the maximum level of convenience and efficiency for Russian users:

— Accelerated customization (configuring existing detectors as well as writing individual ones available exclusively to this customer; creating specific user interfaces);
— Ultra fast adaptation to new environments and tools (adding new compilers within 1-2 weeks, in complex cases up to 2 months);
— Continuous training of the customer’s developers, regular interaction with the customer, and technical product improvements that address the customer’s new tasks during the whole secure software development lifecycle;
— Flexible licensing terms tailored to the customer’s needs (in particular, the possibility to obtain the product source code);
— Full compatibility with regulatory documents and requirements of regulators (FSTEC of the Russian Federation).

2 Quality at the level of international competitors

Svace is a constantly evolving innovative product based on years of research. It combines the key qualities of foreign competitors (Coverity Static Analysis, HP Fortify, RogueWave Klocwork Static Analysis) with the distinctive use of open industrial compilers, which provides the best possible support for new programming language standards.

Svace is defined by:

— High-quality deep analysis:
– full path coverage taking into account function calling contexts for searching complex errors;
– an accurate representation of the source code (due to integration with any build system);
– high percentage of true positives (60-90%).
— Scalability and high speed:
– parallel analysis using up to 64 processor cores;
– ability to analyze software with a code size of tens of millions of lines (analysis of Android 6 consisting of 8 million lines takes 5-6 hours);
– support for incremental analysis in addition to the full analysis mode (implies a quick re-analysis of the recently modified source code).
— Convenient warnings viewing interface:
– detailed error description with code navigation;
– review interface for marking true and false positives;
– migration between analysis runs with hiding any issues previously marked as false positives.

What is Svace target audience?

— Companies aimed at software development with a special focus on high reliability and security;
— Companies that need certification of developed software;
— Certification laboratories.

Supported platforms and architectures

— Host platforms for the analyzer: Linux kernel based OS (version 2.6 and later), Windows XP and later.
— Target architectures of the analyzed code: Intel x86/x86-64, ARM, ARM64, MIPS, MIPS64, Power PC, Hexagon.

Svace and Samsung

Svace has been the main static analyzer used in Samsung Corp. since 2015. It is used to check the company’s own software based on Android OS as well as the source code of Tizen OS. Tizen is used in smartphones, infotainment systems and home appliances. Since 2017, Svace has checked all changes submitted for review and inclusion in the Tizen OS.

Supported Compilers

— For C/C++: GCC (GNU Compiler Collection), Clang (LLVM compiler), Visual C++ Compiler, RealView/ARM Compilation Tools (ARMCC), Intel C++ Compiler, Wind River Diab Compiler, NEC/Renesas CA850, CC78K0(R) C Compilers, C/C++ Compiler for the Renesas M16C Series and R8C Family, Panasonic MN10300 Series C Compiler, C compiler for Toshiba TLCS-870 Family, Samsung CalmSHINE16 Compilation Tools, Texas Instruments TMS320C6* Optimizing Compiler.
— For C#: Roslyn, Mono.
— For Java: OpenJDK Javac Compiler, Eclipse ECJ compiler, Jack Compiler for Android.

Svace Architecture

[Figure: Svace architecture. Labeled components: C / C++ / Java and C# sources, build system, interception, Svace IR, C# analysis, warnings.]

The intermediate representation for the analysis is created by the Institute’s own compilers adapted from the open industrial toolchains.

Warnings:

— syntax coloring and code navigation support;
— warning review support (assigning true/false positive status);
— comparison of analysis runs with suppressing old false positives.

Analysis:

— lightweight abstract syntax trees analysis;
— interprocedural analysis (context sensitive and path sensitive with symbolic execution);
— tainted data analysis.
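Tainted data analysis, named in the list above, checks whether a value derived from untrusted input can reach a sensitive operation. The following is a minimal sketch of the idea on a hypothetical straight-line mini-language. The statement format and the names `SOURCES`, `SINKS` and `find_tainted_sinks` are assumptions made for illustration; they are not Svace's internal representation or API.

```python
# Toy forward taint propagation over straight-line code (illustrative only).
# Hypothetical statement forms:
#   ("call", dst_or_None, function_name, [argument_names])
#   ("assign", dst, [source_names])

SOURCES = {"read_input"}   # functions whose result is untrusted
SINKS = {"exec_query"}     # functions that must not receive tainted data


def find_tainted_sinks(program):
    """Return (statement index, sink name) for every sink call fed by tainted data."""
    tainted = set()
    warnings = []
    for index, stmt in enumerate(program):
        if stmt[0] == "call":
            _, dst, func, args = stmt
            args_tainted = any(a in tainted for a in args)
            if func in SINKS and args_tainted:
                warnings.append((index, func))
            if dst is not None:
                # Conservative assumption: a call result is tainted if the
                # callee is a source or any argument is tainted.
                if func in SOURCES or args_tainted:
                    tainted.add(dst)
                else:
                    tainted.discard(dst)
        elif stmt[0] == "assign":
            _, dst, srcs = stmt
            if any(s in tainted for s in srcs):
                tainted.add(dst)
            else:
                tainted.discard(dst)  # overwritten with clean data
    return warnings


example = [
    ("call", "name", "read_input", []),
    ("assign", "query", ["prefix", "name"]),
    ("call", None, "exec_query", ["query"]),   # tainted: reported
    ("assign", "query", ["prefix"]),
    ("call", None, "exec_query", ["query"]),   # clean again: not reported
]
print(find_tainted_sinks(example))
```

An industrial analyzer additionally has to follow taint across branches, loops and function calls, which this sketch deliberately leaves out.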

BINSIDE. STATIC BINARY CODE ANALYSIS TOOL

The importance of binary code analysis

Software developers often face the problem of incorporating complex computations, data encryption and compression algorithms, and similar common notions into their code. This is typically done by using standard libraries specializing in a group of these tasks; these libraries are often distributed in binary code only. On the other hand, software maintenance is gradually becoming more and more important within the software development cycle; maintenance incorporates the task of updating both the code and its external libraries. External libraries and auxiliary programs, distributed in binary form, need to conform to quality and security standards.

Static binary code analysis tool

At the Institute for System Programming of the Russian Academy of Sciences we have developed a static binary code analysis tool. This tool includes the following features:

— Support for various processor architectures: x86, x64 (ARM, PowerPC, MIPS to be added);
— Support for various platforms: Linux, Windows;
— Support for various binary code formats: ELF for Linux and PE for Windows;
— Support for binary code analysis without debug information or symbol tables;
— Automatic defect detection: invalid usage of format string functions, buffer overflows, invalid usage of dynamic memory.

Program analysis

The analysis tool transforms executable and library binary code into a specialized architecture-independent internal representation used to create control flow graphs and call graphs; these graphs are used to perform context-sensitive intra-procedural dataflow analysis in order to identify potential runtime defects and vulnerabilities. The context-sensitive analysis core automatically generates function specifications and propagates them through function call code points.
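The control flow graphs mentioned above are commonly built with the textbook "leader" algorithm: split the instruction stream into basic blocks at jump targets and after every control transfer, then connect the blocks. The sketch below applies it to a hypothetical list of `(opcode, target_index)` pairs. The instruction encoding and the name `build_cfg` are assumptions for illustration and do not reflect the tool's actual internal representation.

```python
def build_cfg(instructions):
    """Build basic blocks and edges from a list of (opcode, target_index_or_None).

    Recognized control transfers (hypothetical opcodes): "jmp" (unconditional),
    "jcc" (conditional), "ret". Everything else falls through.
    Returns ({block_start: block_end_exclusive}, {(from_start, to_start)}).
    """
    n = len(instructions)
    leaders = {0} if n else set()
    for i, (op, target) in enumerate(instructions):
        if op in ("jmp", "jcc"):
            leaders.add(target)              # a jump target starts a block
            if i + 1 < n:
                leaders.add(i + 1)           # so does the instruction after a jump
        elif op == "ret" and i + 1 < n:
            leaders.add(i + 1)

    starts = sorted(leaders)
    blocks = {
        start: (starts[k + 1] if k + 1 < len(starts) else n)
        for k, start in enumerate(starts)
    }

    edges = set()
    for start, end in blocks.items():
        op, target = instructions[end - 1]   # last instruction of the block
        if op == "jmp":
            edges.add((start, target))
        elif op == "jcc":
            edges.add((start, target))       # branch taken
            if end < n:
                edges.add((start, end))      # fall-through
        elif op != "ret" and end < n:
            edges.add((start, end))          # plain fall-through
    return blocks, edges


code = [
    ("cmp", None),   # 0
    ("jcc", 4),      # 1
    ("mov", None),   # 2
    ("jmp", 5),      # 3
    ("mov", None),   # 4
    ("ret", None),   # 5
]
blocks, edges = build_cfg(code)
print(blocks)
print(sorted(edges))
```

Real binaries add indirect jumps and calls, whose targets are not known statically; resolving them is one of the reasons a dataflow analysis is layered on top of the graphs.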

Defect detection

We currently provide automatic checkers that identify problems with format string functions, potential buffer overflow defects and invalid usage of dynamic memory.
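As an illustration of the first checker category, a format string check can be reduced to one question: is the format argument of a printf-family call a constant, or can it be influenced at run time? The sketch below works on hypothetical call-site records; the record layout and the names `FORMAT_ARG_INDEX` and `check_format_calls` are assumptions for illustration, not the tool's API. The argument positions follow the standard C prototypes.

```python
# Position of the format-string argument in common C library functions.
FORMAT_ARG_INDEX = {
    "printf": 0,    # printf(fmt, ...)
    "fprintf": 1,   # fprintf(stream, fmt, ...)
    "sprintf": 1,   # sprintf(buf, fmt, ...)
    "snprintf": 2,  # snprintf(buf, size, fmt, ...)
    "syslog": 1,    # syslog(priority, fmt, ...)
}


def check_format_calls(call_sites):
    """Report calls whose format argument was not resolved to a constant string.

    Each call site is a hypothetical record:
      {"address": int, "callee": str, "args": [{"kind": "const_string" | ...}]}
    """
    findings = []
    for site in call_sites:
        index = FORMAT_ARG_INDEX.get(site["callee"])
        if index is None or index >= len(site["args"]):
            continue
        if site["args"][index]["kind"] != "const_string":
            findings.append((site["address"], site["callee"]))
    return findings


sites = [
    {"address": 0x401000, "callee": "printf",
     "args": [{"kind": "const_string"}, {"kind": "register"}]},
    {"address": 0x401020, "callee": "printf",
     "args": [{"kind": "register"}]},                      # printf(user_buffer)
    {"address": 0x401040, "callee": "memcpy",
     "args": [{"kind": "register"}, {"kind": "register"}, {"kind": "const"}]},
]
for address, callee in check_format_calls(sites):
    print(f"{address:#x}: non-constant format string passed to {callee}")
```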

Extensibility features

Our analysis tool provides an API for accessing internal code representation and models and can be used to design new checkers.

Internal infrastructure

Our analysis tool employs IDA Pro — a de facto standard in the field of program disassembly and reverse engineering — and additional tools (Google (Zynamics) BinNavi and BinExport) modified to our needs; these tools transform program binary code into REIL — an architecture-independent intermediate representation language. We are extending these tools in order to improve the efficiency of intraprocedural analysis, abstract interpretation, defect detection, tainted data flow analysis, PDG (Program Dependence Graph) generation and other methods. Certain extensions (e.g. x64 support for REIL transformation) were successfully released into the community.

Operational scheme

[Figure: operational scheme. Labeled components: binary code, IDA, BinExport, assembler code, PostgreSQL, REIL, analysis, plugins, specification, potential defect.]

DYNAMIC ANALYZER ANXIETY

Why Anxiety?

1 Combination of essential functions

Anxiety is a framework for detecting errors and potentially dangerous cases in the process of development, acceptance testing and operation of software. It is based on dynamic symbolic execution, which allows input data to be generated automatically with no source code or debugging information available.

Combined approach to finding errors

Anxiety’s special feature is the combined approach to dynamic analysis, which involves integration with static analyzers and fuzzing tools. The successful combination of technologies allows Anxiety to solve the same tasks as the leading global analogues (CA Veracode Dynamic Analysis, Synopsys Dynamic Application Security Testing and Rogue Wave CodeDynamics).

Anxiety provides the following features:

— High level of analysis performance due to support of distributed and parallel modes of operation, integration with a fuzzer, as well as filters for input data stream and analyzed functions;
— Creation of analysis tools based on the dynamic symbolic execution method;
— Integration with static analyzers of source or machine code, which allows for the implementation of directed analysis and selective testing of the components of the target program. Verification of defects previously detected by static analysis (in particular: infinite looping, division by zero, dereferencing of a null pointer, violation of user asserts, etc.);
— Integration with tools for randomized testing of programs (fuzzing) to increase performance. Integration solves the problems randomized testing faces when it has to pass conditional transitions that depend on comparison with constants. Using fuzzing makes it possible to achieve code coverage of the program with input data sets much faster than with dynamic symbolic execution alone;
— Modular infrastructure (tracer, checker and input data generator) that allows system components to be adapted to analysis needs and functionality to be expanded;
— Support of various sources for external program data (files, network sockets, environment variables, standard input flow).
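The weakness of pure fuzzing on comparisons with constants, and the way symbolic reasoning removes it, can be seen on a toy example. The sketch below is illustrative only: the class `Sym`, the helpers `branch_eq`, `fuzz`, `solve_untaken` and `concolic`, and the constant `MAGIC` are all hypothetical names, and a hand-written linear equation solver stands in for the SMT solver a real dynamic symbolic execution tool would use.

```python
import random

MAGIC = 0x12345678  # constant the program compares against (hypothetical)


class Sym:
    """Value a*x + b over one integer input x, carried next to its concrete value."""

    def __init__(self, a, b, concrete):
        self.a, self.b, self.concrete = a, b, concrete

    def __mul__(self, k):
        return Sym(self.a * k, self.b * k, self.concrete * k)

    def __add__(self, k):
        return Sym(self.a, self.b + k, self.concrete + k)


def branch_eq(value, constant, trace):
    """Evaluate `value == constant`; log the constraint when value is symbolic."""
    if isinstance(value, Sym):
        taken = value.concrete == constant
        trace.append((value.a, value.b, constant, taken))
        return taken
    return value == constant


def target(x, trace):
    """Program under test: the defect hides behind a comparison with a constant."""
    y = x * 2 + 10
    if branch_eq(y, MAGIC, trace):
        return "defect"
    return "ok"


def fuzz(tries, seed=0):
    """Random 32-bit inputs: each try hits the branch with probability 2**-32."""
    rng = random.Random(seed)
    for _ in range(tries):
        x = rng.randrange(2 ** 32)
        if target(x, []) == "defect":
            return x
    return None


def solve_untaken(trace):
    """Solve a*x + b == c for the first branch that was not taken."""
    for a, b, c, taken in trace:
        if not taken and a != 0 and (c - b) % a == 0:
            return (c - b) // a
    return None


def concolic(seed_input=0):
    """One concrete run with symbolic tracking, then solve for the other branch."""
    trace = []
    target(Sym(1, 0, seed_input), trace)
    return solve_untaken(trace)


print("fuzzing found:", fuzz(10_000))
new_input = concolic()
print("solver proposed:", new_input, "->", target(new_input, []))
```

Under these assumptions a single traced run is enough to derive the input 152709943, because 2 * 152709943 + 10 equals 0x12345678, while random search is very unlikely to reach the branch in any practical number of tries.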

2 Convenience for Russian customers

Anxiety is a cutting-edge development of the Institute for System Programming of the Russian Academy of Sciences, based on the results of long-term research and intended for industrial usage. It is a flexible basic environment with the ability to fully adapt to the needs of the customer. The benefits are:

— Implementation of specific tasks of program analysis based on dynamic symbolic execution (in particular, determining the reachability of a certain function or operation in a program);
— Ability to receive the alienable product;
— Ability to be used for the implementation of interim requirements of GOST R 56939-2016 (if software certification is needed for deployment in Russia).

Who is Anxiety intended for?

— Companies aimed at software development with a special focus on high reliability and security; — Companies responsible for software audit or certification.

Where is Anxiety used?

The Anxiety tool is used for testing programs included into the OS packages.

Supported environments and tools

Anxiety supports analysis on Windows (XP and later) and Linux and works with various types of SMT solvers (STP, Z3, MathSAT, etc.). It is based on the DynamoRIO dynamic instrumentation environment (the instruction flow is processed by the Triton framework; analysis of Windows programs is supported) and on Valgrind dynamic binary instrumentation, which are used for trace interception and for the automatic basic block coverage mechanism.

Operational scheme

[Figure: Anxiety operational scheme. Labeled components: binary code and input data; tracers (DynamoRIO, Valgrind, Dyninst, PIN), some marked "Linux only" and some "Linux and Windows"; fuzzing; dynamic symbolic analysis (DSE) with path constraints; solvers (CVC4, STP, Z3, MathSAT) connected through SMT-LIB 2; new input data generator; coverage and metrics; dangerous operations; defects and addresses.]

BINARY CODE ANALYSIS PLATFORM BASED ON QEMU EMULATOR

QEMU is a full-system multi-target open source emulator. It is widely used for software cross-development. Many large companies (e.g., Google, Samsung, Oracle) prototype and emulate their hardware platforms and peripheral devices on QEMU.

Because the code is open source, QEMU features can be extended to use it for:

— creating new virtual platforms,
— prototyping peripheral device models,
— debugging OS kernel code, firmware code, drivers for emulated devices,
— malware analysis,
— recording virtual machine execution for later replay and analysis.

Remote debugging in the emulator

QEMU supports remote debugging of virtual machine through the GDB-compatible interface. Debugging service works within the emulator and does not affect virtual machine behavior.
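The GDB-compatible interface uses the GDB Remote Serial Protocol, in which every packet has the form `$<payload>#<checksum>` and the checksum is the sum of the payload bytes modulo 256 written as two hexadecimal digits. The sketch below only builds and parses such packets so that it runs without an emulator; the helper names `rsp_checksum`, `rsp_frame` and `rsp_parse` are hypothetical, and escaping and run-length encoding are deliberately ignored. In practice QEMU is typically started with its `-s` option (listen for GDB on TCP port 1234) and `-S` option (do not start the CPU until told to), and GDB itself produces these packets.

```python
def rsp_checksum(payload: bytes) -> str:
    """Two hex digits: sum of payload bytes modulo 256."""
    return format(sum(payload) % 256, "02x")


def rsp_frame(payload: str) -> bytes:
    """Wrap a command such as 'g' (read registers) into a protocol packet."""
    data = payload.encode("ascii")
    return b"$" + data + b"#" + rsp_checksum(data).encode("ascii")


def rsp_parse(packet: bytes) -> str:
    """Validate the framing and checksum of a packet and return its payload."""
    if len(packet) < 4 or not packet.startswith(b"$") or packet[-3:-2] != b"#":
        raise ValueError("malformed packet")
    data = packet[1:-3]
    checksum = packet[-2:].decode("ascii").lower()
    if rsp_checksum(data) != checksum:
        raise ValueError("bad checksum")
    return data.decode("ascii")


print(rsp_frame("g"))          # read general registers
print(rsp_frame("m1000,4"))    # read 4 bytes of memory at address 0x1000
print(rsp_parse(b"$?#3f"))     # query the reason the target halted
```

Sending such a packet over a TCP socket to the emulator and reading the reply is all a custom analysis tool needs in order to inspect registers and memory the same way GDB does.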

GDB (open source debugger) can connect to the emulator via network sockets and inspect processor registers, memory cells, call stack, and so on. One can debug either application or kernel code in the virtual machine. Popular binary analysis tools and IDEs such as IDA and Eclipse can also connect to QEMU for debugging and analysis of the virtual machine, because they support the GDB-compatible remote debugging interface.

Virtual machine execution recording and replay

Debugging usually needs to trace back from the failure to the line of code where the error actually appeared. This implies moving "back in time". To restore a past program state one has to re-run the program and try to find the source of the failure. This operation is usually performed multiple times, moving backward step by step. Debugging is significantly more difficult if the failure manifests itself unstably: it is affected by "random" factors, such as multithreaded execution, hardware behavior, user interactions with a graphical interface and so on. Deterministic replay provides a stable reproduction of a program (or virtual machine) run, and thus facilitates debugging.

Deterministic replay reconstructs program execution using previously recorded program input data. The first run is used to record these inputs into the log. All following runs then reconstruct the same behavior, because the program uses only the recorded inputs. Deterministic replay reconstructs the sequence of program (or virtual machine) states, including CPU registers, memory cells, peripheral devices’ state, and hard disk contents. Replay proceeds between these states by executing CPU instructions and passing previously recorded inputs to the program. These inputs include user input, network packets, serial and USB communications.

Full-system replay may be used for analysis of user-level applications, system kernels, firmware, and multi-threaded programs. Every guest operating system supported by the emulator may be recorded and replayed. Every replay run produces equivalent executions (the same sequence of instructions and hardware states) and therefore may be used for convenient debugging of volatile program bugs and other bugs. The debugger and other analysis tools do not alter the execution, because they work outside of the guest system.

Deterministic replay in QEMU is created by ISP RAS. QEMU allows recording and replaying virtual machine executions for x86, ARM, and MIPS platforms.

Reverse debugging

Reverse debugging may be used to inspect the past states of the program. The developer starts debugging from the point where an error manifests itself or an exception occurs and then tries to determine the reasons for such behavior. Usually the failure is caused by some operations performed in the past. Reverse debugging does not require restarting the program, because it assumes a faster "rewind" to the past. The GDB interface includes "reverse step" and "reverse continue" commands. The implementation of these operations in QEMU uses deterministic replay and virtual machine snapshots for faster recovery of the past states. These patches will later be included into QEMU mainline.

Guest system analysis

Virtual machine debugging requires information about the location of programs and modules in memory. We have developed an introspection mechanism that obtains this information from virtual machines running Windows or Linux. Introspection can be used for retrieving:

— instruction execution sequence, — memory access sequence, — executing system calls, — created processes, — loaded modules, — file accesses
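The record-then-replay principle behind deterministic replay, described earlier in this section, can be illustrated with a minimal Python sketch. It is not the QEMU implementation: the `InputSource`, `Recorder` and `Replayer` classes and the toy `program` are hypothetical, and only two kinds of input are modeled. The first run logs every nondeterministic input; later runs read inputs from the log only, so they reproduce the same result.

```python
import random
import time


class InputSource:
    """Nondeterministic inputs a program may observe (hypothetical stand-ins)."""

    def read(self, kind):
        if kind == "clock":
            return time.time()
        if kind == "network":
            return random.getrandbits(32)  # pretend packet payload
        raise ValueError(f"unknown input kind: {kind}")


class Recorder:
    """First run: pass inputs through and append each one to the log."""

    def __init__(self, source):
        self.source = source
        self.log = []

    def read(self, kind):
        value = self.source.read(kind)
        self.log.append((kind, value))
        return value


class Replayer:
    """Later runs: serve inputs from the log only, in recorded order."""

    def __init__(self, log):
        self._entries = iter(log)

    def read(self, kind):
        logged_kind, value = next(self._entries)
        if logged_kind != kind:
            raise RuntimeError("execution diverged from the recording")
        return value


def program(inputs):
    """Toy workload whose result depends on every input it reads."""
    state = 0
    for _ in range(3):
        state = (state * 31 + inputs.read("network")) % 2**32
    return state, inputs.read("clock")


recorder = Recorder(InputSource())
recorded_result = program(recorder)
replayed_result = program(Replayer(recorder.log))
```

Because the replayed run never touches the real clock or the random source, `replayed_result` equals `recorded_result` on every replay, which is what makes an unstable failure reproducible.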

New platform and peripheral devices emulation

Emulating new devices and platforms in QEMU requires a complete set of documentation describing the instruction set architecture of the processor, memory map and peripherals. Every new peripheral device must be provided with its own documentation.

Developing a new platform in the QEMU emulator from scratch requires implementing:
- a new virtual CPU and a translator of its instructions into the intermediate representation,
- a virtual memory management unit (MMU),
- virtual peripheral devices,
- a new platform which integrates all of the above,
- extensions of QEMU interfaces for new devices connected to the real world.

Even when QEMU already includes implementations of virtual CPU, MMU, and peripheral devices, all of these parts need to be interconnected with virtual system buses into one virtual platform.

When documentation is missing or incomplete, virtual platform debugging becomes very difficult. Information about the platform can then be extracted only from the binary code available for the existing devices, and code execution failures reveal flaws in the virtual hardware implementation. Emulator development requires more effort in this case, because binary code analysis is used to recover the expected behavior of the virtual device.

We provide semi-automatic Python scripts to simplify the development of new virtual platforms. They provide a declarative API for describing the configuration and a graphical interface that makes configuring simpler.
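To give a feel for what a declarative platform description might look like, here is a small Python sketch. It is purely illustrative and is not the API of the ISP RAS scripts: the `Device` and `Platform` classes, the board name and all addresses are assumptions. The description is plain data, and a simple check reports peripherals whose memory-mapped windows overlap, a typical mistake when parts are wired into one platform.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Device:
    name: str
    base: int   # first address of the memory-mapped register window
    size: int   # window size in bytes

    @property
    def end(self):
        return self.base + self.size


@dataclass
class Platform:
    name: str
    cpu: str
    devices: list = field(default_factory=list)

    def overlaps(self):
        """Return pairs of device names whose address windows intersect."""
        ordered = sorted(self.devices, key=lambda d: d.base)
        clashes = []
        for i, left in enumerate(ordered):
            for right in ordered[i + 1:]:
                if right.base >= left.end:
                    break  # sorted by base, so no later window can overlap
                clashes.append((left.name, right.name))
        return clashes


# Hypothetical board: names and addresses are made up for the example.
board = Platform(
    name="demo-board",
    cpu="hypothetical-cpu",
    devices=[
        Device("ram", 0x0000_0000, 0x1000_0000),
        Device("uart0", 0x1000_0000, 0x1000),
        Device("timer0", 0x1000_0800, 0x100),  # deliberately overlaps uart0
    ],
)
clashes = board.overlaps()
```

Keeping the description declarative means the same data could in principle drive both validation and code generation, which is the design choice this sketch is meant to show.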

ISP OBFUSCATOR

The exploit will not pass

Obfuscator protects a system from the mass exploitation of vulnerabilities using various methods of code diversification and allows compiling the code of a full OS.

Why Obfuscator?

1 The optimal combination of the necessary features

Obfuscator is a set of technologies to prevent the mass exploitation of vulnerabilities resulting from errors or backdoors. If a hacker is able to attack one of the devices, the rest will remain protected by the changes made to the code. Obfuscator is based on years of research into obfuscating code transformations. Having all the necessary competencies, our team is able to create an individual industrial solution for any customer.

Obfuscator is defined by:
- Fine-tuning of the balance between the degree of obfuscation and the level of performance (when used to protect against reverse analysis). The minimum slowdown is 1.2 times, the maximum is 8 times;
- Full automation (no special arrangement of the program's source code or build, and no additional effort from the customer's engineers, is required);
- Use of the open GCC set of compilers, which allows an OS to be compiled correctly;
- Two methods of binary code diversification:
  - Dynamic code diversification at program startup. It is used when the customer needs the same code on all devices (for example, due to mandatory certification). This method allows up to 98% of the code to be moved, with a slight increase in its volume and a performance degradation of about 1.5%. The advantages of Obfuscator over similar products are the following:
    - obfuscating down to the level of functions (as opposed to the ASLR and Pagerando technologies, which obfuscate only large blocks of code);
    - obfuscating functions throughout the system, except for the kernel, and the absence of potential conflicts with antiviruses (advantages over the similar Selfrando technology developed for the Tor Browser);
  - Static code diversification. During each compilation, depending on the specified key, a new executable file is obtained. The advantages of this method are the following:
    - the amount of binary code is not increased (which is particularly important for the Internet of Things);
    - the deterioration of performance tends to zero;
    - because the operations are done inside the compiler rather than afterwards in the linker, an extended set of diversifying transformations can be applied and customized more flexibly.
- Conflict-free combination with other software protection tools (including the ASLR system mechanism).

2 Fast adaptation in accordance with customer needs

- Fast adaptation of the technology to a specific compiler or toolkit (adaptation takes 2 months on average and up to six months in difficult cases);
- The ability to get a completely alienable product;
- Free testing and demonstration (using the customer's program or another one);
- Continuous adjustment of transformations and technological adaptation of the product in accordance with new tasks and challenges.
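The idea of key-dependent static diversification can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Obfuscator's transformation set: the `diversify_layout` function, the function names and the sizes are hypothetical, and the only transformation shown is a seed-driven reordering of functions.

```python
import random


def diversify_layout(functions, seed):
    """Return (name, offset) pairs for one seed-specific function order.

    `functions` maps a function name to its code size in bytes.
    """
    order = sorted(functions)              # canonical starting order
    random.Random(seed).shuffle(order)     # the seed alone decides the layout
    layout, offset = [], 0
    for name in order:
        layout.append((name, offset))
        offset += functions[name]
    return layout


# Hypothetical program: five functions with made-up code sizes.
sizes = {"main": 120, "parse": 300, "checksum": 64, "send": 210, "log": 48}
build_a = diversify_layout(sizes, seed=1)
build_a_again = diversify_layout(sizes, seed=1)
build_b = diversify_layout(sizes, seed=2)
```

The same seed always reproduces the same layout, so a build can be regenerated for certification or debugging, while the total code size stays unchanged. An exploit that hard-codes the offset of a function taken from one build will generally not find that function at the same offset in a build made with another seed.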

Who is Obfuscator intended for?

— Developers of specialized OS; — Application software developers.

Where is Obfuscator used?

ISP Obfuscator is implemented in the Zirkon OS, which is used by the Ministry of Foreign Affairs and the Border Guard Service of the Federal Security Service of Russia.

System requirements

Obfuscator is a universal product that can be adapted to any system requirements. The main version is currently running on a Linux-based OS (starting with version 2.6) with the Intel x86 / x86-64 architecture support.

Operational scheme

[Figure: operational scheme of static and dynamic diversification. Recoverable labels: source code, errors, seed, standard GCC compiler, modified GCC compiler and linker, GCC compiler and modified linker, data for diversification, modified diversifying dynamic loader, executable code 1 to 3, run of executable code 1 to 3, exploit, "Hacked", "Not hacked".]

PROTOSPHERE NETWORK TRAFFIC ANALYZER

Analyzes traffic, detects anomalies

Protosphere is a deep packet inspection (DPI) system. It is part of the protection against intrusions and information leaks. It detects inconsistencies between a protocol specification and a specific implementation. The flexibility of its internal representation allows support for new (including closed) protocols to be added quickly.
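Detecting inconsistencies between a protocol specification and actual traffic can be illustrated with a toy Python sketch. The protocol, its field layout and the `diagnose` function are assumptions made for this example and have nothing to do with Protosphere's internal representation: a fixed header is parsed and every deviation from the stated rules is reported.

```python
import struct

# Hypothetical toy protocol: 1-byte version, 1-byte type, 2-byte big-endian
# payload length, then the payload itself.
HEADER = struct.Struct("!BBH")
ALLOWED_VERSIONS = {1}
ALLOWED_TYPES = {0x01, 0x02}


def diagnose(packet):
    """Return a list of deviations of `packet` from the toy specification."""
    if len(packet) < HEADER.size:
        return ["truncated header"]
    version, msg_type, declared = HEADER.unpack_from(packet)
    actual = len(packet) - HEADER.size
    problems = []
    if version not in ALLOWED_VERSIONS:
        problems.append(f"unknown version {version}")
    if msg_type not in ALLOWED_TYPES:
        problems.append(f"unknown message type {msg_type:#04x}")
    if declared != actual:
        problems.append(f"declared length {declared}, actual {actual}")
    return problems


good = HEADER.pack(1, 0x01, 3) + b"abc"
bad = HEADER.pack(1, 0x7F, 10) + b"abc"
```

Here `diagnose(good)` returns an empty list, while `diagnose(bad)` reports both the unknown message type and the length mismatch. Collecting all deviations instead of stopping at the first one mirrors the idea of a malfunction diagnosis log.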

Why Protosphere?

1 Optimal combination of necessary functions

Protosphere is an innovative system based on research into network traffic analysis technologies. It combines key features of foreign competitors (Wireshark, Microsoft Network Monitor) with a universal internal representation that allows the analysis capabilities to be expanded quickly.

Protosphere is defined by:
- Optimal system core capabilities:
  - a universal model of data representation in the process of network traffic analysis;
  - processing of data containing distortions, losses, rearrangements and duplication of packets, as well as asymmetric traffic;
  - support for the analysis of compressed and encrypted data;
  - support for tunnels of arbitrary configuration.
- Support for all stages of network traffic research through synchronized tools:
  - localization of one or more investigated network connections on the graph of network interactions and the network flows tree;
  - details for the selected connections on the flow chart;
  - visual representation of the analysis results on the analysis tree;
  - diagnosis of inconsistencies between the protocol specification and the actual traffic in the malfunction diagnosis log;
  - extraction and analysis of data, including the application layer, through shared use of data content windows, a list of fragments and a list of objects.
- Fast expansion of the supported protocols list:
  - the ability to debug the module under development on live traffic, which significantly accelerates the introduction of new protocols;
  - API access to the analysis results;
  - localization of parsing errors.
- Due to the unified code base, implementation and verification of new protocols are accelerated;
- An advanced graphical interface that allows you to choose the most convenient way to present the results of the analysis;
- Support for two modes of operation: online and offline.

2 Flexibility and adaptability to customer needs

- Accelerated customization due to the flexibility of the internal representation (support of new protocols, extraction of new data types, setting up the format of analysis results);
- Adaptation to the network channel and available computing resources (flexible system configuration allows you to find a balance between analysis detail and accuracy and the resources consumed);
- Ability for the customer to obtain an alienated product.

Who is the Protosphere system intended for?

- Companies involved in testing network protocol implementations (operating systems, including embedded ones, and network equipment);
- Companies developing network security tools (firewalls, intrusion detection and prevention systems);
- Companies manufacturing equipment that needs an increased level of safety due to mandatory certification.

Supported Platforms and Architectures

Platforms: Windows OS, Linux kernel based OS.
Architecture: Intel x86-64.

Operational scheme

[Figure: Protosphere operational scheme. Recoverable labels: control and management; streaming traffic analysis (online modules, online core); network traces analysis (offline modules, offline core); interactive offline analysis; system core (module management, parsing results management, malfunctions diagnostics); modules (recognizers, parsers); new protocols support; selection of anomalies; Protosphere source code; network traces.]

KLEVER: THE TECHNOLOGY OF STATIC VERIFICATION OF GNU C PROGRAMS

Omits no mistakes, guarantees safety

Klever is a static verification system that uses advanced tools to thoroughly check the security, reliability and performance of software systems developed in the GNU C language. In particular, it is used to verify a real-time OS.

Why Klever?

1 High accuracy and advanced analysis capabilities

Klever is a research-based technology intended for industrial use. It allows highly accurate formal methods to be used when verifying large software systems. It solves the same tasks as its global analogues (for instance, Microsoft SDV), but differs in its ability to verify any complex software rather than just a narrow class of programs (drivers or embedded software). The technology is available in open access (forge.ispras.ru/projects/klever).

Klever is defined by:
- High-precision sound analysis of any complex software (identification of all possible errors of the desired types);
- Checking various requirements for the program (checking the rules of safe C programming and the correct usage of the interface specific to the program being checked);
- Scalability of formal methods (modular static verification allows you to scale the use of appropriate tools and analyze projects that contain thousands of lines of code in the GNU C language);
- Support for the verification of different versions of the program (there is no need to modify the source code of the program being checked, which makes it easy to verify various versions and to verify the correction of errors immediately after their detection and elimination);
- Use of an incremental process of refining the results of verification (a constantly refined set of specifications is used to control the decomposition and the generation of models of the environment of program fragments).

2 High level of adaptability and convenience

- Adaptation of the technology to the needs of the customer. Timely expansion of the list of detectable errors. Development of a set of specifications for formalizing the program-specific requirements, as well as environment modeling specifications and, in some cases, plug-ins;
- Convenient multi-user web interface for performing static verification, as well as storing, analyzing and comparing results;
- Ability for the customer to obtain an alienated product (after preparation, adaptation, development of specifications and search for errors). Training of the customer’s developers.

Implementation experience

The Klever technology was developed as part of the Linux Verification Center (http://linuxtesting.org/) supported by the and organized on the basis of the Institute for System Programming of the Russian Academy of Sciences. Klever is currently used to verify various operating systems.

In the process of verifying the device drivers and subsystems of the Linux OS, the following results were achieved:

- More than 300 errors confirmed by developers were detected: buffer overruns, null pointer dereferences, use of uninitialized memory, repeated or incorrect memory deallocation, race conditions and deadlocks, leaks of specific Linux kernel resources, incorrect function calls depending on the context, incorrect initialization of specific data structures of the Linux kernel;
- 50% coverage of the device drivers and kernel subsystems was achieved. In order to look for these errors when executing various scenarios of interaction between Linux kernel drivers and their environment, Klever built a fairly accurate environment model (more than 20 of the most widely used driver interfaces, such as interrupt handlers and timers, USB and PCI device interfaces, network and character interfaces).

Who is Klever intended for?

— Companies aimed at software development with a special focus on high reliability and security; — Companies that need certification of developed software; — Certification laboratories.

Operational scheme

[Figure: Klever operational scheme. Recoverable labels: source code of the program; preparing for launch (selection of the program source code, controlled build of the program); adaptation of specifications (refinement and development of rules specifications, refinement and development of environment model specifications); specifications of rules and environment models; automatic verification (program decomposition into fragments, generating environment models, generating models of program fragments, checking models for compliance with the rules specifications); analysis of the results (triage of error traces, code coverage evaluation).]

MICROTESK. TEST PROGRAMS GENERATOR

Verifies microprocessors

MicroTESK is a reconfigurable and extensible test program generation environment for the functional verification of microprocessors. It allows test program generators for target microprocessor architectures to be constructed automatically from their formal specifications. MicroTESK is applicable to a wide range of architectures (RISC, CISC, VLIW, DSP).

Why MicroTESK?

1 Sophisticated and promising concept

MicroTESK is a technology stack for industrial use, which includes the basic modeling environment (it builds models of microprocessors based on formal specifications) and the generation environment (it builds test programs based on templates). In terms of the tasks solved, it is close to its global analogues (Genesys Pro and RAVEN); however, it differs from them in increased productivity and usability, as well as in distribution under an open source license.

It is freely available on the website of the Institute for System Programming of the Russian Academy of Sciences: https://forge.ispras.ru/projects/microtesk. In addition, a description of the technology is available at http://www.microtesk.org/.
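Template-based generation, as mentioned above, can be pictured with a short Python sketch. It is not a MicroTESK template (those are written in Ruby) and does not use an nML model: the mini instruction set, the `generate` function and the template format are all hypothetical. The sketch enumerates every combination of instructions allowed by the template and fills operands with seeded random values.

```python
import itertools
import random

# Hypothetical mini-ISA: mnemonic -> operand kinds. Not a real nML model.
INSTRUCTIONS = {"add": ("reg", "reg", "reg"), "addi": ("reg", "reg", "imm")}
REGISTERS = ["r1", "r2", "r3"]


def operand(kind, rng):
    """Pick a concrete operand of the given kind."""
    if kind == "reg":
        return rng.choice(REGISTERS)
    return str(rng.randint(-8, 7))  # small signed immediate


def generate(template, seed):
    """Yield one test program per combination of instructions in `template`.

    `template` is a list of slots; each slot is a list of allowed mnemonics.
    Instruction choice is combinatorial, operand values are random.
    """
    rng = random.Random(seed)
    for combo in itertools.product(*template):
        yield [
            f"{op} " + ", ".join(operand(k, rng) for k in INSTRUCTIONS[op])
            for op in combo
        ]


template = [["add", "addi"], ["add", "addi"]]
programs = list(generate(template, seed=42))
```

With two slots of two mnemonics each, the sketch produces four two-instruction programs, and the fixed seed makes the operand values reproducible from run to run.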

MicroTESK is defined by:
- Using formal specifications as sources of knowledge about the configuration of the microprocessor under verification:
  - specifications of architectures in nML (registers, memory and addressing modes, instruction logic, text/binary instruction format);
  - additional specifications of the memory subsystem in mmuSL (properties of memory buffers (TLB, L1 and L2), address translation logic and the logic of read and write operations);
  - the potential to move to formal verification and to the generation of a set of tools for the microprocessor under development (disassembler, emulator, etc.);
- Generation of test programs based on object-oriented test patterns:
  - test patterns in the Ruby language (which makes the patterns readable and easy to maintain);
  - the possibility of simultaneous use of different generation techniques for sets of instructions and test data (random generation, combinatorial generation, generation based on constraint solving, etc.);
  - scalability of the generation environment (the ability to develop complex templates at low cost due to reuse).
- Wide range of supported microprocessor architectures:
  - support of features of various classes of architectures at the level of the design of the generator environment (RISC, CISC, VLIW, DSP);
  - MicroTESK-based test program generators were developed for such architectures as ARM, MIPS, PowerPC, RISC-V;
  - multi-core architecture of the target microprocessor is supported.

2 Maximum convenience for the customer

- Timely setup of the environment for new architectures with minimal costs (due to formal specifications and automatic extraction of information about test situations);
- Convenient language for developing test patterns that allows complex verification scenarios to be described quickly;
- Possibility to integrate a wide range of different generation methods (random, combinatorial, based on constraint solving, etc.);
- Timely technical support and training of the customer’s local developers;
- Ability for the customer to receive an alienable product.

System requirements

Windows OS or GNU/Linux kernel based OS, Java Runtime Environment version 8.

Implementation experience

MicroTESK has been under development since 2007. It was used in Russian and international projects for the development of modern industrial microprocessors (particularly, in industrial projects for the verification of ARMv8 and MIPS64 microprocessors).

Operational scheme

[Figure: MicroTESK operational scheme. Recoverable labels: verification engineer, specifications, test patterns, translator, model, MicroTESK generation environment, core, generator, simulator, extensions, restrictions, test programs.]

RETRASCOPE: HDL-DESCRIPTIONS REVERSE ENGINEERING TOOL

Static analysis of digital hardware descriptions

Retrascope is a tool for reverse engineering and functional verification of digital hardware descriptions. It provides automated tools for extracting formal models from source code and analyzing them. The tool supports synthesizable subsets of the Verilog and VHDL languages.

Why Retrascope?

1 Combination of the most important qualities

Retrascope is an extensible tool that allows you to develop hybrid verification techniques for HDL descriptions by combining various tools for analyzing formal models.

Retrascope is defined by:
- Extraction of formal models from source code and their visualization. The following types of models are supported:
  - control flow graph;
  - decision diagram of guarded actions;
  - high-level decision diagram;
  - extended finite state machine.
- Generation of functional tests for hardware modules (random generation, extended finite state machine traversal, bounded model checking);
- Verification of formal models (model checking) for compliance with PSL specifications using external verification tools (NuSMV, nuXmv).
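Test generation by traversing an extended finite state machine can be illustrated with a small Python sketch. It is a simplified stand-in, not Retrascope's algorithm: the machine (a saturating counter), the transition encoding and the `find_test_sequence` function are assumptions for this example. A breadth-first search over (state, variable) configurations finds a shortest input sequence that drives the machine into a target state.

```python
from collections import deque

# Hypothetical EFSM of a 2-bit saturating counter with an enable input.
# Each transition: (source state, input, guard on counter, update, target).
TRANSITIONS = [
    ("IDLE", "enable", lambda c: True, lambda c: 0, "COUNT"),
    ("COUNT", "tick", lambda c: c < 3, lambda c: c + 1, "COUNT"),
    ("COUNT", "tick", lambda c: c == 3, lambda c: c, "FULL"),
    ("FULL", "reset", lambda c: True, lambda c: 0, "IDLE"),
]


def find_test_sequence(target, start=("IDLE", 0)):
    """Breadth-first search for a shortest input sequence reaching `target`."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        (state, counter), inputs = queue.popleft()
        if state == target:
            return inputs
        for src, symbol, guard, update, dst in TRANSITIONS:
            if src == state and guard(counter):
                nxt = (dst, update(counter))
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, inputs + [symbol]))
    return None  # target unreachable from the start configuration


full_test = find_test_sequence("FULL")
```

The guards are what distinguish an extended machine from a plain one: reaching `FULL` requires the counter to saturate first, so the search has to track the variable value together with the control state.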

2 Convenience for the customer

- Graphical user interface based on the Eclipse IDE (a command line interface is also available);
- Open source code (Apache License, Version 2.0);
- Extensibility at the source code level (the ability to add new hardware descriptions or analysis tools);
- Open interaction interfaces (SMT-LIB, SMV languages) that allow using various model checking tools and solvers to achieve analysis and verification goals.

Who is Retrascope intended for?

- Companies aiming to develop digital hardware;
- Research groups in the field of functional verification of digital equipment.

Implementation experience

The tool is at the research prototype stage; development is underway.

System requirements

Hardware: IBM-compatible PC.
Software: Windows OS or GNU/Linux kernel based OS, Java Runtime Environment version 8.

Operational scheme

[Figure: Retrascope operational scheme. Recoverable labels: HDL description (Verilog, VHDL); internal representation (modules, processes, operators); internal representation (GADD, EFSM, HLDD); flowcharts; models; models testing; models verification; tests generation.]

Deductive verification of Linux kernel modules and security policy models

Software plays a key role in many systems, e.g., safety-, security-, and mission-critical systems. Bugs in such software can lead to catastrophic consequences. As a result, the development of critical software is regulated by certification standards and guidelines (such as DO-178C, ISO/IEC 15408, etc.) that require following best practices in the development process.

For example, ISO/IEC 15408 “Information technology – Security techniques – Evaluation criteria for IT security” requires the following activities:
- formal security policy modelling (ADV_SPM);
- formal verification of the internal consistency of a security policy model;
- formal proof that the target system cannot reach an insecure state;
- development of formal and semi-formal functional specifications;
- formal proof of correspondence between the security policy model and the functional specification;
- formal proof of correspondence between different representations of the target software, such as functional specification, design and source code.

ISP RAS has developed methods and tools implementing these activities. The approach has been applied for verification of security module of Astra Linux Special Edition developed by RusBITech.

The approach suggests using two specification languages with corresponding toolsets: — security policy models and formal functional specifications are specified in Event-B; — formal specification of critical implementation components is done in ACSL.

Event-B and Rodin

An Event-B specification consists of contexts and machines. Contexts contain the static part of a specification: carrier sets, constants, axioms. Machines contain the dynamic part: variables, invariants, events. The current state of a specification is formed by variables whose values are constrained by invariants and changed by events. State invariants allow both ensuring the internal consistency of a machine and formalizing the notion of a “safe” state required by ISO/IEC 15408-3, ADV_SPM.1.2C.

Each event consists of parameters, guards and actions. Guards restrict the values of event parameters and machine variables, reducing the number of states in which the event can occur. Actions modify the current state by assigning new values to the variables. Event-B also allows specifications to be decomposed using the refinement technique to simplify the development, verification and support processes.

Event-B specifications are developed and verified using the Rodin platform (developed under an open source license by ETH Zurich, Systerel, Clearsy, University of Newcastle and University of Southampton) and its plug-ins. Rodin automatically generates proof obligations for each case that requires proof: well-defined axioms, invariants, guards and actions; invariant preservation; refinement correctness. To fully verify the model, all generated proof obligations need to be proved. To do this, Rodin allows using various automatic provers (for example, SMT solvers) as well as performing interactive proof.

ACSL and AstraVer Toolset

Formal specifications of component interfaces in the C language are proposed to be described in a special language called ACSL (ANSI/ISO C Specification Language). ACSL is a behavior specification language that supports contract specifications of programs, from the most low-level, such as “this function requires a valid pointer to an initialized int as an input”, to high-level, for example: “this function requires a non-empty linked list of int values as an input and returns the maximum of these values”. The expressive power of the ACSL language is enough to fully specify various functions, and it can also be used to write partial specifications.

Deductive verification is based on the translation of C source code annotated with specifications into a set of logical formulae whose validity is equivalent to the program being correct with respect to the checked properties (if a precondition holds when a function is called, then the function will finish and its result fulfills its post-condition). These logic formulae, known as verification conditions, can be checked with many different tools: automatic, such as Z3, CVC, Alt-Ergo, Vampire, E-Prover etc., or requiring user participation, for example, interactive theorem provers such as Coq and PVS.

Existing deductive verification tools don’t support all the features of the C language used in operating system kernels. As a result, AstraVer Toolset, a new deductive verification toolset, was developed by ISP RAS. The toolset is based on the open C program verification platform Frama-C (CEA-LIST, France) and the deductive verification system Why3 (INRIA, France), and includes the following new features:
- container_of construct support;
- function pointers support;
- expression-level support for bitwise arithmetic operations;
- support for pointer type reinterpretation between integer types, including types of different size;
- zero-sized arrays support;
- string literals support;
- template specifications for standard library memory operations;
- control flow highlighting for verification conditions in Why3ide.
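The roles of invariants, guards and actions in an Event-B machine, described earlier in this section, can be illustrated with a small Python sketch. It is not Event-B and not what Rodin does: Rodin generates and discharges proof obligations, whereas this sketch merely enumerates the reachable states of a tiny hypothetical machine (a session counter with a made-up `CAPACITY` constant) and checks the invariant in each of them.

```python
from collections import deque

CAPACITY = 3  # constant from the (hypothetical) context

# Machine state is a single variable: the number of open sessions.
INITIAL = 0


def invariant(sessions):
    """The "safe state" condition every reachable state must satisfy."""
    return 0 <= sessions <= CAPACITY


# Each event is (name, guard, action) over the machine variable.
EVENTS = [
    ("open", lambda s: s < CAPACITY, lambda s: s + 1),
    ("close", lambda s: s > 0, lambda s: s - 1),
]


def check_invariant(initial, events, inv):
    """Explore reachable states; return a violating trace or None."""
    queue = deque([(initial, [])])
    seen = {initial}
    while queue:
        state, trace = queue.popleft()
        if not inv(state):
            return trace
        for name, guard, action in events:
            if guard(state):
                nxt = action(state)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, trace + [name]))
    return None


ok = check_invariant(INITIAL, EVENTS, invariant)

# Weakening the guard of "open" lets the machine leave the safe states.
broken = [("open", lambda s: True, lambda s: s + 1), EVENTS[1]]
counterexample = check_invariant(INITIAL, broken, invariant)
```

With the original guards no violation is found, while the weakened guard yields a trace of four `open` events that breaks the invariant. Exhaustive exploration only works for small finite models, which is one reason a proof-based approach is used for real security policy models.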

Operational scheme

[Figure: deductive verification of security models and of operating system components. Recoverable labels: security policy model; functional requirements; LSM-level security requirements; formal functional specification of LSM; formal design; LSM design API; pre- and postconditions of LSM operations; specification of library functions; Linux kernel; Linux Security Module; custom LSM. Legend: manual development, automatic verification.]

MASIW. Development tools for Integrated Modular Avionics

MASIW: Modular Avionics System Integrator Workplace

The Modular Avionics System Integrator Workplace framework is intended to automate the design of real-time aviation electronics systems based on the Integrated Modular Avionics (IMA) architecture.

System designers and integrators of IMA systems are responsible for the following tasks:
- clarification and reconciliation of requirements for hardware and software with developers;
- design of the IMA platform based on the requirements for software, including:
  - distribution of software applications among available core processing modules (CPM), in compliance with the required CPU cycles count, requirements for periodic scheduling of applications, RAM/ROM memory usage, the bandwidth of network interfaces, etc.;
  - design of the network topology based on requirements of message delivery latency, consistency of interfaces, reliability, etc.;
  - verification of the model of the IMA system being developed for compliance with the requirements defined in the project documentation of an aircraft and its individual components;
  - generation of configuration data for components of the IMA system.

To solve these problems, a system integrator of IMA systems needs a precise understanding of all the details of the system being developed, both at high and low levels of granularity, as well as utmost attentiveness when tracking the consequences of changes in the IMA system architecture. At the same time, the size of modern on-board aircraft systems and the number of essential details are so large that it is impossible to keep everything in a single person’s mind.

In order to automate the process of IMA systems design and integration the MASIW Framework has been developed. The MASIW Framework is used mainly at design stages during development of IMA systems.

The current deployment of the MASIW Framework allows a system integrator to perform the following tasks:
- Creation, editing and management of models based on the AADL modeling language:
  - creation and editing of models using the text and diagram editors;
  - support for team work that enables tracking and modifying individual elements of a model;
  - support for the reuse of third-party AADL models.
- Analysis of models:
  - analysis of the hardware/software system structure: sufficiency of hardware resources, interfaces consistency, etc.;
  - analysis of data transmission characteristics of AFDX networks: message latencies, fullness of port queues, etc.;
  - generation and analysis of fault trees (FTA) to determine probabilities of high-level fault events;
  - architecture-model based analysis of failures and their consequences, including generation of special descriptive tables;
  - simulation of the hardware/software system model with generation of user reports, including software-in-the-loop execution of on-board partitions with an RTOS co-emulated with QEMU.
- Synthesis of models:
  - distribution of software applications among computational modules, taking into account limited hardware platform resources and additional restrictions on reliability and security of the hardware/software system.
- Generation of configuration data:
  - generation of schedules for processors (in particular, for ARINC-653 compatible real-time operating systems);
  - development of specialized configuration data tools, based on the provided software interface (API);
  - generation of configuration data for RTOS VxWorks653 and AFDX network equipment.
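The fault tree analysis listed above computes the probability of a high-level fault event from the probabilities of basic events. A minimal Python sketch of that calculation follows. It is not MASIW code: the `probability` function, the tree encoding and the failure rates are hypothetical, and the formulas assume that basic events are independent and that no basic event appears in more than one branch.

```python
from math import prod


def probability(node, basic):
    """Probability of a fault-tree node, assuming independent basic events.

    A node is either the name of a basic event or a tuple
    ("AND" | "OR", child, child, ...).
    """
    if isinstance(node, str):
        return basic[node]
    gate, *children = node
    child_probs = [probability(child, basic) for child in children]
    if gate == "AND":
        return prod(child_probs)
    if gate == "OR":
        return 1.0 - prod(1.0 - p for p in child_probs)
    raise ValueError(f"unknown gate: {gate}")


# Hypothetical example: a function is lost if both redundant processing
# modules fail, or if the single network switch fails.
basic_events = {"cpm_a": 1e-3, "cpm_b": 1e-3, "switch": 1e-5}
tree = ("OR", ("AND", "cpm_a", "cpm_b"), "switch")
top = probability(tree, basic_events)
```

In this made-up example the unduplicated switch dominates the result, which is the kind of insight such an analysis is meant to give a system integrator.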

Creation, editing and management of models and configuration data are implemented using widely accepted extensions of the Eclipse environment, such as Eclipse Modeling Framework, Graphical Editing Framework, Eclipse Team Providing, SVN Team Provider, GIT Team Provider.

The MASIW Framework is modular and extendable. Third-party developers can extend the functionality of the toolset by creating their own modules to customize it.

Operational scheme

[Figure: MASIW operational scheme. Recoverable labels: the requirements; data from software or hardware vendors in the form of AADL-models; AADLib libraries; system configuration in the form of files (default.xml, mupd5_); AADL-model of the software-hardware system; model refinement; requirements refinement; analyzers (AFDX Network, REAL Checker, PyCL Checker, FTA, FMEA); results of model analysis; configuration data for JetOS and VxWorks653; IMA system; reports and documentation.]

The technology of indexing, searching and analysis of large spatio-temporal data

Purpose

The rapid growth of information volumes, as well as the need for its analysis and interpretation, leads to the development of new approaches to the management of multidimensional data and, in particular, to the management of spatio-temporal data. Popular general-purpose database management systems usually provide spatial indexing and retrieval tools for such purposes, which successfully manage the processing of static information, but are not adapted for data liable to permanent changes. In turn, temporal systems are oriented to work with data that has a history of changes, but do not take into account spatial factors. The problem of data management is even more complicated when the data are not just arrays of points in a multidimensional space, but complex structures, for example, a set of mobile objects with extended boundaries and imposed composition relations. For example, managing large-scale architectural and construction programs often involves a visual analysis of millions of objects, each of which has its own geometric representation and exhibits individual dynamic behavior.

The technology developed at ISP RAS is intended to create promising software systems and services that operate large arrays of spatio-temporal data or dynamic scenes. The class of such applications is extremely wide and covers such sub- ject areas as computer graphics and animation, geoinformat- ics, scientific visualization, CAD/CAM/CAE, robotics, logistics, planning and project management.

The technology provides for the use of original methods of spatio-temporal indexing, search and analysis of data, taking into account the peculiarities of their geometric representation, complex organization and the predetermined nature of the dynamics. Support for a developed set of temporal, metric, topological and orientational operations ensures efficient execution of typical spatio-temporal queries and solution of a wide range of applied problems related to qualitative and quantitative analysis of scenes. In particular, queries for reconstructing a scene at a given point in time, retrieving objects in a given spatial region, finding nearest neighbors, determining static and dynamic collisions, and conflict-free routing in a global dynamic environment are effectively resolved.

Implementation

The technology is implemented as an object-oriented library in the C++ language, which is an extensible set of classes, methods and related interfaces for specifying spatio-temporal data and executing typical queries to them. Due to data access virtualization, the library can be used both in the development of new software applications and in legacy applications, in order to optimize their work and expand user functions.

The organization of the library provides tools for managing data and changes, building and updating indexes, calculating and caching derived data, implementing operations, and executing applied queries. The advanced indexing system combines spatial decomposition trees, binary event trees, cluster object trees, occupation space trees and bounding volume trees. The configuration tools allow you to configure the library in the most rational way for solving special applications related to spatio-temporal queries.

The library supports the following types of operations:
— Temporal operations implement the classical interval algebra introduced by Allen with respect to discrete time stamps of events and their intervals.
— Metric operations allow you to determine the individual properties of geometric objects and the characteristics of their mutual arrangement. Diameter, area, volume, center of mass, planar projections, and distances between objects can be calculated for solid geometric objects.
— Topological operations are intended to classify the relative location of objects and establish the facts of their coincidence, coverage, intersection, overlap, touch, or collision. In comparison with the known topological models DE-9IM, RCC-8 and RCC-3D, the operations allow constructive implementation and are applicable for analysis of complex objects.
— Orientational operations generalize the known Frank's and Freksa's relative orientation calculi, cardinal direction calculi (CDC) and oriented point relation algebra (OPRA), and are applicable for analysis of objects with extended boundaries. This is achieved through alternative interpretations of classical directional calculi.

A computational strategy is used to determine collisions in the scenes. The strategy combines methods for precise determination of collisions between geometric primitives, collision localization methods using spatial decomposition based on regular octrees and kd-trees, hierarchies of bounding volumes based on AABB and OBB parallelepipeds, and temporal coherence methods. The collision detection strategy demonstrates uniformly high performance for scenes with different complexity characteristics.

A new original method for navigation in a global dynamic environment has been developed and implemented. The method is based on extracting spatial, metric and topological information from the geometric representation of 3D scenes and its concerted usage in path planning. Global routes obtained using topological maps are verified and corrected against collisions using popular local planning algorithms like rapidly exploring random trees (RRT) and probabilistic roadmaps (PRM).

Industrial application

The technology has been successfully validated in the course of development of the Synchro software system intended for visual modeling, planning and management of large-scale industrial projects.

The functions of the system provide consolidation of project data and schedule, visualization of project activities, identification of collisions, project progress monitoring, financial monitoring, and preparation of illustrated documentation using a series of images and video materials. The graphical user interface of the system includes Gantt charts and synchronized views of three-dimensional scenes, resource utilization and earned value analysis plots.
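To make the idea of identifying collisions between project activities concrete, here is a minimal sketch in Python. It is not the Synchro or ISP RAS implementation: it assumes each activity occupies a single axis-aligned bounding box (AABB) for a single time interval, and it checks all pairs by brute force instead of using the indexing structures described in this catalogue. The `Activity` class and all sample data are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Activity:
    """Hypothetical project activity: one box in space, one interval in time."""
    name: str
    box_min: tuple  # (x, y, z) lower corner of the bounding box
    box_max: tuple  # (x, y, z) upper corner of the bounding box
    start: float    # first moment the space is occupied
    end: float      # moment the space is released


def boxes_overlap(a, b):
    # Two AABBs intersect only if their extents overlap on every axis.
    return all(a.box_min[i] < b.box_max[i] and b.box_min[i] < a.box_max[i]
               for i in range(3))


def intervals_overlap(a, b):
    # Half-open intervals [start, end): intervals that merely touch do not clash.
    return a.start < b.end and b.start < a.end


def find_clashes(activities):
    """Return name pairs that share both space and time (brute force, O(n^2))."""
    return [(a.name, b.name)
            for a, b in combinations(activities, 2)
            if intervals_overlap(a, b) and boxes_overlap(a, b)]


activities = [
    Activity("crane", (0, 0, 0), (4, 4, 10), start=0, end=5),
    Activity("scaffold", (3, 3, 0), (6, 6, 8), start=4, end=9),
    Activity("formwork", (3, 3, 0), (6, 6, 8), start=9, end=12),
]
clashes = find_clashes(activities)
print(clashes)
```

The scaffold and the formwork share the same box but are separated in time, so only the crane and the scaffold are reported. A production system would replace the pairwise loop with spatial and temporal indexes.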

A consolidated model of project data makes it possible to account for and control various factors of the project activities. For example, the implemented tools for solving project planning problems in the generalized formulation (Generally Constrained Project Scheduling Problem) make it possible to generate reliable and trustworthy schedules that take into account not only the imposed time conditions, precedence relations, resource constraints and calendar rules, but also specific requirements for spatio-temporal concordance of project activities and for their financial and logistic support. Examples of such requirements are conditions for attracting investment funds, restrictions on material supply chains, rules for deployment and use of equipment, particularities of mounting elements of erected structures, and conditions for reserving work areas on project sites. These requirements are important for large-scale industrial programs, in which the risks of technological and organizational errors are extremely high and deadlines and budgets are severely limited.
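The simplest of the constraints listed above, precedence relations, can be illustrated with a short Python sketch. It computes the earliest start of each activity by a forward pass over the precedence graph. This is a deliberately reduced illustration and not the scheduling method used in Synchro: resource, calendar, spatial and financial constraints are ignored, and the activity names and durations are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def earliest_schedule(durations, predecessors):
    """Forward pass: an activity starts once all of its predecessors finish."""
    start, finish = {}, {}
    # static_order() yields every activity after all of its predecessors.
    for task in TopologicalSorter(predecessors).static_order():
        start[task] = max((finish[p] for p in predecessors.get(task, ())),
                          default=0)
        finish[task] = start[task] + durations[task]
    return start, finish


# Hypothetical activities with durations in days.
durations = {"excavation": 5, "foundation": 10, "frame": 15, "utilities": 7}
predecessors = {
    "foundation": {"excavation"},
    "frame": {"foundation"},
    "utilities": {"foundation"},
}

start, finish = earliest_schedule(durations, predecessors)
print(start)
print("project duration:", max(finish.values()))
```

Each additional class of constraint (resources, calendars, reserved work areas) narrows the set of feasible start times further, which is what turns this simple pass into a hard optimization problem.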

Currently, the software system has been successfully applied by more than 300 companies in 36 countries.

Texterra: basic semantic analyzer

Smart text analysis

Texterra is a scalable platform for extracting semantics from text. It is the basic set of technologies for creating multifunctional applications. It analyzes texts using concept identification. It is included in the Unified Register of Russian software.

Texterra performs a unique analysis of Russian texts based on the identification of concepts instead of just words. It differs from foreign analogues by predominant attention to the Russian language. The analyzer is based on the results of basic research and provides the ability to integrate with the Elasticsearch search system, significantly expanding its basic capabilities. The successful combination of technologies allows the platform to compete with projects of the IBM Watson Natural Language Understanding level.

Why Texterra?

1 Unique combination of functions

Texterra is defined by:
— High text processing speed (morphological analysis: 69,000 words per second; syntactic analysis: 39,100 words per second; coreference resolution: 10,100 words per second; full text analysis: approximately 13,600 words per second);
— Maximum attention to the Russian language (unlike similar projects, spaCy and UDPipe, as well as IBM Watson Natural Language Understanding, which does not support the analysis of emotions and concepts in Russian-language texts);
— Large amount of knowledge (more than 7 million concepts);
— Building the knowledge base without the involvement of experts (automatic replenishment using Wikipedia, MediaWiki, Linked Open Data, etc.);
— Scalability both in word processing speed and in knowledge volume (using Apache Ignite and the original ISP RAS cloud technology);
— High text analysis accuracy due to a number of key features:
– Analysis of emotional coloring (with separation of attitude towards objects and their attributes);
– Adaptability to slang, hashtags and errors;
– Multi-level search by related concepts;
– Determination of the relationship of people and companies (based on information in the text);
– Definition of implicit references to objects during discussions.
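The processing speeds quoted in this section can be turned into rough time estimates. The Python sketch below does this arithmetic for a hypothetical corpus of one million words. It assumes that throughput stays constant regardless of corpus size and hardware, which the catalogue does not state.

```python
# Throughput figures quoted in this catalogue, in words per second.
WORDS_PER_SECOND = {
    "morphological analysis": 69_000,
    "syntactic analysis": 39_100,
    "coreference resolution": 10_100,
    "full text analysis": 13_600,
}


def processing_seconds(word_count, stage):
    """Estimated processing time, assuming constant throughput (an assumption)."""
    return word_count / WORDS_PER_SECOND[stage]


corpus_words = 1_000_000  # hypothetical corpus size
for stage in WORDS_PER_SECOND:
    print(f"{stage}: {processing_seconds(corpus_words, stage):.1f} s")
```

At the quoted rate for full text analysis, such a corpus would take a little over a minute on a single processing stream.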

2 Maximum adaptability for Russian customers

Texterra is a high-tech product that combines advanced scientific developments with the possibility of their industrial use. Our local technical support works with maximum attention to the Russian customer.

The main advantages are:
— High speed of individual solutions development;
— Two use cases:
– as a standalone product on the customer's local server with access via both the HTTP protocol (REST architecture) and the RMI protocol;
– online at https://texterra.ispras.ru/;
— Continuous training of the customer's developers, as well as innovative technological refinement of the product in accordance with new problems and challenges;
— Simple and fast development of specific subject areas and the ability to integrate new languages for analysis (thanks to the modern approach to machine learning).

Who is Texterra intended for?

— Corporate software developers (chat bots in particular);
— Developers of semantic search systems for specific subject areas (information security, medicine, auditing, etc.);
— Developers of arbitrary text processing applications.

Who do we cooperate with?

Texterra was upgraded to the industrial level in the framework of cooperation with HP and Samsung (the goal of joint projects is to obtain technologies for analyzing corporate reporting and supporting the work of smart television). Currently, a number of original developments of the ISP RAS (in particular, the Talisman social media analysis technology) are working on the platform. Texterra is also used by a number of Russian government departments.

Supported languages

Texterra analyzes texts in Russian and English.

System requirements

— Any platform supported by Java 1.8;
— At least 16 GB of RAM for each of the analyzed languages;
— We recommend using a 64-bit OS.

Operational scheme

[Figure: Texterra operational scheme, three modules.
Linguistic analysis module: language identification; morphological analysis with error correction; analysis of syntax and semantics.
Information extraction module: key concepts identification; identification and recognition of mentioned concepts; relationships extraction.
Sentiment analysis module: analyzes the opinions of social media users, taking slang and hashtags into account.]

Talisman: social media analysis technology

Analyzes everything, finds the essence

Talisman is a big data processing solution for social and commercial information retrieval. It recognizes patterns in relationships by analyzing large graphs of hundreds of millions of nodes.

Why Talisman?

1 A unique combination of features

Talisman is an industrial solution integrated with a platform for semantic extraction (Texterra) and the original ISP RAS technology for data mining. In terms of technological level, Talisman is comparable to the world's best analogs (Palantir Gotham and IBM Watson Content Analytics). Its advantage is the automation of routine processes using recent scientific achievements, which reduces the cost of analysis.

Talisman is defined by:
— The combination of essential features, specifically:
– Semantic analysis utilizing the capabilities of the Texterra platform (sentiment analysis, work with meaning instead of terms, which is unique for the Russian language, the ability to analyze users' comments and identify implicit references to objects in discussions, etc.);
– Analysis of large graphs consisting of hundreds of millions of nodes (including automatic construction of information distribution graphs with role definition: source, distributor, opinion leader, reader);
– Automatic grouping of messages by topics (a map of all discussed topics in the information space, taking into account the flow between different resources);
– Identification of true users' attributes in social networks. Determination of gender, age (to within a year), education, marital status, place of residence based on the analysis of profiles and user activity (expandable list);
– Automatic recognition of target audience parameters (aggregation by demographic attributes and identification of dominant values);
– Information validation tools (detection of bots, spam filtering, and signs of audience opinion manipulation).
— Reports on monitored information within a few minutes after publication, thanks to the big data analysis technologies of the Apache Hadoop stack and the elastic scalability of the system using the original cloud technology of ISP RAS;
— Analysis of any big data: corporate, news, social networks (VK, Instagram, Youtube, Odnoklassniki, Twitter, Facebook, LinkedIn, etc.), blogs (LiveJournal), open channels of the Telegram messenger and Dark web resources. Talisman can be integrated both with the original data acquisition technology of ISP RAS and with external collectors.

2 Maximum convenience for Russian customers

Talisman provides a number of profitable offers for a Russian customer:
— Industrial deployment of solutions on a customer's equipment (Talisman is a completely detached product);
— Functioning in SaaS mode;
— Local Russian technical support (including training for the customer's developers during implementation);
— Fast adaptation and expansion of functionality for various domains (information security, medicine, auditing, etc.).

Application areas

— Detection of interest groups based on social media analysis (for marketing and political purposes), e.g. target audiences, hotbeds of social tensions and groups addressing hot-spot issues;
— Recognition of online public opinions on people, organizations, and products;
— Identification of key trends and forecasting of advertisement effectiveness;
— Optimization of personnel management (efficient recruitment, data verification, assistance in developing systems of incentives based on short-term and long-term interests, monitoring of leakage and disclosure of nonpublic information);
— Reputation management (in particular, determination of causes of employee and customer grievances);
— Recognition of information campaigns aiming at manipulating opinions of target audiences, as well as identification of said campaign's target audience.

Supported languages

Talisman currently supports languages recognized by the Texterra analyzer (Russian and English).

Operational scheme

[Figure: Talisman operational scheme. Data collection (profiles of users and groups, avatars, friendship graphs, subscription graphs, posts on the walls, reposts, likes, accounts, applications, messages from forums with authorship, mass media messages) feeds the raw data storage. Aggregation and information extraction: content analysis (text, images, video) produces tags for each content unit; community analysis (friendship graph, posts graph, actions graph) produces tags for each profile. Results go to the processed information storage for additional analysis. Interface: Web, API.]
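The role definition mentioned among the features (source, distributor, opinion leader, reader) can be illustrated on a toy repost graph. The Python sketch below is an assumption-laden illustration, not Talisman's algorithm: the classification rules and the `leader_threshold` parameter are hypothetical heuristics chosen only to show what role assignment on an information distribution graph may look like.

```python
from collections import defaultdict


def classify_roles(reposts, leader_threshold=3):
    """Assign a role to every user in a repost cascade.

    `reposts` is a list of (author, reposter) pairs meaning that `reposter`
    shared a message of `author`. The rules below are illustrative heuristics.
    """
    reposted_by = defaultdict(set)    # author -> users who reposted them
    reposted_from = defaultdict(set)  # user -> authors they reposted
    for author, reposter in reposts:
        reposted_by[author].add(reposter)
        reposted_from[reposter].add(author)

    roles = {}
    for user in set(reposted_by) | set(reposted_from):
        spreads = len(reposted_by.get(user, ()))
        receives = len(reposted_from.get(user, ()))
        if spreads == 0:
            roles[user] = "reader"          # consumes but is never reposted
        elif receives == 0:
            roles[user] = "source"          # original author of the message
        elif spreads >= leader_threshold:
            roles[user] = "opinion leader"  # relays to a wide audience
        else:
            roles[user] = "distributor"     # relays to a few users
    return roles


# Hypothetical cascade of five reposts.
reposts = [("news", "ann"), ("ann", "bob"), ("ann", "carl"),
           ("ann", "dina"), ("bob", "eve")]
roles = classify_roles(reposts)
print(roles)
```

On graphs of hundreds of millions of nodes the same idea would need distributed processing and more robust criteria than simple degree counts.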

Lingvodoc: virtual laboratory for documentation of endangered languages

GitHub for language experts

Lingvodoc is a system intended for collaborative multi-user documentation of endangered languages, creating multi-layered dictionaries and performing scientific work with the received sound and text data. It is a joint project with the Institute of Linguistics of the Russian Academy of Sciences and Tomsk State University, under development since 2012. Project website: lingvodoc.ispras.ru.

Why Lingvodoc?

Lingvodoc is an open source, cross-platform technology (github.com/ispras/lingvodoc and github.com/ispras/lingvodoc-react), based on scientific research and combining a number of necessary functions. The system is being constantly improved. Currently, the development of a unique solution for the construction of isoglosses is being finished within the framework of a joint project with TSU.

1 Unique combination of essential features

Lingvodoc is defined by:
— Collaboration of users on vocabulary data replenishment (unlike the similar Starling project, where such work is not foreseen);
— Saving the full history of user actions;
— Simultaneous work with audio and text corpora and dictionaries, based on integration with the ELAN program developed by the Max Planck Institute for Psycholinguistics (Netherlands);
— Arranging unidirectional and bidirectional links between lexical entries within dictionaries as well as between dictionaries;
— Recording, playing and storing annotated sounds (in wav, mp3 and flac formats), as well as constructing vowel formants with subsequent visualization;
— Advanced search, which allows you to search data in dictionaries by a variety of parameters (as opposed to the similar TypeCraft project);
— The possibility of conflict-free two-way slow synchronization;
— Increased level of automation (compared with the similar Kielipankki project).

2 Wide opportunities for users

— Creating dictionaries of any structure, both typical two-layer (lexical entries and paradigms layers) and multi-layer. In addition, there is an import function for ready-made dictionary structures;
— Work both with the involvement of the cloud resources of the Institute for System Programming of the RAS (currently the client-server architecture is optimized for the VMEmperor cloud infrastructure) and with the deployment of a local version with isolation of its own data;
— Availability of a web viewing program and a desktop version;
— Open registration (with confirmation);
— Prompt improvement of the technology for any customer with the expansion of functionality, as well as adaptation for another scientific branch.

Who is Lingvodoc intended for?

Lingvodoc is intended first of all for language experts who are engaged in scientific work in the field of documentation of endangered languages. It is, however, possible to adapt the technology for other purposes.

Where is Lingvodoc used?

Lingvodoc is currently used in joint projects with the Institute of Linguistics of the Russian Academy of Sciences and Tomsk State University. Negotiations are also underway with a number of research institutes.

Operational scheme

[Figure: Lingvodoc operational scheme. A linguist works in a browser with the Lingvodoc frontend web interface (react, apollo, redux, javascript, semantic UI, wavesurfer, leaflet), which talks to the Lingvodoc backend over the GraphQL HTTP protocol. A programmer can reach the same GraphQL HTTP interface from any language with HTTP support (python, ruby, C++, lua, C#, Apple swift, java, scala), using bash and curl, or using browser add-ons (such as Altair). The backend is built with python 3.5+, pyramid, celery, dogpile, graphene, SQLAlchemy and C extensions.]
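Since the scheme shows that the backend is reachable over GraphQL from any language with HTTP support, a client can be sketched with the Python standard library alone. Everything specific in the sketch is an assumption: the endpoint URL is a placeholder on the reserved `.example` domain, and the query uses a made-up field, because this catalogue does not describe the actual Lingvodoc schema. The general shape (a JSON body with `query` and `variables`, sent by POST) is the common GraphQL-over-HTTP convention.

```python
import json
import urllib.request

# Hypothetical values: the real endpoint path and schema fields are not
# given in this catalogue and must be taken from the Lingvodoc sources.
ENDPOINT = "https://lingvodoc.example/graphql"
QUERY = "query { dictionaries { id } }"


def build_graphql_request(endpoint, query, variables=None):
    """Build a GraphQL-over-HTTP POST request without sending it."""
    payload = json.dumps({"query": query, "variables": variables or {}})
    return urllib.request.Request(
        endpoint,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(endpoint, query, variables=None):
    """Send the request and decode the JSON response (needs network access)."""
    request = build_graphql_request(endpoint, query, variables)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_graphql_request(ENDPOINT, QUERY)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Only `build_graphql_request` is exercised here; `run_query` would be called against a real server once the endpoint and schema are known.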

SciNoon: exploratory search system

Smart search for scientific publications

SciNoon is a system for exploratory search of scientific articles. It combines a number of unique features to optimize the process of searching and analyzing the results. In particular, it allows you to work in a team and keep a history of user actions. It can work with big data.

SciNoon is an innovative system designed to optimize long-term teamwork with scientific publications. It helps to explore new subject areas and maintain awareness in any one of them. It solves the same tasks as the major global analogues (Google Scholar, Semantic Scholar, Microsoft Academic Search), but at the same time it possesses unique functions.

Why SciNoon?

1 Optimal combination of essential functions

SciNoon is defined by:
— A unique feature of collaborative search support required for successful teamwork:
– a common workplace accessible to all participants of the research group;
– keeping all members of the group aware of the work of each of them by means of an integrated chatbot.
— Optimized system for collection of relevant scientific articles:
– aggregation of data from different sources (user-uploaded PDF files with full texts of articles as well as metadata collected using a browser plugin integrated with Google Scholar);
– a service of recommendations based on previously selected articles;
– a citation graph available for navigation.
— Review of accumulated results via a user-friendly interface:
– visualization of all selected articles in the form of a citation graph with the scaling possibility;
– the possibility to indicate aspects specific to the research task and the corresponding markup of articles;
– visualization of individual articles taking their importance and indicated aspects values into account;
– tabular presentation of selected articles taking both metadata and indicated aspects values into account;
– support for semi-automatic clustering.
— Big data operations support:
– own graph model to represent knowledge of all articles, authors and ongoing research. Scaling to graphs of tens of millions of nodes by using a graph database deployed on top of Apache Cassandra;
– scaling business logic by using Akka;
– integration with Apache Spark.

2 Maximum user convenience

— Two use cases:
– as a standalone product on the customer's local server;
– online at https://scinoon.at.ispras.ru.
— Timely adaptation of technology and expansion of functionality for use in various subject areas.

Who is SciNoon intended for?

— Employees of R&D departments of corporations;
— Employees of research institutes who need a tool for teamwork;
— University teachers and students who search and study the literature when preparing scientific works.

Operational scheme

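[Figure: SciNoon operational scheme. Inputs (metadata, articles in PDF) pass through the information extraction module, then deduplication and data cleaning, into the graph of articles, authors and conducted researches, which is presented through the user interface.]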
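The deduplication step and the citation graph from the scheme can be illustrated with a small Python sketch. It is not SciNoon's implementation: it assumes that duplicates can be detected by comparing normalized titles (real systems typically also use DOIs, authors and years), it only handles Latin-script titles, and all records are hypothetical.

```python
import re


def normalize_title(title):
    """Lowercase and drop punctuation so that trivial variants compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def deduplicate(records):
    """Merge records with the same normalized title, uniting their citations."""
    merged = {}
    for record in records:
        key = normalize_title(record["title"])
        entry = merged.setdefault(key, {"title": record["title"], "cites": set()})
        entry["cites"].update(normalize_title(t) for t in record["cites"])
    return merged


def citation_counts(articles):
    """Count, for each collected article, how many collected articles cite it."""
    counts = {key: 0 for key in articles}
    for article in articles.values():
        for cited in article["cites"]:
            if cited in counts:
                counts[cited] += 1
    return counts


# Hypothetical records, e.g. one from an uploaded PDF and one from metadata.
records = [
    {"title": "Graph Databases at Scale", "cites": ["Distributed Storage Basics"]},
    {"title": "graph databases at scale.", "cites": ["Actor Systems"]},
    {"title": "Distributed Storage Basics", "cites": []},
    {"title": "Actor Systems", "cites": ["Distributed Storage Basics"]},
]

articles = deduplicate(records)
counts = citation_counts(articles)
print(len(articles), "unique articles")
print(counts)
```

The in-collection citation count is one simple proxy for the "importance" of an article that a visualization could use, for example to size nodes in the citation graph.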

Cloud environment based on the customized Openstack technology

Provides the ability to store data and perform complex, resource-intensive calculations using both containers and virtual machines. It is particularly intended for the deployment of cloud environments.

Main advantages

— Maximum simplification of complex tasks;
— Adaptation to build solutions for specific problem classes (CFD, big data analytics, program analysis for defect detection);
— Technological flexibility, security and detachability of solutions (the ability to recreate the infrastructure in an isolated environment with full control over it through the use of open standards, free software and scientific developments of the Institute for System Programming of the Russian Academy of Sciences);
— Prompt work of local technical support, which possesses all the necessary competencies.

The complex is currently represented by three solutions:

1 Big Data Open Lab computer cluster

This environment was created in the framework of a joint project with Dell. It is designed for short-term calculations with large available resources. It has been reliably operational since 2014.

— Deployed on the basis of the open Openstack technology, which is the standard for building large cloud systems;
— Provides users with all the necessary functionality for short-term calculations with large available resources:
– management of virtual networks and computer clusters using the Keystone, Neutron and Nova systems (similar to Amazon EC2);
– data storage system based on the Cinder block storage (similar to Amazon Elastic Block Storage);
– easily expandable object storage based on Openstack Swift (similar to Amazon S3).
— Provides the possibility to develop and implement services at the PaaS level:
– for big data analysis, with fully configured Apache Spark, Apache Hadoop and Apache Ignite systems and an arbitrary number of computing nodes (starting one cluster takes about 5 minutes). It is publicly available (https://github.com/ispras/spark-openstack);
– for artificial intelligence research using Tensorflow, Caffe, etc., as well as modern hardware (NVIDIA Tesla V100 servers on the SXM2 bus);
– to work with HPC.

2 VMEmperor virtual machine management solution

Designed at the Institute for System Programming of the Russian Academy of Sciences for solving internal problems; publicly available (https://github.com/ispras/vmemperor). Designed to manage virtual resources at the IaaS level. It has been running continuously on the XCP-ng / Citrix XenServer platform since 2012, providing users with easy access to virtual resources and their orchestration.

3 Fanlight web-laboratories organization platform

Created as a result of the participation of the Institute for System Programming of the Russian Academy of Sciences in the «University Cluster» program and in the Open Cirrus international project (established by Hewlett-Packard, Intel and Yahoo!). It is intended for deploying SaaS infrastructures for web-based computing labs using Docker Compose. It is built on virtual containers and operates on the basis of virtual desktops in the DaaS (Desktop as a Service) model. The platform is available to users on the fanlight.ispras.ru website and supports only applications developed for operating systems based on the Linux kernel.

— Demonstrates high-performance cloud computing through the use of containers:
– comfortable work with heavy CAD/CAE engineering applications that require hardware acceleration support for 3D graphics for complex visualization;
– support for running MPI, OpenMP and CUDA applications by accessing HPC clusters, multi-core processors and NVIDIA graphics accelerators.
— Expands computing capabilities at the PaaS level by engaging hardware resources (HPC / BigData clusters, storage systems, servers with graphics accelerators);
— Allows customization for a given application area by integrating specialized design application packages. There is implementation experience in:
– the continuum mechanics field: OpenFOAM, SALOME, Paraview, etc.;
– the oil and gas field: tNavigator, Eclipse, Roxar, Tempest, etc.
— Allows the user to work from any thin client (including mobile devices) without auxiliary software;
— Can be deployed on a server, on a computing farm, in the cloud (from the IaaS level) or in the customer's own data processing center.

Implementation experience

The Big Data Open Lab computer cluster is used to analyze information flows in the Talisman social media analysis technology and to support operation of other technologies of the Ivannikov Institute for System Programming of the Russian Academy of Sciences (in particular, to analyze Android OS using Svace). A joint project with Huawei (large graphs analysis using big data processing technologies) and an infrastructure that supports the Tizen OS lifecycle, organizing the joint development process of OS components and automating regular assembly and testing of samples, are implemented. In addition, a number of works are carried out with the participation of the Ministry of Science of the Russian Federation.

The capabilities of the Fanlight platform were used in a number of joint projects for the deployment of web-laboratories with the Russian Federal Nuclear Center All-Russian Scientific Research Institute of Experimental Physics, OOO RRS-Baltika, the Keldysh Institute of Applied Mathematics (development of technology to increase the resource potential and efficiently use the hydrocarbon raw materials of the Union State), as well as the Laboratory of Continuum Mechanics (https://unicfd.ru).

VMEmperor has not been used in external commercial projects; however, it is used in internal projects of the Ivannikov Institute for System Programming of the Russian Academy of Sciences.