Preprints of the 18th IFAC World Congress Milano (Italy) August 28 - September 2, 2011

Central Processing Units for PLC implementation in Virtex-4 FPGA

M. Chmiel*, J. Mocha**, E. Hrynkiewicz***, A. Milik****  Institute of Electronics, Silesian University of Technology, Gliwice * (e-mail: [email protected]), ** (e-mail:[email protected]) *** (e-mail:[email protected]), **** (e-mail:[email protected])

Abstract: The paper presents an approach to the design and construction of central processing units for programmable logic controllers implemented in an FPGA development platform. The presented units are optimised for minimum response time and throughput time. The CPU structure is based on a bit-word architecture and two methods of control data exchange: with handshaking, where control data are passed through two flip-flop units with acknowledgement, and without handshaking, where control data are passed through a dual port RAM. A third unit, a simple single-processor one, was built for comparison with the above two. The paper presents a specific hardware solution for the construction of timers and counters. Additionally, it presents implementation results which show how many FPGA resources are used to implement the presented units.

Keywords: Programmable Logic Controller (PLC), Central Processing Unit, Throughput Time, Concurrent Programs, Field Programmable Gate Array (FPGA).

1. INTRODUCTION

One of the main parameters of a Programmable Logic Controller (PLC) is the scan time, i.e. the execution time of one thousand control commands. The CPU should therefore have an architecture that enables fast execution of the control program, and designing it is a very important task. Most PLC CPUs delivered by well-known manufacturers are constructed as multiprocessor units. Each processor in such a unit executes the tasks assigned to it. In this way one obtains a unit that allows concurrent operation of several processors. For such a CPU the main problems are how to assign tasks to particular processors and how to find a CPU structure able to execute the assigned tasks in practice, as shown by Michel (1990). Another important problem, inseparable from the hardware, is the programming tools. Those tools should enable easy and efficient creation of control algorithms. The programming toolbox should benefit from all aspects of the multiprocessor unit.

Apart from the instruction execution time, the access time to internal resources (markers, counters, timers) and external resources (inputs and outputs) is a very important parameter. Another parameter which characterises a PLC is the throughput time. It is defined as the response time to a change of the object signals. From the point of view of the controlled object this is the most important parameter: it describes the quality of control and is directly determined by the central processing unit and the programming toolbox (Chmiel, 2008).

PLCs mainly control processes of a binary nature. In some cases they are used for mixed control that includes analogue signals (Koo et al., 1998). There are many objects where the control can be divided into independent tasks. The boundaries of independent tasks are often determined by their analogue or binary nature, as well as by the set of process signals and control conditions. This observation leads to the conclusion that a bit-word structure of the PLC CPU matches the typical processed data well. The CPU structure is often optimised for very fast logic operations and for the execution of complicated arithmetic operations (including floating point). To benefit from the described architecture both processors must work in parallel and as independently as possible. To make this possible, the two processors must be equipped with specific hardware and software solutions.

The most effective and natural approach to the problem of task assignment is partitioning along the operation type (bit or word). The tasks operating on discrete inputs/outputs are executed by a bit-processor (Getko, 1983). Nowadays such processors may be implemented in programmable structures like CPLDs or FPGAs. This has a positive effect on the user program execution time (the controller is sped up). On the other hand, a word-processor is built on the basis of a standard microprocessor or embedded microcontroller. It is used for word data processing in the control of analogue objects, numeric data processing and operating system maintenance of the PLC (networking, diagnostics, control loop) (Donandt, 1989; Aramaki et al., 1997).

As mentioned above, an efficient and very promising platform for control unit implementation is one based on programmable logic devices. This platform may be based on Field Programmable Logic (FPL), especially Field Programmable Gate Arrays (FPGAs). System architects are offered powerful tools which keep the financial and time outlays acceptable in relation to the effects. The FPL enables easy prototyping, testing and evaluation of different solutions.

Copyright by the International Federation of Automatic Control (IFAC)

2. FPGA PLATFORM - HARDWARE AND SOFTWARE SOLUTION

Large density FPGA devices offer a platform that enables different approaches to constructing a PLC CPU. The CPU can be constructed from off-the-shelf CPU IP cores. These can be cores that are compatible with standard microprocessors or microcontrollers. An alternative is to design a custom CPU from scratch. Such a CPU can be designed to satisfy the requirements of the application. Those requirements are reflected in the instruction set and interface operation. This approach is much more laborious, but the results seem to be closer to optimal.

The authors have decided to use a development board with a Virtex-4 (Xilinx, 2006) to perform experiments. Virtex-4 logic resources are sufficient to implement and evaluate a dedicated PLC CPU (processors and required peripherals) in FPGA structures. It must be mentioned that the implemented central processing units work in the classical manner. The central processing unit executes instructions in a serial-cyclic manner, as opposed to the parallel, hardware-specific processing of a ladder diagram which is possible in reconfigurable logic devices (Ichikawa et al., 2006).

In prior research a comparison of basic structures was carried out (Chmiel et al., 2010). Three different structures were designed. The VHDL hardware description language was used for the design (Skahill, 2004). The following structures were evaluated:
• dual processor where one processor waits for the results from the other;
• dual processor with fully asynchronous execution (no synchronisation between processors, no waiting for each other);
• single processor that executes bit and word instructions.

Ideas presented in (Chmiel and Hrynkiewicz, 2005; Chmiel et al., 2005) were used to build the bit-word structure of the CPU. Different ideas of concurrent execution of instructions, as well as the processor synchronisation mechanism based on common data dependencies, are presented in the cited papers. The units with fully concurrently operating processors were used in the experiments. Information between processors was exchanged in two alternative ways:
• by means of flags written to flip-flops equipped with a handshake mechanism; one flag is written by each processor and made available to the opposite one for reading (Fig. 1);
• by means of exchange memory, which was implemented in dual port RAM (Fig. 2). One side has full access to the memory while the opposite one is granted read access only.

In order to exploit specific features of the designed units, a specialised compiler was developed. Assembler was recognised as the best language for which to build the compiler. The assembler may be compared to Siemens' STL languages for the S7-300/400 (Berger, 2001) and S7-200 (Siemens, 2009) PLC families.

The programmer writes a program in the form of a sequence of instructions. The compiler checks the syntax and splits the sequence of instructions into two streams. Those streams are later compiled separately for each processor and written to the program memory of the respective processor.

Fig. 1. CPU with exchange flip-flop register block (Chmiel and Hrynkiewicz, 2008).

Fig. 2. CPU with exchange memory (BlockRAM) (Chmiel et al., 2010).

From the point of view of the experiments, the possibility of introducing new commands is very important. The process of developing new commands influences the processor hardware. This is because each new command means that new functionality must be modelled in a hardware description language. Finally, the new structure has to be synthesised and implemented in the target architecture.

The assembler program is able to process macros. Macros enable creating sequences of instructions that perform a specific operation (e.g. configuring I/O units or timer/counter units). Using macros simplifies writing the programs and increases the level of abstraction.

3. PROCESSORS STRUCTURES

For experimental and evaluation purposes, three structures of the central processing unit have been designed, described in VHDL and finally implemented. Two dedicated processors have been designed: one for bit operations and one for word operations. The third processor has been developed as a general processor equipped with word and bit operations. It has not been equipped with additional hardware support allowing for hybrid (bit-word) multiprocessor operation.
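The flag exchange with handshaking described in Section 2 can be illustrated with a small behavioural model. This is only a sketch under stated assumptions: the class `HandshakeFlag`, its `ready`/`empty` state and the rule that a write is rejected until the previous flag has been read are hypothetical and merely echo the READY/EMPTY signal names visible in Fig. 1; the paper does not give the exact protocol.

```python
class HandshakeFlag:
    """One-bit exchange flip-flop with acknowledgement (behavioural model).

    Hypothetical model: names and protocol details are assumptions.
    """

    def __init__(self):
        self.value = False
        self.ready = False  # True while an unread flag is waiting

    @property
    def empty(self):
        return not self.ready

    def write(self, value):
        """Producer side: store the flag only if the previous one was read."""
        if self.ready:
            return False  # not yet acknowledged, producer must retry
        self.value = bool(value)
        self.ready = True
        return True

    def read(self):
        """Consumer side: return the flag and acknowledge it (None if empty)."""
        if not self.ready:
            return None
        self.ready = False  # the acknowledgement releases the flip-flop
        return self.value


# One flag per direction, as suggested by Fig. 1.
bit_to_word = HandshakeFlag()
word_to_bit = HandshakeFlag()

first = bit_to_word.write(True)    # accepted
second = bit_to_word.write(False)  # rejected, flag not yet read
received = bit_to_word.read()      # returns the flag and releases it
third = bit_to_word.write(False)   # accepted again
```

In the variant without handshaking, the dual port RAM would be modelled as a plain shared array that one side writes and the other only reads, so neither processor ever has to retry.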


Fig. 3. Word processor block diagram.

3.1 Word Processor Hardware Implementation

A CPU designed around a standard word processor has reduced performance of word operations in comparison with a dedicated custom-designed bit processor. An attempt has been made to implement a word processor optimised for controller implementation. For reference purposes, features implemented in Siemens' PLCs of the S7-300/400 series have been used. In these controller families the designers assumed that each instruction can have only one argument. This assumption (or design constraint) requires implementing two accumulator registers for word operations. One of them is the default accumulator register (ACCU_A). Taking into consideration the experience of Siemens and our own research, the block diagram of the designed word processor is depicted in Fig. 3. The word processor consists of the following blocks:
• Instruction decoder is a part of the microprogrammable control unit. It delivers control signals that depend on the executed instruction and machine cycle. Each instruction cycle consists of several machine cycles. The shortest instruction cycle consists of two machine cycles. Each machine cycle is executed in one clock cycle. Each instruction can be executed unconditionally or conditionally. That is different from the processing units in Siemens solutions. A conditionally executed instruction is performed only if the content of Ac_A is True, otherwise the instruction is skipped. The Ac_A flag is equivalent to RLO in Siemens controllers. Ac_A stores the result of the last logic operation. The content of Ac_A can be loaded with the use of appropriate instructions from exchange memory or arithmetic flags (carry, overflow, division by 0), or can be read from the binary output of a counter or timer. Skipping an instruction takes two clock cycles;
• Arithmetic-Logic Unit (ALU) - this 16-bit unit operates on Accumulators A and B. The operation result is always stored in Accu_A as the default result target (single-address machine concept based on a default result target). The presented approach is flexible and offers a lot of freedom for programmers due to the rich instruction list. A well designed set of transfer instructions allows arguments to be gathered and results to be dispatched. An additional benefit is a simpler description of the controller in the VHDL language. Multiple-argument instructions require a much more complicated hardware structure. The single-address machine benefits from description simplicity. A new instruction can be implemented relatively easily. The ALU performs basic arithmetic operations like addition, subtraction, multiplication and division. Each arithmetic operation updates the arithmetic flags (carry, overflow, division by 0). Logic operations (AND, OR, EX-OR) are performed with simple and inverted arguments. Flags can be transferred to the Ac_A bit accumulator and can be used for conditional execution of instructions. A stack has been implemented that supports nested expressions and operation order. It is used for partial result storage. The stack implements 8 levels, which is large enough for typical operations. When a result is pushed on the stack from Accu_A it is also transferred to Accu_B;
• Compare unit is used to determine all possible relations between the Accu_A and Accu_B contents. The Boolean result of a comparison is transferred to Ac_A;
• Bit operation coprocessor unit enhances the performance of the processing unit, reduces the number of interface cycles and partially reduces the load of the bit-processor. The bit coprocessor performs operations on the Ac_A and Ac_B arguments. The logic operations can be performed taking into account the results of arithmetic operations. Ac_A is the target register for transfer operations from the Exchange Memory. As in the unit that operates on words, a stack is implemented. The bit (Boolean) stack is used to maintain operation order and to calculate nested expressions. The default logic accumulator is the Ac_A register. To improve multiple-argument comparison operations the arguments are automatically transferred from Ac_A to Ac_B;
• Auxiliary registers store data required for proper operation of the unit and enable co-operation of the processing unit with other system components.

3.2 Word Processor Instruction List

Fig. 4 shows schematically the data flow for arithmetic and logic operations on sets of bits.


Fig. 4. Realization of arithmetic and logic operations.

The following operations are performed on 16-bit words:
• Transfer instructions transfer the contents of word and bit accumulators. Transfer operations can be performed with I/O modules, marker memory and inter-processor data memory;
• Arithmetic operations are performed on 16-bit arguments, with the ability to calculate nested expressions or maintain the order of operations through the embedded stack;
• Comparison operations are performed on word accumulators while the result is stored in the bit/Boolean accumulator Ac_A;
• Logic operations are performed on word accumulator contents. They also benefit from the available stack, which allows executing simple operations as well as implementing nested expressions or maintaining the order of operations;
• Bit coprocessor instructions are a set of logic operations performed on the Ac_A and Ac_B bit/Boolean accumulator registers. The result is always stored in Ac_A;
• Counter and timer instructions allow for initialisation of counter contents and for incrementing, decrementing and clearing the counter or timer register contents;
• I/O space and process memory configuration.

3.3 Bit Processor Hardware Implementation

The bit processor has been designed to perform logic operations quickly. In order to unify the construction of the word and bit processors, some description and design concepts have been borrowed from the previously developed word processor. Using VHDL for design purposes enables flexible description and easy modification of functionality (Skahill, 2004). The general construction of the bit processor was derived from the word processor, with some simplifications made possible by the specificity of logic bit operations. The main differences are:
• Accumulator size is reduced from 16 bits to 1 bit. The Ac_A register is the default target register for all operations;
• Simplified ALU that is restricted to logic operations only. It should be called a Logic Unit (LU);
• Reduced number of auxiliary registers;
• Bit co-processor has been removed as it is no longer required in this structure;
• An 8-bit bus is used for data transfer purposes instead of a 16-bit one.

3.4 Bit Processor Instruction List

The instruction list of the bit processor covers the following operations:
• Data transfer instructions allow exchanging data between I/O space, marker memory and inter-processor data memory;
• Logic operations are performed on the accumulator content. The result is placed in the default location (Ac_A). The embedded result stack allows for nested operations and for easy maintenance of operation order;
• Reading of the binary/Boolean outputs of counters and timers;
• I/O space and process image memory configuration.

3.5 Single Bit/Word Processor

The single bit-word processor has been implemented using components that enable implementation of word and bit operations. This architecture differs from typical microprocessors. The instruction list has been carefully selected to support operations performed by the PLC.

3.6 Timer and Counter Hardware Implementation

Timers and counters are accessed by both the bit and the word processing units. The access method to the timers and counters unit should be considered carefully in order to achieve high speed program execution. Operation of this unit is controlled and maintained by the word processor. Results of its operations are used mainly by the bit processor. The typical task assignment is as follows: the word processor maintains operations of the unit while the bit processor uses the computed results (Chmiel, 2008). The proposed architecture minimises the processor load for servicing timers and counters. The FPGA enables implementation of dedicated timer and counter units. The timer and counter units have been designed to operate with single and dual processor units. The timers and counters are not an integral part of either the word or the bit processor; they are autonomous units that operate under configuration control of the word processor. They only require initialisation for proper operation. The initialisation procedure can be performed during system start-up. Results of their operations (actual states of timers and counters) are available for both processors through dedicated bit outputs.

Fig. 5. The connections between timers and processors.


The timers unit has been equipped with 16 timers called T0 to T15. The controller functionality available to users should be compatible with commercially available solutions. Operating modes are identical to those of the timers available in the Simatic S7-200 (Siemens, 2009). The following operation modes for timers have been implemented:
• TON – timer-on delay;
• TONR – timer on-delay retentive;
• TOF – timer-off delay.

All timers can operate with resolutions of 1 s, 100 ms, 10 ms and 1 ms, programmed individually in the time base unit. The maximum count is 16383 reference signal pulses.

Each timer and counter has an individual triggering input and output. This allows for direct access to timers and counters, reducing the system bus load. It also gives the word and bit processing units simultaneous access to the timers and counters.

All timers share common resources in the form of RAM that stores information about operation mode, time resolution and initial state. During normal operation, timers are updated sequentially in 16 cycles. Each timer update cycle consists of the following operations:
1. The configuration memory content of the processed timer is transferred to the operating register.
2. The timer content is updated only if the time base unit signals that a time interval has passed or the clear request flag has been set.
3. Based on the result of the update procedure, the output state of the timer is determined.
4. The timer state is written back from the operating register to the unit memory.

Each timer unit requires configuration before being placed in run mode. The word processor transfers the configuration word. The configuration word contains the terminal pulse count at which the output state changes and the time base information. Configuration word write operations are executed in two steps:
1. The configuration word is placed on the data bus together with the timer address.
2. The timer write enable line is activated, which transfers the control word from the data bus to the timer unit.

The timer configuration word (Fig. 6) consists of 18 bits. The 4 most significant bits determine the time base of the timer and its operation mode. The remaining 14 bits determine the terminal count value.

Bits 17..16    Bits 15..14    Bits 13..0
Time Base      Mode           Setup Time

Fig. 6. Structure of timer configuration word.

After the initial configuration, the timers are ready for normal operation. The word processor can read the state of a timer at any moment. The read operation is performed in the same way as for other peripheral units. Both the bit and the word units can read the state of the binary output of a timer.

Apart from timers, the designed unit has been equipped with 16 counters called C0 to C15. The operation modes are defined similarly to those of counters in commercially available programmable controllers. Counters do not require a time base unit. Each counter can operate in one of three modes:
• CTU – counting up;
• CTD – counting down;
• CTUD – bi-directional.

Even though the counters unit is independent, it should not be considered a fast counters unit. Its input is not directly connected to the controlled object. This unit is controlled by the word processor.

Programmatic control over the counters unit and its connections to the processing units are similar to those of the previously described timers unit. Counters, like timers, require a start-up configuration. The configuration word consists of the mode selection and the initial state of the counter. Depending on the operation mode, this value is treated as an initial value for CTD mode or as the output toggle value for CTU and CTUD.

The dedicated counter and timer units can be integrated with the PLC CPU thanks to the use of an FPGA implementation platform. In a typical PLC, by contrast, those operations are implemented in the software layer. The presented hybrid method pushes intensive periodic operations to dedicated hardware. Only a small part of non-periodic actions, like initialisation, is executed by the CPU.

4. IMPLEMENTATION RESULTS

After completing the design and verification processes, the processing unit has been implemented in the target device. To assess implementation quality, the number of required resources was collected for each block. The target device is a Xilinx XC4VLX25 that belongs to the Virtex-4 family (Xilinx, 2008). The entire central processing unit with the units required for proper operation consumes about 17% (1841 slices) of the available logic resources. In the final central processing unit there are some additional blocks which were not described: I/O modules, debounce unit, Flash memory boot loader, and serial asynchronous interface. Table 1 gathers the hardware requirements of the particular blocks, listing the different logic resources in units of slices (general purpose logic components) and Block RAMs (full dual port 16kb memories).

During the design process and the assembly of the entire system, it was observed that some optimisations can be introduced and that some functionality is replicated. To avoid replication, common components can be shared by the entire system. This situation has been observed in the debounce unit, which requires measuring 100 ms time intervals. This signal is already produced in the timers' prescaler unit. Instead of replicating the frequency divider, a single clock prescaling unit with multiple outputs can be implemented that satisfies the requirements of the entire system.
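The 18-bit timer configuration word of Fig. 6 (Section 3.6) can be illustrated with a short packing and unpacking sketch. The field positions follow Fig. 6; the numeric encodings in `TIME_BASES_MS` and `MODES`, as well as all function names, are assumptions made for illustration, since the paper does not list the actual bit patterns.

```python
TIME_BASE_SHIFT = 16   # bits 17..16
MODE_SHIFT = 14        # bits 15..14
COUNT_MASK = 0x3FFF    # bits 13..0, maximum count 16383

# Assumed encodings (hypothetical, not given in the paper).
TIME_BASES_MS = {0: 1, 1: 10, 2: 100, 3: 1000}
MODES = {0: "TON", 1: "TONR", 2: "TOF"}


def pack_timer_config(time_base, mode, count):
    """Build the 18-bit configuration word from its three fields."""
    if not 0 <= time_base <= 3:
        raise ValueError("time base must fit in 2 bits")
    if not 0 <= mode <= 3:
        raise ValueError("mode must fit in 2 bits")
    if not 0 <= count <= COUNT_MASK:
        raise ValueError("terminal count must fit in 14 bits")
    return (time_base << TIME_BASE_SHIFT) | (mode << MODE_SHIFT) | count


def unpack_timer_config(word):
    """Split a configuration word into (time_base, mode, count)."""
    time_base = (word >> TIME_BASE_SHIFT) & 0x3
    mode = (word >> MODE_SHIFT) & 0x3
    count = word & COUNT_MASK
    return time_base, mode, count


def preset_time_ms(word):
    """Delay programmed by the word, under the assumed time base encoding."""
    time_base, _, count = unpack_timer_config(word)
    return TIME_BASES_MS[time_base] * count


# 50 pulses of the 100 ms time base in the (assumed) TON encoding.
word = pack_timer_config(time_base=2, mode=0, count=50)
fields = unpack_timer_config(word)
delay_ms = preset_time_ms(word)
```

With a 14-bit terminal count, the longest delay at the 1 s resolution is 16383 s, which matches the maximum count of 16383 reference pulses stated in the text.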


Table 1. FPGA resource utilisation

Component                  Slice Blocks    Block RAM
Bit Processor              226 (2%)        0 (0%)
Word Processor             538 (5%)        0 (0%)
Counter Block              192 (1%)        0 (0%)
Timer Block                305 (2%)        0 (0%)
Exchange Memory            296 (3%)        0 (0%)
Flash Memory Controller    125 (1%)        0 (0%)
Receiver RS232             50 (0.5%)       0 (0%)
Debouncer                  310 (2%)        0 (0%)
Digital Input Module       2 (~0%)         0 (0%)
Digital Output Module      2 (~0%)         0 (0%)
Program Memory             0 (0%)          4 (5%)
Counter Memory             0 (0%)          2 (3%)
Timer Memory               0 (0%)          2 (3%)
Marker Memory              0 (0%)          2 (3%)
Whole CPU                  1841 (17%)      11 (15%)

Two processors were developed (bit and word) to test different configurations of central processing units. Most instructions of those processors are executed within 2 clock cycles. The development board was clocked by a 50 MHz oscillator (20 ns clock period), which corresponds to 40 ns per two-cycle instruction. All operations of the word-processor are carried out on 16-bit data.

5. CONCLUSIONS

The research and development work resulted in a bit-word dual core central processing unit. This unit is a fully custom design created from the ground up. The purpose of the design was to compare the obtained performance with that offered by general purpose microprocessors. In the designed unit two different mechanisms of inter-processor communication have been implemented. The first one was based on discrete flip-flops. The other one was based on the dual port memory where markers are exchanged between processors. This solution allows for fully parallel operation. The experiments have been carried out on the designed units after their implementation in an FPGA. The use of VHDL and high density programmable logic devices allows designing, constructing and verifying completely new constructions. They can be verified not only in simulation but also, with the use of FPGAs, in a real working device. The presented processing unit has its own instruction list, which also required designing a dedicated assembler.

Future work will be carried out in the following directions:
• testing of the developed units with different benchmark programs;
• improvement of the structures in terms of logic resource efficiency and performance;
• implementation of new features, such as other processor synchronisation mechanisms, improvement of the existing one, event driven calculations etc.

ACKNOWLEDGEMENT

This work has been supported by the Polish Ministry of Science and Higher Education (5391/B/T02/2010/38).

REFERENCES

Aramaki N., Shimokawa Y., Kuno S., Saitoh T., Hashimoto H. (1997), A new Architecture for High-performance Programmable Logic Controller, Proc. of the IECON'97 23rd Inter. Conf. on Industrial Electronics, Control and Instrumentation, IEEE, vol.1, pp.187-190, New York, USA
Berger H. (2001), Automating with STEP7 in STL and SCL – SIMATIC S7-300/400 Programmable Controllers, Siemens AG, Germany
Chmiel M., Hrynkiewicz E. (2005), Remarks on Parallel Bit-Byte CPU structures of Programmable Logic Controllers. In: Design of Embedded Control Systems, Section V, (Adamski M., Karatkevich A., Węgrzyn M.), pp.231-242, Springer Science + Business Media
Chmiel M., Hrynkiewicz E., Milik A. (2005), Concurrent operation of the processors in Bit-Byte CPU of a PLC, Preprints of the IFAC World Congress, Prague, Czech Republic, July 3-8
Chmiel M., Hrynkiewicz E. (2008), Fast Operating Bit-Byte PLC, Preprints of the 17th IFAC World Congress (on DVD-ROM), Seoul, Korea, pp.14810-14815, July 6-11
Chmiel M. (2008), On Reducing PLC Response Time, Bulletin of the Polish Academy of Sciences. Technical Sciences, Vol.56, No.3, pp.229-238
Chmiel M., Mocha J., Hrynkiewicz E. (2010), A FPGA-Based Bit-Word PLC CPUs Development Platform, 10th IFAC Workshop on PDeS, pp.155-160, Pszczyna, Poland, October 6-7
Donandt J. (1989), Improving response time of Programmable Logic Controllers by use of a Boolean coprocessor, IEEE Comput. Soc. Press. 4:167-169, Washington, DC, USA
Getko Z. (1983), Programmable systems of binary control in PLC, Elektronizacja, WKiŁ, Warszawa, 1983 (in Polish)
Ichikawa S., Akinaka M., Kieda R., Yamamoto H. (2006), Converting PLC instruction sequence into logic circuit: A preliminary study, IEEE Inter. Symp. on Industrial Electronics, vol.4, pp.2930-2935, 9-13 July
Koo K., Rho G.S., Kwon W.H., Park J., Chang N. (1998), Architectural Design of an RISC Processor for Programmable Logic Controllers, Journal of Systems Architecture, vol.44, no.5, Feb. 1998, pp.311-325. Publisher: Elsevier, Netherlands
Michel G. (1990), Programmable Logic Controllers, Architecture and Applications, John Wiley & Sons, West Sussex, England
Siemens (2009), Simatic S7-200 User Manual, 4th edition, Warsaw (in Polish)
Skahill K. (2004), VHDL For Programmable Logic, WNT, Warsaw (in Polish)
Xilinx (2006), ML401/ML402/ML403 Evaluation Platform User Guide, UG080 version 2.5. www.xilinx.com, USA
Xilinx (2008), Virtex-4 FPGA User Guide, UG070, version 2.6. www.xilinx.com, USA
