News October 2019
Ladbroke Grove: what have we learnt
Resilience and the digital railway
Back to basics: fundamentals of train control systems

Back to business
Saying goodbye to summer is always hard but why not get back to business and improve your existing knowledge or start afresh as a signalling engineer. Here at Signet Solutions we've got courses to suit all! Our training school is here to offer excellent value for money, training to meet each customer’s requirements, and not to mention fully qualified and competent trainers to ensure each trainee’s needs are met...what more could you want? Take a look online at our courses coming up, our friendly staff will be happy to give you all the information you’ll need when booking a course.
+44 (0)1332 343 585 [email protected] www.signet-solutions.com

People resilience

Issue 259, October 2019

This month’s IRSE News features a number of articles exploring the topic of engineering resilience. Resilience on the railway can mean the ability to keep the railway running during equipment failure, or during environmental events such as bad weather, but it can also mean the ability to cope with changes in the workforce and the skillsets expected of them. Resilience also applies to human health with evidence showing that serious harm to physical and mental well-being can be caused by stress at work. A recent newspaper article claimed that the incidence of mental issues in the construction industry is greater than average, with suicides occurring in extreme cases. There has been a rise in the number of people suffering from stress, anxiety and depression. Contributing factors include homesickness, job insecurity, financial pressures, isolation and bullying. In many countries industry is structured around tiers of subcontractors competing with tight margins. Principal contractors may commit to protect workers, but this doesn’t always ripple down the supply chain.

The main causes of the distress appear to be loneliness, being on site sometimes a significant distance away from family for weeks at a time, and working long hours. A union safety advisor on a major construction project was recently quoted as saying that the number of people going absent is similar to 50 years ago, only now it is with stress and mental issues not physical injuries.

This directly affects our industry. We have people working long hours, sometimes away from families and friends and working hard to achieve tight deadlines. Companies have to compete for the next project, which could be at home or overseas.

Engineers and technicians may work from home far more than in the past, and travel long distances. They may not always have the security of a depot or office with people to talk to, as they did in the past.

Many companies are aware of the issues and are implementing various mental health programmes, including staff trained as mental health first aiders. Companies need to address the root causes of stress, not just treat its symptoms. We can all help as well, by looking out for our fellow workers and sometimes just picking the phone up and talking to a lone work colleague.

Paul Darlington, managing editor, IRSE News

In this issue

Feature articles
20 years after Ladbroke Grove - where are we now? A personal view (Rod Muttram)
Solving the resilience problem in the digital railway (Tim Whitcher & Mikela Chatzimichailidou)
What smart railways can do for smart cities (Frank Heibel)
Fundamental requirements for a train control system
New interlocking systems introduced in the UK (Paul Darlington)
The Network Rail digital long-term deployment plan (Claire Beranek)
Engineers of the future (Jennifer Gilleece Jones)
Safety and Reliability Society (Peter Sheppard)

News and opinion
Industry news
Your letters

From the IRSE
News from the IRSE
Professional development

Cover story
Looking west through the throat of Zurich main station (HB) at dusk on 19 March 2018. HB’s buffer stops are some 420 metres behind us. The tracks ahead lead to all compass corners. IRSE Swiss Section member André Rüegg told us about HB’s interlocking. A Siemens SpDrS-SBB equipped with a desk-mounted track diagram and pushbuttons, its first workday was 15 May 1966. It monitored and controlled HB’s signals, points and 106.7Hz track circuits via a network of underground chambers connected by foot tunnels. With the launch of Zurich’s S-Bahn in 1990, a ZN/ZNL90 system began setting routes using trains’ numbers in the interlockings at HB and nearby stations remotely. Scheduled movements at HB have grown from 993 daily in 1966 to 3100 today. In 2014, an operations centre at Zurich Airport for the eastern third of Switzerland took control of HB’s interlocking. SBB has announced no plans to replace it.
Photo and caption by George Raymond

20 years after Ladbroke Grove – where are we now? A personal view
Rod Muttram
The UK rail industry was shaken by a head-on collision that occurred on the Great Western Main Line between London’s Paddington station and the west of the country in 1999. The event led to a lot of negative publicity for the railway, challenged perceptions about the safety of the network, and led to increasing calls for the application of automatic train protection. In this article Rod Muttram looks back over the past 20 years and considers what has changed since that tragic day.

The tragedy that occurred near London Paddington Station in 1999 led to a changed approach to safety in the UK rail industry. Lord Cullen’s report ran to several hundred pages. 20 years on, what has changed?

At approximately 0809 on a bright sunny morning on the 5 October 1999 a Thames Turbo train leaving Paddington passed signal SN109 at danger and a short time later collided head-on with an inbound High-Speed train (HST). The collision speed was in excess of 130mph. 31 people, including the two drivers, died and hundreds were injured. The damage to the Thames Turbo train was so severe that when the Railtrack (the predecessor to Network Rail) zone director arrived on site he thought it was a two-car train. In fact, it was a three-car unit but the damage to the front carriage was catastrophic – the diesel fuel in its tank had been atomised resulting in a huge fireball which also engulfed coach H of the HST.

I had been director of safety and standards at Railtrack for just two years and I and my team had been working hard to understand and better control these kinds of risks; it was the event we always feared would occur before we got there; the British railway system is large and change takes time.

What has happened since?

Thankfully at the time of writing those fatalities were the last in an Automatic Train Protection (ATP) preventable accident in the UK and Britain’s railways are now amongst the safest in the world. Indeed, the single passenger fatality in the Grayrigg derailment (when a high speed passenger train was derailed due to a defective set of points on the West Coast Main Line) was in February 2007, which means that it is now not only 20 years since there was an ATP preventable fatality but also over 12 years since there were any passenger fatalities in the UK resulting from a collision or derailment. That is an amazing improvement over the historic position and everyone in the industry who has played their part, big or small, has a right to feel immensely proud. But I would suggest that there is no room for complacency. The improvement has come not from one ‘magic bullet’ measure but from a very large number of improvements working in combination and sustaining all of them requires constant vigilance.

In the aftermath of the Ladbroke Grove collision there were three Public Inquiries: Ladbroke Grove Part 1 into the specific circumstances of the accident, Part 2 into the industry safety structure and the Joint Inquiry into Train Protection Systems (jointly with the inquiry into the Southall Collision on the 19th September 1997). Those inquiries drove many improvements (as did the main part of the Southall inquiry) but it is important to also recognise that many, including the key Train Protection and Warning System (TPWS), were already being developed before that and were scrutinised by the inquiries.
2 IRSE News | Issue 259 | October 2019
AWS (the yellow magnet middle right of the photograph) and TPWS (right foreground) still work together across the UK rail network. Photo Shutterstock/TreasureGalore.
The Joint Inquiry into Train Protection Systems considered both TPWS and the European Train Control System (ETCS) as part of its deliberations and it was seriously suggested by many that TPWS should be abandoned in favour of accelerating ETCS. Fortunately, the evidence given by people like Sir David Davies and me prevailed and TPWS was recommended as the short-term system with ETCS to follow. Time and time again we see that the gestation and implementation times for these systems are very long. Implementation of the Automatic Warning System (AWS), which still works in combination with TPWS, took over 50 years.

It is a matter for eternal regret that we did not get TPWS in place in time to prevent Ladbroke Grove, but equally had a recommendation to cancel TPWS been accepted by the inquiry we can now be reasonably certain that further lives would have been lost. It is worth remembering that the Joint Inquiry also recommended the implementation of ETCS on the East Coast, West Coast and Great Western Main lines by 2008. That was not achieved and eleven years on from the deadline ETCS is only just coming into UK service on the likes of Thameslink and Crossrail.

TPWS success factors

Many people contributed to the development and deployment of TPWS, too many to mention all of them here, but I would like to pay particular tribute to the late Dr Peter Watson (at that time the Engineering Director of British Rail) who I still view as the real ‘father’ of the project. In 1994 a report was published on the use of safety cost/benefit analysis (CBA) to rank and prioritise safety investment. This was in response to recommendation 48 of the Inquiry into the Clapham Junction accident that “The Department of Transport and BRB (the British Railways Board) shall make a thorough study of appraisal procedure for safety elements of investment proposals so that the cost-effectiveness of safe operation of the railway occupies its proper place in a business-led operation”.

The report had used the two BR-ATP (British Rail – Automatic Train Protection) pilot schemes on the Great Western and Chiltern lines as case studies and showed that they were much too expensive to be justified against other potential safety investments (some 6 to 7 times more than the benchmark). At that time there were many much less costly safety investments which were not proceeding due to lack of funding, particularly in the road sector. When David Rayner, at that time director, safety and standards for Railtrack, and I presented the report to the then Secretary of State for Transport (Brian Mawhinney, now Lord Mawhinney) he endorsed the decision that BR-ATP should not be further deployed as recommendation 46 from the Clapham Inquiry had proposed it should be within five years of system selection from the two piloted options.

I have heard it said that such a decision was immoral; that you cannot “put a value on a life” and that because the technology existed to prevent these kinds of accidents it should have been implemented at any price. I would strongly contest that the opposite is true. Be it for countries, organisations or individuals, resources are never infinite. Given finite resources the ‘moral and ethical’ thing to do is to have a transparent process which seeks to use those resources to deliver the best overall benefit; in terms of safety investment to deliver the greatest reduction in fatalities or harm. Otherwise money spent on most current or ‘best lobbied for’ measures would leave nothing to be spent on other things that could have saved more lives.

The UK National Health Service wrestles with similar issues all the time in terms of funding new and expensive treatments from a constrained budget. But it is wholly understandable that someone who has lost a loved one in an accident that could have been prevented, or whose sick child is refused a life-saving treatment, finds such arguments hard, if not impossible, to accept. Nevertheless, good governance requires such transparent processes whilst continuing to strive to reduce the cost of the measure or find another way to address the issues.

Peter Watson recognised that the risks associated with Signals Passed at Danger (SPADs) remained high because, whilst the average risk supported the CBA conclusion not to extend BR-ATP, within the risk population there remained the risk of a large multi-fatality accident similar in consequences to Clapham (how prophetic that turned out to be). Sometime in mid-1994 he invited me to his office in Euston House. He believed, and I supported, that there was no ‘do nothing’ option. British Rail and Railtrack agreed to jointly fund a package of R & D to look at reducing SPAD risk and the SPADRAM (Signals Passed At Danger Reduction And Mitigation) project was born from which TPWS and a host of other measures emerged. As BR wound down my Electrical Engineering and Control Systems Directorate, Railtrack took over 100% sponsorship under the guidance of the pan-industry Train Protection Steering Group (TPSG) which I chaired.

One of the potential measures evaluated by what at that time was still BR Research on our behalf was enhancing the functionality of the existing AWS system by adding a ‘train stop’ and a ‘speed trap’ which could not be overridden by the driver in the way that the AWS warning could be, even for a red signal. Ladbroke Grove data recorder evidence showed that the driver of the Thames Turbo overrode the AWS warning at SN109 which was at Danger and drove on. Exactly why we shall never
know; there were many potentially contributing factors, but the fact that the warning for a stop signal could be overridden is AWS’s ‘Achilles Heel’.

Such an enhanced AWS system would reduce risk even if not deployed at every signal; it did not need 100% deployment to start to deliver benefits and the modelling showed that by targeting the deployment at high risk signals such as those controlling junctions it was theoretically possible to deliver 80% of the benefit of ATP for around 20% of the cost. The key thing was to get the system level cost as low as possible. That meant ensuring simple and quick installation as well as getting the component costs right.

At some point in this process David Fenner, then the project manager, changed the name from Enhanced AWS (E-AWS would have risked ridicule) to TPWS. A performance specification was written and put to competitive tender with a target cost. The competition was won by Redifon MEL, now part of Thales, and was based on a right-side door enable system they had developed for London Underground. It was a simple system based on electronic timers, with no complex processing. Loops in the track passed signal status to the train. Two loops co-located made a train stop, spaced apart gave a speed trap, with the speed set easily by the distance between the loops. The electrical interface to the lineside signalling was simple and, critically for keeping the system cost low, the on-board unit was a simple bolt-in replacement for the AWS relay box needing only a subsidiary antenna under the train, a small additional control unit in the cab, and minimal additional wiring. Trains could be retrofitted within a single overnight shift keeping the ‘disruption costs’ low.

Critically the ‘numbers worked’; the projected cost met the CBA criteria for such a system, and we would have been able to justify implementation by a mandatory Railway Group Standard. I really believe that the industry would have done this on its own. Not without some arguments, and it would have taken longer, but it would have been done. In the end that was never tested. In the aftermath of the Southall accident HM Railway Inspectorate produced the Railway Safety Regulations 1999 which were laid before parliament in August of that year. These mandated fitment of some form of train protection within five years of their coming into force on the 30 January 2000.

The Regulations did not mandate TPWS specifically but were written in a way that it was a permissible solution whilst leaving room for something technically better. Except where the BR-ATP pilots were already fitted nothing else could have met the timescale. Shortly after the Regulations were enacted and one week into the Southall Inquiry, Ladbroke Grove happened. The size of the public outcry, stoked by those hostile to privatisation accusing the industry of ‘putting profit before safety’ (all of the subsequent inquiries concluded that had no foundation, but it has stuck in the public psyche) meant that Railtrack committed to implementation in four years. Apart from a few problem areas that was essentially achieved.

So, TPWS really had the wind behind it – it was designed to be simple to install, particularly to retrofit rolling stock; it met the (then) cost criteria for being ‘reasonably practicable’ and anyway it was really the only credible response to a mandatory Regulation. Whilst installation was achieved in around five years much of the ‘ground-work’ had been done before the Regulation was written; we had started the project another five years before that.

Other factors

Many other things have contributed to the improvement in safety performance that has been achieved. Other technical measures include the Driver Reminder Appliance (DRA), also a SPADRAM development, which reduces platform starter SPAD risk, and the work that has been done to remove or mitigate sub-standard signal overlaps. Better data analysis has led to specific measures to reduce risk at identified multi SPAD signals, and control centre alarms have been expanded and made clearer and easier to distinguish, particularly SPAD Alarms.

The Railway Safety Regulations 1999 also mandated the end of Mk1 ‘Slam door’ rolling stock. The fleets of new trains brought in since the time of Ladbroke Grove have better braking with almost universal Wheel Slide Protection (WSP) and automatic sanders as well as improved crashworthiness if the worst does ever happen. And of course, we should never forget the people. The UK has one of the best trained and most professional bodies of train drivers in the world. The advent of ‘Defensive Driving’ and the enforcement of speed by TPWS on the approach to high risk signals all go to promote and reinforce safe behaviour. Improvements still continue with Network Rail
In complex areas such as Glasgow Central station, pictured left, TPWS has added a great deal of complexity to design and test in order to offer greater levels of protection than was previously available. Photo Shutterstock/DRussell78.
recently implementing the ‘overrun management’ initiative where a SPAD alarm triggers replacement of signals in the vicinity – with the logic implemented in the SIL 2 control system and not in the interlocking, simplifying design and approval.

It is also worth remembering that at the time we developed TPWS the Department for Transport (DfT) recognised a higher Value of Preventing Fatality (VPF) for risks that might lead to multiple fatalities than for risks that might only lead to one or two. VPF is the ‘benchmark figure’ against which safety investments are tested. If a measure costs less than the VPF for each fatality or equivalent fatality (an aggregation of injuries) it prevents, it is probably worth doing; if it costs more, then it is probably not good value. Other factors should always be considered, and there should be an attempt to understand the uncertainties involved. VPFs are generally based on research into ‘public perception of willingness to pay’ i.e. on societal preference. Later research concluded that a life lost in a multi fatality accident should not be valued more highly than those lost singly and the ‘two-tier’ VPF was dropped in favour of a single figure. The most recently published DfT figure is £1 897 129 in 2017 prices. At the time we developed TPWS we were using circa £2 400 000 for multi fatality risk. With the additional buffer-stop protection and extra speed traps mandated by the 1999 Regulations that was probably exceeded. Taking into account inflation the incremental investment for TPWS would not now be deemed ‘reasonably practicable’ as £1.9M equates to less than £1.2M in 1999 prices.

Despite that, I do not believe anyone would seriously suggest that it was wrong to have implemented it.

So, why do we still need ETCS?

We only ever intended TPWS to be a stop gap until ETCS, mandated by European law to be used for all significant upgrades, became stable and readily available. As stated above the Joint Inquiry into Train Protection systems recommended fitment on the three major main lines by 2008. In the event it has taken much longer for the ETCS specifications to mature than anyone could have predicted in the late-1990s.

ETCS started its life as a common system for high-speed lines: its use for those was mandated by Directive 96/48/EC in 1996. It was not until 2001 that 2001/16/EC extended the mandate to conventional lines. The two Directives were later replaced and consolidated into 2008/57/EC. In the mid-1990s we identified that for even moderately dense conventional lines the GSM-R radio communication from track to train had insufficient allocated channels to support operation in the ‘circuit switched’ mode that had been envisaged for high speed lines. Given the number of communicating trains that would be within a communication area, particularly close to major termini, the system would need to use the General Packet Radio Service (GPRS) (2G/3G data). It was not until 2015/2016 with the production and release of ETCS Baseline 3 maintenance release 2 and GSM-R Baseline 1 that EGPRS (Edge enabled GPRS) formed part of the specification, rendering the compliant system really ‘fit for purpose’ in a UK network context.

The extension to conventional lines and the need for compatibility with the operating rules of the railways of 27 member states with railways (and Malta theoretically gets a say even though it has no railway to be ‘backward compatible’ with) has driven a very complex set of requirements and thus a very expensive and complex solution.

Further, early in the standardisation process there was a debate about whether the ETCS project should follow a ‘black box’, ‘grey box’ or ‘white box’ approach. I and some of the other railway representatives at the time favoured ‘black box’ which would have mandated a standard architecture with interchangeability at the sub-system level enabling active competition between the suppliers throughout the lifecycle. The suppliers favoured ‘white box’ with standardisation only at the air gap interfaces which they felt would involve the minimum change from their existing products and allow them to preserve some of their unique features and advantages. ‘Grey Box’ would have been somewhere in between with standardisation of some of the sub-system interfaces. In the end the suppliers won and what we have is at best a ‘light grey box’ system.

That leads not so much to an oligopoly as to a connected set of monopolies. Since every manufacturer’s system architecture is slightly different and the physical size and shapes of their on-board equipment sub-systems are different, once a train has been designed for one supplier’s equipment it is very difficult to change. So, competition can only happen in a somewhat constrained way at the beginning of a fleet procurement.
ETCS and the cab signalling it brings are seen as the long-term solution to improving UK rail safety.

Bombardier’s Elizabeth Line trains must integrate ETCS, TPWS and Trainguard CBTC. Seen here in CBTC standalone mode. Photo Bombardier.
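By way of contrast with ETCS, the TPWS principle described earlier in this article needs little more than a timer: two co-located loops act as a train stop, and two spaced loops act as a speed trap whose set speed follows from the loop spacing. The sketch below is a simplified model of that principle and not the supplier’s design. The one-second timer, the assumption that the loops are only active when the signal is at danger, the function names and the example spacings are all illustrative assumptions.

```python
# Simplified, hypothetical model of the TPWS loop principle described in the
# article. The timer value and all names here are illustrative assumptions.

def trap_set_speed_mps(loop_spacing_m: float, timer_s: float) -> float:
    """Speed at which a train takes exactly timer_s to run between the loops."""
    if loop_spacing_m <= 0 or timer_s <= 0:
        raise ValueError("spacing and timer must be positive for a speed trap")
    return loop_spacing_m / timer_s


def tpws_trips(speed_mps: float, loop_spacing_m: float,
               timer_s: float, signal_at_danger: bool) -> bool:
    """Return True if the on-board unit would demand a brake application."""
    if not signal_at_danger:
        return False          # assumption: loops are inactive unless at danger
    if loop_spacing_m == 0:
        return True           # co-located loops: a train stop, trips at any speed
    if speed_mps <= 0:
        raise ValueError("speed must be positive")
    # Spaced loops: the first loop starts the on-board timer; reaching the
    # second loop before the timer expires means the train is too fast.
    crossing_time_s = loop_spacing_m / speed_mps
    return crossing_time_s < timer_s


ASSUMED_TIMER_S = 1.0         # hypothetical on-board timer setting
print(trap_set_speed_mps(20.0, ASSUMED_TIMER_S))      # set speed for a 20 m spacing
print(tpws_trips(25.0, 20.0, ASSUMED_TIMER_S, True))  # faster than the set speed
print(tpws_trips(15.0, 20.0, ASSUMED_TIMER_S, True))  # slower than the set speed
```

In this model the only site-specific parameter is the loop spacing, which is consistent with the author’s point that the speed was “set easily by the distance between the loops”.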
Because ETCS is complex, and provides continuous protection, transition strategies that provide benefits incrementally (as was the case for TPWS) are very hard to find so wide scale fitment is needed before benefits are delivered. The success of TPWS means there is very little incremental safety benefit to be had, so ETCS has to be justified by other benefits such as capacity improvement. Whilst there is real pressure for increased capacity on many routes, there are also many that are still under-utilised and so making a system wide business case for ETCS is difficult.

This list is not exclusive, but I hope gives the flavour that unlike TPWS having ‘the wind behind it’, ETCS has faced some significant headwinds.

Nonetheless I believe it is the only way forward, because of the following issues.
• Within the reduced overall risk there remains the potential for failures with catastrophic consequences; particularly as demand on the network continues to increase. Consider a SPAD in heavy traffic at a non-TPWS fitted plain line signal resulting in a rear end collision. If a derailment results, then a three-train collision similar to Clapham might occur. Whilst with modern more crashworthy rolling stock the consequences would likely be much less severe there is still significant potential for loss of life.
• Is it really sustainable to continue to rely on the wide range of connected measures outlined above to secure acceptably low SPAD risk? There must be a probability that one will eventually fail.
• From a technical perspective the existing UK system still depends on AWS which, certainly from a wayside perspective, relies on 1950s magnetic technology. The investment decision needs to be made on a ‘modern equivalent replacement’ basis, based on the cost of all of the systems and measures ETCS would replace. When buying a new car you would not judge the reasonableness of its price based only on the new features it offered you over your old one.
• ETCS is now really the ‘only game in town’. If the UK does eventually leave the EU and try to have its own standards the volumes are too low to be attractive to multiple manufacturers and by the time the development cost was amortised little or nothing would be saved; it might even be more expensive. The only real alternative is to continue to ‘muddle along’ managing the obsolescence and other costs associated with a hotchpotch of systems and methods and then try to defend that when the inevitable happens.
• The manufacturers must play their part. I believe a move towards greater interchangeability would not disadvantage them in the way they fear because the size of the market would grow. The trend away from high obsolescence-risk dedicated hardware towards architectures that support commercial-off-the-shelf (COTS) implementations must continue for both wayside and on-board. There has been a massive investment in the ETCS software and that is what now needs to be stabilised and preserved so that eventually implementation costs will fall as the past costs of software production and homologation are amortised.
• The only really effective way to fight the skills shortages that undoubtedly exist for these ‘new’ technologies in the rail sector is to start to create a consistent market by having a steady and planned programme of deployments.

Conclusion

20 years after the Ladbroke Grove collision the UK industry’s record on safety performance improvement is one to be proud of but the recent tragic loss of two trackworkers near Port Talbot reminds us that we can never be complacent. The record on investment in modern protection systems is less impressive for a wide variety of reasons although much has been done in some areas. I believe it is time to stop agonising over the business case, accept ETCS as the future system on a modern equivalent asset basis and plan a steady deployment process across the network. We owe it to those who died and were injured, and to all the families and friends affected by failures like Ladbroke Grove not to let performance slip. With the capacity pressures on many parts of the railway a modern ATP system is the only sensible way forward and ETCS is the only ‘game in town’. Government, suppliers, train operators and Network Rail must all play their parts to make that deliverable without waiting for the painful incentive of another tragedy.

What do you think?

What is your experience of the challenges, and solutions, described above on your railway, in your country or in your company? Is the UK following the only available path? Was TPWS an appropriate stop-gap? Email [email protected].
Solving the resilience problem in the digital railway
Tim Whitcher and Mikela Chatzimichailidou
Designing safety critical systems like signalling and train control is arguably the most important job on the modern railway. While some physical infrastructure and rolling stock continue to rely on tried-and-tested designs, signalling is undergoing a transformation with the introduction of what is commonly referred to in Great Britain as the Digital Railway (DR).

Our networks are increasingly congested, with more and varied traffic types. As we move toward a dynamically controlled model within this complex ecology, our ability to manage complexity and to assure the systems to the same level of safety integrity must evolve with the technology.

To achieve this, we need to facilitate a new management approach that incorporates a more targeted level of risk modelling. By progressively assessing system design – using data collected across the system lifecycle – we can continually refine a design and test its resilience, making enhancements as well as corrections where standards are deficient or non-existent. In doing so, we put system risk engineering back in the hands of the signalling engineers.

As the digital railway system of systems gets more complex, engineering resilience into the solution to keep trains running becomes increasingly business-critical. Photo Shutterstock/VMCgroup.

Introduction

Railway systems are comprised of complex mechanical, electrical and electronic systems with many moving parts. Within this highly technological and engineered accumulation of systems, both human and automated agents are responsible for ensuring the integrity and efficiency of the system. Therefore, along with the physical system (the technology; rolling stock; infrastructure etc.), railway systems also incorporate operational issues and services, such as timetabling, pricing and integration with other modes of transport, as well as system properties which emerge from the interactions between elements of the system. Typical examples of emergent properties include: safety, customer experience and reliability of services, financial viability, resilience, sustainability and carbon footprint.
Figure 1: a complex socio-technical model.

What do we mean by ‘resilience’?

Resilience for our purposes is the ability of the railway system to respond to change, disruption or challenge in such a way so as to continue to provide a suitable level of service as an output.

Railways can be considered as a system-of-systems (SoS) [1], the overall performance of which depends on factors that include network regulation, infrastructure and rolling stock reliability, organisational safety management and human factors. And the railway SoS is changing.

Critically, modern railway systems are moving from a distributed system of local control points to being increasingly distributed only at the data collection end, and progressively aggregated toward the central hubs. For train and traffic management, these are the Rail Operating Centres (ROCs), and as these become focal hubs for the routes, the ancillary systems and services are likely to follow, putting the right people in closer proximity to each other, in imitation of the data. This will facilitate increased and more efficient networking, helping to reduce and remove existing system lags.

Figure 1 [2] shows a generic architecture for a SoS. If we look more closely at Figure 1, we can see the hierarchy consists of many control loops. A control loop is the elementary part of the SoS model as defined here. The ‘stick figure’ icons indicate humans (individuals or teams) that control various system processes. The technical components of that system are the elements marked as ‘C’, ‘A’, and ‘S’, standing for automated controllers, actuators, and sensors, whilst the ‘CP’ element denotes one of the controlled processes of the system. A control loop, formed by the elements with the letters ‘C’, ‘A’, ‘S’, and ‘CP’ on them, depicts a fully automated part of the system.

The parts of the system where humans exercise indirect control over the controlled process, with an automated controller in the middle, are denoted by the bidirectional arrows between the stick figures and the elements with the ‘C’ letter, together with the rest of the control loop, which includes the controlled process.

Finally, the parts of the system where humans have direct control over a process are denoted by the bidirectional arrows between the stick figures and the elements with the ‘CP’ label. In Figure 1, control entities are designated by the stick figures, as well as by the elements with the label ‘C’. The control entities located at a specific hierarchical level enforce safety constraints [4] on their controlled processes, which include other control entities located at the lower hierarchical levels. In Figure 1, for instance, there are three hierarchical levels: L0, L1, and L2.

The system parts are organised in hierarchies and linked to each other with control actions, early warnings, and/or information feedbacks that strive to keep the system in equilibrium. The complex links of responses and feedbacks are important because the system exhibits a dynamic behaviour that stems from the interactions between the system parts. A system exhibits resilient behaviour by adapting to changing conditions in order to maintain control over its properties, such as safety and performance.
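The control-loop vocabulary used for Figure 1 can be made concrete with a small data model. The following is a minimal sketch, not the authors' model: the class names, the field names and the three example loops are all hypothetical, and the assignment of the example loops to levels L0, L1 and L2 is arbitrary. It only encodes the three cases described in the text: a fully automated loop, a loop where a human acts through an automated controller, and a loop where a human controls the process directly.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class ControlMode(Enum):
    FULLY_AUTOMATED = "fully automated"
    HUMAN_INDIRECT = "human indirect control"
    HUMAN_DIRECT = "human direct control"


@dataclass(frozen=True)
class ControlLoop:
    """One control loop around a controlled process ('CP').

    Hypothetical structure: 'has_human' stands for a stick figure attached
    to the loop, 'has_controller' for an automated controller ('C').
    """
    name: str
    level: str                 # hierarchical level label: "L0", "L1" or "L2"
    controlled_process: str    # the 'CP' element
    has_human: bool
    has_controller: bool

    def mode(self) -> ControlMode:
        # Human plus automated controller: the human acts through 'C'.
        if self.has_human and self.has_controller:
            return ControlMode.HUMAN_INDIRECT
        # Human only: the human acts directly on the controlled process.
        if self.has_human:
            return ControlMode.HUMAN_DIRECT
        # Automated controller only: a fully automated loop.
        if self.has_controller:
            return ControlMode.FULLY_AUTOMATED
        raise ValueError(f"loop {self.name!r} has no control entity")


def modes_by_level(loops):
    """Group (loop name, control mode) pairs by hierarchical level."""
    grouped = defaultdict(list)
    for loop in loops:
        grouped[loop.level].append((loop.name, loop.mode()))
    return dict(grouped)


# Hypothetical example loops; names and level assignments are illustrative only.
loops = [
    ControlLoop("route regulation", "L2", "train paths",
                has_human=True, has_controller=True),
    ControlLoop("interlocking", "L1", "points and signals",
                has_human=False, has_controller=True),
    ControlLoop("manual points operation", "L0", "points",
                has_human=True, has_controller=False),
]

for level, entries in sorted(modes_by_level(loops).items()):
    for name, mode in entries:
        print(f"{level}: {name} -> {mode.value}")
```

Listing loops this way makes it easy to see, level by level, where a human is still in the loop and where control is fully automated.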
Once digital railway infrastructure is in place it is possible to make decisions about how to deliver customer experience based on any available data sets. Photo Shutterstock/Connel.

Definition of the problem

The DR can revolutionise the way we manage our networks. It presents the opportunity to integrate systems and data, which enables the decision-makers (whether human or machine) to control the networks based on real-time data from multiple sources, to perform train and traffic management functions.

But it does not stop there. Once the infrastructure is in place it can be extended to include smart ticketing, passenger loading data, weather monitoring (and the effect it has on ridership, or customer dispersion along a platform, for example) and any other data-driven input. If you can collect, codify, analyse and present the data, you can make network regulation decisions based on it.

All the systems in the DR ecology are interlinked and interdependent in a similar way as shown in Figure 1. As the adoption of digital systems and associated working practices increases, so too will the centralisation of data aggregation and usage. Points of failure will also aggregate, becoming fewer but having greater impact.

As the ROCs take over each route, the network will move from a distributed control network of some 800+ signal boxes across the UK today, to a truncated control network of 12 ROCs. Each ROC will therefore cover a greatly increased area.

So how do we build resilience into this new DR ecology, and how do we prepare it for rapid service recovery in the event of failure?

Proposed solutions

The change in ecology and the shift to digital require a little more exploration. Transport systems by their nature are complex, and they integrate into a much broader urban environment, which brings challenges around EMC (Electromagnetic Compatibility), third-party interference and communications overload. These issues will be exacerbated if not properly managed as part of the system evolution. Train control becomes even more complex as we add automation on the main line. For example, many automated Traffic Management (TM) conflict resolutions may still require the operator to make the final call, but do not present the logic behind the decision: for the first time, the operator is out of the loop. When we consider the integration of this with other transport planning systems used by the TOCs and other operators (such as road or maritime), the degree to which the train control system is not just safety critical but mission critical comes sharply into focus.

ROCs consolidate a lot of control and data in one place, making each ROC a single point of failure for each route and presenting the challenge of ensuring that a failure does not become mission critical to an area potentially encompassing 20% or more of the country. For example, think of the chaos that would ensue in the UK if the London North East Route goes dark!

Decision making is now based on a more complex array of information and, as mentioned, without the operator fully understanding the logic behind it; so, as reliance on that system builds, the operator’s ability to instinctively respond to problems will conversely reduce.

Furthermore, all this takes place in an era of intense media coverage, where network performance is increasingly scrutinised by a despondent public. Despite falling reliability, rail in the UK remains a key means of moving people and goods en masse; and failure carries more impact than before, with greater economic consequences for lost transit, fines and damages.

The shift from a conventionally signalled and controlled ecology to a digital one changes the approach to system performance assurance in a fundamental way. In the current ecology, systems are largely isolated from the outside world, using dedicated fibres to communicate, often using parochial railway-only protocols. The servers and equipment sit in isolated equipment and control rooms and can only be accessed by authorised personnel. There are minimal cyber security threats and the system integrity is largely secure.

The new paradigm relies on digital systems with a completely different architecture, increased exposure to cyber threats and inherent resilience challenges leading to increased risk; assuring this is now a key imperative.

In building our new model of network operation we need to consider new approaches to communications:

• Physical security at all sites needs to be assured and guaranteed.
• Servers need to be encrypted.
• Communications links need to be high bandwidth, low latency and continuously available.
• ROCs need to be able to communicate in real time, without performance interruption.

Defining a resilient system

It is important that we are able to collect the right data, so that we can perceive and comprehend complex system risks. Data can unveil unwanted deviations that indicate the presence of threats and vulnerabilities of the system, and therefore help us predict the system’s performance in terms of resilience.

RiskSOAP [3] is a system assurance method paired with a system performance indicator. It is a comparative method that systemically and systematically produces results by measuring the difference between ‘work-as-done’ (i.e. system as-is) and ‘work-as-designed’ (e.g. target system). RiskSOAP test cases are illustrated in Table 1.

Table 1: RiskSOAP tests.

RiskSOAP test case: Aviation: mid-air collision accident [1]
Description: This application was the first attempt to quantitatively express the positive correlation between risk awareness and system resilience. RiskSOAP showed that the system as-designed (i.e. target system before the accident) was able to retain and process more risk related data compared to the one involved in the accident.
Conclusion: Decisions made during design led to an increased risk during operations, thereby reducing the resilience of the design, which subsequently factored into the cause of the collision.

RiskSOAP test case: Robotics: design of robotic platform for domestic use [3]
Description: The values of the RiskSOAP indicator generated for the different system designs reflected the system dynamics. The values fluctuated every time the system design changed. Design changes made in relation to the capability of the system to be risk aware were correlated to equivalent levels of safety performance. The system design was changed as safety requirements were added to ensure that even if the robotic platform fails (i.e. it is unreliable), the whole system (human-machine interfaces) will maintain its resilience and assure human safety.
Conclusion: It was proved that resilience is not a static system property, but (in this test) improved along with the application of system safety recommendations mainly on the human-machine interface. Resilience and safety were positively correlated, whereas resilience and reliability were not.

RiskSOAP test case: Highways: maintenance of road tunnel [6]
Description: RiskSOAP defined a tunnel design with risk awareness capability over and above that of designs that comply with the EU and World Road Association (PIARC) directives. RiskSOAP was key in determining the final design as it led to the identification of system safety requirements, which were added to the new designs on top of the EU and PIARC recommendations.
Conclusion: The RiskSOAP indicator was used as a selection criterion and decision-making tool to select and make modifications in the road tunnel under maintenance. The indicator showed that maintenance improved the resilience of the system, as well as its safety.

The RiskSOAP methodology is founded on three approaches, combined in a unique way. These are performed in the following order:

1. System-theoretic process analysis (STPA) [4].
2. Early warning analysis based on the STPA (EWaSAP) approach [5].
3. Binary dissimilarity measure to depict the distance between the different system configurations.

The methodology guides system designers and engineers in collecting meaningful data, turning them into information (e.g. requirements and specifications) to calculate the system performance and resilience. Moreover, by applying the method to different intervals throughout the entire system life cycle, we can maintain situational awareness, predict the future system states and adjust operations to maintain the system performance at an acceptable level and secure its resilience.

RiskSOAP has been tested for use in such areas as the update of road tunnel designs and man-machine interfaces in aviation and robotic platforms. It has caught the attention of Network Rail and Transport for London in the UK, as well as being reviewed by experts from industry and academia around the world.
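The third step of the methodology, a binary dissimilarity measure between system configurations, can be illustrated with a short sketch. The article does not say which measure RiskSOAP uses, so the code below substitutes the simple matching (normalised Hamming) dissimilarity as a stand-in: each configuration is a vector of 0/1 flags recording whether a risk-awareness requirement is met, and the score is the fraction of positions where the as-designed and as-is vectors disagree. The function name and the five-requirement example are hypothetical.

```python
from typing import Sequence


def binary_dissimilarity(as_designed: Sequence[int], as_is: Sequence[int]) -> float:
    """Fraction of requirements on which two configurations disagree.

    Stand-in measure (simple matching / normalised Hamming); the measure
    actually used by RiskSOAP is not stated in the article.
    0.0 means the as-is system matches the design; 1.0 means it differs everywhere.
    """
    if len(as_designed) != len(as_is):
        raise ValueError("configurations must have the same number of requirements")
    if len(as_designed) == 0:
        raise ValueError("configurations must not be empty")
    mismatches = sum(
        1 for designed, actual in zip(as_designed, as_is)
        if bool(designed) != bool(actual)
    )
    return mismatches / len(as_designed)


# Hypothetical example: five risk-awareness requirements.
# 1 = requirement satisfied, 0 = requirement not satisfied.
target_system = [1, 1, 1, 1, 1]   # 'work-as-designed'
current_system = [1, 0, 1, 1, 0]  # 'work-as-done'

print(binary_dissimilarity(target_system, current_system))  # 0.4
```

Recomputing the score at intervals through the life cycle would give a simple trend of how far the operating system has drifted from its design target, which is the kind of comparison the text describes.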
Disaster recovery sites

Disaster recovery is the process of bringing the system back on-line after a major or catastrophic failure. This capability is a fundamental component of ‘resilience’. The objective is for a community and its economic functions to recover quickly and soundly after a major event.

Highly integrated systems, such as the aggregation of network command and control into a single ROC, present a big locus of impact should one ‘fall over’. To protect the network against this threat one or more disaster recovery sites are required to replicate the essential functions of the ROC in a secondary (and third) location, which is secured and robust against the threats that took out the primary ROC. This needs to have compatible technology and data to enable a rapid transfer of control.

Disaster recovery sites are expensive and rarely used in rail; this needs to change if we are to use them as a foundational part of our strategy. In that instance we need to do two things:

1. Use them as training hubs for operators and maintainers to (a) keep their competency at the highest level, (b) keep the disaster recovery ROC in line with the latest technology and data releases, and (c) use this opportunity to close the loop, that is, utilise feedback from operators and maintainers to re-think and maybe design more user-friendly interfaces and data representation.
2. Hold as many open interfaces as possible to facilitate the use of the disaster recovery ROC as a fall-back for multiple field ROCs.

Dedicated support teams and DevOps (Development/Operations)

In order to support rapid failure recovery, maintenance teams need to be intimately familiar with the systems and always up to date with the latest technology and associated skills and knowledge. Digital systems are not likely to fail often but when they do, the integrated nature of the new ecology means more associated systems are affected across a wider area.

To counter this, maintenance facilities need to be equipped with a full simulator suite for the systems under control. Any spare time between routine maintenance and inspections should be spent on the simulators running through failure scenarios, practising diagnostic routines and response techniques, making sure that no matter what event occurs the team is capable and agile enough to respond promptly.

The advent of DR is ushering in a new paradigm in technical support for the railway. It will require the upskilling of personnel, and the provision of more tools and theoretical training, as the actions systems perform are now entirely in the software domain, with no direct observation possible. We will need to keep maintenance teams better informed, even when things are going to plan, in order to build their familiarity with the system. Greater use of situational awareness and workflow tools will help manage response times, and clear action plans will help to prevent misunderstandings through different interpretations of events.

One of the advantages of the DR ecology is that it delivers efficiency gains that enable us to turn our attention to continuous improvement. By virtualising our systems we are able to work on them remotely, design, build and test upgrades, and implement these in a system-wide download in minutes. This brings the railway into the fold of DevOps, combining work streams from developers and operators of the system to rapidly develop, implement and enact enhanced performance.

The DevOps cycle provides both benefits to capture and risks to manage: the benefits are that the system can now be enhanced incrementally and with minimum service disruption; this will facilitate a rapid change in the capability of the service if played correctly.

This flexibility is not without risk, as the procedure for remote updates now means the system has another access point to secure against cyber threats. But it is a known risk, a quantifiable risk, and a risk we can control; indeed, the RiskSOAP model is designed precisely for that purpose and gives us a new model for network assurance and risk modelling.

Conclusion

Transportation systems combine social and technical components that work together to achieve the purpose of the system. These complex socio-technical systems consist of many parts, controlled by human or automated agents in constant interaction and close cooperation, though usually located at different hierarchical levels and in distant regions. Communication and control are critical in order to retain, comprehend and share data and information, maintain awareness of possible threats and vulnerabilities and, ultimately, enhance the resilience and safety of rail transport systems. Ideally, such attributes should be embedded into systems from the concept stage onwards, so they become an integral and intimate part of the infrastructure. Signalling and control engineers will continue to play a vital part in this process, within the new DR ecology.

We are building a more interconnected world which will provide the railway and its passengers with many benefits. As the returns build, so do the complexity and risk, and we must be ready to mitigate this with solutions tailored to our shared digital future. Tools like RiskSOAP and the wider implementation of disaster recovery processes and technology will help signalling, control and communication engineers increase resilience, by supporting the design, development and maintenance of systems that are self-aware of their vulnerabilities and can pro-actively prevent and instantly react to accidents and losses.
References

[1] Chatzimichailidou, M M, Stanton, N A, & Dokas, I M (2015). The concept of risk situation awareness provision: towards a new approach for assessing the DSA about the threats and vulnerabilities of complex socio-technical systems. Safety Science, 79, 126-138.
[2] Chatzimichailidou, M M, Martinetti, A, Majumdar, A, Dongen, L A V, & Ochieng, W Y (2018). Wheel maintenance in rolling stock: safety challenges in the defect detection process. International Journal of System of Systems Engineering, 8(4), 387-397.
[3] Chatzimichailidou, M M, & Dokas, I M (2016). Introducing RiskSOAP to communicate the distributed situation awareness of a system about safety issues: an application to a robotic system. Ergonomics, 59(3), 409-422.
[4] Leveson, N (2011). Engineering a safer world: Systems thinking applied to safety. MIT press.
[5] Dokas, I M, Feehan, J, & Imran, S (2013). EWaSAP: An early warning sign identification approach based on a systemic hazard analysis. Safety Science, 58, 11-26.
[6] Chatzimichailidou, M M, & Dokas, I M (2016). RiskSOAP: introducing and applying a methodology of risk self-awareness in road tunnel safety. Accident Analysis & Prevention, 90, 118-127.

About the authors ...

Tim Whitcher is solution lead for digital railway at WSP and brings a decade of professional engineering within safety critical industries; from assistant tester to technical authority. His focus is on delivering a fully integrated, holistic solution which delivers not only the right answer now but a bedrock for the future. Tim is a chartered engineer and holds a bachelor’s degree in electromechanical engineering and a master’s degree in business. [email protected].

Mikela Chatzimichailidou is a senior engineer at WSP, an internationally recognised and award-winning system assurance specialist, honorary research associate at Imperial College London and visiting fellow at Cranfield University, specialising in the modelling and management of high complexity systems risk. She holds master’s degrees in innovation management in production engineering and systems engineering management, a doctorate in system safety and human factors and is a Fellow of the Safety and Reliability Society. [email protected].