Big data, big impact: how big data is shaping everyday life

Introduction

We live in an information age, where data is being produced at an ever-increasing rate. 90% of all the data in existence has been created in the last two years, and it's no longer just businesses and governments who are amassing it. With the birth of the World Wide Web and the rise of social media, the ubiquity of smartphones and tablets, and the development of smart fridges, smart scales and devices that can record our physical movements, we are creating vast personal datasets as well.

The challenge presented by big data comes not just from its sheer volume, but also from the variety of the different types of data involved and the velocity at which it is created. The value of all this data lies in the useful information and meaning we can extract from it, and big data is being used in all aspects of life.

In crime prevention it helps the police to spot trouble before it starts, and facilitates cross-border collaborations that can put an end to international crime. Bankers and insurers can use big data techniques to spot anomalous behaviour, and bring about huge reductions in fraud.

The information our smart devices gather about us can be linked to public health records and allow the development of personalised apps that could help monitor and manage health conditions like diabetes. Algorithms can work their way through a maze of data that's impenetrable to the human eye and come up with recommendations that can reduce the amount of money the NHS spends on prescriptions, reduce waiting lists and track patient movements (without compromising healthcare). The Human Genome Project took 13 years and $3 billion to sequence the first human genome. New DNA sequencing machines can do the same thing in one day, bringing the cost per genome below $1,000 and making it possible to tailor treatments to both the patient and the disease (choosing the most suitable cancer drugs, for example).

In agriculture big data can be used to track bovine health, and to make recommendations for the planting, management and harvesting of crops. Sensors included in shipping containers could monitor the transport of refrigerated goods, alerting operators to changes in conditions; keeping food in optimum storage conditions could greatly reduce the amount of spoilage (estimated at 10-15%, and worth $25 billion) and ensure more of our precious food ends up on plates rather than in the bin.

Big data is also helping us to learn more about the Universe we live in, and to answer some fundamental questions. Reaping all of the benefits that big data offers us means constant innovation in computing and communications. There are various approaches to processing big data, including using supercomputers, distributed computing systems and volunteer computing networks that rely on unused computer power being 'donated' by members of the public. The current generation of supercomputers is limited by its power consumption; we're developing energy-efficient computing techniques to help usher in the next generation – exascale computers that will be able to perform a million trillion calculations every second. STFC have a long history of high performance computing, and we're proud to be building on that and being part of the team pushing boundaries to develop the technologies needed for ambitious science projects such as the Square Kilometre Array (SKA) and the Gaia galactic survey.

Part of STFC supercomputer. Credit: STFC

The case studies included in this brochure are just a small fraction of the ones we could tell, but they illustrate the UK's industrial and research strengths in this arena, and show why we are ideally placed to deal with the big data challenges of the future.

Science and research

Artists’ impression of the SKA Dish arrays in operation at night. Credit: SKA Organisation


SKA

The Square Kilometre Array (SKA) will be the world's largest and most sensitive telescope. Thousands of radio wave receptors (antennae) will combine to allow the SKA to see back into the early universe, before the stars were formed and there was only gas. It will allow researchers to investigate a wide range of fundamental questions in physics, astronomy, cosmology and astrobiology, exploring distant parts of the Universe for the first time.

The SKA presents unprecedented big data challenges. It digitally combines the signals from each antenna – using powerful supercomputers – to provide a virtual telescope with a collecting area of a square kilometre. It will be 50 times more powerful than any existing radio telescope. The data it will collect in just one day will be enough to fill 15 million 64 GB mp3 players; it would take nearly two million years to play back on an iPod. The SKA central computer will need to have the processing power of over 100 million PCs. The dishes will produce ten times the current amount of internet traffic; the completed aperture arrays will multiply that by ten again.

The Government has committed £100 million to the SKA project, and the Universities of Manchester, Cambridge and Oxford are playing an active role in several aspects of the design process. The UK is leading two consortiums – Signal and Data Transport (SaDT) and Science Data Processor (SDP). The SaDT consortium is responsible for the design of the data transport networks that will have to handle the volume of data, and which will require enough fibre optic cable to encircle the Earth twice. Clocks at each antenna will have to be synchronised to a thousand-billionth of a second, which is no mean feat when you discover that they are spread across two continents, with SKA sites in Australia and South Africa.

The SDP consortium has an equally challenging job, focusing on the technology that's needed to turn the data collected into useable science products. The SKA project is driving technology development in antennae, data transport, software and computing and power – all of which will have enormous commercial potential. SKA technologies could lead to faster smartphones and higher internet speeds, and the UK's role in developing cutting-edge data analysis techniques will give us a competitive advantage in a global market that is expected to be worth £31 billion by 2016.

“The SKA will be the largest and most sensitive radio telescope in the world, stretching data analysis and science technologies to their limits.”
Dr Benjamin Stappers, University of Manchester

Background image credit: SKA Organisation/Swinburne Astronomy Productions
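Those comparisons are straightforward to sanity-check. A minimal back-of-envelope sketch, assuming decimal gigabytes and a typical 128 kbit/s mp3 bitrate (neither figure comes from the brochure):

```python
# Back-of-envelope check of the SKA data-volume comparisons quoted above.
players_per_day = 15_000_000          # 64 GB mp3 players filled per day
bytes_per_player = 64e9               # assumption: 1 GB = 10**9 bytes

daily_bytes = players_per_day * bytes_per_player
print(f"data collected per day ~ {daily_bytes / 1e18:.2f} exabytes")   # ~0.96

mp3_bytes_per_second = 128_000 / 8    # assumption: 128 kbit/s mp3 audio
playback_years = daily_bytes / mp3_bytes_per_second / (3600 * 24 * 365.25)
print(f"playback time ~ {playback_years / 1e6:.1f} million years")     # ~1.9
```

Both results are consistent with the figures quoted in the text.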

Artists' impressions of Gaia. Credit: ESA

Gaia

The Gaia spacecraft was launched in December 2013, and made its way to an orbit at L2 (a Lagrange point) from where it is best placed to get a good view of the galaxy. Gaia's mission, scheduled to last five years, is to perform the most detailed survey of the Milky Way. It will be charting the position, motion, luminosity, temperature and composition of a billion stars, creating an extraordinarily precise 3D map of our galaxy. The spacecraft will observe each target star 80 times over five years, making an average of 40 million observations every day.

The UK helped to build the spacecraft itself, and the very special billion-pixel camera that gives Gaia an unprecedented view of the galaxy; UK industry and science institutes won around €80 million in contracts. As well as the considerable engineering challenge involved in building Gaia, the UK is involved in the technical challenge of processing the vast amount of data into useable science products. The Data Processing and Analysis Consortium (DPAC) is the pan-European team of scientists and software developers responsible for processing Gaia's data and producing the Gaia catalogue. The data will be processed in six data centres, each of which is dedicated to a different aspect of the data. The Cambridge data centre is responsible for processing photometric data.

Gaia's data will be transmitted 'raw' to Earth for processing. Even after it has been compressed, the amount of data produced during the five-year mission would fill more than 30,000 CD-ROMs. Processing the data is so complex that it has to be automated. The UK will be at the forefront of processing Gaia's images, and STFC helped to set up the data applications centre. Gaia's powerful on-board computers, essential to process the torrent of data, were produced by Astrium in Stevenage.

Why go to all this trouble? Gaia's stellar census will provide the data needed to answer a wide range of questions relating to the origin, history and structure of the Milky Way. By examining the large-scale motion of stars, it will also show us the distribution of dark matter, the otherwise invisible substance that is thought to hold our galaxy together. Gaia's mission will improve our understanding not only of the solar system and our galaxy, but also of the fundamental physics underpinning the entire Universe.

“We expect that the photometric data processing software to which we have contributed as part of the UK-led team will offer the first opportunity ever to precisely measure the brightness of the billion objects that GAIA will see.”
Dr Peter Allan, Head of the Space Data Division at RAL Space
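The quoted observation rate follows directly from the mission parameters. A quick sketch of the arithmetic, which gives roughly 44 million – in line with the "average of 40 million" above:

```python
# Rough check of Gaia's average observation rate.
stars = 1e9                    # target stars in the survey
observations_per_star = 80     # observations of each star over the mission
mission_days = 5 * 365.25      # five-year mission

per_day = stars * observations_per_star / mission_days
print(f"~{per_day / 1e6:.0f} million observations per day")   # ~44 million
```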


Climateprediction.net

The winter of 2013/14 proved to be an unusually wet one for the southern UK – the wettest in 250 years. Most of us would prefer not to relive it, but climate researchers from the University of Oxford are doing just that, several thousand times over. They're involved in a citizen science project called 'weather@home', which relies on the donated computing power of thousands of PCs to run two models. The first model is based on the current climate reality, the second on what the climate would be like if climate change didn't exist. The aim of the project is to determine what influence climate change had on our rotten winter weather.

Experiments that attempt to link climate change to particular extreme weather events are called attribution studies. Because they're looking at rare events, the models have to be run many thousands of times to deliver a statistically robust result. This work could be done with a supercomputer, but weather@home relies on distributed computing, using thousands of computers volunteered by members of the public. Participants in the project can get involved in cutting-edge climate research – all they need is a computer.

Weather@home is the latest experiment to be run by the climateprediction.net team, which is part of the RCUK e-science programme. Launched in November 2010, with support from the Guardian newspaper, it uses a regional climate model (previous climate prediction models were all global) and can look specifically at UK weather events.

Climateprediction.net was launched in 2003, and by 2005 had already published its first results in Nature. The initial experiment used more than 90,000 PCs and revealed that a doubling of pre-industrial atmospheric carbon dioxide levels could lead to more than double the temperature rise originally predicted.

Most extreme weather events take place on a scale that global climate models can't show. The weather@home system is a family of regional climate models that allow scientists across the world to gauge how climate change is affecting weather locally. Having this information to hand will help us to anticipate what extreme weather events may occur, to plan for the future and to reduce the lives lost and costs incurred by unexpectedly bad weather.

“Together, we will see the answer emerge.”
Dr Nathalie Schaller, University of Oxford
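One statistic such attribution studies typically report is how the probability of exceeding some threshold differs between the two ensembles, often summarised as the fraction of attributable risk. The sketch below illustrates the calculation with made-up numbers standing in for ensemble output; it is not the weather@home code, and the distributions and threshold are purely illustrative.

```python
import numpy as np

# Two large ensembles: the climate as it is, and a counterfactual climate
# without human influence. Gamma-distributed winter rainfall totals (mm)
# are made-up stand-ins for real model output.
rng = np.random.default_rng(0)
actual = rng.gamma(shape=4.0, scale=90.0, size=10_000)
counterfactual = rng.gamma(shape=4.0, scale=80.0, size=10_000)

threshold = 500.0   # rainfall total defining the "extreme" winter (illustrative)

p1 = np.mean(actual >= threshold)           # probability with climate change
p0 = np.mean(counterfactual >= threshold)   # probability without it

far = 1.0 - p0 / p1                         # fraction of attributable risk
print(f"P(extreme | actual) = {p1:.3f}, P(extreme | counterfactual) = {p0:.3f}")
print(f"fraction of attributable risk ~ {far:.2f}")
```

The need for tens of thousands of ensemble members to pin down these probabilities for genuinely rare events is what makes volunteered computing so valuable here.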

Panasas aisle. Credit: STFC

Environment and climate

JASMIN

In general it is not public demand that drives computing advances, but the requirements of researchers to collect, store and manipulate increasingly large and complex datasets. Big science projects, such as those supported by STFC, have consistently pushed the boundaries of data volumes and complexity, serving as 'stretch goals' that drive technical innovation.

JASMIN is a super-data-cluster that delivers the high-tech infrastructure required for data analysis for UK and European environmental science research. Housed at the Rutherford Appleton Laboratory, JASMIN is half supercomputer and half data centre. It has high bandwidth networks linking it to satellite installations at several partner universities, including Reading, as well as Plymouth Marine Laboratory and the Met Office. This unique computational environment was funded by the Natural Environment Research Council (NERC) and the UK Space Agency (UKSA) and delivered by STFC.

Phase one of JASMIN was delivered in 2012 and supports the National Centre for Atmospheric Science, the facility for Climate and Environmental Monitoring from Space and the wider atmospheric science and earth observation communities. It comprises 5.5 petabytes of fast storage, 640 CPU cores and network bandwidth to the data of over 1 Tbit/s, and provides an efficient data analysis environment.

Phases two and three, managed by STFC's scientific computing department and due to be completed in 2014-15, will open the facility to the wider environmental science community, with an additional 7 PB of fast storage and an extra 3,500 CPU cores in 2013-14, and another significant upgrade in 2014-15.


Gung-Ho

Weather affects every aspect of modern life, such as transport, agriculture, energy use and leisure. The winter flooding of 2013/14 is estimated to have caused £426 million of flood damage. The summer flooding in 2007 affected 50,000 homes and led to insurance payouts of £3 billion, with 25% of claims made by businesses.

The UK Met Office uses an IBM supercomputer capable of performing more than 100 trillion calculations per second to create 3,000 tailored forecasts every day. These forecasts are delivered to a huge range of customers, including the Government and armed forces, the NHS and businesses, providing information on how the weather might affect things as diverse as hospital admissions, traffic conditions and military operations. Forecasts save money and lives.

The software used to provide these forecasts relies on a Unified Model (UM) of the climate, parts of which are nearly 25 years old. The Met Office's UM is also used by other national weather services – including Australia, New Zealand, Norway, India, South Africa and South Korea. The Met Office is now collaborating with STFC's Hartree Centre and NERC to produce a next-generation weather and climate model, a project code-named 'Gung-Ho' (the original Chinese meaning of which is "working harmoniously together").

The new model will be able to make use of the ever-increasing power of supercomputers, expected to reach the exascale (containing millions of processors and able to perform a million trillion calculations per second) by the end of the decade. It will give far more accurate forecasts, with much higher resolution – down to individual towns or roads – and maintain the UK's leadership in environmental prediction.

“Tomorrow's 'exascale' computers represent a huge opportunity and a huge challenge for the science of weather forecasting. The opportunity to produce forecast detail down to the scales which affect specific human activities are beckoning.”
Professor Stephen Mobbs, Director of NERC's National Centre for Atmospheric Science


Earthquake prediction

Although it's not possible to predict the exact time and date an earthquake will occur, a team of researchers from the National Autonomous University of Mexico (UNAM), led by Dr Mario Chavez, have been working on a way to predict what the outcome might be when it does. They have used high performance computing to study the propagation of seismic waves through the Earth's crust, modelling major historical earthquakes such as the devastating magnitude-eight events in Mexico City in 1985 and Sichuan in 2008.

Having studied these historical events, the team are working on a system that can predict what the impact of a specific earthquake event would be, based on its magnitude and epicentre. The information provided by the model could play a major role in the design process for buildings as varied as nuclear power stations, hospitals, schools and homes, determining how resilient they need to be in order to minimise earthquake damage.

Although modelling these scenarios has enormous potential, it also poses a significant computing challenge, requiring large amounts of memory and storage and intensive use of computing resources. Staff from STFC's scientific computing department were able to optimise and develop the necessary programming code so that the model could be run on thousands of processors simultaneously for many hours. The model could then be run on HECToR, the UK's National Supercomputing Service at the time.

The collaboration was made possible through Scientific Computing Advanced Training (SCAT), a programme funded by the European Commission that aims to provide training in computational science to young scientists in Europe and Latin America, and to create long-term research partnerships. The use of high performance computing for this project enables models to be run at much higher resolution, giving a much finer and more detailed view of possible impacts.

“This project required availability of much larger computing resources than is currently available in Mexico, making use of the highest levels of performance on parallel machines.”
Dr Mike Ashworth, Associate Director of the Scientific Computing Department at Daresbury Laboratory
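At the heart of such a simulation is a finite-difference solution of the seismic wave equations over an enormous 3D grid. The toy sketch below is one-dimensional and uses made-up material properties, so it is only a flavour of the method rather than the UNAM code, but it shows the basic update that is repeated billions of times in a real run.

```python
import numpy as np

# Toy 1D finite-difference wave propagation: u_tt = c^2 * u_xx, solved with
# an explicit leapfrog scheme. Grid size, time step and wave speed are
# illustrative values only.
nx, nt = 2000, 4000
dx, dt = 10.0, 1e-3              # metres, seconds
c = np.full(nx, 3000.0)          # wave speed (m/s), uniform here
r2 = (c * dt / dx) ** 2          # must stay below 1 for stability (CFL)

u_prev = np.zeros(nx)
u_curr = np.zeros(nx)
u_curr[nx // 2] = 1.0            # impulsive "source" at the centre of the grid

for _ in range(nt):
    u_next = np.zeros(nx)
    u_next[1:-1] = (2 * u_curr[1:-1] - u_prev[1:-1]
                    + r2[1:-1] * (u_curr[2:] - 2 * u_curr[1:-1] + u_curr[:-2]))
    u_prev, u_curr = u_curr, u_next

print(f"peak displacement after {nt * dt:.1f} s: {np.abs(u_curr).max():.3e}")
```

In three dimensions, halving the grid spacing multiplies the number of cells by eight and, because the time step must shrink with it, roughly doubles the number of steps – around a sixteen-fold increase in work – which is why high-resolution runs need thousands of processors working together.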


Space weather

At the UK Solar System Data Centre at the Rutherford Appleton Laboratory, Matthew Wild is digitising materials held in the archives to allow the UK research community to take advantage of existing data (a project funded by NERC). Between 1903 and 1942, the Cambridge Solar Observatory took a daily image of the solar disc, each one of which was stored on a glass plate. From 1957 until 1991, data from the UK ionospheric monitoring programme was stored on 35 mm film. Digitisation of both these resources secures and improves access to a valuable environmental data source. These historical records of solar activity help us to understand the likelihood and severity of solar events that can disrupt modern technologies here on Earth – a phenomenon now known as space weather.

Electrical power infrastructure is particularly vulnerable to space weather effects, and of critical importance to modern economies and societies. A space weather event in March 1989 caused the failure of Quebec's power grid, which went from normal operation to complete shutdown in 90 seconds. Five million people were left without electricity in the nine hours it took to restore operations, and businesses across Quebec were disrupted. The costs incurred were estimated to be over C$2 billion, including C$13 million of direct damage to the grid. Power systems were also affected elsewhere in the world, with permanent damage to a $12 million transformer in New York and major damage to large transformers in the UK. Since then the power industry has been working to limit the effects of space weather events, but a better understanding of how and why they occur will allow more effective protections to be put in place, and more accurate space weather forecasts.

A £4.6 million investment by the Department for Business, Innovation and Skills is being used to make the UK one of only a few countries who have the capability to forecast space weather. From spring 2014, the Met Office will be able to issue continuous space weather forecasts, using data from both ground-based and satellite instrumentation. Near real-time observations of the solar surface and atmosphere will detect active regions that could become the source of large events. Earth's atmosphere is also monitored, to detect changes related to solar wind variations and the short-term impacts of solar eruptions.

It's not just the energy industry that will benefit from space weather forecasts. All Global Positioning System (GPS) signals are vulnerable to space weather, with potential impacts on aviation and other transport industries. Communications, pipelines and the mining industry may also be affected, as may any business reliant on modern technology, including the finance sector.


Energy from waste heat

When you turn on an old-fashioned, incandescent light bulb, only 5% of the energy it uses is converted into light – the rest is wasted as heat. And 36% of the petrol you put in your car is also lost as waste heat, with just 27% being used to move the car forwards. If we could capture and make use of that heat energy, energy efficiency would rise and fuel use would drop.

One way of making use of waste heat is to use thermoelectric materials that convert heat energy into electricity. Researchers led by Royal Holloway, University of London (supported by the Engineering and Physical Sciences Research Council) have been experimenting with advanced materials that could pave the way for a new generation of thermoelectric materials. After performing X-ray and neutron scattering experiments, the team needed high performance computing to interpret their results. They called on STFC's scientific computing team to help with the complex materials modelling calculations required, which were performed at the UK's national supercomputer facility. Supercomputers are currently limited by their power consumption, and STFC are working on energy efficient computing technologies that will usher in the next-generation exascale supercomputers.

Thermoelectric materials could be used to build solid state refrigerators that would generate electricity whilst keeping computer chips cool, or to recover heat from car exhausts. They are useful in all kinds of off-grid applications, including deep space missions, and these advanced materials may one day make it to Mars.


Copernicus

The EU Copernicus programme aims to build the most efficient and comprehensive Earth-observation system in the world, involving a constellation of satellites closely monitoring the planet. The first, Sentinel 1a, was launched in April 2014. Its task is radar mapping, and its key role is to provide rapid damage maps to help the emergency services deal with disasters such as earthquakes and severe floods. Sentinel 1a will also be able to monitor coastal waters for oil spills (or icebergs) and investigate subsidence. Airbus developed the radar instrument for Sentinel 1a in Germany, and the associated electronics in the UK.

Sentinel 1a is expected to produce 600 gigabytes of data per orbit, which is about 2.5 terabytes per day. When it has been joined by a full complement of Sentinel satellites, that figure is expected to rise to 8 terabytes per day. Dealing with this amount of data has required considerable investment in computer processing power and storage on the ground, but the aim of Sentinel is to return data to Earth much faster than existing satellites, which store data to be sent down when they pass over a ground station. The European Data Relay System will use lasers to transmit data within minutes, rather than hours, meaning that Sentinel 1a could be used for flood prediction as well as flood monitoring.

With five more launches planned by 2019, Copernicus will have many uses, including climate studies and monitoring fish stocks, air quality and waste disposal. All of Copernicus' data will be open, and freely available. Research has shown that allowing unfettered access is likely to stimulate novel uses of the data, resulting in the emergence of many new companies selling new services. It is anticipated that the Copernicus programme will give rise to around 48,000 jobs, and a boost to the EU's GDP (gross domestic product) of €30 billion, by 2030.

The vision is for Copernicus to be an open-ended programme, with satellites being replaced as they reach the end of their lifespan and more Sentinel series to come. The Sentinel 3 mission will include the Sea and Land Surface Temperature Radiometer (SLSTR), an instrument capable of making highly accurate measurements of global surface temperatures. RAL Space is playing a key role in the design process for the SLSTR, and will build a dedicated facility for pre-flight calibration activities.

Acquired by the Sentinel-1A satellite, this image shows part of India's Andaman and Nicobar islands. Credit: ESA

Medicine and health


Predicting polymorphs

The action of an organic molecule (whether it's inside of a chocolate bar, or a pharmaceutical drug) depends on both its chemical composition and its physical shape. It's possible for molecules with the same chemical composition to exist with different crystal structures, and these variants are called polymorphs. Although they can be discovered experimentally, it's impossible for a researcher to know whether they've found all of the possible polymorphs. An unexpected polymorph can affect the efficiency of a drug, for example by changing its solubility and the rate at which it is absorbed into the bloodstream.

Polymorphs can cause problems years after a drug is first introduced to the market, which can mean the drug needs to be withdrawn so that the problem can be solved. Or a newly discovered polymorph could be patented as a new variant of the drug by a rival pharmaceutical company. The UK-based pharmaceutical industry employs around 68,000 people and produces an annual trade surplus of £5 billion. Protecting pharmaceutical revenues is crucial to the UK economy.

Predicting the existence of polymorphs computationally is a tricky challenge. It involves analysing millions of potential structures to identify those that are most likely to be stable. Professor Sally Price from University College London (UCL) rose to the challenge, with the help of an integrated data and computing structure developed for the RCUK e-materials project by UCL and STFC. Using this new Grid system, simulations and analysis are automated. Professor Price predicted the existence of a new crystal structure of piracetam, the Alzheimer's drug, which matched a laboratory experiment by Professor Colin Pulham and his team from the University of Edinburgh. The same method has also been used to predict a new form of progesterone (used in oral contraceptives and hormone replacement therapy) and new crystal structures of aspirin. The data produced is stored in the Computed Crystal Structure (CCS) database, hosted by the National Grid Service.
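Computational polymorph prediction boils down to generating very large numbers of trial crystal packings, computing a lattice energy for each, and treating the lowest-energy structures as the plausible polymorphs. A minimal sketch of that final ranking step, with random numbers standing in for the real energy calculations:

```python
# Minimal sketch of the ranking step in crystal structure prediction.
# Real studies compute each lattice energy with detailed force-field or
# quantum-chemistry calculations; here the energies are random stand-ins.
import random

random.seed(1)
n_trials = 100_000        # real searches consider millions of trial packings
energies = [random.uniform(-130.0, -90.0) for _ in range(n_trials)]  # kJ/mol

best = min(energies)
window = 5.0              # keep structures within a few kJ/mol of the minimum
plausible = [e for e in energies if e <= best + window]
print(f"{len(plausible)} of {n_trials} structures lie within "
      f"{window} kJ/mol of the most stable packing")
```

In practice the expensive step is computing each lattice energy accurately enough for the ranking to mean something, which is the work the automated Grid system described above takes care of.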


Image analysis

Magnetic resonance imaging (MRI) is a powerful medical tool, able to create detailed images to aid the diagnosis of diseases including cancer, heart disease, multiple sclerosis and Alzheimer's disease. Today's MRI scanners are underpinned by 70 years of physics research, and the superconducting 'Rutherford cable' inside each one was invented at the Rutherford Appleton Laboratory for particle physics applications. The global market for MRI systems is growing quickly. Worth around £4.3 billion in 2010, it is expected to grow to around £6.2 billion by 2015. It's an important industry in the UK, which made a value-added contribution to UK GDP of £111 million in 2011, and supported around 2,200 UK jobs.

Processing MRI images is a big data problem – the 2D and 3D datasets are large, and the number of images taken per patient is increasing. An MRI scan lasts from 30-40 minutes, during which the patient has to lie still, which is difficult for very young patients, the elderly and the very ill. At £500 per scan, and with 10% of scans ruined, there are considerable costs to the NHS.

Blackford Analysis, a spin-out company from the University of Edinburgh, has a software solution. They use the MOPED algorithm, patented technology originally developed at the Institute of Astronomy at the University of Edinburgh to speed up the processing of galaxy spectra. The STFC-funded research was used to determine the star formation history of the universe, and created a thousand-fold increase in processing speed.

Applied to MRI scans, Blackford Analysis' system can stabilise images and align 3D medical scans in real time. It can improve radiography throughput by 10%, estimated to be worth $1.2 billion in the US alone. The company is making strong inroads into the US market. It allows rapid processing of large datasets, and the processing time remains relatively constant as the size of the dataset increases. The system has potential for use in many other industries as well, including security scanning and seismic interpretation for oil and gas surveying.
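MOPED gets its speed by compressing each long data vector down to one number per model parameter, using weighting vectors built from the model and the noise covariance, so that fitting then works on a handful of numbers rather than thousands of pixels or spectral channels. A hedged, textbook-style toy of the idea for a one-parameter model (not Blackford Analysis' patented implementation):

```python
import numpy as np

# Toy illustration of MOPED-style data compression for a single parameter.
rng = np.random.default_rng(42)
n = 5000                                  # e.g. pixels along a spectrum
noise_var = np.full(n, 0.01)              # assumed (diagonal) noise covariance
x = np.linspace(-1.0, 1.0, n)

def model(amplitude):                     # toy one-parameter model
    return amplitude * np.exp(-0.5 * (x / 0.1) ** 2)

dmu = model(1.0)                          # derivative w.r.t. the amplitude

# MOPED weighting vector: C^-1 * dmu, normalised to give unit noise variance.
b = (dmu / noise_var) / np.sqrt(np.sum(dmu ** 2 / noise_var))

data = model(0.7) + rng.normal(0.0, np.sqrt(noise_var))   # noisy observation
y_data = b @ data                          # the single compressed datum
y_pred = b @ model(0.7)                    # compressed model prediction
print(f"compressed datum {y_data:.2f} vs prediction {y_pred:.2f}")
```

Real scan alignment involves more parameters and less convenient covariances, but the same compress-then-fit strategy is what keeps the processing time roughly constant as datasets grow.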


WISDOM

Malaria is one of the planet's deadliest killers, and the leading cause of sickness and death in the developing world. Every year there are 350-500 million cases of malaria worldwide, causing between one and three million deaths (primarily in children under five).

WISDOM was a pioneering project that brought together 5,000 computers in 27 different countries and allowed UK scientists to identify promising drug compounds to fight malaria. Grid computing pools the resources of geographically-distant computers to allow scientists to process large amounts of data in short periods of time. National Grid Initiatives (NGIs) in many countries link together thousands of computers in universities, data centres and national facilities; the UK's NGI is coordinated by STFC. These NGIs are then linked together by the European Grid Initiative. WISDOM made use of the grid developed for the Large Hadron Collider (LHC), before it was needed for processing LHC data.

During the WISDOM project, computers calculated which compounds would 'dock' with proteins in the infectious agents (a parasite, for malaria) and might therefore have potential as anti-malarial drugs. Solving a huge biomedical data challenge, WISDOM was able to analyse 41 million combinations in just six weeks, which would have been more than 80 years of work for a single PC. It identified over 30 leads. A second run over four months looked at over 140 million more compounds.

Ruling out inactive compounds in this way allows drug researchers to focus their laboratory experiments on promising potential drugs, speeding up the drug development process and reducing its cost. WISDOM analysed an average of 80,000 compounds every hour, with 45% of its computing hours provided by the UK. The WISDOM project is a model for successful international scientific cooperation.

“Using grid computing to find potential solutions before going into the laboratory means that precious time and physical resources can be saved, potentially leading to cures and treatments to diseases much more quickly.”
Professor Neil Geddes, Director of STFC Technology


HIV

The combined supercomputing power of the UK and US national computing grids enabled scientists at University College London to simulate the efficacy of a drug in blocking a key protein (protease) used by HIV, the virus that causes AIDS. HIV is known to mutate, and develop drug resistance, and this research could one day be used to tailor personal drug treatments, for example, for HIV patients developing resistance to their drugs.

The study, published online in the Journal of the American Chemical Society, ran a large number of simulations to predict how strongly the drug saquinavir would bind to three resistant mutants of HIV-1 protease and wild type protease, one of the proteins produced by the virus to propagate itself. Saquinavir, a known inhibitor of HIV-1 protease, blocks the maturation step of the HIV life cycle. The study, which involved a sequence of simulation steps, performed across several supercomputers on the UK's National Grid Service (NGS) and the US TeraGrid, took two weeks and used computational power roughly equivalent to that needed to perform a long-range weather forecast.

Credit: Vlad Galenko/Shutterstock.com

Communications and computing


The World Wide Web

In 1989 at CERN, Sir Tim Berners-Lee wanted to solve the problem of sharing digital information between scientists with different computers, running different operating systems and using different document formats. He came up with the World Wide Web, which is now a fundamental part of modern life. Thirty-six million adults accessed the internet every day in the UK in 2013, and it is worth more than £121 billion to the UK economy every year.

Sir Tim Berners-Lee chose not to profit from his invention, and in 1994 he founded the World Wide Web Consortium (W3C) to lead the World Wide Web to its full potential by developing protocols and guidelines that ensure the long-term growth of the Web. One of W3C's primary goals is to make the Web's benefits available to all people, whatever their hardware, software, network infrastructure, native language, culture, geographical location, or physical or mental ability. The W3C office for the UK and Ireland was hosted by STFC from 1997 to 2009, and Rutherford Appleton Laboratory staff have contributed to W3C web standards for graphic images, multimedia and thesaurus exchange. The Rutherford Appleton Laboratory also hosted the first ever UK website, and installed one of the first 50 webservers.

One of the ATLAS detector modules used at CERN. Credit: STFC

The GRID

Physicists working on experiments at the Large Hadron Collider at CERN have to sift through around 15 petabytes of data every year in search of particle collisions with interesting results. The results have been spectacular, with the discovery of the Higgs boson in 2012 leading to a Nobel Prize for Peter Higgs and François Englert in 2013. CERN doesn't have the computing and financial resources to do all of the data analysis on site, and so they make use of grid (distributed) computing. CERN's Data Centre stores experiment data and sends it around the world for analysis, via the Worldwide LHC Computing Grid (WLCG), which began in 2002.

Grid computing builds on world wide web technology, allowing computers all over the world to be linked together to share their computing power. The WLCG runs over a million jobs every day, with data transfer rates peaking at ten gigabytes of data every second. It allows international teams to collaborate from the comfort of their own lab, giving more than 8,000 researchers almost real-time access to LHC data. Beyond their obvious cost benefits, computer grids are robust as they have no single point of failure, and can be easily reconfigured to meet evolving challenges.

The WLCG is just one of hundreds of grids around the world, many of which are used for e-science. Biologists use them to simulate drug candidates; earth scientists use them to analyse satellite data to track ozone levels. Engineers are using grids to investigate alternative fuels, and artists are using them to create complex animations, included in films such as Kung Fu Panda.

The grid computing concept also allows volunteer computing, with members of the public donating 'spare' processing power to contribute to citizen science projects. Well-known examples include climateprediction.net, SETI@home (the search for extra-terrestrial intelligence) and LHC@home (which supports accelerator physicists simulating proton beam stability).

Proton collision. Credit: CERN


Janet

The Janet network connects UK universities, further education colleges, research councils and other education providers. Over 18 million end-users are currently served by the Janet network, which is managed by Janet UK, a non-profit company spun out from the Rutherford Appleton Laboratory networking group in 1984. Janet allows teachers and students with eduroam credentials to log on to the network from any connected institution, removing the need for guest accounts.

In 2004, Janet teamed up with the BBC to stream the Olympic Games live from Athens. During London 2012, key moments of the games were broadcast in Super Hi-Vision at various locations across the world, across the Janet network.

Janet's Lightpath service supports research projects with dedicated point-to-point network connections. In January 2009 it was used during a special event bringing 17 telescopes around the world together for a 33-hour real-time astronomical observation. Two 1 Gbit/s links connected the Lovell telescope at Jodrell Bank in Manchester to London, and to the Science Centre in the Netherlands where the signals from all of the telescopes were brought together.

A significant proportion of the data generated by the Large Hadron Collider (LHC) is held at the Rutherford Appleton Laboratory and distributed onwards via Janet to physics departments around the country. This data is sent from CERN to the UK via the Janet Lightpath service, and its dedicated capacity enables the LHC to distribute the analysis of the petabytes of data the LHC experiments generate across the physics community. The Lightpath concept was developed as a direct result of requests from the scientific community for 'special purpose bandwidth' and is used across physics and other scientific disciplines.


Nominet

The first UK internet domain name was used in 1985. After Tim Berners-Lee's invention of the World Wide Web in 1989, demand for domain names began to rise and outgrew the voluntary Naming Group who controlled them. In 1996, Nominet was spun out from the Rutherford Appleton Laboratory Networking Group as a non-profit company, to manage the top level internet domain for the UK (.UK).

Nominet is now one of the world's leading internet registry companies, running .cymru and .wales domains from 2014. Its role is to protect, promote and support the online presence of more than ten million domain names. It also runs an award-winning dispute resolution service, provides support for domain name holders and sellers and is behind the WHOIS domain name look-up service. Customers wishing to purchase a .UK domain name do so from one of Nominet's licensed registrars.

As part of its public purpose, Nominet set up the Nominet Trust in 2008 to fund internet-based projects that make a positive difference to the lives of disadvantaged and vulnerable people. To date, the Trust has invested over £17 million in people committed to using the internet to address big social challenges.

Manufacturing and industry

Bentley Virtual Reality Prototype at the Hartree Centre, Daresbury Laboratory Visualisation Suite used to test different product designs. Credit: STFC

Bentley and the visualisation suite

The Hartree Centre is an industrial gateway to world-class high performance computing (HPC) and simulation technology. Home to the UK's most powerful supercomputer dedicated to the development, deployment and demonstration of new software, it enables new HPC collaborations that promote UK economic growth. Hartree's visualisation suite allows industrial users to integrate the use of virtual models into their product development process, improving designs at an early stage, when changes are far less costly.

Bentley Motors are dedicated to developing and crafting the world's most desirable high-performance cars. The company was established in 1919, with the first Bentley car built in a workshop near London's Baker Street. In 2013, Bentley produced 10,120 cars and added the Continental GT Speed Convertible – the fastest four-seat convertible in the world – to their line-up. Exports accounted for 86 per cent of Bentley's 2013 turnover.

Working together, Bentley and the Virtual Engineering Centre (a partnership between Hartree and the University of Liverpool) developed a unique framework to evaluate the use of virtual reality technologies and immersive environments in product design and development. Due to the success of the project, Bentley engineers have adopted this approach for the development of their next-generation products. Using the state-of-the-art visualisation suite at the Hartree Centre allows Bentley to create new vehicle models virtually. This speeds up product development times through better understanding of design data and reduces the number of physical prototypes required, leading to lower costs and eliminating the need for late-stage modifications.


Product development 'app'

Leading companies Unilever, Syngenta and Infineum are using high performance computing to reduce the time it takes to bring a product to market by up to 80%. Funded as part of a £1 million grant from the Technology Strategy Board to support the development of new ways of designing, improving and manufacturing complex high-value formulated products, the initiative will make use of one of the UK's most powerful supercomputers.

Blue Joule is housed in the Hartree Centre at Daresbury Laboratory, and is the UK's largest supercomputing facility dedicated to industrial applications. Blue Joule can perform up to 15 trillion calculations per second, and can be used to perform advanced computer modelling, simulation and 3D visualisation that can achieve results in 40 minutes that would require a week's worth of experimentation.

The initiative's aim is to develop a range of new software tools that can be combined into an 'app' that manufacturers can run themselves, rather than requiring the presence of specialist computational scientists – helping to bridge the gap between science and industry. Speed-to-market is a critical factor in the success of UK companies in highly-competitive markets, and the initiative will reduce both the time and cost of new product development in areas such as environmentally friendly cleaning products, cleaner lubricating oils and fuels, more sustainable crop protection products and breakthrough personal care products.

Left: STFC's Blue Joule, an IBM Blue Gene/Q, the UK's most powerful supercomputer. Credit: STFC

“We have identified STFC's Hartree Centre as a key enabler that will allow us to access the power of the supercomputer to accelerate our discovery processes.”
Massimo Noro, Relationship Manager at Unilever

High resolution visualisation of a rabbit's heart using community cluster and open source software stack. Credit: STFC


Computer animation

In the early 1960s, the Atlas Laboratory on what is now the Rutherford Appleton Laboratory (RAL) site developed ground-breaking computer graphics and animation technologies to help researchers visualise complex mathematical datasets. This innovative and pioneering approach to using computer-generated imagery (CGI) made the Atlas Laboratory, in the eyes of the Financial Times, "the spiritual home of computer animation in Britain".

By 1970, the laboratory had produced the first commercially-available computer-animated films. Change and Chance was a series of short films on thermodynamics, produced entirely on computer. These silent, black and white films were distributed by Penguin Books, and aimed at A-level physics students.

RAL scientists continued to lead the UK's CGI field during the following two decades. Computer images created here made a notable contribution to the film Alien, the first significant film to use CGI, which won the 1979 Academy Award for Best Visual Effects. Its success spawned a new sector, with many companies commercialising the CGI concepts and code developed by STFC, and introducing them to new markets. The UK computer animation industry is now worth £20 billion, including £2 billion generated by the video and computer games market.

Animations are now so detailed and complex that creating a film such as Kung Fu Panda typically takes around 25 million rendering hours from start to finish. DreamWorks releases at least two films per year, and is generally developing ten different feature films at any one time. To reduce the strain on their computing systems, they now use parallel file server software – grid computing. Scenes that once took hours to complete can now be rendered in seconds.

Timeline

1964 Atlas 1 computer installed at RAL

1979 Alien wins an Oscar for its special effects, which include a sequence computer-generated at RAL
1984 JANET spins out of the RAL Networking Group
1989 Invention of the World Wide Web at CERN, by Sir Tim Berners-Lee
1990 The official launch of the Human Genome Project
1992 The LHC leads to the development of grid computing
1996 Nominet takes over management of the .UK internet domain

2003 Climateprediction.net launches

2012 JASMIN phase 1 is delivered at RAL
2013 The Gaia spacecraft is launched
2013 Blue Joule is installed at Daresbury Laboratory's Hartree Centre
2014 ARCHER takes over from HECToR as the UK's national high performance computing service
2014 Sentinel 1a, the first satellite of the Copernicus constellation, is launched
2014 Work begins on MeerKAT, the first of SKA's 64 antennae

2019 Gaia completes its galactic survey mission

2023 SKA generates 1.3 zettabytes (1,300 billion gigabytes) of data every month

Ninety per cent of all of the data in existence has been created in the last two years. The birth of the World Wide Web has given rise to social media, ubiquitous smart devices and the internet of 'things' – it's clear that we live in an information age.

With stories from the worlds of science and research, environment and climate, medicine and health and manufacturing and industry, the information in this brochure is just a sample of the significant social and economic impacts of big data.

Science and Technology Facilities Council
Polaris House, North Star Avenue, Swindon SN2 1SZ
T: +44 (0)1793 442000
F: +44 (0)1792 442002
E: [email protected]
www.stfc.ac.uk
July 2014

Head office: Polaris House, North Star Avenue, Swindon SN2 1SZ, United Kingdom. Establishments at: Rutherford Appleton Laboratory, Oxfordshire; Daresbury Laboratory, Cheshire; UK Astronomy Technology Centre, Edinburgh; Chilbolton Observatory, Hampshire; Isaac Newton Group, La Palma; Joint Astronomy Centre, Hawaii.