
38th Research Students’ Conference in Probability and Statistics

Conference Proceedings 3rd – 6th August 2015

Welcome from the Organisers

Dear Delegate,

We at the University of Leeds take great pleasure in welcoming you to the 38th Research Students’ Conference (RSC) in Probability and Statistics. The RSC is an annual conference that gives research students in probability and statistics an opportunity to come together in a relaxed environment to discuss their research. Against this backdrop, we hope that new contacts and friendships will be forged. For many students this will be your first experience of presenting your work, and some of you will have the opportunity to chair a session. For those of you attending but not presenting, we hope that you will benefit greatly from observing others and networking with researchers working in similar fields.

For more information on RSC 2015 see rsc2015.co.uk, or join our RSC 2015 - Leeds Facebook group: https://www.facebook.com/groups/rsc15

This year’s conference features plenary speakers Professor Valerie Isham, professor of probability and statistics at University College London, and Professor Sir David Cox, honorary fellow of Nuffield College at the University of Oxford. We are no longer looking for potential hosts for RSC 2017, as the University of Durham have volunteered to host. Next year the conference will be held in Dublin.

Lastly, we must give special thanks to all of our sponsors, whose support is vital for the continuation of the RSC, and to the previous hosts at Lancaster and Nottingham, who have helped us to organise this year’s event.

The RSC 2015 Committee

Timetable of Events

Monday 3rd August

12:30 - 15:00 Registration & Arrival Refreshments (Parkinson Court)

15:00 - 16:55 Welcome & Plenary Sessions (Michael Sadler Building)

15:00 - 15:15 Welcome: Colleen Nooney (Committee Chair)

15:15 - 16:00 Plenary Talk I: Professor Sir David Cox

16:00 - 16:45 Plenary Talk II: Professor Valerie Isham

16:45 - 16:55 Instructions for Chairs

19:00 - ATASS BBQ (The Faversham)

Tuesday 4th August

08:00 - 09:15 Breakfast - full delegates only (The Refectory)

09:20 - 10:40 Session 1 (Michael Sadler Building)

10:40 - 11:20 Coffee Break (Parkinson Court)

11:20 - 12:40 Session 2 (Michael Sadler Building)

12:40 - 14:00 Lunch (Parkinson Court)

14:00 - 15:20 Session 3 (Michael Sadler Building)

15:20 - 17:00 Poster Session and Refreshments (Parkinson Court)

17:45 - 19:00 Dinner (The Refectory)

19:00 - Cinema/ Town Hall Tour/ Pub Crawl

The Town Hall Tour group will leave promptly from outside The Refectory at 19:00.

Wednesday 5th August

08:00 - 09:15 Breakfast - full delegates only (The Refectory)

09:40 - 11:00 Session 4 (Michael Sadler Building)

11:00 - 11:40 Coffee Break (Parkinson Court)

11:40 - 12:40 Session 5 (Michael Sadler Building)

12:40 - 14:00 Lunch (Parkinson Court)

14:00 - 15:30 Sponsors’ Talks (Michael Sadler Building)

15:30 - 17:00 Sponsors’ Drinks Reception (Parkinson Court)

19:00 - Conference Dinner at The Refectory

Thursday 6th August

08:00 - 09:00 Breakfast - full delegates only (The Refectory)

09:00 - 10:00 Check Out of Accommodation (Storm Jameson Court)

Contents

1 Map of University Campus 1

2 Help, Information and Telephone Numbers 3

3 Facilities and Transport 4

4 The RSC 2015 Committee 5

5 The City and University 6

6 Accommodation, Meals and Organised Entertainment 8

7 Instructions 10
7.1 For Chairs ...... 10
7.2 For Speakers ...... 10
7.3 For Displaying a Poster ...... 11

8 Prizes 11

9 Plenary Session 12
9.1 Professor Valerie Isham (UCL) ...... 12
9.2 Professor Sir David Cox (University of Oxford) ...... 12

10 Talks Schedule 14
10.1 Tuesday 4th August ...... 14
10.2 Wednesday 5th August ...... 17

11 Sponsors’ Talks: 14:00 - 15:30 20
11.1 Wednesday 5th August ...... 20

12 Talk Abstracts by Session 21
12.1 Tuesday 4th August ...... 21
12.1.1 Session 1: 09:20-10:40 ...... 21
12.1.2 Session 2: 11:20-12:40 ...... 30
12.1.3 Session 3: 14:00-15:20 ...... 39
12.2 Wednesday 5th August ...... 45
12.2.1 Session 4: 09:40-11:00 ...... 45
12.2.2 Session 5: 11:40-12:40 ...... 55
12.2.3 Sponsors’ Talks - 14:00-15:30 ...... 60

13 Poster Abstracts by Author 63

14 Sponsors’ Advertisements 72

15 RSC History 76

16 Delegate List 77

17 Voting Slip for Best Talks and Best Poster 85

1 Map of University Campus

[Campus map. Key venues and their map numbers: Storm Jameson Court (accommodation) 86; Parkinson Court 60; Michael Sadler Building 78; The Refectory 29. The main entrance is on Woodhouse Lane; the south entrance (permit holders only, near Springfield House) is off Clarendon Way. University Security Office: 0113 343 5494. Walking routes between the Train Station (A) or Bus Station (C) and the campus (B): route A - B takes approximately 27 minutes, route C - B approximately 32 minutes.]

2 Help, Information and Telephone Numbers

Accommodation:
Storm Jameson Court
Mount Preston Street
Leeds LS2 9JP

Telephone (External): 0113 343 2750

Department Address:
School of Mathematics
Leeds LS2 9JT

Emergency Numbers:
University Security (External): 0113 343 2222
Conference Organiser: 07792 639226 (Colleen Nooney)
University Event Manager: 0113 343 6104 (Ros Bates)

Taxi Firms:
Amber Cars: 0113 202 2117
Royal Cars: 0113 230 5000
Blueline: 0113 244 5566
City Cabs: 0113 246 9999

3 Facilities and Transport

Shops: The university is only a 15 minute walk from the city centre, where you can find anything you might need. Closer to hand, there is a Co-op in the Student Union, along with various cafes and bars. There are also a number of cafes, a pharmacy and a Tesco Express on Woodhouse Lane, opposite the Parkinson Building steps.

Transport:

Rail: The First Leeds number 1 bus travels past the main campus entrance every 10 minutes. The service can be caught from Infirmary Street (Stand B) in City Square or Bishopgate Street (Stop Z1); both stops are close to the rail station. Alternatively, a taxi to the main campus entrance should cost about £5 (see page 3). The walk from the station to the Parkinson Building is about one mile (approximately 24 minutes).

Air: A taxi from Leeds Bradford International airport to the main campus entrance will cost approximately £20. Arrow taxis have an office at the airport. The Yorkshire Tiger 757 bus service runs from Stand A at the airport to Leeds bus station. From there you can catch regular buses to the University campus. See below in the Bus and Coach section for more information.

Bus and Coach: For timetables, routes and pricing of coach services to Leeds, please visit the National Express and Megabus websites. Both of these coach operators stop at Leeds bus station. Regular buses run from the bus station to the main campus entrance, in particular numbers 6, 28 and 97. For further information please visit the Metro bus information website. There is also a taxi rank at the bus station; a taxi will take about 10 minutes and cost approximately £5 (see page 3).

Car: The main entrance to the campus for cars is on Woodhouse Lane (Cavendish Lane on some Sat-Nav systems). Parking on campus is limited, and we are also committed to reducing our carbon emissions, so we encourage walking and cycling to the University where possible. Parking for disabled “blue badge” holders is available on campus for visitors. If you are a blue badge holder you are advised to enter the campus via the main barrier on Woodhouse Lane to show your badge; you can then be directed to the nearest available disabled parking bay to the area you are visiting. Blue badge holders do not need to pay to park on campus.

4 The RSC 2015 Committee

If you have any questions during your time in Leeds please feel free to ask any member of the RSC 2015 Committee. We will be wearing white T-shirts with the RSC 2015 logo on the front.

Colleen Nooney: Committee Chair
Keith Newman: Website Manager
Nebahat Bozkus: Entertainment and Conference Dinner
Wafa Al Mohri: Conference Dinner
Aziz Aljuaid: Merchandise
Khaled Alqahtani: Merchandise
Christopher Pope: On-the-day Helper
Kapil Patel: On-the-day Helper
Anna Gavriel: On-the-day Helper
Zoe Baker: On-the-day Helper

5 The City and University

The City of Leeds

In 2011 the estimated population of Leeds made it the third largest city in the UK. The history of Leeds can be traced back to the 5th century, when the Kingdom of Elmet was covered by the forest of Loidis, the origin of the name Leeds. In the 17th and 18th centuries Leeds became a major centre for the production and trading of wool. Then, during the Industrial Revolution, Leeds developed into a major industrial centre; wool was the dominant industry, but flax, engineering, iron foundries, printing and other industries were also important. Today Leeds is ranked as a gamma world city by the Globalisation and World Cities Research Network, and is considered the cultural, financial and commercial heart of the West Yorkshire Urban Area.

Leeds has plenty to offer, from stylish shopping and decadent dining to contemporary arts and vibrant nightlife. The City Centre boasts breathtaking architecture such as the Victoria Quarter, as well as a range of attractions spanning entertainment, sport, theatre and heritage, alongside world-class museums and galleries such as the Royal Armouries Museum and Leeds Art Gallery. Leeds also hosts a variety of events and festivals throughout the year, from Leeds Festival and Light Night to Leeds West Indian Carnival.

The University of Leeds

The University of Leeds was founded in 1904, but its origins go back to the nineteenth century, with the founding of the Leeds School of Medicine in 1831 and then the Yorkshire College of Science in 1874. Today the University is on a single campus, just a ten minute walk from Leeds city centre; it is an eclectic mix of the old and the new, reflecting its 100-year history. In 2012 the University was one of only three institutions reviewed by the Quality Assurance Agency (QAA) to receive a commendation, the highest category of praise available, for our enhancement of student learning opportunities. The University ranks amongst the top 100 in the world and is a UK top 10 research institution, with 83% of our research classed as world-leading or internationally excellent. We are one of only 24 UK universities that make up the prestigious research-intensive Russell Group, and we have a world-wide reputation for excellence in research, with access to unprecedented global collaborations.

School of Mathematics

The School of Mathematics at the University of Leeds has an international reputation for teaching and research excellence. In the results of the recently announced 2014 Research Excellence Framework (REF), Mathematics at Leeds was ranked 8th overall and 9th for research impact. The quality of research saw an increase in grade point average to 3.21, with 85% of work rated as world-leading (4*) or internationally excellent (3*). In 2012 the Faculty of Mathematics and Physical Sciences was awarded a Silver Athena SWAN Department award, in recognition of the efforts made to advance the careers of women in mathematics and science.

6 Accommodation, Meals and Organised Entertainment

Accommodation

Accommodation will be in Storm Jameson Court, which is on the main campus (see page 1). Please report to the Parkinson Court when you arrive, as your room key will be given to you during registration. You will not be able to check into your room until after 14:00 on Monday 3rd August. Luggage can be left in the Parkinson Court for delegates who do not have time to check in before the plenary talks; a member of the committee will stay with the luggage at all times. After the plenary talks, members of the committee will escort everyone over to Storm Jameson Court for check-in.

Accommodation is in single en-suite rooms. All bed linen and bath towels will be provided. The halls complex has a 24 hour reception; ask at reception if you need an iron or a hairdryer. Wireless networks are available throughout the University of Leeds site via eduroam. Included in your accommodation is free access to The Edge, the University’s swimming pool, gym and fitness suite on the main campus (Willow Terrace Road, see page 1). You will find a pass for The Edge in your room.

You should check out of your accommodation between 09:00 and 10:00 on Thursday 6th August. Please return your keys to the reception at Storm Jameson Court. Unreturned keys will result in a £20 charge.

Meals

Breakfast and dinner (Tuesday and Wednesday only) will be provided in The Refectory, which is in the Student Union. On Monday dinner will be at The Faversham. On Tuesday and Wednesday a buffet lunch will be provided in the Parkinson Court, where you will register. Only full delegates are entitled to breakfast; there will be breakfast tokens in your room. Do not forget to take your token to The Refectory, or you will not receive a free breakfast.

Organised Entertainment

During the week there are a number of organised activities for delegates. Your evening entertainment option for Tuesday 4th August can be found on the back of your badge, along with your BBQ meal choice. Please do not change activity unless you can find someone to swap with.

ATASS BBQ: Monday 3rd August

On Monday night there will be a BBQ at The Faversham on campus (Springfield Mount, see page 1). This event is sponsored by ATASS. We will meet outside Storm Jameson Court at 18:45 to walk to The Faversham. There will be food, drink and giant board games to keep you entertained!

Pub Crawl/Cinema/Town Hall Tour: Tuesday 4th August

On Tuesday night there is a choice of entertainment: a pub crawl, the cinema or a tour of the Town Hall. For the Town Hall Tour, the group will depart from outside The Refectory at 19:00 on Tuesday night, after dinner. For the other activities we will depart at 19:30.

The pub crawl will take place in Leeds City Centre; there will be one group, visiting three bars. Committee members will lead groups back to the accommodation from the City Centre at approximately 22:30 and 00:00. Should you wish to leave at another time, one of the committee members will advise you on how to get back to the accommodation safely. We have also provided the numbers of some taxi firms on page 3; note that taxis will cost between £5 and £10. Don’t forget to bring ID.

The cinema is at The Light in the city centre. Please bring your student cards. After the film ends there will be the option to join the pub crawl, or a committee member will guide you back to the accommodation.

The Town Hall Tour starts at 19:30 and will finish at 21:00. Afterwards there will be the option to join the pub crawl, or a committee member will guide you back to the accommodation.

Conference Dinner: Wednesday 5th August

On Wednesday night there is a three course conference dinner with a Masquerade theme which will be held at The Refectory. The evening attire is formal black tie - NO trainers, jeans or hoodies. There will be food, music and dancing in an event that is always the highlight of any RSC conference.

7 Instructions

7.1 For Chairs

• Please arrive at the appropriate seminar room five minutes before the start of your session. Please log on to the bench PC using the login you are provided with in your welcome pack and turn on the projector. Should you have any problems, please contact one of the committee members — there will be a committee member in each session.

• Packs will be left in each seminar room. Do not remove the packs or any of their contents from the seminar room. The packs contain a laser pointer, a USB stick, and ‘5 min’, ‘1 min’ and ‘stop’ signs. If you think something might be missing from the pack, please contact one of the organisers.

• You should clearly introduce yourself and each speaker in turn.

• Talks are 15 minutes in duration with 5 minutes for questions.

• It is very important that we stick to the schedule. Therefore please start the session on time, use the ‘5 min’, ‘1 min’ and ‘stop’ signs to assist the speaker in finishing on time, and make sure that questions are not allowed to delay the rest of the session.

• If a speaker fails to show, please advise the audience to attend a talk in an alternative seminar room. Do not move the next talk forward.

• After each talk, thank the speaker, encourage applause, and open the floor to questions. If no questions are forthcoming, ask one yourself.

7.2 For Speakers

• Each seminar room will contain a computer, data projector, laser pointer and white/black board.

• Arrive at least five minutes before the start of the session, introduce yourself to the chair and load your presentation onto the computer.

• Presentations must be PDF or PowerPoint (ppt or pptx) files. No other format is acceptable. We do not support Macs.

• Talks are strictly fifteen minutes plus five minutes for questions. Anyone going over this time will be asked to stop by the chair.

• Your chair will let you know when you have five minutes and then one minute remaining for your presentation.

7.3 For Displaying a Poster

• Please submit posters upon registration on Monday 3rd August.

• Posters should be A0 or smaller and portrait.

• The poster session will be held in the Parkinson Court at 15:40 on Tuesday 4th August.

• Posters will be erected by conference organisers.

• During the poster session, it is advisable to be near your poster in order to answer questions from interested participants.

• Please ensure that your poster is removed at 17:30 on Tuesday 4th August.

8 Prizes

The three best presentations, as voted for by delegates, will each receive a book (from a selection) kindly given by Chapman and Hall and Cambridge University Press. In addition, the Royal Statistical Society have kindly offered the three best talks and the best poster the opportunity to present at the Royal Statistical Society Conference in 2015. The prize will be in the form of free registration for the conference for each of the four winners. The registration will include meals and social events, but not transport or accommodation.

Important: Please note that the RSS 2015 Conference will be held 7th–10th September at Exeter University, so if you are a winner you may need to be free to travel to Exeter on 7th September.

The voting slip for the best talks and best poster can be found on page 85, at the back of this booklet. Voting slips must be handed in by 14:00 on Wednesday 5th August. The winners will be announced at the conference dinner on the evening of Wednesday 5th August.

9 Plenary Session

9.1 Professor Valerie Isham (UCL)

Title: Stochastic Modelling: some personal highlights

Abstract I will give a brief account of some of the personal highlights of my career as a stochastic modeller, describing a few of the research topics on which I have worked and some of the collaborators from a wide range of fields who influenced me along the way.

Valerie has a PhD in Statistics from the University of London (Imperial College) for a thesis on multidimensional point processes. She joined UCL in 1978 and has been a professor since 1992. She was previously Head of Department from 1996 to 2002 and again from 2010 to 2011. She was President of the Royal Statistical Society in 2011 and 2012, and has represented the RSS on the Council for Mathematical Sciences since 2010. She currently chairs the Scientific Steering Committee of the Isaac Newton Institute for Mathematical Sciences in Cambridge, and the Biometrika Board of Trustees.

Valerie’s research interests are in applied probability: broadly, the development and application of stochastic models, including i) the generic development of models for point processes and determination of their properties; ii) development of models for spatial and spatio-temporal processes arising in the physical sciences, and especially in hydrology (e.g. soil moisture and precipitation); iii) models for applications in the life and medical sciences, focussing particularly on population processes, epidemics and the transmission dynamics of infection and information; iv) models for random networks and for the dynamics of processes evolving on them.

9.2 Professor Sir David Cox (University of Oxford)

David Cox read Mathematics at Cambridge and then worked at the Royal Aircraft Establishment, Farnborough, and at the Wool Industries Research Association, Leeds, before holding a series of academic positions at Cambridge, the University of North Carolina, Birkbeck College, London, Imperial College, London and finally Nuffield College, Oxford, where he was Warden from 1988 until retirement in 1994. He is a Fellow of the Royal Society and a Foreign Associate of the US National Academy of Sciences.

[Cartoon: PhD Comics, http://www.phdcomics.com/comics/archive.php?comicid=1553]

10 Talks Schedule

10.1 Tuesday 4th August

Session 1: 09:20-10:40

(A) EPIDEMIOLOGY AND META-ANALYSIS - Room: LG 10, Chair: Tom Burnett

09:20 Claire Simons: Using Meta-analyses to Explore Structural Uncertainty in Cost-effectiveness Models (p. 21)
09:40 Jessica Stockdale: Modelling and Bayesian Inference for the Abakaliki Smallpox Data (p. 22)
10:00 Fabrizio Messina: Bayesian Multivariate Network Meta-analysis of Ordered Categorical Data (p. 22)
10:20 Khuneswari Gopal Pillay: Comparing and Model Averaging with Multiply-imputed Datasets: an Application in a Child Growth Study (p. 23)

(B) STATISTICAL MODELLING I - Room: LG 19, Chair: Mark Webster

09:20 Pengcheng Zeng: Tracking the Movement of Hyoid Bone during Swallowing (p. 24)
09:40 Anyi Zou: Opponent Reference Model - a Generalised Bradley-Terry Model (p. 24)
10:00 Ho Hin Henry Chan: Investigation of Drug Abuse among Adolescents in UK: Associations and Interdependencies (p. 25)
10:20 Adrian Byrne: Defining and Modelling Socioeconomic Status for Individual Trajectories throughout the Life-Course (p. 25)

(C) STATISTICAL METHODS - Room: LG 15, Chair: Olga Egorova

09:20 Serveh Sharifi: Log-linear Models in Sparse Contingency Tables (p. 26)
09:40 Hamed Haselimashhadi: A Smooth Approximation of L1 Lasso Penalty (p. 26)
10:00 Jude Chukwura Obi: Differences and Similarities Between Fisher’s Linear Discriminant Analysis (FDA) and Support Vector Machines (SVM) (p. 27)
10:20 Tita Vanichbuncha: Partial Ranking Analysis (p. 27)

(D) NON-PARAMETRIC METHODS - Room: LG 17, Chair: Nicholas Tawn

09:20 Yordan Raykov: Greedy yet Rigorous Approach to Non-parametric Clustering (p. 28)
09:40 Tianmiao Wang: Non-parametric Regression of Data with Correlated Noise (p. 28)
10:00 Zexun Chen: How Priors of Initial Hyperparameters affect Gaussian Process Regression Model (p. 29)
10:20 Tom Berrett: Non-parametric Entropy Estimation (p. 29)

Session 2: 11:20-12:40

(A) COMPUTATIONAL BAYESIAN TECHNIQUES - Room: LG 10, Chair: Alan Benson

11:20 Mark Webster: Approximate Bayesian Computation for Brownian Motion (p. 30)
11:40 Keith Newman: Efficient Bayesian Inference for Large Linear Gaussian Models (p. 30)
12:00 Jamie Owen: On the use of Kullback Leibler Divergence as a Metric for Asymptotically Exact Approximate Bayesian Computation (p. 31)
12:20 Nicholas Tawn: Parallel Tempering for Multi-modal Problems (p. 31)

(B) CLINICAL TRIALS - Room: LG 19, Chair: Svetlana Cherlin

11:20 Amy Whitehead: Designing Randomised Controlled Trials Based on Internal Pilot Trials (p. 32)
11:40 Gareth Davies: Approaches to Calibration in Survey (p. 32)
12:00 Tom Burnett: Adaptive Enrichment Designs for Clinical Trials (p. 33)
12:20 Md Anower Hossain: Missing Continuous Outcomes under Covariate Dependent Missingness in Cluster Randomised Trials (p. 33)

(C) SPATIO-TEMPORAL MODELS WITH ENVIRONMENTAL APPLICATIONS - Room: LG 15, Chair: Abla Azalekor

11:20 Craig Wilkie: Spatio-temporal Data Fusion of Remote-sensing and in-lake Chlorophyll-a Data using Statistical Downscaling (p. 34)
11:40 Qingying Shu: Characterisation of Near-Earth Magnetic Field Data for Space Weather Monitoring (p. 35)
12:00 Muhammad Safwan Ibrahim: Time Series with Mixed Distributions and an Application to Daily Rainfall (p. 36)
12:20 Guowen Huang: Quantification of Overall Air Quality in Space and Time and its Effects on Health (p. 36)

(D) ECOLOGICAL STATISTICS - Room: LG 17, Chair: Colleen Nooney

11:20 Richard Glennie: Modelling Encounters in Population Abundance Surveys (p. 37)
11:40 Panicha Kaskasamkul: Capture-recapture Estimation when some of the Observed Classes are Misclassified (p. 37)
12:00 Samuel Jackson: Bayesian Emulation and History Matching with Application to Biological Plant Models (p. 38)
12:20 Alison Parton: The ’Step and Turn’ Animal Movement Model in Continuous Time (p. 38)

Session 3: 14:00-15:20

(A) SEQUENTIAL MONTE CARLO METHODS - Room: LG 10, Chair: Min Wang

14:00 Sophie Watson: Adaptive Summary Statistic Selection within a Sequential Monte Carlo Algorithm for Approximate Bayesian Computation (p. 39)
14:20 Adam Griffin: Simulating Quasi-stationary Distributions on Reducible State Spaces (p. 40)
14:40 Pieralberto Guarniero: A Look-ahead Approach to Sequential Monte Carlo (p. 40)
15:00 Rui Vieira: State and Parameter Estimation in Dynamic Generalised Linear Models (p. 41)

(B) BIOINFORMATICS - Room: LG 19, Chair: Colleen Nooney

14:00 Tusharkanti Ghosh: Identification of Differentially Methylated Regions in Bisulphite Sequencing Data using Bayesian Models (p. 41)
14:20 Alasdair McIntosh: Population Structure in Genetic Association Studies (p. 42)
14:40 Rosanna Cassidy: Inference of Transmission Trees for Epidemics using Whole Genome Sequence Data (p. 42)
15:00 Svetlana Cherlin: Inferring Rooted Phylogenies via Non-reversible Substitution Models (p. 42)

(C) STOCHASTIC EPIDEMIC MODELS - Room: LG 15, Chair: Riccardo Rastelli

14:00 Umar Mallam Abdulkarim: Inference for Stochastic SIR Household Epidemic Model with Misclassification (p. 43)
14:20 Benjamin Davis: The Impact of Degree Distribution on Network Epidemic Models with Casual Contacts (p. 44)
14:40 Abla Azalekor: Stability Analysis of a Model of Wireless Communication Network with Spatial Between Users (p. 44)
15:00 Chibuzor Nnanatu: Analysis of Household-based Models of Endemic Diseases (p. 45)

10.2 Wednesday 5th August

Session 4: 09:40-11:00

(A) STATISTICAL MODELS WITH MEDICAL APPLICATIONS - Room: LG 10, Chair: Nebahat Bozkus

09:40 Eoin Gray: How well do Current Lung Cancer Risk Prediction Models Perform? (p. 45)
10:00 Rebecca Simpson: An Investigation of the Factors which Influence Children with Asthma Having Asthma Exacerbation Around the Start of the New School Year (p. 46)
10:20 Taghreed Jawa: Modelling Rates of Methicillin-resistant Staphylococcus Aureus Bacteraemia and Detecting Changes in Trend (p. 47)
10:40 Ernest Mangantig: Melanoma Survival Models using Integrated Clinical and Genomic Data (p. 47)

(B) MULTIVARIATE ANALYSIS - Room: LG 19, Chair: Jude Chukwura Obi

09:40 A’yunin Sofro: Convolved Gaussian Process Regression Model for Multivariate Count Data (p. 48)
10:00 Eirini Koutoumanou: Multivariate Centiles via Convex Sets and their Extension via Copula Models (p. 48)
10:20 Angelo Moretti: Small Area Estimation Methods for Multidimensional Poverty and Well-being Indicators (p. 49)
10:40 Anusua Singh Roy: A Statistical Data-based Approach to Identifying Gameplay Behaviours in Online Freemium Games (p. 50)

(C) REGRESSION MODELS - Room: LG 15, Chair: Adrian Byrne

09:40 Aziz Aljuaid: Simultaneous Bayesian Box-Cox Quantile Regression (p. 51)
10:00 Maria Toomik: Sparse Unmixing of Hyperspectral Data (p. 51)
10:20 Hadeel Kalktawi: Discrete Weibull Regression Model for Count Data (p. 52)
10:40 Xi Liu: Bayesian Quantile Regression (p. 52)

(D) STATISTICAL MODELLING II - Room: LG 17, Chair: Gareth Davies

09:40 Daniel Tompsett: Simultaneous Confidence Sets for Several Effective Doses (p. 53)
10:00 Wenyan Hao: The Quantum Language for Finance via the Generator Approach (p. 53)
10:20 Irene Marinas: Modelling the Shape of Emotions (p. 54)
10:40 Anna Heath: Efficient High-Dimensional Gaussian Process Regression to Calculate the Expected Value of Perfect Partial Information in Health-Economic Evaluations (p. 54)

Session 5: 11:40-12:40

(A) SOCIAL NETWORKS - Room: LG 10, Chair: Abla Azalekor

11:40 Vasiliki Koutra: Optimal Block Designs on Social Networks (p. 55)
12:00 Cunyi Wang: House Substitutability in Glasgow (p. 55)
12:20 Riccardo Rastelli: Properties of Latent Variable Network Models (p. 56)

(B) BIAS REDUCTION TECHNIQUES - Room: LG 19, Chair: Vincent Mabikwa

11:40 Tayfun Terzi: Proposing a New Measure for Detecting (Latent Variable Model Aberrant) Semi-plausible Response Patterns (p. 56)
12:00 Olga Egorova: Optimality Criteria for Multiple Objectives Accounting for Model Misspecification (p. 57)
12:20 Sophia Kyriakou: Improving Estimation in Generalised Linear Mixed Effect Models (p. 57)

(C) GRAPHICAL MODELS - Room: LG 15, Chair: Pieralberto Guarniero

11:40 Alex Gibberd: Estimating Dynamic Graphical Models from Multivariate Time Series (p. 58)
12:00 Craig Alexander: Analysis of Linguistic Behaviour using Graphical Models (p. 58)
12:20 Adria Caballe Mestres: Graphical Models to find Sparse Networks in Gene Expression Data (p. 59)

11 Sponsors’ Talks: 14:00 - 15:30

11.1 Wednesday 5th August

SESSION 1 - Room: LG 10, Chair: Amy Whitehead

14:00 Roche (Alun Bedding): Roche Biostatistics - Doing now what patients need next (p. 60)
14:30 ATASS Sports (Tim Paulden): Game, set and maths: Predicting sports results through statistical modelling (p. 60)
15:00 Royal Statistical Society (Sarah Nolan): The RSS supporting your statistical future (p. 60)

SESSION 2 - Room: LG 19, Chair: Gareth Davies

14:00 JMP (Ian Cox): Advances in the Design of Statistical Experiments (p. 61)
14:30 Capital One (Dan Kollett): Statistics at Capital One UK (p. 61)
15:00 Shell (Tim Park): Statistics in the Energy Industry (p. 61)

12 Talk Abstracts by Session

12.1 Tuesday 4th August

12.1.1 Session 1: 09:20-10:40

(A) EPIDEMIOLOGY AND META-ANALYSIS - Room: LG 10, Chair: Tom Burnett

Talk Time: 09:20-09:40

USING META-ANALYSES TO EXPLORE STRUCTURAL UNCERTAINTY IN COST-EFFECTIVENESS MODELS

Claire Simons and Chris Jackson
MRC Biostatistics Unit, Cambridge

Keywords: Cost-effectiveness, meta-analysis, structural uncertainty

Cost-effectiveness analysis is the comparison of alternative treatment options in terms of their costs and consequences, with the purpose of providing decision-makers with information to help make resource allocation decisions (Drummond (2005), Morris (2007)). It is a comparative method and looks at the incremental costs and benefits of two or more treatments for a specific disease. Frequently, decision analytic models are constructed to observe costs and benefits of interventions over a lifetime. There is often uncertainty in the inputs and structure of these models.

This work looks at one area of structural uncertainty in cost-effectiveness models, patients’ compliance with an intervention, using a case study of treatment for patients with mild-moderate sleep apnoea. There is evidence that the treatment decision is sensitive to assumptions on compliance with two interventions (Mandibular Advancement Devices (MADs) and Continuous Positive Airway Pressure (CPAP)) for treating sleep apnoea. However, a literature search has found a lack of evidence on long-term compliance for these devices.

Meta-analyses of the available information for each intervention were carried out, consisting of the number and denominator of patients remaining compliant at one or more times, for multiple studies. Ceasing use of the device was considered as a time-to-event outcome, under the assumption that people who stop using their device do not restart at a later point. A number of models were fitted to the binomial outcome, assuming the times to non-compliance for all individuals in the same study were generated from the same time-to-event model. The impact of the results of the meta-analysis models on the correct treatment decision has been explored. The results have also been used to help prioritise future research into compliance for both MADs and CPAP through an expert elicitation exercise.

Talk Time: 09:40-10:00

MODELLING AND BAYESIAN INFERENCE FOR THE ABAKALIKI SMALLPOX DATA Jessica Stockdale, Philip O'Neill and Theodore Kypraios University of Nottingham

In 1967, an outbreak of smallpox occurred in the Nigerian town of Abakaliki. The vast majority of cases were members of the Faith Tabernacle Church (FTC), a religious organisation whose members refused vaccination. The outbreak was recorded in detail in a World Health Organisation report, with information on not only the time series of case detections but also their place of dwelling, vaccination status, and FTC membership. It has inherent historical interest as it occurred during the WHO smallpox eradication programme initiated in 1959. Although smallpox was declared eradicated in 1979, it has regained attention as a potential bioterrorism weapon and continues to be of interest due to concerns about its re-emergence. The Abakaliki data have been considered by numerous authors, although in almost all cases the population structure aspects are ignored. The exception is Eichner and Dietz (American Journal of Epidemiology, 2003), although their analysis relies on an approximate likelihood. We seek to model the smallpox outbreak using Eichner and Dietz's stochastic transmission model but with the correct likelihood, in order to infer the parameters in a Bayesian framework using MCMC methods and data augmentation. Results include estimates of transmission rates, vaccine efficacy and the basic reproduction number.

Talk Time: 10:00-10:20

BAYESIAN MULTIVARIATE NETWORK META-ANALYSIS OF ORDERED CATEGORICAL DATA Fabrizio Messina University of Sheffield

Keywords: Medical Statistics

Ordered categorical data arise in many disease areas, examples being the classification of the severity of a patient's disease as one of None, Moderate and Good for a EULAR (European League Against Rheumatism) response, and as one of None, ACR20, ACR50 and ACR70 for an ACR (American College of Rheumatology) response, for patients with rheumatoid arthritis. (Network) meta-analysis is used to synthesise the evidence about treatment effects from a collection of trials. It is usually implemented separately on one or more outcome measures (i.e. a univariate analysis). Multivariate (network) meta-analyses incorporate inter-dependencies between outcome measures. The aim of this PhD is to research the benefits of accounting for dependence between outcome measures, and the consequences of ignoring it, in the context of aggregate ordered categorical data. Issues will include methods for incorporating correlation within studies, the benefit of borrowing strength for studies that do not provide evidence on all outcome measures, assessing consistency between direct and indirect evidence, and publication bias in a multivariate setting. A particular aspect of the PhD will be the objective of being able to estimate the proportion of patients in each category in addition to an estimate of treatment effect.

Talk Time: 10:20-10:40

COMPARING MODEL SELECTION AND MODEL AVERAGING WITH MULTIPLY-IMPUTED DATASETS: AN APPLICATION IN A CHILD GROWTH STUDY Khuneswari Gopal Pillay, John H. McColl and Charlotte Wright University of Glasgow

Keywords: Model selection, model averaging, M-STACK, MI, highly correlated, prediction, AICc, inclusive strategy, MICE

Model selection and model averaging become more complicated in the presence of missing data, and there are no agreed guidelines for how best to carry out these model-building procedures with multiply-imputed datasets. This paper compares the performance of model selection and model averaging in the context of a UK longitudinal study (the Gateshead Millennium Study) of 1029 children, where the primary purpose of the analysis is to predict the future weight, or weight standard deviation score (SDS), of individual children. The response is weight (or weight SDS) measured at 7-8 years (42% missing). The covariates are weights (or weight SDS) measured at five time points from birth to one year (17% missing on average). One auxiliary variable, gestational age at birth, is also available. Imputation was carried out using chained equations (via the "norm.nob" method in the R package MICE). The paper discusses three strategies (inclusive, restrictive and non-overlapping) for building imputation and prediction models. In each case, the best multiple regression model for prediction was identified using the model selection criterion AICc (corrected AIC). A modified version of STACK (M-STACK) was used for model selection across multiply-imputed datasets. Non-Bayesian model averaging was also explored using AICc-based weights. The performance of M-STACK and model averaging was compared using mean square error of prediction (MSE(P)) in a 10% cross-validation test. M-STACK provided better prediction of response values than model averaging. The inclusive strategy for building imputation and prediction models was better than the restrictive and non-overlapping strategies in this study. The presence of highly correlated covariates and response is believed to have led to better prediction. It is concluded that M-STACK should be used with an inclusive model-building strategy when highly correlated covariates are available to make predictions in the presence of moderate amounts of missing data.
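The non-Bayesian model averaging described above can be sketched in a few lines. The sketch below (my own illustration, with hypothetical AICc scores, not the study's data) computes AICc for a fitted model and converts a set of AICc scores into model-averaging weights via the usual exp(-delta/2) transformation:

```python
import numpy as np

def aicc(log_lik, k, n):
    """Corrected AIC: AIC plus a small-sample penalty term."""
    return -2.0 * log_lik + 2.0 * k + (2.0 * k * (k + 1)) / (n - k - 1)

def akaike_weights(scores):
    """Turn AICc scores into model-averaging weights that sum to one."""
    scores = np.asarray(scores, dtype=float)
    delta = scores - scores.min()   # differences from the best model
    w = np.exp(-0.5 * delta)
    return w / w.sum()

# Hypothetical AICc scores for three candidate prediction models
weights = akaike_weights([210.3, 212.1, 218.9])
```

The model-averaged prediction is then the weighted sum of the individual models' predictions, with the best (lowest-AICc) model receiving the largest weight.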

(B) STATISTICAL MODELLING I - Room: LG 19, Chair: Mark Webster

Talk Time: 09:20-09:40

TRACKING THE MOVEMENT OF HYOID BONE DURING SWALLOWING Pengcheng Zeng Newcastle University

Using the trajectories of several bones during swallowing to assist diagnosis, i.e. to estimate or predict stroke patients' level of recovery, is a brand-new approach. The first step is data acquisition: tracking the movement of the hyoid bone and other bones in an x-ray video of swallowing. Identifying the position of some bones in some frames presents difficulties such as unclear visual boundaries and parts overlapping with other objects. In this presentation, we first use a template matching error method to address these problems, followed by calibration. Finally, we briefly introduce the registration and modelling.

Talk Time: 09:40-10:00

OPPONENT REFERENCE MODEL - A GENERALISED BRADLEY-TERRY MODEL. Anyi Zou University of Warwick

This talk is concerned with an extension of the Bradley-Terry model for pairwise comparisons to situations where the items involved in the comparisons are influential to, and affected by, their opponents. A new model, the Opponent Reference Model, is proposed and applied to model the outcomes of sporting contests with opponent-varying abilities. A new set of parameters, type parameters, is introduced to measure whether a team tends to give unexpected results as a "surpriser", to follow the predictions of the Bradley-Terry model as a "super-predictive" team, or not to be influenced by opponents as a "BT-perfect" team. In the proposed model, teams' abilities are assumed to depend on their type parameters and their opponents' abilities. The results of the 2009-2013 seasons of the English Premier League (EPL) are analysed using the Opponent Reference Model. The results suggest that the proposed model yields a reasonable representation of actual situations. The Opponent Reference Model presented in this report can also be applied to other pairwise comparison settings, such as citation exchange between scientific journals.

Talk Time: 10:00-10:20

INVESTIGATION OF DRUG ABUSE AMONG ADOLESCENTS IN UK: ASSOCIATIONS AND INTERDEPENDENCIES Ho Hin Henry Chan, Gareth Ridall and Deborah Costain Lancaster University

The primary aim of this study is to construct statistical models to explain the relationship between smoking, drinking and socio-demographic factors and drug abuse among adolescents, and also drug-drug interrelations. The main impediments of this study are the large proportion of missing values in the "Smoking, Drinking and Drug Use among Young People in England in 2010" survey, and the rarity of exposure to a number of drugs. Previous studies reported the survey findings; however, the missingness was not appropriately addressed (an MCAR assumption was made). In our research, we applied both hot-deck imputation and multiple imputation by chained equations, adopted backward elimination by Rubin's rule using a high-dimensional model formulation (13 drug-trying interrelations with 24 smoking, drinking and socio-demographic covariates), and adopted null-case imputation to address the rare-case problem. These approaches explain the associations more comprehensively and with a more complete data set, reflecting drug use associations more objectively and treating the missing data problem more rigorously.

Talk Time: 10:20-10:40

DEFINING AND MODELLING SOCIOECONOMIC STATUS FOR INDIVIDUAL TRAJECTORIES THROUGHOUT THE LIFE-COURSE Adrian Byrne, Mark Tranmer and Natalie Shlomo University of Manchester

Introduction: Research literature which explicitly examines the relationship between parental socioeconomic status (SES) and adult (child of parents) SES, and treats life-course SES as the main exposure of interest using birth cohort longitudinal data, is limited. This line of research is important as it investigates the degree of accord between the parent's and adult child's SES, and in doing so provides an opportunity to rigorously evaluate the implementation of life-course SES measurement empirically. This study will conduct secondary analysis of existing UK longitudinal birth cohort survey data to enhance understanding of the origins and outcomes of SES throughout the life-course.

Defining SES: Using well-established domains, which are commonly captured in the surveys under consideration, to define SES, this unobserved variable will be treated as a formatively causal composite response to observed ordinal variables of social capital, namely economic, social and cultural, and will be generated by ordinal Principal Component Analysis (Kolenikov and Angeles 2009).

Modelling SES: SES will be modelled throughout the life-course using linear quantile mixed effects models (Geraci 2014), as they extend regression for the mean to the analysis of the entire conditional distribution of SES, which is appropriate when dealing with non-normal outcome variables and broadens our substantive understanding of life-course SES beyond the conditional mean.

Contextual analysis: Contextual model covariates include socioeconomic background, gender and region of residence. Inter- and intra-generational SES growth curves will be compared across different UK birth cohort survey studies. Model trends will be assessed against historic UK economic indicator trends such as Gross Domestic Product and the total money supply. These comparisons are novel given that SES is normally used to analyse person-specific health, crime and education indicators, whereas in this instance the nationally representative life-course SES trends will be contrasted against the fortunes of the nation where the cohort members reside.

(C) STATISTICAL METHODS - Room: LG 15, Chair: Olga Egorova Talk Time: 09:20-09:40

LOG-LINEAR MODELSIN SPARSE CONTINGENCY TABLES Serveh Sharifi, Michail Papathomas and Ruth King University of St Andrews

Keywords: Log-linear models, Identifiability, Contingency Tables

Log-linear models are usually fitted to data with categorical observations arranged in a contingency table. When sampling zeros occur in a contingency table, identifiability problems may arise and some model parameters may become inestimable, while numerical methods in software usually still report estimates for them. The main aim of this work is to develop a general methodology to identify which model parameters, or linear combinations of parameters, are estimable in the presence of one or more sampling zeros in the table. The method is described for saturated 2^m and 3^m models, although the procedure is applicable to more general settings. After the inestimable parameters are eliminated from the set of possible parameters, a model selection method can be applied to search for the best model, based only on the estimable parameter set. The process is explained by an example.
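A tiny numerical illustration of the phenomenon (my own, not the authors' example): in a saturated model for a 2x2 table, the interaction parameter is the log odds ratio, and a single sampling zero sends its MLE to infinity, i.e. makes it inestimable, even though fitting software may print a large finite number:

```python
import math

def log_odds_ratio(table):
    """MLE of the log odds ratio (the 2x2 interaction parameter).
    A sampling zero drives the estimate to +/- infinity, so the
    parameter is not estimable from the observed table."""
    (n11, n12), (n21, n22) = table
    num, den = n11 * n22, n12 * n21
    if den == 0:
        return float("inf")
    if num == 0:
        return float("-inf")
    return math.log(num / den)

full_table = ((10, 5), (4, 8))   # all cells observed: finite estimate
zero_table = ((10, 0), (4, 8))   # one sampling zero: estimate diverges
```

Here the margins of `zero_table` remain estimable; only the interaction (and linear combinations involving it) is lost, which is the kind of distinction the proposed methodology formalises for larger tables.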

Talk Time: 09:40-10:00

A SMOOTH APPROXIMATION OF L1 LASSO PENALTY Hamed Haselimashhadi Brunel University

Nowadays, the L1 penalized likelihood has attracted a great deal of attention due to its simplicity and well-developed theoretical properties. It is known as a reliable method applicable in a broad range of settings, including high-dimensional cases. On the other hand, L1 penalized likelihoods, and lasso-like regularizations in particular, suffer from over-sparsity when the number of observations is too low. In this paper we propose a new differentiable approximation of the lasso that can produce the same results as the lasso, ridge and elastic net. We prove the theoretical properties of the proposed penalty and also study its computational complexity. Owing to its differentiability, the proposed method can be implemented by means of most of the convex optimization methods in the literature, and it gives a more accurate estimation of parameters compared to the lasso in situations where the true coefficients are close to zero. A simulation study, as well as the flexibility of the method, shows that the proposed approach is as good as the lasso and ridge and can be used in both situations.
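The abstract does not specify its approximation, but a standard smooth surrogate for the absolute value (shown here purely as an illustration of the idea, not the author's penalty) is sqrt(beta^2 + eps), which tends to |beta| as eps tends to zero and, unlike |beta|, has a well-defined gradient at zero:

```python
import numpy as np

def smooth_abs(beta, eps=1e-6):
    """Differentiable surrogate for |beta|: sqrt(beta^2 + eps)."""
    return np.sqrt(beta ** 2 + eps)

def smooth_abs_grad(beta, eps=1e-6):
    """Gradient of the surrogate; defined at beta = 0, unlike sign(beta)."""
    return beta / np.sqrt(beta ** 2 + eps)
```

Because the surrogate is differentiable everywhere, a penalised likelihood built from it can be handed to any smooth convex optimiser (gradient descent, BFGS, and so on) rather than requiring coordinate-descent machinery specialised for the non-smooth L1 penalty.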

Talk Time: 10:00-10:20

DIFFERENCES AND SIMILARITIES BETWEEN FISHER'S LINEAR DISCRIMINANT ANALYSIS (FDA) AND SUPPORT VECTOR MACHINES (SVM) Jude Chukwura Obi, John T. Kent and Peter Thwaites University of Leeds

Keywords: Fisher’s Linear Discriminant Analysis, Support Vector Machines

The relative merits of FDA over SVM, or vice versa, remain a bone of contention among statisticians and the machine learning community. This is largely because FDA is due to Fisher, R. A. (1936), a statistician, whereas the SVM is due to Vapnik, V. et al. (1995), from the machine learning community. We have used different hand-made datasets that depict different real-life classification problems to carry out the study. Our aim is to clearly draw out the differences and similarities between the two methods, and to further highlight features that make each classifier more appropriate for a given classification problem.

Talk Time: 10:20-10:40

PARTIAL RANKING ANALYSIS Tita Vanichbuncha and Martin S. Ridout University of Kent

Many models have been proposed to determine preferences among items given a partial ranking data set. Among them, the Bradley-Terry model is the most popular and is considered a standard model for analysing pairwise comparisons. However, its inability to handle multiple comparisons is a major limitation. Another popular model is the Plackett-Luce model. In this work, we investigate the rank-breaking technique, which breaks multiple comparisons down into pairwise comparisons so that the Bradley-Terry model can be used. We compare the existing R packages for the Plackett-Luce model (StatRank) and the Bradley-Terry model (BradleyTerry2) with our own code.
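A minimal sketch of the two steps described above (my own toy illustration, not the talk's code or either R package): rank-breaking turns each full ranking into pairwise wins, and the Bradley-Terry abilities are then fitted by the classical Zermelo/MM iteration:

```python
import itertools
import numpy as np

def break_rankings(rankings, n_items):
    """Rank-breaking: each ranking (best item first) contributes a
    pairwise win for every item over every item ranked below it."""
    wins = np.zeros((n_items, n_items))
    for ranking in rankings:
        for i, j in itertools.combinations(ranking, 2):
            wins[i, j] += 1
    return wins

def bradley_terry(wins, iters=200):
    """Zermelo/MM iterations for the Bradley-Terry ability parameters."""
    n = wins.shape[0]
    comps = wins + wins.T            # comparisons per pair
    p = np.ones(n)
    for _ in range(iters):
        new = np.empty(n)
        for i in range(n):
            denom = sum(comps[i, j] / (p[i] + p[j]) for j in range(n) if j != i)
            new[i] = wins[i].sum() / denom
        p = new / new.sum()          # normalise: abilities are scale-invariant
    return p

rankings = [(0, 1, 2), (0, 2, 1), (1, 0, 2)]   # three toy full rankings
p = bradley_terry(break_rankings(rankings, 3))
```

Item 0, which tops two of the three rankings, receives the largest fitted ability. Naive rank-breaking of this kind treats the induced pairs as independent, which is one of the statistical issues that motivates comparing it against a direct Plackett-Luce fit.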

(D) NON-PARAMETRIC METHODS - Room: LG 17, Chair: Nicholas Tawn Talk Time: 09:20-09:40

GREEDYYET RIGOROUS APPROACHTO NON-PARAMETRIC CLUSTERING Yordan Raykov, Alexis Boukouvalas and Max A. Little Aston University

Keywords: Dirichlet Process, Clustering, Fast inference

The Dirichlet process mixture (DPM) is a ubiquitous, flexible Bayesian nonparametric model. However, full probabilistic inference in this model is analytically intractable, so computationally intensive techniques such as Gibbs sampling are required. As a result, DPM-based methods, which have considerable potential, are restricted to applications in which computational resources and time for inference are plentiful. Here, we develop simplified yet statistically rigorous approximate maximum a-posteriori (MAP) inference algorithms for DPMs. This algorithm is as simple as K-means clustering and performs in experiments as well as Gibbs sampling, while requiring only a fraction of the computational effort. Unlike related small-variance asymptotics, our algorithm is non-degenerate and so inherits the "rich get richer" property of the Dirichlet process. It also retains a non-degenerate closed-form likelihood, which enables standard tools such as cross-validation to be used. This is a well-posed approximation to the MAP solution of the probabilistic DPM model. The simplicity of our approach allows us to easily derive inference schemes for more complex, composite models such as the Hierarchical Dirichlet process and the HDP-HMM.

Talk Time: 09:40-10:00

NON-PARAMETRIC REGRESSION OF DATA WITH CORRELATED NOISE Tianmiao Wang University of Bristol

Keywords: Taut-String, Multi-resolution Criterion, Correlated noise, Covariance structure

I consider the problem of nonparametric regression, with the aim of finding a simple approximation that fits given noisy data. The majority of methods in this field have been developed under the assumption that the noise is additive and can be adequately modelled as white noise. However, if the noise is correlated, they no longer work well. In this talk I focus on the taut string method of Davies and Kovac (2001). I adapt their multi-resolution criterion for data with stationary correlated noise. This requires level-dependent thresholds that depend on the variances of the multi-resolution coefficients on each level, and I present an automatic method for estimating these by considering the covariance structure of the noise. In a simulation study and a real data application I compare the performance of the new method with kernel and wavelet based methods for correlated data. The results indicate that the new method not only achieves a smaller mean squared error on several testbeds, but also provides simpler approximations that better capture the true features of the observed data.

Talk Time: 10:00-10:20

HOW PRIORS OF INITIAL HYPERPARAMETERS AFFECT THE GAUSSIAN PROCESS REGRESSION MODEL Zexun Chen, Alexander Gorban and Bo Wang University of Leicester

Keywords: Gaussian process regression, initial values, hyperparameters, local optima, maximum marginal likelihood, non-informative prior, data-dominated prior

Over the last few decades, Gaussian Process Regression (GPR) has been a widely-used method for non-linear regression problems, due to many desirable properties: the existence of an analytical form, the ease of obtaining and expressing uncertainty in predictions, the ability to capture a wide variety of behaviour through a simple parameterization, and a natural Bayesian interpretation. However, GPR is a kernel-based nonparametric model, so it relies on kernel selection over some undetermined hyperparameters. When a kernel is selected, the next task is to learn the undetermined hyperparameters from the training data. In the mainstream studies, the estimation of these hyperparameters is achieved by maximising the marginal likelihood by means of conjugate gradients. Unfortunately, marginal likelihood functions are not usually convex with respect to the hyperparameters, which means there may be local optima whereas the global optimum is required. A common strategy adopted by most researchers using Gaussian processes is to select a few initial hyperparameters at random or based on their expert opinions and experience. For example, Wilson (2014) used a specific prior distribution, based both on the data and on his experience, for the initialization of hyperparameters in the Spectral Mixture kernel. It is therefore essential to study how the priors of these initial values affect the performance of GPR. In this work we study the effect of the initial values of the hyperparameters for some specific kernels on the performance of GPR, using different prior distributions, both vague and data-dominated.
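The non-convexity at issue is easy to exhibit. The sketch below (a toy illustration of my own, using a squared-exponential kernel on simulated data, not the authors' setup) optimises the log marginal likelihood from two different initialisations, which may settle in different local optima:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = np.linspace(0.0, 5.0, 30)
y = np.sin(3.0 * X) + 0.1 * rng.standard_normal(30)

def neg_log_marglik(theta):
    """Negative log marginal likelihood of a zero-mean GP with an RBF
    kernel; theta = (log signal sd, log lengthscale, log noise sd)."""
    s, l, n = np.exp(theta)
    K = s ** 2 * np.exp(-0.5 * (X[:, None] - X[None, :]) ** 2 / l ** 2)
    K += (n ** 2 + 1e-8) * np.eye(len(X))   # jitter for numerical safety
    try:
        L = np.linalg.cholesky(K)
    except np.linalg.LinAlgError:
        return 1e10
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (0.5 * y @ alpha + np.log(np.diag(L)).sum()
            + 0.5 * len(X) * np.log(2.0 * np.pi))

# Two initialisations; non-convexity means they can reach different optima
starts = [np.zeros(3), np.array([0.0, 2.0, -2.0])]
fits = [minimize(neg_log_marglik, x0, method="Nelder-Mead") for x0 in starts]
```

A long-lengthscale start often converges to a "noise explains everything" optimum while a short-lengthscale start recovers the signal, which is exactly why the prior placed over the initial hyperparameters matters.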

Talk Time: 10:20-10:40

NON-PARAMETRIC ENTROPY ESTIMATION Tom Berrett

Differential entropy plays a fundamental role in information theory. It is often thought of as the average information content, or unpredictability, of a random vector. It is closely related to mutual information, which is a measure of dependence between two random vectors that is zero if and only if they are independent. As a consequence of this and other interesting properties of entropy, an estimate of the entropy is useful in many applications, such as constructing hypothesis tests for goodness of fit or independence between two random vectors, feature selection and graphical modelling. Many nonparametric estimators of the entropy from an i.i.d. sample have been proposed, often based on density estimation via kernel methods or histograms, though there are few theoretical results in the literature. In this talk I will focus on a generalisation of the estimator introduced by Kozachenko and Leonenko (1987), which is based on the nearest-neighbour distances of the sample, and will present some new results about its performance.
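For concreteness, the classical (first-nearest-neighbour) Kozachenko-Leonenko estimator, of which the talk's estimator is a generalisation, can be written in a few lines (a brute-force sketch of my own; practical implementations use k-d trees for the neighbour search):

```python
import numpy as np
from math import gamma, log, pi

def kl_entropy(x):
    """Kozachenko-Leonenko entropy estimate from 1-nearest-neighbour
    distances; x has shape (n, d)."""
    n, d = x.shape
    diff = x[:, None, :] - x[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)            # exclude each point itself
    rho = dists.min(axis=1)                    # 1-NN distance per point
    v_d = pi ** (d / 2) / gamma(d / 2 + 1)     # volume of the unit d-ball
    euler_gamma = 0.5772156649015329
    return d * np.log(rho).mean() + log(v_d) + log(n - 1) + euler_gamma

rng = np.random.default_rng(0)
est = kl_entropy(rng.standard_normal((1000, 1)))
# true entropy of N(0, 1) is 0.5 * log(2 * pi * e), about 1.4189
```

The estimator requires no bandwidth choice, unlike kernel or histogram approaches: each point's nearest-neighbour distance acts as a locally adaptive density estimate.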

12.1.2 Session 2: 11:20-12:40

(A) COMPUTATIONAL BAYESIAN TECHNIQUES - Room: LG 10, Chair: Alan Benson Talk Time: 11:20-11:40

APPROXIMATE BAYESIAN COMPUTATION FOR BROWNIAN MOTION Mark Webster, Jochen Voss and Stuart Barber University of Leeds

Keywords: Monte Carlo Methods, Approximate Bayesian Computation, ABC

Approximate Bayesian Computation is a naive Monte Carlo method, used when the likelihood is not known or not tractable. A basic version is to generate proposed values for the parameters, use those parameters to generate data, then accept each proposal if the data generated from it are close enough to those observed. For a one-dimensional parameter, n proposals (or accepted proposals) and a sufficient summary statistic of dimension q, the mean square error of the ABC estimate of the posterior expectation has been shown to have convergence rate n^(-4/(q+4)). This raises the question of whether there is an infinite-dimensional problem where the convergence does not vanish. We look at the case where the observation is a Brownian motion over [0, 1] with an unknown linear trend.
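The basic rejection version described above can be sketched directly (a toy normal-mean example of my own, not the talk's Brownian motion setting): propose from the prior, simulate data, and keep proposals whose summary statistic lands within a tolerance of the observed one.

```python
import numpy as np

rng = np.random.default_rng(1)
theta_true = 2.0
obs = rng.normal(theta_true, 1.0, size=50)    # "observed" data
s_obs = obs.mean()                             # sufficient summary statistic

def abc_rejection(n_prop=20000, eps=0.05):
    """Basic ABC: draw parameters from the prior, simulate a dataset
    from each draw, and keep draws whose summary is within eps of s_obs."""
    theta = rng.normal(0.0, 5.0, size=n_prop)             # prior proposals
    s_sim = rng.normal(theta, 1.0, size=(50, n_prop)).mean(axis=0)
    return theta[np.abs(s_sim - s_obs) < eps]

accepted = abc_rejection()
```

The accepted draws approximate the posterior, and shrinking `eps` trades acceptance rate against approximation error; the n^(-4/(q+4)) rate quantifies how this trade-off worsens as the summary dimension q grows.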

Talk Time: 11:40-12:00

EFFICIENT BAYESIAN INFERENCE FOR LARGE LINEAR GAUSSIAN MODELS Keith Newman Newcastle University

Keywords: Markov chain Monte Carlo, linear Gaussian models, block updating

A linear Gaussian structure can be used in a wide range of statistical models. When the model is analytically intractable, inference is often performed using Markov chain Monte Carlo (MCMC) methods such as the Gibbs sampler. Where there are large correlations between the variables, especially when there are many such variables, the MCMC may perform poorly. Fortunately, the performance can be improved by reparameterising the model, or by the use of blocking (or grouping) methods. Blocking comes with additional computational overheads, stemming from the expensive matrix operations that underpin the algorithms. We can exploit any sparsity in the model's Gaussian structure by working in terms of the precision matrix (the inverse of the covariance matrix) of the multivariate normal distribution, to negate the additional expenses of blocking methods. This talk discusses the methods and necessary considerations for performing blocking methods efficiently, with application to an ongoing investigation to identify genetic interactions between telomere defects and the deletion of non-essential genes in Saccharomyces cerevisiae (brewer's yeast) chromosomes. Kuo & Mallick variable selection methods are incorporated into the model to help identify interacting gene deletions, and also provide further opportunities to reduce the computational penalties created by blocking methods. A comparison between the efficiency of these blocking methods and an equivalent, simpler Gibbs sampler scheme is included.
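The core precision-matrix trick can be shown in a few lines (a generic sketch, not the talk's implementation; dense numpy stands in for the sparse Cholesky solvers used in practice): to draw from N(mu, Q^{-1}) one never inverts Q, only factorises it and solves a triangular system.

```python
import numpy as np

def sample_mvn_precision(mu, Q, rng):
    """Draw x ~ N(mu, Q^{-1}) without inverting Q.
    With Q = L L^T and z ~ N(0, I), solving L^T v = z gives
    Cov(v) = L^{-T} L^{-1} = Q^{-1}."""
    L = np.linalg.cholesky(Q)
    z = rng.standard_normal(len(mu))
    v = np.linalg.solve(L.T, z)   # a cheap triangular solve in practice
    return mu + v

# A sparse (tridiagonal) precision matrix, as arises for AR(1)-type structure
n = 4
Q = 2.0 * np.eye(n) - np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1)
rng = np.random.default_rng(0)
draws = np.array([sample_mvn_precision(np.zeros(n), Q, rng)
                  for _ in range(20000)])
```

When Q is sparse, its Cholesky factor is typically sparse too, so a full blocked update of a high-dimensional Gaussian costs far less than the dense cubic cost would suggest.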

Talk Time: 12:00-12:20

ON THE USE OF KULLBACK-LEIBLER DIVERGENCE AS A METRIC FOR ASYMPTOTICALLY EXACT APPROXIMATE BAYESIAN COMPUTATION Jamie Owen Newcastle University

Keywords: Approximate Bayesian computation, likelihood free, stochastic kinetic models, Markov processes

With the development of technologies such as flow cytometry, obtaining large amounts of data at the cellular level has become commonplace. Stochastic kinetic models are a particular type of Markov process, widely used to represent the dynamics of complex biological processes such as intra-cellular reactions. Unfortunately, models of this type pose a difficult inference problem due to the unavailability of the likelihood for all but the most trivial reaction networks, rendering standard Bayesian techniques of little use. The development of approximate Bayesian computation (ABC) techniques has provided avenues for posterior learning of model parameters in scenarios where no analytic likelihood is available. One of the difficulties that arises in employing these techniques is the choice of a metric: a function by which to measure the distance between observed and simulated data. At present, metrics are typically constructed on sets of summary statistics of the data, inducing a posterior distribution that is different from the exact target of interest. We motivate the use of f-divergences, in particular the Kullback-Leibler divergence, as a metric in ABC, highlighting its connection with exact posterior inference.

Talk Time: 12:20-12:40

PARALLEL TEMPERINGFOR MULTI-MODAL PROBLEMS Nicholas Tawn and Gareth Roberts University of Warwick

Keywords: Parallel Tempering, Simulated Tempering, MCMC

Bayesian inference typically requires MCMC methods to draw samples from the posterior; however, it is important that the MCMC procedure samples correctly from the distribution for the sample estimates to be valid. For instance, if the posterior distribution is multi-modal then, in any finite number of iterations, the chain can become trapped and fail to explore the entire state space. Well-known algorithms to aid mixing in multimodal settings are the Parallel and Simulated Tempering algorithms. The talk will introduce these, demonstrate their strengths and weaknesses for sampling in such situations, and then describe the way in which these algorithms can be set up to achieve optimal sampling efficiency. The key feature of these algorithms is the ability to share information from the mixing in the hotter states to aid the mixing of the chain in the colder states. This talk also presents a new approach, based on reparameterisation, that could potentially enhance the algorithms' efficiency. Empirical evidence is presented to show that this new algorithm appears to vastly enhance the exchange of mixing information between temperature levels when targeting certain posterior distributions.
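A minimal parallel tempering sketch (my own toy example on a bimodal one-dimensional target, not the talk's algorithm) makes the "hot chains help cold chains" mechanism concrete: each chain targets the posterior raised to an inverse temperature, and adjacent chains occasionally swap states.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_target(x):
    """Unnormalised bimodal target: equal mixture of N(-5, 1) and N(5, 1)."""
    return np.logaddexp(-0.5 * (x + 5.0) ** 2, -0.5 * (x - 5.0) ** 2)

def parallel_tempering(n_iter=5000, betas=(1.0, 0.3, 0.1)):
    """One random-walk chain per inverse temperature beta; adjacent
    chains propose state swaps, so jumps between modes made by the
    hottest (flattest) chain propagate down to the cold chain."""
    x = np.zeros(len(betas))
    cold = np.empty(n_iter)
    for t in range(n_iter):
        for k, beta in enumerate(betas):
            prop = x[k] + rng.normal(0.0, 2.0)
            if np.log(rng.random()) < beta * (log_target(prop) - log_target(x[k])):
                x[k] = prop
        k = rng.integers(len(betas) - 1)   # attempt one adjacent-pair swap
        log_a = (betas[k] - betas[k + 1]) * (log_target(x[k + 1]) - log_target(x[k]))
        if np.log(rng.random()) < log_a:
            x[k], x[k + 1] = x[k + 1], x[k]
        cold[t] = x[0]
    return cold

samples = parallel_tempering()
```

A single random-walk chain with this step size would typically stay trapped in one mode; with the tempering ladder, the cold chain's samples cover both modes.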

(B) CLINICAL TRIALS - Room: LG 19, Chair: Svetlana Cherlin

Talk Time: 11:20-11:40

DESIGNING RANDOMISED CONTROLLED TRIALS BASED ON INTERNAL PILOT TRIALS Amy Whitehead University of Sheffield

The sample size justification is an important consideration when planning a clinical trial, not only for the main trial but also for any preliminary pilot trial. When the outcome is a continuous variable, the sample size calculation requires an accurate estimate of the variance of the outcome measure. If the variance is inaccurately estimated, this can affect the power of the resulting trial: if the estimate is too high, more people than required will be included in the main trial; if it is too low, too few participants will be recruited to answer the research question. An internal pilot trial can be used to obtain an estimate of the variance that the investigators anticipate will be observed in the full main trial. An internal pilot allows estimation of the variance from the actual trial data, using an initial proportion of the main trial participants. After the internal pilot period of the trial, the sample size requirement is recalculated based on this new estimate of the variance, and the sample size of the full trial is adjusted appropriately. Pilot trials are by nature small and can estimate the variance imprecisely. Methods are available to adjust the variance estimate from a pilot trial to allow for this imprecision; the size of the adjustment depends on the size of the pilot trial. Increasing the size of the pilot trial can lead to a reduction in the sample size required in the full trial after the adjustment. However, eventually the increase in the pilot trial sample size will not be outweighed by the subsequent decrease in the sample size for the full trial. This presentation will look at how using an internal pilot trial affects the power of a trial, and what sample size is necessary to inform the sample size recalculation.
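The recalculation step rests on the standard two-arm sample size formula for a continuous outcome, which can be sketched as follows (my own illustration with hypothetical numbers, not the talk's adjustment method):

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(sigma, delta, alpha=0.05, power=0.9):
    """Standard per-arm sample size for comparing two means:
    n = 2 * ((z_{1-alpha/2} + z_{power}) * sigma / delta)^2."""
    z = NormalDist().inv_cdf
    return ceil(2.0 * ((z(1 - alpha / 2) + z(power)) * sigma / delta) ** 2)

# Planned with an assumed SD of 1.0 for a target difference of 0.5;
# the internal pilot then estimates the SD as 1.3, so the requirement
# is recalculated upwards
planned = n_per_arm(sigma=1.0, delta=0.5)        # 85 per arm
recalculated = n_per_arm(sigma=1.3, delta=0.5)   # 143 per arm
```

Because the requirement scales with the variance (sigma squared), even a modest underestimate of the standard deviation at the planning stage translates into a substantial shortfall in recruitment, which is exactly what the interim recalculation corrects.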

Talk Time: 11:40-12:00

APPROACHESTO CALIBRATION IN SURVEY SAMPLING Gareth Davies, Jonathan Gillard, Anatoly Zhigljavsky Cardiff University

Keywords: Calibration, survey sampling, optimization, official statistics

The technique of calibrating sample surveys seeks to assign optimal weights to members of a sample in order to form good estimates of population statistics. This talk will outline various calibration methods adopted by statistical offices throughout the world. The calibration procedure can be motivated as a constrained optimization problem, but the constraints may lead to negative or extreme weights. Several approaches to overcoming these issues will be described, including the use of range restrictions, alternative calibration functions, and relaxation of the constraints. The talk will conclude with an application to estimating the UK unemployment rate from the Labour Force Survey.
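The constrained optimization view has a closed form in the simplest (linear, GREG-type) case, sketched below with a hypothetical four-unit sample (my own illustration, not any statistical office's implementation): minimise a chi-square distance from the design weights subject to the weighted sample reproducing known population totals.

```python
import numpy as np

def linear_calibration(d, X, totals):
    """GREG-type linear calibration: minimise sum_i (w_i - d_i)^2 / d_i
    subject to X^T w = totals. Lagrange multipliers give the closed
    form w = d * (1 + X lambda)."""
    d = np.asarray(d, dtype=float)
    X = np.asarray(X, dtype=float)
    M = X.T @ (d[:, None] * X)                 # weighted auxiliary moments
    lam = np.linalg.solve(M, totals - X.T @ d)
    return d * (1.0 + X @ lam)

# Hypothetical sample of 4 units with design weights 10; the constant
# column calibrates to a population size of 50, the second column to a
# known auxiliary total of 130
d = np.array([10.0, 10.0, 10.0, 10.0])
X = np.column_stack([np.ones(4), [1.0, 2.0, 3.0, 4.0]])
w = linear_calibration(d, X, totals=np.array([50.0, 130.0]))
```

The calibrated weights satisfy the benchmark constraints exactly, but nothing in the linear solution keeps them positive or bounded, which is precisely why range restrictions and alternative calibration functions are needed in practice.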

Talk Time: 12:00-12:20

ADAPTIVE ENRICHMENT DESIGNSFOR CLINICAL TRIALS Tom Burnett University of Bath

Keywords: Clinical trials, Bayes optimal decisions, Utility functions

Randomised controlled trials are a well-established method for the assessment of new treatments. Typically, trials are conducted in the population for which the treatment is intended. However, it may be possible to identify sub-populations where we expect a larger benefit. With promising sub-populations defined, the question is how to make best use of them. Adaptive enrichment provides the option to test either multiple hypotheses or a single hypothesis; for example, we may test in both the full and sub-populations, or only in the sub-population. Initially, patients are recruited from the entire population; then, at an interim analysis, it is decided which of the hypotheses will be tested, adapting the recruitment strategy accordingly. Care must be taken to ensure protection of the type 1 error rate, since the trial has multiple hypotheses and stages. At the interim analysis we aim to make the best possible decision: by defining a gain function we capture the importance of the possible outcomes. We want a decision rule that maximises the expected gain given our prior distribution for the true treatment effects and the first stage data; the Bayes optimal decision rule achieves this. The gain function can be of further use. For any particular trial we may find the expected gain given our prior. This allows us to make comparisons with fixed trial designs that might be used in place of an adaptive enrichment trial, showing the circumstances in which adaptive enrichment provides an overall benefit.

Talk Time: 12:20-12:40

MISSING CONTINUOUS OUTCOMES UNDER COVARIATE DEPENDENT MISSINGNESS IN CLUSTER RANDOMISED TRIALS Md Anower Hossain London School of Hygiene and Tropical Medicine

Keywords: Cluster randomised trials, covariate dependent missingness

Attrition is a common occurrence, which leads to missing data, in cluster randomised trials (CRTs), which are characterised by randomisation at the cluster level to intervention group and control group. We investigate the performance of unadjusted and standard ad- justed cluster-level analyses, under the assumption of covariate dependent missingness (CDM), in terms of bias, empirical standard error and coverage probability, using records with observed outcomes. The conditions are derived under which both these analyses methods are unbiased. A simulation study was conducted where the number of clusters in each group, the cluster size, the interclass correlation coefficient and the correlation be- tween outcome and baseline covariate were allowed to vary in the data generation process. The missing values were imposed in the data according to a logistic regression model. Both

methods are unbiased when the effect of the baseline covariate on outcome is the same in the two groups and the missingness mechanism is the same in the two groups. When the missingness mechanism differs, the unadjusted analysis is biased, but the adjusted analysis remains unbiased provided the effect of the baseline covariate on outcome is the same in the two groups. Both methods are biased when the effect of the baseline covariate on outcome differs between the two groups. We developed an estimator which is unbiased even when both the missingness mechanism and the effect of the baseline covariate on outcome differ between the two groups. The adjusted analysis with the proposed estimator gives an unbiased estimate of the intervention effect in this setting, which is quite common in practice.

(C) SPATIO-TEMPORAL MODELS WITH ENVIRONMENTAL APPLICATIONS - Room: LG 15, Chair: Abla Azalekor

Talk Time: 11:20-11:40

SPATIO-TEMPORAL DATA FUSION OF REMOTE-SENSING AND IN-LAKE CHLOROPHYLL a DATA USING STATISTICAL DOWNSCALING Craig J. Wilkie1, Marian Scott1, Claire Miller1, Andrew N. Tyler2, Peter D. Hunter2 and Evangelos Spyrakos2 1 University of Glasgow, 2 University of Stirling

Chlorophyll a is a green pigment used as an indirect measure of lake water quality. Its strong absorption of blue and red light allows for quantification through satellite images, providing better spatial coverage than traditional in-lake point samples. However, grid-cell scale imagery must be calibrated using in-lake point samples, presenting a change-of-support problem. This talk will present a method of statistical downscaling, namely a Bayesian spatially- and temporally-varying coefficient regression, which assimilates remotely-sensed and in-lake data, resulting in a fully calibrated spatial map of chlorophyll a at each timepoint. The model is applied to a case study dataset from Lake Balaton, Hungary. Based on Berrocal et al. (2010), model (1) is a Bayesian spatially- and temporally-varying coefficient regression:

Y(s, t) = α(s, t) + β(s, t) x(B, t) + ε(s, t),   (1)

where ε(s, t) ~ N(0, σ²) independently, Y(s, t) is the response (in-situ data at location s at time t), and x(B, t) is the explanatory variable (remotely-sensed data at the grid cell B containing s at time t). The spatially-varying coefficients are

α(s, t) ~ N_N(μ_α(t), Σ_α),   β(s, t) ~ N_N(μ_β(t), Σ_β),

and the temporally-varying coefficients are

μ_α(t) ~ N_T(0, Π_α),   μ_β(t) ~ N_T(0, Π_β),

where N and T are the numbers of spatial locations and timepoints in the data, respectively. The spatial and temporal covariance matrices are

Σ_α = (1/τ_α) exp(−φ_α D),   Σ_β = (1/τ_β) exp(−φ_β D),
Π_α = (1/λ_α) exp(−ψ_α T),   Π_β = (1/λ_β) exp(−ψ_β T),

where τ_α, τ_β, λ_α and λ_β are the spatial and temporal precisions, φ_α, φ_β, ψ_α and ψ_β are the spatial and temporal decay parameters, and D and T are matrices of distances and times between observations, respectively. Priors are chosen to be noninformative:

γ (= 1/σ²), τ_α, τ_β, λ_α, λ_β ~ Gamma(0.001, 0.001),   φ_α, φ_β, ψ_α, ψ_β ~ Unif(0.05, 20).

The model was fitted in JAGS via R to a dataset with 7616 cells of remotely-sensed data and 9 in-situ locations, resulting in a fully calibrated spatial map of chlorophyll a for each timepoint, with associated uncertainty measures, assimilating in-situ and remotely-sensed data. The model performs well in comparison to competing models for this application. The application here is novel in that the model is used to calibrate chlorophyll a data.
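As a rough illustration of the separable exponential covariance structures in model (1), the following numpy sketch builds Σ and Π matrices from distance and time-gap matrices. The locations, decay and precision values are hypothetical, not the case-study settings:

```python
import numpy as np

def exp_cov(dist, decay, precision):
    """Exponential covariance: (1/precision) * exp(-decay * dist),
    where `dist` is a matrix of pairwise distances (or time gaps)."""
    return np.exp(-decay * dist) / precision

# Toy example: 3 in-situ locations on a line, 4 timepoints (illustrative only).
locs = np.array([0.0, 1.0, 3.0])
D = np.abs(locs[:, None] - locs[None, :])    # spatial distance matrix
times = np.arange(4.0)
T = np.abs(times[:, None] - times[None, :])  # temporal gap matrix

Sigma_alpha = exp_cov(D, decay=0.5, precision=2.0)  # spatial covariance of alpha
Pi_alpha = exp_cov(T, decay=0.1, precision=1.0)     # temporal covariance of mu_alpha
```

Covariance decays with distance and is scaled by the inverse precision, matching the (1/τ) exp(−φD) form above.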

Talk Time: 11:40-12:00

CHARACTERISATION OF NEAR-EARTH MAGNETIC FIELD DATA FOR SPACE WEATHER MONITORING Qingying Shu1, Lyndsay Fletcher1, Matteo Ceriotti1, Marian Scott1 and Peter Craigmile2 1 University of Glasgow, 2 The Ohio State University

Space weather monitoring and early storm detection can be used to mitigate risk in sensitive technological systems such as computer systems or telecommunications. Using data from the Cluster satellites, we examine the nature of observations of the magnetic field in space, with the ultimate goal of designing a network of monitoring stations to detect storm events. This necessitates the development of spatio-temporal statistical models that assimilate information from computer models of the magnetic field as well as the observations. With the aim of better characterising the electromagnetic environment around the Earth, we develop spatio-temporal statistical models of the near-Earth magnetic field utilising in-situ magnetic field data and physical model outputs. Time-series regression analysis has been performed, and initial results show that the magnetic field data in storm periods

exhibit complex and rapid changes as the satellites orbit into different regions of the magnetosphere, and that there is strong temporal residual autocorrelation. GARCH (generalised autoregressive conditional heteroscedastic) models are used to capture the stochastic model components due to the non-stationary properties of the time series. This talk will focus on a time-series calibration problem: how the satellite observations, together with the physical model outputs, calibrate the real magnetic field under different storm conditions. It will also present the results of our continuing analysis and modelling of magnetic field data under non-storm conditions.
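For readers unfamiliar with GARCH, a minimal simulation sketch of a GARCH(1,1) process follows. The parameter values are illustrative assumptions, not those fitted to the magnetic field data:

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate_garch(n, omega=0.1, alpha=0.1, beta=0.8):
    """Simulate GARCH(1,1): sigma2_t = omega + alpha*e_{t-1}^2 + beta*sigma2_{t-1}."""
    eps = np.empty(n)
    sigma2 = omega / (1 - alpha - beta)  # start at the unconditional variance
    for t in range(n):
        eps[t] = rng.normal(0.0, np.sqrt(sigma2))
        sigma2 = omega + alpha * eps[t] ** 2 + beta * sigma2
    return eps

e = simulate_garch(50_000)
# Unconditional variance is omega / (1 - alpha - beta) = 1.0 here.
sample_var = e.var()
```

The conditional variance reacts to recent shocks, producing the volatility clustering that plain regression residuals cannot capture.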

Talk Time: 12:00-12:20

TIME SERIES WITH MIXED DISTRIBUTIONS AND AN APPLICATION TO DAILY RAINFALL Muhammad Safwan Ibrahim Newcastle University

Keywords: Time series, Bayesian, Mixture model, Mixed distribution

A mixture model can be used to represent two or more sub-populations. A special case is when one of the sub-populations has a degenerate (or discrete) distribution and another has a continuous distribution. This leads to a mixed distribution. For example, daily rainfall data contain zero and positive values. We investigate models for time series and for spatio-temporal data when the distribution is mixed and develop Bayesian methods for analysis and forecasting.
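A minimal sketch of such a mixed distribution, assuming a Bernoulli wet/dry indicator and Gamma wet-day amounts (a common formulation for daily rainfall, not necessarily the model developed in this work):

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_rainfall(n, p_wet=0.3, shape=2.0, scale=4.0):
    """Simulate n days of rainfall from a mixed distribution:
    a point mass at zero (dry days) mixed with a Gamma for wet-day amounts."""
    wet = rng.random(n) < p_wet
    amounts = rng.gamma(shape, scale, size=n)
    return np.where(wet, amounts, 0.0)

rain = simulate_rainfall(10_000)
prop_dry = np.mean(rain == 0)      # close to 1 - p_wet = 0.7
mean_wet = rain[rain > 0].mean()   # close to shape * scale = 8
```

The degenerate component carries the exact zeros, while the continuous component models positive rainfall, exactly the mixed-distribution structure described above.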

Talk Time: 12:20-12:40

QUANTIFICATION OF OVERALL AIR QUALITY IN SPACE AND TIME AND ITS EFFECTS ON HEALTH Guowen Huang University of Glasgow

The long-term health effects of air pollution are often estimated using a spatio-temporal ecological study, where the disease data take the form of counts of hospital admissions or mortalities from populations living in non-overlapping administrative units within yearly time intervals. One of the major challenges in such studies is to estimate spatially representative pollution concentrations for each areal unit and time period. I focus on how to improve the prediction of pollution concentrations by fusing multiple sources of pollution data, and then investigate the health effects.

(D) ECOLOGICAL STATISTICS - Room: LG 17, Chair: Colleen Nooney

Talk Time: 11:20-11:40

MODELLING ENCOUNTERS IN POPULATION ABUNDANCE SURVEYS Richard Glennie, Stephen Buckland and Roland Langrock University of St Andrews

Keywords: Statistical Ecology, Population Abundance, Hidden Markov Model

Estimating the density of a biological population over a study region is a common challenge in statistical ecology. Data collected during population abundance surveys often depend on encountering members of the target population. Such encounters rely upon the movement of the individuals involved and on the probability of observers detecting individuals. Despite this, the most popular statistical methods for estimating abundance assume the target population is immobile or that individuals move in an unrealistic way during the survey. In this talk, we briefly introduce the most popular methods used in abundance estimation (distance sampling, capture-recapture, random encounter models) and discuss their shortcomings. We then present a flexible statistical framework in which the particular movement characteristics of a target population can be incorporated along with the uncertainty of detecting individuals during a survey. We show how animal tagging data or expert knowledge of a species can be included in this framework. Furthermore, we illustrate the use of spatial hidden Markov models to obtain maximum likelihood estimates and discuss the computational difficulties of these models.

Talk Time: 11:40-12:00

CAPTURE-RECAPTURE ESTIMATION WHEN SOME OF THE OBSERVED CLASSES ARE MISCLASSIFIED Panicha Kaskasamkul and Dankmar Böhning University of Southampton

Keywords: Capture-recapture, geometric model, Poisson heterogeneity, zero-truncated model, one-truncated model

Estimating population size and biodiversity is a topic of current interest, but analysis remains a challenge because some real situations are not consistent with basic assumptions. Current approaches usually assume that the observed data are not in error. However, this assumption appears doubtful in some applications, which can cause the problem of misclassification. One consequence is that approaches such as Chao's lower bound, motivated by unobserved heterogeneity, can overestimate severely even though they are seemingly adjusted for underestimation. For the purpose of this talk, we will focus on the use of capture-recapture models that estimate the target population size while taking into account potential misclassification. Such a model is provided by the geometric distribution in a beneficial way: the geometric arises as the mixture of a Poisson distribution with an Exponential distribution, hence taking into account at least some of the potentially available heterogeneity. We assume that classes counted only once may be more easily overlooked than classes counted many times. Therefore, truncation starts at the lower frequency counts, with singletons, then doubletons, and potentially more if appropriate. This truncation process is particularly easy in the case of the geometric, since lower-frequency-count truncation results in a geometric distribution again. Based on this model, an inferential approach for estimating biodiversity and population size under misclassification is presented.
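The stated Poisson-Exponential mixture representation of the geometric can be checked directly. If K | λ ~ Poisson(λ) and λ ~ Exponential(θ), then:

```latex
P(K = k) = \int_0^\infty \frac{e^{-\lambda}\lambda^k}{k!}\,\theta e^{-\theta\lambda}\,\mathrm{d}\lambda
         = \frac{\theta}{k!}\cdot\frac{k!}{(1+\theta)^{k+1}}
         = \frac{\theta}{1+\theta}\left(\frac{1}{1+\theta}\right)^{k},
\qquad k = 0, 1, 2, \ldots
```

which is a geometric distribution with success probability p = θ/(1+θ); Exponential heterogeneity in the Poisson rate therefore yields exactly the geometric model.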

Talk Time: 12:00-12:20

BAYESIAN EMULATION AND HISTORY MATCHING WITH APPLICATION TO BIOLOGICAL PLANT MODELS Samuel Jackson Durham University

Many processes in our world are represented in the form of complex simulator models. These models frequently take large amounts of time to run. Emulators are statistical approximations of these simulators that make predictions, along with corresponding uncertainty estimates, of what the simulator would produce. The main advantage of emulators is the speed at which they run, which, in general, is many orders of magnitude faster than the simulators they aim to approximate. This increase in speed allows for fully comprehensive model analysis and full exploration of the input parameter space. Emulation can be used in any area of science that represents real-world systems in the form of complex models. Examples include climate models, biological plant models, galaxy formation models and epidemic-spread models. The example on which I focus is a biological plant model of the interaction of the POLARIS gene function, PIN proteins and hormonal crosstalk in Arabidopsis root development. I will, in particular, look at the application of Bayesian history matching, a technique which facilitates efficient parameter inference, within this context.

Talk Time: 12:20-12:40

THE ‘STEP AND TURN’ ANIMAL MOVEMENT MODEL IN CONTINUOUS-TIME Alison Parton and Paul Blackwell University of Sheffield

Understanding how wildlife paths evolve has the potential to affect a range of areas within ecology. Yet the questions of why, when and where animals move remain unanswered in many areas of ecology. Wildlife movement paths are often complicated and difficult to interpret; they regularly appear to exhibit ‘randomness’ and arise from a complex mixture of internal behavioural states, physiological constraints and memory-related processes. Due to their intuitiveness and accessibility to non-statisticians, the group of statistical animal movement models with the most widespread use by ecologists are those based on observed ‘turning angles’ and ‘step lengths’, derived from an animal's observed locations over time. However, these models are set in discrete time, in which the animal's location is only defined at pre-determined discrete time-points. This talk will

introduce this popular range of models and demonstrate how they can be applied to observed animal movement data. It will discuss why there is a need for movement models formulated in continuous time rather than the more widespread discrete-time models, before introducing current work on the development of an ‘equivalent’ continuous-time version of the ‘step and turn’ model.

12.1.3 Session 3: 14:00-15:20

(A) SEQUENTIAL MONTE CARLO METHODS - Room: LG 10, Chair: Min Wang

Talk Time: 14:00-14:20

ADAPTIVE SUMMARY STATISTIC SELECTION WITHIN A SEQUENTIAL MONTE CARLO ALGORITHM FOR APPROXIMATE BAYESIAN COMPUTATION Sophie Watson University of Bristol

Sequential Monte Carlo (SMC) algorithms for Approximate Bayesian Computation (ABC) have been shown to improve the accuracy of inference compared to standard rejection ABC. The reason for this improvement is that standard rejection ABC proposes particles from the prior, whereas the proposal distribution in SMC approaches the posterior. However, as we demonstrate here, standard SMC-ABC methods are still very sensitive to the dimensionality of the summary statistics. In addition, such methods can be expensive to implement as they require many simulations from the model; often a large proportion of simulations are quickly ‘rejected’ and subsequently ignored. SMC-ABC methods are also affected by the choice of summary statistics: summarising the data poorly results in worse inference. We present a novel algorithm which uses knowledge gained from all historical simulations from the model to select summary statistics for the next iteration of SMC. We show that our algorithm performs favourably on a toy model in which summary statistics have been deliberately poorly chosen, and obtains estimates which are remarkably close to those which could be obtained using sufficient statistics.
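To fix ideas, here is a minimal rejection-ABC sketch, the baseline the abstract compares against, on a toy normal-mean model. All settings (model, prior, tolerance) are illustrative assumptions, not the algorithm proposed in the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: data ~ N(theta, 1); prior theta ~ N(0, 5^2).
observed = rng.normal(2.0, 1.0, size=50)
s_obs = observed.mean()  # a (here sufficient) summary statistic

def rejection_abc(n_particles, epsilon):
    """Keep prior draws whose simulated summary lies within epsilon of s_obs."""
    accepted = []
    while len(accepted) < n_particles:
        theta = rng.normal(0.0, 5.0)               # propose from the prior
        sim = rng.normal(theta, 1.0, size=50)      # simulate from the model
        if abs(sim.mean() - s_obs) < epsilon:      # rejection step
            accepted.append(theta)
    return np.array(accepted)

posterior_sample = rejection_abc(n_particles=200, epsilon=0.1)
```

Because proposals come from the prior, most simulations are rejected, which is exactly the inefficiency that motivates the SMC-ABC refinements discussed above.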

Talk Time: 14:20-14:40

SIMULATING QUASI-STATIONARY DISTRIBUTIONS ON REDUCIBLE STATE SPACES Adam Griffin, Gareth Roberts, Simon Spencer and Paul Jenkins University of Warwick

Keywords: Quasi-Stationary Distributions, Sequential Monte Carlo, Bootstrap Filters

Most stochastic epidemic models have only degenerate stationary distributions, which represent “no infection”. We study quasi-stationary distributions (QSDs) linked to these processes by conditioning on the epidemic not having died out. Furthermore, in multi-type processes we encounter reducible state spaces, which cause problems with simulation. To tackle these, this talk will outline some SMC sampler methods and resampling techniques, Combine-Split Resampling and Regional Resampling, developed to approximate QSDs through simulation. These will be demonstrated through application to stochastic epidemic models.
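One standard simulation idea for QSDs on an irreducible state space is a Fleming-Viot style particle system: run many copies of the chain and restart any absorbed copy at the state of a surviving one. This is a hedged sketch of that generic idea for a toy SIS chain, not the Combine-Split or Regional Resampling schemes of the talk; all parameter values are assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Discrete-time SIS chain on {0, 1, ..., N_pop}; state 0 ("no infection") is absorbing.
N_pop, beta, gamma, dt = 20, 1.5, 1.0, 0.02

def step(i):
    """One small-time-step transition of the SIS chain from state i."""
    if i == 0:
        return 0
    u = rng.random()
    p_up = beta * i * (N_pop - i) / N_pop * dt   # new infection
    p_down = gamma * i * dt                      # recovery
    if u < p_up:
        return i + 1
    if u < p_up + p_down:
        return i - 1
    return i

# Fleming-Viot style particle system: absorbed particles restart at the state of a
# uniformly chosen surviving particle (assumes at least one particle survives).
n_particles, n_steps = 300, 1500
particles = np.full(n_particles, 5)
for _ in range(n_steps):
    particles = np.array([step(i) for i in particles])
    dead = particles == 0
    if dead.any():
        particles[dead] = rng.choice(particles[~dead], size=dead.sum())

qsd_mean = particles.mean()  # approximate mean of the quasi-stationary distribution
```

The empirical distribution of the particles approximates the QSD; reducible state spaces break the uniform-restart step, which is what motivates the specialised resampling schemes above.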

Talk Time: 14:40-15:00

A LOOK-AHEAD APPROACH TO SEQUENTIAL MONTE CARLO Pieralberto Guarniero University of Warwick

I will briefly illustrate the use of look-ahead functions and look-ahead SMC algorithms in a hidden Markov model setting, and present an original iterative look-ahead particle filter scheme based on subsequent waves of particles that gradually improve their path-exploration efficiency. The algorithm, which may start with no information at all regarding the look-ahead functions, proceeds in a forwards/backwards iterative fashion: it estimates the look-ahead functions, gradually improves the precision of those estimates, and uses them to obtain estimates for the model that are potentially improved at each iteration. Some simulation results from the algorithm's implementation, showing promising potential especially in high-dimensional settings, will be included.

Talk Time: 15:00-15:20

ONLINE STATE AND PARAMETER ESTIMATION IN DYNAMIC GENERALISED LINEAR MODELS Rui Vieira Newcastle University

Keywords: Bayesian inference, State space models, Sequential Monte Carlo, Estimation algorithms, Dynamic generalised linear models, Big data

With the advent of Big Data, inference in very large datasets is becoming increasingly common. Furthermore, business models and operational strategies increasingly demand online, real-time feedback solutions in contrast with standard batch methods. This involves performing inference on streaming data even before it is persisted. A flexible and elegant way of performing this task is by modelling data as a Dynamic Generalised Linear Model (DGLM), fundamentally a State-Space Model (SSM) which combines a linear state transition model with an observational model following a distribution from the exponential family. Here we will explain the basics of fully Bayesian online and sequential state estimation using sequential Monte Carlo (SMC) methods, and how this class of algorithms can be implemented to perform state and parameter estimation simultaneously. The merits and shortcomings of these methods will also be discussed.
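As a concrete illustration of SMC state estimation in a DGLM, here is a bootstrap particle filter for a toy Poisson DGLM with a random-walk state on the log-rate. The state variance W and all settings are illustrative assumptions, and parameter learning (a key topic of the talk) is omitted:

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulate a Poisson DGLM: random-walk state theta_t, y_t ~ Poisson(exp(theta_t)).
T, W = 100, 0.01
theta = np.cumsum(rng.normal(0.0, np.sqrt(W), T)) + 1.0
y = rng.poisson(np.exp(theta))

# Bootstrap particle filter for the state (W assumed known here).
n_part = 1000
particles = rng.normal(1.0, 0.5, n_part)
filtered_means = []
for t in range(T):
    particles = particles + rng.normal(0.0, np.sqrt(W), n_part)  # propagate
    log_w = y[t] * particles - np.exp(particles)   # Poisson log-likelihood (up to a constant)
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    filtered_means.append(np.sum(w * particles))   # weighted filtered mean
    idx = rng.choice(n_part, size=n_part, p=w)     # multinomial resampling
    particles = particles[idx]

rmse = np.sqrt(np.mean((np.array(filtered_means) - theta) ** 2))
```

Each observation arrives, reweights and resamples the particle cloud, and is then discarded, which is what makes the approach suitable for streaming data.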

(B) BIOINFORMATICS - Room: LG 19, Chair: Colleen Nooney

Talk Time: 14:00-14:20

IDENTIFICATION OF DIFFERENTIALLY METHYLATED REGIONS IN BISULPHITE SEQUENCING DATA USING BAYESIAN MODELS Tusharkanti Ghosh, Mayetri Gupta, John Cole, Neil Robertson and Peter Adams University of Glasgow

Keywords: Hidden Markov models, Bisulfite-sequencing data

DNA methylation plays an important role in many biological processes such as gene expression and cellular proliferation. Aberrant DNA methylation patterns (hyper-methylation and hypo-methylation, compared to normal cells) have been associated with a large number of human malignancies and potential cancer symptoms. Many studies have explored complex age-related methylation changes, but their functional significance has remained unclear. Bisulphite sequencing (BS-seq) is currently the gold standard for experimentally measuring genome-wide DNA methylation. Our first objective is to identify differential patterns of DNA methylation between proliferating (normal) and senescent (ageing) cells. We propose a novel Bayesian latent variable model that can identify Differentially Methylated Regions (DMRs) in data obtained by BS-seq. The predicted DMRs can help in understanding the phenotypic changes associated with human ageing. We also provide a data-augmentation algorithm for Hierarchical Hidden Markov Models (HHMMs) that can simultaneously predict differentially methylated regions and a number of other parameters of biological interest.

Talk Time: 14:20-14:40

POPULATION STRUCTURE IN GENETIC ASSOCIATION STUDIES Alasdair McIntosh University of Glasgow

The human genome contains 3 billion base pairs' worth of genetic information. Historically, as people migrated across the world, groups of humans became isolated from each other over long periods of time and their genetic makeup diverged, or drifted. As a result of this historic drift, when modern-day samples of humans contain such clusters or subpopulations, there is a risk of incorrect inferences being made about the association of genes with risk of disease. One possible approach to this problem is to quantify the amount of genetic drift that has occurred between such subpopulations within a phylogenetic hierarchy. This talk describes these problems and introduces two methods of quantifying such drift: the first developed by Balding and Nichols, and one developed more recently by Nicholson, Donnelly and others.

Talk Time: 14:40-15:00

INFERENCE OF TRANSMISSION TREES FOR EPIDEMICS USING WHOLE GENOME SEQUENCE DATA Rosanna Cassidy University of Nottingham

We discuss the inclusion of whole genome sequence data in models of disease transmission, with a particular focus on the spread of MRSA in hospital wards. We attempt to highlight some of the common limiting assumptions made in previous models for the genetic differences between patients in the data, to illustrate the impact that these assumptions have, and to develop new models. We present the results of running these new models in an MCMC routine.

Talk Time: 15:00-15:20

INFERRING ROOTED PHYLOGENIES VIA NON-REVERSIBLE SUBSTITUTION MODELS Svetlana Cherlin, Tom Nye, Tom Williams, Sarah Heaps, Richard Boys and Martin Embley Newcastle University

Most phylogenetic models assume that the evolutionary process is reversible. This means that the root of a phylogenetic tree cannot be inferred as part of the analysis, because the likelihood of the data does not depend on the position of the root. Yet defining the root of a phylogenetic tree is a key component of phylogenetic inference, because it provides a point of reference for polarising ancestor/descendant relationships and therefore for interpreting the tree. In this talk I present two related non-reversible models and their use for inferring the root of a phylogenetic tree. The non-reversibility of both models is achieved by extending a standard reversible model to allow a non-reversible perturbation of the instantaneous rates of substitution between the nucleotides. This perturbation makes the

likelihood dependent on the position of the root, enabling inference about the root directly from the sequence alignment. The two models differ in the way the underlying reversible model is perturbed. The inference is performed in a Bayesian framework using Markov chain Monte Carlo. The performance of the models is illustrated via analyses of simulated data sets. The models are also applied to a real biological dataset for which there is robust biological opinion about the root position: the radiation of palaeopolyploid yeasts following a whole-genome duplication. The results suggest that non-reversibility is a suitable feature for learning about root position and that non-reversible models can be useful for inferring the root position from real biological data sets.

(C) STOCHASTIC EPIDEMIC MODELS - Room: LG 15, Chair: Riccardo Rastelli

Talk Time: 14:00-14:20

INFERENCE FOR STOCHASTIC SIR HOUSEHOLD EPIDEMIC MODEL WITH MISCLASSIFICATION Umar Mallam Abdulkarim University of Kent

Keywords: Classification error, False negative probability, False positive probability, final size epidemic data

Misclassification of an individual's health state often occurs when infectives are wrongly recorded as susceptibles, or susceptibles are recorded as infectives; these are referred to as false negative and false positive classification errors, respectively. These classification errors often lead to imprecise records of the number of individuals infected in the households, and therefore to unreliable inferences from such data. It is then necessary to examine the choice of numerical optimisation scheme for the maximum likelihood estimates of the parameters. In this work, we examined the above scenarios for the stochastic SIR household epidemic model in which the false negative and false positive misclassification probabilities are assumed equal, and compared the performance of two- and three-dimensional numerical optimisation schemes on three-dimensional simulated data with large and small percentage noise in the data. We found that for large noise, three-dimensional numerical optimisation outperforms two-dimensional, while for small noise close to 0, two-dimensional numerical optimisation outperforms three-dimensional.

Talk Time: 14:20-14:40

THE IMPACT OF DEGREE DISTRIBUTION ON NETWORK EPIDEMIC MODELS WITH CASUAL CONTACTS Benjamin Davis University of Nottingham

We consider an SIR (Susceptible - Infected - Removed) epidemic on a population with network and casual (homogeneously mixing) contacts. For a given degree distribution, and for epidemics with fixed basic reproduction number (R0 > 1), we examine the effect of the relative transmission rates for network and casual contacts on the final outcome of the epidemic. We conclude by briefly describing some extensions to the underlying model, such as including household structure and vaccination strategies.

Talk Time: 14:40-15:00

STABILITY ANALYSIS OF A MODEL OF WIRELESS COMMUNICATION NETWORK WITH SPATIAL INTERACTION BETWEEN USERS Abla Azalekor Heriot-Watt University

Keywords: Stochastic stability, Aloha protocol, Spatial interactions

We consider a multiple-access communication network of two stations where users attempt transmission of messages until successful. We divide time into slots and assume that each transmission attempt lasts for one time slot. The number of new messages that arrive at a node within one time slot follows a given distribution. We assume further that whenever there are multiple simultaneous transmission attempts, the messages collide. In case of failure, messages are retransmitted in the next time slot with a given probability. We assume that the interaction between the two stations is as follows: node 1's transmission activity is independent of that of node 2, and node 2 fails to transmit if node 1 is attempting transmission at the same time. The number of messages awaiting transmission in the system through time is a two-dimensional discrete-time Markov chain (W_n^1, W_n^2) such that (W_n^1) alone is the slotted Aloha type model and (W_n^2) is a stochastic process that depends on (W_n^1). We find conditions on the expected number of new arrivals at the different nodes such that the system is stable. We then construct an auxiliary Markov chain for (W_n^2) and show that these conditions are also sufficient for its stability.
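A small simulation sketch of the interaction described above, with illustrative arrival and transmission probabilities chosen to satisfy the intuitive stability condition. This is an assumption-laden toy (in particular, node 1's attempts are taken to succeed unconditionally), not the model analysed in the talk:

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical per-slot arrival probabilities and transmission probability.
lam1, lam2, p = 0.2, 0.1, 0.5
n_slots = 50_000
w1 = w2 = 0                              # queue lengths (W_n^1, W_n^2)
trace2 = []
for _ in range(n_slots):
    w1 += rng.random() < lam1            # new message arrives at node 1?
    w2 += rng.random() < lam2            # new message arrives at node 2?
    tx1 = w1 > 0 and rng.random() < p    # node 1 attempts, independently of node 2
    tx2 = w2 > 0 and rng.random() < p    # node 2 attempts
    if tx1:
        w1 -= 1                          # node 1's attempt succeeds (toy simplification)
    if tx2 and not tx1:
        w2 -= 1                          # node 2 succeeds only when node 1 is silent
    trace2.append(w2)

mean_queue2 = float(np.mean(trace2[n_slots // 2:]))  # node 2 backlog after burn-in
```

Because node 1's activity throttles node 2's effective service rate, node 2's stability region depends on node 1's load, which is the phenomenon the auxiliary-chain analysis formalises.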

Talk Time: 15:00-15:20

ANALYSIS OF HOUSEHOLD-BASED MODELS OF ENDEMIC DISEASES Chibuzor Nnanatu and Peter Neal Lancaster University

Keywords: Household-based, SIS, Individual-based, Aggregate-based, Epidemic Models, Bayesian Inference, MCMC

In this talk, we analyse Susceptible-Infected-Susceptible (SIS) household-based epidemic models using two data forms, namely individual-based data (IBD) and aggregate-based data (ABD). The latter gives the infectiousness (number of infectives) of a given household at a particular observation time point, while the former, in addition to giving the number of infectives in a household, also gives the infection status of each individual of the household at a given observation time point. We develop Markov chain Monte Carlo (MCMC) algorithms for Bayesian inference on both IBD and ABD. Interest here is in the efficiency of the MCMC algorithms. The performance of the MCMC algorithms developed for IBD and ABD is compared in terms of speed of convergence and accuracy (closeness to the true parameter values). Specifically, we want to know whether the estimates obtained with ABD offer a better approximation to the true parameter values than those obtained with IBD.

12.2 Wednesday 5th August

12.2.1 Session 4: 09:40-11:00

(A) STATISTICAL MODELS WITH MEDICAL APPLICATIONS - Room: LG 10, Chair: Nebahat Bozkus

Talk Time: 09:40-10:00

HOW WELL DO CURRENT LUNG CANCER RISK PREDICTION MODELS PERFORM? Eoin Gray, Dawn Teare and John Stevens University of Sheffield

Keywords: Prediction Models, Lung Cancer

Background: Lung cancer prediction models have been developed to accurately predict risk in individuals and could be used for selective screening. Despite these models being available, current methods in the US only use accumulated smoking exposure (1). Methods: A systematic review identified seven validated prediction models that include epidemiological factors. We are evaluating the performance of these seven models in ten datasets from the International Lung Cancer Consortium to perform an external comparative validation. The study will look at the calibration, discrimination, and prediction rules for each model and scrutinise performance to identify the optimal risk thresholds.

Results: The preliminary results on the first datasets found that the two PLCO models (2) had the best overall performance; the 2014 model recorded an AUC of 0.773, 95% CI [0.748, 0.798], but a calibration p-value of 0. On testing the prediction rules at a series of risk thresholds, the study found the PLCO 2014 model worked best at the 0.5% risk over 6 years, with a net reclassification index of 0.0876 over current selective screening methods. Discussion: Our preliminary results demonstrate that at least one of the seven models would provide a more accurate screening tool than the current methods in the US. The completed full comparison of all the prediction models on the ten distinct study datasets should identify robust improvements to lung cancer screening criteria which can be implemented into future screening trials. References: 1. Gatsonis CA, Natl Lung Screening Trial Res T. Radiology 2011;258(1):243-253 2. Tammemagi MC, et al. PLoS medicine. 2014;11(12):e1001764-e

Talk Time: 10:00-10:20

AN INVESTIGATION OF THE FACTORS WHICH INFLUENCE CHILDREN WITH ASTHMA HAVING ASTHMA EXACERBATION AROUND THE START OF THE NEW SCHOOL YEAR Rebecca M. Simpson, Steven A. Julious and Wendy O. Baird University of Sheffield

At the beginning of the new school year in September there is an increase in the number of unscheduled medical contacts made by school-age children with asthma. It is thought that this is caused by a viral challenge associated with the return to school. Unfortunately, this challenge is exacerbated because some children stop taking their medication over the summer holiday period. The aim of this research is to try to predict which children are more likely to have an unscheduled medical contact in September. The PhD will use mixed methods: the analysis will use both quantitative and qualitative methods to answer the research aim. For the quantitative part, the data to be statistically analysed will come from an intervention study called PLEASANT (Preventing and Lessening Exacerbations of Asthma in School-age children Associated with a New Term), which investigates whether a simple letter intervention to remind children to take their asthma medication during the summer holidays can reduce unscheduled contacts. For the qualitative part, interviews/focus groups will be undertaken with children to explore why children do not take their medication. The quantitative and qualitative analyses will be used to inform and reinforce each other. After completing the analyses, the final aim of the PhD is to propose an intervention that can be targeted at those who are most likely to have an unscheduled hospitalisation in September.

46 Talk Time: 10:20-10:40

MODELLING RATES OF METHICILLIN-RESISTANT STAPHYLOCOCCUS AUREUS BACTERAEMIA AND DETECTING CHANGES IN TREND Taghreed Jawa, David Young and Chris Robertson University of Strathclyde

Keywords: Poisson regression, seasonal effect, change points analysis, bootstrap method, MRSA bacteraemia

Methicillin-resistant Staphylococcus aureus (MRSA) refers to strains of Staphylococcus aureus (bacteria that 30% of healthy people have on their skin) that have become resistant to some antibiotics. The identification of changes in trend is an important issue in the analysis of MRSA bacteraemia data. Since 2003, data on MRSA bloodstream infection have been gathered by Health Protection Scotland, recording the rates of MRSA bacteraemia per 100,000 acute occupied bed days (AOBDs) in Scotland. Poisson regression models are used to describe the change in the rate over time. A cubic model was best for describing the change in trend: its residual deviance is small (66.79 on 37 degrees of freedom) and the residuals are not correlated. The cubic model also gave a good estimate of the future trend, and identified two time points at which the trend of MRSA bacteraemia changed. In March 2005 the trend changed significantly: the rate, at its maximum of 20.53 per 100,000 AOBDs, began to decrease. The 95% confidence interval for when the rate started to decrease is between December 2004 and June 2005. In conclusion, good antimicrobial practice in hospitals in Scotland was recommended in 2005, which led to a decrease in the rate of MRSA bacteraemia afterwards.
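A minimal sketch of fitting a Poisson regression with a cubic time trend by iteratively reweighted least squares. The counts and coefficients here are entirely synthetic, not the MRSA data:

```python
import numpy as np

rng = np.random.default_rng(11)

# Synthetic counts with a cubic log-rate trend (illustration only).
t = np.linspace(-1, 1, 40)
X = np.column_stack([np.ones_like(t), t, t ** 2, t ** 3])
true_beta = np.array([3.0, -1.0, 0.5, 0.8])
y = rng.poisson(np.exp(X @ true_beta))

def poisson_irls(X, y, n_iter=25):
    """Fit a Poisson GLM with log link by iteratively reweighted least squares."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean() + 1e-8)   # sensible starting intercept
    for _ in range(n_iter):
        mu = np.exp(X @ beta)           # current fitted means
        z = X @ beta + (y - mu) / mu    # working response
        w = mu                          # working weights
        beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
    return beta

beta_hat = poisson_irls(X, y)
mu_hat = np.exp(X @ beta_hat)
```

The fitted cubic log-rate can then be inspected for turning points, analogous to locating when the modelled rate begins to decrease.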

Talk Time: 10:40-11:00

MELANOMA SURVIVAL MODELS USING INTEGRATED CLINICAL AND GENOMIC DATA Ernest Mangantig, Mark M. Iles, Jérémie Nsengimana, Jon Laye, Julia A. Newton Bishop, Timothy Bishop and Jennifer H. Barrett Leeds Institute of Cancer and Pathology, University of Leeds

Keywords: Survival model, penalized Cox regression, polygenic risk score

Genetic factors, such as gene expression levels within the tumour and the patient’s genotype, influence survival in melanoma patients. No studies so far have conducted a joint analysis of the effect of clinical predictors, gene expression levels and genotype on melanoma-specific survival (MSS). This study aims to compare methods for combining clinical and genomic data to build prognostic models for melanoma. Clinical and genome-wide genetic data were available from a cohort of 2000 melanoma patients from Leeds, UK, with genome-wide gene expression data from the primary tumour available on a subset of 204 participants. Two approaches were used to combine the data. In the first (agnostic) method, penalized Cox regression, using the lasso penalty, was applied in two analyses to select variables related to MSS: (i) to 5889 single nucleotide polymorphisms

(SNPs) from across the genome and (ii) to genome-wide gene expression levels using 27,644 probes. A weighted polygenic risk score (PRS) was created from the set of 20 selected SNPs; this is the sum of trait-associated variants, weighted by their regression coefficients from the survival model. The PRS was included in a Cox regression of MSS together with the 15 selected gene expression levels, with and without clinical predictors. In the second approach, PRSs for inclusion in the models are being developed based on the genetic predictors of intermediate traits related to melanoma survival, such as gene expression levels and telomere length. For the former, for each gene whose expression is related to MSS, a genome-wide analysis was conducted to identify SNPs related to expression levels from which to derive a PRS, while the score for telomere length is based on previously published risk estimates. The performance of the different models will be compared to identify the best approaches to integrating different types of data to build survival models.
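The weighted polygenic risk score used here is, computationally, just a coefficient-weighted sum of risk-allele counts. A minimal sketch (the genotypes and coefficients below are made up for illustration, not the study's 20 selected SNPs):

```python
import numpy as np

def polygenic_risk_score(genotypes, weights):
    """Weighted polygenic risk score: for each individual, the sum of
    risk-allele counts (coded 0/1/2) weighted by per-SNP regression
    coefficients from a fitted survival model."""
    genotypes = np.asarray(genotypes, dtype=float)
    return genotypes @ np.asarray(weights, dtype=float)

# toy example: 3 individuals x 4 SNPs, hypothetical Cox coefficients
G = np.array([[0, 1, 2, 0],
              [1, 1, 0, 2],
              [2, 0, 1, 1]])
w = np.array([0.12, -0.05, 0.30, 0.08])
prs = polygenic_risk_score(G, w)   # one score per individual
```

The resulting score would then enter a Cox regression as a single covariate alongside the selected expression levels and clinical predictors.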

(B) MULTIVARIATE ANALYSIS - Room: LG 19, Chair: Jude Chukwura Obi

Talk Time: 09:40-10:00

CONVOLVED GAUSSIAN PROCESS REGRESSION MODEL FOR MULTIVARIATE COUNT DATA A’yunin Sofro and Jian Qing Shi Newcastle University

Keywords: Covariance kernel, Convolution, Multivariate count data

We propose a convolved Gaussian process regression model for multivariate count data, where the response variables are of the type widely used in spatial analysis, particularly for health data. The proposed model accommodates multivariate count data with multi-dimensional covariates, and provides a natural framework for modelling a common mean structure and covariance structure for correlated multiple inputs. The definition of the model, the inference and the implementation, as well as its asymptotic properties, are discussed.

Talk Time: 10:00-10:20

MULTIVARIATE CENTILES VIA CONVEX SETS AND THEIR EXTENSION VIA COPULA MODELS Eirini Koutoumanou, Mario Cortina-Borja and Angie Wade University College London

Copulas are multivariate distribution functions that model the joint behaviour of response variables with known univariate marginals. They can be used to model complex relationships between response variables that go beyond the bivariate or multivariate normal distribution and can incorporate a wide variety of marginal distributions. They can also be extended to incorporate covariates into the parameters of all marginal distributions as well as into the copula parameter, which defines the strength of the dependence between the response variables. Copula models may be used to construct multivariate centiles, an application

that we have not found in the literature. Population centiles are widely used to highlight individuals who have unusual outcomes, and often several are considered jointly. It is also often necessary to adjust for one or more covariates, most commonly age, gender and height. Whilst the methodology for the construction of univariate centiles is well established, there has been very little work in the area of multivariate centiles, although these clearly have the potential to offer improvements over a series of univariate centiles. In this talk, I will demonstrate the usefulness of convex sets in the creation of bivariate centiles and discuss the extension of this application to higher dimensions. A convex set is a set of points such that the line segment joining any two points of the set is also contained in the set, so its perimeter has no dents. I will illustrate how convex plots can act as the basis of distribution-free ways of exploring bivariate associations and how we can use them, with the assistance of copula models and given specific marginal distributions, to produce parametric centiles with confidence intervals for each centile.
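One simple distribution-free way to realise a convex-set "centile" region is convex hull peeling: repeatedly strip the outermost convex hull until roughly the desired fraction of points remains. The sketch below illustrates that general idea with SciPy's `ConvexHull`; it is not the authors' construction, and the coverage level and data are arbitrary.

```python
import numpy as np
from scipy.spatial import ConvexHull

def hull_peel_region(points, coverage=0.95):
    """Distribution-free bivariate 'centile' region by convex hull peeling:
    strip the outermost hull until about `coverage` of the points remain,
    then return the vertices of the hull of the retained points."""
    pts = np.asarray(points, dtype=float)
    keep = np.ones(len(pts), dtype=bool)
    target = int(np.ceil(coverage * len(pts)))
    while keep.sum() > target:
        hull = ConvexHull(pts[keep])
        outer = np.where(keep)[0][hull.vertices]
        if keep.sum() - len(outer) < target:
            break                      # stop rather than overshoot
        keep[outer] = False            # peel the outermost layer
    final = ConvexHull(pts[keep])
    return pts[keep][final.vertices]   # vertices of the coverage region

rng = np.random.default_rng(0)
xy = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=500)
region = hull_peel_region(xy, coverage=0.90)
```

A parametric version, as in the abstract, would replace the empirical peeling with contours of a fitted copula-based bivariate distribution.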

Talk Time: 10:20-10:40

SMALL AREA ESTIMATION METHODS FOR MULTIDIMENSIONAL POVERTY AND WELL-BEING INDICATORS Angelo Moretti, Natalie Shlomo and Alan Marshall University of Manchester

Keywords: Factor Analysis, EBLUP, Multivariate Mixed Effects Model, Poverty and Well-being

Measuring poverty and well-being is a key issue for governments and policy makers, who require a detailed understanding of the geographical distribution of social indicators (e.g. means, ratios, proportions). This understanding is essential for the formulation of targeted policies that address the needs of people in specific geographical locations. Most large-scale social sample surveys provide accurate estimates at regional levels. For instance, a relevant survey in the European Union for analysing social exclusion phenomena is EU-SILC (European Union Statistics on Income and Living Conditions). However, these data can be used to produce accurate direct estimates only at the NUTS 2 (Nomenclature of Territorial Units for Statistics) level. Hence, when the goal is to measure poverty and well-being indicators at a sub-regional level, they cannot be directly estimated from EU-SILC (Pratesi et al., 2013), and indirect estimation methods, in particular small area estimation (SAE) methods, should be used. It is generally agreed that poverty and well-being are multidimensional phenomena, so multivariate mixed effects models play a crucial role: on the one hand, we can take into account the correlation structure between variables, and on the other hand, we can obtain more accurate and more efficient estimates than in the univariate case in SAE (Datta et al., 1999). In this presentation I will discuss the problem of small area estimation for multidimensional poverty and well-being indicators and then present the results of some preliminary simulation studies. In the simulation experiments, the population has been generated from the multivariate mixed effects model (Fay and Fuller, 1987). As a first approach I will be investigating a univariate EBLUP (empirical best linear unbiased predictor) (Rao, 2003) for factor scores following a data reduction method of factor analysis.
In the simulations I compare the EBLUPs with the direct estimates (obtained via the Horvitz–Thompson estimator) of the mean factor score in small areas. The simulations will consider both the standard one-level factor model (Härdle and Simar, 2012) and the two-level factor model (Longford et al., 1992).
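The core EBLUP idea, shrinking noisy direct estimates toward a regression-synthetic estimate, can be illustrated with a toy area-level (Fay–Herriot-type) model. Everything below is simulated for illustration (not EU-SILC data), and the between-area variance uses a deliberately crude moment estimate rather than REML.

```python
import numpy as np

def fay_herriot_eblup(direct, X, psi):
    """Sketch of an area-level EBLUP: shrink each area's direct estimate
    toward the regression-synthetic estimate X @ beta, with shrinkage
    gamma_i = sigma2_u / (sigma2_u + psi_i)."""
    direct = np.asarray(direct, float)
    psi = np.asarray(psi, float)
    m, p = X.shape
    beta, *_ = np.linalg.lstsq(X, direct, rcond=None)
    resid = direct - X @ beta
    # crude method-of-moments estimate of the between-area variance
    sigma2_u = max(resid @ resid / (m - p) - psi.mean(), 0.0)
    gamma = sigma2_u / (sigma2_u + psi)        # shrinkage factors
    return gamma * direct + (1.0 - gamma) * (X @ beta)

# simulate m small areas: truth = covariate effect + small area effect
rng = np.random.default_rng(42)
m = 50
X = np.column_stack([np.ones(m), rng.normal(size=m)])
theta = X @ np.array([1.0, 0.5]) + rng.normal(0, 0.3, m)  # true area means
psi = np.full(m, 0.5 ** 2)                   # known sampling variances
direct = theta + rng.normal(0, 0.5, m)       # direct (HT-style) estimates
eblup = fay_herriot_eblup(direct, X, psi)
```

Because the sampling variance dominates the between-area variance here, the EBLUP borrows strength across areas and typically has much smaller mean squared error than the direct estimates.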

Talk Time: 10:40-11:00

A STATISTICAL DATA-BASED APPROACH TO IDENTIFYING GAMEPLAY BEHAVIOURS IN ONLINE FREEMIUM GAMES Anusua Singh Roy, Robert Raeside and Martin Graham Edinburgh Napier University

Keywords: Big Data, Bayesian Modelling and Clustering, Statistical Modelling of Online Behavioural Data, Multidimensional Clustering, Predicting Survival Times

The video games industry is one of the most attractive segments of entertainment and digital media. A review from Digi-Capital on Global Games Investment suggests that the worldwide games business could be worth more than $100 billion by 2017. The new commercial models are “free to play” and available across multiple platforms: the business lies in selling advertising space and in having players make micro-payments within the game to access advanced features. The focus is on maintaining average revenue per user (ARPU) and the lifetime value (LTV) of a player instead of a one-off retail price. The aim of this work is to research suitable data-driven methods to effectively understand how players act within a game environment. This will then be used to predict three important measures associated with successful games: retention, monetisation and survival times of players. The challenges identified are:

• The sheer volume of data arising from gameplay - typically thousands of players triggering millions of game events - truly Big Data

• Strategies and behaviours of players are continually evolving as they progress further into the game

The approach that will be undertaken to achieve the aforementioned aims can be described as:

• Conducting empirical examinations of event sequences from gameplay data

• Determining statistical modelling and clustering approaches (Frequentist, Bayesian and Machine Learning techniques)

• Testing these using an experimental approach with simulated data, and

• Building an analytic framework for behavioural study of online Big Data

Predictive models of players’ behaviour in online games are an open research topic that is receiving increasing attention in the literature. Successful research in this area will add knowledge to online behavioural analysis especially with Big Data.

(C) REGRESSION MODELS - Room: LG 15, Chair: Adrian Byrne

Talk Time: 09:40-10:00

SIMULTANEOUS BAYESIAN BOX-COX QUANTILE REGRESSION Aziz Aljuaid, John Paul Gosling and Charles C. Taylor University of Leeds

Since quantile regression was introduced, it has been considered a significant statistical tool for analysing the full conditional distribution of the response variable. It offers a greater extension than ordinary least squares regression in terms of describing the entire conditional distribution of the dependent variable, and it has been applied in a variety of fields such as science, finance, econometrics and environmental science. Transformation methods are widely considered a useful statistical technique for addressing two violations of the linear regression model assumptions: non-linearity and heteroscedasticity. However, in order to benefit from the power of a transformation, the suitable transformation for the response variable needs to be determined; it is therefore better to let the data decide which transformation is most appropriate. This can be achieved by applying the Box-Cox transformation. We employ the Box-Cox transformation to develop a simultaneous Bayesian quantile regression method based on a pseudo-asymmetric Laplace likelihood (PALD-BC) to fit multiple quantile regressions. We also specify suitable prior distributions for all parameters, including the transformation parameters. Moreover, the issue of crossing quantile curves is discussed, and a solution based on prior distributions is considered. We then propose a suitable Metropolis-Hastings algorithm and Gibbs sampler to implement the PALD-BC method. Finally, we consider two extensions of the PALD-BC method: Bayesian Box-Cox quantile regression with heteroscedastic errors (PALD-HBC) and Bayesian two-sided Box-Cox quantile regression (PALD-TBC).
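The two building blocks here, the Box-Cox transform and the check (pinball) loss that the asymmetric Laplace likelihood corresponds to, can be sketched for a single quantile. This is a frequentist toy version on simulated data, not the simultaneous Bayesian PALD-BC method itself.

```python
import numpy as np
from scipy.optimize import minimize

def boxcox(y, lam):
    """Box-Cox transform of a positive response."""
    return np.log(y) if lam == 0 else (y ** lam - 1) / lam

def pinball(u, tau):
    """Check (pinball) loss; its minimiser is the tau-th quantile, and it
    is the negative log-likelihood kernel of the asymmetric Laplace."""
    return np.where(u >= 0, tau * u, (tau - 1) * u)

def quantile_fit(x, y, tau, lam):
    """Fit a linear tau-th quantile to the Box-Cox transformed response."""
    z = boxcox(y, lam)
    X = np.column_stack([np.ones_like(x), x])
    obj = lambda b: np.sum(pinball(z - X @ b, tau))
    x0 = np.array([np.median(z), 0.0])
    return minimize(obj, x0, method="Nelder-Mead",
                    options={"maxiter": 2000}).x

rng = np.random.default_rng(2)
x = rng.uniform(0, 2, 400)
y = np.exp(1.0 + 0.5 * x + rng.normal(0, 0.3, 400))  # log-normal response
b = quantile_fit(x, y, tau=0.5, lam=0.0)  # lam = 0: log transform
```

In the Bayesian version of the abstract, the transformation parameter and the quantile coefficients get priors and are sampled jointly rather than fixed and optimised.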

Talk Time: 10:00-10:20

SPARSE UNMIXING OF HYPERSPECTRAL DATA Maria Toomik and James Nelson University College London

Spectral unmixing is a popular tool for estimating the fractional abundances of pure spectral signatures (called endmembers) in hyperspectral images. This can be approached in a semi-supervised manner using spectral libraries that contain spectra collected on the ground by a field spectroradiometer. The observed image signatures can then be expressed as linear combinations of pure spectral signatures from the spectral library. The number of endmembers present in each image pixel is very small compared to the size of the spectral libraries, hence sparse regression techniques are used. In my work, the traditional L1 norm (a convex surrogate for the L0 norm) in the sparse optimisation is replaced by an Lp norm with 0 < p < 1. I will also discuss the inclusion of total variation regularisation in order to take advantage of the spatial-contextual information in hyperspectral images.
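One standard way to handle an Lp penalty with 0 < p < 1 is iteratively reweighted least squares (FOCUSS-style), where each iteration solves a ridge problem whose weights mimic the non-convex penalty. The sketch below is a generic illustration on a synthetic library (random "spectra", arbitrary p and penalty weight), not the author's algorithm or data.

```python
import numpy as np

def lp_sparse_unmix(A, y, p=0.5, lam=0.1, n_iter=50, eps=1e-8):
    """Sparse abundance estimation with an Lp (0 < p < 1) penalty via
    iteratively reweighted least squares: solve a weighted ridge problem
    whose weights p / |x|^(2-p) locally approximate the Lp penalty."""
    x = np.linalg.lstsq(A, y, rcond=None)[0]   # unpenalised start
    for _ in range(n_iter):
        w = p / (np.abs(x) ** (2 - p) + eps)
        x = np.linalg.solve(A.T @ A + lam * np.diag(w), A.T @ y)
    return x

# toy 'library' of 30 endmember spectra over 60 bands; pixel mixes 3 of them
rng = np.random.default_rng(7)
A = np.abs(rng.normal(size=(60, 30)))
truth = np.zeros(30)
truth[[3, 11, 25]] = [0.5, 0.3, 0.2]
y = A @ truth + rng.normal(0, 0.01, 60)   # observed pixel spectrum
abund = lp_sparse_unmix(A, y, p=0.5, lam=0.05)
```

Near-zero coefficients receive very large weights and are driven toward zero, while the few active endmembers are barely penalised, which is exactly the behaviour that makes p < 1 attractive over the L1 norm.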

Talk Time: 10:20-10:40

DISCRETE WEIBULL REGRESSION MODEL FOR COUNT DATA Hadeel Kalktawi and Keming Yu Brunel University

Data are collected in the form of counts in many situations, and researchers are usually concerned with investigating the effect of other variables on this frequency of events, that is, regression for count data. Although Poisson regression can be considered the most common model for count data, it is limited by its assumption of equi-dispersion. The negative binomial model has become the most widely used for over-dispersed count data, but it is not suitable for modelling under-dispersed data. In this paper, the discrete Weibull regression model, which allows for both over-dispersion and under-dispersion, or even a combination of the two, is introduced. Moreover, this simple model is able to model highly skewed data with many zero counts. Even for very over-dispersed data, the model shows a better fit, with fewer parameters to estimate, than the zero-inflated and hurdle models typically used in this case. Maximum likelihood estimation is used to fit the regression model and is shown to work well. Real data examples exhibiting both over-dispersion and under-dispersion are used to illustrate the methods.
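The (type-I) discrete Weibull pmf has a simple closed form, and a quick numerical check illustrates the dispersion flexibility claimed above. The parameter values below are chosen purely for illustration.

```python
import numpy as np

def discrete_weibull_pmf(x, q, beta):
    """Type-I discrete Weibull pmf:
    P(X = x) = q**(x**beta) - q**((x+1)**beta), x = 0, 1, 2, ...
    with 0 < q < 1 and beta > 0."""
    x = np.asarray(x, dtype=float)
    return q ** (x ** beta) - q ** ((x + 1) ** beta)

x = np.arange(0, 200)
dispersion = {}
for beta in (0.7, 1.0, 2.0):
    p = discrete_weibull_pmf(x, q=0.8, beta=beta)
    mean = np.sum(x * p)
    var = np.sum(x ** 2 * p) - mean ** 2
    # dispersion index: > 1 over-dispersed, < 1 under-dispersed
    dispersion[beta] = var / mean
```

For fixed q, increasing beta thins the tail, so the dispersion index falls and the model moves from over-dispersion toward under-dispersion; beta = 1 recovers the geometric distribution.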

Talk Time: 10:40-11:00

BAYESIAN PROFILE QUANTILE REGRESSION Xi Liu, Silvia Liverani and Keming Yu Brunel University

In classical regression analysis, a covariate may achieve a high level of statistical significance for the outcome on its own, but not in the presence of many other related covariates. On the other hand, the effect of a particular covariate on the outcome might only be revealed in the presence of other covariates. The overall pattern of joint effects may therefore be elusive, and hard to capture by traditional analyses that include main effects and interactions of increasing order, as the model space soon becomes unwieldy and the power to find any effects beyond simple two-way interactions quickly vanishes. Clusters representing covariate patterns have recently been proposed. However, covariate profiles contributing to extreme outcomes may differ from typical ones, so covariate-profile-based clusters may change over the outcome distribution. This presentation introduces profile quantile regression for clustering, which utilizes a Bayesian mixture of asymmetric Laplace distributions. The model is fitted using Markov chain Monte Carlo (MCMC) sampling methods and applied in a medical application.

(D) STATISTICAL MODELLING II - Room: LG 17, Chair: Gareth Davies

Talk Time: 09:40-10:00

SIMULTANEOUS CONFIDENCE SETS FOR SEVERAL EFFECTIVE DOSES Daniel Tompsett, Stephanie Biedermann and Wei Liu University of Southampton

Keywords: Simultaneous Confidence Bands, Logistic Regression, Effective Doses

In this talk, we construct confidence sets for the effective doses of a maximum likelihood estimated logistic regression model, which provide minimal simultaneous coverage 1 − α for a pre-specified number of effective doses at once. The sets are generated via the construction of specialised large-sample confidence bands on the inverse dose-response curve, known as the inversion of confidence bands method. They provide closer-to-nominal coverage, and are therefore less conservative, than the currently established method of using Scheffé-type simultaneous confidence bands. Simultaneous confidence sets are first constructed for two effective doses for a multivariate logistic model, and then for three and four effective doses for a univariate model, with a method to apply this to any number of effective doses at once. We first construct sets for effective doses lying over the whole real line; focus then turns to generating the same constructs where the effective doses are assumed to lie over some predetermined finite range. These sets are then contrasted with current methods in a practical example.

Talk Time: 10:00-10:20

THE QUANTUM LANGUAGE FOR FINANCE VIA THE GENERATOR APPROACH Wenyan Hao and Sergey Utev University of Leicester

Options are financial derivatives on an underlying security. The price of the option is shown to be the analogue of the Schrödinger wavefunction of quantum mechanics, and the exact Hamiltonian and Lagrangian of the system are obtained. The Hamiltonian operator is the generator of infinitesimal translations in time. We consider other cases, such as the differential operator, Brownian motion, geometric Brownian motion, the Poisson process, the geometric Poisson process, and Lévy processes, within the quantum mechanics formalism, using the generator approach to derive option prices.

Talk Time: 10:20-10:40

MODELLING THE SHAPE OF EMOTIONS Irene Mariñas University of Glasgow

Keywords: Shape Analysis

A research project involving the School of Mathematics and Statistics and the Institute of Neuroscience and Psychology at the University of Glasgow is currently capturing data on facial shape in three dimensions, with the aim of using this to study different medical questions, such as the effect of surgery in the correction of cleft lip. The data are in the form of three-dimensional point clouds which characterize each facial surface. Analysis of data of this nature raises very interesting questions about how to measure shape and shape change over time. My aim is to study the curves identifying the shape of the lips in a three-dimensional facial image and their variation over the expression of different feelings such as fear, anger and happiness. To record the expressions, a large number of pictures of an actor are taken with a stereophotogrammetric camera system while she produces the emotion, which leads to a set of data in four dimensions (the three spatial coordinates plus time). Methods for identifying the outline of the mouth are investigated, using ideas of shape index and curvature. The methods involve the identification of the lip curves (including the paths along the lips and the estimation of the landmarks at the mouth corners) and an algorithm for the estimation of the 4D curves. Models for the outline at a single time point are constructed using B-splines, and a model for the full time evolution is developed. Principal Components Analysis is used to obtain a representation of how the mean shape of the emotion varies over space and time.

Talk Time: 10:40-11:00

EFFICIENT HIGH-DIMENSIONAL GAUSSIAN PROCESS REGRESSION TO CALCULATE THE EXPECTED VALUE OF PERFECT PARTIAL INFORMATION IN HEALTH-ECONOMIC EVALUATIONS Anna Heath, Ioanna Manolopoulou and Gianluca Baio University College London

The Expected Value of Perfect Partial Information (EVPPI) is a decision-theoretic measure of decision uncertainty used principally in health-economic evaluations. This measure has been recommended as the optimal way to quantify the impact of parameter uncertainty in decisions (a process that must be included in all health-economic evaluations in the UK). However, the computational time required to calculate this measure has prevented its widespread application. Recently, however, it has been demonstrated that non-parametric regression methods can be used to approximate this measure, significantly reducing the computational power required. In low-dimensional cases fast regression methods can be used, but in higher dimensions Gaussian Process (GP) regression is recommended. The computational cost of GP regression increases with the number of dimensions, meaning that this high-dimensional GP regression is still prohibitively expensive for most health-economic evaluations. We propose an alternative method to estimate the hyperparameters of the GP using ideas from Spatial Statistics and the Stochastic

Partial Differential Equation components of the R-INLA software package. R-INLA is a powerful package for fast approximate Bayesian inference. Our model is equivalent to standard GP regression, but employs a fictional spatial structure, along with projections, to describe the correlation among the parameters using a Matérn covariance matrix. For complex problems, this development calculates the EVPPI around 100 times faster than standard GP regression methods.
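The regression-based EVPPI estimator referred to above can be illustrated in miniature: regress each option's simulated net benefit on the parameter(s) of interest, then take the mean of the pointwise-best fitted values minus the best mean. Here a simple polynomial basis stands in for the GP, and the toy decision problem (one scalar parameter, two options, arbitrary numbers) is ours, not the authors'.

```python
import numpy as np

def evppi_regression(theta, nb, degree=3):
    """Regression-based EVPPI for a multi-option decision:
    fitted_d(theta) approximates E[NB_d | theta], and
    EVPPI = E[max_d fitted_d] - max_d E[NB_d]."""
    theta = np.asarray(theta, float)
    basis = np.vander(theta, degree + 1)      # polynomial basis, 1-D theta
    fitted = np.empty_like(nb)
    for d in range(nb.shape[1]):
        coef, *_ = np.linalg.lstsq(basis, nb[:, d], rcond=None)
        fitted[:, d] = basis @ coef
    return np.mean(np.max(fitted, axis=1)) - np.max(np.mean(nb, axis=0))

# toy probabilistic sensitivity analysis: option A's net benefit
# depends strongly on theta; option B is a flat baseline
rng = np.random.default_rng(3)
n = 5000
theta = rng.normal(size=n)
nb = np.column_stack([
    1000 * theta + rng.normal(0, 500, n),   # option A
    rng.normal(0, 500, n),                  # option B
])
evppi = evppi_regression(theta, nb)
```

For this toy problem the true EVPPI is 1000·E[max(θ, 0)] ≈ 399, and the regression estimate lands close to it; the abstract's contribution is making the GP version of this regression fast in high dimensions.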

12.2.2 Session 5: 11:40-12:40

(A) SOCIAL NETWORKS - Room: LG 10, Chair: Abla Azalekor

Talk Time: 11:40-12:00

OPTIMUM BLOCK DESIGNS ON SOCIAL NETWORKS Vasiliki Koutra, Steven Gilmour, Ben Parker and Peter W. F. Smith University of Southampton

Keywords: Experimental design, optimum design, social networks

The statistical design of experiments on communities in social networks is of great practical importance. Our main goal is to investigate how the connectivity structure, and especially the formation of sub-communities, affects the experimental design. We suggest a technical framework for finding optimum block designs while taking into account the interrelations of groups of people within the social network. The experimental units are first divided into blocks using spectral clustering techniques and the concept of modularity, prior to assigning the treatments. The construction of the (near-)optimum blocked designs is done using a simple exchange algorithm and is implemented in R.
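The blocking step, partitioning the network spectrally before treatments are assigned, can be sketched in its simplest two-block form: split on the sign of the Fiedler vector of the graph Laplacian. This is a generic illustration (the authors use modularity-based spectral clustering and an exchange algorithm in R; the network below is a planted two-community toy).

```python
import numpy as np

def spectral_blocks(A):
    """Split a network into two blocks using the sign of the Fiedler
    vector: the eigenvector of the graph Laplacian L = D - A belonging
    to the second-smallest eigenvalue."""
    A = np.asarray(A, float)
    L = np.diag(A.sum(axis=1)) - A
    vals, vecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    fiedler = vecs[:, 1]
    return (fiedler > 0).astype(int)        # block labels 0/1

# planted two-community network: dense within groups, sparse between
rng = np.random.default_rng(5)
n = 30
A = (rng.random((n, n)) < 0.02).astype(float)   # sparse background
A[:15, :15] = (rng.random((15, 15)) < 0.6)      # community 1
A[15:, 15:] = (rng.random((15, 15)) < 0.6)      # community 2
A = np.triu(A, 1)
A = A + A.T                                      # symmetric, no self-loops
blocks = spectral_blocks(A)
```

With the blocks in hand, treatments would then be assigned within blocks, e.g. via the exchange algorithm the abstract mentions.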

Talk Time: 12:00-12:20

HOUSE SUBSTITUTABILITY IN GLASGOW Cunyi Wang University of Glasgow

Keywords: Substitutability, p∗ model, Latent space model, CPEP

There are many types of houses in Glasgow, such as two-bedroom flats and houses with gardens, and different families have different preferences over the types of houses. Substitutability means that, with the same social and physical attributes, houses in the same submarket are similar options for buyers. A submarket is a geographic, economic, or specialized subdivision of a market: the more similar the characteristics of two properties, the more likely they are to appear in the same submarket. The data we have is a 10,000 × 10,000 matrix of Cross Price Elasticity of Price (CPEP) values, a new continuous measure quantifying the substitutability between postcodes. Classical models focus on individual postcodes, but in this study we care more about the relationships between postcodes, which also violate the assumption of independence, so we will use Social Network Analysis

(SNA) to model the data. There are several models in SNA, such as the p∗ model (also known as the ERGM) and the latent space model. The p∗ model is used to calculate the probability of a graph with certain characteristics occurring. The latent space model improves on the p∗ model by incorporating the latent positions of the postcodes. However, the p∗ model and the latent space model only deal with binary adjacency matrices; if we have more information about the data, such as the dissimilarities of nodes, we can use the BMDS and MBCD models. The goal is to apply the above models to our CPEP data.

Talk Time: 12:20-12:40

PROPERTIES OF LATENT VARIABLE NETWORK MODELS Riccardo Rastelli, Nial Friel and Adrian E. Raftery University College Dublin

Keywords: Social Networks, Latent Position Models

The Latent Position Model (LPM) is one of the most important and widely used statistical models in social network analysis. However, despite its extensive use, little is known about the theoretical properties of networks realised from this model. Making use of well-known probabilistic frameworks, the degree distribution, clustering coefficient, path length distribution and degree correlations arising from realised latent position networks are characterised. Asymptotic scenarios and real data examples are considered, showing how suitable LPMs are for representing large social networks.
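The properties in question can be read off simulated LPM networks directly. The sketch below simulates one (the logistic link and all parameter values are arbitrary choices for illustration) and computes two of the quantities the abstract characterises: the degree distribution and the global clustering coefficient.

```python
import numpy as np

def simulate_lpm(n, dim=2, alpha=-1.0, rng=None):
    """Simulate an undirected latent position network: each node gets a
    latent coordinate z_i, and edge i~j appears with logistic probability
    P(i~j) = 1 / (1 + exp(||z_i - z_j|| - alpha))."""
    rng = np.random.default_rng(rng)
    z = rng.normal(size=(n, dim))
    d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    p = 1.0 / (1.0 + np.exp(d - alpha))
    A = (rng.random((n, n)) < p).astype(int)
    A = np.triu(A, 1)
    return A + A.T

A = simulate_lpm(300, rng=11)
deg = A.sum(axis=1)                       # degree distribution
# global clustering coefficient: closed triplets / connected triplets
clustering = np.trace(A @ A @ A) / np.sum(deg * (deg - 1))
density = deg.mean() / (A.shape[0] - 1)
```

Because nearby nodes share neighbours, LPM realisations exhibit clustering above the Erdős–Rényi baseline of the same density, one of the transitivity properties the talk makes precise.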

(B) BIAS REDUCTION TECHNIQUES - Room: LG 19, Chair: Vincent Mabikwa

Talk Time: 11:40-12:00

PROPOSING A NEW MEASURE FOR DETECTING (LATENT VARIABLE MODEL ABERRANT) SEMI-PLAUSIBLE RESPONSE PATTERNS Tayfun Terzi London School of Economics and Political Science

New challenges concerning bias due to measurement error have arisen with the increasing use of paid participants: semi-plausible response patterns (SpRPs). SpRPs result when participants only superficially process the information in (online) experiments or questionnaires and attempt merely to respond in a plausible way. This is because paid participants are generally motivated by fast cash: they try to efficiently overcome objective plausibility checks and process other items only superficially, if at all. The consequences are biased estimates, blurred or even masked true effect sizes, and contaminated valid models. A new measure developed for the identification of SpRPs in a latent variable framework is evaluated and future research outlined.

Talk Time: 12:00-12:20

OPTIMALITY CRITERIA FOR MULTIPLE OBJECTIVES ACCOUNTING FOR MODEL MISSPECIFICATION Olga Egorova and Steven Gilmour University of Southampton

Keywords: Optimality criteria, lack-of-fit, bias, model misspecification, compound criteria

In the framework of a factorial experiment with a fitted polynomial regression model, taking into account the possibility of model lack of fit is of particular interest, especially when a relatively small number of runs does not allow potentially omitted terms to be estimated. We modified the Generalised A- and D-optimality criteria to combine the desirable properties of the parameter estimates with allowance for model misspecification, following the approach of “pure error” estimation, which is based on replicated design points and does not depend on the model. The resulting compound Generalised LP- and DP-optimality criteria maximise the precision of the model parameter estimates and the power of the lack-of-fit test in the direction of the potential terms, minimise the corresponding bias, and also contain standard L- or D-optimality components. We considered an example and examined the properties of the optimal designs obtained. Although no specific pattern was observed between the allocation of weights to the parts of the criteria and the performance of the resulting designs (in terms of efficiencies), a few interesting properties were seen and some conclusions can be drawn for practical applications.

Talk Time: 12:20-12:40

IMPROVING ESTIMATION IN GENERALIZED LINEAR MIXED EFFECTS MODELS Sophia Kyriakou and Ioannis Kosmidis University College London

Keywords: Generalized linear mixed effects models, bias reduction

Generalized linear mixed effects models (GLMMs) are widely used in statistical practice, but traditional estimators, like the maximum likelihood estimator, usually underperform. Generally, this underperformance leads to problems in inference, because bias affects the performance of hypothesis tests and confidence intervals. Our research focuses on remedying such phenomena by producing variants of maximum likelihood for GLMMs, along with the parallel development of the associated computational procedures. Specifically, we derive a bias reduction method that operates via the adjustment of the log-likelihood derivatives. The method is applied to binomial-response GLMMs with a logistic link and a random intercept. For this type of model, we provide explicit, though intractable, expressions for the adjusted score functions and present an algorithm for obtaining reduced-bias estimates. A simulation study illustrates the performance of our bias reduction approach compared with other GLMM estimation methods. The results indicate that our method is superior to the others for a wide range of parameter values.

(C) GRAPHICAL MODELS - Room: LG 15, Chair: Pieralberto Guarniero

Talk Time: 11:40-12:00

ESTIMATING DYNAMIC GRAPHICAL MODELS FROM MULTIVARIATE TIME-SERIES Alex Gibberd and James Nelson University College London

Our data-hungry society is not only harvesting more data points, but also measuring an ever-increasing number of variables. The complex systems represented by such datasets arise in many socio-scientific domains, such as cyber-security, neurology, genetics and economics. In order to understand such systems, we must focus our analytic and experimental resources on investigating the most significant relationships. However, searching for significant relationships between variables is a complex task: the number of possible graphs encoding dependencies between variables grows exponentially as the number of variables increases, and the problem is only compounded when one considers how the graphs may vary over time. We present and discuss a computationally efficient method that: (i) recovers key dependency structure within a time series and (ii) detects structural changepoints where the dependencies between variables appear to change at a systematic level. The potential utility of our method is demonstrated by applying it to pressing real-world problems, such as detecting anomalies in computer networks and examining dependencies within gene activation networks.

Talk Time: 12:00-12:20

ANALYSIS OF LINGUISTIC BEHAVIOUR USING GRAPHICAL MODELS Craig Alexander, Ludger Evers, Tereza Neocleous and Jane Stuart-Smith University of Glasgow

Keywords: Graphical models, phonetics, chain graph models, hidden Markov models, linear mixed models

We discuss the use of chain graph models as a method of static analysis of specific aspects of language, using data from the Sounds of the City project based on Glaswegian speech data. The motivation for using chain graph models is twofold: firstly, to provide a visual description of the data that is simple to understand, beginning with a visualisation of a linear mixed model; secondly, chain graphs in particular allow a clear response variable. The remainder of the talk will discuss future work on tracking temporal dependency in speech using a hidden Markov model (HMM), with the unobserved variable introduced to track the temporal dependency.

Talk Time: 12:20-12:40

GRAPHICAL MODELS TO FIND SPARSE NETWORKS IN GENE EXPRESSION DATA Adria Caballe Mestres University of Edinburgh

Gaussian graphical models are widely used to estimate the underlying graph structure of conditional dependence between variables by determining their partial correlation or precision matrix. In the high-dimensional setting, the precision matrix is estimated using penalized likelihood, adding a penalization term which controls the amount of sparsity in the precision matrix and completely characterizes the complexity and structure of the graph. The most commonly used penalization term is the L1 norm of the precision matrix scaled by a regularization parameter, which determines the trade-off between sparsity of the graph and fit to the data. In the talk I will present an application of graphical modelling to gene expression data with the aim of discovering the genes relevant to colon tumours. Using these data, we recover conditional dependence graphs which are verified to display significant biological gene associations.
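The L1-penalized precision estimation described above (the graphical lasso) is available, for example, in scikit-learn. A small sketch on synthetic data with a known chain-structured graph, where the penalty level and edge threshold are illustrative choices only, not values from the talk:

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

# simulate 'expression' data whose true conditional-independence graph is
# a chain: only neighbouring variables are conditionally dependent
rng = np.random.default_rng(0)
d = 10
prec_true = np.eye(d)
for i in range(d - 1):
    prec_true[i, i + 1] = prec_true[i + 1, i] = 0.4
X = rng.multivariate_normal(np.zeros(d), np.linalg.inv(prec_true), size=500)

# L1-penalized precision estimation; alpha sets the sparsity of the graph
model = GraphicalLasso(alpha=0.05).fit(X)
edges = (np.abs(model.precision_) > 0.05) & ~np.eye(d, dtype=bool)
```

Nonzero off-diagonal entries of the estimated precision matrix are read as edges of the conditional dependence graph; in a gene expression application these would be the candidate gene associations.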

12.2.3 Sponsors’ Talks - 14:00-15:30

SESSION 1 (14:00-15:30): Room: LG 10, Chair: Amy Whitehead

Talk Time: 14:00-14:30

ROCHE BIOSTATISTICS: DOING NOW WHAT PATIENTS NEED NEXT Alun Bedding and Carol Reid Roche Products Ltd., UK

Roche is the world’s largest biotech company, with truly differentiated medicines in oncology, immunology, infectious diseases, ophthalmology and neuroscience. Roche is also the world leader in in vitro diagnostics and tissue-based cancer diagnostics, and leads the way in diabetes management. Our work in preventing, diagnosing and treating a range of health disorders through innovative products and services enhances both people’s health and quality of life. In this talk you will learn more about Roche globally and in the UK. We will describe the various roles available for biostatisticians at Roche and provide some examples of the work they are involved in.

Talk Time: 14:30-15:00

GAME, SET AND MATHS: PREDICTING SPORTS RESULTS THROUGH STATISTICAL MODELLING Tim Paulden ATASS Sports

In this presentation, Tim will briefly describe how probability and statistics can be used to help forecast sports results, with a particular focus on football and tennis.

Talk Time: 15:00-15:30

THE RSS: SUPPORTING YOUR STATISTICAL FUTURE Sarah Nolan Royal Statistical Society

A representative of the Royal Statistical Society’s Young Statisticians Section will outline how the Society supports and involves career-young statisticians, and how you can get involved in the section’s activities.

SESSION 2 (14:00-15:30): Room: LG 19, Chair: Gareth Davies

Talk Time: 14:00-14:30

ADVANCES IN THE DESIGN OF STATISTICAL EXPERIMENTS Ian Cox JMP

Statistically designed experiments have the potential for great practical utility in many diverse applications. But this potential can only be realised if practitioners have access to software they can actually use effectively, both for experimental design and analysis. Through examples, this presentation shows two relatively recent developments in design (optimal designs and definitive screening designs). Both usefully extend the design repertoire of scientists and engineers, allowing them to efficiently tackle a wider class of problems than classical designs. The examples will be presented using JMP, Statistical Discovery software from SAS.

Talk Time: 14:30-15:00

STATISTICS AT CAPITAL ONE UK Dan Kellett and David Robinson Capital One

Capital One are not only one of the UK’s top 10 credit card companies, we have been voted the UK’s Best Workplace for the last three years running. Capital One’s “test and learn” philosophy is central to the business and we are recognised as amongst the leaders in the finance sector in our data-driven approach to lending. In this talk we will provide an overview of the role of the statistics team within Capital One. This includes the development and maintenance of decisioning models, test design, investigation of new techniques and company-wide analytic consultancy (stats clinics and training). We will present examples of recent statistical challenges encountered, including the integration of Big Data into traditional credit scoring approaches for classifying customer spend.

Talk Time: 15:00-15:30

STATISTICS IN THE ENERGY INDUSTRY Tim Park and Wayne Jones Shell Projects and Technology

Shell is one of the largest independent producers of oil and gas, refined petroleum products and chemicals. A large amount of data is generated in manufacturing, marketing and corporate functions. The statistics group at Shell helps to convert these data into information used for informed decision making. In this presentation we will discuss various problems and statistical approaches relevant to this sector. We will discuss the importance of using well-established techniques such as experimental design, parametric and non-parametric models, and multivariate methods such as Principal Component Analysis (PCA) and Partial Least Squares (PLS).

We also discuss the importance of developing new methodology to tackle current problems such as the challenge of dealing with Big Data, which is ever increasing as the cost of acquisition and storage has dropped over time. Statistical approaches to understanding the structure of big data, and the inferences that can be drawn using techniques such as the 2D Fourier transform and spectral de-noising combined with multivariate and spatiotemporal methods, are important, as are advances in the field of Bayesian statistics. Examples of big data in the industry and applications of statistics to deal with them will be shared.

13 Poster Abstracts by Author

DISTANCE ANALYSIS OF FOOTBALL PLAYER PERFORMANCE DATA Serhat Emre Akhanli and Christian Hennig University College London

We present a new idea for mapping football (or soccer) players’ information by using multidimensional scaling, and for clustering football players. The actual goal is to define a proper distance measure between players in order to explore their similarity structure. We believe that this type of information can be very useful for football scouts when assessing players; journalists and football fans will also be interested in this information. The data were assembled from whoscored.com. Variables are of mixed type, containing nominal, ordinal, count and continuous information. In the data pre-processing stage, four different steps are followed for continuous and count variables to make the data ready for analysis: 1) representation (i.e., considerations regarding how the relevant information is most appropriately represented, e.g., relative to minutes played); 2) transformation (football knowledge, as well as the skewness of the distribution of some count variables, indicates that a transformation should be used to decrease the effective distance between higher values compared to the distances between lower values); 3) standardisation (in order to make within-variable variations comparable); and 4) variable weighting, including variable selection and tuning of the relative influence of nominal and numerical variables. As part of the representation step, two types of player position information, in binary and proportional form, are analysed, and two different dissimilarity measures based on the player’s position on the field are used. In a final phase, all the different types of distance measures are combined using the principle of the Gower dissimilarity. We show outcomes of multidimensional scaling and, potentially, clustering.
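The Gower-style combination of per-variable dissimilarities can be sketched as follows. The variables, ranges and weights here are hypothetical illustrations, not the authors' actual scheme:

```python
def gower(x, y, kinds, ranges, weights):
    """Weighted Gower dissimilarity for mixed-type records."""
    total, wsum = 0.0, 0.0
    for xi, yi, kind, r, w in zip(x, y, kinds, ranges, weights):
        if kind == "nominal":
            total += w * (xi != yi)        # simple mismatch for nominal variables
        else:
            total += w * abs(xi - yi) / r  # range-normalised difference for numeric
        wsum += w
    return total / wsum

# Two hypothetical players: (position, goals per 90 minutes, pass accuracy %)
a = ("forward", 0.6, 78.0)
b = ("midfield", 0.2, 85.0)
d = gower(a, b, ["nominal", "numeric", "numeric"], [1.0, 1.0, 100.0], [1.0, 1.0, 1.0])
```

The variable-weighting step described in the abstract corresponds to choosing the weights vector.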

BAYESIAN INFERENCE FOR CONTINUOUS TIME MARKOV CHAINS Randa Alharbi University of Glasgow

Bayesian inference for Markov chains has wide application in various complex systems. Markov chain Monte Carlo (MCMC) has become an important tool in statistical inference. However, performing MCMC requires evaluation of the likelihood, which is often not available analytically. This problem motivates us to consider a method that allows us to carry out Bayesian inference without explicit use of the likelihood. Approximate Bayesian computation (ABC) provides an approximation to the posterior distribution using simulation from the model. In order to evaluate the quality of the ABC approach, a comparison is made between the ABC algorithm and the particle MCMC approach, which uses a similar posterior approximation.
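As a concrete illustration of the ABC idea (a generic rejection sampler on a toy model, not the speaker's implementation): holding times of a continuous-time chain are exponential, so here we pretend the data are exponential with an unknown rate, and accept prior draws whose simulated summary statistic is close to the observed one:

```python
import numpy as np

rng = np.random.default_rng(1)
obs = rng.exponential(scale=1 / 2.0, size=100)  # "observed" holding times, true rate 2.0

accepted = []
eps = 0.05  # tolerance on the summary statistic (here, the sample mean)
for _ in range(20000):
    rate = rng.uniform(0.1, 5.0)                     # draw from the prior
    sim = rng.exponential(scale=1 / rate, size=100)  # simulate from the model
    if abs(sim.mean() - obs.mean()) < eps:
        accepted.append(rate)                        # keep rates whose data "look like" obs

posterior_mean = float(np.mean(accepted))
```

The accepted draws approximate the posterior without the likelihood ever being evaluated; the quality of the approximation depends on the summary statistic and the tolerance eps.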

STATISTICS OF MARKETING IN E-COMMERCE Iman Al-Hasani Durham University

Advertisers spend an enormous amount of money on online advertising to influence user behaviour. Online advertising allows advertisers access to detailed data relating to user behaviour, such as paid clicks and website visits. These data, however, do not provide an indication of the incremental value of the advertising, and there is no established methodology for measuring the incremental impact of such advertising. Geo experiments, however, have been considered a good approach in many situations. In such an experiment a region of interest is partitioned into a set of geographical areas (geos). One of the difficulties is that the generation of these geos is not straightforward, despite the availability of the location targeting services provided by Google AdWords. AdWords location targeting criteria include country, city, region, province, county, postal code and designated market areas (DMA). In the United Kingdom, AdWords target locations are based on country, city or county. The overall goal of the research is to explore statistical aspects of this process, in order to answer questions such as: how do we analyse data obtained from geographical experiments? How do we design the sampling process, and how do we relate the effect of advertising spend to an increase in sales revenue?

MODELLING OF EXTREME VALUES IN TREE RING DATA Omar Alzeley University of Leicester

Extreme value theory (EVT) has been used in diverse fields such as hydrology, meteorology, finance, insurance and environmental science. The purpose of this research is to model and investigate extreme values in tree ring data. We found that a possible candidate for modelling the tree ring data is the Weibull distribution. The fitted model was examined using several statistical techniques.
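As an illustration of this kind of fit (with simulated stand-in data, not the actual tree ring measurements), SciPy's weibull_min can estimate the parameters by maximum likelihood, and a Kolmogorov-Smirnov test gives one simple check of the fit:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
rings = 2.0 * rng.weibull(1.5, size=300)  # hypothetical ring-width measurements

# Maximum likelihood fit with the location parameter fixed at zero
shape, loc, scale = stats.weibull_min.fit(rings, floc=0)

# One simple goodness-of-fit check
ks = stats.kstest(rings, "weibull_min", args=(shape, loc, scale))
```

Note that a KS test against parameters estimated from the same data is anti-conservative; it is a rough diagnostic rather than a formal test.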

ADAPTIVE MCMC METHODS FOR INFERENCE ON MULTIPLE CHANGEPOINT MODELS Alan Benson and Nial Friel University College Dublin

Multiple changepoint models are an important concept in statistical modelling. Their aim is to identify changes in model parameters over time. It is important to identify both the number of changes in a dataset and the points at which these changes occur. From a Bayesian perspective the aim is to form a posterior for both the number and the positions of changepoints. These posteriors can be calculated exactly using O(n²) recursions (Fearnhead, 2006), but such recursions become slow and unable to deal with larger, denser datasets; the current work therefore focuses on adaptive MCMC methods.

PROBABILISTIC DISTANCES BETWEEN PHYLOGENETIC TREES Maryam Garba, Tom Nye and Richard Boys Newcastle University

There are many different metrics for measuring how similar two evolutionary trees are, and these are widely used by biologists in a variety of analyses. However, existing metrics ignore the fact that evolutionary trees are really probability models for gene sequence data. We therefore develop metrics between trees based on the properties of the corresponding induced distributions on gene sequence data. We intend to use the Hellinger distance, the Kullback-Leibler divergence and the Jensen-Shannon distance to measure the distance between evolutionary trees.
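For distributions on a finite set of outcomes (such as site patterns), the three quantities mentioned can be computed directly. A small sketch with hypothetical pattern probabilities, not the authors' trees:

```python
import numpy as np

def hellinger(p, q):
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))  # assumes q > 0 wherever p > 0

def js_distance(p, q):
    m = 0.5 * (p + q)
    return float(np.sqrt(0.5 * kl(p, m) + 0.5 * kl(q, m)))  # sqrt of the JS divergence

# Hypothetical site-pattern distributions induced by two trees
p = np.array([0.25, 0.25, 0.25, 0.25])
q = np.array([0.40, 0.30, 0.20, 0.10])
```

Unlike the KL divergence, the Hellinger and Jensen-Shannon distances are symmetric and bounded, which makes them natural candidates for a tree metric.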

ESTIMATING DYNAMIC GRAPHICAL MODELS FROM MULTIVARIATE TIME-SERIES Alex Gibberd University College London

Our data-hungry society is not only harvesting more data points, but also measuring an ever-increasing number of variables. The complex systems represented by such datasets arise in many socio-scientific domains, such as cyber-security, neurology, genetics and economics. In order to understand such systems, we must focus our analytic and experimental resources on investigating the most significant relationships. However, searching for significant relationships between variables is a complex task: the number of possible graphs encoding dependencies between variables becomes exponentially large as the number of variables increases, and such problems are only compounded when one considers how the graphs may vary over time. We present and discuss a computationally efficient method that (i) recovers key dependency structure within a time-series and (ii) detects structural changepoints where the dependencies between variables appear to change at a systematic level. The potential utility of our method is demonstrated by applying it to pressing real-world problems, such as detecting anomalies in computer networks and examining dependencies within gene activation networks.

MESSAGE PASSING AS AN OPTIMISATION METHOD FOR ELECTRICITY DISTRIBUTION IN A VOLATILE GRID Elizabeth Harrison, David Saad and K. Y. Michael Wong Aston University

The introduction of renewable energy generators into the electricity grid is putting additional pressure on the stability and management of the electricity network. This is because, in addition to the fluctuations in demand, one has limited control over the outputs of most renewable energy sources and cannot accurately predict them at any given time. The current method of controlling electricity distribution is based on split-second incremental adaptation of the distribution in response to changes, and has shortcomings in dealing with volatility and uncertainty. Global optimisation of the distribution is a difficult problem and is computationally prohibitive. The aim of this project is to suggest a distributed approximate probabilistic solution to the optimisation based on a method known as Message Passing [1]. This alternative principled optimisation method is based on local calculations; it inherently accommodates uncertainties, is of modest computational complexity and provides good approximate solutions. This work highlights the potential and capabilities of message-passing techniques for managing electricity networks by addressing difficult issues such as fluctuations and uncertainties. We aim to extend the use of this method to address other aspects, particularly the minimisation of production costs. [1] K. Y. M. Wong and D. Saad (2007). Inference and optimization of real edges on sparse graphs: a statistical physics perspective. Physical Review E 76, 011115.

EFFICIENT HIGH-DIMENSIONAL GAUSSIAN PROCESS REGRESSION TO CALCULATE THE EXPECTED VALUE OF PERFECT PARTIAL INFORMATION IN HEALTH-ECONOMIC EVALUATIONS Anna Heath, Ioanna Manolopoulou and Gianluca Baio University College London

The Expected Value of Perfect Partial Information (EVPPI) is a decision-theoretic measure of decision uncertainty used principally in health-economic evaluations. This measure has been recommended as the optimal way to quantify the impact of parameter uncertainty in decisions (a process that must be included in all health-economic evaluations in the UK). However, the computational time required to calculate this measure has prevented its widespread application. Recently, however, it has been demonstrated that non-parametric regression methods can be used to approximate this measure, significantly reducing the computational power required. In low-dimensional cases fast regression methods can be used, but in higher dimensions Gaussian Process (GP) regression is recommended. The computational cost of GP regression increases with the number of dimensions, meaning that this high-dimensional GP regression is still prohibitively expensive for most health-economic evaluations. We propose an alternative method to estimate the hyperparameters of the GP using ideas from spatial statistics and the Stochastic Partial Differential Equation components of the R-INLA software package. R-INLA is a powerful package for fast approximate Bayesian inference. Our model is equivalent to standard GP regression, but employs a fictional spatial structure, along with projections, to describe the correlation among the parameters using a Matérn covariance matrix. For complex problems, this development calculates the EVPPI around 100 times faster than standard GP regression methods.

POINT PROCESS MODEL FOR NEURAL SPIKE TRAINS Gyeongtae Hong Lancaster University

Characterising neuron spike train firing as a function of the external stimulus applied in an experiment and of the intrinsic dynamics of neurons, such as absolute and relative refractory periods and history effects, is important in neuroscience. Such a characterisation is very complex, and a broad class of models is required to capture these details. One useful method for characterising spike train activity is a point process model; for instance, such models have successfully characterised the spiking activity of rat hippocampal place cells and sea hare nerve cells. There are many parametric point process models based on likelihood analysis, which assume that the conditional intensity function belongs to a class of parametric functions. Nonparametric methods for estimating the conditional intensity function of a point process model, on the other hand, are difficult to compute. However, nonparametric estimation has several advantages; in particular, nonparametric methods do not rely on the data belonging to any particular distribution (they are distribution-free). We propose a nonparametric point process which is computationally efficient for the intrinsic dynamic model, and a parametric point process using a Hawkes process. We formulate the conditional intensity function in terms of a discrete-time Volterra expansion of the baseline spike rate and the neuron’s spiking history, and use a generalised linear model in a regression spline framework to address the maximum likelihood estimation problem. We illustrate our approach by fitting the model to 25 input and 18 output neuron spike trains. Residual analysis is performed via the time-rescaling theorem.

MULTIVARIATE CURVE RESOLUTION FOR THE MIXTURE ANALYSIS OF ACTIVE INGREDIENTS IN PHARMACEUTICAL TABLETS FROM ENERGY DISPERSIVE X-RAY DIFFRACTION DATA Peter Kenny University College London

The soft-modelling chemometric method of Multivariate Curve Resolution (MCR) has been applied to Energy Dispersive X-ray Diffraction data obtained from lab-prepared pharmaceutical tablets in a designed mixture experiment. Closure and non-negativity constraints have been applied in addition to the Alternating Least Squares algorithm to quantify the concentrations of paracetamol, caffeine and microcrystalline cellulose in the tablets. In addition, MCR has the advantage of resolving chemically meaningful loading vectors, allowing the pure spectra of the components to be resolved. Results are compared to the well-established Partial Least Squares calibration method and are shown to be of comparable prediction error.

CONTROLLABILITY AND STABILIZATION FOR A CLASS OF NONLINEAR TIME-VARYING SYSTEMS Rong Li and Bujar Gashi University of Liverpool

We consider a specific class of nonlinear time-varying systems through a pseudo-linear form representation analysis involving a fixed-point theorem. With the aid of fundamental linear control results, it is shown that sufficient conditions for controllability can be obtained simply by assessing the algebraic rank condition. A simple numerical example from E. J. Davison & E. G. Kunze (1970) is used to illustrate the improvement of that result. In addition, refining the eigenstructure-based stability analysis for nonlinear systems of H. Ghane & Mohammad B. Menhaj (2013), we propose sufficient conditions for global asymptotic stability in a more general setting. Using these conditions, we then explore a counterexample from S. Muhammad & J. Woud (2009), which demonstrates that the further assumption from W. Langson & A. Alleyne (2000), namely that ||exp[A(x(t))t]|| has an upper bound for all x(·) ∈ Cn[0, +∞), is not sufficient for global asymptotic stability. Based on this stability analysis, a special class of nonlinear systems can be stabilised by designing a state-feedback controller and a Luenberger observer in terms of the related Kalman canonical decomposition. The theorems obtained for deterministic systems can be transformed into the stochastic setting under some restraints.

A VERIFICATION THEOREM FOR THE EXISTENCE OF NASH EQUILIBRIA IN A NON-ZERO SUM GAME WITH CLASSIC AND IMPULSE CONTROLS Randall Martyr and John Moriarty University of Manchester

Consider a balancing services contract between an electricity transmission system operator and the owner of an electric energy storage device (a battery operator). The system operator chooses times (up to a finite number) and amounts of electricity (up to a maximum amount) to be delivered by the battery operator. At each request time, the delivered energy instantaneously affects the imbalance between demand and supply on the power system. Between request times the battery operator charges the store by purchasing electricity from the spot market. The problem for the system operator is to choose the appropriate delivery times and energy amounts which minimise the total imbalance in its portfolio over a prescribed time interval. Simultaneously, the battery operator would like to minimise the cost of its energy purchases and any additional penalties for partial delivery at the request times. These optimisation problems are combined into a non-zero sum stochastic differential game where one player (the battery operator) uses a continuous control and the other player (the system operator) uses a stopping/impulse control. The game may be reformulated as an iteration over intermediary non-zero sum games of control and stopping. A verification theorem for the existence of Nash equilibria for the original game is then obtained via the Hamilton-Jacobi-Bellman variational PDEs associated with the non-zero sum games of control and stopping.

THE NATURE OF CRIME HOTSPOTS Rebecca McKay University of Glasgow

My poster will look at the nature of crime hotspots, which is the focus of my PhD. I aim to use quantitative methods in the social sciences with a particular focus on criminology. My PhD is very applied and uses clustering techniques to identify crime hotspots within the region of Strathclyde in Scotland. My poster will give a brief overview of the social theories relating to crime and place. I will look at the different ways of identifying crime hotspots and why they are problematic (for example, spatial and temporal issues), and why the crime type chosen is important. I will then establish the three main clustering techniques that I will use to analyse the dataset, and I will provide graphs and charts displaying the information in an easily accessible way.

THE ‘STEP AND TURN’ ANIMAL MOVEMENT MODEL IN CONTINUOUS-TIME Alison Parton and Paul Blackwell University of Sheffield

Understanding how wildlife paths evolve has the potential to affect a range of areas within ecology. Yet the questions of why, when and where animals move are still unanswered in many areas of ecology. Wildlife movement paths are often complicated and difficult to interpret; they regularly appear to exhibit ‘randomness’ and arise from a complex mixture of internal behavioural states, physiological constraints and memory-related processes. Due to their intuitiveness and accessibility to non-statisticians, the current group of statistical animal movement models with the most widespread use by ecologists are those based on observed ‘turning angles’ and ‘step lengths’, derived from an animal’s observed locations over time. However, these models are based in a discrete-time setting in which the animal’s location is only defined at pre-determined discrete time-points. This poster will introduce this popular range of models and demonstrate how they can be applied to observed animal movement data. It will be discussed why there is a need for movement models formulated in continuous-time rather than the more widespread discrete-time models, before introducing current work on the development of an ‘equivalent’ continuous-time version of the ‘step and turn’ model.

USING POLYGENIC RISK SCORES TO PREDICT DISEASE SEVERITY AT ONSET AND RESPONSE TO TREATMENT IN RHEUMATOID ARTHRITIS PATIENTS Jenna Strathdee1, John C. Taylor1, YEAR Consortium, Tim Bongartz2, James I. Robinson1, Paul Emery1, Ann W. Morgan1 and Jennifer H. Barrett1 University of Leeds1, Mayo Clinic2

Many genetic variants have been associated with developing rheumatoid arthritis (RA), but most individual variants have a very small effect on susceptibility. It is not known whether patients with a higher genetic predisposition to RA have more severe disease at presentation or respond differently to treatment. The aim here is to combine susceptibility variants into one variable, a polygenic risk score (PRS), and examine its association with disease severity at presentation and treatment response. Subjects (n = 342) were from the Yorkshire Early Arthritis Register and were all treated with methotrexate. 101 genetic variants associated with RA susceptibility were combined into a weighted PRS. For each variant the number of risk alleles an individual carries was multiplied by the log odds ratio for the per-allele effect of a variant on risk, estimated in a previous meta-analysis; the PRS was formed by summing over all variants. To test this score, it was included in logistic models looking at rheumatoid factor and family history of RA, both of which are thought to be genetically driven. The PRS was then used as a predictor in linear regression analysis of several measures of baseline disease severity and of change in these variables after 6 months of treatment, adjusting each model for age and sex. The PRS was a significant predictor of both rheumatoid factor (p < 0.001) and family history (p < 0.001). For the other outcomes a higher PRS was associated at the 5% level with a lower baseline swollen joint count (p = 0.03) and tender joint count (p = 0.03), but no other outcome measure. The prediction models will now be extended to include other genetic factors associated with RA risk.
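The weighted score described here is simply a genotype matrix multiplied by a vector of per-allele log odds ratios. A sketch with hypothetical numbers (the real weights come from the meta-analysis, not from simulation):

```python
import numpy as np

rng = np.random.default_rng(3)
n_subjects, n_variants = 342, 101

# Hypothetical risk-allele counts (0, 1 or 2 per variant) and per-allele log odds ratios
G = rng.integers(0, 3, size=(n_subjects, n_variants))
log_or = rng.normal(loc=0.05, scale=0.03, size=n_variants)

# Polygenic risk score: for each subject, sum over variants of count x log(OR)
prs = G @ log_or
```

The resulting vector of scores can then be entered as a predictor in the logistic and linear regression models described above.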

FUNCTIONAL DATA ANALYSIS FOR GAIT PATTERNS Bairu Zhang and Heiko Grossmann Queen Mary University of London

Functional data analysis is an advanced methodology for studying gait data, which are collected over time. In this poster, two different models for functional gait data are displayed. For healthy subjects, the left and right lower limbs move synchronously: one curve can be used to represent each kinematic variable, and all response curves in the model are independent, so a fixed-effects model with time-independent covariates can be applied to the functional gait data. For subjects with hemiplegic cerebral palsy, the left and right gait patterns differ during movement: there is more than one gait curve for each kinematic variable, and response curves from the same patient are correlated. Both fixed and random effects are important in the model for cerebral palsy gait, and therefore a mixed-effects model with time-independent covariates is applied to the functional gait data of cerebral palsy patients.

RSC 2016: University College Dublin

39th Research Students' Conference in Probability and Statistics

University College Dublin

14th – 17th June 2016

www.rsc2016.ie

14 Sponsors’ Advertisements

New & Essential Statistics Titles from CRC Press

Browse our latest Statistics catalog at http://tinyurl.com/pu2f7zx

SAVE 20% when you order online and enter Promo Code AZP96. FREE standard shipping when you order online (expires 31/12/2015).

www.crcpress.com CRC Press Taylor & Francis Group

email: [email protected] | 1-800-634-7064 • 1-561-994-0555 • +44 (0) 1235 400 524


http://bir.biometricsociety.org/

Membership of the International Biometric Society is FREE for PhD students!

If you haven’t already joined, there will be a chance to sign up during the sponsors’ reception. Alternatively, go to the British and Irish Region website and join there!

About the Society:

The British Region of the International Biometric Society was founded in 1948 by eminent statisticians and biologists of the time, including R. A. Fisher and J. B. S. Haldane. The British and Irish region (formed in 2006) covers the United Kingdom of Great Britain and Northern Ireland and also The Republic of Ireland.

The general objective of the society is to promote and extend the use of mathematical and statistical methods in pure and applied biological sciences.

Members of the International Biometric Society get online access to the journal Biometrics and discounted rates at meetings – again, see the website for details.

EXPLORE

Introduced in 1989 with scientists and engineers in mind, JMP® software links powerful statistics to interactive graphics. It keeps data in flow, no matter whether it’s small, tall or wide. Because there is a graph for every statistic, you can pursue your analysis without restraint.

Try JMP software for yourself at jmp.com/trial

Available for Mac® and Windows

SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies. © 2015 SAS Institute Inc. All rights reserved. 1190876UK0515

15 RSC History

39 2016 Dublin
38 2015 Leeds
37 2014 Nottingham
36 2013 Lancaster
35 2012 Southampton
34 2011 Cambridge
33 2010 Warwick
32 2009 Lancaster
31 2008 Nottingham
30 2007 Durham
29 2006 Glasgow
28 2005 Cambridge
27 2004 Sheffield
26 2003 Surrey
25 2002 Warwick
24 2001 Newcastle
23 2000 Cardiff and University of Wales
22 1999 Bristol
21 1998 Lancaster
20 1997 Glasgow
19 1996 Southampton
18 1995 Oxford
17 1994 Reading
16 1993 Lancaster
15 1992 Nottingham
14 1991 Newcastle
13 1990 Bath
12 1989 Glasgow
11 1988 Surrey
10 1987 Sheffield
(9)
(8) 1985 Oxford
(7) 1984 Imperial College
(6)
(5)
(4)
(3) 1982 Cambridge
(2) 1981 Bath
(1) 1980 Cambridge

16 Delegate List

Name | Email | Affiliation | Research Interests | Year
Umar Mallam Abdulkarim | [email protected] | University of Kent | Stochastic SIR household epidemic model | 2
Samira Abushilah | [email protected] | University of Leeds | | 2
Serhat Emre Akhanli | [email protected] | University College London | Cluster analysis | 1
Craig Alexander | [email protected] | University of Glasgow | Graphical models, Hidden Markov models, machine learning | 1
Mai Alfahad | [email protected] | University of Leeds | Protein structure, multivariate analysis, Bayesian statistics | 1
Fatimah Alghamdi | [email protected] | University of Leeds | Bayesian statistics | 1
Randa Alharbi | [email protected] | University of Glasgow | Bayesian inference over Markov chain models | 2
Iman Al-hasani | [email protected] | Durham University | Applied statistics, E-commerce statistics | 2
Aziz Aljuaid | [email protected] | University of Leeds | Bayesian multiple quantile regression | 2
Wael Abdulateef Jasim Al-Taie | [email protected] | Newcastle University | Bayesian networks, probabilistic models | 1
Omar Alzeley | [email protected] | University of Leicester | Statistical modelling of extreme values | 1
Joshua Ayres | [email protected] | University of Bath | MCMC methods, spatial statistics | 2
Abla Azalekor | [email protected] | Heriot-Watt University | Applied probability | 2
Alan Benson | [email protected] | University College Dublin | Adaptive MCMC, Change point analysis, Recursive segmentation | 2
Tom Berrett | [email protected] | University of Cambridge | Nonparametric statistics, entropy estimation | 1
Ignatius Boateng | [email protected] | University of Leeds | Gaussian process emulation, uncertainty in complex computer models | 1
Nebahat Bozkus | [email protected] | University of Leeds | Wavelets, lifting, phylogenetics | 1
Tom Burnett | [email protected] | University of Bath | Adaptive enrichment, clinical trials, Bayesian decision making | 2
Adrian Byrne | [email protected] | University of Manchester | Longitudinal data analysis, multilevel modelling, quantile regression | 1
Rosanna Cassidy | [email protected] | University of Nottingham | Statistics, epidemics, genomic data | 2
Ho Hin Henry Chan | [email protected] | Lancaster University | Multiple imputation, multivariate logistic regression | 3
Zexun Chen | [email protected] | University of Leicester | Gaussian process, machine learning, kernel selection, stock market | 2
Svetlana Cherlin | [email protected] | Newcastle University | Phylogenetics | 3
Richard Culliford | [email protected] | University of Reading | Phylogenetic/population trees, sequential Monte Carlo | 1
Gareth Davies | [email protected] | Cardiff University | Official statistics, survey sampling, calibration, non-response, optimization | 2
Alice Davis | [email protected] | University of Bath | Survival analysis, information geometry | 1
Benjamin Davis | [email protected] | University of Nottingham | Probability, epidemics, stochastic processes, branching processes, networks | 2
Olga Egorova | [email protected] | University of Southampton | Optimal experimental design | 2
Yiolanda Englezou | [email protected] | University of Southampton | Gaussian process | 1
Serveh Sharifi Far | [email protected] | University of St Andrews | Loglinear models, contingency tables, model identifiability | 2
Huang Feng | [email protected] | London School of Economics and Political Science | Statistics, time series | 1
Maryam Garba | [email protected] | Newcastle University | Phylogenetics, probability theory | 1
Tusharkanti Ghosh | [email protected] | University of Glasgow | Statistical genomics, Bayesian methods | 2
Alex Gibberd | [email protected] | University College London | Graphical models, time series, convex optimisation, unsupervised learning | 2
Richard Glennie | [email protected] | University of St Andrews | Statistical ecology | 1
Eoin Gray | epgray1@sheffield.ac.uk | University of Sheffield | Statistics, medical statistics, lung cancer, prediction models | 2
Adam Griffin | adam.griffi[email protected] | University of Warwick | Mathematical epidemiology, probability | 2

79 Pieralberto [email protected] University of Sequential Monte Carlo 2 Guarniero Warwick Wenyan Hao [email protected] University of Quantum language for finance 2 Leicester via the generator approach, derivative pricing Elizabeth Harrison [email protected] Aston University Electricity distribution using 3 probabilistic modelling Hamed [email protected] Brunel University 2 Haselimashhadi Anna Heath [email protected] University College Health economics, INLA, 1 London Bayesian non-parametrics Gyeongtae Hong [email protected] Lancaster University Neural science 4 Md Anower Hossain [email protected] London School of Missing data in cluster 2 Hygiene and Tropical randomised trials Medicine Guowen Huang [email protected] University of Air pollution health effects 2 Glasgow Joseph Hunt jhunt1@sheffield.ac.uk University of 1 Sheffield Muhammad Safwan [email protected] Newcastle Time series 2 Ibrahim University Samuel Jackson [email protected] Durham University Bayesian emulation, history 1 matching, Bayes linear statistics Muhammad Irfan bin [email protected] Newcastle Bayesian statistics, survival 1 Abdul Jalal University analysis, medical statistics Taghreed Jawa [email protected] University of Medical statistics 2 Strathclyde Stephen Johnson [email protected] Newcastle Bayesian inference, rank data, 1 University MCMC Hadeeel Kalktawi [email protected] Brunel University 2 Panicha [email protected] University of Capture-Recapture 2 Kaskasamkul Southampton Peter Kenny [email protected] University College Multivariate analysis, 1 London chemometrics, analytical chemistry, X-ray physics 80 Aamir Khan [email protected] Newcastle Bayesian inference, stochastic 1 University processes Rana Hamza [email protected] University of for mixture 1 Khashab Southampton experiments Eirini Koutoumanou [email protected] University College Multivariate data, R 4 London Vasiliki Koutra [email protected] University of Designing experiments on social 2 Southampton networks Sophia Kyriakou 
[email protected] University College 1 London YingYing Lai [email protected] Newcastle Bayesian statistics, MCMC, time 1 University series Luting Li [email protected] London School of Piecewise deterministic Markov 1 Economics and process, credit risk, market risk, Politicital Science portfolio replicator Rong Li [email protected] University of Stochastic control, stability 2 Liverpool analysis Shiju Liu [email protected] London School of Applied probability, insurance 2 Economics and risk, mathematical finance Political Science Xi Liu [email protected] Brunel University Clustering, big data, mixture 1 models Ernest Mangantig [email protected] Leeds Institute of Statistical genetics 2 Cancer and Pathology Irene Marinas [email protected] University of Shape analysis, Gaussian 2 Glasgow process Randall Martyr [email protected] University of Optimal control, stochastic 3 Manchester control, stochastic games Joe Matthews [email protected] Newcastle Bayesian modelling, road safety, 1 University regression to the mean Alasdair McIntosh [email protected] University of Statistical genetics, medical 2 Glasgow statistics

81 Fabrizio Messina fmessina1@sheffield.ac.uk University of Medical Statistics 2 Sheffield Adria Caballe [email protected] University of High-dimensional data, 2 Mestres Edinburgh graphical models, genetics Rebecca McKay [email protected] University of Crime hotspots, clustering 2 Glasgow methods Angelo Moretti [email protected] University of Small area estimation methods 1 Manchester Keith Newman [email protected] Newcastle Bayesian inference, linear 3 University Gaussian models, sparse matrix methods Chibuzor Nnantu [email protected] Lancaster University Epidemic models, MCMC, 2 Bayesian statistics Colleen Nooney [email protected] University of Leeds Coevoluton, protein structure, 3 phylogenetics Jude Chukwura Obi [email protected] University of Leeds Statistical Learning 3 Vincent Onkabetse [email protected] University of Leeds Non-linear models and 1 applications in medicine Jamie Owen [email protected] Newcastle Bayesian, pseudo marginal, 4 University ABC, likelihood free Alison Parton aparton2@sheffield.ac.uk University of Statistical ecology, animal 1 Sheffield movement Khuneswari Gopal [email protected] University of Model selection, model 3 Pillay Glasgow averaging, missing data Riccardo Rastelli [email protected] University College Social network analysis, 2 Dublin clustering Yordan Raykov [email protected] Aston University 2 Joanne Rothwell j.c.rothwell@sheffield.ac.uk University of Regression to the mean, clinical 1 Sheffield trials, prediction, effect size Anusua Singh Roy [email protected] Edinburgh Napier Big data, Bayesian modelling 2 University and clustering, statistical modelling of online behavioural data, multidimensional clustering, predicting survival 82 times Qingying Shu [email protected] University of Time series, spatial statistics 2 Glasgow Claire Simons [email protected] MRC Biostatistics Cost-effectiveness, 1 Unit, Cambridge heterogeneity, meta-analysis, structural uncertainty Rebecca Simpson r.m.simpson@sheffield.ac.uk University of 
Children with asthma, 1 Sheffield intervention, prediction A’yunin Sofro [email protected] Newcastle Modelling, spatial analysis, 3 University multivariate Jessica Stockdale [email protected] University of Epidemic modelling, Bayesian 1 Nottingham statistics, MCMC Jenna Strathdee [email protected] Leeds Institute of Statistical genetics 1 Cancer and Pathology Nicholas Tawn [email protected] University of 2 Warwick Tayfun Terzi [email protected] London School of Person-fit indices, latent 2 Economics and variable models, online Political Science surveys, micro-jobbers, paradata, response pattern identification, appropriateness measurement, model aberrant responders Daniel Tompsett [email protected] University of Simultaneous confidence bands 3 Southampton and effective doses, logistic regression Maria Toomik [email protected] University College Regularisation approaches to 1 London highly structured data, multiresolution analysis Tita Vanichbuncha [email protected] University of Kent Ranking analysis 2 Rui Vieira [email protected] Newcastle SMC, big data 2 University 83 Cunyi Wang [email protected] University of Social network analysis 1 Glasgow Min Wang [email protected] University of Probability, modelling 1 Leicester Tianmiao Wang [email protected] University of Bristol Nonparametric regression, time 3 series Sophie Watson [email protected] University of Bristol Approximate Bayesian 2 Computation, Monte Carlo methods, genetics applications Mark Webster [email protected] University of Leeds Monte Carlo methods, 3 probability and statistics Amy Whitehead a.whitehead@sheffield.ac.uk University of Clinical trials, experimental 4 Sheffield design Craig Wilkie [email protected] University of Environmental statistics, 2 Glasgow downscaling Pengcheng Zeng [email protected] Newcastle Functional data analysis 1 University Bairu Zhang [email protected] Queen Mary for 2 University of London functional data Yajing Zhu [email protected] London School of Structural equation 
modelling, 1 Economics and longitudinal analysis Political Science Anyi Zou [email protected] University of Statistical modelling, categorical 2 Warwick data analysis 84 17 Voting Slip for Best Talks and Best Poster Prizes will be awarded to the three best talks and the best poster as voted for by yourselves, the delegates.

Please use this page to vote for your three favourite talks (in any order) and your favourite poster, and hand it in by 14:00 on Wednesday 5th August. The winners will be announced at the conference dinner on the evening of Wednesday 5th August.

Best talks:

Best poster:

Back of RSC 2015 Voting Slip

School of Mathematics
University of Leeds
Leeds LS2 9JT
rsc2015.co.uk