Timetable of Events

Monday 23rd March

13:00 Registration of delegates (George Fox Foyer).

15:00 Afternoon Tea (George Fox Foyer).

16:00 Opening Address & Plenary Session (George Fox Building).
      16:00 Opening Address: Amanda Chetwynd
      16:15 Plenary Talk I: David Hand
      17:00 Plenary Talk II: Peter Diggle

19:00 Dinner (County South Restaurant).

20:00 Traditional Pub Quiz (County Restaurant Bar).

Tuesday 24th March

08:00 Breakfast (County Restaurant).

09:10 Session 1 (George Fox Building).

10:45 Refreshments (George Fox Foyer).

11:10 Session 2 (George Fox Building).

13:10 Lunch (George Fox Foyer).

14:10 Session 3 (George Fox Building).

15:10 Poster Session and Refreshments (George Fox Foyer).

16:20 Session 4 (George Fox Building).

18:00 Dinner (County Restaurant).

Evening Entertainment
19:00 First Bus Collection from Underpass to City Centre (Cinema & Castle Tour)
19:30 Second Bus Collection from Underpass to City Centre (Bar)
22:30 First Bus Collection from Lancaster Infirmary to Lancaster Campus
00:00 Second Bus Collection from Lancaster Infirmary to Lancaster Campus
00:30 Third Bus Collection from Lancaster Infirmary to Lancaster Campus

Wednesday 25th March

08:00 Breakfast (County Restaurant).

09:10 Session 5 (George Fox Building).

10:45 Refreshments (George Fox Foyer).

11:10 Session 6 (George Fox Building).

13:10 Lunch (George Fox Foyer).

14:10 Session 7 (George Fox Building).

15:45 Sponsors’ Wine Reception (George Fox Foyer).

Conference Dinner
18:30 First Bus Collection from Underpass to Dinner
19:10 Second Bus Collection from Underpass to Dinner
20:00 Dinner and Ceilidh
23:45 First Bus Collection to Lancaster Campus
00:30 Second Bus Collection to Lancaster Campus

Thursday 26th March

08:00 Breakfast (County Restaurant).

10:00 Delegates Depart.

Contents

1 Welcome from the Organisers

2 The City and University

3 Campus Map

4 University Facilities

5 Accommodation at County College

6 Conference Details
  6.1 Meals
  6.2 Sponsors’ Wine Reception

7 Help, Information and Telephone Numbers
  7.1 Computing and Internet Access

8 Plenary Session
  8.1 Professor Amanda Chetwynd (Lancaster University)
  8.2 Professor David Hand (Imperial College London)
  8.3 Professor Peter Diggle (Lancaster University / Johns Hopkins University)

9 Instructions
  9.1 For Chairs
  9.2 For Speakers
  9.3 For Displaying a Poster
  9.4 Prizes

10 List of Sponsors’ Talks

11 Talks Schedule
  11.1 Monday 23rd March
  11.2 Tuesday 24th March
  11.3 Wednesday 25th March

12 Talk Abstracts by Session
  12.1 Tuesday 24th March
    12.1.1 Session 1a: Experimental Design
    12.1.2 Session 1b: Non-Parametric
    12.1.3 Session 1c: Probability
    12.1.4 Session 2a: Environmental
    12.1.5 Session 2b: Computational
    12.1.6 Session 2c: Medical I
    12.1.7 Session 3a: General I
    12.1.8 Session 3b: Dimension Reduction
    12.1.9 Session 3c: Genetics and Systems Biology I
    12.1.10 Session 4a: Operational Research
    12.1.11 Session 4b: Time Series
    12.1.12 Session 4c: Diffusions
  12.2 Wednesday 25th March
    12.2.1 Session 5a: Genetics and Systems Biology II
    12.2.2 Session 5b: Spatial
    12.2.3 Session 5c: General II
    12.2.4 Session 6a: General III
    12.2.5 Session 6b: Multivariate Statistics
    12.2.6 Session 6c: Medical II
    12.2.7 Session 7a: Clinical Trials
    12.2.8 Session 7b: Financial
    12.2.9 Session 7c: General IV

13 Poster Abstracts by Author

14 RSC 2010: Warwick University

15 Sponsors’ Advertisements

16 RSC History

17 Delegate List

1 Welcome from the Organisers

Welcome to the 32nd Research Students’ Conference in Statistics and Probability (RSC 2009). This year the conference is hosted by Lancaster University. The RSC is an annual event that aims to provide postgraduate statisticians and probabilists with a forum in which to present their research. This four-day event is organised by postgraduates, for postgraduates, providing an excellent opportunity to make contact and discuss work with other students who have similar interests.

For many students this will be your first experience of presenting your work, and some of you will have the opportunity to chair a session. For those of you attending but not presenting, we hope that you will benefit greatly from observing others and networking with researchers working in similar fields.

Finally, I would like to give warning that we will be looking for potential hosts of RSC in 2011; next year the conference will be held in Warwick. If you think your group would be suitable to hold the event and would be keen to take part in such an exciting project, please let me know.

Please enjoy the conference and Lancaster,

Matt Sperrin
Head Conference Organiser

2 The City and University

Lancaster

Lancaster is the county town of Lancashire in the north west of England. It is the seat of the Duchy of Lancaster and much of the land is owned by Her Majesty the Queen, who also holds the hereditary title Duke of Lancaster. The town, which became a city in 1937, has a long history which stretches back to pre-Roman times. The Norman castle, originally built as a fortification against the Scots, stands in good repair and is still used as both a court and a prison. Once known as ‘The Hanging Town’ because more prisoners were sentenced to death here than at any other court in the land, it is famous for the trial of the Lancashire Witches.

In the 18th century Lancaster was an important port, as the town was then accessible to sea-going ships. Tidal changes resulted in the silting up of the river estuary, and nowadays only small boats can sail so far up the River Lune. Most of the fine buildings in the city date from this era of prosperity and the boom in the cotton and slave trade with Africa and the USA.

The University

Lancaster University was established in 1964, when it was granted its Royal Charter. Since then it has expanded in both size and reputation. There are now over 10,000 students studying in departments that are recognised nationally and internationally for the quality of their research and teaching.

The origins of the university lie in the years following the Second World War, as the future of further and higher education became an important concern of the British government. The government faced immense problems as it tried to cope with the demands of an expanding population and the advent of a new technological age. After the war, there were only nine universities and fewer than 1000 full-time students in the country. Between 1958 and 1961 this balance was redressed as seven new universities were announced; one of these was Lancaster University.

In the 1960s the North-West had more than 7 million inhabitants, so it is easy to see why a third university was proposed for the region. The first Vice-Chancellor of Lancaster University, Charles Carter, made a somewhat tongue-in-cheek comment about why the university was built when he stated that the people in London wanted a new university in this area in order to “civilize the North.” In addition to Lancaster’s bid, other towns such as Morecambe and Blackpool displayed an interest in the new university. However, Lancaster was thought to be a peaceful area for study, with a rich cultural heritage and plenty of facilities for potential students and members of staff.

Lancaster was one of the last of the ‘new’ universities to be authorized by the government. Princess Alexandra became the Chancellor of the University and was inaugurated in 1964. The ceremony also saw the granting of various honorary degrees to, amongst others, the new Prime Minister Harold Wilson, who delivered his speech only a short time after his election win with the Labour party.

The University accepted its first students in October 1964. The motto adopted by the new university was “Patet omnibus veritas”, meaning “Truth Lies Open to All”, reflecting the hope that the expansion of higher education would lead to the extension of education to all. The colours of the university are red and Quaker grey, the latter reflecting the strong Quaker presence in the town and region.

Lancaster University is based upon a collegiate system and is one of only six collegiate universities in the country. The new university wanted to encourage interaction between students with varied interests. This was also encouraged by the creation of residences where both students and staff could live and work together. The collegiate system has had a significant influence upon the university body. It has helped to forge a strong sense of identity amongst staff and students alike and continues to be one of the defining features of student life at Lancaster.

In recent years, a major programme to build new accommodation and refurbish existing facilities has begun. This will be followed by building work for the Lancaster University Masterplan 2007–2017, which envisions improved access across the University, enhanced greenery and the construction of 27 new buildings, largely for academic use, at an estimated cost of £420 million. In the academic year 2008–2009 work will begin on a new School building, new social space and a £21 million sports centre.

Department

The Department of Mathematics and Statistics is a large and vibrant department, embracing a Mathematics Section and three Statistics Sections, which combined have received top ratings in external assessments for research and teaching quality. The Statistics group has a high international reputation and consists of over 20 staff with 30–40 RAs, PhD students and visiting PhD students. For RAE2001, the Statistics return was graded at 5* and was the largest single-institution Statistics group to achieve 5*. Subsequently, this grading was increased to 6* to reflect the combination of historical strength and recent growth in statistical activity at Lancaster.

The research and teaching interests of the Lancaster Statistics Group are diverse. Main research areas include Medical Statistics and Epidemiology, Applied Statistics, Bayesian and Computational Statistics, Extreme Values, Applied Probability and Operational Research, and Statistical Genetics. A strength of the Group is the breadth and depth of collaborative research with Medicine, Social Science, Finance, Biology, Veterinary and Environmental Science, Management Science and other areas of Mathematics.

In 2008 the Department launched the Postgraduate Statistics Centre, a HEFCE-funded Centre of Excellence in Teaching and Learning which will stimulate research and postgraduate training in the future. The centre is housed in a purpose-built building specially designed to minimise its environmental impact, which won an award from the Royal Institute of British Architects for its innovative design.

3 Campus Map

4 University Facilities

The Lancaster University campus has a wide range of facilities; a bakery, bookshop, newsagents, Students’ Union shop, hairdresser, a post office, two banks, and a small supermarket (Spar) are all located in or near to Alexandra Square, shown on the campus map. Also on campus are a pharmacy, a health centre and a sports centre. There are several bars on campus, some of which will be open during the conference week; details will be provided at the conference.

The university Sports Centre has a swimming pool, sports hall, squash and badminton courts, table tennis room, sauna and solarium, a rock climbing (bouldering) wall and various outdoor facilities. The centre will be open from 8:30am to 10:15pm during the conference, except the swimming pool, which closes at 9:45pm each day and on Wednesday and Thursday opens before the rest of the centre at 8:00am. Unfortunately the weight training rooms cannot be used by delegates, as they require an induction.

A regular bus service into Lancaster City Centre operates from the University Underpass, beneath Alexandra Square. The main buses that go to the City Centre are numbered X1, 2A, 2, 3 or 4, and bus times are posted at all bus stops. The X1 is the only service that goes directly to the train station.

5 Accommodation at County College

Conference accommodation will be on campus in County College. Breakfast and dinner will be in the County South Restaurant, except for the conference dinner on Wednesday. Check-in opens at 1:00pm on Monday and check-out is by 10:00am on the day of departure (Thursday). The accommodation has shared facilities for all rooms.

County College is named after Lancashire County Council, which donated a generous sum of money towards its construction and the university’s early running costs. This is commemorated in the college’s Latin motto, shown on the coat of arms below, “Sine concilio nihil”: nothing without design, or, translating concilio differently, nothing without the council. The coat of arms also contains an oak leaf, the symbol of the college, representing the oak tree in the college quad which is thought to be two hundred years old.

6 Conference Details

On Monday 23rd, delegates should arrive at the George Fox Building between 13:00 and 15:00 to register and collect conference packs. These contain all the information needed during the conference. If you are presenting a poster, please submit it at registration. Following this, delegates can check in to their accommodation in County College. The conference opening will take place at 16:00 in the George Fox Building.

On Tuesday 24th and Wednesday 25th, delegates will have the opportunity to present talks in the George Fox Building. Posters will be on display in the foyer of the George Fox Building throughout Tuesday 24th, with the poster session at 15:10. Presenters are encouraged to be near their posters during this session in order to answer questions from interested participants.

6.1 Meals

On Tuesday, Wednesday and Thursday, breakfast will be served in County South Restaurant at 08:00–09:00. Lunch will be provided in the foyer of the George Fox Building at 13:10–14:10 on Tuesday and Wednesday. Dinner will be in County South Restaurant at 19:00–20:00 on Monday and 18:00–19:00 on Tuesday.

Dinner on the Wednesday evening will be at Crofters Hotel in the nearby town of Garstang between 20:00 and 21:30. The conference dinner will have a ‘black and white’ theme. We encourage you to adhere to the theme – this can be anything from simply ensuring your clothing (smart/casual) is black and white to something more extravagant if you so desire! After the meal there will be a Ceilidh.

6.2 Sponsors’ Wine Reception

The Sponsors’ Reception will be held in the foyer of the George Fox Building on the Wednesday at 15:45–17:30, prior to the conference dinner. Please take this opportunity to talk with our sponsors and visit their displays to learn more about some possible career opportunities. Coaches to the conference dinner will pick delegates up from the underpass at 18:30 and 19:10.

7 Help, Information and Telephone Numbers

Department address:
Department of Mathematics and Statistics
Fylde College
Lancaster University
Lancaster, LA1 4YF

Telephone: 01524 593960
Fax: 01524 592681
Email: [email protected]

Emergency Numbers:
University Security: 01524 594541 (94541 or 999 internally; also for general emergencies)
Conference Organiser 1: 07845698986 (Matthew Sperrin – head organiser)
Conference Organiser 2: 07886841877 (Michelle Stanton – on campus throughout)
County College Porter: 01524 592560 (92560 internally)

Transport:
A to B Taxis: 01524 60000
A1 Taxis: 01524 32090 or 35666
848848 Radio Taxis: 01524 848848
Bus information: 08712 002233
National Rail Enquiries: 08457 484950

7.1 Computing and Internet Access

Free computing and internet access will be available to all delegates in the PSC (Postgraduate Statistics Centre) and the University Library from Monday to Wednesday. RSC delegates will be provided with a username and password in order to access this service. If there are any problems, please inform one of the conference organisers.

Username: rsc2009
Password: rsc2009

Delegates will also have access to the University Wireless Network and internet in their accommodation (ResNet), using the username and password given at registration. Please note that in order to use ResNet you will need to bring your own ethernet cable.

8 Plenary Session

8.1 Professor Amanda Chetwynd (Lancaster University)

Opening Address

Amanda Chetwynd combines the roles of Professor of Mathematics and Statistics and Pro-Vice-Chancellor for colleges and the student experience. The role involves fostering good working relationships between the management of the university and the student body, and leading development of the student experience at Lancaster. The student experience includes academic, social and welfare provision. Her research interests are in combinatorics and medical statistics. She is currently working on developing new methodology for combining longitudinal and survival data, applied to a clinical trial of heart disease patients following implantation of artificial valves.

8.2 Professor David Hand (Imperial College London)

Title: Size Matters

Abstract

The ideas of measurement are so ubiquitous that we often fail to notice them: they are simply parts of the conceptual universe in which we function. However, it has not always been thus and sometimes, even now, rips in this usually unnoticed background fabric appear, casting doubts on one’s view of the way the world works. Occasionally these tears have serious, even fatal consequences. This talk looks at the conceptual infrastructure of quantification, showing how humans have constructed it, how it can be interpreted, how it is manipulated to make valid inferences about the real world, and how the ideas are central to statistics. The talk is illustrated with measurement tools from psychology, medicine, physics, economics and other areas.

David J. Hand is the President of the Royal Statistical Society and Professor of Statistics in the Department of Mathematics at Imperial College. He has broad research interests, including multivariate statistics, classification methods, pattern detection, the interface between statistics and computing, and the foundations of statistics. He is interested in applications in medicine, psychology, and finance.

8.3 Professor Peter Diggle (Lancaster University / Johns Hopkins University)

Title: Geostatistical Inference Under Preferential Sampling

Abstract

Geostatistics involves the fitting of spatially continuous models to spatially discrete data (Chilès and Delfiner, 1999). Preferential sampling arises when the process that determines the data-locations and the process being modelled are stochastically dependent. Conventional geostatistical methods assume, if only implicitly, that sampling is non-preferential. However, these methods are often used in situations where sampling is likely to be preferential. For example, in mineral exploration samples may be concentrated in areas thought likely to yield high-grade ore. We give a general expression for the likelihood function of preferentially sampled geostatistical data and describe how this can be evaluated approximately using Monte Carlo methods. We present a model for preferential sampling, and demonstrate through simulated examples that ignoring preferential sampling can lead to seriously misleading inferences. We describe an application of the model to a set of bio-monitoring data from Galicia, northern Spain, in which making allowance for preferential sampling materially changes the results of the analysis. The talk is based on joint work with Raquel Menezes and Ting-Li Su (Diggle, Menezes and Su, 2009).

References

[1] Chilès, J-P. and Delfiner, P. (1999). Geostatistics. New York: Wiley.
[2] Diggle, P.J., Menezes, R. and Su, T-L. (2009). Geostatistical inference under preferential sampling (with Discussion). Applied Statistics (to appear).

Peter Diggle has an extensive record of published research which includes major contributions to spatial statistics, longitudinal data analysis, time series and Monte Carlo methods of inference. Most of his research has been motivated by applications in the environmental, biomedical and health sciences. In addition to his substantive post in the Division of Medicine, Prof Diggle is an Adjunct Professor of Biostatistics at the Johns Hopkins University School of Public Health, founding co-editor of the journal “Biostatistics” and a trustee for the Biometrika Trust.

9 Instructions

9.1 For Chairs

• Please arrive at the appropriate seminar room five minutes before the start of your session. Familiarise yourself with the visual equipment.

• Packs will be left in each seminar room. Do not remove the packs or any of their contents from the seminar room. If you think something might be missing from the pack, please contact one of the organisers.

• You should clearly introduce yourself and each speaker in turn.

• It is very important that we stick to the schedule. Therefore please start the session on time, use the time remaining cards, and make sure that questions are not allowed to delay the rest of the session.

• If a speaker fails to show, please advise the audience to attend a talk in an alternative seminar room. Do not move the next talk forward.

• After each talk, thank the speaker, encourage applause, and open the floor to questions (from students only). If no questions are forthcoming, ask one yourself.

• Use the 5 min and 1 min flash cards to assist the speaker in finishing on time.

9.2 For Speakers

• Each seminar room will contain a computer, data projector and white board.

• Arrive five minutes before the start of the session, introduce yourself to the chair and load your presentation onto the computer.

• Presentations must be PDF or PowerPoint (ppt or pptx) files. No other format is acceptable.

• Talks are strictly fifteen minutes plus five minutes for questions. Anyone going over this time will be asked to stop by the chair.

• Your chair will let you know when you have five minutes and then one minute remaining for your presentation.

9.3 For Displaying a Poster

• The poster session will be held in the foyer of the George Fox Building at 15:10 on Tuesday 24th March.

• Please submit posters upon registration on Monday 23rd March.

• Posters will be erected by conference organisers.

• During the poster session, it is advisable to be near your poster in order to answer questions from interested participants.

• Posters will also be displayed throughout Tuesday.

• Please ensure that your poster is removed by 18:00 on Tuesday.

• Posters should be no larger than A1 size.

9.4 Prizes

The three best talks and the best poster, as voted for by all delegates, will receive the following, courtesy of the Royal Statistical Society:

The RSS will offer the presenters of the best three talks and the best poster from the RSC2009 conference the opportunity to present their work at the RSS2009 conference, which will be held from 7–11 September in Edinburgh. The three best talks will be presented in a special session at the conference, and the poster will be displayed alongside the other posters at the event. The prize will be in the form of free registration at the conference for the four winners. (The registration fee includes many meals and social events but not transport or accommodation.)

Further details about the conference can be found at: www.rss.org.uk/rss2009

10 List of Sponsors’ Talks

On Wednesday 25th several of the conference sponsors will be giving presentations as part of the main conference programme, providing an opportunity to learn about their statistical work.

Time   Room  Sponsor                    Speaker           Title                                                                                Pg
11:10  B59   Pfizer                     Whitlock, Mark    Use of mixture designs in pharmaceutical product development                         69
12:00  3&4   Shell                      Jones, Wayne R.   Rapid Application Deployment with R                                                  65
12:25  3&4   StatSoft                   Gurycz, Jurek     Why would anyone (in their right mind) buy Statistica when they can get R for free?  65
12:50  5&6   Unilever                   Bennett, Stephen  A multidimensional scaling approach to multivariate paired comparison data           69
14:10  B59   Royal Statistical Society  Foster, Janet     175 Years of the Royal Statistical Society                                           76
15:25  3&4   GlaxoSmithKline            Bedding, Alun     The Bayesian analysis of Phase II dose titration studies in order to design Phase III  74
15:25  5&6   Man Investments            Cottet, Remy      Quantitative research and career opportunities in AHL, part of Man Investments       76

11 Talks Schedule

11.1 Monday 23rd March

Session – Plenary
Chair: Matthew Sperrin
Room: George Fox Lecture Theatre 1

Time   Speaker           Title                                                 Pg
16:00  Chetwynd, Amanda  Opening Address                                       14
16:15  Hand, David       Size Matters                                          15
17:00  Diggle, Peter     Geostatistical Inference Under Preferential Sampling  16

11.2 Tuesday 24th March

Session 1a: Experimental Design
Chair: Michelle Stanton
Room: George Fox 3 and 4

Time   Speaker              Title                                                                        Pg
9:10   Marley, Chris        A Comparison of Design and Analysis Methods for Supersaturated Experiments   30
9:35   Thornewell, Helen    Selecting the Least Vulnerable BIBDs to Observation Loss                     30
10:00  Rajak, Noorazrin     Imprecision and Robustness in Bayesian Experimental Design                   31
10:25  Boukouvalas, Alexis  Heteroscedastic Gaussian Processes for Complex Datasets                      32

Session 1b: Non-Parametric
Chair: Dennis Prangle
Room: George Fox 5 and 6

Time   Speaker               Title                                                            Pg
9:10   Smith, Andrew         Nonparametric Regression on a Graph                              32
9:35   Achilleos, Achilleas  On Local Bandwidth Selection for Kernel Density Estimation       33
10:00  Marra, Giampiero      Instrumental Variable Estimation for Generalized Additive Models  34
10:25  Zayed, Mohammad       Local Principal Curves with Application to Ecometric Data        34

Session 1c: Probability
Chair: David Suda
Room: George Fox B59

Time   Speaker            Title                                                                          Pg
9:10   Marek, Patrice     Parameter Estimation of Spatial Poisson Process in Domain with Unknown Boundary  35
9:35   Toupal, Tomas      Spatial Poisson Process Parameter Estimation Using Information about Subset    35
10:00  Aldridge, Matthew  Poisson Wireless Networks                                                      36
10:25  Michael, Skevi     Random trees with dependent edges                                              36

Session 2a: Environmental
Chair: Thomas Fanshawe
Room: George Fox 3 and 4

Time   Speaker             Title                                                                                                                    Pg
11:10  Williamson, Daniel  Solving Difficult Policy Decision and Intervention Problems for Complex Systems with Computer Simulators and Sequential Emulation  37
11:35  Youngman, Ben       Applying threshold models to significant wave heights                                                                    38
12:00  Stone, Nicola       Uncertainty analysis of groundwater flow models using Gaussian process emulation                                        38
12:25  Collins, Lindsay    Climate variability and its effect on terrestrial carbon flux estimates                                                  39
12:50  Miller, David       A mixture model approach to distance sampling line transect detection functions                                          40

Session 2b: Computational
Chair: Vasileios Giagos
Room: George Fox 5 and 6

Time   Speaker                Title                                                                                                                Pg
11:10  Azadi, Namman Ali      A Bayesian Approach to Model the Behavior of Single Motor Units                                                      40
11:35  Prangle, Dennis        Summary Statistics for Approximate Bayesian Computation                                                              41
12:00  Vatsa, Richa           The variational Bayes method with its application to complex problems, e.g. palaeoclimate reconstruction, GLM and multimodality  41
12:25  Karagiannis, Georgios  AIS(RJ): A sampler for trans-dimensional statistical problems                                                        42
12:50  Marshall, Tristan      Improving the mixing of Monte Carlo algorithms through adaptation                                                    43

Session 2c: Medical I
Chair: Bryony Hill
Room: George Fox B59

Time   Speaker                   Title                                                                                         Pg
11:10  Casey, Neil               An introduction to complex interventions and the related analysis of counterfactual questions  43
11:35  Wilson, Kevin             Modelling correlated binomial data using conjugate functions                                  44
12:00  Windridge, Peter          Optimal resource allocation between trials of two competing drugs                             44
12:25  Alvarez Iglesias, Alberto  A Review of Tree Based Approaches in Survival Analysis                                        45
12:50  Strawbridge, Alexander    Simulation study of the impact of non-differential measurement error on non-linear disease models  45

Session 3a: General I
Chair: Jiayi Liu
Room: George Fox 3 and 4

Time   Speaker          Title                                                             Pg
14:10  Zwiernik, Piotr  Asymptotic model selection under non-standard conditions          46
14:35  McElduff, Fiona  Two graphical methods for outlier detection in discrete distributions  46
15:00  Grapsa, Erofili  Bayesian inference for surveys                                    47

Session 3b: Dimension Reduction
Chair: Benjamin Taylor
Room: George Fox 5 and 6

Time   Speaker                    Title                                                                Pg
14:10  Guo, Hui                   Dimension Reduction, Propensity Score Analyses in Observational Studies  48
14:35  Dimitrakopoulou, Vasiliki  Bayesian Variable Selection in Cluster Analysis                      48
15:00  Serradilla, Javier         The Statistical Monitoring of a Chemical Process                     49

Session 3c: Genetics / Systems Biology I
Chair: David Miller
Room: George Fox B59

Time   Speaker        Title                                                                          Pg
14:10  Wang, Dennis   Analysis of ChIP-Chip Experiments for Mapping Sites of Transcriptional Regulation  49
14:35  Luo, Yang      Stochastic modelling in biology                                                50
15:00  Milner, Peter  Parameter estimation in biological models                                      50

Session 4a: Operational Research
Chair: Daniel Williamson
Room: George Fox 3 and 4

Time   Speaker             Title                                              Pg
16:20  May, Benedict       The theory of multi-armed bandits with regressors  51
16:45  Huntley, Nathan     Counterfactual Behaviour in Decision Trees         51
17:10  Smyrnakis, Michail  Sequentially updated Probability Collectives       52

Session 4b: Time Series
Chair: Alexis Boukouvalas
Room: George Fox 5 and 6

Time   Speaker         Title                                                               Pg
16:20  Iqbal, Farhat   M-estimators of some GARCH-type models; computation and application  52
16:45  Natsios, David  Estimation of the correlation decay rate for Chaotic Intermittency Maps  53
17:10  Stevens, Kara   The Shannon Wavelet in Locally Stationary Wavelet Analysis          54

Session 4c: Diffusions
Chair: Tristan Marshall
Room: George Fox B59

Time   Speaker           Title                                                                     Pg
16:20  Goncalves, Flavio  Retrospective sampling with an application to exact simulation of diffusions  54
16:45  Suda, David       Importance Sampling on Discretely-Observed Diffusions                     55
17:10  Joshi, Chaitanya  A New Method to Approximate Bayesian Inference on Diffusion Process Parameters  55

11.3 Wednesday 25th March

Session 5a: Genetics / Systems Biology II
Chair: Yang Luo
Room: George Fox 3 and 4

Time   Speaker            Title                                                                          Pg
9:10   Khan, Md. Hasinur  Bayesian Variable Selection Methods for Parametric AFT Models in High Dimensions  57
9:35   Radice, Rosalba    A Bayesian Approach to Phylogenetic Networks                                   57
10:00  Burgess, Stephen   A Bayesian MCMC approach to meta-analysis of Mendelian randomization studies   58
10:25  Giagos, Vasileios  A diffusion approximation for stochastic kinetic models                        58

Session 5b: Spatial
Chair: Alexandre Rodrigues
Room: George Fox 5 and 6

Time   Speaker            Title                                                                                                 Pg
9:10   Samat, Nor Azah    Relative Risk Estimation of Dengue Disease Mapping in Malaysia using Poisson-Gamma Model              59
9:35   O’Donnell, David   Spatial Modelling of Water Quality on Large River Networks                                            59
10:00  Stanton, Michelle  A spatio-temporal analysis of meningitis incidence in Ethiopia: an association with the environment and climatology  60
10:25  Fricker, Thomas    Emulation of Multivariate Computer Models Using the Linear Model of Coregionalization                 61

Session 5c: General II
Chair: Shu-Ting Lee
Room: George Fox B59

Time   Speaker                  Title                                                                                              Pg
9:10   Liu, Jiayi               Age-Period-Cohort Models                                                                           61
9:35   Abdullah, Aisyaturridha  Validation of a Health-Related Quality of Life Scale for Cleft Children Using Exploratory Factor Analysis  62
10:00  Hill, Bryony             Pore Patterns in Fingerprints                                                                      63
10:25  Fallaize, Christopher    Statistical Shape Analysis and Applications in Bioinformatics                                      63

Session 6a: General III
Chair: Rebecca Killick
Room: George Fox 3 and 4

Time   Speaker                   Title                                                                                                        Pg
11:10  Simpkin, Andrew           An Additive Penalty Approach to Derivative Estimation                                                        64
11:35  Randell, David            Bayes linear variance–covariance adjustment for dynamic linear models with application to large industrial systems  65
12:00  Jones, Wayne R. (Shell)   Rapid Application Deployment with R                                                                          65
12:25  Gurycz, Jurek (StatSoft)  Why would anyone (in their right mind) buy Statistica when they can get R for free?                          65

Session 6b: Multivariate
Chair: Matthew Sperrin
Room: George Fox 5 and 6
11:10 Germain, Sarah: Building a Prior for a Variance Matrix using the Concept of Individual Variation (p. 66)
11:35 Pavlou, Menelaos: Efficient weighted generalised estimating equations when the cluster size or covariate structure are informative (p. 67)
12:00 Zou, Lu: Multiple Imputations of Bio-Datasets (p. 68)
12:25 Neufeld, Helene: Graphical Gaussian Models with Symmetries (p. 68)
12:50 Bennett, Stephen (Unilever): A multidimensional scaling approach to multivariate paired comparison data (p. 69)

Session 6c: Medical II
Chair: Maria Roopa
Room: George Fox B59
11:10 Whitlock, Mark (Pfizer): Use of mixture designs in pharmaceutical product development (p. 69)
11:35 Knock, Edward: Contact tracing in a stochastic epidemic (p. 70)
12:00 Goudie, Robert: How much does an infection tree tell us about a contact network (p. 70)
12:25 Zain, Zakiyah: A method for combining binary assessments of stroke outcomes (p. 71)
12:50 Rogers, Jennifer: Joint Modelling of Event Counts and Survival Times: Example using data from the MESS Trial (p. 71)

Session 7a: Clinical Trials
Chair: Jennifer Rogers
Room: George Fox 3 and 4
14:10 Akacha, Mouna: The Impact of Dropouts on the Analysis of Dose-Finding Studies with Recurrent Event Data (p. 72)
14:35 Azmee, Nor Afzalina: Technical Inference Problems in Three-Arm Non-Inferiority Trials (p. 73)
15:00 Ren, Shijie: Incorporating Prior Information in Clinical Trial Planning (p. 73)
15:25 Bedding, Alun (GlaxoSmithKline): The Bayesian analysis of Phase II dose titration studies in order to design Phase III (p. 74)

Session 7b: Financial
Chair: Peter Windridge
Room: George Fox 5 and 6
14:10 Ma, Xiaojuan: Computational and statistical aspects of pricing models (p. 74)
14:35 Shahtahmassebi, Golnaz: Bayesian modelling of trade by trade price movements (p. 75)
15:00 Delatola, Eleni-Ioanna: Bayesian nonparametric inference in stochastic volatility modelling (p. 76)
15:25 Cottet, Remy (Man Investments): Quantitative research and career opportunities in AHL, part of Man Investments (p. 76)

Session 7c: General IV
Chair: Helen Thornewell
Room: George Fox B59
14:10 Foster, Janet (Royal Statistical Society): 175 Years of the Royal Statistical Society (p. 76)
14:35 Baker, Rebecca: A New Selection Method for Multinomial Data (p. 77)
15:00 Littler, Geoffrey: Dirichlet Diffusion Trees (p. 77)

12 Talk Abstracts by Session

12.1 Tuesday 24th March

12.1.1 Session 1a: Experimental Design
Session Room: George Fox 3 and 4
Chair: Michelle Stanton

Start time 09:10

A COMPARISON OF DESIGN AND ANALYSIS METHODS FOR SUPERSATURATED EXPERIMENTS
Chris Marley
University of Southampton, UK
Keywords: Supersaturated, Screening, Effect Sparsity, Gauss-Dantzig Selector

In screening experiments, it is often necessary to investigate a large number of factors to establish which have a substantial effect on the response of interest. If it is not possible to carry out a large number of runs, a supersaturated design may be used. This is a design in which the number of runs is less than the number of factors. This has the implication that the main effects of all factors cannot be estimated simultaneously. The effectiveness of supersaturated designs in detecting active factors relies on the assumption that, in reality, there are only a small number of dominant factors - a concept known as effect sparsity. There is much work in the literature on how to design and analyse such experiments.

In this talk, the performance of supersaturated designs under several different scenarios is investigated using simulation studies. Competing analysis methods for supersaturated experiments are also compared, and some conclusions are drawn about the effectiveness of the methods for different numbers and sizes of active effects.

Start time 09:35

SELECTING THE LEAST VULNERABLE BIBDS TO OBSERVATION LOSS
Helen Thornewell
Maths Dept, , UK
Keywords: Balanced Incomplete Block Design, Design Selection, Disconnected, Experimental Design, Observation Loss, Optimality, Vulnerability

Balanced Incomplete Block Designs (BIBDs) are optimal experimental designs used in situations in which there is one form of blocking. However, if observations are lost during the course of the experiment, the design properties are changed. In some cases, observation loss can result in a disconnected final design, so that it is impossible to estimate all elementary treatment contrasts in the analysis. A Minimum Rank Reducing Observation Set isolating i treatments, denoted by MRROS(i), is a set of observations, of minimum size, which when lost partitions the υ treatments into two sets, with cardinalities i and (υ − i), yielding a disconnected design. The MRROS(i) Vulnerability Measure (Si, Ti) of a design determines the size, Si, of the MRROS(i)s and the total number, Ti, of MRROS(i)s of size Si.

New formulae have been derived for calculation of the measure (Si, Ti). A program implementing the formulae provides efficient calculation of the values of Si and Ti, outputting the Vulnerability Measures of any given BIBD(υ, b, k) and displaying detailed information about the location of observations in the MRROS(i)s. For some parameter classes, these formulae give fixed Vulnerability Measures, so that any BIBD with the same parameters will have the same Vulnerability Measure, independent of the design structure. These formulae can therefore give rise to a Pilot Procedure to check that the value of Si exceeds any reasonable expectation of observation loss, before the design is used for the required experiment.

However, for other parameter classes, the formulae depend on features of the particular design, so that non-isomorphic BIBDs with the same parameters can have different vulnerability measures. The existence of BIBDs in these parameter classes demonstrates that equally-optimal designs are not necessarily equally vulnerable. In this case the program is a useful tool for selecting the least vulnerable design to observation loss within a set of candidate designs.

Start time 10:00

IMPRECISION AND ROBUSTNESS IN BAYESIAN EXPERIMENTAL DESIGN
Noorazrin Abdul Rajak and Dr. Malcolm Farrow
, UK
Keywords: Bayesian experimental design, Imprecise utility function, Multi-attribute utility function

This project is concerned with experimental design within the Bayesian framework and, in particular, cases where the specifications which are normally required in order to solve a Bayesian experimental design problem are not made precisely, and with the development of methods for choosing designs which are robust against such imprecision. I will work on an approach to the design of experiments using ideas from Bayes linear methods and from decision analysis. We are particularly interested in experiments where several different kinds of observation can be made and where there are a number of different costs and benefits involved. To make the approach more accessible to users, it needs to be applied to various particular kinds of experiment. We are particularly interested in the possibility of applying similar methods to experiments in modern biology, such as microarray experiments, but this is only one of many possibilities.

Start time 10:25

HETEROSCEDASTIC GAUSSIAN PROCESSES FOR COMPLEX DATASETS
Alexis Boukouvalas, Dan Cornford
, UK
Keywords: Gaussian Processes, computer experiments, stochastic emulation, experimental design

We present a novel method of performing heteroscedastic Gaussian Process (GP) regression on complex datasets with replicated observations and address the issue of experimental design in this setting. Such replicate output observations, for a fixed input or location, can be found in numerous areas of engineering and science. Our motivating example comes from the field of computer experiments where a random output simulator has been constructed to estimate the development and impact of rabies on animal populations. This random output simulator exhibits clear heteroscedastic variance. Our developments build on previous work on heteroscedastic variance modelling using GPs by explicitly reasoning using replicate observations and applying corrections due to finite sample size effects. We demonstrate our method on both synthetic data and the rabies model. We discuss the issue of experimental design, that is the choice of training set input points and whether to evaluate the random model once or repeatedly for a given input point. We provide empirical evidence on the effectiveness of utilizing replicate observations compared to a space-filling design. We employ a wide array of methods to validate our results, including the Mahalanobis error, which allows us to assess the goodness of fit of the joint GP predictions. Lastly we demonstrate how to leverage our framework for the purposes of screening in random output simulators. We discuss open issues in the emulation of stochastic, or random output, simulators, suggesting interesting directions for further work.

12.1.2 Session 1b: Non-Parametric
Session Room: George Fox 5 and 6
Chair: Dennis Prangle

Start time 09:10

NONPARAMETRIC REGRESSION ON A GRAPH
Andrew Smith
, UK
Keywords: Penalised regression, Image analysis, Total variation

A number of problems in penalised regression may be thought of as having a graphical structure. These range from straightforward scatterplot smoothing, to more complicated penalties in image analysis. Some spatial and longitudinal models also contain a graphical structure.

We will discuss nonparametric regression in the context of removing noise from observations made on a graph. Regression on a graph requires the fitting of a function that somehow explains the observations made at the vertices. The fitted function, and shape of the graph, may be completely arbitrary. Therefore it is appropriate to use nonparametric regression, which makes less restrictive assumptions about the function. We borrow ideas from total variation denoising, in which the smoothness or simplicity of the function is controlled.

The generalised method penalises departures from the data on the vertices, and roughness on the edges of the graph. There are computational challenges in implementing this penalised regression. We will see the results of a new active set algorithm for denoising on a graph, and discuss some applications including image analysis.
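As a self-contained sketch of penalised regression on a graph (this uses a quadratic roughness penalty on a hypothetical path graph, a simpler closed-form cousin of the total variation penalty in the talk, not the speaker's active set algorithm):

```python
import numpy as np

# Minimise ||f - y||^2 + lam * sum over edges (f_i - f_j)^2 on a chain graph,
# solved exactly as f = (I + lam * L)^{-1} y, where L is the graph Laplacian.
n, lam = 50, 2.0
rng = np.random.default_rng(4)
signal = np.sin(np.linspace(0.0, np.pi, n))   # smooth truth on the vertices
y = signal + rng.normal(0.0, 0.3, n)          # noisy vertex observations

# Laplacian of the path graph 0-1-2-...-(n-1)
L = np.diag(np.r_[1.0, np.full(n - 2, 2.0), 1.0])
L -= np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)

f = np.linalg.solve(np.eye(n) + lam * L, y)   # exact penalised fit
print(np.mean((f - signal) ** 2), np.mean((y - signal) ** 2))
```

The total variation version replaces the squared edge differences with absolute differences, which preserves sharp jumps but no longer has a closed-form solution.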

Start time 09:35

ON LOCAL BANDWIDTH SELECTION FOR KERNEL DENSITY ESTIMATION
Achilleas Achilleos
[email protected]
Keywords: local bandwidth, variable window width, density estimation, Mean Squared Error

Kernel density estimation is a non-parametric way of estimating the probability density function of a random variable. A parametric estimator has a fixed functional form, and the parameters of this function are the only information we need to obtain in order to draw the density of a given data set. On the other hand, non-parametric estimators have no fixed structure and depend upon all the data points to produce an estimate. Kernel density estimation is by now a well-established technique. The quality of the kernel density estimator of a probability density function can be examined using either the Mean Squared Error (MSE) or the Mean Integrated Squared Error (MISE) penalty criteria. For good estimation, the choice of the smoothing parameter is crucial. When the bandwidth is relatively small, our estimate of the density has small bias and is considered a good approximation to the true density fX(x), but then fewer data are used and the estimate is more variable. The best choice of the bandwidth involves a trade-off between bias and variance. One basic issue is whether the bandwidth should depend on x (local bandwidth) or not (global bandwidth). We assess some practical local bandwidth selection procedures, and investigate whether local bandwidths are better than global bandwidths in practice. Some useful remarks about choosing variable bandwidths in practice are discussed.
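The global-versus-local bandwidth distinction can be sketched as follows (an illustrative toy, not the speaker's selection procedure; the k-nearest-neighbour rule below is just one simple choice of variable bandwidth):

```python
import numpy as np

def kde(x_grid, data, h):
    """Gaussian kernel density estimate; h may be a scalar (global bandwidth)
    or one bandwidth per data point (local bandwidths)."""
    h = np.broadcast_to(np.asarray(h, dtype=float), data.shape)
    u = (x_grid[:, None] - data[None, :]) / h[None, :]
    return np.mean(np.exp(-0.5 * u**2) / (np.sqrt(2.0 * np.pi) * h[None, :]),
                   axis=1)

rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, 200)
x = np.linspace(-5.0, 5.0, 1001)
dx = x[1] - x[0]

f_global = kde(x, data, 0.4)          # one bandwidth everywhere
# Local rule (illustrative): bandwidth = distance to the k-th nearest
# neighbour, so the window widens where the data are sparse.
k = 20
local_h = np.sort(np.abs(data[:, None] - data[None, :]), axis=1)[:, k]
f_local = kde(x, data, local_h)

print(f_global.sum() * dx, f_local.sum() * dx)  # both integrate to about 1
```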

Start time 10:00

INSTRUMENTAL VARIABLE ESTIMATION FOR GENERALIZED ADDITIVE MODELS
Giampiero Marra
, UK
Keywords: Generalized additive models; Instrumental variables; Two-stage estimation procedure; Unmeasured confounding

The regression model literature has generally assumed that observable and unobservable covariates are statistically independent. However, for many applications this assumption is clearly tenuous. When unobservables are correlated with included regressors, standard estimation methods will not be valid. This means, for example, that estimation results from observational studies (whose aim is to evaluate the impact that a treatment has on the outcome of interest) will be biased and inconsistent in the presence of unmeasured confounders. One method for obtaining consistent estimates of treatment effects when dealing with linear models is the instrumental variable (IV) approach. However, linear models have been extended to generalized linear models (GLMs) and generalized additive models (GAMs), and although IV methods have been proposed to deal with GLMs, IV analysis has not been generalized to the GAM context. We propose a two-stage strategy for consistent IV estimation when dealing with GAMs represented using any penalized regression spline approach. We explain under which conditions the proposed method works and illustrate its empirical validity through an extensive simulation experiment and a health study where unmeasured confounding is suspected to be present.

Start time 10:25

LOCAL PRINCIPAL CURVES WITH APPLICATION TO ECONOMETRIC DATA
Mohammad A. Zayed
Department of Mathematical Sciences, , UK
Keywords: Principal Component Analysis, Principal Curves, Local Principal Curves, Phillips Curves

This brief presentation gives an insight into Local Principal Curves (LPCs) as a non-parametric tool for representing multidimensional data structures by a single smooth one-dimensional curve. We also explore some possible econometric applications in which LPCs might be a good choice as a method to represent the data, as well as (with some smart tools) to predict within the data range.

12.1.3 Session 1c: Probability
Session Room: George Fox B59
Chair: David Suda

Start time 09:10

PARAMETER ESTIMATION OF SPATIAL POISSON PROCESS IN DOMAIN WITH UNKNOWN BOUNDARY
Patrice Marek
University of West Bohemia, Czech Republic
Keywords: Spatial Poisson process, parameter estimation, unknown boundary, separating domain

A spatial Poisson process is used to model a number of events in a domain. Often, we need to estimate the intensity of event occurrence to determine whether there is some risk (extinction, malfunction, harmfulness) due to the presence or absence of this event. The easiest way (for estimation) to obtain the intensity is to examine the whole domain and make as many repetitions as possible. However, in many cases it is expensive or even impossible to examine the whole domain or even to make repetitions. Therefore, we have to work with a sample domain with no natural boundary and we cannot make repetitions. The only information we have is the coordinates of each event (e.g. obtained by GPS technology). In this talk, I will present how to deal with this situation, i.e. how to estimate the parameter λ of the spatial Poisson process in a domain with unknown boundary. The proposed method is tested by a large number of simulations.
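For contrast with the unknown-boundary setting the talk addresses, the easy case of a fully observed window of known measure reduces to a one-line maximum likelihood estimate (a toy example with invented numbers, not the speaker's method):

```python
import numpy as np

rng = np.random.default_rng(1)
lam_true, side = 5.0, 10.0
area = side * side                       # known window: a 10 x 10 square
n = rng.poisson(lam_true * area)         # number of events in the window
xy = rng.uniform(0.0, side, size=(n, 2)) # event coordinates (homogeneous
                                         # process => uniform over the window)

lam_hat = n / area                       # maximum likelihood estimate of λ
print(lam_hat)
```

The talk's problem is precisely that `area` is unknown, so this simple ratio is unavailable and the boundary must be handled from the coordinates alone.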

Start time 09:35

SPATIAL POISSON PROCESS PARAMETER ESTIMATION USING INFORMATION ABOUT SUBSET
Tomas Toupal
University of West Bohemia, Czech Republic
Keywords: Conditional distribution, Poisson distribution, parameter estimation, maximum likelihood method, Bayesian approach

If we estimate the parameter λ of a spatial Poisson process and we have information about some subset (a measure and a number of events), we can use it for the parameter estimation of the whole set with unknown measure.

In this talk, I will present the conditional distribution between the number of events in a subset and the number of events in the whole set. This conditional distribution is used in estimation of the whole-set parameter. I will show two approaches to estimation: the first is based on the maximum likelihood method, while the second is based on the Bayesian approach. I will also show, as a part of parameter estimation, how to estimate the measure of the whole set.
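The conditional relationship underlying this approach can be checked numerically: given the total count in the whole set, the subset count is binomial with probability equal to the ratio of the measures (a simulation sketch with invented parameters, not the author's code):

```python
import numpy as np

rng = np.random.default_rng(2)
lam, M, m = 3.0, 50.0, 10.0       # intensity, whole-set measure, subset measure
trials = 100_000

n_total = rng.poisson(lam * M, size=trials)
# Given the total, each event falls in the subset independently with
# probability m/M, so the subset count is conditionally Binomial(n_total, m/M).
n_sub = rng.binomial(n_total, m / M)

print(n_sub.mean(), lam * m)      # both should be close to λ·m = 30
```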

Start time 10:00

POISSON WIRELESS NETWORKS
Matthew Aldridge, Oliver Johnson and Robert Piechocki
University of Bristol, UK
Keywords: Poisson point processes, information theory, communications theory, signal-to-noise ratio

Wireless networks, such as mobile phone networks, WiFi, satellite communication, radios, Bluetooth and data networks, have become rapidly more popular in recent years, and have attracted the attention of many probabilists, statisticians and engineers. However, due to the complicated way in which signals interfere with each other and the lack of knowledge about other users in a network, little is known about the theoretical limits of such communication.

In this talk, we look at the background of communication theory. There is a natural tradeoff between the rate at which information can be sent and the accuracy with which it is received. This is encapsulated by the capacity, the fastest rate at which a message can be sent whilst still allowing it to be received without error. We will note the role played by the signal-to-noise ratio in determining capacity.

We then consider wireless communication in large wireless networks. We model users (laptops or mobile phones, for example) as points of a Poisson process in R^d. (Usually d = 2, but there are applications for d = 1 and 3 too.) We have two main results. First, a local result gives bounds on the probability that attempted communication at a given rate between two users is successful. Second, a global result shows that the total amount of information flowing through the network scales linearly with the number of users.

This talk requires no prior knowledge of information theory.
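The role of the signal-to-noise ratio in determining capacity can be illustrated with the standard point-to-point Shannon formula C = log2(1 + SNR), a textbook relation rather than a result of the talk:

```python
import numpy as np

# Shannon capacity of a point-to-point channel, in bits per channel use.
def capacity(snr):
    return np.log2(1.0 + snr)

# Capacity grows only logarithmically: each extra 10 dB of SNR adds roughly
# the same number of bits per use at high SNR.
for snr_db in (0, 10, 20):
    snr = 10.0 ** (snr_db / 10.0)
    print(snr_db, "dB ->", round(float(capacity(snr)), 2), "bits/use")
```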

Start time 10:25

RANDOM TREES WITH DEPENDENT EDGES
Skevi Michael, Dr Stanislav Volkov
University of Bristol
Keywords: random environment on trees, large deviation

We consider the d-ary regular rooted tree Td, with d ≥ 2 and vertex set V. Among d distinct colours, {1, 2, ..., d}, we choose one to colour the root. All the other vertices are coloured from left to right, so that the d children of each vertex have different colours. Consider d² strictly positive random variables, ξ̃ij, with i, j ∈ {1, 2, ..., d}, of known joint distribution. Excluding the d edges involving the root, u0, for u, w ∈ V such that u ∼ w we assign the random variable ξuw (which is one of the ξ̃ij and satisfies certain conditions) to the edge (u, w). If ξ[w, u] is the product of all random variables assigned to the edges belonging to the self-avoiding path connecting u and w, and ξ[u] := ξ[u0, u], we are interested in the asymptotic behaviour of K(x) := card{u ∈ V : ξ[u] < x}, and hence study E[K(x)] as x → ∞.

12.1.4 Session 2a: Environmental
Session Room: George Fox 3 and 4
Chair: Thomas Fanshawe

Start time 11:10

SOLVING DIFFICULT POLICY DECISION AND INTERVENTION PROBLEMS FOR COMPLEX SYSTEMS WITH COMPUTER SIMULATORS AND SEQUENTIAL EMULATION
Daniel Williamson and Prof. Michael Goldstein
University of Durham, UK
Keywords: Computer Model, Emulation, Decision Tree, Policy Intervention, Discrepancy, Climate

Policy makers often have very complex decision problems to consider. A good example is the class of policy decisions to be made concerning climate change. To understand how a complex system such as climate will behave in the future, scientists build computer simulators. These models may be used to produce probabilistic statements about future states which guide policy.

Current research focuses on finding strategies that optimise a function of the model output. I will present a methodology that is different in the following ways:

1. By careful consideration of the discrepancy between the actual complex system and the computer model, probabilistic judgements about the system itself drive our loss calculations.

2. Optimal policy under specified conditions is obtained through mapping the expected loss surface over all decisions. This will be a useful decision support tool.

3. Future system values (such as future climate states) will be observed and policies will need to adapt accordingly. These potential downstream observations and decisions are included in loss calculations for current policy.

Current methods of Bayes Linear forecasting of complex systems are adapted to facilitate the provision of decision-dependent forecasts. I will describe a method of sequential emulation for a complex decision tree describing this policy problem, using these forecasts.

Start time 11:35

APPLYING THRESHOLD MODELS TO SIGNIFICANT WAVE HEIGHTS
Ben Youngman
University of Sheffield, UK
Keywords: extreme value theory, significant wave heights, generalised Pareto distribution

This talk presents a study of one current procedure for analysing extreme waves. Understanding the behaviour of waves is of notable importance in the building of marine equipment (for example ships and offshore structures) which is required to withstand the punishment of extreme waves. Presently the strength to which such equipment is built is guided by return levels of significant wave heights, commonly estimated using the peaks over threshold procedure. In standard form this procedure involves identifying clusters in (typically dependent) sequences of observations, retaining only the cluster peaks and then fitting the generalised Pareto distribution to the cluster peaks, assuming independence between peaks. Whilst this procedure is justified asymptotically, its practical use is at sub-asymptotic levels. Here an attempt is made to mimic a sequence of significant wave heights by simulating a dependent sequence with known distributional properties. Consequently the true parameters being estimated using peaks over threshold models are known, thus providing a method with which to assess the performance of such models.
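The peaks-over-threshold fit itself can be sketched as follows (a toy with synthetic excesses, and simple method-of-moments estimates standing in for maximum likelihood; none of the values are the speaker's data):

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulate excesses over a threshold u from a GPD(shape xi, scale sigma)
# via inverse transform: X = sigma * (U**(-xi) - 1) / xi.
u, xi, sigma = 4.0, 0.1, 1.5
U = rng.uniform(size=5000)
excesses = sigma * (U ** (-xi) - 1.0) / xi

# Method-of-moments estimates (a stand-in for maximum likelihood), using
# mean = sigma/(1-xi) and var = sigma^2 / ((1-xi)^2 (1-2 xi)) for xi < 1/2.
m, v = excesses.mean(), excesses.var()
xi_hat = 0.5 * (1.0 - m ** 2 / v)
sigma_hat = m * (1.0 - xi_hat)

# Return level exceeded by one excess in 100 on average.
p = 0.01
x_p = u + sigma_hat * (p ** (-xi_hat) - 1.0) / xi_hat
print(xi_hat, sigma_hat, x_p)
```

The study described above asks how well such fits behave when the peaks come from a dependent, declustered sequence rather than independent GPD draws.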

Start time 12:00

UNCERTAINTY ANALYSIS OF GROUNDWATER FLOW MODELS USING GAUSSIAN PROCESS EMULATION
Nicola Stone
, UK
Keywords: Gaussian process emulators

The application of Gaussian process emulators to stochastic groundwater flow models is being investigated. Lack of information about the transmissivity values across the whole of a rock formation is a key feature in groundwater modelling. The logarithm of the transmissivity field is usually represented as a Gaussian spatial process, conditioned on the available data, and the groundwater flow code is run many times with different realisations of the field. This approach can be very computationally expensive, and so we consider using emulators as a computationally cheap approximation to the code.

We look at a test case of the Waste Isolation Pilot Plant (WIPP). This is a US government nuclear waste repository in New Mexico, where extensive investigations have been carried out across the region. Given a realisation of the transmissivity field, the groundwater flow computer code we wish to emulate calculates the time, t, it takes for a radioactive particle released in the centre of the region to reach the boundary. We use a stochastic model to approximate the transmissivity field, and then use Bayesian inference to provide distributions for the hyperparameters of this model using the 40 transmissivity measurements collected in the WIPP region. The hyperparameters are then used as inputs to the groundwater flow code. An emulator is built and run as a substitute for the groundwater flow code in an uncertainty analysis of the travel time, t, given the derived distributions of the inputs.

Start time 12:25

CLIMATE VARIABILITY AND ITS EFFECT ON TERRESTRIAL CARBON FLUX ESTIMATES
Lindsay Collins and Professor Clive Anderson
University of Sheffield, UK
Keywords: Terrestrial carbon fluxes, Climate variability, Bayesian probabilistic sensitivity analysis

International strategies to mitigate climate change rely on predictions of the fluxes of carbon dioxide between atmosphere, oceans and the terrestrial biosphere. Of these fluxes, that between the atmosphere and vegetation and soils is the most complex and the least well quantified. A central strategy for estimating and predicting terrestrial carbon dynamics encapsulates knowledge of ecological and soil processes in a computer model, known as a dynamic global vegetation model (DGVM), and uses the model to synthesize data and process understanding to predict carbon fluxes. Climate variables are major drivers of DGVMs and potentially a major source of uncertainty in derived carbon estimates. The DGVM used here is the Sheffield Dynamic Global Vegetation Model (SDGVM), which operates globally on a daily time-frame. Climate input is provided as gridded monthly time series. In this talk it will be shown how temporal variation in the climate leads to differing carbon flux estimates. After establishing a relationship between climate variability and uncertainty in the carbon estimates, the aim is to quantify the effect over the UK.

To do this a probabilistic sensitivity analysis will be carried out. Since the SDGVM is a complex model we adopt a Bayesian framework, introduced here. The theory is not yet developed enough to handle spatio-temporal data such as climate data. The talk will finish with some of the ideas developed to characterise the variability in climate, in order to use the current software for probabilistic sensitivity analysis to directly link our uncertainty in the climate to uncertainty in carbon flux estimates, with some early results.

Start time 12:50

A MIXTURE MODEL APPROACH TO DISTANCE SAMPLING LINE TRANSECT DETECTION FUNCTIONS
David Lawrence Miller1 and Len Thomas2
1 Mathematical Sciences, University of Bath
2 Centre for Research into Ecological and Environmental Modelling
Keywords: distance sampling, mixture models, line transects, ecological modelling

Distance sampling is a popular method for assessing animal abundance by modelling the detectability of individuals in the population. The usual formulation for the detection function is through key function and series expansion models, and these work well in most situations.

However, there is one notable disadvantage: when series expansions are used, models are no longer necessarily monotonic non-increasing with increasing distance (which is not physically realistic). This requires constrained likelihood maximization, which can be problematic, especially when covariates in addition to distance are included in the detection function modelling.

An alternative class of flexible models is a finite and/or continuous mixture of simple key functions. If the key functions are monotonic non-increasing, then the mixture will also be so. Mixture models have recently become widely applied in the mark-recapture literature with a great deal of success. Here, we investigate the merits of fitting distance sampling detection functions using mixtures.
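The key property claimed above, that a mixture of monotone non-increasing key functions is itself monotone non-increasing, is easy to verify numerically (illustrative half-normal components with invented weights and scales, not a fitted model):

```python
import numpy as np

# A 2-component mixture of half-normal key functions. Each component is
# monotone non-increasing in distance, so the mixture is too, and no
# monotonicity constraints are needed during fitting.
def detection_prob(y, weights=(0.7, 0.3), sigmas=(25.0, 80.0)):
    y = np.asarray(y, dtype=float)
    return sum(w * np.exp(-y**2 / (2.0 * s**2))
               for w, s in zip(weights, sigmas))

y = np.linspace(0.0, 200.0, 401)   # perpendicular distances from the line
g = detection_prob(y)
print(g[0], bool(np.all(np.diff(g) <= 0)))  # g(0) ≈ 1, monotone decreasing
```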

12.1.5 Session 2b: Computational
Session Room: George Fox 5 and 6
Chair: Vasileios Giagos

Start time 11:10

A BAYESIAN APPROACH TO MODEL THE BEHAVIOUR OF SINGLE MOTOR UNITS
Nammam Ali Azadi

Lancaster University
Keywords: Motor Units, Bayesian model selection, GLM, MCMC

The peripheral nervous system (PNS) is the portion of the nervous system that is outside the brain and spinal cord. It connects the brain and spinal cord to the limbs and organs. Motor units (MUs) are the functional units of the PNS which activate our muscles. In neuromuscular disorders motor units are affected, leading to incapacity. Activity from motor units can be experimentally elicited with electrical stimulation of a nerve through the intact skin, or recorded with needle electromyography. The response from individual motor units to electrical stimulation is probabilistic, with the likelihood of a response depending on the intensity of the applied stimulus. We seek to compare these data collection methods to gain information on the nature of this response by using Bayesian models. We assume firing of an MU at each stimulus has a binomial response and attempt to determine the link. We find that the electromyographical data are overdispersed, which complicates the problem of model selection. We compare the merits of Bayesian model selection techniques.
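The binomial stimulus-response setup described above can be sketched as a GLM with a logistic link (the intercept, slope and stimulus levels below are invented purely for illustration; the talk concerns choosing the link and handling overdispersion):

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical single-unit model: probability of firing at a given stimulus
# intensity follows a logistic curve (one candidate link in the GLM).
a, b = -6.0, 2.0                                  # illustrative coefficients
stimulus = np.repeat(np.linspace(1.0, 5.0, 9), 50)  # 9 intensities, 50 trials
p_fire = 1.0 / (1.0 + np.exp(-(a + b * stimulus)))
fired = rng.binomial(1, p_fire)                   # 0/1 firing indicator

# Empirical firing proportion rises with intensity, tracing out the link.
for s in np.unique(stimulus):
    print(s, fired[stimulus == s].mean())
```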

Start time 11:35

SUMMARY STATISTICS FOR APPROXIMATE BAYESIAN COMPUTATION
Dennis Prangle
Lancaster University
Keywords: Approximate Bayesian Computation, MCMC

Approximate Bayesian Computation (ABC) algorithms allow parameter inference for models where explicit likelihood functions are not easily available, instead relying on repeated simulation of data sets. They are useful for inference in many complex models where likelihood-based inference techniques such as MCMC are not feasible. To be efficient, ABC methods require low-dimensional summaries of these data sets, but little work has been done on how to choose these summary statistics. This presentation reviews ABC methodology and presents a scheme for constructing summary statistics. In support, theoretical arguments and results from a simple application are discussed.
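A minimal rejection-ABC sketch, using the sample mean as the low-dimensional summary statistic (a textbook toy with an invented model, not the presenter's construction scheme):

```python
import numpy as np

rng = np.random.default_rng(7)

# "Observed" data from an N(theta, 1) model; the true theta (2.0) is
# pretended unknown.
obs = rng.normal(2.0, 1.0, size=100)
s_obs = obs.mean()                      # low-dimensional summary statistic

# Rejection ABC: draw theta from the prior, simulate a data set, and keep
# theta whenever the simulated summary lands within eps of the observed one.
eps, accepted = 0.05, []
for _ in range(20_000):
    theta = rng.uniform(-5.0, 5.0)      # prior draw
    sim = rng.normal(theta, 1.0, size=100)
    if abs(sim.mean() - s_obs) < eps:
        accepted.append(theta)

post = np.array(accepted)
print(len(post), post.mean())           # accepted draws concentrate near s_obs
```

The talk's question is how to choose the summary (here, the mean) well when the model is too complex for an obvious sufficient statistic.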

Start time 12:00

THE VARIATIONAL BAYES METHOD WITH ITS APPLICATION TO COMPLEX PROBLEMS, E.G. PALAEOCLIMATE RECONSTRUCTION, GLM AND MULTIMODALITY

Richa Vatsa and Prof. Simon Wilson
Trinity College of Dublin, Dublin, Ireland
Keywords: palaeoclimate reconstruction, multimodality

The variational Bayes (VB) method is a distributional approximation for Bayesian inference. It converges very quickly and its application is straightforward, which is key to its increasing popularity in the Bayesian world. The application of the method to palaeoclimate reconstruction is presented. In this reconstruction problem, past climate is inferred using pollen data and the modern climate.

Though the method provides quick solutions to multi-dimensional problems, the structure of the VB algorithm does not allow it to be applied with complex models, e.g. GLMs. A modification of the VB method is developed for its successful application, giving accurate results for the estimation problem in GLMs.

An example of a multi-modal posterior distribution with a latent model is presented to demonstrate that the VB method often fails to recognise the modes of a multi-modal distribution. A modification of the VB method to deal with the multi-modality problem is developed. The application of the modified VB method to a multi-modal problem and its results are also presented.

Start time 12:25

AIS(RJ): AN MCMC SAMPLER FOR TRANS-DIMENSIONAL STATISTICAL PROBLEMS
Georgios Karagiannis and Dr Christophe Andrieu
University of Bristol
Keywords: Trans-dimensional problems, Bayesian statistics, reversible jump MCMC, generalised linear models

A trans-dimensional problem is the general statistical problem of making inference when the statistical model under construction is not defined precisely enough for the dimension of the parameter space to be fixed. We present a Markov chain Monte Carlo (MCMC) based algorithm called AIS(RJ). In particular, we describe two versions of AIS(RJ): the within-model sampler AIS and the across-model sampler AISRJ. AIS generates samples of ratios of posterior model probabilities for a set of pairs of the competing models, allowing Bayes factors and posterior model probabilities to be calculated. AISRJ is the extension of AIS to the popular reversible jump MCMC algorithm. AISRJ allows one to improve the performance of reversible jump MCMC in terms of acceptance probability without the need to use very sophisticated moves. The method relies on the idea of bridging the models in a specific way in order to ensure good performance. The performance of AISRJ is illustrated on a generalised linear model example.

Start time 12:50

IMPROVING THE MIXING OF MONTE CARLO ALGORITHMS THROUGH ADAPTATION
Tristan Marshall1 and Gareth Roberts2
1 University of Lancaster, UK, 2 , UK
Keywords: Metropolis-Hastings, Markov Chain Monte Carlo, Adaptive MCMC, Langevin algorithms, Random walk Metropolis

Markov chain Monte Carlo (MCMC) algorithms are among the best-known and most easily implemented methods for simulating from a given probability density π(x); applications have been found across the entirety of statistics and beyond. Our work examines a particular subset of these, commonly called 'Langevin algorithms'. These algorithms use a special drift component to 'point the simulation in the right direction'; this leads to much faster mixing than can be achieved with other, better-known MCMC algorithms such as Random Walk Metropolis (RWM).
However, Langevin algorithms can only achieve optimal performance if certain controlling parameters are carefully tuned to the target density π(·). In extreme cases, poor tuning can result in an algorithm that is effectively useless for practical simulation. Unless we have detailed information about π(·), the only way to perform this tuning is through manual trial and error: performing 'test' simulations with different values of the tuning parameters and comparing performance. This process is slow, and it is very difficult to achieve an optimal tuning.
We resolve this problem by using so-called 'adaptive methods'. These methods attempt to tune the algorithm automatically 'on-line', by continually 'adapting' the controlling parameters using the past history of the simulation. This adaptation must be performed with care, as a thoughtless implementation can prevent the algorithm from converging. We demonstrate an adaptive algorithm that not only preserves convergence, but also achieves a near-optimal tuning in a remarkably short amount of time.
The performance gains from this adaptation are considerable, allowing this potent class of algorithms to be used to its full effect.
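
To make the tuning problem concrete, here is a minimal one-dimensional sketch of a Langevin (MALA) sampler whose step size is adapted on-line by a simple Robbins-Monro rule toward the commonly cited optimal acceptance rate of 0.574. This is an illustrative assumption, not the authors' algorithm: the target density, adaptation schedule and all constants are chosen only for demonstration.

```python
import numpy as np

def adaptive_mala(log_pi, grad_log_pi, x0, n_iter=5000, target_accept=0.574, seed=0):
    """Toy Metropolis-adjusted Langevin sampler with Robbins-Monro step-size adaptation."""
    rng = np.random.default_rng(seed)
    x, log_eps = float(x0), 0.0
    samples = np.empty(n_iter)
    accepts = 0
    for i in range(1, n_iter + 1):
        eps = np.exp(log_eps)
        # Langevin proposal: the gradient drift 'points the simulation in the right direction'
        mean_fwd = x + 0.5 * eps * grad_log_pi(x)
        y = mean_fwd + np.sqrt(eps) * rng.standard_normal()
        mean_bwd = y + 0.5 * eps * grad_log_pi(y)
        # Metropolis-Hastings correction for the asymmetric proposal
        log_q_fwd = -(y - mean_fwd) ** 2 / (2 * eps)
        log_q_bwd = -(x - mean_bwd) ** 2 / (2 * eps)
        log_alpha = min(0.0, log_pi(y) - log_pi(x) + log_q_bwd - log_q_fwd)
        if np.log(rng.uniform()) < log_alpha:
            x = y
            accepts += 1
        # Robbins-Monro: nudge the log step size toward the target acceptance rate
        log_eps += (np.exp(log_alpha) - target_accept) / i ** 0.6
        samples[i - 1] = x
    return samples, accepts / n_iter

# Sketch run on a standard normal target: log pi(x) = -x^2/2, grad = -x
draws, rate = adaptive_mala(lambda x: -0.5 * x * x, lambda x: -x, x0=3.0)
```

A thoughtless implementation could adapt forever at a constant rate and destroy convergence; the decaying `1 / i**0.6` gain is one standard way to keep the adaptation diminishing.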

12.1.6 Session 2c: Medical I
Session Room: George Fox B59
Chair: Bryony Hill

Start time 11:10 ANINTRODUCTIONTOCOMPLEXINTERVENTIONSAND THE RELATED ANALYSIS OF COUNTERFACTUAL QUESTIONS

Neil Casey2
1 General Practice & Primary Care Research Unit, 2 , UK
Keywords: Complex Interventions, Counterfactual

Following recent MRC guidelines, a framework for the design and evaluation of complex intervention trials has been suggested, leading to more clarity in the area. In this talk I plan to give an introduction to complex interventions and the potential challenges that arise for a statistician in the area. The work is focused on data from the ProActive trial, a behavioural intervention trial aimed at increasing physical activity in those at risk of Type II diabetes, an illness typically associated with a sedentary lifestyle. I shall present results from both the data set and from simulations based on the trial, using models that consider counterfactual questions. Two methods are outlined: G-estimation for structural mean models and an instrumental variable regression model.

Start time 11:35

MODELLING CORRELATED BINOMIAL DATA USING CONJUGATE FUNCTIONS
Kevin Wilson and Dr. Malcolm Farrow
Newcastle University, UK
Keywords: Bayesian,

Bayesian inference for such things as collections of related binomial or Poisson distributions typically involves rather indirect prior specifications and numerical methods (usually MCMC) for posterior evaluation. Some possible alternative approaches to this problem shall be considered. They fall into two main types: copula functions and conjugate functions. It will be demonstrated that, although copula functions have many useful properties, most are unsuitable for the above analyses.
Three possible conjugate functions for modelling correlated binomial data shall be proposed, and each will be illustrated using a dataset reported by the Anturane Reinfarction Trial Research Group on the use of the drug sulfinpyrazone in patients who had suffered myocardial infarctions (heart attacks). The talk shall be motivated using a much larger dataset concerned with the deaths of patients after surgery in two areas of the USA.

Start time 12:00

OPTIMAL RESOURCE ALLOCATION BETWEEN TRIALS OF TWO COMPETING DRUGS
P. Windridge, S.D. Jacka and J. Warren

Department of Statistics, University of Warwick, UK
Keywords: competing clinical trials, diffusions, martingales, stochastic calculus, Hamilton-Jacobi-Bellman

This talk concerns the dynamic allocation of a scarce resource between two independent but competing clinical trials, each of which eventually ends in success or failure. Our primary aim is to find a strategy that minimises the total quantity of the resource required for either (a) a single trial to succeed (our understanding being that the other project is abandoned) or (b) both trials to fail. A secondary aim (time permitting) is to investigate the situation when we only have a fixed amount of the scarce resource available.
We shall formulate the problem probabilistically and outline how the heavy machinery of stochastic calculus can be used to deduce optimality results.

Start time 12:25

A REVIEW OF TREE-BASED APPROACHES IN SURVIVAL ANALYSIS
Alberto Alvarez Iglesias1 and Dr. John Newell2
1 National University of Ireland, Galway. 2 Biostatistics Unit, Clinical Research Facility, School of Medicine, National University of Ireland, Galway.
Keywords: Censored data, Proportional hazards, Tree based methods, Survival trees, Recursive partitioning

Survival analysis is a statistical method that has been applied over the last few decades in different areas of medicine and in reliability in manufacturing. In order to model the survival experience while adjusting for explanatory variables, the proportional hazards model (Cox regression model) is a standard approach. An additional approach involves the use of tree-based methods. These have been recognised as a useful modelling tool, particularly in the medical area, as they mimic medical decision making. Survival trees are a useful method for identifying structure in high-dimensional survival data. In this talk a description of survival trees will be given, some interesting areas of improvement will be suggested and a biomedical application will be presented.

Start time 12:50

SIMULATION STUDY OF THE IMPACT OF NON-DIFFERENTIAL MEASUREMENT ERROR ON NON-LINEAR DISEASE MODELS

Alexander Strawbridge
MRC Biostatistics Unit, Cambridge, UK
Keywords: Measurement error, Simulation, Cox regression, MacMahon's Method, Fractional polynomials, P-splines

When measuring exposures in the real world, measurement error is often unavoidable. When regressing a survival outcome on an imperfectly measured exposure, measurement error can induce bias in parameter estimates as well as mask features of the data. We conduct a simulation study using data generated from a range of epidemiologically plausible non-linear survival models ('true' exposure) to which we introduce varying degrees of measurement error generated from a standard non-differential model ('observed' exposure). Common non-linear modelling methods - groups (the standard epidemiological practice), P-splines and fractional polynomials - are fitted to both the 'true' and 'observed' exposures to explore the effect of measurement error on functional form. A commonly used correction method for groups is considered and its limitations explored.
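
The core bias mechanism can be shown in a few lines. The study above uses survival models; to keep a self-contained sketch we illustrate classical non-differential error with plain linear regression instead, where the attenuation of the slope has a simple closed form. All numbers here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
x = rng.standard_normal(n)             # 'true' exposure
u = rng.standard_normal(n)             # non-differential measurement error
w = x + u                              # 'observed' exposure
y = 2.0 * x + rng.standard_normal(n)   # outcome; the true slope is 2

def ls_slope(pred, resp):
    """Least-squares slope of resp on pred."""
    return float(np.cov(pred, resp)[0, 1] / np.var(pred, ddof=1))

b_true = ls_slope(x, y)   # close to the true slope, 2
b_obs = ls_slope(w, y)    # attenuated toward 2 * var(x)/(var(x)+var(u)) = 1
```

With equal exposure and error variances the reliability ratio is 1/2, so the slope fitted to the 'observed' exposure is biased toward half its true value; in non-linear models the same mechanism also smooths away features such as thresholds.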

12.1.7 Session 3a: General I
Session Room: George Fox 3 and 4
Chair: Jiayi Liu

Start time 14:10

ASYMPTOTIC MODEL SELECTION UNDER NON-STANDARD CONDITIONS
Piotr Zwiernik
University of Warwick
Keywords: BIC, singular models, graphical models with hidden variables

The Bayesian Information Criterion (BIC) is derived under some regularity conditions on the model under consideration. The problem is that these conditions are not satisfied in most of the interesting cases, such as Gaussian mixtures, hidden Markov models and phylogenetic tree models. We give an idea of what might go wrong and present a basic example of the adjusted BIC in the case of binary naive Bayes models.
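
For orientation, here is the standard BIC in a regular setting, which is the starting point the abstract refers to (not the adjusted BIC for singular models). The Gaussian regression target and the candidate polynomial degrees are illustrative assumptions.

```python
import math
import numpy as np

def gaussian_bic(y, X):
    """BIC = k*log(n) - 2*loglik for ordinary least squares with Gaussian errors."""
    n, k = len(y), X.shape[1] + 1                 # +1 parameter for the error variance
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma2 = np.mean((y - X @ beta) ** 2)         # ML estimate of the error variance
    loglik = -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)
    return k * math.log(n) - 2 * loglik

rng = np.random.default_rng(2)
x = rng.uniform(-2, 2, 200)
y = 1.0 + 2.0 * x + rng.standard_normal(200)      # the true model is linear
bics = {d: gaussian_bic(y, np.vander(x, d + 1)) for d in (1, 2, 5)}
best = min(bics, key=bics.get)                    # BIC should favour the small model
```

In regular models the `k*log(n)` penalty is justified by a Laplace approximation to the marginal likelihood; for singular models such as mixtures that approximation breaks down, which is exactly the gap the talk addresses.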

Start time 14:35

TWO GRAPHICAL METHODS FOR OUTLIER DETECTION IN DISCRETE DISTRIBUTIONS
Fiona McElduff, Dr. Mario Cortina-Borja and Dr. Angie Wade

Institute of Child Health, UCL.
Keywords: outliers, discrete distributions, empirical probability generating function, surprise index

Discrete distributions are often long tailed, with a few observations with high values that may be crucial in statistical analyses. Such values may either be due to characteristics of the underlying distribution or may potentially be outliers. We present two graphical methods for outlier detection in discrete distributions. The empirical probability generating function (epgf) provides a smooth projection of observed discrete data and can be plotted with the probability generating function (pgf) of the theoretical probability distribution to reveal potential outlying observations. The Surprise Index is an empirical measure of how unexpected an observed value is with respect to a probability model; large values indicate 'surprising' events. Both techniques allow the comparison of models for discrete data. We apply these methods to model data from epidemiological and clinical studies. The analyses and graphical presentations discussed are implemented in the R framework for Statistical Computing.
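
The epgf itself is a one-line estimator. The authors work in R; the following Python sketch (an illustration, not their implementation) computes the epgf of a Poisson sample, compares it with the theoretical pgf, and shows how a single gross outlier inflates the epgf.

```python
import numpy as np

def epgf(counts, t):
    """Empirical probability generating function: the sample mean of t^X."""
    counts = np.asarray(counts, dtype=float)
    return float(np.mean(t ** counts))

rng = np.random.default_rng(3)
counts = rng.poisson(2.0, size=1000)

# On [0, 1] the epgf should track the theoretical Poisson(2) pgf, exp(2*(t-1))
ts = np.linspace(0.0, 1.0, 11)
emp = np.array([epgf(counts, t) for t in ts])
theo = np.exp(2.0 * (ts - 1.0))

# A single large outlier dominates t^X for t > 1, bending the epgf upward
contaminated = np.append(counts, 50)
```

Plotting `emp` against `theo` is the diagnostic described above: a clean sample hugs the theoretical curve, while the contaminated sample departs from it sharply for `t > 1`.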

Start time 15:00

BAYESIAN INFERENCE FOR SURVEYS
Erofili Grapsa
University of Southampton, UK

Bayesian modelling for surveys and finite populations is flexible enough to take into account design features like stratification or clustering when we know these features in advance. Assuming that we act as data analysts, information about the sampling scheme and the population grouping variables is often provided only through the survey weights. The weights provide information about the groups in the population, but it may still be unknown whether all the population groups are represented in the sample or which variables have been used to create them. Thus, the number of different weights/groups may be considered as the number of random effects, and a mixed effects model can be applied. Since the variables of interest in surveys are usually categorical, or categorised with many categories, a multinomial model is suitable.
This is a complex model and the posterior densities are not directly available. The solution is to use Markov chain Monte Carlo techniques, like the Metropolis-Hastings algorithm, to simulate draws from the posterior densities of the parameters. A correlation problem between the parameters arises, and reparameterisation becomes essential in order to obtain less correlated parameters. Hierarchical centering is an efficient reparameterisation algorithm for this case. It uses the fact that multilevel models contain linear predictors consisting of variables with associated fixed effects and zero-mean random effects, and it involves centering the random effects around the predictors.

12.1.8 Session 3b: Dimension Reduction
Session Room: George Fox 5 and 6
Chair: Benjamin Taylor

Start time 14:10

DIMENSION REDUCTION AND PROPENSITY SCORE ANALYSES IN OBSERVATIONAL STUDIES
Hui Guo
University of Cambridge, UK
Keywords: sufficient covariate, propensity score, linear discriminant, average causal effect

In many medical studies, the focus is on estimating the average causal effect (ACE) of a treatment. If the available data are gathered from observational studies where randomisation is absent, then estimating the ACE becomes problematic. We aim to find a scalar propensity variable to represent subjects’ characteristics. Given that the response, characteristics and treatment are linearly related, identical (different) covariance matrices of the characteristics for the treated and untreated groups result in the same (different) estimated ACEs from regressing the response on the treatment and characteristics, and on the treatment and propensity variable.

Start time 14:35

BAYESIAN VARIABLE SELECTION IN CLUSTER ANALYSIS
Vasiliki Dimitrakopoulou and Prof. Philip J. Brown, UK
Keywords: Bayesian Variable Selection, Bayesian Clustering, Stochastic Search Techniques

Over the last decade, technological advances have led to data of high dimensionality. A very common characteristic of such data is the large number of variables, with the number of observations being substantially smaller (p >> n). Two tasks are commonly addressed during the analysis of high-dimensional data: one is to uncover the group structure of the observations, and the other involves the identification of the important variables. In our method we try to deal with these two problems simultaneously. The clustering problem is cast in the form of a multivariate normal mixture model. The selection of the discriminating covariates is handled by the introduction of a binary exclusion/inclusion latent vector, which is updated via stochastic search techniques.

Start time 15:00

THE STATISTICAL MONITORING OF A CHEMICAL PROCESS
Javier Serradilla and Dr. Jian Q. Shi
Newcastle University, UK
Keywords: Multivariate Statistical Process Control, Gaussian Process Regression

Fault detection and diagnosis in chemical processes are crucial aspects of current process engineering practice. The early detection of faults, while the plant is still in operation, can help avoid abnormal event progression and reduce productivity losses. While there exists a variety of approaches to this problem, empirical techniques, as opposed to first-principles (deterministic) models, can take advantage of the extraordinary amount of data that is routinely recorded, and can be developed relatively quickly.
In this presentation we intend to review how dimensionality reduction techniques can be successfully used in process engineering, where the input variables are highly correlated and multidimensional in nature. Likewise, we will discuss how Gaussian process regression could be used within this context.
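
One standard way such dimension reduction is used in multivariate statistical process control is PCA with a Hotelling T² chart on the retained scores. The sketch below assumes that approach for illustration; the talk's own method may differ, and the simulated plant data and thresholds are invented.

```python
import numpy as np

rng = np.random.default_rng(4)

# In-control training data: 5 correlated process variables driven by 2 latent factors
W = rng.standard_normal((2, 5))
X = rng.standard_normal((500, 2)) @ W + 0.1 * rng.standard_normal((500, 5))

# Fit PCA to the in-control data via the SVD of the centred matrix
mu = X.mean(axis=0)
_, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
k = 2
P = Vt[:k].T                          # loadings of the first k principal components
var_k = s[:k] ** 2 / (len(X) - 1)     # variance captured by each retained component

def hotelling_t2(obs):
    """Hotelling's T^2 of a new observation in the reduced score space."""
    score = (obs - mu) @ P
    return float(np.sum(score ** 2 / var_k))

ok = rng.standard_normal(2) @ W + 0.1 * rng.standard_normal(5)   # in-control point
fault = ok + 5.0 * W[0]               # a large shift along a dominant direction
```

An alarm limit for `hotelling_t2` would normally come from the chi-squared or F distribution on `k` degrees of freedom; the faulted observation scores far above a typical in-control value.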

12.1.9 Session 3c: Genetics and Systems Biology I
Session Room: George Fox B59
Chair: David Miller

Start time 14:10

ANALYSIS OF CHIP-CHIP EXPERIMENTS FOR MAPPING SITES OF TRANSCRIPTIONAL REGULATION
Dennis Wang1, Augusto Rendon2 and Lorenz Wernisch1
1 MRC Biostatistics Unit, Cambridge, UK, 2 Department of Haematology, University of Cambridge, Cambridge, UK
Keywords: biostatistics, bioinformatics, genetics, systems biology

Protein-DNA interactions contribute to specific and epigenetic regulation of gene expression. In order to identify the genomic sites where proteins bind to regulate the transcription of nearby genes, researchers apply a common technique involving chromatin immunoprecipitation on tiling microarrays (ChIP-chip). Although a powerful experimental tool for studying gene regulation, the many steps in conducting such an experiment introduce variability into the final results. Differences in experimental protocols, microarray platforms and data analysis algorithms introduce different biases and noise into the measurements. Here we consider the statistical challenges of analysing ChIP-chip experiments for the unbiased mapping of regulatory relationships between ubiquitous transcription factors, histone acetylations and candidate genes. In particular, analysis methods must have high sensitivity in detecting binding events from ChIP-chip data while also maintaining a low number of false positives. We present strategies for assessing immunoprecipitate enrichment and introduce a probabilistic framework for inferring sites involved in the regulation of megakaryocyte development.

Start time 14:35

STOCHASTIC MODELLING IN BIOLOGY
Yang Luo and Professor Lorenz Wernisch
University of Cambridge, UK
Keywords: Stochastic differential equation, Model selection, Stochastic simulation algorithm

Traditionally, biological systems are modelled by deterministic models. However, biochemical reacting systems that involve small numbers of molecules of certain species are largely influenced by stochastic fluctuations and may not be adequately modelled by deterministic differential equations. This is due to the stochastic, or 'noisy', nature of biochemical processes. This stochasticity arises in two ways. The inherent stochasticity of biochemical processes such as transcription and translation in gene expression generates intrinsic noise. In addition, fluctuations in the amounts or states of other cellular components lead indirectly to variation in the expression of a particular species and thus represent extrinsic noise. Stochastic models, in the form of jump processes or stochastic differential equations, have been proposed instead for analysing such systems.

The aim of this project is to gain some insight into the stochastic behaviour of biochemical systems and to compare stochastic and deterministic models when performing parameter inference and model selection.
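
The keywords mention the stochastic simulation algorithm; as a minimal sketch of the intrinsic noise described above, here is a Gillespie simulation of a birth-death process. The reaction network and rate constants are illustrative assumptions, not a model from the talk.

```python
import numpy as np

def gillespie_birth_death(k_birth=10.0, k_death=0.1, x0=0, t_max=100.0, seed=5):
    """Exact stochastic simulation (Gillespie) of a birth-death process:
    0 -> X at rate k_birth;  X -> 0 at rate k_death * x.
    The matching deterministic ODE settles at k_birth/k_death = 100, while
    the stochastic path fluctuates around that level.
    """
    rng = np.random.default_rng(seed)
    t, x = 0.0, x0
    times, states = [t], [x]
    while t < t_max:
        rates = np.array([k_birth, k_death * x])
        total = rates.sum()
        t += rng.exponential(1.0 / total)       # exponential wait to the next reaction
        if rng.uniform() < rates[0] / total:    # pick which reaction fires
            x += 1
        else:
            x -= 1
        times.append(t)
        states.append(x)
    return np.array(times), np.array(states)

times, states = gillespie_birth_death()
```

With small molecule counts the relative size of these fluctuations grows, which is exactly the regime where the deterministic model becomes inadequate.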

Start time 15:00

PARAMETER ESTIMATION IN BIOLOGICAL MODELS
Peter Milner, Dr. Colin Gillespie and Prof. Darren Wilkinson
Newcastle University
Keywords: Parameter estimation, moment closure, MCMC inference

This talk will tackle one of the key problems in the new science of systems biology: inference for the rate parameters underlying complex stochastic kinetic biochemical network models, using partial, discrete and noisy time-course measurements of the system state. Although inference for exact stochastic models is possible, it is computationally intensive even for relatively small networks.
We explore the Bayesian estimation of stochastic kinetic rate parameters using approximate models based on moment closure analysis of the underlying stochastic process. By assuming a Gaussian distribution and using moment-closure estimates of the first two moments, we can greatly increase the speed of parameter inference. The parameter space can be efficiently explored by embedding this approximation in an MCMC procedure. Mixing problems often occur when estimating missing data values; we overcome this problem by using a block updating approach.

12.1.10 Session 4a: Operational Research
Session Room: George Fox 3 and 4
Chair: Daniel Williamson

Start time 16:20

THE THEORY OF MULTI-ARMED BANDITS WITH REGRESSORS
Benedict May
University of Bristol, UK
Keywords: Bandit Problem, Reinforcement Learning, Regression, Game Theory

The multi-armed bandit problem is a simple example of the exploitation/exploration trade-off generally inherent in reinforcement learning problems. An agent is tasked with learning from experience how to make decisions sequentially in order to maximise cumulative reward. In the extension considered, the agent is presented with a regressor (or regressor vector) before making each decision, and the potential rewards for each action at a decision epoch are partially determined by the regressor. The presentation will include a formal description of the problem studied so far, convergence-in-policy to optimality criteria for certain classes of algorithm, and intended future work. In particular, the extension to the multi-agent case (partnership games) will be discussed.
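
To fix ideas, here is a sketch of one possible algorithm for the setting above: two arms whose mean rewards depend linearly on a regressor vector, played by an epsilon-greedy agent with per-arm least-squares estimates and decaying exploration. This is an illustrative construction only; it is not claimed to be the class of algorithms analysed in the talk.

```python
import numpy as np

rng = np.random.default_rng(6)
theta = {0: np.array([1.0, 0.0]), 1: np.array([0.0, 1.0])}  # true arm parameters

def reward(arm, z):
    """Noisy reward: linear in the regressor z, plus Gaussian noise."""
    return float(theta[arm] @ z + 0.1 * rng.standard_normal())

# Ridge-regularised sufficient statistics for each arm's least-squares fit
A = {a: np.eye(2) for a in (0, 1)}
b = {a: np.zeros(2) for a in (0, 1)}
total, oracle = 0.0, 0.0
for t in range(1, 2001):
    z = rng.uniform(0.0, 1.0, 2)                 # regressor revealed before acting
    est = {a: float(np.linalg.solve(A[a], b[a]) @ z) for a in (0, 1)}
    eps = 1.0 / np.sqrt(t)                       # decaying exploration rate
    arm = int(rng.integers(2)) if rng.uniform() < eps else max(est, key=est.get)
    r = reward(arm, z)
    A[arm] += np.outer(z, z)                     # update the chosen arm's statistics
    b[arm] += r * z
    total += r
    oracle += max(z[0], z[1])                    # best achievable mean reward
```

Because exploration decays, the fraction of forced random pulls vanishes while every arm is still sampled infinitely often, which is the intuition behind convergence-in-policy results for such schemes.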

Start time 16:45

COUNTERFACTUAL BEHAVIOUR IN DECISION TREES
Nathan Huntley and Matthias C. M. Troffaes
Durham University, UK
Keywords: counterfactual, normal form, decision trees, choice functions

When solving decision trees using probability and expected utility, one can solve subtrees by "snipping off" the rest of the decision tree: everything except the subtree in question and the events observed on the path to that subtree can be ignored without changing the answer. It is well known that many choice functions other than expected utility do not have this property: in order to determine the choice one should take in a subtree, one must take into account the possible consequences of options that were refused in the past or events that did not occur. Choice functions with this behaviour are called counterfactual.
Counterfactual behaviour may be considered unappealing and presents some practical problems. We characterise the set of choice functions that are factual for normal form decision making (specifying all decisions in all eventualities in advance). It turns out that avoiding counterfactuality places strong restrictions on the choice function.

Start time 17:10

SEQUENTIALLY UPDATED PROBABILITY COLLECTIVES
M. Smyrnakis and D. Leslie
University of Bristol, UK
Keywords: Probability Collectives, Sequential Monte Carlo Methods

Multi-agent coordination problems can be cast as distributed optimization tasks. Probability Collectives (PCs) are techniques that deal with such problems in discrete and continuous spaces. In this talk we propose a new variation of PCs: Sequentially updated Probability Collectives. The proposed method does not change the main framework of classic PC methods. Our objective is to show how standard techniques from the statistics literature, Sequential Monte Carlo methods and non-parametric regression, can be used as building blocks within PCs, instead of the ad hoc approaches taken previously, to produce samples and estimate values in continuous action spaces.
We compare our algorithm in three different simulation scenarios with continuous action spaces: two classical distributed optimization functions, the three- and six-dimensional Hartman functions, and a vehicle target assignment type game. The results for the Hartman functions were close to the global optimum. Additionally, in our last simulation the agents managed to coordinate near to the optimal solution.

12.1.11 Session 4b: Time Series
Session Room: George Fox 5 and 6
Chair: Alexis Boukouvalas

Start time 16:20

M-ESTIMATORS OF SOME GARCH-TYPE MODELS: COMPUTATION AND APPLICATION
Farhat Iqbal
Lancaster University, UK
Keywords: GJR model, GARCH model, Computing M-estimator, B-estimator, VaR

In this paper, we consider robust M-estimation of time series models with both symmetric and asymmetric forms of heteroscedasticity, related to the GARCH and GJR models. The class of estimators includes the least absolute deviation (LAD), Huber's, Cauchy and B-estimators, as well as the well-known quasi-maximum likelihood estimator (QMLE). Extensive simulations are used to check the relative performance of these estimators in both models, and weighted resampling methods are used to approximate the sampling distribution of the M-estimators. Our results indicate that there are estimators that can perform better than the QMLE, and even outperform robust estimators such as LAD, when the error distribution is heavy-tailed. These estimators are also applied to real data sets.

Start time 16:45

ESTIMATION OF THE CORRELATION DECAY RATE FOR CHAOTIC INTERMITTENCY MAPS
D. J. Natsios, UK
Keywords: chaotic intermittency maps, long memory

Chaotic intermittency maps give a non-linear, non-Gaussian method of generating long memory time series. In particular, we study the symmetric cusp map, the asymmetric cusp map, polynomial maps and logarithmic maps. In previous studies by Bhansali and Holland, it has been shown that these maps can simulate stationary time series with a full range of values for the long memory parameter, including d = 0.5, which is usually considered non-stationary, d = 0, which is usually considered short memory, and d < 0, which is usually considered intermediate memory. This gives us the opportunity to carry out a simulation study to investigate the robustness of various long memory estimation techniques in extreme cases and when the assumptions of linearity and Gaussianity no longer hold.
We show that standard methods can give large bias, and we introduce an extended dual-parameter long memory model, which includes the standard one-parameter fractional difference model as a special case and also accommodates boundary behaviour of a type not admissible in the standard model.
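
The symmetric cusp map mentioned above is easy to simulate. The sketch below (with an arbitrary starting point) iterates x_{t+1} = 1 - 2*sqrt(|x_t|); orbits linger near the marginally stable point at -1, which is the intermittency mechanism behind the slow correlation decay.

```python
import numpy as np

def cusp_map(x0=0.3, n=10000):
    """Iterate the symmetric cusp map x_{t+1} = 1 - 2*sqrt(|x_t|) on [-1, 1]."""
    x = np.empty(n)
    x[0] = x0
    for t in range(1, n):
        x[t] = 1.0 - 2.0 * np.sqrt(abs(x[t - 1]))
    return x

def sample_acf(x, lag):
    """Sample autocorrelation of the series at the given lag."""
    xc = x - x.mean()
    return float(xc[:-lag] @ xc[lag:] / (xc @ xc))

series = cusp_map()
```

Computing `sample_acf(series, lag)` over a range of lags exhibits the slow, non-geometric decay that standard short-memory diagnostics would miss; estimating the decay rate from such output is the subject of the talk.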

Start time 17:10

THE SHANNON WAVELET IN LOCALLY STATIONARY WAVELET ANALYSIS
Kara Stevens
University of Bristol, UK
Keywords: Time Series, Locally Stationary, Locally Stationary Wavelet Analysis, Shannon Wavelet

A time series is a set of observations taken sequentially in time. The order of a time series is very important because there is often a dependence structure between observations. A useful way of obtaining information about a time series is to calculate its auto-covariance function, which measures the covariance between observations different time periods apart.
Often, an observed time series shows properties of local stationarity. Classical time series analysis often assumes stationarity, which means it is not suitable for time series exhibiting possible locally stationary behaviour. In this situation it may be more appropriate to use methodology that accounts for the local stationarity. One such technique is locally stationary wavelet (LSW) analysis. In LSW analysis, a method of looking at the covariance between time points is to calculate the localized auto-covariance function.
There are many different types of wavelets that can be used in LSW analysis, such as the Shannon wavelet. However, there is currently little research on the Shannon wavelet within the framework of LSW analysis. The aim of this presentation is to introduce LSW analysis and the Shannon wavelet, with special emphasis on the localized auto-covariance function.

12.1.12 Session 4c: Diffusions
Session Room: George Fox B59
Chair: Tristan Marshall

Start time 16:20

RETROSPECTIVE SAMPLING WITH AN APPLICATION TO EXACT SIMULATION OF DIFFUSIONS
Flávio B. Gonçalves and Gareth Roberts
University of Warwick
Keywords: Retrospective sampling, infinite dimension, diffusion process

In this talk I will introduce the idea of retrospective sampling, a simple simulation technique that is most powerful in infinite-dimensional contexts. I will show some simple examples where retrospective sampling is a suitable alternative to other commonly used methods. Finally, I will present a retrospective sampling algorithm for the exact simulation of diffusion processes.

Start time 16:45

IMPORTANCE SAMPLING ON DISCRETELY-OBSERVED DIFFUSIONS
David Suda1 and Paul Fearnhead2
1 University of Lancaster, UK, 2 University of Lancaster, UK
Keywords: importance sampling; diffusion bridges; Bayesian inference

This study focuses on importance sampling methods for sampling from conditioned diffusions. The fact that most diffusion processes have a transition density which is unknown or intractable makes it desirable to find an adequate alternative density to simulate from, hence making the importance sampling approach worth considering. In this study, the first aim will be to use Bayes' formula to derive the general stochastic differential equation for a diffusion bridge, which can be used to design efficient importance sampling proposals. Secondly, the performance of both existing and newly derived importance samplers will be assessed on various types of diffusions by means of Monte Carlo simulation, and a theoretical explanation of the output will be sought for each diffusion/sampler combination.

Start time 17:10

A NEW METHOD TO APPROXIMATE BAYESIAN INFERENCE ON DIFFUSION PROCESS PARAMETERS
Chaitanya Joshi and Dr. Simon Wilson
Trinity College Dublin, Ireland
Keywords: Bayesian Inference, Stochastic Differential Equations, Diffusion Processes, Gaussian Approximation, New Approach

Since, in real life, most diffusion processes are observed only at discrete time intervals that are not sufficiently small, both likelihood-based and Bayesian methods of inference become non-trivial. To overcome this problem, Bayesian inference is centred around introducing m latent data points between every pair of observations. However, it has been shown that as m increases one can make very precise inference about the diffusion coefficient of the process via the quadratic variation. This dependence results in slow mixing of naive MCMC schemes, which worsens linearly as the amount of data augmentation increases. Various approaches have been proposed to get around this problem: some involve transforming the SDE, while most others present innovative MCMC schemes.
We propose a new method to approximate Bayesian inference on the diffusion process parameters. Our method is simple, computationally efficient, does not involve any transformations, and is not based on the MCMC approach. The principal features of this new method are the Gaussian approximation proposed by Durham and Gallant (2002) and a grid search to explore the parameter space. In this paper we first introduce our new method and then compare its performance with recently proposed MCMC-based schemes on several diffusion processes.

12.2 Wednesday 25th March

12.2.1 Session 5a: Genetics and Systems Biology II
Session Room: George Fox 3 and 4
Chair: Yang Luo

Start time 09:10

BAYESIAN VARIABLE SELECTION METHODS FOR PARAMETRIC AFT MODELS IN HIGH DIMENSIONS
Md. Hasinur Rahaman Khan
Department of Statistics, University of Warwick, UK
Keywords: Accelerated Failure Time, Bayesian, Microarray Data

The need to analyse failure time data with very high-dimensional covariates arises in many fields. An important recent area of application is microarray data analysis, i.e. investigating the relationship between a censored survival outcome and microarray gene expression profiles. Because of the small sample size and large number of covariates in such situations, frequentist methods for the variable selection process can be unstable and result in over-fitting. This research focuses instead on the Bayesian approach to identifying the most influential covariates when fitting parametric accelerated failure time (AFT) models. The performance and sensitivity of such Bayesian variable selection methods will be analysed and demonstrated using both simulated and real datasets.

Start time 09:35

A BAYESIAN APPROACH TO PHYLOGENETIC NETWORKS
Rosalba Radice
University of Bath, UK
Keywords: Bayesian estimation; Biomolecular sequences; Markov chain Monte Carlo Method; Phylogenetic Network; Reticulation events

Phylogenies are the main tool for representing evolutionary relationships among biological entities. Their pervasiveness has led biologists, mathematicians and computer scientists to design a variety of methods for their reconstruction. However, most of these methods construct trees, and biologists have recognised that trees fail to take into account reticulation events such as horizontal gene transfer (HGT), leading to spurious results. These non-tree events can give rise to edges that connect nodes from different branches of a tree, creating a directed acyclic graph (DAG) structure usually referred to as a phylogenetic network. Here, we present a Bayesian approach to phylogenetic networks using a Markov chain governed by a hidden Markov model. Markov chain Monte Carlo (MCMC) techniques are employed to compute all posterior quantities of interest and, in particular, allow inferences to be made regarding the number of topology types along a multiple DNA sequence alignment.

Start time 10:00

A BAYESIAN MCMC APPROACH TO META-ANALYSIS OF MENDELIAN RANDOMIZATION STUDIES
Stephen Burgess and Simon Thompson
MRC Biostatistics Unit, Cambridge, UK
Keywords: Causal inference, Mendelian randomization, Meta-analysis, MCMC

Mendelian randomization is a technique for assessing the causal association of a putative factor with the risk of disease in a non-experimental setting. It uses the random allocation of genes at birth in a way analogous to treatment assignment in a randomized controlled trial. By finding a genetic variant associated with a different level of the putative factor, the different genotypes give rise to groups which, under certain assumptions, are randomly assigned and so independent of confounding and reverse causative effects. We here describe a novel Bayesian formulation to address the problem of estimating causal effects using individual data on genetic characteristics in multiple studies. It efficiently deals with multiple genes, with different genes being measured in different studies, with missing genetic data, and with heterogeneity between studies.

Start time 10:25

A DIFFUSION APPROXIMATION FOR STOCHASTIC KINETIC MODELS
Vasileios Giagos
Department of Mathematics and Statistics, Lancaster University, UK
Keywords: Diffusion process, Gene Regulatory Network, Simulation, ODE

Gene regulation is a cellular biological process, whereby molecular particles interact dynamically. In this context, molecular interactions are usually expressed as biochemical reactions which depend on a set of parameters, the reaction rates. Inference on reaction rates helps us to quantify the properties of gene regulatory systems. In principle our system follows a multidimensional discrete-state continuous-time process, which often leads to an estimation problem that is intractable both analytically and numerically. Therefore an approximation is often preferred: either deterministic, based on a system of Ordinary Differential Equations (ODEs), or stochastic, based on the Chemical Langevin Equation. A novel stochastic approximation is presented, which is based on the discrepancy between the real process and the system of ODEs. As the system size increases, it converges to a diffusion process with a numerically tractable transition density. We demonstrate, in a series of simulated experiments, that the limiting diffusion provides a good approximation to the real discrete-state process.

12.2.2 Session 5b: Spatial
Session Room: George Fox 5 and 6
Chair: Alexandre Rodrigues

Start time 09:10

RELATIVE RISK ESTIMATION OF DENGUE DISEASE MAPPING IN MALAYSIA USING A POISSON-GAMMA MODEL
Nor Azah Samat
Centre for Operational Research and Applied Statistics, Salford, M5 4WT, UK
Keywords: Relative Risk, Dengue, Disease Mapping

Disease mapping is a method to display the geographical distribution of disease occurrence, which generally involves the use and interpretation of a map to show the incidence of certain diseases. Relative risk (RR) estimation is one of the most important issues in disease mapping. This paper begins by providing a brief overview of dengue disease and its current situation in Malaysia. This is followed by a review of the classical model used in disease mapping, based on the standardized morbidity ratio (SMR), which we then apply to our dengue data. We then fit an extension of the classical model, which we refer to as a Poisson-Gamma model, in which prior distributions for the relative risks are assumed known. Both sets of results are displayed and compared using maps, and the Poisson-Gamma model reveals a smoother map with fewer extremes in the relative risk estimates. This paper also considers other methods that are relevant to overcoming the drawbacks of the existing methods, in order to inform and direct government strategy for monitoring and controlling dengue.
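The two estimators compared in the abstract can be sketched directly. For area i with observed count O_i and expected count E_i, the classical SMR is O_i/E_i; under a Poisson-Gamma model with O_i ~ Poisson(E_i * theta_i) and theta_i ~ Gamma(a, b), the posterior mean relative risk is (O_i + a)/(E_i + b), which shrinks extreme SMRs towards the prior mean a/b and so produces the smoother map. The function names and prior values below are illustrative, not taken from the paper.

```python
import numpy as np

def smr(observed, expected):
    """Classical standardised morbidity ratio estimate of relative risk."""
    return observed / expected

def poisson_gamma_rr(observed, expected, a, b):
    """Posterior mean relative risk under O_i ~ Poisson(E_i * theta_i) with
    theta_i ~ Gamma(a, b): a smoothed (shrunken) alternative to the SMR."""
    return (observed + a) / (expected + b)

# Illustrative counts for two areas: one with no cases, one with many.
observed = np.array([0.0, 10.0])
expected = np.array([1.0, 5.0])
classical = smr(observed, expected)                            # [0.0, 2.0]
smoothed = poisson_gamma_rr(observed, expected, a=1.0, b=1.0)  # [0.5, 11/6]
```

Both extreme areas are pulled towards the prior mean of 1, which is exactly the smoothing effect described for the maps.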

Start time 09:35

SPATIAL MODELLING OF WATER QUALITY ON LARGE RIVER NETWORKS
David O’Donnell, UK
Keywords: Spatial Prediction, Kriging, Water Quality, River Network

The routine monitoring of water quality has been carried out on rivers around the world for decades. Increasing industrialisation and intensive farming make it more relevant than ever before to monitor the effects such practices have on our water bodies and the aquatic life they support. European Union legislation such as the Nitrates Directive (adopted in 1991) and the Water Framework Directive (adopted in 2002) has set guidelines for monitoring and controlling pollution with which national bodies must comply.
From a spatial modelling viewpoint, modelling water quality over a river network raises the important question of whether Euclidean distance is suitable for use in such a context. One development has been the use of “river distance” in place of, and possibly in addition to, Euclidean distance. Standard spatial modelling techniques have been shown to be unsuitable in these cases, and so information on the river network itself, including data on flow volumes and the “flow connectivity”, must be incorporated to ensure their validity.
Both the Euclidean and river distance approaches have been implemented and compared on data from the River Tweed, in the borders of Scotland and England. Simulation studies were then carried out to investigate the possible effect of the locations of sampling stations on the results. The data from the Tweed are taken from the Scottish Environment Protection Agency (SEPA)’s routine monitoring of the site over a period of twenty-one years, beginning in 1986. During this time period, up to eighty-three stations are monitored for a variety of chemical and biological determinands. This presentation will focus on nitrogen, a key nutrient in determining water quality, especially given the Nitrates Directive.

Start time 10:00

A SPATIO-TEMPORAL ANALYSIS OF MENINGITIS INCIDENCE IN ETHIOPIA: AN ASSOCIATION WITH THE ENVIRONMENT AND CLIMATOLOGY
Michelle Stanton and Prof. Peter Diggle
CHICAS, Division of Medicine, Lancaster University, UK
Keywords: meningococcal meningitis, spatio-temporal, Ethiopia, epidemics, climate, environment

An area of sub-Saharan Africa known as the meningitis belt is frequently affected by large-scale meningitis epidemics resulting in tens of thousands of cases, and thousands of deaths, during epidemic years. The link between the seasonal and spatial patterns of epidemics and the climate has long been recognised, although the mechanisms which cause these patterns are not well understood. The Meningitis Environmental Risk Information Technologies Project (MERIT) is a collaborative project involving the World Health Organization and members of the environmental, public health and epidemiological communities. One of MERIT’s objectives is to use both routine meningitis surveillance data and information on climatic and environmental conditions to develop meningitis epidemic early warning systems (EWS). Such an EWS could then be used to improve the targeting of preventative and reactive vaccination efforts. Weekly meningitis incidence data have been obtained from the Ethiopian Ministry of Health for the period October 2000 to July 2008 at district (woreda) level. Further, data on the climate variables most strongly associated with meningitis incidence, aggregated to district level, have been obtained for the same time period. We propose to develop an integrated spatio-temporal model of incidence and meteorology, by decomposing the spatio-temporal variation of the relevant variables into a national trend, national temporal variations and local spatio-temporal fluctuations, with the aim of predicting future epidemics.

Start time 10:25

EMULATION OF MULTIVARIATE COMPUTER MODELS USING THE LINEAR MODEL OF COREGIONALIZATION
Tom Fricker
University of Sheffield, UK
Keywords: Gaussian process, emulator, coregionalization, nonseparable, computer experiment

An emulator is a statistical surrogate for an expensive computer model, used to obtain a fast probabilistic prediction of the output. Emulators can be constructed by considering the input-output relationship as an unknown function and modelling the uncertainty using a Gaussian process prior on the function.
If the computer model produces multiple outputs, the emulator must capture two types of correlation: correlation over the input space, and correlation between different outputs. We show that the usual mathematically convenient approach of treating the two types as separable can result in misspecified input-space correlation functions for some or all of the outputs. We propose an emulator with a nonseparable covariance, based on the linear model of coregionalization (LMC) taken from the geostatistical literature. By allowing different outputs to have different correlation functions, the LMC emulator can provide better estimates of prediction uncertainty across the outputs. The advantages of the LMC over a separable structure are illustrated with applications from the fields of science and engineering.
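A minimal sketch of the kind of nonseparable covariance the LMC induces for a multi-output emulator: each latent process has its own input-space correlation function, and the outputs mix the latent processes through a coefficient matrix, so different outputs need not share one correlation function. The squared-exponential kernel and all values below are illustrative assumptions, not details from the talk.

```python
import numpy as np

def sqexp(X, lengthscale):
    """Squared-exponential correlation matrix over 1-D inputs X."""
    d2 = (X[:, None] - X[None, :]) ** 2
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def lmc_covariance(X, A, lengthscales):
    """Joint covariance for p outputs at n inputs under the linear model of
    coregionalization: K = sum_t outer(A[:, t], A[:, t]) (x) k_t(X, X),
    where (x) is the Kronecker product and each k_t has its own lengthscale."""
    p, T = A.shape
    n = len(X)
    K = np.zeros((p * n, p * n))
    for t in range(T):
        B_t = np.outer(A[:, t], A[:, t])   # rank-1 coregionalization matrix
        K += np.kron(B_t, sqexp(X, lengthscales[t]))
    return K

# Two outputs built from two latent processes with very different lengthscales.
X = np.linspace(0.0, 1.0, 4)
A = np.array([[1.0, 0.0],
              [0.5, 1.0]])
K = lmc_covariance(X, A, lengthscales=[0.1, 1.0])
```

Because output 1 loads only on the rough latent process while output 2 mixes both, the implied input-space correlation differs between outputs, which is exactly what a single separable covariance cannot express.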

12.2.3 Session 5c: General II
Session Room: George Fox B59
Chair: Shu-Ting Lee

Start time 09:10

AGE-PERIOD-COHORT MODELS
Jiayi Liu
Lancaster University
Keywords: collinearity, Age-Period-Cohort models

In statistical studies of human populations, in areas such as demography, public health, sociology and criminology, age, period and cohort have commonly been used as explanatory variables by researchers. A statistical issue arises from the perfect linear relationship among age, period and cohort effects: cohort membership can be determined from period and age. It is not possible to separate the three effects in a generalized linear model without some kind of constraint. The purpose of this work is to find a method for which the estimable functions are invariant to the choice of constraint on the parameters. Two approaches will be introduced and compared in this talk: the Penalty Function approach of Osmond and Gardner (1982), and the Intrinsic Estimator of Fu and Rohan (1999).

Start time 09:35

VALIDATION OF A HEALTH-RELATED QUALITY OF LIFE SCALE FOR CLEFT CHILDREN USING EXPLORATORY FACTOR ANALYSIS
Aisyaturridha Abdullah1, Dr. Damon Berridge1, Professor Eric Emerson2 and Dr. Zainul Ahmad Rajion3
1 Department of Mathematics and Statistics, Lancaster University
2 Institute for Health Research, Lancaster University
3 School of Dental Sciences, Universiti Sains Malaysia
Keywords: thematic analysis, cleft, children, exploratory factor analysis

This study investigates triangulation of the findings of a thematic analysis by applying an exploratory factor analysis to themes identified in a qualitative study. Six focus group discussions on elementary-school-age children’s experiences of living with a cleft were carried out during the qualitative phase of the study. Parents were interviewed at the same time by a trained interviewer about their experiences of parenting a cleft child. A scale was developed from a thematic analysis of the children’s experiences of living with a cleft and the parents’ experiences of parenting a cleft child. The same qualitative analysis approach was also used to merge the themes from the interviews and focus groups with the themes provided by previous studies. The scale was divided into three parts: the first part measures the impact of the condition; the second part determines the frequency of the problems; and the third part measures the children’s perception of their condition. The parts consist of 28, 32 and 13 items respectively. Thematically, the first and second parts of the scale were developed from seven global themes predefined from the thematic analysis: treatment and support; physical functioning; self-concept; peer acceptance and social competence; oral symptoms; emotional adjustment; norms and beliefs. The third part was developed from three themes: emotional adjustment; norms and beliefs; self-concept. The scale was piloted on 31 children in upper elementary school (9 to 12 years old) and the data were analyzed using exploratory factor analysis. The extracted factors validate the original qualitative analysis by translating the qualitative themes into statistical factors. The statistical analysis may inform further qualitative analysis, such as clinical impact analysis, and quantitative analysis, such as Rasch analysis or confirmatory factor analysis.

Start time 10:00

PORE PATTERNS IN FINGERPRINTS
Bryony Hill
University of Warwick, UK
Keywords: fingerprints, image analysis, spatial point patterns, tensors

There are three levels of features in fingerprints: Level 1, the overall pattern; Level 2, the bifurcations and ridge endings (minutiae); and Level 3, the pores and ridge contours. Experiments have shown that Level 3 features hold significant discriminatory information, and they are being used increasingly in fingerprint matching. I have been studying the point pattern formed by the pores with the aim of roughly reconstructing the ridgelines of the fingerprint. In this talk I will show how tensors can be used to summarise local orientations in point patterns and how these local estimates can be used to estimate ridgelines.

Start time 10:25

STATISTICAL SHAPE ANALYSIS AND APPLICATIONS IN BIOINFORMATICS
Christopher Fallaize, UK

The shape of an object can be regarded as all the information remaining after removing the effects of the similarity transformations of rotation, translation and rescaling. A common goal in statistical shape analysis is to analyse differences in shape between two or more objects. We first seek an optimal registration which aligns the objects as closely as possible in some sense, after similarity transformations, before proceeding with a statistical analysis. A common approach to shape analysis involves reducing an object to a configuration of points, or landmarks, which capture the important aspects of the overall shape. In many applications, these landmarks are easily identifiable and can be matched to corresponding landmarks between objects of the same type; for example, when comparing the shape of two human faces, landmarks could include the tip of the nose and corners of the eyes.
An emerging application of statistical shape analysis lies in the field of bioinformatics, where comparing the shape of molecules, for example proteins, is of interest. The shape of a molecule, or form if scale is not removed, is considered important in predicting its function, since biological function is closely related to shape. An added complication in this application is that of identifying matching landmarks on different molecules, usually the atom locations. This is an example of unlabelled shape analysis, where the objective is first to identify matching landmarks on each configuration, before analysing shape differences in the usual way.
Here, a Bayesian formulation of the problem of unlabelled shape analysis will be presented, along with an application to identifying the atoms held in common by a set of molecules with similar biological functions.

12.2.4 Session 6a: General III
Session Room: George Fox 3 and 4
Chair: Rebecca Killick

Start time 11:10

AN ADDITIVE PENALTY APPROACH TO DERIVATIVE ESTIMATION
Andrew Simpkin1 and Dr. John Newell2
1 National University of Ireland, Galway, Ireland
2 Clinical Research Facility, NUI, Galway, Ireland
Keywords: Derivative estimation, Spline smoothing, Additive penalty

The methods of Aldrin (2004) and Belitz & Lang (2008) employ an additive penalty structure in a P-spline model for increased sensitivity in smoothing a function of the data, i.e. they find the f which minimises
\[
\sum_{i=1}^{n} \bigl(y_i - f(x_i)\bigr)^2 + \lambda_1 \sum_{j=k+1}^{m} \bigl(\Delta^k \alpha_j\bigr)^2 + \lambda_2 \sum_{j=k+2}^{m} \bigl(\Delta^{k+1} \alpha_j\bigr)^2 .
\]
We extend this approach to handle derivative estimation. The extra penalty term allows for the increased flexibility which is often required for derivative estimation. Possible methods for selecting the smoothing parameters shall be presented, alongside a discussion of whether further extensions in additive penalty terms could aid the process. Several authors have pursued the idea of a non-constant value of λ in a spline smoothing context. The methods of Ruppert & Carroll (2000) and Krivobokova et al. (2007) shall be compared with our own through simulations and applications to biomedical data. The possibility of an additive adaptive penalty shall be discussed.
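The penalised least-squares criterion above can be evaluated directly: numpy's n-th order differences play the role of the Δ operator. This is a generic sketch for a fixed coefficient vector (the basis matrix B and all names are illustrative); the talk's method goes further, choosing the smoothing parameters and targeting derivative estimates.

```python
import numpy as np

def additive_penalty_objective(y, B, alpha, lam1, lam2, k=2):
    """Penalised residual sum of squares with the additive penalty structure:
    ||y - B @ alpha||^2 + lam1 * sum (Delta^k alpha_j)^2
                        + lam2 * sum (Delta^(k+1) alpha_j)^2."""
    rss = np.sum((y - B @ alpha) ** 2)
    pen1 = lam1 * np.sum(np.diff(alpha, n=k) ** 2)
    pen2 = lam2 * np.sum(np.diff(alpha, n=k + 1) ** 2)
    return rss + pen1 + pen2
```

With k = 2 the first penalty discourages curvature in the coefficients and the second discourages changes in curvature, the extra flexibility the abstract exploits for derivative estimation.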

Start time 11:35

BAYES LINEAR VARIANCE-COVARIANCE ADJUSTMENT FOR DYNAMIC LINEAR MODELS WITH APPLICATION TO LARGE INDUSTRIAL SYSTEMS
David Randell1, Prof. Michael Goldstein1 and Philip Jonathan2
1 University of Durham
2 Shell Technology Centre Thornton
Keywords: Bayes Linear, Dynamic Linear Models, Covariance Learning, Corrosion Modelling, Multivariate

Modelling of complex corroding industrial systems is critical to effective inspection and maintenance for assurance of system integrity. We model wall thickness and corrosion rate for multiple dependent corroding components using a dynamic linear model (DLM). Using a Bayes Linear approach, we will show how to learn about model variances and covariance structures using exchangeability assumptions and the Mahalanobis distance.

Start time 12:00

RAPID APPLICATION DEPLOYMENT WITH R
Wayne R. Jones
Shell Global Solutions (UK)

In this presentation we discuss our experiences of deploying R client-based solutions. We demonstrate that R packages such as “R-(D)Com” and “RODBC”, which allow communication with other applications (e.g. Excel), together with R GUI packages such as “rpanel” and “tcltk”, make for a very powerful combination of tools for building customised statistical applications. Thanks, in part, to the very concise nature of R programming, such applications can be developed very quickly and easily, making consultancy projects that were previously unviable due to cost/benefit or manpower constraints achievable. The scope for using R-based applications within Shell is enormous, and we have already deployed numerous diverse solutions across all areas of the business, including: Monte Carlo simulation tools, customised data visualisation tools, a forecasting toolbox, a groundwater monitoring application, automatic report generation and curve fitting, to name but a few.

Start time 12:25

WHY WOULD ANYONE (IN THEIR RIGHT MIND) BUY STATISTICA WHEN THEY CAN GET R FOR FREE?

Dr Jurek Gurycz
StatSoft Ltd

Over the last few years, R has taken the (academic) statistical world by storm, becoming the de facto standard platform for all new methodological developments. A vast library of analytical tools is now freely available to all, with the support of an active global user community. However, due in part to the lack of a comprehensive graphical user interface, its penetration into industry and adoption by non-statisticians have been somewhat slower. Rather than trying to resist the tide, StatSoft have chosen to integrate R into STATISTICA in a way that no other statistical software supplier has done, making STATISTICA a powerful deployment environment for R routines, with applications in R&D, production, commerce and data mining. STATISTICA users now have access to the vast potential of the R project, without sacrificing the familiar spreadsheet-based data management and interactive graphics. The presentation will be illustrated using a number of practical examples.

12.2.5 Session 6b: Multivariate Statistics
Session Room: George Fox 5 and 6
Chair: Matthew Sperrin

Start time 11:10

BUILDING A PRIOR FOR A VARIANCE MATRIX USING THE CONCEPT OF INDIVIDUAL VARIATION
Sarah Germain, Dr. Malcolm Farrow and Professor Richard Boys
Newcastle University, UK
Keywords: Elicitation, Multivariate normal distribution, Variance matrix, Uncertainty factors

Specifying a prior for a variance matrix is difficult because of the constraint that it must be positive–definite. A simple response to this challenge is to restrict the support of the prior to a less complicated space by assuming some simple parametric structure, such as diagonality or compound symmetry. However, the posterior for the variance matrix is then only non–zero on this restricted space. Another common solution is to follow a conjugate analysis using the Wishart distribution as a prior for the precision (inverse variance) matrix. In terms of prior elicitation this approach is unsatisfactory because the Wishart distribution has only a single degrees-of-freedom parameter with which to specify all second–order moments. In the literature, several reparameterisations of the variance matrix have been proposed (e.g. Cholesky or spectral decompositions, matrix logarithm transformation) which transform to a less constrained space for which more flexible prior distributions are available. However, the new parameters are generally lacking in statistical meaning, which creates obstacles for building prior belief specifications. Based on the representation theorem for second–order exchangeable structures, a statistically meaningful reparameterisation of the variance matrix in an unconstrained space is presented. As a first step, the aleatory (random) variation in the data is transformed into more interpretable components termed individual variation factors, which have zero mean, are assumed uncorrelated and may be of higher dimension than the data. Specifying a coherent joint distribution for all the variances and covariances in the variance matrix can therefore be reduced to the specification of a joint distribution for the variances of the individual variation factors, which must only respect the constraint of positivity of the variances. In essence this is a factor model for the elements of the variance matrix. A prior dependence structure for the individual variation factor variances can be developed by dividing uncertainty about the variances into specific and common parts. This facilitates the elicitation of substantive prior information about the variance matrix, although simple variance structures such as compound symmetry can also be recovered by, in this case, assuming there to be only two distinct individual variation factors.

Start time 11:35

EFFICIENT GEE WHEN THE CLUSTER SIZE IS INFORMATIVE
Menelaos Pavlou1 and Andrew Copas2
1 Department of Statistical Science, UCL, London, UK
2 Statistical Methodology Group, MRC Clinical Trials Unit, London, UK
Keywords: Generalised estimating equations, informative cluster size, repeated measurements

Generalised Estimating Equations (GEE) are a method for fitting marginal models to repeated measurements, and have become very popular due to their ease of implementation and the loose distributional assumptions required. In the application of GEE it is known that greater efficiency can be achieved by using a realistic correlation structure rather than independence. The limitation in the use of GEEs which is the main issue of this paper is their behaviour and performance with a realistic correlation matrix when the cluster size is informative. It is implicitly assumed for the simple GEE that the size of the cluster is unrelated to the parameters under study and that any missing data are missing completely at random (MCAR). As noted in the literature, informative cluster size in the context of clustered data arises when the expected outcome of interest is related to the cluster size. When the cluster size is informative, the analyst can choose between two analyses: either for the population of clusters or for the population of measurements. Several authors have demonstrated that applying a modified GEE inversely weighted by the cluster size gives inference about the population of clusters. It is also noted that in both of the above analyses the correlation structure selected should be independence, otherwise the GEE will be biased. Recently, Chiang et al. (2008) proposed a method to increase efficiency when the cluster size is informative and the population of interest is the population of clusters. In this paper we propose an alternative method (WGEE) to increase efficiency when the cluster size is informative. The benefit of our weighted GEE with a non-independence correlation matrix is that the weights can be selected accordingly to give inference for the population of clusters or the population of measurements. We also evaluate the performance of the Chiang method and our WGEE in situations where the data are non-mean-balanced. We term this scenario “informative covariate structure” and introduce additional populations for inference. We present the corresponding estimation methods in simple cases where the covariates are categorical and cluster-varying, and we also suggest an adaptation of the WGEE to increase efficiency. We demonstrate an application of our methods on a dataset from the Delta trial, which looks at the risk of certain adverse effects.

Start time 12:00

MULTIPLE IMPUTATIONS OF BIO-DATASETS
Lu Zou
University of Sheffield, UK
Keywords: EM algorithm, Additive Regression, Mean Matching

This presentation will start with a brief introduction to the two bio-datasets involved in my study. One unavoidable issue is that many values are missing in both sets. Rather than ignoring them, imputation is considered. This talk will focus on the imputation of continuous variables which are to be used as biomarkers, in two situations: i) the usual randomly-missing situation, and ii) a ’file-matching’ situation. Two multiple imputation methods are studied: multiple imputation using additive regression, bootstrapping and predictive mean matching (PMM); and EM imputation combined with re-sampling methods. I will introduce in detail how they work, and then describe my own EM function. A comparison is carried out using the bio-datasets.

Start time 12:25

GRAPHICAL GAUSSIAN MODELS WITH SYMMETRIES
Helene Neufeld
Department of Statistics, 1 South Parks Road, Oxford, OX1 3TG

This talk is concerned with new types of graphical Gaussian models obtained by placing symmetry restrictions on their parameters: equality between specified elements of the concentration matrix (RCON), equality between specified partial correlations (RCOR) and restrictions that are generated by permutation symmetry (RCOP). The models can be represented by vertex and edge coloured graphs, where parameters that are associated with equally coloured vertices or edges are restricted to being identical. We will give results on the mutual relationships between the model types and, by means of an example, will describe our current research interest: the identification of RCOP models. We have developed a computer programme which, using a necessary and sufficient condition we have derived, determines whether a given vertex and edge coloured graph represents an RCOP model. The core computations are performed by the publicly available computer programme nauty (http://cs.anu.edu.au/~bdm/nauty/), which computes the automorphism groups of vertex coloured graphs and digraphs. We will outline the underlying algorithm of our programme and will describe potential future directions, including graph modification.

Start time 12:50

A MULTIDIMENSIONAL SCALING APPROACH TO MULTIVARIATE PAIRED COMPARISON DATA
Stephen Bennett and Trevor Cox
Unilever R&D Port Sunlight

The paired comparison test (or 2-alternative forced choice test) is one of the simplest and most used sensory tests. Panellists compare products in pairs and assess various attributes, such as fragrance, saltiness, overall preference, etc. The data generated for each attribute are usually analysed using the Bradley-Terry model. Each product in the test is assigned a rating parameter $\pi_i$ for the attribute, and the probability that product $i$ is deemed to have the stronger fragrance, to be more salty, or to be preferred over product $j$ is modelled as $\pi_i/(\pi_i + \pi_j)$. Maximum likelihood is then used to estimate these parameters. The analysis can be carried out separately for each attribute, but a multidimensional scaling approach that models all attributes concurrently can give more informed results on the interactions between attributes.
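For the Bradley-Terry model described above, the ratings can be estimated by maximum likelihood using the standard MM (minorisation-maximisation) iteration. The sketch below uses made-up win counts and is a generic single-attribute fit, not the multidimensional scaling extension the talk proposes.

```python
import numpy as np

def bt_prob(pi_i, pi_j):
    """Bradley-Terry probability that product i beats product j."""
    return pi_i / (pi_i + pi_j)

def bt_fit(wins, n_iter=200):
    """Maximum-likelihood Bradley-Terry ratings via the MM iteration;
    wins[i, j] counts how often product i was preferred over product j."""
    n = wins.shape[0]
    pi = np.ones(n)
    for _ in range(n_iter):
        total_wins = wins.sum(axis=1)
        denom = np.array([
            sum((wins[i, j] + wins[j, i]) / (pi[i] + pi[j])
                for j in range(n) if j != i)
            for i in range(n)
        ])
        pi = total_wins / denom
        pi /= pi.sum()          # ratings are only identified up to scale
    return pi

# Product 0 preferred over product 1 in 8 of 10 paired comparisons.
wins = np.array([[0.0, 8.0],
                 [2.0, 0.0]])
ratings = bt_fit(wins)
```

With only two products the MLE reproduces the observed preference proportion, so `bt_prob(ratings[0], ratings[1])` comes out at 0.8.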

12.2.6 Session 6c: Medical II
Session Room: George Fox B59
Chair: Maria Roopa

Start time 11:10

USE OF MIXTURE DESIGNS IN PHARMACEUTICAL PRODUCT DEVELOPMENT
Mark Whitlock
Pfizer Global Research & Development, Sandwich, Kent

A mixture design is a type of response surface design in which the factors are constrained to sum to a constant and the response is a function of the component proportions rather than the amounts. These occur commonly in pharmaceutical product development. For example, the formulation of a tablet will contain a mix of the active drug and various excipients. In this talk, real examples of mixture experiments at Pfizer are discussed, and some techniques for dealing with non-standard problems, e.g. constrained component proportions and irregular design regions, are illustrated.

Start time 11:35

CONTACT TRACING IN A STOCHASTIC EPIDEMIC
Edward Knock
University of Nottingham, UK
Keywords: Epidemics, contact tracing, threshold behaviour, extinction probability, type reproduction number

The naming and tracing of contacts of diagnosed individuals, so that treatment or quarantine may be directed towards them, is an important tool in the control of emerging epidemics, such as smallpox or SARS. We examine the threshold behaviour of a stochastic epidemic model with contact tracing by considering the embedded branching process of unnamed individuals. The effects of different distribution choices for the infectious period, latent period and tracing delay are considered through analytical results and simulations. It is seen that the distribution choice has an influence on the effectiveness of the tracing scheme.

Start time 12:00

HOW MUCH DOES AN INFECTION TREE TELL US ABOUT A CONTACT NETWORK?
Robert Goudie1 and Dr Geoff Nicholls2
1 University of Warwick, UK
2 UK
Keywords: Infectious diseases, Social networks, Approximate Bayesian computation

Two networks are considered in epidemiological studies: the actual transmission pathway of the infection (the ‘transmission tree’), and the potential transmission routes of the infection (the ‘contact network’). These are usually considered separately, but in this talk I shall investigate how much a transmission tree tells us about the contact network.
We simulate a transmission tree using a simple stochastic SIR epidemic model, and model the underlying contact network as an Exponential Random Graph Model (ERGM). We examine the posterior distribution of a parameter of the ERGM, given the transmission tree. This is analytically intractable, but using bridge sampling we can find an approximation.
I shall show that for some specific cases, the posterior distribution of the ERGM parameter is far more concentrated than its prior. This shows that transmission trees can inform inference of contact networks.

Start time 12:25

A METHOD FOR COMBINING BINARY ASSESSMENTS OF STROKE OUTCOMES: ASSESSMENT OF ACCURACY
Zakiyah Zain
Lancaster University
Keywords: global test, stroke

Assessments of stroke outcomes are usually made on ordinal scales at 3 months after treatment. Several scales exist, and their outcomes are often dichotomised to represent either success or failure of treatment. Since the stroke outcomes on multiple scales for the same patient are not perfectly correlated, combining these binary assessments offers an increase in power. One such method is a global statistical test which enables an overall assessment of treatment efficacy for a combination of correlated outcomes. The purpose of this study is to investigate the accuracy of the global test approach in the case of binary data. Outcomes from previous studies (Bolland et al., 2008) on the use of the drug citicoline in the treatment of acute stroke are used for illustration. Simulations are carried out under the null and alternative hypotheses to verify the type I error rate and the power of the test.

Start time 12:50

JOINT MODELLING OF EVENT COUNTS AND SURVIVAL TIMES: EXAMPLE USING DATA FROM THE MESS TRIAL
Jennifer Rogers
University of Warwick, UK
Keywords: Epilepsy, Survival Analysis, MESS

Seizure recurrence in individuals who have had only a single epileptic seizure is, on average, around 50%. Treatment with antiepileptic drugs (AEDs) carries the risk of unpleasant side-effects, but for most patients diagnosed with epilepsy the benefits of AEDs will far outweigh the risks. For those individuals who have had only a single seizure, or who have mild and infrequent seizures, the question of when to start treatment becomes an interesting one.

We consider the analysis of data from the MRC Multicentre Trial for Early Epilepsy and Single Seizures (MESS), which was undertaken to assess the differences between two policies: immediate or deferred treatment in patients with single seizures or early epilepsy. The data come in the form of a pre-randomisation seizure count and multiple post-randomisation survival times. A standard survival analysis might treat this pre-randomisation count as a fixed covariate. A possible extension would be to treat the counts as covariates measured with error. We present a further alternative and consider a joint model for the pre-randomisation seizure count and the post-randomisation survival times.

12.2.7 Session 7a: Clinical Trials
Session Room: George Fox 3 and 4
Chair: Jennifer Rogers

Start time 14:10

THE IMPACT OF DROPOUTS ON THE ANALYSIS OF DOSE-FINDING STUDIES WITH RECURRENT EVENT DATA Mouna Akacha1 and Norbert Benda2 1 University of Warwick, UK 2 Novartis Pharma AG, CH Keywords: Missing Data, Recurrent Event Data, Dose-Finding Studies

This presentation is motivated by an upcoming Phase II dose-finding study, in which the number of events per subject within a specified study period forms the primary outcome. The aim of this study is to determine the efficacy of a new drug compared to an active control. In particular, we are interested in identifying the dose-response relationship and the target dose for which the new drug can be shown to be simultaneously safe and as effective as the control. Given that the outcome is pain-related, we expect a considerable number of patients to drop out before the end of the study period. The impact of missingness on the analysis and diverse models for the missingness process must be carefully discussed. The recurrent events are modelled as over-dispersed Poisson process data, with age and dose as covariates. Constant and time-varying rate functions are examined. Based on these models, the impact of missingness on the precision of the target dose estimation is evaluated. Diverse models for the missingness process are considered, including dependence on covariates and number of events. The performance of four different analysis methods (a complete case analysis, a last observation carried forward analysis, a direct likelihood analysis and an analysis using pattern mixture models) is assessed via simulation studies. It is shown that the target dose estimation is robust, given that the same missingness process holds for the target dose group and the active control group. Furthermore, we demonstrate that this robustness is lost as soon as the missingness mechanisms for the active control and the target dose differ.

Start time 14:35

TECHNICAL INFERENCE PROBLEMS IN THREE-ARM NON-INFERIORITY TRIALS Nor Afzalina Azmee and Nick Fieller University of Sheffield, UK Keywords: non-inferiority trials, Fieller's confidence interval

The aim of a non-inferiority clinical trial is to show that the effect of a new experimental treatment is at least as good as that of a standard or reference treatment currently in use. By having a three-arm non-inferiority trial (three separate groups receiving experimental, reference and placebo treatments), the assay sensitivity problem observed in the two-arm design can be eliminated, as this design allows a direct assessment to be made with respect to placebo. Standard statistical analysis involves the use of Fieller's theorem in determining the confidence interval for the ratio of differences of means. In this two-stage testing, it can be shown that a preliminary test of reference against placebo is a crucial step and must be carried out to ensure the interpretability of the confidence intervals. Abandoning the preliminary test leads to the possibility of obtaining infinite, exclusive or imaginary intervals.
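Fieller's theorem, used above, gives confidence limits for a ratio of estimated means as the roots of a quadratic. A large-sample sketch (z-based rather than t-based; the numerical inputs are purely illustrative) also shows how the infinite/exclusive and imaginary interval cases arise:

```python
import math
from statistics import NormalDist

def fieller_ci(a, b, v11, v22, v12, alpha=0.05):
    """Large-sample Fieller confidence interval for the ratio a/b of two
    estimated means with variances v11, v22 and covariance v12.
    Returns (low, high) when a finite interval exists, else None: the
    infinite/exclusive case (denominator not clearly non-zero) or the
    imaginary case (negative discriminant)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    g = z * z * v22 / (b * b)
    if g >= 1:
        return None  # denominator not significantly different from zero
    # Roots of (b^2 - z^2 v22) r^2 - 2(ab - z^2 v12) r + (a^2 - z^2 v11) = 0
    A = b * b - z * z * v22
    B = a * b - z * z * v12
    C = a * a - z * z * v11
    disc = B * B - A * C
    if disc < 0:
        return None  # 'imaginary interval' case
    root = math.sqrt(disc)
    return ((B - root) / A, (B + root) / A)
```

For example, with well-estimated means a = 2 and b = 4 the interval is finite and covers the point estimate 0.5; with a noisy denominator no finite interval is returned.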

Start time 15:00

INCORPORATING PRIOR INFORMATION INTO CLINICAL TRIAL DESIGN Shijie Ren University of Sheffield, UK Keywords: Assurance, Expert elicitation, Gaussian process

In clinical trial designs, sample sizes can be determined using assurance instead of traditional power. Assurance is the expectation of the power, averaged over the prior distribution for the unknown true treatment effect. It is the unconditional probability of obtaining a statistically significant difference between treatments. Using assurance avoids assuming a value for an unknown true treatment effect and gives an overall predictive probability of obtaining a specific outcome. However, decision-makers will not trust the prior distribution as an aid to a trial design until they have confidence that experts' knowledge can be reliably formulated and elicited. One way of eliciting a probability distribution is to elicit a finite number of statistical summaries from the expert, and then to fit these elicited summaries into some member of a convenient parametric family. However, using this method

the expert's beliefs are forced to fit a particular parametric family and, given only finite elicited summaries, many other possible distributions might fit these summaries equally well. There is uncertainty about the expert's true probability distribution. A nonparametric approach involving a Gaussian process prior can represent this uncertainty.
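The assurance calculation defined above (power averaged over the prior on the true effect) can be sketched by simple Monte Carlo. This is a minimal large-sample illustration for a two-arm trial with a normal prior; the design parameters are hypothetical, not from the talk:

```python
import math
import random
from statistics import NormalDist

def assurance(n_per_arm, sigma, prior_mean, prior_sd,
              alpha=0.05, n_sims=50_000, seed=1):
    """Monte Carlo assurance: expected power of a two-sided two-arm
    z-test, averaging over a N(prior_mean, prior_sd^2) prior on the
    true treatment effect delta. Illustrative sketch only."""
    rng = random.Random(seed)
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    se = sigma * math.sqrt(2 / n_per_arm)  # SE of the difference in means
    total = 0.0
    for _ in range(n_sims):
        delta = rng.gauss(prior_mean, prior_sd)  # draw a true effect
        # Power of the two-sided test given this delta (normal approx.)
        power = nd.cdf(delta / se - z_crit) + nd.cdf(-delta / se - z_crit)
        total += power
    return total / n_sims
```

With a degenerate (near point-mass) prior the assurance reduces to the ordinary power at the prior mean; spreading the prior out typically pulls the assurance below that power.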

Start time 15:25

THE BAYESIAN ANALYSIS OF PHASE II DOSE TITRATION STUDIES IN ORDER TO DESIGN PHASE III Alun Bedding GlaxoSmithKline

In order to protect patients from safety problems, drugs may be given according to a dose titration scheme. The aim is to start dosing at a low level and use the responses to determine what dose should be given next, up to an optimum level. This process relies very much on the opinion of an investigator and rarely uses quantifiable methods. The analysis of dose titration is also inefficient in that it does not use all of the data. The effect of this could be a lower probability of success for a drug, and therefore a reduction in potentially good medicines being available for patients. This presentation deals with the design and analysis of dose titration studies in Phase II clinical trials in order to predict the dosing in Phase III that gives the highest predicted probability of success. Bayesian and adaptive trial design methods are used to enable the optimal dosing in Phase II to choose the optimal dose scheme in Phase III.

12.2.8 Session 7b: Financial
Session Room: George Fox 5 and 6
Chair: Peter Windridge

Start time 14:10

COMPUTATIONAL AND STATISTICAL ASPECTS OF PRICING MODELS Xiaojuan Ma and Sergey Utev University of Nottingham, UK Keywords: Black-Scholes, Lévy process, Markov Chain

The classical methods for pricing stock options, such as the Black-Scholes and Merton formulae, are often based on the assumption of Brownian motion or Lévy processes.

My current research concentrates on whether these classical assumptions are satisfied in the real stock market. In this project, we constructed a share price model by analysing real historical return prices in the stock market and simulated the pay-off function of an American put option. The first stage was the model selection problem, which is an important issue. When we applied several popular models, such as Black-Scholes and other geometric Lévy processes, we discovered that the traditional assumptions cannot be fitted to the data. We found that a certain form of Markov chain model is a much better way to model the real data, which supports the method. We justify our results by two different statistical approaches which involve heavy computation using C++. After completing the model construction and its justification, our second stage was to analyse pricing models compared under different model assumptions.
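For reference, the Black-Scholes benchmark mentioned above prices a European call in closed form. This is the standard textbook formula, shown only as a sketch of the classical model being tested (it is not the authors' C++ code, and it prices a European rather than an American option):

```python
import math
from statistics import NormalDist

def black_scholes_call(S, K, r, sigma, T):
    """Black-Scholes price of a European call option.
    S: spot price, K: strike, r: risk-free rate,
    sigma: volatility, T: time to maturity (years)."""
    Phi = NormalDist().cdf
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S * Phi(d1) - K * math.exp(-r * T) * Phi(d2)
```

For an at-the-money call with S = K = 100, r = 5%, sigma = 20% and one year to maturity, the formula gives a price of roughly 10.45.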

Start time 14:35

BAYESIAN MODELING OF TRADE-BY-TRADE PRICE MOVEMENTS Golnaz Shahtahmassebi, UK Keywords: Bayesian Modeling, Financial Markets, MCMC, Ultra High Frequency Data

The availability of trade-by-trade data on individual transactions has allowed many advances in the field of quantitative analysis of financial markets. This kind of data is usually referred to as ultra high frequency data or trade-by-trade data. In this study a dataset containing almost 600,000 transactions was analysed. We applied the ADS model to help us better understand price change dynamics, where A stands for active price, D for direction of price change and S denotes size of price change. This model decomposes a potential price change into an active/inactive price component, a direction of price change component and a size of trade component. We adopted a Bayesian framework and obtained inferences about logistic models for the active/inactive and the direction of price change components. We also performed inference about a geometric model for the size of trade component. To simulate from the posterior distributions of the parameters of the proposed models, random walk Metropolis-Hastings algorithms were implemented in R using data from the FTSE 100 futures contract with June 2008 expiry. The Geweke and Gelman tests were used to check the convergence of the implemented Metropolis-Hastings algorithms. Our study enabled us to assess the practical aspects of implementing the Metropolis-Hastings algorithm for Bayesian models of high frequency financial data. The estimated models showed that the probabilities of price changes were not significantly different for randomly selected minutes. Our study has paved the way for a promising application of Bayesian models to high frequency data, especially those arising from financial markets.
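A random walk Metropolis-Hastings sampler of the kind used in this work (implemented in R in the study) can be sketched generically. The sketch below targets a standard normal density rather than the ADS model posteriors, purely to illustrate the algorithm:

```python
import math
import random

def rw_metropolis(log_target, x0, step, n_iter, seed=0):
    """Random-walk Metropolis sampler (generic sketch). Proposes
    x' = x + step * N(0, 1) and accepts with probability
    min(1, target(x') / target(x)), working on the log scale."""
    rng = random.Random(seed)
    x, lp = x0, log_target(x0)
    chain = []
    for _ in range(n_iter):
        prop = x + step * rng.gauss(0, 1)
        lp_prop = log_target(prop)
        if math.log(rng.random()) < lp_prop - lp:  # MH acceptance
            x, lp = prop, lp_prop
        chain.append(x)
    return chain

# Example: sample from a standard normal target density
chain = rw_metropolis(lambda x: -0.5 * x * x, 0.0, 2.4, 20_000)
```

In practice, convergence of such chains would be checked with diagnostics like the Geweke and Gelman tests mentioned above.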

Start time 15:00

BAYESIAN NONPARAMETRIC INFERENCE IN STOCHASTIC VOLATILITY MODELLING Eleni-Ioanna Delatola and Jim E. Griffin University of Kent, UK Keywords: Stochastic Volatility Model, Bayesian Nonparametrics

Interest in modelling financial data dates back to the start of the last century. Stochastic volatility models have become a popular approach. They capture the unobserved time-varying volatility of the log returns using stochastic processes. In our work, we use Taylor's representation (1982) and model the return distribution nonparametrically. We extend the work of Kim, Shephard and Chib (1998), which samples the unobserved volatilities simultaneously using an approximate offset mixture model of the log χ₁² distribution, by defining efficient computational methods. To illustrate this method we analyze S&P index data.

Start time 15:25

QUANTITATIVE RESEARCH AND CAREER OPPORTUNITIES IN AHL, PART OF MAN INVESTMENTS Remy Cottet Man Investments

Man Group plc, the largest listed hedge fund manager in the world, has a long track record of delivering robust investment performance in both bull and bear markets. AHL - its entirely systematic algorithmic trading managed futures manager - has been highly successful for over 20 years and manages over USD 20 billion of investor funds. The AHL research team employs a wide range of quantitative practitioners, including statisticians and biostatisticians, mathematicians, engineers, computer scientists, physicists and econometricians. In this talk, we illustrate some aspects of the problems they tackle by deriving a simple algorithmic trading system and considering how to manage its risk in trading in a systematic, dynamic way.

12.2.9 Session 7c: General IV
Session Room: George Fox B59
Chair: Helen Thornewell

Start time 14:10

175 YEARS OF THE ROYAL STATISTICAL SOCIETY Janet Foster Royal Statistical Society

This talk will cover the history of the Society from its foundation in 1834 and the archives recording that history. It looks at how the Society has changed from its original purpose and organisation, through its development in the 19th and 20th centuries. Along the way, some of the famous statisticians, and others, who have been Fellows will be featured, and examples from the archives will be used as illustrations.

Start time 14:35

A NEW SELECTION METHOD FOR MULTINOMIAL DATA Rebecca Baker University of Durham, UK Keywords: imprecise probability, predictive inference, categorical data, selection

Based on observations from a multinomial data set, a new method is presented for selecting a single category or the smallest subset of categories, where the selection criterion is a minimally required lower probability that m future observations will belong to that category or subset of categories. The inferences about the future observations are made using an extension of Coolen and Augustin's NPI model to a situation with multiple future observations.

Start time 15:00

DIRICHLET DIFFUSION TREES Geoff Littler University of Bath, UK Keywords: Bayesian nonparametrics, Dirichlet Diffusion Trees

Dirichlet Diffusion Trees (DDTs) were first introduced by Neal (2003). They provide a flexible method of Bayesian nonparametric analysis and are a way of generating an exchangeable data set that can incorporate hierarchical structure. In Bayesian nonparametrics we have fewer constraints than in the parametric approach. We do not specify a parametric family, but rather let the number and nature of the parameters be flexible, and instead let the data, along with a prior distribution on the space of distributions, define the structure of the model. The nonparametric approach to Bayesian statistics allows support for more 'eventualities' than are allowed by a parametric prior. I will look at the distributions and characteristics of DDTs, starting from the basic tree and building up, as well as considering practical uses for the method.

13 Poster Abstracts by Author

HEDGING THE BLACK SWANS: CONDITIONAL HETEROSCEDASTICITY AND TAIL DEPENDENCE IN S&P 500 AND VIX Sawsan Abbas Lancaster University, UK Keywords: Hedging, Equity Risk Exposure, VIX Futures, Conditional Heteroscedasticity, Tail Dependence, Extreme Value Theory

We propose an extreme value approach for hedging the risk exposure associated with holding a long equity position, using volatility futures as a hedging instrument. The proposed approach properly takes into account the statistical features of the data being analyzed. In addition, a great deal of attention has been paid to the modelling of the marginal tails as well as of the extremal dependence structure. A major advantage of the proposed approach is that it facilitates direct simulation from the conditional predictive joint distribution of the returns on the financial assets of interest. Therefore, different distributional aspects can be tracked and evaluated by a Monte Carlo approximation. Moreover, the proposed approach is able to address losses which are outside the range of the historical data, and hence it is a useful technique for estimating hedge ratios that emphasize the downside risk of a portfolio. The proposed approach was applied to return data on the S&P 500 index and the CBOE volatility index (VIX) futures over the period from March 26, 2004 to May 30, 2008. The proposed approach was found to be promising for hedging equity market risk exposure. It performs well in terms of different hedging performance measures and outperforms the conventional approach, represented by a time-varying minimum variance hedge ratio estimated by ordinary least squares.

DEVELOPING OPTIMAL DYNAMIC TREATMENT REGIMES VIA REGRET-REGRESSION Deyadeen Alshibani Newcastle University Keywords: adaptive strategies, sequential decisions, dynamic programming, causal inference

We consider the problem of deriving optimal dynamic treatment regimes from observational data. In samples of modest size there is no realistic alternative to parametric modelling of at least some components of the observables. In turn this brings the risk that the chosen model is not suitable for the data. The fundamental statistical practice of model building, checking and comparison has had little attention so far in this literature. In our work we propose a modelling and estimation strategy based on a regression model for the observed responses. Estimation is quick and diagnostics are available, meaning a variety of candidate models can be compared.

IMPUTATION ON 2-DIMENSIONAL DATA VIA LIFTING Robert G. Aykroyd, Stuart Barber and Samuel J. Peck Department of Statistics, University of Leeds

We describe a method for imputing from a grid of irregularly spaced data points on to any other grid using a Voronoi-based lifting scheme. The lifting scheme is a generalisation of wavelet decompositions. Here, the lifting scheme is used as an interpolating/smoothing method for data on an irregular grid. Suppose we have some data collected on an irregular two-dimensional grid. Let XF = (x_1, ..., x_n)^T be the matrix of data points, where x_i is the location of the i-th data point. Also let f_i = f(x_i), i = 1, ..., n, be the function values at the data points observed with error, i.e. f(x) = g(x) + ε, where g(·) is the true function and ε ~ N(0, σ²). We wish to make estimates of the function value at points where we have no observations, say XM = (x_{n+1}, ..., x_{n+m})^T. To make these estimates, we perform a lifting transformation on the combined data sets and estimate the missing values f_i, i = n+1, ..., n+m, based on the expected sparsity of the lifting coefficients. We show an example of this method on real data, and a comparison to similar methods - Heaton and Silverman's MCMC-based imputation method and Kriging.
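As context for the imputation problem set up above (estimating f at locations with no observations), a much simpler baseline is inverse-distance weighting. This sketch is purely illustrative and is not the Voronoi lifting scheme of the poster:

```python
def idw_impute(known, targets, power=2):
    """Inverse-distance-weighted imputation at unobserved 2-D locations.
    known:   list of ((x, y), f) pairs - observed locations and values.
    targets: list of (x, y) locations to impute.
    Each imputed value is a weighted mean of the observed values,
    with weights 1 / distance^power; an exact match returns f directly."""
    out = []
    for t in targets:
        num = den = 0.0
        for (x, y), f in known:
            d2 = (t[0] - x) ** 2 + (t[1] - y) ** 2
            if d2 == 0:           # target coincides with a data point
                num, den = f, 1.0
                break
            w = 1.0 / d2 ** (power / 2)
            num += w * f
            den += w
        out.append(num / den)
    return out
```

A point midway between two observations receives the average of their values, and a target at an observed location returns the observation itself.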

VALIDATION OF COMPUTER MODELS Leonardo Bastos University of Sheffield, UK Keywords: Computer models, Gaussian process, Emulation, Calibration

Mathematical models, usually implemented in computer programs known as simulators, are widely used in all areas of science and technology to represent complex real-world phenomena. Simulators are often sufficiently complex that they take appreciable amounts of computer time or other resources to run. In this context, a methodology has been developed based on building a statistical representation of

the simulator, known as an emulator. The principal approach to building emulators uses Gaussian processes. This work presents some diagnostics to validate and assess the adequacy of statistical emulators. These diagnostics are based on comparisons between simulator outputs and Gaussian process emulator outputs for some test data, known as validation data, defined by a sample of simulator runs not used to build the emulator. Our diagnostics take care to account for correlation between the validation data.

ASSESSING VARIABILITY IN HISTOLOGICAL GRADE OF BREAST CANCER TUMOURS T.R. Fanshawe Lancaster University, UK

We illustrate two methods for assessing inter-rater agreement when there are missing data and a large number of raters. We apply both methods to a large but incomplete data-set of 24177 grades of 52 breast cancer tumours, measured on a discrete 1-3 scale, provided by 732 pathologists (raters). Our aim is to summarise the extent of the agreement between the raters in assessing the subjective criteria for breast cancer grading. The first method is a simple non-chance-corrected agreement score based on the observed proportion of agreements with the modal grade for each tumour. The second method uses a Bayesian latent trait model to express the probabilities of assigning specific grades as functions of rater- and tumour-specific parameters. Via simulation from the fitted model we obtain estimates of the probabilities that raters will agree on a given tumour. Both methods suggest that there are substantial differences between raters in terms of rating behaviour, owing to the tendency of certain raters to over- or under-estimate grades relative to the majority. In many cases there is considerable uncertainty over which grade should be assigned to a given tumour, a fact often ignored in research in this area.

NON-GAUSSIAN BAYESIAN SPATIOTEMPORAL MODELING Thais C O Fonseca and Mark F J Steel University of Warwick Keywords: Bayesian Inference, Temperature data, Spatiotemporal modeling

In this work, we develop and study non-Gaussian models for processes that vary continuously in space and time. The main goal is to consider heavy-tailed processes

that can accommodate both aberrant observations and clustered regions with larger observational variability. These situations are quite common in meteorological applications, where outliers are associated with severe weather events such as tornadoes and hurricanes. In this context, the idea of scale mixing a Gaussian process, as proposed in Palacios and Steel (2006), is extended and the properties of the resulting process are discussed. The model is very flexible and is able to capture variability across time that differs according to spatial location, and variability across space that differs in time. This is illustrated by an application to maximum temperature data in the Spanish Basque Country. The model allows for prediction in space-time, since we can easily predict the mixing process, and conditional on the latter the finite-dimensional distributions are Gaussian. The predictive ability is measured through proper scoring rules such as log predictive scores and interval scores. In addition, we explore the performance of the proposed model under departures from Gaussianity in a simulation study where data sets were contaminated by outliers in several ways; overall, the non-Gaussian models recover the covariance structure well, whereas the covariance structure estimated by the Gaussian model is strongly influenced by the contamination.

USING CLICKSTREAM DATA TO DISTINGUISH HIGH-POTENTIAL VISITORS TO E-COMMERCE WEBSITES M.A. Jamalzadeh University of Durham, UK

Since the advent of the Internet, the ability of websites to track visitors has been one of the most promising aspects of the new media. Clickstream data gathered from a website can provide insight into the behaviour, buying habits and preferences of website visitors, who can be considered as prospective customers. For web managers, details of web usage behaviour provide the opportunity to study browsing and navigation behaviour through websites, and to assess site performance in various ways. From the e-commerce point of view, the conversion rate (sale of a product) of a website is of the utmost importance to website managers. Hence, it is highly desirable to improve this performance by predicting and understanding the online buying behaviour of website visitors. In our study, we investigate data from commercial websites selling products and services on the Internet. Server log files provide tracking data which contain the general clickstream information from website visitors. We also have conversion data. This comprises converted visitor information such as time, date, IP, agent, and amount of purchase. Having merged the two sources of data, we establish hypotheses regarding buying behaviours and causal relationships, as well as carrying out relevant exploratory data analysis. The time spent on a website, the number of pages visited, having bookmarked the website as opposed to using Google search to enter it, etc.,

are all factors which are of interest if they affect the probability of conversion. We also develop a logistic regression model to describe the relationship between clickstream information and the probability of conversion in a more general framework. Finally, the model is applied as a classification tool to identify visitors with high potential for online purchase.

A PARTIAL THREE-STATE MARKOV MODEL FOR INTERVAL-CENSORED DATA Venediktos Kapetanakis, Ardo van den Hout and Fiona E. Matthews MRC Biostatistics Unit, Cambridge, UK Keywords: Markov assumption, Semi-Markov, Partial-Markov, Multi-state model, Interval censoring, Interval regression

Multi-state modelling is a method of analysing longitudinal data when the observed outcome is a categorical variable. This approach is particularly useful in medical applications where the different levels of a progressive disease can be regarded as the states of a multi-state model. Fitting multi-state models involves several assumptions. A common hypothesis is that the data satisfy the first-order Markov assumption, under which the transition to the next state depends only on the current state. As a result, the history of the process is ignored. However, this assumption may often be inappropriate. We investigate the case of a progressive disease with no recovery which can be summarised by three states: State 1: "Healthy", State 2: "Not Healthy" and State 3: "Death". We incorporate history in our model by using the time spent in the unhealthy state as a time-dependent covariate in the modelling of the transition intensity from State 2 to State 3. This involves the estimation of the exact transition time from State 1 to State 2, which is generally unknown. To overcome the difficulties imposed by the study design and the existence of left-, right- and interval-censoring, we model age at the time the transition actually occurred, y, by fitting interval regression models adjusting for several covariates. Once we obtain both mean and variance estimates conditional on the covariate specifications of a single individual, we assume normality to find the distribution of y, f(y), for that specific individual. Consequently, we derive the distribution f(y|y ∈ A), where A is the interval within which the transition from State 1 to State 2 could really happen.
As a next step, we obtain the mean value E(y|y ∈ A), which is our final estimate of the age at which the transition actually took place for that particular person. Thus, the computation of the total time spent in State 2 becomes straightforward and enables us to fit a three-state partial Markov model where the transition intensity from State 2 to State 3 is adjusted for a time-dependent covariate that comprises part of the history of the process. The method is illustrated by an application in which an individual with a history of stroke is considered to be in State 2.

MARKOV CHAIN MONTE CARLO METHODS FOR EXACT TESTS IN CONTINGENCY TABLES Shiler Khedri University of Durham, UK

We present an outline of Markov chain Monte Carlo methods in the analysis of contingency tables. The MCMC method is a valuable tool, especially for sparse data sets where enumeration of the reference set is infeasible and, at the same time, the large sample approximation of the p-value is not sufficiently accurate. A Markov basis for a model is a set of integer arrays that connects every pair of tables having the same margins in the model. A Markov basis B is minimal if no proper subset of B is a Markov basis. A minimal Markov basis is unique if there exists only one minimal Markov basis. Diaconis and Sturmfels (1998) presented a general algorithm for computing a Markov basis. Their approach relies on the existence of a Gröbner basis of a well-specified polynomial ideal. Unfortunately, their algorithm has not been widely used because a Gröbner basis can be very difficult to compute even for problems of moderate size. It should be noted that a Gröbner basis is in general not symmetric because it depends on the particular term order. We consider the no-three-factor-interaction model in three-way contingency tables as an example, and show the difficulty of performing the Markov chain Monte Carlo method in this case.
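The Markov basis idea above can be illustrated in the simplest case of two-way tables, where the basis consists of the elementary +1/-1 moves on 2x2 sub-rectangles. The sketch below shows only the resulting margin-preserving random walk; a full exact test would additionally accept or reject each move according to the conditional (hypergeometric-type) probability of the table:

```python
import random

def markov_basis_walk(table, n_steps, seed=0):
    """Random walk over two-way contingency tables with fixed row and
    column margins, using elementary Markov-basis moves: pick rows i, j
    and columns k, l, then apply +1/-1 on the 2x2 sub-rectangle.
    Illustrative sketch of the walk underlying Diaconis-Sturmfels-style
    exact tests (no acceptance step included)."""
    rng = random.Random(seed)
    t = [row[:] for row in table]           # work on a copy
    n_rows, n_cols = len(t), len(t[0])
    for _ in range(n_steps):
        i, j = rng.sample(range(n_rows), 2)
        k, l = rng.sample(range(n_cols), 2)
        # Move preserves all margins; skip if it would create a negative cell.
        if t[i][l] > 0 and t[j][k] > 0:
            t[i][k] += 1; t[i][l] -= 1
            t[j][k] -= 1; t[j][l] += 1
    return t
```

After any number of steps, the row and column totals of the returned table match those of the starting table, and all cells remain non-negative.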

TRENDS IN SEROTYPES AND MULTI LOCUS SEQUENCE TYPES (MLST) AMONG CASES OF INVASIVE PNEUMOCOCCAL DISEASE (IPD) IN SCOTLAND K. Lamb1, M. Diggle2, D. Inverarity3, J.M. Jefferies4, A. Smith5, G.F.S. Edwards2, J. McMenamin6, T.J. Mitchell3 and S.C. Clarke4 1 University of Strathclyde, UK 2 Scottish Meningococcus & Pneumococcus Reference Laboratory, Stobhill Hospital, Glasgow, UK 3 Glasgow Biomedical Research Centre, University of Glasgow, UK 4 Molecular Microbiology Group, University of Southampton, UK 5 Dental Hospital, University of Glasgow, UK 6 Health Protection Scotland, Glasgow, UK Keywords: Logistic regression, Trend analysis

Introduction: Pneumococcal polysaccharide vaccine (Pneumovax II, Aventis Pasteur) was offered to over-65s in Scotland from 2003/04, and conjugate vaccine (Prevenar,

Wyeth) was introduced in September 2006 for infants. The influence of population vaccination on trends in circulating serotypes and MLSTs among IPD cases is investigated.

Methods: Scottish Invasive Pneumococcal Disease Enhanced Surveillance (SPIDER) has recorded all cases of IPD in Scotland since 1999. IPD cases, diagnosed from blood or cerebrospinal fluid isolates, up to the winter season 2005/06 are analysed. Serotyping and MLST data exist from 2003 (serogroup from 1999). The Mantel-Haenszel trend test was used with Bonferroni adjustment for multiple testing.

Results: On average 650 cases are reported each year, rising from 540 in 1999/00 to 740 in 2002/03. A subsequent drop occurred, primarily in over-65s, before rising again to 740 in 2005/06. 12% of cases occurred in under-5s and 35% affected over-65s. Prevenar (Wyeth) would have covered 47% of cases (68% of under-5s, 40% of 5-64s, 48% of over-65s). These percentages remain constant (p=0.11) despite increasing numbers of cases, p < 0.001. The commonest serotypes - 14 (15%), 1 (13%), 4 (7%), 9V (7%), 8 (6%), 3 (6%), 23F (5%), 6B (4%), 7F (4%) and 19F (4%) - account for 71% of IPD. Serotype 1 (primarily affecting ages 5-64) accounted for 16% of cases in 2005/06 and 11% in 2003/04, compared to 5% in 1999/00, demonstrating a significant increasing trend, p < 0.001. In 2003/04, 158 MLSTs were observed, 140 in 2004/05 and 116 in 2005/06, despite increasing numbers of cases, signifying a reduction in the diversity of MLSTs. The commonest MLSTs are 9 (9%), 306 (9%), 162 (6%), 53 (5%), 180 (4%), 191 (4%), 124 (4%), 218 (3%), 199 (3%) and 227 (3%). The proportion of IPD associated with ST306 has increased (p < 0.01).

Conclusions: Prior to the introduction of Prevenar (Wyeth), the main change in serotype distribution was the increased prevalence of serotype 1 (predominantly ST306), occurring initially in 2003/04.

THE UK SURNAME DISTRIBUTION AND POTENTIAL APPLICATIONS IN CHILD HEALTH RESEARCH Fiona McElduff1, Pablo Mateos2, Angie Wade1 and Mario Cortina Borja1 1 Institute of Child Health, UCL 2 Department of Geography, UCL Keywords: surname frequencies, child health, surname diversity

Surnames in the UK are mostly patrilineally inherited, so they correlate well with Y-chromosomes and can be used as genetic indicators. One application of surnames in the field of child health is as indicators of ethnicity in probabilistic record linkage. A potential application of surname frequencies is their use in childhood disease epidemiology as an indicator of genetic association. The 'enhanced electoral register' contains the names and addresses of all adults entitled to vote in the UK, with additional non-registered voters sourced from commercial surveys and credit scoring databases. The 2001 register contains surname frequencies from 45.6 million individuals across 434 districts of the UK, which can be grouped into 12 regions.

Measures of vocabulary diversity developed in linguistics, such as Yule's K, can be applied to surname frequencies to explore the diversity of surnames in a population. Surnames can also be categorized by their geographical origin using the National Trust profiler (http://www.nationaltrustnames.org.uk/). Our study [1] shows that the 12 geographical regions of the UK are well differentiated in terms of their surname structures, with districts in London and the South East of England having the highest surname diversity, whilst Scotland, Northern Ireland and particularly Wales have the lowest.

References

[1] McElduff, F., Mateos, P., Wade, A., and Cortina Borja, M. (2008) What's in a name? The frequency and geographic distributions of UK surnames. Significance, 5(4), 189-192.
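Yule's K, mentioned above, has a simple closed form: K = 10^4 (sum_i f_i^2 - N) / N^2, where f_i is the frequency of the i-th surname and N the total count; higher K means more repetition, i.e. lower diversity. A small sketch applied to a toy list of surnames:

```python
from collections import Counter

def yules_k(names):
    """Yule's K diversity measure for a list of surnames (or any tokens).
    K = 10^4 * (sum of squared frequencies - N) / N^2; K = 0 when every
    name is distinct, and K grows as names repeat."""
    counts = Counter(names)
    n = sum(counts.values())                    # total number of names
    s2 = sum(f * f for f in counts.values())    # sum of squared frequencies
    return 1e4 * (s2 - n) / (n * n)
```

For example, a list of all-distinct names gives K = 0, while two surnames each appearing twice among four people give K = 2500.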

GENERATIVE AND DISCRIMINATIVE MODELS FOR CLASSIFICATION AND CLUSTERING Adrian O'Hagan University College Dublin Keywords: discriminative, generative, linear discriminant analysis, logistic regression, Gibbs sampler, Metropolis-Hastings, classification, clustering

Both generative and discriminative approaches to allocating observations to available subgroups have been extensively explored in the literature, often in the context of datasets exclusively populated with labelled observations. Under a generative method, such as linear discriminant analysis, the data-generating mechanism is modelled directly as a means of classifying an observation's group label given its covariate values. Under a discriminative method, such as logistic regression, the conditional probability of an observation's group label given its covariate values is fitted via a parametric model. We extend the LDA approach to scenarios where labelling of observations is either completely absent or only partially available, and examine the resulting impact on misclassification rates. Both simulated data and the Iris dataset are investigated. Additionally, a series of LDA models whose covariance structures span a range of eigenvalue decompositions are fitted via the Gibbs sampler. In a similar vein, a series of logistic regression models with matching covariance structures are fitted using the Metropolis-Hastings algorithm to fully labelled versions of the data. Techniques aimed at boosting acceptance rates and the subsequent mixing of regression parameters in the Metropolis-Hastings algorithm are documented.

DESIGN AND ANALYSIS OF DOSE ESCALATION TRIALS
Maria Roopa Thomas

Keywords: Dose escalation, Cohort effects, Bayesian methods

My research is motivated by the report of an RSS working party into the near-fatal Phase I first-in-man TeGenero trial, published in the Journal of the Royal Statistical Society Series A. The Royal Statistical Society established an expert group of its own to look into the details of the statistical issues that might be relevant to the study. First-in-man studies aim to find a dose for further exploration in Phase II trials and to determine the therapeutic effects and side effects. Dose escalation trials involve giving increasing doses to different subjects in distinct cohorts. One of the recommendations of the RSS working party was to consider cohort effects. Cohort effects can be influenced by many factors, such as different types of people volunteering at different times, changes in the ambient conditions, the staff running the trial, and the protocols for using subsidiary equipment. We are interested in Bayesian approaches to the design and analysis of dose escalation trials, which involve prior information concerning parameters of the relationships between dose and the risk of an adverse event, with the chance to update after every dosing period using Bayes’ theorem. In this poster I will discuss some of these issues.
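The Bayesian updating idea can be sketched with the simplest conjugate model: a Beta prior on the adverse-event risk at a fixed dose, updated after each dosing period. This is an illustrative Beta-Binomial sketch, not the model considered in the poster; the prior parameters and cohort outcomes below are made up.

```python
# Beta-Binomial updating of adverse-event risk at one dose level.
# Prior Beta(1, 9) encodes a prior mean risk of 0.10 (illustrative values).
a, b = 1.0, 9.0

# Each cohort: (subjects dosed, adverse events observed) -- invented data.
cohorts = [(3, 0), (3, 0), (3, 1)]
for n_subj, n_events in cohorts:
    a += n_events            # successes (events) update the first parameter
    b += n_subj - n_events   # non-events update the second parameter
    print(f"posterior Beta({a:g}, {b:g}), mean risk {a / (a + b):.3f}")

# An escalation rule could then, for example, permit a dose increase only
# while the posterior mean risk stays below a pre-specified threshold.
```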

VULNERABILITY OF BIBDS TO OBSERVATION LOSS: APPLICATIONS TO DESIGN SELECTION & CONSTRUCTION
Helen Thornewell, Maths Dept, University of Surrey, UK
Keywords: Balanced Incomplete Block Designs, Design Construction, Design Selection, Experimental Design, Observation Loss, Repeated BIBDs, Vulnerability

If observations are lost during the course of an experiment, the properties of the eventual design are different to those of the planned design. In some cases the eventual design can be disconnected, so that the original null hypothesis cannot be tested. Therefore it is important to guard against disconnected designs. Balanced Incomplete Block Designs (BIBDs) are optimal Incomplete Block Designs with respect to various optimality criteria. However, it is also important to consider how vulnerable the candidate designs are to becoming disconnected through observation loss.

The MRROS(i) Vulnerability Measure (Si, Ti) of a design determines the minimum size, Si, of an observation set whose loss yields a disconnected design, and the total number, Ti, of such observation sets of this size in the design. Such observation sets are defined as Minimum Rank Reducing Observation Sets (MRROS(i)s) and partition the υ treatments into two sets, the smallest of which consists of i ≤ υ/2 treatments. General formulae have been derived for calculating (Si, Ti) for BIBD(υ, b, k), and these give rise to two different approaches: Design Selection and Design Construction. For some values of i and k, the MRROS(i) Vulnerability Measures of non-isomorphic BIBD(υ, b, k)s can vary, in which case the comparison of Vulnerability Measures of a set of competing designs helps to select the least vulnerable BIBD. The construction of larger experimental designs by replicating smaller BIBDs is considered. For example, a BIBD(υ, 2b, k) can be constructed from two exact replicates of a “building block” BIBD(υ, b, k). Techniques will be demonstrated for constructing the least vulnerable repeated BIBDs by applying permutations to the treatments in the subsequent replicates of a design, based on knowledge of the MRROS(i)s of the “building block” BIBD(υ, b, k).
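The idea of a minimum disconnecting observation set can be checked by brute force on a small design: delete candidate observation sets and test whether the treatment concurrence structure stays connected. The sketch below uses the BIBD(7, 7, 3) given by the Fano plane as a toy example; the enumeration approach is illustrative only and is not the general formulae derived in the abstract.

```python
from itertools import combinations

def is_connected(observations, n_treatments):
    """Union-find over treatments linked through shared blocks."""
    parent = list(range(n_treatments))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    blocks = {}
    for trt, blk in observations:
        blocks.setdefault(blk, []).append(trt)
    for members in blocks.values():
        for t in members[1:]:
            parent[find(t)] = find(members[0])
    return len({find(t) for t in range(n_treatments)}) == 1

# BIBD(7, 7, 3): the Fano plane, treatments 0..6 in 7 blocks of size 3.
fano = [(0,'a'),(1,'a'),(3,'a'), (1,'b'),(2,'b'),(4,'b'), (2,'c'),(3,'c'),(5,'c'),
        (3,'d'),(4,'d'),(6,'d'), (4,'e'),(5,'e'),(0,'e'), (5,'f'),(6,'f'),(1,'f'),
        (6,'g'),(0,'g'),(2,'g')]

# Smallest observation set whose loss disconnects the design: each
# treatment appears r = 3 times, so losing all 3 of its observations
# isolates it, and no smaller loss can disconnect this design.
obs = set(fano)
s = next(size for size in range(1, len(fano) + 1)
         if any(not is_connected(obs - set(lost), 7)
                for lost in combinations(fano, size)))
print("minimum disconnecting observation-set size S =", s)
```

For this design the brute-force search recovers S = r = 3, the number of replicates of a single treatment, matching the i = 1 partition described above.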

NON-EUCLIDEAN ANALYSIS FOR DIFFUSION TENSOR BRAIN MR IMAGING
Diwei Zhou1, Ian L. Dryden1, Alexey Koloydenko2 and Li Bai3
1 School of Mathematical Sciences, University of Nottingham, UK
2 Department of Mathematics, Royal Holloway, University of London, UK
3 School of Computer Science, University of Nottingham, UK
Keywords: Procrustes analysis, Riemannian metric, Cholesky decomposition, diffusion tensor

Diffusion tensor imaging (DTI) is becoming increasingly important in clinical studies of diseases such as multiple sclerosis and schizophrenia, and also in investigating brain connectivity. Hence, there is a growing need to process diffusion tensor (DT) images within a statistical framework based on appropriate mathematical metrics. However, the usual Euclidean operations are often unsatisfactory for diffusion tensors due to the symmetric positive-definiteness property. A DT is a type of covariance matrix, and non-Euclidean metrics have been adapted naturally for DTI processing. In this paper, Procrustes analysis has been used to define a weighted mean of diffusion tensors that provides a suitable average of a sample of tensors. For comparison, six geodesic paths between a pair of diffusion tensors are plotted using the Euclidean as well as various non-Euclidean distances. We also propose a new measure of anisotropy, Procrustes anisotropy (PA). Fractional anisotropy (FA) and PA maps from an interpolated and smoothed diffusion tensor field from a healthy human brain are shown as an application of the Procrustes method.
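The shortcoming of Euclidean averaging of tensors can be seen already for 2×2 SPD matrices. The sketch below uses the log-Euclidean mean as a stand-in non-Euclidean alternative (it is not the Procrustes mean proposed in the paper): the Euclidean mean inflates the determinant (“swelling”), whereas the log-Euclidean mean interpolates it geometrically. The matrices are illustrative.

```python
import numpy as np

def spd_log(M):
    """Matrix logarithm of a symmetric positive-definite matrix via eigh."""
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.log(w)) @ V.T

def spd_exp(M):
    """Matrix exponential of a symmetric matrix via eigh."""
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.exp(w)) @ V.T

A = np.diag([4.0, 0.25])   # two anisotropic 2x2 "tensors",
B = np.diag([0.25, 4.0])   # each with determinant 1

euclidean_mean = (A + B) / 2
log_euclidean_mean = spd_exp((spd_log(A) + spd_log(B)) / 2)

print(np.linalg.det(euclidean_mean))      # > 1: determinant swelling
print(np.linalg.det(log_euclidean_mean))  # ~ 1: geometric interpolation
```

Here the Euclidean mean has determinant 4.52 despite both inputs having determinant 1, while the log-Euclidean mean keeps the determinant at the geometric mean of the inputs.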

14 RSC 2010: Warwick University

RSC 2010

33rd Research Students’ Conference in Probability and Statistics

12th - 15th April 2010

[email protected] www.warwick.ac.uk/fac/sci/statistics/postgrad/rsc/2010

15 Sponsors’ Advertisements

Lancaster Postgraduate Statistics Centre

Short Course Programme for 2008/09

Title Date

Introduction to SPSS I 29-30 October 2008

R 13-14 November 2008

Generalised Linear Models 19-20 November 2008

Secondary Data Analysis 27 November 2008

Atlas.ti 4 December 2008

Duration Analysis 21-22 January 2009

Data Mining Techniques 4-5 February 2009

Questionnaire Design 12 February 2009

Sampling Design 13 February 2009

STATA 18-19 February 2009

Bayesian Methods 25-26 February 2009

Intermediate SPSS II 4-5 March 2009

Atlas.ti 11 March 2009

Structural Equation Modelling 18-19 March 2009

Multi-Level Models in STATA 22-23 April 2009

Methods for Missing Data 29-30 April 2009

Longitudinal Data Analysis 5-6 May 2009

Postgraduate Research Students from UK Academic Institutions:

The cost will be £25 for a one-day course and £50 for a two-day course.

A similar course timetable runs each year. For more information please go to http://www.maths.lancs.ac.uk/psc.

Man Group

The Man Group, one of the world’s largest quoted hedge funds, is operating in an environment which has changed dramatically in recent years. Twenty-five years ago, hedge funds were relatively unknown and their assets tiny. Today, the industry is an established category, with influence, purpose, ambition, 9,000 businesses and nearly two trillion US dollars under management.

Man, while heavily involved in alternative investment (or the hedge fund business, as it is sometimes described), has invested in and expanded across a broad range of hedge fund styles to build a balanced business, meeting the expectations of our investors and our shareholders. Our business model comprises both Fund of Funds (RMF and Man Glenwood Strategies) and the operation of our own single funds (AHL and Ore Hill1). As a result of its position, record and potential, major investment institutions now own 85% of Man’s shares.

1. Man Investments entered into a strategic partnership with Ore Hill in May 2008. Ore Hill is not a wholly owned subsidiary of Man Group plc.

Career development

Man does not have a hierarchical multilayered structure. There are no levels and grades. Development is personal in that individuals are expected to take early responsibility and are supported in building their skills and relationships.

Most of Man’s businesses and functions are led by people who have naturally emerged from this informal approach, and feel strongly that this has given them more scope to grow than more traditional approaches would have. Individuals have opportunities to take up assignments outside of their home location (approximately 7% of our workforce is on an international assignment) and we will actively craft roles to develop strong performers.

For such a major FTSE company, why is the profile of Man so low?

Awareness of Man is low compared to many less valuable companies in the FTSE, who employ thousands more people and are household names. Man does not have a customer base of millions and its selective business relationships are established without the need for advertising and publicity. However, Man’s corporate responsibility and charitable activity has led to the sponsorship of the Man Booker Prize, Saracens Rugby, England Hockey and London Youth Rowing, plus numerous charities to whom Man contributed USD 12m in the last financial year. The investment community is an important audience for Man, and the growth and innovation of the firm has ensured a high profile there.

Investing in the future: In 2007 the University of Oxford and Man jointly established a new interdisciplinary academic research centre – the Oxford Man Institute of Quantitative Finance (OMI) – for the study of quantitative finance with a particular focus on alternative investments.

It’s a rare beast: a multidisciplinary academic research centre that shares state-of-the-art offices with the commercially focused Man Research Laboratory (MRL). OMI is independent of Man and is run by Oxford University academics.

The role of MRL is to undertake commercial research projects for the various quantitative groups within Man, and in particular for its wholly owned subsidiary fund manager AHL. Although quantitative techniques are widely used throughout Man, it is within AHL that they have been used extraordinarily successfully for more than 20 years. MRL also provides Man with a stimulating and focussed research environment away from the operational practicalities of actively trading in the global financial and commodity markets and running one of the world’s largest hedge funds.

The co-location of MRL and OMI provides a unique purpose-designed working environment, enabling and promoting close day-to-day interaction between academic and commercial researchers. Ultimately, the aim for both the University and Man is to create a stimulating environment of research and innovation where ideas flourish and experts from a wide spectrum of disciplines can bring their skills into collaboration and learn from each other.

We encourage you to apply direct to Man.

We encourage applications direct to our websites. To learn more about early careers with Man Group visit our graduate recruitment site www.mancareers.com.

For roles requiring more experience or for more general information on the Man Group or AHL visit www.mangroupplc.com

Man Investments is a member of the Man Group www.mangroupplc.com

The International Conference of the Royal Statistical Society

7-11 September 2009 Edinburgh 2009 RSS

Statistics in a changing society: 175 years of progress

Extended deadline for submission of abstracts for RSC attendees: 7 April
Application for conference grants: 19 May
Early registration deadline: 1 June
Main registration deadline: 4 August
Pre-conference short courses: 7 September
Main conference: 8–11 September

For more information visit: www.rss.org.uk/rss2009

Royal Statistical Society Young Statisticians


New for 2009: the RSS has created a section specifically for career-young statisticians.

The Young Statisticians Section aims to unite young statisticians from all sectors, fields of research and areas of the UK, and to provide both a social and professional resource centre. Formal and social events, often in connection with existing RSS events, will be organised throughout the year, and the endorsement of networking is a high priority for the YSS committee!

The Young Statisticians Section has an exciting line-up of events for 2009, including a Careers Day in London on June 15th. This will involve talks from statisticians at different stages in their careers and a CPD plenary session which focuses on developing your skills as a professional statistician.

Visit www.rss.org.uk/yss

Or add ‘RSS Young Statisticians Section’ as a friend on Facebook!

Contact Jenny Lannon ([email protected]) for more details.

Innovation in Sports Modelling

Sports Modelling Vacancies

ATASS Sports is a leading statistical research consultancy business delivering high-quality sports models and predictions.

We are currently looking to fill a range of full-time positions to work as part of our existing and planned research teams. We have both senior and junior positions available for applied statisticians, mathematical modellers, database managers and IT support staff. Generous salary and benefits packages are the norm and depend upon position, qualifications and experience. All posts will be based at our newly developed office complex on the Exeter business park. The ideal applicants will combine a passion for sports with a passion for innovative analysis and research.

Our new recruits will work within close-knit multi-skilled teams to model a variety of sports markets, obtaining and incorporating real-time information, and developing and applying novel statistical and mathematical modelling techniques.

The closing date for this round of appointments is April 9th 2009. Applications or requests for further information should be addressed to Steve Brooks via [email protected]. Additional information can also be found on our web site, www.atassltd.co.uk. ATASS values a diverse workforce and welcomes applications from any suitably qualified individuals.

32nd Research Students’ Conference in Probability and Statistics, 23rd – 26th March 2009, Lancaster University

Read about our statistical work here!

At our Laboratories in Sandwich, near Dover, you will find the European hub of Pfizer Global Research & Development (PGRD). The site is home to many statisticians who contribute to all aspects of the discovery and development of new medicines, for both humans and animals. Individuals may work on projects based locally in Sandwich or on a project that is taking place in almost every continent.

The development of a new compound into an approved medicine typically follows a long, and sometimes complex, path. We make decisions of ever-increasing complexity in a climate of uncertainty. For each new compound that eventually receives a licence as a new medicine, many thousands of others will have been synthesized, tested and discarded.

We focus on designing programmes of experiments and studies that will enable quality decisions to be made about the future of each new compound or potential new medicine. We also help the decision makers understand any possible biases and uncertainties in the data underlying a decision. In turn, this ensures that better medicines reach the patient as soon as possible, whilst those candidate medicines with potential difficulties are discarded early.

We work closely with scientists, clinicians and other professionals and rely on the support of a team of statistical programmers. As well as having excellent, practical statistical skills, we rapidly develop a good knowledge of the scientific or medical area that we are supporting.

We work continuously on our interpersonal and consulting skills.


Statistics and Chemometrics Making decisions with confidence

About us

The Statistics and Chemometrics team includes statisticians, data analysts, chemometricians and modellers who help clients in the commerce, finance, process and product development industries to develop better business solutions.

The team draws on Shell Group experience of providing cutting-edge consultancy, software, innovation and training for more than 30 years to serve clients worldwide from bases in the UK, the Netherlands and the USA.

Website: http://www.shell.com/globalsolutions/statisticsandchemometrics

Email: [email protected]

Process Solutions

- Statistical process modelling
- Process solutions and software for optimising performance and operating cost, e.g. pre-heat units
- Process software for assuring integrity of pipework
- Tools and techniques to emulate and optimise process conditions

Chemometrics

- Process Analytical Chemistry using advanced spectroscopy with multivariate calibration models, e.g. for MOGAS blending
- Advanced Process Monitoring using multivariate statistical techniques, e.g. Dynamic Chemical Processes
- Enhanced Experimentation, e.g. catalyst characterisation: Electron Microscopy, X-Ray Analysis, Kernels, Multivariate Analysis

Training and Software

- Customised statistical training courses: statistics and design of experiments training
- Customised training on use of specialist software tools

Business Solutions

- Statistical forecasting
- Decision tools on carbon management
- Risk and uncertainty modelling
- Benchmarking advanced data analysis

Product Development

- Supporting product development in fuels, lubricants, chemicals: e.g. vehicle testing, emission testing
- Designing experiments to provide evidence to support marketing claims
- Analysing data to help understand effects
- Collaborating with product teams to provide ongoing support

Shell Global Solutions is a network of independent technology companies in the Shell Group. In this case study, the expressions ‘Shell’ and ‘Shell Global Solutions’ are sometimes used for convenience where reference is made to these companies in general, or where no useful purpose is served by identifying a particular company. The information contained in this material is intended to be general in nature and must not be relied on as specific advice in connection with any decisions you may make. Shell and Shell Global Solutions are not liable for any action you may take as a result of you relying on such material or for any loss or damage suffered by you as a result of you taking this action. Furthermore, these materials do not in any way constitute an offer to provide specific services. Some services may not be available in certain countries or political subdivisions thereof. Photographs are from various installations. Copyright © 2008 Shell Global Solutions International BV. All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including by photocopy, recording or information storage and retrieval system, without permission in writing from Shell Global Solutions International BV.

Unilever has research and technology centres around the world that help us respond to fast changing needs, tastes and trends among customers everywhere.

We have six principal research and development laboratories: two in the UK (Colworth House and Port Sunlight), one in the Netherlands (Vlaardingen), one in the US (Trumbull), one in China (Shanghai) and one in India (Bangalore). Our scientists and engineers in these sites work seamlessly with a network of global and regional technology centres that stretches from São Paulo, Brazil in the west, to Shanghai, China in the east.

We spend over €1bn a year on R&D. Our scientists have the resources, the knowledge and the imagination to define and design technology that brings vitality to life, delivering a continuous stream of innovation to our business. Working with some of the world’s leading academic institutions, they explore many fascinating issues. What makes food taste the way it does? What attracts people visually to food? What will the raw materials of the future be? And how can good nutrition help safeguard the health of our children?

The role of the Data Scientist at Unilever is exciting and challenging. The work is never routine. Data Scientists work closely with other scientists, driving and enabling the use of advanced statistical and mathematical approaches to R&D problem solving and developing entirely new R&D methodologies.

Forest plots for gingival index at 3-months and 6-months

MDS paired comparisons

Optimising the cleaning performance of a laundry detergent


20% Discount

Buy these and other great titles from the John Wiley & Sons stand

16 RSC History

33 2010 Warwick
32 2009 Lancaster
31 2008 Nottingham
30 2007 Durham
29 2006 Glasgow
28 2005 Cambridge
27 2004 Sheffield
26 2003 Surrey
25 2002 Warwick
24 2001 Newcastle
23 2000 Cardiff and University of Wales
22 1999 Bristol
21 1998 Lancaster
20 1997 Glasgow
19 1996 Southampton
18 1995 Oxford
17 1994 Reading
16 1993 Lancaster
15 1992 Nottingham
14 1991 Newcastle
13 1990 Bath
12 1989 Glasgow
11 1988 Surrey
10 1987 Sheffield
(9)
(8) 1985 Oxford
(7) 1984 Imperial College
(6)
(5)
(4)
(3) 1982 Cambridge
(2) 1981 Bath
(1) 1980 Cambridge

Table 1: RSC History

For more information see the Wikipedia entry for “Research Students Conference Probability and Statistics”.

17 Delegate List

Name | Institution | Year | Email | Research Interests
Ms Sawsan Abbas | Lancaster | 3 (FT) PhD | [email protected] | Extreme value theory, finance
Ms Noorazrin Abdul Rajak | Newcastle University | 2 (FT) PhD | [email protected] | Bayesian experimental design
Ms Aisyaturridha Abdullah | Lancaster | 3 (FT) PhD | [email protected] |
Mr Ahmad Aboalkhair | Durham | 1 (FT) PhD | [email protected] | NPI with system reliability
Ms Achilleas Achilleos | Bristol | 3 (FT) PhD | [email protected] | Nonparametric smoothing
Ms Mouna Akacha | University Of Warwick | 2 (FT) PhD | [email protected] | Missing data and repeated measurements
Ms Giuseppina Albano | University Of Salerno | (FT) Other | [email protected] | Markov models, stochastic differential equations
Mr Matthew Aldridge | University Of Bristol | 2 (FT) PhD | [email protected] | Information theory, communications, probability, Poisson processes
Mr Deyadeen Alshibani | Newcastle Upon Tyne | 3 (FT) PhD | [email protected] | Decision making, optimal dynamic treatment regimes
Mr Alberto Alvarez Iglesias | NUI Galway | 1 (FT) PhD | [email protected] | Survival analysis, classification and regression trees
Mr Louis Aslett | Trinity College Dublin | 1 (FT) PhD | [email protected] | Bayesian statistics, Bayesian networks, classification
Mr Nammam Azadi | Lancaster University | 2 (FT) PhD | [email protected] | Medical statistics, Bayesian application and computing
Ms Nor Azmee | University Of Sheffield | 3 (FT) PhD | afzalina.azmee@sheffield.ac.uk | Clinical trials, pharmaceutical statistics
Ms Karen Baird | Lancaster | 1 (FT) PhD | [email protected] | Linguistics, structural equation modelling
Mr Rebecca Baker | Durham | 2 (FT) PhD | [email protected] | Nonparametric predictive inference
Mr Leonardo Bastos | University Of Sheffield | 3 (FT) PhD | l.bastos@sheffield.ac.uk | Bayesian statistics, computer models
Mr Edward Bell | Lancaster | 1 (FT) PhD | [email protected] | Stochastic processes, parametric language modelling
Mr Alex Berriman | Liverpool University | 1 (FT) PhD | [email protected] | Mathematical modelling, system dynamics, epidemiology
Mr Arnab Bhattacharya | Trinity College | 1 (FT) PhD | [email protected] | Spatio-temporal models, Bayesian analysis
Mr Alexis Boukouvalas | Aston University | 3 (FT) PhD | [email protected] | Gaussian processes, emulators, Bayesian probabilistic modelling, experimental design
Ms Gillian Boyle | University Of Glasgow | 1 (FT) MSc | [email protected] | Educational statistics
Mr Stephen Burgess | University Of Cambridge | 1 (FT) PhD | [email protected] | Causal inference, Mendelian randomization, Bayesian meta-analysis
Mr Neil Casey | University Of Cambridge | 1 (FT) PhD | [email protected] | Developing statistical methods for use in complex intervention trials
Ms Yu-Jie (Madeleine) Chen | Lancaster University | 1 (FT) PhD | [email protected] | Panel data analysis, longitudinal data analysis, categorical data analysis
Ms Lindsay Collins | University Of Sheffield | 3 (FT) PhD | l.a.collins@sheffield.ac.uk | Climate variability, probabilistic sensitivity analysis, emulation
Ms Eleni-Ioanna Delatola | University Of Kent | 2 (FT) PhD | [email protected] | Bayesian nonparametrics, financial econometrics
Ms Lisha Deng | Lancaster | 2 (FT) PhD | [email protected] | Point processes in one dimension, patient safety related data sets
Ms Vasiliki Dimitrakopoulou | University Of Kent | 2 (FT) PhD | [email protected] | Bayesian variable selection, clustering methods, stochastic search techniques
Ms Cara Dooley | National University Of Ireland, Galway | 1 (FT) PhD | [email protected] | Survival analysis
Ms Orla Doolin | NUI Galway, Ireland | 1 (FT) MRes | [email protected] | Ordinal regression, medical stats
Ms Susan Doshi | Bath | 1 (FT) PhD | [email protected] | Image analysis; medical imaging; image guided radiotherapy
Mr Mohamed Elsaeiti | Durham | 3 (FT) PhD | [email protected] | Nonparametric predictive inference and quality control
Mr Hisham Elsayed | University of Southampton | 2 (FT) PhD | [email protected] | Survival analysis
Mr Christopher Fallaize | University Of Leeds | 2 (FT) PhD | [email protected] | Shape analysis, statistical bioinformatics
Mr Tom Fanshawe | Lancaster University | 3 (FT) PhD | [email protected] | Geostatistics, medical statistics, spatial statistics, environmental epidemiology
Ms Maria Franco | University Of Glasgow | 1 (FT) PhD | [email protected] | Environmental statistics
Mr Tom Fricker | University Of Sheffield | 3 (FT) PhD | [email protected] | Gaussian process, emulator, coregionalization, nonseparable, computer experiment
Ms Rachael Fulton | University Of Glasgow | 1 (FT) MRes | [email protected] | Genetic statistics, systems biology, bioinformatics
Ms Sarah Germain | Newcastle University | 3 (FT) PhD | [email protected] | Bayesian spatio-temporal modelling; hidden Markov models; elicitation; environmental statistics
Mr Vasileios Giagos | Lancaster University | 3 (FT) PhD | [email protected] | Diffusion processes, autoregulatory networks, biochemical reactions
Ms Antri Giannakou | Bristol University | 1 (FT) PhD | [email protected] | Prognostic modelling, longitudinal data analysis
Mr Flavio Goncalves | Warwick | 2 (FT) PhD | [email protected] | Inference for stochastic processes
Mr Robert Goudie | Warwick | 1 (FT) PhD | [email protected] | Social networks, health, epidemiology
Ms Erofili Grapsa | University Of Southampton | 2 (FT) PhD | [email protected] | Bayesian inference, survey sampling theory, weights, mixed effects models
Ms Hui Guo | University Of Cambridge | 2 (FT) PhD | [email protected] | Causal inference, propensity score analysis
Ms Ruth Haggarty | University Of Glasgow | 1 (FT) PhD | [email protected] | Environmental statistics
Ms Bryony Hill | University Of Warwick | 2 (FT) PhD | [email protected] | Spatial statistics, point patterns, tensors, fingerprints, galaxies, MCMC
Mr Nathan Huntley | Durham University | 2 (FT) PhD | [email protected] | Decision theory, imprecise probability
Mr Farhat Iqbal | Lancaster | 3 (FT) PhD | [email protected] | Financial time series modelling, GARCH, diagnostic testing
Mr Amin Jamalzadeh | University Of Durham | 2 (FT) PhD | [email protected] | Web mining, data mining, Bayesian networks, Bayes linear statistics
Ms Emma Jones | University Of Sheffield | 1 (FT) PhD | [email protected] | Statistics / dendrochronology / archaeology
Mr Chaitanya Joshi | University Of Dublin | 2 (FT) PhD | [email protected] | Bayesian inference, diffusion processes
Mr Venediktos Kapetanakis | MRC Biostatistics Unit, University Of Cambridge | 1 (FT) PhD | [email protected] | Markov, multistate, semi-Markov
Mr Georgios Karagiannis | University Of Bristol | 3 (FT) PhD | [email protected] | Markov chain Monte Carlo, Bayesian statistics
Mr Md. Hasinur Khan | University Of Warwick | 1 (FT) PhD | [email protected] | Bayesian, variable selection, survival data, microarray
Ms Shiler Khedri | Durham University | 2 (FT) PhD | [email protected] | Model selection, categorical data, contingency tables, MCMC, network algorithm
Ms Rebecca Killick | Lancaster University | 1 (FT) PhD | [email protected] | Wavelets, change points
Mr Edward Knock | University Of Nottingham | 3 (FT) PhD | [email protected] | Stochastic epidemic modelling
Ms Karen Lamb | University Of Strathclyde | 3 (FT) PhD | [email protected] | Multilevel modelling, population dynamics, generalised linear models
Ms Shu Lee | University Of Oxford | 2 (PT) PhD | [email protected] | Social networks analysis
Mr Nan Lin | Newcastle University | 3 (FT) PhD | [email protected] | Misspecified model approach to missing data problems
Mr Geoff Littler | University Of Bath | 3 (FT) PhD | [email protected] | Bayesian statistics, MCMC, nonparametrics
Ms Jiayi Liu | Lancaster University | 3 (PT) PhD | [email protected] | Mixture models
Ms Yang Luo | Cambridge | 2 (FT) PhD | [email protected] | Stochastic modelling; model selection; excitable systems
Ms Xiaojuan Ma | University Of Nottingham | 2 (FT) PhD | [email protected] | Markov chains, option pricing, Monte Carlo
Ms Colette Mair | Glasgow | 1 (FT) PhD | [email protected] | Genetics
Mr Timothy Mak | Imperial College London | 1 (FT) PhD | [email protected] | Evidence synthesis, bias modelling, medical statistics
Mr Patrice Marek | University Of West Bohemia, Czech Republic | 1 (FT) PhD | [email protected] | Risk, finance
Mr Christopher Marley | Southampton | 2 (FT) PhD | [email protected] | Design and analysis of experiments
Mr Giampiero Marra | University Of Bath | 2 (FT) PhD | [email protected] | Generalized additive modelling, model selection
Mr Tristan Marshall | Warwick | 4 (FT) PhD | [email protected] | Monte Carlo, Markov chains, diffusion, stochastic differential equations, adaptive
Mr Kieran Martin | Southampton University | 1 (FT) PhD | [email protected] | Time series analysis, long-memory seasonally persistent processes, parameter estimation, model selection
Mr Kieran Martin | Southampton University | 1 (FT) PhD | [email protected] | Experimental design, non-linear models
Ms Nur Mat Yusoff | National University Of Ireland, Galway | 1 (FT) PhD | [email protected] | Statistical modeling
Mr Benedict May | Bristol University | 1 (FT) PhD | [email protected] | Multi-armed bandits, reinforcement learning, regression, game theory
Ms Fiona McElduff | Institute Of Child Health, University College London | 2 (FT) PhD | [email protected] | Discrete distributions
Ms Skevi Michael | University Of Bristol | 2 (FT) PhD | [email protected] | Stochastic processes, random trees
Mr David Miller | University Of Bath | 1 (FT) PhD | [email protected] | Ecological modelling, GAMs, computer-intensive statistics
Mr Peter Milner | Newcastle | 3 (FT) PhD | [email protected] | Stochastic modelling, approximations, moment closure, systems biology
Ms Rofizah Mohammad | University Of Surrey | 1 (FT) PhD | [email protected] | Bayesian statistics
Ms Helene Neufeld | University Of Oxford | 2 (FT) PhD | [email protected] | Graphical models, symmetry, group invariance, causality
Mr Adalbert Ngongang | Lancaster University | 2 (FT) PhD | [email protected] | Time series analysis, Bayesian methods
Mr David O’Donnell | University Of Glasgow | 3 (FT) PhD | [email protected] | Spatial, environmental, rivers
Mr Adrian O’Hagan | UCD Dublin | 2 (FT) PhD | [email protected] | Generative, discriminative, LDA, logistic regression, Gibbs, Metropolis-Hastings
Mr Aidan O’Keeffe | University Of Cambridge | 1 (FT) PhD | [email protected] | Likelihood-based statistical modelling and causal inference
Ms Thais Oliveira da Fonseca | Warwick University | 3 (FT) PhD | [email protected] | Bayesian statistics; space-time models
Mr Menelaos Pavlou | UCL | 2 (PT) PhD | [email protected] | Repeated measurements, longitudinal data, informative cluster size
Mr Christopher Pearce | University Of Liverpool | 2 (FT) PhD | [email protected] | Applied probability, epidemiology
Mr Samuel Peck | University Of Leeds | 3 (FT) PhD | [email protected] | Wavelets, lifting
Mr Duy Pham | The University Of Warwick | 1 (FT) PhD | [email protected] | Socialising, sports, movies
Ms Helen Powell | University Of Glasgow | 1 (FT) PhD | [email protected] | Modelling the effects of air pollution on the number of respiratory admissions to hospitals
Mr Dennis Prangle | Lancaster | 2 (FT) PhD | [email protected] | Approximate Bayesian computation, MCMC
Ms Rosalba Radice | Bath | 2 (FT) PhD | [email protected] | Bayesian phylogenetic networks
Ms Siti Rahayu | University Of Sheffield | 1 (FT) PhD | stp08sm@sheffield.ac.uk | Multivariate analysis in quality control
Mr David Randell | Durham University | 3 (FT) PhD | [email protected] | Bayes linear, covariance learning, exchangeability, Mahalanobis distance
Ms Shijie Ren | University Of Sheffield | 2 (FT) PhD | [email protected] | Assurance, prior elicitation
Ms Marialuisa Restaino | University Of Salerno | (FT) Other | [email protected] | Survival analysis, multi-state models
Mr Alexandre Rodrigues | Lancaster | 3 (FT) PhD | [email protected] | Spatio-temporal; convolution-based models; point processes
Ms Jennifer Rogers | University Of Warwick | 2 (FT) PhD | [email protected] | Survival analysis, recurring events, epilepsy
Ms Verena Roloff | University Of Cambridge | 1 (FT) PhD | [email protected] | Sequential meta-analysis
Ms Maria Roopa | Queen Mary University Of London | 2 (FT) PhD | [email protected] | Dose escalation trials
Ms Nor Azah Samat | Salford University | 2 (FT) PhD | [email protected] | Statistical modelling, spatial epidemiology, disease mapping, dengue disease
Mr Javier Serradilla | Newcastle University | 2 (FT) PhD | [email protected] | Multivariate statistical process control, Gaussian process regression, multivariate statistics
Ms Golnaz Shahtahmassebi | University Of Plymouth | 1 (FT) PhD | [email protected] | Modelling and predicting financial data behaviour, especially considering high frequency data and its special characteristics
Mr Andrew Simpkin | NUI, Galway | 2 (FT) PhD | [email protected] | Derivative estimation, smoothing, splines
Mr Andrew Smith | University Of Bristol | 2 (FT) PhD | [email protected] | Nonparametric regression, image analysis
Mr Andrew Smith | University Of Bristol | 1 (FT) PhD | [email protected] | Shape analysis with medical applications
Mr Michalis Smyrnakis | Bristol University | 2 (FT) PhD | [email protected] | Sequential Monte Carlo methods, game theory, probability collectives
Mr Matthew Sperrin | Lancaster | 3 (FT) PhD | [email protected] | Genetics, high dimensional
Ms Michelle Stanton | Lancaster University | 2 (FT) PhD | [email protected] | Spatial/spatio-temporal epidemiology, tropical disease epidemiology
Ms Natalie Staplin | University Of Southampton | 1 (FT) PhD | [email protected] | Informative censoring in survival models for liver transplants
Ms Kara Stevens | University Of Bristol | 1 (FT) PhD | [email protected] | Time series analysis, wavelets
Ms Nicola Stone | University Of Nottingham | 3 (FT) PhD | [email protected] | Gaussian process emulation, Bayesian analysis

Mr Alexander Strawbridge, University Of Cambridge, Year 1 (FT), PhD, [email protected]
    Measurement error in non-linear dose-response relationships

Mr David Suda, Lancaster University, Year 2 (PT), PhD, [email protected]
    Stochastic processes, Bayesian inference, computational statistics

Mr James Sweeney, Trinity College Dublin, Year 2 (FT), PhD, [email protected]
    Spatial Statistics, GMRFs

Mr Benjamin Taylor, Lancaster University, Year 2 (FT), PhD, [email protected]
    Sequential Monte Carlo / Particle Filtering, MCMC

Ms Jane Temple, Bath, Year 1 (FT), PhD, [email protected]
    Statistics, clinical trials, dose response

Mr Tjun Teo, University Of Oxford, Year 2 (FT), PhD, [email protected]
    Classification Theory

Ms Helen Thornewell, University Of Surrey, Year 2 (FT), PhD, [email protected]
    Experimental Design

Mr Tomas Toupal, University Of West Bohemia, Czech Republic, Year 3 (FT), PhD, [email protected]
    Risk, Finance

Ms Richa Vatsa, Trinity College Dublin, Year 3 (FT), PhD, [email protected]
    Improving the Variational Bayes method for high-dimensional latent models

Ms Eleni Verykouki, University Of Nottingham, Year 1 (FT), PhD, [email protected]
    Bayesian Statistics, MCMC

Ms Rountina Vrousai, Trinity College Dublin, Year 1 (FT), PhD, [email protected]
    Paleoclimate reconstruction, Bayesian methods

Ms Jenny Wadsworth, Lancaster University, Year 1 (FT), PhD, [email protected]
    Extreme value theory

Mr Dennis Wang, University Of Cambridge, Year 1 (FT), PhD, [email protected]
    Genomics, bioinformatics, statistical genetics, Bayesian inference

Mr Daniel Williamson, Durham, Year 3 (FT), PhD, [email protected]
    Computer Models, Decision Theory, Emulation

Mr Kevin Wilson, Newcastle, Year 2 (FT), PhD, [email protected]
    Bayesian, Bayes linear

Mr Peter Windridge, Warwick, Year 3 (FT), PhD, [email protected]
    Optimal control, interacting particle systems, combinatorics, optimal stopping

Mr Lei Yan, Nottingham, Year 1 (FT), PhD, [email protected]
    Statistical image analysis (MEG)

Ms Wai Yeung, Queen Mary, University Of London, Year 1 (FT), PhD, winnie [email protected]
    Markov Chains

Mr Ben Youngman, University Of Sheffield, Year 2 (FT), PhD, b.youngman@sheffield.ac.uk
    Extreme value theory; spatiotemporal modelling

Ms Zakiyah Zain, Lancaster University, Year 2 (FT), PhD, [email protected]
    Score statistics, global test, interval-censored survival data

Mr Mohammad Zayed, Durham, Year 2 (FT), PhD, [email protected]
    Fitting local principal curves through multidimensional data sets

Ms Diwei Zhou, University Of Nottingham, Year 3 (FT), PhD, [email protected]
    Statistical Image Analysis

Ms Lu Zou, Sheffield University, Year 3 (FT), PhD, stp06lz@sheffield.ac.uk
    Multivariate imputation

Mr Piotr Zwiernik, University Of Warwick, Year 2 (FT), PhD, [email protected]
    Algebraic statistics, geometric foundations of statistical inference