
33rd Research Students’ Conference in Probability and Statistics

12th - 15th April 2010

Conference Proceedings

Timetable of Events

Monday 12th April

13:00 Registration of Delegates (The Street)

15:00 Afternoon Tea (The Street)

15:30 Opening Address & Plenary Session (MS.01, Maths/Stats Building)

15:30 Opening Address: Prof. Jane L. Hutton (University of Warwick)
15:45 Plenary Talk I: Prof. Jim Q. Smith (University of Warwick)
16:20 Plenary Talk II: Dr. Jonathan Rougier (University of Bristol)
16:55 Announcements/Housekeeping

18:00 Dinner (Rootes Social Building)

19:00 Pub Quiz (Varsity Pub)

Tuesday 13th April

07:30 Breakfast (Rootes Social Building)

09:10 Session 1 (Math/Stats Building)

11:10 Refreshments (The Street)

11:30 Session 2 (Math/Stats Building)

13:30 Lunch (The Street)

14:30 Session 3 (Math/Stats Building)

16:10 Poster Session and Refreshments (The Street)

18:00 Dinner (Rootes Social Building)

19:00 Evening Entertainment (Coventry City Centre)
19:00 Bus Collection by Students' Union to Cross Point Business Park (Bowling and Cinema)
19:30 Bus Collection by Students' Union to Town Hall (Pub Crawl)

22:00 Bus Collection from Bowling and Cinema to Campus

23:30 First Bus Collection from Pub to Campus

00:30 Second Bus Collection from Pub to Campus

Wednesday 14th April

07:30 Breakfast (Rootes Social Building)

09:10 Session 4 (Math/Stats Building)

11:10 Refreshments (The Street)

11:30 Session 5 (Math/Stats Building)

13:30 Lunch (The Street)

14:30 Sponsors’ Talks (Math/Stats Building)

16:10 Sponsors’ Wine Reception (The Street)

18:15 Bus Collection to Conference Dinner (Coventry Transport Museum)

22:15 First Bus Collection to Campus

23:45 Second Bus Collection to Campus

Thursday 15th April

07:30 Breakfast (Rootes Social Building)

09:30 Delegates Depart

Contents

1 Welcome from the Organisers
2 The City and University
3 Campus Map
4 University Facilities
5 Accommodation
6 Conference Details
   6.1 Meals
   6.2 Sponsors' Wine Reception
7 Help, Information and Telephone Numbers
   7.1 Departmental Computing and Internet Access
8 Instructions
   8.1 For Chairs
   8.2 For Speakers
   8.3 For Displaying a Poster
   8.4 Prizes
9 Plenary Session
   9.1 Professor Jane L. Hutton (University of Warwick)
   9.2 Professor Jim Q. Smith (University of Warwick)
   9.3 Dr. Jonathan Rougier (University of Bristol)
10 List of Sponsors' Talks
11 Talks Schedule
   11.1 Monday 12th April
   11.2 Tuesday 13th April
   11.3 Wednesday 14th April
12 Talk Abstracts by Session
   12.1 Tuesday 13th April
      12.1.1 Session 1a: Image Analysis
      12.1.2 Session 1b: Computational Statistics
      12.1.3 Session 1c: Operational Research
      12.1.4 Session 1d: Statistical Inference
      12.1.5 Session 2a: Medical Statistics I
      12.1.6 Session 2b: Financial
      12.1.7 Session 2c: Elicitation and Epidemiology
      12.1.8 Session 2d: Multivariate Statistics
      12.1.9 Session 3a: Genetics
      12.1.10 Session 3b: Medical Statistics II
      12.1.11 Session 3c: Dimension Reduction
      12.1.12 Session 3d: Environmental
   12.2 Wednesday 14th April
      12.2.1 Session 4a: Medical Statistics III
      12.2.2 Session 4b: Point Processes and Spatio-temporal Statistics
      12.2.3 Session 4c: General
      12.2.4 Session 4d: Graphical Models and Extreme Value Theory
      12.2.5 Session 5a: Experimental Design and Population Genetics
      12.2.6 Session 5b: Censoring in Survival Data and Non-Parametric Statistics
      12.2.7 Session 5c: Time Series and Diffusions
      12.2.8 Session 5d: Probability
      12.2.9 Session 6a: Sponsors' Talks
      12.2.10 Session 6b: Sponsors' Talks
      12.2.11 Session 6c: Sponsors' Talks
13 Poster Abstracts by Author
14 RSC 2011: Cambridge University
15 Sponsors' Advertisements
16 RSC History
17 Delegate List

1 Welcome from the Organisers

Welcome to the 33rd Research Students' Conference in Statistics and Probability (RSC 2010). This year the conference is hosted by the University of Warwick. The RSC is an annual event aiming to provide postgraduate statisticians and probabilists with an appropriate forum in which to present their research. This four-day event is organised by postgraduates, for postgraduates, providing an excellent opportunity to make contacts and discuss work with other students who have similar interests.

For many students this will be your first experience of presenting your work, with some of you also taking the opportunity to chair a session. For those of you attending and not presenting, we hope that you will benefit greatly from observing others and networking with researchers working in a similar field.

Next year the conference will be held in Cambridge. Finally, we will be looking for potential hosts for RSC 2012: if you think your institution would be keen to take part in such an exciting project, please let us know.

Mouna Akacha, Flávio Gonçalves, Bryony Hill and Jennifer Rogers
Conference Organisers

2 The City and University

The University of Warwick is one of the leading UK research universities and is ranked number 1 in the Midlands. Consistently ranked in the top ten of UK universities, it is an entrepreneurial institution that has a large positive impact on local and regional communities. The University is located in the heart of England, 3 miles (5 kilometres) from Coventry city centre, on the border with Warwickshire.

Coventry

Coventry is a city and metropolitan borough in the county of West Midlands in England. Coventry is the 9th largest city in England and the 11th largest in the United Kingdom. It is also the second largest city in the English Midlands, after Birmingham. Coventry is situated 95 miles (153 km) northwest of London and 19 miles (30 km) east of Birmingham, and is farther from the coast than any other city in Britain. Although harbouring a population of almost a third of a million inhabitants, Coventry is not amongst the English Core Cities Group, due to its proximity to Birmingham.

Coventry was also the world's first 'twin' city when it formed a twinning relationship with the Russian city of Stalingrad (now Volgograd) during World War II. The relationship developed through ordinary people in Coventry who wanted to show their support for the Soviet Red Army during the Battle of Stalingrad. The city is now twinned with Dresden and with 27 other cities around the world.

Coventry Cathedral is one of the newer cathedrals in the world, having been built following the World War II bombing of the ancient cathedral by the Luftwaffe. Coventry motor companies have contributed significantly to the British motor industry, and the city has two universities: the city centre-based Coventry University and the University of Warwick on the southern outskirts.

In the late 19th century, Coventry became a major centre of bicycle manufacture, with the industry being pioneered by Rover. By the early 20th century, bicycle manufacture had evolved into motor manufacture, and Coventry became a major centre of the British motor industry. Over 100 different companies have produced motor vehicles in Coventry, but car production came to an end in 2006 as the last car rolled off the lines at Peugeot's Ryton plant. Production was transferred to a new plant near Trnava, Slovakia, with the help of EU grant aid to Peugeot: this made Peugeot deeply unpopular in the city. The design headquarters of Jaguar Cars is still in the city at their Whitley plant and, although they ceased vehicle assembly at their Browns Lane plant in 2004, they still continue some operations from there.

A major visitor attraction in Coventry city centre is the free-to-enter Coventry Transport Museum, which has the largest collection of British-made road vehicles in the world and will be the venue for our Conference Dinner. The most notable exhibits are the world speed record-breaking cars, Thrust2 and ThrustSSC. The museum received a major refurbishment in 2004, which included the creation of a striking new entrance as part of the city's Phoenix Initiative project. The revamp saw the museum exceed its projected five-year visitor numbers within the first year alone, and it was

a finalist for the 2005 Gulbenkian Prize.

The most famous daughter of Coventry is Lady Godiva, whose ride through the streets of the city has passed into legend. According to the popular story, Lady Godiva took pity on the people of Coventry, who were suffering grievously under her husband's oppressive taxation. Lady Godiva appealed again and again to her husband, who obstinately refused to remit the tolls. At last, weary of her entreaties, he said he would grant her request if she would strip naked and ride through the streets of the town. Lady Godiva took him at his word and, after issuing a proclamation that all persons should stay indoors and shut their windows, she rode through the town clothed only in her long hair. Today a statue in the heart of the city centre commemorates her bravery.

The University

The establishment of the University of Warwick was given approval by the government in 1961, and the University received its Royal Charter of Incorporation in 1965. It straddles the boundary between the City of Coventry and the County of Warwickshire. The idea for a university in Coventry was mooted shortly after the conclusion of the Second World War, but it was a bold and imaginative partnership of the City and the County which brought the University into being, on a 400-acre site jointly granted by the two authorities. Since then, the University has incorporated the former Coventry College of Education (in 1978) and has extended its land holdings by the purchase of adjoining farm land.

The University initially admitted a small intake of graduate students in 1964 and took its first 450 undergraduates in October 1965. In October 2009, the student population was over 21,598, of which around 9,008 were postgraduates. 25% of the student body comes from overseas and over 125 countries are represented on the campus. The University has 29 academic departments and over 50 research centres and institutes, in four Faculties: Arts, Medicine, Science and Social Sciences. The University hosts two HEFCE Centres for Excellence in Learning and Teaching (CETLs): CAPITAL and Reinvention. The new Medical School took its first students on an innovative 4-year accelerated postgraduate programme in September 2000. In summer 2004 the first 64 students graduated from the school. In October 2004 the combined intake of the Warwick Medical School was 403, making it one of the largest in the country.

From its beginnings, the University has sought to be excellent in both teaching and research. It has now secured its place as one of the UK's leading research universities, confirmed by the results of the government's Research Assessment Exercises since 1986. In all of these, Warwick has been placed in the top half dozen or so of universities for the quality of its research. The results of the 2008 Research Assessment Exercise (RAE) again reiterate Warwick's position as one of the UK's leading research universities, with Warwick ranked 7th overall in the UK (based on multi-faculty institutions).

The University of Warwick campus was recently voted the best campus in the UK. It's a lively, cosmopolitan place with its own shops, banks, bars and restaurants - an

exciting place to live and work, with everything you could need close at hand. There is a great sense of community at Warwick: the campus is home to students and staff from over 120 different countries and from all backgrounds, and is a great resource for the local community, with excellent facilities such as Warwick Arts Centre and the University Sports Centre. The campus is continually developing; in August 2008 the Warwick Digital Laboratory was opened by Prime Minister Gordon Brown, and the indoor tennis centre at Westwood campus was opened in March 2008. The campus is situated on three adjacent sites: Central campus, Gibbet Hill campus and Westwood campus. There are lakes and woods, trees and landscaped gardens, but whilst the campus has many green open spaces, inside the buildings ground-breaking research is taking place and academics and students are sharing their knowledge and experience.

The Department

The Department of Statistics at the University of Warwick is one of the largest UK concentrations of researchers in statistics and probability, and the synergy between probabilistic and statistical research is particularly strong. The research environment is vibrant, with a large and active community of PhD students and postdoctoral researchers, excellent library, computing and other research support facilities, and sustained programmes of research seminars, workshops and international visitors. There are strong research links with other disciplines both at Warwick and externally.

Research-related activities (seminars, workshops, visitors, etc.) take place mainly through three long-term initiatives: CRiSM (Centre for Research in Statistical Methodology), P@W (Probability at Warwick) and RISCU (Risk Initiative and Statistical Consultancy Unit). CRiSM is funded by EPSRC and HEFCE, as well as Warwick, as a national Science and Innovation investment. P@W is a focus for inter-departmental probability research at Warwick and for the organisation of externally open research workshops and training events in probability, while RISCU provides resources for developing applied research collaborations with industry, commerce, government and other outside bodies, and with other academic disciplines.

The Department's research ranges from probability theory, through computation and statistical methodology, to substantive applications in many different fields. In the most recent national Research Assessment Exercise (RAE 2008), the Department had 70% of its activity rated as internationally excellent (grade 3* or higher), with more than a quarter classed as world leading (grade 4*). For publications by members of the department, please see individual staff web pages.

The Department leads the EPSRC-funded Academy for PhD Training in Statistics, a collaboration with eight other prominent UK research groups to organise intensive courses for first-year PhD students. From 2010 a further new feature of our PhD provision is the EPSRC-funded MASDOC initiative for doctoral training at the interface between statistics and applied mathematics.

3 Campus Map


BUILDING KEY (building number, grid reference)
1 International Automotive Research Centre (IARC), E4; 2 Arden, F2; 3 Argent Court (incorporating Estates, AdsFab & Jobs.ac.uk), G3; 4 Arthur Vick, F6; 5 Avon Building (incorporating Drama Studio), G2
6 Benefactors, C5; 7 Biological Sciences, D8; 8 Biomedical Research, D8; 9 Gibbet Hill Farmhouse, C8; 10 Chaplaincy, D5
11 Chemistry, D4; 12 Claycroft, G5; 13 Computer Science, E4; 14 Coventry House, D5; 15 Cryfield, Redfern & Hurst, B5
16 Dining & Social Building Westwood, G2; 17 Institute of Education (incorporating Multimedia CeNTRE & TDA Skills Test Centre), H2; 18 Engineering, E4; 19 Engineering Management Building, F2; 20 Games Hall, E2
21 Gatehouse, D3; 22 Health Centre, D6; 23 Heronbank, A4; 24 Humanities Building, E4; 25 International House, C6
26 International Manufacturing Centre, E4; 27 IT Services Elab level 4, H2; 28 IT Services levels 1-3, H2; 29 Jack Martin, E6; 30 Lakeside, B3
31 Lakeside Apartments, B2; 32 Library, D4; 33 Lifelong Learning, G2; 34 Medical School Building, D8; 35 Mathematics & Statistics (Zeeman Building), F4
36 Maths Houses, E8; 37 Medical Teaching Centre, D8; 38 Millburn House, F3; 39 Modern Records Centre & BP Archive, D5; 40 Music, H2
41 Nursery, C3; 42 Physical Sciences, D4; 43 Physics, D4; 44 Porters & Postroom, G1; 45 Psychology, E5
46 Radcliffe, C4; 47 Ramphal Building, D4; 48 Rootes, C6/D6; 49 Rootes Building, C5; 50 Scarman, C3
51 Science Education, H2; 52 Shops, D5; 53 Social Sciences, D4; 54 Sports Centre, E5; 55 Sports Pavilion, A5
56 Students' Union, D5; 57 Tennis Centre, F2; 58 Tocil, F5; 59 University House (incorporating Learning Grid), E2; 60 Vanguard Centre, G3
61 Warwick Arts Centre (incorporating Music Centre), D5; 62 Warwick Business School (WBS), D4 (Main Reception, Scarman Rd, D3); 63 WBS Social Sciences, D5; 64 WBS Teaching Centre, C4; 65 Warwick Digital Laboratory, F4
66 WarwickPrint, H2; 67 Westwood, G1/G2; 68 Westwood Gatehouse, H2; 69 Westwood House (incorporating Occupational Health, Counselling & DARO Calling Room), G2; 70 Westwood Teaching and Westwood Lecture Theatre, H2; 71 Whitefields, D5

For the most up-to-date version of this map go to warwick.ac.uk/go/maps. For further information see the University web site or the mobile site www.m.warwick.ac.uk.

A full-size version of the map is provided in the Conference pack.

4 University Facilities

Everything you will need during your stay can be found on the University campus. Situated on 700 acres of rural parkland, the campus 'village' environment has its own banks, bars, shops and outlets.

All meals - breakfasts, lunches, dinners and morning/afternoon refreshments - are included in the conference registration. However, if you find yourself still hungry, there are a number of bars and cafes open around campus, and also a small Costcutter supermarket located next to the Students' Union. Inside Costcutter there is also a Post Office and Copyshop (for printing, photocopying and binding). A 10-minute walk takes you to the local Tesco, Boots and Iceland at Cannon Park Shopping Centre. Coventry's high street stores are a bus ride away, as is Leamington Spa's range of boutique and high street shops.

The Students' Union building (possibly the largest in Europe) has recently been rebuilt in an £11 million redevelopment project. As well as a new entertainments venue, there are also more spaces for those who just want to go out and have a drink, including the new pub 'The Dirty Duck', which serves its own local ale, and 'The Terrace Bar', which looks out over the Piazza. Downstairs in the Union are branches of two major UK banks - Barclays and NatWest - and also a pharmacy and hair salon, should you need them!

If you are coming by rail or bus (e.g., National Express), you should come to Coventry. Buses on Travel Coventry service number 12 (which display the destination 'University of Warwick' or 'Leamington Spa') run from the city centre bus station (Pool Meadow), via Coventry Rail Station, to the University Central Campus, passing the Westwood campus en route.

Free car parking is available for all delegates staying on campus. You can request an access code for car parks 7, 8 and 15 (see campus map) from Rootes Social Building reception when you check in.

5 Accommodation

Accommodation is in en-suite rooms on campus, five minutes' walk from both the Math/Stats Department and Rootes Social Building, where breakfast and dinners will be served. All rooms have towels and toiletries. Kitchen facilities are available, although all meals are provided.

Internet access is available in all bedrooms. Details of how to log onto the system will be displayed in each bedroom, but delegates will need to bring their own Ethernet cable. These can be purchased from Rootes Reception should anyone not be in possession of one.

Rooms will be available for check-in after 15:00; until then, luggage can be left at Rootes Reception in Rootes Social Building. All bedrooms must be vacated by 9:30am on Thursday 15th.

6 Conference Details

On Monday 12th, delegates should arrive at the Math/Stats Building (Zeeman Building) between 13:00 and 15:00 to register and collect conference packs. These contain all the information needed during the conference. If you are presenting a poster, please submit it at registration. The conference will open with the plenary session at 15:30 in the Math/Stats Department.

On Tuesday 13th and Wednesday 14th, delegates will have the opportunity to present talks. Posters will be on display in The Street of the Math/Stats Building throughout the afternoon of Tuesday 13th, with the poster session commencing at 16:10. Presenters are encouraged to be near their posters during this session in order to answer questions from interested participants.

6.1 Meals

Breakfasts and evening meals (except on the evening of the conference dinner) will be served in Rootes Restaurant on campus. Lunches and morning/afternoon refreshments will be served in the Math/Stats Department, where the conference will be held. Please note that on the first day of the conference (Mon 12th) we will not be providing any lunch. However, there are plenty of eating facilities available on campus, and tea, coffee and cakes will be served before the plenary session.

Dinner on the Wednesday evening will be at Coventry Transport Museum. You will be expected to wear formal attire (no jeans or trainers please). Before the meal you will be given an opportunity to have a look around the museum, and afterwards there will be a Ceilidh, followed by a DJ. Coaches to the conference dinner will pick delegates up by the Students' Union at 18:15.

6.2 Sponsors' Wine Reception

The Sponsors' Reception will be held in The Street in the Maths/Stats Building on the Wednesday at 16:10, prior to the conference dinner. Please take this opportunity to talk with our sponsors and visit their displays to learn more about possible career opportunities.

7 Help, Information and Telephone Numbers

Department address:
Dept of Statistics, University of Warwick, Coventry CV4 7AL
Telephone: 024 7657 4812
Fax: 024 7652 4532

Emergency Numbers:
University Security: 024 7652 2083 (also for general emergencies)
Conference Organiser: 077 2998 4952 (Jennifer Rogers, resident on campus)

Transport:
Swift Taxis Coventry: 024 7676 7676
Trinity Street Taxis: 024 7663 1631
Bus information: 0871 200 2233
National Rail Enquiries: 08457 484950

7.1 Departmental Computing and Internet Access

Free wireless internet access will be available to all delegates in The Street area of the Maths/Stats Building. The username and password needed to access this service from your laptop will be given out after the Plenary Session.

8 Instructions

8.1 For Chairs

• Please arrive at the appropriate seminar room five minutes before the start of your session. Familiarise yourself with the visual equipment.

• Packs will be left in each seminar room. Do not remove the packs or any of their contents from the seminar room. If you think something might be missing from the pack, please contact one of the organisers.

• You should clearly introduce yourself and each speaker in turn.

• It is very important that we stick to the schedule. Therefore please start the session on time, use the time remaining cards, and make sure that questions are not allowed to delay the rest of the session.

• If a speaker fails to show, please advise the audience to attend a talk in an alternative seminar room. Do not move the next talk forward.

• After each talk, thank the speaker, encourage applause, and open the floor to questions (from students only). If no questions are forthcoming, ask one yourself.

• Use the 5 min and 1 min flash cards to assist the speaker in finishing on time.

8.2 For Speakers

• Each seminar room will contain a computer, data projector and white/black board.

• Arrive five minutes before the start of the session, introduce yourself to the chair and load your presentation onto the computer.

• Presentations must be PDF or PowerPoint (ppt or pptx) files. No other formats are acceptable.

• Talks are strictly fifteen minutes plus five minutes for questions. Anyone going over this time will be asked by the chair to stop.

• Your chair will let you know when you have five minutes and then one minute remaining for your presentation.

8.3 For Displaying a Poster

• The poster session will be held in The Street area of the Math/Stats Building at 16:10 on Tuesday 13th April.

• Please submit posters upon registration on Monday 12th April.

• Posters will be erected by conference organisers.

• During the poster session, it is advisable to be near your poster in order to answer questions from interested participants.

• Posters will also be displayed throughout Tuesday afternoon.

• Please ensure that your poster is removed by 17:30 on Tuesday.

• Posters should be no larger than A1 size.

8.4 Prizes

The three best talks and the best poster, as voted for by all delegates, will receive prizes in the form of book vouchers from our sponsors CUP and Wiley-Blackwell and, additionally, courtesy of the Royal Statistical Society:

The RSS will offer the best three presentations and the best poster from the RSC2010 conference the opportunity to present their work at the RSS2010 conference, which will be held from 13-17 September in Brighton. The three best presentations will participate in a special session at the conference, and the poster will be presented alongside the other posters at the event. The prize will be in the form of free registration at the conference for the four winners. (The registration fee includes many meals and social events but not transport or accommodation.)

Further details about the conference can be found at: www.rss.org.uk/rss2010

9 Plenary Session

9.1 Professor Jane L. Hutton (University of Warwick)
Opening Address

Jane L. Hutton is a Professor of Statistics in the Department of Statistics, University of Warwick. She works in medical statistics, with special interests in survival analysis, meta-analysis and non-random data. Accelerated failure time models are a particular focus of her research in survival analysis. She has major collaborations in cerebral palsy and epilepsy. Her work with Professor Peter Pharoah and Dr Allan Colver, on life expectancy in cerebral palsy, has had a substantial effect on the size of awards in medico-legal cases. This work is widely cited nationally and internationally. In epilepsy, she has contributed to many Cochrane reviews of anti-epileptic drugs. She is currently working on a research project with Dr Tony Marson, of Liverpool University Neurosciences Department. She has written extensively on ethics and the philosophy of statistics. She has contributed to Research Council ethics guidelines.

9.2 Professor Jim Q. Smith (University of Warwick)
Title: How to do Research Creatively

Abstract

Making the shift from taught student to researcher is challenging. We all develop the skill of delivering to our teachers what they want to see in exams. Now suddenly we must develop a completely distinct set of skills, where the point of our work is to produce something *different* from what other researchers do. How can this transition to becoming a creative researcher in Statistics or Probability be managed? In this short talk I will outline some techniques I have developed over the years, some of which I hope you might find useful.

Jim Q. Smith is a Professor of Statistics at Warwick University and has researched a wide range of topics, both theoretical and applied, but always Bayesian. He is currently Chair of RISCU, the consultancy arm of the Statistics Department, and has close research ties with various companies and government departments.

9.3 Dr. Jonathan Rougier (University of Bristol)
Title: Complex systems: Accounting for model limitations

Abstract

Many complex systems, notably environmental systems like climate, are highly structured, and numerical models, known as simulators, play an important role in prediction and control. It is crucial to account for limitations in simulators, since these can be substantial, and can vary substantially from one simulator to another. These limitations can be categorised in terms of input uncertainty, parametric uncertainty, and structural uncertainty. The talk explains this framework, and the particular challenge of accounting for simulator limitations in dynamical systems, with illustrations from climate science and natural hazards.

Jonty Rougier is an applied statistician working in the area of computer experiments, particularly for complex environmental systems like climate. He studied Economics and then Statistics at Durham, the latter as a postdoc working with Michael Goldstein and Allan Seheult. He is currently a Lecturer in Statistics in the Department of Mathematics at the University of Bristol.

10 List of Sponsors' Talks

On Wednesday 14th several of the conference sponsors will be giving presentations as part of the main conference programme, providing an opportunity to learn about their statistical work.

Session 6a, Room MS.01, Chair: Jennifer Rogers
14:30 Richard Emsley (International Biometric Society): The International Biometric Society: What can it offer to Postgraduate Students?
15:05 Phil Woodward (Pfizer): Bayesian Design & Analysis of Experiments
15:40 Robert Mastrodomenico (SmartOdds): An Introduction to Football Modelling at Smartodds

Session 6b, Room MS.04, Chair: Mouna Akacha
14:30 Wayne Jones (Shell): Making Decisions with Confidence - Statistics the Shell Way
15:05 Martin Layton (AHL, Man Group PLC): An Introduction to AHL

Session 6c, Room MS.05, Chair: Flávio B Gonçalves
15:05 Helen Thornewell (Royal Statistical Society): Support from the RSS and their Young Statisticians Section
15:40 Bill Fite (Lloyds Banking Group): Opportunities in Probability and Statistical Modelling at Lloyds Banking Group Decision Science

11 Talks Schedule

11.1 Monday 12th April

Session: Plenary. Chair: Jennifer Rogers. Room: MS.01, Maths/Stats Building

15:30 Hutton, Jane L.: Opening Address
15:45 Smith, Jim Q.: How to do Research Creatively
16:20 Rougier, Jonathan: Complex systems: Accounting for model limitations

11.2 Tuesday 13th April

Session 1a: Image Analysis. Chair: Bryony Hill. Room: MS.01

09:10 Doshi, Susan: Statistical image reconstruction for cone-beam computed tomography
09:35 Fallaize, Christopher: Matching Shapes of Different Sizes
10:00 Khatun, Mahmuda: Morphological Granulometry for Image Texture Analysis and Classification
10:25 Yan, Lei: Statistical Threshold of Magnetoencephalographic (MEG) Data
10:50 Llewelyn, Stephanie: Statistical Modelling of Fingerprints

Session 1b: Computational Statistics. Chair: Flávio B Gonçalves. Room: MS.04

09:10 Cainey, Joe: Performance of Pseudo-Marginal MCMC Algorithms
09:35 O'Hagan, Adrian: Computational Advances in Fitting Mixture Models via the EM Algorithm
10:00 Prangle, Dennis: Summary statistics for Approximate Bayesian Computation
10:25 Raychaudhuri, Clare: Investigating methods to approximate the expectation efficiently
10:50 Vrousai, Dina: Sampling from the posterior - MCMC, Importance resampling or Numerical integration?

Session 1c: Operational Research. Chair: Fiona Sammut. Room: MS.05

09:10 Anacleto-Junior, Osvaldo: Bayesian forecasting models for traffic management systems
09:35 Aslett, Louis JM: Modelling and Inference for Networks with Repairable Redundant Subsystems
10:00 May, Benedict: Multi-Armed Bandit with Regressor Problems
10:25 Moffatt, Joanne: Analysing strategy in the sprint race in track cycling using logistic regression
10:50 Hashim, Siti R.M.: Interpretation Problems in Multivariate Control Chart

Session 1d: Statistical Inference. Chair: Stephen Burgess. Room: A1.01

09:10 Jamalzadeh, Amin: Developing Effect Sizes for Non-Normal Data
09:35 Jesus, Joao: Inference without likelihood
10:00 McElduff, Fiona: Maximum likelihood estimation of discrete distribution parameters using R
10:25 Ogundimu, Emmanuel: Investigating the impact of missing data on Cronbach's alpha estimates and Confidence Intervals
10:50 Zwiernik, Piotr: Posets, Möbius functions and tree-cumulants

Session 2a: Medical Statistics I. Chair: Mouna Akacha. Room: MS.01

11:30 Ewings, Sean: Modelling Blood Glucose Concentration for People with Type 1 Diabetes
11:55 Smith, Joanna: Methods for the Analysis of Asymmetry
12:20 Strawbridge, Alexander: Measurement error correction of the association between fasting blood glucose and coronary heart disease - a structural fractional polynomial approach
12:45 Verykouki, Eleni: Modelling the effects of antibiotics on carriage levels of MRSA
13:10 Roloff, Verena: Planning future studies based on the conditional power of a random-effects meta-analysis

Session 2b: Financial. Chair: Murray Pollock. Room: MS.04

11:30 Lapinski, Tomasz: Modelling the rank system with Gibbs, Bose Einstein or Zipf Law. Application in Mathematical Finance
11:55 Michelbrink, Daniel: A Martingale Approach to Active Portfolio Selection
12:20 Pham, Duy: Measuring vega risks of Bermudan swaptions under the Markov-Functional model
12:45 Shahtahmassebi, Golnaz: Mathematical and Statistical Models for Predicting Financial Behaviour
13:10 Wang, Chun: An optimal stopping problem of finite horizon with regime switching

Session 2c: Elicitation and Epidemiology. Chair: Michelle Stanton. Room: MS.05

11:30 Elfadaly, Fadlalla G.: On Eliciting Expert Opinion in Generalized Linear Models
11:55 Noosha, Mitra: Discordancy between the prior and data using conjugate priors
12:20 Ford, Ashley P.: Indian Buffet Epidemics. A Bayesian Approach to Modelling Heterogeneity
12:45 Worby, Colin: A hidden Markov model to analyse MRSA transmission in hospital wards
13:10 Walker, Neil: Estimating the size of a badger population using live capture and post-mortem data

Session 2d: Multivariate Statistics. Chair: Nathan Huntley. Room: A1.01

11:30 Fayomi, Aisha: Cauchy Principal Components Analysis
11:55 Sweeney, James: Approximate Joint Statistical Inference for Large Spatial Datasets
12:20 Tsagris, Michael: Multivariate outliers, the forward search and the Cronbach's Reliability Coefficient
12:45 Mohammad, Rofizah: Bayesian Analysis in Multivariate Data
13:10 Sammut, Fiona: Some Aspects of Compositional Data

Session 3a: Genetics. Chair: Dennis Prangle. Room: MS.01

14:30 Evangelou, Marina: Incorporating available biological knowledge to explore genome-wide association data
14:55 Fowler, Anna: Informed Bayesian Clustering of Gene Expression Levels
15:20 Burgess, Stephen: An application of Bayesian techniques for Mendelian randomization to assess causality in a large meta-analysis
15:45 Cairns, Jonathan: BayesPeak: A Hidden Markov Model for analysing ChIP-seq experiments

Session 3b: Medical Statistics II. Chair: Helen Thornewell. Room: MS.04

14:30 Hee, Siew Wan: Designing a Series of Phase II Trials
14:55 Magirr, Dominic: Response-Adaptive Block Randomization in Binary Endpoint Clinical Trials
15:20 Ren, Shijie: Bayesian clinical trial designs for survival outcomes
15:45 Yeung, Wai Yin: The power of the biased coin design for clinical trials

Session 3c: Dimension Reduction. Chair: James Sweeney. Room: MS.05

14:30 Chand, Sohail: Oracle properties of Lasso-type methods in Regression problems
14:55 Khan, Md. Hasinur Rahaman: Penalized Weighted Least Squares Variable Selection Method for AFT Models with High Dimensional Covariates
15:20 Serradilla, Javier: Latent Variable Models for Process Monitoring
15:45 Yusoff, Nur Fatihah Mat: A study of item selection using principal component analysis and correspondence analysis

Session 3d: Environmental. Chair: Andrew Smith. Room: A1.01

14:30 Jones, Emma M.: Using a Bayesian Hierarchical Model for Tree-Ring Dating
14:55 Norris, Beth: Not another species richness estimator?!
15:20 Oxlade, Rachel: Uncertainty analysis for multiple ecosystem models using Bayesian emulators
15:45 Powell, Helen: Estimating biologically plausible relationships between air pollution and health

11.3 Wednesday 14th April

Session 4a: Medical Statistics III. Chair: Fiona McElduff. Room: MS.01

09:10 Iglesias, Alberto Alvarez: An application of survival trees to the study of cardiovascular disease
09:35 Dooley, Cara: Analysis of an Observational Study in Colorectal Cancer Patients
10:00 O'Keeffe, Aidan: Causal Inference in Longitudinal Data Analysis: A Case Study in the Epidemiology of Psoriatic Arthritis
10:25 Thomas, Maria Roopa: Design and analysis of dose escalation trials
10:50 Nicholls, Stuart: Modelling parental decisions for newborn bloodspot screening

Session 4b: Point Processes and Spatio-Temporal Statistics. Chair: Chris Fallaize. Room: MS.04

09:10 Marek, Patrice: Poisson Process Parameter Estimation from Data in Bounded Domain
09:35 Bakar, Khandoker Shuvo: A Comparison of Bayesian Space-Time Models for Ozone Concentration Levels
10:00 Proctor, Iain: Multi-level models for ecological response applications
10:25 Stanton, Michelle A.: Spatio-temporal modelling of Meningitis Incidence in sub-Saharan Africa
10:50 Smith, Andrew: Denoising UK House Prices

Session 4c: General. Chair: Michael Tsagris. Room: MS.05

09:10 Gollini, Isabella: Mixture of Latent Trait Analyzers
09:35 Klapper, Jennifer: A wavelet based approach to HPLC data analysis
10:00 Bhattacharya, Sakyajit: Delete-Replace Identity For A Set Of Independent Observations
10:25 Sanderson, Ria: Modelling Main Contractor Status for the New Orders Survey
10:50 Wilson, Kevin: Bayes linear kinematics in the analysis of failure rates

Session 4d: Graphical Models and Extreme Value Theory. Chair: Guy Freeman. Room: A1.01

09:10 Wadsworth, Jenny: Uncertainty in Choice of Measurement Scale for Extreme Value Analysis
09:35 Youngman, Ben: Modelling extremal phenomena using different data sources
10:00 Byrne, Simon: Parametrisation of graphical models
10:25 Caimo, Alberto: Bayesian inference for Social Network Models

Session 5a: Experimental Design and Population Genetics. Chair: Andrew Simpkin. Room: MS.01

11:30 Khadim, Mudakkar M.: Canonical Analysis of Multi-Stratum Response Surface Designs & Standard Errors of Eigenvalues
11:55 Martin, Kieran: D-optimal design of experiments for a dynamic model with correlated observations
12:20 Thornewell, Helen: Vulnerability: A 2nd Criterion to Distinguish between Equally-Optimal BIBDs
12:45 Kershaw, Emma: Surfing In One Dimension
13:10 Mair, Colette: Dimension Reduction for Human Genomic SNP Variation

Session 5b: Censoring in Survival Data and Non-Parametric Statistics. Chair: Jennifer Rogers. Room: MS.04

11:30 Elsayed, Hisham Abdel Hamid: Parametric Survival Model with Time-dependent Covariates for Right Censored Data
11:55 Staplin, Natalie: Assessing the Effect of Informative Censoring in Piecewise Parametric Survival Models
12:20 Thom, Howard: Dealing with Censoring in Quality Adjusted Survival Analysis and Cost Effectiveness Analysis
12:45 Aboalkhair, Ahmad M: Nonparametric Predictive Inference for System Reliability
13:10 Toupal, Tomas: Nonparametric Estimation of Reliability of Two Random Variables Using Kernel Estimation of Density

Session 5c: Time Series and Diffusions. Chair: Alexander Strawbridge. Room: MS.05

11:30 Bhattacharya, Arnab: Sequential Integrated Nested Laplace Approximation
11:55 Killick, Rebecca: Finding changepoints in a Gulf of Mexico hurricane hindcast dataset
12:20 Stevens, Kara: Prediction Intervals of the Local Spectrum Estimate
12:45 Suda, David: Discrete- and Continuous-time Approaches to Importance Sampling on Diffusions
13:10 Villalobos, Isadora Antoniano: Bayesian inference for diffusions based on exact simulation

Session 5d: Probability. Chair: Duy Pham. Room: A1.01

11:30 Barranon, Antonio A. Ortiz: A New Bivariate Generalized Pareto Model
11:55 Huntley, Nathan: Backward Induction and Subtree Perfectness
12:20 Lee, Rui Xin: On the Convergence of Continuously Monitored Barrier Options Under Markov Processes
12:45 Wagnerova, Eva: Distortion of Probability Models

12 Talk Abstracts by Session

12.1 Tuesday 13th April

12.1.1 Session 1a: Image Analysis

Session Room: MS.01, Chair: Bryony Hill

Start time 09:10

STATISTICAL IMAGE RECONSTRUCTION FOR CONE-BEAM COMPUTED TOMOGRAPHY
Susan Doshi and Chris Jennison
University of Bath, UK
Keywords: Bayesian image analysis, Cone-beam CT, Image-guided radiotherapy

In image-guided radiotherapy, the accuracy of patient positioning is determined using images of internal anatomy in addition to the traditional external markers. This gives confidence that radiation prescribed for the treatment of cancer will be delivered to the desired volume. Treatment is usually delivered five days a week for several weeks, with imaging used on many of these occasions. X-ray cone-beam computed tomography (CBCT) is increasingly being used for this purpose. An X-ray source moves in a circular trajectory around the patient and planar projection images are acquired at increments of 1°. The data in these images are used to reconstruct a 3D representation of the patient.

Conventional Fourier-based reconstruction techniques rely on relatively noise-free projection images, with the entire patient diameter being included in each projection, and with a complete set of projections over more than 180°. Satisfying each of these requirements can be difficult. In addition, metallic fiducial markers may be implanted to help track the movement of soft tissues. These improve visualisation on the projection images, but may cause artefacts in the 3D reconstruction.

Statistical reconstruction techniques can cope naturally with these obstacles. In this presentation, we will introduce the Bayesian approach to image reconstruction. Modelling may be carried out in a number of spaces: the 2D projection image, the 3D patient space, or the 3D sinogram space (formed by 'stacking' the 2D projections along a third axis indexed by the projection angle). We can use a normal likelihood, or include aspects of the physical system in a more realistic model. Inference on the structure of the patient is based on MCMC sampling from the posterior distribution, and choices of prior and likelihood are made by considering the trade-off between accurate inference and the time taken to perform this sampling. The methods will be demonstrated using data acquired on clinical systems.

Start time 09:35

MATCHING SHAPES OF DIFFERENT SIZES
Christopher Fallaize
University of , UK
Keywords: Bayesian alignment, MCMC, Scale factor, Statistical shape analysis, Unlabelled landmarks

The shape of an object is the information invariant under the full similarity transformations of rotation, translation and rescaling. In statistical shape analysis, we are concerned with analysing differences in shape between individual objects or populations. To this end, we first seek some optimal registration which removes the effects of orientation, location and size, so that any remaining differences are due to genuine differences in shape.

Objects are often reduced to k points, known as landmarks, in m dimensions, and thus can be represented as k × m point configurations. In labelled shape analysis the correspondence between landmarks on different configurations is known. Unlabelled shape analysis deals with the more complex situation where the correspondence between landmarks is unknown. Green and Mardia (Biometrika, 2006, pp. 235-254) developed a Bayesian methodology for the pairwise alignment of two unlabelled configurations using the rigid body transformations of rotation and translation. We present the extension to full similarity shape by introducing a scaling factor to the model. Taking one of the configurations as a fixed reference, the aim is to estimate the transformation of the other configuration onto the reference whilst simultaneously identifying the matching between landmarks. Particular challenges include efficient simulation from a non-standard distribution for the scale factor and the desire for a symmetrical setup to ensure that equal inferences are drawn regardless of which configuration is taken as the reference. Possible applications include automated image analysis (where objects nearer or further away have different sizes) and biological morphometrics (where objects at different growth stages may be of different sizes). We shall illustrate our methodology with examples using both real and artificial data sets.
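As an aside on the registration step, the sketch below performs ordinary Procrustes alignment with a scale factor for the simpler labelled case, where the landmark correspondence is known. It is a minimal illustration of removing orientation, location and size, not the Bayesian unlabelled method of the talk, and the toy configurations are invented for the example.

```r
# Ordinary Procrustes alignment of Y onto reference X (k x m landmark
# matrices) with rotation, translation and a scale factor; labelled case only.
procrustes <- function(X, Y) {
  Xc <- scale(X, scale = FALSE)            # centre both configurations
  Yc <- scale(Y, scale = FALSE)
  s  <- svd(t(Xc) %*% Yc)
  R  <- s$v %*% t(s$u)                     # optimal rotation (may include a
                                           # reflection; shape analysis would
                                           # constrain det(R) = 1)
  beta <- sum(s$d) / sum(Yc^2)             # optimal scale factor
  fitted <- beta * Yc %*% R +
    matrix(colMeans(X), nrow(Y), ncol(X), byrow = TRUE)
  list(rotation = R, scale = beta, fitted = fitted)
}

# Toy usage: Y is a rotated, rescaled, noisy copy of X
set.seed(1)
X <- matrix(rnorm(20), ncol = 2)
theta <- pi / 6
G <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, 2)
Y <- 2 * X %*% G + matrix(rnorm(20, sd = 0.05), ncol = 2)
fit <- procrustes(X, Y)                    # fit$scale should be close to 0.5
```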

Start time 10:00

MORPHOLOGICAL GRANULOMETRY FOR IMAGE TEXTURE ANALYSIS AND CLASSIFICATION
Mahmuda Khatun (1), Dr Alison Gray (1) and Prof. Steve Marshall (2)
(1) Department of Mathematics and Statistics, University of Strathclyde, Glasgow
(2) Department of Electronic and Electrical Engineering, University of Strathclyde, Glasgow
Keywords: Image analysis, Morphology, Opening, Granulometry, Pattern spectrum, Structuring element

An important area of digital image analysis is the analysis of texture images. A statistical approach to image texture classification based on granulometric moments is described here. Mathematical morphology provides a set of non-linear techniques to extract shape-based information from an image, using image probes in the form of 'structuring elements'. Opening granulometry is based on a sequence of morphological openings using scaled structuring elements. As the scale increases, more image areas are removed. Pattern spectra are formed by normalising the removed area by the total image area. Since the pattern spectrum is a probability density function, its moments can be calculated. The pattern spectrum moments can be used as texture features for classification.

This work concerns sequences of texture images which evolve in time, and the classification of a new image to a point in time. Statistical models are being built to relate granulometric moments to evolution time directly, using training images for which both the evolution parameters and the time state are known. Each model can be used for back-prediction of the evolution time of a new image from its observed granulometric moments. Better predictions are expected by combining different models.
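As a rough illustration of the pipeline above, the following sketch computes a pattern spectrum and a first granulometric moment for a toy binary texture. The hand-rolled morphology and the growing square structuring elements are simplifying assumptions for self-containment, not the authors' implementation.

```r
# Opening granulometry on a binary image: erosion/dilation are min/max
# filters over a (2k+1) x (2k+1) window, and an opening is erosion then
# dilation with the same structuring element.
erode <- function(img, k) {
  n <- nrow(img); m <- ncol(img); out <- img
  for (i in 1:n) for (j in 1:m)
    out[i, j] <- min(img[max(1, i - k):min(n, i + k),
                         max(1, j - k):min(m, j + k)])
  out
}
dilate  <- function(img, k) 1 - erode(1 - img, k)
opening <- function(img, k) dilate(erode(img, k), k)

set.seed(1)
img <- matrix(rbinom(64 * 64, 1, 0.4), 64, 64)             # toy binary texture
scales <- 0:5
areas <- sapply(scales, function(k) sum(opening(img, k)))  # area surviving each scale
spectrum <- -diff(areas) / sum(img)         # normalised removed area: the pattern spectrum
first_moment <- sum(scales[-1] * spectrum)  # a granulometric moment, usable as a texture feature
```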

Start time 10:25

STATISTICAL THRESHOLD OF MAGNETOENCEPHALOGRAPHIC (MEG) DATA
Lei Yan, C.J. Brignell and C. D. Litton
School of Mathematical Sciences, University of Nottingham, UK
Keywords: FWER, Random field, Permutation method

In this presentation, we show how Magnetoencephalographic (MEG) data can be analysed statistically using parametric (standard and random field) and nonparametric methods (permutation, bootstrap). Compared to parametric statistical tests, nonparametric statistical tests give the user complete freedom in the choice of the test statistic by means of which the experimental conditions are compared. We propose statistical thresholds that control the familywise error rate (FWER) across time or across both space and time. These approaches use the distribution of test statistics under the null hypothesis to find FWER thresholds. We show that the original permutation tests can only control the FWER when the experimental conditions have the same variance-covariance structure, which is difficult to achieve in practice. Unlike previous permutation-based tests in neuroimaging, we address this problem with a permutation-based test that does not assume that different experimental conditions have the same variance-covariance structure.
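A minimal sketch of the max-statistic permutation approach to FWER control is given below. The two-sample t statistic, the Gaussian toy data and all dimensions are illustrative assumptions, not the authors' MEG analysis.

```r
# Permutation null distribution of the maximum |t| over sensors and time:
# thresholding at its 95th percentile controls the FWER at 5%.
set.seed(1)
n <- 20; n_sensors <- 30; n_times <- 50
a <- array(rnorm(n * n_sensors * n_times), c(n, n_sensors, n_times))  # condition A
b <- array(rnorm(n * n_sensors * n_times), c(n, n_sensors, n_times))  # condition B

tmap <- function(a, b) {                   # pointwise two-sample t statistics
  (apply(a, 2:3, mean) - apply(b, 2:3, mean)) /
    sqrt(apply(a, 2:3, var) / dim(a)[1] + apply(b, 2:3, var) / dim(b)[1])
}

both <- array(0, c(2 * n, n_sensors, n_times))
both[1:n, , ] <- a; both[(n + 1):(2 * n), , ] <- b

max_null <- replicate(500, {               # relabel trials, keep the maximum |t|
  idx <- sample(2 * n)
  max(abs(tmap(both[idx[1:n], , ], both[idx[-(1:n)], , ])))
})
threshold <- quantile(max_null, 0.95)      # FWER threshold across space and time
significant <- abs(tmap(a, b)) > threshold
```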

Start time 10:50

STATISTICAL MODELLING OF FINGERPRINTS
Stephanie Llewelyn
University of Sheffield, UK
Keywords: Fingerprints, Identification, Modelling

It is believed that fingerprints are determined in embryonic development. Unlike other personal characteristics, the fingerprint appears to be the result of a random process. For example, fingerprints of identical twins (whose DNA is identical) are distinct, and extensive studies have found little evidence of a genetic relationship in terms of types of fingerprint, certainly at the small scale.

At a larger scale the pattern of ridges on fingerprints can be categorised as belonging to one of five basic forms: loops (left and right), whorls, arches and tented arches. The population frequencies of these types show little variation with ethnicity, and a list of the types occurring on the ten digits can be used as an initial basis for identification of individuals. However, such a system would not uniquely identify an individual, although the frequency of certain combinations could be extremely small. At a smaller scale various minutiae or singularities can be observed in a fingerprint. These include ridge endings and bifurcations, amongst others. Typical fingerprints have several hundred of these, as well as two key points (with the exception of a simple arch) referred to as the core and delta, which are focal points of the overall pattern of ridges. Modern identification systems are based upon endings and bifurcations, not least because they are the easiest to determine automatically from image analysis. The configuration of these minutiae is unique to the individual.

The presentation will outline the history of the use of fingerprints, illustrate some of the features used for identification, and discuss ways in which statistical models could be developed to generate realistic fingerprints using data obtained from fingermarks.

12.1.2 Session 1b: Computational Statistics

Session Room: MS.04, Chair: Flávio B Gonçalves

Start time 09:10

PERFORMANCE OF PSEUDO-MARGINAL MCMC ALGORITHMS
Joe Cainey
Statistics Group, University of Bristol
Keywords: MCMC, Latent Variable, Pseudo-Marginal, Metropolis-Hastings, GIMH, Autocorrelation

Given the problem of sampling from a distribution π(θ), the Metropolis-Hastings (MH) algorithm is often used to generate a Markov chain with invariant distribution π(θ). In cases where π(θ) is intractable, or too complex to evaluate, a different approach must be taken. It is often possible instead to construct a Markov chain with invariant distribution π(θ, z), where z can be missing data or latent variables which make π(θ, z) easier to evaluate; this is known as data augmentation. A pseudo-marginal algorithm attempts to combine the precision of the marginal sampler with the computational efficiency of data augmentation techniques.

Grouped Independence Metropolis-Hastings (GIMH) is a pseudo-marginal algorithm which uses importance sampling to estimate π(θ). When running any form of MCMC sampler, the performance of the resulting chain is of great importance. We show that, as the number of importance sampling particles approaches infinity, the performance of the chain produced by the GIMH algorithm converges to that of the marginal algorithm.
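The sketch below shows the GIMH mechanics on an invented toy latent-variable model: the likelihood is replaced by an importance-sampling estimate based on N particles, and the estimate for the current state is recycled until a proposal is accepted. All model choices here are assumptions for illustration only.

```r
# GIMH sketch: y ~ N(z, 1) with latent z ~ N(theta, 1), prior theta ~ N(0, 10^2)
set.seed(1)
y <- 1.3                                   # a single observation

loglik_hat <- function(theta, N) {
  z <- rnorm(N, mean = theta, sd = 1)      # importance samples from p(z | theta)
  log(mean(dnorm(y, mean = z, sd = 1)))    # log of an unbiased likelihood estimate
}

gimh <- function(n_iter, N, sd_prop = 1) {
  theta <- 0
  ll <- loglik_hat(theta, N)               # current estimate is recycled, not refreshed
  out <- numeric(n_iter)
  for (i in seq_len(n_iter)) {
    theta_new <- rnorm(1, theta, sd_prop)
    ll_new <- loglik_hat(theta_new, N)
    log_alpha <- ll_new - ll +
      dnorm(theta_new, 0, 10, log = TRUE) - dnorm(theta, 0, 10, log = TRUE)
    if (log(runif(1)) < log_alpha) { theta <- theta_new; ll <- ll_new }
    out[i] <- theta
  }
  out
}

chain <- gimh(n_iter = 5000, N = 20)
```

Increasing N trades computation for mixing: in the limit the chain behaves like the marginal sampler, which is the convergence result described above.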

Start time 09:35

COMPUTATIONAL ADVANCES IN FITTING MIXTURE MODELS VIA THE EM ALGORITHM
Adrian O'Hagan
University College Dublin, Ireland
Keywords: Expectation-Maximisation Algorithm, Starting values, Multimodal likelihood functions, Convergence rate, Multicycle ECM Algorithm

The Expectation-Maximisation (EM) Algorithm is a popular tool for deriving maximum likelihood estimates in a large family of statistical models. Chief among its attributes is the property that the algorithm always drives the likelihood uphill. However, it can be difficult to assess convergence and, in the case of multimodal likelihood functions, the algorithm may become trapped at a local maximum.

We introduce a variety of schemes to promote algorithmic efficiency. A range of "burn-in" functions are described. These can produce initialising values for the EM algorithm of a higher quality than those arising from simply employing random starts. The use of likelihood monitoring and multicycle features allows maximization steps to be ordered and targeted on parameter subsets. Outcomes are compared with those from the model-based clustering package mclust in R, where a hierarchical clustering initialisation is performed. The overall goal is to increase convergence rates to the global likelihood maximum and/or to attain the global maximum in a higher percentage of cases.
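For orientation, here is the naive multiple-random-starts baseline that such initialisation schemes aim to improve upon, sketched for a two-component Gaussian mixture. The data and the number of restarts are invented for the example; real comparisons would use mclust, as noted above.

```r
# EM for a two-component Gaussian mixture from one random start, repeated,
# keeping the run with the highest log-likelihood.
set.seed(1)
y <- c(rnorm(150, -2), rnorm(100, 3))      # toy data

em_once <- function(y, iters = 200) {
  p <- runif(1, 0.3, 0.7)                  # random starting values
  mu <- rnorm(2, mean(y), sd(y)); s <- rep(sd(y), 2)
  for (i in seq_len(iters)) {
    d1 <- p * dnorm(y, mu[1], s[1])        # E-step: responsibilities
    d2 <- (1 - p) * dnorm(y, mu[2], s[2])
    r <- d1 / (d1 + d2)
    p  <- mean(r)                          # M-step: weighted updates
    mu <- c(weighted.mean(y, r), weighted.mean(y, 1 - r))
    s  <- c(sqrt(weighted.mean((y - mu[1])^2, r)),
            sqrt(weighted.mean((y - mu[2])^2, 1 - r)))
  }
  ll <- sum(log(p * dnorm(y, mu[1], s[1]) + (1 - p) * dnorm(y, mu[2], s[2])))
  list(loglik = ll, p = p, mu = mu, s = s)
}

runs <- replicate(10, em_once(y), simplify = FALSE)
best <- runs[[which.max(sapply(runs, `[[`, "loglik"))]]   # best of the restarts
```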

Start time 10:00

SUMMARY STATISTICS FOR APPROXIMATE BAYESIAN COMPUTATION
Dennis Prangle
Lancaster University
Keywords: ABC, MCMC, Bayesian statistics

Approximate Bayesian Computation (ABC) methods are a family of algorithms for 'likelihood-free' Bayesian inference. The domain of use is models where numerical evaluation of the likelihood is impossible or impractical, but from which data can easily be simulated. For example, over the last decade ABC has allowed investigation of realistic but previously intractable models in population genetics. Other applications include infectious disease epidemiology and missing data models.

ABC operates by simulating data X_sim from the model of interest for many parameter values θ, and constructing an approximation to the posterior from those θ values for which the associated X_sim closely matches the observations X_obs. Algorithms have been proposed which implement this idea within the frameworks of rejection and importance sampling, Markov chain Monte Carlo and Sequential Monte Carlo.

A key insight in past research is that, to achieve practical acceptance rates, 'closeness of match' should be judged by some norm ||S(X_sim) − S(X_obs)||, where S(.) is a low-dimensional summary statistic of a data set. However, the problem of how to choose S well is an open question in the literature. This talk uses visual examples to introduce the main ideas of ABC and to describe a novel methodology for constructing efficient summary statistics. Theoretical support for the method is also briefly outlined.
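The rejection-sampling form of ABC described above fits in a few lines. In this sketch the model, prior, summary statistic and tolerance are all toy assumptions, chosen so that S works well, whereas choosing S is exactly the hard problem in realistic applications.

```r
# ABC rejection: keep prior draws whose simulated summary lands within eps
# of the observed summary. Toy model: y ~ N(mu, 1), prior mu ~ Unif(-10, 10).
set.seed(1)
y_obs <- rnorm(50, mean = 2)               # observed data
S <- function(y) mean(y)                   # low-dimensional summary statistic

n_sim <- 1e5; eps <- 0.05
mu <- runif(n_sim, -10, 10)                # parameter draws from the prior
dist <- vapply(mu,
               function(m) abs(S(rnorm(50, mean = m)) - S(y_obs)),
               numeric(1))
posterior_sample <- mu[dist < eps]         # kept draws approximate the posterior
length(posterior_sample)                   # acceptance count at this tolerance
```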

Start time 10:25

INVESTIGATING METHODS TO APPROXIMATE THE EXPECTATION EFFICIENTLY
Clare Raychaudhuri
University of Bristol, UK
Keywords: variance reduction, Monte-Carlo methods

Suppose we wish to estimate the expectation of a function g(x) with respect to the standard Gaussian distribution, i.e. the Gaussian distribution with mean 0 and variance the identity matrix. One method is the basic Monte Carlo estimator (1/n) Σ_i g(x_i). However, basic Monte Carlo methods may require a large number of function evaluations for the estimate to converge. Luckily, it is often possible to speed up this convergence using control variates. In order to use a control variate, there must exist a function α(x) whose expectation is known, E{α(x)} = c, and which has a strong correlation with g(x). The new estimator μ̂ for E{g(x)} is

μ̂ = (1/n) Σ_{i=1}^{n} { g(x_i) + [c − α(x_i)] B }.

The variance of this estimator is minimised when B = B*, where

B* = Var{α(X)}^{-1} Cov{α(X), g(X)}.

However, B* is often not known, so it has to be estimated using linear regression. In the case where α(X) = X, and so c = 0, this problem is equivalent to estimating the intercept of a linear regression of (1, X) on Y. Unfortunately this gives a biased estimator of E{g(x)}, since the same data points are used both to estimate B and to compute μ̂. Therefore a method such as the jack-knife should be applied to reduce the estimator bias and provide an estimate of the variance of the estimator. While using linear regression is appropriate when n ≫ q + p (where p is the dimension of y and q is the dimension of x), it is not appropriate if the sample size n is small. In this case dimension reduction techniques such as principal component analysis or partial least squares can be considered.
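A minimal numerical illustration of the estimator above, under the assumption g(x) = exp(x) with the control variate α(X) = X (so c = 0, and the true value of the expectation is exp(1/2)):

```r
# Control-variate Monte Carlo versus the basic estimator.
set.seed(1)
n <- 1e4
x <- rnorm(n)
g <- exp(x)
B_hat <- cov(x, g) / var(x)              # regression estimate of B*
mu_plain <- mean(g)                      # basic Monte Carlo estimate
mu_cv    <- mean(g + (0 - x) * B_hat)    # control-variate estimate
c(mu_plain, mu_cv, exp(0.5))
# Reusing the same draws for B_hat and mu_cv introduces the small bias noted
# above, which the jack-knife can be used to reduce.
```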

Start time 10:50

SAMPLING FROM THE POSTERIOR - MCMC, IMPORTANCE RESAMPLING OR NUMERICAL INTEGRATION?
Dina Vrousai and John Haslett
Trinity College Dublin, Ireland
Keywords: Numerical Integration, MCMC

Many methods and algorithms have been developed to sample from the posterior distribution. Importance resampling (IR) and particularly Markov chain Monte Carlo (MCMC) methods are widely used for this purpose. Sampling from the posterior using these methods does not require knowledge of the normalizing constant. An alternative is to compute the normalizing constant and then to sample from the posterior directly; this can be very computationally demanding, especially in high-dimensional problems. We use a recently released R package which implements multidimensional integration algorithms for Riemann integrals on the unit hypercube. The aim is to compare the special characteristics of these three methods (IR, MCMC and numerical integration) using an application to blood lactate data. We use kriging with Gaussian processes to model these data, and then compare the posterior distributions for our model obtained using the three different methods.

12.1.3 Session 1c: Operational Research
Session Room: MS.05
Chair: Fiona Sammut

Start time 09:10

BAYESIAN FORECASTING MODELS FOR TRAFFIC MANAGEMENT SYSTEMS
Osvaldo Anacleto-Junior and Dr. Catriona Queen
The Open University, Department of Mathematics and Statistics

Many roads have real-time traffic flow data available which can be used as part of a traffic management system. In a traffic management system, traffic flows are monitored over time with the aim of reducing congestion by taking actions, such as imposing variable speed limits or diverting traffic onto alternative routes, when problems arise. Reliable short-term forecasting models of traffic flows are crucial for monitoring traffic flows and, as such, are crucial to the ultimate success of any traffic management system. The model used here for forecasting traffic flows uses a directed acyclic graph (DAG) in which the nodes represent the time series of traffic flows at the various data collection sites, and the links between nodes represent the conditional independence and causal structure between flows at different sites. The DAG breaks the multivariate model into simpler univariate components, each one being a dynamic linear model. This makes the model computationally simple, no matter how complex the traffic network is, and allows the forecasting model to work in real-time, as required by any traffic management system. This talk will report current research in the development of this class of model with particular reference to a busy motorway junction in the UK.
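
For readers unfamiliar with such univariate components, the following R sketch shows the Kalman filtering recursion for an illustrative local-level dynamic linear model with assumed known variances (this is background, not the authors' traffic model):

    kalman_local_level <- function(y, V = 1, W = 0.1, m0 = 0, C0 = 100) {
      n <- length(y)
      m <- numeric(n); C <- numeric(n)       # filtered mean and variance
      f <- numeric(n); Q <- numeric(n)       # one-step forecast mean and variance
      m_prev <- m0; C_prev <- C0
      for (t in seq_len(n)) {
        R    <- C_prev + W                   # prior variance at time t
        f[t] <- m_prev                       # one-step-ahead forecast
        Q[t] <- R + V                        # forecast variance
        A    <- R / Q[t]                     # adaptive coefficient (Kalman gain)
        m[t] <- m_prev + A * (y[t] - f[t])   # posterior mean update
        C[t] <- R - A^2 * Q[t]               # posterior variance update
        m_prev <- m[t]; C_prev <- C[t]
      }
      list(filtered = m, forecast = f, forecast_var = Q)
    }

    set.seed(4)
    flow <- cumsum(rnorm(100, 0, 0.3)) + rnorm(100)  # synthetic "traffic flow"
    fit  <- kalman_local_level(flow)
    head(fit$forecast)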

Start time 09:35

MODELLING AND INFERENCE FOR NETWORKS WITH REPAIRABLE REDUNDANT SUBSYSTEMS
Louis JM Aslett and Simon P Wilson
Trinity College Dublin, Ireland
Keywords: Bayesian inference, reliability theory, phase-type distributions, telecommunications, MCMC

We consider the problem of modelling the reliability of a network of subsystems where each subsystem has redundancy and is repairable. The motivation for this work is large-scale telecommunications networks. The time to failure of the subsystem hardware is modelled by an appropriate Markov process and hence follows a phase-type distribution. The network structure defines a failure rule in terms of the states of the subsystems, allowing computation by Monte Carlo simulation of the time to failure distribution for the network. When data on the reliability of the subsystems are available, this can be incorporated via modifications to an existing Bayesian inference approach to update the prediction of network reliability.
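
A minimal R sketch of the phase-type computation (the subsystem, its rates and the set-up are assumptions for illustration, not the authors' network):

    # Simulate the time to absorption of a continuous-time Markov chain,
    # i.e. a draw from a phase-type distribution. S is the sub-generator
    # over transient states; absorption rates are -rowSums(S).
    rphasetype <- function(n, S, alpha) {
      k <- nrow(S)
      exit <- -rowSums(S)                    # rates into the absorbing (failed) state
      one_draw <- function() {
        state <- sample(k, 1, prob = alpha)  # initial transient state
        t_abs <- 0
        repeat {
          rates <- c(pmax(S[state, ], 0), exit[state])
          total <- sum(rates)
          t_abs <- t_abs + rexp(1, total)    # exponential holding time
          nxt <- sample(k + 1, 1, prob = rates / total)
          if (nxt == k + 1) break            # absorbed: subsystem has failed
          state <- nxt
        }
        t_abs
      }
      replicate(n, one_draw())
    }

    # Two-unit redundant subsystem with repair; states = (2 working, 1 working),
    # unit failure rate 1, repair rate 1 (all rates chosen for illustration)
    S <- matrix(c(-2,  2,
                   1, -2), 2, 2, byrow = TRUE)
    times <- rphasetype(1e4, S, alpha = c(1, 0))
    mean(times)   # compare with the analytic mean alpha %*% solve(-S) %*% rep(1, 2) = 2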

Start time 10:00

MULTI-ARMED BANDIT WITH REGRESSOR PROBLEMS
Benedict May and Dr. David Leslie
University of Bristol, UK
Keywords: Bandit Problem, Reinforcement Learning, Linear Regression, Nonparametric Regression

The multi-armed bandit problem is a simple example of the exploitation/exploration trade-off generally inherent in reinforcement learning problems. An agent is tasked with learning from experience how to sequentially make decisions in order to maximise average reward. In the extension considered, the agent is presented with a regressor before making each decision. The agent has to balance the tendency to explore apparently sub-optimal actions (in order to improve regression function estimates) against the tendency to exploit the current estimates (in order to maximise reward). Study of several past approaches to similar problems has indicated particular desirable properties for the policy used. These properties motivate the choice and study of the algorithm that features in this work. The theoretical properties of the algorithm have been studied and it has been tested on both linear and nonparametric regression problems. The intuitive algorithm has useful convergence properties and, compared to many conventional methods, performs well in simulations.
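
The abstract does not give the algorithm's details; as background, here is a generic R sketch of one simple policy for this setting, an epsilon-greedy rule with a separate linear regression per arm (all names and settings are invented):

    set.seed(5)
    n_rounds <- 500
    eps      <- 0.1                                # exploration probability
    X <- runif(n_rounds)                           # regressor presented each round
    true_reward <- function(arm, x)                # unknown to the agent
      if (arm == 1) 1 + 2 * x else 2 - 1 * x

    hist_data <- data.frame(arm = integer(), x = numeric(), r = numeric())
    for (t in seq_len(n_rounds)) {
      # Predict each arm's reward from its current regression fit (if estimable)
      pred <- sapply(1:2, function(a) {
        d <- hist_data[hist_data$arm == a, ]
        if (nrow(d) < 2) return(Inf)               # force initial exploration
        predict(lm(r ~ x, data = d), newdata = data.frame(x = X[t]))
      })
      arm <- if (runif(1) < eps) sample(2, 1) else which.max(pred)
      r   <- true_reward(arm, X[t]) + rnorm(1, 0, 0.5)
      hist_data <- rbind(hist_data, data.frame(arm = arm, x = X[t], r = r))
    }
    mean(hist_data$r)   # average reward achieved under the policy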

Start time 10:25

ANALYSING STRATEGY IN THE SPRINT RACE IN TRACK CYCLING USING LOGISTIC REGRESSION
Joanne Moffatt1, Philip Scarf1, Louis Passfield2 and Ian McHale1
1 Centre for Operations Management, Management Science and Statistics, Salford Business School, University of Salford, UK
2 Centre for Sports Studies, University of Kent, UK
Keywords: Individual sprint race, Track cycling, Strategies, Logistic regression

Competitors and coaches in sports continually try to gain a competitive edge by optimising strategy. One highly tactical contest is the individual sprint in track cycling, where one small strategic error can potentially cost the competitor the race. The aim of this research is to use statistical analysis to give insight into strategies in this event. Eight logistic regression models were developed to predict the probability of the leading rider winning from different stages of the race, based on how the race proceeded just before each stage. Logistic regression was selected since it is suitable to use when there are a large number of potential strategies. It also has the advantage of being simple to implement and straightforward to interpret. Key strategies were successfully identified from the models, including how the leading rider can defend their lead and how the following rider optimises their chances of overtaking.
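
A minimal R sketch of a logistic regression of this general kind, on simulated data with invented race-state variables (not the authors' models or data):

    set.seed(6)
    n <- 300
    lead_gap   <- runif(n, 0, 5)        # metres between leader and follower
    speed_diff <- rnorm(n)              # leader's speed minus follower's
    lap        <- sample(1:3, n, TRUE)  # stage of the race
    lin  <- 0.5 + 0.4 * lead_gap - 0.8 * speed_diff
    wins <- rbinom(n, 1, plogis(lin))   # 1 if the leading rider wins

    fit <- glm(wins ~ lead_gap + speed_diff + factor(lap), family = binomial)
    summary(fit)$coefficients
    # The odds-ratio reading is what makes such models straightforward to
    # interpret: exp(coef) gives multiplicative effects on the leader's odds.
    exp(coef(fit))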

Start time 10:50

INTERPRETATION PROBLEMS IN MULTIVARIATE CONTROL CHARTS
Siti R.M. Hashim
University of Sheffield, UK
Keywords: multivariate control chart, multivariate processes, quality control, diagnostic method, correlation

Multivariate control charts have assumed a major role in the quality control of multivariate processes. Unlike univariate control charts, the interpretation of out-of-control signals triggered by a multivariate control chart is not an easy, straightforward task. Practitioners and quality control researchers have proposed a few diagnostic methods to deal with this problem. Unfortunately, most of the proposed methods do not perform similarly under different types of mean shifts and correlations. As a result, different diagnostic methods adopted might lead to different interpretations and conclusions. In this study, a few diagnostic methods are selected and tested under different types of mean shifts and correlations. The performances of the selected diagnostic methods are measured by the percentage of correct identification with respect to the different mean shifts and correlations. A general guideline will be given for the selection of appropriate diagnostic methods in interpreting the signals produced by multivariate control charts.
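
As background, a minimal R sketch of the standard Hotelling T^2 chart for individual multivariate observations (a textbook method, not the speaker's specific diagnostics):

    set.seed(7)
    p <- 3; n <- 100
    in_control <- matrix(rnorm(n * p), n, p)           # phase-I reference data
    mu_hat <- colMeans(in_control)
    S_hat  <- cov(in_control)

    t2 <- function(x) as.numeric(t(x - mu_hat) %*% solve(S_hat) %*% (x - mu_hat))

    # New observations, with a mean shift in the first variable after t = 20
    new_obs <- matrix(rnorm(40 * p), 40, p)
    new_obs[21:40, 1] <- new_obs[21:40, 1] + 2

    t2_vals <- apply(new_obs, 1, t2)
    ucl <- qchisq(0.9973, df = p)        # approximate upper control limit
    which(t2_vals > ucl)                 # signalled points; the interpretation
                                         # problem is deciding *which* variable shifted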

12.1.4 Session 1d: Statistical Inference
Session Room: A1.01
Chair: Stephen Burgess

Start time 09:10

DEVELOPING EFFECT SIZES FOR NON-NORMAL DATA
Amin Jamalzadeh
Durham University, UK
Keywords: Effect size, hypothesis test, two sample t-test, Normal distribution, Weibull distribution

The classical hypothesis testing model seeks to determine whether to reject the hypothesis of the non-existence of a phenomenon. Therefore, statistical significance does not necessarily provide information about the importance or magnitude of the phenomenon. There are indicators, known as effect sizes (ES), which are used by some to quantify the degree to which a phenomenon exists. Statistical significance is not a direct measure of ES, but there exists a functional relationship between the sample size, the ES and the p-value. For this reason, if the sample size is sufficiently large even a weak ES may appear as statistically significant. The ES has mainly been introduced and investigated under an assumption of normality for the underlying population. However, there are many circumstances where the populations are non-Normal, or depend on scale and shape and not just a location parameter. We will review how to interpret the effect size for two independent sample comparison studies when the assumption of normality holds. We will also investigate how results change when the parameters of location and scale both change for a normal population. We introduce explorations of effect sizes for phenomena in which the variable follows a distribution with shape and scale parameters. As a special case, power analysis and sample size determination will be discussed for continuous and discrete Weibull distributions for two-sample comparison. Finally, for an application, we show how to detect the effect of some factors on the amount of time spent and the number of pages viewed while a user surfs an E-commerce website.
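
For the normal-theory case reviewed in the talk, the classical effect size is Cohen's d; a minimal R sketch on simulated data, contrasting it with the t-test p-value:

    set.seed(8)
    x <- rnorm(200, mean = 0.0, sd = 1)
    y <- rnorm(200, mean = 0.3, sd = 1)

    pooled_sd <- sqrt(((length(x) - 1) * var(x) + (length(y) - 1) * var(y)) /
                      (length(x) + length(y) - 2))
    d <- (mean(y) - mean(x)) / pooled_sd      # effect size, scale-free
    c(cohens_d = d, p_value = t.test(x, y)$p.value)
    # With n large enough, even d ~ 0.1 would be "significant": significance
    # and magnitude are different questions, which is the point of the talk.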

Start time 09:35

INFERENCE WITHOUT LIKELIHOOD
Joao Jesus
University College London, UK
Keywords: Estimating Functions, Method of Moments, Efficiency, Minimal Variance, Simulation, Rainfall

Maximum likelihood estimation has been shown to be optimal for numerous classes of statistical models. However, there are still many cases for which it is not possible to derive a likelihood, and where traditionally moment-based inference is used. The aim of this talk is to show some asymptotic results for moment-based estimators, including consistency and efficiency. We investigate the validity of the asymptotic results for finite samples using simulations; the particular processes chosen are from a class of models for rainfall based on point processes, which are widely present in the rainfall modelling literature and are also used by official bodies such as the UK Climate Impacts Programme.

Start time 10:00

MAXIMUM LIKELIHOOD ESTIMATION OF DISCRETE DISTRIBUTION PARAMETERS USING R
Fiona McElduff, Mario Cortina-Borja and Angie Wade
Centre for Paediatric Epidemiology and Biostatistics, Institute of Child Health, UCL
Keywords: discrete distributions, maximum likelihood estimation, rapid estimation

Value inflation, truncation and overdispersion frequently appear in discrete datasets. The most widely used model for discrete data is the Poisson distribution, but in practice the equal mean-variance assumption is often not supported by the observations. Many probability distribution functions have been developed to improve the modelling of highly skewed variables. It is of interest to fit several models corresponding to competing hypotheses about the data-generating mechanism. We have developed an R library to fit a comprehensive range of probability distributions to discrete data using maximum likelihood estimation. The library includes models characterised as parameter-mix Poisson distributions and members of the Lerch and generalized Hypergeometric families, as well as their modified versions, e.g. those incorporating value-inflation and truncation. Models are compared using the BIC. We apply this methodology to several datasets within the field of child health research.
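
A minimal R sketch of the kind of fit involved, here a zero-inflated Poisson fitted with optim() (illustrative only, not the authors' library):

    dzip <- function(x, pi, lambda) {
      ifelse(x == 0,
             pi + (1 - pi) * dpois(0, lambda),
             (1 - pi) * dpois(x, lambda))
    }

    negloglik <- function(par, x) {
      pi <- plogis(par[1]); lambda <- exp(par[2])   # unconstrained parametrisation
      -sum(log(dzip(x, pi, lambda)))
    }

    set.seed(9)
    x <- ifelse(runif(500) < 0.3, 0, rpois(500, 2))  # simulated zero-inflated counts

    fit <- optim(c(0, 0), negloglik, x = x, hessian = TRUE)
    c(pi_hat = plogis(fit$par[1]), lambda_hat = exp(fit$par[2]))
    BIC <- 2 * fit$value + 2 * log(length(x))        # for comparing competing models
    BIC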

Start time 10:25

INVESTIGATING THE IMPACT OF MISSING DATA ON CRONBACH'S ALPHA ESTIMATES AND CONFIDENCE INTERVALS
Emmanuel Ogundimu
University of Warwick, UK

Cronbach's alpha is widely used to describe the reliability of tests and measurements. Point estimates of Cronbach's alpha are readily computed by statistical software, and methods for constructing confidence intervals have also been suggested in the literature. However, both point estimates and confidence intervals of Cronbach's alpha can give misleading results when data are missing. We demonstrate in a Monte Carlo study the impact of missing data on point estimates and confidence intervals for Cronbach's alpha when items in tests have homogeneous or heterogeneous covariance, and when an underlying normality assumption holds or is violated for test items. In particular, we assess the coverage rates of Cronbach's alpha Exact, Normal theory (NT) and Asymptotic Distribution Free (ADF) intervals. Four methods of imputing missing item scores were evaluated. Finally, we recommend the 'best' imputation techniques for test developers to use when their data fall within the scenarios described in this study.
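
For reference, a minimal R sketch computing Cronbach's alpha from an item matrix under simple complete-case handling of missingness (illustrative; the talk's imputation methods are not reproduced here):

    cronbach_alpha <- function(items) {
      items <- na.omit(items)                 # complete-case analysis, one of the
      k <- ncol(items)                        # simplest ways to handle missingness
      item_vars <- apply(items, 2, var)
      total_var <- var(rowSums(items))
      (k / (k - 1)) * (1 - sum(item_vars) / total_var)
    }

    set.seed(10)
    true_score <- rnorm(200)
    items <- sapply(1:5, function(j) true_score + rnorm(200))  # 5 test items
    items[sample(length(items), 100)] <- NA   # impose missingness at random
    cronbach_alpha(items)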

Start time 10:50

POSETS, MÖBIUS FUNCTIONS AND TREE-CUMULANTS
Piotr Zwiernik
University of Warwick, UK
Keywords: partially ordered sets, cumulants, model identifiability, Bayesian networks with hidden variables, phylogenetic tree models, binary data

It has been noted by several authors that in the case of multivariate models cumulants often form a convenient system of coordinates. We investigate Bayesian networks on rooted trees where all variables in the system are binary and the inner nodes represent hidden variables. We show that in this case we can construct a more flexible change of coordinates. This change depends on classical results in the theory of partially ordered sets, which mirrors the combinatorial definition of cumulants. The new coordinates give us a good understanding of the structure of the models under consideration. The nice structure of the parameterization allows us, for example, to understand the identifiability issues for this class of models: we obtain formulae for the estimators in the case when the model is identified, and the structure of the MLE fibres in the case when it is not.

12.1.5 Session 2a: Medical Statistics I
Session Room: MS.01
Chair: Mouna Akacha

Start time 11:30

MODELLING BLOOD GLUCOSE CONCENTRATION FOR PEOPLE WITH TYPE 1 DIABETES
Sean Ewings
University of Southampton, UK

Type 1 diabetes mellitus is a chronic metabolic disorder which affects millions of people worldwide. It is characterised by loss of insulin-production mechanisms, which results in prolonged high blood glucose concentration (hyperglycaemia). Day-to-day treatment is the responsibility of the individual and is based on injections of insulin. Insulin requirements are assessed daily according to various lifestyle factors, predominantly diet and exercise. Poor control of the illness is associated with many short- and long-term health complications such as ketoacidosis, cardiovascular events (heart disease, stroke) and neuropathy. Diabetes UK (DUK) currently supports a three-year study to investigate and model the effect of physical activity on capillary blood glucose concentration. Volunteers to the study have blood glucose and exercise (as metabolic equivalent of task, MET) recorded continuously over a number of days. Food and insulin regimes are also recorded. Previous research provides models for the action of ingested carbohydrate and injected insulin in the blood. These models may be combined with the information on METs in order to investigate the behaviour of blood glucose concentration. The focus is on a descriptive model that can aid current treatment and hence limit complications. Currently, various time series models, including dynamic linear models, are being investigated.

Start time 11:55

METHODS FOR THE ANALYSIS OF ASYMMETRY
Joanna Smith
University of Glasgow, UK
Keywords: shape analysis, asymmetry, landmarks

There is interest in knowing the extent of asymmetry present in the breasts of patients who have undergone a unilateral mastectomy and reconstruction procedure. Three-dimensional images were captured for 44 such patients, and each case was then marked with ten anatomically significant landmarks. Asymmetry can be quantified as the degree to which there is a mismatch between a landmark configuration (the set of all landmarks on an individual image) and its relabelled and matched reflection. After a configuration has been reflected, rotated and scaled to minimise the sums of squared distances between corresponding landmarks, we should have removed any location, orientation and size effects and be left purely with the genuine shape differences. This can be quantified into an asymmetry score for each patient. These asymmetry scores give an indication of the overall asymmetry present in a case; however, it is also possible to examine which factors are contributing to this asymmetry. We can assess how much of the asymmetry that is present is due to the location, orientation and size of the reconstructed breast separately. It follows that any asymmetry remaining after these transformations is due to a difference in the actual shape of the breasts, or an 'intrinsic asymmetry'. It is also desirable to examine asymmetry over the whole surface of the breasts, rather than just the landmarks. In order to do this, we create a set of comparable points across all breasts, so that they all have the same number of points in corresponding positions. Then, after reflection, the asymmetry can be quantified by calculating the distances between these corresponding points on the reconstructed and unreconstructed breast. The shape differences between the two breasts can also be examined by a principal components analysis.

Start time 12:20

MEASUREMENT ERROR CORRECTION OF THE ASSOCIATION BETWEEN FASTING BLOOD GLUCOSE AND CORONARY HEART DISEASE - A STRUCTURAL FRACTIONAL POLYNOMIAL APPROACH
Alexander Strawbridge
MRC Biostatistics Unit, Cambridge
Keywords: measurement error, fractional polynomials, regression calibration, epidemiology

Some epidemiological variables such as height and weight may be assumed to be measured precisely; however, others such as blood pressure, blood glucose or food intake may be subject to substantial measurement error. Fractional polynomials are widely used in epidemiological studies to model continuous non-linear exposure-response relationships, but measurement error can lead to serious bias in the parameter estimates in our models. Regression calibration is an intuitive and easily implemented method for modelling the relationship between true exposure and observed exposure when repeat measurements are available. We show how fractional polynomials and regression calibration can be combined to produce a model that is corrected for the bias induced by measurement error. We then illustrate this method on a dataset looking at the association between fasting blood glucose and the risk of coronary heart disease events, and show that measurement error may be leading us to underestimate the risk associated with higher than normal levels of blood glucose.

Start time 12:45

MODELLING THE EFFECTS OF ANTIBIOTICS ON CARRIAGE LEVELS OF MRSA
Eleni Verykouki
University of Nottingham, UK
Keywords: Markov Models, Maximum Likelihood, MCMC

Methicillin-resistant Staphylococcus aureus (MRSA) is a bacterium that is usually found on the skin and in the nose. Once it enters the body it becomes harmful, as it is resistant to antibiotics and is one of the most serious causes of nosocomial and surgical site infections. In this project we are interested in assessing the effect of antibiotics on MRSA, using data taken from a hospital study in London. A discrete-time Markov chain model is used to describe the daily MRSA carriage level in patients. Frequentist and Bayesian inference for the model parameters is drawn via maximum likelihood and MCMC methods respectively. We validate our methodology using simulated data and then fit our model to the real data (obtained from the above study). Finally, we discuss how chi-square tests can be used to assess the goodness of fit.

Start time 13:10

PLANNING FUTURE STUDIES BASED ON THE CONDITIONAL POWER OF A RANDOM-EFFECTS META-ANALYSIS
Verena Roloff and Julian Higgins
MRC Biostatistics Unit, Cambridge, UK
Keywords: Random-effects meta-analysis, conditional power, sample size, information size, heterogeneity

Systematic reviews like those produced by The Cochrane Collaboration often provide recommendations for further research. When meta-analyses are inconclusive, such recommendations typically argue for further studies to be conducted. However, the nature and amount of future research should depend on the nature and amount of the existing research. We propose a method based on conditional power to make these recommendations more specific. Assuming a random-effects meta-analysis model, we evaluate the influence of the number of additional studies, of their information sizes and of the heterogeneity anticipated among them on the ability of an updated meta-analysis to detect a pre-specified effect size. The conditional powers of possible design alternatives can be summarized in a simple graph which can also be the basis for decision making. An example from the literature is used to demonstrate our strategy. We find that if heterogeneity is anticipated, it might not be possible for a single study to reach the desired power, no matter how large it is.

12.1.6 Session 2b: Financial
Session Room: MS.04
Chair: Murray Pollock

Start time 11:30

MODELLING THE RANK SYSTEM WITH GIBBS, BOSE-EINSTEIN OR ZIPF LAW: APPLICATION IN MATHEMATICAL FINANCE
Tomasz Lapinski
University of Warwick, UK

Rank systems frequently occur in areas such as linguistics, physics, economics and finance, and their structure therefore varies significantly. Existing modelling approaches have been developed and introduced separately to meet the needs of the particular discipline. However, it turns out that for a particular rank system which has not been explored before, we are able to combine the existing approaches and then determine which of the distributions is the most appropriate: Gibbs, Bose-Einstein or Zipf's law, assuming that in real life such a system obeys the maximum entropy principle. In particular, this approach could be used in mathematical finance for the choice of an optimal portfolio of assets.

Start time 11:55

A MARTINGALE APPROACH TO ACTIVE PORTFOLIO SELECTION
Daniel Michelbrink
The University of Nottingham, UK
Keywords: active portfolio selection, martingales, expected utility maximisation, geometric Brownian motion

An active portfolio selection problem is considered where an investor is interested in outperforming a benchmark portfolio. This benchmark can be given, for example, by a stock index. The investor chooses to maximise expected utility from the ratio of his portfolio and the benchmark. The problem can then be solved using a stochastic control approach or a martingale approach. We will present the latter.

Start time 12:20

MEASURING VEGA RISKS OF BERMUDAN SWAPTIONS UNDER THE MARKOV-FUNCTIONAL MODEL
Duy Pham and Dr. Joanne E Kennedy
Department of Statistics, University of Warwick, UK
Keywords: Markov-Functional, Bermudan swaption, Hedging, vega risks

Markov-Functional (MF) models form a popular class of models in which the value of pure discount bonds can be expressed as a functional of some (low-dimensional) Markov process. We shall consider a particular application of MF models: pricing and hedging Bermudan swaptions, which are by far the most common in the interest rate derivatives market. In practice, calculation of risk sensitivities for a Bermudan swaption is as important as calculation of its value. In this work, we consider different parametrizations of the driving Markov process and their implications for the Bermudan swaption's vega risks.

Start time 12:45

MATHEMATICAL AND STATISTICAL MODELS FOR PREDICTING FINANCIAL BEHAVIOUR
Golnaz Shahtahmassebi
University of Plymouth, UK
Keywords: Ultra high frequency financial data, Poisson difference distribution, decomposition, Bayes, Markov chain Monte Carlo

In this study we introduce the application of the Poisson difference (PD) distribution to ultra high frequency financial data. To investigate the behaviour of index change, PD models were implemented in a Bayesian framework via Markov chain Monte Carlo (MCMC) methods. In order to capture the excess of zero counts in the data, the zero-inflated distribution is used. In addition, a decomposition (ADS) model, which decomposes an index change into three components (index activity, direction and size of the index change), was also considered using the Bayesian approach. Both models predicted the index change with a reasonable degree of accuracy. However, the PD model might be easier and less time consuming to implement in online applications, e.g. making predictions. The Gelman convergence diagnostics showed good convergence of the chains for both the ADS and PD models.

Start time 13:10

AN OPTIMAL STOPPING PROBLEM OF FINITE HORIZON WITH REGIME SWITCHING
Chun Wang
School of Mathematical Sciences, The University of Nottingham, UK
Keywords: optimal stopping, regime switching, supermartingale

We study a class of finite-horizon optimal stopping problems under regime switching models by considering a series of optimal stopping problems and their limit. Applications of this problem include the pricing of American put options where the stock price evolves as a regime switching geometric Brownian motion. The construction involved naturally leads to a computational procedure, for which a numerical example is also provided.

12.1.7 Session 2c: Elicitation and Epidemiology
Session Room: MS.05
Chair: Michelle Stanton

Start time 11:30

ON ELICITING EXPERT OPINION IN GENERALIZED LINEAR MODELS
Fadlalla G. Elfadaly and Prof. Paul H. Garthwaite
The Open University, UK
Keywords: Elicitation Methods, Prior Distributions, Generalized Linear Models, Interactive Graphical Software

An important assessment task in Bayesian analysis of generalized linear models (GLMs) is to specify an informative prior distribution for the model parameters. Suitable elicitation methods play a key role in this specification by obtaining and encoding expert knowledge as a prior distribution. An elicitation method for quantifying opinion about any GLM was developed in Garthwaite and Al-Awadhi (2006). The relationship between each continuous predictor and the dependent variable (assuming all other variables are held fixed) was modelled as a piecewise-linear relation. The regression coefficients of this relation were assumed to have a multivariate normal distribution. However, a simplifying assumption was made regarding independence between these coefficients, in the sense that regression coefficients were a priori independent if associated with different predictors. In the current research we relax the independence assumption between coefficients of different variables. This will significantly increase the range of situations where the method is useful, but it means that the variance-covariance matrix of the prior distribution is not necessarily block-diagonal. A method of elicitation for this more complex case is given and it is shown that the resulting variance-covariance matrix is positive-definite. The method was designed to be used with the aid of interactive graphical software, which is being revised and extended in this research to handle the case of a GLM with correlated pairs of covariates.

Start time 11:55

DISCORDANCY BETWEEN THE PRIOR AND DATA USING CONJUGATE PRIORS
Mitra Noosha
Queen Mary University of London

In Bayesian inference the choice of prior is very important in representing our beliefs and knowledge. However, if these initial beliefs are not well elicited, then the data may not conform to our expectations. The degree of discordancy between the observed data and the proper prior is of interest. Pettit and Young (1996) suggested a Bayes factor to find the degree of discordancy. I have extended their work to further examples and try to find explanations for the behaviour of the Bayes factor. As an alternative, I have looked at a mixture prior consisting of the elicited prior and another with the same mean but a larger variance. The posterior weight on the more diffuse prior can be used as a measure of the discordancy between prior and data, and it also gives an automatic robust prior. I discuss various examples and show that this new measure is well correlated with the Bayes factor approach.

Start time 12:20

INDIAN BUFFET EPIDEMICS: A BAYESIAN APPROACH TO MODELLING HETEROGENEITY
Ashley P. Ford and Gareth O. Roberts
University of Warwick, UK
Keywords: Epidemic, MCMC

The application of mathematical and computer models to the prediction of epidemics in real time often lacks the crucial stage of statistical inference. There is a need for techniques of inference on models which lie between the extremes of over-simplification and of being too complex for inference. The Indian Buffet Epidemic model has been developed to address the need for a model which is more suitable than assuming homogeneous mixing or an incorrect network model. The aim is to have a process which fits the heterogeneity, with two or three parameters that measure the departure from homogeneity. The Indian Buffet Epidemic combines a bipartite network model with the Indian Buffet process to provide a realistic model which is simple to define and simulate from. The model assumes that there are a large number of potential classes; individuals belong to a subset of these classes. The classes might be households, schools, clubs, etcetera; an important feature of this new class of models is that the classes do not need to be specified. Within each class infection occurs homogeneously and recovery is as in the basic SIS or SIR model. The model is described along with an MCMC algorithm for deriving parameter estimates. An important aspect is the development of a new proposal distribution for large binary matrices. The algorithm is demonstrated on a range of simulated data from both the true model and other epidemic models, and comparisons are made between centered and non-centered representations for the augmented data.

Start time 12:45

A HIDDEN MARKOV MODEL TO ANALYSE MRSA TRANSMISSION IN HOSPITAL WARDS
Colin Worby
University of Nottingham, UK
Keywords: hidden Markov model, epidemiological model, MRSA

Methicillin-resistant Staphylococcus aureus (MRSA) remains a problem in healthcare institutions in the UK and worldwide, causing serious, sometimes life-threatening, infections with limited treatment options. For this reason there is much emphasis on the prevention of transmission, for example through the isolation of known cases in side rooms or cohorts, and the use of contact precautions such as disposable gowns and gloves. However, there is still much debate over the efficacy of individual control measures. We use hospital data collected from a selection of general medical wards, and create a model describing MRSA transmission dynamics amongst patients, with the aim of estimating the effectiveness of hospital infection control strategies. A hidden Markov model is used to describe the indirectly observed MRSA transmission process, accounting for the fact that screening to detect MRSA presence is not 100% accurate. This framework allows us to analyse how the probability of a patient acquiring MRSA is related to ward prevalence, and how effective isolation and decolonisation measures are in reducing transmission. The study confirms a reduction in transmission due to the combined effect of isolation and decolonisation treatment. While side room isolation is widely used in controlling the spread of nosocomial pathogens, we found no evidence to suggest that physical isolation, through moving patients to a single room, significantly reduces transmission potential in comparison to isolation methods on the open ward.

Start time 13:10

ESTIMATING THE SIZE OF A BADGER POPULATION USING LIVE CAPTURE AND POST-MORTEM DATA
Neil Walker1, Dez Delahay1 and Prof Peter Green2
1 Fera, Woodchester Park, Stonehouse, Glos.
2 Maths Dept, University of Bristol
Keywords: Bayesian, mark-recapture, autocorrelation, population size

Woodchester Park in Gloucestershire has been the site of an intensive mark-recapture study of a local badger population since 1975. We consider methods of population size estimation using these data, supplemented by information on post-mortem recoveries. Of particular relevance is the integrated approach advocated by Catchpole et al (1998); this is applied in a Bayesian context. In addition, we look at idiosyncrasies in the data and possible model extensions, for example temporal autocorrelation in the capture, survival and recovery parameters. Finally, the performance of different models is considered and we discuss possible reasons for the differences.

12.1.8 Session 2d: Multivariate Statistics
Session Room: A1.01
Chair: Nathan Huntley

Start time 11:30

CAUCHY PRINCIPAL COMPONENTS ANALYSIS
Aisha Fayomi and Prof. Andy Wood
University of Nottingham, UK
Keywords: Principal Components Analysis, robust statistical techniques, Cauchy likelihood

Robust methods are highly relevant in multivariate statistical analysis, and many different robust methods have been developed to cover the needs of numerous fields. Principal components analysis (PCA) is considered one of the most important techniques in statistics. However, it depends on either a covariance or a correlation matrix, both of which are very sensitive to outliers. This motivates the development of an alternative, more robust method to classical PCA, using the Cauchy likelihood function to construct a robust principal components procedure.

Start time 11:55

APPROXIMATE JOINT STATISTICAL INFERENCE FOR LARGE SPATIAL DATASETS
James Sweeney and John Haslett
Trinity College Dublin, Ireland
Keywords: Multivariate nonparametric regression, Palaeoclimate reconstruction, Inverse problems

We propose an approximate sequential approach for inferring the correlation matrix in large multivariate spatial regression problems. This enables the decomposition of the computationally intensive, multivariate "joint" problem into a set of independent univariate problems, with possible correlation structure inferred sequentially. Omission of correlation structure (where inappropriate) in potential models will lead to increased uncertainty in the degree of confidence at the reconstruction stage of an associated inverse problem.

The results from the proposed sequential approach are compared to those obtained using the (correct) full joint approach, through the comparison of bias and predictive properties for simulated and palaeoclimate data. The inference procedures used are Empirical Bayes (EB) based, where the hyperparameters governing a given model are treated as unknown fixed constants.

Start time 12:20

MULTIVARIATE OUTLIERS, THE FORWARD SEARCH AND CRONBACH'S RELIABILITY COEFFICIENT
Michael Tsagris
University of Nottingham, UK
Keywords: multivariate outliers, Forward search, Cronbach's alpha

Multivariate outliers are of great interest due to the nature of the data. While in the univariate case things are straightforward, with more than one variable matters can become very difficult. In this work, multivariate outlier detection methods are discussed and the Forward search is implemented. Robust estimates of scatter and location are the key feature for the detection of outliers. Finally, Cronbach's reliability coefficient is discussed and applied to the Forward search as a monitoring statistic.

Start time 12:45

BAYESIAN ANALYSIS IN MULTIVARIATE DATA
Rofizah Mohammad and Dr. Karen Young
University of Surrey, UK
Keywords: Model choice, Bayes factors, Classification, Discriminant analysis, Influential observations

In this presentation we will be considering a Bayesian approach to model selection in multivariate normal data using the Bayes factor, similar to that used by Spiegelhalter and Smith (1982). We are particularly interested in classifying observations when we know that they come from different populations. We shall compare classical techniques of linear and quadratic discriminant functions with a new Bayesian approach. We are interested in looking at the effect of observations on this classification. One diagnostic to determine the effect of observations on a Bayes factor is kd, which is used to assess the effect of individual observations on model choice, Pettit and Young (1990).

Start time 13:10

SOME ASPECTS OF COMPOSITIONAL DATA
Fiona Sammut
University of Warwick, UK
Keywords: Multivariate Constrained Data

A composition X is a D-vector whose components X_1, ..., X_D satisfy a sum constraint, that is, X_1 + ... + X_D = c, where c may be equal to 1, 100, 10^6 or any other constant, depending on the unit of measurement. Due to its nature, compositional data conveys only relative information; the elements are always zero or positive, and one part of the composition may always be written in terms of the remaining parts. The data are thus not free to range as the unconstrained variables encountered in traditional multivariate analyses. This fact conditions the variance-covariance structure in that at least one covariance is forced to be negative. In general, analyzing compositional data with methods which are based on the variance-covariance or correlation structure, such as factor analysis, discriminant analysis and principal component analysis, would lead to incorrect results. It was thus necessary to find some parametric class of distributions which could cater for the dependence structure between the parts of the compositions but which could also make possible the transition from the simplex (the sample space of compositional data) to the whole real line. A possible approach is based on logratio transformations, which provide a one-to-one mapping from the simplex to real space, removing the problem of having to work within a constrained sample space. Such a transformation then makes it possible to apply standard multivariate techniques to the transformed compositional data. A major shortcoming common to all logratio transformations, however, is that if some parts of a composition are zero, the corresponding logratios may not be computed. Different strategies have had to be developed to deal with this problem.
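
A minimal R sketch of the additive and centred logratio transformations mentioned above (zero parts, as noted, must be handled before these can be applied):

    alr <- function(x) log(x[-length(x)] / x[length(x)])  # additive logratio
    clr <- function(x) log(x / exp(mean(log(x))))         # centred logratio

    comp <- c(0.2, 0.5, 0.3)       # a composition with c = 1
    alr(comp)                      # (D-1)-dimensional, unconstrained
    clr(comp)                      # D-dimensional, components sum to zero
    sum(clr(comp))                 # ~0, illustrating the clr constraint

    # Standard multivariate methods (PCA, discriminant analysis, ...) can
    # then be applied to the transformed data rather than the raw composition.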

12.1.9 Session 3a: Genetics
Session Room: MS.01
Chair: Dennis Prangle

Start time 14:30

INCORPORATING AVAILABLE BIOLOGICAL KNOWLEDGE TO EXPLORE GENOME-WIDE ASSOCIATION DATA
Marina Evangelou
MRC-Biostatistics Unit, University of Cambridge, UK
Keywords: Genome-wide association studies, Pathway-based analysis

The evolution of the science of genetics and the development of genotyping technologies have made genome-wide association studies (GWAS) feasible. GWAS have been successful in identifying SNPs that are significantly associated with various complex diseases, but they do not have the required power to detect small effects of SNPs that are known to be biologically associated with the disease. Our research focuses on the exploration of genome-wide association data using pathway-based analysis. Pathway-based analysis is a joint test of association between a group of SNPs/genes within a known biological pathway and the outcome (which can be either a binary response variable or a continuous one). Pathway-based approaches have the advantage of incorporating the available biological knowledge of SNPs and genes and therefore have a better chance of identifying the true model of association. Our genome-wide association study aims to identify the relationship between genetic loci and platelet function. Platelets, which play an important role in thrombus formation, are rapidly activated by a range of agonists such as collagen and ADP. This study involves a cohort of 500 healthy individuals, for each of whom four endpoints were measured (fibrinogen and p-selectin responses to ADP and collagen agonists) in order for platelet function to be determined. It is believed that a large number of genes with small effects are associated with platelet function, and we aim to find them by implementing approaches to pathway analysis.

Start time 14:55

INFORMED BAYESIAN CLUSTERING OF GENE EXPRESSION LEVELS
Anna Fowler
Imperial College London, UK
Keywords: Bayesian Hierarchical Clustering, Variable Selection, Gene Expression Levels

Single Nucleotide Polymorphisms (SNPs) occur when there is a variation in the DNA sequence at one of the nucleotide bases. This can cause differences in the proteins produced and therefore alter the actions of the cell. HLA-DQA proteins play an essential role in the immune system by presenting antigens to a specific group of white blood cells (T cells) to enable them to produce the antibodies needed. The data we are analysing are part of the HapMap project and consist of genotype labels for three SNPs which cause the 116 subjects to produce different amounts of the HLA-DQA protein. There are also gene expression levels for each subject, which indicate the level of production for the proteins associated with each gene. It is the immune system which is primarily of interest here, and of the 3538 measured genes very few produce proteins which are related to immunity. Identifying the significant genes is complicated by the dimensionality of the data and has been approached in many ways recently. Two-way Bayesian hierarchical clustering allows clusters to form over both genes and subjects, revealing the underlying block-like structure of the data. Genes which are related to the immune system are more likely to be co-regulated with the SNP genotypes than those which are not. Therefore, the clustering of the subjects and their genotypes will influence the clustering of the genes which are related to the immune system significantly more than the clustering of those which are not. Hence, by applying a novel method of two-way clustering only over the genes which benefit significantly from this additional information, we seek to determine which gene clusters are co-regulated with the production of the HLA-DQA proteins and identify these genes as the variables associated with the immune system.

Start time 15:20

AN APPLICATION OF BAYESIAN TECHNIQUES FOR MENDELIAN RANDOMIZATION TO ASSESS CAUSALITY IN A LARGE META-ANALYSIS
Stephen Burgess and Simon G. Thompson
MRC Biostatistics Unit, University of Cambridge
Keywords: Genetic epidemiology, Mendelian randomization, Causality, Meta-analysis, Bayesian methods

The determination of causality from observational data is historically a controversial question. Observational relationships between a risk factor and an outcome are affected by confounding and reverse causation. Mendelian randomization is a technique whereby genetic information is used analogously to randomization in a randomized controlled trial. Under certain assumptions, genetic information can give insight into the nature and direction of a causal association. Genetic variation in a risk factor is determined at birth, so it is causally prior to any event, and is allocated randomly in population groups, meaning that subgroups differing in genetic variants with a specific effect on the risk factor of interest will not systematically differ in other factors. We show how novel Bayesian techniques can be applied to a large dataset, comprising over 100 000 participants in over 30 different studies measuring over 20 different genetic variants, to assess the causal association of C-reactive protein on coronary heart disease.

Start time 15:45

BAYESPEAK: A HIDDEN MARKOV MODEL FOR ANALYSING CHIP-SEQ EXPERIMENTS
Jonathan Cairns1, Christiana Spyrou4, Andy Lynch1, Rory Stark3 and Simon Tavaré1
1 Department of Oncology, University of Cambridge, Li Ka Shing Centre, Cambridge, UK
2 DAMTP, Centre for Mathematical Sciences, Wilberforce Road, Cambridge, UK
3 Cancer Research UK, Cambridge Research Institute, Li Ka Shing Centre, Cambridge, UK
4 MRC Clinical Sciences Centre, Faculty of Medicine, Imperial College London, UK
Keywords: Bayesian Inference, Hidden Markov Model, Gibbs Sampling, Metropolis-Hastings, Negative Binomial, Oncology, ChIP-seq

Accurate identification of interactions between proteins and DNA is a key element in understanding the mechanisms that lead to cancer. The biological experiment "ChIP-seq" is used to investigate sites on the chromosome where proteins bind, often activating or silencing a particular gene. The data present themselves as "peaks" across the chromosome. However, various technical or biological effects can lead to noise, disguising true peaks and even generating false peaks. Hidden Markov Models (HMMs) have applications in this biological setting. We can use the hidden state to indicate a binding site, and choose a model that reflects the expected biological features of the signal. "BayesPeak" is an MCMC algorithm we have developed to solve this problem, using a Bayesian approach and based on negative binomial emissions. I will discuss the statistical issues we face when fitting our theoretical model to large data sets.

12.1.10 Session 3b: Medical Statistics II
Session Room: MS.04
Chair: Helen Thornewell

Start time 14:30

DESIGNING A SERIES OF PHASE II TRIALS
Siew Wan Hee
Warwick Medical School, University of Warwick, UK

In some diseases with very small populations, the number of patients eligible for clinical trial is limited. When the development of new therapies increases relatively faster than the recruitment of patients, there is a need to identify a promising treatment as quickly as possible. A design that requires fewer patients will require less time to identify a treatment for further testing in a phase III trial. Some authors (Whitehead, 1985, and Yao et al, 1996) have proposed considering a series of clinical trials where each trial tests a treatment that is different from the others. There is a trade-off between large trials, which require many patients, and small trials, which may yield little information, particularly if there is a high start-up cost. We propose a design that is a hybrid of the classical frequentist and Bayesian approaches, where the analysis at the end of the trial is based on conventional frequentist hypothesis testing and the Bayesian method is used to maximize the power of the series of trials. Designs are obtained that optimise the number of patients and the power for each trial in a series. The total number of patients eligible for trial and the type I error (that is, the probability of declaring the treatment effective when it is not) are fixed, and a start-up cost is included.

Start time 14:55

RESPONSE-ADAPTIVE BLOCK RANDOMIZATION IN BINARY ENDPOINT CLINICAL TRIALS
Dominic Magirr
Lancaster University, UK
Keywords: Clinical trials, Adaptive design

The results of a clinical trial will typically accumulate steadily throughout its duration. Response-adaptive randomization (RAR) uses the accumulating data in order to skew the randomization of remaining patients to treatment groups in favour of the currently better performing treatment. The aim is to reduce the number of patients receiving the inferior treatment. RAR has rarely been used in practice. One example is a trial of extra-corporeal membrane oxygenation (ECMO) to treat newborn infants with respiratory failure. The results of the trial were controversial, in large part because only one patient received the control therapy. In this talk the ECMO trial is described. Alternative RAR designs are proposed that incorporate random permuted blocks in order to eliminate the possibility of such an extremely unequal allocation ratio.
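
A minimal R sketch of a generic response-adaptive rule for a binary endpoint (a play-the-winner-style scheme with invented settings, not necessarily the speaker's design; a blocked version would cap the imbalance):

    set.seed(11)
    p_true  <- c(A = 0.4, B = 0.7)        # unknown true success probabilities
    succ    <- c(A = 1, B = 1)            # pseudo-counts (uniform prior)
    fail    <- c(A = 1, B = 1)
    alloc   <- character(120)
    outcome <- integer(120)

    for (i in seq_along(alloc)) {
      # Allocate in favour of the arm with the higher observed success rate
      rate    <- succ / (succ + fail)
      p_alloc <- rate / sum(rate)
      arm     <- sample(c("A", "B"), 1, prob = p_alloc)
      y       <- rbinom(1, 1, p_true[arm])
      succ[arm] <- succ[arm] + y
      fail[arm] <- fail[arm] + (1 - y)
      alloc[i] <- arm; outcome[i] <- y
    }
    table(alloc)   # allocation skewed towards the better arm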

Start time 15:20

BAYESIAN CLINICAL TRIAL DESIGNS FOR SURVIVAL OUTCOMES
Shijie Ren
University of Sheffield, UK
Keywords: Assurance, Survival outcome

When designing a clinical trial, sponsors or decision-makers may consider only the power of the trial, i.e. the conditional probability of a successful trial assuming a specified treatment effect. Since the treatment effect is uncertain, this will not provide a reliable assessment of the probability of a successful outcome and can often give a misleading impression of the likely outcome of the trial. As an alternative to using power, one can consider the unconditional probability of a successful trial outcome, known as assurance. This involves utilizing prior information about treatment effects in the design of the trial. We consider how to derive assurance when a trial's outcome measure is survival time. We allow for uncertainty in both the treatment effect and the control group survival function.
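
To illustrate the idea of assurance in a simpler setting than survival outcomes, a minimal R sketch for a two-arm trial with a normal outcome and an assumed prior on the treatment effect:

    set.seed(12)
    n_per_arm <- 100
    sd_y      <- 1
    n_sims    <- 1e4

    prior_effect <- rnorm(n_sims, mean = 0.3, sd = 0.2)  # assumed prior on the effect

    success <- vapply(prior_effect, function(delta) {
      x <- rnorm(n_per_arm, 0, sd_y)
      y <- rnorm(n_per_arm, delta, sd_y)
      t.test(y, x, alternative = "greater")$p.value < 0.025
    }, logical(1))

    mean(success)   # assurance: unconditional probability of a "successful" trial
    # Contrast with power at a fixed delta = 0.3:
    power.t.test(n = n_per_arm, delta = 0.3, sd = sd_y,
                 sig.level = 0.025, alternative = "one.sided")$power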

Start time 15:45

THE POWER OF THE BIASED COIN DESIGN FOR CLINICAL TRIALS
Wai Yin Yeung
Queen Mary, University of London, UK
Keywords: biased coin design, clinical trials, sequential patient allocation

The biased coin design introduced by Efron (1971, Biometrika) is a design for allocating patients in clinical trials which helps to maintain the balance and randomness of the experiment. Chen (2006, Journal of Statistical Planning and Inference) studied the power of repeated simple random sampling and the biased coin design, in which the power is treated as the conditional probability of correctly detecting a treatment effect given the current numbers of patients on the two treatments, the control group and the treatment group. The variances of the responses for the two groups are assumed to be equal. The z test and the t test for a treatment effect are used to demonstrate and analyse the power function when the variances of the treatment responses are known and unknown, respectively. Numerical results given in his paper showed that the biased coin design is uniformly more powerful than repeated simple random sampling. In this talk, I shall report on my current work, which extends Chen's work on power to the case where the variances of the responses for the two treatments are assumed to be different. I will give numerical results for the powers of repeated simple random sampling and the biased coin design when the variances are known and different, and also when they are unknown and different.
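
A minimal R sketch of Efron's biased coin rule itself, with bias p = 2/3 (illustrative; the power calculations in the talk are not reproduced here):

    set.seed(13)
    efron_bcd <- function(n, p = 2/3) {
      assign <- integer(n)            # 1 = treatment, 0 = control
      for (i in seq_len(n)) {
        # Imbalance so far: treatment count minus control count
        d <- sum(assign[seq_len(i - 1)] == 1) - sum(assign[seq_len(i - 1)] == 0)
        # The under-represented group gets the next patient with probability p
        prob_trt <- if (d < 0) p else if (d > 0) 1 - p else 0.5
        assign[i] <- rbinom(1, 1, prob_trt)
      }
      assign
    }

    alloc <- efron_bcd(100)
    table(alloc)   # typically much closer to 50/50 than simple randomization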

12.1.11 Session 3c: Dimension Reduction
Session Room: MS.05
Chair: James Sweeney

Start time 14:30

ORACLE PROPERTIES OF LASSO-TYPE METHODS IN REGRESSION PROBLEMS
Sohail Chand
School of Mathematical Sciences, University of Nottingham, UK
Keywords: Variable Selection, Lasso, LARS, Oracle properties

In model building, we often have a large set of predictors. As not all the variables are equally important for the model, we seek a parsimonious model. Parsimonious models are very important for prediction purposes, as overfitted models have higher prediction variance. In practice, it is often quite difficult to find a model which is a good fit as well as easy to interpret. As discussed by Fan and Li (2001, JASA 96(456):1348-1360), a good estimation procedure should have the oracle properties, namely variable selection consistency and the optimal estimation rate. Lasso-type methods in the regression context are popular for their simultaneous estimation and variable selection. Our numerical results show in some scenarios how normalisation of the predictors can nullify the advantage of using the adaptive weights and may lead to failure of the necessary and sufficient condition for correct subset selection. The choice of the regularisation parameter is critical for the oracle performance of these methods. We have compared the performance of cross validation with the Wang and Leng (2009, J Roy Stat Soc B Met; 71(3):671-683) BIC approach in choosing the appropriate value of the regularisation parameter. Our results show that the cross validation choice of regularisation parameter may lead to inconsistent variable selection.
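
A minimal R sketch of lasso selection with a cross-validated regularisation parameter (this assumes the CRAN package glmnet is installed; data and coefficients are invented):

    library(glmnet)

    set.seed(14)
    n <- 100; p <- 10
    X <- matrix(rnorm(n * p), n, p)
    beta <- c(3, 1.5, 0, 0, 2, rep(0, p - 5))   # sparse true coefficients
    y <- drop(X %*% beta) + rnorm(n)

    cv_fit <- cv.glmnet(X, y)                   # lambda by 10-fold cross-validation
    coef(cv_fit, s = "lambda.min")              # selected variables and estimates
    # A BIC-type choice of lambda (as in Wang and Leng, 2009) can select more
    # consistently than cross-validation, which tends to over-select.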

Start time 14:55

PENALIZED WEIGHTED LEAST SQUARES VARIABLE SELECTION METHOD FOR AFT MODELS WITH HIGH DIMENSIONAL COVARIATES
Md. Hasinur Rahaman Khan and J. Ewart H. Shaw
University of Warwick, UK
Keywords: AFT model, Penalized Regression, Variable Selection, Weighted Least Squares

Although in recent years penalized regression methods have received a great deal of attention for simultaneous variable selection and coefficient estimation, particularly in the analysis of high-dimensional datasets, only a small number of methods based on penalized approaches have been suggested for survival datasets. Here we look at a new penalized approach, based on weighted least squares, for model estimation and variable selection in parametric accelerated failure time (AFT) models. We apply this approach to the log-normal AFT model with both low-dimensional and high-dimensional datasets. The approach improves predictive accuracy, an important inferential goal in survival analysis, while performing variable selection. Its performance is demonstrated with simulated examples and real datasets where time to survival, in the presence of right censoring, is of interest.

Start time 15:20

LATENT VARIABLE MODELS FOR PROCESS MONITORING
Javier Serradilla and Dr. Jian Q. Shi
Newcastle University, UK
Keywords: Multivariate Statistical Process Control, Latent Variable Models, Probabilistic PCA

Fault detection and diagnosis in manufacturing processes are a key aspect of current good engineering practice. Statistical approaches to fault detection based on historical operating data have been found to be advantageous for processes having a large number of measured variables. These models, however, tend to underperform in the area of fault diagnosis, where the variable(s) responsible for the plant's abnormal behaviour must be identified. In this presentation we review how latent variable models can be used both to reduce the data dimensionality and to form subgroups of variables. These new variables are then used for process monitoring. The added advantage of the approach is that each latent variable will be selectively looking at a specific and well-defined subset of the original variables. Likewise, fault detection is quicker as the confounding effect of redundant variables is eliminated.

Start time 15:45

A STUDY OF ITEM SELECTION USING PRINCIPAL COMPONENT ANALYSIS AND CORRESPONDENCE ANALYSIS
Nur Fatihah Mat Yusoff
National University of Ireland, Galway
Keywords: item selection, principal component analysis, correspondence analysis

This study investigates dimension-reduction techniques in psychometric testing using Principal Component Analysis (PCA) and Correspondence Analysis (CA). Psychometric research is a field of social science concerned with the theory and techniques of educational and psychological measurement. Researchers in this area are frequently concerned with the construction and validation of measurement instruments. Theoretically, PCA is a mathematical algorithm that transforms a number of possibly correlated variables into a smaller number of uncorrelated variables by performing a covariance analysis between variables. The PCA concept is closely related to Factor Analysis (FA), which aims to detect structure in the relationships between variables; it is a common technique used by social science researchers in conducting validity and reliability analyses. CA can be considered a factor method for categorical variables and is often linked with producing a low-dimensional graphical display of variables and units. Simple CA is a technique designed to analyse a two-way table, while Multiple Correspondence Analysis (MCA) is an extension of simple CA applicable to a large set of variables. The results provide information similar in nature to that produced by principal component analysis, and allow us to explore the structure of the categorical variables included in the table. This study is concerned with reducing the dimension, or number of variables, in an instrument using data from a pilot study on personality traits. The original instrument was developed by Oliver P. John and Sanjay Srivastava of the University of California, Berkeley in 1999. The pilot survey was conducted at the University Malaysia Sarawak, Malaysia, where 80 students from second year and above were randomly selected as respondents. In the original instrument, there are 44 items to assess five personality traits, the big five dimensions. We believe that some of the items, or even dimensions, are not relevant in the Malaysian context. At the end of this study, our aim is to produce the best instrument that can represent all of the variables that we are interested in, for subsequent use in structural equation modelling of student achievement.

12.1.12 Session 3d: Environmental
Session Room: A1.01 Chair: Andrew Smith

Start time 14:30

USING A BAYESIAN HIERARCHICAL MODEL FOR TREE-RING DATING Emma M. Jones1, Caitlin E. Buck1, Clifford D. Litton2, Cathy Tyers1 and Alex Bayliss3 1 University of Sheffield, UK 2 University of Nottingham, UK 3 English Heritage, UK Keywords: Dendrochronology, Bayesian hierarchical modelling

Dendrochronology, or tree-ring dating, uses the annual growth of tree-rings to date timber samples. Variation in ring width is determined by variation in the climate. Trees within the same geographical region are exposed to the same climatic signal in each year, but the signal differs from year to year. Dendrochronologists measure sequences of tree-ring widths with a view to dating samples by matching undated sequences to dated sequences known as 'master' chronologies. The tree-ring widths from undated timbers are measured and the data are processed to remove growth trend. The processed data are sequentially matched against one another; each match position is known as an offset. Initially, timbers from the same site or woodland are matched against one another, and then average sequences from each site or woodland, known as 'site' chronologies, are matched to master chronologies. The hierarchical nature of the data leads to modelling them with a Bayesian hierarchical model, as sketched below. The ring width for tree j in year i is modelled as the sum of the climatic signal in year i and a random noise term particular to tree j in year i. This model can be extended to include climatic signals at varying geographic scales. A Gibbs sampler is used to produce posterior probabilities for a match at each offset. This methodology relies on careful prior specification of parameters at each level of the hierarchy. Data are currently being collated from trees of known age from several woods in the UK that will be used to provide informative prior knowledge.
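A minimal sketch of the two-level structure just described, in our own notation (not necessarily the authors'), where w_ij denotes the processed ring width of tree j in year i:

```latex
% Two-level model: a shared annual climatic signal plus tree-level noise.
% The extension to regional climatic signals adds a further level.
\begin{align*}
  w_{ij} &= s_i + \epsilon_{ij}, &
  \epsilon_{ij} &\sim N(0, \sigma_j^2), &
  s_i &\sim N(\mu, \tau^2).
\end{align*}
% A Gibbs sampler cycles through the full conditionals of the s_i,
% mu, tau^2 and the sigma_j^2, and posterior probabilities of a match
% are computed at each candidate offset.
```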

Start time 14:55

NOT ANOTHER SPECIES RICHNESS ESTIMATOR?! Beth Norris University of Kent, UK Keywords: Statistical ecology, Species richness estimation

One of the oldest and most intuitive measures of biodiversity is species richness, which is simply the number of species present in an area of study. Sampling from populations will rarely give a complete inventory of species, and therefore several methods have been developed to estimate the true species richness of a population from sample data. There are over 20 different techniques already described that will produce an estimate of total species richness, so why do we need another? Species richness estimators often perform badly for benthic data sets. Some studies have suggested that species richness estimation is dependent on spatial patterns, and that the clustered spatial distribution of benthic assemblages hampers incidence-based estimators such as Chao2 (sketched below) and ICE. None of the commonly used species richness estimators considered takes spatial heterogeneity into account, and non-parametric estimators often underestimate the total species richness for such data sets. Therefore, an alternative approach has been proposed which relies on modelling the underlying spatial pattern of individual species. The modelling framework considered is based on the method of maximum likelihood, and fits a parametric model to observed species abundances. As species heterogeneity factors will be taken into account alongside species abundances, this method should perform well in estimating the true species richness of an area. The method will be assessed by simulation, and will be applied to benthic data sets supplied by Cefas. The estimates will be compared to the results from some established estimators.
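For concreteness, here is a hedged sketch of the bias-corrected Chao2 estimator mentioned above, applied to an invented incidence matrix; it is one of the established comparators, not the proposed approach itself.

```python
import numpy as np

def chao2(incidence):
    """Bias-corrected Chao2 from a binary (sites x species) matrix."""
    m = incidence.shape[0]              # number of sampling units
    counts = incidence.sum(axis=0)      # detections per species
    s_obs = np.sum(counts > 0)          # observed species richness
    q1 = np.sum(counts == 1)            # species seen in exactly 1 unit
    q2 = np.sum(counts == 2)            # species seen in exactly 2 units
    return s_obs + (m - 1) / m * q1 * (q1 - 1) / (2 * (q2 + 1))

rng = np.random.default_rng(0)
sites = rng.random((25, 120)) < 0.05    # toy benthic incidence data
print(chao2(sites))                     # estimated total species richness
```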

Start time 15:20

UNCERTAINTY ANALYSIS FOR MULTIPLE ECOSYSTEM MODELS USING BAYESIAN EMULATORS Rachel Oxlade, Prof. Michael Goldstein and Dr. Peter Craig University of Durham, UK Keywords: Bayesian, Bayes Linear, simulator, ecosystem, model, emulation

Bayesian emulation provides a tool for analysing complex simulators. When there are many parameters over a large input space, and model runs are costly, emulation enables us to approximate the simulator across the space, and gives a measure of our uncertainty at each point. This talk introduces emulation and then investigates how it can be applied to HadOCC, the Hadley Centre Ocean Carbon Cycle model. The goal of the project is to be able to jointly emulate two simulators of the same system, and this idea will be introduced in the talk.

Start time 15:45

ESTIMATING BIOLOGICALLY PLAUSIBLE RELATIONSHIPS BETWEEN AIR POLLUTION AND HEALTH Helen Powell, Duncan Lee and Adrian Bowman Department of Statistics, University of Glasgow, UK Keywords: Air pollution, Monotonic dose-response relationship, Respiratory health

The effects of air pollution on human health can be estimated using ecological time-series studies, which comprise daily data for the population living within an urban area. The responses are daily counts of mortality or morbidity outcomes, which are related to air pollution concentrations and other covariates. The majority of studies estimate a linear relationship between pollution (xt) and health, although a number have estimated non-linear dose-response curves g(xt). However, these curves are typically unconstrained and estimated using smoothing or penalised splines, meaning that biologically implausible results can occur: for example, at some levels of pollution the estimated health effects may decrease with increasing concentrations. We therefore propose a method for estimating biologically plausible dose-response curves, which must satisfy the following properties: (i) increasing monotonicity; (ii) smoothness; and (iii) g(0) = 0, which together constrain the dose-response curve to be non-negative (one construction is sketched below). We applied this approach to data from Glasgow, using counts of respiratory-related hospital admissions and ozone concentrations. We compared our model with one that incorporates an unconstrained curve, and found that the latter produced unrealistic results: the relative risk falls below one, and there was a decreasing risk of hospital admissions at high concentrations of ozone. In contrast, the constrained curve does not give a relative risk below one for any concentration of ozone, and therefore does not imply that ozone could be beneficial to health. This curve was also biologically plausible, because increasing ozone concentrations result in increasing health risks.
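The following sketch illustrates one way the three constraints can be imposed: represent g as a nonnegative combination of smooth, increasing basis functions that vanish at zero. It is an illustration of the idea under invented data, not the authors' exact estimator.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 100, 200))          # toy ozone concentrations
y = 0.02 * x + rng.normal(0, 0.3, 200)         # toy "log relative risk"

knots = np.linspace(10, 90, 8)
B0 = 1 / (1 + np.exp(knots / 10))              # sigmoid values at x = 0

def basis(x):
    """Smooth, increasing basis functions, each equal to 0 at x = 0."""
    B = 1 / (1 + np.exp(-(x[:, None] - knots) / 10))
    return B - B0

coef, _ = nnls(basis(x), y)                    # nonnegative weights
g_hat = basis(x) @ coef                        # smooth, monotone, g(0) = 0
print(bool(np.all(np.diff(g_hat) >= -1e-9)))   # fitted curve is non-decreasing
```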

12.2 Wednesday 14th April
12.2.1 Session 4a: Medical Statistics III
Session Room: MS.01 Chair: Fiona McElduff

Start time 09:10

AN APPLICATION OF SURVIVAL TREES TO THE STUDY OF CARDIOVASCULAR DISEASE Alberto Alvarez Iglesias1, John Newell2 and Liam Glynn3 1 School of Mathematics, Statistics and Applied Mathematics, NUI, Galway, Ireland. 2 Clinical Research Facility, NUI, Galway, Ireland. 3 Department of General Practice, NUI, Galway, Ireland. Keywords: Recursive partitioning, Survival Trees, Random Survival Forest

Recursive partitioning methods are a popular non-parametric alternative to the classical parametric and non-parametric models in regression, classification and survival problems. They have been recognised as a useful modelling tool as they produce a model that is very easy to interpret. The beauty of these methods lies in their simplicity and the relative ease with which the results of the analysis can be explained to a person with a non-statistical background. Single trees are an excellent way to describe the structure of the learning data, but their predictive power can be disappointing. In the last decade, many efforts have been made to overcome this problem. These methods are generally known as 'ensemble methods', and they use a set of trees, created by bootstrapping the original data, in order to improve predictability. The price to be paid, however, is the absence of a single interpretable tree. In this work, a data set of 1586 patients with cardiovascular disease will be analysed. The primary endpoint was a cardiovascular composite endpoint, which included death from a cardiovascular cause or any of the cardiovascular events of myocardial infarction (MI), heart failure, peripheral vascular disease and stroke. Seventeen factors/covariates will be considered for development of a prognostic model, and the results of different methods for growing survival trees will be compared.
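As a hedged sketch of the ensemble side of this comparison, the code below grows a random survival forest on simulated stand-in data, assuming the scikit-survival package and its RandomSurvivalForest interface; the data and names are illustrative only.

```python
import numpy as np
from sksurv.ensemble import RandomSurvivalForest  # scikit-survival, assumed
from sksurv.util import Surv

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 17))                 # 17 factors/covariates (toy)
time = rng.exponential(10 * np.exp(-0.5 * X[:, 0]))   # covariate-driven times
event = rng.random(300) < 0.7                  # composite endpoint observed?
y = Surv.from_arrays(event=event, time=time)   # structured survival outcome

rsf = RandomSurvivalForest(n_estimators=200, random_state=0).fit(X, y)
print(rsf.predict(X[:5]))                      # ensemble risk scores
```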

Start time 09:35

ANALYSIS OF AN OBSERVATIONAL STUDY IN COLORECTAL CANCER PATIENTS Cara Dooley1, John Hinde1 and John Newell2 1 National University of Ireland, Galway 2 Clinical Research Facility, National University of Ireland, Galway

The aim of the study was to compare survival of colorectal cancer patients in the whole population against the survival of patients in a sub-population who also had inflammatory bowel disease (IBD). All individuals who suffered from colorectal cancer were drawn from the entire Irish population using data from January 1994 to December 2005 provided by the National Cancer Registry of Ireland (NCRI). The control group contained many more observations (n > 20000) than the IBD group (n = 170). Given the number of control patients, there was large diversity in this group. In a conventional designed experiment or trial, patients entering the trial would be taken to be as similar as possible; usually patients would be similar in age, health, etc. As this was an observational study, there was no design prior to collecting the data. To compensate for this lack of design, each IBD patient is matched to the 'closest' control patient. For each pair of IBD and control patients a distance is calculated, and the two patients with the smallest distance between them (and so the most similar) are matched. The distance used in this case is a Mahalanobis distance based on ranks, and the matching is carried out using the optmatch package in R; a sketch of the idea follows.
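A minimal sketch of the matching step, with invented covariates: rank each covariate, compute Mahalanobis distances on the ranks, and pair each IBD patient with a control by optimal assignment (the optmatch package in R does this more flexibly; scipy's assignment solver stands in here).

```python
import numpy as np
from scipy.stats import rankdata
from scipy.spatial.distance import cdist
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(4)
ibd = rng.normal(0.3, 1, size=(170, 4))        # IBD patients' covariates (toy)
ctl = rng.normal(0.0, 1, size=(2000, 4))       # control patients (toy)

allp = np.vstack([ibd, ctl])
ranks = np.apply_along_axis(rankdata, 0, allp) # rank each covariate column
vi = np.linalg.inv(np.cov(ranks, rowvar=False))
d = cdist(ranks[:170], ranks[170:], metric="mahalanobis", VI=vi)

rows, cols = linear_sum_assignment(d)          # optimal 1:1 matching
print(float(d[rows, cols].mean()))             # mean within-pair distance
```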

Start time 10:00

CAUSAL INFERENCE IN LONGITUDINAL DATA ANALYSIS: A CASE STUDY IN THE EPIDEMIOLOGY OF PSORIATIC ARTHRITIS Aidan O'Keeffe University of Cambridge, UK Keywords: Causality, Multi-state model, Local Dependence and Independence, Psoriatic Arthritis

In any setting where there exists a causal link between two processes or events, the cause must precede its effect. Hence, it seems plausible that a model which aims to uncover a causal relationship should account for the passage of time between cause and effect. Longitudinal data are characterised by repeated measurements being taken over time on units/subjects, and in this longitudinal setting it appears natural to consider causality. Multi-state models offer a way of describing changes in longitudinal data over continuous time, and it is through the use of such models, in conjunction with important causal concepts such as composability, local dependence and local independence, and the Bradford Hill criteria, that we shall attempt to infer causality. We use data on the progression to clinical damage in the hand joints of patients suffering from the disease psoriatic arthritis (PsA), under observation at the University of Toronto PsA Clinic, in an effort to demonstrate our approach to causal inference. Specifically, we examine the possibility of a causal link between disease activity and clinical damage at the individual joint level.

Start time 10:25

DESIGN AND ANALYSIS OF DOSE ESCALATION TRIALS Maria Roopa Thomas Queen Mary University of London Keywords: Dose escalation, Cohort effects, Bayesian methods

My research is motivated by Senn et al. (2007). The Royal Statistical Society established an expert group to look into the statistical issues relevant to the Phase I first-in-man TeGenero trial; its report was published in the Journal of the Royal Statistical Society, Series A. First-in-man studies aim to find a dose for further exploration in Phase II trials and to determine the therapeutic effects and side effects. Dose escalation trials involve giving increasing doses to different subjects in distinct cohorts. One of the recommendations of the RSS working party was to consider cohort effects. Cohort effects can be influenced by many factors, such as different types of people volunteering at different times, changes in the ambient conditions, the staff running the trial, and the protocols for using subsidiary equipment. With reference to Senn et al. (2007), four designs for three escalating doses and placebo are taken into account. Using WinBUGS, the cohort effects are fitted and the designs are compared; the variance of the difference between the doses is also computed in WinBUGS. Areas of interest are Bayesian approaches to the design and analysis of dose escalation trials, which involve prior information concerning the parameters of the relationships between dose and the risk of an adverse event, as well as the desirable effects of the drug. There is a chance to update after every dosing period using Bayes' theorem. In this talk I will discuss some of these issues.

Start time 10:50

MODELLING PARENTAL DECISIONS FOR NEWBORN BLOODSPOT SCREENING Stuart Nicholls Lancaster University, Lancaster, UK Keywords: latent variable, decision-making, screening, model

A national programme of newborn bloodspot screening has been in place in the UK since 1969. Recent advances have expanded the range of conditions for which screening is available, with a concomitant increase in the information made available to parents. There is a lack of research, however, as to how parents make decisions about newborn bloodspot screening. This paper reports the analysis of a postal questionnaire conducted in order to evaluate a proposed model of parental decision-making for newborn bloodspot screening. Structural equation modelling was used to assess the model, which showed a good level of fit on several goodness-of-fit measures as well as a non-significant χ2 value. Squared multiple correlations indicate that a high degree of the variance associated with parental decisional quality is accounted for by its predictors of attitude towards screening and perceived choice, with an increase in perceived choice leading to a perceived improvement in parental decisions. Trust in the staff conducting the screening tests was also significantly related to attitudes towards screening. This analysis suggests that the proposed model is appropriate. The model expands on existing decision-making models, suggesting that decisions are affected by sociological factors such as perception of choice and trust in staff as well as by rational cognitive elements, such as risk and benefit analyses. This suggests that existing measures of parental decision-making and/or informed choice may be improved by incorporating these elements.

12.2.2 Session 4b: Point Processes and Spatio-temporal Statistics
Session Room: MS.04 Chair: Chris Fallaize

Start time 09:10

POISSON PROCESS PARAMETER ESTIMATION FROM DATA IN BOUNDED DOMAIN Patrice Marek University of West Bohemia, Czech Republic Keywords: Poisson Process, Bounded domain, Parameter estimation, Exponential distribution, Distance-based methods

When estimating the parameter of a Poisson process that describes a natural phenomenon such as earthquakes, we usually have to work with only one realization of the process: repetition is clearly impossible, because these processes are in the hands of nature. Moreover, we are usually limited by time or finance, and therefore can use only a few observations. The approach presented in this paper offers an alternative to the classical distance-based methods presented in the literature. Our approach is based on the estimation of two parameters, the measure of the domain and the parameter of the Poisson process. Using this approach we can avoid censoring, which would be problematic in further research on the spatial Poisson process in a bounded domain. The work has been supported by the grant of the Ministry of Industry and Trade of the Czech Republic MPO 2A 2TP1/051.

Start time 09:35

A COMPARISON OF BAYESIAN SPACE-TIME MODELS FOR OZONE CONCENTRATION LEVELS Khandoker Shuvo Bakar School of Mathematics, University of Southampton, UK Keywords: Space-time modelling, ozone concentrations, auto-regressive model, dynamic linear model, Bayesian spatial prediction

Recently, there has been a surge of interest in space-time modelling of ozone concentration levels. Well-known time series modelling methods such as dynamic linear models (DLM) and auto-regressive (AR) models are being used together with Bayesian spatial prediction (BSP) methods adapted for dynamic data. As a result, practitioners in this field often face a daunting task of selection among these methods. This paper presents a study comparing three approaches: the DLM approach of Huerta et al. (2004), the BSP method as described by Le and Zidek (2006), and the AR models proposed by Sahu et al. (2007). Recent theoretical results (Dou et al., 2009) comparing the first two approaches are extended to include the AR models. The results are illustrated with a realistic numerical simulation example using information regarding the locations of the ozone monitoring sites and observed ozone concentration levels in the state of New York for the months of June and July in 2005-2006. The speed of computation, the availability of high-level software packages for implementing the methods, and the practical difficulties of using the methods for large space-time data sets are also investigated.

Start time 10:00

MULTI-LEVEL MODELS FOR ECOLOGICAL RESPONSE APPLICATIONS Iain Proctor2, R.I. Smith1 and Prof. E.M. Scott2 1 Centre for Ecology and Hydrology, Edinburgh, UK 2 University of Glasgow, UK Keywords: Spatial processes, Multi-level models

A problem which often occurs in spatial statistics is how to represent spatial change. Multi-level models are used for interpreting nested datasets, where various covariates are available at differing resolution scales. Widely used in epidemiological studies, this framework is also applicable to population studies. In this approach, I will model the population trend of carabid communities in upland sites of the United Kingdom. For these locations, environmental variables are measured at the site level; the habitat of the surrounding area is defined for each transect, with repeat transect measures at some sites in later years. The setup of these data lends itself naturally to a multi-level model, in which various covariates can be assigned as fixed or random effects (a sketch follows). The structure allows one to assign non-Gaussian distributions to the random effects, thereby creating more flexibility in the model.
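A hedged sketch of a two-level model of this kind with Gaussian site-level random effects (the non-Gaussian extension mentioned above is not shown); the formula, variable names and data are invented.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n_sites, n_per = 30, 8
site = np.repeat(np.arange(n_sites), n_per)
habitat = rng.normal(size=n_sites * n_per)          # transect-level covariate
u = rng.normal(0, 0.5, n_sites)[site]               # site random effect
abundance = 2.0 + 0.8 * habitat + u + rng.normal(0, 1, n_sites * n_per)

df = pd.DataFrame({"abundance": abundance, "habitat": habitat, "site": site})
fit = smf.mixedlm("abundance ~ habitat", df, groups=df["site"]).fit()
print(fit.params)          # fixed effects plus the site variance component
```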

Start time 10:25

A SPATIO-TEMPORAL MODELLING OF MENINGITIS INCIDENCE IN SUB-SAHARAN AFRICA Michelle Stanton and Prof. Peter Diggle School of Health and Medicine, Lancaster University, UK Keywords: meningococcal meningitis, spatio-temporal, dynamic generalised linear models

An area of sub-Saharan Africa, known as the meningitis belt, is frequently affected by large-scale meningitis epidemics resulting in tens of thousands of cases, and thousands of deaths during epidemic years. The link between the seasonal and spatial patterns of epidemics and the climate has long been recognised, although the mechanisms which cause these patterns are not well understood. The Meningitis Environmental Risk Information Technologies Project (MERIT) is a collaborative project involving the World Health Organization, and members of the environmental, public health and epidemiological communities. One of MERIT's objectives is to use both routine meningitis surveillance data and information on climatic and environmental conditions to develop a meningitis epidemic decision support tool. This decision support tool could then be used to improve the targeting of preventative and reactive vaccine efforts. Weekly meningitis incidence data have been obtained from the Ethiopian Ministry of Health for the period October 2000 to July 2008 at district (woreda) level. Data on the climate variables most strongly associated with meningitis incidence have been obtained for Ethiopia over the same time period from the International Research Institute (IRI) at Columbia University, New York. We formulate a spatio-temporal dynamic generalised linear model for incidence and describe how the model can be fitted to spatially aggregated incidence data using remotely sensed images of environmental and meteorological factors as explanatory variables. The aim of this project is to enable short-term forecasting of district-level incidence as part of the development of a country-wide meningitis decision support tool.

Start time 10:50

DENOISING UK HOUSE PRICES Andrew Smith University of Bristol, UK Keywords: Nonparametric regression, Penalised regression, Graphs

The British people are obsessed with house prices. There is considerable interest in the difference in price between different areas and in different years. This talk will attempt to show a smooth national trend in house prices, in both space and time. We will look at noisy data, provided by Halifax, on UK house prices and discuss it as a particular example of regression on a graph. There are considerable challenges in the data, most notably the lack of covariate values and missing observations, that make existing regression methods fail. Regression on a graph is a new technique that estimates a denoised version of observations made at the vertices of a graph. It is a type of penalised regression, in which distance from data is penalised at all the vertices, and roughness at all the edges of the graph (see the sketch below). These penalty terms present computational challenges, so we will see the result of a new, fast algorithm for regression on a graph.
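A minimal sketch of the quadratic-penalty cousin of this idea: minimising ||y − f||^2 + λ f'Lf over a graph with Laplacian L gives f_hat = (I + λL)^(-1) y. The talk's algorithm and penalties differ; the chain graph and data below are invented.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

n = 100                                        # toy chain graph: 100 vertices
i = np.arange(n - 1)
A = sp.coo_matrix((np.ones(n - 1), (i, i + 1)), shape=(n, n))
A = (A + A.T).tocsr()                          # symmetric adjacency matrix
L = sp.diags(np.asarray(A.sum(axis=1)).ravel()) - A   # graph Laplacian

rng = np.random.default_rng(6)
y = np.sin(np.linspace(0, 3, n)) + rng.normal(0, 0.3, n)   # noisy "prices"
lam = 5.0                                      # roughness penalty weight
f_hat = spsolve((sp.eye(n) + lam * L).tocsr(), y)          # denoised values
print(np.round(f_hat[:5], 3))
```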

12.2.3 Session 4c: General
Session Room: MS.05 Chair: Michael Tsagris

Start time 09:10

MIXTURE OF LATENT TRAIT ANALYZERS Isabella Gollini and Thomas Brendan Murphy University College Dublin, Dublin 4, Ireland Keywords: Binary Data Models, Latent Variable Models, Mixture Models, Variational Methods

Latent class analysis and latent trait analysis are two of the most common latent variable models for categorical data. Sometimes these models are not sufficient to summarize the data, especially when the data come from a heterogeneous source, the variables are highly dependent and/or the data dimensionality is large. The mixture of latent trait analyzers model extends latent class analysis and latent trait analysis by assuming a model for the categorical response variables that depends on both a categorical latent class and a continuous latent trait variable. Fitting the mixture of latent trait analyzers model is difficult because the likelihood function involves an integral that cannot be evaluated analytically. We focus on the variational approach, which works particularly well when the dimensionality of the data is large.

Start time 09:35

A WAVELET BASED APPROACH TO HPLC DATA ANALYSIS Jennifer Klapper and Dr. Stuart Barber Department of Statistics, University of Leeds, UK Keywords: Wavelets, High Performance Liquid Chromatography, Vaguelette-Wavelet

High Performance Liquid Chromatography (HPLC) is a process by which chemical compounds are separated into their constituent ingredients. The data produced by this type of experiment can be viewed as a time-dependent baseline with intermittent peaks. The locations of these peaks indicate which chemicals are present, and the area underneath each peak the quantity of the relevant chemical. However, there are many issues which confound peak identification and quantification; these include the presence of background noise in the data and baseline drift. These problems, amongst others, mean that a certain amount of preprocessing is needed before any type of quantification can take place. We use wavelet denoising techniques to remove the background noise and eliminate the effects of baseline drift (a sketch follows). We subsequently use vaguelette-wavelet methods to estimate the derivatives of the data and thus locate the peaks within the data. Finally, numerical integration is used to calculate the areas under the peaks.
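A hedged sketch of wavelet shrinkage on a simulated chromatogram-like signal, assuming the PyWavelets package; universal-threshold soft shrinkage stands in here for the authors' specific denoising scheme.

```python
import numpy as np
import pywt   # PyWavelets, assumed available

rng = np.random.default_rng(7)
t = np.linspace(0, 1, 1024)
clean = (np.exp(-((t - 0.3) / 0.01) ** 2)            # two narrow peaks
         + 0.6 * np.exp(-((t - 0.7) / 0.02) ** 2)
         + 0.1 * t)                                  # slow baseline drift
noisy = clean + 0.05 * rng.normal(size=t.size)

coeffs = pywt.wavedec(noisy, "sym8", level=6)
sigma = np.median(np.abs(coeffs[-1])) / 0.6745       # noise level, finest scale
thr = sigma * np.sqrt(2 * np.log(noisy.size))        # universal threshold
coeffs[1:] = [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
denoised = pywt.waverec(coeffs, "sym8")
print(float(np.abs(denoised - clean).max()))         # residual error vs truth
```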

Start time 10:00

DELETE-REPLACE IDENTITY FOR A SET OF INDEPENDENT OBSERVATIONS Sakyajit Bhattacharya1, Brendan Murphy1 and John Haslett2 1 University College Dublin 2 Trinity College Dublin

The delete-replace diagnostic method is developed in the context of a general model of independent observations. If a set of observations is deleted, it is shown that it can be estimated from the remaining observations. The identity is shown to hold, in particular, in the case of a scalar sufficient statistic. In the multi-parameter case the delete-replace identity holds conditionally. As an example, the exponential family is explored, and delete-replace is shown to be true for a one-parameter exponential family. For a curved exponential family, the necessary and sufficient conditions for delete-replace are derived. The estimate of the set of deleted observations is shown to depend only on the sufficient statistic; more particularly, the estimate turns out to be the maximum likelihood estimator of the parameter. Delete-replace holds only for independent sets of observations: a counterexample is derived for a set of dependent observations where the identity does not hold.

Start time 10:25

MODELLING MAIN CONTRACTOR STATUS FOR THE NEW ORDERS SURVEY Ria Sanderson and Salah Merad Office for National Statistics, UK

In the past, the New Orders survey sampled only main contractors; this population could be identified, as main contractor (MC) status was collected from an annual census. Following the transfer of construction statistics to the Office for National Statistics, MC status is now collected through the Business Register and Employment Survey (BRES). This change means that, for small businesses, the population of MCs can no longer be identified (since only a very small number of these businesses are sampled as part of BRES) and hence all small businesses are eligible for selection by the New Orders survey. One important consideration is therefore non-response, as it is unknown whether the non-response rate will be the same for both MCs and non-MCs. In order to reduce potential non-response bias, we introduce a calibration weight which requires an accurate estimate of the number of MCs in the population. We use data from BRES to build a model, and apply it to every business in the population to give each a predicted probability of being a MC (a sketch follows). The small number of businesses in BRES means that we have only been able to construct a model that yields accurate estimates of the number of MCs at high levels of aggregation. However, there could be differential non-response within these levels. Therefore, in the future, we would like to make use of past data to update the predicted probabilities, which should allow for accurate estimates at lower levels of aggregation. In this talk, I will briefly describe the sampling design and the estimation method in the New Orders survey, present some results from the modelling of MC status, and then discuss data errors in the reporting of MC status.
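A minimal sketch of the modelling step, with invented covariates: fit a logistic regression for MC status on BRES-like data, then sum the predicted probabilities to estimate the number of MCs.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(8)
n = 5000
X = np.column_stack([rng.normal(2, 1, n),        # e.g. log employment (toy)
                     rng.integers(0, 5, n)])     # e.g. industry group (toy)
p_true = 1 / (1 + np.exp(-(-2 + 0.6 * X[:, 0])))
is_mc = rng.random(n) < p_true                   # MC status observed in BRES

model = LogisticRegression().fit(X, is_mc)
p_hat = model.predict_proba(X)[:, 1]             # predicted P(main contractor)
print(round(p_hat.sum()), int(is_mc.sum()))      # model-based vs observed count
```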

Start time 10:50

BAYES LINEAR KINEMATICS IN THE ANALYSIS OF FAILURE RATES Kevin Wilson Newcastle University, UK Keywords: Bayesian inference, Bayes linear kinematics, count data, failure rates

Collections of related Poisson counts arise, for example, from numbers of failures in similar machines or neighbouring time periods. A conventional Bayesian analysis requires a rather indirect prior specification and intensive numerical methods for posterior evaluations. An alternative approach using Bayes linear kinematics, in which simple conjugate specifications for individual counts are linked through a Bayes linear belief structure, is presented. The use of transformations of the Poisson parameters is proposed. The approach is illustrated using an example involving Poisson counts of failures.
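As a hedged illustration of the conjugate building block for a single count (the Bayes linear kinematic linking of several related rates is not reproduced here): a Gamma prior for a Poisson failure rate updates in closed form. The numbers are invented.

```python
from scipy.stats import gamma

a, b = 2.0, 4.0                # Gamma(a, b) prior for the failure rate
failures, exposure = 7, 10.0   # observed count and observation period

a_post, b_post = a + failures, b + exposure   # conjugate posterior update
post = gamma(a_post, scale=1 / b_post)
print(post.mean(), post.interval(0.95))       # posterior mean and 95% interval
```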

12.2.4 Session 4d: Graphical Models and Extreme Value Theory
Session Room: A1.01 Chair: Guy Freeman

Start time 09:10

UNCERTAINTY IN CHOICE OF MEASUREMENT SCALE FOR EXTREME VALUE ANALYSIS Jenny Wadsworth1, Jonathan Tawn1 and Philip Jonathan2 1 Lancaster University 2 Shell Technology Centre Thornton Keywords: Extreme Value Theory, Measurement Scale, Significant Wave Height

The effect of the choice of measurement scale upon inference and prediction from extreme value models is examined. When measurements of the same process are recorded on different scales linked by a non-linear transformation, separate extreme value analyses carried out on the two scales can lead to highly discrepant conclusions concerning future extremes of the process. For some distributions it turns out there is in fact an optimal choice of scale to minimise the bias of the model. This talk describes how a Box-Cox transformation can be incorporated into an analysis (sketched below), providing a parametric methodology to account for scale uncertainty. An example dataset of significant wave height measurements is used to illustrate both the problem and the new methodology.
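A minimal sketch of the two-scale issue with simulated wave-height-like data: fit a generalised Pareto distribution to threshold excesses on the original scale and on a maximum-likelihood Box-Cox scale, and compare the fitted shape parameters.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
hs = rng.weibull(1.5, 5000) * 3.0             # toy significant wave heights

def gpd_fit(x, q=0.95):
    """Shape and scale of a GPD fitted to excesses over the q-quantile."""
    u = np.quantile(x, q)
    exc = x[x > u] - u
    shape, _, scale = stats.genpareto.fit(exc, floc=0)
    return shape, scale

hs_t, lam = stats.boxcox(hs)                  # ML Box-Cox transformation
print("original scale:", gpd_fit(hs))
print("Box-Cox scale :", gpd_fit(hs_t), "lambda =", round(lam, 2))
```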

Start time 09:35

MODELLING EXTREMAL PHENOMENA USING DIFFERENT DATA SOURCES Ben Youngman University of Sheffield, UK Keywords: Extreme value theory, Spatial modelling

A common problem in the modelling of extremes of phenomena is sparsity or quality of data. This may be because few extremes have occurred or because extremes are difficult to measure. A consistent source of non-observational data comes from numerical model output, e.g. climate models. Typically these provide data of high spatiotemporal resolution, yet often poorly capture the behaviour of extremes. Here a method is proposed to characterise this inaccuracy. This is done by relating the model output to some proximate observational data, both of which theoretically quantify the same phenomenon.

Start time 10:00

PARAMETRISATION OF GRAPHICAL MODELS Simon Byrne Statistical Laboratory, University of Cambridge, UK Keywords: Graphical models, Bayesian inference, Covariance matrix estimation

Graphical models have recently become popular tools in statistics and related fields. A graphical model is a joint probability distribution which has certain conditional independence properties, known as Markov properties, based on the structure of a graph. This graph provides both an aid to the human comprehension of complex multivariate models and a framework for efficient computation of the marginal and conditional distributions, by exact, approximate or sampling-based methods. This talk will focus on the problem of efficiently parameterising families of such distributions. If the parameters for the conditionally independent components themselves have certain independence properties, so-called 'hyper Markov properties', then the problem of parameter estimation, in both maximum likelihood and Bayesian frameworks, can be simplified by local computations. I will provide some examples and applications of these properties.

Start time 10:25

BAYESIAN INFERENCE FOR SOCIAL NETWORK MODELS Alberto Caimo University College Dublin, Ireland Keywords: Exponential random graph models, MCMC methods, Bayesian inference

Exponential random graph models are widely used and studied models for social networks. Despite their popularity, they are extremely difficult to handle from a statistical viewpoint, since their normalising constant is available only in very trivial cases. We propose to carry out the estimation using a Bayesian framework via the exchange algorithm of Murray et al. (2006), which circumvents the need to calculate the normalising constants of the posterior density (a toy sketch follows). Moreover, we propose to further improve mixing and local moves on the posterior support using a population MCMC approach with snooker updates. This method improves performance with respect to the widely used Monte Carlo maximum likelihood estimation, whose convergence is often troublesome.
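A hedged toy sketch of the exchange algorithm on a one-parameter exponential family over a small discrete space, where the auxiliary draw can be made exactly by enumeration; for ERGMs this draw would be a simulated network, and everything below is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(10)
ys = np.arange(21)                            # small discrete sample space

def simulate(theta):
    """Exact draw from q(y | theta) = exp(theta * y), by enumeration."""
    w = np.exp(theta * ys)
    return rng.choice(ys, p=w / w.sum())

def log_q(y, theta):                          # unnormalised log-density
    return theta * y

y_obs = 14
theta, chain = 0.0, []
for _ in range(5000):
    prop = theta + rng.normal(0, 0.3)         # symmetric random-walk proposal
    y_aux = simulate(prop)                    # auxiliary draw at the proposal
    log_a = (log_q(y_obs, prop) + log_q(y_aux, theta)
             - log_q(y_obs, theta) - log_q(y_aux, prop)
             - 0.5 * (prop**2 - theta**2))    # N(0,1) prior; Z(theta) cancels
    if np.log(rng.random()) < log_a:
        theta = prop
    chain.append(theta)
print(np.mean(chain[1000:]))                  # posterior mean estimate
```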

12.2.5 Session 5a: Experimental Design and Population Genetics
Session Room: MS.01 Chair: Andrew Simpkin

Start time 11:30

CANONICAL ANALYSIS OF MULTI-STRATUM RESPONSE SURFACE DESIGNS & STANDARD ERRORS OF EIGENVALUES Mudakkar M. Khadim School of Mathematical Sciences, Queen Mary University of London, UK Keywords: Response surface methods, Canonical analysis, Eigenvalues, Multi-stratum Design

Bisgaard and Ankenman described the double linear regression method for obtaining the standard errors of the eigenvalues in second-order response surface models, but they discussed this method only for a completely randomized error control structure. In many industrial experiments, however, the experimenter might not be able to perform complete randomization and hence might be forced to use multi-stratum error control structures, of which the split-plot design is a special case. We have applied the same double linear regression method to multi-stratum error control structures to obtain the standard errors of the eigenvalues in second-order response surface models.

Start time 11:55

D-OPTIMAL DESIGN OF EXPERIMENTS FOR A DYNAMIC MODEL WITH CORRELATED OBSERVATIONS Kieran Martin, Stefanie Biedermann, Susan Lewis and David Woods (EPSRC CASE project supported by GlaxoSmithKline) University of Southampton, UK Keywords: experimental design, dynamic models

Models derived from differential equations occur frequently in the pharmaceutical industry. Optimal designs for these models are required to gather information for model fitting. Finding such designs can be problematic: the models will usually be non-linear, making the optimal choice of design dependent on the unknown parameters, and the observations may be correlated. We aim to find designs which yield accurate estimates of the model parameters while remaining robust to the effects of correlation and parameter uncertainty. We find pseudo-Bayesian D-optimal designs to meet these objectives (a sketch of the criterion follows), then use a simulation study to assess their robustness by calculating the mean square error for each design. We demonstrate that the designs found still perform well when the domains of the prior parameter distributions are mis-specified.
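A hedged sketch of a pseudo-Bayesian D-optimality criterion for an invented exponential-decay model: average the log-determinant of a first-order information matrix over prior draws, and build up a design greedily. This illustrates the criterion only (correlation between observations is ignored), not the authors' algorithm.

```python
import numpy as np

rng = np.random.default_rng(20)
grid = np.linspace(0.1, 10, 60)                        # candidate sampling times
prior = rng.normal([2.0, 0.5], [0.2, 0.05], (100, 2))  # prior draws (toy)

def jac(t, th):
    """Jacobian of eta(t; th) = th1 * exp(-th2 * t) w.r.t. (th1, th2)."""
    e = np.exp(-th[1] * t)
    return np.column_stack([e, -th[0] * t * e])

def crit(times):
    """Prior-averaged log-determinant of the information matrix J'J."""
    return np.mean([np.linalg.slogdet(jac(times, th).T @ jac(times, th))[1]
                    for th in prior])

design = [grid[0], grid[-1]]                      # seed with the endpoints
for _ in range(2):                                # greedily add 2 more points
    design.append(max(grid, key=lambda t: crit(np.array(design + [t]))))
print(np.round(sorted(design), 2))
```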

Start time 12:20

VULNERABILITY: A 2ND CRITERION TO DISTINGUISH BETWEEN EQUALLY-OPTIMAL BIBDS Helen Thornewell Maths Department, University of Surrey, Guildford, UK Keywords: Balanced Incomplete Block Designs (BIBDs), Disconnectedness, Observation Loss, Optimality, Robustness, Selection, Vulnerability

If a Balanced Incomplete Block Design (BIBD) exists for given parameters, it is known that such designs are universally optimal. However, if there exists more than one BIBD with the same parameters, is one design better than another? Is optimality the only criterion that needs to be tested at design selection? Are there ways of distinguishing between non-isomorphic, equally-optimal BIBDs? Many experiments suffer from observation loss during the course of the experiment. This may result in a disconnected eventual design, so that not all pairwise treatment comparisons can be estimated and the null hypothesis cannot be tested. In order to guard against poor eventual designs, I have introduced a Vulnerability Measure to determine how likely a design is to become disconnected. The formulae depend on the design concurrences. Are some BIBDs more vulnerable than others? My new robustness criterion is compared to other criteria from the literature. For example, Prescott & Mansson (2001) consider the robustness of designs against the loss of any two single observations, which depends on the block intersection sizes. Are there combinatorial links between block intersections and concurrences? Is the least vulnerable BIBD for disconnectedness also the most robust BIBD against the loss of single observations? Does one criterion provide more information than the other for comparison, selection and construction of BIBDs? General theorems, formulae and results will be presented, and interactive examples using sets of complement BIBDs will be demonstrated in order to answer these questions and more.

Start time 12:45

SURFING IN ONE DIMENSION Emma Kershaw University of Bristol, Statistics Group Keywords: Coalescent, Population Genetics, Stochastic Processes, Population Expansion

Geographical expansions of a population have occurred throughout history, with humans believed to have expanded out of Africa in the last 100,000 years. They are of particular interest in the field of evolutionary biology as they can have a drastic effect on the distribution and diversity of genes in the newly colonized area. Such genetic phenomena have been used as markers to indicate possible range expansions in the past. This talk considers the phenomenon of genes surfing on the wave front of an expanding population in one dimension, and we introduce some classical statistical population genetics models. Two simulations are introduced which explore the problem further. A forward-in-time model using classical population genetics theory enables an exact ancestral graph of individuals at the wave front to be constructed and used as a means of comparison for the second model, an approximate backward-in-time simulation which attempts to estimate this ancestral distribution using methods of coalescent theory.

Start time 13:10

DIMENSION REDUCTION FOR HUMAN GENOMIC SNP VARIATION Colette Mair and Dr. Vincent Macaulay University of Glasgow, UK Keywords: population structure, Wright's island model

We will discuss ways of detecting population structure from genetic data on a set of individuals, each belonging to one population from a collection of populations. The main question of interest is whether the set of individuals belongs to a larger homogeneous population or whether the population can be segregated into subpopulations that are genetically distinct. This is important since a great deal of genetic analysis assumes independence of individual genotypes, which may be violated through population structure; as a result, not correcting for population structure can produce misleading results. Further, discovering population structure can help us understand the demographic history of the populations of interest. One of the many issues with such studies is dealing with the large quantity of data. Over the last decade or so, SNP data have become widely available in vast quantities, and this is the type of data we will consider throughout. A single nucleotide polymorphism, or SNP, is a position in the DNA sequence which is known to be variable in the populations of interest. Since we will be dealing with a large number of variables (SNPs), we will consider principal components analysis. This was first introduced to the study of genetic data over 30 years ago and is a common statistical tool for reducing the dimension of data to relatively few components while still accounting for a substantial part of the variation. Each component will capture a proportion of the population structure present in the data, if any. Established software can be used which, given such SNP data and using principal component analysis, can determine if population structure is present in the data. By observing a biplot from a real data set and also using simulated data, correlations with geographical locations will be considered. Such correlations have been observed recently, for example, in Europe. We will firstly consider SNP data from the Human Genome Diversity Panel, consisting of roughly 1050 individuals from 50 countries, all genotyped at around 650,000 SNPs. From there, we will briefly consider simulated data under Wright's island model. With this model, simulation of SNPs from a number of populations is possible, with the amount of migration between populations controlled. This simplified model will help illustrate the ideas presented but is only one of many possible models. However, it is useful in demonstrating population structure and correlations between geographical and genetic distance.

12.2.6 Session 5b: Censoring in Survival Data and Non-Parametric Statistics
Session Room: MS.04 Chair: Jennifer Rogers

Start time 11:30

PARAMETRIC SURVIVAL MODEL WITH TIME-DEPENDENT COVARIATES FOR RIGHT CENSORED DATA Hisham Abdel Hamid Elsayed Statistics Group, School of Mathematics, University of Southampton, UK Keywords: Parametric models, Right censoring, Splines, Time-dependent covariates

One standard approach in survival analysis is to use the Cox proportional hazards regression model. This can easily be extended to incorporate one or more covariates whose values are subject to change over time. An alternative and potentially more efficient approach is to use simple parametric accelerated failure time models with standard survival distributions such as the Weibull, log-logistic and log-normal. Again these models may be extended to incorporate time-dependent covariates. However, in some areas of medical statistics simple parametric models often fit poorly. In this paper the standard Weibull regression model is extended to incorporate time-dependent covariates and made more flexible by using splines. The competing methods are implemented and compared using two large data sets (supplied by NHS Blood and Transplant) of survival times of corneal grafts and heart transplant patients.

Start time 11:55

ASSESSING THE EFFECT OF INFORMATIVE CENSORING IN PIECEWISE PARAMETRIC SURVIVAL MODELS Natalie Staplin University of Southampton Keywords: Survival analysis, Informative censoring, Sensitivity analysis, Parametric models, Piecewise exponential

Many of the standard techniques used to analyse censored survival data assume that there is independence between the failure time and censoring processes. There are situations where this assumption can be questioned, especially when looking at medical data. It would be useful to know whether we can assume independence or whether we need a model that takes account of any dependence. The method presented here assesses the sensitivity of the parameter estimates in parametric models to small changes in the amount of dependence between failure time and censoring. Parametric models with piecewise hazard functions are considered to allow a greater amount of flexibility in the models that may be fitted. In particular, piecewise constant hazard functions are considered, which means the piecewise exponential model is being used. This method is applied to a dataset that follows patients registered on the waiting list for a liver transplant. It suggests that in some cases even a small change in the amount of dependence can have a large effect on the results obtained.

Start time 12:20

DEALING WITH CENSORING IN QUALITY ADJUSTED SURVIVAL ANALYSIS AND COST EFFECTIVENESS ANALYSIS Howard Thom Biostatistics Unit, University of Cambridge, UK Keywords: Cost Effectiveness Analysis, Health Economics, Censoring, Inverse Probability Weighting, Bootstrapping

Estimation of average costs and quality adjusted life years is often complicated by heavy censoring in the data, as this censoring is implicitly informative. Simple empirical means are biased, and standard survival analysis methods are inappropriate. For the purposes of cost-effectiveness analysis, it is necessary to obtain unbiased estimates of the means and variances of our quantities. This issue will be illustrated with a contemporary example comparing the cost-effectiveness of four functional diagnostic tests in the diagnosis and management of coronary artery disease. The method of inverse weighting will be applied to this example (a sketch follows), and an analytic form for variance estimates, derived by Willan et al., will be discussed in comparison with a simple bootstrap method.
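A hedged sketch of the inverse-probability-weighting idea on simulated data, assuming the lifelines package: each uncensored cost is weighted by the inverse Kaplan-Meier censoring-survival probability at its event time.

```python
import numpy as np
from lifelines import KaplanMeierFitter   # assumed available

rng = np.random.default_rng(11)
n = 400
t_event = rng.exponential(5, n)           # toy survival times
t_cens = rng.exponential(8, n)            # toy censoring times
time = np.minimum(t_event, t_cens)
died = t_event <= t_cens
cost = 100 * t_event                      # toy cost, fully observed if died

kmf = KaplanMeierFitter().fit(time, event_observed=~died)   # censoring dist.
k_hat = kmf.survival_function_at_times(time[died]).values   # K-hat(T_i)
ipw_mean = float(np.sum(cost[died] / k_hat) / n)            # weighted mean
print(ipw_mean, 100 * 5)                  # IPW estimate vs true mean cost
```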

Start time 12:45

NONPARAMETRIC PREDICTIVE INFERENCE FOR SYSTEM RELIABILITY Ahmad M Aboalkhair Durham University, UK Keywords: k-out-of-m systems, lower and upper probabilities, nonparametric predictive inference, redundancy allocation, series-parallel systems, system reliability

Recently, the application of a novel statistical method called nonparametric predictive inference (NPI) to problems of system reliability has been presented. In NPI, relatively weak statistical modelling assumptions are made, which is made possible by the use of lower and upper probabilities to quantify uncertainty, leading to inferences which are strongly based on observed data and which explicitly consider future observable events. Throughout this work, attention is on lower and upper probabilities for system functioning, given binary test results on components; as such, it takes uncertainty about component functioning and indeterminacy due to limited test information explicitly into account. Lower and upper probabilities, also known as imprecise probability, have several advantages over classical (precise) probability in the reliability context. Coolen-Schrijner et al. (2008) considered systems that are series configurations of subsystems, with each subsystem a voting system ('k-out-of-m' system) consisting of only one type of component, and different subsystems consisting of components of different types. They presented a powerful optimal algorithm for redundancy allocation for such systems, for the situation where components of all types have been tested with zero failures found in the tests. MacPhee et al. (2009) generalized this to general test results. We present the basic results of NPI for system reliability, followed by a detailed presentation of optimal redundancy allocation following general component test results, and outline related research challenges.

Start time 13:10

NONPARAMETRIC ESTIMATION OF RELIABILITY OF TWO RANDOM VARIABLES USING KERNEL ESTIMATION OF DENSITY Tomas Toupal University of West Bohemia, Czech Republic Keywords: Bivariate distribution, Nonparametric estimation, Reliability, Kernel estimation, Density and distribution function

This talk discusses the problem of reliability estimation, particularly for bivariate distributions. It has many real applications, especially in engineering (structures, static fatigue, the ageing of concrete pressure vessels), medicine, quality control, military service and the balance of payments. The parametric estimation of the density and distribution function of reliability following a specified distribution has been discussed extensively in the literature. Hence, in this talk I will present the kernel estimation of the density and distribution function using several types of kernels. In the final part I will use the results of this estimation to demonstrate how to obtain the reliability from the fitted kernel estimates (a sketch follows), and apply it to experimental data on the balance of payments of the Czech Republic. In this case, reliability is represented by the fact that total expenditure does not exceed total income. The work has been supported by the grant of the Ministry of Industry and Trade of the Czech Republic MPO 2A 2TP1/051.
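A minimal sketch of the final step with invented data: kernel density estimates for the two variables and a Monte Carlo plug-in estimate of the reliability R = P(X > Y).

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(12)
income = rng.normal(105, 12, 150)        # toy stand-in for total income
expenditure = rng.normal(100, 10, 150)   # toy stand-in for total expenditure

kde_x = gaussian_kde(income)             # Gaussian-kernel density estimates
kde_y = gaussian_kde(expenditure)

# Monte Carlo plug-in: draw from each fitted KDE and compare
xs = kde_x.resample(100_000).ravel()
ys = kde_y.resample(100_000).ravel()
print(float((xs > ys).mean()))           # estimate of R = P(X > Y)
```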

12.2.7 Session 5c: Time Series and Diffusions
Session Room: MS.05 Chair: Alexander Strawbridge

Start time 11:30

SEQUENTIAL INTEGRATED NESTED LAPLACE APPROXIMATION Arnab Bhattacharya and Simon Wilson Trinity College Dublin, Ireland Keywords: Bayesian inference, Sequential methods

This work addresses the problem of sequential inference for time series in real time, which will be extended further to deal with spatio-temporal models. The idea is to develop a fast functional approximation scheme so as to perform real-time data analysis of unknown quantities, given observations which depend on some underlying latent variable. The problem is defined as follows: the observed variables Yt, t ∈ N, Yt ∈ Y, are assumed to be conditionally independent given the latent process Xt (assumed to be a GMRF) and the unknown hyperparameters Θ, which can have any distribution. The primary aim is to estimate the posterior distribution P(x0:t | y1:t, θ) and also the filtering density P(xt | y1:t, θ). The computation of these two terms necessarily requires the estimation of the posterior density of Θ. We are interested in providing sequential solutions for both P(θ | y1:t) and P(xt | y1:t, θ). The new method is motivated by a recently published technique known as Integrated Nested Laplace Approximation (INLA), developed by Rue et al. (2009). The procedure has already been implemented on linear Gaussian state-space models with unknown state of the system and covariance parameters, and has proved to be very accurate and fast. We consider implementing it in the generalized case where there is nonlinearity and non-Gaussianity.

Start time 11:55

FINDING CHANGEPOINTS IN A GULF OF MEXICO HURRICANE HINDCAST DATASET Rebecca Killick1, Idris Eckley1, Kevin Ewans2 and Philip Jonathan3 1 Maths & Stats, Lancaster University 2 Shell International Exploration & Production, Netherlands 3 Shell Technology Centre Thornton, Chester Keywords: Changepoints, Likelihood, Schwarz Information Criterion, Bayesian Information Criterion, GOMOS

Statistical changepoint analysis is used to detect changes in variability within GOMOS hindcast time-series for significant wave heights of storm peak events across the Gulf of Mexico for the period 1900-2005. To detect a change in variance, the two-step procedure consists of (1) validating model assumptions per geographic location, followed by (2) application of a penalised likelihood changepoint algorithm (a single-changepoint sketch follows). Results suggest that the most important changes in time-series variance occur in 1916 and 1933 at small clusters of boundary locations at which, in general, the variance reduces. No post-war changepoints are detected. The changepoint procedure is readily applied to other environmental time-series.
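For illustration, a minimal single-changepoint version of step (2) on a simulated zero-mean series: compare the best variance-likelihood split against a SIC/BIC-style penalty. The authors' algorithm handles multiple changepoints; the series here is invented.

```python
import numpy as np

rng = np.random.default_rng(15)
x = np.concatenate([rng.normal(0, 1.0, 60),    # lower-variance regime
                    rng.normal(0, 2.0, 45)])   # higher-variance regime
n = x.size

def cost(seg):
    """Twice the negative profile log-likelihood (zero-mean Gaussian)."""
    return seg.size * np.log(np.mean(seg**2) + 1e-12)

full = cost(x)
splits = np.arange(5, n - 5)
costs = np.array([cost(x[:k]) + cost(x[k:]) for k in splits])
k_best = splits[np.argmin(costs)]
if full - costs.min() > np.log(n):             # SIC/BIC-style penalty
    print("changepoint detected at index", k_best)
```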

Start time 12:20

PREDICTION INTERVALS OF THE LOCAL SPECTRUM ESTIMATE Kara Stevens University of Bristol, UK Keywords: Time series, locally stationary, Bayesian wavelet shrinkage, localized autocovariance, local spectrum prediction intervals

Time series data occur in many disciplines, such as finance and medicine. Often there is a dependence structure between time series observations; the typical indicator of this dependence is the covariance function. If a time series is second-order stationary then the mean and variance are constant, and the covariance depends only on the time difference between observations. However, many time series are not stationary. One class of non-stationary time series is that of locally stationary time series, which possess slowly evolving second-order quantities, such as variance. In these cases models that assume stationarity are inappropriate and alternative methods should be used. An interesting class is that of locally stationary wavelet models, which can be used to define a localized autocovariance, calculated from an evolutionary wavelet spectrum. This is similar to the spectrum used to analyse stationary time series in the frequency domain, but it is expressed within the wavelet domain and changes through time. The evolutionary wavelet spectrum is estimated from data through the wavelet periodogram. This quantity is asymptotically unbiased but not consistent. We have developed an empirical Bayesian wavelet shrinkage method to smooth the wavelet periodogram and thus improve our estimation of the evolutionary wavelet spectrum. Our method has the advantage of producing prediction intervals and probabilities associated with the evolutionary wavelet estimate. The new methodology will be compared with current techniques.

Start time 12:45

DISCRETE- AND CONTINUOUS-TIME APPROACHES TO IMPORTANCE SAMPLING ON DIFFUSIONS David Suda University of Lancaster, UK Keywords: stochastic calculus, Bayesian inference, computational statistics

In this talk we shall tackle the problem of importance sampling methods for diffusions. We first start by approximating an Ito diffusion by a discrete-time Markov chain using the Euler discretization, and then implementing importance sampling methods appropriate for discrete-time Markov chains (a sketch follows). This setting is simpler to conceive, as it only requires the understanding of the Radon-Nikodym derivative for finite-dimensional distributions. We then look at the problem within a continuous-time context. In this case, one requires the understanding of the Radon-Nikodym derivative with respect to probability measures which are infinite-dimensional. In practice, continuous-time importance sampling is never implemented exactly. However, it is useful in constructing new proposal densities, and it can also prove useful in analyzing the asymptotic behaviour of importance sampling weights. Some empirical results based on a simulation study of the above shall also be presented.
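A hedged sketch of the discrete-time construction with toy choices throughout: simulate paths from a Brownian-motion proposal and accumulate importance weights from the ratio of Euler transition densities under a target Ornstein-Uhlenbeck SDE, dX = -X dt + dW.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(16)
T, m, n_paths = 1.0, 50, 2000
dt = T / m

x = np.zeros(n_paths)
logw = np.zeros(n_paths)
for _ in range(m):
    x_new = x + rng.normal(0, np.sqrt(dt), n_paths)        # proposal step (BM)
    logw += (norm.logpdf(x_new, x - x * dt, np.sqrt(dt))   # target Euler kernel
             - norm.logpdf(x_new, x, np.sqrt(dt)))         # proposal kernel
    x = x_new

w = np.exp(logw - logw.max())
w /= w.sum()                                # self-normalised importance weights
print(float(np.sum(w * x**2)))              # IS estimate of E[X_T^2] under target
```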

Start time 13:10

BAYESIAN INFERENCE FOR DIFFUSIONS BASED ON EXACT SIMULATION Isadora Antoniano-Villalobos and Prof. Stephen Walker University of Kent, UK Keywords: Univariate diffusions, Exact Simulation, Bayesian non-parametric, Consistency

When a certain phenomenon is modelled by means of a real-valued diffusion process, the model is often stated in terms of a stochastic differential equation. Statistical inference in this context is then aimed at the estimation of parameters appearing in the drift and diffusion coefficients of the SDE. When exact simulation via MCMC is used for Bayesian estimation, the algorithm introduces latent variables which transform the model into a Bayesian non-parametric model. In this framework, we propose a way of using the exact simulation algorithm for Bayesian estimation of the parameters of a specific family of SDEs. We then study the consistency of the resulting posterior densities of the parameters involved when the number of data points of a single diffusion path grows within a fixed time interval.

12.2.8 Session 5d: Probability
Session Room: A1.01 Chair: Duy Pham

Start time 11:30

A NEW BIVARIATE GENERALIZED PARETO MODEL Antonio A. Ortiz Barranon and Stephen Walker University of Kent, UK Keywords: Extreme Value Theory, Generalized Pareto Distribution

Recently, Extreme Value Theory (EVT) has become a well-developed area of research. However, some open problems in the multivariate case remain, since such distributions present more complications, principally in the dependence structure. So far, the bivariate case has been the main focus of multivariate EVT. One of the concepts that underpin this theory is tail dependence, a measure of the dependence between two variables given that one of them is extreme (an empirical sketch follows). Most of the approaches found in the literature deal with the problem via the use of copulas. In the present project, we present a model not based on copulas. We deal with the data using a simple parametric model that leads to easier computation of the tail dependence and that does not involve the difficulties that copula models have shown.
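For concreteness, a minimal sketch of the empirical upper tail dependence λ(u) = P(V > u | U > u) computed from pseudo-observations of an invented dependent pair; the talk's parametric model is not reproduced here.

```python
import numpy as np
from scipy.stats import rankdata

rng = np.random.default_rng(17)
z = rng.standard_t(3, size=(5000, 2))               # heavy-tailed innovations
x = z[:, 0]
y = 0.7 * z[:, 0] + np.sqrt(1 - 0.7**2) * z[:, 1]   # dependent pair (toy)

u = rankdata(x) / (len(x) + 1)                      # pseudo-observations
v = rankdata(y) / (len(y) + 1)
for q in (0.90, 0.95, 0.99):
    lam = np.mean((u > q) & (v > q)) / (1 - q)      # empirical lambda(q)
    print(q, round(float(lam), 3))
```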

Start time 11:55

BACKWARD INDUCTION AND SUBTREE PERFECTNESS Nathan Huntley and Matthias C. M. Troffaes Durham University, UK Keywords: Sequential Decision Making, Backward Induction, Separability, Preference Ordering, Independence Principle, Normal Form Solutions

When studying solutions to sequential decision problems, an important property is subtree perfectness (also called separability and consistency). This states that, roughly, for any subtree of the decision tree, the solution of the subtree equals the subtree of the solution. Commonly, solutions lacking subtree perfectness have the following behaviour: the subject initially wants to choose X if he were to reach node N, but upon reaching N wants to choose Y. This is a significant conflict. Subtree perfectness is, however, a very restrictive property, requiring adherence to a preference ordering and the independence principle. We have found that a weaker form of subtree perfectness, admitting many more possible uncertainty and preference models, can be introduced. This essentially involves relaxing the ordering requirement while maintaining the independence principle. In this talk I will explain why this weakening may be acceptable, and make links with backward induction.

Start time 12:20

ON THE CONVERGENCE OF CONTINUOUSLY MONITORED BARRIER OPTIONS UNDER MARKOV PROCESSES Rui Xin Lee and Dr. Vassili Kolokoltsov University of Warwick, UK Keywords: Barrier options, Markov chains, Feller process, exit probabilities for continuous time Markov chains, infinitesimal generator

We consider a general barrier option for which the expected discounted random cash flow is modelled as

g(ST)I{τA > T} + h(SτA)I{τA ≤ T}, where St, t ≥ 0, is a random price process, IC denotes the indicator of the set C, τA = inf{t ≥ 0 : St ∈ A}, g denotes the non-negative payoff, h denotes the rebate function, and A denotes the knock-out range. Given barrier option prices under a given Feller price process (St)t≥0 equipped with corresponding generator L, Mijatovic and Pistorius (2009) present a novel approximation algorithm: they construct a sequence of finite-state continuous-time Markov chains (X(n)) whose generators are close to L, so that their laws are close to that of (St)t≥0 and their expected payoffs approximate those under S = {St}t≥0. We build on the work of Mijatovic and Pistorius (2009): we study the convergence of such a sequence of finite-state continuous-time Markov chains to S = {St}t≥0 and establish its rates of convergence. A sketch of the approximation idea for the zero-rebate case follows.
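A hedged sketch for the zero-rebate (knock-out) case: restrict an invented birth-death generator to the states inside the knock-out-free region and compute the discounted expected payoff via a matrix exponential. Grid, rates and payoff are toy choices, not the authors' construction.

```python
import numpy as np
from scipy.linalg import expm

S = np.linspace(50, 150, 101)                 # price grid for the chain (toy)
dS = S[1] - S[0]
sigma, r, T = 20.0, 0.02, 1.0

# Simple birth-death generator mimicking diffusion-like moves on the grid
up = (sigma**2 / 2) / dS**2 + r * S / (2 * dS)
dn = (sigma**2 / 2) / dS**2 - r * S / (2 * dS)
Q = np.diag(up[:-1], 1) + np.diag(dn[1:], -1)
np.fill_diagonal(Q, -Q.sum(axis=1))           # rows of a generator sum to 0

alive = S < 130                               # knock-out barrier at 130
Qa = Q[np.ix_(alive, alive)]                  # sub-generator: exit = knock-out
payoff = np.maximum(S[alive] - 100, 0)        # call payoff g, zero rebate
value = np.exp(-r * T) * expm(Qa * T) @ payoff
i0 = int(np.searchsorted(S[alive], 100))
print(value[i0])                              # approximate price at S_0 = 100
```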

Start time 12:45

DISTORTION OF PROBABILITY MODELS
Eva Wagnerova
University of West Bohemia in Pilsen, Czech Republic
Keywords: distortion functions, choice of a model, correction

The choice of a suitable model and its description with a probability distribution is the beginning of every statistical inference. However, the data do not always follow the typical (textbook) probability distributions. A possible solution to this problem is to use a distortion function to correct the model. A distortion function is a non-decreasing mapping of the interval [0, 1] into itself; it is a tool for transforming distribution functions, so it can also be used at the very beginning of the modelling. Some useful modifications of goodness-of-fit tests can be constructed through distortions. In our presentation, we demonstrate some well-known distortion functions and their usage, and we show examples of choosing a suitable distortion function for a given model.
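
As a small illustration (our choice of distortion, not the speaker's), taking g(u) = u^k shows how a distortion reshapes a baseline cdf while keeping it a valid distribution function:

```python
import numpy as np
from scipy.stats import norm

# A distortion g maps [0,1] to [0,1], non-decreasing with g(0)=0 and g(1)=1.
def g(u, k=2.0):
    return u ** k          # one simple choice; shifts probability mass right

x = np.linspace(-4, 4, 9)
F = norm.cdf(x)            # baseline model: standard normal cdf
F_dist = g(F)              # distorted cdf, still a distribution function
for xi, a, b in zip(x, F, F_dist):
    print(f"x={xi:+.1f}  F={a:.3f}  g(F)={b:.3f}")
```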

12.2.9 Session 6a: Sponsors’ Talks
Session Room: MS.01
Chair: Jennifer Rogers

Start time 14:30

THE INTERNATIONAL BIOMETRIC SOCIETY: WHAT CAN IT OFFER TO POSTGRADUATE STUDENTS?
Richard Emsley
International Biometric Society

This talk will introduce the International Biometric Society, which promotes the development and application of statistical and mathematical theory and methods in the biosciences. We discuss how the Society was founded by eminent statisticians of the day, and how it has now evolved into a truly international society. We focus on the opportunities available to postgraduate students within the International Biometric Society, including the FREE student membership, the activities of the British and Irish Region, and details of the 2010 International Biometric Conference taking place in Brazil in December this year.

Start time 15:05

BAYESIAN DESIGN & ANALYSIS OF EXPERIMENTS
Phil Woodward
Pfizer

Bayesian approaches are becoming widely used in the Pharmaceutical Industry, particularly in the earlier stages of drug discovery and development. This talk will present current uses of these methods at Pfizer. It will show how the objectives of studies are quantified using the Bayesian probability concept, and how prior knowledge concerning the efficacy of the compounds being tested is formally used to assess the operating characteristics of the study design. It will also illustrate how more efficient studies have been designed by incorporating the formal use of such prior knowledge in the analysis.

Start time 15:40

AN INTRODUCTION TO FOOTBALL MODELLING AT SMARTODDS
Robert Mastrodomenico
SmartOdds

Sports modelling presents modern statistics with many interesting and complex problems. As well as the challenge of building models with high predictive utility, there is also a computational challenge associated with calibrating the models given the vast data sets now available across a wide range of sports. This talk describes some of the work we do at Smartodds by providing an introduction to football modelling and its associated problems and challenges. We begin by introducing some of the earlier published work in this area, in particular focusing on the use of generalised linear models to model the goals scored by each team in a football match. We discuss a range of modelling challenges that typically arise in the field of sports modelling, such as how to take account of home field advantage, how to allow for the different strengths of teams, and how to describe the variation of team strengths over time. Following this we discuss what it is like to work for Smartodds, and mention some other sports which we are actively researching.
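
To fix ideas, here is a toy version of the independent-Poisson goals model this line of published work builds on; the team strengths and home advantage below are invented, not Smartodds parameters.

```python
import math

# Goals scored by each side modelled as independent Poissons whose means
# combine attack strength, defence weakness and home advantage
# (a toy version of the classic GLM approach; all numbers are made up).
attack  = {"A": 1.3, "B": 0.9}
defence = {"A": 0.8, "B": 1.1}
home_advantage = 1.35

lam_home = home_advantage * attack["A"] * defence["B"]   # A at home to B
lam_away = attack["B"] * defence["A"]

def pois(lam, k):
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Probability of a home win: sum over scorelines up to a sensible cap.
p_home = sum(pois(lam_home, h) * pois(lam_away, a)
             for h in range(11) for a in range(11) if h > a)
print(round(p_home, 3))
```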

12.2.10 Session 6b: Sponsors’ Talks
Session Room: MS.04
Chair: Mouna Akacha

Start time 14:30

MAKING DECISIONS WITH CONFIDENCE - STATISTICS THE SHELL WAY
Wayne Jones
Shell

Shell’s Statistics and Chemometrics group provides research and consultancy services in data analysis and visualisation, statistical modelling, experimental design and statistical software tool development to many Shell businesses in the fields of commerce, finance, process development and product development. The group, based in Amsterdam, Chester and Houston, serves clients worldwide.

Start time 15:05

AN INTRODUCTION TO AHL
Martin Layton
AHL, Man Group PLC

In this presentation I will talk about AHL, a quantitative hedge fund with a 20-year track record of profitably trading financial markets using model-based, systematic approaches. After introducing AHL, I will walk through the process of creating and evaluating a simple trading system. Finally, time permitting, I will talk through some of the current areas of research within our group.

12.2.11 Session 6c: Sponsors’ Talks
Session Room: MS.05
Chair: Flávio B Gonçalves

Start time 15:05

SUPPORT FROM THE RSS AND THEIR YOUNG STATISTICIANS SECTION
Helen Thornewell
Young Statisticians Section

The Royal Statistical Society is the professional body for statistics and statisticians in the UK. The presentation will remind you about the different memberships available as well as courses and qualifications on offer to support YOU. In particular, information will be presented about the RSS Young Statisticians Section, including its aims & objectives, a summary of successes since its official launch at the start of 2009, ways to get involved and adverts for upcoming events. Come and find out more about YOUR section!

Start time 15:40

OPPORTUNITIES IN PROBABILITY AND STATISTICAL MODELLING AT LLOYDS BANKING GROUP DECISION SCIENCE
Bill Fite
Lloyds Banking Group

‘There are no problems, only opportunities’ - Jacques Benacin

13 Poster Abstracts by Author

NONPARAMETRIC PREDICTIVE INFERENCE FOR SYSTEM FAILURE TIME
Abdullah Al-Nefaiee
Durham University, UK
Keywords: Lower and upper probabilities, Nonparametric predictive inference, System reliability

Nonparametric predictive inference (NPI) is a recently developed statistical framework which makes few modelling assumptions and uses lower and upper probabilities to quantify uncertainty. Throughout, we consider the use of NPI to predict the reliability of systems, given failure times of tested components which are exchangeable with the components used in the system considered. We present some main ideas; these ideas are illustrated and discussed via examples. We also include a brief outline of the main research challenges.
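
A minimal sketch of the flavour of NPI for a single future observation, using Hill's A_(n) assumption (the failure times below are hypothetical; the poster's system-level bounds are more involved):

```python
import numpy as np

def npi_survival(data, t):
    """NPI lower and upper probabilities for X_{n+1} > t, given n exchangeable
    failure times, under Hill's A_(n) assumption (a sketch of the idea only).
    A_(n) puts probability 1/(n+1) on each inter-observation interval."""
    n = len(data)
    j = np.sum(np.asarray(data) <= t)   # observations at or below t
    return (n - j) / (n + 1), (n - j + 1) / (n + 1)

times = [17., 29., 44., 61., 80.]       # hypothetical component failure times
print(npi_survival(times, 50.0))        # (lower, upper) = (0.333..., 0.5)
```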

A COMPARISON OF BAYESIAN SPACE-TIME MODELS FOR OZONE CONCENTRATION LEVELS
Khandoker Shuvo Bakar
School of Mathematics, University of Southampton
Keywords: Space-time modelling, ozone concentrations, auto-regressive model, dynamic linear model, Bayesian spatial prediction

Recently, there has been a surge of interest in space-time modelling of ozone concentration levels. Well known time series modelling methods, such as dynamic linear models (DLM) and auto-regressive (AR) models, are being used together with Bayesian spatial prediction (BSP) methods adapted for dynamic data. As a result, practitioners in this field often face a daunting task of selecting among these methods. This paper presents a study comparing three approaches: the DLM approach of Huerta et al. (2004), the BSP method as described by Le and Zidek (2006), and the AR models proposed by Sahu et al. (2007). Recent theoretical results (Dou et al., 2009) comparing the first two approaches are extended to include the AR models. The results are illustrated with a realistic numerical simulation example using information regarding the locations of the ozone monitoring sites and observed ozone concentration levels in the state of New York during June and July of 2005-2006. The speed of computation, the availability of high-level software packages for implementing the methods, and the practical difficulties of using the methods for large space-time data sets are also investigated.

BIAS IN MENDELIAN RANDOMIZATION FROM WEAK INSTRUMENTS
Stephen Burgess and Simon G. Thompson
MRC Biostatistics Unit, University of Cambridge
Keywords: Genetic epidemiology, Mendelian randomization, Causality, Weak instruments, Finite sample bias

A common epidemiological question of interest is whether an observed correlation between a risk factor and a disease is a true causal association. Mendelian randomization is a technique for determining the causal association between a risk factor and an outcome in the presence of several possibly unmeasured confounders. A genetic variant is sought by means of which, under certain assumptions, a causal association can be estimated. However, even when the necessary underlying assumptions are valid, estimates from analyses using genetic variants which are not strongly associated with the risk factor are biased. This bias, which acts in the direction of the observational association between risk factor and disease, may, if not correctly acknowledged, convince a researcher that an observed observational association is causal when in fact there is no true association.
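
The direction of this bias is easy to reproduce by simulation. The sketch below compares the simple ratio (Wald) IV estimator under a strong and a weak instrument, with a true causal effect of zero and positive confounding; all parameter values are made up.

```python
import numpy as np

rng = np.random.default_rng(7)
n, sims = 500, 2000
beta = 0.0                      # true causal effect: none

for gamma in (0.5, 0.05):       # strong vs weak instrument
    est = []
    for _ in range(sims):
        g = rng.binomial(2, 0.3, n)              # genetic variant
        u = rng.standard_normal(n)               # unmeasured confounder
        x = gamma * g + u + rng.standard_normal(n)   # risk factor
        y = beta * x + u + rng.standard_normal(n)    # outcome
        # ratio (Wald) estimator: cov(G, Y) / cov(G, X)
        est.append(np.cov(g, y)[0, 1] / np.cov(g, x)[0, 1])
    # Weak gamma: the median estimate drifts towards the (positive)
    # confounded observational association, despite beta = 0.
    print(f"gamma={gamma}: median IV estimate = {np.median(est):+.3f}")
```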

USING DYNAMIC STAGED TREES FOR DISCRETE TIME SERIES DATA: ROBUST PREDICTION, MODEL SELECTION AND CAUSAL ANALYSIS
Guy Freeman and Jim Q. Smith
University of Warwick, Coventry, UK
Keywords: Staged trees, Bayesian model selection, Bayes factors, forecasting, discrete time series, causal inference, power steady model, multi-process model

The class of chain event graph models is a generalisation of the class of discrete Bayesian Networks, retaining most of the structural advantages of the Bayesian Network for model interrogation, propagation and learning, while more naturally encoding asymmetric state spaces and the order in which events happen. We demonstrate here how, with complete sampling, conjugate closed-form model selection based on product Dirichlet priors is possible for this class of models. We demonstrate our techniques using a simple educational example, and go on to discuss possible future enhancements to and applications of this model class.

FINDING CHANGEPOINTS IN A GULF OF MEXICO HURRICANE HINDCAST DATASET
Rebecca Killick (1), Idris Eckley (1), Kevin Ewans (2) and Philip Jonathan (3)
(1) Maths & Stats, Lancaster University
(2) Shell International Exploration & Production, Netherlands
(3) Shell Technology Centre Thornton, Chester
Keywords: Changepoints, Likelihood, Schwarz Information Criterion, Bayesian Information Criterion, GOMOS

Statistical changepoint analysis is used to detect changes in variability within GOMOS hindcast time series for significant wave heights of storm peak events across the Gulf of Mexico for the period 1900-2005. To detect a change in variance, the two-step procedure consists of (1) validating model assumptions per geographic location, followed by (2) application of a penalised likelihood changepoint algorithm. Results suggest that the most important changes in time-series variance occur in 1916 and 1933 at small clusters of boundary locations at which, in general, the variance reduces. No post-war changepoints are detected. The changepoint procedure is readily applied to other environmental time series.
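
A stripped-down version of step (2) for a single change in variance, using a Gaussian likelihood with an SIC-style penalty (our simplification; the study itself handles many locations and multiple changepoints):

```python
import numpy as np

def variance_changepoint(x):
    """Single change-in-variance by penalised Gaussian likelihood (SIC).
    Returns (penalised criterion, changepoint index or None)."""
    n = len(x)
    def nll(seg):                       # -2 log likelihood, zero-mean Gaussian
        return len(seg) * np.log(np.mean(seg ** 2))
    no_change = nll(x) + np.log(n)      # one variance parameter
    best = min((nll(x[:k]) + nll(x[k:]) + 3 * np.log(n), k)
               for k in range(30, n - 30))
    return best if best[0] < no_change else (no_change, None)

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 2.0, 300), rng.normal(0, 1.0, 300)])
print(variance_changepoint(x))          # detects a changepoint near index 300
```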

ON THE CONVERGENCE OF CONTINUOUSLY MONITORED BARRIER OPTIONS UNDER MARKOV PROCESSES
Rui Xin Lee and Dr. Vassili Kolokoltsov
University of Warwick, UK
Keywords: Barrier options, Markov chains, Feller process, exit probabilities for continuous time Markov chains, infinitesimal generator

We consider a general barrier option whose expected discounted random cash flow is modelled as

$g(S_T)\,I_{\{\tau_A > T\}} + h(S_{\tau_A})\,I_{\{\tau_A \leq T\}}$,

where $S_t$, $t \geq 0$, is a random price process, $I_C$ denotes the indicator of the set $C$, $\tau_A = \inf\{t \geq 0 : S_t \in A\}$, $g$ denotes a non-negative payoff function, $h$ denotes a rebate function, and $A$ denotes the knock-out range. Given barrier option prices under a given Feller price process $(S_t)_{t \geq 0}$ equipped with corresponding generator $L$, Mijatović and Pistorius (2009) present a novel approximation algorithm, constructing a sequence of finite-state continuous-time Markov chains $(X^{(n)})$ whose generators are close to $L$, whose laws are close to that of $(S_t)_{t \geq 0}$, and whose expected payoffs approximate those under $S = \{S_t\}_{t \geq 0}$. We build on the work of Mijatović and Pistorius (2009): we study the convergence of such a sequence of finite-state continuous-time Markov chains to $S = \{S_t\}_{t \geq 0}$ and establish its rates of convergence.

MULTI-ARMED BANDIT WITH REGRESSOR PROBLEMS
Benedict May and Dr. David Leslie
University of Bristol, UK
Keywords: Bandit Problem, Reinforcement Learning, Linear Regression, Nonparametric Regression

The multi-armed bandit problem is a simple example of the exploitation/exploration trade-off generally inherent in reinforcement learning problems. An agent is tasked with learning from experience how to make decisions sequentially in order to maximise average reward. In the extension considered, the agent is presented with a regressor before making each decision. The agent has to balance the tendency to explore apparently sub-optimal actions (in order to improve regression function estimates) against the tendency to exploit the current estimates (in order to maximise reward). Study of several past approaches to similar problems has indicated particular desirable properties for the policy used. These properties motivate the choice and study of the algorithm that features in this work. The theoretical properties of the algorithm have been studied, and it has been tested on both linear and nonparametric regression problems. The intuitive algorithm has useful convergence properties and, compared to many conventional methods, performs well in simulations.
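
The abstract does not spell out its algorithm, so the sketch below shows a generic member of the same family: epsilon-greedy with per-arm ridge regression on the regressor and a decaying exploration rate. All dimensions and coefficients are invented.

```python
import numpy as np

rng = np.random.default_rng(3)
d, arms, T = 2, 2, 5000
# Hypothetical true reward coefficients for each arm.
theta = np.array([[1.0, -1.0], [0.0, 1.5]])
A = [np.eye(d) for _ in range(arms)]     # X'X + I per arm (ridge)
b = [np.zeros(d) for _ in range(arms)]   # X'y per arm

for t in range(1, T + 1):
    x = rng.standard_normal(d)                     # the regressor
    est = [np.linalg.solve(A[a], b[a]) @ x for a in range(arms)]
    eps = t ** -0.5                                # decaying exploration rate
    a = rng.integers(arms) if rng.random() < eps else int(np.argmax(est))
    r = theta[a] @ x + rng.standard_normal()       # observed reward
    A[a] += np.outer(x, x)                         # update regression stats
    b[a] += r * x

# The per-arm estimates should be close to the true coefficients.
print([np.linalg.solve(A[a], b[a]).round(2) for a in range(arms)])
```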

ADAPTIVE ANALYSIS AND DESIGN OF MULTIVARIATE NORMAL RESPONSE STUDY WITH APPLICATION IN FMRI STUDIES
Giorgos Minas, Dr. F. Rigat, Dr. J. Aston, Prof. N. Stallard and Dr. T. Nichols
Department of Statistics, University of Warwick
Keywords: Multivariate Normal Distribution, Power, prior/posterior distribution, Monte Carlo approximation

We propose a two-stage adaptive design for a study with a multivariate normal response where an overall effect is far more important than local effects. A linear combination of the marginals of the second-stage response is the main endpoint. The weights of the linear combination are chosen using the pilot data of the first stage such that power is maximised. Power is defined as the expectation of the rejection probability for the z-test (or t-test) of the linear combination, where the expectation is taken over the posterior distribution of the mean (and variance, if unknown) of the multivariate response. The analytic expression for the optimal weighting under an identifiability constraint is given. The power under the optimal weighting is approximated using Monte Carlo approximation, and sample size requirements for the two stages are provided. Application in fMRI studies is explored.
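
One standard version of this optimisation (not necessarily the authors' exact constraint) maximises the noncentrality of the z-test over weight vectors w, for response mean mu and covariance Sigma:

```latex
% By Cauchy-Schwarz applied to \Sigma^{1/2}w and \Sigma^{-1/2}\mu,
% the noncentrality of the z-test of w'\mu = 0 is bounded, with equality
% exactly when w is proportional to \Sigma^{-1}\mu:
\[
  \delta(w) \;=\; \frac{w^{\top}\mu}{\sqrt{w^{\top}\Sigma w}}
  \;\le\; \sqrt{\mu^{\top}\Sigma^{-1}\mu},
  \qquad
  w^{\star} \;\propto\; \Sigma^{-1}\mu .
\]
```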

BAYESIAN ANALYSIS IN MULTIVARIATE DATA
Rofizah Mohammad and Dr. Karen Young
University of Surrey, UK
Keywords: Model choice, Bayes factors, Influential observations

In this study, we consider Bayesian model selection in multivariate normal data using the well-known Bayes factor. Standard improper priors are used for the model parameters, and the device of imaginary observations is used to determine the ratio of unspecified constants in the Bayes factors. We discuss a few different models. The diagnostic k_d is used to assess the influence of observations on model choice based on the Bayes factor method. The calculations are illustrated using simulated data and the Iris data set.

MINKOWSKI FUNCTIONAL IN IMAGE ANALYSIS
Noratiqah Mohd Ariff and Dr. Elke Thönnes
University of Warwick, UK
Keywords: Minkowski functional, Boolean model

Various lung diseases, such as emphysema or pulmonary fibrosis, lead to structural deformations in lung tissue. These become apparent as textural changes in high resolution CT scans of the lung. One natural set of descriptors that may be used to quantify textural changes are the so-called Minkowski functionals, or intrinsic volumes, from integral geometry. These are related to more commonly known measures of shape, curvature and connectivity. In this work, methods of computing the Minkowski functionals from digital images are discussed, and their accuracy is tested via standard models in stochastic geometry for which the mean Minkowski functionals are already known analytically.

EXACT DISTRIBUTIONS AND SEQUENTIAL MONTE CARLO FOR CHANGE POINTS
Christopher Nam, John Aston and Adam Johansen
Department of Statistics, University of Warwick, UK
Keywords: Change Point analysis, Hidden Markov Models, Finite Markov Chain Imbedding, Sequential Monte Carlo Samplers

Quantifying the uncertainty in the locations of change points is a topic of increasingly significant interest, with various application areas including economics and genetics. This poster will review an existing methodology for calculating change point distributions using general finite-state Hidden Markov Models (HMMs) for a sequence of data. A change point is defined to have occurred when a run of a particular state has occurred consecutively for at least a desired number of time periods. This methodology generates exact distributions for the locations of change points for particular parameter values using Finite Markov Chain Imbedding (FMCI). The use of FMCI extends the original posterior Markov chain to a new Markov chain, such that the progress of any particular run can also be recorded within the state space. This ultimately allows the probability distribution function to be characterised completely, without requiring any asymptotic arguments or being influenced by sampling error. However, as the parameter estimates are themselves subject to uncertainty, the methodology is extended to generate samples from the parameter distributions using Sequential Monte Carlo (SMC). This in turn allows a more complete characterisation of the distribution of change points to be computed. The extended methodology benefits from the use of exact conditional distributions within the SMC, and is thus computationally more efficient than other approaches in which state estimates for each time point are required.

SAMPLE SIZE RE-ESTIMATION IN CLINICAL TRIALS WITH MULTIPLE ENDPOINTS
Ives Ntambwe, Tim Friede and Nigel Stallard
Warwick Medical School, University of Warwick, UK
Keywords: Bonferroni, multiple endpoints, sample size re-estimation, familywise error rate

The choice of an appropriate sample size is a main concern in the design of any clinical trial. In the planning stage of a trial, one is often quite uncertain about the sizes of the parameters or the assumptions needed for sample size calculations. The idea of this project is to explore the use of designs that allow checking of these assumptions and adjustment of the sample size if necessary. Designs with sample size re-estimation, also called designs with an internal pilot study (IPS), are conducted to examine assumptions regarding the nuisance parameters. Multiple endpoints are not uncommon in clinical research; one example is the use of a test battery in schizophrenia. The analysis is complicated in the presence of multiple endpoints, and special techniques are needed to control the Type I error rate because hypotheses are tested for various endpoints. This project aims to bring together the concept of designs with sample size re-estimation and the methodology for dealing with multiple endpoints. This will provide an extension of the current methodology for single-endpoint sample size re-estimation to the multiple outcomes setting. Preliminary results have been based on the use of a Bonferroni correction and show that, despite misspecification of the nuisance parameters at the planning stage, the power is maintained when performing sample size re-estimation.

MODELLING AIR POLLUTION AND ITS RELATIONSHIP TO HEALTH
Oyebamiji Oluwole, Dr. Alison Gray and Prof. Chris Robertson
Mathematics and Statistics, University of Strathclyde, Glasgow, UK
Keywords: Air pollution, Spatial and temporal modelling, Time series

Atmospheric pollution is any substance capable of altering the natural composition of air and causing harm to both humans and their environment. The adverse effects of airborne pollutants upon human health have been well established. The aim of the current work is to model sulphur dioxide (SO2) levels in Scotland and to relate these to health. The method we are adopting is to concentrate on systematic trend surfaces, which provide basic descriptions of the patterns of the data both spatially and temporally, by incorporating these two attributes in a generalized additive regression model. The study uses SO2 data from 41 stations monitoring air pollutants over Scotland, obtained from the UK Air Quality Archive data website (www.airquality.co.uk/data). The data used cover the years 1996-2007, comprising 3653 days in the entire study period, and represent daily mean SO2 concentrations. We also have data on the geographical locations of the sites (Easting and Northing). There are missing data, and not all stations have measurements for all the years. Descriptive analysis of the data has been carried out, and the results of investigating ARMA and ARIMA models for time series modelling and imputation of missing data will be shown. Further modelling will involve the use of generalized additive models to incorporate the data attributes of both space and time, before linking the modelled SO2 levels to variables concerning human health.

INFERENCE ABOUT THE RATIO OF TWO NORMAL MEANS FOR PAIRED OBSERVATIONS
Francisco J. Rubio
University of Warwick, UK
Keywords: Normal Ratio Distribution, Paired Observations, Reference Analysis

In order to make inferences about the ratio of Normal means β in the case of paired independent Normal random variables (X, Y), appropriate statistical models have been given in the statistical literature. However, in other scientific disciplines, such as Cytometry, Physiology and Medicine, the distribution of the corresponding ratio of the Normal variables, or a Normal approximation to this distribution, is used to estimate β. It has been reported (Merrill (1928), Marsaglia (1965), Kuete et al. (2000)) that the distribution of the ratio of two independent Normal variables Z = X/Y can be bimodal, asymmetric, symmetric, or similar to a Normal distribution under certain conditions on the parameters. These conditions have been established through simulations and empirical results. In the literature reviewed, there is a lack of assessment of the error made by these procedures. The goal of the present work is to quantify and characterize this error in terms of certain conditions on the parameters of the Normal variables. In addition, a result about the existence of a Normal approximation to the distribution of Z when the means of X and Y are positive is presented. Finally, the reference posterior distribution of the ratio of two positive Normal means is analysed.
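
The bimodality is easy to see by simulation. In this sketch (our own toy parameter values), the denominator mean is small relative to its standard deviation, and a crude text histogram of Z = X/Y shows two modes:

```python
import numpy as np

rng = np.random.default_rng(5)
# Z = X/Y with Y's mean close to zero relative to its spread: bimodal ratio.
x = rng.normal(1.0, 1.0, 100_000)
y = rng.normal(0.5, 1.0, 100_000)
z = x / y
hist, edges = np.histogram(z[np.abs(z) < 10], bins=9)
for h, lo, hi in zip(hist, edges, edges[1:]):
    print(f"[{lo:+5.1f},{hi:+5.1f}) {'#' * (60 * h // hist.max())}")
# With mu_Y well away from 0 (e.g. y ~ N(5, 1)) the same histogram looks
# close to normal, which is the approximation regime discussed above.
```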

WAVELET METHODS FOR BRAIN IMAGING ANALYSIS
Yiqin Shen and Dr. J.A.D. Aston
Department of Statistics, University of Warwick
Keywords: Wavelet, Brain Imaging Analysis, fMRI, pre-whitening

The human brain can now be studied using neuroimaging techniques such as functional Magnetic Resonance Imaging (fMRI). fMRI data is four dimensional: three spatial dimensions and one temporal dimension. When modelling the time dimension, the traditional approach is to estimate linear model parameters using least squares model fitting. An alternative is proposed in the following steps. First, apply a wavelet transform to the data in the spatial dimensions. Second, on the coarsest wavelet coefficients, estimate parameters using standard linear regression; these parameters can then be used to construct a prior for estimating the parameters in the rest of the hierarchical wavelet structure by Bayesian linear regression. A normal prior, scaled using the previous parameter estimates, is used: as the noise increases at higher resolutions, the Bayesian framework bounds the estimates and smoothly shrinks the estimated parameters towards zero (helping remove noise). Third, apply the inverse wavelet transform to the estimated parameters and thus obtain the parameters for all locations in the original image space. The errors are not independent in time, so it is necessary to pre-whiten the data and design matrices using the autocorrelation of each time series. However, when using the wavelet method with different autocorrelations in each time series model, the design matrices of each spatial location need to be kept identical. Thus, we examine through simulation an approximation using a global value of the autocorrelation for the design matrices, and apply this to real data.

14 RSC 2011: Cambridge University

34th Research Students’ Conference in Probability and Statistics

Cambridge

4th - 7th April 2011

Centre for Mathematical Sciences

[email protected]

15 Sponsors’ Advertisements


This year the conference is being financially supported by CRiSM. CRiSM (Centre for Research in Statistical Methodology) is an EPSRC-supported initiative to build capacity in Statistics within the UK. It lives within the Statistics Department at Warwick, and funds three academic positions, five postdoctoral research associates and many PhD students. In addition, it organises many workshops and conferences and has an energetic visitor programme. Its director is Gareth Roberts. Further information about its activities can be found at http://www2.warwick.ac.uk/fac/sci/statistics/crism/

Forthcoming CRiSM Workshops:

CRiSM Workshop: Continuous-time and continuous space processes in ecology Tue, Apr 20, '10

Runs from Tuesday, April 20 to Wednesday, April 21.

CRiSM Workshop: Model Uncertainty Sun, May 30, '10

Runs from Sunday, May 30 to Tuesday, June 01.

CRiSM Workshop: Orthogonal Polynomials and Application in Statistics and Stochastic Processes Mon, Jul 12, '10

Runs from Monday, July 12 to Thursday, July 15.

CRiSM Workshop: InFer (Inference for Epidemic-related Risk) Mon, Mar 28, '11

Runs from Monday, March 28 to Friday, April 01.

Innovation in Sports Modelling

Statistical Modelling Vacancies

ATASS Sports is a leading statistical research consultancy business providing clients with high-quality sports models and predictions.

We are currently looking to fill a range of full-time positions to work as part of our existing and planned research teams. We have both senior and junior positions available for applied statisticians, mathematical modellers, database managers and IT support staff. Generous salary and benefits packages are the norm and depend upon position, qualifications and experience. All posts will be based at our newly developed office complex on the Exeter business park. Our new recruits will work within close-knit multi-skilled teams to model a variety of sports, obtaining and incorporating real-time information, and developing and applying novel statistical and mathematical modelling techniques.

The closing date for this round of appointments is April 22nd 2010. Applications or requests for further information should be addressed to Steve Brooks via [email protected]. Additional information can also be found on our web site, www.atassltd.co.uk.

ATASS is committed to equality and values diversity. We welcome applications from all suitably qualified individuals - see web site for details.

www.atassltd.co.uk


Get the benefits of a City career whilst working alongside world-class academics

Work for AHL in Oxford For more information visit www.ahl.com

Candidates must have a PhD or equivalent qualification in a quantitative discipline


Want to extend your reach? Join our Statistics group at Pfizer, and you’ll help one of the world’s largest pharmaceutical companies with the discovery, development and trial of new molecules and compounds that improve lives. Then play a vital role in developing new life-saving, life-enhancing drugs for people worldwide.

Statistics and Chemometrics
Making decisions with confidence

About us

The Statistics and Chemometrics team includes statisticians, data analysts, chemometricians and modellers who help clients in the commerce, finance, process and product development industries to develop better business solutions.

The team draws on Shell Group experience of providing cutting-edge consultancy, software, innovation and training for more than 30 years to serve clients worldwide from bases in the UK, the Netherlands and the USA.

Website: http://www.shell.com/globalsolutions/statisticsandchemometrics

Email: [email protected]

Process Solutions

- Statistical process modelling
- Process solutions and software for optimising performance and operating cost, e.g. pre-heat units
- Process software for assuring integrity of pipework
- Tools and techniques to emulate and optimise process conditions

Chemometrics

- Process Analytical Chemistry using advanced spectroscopy with multivariate calibration models, e.g. for MOGAS blending
- Advanced Process Monitoring using multivariate statistical techniques, e.g. Dynamic Chemical Processes
- Enhanced Experimentation, e.g. catalyst characterisation: Electron Microscopy, X-Ray Analysis, Kernels, Multivariate Analysis

Training and Software

- Customised statistical training courses - statistics and design of experiments training
- Customised training on use of specialist software tools

Business Solutions

- Statistical forecasting
- Decision tools on Carbon management
- Risk and uncertainty modelling
- Benchmarking advanced data analysis

Product Development

- Supporting product development in fuels, lubricants, chemicals: e.g. vehicle testing, emission testing
- Designing experiments to provide evidence to support marketing claims
- Analysing data to help understand effects
- Collaborating with product teams to provide ongoing support

Shell Global Solutions is a network of independent technology companies in the Shell Group. In this case study, the expressions ‘Shell’ and ‘Shell Global Solutions’ are sometimes used for convenience where reference is made to these companies in general, or where no useful purpose is served by identifying a particular company. The information contained in this material is intended to be general in nature and must not be relied on as specific advice in connection with any decisions you may make. Shell and Shell Global Solutions are not liable for any action you may take as a result of you relying on such material or for any loss or damage suffered by you as a result of you taking this action. Furthermore, these materials do not in any way constitute an offer to provide specific services. Some services may not be available in certain countries or political subdivisions thereof. Photographs are from various installations. Copyright © 2008 Shell Global Solutions International BV. All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including by photocopy, recording or information storage and retrieval system, without permission in writing from Shell Global Solutions International BV.


Sign up to our mailing list to win a Sony e-Reader. Carry 100s of electronic books on a slim, lightweight digital Sony e-book reader and access your essential Wiley collection wherever and whenever you want. We’re offering a fantastic opportunity to win a Sony eBook reader*. For your chance to win, simply sign up to our mailing list at www.wiley.com/go/win. Entrants will be entered into an additional monthly draw for the chance to win a £100 Wiley voucher to spend on Wiley books.

*Full terms and conditions can be found at www.wiley.com/go/win

www.wiley.com/go/win


20% conference discount on selected Probability and Statistics titles. For a limited time only!

To view all the titles included in this offer please visit www.cambridge.org/RSC2010

Coming this summer: Bayesian Decision Analysis: Principles and Practice, Jim Q. Smith. Hardback 9780521764544 | GBP c. 35.00

www.cambridge.org


RSS 2010 International Conference, Brighton, 13-17 September. A forum for presentation and discussion of methodological developments and areas of application for statisticians and users of statistics.

Topics include: climate change modelling; trust in official statistics; 2011 census; measuring progress; adaptive clinical trials; performance indicators; composite likelihood; MCMC; data capture; risk; statistical literacy.

Key dates: 19 April, extended deadline for RSC attendees; 24 May, deadline for grant and bursary applications; 7 June, deadline for ‘double discount’ registration; 9 August, deadline for discounted registration; 8 September, deadline for pre-event registration.

Confirmed speakers: Tim Davis (Jaguar Land Rover); Peter Donnelly (University of Oxford); Nancy Reid (University of Toronto).

An opportunity for statisticians, analysts, researchers and other users of statistics from all sectors to share current research and insights. www.rss.org.uk/rss2010

16 RSC History

34 2011 Cambridge
33 2010 Warwick
32 2009 Lancaster
31 2008 Nottingham
30 2007 Durham
29 2006 Glasgow
28 2005 Cambridge
27 2004 Sheffield
26 2003 Surrey
25 2002 Warwick
24 2001 Newcastle
23 2000 Cardiff and University of Wales
22 1999 Bristol
21 1998 Lancaster
20 1997 Glasgow
19 1996 Southampton
18 1995 Oxford
17 1994 Reading
16 1993 Lancaster
15 1992 Nottingham
14 1991 Newcastle
13 1990 Bath
12 1989 Glasgow
11 1988 Surrey
10 1987 Sheffield
(9)
(8) 1985 Oxford
(7) 1984 Imperial College
(6)
(5)
(4)
(3) 1982 Cambridge
(2) 1981 Bath
(1) 1980 Cambridge

Table 1: RSC History

17 Delegate List

Name | Institution | Year | Email | Research Interests

Ahmad Mohammad Aboalkhair | Durham University | 2 | [email protected] | Nonparametric Predictive Inference (NPI) - System Reliability
Holly Ainsworth | Newcastle University | 1 | [email protected] | Bayesian Modelling for Ecology
Mouna Akacha | University of Warwick | 3 | [email protected] | Longitudinal Data, Missing Data, Non-Linear Mixed Models
Faiza Ali | Durham University | 2 | [email protected] | Bayes linear statistics
Abdullah H. Al-nefaiee | Durham University | 2 | [email protected] | Nonparametric predictive inference for system failure time
Muhannad F. K. Al-saadony | University of Plymouth | 1 | [email protected] | Stochastic Integral and Application to Finance
Osvaldo Anacleto-Junior | The Open University | 1 | [email protected] | Time Series, Bayesian Forecasting, Traffic Modelling
Isadora Antoniano-Villalobos | University of Kent | 2 | [email protected] | Bayesian inference for Markov continuous real-valued processes
Noratiqah Mohd Ariff | University of Warwick | 1 | [email protected] | Statistical Image Analysis
Louis JM Aslett | Trinity College Dublin | 2 | [email protected] | Bayesian inference and reliability theory
Nuri Badi | Newcastle University | 1 | [email protected] | Generalized linear models
Khandoker Shuvo Bakar | University of Southampton | 2 | [email protected] | Environmental Modelling, Bayesian Analysis, Spatio-temporal Modelling
Antonio Armando Ortiz Barranon | University of Kent | 2 | [email protected] | Extreme Value Theory
Paul Barry | Trinity College Dublin | 1 | [email protected] | Bayesian Inference
Alex Berriman | University of Liverpool | 2 | [email protected] | Epidemiology
Arnab Bhattacharya | Trinity College Dublin | 2 | [email protected] | Bayesian inference and sequential methods
Sakyajit Bhattacharya | University College Dublin | 2 | [email protected] | Linear models, deletion diagnostics
Ms Sujunya Boonpradit | University of Sheffield | 1 | stp09sb@sheffield.ac.uk | Bayesian statistics
Philippa Burdett | University of Leeds | 1 | [email protected] | Bioinformatics
Stephen Burgess | MRC Cambridge | 2 | [email protected] | Mendelian randomization, Causal inference, Literary works of L.N. Tolstoy
Simon Byrne | Statistical Laboratory | 2 | [email protected] | Graphical models, Bayesian statistics
Alberto Caimo | University College Dublin | 2 | [email protected] | Statistical network analysis, MCMC methods, Bayesian statistics
Joe Cainey | University of Bristol | 1 | [email protected] | Monte Carlo, MCMC, Adaptive MC, Sequential MC
Jonathan Cairns | University of Cambridge, Dept of Oncology | 1 | [email protected] | Computational Biology, Markov Chain Monte Carlo
Sohail Chand | University of Nottingham | 3 | [email protected] | Time series, Regression analysis, Bootstrap methods
Thomas Dessain | Durham University | 1 | [email protected] | Applied Probability, discrete and continuous time Markov processes
Cara Dooley | NUI, Galway | 1 | [email protected] | Survival Analysis, Frailty Models
Susan Doshi | University of Bath | 2 | [email protected] | Image analysis, cone-beam CT, image-guided radiotherapy
Fadlalla Elfadaly | The Open University | 2 | [email protected] | Bayesian Statistics, Subjective Prior Elicitation
Eleni Elia | University of Nottingham | 1 | [email protected] | Modelling hospital superbugs
Hisham Abdel Hamid Elsayed | University of Southampton | 3 | [email protected] | Survival Analysis
Marina Evangelou | MRC Cambridge | 1 | [email protected] | Statistical genetics and Bioinformatics, Genome-wide association analysis and pathway genome-wide analysis
Felicity Kim Evison | University of Durham | 1 | [email protected] | Public Health Statistics, Medical, Statistical Genetics
Sean Ewings | University of Southampton | 2 | [email protected] | Diabetes
Christopher Fallaize | University of Leeds | 3 | [email protected] | Statistical shape analysis, structural bioinformatics
Aisha Fayomi | University of Nottingham | 3 | [email protected] | Multivariate Analysis
Verity Fisher | University of Southampton | 1 | [email protected] | Experimental Design
Ashley Parry Ford | University of Warwick | 2 | [email protected] | MCMC, Epidemics
Anna Fowler | Imperial College London | 1 | [email protected] | Bayesian statistics; statistical genetics
Guy Freeman | University of Warwick | 4 | [email protected] | Bayesian statistics, causality, graphical models
Jotham Gaudoin | University of Southampton | 1 | [email protected] | Bayesian modelling for binary data
Isabella Gollini | University College Dublin | 2 | [email protected] | Model based clustering, Mixture models, Network models
Flávio B Gonçalves | University of Warwick | 3 | [email protected] | Bayesian inference for diffusions
Aimee Gott | Lancaster University | 1 | [email protected] | Wavelets
Seungjin Han | University of Sheffield | 1 | [email protected] | Bayesian, Time Series, Statistical Arbitrage
Siti Rahayu Mohd Hashim | The University of Sheffield | 2 | stp08sm@sheffield.ac.uk | Multivariate quality control
Siew Wan Hee | University of Warwick | 2 | [email protected] | Adaptive Bayesian design, Phase II trial
Bryony Hill | University of Warwick | 3 | [email protected] | Spatial Statistics, MCMCs, Stochastic Geometry
Kirsty Hinchliff | Durham University | 1 | [email protected] | Info-Gap Decision Theory
Nathan Huntley | Durham University | 3 | [email protected] | Foundations of Decision Theory, Imprecise Probability
Alberto Alvarez Iglesias | National University of Ireland, Galway | 2 | [email protected] | Regression Trees, Classification Trees and Survival Trees
Amin Jamalzadeh | Durham University | 3 | [email protected] | Data Mining Techniques; Bayesian Analysis; Data Visualization
Joao Jesus | University College London | 3 | [email protected] | Inference Without Likelihood
Emma Jones | University of Sheffield | 2 | [email protected] | Dendrochronology
Chaitanya Joshi | Trinity College Dublin | 3 | [email protected] | Bayesian Modelling, Bayesian Inference for Diffusion Processes
Oyebamiji Oluwole Kehinde | University of Strathclyde | 1 | [email protected] | Spatio-temporal modelling of air pollution
Emma Kershaw | University of Bristol | 1 | [email protected] | Applied Probability, Population Genetics, Statistical Genetics
Mudakkar Mnas Khadim | Queen Mary University of London | 2 | [email protected] | Design of experiments
Md. Hasinur Rahaman Khan | University of Warwick | 2 | [email protected] | Bayesian Statistics, Biostatistics, Social Statistics
Mahmuda Khatun | University of Strathclyde | 2 | [email protected] | Statistics in Image Processing
Rebecca Killick | Lancaster University | 2 | [email protected] | Wavelets, Changepoints, Non-stationary Time Series
Jennifer Helen Klapper | University of Leeds | 2 | [email protected] | Wavelets, Vaguelette-Wavelets, HPLC
Maria Konstantinou | University of Southampton | 1 | [email protected] | Experimental Design
Karolina Krzemieniewska | Lancaster University | 1 | [email protected] | Wavelets
Tomasz Lapinski | University of Warwick | 1 | [email protected] | Probability Theory
Michael Lawton | MRC Cambridge | 1 | [email protected] | Bayesian Hierarchical Modelling and Stochastic Processes
Min Cherng Lee | University of Southampton | 1 | [email protected] | Statistical Disclosure Control
Rui Xin Lee | University of Warwick | 1 | [email protected] | Financial mathematics, probability theory, Markov processes, credit derivatives
Ye Liu | University of Lancaster | 1 | [email protected] | Extreme Value Theory
Stephanie Llewelyn | University of Sheffield | 1 | s.llewelyn@sheffield.ac.uk | Probability and Statistics
Dominic Magirr | Lancaster University | 1 | [email protected] | Early-phase clinical trials, adaptive designs
Colette Mair | University of Glasgow | 2 | [email protected] | Population genetics
Patrice Marek | University of West Bohemia | 4 | [email protected] | Statistics
Kieran Martin | University of Southampton | 2 | [email protected] | Optimal design for non-linear models
Benedict Christian May | University of Bristol | 2 | [email protected] | Reinforcement learning, multi-armed bandits, regression
Fiona McElduff | Institute of Child Health, UCL | 3 | [email protected] | Discrete distributions
Daniel Michelbrink | The University of Nottingham | 3 | [email protected] | Mathematical Finance
Giorgos Minas | University of Warwick | 1 | G.C.Minas at warwik.ac.uk | Adaptive analysis and design, fMRI
Erin Mitchell | University of Lancaster | 1 | [email protected] | Non-Stationary Time Series Analysis, Dynamic Linear Models
Joanne Louise Moffatt | The University of Salford | 2 | [email protected] | Modelling techniques used to investigate strategies teams/players can apply to increase their chances of winning in a sporting contest
Nur Anisah Mohamed | Newcastle University | 1 | [email protected] | Optimal Dynamic Treatment Regimes
Rofizah Mohammad | University of Surrey | 2 | [email protected] | Bayesian Modelling in multivariate data
Christopher Nam | University of Warwick | 1 | [email protected] | Hidden Markov Models
Stuart Nicholls | Lancaster University | 3 | [email protected] | Latent variable models, bioethics, decision-making, attitude measurement
Mitra Noosha | Queen Mary University of London | 2 | [email protected] | Discordancy between prior and data in Bayesian Inference
Beth Norris | University of Kent | 2 | [email protected] | Statistical ecology
Ives Ntambwe | Warwick Medical School | 2 | [email protected] | Medical Statistics
Emmanuel Olusegun Ogundimu | University of Warwick | 1 | [email protected] | Missing data, longitudinal studies and surrogate markers evaluation in clinical trials
Adrian O'Hagan | UCD Dublin | 3 | [email protected] | Mixture Models, Generative Discriminative Hybrids, Extensions to the EM Algorithm
Aidan O'Keeffe | MRC Cambridge | 2 | aidan.o'[email protected] | Dynamic causal inference and multi-state modelling
Rachel Oxlade | Durham University | 2 | [email protected] | Bayesian statistics, Bayes linear, uncertainty analysis, computer simulators, emulation
Ioannis Papastathopoulos | Lancaster University | 1 | [email protected] | Extreme Value Theory, Bayesian Statistics, Time Series
Christopher Pearce | University of Liverpool | 3 | [email protected] | Epidemiology, Stochastic models
Duy Pham | University of Warwick | 2 | [email protected] | Probability, Financial Mathematics, Interest Rate Modelling
Murray Pollock | University of Warwick | 1 | [email protected] | MCMC
Benedict Powell | Durham University | 1 | [email protected] | Bayesian emulation, multivariate spatial statistics
Helen Powell | University of Glasgow | 2 | [email protected] | Modelling the effects of air pollution
Dennis Prangle | Lancaster University | 3 | [email protected] | Bayesian Statistics, ABC, Infectious Disease Models
Iain Proctor | University of Glasgow | 2 | [email protected] | Statistics, Ecology
Noorazrin Abdul Rajak | Newcastle University | 2 | [email protected] | Bayesian Experimental Design
Clare Emily Raychaudhuri | Bristol University | 2 | [email protected] | Stochastic differential equations, numerical integration, chaotic systems
Shijie Ren | University of Sheffield | 3 | [email protected] | Bayesian Clinical Trials
Jennifer Rogers | University of Warwick | 3 | [email protected] | Survival Analysis, Recurrent Events
Verena Roloff | MRC Biostatistics Unit | 2 | [email protected] | Meta-analysis
Francisco Rubio | University of Warwick | 1 | [email protected] | Bayesian Statistics, Biostatistics
Alastair Rushworth | University of Glasgow | 1 | [email protected] | Spatial and environmental modelling
Fiona Sammut | University of Warwick | 1 | [email protected] | Multivariate Analysis, GLMs
Ria Sanderson | Office for National Statistics | NA | [email protected] | Sample design & estimation, modelling techniques
Susanne Schmitz | Trinity College Dublin | 1 | [email protected] | Bayesian Inference
Javier Serradilla | Newcastle University | 3 | [email protected] | Gaussian Processes, Multivariate Statistical Process Control, Factor Analysis
Golnaz Shahtahmassebi | University of Plymouth | 2 | [email protected] | Financial Statistics, Bayesian modelling, Computational Statistics
Yiqin Shen | University of Warwick | 1 | [email protected] | Brain imaging analysis
Andrew Simpkin | National University of Ireland, Galway | 3 | [email protected] | Smoothing and Derivatives
Sawaporn Siripanthana | University of Sheffield | 1 | smp08ss@sheffield.ac.uk | Statistical process control
Andrew Smith | University of Bristol | 3 | [email protected] | Nonparametric regression, Image analysis
Joanna Smith | University of Glasgow | 2 | [email protected] | Shape analysis
Michelle Stanton | Lancaster University | 3 | [email protected] | Spatial and spatio-temporal epidemiology; tropical disease epidemiology
Natalie Staplin | University of Southampton | 2 | [email protected] | Survival Analysis
Kara Nicola Stevens | University of Bristol | 2 | [email protected] | Time Series Analysis
Alexander Strawbridge | MRC Cambridge | 2 | [email protected] | Measurement Error
David Suda | Lancaster University | 3 | [email protected] | Stochastic Calculus, Bayesian Inference, Computational Statistics
James Sweeney | Trinity College Dublin | 3 | [email protected] | Spatial statistics, multidimensional integration, Nonparametric regression
Sarah Taylor | Lancaster University | 1 | [email protected] | Modelling of textures using wavelets
Alexandre Thiery | University of Warwick | 1 | [email protected] | Monte Carlo methods - Statistical physics
Howard Thom | MRC Cambridge | 1 | [email protected] | Model Averaging, Cost Effectiveness Analysis
Maria R Thomas | Queen Mary University London | 3 | [email protected] | Bayesian statistics and dose finding in clinical trials
Helen Thornewell | University of Surrey | 3 | [email protected] | Experimental Design
Tomas Toupal | University of West Bohemia | 2 | [email protected] | Statistics
Michael Tsagris | University of Nottingham | 1 | [email protected] | Robust Statistics
Eleni Verykouki | University of Nottingham | 2 | [email protected] | Bayesian Statistics, MCMC
Rountina Vrousai | Trinity College Dublin | 2 | [email protected] | Bayesian methods for spatial-temporal analysis
Jenny Wadsworth | Lancaster University | 2 | [email protected] | Extreme value theory; Bayesian methods, especially nonparametrics
Eva Wagnerova | University of West Bohemia | 1 | [email protected] | Statistics
Neil Walker | University of Bristol | 3 | [email protected] | Environmental Statistics
Chun Wang | University of Nottingham | 3 | [email protected] | Mathematical Finance
Kevin Wilson | Newcastle University | 3 | [email protected] | Bayesian inference, Bayes linear methods, experimental design
Colin Worby | University of Nottingham/HPA | 1 | [email protected] | Infection Modelling, Markov Models, MCMC
Alan Wright | University of Plymouth | 1 | [email protected] | Genetic Epidemiology
Yang Xia | MRC Cambridge | 1 | [email protected] | Multi-state modelling and survival analysis
Tatiana Xifara | Lancaster University | 1 | [email protected] | MCMC Methods, Bayesian Statistics
Lei Yan | University of Nottingham | 2 | [email protected] | Image Analysis, Stochastic Processes
Peng Yin | Newcastle University | 1 | [email protected] | Statistical analysis of missing data
Yeung Wai Yin (Winnie) | Queen Mary, University of London | 2 | [email protected] | Biased Coin Design in clinical Trials
Ben Youngman | University of Sheffield | 3 | b.youngman@sheffield.ac.uk | Extreme value theory
Nur Fatihah Mat Yusoff | National University of Ireland, Galway | 2 | [email protected] | Structural Equation Modeling, Correspondence Analysis, Measurement Error, Latent Variable
Vytaute Zabarskaite | University of Nottingham | 1 | [email protected] | Stochastic processes in Mathematical Finance
Piotr Zwiernik | University of Warwick | 3 | [email protected] | Algebraic and geometric methods in statistics, model identifiability, asymptotics under non-regular scenarios

Best Talks and Poster

Prizes will be awarded to the three best talks and the best poster as voted for by yourselves, the delegates.

Please use this page to vote for your favourite two talks and your favourite poster, and hand it in during lunchtime on Wednesday.

1st best talk:

2nd best talk:

Best poster:

(Back of RSC 2010 voting slip)

Department of Statistics
University of Warwick
Coventry CV4 7AL
www2.warwick.ac.uk/fac/sci/statistics/postgrad/rsc/2010/