Epidemiology for Data Users (EDU) Trainer‘s Manual

ACKNOWLEDGEMENTS

Epidemiology for Data Users (EDU) Trainer‘s Manual

TABLE OF CONTENTS

TOPIC Page

GLOSSARY AND ABBREVIATIONS iii

COURSE SCHEDULE xii

SESSION I: COURSE OVERVIEW 1

SESSION II: DATA SOURCES AND UTILIZATION IN 6

SESSION III: INTRODUCTION TO EPIDEMIOLOGY 35

SESSION IV: DATA QUALITY 72

SESSION V: INTRODUCTION TO MONITORING AND EVALUATION 105

SESSION VI: DATA SUMMARY AND ANALYSIS 150

SESSION VII: INTRODUCTION TO EPI INFO 188

SESSION VIII: PRESENTING AND MODELING DATA 214

SESSION IX: GRAPHING AND MODELING IN EXCEL 238

SESSION X: USING INFORMATION DERIVED FROM SMARTCARE 264

SESSION XI: USING PIVOT TABLES IN EXCEL 289

SESSION XII: HANDS-ON HMIS 291

SESSION XIII: HANDS-ON NACMIS 340

SESSION XIV: CSO BREAKOUT SESSION – RATIOS, PROPORTIONS AND RATES 398

SESSION XV: CSO BREAKOUT SESSION – BASIC STATISTICAL ANALYSES AND 421 INTERPRETATIONS

SESSION XVI: CSO BREAKOUT SESSION - SAMPLING 445

SESSION XVII: WRITING CLEAR, COMPELLING, ACCURATE REPORTS 458

SESSION XVIII: CREATING AN EPIDEMIOLOGICAL PROFILE 496

APPENDIX A: EPI INFO HANDOUT 512

APPENDIX B: STATISTICAL FORMULAS 520

Epidemiology for Data Users (EDU) Trainer‘s Manual

GLOSSARY AND ABBREVIATIONS

Abbreviations:

ANC: Antenatal Care

ANCSS: Antenatal Clinic Sentinel Survey

ART: Antiretroviral Therapy

BSS: Behavioral Surveillance Survey

CBO: Community Based Organization

CDC: Center for Disease Control

CFR: Case Fatality Rate

CHAI: Clinton Health Access Initiatives

CI: Confidence Intervals

CIDRZ: Center for Infectious Disease Research in Zambia

CPR: Contraceptive Prevalence Rate

CRS: Catholic Relief Services

CSO: Central Statistics Office

DACA: District AIDS Coordinating Advisor

DART: Decentralization, Action-Oriented, Responsive, Time Bound

DATF: District AIDS Task Force

DDU: Data Dissemination and Use

DHIO: District Health Information Officer

DHIS: District Health Information Systems/District Health Information Software

DHMT: District Health Management Team

DHO: District Health Office

DHS: Demographic Health Survey

DMO: District Medical Office

Glossary and Abbreviations iii

Epidemiology for Data Users (EDU) Trainer‘s Manual

DPO: Disabled Peoples Organizations

DQA: Data Quality Assurance

DQC: Data Quality Control

EDU: Epidemiology for Data Users

EHR: Electronic Health Record

EWI: Early Warning Indicator

FBO: Faith Based Organizations

GIS: Geographical Information System

GRZ: Government of the Republic of Zambia

HIA Forms: Health Information Aggregation Forms

HMIS: Health Information Management System

ICT: Information and Communication Technology

IDSR: Integrate Disease Surveillance and Response

IEC: Information and Education Communication Materials

IR: Immediate Results

M&E: Monitoring and Evaluation

MCH: Maternal and Child Health

Measles SIA: Supplementary Measles Immunization Activity

MOH: Ministry of Health

NAC: National HIV/AIDS/STI/TB Council

NACMIS: National AIDS Council Management Information System

NARF: National AIDS Reporting Form

NASTAD: National Alliance of State and Territorial AIDS Directors

NGO: Non-Governmental Organization

NIDS: National Indicator Dataset

OPD: Outpatient Department

Glossary and Abbreviations iv

Epidemiology for Data Users (EDU) Trainer‘s Manual

OVC: Orphans and Vulnerable Children

PACA: Provincial AIDS Coordinating Advisor

PATF: Provincial AIDS Task Force

PCSO: Provincial CSO Officer

PEPFAR: President‘s Emergency Plan for AIDS Relief

PHC: Primary Health Clinic

PHO: Provincial Health Office

PLWHA: People Living with HIV/AIDS

PMO: Provincial Medical Office

PMTCT: Preventing Mother to Child Transmission

PPP: Private Public Partnerships

SAS: Statistical Analysis Software

SBS: Sexual Behavior Survey

SC: Smart Care

SHIO: Senior Health Information Officer

SO: Strategic Objective

SOCO: Single Overriding Communication Objective

SOP: Standard Operating Procedures

SWAP: Sector-Wide Approach

SPSS: Statistical Package for Social Sciences

TBA: Traditional Birth Attendant

USG: United States Government

VCT: Voluntary Counseling and Testing

ZDHS: Zambia Demographic Health Surve

ZPCT: Zambia Prevention, Care and Treatment

Glossary and Abbreviations v

Epidemiology for Data Users (EDU) Trainer‘s Manual

Glossary:

Accessibility: Mechanism with which to backtrack and validate data

Accuracy: Degree which data describes what is actually occurring

Activates: Programs planned and complete

Bar Chart: Displays categorical data and helps compare discrete data in distinct categories (four types: simple, grouped, stacked, 100% component)

Categorical Data: Data that cannot be broken in to distinct categories (ex: gender and marital status)

Chart: A tool used to summarize data and present them in figures

Chi-Square Test: A statistical test used to analyze the difference between proportions for two or more populations

Completeness: The extent to which each of the reporting entities is completing the expected indicators

Conceptual Framework (Research Framework): Identifies and illustrates the factors and relationships that can influences the outcome of an intervention or a program

Continuous Data: Numerical data that is measured in an unbroken numerical scale (ex: age, weight)

Counts: Raw number of cases (ex: number of people impacted by a disease or a condition)

Coverage: Extent to which a program reaches its intended target population, institution, or geographic area; assess the availability and utilization of services

Data Cleaning: Omitting obvious errors in data entry that can then be corrected

Data Dissemination: The process of sharing information or systematically distributing information or knowledge to potential users and/or stakeholders

Data Elements: Fields in/on each instrument of which specific data are entered (ex. date of birth, test results, etc.)

Data Flow Analysis: Process of mapping how information flows throughout the organization and/or system

Glossary and Abbreviations vi

Epidemiology for Data Users (EDU) Trainer‘s Manual

Data Profiling: Detailed analysis of available information (id analytics of range of values, inferred data types, etc.)

Data Quality: Features and characteristics that ensure data are accurate and complete and that they convey the intended meaning

Data Use: Process of entering data is used, to make decision or to make changes in planning, policy making, program administration/ management and delivery services, or to take other specific actions designed to improve outcomes

Database: A collection of data from multiple sources

Deviation: The difference between one of a set of values and some fixed value

DHIS (District Health Information System): Aggregation electronic system of health records of patients. Enables districts to assess whether the goals, objectives, indictors, and targets based on strategic and operational plans are being achieved.

Dichotomous Variable: A binary variable that is categorical and has two levels/categories

Distractor Data: Information that distracts from the main point. This information should be excluded from graphic depictions and results text.

Epidemiological Curve: Shows information on a graph that maps the curve of data.

Epidemiology: The study of the distribution and the determinants of health- related states or events in specified populations, and the application of the study towards control of health problems.

Evaluation: A systematic and objective assessment of a project, program or policy, which aims to measure achievements and impacts.

Framework: A graphic depiction of a project and the steps needed to achieve a desired outcome.

Frequencies: The number of occurrences of a repeating event per unit time.

Frequency Distribution: The arrangement of values for that variable showing how often each value occurred

Goal: A broad statement about a desired long termed outcome of the program

Herd Immunity: at a certain level of vaccination, you protect others who are not immune

Glossary and Abbreviations vii

Epidemiology for Data Users (EDU) Trainer‘s Manual

Incidence: Measures new cases of a disease that develop over a period of time

Indicator: Clues, signs or markers that measure one aspect of a program and show how close a program is to its desired path and outcomes

Inputs: Expenditures and resources

Integrity: The degree to which data is free from conscious or subconscious bias

Institutional Memory: A collective set of facts and concepts held by a group of people

Key Finding: Information that answers a particular question. It should be highlighted in graphics and results text.

Legibility: Description of whether the data can be interpreted

Line Graph: Displays relationships between two variables on two dimensions

Logic Model (M&E Framework): Provides a linear description of programs, including the needed resources and how they will be used to achieve the programs objectives and goals

M&E Plan: A plan that links strategic information obtained via data collection with evidence-based decision making to improve health programs, activities, responsibilities, timeframes, and cost for each of the 12 components that M&E system will function. Considered a living document of organization that is continually updated.

M&E System: A larger system that consists of people and processes that work together to achieve the 12 performance goals of an M&E system

Median: The middle of the distribution of a variable set in ascending or descending order

Mean: The average of all values of a continuous variable set: best descriptive measure for data

Mode: The value that occurs most in a variable set

Monitoring Data: Tracking the implementation of a program; ongoing effort to inform the stakeholders of the quality, quantity, and timeline of progress towards delivering intended results; day-to-day follow-up of activities

NACMIS (National AIDS Council Management Information System): Database to guide the multispectral national response to HIV/AIDS, STI, and TB

Nominal Variable: Categorical variable without any intrinsic order

Glossary and Abbreviations viii

Epidemiology for Data Users (EDU) Trainer‘s Manual

Normal Distribution: A distribution that represents a bell shaped curve

Null Hypothesis: The hypothesis of no change or no effect

Objective: Statement of desired, specific and measurable program goals

Ordinal Variable: Any categorical variable with some intrinsic order of numeric value

Organizations: An institution that implements projects and programs

Outlier: An observation that is numerically distant from the rest of the data

Outcome: use indicators to measure success of program by evaluating outcomes

Outputs: what directly comes out of the program or proxy via indicators

P-Values: A measure of evidence against results in relation with null hypothesis

Percentage: A way of expressing a proportion as a fraction of 100

Pie Chart: A circular chart divided into segments, illustrating proportion

Prevalence: Measures existing cases of a disease at a particular point in time or over a period of time

Problem Statement: Identifies the specific problem to be addressed; provides information about the situation that needs changing, who it affects, its causes, its magnitude, and its impact on society

Process Evaluation: See: Monitoring

Program: A collection of projects funded by a donor or implemented by an organization

Project: An action undertaken by individuals or groups of managers, researchers, or community members interested in achieving certain goals

Proportion: The ratio of one quantity to another, especially the ratio of a part compared to a whole.

Range: The difference between the maximum and the minimum

Rate: A ratio in which two measurements are compared to each other. (ex: birth rate compares the number of births to a period of time)

Ratio: A relationship between two numbers of the same kin (ex: a comparison of one count to another)

Glossary and Abbreviations ix

Epidemiology for Data Users (EDU) Trainer‘s Manual

Recall Bias: A type of bias that occurs when the way a survey respondent answers a question is affected not just by the correct answer, but also by the respondent's memory

Reliability: The consistency of data

Report: A written presentation of facts and findings, usually used as a basis for recommendations

Results Framework (Strategic Framework): A diagrams that shows the direct causal relationships between incremental results of the key activities all the way up to the overall objective and goal of intervention

Retention: success at delivering the entire package of services to a client

Skewed Distribution: A distribution that has a central location to the left and/or right of the center, with a tail off to the other end

Source: Where data is obtained

Stakeholder: A person, group, organization, or system that affects or can be affected by an organization's actions

Standard Deviation: A measurement of the spread from the central value (the mean)

Statistically Significant: A result is statistically significant when nit is unlikely to have occurred by chance

Statistical Testing: The process of concluding from your data whether an observed difference is meaningful or due to chance

Stratified: When members of the population are divided into homogeneous subgroups before sampling

Subject Measurement: A description of whether the variables are influenced by the format of data collecting

Survey: An investigation of the characteristics of a given population that is performed by collecting data from a sample from that population and estimating their characteristics through the systematic use of statistical methodology

T-Test: A statistical test that is used to observe the difference between two means

Tables: A set of quantitative data arranged in rows and columns

Target: A specific level of performance for a measure (indicator) at a predetermined point in time

Glossary and Abbreviations x

Epidemiology for Data Users (EDU) Trainer‘s Manual

Timeliness: The degree to which data is collected in the realm of time

Validity: The degree to which data is appropriately and correctly measured to the indicator

Glossary and Abbreviations xi

Epidemiology for Data Users (EDU) Trainer‘s Manual

COURSE SCHEDULE

Day 1 TIMES ACTIVITY Presenter(s)/Facilitator(s) 8:15 – 8:30 Registration 8:30 – 8:45 Opening remarks 8:45 – 9:45 Pre-test 9:45 – 10:00 Overview of the course (goals, objectives, schedule) 10:00 – 10:15 Tea break 10:15 – 11:30 Epidemiology 11:30 – 12:45 Monitoring and Evaluation (M&E) 12:45 – 13:45 Lunch 13:45 – 14:45 Data sources and utilization in Zambia 14:45 – 15:00 Tea break 15:00 – 17:00 Exercise

Day 2 TIMES ACTIVITY Presenter(s)/Facilitator(s) 8:15 – 8:30 Have participants finish exercise from previous day (if needed) 8:30 – 9:00 Recap of previous day & small group discussion 9:00 – 10:15 Data Quality 10:15 - 10:30 Tea Break 10:30 – 11:45 Data summary & analysis 11:45 – 13:00 Introduction to Epi Info (with hands on examples) 13:00 – 14:00 Lunch 14:00 – 15:30 Graphs, Charts, Tables in Excel 15:30 - 15:40 Tea break 15:40 – 17:00 Exercise

Course Schedule xii

Epidemiology for Data Users (EDU) Trainer‘s Manual

Day 3 TIMES ACTIVITY Presenter(s)/Facilitator(s) 8:15 – 8:30 Have participants finish up exercise from previous day (if needed) 8:30 – 9:00 Recap of previous day 9:00 – 11:00 Hands-on HMIS (Concurrently Hands-on CSO (for CSO morning through only) afternoon) 11:00 – 11:15 Tea break 11:15 – 12:15 HMIS Exercise 12:15 – 12:45 Recap of Lessons learned in HMIS 12:45 – 13:45 Lunch 13:45 – 15:30 Hands-on NACMIS 15:30 – 15:45 Tea break 15:45 – 16:45 NACMIS Exercise 16:45 – 17:00 Recap of Lessons learned in HMIS & NACMIS

Day 4 TIMES ACTIVITY Presenter(s)/Facilitator(s) 8:30 – 10:30 Hands on SmartCare 10:30 – 10:40 Tea break 10:40 – 11:30 SmartCare Exercise

11:30 – 12:30 Writing Reports 12:30 – 13:00 Development of Epi Profiles Introduction of Development of Regional Epi Profiles assignment 12:30 – 13:00 Develop plan of action (Group work) 13:00 – 14:00 Lunch 14:00 – 14:30 Develop plan of action (Group work) 14:30 – 15:30 Regional Epi Profiles

Course Schedule xiii

Epidemiology for Data Users (EDU) Trainer‘s Manual

exercise (Group work) 15:30 – 15:40 Tea break 15:40 – 17:00 Regional Epi Profiles exercise (Group work)

Day 5 TIMES ACTIVITY Presenter(s)/Facilitator(s) 8:30 – 10:00 Regional Epi Profiles exercise (Group work) for participants 9:00 - 10:00 Steering Committee for trainers (concurrently) 10:00 – 10:10 Tea break 10:10 – 11:20 Regional Epi Profiles exercise (Group work) 11:20 – 13:00 Provincial/District Epi Profiles Presentation – MAXIMUM 15 minutes per group (followed by 5 minutes feedback from participants and 2 facilitators) 13:00 – 13:45 Lunch 13:45 – 14:45 Post-test 14:45 – 15:00 Course Evaluation 15:00 – 15:15 Closing remarks 15:15 – 15:30 Certificates of attendance 15:30 – 15:45 Refreshments

Course Schedule xiv

Epidemiology for Data Users (EDU) Trainer‘s Manual

SESSION I: COURSE OVERVIEW

Epidemiology for Data Users Training

EPIDEMIOLOGY FOR DATA USERS (EDU) TRAINING

Republic of Zambia Ministry of Health Central Statistic Office

Description of Session: This session will provide an overview of the EDU training. The purpose of this training is to build capacity of provincial and district staff to collect relevant, quality data, analyze and use them to for local decision-making. The training also aims to harmonize approaches to M&E in Zambia under the direction of the national government, Ministry of Health (MOH), and National HIV/AIDS/STI/TB Council (NAC), with support from bilateral partners from U.S. Government (USG) and other international agencies. Throughout the

Session I: Course Overview 1

Epidemiology for Data Users (EDU) Trainer‘s Manual training, we encourage you to ask questions if you need clarification on any definitions or concepts. We want this to be an interactive process.

Epidemiology for Data Users Training – Zambia Experience • Health facilities and Program staff collect health data from a variety of sources • Regions could use their own data as evidence for local decision-making • Where and for whom interventions/services are needed • Funding allocation • Priorities for staff time • Central staff can use data for national decision- making

Epidemiology for Data Users Training – Why Are We Here? • Using data is hard with current limitations: • Gaps in training for data collection, data entry, data analysis • Inconsistent data quality management • Lack of training and support for the regular use of collected data • We aim to use this training to: • Improve the quality of health data • Strengthen analytic skills among those involved in collection and reporting so data are used in decision-making  You are the key to make this happen

Session I: Course Overview 2

Epidemiology for Data Users (EDU) Trainer‘s Manual

Overall Goals

• To strengthen the quality of health data collection practices, and improve the analytic capacity of those staff synthesizing and reporting health data in Zambia

• To provide an opportunity for collaboration and interaction among provincial and district-level staff on provincial/district level epidemiologic profiles and M&E plans in support of national strategic frameworks

Overall Objectives

By the end of the training, you will be able to: • Clearly explain epidemiological concepts related to health data collection, use and interpretation • Implement quality monitoring and evaluation with epidemiological data • Develop an M&E plan for your district/province • Summarize and analyze data • Create Graphs, Charts and Tables

Session I: Course Overview 3

Epidemiology for Data Users (EDU) Trainer‘s Manual

Objectives continued

• Write summaries of data analysis results using scientifically accurate language • Describe and utilize the main health data sources in Zambia • Use SmartCare, HMIS, and NACMIS to extract health data for summarization, analysis and interpretation • Synthesize health data from Zambian data sources to highlight disease trends, services, and needs in epidemiologic profiles

EDU Training Agenda • Monday: – Introduction to Epidemiology – Introduction to Monitoring & Evaluation (M&E) – Data sources and utilization • Tuesday: – Data quality – Data summary & analysis – Intro to Epi Info – Graphs, Charts, Tables in Excel

Session I: Course Overview 4

Epidemiology for Data Users (EDU) Trainer‘s Manual

EDU Training Agenda continued • Wednesday: – Using DHIS – Using NACMIS – Using data from periodic surveys (CSO) • Thursday: – Using SmartCare – Writing reports – Developing Epi profiles • Friday: – Developing Epi profiles

Your Output

This week: • (draft) integrated provincial/district HIV/AIDS Epi Profile and plan to complete this as a group • (draft) provincial/district M&E plan and plan to complete this as a group • Roll-out plan to train your district level staff Long-term: For MOH and NAC-led Data Quarterly Review Meetings: • Finalized integrated provincial/district HIV/AIDS Epi Profile • Finalized provincial/distrcict M&E plan

Session I: Course Overview 5

Epidemiology for Data Users (EDU) Trainer‘s Manual

SESSION II: DATA SOURCES AND UTILIZATION IN ZAMBIA

Data Sources and Utilization in Zambia

Description of Session: This session will discuss the importance of data, describe different data types, introduce specific sources of data available in Zambia for programmatic use, and discuss the importance of disseminating collected information.

Session II: Data Sources and Utilization in Zambia 6 Epidemiology for Data Users (EDU) Trainer‘s Manual

Learning Objectives:

Session Objectives Upon completing this session, participants will be able to: • Distinguish between routine and non-routine data types • Identify major public health data sources available in Zambia • Explain the importance of analyzing, disseminating and using information produced by information systems • Describe how such data can be used to guide decisions at the local level • Develop an M&E plan for your province

Speaker Notes: At the end of this session participants will be able to: - Distinguish between routing and non-routine data - Identify the major public health data sources in Zambia - Explain the importance of analyzing, disseminating and using the information produced by these data sources - Describe how these data can be used to guide decisions at the local level - Develop and M&E plan for your province

Session II: Data Sources and Utilization in Zambia 7 Epidemiology for Data Users (EDU) Trainer‘s Manual

Learning Activities at a Glance: Learning Content Method Material Handout/ Length Activity Appendix (min) 1 Data Sources and Lecture Copies of the Utilization presentation for each participant

Blank flipchart paper and markers 2 Exercise Group Copies of Work exercise for participants

Copies of datasets for participants

Copies of answer key for facilitators

Session II: Data Sources and Utilization in Zambia 8 Epidemiology for Data Users (EDU) Trainer‘s Manual

Steps Leading to Data Use

Data Summarization Dissemination & Analysis & Use Skills Availability

Need

Speaker Notes: Ask the participants what are the steps leading to data use? (Expected answers)  Need  Availability  Skills  Summarization and analysis  Data dissemination and use Solicit more responses from the audience and discuss. Then review the slide and summarize using additional facilitator background information provided below.

Facilitator background information This graph highlights specific steps needed that lead to data use: - Need: there has to be a need for data to be collected (i.e., stakeholders needs and interest) and determination of what questions and uses stakeholders have for the data - Availability: the data needs to be made available - Skills: staff need to have the skills to collection relevant, quality data and,

Session II: Data Sources and Utilization in Zambia 9 Epidemiology for Data Users (EDU) Trainer‘s Manual

- Summarize and analyze them to: - Disseminate and use the data for decision making and programmatic improvements.

Data Source Considerations

The following should be considered when assessing a data source’s potential for use: • How are data collected? ―What instruments/methods are used? ―What populations are represented? ―How frequent is collection? • In what formats are data available? ―Electronic ―Paper-based (tally sheets, logbooks, forms, etc.) • What is the quality of the data?

Speaker notes First prompt the audience: what questions do we need ask to asses the potential of a data source ? (Expected answers)  How was the data collected?  In what form is the data?  What is the quality of the data? Solicit more responses from the audience and discuss. Review the slide.

Session II: Data Sources and Utilization in Zambia 10 Epidemiology for Data Users (EDU) Trainer‘s Manual

Data Flow

Speaker Notes: Review the slide

Facilitator background information Data is collected (patient records, census, population surveys, etc.) and then entered into databases (such as HMIS, NACMIS and SmartCare). Data is then analyzed, summarized and included in reports. These reports are often used by partners for decision making.

Session II: Data Sources and Utilization in Zambia 11 Epidemiology for Data Users (EDU) Trainer‘s Manual

MOH Data Flow

• The Ministry of Health organizes its data flow via the Health Management Information System (HMIS) • Data from this system is organized in the District Health Information System (DHIS) database

Speaker notes

Firstly, from among the audience, ask the MoH participants to describe the follow of data from the health facility to the District Health Information System Discuss the responses and review the slide and summarize.

Facilitator background information

1. Health workers collect data during service provision at the facility. They then aggregate on the HIA report and Self Assessment Tool and send to District Health Information Officer at the DHO. 2. The DHIO validates the data and captures into the DHIS. The DHIO must provide feedback on the data submitted to the health facility by the end of the second month. The feedback must include any suspicious data identified and pre-determined performance league tables across the district. 3. The DHIO then sends the district data to the Provincial Health Office for further processing and assessment by Senior Health Information Officer (SHIO). 4. Information received from the districts is passed to the HMIS national office for consolidation.

Session II: Data Sources and Utilization in Zambia 12 Epidemiology for Data Users (EDU) Trainer‘s Manual

5. As shown in the diagram above the national office provides feedback and technical support to the province, district and eventually the facility. 6. The HMIS information is made available to program focal persons and managers at all levels. 7. Parallel vertical systems are discouraged. Additional data can be collected through setting up of sentinel surveillance sites or special surveys

HMIS acts on the DART principle (Decentralization, Action-Oriented, Responsive, Time-Bound)

NAC Data Flow

• The National Community District HIV/AIDS/STI/TB Provincial National Council organizes its data flow via its M&E Unit • Data arising from this system is organized in the National AIDS Council Management System (NACMIS) database.

Presenters notes Before you display and review this slide, begin by asking the participants from NAC to describe the flow of data from the community to the M&E unit at NAC. Discuss. Review the slide and summarize.

Facilitator background information NAC uses the National AIDS Reporting Form (NARF) as data collection tool. Data flow s from the community (Community AIDS Task Forces (CATFs) to

Session II: Data Sources and Utilization in Zambia 13 Epidemiology for Data Users (EDU) Trainer‘s Manual

District AIDS Task Force (DATF). The DATFs aggregate the data into a District NARF and submit to the Provincial AIDS Task Force (PATF) who then consolidate the data into a provincial NARF and submit to M&E unit at NAC headquarters.

Routine vs. Non-routine Data Types • Routine data – Collected continuously at various time periods (daily, patient by patient, monthly, etc.) – Come from health information systems and are collected as part of an ongoing system • Non-Routine (periodic) data – Collected at certain periods of time, or over a specific period of time – Come from special studies or surveys carried out for specific purposes

Speaker notes Review the slide.

Ask the participants: What type of data is collected by the DHIS? Answer – Routine.

What type is collected by the NARF? Answer - Routine

Session II: Data Sources and Utilization in Zambia 14 Epidemiology for Data Users (EDU) Trainer‘s Manual

Strengths & Limitations of Routinely Collected Data • Strengths – Often low cost since data is routinely collected – Usually collected at the point of service delivery (i.e., health facility) then obtained at district, provincial and national levels (to offer ‘complete picture’ of the country) – Data collection often coincides with program implementation therefore these day can be used to monitor and evaluate program goals • Limitations – Time-consuming, can be burdensome for health staff – Data quality not always ensured

Speaker Notes: Review the slide Explain that fluctuations in what is being measured are more apparent in routine data collection than they are in non-routine data collection. This is a major strength of routine data collection.

Strengths and Limitations of Periodic Data • Strengths: – Surveys especially useful if other data are not available or inadequate – Surveys can be tailored to fit specific measurement objectives – Surveys yield cost-efficient data on population and services useful for facilities, providers and clients

• Limitations: – Surveys are expensive and time consuming – Recall bias – Survey sampling design and analysis may be complex

Speaker Notes: Review the slide. Give an example of:

Session II: Data Sources and Utilization in Zambia 15 Epidemiology for Data Users (EDU) Trainer‘s Manual

DHS: costs around $3 million and takes many months to complete Recall bias: which occurs when the way a survey respondent answers a question is affected not just by the correct answer, but also by the respondent's memory.

Part I: Data Sources

Routine Data

Session II: Data Sources and Utilization in Zambia 16 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker notes Remind the participants that you have already defined and explained what routine data is as well as its strengths and weaknesses. In this section you will focus on the sources of routine health data in Zambia.

Routine Data - Instruments • Instruments on which data elements are collected – Registers (i.e., PMTCT, ART, etc.) – Daily activity sheets – Patient cards (i.e., ANC cards) – Program-specific forms (i.e., SmartCare forms) • Instruments on which data are summarized – Activity (tally) sheets – Health Information Aggregation (HIA) Forms – National AIDS Council Activity Reporting Form (NARF) • Data elements: fields in/on each instrument on which specific data are entered – Date of birth, test result, pregnancy status Speaker Notes: Provide examples on registers (TB register, etc.), Such data may be collected for other (e.g., clinical) purposes but can be useful for guiding overall programmatic directions (M&E, etc.).

Session II: Data Sources and Utilization in Zambia 17 Epidemiology for Data Users (EDU) Trainer‘s Manual

Routine Data – Routine Information Systems

• Routine Information Systems – Include program information on activities and outputs – Useful for Epidemiology, Monitoring & Evaluation (M&E) and/or to link data across institutions • Examples of Information Systems in Zambia – MOH’s District Health Information System – National HIV/AIDS/STI/TB Council Management Information System

Speaker Notes: Prompt audience: What are some examples of systems used in Zambia? Answer - HMIS (DHIS and SmartCare), NACMIS Review slide and summarize

MOH’s District Health Information System (DHIS) • Main software and database used to store MOH HMIS data – A tool to help improve service management to achieve better health for all by using available information • Based on the collection & analysis of routine data (essential dataset) at local level – Gathers info that shows changes in local health conditions, health status & health priorities – is indicator based • Informs different levels of health care providers of progress made towards achieving set objectives & targets

Speaker Notes: Ask the audience: What is District Health Information System? Answer - DHIS is a tool to help improve service management achieve better health care. It informs different levels of health care providers of the progress that is being made Review slide and summarize

Session II: Data Sources and Utilization in Zambia 18 Epidemiology for Data Users (EDU) Trainer‘s Manual

MOH’s District Health Information System (DHIS) • Data Collection Instruments – Health Information Aggregation (HIA) Forms: Paper based forms used to tally facility data from various registers (i.e., Out-Patient, In- Patient, Child Health, Delivery, etc.) – SmartCare: computerized electronic health record system used to collect patient-level data at facility level. • Integrated System – Shares and combines a variety of information from numerous systems

Speaker notes Prompt the audience: What tools are used to collect data for the DHIS? (Expected answers)  Health Information aggregation forms (HIA)  Electronic health record Review the slide and summarize

DHIS: An Integrated System

Speaker Notes: Review the slide.

Session II: Data Sources and Utilization in Zambia 19 Epidemiology for Data Users (EDU) Trainer‘s Manual

Facilitator background information DHIS is an integrated tool (software) to help improve service management to achieve better health for all by using available information. It is based on the collection & analysis of routine data (essential dataset) at local level and enables districts to assess whether the goals, objectives, indicators and targets, based on both strategic and operational plans are being achieved

MOH’s SmartCare

• SmartCare is… – An integrated electronic health record (EHR) system designed to improve continuity of care – Collects patient-level data at facility-level • How can SmartCare information can be used? – Improve patient care quality – View, print or export reports for M&E – Use GIS for surveillance & epidemiology trends

Speaker notes Review the slide

Facilitator background information

SmartCare is Zambia‘s National Standard Electronic Health Record (EHR) System. It is a tool for the routine documentation of patient healthcare and facility operations and It provides:  ‗Continuity of Care‘ for the patient  Aggregate reports for providers, administrators and policy makers SmartCare includes patient data, facility data, facility operations data, aggregate data, Geographical Information Systems (GIS) data. Micro-data is de-identified, but identifiable, person level data.

Session II: Data Sources and Utilization in Zambia 20 Epidemiology for Data Users (EDU) Trainer‘s Manual

Strengths and Challenges of Using DHIS • Strengths – Able to integrate both computerized and paper- based methods of data collection – Implemented and operational in all districts – Internally analyzes data and produces pre-defined indicators • Challenges – Limited linkages between DHIO and program officers – Most information officers still have difficulties in understanding data elements – Denominators used to calculate indicators often are not representative – Not harmonized with other Zambian data systems

Speaker notes Review the slide

National HIV/AIDS/STI/TB Council’s Management Information System

• NACMIS is the data system behind NAC’s routine monitoring and evaluation program – Designed to guide the multi-sectoral national response to HIV/AIDS, STIs and TB • NACMIS collects, compiles & aggregates data on the national response to HIV/AIDS, STIs and TB • The M&E unit of the National HIV/AIDS/ STI/TB Council has the responsibility of coordinating and reporting all NACMIS activities

Speaker notes Ask the participants: what is the NACMIS? Discuss. Review slide and summarize

Session II: Data Sources and Utilization in Zambia 21 Epidemiology for Data Users (EDU) Trainer‘s Manual

National HIV/AIDS/STI/TB Council’s Management Information System • Data Collection Instruments – National Activity Report Form (NARF): paper based form used to summarize data from NAC-implementing partners • System integrates additional data from multiple sources – (i.e., DHIS, Education Management Information System) and surveys (DHS, Sentinel Surveillance survey, etc.)

Speaker notes Prompt the audience: what is the data collection instrument used NAC? Answer – National AIDS Reporting Form (NARF) Review slide and summarize

NACMIS: An Integrated System

Speaker notes As a Database to guide the multi-sectoral national response to HIV & AIDS, STI‘s & TB, the NACMIS Collects information from a number of other sources

Session II: Data Sources and Utilization in Zambia 22 Epidemiology for Data Users (EDU) Trainer‘s Manual

Strengths and Challenges of Using NACMIS • Strengths – Contain comprehensive data on the entire national response to HIV/AIDS, STIs and TB – Allows for a more complete picture of the HIV/AIDS epidemic by synthesizing data from many different sources – User friendly • Challenges – Not fully tested – Limited ability to export or graph user-defined data – Limited number of data elements

y

Speaker Notes: Explain to the participants that this training has been developed in response to the above challenges.

Non-routine Data

Session II: Data Sources and Utilization in Zambia 23 Epidemiology for Data Users (EDU) Trainer‘s Manual

Non-Routine Data – Instruments: The Periodic Survey • A survey is an investigation of the characteristics of a given population by collecting data from a sample of that population and estimating their characteristics through the systematic use of statistical methodology

25

Speaker notes

Prompt the audience: what is a survey? Solicit several responses from the audience. Review the slide and summarize.

Periodic Surveys

In addition to routinely collected health data, Zambia has periodic surveys such as: – Demographic and Health Survey (DHS) – Behavioral Surveillance Survey (BSS) – Antenatal Clinic Sentinel Surveillance (ANCSS) – Sexual Behaviour Survey (SBS) – National Census – AIDS Indicator survey

Speaker Notes: Firstly ask the audience: What are the major sources of periodic data in Zambia: (Expected answers)  Demographic Health Survey

Session II: Data Sources and Utilization in Zambia 24 Epidemiology for Data Users (EDU) Trainer‘s Manual

 Census  Sexual behavior survey etc Solicit more examples from the participants. Review the slide and summarize. Explain that more information on each to follow.

Demographic and Health Survey

• Nationally representative • Years completed: 1992, 1996, 2002, 2007 • Coverage – Random sample of Zambian households – 7,146 women ages 15-49 – 6,500 men ages 15-59 • Availability – Hard copies – Online in PDF format: www.measuredhs.com – Raw data available on request on the Macro website or on CD at CSO

Speaker Notes: Randomization is used to ensure that a sample is representative of the entire population.

Some variables collected in the DHS • Child Health • Education • Fertility and fertility preferences • Gender/domestic violence • HIV/AIDS Knowledge, Attitudes and Behavior • HIV prevalence • Household and respondent characteristics • Infant and child mortality • Malaria • Maternal health • Nutrition • Wealth/Socioeconomics

Session II: Data Sources and Utilization in Zambia 25 Epidemiology for Data Users (EDU) Trainer‘s Manual

Antenatal Clinic Sentinel Surveillance Survey (ANC SS) • Cross-sectional surveillance of pregnant women attending antenatal clinic (for their first visit) at selected sites across the country • Conducted every 2 years • Data used to monitor HIV and Syphilis prevalence trends • Implementation: MoH, TDRC & UTH Virology Laboratory

Speaker notes

Review the slide and summarize

Sexual Behavior Survey (SBS)

• Focus on international indicators • Provides national estimates on: – HIV/AIDS Knowledge, Attitudes and Sexual behavior – Household and respondent characteristics • Coverage – Stratified random sample of ~2,500 households – Men ages 15-59 and women ages 15-49 • Years completed: 1998, 2000, 2003, 2005, 2009 • Availability online in pdf format at www.zamstats.gov.zm/media.php?id=8

Speaker Notes: Prompt the audience: What are Sexual Behavioral surveys? How atre they conducted? Solicit several responses from the audience. Review the slide and summarize

Session II: Data Sources and Utilization in Zambia 26 Epidemiology for Data Users (EDU) Trainer‘s Manual

Facilitator background information HIV-related behavioral data collected from at-risk groups Not nationally representative Years completed Female Sex Workers (2000, 2005, 2006) Long Distance Truck Drivers (2000, 2006) Uniformed Personnel (2005) Coverage Targeted sampling 1,114 FSWs (2006) 1,106 Drivers (2006) Availability Hard copies Online in PDF format: www.fhi.org/en/CountryProfiles/Zambia/zambiatools.htm

Session II: Data Sources and Utilization in Zambia 27 Epidemiology for Data Users (EDU) Trainer‘s Manual

National Census

• Represents total population • Years completed: – 1969, 1980, 1990, 2000, 2010 • Coverage – All households and people in Zambia • Availability – Hard copies available – Online http://www.zamstats.gov.zm/census.php

Speaker notes Describe methods used for survey to attempt to capture everyone in Zambia

Census of Population and Housing: Variables • Data available for all variables at EA, CSA, ward, constituency, district, province & national levels – Population – Fertility – Growth rate – Mortality – Age distribution – Household – Educational status characteristics – Employment status

Session II: Data Sources and Utilization in Zambia 28 Epidemiology for Data Users (EDU) Trainer‘s Manual

Part II: Data Use

Data Use • Process of ensuring data is used, such as to make decisions or changes in planning, policy making, program administration/ management and delivery of services, to make changes, or to take other specific actions designed to improve outcomes Data must be analyzed, synthesized and presented in a clear and easy to understand format that is appropriate for the audience!

Speaker Notes: Review slide and summarize Emphasize that it is the Examination and use of routinely collected program information by stakeholders for program decision-making, advocacy, and accountability

Session II: Data Sources and Utilization in Zambia 29 Epidemiology for Data Users (EDU) Trainer‘s Manual

Potential Uses of Data at the Provincial and District Levels • Monitor and evaluate programs • Track performance indicator targets and progress • Compare program planning with data from actual implementation • Identify priority needs and emerging problems • Make decisions on program improvements • Guide and enhance service delivery • Obtain additional funding, if needed • Communicate program successes and challenges to policymakers at national level

Potential Data Uses, continued

• Developing products like epidemiologic profiles that assemble data from different sources in one accessible place can help shape and guide decisions • Remember -- data are useful to the extent that they are analyzed, shared/ communicated, and used -- not just collected!

Speaker notes Ask the participants for examples of how they use data at their local level. Solicit several answers. Then review the slide and summarize

Facilitator background information Data can be used for program improvement. These data are usually obtained from an organization‘s process monitoring and evaluation activities. 1. To manage and improve program processes and systems by comparing program planning data with actual implementation data Examples:

Session II: Data Sources and Utilization in Zambia 30 Epidemiology for Data Users (EDU) Trainer‘s Manual

- Inform staffing decisions - Monitor service delivery site activities - Track and monitor materials disseminated and expenditures 2. To inform capacity-building plans and activities Examples: - Hire more staff - Train staff - Purchase more supplies 3. To make decisions about the future direction of the program Examples: - Scale-up services/expand coverage - Identify new geographical areas and/or other services to be added to the program 4. To guide and enhance service delivery Examples: - Assess whether services are culturally appropriate - Understand client needs - Monitor changes in client risk behavior - Follow up with clients

Session II: Data Sources and Utilization in Zambia 31 Epidemiology for Data Users (EDU) Trainer‘s Manual

National Uses of data

• To communicate overall program successes and challenges • To gain additional resources and funding • To be accountable to clients, donors and other stakeholders • To report to policymakers

37

Speaker Notes: Prompt Audience: Cite some examples of external uses of data, and then provide additional examples. Solicit several responses. Review slide and summarize.

Facilitator Background information External Uses 1. To communicate program successes and challenges to the community Examples: - Provide valuable information and lessons learned for agencies planning to implement similar programs - Raise awareness about HIV risk and prevention efforts 2. To gain additional resources Examples: - Raise funds 3. To be accountable to clients, donors and other stakeholders 4. To report to policymakers

Session II: Data Sources and Utilization in Zambia 32 Epidemiology for Data Users (EDU) Trainer‘s Manual

WRAP UP AND REVIEW

Conclusion

• Data is collected on a routine and non-routine basis throughout Zambia • Data must be disseminated and used for it to be effective • There are many important uses of data: – Helping to set public health care priorities – Helping with the allocation of resources – Improving the quality of services provided – Health program and service effectiveness and impact • For data to be useful it must be analyzed and presented in an understandable way 38

Speaker notes Review the slide. Explain to the participants that in this course, they will learn these skills to be able to achieve effective data use.

Session Objectives

Upon completing this session, participants will be able to:  Distinguish between routine and non-routine data collection methods  Identify major public health data sources available in Zambia  Identify what types of data are captured in each information system  Explain the importance of analyzing, disseminating and using information produced by information systems  Describe how such data can be used to guide decisions at the local (district and provincial) level  Develop an M&E plan for your province

Session II: Data Sources and Utilization in Zambia 33 Epidemiology for Data Users (EDU) Trainer‘s Manual

Resource Center

• Demographic Health Survey (DHS) http://www.measuredhs.com/pubs/pub_ details.cfm?id=894

40

Session II: Data Sources and Utilization in Zambia 34 Epidemiology for Data Users (EDU) Trainer’s Manual

SESSION III: INTRODUCTION TO EPIDEMIOLOGY

Introduction to Epidemiology

Description of Session: This session will provide an introduction to Epidemiology. The overall goals are to familiarize students with epidemiological terms and concepts that will serve as a foundation for the rest of the material to be presented throughout the course.

Session III: Introduction to Epidemiology Epidemiology for Data Users (EDU) Trainer‘s Manual

Learning Objectives: By the end of this session, participants will be able to:

Session Objectives

At the end of the session participants will be able to: – Define epidemiology – Describe three key elements of epidemiology – Determine when epidemiology may be used to identify a health problem and influence program planning – Illustrate the use of health data for health policy- making – Calculate and interpret primary measures of disease frequency – Distinguish between incidence and prevalence 2

Speaker notes Review the slide, reading out one objective after another.

Session III: Introduction to Epidemiology 36 Epidemiology for Data Users (EDU) Trainer‘s Manual

Learning Activities at a Glance: Learning Content Method Material Handout/ Length Activity Appendix (min) 1 What is Lecture Copies of the 15 Epidemiology? presentation for each participant Blank flipchart paper and markers 2 How can Lecture Copies of the epidemiology be presentation used for decision- for each making? participant Blank flipchart paper and markers 3 Measures of disease Lecture and Copies of the Appendix D: frequency hands-on presentation Disease exercises for each frequency participant formulas Blank flipchart paper and markers Copies of exercises Calculators and pencils 4 Epidemiological Lecture and Copies of the Appendix D: concepts hands-on presentation Disease exercises for each frequency participant formulas Blank flipchart paper and markers Copies of exercises Calculators and pencils

Session III: Introduction to Epidemiology 37 Epidemiology for Data Users (EDU) Trainer‘s Manual

Definition of Epidemiology

The study of the distribution and determinants of health-related states or events in specified populations, and the application of this study towards control of health problems

Speaker Notes: Ask the participants: What is epidemiology, in your own words? (Expected answer)  It is the study of the distribution and determinants of disease risk in populations. Solicit more answers from the audience In summarizing, review the slide. Tell the participants that in Epidemiology you get to be a detective – asking who, what, where, when, how and why,

Facilitator Background Information: Epidemiology is often called the science of public health. Epidemiology is the study of the distribution and determinants of disease risk in populations. Epidemiologists study a diverse range of health conditions as well as the impact that various exposures have on the manifestation of disease. (American College of Epidemiology)

Session III: Introduction to Epidemiology 38 Epidemiology for Data Users (EDU) Trainer‘s Manual

What are we Measuring?

• Distribution How many people developed the illness in a period of time? What are their demographic characteristics? Where do they live/work? • Determinants How did they become sick? What are the reasons some are more likely to have the illness than others?

Speaker notes Ask the participants: From the definition what do you think is measured in Epidemiology? (Expected answer)  Number of people who have a particular disease Solicit more answers from the participants. Emphasize that a main component of epi, is that we count things. Review the slide and summarize using background information below:

Facilitator background information Epidemiology uses statistics to measure the occurrence and the extent of exposure. You need to turn observable phenomena into data in order to analyze it with the tools of Epidemiology. Therefore Epidemiologists count things. It mainly focuses on the measuring the following: Distribution of disease Consideration of who, where, and when What are their demographic characteristics? (Who?) Where do they live/work (Where?)

Session III: Introduction to Epidemiology 39 Epidemiology for Data Users (EDU) Trainer‘s Manual

Determinants of disease Derived from frequency and distribution Testing an epidemiological hypothesis How did they get sick? (How?) What are the reasons some are more likely to have the illness than others (Why?) Being an epidemiologist is a lot like being a detective- we‘re interested in WHO, WHAT, WHERE, WHEN, WHY, HOW: How many people have/got the illness in a period of time? (What? When?) What are their demographic characteristics? (Who?) Where do they live/work (Where?) How did they get sick? (How?) What are the reasons some are more likely to have the illness than others (Why?)

Epidemiological data are used to control health problems • Increase awareness to stop/slow a trend • Encourage people at high risk to avoid exposures • Discourage risky behavior overall • Encourage healthy behaviors

Speaker Notes: Prompt Audience: Why does Epi collect data on or measure the distribution and determinants of disease? (Expected answer)  It helps to put in place programmes to prevent and control diseases.

Session III: Introduction to Epidemiology 40 Epidemiology for Data Users (EDU) Trainer‘s Manual

Solicit more answers from the audience Review slide and summarize.

Facilitator background information Epidemiology is sometimes referred to the basic science of Public Health. It is a science that produces information which is used in decision making at all level of the health care system. Its main goal is to prevent and control disease. Therefore the focus of epidemiology is to understand the causes and patterns of disease and use the knowledge to improve the health populations. For example, if you know that a vector causes a particular disease, you can create a program to target that vector.

Key Elements of Epidemiology

• Person – Age – Sex – Race or ethnicity • Place – Geographic location – Proximity to potential exposure – Clustering • Time – Date/time of exposure or onset of illness – Seasonality of infectious disease – Identify disease rates

Speaker notes Review the slide. Ask participants to add to the list of the examples of each of the elements.

Facilitator background information The first step of epidemiologic investigation is to describe disease distribution. Three key elements are used in epidemiology; they are: Person, Place and Time.

Session III: Introduction to Epidemiology 41 Epidemiology for Data Users (EDU) Trainer‘s Manual

For example, who is impacted (person); where and how (place); and when (time). Data include time and place of occurrence of the disease and the characteristics of the persons infected.

John Snow (1813 – 1858)

On the Mode of Communication of Cholera

Speaker Notes: Ask the participants if anyone of them knows the story of John Snow?  If there is any, ask them to briefly explain who John Snow is and his contribution to Epi. Review the slide.

Facilitator background Information John Snow is considered to be the father of epidemiology He was a physician in London who used the three key elements of epidemiology (person, place, time) to identify the source of a cholera outbreak in London and used the information to control its spread.

Session III: Introduction to Epidemiology 42 Epidemiology for Data Users (EDU) Trainer‘s Manual

Broad St. Pump Cholera Outbreak London, England 1854 • Low-level transmission in August • Increase of cases Aug. 31 – Sept.1, 79 deaths Sept. 1 & 2 • 87% of deaths clustered around Broad St. pump • John Snow believed cholera was associated with central water pump • Pump handle removed Sept. 8 • Then cases decreased

Speaker Notes: Review the slide

Facilitator background information At the time, it was not believed that Cholera could be caused by water, but it was thought that it was due to elevation/clouds. Through observation and case investigation, Snow found Cholera was associated with central water pump (Broad St. Pump) by comparing that pump to another pump that got water upstream. To test his theory he had the Broad Street Pump handle removed, and soon after cases started to decrease.

Session III: Introduction to Epidemiology 43 Epidemiology for Data Users (EDU) Trainer‘s Manual

“One’s knowledge of science begins when he can measure what he is speaking about and express it in numbers”

-Lord Kelvin (1824-1907)

7

Speaker Notes: Ask the participants if anyone of them knows the story of Lord Kelvin?  If there is any, ask them to briefly explain who John Snow is and his contribution to Epi.

Review the slide Facilitator Background information Lord Kelvin was a renowned mathematical physicist and engineer. This quote speaks to the importance of being able to provide factual evidence of events. It is important that you express information that you have gathered in numbers. Therefore Epidemiology utilizes numerical data for its observations.

Session III: Introduction to Epidemiology 44 Epidemiology for Data Users (EDU) Trainer‘s Manual

Measures of Disease Frequency

Four Basic Mathematical Parameters: • Counts • Ratios • Proportions • Rates

8

Speaker Notes: Ask the audience: What are the four basic mathematical parameters used in Epidemiology? Solicit several answers from the audience. Review slide and summarize

Facilitator background information When we talk about disease frequency we often use one of four measures: 1. Counts - the raw number of cases/people impacted by the disease/condition, 2. Ratios - a comparison of one count to another, for example, ratio of men to women impacted by the disease/condition, 3. Proportions – a ratio that takes into account the size of a given population; and 4. Rates - standardized measure per population size in order to be able to compare frequency of diseases in different communities/populations.

Session III: Introduction to Epidemiology 45 Epidemiology for Data Users (EDU) Trainer‘s Manual

Count

• Simple measure of quantity

Example: Number of stillbirths in at the University Teaching Hospital in Zambia in 2010

9

Speaker notes Ask the participants to give examples of a count Answer – Number of participants in a workshop (for example) Solicit more examples from the audience Review the slide

“How many?”

Most basic measurement is a simple count of cases in a time frame – Not useful for comparing to other groups or populations

Race # of measles Pop. size cases Black 119 1,450,675 White 497 5,342,532

Speaker Notes Review the slide

Session III: Introduction to Epidemiology 46 Epidemiology for Data Users (EDU) Trainer‘s Manual

Facilitator background Let‘s look at the question of ―What‖ answered by descriptive epidemiology. When we ask ―what‖, we want to know what the problem is – what is the disease and how much of it is happening? The most basic answer to this question is conducting a simple count of cases. Case counts are helpful for looking at the burden of disease in a population, but are not as useful for comparing to disease levels in other groups or populations. For example, if we found 119 cases of measles in a Black population, and 497 cases in a White population, we might conclude that the white population has more measles – about a little more than five times as many cases. However, if we look at the size of each population, we also see that the white population has about five times as many people. So the proportion of measles cases doesn‘t differ that much between populations – this is a point that would be missed if we examined only the number of cases in each population.

Ratio

• Ratios express the relationship of two values in order to make a comparison

• Obtained by dividing one quantity by another. (These quantities may be related or may be independent).

Example: Ratio of females to males living with HIV in Zambia: 560,000 females/540,000 males = 1.03

11

Speaker Notes: Ask the participants: What is a ratio? (Expected answer)

Session III: Introduction to Epidemiology 47 Epidemiology for Data Users (EDU) Trainer‘s Manual

 A comparison of one count to another, for example, ratio of men to women impacted by the disease/condition, Solicit more answers from the audience. Review slide and summarize

Facilitator background information Ratios tend to be the most common way of presenting epidemiologic data. For example, we can compare the number of females living with HIV in Zambia to the number of males using 2007 UNAIDS estimates. The ratio is very close to 1, indicating similar number of infections by gender.

Proportion

• A ratio in which the numerator is included in the denominator: X (X+Y)

Speaker Notes: Ask the audience: What is a proportion (Expected answer)  A proportion is a part of a whole.

Session III: Introduction to Epidemiology 48 Epidemiology for Data Users (EDU) Trainer‘s Manual

Example of calculating a proportion

Example: Gender proportions (expressed as a %) sampled in the Zambian Demographic Health Survey (DHS)

Formula= # Men Surveyed or # Women Surveyed #Men Surveyed + #Women Surveyed

Men: 16,314 = .482 x 100 = 48.2% (16,314+17,557) As a percentage Women: 17,551 = .518 x 100 = 51.8% (16,314+17,551)

Speaker Notes: Review slide and demonstrate how to calculate a proportion. Clarify the denominator to be the appropriate pool from which cases occur. Population only includes people who could have the disease. Often expressed as a percentage- to express the ratio as part of 100

Session III: Introduction to Epidemiology 49 Epidemiology for Data Users (EDU) Trainer‘s Manual

Percentage

• A way to express a proportion (proportion multiplied by 100) • Expresses a number in relation to the whole • Example: Males comprise 2/5 of the patients, or 40% of the clients are male (0.40 x 100) • Allows us to express a quantity relative to another quantity. Can compare different groups, facilities, countries that may have different denominators

Speaker Notes: Ask the audience: What is a percentage? (Expected answer)  A percentage is a way to express a proportion multiplied by 100. It expresses a number in relation to the whole.

Facilitator background information A percentage allows us to express a quantity relative to another quantity. It allows us to compare different groups, facilities, or countries that may have different denominators – it represents a fraction of 100.

Using the previous example, we saw that two-fifths of the patients are male. To make this a percentage, we convert the fraction to a decimal (2/5 = 0.40) and then multiply by 100 (0.40 x 100 = 40%).

Session III: Introduction to Epidemiology 50 Epidemiology for Data Users (EDU) Trainer‘s Manual

Rate

• A measure of how quickly something of interest happens • A way to standardize disease frequency so that it is comparable across populations Examples: . Birth rate (number of births per year) in Zambia: 43.6 per 1,000 person-years* . HIV prevalence rate in Zambia is 14 per 100 persons compared to 24 per 100 persons in Botswana *2007 Zambian Demographic Health Survey (DHS) 14

Speaker Notes: Ask participants to give an example of a rate (Expected answer)  Birth rate, HIV prevalence rate etc Solicit more answers from the audience Review slide and summarize

Facilitator background information Rates are another common measure in epidemiology. Rates allow us to indicate how quickly something is happening and because rates are measures standardized to the size of the population, they also allow us to compare two different populations. The first example shows us what the birth rate is in Zambia per person years. The second example uses rates to compare the HIV burden in Zambia and Botswana based on UNAIDS data.

Session III: Introduction to Epidemiology 51 Epidemiology for Data Users (EDU) Trainer‘s Manual

Measures of Disease Frequency

• Prevalence (P): Measures existing cases of a disease at a particular point in time or over a period of time

• Incidence (I): Measures new cases of a disease that develop over a period of time

15

Speaker Notes: Ask the participants: What are the key measures of disease frequency? (Expected answers)  Prevalence  Incidence Follow up question: What does each one measure? (Expected answers)  Prevalence – How commonly a disease or condition occurs in a population  Incidence – number of new cases within a specified time period Review slide and summarize

Facilitator background information Several measures of disease frequency are based on the fundamental concepts of prevalence and incidence. Calculating measures of disease frequency depends on correct estimates of the numbers of people under consideration. Population at risk is the part of the population which is susceptible to a disease Data on population at risk are not always available and in many studies the total population in the study area is used as an approximation. Ideally these figures

Session III: Introduction to Epidemiology 52 Epidemiology for Data Users (EDU) Trainer‘s Manual should include only people who are potentially susceptible to the disease studied, e.g. men should not be included in calculations of the frequency of carcinoma of the cervix. Prevalence and incidence are fundamentally different ways of measuring occurrence and the relation between prevalence and incidence varies between diseases. Prevalence is the number of cases in a defined population at a specified point in time. Incidence is the number of new cases arising in a given period in a specified population. There may be a high prevalence and low incidence as for diabetes or low prevalence and high incidence as for the common cold.

Calculating Prevalence Proportion of existing individuals in a population who have the disease (or condition) at a specified time

Number of cases at a Prevalence = specified Number of persons time in population

16

Speaker Notes: Review the slide and demonstrate the calculation

Facilitator background information To be able to compare the impact of a specific disease on a population, epidemiologists often use a measure called prevalence, instead of counts of cases.

Session III: Introduction to Epidemiology 53 Epidemiology for Data Users (EDU) Trainer‘s Manual

Prevalence is the number of affected persons present in the population divided by the number of people in the population. Prevalence is often referred to as a rate. Let‘s go through an example of how to calculate prevalence. Data on prevalence and incidence is much more useful if converted into a rate/ratio or proportion

Prevalence: - Includes all of the people with the condition, whether or not they just developed it or have had it for years. - Is often reported as a percent (%) or as a rate per 100,000 persons. - Specified time can be a specific ‗point‘ in time or a ‗period‘ or ‗interval‘.

Prevalence Example

• In 2010, the Zambian Ministry of Health reported 1494 HIV infections among the 15-49 age group in one of the districts. The Central Statistics Office estimated that the 2010 population in the district was 10,444 among this 15-49 age group 1,494 Prevalence= *100= 14.3% 10,444 • In 2010, the prevalence of HIV among Lusaka residents was 14.3 % (Can also be expressed as 143 cases per 1,000 residents)

Speaker Notes: Review the slide and demonstrate

Session III: Introduction to Epidemiology 54 Epidemiology for Data Users (EDU) Trainer‘s Manual

Facilitator background information To calculate the prevalence, we divide the number of existing cases by the number of people in the population. These numbers may not be actual numbers—this is for the purpose of having participants calculate prevalence.

Uses of Prevalence

• Useful for assessing the burden of disease within a population

• Valuable for planning (i.e., planning health services or interventions)

• Not useful for determining what caused disease

Speaker Notes: Review the slide

Facilitator background information Prevalence is a useful measure for assessing the burden of disease within a population. Prevalence of HIV/AIDS in a population is useful for allocating resources and planning health services. Routine data collected at health facilities is useful for planning requirements (i.e., # test kits, ARVs, etc.) Prevalence is not useful, however, for determining what causes disease in a population. Prevalence tells us ―how much,‖ but not how or why.

Session III: Introduction to Epidemiology 55 Epidemiology for Data Users (EDU) Trainer‘s Manual

Calculating Incidence

Proportion of people who develop the disease (new cases) over a specified period of time

Number of new cases at a Incidence = specified Number of persons time at risk in population

19

Speaker Notes: Review the slide and demonstrate

Facilitator background information Incidence is the number of new cases of a disease that occur during a specified period of time, divided by the number of persons at risk of developing the disease during that period of time. Time period must be the same for both numerator and denominator Can be reported as a percent over a specified period of time

*“At risk” = susceptible to and do not already have outcome The denominator of an incidence proportion is the number of persons at the start of the observation period. The denominator should be limited to the ―population at risk‖ for developing disease, i.e., persons who have the potential to get the disease and be included in the numerator. For example, if the numerator represents new cases of HIV/AIDS, the denominator should be restricted to

Session III: Introduction to Epidemiology 56 Epidemiology for Data Users (EDU) Trainer‘s Manual those who do not have HIV/AIDS. This can be accomplished when data on existing HIV/AIDS cases are available. Let‘s look at an example of incidence to help make this clearer

Speaker Notes: Review the slide and demonstrate

Facilitator background information Of the 450 in the village, 45 had a prior diagnosis of HIV. During 2010, 9 people became newly infected with HIV. The numerator for the incidence calculation is the number of new cases. We are told that this number is 9. The denominator for the incidence calculation is the number of persons in Village A who are at risk of being diagnosed/infected with HIV/AIDS over that period of time. Because 45 people were already infected/diagnosed with HIV/AIDS, they were not at risk of developing HIV/AIDS during the time frame (2010). These 45 people need to be taken out of the picture, so we subtract them from the 450 people living in the village, for a total of 405 people at-risk of getting infected with HIV. The result of the calculation is 0.022

Session III: Introduction to Epidemiology 57 Epidemiology for Data Users (EDU) Trainer‘s Manual

This can also be expressed as a percent. The interpretation would be: The one- year incidence of depression among adults in village A is 2.2%. The incidence could also be expressed as 22 cases of HIV per 1,000 persons.

Uses of Incidence

• High incidence represents diseases with high occurrence; low incidence represents diseases with low occurrence

• Can be used to help determine the causes of disease

• Can be used to determine the likelihood of developing disease

Speaker Notes: Review the slide

Facilitator background information Incidence is a useful measure for examining how common it is for people to get a disease. High incidence represents diseases with high occurrence; while low incidence represents diseases with low occurrence. Incidence can be used to help determine the causes of disease, and can be used to determine the likelihood of developing disease.

Session III: Introduction to Epidemiology 58 Epidemiology for Data Users (EDU) Trainer‘s Manual

Incidence and Prevalence

Cured Incidence

Prevalence

Prevalence

Died / Lost to follow up 22

Speaker Notes: Ask the participants to describe what is depicted in the picture on the slide? (Expected answer)  A Bathtub Explain to the participants that if we consider incidence and prevalence as being related to a bathtub, the incidence would be represented by the water flowing into the tub, with the prevalence being depicted by the water in the tub (pool of cases). The prevalence pool is affected by individuals leaving the population (e.g. death), which can be viewed as the drainage from the tub and the ones being cured (depicted as evaporation from the pool).

Ask the participants: what happens when the water flow (prevalence) increases and the drain (deaths) is plugged? (Expected answer)  The water in the bath tub (the prevalence pool ) will increase

Prompt Audience: If people are not dying, but prevalence is increasing, are you doing well? Answer is yes and no.

Session III: Introduction to Epidemiology 59 Epidemiology for Data Users (EDU) Trainer‘s Manual

Why? Yes, because fewer people are dying and more individuals are living longer with the disease. And no, because the incidence may still be high, meaning the disease has not been eliminated.

Facilitator background information If a disease is curable or fatal, its prevalence decreases. If a disease is only treatable and not curable, its prevalence increases. For example, in countries where Anti-Retrovirals are readily available for individuals living with HIV, the prevalence of the disease increases.

Incidence vs. prevalence

Characteristic Incidence Prevalence

Numerator New Cases All Cases Denominator Susceptible All cases + (At risk) non-cases Time Duration Single point

23

Speaker notes Review the slide together with the participants.

Session III: Introduction to Epidemiology 60 Epidemiology for Data Users (EDU) Trainer‘s Manual

Practice Scenario

A town has a population of 3600. In 2003, 400 residents of the town are diagnosed with a disease. In 2004, 200 additional residents of the town are diagnosed with the same disease. The disease is lifelong but it is not fatal. • How would you calculate the prevalence in 2003? In 2004? • How would you to calculate the incidence in 2004?

Speaker Notes: Introduce the exercise to the participants. The activity is a practice scenario of calculating incidence and prevalence. Let the participants practice the calculations. Say we are looking at a town with a population of 3600. In 2003, 400 residents of the town are diagnosed with a disease of interest. In 2004, 200 additional residents of the town are diagnosed with the same disease. The disease is lifelong but it is not fatal. How would you calculate the prevalence in 2003? How about in 2004? How would you to calculate the incidence in 2004?

Facilitator Background information Note that we can calculate prevalence in both 2003 and 2004, but we can only calculate incidence in 2004. This is because we do not know how many of the cases in 2003 were NEW cases. In 2004, we can see that we have 200 new cases. Also, the prevalence calculations will not be affected by case-patients that recover or case-patients that die. No one recovers or dies from this disease, so we

Session III: Introduction to Epidemiology 61 Epidemiology for Data Users (EDU) Trainer‘s Manual do not have to consider this in the prevalence calculation. On the next slide, we will look at how to calculate these.

Practice Scenario Answers

• Population : 3600 • 2003: 400 diagnosed with a disease • 2004: 200 additional diagnosed with the disease • No death, no recovery

Prevalence Prevalence Incidence (2003) (2004) (2004) Numerator 400 600 200 Denominator 3600 3600 3200 11.1% 16.7% 6.3%

Speaker Notes: Before you show this slide, firstly ask the participants to give the answers they have come up with. Collect several responses. Review the slide and compare with the answers from participants. Clarify and demonstrate how to calculate the measures

Facilitator background information A short re-cap is at the top of this slide for your reference: we have a population of 3600, with 400 cases in 2003 and 200 more cases in 2004. To calculate the prevalence in 2003, we simply take the number of people with disease in 2003 (which is 400), and divide that by the total population (which is 3600). This yields a prevalence of 11.1%. The prevalence calculation is done in the same way in 2004, only this time we add the 200 NEW cases to the 400 OLD cases, so the numerator is 600. This gives a 2004 prevalence of 16.7%.

Session III: Introduction to Epidemiology 62 Epidemiology for Data Users (EDU) Trainer‘s Manual

To calculate the 2004 incidence, we take the new cases in 2004 (which is 200), and divide by the total population at-risk. Anyone who already has the disease is not at risk of contracting the disease again, so those 400 prevalent cases from 2003 have to be subtracted out. This leaves us with a total population of 3200 who are at-risk of contracting the disease. 200 divided by 3200 gives an incidence of 6.3%. For this disease, the prevalence will climb higher and higher as each year passes, because no one ever recovers!

Use of Epidemiology in Health Program Planning Epidemiological data are useful to: – Plan and respond to health issues – Monitor and evaluate health services – Set programmatic priorities and allocate scarce health care resources – Influence public policy, including health policy

Speaker notes Review the slide.

Facilitator background information Epidemiology is used for diagnosis of the health of the community. We try to make a picture of the characteristics of the community with respect to its demographic makeup and the health problems that exist in the community. From this information we can formulate specific plans and programs to intervene in order to improve the health of the community. Another use is to improve the working of health services. For example, we want to find out if there are areas of

Session III: Introduction to Epidemiology 63 Epidemiology for Data Users (EDU) Trainer‘s Manual the community that are lacking in health services or whether there are some that are overlapping and use the knowledge to take corrective measures.

Measles Cases by Week of Onset 4 January 2010 to 19 September 2010

28

Speaker Notes: Ask the participants: What is the above graph showing? (Expected answer)  Number of measles cases Review slide and summarize

Facilitator background information This graph displays the number of measles cases during the period of January 4 to September 19, 2010. The dates are displayed as intervals on the x-axis and the number of cases on the y-axis. The bars represent how many cases for the specific time period. You can see that there was an increase in cases that peaked in July and then declined. Assuming that Supplemental Immunization Activity (implemented the week of 19/07/2010) was responsible for reducing incidence and funding wasn‘t an

Session III: Introduction to Epidemiology 64 Epidemiology for Data Users (EDU) Trainer‘s Manual issue, when would it be better to implement interventions? How would we know?

Prevalence – Measles Example, Lusaka • In order to achieve herd immunity, 90% vaccine coverage needs to be achieved • Administrative coverage both nationwide and in were >90% • Even where the coverage had been reported as high by administrative method, there may be an aspect of inaccurate estimate (in this case, an underestimate) of population denominator. So, many children may have missed the vaccination

Speaker notes Review slide

Facilitator Background Information Epidemiology can be used to influence health program planning. In this case, in order to prevent a measles outbreak in Lusaka, 90% vaccine coverage needs to be achieved (herd immunity). However, recently in Lusaka, while 90% vaccine coverage had been reported, a measles outbreak occurred. This was due to an inaccurate population denominator (underestimated) so in reality many may have been missed in vaccination and >90% was not achieved. This illustrates importance of using accurate data (in this case the correct denominator) in program planning.

Session III: Introduction to Epidemiology 65 Epidemiology for Data Users (EDU) Trainer‘s Manual

Measles Vaccination Status by Age Group (Lusaka District)

Incidence Not eligible for Not Vaccinated rate/ 1,000 vaccination Vaccinated pop

<9 months 1279 28.8

9-59 months 490 2673 12.5

6-15 years 221 1109 2.9

• Age Eligible for Measles Vaccine: 9 months

29

Speaker Notes: Ask the participants: What does the table show? Answer – measles vaccination status by age group. Review the slide and summarize

Facilitator background information The table indicates that:  Intense measles transmission especially involving children aged 9 months, too young to be vaccinated and very high CFR; some of which could be due to malnutrition.  Most measles cases were in children not previously vaccinated. Accumulation of susceptible children missed in routine immunization and prior follow‐up campaign of poor quality. For future program planning, the following would be recommended:  The scheduled measles immunization campaigns for the year 2010 came in hand to serve as an outbreak response given resource constraints.  The EPI team to look at ways of strengthening routine immunization program to ensure that children get all scheduled vaccinations.

Session III: Introduction to Epidemiology 66 Epidemiology for Data Users (EDU) Trainer‘s Manual

 Need to budget for next follow‐up campaign to avoid the situation seen with measles in 2010.

Use of Epidemiology in Health Policy • Epidemiological research results can influence public policy, including health policy • Local evidence is usually required before a strong case can be made for policy change • Time scale for the application of epidemiological research to policy varies

30

Speaker Notes: In the application of epidemiology to public policy in a given country, difficult decisions have to be made about the relevance of research done in other countries since it is usually impossible, and probably unnecessary, for major studies to be repeated.

Session III: Introduction to Epidemiology 67 Epidemiology for Data Users (EDU) Trainer‘s Manual

Example: Malaria Treatment Policy in Zambia • Significant increase in incidence of malaria (121.5/1,000 persons in 1976 to 308/1,000 persons in 1999) • Surveillance data indicated increased resistance to chloroquine and treatment failure (25.9% - 52.0%) • This evidence prompted Zambian government to change malaria treatment policy: from use of chloroquine to artesiminin-based combination therapy (ACT) • Phased scale up in ACT deployment

Speaker Notes: The scale up included 7 districts in early 2003, 28 districts in second half of 2003, and then to all 72 districts countrywide in 2004. This illustrates how summarizing and analyzing routinely collected data can influence drug/treatment policy change throughout the country.

Session III: Introduction to Epidemiology 68 Epidemiology for Data Users (EDU) Trainer‘s Manual

Randomised Controlled Trials of Male Circumcision to Reduce HIV Infection

Rakai, Uganda

Kisumu, Kenya

Orange Farm, South Source: 2006 Report on the global AIDS epidemic Africa 32 (UNAIDS, May 2006)

Speaker Notes: Randomized controlled trials were conducted in South Africa, Uganda and Kenya between 2002 and 2006 involving 10,908 men on whether male circumcision reduces risk of HIV acquisition in men.

Conclusions and Recommendations from WHO/UNAIDS Technical Consultation • Conclusion Randomized controlled trials in South Africa, Uganda and Kenya between 2002 and 2006 involving 10,908 men provided compelling conclusive evidence that MC reduces the risk of HIV acquisition in men by about 60% • Key Recommendations: “Promoting male circumcision should be recognized as an additional, important strategy for prevention, but not replace other known methods of HIV prevention and should always be considered as part of a comprehensive HIV 33 prevention package” Speaker Notes: This is an example of how epidemiology can be used to provide evidence to influence policy

Session III: Introduction to Epidemiology 69 Epidemiology for Data Users (EDU) Trainer‘s Manual

Zambian MOH Policy and Implementation

• Male Circumcision adopted as effective National prevention strategy • Should be scaled up throughout country • Strategy & Implementation plan launched in July 2009 • Target 2.5 million males circumcised by 2020

WRAP-UP AND REVIEW

Session Objectives

At the end of the session participants will be expected to: Define epidemiology Describe three key elements of epidemiology Determine when epidemiology may be used to identify a health problem, target control measures, and influence program planning or policy Calculate and interpret primary measures of disease frequency

35

Speaker Notes: As stated at the beginning of this session by now you should be able to provide a definition of epidemiology and describe three key elements. You should also be able to determine when epidemiology may be used to identify a health problem, implement control measures, or influence program planning and policy. Finally you should have a good understanding of the most commonly used measures of disease frequency, both how to calculate and interpret them.

Session III: Introduction to Epidemiology 70 Epidemiology for Data Users (EDU) Trainer‘s Manual

RESOURCE CENTER

Resource Center

• Gordis, L. ( 2000). Epidemiology: 2nd Edition. W. B. Saunders Company: Philadelphia.

• Kipp, A. (2004). “Overview of Epidemiology in Public Health.” North Carolina Center for Public Health Preparedness, UNC Chapel Hill School of Public Health.http://www.sph.unc.edu/nccphp/training/all _trainings/at_epidmeth.htm

Speaker Notes: Here are some additional resources that may be useful in understanding epidemiological concepts.

Session III: Introduction to Epidemiology 71 Epidemiology for Data Users (EDU) Trainer‘s Manual

SESSION IV: DATA QUALITY

Data Quality

Description of Session: Explain to the participants that this session will discuss data quality and why having quality data is important. Additionally it will provide participants with ways of identifying gaps in data quality and how to address those.

Learning Objectives

Objectives

At the end of this presentation, participants should be able to: • Define data quality and describe its importance for data entry, synthesis and reporting • Determine factors contributing to poor quality data • Follow criteria for assessing data quality • Recognize breaches/gaps in data quality

Speaker notes Display the objectives slide above and read them out to the participants.

Session IV: Data Quality 72 Epidemiology for Data Users (EDU) Trainer‘s Manual

Learning Activities at a Glance: Learning Content Method Material Handout/ Length Activity Appendix (min) 1 Data Quality Lecture/ Copies of question PowerPoint and answer presentation for each participant

Blank flipchart paper and markers

Session IV: Data Quality 73 Epidemiology for Data Users (EDU) Trainer‘s Manual

What is Data Quality?

• Features and characteristics that ensure data are accurate and complete and that they convey the intended meaning

• The degree of excellence of data

 Data of high quality improve the intended use in operations, decision making and planning

4

Speaker notes Start by reminding the participants that during the Data Sources presentation, it was discussed that Data is collected (patient records, census, population surveys, etc.) and then entered into databases (such as HMIS, NACMIS and SmartCare). Data is then analyzed, summarized and included in reports. These reports are often used by partners for decision making. At all these stages, Data Quality plays an important role. Ask the participants what they understand by the term DATA QUALITY? Collect several responses and note them on a flip chart and brainstorm. (Expected Answers)  Data should be accurate  Data that is timely  Etc. (Presenter should solicit more answers from the participants) After collecting several responses, you may now beam the PowerPoint slide below. Building on the responses from the participants, summarize and define data quality. Additional information is provided in the facilitator‘s background information below.

Facilitator background information: • Data quality involves things that help one be sure that the data say the right thing, and that the right thing is accurate. • Data that can be described as high quality will hold stronger weight in a decision-making process; data that are of poor quality may be discounted and/or ignored.  Decision and policy makers should always assess the quality of data, and be able to speak to the quality of the data before using it.

Session IV: Data Quality 74 Epidemiology for Data Users (EDU) Trainer‘s Manual

Data quality can be defined in many ways, but getting back to our approach to epidemiology – answering the questions of who, what, where, when, why – we might want to be able to speak with some confidence about the answers, and to do so, we might want to consider these ideas related to data quality:

 Do we trust the data? o Are the data reliable? . For example, do we know that the people entering or submitting the data know what they are doing?  i.e., have the CATF members been well trained in the data collection form? Can they correctly identify the definitions of the indicators? Do they input the correct data into the indicator measure? o Are the data valid? . For example, do we know that the people reporting the data have double checked what they are actually submitting for errors?  i.e., does a site supervisor review the data entry and aggregation of health center indicators before the quarterly report is submitted to the MoH?  Do the data represent all or some of what we are talking about? o Are all input sources represented? . For example, if you are expected to report on inputs from five sites, are all sites represented in the aggregate data?  i.e., if there are five districts in a province, has the PACA received all of the DACA‘s reports to compile their own? And, does the PACA know if the DACA has received and compiled data from all of the expected CATFs or NGOs? o Are all expected fields completed? . For example, if there are 10 indicators, and five reporting entities, have all five entities completed all of the 10 indicators?  i.e., if there are five VCT services within a hospital, when the aggregate report is completed, have each of those five services reported on all of the expected indicators?  Are the data current? o Are the data that you are using from the last quarter, or from a year ago? . For example, are you reporting on a current trend, or historic trends?

Session IV: Data Quality 75 Epidemiology for Data Users (EDU) Trainer‘s Manual

. Do you know that the reporting entity is reporting on the correct and anticipated time period? For example, if you ask for Q3 (quarter three) data, are you getting data from October through December (Q3 calendar year) or from July through September (PEPFAR Q3), or from another period

Why is Data Quality Important?

Reporting & Accountability

Program & Resource Policy Decision Allocation Making

Program Tracking Improvement Disease Trends

Speaker notes Before you beam the above slide, ask the participants in a brief interactive session why data quality is important. (Expected Responses)  Good decision making  Planning  Accountability etc (Solicit more answers from the participants)  Show slide and explain using additional background information provided:

Facilitatorr Background Information Good data quality is essential for all aspects of our daily program operations. Data quality is important for:  Program decision making - Understanding what your program or your region is doing, how you are doing compared to targets, and what kind of things you can do to improve processes and/or impact  Sharing program information to highlight impact, progress, or needs  Accounting for program and fiscal investments

Session IV: Data Quality 76 Epidemiology for Data Users (EDU) Trainer‘s Manual

Data of high quality improves the intended use in operations, decision making and planning

Throughout the data collection process it is essential that data quality be monitored and maintained. Data quality is important to consider when determining the usefulness of various data sources; the data collected are most useful when they are of the highest quality. It is important to use the highest quality data that are obtainable, but this often requires a trade off with what it is feasible to obtain. The highest quality data are usually obtained through the triangulation of data from several sources. It is also important to remember that behavioral and motivational factors on the part of the people collecting and analyzing the data can also affect data quality.

Data Quality and M&E

Quality Data

Program Improvement Data Sharing with Partners

Reporting/ Accountability

M&E

Speaker notes Begin by reminding the participants that Data use was covered in the presentation on Data Sources. Ask the audience:

What are some of the [potential uses of data?

(Expected answers)  To monitor and evaluate programmes  To track the performance of indicators  Programme planning  Decision making

Session IV: Data Quality 77 Epidemiology for Data Users (EDU) Trainer‘s Manual

(Solicit more responses from the audience.) Using the PowerPoint slide, explain and emphasize relationship between data quality and monitoring& evaluation.

Facilitator background information

Throughout the data collection process it is essential that data quality be monitored, supported, and maintained.

 It is great to start with quality data, but and M&E system and a quality improvement effort can also work to improve the inputs. o M&E data should be timely, reliable, specific to the activities in question, and easy to understand. o An M&E system should be used to guide perceivable progress to improvement goals.

 With the ability to speak to the quality of one‘s data, and having confidence, programs can use the data to improve their work and their outputs.

 When programs have data that they understand and are confident in, they can share it with funders or others that require reporting data, and they can share it with others in their region to support effective evidence-based planning. o To be useful, information must be based on quality data and it also must be communicated effectively to policy makers and other interested stakeholders. The key to effective data use involves linking the data to the decisions that need to be made and to those making these decisions

Session IV: Data Quality 78 Epidemiology for Data Users (EDU) Trainer‘s Manual

Data Quality and Epidemiology

Bad Data Quality

leads to

Incorrect Data Analysed

leads to

Unreliable Epidemiological Results Slide 6

Speaker notes Firstly remind the audience about the questions that epidemiologists try to answer about a disease i.e. WHO, WHAT, WHERE, WHEN, WHY and HOW? Emphasize that to answer these questions, one needs to collect DATA.

Facilitator Background information Epidemiologists carry out extensive research and gather data about the patterns of diseases and health conditions. This data is analyzed and the conclusions that are reached are used to design interventions to control the spread of diseases. The Quality of that data is therefore important for Epidemiology because poor quality data will result in wrong analysis and conclusion and their unreliable and incorrect epidemiological results.

Session IV: Data Quality 79 Epidemiology for Data Users (EDU) Trainer‘s Manual

What are some of the Data Quality Issues?

?

Speaker Notes: Display above slide. In a brainstorming session ask the participants to give some examples of data quality issues.

(Expected Responses)

 Incomplete data  Late submission  Wrong computations etc

(Solicit more answers and ask participants to justify why the particular issue is a data quality issue)

Facilitator background information: Here are some ideas of questions that you might consider asking of the group as part of the brain storming session:

 What data do you collect? o Do you ever see if the source data match the report that you receive or submit? o What issues have you come across?

 Are there any PACAs here? o Every quarter, do you always get a report from every DACA?

Session IV: Data Quality 80 Epidemiology for Data Users (EDU) Trainer‘s Manual

o Every quarter, are you sure that the DACA‘s report represents every community-level/NGO activity that should be a part of the report?

 Has anyone here ever had to aggregate data from multiple submissions onto one report? o Have you ever had a problem reading the handwriting on the report? o Have you ever just guessed what was written on a report?

What’s wrong with this picture?

Speaker notes Ask the participants what is wrong with the data in this picture? How would you get information from this system?

(Expected answers)  Data is not stored adequately. A lot of forms containing data are not in a usable format.  It would be very difficult to information from such a system

Facilitator background information: Though this picture may be an example of data that are not being stored adequately, and/or that the data are not in a usable format, this may also be indicative of other issues:  These could be data that are waiting to be entered into the data collection system

Session IV: Data Quality 81 Epidemiology for Data Users (EDU) Trainer‘s Manual

 This is a site that has no storage space available despite repeated requests to their manager  This site has become a point-of-care data entry site (SmartCare), and these are historic files that were entered into the system and are waiting to be filed in archives.

Use of Improvised data collection tools

Speaker notes Ask the participants what is wrong with the data in this picture?

(Expected answer)  Some information is missing

(Solicit more responses from the audience)

Facilitator background information

Explain to them that this is an improvised exercise book and not the standard register Most of the information as requested in the standard register is missing. This consequently affects the reporting of information to the next level.

Session IV: Data Quality 82 Epidemiology for Data Users (EDU) Trainer‘s Manual

How would you address the situation as shown in the picture in our resource constraints environment?.

Speaker notes Ask the participants what is wrong with the data in this picture?

(Expected answer)  One person to attend to patients and also write reports

(Solicit more answers from the audience)

Follow up question: How would you address such a situation?

Facilitator background information As you can see in this cartoon, this woman is not receiving care because the health workers are overwhelmed with time-consuming data reports. We need to count this woman. Inefficient and disorganized data collection should not interfere with health services. Poor data quality has a detrimental effect on services rendered.

Session IV: Data Quality 83 Epidemiology for Data Users (EDU) Trainer‘s Manual

Data Quality Issues

• Too many data elements are being collected • Lack of data use locally • Absence of M&E “culture” • Multi-tasking - Registers and records are not adequately maintained because of other functions • Insufficient Financial, technical and human resources

11

Speaker notes Sum up data quality issues by using the slide and additional background information.

Facilitator background information: The slide shows some examples of data quality issues. Here are some examples of what the issue is, and a description of the impact that it might have:  Over/under reporting: this misrepresents what is actually happening o For example: if a site reports that it has tested five pregnant women found to be HIV positive, the MoH may decide to send ART for five women (PMTCT). If there are actually 10 pregnant women that were tested as HIV positive, there will not be enough ARVs to prevent HIV among the women‘s children

 Poor quality control o Using the same PMTCT example, this error could have been corrected if someone was reviewing and validating the source and the aggregate data.

 Omissions o Using the same PMTCT example as above, omissions could have been avoided if someone had checked the source data (i.e., the register) vs. the summary report. o Another consequence of omissions could relate to a decrease in financial support. Imagine that a district does not check that all sites are reporting to them every quarter, and only half actually do.

Session IV: Data Quality 84 Epidemiology for Data Users (EDU) Trainer‘s Manual

In this case, let‘s say that the submitted report describes that 10 sites in a district did 1,000 HIV tests in the quarter when in fact 30 sites in the district did more than 5,000 HIV tests. If someone in Lusaka were to look at the report, they may plan to make only 1,000 tests available to the district in the next quarter, to meet their stated need. Would this actually meet the district‘s stated need? Likely not.

 Double counting o Double counting can result in the opposite of what was noted above: it over-describes someone‘s performance or reach, and it may in fact draw resources away from others that have much greater need.

 Calculation errors o Calculation errors occur when two or more numbers are summed or compared to each other. Calculation errors can result in many of the aforementioned data quality issues, including over or under reporting.

 Change of report format o A change in report format may influence data quality in a positive or negative way, depending on how much training is provided to the users. If training and orientation is not provided, those who are using the form may not notice the changes, and may transcribe erroneous data unknowingly. Reporting formats can often be changed to better support a data collection process (i.e., after a pilot), but in every case, users must be oriented to the change.

 Non-systematic data collection o Systematic data collection allows for a standardization of the process and system, and can often support improve data quality as a user becomes accustomed to the process, and can implement quality improvement processes. Non-systemic data collection may impact data quality in the opposite way.

 Lack of supporting documents o The lack of supporting documents – i.e., DACA reports that were used to create the PACA report – means that data validation cannot be done.

Session IV: Data Quality 85 Epidemiology for Data Users (EDU) Trainer‘s Manual

o Inconsistent reporting periods Inconsistent reporting periods may misrepresent trends or needs, and/or may negatively impact the systematic nature of a data collection and reporting process

Common Errors in Data Quality

• Transposition - an example is when 39 is entered as 93. Transposition errors are usually caused by typing mistakes. • Copying errors - one example is when 1 is entered as 7; another is when the number 0 is entered as the letter O. • Coding errors - putting in the wrong code. For example, an interview circled 1 = Yes, but the coder copied 2 (which = No) during coding • Consistency errors - consistency errors occur when two or more responses on the same questionnaire are contradictory. For example, if the birth date and age are inconsistent • Range errors - Range errors occur when a number lies outside the range of probable or possible values

Speaker Notes: Ask the audience: What are some of the common errors I data quality? (Expected responses)  Addition errors etc Review slide and ask participants to add to the list.

Facilitator Background Information:  Transposition: A reversing of the numbers o 39 entered as 93 o This can also happen with letters

 Copying Errors: Entering the incorrect data where it looks similar, especially challenging when copying from poor handwriting o 1 entered as 7 or I o 0 entered as O

 Coding Errors: Putting the wrong numerical code into a data system o If 1=Yes and 2=No in a survey, and in the data entry process, 2 is entered rather than 1. o This can also happen with letters

 Consistency Errors: Where two variables should be the same in the data collection, but they are not

Session IV: Data Quality 86 Epidemiology for Data Users (EDU) Trainer‘s Manual

o Stated age and calculated age from date of birth do not match o Data of diagnosis is different in two parts of the form for the same person

 Range Errors: Range errors are where the data just don‘t make sense o A child of 3 years old is noted as pregnant o A male is listed as pregnant o A person diagnosed with HIV is noted as 96 years old.

These errors are not always easy to find, or to correct too long after the fact. Some of these errors can be flagged for validation if two people enter the same data, and a discrepancy is found. Another idea is to validate a proportion of all data (i.e., 5% per site), and doing further follow-up where issues are found

HINT: In assessing data quality, one should not make judgments from a quick scan. There may well be more than meets the eye. There are many reasons that data might be of poor quality.

Effects of Poor Data Quality

• Waste of additional program resources in order to correct errors • Late or poor reporting to stakeholders – Can lead to reduced confidence and support (and loss of funding) • Missed opportunities to identify strengths or gaps in program activities • Can lead to wrong decisions that could result in loss of life

Speaker notes Before showing this slide, emphasize to the participants that the data issues discussed above can have a negative effect. Ask the participants:

What is the effect of poor data quality?

(Expected answer)

 Wrong analysis and decisions are made

Session IV: Data Quality 87 Epidemiology for Data Users (EDU) Trainer‘s Manual

(Solicit more responses from the audience)

Sum up interactive discussion with the slide above and additional background information below.

Facilitator background information: Poor data quality can impact at many levels – site, district, provincial, or national – and may include:  A misunderstanding of what is happening, including program successes and/or program needs  A demotivation of the staff if their data reports to not seem to align with their work  A misallocation or loss of fiscal, programmatic, or human resources.

Ensuring data quality, Where does one start???

Speaker notes Ask the participants: What is the starting point in ensuring data quality?

(Expected response)

 At the point of data collection.

Solicit more responses from the participants. Speaker Notes:

Emphasize to participants that good data quality:  Do it early  Do it often

Session IV: Data Quality 88 Epidemiology for Data Users (EDU) Trainer‘s Manual

 Do it organized  Do it well

The process should start with the tools e.g. forms, procedures, human resources, equipment, etc. Don‘t assume. Make sure instructions, process, and responsible persons are identified before the first piece of data is collected.

Steps to Ensuring Data Quality in Data Management • Use standard data collection and reporting tools • Develop clear Standard Operating Procedures (SOPs )-Data Quality Assurance procedure (DQC & DQA) • Establish a data review culture • Establish training plan/orientation and on- site mentorship • Develop budgeted M&E plan that encompass printing of data tools

Steps to Ensuring Data Quality in Data Management • Establish timelines for data quality checks (Data Auditing) and submission • Performing data profiling and data flow analysis • Assess current data quality

Speaker notes Inform the participants that Data quality issues require solutions. There are some simple steps that you can take to ensure data quality. Review the slide

Session IV: Data Quality 89 Epidemiology for Data Users (EDU) Trainer‘s Manual

Facilitator Background information

 If you are not creating your own forms and you are responsible for collecting and aggregating the data from others, you should ensure that they have clear instructions and training.  when receiving paper-based and electronic records It is useful to have a data codebook next to you during analysis especially when asked to analyze a dataset you did not prepare yourself  Establish a strong data management system before collection begins  Update and manage information frequently  Ensure that good data storage, retrieval, and security systems are in place  Focus on data quality through all parts of the data flow. Data flow analysis is the process of mapping how information flows throughout the organization

Data profiling is a detailed analysis of available information. Typically data profiling will unearth metadata about the data being investigated. This is done through simple analytics such as the range of values, inferred data types, maximum and minimum values, and frequency distributions

Session IV: Data Quality 90 Epidemiology for Data Users (EDU) Trainer‘s Manual

3 C’s of Good Data Quality

Correct good quality data Complete submission by all (most) reporting facilities Consistent data within normal ranges population shifts definition changes 17

Speaker notes Emphasize to the audience that good, quality data needs to have all the three C‘s:

Correct, complete and consistent. Reinforce that good quality data on its own is useless, unless it is analyzed

Stages to check for data quality

Speaker notes Remind the participants that data passes through various stages. Some these are:  Collection  Entry

Session IV: Data Quality 91 Epidemiology for Data Users (EDU) Trainer‘s Manual

 Storage

At what stage do you check for data quality?

(Expected answer)  All the stages

Facilitator background information Data quality is important and every organization needs to pay particular attention to the quality of the data that it holds. Data flows from one or more sources and is stored in a management information system or database before it is processed or analyzed. At each of the stages in the data flow system, issues of data quality are critical. The quality of the data needs to be ensured and maintained if the data is to be of any use.

Data Collection

• Orient staff in the use of the data collection • Ensure all relevant staff are trained (data collectors, information officers, program managers) • Use standard tools – Encourage routine recording and reporting, using standard forms and procedures (avoid use of assumptions) – Emphasize on completeness – Develop a plan for addressing staff turnover

Speaker Notes: If you are not creating your own forms and you are responsible for collecting and aggregating the data from others, you should ensure that they have clear instructions and training.  Are the submitted forms correct (standard reporting forms?)  Are forms complete?  Are answers clearly written?  Are answers consistent?  Are figures tallied correctly?

Session IV: Data Quality 92 Epidemiology for Data Users (EDU) Trainer‘s Manual

Facilitator background information:  Create ownership: o Why is this important? . For staff to invest time in their work – especially the mundane task of data collection, entry, and reporting - they must understand and appreciate the value of the work they are doing. o What can we do to support it? . Providing staff training, supporting the staff with supportive supervision, providing tools and tricks for success, and most importantly, providing feedback to the staff (their data, as well as feedback on the quality) are absolutely integral to any success.

 Ensure all are trained, and more: o Why is this important? . A system will not function, let alone be successful, if there is not a trained staff. And, a staff is not one person. The person who is expected to do the work must be oriented and trained, but so must the people where data will be drawn from, the people managing and supervising the person, the person‘s team, and a back-up for the person. Data systems do not function with only one person per level. o What can we do to support it? . First and foremost, the output, the inputs, and the system in between must be clearly defined. Then, the roles and responsibilities needed for the system, and the staff that can fill those roles must be identified and trained. Finally, supportive and supervisory relationships and timelines must be established and supported.

 Design, support and provide feedback on a standard process: o Why is this important? . As previously noted, the system must be well designed and thought out to be put into place. Once in place, the system must be supported, and those involved must receive feedback and input on their work. If a system is note designed and described, there is no way for people to know what to do. If expectations are not set, and people are not provided feedback on their process, success cannot just be assumed. o What can we do to support it?

Session IV: Data Quality 93 Epidemiology for Data Users (EDU) Trainer‘s Manual

. Positive process and quality improvement, via mentorship and supportive supervision work well.

 Support quality and quality improvement: o Why is this important? . Quality improvement is a positive process where one looks at where they are, identifies ways to improve, sets out a plan and some goals, and then sets out to reach them. Data quality improvement can be self-driven or system driven, and can (and should) happen from the most organic/original point of data collection to ensure that a) they have the most accurate data, and to b) make the best use of resources. o What can we do to support it? . Data quality assessment tools and processes can be designed and implemented. These can be self-administered or done via supervision. The former allows for self-directed improvement and buy-in; the latter can be a good assistance and outside view.

 Create sustainability: o Why is this important? . Sustainability in a system allows for consistency and timeliness, and means that staff must not be trained in isolation, but rather in teams. o What can we do to support it? o In line with the roles and responsibilities, and the staff identified, groups of applicable staff should be trained and supported to allow for longer-term constancy where staff turnover, absenteeism, or task-shifting occurs.

Session IV: Data Quality 94 Epidemiology for Data Users (EDU) Trainer‘s Manual

Data Entry

•Set regular schedule for data review and entry •Data Cleaning • Treat all received raw data as “suspect data values” Check… – To see if responses in fields make sense – Number of records – For duplicates – If data are complete – For outliers (extreme or abnormal values) Always maintain the original file of unclean data and validated data separately

Speaker Notes If data is not what you expected, then you should question it. Here are a couple of hints for data quality – these are by no means exhaustive.

 Setting a schedule and timelines can help with getting applicable data to the right place at the right time. o Be sure to communicate the timeline and the expectations well, and check for understanding and trouble-shooting needs.

 Always scan the data for obvious errors before entering: o Do the numbers make sense? o Are all of the fields complete? o As you get to know your data tool well, you will know the fields that ‗trip‘ people up – double check those!

 Double-check the data after they are entered o Do a more in-depth review of a subset of the data . Consider a second person to do this o Double-enter a subset of the data and review for error . Consider a second person to do this

 If you are entering or managing coded data, be sure to have the codebook handy.  The Check Code function in Epi Info is used to verify the data entered in one field against data in other fields to avoid these errors.

Session IV: Data Quality 95 Epidemiology for Data Users (EDU) Trainer‘s Manual

Data Storage and Retention

• Determine where and how data will be stored • Store data according to data types • Ensure Data is complete but no duplicates • Determine how will the data be kept secure and who has access • Create back-up systems for electronic files • Determine how long data will be kept • Determine how data will be disposed

Speaker Notes Prompt Audience: Do you have disposal plans for data and data files? Ask participants from different organizations how long they keep their files.

Facilitator Background Information Anyone that has access to data should have a plan for how to store the data. In general, all health data should be kept in a private, safe, and locked environment. In the case of HIV data, the recommendation is to always have it double locked – i.e., in a locked file cabinet in a locked office.

When storing data, it is important to have a plan:  Who knows where the data are?  Who has access to the data, and why?  How long will the data be kept?  How will the data be disposed thereafter?

Session IV: Data Quality 96 Epidemiology for Data Users (EDU) Trainer‘s Manual

Feedback

• Data collection sites need feedback on data quality • Institute monitoring of data flow at all levels • Field visits/data audits to facilities/districts can be used to: • What data is collected and from which source(s). • Where and how recorded data is stored. • How the data is used, and how it passes both between systems and to data users. • Who is responsible for the data at both an operational and a strategic level Can identify strengths and weaknesses of the process Provide a summary of findings to site

Speaker Notes I know that we‘ve been saying this all day, but we cannot say this enough. Giving feedback to the reporting sites is among the most important step.

Remember that there is always a reporting site below your level, and you cannot be successful without their report.  Consider positive reinforcement and supportive supervision to establish trust and buy in to the system, and to collaboration with you.

Sites will appreciate information on the completeness, timeliness, and accuracy of the data they provide to you. If they do not receive feedback, they cannot improve.

Session IV: Data Quality 97 Epidemiology for Data Users (EDU) Trainer‘s Manual

Key Principles of Data Quality

Precision Reliability Validity Integrity Completeness Timeliness

Data are Repeated Data are Data are All intended Data collected, measures of a measuring accurate data are collection, analyzed variable have what was from the collected entry, interpreted the same intended time they submission, at an results; data and you are & use occur appropriat collection is are getting collected with e level of consistent the type of through appropriate detail to response reporting frequency & answer expected on schedule; questions data are available when needed

Speaker Notes: The whole process of indicator selection, data collection, collation/analysis to reporting should be monitored for data quality. The following lists show some keys issues to look out for when assessing the quality of the data being collected:

Data quality checks:  Precision: Related to reliability (above) is the need for the data accuracy to be known. Taking into account environmental, data entry, analysis or manipulation errors, check the margin of error in the data. Decisions making processes should take into account the magnitude of error.  Reliability: How consistent is this data? Was the data collected with the same tool (questionnaire, data form)?  Validity: Are the data being collected still the appropriate measure of the indicator? Has the sampling frame changed in any way that would affect the data? Was the data entry process free from data input and analysis errors (calculation errors, transposition errors, copying errors, coding errors, consistency errors etc.)  Integrity: Are the data free from personal or technical manipulation introduced willfully or unconsciously?  Timeliness: Are the data being collected in time for its use in decision making? The time of collection, collation and reporting (put together) should be before the data are required for decision making. There is usually the temptation to only emphasize the time for data collection at the expense of analysis and reporting.

Session IV: Data Quality 98 Epidemiology for Data Users (EDU) Trainer‘s Manual

Good Data Quality

 AVAILABLE ON TIME fix dates for reporting

 AVAILABLE AT ALL LEVELS who reports to whom - feedback mechanisms

 RELIABLE & ACCURATE check that all data is correct, complete, consistent

 COMPREHENSIVE collected from all possible data sources

 USABLE if no action, throw data out

 COMPARABLE same numerator & denominator definition 24 used by all

Speaker notes

• Emphasize that data, in order to be locally useful should be of good quality

• Examine the essential elements of good quality data (illustrate with appropriate examples)

Session IV: Data Quality 99 Epidemiology for Data Users (EDU) Trainer‘s Manual

Using program graphs to improve Data Quality

Speaker notes

Prompt for examples from participants for local data use.

Emphasize that even without a computer, you can display your data graphically as seen in above slide.

Graphs can easily display outliers or ‖funny‖ inconsistent trends that can point to data quality issues.

5.3 HMIS Completeness Table 6. District Performance in HMIS Completeness Zambia League Tables

Best Six Performers Worst Six Performers

HMIS HMIS District Completeness, % Rank District Completeness, % Rank Chadiza 100 1 Nchelenge 83.3 66 Chibombo 100 1 Luwingu 82.7 67 Kalabo 100 1 Chongwe 82.2 68 Mufulira 100 1 Sesheke 74.7 69 Mufumbwe 100 1 Siavonga 73.2 70 Solwezi 100 1 Kasama 67.9 71 Samfya 100 7 Kaputa 66.3 72 Average 100 88.3

Speaker Notes: This slide represents clear feedback on data quality and completeness in Zambia, as it relates to the HMIS. The data on HMIS completeness are presented in tables where each district‘s data completeness is scored and ranked compared to others.

Session IV: Data Quality 100 Epidemiology for Data Users (EDU) Trainer‘s Manual

This table in particular compares best six performers with worst six performers.

Ranking districts on data quality indicators (here completeness was used) can provide incentive for districts to improve data quality.

We must, however, remember that we must be sure to give support where support is needed, such as training, re-training, support, supervision, mentoring, and feedback on the process and the data submitted.

Using Feedback to Improve Data Quality

Speaker Notes: This slide represents actual feedback that was provided to NGOs providing HIV prevention services. The data are able to be compared easily in this bar chart.

Each of these agencies was provided with a copy of this chart with only their agency name showing. This allowed them to compare their performance in relation to other sites, while still not embarrassing them in front of others. The objective was to motivate them to improve their data completeness while not humiliating them.

In this case, certificates were given to all agencies that achieved the benchmark of 80% completeness that had been set earlier in the year.

Session IV: Data Quality 101 Epidemiology for Data Users (EDU) Trainer‘s Manual

Using Feedback to Improve Data Quality

Speaker Notes: This is a great example of an output of quality improvement processes.

Six months after the initial feedback was provided to the agencies – per the previous slide – these are the results that were seen.

 On the first chart one agency had 100% data completeness, and three had completeness scores in the 90%s  On this graph, four agencies scored 100%, and four others had scores in the 90%s.

Based on these results, new completeness benchmark for agencies to reach was set at 95%

Facilitator background information:

If you are looking for more examples, here is a real life example:  Agency X, the most poorly performing site in the first chart, increased their data completeness from 36% to 87% within six months.  Of particular note, Agency X had received prior warnings that its data completeness was not acceptable, but did not seriously address their deficiencies until they received the performance chart.

Session IV: Data Quality 102 Epidemiology for Data Users (EDU) Trainer‘s Manual

Exercise Assessing Data Quality

Speaker notes

Distribute the handout that matches this sheet and ask participants to spend about five minutes reviewing it for errors or outliers. Then ask the participants to pair up and compare what they found with a partner.

Exercise continued…. How Did We Do?

Speaker Notes: Get the large group back together and click through the potential errors on this sheet.

#1 Record number 798 appears to be missing #2 8008 was entered instead of 808 #3 818 entered twice #4 Report period is January

Session IV: Data Quality 103 Epidemiology for Data Users (EDU) Trainer‘s Manual

#5 Outlier. May or may not be an error. Everyone else who received an HIV test was an adult. Double check data #6 Another outlier in the age category. Double check data #7 Patient is marked both Male and Pregnant #8 Missing variable #9 Answers are either Y for yes or N for No. Invalid variable #10 Answers are either Y for yes or N for No. Invalid variable #11 Multiple data variables are missing #12 Missing variable

WRAP-UP AND REVIEW

Objectives

At the end of this presentation, participants should be able to: • Define data quality and describe its importance for data entry, synthesis and reporting • Determine factors contributing to poor quality data • Follow criteria for assessing data quality • Recognize breaches/gaps in data quality

Session IV: Data Quality 104 Epidemiology for Data Users (EDU) Trainer‘s Manual

SESSION V: INTRODUCTION TO MONITORING AND EVALUATION

Introduction to Monitoring and Evaluation

Description of Session: This session will: 1. Provide an introduction to monitoring and evaluation and describe the purpose, importance, and reason behind implementing program monitoring and evaluation. 2. Describe the difference between monitoring and evaluation and how each is used. 3. Describe the connection between epidemiology and monitoring and evaluation. 4. Describe the components of a Monitoring and Evaluation (M&E) plan. 5. Describe different frameworks that can be used in developing an M&E plan. 6. Define what indicators are and characteristics to evaluate quality indicators.

Session V: Introduction to Monitoring and Evaluation 105 Epidemiology for Data Users (EDU) Trainer‘s Manual

Learning Objectives: By the end of this session participants will be able to:

Objectives

By the end of this lecture, participants should be able to… • Identify the basic purposes of M&E • Differentiate between monitoring and evaluation • Describe the functions of an M&E plan • Illustrate how frameworks are used for M&E planning • Identify steps toward constructing an M&E plan

Learning Activities at a Glance: Learning Content Method Material Handout/ Length Activity Appendix (min) 1 Introduction to Lecture Copies of the Monitoring and presentation Evaluation for each participant

Blank flipchart paper and markers

Sources: 1. M&E Fundamentals, USAID, www.globalhealthlearning.org 2. Evaluation Briefs, No. 3b, January 2009, Centers for Disease Control and Prevention, www.cdc.gov/healthyyouth/evaluation/index.htm 3. Performance Monitoring and Evaluation TIPS, USAID Center for Development Information and Evaluation http://www.who.int/management/district/planning_budgeting/Buildi ngResultsFramework.pdf

Session V: Introduction to Monitoring and Evaluation 106 Epidemiology for Data Users (EDU) Trainer‘s Manual

4. Marsh, David. 1999. Results Frameworks & Performance Monitoring. A Refresher by David Marsh (ppt) http://www.childsurvival.com/tools/Marsh/sld001.htm 5. Program Development and Evaluation, University of Wisconsin – Extension http://www.uwex.edu/ces/pdande/evaluation/evallogicmodel.html 6. Logic Model Workbook, Innovation Network Inc., http://www.innonet.org/client_docs/File/logic_model_workbook.pdf

Session V: Introduction to Monitoring and Evaluation 107 Epidemiology for Data Users (EDU) Trainer‘s Manual

Main Purposes of M&E

M&E Can: – Determine what has been achieved – Measure progress toward objectives and goals – Assess what works and does not work – Manage program activity: • Justify changes to workplans and budgets • Identify areas for technical support or capacity building • Identify strengths and weaknesses • Share information and lessons learned for broader use

Speaker Notes: Before presenting the slide, prompt audience: WHAT IS THE PURPOSE OF MONITORING AND EVALUATION? (Expected answers)  To measure progress.  To determine achievement of objectives  To make informed decisions etc (Solicit more responses from the participants)

After collecting several responses summarize using the slide and additional background information below.

Facilitator Background Information: Purpose of M&E: - To provide information on how and what work is being done. - To measure progress towards achieving program objectives and goals - To provide information that can be used to make decisions related to program funding, planning and implementation

Session V: Introduction to Monitoring and Evaluation 108 Epidemiology for Data Users (EDU) Trainer‘s Manual

What is Monitoring & Evaluation?

Speaker notes Ask the participants WHAT IS MONITORING? (Expected answer)  Checking to see if targets are being met WHAT IS EVALUATION? (Expected answer)  Assessing the impact of a an activity or programme (Solicit more responses from the participants)

Session V: Introduction to Monitoring and Evaluation 109 Epidemiology for Data Users (EDU) Trainer‘s Manual

Monitoring AND Evaluation

Monitoring: What are we doing? Tracking inputs and outputs to assess whether programs are performing according to plans (e.g., people trained, condoms distributed)

Evaluation: What have we achieved? Assessment of impact of the programme on behaviour or health outcome (e.g., condom use at last risky sex, HIV prevalence)

Speaker Notes: Emphasize the difference between the two concepts. For example : Monitoring: course done by UNZA 2x/year, are meeting are target? Evaluation: HIV/AIDS change in behavior so that prevalence may decrease

Facilitator Background Information: While we often talk about monitoring and evaluation together it is important to distinguish between the two as the purpose is quite different. Monitoring data is used to track program implementation, that is, to track what we are doing, how we are doing it, and how many people are being reached. It is important to do this so that you can determine whether resources are being used in the way you intended them to be used. An additional definition of monitoring is: - The systematic collection, analysis and use of information from projects and programs for three basic purposes:  Learning from the experiences acquired;  accounting internally and externally for the resources used and the results obtained;  making decisions

Session V: Introduction to Monitoring and Evaluation 110 Epidemiology for Data Users (EDU) Trainer‘s Manual

Evaluation data is used to gauge where you are in reaching your outcomes, that is, is the program having an impact on the behavior or health outcome you are trying to impact. An additional definition of evaluation is: – The systematic method for collecting, analyzing, and using information to answer questions about the effectiveness of projects, policies and programs While monitoring is a continuous process that begins the moment a program is initiated, evaluation is usually conducted after the program ends or at key moments such as the end of a funding cycle. However, it is important to be thinking about evaluation from the beginning since often the information that is needed to evaluate the program must be collected while the program is happening. As it will be discussed later, this is where developing a logic model for the program can be helpful in identifying the outcomes the program is trying to achieve.

Session V: Introduction to Monitoring and Evaluation 111 Epidemiology for Data Users (EDU) Trainer‘s Manual

Illustration: Programme Impact (Evaluation)

With programme

Programme Programme Outcome Impact Indicator: Without programme CPR

Programme Programme Start Time End

Speaker Notes: CPR = Contraceptive Prevalence Rate

Explain to the participants that this graph shows that even when the program was not in place, the contraceptive prevalence rate (CPR) increased over time. However, after the implementation of the program, there was an even greater increase in the contraceptive prevalence rate. In evaluating the program at the time of the program‘s end, we can see that the impact of the program is equivalent to the difference between these two rates.

Session V: Introduction to Monitoring and Evaluation 112 Epidemiology for Data Users (EDU) Trainer‘s Manual

Differences between Monitoring and Evaluation MONITORING EVALUATION Monitoring is the routine Evaluation is the use of research process of data collection and measurement of progress methods to systematically toward program objectives. investigate a program’s effectiveness.

. Monitoring involves . Evaluation is periodic routinely looking at the quality of our services

. Monitoring is a continuous . Evaluation involves process measurements over time

.Monitoring involves . Evaluation requires study counting what we are doing design

Speaker notes Before showing slide, ask the participants: WHAT IS THE DIFFERENCE BETWEEN MONITORING AND EVALUATION? (Expected answers)  Monitoring routine measurement of progress  Evaluation is done at the end of project measure effectiveness etc (Use slide and additional background information to summarise) Ask the participants to add to the list.

Facilitator Background Information As we have discussed monitoring and evaluation are not the same thing but they are part of the same continuum. Monitoring is focused on following and documenting the process while evaluation is focused on assessing the impact. In other words, monitoring answers ―What Happens?‖ and evaluating answers ―Why does this happen?‖

Session V: Introduction to Monitoring and Evaluation 113 Epidemiology for Data Users (EDU) Trainer‘s Manual

MonitoringMonitoring & Evaluation and Evaluation Pipeline Pipeline MONITORING EVALUATION “Process” “Effectiveness”

All Most Some Few

Speaker Notes: Explain that the slide above shows the M&E pipeline in form of Inputs, Outputs, Outcomes and Impacts. Ask the participants for examples of: Inputs: (Expected answers)  Resources, Staff, Funds, Materials, Facilities, Supplies, Training Outputs: (Expected answers)  condom availability, trained staff, quality of services, knowledge of HIV transmission Outcomes: (Expected answers)  behavior change, attitude change, change in trends Impacts: (Expected answers)

Session V: Introduction to Monitoring and Evaluation 114 Epidemiology for Data Users (EDU) Trainer‘s Manual

 HIV/AIDS trends, AIDS-related mortality, social norms, economic impact

Facilitator Background Information: The Monitoring process focuses on the first three pieces – inputs, processes/activities and outputs, while evaluation focuses on outcomes and impacts. While it is easy to measure inputs, processes/activities and outputs, it is often more difficult to measure outcomes and much harder to measure impacts, as it often takes quite a bit of time for those to show an effect from programs.

12 Components of an M&E System

Source: UNAIDS

Speaker notes Explain to the participants what the 12 components of an M&E system are.

Facilitator Background Information: UNAIDS in collaboration with other organizations developed an organizing framework for a functional HIV Monitoring and Evaluation System. While, this

Session V: Introduction to Monitoring and Evaluation 115 Epidemiology for Data Users (EDU) Trainer‘s Manual framework was developed specifically for HIV work, it can be applied to other health areas. The 12 components are organized under three major areas: A. People, partnerships and planning (outer ring): 1. Organizational structures with M&E functions 2. Human capacity for M&E 3. Partnerships to plan, coordinate, and manage the M&E system 4. National multi-sectoral M&E plan 5. Annual costed national M&E work plan 6. Advocacy, communications, and culture for M&E B. Collecting, verifying and analyzing data (middle ring): 7. Routine program monitoring 8. Surveys and surveillance 9. National and sub-national databases for condition of interest 10. Supportive supervision and data auditing 11. Evaluation and research C. Using data for decision-making (inner ring): 12. Data dissemination and use

While these are presented in a sequential manner they do not need to be implemented in that manner. The idea is that in order to have a successful M&E program all of these pieces should be present.

What is an M&E Plan?

• An M&E Plan is a document that describes a system which links strategic information obtained from various data collection systems to decisions that will improve health programs • An action plan with activities, responsibilities, timeframes, and costs for each of the 12 components of an M&E system • Fundamental document to ensure: Accountability Measure of success

Speaker Notes: Before showing slide, ask the audience

Session V: Introduction to Monitoring and Evaluation 116 Epidemiology for Data Users (EDU) Trainer‘s Manual

WHAT IS AN M&E PLAN? (Expected answer)  The M&E plan describes the strategic information your program will gather and use for decision making that will lead to improved health programs and ultimately to improved health status. It is also the fundamental document that will hold the program accountable and tell you whether you succeeded or not. Follow up question: DOES EVERY PROGRAM NEED AN M&E PLAN? Expected answer – YES Using the slide explain the following.

Facilitator Background Information: Every program, project or intervention should have an M&E plan. This is the document that details a program‘s objectives, the activities to be implemented to achieve the stated objectives, and the procedures that will be implemented to determine whether or not the objectives are achieved. Additionally, the M&E plan should describe the strategic information the program will gather, how the information will be gathered and analyzed, the resources needed, and how it will be used for decision making. The M&E plan should be created during the design phase of a program, lay out the long-term (3 to 5 year) implementation strategy for the program, indicate resource requirements and outline how those resources will be obtained. To ensure that the plan is effective it should be seen as a living document that is updated regularly whenever a program is modified or new information is needed.

Session V: Introduction to Monitoring and Evaluation 117 Epidemiology for Data Users (EDU) Trainer‘s Manual

Functions of an M&E Plan

• State how the program is going to measure what it has achieved (ensure accountability) • Document consensus (encourage transparency and responsibility) • Guide M&E implementation (standardization and coordination) • Preserve institutional memory  M&E plan is a living document and needs to be adjusted when a program is modified.

Speaker Notes: Explain that an M&E plan has a number of functions and provide an outline the functions of an M&E plan. Emphasize to the participants that an M&E plan is a living document and needs to be adjusted when a program is modified or new information is needed.

Facilitator Background information In addition to stating how the program is going to measure what it has achieved, an M&E plan functions to document consensus, guide implementation, and preserve institutional memory.

Session V: Introduction to Monitoring and Evaluation 118 Epidemiology for Data Users (EDU) Trainer‘s Manual

M&E Plans and M&E Systems

• Relationship between an M&E plan and M&E system: The M&E plan documents all aspects of the M&E system. The M&E system is much larger – it consists of people and processes that work together to achieve the 12 performance goals of an M&E system • The M&E Plan also generally describes how the 12 components of the M&E system will function

23

Speaker notes Ask participants to explain the difference between an M&E plan and M&E system? Are they one and the same? (Expected answers)  The M&E plan is essentially a document that describes how the M&E system will work  The M&E system is the overall set of planning, information gathering and synthesis, and reflection and reporting processes, along with the necessary supporting conditions and capacities required for the M&E outputs to make a valuable contribution to project decision-making and learning.  The M&E plan is a part of the M&E system

Session V: Introduction to Monitoring and Evaluation 119 Epidemiology for Data Users (EDU) Trainer‘s Manual

Functions of an M&E plan by level of the health system

National

Province

District

Facility

Speaker Notes: Explain to the participants that the function of an M&E plan will vary based on the level at which it is being implemented. For example: National level (Comprehensive vs. Vertical Programs)  Allocate resources  Collect national data  Report to international donors and other stakeholders District level:  Allocate resources  Report to provincial/national Facility level (primary data collection):  Manage patients/clients, supervise providers  Manage logistics  Report to district

Facilitator Background Information: The information collected by an M&E system at each level should inform both the planning and data collection at the other levels. The initial level at which an

Session V: Introduction to Monitoring and Evaluation 120 Epidemiology for Data Users (EDU) Trainer‘s Manual

M&E plan is implemented is at the facility/implementing partner level. At this level the M&E plan is usually focused on collecting data that assists with the management of clients, understanding of program implementation, and management of program logistics. Program collected at this level is often reported to the next level, which in Zambia‘s case would be the district and/or province. Facility/implementing partner M&E plans should reflect the measures needed to inform the national strategy as well as those needed to ensure that program implementation and success can be monitored. The M&E plan at the district/province level should collect data that can inform both decision making for program resource allocation at the district/province level, and provide data that can be summarized at the national level. As with the facility/implementing partner, this plan should be strategically aligned with national priorities and also reflect the work being done at the facility/district/province level. Finally, the M&E plan at the national level should reflect the priorities of the national strategy and ensure that the information needed for decision making related to resource allocation, and national and international reporting.

Session V: Introduction to Monitoring and Evaluation 121 Epidemiology for Data Users (EDU) Trainer‘s Manual

Elements of an M&E Plan

Key sections of an M&E plan document: 1. Brief description 2. Evaluation framework 3. Description of plan indicators 4. Information system (data sources) 5. Dissemination and utilization plan

Speaker Notes: Program description-background, problem statement, objectives, etc. Framework Detailed description of plan indicators Data collection plan (for monitoring and evaluation), Monitoring and Evaluation Tools, and data sources Dissemination and utilization plan

Facilitator Background Information: An M&E plan should provide the following information: 1. Introduction to the plan 2. Brief description of the project and its evaluation framework 3. Detailed description if the indicators needed 4. Data collection plan including a description of the data sources and data collection tools 5. Plan for how monitoring and evaluation will occur 6. Dissemination and utilization plan for information gathered 7. Description of the process by which the M&E plan will be updated

The introduction section should describe the purpose of the plan, the monitoring and evaluation activities needed and their importance.

Session V: Introduction to Monitoring and Evaluation 122 Epidemiology for Data Users (EDU) Trainer‘s Manual

Additionally, it should provide some background on the history of the program development, a description of the program stakeholders, and the level of commitment and participation of the different stakeholders.

1. Brief project description

Include: • Problem statement • Objective • Implementation responsibility • Dates/duration • Geographic focus

Facilitator Background Information: The project description should start by describing the problem to be addressed and its impact on the community. It should also state what the long-term goal(s) of the project are, as well as, short-term objectives. It should also provide a description of the activities to be implemented, who will be responsible for implementation, the duration and geographic scope of the program activities, and a description of the target population. Finally, it should provide some information on the needed human, financial and infrastructure resources needed to successfully implement the project.

Session V: Introduction to Monitoring and Evaluation 123 Epidemiology for Data Users (EDU) Trainer‘s Manual

Problem Statement

• A problem statement that identifies the specific problem to be addressed. This concise statement provides information about the situation that needs changing, who it affects, its causes, its magnitude, and its impact on society.

Goals and Objectives

• Goal: broad statement about a desired long-term outcome of the program – For example, improvement in the reproductive health of adolescents or a reduction in unwanted pregnancies in X population

• Objectives: statements of desired specific and measurable program results. – For example, to reduce the total fertility rate to 4.0 births by year X or to increase contraceptive prevalence over the life of the program.

Speaker Notes: Objectives will be linked to goal.

Facilitator Background Information: It is important to understand that while program goals and objectives are linked there are significant differences between the two. Goals are what you set to achieve the mission of your organization or program. Goals should include words such as "increase/decrease," "deliver," "improve" and "create." Goals should be written in a clear and concise manner so that

Session V: Introduction to Monitoring and Evaluation 124 Epidemiology for Data Users (EDU) Trainer‘s Manual someone should be able to determine your program‘s purpose from reading your goals. Objectives are the milestones you will measure on the way to reaching your goal. Each goal should be linked to at least one objective, although often there are many more as program objectives often show how the program will reach its goal. When writing objectives it is important to ensure that they are clear and measurable. Often this is referred to as writing SMART objectives. The SMART acronym stands for the five characteristics every objective should be evaluated against. SMART objectives will provide the needed information to develop indicators and data collection processes that gather the needed information.

“SMART” Objectives

• Specific: Is the desired outcome clearly specified? • Measurable: Can the achievement of the objective be quantified and measured? • Appropriate: Is the objective appropriately related to the program's goal? • Realistic: Can the objective realistically be achieved with the available resources? • Timely: In what time period will the objective be achieved?

Speaker Notes: The objectives listed in the program description should be "SMART," an acronym that stands for:

Session V: Introduction to Monitoring and Evaluation 125 Epidemiology for Data Users (EDU) Trainer‘s Manual

Facilitator Background Information: As mentioned before program objectives should be written and measured against the SMART criteria. SMART objectives are: Specific – Means the objective is concrete, detailed, focused and well-defined. When writing specific objectives it might be helpful to answer the questions: - What is the program activity? - Who is responsible for implementation? - Why is it being done? - When is it being done? - How is it being done? The more specific an objective the easier it will be to measure. Measurable – Means that the measurement source is identified and we are able to track progress towards the achievement of the objective. Additionally, the objective should provide a reference point against which change can be clearly measured. Appropriate – Means that the objective is directly related to the program‘s goal and accurately addresses the problem and specific programmatic steps that are to be implemented. Realistic – Means that the objective can be attained within the specified time frame and with the available resources. Timely – Often also referred to as time-phased means that there is a clearly defined time frame for when the objective will be attained and measured. Ensuring that a time frame is included in the objective will assist in planning, implementing, and evaluating the program.

Session V: Introduction to Monitoring and Evaluation 126 Epidemiology for Data Users (EDU) Trainer‘s Manual

Is this objective SMART?

• Increase contraceptive prevalence by 15% in women 30-49 years of age

Speaker Notes: Specific: Yes, the intended outcome of the program is specified. Measurable: Yes, contraceptive prevalence is measurable. Appropriate: Unknown, because the program's goal would need to be provided in order to know whether the objective relates logically to it. Realistic: Unknown, because the resources available to the program would need to be known. Timely: No, the time within which the objective is to be achieved is not specified. So this objective is not known to be "SMART" because, although it meets some of the criteria, it is not known if it meets them all.

Session V: Introduction to Monitoring and Evaluation 127 Epidemiology for Data Users (EDU) Trainer‘s Manual

2. Frameworks

• Graphic depiction components of a project and steps needed to achieve desired outcomes • Helps to understand project goals & objectives • Serves as the foundation for selecting appropriate, useful M&E indicators • Types: – Conceptual framework – Results frameworks – Logic model

Speaker Notes: Frameworks, such as the three types discussed in this course, also help define the relationships between factors key to the implementation and success of a project, both internal and external to the program context. This design process deepens the understanding of managers, implementers, and other partners in many practical ways, including serving as the foundation for selecting appropriate, useful M&E indicators.  Based on a ―theory of change‖  Used to describe the main elements of a program and how they work together within a program  Provides rationale for programme/project/intervention existence  This may be in graphic form to portray the sequence of steps leading to program outcomes (i.e., conceptual framework, results framework, logic model)

Facilitator Background Information: Frameworks are key elements of M&E plans that depict the components of a program and the sequence of events needed to achieve the desired objectives

Session V: Introduction to Monitoring and Evaluation 128 Epidemiology for Data Users (EDU) Trainer‘s Manual

(outcomes). Frameworks are integral to defining the relationships between factors key to the implementation and success of a project, as well as, defining internal and external factors that can impact the program‘s success. As such they are crucial for understanding and analyzing how a program is supposed to work. This process of designing the M&E framework deepens the understanding of managers, implementers, and other partners in many practical ways, including serving as the foundation for selecting appropriate, useful M&E indicators. There are several types of frameworks and no one is perfect. Three commonly used frameworks are: - Conceptual framework – also called ―research framework‖ identifies and illustrates the factors and relationships that can influence the outcome of a program or intervention - Results framework - also called ―strategic framework‖ diagrams the direct causal relationships between the incremental results of the key activities all the way up to the overall objective and goal of the intervention. - Logic Model – also called ―M&E‖ framework provides a linear description of the program‘s needed resources and how they‘ll be used to achieve the program‘s objectives and goal.

Session V: Introduction to Monitoring and Evaluation 129 Epidemiology for Data Users (EDU) Trainer‘s Manual

Conceptual framework

• Places the health problem in a wider context, one that considers the various factors that can affect the program or intervention • Clarifies the causal relationships between these factors, and identifies those that the intervention may affect • Used for program design rather than for program M&E.

Facilitator Background Information: There are many ways to explain a conceptual framework. It can be any or all of the following:  A set of coherent ideas or concepts organized in a manner that makes them easy to communicate to others.  An organized way of thinking about how and why a project takes place, and about how we understand its activities.  The basis for thinking about what we do and about what it means, influenced by the ideas and research of others.  An overview of ideas and practices that shape the way work is done in a project.  A set of assumptions, values, and definitions under which we all work together.  A graphical depiction of the factors thought to influence the problem of interest and how these factors relate to each other.

Session V: Introduction to Monitoring and Evaluation 130 Epidemiology for Data Users (EDU) Trainer‘s Manual

A conceptual framework can work as a map for the program, that is, it can help us decide and explain why we would use certain methods to get to a certain point. It can also help us to understand and use the ideas of others who have done similar things. Understanding the conceptual framework of a program will be crucial in understanding how a program works and thus how it can be monitored and evaluated.

Conceptual Framework for HIV/AIDS

Speaker Notes: Conceptual frameworks are typically shown as diagrams illustrating causal linkages between the key components of a program and the outcomes of interest. For instance, in the example above, the program, in addition to other donors, is supplying health services, in order to increase service utilization, with the ultimate outcome of improved health. By identifying the variables that factor into program performance and depicting the ways that they interact, the results that can reasonably be expected from program activities are outlined. Clarifying this process permits program designers to develop valid measures for evaluating the success of the outcomes and also guides the identification of appropriate indicators.

Session V: Introduction to Monitoring and Evaluation 131 Epidemiology for Data Users (EDU) Trainer‘s Manual

Results Framework

• Show the causal relationships between the various intermediate results needed to achieve the objective • The effectiveness of these activities can be measured at each step along the way

Speaker Notes: Results frameworks, sometimes called ―strategic frameworks,‖ diagram the direct causal relationships between the incremental results of the key activities all the way up to the overall objective and goal of the intervention. This clarifies the points in an intervention at which results can be monitored and evaluated. This is the type of framework used by USAID in its performance monitoring plans.

Facilitator Background Information: A results framework can be both a planning and management tool. It is central to a program‘s strategic plan and provides a program-level framework for managers to gauge progress toward the achievement of results and to adjust relevant interventions and activities as needed. This framework can also be used as a communication tool to provide information about the program to other stakeholders as it captures the key elements of the strategy for achieving the objectives. Finally, it is important to think of a results framework as a living tool that needs to be kept current and should be revised/revisited when 1) results are not achieved as expected, 2) assumptions made during the design are shown to

Session V: Introduction to Monitoring and Evaluation 132 Epidemiology for Data Users (EDU) Trainer‘s Manual be invalid, 3) the theory underlying the program development is shown to be wrong, or 4) policy, resource, or operational problems weand under those are intermediate results or IRs. Under each IR are subordinate intermediate results or sub-IRs that relate directly to the intermediate results. For example, under IR2, factors related to the information system, training and supervision of clinicians, and provider performance lead to quality health services.

Results Framework

Health status Goal

Strategic Use of primary health care services objective Adoption of key preventive health practices

Intermediate Availability Quality of Demand for Partner’s, Results of, and primary primary health County Health accessibility health care care services Department’s, to, primary services and preventive and health care health Communities’ services practices capacity for sustainable primary health care services

Speaker Notes: As can be seen in this example, results frameworks include an overall goal, a strategic objective (SO) and intermediate results (IRs). This is a generic example of a typical results framework for a program to improve use of health services or practices. The four themes of access/availability, quality, sustainability, and demand are usually addressed. As you will see in subsequent slides, program frameworks contain more information that what appears in this abbreviated example. Notice that the goal and strategic objective appear at the top of the framework, and under those are intermediate results or IRs. Under each IR are subordinate intermediate results or sub-IRs that relate directly to the intermediate results. For example, under IR2, factors related to the information system, training and

Session V: Introduction to Monitoring and Evaluation 133 Epidemiology for Data Users (EDU) Trainer‘s Manual supervision of clinicians, and provider performance lead to quality health services.

Logic Model

25

Speaker notes Explain to the participants that a logic model sets out in more detail the logic of how an intervention contributes to intended or observed results. It represents the logical flow from: 1. Inputs: (financial, human and material resources) to 2. Activities:(processes: e.g., activities to prevent, diagnose, treat and rehabilitate patients suffering from HIV/AIDS) to 3. Outputs: (e.g., increase access to ARV treatment) to 4. Outcomes: ( e.g., improve quality of life for persons living with HIV) to 5. Impact: (e.g., reduce mortality rate from HIV/AIDS)

Facilitator Background Information: The logic model provides a simplified picture of a program, initiative, or intervention and shows the logical relationships among the resources invested, activities that take place, and the benefits and changes expected to result from program implementation. As with the results framework mentioned previously,

Session V: Introduction to Monitoring and Evaluation 134 Epidemiology for Data Users (EDU) Trainer‘s Manual the logic model can be thought of as a roadmap for program planning, implementation and management, as well as evaluation. A program‘s logic model has five essential components and the more detail one can have for each of the components the more useful the model will be. The five key components are: 1. Inputs: This details the resources needed to implement the program. It includes financial, human, material and structural resources. Providing detailed information on the needed resources in the logic model will assist in determining whether limitations for implementation will be present due to lack of needed resources. 2. Processes/Activities: Describes the activities that will be implemented in order to reach the desired objectives. It may be helpful to group related activities together under a single heading. 3. Outputs: Describe the measurable, tangible, and direct products of the program activity. While outputs lead to the desired outcomes, they are not themselves the changes you expect the program will produce. Outputs are usually used to assess the program implementation and are usually the measures used in monitoring a program 4. Outcomes: Describe the short or intermediate-term results expected from the implementation of the program activities. These can be written at either the population or program participant level. Outcomes refer to change such as change in behavior, change in knowledge, or change in a condition. 5. Impact: Also referred to as long-term outcomes of the program implementation. While this is still looking at the result of program implementation it often refers to an outcome that is expected to occur over time and often is a result of the short and intermediate-term outcomes.

Session V: Introduction to Monitoring and Evaluation 135 Epidemiology for Data Users (EDU) Trainer‘s Manual

Logical framework (Logframe)

• Project improvement tool – communicates complex project clearly and understandably on single sheet of paper • Participatory planning tool • Summarizes scope & key building blocks of a proposed project in an easy- to- understand, structurally- interrelated format • Uses – Plan & design – Reference for Monitoring, Evaluation, Progress Reporting, and Briefings

Speaker notes Ask the participants to indicate by a show of hands, how many of them have seen or have come across a logical framework? From among those that have to describe a logical framework (Expected answer)  It is a tool used in the design, monitoring and evaluation of projects (Solicit more responses from the audience)

Facilitator background information The Logical framework is a tool aimed at strengthening the design of interventions, implementation, monitoring and evaluation and it is used throughout the project cycle. It helps people to:  Organize their thoughts  Relate activities and inputs to expected results  Set performance indicators  Allocate responsibilities  Communicate information on the initiative concisely and unambiguously

Session V: Introduction to Monitoring and Evaluation 136 Epidemiology for Data Users (EDU) Trainer‘s Manual

Logframe Matrix

Objectively Narrative Verifiable Means of Assumptions Summary Indicators Verification / Risks

Goal

Purpose

Output

Activities

* PSI Logframe Handbook, 2000

Speaker notes Inform the participants that this is an example of a logical framework. Explain that the framework generally takes the form of a 4 by 4 matrix of columns and rows Ask the participants what is contained on the Narrative summary column: (Expected response)  Goal  Purpose  Outputs  Activities What is contained in the vertical logic? (Expected answer)  Objectively Verifiable Indicators (OVIs)  Means of verification (MOV)  Assumptions Explain to the participants the step by step sequence for constructing a logical framework.

Session V: Introduction to Monitoring and Evaluation 137 Epidemiology for Data Users (EDU) Trainer‘s Manual

Facilitator background information First step in constructing a Logical framework is that you prepare a general description of the intervention or Narrative summary as follows: 1. Outline the overall Goal of your initiative 2. Outline the purpose to be achieved 3. Outlines the outputs to be put in place 4. Outline the activities for achieving each output These statements need to be logically linked and you need to confirm that the logic is correct. The next step is now to define the vertical logic of your project. Here you: 1. Define the Objectively verifiable indicators at Goal, then purpose, then output, and the activity levels. 2. Define the means of verification to show how you will collect the evidence of your indicators, by whom and where it will be found 3. State the assumptions related to each level that you will need to manage or mitigate if you are to achieve your outputs or activities at each level.

Session V: Introduction to Monitoring and Evaluation 138 Epidemiology for Data Users (EDU) Trainer‘s Manual

The Logic of the Framework

DESIGN CAUSE & EFFECT SUMMARY Column Develop Attain Goal through framework GOAL from top to Purpose bottom PURPOSE Attain Purpose through Outputs OUTPUTS Attain Outputs ACTIVITIES through Activities

Attain Activities

Speaker notes Prompt the audience: What is the logic on the above slide? Answer – Cause and effect Solicit more responses. Review the slide and summarizer

Facilitator background information The logical framework is based on the concept of CAUSE and EFFECT. The logic is that if something occurs or is achieved, then something else will result. Each initiative described by a logical framework is based on this IF – Then or Cause and effect logic. For instance, if certain activities are carried out, you can then expect certain outputs to result. And if the outputs are delivered, then the purpose is met. When you meet the purpose, you expect to achieve the Goal. The Logical framework helps you to make this link between your objectives at the different levels much clearer.

Session V: Introduction to Monitoring and Evaluation 139 Epidemiology for Data Users (EDU) Trainer‘s Manual

Logframe Structure

Logical when read from bottom up

• Identifies what the project intends to do and achieve • Clarifies causal relationships (means to end) • Specifies important assumptions and risks beyond the project’s control

Speaker notes Review the slide. Emphasize that the logic is clearer when the framework is read from bottom going up, that is starting with activities.

Summary of Frameworks

Type of Framework Program Basis for Monitoring and Brief Description management and Evaluation? Conceptual- Determine which No. Can help to Interaction of various factors the program explain results factors will influence

Results- Shows the causal Yes- at the objective- Logically linked relationship between level program objectives program objectives

Logic model- Shows the causal Yes – at all stages of Logically linked relationship between the program from inputs, processes, inputs and objectives inputs to process to outputs, and outputs to outcomes outcomes/objectives

Speaker Notes: The conceptual framework places the health problem in a wider context, one that considers the various factors that can affect the program or intervention, clarifies the causal relationships between these factors, and identifies those that

Session V: Introduction to Monitoring and Evaluation 140 Epidemiology for Data Users (EDU) Trainer‘s Manual the intervention may affect. It is used for program design rather than for program M&E. Results frameworks show the causal relationships between the various intermediate results that are critical to achieving the strategic objective. The effectiveness of these activities can be measured at each step along the way. Logic models help to show the logical connections between the inputs, processes, and outputs of an activity and how they link to the program's objectives (outcomes) and goals (impacts). They also clarify the linear relationships between program decisions, activities, and products.

Facilitator Background Information: Using frameworks is one way to develop a clearer understanding of the goals and objectives of a project, with an emphasis on identifying measurable objectives, both short-term and long-term. Frameworks, such as the three types discussed, help define the relationships between factors key to the implementation and success of a project, both internal and external to the program context.

3. Description of Indicators

– Organized by framework component – Attributes • Name of indicator • Description/definition • Unit of measurement • Data source (primary/secondary) • Baseline/target value(s) by year(s) • Frequency of data collection • Responsibility • Reporting plan and frequency

Session V: Introduction to Monitoring and Evaluation 141 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker notes Remind the participants that we have talked of OVIs in the logical framework. Ask the participants: What is an indicator? (Expected answer)  Indicators tell us how to recognize successful accomplishment of objectives (Solicit for more answers from the audience)

Facilitator Background Information: Indicators are clues, signs or markers that measure one aspect of a program and show how close a program is to its desired path and outcomes. They are used to provide benchmarks for demonstrating the achievements of a program. Another definition of an indicator that breaks down the relationship of the indicator to the program is: ―An indicator is a variable that measures one aspect of a program or project that is directly related to the program or project‘s objective.‖ Another key component of an M&E plan is a description of the indicators used to measure the desired outputs and outcomes of the program. The M&E plan should describe indicators for all areas of the M&E framework for which something is being measured. Hence it is possible to have indicators that measure inputs, outputs and outcomes of the program. The description of each indicator should include the following information: - Name of the indicator: Each indicator should be identified by a unique name - Description/definition: Each indicator should be operationalized with an agreed upon definition to ensure that what is being measured with that indicator is consistent across tools and activities. - Unit of measurement: The unit of measurement should be clearly defined for each indicator.

Session V: Introduction to Monitoring and Evaluation 142 Epidemiology for Data Users (EDU) Trainer‘s Manual

- Data source: Data sources for each indicator should be identified and a description of the data source should also be included - Baseline/target value(s) by year(s): For each indicator a baseline value should be included whenever possible. Additionally, target values by the agreed upon time frame should also be included - Frequency of data collection: A description of how often the indicator will be collected should also be included. For example, some indicators may be collected every time the participant interacts with the program, while others may be collected every six-months - Responsibility: It will be important to describe who will be responsible for ensuring that the indicator is collected. - Reporting plan and frequency: Finally, it will be important to describe the plan how reporting on the indicator will be done and how often it will occur.

Examples of Indicators

• Percentage of women attending PMTCT clinics • Number of clients receiving VCT • Percentage of children aged 12-23 months who receiving measles immunization in a given year • Number of condoms distributed by health staff

Speaker Notes: Ask audience to give a few more examples

Session V: Introduction to Monitoring and Evaluation 143 Epidemiology for Data Users (EDU) Trainer‘s Manual

Facilitator Background Information: Above are some examples of indicators. Some guidelines and challenges to keep in mind when selecting indicators are: Guidelines: 1. Select indicators requiring data that can realistically be collected within the program being implemented and with the resources available 2. Select at least one or two indicators per activity or result. Ideally each indicator would have its own data source. 3. Select at least one indicator for each core program activity (e.g. training event, mobilization campaign, etc.) 4. Select no more than eight to ten indicators per area of program focus. 5. Use a mix of data sources whenever possible. Challenges: 1. Choosing an indicator that the program activities cannot affect. 2. Choosing an indicator that is too vague and does not address the specific issue the program is trying to impact. 3. Choosing an indicator that relies on data that is not available. 4. Choosing an indicator that does not accurately represent the desired outcome.

Session V: Introduction to Monitoring and Evaluation 144 Epidemiology for Data Users (EDU) Trainer‘s Manual

4. Data Sources

• What are some examples of data sources that you’ve learned in your previous lecture?

Speaker notes Remind the participants that data sources were covered in an earlier session. Ask them : What are some of the data sources covered? (Expected answers)  HMIS  NACMIS  Smartcare  DHS  Census etc Solicit more responses. Explain that data sources provide the data for you to measure your indicators.

Session V: Introduction to Monitoring and Evaluation 145 Epidemiology for Data Users (EDU) Trainer‘s Manual

5. Dissemination and utilization plan

Decide on: – Databases for information storage – Clearly defined users – Schedule and timeliness – Dissemination methods • Reports (schedule and audience) • Media • Speaking events – Information-utilization models

Speaker Notes: Adjustments to M&E plan: Program changes can occur, affecting the original M&E plan for both performance monitoring and impact evaluation M&E plan is a working document and needs to be adjusted when a program is modified

Facilitator Background Information: Another important part of the M&E plan is the description of how the information collected will be disseminated and utilized. It is important to think about and clearly delineate ahead of time who will be responsible for disseminating the information, how that will be done, and with what frequency. Additionally, it is very important to ensure that the data is utilized and describing the ways in which that will happen makes it more likely that it will indeed happen.

Session V: Introduction to Monitoring and Evaluation 146 Epidemiology for Data Users (EDU) Trainer‘s Manual

How does M&E relate to Epidemiology? • Epidemiology: The study of the distribution and determinants of health-related states or events in specified populations, and the application of this study to control of health problems

• Monitoring and evaluation: is a way to determine the progress and effectiveness of interventions responding to health problems and their determinants.

Speaker notes Review the slide.

M&E AND EPIDEMIOLOGY

• Epidemiology: The study of the distribution and determinants of health-related states or events in specified populations, and the application of this study to control of health problems • Monitoring and evaluation: is a way to determine the progress and effectiveness of interventions responding to health problems and their determinants.

 M&E feeds into Epidemiology studies and epidemiology is a means of collecting data for M&E

Speaker Notes: M&E entails collecting relevant information from past and ongoing activities to be used for program improvements, accountability, knowledge, evidence-based decisions and future planning.

Session V: Introduction to Monitoring and Evaluation 147 Epidemiology for Data Users (EDU) Trainer‘s Manual

Epidemiology is important to M&E for a number of reasons. One key reason is epidemiological information is used to plan and evaluate strategies, policies, programs, projects and other interventions aimed at preventing health problems. i.e., information can also be used to plan and evaluate the treatment of patients.

Facilitator Background Information: Epidemiological information can often be used to explain the context in which the M&E data is being collected. For example, when monitoring programs that provide ART in a country it will be more meaningful and provide a better gauge for how the program is working if it can be set against epidemiological information on the number of people living with HIV in the country. Again, while these are separate activities and often not related when it comes to health issues the two disciplines provide information that is valuable to each other.

Wrap-Up and Review

Objectives

By the end of this lecture, participants should be able to… Identify the basic purposes of M&E Differentiate between monitoring and evaluation Describe the functions of an M&E plan Illustrate how frameworks are used for M&E planning Identify steps toward constructing the M&E plan

Session V: Introduction to Monitoring and Evaluation 148 Epidemiology for Data Users (EDU) Trainer‘s Manual

Resource Center

• Frankel, N & Gage, A (2004). MEASURE Evaluation. M&E Fundamentals: a self-guided minicourse. • PSI: PSI Logframe Handbook • UN M&E Websource: http://www.unsudanig.org/workplan/mande/r esources/links.php • MEASURE M&E Websource: http://www.cpc.unc.edu/measure/training/m aterials/hiv-english/m-e-resources

Session V: Introduction to Monitoring and Evaluation 149 Epidemiology for Data Users (EDU) Trainer‘s Manual

SESSION VI: DATA SUMMARY AND ANALYSIS

Data Summary and Analysis

Description of Session: This session will discuss different ways of summarizing and presenting data. It will discuss how common statistical measures are calculated and how to determine appropriate use.

Learning Objectives:

Learning objectives

At the end of this lecture, you should be able to:

• Determine which measures to use for summarizing specific data types • Calculate means, medians, range, and standard deviations • Identify which statistical tests are appropriate to use with which types of data

2

At the end of this session participants should be able to:

Session VI: Data Summary and Analysis 150 Epidemiology for Data Users (EDU) Trainer‘s Manual

1. Determine which measures are appropriate for summarizing different data types 2. Calculate descriptive statistical measures such as mean, median, range and standard deviation 3. Identify which statistical tests are appropriate for different data types

Learning Activities at a Glance: Learning Content Method Material Handout/ Length Activity Appendix (min) 1 Data summary and Lecture Copies of the Appendix B – analysis presentation Statistical for each Formulas participant

Blank flipchart paper and markers

Session VI: Data Summary and Analysis 151 Epidemiology for Data Users (EDU) Trainer‘s Manual

How do we convince people to listen to us? SHOW THEM THE DATA!

3

Speaker Notes: Just show them the data

Facilitator Background Information: Data is often crucial in making sure people understand the magnitude of a given problem and it is important to ensure that we are able to present in a way that makes it understandable.

Session VI: Data Summary and Analysis 152 Epidemiology for Data Users (EDU) Trainer‘s Manual

Measles data in a table

4

Speaker Notes: The Minister of Health asks what is going on with Measles in the Lusaka District. Do you show him this? When presented in this form, the Minister of Health may not have the time to understand it fully.

Improving Presentation of Measles Data

What is the main point of these data? How can you make the data easier to interpret? How could we summarize these data to show context?

5

Session VI: Data Summary and Analysis 153 Epidemiology for Data Users (EDU) Trainer‘s Manual

Frequency of Measles Cases by Clinic, Lusaka District May 2010

Clinic No. of measles cases Percent Kanyama 3 8.3% Chipata 2 5.6% Chawama 7 19.4% George 2 5.6% Kalingalinga 8 22.2% Bauleni 1 2.8% Misisi 4 11.1% Ngonbe 3 8.3% Mtendere 6 16.7% total 36 100.0%

6

Speaker Notes: Prompt Audience: What can you say about this table? What is the table missing? (Answer: Catchment population.) Let‘s assume we want to highlight the distribution of these data in these 9 Lusaka district compound clinics. First we may want to summarize the cases/clinic by adding up the lines with each compound listed to make ―frequencies‖. Here you can see the frequency of cases per clinic for May 2010 and the percent of cases coming from each clinic (assuming for the sake of this exercise that these were the only clinics that existed). It looks like Kalingalinga, Chawama and Mtendere clinics have the highest percentage of cases, while George, Chipata and Bauleni have the lowest. Is there an easier way to compare these? RIGHT – a graph would be easier to read.

Session VI: Data Summary and Analysis 154 Epidemiology for Data Users (EDU) Trainer‘s Manual

Frequency of Measles Cases by Clinic, Lusaka District May 2010 9 8 7 6 5 4 3 2 1

0

Number of Measles cases Measles of Number

Misisi

George

Bauleni

Chipata

Ngonbe

Kanyama

Mtendere

Chawama Kalingalinga

Clinic 7

Speaker Notes: In this graph, you can see the frequency of cases per clinic more easily. It is easy to pick out the clinics with the highest and lowest frequency of cases. [If needed, name them again and discuss distribution]. What may we be missing by looking at the ―raw‖ frequency or number of cases? (If no ideas/answers) What information may help us to see these data in context? RIGHT – populations (or denominators). This graph could be misleading. We can‘t tell here whether a high proportion of the population served by a clinic was diagnosed with measles in May without the additional information.

Session VI: Data Summary and Analysis 155 Epidemiology for Data Users (EDU) Trainer‘s Manual

Measles Cases per 1,000 Catchment Population by Clinic, Lusaka District May 2010

No. of measles Catchment area Clinic cases population Cases/1,000 population Kanyama 3 8500 0.35 Chipata 2 6000 0.33 Chawama 7 10000 0.70 George 2 2800 0.71 Kalingalinga 8 20000 0.40 Bauleni 1 5000 0.20 Misisi 4 8500 0.47 Ngonbe 3 4500 0.67 Mtendere 6 6300 0.95

8

Speaker Notes: Prompt Audience: Now that we have factored in the catchment populations, what clinic seems has the highest percentage of measles cases? Is this different than the clinic that had the greatest number of measles cases? Here we have added (fictional) catchment populations for each clinic, what would make this information easier to read? (GRAPH)

Session VI: Data Summary and Analysis 156 Epidemiology for Data Users (EDU) Trainer‘s Manual

Measles Cases per 1,000 Catchment Population by Clinic, Lusaka District May 2010

1.00 0.90 0.80 0.70 0.60 0.50 0.40 0.30 0.20 0.10

0.00

Measles cases per 1000 population

Misisi

George

Bauleni

Chipata

Ngonbe

Kanyama

Mtendere

Chawama Kalingalinga

Clinic 9

Speaker Notes: Here we can see the cases/1000 population (1,000 is chosen arbitrarily so the numbers are a little easier to process). Does it still look like Kalingalinga, Chawama and Mtendere clinics have the highest percentage of cases, while George, Chipata and Bauleni have the lowest? No – Kalingalinga has a high catchment population, so the proportion of cases per population is lower than the raw number suggested. Mtendere now has the very highest proportion reflecting a high number of cases in a slightly lower population. George now has one of the highest proportion of cases/population. Is there anything else that would help us to add context to this graph? [possible answers: restrict age ranges of cases and catchment population, show this information on a map, show number of cases per day in each compound = epidemiologic curve. Lecturer can discuss as many as come out, but will specifically mention epi curve as follows to link to next slide]. We may want to know how many cases occurred each day in these compounds in May. We may want to know how many cases occurred overall in Zambia in May or in a longer period of time. These kinds of graphs are called ―Epidemiologic (or ―Epi‖) curves‖.

Session VI: Data Summary and Analysis 157 Epidemiology for Data Users (EDU) Trainer‘s Manual

Display data in a graph

4 Apr 4

6 Jun 6 7 Feb 7 Mar 7

5 Sep 5

2 May 2 9 May 9 Aug 1 Aug 8

11 Jul 11 4 July 4 Jul 18 Jul 25

10 Jan 10 17 Jan 17 Jan 24 Jan 31 Apr 11 Apr 18 Apr 25

<4 Jan <4

14 Feb 14 21 Feb 21 Feb 28 Mar 14 Mar 21 Mar 28 Jun 13 Jun 20 Jun 27

12 Sep 12

15 Aug 15 16 May 16 May 23 May 30 Aug 22 Aug 29 Week ending date/2010

Speaker Notes: Here‘s a good way to summarize these data for the whole country in an Epi curve. You could do this for any of the clinics or regions of the district separately if you wish to describe measles outbreak by region. The key points are 1) decide your main point – or SOCO and focus on it 2) decide how to summarize your data to give context to your raw numbers 3) view your data in graphs before deciding what else needs to be described/put in context.

Facilitator Information Background: There are different ways to present data and sometimes it is important to ensure that the way you present it tells the story by itself. In some cases having a table might be the appropriate format, but often a picture can more easily present the situation.

Session VI: Data Summary and Analysis 158 Epidemiology for Data Users (EDU) Trainer‘s Manual

Major steps in data summary

• Data cleaning • Summarize data using descriptive analytical techniques • Creating figures (in graphing lecture) • Optional – provide additional information regarding certainty of results using statistical testing

Speaker Notes: This will be elaborated on further in this lecture.

Facilitator Background Information: Data analysis can be broken down into three major steps. These are: 1. Data cleaning: This step ensures that the final dataset used for analysis has been examined and data that does not appear to fit with the collection tools, or that does not make sense is examined and followed up on. 2. Data Analysis: This is the step during which it is important to ensure that the measures that you are looking at make sense given the type of data, and that the questions posed of the data are answered 3. Statistical testing: During this step you‘ll identify whether statistical testing is needed and if so which tests might be appropriate to the data collected.

Session VI: Data Summary and Analysis 159 Epidemiology for Data Users (EDU) Trainer‘s Manual

Data Cleaning

• First step in analysis • Used to identify errors • Determine the source of errors • Correct erroneous information

Facilitator Background Information: Data Cleaning: The first step in data analysis is to clean the data. Data cleaning involves looking for obvious errors in data entry that can be followed up on and fixed. You can do this by running frequencies for each variable. Some things to look for are: 1. Is the value entered within the range of acceptable values for the variable? 2. Does the value make sense in relation to the expected values for the variable? 3. If there are logical steps to the collection tool, does the data entered match those, that is, if a variable should only have a value if the previous one was completed is that the case? 4. Take a sample of missing data and check it against the data collection tool. Is the data truly missing or was it not entered?

Session VI: Data Summary and Analysis 160 Epidemiology for Data Users (EDU) Trainer‘s Manual

Using Frequency Distributions to check data Quality • Display values a variable can take and number of records with each value • This initial look at variables (through graphs and tables) can identify: o Do all subjects have data, or are values missing? o Are most values clumped together, or is there a lot of variation? o Are there outliers? o Do the minimum and maximum values make sense, or could there be mistakes in the coding?

Speaker Notes: Steps to construct frequency distribution:  List all values a variable can take (from lowest to highest)  For each value, record observed frequencies (number of times of occurrence)  These are often done in software programs such as Epi Info (which you will learn later today) Outlier is an extreme deviation (the difference between one of a set of values and some fixed value) from the mean.

What to look for while cleaning data

• Outliers: (really high or low numbers) – For example, age value of 110 years could be an error for someone really 10 or 11 • Invalid data: Does the value entered answer the question asked with an understandable response? – If sex is asked and 1=male, 0=female, but the number “2” was entered – If sex is asked and “green” is entered • Missing values: Was no answer given, was the answer not entered in the database, was it meant as “zero”.

Session VI: Data Summary and Analysis 161 Epidemiology for Data Users (EDU) Trainer‘s Manual

Facilitator Background Information: Looking at a variable‘s frequency distribution can help answer a few initial questions about the data such as: 1. How much missing data is there for this variable? Will the amount of missing data impact analysis? 2. How varied is the data? Is the data clumped together around one value or is it evenly distributed across the possible values? 3. Are there some values that seem out of the expected range of possible values? That is, are there outliers present? An outlier is a value is an extreme deviation from most of the other values. While outliers can occur and represent true data points, they can also often be a sign of data entry or data collection error. 4. What is the range of values? Do the minimum and maximum values make sense, or could there be errors in data entry or coding? This question is related to 2 and 3 above and speaks to the need of understanding what one might expect from the data prior to initiating analysis.

Session VI: Data Summary and Analysis 162 Epidemiology for Data Users (EDU) Trainer‘s Manual

Ways of Improving and Checking Data Quality • Insert limits or parameters for valid responses – Epi Info can prevent entry of a 2 when only 0, 1, or missing are acceptable values • Double data entry – Enter all data twice and then compare the two entries to see if there are discrepancies • Explore data using simple graphs before analyzing

Speaker Notes: Be sure to explore and clean data before analyzing it.

Facilitator Background Information: It is important to put systems in place to guarantee data quality before getting to the data analysis and summary point. Often it is possible to put error checking in place during the data entry phase to improve data quality. For example, when programming data entry screens it is possible to define the range of acceptable values for a particular message, or create pop up error messages when data is missing. Another way to work on data quality is to do double data entry. This allows you to compare the two entries and verify any discrepancies.

Session VI: Data Summary and Analysis 163 Epidemiology for Data Users (EDU) Trainer‘s Manual

Summarizing data

• Recall your questions • Focus only on those analyses that will answer and give context to questions • Identify – Type of data you are summarizing (e.g., categorical) – Summary methods and statistical tests applicable for each type (e.g., frequency, chi-squared test) – Contextual data needed to fully describe results (e.g., population)

Types of variables

• Continuous- numerical data measured on an unbroken scale – Age: . . .18, 19, 20, 21, 22, 23 . . . – Weight: . . .150, 151, 152, 153 . . .

• Categorical –data that can be divided into distinct categories – Age Categories: 18-24, 25-44, 45-64, 65+ – Gender: Male, Female

12

Speaker Notes: Prompt Audience: What other examples of continuous and categorical variables can you think of? Continuous: height, temperature, blood pressure readings.

Facilitator Background Information: There are two common types of variables:

Session VI: Data Summary and Analysis 164 Epidemiology for Data Users (EDU) Trainer‘s Manual

- Continuous: These are numerical data that is measured in an unbroken scale, such as age, weight, and CD4 count. Often these data can be categorized during data analysis. - Categorical: These are data that can be broken into distinct categories, such as gender and marital status. It is also common to take data that was collected using a continuous variable and categorize it for analysis purposes (e.g. age).

Types of Categorical Variables

• Ordinal- any categorical variable with some intrinsic order or numeric value – Scales: On a scale of 1 to 5, how much do you like this lecture? • Nominal- a categorical variable without any intrinsic order – Place of residence: which province in Zambia do you live in? • Dichotomous- binary variable is a categorical variable that has only 2 levels or categories – Yes/No question: did you eat nshima this week?

Speaker Notes: Within categorical variables there are three main types that we will discuss: - Ordinal variable is any categorical variable with some intrinsic order or numeric value. For example, categorize info on the highest education received among of a group of people in a variable called EDUCATION: graduated from university, graduated from high school, graduated from high school OR no schooling received). Scales (ie, on a scale of 1 to 5, how much do you like peanuts?) are also examples of ordinal variables. The order, not the magnitude, is most important in ordinal variables.

Session VI: Data Summary and Analysis 165 Epidemiology for Data Users (EDU) Trainer‘s Manual

- Nominal variable is a categorical variable without any intrinsic order. For example, say we have a place of residence that characterizes the part of the Zambia in which a person lives—the North, etc. The categories of this variable have no numeric value or order. Residence in the Northwest has no quantitative value compared to the Northeast. Other examples of nominal variables include sex (male, female), marital status (married/not married). Neither order nor magnitude is important with respect to nominal variables. - Dichotomous, or binary variable is a categorical variable that has only 2 levels or categories. Many dichotomous variables represent the answer to a yes or no question. For example, ―Did you attend the church picnic on May 24?‖ or ―Did you eat potato salad at the picnic?‖ A variable does not have to be a yes/no variable to be dichotomous–it just has to have only 2 categories, such as sex (male/female). Neither order nor magnitude is important with respect to dichotomous variables.

Distribution of Categorical Data: an example Distribution of Measles Cases by Select* Treatment Centers in Lusaka, Zambia: 2010

Treatment Center Frequency Percent Chawama 337 38.2% Chelstone 8 0.9% Chilenje 6 0.7% Chipata 3 0.3% Kabwata 7 0.8% Kalingalinga 88 10.0% Kamwala 20 2.3% Kanyama 374 42.4% Kaunda sq 4 0.5% Makeni 18 2.0% Ngombe 17 1.9% Total 882 100.0% *Note: not all treatment centers are displayed. Speaker Notes: One way that we can look at categorical data is by listing the data categories in tables. For example, the table above displays the frequency number as well as the

Session VI: Data Summary and Analysis 166 Epidemiology for Data Users (EDU) Trainer‘s Manual percent of the total individuals in the table. When graphs or tables are used to report and describe data, they should include all relevant information in the titles and labels, such as population, dates, and total number. Let‘s talk about how we summarized these types of data in the first few slides of this talk. What kind of context did our summary require?

Facilitator Background Information: Frequency distribution of a variable is an arrangement of values for that variable showing how often each value occurred. When starting data analysis it is important to first look at the frequency distribution of each variable. This process will allow you to get familiar with the data and will also help in determining if there might have been issues with data collection that might have impacted the data quality.

Summarizing Continuous Variables

• Mean • Median • Mode • Range • Standard deviation

Speaker Notes: There are several measures that can be used to summarizing continuous data. They are: - Mean – average of all values of a continuous variable in the dataset

Session VI: Data Summary and Analysis 167 Epidemiology for Data Users (EDU) Trainer‘s Manual

- Median – the middle of the distribution when the numbers are in ascending or descending order; the number where half of the values are above and half are below - Mode – the value that occurs most - Range of values – from minimum value to maximum value - Standard deviation – a measure of how much the data varies from the mean

Distribution of Continuous Variables: an example

 Inconsistencies or unexpected results should always be investigated, using the original data as the reference point

Speaker Notes: An initial look at continuous variables can give you several important pieces of information: • Do all subjects have data, or are values missing? • Is there a lot of variation? (Figures 1a) or are most values clumped together (Figure 1b) • Are there outliers? (Figure 1a: outlier of 84) • Do the minimum and maximum values make sense, or could there be mistakes in the coding?  You can examine graphs or tables to see what your variables look like

Session VI: Data Summary and Analysis 168 Epidemiology for Data Users (EDU) Trainer‘s Manual

Normal distribution for AGE variable

Normal distribution for 'age' variable 14

12

10

8

Series2

6 Frequency 4

2

0 1 2 3 4 5 6 7 8 9 10 11 Age

Speaker Notes: This is the classic symmetrical bell-shaped curve. The clustering of values at the center is a key characteristic of this.

Skewed Distribu on for AGE variable

Frequency of the Age Variable 14

12

10 F r e q 8 u e n c 6 y 4

2

0 1 2 3 4 5 6 7 8 9 10 Age

Speaker Notes: A distribution that has a central location to the left and or right of the center, with a tail off to the other end, is said to be skewed, that is, not normally

Session VI: Data Summary and Analysis 169 Epidemiology for Data Users (EDU) Trainer‘s Manual distributed. In this figure, the distribution is skewed to the left: the distribution has a central location to the right and a tail to the left. You can remember this by thinking that the data on the left, or outliers, are causing skewness. Prompt Audience: Would the distribution of the incomes in Zambia be skewed to the left or to the right? (Answer: To the Right.)

Mean

Mean = sum of value # of observations

Example: age (yrs) of individuals (8, 56, 10, 8, 9)

91 Mean: sum = 8+8+9+10+56 = 18.2 yrs # obs. = 5 5

Adding an outlier (age = 56) changes the mean dramatically!

23

Speaker Notes: Adding an outlier of 56 (from an observation of 11) dramatically changes the mean from 9.2 to 18.2 years!

Facilitator Background Information: The mean is highly susceptible to outliers. Additionally, the smaller the number of observations the bigger impact an outlier will have on the mean value.

Session VI: Data Summary and Analysis 170 Epidemiology for Data Users (EDU) Trainer‘s Manual

Mean

Mean = sum of value # of observations Example: age (yrs) of children (8, 11, 10, 8, 9)

8+8+9+10+11 46 Mean: sum = = = 9.2 yrs # obs. 5 5

22

Speaker Notes: The mean or average (a more technical term is the arithmetic mean) is the value closest to all other values in a distribution. The mean is the best descriptive measure for data that are normally distributed but not for data that are skewed or have extreme values (outliers). The mean is affected by any extreme value because it uses all observations in the distribution. Method for calculating the mean: Step 1: Add all the observed values in the distribution Step 2: Divide the sum by the number of observations In the example above, summing all the observations and dividing by 5 (# of observations) equals 9.2

Session VI: Data Summary and Analysis 171 Epidemiology for Data Users (EDU) Trainer‘s Manual

Median

Median = Middle value

Example: age (yrs) of children (8, 11, 10, 8, 9)

Median: -Order data: 8,8,9,10,11 -Pick the middle value Here it is the 3rd: 9 years

24

Speaker Notes: The median is the middle value of a set of data that has been put into rank order. The median divides that data into two halves, with one half of the observations being below the median and the other half being larger than the median. The median is a good descriptive measure, particularly for data that skewed because it is a central point of distribution. Since the median is the middle value, it is not generally affected by one or two extreme values (outliers). For example, if the last value was 90 instead of 9, the median would still be 9.

Method for identifying the median: Step 1: Arrange the observations into increasing or decreasing order Step 2: Find the middle position of the distribution visually or by using the following formula: Middle position = (n + 1)/ 2 (n=number of observations) - If the number of observations (n) is odd, the middle falls on a single observation - If the number of observations (n) is even, the middle falls between two observations

Session VI: Data Summary and Analysis 172 Epidemiology for Data Users (EDU) Trainer‘s Manual

Step 3: Identify the middle position - If the number of observations (n) is odd the middle will fall on a single observation and the median will equal the value of that observation - If the number of observations is even the middle position will fall between two observations and the median will equal the average of the two values.

Mode Observation that occurs most frequently 9 12 15 15 15 16 16 20 26

Observation Number of occurrences 9 1 12 1 15 3 16 2 20 1 26 1

25

Speaker Notes: The mode is the value that occurs most often in a dataset. It can be determined simply by tallying the number of times each value occurs.

Method for identifying the mode: Step 1: Arrange the observations into a frequency distribution, indicating the values of the variable and the frequency with which each occurs. Alternatively, for a dataset with only a few values, arrange the actual values in the ascending order. Step 2: Identify the value that occurs most often.

Session VI: Data Summary and Analysis 173 Epidemiology for Data Users (EDU) Trainer‘s Manual

Properties and uses of the mode:  The mode is easiest measure of central location to identify and explain.  The mode is the preferred measure of central location for addressing which value is the most popular or most common (i.e., the mode is used to describe which day of the week patients prefer to come to the vaccination clinic)  A distribution can have more than one mode if two or more values tie as the most frequent values. It has no mode if no value appears more than once.

Range

• The spread of the data from lowest (minimum) value to highest value (maximum) Example: 9 12 15 15 15 16 16 20 26

Range = (9—26) Sometimes reported: Range = 17

26

Speaker Notes: The range of a set of data is the difference between its largest (maximum) value and its smallest (minimum) value. In the epidemiologic community, the range is usually reported as ―from the (minimum value) to (maximum value)‖ as two numbers rather than one (the difference between the two values). Method for identifying the range: Step 1: Identify the smallest (minimum) observation and the largest (maximum) observation Step 2: Epidemiologically, report the minimum and maximum values.

Session VI: Data Summary and Analysis 174 Epidemiology for Data Users (EDU) Trainer‘s Manual

Statistics describing continuous variable distribution

Facilitator Background Information: The slide above shows a graphic representation of the distribution for age data collected in years and the statistical measures previously discussed. By looking at this slide one is able to provide some information regarding the age of the population from whom the data was collected.

Session VI: Data Summary and Analysis 175 Epidemiology for Data Users (EDU) Trainer‘s Manual

Note: On the above slide, the meaning of the symbols used in this formula should be specified.

Speaker Notes: Standard deviation is the measure of spread (distribution out from a central value, the mean). A large standard deviation compared to mean indicates that the data are widely distributed. A small standard deviation compared to the mean indicates that the data are narrowly distributed. While it is easier to calculate standard deviation using a statistical package software such as Epi Info, the calculation can also be done by hand.

Method for calculating standard deviation: 1. Calculate the mean 2. Subtract the mean from each observation and square the result 3. Sum the squared differences 4. Divide the sum of the squared differences by n-1 5. Take the square root of the value obtained in Step 4. The outcome is the standard deviation.

Session VI: Data Summary and Analysis 176 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Let‘s try calculating the standard deviation by hand. You would usually use a software package such as Epi Info to calculate this.

Speaker Notes: A common statistical distribution is the normal distribution, which is represented by a bell shaped curve. As you can visualize here, for normally distributed data, the mean (µ) is the best descriptive measure since this is where most of the values are concentrated. In addition, for normally distributed data

Session VI: Data Summary and Analysis 177 Epidemiology for Data Users (EDU) Trainer‘s Manual

68% of the data is within one standard deviation (σ) from the mean, 95% of the data is within two standard deviations from the mean, and 99% of the data is within three standard deviations from the mean. Outliers can be detected using a Bell Curve because they lie outside the normal distribution.

Did HIV prevalence decrease from 1994 to 2004?

Table 1. Trends in HIV prevalence among 15-39 year-old pregnant women attending one of 21 antenatal care sites, Zambia 1994-2004

Year HIV Mean Site Prevalence (%) 1994 19.6 1998 18.1 2002 18.6 2004 18.6  How do we know if the magnitude of decline (difference from 19.6 to 18.6) was meaningful?

31

Percent Change is Not a Good Test of Difference Scenario Absolute Percent increase difference during the 2- year period  1% HIV prevalence 2%-1% = 1% (2 -1)/1)*100% in district A, 2007 =100%  2% HIV prevalence in district B, 2009

 27% HIV prevalence 28%-27% = (28-27)/27)*100% in district C2007 1% =3.7%  28% HIV prevalence in district D, 2009

Speaker Notes: Percent change is strongly dependent on an estimate‘s magnitude and therefore not a good test of difference. In both scenarios, the absolute difference is 1%.

Session VI: Data Summary and Analysis 178 Epidemiology for Data Users (EDU) Trainer‘s Manual

However, the percent increase is 100% in Scenario 1 but only 3.7% in Scenario 2. Clearly, using percent change alone to describe differences can be very misleading if the magnitude of the prevalence estimates is not simultaneously considered. Further, percent change calculations do not take into consideration changes in population characteristics.

So, how do you know if…

• There has been a meaningful increase or decrease in prevalence? • A mean is really different from another mean? • A proportion is really different from another proportion?

 ANSWER: Do a statistical test!

Speaker Notes: In other words, testing for a statistically significant difference between estimates.

Facilitator Background Information: Often when comparing values they might look different but we want to know how different the values actually are, that is, is it a significant/meaningful difference or not. The best way to determine this is to conduct a statistical test for difference between the two values.

Session VI: Data Summary and Analysis 179 Epidemiology for Data Users (EDU) Trainer‘s Manual

What is statistical testing?

• Statistical testing is the process of concluding from your data whether an observed difference is real or due to chance

• Statistical tests:

o T-test: test to compare 2 means

o Chi square (2 ): test to compare 2 proportions

Speaker Notes: T-test or 2 will be associated with p-value For example, a T-test could be used to compare the mean age of men wit the mean age of women in a group. The Chi square test could be used to compare the Proportion of Smokers who have Lung Cancer to the Proportion of Non- Smokers who have Lung Cancer.

Facilitator Background Information: There are several statistical tests that can be used to determine if observed differences are real or due to chance. The most commonly used ones are t-test or chi-square tests, depending on whether you are comparing means or proportions. Statistical testing assumes that your data follows a statistical distribution and calculates the probability of observing the distribution of your data.

Session VI: Data Summary and Analysis 180 Epidemiology for Data Users (EDU) Trainer‘s Manual

Example of t-test: Are the Means Really Different?

Group 1 Weight Group 2 Weight

70 71 49 67 55 59 77 57 65 51 76 68 Group 1 Mean = 65.3

Group 2 Mean = 62.2 35

Speaker Notes: Have audience add up means for each group and then ask are the means different? Is there a statistically significantly difference?

Chi Square (2 )

• Chi Square can also be used to test differences between proportions for two or more populations

• T-test or 2 will be associated with p-value

Speaker Notes: It is based on the principle that if the two variables are not related (for example smoking status is not related to cancer) then measures applied to each variable

Session VI: Data Summary and Analysis 181 Epidemiology for Data Users (EDU) Trainer‘s Manual will give similar results (for example about the same proportion of smokers and non-smokers being found to have cancer) and any variation between the results being purely caused by chance.

P values

• If P-value is smaller than a pre-specified level (called significance level, 5% for example) then there is a 5% likelihood that your difference is just due to chance • Therefore, a p-value of  0.05 means your difference is statistically significant • T-test or 2 will be associated with p- value

Speaker Notes: Elaborate on p-value and that .05 is usually used T-test or 2 will be associated with p-value THINGS THAT EFFECT P-VALUE:  Size of difference o Bigger differences are easier to find  Size of the sample

Facilitator Background Information: A p-value is a measure of how much evidence we have against the null hypothesis. The null hypothesis, traditionally represented by the symbol H0, represents the hypothesis of no change or no effect, that is, the hypothesis that the observed difference is not meaningful.

Session VI: Data Summary and Analysis 182 Epidemiology for Data Users (EDU) Trainer‘s Manual

When conducting statistical testing, the p-value is the probability of obtaining a test statistic (t-value or 2-value) as extreme as the calculated one. The p-value is determined by using the existing model for that statistical distribution and tables that provide those values. If using a statistical software package, that value will be calculated for you. Commonly used p-values are 0.05 or 0.001, that is, significance levels of 5 or 1 percent, that is, one will say that if the calculated p-value is less than or equal to 0.05/0.001 then the value is statistically significant meaning that the observed difference is meaningful. Sample size can greatly impact whether an observed difference is statistically significant. With a small sample size one can only determine significant differences when those differences are very large, however with a larger sample size, smaller differences may be easier to detect. When conducting statistical test it is important to understand the relationship between sample size, size of the difference being tested, and the ability of the test to detect significance. During the study design phase it is important to take these issues into consideration and used statistical formulas to calculate what the needed sample size is to be able to detect the desired magnitude of change.

Session VI: Data Summary and Analysis 183 Epidemiology for Data Users (EDU) Trainer‘s Manual

Example: p-value

•Are the means really different? (at level of p≤0.05)

•p=0.59

•p=0.01

38

Speaker Notes: We conclude that the association is ―statistically significant‖ at a level of p0.05  First p-value: not statistically significant (p=0.59 > 0.05) – The probability that the observed result was due to chance alone is 59%, much higher than the threshold of likelihood set for the statistical test.  Second p-value: statistically significant (p=0.01 < 0.05) – The probability that the observed result was due to chance is 1% which is below the threshold of likelihood set for the statistical test.

Statistical Tests provide an appropriate test of difference or trends over time • It is NOT possible to determine if a statistically significant trend exists by simply looking at the prevalence estimates

• To accurately determine if two prevalence estimates based on a sample of the population are different, or represent a trend, a statistical test must be used

Session VI: Data Summary and Analysis 184 Epidemiology for Data Users (EDU) Trainer‘s Manual

Statistically significant linear trend

• Statistically significant linear trend means the prevalence estimates either increased (A) or decreased (B) over time • A graph of the prevalence estimates will create a relatively straight line

Speaker Notes: If there is not a statistically significant linear trend then this situation is described as ―No change, 1991–2009.‖

Why Public Health Significance Also Should Be Considered • Statistically significant differences or trends do not necessarily have public health or “real world” significance • Statistically significant differences or trends are partly a function of sample size, they may not be big enough to merit public policy consideration • Statistically significant differences or trends should be considered as a minimum starting point for any discussion about differences or changes over time

Speaker Notes: Larger the sample, smaller change can be detected

Session VI: Data Summary and Analysis 185 Epidemiology for Data Users (EDU) Trainer‘s Manual

Facilitator Background Information: While statistical testing is a measure of significance of the observed change it should not be the only measure taken into consideration. Because of the need for very large sample sizes to measure small change often observed change is deemed to not be statistically significant, however it can have significant public health implications. While a 1% decrease in HIV prevalence may not be statistically significant, it can have great public health significance and impact on the public health system.

WRAP-UP AND REVIEW:

Learning objectives

At the end of this lecture, you should be able to:

 Determine which measures to use for summarizing specific data types  Calculate means, medians, range, and standard deviations  Identify which statistical tests are appropriate to use with which types of data

42

Session VI: Data Summary and Analysis 186 Epidemiology for Data Users (EDU) Trainer‘s Manual

RESOURCE CENTER:

Resource Center

• UNC CPHP Focus on Field Epidemiology, Volume 3, Issue 6: Data Analysis: Simple Statistical Tests

Session VI: Data Summary and Analysis 187 Epidemiology for Data Users (EDU) Trainer‘s Manual

SESSION VII: INTRODUCTION TO EPI INFO

Introduction to Epi Info Epidemiology for Data Users

Description of Session: This session will provide participants with an introduction to the free data analysis software provided by CDC, Epi Info. Additionally the session will provide participants the opportunity to work with Epi Info using an example dataset.

Session VII: Introduction to Epi Info 188 Epidemiology for Data Users (EDU) Trainer‘s Manual

Learning Objectives:

Learning Objectives

By the end of this session, you should be able to:

• Describe Epi Info’s three main modules • Import data from an existing data set in Excel into Epi Info • Display the variables in a dataset using the ‘List’ command • Summarize data using the ‘Frequency’ and ‘Means’ commands in Epi Info • Generate simple graphs in Epi Info

By the end of the session participants should be able to: 1. Describe what the three main modules of Epi Info are and how they can be used 2. Import an existing dataset from Excel to Epi Info 3. Perform some basic analyses in Epi Info 4. Generate graphs with Epi Info

Session VII: Introduction to Epi Info 189 Epidemiology for Data Users (EDU) Trainer‘s Manual

Learning Activities at a Glance: Learning Content Method Material Handout/ Length Activity Appendix (min) 1 Introduction to Epi Lecture Copies of the Info presentation for each participant

Blank flipchart paper and markers

Excel dataset loaded into presenter computer 2 Epi Info exercise Group work Excel dataset Appendix A: Epi loaded into Info handout participants computer

Session VII: Introduction to Epi Info 190 Epidemiology for Data Users (EDU) Trainer‘s Manual

What is Epi Info?

• A free public domain software package developed by the Centers for Disease Control and Prevention (CDC) for the global community of medical and public health professionals • It can be used to: – Develop an electronic data entry form (i.e., questionnaire) – Enter data into this form – Analyze the entered data – Generate graphs, reports, map, etc.

Facilitator Background Information: Epi Info is a free software package developed by CDC for use by physicians, nurses, epidemiologists, and other public health workers lacking a background in information technology. These types of professionals often have a need for simple tools that allow the rapid creation of data collection instruments and data analysis, visualization, and reporting using epidemiologic methods.

Download Epi Info at: www.cdc.gov/epiinfo

Facilitator Background Information: Epi Info can be downloaded for free from the CDC website.

Session VII: Introduction to Epi Info 191 Epidemiology for Data Users (EDU) Trainer‘s Manual

Strengths of Epi Info

• Free • User-friendly – Point-and-click • All-in-one software – Design a data entry form, enter data, and analyze data • Time efficient – Can summarize data – Very useful for outbreak investigations

Limitations of Epi Info

• Like all known packages, Epi Info will not interpret statistics for you • Epi Info has not been designed to replace: – Professional data management software packages such as Microsoft Access – Professional data analysis software packages such as SAS, STATA, SPSS

Facilitator Background Information: Epi Info is user-friendly. It is an all-in-one package, allowing the user to design a data entry form, enter and analyze data on the same platform. It is also a very efficient package in summarizing data and it has been particularly useful in analyzing outbreak data. However, as with everything there are also some limitations to Epi Info. Epi Info is not a data management tool. While you can manage small datasets with Epi Info it is not recommended that you replace data management software such Microsoft Access with Epi Info. Additionally, Epi

Session VII: Introduction to Epi Info 192 Epidemiology for Data Users (EDU) Trainer‘s Manual

Info is not as robust of a data analysis software as SAS or SPSS, and just like these packages, Epi Info will provide you with the statistics but not the interpretation.

Opening Epi Info

The following screen should appear on your screen:

Session VII: Introduction to Epi Info 193 Epidemiology for Data Users (EDU) Trainer‘s Manual

Main Modules

• Make View : For designing an electronic data entry form (questionnaire) which automatically creates a data table(s)

• Enter Data: For entering data into the designed electronic data entry form

• Analyze Data: For conducting a relatively wide range of statistical analysis of the data

Speaker Notes: Epi Info consists of three main modules: 1. Make View: This module allows you to design your own data entry form. When the data entry form is design, the software automatically creates the dataset. Designing a questionnaire in Epi Info will be useful when you have a small amount of data to be entered, but should not be used for ongoing data collection. In that instance, it is better to used data management software such as Microsoft Access and then import the data into Epi Info for analysis. 2. Enter Data: This module allows the user to enter data into a form created using the Make View module. 3. Analyze Data: This module allows the user to conduct a range of statistical analysis of either an imported dataset or data entered by the user into Epi Info. When might you want to design a questionnaire in Epi Info? (related to strengths and limitations) Suggest explaining that entering data can be done manually, one data element at a time (into what – a database? And what exactly is a database?)

Session VII: Introduction to Epi Info 194 Epidemiology for Data Users (EDU) Trainer‘s Manual

or by importing an already existing set of data from another source (provide examples) Epi Info can be used to perform all three of these functions, but we will focus on some basic analyses today

Import and Export Data

• You can import data generated in some other software packages such as – Text, Excel, Access

• You can export data suitable for use in some other software packages such as – Text, Excel, Access

Speaker Notes: Epi Info allows you to import data generated using a different software package such as Excel, Microsoft Access, or even text files. Epi Info also allows you to export data entered into Epi Info in files compatible with those same software packages. Suggest explaining advantages of importing a data set (for example, to avoid manually re-entering data, thereby saving time and avoiding introducing errors during re-entry) and brief reasons why. You may want to export your data into other software or software programs for more advanced statistical analyses.

Session VII: Introduction to Epi Info 195 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Now we are going to open Epi Info. Click on the Epi Info icon on your desktop.

List of Output Window Commands

Program Editor

Speaker Notes: Click on ‗analyze data‘ tab from the menu

Describe what some of the screen‘s components does (Commands, Output Window, Editor)

Session VII: Introduction to Epi Info 196 Epidemiology for Data Users (EDU) Trainer‘s Manual

READ Command

• The READ command imports your data from an existing source so that they can be analyzed • The READ command identifies your data source • Specify the: • Current project • Data formats • Data source • Views Speaker Notes: Data sets from software programs such as Excel can be imported into Epi Info. The Read command is able to access and open data tables stored either in the Epi Info data format or in other data formats, such as Excel or Access.

List of Output Window Commands

Program Editor

Speaker Notes: Point out Read Command located under Analysis Commands, Data tabs

Session VII: Introduction to Epi Info 197 Epidemiology for Data Users (EDU) Trainer‘s Manual

Using the READ Command

Project Format Source File

Table or View

Speaker Notes: Follow the steps below to import a data set from Excel: Click Analyze Data button from the main menu Click on Read (Import) Click on Data Formats drop-down box and select Excel 8.0 (or whatever version you want to import) Click on Data Source (…) and Analysis will display a window in which you can select the Excel file you want to read: C:\ measles.xls Under Views, select „mapping‟ (this is selecting which sheet from the Excel file you want to import) Click on your file and then click Open Click on desired Sheet and then click OK (leave the default option in the box First Row contains field names). When the Read command has been executed, the Analysis Output section will display the Current View, Record Count, and Date to indicate the data table is open. Current View indicates the data table that is open in Epi Info. Record Count indicates the number of records that has been identified (read) by the Read command.

Session VII: Introduction to Epi Info 198 Epidemiology for Data Users (EDU) Trainer‘s Manual

Date gives the date and time the table was opened in Epi Info.

LIST Command

• While the READ command identifies your data source, it will not display the data elements • Use the LIST command to display either all or selected variables in a data set • You can specify your line listing in your output window

Speaker Notes: The data set is now open and available in Epi Info. However, it is not displayed yet. The List command is used to display either all or selected variables from the data set.

Speaker Notes: Follow the steps below to display the table you just read, ‗measles.xls‘

Session VII: Introduction to Epi Info 199 Epidemiology for Data Users (EDU) Trainer‘s Manual

From the command tree on the left side of the screen, click on the Analysis command under the Statistics folder Click the List command In the List dialog box that appears, click the Variables drop-down list to display all the variables that are available. Notice that the fields in the drop-down list are listed alphabetically. You can select specific variables that you want to display by clicking on the variable name. Click OK and note the frequencies of the variable. The output will produce a line listing of the variables

Session VII: Introduction to Epi Info 200 Epidemiology for Data Users (EDU) Trainer‘s Manual

Data Summarization and Analysis Choice of statistics is affected by the type of data • Continuous- data measured on an unbroken scale - Examples: weight, age • Categorical –data that can be divided only into distinct categories - Examples: sex, marital status Note: It is useful to have codebook while doing analysis 18

Speaker Notes: Recall in the previous lecture that continuous variables are data measured on an unbroken scale, such as weight or age. Data that can be divided into distinct categories include sex and marital status.

Why is it useful to have a codebook while doing analysis (e.g., variables are represented by abbreviations and responses by numbers for convenience and codebook provides reference for you and others)

Facilitator Background Information: An important step in creating a dataset is to create a reference codebook for the dataset. The codebook should contain the following for each variable: 1. Name: The variable name in the dataset 2. Description: A description of the data stored in that variable 3. Type: The type of variable (e.g numeric versus text) 4. Values: The values that the variable might take (e.g. sex can be M or F). This will be helpful in determining the type of statistics you can obtain for that variable.

Session VII: Introduction to Epi Info 201 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Let us look at the variables we have in this measles dataset. For example, ―Treating_HC‖ (treating health center). Is this variable continuous or categorical? This variable is categorical as the data we have can be categorized by health center. What about the variable ‗sex‘? Is this continuous or categorical? This is categorical as well since the data can be categorical by ‗sex‘ (male or female). What about the variable ‗age‘? This is continuous since age is can be measured on a continuous scale (i.e., 0 through 100). What about ‗vacc_status‘? This is also categorical (vaccinated, not vaccinated, or unknown). What about the variable ‗out_come‘? This variable is categorical as it can be broken into two distinct categories (alive and died).

Session VII: Introduction to Epi Info 202 Epidemiology for Data Users (EDU) Trainer‘s Manual

MEANS Command

• The MEANS command produces descriptive statistics for continuous variables • The descriptive statistics include: – Mean – Median – Mode – Standard deviation

Speaker Notes: The Means command is used to generate descriptive statistics on continuous variables. Suggest (briefly) asking why one doesn‘t use a mean with a categorical variable to assure people understand concept and illustrate idea (one is currently married or unmarried, one is born biologically male or female, etc. – nothing in between)

Speaker Notes: Click the Means command under the Analysis folder

Session VII: Introduction to Epi Info 203 Epidemiology for Data Users (EDU) Trainer‘s Manual

In the Means dialogue box that appears, select ‗variablename‘ from the list of variables under Means of drop-down list. What is a continuous variable in our dataset that we can run a Means on? Let‘s try ‗age in yrs.‘ Click OK.

Speaker Notes: The output should display the variables‘ minimum, median, maximum values, as well as mean, median and standard deviation. What is the mean age of our measles patients? What is the median? Standard deviation? We see that most of patients in our dataset are age 0. What could be the reason for this? What other variables do we have? We also have the variable ‗age in months.‘ Let‘s run a Means on this variable and see how our distribution may differ from that of ‗age in years.‘

Session VII: Introduction to Epi Info 204 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker notes: When we look at age by month we can see the ages of those considered to be ―0‖ when we looked at ‗age in years‘. Most of those under age 1 year are 8 months (12.0%).

Speaker Notes: Example of why one might want to look at frequencies; for example, show where data are grouped along the entire range for a continuous variable (how many HIV cases in 2002 versus 2009?) or if the variable values are distributed in a

Session VII: Introduction to Epi Info 205 Epidemiology for Data Users (EDU) Trainer‘s Manual relatively even manner for a categorical variable (for example, does the population surveyed consists of even numbers/proportions of males and females?)

Speaker Notes: Follow the steps to run a frequency: Click Analysis button from the main menu. From the command tree on the left side of the screen, click on the Frequencies command. From the Frequencies of box, use the down arrow to select a variable. For example, ‗sex.‘ Click OK and note the frequencies of the variable.

Session VII: Introduction to Epi Info 206 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Epi Info produces the distribution of the variable ‗sex‘ as percentages and in a table. Note that 46.8% of patients were female and 53.2% were male. So, there was a slighter higher proportion of male than female patients. Now let us run a frequency on a categorical variable. Let‘s try the variable ‗outcome‘.

Session VII: Introduction to Epi Info 207 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Epi Info produces the distribution of the variable ‗outcome‘ as percentages and in a table. This indicates that, at the outcome of the measles outbreak, 95.7% of the patients were ‗alive‘ while 4.3% of the patients ‗died.‘

Speaker notes: What statistical test would we use if we wanted to determine if the proportion of males who died from measles was higher than the proportion of females? Chi- Square.

Session VII: Introduction to Epi Info 208 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Follow the steps below to run a table. 1. Select ―Statistics, Tables‖ from the Analysis Tree. 2. Select the variable you want to appear down the left side of the table from the ―Exposure Variable‖ drop down list. 3. Select the variable you want to appear across the top of the table from the ―Outcome Variable‖ drop down list. 4. Optional: Select ―weight‖ from the Weight drop down list to run a weighted table. 5. Optional: Select a variable from the ―Stratify by‖ drop down list to stratify the table output. 6. Click ―OK‖. Run a table comparing Sex with Outcome

The TABLES dialog box should look like the figure below:

Session VII: Introduction to Epi Info 209 Epidemiology for Data Users (EDU) Trainer‘s Manual

The output should look like this figure. How many men died of measles? How many women?

Session VII: Introduction to Epi Info 210 Epidemiology for Data Users (EDU) Trainer‘s Manual

Epi Info will produce a number of diagnostic and other statistics including confidence intervals. Note that the chi-square statistic is p>.05 and therefore there does not appear to be a statistically significant difference between gender and those who died.

Speaker Notes: Epi Info provides a Tutorial that can be accessed from within the program and might be helpful if you run into difficulties.

Session VII: Introduction to Epi Info 211 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: This screen can be reached through Help button on the main Epi Info page, by selecting Contents.

WRAP-UP AND REVIEW

Learning Objectives

By the end of this session, you should be able to:

Describe Epi Info’s three main modules Import data from an existing data set in Excel into Epi Info Display the variables in a dataset using the ‘List’ command Summarize data using the ‘Frequency’ and ‘Means’ commands in Epi Info Generate simple graphs in Epi Info

Session VII: Introduction to Epi Info 212 Epidemiology for Data Users (EDU) Trainer‘s Manual

RESOURCE CENTER

Resource Center

• http://www.idready.org/rdbms/database_RDBMS. html • http://cphp.sph.unc.edu/training/training_list/?m ode=view_subcat_detail&subcat_id=321 • http://www.epiinformatics.com/ • http://nccphp.sph.unc.edu/training/index.html • http://www.cdc.gov/epiinfo/tutorials.htm  Training manuals and an Epi Info download are included in the ‘Resource Center’ of your CD’s

Session VII: Introduction to Epi Info 213 Epidemiology for Data Users (EDU) Trainer‘s Manual

SESSION VIII: PRESENTING AND MODELING DATA

Description of Session: This session will discuss how to use Microsoft Excel to create charts and tables to display data. It will also discuss how to model data in Microsoft Excel in order to assess program progress.

Session VIII: Presenting and Modeling Data 214 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: By the end of the presentation, participants will be able to: • Choose the most appropriate table or chart type for the message being conveyed • Create charts and graphs using Microsoft Excel

Learning Activities at a Glance: Learning Content Method Material Handout/ Length Activity Appendix (min) 1 Data summary and Lecture Copies of the analysis presentation for each participant Blank flipchart paper and markers 2 Excel Exercise Hands-on Excel loaded in participants computer Excel datasets loaded in computer Copies of exercise and answer sheet

Session VIII: Presenting and Modeling Data 215 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Just show them the data! People absorb information in different ways, while for some people reading a report that describes everything in words is the best way to get information, for others having a picture makes things much clearer. This is especially important to keep in mind when presenting data. Often a graph or chart can drive a point home about the data that words just aren‘t able to. Visually displaying data helps get the message across. It is useful when wanting to: • Communicate with policy makers • Document disease • Demonstrate successes • Show trade-offs between 2 choices • Make decisions • Can you think of other examples when it would be useful to visually display data? The first step is answering the question: What do you want to communicate? • Disease Trends? • Difference among populations?

Session VIII: Presenting and Modeling Data 216 Epidemiology for Data Users (EDU) Trainer‘s Manual

The answer will determine how you will display the data. In this presentation we will look at tables and charts, and discuss when to use them and how to make them.

Note: A Line graph is not a chart. It may be less confusing if the terminology is consistent in the following slides.

Speaker Notes: Prompt Audience: Can you identify differences between the three different charts? Common types of charts include:  Pie chart  Bar chart  Line graph Let‘s begin with tables. Tables are a set of quantitative data arranged in rows and columns. Tables are useful for demonstrating patterns, exceptions, differences, and other relationships. Tables organize data into a format that is easy to understand and explain. Tables help the reader to see the data all in one place, rather than finding it in the text. Tables usually serve as the basis for preparing more visual displays of data, such as graphs and charts.

Facilitator Background Information:

Session VIII: Presenting and Modeling Data 217 Epidemiology for Data Users (EDU) Trainer‘s Manual

It will be important to stress that different types of charts will be appropriate for different types of data and picking the right one will be important to ensure that the correct message is conveyed.

Speaker Notes: A table enables you to absorb information quickly. Prompt Audience: Why is the title important? (Answer: It gives the person an idea of what is in the table.) It is important to improve footnotes because they explain any abbreviations used in the table. Let‘s look at this example of a table. As we saw on the last slide, a table is an arrangement of data in rows and columns. We can see here that a row is a horizontal arrangement of data. A column is vertical. A table should always have a title. Here the title is Table 1.0: # PLWHA by District and Sex, Province Y. PLWHA is an acronym and is described below the table so that the reader understands that is short for Persons living with HIV/AIDS.

Session VIII: Presenting and Modeling Data 218 Epidemiology for Data Users (EDU) Trainer‘s Manual

Title row here includes the column headings: District, Male, Female and Total. The bottom row is a summary of the totals. Are there any observations you can make about Province Y by looking at the table? Examples: Districts E and I have 1.5 million PLWHA, but District E has many more HIV+ men compared to women. This trend is not the same in District I. What is it about District A that it has such lower numbers compared to the others? etc.

Speaker Notes: Let‘s talk about creating great tables. A table should be self-explanatory. If a table is taken out of its original context, it should still convey all the information necessary for the reader to understand the data. To create a table that is self-explanatory, follow these guidelines: • Keep the display simple by sticking to one single overriding communication objective (SOCO). Give only the necessary data, and avoid repetition . • Explain data in table fully with title • Label rows & columns clearly, concisely (including units) • Write out or footnote all codes, abbreviations, symbols

Session VIII: Presenting and Modeling Data 219 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Prompt Audience: What is wrong with this table? (Answer: there is no title, the data is disorganized, repetitive and raw. In addition, the information that you are comparing needs to be placed in adjacent columns.) What is the main point of these data? How can you make it easier to read? Is a table the best way to represent these data?

Speaker Notes:

Session VIII: Presenting and Modeling Data 220 Epidemiology for Data Users (EDU) Trainer‘s Manual

Charts (e.g., line and bar graphs, pie charts) make sense to visually display trends and when comparing populations, especially when you wish to show populations broken into subsets, such as males and females or age groups. The key points of tables and figures should always be explained in the accompanying narrative. • When used appropriately, can summarize and display complex data clearly and effectively and emphasize specific points • Let you identify and present distributions, trends, and relationships among the data • Help make sense of the data in the profile and communicate findings to planning groups and decision makers • However, poorly designed or executed tables and figures can mislead users or distract them from your message!

Facilitator Background Information: It is important to understand when and what type of charts to use in order to get the intended message across. Often a chart and table will be shown side by side in order to provide information using multiple methods, which can be important when presenting data to a diverse group of learners.

Session VIII: Presenting and Modeling Data 221 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Pie Charts • Circular chart split into segments which show components of a larger group • Suitable for displaying categorical data, or discrete data in distinct categories • The size of a ―slice‖ is proportional to the amount of data (e.g., number of cases) it represents • Proportion (%) of the total each component represents is frequently added on the slice • Different colors can be used to identify various slices • Prompt Audience: Are there any weaknesses or limitations to the pie chart?

Session VIII: Presenting and Modeling Data 222 Epidemiology for Data Users (EDU) Trainer‘s Manual

Session VIII: Presenting and Modeling Data 223 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: The simplest bar graph displays data from a table with one variable. Each bar represents one category. Bar graphs can be organised horizontally or vertically. Vertical bars differ from histograms since they are separated by a space. The height of the bar is proportional to the number of events (e.g. cases) in the category, nevertheless the surface is not always proportional to the width of the category on the x-axis (e.g. different width of age groups). If there is a logical order between categories it should be respected. Otherwise categories can be organised along decreasing or increasing values of respective bars. Variables in a bar graph can be discrete (e.g. sex, region, race) or continuous (e.g. age) but organised in categories (e.g. age groups). The x axis does not need to be continuous.

Session VIII: Presenting and Modeling Data 224 Epidemiology for Data Users (EDU) Trainer‘s Manual

“Grouped” bar graph

# Days Since Patients Started ARVs

140

120 1-30 days 100 31-60 days 61-90 days 80 91-180 days

60 More than 180 days # Patients # 40

20

0 Itezhi Tezhi Kamoto Katondwe Mtendere Mwandi Siavonga Town

Speaker Notes: Prompt Audience: How are grouped bar graphs different than simple bar graphs? A grouped bar chart is used to illustrate data from two-variable or three-variable tables, when an outcome variable has only two categories. Bars within a group are usually adjoining. The bars must be illustrated distinctively and described in a legend. It is best to limit the number of bars within a group to no more than three. Sometimes several subcategories can be shown and placed close to each other in a larger category (grouped bar graphs). It is important that information should be presented in the same order in each category. The following vertical grouped bar graph represents the distribution of cases of Ebola haemorrhagic fever in Bumba zone, Zaire in 1976. The graphic shows two variables, age groups and sex. Note in this graph the variables on the x-axis are categorical, however more than 3 categories are being compared

Session VIII: Presenting and Modeling Data 225 Epidemiology for Data Users (EDU) Trainer‘s Manual

Stacked bar graph

# Days Since Patients Started ARVs 300

250 More than 180 days 200 91-180 days

150 61-90 days

31-60 days # Patients # 100 1-30 days

50

0 Itezhi Tezhi Kamoto Katondwe Mtendere Mwandi Siavonga Town

Speaker Notes: An alternative to grouped bar graphs is the stacked bar graph (also known as sub- divided bar graph). On this type of graph a bar is divided into components, it shows segments of totals. The height of each component is proportional to the part it takes in the bar. Similarly to grouped bar graphs, information should be presented in the same sequence on each bar. The following stacked bar graph shows the distribution of cases of Salmonella Typhimurium infection by age and sex in Norway. In each age group category the bar is divided into two sub-categories, males and females.

Session VIII: Presenting and Modeling Data 226 Epidemiology for Data Users (EDU) Trainer‘s Manual

100% Component column graph

# Days Since Patients Started ARVs 100% 90% 80%

70% More than 180 days 60% 91-180 days 50% 61-90 days

40% 31-60 days # Patients # 30% 1-30 days 20% 10% 0% Itezhi Tezhi Kamoto Katondwe Mtendere Mwandi Siavonga Town

Speaker Notes: Prompt Audience: What are the limitations of 100% Component Column Graphs?

Session VIII: Presenting and Modeling Data 227 Epidemiology for Data Users (EDU) Trainer‘s Manual

Line Graphs

Trends in HIV prevalence among 15-49 year olds in province X: 2001– 2010

80

60

% 40

20

0 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010

Year

24 source: Ministry of Health annual report 2010

Speaker Notes: Line graphs: Display relationships between 2 variables on 2 dimensions, or axes: Y-axis (vertical): dependent variable - the variable you wish to predict or explain X-axis (horizontal): independent variable values are recorded as points on a graph and then connected (as a line) to show trends (usually this is time, e.g., weekly, annually, etc.)

Session VIII: Presenting and Modeling Data 228 Epidemiology for Data Users (EDU) Trainer‘s Manual

Session VIII: Presenting and Modeling Data 229 Epidemiology for Data Users (EDU) Trainer‘s Manual

Session VIII: Presenting and Modeling Data 230 Epidemiology for Data Users (EDU) Trainer‘s Manual

HANDS-ON EXERCISES Part I. Introduction to the tables and the data Walk the participants through the 2 tables and explain the data that is being presented.

Scenario for Data analysis exercise: Antiretroviral Therapy (ART) Monthly summary HIV care results for Jan. through June 2011 - Mondello Clinic

Mondello clinic provides HIV care and treatment for individuals infected with HIV. Specifically they provide nutritional and palliative support for HIV positive individuals not on ART and ART care for those individuals that are medically eligible. Below you will see six months of data from monthly summary reports from Mondello Clinic for the first half of 2011. You will see 2 tables, 1) HIV Care (non ART and ART) and 2) ART follow up. 'Previous' represents the value of the indicator from the previous monthly report. 'New' represents the number of new cases during this reporting period. Cumulative' is the current cumulative total Medically eligible' represents clients clinically eligible for ART but not yet on treatment Current' represents the actual number (minus deaths, drop-outs and lost to follow-up.) 'New' represents the number of new cases during this reporting period. Cumulative' is the current cumulative total

HIV care (non -ART and ART) - new and cumulative number of persons enrolled Month Jan-11 Feb-11 Mar-11 Apr-11 May-11 Jun-11 Previous 101 108 114 122 131 142 New 7 6 8 9 11 10 Cumulative 108 114 122 131 142 152 Medically Eligible for ART but not yet on treatment 29 30 34 38 39 44 Previous on ART 69 78 88 97 105 116 New on ART 9 10 9 8 11 9 Cumulative on ART 78 88 97 105 116 125 Current (active) on ART 64 72 82 92 102 98

Session VIII: Presenting and Modeling Data 231 Epidemiology for Data Users (EDU) Trainer‘s Manual

ART Follow up (Cumulative) Month Jan-11 Feb-11 Mar-11 Apr-11 May-11 Jun-11 Stopped 1 1 2 1 2 2 Transferred out 1 1 1 2 2 2 Dead 0 1 1 8 8 8 Lost to Follow-up 0 0 0 2 2 2 Unknown 12 13 11 0 0 14 Total 14 16 15 13 14 27

Part 2. Creating graphs to visualize and make decisions based on the data Follow the activities below and provide feedback to the participants as they practice.

1) Graph the ART follow-up data showing the reason for losses (stopped, transferred, dead, lost to follow-up) by month

Tips Highlight the table starting from the 2nd row and extending through the "unknown" row, including all the columns. Go to - Insert - Column - and choose the stacked column chart. Go to the Layout tab and give the chart a chart title, and horizontal and vertical axes titles. Experiment with the other 2-D column charts - the clustered and the 100% stacked.

This table is just a reproduction of the table above. ART Follow up (Cumulative) Month Jan-11 Feb-11 Mar-11 Apr-11 May-11 Jun-11 Stopped 1 1 2 1 2 1 Transferred out 1 1 1 2 2 2 Dead 0 1 1 8 8 8 Lost to Follow-up 0 0 0 2 2 2 Unknown 12 13 11 0 0 14 Total 14 16 15 13 14 27

Session VIII: Presenting and Modeling Data 232 Epidemiology for Data Users (EDU) Trainer‘s Manual

2) Experiment with the other 2-D column charts - the clustered and the 100% stacked. What message would you be trying to convey if you chose the clustered? The 100% stacked?

Session VIII: Presenting and Modeling Data 233 Epidemiology for Data Users (EDU) Trainer‘s Manual

ANSWER TO QUESTION 2 The clustered would be most useful if you wanted to clearly see the differences among the categories per month, e.g., the extent of the problem of lost to follow-up, while retaining the absolute value number of clients. The 100% stacked would be most useful if you wanted to see the differences among the categories but it was important to understand what percentage they made up of the total number of ART patients that were started on ART but are not currently on ART.

Part 3. Creating graphs to make decisions about setting and reaching targets

3) The target for Mondello Clinic for the indicator 'number of HIV-infected individuals currently on ART' for the year 2009 is 500. Develop a graph that displays the trend in the 'number of HIV-infected individuals currently on ART during the first half of 2009.

Month Jan-11 Feb-11 Mar-11 Apr-11 May-11 Jun-11 Current on ART 64 72 82 92 102 98 Target 500 500 500 500 500 500

Session VIII: Presenting and Modeling Data 234 Epidemiology for Data Users (EDU) Trainer‘s Manual

4B) What would the number of clients need to be each month to reach the 500 target by December 2009? Draw a graph representing this increase.

Month Jan-11 Feb-11 Mar-11 Apr-11 May-11 Current on ART 64 72 82 92 102 Target 500 500 500 500 500

Session VIII: Presenting and Modeling Data 235 Epidemiology for Data Users (EDU) Trainer‘s Manual

ANSWER TO QUESTION 5)

Month Jan-11 Feb-11 Mar-11 Apr-11 May-11 Jun-11 Cumulative 78 88 97 105 116 125 Current 64 72 82 92 102 98 % Current 82% 82% 85% 88% 88% 78%

5A) What are the implications of this trend? ANSWER TO QUESTION 5A) There are a significant amount of clients that are not continuing their ARTs each month. Considering the low numbers on ART, the program cannot afford to have clients drop out of the program. The data reported in June is troubling and could either be a result of poor data quality or signify a problem at the program site/facility.

6) What do the above data tell you about the ART-eligible individuals not yet on ART? ANSWER TO QUESTION 6) That there are HIV positive clients that are ready for treatment but not receiving it. The number of new on ART fluctuates slightly, but is not going up while the number medically eligible continues to rise.

Session VIII: Presenting and Modeling Data 236 Epidemiology for Data Users (EDU) Trainer‘s Manual

6A) What implications do these results have for program planning? ANSWER TO QUESTION 6A) The program needs to find ways to meet the needs of the clients that are waiting for services or they will never attain their targets. There is a need to talk with staff at the facility/site to determine if there is a barrier to clients accepting treatment or if the problem is related to the facility struggling to meet the needs for services.

6B) What does it tell you about program retention? ANSWER TO QUESTION 6B) The program is not efficient. The full range of services of the clinic is to get clients on ART and this is not happening sufficiently.

Session VIII: Presenting and Modeling Data 237 Epidemiology for Data Users (EDU) Trainer‘s Manual

SESSION IX: MATHEMATICAL MODELING WITH EXCEL

Session IX: Mathematical modeling with Excel 238 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Examples of mathematical models in public health There are many examples of models in use in public health. For example, the Spectrum suite of models is used by countries and UNAIDS to estimate global and national HIV prevalence and incidence figures. It is also used to

Session IX: Mathematical modeling with Excel 239 Epidemiology for Data Users (EDU) Trainer‘s Manual estimate the number of HIV-related deaths, number of infants born with HIV, number of people eligible for antiretroviral treatment, among other estimates. Recently in Zambia there was a population census. The census collects data as inputs into a model that is used to project the size and make-up of population today and into the future. Data collected in 2010 will be used to project the size of the population in 2011, 2012, 2015, etc. because it is impossible to count everyone every year. Cost-effectiveness models are used to make decisions about the interventions with the greatest impact for the cost. As a matter of national policy, we may be deciding whether it would be smart to add iodine to salt or other interventions to improve the public‘s health. We use models to helps us analyze those questions.

Core elements of mathematical models

Calculations & Inputs Outputs Assumptions

• As calculations become more complicated and the number of inputs grows, it becomes increasingly useful to create a model to replicate the system and to compare the output when the inputs are varied

Speaker Notes: Core elements of mathematical models The core elements of mathematical models are inputs, calculations & assumptions, and outputs. The inputs are what goes into the model, and the outputs are what comes out of the model. The calculations and assumptions are in the middle, and are usually a set of equations that help mathematically represent a system in real life.

Session IX: Mathematical modeling with Excel 240 Epidemiology for Data Users (EDU) Trainer‘s Manual

As calculations become more complicated and the number of inputs grows, it becomes increasingly useful to create a model to replicate the system and to compare the output when the inputs are varied.

Speaker Notes: Excel models can be built to make decision-making based on alternative scenarios easier. We will see now how we can use Excel for:  Setting targets  Calculating coverage  Assessing retention  Forecasting resources needed for an intervention  Estimating the impact of an intervention  Estimating the cost of an intervention  Comparing scenarios

Session IX: Mathematical modeling with Excel 241 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Let‘s discuss target setting. A target is a specified level of performance for a measure (indicator) at a predetermined point in time (i.e., achieve ‗x‘ by ‘y‘ date). There are two types of targets: - Overall targets that indicate what the overall program is trying to achieve, and - Annual targets that break the overall target up into manageable pieces and help with program monitoring Targets can be expressed either as raw numbers or percentages. For example, 3 million people on antiretroviral treatment by 2005, or the 3 by 5 initiative. Or 80% ART coverage = universal coverage.

Session IX: Mathematical modeling with Excel 242 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Let‘s suppose we are a district health team and we wanted to answer the question, by 2015, what should be the target % children aged 12-23 month who have received a measles vaccine? It would be useful to know a few things in order to answer this question. What is the current coverage rate? What has been the pattern in the last few years? What interventions are being put in place (if any) to increase access to the vaccine?

Session IX: Mathematical modeling with Excel 243 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Using Excel, we can graph the past coverage of % measles vaccination. You can see the solid red line represents the past trends. We can then use Excel to project the trend through 2015. In this example, we are looking at the coverage that could be achieved if we continue annual linear growth, that is, if we grow at the same pace each year.

Session IX: Mathematical modeling with Excel 244 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: We can also use Excel to see what it would take to reach a certain target in a particular timeframe. Here we can see what it would take to reach 100% vaccination coverage by 2015. We could then discuss whether or not it is feasible to reach 100%, what would be feasible, and what it would take to reach that target.

Session IX: Mathematical modeling with Excel 245 Epidemiology for Data Users (EDU) Trainer‘s Manual

HANDS-ON EXERCISES

Target Setting Exercise

Part I. Introduction to the data Walk the participants through the table and explain the data that is being presented.

Year 2005 2006 2007 2008 2009 2010 % Vaccination against 30% 32% 35% 40% 45% 50% Measles

Part 2. Setting targets looking at past trends Follow the activities below and provide feedback to the participants as they practice the concepts.

1) How much has coverage increased from 2005 to 2010?

ANSWER TO QUESTION 1) 20%

2) We are interested in setting a reasonable target for 2015. If we look at past trends, where might we be in 2015?

ANSWER TO QUESTION 2) There has been a pretty steady increase of 5% each year since 2007.

Year 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 % Vaccination 30% 32% 35% 40% 45% 50% 55% 60% 65% 70% 75% against Measles Tip: We want Excel to think the years are labels and not data points, so we put a ' mark before them.

Session IX: Mathematical modeling with Excel 246 Epidemiology for Data Users (EDU) Trainer‘s Manual

3) If we wanted to compare that projection with achieving 100% in 2015, how would we do it?

ANSWER TO QUESTION 3) We can add a new row to our table that allows for 10% additional coverage each year, scaling to 100% by 2015. If you want the graph to stop at 100%, you can right click on the vertical axis and select, format axis. Instead of the maximum being automatically provided ("auto"), you can select "fixed" and enter "1.0".

Year 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 Linear Trend 30% 32% 35% 40% 45% 50% 55% 60% 65% 70% 75% Rapid Scale-up 30% 32% 35% 40% 45% 50% 60% 70% 80% 90% 100% to 100% by 2015

Session IX: Mathematical modeling with Excel 247 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Coverage is the extent to which a program reaches its intended target population, institution, or geographic area. Coverage assesses the availability and utilization of services. Examples of coverage indicators for HIV/AIDS care and treatment programs include: Number of clients receiving voluntary counseling and testing (VCT) services Number of clients provided with antiretrovirals (ARVs) Percentage of HIV patients receiving tuberculosis therapy

Session IX: Mathematical modeling with Excel 248 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: If we had these data, we could create a table in Excel. The 2nd column for the first 4 rows would be inputs. The 5th column could be an equation with a result that changes with the inputs: • % coverage = cumulative number on ARVs / 5-year target • % coverage = 23,861 / 32,000, or 75%

Session IX: Mathematical modeling with Excel 249 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: By changing any one or several of the inputs, we could assess the impact on our output of interest, or % coverage. In this example, there is a change in the 5-year target. How will it impact % coverage? Here, if the target increases to 40,000, coverage decreases to 60%. If the number of males on ARVs were to increase to 25,000 in scenario 3, we‘d have a coverage rate closer to 100%.

Session IX: Mathematical modeling with Excel 250 Epidemiology for Data Users (EDU) Trainer‘s Manual

HANDS-ON EXERCISES Coverage Exercise

Part I. Introduction to the tables and the data Walk the participants through the table and explain the data that is being presented.

Scenario 1 Scenario 2 Scenario 3 Number on Number Number ARVs on ARVs on ARVs Males 7,980 7,980 15,000 Females 15,881 15,881 15,000 Cumulative number 23,861 23,861 30,000 5-year target 32,000 35,000 32,000 % Coverage 75% 68% 94% These 2 columns to be added as part of the exercise.

Part 2. Creating scenarios to calculate coverage in Excel Follow the activities below and provide feedback to the participants as they practice the concepts.

1) What is the equation used to calculate coverage in scenario 1?

ANSWER TO QUESTION 1) Cumulative number / 5-year target. In the cell: =C11/C12

2) How can we create additional scenarios?

ANSWER TO QUESTION 2) By copying and pasting, we can change our table into one with countless scenarios, but we can start with 3.

We would change the header row to reflect the change in scenario title. We can then adjust the 5-year target in scenario 2 to 35,000 and see how the % coverage decreases to 68%. If we were to increase the number of males and females on ART to 15,000 each, our coverage would increase to 94%.

Session IX: Mathematical modeling with Excel 251 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Modeling and Excel can be used in assessing retention. Program retention refers to a program‘s success at delivering the entire package of services to a client. Assessing retention is especially important in clinical programs for which drug adherence is an issue (e.g., TB or HIV/AIDS) and there are multiple steps to completing treatment or services (e.g., immunization, prevention of mother-to-child (PMTCT) HIV programs). To assess retention, we typically look at the percentage of completion of each phase of the service. For example, does the baby that comes to the clinic for their first vaccination follow through with all of their follow-up vaccines, including their measles vaccine at 9 months? How many get lost before completing the full package?

Session IX: Mathematical modeling with Excel 252 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: We can use Excel to help us assess retention. In this example we are highlighting a PMTCT program. Women must come for an antenatal visit, receive an HIV test, and if positive receive ARVs. We use our coverage data to estimate that in this example, 50% of women present for ANC, 80% of those women get tested for HIV, and 60% of those women receive ARVs to prevent new infant infections of HIV.

Session IX: Mathematical modeling with Excel 253 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Now let‘s compare the example we just saw, and call it scenario 1, to another scenario, Scenario 2. In the second scenario % ANC attendance increases to 75%, HIV testing is a bit decreased to 70%, and % receiving ARVs increases from 60% to 90%. By looking at our output, we can see that % retention is more than double in Scenario 2 than it is in Scenario 1, but still it is only just over half of all HIV+ pregnant women. In order to improve retention, service delivery will need to improve at every point along this cascade of services. If we wanted to expand this PMTCT model, we could go farther and calculate mother- to-child transmission rates, the number of infections averted, the human and lab resources necessary to run the program, the cost of the program and the cost- effectiveness of different approaches.

Session IX: Mathematical modeling with Excel 254 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Example graphs of modeled data When building a model, it is possible to link the tables we create in Excel to a graph template.

Session IX: Mathematical modeling with Excel 255 Epidemiology for Data Users (EDU) Trainer‘s Manual

HANDS-ON EXERCISES Retention Exercise Part I. Introduction to the tables and the data

Walk the participants through the tables and graphs and explain the data that is being presented, and the value of a model for running calculations that are done repeatedly. PMTCT (Retention exercise)

Inputs

Scenario 1 Scenario 2

# Live births 100,000 100,000

% HIV prevalence rate 20.000% 20.000% % ANC attendance 50.00% 85.00%

% HIV testing at ANC 80.00% 75.00%

% Receiving ARVs at ANC after 60.00% 90.00% testing positive

Outputs

Output 1. # HIV-positive pregnant and the PMTCT cascade of services through receipt of ARVs

Total (HIV+ pregnant women) 20,000 20,000

# Retained 4,800 11,475

# Lost 15,200 8,525 % Retained 24.0% 57.4%

Equation Total (HIV+ pregnant women): # Live births * % HIV prevalence # Retained: # HIV-positive pregnant women * % ANC attendance * % HIV testing * % Receiving ARVs

# Lost: # HIV-positive pregnant women - # Retained

% Retained: # HIV-positive pregnant women retained / # Total HIV-positive pregnant women

Session IX: Mathematical modeling with Excel 256 Epidemiology for Data Users (EDU) Trainer‘s Manual

Graphs

Graph 1. # HIV-positive pregnant women and the PMTCT cas cade of services through receipt of ARVs

Graph 2. % HIV-positive pregnant women receiving ARVs for PMTCT

Session IX: Mathematical modeling with Excel 257 Epidemiology for Data Users (EDU) Trainer‘s Manual

Part 2. Add 3 additional scenarios and demonstrate how the output changes as the inputs change.

Walk the participants through the tables and graphs and explain the data that is being presented, and the value of a model for running calculations that are done repeatedly.

Inputs Scenario 1 Scenario 2 Scenario 3 Scenario 4 Scenario 5

# Live births 100,000 100,000 100,000 100,000 100,000 % HIV prevalence rate 20.000% 20.000% 20.000% 20.000% 20.000% % ANC attendance 50.00% 85.00% 90.00% 95.00% 100.00% % HIV testing at ANC 80.00% 75.00% 80.00% 85.00% 100.00% % Receiving ARVs at 60.00% 90.00% 90.00% 95.00% 100.00% ANC after testing positive

Outputs

Output 1. # HIV+ pregnant and the PMTCT cascade of services through receipt of ARVs Scenario Scenario Scenario Scenario Scenario 1 2 3 4 5

Total (HIV+ pregnant women) 20,000 20,000 20,000 20,000 20,000

# Retained 4,800 11,475 12,960 15,343 20,000

# Lost 15,200 8,525 7,040 4,658 - % Retained 24.0% 57.4% 64.8% 76.7% 100.0%

Equation

Total (HIV+ pregnant women): # Live births * % HIV prevalence

# Retained: # HIV-positive pregnant women * % ANC attendance * % HIV testing * % Receiving ARVs

# Lost: # HIV-positive pregnant women - # Retained

% Retained: # HIV-positive pregnant women retained / # Total HIV-positive pregnant women

Session IX: Mathematical modeling with Excel 258 Epidemiology for Data Users (EDU) Trainer‘s Manual

Graphs Graph 1. # HIV+ pregnant and the PMTCT cascade of services through receipt of ARVs

You can highlight the chart area, right click, and go to Select Data - and then click on the button that says "Switch Column/Row".

Session IX: Mathematical modeling with Excel 259 Epidemiology for Data Users (EDU) Trainer‘s Manual

You can get rid of any unwanted data from the chart. By selecting the chart and going up to the source table, you can move the highlighted area around so that you can expand or contract the area around the cells that are included in the graph.

If you highlight the chart area, and go to Design - you can CHANGE chart type to make it a stacked instead of clustered. Graph 2. % HIV-positive pregnant women receiving ARVs for PMTCT

Session IX: Mathematical modeling with Excel 260 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: What can we learn from these examples? • Setting up a model in Excel can save time and improve accuracy, especially if something is going to be done repeatedly

Session IX: Mathematical modeling with Excel 261 Epidemiology for Data Users (EDU) Trainer‘s Manual

• Asking and answering ‗what if‘ questions by running scenario analyses is useful in data-driven decision-making • Target-setting, coverage calculations and retention assessments can all be done in Excel

WRAP-UP AND REVIEW

Speaker Notes: In conclusion: • Charts and tables are two key ways to clearly present and visualize data • Including these in reports and presentations helps convey key messages • For data to be useful it must be analyzed and presented in an understandable way • Excel is a powerful tool to help create charts and tables • Excel can be used for modeling data and decision-making

Session IX: Mathematical modeling with Excel 262 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Have we met our learning objectives?

Session IX: Mathematical modeling with Excel 263 Epidemiology for Data Users (EDU) Trainer‘s Manual

SESSION X: USING INFORMATION DERIVED FROM SMARTCARE

Ministry of Health Using Information Derived from SmartCare Zambia’s Primary Electronic Health Record System

Description of Session: This session will provide an introduction to SmartCare, its use and the information it contains. It will also describe the report generation capabilities of the system and will discuss some of the challenges in using it.

Learning Objectives:

Objectives

• By the end of this presentation, participants should be able to… – Describe information that can be derived from SmartCare – Generate at least three aggregate reports using SmartCare – Export and graph information from SmartCare reports in Excel – Describe at least two data challenges that may impact how one interprets information from SmartCare

Session X: Using Information Derived from SmartCare 264 Epidemiology for Data Users (EDU) Trainer‘s Manual

At the end of this session participants will be able to: 1. Describe what SmartCare is and what information can be derived from it 2. Generate aggregate reports using SmartCare 3. Export and graph information from SmartCare reports in Excel 4. Describe some of the limitations of the system and how they impact the information

Learning Activities at a Glance: Learning Content Method Material Handout/ Length Activity Appendix (min) 1 Using Information Lecture Copies of the derived from presentation SmartCare for each participant

SmartCare loaded on participants‘ and presenter‘s computers

SmartCare data loaded on participants‘ and presenter‘s computers

Session X: Using Information Derived from SmartCare 265 Epidemiology for Data Users (EDU) Trainer‘s Manual

What is SmartCare?

• SmartCare is… – An integrated electronic health record (EHR) system designed to improve continuity of care – A micro-data repository present at the facility, district, provincial, and national level • How can SmartCare information can be used? – Improve patient care quality – View, print or export reports for M&E – Use GIS for surveillance & epidemiology trends – Future developments: role-based data access • Increases in MoH approved SmartCare information use

Speaker Notes: Firstly, remind the participants that SmartCare was briefly introduced in the session on Data sources and utilization in Zambia. Ask them to recap: What is SmartCare? (Expected answer)  An integrated electronic health record (EHR) system Solicit several responses Review the slide. Ask the audience: what is Micro data? Solicit several responses. ANSWER - Micro-data is de-identified, but identifiable, person level data.

Session X: Using Information Derived from SmartCare 266 Epidemiology for Data Users (EDU) Trainer‘s Manual

m1 Will we cover now…

Information Using information captured in from SmartCare to SmartCare and how Creating reports Create Graphs and it is later organized Tables in Excel. and used.

Speaker notes Review the slide. Explain to the participants that presentation will be focused on the three stages in SmartCare i.e data collection and analysis.

How does SmartCare data move? National e Patient Level, de-identified erg ” m up ge oll er “R ” m ion iat cil on ec “R

Province Patient Level, de-identified

District Patient Level, Data de-identified integrity checks at Facility Patient Level, each level Identifiable

Speaker Notes: Use the slide to explain to the participants the data flow from the Health facility at the local level to the Ministry of Health at the National level, explaining the activities at each level.

Session X: Using Information Derived from SmartCare 267 Epidemiology for Data Users (EDU) Trainer‘s Manual

Explain that Reconciliation is when two reports of one patient are merged into a single report. This is a major strength of the Smart Care software.

Data Entry Personnel & EHR Users

• For care Providers and other EHR users to make effective health care decisions, they require accurate, complete data! – It is important that data entry personnel understand how information will be used later on so they are motivated to enter complete, accurate data!

• For data Analysts, it is likewise important to understand how information is collected, entered, and aggregated - the ‘provenance’ of the data. – This allows one to understand the gaps, and thus, the representation bias of conclusions derived from data.

Speaker Notes: Review the slide. Ask the participants: Why do we collect data? Answer - We collect data in order to provide better care. Our primary concern is the patient. Solicit more responses

Session X: Using Information Derived from SmartCare 268 Epidemiology for Data Users (EDU) Trainer‘s Manual

Data Confidentiality and Security

• SmartCare information is securely stored to safeguard confidentiality and privacy. • SmartCare has many measures that help ensure confidentiality is maintained, such as: • Password (logical) access protection • Physical Protection of computers • ‘Need-to-know’ Role Based Security Access to information depends on work of the user and his/her specific information NEED – not curiosity!

Speaker notes Ask the audience: Why do we need data confidentiality and security in SmartCare? Answer – SmartCare involves collection, storage and analysis of HIV data. Solicit more responses. Review the slide and summarize

Facilitator background information Depending on the location of where the data are stored one way of ensuring confidentiality and security is to store it in a de-identified format. This is where the data is totally anonymized – all personal identifiers and other identifying information removed and data can no longer be linked to the original source. Where data still contains identifiers, there exists a system of passwords to ensure that data is only accessible to authorized persons. In addition to passwords and de – identification, data is also stored on portable smartcards which the patients carry with them. Also, access to personal computers, laptops, and servers is made secure through the use of passwords, smartcards or other means of securing access to the stored information.

Session X: Using Information Derived from SmartCare 269 Epidemiology for Data Users (EDU) Trainer‘s Manual

Which information needs can be addressed by SmartCare data?

Situation 1: Situation 2: • HIV drug resistance • Finding out how early warning many clients used indicators VCT services in Lusaka last quarter

Situation 3: Situation 4: • Tracking shipments • CD4 counts over time and dispensations of by patient medication

Speaker Notes: Situation 1: Yes Situation 2: Yes Situation 3: Yes and No. Situation 4: Yes.

Session X: Using Information Derived from SmartCare 270 Epidemiology for Data Users (EDU) Trainer‘s Manual

What’s in SmartCare?

• Pa ent Demographics data: ini ally collected during registra on, & updated over me • Longitudinal Care data or Clinical Interac ons – Users enter data during/a er VCT, ANC, Delivery, Postnatal, At-Risk Infant , <5, FP, & ART visits. This data, + me +place +user +… = an interac on. • Cohort Data: a subset of care data for a group • Facility Data: staffing, services, inventory, etc. • Facility Opera ons Data: work load, quality, .. • Geographical Informa on Systems: map layers, geo-located data… and much more.

Speaker Notes: Review the slide, outlining the data containing in SmartCare. Additional background information is provided below.

Facilitator Background information The following is a detailed outline of the data contained in SmartCare. 1. Record Level: Individual patient record  Confidential client data from any facility and for all services used  Used to provide continuity of care 2. Aggregate Level: Large scale ‗sample‘  Counts and statistics from the record level; high statistical power of any research findings due to high sample size  Statistics can inform health management and policy on a large scale; potential for comparison to other health information systems 3. Facility Data: Each facility  Drug supply, inventory, other resources, client load and type, staffing levels  Managing drug and other inventory and staffing

Session X: Using Information Derived from SmartCare 271 Epidemiology for Data Users (EDU) Trainer‘s Manual

4. GIS data: Varying scales, usually large scale  Geographically tagged data  Can link geographically tagged data to SmartCare or other data and generate maps

Clinical Services in SmartCare

•Client Registration •VCT •ANC •Delivery •Postnatal •At-Risk Infant •Adult ART •Paeds ART •TB & ART •Labs •Pharmacy •Under 5 (piloting - Lusaka) •Family Planning (piloting - Lusaka) •Under dev: Out Patient (OPD incl. TB)

Reports in SmartCare

• One of the benefits of making the effort to put clinical data into SmartCare, is the ability to have the system generate processed, nicely formatted information that providers and EHR users can use to improve clinical care and facility operations. – This formatted information is otherwise called a report! – Reports can be on paper, but are most often electronic; • (paper may be less secure)

• Some Report types in SmartCare: – Single Patient Reports – Cohort and Aggregate Tables – Facility and Inventory Reports – Geographical Information Systems (GIS) Maps – SmartQuery pivot tables of low risk M&E data

Speaker Notes: Explain to the participants that the session will now be a hands on practical lesson. Facilitator should ensure that all participants have a access a computer installed with SmartCare, and able to locate the SmartCare Icon.

Session X: Using Information Derived from SmartCare 272 Epidemiology for Data Users (EDU) Trainer‘s Manual

Using the PowerPoint screen shots, Facilitator will now demonstrate, step by step, to the participants the process of creating various reports and the participants will practice on their computers. Guide and regularly check to ensure participants are course.

Selected Reports of Interest m1 All Facility Reports Aggregate Reports

• Facility Visits by Age Group & Gender • List of patients on ARV drugs* (Facility View) • Consumption of ARV drugs* • New Registration Visits by Age Group & • Consumption of non- ARV Gender in Period drugs* •Facility Demographics by Visits in Time- Total Visits • HIA2 •Unique Patient’s Visits to a Facility • ARV Daily Activity Register •Monthly Facility New Registrations • HIV Drug Resistance Early •Monthly Facility Total Visits Warning Indicators (EWIs) •Monthly Unique Facility Clients • HIV Drug Resistance EWIs, district view • IDSR • VCT Services • ART HMIS • Report and Requisition*

* These features are in the newest release, currently at 7 pilot sites.

Example of a Report: HIA2

****NOTE : REPORT NEEDS TO BE MOCK POPULATED

Speaker Notes: These images were taken from a training installation of SmartCare and real data is not present in these images.

Session X: Using Information Derived from SmartCare 273 Epidemiology for Data Users (EDU) Trainer‘s Manual

Reports: Where to find them

Speaker Notes: Demonstrate and guide the participants These images were taken from a training installation of SmartCare and real data is not present in these images. After logging in to Smartcare, agree to all statements, enter a new password, and select ―Finish‖ at the top right corner.

Select Report Risk Category

Speaker Notes: Demonstrate and guide the participants

Session X: Using Information Derived from SmartCare 274 Epidemiology for Data Users (EDU) Trainer‘s Manual

These images were taken from a training installation of SmartCare and real data is not present in these images. When information has been covered in grey (―Greyed Out,‖) it is because you cannot see patient information at the district level.

Select Report (HIA2)

Speaker Notes: These images were taken from a training installation of SmartCare and real data is not present in these images.

Session X: Using Information Derived from SmartCare 275 Epidemiology for Data Users (EDU) Trainer‘s Manual

Choose Location and Date Range

Speaker Notes: Demonstrate and guide the participants These images were taken from a training installation of SmartCare and real data is not present in these images.

Run the Report (HIA2)

Speaker Notes: Demonstrate and guide the participants These images were taken from a training installation of SmartCare and real data is not present in these images.

Session X: Using Information Derived from SmartCare 276 Epidemiology for Data Users (EDU) Trainer‘s Manual

Example of a Report: HIA2

Speaker Notes: These images were taken from a training installation of SmartCare and real data is not present in these images.

Facility Demographics Report - 1

Speaker Notes: Demonstrate and guide the participants These images were taken from a training installation of SmartCare and real data is not present in these images.

Session X: Using Information Derived from SmartCare 277 Epidemiology for Data Users (EDU) Trainer‘s Manual

Facility Demographics Report - 2

Speaker Notes: Demonstrate and guide the participants These images were taken from a training installation of SmartCare and real data is not present in these images.

Facility Demographics Report - 3

Speaker Notes: Demonstrate and guide the participants These images were taken from a training installation of SmartCare and real data is not present in these images.

Session X: Using Information Derived from SmartCare 278 Epidemiology for Data Users (EDU) Trainer‘s Manual

ART Summary Report - 1

Speaker Notes: Demonstrate and guide the participants These images were taken from a training installation of SmartCare and real data is not present in these images. What are some general characteristics of the data contained in the ART Summary Report?

ART Summary Report - 2

Session X: Using Information Derived from SmartCare 279 Epidemiology for Data Users (EDU) Trainer‘s Manual

ART Summary Report - 3

Speaker notes Demonstrate and guide the participants

PEPFAR Quarterly Report - 1

Speaker Notes: Demonstrate and guide the participants These images were taken from a training installation of SmartCare and real data is not present in these images.

Session X: Using Information Derived from SmartCare 280 Epidemiology for Data Users (EDU) Trainer‘s Manual

What are some general characteristics of the data contained in the PEPFAR Summary Report?

PEPFAR Quarterly Report - 2

Speaker notes Demonstrate and guide the participants

PEPFAR Quarterly Report - 3

Session X: Using Information Derived from SmartCare 281 Epidemiology for Data Users (EDU) Trainer‘s Manual

How to Export a Report

Speaker Notes: Demonstrate and guide the participants These images were taken from a training installation of SmartCare and real data is not present in these images. At this point, the report is ready to be exported. Click on Export

m1 Save File in an Excel format

Change this to Excel file type from the drop down menu

Speaker Notes: Demonstrate and guide the participants In order to save a file in Excel format, Select File Type, and save to ―Microsoft Excel.‖

Session X: Using Information Derived from SmartCare 282 Epidemiology for Data Users (EDU) Trainer‘s Manual

Successful Export

Open Report Export in Excel

Speaker Notes: It is not possible to make these Excel reports into graphs.

Session X: Using Information Derived from SmartCare 283 Epidemiology for Data Users (EDU) Trainer‘s Manual

Geographical Information Systems (GIS) Maps

Maps show distribution of health events and status

Visually track: • Progression of diseases in the system by location • Distribution of health data • How far clients travel to facility Maps can be printed or saved as templates

3 GIS Data Sources

• Live SmartCare Client data (local)

• Static data provided by MOH (national)

• User provided data

Speaker Notes: GIS in SmartCare can be

Session X: Using Information Derived from SmartCare 284 Epidemiology for Data Users (EDU) Trainer‘s Manual

GIS Maps: Where to find them

Select Indicator(s) & Regions, then Show Map (Static Data)

Session X: Using Information Derived from SmartCare 285 Epidemiology for Data Users (EDU) Trainer‘s Manual

Select Indicator(s) & Regions, then m1 Show Map (Live Data)

Session X: Using Information Derived from SmartCare 286 Epidemiology for Data Users (EDU) Trainer‘s Manual

Challenges in analyzing routine health data

Data Use Challenge: Duplication

Occurs when: • Patients register multiple times • Undertrained staff may sometimes re- register the same person by mistake • Analysts add people counts & assume total counts are unique by mistake This will affect later analysis and cause more work! • Statistical analyses of routine service data must adjust for both duplication of person records and of counts of records

Session X: Using Information Derived from SmartCare 287 Epidemiology for Data Users (EDU) Trainer‘s Manual

Data Use Feature: De-duplication

• SmartCare Feature: De-Duplication – As data rolls up, SmartCare records are de-duplicated – A patient that may have attended VCT services at two different facilities will be counted as two different VCT clients at the facility level – Once the data is rolled up to the district, the patient will be counted as one VCT client • Thus: aggregates may be different at every level! – With De-Duplication, there may appear to be less clients in than all the Lusaka district clients put together – In reality, the number of clients is the same, but now they are all only counted once at the province level!

WRAP-UP AND REVIEW

Objectives

• By the end of this presentation, participants should be able to… – Describe information that can be derived from SmartCare – Generate at least three aggregate reports using SmartCare – Export and graph information from SmartCare reports in Excel – Describe five data use challenges that may impact how one uses and interprets routinely collected care data, such as SmartCare data – Bonus for Scrupulous Data Users: Describe how best to address each of these challenges, at data collection, analysis, and reporting stages.

Session X: Using Information Derived from SmartCare 288 Epidemiology for Data Users (EDU) Trainer‘s Manual

SESSION XI: USING PIVOT TABLES IN EXCEL

Using Pivot Tables in Excel

Data System Exports Covered in EDU

SmartCare DHIS NACMIS

Printable Printable Paper Excel Pivot Table Paper Excel File Reports Reports

Excel Pivot Excel File Table

Speaker Notes: A system export is a file or item produced by a system for a use specified by the user. Each system covered in EDU can export data in many different ways – the above exports are only a sample of the larger spectrum, but we will focus on only these for the time being.

Session XI: Using Pivot Tables in Excel 289 Epidemiology for Data Users (EDU) Trainer‘s Manual

Pivot Table Skills

• NACMIS and HMIS both require use of Pivot table when expor ng data to Excel – HMIS exports data directly into a pre-created pivot table in Excel – NACMIS export large complicated datasets that can be simplified by crea ng pivot tables

Pivot Table Video

Session XI: Using Pivot Tables in Excel 290 Epidemiology for Data Users (EDU) Trainer‘s Manual

SESSION XII: HANDS-ON HMIS

Hands-On HMIS

Description of Session: This session will describe to participants the various components of HMIS, what type of information is collected and stored in the system, and how to generate reports and check data quality within HMIS.

Session XII: Hands On HMIS 291 Epidemiology for Data Users (EDU) Trainer‘s Manual

Learning Objectives:

Speaker Notes: At the end of this session participants will be able to: 1. Describe the various components of HMIS 2. Identify the kind of data and specific indicators captured and generated by the system 3. Generate reports using DHIS 4. Identify data marked for checking using tools in DHIS

Learning Activities at a Glance: Learning Content Method Material Handout/ Length Activity Appendix (min) 1 Hands-On HMIS Lecture Copies of the presentation for each participant

HMIS loaded on computers

Session XII: Hands On HMIS 292 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Ask the participants (particularly those from MoH): What is the HMIS? (Expected answer)  It is the Health Management Information System  It is a routine monitoring and evaluation system that is used to collect, compile and aggregate data on disease epidemiology and service related indicators Explain that it has 5 main components. Review the slide and outline the five main components.

Facilitator Background Information HMIS was created in an attempt to achieve primary health care for all people in Zambia. The HMIS is meant to track the progress that takes place in the health sector. It has five components among them:

Session XII: Hands On HMIS 293 Epidemiology for Data Users (EDU) Trainer‘s Manual

NIDS: defines what is collected; composed of a set of indicators that are used to monitor service provision. Data Flow Policy:  Dictates the requirements for data submission, including timeframes at each level within the hierarchy.  Defines the channels of communication

Speaker notes

Review the slide

Facilitator background information

8. Health workers collect data on the patient record form during service provision at the facility. 9. They then aggregate on the HIA report and Self Assessment Tool and, 10. Send to District Health Information Officer at the DHO for entry into the Data base. 11. The Data is processed, analyzed and used at the local level for reporting.

Session XII: Hands On HMIS 294 Epidemiology for Data Users (EDU) Trainer‘s Manual

The National Indicator Dataset

• Defines “what” data is collected • Composed of a set of 220 indicators • Calculated from data from numerous sources • Indicators are still in development – Current issues with non- reporting and denominators

Speaker Notes: Review the slide

Facilitator background information The DHIS can be viewed as a data warehouse which combines a variety of information collected through numerous (sometimes specialized) systems. The data is integrated, and made available through a uniform set of data elements and indicators, the NIDS: National Indicator Dataset. The NIDS is composed of 220 indicators

Session XII: Hands On HMIS 295 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Firstly ask the DHIOs present among the participants to list the indicators in the DHIs. Solicit answers from several of them. Review the slide the slide and summarize

Facilitator background information The DHIS contains data that tracks the following service areas and respective indicators: • Child Health and Nutrition – Under 5, Infant, Neo-natal mortality rate – Top 10 Childhood Diseases and Inpatient deaths – Malaria, ARI, diarrhea incidence and case fatality – Growth monitoring – Prevalence of Under Weight Children – Neo-Natal tetanus incidence rate – Immunization Coverage (DTP-HepB , OPV1/3, Measles, BCG, Vit A, Full immunization, Drop-Out)

Session XII: Hands On HMIS 296 Epidemiology for Data Users (EDU) Trainer‘s Manual

• Reproductive Health – Maternal mortality ratio – Antenatal coverage – Deliveries: complications, death rates, by skilled personnel, by TBAs, by C-section – Pregnancies protected against tetanus – Abortion admission and death rate – Low birth weight – Family Planning: Contraceptive prevalence rate, Couple-years of protection  HIV/AIDS/STIs - HIV testing coverage - ART access - HIV prevalence among various age groups - Prevalence of comprehensive HIV/AIDS knowledge - Condom use - Population treated for STIs • TB – Case detection, conversion, death rate – Treatment completion, cure rate • Malaria – Confirmed positive cases – Case fatality rate – ITN distribution rate – ANC preventive treatments • Environmental Health and Food Safety – Households with access to safe drinking water • Epi Control and Public Health Surveillance – Cholera, Dysentery death rates

Session XII: Hands On HMIS 297 Epidemiology for Data Users (EDU) Trainer‘s Manual

– Skin infection incidence rate – Non-communicable diseases: diabetes, hypertension, substance and alcohol abuse – Mental Illnesses • Impact Indicators – Crude death rate – Life Expectancy – Total fertility rate  Primary Health Care (PHC) Management - Health facilities with expired essential drugs, with functioning cold chain - Health staff attrition rate - Bed utilization rates of occupancy, turnover, average length of stay - Health budget per capita - Percentage of HMIS reports received

What is the District Health Information system?

• A tool (software) to help improve service management to achieve better health for all by using available information • Based on the collection & analysis of routine data (essential dataset) at local level – Gathers info that shows changes in local health conditions, health status & health priorities – is indicator based • Informs different levels of health care providers of progress made towards achieving set objectives & targets

Speaker Notes: DHIS is a tool to help improve service management to achieve better health care. It informs different levels of health care providers of the progress that is being made.

Session XII: Hands On HMIS 298 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker notes Explain to the participants that this will be a hands on session. Ensure that every participant has access to a computer installed with the DHIS software and able to locate the DHIS Icon. Check and guide the participants to ensure all are on course

Speaker notes Review the slide

Session XII: Hands On HMIS 299 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Inform the audience that first, we will start with creating Raw Routine Data Reports. Explain that certain steps have been highlighted on the slides by red arrows and boxes on the slides, as seen below.

Facilitator background information These reports show all the data reported from different levels of the MOH (HMIS) system. You can generate Raw Routine Data Reports for a specific facility (Rural Health Center, Urban Health Center, Hospital), specific district, specific province or for the entire Ministry of Health for selected periods. Anything above the facility level can take the computer a long time to generate so we‘ll focus on one Rural Health Center for today.

Session XII: Hands On HMIS 300 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Demonstrate to the participants how to open DHIS and select ―Standard Reports‖. Guide and check to ensure all are on course

Speaker Notes: Guide and check the participants. Demonstrate Click ―Routine Data Reports‖

Session XII: Hands On HMIS 301 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: First, we need to define the data set that we are pulling the data from. Each ―Data Set‖ contains a different subset of values that loosely corresponds to a certain category or categories of data. We will be looking at the ―Routine PHC Dataset‖ which focuses on routine data and HIV/AIDS (basically, it refers to data that we have to deal with every year such as ANC attendance, number of households in catchment area). Click the dialog box next to the “Data Set” box.

Session XII: Hands On HMIS 302 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Check the ―Routine PHC Dataset HIA2‖ checkbox.

Speaker Notes: Click ―OK‖.

Session XII: Hands On HMIS 303 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Aggregation OrgUnit refers to the level at which you wish the data to be exported. For our case (and to save time) we will choose only one facility. Click the dialogue box next to “Aggregation OrgUnit.”

Session XII: Hands On HMIS 304 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Click the box with a (+) in it next to ―ce ‖ to find ―ce Chimbombo Rural Health Centre‖. Select ―ce Chimbombo Rural Health Centre‖ and then press ―OK‖.

Session XII: Hands On HMIS 305 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Now we want to define our timeframe. We will first define our start date. Click the button next to the “From” textbox.

Session XII: Hands On HMIS 306 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: For our purposes, let us define just the year of 2009. To do this, select Jan 1, 2009 in the calendar.

Speaker Notes: Now we need to select our end date. Click the button next to the ―To‖ textbox.

Session XII: Hands On HMIS 307 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Now select 31 of December 2009 on the calendar.

Speaker Notes: Now that we have set our parameters we can generate our report. Select the Microsoft excel icon in the ―Output‖ box. Please wait. This may take a few minutes.

Session XII: Hands On HMIS 308 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: We can scroll down to find routine data created at the Chibombo Rural Health Centre. So just for practice, lets quickly find out what the Attendance for child health services was for males under the age of one year. (GIVE TIME) -That‘s right, by adding all the months for 2009, we find that the total equals 2,532. Does this mean 2,532 males under the age of 1 year accessed child health services for 2009? (GIVE TIME) -No. One child can show up each month, and if add the numbers like we did, he would be counted 12 times.

Session XII: Hands On HMIS 309 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Now we‘ll create an Ad Hoc Data Report. Ad Hoc Reporting allows us the ability to choose specific data that we want to view. For example, if we only want to know how many people accessed testing services for the district of Chama without have to generate an entire data set for Chama district (which would take a lot of time) we can generate a specific Ad Hoc Data Report looking only at testing services in Chama District.

Session XII: Hands On HMIS 310 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Once in ―Standard Reports‖ click the ―Ad Hoc Raw Data Report‖ on the left hand side of the screen. For this example, we will be looking at the number of Children fully immunized under the age of one in Central Province for the year 2009. To get started, again, we need to select our data set. Click ―Browse‖ next to the ―Data set‖ Text Box.

Session XII: Hands On HMIS 311 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Since immunizations are a routine Health service, we need to select the dataset that contains routine data. Select ―Routine PHC Dataset HIA2‖.

Session XII: Hands On HMIS 312 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Now we need to specify the level at which we want the data to be generated at. Click ―Browse‖ next to the ―Root OrgUnit‖ textbox.

Speaker Notes: Select ―ce Central Province‖ and click ―OK‖

Session XII: Hands On HMIS 313 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Now we need to specify the time range. Click the button next to the ―From‖ textbox.

Session XII: Hands On HMIS 314 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Select the start date. For our purposes, since we want to only view 2009, click on January 1, 2009 on the calendar.

Speaker Notes: Now we need to select our end date. Click the button next to the ―To‖ text box.

Session XII: Hands On HMIS 315 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Select the very last day in 2009 – December 31, 2009 on the calendar.

Speaker Notes: Now we need to select the Data Element we want to view. Scroll down in the ―Available Data Elements‖ box until you find ―Immunized fully <1 year new.‖

Session XII: Hands On HMIS 316 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Now click the arrow pointing right to put the data element into the ―Data Elements selected‖ box.

Session XII: Hands On HMIS 317 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Now we want to specify the file name. Scroll down so you can fully see the ―Save Report As…‖ box.

Speaker Notes: Select the last portion of the name and enter ―\CENTRALimmunized…‖ Whatever Report you generate, you can always go back and view it again from the main DHIS

Session XII: Hands On HMIS 318 Epidemiology for Data Users (EDU) Trainer‘s Manual screen by clicking ―Reports Folder‖. The name you choose here will be the name that shows up in the ―Reports Folder‖

Speaker Notes: Now click ―Create Report‖ and wait for the Excel file to load. This is one of the limitations of the program, it takes some time for this Excel file to load. Prompt Audience: While we are waiting, let me ask you one question. What are some of the challenges you face as program officers when you try to obtain data?  Frequently the people we need to get information from demand that we beg them for the data; they take advantage of being in those powerful positions  Sometimes the information that we obtain does not make any sense  There is a gap between the District Health Information Officers and the Program Officers

Session XII: Hands On HMIS 319 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Remember that video about Pivot Tables? Well, this is an example of a pre-made, rather complex pivot table. Don‘t worry about how it‘s made, we will only be focusing on how to use this pivot table generated by the DHIS. Pivot Tables are very powerful tools. They allow you to handle huge amounts of information very easily. We are going to learn how to use these Pivot Tables. Let‘s start my manipulating the table to show Different parts of this data set. Start by click and dragging the ―District‖ box to the vertical line between ―Year‖ and ―Data Period‖ on the existing pivot table.

Session XII: Hands On HMIS 320 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Now we can see the numbers of fully Immunized children broken up for all the districts. If you‘re feeling skeptical about the data show, go ahead and add up all the district number for January. It‘s still 4,113, right? Great. Let‘s continue to manipulate this table. Click and drag the ―Facility‖ box to the line between ―District‖ and ―Data Period‖

Session XII: Hands On HMIS 321 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Now we can see each district broken down by its constituent facilities. Again, all the total numbers are the same, but we now can see more information about the numbers of immunized children in Central Province. Now, let‘s do something different. Let‘s manipulate the table so that we only see the totals for 2009, not the monthly numbers. To do this, click and drag ―Data Period‖ back to the top, just above the ―Province‖ box.

Session XII: Hands On HMIS 322 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Now we can only see the yearly totals for 2009. Let‘s continue to shrink – or aggregate – the data table. Click and drag the ―Facility‖ box back to the top, right above the ―Data Period‖ box.

Session XII: Hands On HMIS 323 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Let‘s aggregate the data table again: click and drag the ―District‖ box back to the top, right above the ―Province‖ box

Session XII: Hands On HMIS 324 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Now we have shrunk – or aggregated – the data table to its smallest point. This shows the total number of children fully immunized in Central province for 2009. Now that we know how to manipulate pivot tables, let‘s work on generating graphs from these pivot tables. Let‘s return the pivot table to its original form by dragging the ―DataPeriod‖ back to table, right above the ―Total‖ box.

Speaker Notes: To graph our table we need to tell excel what data we want to focus on. Before we can do this we need to be able to see all of our data, in other words we need to Zoom-Out. To do this click on the (-) button in the lower right hand corner next to the zoom slider. Click once so ―80%‖ is visible next to the (-) button.

Session XII: Hands On HMIS 325 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: So, what do we want to graph? For our purposes, let‘s say you want to know the trend in new, fully immunized children on a monthly basis for 2009. What would graph would best answer this question? Since we want to know the change in something over time, or a trend, we should use a line graph.

Now that we know what graph we want, let‘s learn how to create it. First we need to select the data to be displayed in the graph. To do this, highlight all the numbers in the ―Immunizedfully‖ row, starting with ―Jan‖ and ending with ―Dec‖.

Session XII: Hands On HMIS 326 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: As we mentioned, we want to create line graph to show the trend in new, fully immunized children. In order to select just the graph we want, open the Line graph drop down menu. To do this, click on the downward arrow just under the ―Line‖ button in the ―Insert‖ tab.

Session XII: Hands On HMIS 327 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Since our data is collected only once on a monthly basis, we want to emphasize the difference between monthly numbers and those in-between (since we don‘t know them). To do this, let‘s choose the ―Line with Markers‖ graph.

Session XII: Hands On HMIS 328 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Don‘t worry you haven‘t made a wrong turn. Since HMIS exports its pivot tables with lots of choices, for breaking down or aggregating your data, excel can get a bit confused as to how it should organize its graphs. So, we have to help it out a little. In this case, as we agreed, we want to use a line graph to show the trend of Immunizations over the course of 2009 on a monthly basis, so we want our x- axis representing time (―data period‖) to be broken down by month. Looking at our graph, we can see that it does display our data monthly, but only as a legend – we want it to be displayed on the x-axis and (as we will see later) we want the ―Province‖ to be displayed as a legend. To correct this first click and drag the ―Province‖ field from the ―Report Filter‖ box to the ―Legend Fields‖ box.

Session XII: Hands On HMIS 329 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Now to make the x-axis display time on a monthly basis, click and drag ―DataPeriod‖ from the ―Legend Fields‖ box into the ―Axis Fields‖ box.

Speaker Notes: There you have it. A detailed line graph outlining the trend in newly immunized children in 2009.

Session XII: Hands On HMIS 330 Epidemiology for Data Users (EDU) Trainer‘s Manual

We want to keep in mind that we are making this graph for someone else, be it a boss, committee or general public. While you may know what each part of the graph means, other people won‘t. So, in order to appropriately communicate what this graph is says, we need to label and title every aspect. But first, we need this graph to be big enough to read. To do this click and drag the upper left hand corner of the box outlining our graph until the box is big enough.

• Using the skills learned in the Excel lesson, you should be able to modify the graph so it is presentable to any parties.

Speaker notes Our graph is now ready for export. As we discussed earlier, the purpose of our graph is to inform other people, thus we need to move our graph into a form that can be easily presented. We can put our graph into a printed report, webpage, poster or any other form of wide communication. For our purposes, let‘s put our graph into a powerpoint. First, make sure the graph is selected (if you‘ve been following, the graph should already be selected) and click ―Copy‖ under the ―Home‖ tab.

Session XII: Hands On HMIS 331 Epidemiology for Data Users (EDU) Trainer‘s Manual

• To break down our data so we can see individual districts, move the “District” field from the “Report Filter” box to the “Legend Fields” box

Speaker Notes: As we mentioned before, our graph only displays one data set: the monthly totals for Full Immunizations for children under one year in Central province. Can we infer what the monthly total for Kapiri Mposhi was in January 2009? Not from this graph alone. But, we can get this information by making a few changes to pivot tables. To break down our data so that we can see individual districts, move the ―District‖ field from the ―Report Filter‖ box to ―Legend Fields‖ box.

Session XII: Hands On HMIS 332 Epidemiology for Data Users (EDU) Trainer‘s Manual

• Each line refers to a different district within Central Province • To add a legend, click on the “Layout” tab under “PivotChart Tools”

Speaker notes Now we can see that things are a little more complicated. Each line refers to a different district within Central Province. To know which line refers to which district we need to add our legend back into the graph. To do this, again click on the ―Layout‖ tab under ―PivotChart Tools‖

Session XII: Hands On HMIS 333 Epidemiology for Data Users (EDU) Trainer‘s Manual

• Click on “Show Legend at Bottom”

• Let’s change the title to reflect the changes we have made • Click on the title to rename

Speaker Notes: Let‘s take a look at the title. Does this title still work for the graph? Does it fully describe the information being presented below? That‘s right: it doesn‘t. Let‘s rename the title to reflect the changes we have made. Click on the title to rename.

Session XII: Hands On HMIS 334 Epidemiology for Data Users (EDU) Trainer‘s Manual

• Rename: “Trends in Full Immunization in Central Province by District, 2009”

Speaker notes Rename the title: ―Trends in Full Immunization in Central Province by District, 2009‖

• Enlarge the graph so that it can be interpreted easily

Speaker Notes: Now that our graph is once again labeled correctly, we need make sure it is big enough to interpret correctly. If it isn‘t, enlarge the graph so it is.

Session XII: Hands On HMIS 335 Epidemiology for Data Users (EDU) Trainer‘s Manual

Data Quality Checks: “Check-It Data”

Data Quality Checks

• DHIS has an automated modules for checking data quality during data entry – something we will not be covering in this training – We can however learn more about suspicious data with DHIS’s “Check- it Data” module.

Session XII: Hands On HMIS 336 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker notes Most data quality checking modules in DHIS function during data entry. Meaning suspicious looking data will be flagged only when the DHIO enters the data – prompting him to recheck the data and refer back to the data source. When extremes in data are a result of certain events or situations in the facilities – the DHIO will enter comments into the ―Check-it Data‖ module, allowing data users to later refer back to the comments when reviewing the data. Since we are not covering data entry in this training, we will focus solely on the ―Check-It Data‖ module.

• If you have identified suspicious data, go to the “Data Quality” module on the DHIS Main Menu

Speaker notes Once you have identified an outlier or some other suspicious data, go to the ―Data Quality‖ module on the DHIS main-menu.

Session XII: Hands On HMIS 337 Epidemiology for Data Users (EDU) Trainer‘s Manual

• Click on “Routine ‘Check-It’ Data”

Speaker notes Click on ―Routine ‗Check-It‘ Data‖

• To search for a dataset based on location, enter the facility name in the text box above the “OrgUnit” column • To search for a dataset based on the data element, enter the facility name in the text box above the “Data Element” Column

Session XII: Hands On HMIS 338 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker notes Now you can search for the dataset in question by location or by Data Element. If you are searching based on the location (i.e., facility, district, province), enter either part or all of the facility name in the text box above the ―OrgUnit‖ Column. If you are searching based on the data element, enter either part or all of the facility name in the text box above the ―Data Element‖ Column.

WRAP-UP AND REVIEW

Objectives

Participants should now be able to…. •Describe various components of HMIS •Identify what types of data and indicators are captured and generated in HMIS •Generate reports using DHIS – Produce relevant graphs, figures and charts using data from the DHIS •Identify data marked for checking using tools in DHIS

Session XII: Hands On HMIS 339 Epidemiology for Data Users (EDU) Trainer‘s Manual

SESSION XIII: HANDS-ON NACMIS

Hands On NACMIS

National AIDS Council Zambia

Nyewani Soko

Description of session: This session will provide a description of NACMIS, the data it contains and how it can be utilized.

Session XIII: Hands On NACMIS 340 Epidemiology for Data Users (EDU) Trainer‘s Manual

Learning Objectives:

Objectives

• At the end of the presentation, participants should be able to: – Describe the various components of NACMIS – Generate reports, graphs and tables from NACMIS – Export data from NACMIS into Excel – Generate Excel pivot-tables and graphs from NACMIS data

Speaker Notes: Review the slide on objectives together with the participants. The emphasis of this training is on use of the data that resides in the NACMIS system. Ask the participants to read out the objectives one at a time.

Learning Activities at a Glance: Learning Content Method Material Handout/ Length Activity Appendix (min) 1 Hands-On Lecture Copies of the NACMIS presentation for each participant

NACMIS loaded on computers

Session XIII: Hands On NACMIS 341 Epidemiology for Data Users (EDU) Trainer‘s Manual

Components (subsystems) of NACMIS

• Performance Monitoring • Civil Society Funding Directory • Documentation • Private Sector Directory • AIDS Service Organizations Directory

Speaker notes Review the slide outlining the components of NACMIS. Explain to the participants what is contained in each subsystem. Ask the participants from NAC to explain (one at a time) what is contained in each of the five sub systems. (Expected responses)  Performance monitoring – contains all the indicators that are being tracked and reported on by NAC  Civil society directory – lists the NGOs in the HIV& AIDS response  Documentation - contain information on IEC materials  Private sector Directory – Lists all the private sector and workplace programmes  AIDS Service Organizations – Lists all the known service organizations in the response Use the background information provided below to expand on the responses and explain further.

Facilitator Background Information The NACMIS is a database system developed to help manage information for NAC. The information managed by NACMIS is organized in modules as follows: i. Performance Monitoring ii. Civil Society Funding Directory iii. Documentation

Session XIII: Hands On NACMIS 342 Epidemiology for Data Users (EDU) Trainer‘s Manual iv. Private Sector Directory v. AIDS Service Organizations Directory

Indicators and Data Available in NACMIS

• Indicators from MoH NARF – Contains data collected by the DACA from the DHIO • Indicators from the NAC NARF – Contains data collected by the DACA from various NAC Partners

Speaker Notes: Ask the participants in which sub system of NACMIS are indicators contained:  ANSWER – In the Performance Monitoring subsystem. Review the slide and explain: Emphasize to the participants that this session will focus mainly on the Performance Monitoring System.

Facilitator background information The NACMIS currently holds data tracking a total of 43 indicators. The data is drawn from the National AIDS Reporting Form (NARF) which is the main data collection tool for the NACMIS. There are two types of NARFs. The Ministry of Health (MOH) NARF and the District AIDS Task Force (DATF) NARF  The current MoH NARF has 22 indicators while the DATF NARF has 21 indicators  MoH NARF specifically is used to collect clinical related data while the DATF NARF collects non –clinical related data from NGO‘s and Govt departments

Session XIII: Hands On NACMIS 343 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Quickly review the slide presenting the indicators. Ensure that you indicate to the participants the DATF NARF indicators and MoH indicators.

Session XIII: Hands On NACMIS 344 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker notes This is a hands on session. Ensure that each participant is sitting in front of a computer, which is turned on and has NACMIS installed on it. Ensure that all participants locate the NACMIS Icon and are aware of how to open the programme

To launch the NACMIS application, simply double-click the NACMIS icon on the desktop of your computer. The icon may look something like this

When you double-click the above icon, the NACMIS launches, and a screen displaying the menus for the NACMIS appears,

Speaker Notes: Open NACMIS and ask the participants to also open the programme on their respective computers. Inform them that this is the main menu of the NACMIS. It is the first screen which appears when you first run the system Ask the participants to locate the five subsystems

Session XIII: Hands On NACMIS 345 Epidemiology for Data Users (EDU) Trainer‘s Manual

Review and slide and point out the five sub systems. Emphasize on the Performance monitoring.

Facilitator background information The NACMIS consists of 5 subsystems  Performance monitoring  Civil society Funding Directory  Documentation  Private sector Directory  AIDS Service Organizations Directory

Speaker notes Ask participants to click on ‗Performance monitoring‘. Demonstrate and guide.

Session XIII: Hands On NACMIS 346 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Demonstrate to the participants the process of selecting reports in NACMIS. Ask participants to follow the steps on their own computers. Ask them to select ―Performance Monitoring,” then ―Reports‖ then select―PM2: NARFs – Provincial Summaries‖. Then click ―Open‖. Check and guide them to ensure that all the participants are on course.

Facilitator Background information There are 3 types of NARF Reports that can be generated under Performance Monitoring These are (1) ―By District and quarter‖ which can generate a report per district per quarter, (2) ―Provincial Summaries‖ which generate reports for a specified province by period of time (e.g. 1 quarter, 2 quarters, or 4 quarters), and (3) ―National Summaries‖ which can generate national overall report for a specified Period of time (e.g. 1 quarter 2 quarters ..or 4 quarters).

Session XIII: Hands On NACMIS 347 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Demonstrate the Data Range selection by ‗clicking on the ―period box‖ and ask participants to practice on their computers. Check and guide them to ensure that all are on course.

Facilitator background information NACMIS collects data on a quarterly basis. There are two ways to select the date range: 1. To select More than 1 quarter you use the ―From‖/‖To‖ input Box to select the date range 2. To select a specific quarter you use the period combo box

Session XIII: Hands On NACMIS 348 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: For this exercise select 2009 quarter 3. Demonstrate to the participants and ask them to practice.

Session XIII: Hands On NACMIS 349 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Click ―OK‖. Check and guide participants to ensure all have successfully selected the desired data range. Ask participants if they have any questions up to this point and provide the necessary clarifications.

Speaker Notes: Explain that having selected the data range, you could now choose to focus only on data for one province or a number of provinces. Demonstrate to the participants the provincial filter process and ask them to practice by looking at all the provinces. Check and guide to ensure all are on course.

Facilitator Background information It is possible to select just one province, or multiple provinces, or all provinces. For instance, if we want all provinces, make sure all the provinces are checked and click ―OK‖. If we want only one or a selected number, then highlight the desired province or provinces and click ok.

Session XIII: Hands On NACMIS 350 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Demonstrate to the participants how to use the Data collection tool filter. Participants follow and practice the process. Check and guide the participants to ensure all are on course.

Facilitator background information NACMIS contains data not only collected by the National AIDS Council but also data collected by the MOH via the HIA. AT this point NACMIS is asking you to define what indicators are included in the (soon-to-be) generated report. For our situation let‘s just look at the data collected by the DATFs. Make sure ―MOH+DHMT‖ is unchecked. Click ―OK‖.

Session XIII: Hands On NACMIS 351 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Demonstrate while participants follow and practice process to use the Thematic filter. Check and guide to ensure all are on course.

Facilitator background information Since we want to view all indicators offered, it is recommended to select ALL The Thematic Areas to avoid any important data from being left out at this stage. Click the ―All‖ button. There is also a sub thematic filter.

Session XIII: Hands On NACMIS 352 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Click ―OK‖

Speaker Notes: Again Demonstrate process for sub thematic filter and let participants practice. Check and guide them. Emphasize that they select ALL to avoid any important data from being left out at this stage. Click ―All‖

Session XIII: Hands On NACMIS 353 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Click ―OK‖

Speaker Notes: Now we can view all the indicators. If you left any Thematic or Sub-Thematic options unchecked then you would see an incomplete listing.

Session XIII: Hands On NACMIS 354 Epidemiology for Data Users (EDU) Trainer‘s Manual

For our exercise let‘s just look at indicator #5: ―# of traditional healers trained in infection prevention and use of sharp instruments…‖ Click the corresponding box.

Speaker Notes: Click ―OK‖

Speaker Notes: The report appears. It is possible to zoom in so that you can see the report and the figure more clearly.

Session XIII: Hands On NACMIS 355 Epidemiology for Data Users (EDU) Trainer‘s Manual

Note that Reports are preformatted to look like the NARF. Select the next page to see the other pages of the report, which contain information about the other provinces.

Speaker Notes: A Report showing the number of traditional healers trained in infection prevention It is possible to print this report as a PDF. It is possible to return to the home page by clicking ―Preview.‖

Session XIII: Hands On NACMIS 356 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: To generate the Graphs, select ―Performance Monitoring‖ in the SUB-SYSTEMS dialogue box and then ―National Graphs‖ in the TASKS box. Again, for the purpose of this exercise, we will look at the Graph of traditional healers and service providers trained in infection prevention. But to find this graph we need to understand a little about the current state of the NACMIS system.

Session XIII: Hands On NACMIS 357 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Due to NACMIS still being under development, the titles of the graphs and table are not named and, instead, are represented by ―Figure #‘s‖

Speaker Notes: We won‘t go too far into what each figure refers to – this chart is in your notes and you can refer back to it as needed. Notice that the indicator we are interested in (# of traditional healers and service providers trained in infection prevention) is figure number 3. Also note that Figure 3 is bundled together with Figures 3, 3.1, 6, 13, 14, and 15. You do not need any additional software to generate these graphs; it can all be done in NACMIS.

Session XIII: Hands On NACMIS 358 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Again, for this exercise we are interested in Graph of traditional healers and service providers trained in infection prevention. If we refer back to our reference table on the previous slide, we can see that this graph is referred by the system as Figure #3. So to pull this graph up we need to select the option: Select Figures 3,3.1,6,13 14,15. Click ―Open‖.

Speaker Notes:

Session XIII: Hands On NACMIS 359 Epidemiology for Data Users (EDU) Trainer‘s Manual

Since we have selected multiple figures (Figures 3, 3.1, 6, 13, 14, and 15) we need to now select Figure 3 to be specifically shown. To do this, expand the ―GRAPH‖ drop down menu by clicking on the far left arrow of the first text box.

Speaker Notes: Now select Figure 3: Graph of traditional healers and service providers trained in infection prevention.

Session XIII: Hands On NACMIS 360 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Now we need to define the period we are interested in looking at. To do this, expand the ―AS AT PERIOD‖ drop down menu by clicking on the arrow of the second text box.

Speaker Notes:

Session XIII: Hands On NACMIS 361 Epidemiology for Data Users (EDU) Trainer‘s Manual

While generating reports within NACMIS we are only able to select the date range by quarter. We will see later on in the presentation that we can generate more detailed reports with greater selectability by first exporting data to Excel. For now though, let‘s focus on 2009 Quarter 3. Select “2009 Qtr3”

Speaker Notes: Figure 3: Graph of traditional healers and service providers trained in infection prevention. If you scroll down you will see a ―copy‖ button. This allows you to copy the graph directly into another program such as PowerPoint.

Session XIII: Hands On NACMIS 362 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Just like the pre-selected graphs available, NACMIS also offers few pre-formatted, exportable tables. The advantage of generating tables directly from NACMIS is that it is simple and easy ways to view the data in excel without have to export raw data (which we will cover shortly). However, just like the national graphs, the details in and options for generating these tables are limited.

Speaker Notes: To create National table select performance monitoring. Then select tables.

Session XIII: Hands On NACMIS 363 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Just like the NAMCIS graphs, the tables are represented table number and not by the title name.

Session XIII: Hands On NACMIS 364 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Again, this chart is in your notes and you can refer to it as needed.

Speaker Notes: Since we can generate one file that contains all the tables as separate sheets, make sure that you export each table. To do this, ensure that all the table check-boxes are checked.

Session XIII: Hands On NACMIS 365 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Now we need to select the date range, which for our example will be Quarter 2 of 2010. Select “2010 Qtr 2” from the drop down menu.

Speaker Notes: Click the Command Button: ―click here to generate tables‖

Session XIII: Hands On NACMIS 366 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: The tables are generated in Microsoft excel in a folder named Excel exports in the following path C:\Nacmis\ExcelExports\tables.xls. Click OK.

Speaker Notes: Notice that each table is placed in a sheet (i.e., table 12 is in the sheet named table 12)

Session XIII: Hands On NACMIS 367 Epidemiology for Data Users (EDU) Trainer‘s Manual

Note that table 12 does not contain any data. Also notice that table 12 is not included in the NACMIS table selection list

Speaker Notes: Table 6 is Number of service providers trained in diagnosis & treatment of STIs according to national standards. You can alter these tables in Excel, and copy and paste them into your reports.

Session XIII: Hands On NACMIS 368 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: As mentioned earlier, while NACMIS does offer basic modules for viewing data, we can export raw data to excel for further analysis and greater control in the creation of graphs and tables. Let‘s extend our consideration of traditional healers in respect to infection prevention training.

Session XIII: Hands On NACMIS 369 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: To begin, click on ―Performance Monitoring‖ under the SUB-SYSTEM and then click on ―Excel Exports‖ under TASKS. We can now choose the level at which we would like to export our data. And, just like the reports, data can be exported by District, Province and as National aggregates. For our purposes, select ―EPM2: NARFs – Provincial Summaries‖. Then click ―Open‖.

Speaker Notes: The Date Range selector appears. Here we can already see that we have more flexibility in our date range. To select More than 1 quarter use the From: To input Box to select the date range To select a specific quarter use the period combo box

Session XIII: Hands On NACMIS 370 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: For this exercise we will select 2009 quarter 3. Click ―2009 Qrt 3

Speaker Notes: Click ―Ok‖

Session XIII: Hands On NACMIS 371 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Let‘s take a look at the data for all the provinces. To do this make sure that all the check- boxes are checked then click ―Ok‖.

Speaker Notes:

Session XIII: Hands On NACMIS 372 Epidemiology for Data Users (EDU) Trainer‘s Manual

Again, NACMIS is giving us the option to export data collected by the NARF only, the HIA only or both. For our purposes, since our indicator of interest is collected exclusively by the NARF, let‘s just select the “DATF/PATF” option.

Speaker Notes: Click ―Ok‖

Session XIII: Hands On NACMIS 373 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: The thematic areas Filter appears. Again, it is recommended to select ALL the thematic areas, as to avoid any important data from being left out at this stage. To do this click the ―All‖ button.

Session XIII: Hands On NACMIS 374 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Click Ok

Speaker Notes: Again, on the filter on sub thematic Areas select ―ALL”

Speaker Notes: Then click OK

Session XIII: Hands On NACMIS 375 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: All the indicators appear at this stage. It is at this stage that we can be selective. Make sure that only the indicators that are to be reported on are selected. For our example, check indicator number 5: ―# of traditional healers trained in infection prevention and use of sharp instruments…”

Session XIII: Hands On NACMIS 376 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Then click OK

Speaker Notes: The Excel file will now be created. Click ―Ok

Speaker Notes: The excel file created in the path C: NACMIS\ExcelExports \EPM2 …‖date‖ ….. at …‖time‖

Session XIII: Hands On NACMIS 377 Epidemiology for Data Users (EDU) Trainer‘s Manual

Click Yes to open the file

Speaker Notes: As you can see the exported data is quite chaotic. There‘s a lot of information that we don‘t really need, so in order to work with the data we do need – we have to clean. Cleaning the data usually refers more to the process of removing corrupt or inconsistent data, however for our purpose we will refer to data cleaning as a way of getting rid of useless or unnecessary information in our data set. There are two ways of doing this. Your manual employs a simple highlight-and-delete method that is simple but long. Instead, for this presentation (and given to you on the CD) you will learn the pivot table method.

Session XIII: Hands On NACMIS 378 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: First click the ―PivotTable‖ under the insert tab.

Speaker Notes: Click ―OK‖.

Session XIII: Hands On NACMIS 379 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: As you learned earlier, we need to drag our data elements into the table – organizing the data as needed. Let‘s make the columns the indicator by dragging the field item ―Indicator‖ into the ―Column Labels‖ box below.

Session XIII: Hands On NACMIS 380 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Don‘t panic. Yes, the columns are blown up but Excel is just doing this so that you can see the names of the indicators. Let‘s finish constructing our pivot table and then we will go back and make the columns smaller. As mentioned, NACMIS sometimes exports data that you didn‘t request. So to make sure we are only looking at the data we are concerned with, let‘s only select ―# of traditional healers trained in infection prevention and use of sharp instruments…‖ To do this, click the drop down button on the cell ―Indicator‖ (this is in cell B3).

Session XIII: Hands On NACMIS 381 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Uncheck ―% of health facilities providing…‖ and any others you do not want displayed.

Speaker Notes: Click ―OK‖.

Session XIII: Hands On NACMIS 382 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Now we can break our columns down even further. Let‘s break ―# of traditional healers trained in infection prevention‖ down by gender by dragging the ―Indicator Segregation‖ field into the ―Column Labels‖ box under the ―Indicator‖ field.

Session XIII: Hands On NACMIS 383 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Again you‘ll notice that the columns have been blown up – don‘t worry we will format the table correctly once it is complete. Next we need to define our row. Since we are concerned with comparing numbers across provinces, drag and drop the ―Province‖ Field to the ―Row Labels‖ box.

Session XIII: Hands On NACMIS 384 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Now that we have defined our rows and columns, we need to actually populate our table. Let‘s populate with the current total number of trained traditional healers by dragging the field ―Currents‖ to the ―Values‖ box.

Speaker Notes: Once we do this, we can see that every province currently has only one trained healer. Looking back at our data we can see that this is not correct. This is because Excel is currently reporting on the counts (you can see this displayed as ―Count of Currents‖) – not the actual total numbers of trained healers. To change this click on the downward arrow next to the field ―Count of Currents‖ in the Values box. Select ―Value Field Settings‖.

Session XIII: Hands On NACMIS 385 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Change the setting to ―Sum‖.

Speaker Notes: Click ―OK‖.

Session XIII: Hands On NACMIS 386 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Now if we compare the new values in the table we can see that they match up with the raw exported data! Now that our table is complete, we can start formatting. Start by making the columns smaller by click and dragging the right side of each highlighted column.

Session XIII: Hands On NACMIS 387 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Now that our pivot table is finished we can create our graph.

Session XIII: Hands On NACMIS 388 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Create a grouped bar graph by clicking on the ―Cluster Column‖ option under the Column button on the Insert tab.

Session XIII: Hands On NACMIS 389 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Now our graph appears with very little work! Now if you created this graph via the pivot table method, your legend and axis may be labeled differently. Notice that even though the labels are correctly placed, we do still need to do a little modification. First let‘s add a Title.

Session XIII: Hands On NACMIS 390 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: To add a title, click on the ―Chart Title‖ button under the Layout tab and select ―Above Chart‖.

Speaker Notes: Now enter the title. What is an appropriate title?

Session XIII: Hands On NACMIS 391 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: How about ―Number of Traditional Healers Trained in Infection Prevention for Zambia, Quarter 2 2009‖?

Session XIII: Hands On NACMIS 392 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Now that we are title correctly, lets resize the chart are to get a better view of the graph. To do this click and drag the bottom corner of the chart area to a reasonable size.

Session XIII: Hands On NACMIS 393 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Now we need to add a vertical axis (or y-axis) title. To do this go this, click on the ―Axis Titles‖ button under the ―Layout‖ tab and select the ―Rotated Title‖ under the ―Primary Vertical Axis Title‖.

Session XIII: Hands On NACMIS 394 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Now rename the axis title: ―# Traditional Healers Trained.‖

Session XIII: Hands On NACMIS 395 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: There you have it! A complete graph completely created from data available in the NACMIS! (If time remaining, discuss what information the graph is communicating.)

Session XIII: Hands On NACMIS 396 Epidemiology for Data Users (EDU) Trainer‘s Manual

WRAP-UP AND REVIEW

Session XIII: Hands On NACMIS 397 Epidemiology for Data Users (EDU) Trainer‘s Manual

SESSION XIV: CSO BREAKOUT SESSION –RATIOS, PROPORTIONS AND RATES

Ratios, Proportions, and Rates

1

Learning Objectives

At the end of this lecture, you should be able to: • Describe the difference between types of ratios (i.e. proportions, rates, rate ratios) and give an example of each • Define and calculate prevalence and incidence • Identify and define different mortality rates • Define odds ratios and rate ratios

2

Session XIV: CSO Breakout Session – Ratios, Proportions and Rates 398 Epidemiology for Data Users (EDU) Trainer‘s Manual

Bottom Line: All Roads Lead to Ratios

• Descriptive and analytic epidemiology relies on the presentation of data

• Ratios tend to be the most common mode of presenting epidemiologic data

• Ratios put other numbers into context by making a comparison

3

Bottom Line: All Roads Lead to Ratios

• Types of ratios – Prevalence (aka proportion) – Incidence (aka rate)

• Ratios of ratios – Relative Risks – Odds Ratios

4

Speaker Notes: Prevalence: The total number of cases of a specific disease or condition at given time Incidence: Number of new cases of a disease or condition at a given time Relative Risk: The risk of developing a disease relative to exposure Odds Ratio: Odds of exposure to a risk in the diseased population compared to the odds of exposure in the non-diseased population

Session XIV: CSO Breakout Session – Ratios, Proportions and Rates 399 Epidemiology for Data Users (EDU) Trainer‘s Manual

All Roads Lead to Ratios

Ratios

Proportions Non-proportion ratios (x/x+y) (x/y)

Prevalence Incidence Case-fatality ratio Rate ratios/ Odds ratios Relative Risk

Incidence proportion Incidence rate

5

What is a Ratio?

• Ratios express the relationship of two quantities

6

Speaker Notes: Can you name some common ratios that are used on a daily basis? Examples: Male to female ratio

Session XIV: CSO Breakout Session – Ratios, Proportions and Rates 400 Epidemiology for Data Users (EDU) Trainer‘s Manual

What is a Ratio?

• Ratios express the relationship of two quantities

• These quantities may be – related (proportion ratio) X

(X+Y)

7

What is a Ratio?

• Ratios express the relationship of two quantities

• These quantities may be – related (proportion ratio) X (X+Y) – independent (non-proportion ratio)

X Y

8

Speaker Notes: Proportion Ratio Of all the patients who went to Hospital A on May 28th, 28 became sick with the measles X= Number of patients who had the measles Y= Patients who went to the hospital on May 28th who did not become ill

Session XIV: CSO Breakout Session – Ratios, Proportions and Rates 401 Epidemiology for Data Users (EDU) Trainer‘s Manual

X+Y=All patients who went to Hospital A on May 28th Non-Proportion Ratio 28 Patients from Hospital A contracted the measles, 19 from Hospital B contracted the measles. If there is no overlap in patients, those who attended Hospital A are completed independent from those who went to Hospital B X= Measles cases in Hospital A Y=Measles cases in Hospital B

Proportions

• Ratio in which the numerator is included in denominator, often expressed as %

X

(X+Y) – Example: 4/100 homeless individuals have TB = 4%

9

Speaker Notes: Clarify denominator to be the appropriate pool from which cases occur. Population only includes people who could have the disease. For example, if we look at cases of ovarian cancer the reference population would only include women.

Session XIV: CSO Breakout Session – Ratios, Proportions and Rates 402 Epidemiology for Data Users (EDU) Trainer‘s Manual

Prevalence

• Proportion of individuals in a population who have the disease (or condition) at a given time – Includes all of the people with the condition, whether or not they just developed it or have had it for years – Often reported as a percent (%)

10

Prevalence

• Proportion of individuals in a population who have the disease (or condition) at a given time

Number of cases • Prevalence = at a specified Number of persons time in population

11

Session XIV: CSO Breakout Session – Ratios, Proportions and Rates 403 Epidemiology for Data Users (EDU) Trainer‘s Manual

Prevalence

• Specified time can be a specific ‘point’ in time or a ‘period’ or ‘interval’ • Examples: – In 2007, the prevalence of overweight/obesity amongst women aged 15-49 in Zambia is 19.0% (DHS)

12

Only 10% of Zambian Women were classified as underweight in 2007

Nutritional Status of Women Aged 15-49 (DHS 2007)

10% 19% Underweight Normal Weight Overweight/Obese

71%

13

Session XIV: CSO Breakout Session – Ratios, Proportions and Rates 404 Epidemiology for Data Users (EDU) Trainer‘s Manual

Incidence

• Measure of new cases of illness or occurrences of a condition

14

Speaker Notes: Death is an indicator that is always described as an incidence because it only occurs once

Session XIV: CSO Breakout Session – Ratios, Proportions and Rates 405 Epidemiology for Data Users (EDU) Trainer‘s Manual

Comparison of Incidence and Prevalence

• Prevalence = Incidence X Duration of illness

15

Incidence and Prevalence

Cured Incidence

Prevalence

Prevalence

Died / Lost to follow up 16

Speaker Notes: In this diagram, the water coming out of the spout is the number of new cases. The amount of water in the tub represents the prevalence of the disease. The water that goes through the drain represents people who have passed away due to the disease, or who have been cured.

Session XIV: CSO Breakout Session – Ratios, Proportions and Rates 406 Epidemiology for Data Users (EDU) Trainer‘s Manual

Examples: HIV/AIDS- With the onset of health education and HIV/AIDS awareness programs there has been a decrease in the incidence of HIV/AIDS. In addition, the availability of ARTs has increased the prevalence of HIV/AIDS because they are allowing people to live longer.

Incidence vs Prevalence

1

2

3

4

5

Jan 2006 Dec 2006

17

Speaker Notes: Using our previous definitions of prevalence and incidence, what is the disease prevalence from January 2006 until December 2006, and what is the incidence during this same time period? Answers: Prevalence: 5 Incidence: 2

Session XIV: CSO Breakout Session – Ratios, Proportions and Rates 407 Epidemiology for Data Users (EDU) Trainer‘s Manual

Two Kinds of Incidence

• Incidence proportion –Uses calendar or clock time

• Incidence rate –Uses person-time

18

Two Kinds of Incidence

• Incidence proportion – Cumulative incidence – Case-fatality rate (ratio) – Most common rate used in applied public health

19

Session XIV: CSO Breakout Session – Ratios, Proportions and Rates 408 Epidemiology for Data Users (EDU) Trainer‘s Manual

Incidence Proportion (Cumulative Incidence) • Proportion of people who develop the disease (new cases) over a specified period of time

Incidence = # of new cases over a specific period of time Proportion Population at risk during same period of time

21

Incidence Proportion (Cumulative Incidence) • Time period must be the same for both numerator and denominator

• Can be reported as a percent over a specified period of time

• Can be reported per 10n population over a specified period of time

22

Session XIV: CSO Breakout Session – Ratios, Proportions and Rates 409 Epidemiology for Data Users (EDU) Trainer‘s Manual

Incidence vs Prevalence

Characteristic Incidence Prevalence

Numerator New Cases All Cases Denominator Susceptible All cases + (At risk) non-cases Time Duration Single point How measured Cohort Cross- sectional

23

Case Fatality Ratio

• Usually referred to as case fatality ‘rate’ • The proportion of persons with a particular condition who die from that condition • A measure of disease severity

Case fatality = Number of cause-specific ratio deaths among incident cases x 10n Number of incident cases

24

Session XIV: CSO Breakout Session – Ratios, Proportions and Rates 410 Epidemiology for Data Users (EDU) Trainer‘s Manual

Case Fatality Ratio

• Example: Of 11 women who developed breast cancer in Hospital A in 2006, two died

• 2 deaths/11 cases x 100 = 18.2%

25

Rates

• A rate is a ratio that expresses – the frequency (number) of events – that occur in a population at risk – over a specified period of time.

• TIME & RISK are intrinsic parts of the rate

26

Speaker Notes: What is missing in the following statement: The rate of HIV transmission in Zambia is 14.3% Answer: Year. You must specify the time period in which this rate occurred.

Session XIV: CSO Breakout Session – Ratios, Proportions and Rates 411 Epidemiology for Data Users (EDU) Trainer‘s Manual

Incidence Rate

• The incidence rate quantifies how many susceptible (at risk) individuals develop a specific outcome during a specific time interval – Incidence rate measures NEW cases per person time at risk

27

Estimated Annual Incidence of TB in High Burden Countries, 1999

Cases Country (thousands)

1. India 1,847 2. China 1,300 7. Philippines 234 8. S. Africa 197 10. Vietnam 149 13. Brazil 118 15. Kenya 123

28

Speaker Notes: Ask question regarding whether rates or absolute numbers tell a better story. Discuss relevance to policy makers and program managers, etc.

Session XIV: CSO Breakout Session – Ratios, Proportions and Rates 412 Epidemiology for Data Users (EDU) Trainer‘s Manual

Estimated Annual Incidence of TB in High Burden Countries, 1999

Cases Population Incidence Country (thousands)(thousands) Rate x105

1. India 1,847 998,056 185 2. China 1,300 1,266,838 103 7. Philippines 234 74,454 314 8. S. Africa 197 39,900 495 10. Vietnam 149 78,705 189 13. Brazil 118 167,988 70 15. Kenya 123 29,549 417

29

Speaker Notes: Note how India has the highest number of cases, but South Africa has the highest incidence. It is very important to compare the number of cases to the total number of people at risk.

Session XIV: CSO Breakout Session – Ratios, Proportions and Rates 413 Epidemiology for Data Users (EDU) Trainer‘s Manual

Incidence Rate Issues

• Loss to follow-up: Individuals drop out of the study, move away, or become un-contactable

• When someone develops the condition, they can no longer be in the denominator

• Competing risks: Individuals may die from other causes

30

Speaker Notes: Loss to follow-up: If you are doing a longitudinal study of a condition, this is a huge issue. You must ask why people are dropping out of the study, and if those who are dropping out are similar. For example, if you are looking at a study about a new medication you may ask if the drug regimen was too difficult, or if the drug is causing many side effects.

Session XIV: CSO Breakout Session – Ratios, Proportions and Rates 414 Epidemiology for Data Users (EDU) Trainer‘s Manual

Incidence Rate

• How rapidly does an event occur in a susceptible population?

Incidence rate = Number of new cases Total follow-up time among people at risk

• Incidence rate is expressed as “events per unit of follow-up time” 31

Session XIV: CSO Breakout Session – Ratios, Proportions and Rates 415 Epidemiology for Data Users (EDU) Trainer‘s Manual

Some Important Rates

• Morbidity: measure rate of illness/ time • Mortality: measure rate of death/ time – Crude mortality rate: the mortality rate from all causes of death for a specified population over a specified period of time (usually a year) – Cause-specific mortality rate: the mortality rate from a specified cause for a specified population (usually a year) – Age-specific mortality rate: a mortality rate which limits both the numerator and denominator to a particular age or age group (usually a year) – Infant mortality rate: a ratio expressing the number of deaths among children under one year of age reported during a given time period divided by the number of births reported during the same time period (usually a year).

32

Infant Mortality Rate

# of deaths among infants <1 year of age (defined place and time period) X 1000 # of live births (same place and time period)

33

Speaker Notes: Infant mortality rate is a standard public health measure. According to the 2007 DHS, Zambia‘s annual infant mortality rate was 70 deaths per 1,000 live births

Session XIV: CSO Breakout Session – Ratios, Proportions and Rates 416 Epidemiology for Data Users (EDU) Trainer‘s Manual

Other Ratios

• Odds Ratio – Measure of association – Odds of exposure to a risk in the diseased population compared to the odds of exposure in the non-diseased population – Used as an estimate of risk ratio when exposure and disease data measured at the same time • Relative Risk/Risk Ratio – Measure of association – Compares the incidence of an exposed population developing a disease to the incidence of the non- exposed population developing the same disease

34

Exposed vs Diseased

Diseased Non-diseased

Exposed A B

Not Exposed C D

35

Session XIV: CSO Breakout Session – Ratios, Proportions and Rates 417 Epidemiology for Data Users (EDU) Trainer‘s Manual

Odds Ratio

Odds that an exposed person develops disease Odds that a non-exposed person develops disease

= A/B = AD = C/D BC

Ratio of the odds that the cases were exposed to the odds that the controls were exposed 36

Relative Risk

Risk of disease in exposed Risk of disease in non-exposed

A/(A+B) Incidence in exposed = = Incidence in non-exposed C/(C+D)

Ratio of the risk of disease in exposed individuals To risk of disease in non-exposed individuals

37

Session XIV: CSO Breakout Session – Ratios, Proportions and Rates 418 Epidemiology for Data Users (EDU) Trainer‘s Manual

Interpretation

When odds ratios and relative risks are less than 1.0, then the exposure has a protective effect.

(e.g. OR=0.7, Use of bed nets in malaria endemic area decreases risk of malaria by 30%)

Reporting

• Prevalence is a percent and unit-less – The prevalence of diabetes is 9% • Incidence must have a time component – The annual heart disease death rate in NYC was 282.6 per 100,000 in 2004 • Relative Risks/Odds Ratios are unit-less, comparing one group to another group – Homeless people are 4 times more likely to get TB as non-homeless people

39

Session XIV: CSO Breakout Session – Ratios, Proportions and Rates 419 Epidemiology for Data Users (EDU) Trainer‘s Manual

Learning Objectives

At the end of this lecture, you should be able to:

• Describe the difference between types of ratios (i.e. proportions, rates, rate ratios) and give an example of each • Define and calculate prevalence and incidence • Identify and define different mortality rates • Define a rate ratio

40

Session XIV: CSO Breakout Session – Ratios, Proportions and Rates 420 Epidemiology for Data Users (EDU) Trainer‘s Manual

SESSION XV: CSO BREAKOUT SESSION – BASIC STATISTICAL ANALYSES AND

INTERPRETATION

Session XV: CSO Breakout Session – Basic Statistical Analyses and Interpretations 421 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Can you give any additional examples of continuous and categorical variables?

Session XV: CSO Breakout Session – Basic Statistical Analyses and Interpretations 422 Epidemiology for Data Users (EDU) Trainer‘s Manual

Session XV: CSO Breakout Session – Basic Statistical Analyses and Interpretations 423 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: An example of a bivariate analysis would be the relationship between low birth weight and infant death. Low birth weight is the independent variable and infant death is the dependent variable. In this example the hypothesis is that low birth weight is associated with infant death.

Session XV: CSO Breakout Session – Basic Statistical Analyses and Interpretations 424 Epidemiology for Data Users (EDU) Trainer‘s Manual

Session XV: CSO Breakout Session – Basic Statistical Analyses and Interpretations 425 Epidemiology for Data Users (EDU) Trainer‘s Manual

Session XV: CSO Breakout Session – Basic Statistical Analyses and Interpretations 426 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Null Hypothesis: There is no association between the 2 variables Alternate Hypothesis: There is an association between the 2 variables

Session XV: CSO Breakout Session – Basic Statistical Analyses and Interpretations 427 Epidemiology for Data Users (EDU) Trainer‘s Manual

Session XV: CSO Breakout Session – Basic Statistical Analyses and Interpretations 428 Epidemiology for Data Users (EDU) Trainer‘s Manual

Session XV: CSO Breakout Session – Basic Statistical Analyses and Interpretations 429 Epidemiology for Data Users (EDU) Trainer‘s Manual

Session XV: CSO Breakout Session – Basic Statistical Analyses and Interpretations 430 Epidemiology for Data Users (EDU) Trainer‘s Manual

Session XV: CSO Breakout Session – Basic Statistical Analyses and Interpretations 431 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: The 2 values of X and  represent the means at 2 time points The 2 values of S represent the standard error The 2 values of n represent the number of values

Session XV: CSO Breakout Session – Basic Statistical Analyses and Interpretations 432 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes:

Session XV: CSO Breakout Session – Basic Statistical Analyses and Interpretations 433 Epidemiology for Data Users (EDU) Trainer‘s Manual

When you run a t-test the output you will be given is called the t-statistic. The important number to look at is the p-value. In the above example the p-value is 0.59. Is it significant? T-tests can be calculated using programs such as Epi Info, SPSS, Stata, and SAS.

Session XV: CSO Breakout Session – Basic Statistical Analyses and Interpretations 434 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Is there really an association between eating ice cream and getting sick? –Find out by using a chi-square test and a p-value

Speaker Notes:

Session XV: CSO Breakout Session – Basic Statistical Analyses and Interpretations 435 Epidemiology for Data Users (EDU) Trainer‘s Manual

This is an example of a cohort study because it assess disease status based on exposure.

Session XV: CSO Breakout Session – Basic Statistical Analyses and Interpretations 436 Epidemiology for Data Users (EDU) Trainer‘s Manual

Session XV: CSO Breakout Session – Basic Statistical Analyses and Interpretations 437 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Calculate the chi-square statistic for the example above

Session XV: CSO Breakout Session – Basic Statistical Analyses and Interpretations 438 Epidemiology for Data Users (EDU) Trainer‘s Manual

Session XV: CSO Breakout Session – Basic Statistical Analyses and Interpretations 439 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: In other words, if you took 100 samples from the same population, 95 of them would fall within the range of 1.2 to 1.9

Session XV: CSO Breakout Session – Basic Statistical Analyses and Interpretations 440 Epidemiology for Data Users (EDU) Trainer‘s Manual

Session XV: CSO Breakout Session – Basic Statistical Analyses and Interpretations 441 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Which confidence interval(s) are statistically significant?

Session XV: CSO Breakout Session – Basic Statistical Analyses and Interpretations 442 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes:

Session XV: CSO Breakout Session – Basic Statistical Analyses and Interpretations 443 Epidemiology for Data Users (EDU) Trainer‘s Manual

In previous slides we noted that a wider CI indicated a less precise figure. Why might the CI for those who consume 25+ cigarettes a day be so wide?

Session XV: CSO Breakout Session – Basic Statistical Analyses and Interpretations 444 Epidemiology for Data Users (EDU) Trainer‘s Manual

SESSION XVI: CSO BREAKOUT SESSION – SAMPLING

Session XVI: CSO Breakout Session – Sampling 445 Epidemiology for Data Users (EDU) Trainer‘s Manual

Session XVI: CSO Breakout Session – Sampling 446 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: What are some advantages of using a sample versus surveying the entire population?

Speaker Notes: What is the study universe for the given examples?

Session XVI: CSO Breakout Session – Sampling 447 Epidemiology for Data Users (EDU) Trainer‘s Manual

Session XVI: CSO Breakout Session – Sampling 448 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: When a sample is too small any difference that is detected could be due to chance.

Speaker Notes: When generalizing the findings of a study, the results should only be generalized to people who are similar to those included in the study. For example, if you do a study of

Session XVI: CSO Breakout Session – Sampling 449 Epidemiology for Data Users (EDU) Trainer‘s Manual poverty amongst women aged 15-49 in Lusaka, Zambia, you cannot use the results to make generalizations about women living in poverty all around the world.

Speaker Notes: Confidence – want it to be really high if investigating whether a toxic cancer treatment is better than another treatment

Session XVI: CSO Breakout Session – Sampling 450 Epidemiology for Data Users (EDU) Trainer‘s Manual

but can compromise if you‘re trying to determine whether eating 2 bananas improves health more than eating one

Power – want it really high if you‘re investigating whether cardiac deaths increased after polio vaccination in Nairobi in 1947 but can compromise if you‘re investigating whether obesity is associated with TV-watching

Speaker Notes: This formula will give you the ideal number of participants needed to detect a difference between two means. However, getting a sample size as large as the ideal may not always be possible. Sometimes there aren‘t enough resources to acquire a large enough sample size. An additional consideration is that the disease or condition of interest may be rare and there aren‘t enough people with the disease within your region.

Session XVI: CSO Breakout Session – Sampling 451 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Main advantage –probability of selecting certain person is known, thus may more closely represent population.

Session XVI: CSO Breakout Session – Sampling 452 Epidemiology for Data Users (EDU) Trainer‘s Manual

Session XVI: CSO Breakout Session – Sampling 453 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: Stratified random sampling ensures that populations that were not captured in other sampling methods to be captured.

Session XVI: CSO Breakout Session – Sampling 454 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: These methods of sampling are often used when there is limited time and resources. Although these methods are not ideal, they will still generate results that could be very helpful. Can you give examples of these different sampling methods? Examples: Purposive-An example of a purposive study of crime in Lusaka would be to use a sample composed of prisoners in a single Lusaka prison. This study would be weak for several reasons: 1) not all people who commit a crime are sent to prison, 2) there are several prisons in Lusaka so you are not capturing the full range of crime Convenience- In a study of HIV/AIDS in Zambia, a convenience sample would be gathering participants from a single health center in Ndola. Weaknesses: The prevalence of HIV/AIDS in Ndola is lower than in other areas in Zambia, not everyone who has HIV/AIDS goes to a health center to receive care, that single health facility may attract a sample of people that are not representative of the entire population

Session XVI: CSO Breakout Session – Sampling 455 Epidemiology for Data Users (EDU) Trainer‘s Manual

Snowball-In a snowball study of sex workers in Lusaka province you could ask each sex worker you contact to bring in other sex workers he/she knows. Snowball studies are often used when the study involves sensitive topics that have legal implications or they involve behaviors that are not socially/morally acceptable.

Speaker Notes: Other programs you can use: SAS: Solutions  Analysis  Analyst  Statistics  Sample size (select type of study) Stata: (just open program and type [help menu is also good]) sampsi prop. exposed in cases, prop. exposed in controls, power( ), r( ) example: sampsi .45 .20 p(.80) r(2)

Session XVI: CSO Breakout Session – Sampling 456 Epidemiology for Data Users (EDU) Trainer‘s Manual

Session XVI: CSO Breakout Session – Sampling 457 Epidemiology for Data Users (EDU) Trainer‘s Manual

SESSION XVII: WRITING CLEAR, COMPELLING AND ACCURATE REPORTS

Writing Clear, Compelling, Accurate Reports

That tell the story of your data

Description of Session: This session will describe how to put together a report that tells a story. It will help participants understand when, what and how to present their data in order to be effective.

Facilitator background information: Decision making is the performance of a task, that of making a particular decision. Senior government officials, either at the district, provincial or National level are required to make decisions as part of their day to day tasks. Such tasks usually involve making choices between a numbers of possible courses action. Therefore, decision makers need to be presented with large amounts of data, and categories of data that describe various situations that may influence their decisions. That is why it is important that data is analyzed and the findings presented to decision makers in a clear, accurate and compelling manner so as to enable them make effective decisions. If presented wrongly, the data may result in wrong and costly decisions being taken.

Session XVII: Writing Clear, Compelling, Accurate Reports 458 Epidemiology for Data Users (EDU) Trainer‘s Manual

Objectives

By the end of this session, participants will be able to… • Identify words that have statistical meaning – Use them correctly and appropriately • Describe in words: trends and patterns seen in graphs, charts, tables • Select key findings to highlight in reports • Describe key sections of reports and what information should be in each • Document limitations of your conclusions • List 2 ways of writing more effectively

At the end of this session participants will be able to: 1. Identify words that have statistical meaning and determine when and how to use them in an effective manner 2. Describe patterns and trends seen in graphs and charts using words 3. Understand how to select key findings to highlight in reports 4. Understand what the key sections of a report are and what the content of each of those sections should be 5. Understand how to describe the limitations if the information presented 6. List two ways of being a more effective writer

Session XVII: Writing Clear, Compelling, Accurate Reports 459 Epidemiology for Data Users (EDU) Trainer‘s Manual

Learning Activities at a Glance: Learning Content Method Material Handout/ Length Activity Appendix (min) 1 Writing clear, Lecture Copies of the compelling, presentation accurate reports. for each participant

Blank flipchart paper and markers

Data for decision-making

• Data are needed for decision-making • Decision-makers – May not understand data – Do not have the time to synthesize data • Assist them – Ask relevant questions – Collect and/or extract relevant data – Summarize, display and analyze – Guide them through relevant results – Highlight most important results – Suggest action

Speaker Notes: As we know at the central level and in the field, decisions makers or your boss may not have the background or time to synthesize raw data. You need to succinctly highlight the most important results so you don‘t get lost.

Session XVII: Writing Clear, Compelling, Accurate Reports 460 Epidemiology for Data Users (EDU) Trainer‘s Manual

Facilitator Background Information: The purpose of collecting data is to obtain information to keep on record and to help in making decisions about important issues. Thus data has to be analyzed first before it can be presented. The purpose of data presentation is COMMUNICATION. Any presentation of data that you make—whether in the form of tables or graphs or in some other form - is only useful and effective to the extent that it communicates to your target audience what you intended for it to communicate. Ask yourself the following question ―Did your message get through? Was the data understood, accurately and efficiently?‖ It‘s all about the data being able to tell the story!!

How do we make them listen to us?

• Show them the data

• Explain what it means…

Clearly, concisely, simply, and appropriately for the audience

Speaker Notes: Prompt Audience: How do we make them listen to us? It‘s important to explain information clearly and simply. You don‘t want to explain something at a PhD level to people with a 7th grade education. Similarly, you don‘t want to explain something at a 7th grade level to people with a PhD education. It is important to know your audience to present your data appropriately.

Session XVII: Writing Clear, Compelling, Accurate Reports 461 Epidemiology for Data Users (EDU) Trainer‘s Manual

Facilitator background information: The best way to show the data is by using visual representations in the form of graphs, charts, and tables. Before you decide how to present the data, take time and think carefully about what you want to say. You must first know what it is that you want to communicate. This will determine whether you will use a graph, a chart, a table or just text. If you can communicate your message clearly, efficiently, and with the desired impact in a simple sentence, do just that. If your message needs the use of a table, then that's what you use. Point to remember is that always make sure you use the right type of graph!

Would you give the Minister this table on TB in Lusaka?

Speaker Notes: Prompt Audience: What if the minister of health asked you what the situation is of TB in the clinics? Would you give this to minister? Why not? The minister might not have the time go through all of this! This is too much information! The main point is not discernable and it needs summarizing.

Session XVII: Writing Clear, Compelling, Accurate Reports 462 Epidemiology for Data Users (EDU) Trainer‘s Manual

How about this?

Speaker Notes: Prompt Audience: How about this? Would this be something the minister could digest? The minister and easily see that there appears to be… (Add ―The prevalence of fever…‖ from slide 11) Can anyone identify the type of graph this is? Answer: Grouped bar chart (malaria) Prevalence of fever, treatment and prompt treatment appear to have declined This graph shows the change/difference in three indicators between 2001-2002 (in light grey) and 2007: the percentage of patients:  with fever  with fever treated with antimalarials  with fever receiving prompt treatment with antimalarials.

Session XVII: Writing Clear, Compelling, Accurate Reports 463 Epidemiology for Data Users (EDU) Trainer‘s Manual

Or this?

Measles cases, Zambia 2010

4 Apr 4

6 Jun 6 7 Feb 7 Mar 7

5 Sep 5

2 May 2 9 May 9 Aug 1 Aug 8

11 Jul 11 4 July 4 Jul 18 Jul 25

10 Jan 10 17 Jan 17 Jan 24 Jan 31 Apr 11 Apr 18 Apr 25

<4 Jan <4

14 Feb 14 21 Feb 21 Feb 28 Mar 14 Mar 21 Mar 28 Jun 13 Jun 20 Jun 27

12 Sep 12

15 Aug 15 16 May 16 May 23 May 30 Aug 22 Aug 29 Week ending date/2010

Speaker Notes: Would you give the minister this? Maybe, maybe not. The thing about this one is that you can see exactly what is going on in this is graph. You can see there appears to be rise in measles in…and decline in… What kind of graph is this? Still a bar chart, but a bar chart called an epidemic curve. Can anyone identify the type of graph here? Bar chart. Specifically it‘s an epidemic curve. Cases of measles appears to have increased steadily, peaking at >800 @ week 19/07/2010-25/07/2010 and then appears to have dropped dramatically after vaccination campaign For this one, we can see easily that the number of measles cases changed over the time period and when the Measles SIA (supplementary Measles immunization activity - lightning strike) occurred.

Session XVII: Writing Clear, Compelling, Accurate Reports 464 Epidemiology for Data Users (EDU) Trainer‘s Manual

Summarizing graphs

• Line listing (table) – Main point not discernable, needs summarizing • Bar chart (malaria) – Prevalence of fever, treatment and prompt treatment appear to have declined • Bar chart (measles) – Appears to have increased steadily, peaking at >800 @ week 19/07/2010-25/07/2010 – Appears to have dropped dramatically after vaccination campaign

Note: In the June 15, 2011 presentation, the above slide was skipped. Facilitator Background Information: Display one of the graphs in the presentation. Ask the participants to describe what the visual representation of the graph is. Building on the responses from the participants, use the power point above to explain the descriptions. See below further information which you may use. Tables and graphs are both ways to organize and arrange data so that it is more easily understood by the viewer. It is important to know, not only how to create, but also to interpret them as they are used in many important areas such as helping people in decision making. There are different types of graphs like line graphs, bar graphs, pictographs, etc they are used to indicate how much something has either dropped or gotten higher during a period of time or how much greater something is than something else. Whichever graph you use, you need to provide a summary description of what it seeks to depict .

Session XVII: Writing Clear, Compelling, Accurate Reports 465 Epidemiology for Data Users (EDU) Trainer‘s Manual

Why use “Appear to be”?

• Changes in data may be due to a variety of factors • You can only say there is a difference or change in two numbers if you did a statistical test and results were statistically significant (p<0.05) • If a statistical test was NOT run, you do not know whether results were statistically significant or just by chance – Report data and certainty honestly

Speaker Notes: If you have not applied a statistical test to your data, you must always preface a presentation of your data with ―Appears to be‖. P<0.05 is most commonly used (presenter may add ―generally‖ before ―p<0.05‖ in bullet #2. *Should discuss factors that may affect data such as increases or decreases in reporting, new testing initiatives, change in staffing etc.

Facilitator Background Information: Ask the participants: Are the changes observed on the graphs real? Can you say for certain that there has been a change between two values? Answer is no. why? Because no statistical test has been conducted to prove that the change is real and not just by chance. So we use the word ‗appear to be‘ because we are not sure.

When a change is said to be statistically significant, it simply means that you are very sure that the occurrence is reliable. It doesn't mean the finding is important or that it has any decision-making value. Significance is a statistical term that tells us how sure

Session XVII: Writing Clear, Compelling, Accurate Reports 466 Epidemiology for Data Users (EDU) Trainer‘s Manual we are that an observed difference or relationship really exists. It is determined by doing a statistical test to compute the P – value.

Examples of words to use when no statistical tests were conducted

• Appears to… • Suggest a possible… • Seems to…

Others?

Use if no statistical tests were conducted

Speaker Notes: Prompt Audience to think of other terms to use (for example, may indicate)

Examples of key words that denote statistical significance • Lower • Higher • Change • Greater • Fewer • Less • More • Decline • Increase

Others?

Use only if you’ve done a statistical test (p<0.05)

Speaker Notes: These are words you hear when results are described all the time. Reserve these words for instances where statistical significance was done. Any other words?

Facilitator Background Information:

Session XVII: Writing Clear, Compelling, Accurate Reports 467 Epidemiology for Data Users (EDU) Trainer‘s Manual

In simple terms, statistical significance implies that the likelihood that the results of any finding are caused by other factors and not by chance. There are broadly three elements of statistical significance: o It shows the levels of confidence with which we can state the relationships or differences we have established o It shows us the level of chance that we are wrong o It shows us that the relationship or differences we have just established is true Significant level is commonly set at the level of less than five per cent (5%) with a probability of less than 0.05, i.e. P < 0.05 implies that findings are likely to be 95% accurate .

Remember this malaria graph?

Speaker Notes: Here is the same graph with an indication that a statistical test was performed. What statistical test can be performed to compare two proportions? Answer: Chi-squared. What does a p-value of less than 0.05 mean in this case? Answer: the difference is statistically significant and unlikely to be due purely to chance. It is difficult to determine if the differences are statistically significant just by looking at the graph.

Session XVII: Writing Clear, Compelling, Accurate Reports 468 Epidemiology for Data Users (EDU) Trainer‘s Manual

Statistical tests are frequently used to distinguish between cases where changes occurred due to chance and cases are meaningful. However, these statistical tests cannot tell us what caused these changes. A statistically significant difference does not necessarily imply causality. For example, say we use a statistical test and find a statistically significant difference in the polio prevalence between two populations, one of which received vaccines and one of which did not. These results provide strong evidence that the vaccine reduced the polio prevalence. However, it is possible that other external factors played a role in the decrease. Thus, based on these results, we cannot conclude that the vaccine caused the decrease in the polio prevalence.

Reporting results more definitively

Before we did a statistical test we said:

“The prevalence of fever, treatment and prompt treatment among under-five year- olds appeared to have dropped from 2001/2 to 2007.”

Facilitator Background Information: Emphasize to the participants that: We cannot state for certain that the prevalence dropped because we have not conducted a test to determine that the observed drop was not just a chance occurrence. Unless there is statistical significance, we can only say that it ―appeared to drop‖ since we are not sure. o Remember that results with no statistical significance are inconclusive.

Session XVII: Writing Clear, Compelling, Accurate Reports 469 Epidemiology for Data Users (EDU) Trainer‘s Manual

A statistically significant comparison

* *

*statistically significantly different at p<0.05 Speaker Notes: Here is the same graph with an indication that a statistical test was performed. What statistical test can be performed to compare two proportions? What does a p-value of less than 0.05 mean in this case? [answer = chi-squared; the difference is statistically significant and unlikely to be due purely to chance].

Session XVII: Writing Clear, Compelling, Accurate Reports 470 Epidemiology for Data Users (EDU) Trainer‘s Manual

Reporting results more definitively

Now we can say: “The prevalence of fever and prompt treatment with antimicrobials among under-five-year- olds declined statistically significantly from 2001/2 to 2007. While treatment with antimicrobials (at any time) also appeared to decline during this time, the change was not statistically significant.”

Facilitator Background Information: In agreeing with the correct responses and building on the participants‘ responses, the point to emphasize here is that: The statistical significance of P < 0.05 the probability that the observed differences are simply a chance process is less than this threshold of significance. We can, therefore, say the results are not due to chance and state for certain that prevalence dropped.

Session XVII: Writing Clear, Compelling, Accurate Reports 471 Epidemiology for Data Users (EDU) Trainer‘s Manual

Think about findings

• Does it make sense that the percentage of cases being treated promptly would decline while prevalence declined?

• What are some explanations for this?

• Needs further exploration before reporting

Speaker Notes: Ask the audience for some explanations before putting up the second main bullet. Add their answers (the ones not on this slide) on the board/flipchart. Some possible explanations if none are brought up:  Error in data collection  Ns are small  Real decline – those few still getting it are rural and can‘t find treatment  Others?

The last bullet can be elaborated by saying ―before reporting, find out what‘s really happening. Ask others, especially people who may have collected these data and/or may be malaria experts‖. *NOTE the participant version of this presentation- in their manual - should not include the second and third bullets, but the presenter PowerPoint and the presenter manual SHOULD have all bullets.

Session XVII: Writing Clear, Compelling, Accurate Reports 472 Epidemiology for Data Users (EDU) Trainer‘s Manual

Key findings and distractor data

• Key findings – Answer a particular question – Should be highlighted (or the only thing) in graphics – Should be highlighted in results text • Distractor data – Distract from main point – Should be deleted from graphics (including tables) – Should not be mentioned in results text

Speaker Notes: Now that we have thought about the findings and recorded any important caveats regarding data collection, we want to know what the key findings are. Anything that doesn‘t inform your question or add context, should be removed. If you want to discuss malaria, only talk about malaria and not other diseases. Similarly, if you want to discuss the prevalence of malaria, do not discuss vaccination rates. For example using 3d in graphs, they are a distracter! They should not be added. Facilitator Background Information: o The key findings or results should be presented objectively interpretation and in an orderly and logical sequence. Use text, tables, graphs and charts. Always begin with text, reporting the key results and referring to your figures and tables as you proceed. o The Results should be presented around properly sequenced Tables, graphs and charts that should be sequenced to present your key findings in a logical order.

Session XVII: Writing Clear, Compelling, Accurate Reports 473 Epidemiology for Data Users (EDU) Trainer‘s Manual

o The results should be crafted to highlight the evidence needed to answer the questions of interest to your audience. Any data/ findings that have no relevance to your questions of interest constitute distracters and should be avoided. o Write the text of the Results section concisely and objectively.

What’s the main point of this graph?

HIV prevalence among 15 – 49 year olds in province X in the years 2001– 2010 80

60

% 40

20

0 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010

source: Ministry of Health annual report 2010

Speaker Notes: Can you identify this graph type? Change over time: more dramatic 2002-2004, stabilizing 2009-2010. Note it is not known whether this is statistically significant. Can anybody tell what is the main point? Can you tell what is going on? There appears to be a change over time here, most dramatic over 2002 and 2003 and a leveling off over the last five years. …We don‘t know if there have been statistical tests!

Session XVII: Writing Clear, Compelling, Accurate Reports 474 Epidemiology for Data Users (EDU) Trainer‘s Manual

Give two main points of this epidemic curve

Measles cases, Zambia 2010

4 Apr 4

6 Jun 6 7 Feb 7 Mar 7

5 Sep 5

2 May 2 9 May 9 Aug 1 Aug 8

11 Jul 11 4 July 4 Jul 18 Jul 25

10 Jan 10 17 Jan 17 Jan 24 Jan 31 Apr 11 Apr 18 Apr 25

<4 Jan <4

14 Feb 14 21 Feb 21 Feb 28 Mar 14 Mar 21 Mar 28 Jun 13 Jun 20 Jun 27

12 Sep 12

15 Aug 15 16 May 16 May 23 May 30 Aug 22 Aug 29 Week ending date/2010

Speaker Notes: Prompt Audience: Can you give two main points that this graph is trying to illustrate? Two Main points: there appears to be a rapid rise and then the trend appears to decrease after a specialized campaign. What else do you see in here that might be interesting to talking about? Lab confirmed. Perhaps, but after a confirmed measles outbreak, the general procedure is to stop doing lab work and only focus on symptomatic cases. We might decide to leave out the colors to help people focus on the rise and decline of epi cases. Points to consider – rise and fall of epi linked cases and timing relative to vaccination campaign. Also available, lab confirmed, compatible, discarded, pending – do we discuss this? NO – to convey the main two points, we do not need the colors and can delete them.

Session XVII: Writing Clear, Compelling, Accurate Reports 475 Epidemiology for Data Users (EDU) Trainer‘s Manual

What kind of figure is best?

• Describe trends in HIV prevalence in pregnant women from 2000-2010 • Compare condom use between rural and urban residents of Zambia • Describe characteristics of the population of your province • Display the proportion of HIV infections in your province among women, men and children

Speaker Notes:  Line graph  Bar chart – clustered  Table  Pie chart

Facilitator Background Information: Emphasize the following to the participants: o Using Graphics to Display Data aims at communicating ideas with clarity, precision, Graphical Displays should  Serve a clear purpose  Reveal different levels of detail in data  Show/reveal the data  Avoid distorting the data  Provoke the viewer to think about key messages  Encourage the eye to compare different pieces of data  should be matched with descriptions of data

Session XVII: Writing Clear, Compelling, Accurate Reports 476 Epidemiology for Data Users (EDU) Trainer‘s Manual

Writing reports

• Use a standardized format for similar, recurring reports so that it is easy to compare data across reports

• Sections of a standardized report form you will use for Epi-profiles follow

Speaker Notes: Prompt Audience: What is a report?

Facilitator Background Information: Briefly preview what is going to be covered in this section by telling participants that the section will provide a brief overview of what a report is and will outline a general format for a report structure. Tell the participants that the section is going to begin with a getting an understanding of the participant‘s knowledge of reports and that everyone needs to contribute as they might have written reports at one time or another in their work or studies. Ask the participants what they think a report is. Collect several responses in an interactive way and if possible write them on a flip chart. Building on the participants‘ comments, direct the conversation towards the definitions of a report, Draw on participants‘ previous answers to develop a common understanding of the definition of a report among the entire group. Use the power points and notes below to help facilitate the discussion.

Session XVII: Writing Clear, Compelling, Accurate Reports 477 Epidemiology for Data Users (EDU) Trainer‘s Manual

What is a report? A report may be defined as a written presentation of facts and findings, and is usually a basis for recommendations. It is written for a specific audience and is usually intended to be kept as a permanent record of events. So, there are three aspects that we need to understand clearly before we can start writing our report. These are: 1. Purpose - why are you writing the report? 2. Audience - Who is going to receive and read the report? Make sure you know your audience before preparing your report. Is it a scientific or professional? Lay public, informed lay audience? 3. Information - What should the report contain? Make sure it meets the needs of your audience and speaks to them in a way they understand, and you avoid acronyms/jargon that they won‘t understand and/or will make them feel unintelligent. Knowing these aspects of our report will help us to communicate more clearly and concisely.

Key sections in reports

• Introduction (includes objective)

• Methods (includes data sources)

• Results/findings

• Recommendations (subsections: conclusions, limitations, discussion, recommendation)

• Annexes* (detailed tables, graphs, instruments)

*will not be discussed further

Speaker Notes:

Session XVII: Writing Clear, Compelling, Accurate Reports 478 Epidemiology for Data Users (EDU) Trainer‘s Manual

Make sure you know your audience before preparing your report. Is it a scientific or professional? Lay public, informed lay audience? Make sure it meets their needs and speaks to them in a way they understand, and you avoid acronyms/jargon that they won‘t understand and/or will make them feel unintelligent. We‘re going to talk about all of this except Annexes.

Facilitator Background Information: Ask the participants to outline the sections of a report. One participant should only mention one section. Brainstorm with the participants the sequence of the sections they have mentioned and what information they contain, starting with which section comes first until the last section. Building on the responses, use the power point to briefly outline the format of a report. You may select one participant who provided a correct answer and indicate that the format provided was in line with your presentation. Explain to the participants that reports are an organized form of writing and follow a specific format. All reports need to have a formal structure, as outlined on the PowerPoint, and must be planned so that material is presented in a logical way using clear and concise language for effective communication. Indicate that in the slides to follow, you will discuss the most common structure of reports and examine each step in report writing process.

Session XVII: Writing Clear, Compelling, Accurate Reports 479 Epidemiology for Data Users (EDU) Trainer‘s Manual

Introduction section of a report

• Background & context to understand report – Population distribution – Geography, including topography – Relevant history of disease/health status – Healthcare resources • Objective of report – Questions it aims to answer – Policy, implications of answering question

Two to four short paragraphs and one map

Speaker Notes: • Topography that might impact health in your region • Healthcare resources: clinics and hospital • Objective should be two to three sentences and background: should be two to four paragraphs

Facilitator Background Information: Prompt the participants to give examples of what may be included in the background section of a report. After several responses, display the PowerPoint . Build on the participants‘ responses to lead the discussion on the introduction section. You could consider using the background notes below. The introduction section is the beginning of your report. Generally, the introduction provides the background and explains the scope of the report in terms of what you are going to cover and why (i.e. the objective of the report). In other words, it is here that the subject which you are about to present in the report is put in context.

Session XVII: Writing Clear, Compelling, Accurate Reports 480 Epidemiology for Data Users (EDU) Trainer‘s Manual

Methods section: How Data were…

• Collected – Data sources – Methods of data collection (if applicable) – Dates/locations of data extraction/collection • Managed – Managed (database, software) – Organized and Displayed – Summarized – Analyzed (statistical tests)

As long as needed, generally 2-5 paragraphs

Speaker Notes: This section can be as long as needed (if there are long methods) but should usually stick to 2-5 paragraphs. The point here is to be complete so people can replicate your findings, so you need to make sure you are keeping your methods detailed enough for replication. Avoid giving results here. Length depends on number of sources, complexity of collection and of analysis, 3-8 paragraphs. Routine administrative reports may not need an extensive methods section. This section may be more detailed for a research project or large scale report.

Facilitator Background Information: In an earlier session, there was a discussion of data sources for M&E. Possible sources are: o Routine sources - provide examples o Periodic surveys - provide examples Explain to participants that reports contain data that is collected through any of these sources. Using the PowerPoint, indicate that the methods section details the processes through which data used for the report was collected and managed.

Session XVII: Writing Clear, Compelling, Accurate Reports 481 Epidemiology for Data Users (EDU) Trainer‘s Manual

There are various methods through which data for use in reports is collected. These may include the following o Administrative or routine - data collected as part of day to day operations. E.g. patient record o Periodic surveys – data collected at intervals. E.g. census

Results and Findings section

Describe results of data summary & analysis – Figures (tables, charts, graphs, maps) – Each figure should have 1 SOCO – Describe main point of each figure in text in order of presentation of figures – Report results only of analyses described in methods

Three to five paragraphs and 3 figures for a manuscript, more for a report.

Speaker Notes: Here‘s where you get you use all those figures and charts! Remember: pictures are worth a thousand words! You can use graphs chart to cut down on your descriptions. Don‘t report any analysis that you have described in methods! If you need to, go back and add to methods. Create your questions before you create your report! Avoid describing methods here Length As many figures as needed Brief text with overall numbers (―Ns‖) Brief text describing SOCO for each figure

Facilitators‘ background information

Session XVII: Writing Clear, Compelling, Accurate Reports 482 Epidemiology for Data Users (EDU) Trainer‘s Manual

Start this section by reminding the participants about the three aspects of a report discussed earlier. These are: Purpose, Audience and Information. Ask them which of these, is important in this section? ALL!!!!! Ask participants what they think should be included in the results section

The purpose of a results section is to present and illustrate your findings. The length of this section is dependent on the amount and types of data to be reported. Be concise, using figures such as graphs, charts and tables, where necessary, to present results most effectively.

Recommendation section’s subsections • Conclusion • Limitations • Discussion • Recommendations

Facilitator Background Information: Ask the participants why a report should contain recommendations? Hint: One reason for M&E is to make available data for decision making. So, remember that reports are written for a purpose. Recommendations represent the essence of the report‘s work! The recommendations section of a report is very important because it proposes the actions (decisions) to be taken based on the data that was collected, analyzed and

Session XVII: Writing Clear, Compelling, Accurate Reports 483 Epidemiology for Data Users (EDU) Trainer‘s Manual presented in the report. It needs to be actionable, specific and make sense as a solution to the problems detailed in the report. Points to emphasize to the participants are the following: o The recommendations section should flow logically and directly from information contained in the main body of the report. o List the recommendations in a numbered or bulleted format. If possible, list them in order of priority to enable decision makers to know which ones need attention first. o Write them as one-sentence statements, starting with actionable verbs and using concise language. Remember, to recommend means that there will be some action needed, so starting with a verb reinforces this.

Conclusion

• Restate main finding(s) – In first paragraph – Concisely and directly

• State conclusion (directly from findings) – Avoid adding other thoughts or results of analyses not included in Results section

One to three sentences only

Speaker Notes: Conclusion should restate main findings. My advice is go thru results and pick what tells you the most and answers your question the best. The first couple sentences should be really concise. It is not time to add other thoughts – stick to the objectives and results when stating conclusion. 1-3 sentences only. Sometime you need to choose what is the most important here.

Session XVII: Writing Clear, Compelling, Accurate Reports 484 Epidemiology for Data Users (EDU) Trainer‘s Manual

Limitations

• Understand how your data were collected, analyzed, reported • Keep on-going list of possible limitations • Be comprehensive and harsher on yourself than your supervisors would be • List limitations frankly, comprehensively and without apology

One to two paragraphs OR within discussion

Speaker Notes: You have to admit your limitations. In order to do that you need to understand… Be a little pickier about your limitations than others would be! Every exercise, every study has limitations! You need to admit it so people can consider the impact of the results you have found. Ask for assistance from someone who doesn‘t know your program

Facilitator Background Information Limitations of the study/report can be said to be like a disclaimer for the report. Here you tell the reader what you could not investigate, what problems faced in the process, or what factors might have affected your findings. Examples: incomplete data, inadequate time allocation to do the research, lack of finances etc

Session XVII: Writing Clear, Compelling, Accurate Reports 485 Epidemiology for Data Users (EDU) Trainer‘s Manual

Example of limitation

These findings are limited by the fact that SmartCare is not yet deployed in all health centers and may be underutilized in others

Speaker Notes: This is saying we have really great finding but it doesn‘t cover all health facilities and may be incomplete!

Discussion

• Discuss how results relate to expected, other geographic areas, funding allocated, and/or existing literature

• Make recommendations – Change funding, improve coverage, target populations… – Further analyses to answer questions

Generally four to five paragraphs

Speaker Notes: Depending on size and purpose of reports: altogether recommendations section can be four to eight paragraphs. This is where you can take a little more leeway in why you found what you found, why it differs from what you expected and what can be done about it.

Session XVII: Writing Clear, Compelling, Accurate Reports 486 Epidemiology for Data Users (EDU) Trainer‘s Manual

Speaker Notes: I want you to help me figure out what section of the report this is from.  Conclusion. May look like it comes from results section, but this is what you picked out as the most important info from you results  Limitations. This is giving a caveat to the main conclusion we‘ve given. This is an example of lumping it in your description!  Discussion. This is where we are comparing our findings to findings elsewhere.  Recommendations. This is your chance to suggest action! Note: In June 15, 2011 presentation, the presenter stated that this excerpt was from the Conclusion section.

Session XVII: Writing Clear, Compelling, Accurate Reports 487 Epidemiology for Data Users (EDU) Trainer‘s Manual

Writing concisely and simply

• Reduce unneeded words and phrases

• Avoid jargon and repetition

“Verbose is not a synonym for literary.”

--(Sin and Syntax)

Speaker Notes: Acronyms not spelled out are also considered jargon.

Example: Creating concise sentences

“This paper provides a review of the basic tenets of cancer biology study design, using as examples studies that illustrate the methodologic challenges or that demonstrate successful solutions to the difficulties inherent in biological research.”

Speaker Notes: This is very long! Can anybody tell me something about that sentence? Did anything stick with you? Nothing came through to me! No matter what your education level is, if it‘s wordy it‘s not coming through!

Session XVII: Writing Clear, Compelling, Accurate Reports 488 Epidemiology for Data Users (EDU) Trainer‘s Manual

How can we make text more concise?

• Find words and phrases that may be unnecessary

• Take them out to see if removing them changes the meaning

• If the meaning stays the same, take it out

Remove unneeded phrases

• in the event that • in the nature of • it has been estimated that • it seems that • the point I am trying to make • what I mean to say is • it may be argued that

Session XVII: Writing Clear, Compelling, Accurate Reports 489 Epidemiology for Data Users (EDU) Trainer‘s Manual

Use concise words instead of clunky phrases Clunky Concise

• A majority of most • A number of many • Are of the same opinion agree • At the present moment now • By means of by • Less frequently occurring rare

Use simple, instead of complex words Complex Simple • Assistance help • Utilize use • Numerous many • Facilitate ease • Individual man or woman • Remainder rest • Initial first • Implement do • Sufficient enough

Session XVII: Writing Clear, Compelling, Accurate Reports 490 Epidemiology for Data Users (EDU) Trainer‘s Manual

Example: Creating concise sentences

“This paper provides a reviews of the basic tenets of cancer biology study design, using as examples studies that illustrate the methodologic specific challenges or that demonstrate successful and solutions to the difficulties inherent in biological research.”

“This paper reviews cancer biology study design, using examples that illustrate specific challenges and solutions.”

Speaker Notes: • What unnecessary words or phrases can we cut out of this to make it clearer?

Session XVII: Writing Clear, Compelling, Accurate Reports 491 Epidemiology for Data Users (EDU) Trainer‘s Manual

WRAP-UP AND REVIEW

Summary

• We must synthesize data for decision makers

• Synthesize using statistically accurate language, concisely and simply

• State conclusions and limitations honestly and thoroughly

• Use standard reporting format

Learning objectives

Can you: • Identify words that have statistical meaning • Use them correctly and appropriately • Describe in words: trends and patterns seen in graphs, charts, tables • Select key findings to highlight in reports • Describe the key sections of reports and what information should be in each • Document limitations of your conclusions • List 2 ways of writing more effectively

40

Session XVII: Writing Clear, Compelling, Accurate Reports 492 Epidemiology for Data Users (EDU) Trainer‘s Manual

RESOURCE CENTER

Examples of Unneeded Phrases and Concise/Simple Words:

More examples of removing unneeded phrases

• for the most part • for the purpose of • in a manner of speaking • in a very real sense • in my opinion • in the case of • in the final analysis

More examples of using concise, instead of clunky phrases Clunky Concise • All three of the three • Fewer in number fewer • Give rise to cause • In all cases always • In a position to can • In close proximity to near • In order to to

Session XVII: Writing Clear, Compelling, Accurate Reports 493 Epidemiology for Data Users (EDU) Trainer‘s Manual

More examples of using simple, instead of complex words Complex Simple • Investigate study • Optimum best • Indicate show • Initiate start • Currently now • Facilitate help • Endeavor try • Ascertain find out

More examples of using simple, instead of complex words Complex Simple • Attempt try • Referred to as called • With the possible exception of except • Due to the fact that because • He totally lacked the ability to he couldn’t • Until such time as until • For the purpose of for

Session XVII: Writing Clear, Compelling, Accurate Reports 494 Epidemiology for Data Users (EDU) Trainer‘s Manual

More examples of using simple, instead of complex words Wordy Pointed in spite of the fact that although in the event that if new innovations innovations one and the same the same period of four days four days personally, I think/feel I think/feel personal opinion opinion refer back refer repeat again repeat revert back revert shorter/longer in length shorter/longer had been previously found had been found

Session XVII: Writing Clear, Compelling, Accurate Reports 495 Epidemiology for Data Users (EDU) Trainer’s Manual

SESSION XVIII: CREATING AN EPIDEMIOLOGICAL PROFILE

Creating an Epidemiologic Profile

Description of Session: This session will walk participants through the steps necessary to create an epidemiological profile. It will reflect back on what participants have learned over the past few days and put it to work in developing a report such as a fact sheet or epidemiologic profile to consolidate data analyses, share programmatic accomplishments and needs, and guide decision-making.

Learning Objectives:

Objectives At the end of this presentation, participants should be able to: • Apply course skills to develop an epidemiologic profile • Synthesize health data from Zambian data sources to highlight disease trends, services, and needs

Session XVIII: Creating an Epidemiological Profile 496 Epidemiology for Data Users (EDU) Trainer’s Manual

At the end of this session participants will be able to: 1. Apply the knowledge acquired in this course to develop an epidemiological profile 2. Understand how to synthesize health data from Zambian data sources to highlight disease trends, services and needs

Learning Activities at a Glance: Learning Content Method Material Handout/ Length Activity Appendix (min) 1 Developing an Lecture Copies of the Epidemiological presentation Profile for each participant

Blank flipchart paper and markers

What is an Epidemiologic Profile?

• A document that uses epidemiologic principles to clearly characterize the status of a disease in a population and the services available to address it • Involves extraction and summarization of data from key data sources – For a defined geographic region – Over a defined period of time • Provides analysis and interpretation of these data in text and figures • Produces a written report that concisely describes findings and provides recommendations

Session XVIII: Creating an Epidemiological Profile 497 Epidemiology for Data Users (EDU) Trainer’s Manual

Speaker Notes: Produce a written report concisely describing findings and offering recommendations supported by data discussed in the report We‘ll speak interchangeably of a ―summary report,‖ ―an epidemiologic profile,‖ and a ―factsheet.‖ The difference is a matter of coverage and comprehensiveness; the concepts we‘ll cover apply to each, regardless of what we call it. The report can be tailored to best address its intended uses. An epidemiologic profile draws together data from multiple sources to present as complete a description as possible of the status of a disease or health condition. The profile employs epidemiologic principles and measures using these data to clearly characterize the status of a disease or health condition. It is important to note that an epi profile/fact sheet can be developed for most diseases/health conditions and programs. In this session, we will use primarily HIV/AIDS example, as many of you work in this area and also because the most data are readily available on this condition to support your practicing the concepts covered. The concepts and skills are applicable to any disease you might want to highlight.

Why Develop an Epi Profile?

To create a consolidated resource to: • Communicate information for program planning and decision-making • Help define needs and communicate priorities • Create a baseline for comparisons over time • Assist in monitoring progress and evaluating programs

Session XVIII: Creating an Epidemiological Profile 498 Epidemiology for Data Users (EDU) Trainer’s Manual

Speaker Notes: Developing a profile provides an opportunity to SHARE data – collecting and having them are NOT sufficient – must USE them! Data and thoughtful analyses can help support, for example, requests for resources to address identified needs or to reallocate resources to other areas Examples of Profiles: 1. Annual or other periodic reports to governmental decision makers or external funding agencies – for example, the Ministry of Health develops an Annual Health Statistical Bulletin 2. NAC develops a Zambia Country Report to the United Nations Special Assembly, and reports to various funding agencies 3. WHO provides country fact sheets about certain conditions for reference on the web In addition to policy and funding purposes, such reports are often used by health care providers, CBOs, educators, media, and advocacy groups to guide their activities Such a report can, by sharing data and other information, improve the likelihood that the public will hear similar data from different educational sources, and can help to reduce the number of special data requests made of the areas that collect data. In NASTAD work with the State government in Andhra Pradesh, India, a detailed Epi Profile was designed to influence NGOs to focus their programs on areas the State saw as needs (rather than only on the NGO‘s interests)

Session XVIII: Creating an Epidemiological Profile 499 Epidemiology for Data Users (EDU) Trainer’s Manual

Importance of an Epi Profile

• Analyzing data from multiple sources can help identify priority needs and emerging problems • Supporting program decisions with data can increase likelihood of acceptance • Sharing data from different sources with involved sites may identify explanations for data inconsistencies and lead to changes in provider behavior

Importance of an Epi Profile, cont’d

• Developing products like epidemiologic profiles that assemble data from different sources in one accessible place can help shape and guide decisions • Remember -- data are useful to the extent that they are analyzed, shared/ communicated, and used -- not just collected!

Session XVIII: Creating an Epidemiological Profile 500 Epidemiology for Data Users (EDU) Trainer’s Manual

What is an Epidemiologic Profile?

• A document that uses epidemiologic principles to clearly characterize the status of a disease in a population and the services available to address it • Involves extraction and summarization of data from key data sources – For a defined geographic region – Over a defined period of time • Provides analysis and interpretation of these data in text and figures • Produces a written report that concisely describes findings and provides recommendations

Speaker Notes: Produce a written report concisely describing findings and offering recommendations supported by data discussed in the report We‘ll speak interchangeably of a ―summary report,‖ ―an epidemiologic profile,‖ and a ―factsheet.‖ The difference is a matter of coverage and comprehensiveness; the concepts we‘ll cover apply to each, regardless of what we call it. The report can be tailored to best address its intended uses. An epidemiologic profile draws together data from multiple sources to present as complete a description as possible of the status of a disease or health condition. The profile employs epidemiologic principles and measures using these data to clearly characterize the status of a disease or health condition. It is important to note that an epi profile/fact sheet can be developed for most diseases/health conditions and programs. In this session, we will use primarily HIV/AIDS example, as many of you work in this area and also because the most data are readily available on this condition to support your practicing the concepts covered. The concepts and skills are applicable to any disease you might want to highlight.

Session XVIII: Creating an Epidemiological Profile 501 Epidemiology for Data Users (EDU) Trainer’s Manual

Why Develop an Epi Profile?

To create a consolidated resource to: • Communicate information for program planning and decision-making • Help define needs and communicate priorities • Create a baseline for comparisons over time • Assist in monitoring progress and evaluating programs

Speaker Notes: Developing a profile provides an opportunity to SHARE data – collecting and having them are NOT sufficient – must USE them! Data and thoughtful analyses can help support, for example, requests for resources to address identified needs or to reallocate resources to other areas Examples of Profiles: 1. Annual or other periodic reports to governmental decision makers or external funding agencies – for example, the Ministry of Health develops an Annual Health Statistical Bulletin 2. NAC develops a Zambia Country Report to the United Nations Special Assembly, and reports to various funding agencies 3. WHO provides country fact sheets about certain conditions for reference on the web In addition to policy and funding purposes, such reports are often used by health care providers, CBOs, educators, media, and advocacy groups to guide their activities Such a report can, by sharing data and other information, improve the likelihood that the public will hear similar data from different educational sources, and can help to reduce the number of special data requests made of the areas that collect data.

Session XVIII: Creating an Epidemiological Profile 502 Epidemiology for Data Users (EDU) Trainer’s Manual

In NASTAD work with the State government in Andhra Pradesh, India, a detailed Epi Profile was designed to influence NGOs to focus their programs on areas the State saw as needs (rather than only on the NGO‘s interests)

Importance of an Epi Profile

• Analyzing data from multiple sources can help identify priority needs and emerging problems • Supporting program decisions with data can increase likelihood of acceptance • Sharing data from different sources with involved sites may identify explanations for data inconsistencies and lead to changes in provider behavior

Importance of an Epi Profile, cont’d

• Developing products like epidemiologic profiles that assemble data from different sources in one accessible place can help shape and guide decisions • Remember -- data are useful to the extent that they are analyzed, shared/ communicated, and used -- not just collected!

Session XVIII: Creating an Epidemiological Profile 503 Epidemiology for Data Users (EDU) Trainer’s Manual

Speaker Notes: A general process for Profile development may be logically broken down into 10 steps.

Speaker Notes:

Session XVIII: Creating an Epidemiological Profile 504 Epidemiology for Data Users (EDU) Trainer’s Manual

Having seen all the steps together, we‘ll further explore each one individually You‘ll notice that the steps to produce the report itself are the same as those discussed in the session on report writing

Practical Scenario: Developing Your Own Epi Profile You will carry out 10 steps to develop your own Epi Profile – some have been decided for you: 1. Audience = • Decision-makers at provincial and national level 2. Scope = • For HIV/AIDS program in your region, highlight key indicators used by MOH and NAC for decision- making (refer to ‘Integrated List of Indicators’) • Optional: select 2-3 indicators important to your work or region

Speaker Notes: Detailed enough to present and discuss available data

3. Questions an Epi Profile Answers

• What socio-demographic characteristics describe and Who differentiate the general and infected populations?

• What is the extent of the disease? What have been the What trends over time?

• What is the extent of factors placing people at risk for How the disease?

• What is the availability of services to prevent, treat, or Where care for the disease?

Speaker Notes: These are key questions to address in describing a health condition – the degree to which each is explored depends on the scope of the report.

Session XVIII: Creating an Epidemiological Profile 505 Epidemiology for Data Users (EDU) Trainer’s Manual

Wouldn‘t it be handy to have all of the answers in one easy-to-reach report or fact sheet?

Developing Your Own Epi Profile

4. Format - use template provided, which includes the following sections: – Introduction – Background – Methods – Results/findings – Recommendations 5. Data sources - HMIS, NACMIS, SmartCare, Census, and 2007 DHS

Data Sources Contributing to Epi Profile

SmartCare

HMIS NACMIS Epi Profile

DHS & Census Behavioral Surveys

Speaker Notes: We‘ve discussed all of these sources in earlier sessions and explored several in depth Bringing data analyses from these various sources together in one resource can help us (and others) use them to the address the issues at hand.

Session XVIII: Creating an Epidemiological Profile 506 Epidemiology for Data Users (EDU) Trainer’s Manual

Developing Your Own Epi Profile

6. Assess the data (consider any limitations) 7. Analyze the data (summarize, explore trends, consider how best described, etc.) 8. Synthesize the data across sources, interpret, and present findings 9. Draw conclusions & make recommendations – Acknowledge limitations 10.Review profile with team and bring to Data Quarterly Review Meeting

Speaker Notes: Finding the right data for your epi profile requires critical assessment of every data element that you plan to use. In order to make sure that the data truly answers your question, you need to ensure that the limitations and quality of your data fit! Follow this guide to assessing data while doing your EpiProfile.

Session XVIII: Creating an Epidemiological Profile 507 Epidemiology for Data Users (EDU) Trainer’s Manual

Epi Profile Exercise

Team exercise will develop two products: 1) Written Epi Profile Report (final version due in 3-4 months at Quarterly Review Meeting) – One Every Team member is to be involved in developing the written report, with each responsible for specific parts

2) Oral Presentation (15 minutes MAXIMUM) to summarize preliminary written report (due tomorrow afternoon) – One or more Team members present orally

Speaker Notes: Explain that need to continue to work on developing this after the training. This needs to be prepared in time for the first quarterly review meeting (MOH and NAC)

Suggested Epi Profile Organization

Determine how your team will function: • Develop an outline for Epi Profile (for exact statistics to be inserted later) by end of today • Decide who from your team will write each section • Plan time (set target times to finish key steps • Focus your time today and tomorrow on developing graphs and charts for all indicators (since you have HMIS, NACMIS, SC and CSO experts available these next two days) • Each team member should produce at least two graphs or chart (share indicators)

Speaker Notes: Advise participants to plan carefully but not spend all of their time perfect a plan – they can and likely will modify their approach as they go along. Try to develop an outline (to insert statistics later) in Word.

Session XVIII: Creating an Epidemiological Profile 508 Epidemiology for Data Users (EDU) Trainer’s Manual

Oral Presentation

Each Team will have 15 minutes to summarize key points from its written report (total 6 slides) • Identify key points for oral presentation and decide who will present (one or more people) • Introduce presentation, summarize on one text slide • Discuss key findings, support with < 4 slides of figures to illustrate findings (each has one SOCO) • Present conclusions/recommendations, summarize on one text slide • Feedback provided after each presentation

Speaker Notes Figures may include graphs, charts, or maps and should be appropriate to the data presented SOCO = Single Overriding Communication Objective Reinforce that Teams may use up to 6 slides total in the oral presentation, two with text and 4 with graphs, charts, or maps, and that faculty will be available throughout the exercise to help guide people in the appropriate directions and to answer questions Faculty should be prepared to listen and provide feedback after each presentation

Session XVIII: Creating an Epidemiological Profile 509 Epidemiology for Data Users (EDU) Trainer’s Manual

Teams for Exercise

• ______Province • ______Province – Xxx – Xxx – Xxx – Xxx – Xxx – Xxx • ______District • ______District – Xxx – Xxx – Xxx – Xxx – Xxx – Xxx • Etc. • Etc.

Speaker Notes: If teams have not already been assigned, use this slide to provide team assignments with members of each by name; include (if possible), people from the same province or district, with representation from MoH (DHIO, DHO, PHO), NAC (PACA, DACA), and CSO (central CSO, PCSO)

Facilitators are here to assist, as needed

Enjoy your work!

Session XVIII: Creating an Epidemiological Profile 510 Epidemiology for Data Users (EDU) Trainer’s Manual

WRAP-UP AND REVIEW

Objectives

Participants should be able to: • Apply course skills to develop an epidemiologic profile • Synthesize health data from Zambian data sources to highlight disease trends, services, and needs

Speaker Notes: You should now be able to meet these objectives and develop an Epi Profile or similar report

RESOURCE CENTER

Resource Center

• CDC. “Completing the Epidemiologic Profile.”http://www.cdc.gov/hiv/topics/s urveillance/resources/guidelines/epi- guideline/chapter4.htm

Session XVIII: Creating an Epidemiological Profile 511 Epidemiology for Data Users (EDU) Trainer’s Manual

APPENDIX A: EPI INFO HANDOUT Download Epi Info at: http://www.cdc.gov/epiinfo

OPEN Epi Info:

The following should appear on your screen:

Appendix A: Epi Info Handout 512 Epidemiology for Data Users (EDU) Trainer’s Manual

Action: Click the „Analyze Data‟ Button

The following should appear on your screen:

Appendix A: Epi Info Handout 513 Epidemiology for Data Users (EDU) Trainer’s Manual

IMPORT a data set from Excel into Epi Info:

Overview Data sets from software programs such as Excel can be imported into Epi Info. The Read command is able to access and open data tables stored either in the Epi Info data format or in other data formats, such as Excel or Access.

Procedure Follow the steps below to import a data set from Excel:

1. Click Analyze Data button from the main menu 2. Click on Read (Import)

3. Click on Data Formats drop-down box and select Excel 8.0 (or whatever version you want to import) 4. Click on Data Source (…) and Analysis will display a window in which you can select the Excel file you want to read: C:\ …\ 5. Click on your file and then click Open 6. Click on desired Sheet and then click OK (leave the default option in the box First Row contains field names).

Appendix A: Epi Info Handout 514 Epidemiology for Data Users (EDU) Trainer’s Manual

Check When the Read command has been executed, the Analysis Output section will display the Current View, Record Count, and Date to indicate the data table is open. ● Current View indicates the data table that is open in Epi Info. ● Record Count indicates the number of records that has been identified (read) by the Read command. ● Date gives the date and time the table was opened in Epi Info.

Using the LIST Command

Overview The data set is now open and available in Epi Info. However, it is not displayed yet. The List command is used to display either all or selected variables from the data set.

Procedure Follow the steps below to display the table you just read, ‗dataset.xls‘

1. From the command tree on the left side of the screen, click on the Analysis command under the Statistics folder 2. Click the List command 3. In the List dialog box that appears, click the Variables drop-down list to display all the variables that are available. Notice that the fields in the drop-down list are listed alphabetically. You can select specific variables that you want to display by clicking on the variable name.

Appendix A: Epi Info Handout 515 Epidemiology for Data Users (EDU) Trainer’s Manual

4. Click OK and note the frequencies of the variable.

Check The output will produce a line listing of the variables

Running a FREQUENCY on a specific variable

Overview Frequencies are tables that show how many records have each value of a categorical variable (for example: sex or marital status) and the percentage of records that have each value.

Procedure Follow the steps below to run a frequency:

1. Click Analysis button from the main menu 2. From the command tree on the left side of the screen, click on the Frequencies command 3. From the Frequencies of box, use the down arrow to select a variable.

4. Click OK and note the frequencies of the variable.

Appendix A: Epi Info Handout 516 Epidemiology for Data Users (EDU) Trainer’s Manual

*Note that frequencies can be calculated for more than one variable at a time or can be stratified by another variable.

Check Epi Info produces the distribution as percentages and in a table

Using the MEANS command to generate descriptive statistics on continuous variables

Review The Means command is used to generate descriptive statistics on continuous variables

Appendix A: Epi Info Handout 517 Epidemiology for Data Users (EDU) Trainer’s Manual

Procedure Follow the steps below to run the means command:

1. Click the Means command under the Analysis folder 2. In the Means dialogue box that appears, select ‗variablename‘ from the list of variables under Means of drop-down list 3. Click OK

Check The output should display the variables‘ minimum, median, maximum values, as well as mean, median and standard deviation.

Creating Graphs

Review Graphs are a useful way to visualize data. Epi Info can be used to create: ● Line graphs ● Bar charts

Appendix A: Epi Info Handout 518 Epidemiology for Data Users (EDU) Trainer’s Manual

● Histograms ● Pie charts ● others

Procedure Follow the steps below to create a graph:

1. Click the Graph command 2. Select type of graph from Graph Type 3. Select the variables for the x-axis from ―x-axis, main variables‖ 4. Select the type of value to display from the ―y-axis, show value of‖ 5. Select the variables to display from ―series, bar for each value of‖ 6. Click OK Check The output should display the graph

Appendix A: Epi Info Handout 519 Epidemiology for Data Users (EDU) Trainer’s Manual

APPENDIX B: STATISTICAL FORMULAS Formula for T-test:

Formula for T-Test

difference between 2 means t = standard error of the difference between 2 means

44

Formula for T-Test

X1 = mean of values in group 1 X2 = mean of values in group 2 N1 = total number in group 1 N2 = total number in group 2 S1 = standard deviation in group 1 S2 = standard deviation in group 2

45

Appendix B: Statistical Formulas 520 Epidemiology for Data Users (EDU) Trainer’s Manual

Formula for χ2-square:

The Chi-square 2 Formula:

• 2 = ∑ (Observed – Expected)2 / Expected

OR

• 2 = [(ad-bc)]2(T)/(a+b)(c+d)(a+c)(b+d)

46

Example:

Chi-Square Smoking and Lung Cancer Expected

Cases Controls

(69*60)/120 (69*60)/120 Smoker 34.5 34.5

(51*60)/120 (51*60)/120 Non-Smoker 25.5 25.5

2 = ∑ (Observed – Expected)2 / Expected

2 2 2 2 = (41-34.5) + (28-34.5) + (19-25.5) + (32-25.5) = 5.76 34.5 34.5 25.5 25.5

47

Appendix B: Statistical Formulas 521