Why map the data?
Now that the purpose of the OHSS has been defined, you need to ascertain which surveillance systems, and therefore which data, are available to address that purpose. With this information you can determine where data sharing can occur and how the OHSS can be designed around existing structures.
Identify relevant surveillance systems and other sources of data
To identify surveillance systems and other sources of data that are potentially relevant to your OHSS, reach out to your stakeholder group. In addition to the obvious public-sector surveillance systems, this will help you identify surveillance systems run within the private sector, as well as non-classical surveillance activities, such as prevalence studies, that may also contribute data to your system.
What information do we need to collect?
For each surveillance system or study that could be included in the OHSS, you will need to collect information about the surveillance system or study itself (objective, situational context, surveillance strategy, sampling strategy, etc.), the variables for which data are collected, and how the surveillance data are managed from a technical perspective.
The information can be split into three main categories: background information about the surveillance system (setting the scene), the epidemiological and situational context within which the surveillance system operates (context), and the variables about which data are collected (variable data). Each button below lists baseline information that can be collected for each category to begin the descriptive process. This is a starting point; you will likely want to add or remove variables to suit the unique features of your project.
Setting the scene
Context
Variable description
How should the information be collected?
Where possible, the data description process should be completed jointly by the epidemiologist responsible for the data and the data scientist responsible for the technical management of the data (they may be one and the same person).
The information is best collected in a standardised format so that the similarities and differences between the data collected within each surveillance system are easy to identify. This approach also makes data availability and limitations (including quality) apparent.
The link below provides a Data Mapping tool (Excel format) that can be used to describe the data in a standardised manner based on the suggestions above. The tool can be modified to your needs by adding information you want to collect and removing variables that are not important to your project.
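As a rough illustration of why a standardised format helps, the sketch below describes two surveillance systems with the same set of fields and lists where they differ. It is not the Data Mapping tool itself: the class `SystemDescription`, the function `differing_fields`, the field names and the example values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, asdict


@dataclass
class SystemDescription:
    """One standardised description of a surveillance system (illustrative fields)."""
    name: str
    hazard: str
    sector: str
    storage_location: str
    software: str
    file_format: str


def differing_fields(a, b):
    """Return {field: (value_a, value_b)} for every field, except name, that differs."""
    da, db = asdict(a), asdict(b)
    return {k: (da[k], db[k]) for k in da if k != "name" and da[k] != db[k]}


# Made-up example entries, not real systems.
system_a = SystemDescription(
    "System A", "Salmonella", "animal health", "national server", "Access", "csv"
)
system_b = SystemDescription(
    "System B", "Salmonella", "food safety", "national server", "Excel", "xlsx"
)

diffs = differing_fields(system_a, system_b)
for field, (value_a, value_b) in diffs.items():
    print(f"{field}: {value_a} vs {value_b}")
```

Because both systems are described with identical fields, the differences (here sector, software and file format) fall out of a simple field-by-field comparison.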
Setting the scene
What is the pathogen(s), disease(s) or hazard(s) under surveillance?
Within which sector is this surveillance system executed (animal health, human health, food safety, etc.)?
Where are the data stored? (geographically)
Who has primary responsibility for the data?
Who has primary responsibility for analysis of the data?
For example: EU and local data protection laws, data confidentiality laws etc.
What data management software is used to manage the data?
e.g. Microsoft SQL Server Management Studio, Access, Excel
(xlsx, csv, etc.)
Surveillance structure from the epi context
Surveillance systems can be composed of several unique surveillance activities; these are called surveillance components.
Surveillance components may be differentiated by the target species or population, the data source, sampling strategy, type of surveillance and so on.
What is the pathogen(s), disease(s) or hazard(s) under surveillance?
Species: the susceptible animal population targeted by the surveillance system, and about which conclusions are drawn (eg. human, pig, chicken, rabbit etc)
Matrix: the potentially contaminated product or matrix targeted by the surveillance system and about which conclusions are drawn (e.g. cheese, bread, etc.)
When did data collection begin?
Is the surveillance system in place for the whole geographical area of the country, or only certain regions? Please describe.
What is the current status of the disease in the geographical area covered?
(eg. historically absent, absent, endemic etc)
What are the goals of the surveillance system that, when met, will result in the collection and analysis of the data needed to achieve the purpose of the system?
(eg. early detection, prevalence estimation, freedom from disease documentation etc)
Is the disease notifiable or reportable under EU legislation (animal health and food safety), or listed as communicable in Commission Implementing Decision (EU) 2018/945 (human health)?
What is the legal status of the disease in your country? (This may or may not be the same as the EU classification.)
Is this an active or a passive surveillance system?
Active surveillance: Investigator-initiated collection of animal health related data using a defined protocol to perform actions that are scheduled in advance. Decisions about whether information is collected, and what information should be collected from which animals, are made by the investigator.
Passive surveillance: Observer-initiated provision of animal health related data (e.g. voluntary notification of suspect disease) or the use of existing data for surveillance. Decisions about whether information is provided, and what information is provided from which animals, are made by the data provider.
What is the definition used to determine when a sample is considered to be positive for the hazard, disease or syndrome, or when the person, animal, herd, flock or matrix is considered to be positive for the hazard, disease or syndrome?
Include specifics about laboratory test used.
What sampling strategy is in place for this surveillance system?
eg. suspect, objective, selective, census, convenience, import or other? (definitions for each are provided in the ‘Definitions’ tab)
Describe in as much detail as possible how the sampling strategy is realised.
What unit is targeted in the surveillance system component
(is it the individual/animal, the group, the batch, the herd, etc)?
Where are the samples collected?
(eg. at the primary healthcare facility, at the hospital, on farm, at the processing plant etc)
What sample type(s) are collected?
(e.g. blood, faeces, tissue, etc.)
Who collects the samples?
(eg. primary care physician, veterinarian, farmer etc)
Variable and data description
try to create unique variables that are self-explanatory
describe/define the variable
For instance: 0 and 1
For instance: 0 = no, 1 = yes
(Int, float, char, string, date, time)
How are the data displayed (if applicable)
What percentage of records have a valid entry for this variable?
What percentage of records are correctly entered for this variable?
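The percentage of records with a valid entry can be estimated directly from the data once the permitted values of a variable have been described. The following is a minimal sketch, assuming records are held as a list of dictionaries and that "valid" means the entry is one of the permitted codes; the function name `percent_valid`, the variable name `test_result` and the example records are hypothetical. Checking whether entries are correctly entered generally needs a reference source to compare against, so it is not covered here.

```python
def percent_valid(records, variable, allowed_values):
    """Percentage of records whose entry for `variable` is one of `allowed_values`.

    Missing entries (key absent or None) and codes outside the permitted
    set both count as not valid. Returns 0.0 for an empty list of records.
    """
    if not records:
        return 0.0
    valid = sum(1 for r in records if r.get(variable) in allowed_values)
    return 100.0 * valid / len(records)


# Made-up example: permitted codes are 0 = no, 1 = yes.
records = [
    {"test_result": 1},
    {"test_result": 0},
    {"test_result": None},  # missing entry
    {"test_result": 9},     # code outside the permitted values
]

pct = percent_valid(records, "test_result", {0, 1})
print(f"{pct:.1f}% of records have a valid entry")
```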