Privacy and Data Security in SPIDeRR:
a Guide for UK Citizens.
Health data about people with musculoskeletal problems is a precious resource for research, to improve care in the future:
A third of people experience significant musculoskeletal problems during their lifetime, but it can take many weeks or months for them to get a diagnosis and start the right treatment. In order to develop new digital tools that help such people – and their doctors – to access the most useful healthcare services at the earliest possible stage, data scientists need to work with “real-world” data from the health records of thousands of people. In SPIDeRR, these real-world data will come from electronic health records (EHRs) of people in both primary and secondary care (family doctors and hospitals) across multiple countries in the EU, as well as the United Kingdom (UK).
Data scientists do not need to access the personal details of all these people, such as their names, dates of birth and postal addresses. They are not interested in this information, and it will be removed before the data are made available to the researchers. But they will have access to detailed information on measurements, symptoms, blood results and diagnoses for these people. All these data will help the scientists develop computer algorithms and artificial intelligence tools that doctors can use when individuals present with musculoskeletal symptoms for the first time, to help prioritise management strategies and, where necessary, referral pathways.
To enable this in the UK, we have worked hard to ensure the security of the data about people’s health that will contribute to digital tool development. Our approach has been subject to careful scrutiny by an independent Research Ethics Committee and a Confidentiality Advisory Group, both administered at a national level under the UK government’s Health Research Authority.
What follows is an explanation of the steps we have taken (i) to ensure the security of data obtained from primary and secondary care in SPIDeRR, and (ii) to be certain that the privacy of the people described by the data is respected. The SPIDeRR partners dealing with data from the UK are Newcastle University and the Academic Health Sciences Network for the Northeast and North Cumbria (NENC) – the information that follows only applies to UK citizens who live, or attend a GP or hospital, within this region of the UK (specifically, Newcastle upon Tyne Hospitals or GP practices in the vicinity); no data will be extracted from other UK sources during the lifetime of the SPIDeRR project.
Extracting EHR data about people with musculoskeletal complaints in hospital (secondary care).
A search will be conducted to identify EHRs relating to individuals presenting with musculoskeletal complaints, whether attending the rheumatology department or the Newcastle upon Tyne Hospitals more generally. Once the relevant health records have been found, they will be “de-identified at source,” using a three-step process. First, data items which could directly identify an individual, such as their name and address, will be removed from the record in their entirety. Second, other data items which could lead to identification will be transformed into a non-identifiable format; for example, date of birth will be replaced with age, and postcode with a social deprivation measure based on address location. Finally, an industry-standard tool will be used to create a project-specific patient number, allowing complete removal of patient identifiers from the dataset.
The “de-identified” data will then be transferred using a secure file transfer service to a “secure data environment” (SDE) housed by the lead academic institution for SPIDeRR, namely Leiden University Medical Centre (LUMC), in the Netherlands. An SDE is a “protected” space that is very secure and only accessible by registered individuals. De-identified data from UK EHRs can only be transferred into the SDE by responsible individuals at Newcastle upon Tyne Hospitals. With permission, named researchers across the SPIDeRR consortium can work on the de-identified data in the SDE, but they cannot copy or move it to another location; the only way the data can leave the SDE once transferred into it is by being deleted.
A legally binding data sharing agreement (DSA) for the SPIDeRR project has been reviewed and signed by the Information Governance Team at Newcastle upon Tyne Hospitals (with oversight from Newcastle University as the primary research partner) as well as by LUMC partners. This confirms that data handling will comply with the General Data Protection Regulation (GDPR), the European Union law that has been retained in UK law as the UK GDPR since “Brexit.”
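For readers who would like to see what the three-step de-identification described above might look like in practice, here is a minimal sketch in Python. It is an illustration only and not the software used by the project: the field names, the postcode lookup table and its value, and the use of a keyed hash (HMAC-SHA256) to stand in for the unnamed industry-standard tool are all assumptions made for this example.

```python
import hashlib
import hmac
from datetime import date

# Assumed field names; the real EHR layout is not described in this guide.
DIRECT_IDENTIFIERS = {"name", "address", "telephone", "email"}

# Hypothetical lookup from postcode to a deprivation decile (made-up value).
DEPRIVATION_BY_POSTCODE = {"AB1 2CD": 3}


def age_in_years(date_of_birth: date, on: date) -> int:
    """Whole years between the date of birth and a reference date."""
    had_birthday = (on.month, on.day) >= (date_of_birth.month, date_of_birth.day)
    return on.year - date_of_birth.year - (0 if had_birthday else 1)


def project_patient_number(nhs_number: str, secret_key: bytes) -> str:
    """Keyed hash used here as a stand-in for the unnamed industry-standard tool."""
    digest = hmac.new(secret_key, nhs_number.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def de_identify(record: dict, secret_key: bytes, on: date) -> dict:
    """Return a de-identified copy of the record; the input is left unchanged."""
    # Step 1: remove items that directly identify a person.
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Step 2: transform indirect identifiers into a non-identifiable form.
    out["age_years"] = age_in_years(out.pop("date_of_birth"), on)
    out["deprivation_decile"] = DEPRIVATION_BY_POSTCODE.get(out.pop("postcode"))
    # Step 3: replace the NHS number with a project-specific patient number.
    out["patient_number"] = project_patient_number(out.pop("nhs_number"), secret_key)
    return out
```

In this sketch the same NHS number always produces the same project-specific patient number for a given secret key, so records belonging to one person can still be linked without the NHS number itself appearing in the shared data.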
Extracting EHR data about people with musculoskeletal complaints in GP practices (primary care).
We will access information about people with musculoskeletal complaints attending their GP practices. We will prioritise GP practices involved in the care of the largest possible number of people with musculoskeletal complaints for whom we have also collected EHRs at Newcastle upon Tyne Hospitals, because this will ultimately give researchers the richest data with which to develop digital tools. We have had to adopt a different approach to extracting data in primary care, because GP practices are not usually resourced to spend time on data extraction, de-identification and transfer (to the SDE). Therefore, we are working with an NHS contractor called the NHS North of England Care System support service (NECS), which has the expertise to provide this service for our GP partners. An individual from NECS will be responsible for the de-identification of EHRs from primary care (including those linked to secondary care records) prior to their transfer into the SDE. Each participating GP practice will sign a data sharing agreement with LUMC, under the same conditions as those in place for secondary care data.
Frequently asked questions.
- I have already registered for the NHS National Data Opt Out, available to all NHS users. Could my health records still be used for the research described in SPIDeRR?
- No. As a first step, we will automatically filter out of our searches all individuals who have registered for the National Data Opt Out. If you have already registered for the National Data Opt Out and you don’t want to be part of SPIDeRR, you don’t need to do anything.
- Having found out about this project, I’ve decided I would not want data about me to be used in this way and I would like to OPT OUT of any participation in SPIDeRR. How do I do that?
- If you found out about this project from a poster displayed at your GP practice, or if you are a patient at a GP practice in the Northeast and North Cumbria, it is possible (but not necessarily the case) that your de-identified data could be used for SPIDeRR. If your GP practice is not in this region of the UK, your data will not be accessed as part of the SPIDeRR project, and you don’t need to do anything.
- If you think your data could be used for SPIDeRR and you want to opt out you can:
- National Data Opt Out. Choose this option if you do not want your NHS data to be used for research or planning, irrespective of the purpose. You can find out how here.
- Opt out of SPIDeRR only. Choose this option if you are happy for NHS data from your GP records to be used for any purpose except the SPIDeRR research project. If you follow the step(s) below, your GP data will be excluded from any future data extracts in relation to the SPIDeRR project. To do this, you can:
- E-mail nuth.spiderr@nhs.net. You can supply your NHS number in the email and simply say “I want to opt out of SPIDeRR”. If you don’t know your NHS number, you can give your name and date of birth. If you don’t want to include these details, you can instead supply a phone number so that we can call you back during working hours to collect them over the phone.
- Telephone 0191 213 7753. If we do not answer, we will call you back as soon as possible during working hours to take your details.
- Who is leading this research and how can I contact them?
- The overall academic lead of the SPIDeRR project is Professor Rachel Knevel, who can be contacted via the SPIDeRR contact form. The UK academic lead for the SPIDeRR project is Dr. Arthur Pratt, who can be contacted via: nuth.spiderr@nhs.net.
- Who is in charge of data protection for SPIDeRR and how can I contact them?
- For general data protection information about SPIDeRR, the SPIDeRR contact form may be used to make an enquiry. For matters regarding information governance at Newcastle Hospitals, contact the Information Governance department by emailing nuth.dpo@nhs.net or telephoning 0191 213 7089.
- What is the purpose and lawful basis of data collection in SPIDeRR and who is funding the project?
- The purpose of data collection in SPIDeRR is research for public benefit, and its lawful basis is UK GDPR Article 6(1)(e) and Article 9(2)(j). The SPIDeRR project is funded by the European Union. In the UK, research activities are funded by UK Research and Innovation (UKRI), following the UK’s exit from the EU.
- What types of personal data will be collected about me for SPIDeRR?
- The following categories of personal data will be removed from EHR data before it is shared, and will not be available to researchers:
- Name.
- Date of birth.
- Address (including postcode).
- Telephone number, email address and other means of contact.
- The following categories of personal data will be made available to researchers:
- Age (years)
- Sex
- Medical information available from health records including:
- Background medical diagnoses
- Pattern of musculoskeletal symptoms (including joints affected, type of pain, symptom duration, presence/absence of swelling, etc.)
- Blood test results
- Who will have access to my personal data?
- If your data are identified in secondary care records, a single member of the Newcastle upon Tyne Hospitals (NuTH) team will briefly have access to personal identifying information (including name, address, NHS number, etc.), but will quickly “de-identify” the data so that these features are removed.
- If your data are identified in primary care records, a single member of the NECS team will briefly have access to personal identifying information (including name, address, NHS number, etc.), but will quickly “de-identify” it so that these features are removed from it.
- Once data have been de-identified, SPIDeRR researchers will have access to them for research purposes only. Your data will be analysed alongside those of many thousands of other individuals across the EU. They will not leave the SDE during this process. Researchers will be based at SPIDeRR partner institutions listed here, and will need to have been given permission to access the data by the lead SPIDeRR investigator for that institution.
- Will my data be used for commercial purposes or to make money?
- No.
- For how long will data about me be available to SPIDeRR researchers?
- The SPIDeRR project is scheduled to run until April 2028. It is possible that an extension may be applied for, but this would need to be approved by the European Commission. When the project comes to an end, de-identified EHR data will be removed (deleted) from the SDE.