Data Services
Data Services is a content and technical advisor for projects leveraging HUS data, performing data extractions in the HUS Data Lake for various purposes, such as scientific research and knowledge management.
HUS Data Services
We provide access to the world’s most comprehensive datasets on specialized health care.
We can deliver Real World Data (RWD) for various use cases, efficiently and securely. The most common purposes are scientific research, information management and authorities’ requests.
The HUS Data Lake contains register data from patient information systems in the various specialties at HUS, such as patient visits and inpatient care periods, diagnoses, procedures performed, laboratory examinations, surgical operations, imaging, pathology samples, intensive care and anesthesia. Medical images, signal data and genome data are also available.
You must have a research permit and/or a data access permit to use the data. Instructions on how to apply for a permit are available here. If you wish to find out whether data are available or how large the target cohort is, you may submit a preliminary study request to Data Services for your research plan. If you intend to process and combine datasets from various data controllers (for example HUS and other wellbeing services counties), the data permit process is managed by Findata.
In the HUS Data Lake, the data from patient registers and certain administrative registers are organized so that they can be used for purposes such as scientific research, knowledge management or patient care. The HUS Data Lake integrates data from more than 100 different patient information systems and quality registers.
The Data Lake is a patient register that compiles a variety of patient-related health information and is a part of the broader HUS patient information system. It complies with the same data security and GDPR requirements as all other patient and administrative information systems used at HUS, and a Data Protection Impact Assessment (DPIA) has been carried out. The HUS Data Lake does not combine registers; the datasets retrieved from various registers are kept separate by technical means.
The data in the Data Lake are pseudonymized. All datasets in the HUS Data Lake can potentially be joined via pseudonymized identifiers.
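As a rough illustration of what joining via pseudonymized identifiers means in practice, the sketch below joins two toy extracts on a shared pseudonym column. It is a minimal example under stated assumptions: the column name `pseudo_id`, the helper `join_on_pseudonym` and all rows are hypothetical and invented for illustration. They do not reflect the actual HUS Data Lake schema or delivery format.

```python
from collections import defaultdict

# Toy extracts with invented rows. "pseudo_id" is a hypothetical column name
# standing in for whatever pseudonymized identifier a delivered dataset uses.
visits = [
    {"pseudo_id": "P001", "visit_date": "2021-03-01", "specialty": "cardiology"},
    {"pseudo_id": "P002", "visit_date": "2021-04-12", "specialty": "neurology"},
    {"pseudo_id": "P001", "visit_date": "2021-06-20", "specialty": "cardiology"},
]
diagnoses = [
    {"pseudo_id": "P001", "icd10": "I21.0"},
    {"pseudo_id": "P003", "icd10": "E11.9"},
]


def join_on_pseudonym(left, right, key="pseudo_id"):
    """Inner-join two lists of dict records on a shared pseudonymized key."""
    # Index the right-hand records by pseudonym for constant-time lookups.
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    # Emit one merged record per matching (left, right) pair.
    joined = []
    for row in left:
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined


result = join_on_pseudonym(visits, diagnoses)
for row in result:
    print(row)
```

Only P001 appears in both toy extracts, so the result contains its two visits, each paired with the diagnosis row. The join needs no real identity, only a pseudonym that is consistent across the datasets.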
Our work includes finding data for a wide range of use cases, improving the usability and quality of data, investigating and processing new datasets integrated into the Data Lake and generating datasets to add to our service offering. We also validate data, contribute to reporting development and are involved in various projects. Our goal is to provide a seamless service and to deliver high-quality data to customers.
You can submit a data request, a preliminary study request, a cost estimate request or a data transfer request (into a secure user environment) through the Data Portal.
Before applying for a registry research permit, please submit a preliminary study request (e.g. a report on the size of the target dataset or on data availability; no permit is required for this). This lets us ensure that the data required for your research are available.
A well-thought-out specification of your dataset needs at the permit application stage will speed up both the processing of the application and the creation of the dataset itself.
You can monitor the progress of your data requests and other service requests in the Data Portal.
Datasets subject to the Secondary Use Act will be delivered to a secure user environment (such as HUS Acamedic). We can also deliver datasets as per your data access permit to another audited secure user environment that complies with the Secondary Use Act.
For the time being, HUS has decided to cover the costs of using Data Services for HUS-based research projects. A HUS-based research project is defined as one where the responsible researcher and a significant percentage of the research team members are employed at HUS.
Datasets in the Data Lake
Patient background data
- Demographic data, e.g. date of birth, date of death, gender, municipality
- Source systems and availability:
  - Uranus and Apotti:
    - 2004–2023
    - background data for a total of c. 3.5 million patients
    - height and weight data for a total of c. 1 million patients
Referral data
- Data recorded from referrals, e.g. date of referral, referring unit and department, receiving unit, approval date, referral diagnosis, referral type (e.g. consultation request)
- Source systems and availability:
  - Uranus:
    - 2002–2013, partial
    - 2013–2020, c. 400,000 referrals per year
  - Apotti:
    - 2018–2019, partial
    - From 2020, c. 600,000 referrals per year
Diagnosis data
- Diagnosis data entered in the patient information system, e.g. ICD-10 diagnosis code, the unit entering the diagnosis, update date, primary or secondary diagnosis, recorded time when diagnosis started
- Source systems and availability:
  - Uranus:
    - 2004–2007, partial
    - 2008–2020, c. 2–6 million diagnoses per year
  - Apotti:
    - From 2020, partial
    - From 2022, c. 12 million diagnoses per year
Visit data
- Patient visit data, e.g. time of visit, health care unit, specialty, type of visit, primary diagnosis, further treatment facility
- Source systems and availability:
  - Uranus:
    - 2004–2008, partial
    - 2009–2020, c. 2–3 million visits per year
  - Apotti:
    - From 2018, gradually
    - From 2021, c. 3.5 million visits per year
    - From 2021, appointment booking data
Inpatient care period data
- Data recorded during inpatient care periods, e.g. start and end time of inpatient care, health care unit, receiving unit and further treatment unit, specialty, primary diagnosis and primary procedure
- Source systems and availability:
  - Uranus:
    - 2004–2008, partial
    - 2009–2019, c. 200,000 inpatient care periods per year
  - Apotti:
    - From 2018, c. 200,000 inpatient care periods per year
Patient record texts
- Patient record texts, statements, specialty (tab) and care report texts entered in patient information systems
- Source systems and availability:
  - Uranus:
    - 2002–2004, partial
    - 2005–2020, c. 2–6 million patient record texts per year
  - Apotti:
    - From 2021, c. 7 million patient record texts per year
- Sexual assault data have been removed from the dataset
- For privacy protection reasons, it is advisable to limit the requested texts in a data access request, for instance by search term or by specialty
Laboratory examination data
- Data entered in the laboratory system, e.g. laboratory examination number, sampling time, commissioning unit and examination result
- ECG examination visit data
- Source systems and availability:
  - Multilab: from 2000, c. 10–70 million laboratory examinations per year
  - Muse: from 2010, c. 100,000–200,000 ECG examinations per year
Procedure data
- Entries related to a procedure, e.g. time and procedure code, primary or secondary procedure
- Urgency of surgical procedure, duration of surgery, accessories used and anesthesia data
- Source systems and availability:
  - Uranus:
    - 2004–2008, partial
    - 2009–2019, c. 1–2 million procedures per year
  - Opera:
    - 2005–2009, partial
    - 2010–2020, c. 100,000 surgical procedures per year
  - Apotti:
    - From 2021, c. 3 million procedures per year
    - From 2021, data on c. 100,000 surgeries per year
Pathological examination data
- Pathological examinations, examination results and statements, Snomed code set
- Source systems and availability:
  - Qpati:
    - 1987–1993, c. 16,000–65,000 samples per year
    - 1994–2021, c. 90,000–400,000 samples per year
Medication data
- Prescriptions entered for the patient, medicine dispensation entries
- Source systems and availability:
  - Uranus:
    - 2012–2020, c. 1–4 million prescriptions per year
    - 1–4 million medicine dispensation entries per year
  - Kemokur:
    - 2014–2020, c. 60,000 courses of treatment and dispensation entries per year
  - Apotti:
    - From 2019, 1–8 million prescriptions per year
  - Marela:
    - From 2003, 40,000–90,000 medicine orders per year
Intensive care and anesthesia data
- Data entered in intensive care systems
- Limited availability of monitoring data
- Source systems and availability:
  - Caresuite Picis82: 2012–2020
  - Clinisoft Jorvi: 2001–2019
  - Clinisoft Haartman Malmi: 2019
  - Caresuite Picis80: 2009–2014
  - Caresuite Meipicis: 2003–2009
  - Caresuite Peipicis: 2006–2009
  - Caresuite Toopics: 2005–2009
  - Clinisoft LNS kix: 1999–2019
  - Clinisoft LNS Que: 1999–2019
  - Clinisoft Meilahti: 1999–2019
Prehospital emergency care calls
- Data recorded for prehospital emergency care calls, e.g. emergency vehicle time stamps, patient medication, patient records
- Source systems and availability:
  - Merlotmedi:
    - 2007–2012, partial
    - From 2013, c. 100,000–200,000 calls per year
Imaging examination data
- Imaging examination data, e.g. time, examination number, referral, visit details, statement details
- Imaging data available on the PACS server
- Source systems and availability:
  - Mustiradu:
    - 1996–1998, partial
    - 1999–2013, c. 200,000–1,000,000 imaging examinations per year
  - HUSRadu:
    - 2013, partial
    - From 2014, c. 1.5 million examinations per year
    - Replaced by Apotti in 2021–2022
  - Apotti:
    - 2021, partial
    - From 2022, c. 1.3 million examinations per year
Childbirth data
- Data recorded on childbirths
- Source systems and availability:
  - Obstetrix: 2005–2019
Prehospital emergency care data
- Data on first response procedures and patient transports
- Source systems and availability:
  - Merlotmedi: from 2008, c. 100,000 calls per year
Basic patient data and measurement results
- Patient data structurally entered in the care table, e.g. weight, height, blood pressure, body temperature, recording unit and time
- Source systems and availability:
  - Uranus:
    - 2013–2020, c. 10 million entries per year
  - Apotti:
    - From 2021, c. 1.1 billion follow-up data entries per year
BCB quality registry data
- Registers with data available:
- Asthma: a total of c. 10,000 patients since 2018
- Back: a total of c. 31,000 patients since 2016
- Bipolar disorder register: a total of c. 700 patients 2018–2020
- Bladder cancer onc: a total of c. 150 patients 2017–2019
- Brain tumor: a total of c. 4,000 patients since 2019
- Breast cancer onc: a total of c. 14,000 patients since 2015
- Breast cancer: a total of c. 13,000 patients since 2015
- Cardiac arrest: a total of c. 4,000 patients since 2020
- Cardiac surgery: a total of c. 21,500 patients since 2017
- Cataract: a total of c. 59,000 patients since 2014
- Catheter valve (TAVI): a total of c. 750 patients 2017–2021
- Child and adolescent psychiatry: c. 300 patients 2018–2020
- Cornea: a total of c. 700 patients since 2020
- Diabetes: a total of c. 7,300 patients since 2018
- Epilepsy: a total of c. 6,400 patients since 2018
- Fracture: a total of c. 20,000 patients since 2018
- Glaucoma: a total of c. 500 patients since 2021
- Gynecological cancers: a total of c. 8,300 patients since 2018
- Head and neck cancers onc: a total of c. 3,600 patients since 2018
- Head and neck cancers: a total of c. 10,400 patients since 2018
- Hepatitis C: a total of c. 1,000 patients since 2019
- Hernia: a total of c. 32,000 patients since 2016
- Hip: c. 29,000 patients since 2018
- HUSUKE (deformities of head and face): a total of c. 11,000 patients since 2016
- IBD: a total of c. 8,200 patients since 2016
- Implantdb: a total of c. 40,000 patients 2007–2021
- Infcare: a total of c. 1,800 patients since 2013
- Invasive cardiology (PCI/ANGIO register): a total of c. 21,000 patients 2017–2021
- Kidney cancer onc: c. 1,400 patients since 2018
- Knee: c. 30,000 patients since 2018
- Lung cancer: a total of c. 2,300 patients since 2019
- Lymphoma onc: c. 3,600 patients since 2018
- Muscular flap: a total of c. 2,300 patients since 2019
- Nephrology: a total of c. 6,000 patients since 2015
- Neuromodulator: a total of c. 1,100 patients since 2018
- Nose: a total of c. 12,000 patients since 2019
- Obesity: c. 4,000 patients since 2014
- Opioid replacement therapy: a total of c. 500 patients since 2021
- Pacemaker: a total of c. 35,000 patients since 2014
- Pad: a total of c. 50,000 patients since 2016
- Pediatric and adolescent cancer: c. 300 patients 2016–2021
- Pediatric back disorders: c. 1,300 patients since 2015
- Pediatric cancer: c. 1,600 patients since 2018
- Pediatric fractures: c. 22,000 patients since 2016
- Prostate onc: a total of c. 3,600 patients since 2017
- Prostate: a total of c. 6,700 patients since 2017
- Psychosis: a total of c. 600 patients 2018–2020
- Psychotherapy: a total of c. 14,700 patients since 2018
- Rare diseases: a total of c. 6,600 patients since 2017
- Rectal carcinoma onc: a total of c. 3,000 patients since 2017
- Rectal carcinoma: a total of c. 8,600 patients since 2014
- Renal cancer: c. 1,700 patients since 2018
- Resuscitation (MET team): a total of c. 5,000 patients 2015–2021
- Resuscitation: a total of c. 6,800 patients since 2015
- Retina: a total of c. 18,000 patients since 2015
- Rheumatoid arthritis: a total of c. 40,000 patients since 2015
- Sarcoma and gist: a total of c. 350 patients since 2020
- Skin cancer onc: a total of c. 800 patients since 2017
- Skin cancer: a total of c. 25,000 patients since 2017
- Spinal injury: a total of c. 2,000 patients since 2015
- Stroke (cerebral circulation disorder): a total of c. 47,000 patients since 2015
- Tissuedb: a total of c. 3,000 patients since 2007
- Transplantations: a total of c. 24,000 patients since 2015
- Urogynecology: a total of c. 24,000 patients since 2016
- Vascular anomalies: a total of c. 3,400 patients since 2016
- Vascular: a total of c. 27,000 patients since 2015
- Veins: a total of c. 10,000 patients since 2017
- Wounds: a total of c. 700 patients since 202
- Ylage: a total of c. 13,000 patients since 2019
Tips from Data Services for research permit applications
Always request a preliminary study first
Always request a preliminary study before applying for a permit for register-based research. A preliminary study investigates matters such as the size of the target group and the availability of data. It requires no research permit and incurs no cost. It also prevents unpleasant surprises, such as discovering after submitting a research permit application that data essential to the research are not available.
Describe the register data required accurately – use the Dataset Catalog
Please describe the register data you require as accurately as possible in your permit application documentation. You should use the HUS Dataset Catalog, which compiles some of the available datasets. (Dataset catalogs in other hospital districts may also be useful.) Ensuring that your research plan is consistent with your dataset request helps avoid unnecessary amendment applications and delays in delivering the datasets.
Indicate in your research plan and consent form that you will be using register data
If you are requesting datasets from Data Services under a research permit and your study is based on patient consent, you must state in your research plan and in the patient consent form that register data will be included in your research materials.
Keep your research group details up to date
Details of your research group must be kept up to date, because only persons designated as research group members on Tutkijan työpöytä are allowed to access the research materials of the study. Please add any new members to the research group on Tutkijan työpöytä and submit an amendment application so that the data processors are informed of them. You should also contact your research secretary. Likewise, update your research group details whenever a member leaves the group. Only persons named on the research permit will be issued access rights to HUS Acamedic.