As health care increasingly relies on artificial intelligence, new research from the Massachusetts Institute of Technology is raising concerns about how well patient privacy is protected in AI systems trained on electronic health records (EHRs). A paper co-authored by MIT researchers and presented at the 2025 Conference on Neural Information Processing Systems (NeurIPS) finds that even models trained on de-identified medical data can unintentionally memorise and reveal patient-specific information.
Foundation models built on large-scale EHR datasets are designed to generalise patterns across many patients to improve clinical predictions. However, the study highlights a phenomenon known as memorisation, in which a model reproduces details from an individual patient's record rather than relying on broader trends, creating a risk of privacy breaches. High-capacity AI models are already known to be vulnerable to this kind of data leakage under targeted prompting.
The research was led by Sana Tonekaboni, a postdoctoral researcher at the Eric and Wendy Schmidt Center at the Broad Institute of MIT and Harvard, in collaboration with Marzyeh Ghassemi, an associate professor at MIT and principal investigator at the Abdul Latif Jameel Clinic for Machine Learning in Health. The team developed a structured testing framework to evaluate whether attackers could extract sensitive health information from EHR-trained models.
Their findings suggest that the likelihood of information leakage increases as attackers gain more prior knowledge about a patient. The researchers also distinguish between relatively benign disclosures, such as age or general demographics, and more harmful leaks, including diagnoses of conditions like HIV or substance abuse.
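The paper itself describes the evaluation methodology in detail; the sketch below is only a rough illustration of the underlying idea. It simulates an extraction probe in which an attacker queries a model with progressively larger subsets of known attributes about a patient and checks whether a sensitive diagnosis surfaces in the output. Everything here is hypothetical: the `model_predict` stub, the toy record, and the attribute names are placeholders for illustration, not the authors' framework or data.

```python
# Illustrative sketch only -- not the authors' testing framework.
# It assumes a hypothetical query interface model_predict(prompt) -> str
# and a single toy "memorised" record, to show how disclosure can be
# probed with increasing amounts of attacker prior knowledge.

from itertools import combinations


def model_predict(prompt: str) -> str:
    """Stand-in for an EHR-trained model's completion. In a real
    evaluation this would call the model under test."""
    # Toy behaviour: pretend the model memorised one record verbatim
    # and regurgitates it when enough identifying details appear.
    memorised = "age 47, female, ZIP 02139, diagnosis: HIV"
    if "age 47" in prompt and "ZIP 02139" in prompt:
        return memorised
    return "no patient-specific details"


def probe_leakage(known_attrs: dict, sensitive_term: str) -> dict:
    """Query the model with every subset of attacker-known attributes
    and record which subsets cause the sensitive term to appear."""
    results = {}
    keys = list(known_attrs)
    for r in range(1, len(keys) + 1):
        for subset in combinations(keys, r):
            details = ", ".join(f"{k} {known_attrs[k]}" for k in subset)
            output = model_predict(f"Patient with {details}. Summary:")
            results[subset] = sensitive_term.lower() in output.lower()
    return results


if __name__ == "__main__":
    attacker_knowledge = {"age": 47, "sex": "female", "ZIP": "02139"}
    leaks = probe_leakage(attacker_knowledge, sensitive_term="HIV")
    for subset, leaked in sorted(leaks.items(), key=lambda kv: len(kv[0])):
        print(f"known: {subset!s:<30} sensitive diagnosis leaked: {leaked}")
```

In a real evaluation, the stub would be replaced by calls to the EHR-trained model under test, and the leaked-versus-not-leaked outcomes would be aggregated across many records to estimate how disclosure risk grows with the amount of prior knowledge an attacker holds.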
The study underscores the urgency of robust privacy safeguards as health data digitisation accelerates. In the past two years alone, the U.S. Department of Health and Human Services has reported 747 major health data breaches, most involving hacking incidents. Patients with rare or unique medical conditions face heightened risks, as they are easier to re-identify even in anonymised datasets.
The researchers argue that AI models should undergo context-specific privacy evaluations before deployment to ensure they do not meaningfully compromise patient confidentiality. Future work will expand the framework to include clinicians, legal experts, and privacy specialists.
“There’s a reason our health data is private,” Tonekaboni said. “That principle must be protected as AI becomes part of medical care.”


