Sharing the data that support published findings is crucial for reproducibility. Beyond verification, it lets other researchers test new predictions and explore unforeseen effects, accelerating research progress. However, ethical concerns sometimes prevent public data sharing, especially when participants’ anonymity is at risk. One solution is synthetic data: mathematically derived datasets that preserve key statistical properties of the original (e.g., variable means and variance-covariance structure) while protecting sensitive information. In this talk, I will introduce synthetic data, demonstrate its applications, and discuss its role in Open Research practices.
Presented at Midlands Innovation Open Research Week, 07/05/25.
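
To make the idea of preserving means and variance-covariance structure concrete, here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not necessarily the method or tooling presented in the talk: the function name `synthesize` is hypothetical, the "original" data are simulated stand-ins rather than real participant data, and the approach assumes continuous variables with more rows than columns and no perfectly collinear columns.

```python
import numpy as np


def synthesize(original, seed=0):
    """Return a synthetic dataset with the same column means and
    variance-covariance matrix as `original` (hypothetical helper).

    Assumes `original` is an (n, p) array with n > p >= 2 and a
    positive-definite sample covariance matrix.
    """
    rng = np.random.default_rng(seed)
    n, p = original.shape

    # Summary statistics we want the synthetic data to reproduce.
    mean = original.mean(axis=0)
    cov = np.cov(original, rowvar=False)

    # Start from fresh random noise that is unrelated to any real row.
    noise = rng.standard_normal((n, p))
    noise -= noise.mean(axis=0)

    # Whiten the noise so its sample covariance is exactly the identity.
    l_noise = np.linalg.cholesky(np.cov(noise, rowvar=False))
    white = np.linalg.solve(l_noise, noise.T).T

    # Re-colour the noise with the original covariance and shift to the mean.
    l_orig = np.linalg.cholesky(cov)
    return mean + white @ l_orig.T


# Simulated stand-in for a sensitive dataset: 200 rows, 3 correlated variables.
rng = np.random.default_rng(42)
base = rng.standard_normal((200, 3))
original = base @ np.array([[1.0, 0.5, 0.2],
                            [0.0, 1.0, 0.3],
                            [0.0, 0.0, 1.0]]) + np.array([10.0, 50.0, 3.0])

synthetic = synthesize(original)

print(np.allclose(synthetic.mean(axis=0), original.mean(axis=0)))
print(np.allclose(np.cov(synthetic, rowvar=False), np.cov(original, rowvar=False)))
```

In this sketch, no synthetic row is copied from an original row, yet analyses that depend only on means and covariances (for example, correlations or linear regression coefficients) give the same results on both datasets. Matching these two moments alone does not capture non-normal distributions or categorical variables, and it is not by itself a formal guarantee of anonymity.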