Workshop
Synthetic Data
Innovation, Limitations and Practical Applications
Organisers
Talha Iqbal
Insight SFI Research Center for Data Analytics, University of Galway
Muhammad Ali Farooq
School of Electrical Engineering, University of Galway
Peter Corcoran
School of Electrical Engineering, University of Galway
Ihsan Ullah
Insight SFI Research Center for Data Analytics, University of Galway
Length
2 hours
Description
In recent years, the rise of data-driven technologies has shown how important high-quality data is for machine learning and artificial intelligence (AI) applications. However, acquiring sufficient real-world data to train and test these models often presents significant challenges, ranging from data scarcity and privacy concerns to issues of bias and representativeness. In response to these challenges, synthetic data has emerged as a promising solution: data generated artificially by algorithms and statistical models that replicate the patterns, characteristics, and relationships found in real-world data. Synthetic data can serve as a valuable supplement to, or replacement for, real data, especially where gathering sufficient real data is difficult or where privacy concerns limit data access. Additionally, synthetic data generation techniques can be tuned to address the bias present in a dataset by generating data that is more representative and balanced. While synthetic data offers many advantages, including the ability to augment available data, enhance model performance, and mitigate privacy concerns, it cannot fully replace real-world data in all domains. Real data often contains multi-layered details and complexities that are difficult to replicate accurately. Thus, it is crucial to consider the limitations and biases inherent in synthetic data and to interpret results accordingly.
The proposed workshop aims to provide a platform for researchers, practitioners, and industry experts to exchange insights, share best practices, and explore cutting-edge developments in synthetic data generation. By fostering collaboration and interdisciplinary interaction, it seeks to advance our understanding of synthetic data’s capabilities, limitations, and ethical implications, including fairness, in the context of machine learning and AI. The workshop will be 2 hours long and will cover a wide range of topics, including but not limited to:
Applications of synthetic data in training machine learning models, reducing bias, enhancing privacy, and improving data diversity.
Privacy-preserving techniques (pseudonymization, anonymization, and differential privacy) in the context of synthetic data generation.
Challenges and limitations of synthetic data in terms of privacy risks, quality assessment, and generalization to diverse datasets.
Case studies and real-world implementations of synthetic data in healthcare, autonomous vehicles, and virtual environments (domain transfer).
Regulatory considerations, compliance with data protection laws, and implications for synthetic data generation.
Large language models for generating high-quality synthetic data in various domains, with a focus on different modalities (image, tabular, and time-series data).
Diffusion models for synthetic data generation.
Vision-language models for synthetic data generation.
Submission Instructions
Submission to the workshop follows the same procedure as for main conference papers. This workshop supports both ordinary and short paper submissions. Please select the name of the Special Session you are interested in under the "additional questions" part of the submission. Please note that submission dates are the same as in the main conference schedule.