Benchmarking LLM-Based Synthetic Data Generators for Structural Coherence in Behavioral and Human-Centered Datasets

Scholarship details

Study levels	PhD
Close date	Monday, 22 September 2025
Domestic/international	Domestic Only

About the scholarship

This project benchmarks large language model (LLM)-based synthetic data generators, such as GReaT and TabulaLLM, with a focus on their ability to preserve structural coherence in behavioural and human-centred datasets. These datasets encompass psychological, educational, and user behaviour data that often include ordinal scales, categorical variables, logical constraints, and complex theory-driven relationships unique to human-centred research. The project will evaluate how effectively current LLM-based models generate synthetic data that maintains these important structural and semantic properties. This project is for a single student.

Entry requirements

A completed online application must be submitted by 4.30 pm on 22 September 2025. Late or incomplete applications will not be accepted. Any required supporting documentation (including references) must also be received by 4.30 pm on the closing date for the application to be considered.