Problem
Real customer datasets are often sparse or subject to usage constraints, which makes it hard to run safe, repeatable churn experiments at scale.
Case Study
Research-backed engineering for synthetic tabular data generation
Implemented and tested a TabDiff-inspired workflow to generate synthetic tabular records and benchmarked downstream prediction behavior against baseline datasets.
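One common way to benchmark downstream prediction behavior is a "train on synthetic, test on real" comparison: fit the same model once on real training rows and once on synthetic rows, then score both on the same real holdout. The sketch below is illustrative only. The nearest-centroid classifier, the two toy features (tenure and monthly charge), and the `make_rows` generator are assumptions made for the example, and the "synthetic" rows are simply a second draw from the toy distribution rather than output from a TabDiff-style model.

```python
import random


def make_rows(n, rng):
    """Hypothetical toy churn data: [tenure_months, monthly_charge] plus a 0/1 label."""
    rows, labels = [], []
    for _ in range(n):
        churned = rng.random() < 0.3
        tenure = rng.gauss(8 if churned else 30, 5)
        charge = rng.gauss(80 if churned else 55, 10)
        rows.append([tenure, charge])
        labels.append(int(churned))
    return rows, labels


def fit_centroids(rows, labels):
    """Nearest-centroid model: one mean feature vector per class."""
    centroids = {}
    for cls in set(labels):
        members = [r for r, y in zip(rows, labels) if y == cls]
        centroids[cls] = [sum(col) / len(col) for col in zip(*members)]
    return centroids


def predict(centroids, row):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda cls: sum((a - b) ** 2 for a, b in zip(row, centroids[cls])),
    )


def accuracy(centroids, rows, labels):
    hits = sum(predict(centroids, r) == y for r, y in zip(rows, labels))
    return hits / len(rows)


# Real train/holdout split (toy data standing in for the baseline dataset).
real_rng = random.Random(0)
real_train, y_train = make_rows(400, real_rng)
real_test, y_test = make_rows(200, real_rng)

# Stand-in for generator output: a fresh draw, NOT a real diffusion sample.
synth_train, y_synth = make_rows(400, random.Random(1))

acc_real = accuracy(fit_centroids(real_train, y_train), real_test, y_test)
acc_synth = accuracy(fit_centroids(synth_train, y_synth), real_test, y_test)
utility_gap = acc_real - acc_synth  # near 0 means synthetic data is about as useful

print(f"real->real: {acc_real:.3f}  synthetic->real: {acc_synth:.3f}  gap: {utility_gap:+.3f}")
```

A small gap suggests the synthetic data preserves the signal the model needs. In practice the same comparison would be repeated across several models and metrics before drawing conclusions.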
Framed the experiment design, implemented preprocessing and evaluation scripts, and documented tradeoffs between fidelity and utility.
Controlled for data leakage risk while keeping synthetic outputs statistically faithful enough to be useful for practical experimentation.
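A leakage check of this kind is often approximated with a distance-to-closest-record comparison: for each synthetic row, measure the distance to its nearest training row and to its nearest row in a holdout set the generator never saw. The sketch below is a minimal illustration under stated assumptions. The function names, the Euclidean metric on already-scaled numeric features, and the tiny hand-written rows are all hypothetical and are not taken from the project.

```python
import math


def nearest_distance(row, reference):
    """Euclidean distance from one row to its closest row in `reference`."""
    return min(math.dist(row, ref) for ref in reference)


def dcr_summary(synthetic, train, holdout):
    """Compare how close synthetic rows sit to training rows vs. unseen holdout rows."""
    to_train = [nearest_distance(s, train) for s in synthetic]
    to_holdout = [nearest_distance(s, holdout) for s in synthetic]
    return {
        # Synthetic rows that reproduce a training row exactly.
        "exact_copies": sum(d == 0.0 for d in to_train),
        # Fraction of synthetic rows that sit closer to train than to holdout.
        "share_closer_to_train": sum(
            t < h for t, h in zip(to_train, to_holdout)
        ) / len(synthetic),
    }


# Hypothetical, pre-scaled numeric rows purely for illustration.
train = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
holdout = [[1.5, 1.5], [2.5, 2.5], [10.0, 10.0]]
synthetic = [[1.0, 1.0], [2.4, 2.4], [9.0, 9.0]]

result = dcr_summary(synthetic, train, holdout)
print(result)
```

When the training and holdout sets are the same size and drawn from the same distribution, a share close to 0.5 is what one would expect from a generator that has not memorized its training data. A share well above that, or any exact copies, is a signal to investigate further.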