Case Study

Independent Study: TabDiff

Research-backed engineering for synthetic tabular data generation

Python, NumPy, Pandas, Jupyter, scikit-learn

Problem

Real datasets can be sparse or constrained, making it hard to run safe, repeatable churn experiments at scale.

Solution

Implemented and tested a TabDiff-inspired workflow to generate synthetic tabular records and benchmarked downstream prediction behavior against baseline datasets.
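A minimal sketch of what such a downstream benchmark might look like, assuming a "train on synthetic, test on real" comparison. Everything here is illustrative: the data comes from scikit-learn's `make_classification`, the metric (ROC AUC) and model (logistic regression) are assumptions, and `placeholder_synthetic_sample` is a hypothetical per-class Gaussian stand-in, not TabDiff and not the project's generator.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def placeholder_synthetic_sample(X, y, n, rng):
    """Hypothetical stand-in generator (per-class Gaussian fit). NOT TabDiff."""
    parts_X, parts_y = [], []
    for label in np.unique(y):
        X_c = X[y == label]
        n_c = int(round(n * len(X_c) / len(X)))  # keep the class balance
        mean = X_c.mean(axis=0)
        cov = np.cov(X_c, rowvar=False)
        parts_X.append(rng.multivariate_normal(mean, cov, size=n_c))
        parts_y.append(np.full(n_c, label))
    return np.vstack(parts_X), np.concatenate(parts_y)


def holdout_auc(X_train, y_train, X_test, y_test):
    """Fit a simple classifier and score it on the real holdout set."""
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    return roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])


def compare_utility(seed=0):
    rng = np.random.default_rng(seed)
    # Toy binary-label data standing in for a real baseline dataset.
    X, y = make_classification(
        n_samples=2000, n_features=8, n_informative=5,
        n_redundant=0, random_state=seed,
    )
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    # Synthetic rows are generated from the training split only.
    X_syn, y_syn = placeholder_synthetic_sample(X_tr, y_tr, len(X_tr), rng)
    return {
        "train_real_test_real": holdout_auc(X_tr, y_tr, X_te, y_te),
        "train_synth_test_real": holdout_auc(X_syn, y_syn, X_te, y_te),
    }


if __name__ == "__main__":
    for name, auc in compare_utility().items():
        print(f"{name}: AUC = {auc:.3f}")
```

The gap between the two scores is one rough way to read utility: the closer the synthetic-trained model gets to the real-trained one on the same real holdout, the more useful the synthetic data is for this task.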

What I Did

Framed the experiment design, implemented preprocessing and evaluation scripts, and documented tradeoffs between fidelity and utility.
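The fidelity side of that tradeoff can be illustrated with a per-column distribution check. This is a sketch under assumptions, not the project's evaluation script: it uses a two-sample Kolmogorov-Smirnov statistic for numeric columns and total variation distance for categorical ones, and the function names and the example columns (`tenure`, `plan`) are hypothetical.

```python
import numpy as np
import pandas as pd


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))


def total_variation(a, b):
    """Total variation distance between two categorical distributions."""
    p = a.value_counts(normalize=True)
    q = b.value_counts(normalize=True)
    p, q = p.align(q, fill_value=0.0)  # union of categories, missing ones as 0
    return float(0.5 * (p - q).abs().sum())


def fidelity_report(real, synthetic):
    """Per-column distance (0 = identical marginals, 1 = disjoint)."""
    records = []
    for col in real.columns:
        if pd.api.types.is_numeric_dtype(real[col]):
            metric = "ks"
            dist = ks_statistic(real[col].to_numpy(), synthetic[col].to_numpy())
        else:
            metric = "tv"
            dist = total_variation(real[col], synthetic[col])
        records.append({"column": col, "metric": metric, "distance": dist})
    return pd.DataFrame(records).set_index("column")


if __name__ == "__main__":
    # Hypothetical toy frames, for illustration only.
    real = pd.DataFrame({"tenure": [1, 2, 3, 4], "plan": ["a", "a", "b", "b"]})
    synthetic = pd.DataFrame({"tenure": [1, 2, 3, 5], "plan": ["a", "b", "b", "b"]})
    print(fidelity_report(real, synthetic))
```

Marginal checks like this only cover one column at a time; a generator can score well here and still miss cross-column relationships, which is one reason to read fidelity alongside a downstream utility score.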

Challenges

Controlled for data leakage risk while keeping synthetic outputs statistically meaningful enough for practical experimentation.
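One common way to probe this kind of leakage is a distance-to-closest-record check: for each synthetic row, measure how far it sits from the nearest training row, and compare that with the same distance for real holdout rows. The sketch below assumes purely numeric features and uses hypothetical function names; it illustrates the idea rather than the exact check used in this study.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler


def closest_record_distances(reference, candidates):
    """Distance from each candidate row to its nearest reference row."""
    scaler = StandardScaler().fit(reference)  # scale using the reference only
    nn = NearestNeighbors(n_neighbors=1).fit(scaler.transform(reference))
    distances, _ = nn.kneighbors(scaler.transform(candidates))
    return distances[:, 0]


def leakage_report(train, synthetic, holdout, tol=1e-8):
    """Summarize how close synthetic rows sit to the training rows."""
    d_syn = closest_record_distances(train, synthetic)
    d_hold = closest_record_distances(train, holdout)
    return {
        # Share of synthetic rows that are (near-)exact copies of a training row.
        "exact_copy_rate": float(np.mean(d_syn <= tol)),
        "median_dcr_synthetic": float(np.median(d_syn)),
        # Baseline: how close genuinely unseen real rows are to the training set.
        "median_dcr_holdout": float(np.median(d_hold)),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train = rng.normal(size=(200, 4))
    holdout = rng.normal(size=(100, 4))
    # Toy "synthetic" set with 10 deliberately copied training rows.
    synthetic = np.vstack([train[:10], rng.normal(size=(90, 4))])
    print(leakage_report(train, synthetic, holdout))
```

A synthetic median distance well below the holdout baseline, or a nonzero copy rate, would suggest the generator is reproducing training records rather than sampling from the learned distribution.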

Links