Discopula: Scalable Checkerboard Copula Modeling for Categorical Data

Jan 24, 2025 · Dhyey Mavani, Professor Shu-Min Liao · 1 min read
Abstract
As part of my honors thesis in statistics, this work presents a scalable framework for discrete checkerboard copula modeling, together with novel regression dependency measures for ordinal and nominal categorical data. Parallel computing and modular Python and R implementations make the framework efficient and accessible for large-scale data analysis, and the new measures quantify regression dependencies in complex datasets.

This work introduces Discopula, a Python package for scalable discrete checkerboard copula modeling and regression dependency analysis in categorical data. Developed as part of an honors thesis in statistics, the framework focuses on efficient and modular implementations, offering researchers new tools for understanding complex ordinal and nominal data structures.

Key Features:

  • Checkerboard Copula Construction: Build copulas directly from contingency tables.
  • Regression Analysis: Compute marginal distributions, conditional expectations, and association measures such as CCRAM (checkerboard copula regression association measure) and its scaled version, SCCRAM.
  • Performance Optimization: Vectorized implementations and parallel computing support for scalability.
  • Edge-Case Handling: Rigorous testing ensures robustness across diverse datasets.
  • Statistical Insights: Innovative tools for identifying regression dependencies in multivariate categorical data.
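
To make the first two features concrete, here is a minimal NumPy sketch of how a checkerboard copula regression and the CCRAM/SCCRAM measures can be computed from a two-way contingency table. It is not the Discopula API: the function names `checkerboard_scores` and `ccram` are hypothetical, and the formulas assume the usual definitions from the checkerboard copula literature, with the row variable X as the predictor and the ordinal column variable Y as the response.

```python
import numpy as np


def checkerboard_scores(marginal_pmf):
    """Checkerboard copula scores: midpoints of consecutive CDF values."""
    cdf = np.cumsum(marginal_pmf)
    lower = np.concatenate(([0.0], cdf[:-1]))
    return (lower + cdf) / 2.0


def ccram(table, scaled=False):
    """CCRAM of X (rows) on Y (columns) for a 2-way contingency table.

    Hypothetical helper for illustration only; assumes Y has at least
    two categories with positive mass so the scaling factor is nonzero.
    """
    table = np.asarray(table, dtype=float)
    joint = table / table.sum()          # joint pmf
    p_x = joint.sum(axis=1)              # marginal pmf of X
    p_y = joint.sum(axis=0)              # marginal pmf of Y
    scores = checkerboard_scores(p_y)    # scores s_j for Y

    keep = p_x > 0                       # skip empty rows
    cond = joint[keep] / p_x[keep, None] # P(Y = j | X = i)
    regression = cond @ scores           # E[V | X = i] on the copula scale

    value = 12.0 * np.sum(p_x[keep] * (regression - 0.5) ** 2)
    if scaled:
        # Variance of the Y scores; their weighted mean is always 1/2.
        variance = np.sum(p_y * (scores - 0.5) ** 2)
        value /= 12.0 * variance
    return value


# Independent table: X carries no information about Y.
independent = np.outer([0.3, 0.7], [0.2, 0.5, 0.3]) * 100
# Diagonal table: Y is fully determined by X.
diagonal = np.diag([10, 20, 30])

ccram_independent = ccram(independent)
sccram_diagonal = ccram(diagonal, scaled=True)
```

Under these assumptions, the independent table gives a CCRAM of zero (up to floating-point error) and the diagonal table gives an SCCRAM of one, which are the two extremes the measures are meant to span. Every step is a vectorized array operation, in the spirit of the performance feature listed above.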