CCRVAM: Scalable Checkerboard Copula Modeling for Multi-Dimensional Categorical Data

Jan 24, 2025 · Dhyey Mavani, Professor Shu-Min Liao, Professor Daeyoung Kim · 1 min read
Abstract
As part of my honors thesis in statistics, this work presents a scalable framework for discrete checkerboard copula modeling and novel regression dependency measures for ordinal and nominal categorical data. Through efficient, modular Python implementations, the project makes large-scale analysis faster and more accessible, and it packages new tools for quantifying regression dependencies in complex datasets.

This work introduces CCRVAM, a Python package for scalable discrete checkerboard copula modeling and regression dependency analysis in categorical data. Developed as part of an honors thesis in statistics, the framework focuses on efficient and modular implementations, offering researchers new tools for understanding complex ordinal and nominal data structures.

Key Features:

  • CCRVAM Object Construction: Build copulas and the relevant distributions directly from contingency tables or case-form data.
  • Regression Analysis: Compute marginal distributions, conditional expectations, and association measures such as CCRAM and SCCRAM.
  • Performance Optimization: Vectorized implementations and parallel computing support for scalability.
  • Edge-Case Handling: Rigorous testing, automated through a Makefile, ensures robustness across diverse datasets and environments.
  • Statistical Insights: Innovative tools for identifying regression dependencies in multivariate categorical data.
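
To make the construction and regression features above concrete, here is a minimal NumPy sketch for the two-way case. It is not the package's API: the function names `case_form_to_table` and `checkerboard_regression` are hypothetical, and the formulas follow the usual definitions of checkerboard copula scores, CCRAM, and SCCRAM as I understand them (scores are midpoints of consecutive marginal CDF values, CCRAM is 12 times the variance of the checkerboard copula regression, and SCCRAM rescales CCRAM by its maximum attainable value). It also assumes every row category has a nonzero count.

```python
import numpy as np


def case_form_to_table(cases, shape):
    """Tally case-form rows (0-based category indices) into a contingency table.

    Hypothetical helper for illustration; not the CCRVAM package API.
    """
    cases = np.asarray(cases, dtype=int)
    table = np.zeros(shape, dtype=float)
    # np.add.at accumulates correctly even when the same cell appears repeatedly.
    np.add.at(table, tuple(cases.T), 1.0)
    return table


def checkerboard_regression(table):
    """Regress column variable Y on row variable X via checkerboard copula scores.

    Returns (regression values per X category, CCRAM, SCCRAM).
    Assumes a two-way table whose row marginals are all positive.
    """
    counts = np.asarray(table, dtype=float)
    pmf = counts / counts.sum()            # joint pmf
    p_x = pmf.sum(axis=1)                  # marginal pmf of X (rows)
    p_y = pmf.sum(axis=0)                  # marginal pmf of Y (columns)

    # Checkerboard copula scores: midpoints of consecutive marginal CDF values.
    cdf_y = np.cumsum(p_y)
    cdf_y_prev = np.concatenate(([0.0], cdf_y[:-1]))
    scores_y = (cdf_y_prev + cdf_y) / 2.0

    # Conditional pmf of Y given each X category, then E[score | X = i].
    cond_y_given_x = pmf / p_x[:, None]
    regression = cond_y_given_x @ scores_y

    # The scores always average to 1/2 under the marginal of Y.
    ccram = 12.0 * np.sum(p_x * (regression - 0.5) ** 2)
    score_var = np.sum(p_y * (scores_y - 0.5) ** 2)
    sccram = ccram / (12.0 * score_var)
    return regression, ccram, sccram
```

Under these assumptions, an independent table such as `[[10, 20], [30, 60]]` gives a CCRAM of zero up to floating-point error, while the perfectly dependent table `[[5, 0], [0, 5]]` gives a CCRAM of 0.75 and an SCCRAM of 1.0. Every step is a whole-array operation, which is the kind of vectorization the performance bullet refers to.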