CCRVAM: Scalable Checkerboard Copula Modeling for Multi-Dimensional Categorical Data

Abstract
As part of my honors thesis in statistics, this work presents a scalable framework for discrete checkerboard copula modeling, together with novel regression dependency measures for ordinal and nominal categorical data. The framework is delivered as a modular Python package that makes large-scale analysis more efficient and accessible, and provides tools for quantifying regression dependencies in complex datasets.
This work introduces CCRVAM, a Python package for scalable discrete checkerboard copula modeling and regression dependency analysis in categorical data. Developed as part of an honors thesis in statistics, the framework focuses on efficient and modular implementations, offering researchers new tools for understanding complex ordinal and nominal data structures.
Key Features:
- CCRVAM Object Construction: Build copulas and the relevant distributions directly from contingency tables or case-form data.
- Regression Analysis: Compute marginal distributions, conditional expectations, and association measures such as CCRAM and SCCRAM.
- Performance Optimization: Vectorized implementations and parallel computing support for scalability.
- Edge-Case Handling: Rigorous testing, run through a Makefile, ensures robustness across diverse datasets and environments.
- Statistical Insights: Innovative tools for identifying regression dependencies in multivariate categorical data.
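
To make the features above concrete, here is a minimal NumPy sketch of the two-variable case. It is not the CCRVAM package's API: the function names (`table_from_case_form`, `checkerboard_scores`, `ccram_x_to_y`) are hypothetical. It also assumes the usual definitions from the checkerboard copula regression literature: the score of a category is the midpoint of its marginal CDF interval, CCRAM is 12 times the weighted squared deviation of the regression from 1/2, and SCCRAM rescales CCRAM by 12 times the score variance.

```python
import numpy as np


def table_from_case_form(cases, shape):
    """Count rows of 0-based category indices into a contingency table."""
    table = np.zeros(shape, dtype=float)
    # np.add.at accumulates correctly even when the same cell repeats.
    np.add.at(table, tuple(np.asarray(cases).T), 1.0)
    return table


def checkerboard_scores(marginal_pmf):
    """Midpoint of each category's interval on the marginal CDF scale."""
    cdf = np.cumsum(marginal_pmf)
    lower = np.concatenate(([0.0], cdf[:-1]))
    return (lower + cdf) / 2.0


def ccram_x_to_y(table):
    """Regression of Y's score on X, plus CCRAM and SCCRAM (assumed forms)."""
    p = np.asarray(table, dtype=float)
    p = p / p.sum()                      # joint pmf
    p_x = p.sum(axis=1)                  # marginal pmf of X (rows)
    p_y = p.sum(axis=0)                  # marginal pmf of Y (columns)
    scores = checkerboard_scores(p_y)

    keep = p_x > 0                       # skip empty rows to avoid 0/0
    cond = p[keep] / p_x[keep, None]     # P(Y = j | X = i)
    regression = cond @ scores           # E[score of Y | X = i]

    ccram = 12.0 * np.sum(p_x[keep] * (regression - 0.5) ** 2)
    score_var = np.sum(p_y * (scores - 0.5) ** 2)
    sccram = ccram / (12.0 * score_var)  # undefined if Y has one category
    return regression, ccram, sccram


if __name__ == "__main__":
    cases = [[0, 0], [1, 1], [1, 1], [0, 1]]
    table = table_from_case_form(cases, shape=(2, 2))
    regression, ccram, sccram = ccram_x_to_y(table)
    print(table, regression, ccram, sccram)
```

Under these assumptions, an independent table gives a CCRAM of 0, and a table in which X fully determines Y gives an SCCRAM of 1. The sketch uses only vectorized array operations; extending it to more than two dimensions or to parallel execution is left out.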