[Statistics Thesis] ccrvam : a Python Package for Model-Free Exploratory Analysis of Multivariate Discrete Data with an Ordinal Response Variable

Apr 16, 2025 · Dhyey Mavani, Professor Shu-Min Liao, Professor Daeyoung Kim · 1 min read
Abstract
Understanding regression dependencies among discrete variables, especially in the presence of ordinal responses, poses a persistent challenge in exploratory data analysis (EDA). While classical EDA techniques and continuous copula models have proven effective for continuous data, they often fail to capture the structure and interpretability required for categorical datasets. This thesis begins by critically evaluating these traditional approaches and highlighting their limitations in discrete settings. Motivated by these gaps, we explore the model-free dependence measures proposed by Wei and Kim (2021) and further advanced by Liao et al. (2024), which leverage the checkerboard copula framework to robustly characterize regression relationships in multidimensional contingency tables with both ordinal and nominal variables. To operationalize this method, we present a novel, modular, and scalable Python package, ccrvam, designed to support efficient large-scale analysis. The package integrates with established scientific libraries such as NumPy, Pandas, SciPy, and Matplotlib, while incorporating Pytest and Sphinx for testing and maintainability. Through extensive simulations and real-world case studies, we demonstrate that ccrvam offers a powerful and flexible toolset for uncovering complex dependence structures in categorical data. Our contributions provide both a theoretical exposition and a practical resource for researchers engaged in data-driven exploration of discrete regression phenomena.
Type
Publication
ccrvam : a Python Package for Model-Free Exploratory Analysis of Multivariate Discrete Data with an Ordinal Response Variable

This work introduces ccrvam, a Python package for scalable discrete checkerboard copula modeling and regression dependence analysis in categorical data. Developed as part of an honors thesis in statistics, the package emphasizes efficient, modular implementations and gives researchers new tools for understanding complex ordinal and nominal data structures.

Key Features:

  • CCRVAM Object Construction: Build copulas and the relevant distributions directly from contingency tables or case-form data.
  • Regression Analysis: Compute marginal distributions, conditional expectations, and association measures such as CCRAM and SCCRAM.
  • Performance Optimization: Vectorized and parallelized implementations for scalability.
  • Edge-Case Handling: Rigorous testing, run through a Makefile, ensures robustness across diverse datasets and environments.
  • Statistical Insights: Tools for identifying regression dependencies in multivariate categorical data.
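
To make the association measures listed above concrete, here is a minimal NumPy sketch for a two-way contingency table, based on our reading of the checkerboard copula regression definitions in Wei and Kim (2021). It is not the ccrvam API: the function names `checkerboard_scores` and `ccram_two_way` are hypothetical, and the sketch assumes rows hold the predictor and columns hold the ordinal response.

```python
import numpy as np


def checkerboard_scores(marginal):
    """Midpoints of the cumulative-marginal intervals (checkerboard copula scores)."""
    upper = np.cumsum(marginal)
    lower = upper - marginal
    return (lower + upper) / 2.0


def ccram_two_way(table):
    """Return (CCRAM, SCCRAM) for predicting the column variable from the row variable.

    Hypothetical helper for illustration only; assumes a 2-D array of
    non-negative counts (or probabilities) with an ordinal column variable.
    """
    counts = np.asarray(table, dtype=float)
    pmf = counts / counts.sum()
    row_marginal = pmf.sum(axis=1)
    col_marginal = pmf.sum(axis=0)

    # Scores of the response categories on the copula (unit-interval) scale.
    scores = checkerboard_scores(col_marginal)

    # Conditional pmf of the response within each non-empty row.
    keep = row_marginal > 0
    conditional = pmf[keep] / row_marginal[keep, None]

    # Checkerboard copula regression: expected response score given each row.
    regression = conditional @ scores

    # Variance of the regression relative to Var(Uniform(0, 1)) = 1/12.
    ccram = 12.0 * np.sum(row_marginal[keep] * (regression - 0.5) ** 2)

    # Scaling by the variance of the scores gives a measure with maximum 1.
    score_var = np.sum(col_marginal * (scores - 0.5) ** 2)
    sccram = ccram / (12.0 * score_var) if score_var > 0 else float("nan")
    return ccram, sccram


if __name__ == "__main__":
    example = np.array([[10, 5, 1], [4, 8, 6], [1, 3, 12]])
    print(ccram_two_way(example))
```

Under these assumptions, an independent table yields a CCRAM of 0 and a table where each row determines a single response category yields an SCCRAM of 1, which makes the two cases convenient sanity checks.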