ccrvam: A Python Package for Model-Free Exploratory Analysis of Multivariate Discrete Data with an Ordinal Response Variable

Jun 30, 2025 · Dhyey Mavani, Professor Daeyoung Kim, Professor Shu-Min Liao
Figure: Python package architecture for checkerboard copula regression analysis
Abstract
Understanding regression dependencies among discrete variables, especially with ordinal responses, poses significant challenges in exploratory data analysis. Traditional continuous copula models and classical EDA techniques often fail to capture the structure required for categorical datasets. This paper introduces “ccrvam,” an open-source Python package implementing the Checkerboard Copula Regression-based Visualization and Association Measure framework. The package operationalizes the model-free dependence measures proposed by Wei and Kim (2021) and extended by Liao et al. (2024), providing robust tools for characterizing regression relationships in multidimensional contingency tables. ccrvam offers efficient implementations for constructing checkerboard copulas, computing marginal distributions and CDFs, performing regression analysis with visualization, and calculating association measures (CCRAM/SCCRAM) with bootstrap and permutation testing support. The package integrates seamlessly with the Python scientific computing ecosystem through NumPy, Pandas, SciPy, and Matplotlib, while maintaining rigorous testing standards with Pytest. Through extensive benchmarks and real-world case studies, we demonstrate that ccrvam provides researchers with powerful, scalable tools for uncovering complex dependence structures in categorical data.
Publication
Journal of Statistical Software (under review)

This work introduces ccrvam, a comprehensive Python package for Checkerboard Copula Regression-based Visualization and Association Measure analysis. The package provides researchers with efficient, scalable tools for exploring regression dependencies in multivariate categorical data with ordinal response variables.

Key Features

Core Functionality

  • CCRVAM Object Construction: Build copulas and distributions from contingency tables in multiple formats (case form, frequency form, table form)
  • Regression Analysis: Compute marginal distributions and conditional expectations, and perform checkerboard copula regression with category prediction
  • Association Measures: Calculate CCRAM (Checkerboard Copula Regression Association Measure) and SCCRAM (Scaled CCRAM)
  • Statistical Testing: Bootstrap functionality for predictions and association measures, plus permutation testing for significance assessment
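
The three input formats named above can be illustrated with plain pandas. This is a minimal sketch on invented toy data; the variable names are hypothetical and it does not use or mirror the ccrvam API, whose actual loading functions are described in the package documentation.

```python
import pandas as pd

# Case form: one row per observation (toy data, purely illustrative).
cases = pd.DataFrame(
    {
        "x": [1, 1, 1, 2, 2, 2, 2, 3],
        "y": [1, 1, 2, 1, 2, 2, 3, 3],
    }
)

# Table form: a contingency table of counts, rows = x, columns = y.
table = pd.crosstab(cases["x"], cases["y"])

# Frequency form: one row per observed (x, y) combination plus its count.
freq = cases.groupby(["x", "y"]).size().reset_index(name="count")

# Frequency form -> table form (unobserved combinations become zero counts).
table_from_freq = (
    freq.pivot(index="x", columns="y", values="count").fillna(0).astype(int)
)

# Frequency form -> case form (repeat each row "count" times).
cases_from_freq = freq.loc[
    freq.index.repeat(freq["count"]), ["x", "y"]
].reset_index(drop=True)

print(table)
print(freq)
```

All three forms carry the same information, so a package that accepts any of them can convert to a single internal table representation before building the copula.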

Technical Implementation

  • Performance Optimization: Vectorized and parallelized implementations for scalability
  • Robust Design: Comprehensive edge-case handling and rigorous unit testing with Pytest
  • Integration: Seamless compatibility with NumPy, Pandas, SciPy, and Matplotlib
  • Documentation: Full documentation hosted on Read the Docs, including examples and an API reference
  • Distribution: Available on PyPI for easy installation and dependency management
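
As an illustration of how permutation testing of an association measure can lean on NumPy, here is a minimal sketch for a two-way table. It assumes the CCRAM definition as we understand it from Wei and Kim (2021); the function names `ccram` and `permutation_pvalue` are hypothetical and are not the package's API. Each permuted table is rebuilt with a single `np.bincount` call rather than a Python loop over observations.

```python
import numpy as np


def ccram(counts):
    """CCRAM of column variable Y on row variable X (assumed definition)."""
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()                               # joint probabilities
    p_x = p.sum(axis=1)                           # row marginals (assumed > 0)
    p_y = p.sum(axis=0)                           # column marginals
    v = np.concatenate(([0.0], np.cumsum(p_y)))   # marginal CDF of Y, v_0 = 0
    scores = (v[:-1] + v[1:]) / 2.0               # checkerboard copula scores
    reg = (p @ scores) / p_x                      # regression value per X category
    return 12.0 * np.sum(p_x * (reg - 0.5) ** 2)


def permutation_pvalue(counts, n_perm=999, seed=0):
    """Permutation p-value for the null hypothesis of no association."""
    counts = np.asarray(counts, dtype=int)
    n_rows, n_cols = counts.shape
    rows, cols = np.indices(counts.shape)
    # Expand the table to case form: one (x, y) index pair per observation.
    x = np.repeat(rows.ravel(), counts.ravel())
    y = np.repeat(cols.ravel(), counts.ravel())
    rng = np.random.default_rng(seed)
    observed = ccram(counts)
    exceed = 0
    for _ in range(n_perm):
        y_perm = rng.permutation(y)               # break the X-Y link
        flat = np.bincount(x * n_cols + y_perm, minlength=n_rows * n_cols)
        if ccram(flat.reshape(n_rows, n_cols)) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (1 + exceed) / (1 + n_perm)


stat, p_value = permutation_pvalue([[30, 0, 0], [0, 30, 0], [0, 0, 30]])
print(stat, p_value)
```

Permuting the response keeps both margins fixed, so only the joint structure changes between resamples. A bootstrap would instead resample the (x, y) pairs with replacement.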

Methodological Foundation

The package implements the theoretical framework developed by:

  • Wei and Kim (2021): Original checkerboard copula regression methodology for multidimensional contingency tables
  • Liao et al. (2024): Extensions for visualization and enhanced association measures
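
For a two-way table, the quantities behind this framework can be sketched in a few lines of NumPy. The formulas below reflect our reading of Wei and Kim (2021): checkerboard copula scores are midpoints of consecutive marginal CDF values of the response, the regression is the conditional mean of those scores, CCRAM is 12 times the variance of the regression, and SCCRAM rescales CCRAM by 12 times the variance of the scores. The function name and the returned dictionary keys are hypothetical, and the category prediction rule is stated as an assumption; consult the papers and the package documentation for the authoritative definitions.

```python
import numpy as np


def checkerboard_summary(counts):
    """Checkerboard copula regression of ordinal Y (columns) on X (rows).

    A sketch under assumed definitions; every row margin must be positive.
    """
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()                               # joint probabilities p_ij
    p_x = p.sum(axis=1)                           # row marginals p_i+
    p_y = p.sum(axis=0)                           # column marginals p_+j
    v = np.concatenate(([0.0], np.cumsum(p_y)))   # marginal CDF of Y, v_0 = 0
    scores = (v[:-1] + v[1:]) / 2.0               # scores s_j = (v_{j-1} + v_j) / 2
    cond = p / p_x[:, None]                       # conditional pmf of Y given X = i
    reg = cond @ scores                           # regression value for each X category
    ccram = 12.0 * np.sum(p_x * (reg - 0.5) ** 2)
    score_var = np.sum(p_y * (scores - 0.5) ** 2)
    sccram = ccram / (12.0 * score_var)
    # Assumed prediction rule: the category j with v_{j-1} < reg <= v_j
    # (returned as a zero-based column index).
    predicted = np.minimum(np.searchsorted(v[1:], reg, side="left"), len(p_y) - 1)
    return {
        "scores": scores,
        "regression": reg,
        "ccram": ccram,
        "sccram": sccram,
        "predicted_category": predicted,
    }


result = checkerboard_summary([[1, 0], [0, 1]])
print(result["ccram"], result["sccram"], result["predicted_category"])
```

For this perfectly diagonal 2x2 table the sketch gives a CCRAM of 0.75 and an SCCRAM of 1, which shows why the scaled version is useful: the unscaled measure cannot reach 1 when the response has few categories.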

Software Architecture

ccrvam follows modern Python package development standards with:

  • Modular design separating core algorithms, utilities, and visualization components
  • Comprehensive test suite ensuring reliability across diverse datasets
  • Continuous integration and automated testing workflows
  • Open-source distribution under the GPL-3.0 license

Applications

The package addresses critical gaps in categorical data analysis by providing:

  • Model-free exploration of complex dependence structures
  • Robust alternatives to traditional continuous copula approaches
  • Scalable computation for large categorical datasets
  • Interpretable visualizations of regression relationships

This contribution fills a gap in the Python statistical computing ecosystem, giving researchers tools for analyzing relationships in categorical data that were previously difficult to study systematically.