ccrvam: A Python Package for Model-Free Exploratory Analysis of Multivariate Discrete Data with an Ordinal Response Variable

Jun 30, 2025

Dhyey Mavani

Professor Daeyoung Kim

Professor Shu-Min Liao

Python package architecture for checkerboard copula regression analysis

Abstract

Understanding regression dependencies among discrete variables, especially with ordinal responses, poses significant challenges in exploratory data analysis. Traditional continuous copula models and classical EDA techniques often fail to capture the structure of categorical datasets. This paper introduces “ccrvam,” an open-source Python package implementing the Checkerboard Copula Regression-based Visualization and Association Measure (CCRVAM) framework. The package operationalizes the model-free dependence measures proposed by Wei and Kim (2021) and extended by Liao et al. (2024), providing tools for characterizing regression relationships in multidimensional contingency tables. ccrvam offers efficient implementations for constructing checkerboard copulas, computing marginal distributions and CDFs, performing regression analysis with visualization, and calculating association measures (CCRAM/SCCRAM) with bootstrap and permutation testing support. The package integrates with the Python scientific computing ecosystem through NumPy, Pandas, SciPy, and Matplotlib, and is tested with Pytest. Through benchmarks and real-world case studies, we demonstrate that ccrvam gives researchers scalable tools for uncovering complex dependence structures in categorical data.

Type

Software

Publication

Journal of Statistical Software (in progress)

This work introduces ccrvam, a Python package for Checkerboard Copula Regression-based Visualization and Association Measure (CCRVAM) analysis. The package gives researchers efficient, scalable tools for exploring regression dependencies in multivariate categorical data with an ordinal response variable.

Key Features

Core Functionality

CCRVAM Object Construction: Build copulas and distributions from contingency tables in multiple formats (case form, frequency form, table form)
Regression Analysis: Compute marginal distributions and conditional expectations, and perform checkerboard copula regression with prediction
Association Measures: Calculate CCRAM (Checkerboard Copula Regression Association Measure) and SCCRAM (Scaled CCRAM)
Statistical Testing: Bootstrap functionality for predictions and association measures, plus permutation testing for significance assessment
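
The features listed above can be illustrated for the simplest case, a two-way table with a categorical predictor X in the rows and an ordinal response Y in the columns. What follows is a minimal NumPy sketch, not the ccrvam API: the function names (ccr_fit, ccram, predict_category, permutation_pvalue) are hypothetical, and the formulas assume the usual two-way definitions, in which column scores are midpoints of consecutive marginal CDF values of Y and CCRAM is 12 times the variance of the regression values. It also assumes every row of the table has at least one observation.

```python
import numpy as np


def ccr_fit(table):
    """Checkerboard copula regression of ordinal Y (columns) on X (rows).

    Assumes every row has a positive total count.
    """
    p = np.asarray(table, dtype=float)
    p = p / p.sum()                      # joint pmf
    p_x, p_y = p.sum(axis=1), p.sum(axis=0)
    v = np.cumsum(p_y)                   # marginal CDF of Y at each category
    s = v - p_y / 2.0                    # scores: midpoints (v_{j-1} + v_j) / 2
    r = (p / p_x[:, None]) @ s           # regression values E[V | X = i]
    return p_x, p_y, v, s, r


def ccram(table, scaled=False):
    """CCRAM, or SCCRAM when scaled=True, for a two-way table."""
    p_x, p_y, _, s, r = ccr_fit(table)
    value = 12.0 * np.sum(p_x * (r - 0.5) ** 2)
    if scaled:
        # Divide by the largest value CCRAM can take for these Y marginals.
        value /= 12.0 * np.sum(p_y * (s - 0.5) ** 2)
    return float(value)


def predict_category(table):
    """Predicted Y category (0-based) per row: the CDF interval containing r_i."""
    _, _, v, _, r = ccr_fit(table)
    return np.minimum(np.searchsorted(v, r, side="left"), len(v) - 1)


def permutation_pvalue(x, y, n_perm=999, seed=0):
    """Permutation test of CCRAM from case-form integer codes (0-based)."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x), np.asarray(y)
    shape = (x.max() + 1, y.max() + 1)

    def stat(y_codes):
        table = np.zeros(shape)
        np.add.at(table, (x, y_codes), 1)    # tabulate the (x, y) pairs
        return ccram(table)

    observed = stat(y)
    # Shuffling y breaks any dependence on x while keeping both marginals.
    perms = np.array([stat(rng.permutation(y)) for _ in range(n_perm)])
    p_value = (1 + np.sum(perms >= observed - 1e-12)) / (n_perm + 1)
    return observed, float(p_value)
```

For a table whose rows are proportional to one another (independence) the regression values all equal 1/2 and CCRAM is zero, while a table in which X determines Y gives an SCCRAM of 1. A bootstrap for the same statistic would follow the same pattern, resampling the (x, y) pairs with replacement instead of permuting y.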

Technical Implementation

Performance Optimization: Vectorized and parallelized implementations for scalability
Robust Design: Comprehensive edge-case handling and rigorous unit testing with Pytest
Integration: Interoperates with NumPy, Pandas, SciPy, and Matplotlib
Documentation: Full documentation hosted on Read the Docs, including examples and an API reference
Distribution: Available on PyPI for easy installation and dependency management
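
The three input layouts named under Core Functionality (case form, frequency form, table form) correspond to ordinary pandas structures, which is one way to picture the NumPy and Pandas integration listed above. The sketch below uses only standard pandas calls; the helper names and the column names "x", "y", and "count" are hypothetical and are not taken from the ccrvam API.

```python
import pandas as pd


def case_to_table(cases, x, y):
    """Case form (one row per observation) -> table form (counts by category)."""
    return pd.crosstab(cases[x], cases[y])


def case_to_frequency(cases, x, y):
    """Case form -> frequency form (one row per observed combination)."""
    return cases.groupby([x, y]).size().reset_index(name="count")


def frequency_to_table(freq, x, y, count="count"):
    """Frequency form -> table form; combinations never observed become 0."""
    wide = freq.pivot(index=x, columns=y, values=count)
    return wide.fillna(0).astype(int)


# Small illustrative dataset (made up for this example).
cases = pd.DataFrame({"x": ["a", "a", "b", "b", "b"], "y": [1, 2, 1, 2, 2]})
table = case_to_table(cases, "x", "y")
freq = case_to_frequency(cases, "x", "y")
```

Calling `.to_numpy()` on the table-form result yields the count array that array-based routines typically expect.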

Methodological Foundation

The package implements the theoretical framework developed by:

Wei and Kim (2021): Original checkerboard copula regression methodology for multidimensional contingency tables
Liao et al. (2024): Extensions for visualization and enhanced association measures
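
For orientation, the two-way case of this framework can be written compactly. The notation below is ours and is an assumption rather than a quotation; it should be checked against Wei and Kim (2021) before being relied on. Here X has I categories, the ordinal response Y has J categories, and p_{ij} is the joint probability mass function with marginals p_{i+} and p_{+j}.

```latex
\begin{align*}
v_j &= \sum_{k \le j} p_{+k}, \qquad v_0 = 0
  && \text{(marginal CDF of } Y\text{)} \\
s_j &= \frac{v_{j-1} + v_j}{2}
  && \text{(checkerboard copula score of category } j\text{)} \\
r_i &= E[V \mid X = i] = \sum_{j=1}^{J} \frac{p_{ij}}{p_{i+}}\, s_j
  && \text{(regression value for category } i\text{)} \\
\mathrm{CCRAM} &= 12 \sum_{i=1}^{I} p_{i+} \left( r_i - \tfrac{1}{2} \right)^2 \\
\mathrm{SCCRAM} &= \frac{\mathrm{CCRAM}}{12 \sum_{j=1}^{J} p_{+j} \left( s_j - \tfrac{1}{2} \right)^2}
\end{align*}
```

The scores average to 1/2 under the marginal distribution of Y, and a uniform variable on [0, 1] has variance 1/12, so CCRAM reads as the share of that variance explained by the regression. The denominator of SCCRAM is the largest value CCRAM can reach for the given marginals of Y, which rescales the measure so that its maximum is 1.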

Software Architecture

ccrvam follows modern Python package development standards with:

Modular design separating core algorithms, utilities, and visualization components
Comprehensive test suite ensuring reliability across diverse datasets
Continuous integration and automated testing workflows
GPL-3.0 licensing (open source, copyleft)

Applications

The package addresses critical gaps in categorical data analysis by providing:

Model-free exploration of complex dependence structures
Robust alternatives to traditional continuous copula approaches
Scalable solutions for large-scale categorical datasets
Interpretable visualizations of regression relationships

This contribution gives the Python statistical computing ecosystem tools for analyzing relationships in categorical data that were previously difficult to study systematically.

Last updated on Jun 30, 2025

Checkerboard Copula, Categorical Data Analysis, Regression Dependencies, Python Software, Statistical Computing, Exploratory Data Analysis

Authors

Dhyey Mavani

Software Engineer, Agentic AI
(Special Projects with C-suite)
