A config-driven R package for reproducible analysis of tabular biomedical data

Motivation

tabularTools demonstrates a clean, testable software engineering approach to tabular data analysis in biomedical research, with a focus on

  • explicit configuration with YAML
  • reproducible preprocessing using dplyr/tidyverse
  • extensible modeling APIs
  • parallel-safe execution
  • unit-tested components
Core Functions

# Read the YAML config file
read_config()

# Read the data file specified in the config
read_data()

# Validate that the data is well-formed tabular data
validate_data()

# Preprocess the data based on config definitions
preprocess_data()

# Fit the defined models
fit_models()

# Evaluate model results
evaluate_results()

# Create visualizations based on the evaluations
visualize_results()

Example Usage

library(tabularTools)
library(future)

# Enable parallel execution
plan(multisession, workers = 4)

cfg   <- read_config("example/config.yaml")
data  <- read_data(cfg)

validate_data(data, cfg)

pdata <- preprocess_data(cfg, data)

models <- fit_models(pdata, cfg)

# Inspect a fitted model
summary(models$logistic$`0_vs_1`$model)
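The pipeline can then be completed with the evaluation and visualization steps listed above. The argument order shown here is an illustrative assumption, chosen to match the other calls; check the function documentation for the exact signatures.

```r
# Evaluate the fitted models against the contrasts defined in the config
results <- evaluate_results(models, cfg)

# Generate the plots enabled in the config (roc_curve, coefficient_plot)
visualize_results(results, cfg)
```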

User Configuration

Analysis is driven by a YAML configuration file:

data:
  file: "heart_disease_uci.csv"
  
analysis:
  outcome: num
  predictors:
    - age
    - sex
    - chol
    - cp
    - trestbps
    - fbs
    - restecg
    - thalch
    - exang
    - oldpeak
    - slope
    - ca
    - thal
  models:
    - logistic
    - svm
  contrasts:
    - [0, 1]
    - [0, 2]
    - [0, 3]
    - [0, 4]

preprocessing:
  scale_numeric: true
  impute_missing: median  # options: "drop", "median", or "none"

visualization:
  roc_curve: true
  coefficient_plot: true

report:
  title: "Heart Disease Analysis"
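For illustration, a configuration file like the one above can be parsed with the yaml package. This is a minimal sketch of what read_config() might do internally, under the assumption that it returns the parsed YAML as a nested list; it is not the package's actual implementation.

```r
library(yaml)

# Parse the YAML file into a nested R list
cfg <- yaml::read_yaml("example/config.yaml")

# Fields are then available as list elements, e.g.:
cfg$data$file            # the data file path
cfg$analysis$outcome     # the outcome column name
cfg$analysis$predictors  # character vector of predictor names
```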

Repository structure

  • R/ - source code
  • tests/ - testthat unit tests
  • vignettes/ - sample Quarto documents

Status

This package is under active development and is intended to demonstrate how R's software development tooling (packaging, unit testing, configuration) can simplify the management of complex tabular data analyses.