Research Preview · Federated Learning Systems

CHiP-FL

Client Clustering, Hierarchical aggregation, and Personalization for fair federated learning in healthcare.

Reducing site-size bias across hospitals, without sacrificing global AUROC.

208 hospitals simulated · 87% site-bias slope ↓ · 0.864 global AUROC · 41% p10/p90 spread ↓
Overview

CHiP-FL is a federated learning framework for equitable clinical prediction. It clusters hospitals by their data signatures, trains through hierarchical aggregation across global, cluster, and institutional levels, then personalizes each site, cutting size-based performance bias by 87% on the eICU Collaborative Research Database while keeping global AUROC competitive.

The work has been accepted for presentation at ISMB in the Translational Medical Informatics and Applications track, and will also be presented at the Mayo Clinic AI Research Summit and Mount Sinai's AI in Healthcare Conference. It will be published in the ACM Digital Library through the ACM-BCB Conference.

Live simulator

Run federated training across 32 hospitals

Pick an algorithm, push play, and watch site-level AUROC, fairness disparity, and the size-bias slope evolve in real time.

[Interactive simulator] A federation graph links the server to 32 hospital nodes, grouped into five clusters, each node labeled with its current per-hospital AUROC (shared aggregated weights, evaluated per hospital; round 0 of 30 shown). Controls: algorithm (default: chipfl), speed (1.2×), clusters K (5), personalization (60%). Live metrics update each round: global AUROC, fairness std, size-bias slope (flagged "≈ flat ✓" when near zero), p10/p90 spread (worst hospitals / mean / best), disparity, and cumulative communication cost in MB. Commentary stays idle until you press start.
Signature plot

Performance vs hospital size

Each dot is a hospital. A flatter trend line means the algorithm is fair across institution sizes, which is the central claim of CHiP-FL.
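The trend line behind this plot is just an ordinary least-squares fit of per-hospital AUROC against log hospital size; its coefficient is the size-bias slope. A minimal sketch on synthetic data (the sizes, noise level, and bias coefficient are illustrative, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic federation: 32 hospitals whose sizes span two orders of magnitude.
n_patients = rng.integers(50, 5000, size=32)
# A biased federation: larger hospitals score higher on average.
auroc = 0.60 + 0.03 * np.log10(n_patients) + rng.normal(0, 0.01, size=32)

# Size-bias slope = OLS coefficient of AUROC on log10(size).
slope, intercept = np.polyfit(np.log10(n_patients), auroc, deg=1)
print(f"size-bias slope: {slope:+.3f}")  # positive: larger sites score better
```

A slope near zero means the trend line is flat, i.e. performance does not track institution size.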

3D Pareto frontier

Performance × Fairness × Personalization

Each algorithm sits in a 3D tradeoff space. CHiP-FL leads on performance and fairness while staying near the top on personalization; no baseline comes close on all three axes at once.

Algorithm        Perf   Fair   Pers
FedAvg           0.81   0.30   0.10
FedProx          0.83   0.45   0.15
Clustered        0.84   0.62   0.30
Personalized     0.82   0.55   0.85
CHiP-FL (ours)   0.87   0.92   0.78
Algorithm

Inside the CHiP-FL pipeline

Five stages connect raw clinical data at the edge to a fair, deployable model, without centralizing PHI.

01

Local embeddings

Each hospital trains a local encoder over its EHR features and emits a compact data signature without sharing patient records.
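One way to realize such a signature (a hypothetical sketch, not the paper's exact encoder) is to summarize each site's feature matrix with low-order aggregate statistics, so only summaries leave the hospital:

```python
import numpy as np

def data_signature(X: np.ndarray) -> np.ndarray:
    """Compact summary of a site's feature matrix X (patients x features):
    per-feature means and standard deviations. Only aggregate statistics
    are emitted, never individual patient rows."""
    return np.concatenate([X.mean(axis=0), X.std(axis=0)])

# Example: a site with 500 patients and 8 EHR features.
X = np.random.default_rng(1).normal(size=(500, 8))
sig = data_signature(X)
print(sig.shape)  # (16,)
```

In practice the paper's learned encoder replaces these hand-crafted statistics; the point is that the signature is low-dimensional and aggregate.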

02

Client clustering

Signatures are clustered (K-means in signature space) into K groups capturing similar patient distributions and care patterns.
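In signature space this is a standard K-means fit; a sketch with scikit-learn (the 16-dim signatures, five latent cohorts, and K = 5 are assumptions for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)

# 32 hospital signatures (16-dim summaries) drawn from 5 latent cohorts.
centers = rng.normal(scale=3.0, size=(5, 16))
signatures = np.vstack([centers[i % 5] + rng.normal(scale=0.3, size=16)
                        for i in range(32)])

# Group hospitals with similar patient distributions into K clusters.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(signatures)
clusters = kmeans.labels_           # cluster id per hospital
print(np.bincount(clusters))        # hospitals per cluster
```

Because only signatures (not patient data) are clustered, this step runs at the server without touching PHI.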

03

Hierarchical aggregation

Within each cluster we run cohesive proximal updates (λ); cluster prototypes are pulled toward the global anchor (μ) and blended with it (α).

04

Global update

A weighted mixture across clusters yields a single global model, but small sites are no longer drowned out by big-site gradients.

05

Personalized refinement

Each hospital takes a few local fine-tune steps from the global initialization, recovering site-specific structure.
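Stage 05 can be sketched as a few gradient steps on the site's own data, starting from the aggregated weights (a logistic-regression toy; the learning rate, step count, and model family are assumptions):

```python
import numpy as np

def personalize(w_init, X, y, lr=0.1, steps=5):
    """Fine-tune shared logistic-regression weights on one site's data."""
    w = w_init.copy()
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))   # predicted risk per patient
        grad = X.T @ (p - y) / len(y)      # gradient of the logistic loss
        w -= lr * grad
    return w

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 8))              # one site's features
w_true = rng.normal(size=8)
y = (X @ w_true + rng.normal(scale=0.5, size=200) > 0).astype(float)

w_site = personalize(np.zeros(8), X, y)    # start from the shared weights
print(w_site.shape)  # (8,)
```

Keeping the step count small is what lets the site specialize without drifting far from the global initialization.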

Objective
min_w  Σ_{k=1..K} Σ_{i∈Cₖ} [ nᵢ·Lᵢ(wᵢ) + λ‖wᵢ − w_Cₖ‖² + μ‖w_Cₖ − w_g‖² ],   w_Cₖ ← α·w_Cₖ + (1−α)·w_g

λ = within-cluster cohesion · μ = global anchor strength · α = cluster ↔ global blend · K = number of clusters
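Operationally, stages 03–04 reduce to two weighted averages plus the α-blend: a size-weighted mean inside each cluster, an equal-weight mean across clusters, and a blend of each prototype toward the global anchor. A NumPy sketch under assumed shapes (the equal per-cluster weighting in the global step is an illustrative choice):

```python
import numpy as np

def hierarchical_aggregate(site_weights, site_sizes, clusters, alpha=0.5):
    """Two-level aggregation: size-weighted mean inside each cluster,
    equal-weight mean of prototypes into the global model, then an
    alpha-blend of each prototype toward that global anchor."""
    W = np.asarray(site_weights, dtype=float)   # (n_sites, dim)
    n = np.asarray(site_sizes, dtype=float)
    clusters = np.asarray(clusters)

    protos = np.stack([
        np.average(W[clusters == k], axis=0, weights=n[clusters == k])
        for k in np.unique(clusters)
    ])
    w_global = protos.mean(axis=0)              # equal voice per cluster
    protos = alpha * protos + (1 - alpha) * w_global  # blend toward anchor
    return w_global, protos

# Toy check: two sites in cluster 0, one large site alone in cluster 1.
W = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]])
w_g, protos = hierarchical_aggregate(W, [10, 30, 100], [0, 0, 1])
print(w_g)  # [1.25 1.0]: mean of prototypes, not of size-dominated sites
```

Averaging prototypes with equal weight is what keeps the 100-patient site from swamping the smaller cluster, mirroring the "no longer drowned out" property of the global update.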
Benchmarks

eICU mortality, 208 hospitals

CHiP-FL improves both global AUROC and tail performance, and nearly eliminates the size-bias slope.

Algorithm              AUROC (global)   p10 → p90     Std (fair)   Size-bias slope
FedAvg                 0.812            0.68 → 0.87   0.061        -0.181
FedProx                0.829            0.70 → 0.88   0.055        -0.143
SCAFFOLD               0.834            0.71 → 0.88   0.050        -0.121
Clustered FL           0.841            0.73 → 0.89   0.041        -0.082
Hierarchical FL        0.847            0.75 → 0.89   0.037        -0.061
Personalization-only   0.825            0.72 → 0.88   0.046        -0.097
CHiP-FL (ours)         0.864            0.81 → 0.90   0.024        -0.022
About the research

Equitable AI for non-IID clinical federations

Hospitals differ in patient mix, instrumentation, and acuity. Standard federated optimizers optimize a size-weighted global objective, which silently transfers performance from small community hospitals to large academic centers.
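The size-weighting effect is easy to see numerically: under an nᵢ-weighted average of local updates (the FedAvg rule), one large center's direction dominates. Toy numbers, purely illustrative:

```python
import numpy as np

# Toy federation: two community hospitals and one academic center,
# each proposing a (1-dim) model update.
updates = np.array([[1.0], [1.0], [-1.0]])
sizes = np.array([100, 150, 5000])   # patients per site

fedavg = np.average(updates, axis=0, weights=sizes)  # size-weighted
uniform = updates.mean(axis=0)                       # equal voice
print(fedavg)   # ~ -0.90: the big site's direction wins
print(uniform)  # ~ +0.33: small sites count equally
```

The two small sites agree with each other, yet the size-weighted aggregate moves the opposite way; this is the bias CHiP-FL's cluster-level aggregation is designed to remove.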

CHiP-FL reframes the federation as a hierarchy of clusters, each representing a cohort of hospitals with similar data signatures, and adds a personalization step so every site can specialize without drifting from the global anchor. We benchmark on hospital-level mortality prediction in the eICU Collaborative Research Database.

Abstract

We introduce CHiP-FL, a federated learning framework that combines client clustering, hierarchical aggregation, and personalized refinement to mitigate site-size bias in clinical prediction. On 208 ICUs from the eICU CRD, CHiP-FL improves global AUROC by +0.05 over FedAvg while reducing the AUROC-vs-size slope by 87% and the p10/p90 spread by 41%.

Stack
  • Python · NumPy · scikit-learn
  • Flower (flwr) federated simulation
  • eICU CRD · Parquet · PyArrow