Research Preview · Federated Learning Systems

CHiP-FL

Client Clustering, Hierarchical aggregation, and Personalization for fair federated learning in healthcare.

Reducing site-size bias across hospitals, without sacrificing global AUROC.

208 hospitals simulated · 87% site-bias slope ↓ · 0.864 global AUROC · 41% p10/p90 spread ↓
Overview

CHiP-FL is a federated learning framework for equitable clinical prediction. It clusters hospitals by their data signatures, trains through hierarchical aggregation across global, cluster, and institutional levels, then personalizes each site, cutting size-based performance bias by 87% on the eICU Collaborative Research Database while keeping global AUROC competitive.

The work has been accepted for presentation at ISMB in the Translational Medical Informatics and Applications track, and will also be presented at the Mayo Clinic AI Research Summit and Mount Sinai's AI in Healthcare Conference. It will be published in the ACM Digital Library through the ACM-BCB Conference.

Live simulator

Run federated training across 32 hospitals

Pick an algorithm, push play, and watch site-level AUROC, fairness disparity, and the size-bias slope evolve in real time.

[Interactive simulator] A federation graph links the server to 32 hospital nodes, grouped into five clusters, each node labeled with its current per-hospital AUROC (shared aggregated weights, evaluated per hospital; round 0 of 30 shown). Controls: algorithm (default: chipfl), speed (1.2×), clusters K (5), personalization (60%). Live metrics update each round: global AUROC, fairness std, size-bias slope (flagged "≈ flat ✓" when near zero), p10/p90 spread (worst hospitals / mean / best), disparity, and cumulative communication cost in MB. Commentary stays idle until you press start.
Signature plot

Performance vs hospital size

Each dot is a hospital. A flatter trend line means the algorithm is fair across institution sizes, which is the central claim of CHiP-FL.
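The trend line behind this plot is just an ordinary least-squares fit of per-hospital AUROC against log hospital size; its coefficient is the size-bias slope. A minimal sketch on synthetic data (the sizes, noise level, and bias coefficient are illustrative, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic federation: 32 hospitals whose sizes span two orders of magnitude.
n_patients = rng.integers(50, 5000, size=32)
# A biased federation: larger hospitals score higher on average.
auroc = 0.60 + 0.03 * np.log10(n_patients) + rng.normal(0, 0.01, size=32)

# Size-bias slope = OLS coefficient of AUROC on log10(size).
slope, intercept = np.polyfit(np.log10(n_patients), auroc, deg=1)
print(f"size-bias slope: {slope:+.3f}")  # positive: larger sites score better
```

A slope near zero means the trend line is flat, i.e. performance does not track institution size.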

3D Pareto frontier

Performance × Fairness × Personalization

Each algorithm sits in a 3D tradeoff space. CHiP-FL leads on performance and fairness while staying near the top on personalization; no baseline comes close on all three axes at once.

Algorithm        Perf   Fair   Pers
FedAvg           0.81   0.30   0.10
FedProx          0.83   0.45   0.15
Clustered        0.84   0.62   0.30
Personalized     0.82   0.55   0.85
CHiP-FL (ours)   0.87   0.92   0.78
Algorithm

Inside the CHiP-FL pipeline

Five stages connect raw clinical data at the edge to a fair, deployable model, without centralizing PHI.

01

Local embeddings

Each hospital trains a local encoder over its EHR features and emits a compact data signature without sharing patient records.
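One way to realize such a signature (a hypothetical sketch, not the paper's exact encoder) is to summarize each site's feature matrix with low-order aggregate statistics, so only summaries leave the hospital:

```python
import numpy as np

def data_signature(X: np.ndarray) -> np.ndarray:
    """Compact summary of a site's feature matrix X (patients x features):
    per-feature means and standard deviations. Only aggregate statistics
    are emitted, never individual patient rows."""
    return np.concatenate([X.mean(axis=0), X.std(axis=0)])

# Example: a site with 500 patients and 8 EHR features.
X = np.random.default_rng(1).normal(size=(500, 8))
sig = data_signature(X)
print(sig.shape)  # (16,)
```

In practice the paper's learned encoder replaces these hand-crafted statistics; the point is that the signature is low-dimensional and aggregate.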

02

Client clustering

Signatures are clustered (K-means in signature space) into K groups capturing similar patient distributions and care patterns.
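In signature space this is a standard K-means fit; a sketch with scikit-learn (the 16-dim signatures, five latent cohorts, and K = 5 are assumptions for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)

# 32 hospital signatures (16-dim summaries) drawn from 5 latent cohorts.
centers = rng.normal(scale=3.0, size=(5, 16))
signatures = np.vstack([centers[i % 5] + rng.normal(scale=0.3, size=16)
                        for i in range(32)])

# Group hospitals with similar patient distributions into K clusters.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(signatures)
clusters = kmeans.labels_           # cluster id per hospital
print(np.bincount(clusters))        # hospitals per cluster
```

Because only signatures (not patient data) are clustered, this step runs at the server without touching PHI.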

03

Hierarchical aggregation

Within each cluster we run cohesive proximal updates (λ); cluster prototypes are pulled toward the global anchor (μ) and blended with it (α).

04

Global update

A weighted mixture across clusters yields a single global model, but small sites are no longer drowned out by big-site gradients.

05

Personalized refinement

Each hospital takes a few local fine-tune steps from the global initialization, recovering site-specific structure.
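Stage 05 can be sketched as a few gradient steps on the site's own data, starting from the aggregated weights (a logistic-regression toy; the learning rate, step count, and model family are assumptions):

```python
import numpy as np

def personalize(w_init, X, y, lr=0.1, steps=5):
    """Fine-tune shared logistic-regression weights on one site's data."""
    w = w_init.copy()
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))   # predicted risk per patient
        grad = X.T @ (p - y) / len(y)      # gradient of the logistic loss
        w -= lr * grad
    return w

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 8))              # one site's features
w_true = rng.normal(size=8)
y = (X @ w_true + rng.normal(scale=0.5, size=200) > 0).astype(float)

w_site = personalize(np.zeros(8), X, y)    # start from the shared weights
print(w_site.shape)  # (8,)
```

Keeping the step count small is what lets the site specialize without drifting far from the global initialization.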

Objective
min_w  Σ_{k=1..K} Σ_{i∈Cₖ} [ nᵢ·Lᵢ(wᵢ) + λ‖wᵢ − w_Cₖ‖² + μ‖w_Cₖ − w_g‖² ],   w_Cₖ ← α·w_Cₖ + (1−α)·w_g

λ = within-cluster cohesion · μ = global anchor strength · α = cluster ↔ global blend · K = number of clusters
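Operationally, stages 03–04 reduce to two weighted averages plus the α-blend: a size-weighted mean inside each cluster, an equal-weight mean across clusters, and a blend of each prototype toward the global anchor. A NumPy sketch under assumed shapes (the equal per-cluster weighting in the global step is an illustrative choice):

```python
import numpy as np

def hierarchical_aggregate(site_weights, site_sizes, clusters, alpha=0.5):
    """Two-level aggregation: size-weighted mean inside each cluster,
    equal-weight mean of prototypes into the global model, then an
    alpha-blend of each prototype toward that global anchor."""
    W = np.asarray(site_weights, dtype=float)   # (n_sites, dim)
    n = np.asarray(site_sizes, dtype=float)
    clusters = np.asarray(clusters)

    protos = np.stack([
        np.average(W[clusters == k], axis=0, weights=n[clusters == k])
        for k in np.unique(clusters)
    ])
    w_global = protos.mean(axis=0)              # equal voice per cluster
    protos = alpha * protos + (1 - alpha) * w_global  # blend toward anchor
    return w_global, protos

# Toy check: two sites in cluster 0, one large site alone in cluster 1.
W = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]])
w_g, protos = hierarchical_aggregate(W, [10, 30, 100], [0, 0, 1])
print(w_g)  # [1.25 1.0]: mean of prototypes, not of size-dominated sites
```

Averaging prototypes with equal weight is what keeps the 100-patient site from swamping the smaller cluster, mirroring the "no longer drowned out" property of the global update.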
Benchmarks

eICU mortality, 208 hospitals

CHiP-FL improves both global AUROC and tail performance, and nearly eliminates the size-bias slope.

Algorithm              AUROC (global)   p10 → p90     Std (fair)   Size-bias slope
FedAvg                 0.812            0.68 → 0.87   0.061        -0.181
FedProx                0.829            0.70 → 0.88   0.055        -0.143
SCAFFOLD               0.834            0.71 → 0.88   0.050        -0.121
Clustered FL           0.841            0.73 → 0.89   0.041        -0.082
Hierarchical FL        0.847            0.75 → 0.89   0.037        -0.061
Personalization-only   0.825            0.72 → 0.88   0.046        -0.097
CHiP-FL (ours)         0.864            0.81 → 0.90   0.024        -0.022
About the research

Equitable AI for non-IID clinical federations

Hospitals differ in patient mix, instrumentation, and acuity. Standard federated optimizers optimize a size-weighted global objective, which silently transfers performance from small community hospitals to large academic centers.
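The size-weighting effect is easy to see numerically: under an nᵢ-weighted average of local updates (the FedAvg rule), one large center's direction dominates. Toy numbers, purely illustrative:

```python
import numpy as np

# Toy federation: two community hospitals and one academic center,
# each proposing a (1-dim) model update.
updates = np.array([[1.0], [1.0], [-1.0]])
sizes = np.array([100, 150, 5000])   # patients per site

fedavg = np.average(updates, axis=0, weights=sizes)  # size-weighted
uniform = updates.mean(axis=0)                       # equal voice
print(fedavg)   # ~ -0.90: the big site's direction wins
print(uniform)  # ~ +0.33: small sites count equally
```

The two small sites agree with each other, yet the size-weighted aggregate moves the opposite way; this is the bias CHiP-FL's cluster-level aggregation is designed to remove.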

CHiP-FL reframes the federation as a hierarchy of clusters, each representing a cohort of hospitals with similar data signatures, and adds a personalization step so every site can specialize without drifting from the global anchor. We benchmark on hospital-level mortality prediction in the eICU Collaborative Research Database.

Abstract

We introduce CHiP-FL, a federated learning framework that combines client clustering, hierarchical aggregation, and personalized refinement to mitigate site-size bias in clinical prediction. On 208 ICUs from the eICU CRD, CHiP-FL improves global AUROC by +0.05 over FedAvg while reducing the AUROC-vs-size slope by 87% and the p10/p90 spread by 41%.

Stack
  • Python · NumPy · scikit-learn
  • Flower (flwr) federated simulation
  • eICU CRD · Parquet · PyArrow