CHiP-FL
Client Clustering, Hierarchical aggregation, and Personalization for fair federated learning in healthcare.
Reducing site-size bias across hospitals without sacrificing global AUROC.
CHiP-FL is a federated learning framework for equitable clinical prediction. It clusters hospitals by their data signatures, trains through hierarchical aggregation across institutional, cluster, and global levels, and then personalizes each site's model, cutting size-based performance bias by over 90% on the eICU Collaborative Research Database while keeping global AUROC competitive.
The work has been accepted for presentation at ISMB through the Translational Medical Informatics and Applications track, and will also be presented at the Mayo Clinic AI Research Summit and Mount Sinai's AI in Healthcare Conference. It will be published in the ACM Digital Library through the ACM-BCB Conference.
Run federated training across 32 hospitals
Pick an algorithm, push play, and watch site-level AUROC, fairness disparity, and the size-bias slope evolve in real time.
Performance vs hospital size
Each dot is a hospital. A flatter trend line means the algorithm is fair across institution sizes, the central claim of CHiP-FL.
Performance × Fairness × Personalization
Each algorithm sits in a 3D tradeoff space. CHiP-FL is the only point that pushes outward on all three axes simultaneously. Drag to rotate, scroll to zoom.
Inside the CHiP-FL pipeline
Five stages connect raw clinical data at the edge to a fair, deployable model without centralizing PHI.
Local embeddings
Each hospital trains a local encoder over its EHR features and emits a compact data signature without sharing patient records.
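A minimal sketch of the idea, not the paper's encoder: here the "signature" is simply per-feature means and standard deviations of a site's feature matrix, so only aggregate statistics leave the hospital. The function name `data_signature` and the summary statistics are illustrative assumptions.

```python
import numpy as np

def data_signature(X):
    """Compress one hospital's EHR feature matrix X (patients x features)
    into a compact signature: per-feature mean and std. Aggregates only;
    no patient-level rows are shared. Stand-in for a learned encoder."""
    return np.concatenate([X.mean(axis=0), X.std(axis=0)])

rng = np.random.default_rng(0)
X_site = rng.normal(size=(100, 8))   # one site's local features (synthetic)
sig = data_signature(X_site)         # length-16 signature vector
```

In the actual pipeline the signature would come from the trained local encoder rather than raw moments, but the privacy property is the same: fixed-size summaries, not records.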
Client clustering
Signatures are clustered (K-means in signature space) into K groups capturing similar patient distributions and care patterns.
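The clustering step described above can be sketched with scikit-learn's `KMeans` over stacked signatures; the synthetic two-regime data and K = 2 below are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Hypothetical signatures for 32 hospitals drawn from two patient-mix regimes.
sigs = np.vstack([rng.normal(0.0, 1.0, size=(16, 16)),
                  rng.normal(5.0, 1.0, size=(16, 16))])

K = 2  # number of client clusters
labels = KMeans(n_clusters=K, n_init=10, random_state=0).fit_predict(sigs)
# labels[i] is the cluster assignment for hospital i
```

K would in practice be chosen by an internal criterion (e.g. silhouette score) rather than fixed in advance.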
Hierarchical aggregation
Within each cluster we run cohesive proximal updates (λ); cluster prototypes are then bent (α) toward the global anchor (μ).
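Treating model weights as flat vectors, the two operations named above can be sketched as follows. The function names, learning rate, and default λ and α values are assumptions; the proximal pull is FedProx-style, and the "bend" is a convex interpolation toward the global anchor μ.

```python
import numpy as np

def proximal_step(w_site, grad, w_cluster, lr=0.1, lam=0.5):
    """One cohesive proximal update: a gradient step plus a pull of
    strength lam toward the cluster prototype, keeping sites cohesive."""
    return w_site - lr * (grad + lam * (w_site - w_cluster))

def bend_prototype(w_cluster, w_global, alpha=0.3):
    """Bend a cluster prototype a fraction alpha toward the global
    anchor (mu), limiting cluster drift from the federation."""
    return (1.0 - alpha) * w_cluster + alpha * w_global
```

With λ = 0 the proximal step reduces to plain local SGD; with α = 1 every prototype collapses onto the global anchor, so the two knobs interpolate between fully local and fully global training.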
Global update
A weighted mixture across clusters yields a single global model, but small sites are no longer drowned out by big-site gradients.
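A sketch of that mixture, assuming flat weight vectors: with uniform cluster weights (the default below, an assumption), a cluster of small community hospitals counts as much as one dominated by a large academic center, which is what removes the size weighting of plain FedAvg.

```python
import numpy as np

def global_update(cluster_models, cluster_weights=None):
    """Mix cluster prototypes into one global model. Uniform weights
    (the default) stop large-site clusters from dominating the average."""
    models = np.stack(cluster_models)
    if cluster_weights is None:
        cluster_weights = np.full(len(models), 1.0 / len(models))
    return np.average(models, axis=0, weights=cluster_weights)
```

Contrast with FedAvg, which weights each client by its sample count, so the global objective tilts toward whichever sites have the most patients.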
Personalized refinement
Each hospital takes a few local fine-tune steps from the global initialization, recovering site-specific structure.
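The refinement step can be sketched as a few local gradient steps from the global weights. The logistic-regression objective, step count, and learning rate here are illustrative assumptions, not the paper's models.

```python
import numpy as np

def personalize(w_global, X, y, steps=5, lr=0.05):
    """Fine-tune global weights on one site's local data for a few
    gradient steps (logistic-regression sketch), recovering
    site-specific structure without retraining from scratch."""
    w = w_global.copy()
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))      # predicted probabilities
        w -= lr * X.T @ (p - y) / len(y)      # mean log-loss gradient
    return w

rng = np.random.default_rng(2)
X = rng.normal(size=(64, 4))                  # one site's features (synthetic)
y = (X[:, 0] > 0).astype(float)               # label driven by feature 0
w_site = personalize(np.zeros(4), X, y)       # personalized weights
```

Because only a handful of steps are taken from the global initialization, the personalized model specializes to the site without drifting far from the global anchor.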
eICU mortality, 208 hospitals
CHiP-FL improves both global AUROC and tail performance, and nearly eliminates the size-bias slope.
Equitable AI for non-IID clinical federations
Hospitals differ in patient mix, instrumentation, and acuity. Standard federated optimizers minimize a size-weighted global objective, which silently transfers performance from small community hospitals to large academic centers.
CHiP-FL reframes the federation as a hierarchy of clusters, each representing a cohort of hospitals with similar data signatures, and adds a personalization step so every site can specialize without drifting from the global anchor. We benchmark on the eICU Collaborative Research Database across hospital-level mortality prediction.
We introduce CHiP-FL, a federated learning framework that combines client clustering, hierarchical aggregation, and personalized refinement to mitigate site-size bias in clinical prediction. On 208 ICUs from the eICU CRD, CHiP-FL improves global AUROC by +0.05 over FedAvg while reducing the AUROC-vs-size slope by 87% and the p10/p90 spread by 41%.
- Python · NumPy · scikit-learn
- Flower (flwr) federated simulation
- eICU CRD · Parquet · PyArrow