JuliaHealth Projects – Summer of Code

JuliaHealth is an organization dedicated to improving healthcare by promoting open-source technologies and data standards. Our community is made up of researchers, data scientists, software developers, and healthcare professionals who are passionate about using technology to improve patient outcomes and promote data-driven decision-making. We believe that by working together and sharing our knowledge and expertise, we can create powerful tools and solutions that have the potential to transform healthcare.

Observational Health Subecosystem Projects

Project 1: Supporting Patient Level Prediction Pipelines within JuliaHealth

Description: Patient level prediction (PLP) is an important area of observational health research that uses patient data to predict outcomes such as disease progression, response to treatment, and hospital readmission. JuliaHealth is interested in developing supporting tooling for PLP that uses historical patient data, such as medical claims or electronic health records, conforming to the OMOP Common Data Model (OMOP CDM), a widely used data standard that lets researchers analyze large, heterogeneous healthcare datasets in a consistent and efficient manner. For this project, we are looking for students interested in developing this PLP tooling within JuliaHealth.

This project is highly experimental and exploratory. To set expectations, here is one possible approach students could follow while working on it:

Whatever functionality is developed, students will also be expected to contribute to the existing package documentation to show how the new features can be used. More broadly, the project aims to provide the foundational support JuliaHealth needs to accommodate the multiple data modalities available in public health settings. The long-term goal is to use this foundational tooling within JuliaHealth to support patient level prediction workflows that combine observational health data with additional sources such as survey data, social determinants of health data, and climate data.

Additionally, depending on how successful the resulting package is, there may be an opportunity to run experiments on real patient data to generate population-level insights for a chosen research question. This could turn into a separate research paper, conference submission, or poster. Project mentors will support whatever form this takes.

Medical Imaging Subecosystem Projects

Julia Radiomics

Project Title: Julia Radiomics
Difficulty: Medium
Duration: 375 hours (22 Weeks)
Mentor: Jakub Mitura

Description

Radiomic features are quantitative metrics extracted from medical images using data-characterization algorithms. These features capture tissue and lesion characteristics, such as heterogeneity and shape, which may provide valuable insights beyond what the naked eye can perceive.

This project aims to implement, in Julia, algorithms for extracting radiomic features from 2D and 3D medical images, similar to those in PyRadiomics. The implementation will cover the Gray Level Co-occurrence Matrix (GLCM), Gray Level Size Zone Matrix (GLSZM), Gray Level Run Length Matrix (GLRLM), Neighborhood Gray Tone Difference Matrix (NGTDM), and Gray Level Dependence Matrix (GLDM). The extracted features will be validated against PyRadiomics and applied to medical imaging data, such as the AutoPET dataset, to demonstrate the methodology.
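To make the first of these matrices concrete, the sketch below builds a normalized GLCM for a single neighbor offset and computes one feature from it (contrast). It is written in Python purely to illustrate the algorithm. It assumes the image has already been discretized into integer gray levels 0..levels-1; the function names and the symmetric counting default are choices made for this illustration. PyRadiomics additionally aggregates over several angles and distances and handles 3D neighborhoods and masks, which this sketch omits.

```python
from collections import Counter

def glcm(image, levels, offset=(0, 1), symmetric=True):
    """Normalized GLCM of a 2D image already discretized to gray levels 0..levels-1.

    offset=(dr, dc) selects the neighbor; (0, 1) pairs each pixel with its right neighbor.
    """
    dr, dc = offset
    rows, cols = len(image), len(image[0])
    counts = Counter()
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[(i, j)] += 1
                if symmetric:  # count the pair in both directions
                    counts[(j, i)] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("offset leaves no valid pixel pairs")
    return [[counts[(i, j)] / total for j in range(levels)] for i in range(levels)]

def contrast(p):
    """GLCM contrast: sum over i, j of (i - j)^2 * p(i, j)."""
    n = len(p)
    return sum((i - j) ** 2 * p[i][j] for i in range(n) for j in range(n))

# Usage: a tiny 2x3 image with three gray levels.
image = [[0, 0, 1],
         [1, 2, 2]]
P = glcm(image, levels=3)
print(contrast(P))  # 0.5
```

Every other GLCM feature (energy, correlation, and so on) is a different weighted sum over the same normalized matrix, so most of the per-feature work reduces to building P correctly.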

Deliverables

Implementation of Radiomic Feature Extraction Algorithms

Feature Extraction Pipeline

Validation

Final Report & Code Repository

Success Criteria and Timeline

  1. Literature Review and Setup (3 Weeks)

  2. Feature Implementation (6 Weeks)

  3. Feature Extraction Pipeline (4 Weeks)

  4. Validation (3 Weeks)

  5. Documentation and Packaging (4 Weeks)

  6. Reporting (2 Weeks)

Stretch Goals

Clarification

This implementation will be done entirely in Julia, and Python will not be used in any part of the implementation. Any comparison against PyRadiomics serves purely as a validation benchmark.

Importance and Impact

Technical Impact

Clinical Impact

Community Impact


KLAY-Core: High-Performance Neurosymbolic Constraint Layers for Trustworthy Medical AI

Difficulty: Hard / Ambitious
Duration: 350 hours (22 weeks)
Mentor: Jakub Mitura
Technology Stack: Julia, Lux.jl, NNlib.jl, ChainRules.jl, LogExpFunctions.jl

Description

Deep learning in medical diagnostics suffers from a well-known trust gap. Models often behave as black boxes and may produce physiologically implausible predictions, for example simultaneously predicting cachexia and obesity for the same patient. This lack of interpretability and clinical consistency limits the adoption of AI systems in healthcare environments.

Neurosymbolic artificial intelligence (NeSy) addresses this limitation by integrating structured logical knowledge directly into neural models. However, many existing approaches struggle with numerical stability, scalability, and GPU efficiency when deployed in realistic clinical settings.

KLAY-Core is a high-performance logical constraint layer designed for Lux.jl. It enables domain experts and developers to encode clinical knowledge as differentiable logical constraints integrated directly into neural network architectures.

Using the Knowledge Layers (KLAY) architecture, the project statically linearizes logical circuits in deterministic decomposable negation normal form (d-DNNF) into optimized tensor buffers. Circuit evaluation then reduces to sequences of NNlib.scatter operations and tensor indexing, which parallelize well on GPUs, while the encoded constraints keep predictions physiologically consistent.
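The reduction to scatter operations can be illustrated for a single layer of gates. In the sketch below, each edge e carries a child's log-probability and `parent[e]` names the gate it feeds; OR gates become a segment logsumexp (a scatter with max, then a scatter with +), and AND gates become a segment sum. The explicit loops stand in for vectorized GPU scatters; the function names are hypothetical and this is not KLAY-Core's actual code.

```python
import math

def scatter_max(src, index, n_out):
    """out[k] = max of src[e] over all e with index[e] == k (like a scatter with max)."""
    out = [-math.inf] * n_out
    for v, k in zip(src, index):
        out[k] = max(out[k], v)
    return out

def scatter_add(src, index, n_out):
    """out[k] = sum of src[e] over all e with index[e] == k (like a scatter with +)."""
    out = [0.0] * n_out
    for v, k in zip(src, index):
        out[k] += v
    return out

def or_layer(child_logp, parent, n_gates):
    """Log-space OR for every gate of one layer at once (segment logsumexp)."""
    m = scatter_max(child_logp, parent, n_gates)
    # gather each gate's max back onto its edges, shift, exponentiate
    shifted = [math.exp(v - m[k]) if m[k] > -math.inf else 0.0
               for v, k in zip(child_logp, parent)]
    s = scatter_add(shifted, parent, n_gates)
    return [mk + math.log(sk) if sk > 0.0 else -math.inf for mk, sk in zip(m, s)]

def and_layer(child_logp, parent, n_gates):
    """Log-space AND for every gate of one layer at once (segment sum)."""
    return scatter_add(child_logp, parent, n_gates)

# Usage: edges 0 and 1 feed gate 0, edges 2 and 3 feed gate 1.
logp = [math.log(p) for p in (0.2, 0.3, 0.5, 0.4)]
parent = [0, 0, 1, 1]
print([math.exp(v) for v in or_layer(logp, parent, 2)])   # approx [0.5, 0.9]
print([math.exp(v) for v in and_layer(logp, parent, 2)])  # approx [0.06, 0.2]
```

Because every gate in a layer is handled by the same two or three array-wide operations, the cost per layer does not depend on how the gates happen to be nested in the original circuit.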

Main Goals and Implementation

Medical Logic Compiler and d-DNNF Bridge (Julia-Native)

The project follows a "compile once, evaluate often" paradigm for efficient integration of symbolic knowledge into neural models.

Yggdrasil and JLL Integration

High-performance symbolic compilers (e.g., d4, SDD) will be distributed as precompiled binaries via Yggdrasil and JLL packages. This enables a fully Julia-native workflow that requires neither a Python environment nor a locally configured C++ toolchain.

Level-Order Flattening

A dedicated algorithm groups logical graph nodes into layers based on structural height. This converts hierarchical logical circuits into flat GPU-friendly buffers, eliminating recursion and enabling efficient parallel execution.
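A minimal sketch of level-order flattening, assuming a circuit given as a dictionary from gate name to `(kind, children)`: leaves get height 0, each gate gets 1 plus the maximum height of its children, and gates are grouped by (height, kind) into flat index buffers. Recursion is used only once, at compile time; evaluation is a plain loop over layers (gather child values, reduce, write back). The data layout and names are illustrative assumptions, not the package's design.

```python
import math
from collections import defaultdict

def flatten(circuit, leaves):
    """Compile a circuit into per-layer index buffers.

    circuit maps gate name -> (kind, children) with kind "and" or "or".
    Returns (slot, layers): slot maps each node to a buffer position; each layer
    is (kind, out_slots, child_slots, parent_local).
    """
    height = {leaf: 0 for leaf in leaves}

    def h(node):  # memoized structural height; recursion only at compile time
        if node not in height:
            height[node] = 1 + max(h(c) for c in circuit[node][1])
        return height[node]

    groups = defaultdict(list)
    for node, (kind, _) in circuit.items():
        groups[(h(node), kind)].append(node)

    slot = {leaf: i for i, leaf in enumerate(leaves)}
    layers = []
    for key in sorted(groups):  # lower heights first, so children already have slots
        out_slots, child_slots, parent_local = [], [], []
        for local, node in enumerate(groups[key]):
            slot[node] = len(slot)
            out_slots.append(slot[node])
            for child in circuit[node][1]:
                child_slots.append(slot[child])
                parent_local.append(local)
        layers.append((key[1], out_slots, child_slots, parent_local))
    return slot, layers

def evaluate(slot, layers, leaf_logp):
    """Non-recursive evaluation: one gather, reduce, write-back step per layer."""
    buf = [-math.inf] * len(slot)
    for name, lp in leaf_logp.items():
        buf[slot[name]] = lp
    for kind, out_slots, child_slots, parent_local in layers:
        vals = [buf[s] for s in child_slots]        # gather child values
        n = len(out_slots)
        if kind == "and":                           # log-space AND: sum
            res = [0.0] * n
            for v, k in zip(vals, parent_local):
                res[k] += v
        else:                                       # log-space OR: stable logsumexp
            m = [-math.inf] * n
            for v, k in zip(vals, parent_local):
                m[k] = max(m[k], v)
            acc = [0.0] * n
            for v, k in zip(vals, parent_local):
                if m[k] > -math.inf:
                    acc[k] += math.exp(v - m[k])
            res = [mk + math.log(a) if a > 0.0 else -math.inf for mk, a in zip(m, acc)]
        for s, r in zip(out_slots, res):            # write gate outputs into the buffer
            buf[s] = r
    return buf

# Usage: d-DNNF for "not (cachexia and obesity)" = (not c) or (c and not o)
circuit = {"g_and": ("and", ["c", "not_o"]), "root": ("or", ["not_c", "g_and"])}
slot, layers = flatten(circuit, ["c", "not_c", "o", "not_o"])
pc, po = 0.7, 0.6
leaf_logp = {"c": math.log(pc), "not_c": math.log(1 - pc),
             "o": math.log(po), "not_o": math.log(1 - po)}
buf = evaluate(slot, layers, leaf_logp)
print(math.exp(buf[slot["root"]]))  # approx 1 - pc * po = 0.58
```

The OR in the example is deterministic (its children disagree on c) and the AND is decomposable (its children share no variable), which is what allows probabilities to be added at OR nodes and multiplied at AND nodes.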

Solving the Derivative Bottleneck

Custom adjoints (rrule methods defined through the ChainRules.jl interface) aim to make the backward pass as efficient as that of standard neural layers, avoiding the memory overhead typical of automatic differentiation through recursive circuit evaluation.
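The adjoint of one flattened layer has a simple closed form, which is what makes a hand-written rule attractive. For an OR gate computed as y[k] = logsumexp of its children, the derivative of y[k] with respect to child x[e] is exp(x[e] - y[k]), a softmax weight; for an AND gate (a sum) every child simply receives its gate's cotangent. The sketch below mirrors the shape of a ChainRulesCore `rrule` (primal output plus a pullback closure) in Python and checks it against finite differences. It assumes finite inputs, and the function names are hypothetical.

```python
import math

def or_forward(x, parent, n):
    """Segment logsumexp: y[k] = log of the sum of exp(x[e]) over edges e into gate k."""
    m = [-math.inf] * n
    for v, k in zip(x, parent):
        m[k] = max(m[k], v)
    acc = [0.0] * n
    for v, k in zip(x, parent):
        acc[k] += math.exp(v - m[k])
    return [mk + math.log(a) for mk, a in zip(m, acc)]

def or_pullback(x, parent, y, dy):
    """Adjoint of segment logsumexp: d y[k] / d x[e] = exp(x[e] - y[k]).
    Needs only the layer's inputs and outputs, not a stored recursion graph."""
    return [dy[k] * math.exp(v - y[k]) for v, k in zip(x, parent)]

def and_pullback(parent, dy):
    """Adjoint of segment sum: each child edge receives its gate's cotangent unchanged."""
    return [dy[k] for k in parent]

def or_rrule(x, parent, n):
    """rrule-style pair: primal output and a closure mapping output to input cotangents."""
    y = or_forward(x, parent, n)
    return y, lambda dy: or_pullback(x, parent, y, dy)

# Usage: compare the hand-written pullback with central finite differences.
x = [math.log(p) for p in (0.2, 0.3, 0.5, 0.4)]
parent, dy = [0, 0, 1, 1], [1.0, 2.0]
y, pullback = or_rrule(x, parent, 2)
analytic = pullback(dy)
eps = 1e-6
numeric = []
for e in range(len(x)):
    xp, xm = list(x), list(x)
    xp[e] += eps
    xm[e] -= eps
    lp = sum(d * v for d, v in zip(dy, or_forward(xp, parent, 2)))
    lm = sum(d * v for d, v in zip(dy, or_forward(xm, parent, 2)))
    numeric.append((lp - lm) / (2 * eps))
```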

User Interface – The @clinical_rule Macro

To lower usability barriers for clinicians and developers, the package introduces a domain-specific language (DSL) macro, @clinical_rule, that supports full Boolean logic and weighted relationships with weights w ∈ [0,1].

Unlike Python-based frameworks such as Dolphin, which rely on object-oriented logic definitions, KLAY-Core offers a declarative macro interface integrated directly with the Julia compiler. This improves readability, auditability, and interdisciplinary collaboration between clinicians and AI engineers.

Supported Logical Operators:

Constraint Types:

Hard Constraints (w = 1.0): Strict logical rules ensuring physiological consistency.

Soft Constraints (w < 1.0): Probabilistic correlations or clinical risk relationships.
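The section does not specify how a weight w enters the model. One reading consistent with "w = 1 is strict, w < 1 is a soft correlation" is a weighted-model-counting factor: every joint assignment ("world") that violates the rule has its probability multiplied by (1 - w), and the result is renormalized. The sketch below applies that assumed semantics by brute-force enumeration, treating the network's per-variable outputs as independent. The function and its arguments are hypothetical, not the @clinical_rule API. Enumeration is exponential in the number of variables, which is exactly the cost that compiling rules to d-DNNF is meant to avoid.

```python
from itertools import product

def constrained_marginals(p, rule, w):
    """Marginals P(var = 1) after applying one weighted rule (assumed semantics).

    p:    {var: P(var = 1)} as predicted by the network, treated as independent.
    rule: function(world) -> bool, where world maps each var to 0 or 1.
    w:    rule strength in [0, 1]; violating worlds are scaled by (1 - w),
          so w = 1 removes them (hard) and w = 0 leaves p unchanged.
    """
    names = sorted(p)
    z = 0.0
    mass = dict.fromkeys(names, 0.0)
    for bits in product((0, 1), repeat=len(names)):  # 2 ** len(names) worlds
        world = dict(zip(names, bits))
        weight = 1.0
        for n in names:
            weight *= p[n] if world[n] else 1.0 - p[n]
        if not rule(world):
            weight *= 1.0 - w
        z += weight
        for n in names:
            if world[n]:
                mass[n] += weight
    if z == 0.0:
        raise ValueError("every world violates a hard rule")
    return {n: mass[n] / z for n in names}

def not_both(world):
    """Example rule: a patient is not both cachectic and obese."""
    return not (world["cachexia"] and world["obesity"])

p = {"cachexia": 0.7, "obesity": 0.6}
print(constrained_marginals(p, not_both, w=1.0))  # hard: approx cachexia 0.483, obesity 0.310
print(constrained_marginals(p, not_both, w=0.5))  # soft: pulled only part of the way
```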

KLAY-Core Engine for Lux.jl

Explicit Layer Design

Implementation of an AbstractExplicitLayer (renamed AbstractLuxLayer in Lux.jl v1) in which the circuit structure is stored in the layer state while constraint strengths remain trainable parameters. This separation supports the determinism, transparency, and reproducibility required in medical AI systems.

Log-Space Numerical Stability

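Logical gates are evaluated in log space, using logsumexp for OR nodes and summation for AND nodes (LogExpFunctions.jl provides a numerically stable logsumexp). This avoids the floating-point underflow that occurs when many small probabilities are multiplied, along with the vanishing gradients that follow from it.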
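A short demonstration of why log space matters: multiplying 400 probabilities of 0.1 underflows to exactly 0.0 in double precision, while the same AND computed as a sum of logs stays finite. The OR below uses the standard max-shift logsumexp trick, and it is valid as "probability of OR" only because the children of a deterministic d-DNNF OR node are mutually exclusive. The helper names are illustrative.

```python
import math

def log_and(logps):
    """AND of independent events in log space: log(prod p_i) = sum(log p_i)."""
    return math.fsum(logps)

def log_or(logps):
    """OR of mutually exclusive events in log space: log(sum p_i),
    computed by shifting by the maximum so no exp() overflows or underflows to 0."""
    m = max(logps)
    if m == -math.inf:
        return -math.inf
    return m + math.log(math.fsum(math.exp(v - m) for v in logps))

probs = [0.1] * 400
naive = math.prod(probs)                        # probability space: 1e-400 underflows
stable = log_and([math.log(p) for p in probs])  # log space: finite, about -921.03
print(naive, stable)
```

Once the product has underflowed to zero, every gradient flowing through it is zero as well, which is the vanishing-gradient effect the log-space formulation avoids.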

Comparison with Existing Solutions

| Feature | KLAY-Core (Julia) | Dolphin (Python/PyTorch) | DeepProbLog / LTN | Juice.jl (Julia) |
| --- | --- | --- | --- | --- |
| GPU parallelism | Native scatter-reduce | Standard PyTorch ops | Mostly sequential | Limited optimized kernels |
| Integration | Native Lux.jl | Wrapper-style integration | Python–C++ bridges | Independent library |
| Ecosystem | JLL / Yggdrasil | Pip / Conda environments | Mixed dependencies | Native Julia ecosystem |
| Interface | High-level DSL macro | Python API definitions | Logic-heavy syntax | Low-level graph APIs |
| Gradient stability | Custom rrule | Standard AD | Potential instability | Variable stability |

Competitive Edge: KLAY-Core combines Julia's performance, macro system, and binary artifact ecosystem with a modern explicit deep learning framework (Lux.jl). Rather than functioning as an external wrapper, it becomes an integral neural network component, simplifying deployment, improving reproducibility, and reducing operational complexity in clinical AI environments.

Project Timeline (22 Weeks)


Capsule Networks for 3D Medical Imaging Segmentation in Julia

Difficulty: Hard
Duration: 350 hours
Mentor: Jakub Mitura
Technology Stack: Julia, Lux.jl, MedPipe3D.jl, KernelAbstractions.jl, CUDA.jl, MLUtils.jl, Mooncake.jl

Deliverables

Success Criteria and Timeline

This project is scoped for a 350-hour GSoC timeframe (approximately 12–13 weeks). The following milestones and success criteria outline the expected progression.

Community Bonding (pre-coding period)

Weeks 1–3: Core Capsule Primitives and 3D Extensions

Weeks 4–6: SegCaps Architectures and Integration

Weeks 7–9: Efficient Routing and GPU Optimization

Weeks 10–11: Benchmarking and Cross-Task Transfer

Week 12+: Documentation, Polish, and Upstreaming

Description

This project implements 3D Capsule Network (CapsNet) architectures in the Julia ecosystem, using Lux.jl and MedPipe3D.jl, for volumetric medical image segmentation. The core work involves building a SegCaps (Segmentation Capsules) layer abstraction that supports dynamic routing-by-agreement, and extending it to 3D convolutional capsules with equivariance-preserving pose matrices. We will implement two key variants: (1) a 3D SegCaps U-Net hybrid that replaces the encoder/decoder convolution blocks with capsule layers while retaining skip connections, and (2) an efficient locally constrained routing variant that manages the quadratic computational cost of full capsule coupling in volumetric data. Custom GPU kernels written with KernelAbstractions.jl (run on NVIDIA hardware via CUDA.jl) will accelerate the routing procedure, and the full pipeline (preprocessing, training, and evaluation) will integrate with MedPipe3D.jl's NIfTI/DICOM I/O and metric infrastructure.
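For readers new to capsules, the sketch below shows the routing-by-agreement loop as introduced by Sabour et al. (2017) for fully connected capsule layers: coupling coefficients are a softmax over routing logits, each upper capsule squashes its weighted vote, and logits grow where a lower capsule's prediction agrees with the result. It takes the prediction vectors u_hat as given (no transformation matrices), works in plain Python lists, and omits the local windows and 3D convolutions that SegCaps-style routing adds; the names are illustrative.

```python
import math

def squash(s):
    """Capsule nonlinearity: keeps the vector's direction, maps its length into [0, 1)."""
    n2 = sum(x * x for x in s)
    if n2 == 0.0:
        return [0.0] * len(s)
    scale = n2 / (1.0 + n2) / math.sqrt(n2)
    return [scale * x for x in s]

def route(u_hat, iterations=3):
    """Dynamic routing-by-agreement between two fully connected capsule layers.

    u_hat[i][j] is lower capsule i's prediction vector for upper capsule j.
    Returns the upper capsule outputs v and the final coupling coefficients c.
    """
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits, start uniform
    for it in range(iterations):
        c = []
        for i in range(n_in):  # coupling coefficients: softmax of b[i] over upper capsules
            m = max(b[i])
            e = [math.exp(x - m) for x in b[i]]
            z = sum(e)
            c.append([x / z for x in e])
        v = []
        for j in range(n_out):  # weighted vote, then squash
            s = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in)) for d in range(dim)]
            v.append(squash(s))
        if it < iterations - 1:  # raise logits where a prediction agrees with the output
            for i in range(n_in):
                for j in range(n_out):
                    b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v, c

# Usage: two lower capsules agree about upper capsule 0 and contradict each other about 1.
u_hat = [[[1.0, 0.0], [0.0, 1.0]],
         [[1.0, 0.0], [0.0, -1.0]]]
v, c = route(u_hat)
print(c[0])  # coupling to capsule 0 grows above the uniform 0.5
```

Every lower capsule holds a logit for every upper capsule, so the cost grows with the product of the two layer sizes; in volumetric data that product is what the locally constrained variant is meant to cut down.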

The central hypothesis is that capsule networks' explicit encoding of part-whole spatial hierarchies and viewpoint-equivariant pose vectors yields better cross-domain generalization than standard CNNs, whose max-pooling discards spatial relationships. We will rigorously benchmark 3D SegCaps against a 3D U-Net baseline across all 10 tasks of the Medical Segmentation Decathlon (covering CT and MRI of the brain, liver, lung, pancreas, and other anatomy), measuring not only per-task Dice and HD95 but, critically, cross-task transfer: models pretrained on one organ or modality and fine-tuned on another. We expect capsule routing to preserve geometric structure better across domains and thereby improve few-shot adaptation. All code, pretrained weights, and reproducible experiment scripts will be contributed to the JuliaHealth ecosystem under the MIT license.
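The per-task overlap metric is simple enough to state as code. The sketch computes the Dice coefficient, 2|A ∩ B| / (|A| + |B|), for binary masks represented as sets of foreground voxel coordinates. Returning 1.0 when both masks are empty is a convention chosen here, not a universal one. HD95 is left out because toolkits differ in exactly how they pool the two directed surface distances before taking the 95th percentile.

```python
def dice(pred, truth):
    """Dice = 2 * |A & B| / (|A| + |B|) for masks given as sets of voxel coordinates."""
    if not pred and not truth:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return 2 * len(pred & truth) / (len(pred) + len(truth))

def mask_to_set(volume):
    """Convert a nested-list 3D binary volume into a set of (z, y, x) foreground voxels."""
    return {(z, y, x)
            for z, plane in enumerate(volume)
            for y, row in enumerate(plane)
            for x, val in enumerate(row) if val}

pred = mask_to_set([[[1, 1], [1, 0]]])
truth = mask_to_set([[[0, 1], [1, 1]]])
print(dice(pred, truth))  # 2 * 2 / (3 + 3), about 0.667
```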


Enhancing MedPipe3D: Building a Comprehensive Medical Imaging Pipeline in Julia

Description

MedPipe3D was created to improve integration between the other packages of this small ecosystem (MedEye3D, MedEval3D, and MedImage). It now needs to be expanded and adapted to serve as the basis for a fully functional medical imaging pipeline.

Mentor: Jakub Mitura [email: jakub.mitura14@gmail.com]

Project Difficulty and Timeline

Difficulty: Hard
Duration: 12 weeks

Required Skills and Background

Potential Outcomes

This set of changes, although time-consuming to implement, should not pose significant difficulties for anyone experienced with the Julia programming language. Each feature will be implemented using existing Julia libraries and frameworks where possible. Together, these changes will be a major step toward making Julia a viable alternative to Python for developing end-to-end medical image segmentation algorithms.

Success Criteria and Time Needed

  1. Logging: Implement logging to track training progress and help debug issues - 2 weeks.

  2. Performance Improvements: Optimize the performance of augmentations to ensure efficient processing - 2 weeks.

  3. Memory Usage Inspection: Enable per-layer memory usage inspection of Lux models to monitor and optimize memory consumption - 2 weeks.

  4. Gradient Checkpointing: Enable gradient checkpointing of chosen layers to save memory during training - 4 weeks.

  5. Tabular Data Support: Support loading tabular data (e.g., clinical data) together with the image into the supplied model - 1 week.

  6. Documentation: Improve documentation to provide clear instructions and examples for users - 1 week.

Total estimated time: 12 weeks.
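Gradient checkpointing (item 4) trades compute for memory: instead of keeping every intermediate activation for the backward pass, only every k-th activation is stored, and each segment is recomputed from its checkpoint when the backward pass reaches it. The toy sketch below does this for a chain of scalar tanh layers and checks that it reproduces ordinary backpropagation. The layer type and the `every` segment-size parameter are illustrative assumptions; it does not show how Lux.jl or MedPipe3D would expose the feature.

```python
import math

def layer(a, x):
    """One toy layer with a single scalar parameter a."""
    return math.tanh(a * x)

def layer_grads(a, x):
    """Local derivatives of tanh(a * x) with respect to the input x and the parameter a."""
    d = 1.0 - math.tanh(a * x) ** 2
    return a * d, x * d

def grads_full(params, x0):
    """Baseline backprop: stores all len(params) + 1 activations."""
    acts = [x0]
    for a in params:
        acts.append(layer(a, acts[-1]))
    g_x, g = 1.0, [0.0] * len(params)
    for i in reversed(range(len(params))):
        dx, da = layer_grads(params[i], acts[i])
        g[i] = g_x * da
        g_x *= dx
    return acts[-1], g

def grads_checkpointed(params, x0, every):
    """Stores only every `every`-th activation; each segment is recomputed during backward."""
    n = len(params)
    ckpt, x = {0: x0}, x0
    for i, a in enumerate(params):
        x = layer(a, x)
        if (i + 1) % every == 0:
            ckpt[i + 1] = x
    g_x, g = 1.0, [0.0] * n
    for start in sorted((k for k in ckpt if k < n), reverse=True):
        end = min(start + every, n)
        seg = [ckpt[start]]  # recompute this segment's layer inputs from its checkpoint
        for i in range(start, end - 1):
            seg.append(layer(params[i], seg[-1]))
        for i in reversed(range(start, end)):
            dx, da = layer_grads(params[i], seg[i - start])
            g[i] = g_x * da
            g_x *= dx
    return x, g

params = [0.5, -1.2, 0.8, 1.5, -0.3, 0.9, 1.1]
out_full, g_full = grads_full(params, 0.7)
out_ckpt, g_ckpt = grads_checkpointed(params, 0.7, every=3)
```

With n layers and segment size k, roughly n/k checkpoints plus k recomputed activations are alive at once instead of n, at the cost of about one extra forward pass.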

Why Implementation of These Features is Important

Implementing these features is crucial for advancing medical imaging technology. Enhanced logging with TensorBoard integration will allow for better insight into model training. Performance improvements ensure reliable and efficient processing of large datasets. Improved documentation and memory management make the tools more accessible and usable for medical professionals, facilitating better integration into clinical workflows. Supporting tabular data alongside imaging allows for comprehensive analysis, combining clinical and imaging data to improve diagnostic accuracy and patient outcomes.

For each item, the mentor will provide the contributor with examples of the required functionality in Python, or will point to existing Julia libraries that already implement it and only need to be integrated.