This page is designed to improve discoverability of projects. You can, for example, search this page for specific keywords and find all of the relevant projects.

MLJ is a machine learning framework for Julia aiming to provide a convenient way to use and combine a multitude of tools and models available in the Julia ML/Stats ecosystem.

MLJ is released under the MIT license and sponsored by the Alan Turing Institute.

- View all GSoC/JSoC Projects
- Projects
- Machine Learning in Predictive Survival Analysis
- Feature transformations
- Time series forecasting at scale - speed up via Julia
- Interpretable Machine Learning in Julia
- Model visualization in MLJ
- Deeper Bayesian Integration
- Tracking and sharing MLJ workflows using MLFlow
- Speed demons only need apply
- Correcting for class imbalance in classification problems
- Improving test coverage (350 hours)
- Multi-threading Improvement Projects (350 hours)
- Automation of testing / performance benchmarking (350 hours)
- Bringing DFTK to graphics-processing units (GPUs)
- Documenter.jl
- Docsystem API
- Metalhead.jl Development
- FastAI.jl Time Series Development
- FastAI.jl Text Development
- Differentiable Computer Vision
- FermiNets: Generative Synthesis for Automating the Choice of Neural Architectures
- Differentiable Rendering
- Adding graph convolutional layers
- Adding models and examples
- Adding graph datasets
- Supporting heterogeneous graphs
- Training on very large graphs
- Supporting temporal graph neural networks
- Improving performance using sparse linear algebra

- Recommended skills
- Mentors
- QML and Makie integration
- Web apps in Makie and JSServe
- Scheduling algorithms for Distributed algorithms
- Distributed Training
- Benchmarking against other frameworks
- GPU support for many algorithms
- Better ImageIO support (open ended)
- EXIF viewer
- Better QR Code support (open ended)
- Where to go for discussion and to find mentors
- C++
- Rust
- Improve Javis Performance
- Building Novel Animation Abilities for Javis
- Agents.jl
- DynamicalSystems.jl
- MIDIfication of music from wave files
- Efficient low-dimensional symbolic-numeric set computations
- Reachability with sparse polynomial zonotopes
- Improving the hybrid systems reachability API
- Panel data analysis
- CRRao.jl
- JuliaStats Improvements
- Smoothing non-linear continuous time systems
- Developing a Julia plugin/frontend allowing the application of a custom compiler pipeline
- Developing Loop Models (350 hours):
- Numerical Linear Algebra
- Better Bignums Integration
- Massive parallel factorized bouncy particle sampler
- Pluto as a VS Code notebook
- Tools for education
- Electron app
- Wrapping a Rust HTTP server in Julia
- Machine Learning Time Series Regression
- Machine learning for nowcasting and forecasting
- Time series forecasting at scale
- GPU accelerated simulator of Clifford Circuits.
- Pauli Frames for faster sampling.
- A Zoo of Quantum Error Correcting codes.
- Left/Right multiplications with small gates.
- Symbolic root finding
- Symbolic Integration in Symbolics.jl
- XLA-style optimization from symbolic tracing
- Automatically improving floating point accuracy (Herbie)
- Implement Flashfill in Julia
- Parquet.jl enhancements
- Statistical transforms
- Utility transforms
- How to get started?
- Machine learning in topology optimisation
- Multi-material design representation
- Optimisation on a uniform rectilinear grid
- Adaptive mesh refinement for topology optimisation
- Heat transfer design optimisation
- More real-world Bayesian models in Turing / Julia
- Improving the integration between Turing and Turing's MCMC inference packages
- Directed-graphical model support for the abstract probabilistic programming library
- A modular tape caching mechanism for ReverseDiff
- Benchmarking & improving performance of the JuliaGaussianProcesses libraries
- Iterative methods for inference in Gaussian Processes
- Approximate inference methods for non-Gaussian likelihoods in Gaussian Processes
- GPU integration in the JuliaGPs ecosystem
- VS Code extension
- Package installation UI
- Code generation improvements and async ABI
- Wasm threading
- High performance, Low-level integration of js objects
- DOM Integration
- Porting existing web-integration packages to the wasm platform
- Native dependencies for the web
- Distributed computing with untrusted parties
- Deployment

Implement survival analysis models for use in the MLJ machine learning platform.

**Difficulty.** Moderate to hard. **Duration.** 350 hours.

Survival/time-to-event analysis is an important field of statistics concerned with understanding the distribution of events over time. Survival analysis presents a unique challenge because we are also interested in events that do not take place, which we refer to as 'censoring'. Survival analysis methods are important in many real-world settings, such as health care (disease prognosis), finance and economics (risk of default), commercial ventures (customer churn), engineering (component lifetime), and many more. This project aims to implement models for performing survival analysis with the MLJ machine learning framework.

**Mentors.** Sebastian Vollmer, Anthony Blaom.

- Julia language fluency is essential.
- Git-workflow familiarity is strongly preferred.
- Some experience with survival analysis is desirable.
- Familiarity with MLJ's API is a plus.
- A passing familiarity with machine learning goals and workflow is preferred.

Specifically, you will:

- Familiarize yourself with the training and evaluation of machine learning models in MLJ.
- Survey existing survival models in Julia.
- Integrate some existing classical survival models into MLJ.
- Develop a proof of concept for newer advanced survival analysis models not currently implemented in Julia.
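To give a flavor of the integration step, here is a minimal sketch of exposing a classical Cox model from Survival.jl through the MLJ model interface. The `CoxRegressor` type is hypothetical, and the Survival.jl calls should be checked against its current API:

```julia
# Hypothetical sketch: a Survival.jl Cox model behind the MLJ interface.
import MLJModelInterface as MMI
using Survival, StatsBase

mutable struct CoxRegressor <: MMI.Deterministic end

function MMI.fit(::CoxRegressor, verbosity, X, y)
    # y is assumed to be a vector of Survival.EventTime values
    # (event/censoring times together with status flags)
    model = fit(CoxModel, MMI.matrix(X), y)
    return model, nothing, NamedTuple()
end

# Predict relative risk scores (the linear predictor) for new data:
MMI.predict(::CoxRegressor, model, Xnew) = MMI.matrix(Xnew) * coef(model)
```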

References:

- [Kvamme, H., Borgan, Ø., & Scheel, I. (2019). Time-to-event prediction with neural networks and Cox regression. Journal of Machine Learning Research, 20(129), 1–30.](https://arxiv.org/abs/1907.00825)
- [Lee, C., Zame, W. R., Yoon, J., & van der Schaar, M. (2018). DeepHit: A deep learning approach to survival analysis with competing risks. In Thirty-Second AAAI Conference on Artificial Intelligence.](https://ojs.aaai.org/index.php/AAAI/article/view/11842/11701)
- [Katzman, J. L., Shaham, U., Cloninger, A., Bates, J., Jiang, T., & Kluger, Y. (2018). DeepSurv: personalized treatment recommender system using a Cox proportional hazards deep neural network. BMC Medical Research Methodology, 18(1), 24.](https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-018-0482-1) <https://doi.org/10.1186/s12874-018-0482-1>
- [Survival.jl documentation](https://juliastats.org/Survival.jl/latest/)

Enhancing MLJ data-preprocessing capabilities by integrating TableTransforms into MLJ.

**Difficulty.** Easy. **Duration.** 350 hours.

TableTransforms.jl is a Julia package, heavily inspired by FeatureTransforms.jl, which aims to provide the feature-engineering transforms that are vital in the statistics and machine learning domains. This project would implement the necessary methods to integrate TableTransforms with MLJ, making these transforms available for incorporation into sophisticated ML workflows.

**Mentors.** Anthony Blaom.

- Julia language fluency is essential.
- Git-workflow familiarity is strongly preferred.
- A passing familiarity with machine learning goals and workflow is preferred.

- Implement the MLJ model interface for transformers in TableTransforms.jl.
- Integrate TableTransforms pipelines with MLJ.
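For orientation, TableTransforms exposes a functional `apply`/`revert` interface, so the integration largely amounts to routing that through MLJ's model protocol. A rough sketch for stateless transforms (the wrapper type below is hypothetical):

```julia
# Hypothetical MLJ wrapper around a stateless TableTransforms transform.
import MLJModelInterface as MMI
using TableTransforms

mutable struct TableTransformer <: MMI.Static
    transform  # any TableTransforms transform, e.g. ZScore()
end

# Static MLJ models need no training; transform applies directly.
function MMI.transform(model::TableTransformer, _, X)
    Xnew, _cache = apply(model.transform, X)  # TableTransforms' own API
    return Xnew
end
```

Transforms that learn parameters from training data would instead subtype `MMI.Unsupervised`, with a `fit` that stores the cache returned by `apply`.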

- TableTransforms GitHub repository.
- MLJModels GitHub repository, with existing MLJ transformers.

Time series are ubiquitous: stocks, sensor readings, vital signs. This project aims to add time series forecasting to MLJ and to perform benchmark comparisons against sktime, tslearn, and tsml.

**Difficulty.** Moderate - hard. **Duration.** 350 hours.

- Julia language fluency essential.
- Git-workflow familiarity essential.
- Some prior contact with time series forecasting desirable.
- HPC experience in Julia desirable.

MLJ is so far focused on tabular data and time series classification. This project is to add support for time series data in a modular, composable way.

Time series are everywhere in real-world applications and there has been an increase in interest in time series frameworks recently (see e.g. sktime, tslearn, tsml).

But there are still very few principled time-series libraries out there, so you would be working on something that could be very useful for a large number of people. To find out more, check out this paper on sktime.

**Mentors**: Sebastian Vollmer, Markus Löning (sktime developer).

Interpreting and explaining black-box models is crucial for establishing trust and improving performance.

**Difficulty.** Easy - moderate. **Duration.** 350 hours.

It is important to have mechanisms in place to interpret the results of machine learning models, and to identify the factors relevant to a model's decision or score.

This project will implement methods for model and feature interpretability.

**Mentors.** Diego Arenas, Sebastian Vollmer.

- Julia language fluency essential.
- Git-workflow familiarity strongly preferred.
- Some prior contact with explainable AI/ML methods is desirable.
- A passing familiarity with machine learning goals and workflow preferred.

The aim of this project is to implement several model-interpretability algorithms, such as:

- Methods to show feature importance
- Partial dependence plots
- Tree surrogates
- LocalModel: Local Interpretable Model-agnostic Explanations
- Dataset loaders for standard interpretability datasets
- Performance metrics for interpretability
- Additional interpretability algorithms
- Glue code to the SHAP package

Specifically, you will:

- Familiarize yourself with MLJ.
- Survey some of the literature and existing implementations in Julia and other languages, and prepare a short summary.
- Implement visualizations of explanations.
- Implement use cases.
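To illustrate one of these targets, here is a minimal sketch of a one-feature partial dependence computation for a fitted MLJ machine; `partial_dependence` is an illustrative helper (not an existing API) and assumes a deterministic regressor:

```julia
# Sketch: average model prediction as one feature sweeps over a grid.
using MLJ, Tables, Statistics

function partial_dependence(mach, X, feature::Symbol; npoints=20)
    col  = Tables.getcolumn(X, feature)
    grid = range(minimum(col), maximum(col); length=npoints)
    pd = map(grid) do v
        # replace the chosen feature column by the constant value v
        newcol = NamedTuple{(feature,)}((fill(v, length(col)),))
        Xmod = merge(Tables.columntable(X), newcol)
        mean(predict(mach, Xmod))   # deterministic predictions assumed
    end
    return collect(grid), pd
end
```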

You will learn about the benefits and shortcomings of model interpretation methods and how to use them.

Tutorials

Design and implement a data visualization module for MLJ.

**Difficulty.** Easy. **Duration.** 350 hours.

Design and implement a data visualization module for MLJ to visualize numeric and categorical features (histograms, boxplots, correlations, frequencies), intermediate results, and metrics generated by MLJ machines, using a suitable Julia package for data visualization.

The idea is to implement a similar resource to what mlr3viz does for mlr3.

- Julia language fluency essential.
- Git-workflow familiarity essential.
- Some prior work on data visualization is desirable.

So far, visualizing data or features in MLJ is an ad-hoc task, defined by the user case by case. You will implement a standard way to visualize model performance, residuals, benchmarks, and predictions for MLJ users.

The structures and metrics will come from the results of the models or data sets used; your task will be to implement the right visualizations depending on the data type of the features.

A relevant part of this project is to visualize the target variable against the rest of the features.

You will enhance your visualisation skills as well as your ability to "debug" and understand models and their prediction visually.

**Mentors**: Sebastian Vollmer, Diego Arenas.

Bayesian methods and probabilistic supervised learning provide uncertainty quantification. This project aims at increasing the integration between Bayesian and non-Bayesian methods, using Turing.

**Difficulty.** Difficult. **Duration.** 350 hours.

As an initial step, reproduce SossMLJ in Turing. The bulk of the project is to implement methods that combine multiple predictive distributions.

- Interface between Turing and MLJ.
- Comparisons of ensembling and stacking of predictive distributions.
- Reproducible benchmarks across various settings.
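As a baseline for the ensembling/stacking comparisons, predictive distributions from several models can be pooled into a simple mixture; stacking would learn the weights instead of fixing them. A small sketch using Distributions.jl (the two component distributions are made up for illustration):

```julia
# Pooling two per-model predictive distributions into a fixed-weight mixture.
using Distributions

preds = [Normal(1.0, 0.5), Normal(1.3, 0.7)]  # predictive dists from two models
ensemble = MixtureModel(preds, [0.5, 0.5])    # equal weights; stacking learns these

pdf(ensemble, 1.1)   # pooled predictive density at a test point
cdf(ensemble, 1.1)   # pooled predictive probability
```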

**Mentors**: Hong Ge, Sebastian Vollmer.

Help data scientists using MLJ track and share their machine learning experiments using MLFlow.

**Difficulty.** Moderate. **Duration.** 350 hours.

MLFlow is an open source platform for the machine learning life cycle. It allows the data scientist to upload experiment metadata and outputs to the platform for reproducing and sharing purposes. This project aims to integrate the MLJ machine learning platform with MLFlow.

- Julia language fluency essential.
- Git-workflow familiarity strongly preferred.
- General familiarity with data science workflows.

Specifically, you will:

- Familiarize yourself with MLJ, MLFlow, and the MLFlowClient.jl client APIs.
- Implement functionality to upload machine learning model hyper-parameters, performance evaluations, and artifacts encapsulating the trained model to MLFlow.
- Implement functionality allowing for live tracking of learning for iterative models, such as neural networks, by hooking into MLJIteration.jl.
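The intended flow, roughly, is shown below. The MLFlowClient.jl calls follow its README (`MLFlow`, `getorcreateexperiment`, `createrun`, `logparam`, `logmetric`, `updaterun`) but should be verified against the current release; the experiment name is made up:

```julia
# Sketch: evaluate an MLJ model and log the results to an MLFlow server.
using MLJ, MLFlowClient

mlf = MLFlow("http://localhost:5000")             # tracking server
exp = getorcreateexperiment(mlf, "mlj-demo")      # hypothetical experiment name
run = createrun(mlf, exp.experiment_id)

X, y = @load_iris
Tree = @load DecisionTreeClassifier pkg=DecisionTree verbosity=0
tree = Tree(max_depth=3)
e = evaluate(tree, X, y; resampling=CV(nfolds=5), measure=log_loss)

logparam(mlf, run, "max_depth", string(tree.max_depth))
logmetric(mlf, run, "cv_log_loss", e.measurement[1])
updaterun(mlf, run, "FINISHED")
```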

MLFlow website.

**Mentors.** Deyan Dyankov, Anthony Blaom, Diego Arenas.

Diagnose and exploit opportunities for speeding up common MLJ workflows.

**Difficulty.** Moderate. **Duration.** 350 hours.

In addition to investigating a number of known performance bottlenecks, you will have some free rein in this project to identify opportunities to speed up common MLJ workflows, as well as to make better use of memory resources.

- Julia language fluency essential.
- Experience with multi-threading and multi-processor computing essential, preferably in Julia.
- Git-workflow familiarity strongly preferred.
- Familiarity with machine learning goals and workflow preferred.

In this project you will:

- familiarize yourself with the training, evaluation, and tuning of machine learning models in MLJ
- work towards addressing a number of known performance issues, including:
  - limitations of the generic Tables.jl interface for interacting with tabular data, which in common cases (DataFrames) has extra functionality that can be exploited
  - rolling out the new data front-end for models to avoid unnecessary copying of data
  - identifying, in conjunction with your mentor, the best design for introducing better sparse-data support to MLJ models (e.g., naive Bayes)
  - adding multi-threading and/or multi-processor parallelism to the current learning-networks scheduler
- benchmark and profile common workflows to identify opportunities for further code optimization
- implement some of these optimizations
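Optimizations would be judged against baseline measurements of common workflows, along these lines (synthetic data; the model choice is arbitrary):

```julia
# Baseline timing of a train/evaluate workflow with BenchmarkTools.
using MLJ, BenchmarkTools

X, y = make_regression(10_000, 20)   # synthetic tabular data
Tree = @load DecisionTreeRegressor pkg=DecisionTree verbosity=0
@btime evaluate($(Tree()), $X, $y;
                resampling=CV(nfolds=6), measure=rms, verbosity=0);
```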

MLJ Roadmap. See, in particular, the "Scalability" section.

Data front end for MLJ models.

**Mentors.** Anthony Blaom.

Improve and extend Julia's offering of algorithms for correcting class imbalance, with a view to integration into MLJ and elsewhere.

**Difficulty.** Easy to moderate. **Duration.** 350 hours.

Many classification algorithms do not perform well when there is a class imbalance in the target variable (for example, many more positives than negatives). There are a number of well-known data preprocessing algorithms, such as oversampling, for compensating for class imbalance. See, for instance, the Python package imbalanced-learn.

The Julia package ClassImbalance.jl provides some native Julia class imbalance algorithms. For wider adoption it is proposed that:

- ClassImbalance.jl be made more data-generic, for example by supporting arbitrary tables (objects implementing Tables.jl). Currently there is only support for an old version of DataFrames.jl.
- ClassImbalance.jl implement a general transformer interface, such as the ones provided by TableTransforms.jl, MLJ, or FeatureTransforms.jl (MLJ may ultimately support the TableTransforms.jl API; see the separate "Feature transformations" project).
- ClassImbalance.jl also support data containers implementing the `getobs` interface in LearnBase.jl (but note this code re-organization project and this issue).
- Other Julia-native algorithms be added.

**Mentor.** Anthony Blaom.

- Julia language fluency is essential.
- An understanding of the class imbalance phenomenon is essential, including a detailed understanding of at least one class imbalance algorithm.
- Git-workflow familiarity is strongly preferred.
- Familiarity with machine learning goals and workflow is preferred.

- Familiarize yourself with the existing ClassImbalance package, including known issues.
- Familiarize yourself with the Tables.jl interface.
- Assess the merits of different transformer API choices and choose one in consultation with your mentor.
- Implement the proposed improvements, in parallel with testing and documentation additions to the package. Testing and documentation must be up-to-date before new algorithms are added.
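As a sense of what "data-generic" means here, random oversampling can be written against the Tables.jl interface alone; `oversample` below is an illustrative helper, not an existing API:

```julia
# Toy table-generic random oversampling via Tables.jl.
using Tables, StatsBase, Random

function oversample(X, y::AbstractVector; rng=Random.default_rng())
    nmax = maximum(values(countmap(y)))             # size of the largest class
    idx = Int[]
    for class in unique(y)
        rows = findall(==(class), y)
        append!(idx, rows)                                  # keep originals
        append!(idx, rand(rng, rows, nmax - length(rows)))  # resample up to nmax
    end
    cols = Tables.columntable(X)
    return map(c -> c[idx], cols), y[idx]   # any Tables.jl table works
end
```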

ClassImbalance.jl GitHub repository.

Bayesian optimization is a global optimization strategy for (potentially noisy) functions with unknown derivatives. With well-chosen priors, it can find optima with fewer function evaluations than alternatives, making it well suited for the optimization of costly objective functions.
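To make the loop structure concrete, here is a toy, library-free illustration; a real implementation replaces the naive stand-ins below with a Gaussian-process surrogate and a principled acquisition function:

```julia
# Toy Bayesian-optimization loop on [0, 1] with deliberately naive parts.
f(x) = -(x - 0.3)^2 + 0.1 * randn()         # noisy objective to maximize

xs = [rand()]; ys = [f(xs[1])]
for _ in 1:20
    μ(x) = ys[argmin(abs.(xs .- x))]        # surrogate stub: nearest neighbour
    a(x) = μ(x) + minimum(abs.(xs .- x))    # acquisition stub: mean + exploration
    cand  = rand(100)                       # random candidate points
    xnext = cand[argmax(a.(cand))]          # maximize the acquisition
    push!(xs, xnext); push!(ys, f(xnext))
end
xs[argmax(ys)]                              # best input found
```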

Well-known examples include hyper-parameter tuning of machine learning models (see e.g. Taking the Human Out of the Loop: A Review of Bayesian Optimization). The Julia package BayesianOptimization.jl currently supports only basic Bayesian optimization methods. There are multiple directions in which to improve the package, including (but not limited to):

- **Hybrid Bayesian Optimization (duration: 175h, expected difficulty: medium)**: support mixed discrete and continuous variables; implement e.g. HyBO (see also here).
- **Scalable Bayesian Optimization (duration: 175h, expected difficulty: medium)**: implement e.g. TuRBO or SCBO.
- **Better Defaults (duration: 175h, expected difficulty: easy)**: write an extensive test suite and implement better defaults; draw inspiration from e.g. dragonfly.

**Recommended Skills:** Familiarity with Bayesian inference, non-linear optimization, writing Julia code and reading Python code.

**Expected Outcome:** Well-tested and well-documented new features.

**Mentor:** Johanni Brea

There are a number of compiler projects that are currently being worked on. Please contact Ian Atol or Jameson Nash for additional details and let us know what specifically interests you about this area of contribution. That way, we can tailor your project to better suit your interests and skillset.

**Julia Optimization Passes (350 hours)**

The Julia compiler performs optimizations at two distinct times during native code generation: first at the "Julia level", and then at the "LLVM level". At the Julia level, we have some basic optimization passes (inlining, basic DCE, SROA), but currently many other interesting passes simply don't yet exist, or have a partial PR but need significant effort to finish. We see potential for many future optimizations at this phase of compilation, especially with some new analyses that have been recently added. For this proposal, we can work together to define which optimizations we could tackle next.

**Expected Outcomes**: Improve upon the "Julia level" suite of optimizations and analyses. Ideally, merge an optimization that improves Julia codegen by the end of the project timeline.
**Skills**: Julia programming, some prior knowledge of compiler optimization techniques, creative thinking, and a passion for performance!
**Difficulty**: Medium
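For a feel of what "Julia level" optimization means in practice, the existing passes can be observed on small functions from any session (a minimal example, not tied to any particular proposed pass):

```julia
# Observing inlining + SROA + constant propagation in the optimized IR.
using InteractiveUtils

struct Point
    x::Float64
    y::Float64
end

norm2(p::Point) = p.x^2 + p.y^2
f() = norm2(Point(1.0, 2.0))

@code_typed f()   # the Point allocation is gone; the body may fold to 5.0
```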

**LLVM (350 hours)**

As previously mentioned, the Julia language utilizes LLVM as a backend for code generation. This means that there are plenty of opportunities for those with knowledge of, or interest in, LLVM to contribute by working on Julia's code generation process. Together, we can figure out an appropriate task if you would like to work in this area. Below are some LLVM-related projects that may be of interest.

**Expected Outcomes**: Improve upon the "LLVM level" of Julia codegen.
**Skills**: C/C++ programming and some prior knowledge of LLVM (in the context of clang, Rust, Swift, etc. is fine).
**Difficulty**: Hard

**Investigating OrcJIT v2 improvements (350 hours)**

The LLVM JIT has gained many new features. This project would involve finding out what they are and making use of them. Some examples include better resource tracking, parallel compilation, a new linker (which may need upstream work too), and fine-grained tracking of relocations.

**Parser error messages (and other parts) (350 hours)**

Error messages and infrastructure could use some work to track source locations more precisely. This may be a large project. Contact me and @c42f for more details if this interests you.

**Expected Outcomes**: Improve upon Julia parser error messages.
**Skills**: Some familiarity with parsers.
**Difficulty**: Medium

**Macro hygiene re-implementation, to eliminate incorrect predictions inherent in current approach (350 hours)**

This may be a good project for someone who wants to learn Lisp/Scheme! Our current algorithm runs in multiple passes, which means we sometimes compute the wrong scope for a variable in an earlier pass than the one in which we assign the actual scope to each value. See https://github.com/JuliaLang/julia/labels/macros, and particularly issues such as https://github.com/JuliaLang/julia/issues/20241 and https://github.com/JuliaLang/julia/issues/34164.

**Expected Outcomes**: Ideally, a re-implementation of hygienic macros. Realistically, resolving some of the `macros` issues.
**Skills**: Lisp/Scheme/Racket experience desired but not necessarily required.
**Difficulty**: Medium

**Better debug information output for variables (350 hours)**

We have part of the infrastructure in place for representing DWARF information for our variables, but only from limited places. We could do much better, since there are numerous opportunities for improvement!

**Expected Outcomes**: Varies by project.
**Recommended Skills**: Most of these projects involve algorithms work, requiring a willingness and interest in seeing how to integrate with a large system.
**Difficulty**: Varies by project.

**Mentors**: Jameson Nash, Ian Atol

Code coverage reports show very good coverage of all of the Julia Stdlib packages, but it is not complete. Additionally, the coverage tools themselves (the `--code-coverage` flag and https://github.com/JuliaCI/Coverage.jl) could be further enhanced, such as to give better accuracy of statement coverage, or more precision. A successful project may combine a bit of both building code and finding faults in others' code.

Another related side project might be to explore adding type information to the coverage reports.
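For reference, the existing workflow the project would build on looks roughly like this; `process_folder` and `get_summary` are Coverage.jl's documented entry points:

```julia
# After running tests with coverage enabled, e.g.
#   julia --code-coverage=user -e 'using Pkg; Pkg.test("MyPkg")'
# the generated .cov files can be summarized with Coverage.jl:
using Coverage

covdata = process_folder("src")          # parse per-file coverage data
covered, total = get_summary(covdata)    # covered and total line counts
println("statement coverage: ", round(100covered / total; digits=1), "%")
```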

**Recommended Skills**: An eye for detail, a thrill for filing code issues, and the skill of breaking things.

**Contact:** Jameson Nash

A few ideas to get you started, in brief:

- Make better use of threads for GC (and particularly, make the page allocator multi-threaded)
- Improve granularity of codegen JIT for multi-threading
- Improve granularity of IO operations for multi-threading (and set up a worker thread for running the main libuv event loop)
- Measure and optimize the performance of the `partr` algorithm, and add the ability to dynamically scale it by workload size
- Automatic insertion of GC safe-points/regions, particularly around loops
- Work towards supporting a dynamic number of threads

Join the regularly scheduled multithreading call to discuss any of these; see the #multithreading BoF calendar invite on the Julia Language Public Events calendar.

**Recommended Skills**: Varies by project

**Contact:** Jameson Nash

The Nanosoldier.jl project (and the related https://github.com/JuliaCI/BaseBenchmarks.jl) tests for performance impacts of some changes. However, many areas remain uncovered (such as compile time), while other areas are over-covered (greatly increasing the duration of the test for no benefit), and some tests may not be configured appropriately for statistical power. Furthermore, the current reports are very primitive and can only do a basic pair-wise comparison, while graphs and other interactive tooling would be more valuable. Thus, there would be many great projects for a summer contributor to tackle here!

**Expected Outcomes**: Improvement of Julia's automated testing/benchmarking framework. **Skills**: Interest in and/or experience with CI systems. **Difficulty**: Medium

**Contact:** Jameson Nash, Tim Besard

Density-functional theory (DFT) is probably the most widespread method for simulating the quantum-chemical behaviour of electrons in matter, and applications cover a wide range of fields such as materials research, chemistry, or pharmacy. For aspects like designing the batteries, catalysts, or drugs of tomorrow, DFT is nowadays a key building block of ongoing research. The aim to tackle even larger and more involved systems with DFT, however, keeps posing novel challenges with respect to physical models, reliability, and performance. For tackling these aspects in the multidisciplinary context of DFT, we recently started the density-functional toolkit (DFTK), a DFT package written in pure Julia.

Employing GPUs to bring speed improvements to DFT simulations is an established idea. However, in state-of-the-art DFT simulation packages, the GPU version of the solution algorithm is usually implemented in a separate code base. In other words, the CPU and GPU versions co-exist, which has the drawback of duplicated effort to fix bugs and to keep both code bases in sync whenever a novel method or algorithm becomes available. Since conventional GPU programming frameworks feature a steep learning curve for newcomers, the GPU version is oftentimes lagging behind and features increased code complexity, making the investigation of novel GPU algorithms challenging.

In this project we want to build on the extensive GPU programming capabilities of the Julia ecosystem to enable DFTK to offload computations to a local GPU. Key aim will be to minimise the code which needs to be adapted from the present CPU code base in DFTK to achieve this. Since GPU counterparts already exist for most computational bottlenecks of a DFT computation, the key challenge of this project will be to handle the overall orchestration of the computational workflow as well as the data transfer between the CPU and the GPU. To keep the task manageable we will not directly tackle the full DFT problem (a non-linear eigenvalue problem), but restrict ourselves to the reduced setting of linear eigenvalue problems. Expanding from there towards the full DFT is an optional stretch goal of the project.

**Level of difficulty:** Medium to difficult

**Project size:** Large, i.e. 12 weeks at 30 hours per week.

**Recommended skills:** Interest in working on a multidisciplinary project bordering physics, mathematics, and computer science, with a good working knowledge of numerical linear algebra and Julia. Detailed knowledge of the physical background (electrostatics, materials science) or of GPU programming is not required, but be prepared to take a closer look at these domains during the project.

**Expected results:** Use Julia's GPU programming ecosystem to implement an algorithm for solving the type of eigenvalue problems arising in density-functional theory.

**Mentors:** Valentin Churavy, Michael F. Herbst, Antoine Levitt

**References:** For a nice intro to DFT and DFTK.jl see Michael's talk at JuliaCon 2020 and the literature given in the DFTK documentation. For an introduction to GPU computing in Julia, see the GPU workshop at JuliaCon 2021 by Tim Besard, Julian Samaroo and Valentin.

**Contact:** For any questions, feel free to email @mfherbst, @antoine-levitt or write us on the JuliaMolSim slack.

The Julia manual and the documentation for a large chunk of the ecosystem is generated using Documenter.jl – essentially a static site generator that integrates with Julia and its docsystem. There are tons of opportunities for improvements for anyone interested in working on the interface of Julia, documentation and various front-end technologies (web, LaTeX).

- **ElasticSearch-based search backend for Documenter** (350 hours). Loading the search page of the Julia manual is slow because the index is huge and needs to be downloaded and constructed on the client side on every page load. Instead, we should look at hosting the search server-side. The goal is to continue the work done during an MLH fellowship on implementing an ElasticSearch-based search backend.
- **Improve the generated PDF in the PDF/LaTeX backend** (175 hours). The goal is to improve the look of the generated PDF and to make sure the backend works reliably (improved testing). See #949, #1342, and other related issues.

**Recommended skills:** Basic knowledge of web-development (JS, CSS, HTML) or LaTeX, depending on the project.

**Mentors:** Morten Piibeleht

Julia supports docstrings – inline documentation which gets parsed together with the code and can be accessed dynamically in a Julia session (e.g. via the REPL's `?` help mode; implemented mostly in the Docs module).

Not all docstrings are created equal however. There are bugs in Julia's docsystem code, which means that some docstrings do not get stored or are stored with the wrong key (parametric methods). In addition, the API to fetch and work with docstrings programmatically is not documented, not considered public and could use some polishing.

Create a package which would provide a cleaned-up API for working with docstrings, abstracting away the implementation details (and potential differences between Julia versions) of the docsystem in Base.

Fix as many docsystem-related bugs in the Julia core as possible [further reading, #16730, #29437, JuliaDocs/Documenter.jl#558]
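For reference, the documented entry point for programmatic access today is the `@doc` macro; anything deeper relies on undocumented internals that the proposed package would wrap:

```julia
# Fetch a docstring programmatically as a Markdown.MD object.
md = @doc sum
typeof(md)    # Markdown.MD

# Enumerating docstrings per module (e.g. via Base.Docs.meta(mod)) currently
# relies on internals; abstracting this away is the point of the project.
```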

**Recommended skills:** Basic familiarity with Julia is sufficient.

**Duration:** 350 hours

**Mentors:** Morten Piibeleht

Flux usually takes part in Google Summer of Code as a NumFocus organization. We follow the same rules and application guidelines as Julia, so please check there for more information on applying. Below are a set of ideas for potential projects (though you are welcome to explore anything you are interested in).

Flux projects are typically very competitive; we encourage you to get started early, as successful contributors typically have early PRs or working prototypes as part of the application. It is a good idea to simply start contributing via issue discussion and PRs and let a project grow from there; you can take a look at this list of issues for some starter contributions.

**Difficulty:** Medium (175h)

**Expected outcomes:** Help us improve Metalhead.jl by

- adding new models
- porting pre-trained weights
- extending the model interfaces to make them more customizable

**Skills:** Familiarity with vision model architectures and Flux.jl

**Mentors:** Kyle Daruwalla

**Difficulty:** Medium (350h)

In this project, you will assist the ML community team with building time series methods for FastAI.jl on top of the existing JuliaML + FluxML ecosystem packages. Some familiarity with the following Julia packages is preferred, but it is not required:

**Expected outcomes:** You will

- load a working time series dataset using the FastAI.jl data registry
- create new block methods for time series tasks
- load at least one working time series model into a learner
- develop an example tutorial that ties all the previous steps together

**Skills:** Familiarity with deep learning pipelines, common practices, Flux.jl, and recurrent neural networks

**Mentors:** Lorenz Ohly, Kyle Daruwalla, Brian Chen

**Difficulty:** Medium (350h)

In this project, you will assist the ML community team with building text methods for FastAI.jl on top of the existing JuliaML + FluxML ecosystem packages. Some familiarity with the following Julia packages is preferred, but it is not required:

**Expected outcomes:** You will

- load a working text dataset using the FastAI.jl data registry
- create new block methods for textual tasks
- load at least one working text model into a learner
- develop an example tutorial that ties all the previous steps together

**Skills:** Familiarity with deep learning pipelines, common practices, Flux.jl, and JuliaText

**Mentors:** Lorenz Ohly, Kyle Daruwalla, Brian Chen

**Difficulty:** Hard (350h)

Create a library of utility functions that can consume Julia's Imaging libraries to make them differentiable. With Zygote.jl, we have the platform to take a general purpose package and apply automatic differentiation to it.

**Expected outcomes:** You will

- write AD rules for functions in existing computer vision libraries
- demonstrate the use of these newly differentiable libraries for tasks such as homography regression

**Skills:** Familiarity with automatic differentiation, deep learning, and defining (a lot of) Custom Adjoints

**Mentors:** Dhairya Gandhi

**Difficulty:** Hard (175h)

The application of machine learning requires a practitioner to understand how to optimize a neural architecture for a given problem, or does it? Recently, techniques in automated machine learning, also known as AutoML, have dropped this requirement by allowing good architectures to be found automatically. One such method is the FermiNet, which employs generative synthesis to produce a neural architecture that respects certain operational requirements.

**Expected outcomes:** The goal of this project is to implement the FermiNet in Flux to allow for automated synthesis of neural networks.

**Mentors:** Chris Rackauckas and Dhairya Gandhi

**Difficulty:** Hard (350h+)

We have an existing package, RayTracer.jl, which is motivated by OpenDR, and exists to do differentiable raytracing with Flux.jl and Zygote.jl.

**Expected outcomes:** You will

- implement at least 2 alternative rendering models, like NeRF, VolSDF, Neural Raytracing, etc.
- make improvements to RayTracer.jl to use the latest Flux libraries
- update RayTracer.jl for ChainRules.jl

**Skills:** GPU Programming, Deep Learning, familiarity with the literature, familiarity with defining Custom Adjoints

**Mentors:** Dhairya Gandhi, Avik Pal, Julian Samaroo

Graph Neural Networks (GNN) are deep learning models well adapted to data that takes the form of graphs with feature vectors associated to nodes and edges. GNNs are a growing area of research and find many applications in complex networks analysis, relational reasoning, combinatorial optimization, molecule generation, and many other fields.

GraphNeuralNetworks.jl is a pure Julia package for GNNs equipped with many features. It implements common graph convolutional layers, with CUDA support and graph batching for fast parallel operations. There are a number of ways by which the package could be improved.

While we implement a good variety of graph convolutional layers, there is still a vast zoology yet to be implemented. Preprocessing tools, pooling operators, and other GNN-related functionality can be considered as well.

**Duration**: 175h.

**Expected difficulty**: easy to medium.

**Expected outcome**: Enrich the package with a variety of new layers and operators.

As part of the documentation and for bootstrapping new projects, we want to add fully worked out examples and applications of graph neural networks. We can start with entry-level tutorials and progressively introduce the reader to more advanced features.

**Duration**: 175h.

**Expected difficulty**: medium.

**Expected outcome**: A few pedagogical and more advanced examples of graph neural networks applications.

Provide Julia-friendly wrappers for common graph datasets in `MLDatasets.jl`. Create convenient interfaces for the Julia ML and data ecosystem.

**Duration**: 175h.

**Expected difficulty**: easy.

**Expected outcome**: A large collection of graph datasets easily available to the Julia ecosystem.

In some complex networks, the relations expressed by edges can be of different types. We need to implement a heterogeneous graph type and define convolutional layers supporting it.

**Duration**: 350h.

**Expected difficulty**: hard.

**Expected outcome**: The implementation of a new graph type for heterogeneous networks and corresponding graph convolutional layers.

Graphs containing several million nodes are too large for GPU memory. Mini-batch training is performed on subgraphs, as in the GraphSAGE algorithm.

**Duration**: 175h.

**Expected difficulty**: hard.

**Expected outcome**: The necessary algorithmic components to scale GNN training to very large graphs.

We aim at implementing temporal graph convolutions for time-varying graph and/or node features. The design of an efficient dynamical graph type is a crucial part of this project.

**Duration**: 350h.

**Expected difficulty**: hard.

**Expected outcome**: A new dynamical graph type and corresponding convolutional layers.

Many graph convolutional layers can be expressed as non-materializing algebraic operations involving the adjacency matrix, instead of the slower and more memory-consuming gather/scatter mechanism. We aim to extend these *fused* implementations as far as possible, in a GPU-friendly way.

**Duration**: 175h.

**Expected difficulty**: hard.

**Expected outcome**: A noticeable performance increase for many graph convolutional operations.
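To illustrate the idea, here is a GCN-style propagation step written as sparse matrix algebra rather than gather/scatter (a standalone sketch, not the GraphNeuralNetworks.jl internals):

```julia
# One fused "message passing" step: H = W * X * Â', with Â the
# row-normalized adjacency including self-loops.
using SparseArrays, LinearAlgebra

n, fin, fout = 5, 4, 3
A = Float32.(sprand(Bool, n, n, 0.4))   # random adjacency for the example
X = rand(Float32, fin, n)               # node features stored as columns
W = rand(Float32, fout, fin)            # layer weights

deg = vec(sum(A; dims=2)) .+ 1f0        # degrees, counting self-loops
Â = (A + I) ./ deg                      # row-normalized adjacency
H = W * X * Â'                          # propagation without gather/scatter
```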

Familiarity with graph neural networks and Flux.jl.

Carlo Lucibello (author of GraphNeuralNetworks.jl). For linear algebra, co-mentoring by Will Kimmerer (lead developer of SuiteSparseGraphBLAS.jl). Feel free to contact us on the Julia Slack Workspace or by opening an issue in the GitHub repo.

The QML.jl package provides Julia bindings for Qt QML on Windows, OS X and Linux. In the current state, basic GUI functionality exists, and rough integration with Makie.jl is available, allowing overlaying QML GUI elements over Makie visualizations.

- *Split off the QML code for Makie into a separate package.* This will allow specifying proper package compatibility between QML and Makie, without making Makie a mandatory dependency for QML (currently we use Requires.jl for that).
- *Improve the integration.* Currently, connections between Makie and QML need to be set up mostly manually. We need to implement some commonly used functionality, such as the registration of clicks in a viewport with proper coordinate conversion, and navigation of 3D viewports.

**Recommended Skills**: Familiarity with both Julia and the Qt framework, some basic C++ skills, affinity with 3D graphics and OpenGL.

**Duration: 175h, expected difficulty: medium**

**Mentors**: Bart Janssens and Simon Danisch

Makie.jl is a visualization ecosystem for the Julia programming language, with a focus on interactivity and performance. JSServe.jl is the core infrastructure library that makes Makie's web-based backend possible.

At the moment, all the necessary ingredients exist for designing web-based User Interfaces (UI) in Makie, but the process itself is quite low-level and time-consuming. The aim of this project is to streamline that process.

- Implement novel UI components and refine existing ones.
- Introduce data structures suitable for representing complex UIs.
- Add simpler syntaxes for common scenarios, akin to Interact's `@manipulate` macro.
- Improve documentation and tutorials.
- Streamline the deployment process.

**Bonus tasks.** If time allows, one of the following directions could be pursued.

- Make Makie web-based plots more suitable for general web apps (move more computation to the client side, improve interactivity and responsiveness).
- Generalize the UI infrastructure to native widgets, which are already implemented in Makie but with a different interface.

**Desired skills.** Familiarity with HTML, JavaScript, and CSS, as well as reactive programming. Experience with the Julia visualization and UI ecosystem.

**Duration.** 350h.

**Difficulty.** Medium.

**Mentors.** Pietro Vertechi and Simon Danisch.

Julia is emerging as a serious tool for technical computing and is ideally suited for the ever-growing needs of big data analytics. This set of proposed projects addresses specific areas for improvement in analytics algorithms and distributed data management.

**Difficulty:** Medium (175h)

Dagger.jl is a native Julia framework and scheduler for distributed execution of Julia code and general purpose data parallelism, using dynamic, runtime-generated task graphs which are flexible enough to describe multiple classes of parallel algorithms. This project proposes to implement different scheduling algorithms for Dagger to optimize scheduling of certain classes of distributed algorithms, such as MapReduce and MergeSort, and properly utilizing heterogeneous compute resources. Contributors will be expected to find published distributed scheduling algorithms and implement them on top of the Dagger framework, benchmarking scheduling performance on a variety of micro-benchmarks and real problems.
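For context, Dagger's user-facing task API is tiny, and the scheduling work happens underneath calls like these:

```julia
# Dagger builds a dynamic task graph from @spawn calls and schedules it
# across the available workers and threads.
using Dagger

a = Dagger.@spawn 1 + 2
b = Dagger.@spawn a * 3   # depends on `a`; the scheduler orders execution
fetch(b)                  # == 9
```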

**Mentors:** Julian Samaroo, Valentin Churavy

**Difficulty:** Hard (350h)

Add a distributed training API for Flux models built on top of Dagger.jl. More detailed milestones include building Dagger.jl abstractions for UCX.jl, then building tools to map Flux models onto data-parallel Dagger DAGs. The final result should demonstrate a Flux model training with multiple devices in parallel via the Dagger.jl APIs. A stretch goal will include mapping operations within a model to a DAG, to facilitate model parallelism as well.

There are projects now that host the building blocks: DaggerFlux.jl and Distributed Data Parallel Training, which can serve as jumping-off points.

**Skills:** Familiarity with UCX, representing execution models as DAGs, Flux.jl, CUDA.jl and data/model parallelism in machine learning

**Mentors:** Julian Samaroo and Dhairya Gandhi


JuliaImages (see the documentation) is a framework in Julia for multidimensional arrays, image processing, and computer vision (CV). It has an active development community and offers many features that unify CV and biomedical 3D/4D image processing, support big data, and promote interactive exploration.

Often the best ideas are the ones that candidate SoC contributors come up with on their own. We are happy to discuss such ideas and help you refine your proposal. Below are some potential project ideas that might help spur some thoughts. In general, anything that is missing in JuliaImages and is worth three months' development can be considered a potential GSoC idea. See the bottom of this page for information about mentors.

**Difficulty:** Medium (175h)

JuliaImages provides high-quality implementations of many algorithms; however, as yet there is no set of benchmarks that compare our code against that of other image-processing frameworks. Developing such benchmarks would allow us to advertise our strengths and/or identify opportunities for further improvement. See also the OpenCV project below.

Benchmarks for several performance-sensitive packages (e.g., ImageFiltering, ImageTransformations, ImageMorphology, ImageContrastAdjustment, ImageEdgeDetection, ImageFeatures, and/or ImageSegmentation) against frameworks like Scikit-image and OpenCV, and optionally others like ITK, ImageMagick, and Matlab/Octave. See also the image benchmarks repository.

This task splits into at least two pieces:

- developing frameworks for collecting the data, and
- visualizing the results.

One should also be aware of the fact that differences in implementation (which may include differences in quality) may complicate the interpretation of some benchmarks.
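On the Julia side, the micro-benchmarks would be built from timings like the following (the scikit-image/OpenCV counterparts would be collected separately):

```julia
# Timing a Gaussian blur in ImageFiltering: one data point for the
# cross-framework comparison.
using BenchmarkTools, ImageFiltering, TestImages

img = testimage("cameraman")
@btime imfilter($img, Kernel.gaussian(3));
```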

**Skills:** Experience with JuliaImages is required. Some familiarity with other image-processing frameworks is preferred.

**Mentors:** Johnny Chen

**Difficulty:** Hard (350h)

JuliaImages supports many common algorithms, but targets only the CPU. With Julia now possessing first-in-class support for GPUs, now is the time to provide GPU implementations of many of the same algorithms.

KernelAbstractions may make it easier to support both CPU and GPU with a common implementation.

Fairly widespread GPU support for a single nontrivial package. ImageFiltering would be a good choice.
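A sketch of the KernelAbstractions style this could take: a trivial pointwise kernel written once, which runs on the CPU backend below and, by swapping in a GPU backend, on the GPU as well (launch and synchronization details may vary across KernelAbstractions versions):

```julia
# A pointwise brightness kernel; CPU() can be swapped for a GPU backend.
using KernelAbstractions

@kernel function brighten!(out, img, b)
    I = @index(Global)
    @inbounds out[I] = img[I] + b
end

img = rand(Float32, 256, 256)
out = similar(img)
backend = CPU()
brighten!(backend, 64)(out, img, 0.1f0; ndrange=length(img))
KernelAbstractions.synchronize(backend)
```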

**Skills:** Familiarity with CUDA programming in Julia (i.e., with CUDA.jl) is required.

**Mentors:** Johnny Chen

**Difficulty:** Medium(175h) or Hard(350h)

ImageIO is the default IO backend shipped with Images.jl. It already supports many image formats, yet some formats are still missing (e.g., GIF, JPEG 2000). The applicant will add support for new formats by either 1) wrapping available C libraries via BinaryBuilder, or 2) re-implementing the functionality in pure Julia. See also the EXIF project below.

**Skills:** Experience with Julia is required. For library-wrapping projects, experience with cross-compilation on Linux is required, and familiarity with the source language (e.g., C) is preferred. The difficulty depends almost entirely on how complicated the format is, and on whether there exists an easy-to-wrap C library.

**Mentors:** Johnny Chen, Yupei Qi and Ian Butterworth

**Difficulty:** Medium(175h)

Exchangeable image file format (EXIF) is a widely used specification for storing camera information. The applicant will provide a package that supports reading/writing EXIF data of image files. This can be implemented in pure Julia, or by wrapping the C package libexif.

**Skills:** Similar to the ImageIO skill requirements above.

**Mentors:** Johnny Chen and Yupei Qi

**Difficulty:** Medium(175h) or Hard(350h)

QRCode.jl is a legacy package that supports encoding data as QR codes. Contributors will revive this package so that it co-exists with the latest JuliaImages ecosystem, and will also add support for decoding QR codes into Julia data. Decoding QR codes can potentially be challenging, and contributors will need to find a satisfying solution in the literature.

**Skills:** Experience with JuliaImages is required, as well as the ability to read and understand the QR code specification.

**Mentors:** Johnny Chen

Interested contributors are encouraged to open a discussion in Images.jl to introduce themselves and discuss detailed project ideas. To increase the chance of getting useful feedback, please provide detailed plans and ideas (don't just copy the contents here).

The CxxWrap.jl package provides a way to load compiled C++ code into Julia. It exposes a small fraction of the C++ standard library to Julia, but many more functions and containers (e.g. `std::map`) still need to be exposed. The objective of this project is to improve C++ standard library coverage.

- Add missing STL container types (easy)
- Add support for STL algorithms (intermediate)
- Investigate improvement of compile times and selection of included types (advanced)

**Recommended Skills**: Familiarity with both Julia and C++

**Duration: 175h, expected difficulty: hard**

**Mentor**: Bart Janssens

Take a look at the hyper.rs project, listed on the "Pluto" page about wrapping a Rust HTTP server in a Julia package.

Are you ready to create the next amazing visualization? With Javis you can! Javis.jl is a general-purpose Julia library for easily constructing informative, performant, and winsome animated graphics. It uses an object-action relationship to let users make such visuals.

Javis has found application in diverse areas such as teaching, art and more. To learn more about Javis and what it is capable of, check out our 2021 JuliaCon talk! It builds on top of the drawing framework Luxor.jl by adding functions to simplify the creation of objects and their actions.

Below you can find a list of potential projects that can be tackled during Google Summer of Code. If interested in exploring any of these projects, please reach out to any of the following mentors:

- **Jacob Zelko** - email, Slack (username: TheCedarPrince), or Zulip (username: TheCedarPrince)
- **Ole Kröger** - email, Slack (username: Wikunia), or Zulip (username: Wikunia)
- **Giovanni Puccetti** - Zulip (username: Giovanni)
- **Arsh Sharma** - Zulip (username: Arsh Sharma)

Thanks for your interest! 🎉

**Mentors:** Ole Kröger, Arsh Sharma

**Recommended Skills:** Familiarity with profiling, caching approaches, and performance testing

**Duration:** 175 hrs

**Difficulty:** Medium

As Javis's interface is largely stabilized and Javis is finding use in different applications, it is now time to address one of Javis's greatest pain points: slowness and high memory usage for large animations. While creating an animation in Javis, there is much room for performance improvement, such as in creating Objects and Actions, managing the data structures for Objects and Actions, rendering an animation, and handling different media formats (such as gif and mp4). For this specific project, a contributor will work with Ole and Arsh to create a profiling scheme for Javis to identify performance bottlenecks and measure allocations, to determine caching and memory-flexible modes of rendering animations with tools such as FFMPEG.jl, and to finish implementing live streaming of animations. The goal of this project is not to fully fix all identified performance issues, but rather to identify and catalogue them for further development by Javis maintainers and contributors.

**Mentors:** Jacob Zelko, Giovanni Puccetti

**Recommended Skills:** General understanding of Luxor and the underlying structure of Javis

**Duration:** 175 hrs

**Difficulty:** Medium

Javis's interface has matured to a great point, but we believe Javis can do even more! Although Javis can already perform complex transformations, such as morphing one polygon into another, Javis is capable of more than that. In this project, a contributor will work with Jacob and Giovanni to create new animation abilities for Javis: handling different coordinate systems, developing new types of shorthand expressions for object creation known as JObjects, further developing morphing, building out the flexibility of layers, and developing new interfaces (see this PR for an idea). Contributors are encouraged to come to this project with new ideas for what animations Javis can do, and to reach out to Jacob and Giovanni to begin discussions early.

**Difficulty**: Medium to Hard.

**Length**: 350 hours.

Agents.jl is a pure Julia framework for agent-based modeling (ABM). It has an extensive list of features, excellent performance, and is easy to learn, use, and extend. Comparisons with other popular frameworks written in Python or Java (NetLogo, MASON, Mesa) show that Agents.jl outperforms them all in computational speed, list of features, and usability.

In this project, contributors will be paired with lead developers of Agents.jl to improve Agents.jl with more features, better performance, and overall higher polish. Possible features to implement are:

- Automatic performance increase of mixed-agent models by eliminating dynamic dispatch on the stepping function
- GPU support in Agents.jl
- A new type of space representing a planet, which can be used in climate policy or human evolution modelling, and a new interface for an overarching ABM composed of several smaller ABMs

**Recommended Skills**: Familiarity with agent based modelling, Agents.jl and Julia's Type System. Background in complex systems, sociology, or nonlinear dynamics is not required but would be advantageous.

**Expected Results**: Well-documented, well-tested useful new features for Agents.jl.

**Mentors**: George Datseris.

**Difficulty:** Easy to Medium, depending on the algorithms chosen to implement.

**Length**: 175 hours.

DynamicalSystems.jl is an award-winning Julia software library for dynamical systems, nonlinear dynamics, deterministic chaos and nonlinear time series analysis. It has an impressive list of features, but one can never have enough. In this project, contributors will be able to enrich DynamicalSystems.jl with new algorithms and enrich their knowledge of nonlinear dynamics and computer-assisted exploration of complex systems.

Possible projects are summarized in the wanted-features list of the library.

**Recommended Skills**: Familiarity with nonlinear dynamics and/or differential equations and the Julia language.

**Expected Results**: Well-documented, well-tested new algorithms for DynamicalSystems.jl.

**Mentors**: George Datseris

JuliaMusic is an organization providing packages and functionalities that allow analyzing the properties of music performances.

**Difficulty**: Medium.

**Length**: 350 hours.

It is easy to analyze timing and intensity fluctuations in music that is in the form of MIDI data. This format is already digitized, and packages such as MIDI.jl and MusicManipulations.jl allow for seamless data processing. But arguably the most interesting kind of music to analyze is live music. Live music performances are recorded in wave formats. Some algorithms exist that can detect the "onsets" of music hits, but they typically focus only on the timing information and hence forfeit detecting, e.g., the intensity of the played note. Plus, there are very few code implementations online for this problem, almost all of which are old and unmaintained. We would like to implement an algorithm in MusicProcessing.jl that, given a recording of a single instrument, can "MIDIfy" it, i.e., digitize it into the MIDI format.
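As a starting point, a deliberately naive energy-based onset detector is sketched below; the project's algorithm would need to be far more robust and also recover note intensities (`naive_onsets`, the window size, and the threshold are all illustrative):

```julia
# Frame the signal, compute per-frame energy, and flag large energy jumps.
using Statistics

function naive_onsets(x::AbstractVector{<:Real}, fs; win=0.02, k=3.0)
    n = max(1, round(Int, win * fs))                 # samples per frame
    energies = [sum(abs2, @view x[i:i+n-1]) for i in 1:n:length(x)-n+1]
    jumps = diff(energies)
    frames = findall(>(k * std(jumps)), jumps)       # frames with energy spikes
    return frames .* n ./ fs                         # onset times in seconds
end
```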

**Recommended Skills**: Background in music, familiarity with digital signal processing.

**Expected results**: A well-tested, well-documented function `midify` in MusicProcessing.jl.

**Mentors**: George Datseris.

JuliaReach is the Julia ecosystem for reachability computations of dynamical systems.

**Difficulty**: Medium.

**Description.** LazySets is a Julia library for computing with geometric sets, whose focus is on lazy set representations and efficient high-dimensional processing. The main interest in this project is to develop algorithms that leverage the structure of the sets. The special focus will be on low-dimensional (typically 2D and 3D) cases.

**Expected Results.** The goal is to implement certain efficient algorithms from the literature. The code is to be documented, tested, and evaluated in benchmarks. Specific tasks may include: efficient vertex enumeration of zonotopes; operations on zonotope bundles; efficient disjointness checks between different set types; complex zonotopes.
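
As a taste of the geometry involved, the support function of a zonotope has a simple closed form; below is a minimal sketch in plain Julia (not the LazySets API) for a zonotope with center `c` and generator matrix `G` whose columns are the generators.

```
# Support function of a zonotope Z = c + G*[-1,1]^m in direction d:
# rho(d) = d'c + sum_i |d'g_i|, where g_i is the i-th column of G.
support(d, c, G) = sum(d .* c) + sum(abs(sum(d .* g)) for g in eachcol(G))

c = [0.0, 0.0]
G = [1.0 0.5; 0.0 1.0]          # columns are generators
support([1.0, 0.0], c, G)       # -> 1.5
```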

**Expected Length.** 175 hours.

**Recommended Skills.** Familiarity with Julia and Git/GitHub is mandatory. Familiarity with LazySets is recommended. Basic knowledge of geometric terminology is appreciated but not required.

**Mentors**: Marcelo Forets, Christian Schilling.

**Difficulty**: Hard.

**Description.** Sparse polynomial zonotopes are a new non-convex set representation that are well-suited for reachability analysis of nonlinear dynamical systems. The task is to add efficient Julia implementations of:

(1) sparse polynomial zonotopes in LazySets,

(2) the corresponding reachability algorithm for dynamical systems in ReachabilityAnalysis.

**Expected Results.** The goal is to efficiently implement sparse polynomial zonotopes and the corresponding reachability algorithms. The code is to be documented, tested, and evaluated extensively in benchmarks. If the candidate is interested, it is possible to replace task (2) with

(3) an integration of the new set representation for neural-network control systems in NeuralNetworkAnalysis.

**Expected Length.** 350 hours.

**Recommended Skills.** Familiarity with Julia and Git/GitHub is mandatory. Familiarity with the mentioned Julia packages is appreciated but not required. The project does not require theoretical contributions, but it requires reading a research article (see below); hence a certain level of academic experience is recommended.

**Literature and related packages.** This video explains the concept of polynomial zonotopes (slides here). The relevant theory is described in this research article. There exists a Matlab implementation in CORA (the implementation of polynomial zonotopes can be found in this folder).

**Mentors**: Marcelo Forets, Christian Schilling.

**Difficulty**: Hard.

**Description.** ReachabilityAnalysis is a Julia library for set propagation of dynamical systems. One of the main aims is to handle systems with mixed discrete-continuous behaviors (known as hybrid systems in the literature). This project will focus on enhancing the capabilities of the library and overall improvement of the ecosystem for users.

**Expected Results.** Specific tasks may include: problem-specific heuristics for hybrid systems; API for time-varying input sets; flowpipe underapproximations. The code is to be documented, tested, and evaluated in benchmarks. Integration with ModelingToolkit.jl can also be considered if there is interest.

**Expected Length.** 350 hours.

**Recommended Skills.** Familiarity with Julia and Git/GitHub is mandatory. Familiarity with LazySets and ReachabilityAnalysis is also required.

**Mentors**: Marcelo Forets, Christian Schilling.

JuliaStats is an organization dedicated to providing high-quality packages for statistics in Julia.

Implement panel analysis models and estimators in Julia.

**Difficulty.** Moderate. **Duration.** 350 hours

Panel data is an important kind of statistical data that deals with observations of multiple units across time. Common examples of panel data include economic statistics (where it is common to observe figures for several countries over time). This combination of longitudinal and cross-sectional data can be powerful for extracting causal structure from data.
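
Concretely, panel data is "long" tabular data indexed by unit and time; here is a tiny illustration with DataFrames.jl (the column names and values are made up):

```
using DataFrames

panel = DataFrame(country = repeat(["AR", "BR"], inner = 3),
                  year    = repeat(2001:2003, 2),
                  gdp     = [10.1, 10.4, 10.2, 20.3, 20.9, 21.5])
```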

**Mentors.** Nils Gudat, José Bayoán Santiago Calderón, Carlos Parada

- Must be fluent in at least one language for statistical computing, and willing to learn Julia before the start of the project.
- Knowledge of basic statistical inference, covering topics such as maximum likelihood estimation, confidence intervals, and hypothesis testing. (Must know before applying.)
- Basic familiarity with time series statistics (e.g. ARIMA models, autocorrelations) or panel data. (Can be learned after applying.)

Participants will:

- Learn and build on past approaches and packages for panel data analysis, such as those in Econometrics.jl and SynthControl.jl.
- Generalize TreatmentPanels.jl into an abstract interface for dealing with and manipulating panel data.
- Integrate existing estimators provided by packages such as Econometrics.jl into a single package for panel data estimation.

**References.** Econometric Analysis of Cross Section and Panel Data by Jeffrey Wooldridge.

Implement consistent APIs for statistical modeling in Julia.

**Difficulty.** Medium. **Duration.** 350 hours

Currently, the Julia statistics ecosystem is quite fragmented. There is value in having a consistent API for a wide variety of statistical models. The CRRao.jl package offers this design. We have built several models with this interface, but there is still work to be done here.

**Mentors.** Sourish Das, Ayush Patnaik

- Must be fluent in Julia.
- Knowledge of basic statistical inference, covering topics such as maximum likelihood estimation, confidence intervals, and hypothesis testing. (Must know before applying.)

Participants will help create, test, and document standard statistical APIs for Julia.

General improvements to JuliaStats packages, depending on the interests of participants.

**Difficulty.** Easy-Hard. **Duration.** 175-350 hours.

JuliaStats provides many of the most popular packages in Julia, including:

- StatsBase.jl for basic statistics (e.g. weights, sample statistics, moments).
- MixedModels.jl for random and mixed-effects linear models.
- GLM.jl for generalized linear models.

All of these packages are critically important to the Julia statistics community, and all could be improved.

**Mentors.** Mousum Dutta, Chirag Anand, Ayush Patnaik, Carlos Parada

- Must be fluent in at least one language for statistical computing, and willing to learn Julia before the start of the project.
- Knowledge of basic statistical inference, covering topics such as maximum likelihood estimation, confidence intervals, and hypothesis testing. (Must know before applying.)

Participants will:

- Make JuliaStats better! This can include additional estimators, new features, performance improvements, or anything else you're interested in.
- StatsBase.jl improvements could include support for cumulants, L-moments, or additional estimators.
- Distributions.jl improvements could include support for new distributions (e.g. elliptical distributions), additional parametrizations and keyword constructors for current distributions, or extending support for distributions of transformed variables.
- Improved nonparametric density estimators, e.g. those in R's Locfit or log-spline estimators.
- Packages to support survey statistics, similar to R's survey package.

The contributor implements a state-of-the-art smoother for continuous-time systems with additive Gaussian noise. The system's dynamics can be described as an ordinary differential equation with locally additive Gaussian random fluctuations, in other words a stochastic differential equation.

Given a series of measurements observed over time, containing statistical noise and other inaccuracies, the task is to produce an estimate of the unknown trajectory of the system that led to the observations.

*Linear* continuous-time systems are smoothed with the fixed-lag Kalman–Bucy smoother (related to the Kalman–Bucy filter). It relies on coupled ODEs describing how the mean and covariance of the conditional distribution of the latent system state evolve over time. A versatile implementation in Julia is missing.
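
For orientation, below is a sketch of the coupled mean/covariance ODEs of the *linear* Kalman–Bucy filter (the smoother adds a backward pass), discretized with a simple Euler step. This is illustrative plain Julia, not an existing package API, for a scalar model dx = A x dt + dW (noise intensity Q) with observations dy = H x dt + dV (noise intensity R):

```
function kalman_bucy(A, H, Q, R, y, dt, m0, P0)
    m, P = m0, P0
    for k in 2:length(y)
        K  = P * H / R                      # Kalman gain
        dy = y[k] - y[k-1]                  # observation increment
        m += A * m * dt + K * (dy - H * m * dt)
        P += (2A * P + Q - K * H * P) * dt  # scalar Riccati equation
    end
    return m, P
end

y = cumsum(sqrt(0.01) .* randn(500))        # fake observation path, dt = 0.01
kalman_bucy(-0.5, 1.0, 0.1, 0.05, y, 0.01, 0.0, 1.0)
```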

**Expected Results**: An efficient implementation of non-linear smoothing for continuous stochastic dynamical systems.

**Recommended Skills**: Gaussian random variables, Bayes' formula, Stochastic Differential Equations

**Mentors**: Moritz Schauer

**Rating**: Hard, 350 hours

LoopModels is the successor to LoopVectorization.jl, supporting more sophisticated analyses and transforms so that it may correctly optimize a much broader set of loop nests. It uses an internal representation of loops that captures the iteration space of each constituent operation as well as their dependencies. The iteration spaces of inner loops are allowed to be functions of the outer loops, and multiple loops are allowed to exist at each level of a loop nest. LoopModels aims to support optimizations including fusion, splitting, permuting loops, unrolling, and vectorization to maximize throughput. Broadly, this functionality can be divided into five pieces:

- The Julia interface / support for custom LLVM pipelines.
- The internal representation of the loops (Loop IR).
- Building the internal representation from LLVM IR.
- Analyzing the representation to determine an optimal, correct, and target-specific schedule.
- Transforming the IR according to the schedule.

Open projects on this effort include:

**Difficulty**: Hard.

**Description**: In order to be able to use LoopModels from Julia, we must be able to apply a custom pass pipeline. This is likely something other packages will want to do in the future, and something some packages (e.g. Enzyme.jl) already do. In this project, your aim will be to create a package that provides infrastructure others can depend on to simplify applying custom pass pipelines.

**Expected Results**: Register a package that allows applying custom LLVM pass pipelines to Julia code.

**Skills**: Julia programming, preferably with some understanding of Julia's IR. Prior familiarity with libraries such as GPUCompiler and StaticCompiler is a bonus.

**Expected Length**: 175 hours.

**Difficulty**: Medium.

**Description**: This is open ended, with many potential projects. These range from using Presburger arithmetic to support decidable polyhedral modeling, to working on canonicalizations to handle more kinds of loops frequently encountered in Julia (e.g. from `CartesianIndices`), to modeling the costs of different schedules, to efficiently searching the space of schedules to find the fastest way to evaluate a loop nest. We can discuss your interests and find a task you'll enjoy and make substantive contributions to.

**Expected Results**: Help develop some aspect of the loop modeling and/or optimization.

**Skills**: C++, knowledge of LLVM, loop optimization, SIMD, and optimizing compute kernels such as GEMM preferred. A passion for performance is a must!

**Expected Length**: 350 hours.

Mentors: Chris Elrod, Yingbo Ma.

Time: 175h

Are you a performance nut? This project is aimed at expanding our coverage of high performance kernels and libraries widely used across machine learning workflows.

Help us implement cutting-edge CUDA kernels in Julia for operations important across deep learning, scientific computing and more. We also need help developing our wrappers for machine learning, sparse matrices and more, as well as CI and infrastructure. Contact us to develop a project plan.

Mentors: Tim Besard, Dhairya Gandhi.

Time: 175h

Develop a series of reinforcement learning environments, in the spirit of the OpenAI Gym. Although we have wrappers for the Gym available, it is hard to install (due to the Python dependency) and, since it's written in Python and C code, we can't do more interesting things with it (such as differentiating through the environments).

A pure-Julia version of selected environments that supports a similar API and visualisation options would be valuable to anyone doing RL with Flux.

Mentors: Dhairya Gandhi.

Recent advances in reinforcement learning led to many breakthroughs in artificial intelligence. Some of the latest deep reinforcement learning algorithms have been implemented in ReinforcementLearning.jl with Flux. We'd like to have more interesting and practical algorithms added to enrich the whole community, including but not limited to the following directions:

- **[Easy (175h)] Recurrent version of existing algorithms.** Contributors with a basic understanding of Q-learning and recurrent neural networks are preferred. We'd like to have a general implementation to easily extend existing algorithms to the sequential version.
- **[Medium (175h)] Multi-agent reinforcement learning algorithms.** Currently, we only have some CFR, MADDPG and NFSP related algorithms implemented. We'd like to see more implemented, including COMA and its variants, and PSRO.
- **[Medium (350h)] Model-based reinforcement learning algorithms.** Contributors interested in this topic may refer to Model-based Reinforcement Learning: A Survey and design some general interfaces to implement typical model-based algorithms.
- **[Hard (350h)] Distributed reinforcement learning framework.** Inspired by Acme, a similar design is proposed in DistributedReinforcementLearning.jl. However, it is still at a very early stage. Contributors interested in this direction are required to have a basic understanding of distributed computing in Julia. Ideally we'd like to see some distributed reinforcement learning algorithms implemented under this framework, like R2D2 and D4PG.

For each new algorithm, at least two experiments are expected to be added to ReinforcementLearningZoo.jl: a simple one to make sure it works on some toy games with CPU only, and another more practical one to produce results comparable to the original paper with GPU enabled. Besides, a technical report on the implementation details and a speed/performance comparison with other baselines is preferred.

Mentors: Jun Tian

The philosophy of the AlphaZero.jl project is to provide an implementation of AlphaZero that is simple enough to be widely accessible for contributors and researchers, while also being sufficiently powerful and fast to enable meaningful experiments on limited computing resources (our latest release is consistently between one and two orders of magnitude faster than competing Python implementations).

Here are a few project ideas that build on AlphaZero.jl. Please contact us for additional details and let us know about your experience and interests so that we can build a project that best suits your profile.

- [Easy (175h)] Integrate AlphaZero.jl with the OpenSpiel game library and benchmark it on a series of simple board games.
- [Medium (175h)] Use AlphaZero.jl to train a chess agent. In order to save computing resources and allow faster bootstrapping, you may train an initial policy using supervised learning.
- [Hard (350h)] Build on AlphaZero.jl to implement the MuZero algorithm.
- [Hard (350h)] Explore applications of AlphaZero beyond board games (e.g. theorem proving, chip design, chemical synthesis...).

In all these projects, the goal is not only to showcase the current Julia ecosystem and test its limits, but also to push it forward through concrete contributions that other people can build on. Such contributions include:

- Improvements to existing Julia packages (e.g. AlphaZero, ReinforcementLearning, CommonRLInterface, Dagger, Distributed, CUDA...) through code, documentation, or benchmarks.
- A well-documented and replicable artifact to be added to AlphaZero.Examples, ReinforcementLearningZoo, or released in its own package.
- A blog post that details your experience, discusses the challenges you went through, and identifies promising areas for future work.

**Mentors**: Jonathan Laurent

Much of science can be explained by the movement and interaction of molecules. Molecular dynamics (MD) is a computational technique used to explore these phenomena, from noble gases to biological macromolecules. Molly.jl is a pure Julia package for MD, and for the simulation of physical systems more broadly. The package is currently under development for research with a focus on proteins and differentiable molecular simulation. There are a number of ways that the package could be improved:

- **Adding simulators (duration: 175h, expected difficulty: easy to medium):** a variety of standard approaches to simulating molecules can be added, including Langevin dynamics (a minimal flavor of such a simulator loop is sketched after this list), FIRE minimisation, pressure coupling (NPT ensemble) and enhanced sampling approaches such as replica-exchange MD (REMD).
- **Adding constraint algorithms (duration: 175h, expected difficulty: medium):** many simulations keep fast degrees of freedom such as bond lengths and bond angles fixed using approaches such as SHAKE, RATTLE and SETTLE. A fast implementation of these algorithms would be a valuable contribution.
- **Adding electrostatic summation (duration: 175h, expected difficulty: medium to hard):** methods such as particle-mesh Ewald (PME) are in wide use for molecular simulation. Developing fast, flexible implementations and exploring compatibility with GPU acceleration and automatic differentiation would be an important contribution.
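
As a flavor of what a simulator entails, here is a minimal overdamped Langevin (Euler–Maruyama) loop for a single particle in a 1D harmonic potential; this is plain illustrative Julia, not Molly's API, which operates on whole systems:

```
# x' = -k*x/γ + sqrt(2kT/γ) * white noise, integrated with Euler–Maruyama.
function langevin(x0; k=1.0, γ=1.0, kT=1.0, dt=1e-3, nsteps=10_000)
    x = x0
    for _ in 1:nsteps
        force = -k * x                                 # harmonic force -dU/dx
        x += force / γ * dt + sqrt(2kT * dt / γ) * randn()
    end
    return x
end

langevin(0.0)   # approximately a sample from the equilibrium distribution
```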

**Recommended skills:** familiarity with computational chemistry, structural bioinformatics or simulating physical systems.

**Expected results:** new features added to the package along with tests and relevant documentation.

**Mentor:** Joe Greener

**Contact:** feel free to ask questions via email or the JuliaMolSim Slack.

Matrix functions map matrices onto other matrices, and can often be interpreted as generalizations of ordinary functions like sine and exponential, which map numbers to numbers. Once considered a niche province of numerical algorithms, matrix functions now appear routinely in applications to cryptography, aircraft design, nonlinear dynamics, and finance.

This project proposes to implement state of the art algorithms that extend the currently available matrix functions in Julia, as outlined in issue #5840. In addition to matrix generalizations of standard functions such as real matrix powers, surds and logarithms, contributors will be challenged to design generic interfaces for lifting general scalar-valued functions to their matrix analogues for the efficient computation of arbitrary (well-behaved) matrix functions and their derivatives.
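
For intuition, a scalar function can be lifted to a diagonalizable matrix via the eigendecomposition f(A) = V f(Λ) V⁻¹; the sketch below (plain Julia) is only an illustration, while the project would implement the more robust general-purpose methods (e.g. Schur–Parlett) from the literature:

```
using LinearAlgebra

# Lift a scalar function f to a diagonalizable matrix A: f(A) = V*f.(λ)*V⁻¹.
function matfun(f, A::AbstractMatrix)
    F = eigen(A)
    return F.vectors * Diagonal(f.(F.values)) / F.vectors
end

A = [2.0 1.0; 0.0 3.0]
matfun(sqrt, A)^2 ≈ A   # true: the matrix square root squares back to A
```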

**Recommended Skills**: A strong understanding of calculus and numerical analysis.

**Expected Results**: New and faster methods for evaluating matrix functions.

**Mentors:** Jiahao Chen, Steven Johnson.

**Difficulty:** Hard

Julia currently supports big integers and rationals, making use of the GMP library. However, GMP currently doesn't permit good integration with a garbage collector.

This project therefore involves exploring ways to improve BigInt, possibly including:

- Modifying GMP to support high-performance garbage collection
- Reimplementing aspects of BigInt in Julia
- Lazy graph-style APIs which can rewrite terms or apply optimisations

This experimentation could be carried out as a package with a new implementation, or as patches over the existing implementation in Base.

**Expected Results**: An implementation of BigInt in Julia with increased performance over the current one.

**Required Skills**: Familiarity with extended-precision numerics OR performance considerations. Familiarity either with Julia or GMP.

**Mentors**: Jameson Nash

**Difficulty:** Hard

As a technical computing language, Julia provides a huge number of special functions, both in Base as well as packages such as StatsFuns.jl. At the moment, many of these are implemented in external libraries such as Rmath and openspecfun. This project would involve implementing these functions in native Julia (possibly utilising the work in SpecialFunctions.jl), seeking out opportunities for possible improvements along the way, such as supporting `Float32` and `BigFloat`, exploiting fused multiply-add operations, and improving errors and boundary cases.
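
As a small taste of what such native implementations look like, a polynomial kernel written against a generic float type works across precisions for free, and `evalpoly` compiles to fused multiply-adds where the hardware supports them. The kernel below is purely illustrative (a truncated Taylor series of `expm1`), not a production-quality implementation:

```
# Illustrative only: x * (1 + x/2 + x^2/6 + x^3/24 + x^4/120) ≈ expm1(x).
my_expm1_kernel(x::T) where {T<:AbstractFloat} =
    x * evalpoly(x, (one(T), T(1)/2, T(1)/6, T(1)/24, T(1)/120))

my_expm1_kernel(1f-3)         # works for Float32
my_expm1_kernel(big"0.001")   # and for BigFloat
```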

**Recommended Skills**: A strong understanding of calculus.

**Expected Results**: New and faster methods for evaluating properties of special functions.

**Mentors:** Steven Johnson, Oscar Smith. Ask on Discourse or on slack

The CCSA algorithm by Svanberg (2001) is a nonlinear programming algorithm widely used in topology optimization and for other large-scale optimization problems: it is a robust algorithm that can handle arbitrary nonlinear inequality constraints and huge numbers of degrees of freedom. Moreover, the relative simplicity of the algorithm makes it possible to easily incorporate sparsity in the Jacobian matrix (for handling huge numbers of constraints), approximate-Hessian preconditioners, as well as special-case optimizations for affine terms in the objective or constraints. However, it is currently only available in Julia via the NLopt.jl interface to an external C implementation, which greatly limits its flexibility.

**Recommended Skills**: Experience with nonlinear optimization algorithms and understanding of Lagrange duality, familiarity with sparse matrices and other Julia data structures.

**Expected Results**: A package implementing a native-Julia CCSA algorithm.

**Mentors:** Steven Johnson.

At JuliaCon 2021, a new Monte Carlo sampling method (for example, as a sampling algorithm for the posterior in Bayesian inference) was introduced [1]. The method exploits the factorization structure to sample *a single* continuous-time Markov chain targeting a joint distribution in parallel. In contrast to parallel Gibbs sampling, at no time in this method is a subset of coordinates kept fixed. In Gibbs sampling, keeping a subset fixed is the main device to achieve massive parallelism: given a separating set of coordinates, the conditional posterior factorizes into independent subproblems. In the presented method, a particle representing a parameter vector sampled from the posterior never ceases to move, and it is only the decisions about changes in the direction of the movement which happen in parallel on subsets of coordinates.

There are already two implementations available which make use of Julia's multithreading capabilities. Starting from these, the contributor implements a version of the algorithm using GPU computing techniques, as the method is well suited to such approaches.

**Expected Results**: An implementation of the massively parallel factorized bouncy particle sampler [1,2] using GPU computing.

**Recommended Skills**: GPU computing, Markov processes, Bayesian inference.

**Mentors**: Moritz Schauer

**Rating**: Hard, 350 hours

[1] Moritz Schauer: ZigZagBoomerang.jl - parallel inference and variable selection. JuliaCon 2021 contribution [https://pretalx.com/juliacon2021/talk/LUVWJZ/], Youtube: [https://www.youtube.com/watch?v=wJAjP_I1BnQ], 2021.

[2] Joris Bierkens, Paul Fearnhead, Gareth Roberts: The Zig-Zag Process and Super-Efficient Sampling for Bayesian Analysis of Big Data. The Annals of Statistics, 2019, Vol. 47, No. 3, pp. 1288-1320. [https://arxiv.org/abs/1607.03188].

VS Code is an extensible editor, and one of its most recent features is a notebook GUI, with a corresponding Notebook API, allowing extension developers to write their own *notebook backend*. We want to combine two popular Julia IDEs, VS Code and Pluto.jl, and use this API to provide a mature editing and debugging experience combined with Pluto's reactivity.

**Expected Results:** Reactive notebook built on top of VSCode's notebook API.

**Required skills:** JavaScript/TypeScript

**Duration:** 175 h

**Difficulty:** Medium

**Mentors:** Sebastian Pfitzner (core maintainer of julia-vscode), Panagiotis Georgakopoulos and Fons van der Plas (core maintainers of Pluto.jl) and friends

*Also see the other VS Code projects!*

Pluto's primary use case is education, and we recently started using Pluto notebooks as an 'interactive textbook': https://computationalthinking.mit.edu/. If you are interested in design and interactive visualization, there are lots of cool JS projects in this area. Examples include:

- Linking video content to dynamic content, and better integration between exercise and lecture material.
- Experimenting with playing back the edits to a notebook session, like a video, but on a scrollable page (link).
- Syntax analysis to automatically review 'code style'.
- Improved live check and autograding tools.

And so on! Take a look at our project board and get in touch if you have further ideas: fons@plutojl.org

**Expected Results:** *One* of the items above! When finished, your work will be used in future editions of the Computational Thinking course and more!

**Required skills:** JavaScript & CSS. (You can learn Julia as part of the project.)

**Duration:** 175 h

**Difficulty:** Easy/Medium depending on the choice

**Mentors:** Fons van der Plas, Connor Burns and fellow Pluto.jl maintainers, with feedback from Alan Edelman

Right now, Pluto is a *Julia package* with one function, `Pluto.run()`:

```
julia> using Pluto
julia> Pluto.run()
Welcome to Pluto! Go to http://localhost:1234/ to start writing!
```

This makes sense, because Pluto is written in Julia! But for many people, the steps *install Julia, open a terminal, run the Julia REPL, use Pkg to install Pluto, import Pluto, run Pluto* are still much too intimidating. Ideally, we hope that Pluto will make scientific computing more accessible and fun for everyone, especially beginner students and programmers who might not have used a terminal before!

For this reason, we want Pluto to be a standalone Electron app, just like VS Code, Slack, WhatsApp, GitHub Desktop, Atom, and many others. Pluto as a standalone app opens the door to a more smooth and uniform user experience across the board, through Electron's native file system capabilities, setting the app to open notebook files when double-clicked, and configurable automated updates for both Pluto and Julia.

This project can be broken down into four smaller chunks:

- Serve Pluto's web files in Electron.
- Get the Electron view talking with a local Pluto server.
- Implement native file system features for Pluto in Electron.
- Package the app into an easily installable binary (exe for Windows, dmg for macOS, etc.), with the Julia executable embedded.

**Expected Results:** An Electron app for editing Pluto.jl notebooks, with support for operating system-specific features like file open or double-click.

**Required skills:** JavaScript, NodeJS.

**Duration:** 175 h

**Difficulty:** Easy

**Mentors:** Connor Burns, Michiel Dral, Fons van der Plas and fellow Pluto.jl maintainers

Context: *Pluto is a notebook system written in Julia, which means that it runs an HTTP/WS web server in Julia. We currently use HTTP.jl for this, an ambitious project to write an HTTP server and client in pure Julia. While HTTP.jl works well in most scenarios, we still find that Pluto's connection is not always reliable. This is because people use Pluto on such a wide range of systems, with all kinds of network configurations, proxies, firewalls, browser interactions, etc.*

Looking for alternatives, we believe that, instead of using a pure-Julia implementation of HTTP, we should wrap around an existing, high-production web server like hyper.rs. Julia has a rich history of wrapping libraries written in C, C++, Python, Go, JS and more, and the package manager has first-class support for external binaries.

As a participant of this project, you will build on top of the Julia and Rust ecosystems. A potential starting point would be looking at the Deno http server implementation also built on top of hyper.rs. Initially, the goal would be to start using the hyper C API to interoperate with Julia (there is already a hyper_jll package ❤ !!). Depending on the progress, another area of exploration is to investigate rustier tools like jlrs.

**Expected Results:** A prototype of wrapping the `hyper` library in Julia, with a focus on reliability and efficiency, forming the basis of the package.

**Required skills:** Rust, some Julia experience, some previous experience with language interoperability or inter-process communication.

**Duration:** 175 h

**Mentors:** Paul Berg and Fons van der Plas

**Difficulty:** Hard

Pythia is a package for scalable machine learning time series forecasting and nowcasting in Julia.

The project mentors are Andrii Babii and Sebastian Vollmer.

This project involves developing scalable machine learning time series regressions for nowcasting and forecasting. Nowcasting in economics is the prediction of the present, the very near future, and the very recent past state of an economic indicator. The term is a contraction of "now" and "forecasting" and originates in meteorology.

The objective of this project is to introduce scalable regression-based nowcasting and forecasting methodologies that have recently demonstrated empirical success in data-rich environments. Examples of existing popular packages for regression-based nowcasting on other platforms include the "MIDAS Matlab Toolbox", as well as the 'midasr' and 'midasml' packages in R. The starting point for this project is porting the 'midasml' package from R to Julia. Currently, Pythia has the sparse-group LASSO regression functionality for forecasting.

The following functions are of interest: in-sample and out-of-sample forecasts/nowcasts, regularized MIDAS with Legendre polynomials, visualization of nowcasts, AIC/BIC and time series cross-validation tuning, forecast evaluation, pooled and fixed-effects panel data regressions for forecasting and nowcasting, HAC-based inference for sparse-group LASSO, and high-dimensional Granger causality tests. Other widely used existing functions from R/Python/Matlab are also of interest.

**Recommended skills:** Graduate-level knowledge of time series analysis, machine learning, and optimization is helpful.

**Expected output:** The contributor is expected to produce code, documentation, visualization, and real-data examples.

**References:** Contact project mentors for references.

Modern business applications often involve forecasting hundreds of thousands of time series. Producing such a gigantic number of reliable and high-quality forecasts is computationally challenging, which limits the scope of potential methods that can be used in practice; see, e.g., the 'forecast', 'fable', or 'prophet' packages in R. Currently, Julia lacks scalable time series forecasting functionality, and this project aims to develop automated, data-driven, and scalable time series forecasting methods.

The following functionality is of interest: forecasting intermittent demand (Croston, adjusted Croston, INARMA), scalable seasonal ARIMA with covariates, loss-based forecasting (gradient boosting), unsupervised time series clustering, forecast combinations, unit root tests (ADF, KPSS). Other widely used existing functions from R/Python/Matlab are also of interest.

**Recommended skills:** Graduate-level knowledge of time series analysis is helpful.

**Expected output:** The contributor is expected to produce code, documentation, visualization, and real-data examples.

**References:** Contact project mentors for references.

Clifford circuits are a class of quantum circuits that can be simulated efficiently on a classical computer. As such, they do not provide the computational advantage expected of universal quantum computers. Nonetheless, they are extremely important, as they underpin most techniques for quantum error correction and quantum networking. Software that efficiently simulates such circuits, at the scale of thousands or more qubits, is essential to the design of quantum hardware. The QuantumClifford.jl Julia project enables such simulations.

Simulation of Clifford circuits involves significant amounts of linear algebra with boolean matrices. This enables the use of many standard computation accelerators like GPUs, as long as these accelerators support bit-wise operations. The main complication is that the elements of the matrices under consideration are usually packed in order to increase performance and lower memory usage, i.e. a vector of 64 elements would be stored as a single 64-bit integer instead of as an array of 64 bools. A Summer of Code project could consist of implementing the aforementioned linear algebra operations in GPU kernels, and then seamlessly integrating them into the rest of the QuantumClifford library. At a minimum that would include Pauli-Pauli products and certain small Clifford operators, but could extend to general stabilizer tableau multiplication and even tableau diagonalization.
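
The packing trick can be shown in a few lines of plain Julia; a GPU kernel would apply the same wide integer operations per thread:

```
# XOR two packed boolean rows: each UInt64 carries 64 matrix entries,
# so one broadcast of .⊻ performs 256 boolean operations here.
xor_rows!(dest::Vector{UInt64}, a::Vector{UInt64}, b::Vector{UInt64}) =
    (dest .= a .⊻ b; dest)

a, b = rand(UInt64, 4), rand(UInt64, 4)   # two rows of 256 packed bits each
xor_rows!(similar(a), a, b)
```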

**Recommended skills:** Basic knowledge of the stabilizer formalism used for simulating Clifford circuits. Familiarity with performance profiling tools in Julia and Julia's GPU stack, including KernelAbstractions and Tullio.

**Mentors:** Stefan Krastanov

**Expected duration:** 175 hours (but applicants can scope it to a longer project by including work on GPU-accelerated Gaussian elimination used in the canonicalization routines)

**Difficulty:** Medium if the applicant is familiar with Julia, even without understanding of Quantum Information Science (but applicants can scope it to "hard" by including the aforementioned additional topics)

Often, stabilizer circuit simulations are structured as a repeated simulation of the same circuit with random Pauli errors superimposed on it. This is useful, for instance, when studying the performance of error-correcting codes. In such simulations it is possible to run one single relatively expensive simulation of the noise-less circuit in order to get a reference and then run a large number of much faster "Pauli Frame" simulations that include the random noise. By utilizing the reference simulation, the random noise simulations could more efficiently provide samples of the performance of the circuit under noise. This project would involve creating an API for such simulations in QuantumClifford.jl. A useful reference would be the Stim C++ library.

**Recommended skills:** Knowledge of the stabilizer formalism used for simulating Clifford circuits. Familiarity with performance profiling tools in Julia.

**Mentors:** Stefan Krastanov

**Expected duration:** 350 hours

**Difficulty:** Hard, due to requiring in-depth knowledge of the stabilizer formalism.

Quantum error correcting codes are typically represented in a form similar to the parity check matrix of a classical code. This form is called a stabilizer tableau. This project would involve creating a comprehensive library of frequently used quantum error correcting codes. As an initial step, this would involve implementing the tableaux corresponding to simple pedagogical codes like the Steane and Shor codes, toric and surface codes, some CSS codes, etc. The project can be extended to a much longer one by including work on decoders for some of these codes. A large part of this project would involve literature surveys.

**Recommended skills:** Knowledge of the stabilizer formalism used for simulating Clifford circuits.

**Mentors:** Stefan Krastanov

**Expected duration:** 175 hours (but applicants can scope it as longer, depending on the list of functionality they plan to implement)

**Difficulty:** Medium. Easy with some basic knowledge of quantum error correction

Applying an n-qubit Clifford gate to an n-qubit state (tableau) is an operation similar to matrix multiplication, requiring O(n^3) steps. However, applying a single-qubit or two-qubit gate to an n-qubit tableau is much faster, as it needs to address only one or two columns of the tableau. This project would focus on extending the left-multiplication special cases already started in symbolic_cliffords.jl and creating additional right-multiplication special cases (for which the Stim library is a good reference).

**Recommended skills:** Knowledge of the stabilizer formalism used for simulating Clifford circuits. Familiarity with performance profiling tools in Julia. Understanding of C/C++ if you plan to use the Stim library as a reference.

**Mentors:** Stefan Krastanov

**Expected duration:** 175 hours (but applicants can scope it as a longer project)

**Difficulty:** Easy

Symbolics.jl has robust ways to convert symbolic expressions into multivariate polynomials. There is now a robust Groebner basis implementation in Groebner.jl. Finding roots and varieties of sets of polynomials would be extremely useful in many applications. This project would involve implementing various techniques for solving polynomial systems, and where possible other non-linear equation systems. A good proposal should enumerate a number of techniques worth implementing, for example:

- Analytical solutions for polynomial systems of degree <= 4 (a toy sketch of this case appears below)
- Use of HomotopyContinuation.jl for testing for solvability and finding numerical solutions
- Newton–Raphson methods
- Using Groebner basis computations to find varieties

The API for these features should be extremely user-friendly:

- A single `roots` function should take the sets of equations and return the right type of roots as output (either varieties or numerical answers).
- It should automatically find the fastest strategy to solve the set of equations and apply it.
- It should fail with descriptive error messages when equations are not solvable, or degenerate in some way.

This should allow implementing symbolic eigenvalue computation when `eigs` is called.
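
As a hypothetical illustration of the "analytical solutions for low degree" strategy (the helper name and scope are made up; a real `roots` would dispatch among all the strategies listed above):

```
using Symbolics

@variables a b c
# Hypothetical helper: closed-form roots of a*x^2 + b*x + c = 0.
function quadratic_roots(a, b, c)
    d = sqrt(b^2 - 4a*c)   # symbolic discriminant
    ((-b + d) / (2a), (-b - d) / (2a))
end

quadratic_roots(a, b, c)
```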

**Mentors**: Shashi Gowda, Alexander Demin. **Duration**: 350 hours.

Implement the heuristic approach to symbolic integration, then hook into a repository of rules such as RUBI. See also the potential of using symbolic-numeric integration techniques (https://github.com/SciML/SymbolicNumericIntegration.jl).

**Recommended Skills**: High school/Freshman Calculus

**Expected Results**: A working implementation of symbolic integration in the Symbolics.jl library, along with documentation and tutorials demonstrating its use in scientific disciplines.

**Mentors**: Shashi Gowda, Yingbo Ma

**Duration**: 350 hours

Julia functions that take arrays and output arrays or scalars can be traced using Symbolics.jl variables to produce a trace of operations. This output can be optimized to use fused operations or to call highly specific NNlib functions. In this project, you will trace through Flux.jl neural-network functions and apply optimizations to the resultant symbolic expressions. This can mostly be implemented as rule-based rewriting (see https://github.com/JuliaSymbolics/Symbolics.jl/pull/514).
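
A toy version of the trace-then-rewrite idea, using Symbolics.jl together with a SymbolicUtils rewrite rule (the traced function `f` is made up):

```
using Symbolics
using SymbolicUtils: @rule

@variables x y
f(a, b) = log(exp(a)) + a * b      # ordinary Julia code to be traced
trace = f(x, y)                    # symbolic trace: log(exp(x)) + x*y

r = @rule log(exp(~z)) => ~z       # one rewrite rule an optimizer might apply
r(Symbolics.unwrap(log(exp(x))))   # -> x
```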

**Recommended Skills**: Knowledge of space and time complexities of array operations, experience in optimizing array code.

**Mentors**: Shashi Gowda

**Duration**: 175 hours

Herbie documents a way to optimize floating-point functions so as to reduce instruction count while reorganizing operations such that floating-point inaccuracies do not get magnified. It would be a great addition to have this written in Julia and have it work on Symbolics.jl expressions. An ideal implementation would use the e-graph facilities of Metatheory.jl to implement this.
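
A concrete instance of the kind of rewrite involved: the two expressions below are mathematically equal, but the first suffers catastrophic cancellation for large `x` while the second does not; finding such rewrites automatically is exactly what Herbie does.

```
naive(x)  = sqrt(x + 1) - sqrt(x)
stable(x) = 1 / (sqrt(x + 1) + sqrt(x))

naive(1.0e15), stable(1.0e15)   # the first is accurate to only a few digits
```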

**Mentors**: Shashi Gowda, Alessandro Cheli

**Duration**: 350 hours

**Difficulty**: Medium

**Duration**: 350 hours

*FlashFill* is a mechanism for creating data manipulation pipelines using programming by example (PBE). As an example, see this implementation in Microsoft Excel. We want a version of FlashFill that can work against Julia tabular data structures, such as DataFrames and Tables.jl.
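
A toy flavor of programming by example (all names here are made up for illustration): from a single (input, output) pair, guess a "split on a delimiter, take field i" program and reuse it on the rest of a column. Real FlashFill searches a much richer DSL of string programs:

```
# Infer a trivial string program from one example, or return nothing.
function infer_program(input::AbstractString, output::AbstractString)
    for d in (' ', ',', '-', '@', '/')
        parts = split(input, d)
        i = findfirst(==(output), parts)
        i === nothing || return s -> String(split(s, d)[i])
    end
    return nothing
end

prog = infer_program("jane@example.com", "jane")
prog("bob@foo.org")   # -> "bob"
```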

**Resources**:

- A presentation by Sumit Gulwani of Microsoft Research
- A video

**Recommended Skills**: Compiler techniques, DSL generation, Program synthesis

**Expected Output**: A practical FlashFill implementation that can be used on any tabular data structure in Julia

**Mentors**: Avik Sengupta

**Difficulty**: Medium

**Duration**: 175 hours

Apache Parquet is a binary data format for tabular data. It has features for compression and memory-mapping of datasets on disk. A decent implementation of Parquet in Julia is likely to be highly performant. It will be useful as a standard format for distributing tabular data in binary form. There exists a Parquet.jl package that has a Parquet reader and a writer. It currently conforms to the Julia tabular file IO interface at a very basic level. It needs more work to add support for critical elements that would make Parquet.jl usable for fast, large-scale parallel data processing. Each of the following goals can be targeted as a single, short-duration (175 hrs) project.

- Lazy loading and support for out-of-core processing, with Arrow.jl and Tables.jl integration. Improved usability and performance of the Parquet reader and writer for large files.
- Reading from and writing data to cloud data stores, including support for partitioned data.
- Support for missing data types and encodings, making the Julia implementation fully featured.

**Resources:**

The Parquet file format (there are also many articles and talks on the Parquet storage format on the internet)

**Recommended skills:** Good knowledge of Julia language, Julia data stack and writing performant Julia code.

**Expected Results:** Depends on the specific projects we would agree on.

**Mentors:** Tanmay Mohapatra

TableTransforms.jl provides transforms that are commonly used in statistics and machine learning. It was developed to address specific needs in feature engineering and works with general Tables.jl tables.

Project mentors: Júlio Hoffimann

Statistical transforms such as PCA, Z-score, etc., can greatly improve the convergence of various statistical learning models and are widely used in advanced machine learning pipelines. In this project, the mentee will learn how to implement advanced transforms such as PPMT, as well as transforms for the imputation of missing values.
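
A minimal sketch of what a revertible statistical transform involves (plain Julia, not the TableTransforms.jl API): Z-score standardization of a column, with a cache that allows reverting:

```
using Statistics

# Forward transform returns the transformed column plus a revert cache.
function zscore(x)
    μ, σ = mean(x), std(x)
    (x .- μ) ./ σ, (μ, σ)
end
revert_zscore(z, (μ, σ)) = z .* σ .+ μ

z, cache = zscore([1.0, 2.0, 3.0])
revert_zscore(z, cache)   # -> [1.0, 2.0, 3.0]
```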

**Desired skills:** Statistics, Machine Learning

**Difficulty level:** Medium

**Expected duration:** 350hrs

Utility transforms such as the standardization of column names and other string-based transforms are extremely important for digesting real-world data. In this project, the mentee will learn good coding practices and will implement various utility transforms available in other languages (e.g. the janitor package in R, pyjanitor in Python).

**Desired skills:** Text processing, Regex

**Difficulty level:** Easy

**Expected duration:** 175hrs

Address open issues in the package.

Please contact Júlio Hoffimann on Zulip if you have any questions.

TopOpt.jl is a topology optimisation package written in pure Julia. Topology optimisation is an exciting field at the intersection of shape representation, physics simulations and mathematical optimisation, and the Julia language is a great fit for this field. To learn more about `TopOpt.jl`, check the following JuliaCon talk.

The following is a tentative list of projects in topology optimisation that you could be working on in the coming Julia Season of Contributions or Google Summer of Code. If you are interested in exploring any of these topics or if you have other interests related to topology optimisation, please reach out to the main mentor Mohamed Tarek via email.

**Project difficulty**: Easy to Medium

**Work load**: 175 or 350 hours

**Description**: There are numerous ways to use machine learning for design optimisation in topology optimisation. The following are all recent papers with applications of neural networks and machine learning in topology optimisation. There are also exciting research opportunities in this direction.

- DNN-based Topology Optimisation: Spatial Invariance and Neural Tangent Kernel
- NTopo: Mesh-free Topology Optimization using Implicit Neural Representations
- TONR: An exploration for a novel way combining neural network with topology optimization

In this project you will implement one of the algorithms discussed in any of these papers.

**Knowledge prerequisites**: neural networks, optimisation, Julia programming

**Project difficulty**: Easy

**Work load**: 175 hours

**Description**: Some topology optimisation formulations enable optimising the shape of the structure and the material selection simultaneously. In this project, you will implement some multi-material design optimisation formulations, e.g. this paper has a relatively simple approach to integrate into TopOpt.jl. Other methods include using mixed integer nonlinear programming from Nonconvex.jl to select materials in different parts of the design.

**Knowledge prerequisites**: basic optimisation, Julia programming

**Project difficulty**: Medium

**Work load**: 350 hours

**Description**: Currently, only unstructured meshes are supported in TopOpt.jl. This is a very flexible type of mesh, but it is not as memory-efficient as a uniform rectilinear grid, where all the elements are assumed to have the same shape. This is the most common grid used in topology optimisation in practice. At the moment, TopOpt.jl stores a uniform rectilinear grid as an unstructured mesh, which is unnecessarily inefficient. In this project, you will optimise the finite element analysis and topology optimisation codes in TopOpt.jl for uniform rectilinear grids.

**Knowledge prerequisites**: knowledge of mesh types, Julia programming

**Project difficulty**: Medium

**Work load**: 350 hours

**Description**: Topology optimisation problems with more mesh elements take longer to simulate and to optimise. In this project, you will explore the use of adaptive mesh refinement starting from a coarse mesh, optimising and only refining the elements that need further optimisation. This is an effective way to accelerate topology optimisation algorithms.

**Knowledge prerequisites**: adaptive mesh refinement, Julia programming

**Project difficulty**: Medium

**Work load**: 175 or 350 hours

**Description**: All of the examples and problem types in TopOpt.jl currently belong to the linear elasticity, quasi-static class of problems. The goal of this project is to implement more problem types and examples from the field of heat transfer. Both steady-state heat transfer problems and linear elasticity problems make use of elliptic partial differential equations, so the code from linear elasticity problems should be largely reusable for heat transfer problems with minimal changes.

**Knowledge prerequisites**: finite element analysis, heat equation, Julia programming

Turing is a universal probabilistic programming language embedded in Julia. Turing allows the user to write models in standard Julia syntax and provides a wide range of sampling-based inference methods for solving problems across probabilistic machine learning, Bayesian statistics, and data science. Since Turing is implemented in pure Julia code, its compiler and inference methods are amenable to hacking: new model families and inference methods can be easily added. Below is a list of ideas for potential projects, though you are welcome to propose your own to the Turing team.
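
For instance, the classic coin-flip model is a few lines of standard Turing syntax:

```
using Turing

@model function coinflip(y)
    p ~ Beta(1, 1)              # prior on the probability of heads
    for i in eachindex(y)
        y[i] ~ Bernoulli(p)     # observation model
    end
end

chain = sample(coinflip([1, 0, 1, 1, 1]), NUTS(), 1000)
```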

If you are interested in exploring any of these projects, please reach out to the listed project mentors. You can find their contact information at turing.ml/team.

**Mentors**: Kai Xu, Tor E. Fjelde, Hong Ge

**Project difficulty**: Medium

**Project length**: 175 hrs or 350 hrs

**Description**: There are many real-world Bayesian models out there, and they deserve a Turing / Julia implementation.

Examples include, but are not limited to:

- Recommender systems (probabilistic matrix factorisation, dataset)
- Bayesian revenue estimation (example)
- Political forecasting models (example)
- Topic mining (latent Dirichlet allocation and new variants)
- Multiple annotators / combining unreliable observations (Dawid and Skene, 1979)

For each model, we consider the following tasks:

- Correctness test: the correctness of the implementation can be tested by doing inference on prior samples, for which we know the ground-truth latent variables.
- Performance benchmark: this includes (i) time per MCMC step and (ii) time per effective sample; if the model is differentiable, a further breakdown of (i) into (i.1) time per forward pass and (i.2) time per gradient pass is needed.
- Real-world results: if available, the final step is to apply the model to a real-world dataset; if such an experiment has been done in the literature, the consistency of inference results needs to be checked.

**Mentors**: Cameron Pfiffer, Mohamed Tarek, David Widmann

**Project difficulty**: Easy

**Project length**: 175 hrs

**Description**: Turing.jl is based on a set of inference packages maintained by the TuringLang group. This project is about making use of improvements in DynamicPPL to create a generic integration between Turing.jl and the AbstractMCMC.jl sampling API. The ultimate goal is to remove or substantially reduce algorithm-specific glue code inside Turing.jl. The project would also involve improving data structures for storing model parameters in DynamicPPL.

**Mentors**: Philipp Gabler, Hong Ge

**Project difficulty**: Hard

**Project length**: 350 hrs

**Description**: We want to have a very light-weight representation of probabilistic models of static graphs (similar to BUGS), which can serve as a representation target of other front-end DSLs or be dynamically built. The representation should consist of the model and node representations (stochastic and deterministic, perhaps hyperparameters) and conform to the AbstractPPL model interface, with basic functions (evaluation of density, sampling, conditioning; at later stages some static analysis like extraction of Markov blankets). The model should also contain the state of the variables and implement the AbstractPPL trace interface (dictionary functions, querying of variable names). The result should be able to work with existing sampling packages through the abstract interfaces.

**Mentors**: Qingliang Zhuo, Mohamed Tarek

**Project difficulty**: Medium

**Project length**: 175 hrs

**Description**: Tape caching often leads to significant performance improvements for gradient-based sampling algorithms (e.g. HMC/NUTS). At the moment, tape caching is only possible for ReverseDiff at the level of the complete computation. This project is about implementing a more modular (i.e. function-as-a-caching-barrier) tape caching mechanism for ReverseDiff.jl.

**Mentors**: Theo Galy-Fajou, Will Tebbutt, ST John

**Project difficulty**: Medium

**Project length**: 350 hrs

**Description**: Although KernelFunctions.jl has extensive correctness testing, our performance testing is lacking. This project aims to resolve this and to fix performance issues wherever they are found. The contributor would first need to extend our existing benchmark coverage and debug any obvious performance problems. The next phase of the work would be to construct end-to-end examples of KernelFunctions being used in practice, profile them to determine where performance problems lie, and fix them.
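
The first phase could build on BenchmarkTools.jl; for example, a micro-benchmark of a kernel matrix computation (a representative snippet, not an existing suite):

```
using BenchmarkTools, KernelFunctions

k = SqExponentialKernel()
x = rand(1000)
@btime kernelmatrix($k, $x)   # a baseline number to track across changes
```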

**Mentors**: Will Tebbutt, S. T. John, Ross Viljoen

**Project difficulty**: Medium

**Project length**: 175 hrs

**Description**: There has recently been quite a bit of work on inference methods for GPs that use iterative methods rather than the Cholesky factorisation. They look quite promising, but no one has implemented any of them within the Julia GP ecosystem yet; they should fit nicely within the AbstractGPs framework. If you're interested in improving the GP ecosystem in Julia, this project might be for you!

**Mentors**: S. T. John, Ross Viljoen, Theo Galy-Fajou

**Project difficulty**: Hard

**Project length**: 350 hrs

**Description**: Adding approximate inference methods for non-Gaussian likelihoods which are available in other GP packages but not yet within JuliaGPs. The project would start by determining which approximate inference method(s) to implement; there's lots to do, and we're happy to work with a contributor on whichever method they are most interested in, or to suggest one if they have no strong preference.

**Mentors**: Ross Viljoen, Theo Galy-Fajou, Will Tebbutt

**Project difficulty**: Medium

**Project length**: 350 hrs

**Description**: This would involve first ensuring that common models are able to run fully on the GPU, then identifying and improving GPU-specific performance bottlenecks. This would begin by implementing a limited end-to-end example involving a GP with a standard kernel, and profiling it to debug any substantial performance bottlenecks. From there, support for a wider range of the functionality available in KernelFunctions.jl and AbstractGPs.jl can be added. Stretch goal: extension of GPU support to some functionality in ApproximateGPs.jl.

We are generally looking for folks that want to help with the Julia VS Code extension. We have a long list of open issues, and some of them amount to significant projects.

**Required Skills**: TypeScript, Julia, web development.

**Expected Results**: Depends on the specific projects we would agree on.

**Mentors**: David Anthoff

The VSCode extension for Julia could provide a simple way to browse available packages and view what's installed on a user's system. To start with, this project could simply provide a GUI that reads in package data from a `Project.toml`/`Manifest.toml` and shows some UI elements to add/remove/manage those packages.

This could also be extended by displaying metadata about the package, such as a readme, GitHub stars, activity, and so on (somewhat similar to the VSCode-native extension explorer).
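On the Julia side, the package data such a UI would display can be read with the `TOML` standard library (the path below is illustrative):

```julia
using TOML

project = TOML.parsefile("Project.toml")
deps = get(project, "deps", Dict())   # package name => UUID

for (name, uuid) in sort!(collect(deps))
    println(name, " => ", uuid)
end
```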

**Expected Results**: A UI in VSCode for package operations.

**Recommended Skills**: Familiarity with TypeScript and Julia development.

**Mentors**: Sebastian Pfitzner

*Also take a look at Pluto - VS Code integration!*

Julia has early support for targeting WebAssembly and running in the web browser. Please note that this is a rapidly moving area (see the project repository for a more detailed overview), so if you are interested in this work, please make sure to inform yourself of the current state and talk to us to scope out an appropriate project. The below is intended as a set of possible starting points.

Mentor for these projects is Keno Fischer unless otherwise stated.

Because Julia relies on an asynchronous task runtime and WebAssembly currently lacks native support for stack management, Julia needs to explicitly manage task stacks in the wasm heap and perform a compiler transformation to use this stack instead of the native WebAssembly stack. The overhead of this transformation directly impacts the performance of Julia on the wasm platform. Additionally, since all code Julia uses (including arbitrary C/C++ libraries) must be compiled using this transformation, it needs to cover a wide variety of inputs and be coordinated with other users having similar needs (e.g. the Pyodide project to run python on the web). The project would aim to improve the quality, robustness and flexibility of this transformation.

**Recommended Skills**: Experience with LLVM.

WebAssembly is in the process of standardizing threads. Simultaneously, work is ongoing to introduce a new threading runtime in Julia (see #22631 and related PRs). This project would investigate enabling threading support for Julia on the WebAssembly platform, implementing runtime parallel primitives on WebAssembly and ensuring that high-level threading constructs are correctly mapped to the underlying platform. Please note that both the WebAssembly and Julia threading infrastructures are still in active development and may continue to change over the duration of the project. An informed understanding of the state of these projects is a definite prerequisite for this project.

**Recommended Skills**: Experience with C and multi-threaded programming.

WebAssembly is in the process of adding first-class references to native objects to its specification. This capability should allow very high-performance integration between Julia and JavaScript objects. Since it is not possible to store references to JavaScript objects in regular memory, adding this capability will require several changes to the runtime system and code generation (possibly including at the LLVM level) in order to properly track these references and emit them either as direct references or as indirect references via the reference table.

**Recommended Skills**: Experience with C.

While Julia now runs on the web platform, it is not yet a language that's suitable for first-class development of web applications. One of the biggest missing features is integration with and abstraction over more complicated JavaScript objects and APIs, in particular the DOM. Inspiration may be drawn from similar projects in Rust or other languages.
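Purely as a hypothetical sketch of the design space (none of these names exist today; `js_call` stands in for whatever low-level JS interop the runtime ends up exposing), an idiomatic DOM wrapper might look roughly like this:

```julia
# Hypothetical sketch only: illustrative names, not an existing API.
module DOMSketch

# Opaque wrapper around a reference to a JavaScript object.
struct Element
    handle::Any
end

# Placeholder for the low-level Julia-to-JS call mechanism.
js_call(args...) = error("placeholder: would forward to the JS runtime")

# Thin, idiomatic wrappers over the corresponding Web APIs:
get_element_by_id(id::AbstractString) = Element(js_call(:getElementById, id))
set_text!(el::Element, s::AbstractString) = js_call(:setTextContent, el.handle, s)
on_click(f::Function, el::Element) = js_call(:addEventListener, el.handle, "click", f)

end # module
```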

**Recommended Skills**: Experience with writing libraries in Julia, experience with JavaScript Web APIs.

Several Julia libraries (e.g. WebIO.jl, Escher.jl) provide input and output capabilities for the web platform. Porting these libraries to run directly on the wasm platform would enable a number of existing UIs to automatically work on the web.

**Recommended Skills**: Experience with writing libraries in Julia.

The Julia project uses BinaryBuilder to provide binaries of the native dependencies of Julia packages. Experimental support exists to extend this support to the wasm platform, but few packages have been ported. This project would consist of attempting to port a significant fraction of the binary dependencies of the Julia ecosystem to the web platform by improving the toolchain support in BinaryBuilder or, if necessary, by porting upstream packages to fix assumptions that do not apply on the wasm platform.

**Recommended Skills**: Experience with building native libraries in Unix environments.

The Distributed computing abstractions in Julia provide convenient abstractions for implementing programs that span many communicating Julia processes on different machines. However, the existing abstractions generally assume that all communicating processes are part of the same trust domain (e.g. they allow messages to execute arbitrary code on the remote). With some of the nodes potentially running in the web browser (or multiple browser nodes being part of the same distributed computing cluster via WebRPC), this assumption no longer holds and new interfaces need to be designed to support multiple trust domains without overly restricting usability.
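The single-trust-domain assumption is easy to see in the current API: any process can ask any other process to run arbitrary code, with no capability or permission model in between.

```julia
# Demonstration of the current trust model in the Distributed stdlib:
# a remote call executes an arbitrary closure on the worker.
using Distributed
addprocs(1)

# The worker (id 2) runs whatever we send it, including shelling out.
remotecall_fetch(() -> read(`whoami`, String), 2)
```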

**Recommended Skills**: Experience with distributed computing and writing libraries in Julia.

Currently supported use cases for Julia on the web platform are primarily geared towards providing interactive environments to support exploration of the full language. Of course, this leads to significantly larger binaries than would be required for using Julia as part of a production deployment. By disabling dynamic language features (e.g. eval) one could generate small binaries suitable for deployment. Some progress towards this exists in packages like PackageCompiler.jl, though significant work remains to be done.
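For reference, PackageCompiler.jl already provides the entry point such work would build on; the package and output paths below are illustrative.

```julia
# Build a standalone app bundle from a local package directory.
# Today the result is large because the full runtime (including eval)
# is bundled; trimming dynamic features is what this project targets.
using PackageCompiler

create_app("MyApp", "MyAppCompiled")
```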

**Recommended Skills**: Interest in or experience with Julia internals.