This page is designed to improve discoverability of projects. You can, for example, search this page for specific keywords and find all of the relevant projects.

GeoStats.jl is an extensible framework for high-performance geostatistics in Julia. It is a project that aims to redefine the way statistics is done with geospatial data (e.g. data on geographic maps, 3D meshes).

Project mentors: Júlio Hoffimann, Rafael Caixeta

Statistical clustering cannot be applied straightforwardly to geospatial data. Geospatial constraints require clusters to be contiguous volumes in the map, something that is not taken into account by traditional methods (e.g. K-Means, Spectral Clustering).

The goal of this project is to implement a geospatial clustering method from the geostatistics literature using the GeoStats.jl API.
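To make the contiguity requirement concrete, here is a minimal sketch (written in Python for brevity, and not using the GeoStats.jl API) that checks whether each cluster label on a regular 2D grid forms a single connected region. The function name `is_contiguous`, the nested-list grid representation, and the choice of 4-neighbour adjacency are assumptions made for illustration only.

```python
from collections import deque


def is_contiguous(labels):
    """Return True if every label in a 2D grid forms one 4-connected region."""
    rows, cols = len(labels), len(labels[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = {}  # label -> number of connected regions found so far
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            lab = labels[r][c]
            regions[lab] = regions.get(lab, 0) + 1
            # Flood-fill the region that contains cell (r, c).
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                i, j = queue.popleft()
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    inside = 0 <= ni < rows and 0 <= nj < cols
                    if inside and not seen[ni][nj] and labels[ni][nj] == lab:
                        seen[ni][nj] = True
                        queue.append((ni, nj))
    # A clustering is spatially valid only if no label is split into pieces.
    return all(count == 1 for count in regions.values())
```

A labelling produced by a traditional method such as K-Means can fail this check, because nothing in its objective ties a cluster to a single region of the map.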

**Desired skills:** Statistics, Clustering, Graph Theory

**Difficulty level:** Medium

**References:**

Geostatistical simulation consists of generating multiple alternative realizations of geospatial data according to a given geospatial distribution. The literature on simulation methods is vast, but a few of them are particularly useful.

The goal of this project is to implement a geostatistical simulation method from the geostatistics literature using the GeoStats.jl API.
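As an illustration of what "multiple realizations" means, the following sketch draws unconditional Gaussian realizations on a set of 1D locations by factorizing the covariance matrix (often called LU or Cholesky simulation). It is written in Python for brevity and does not use the GeoStats.jl API. The exponential covariance model, its parameters `sill` and `corr_range`, and all function names are assumptions chosen for the example.

```python
import math
import random


def covariance_matrix(coords, sill=1.0, corr_range=3.0):
    """Exponential covariance between all pairs of 1D locations (assumed model)."""
    return [[sill * math.exp(-abs(a - b) / corr_range) for b in coords]
            for a in coords]


def cholesky(a):
    """Lower-triangular factor L with L * L^T == a, for a positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def simulate_gaussian(coords, n_realizations, seed=0):
    """Draw zero-mean Gaussian realizations with the covariance defined above."""
    rng = random.Random(seed)
    low = cholesky(covariance_matrix(coords))
    n = len(coords)
    realizations = []
    for _ in range(n_realizations):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Correlate the independent normals: y = L * z.
        realizations.append([sum(low[i][k] * z[k] for k in range(i + 1))
                             for i in range(n)])
    return realizations
```

The dense factorization costs O(n³) in the number of locations, which is one reason other simulation methods from the literature are needed for large grids.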

**Desired skills:** Geostatistics, Stochastics, HPC

**Difficulty level:** Hard

**References:**

The project currently relies on Plots.jl recipes to visualize geospatial data sets as well as many other objects defined in the framework. However, very large data sets (e.g. 3D volumes) cannot be visualized easily. The Makie.jl project is a promising alternative.

The goal of this project is to migrate all plot recipes from Plots.jl to Makie.jl.

**Desired skills:** Visualization, Plotting, Geometry, HPC, GPU

**Difficulty level:** Medium

Get familiar with the framework by reading the documentation and tutorials.

Please contact the project maintainers on Gitter or Zulip.

Causal and counterfactual methods for fairness in machine learning

Deeper integration with Bayesian methods and Bayesian Stacking

MLJ is a machine learning framework for Julia aiming to provide a convenient way to use and combine a multitude of tools and models available in the Julia ML/Stats ecosystem.

MLJ is released under the MIT license and sponsored by the Alan Turing Institute.

Bring particle swarm optimization to the MLJ machine learning platform to help users tune machine learning models.

**Difficulty.** Easy - moderate.

Imagine your search for the optimal machine learning model as the meandering flight of a bee through hyper-parameter space, looking for a new home for the queen. Parallelize your search, and you've created a swarm of bees. Introduce communication between the bees about their success so far, and you introduce the possibility of the bees ultimately converging on a good candidate for the best model.

PSO (Particle Swarm Optimization) is a large, promising, and active area of research, but also one that is used in real data science practice. The method is based on a very simple idea inspired by nature and makes essentially no assumptions about the nature of the cost function (unlike other methods, such as gradient descent, which might require a handle on derivatives). The method is simple to implement, and applicable to a wide range of hyper-parameter optimization problems.
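The idea can be sketched in a few lines. The following is a minimal, basic PSO variant for box-constrained minimization, written in Python for brevity. It is not the MLJ tuning API, and the function name `pso`, the parameter names, and the default coefficients (`inertia=0.7`, `c_personal=1.5`, `c_social=1.5`) are assumptions for illustration rather than recommended settings.

```python
import random


def pso(cost, bounds, n_particles=20, n_iter=100,
        inertia=0.7, c_personal=1.5, c_social=1.5, seed=0):
    """Minimize `cost` over the box `bounds` with a basic particle swarm."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]          # best point seen by each particle
    best_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_cost[i])
    swarm_pos, swarm_cost = best_pos[g][:], best_cost[g]  # best seen by the swarm
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Keep some momentum, then pull towards the particle's own
                # best point and towards the swarm's best point.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_social * r2 * (swarm_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            c = cost(pos[i])
            if c < best_cost[i]:
                best_cost[i], best_pos[i] = c, pos[i][:]
                if c < swarm_cost:
                    swarm_cost, swarm_pos = c, pos[i][:]
    return swarm_pos, swarm_cost
```

Note that only evaluations of `cost` are required, with no derivatives. In a tuning setting, `cost` would be an out-of-sample performance estimate of a model at the given hyper-parameter values.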

**Mentors.** Anthony Blaom, Sebastian Vollmer

Julia language fluency essential.

Git-workflow familiarity strongly preferred.

Some prior contact with optimization algorithms of some kind

A passing familiarity with machine learning goals and workflow preferred

The aim of this project is to implement one or more variants of the PSO algorithm, for use in the MLJ machine learning platform, for the purpose of optimizing hyper-parameters. *Integration* with MLJ is crucial, so there will be opportunity to spend time familiarizing yourself with this popular tool.

Specifically, you will:

familiarize yourself with the training, evaluation and tuning of machine learning models in MLJ

learn about the PSO algorithm and its variants, conducting a short survey of some of the literature and existing implementations in Julia and other languages, and preparing a short summary

familiarize yourself intimately with the MLJ tuning API

implement a simple PSO variant, complete with testing and documentation

experiment with the variant to learn more about its shortcomings and advantages, and help recommend default parameter settings

add variants, as time permits

Mentors: Jiahao Chen, Moritz Schauer, and Sebastian Vollmer

Fairness.jl is a package to audit and mitigate bias, using the MLJ machine learning framework and other tools. It has implementations of some preprocessing and postprocessing methods for improving fairness in classification models, but could use more implementations of other methods, especially inprocessing algorithms like adversarial debiasing.

*Difficulty* Hard.

Essential: working knowledge of the Julia language

Strongly preferred: git workflow familiarity

Desirable: Experience with Flux and autodiff

Machine learning models are developed to support and make high-impact decisions like who to hire or who to give a loan to. However, available training data can exhibit bias against race, age, gender, or other prohibited bases, reflecting a complex social and economic history of systemic injustice. For example, women in the United Kingdom, United States and other countries were only allowed to have their own bank accounts and lines of credit in the 1970s! That means that training a credit decisioning model on historical data would encode implicit biases, that women are less credit-worthy because few of them had lines of credit in the past. Surely we would want to be fair and not hinder an applicant's ability to get a loan on the basis of their race, gender and age?

So how can we fix data and models that are unfair? A common first reaction is to remove the race, gender and age attributes from the training data, and then say we are done. But as described in detail in the references, we have to consider if other features like one's name or address could encode such prohibited bases too. To mitigate bias and improve fairness in models, we can change the training data (pre-processing), the way we define and train the model (in-processing), and/or alter the predictions made (post-processing). Some algorithms for the first and third approaches have already been implemented in Fairness.jl, which have the advantage of treating the ML model as a black box. However, our latest research (arXiv:2011.02407) shows that pure black box methods have fundamental limitations in their ability to mitigate bias.
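Whichever of the three approaches is used, its effect has to be measured on the model's predictions. As a minimal sketch of one common group fairness measure, the code below computes the demographic parity difference, i.e. the gap in positive-prediction rates between groups. It is written in Python for brevity and is not the Fairness.jl API; the function names and the 0/1 encoding of predictions are assumptions for illustration.

```python
def positive_rate(predictions, groups, group):
    """Fraction of positive (1) predictions among members of `group`."""
    selected = [p for p, g in zip(predictions, groups) if g == group]
    return sum(selected) / len(selected)


def demographic_parity_difference(predictions, groups):
    """Largest gap in positive-prediction rate between any two groups."""
    rates = [positive_rate(predictions, groups, g) for g in set(groups)]
    return max(rates) - min(rates)
```

For example, with predictions `[1, 1, 0, 1, 0, 0, 0, 1]` and groups `["a", "a", "a", "a", "b", "b", "b", "b"]`, group `a` is approved at a rate of 0.75 and group `b` at 0.25, giving a difference of 0.5. The measure is computed from the protected attribute and the predictions only, so it still detects disparities when the attribute was removed from the training data but remains encoded in proxy features.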

This project is to implement more bias mitigation algorithms and invent new ones too. We will focus on in-processing algorithms that alter the training process or alter the ML model. Some specific stages are to:

Use Flux.jl or MLJFlux.jl to develop in-processing algorithms,

Study research papers proposing in-processing algorithms and implement them, and

Implement fairness algorithms and metrics for individual fairness as described in papers like arXiv:2006.11439.

High-level overview: https://towardsdatascience.com/a-tutorial-on-fairness-in-machine-learning-3ff8ba1040cb

IBM’s AIF360 resources: https://aif360.mybluemix.net/

AIF360 Inprocessing algorithms: Available here.

Mentors: Jiahao Chen, Moritz Schauer, Zenna Tavares, and Sebastian Vollmer

Fairness.jl is a package to audit and mitigate bias, using the MLJ machine learning framework and other tools. This project is to implement algorithms for counterfactual ("what if") reasoning and causal analysis to Fairness.jl and MLJ.jl, integrating and extending Julia packages for causal analysis.

*Difficulty* Hard.

Essential: working knowledge of the Julia language

Strongly preferred: git workflow familiarity

Desirable: Experience in causal inference

Desirable: Experience with graphical models

Machine learning models are developed to support and make high-impact decisions like who to hire or who to give a loan to. However, available training data can exhibit bias against race, age, gender, or other prohibited bases, reflecting a complex social and economic history of systemic injustice. For example, women in the United Kingdom, United States and other countries were only allowed to have their own bank accounts and lines of credit in the 1970s! That means that training a credit decisioning model on historical data would encode implicit biases, that women are less credit-worthy because few of them had lines of credit in the past. Surely we would want to be fair and not hinder an applicant's ability to get a loan on the basis of their race, gender and age?

So how can we fix unfairness in models? Arguably, we should first identify the underlying *causes* of bias, and only then can we actually remediate bias successfully. However, one major challenge is that a proper evaluation often requires data that we don't have. For this reason, we also need counterfactual analysis, to identify actions we can take that can mitigate unfairness not just in our training data, but also in situations we haven't seen yet but could encounter in the future. Ideas for identifying and mitigating bias using such causal interventions have been proposed in papers such as Equality of Opportunity in Classification: A Causal Approach and the references below.

This project is to implement algorithms for counterfactual ("what if") reasoning and causal analysis to Fairness.jl and MLJ.jl, integrating and extending Julia packages for causal analysis. Some specific stages are:

Implement interfaces in MLJ.jl for Julia packages for causal inference and probabilistic programming such as Omega.jl and [CausalInference.jl](https://github.com/mschauer/CausalInference.jl)

Implement and benchmark causal and counterfactual definitions for measuring unfairness

Implement and benchmark causal and counterfactual approaches to mitigate bias

Time series are ubiquitous: stocks, sensor readings, vital signs. This project aims to add time series forecasting to MLJ and to perform benchmark comparisons against sktime, tslearn, and tsml.

**Difficulty.** Easy - moderate.

Julia language fluency essential.

Git-workflow essential

Some prior contact with time series forecasting

HPC experience in Julia is desirable

MLJ is so far focused on tabular data and time series classification. This project is to add support for time series data in a modular, composable way.
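One possible way to make forecasting composable with existing tabular learners (an assumption for illustration, not necessarily the design this project would adopt) is to reduce forecasting to supervised regression using sliding windows. The sketch below is in Python for brevity; `sliding_windows` is a hypothetical helper name.

```python
def sliding_windows(series, window):
    """Turn a series into (X, y): each row of X holds `window` past values,
    and the matching entry of y is the value that immediately follows them."""
    if window < 1 or window >= len(series):
        raise ValueError("window must satisfy 1 <= window < len(series)")
    n_rows = len(series) - window
    X = [list(series[i:i + window]) for i in range(n_rows)]
    y = [series[i + window] for i in range(n_rows)]
    return X, y
```

For the series `[1, 2, 3, 4, 5]` and `window=2`, this gives `X = [[1, 2], [2, 3], [3, 4]]` and `y = [3, 4, 5]`, after which any tabular regressor can be trained on `(X, y)` to produce one-step-ahead forecasts.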

Time series are everywhere in real-world applications and there has been an increase in interest in time series frameworks recently (see e.g. sktime, tslearn, tsml).

But there are still very few principled time-series libraries out there, so you would be working on something that could be very useful for a large number of people. To find out more, check out this paper on sktime.

**Mentors**: Sebastian Vollmer, Markus Löning (sktime developer).

Interpreting and explaining black box models is crucial to establish trust and improve performance.

**Difficulty.** Easy - moderate.

It is important to have mechanisms in place to interpret the results of machine learning models, in particular to identify the factors that are relevant to a model's decision or score.

This project will implement methods for model and feature interpretability.

**Mentors.** Diego Arenas, Sebastian Vollmer.

Julia language fluency essential.

Git-workflow familiarity strongly preferred.

Some prior contact with explainable AI/ML methods is desirable.

A passing familiarity with machine learning goals and workflow preferred

The aim of this project is to implement multiple interpretability algorithms and supporting tools, such as:

Implement methods to show feature importance

Partial dependence plots

Tree surrogate

LocalModel: Local Interpretable Model-agnostic Explanations

Add Dataset loaders for standard interpretability datasets.

Add performance metrics for interpretability

Add interpretability algorithms

Glue code to SHAP package
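As an example of the first item, feature importance can be estimated in a model-agnostic way by permuting one feature at a time and measuring how much the loss increases. The sketch below is in Python for brevity and does not use MLJ; the names `permutation_importance` and `mean_squared_error`, and the representation of rows as lists, are assumptions for illustration.

```python
import random


def mean_squared_error(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, loss, n_repeats=10, seed=0):
    """Average increase in loss when each feature column is shuffled.

    `predict` maps one row (a list of feature values) to a prediction.
    """
    rng = random.Random(seed)
    baseline = loss(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only feature j permuted.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = loss(y, [predict(row) for row in X_perm])
            increases.append(permuted - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances
```

A feature the model ignores gets an importance of exactly zero, because shuffling it leaves every prediction unchanged.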

Specifically you will

Familiarize yourself with MLJ

Survey some of the literature and existing implementations in Julia and other languages, and prepare a short summary

Implement visualisations of explanations

Implement use cases

You will learn about the benefits and shortcomings of model interpretation methods and how to use them.

Tutorials

Design and implement a data visualization module for MLJ.

**Difficulty**. Easy.

Design and implement a data visualization module for MLJ to visualize numeric and categorical features (histograms, boxplots, correlations, frequencies), intermediate results, and metrics generated by MLJ machines.

The module should build on a suitable Julia package for data visualization.

The idea is to implement a similar resource to what mlr3viz does for mlr3.

Julia language fluency essential.

Git-workflow essential.

Some prior work on data visualization is desirable

So far, visualizing data or features in MLJ is an ad hoc task, defined by the user case by case. You will be implementing a standard way to visualize model performance, residuals, benchmarks and predictions for MLJ users.

The structures and metrics will be given from the results of models or data sets used; your task will be to implement the right visualizations depending on the data type of the features.

A relevant part of this project is to visualize the target variable against the rest of the features.

You will enhance your visualisation skills as well as your ability to "debug" and understand models and their predictions visually.

**Mentors**: Sebastian Vollmer, Diego Arenas.

Bayesian methods and probabilistic supervised learning provide uncertainty quantification. This project aims to increase integration in order to combine Bayesian and non-Bayesian methods using Turing.

As an initial step, reproduce SOSSMLJ in Turing. The bulk of the project is to implement methods that combine multiple predictive distributions.

Interface between Turing and MLJ

Comparisons of ensembling and stacking of predictive distributions

Reproducible benchmarks across various settings

**Mentors**: Hong Ge, Sebastian Vollmer

Integrate MLJ with MLflow.

**Difficulty.** Easy.

MLflow is a flexible model management tool. The project consists of writing the necessary functions to integrate MLJ with the MLflow REST API, so that models built using MLJ can keep track of their runs, evaluation metrics, and parameters, and can be registered and monitored using MLflow.

Julia language fluency essential.

Git-workflow familiarity strongly preferred.

Provide to MLJ users a way to keep track of their machine learning models using MLflow, as a local or remote server.

Implement a reproducible way to store and load machine learning models.

Implement functions wrapping the REST API calls that make it possible to use MLflow.
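As a rough sketch of what such wrappers involve, the code below builds (but does not send) HTTP requests for two MLflow tracking endpoints, `runs/create` and `runs/log-metric`. It is written in Python for brevity, while the project itself would implement the equivalent in Julia. The helper names are hypothetical, and the server address used in the note below is an assumption; the payload fields should be checked against the MLflow REST API documentation for the server version in use.

```python
import json
import time
import urllib.request


def mlflow_request(base_url, endpoint, payload):
    """Build a POST request for an MLflow REST endpoint (not sent here)."""
    url = f"{base_url.rstrip('/')}/api/2.0/mlflow/{endpoint}"
    data = json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


def create_run_request(base_url, experiment_id):
    """Request that starts a new run inside an existing experiment."""
    payload = {"experiment_id": experiment_id,
               "start_time": int(time.time() * 1000)}  # milliseconds since epoch
    return mlflow_request(base_url, "runs/create", payload)


def log_metric_request(base_url, run_id, key, value, step=0):
    """Request that records one metric value for a run."""
    payload = {"run_id": run_id, "key": key, "value": value,
               "timestamp": int(time.time() * 1000), "step": step}
    return mlflow_request(base_url, "runs/log-metric", payload)
```

Passing one of these request objects to `urllib.request.urlopen` would send it to a running tracking server, for example one assumed to be listening at `http://localhost:5000`. Keeping request construction separate from sending makes the wrappers easy to test without a server.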

MLflow website.

Diagnose and exploit opportunities for speeding up common MLJ workflows.

**Difficulty.** Moderate.

In addition to investigating a number of known performance bottlenecks, you will have some free rein in this to identify opportunities to speed up common MLJ workflows, as well as making better use of memory resources.

Julia language fluency essential.

Experience with multi-threading and multi-processor computing essential, preferably in Julia.

Git-workflow familiarity strongly preferred.

Familiarity with machine learning goals and workflow preferred

In this project you will:

familiarize yourself with the training, evaluation and tuning of machine learning models in MLJ

work towards addressing a number of known performance issues, including:

limitations of the generic Tables.jl interface for interacting with tabular data which, in common cases (DataFrames), has extra functionality that can be exploited

rolling out new data front-end for models to avoid unnecessary copying of data

in conjunction with your mentor, identify best design for introducing better sparse data support to MLJ models (e.g., naive Bayes)

add multi-threading and/or multi-processor parallelism to the current learning networks scheduler

benchmark and profile common workflows to identify opportunities for further code optimizations

implement some of these optimizations

MLJ Roadmap. See, in particular, the "Scalability" section.

Data front end for MLJ models.

**Mentors.** Anthony Blaom

Bayesian optimization is a global optimization strategy for (potentially noisy) functions with unknown derivatives. With well-chosen priors, it can find optima with fewer function evaluations than alternatives, making it well suited for the optimization of costly objective functions. Well known examples include hyper-parameter tuning of machine learning models (see e.g. Taking the Human Out of the Loop: A Review of Bayesian Optimization). The Julia package BayesianOptimization.jl currently supports only basic Bayesian optimization methods. There are multiple directions to improve the package, including (but not limited to)

**Hybrid Bayesian Optimization (duration: 175h, expected difficulty: medium)** with discrete and continuous variables. Implement e.g. HyBO; see also here.

**Scalable Bayesian Optimization (duration: 175h, expected difficulty: medium)**: implement e.g. TuRBO or SCBO.

**Better Defaults (duration: 175h, expected difficulty: easy)**: write an extensive test suite and implement better defaults; draw inspiration from e.g. dragonfly.

**Recommended Skills:** Familiarity with Bayesian inference, non-linear optimization, writing Julia code and reading Python code.

**Expected Outcome:** Well-tested and well-documented new features.

**Mentor:** Johanni Brea
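For readers new to the area: Bayesian optimization typically chooses the next point to evaluate by maximizing an acquisition function computed from the surrogate model's predictive distribution. As a minimal sketch, the code below evaluates the closed-form expected improvement for a minimization problem, assuming a Gaussian predictive distribution with the given mean and standard deviation. It is written in Python for brevity and is not the BayesianOptimization.jl API; all names are illustrative.

```python
import math


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mean, std, best_observed):
    """Expected improvement over `best_observed` when minimizing.

    `mean` and `std` describe the surrogate's Gaussian prediction at a
    candidate point; larger values mark more promising candidates.
    """
    if std <= 0.0:
        # No predictive uncertainty: the improvement is deterministic.
        return max(best_observed - mean, 0.0)
    z = (best_observed - mean) / std
    return (best_observed - mean) * normal_cdf(z) + std * normal_pdf(z)
```

The two terms reflect the usual trade-off: the first rewards candidates whose predicted mean is already below the best observation, and the second rewards candidates where the surrogate is uncertain.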

I have a number of other compiler projects I'm currently working on. Please contact me for additional details and let me know what specifically interests you about this area of contribution and we can tailor your project to suit you together.

**Escape analysis:** A classic problem in compiler analysis! We have an existing AbstractInterpreter framework for managing inter-procedural analysis of types through data-flow analysis. However, for escape information, currently we only do very limited, local inference, which greatly limits optimization potential to places with inlining. The schedule for the project would be to start by writing some example programs that would most benefit from this. Next, you would identify what information is required to optimize those, and together we'll design a framework to compute that information. Finally, you'll get to the easy part: actually coding and putting those plans into practice. Along the way, you'll be mentored in submitting many smaller PRs to fix any issues you notice along the journey.

**Optimization passes:** Another classic compiler challenge! We have some basic optimization passes (inlining, basic DCE, SROA), but currently many other interesting passes simply don't yet exist, or have a partial PR, but need significant effort to finish. For this proposal, we can work together to define which optimizations we could tackle next.

**Investigating OrcJIT v2 improvements:** The LLVM JIT has gained many new features. This project would involve finding out what they are and making use of them. Some examples include better resource tracking, parallel compilation, a new linker (which may need upstream work too), and fine-grained tracking of relocations.

**Parser error messages (and other parts):** Error messages and infrastructure could use some work to track source locations more precisely. This may be a large project. Contact me and @c42f for more details if this interests you.

**Macro hygiene re-implementation, to eliminate incorrect predictions inherent in current approach:** This may be a good project for someone that wants to learn lisp/scheme! Our current algorithm runs in multiple passes, which means that the scope we compute for a variable in an earlier pass sometimes differs from the scope we actually assign to each value later. See https://github.com/JuliaLang/julia/labels/macros, and particularly issues such as https://github.com/JuliaLang/julia/issues/20241 and https://github.com/JuliaLang/julia/issues/34164.

**Better debug information output for variables:** We have part of the infrastructure in place for representing DWARF information for our variables, but only from limited places. We could do much better since there are numerous opportunities for improvement!

**Recommended Skills**: Most of these projects involve algorithms work, requiring a willingness and interest in seeing how to integrate with a large system.

**Mentors**: Jameson Nash

Code coverage reports very good coverage of all of the Julia Stdlib packages, but it's not complete. Additionally, the coverage tools themselves (`--track-coverage` and https://github.com/JuliaCI/Coverage.jl) could be further enhanced, such as to give better accuracy of statement coverage, or more precision. A successful project may combine a bit of both building code and finding faults in others' code.

Another related side-project might be to explore adding type information to the coverage reports.

**Recommended Skills**: An eye for detail, a thrill for filing code issues, and the skill of breaking things.

**Contact:** Jameson Nash

A few ideas to get you started, in brief:

Make better use of threads for GC (and particularly, make the page-allocator multi-threaded)

Improve granularity of codegen JIT for multi-threading

Improve granularity of IO operations for multi-threading (and set up a worker thread for running the main libuv event loop)

Measure and optimize the performance of the `partr` algorithm, and add the ability to dynamically scale it by workload size

Automatic insertion of GC safe-points/regions, particularly around loops

Work towards supporting a dynamic number of threads

Join the regularly scheduled multithreading call for discussion of any of these at #multithreading BoF calendar invite on the Julia Language Public Events calendar.

**Recommended Skills**: Varies by project

**Contact:** Jameson Nash

The Nanosoldier.jl project (and related https://github.com/JuliaCI/BaseBenchmarks.jl) tests for performance impacts of some changes. However, there remain many areas that are not covered (such as compile time) while other areas are over-covered (greatly increasing the duration of the test for no benefit) and some tests may not be configured appropriately for statistical power. Furthermore, the current reports are very primitive and can only do a basic pair-wise comparison, while graphs and other interactive tooling would be more valuable. Thus, there would be many great projects for a summer student to tackle here!

**Contact:** Jameson Nash, Tim Besard

We have been developing the AtomicGraphNets.jl package, which began modestly as a Julia port of CGCNN, but now has plans to expand to a variety of more advanced graph-based methods for state-of-the-art ML performance when making predictions on atomic systems. In support of this package, we are also developing ChemistryFeaturization.jl, which contains functions for building and featurizing atomic graphs from a variety of standard input files. ChemistryFeaturization will eventually form the bedrock of a DeepChem.jl umbrella organization to host a Julia-based port of the popular DeepChem Python package.

Some of the features we're excited about working on include:

smarter hyperparameter optimization for built-in model types, potentially making use of Hyperopt.jl or other existing optimization packages

building tools to enable sensitivity analysis along values of various input features as well as testing the importance of including those features at all

implementing Path-Augmented Graph Transformer layers

allowing new types of graph features (e.g. edge features, user-defined features rather than only pulling from databases, etc.) and building network layers that can make use of these features

building more physically-informed pooling operations

Improving documentation, example sets, and building tutorials for both of these packages (see cross-posting at Julia GSoD site)

**Recommended Skills**: Basic graph theory and linear algebra, some knowledge of chemistry

**Expected Results**: Contributions of new features in the eventual DeepChem.jl ecosystem

**Mentors**: Rachel Kurchin

Density-functional theory (DFT) is probably the most widespread method for simulating the quantum-chemical behaviour of electrons in matter and applications cover a wide range of fields such as materials research, chemistry or pharmacy. For aspects like designing the batteries, catalysts or drugs of tomorrow DFT is nowadays a key building block of the ongoing research. The aim to tackle even larger and more involved systems with DFT, however, keeps posing novel challenges with respect to physical models, reliability and performance. For tackling these aspects in the multidisciplinary context of DFT we recently started the density functional toolkit (DFTK), a DFT package written in pure Julia.

Employing GPUs to bring speed improvements to DFT simulations is an established idea. However, in state-of-the-art DFT simulation packages the GPU version of the solution algorithm is usually implemented in a separate code base. In other words the CPU and the GPU version co-exist, which has the drawback of the duplicated effort to fix bugs or for keeping both code bases in sync whenever a novel method or algorithm becomes available. Since conventional GPU programming frameworks feature a steep learning curve for newcomers, oftentimes the GPU version is lagging behind and features an increased code complexity making the investigation of novel GPU algorithms challenging.

In this project we want to build on the extensive GPU programming capabilities of the Julia ecosystem to enable DFTK to offload computations to a local GPU. Key aim will be to minimise the code which needs to be adapted from the present CPU code base in DFTK to achieve this. Since GPU counterparts already exist for most computational bottlenecks of a DFT computation, the key challenge of this project will be to handle the overall orchestration of the computational workflow as well as the data transfer between the CPU and the GPU. To keep the task manageable we will not directly tackle the full DFT problem (a non-linear eigenvalue problem), but restrict ourselves to the reduced setting of linear eigenvalue problems. Expanding from there towards the full DFT is an optional stretch goal of the project.

**Level of difficulty:** Medium to difficult

**Project size:** large, i.e. 12 weeks at 30 hours per week

**Recommended skills:** Interest in working on a multidisciplinary project bordering physics, mathematics and computer science, with a good working knowledge of numerical linear algebra and Julia. Detailed knowledge of the physical background (electrostatics, material science) or of GPU programming is not required, but be prepared to take a closer look at these domains during the project.

**Expected results:** Use Julia's GPU programming ecosystem to implement an algorithm for solving the type of eigenvalue problems arising in density-functional theory.

**Mentors:** Valentin Churavy, Michael F. Herbst, Antoine Levitt

**References:** For a nice intro to DFT and DFTK.jl see Michael's talk at JuliaCon 2020 and the literature given in the DFTK documentation. For an introduction to GPU computing in Julia, see the GPU workshop at JuliaCon 2021 by Tim Besard, Julian Samaroo and Valentin.

**Contact:** For any questions, feel free to email @mfherbst, @antoine-levitt or write us on the JuliaMolSim slack.

The DifferentialEquations.jl ecosystem has an extensive set of state-of-the-art methods for solving differential equations hosted by the SciML Scientific Machine Learning Software Organization. By mixing native methods and wrapped methods under the same dispatch system, DifferentialEquations.jl serves as a system both for deploying and for researching the most modern efficient methodologies. While most of the basic methods have been developed and optimized, many newer methods need high performance implementations and real-world tests of their efficiency claims. In this project students will be paired with current researchers in the discipline to get a handle on some of the latest techniques and build efficient implementations into the *DiffEq libraries (OrdinaryDiffEq.jl, StochasticDiffEq.jl, DelayDiffEq.jl). Possible families of methods to implement are:

Global error estimating ODE solvers

Implicit-Explicit (IMEX) Methods

Geometric (exponential) integrators

Low memory Runge-Kutta methods

Multistep methods specialized for second order ODEs (satellite simulation)

Parallel (multithreaded) extrapolation (both explicit and implicit)

Parallel Implicit Integrating Factor Methods (PDEs and SPDEs)

Parallel-in-time ODE Methods

Rosenbrock-W methods

Approximate matrix factorization

Runge-Kutta-Chebyshev Methods (high stability RK methods)

Fully Implicit Runge-Kutta (FIRK) methods

Anderson Acceleration

Boundary value problem (BVP) solvers like MIRK and collocation methods

BDF methods for differential-algebraic equations (DAEs)

Methods for stiff stochastic differential equations

Many of these methods are the basis of high-efficiency partial differential equation (PDE) solvers and are thus important to many communities like computational fluid dynamics, mathematical biology, and quantum mechanics.

This project is good for both software engineers interested in the field of numerical analysis and those students who are interested in pursuing graduate research in the field.

**Recommended Skills**: Background knowledge in numerical analysis, numerical linear algebra, and the ability (or eagerness to learn) to write fast code.

**Expected Results**: Contributions of production-quality solver methods.

**Mentors**: Chris Rackauckas

Neural networks can be used as a method for efficiently solving difficult partial differential equations. Efficient implementations of physics-informed machine learning from recent papers are being explored as part of the NeuralPDE.jl package. The issue tracker contains links to papers which would be interesting new neural network based methods to implement and benchmark against classical techniques.

**Recommended Skills**: Background knowledge in numerical analysis and machine learning.

**Expected Results**: New neural network based solver methods.

**Mentors**: Chris Rackauckas

Wouldn't it be cool to have had a part in the development of widely used efficient differential equation solvers? DifferentialEquations.jl has a wide range of existing methods and an extensive benchmark suite which is used for tuning the methods for performance. Many of its methods are already the fastest in their class, but there is still a lot of performance enhancement work that can be done. In this project you can learn the details about a wide range of methods and dig into the optimization of the algorithm's strategy and the implementation in order to improve benchmarks. Projects that could potentially improve the performance of the full differential equations ecosystem include:

Alternative adaptive stepsize techniques and step optimization

Pointer swapping tricks

Quasi-Newton globalization and optimization

Cache size reductions

Enhanced within-method multithreading, distributed parallelism, and GPU usage

Improved automated method choosing

Adaptive preconditioning on large-scale (PDE) discretizations
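To illustrate the first item in the list, here is a minimal sketch of the classic error-based step size controller that alternative techniques would be compared against. It uses the embedded Euler/Heun pair on a scalar ODE and is written in Python for brevity. It is not the DifferentialEquations.jl implementation, and the safety factor `0.9` and the growth limits `0.2` and `2.0` are conventional but assumed values.

```python
def adaptive_heun(f, t0, y0, t_end, tol=1e-6, h=0.1):
    """Integrate y' = f(t, y) from t0 to t_end with adaptive step size.

    Returns the final value and the number of accepted steps.
    """
    t, y = t0, y0
    accepted = 0
    while t < t_end:
        last = t + h >= t_end
        if last:
            h = t_end - t  # do not step past the end point
        k1 = f(t, y)
        k2 = f(t + h, y + h * k1)
        y_low = y + h * k1                 # Euler, first order
        y_high = y + 0.5 * h * (k1 + k2)   # Heun, second order
        err = abs(y_high - y_low)          # local error estimate
        if err <= tol:
            t = t_end if last else t + h
            y = y_high
            accepted += 1
        # The estimate scales like h^2, so rescale h by (tol / err)^(1/2),
        # damped by a safety factor and clamped to avoid abrupt changes.
        factor = 0.9 * (tol / err) ** 0.5 if err > 0.0 else 2.0
        h *= min(2.0, max(0.2, factor))
    return y, accepted
```

Rejected steps cost two function evaluations and advance nothing, so a controller that predicts the next acceptable step more accurately improves benchmarks directly.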

**Recommended Skills**: Background knowledge in numerical analysis, numerical linear algebra, and the ability (or eagerness to learn) to write fast code.

**Expected Results**: Improved benchmarks to share with the community.

**Mentors**: Chris Rackauckas

There are two ways to approach libraries for partial differential equations (PDEs): one can build "toolkits" which enable users to discretize any PDE but require knowledge of numerical PDE methods, or one can build "full-stop" PDE solvers for specific PDEs. There are many different ways solving PDEs could be approached, and here are some ideas for potential projects:

Automated PDE discretization tooling. We want users to describe a PDE in its mathematical form and automate the rest of the solution process. See this issue for details.

Enhancement of existing tools for discretizing PDEs. The finite differencing (FDM) library DiffEqOperators.jl could be enhanced to allow non-uniform grids or composition of operators. The finite element method (FEM) library FEniCS.jl could wrap more of the FEniCS library.

Full stop solvers of common fluid dynamical equations, such as diffusion-advection-convection equations, or of hyperbolic PDEs such as the Hamilton-Jacobi-Bellman equations would be useful to many users.

Using stochastic differential equation (SDE) solvers to approximate certain PDEs efficiently and with a high degree of parallelism.

Development of ODE solvers for more efficiently solving specific types of PDE discretizations. See the "Native Julia solvers for ordinary differential equations" project.
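As an illustration of what finite differencing on non-uniform grids involves (one of the enhancements mentioned in the list above), the following Python sketch applies the standard three-point second-derivative stencil with unequal spacings. It is not the DiffEqOperators.jl API; the function name is a hypothetical one chosen for this example.

```python
def second_derivative_nonuniform(x, f):
    """Approximate f'' at the interior nodes of a strictly increasing grid x."""
    if len(x) != len(f) or len(x) < 3:
        raise ValueError("need matching x and f with at least three points")
    d2 = []
    for i in range(1, len(x) - 1):
        h1 = x[i] - x[i - 1]      # spacing to the left neighbour
        h2 = x[i + 1] - x[i]      # spacing to the right neighbour
        d2.append(2.0 * (f[i - 1] / (h1 * (h1 + h2))
                         - f[i] / (h1 * h2)
                         + f[i + 1] / (h2 * (h1 + h2))))
    return d2
```

The stencil is exact for quadratics on any grid and reduces to the familiar `(f[i-1] - 2 f[i] + f[i+1]) / h^2` when `h1 == h2 == h`.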

**Recommended Skills**: Background knowledge in numerical methods for solving differential equations. Some basic knowledge of PDEs, but mostly a willingness to learn and a strong understanding of calculus and linear algebra.

**Expected Results**: A production-quality PDE solver package for some common PDEs.

**Mentors**: Chris Rackauckas

Global Sensitivity Analysis is a popular tool to assess the effect that parameters have on a differential equation model. A good introduction can be found in this thesis. Global Sensitivity Analysis tools can be much more efficient than Local Sensitivity Analysis tools, and give a better view of how parameters affect the model in a more general sense. The goal of this project would be to implement more global sensitivity analysis methods like the eFAST method into DiffEqSensitivity.jl which can be used with any differential equation solver on the common interface.
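For readers new to the idea, here is a small Python sketch of one variance-based global sensitivity measure, the first-order Sobol index, estimated by Monte Carlo sampling. It is not the eFAST method and not the DiffEqSensitivity.jl interface; the function name, the sample size and the assumption that all inputs are independent and uniform on [0, 1] are choices made only for this example.

```python
import random

def first_order_sobol(model, dim, n=100_000, seed=0):
    """Estimate first-order Sobol indices for independent U(0, 1) inputs."""
    rng = random.Random(seed)
    a = [[rng.random() for _ in range(dim)] for _ in range(n)]
    b = [[rng.random() for _ in range(dim)] for _ in range(n)]
    f_a = [model(row) for row in a]
    f_b = [model(row) for row in b]
    outputs = f_a + f_b
    mean = sum(outputs) / len(outputs)
    total_var = sum((v - mean) ** 2 for v in outputs) / len(outputs)
    indices = []
    for i in range(dim):
        acc = 0.0
        for j in range(n):
            mixed = list(a[j])
            mixed[i] = b[j][i]      # row of A with input i taken from B
            acc += f_b[j] * (model(mixed) - f_a[j])
        indices.append(acc / n / total_var)
    return indices
```

For a differential equation model, `model` would wrap a solver call that maps a parameter vector to a scalar quantity of interest. For the linear test function `4*x[0] + x[1]`, the exact indices are 16/17 and 1/17, which gives a quick sanity check of the estimator.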

**Recommended Skills**: An understanding of how to use DifferentialEquations.jl to solve equations.

**Expected Results**: Efficient functions for performing global sensitivity analysis.

**Mentors**: Chris Rackauckas, Vaibhav Dixit

Parameter identifiability analysis describes whether the parameters of a dynamical system can be identified from data or whether they are redundant. There are two forms of identifiability analysis: structural and practical. Structural identifiability analysis relates changes in the solution of the ODE directly to other parameters, showing, for example, that it is impossible to distinguish between parameter A being higher and parameter B being lower, or vice versa, given only data about the solution, because of how the two interact. This could be done directly on the symbolic form of the equation as part of ModelingToolkit.jl. Meanwhile, practical identifiability analysis looks at whether the parameters are non-identifiable in a practical sense, for example if two parameters are numerically indistinguishable (given possibly noisy data). In this case, numerical techniques being built in DiffEqSensitivity.jl, such as a nonlinear likelihood profiler or an information sensitivity measure, can be used to show whether a parameter has a unique enough effect to be determined.
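A toy example may help show what "parameter A higher versus parameter B lower" looks like numerically. The Python sketch below is a local, sensitivity-based check, not the symbolic analysis or the likelihood profiler described above: it builds the matrix J^T J from output sensitivities and inspects its determinant. The two models, the sample times and the parameter values are assumptions made for illustration.

```python
def fisher_information(sensitivities):
    """Return the 2x2 matrix J^T J for rows of (dy/dp1, dy/dp2)."""
    s11 = sum(r[0] * r[0] for r in sensitivities)
    s12 = sum(r[0] * r[1] for r in sensitivities)
    s22 = sum(r[1] * r[1] for r in sensitivities)
    return [[s11, s12], [s12, s22]]

def determinant(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

times = [0.5, 1.0, 1.5, 2.0]
a, b = 2.0, 3.0

# Model 1: y = a * b * t. Only the product a*b affects the output,
# so the two sensitivity columns are proportional.
product_model = [(b * t, a * t) for t in times]

# Model 2: y = a * t + b. Slope and intercept act differently on the output.
affine_model = [(t, 1.0) for t in times]

det_product = determinant(fisher_information(product_model))
det_affine = determinant(fisher_information(affine_model))
```

A determinant of zero for the product model signals that some direction in parameter space leaves the output unchanged, so `a` and `b` cannot be estimated separately, while the clearly nonzero determinant for the affine model indicates that both parameters are locally identifiable.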

**Recommended Skills**: A basic background in differential equations and the ability to use numerical ODE solver libraries. Background in the numerical analysis of differential equation solvers is not required.

**Expected Results**: Efficient and high-quality implementations of parameter identifiability methods.

**Mentors**: Chris Rackauckas

Model order reduction is a technique for automatically finding a small model which approximates the large model but is computationally much cheaper. We plan to use the infrastructure built by ModelingToolkit.jl to implement a litany of methods and find out the best way to accelerate differential equation solves.

**Recommended Skills**: A basic background in differential equations and the ability to use numerical ODE solver libraries. Background in the numerical analysis of differential equation solvers is not required.

**Expected Results**: Efficient and high-quality implementations of model order reduction methods.

**Mentors**: Chris Rackauckas

Numerically solving a differential equation can be difficult, and thus it can be helpful for users to simplify their model before handing it to the solver. Alas this takes time... so let's automate it! ModelingToolkit.jl is a project for automating the model transformation process. Various parts of the library are still open, such as:

Support for DAEs, DDEs, and SDEs

Pantelides algorithm for DAE index reduction

Lamperti transforms

Automatic construction of adjoint solutions

Tearing in nonlinear solvers

Solving distributed delay equations

**Recommended Skills**: A basic background in differential equations and the ability to use numerical ODE solver libraries. Background in the numerical analysis of differential equation solvers is not required.

**Expected Results**: Efficient and high-quality implementations of model transformation methods.

**Mentors**: Chris Rackauckas

The Julia manual and the documentation for a large chunk of the ecosystem are generated using Documenter.jl – essentially a static site generator that integrates with Julia and its docsystem. There are tons of opportunities for improvement for anyone interested in working on the interface of Julia, documentation and various front-end technologies (web, LaTeX).

**ElasticSearch-based search backend for Documenter.** Loading the search page of the Julia manual is slow because the index is huge and needs to be downloaded and constructed on the client side on every page load. Instead, we should look at hosting the search server-side. The goal is to continue the work done during an MLH fellowship on implementing an ElasticSearch-based search backend.

**Improve the generated PDF in the PDF/LaTeX backend.** The goal is to improve the look of the generated PDF and make sure the backend works reliably (improved testing). See #949, #1342 and other related issues.

**Recommended skills:** Basic knowledge of web-development (JS, CSS, HTML) or LaTeX, depending on the project.

**Mentors:** Morten Piibeleht

Julia supports docstrings – inline documentation which gets parsed together with the code and can be accessed dynamically in a Julia session (e.g. via the REPL `?>` help mode; implemented mostly in the Docs module).

Not all docstrings are created equal however. There are bugs in Julia's docsystem code, which means that some docstrings do not get stored or are stored with the wrong key (parametric methods). In addition, the API to fetch and work with docstrings programmatically is not documented, not considered public and could use some polishing.

Create a package which would provide a clean API for working with docstrings and abstract away the implementation details (and potential differences between Julia versions) of the docsystem in Base.

Fix as many docsystem-related bugs in the Julia core as possible [further reading, #16730, #29437, JuliaDocs/Documenter.jl#558]

**Recommended skills:** Basic familiarity with Julia is sufficient.

**Mentors:** Morten Piibeleht

Are you a performance nut? Help us implement cutting-edge CUDA kernels in Julia for operations important across deep learning, scientific computing and more. We also need help developing our wrappers for machine learning, sparse matrices and more, as well as CI and infrastructure. Contact us to develop a project plan.

Mentors: Tim Besard, Dhairya Gandhi.

Develop a series of reinforcement learning environments, in the spirit of the OpenAI Gym. Although we have wrappers for the gym available, it is hard to install (due to the Python dependency) and, since it's written in Python and C code, we can't do more interesting things with it (such as differentiate through the environments). A pure-Julia version that supports a similar API and visualisation options would be valuable to anyone doing RL with Flux.
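For reference, the kind of interface meant by "a similar API" can be sketched in a few lines. The Python example below follows the classic Gym convention of `reset()` returning an observation and `step(action)` returning an `(observation, reward, done, info)` tuple. The environment itself (`CoinGuessEnv`) is a made-up toy, and a Julia version would not necessarily use these names or this exact signature.

```python
import random

class CoinGuessEnv:
    """Toy episodic environment with a Gym-style reset/step interface."""

    def __init__(self, episode_length=10, seed=0):
        self.episode_length = episode_length
        self.rng = random.Random(seed)
        self.steps_taken = 0
        self.coin = 0

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.steps_taken = 0
        self.coin = self.rng.randint(0, 1)
        return self.coin

    def step(self, action):
        """Apply an action (0 or 1); return (observation, reward, done, info)."""
        reward = 1.0 if action == self.coin else 0.0
        self.steps_taken += 1
        self.coin = self.rng.randint(0, 1)   # next observation
        done = self.steps_taken >= self.episode_length
        return self.coin, reward, done, {}

def run_episode(env, policy):
    """Roll out one episode and return the total reward."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
    return total
```

Because the whole environment is ordinary code in the host language, nothing stops a pure-Julia equivalent from being traced by an automatic differentiation tool, which is the advantage the paragraph above points to.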

Mentors: Dhairya Gandhi.

Recent advances in reinforcement learning led to many breakthroughs in artificial intelligence. Some of the latest deep reinforcement learning algorithms have been implemented in ReinforcementLearning.jl with Flux. We'd like to have more interesting and practical algorithms added to enrich the whole community, including but not limited to the following directions:

**[Easy(175h)] Recurrent version of existing algorithms**. Students with a basic understanding of Q-learning and recurrent neural networks are preferred. We'd like to have a general implementation to easily extend existing algorithms to the sequential version.

**[Medium(175h)] Multi-agent reinforcement learning algorithms**. Currently, we only have some CFR, MADDPG and NFSP related algorithms implemented. We'd like to see more implemented, including COMA and its variants, and PSRO.

**[Medium(350h)] Model-based reinforcement learning algorithms**. Students interested in this topic may refer to Model-based Reinforcement Learning: A Survey and design some general interfaces to implement typical model-based algorithms.

**[Hard(350h)] Distributed reinforcement learning framework**. Inspired by Acme, a similar design is proposed in DistributedReinforcementLearning.jl. However, it is still in a very early stage. Students interested in this direction are required to have a basic understanding of distributed computing in Julia. Ideally we'd like to see some distributed reinforcement learning algorithms implemented under this framework, like R2D2 and D4PG.

For each new algorithm, at least two experiments are expected to be added to ReinforcementLearningZoo.jl: a simple one to make sure it works on some toy games with CPU only, and a more practical one that produces results comparable to the original paper with GPU enabled. In addition, a technical report on the implementation details and a speed/performance comparison with other baselines is preferred.

Mentors: Jun Tian

The philosophy of the AlphaZero.jl project is to provide an implementation of AlphaZero that is simple enough to be widely accessible for students and researchers, while also being sufficiently powerful and fast to enable meaningful experiments on limited computing resources (our latest release is consistently between one and two orders of magnitude faster than competing Python implementations).

Here are a few project ideas that build on AlphaZero.jl. Please contact us for additional details and let us know about your experience and interests so that we can build a project that best suits your profile.

[Easy] Integrate AlphaZero.jl with the OpenSpiel game library and benchmark it on a series of simple board games.

[Medium] Use AlphaZero.jl to train a chess agent. In order to save computing resources and allow faster bootstrapping, you may train an initial policy using supervised learning.

[Hard] Build on AlphaZero.jl to implement the MuZero algorithm.

[Hard] Explore applications of AlphaZero beyond board games (e.g. theorem proving, chip design, chemical synthesis...).

In all these projects, the goal is not only to showcase the current Julia ecosystem and test its limits, but also to push it forward through concrete contributions that other people can build on. Such contributions include:

Improvements to existing Julia packages (e.g. AlphaZero, ReinforcementLearning, CommonRLInterface, Dagger, Distributed, CUDA...) through code, documentation or benchmarks.

A well-documented and replicable artifact to be added to AlphaZero.Examples, ReinforcementLearningZoo or released in its own package.

A blog post that details your experience, discusses the challenges you went through and identifies promising areas for future work.

**Mentors**: Jonathan Laurent

**Difficulty**: Medium to Hard

Build deep learning models for Natural Language Processing in Julia. TextAnalysis and WordTokenizers contain the basic algorithms and data structures to work with textual data in Julia. On top of that base, we want to build modern deep learning models based on recent research. The following tasks can span multiple students and projects.

It is important to note that we want practical, usable solutions to be created, not just research models. This implies that a large part of the effort will need to be in finding and using training data, and testing the models over a wide variety of domains. Pre-trained models must be available to users, who should be able to start using these without supplying their own training data.

Implement GPT/GPT-2 in Julia

Implement practical models for

Dependency Tree Parsing

Morphological extractions

Translations (using Transformers)

Indic language support – validate and test all models for Indic languages

ULMFiT models for Indic languages

Chinese tokenisation and parsing

**Mentors**: Avik Sengupta

**Difficulty**: Hard

Neural network based models can be used for music analysis and music generation (composition). A suite of tools in Julia to enable research in this area would be useful. This is a large, complex project that is suited for someone with an interest in music and machine learning. This project will need a mechanism to read music files (primarily MIDI), a way to synthesise sounds, and finally a model to learn composition. All of this is admittedly a lot of work, so the exact boundaries of the project can be flexible, but this can be an exciting project if you are interested in both music and machine learning.

**Recommended Skills**: Music notation, some basic music theory, MIDI format, Transformer and LSTM architectures

**Resources**: Music Transformer, Wave2MIDI2Wave, MIDI.jl, Mplay.jl

**Mentors**: Avik Sengupta

Flux usually takes part in Google Summer of Code, as part of the wider Julia organisation. We follow the same rules and application guidelines as Julia, so please check there for more information on applying. Below are a set of ideas for potential projects (though you are welcome to explore anything you are interested in).

Flux projects are typically very competitive; we encourage you to get started early, as successful students typically have early PRs or working prototypes as part of the application. It is a good idea to simply start contributing via issue discussion and PRs and let a project grow from there; you can take a look at this list of issues for some starter contributions.

There are many high-quality open-source tutorials and learning materials available, for example from PyTorch and fast.ai. We'd like to have Flux ports of these that we can add to the model zoo, and eventually publish to the Flux website.

Mentors: Dhairya Gandhi.

The application of machine learning requires a practitioner who understands how to optimize a neural architecture for a given problem, or does it? Recently, techniques in automated machine learning, also known as AutoML, have dropped this requirement by allowing good architectures to be found automatically. One such method is the FermiNet, which employs generative synthesis to give a neural architecture which respects certain operational requirements. The goal of this project is to implement the FermiNet in Flux to allow for automated synthesis of neural networks.

Mentors: Chris Rackauckas and Dhairya Gandhi.

Expected Outcome: This project is motivated by the goal of creating SoftRasterizer/DiB-R based projects. We already have RayTracer.jl, which is motivated by OpenDR. (Of course, if someone wants to implement NeRF-like models, they are most welcome to submit a proposal.) We would ideally target at least 2 of these models.

Skills: GPU Programming, Deep Learning, (deep) familiarity with the literature, familiarity with defining (a lot of) Custom Adjoints

Mentors: Dhairya Gandhi, Julian Samaroo, Avik Pal

Expected Outcomes:

Some of the functions require custom adjoints for speedup

Functions require GPU kernels. Some of these are of common interest to the community like – knn, etc.

Benchmarking against TensorFlow Graphics and PyTorch3D. We already have the scripts for Kaolin and need to extend them.

Most of these problems are listed as issues in the main repo.

Skills: GPU Programming, Deep Learning, familiarity with defining (a lot of) Custom Adjoints

Mentors: Dhairya Gandhi

**Difficulty:** Medium

In this project, you will assist the ML community team with building FastAI.jl on top of the existing JuliaML + FluxML ecosystem packages. The primary goal is to create an equivalent to docs.fast.ai. This will require building the APIs, documenting them, and creating the appropriate tutorials. Some familiarity with the following Julia packages is preferred, but it is not required:

A stretch goal can include extending FastAI.jl beyond its Python-equivalent by leveraging the flexibility in the underlying Julia packages. For example, creating and designing abstractions for distributed data parallel training.

**Skills:** Familiarity with deep learning pipelines, common practices, Flux.jl, and MLDataPattern.jl

**Mentors:** Kyle Daruwalla

Expected Outcome:

Create a library of utility functions that can consume Julia's Imaging libraries to make them differentiable. With Zygote.jl, we have the platform to take a general purpose package and apply automatic differentiation to it. This project aims to use existing libraries that perform computer vision tasks, and augment them with AD to perform tasks such as homography regression.

Skills: Familiarity with automatic differentiation, deep learning, and defining (a lot of) Custom Adjoints

Mentors: Dhairya Gandhi

**Difficulty**: Easy to Medium

The application of deep learning tools to source code is an active area of research. With the runtime being able to easily introspect into Julia code (for example, with a clean, accessible AST format), using these techniques on Julia code would be a fruitful exercise.

Use of RNNs for syntax error correction: https://arxiv.org/abs/1603.06129

Implement Code2Vec for Julia: https://arxiv.org/abs/1803.09473

**Recommended Skills:** Familiarity with compiler techniques as well as deep learning tools will be required. The "domain expertise" in this task is Julia programming, so it will need someone who has a reasonable experience of the Julia programming language.

**Expected Outcome:** Packages for each technique that are usable by general programmers.

**Mentors**: Avik Sengupta

Julia is emerging as a serious tool for technical computing and is ideally suited for the ever-growing needs of big data analytics. This set of proposed projects addresses specific areas for improvement in analytics algorithms and distributed data management.

**Difficulty:** Medium

Dagger.jl is a native Julia framework and scheduler for distributed execution of Julia code and general purpose data parallelism, using dynamic, runtime-generated task graphs which are flexible enough to describe multiple classes of parallel algorithms. This project proposes to implement different scheduling algorithms for Dagger to optimize scheduling of certain classes of distributed algorithms, such as MapReduce and MergeSort, and properly utilizing heterogeneous compute resources. Students will be expected to find published distributed scheduling algorithms and implement them on top of the Dagger framework, benchmarking scheduling performance on a variety of micro-benchmarks and real problems.
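To give a flavour of what a scheduling algorithm over a task graph does, here is a small Python simulation of greedy list scheduling with a longest-task-first priority. It is only a sketch of one textbook heuristic, not Dagger's scheduler or its API; the function name, the data layout and the assumption that every worker runs every task at the same speed are illustrative.

```python
import heapq
from graphlib import TopologicalSorter

def list_schedule(costs, deps, n_workers):
    """Greedy list scheduling of a task DAG on identical workers.

    costs: {task: duration}; deps: {task: set of prerequisite tasks}.
    Returns ({task: (worker, start, finish)}, makespan).
    """
    sorter = TopologicalSorter({t: deps.get(t, set()) for t in costs})
    sorter.prepare()
    free_workers = list(range(n_workers))
    running = []                 # min-heap of (finish_time, task, worker)
    ready = []                   # tasks whose prerequisites have finished
    schedule, now = {}, 0.0
    while sorter.is_active():
        ready.extend(sorter.get_ready())
        ready.sort(key=lambda t: -costs[t])   # longest task first
        while ready and free_workers:
            task, worker = ready.pop(0), free_workers.pop(0)
            schedule[task] = (worker, now, now + costs[task])
            heapq.heappush(running, (now + costs[task], task, worker))
        # Advance time to the next completion and release its worker
        now, task, worker = heapq.heappop(running)
        free_workers.append(worker)
        sorter.done(task)
    return schedule, now
```

Swapping the priority rule, or making task durations depend on the worker to model heterogeneous resources, changes the makespan, and measuring such differences on real workloads is the kind of comparison the project calls for.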

Mentors: Julian Samaroo, Valentin Churavy

**Difficulty:** Hard

Add a distributed training API for Flux models built on top of Dagger.jl. More detailed milestones include building Dagger.jl abstractions for UCX.jl, then building tools to map Flux models into data parallel Dagger DAGs. The final result should demonstrate a Flux model training with multiple devices in parallel via the Dagger.jl APIs. A stretch goal will include mapping operations with a model to a DAG to facilitate model parallelism as well.

**Skills:** Familiarity with UCX, representing execution models as DAGs, Flux.jl, and data/model parallelism in machine learning

**Mentors:** Kyle Daruwalla, Julian Samaroo, and Brian Chen

JuliaImages (see the documentation) is a framework in Julia for multidimensional arrays, image processing, and computer vision (CV). It has an active development community and offers many features that unify CV and biomedical 3D/4D image processing, support big data, and promote interactive exploration.

Often the best ideas are the ones that candidate SoC students come up with on their own. We are happy to discuss such ideas and help you refine your proposal. Below are some potential project ideas that might help spur some thoughts. See the bottom of this page for information about mentors.

For new or occasional users, JuliaImages would benefit from a large collection of complete worked examples organized by topic. While the current documentation contains many "mini-demos," they are scattered; an organized page would help users quickly find what they need. We have set up a landing page, but many more demos are needed. Scikit-image is one potential model.

Notes:

This "project" might also be split among multiple students who contribute demos as part of their work in a focused area of JuliaImages.

Each demo is a mini blog that includes the usage, explanations and (optional) best practices. A direct copy from the function references is not allowed.

Content copied or modified from existing open-source projects must meet their license requirements.

The applicant should be familiar with JuliaImages, and should be able to write good technical blogs in English.

A significant expansion of the number of democards with detailed explanations.

(Preferred) adding more missing functionalities to JuliaImages ecosystem.

(Optional) improve DemoCards.jl, which is the build tool for the demos.

Johnny Chen and Tim Holy

JuliaImages provides high-quality implementations of many algorithms; however, as yet there is no set of benchmarks that compare our code against that of other image-processing frameworks. Developing such benchmarks would allow us to advertise our strengths and/or identify opportunities for further improvement. See also the OpenCV project below.

Experience with JuliaImages is required. Some familiarity with other image processing frameworks is preferred.

Benchmarks for several performance-sensitive packages (e.g., ImageFiltering, ImageTransformations, ImageMorphology, ImageContrastAdjustment, ImageEdgeDetection, ImageFeatures, and/or ImageSegmentation) against frameworks like Scikit-image and OpenCV, and optionally others like ITK, ImageMagick, and Matlab/Octave.

This task splits into at least two pieces:

developing frameworks for collecting the data, and

visualizing the results.

One should also be aware of the fact that differences in implementation (which may include differences in quality) may complicate the interpretation of some benchmarks.
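The data-collection half of this task can be pictured with a small timing harness. The Python sketch below is a generic illustration rather than a proposed design: the `benchmark` helper, the number of repeats and the naive box filter used as a stand-in workload are all assumptions.

```python
import statistics
import time

def benchmark(func, *args, repeats=7, warmup=1):
    """Time func(*args) several times; return summary statistics in seconds."""
    for _ in range(warmup):
        func(*args)               # warm-up run, excluded from the timings
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        samples.append(time.perf_counter() - start)
    return {"min": min(samples),
            "median": statistics.median(samples),
            "max": max(samples),
            "repeats": repeats}

def box_blur_1d(signal, radius):
    """Stand-in workload: a naive 1-D box filter with clamped borders."""
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

# One record per problem size, ready to be written out and plotted later
results = {size: benchmark(box_blur_1d, list(range(size)), 3)
           for size in (1_000, 4_000)}
```

The warm-up run matters when comparing against Julia, where the first call includes compilation time. Reporting the minimum and median rather than a single measurement also makes cross-framework comparisons less sensitive to noise.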

Tim Holy and Johnny Chen

JuliaImages supports many common algorithms, but targets only the CPU. With Julia now possessing first-in-class support for GPUs, now is the time to provide GPU implementations of many of the same algorithms.

KernelAbstractions may make it easier to support both CPU and GPU with a common implementation.

Familiarity with CUDA programming in Julia (i.e., CUDA.jl) is required.

Fairly widespread GPU support for a single nontrivial package. ImageFiltering would be a good choice.

Tim Holy and Johnny Chen

ImageMagick is a widely used low-level image I/O and processing library. Its Julia frontend, ImageMagick.jl, is used widely across the Julia ecosystem. However, the ImageMagick.jl project is not under active maintenance; it lacks the necessary documentation and has little test coverage. The applicant will need to revisit and upgrade the ImageMagick.jl codebase to improve the package.

Experience with Linux cross-compiling, C, and Julia is required. Familiarity with the ImageMagick library is preferred.

fix legacy ImageMagick.jl issues

improve the reliability

add a complete reference documentation for ImageMagick.jl

(Optional) port more ImageMagick features to ImageMagick.jl

Tim Holy and Johnny Chen

Besides the gigantic ImageMagick library, Julia also provides a lighter ImageIO package for the PNG, TIFF and Netpbm image formats. However, there are more widely used image formats (e.g., JPEG, GIF) that are not yet supported by ImageIO. The applicant will need to support the IO of a new image format by either 1) wrapping available C libraries via BinaryBuilder, or 2) re-implementing the functionality in pure Julia.

Experience with Julia is required. For library wrapping projects, experience with cross-compiling on Linux is required, and familiarity with the source language (e.g., C) is preferred.

Add support for at least one image format.

Ian Butterworth, Johnny Chen and Tim Holy

Image processing often involves tight interaction between algorithms and visualization. While there are a number of older tools available, leveraging GLVisualize seems to hold the greatest promise. This project might implement a number of interactive tools for region-of-interest selection, annotation, measurement, and modification. Software suites like OpenCV, ImageJ/Fiji, scikit-image, and Matlab might serve as inspiration.

JuliaImages also provides several non-GUI visualization tools, e.g., ImageDraw.jl, ImageInTerminal.jl, ImageShow.jl and MosaicViews.jl. Improving these packages is also a good project idea.

For ImageViews.jl and similar GUI projects, familiarity with GUI programming is required. For non-GUI projects, familiarity with Julia array interfaces is preferred.

Tim Holy. For non-GUI projects, Johnny Chen is also available.

OpenCV is one of the pre-eminent image-processing frameworks. During the summer of 2020, significant progress was made on a Julia wrapper. An important remaining task is to integrate the wrapper with Julia's binary packaging system.

C++ experience is required. Some familiarity with Julia, BinaryBuilder.jl and CxxWrap.jl is preferred.

An OpenCV package that can be installed across all major platforms with `Pkg.add("OpenCV")`.

When two images are taken of a scene with a calibrated stereo rig it is possible to construct a three-dimensional model of the scene provided that one can determine the coordinates of corresponding points in the two images. The task of determining the coordinates of corresponding points is frequently called *stereo matching* or *disparity estimation*. Numerous algorithms for this task have been proposed over the years and new ones continue to be developed.

This project will implement several stereo matching algorithms. Emphasis will be placed on *efficient* implementations which leverage all of Julia's features for writing fast code.
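As a baseline for what such algorithms improve on, here is a minimal Python sketch of window-based block matching along a single rectified scanline, using the sum of absolute differences (SAD) as the matching cost. It is far simpler than the methods listed below; the function names, the window size and the disparity range are assumptions for illustration.

```python
def scanline_disparity(left, right, window=2, max_disp=4):
    """Per-pixel disparity along one rectified scanline via SAD block matching.

    A feature at column x in `left` is searched for at column x - d in `right`.
    Returns a list with None where the window or search range does not fit.
    """
    n = len(left)
    disparities = [None] * n
    for x in range(window + max_disp, n - window):
        best_d, best_cost = 0, float("inf")
        for d in range(max_disp + 1):
            cost = sum(abs(left[x + k] - right[x - d + k])
                       for k in range(-window, window + 1))
            if cost < best_cost:
                best_d, best_cost = d, cost
        disparities[x] = best_d
    return disparities

def depth_from_disparity(disparity, focal_length_px, baseline):
    """Pinhole stereo relation Z = f * B / d for a positive disparity."""
    return focal_length_px * baseline / disparity
```

The triple loop over pixels, disparities and window offsets is exactly the part that efficient implementations restructure, for example by reusing window sums or by aggregating costs along several directions as semi-global matching does.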

Example algorithms:

Bleyer, Michael, Christoph Rhemann, and Carsten Rother. "PatchMatch Stereo-Stereo Matching with Slanted Support Windows." Bmvc. Vol. 11. 2011.

Hirschmuller, Heiko. "Accurate and efficient stereo processing by semi-global matching and mutual information." Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on. Vol. 2. IEEE, 2005.

Gehrig, Stefan K., and Clemens Rabe. "Real-time semi-global matching on the CPU." Computer Vision and Pattern Recognition Workshops (CVPRW), 2010 IEEE Computer Society Conference on. IEEE, 2010.

Experience with JuliaImages is required. Familiarity with the algorithms is preferred.

A library of stereo matching algorithms with usage tutorials and documentation.

Camera calibration involves determining a camera's intrinsic parameters from a series of images of a so-called "calibration target". Knowledge of the intrinsic parameters facilitates three-dimensional reconstruction from images or video. The most frequently used calibration target is a checkerboard pattern. A key step in camera calibration involves automatically detecting the checkerboard and identifying landmarks such as the corners of each checkerboard square.

This project will implement a recent automatic checkerboard detection and feature extraction algorithm.

Example algorithm:

Y. Yan, P. Yang, L. Yan, J. Wan, Y. Sun, and K. Tansey, “Automatic checkerboard detection for camera calibration using self-correlation,” Journal of Electronic Imaging, vol. 27, no. 03, p. 1, May 2018.

Experience with JuliaImages is required. Familiarity with the algorithms is preferred.

A checkerboard detection algorithm which can provide the necessary inputs to a camera calibration routine.

Interested students are encouraged to open a discussion in Images.jl to introduce themselves and discuss the detailed project ideas. To increase the chance of getting useful feedback, please provide detailed plans and ideas (don't just copy the contents here).

Javis: Julia Animations and VISualizations

`Javis.jl` is a general purpose Julia library to easily construct informative, performant, and winsome animated graphics. `Javis` provides a powerful grammar for users to make animated visuals. Users of `Javis` have made animations to explain concepts in a variety of fields, from mathematical concepts like the Fourier transform to brain imaging of EEGs. It builds on top of the Julia drawing framework Luxor by adding functions to simplify the creation of objects and their actions.

The Summer of Code Javis projects aim at simplifying the creation of animations to explain difficult concepts and communicate to broad audiences how Julia is a strong tool for graphics creation.

Below you can find a list of potential projects that can be tackled during Google Summer of Code. If interested in exploring any of these projects, please reach out to:

**Jacob Zelko** - email, Slack (username: TheCedarPrince), or Zulip (username: TheCedarPrince)

**Ole Kröger** - Slack (username: Wikunia), or Zulip (username: Wikunia)

Thanks for your interest! 🎉

**Mentors**: Ole Kröger, Jacob Zelko

**Recommended skills**: General understanding of Luxor and the underlying structure of Javis.

**Difficulty:** Medium

**Description**: This project is split across several tasks that are manageable enough to be worked on by a single student in the Google Summer of Code period. These small tasks come together to create an easier and more understandable syntax for Javis-based animated graphic creation. The following is a list of the smaller tasks one could work on:

One of the bigger missing features is the lack of combining several objects into a layer. Issue #75

To improve the user experience it will be helpful to ease object positioning based on other objects. Issue #130

For visual appeal, morphing shapes into one another shall be improved as it's currently an undocumented and unfinished feature. Issue #286

To bring Javis and Julia closer to the broader audience we are interested in the ability of live streaming animations to platforms like Twitch. Issue #91

**Mentors**: Ole Kröger, Jacob Zelko

**Recommended skills**: Knowledge about graph theory and LightGraphs.jl

**Difficulty:** Hard

**Description**: Javis could be a powerful platform to easily animate problems and their solutions in a variety of different fields. Currently, Javis lacks the ability to visualize graphs. The goal for this project would be to add graph support to Javis by supporting interoperability with LightGraphs.jl. The animation of flows and shortest path is something that's extremely valuable for teaching as well as in practical analysis of graph networks. To learn more about the current thoughts surrounding this problem, check this issue for more information.

**Mentors**: Ole Kröger, Jacob Zelko

**Recommended skills**: Basic to intermediate knowledge about linear algebra.

**Difficulty:** Easy

**Description**: Linear algebra is of invaluable importance all across different fields of mathematics and engineering. Enabling the easy creation of visualizations regarding rotations, matrices and other concepts is helpful in educating students about this amazing branch of mathematics. Here are a few issues related to tasks that could be worked on to bring about this capability:

Vectors are foundational to linear algebra, help Javis visualize them! Issue #31

Drawing backgrounds such as grids can assist in easy viewing of complicated mathematical operations such as rotations. Issue #38

**Difficulty**: Easy to Medium.

Agents.jl is a pure Julia framework for agent-based modeling (ABM). It has an extensive list of features, excellent performance and is easy to learn, use, and extend. Comparisons with other popular frameworks written in Python or Java (NetLogo, MASON, Mesa) show that Agents.jl outperforms all of them in computational speed, list of features and usability.

In this project students will be paired with lead developers of Agents.jl to improve Agents.jl with more features, better performance, and overall higher polish. Possible features to implement are:

File IO of current state of ABM to disk

Reading lists of human data (e.g. csv files) into `Agent` instances.

New type of space representing a planet, which can be used in climate policy or human evolution modelling

Automatic performance increase of mixed-agent models by eliminating dynamic dispatch on the stepping function

Port of Open Street Map plotting to Makie.jl.

GPU support in Agents.jl

**Recommended Skills**: Familiarity with agent based modelling, Agents.jl and Julia's Type System. Background in complex systems, sociology, or nonlinear dynamics is not required.

**Expected Results**: Well-documented, well-tested useful new features for Agents.jl.

**Mentors**: George Datseris, Tim DuBois

**Difficulty:** Easy to Hard, depending on the algorithm chosen

DynamicalSystems.jl is an award-winning Julia software library for dynamical systems, nonlinear dynamics, deterministic chaos and nonlinear timeseries analysis. It has an impressive list of features, but one can never have enough. In this project students will be able to enrich DynamicalSystems.jl with new algorithms and enrich their knowledge of nonlinear dynamics and computer-assisted exploration of complex systems.

Possible projects are summarized in the wanted-features of the library

Examples include but are not limited to:

Nonlinear local Lyapunov exponents

Final state sensitivity and fractal basin boundaries

Kolmogorov-Sinai entropy

Importance sampling for chaotic systems

and many more.

**Recommended Skills**: Familiarity with nonlinear dynamics and/or differential equations and the Julia language.

**Expected Results**: Well-documented, well-tested new algorithms for DynamicalSystems.jl.

**Mentors**: George Datseris

The student implements a state-of-the-art smoother for continuous-time systems with additive Gaussian noise. The system's dynamics can be described as an ordinary differential equation with locally additive Gaussian random fluctuations, in other words a stochastic ordinary differential equation.

Given a series of measurements observed over time, containing statistical noise and other inaccuracies, the task is to produce an estimate of the unknown trajectory of the system that led to the observations.

*Linear* continuous-time systems are smoothed with the fixed-lag Kalman-Bucy smoother (related to the Kalman–Bucy filter). It relies on coupled ODEs describing how the mean and covariance of the conditional distribution of the latent system state evolve over time. A versatile implementation in Julia is missing.
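To make the "coupled ODEs for mean and covariance" concrete, here is a minimal sketch of the scalar Kalman-Bucy *filter* (not the fixed-lag smoother the project targets), integrated with a plain Euler scheme. It is written in Python purely for illustration; the model form, the function names and all parameter names (`a`, `h`, `q`, `r`) are assumptions made for this example.

```python
import math

def kalman_bucy_scalar(dy, dt, a, h, q, r, m0=0.0, p0=1.0):
    """Euler-integrate the scalar Kalman-Bucy filter equations.

    Assumed model: dx = a*x dt + sqrt(q) dW,  dy = h*x dt + sqrt(r) dV.
    `dy` holds one observation increment per time step of length `dt`.
    """
    m, p = m0, p0
    means, variances = [m], [p]
    for dy_k in dy:
        gain = p * h / r
        # Mean ODE: drift plus a correction proportional to the innovation.
        m_next = m + a * m * dt + gain * (dy_k - h * m * dt)
        # Variance ODE (Riccati equation); it does not depend on the data.
        p_next = p + (2 * a * p + q - (h * p) ** 2 / r) * dt
        m, p = m_next, p_next
        means.append(m)
        variances.append(p)
    return means, variances

def steady_state_variance(a, h, q, r):
    """Positive root of 2*a*P + q - h^2 * P^2 / r = 0."""
    return (r / h ** 2) * (a + math.sqrt(a ** 2 + q * h ** 2 / r))

dt = 0.01
increments = [0.0] * 5000  # placeholder observations, all increments zero
means, variances = kalman_bucy_scalar(
    increments, dt, a=-1.0, h=1.0, q=1.0, r=1.0, m0=1.0
)
print(variances[-1], steady_state_variance(-1.0, 1.0, 1.0, 1.0))
```

The variance equation is decoupled from the observations, so it settles at the root of the Riccati equation regardless of the data. A production implementation would use a proper ODE solver instead of fixed-step Euler and would handle the vector-valued case.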

**Expected Results**: Build an efficient implementation of non-linear smoothing of continuous stochastic dynamical systems.

**Recommended Skills**: Gaussian random variables, Bayes' formula, Stochastic Differential Equations

**Mentors**: Moritz Schauer

**Rating**: Hard

LoopModels.jl uses an internal representation of loops that represents the iteration space of each constituent operation as well as their dependencies. The iteration spaces of inner loops are allowed to be affine functions of the outer loops, and multiple loops are allowed to exist at each level of a loopnest. LoopModels.jl aims to support optimizations including fusion, splitting, permuting loops, unrolling, and vectorization to maximize throughput. Broadly, this functionality can be divided into four pieces:

- The internal representation of the loops (Loop IR).
- Means of creating the internal representation from Julia code. This must be able to deconstruct and simplify user provided types into the primitive types representable by IR, e.g. decompose `ForwardDiff.Dual{T1,ForwardDiff.Dual{T2,Float64,N},M}` operations into operations on the underlying `Float64`.
- Analyze the representation to determine an optimal, correct, and target-specific schedule.
- Generate runnable code according to the schedule.
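As a small illustration of an iteration space whose inner bounds are affine functions of the outer loop, and of what permuting such loops involves, consider the triangular nest below. The sketch is written in Python only to show the idea; it says nothing about how LoopModels.jl represents loops internally, and the function names are hypothetical.

```python
def original_order(n):
    # for i in 0:n-1, for j in 0:i  (inner bound depends affinely on i)
    return [(i, j) for i in range(n) for j in range(i + 1)]

def interchanged_order(n):
    # After permuting the loops the bounds must be rewritten:
    # for j in 0:n-1, for i in j:n-1
    return [(i, j) for j in range(n) for i in range(j, n)]

n = 4
before = original_order(n)
after = interchanged_order(n)
print(before[:4])
print(after[:4])
print(set(before) == set(after))
```

Both nests visit the same set of `(i, j)` points, only in a different order. Whether the reordering is legal for a given loop body depends on the dependencies between iterations, which is what the dependency analysis has to establish.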

Open projects on this effort include:

This can include refining the search, dependency analysis, and cost modeling.

Develop the front end that infers the loop structure from Julia code, and creates the Loop IR. This will likely live as a compiler plugin, but infrastructure such as GPUCompiler.jl to interface more directly on the LLVM level is worth exploring, as this would allow taking advantage of LLVM's existing infrastructure.

Code would be generated through LLVM.jl. It must be able to follow the schedule determined by the optimization. The schedule is abstract, so care must still be taken to generate optimal code when following the schedule, e.g. to optimally keep track of the loop bounds, handle remainders, and index the arrays.

Mentors: Chris Elrod.


Matrix functions map matrices onto other matrices, and can often be interpreted as generalizations of ordinary functions like sine and exponential, which map numbers to numbers. Once considered a niche province of numerical algorithms, matrix functions now appear routinely in applications to cryptography, aircraft design, nonlinear dynamics, and finance.

This project proposes to implement state of the art algorithms that extend the currently available matrix functions in Julia, as outlined in issue #5840. In addition to matrix generalizations of standard functions such as real matrix powers, surds and logarithms, students will be challenged to design generic interfaces for lifting general scalar-valued functions to their matrix analogues for the efficient computation of arbitrary (well-behaved) matrix functions and their derivatives.
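As a minimal sketch of what "lifting" a scalar function to a matrix function means, the Python snippet below applies Sylvester's formula to a 2×2 matrix with two distinct real eigenvalues. The helper name `lift_2x2` is hypothetical. The approach is only an illustration: it is not one of the algorithms discussed in issue #5840, and it is not numerically robust when the eigenvalues are close together.

```python
import math

def lift_2x2(f, A):
    """Evaluate f(A) for a 2x2 matrix A with distinct real eigenvalues.

    Uses Sylvester's formula:
        f(A) = f(l1) * (A - l2*I) / (l1 - l2) + f(l2) * (A - l1*I) / (l2 - l1)
    """
    (a, b), (c, d) = A
    trace, det = a + d, a * d - b * c
    disc = trace * trace / 4 - det
    if disc <= 0:
        raise ValueError("sketch only handles distinct real eigenvalues")
    root = math.sqrt(disc)
    l1, l2 = trace / 2 + root, trace / 2 - root
    w1 = f(l1) / (l1 - l2)
    w2 = f(l2) / (l2 - l1)
    return [
        [w1 * (a - l2) + w2 * (a - l1), (w1 + w2) * b],
        [(w1 + w2) * c, w1 * (d - l2) + w2 * (d - l1)],
    ]

def matmul_2x2(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[2.0, 1.0], [1.0, 2.0]]       # eigenvalues 3 and 1
S = lift_2x2(math.sqrt, A)         # matrix square root
E = lift_2x2(math.exp, A)          # matrix exponential
print(matmul_2x2(S, S))            # close to A
```

Squaring the lifted square root recovers `A` up to rounding, which is a quick sanity check that the lifted function behaves like its scalar counterpart.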

**Recommended Skills**: A strong understanding of calculus and numerical analysis.

**Expected Results**: New and faster methods for evaluating matrix functions.

**Mentors:** Jiahao Chen, Steven Johnson.

**Difficulty:** Hard

Julia currently supports big integers and rationals, making use of GMP. However, GMP currently doesn't permit good integration with a garbage collector.

This project therefore involves exploring ways to improve BigInt, possibly including:

Modifying GMP to support high-performance garbage-collection

Reimplementation of aspects of BigInt in Julia

Lazy graph style APIs which can rewrite terms or apply optimisations

This experimentation could be carried out as a package with a new implementation, or as patches over the existing implementation in Base.

**Expected Results**: An implementation of BigInt in Julia with increased performance over the current one.

**Required Skills**: Familiarity with extended precision numerics OR performance considerations. Familiarity either with Julia or GMP.

**Mentors**: Jameson Nash

**Difficulty:** Hard

As a technical computing language, Julia provides a huge number of special functions, both in Base as well as packages such as StatsFuns.jl. At the moment, many of these are implemented in external libraries such as Rmath and openspecfun. This project would involve implementing these functions in native Julia (possibly utilising the work in SpecialFunctions.jl), seeking out opportunities for possible improvements along the way, such as supporting `Float32` and `BigFloat`, exploiting fused multiply-add operations, and improving errors and boundary cases.

**Recommended Skills**: A strong understanding of calculus.

**Expected Results**: New and faster methods for evaluating properties of special functions.

**Mentors:** Steven Johnson, Oscar Smith. Ask on Discourse or on Slack.

The CCSA algorithm by Svanberg (2001) is a nonlinear programming algorithm widely used in topology optimization and for other large-scale optimization problems: it is a robust algorithm that can handle arbitrary nonlinear inequality constraints and huge numbers of degrees of freedom. Moreover, the relative simplicity of the algorithm makes it possible to easily incorporate sparsity in the Jacobian matrix (for handling huge numbers of constraints), approximate-Hessian preconditioners, as well as special-case optimizations for affine terms in the objective or constraints. However, currently it is only available in Julia via the NLopt.jl interface to an external C implementation, which greatly limits its flexibility.

**Recommended Skills**: Experience with nonlinear optimization algorithms and understanding of Lagrange duality, familiarity with sparse matrices and other Julia data structures.

**Expected Results**: A package implementing a native-Julia CCSA algorithm.

**Mentors:** Steven Johnson.

VS Code is an extensible editor, and one of its most recent features is a notebook GUI, with a corresponding Notebook API, allowing extension developers to write their own *notebook backend*. We want to combine two popular Julia IDEs: VS Code and Pluto.jl, and use it to provide a mature editing and debugging experience combined with Pluto's reactivity.

**Expected Results:** Reactive notebook built on top of VS Code's notebook API.

**Recommended skills:** JavaScript/TypeScript, some Julia experience

**Mentors:** Sebastian Pfitzner (core maintainer of julia-vscode), Fons van der Plas (core maintainer of Pluto.jl) and friends

*Also see the other VS Code projects!*

Macros are a core feature of Julia, and many important packages (Flux, JuMP, DiffEq, …) use them in creative ways. Pluto's reactivity is based on *syntax analysis* to find the assigned and referenced variables of each cell. This powers not just reactive evaluation, but also Pluto's global scope management, and `@bind` interactivity. (See the JuliaCon presentation for more info.)
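The idea of syntax-based reactivity can be sketched with Python's standard `ast` module as an analogy. This is not Pluto's implementation, and the function names `analyze_cell` and `cells_to_rerun` are hypothetical. Each cell is parsed to find the names it assigns and the names it references, and a change to one cell marks every cell that depends on it, directly or indirectly, as needing to run again.

```python
import ast

def analyze_cell(code):
    """Return (assigned, referenced) variable names found by syntax analysis."""
    assigned, referenced = set(), set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                assigned.add(node.id)
            elif isinstance(node.ctx, ast.Load):
                referenced.add(node.id)
    return assigned, referenced

def cells_to_rerun(cells, changed):
    """Names of cells that depend (transitively) on the changed cell."""
    info = {name: analyze_cell(src) for name, src in cells.items()}
    stale, frontier = set(), [changed]
    while frontier:
        current = frontier.pop()
        assigned_here = info[current][0]
        for name, (_, referenced) in info.items():
            if name != changed and name not in stale and assigned_here & referenced:
                stale.add(name)
                frontier.append(name)
    return stale

cells = {"a": "x = 1", "b": "y = x + 1", "c": "z = y * 2", "d": "w = 10"}
print(cells_to_rerun(cells, "a"))       # cells b and c depend on a

# Code that assigns "behind the parser's back" is invisible to this analysis:
print(analyze_cell('exec("q = 1")'))    # q is not detected as assigned
```

The last line mirrors the macro problem in miniature: when an assignment only becomes visible after some code transformation has run, purely syntactic analysis of the original cell misses it.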

Macros can assign to a variable without Pluto detecting it as such. For example, `@variables x y` from Symbolics.jl *assigns* to variables `x` and `y`, while Pluto thinks that `x` and `y` were referenced. Your project is to **add macro support to Pluto**. Julia has the built-in ability to 'expand' macros on demand, but integrating this into Pluto's reactive runtime remains a significant algorithm design problem. More info in Pluto.jl#196.

**Expected Results:** First objective: process macros from packages, second (more difficult) objective: support macros defined inside the notebook itself.

**Recommended skills:** Julia, you will learn about metaprogramming, algorithm design and distributed computing

**Mentors:** Fons van der Plas and fellow Pluto.jl maintainers

Pluto's primary use case is education, and we recently started using Pluto notebooks as an 'interactive textbook': https://computationalthinking.mit.edu/ . If you are interested in design and interactive visualization, there are lots of cool JS projects in this area. Examples include:

Linking video content to dynamic content, better integration between exercise and lecture material.

Experiment with playing back the edits to a notebook session, like a video, but on a scrollable page. (link).

Syntax analysis to automatically review 'code style'

Improved live check and autograding tools

And so on! Take a look at our project board and get in touch if you have further ideas: fons@plutojl.org

**Expected Results:** *One* of the items above! When finished, your work will be used in future editions of the Computational Thinking course and more!

**Recommended skills:** JavaScript, CSS, you can learn Julia as part of the project.

**Mentors:** Fons van der Plas, Connor Burns and fellow Pluto.jl maintainers, with feedback from Alan Edelman

Pythia is a package for scalable machine learning time series forecasting and nowcasting in Julia.

The project mentors are Andrii Babii and Sebastian Vollmer.

This project involves developing scalable machine learning time series regressions for nowcasting and forecasting. Nowcasting in economics is the prediction of the present, the very near future, and the very recent past state of an economic indicator. The term is a contraction of "now" and "forecasting" and originates in meteorology.

The objective of this project is to introduce scalable regression-based nowcasting and forecasting methodologies that have recently demonstrated empirical success in data-rich environments. Examples of existing popular packages for regression-based nowcasting on other platforms include the "MIDAS Matlab Toolbox", as well as the 'midasr' and 'midasml' packages in R. The starting point for this project is porting the 'midasml' package from R to Julia. Currently, Pythia has sparse-group LASSO regression functionality for forecasting.

The following functions are of interest: in-sample and out-of-sample forecasts/nowcasts, regularized MIDAS with Legendre polynomials, visualization of nowcasts, AIC/BIC and time series cross-validation tuning, forecast evaluation, pooled and fixed effects panel data regressions for forecasting and nowcasting, HAC-based inference for sparse-group LASSO, and high-dimensional Granger causality tests. Other widely used existing functions from R/Python/Matlab are also of interest.

**Recommended skills:** Graduate-level knowledge of time series analysis, machine learning, and optimization is helpful.

**Expected output:** The student is expected to produce code, documentation, visualization, and real-data examples.

**References:** Contact project mentors for references.

Modern business applications often involve forecasting hundreds of thousands of time series. Producing such a gigantic number of reliable and high-quality forecasts is computationally challenging, which limits the scope of potential methods that can be used in practice; see, e.g., the 'forecast', 'fable', or 'prophet' packages in R. Currently, Julia lacks scalable time series forecasting functionality, and this project aims to develop automated, data-driven, and scalable time series forecasting methods.

The following functionality is of interest: forecasting intermittent demand (Croston, adjusted Croston, INARMA), scalable seasonal ARIMA with covariates, loss-based forecasting (gradient boosting), unsupervised time series clustering, forecast combinations, unit root tests (ADF, KPSS). Other widely used existing functions from R/Python/Matlab are also of interest.
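For readers unfamiliar with intermittent-demand forecasting, the following is a minimal sketch of the basic Croston method, written in Python for illustration. The function name, the smoothing parameter `alpha`, and the initialization (first non-zero demand and first observed interval) are assumptions of this example; implementations differ in these conventions.

```python
def croston(demand, alpha=0.1):
    """Basic Croston forecast of demand per period for an intermittent series.

    Non-zero demand sizes and the intervals between them are smoothed
    separately with simple exponential smoothing; the forecast is their ratio.
    """
    size = interval = None   # smoothed demand size and inter-demand interval
    periods_since_demand = 1
    for y in demand:
        if y > 0:
            if size is None:
                size, interval = float(y), float(periods_since_demand)
            else:
                size = alpha * y + (1 - alpha) * size
                interval = alpha * periods_since_demand + (1 - alpha) * interval
            periods_since_demand = 1
        else:
            periods_since_demand += 1
    if size is None:
        return 0.0           # no demand observed at all
    return size / interval

history = [0, 5, 0, 5, 0, 5]
print(croston(history))      # 5 units every 2 periods
```

Because updates happen only in periods with non-zero demand, long runs of zeros do not drag the size estimate towards zero, which is the main motivation for the method.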

**Recommended skills:** Graduate-level knowledge of time series analysis is helpful.

**Expected output:** The student is expected to produce code, documentation, visualization, and real-data examples.

**References:** Contact project mentors for references.

These projects are hosted by the SciML Open Source Scientific Machine Learning Software Organization.

Neural networks can be used as a method for efficiently solving difficult partial differential equations. Recently this strategy has been dubbed physics-informed neural networks and has seen a resurgence because of its efficiency advantages over classical deep learning. Efficient implementations from recent papers are being explored as part of the NeuralNetDiffEq.jl package. The issue tracker contains links to papers which would be interesting new neural network based methods to implement and benchmark against classical techniques. Project work in this area includes:

Improved training strategies for PINNs.

Implementing new neural architectures that impose physical constraints like divergence-free criteria.

Demonstrating large-scale problems solved by PINN training.

Improving the speed and parallelization of PINN training routines.

This project is good for both software engineers interested in the field of scientific machine learning and those students who are interested in pursuing graduate research in the field.

**Recommended Skills**: Background knowledge in numerical analysis and machine learning.

**Expected Results**: New neural network based solver methods.

**Mentors**: Chris Rackauckas

Neural ordinary differential equations have been shown to be a way to use machine learning to learn differential equation models. Further improvements to the methodology, like universal differential equations, have incorporated physical and biological knowledge into the system in order to make it a data and compute efficient learning method. However, there are many computational aspects left to explore. The purpose of this project is to enhance the universal differential equation approximation abilities of DiffEqFlux.jl, adding features like:

Improved adjoints for DAEs and SDEs

Various improvements to minibatching

Support for second order ODEs (i.e. symplectic integrators)

See the DiffEqFlux.jl issue tracker for full details.

This project is good for both software engineers interested in the field of scientific machine learning and those students who are interested in pursuing graduate research in the field.

**Recommended Skills**: Background knowledge in numerical analysis and machine learning.

**Expected Results**: New and improved methods for neural and universal differential equations.

In many cases, when attempting to optimize a function `f(p)`, each calculation of `f` is very expensive. For example, evaluating `f` may require solving a PDE or other applications of complex linear algebra. Thus, instead of always directly evaluating `f`, one can develop a surrogate model `g` which is approximately `f` by training on previous data collected from `f` evaluations. This technique of using a trained surrogate in place of the real function is called surrogate optimization and mixes techniques from machine learning to accelerate optimization.
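The loop behind surrogate optimization can be sketched with the simplest possible surrogate: a parabola fitted through the three best samples of `f` so far. This Python sketch is an illustration only. A real surrogate package would use richer models such as radial basis functions or Gaussian processes, and the function names here are hypothetical.

```python
def parabola_vertex(p1, p2, p3):
    """x-coordinate of the vertex of the parabola through three (x, y) points."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    num = (x2 - x1) ** 2 * (y2 - y3) - (x2 - x3) ** 2 * (y2 - y1)
    den = (x2 - x1) * (y2 - y3) - (x2 - x3) * (y2 - y1)
    if den == 0:
        return None  # points are collinear, no unique vertex
    return x2 - 0.5 * num / den

def surrogate_minimize(f, xs, max_evals=20, tol=1e-8):
    """Minimize f using a quadratic surrogate fitted to previous evaluations."""
    samples = [(x, f(x)) for x in xs]          # the only expensive calls
    while len(samples) < max_evals:
        best3 = sorted(samples, key=lambda s: s[1])[:3]
        candidate = parabola_vertex(*best3)    # cheap: optimize the surrogate
        if candidate is None or any(abs(candidate - x) < tol for x, _ in samples):
            break                              # surrogate proposes nothing new
        samples.append((candidate, f(candidate)))
    best = min(samples, key=lambda s: s[1])
    return best, len(samples)

def expensive_f(x):
    # Stand-in for a costly simulation (e.g. a PDE solve).
    return (x - 0.3) ** 2 + 1.0

(best_x, best_y), n_evals = surrogate_minimize(expensive_f, [-1.0, 0.0, 1.0])
print(best_x, best_y, n_evals)
```

Each iteration spends one expensive evaluation of `f` at the point the cheap surrogate considers most promising. This sketch does not check that the fitted parabola is convex, so on a general function the proposed point may be a maximum of the surrogate; adaptive sampling strategies address this kind of issue.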

Advanced techniques utilize radial basis functions and Gaussian processes in order to interpolate to new parameters to estimate `f` in areas which have not been sampled. Adaptive training techniques explore how to pick new areas to evaluate `f` to better hone in on global optima. The purpose of this project is to explore these techniques and build a package which performs surrogate optimizations.

**Recommended Skills**: Background knowledge of standard machine learning, statistical, or optimization techniques. Strong knowledge of numerical analysis is helpful but not required.

**Expected Results**: Library functions for performing surrogate optimization with tests on differential equation models.

**Mentors**: Chris Rackauckas

Machine learning has become a popular tool for understanding data, but scientists typically understand the world through the lens of physical laws and their resulting dynamical models. These models are generally differential equations given by physical first principles, where the constants in the equations such as chemical reaction rates and planetary masses determine the overall dynamics. The inverse problem to simulation, known as parameter estimation, is the process of utilizing data to determine these model parameters.

The purpose of this project is to utilize the growing array of statistical, optimization, and machine learning tools in the Julia ecosystem to build library functions that make it easy for scientists to perform this parameter estimation with the most high-powered and robust methodologies. Possible projects include improving methods for Bayesian estimation of parameters via Stan.jl and Julia-based libraries like Turing.jl, or global optimization-based approaches. Novel techniques like classifying model outcomes via support vector machines and deep neural networks can also be considered. Research and benchmarking to attempt to find the most robust methods will take place in this project. Additionally, the implementation of methods for estimating structure, such as topological sensitivity analysis along with performance enhancements to existing methods will be considered.

Some work in this area can be found in DiffEqParamEstim.jl and DiffEqBayes.jl. Examples can be found in the DifferentialEquations.jl documentation.
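As a small, self-contained illustration of parameter estimation, the Python sketch below recovers the decay rate `k` of `dy/dt = -k*y` from a handful of data points by minimizing a least-squares loss over simulated trajectories. It does not use DiffEqParamEstim.jl or any other package mentioned here. The model, the synthetic data, and the function names are assumptions chosen for the example.

```python
import math

def simulate(k, y0=1.0, t_end=5, dt=0.01):
    """Integrate dy/dt = -k*y with classical RK4; return y at t = 0, 1, ..., t_end."""
    rhs = lambda y: -k * y
    steps_per_unit = round(1 / dt)
    y, out = y0, [y0]
    for step in range(1, t_end * steps_per_unit + 1):
        k1 = rhs(y)
        k2 = rhs(y + 0.5 * dt * k1)
        k3 = rhs(y + 0.5 * dt * k2)
        k4 = rhs(y + dt * k3)
        y += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        if step % steps_per_unit == 0:
            out.append(y)
    return out

def loss(k, data):
    """Sum of squared differences between the simulated and observed values."""
    return sum((m - d) ** 2 for m, d in zip(simulate(k), data))

def golden_section(fun, lo, hi, tol=1e-6):
    """Minimize a unimodal function on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    while hi - lo > tol:
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
        if fun(c) < fun(d):
            hi = d
        else:
            lo = c
    return (lo + hi) / 2

true_k = 0.7
data = [math.exp(-true_k * t) for t in range(6)]   # synthetic, noise-free data
k_hat = golden_section(lambda k: loss(k, data), 0.0, 3.0)
print(k_hat)
```

Every evaluation of the loss requires a full solve of the differential equation, which is why solver performance and the choice of optimizer matter so much in practice. With noisy data or several parameters, a local one-dimensional search like this would be replaced by the Bayesian or global optimization methods described above.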

**Recommended Skills**: Background knowledge of standard machine learning, statistical, or optimization techniques. It's recommended but not required that one has basic knowledge of differential equations and DifferentialEquations.jl. Using the differential equation solver to get outputs from parameters can be learned on the job, but you should already be familiar (but not necessarily an expert) with the estimation techniques you are looking to employ.

**Expected Results**: Library functions for performing parameter estimation and inferring properties of differential equation solutions from parameters. Notebooks containing benchmarks determining the effectiveness of various methods and classifying when specific approaches are appropriate will be developed simultaneously.

**Mentors**: Chris Rackauckas, Vaibhav Dixit

Scientific machine learning requires mixing scientific computing libraries with machine learning. This blog post highlights how the tooling of Julia is fairly advanced in this field compared to alternatives such as Python, but one area that has not been completely worked out is integration of automatic differentiation with partial differential equations. FEniCS.jl is a wrapper to the FEniCS project for finite element solutions of partial differential equations. We would like to augment the Julia wrappers to allow for integration with Julia's automatic differentiation libraries like Zygote.jl by using dolfin-adjoint. This would require setting up this library for automatic installation for Julia users and writing adjoint passes which utilize this adjoint builder library. It would result in the first total integration between PDEs and neural networks.

**Recommended Skills**: A basic background in differential equations and Python. Having previous Julia knowledge is preferred but not strictly required.

**Expected Results**: Efficient and high-quality implementations of adjoints for Zygote.jl over FEniCS.jl functions.

**Mentors**: Chris Rackauckas

While standard machine learning can be shown to be "safe" for local optimization, scientific machine learning can sometimes require the use of globalizing techniques to improve the optimization process. Hybrid methods, known as multistart optimization methods, combine a local optimization technique with a parameter search over a large space of possible initial points. The purpose of this project would be to take MultistartOptimization.jl as a starting point and create a fully featured set of multistart optimization tools for use with Optim.jl.
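The basic multistart idea fits in a few lines. The Python sketch below runs plain gradient descent from several starting points on a one-dimensional function with two local minima and keeps the best result. It does not reflect the API of MultistartOptimization.jl or Optim.jl. The test function, step size, and evenly spaced starting points are assumptions for illustration; practical methods usually draw starts randomly or quasi-randomly.

```python
def f(x):
    # Two local minima: a global one near x = -1.30 and a local one near x = 1.13.
    return x ** 4 - 3 * x ** 2 + x

def grad_f(x):
    return 4 * x ** 3 - 6 * x + 1

def gradient_descent(x0, step=0.01, iters=2000):
    """Local optimizer: fixed-step gradient descent from x0."""
    x = x0
    for _ in range(iters):
        x -= step * grad_f(x)
    return x

def multistart(starts):
    """Run the local optimizer from every start and keep the best minimizer."""
    candidates = [gradient_descent(x0) for x0 in starts]
    return min(candidates, key=f)

starts = [-2.0 + 0.5 * i for i in range(9)]   # -2.0, -1.5, ..., 2.0
best = multistart(starts)
single = gradient_descent(2.0)                # one unlucky start
print(best, f(best))
print(single, f(single))
```

A single run started at `x = 2.0` gets stuck in the shallower local minimum, while the multistart run finds the deeper one. The cost is one local optimization per start, which is why choosing and pruning starting points efficiently is the core of these methods.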

**Recommended Skills**: A basic background in optimization. Having previous Julia knowledge is preferred but not strictly required.

**Expected Results**: Efficient and high-quality implementations of multistart optimization methods.

**Mentors**: Chris Rackauckas and Patrick Kofod Mogensen

Implement symbolic solving of polynomial equation systems (i.e. finding the variety of a set of polynomials). This involves first computing the Groebner basis for a set of polynomials. Groebner basis computation has very high worst-case complexity (doubly exponential in the number of variables), so it is essential that the implementation is practical. The project should start by studying the literature on state-of-the-art Groebner basis solvers.

**Recommended Skills**: Calculus and discrete mathematics. Prior knowledge of computational algebra and ring theory is preferred.

**Expected Results**: Working Groebner basis and rootfinding algorithms to be deployed in the Symbolics.jl package, along with documentation and tutorials.

**Mentors**: Shashi Gowda, Yingbo Ma, Mason Protter

Implement the heuristic approach to symbolic integration. Then hook into a repository of rules such as RUMI.

**Recommended Skills**: Calculus

**Expected Results**: A working implementation of symbolic integration in the Symbolics.jl library, along with documentation and tutorials demonstrating its use in scientific disciplines.

**Mentors**: Shashi Gowda, Yingbo Ma, Mason Protter

**Difficulty**: Medium

*FlashFill* is a mechanism for creating data manipulation pipelines using programming by example (PBE). As an example see this implementation in Microsoft Excel. We want a version of FlashFill that can work against Julia tabular data structures, such as DataFrames and Tables.
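To give a feel for programming by example, the toy Python sketch below enumerates a tiny, made-up DSL of string programs (split on a delimiter, pick a token, optionally change case) and returns the first program consistent with all input/output examples. The actual FlashFill system uses a much richer DSL and far more efficient synthesis techniques. Everything named here (`DELIMS`, `INDEXES`, `CASES`, `run`, `synthesize`) is hypothetical.

```python
from itertools import product

# A deliberately tiny DSL: program = (delimiter, token index, case transform)
DELIMS = [" ", ",", "-", "@"]
INDEXES = [0, 1, -1]
CASES = {"id": str, "upper": str.upper, "lower": str.lower}

def run(program, text):
    """Apply a DSL program to an input string; None if the token is missing."""
    delim, idx, case = program
    parts = text.split(delim)
    if not -len(parts) <= idx < len(parts):
        return None
    return CASES[case](parts[idx])

def synthesize(examples):
    """Return the first program that reproduces every (input, output) example."""
    for program in product(DELIMS, INDEXES, CASES):
        if all(run(program, inp) == out for inp, out in examples):
            return program
    return None

names = [("John Smith", "Smith"), ("Ada Lovelace", "Lovelace")]
program = synthesize(names)
print(program)
print(run(program, "Alan Turing"))

emails = [("ada@example.org", "ADA")]
print(synthesize(emails))
```

The learned program generalizes to unseen rows of the same column. With only a few examples several programs may be consistent, so practical systems also need a way to rank candidates.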

**Resources**:

A presentation by Sumit Gulwani of Microsoft Research

A video

**Recommended Skills**: Compiler techniques, DSL generation, Program synthesis

**Expected Output**: A practical flashfill implementation that can be used on any tabular data structure in Julia

**Mentors**: Avik Sengupta

**Difficulty**: Medium

Apache Parquet is a binary data format for tabular data. It has features for compression and memory-mapping of datasets on disk. A decent implementation of Parquet in Julia is likely to be highly performant. It will be useful as a standard format for distributing tabular data in a binary format. There exists a Parquet.jl package that has a Parquet reader and a writer. It currently conforms to the Julia Tabular file IO interface at a very basic level. It needs more work to add support for critical elements that would make Parquet.jl usable for fast large scale parallel data processing. One or more of the following goals can be targeted:

Lazy loading and support for out-of-core processing, with Arrow.jl and Tables.jl integration. Improved usability and performance of Parquet reader and writer for large files.

Reading from and writing data on to cloud data stores, including support for partitioned data.

Support for missing data types and encodings making the Julia implementation fully featured.

**Resources:**

The Parquet file format (there are also many articles and talks on the Parquet storage format on the internet)

**Recommended skills:** Good knowledge of Julia language, Julia data stack and writing performant Julia code.

**Expected Results:** Depends on the specific projects we would agree on.

**Mentors:** Shashi Gowda, Tanmay Mohapatra

TableTransforms.jl provides transforms that are commonly used in statistics and machine learning. It was developed to address specific needs in feature engineering and works with general Tables.jl tables.

Project mentors: Júlio Hoffimann

Statistical transforms such as PCA, Z-score, etc, can greatly improve the convergence of various statistical learning models, and are widely used in advanced machine learning pipelines. In this project the mentee will learn how to implement advanced transforms such as PPMT and other transforms for imputation of missing values.

**Desired skills:** Statistics, Machine Learning

**Difficulty level:** Medium

**Expected duration:** 350hrs

**References:**

Utility transforms such as standardization of column names and other string-based transforms are extremely important for digesting real-world data. In this project the mentee will learn good coding practices and will implement various utility transforms available in other languages (e.g. Janitor package in R, pyjanitor in Python).
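As an example of what a column-name standardization transform does, here is a minimal Python sketch in the spirit of the "clean names" utilities found in such packages. It is not the implementation of Janitor, pyjanitor, or TableTransforms.jl. The function names and the exact rules (snake_case, lowercase, alphanumerics only) are assumptions.

```python
import re

def clean_name(name):
    """Normalize one column name to lowercase snake_case."""
    s = name.strip()
    s = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", s)   # split camelCase boundaries
    s = re.sub(r"[^0-9A-Za-z]+", "_", s)            # other characters -> "_"
    return s.strip("_").lower()

def clean_names(names):
    return [clean_name(n) for n in names]

raw = ["First Name", "  Total $ Amount ", "userID", "2021 Sales(%)"]
print(clean_names(raw))
```

A complete transform would also have to decide how to handle names that collide after cleaning, for example by appending a numeric suffix.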

**Desired skills:** Text processing, Regex

**Difficulty level:** Easy

**Expected duration:** 175hrs

**References:**

Address open issues in the package.

Please contact Júlio Hoffimann on Zulip if you have any questions.

Turing is a universal probabilistic programming language embedded in Julia. Turing allows the user to write models in standard Julia syntax, and provides a wide range of sampling-based inference methods for solving problems across probabilistic machine learning, Bayesian statistics, and data science. Since Turing is implemented in pure Julia code, its compiler and inference methods are amenable to hacking: new model families and inference methods can be easily added. Below is a list of ideas for potential projects, though you are welcome to propose your own to the Turing team.

If you are interested in exploring any of these projects, please reach out to the listed project mentors. You can find their contact information at turing.ml/team.

**Mentors**: Cameron Pfiffer, Hong Ge

**Project difficulty**: Easy

**Description**: MCMCChains is a key component of the Turing.jl ecosystem. It is the package that determines how to analyze and store MCMC samples provided by packages like Turing. It's also used outside of Turing.

For this project, a student might improve the performance of the various statistical functions provided by MCMCChains, change the back end to use a data storage format that maintains the shape of parameter samples, or improve the general plotting functionality of the package.

There's lots of fun little things to do for MCMCChains. Check out this meta-issue for more details and discussions.

**Mentors**: Hong Ge, Cameron Pfiffer

**Project difficulty**: Medium

**Description**: Turing's support for particle sampling methods is slowly being improved with the addition of AdvancedPS.jl. If you're interested in implementing or improving particle sampling methods, this is a great project for you!

**Mentors**: Miles Lucas, Cameron Pfiffer, Hong Ge

**Project difficulty**: Hard

**Description**: NestedSamplers.jl is an excellent package which implements nested sampling methods. As of yet, it is not connected to Turing.jl. For this project, a student would connect the NestedSamplers.jl library to Turing.jl.

**Mentors**: Mohamed Tarek, Hong Ge, Kai Xu, Tor Fjelde

**Project difficulty**: Medium

**Description**: Turing's native GPU support is limited in that the Metropolis-Hastings and HMC samplers do not implement GPU sampling methods. This can and should be done: GPU methods are awesome! If you are interested in working on parallelism and GPUs, this project is for you.

Students will work with the code at AdvancedMH or AdvancedHMC, depending on their interests.

**Mentors**: Cameron Pfiffer, Martin Trapp

**Project difficulty**: Easy

**Description**: Turing's documentation and tutorials need a bit of an overhaul. Turing has changed significantly since the documentation was last written, and it's beginning to show. Students would use their knowledge of probabilistic programming languages and Turing to shore up or rewrite documentation and tutorials.

**Mentors**: Will Tebbutt, S. T. John, Theo Galy-Fajou

**Project difficulty**: Medium

**Description**: There has recently been quite a bit of work on inference methods for GPs that use iterative methods rather than the Cholesky factorisation. They look quite promising, and although no one has implemented any of them within the Julia GP ecosystem yet, they should fit nicely within the AbstractGPs framework. If you're interested in improving the GP ecosystem in Julia, this project might be for you!
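To make the idea concrete, here is a minimal sketch of one such iterative approach: solving the linear system $(K + \sigma^2 I)\,\alpha = y$ with the conjugate gradient method, which needs only matrix-vector products and no Cholesky factor. It is written in plain Python for illustration and is not AbstractGPs code; the squared-exponential kernel, the helper names (`rbf_kernel`, `kernel_matvec`, `conjugate_gradient`, `posterior_mean`) and the default parameters are assumptions made for this example.

```python
import math


def rbf_kernel(x1, x2, lengthscale=1.0):
    """Squared-exponential kernel on scalars (an assumed kernel choice)."""
    return math.exp(-0.5 * ((x1 - x2) / lengthscale) ** 2)


def kernel_matvec(xs, v, noise_var):
    """Compute (K + noise_var * I) @ v without forming a factorisation."""
    return [
        sum(rbf_kernel(xi, xj) * vj for xj, vj in zip(xs, v)) + noise_var * vi
        for xi, vi in zip(xs, v)
    ]


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A, given only x -> A x."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x for the initial guess x = 0
    p = list(r)  # first search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if math.sqrt(rs_old) < tol:
            break
        Ap = matvec(p)
        step = rs_old / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + step * pi for xi, pi in zip(x, p)]
        r = [ri - step * api for ri, api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


def posterior_mean(x_star, xs, alpha):
    """GP posterior mean at x_star, given alpha = (K + noise_var * I)^{-1} y."""
    return sum(rbf_kernel(x_star, xi) * ai for xi, ai in zip(xs, alpha))


xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [math.sin(x) for x in xs]
alpha = conjugate_gradient(lambda v: kernel_matvec(xs, v, 0.1), ys)
print(posterior_mean(0.75, xs, alpha))
```

Because the solver touches the kernel matrix only through `matvec`, the same loop could in principle be paired with structured or approximate matrix-vector products, which is where iterative methods tend to pay off on large datasets.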

**Mentors**: S. T. John, Will Tebbutt, Theo Galy-Fajou

**Project difficulty**: Easy to Medium

**Description**: Sparse variational Gaussian process models provide the flexibility to scale to large datasets, handle arbitrary (non-conjugate) likelihoods, and be used as building blocks for composite models such as deep GPs. This project is about making such models more readily available within the Julia GP ecosystem. Depending on your interests, you can focus on making them easier for end users and providing good tutorials, or on implementations that match or exceed the performance of established Python packages such as GPflow, on integration with Flux.jl, etc.

We are generally looking for folks that want to help with the Julia VS Code extension. We have a long list of open issues, and some of them amount to significant projects.

**Required Skills**: TypeScript, Julia, web development.

**Expected Results**: Depends on the specific projects we would agree on.

**Mentors**: David Anthoff

The VSCode extension for Julia could provide a simple way to browse available packages and view what's installed on a user's system. To start with, this project could simply provide a GUI that reads in package data from a `Project.toml`/`Manifest.toml` and shows some UI elements to add, remove, and manage those packages.

This could also be extended by showing metadata about each package, such as its readme, GitHub stars, recent activity and so on (somewhat similar to the VSCode-native extension explorer).

**Expected Results**: A UI in VSCode for package operations.

**Recommended Skills**: Familiarity with TypeScript and Julia development.

**Mentors**: Sebastian Pfitzner

*Also take a look at Pluto - VS Code integration!*

Julia has early support for targeting WebAssembly and running in the web browser. Please note that this is a rapidly moving area (see the project repository for a more detailed overview), so if you are interested in this work, please make sure to inform yourself of the current state and talk to us to scope out an appropriate project. The list below is intended as a set of possible starting points.

Mentor for these projects is Keno Fischer unless otherwise stated.

Because Julia relies on an asynchronous task runtime and WebAssembly currently lacks native support for stack management, Julia needs to explicitly manage task stacks in the wasm heap and perform a compiler transformation to use this stack instead of the native WebAssembly stack. The overhead of this transformation directly impacts the performance of Julia on the wasm platform. Additionally, since all code Julia uses (including arbitrary C/C++ libraries) must be compiled using this transformation, it needs to cover a wide variety of inputs and be coordinated with other users having similar needs (e.g. the Pyodide project to run Python on the web). The project would aim to improve the quality, robustness and flexibility of this transformation.

**Recommended Skills**: Experience with LLVM.

WebAssembly is in the process of standardizing threads. Simultaneously, work is ongoing to introduce a new threading runtime in Julia (see #22631 and related PRs). This project would investigate enabling threading support for Julia on the WebAssembly platform, implementing runtime parallel primitives on that platform and ensuring that high-level threading constructs are correctly mapped to the underlying platform. Please note that both the WebAssembly and Julia threading infrastructures are still in active development and may continue to change over the duration of the project. An informed understanding of the state of these projects is a definite prerequisite for this project.

**Recommended Skills**: Experience with C and multi-threaded programming.

WebAssembly is in the process of adding first-class references to native objects to its specification. This capability should allow very high-performance integration between Julia and JavaScript objects. Since it is not possible to store references to JavaScript objects in regular memory, adding this capability will require several changes to the runtime system and code generation (possibly including at the LLVM level) in order to properly track these references and emit them either as direct references or as indirect references to the reference table.

**Recommended Skills**: Experience with C.

While Julia now runs on the web platform, it is not yet a language that's suitable for first-class development of web applications. One of the biggest missing features is integration with and abstraction over more complicated JavaScript objects and APIs, in particular the DOM. Inspiration may be drawn from similar projects in Rust or other languages.

**Recommended Skills**: Experience with writing libraries in Julia, experience with JavaScript Web APIs.

Several Julia libraries (e.g. WebIO.jl, Escher.jl) provide input and output capabilities for the web platform. Porting these libraries to run directly on the wasm platform would enable a number of existing UIs to automatically work on the web.

**Recommended Skills**: Experience with writing libraries in Julia.

The Julia project uses BinaryBuilder to provide binaries of native dependencies of Julia packages. Experimental support exists for extending this to the wasm platform, but few packages have been ported. This project would consist of attempting to port a significant fraction of the binary dependencies of the Julia ecosystem to the web platform by improving the toolchain support in BinaryBuilder or, if necessary, porting upstream packages to fix assumptions not applicable on the wasm platform.

**Recommended Skills**: Experience with building native libraries in Unix environments.

The Distributed computing abstractions in Julia provide a convenient way to implement programs that span many communicating Julia processes on different machines. However, the existing abstractions generally assume that all communicating processes are part of the same trust domain (e.g. they allow messages to execute arbitrary code on the remote). With some of the nodes potentially running in the web browser (or multiple browser nodes being part of the same distributed computing cluster via WebRPC), this assumption no longer holds true and new interfaces need to be designed to support multiple trust domains without overly restricting usability.

**Recommended Skills**: Experience with distributed computing and writing libraries in Julia.

Currently supported use cases for Julia on the web platform are primarily geared towards providing interactive environments to support exploration of the full language. Of course, this leads to significantly larger binaries than would be required for using Julia as part of a production deployment. By disabling dynamic language features (e.g. `eval`), one could generate small binaries suitable for deployment. Some progress towards this exists in packages like PackageCompiler.jl, though significant work remains to be done.

**Recommended Skills**: Interest in or experience with Julia internals.