View all GSoC/JSoC Projects

This page is designed to improve discoverability of projects. You can, for example, search this page for specific keywords and find all of the relevant projects.

Projects

GeoStats.jl - Summer of Code

GeoStats.jl is an extensible framework for high-performance geostatistics in Julia. It is a project that aims to redefine the way statistics is done with geospatial data (e.g. data on geographic maps, 3D meshes).

Project mentors: Júlio Hoffimann, Rafael Caixeta

New geostatistical clustering methods

Statistical clustering cannot be applied straightforwardly to geospatial data. Geospatial constraints require clusters to be contiguous volumes in the map, something that is not taken into account by traditional methods (e.g. K-Means, Spectral Clustering).

The goal of this project is to implement a geospatial clustering method from the geostatistics literature using the GeoStats.jl API.
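To make the contiguity requirement concrete, here is a minimal, hypothetical baseline in plain Julia (it does not use the GeoStats.jl API): appending scaled spatial coordinates to the feature matrix biases ordinary k-means toward spatially compact clusters. The helper geoclust and the weight λ are illustrative names only.

```julia
using Clustering  # provides kmeans

# Naive geostatistical baseline: treat (scaled) spatial coordinates as
# extra features so that k-means favours spatially compact clusters.
function geoclust(X, coords; k = 3, λ = 1.0)
    Z = vcat(X, λ .* coords)        # columns are points: (d + 2) × n
    return kmeans(Z, k).assignments
end

X = rand(2, 100)       # 2 features for 100 points
coords = rand(2, 100)  # 2-D spatial locations
labels = geoclust(X, coords; k = 3)
```

A proper implementation would instead follow the GeoStats.jl clustering interface and enforce contiguity explicitly, as the methods in the geostatistics literature do.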

Desired skills: Statistics, Clustering, Graph Theory

Difficulty level: Medium

References:

New geostatistical simulation methods

Geostatistical simulation consists of generating multiple alternative realizations of geospatial data according to a given geospatial distribution. The literature on simulation methods is vast, but a few of them are particularly useful.

The goal of this project is to implement a geostatistical simulation method from the geostatistics literature using the GeoStats.jl API.

Desired skills: Geostatistics, Stochastics, HPC

Difficulty level: Hard

References:

Migrate from Plots.jl to Makie.jl recipes

The project currently relies on Plots.jl recipes to visualize geospatial data sets as well as many other objects defined in the framework. However, very large data sets (e.g. 3D volumes) cannot be visualized easily. The Makie.jl project is a promising alternative.

The goal of this project is to migrate all plot recipes from Plots.jl to Makie.jl.

Desired skills: Visualization, Plotting, Geometry, HPC, GPU

Difficulty level: Medium

How to get started?

Get familiar with the framework by reading the documentation and tutorials.

Please contact the project maintainers on Gitter or Zulip.


MLJ Projects – Summer of Code

MLJ is a machine learning framework for Julia aiming to provide a convenient way to use and combine a multitude of tools and models available in the Julia ML/Stats ecosystem.

MLJ is released under the MIT license and sponsored by the Alan Turing Institute.

Particle swarm optimization of machine learning models

Bring particle swarm optimization to the MLJ machine learning platform to help users tune machine learning models.

Difficulty. Easy - moderate.

Description

Imagine your search for the optimal machine learning model as the meandering flight of a bee through hyper-parameter space, looking for a new home for the queen. Parallelize your search, and you've created a swarm of bees. Introduce communication between the bees about their success so far, and you introduce the possibility of the bees ultimately converging on a good candidate for the best model.

PSO (particle swarm optimization) is a large, promising, and active area of research, but also one that is used in real data science practice. The method is based on a very simple idea inspired by nature and makes essentially no assumptions about the nature of the cost function (unlike other methods, such as gradient descent, which might require a handle on derivatives). The method is simple to implement, and applicable to a wide range of hyper-parameter optimization problems.
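For orientation, here is a minimal global-best PSO in plain Julia, a sketch of the algorithm itself rather than of MLJTuning's tuning-strategy interface; the function pso and the coefficients w, c1, c2 are illustrative defaults.

```julia
# Minimal global-best PSO for minimizing f over the box [lo, hi]^d.
function pso(f, lo, hi, d; n = 30, iters = 200, w = 0.7, c1 = 1.5, c2 = 1.5)
    x = [lo .+ (hi - lo) .* rand(d) for _ in 1:n]   # particle positions
    v = [zeros(d) for _ in 1:n]                     # particle velocities
    pbest = copy(x); pcost = f.(x)                  # personal bests
    gbest = pbest[argmin(pcost)]                    # global best
    for _ in 1:iters
        for i in 1:n
            v[i] = w .* v[i] .+ c1 .* rand(d) .* (pbest[i] .- x[i]) .+
                               c2 .* rand(d) .* (gbest .- x[i])
            x[i] = clamp.(x[i] .+ v[i], lo, hi)
            c = f(x[i])
            c < pcost[i] && (pbest[i] = copy(x[i]); pcost[i] = c)
        end
        gbest = pbest[argmin(pcost)]
    end
    return gbest, f(gbest)
end

pso(x -> sum(abs2, x .- 1), -5.0, 5.0, 3)   # converges near (1, 1, 1)
```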

Mentors. Anthony Blaom, Sebastian Vollmer

Prerequisites

Your contribution

The aim of this project is to implement one or more variants of the PSO algorithm, for use in the MLJ machine learning platform, for the purpose of optimizing hyper-parameters. Integration with MLJ is crucial, so there will be an opportunity to spend time familiarizing yourself with this popular tool.

Specifically, you will:

  1. Implement the new tuning strategy following the MLJTuning API (https://github.com/alan-turing-institute/MLJTuning.jl#how-do-i-implement-a-new-tuning-strategy).

References

In-processing methods for fairness in machine learning

Mentors: Jiahao Chen, Moritz Schauer, and Sebastian Vollmer

Fairness.jl is a package to audit and mitigate bias, using the MLJ machine learning framework and other tools. It has implementations of some preprocessing and postprocessing methods for improving fairness in classification models, but could use more implementations of other methods, especially in-processing algorithms like adversarial debiasing.

Difficulty Hard.

Prerequisites

Description

Machine learning models are developed to support and make high-impact decisions, like who to hire or who to give a loan to. However, available training data can exhibit bias against race, age, gender, or other prohibited bases, reflecting a complex social and economic history of systemic injustice. For example, women in the United Kingdom, United States, and other countries were only allowed to have their own bank accounts and lines of credit in the 1970s! That means that training a credit-decisioning model on historical data would encode the implicit bias that women are less credit-worthy, because few of them had lines of credit in the past. Surely we would want to be fair and not hinder an applicant's ability to get a loan on the basis of their race, gender, or age?

So how can we fix data and models that are unfair? A common first reaction is to remove the race, gender, and age attributes from the training data, and then say we are done. But as described in detail in the references, we then have to consider whether other features like one's name or address could encode such prohibited bases too. To mitigate bias and improve fairness in models, we can change the training data (pre-processing), the way we define and train the model (in-processing), and/or alter the predictions made (post-processing). Some algorithms for the first and third approaches have already been implemented in Fairness.jl, and they have the advantage of treating the ML model as a black box. However, our latest research (arXiv:2011.02407) shows that pure black-box methods have fundamental limitations in their ability to mitigate bias.

Your contribution

This project is to implement more bias mitigation algorithms and invent new ones too. We will focus on in-processing algorithms that alter the training process or alter the ML model itself. Some specific stages (see the sketch after the list) are to:

  1. Use Flux.jl or MLJFlux.jl to develop in-processing algorithms,

  2. Study research papers proposing in-processing algorithms and implement them, and

  3. Implement fairness algorithms and metrics for individual fairness as described in papers like arXiv:2006.11439.
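For a flavour of stage 1, here is a minimal, hypothetical in-processing sketch with Flux: a binary classifier trained with a demographic-parity penalty on the gap in mean predicted score between two groups. The penalty weight λ, the data layout, and all names are illustrative, not the Fairness.jl API.

```julia
using Flux, Statistics

X = rand(Float32, 5, 200)          # 5 features, 200 samples
y = rand(Bool, 200)                # binary labels
g = rand(Bool, 200)                # protected-group indicator

model = Chain(Dense(5, 16, relu), Dense(16, 1, σ))
λ = 1.0f0                          # fairness penalty weight (illustrative)

function loss(X, y, g)
    ŷ = vec(model(X))
    bce = mean(Flux.binarycrossentropy.(ŷ, y))
    gap = abs(mean(ŷ[g]) - mean(ŷ[.!g]))   # demographic-parity gap
    return bce + λ * gap
end

opt = ADAM(1e-3)
ps = Flux.params(model)
for _ in 1:100
    gs = Flux.gradient(() -> loss(X, y, g), ps)
    Flux.Optimise.update!(opt, ps, gs)
end
```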

References

  1. High-level overview: https://towardsdatascience.com/a-tutorial-on-fairness-in-machine-learning-3ff8ba1040cb

  2. https://nextjournal.com/ashryaagr/fairness

  3. IBM’s AIF360 resources: https://aif360.mybluemix.net/

    AIF360 Inprocessing algorithms: Available here.

  4. https://dssg.github.io/fairness_tutorial/

Causal and counterfactual methods for fairness in machine learning

Mentors: Jiahao Chen, Moritz Schauer, Zenna Tavares, and Sebastian Vollmer

Fairness.jl is a package to audit and mitigate bias, using the MLJ machine learning framework and other tools. This project is to implement algorithms for counterfactual ("what if") reasoning and causal analysis to Fairness.jl and MLJ.jl, integrating and extending Julia packages for causal analysis.

Difficulty Hard.

Prerequisites

Description

Machine learning models are developed to support and make high-impact decisions, like who to hire or who to give a loan to. However, available training data can exhibit bias against race, age, gender, or other prohibited bases, reflecting a complex social and economic history of systemic injustice. For example, women in the United Kingdom, United States, and other countries were only allowed to have their own bank accounts and lines of credit in the 1970s! That means that training a credit-decisioning model on historical data would encode the implicit bias that women are less credit-worthy, because few of them had lines of credit in the past. Surely we would want to be fair and not hinder an applicant's ability to get a loan on the basis of their race, gender, or age?

So how can we fix unfairness in models? Arguably, we should first identify the underlying causes of bias, and only then can we actually remediate bias successfully. However, one major challenge is that a proper evaluation often requires data that we don't have. For this reason, we also need counterfactual analysis, to identify actions we can take that mitigate unfairness not just in our training data, but also in situations we haven't seen yet but could encounter in the future. Ideas for identifying and mitigating bias using such causal interventions have been proposed in papers such as Equality of Opportunity in Classification: A Causal Approach and the references below.

Your contribution

This project is to implement algorithms for counterfactual ("what if") reasoning and causal analysis to Fairness.jl and MLJ.jl, integrating and extending Julia packages for causal analysis. Some specific stages are:

  1. Implement interfaces in MLJ.jl for Julia packages for causal inference and probabilistic programming, such as Omega.jl and CausalInference.jl (https://github.com/mschauer/CausalInference.jl)

  2. Implement and benchmark causal and counterfactual definitions for measuring unfairness

  3. Implement and benchmark causal and counterfactual approaches to mitigate bias

References

Time series forecasting at scale - speed up via Julia

Time series are ubiquitous: stocks, sensor readings, vital signs. This project aims at adding time series forecasting to MLJ and performing benchmark comparisons against sktime, tslearn, and tsml.

Difficulty. Easy - moderate.

Prerequisites

Your contribution

MLJ is so far focused on tabular data and time series classification. This project is to add support for time series data in a modular, composable way.

Time series are everywhere in real-world applications and there has been an increase in interest in time series frameworks recently (see e.g. sktime, tslearn, tsml).

But there are still very few principled time-series libraries out there, so you would be working on something that could be very useful for a large number of people. To find out more, check out this paper on sktime.
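As a toy example of the kind of model such a framework would wrap, here is a least-squares AR(1) forecaster in plain Julia; fit_ar1 and forecast are illustrative names, not an existing MLJ interface.

```julia
using Statistics

# Illustrative only: a least-squares AR(1) forecaster of the kind that could
# be wrapped as an MLJ model.
function fit_ar1(y)
    x, z = y[1:end-1], y[2:end]
    ϕ = cov(x, z) / var(x)      # slope of the lag-1 regression
    c = mean(z) - ϕ * mean(x)   # intercept
    return (c = c, ϕ = ϕ)
end

function forecast(p, last, h)
    out = Float64[]
    yₜ = last
    for _ in 1:h
        yₜ = p.c + p.ϕ * yₜ     # iterate the fitted recursion
        push!(out, yₜ)
    end
    return out
end

y = cumsum(randn(200))      # toy random-walk series
p = fit_ar1(y)
forecast(p, y[end], 5)      # 5-step-ahead point forecasts
```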

Mentors: Sebastian Vollmer, Markus Löning (sktime developer).

References

Interpretable Machine Learning in Julia

Interpreting and explaining black-box models is crucial to establish trust and improve performance.

Difficulty. Easy - moderate.

Description

It is important to have mechanisms in place to interpret the results of machine learning models and to identify the relevant factors behind a model's decision or scoring.

This project will implement methods for model and feature interpretability.
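One classic, model-agnostic candidate is permutation feature importance; a minimal sketch in plain Julia follows, with illustrative names throughout (`predict` stands for any fitted model's prediction function).

```julia
using Statistics, Random

# How much does the loss degrade when column j of X is shuffled?
function permutation_importance(predict, X, y;
                                loss = (ŷ, y) -> mean(abs2, ŷ .- y), nrep = 20)
    base = loss(predict(X), y)
    map(1:size(X, 2)) do j
        mean(1:nrep) do _
            Xp = copy(X)
            Xp[:, j] = shuffle(X[:, j])   # break feature j's association with y
            loss(predict(Xp), y) - base
        end
    end
end

X = randn(500, 3)
y = 2X[:, 1] .+ 0.1randn(500)             # only feature 1 matters
β = X \ y                                  # least-squares fit
permutation_importance(x -> x * β, X, y)   # large value for j = 1, ≈ 0 otherwise
```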

Mentors. Diego Arenas, Sebastian Vollmer.

Prerequisites

Your contribution

The aim of this project is to implement multiple interpretability algorithms, such as:

Specifically, you will:

References

Tutorials

Model visualization in MLJ

Design and implement a data visualization module for MLJ.

Difficulty. Easy.

Description

Design and implement a data visualization module for MLJ to visualize numeric and categorical features (histograms, boxplots, correlations, frequencies), intermediate results, and metrics generated by MLJ machines.

The module should build on a suitable Julia package for data visualization.

The idea is to implement a similar resource to what mlr3viz does for mlr3.
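A minimal illustration of the kind of standard views the module could provide, using Plots.jl (the data and titles are made up):

```julia
using Plots

x = randn(500)                  # numeric feature
y = 2x .+ randn(500)            # target

p1 = histogram(x; title = "feature distribution", legend = false)
p2 = scatter(x, y; title = "target vs feature", legend = false)
plot(p1, p2; layout = (1, 2))
```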

Prerequisites

Your contribution

So far, visualizing data or features in MLJ is an ad-hoc task, defined by the user case by case. You will be implementing a standard way to visualize model performance, residuals, benchmarks, and predictions for MLJ users.

The structures and metrics will be given from the results of models or data sets used; your task will be to implement the right visualizations depending on the data type of the features.

A relevant part of this project is to visualize the target variable against the rest of the features.

You will enhance your visualisation skills as well as your ability to "debug" and understand models and their prediction visually.

References

Mentors: Sebastian Vollmer, Diego Arenas.

Deeper Bayesian Integration

Bayesian methods and probabilistic supervised learning provide uncertainty quantification. This project aims at increasing integration to combine Bayesian and non-Bayesian methods using Turing.

Description

As an initial step, reproduce SossMLJ.jl (https://github.com/cscherrer/SossMLJ.jl) in Turing. The bulk of the project is to implement methods that combine multiple predictive distributions.
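As a starting point, a minimal Bayesian linear regression in Turing might look as follows; wrapping such models behind a common predictive interface is where the project begins. All names and priors here are illustrative.

```julia
using Turing, LinearAlgebra

@model function blr(X, y)
    d = size(X, 2)
    θ ~ filldist(Normal(0, 1), d)           # coefficient priors
    σ ~ truncated(Normal(0, 1), 0, Inf)     # noise scale
    y ~ MvNormal(X * θ, σ^2 * I)
end

X = randn(100, 3)
y = X * [1.0, -2.0, 0.5] .+ 0.3 .* randn(100)
chain = sample(blr(X, y), NUTS(), 500)
```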

Your contributions

References

Bayesian Stacking SKpro

Difficulty: Medium to Hard

Mentors: Hong Ge, Sebastian Vollmer

MLJ and MLFlow integration

Integrate MLJ with MLFlow.

Difficulty. Easy.

Description

MLFlow is a flexible model management tool. The project consists of writing the necessary functions to integrate MLJ with the MLFlow REST API, so that models built using MLJ can keep track of their runs, evaluation metrics, and parameters, and can be registered and monitored using MLFlow.
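A minimal sketch of what such an integration could look like with HTTP.jl and JSON.jl, assuming a local MLFlow tracking server on port 5000; the endpoint paths follow MLFlow's REST API documentation, and the helper names are illustrative.

```julia
using HTTP, JSON

const MLFLOW = "http://localhost:5000/api/2.0/mlflow"   # assumed server URL

function create_run(experiment_id)
    body = JSON.json(Dict("experiment_id" => experiment_id,
                          "start_time" => round(Int, time() * 1000)))
    resp = HTTP.post("$MLFLOW/runs/create",
                     ["Content-Type" => "application/json"], body)
    return JSON.parse(String(resp.body))["run"]["info"]["run_id"]
end

function log_metric(run_id, key, value)
    body = JSON.json(Dict("run_id" => run_id, "key" => key, "value" => value,
                          "timestamp" => round(Int, time() * 1000)))
    HTTP.post("$MLFLOW/runs/log-metric",
              ["Content-Type" => "application/json"], body)
end
```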

Prerequisites

Your contribution

References

Speed demons only need apply

Diagnose and exploit opportunities for speeding up common MLJ workflows.

Difficulty. Moderate.

Description

In addition to investigating a number of known performance bottlenecks, you will have some free rein in this project to identify opportunities to speed up common MLJ workflows, as well as to make better use of memory resources.

Prerequisites

Your contribution

In this project you will:

References

Mentors. Anthony Blaom

Compiler Projects – Summer of Code

I have a number of other compiler projects I'm currently working on. Please contact me for additional details and let me know what specifically interests you about this area of contribution and we can tailor your project to suit you together.

Recommended Skills: Most of these projects involve algorithms work, requiring a willingness and interest in seeing how to integrate with a large system.

Mentors: Jameson Nash

Improving test coverage

Code coverage reports show very good coverage of all of the Julia Stdlib packages, but they are not complete. Additionally, the coverage tools themselves (--track-coverage and https://github.com/JuliaCI/Coverage.jl) could be further enhanced, such as to give better accuracy or more precision for statement coverage. A successful project may combine a bit of both: building code and finding faults in others' code.

Another related side project might be to explore adding type information to the coverage reports.
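For reference, the current workflow the project would improve looks roughly like this (the package name Example is a placeholder):

```julia
using Pkg
Pkg.test("Example"; coverage = true)   # emits *.cov files alongside the sources

using Coverage
cov = process_folder()                 # parse the .cov files under src/
covered, total = get_summary(cov)
println("line coverage: $(round(100covered / total; digits = 1))%")
```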

Recommended Skills: An eye for detail, a thrill for filing code issues, and the skill of breaking things.

Contact: Jameson Nash

Multi-threading Improvement Projects

A few ideas to get you started, in brief:

Join the regularly scheduled multithreading call to discuss any of these, via the #multithreading BoF calendar invite on the Julia Language Public Events calendar.

Recommended Skills: Varies by project

Contact: Jameson Nash

Automated performance measurements

The Nanosoldier.jl project (and the related https://github.com/JuliaCI/BaseBenchmarks.jl) tests for performance impacts of some changes. However, there remain many areas that are not covered (such as compile time), while other areas are over-covered (greatly increasing the duration of the tests for no benefit), and some tests may not be configured appropriately for statistical power. Furthermore, the current reports are very primitive and can only do a basic pair-wise comparison, while graphs and other interactive tooling would be more valuable. Thus, there would be many great projects for a summer student to tackle here!
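The benchmarks themselves are defined with BenchmarkTools.jl; a small sketch of a suite and of the kind of pair-wise judgement the current reports are limited to (the benchmark names are made up):

```julia
using BenchmarkTools

suite = BenchmarkGroup()
suite["sum"]  = @benchmarkable sum(v)  setup = (v = rand(1000))
suite["sort"] = @benchmarkable sort(v) setup = (v = rand(1000))

tune!(suite)
old = run(suite)
new = run(suite)   # e.g. rerun on a candidate Julia build

judge(minimum(new["sum"]), minimum(old["sum"]))   # pair-wise comparison
```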

Contact: Jameson Nash, Tim Besard

DeepChem.jl development projects – Summer of Code

Towards DeepChem.jl: Combining Machine Learning with Chemical Knowledge

We have been developing the AtomicGraphNets.jl package, which began modestly as a Julia port of CGCNN, but now has plans to expand to a variety of more advanced graph-based methods for state-of-the-art ML performance making predictions on atomic systems. In support of this package, we are also developing ChemistryFeaturization.jl, which contains functions for building and featurizing atomic graphs from a variety of standard input files. ChemistryFeaturization will eventually form the bedrock of a DeepChem.jl umbrella organization to host a Julia-based port of the popular Deepchem Python package.

Some of the features we're excited about working on include:

Recommended Skills: Basic graph theory and linear algebra, some knowledge of chemistry

Expected Results: Contributions of new features in the eventual DeepChem.jl ecosystem

Mentors: Rachel Kurchin

DFTK.jl development projects – Summer of Code

Automatic differentiation in density-functional theory

Density-functional theory (DFT) is probably the most widespread method for simulating the quantum-chemical behaviour of electrons in matter, and applications cover a wide range of fields such as materials research, chemistry, or pharmacy. For aspects like designing the batteries, catalysts, or drugs of tomorrow, DFT is nowadays a key building block of ongoing research. The aim to tackle even larger and more involved systems with DFT, however, keeps posing novel challenges with respect to physical models, reliability, and performance. For tackling these aspects in the multidisciplinary context of DFT, we recently started the density functional toolkit (DFTK), a DFT package written in pure Julia.

Aside from computing the DFT energy itself, most applications of DFT also require derivatives of the energy with respect to various computational parameters. Examples are the forces (derivative of the energy with respect to atomic positions) and stresses (derivative of the energy with respect to lattice parameters). While the expressions for these derivatives are well-known for the standard DFT approaches, implementing them is still a laborious (and sometimes boring) task. Additionally, deriving the force and stress expressions for novel models currently boils down to doing so manually with pen and paper, which for the more involved models can be non-trivial.

As an alternative, we want to take a look at combining the automatic differentiation (AD) capabilities of the Julia ecosystem with DFTK in order to compute stresses without implementing the derivatives by hand. Instead, we want to make DFTK suitable for AD, such that stresses for our current (and future) DFT models can be computed automatically. Being able to combine DFTK and AD would not only give us stresses, but it would also pave the road for computing even more involved properties using AD. The final stage of the project would require differentiating through the whole of DFTK (including several layers of solvers).
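The flavour of the approach, shown on a toy stand-in for the total energy (the real DFTK energy involves SCF solvers, which is exactly what makes this project hard):

```julia
using ForwardDiff

# Hypothetical stand-in for a total-energy function of a lattice
# scaling parameter a; not a real DFT energy.
energy(a) = a^2 - 2cos(a)

stress = ForwardDiff.derivative(energy, 1.0)   # dE/da, no hand-coded derivative
```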

Project type: Risky and exploratory (essentially a small research project)

Level of difficulty: Hard

Recommended skills: Interest to work on a multidisciplinary project bordering physics, mathematics, and computer science, with a good working knowledge of differential calculus and Julia. Detailed knowledge of the physical background (electrostatics, material science) or of automatic differentiation is not required, but be prepared to take a closer look at these domains during the project.

Expected results: Use automatic differentiation to implement stresses (derivatives of the total energy with respect to lattice parameters) into DFTK.

Mentors: Keno Fischer, Michael F. Herbst, Antoine Levitt

References: For a nice intro to DFT and DFTK.jl see Michael's talk at JuliaCon 2020 and the literature given in the DFTK documentation. A concise introduction into AD are Antoine's notes on the adjoint trick.

Contact: For any questions, feel free to email @mfherbst, @antoine-levitt or write us on our gitter chat.

Numerical Differential Equations Projects – Summer of Code

Native Julia ODE, SDE, DAE, DDE, and (S)PDE Solvers

The DifferentialEquations.jl ecosystem has an extensive set of state-of-the-art methods for solving differential equations hosted by the SciML Scientific Machine Learning Software Organization. By mixing native methods and wrapped methods under the same dispatch system, DifferentialEquations.jl serves both as a system to deploy and research the most modern efficient methodologies. While most of the basic methods have been developed and optimized, many newer methods need high performance implementations and real-world tests of their efficiency claims. In this project students will be paired with current researchers in the discipline to get a handle on some of the latest techniques and build efficient implementations into the *DiffEq libraries (OrdinaryDiffEq.jl, StochasticDiffEq.jl, DelayDiffEq.jl). Possible families of methods to implement are:

Many of these methods are the basis of high-efficiency partial differential equation (PDE) solvers and are thus important to many communities like computational fluid dynamics, mathematical biology, and quantum mechanics.
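Any new method would plug into the common solve interface that users already know; a minimal usage sketch:

```julia
using OrdinaryDiffEq

f(u, p, t) = 1.01u                          # du/dt = 1.01u, u(0) = 1/2
prob = ODEProblem(f, 1 / 2, (0.0, 1.0))
sol = solve(prob, Tsit5(); reltol = 1e-8, abstol = 1e-8)
sol(0.5)                                    # dense-output evaluation at t = 0.5
```

A newly implemented method would simply be passed to solve in place of Tsit5().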

This project is good for both software engineers interested in the field of numerical analysis and those students who are interested in pursuing graduate research in the field.

Recommended Skills: Background knowledge in numerical analysis, numerical linear algebra, and the ability (or eagerness to learn) to write fast code.

Expected Results: Contributions of production-quality solver methods.

Mentors: Chris Rackauckas

Improvements to Physics-Informed Neural Networks (PINNs) for solving differential equations

Neural networks can be used as a method for efficiently solving difficult partial differential equations. Efficient implementations of physics-informed machine learning from recent papers are being explored as part of the NeuralPDE.jl package. The issue tracker contains links to papers describing interesting new neural-network-based methods that could be implemented and benchmarked against classical techniques.
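A bare-bones illustration of the PINN idea with Flux, fitting u'(t) = cos(t), u(0) = 0 by penalizing a finite-difference residual at collocation points; NeuralPDE.jl does this properly (AD-based residuals, symbolic problem descriptions), so everything here is a simplified sketch with made-up hyper-parameters.

```julia
using Flux

net = Chain(Dense(1, 16, tanh), Dense(16, 1))
u(t) = net([t])[1]                          # neural ansatz for the solution

ts = collect(Float32, range(0, 6; length = 64))        # collocation points
residual(t; h = 1f-2) = (u(t + h) - u(t - h)) / (2h) - cos(t)
pinnloss() = sum(abs2, residual.(ts)) + 100f0 * abs2(u(0f0))  # PDE + BC terms

ps = Flux.params(net)
opt = ADAM(1e-2)
for _ in 1:1000
    gs = Flux.gradient(pinnloss, ps)
    Flux.Optimise.update!(opt, ps, gs)
end
```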

Recommended Skills: Background knowledge in numerical analysis and machine learning.

Expected Results: New neural network based solver methods.

Mentors: Chris Rackauckas

Performance enhancements for differential equation solvers

Wouldn't it be cool to have had a part in the development of widely used efficient differential equation solvers? DifferentialEquations.jl has a wide range of existing methods and an extensive benchmark suite which is used for tuning the methods for performance. Many of its methods are already the fastest in their class, but there is still a lot of performance enhancement work that can be done. In this project you can learn the details about a wide range of methods and dig into the optimization of the algorithm's strategy and the implementation in order to improve benchmarks. Projects that could potentially improve the performance of the full differential equations ecosystem include:

Recommended Skills: Background knowledge in numerical analysis, numerical linear algebra, and the ability (or eagerness to learn) to write fast code.

Expected Results: Improved benchmarks to share with the community.

Mentors: Chris Rackauckas

Discretizations of partial differential equations

There are two ways to approach libraries for partial differential equations (PDEs): one can build "toolkits" which enable users to discretize any PDE but require knowledge of numerical PDE methods, or one can build "full-stop" PDE solvers for specific PDEs. There are many different ways solving PDEs could be approached, and here are some ideas for potential projects:

  1. Automated PDE discretization tooling. We want users to describe a PDE in its mathematical form and automate the rest of the solution process. See this issue for details.

  2. Enhancement of existing tools for discretizing PDEs. The finite differencing (FDM) library DiffEqOperators.jl could be enhanced to allow non-uniform grids or composition of operators. The finite element method (FEM) library FEniCS.jl could wrap more of the FEniCS library.

  3. Full stop solvers of common fluid dynamical equations, such as diffusion-advection-convection equations, or of hyperbolic PDEs such as the Hamilton-Jacobi-Bellman equations would be useful to many users.

  4. Using stochastic differential equation (SDE) solvers to efficiently (and in a highly parallel fashion) approximate certain PDEs.

  5. Development of ODE solvers for more efficiently solving specific types of PDE discretizations. See the "Native Julia solvers for ordinary differential equations" project.

Recommended Skills: Background knowledge in numerical methods for solving differential equations. Some basic knowledge of PDEs, but mostly a willingness to learn and a strong understanding of calculus and linear algebra.

Expected Results: A production-quality PDE solver package for some common PDEs.

Mentors: Chris Rackauckas

Tools for global sensitivity analysis

Global Sensitivity Analysis is a popular tool to assess the effect that parameters have on a differential equation model. A good introduction can be found in this thesis. Global Sensitivity Analysis tools can be much more efficient than Local Sensitivity Analysis tools, and give a better view of how parameters affect the model in a more general sense. The goal of this project would be to implement more global sensitivity analysis methods like the eFAST method into DiffEqSensitivity.jl which can be used with any differential equation solver on the common interface.
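As a reminder of what such methods compute, here is a generic pick-and-freeze Monte Carlo estimator of the first-order Sobol index in plain Julia (a sketch, not the DiffEqSensitivity.jl API):

```julia
using Statistics

# First-order Sobol index S_i of f: R^k -> R with uniform [0,1] inputs,
# via the Saltelli-style pick-and-freeze estimator.
function sobol_first(f, k, i; N = 100_000)
    A, B = rand(N, k), rand(N, k)
    Bi = copy(B); Bi[:, i] = A[:, i]        # freeze column i at A's values
    yA  = [f(A[n, :])  for n in 1:N]
    yB  = [f(B[n, :])  for n in 1:N]
    yBi = [f(Bi[n, :]) for n in 1:N]
    return mean(yA .* (yBi .- yB)) / var(yA)
end

sobol_first(p -> 4p[1] + p[2], 2, 1)        # ≈ 16/17 for this linear toy model
```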

Recommended Skills: An understanding of how to use DifferentialEquations.jl to solve equations.

Expected Results: Efficient functions for performing global sensitivity analysis.

Mentors: Chris Rackauckas, Vaibhav Dixit

Parameter identifiability analysis

Parameter identifiability analysis is an analysis that describes whether the parameters of a dynamical system can be identified from data or whether they are redundant. There are two forms of identifiability analysis: structural and practical. Structural identifiability analysis relates changes in the solution of the ODE directly to other parameters, showcasing that it is impossible to distinguish between parameter A being higher and parameter B being lower, or the vice versa situation, given only data about the solution because of how the two interact. This could be done directly on the symbolic form of the equation as part of ModelingToolkit.jl. Meanwhile, practical identifiability analysis looks as to whether the parameters are non-identifiable in a practical sense, for example if two parameters are numerically indistinguishable (given possibly noisy data). In this case, numerical techniques being built in DiffEqSensitivity.jl, such as a nonlinear likelihood profiler or an information sensitivity measure can be used to showcase whether a parameter has a unique enough effect to be determined.

Recommended Skills: A basic background in differential equations and the ability to use numerical ODE solver libraries. Background in the numerical analysis of differential equation solvers is not required.

Expected Results: Efficient and high-quality implementations of parameter identifiability methods.

Mentors: Chris Rackauckas

Model Order Reduction

Model order reduction is a technique for automatically finding a small model which approximates the large model but is computationally much cheaper. We plan to use the infrastructure built by ModelingToolkit.jl to implement a litany of methods and find out the best way to accelerate differential equation solves.

Recommended Skills: A basic background in differential equations and the ability to use numerical ODE solver libraries. Background in the numerical analysis of differential equation solvers is not required.

Expected Results: Efficient and high-quality implementations of model order reduction methods.

Mentors: Chris Rackauckas

Automated symbolic manipulations of differential equation systems

Numerically solving a differential equation can be difficult, and thus it can be helpful for users to simplify their model before handing it to the solver. Alas this takes time... so let's automate it! ModelingToolkit.jl is a project for automating the model transformation process. Various parts of the library are still open, such as:

Recommended Skills: A basic background in differential equations and the ability to use numerical ODE solver libraries. Background in the numerical analysis of differential equation solvers is not required.

Expected Results: Efficient and high-quality implementations of model transformation methods.

Mentors: Chris Rackauckas

Documentation tooling

Documenter.jl

The Julia manual and the documentation for a large chunk of the ecosystem is generated using Documenter.jl – essentially a static site generator that integrates with Julia and its docsystem. There are tons of opportunities for improvements for anyone interested in working on the interface of Julia, documentation and various front-end technologies (web, LaTeX).

Recommended skills: Basic knowledge of web-development (JS, CSS, HTML) or LaTeX, depending on the project.

Mentors: Morten Piibeleht

Docsystem API

Julia supports docstrings – inline documentation which gets parsed together with the code and can be accessed dynamically in a Julia session (e.g. via the REPL ?> help mode; implemented mostly in the Docs module).

Not all docstrings are created equal, however. There are bugs in Julia's docsystem code, which means that some docstrings do not get stored or are stored with the wrong key (parametric methods). In addition, the API to fetch and work with docstrings programmatically is not documented, not considered public, and could use some polishing.
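The non-public API in question is small; for example, one can already do the following, which the project would document and polish:

```julia
# Fetching documentation programmatically, outside the REPL help mode:
md = @doc sum            # Markdown object holding sum's docstring(s)

# The underlying storage is a per-module dictionary keyed by bindings;
# this is part of the non-public API in question:
meta = Docs.meta(Base)   # IdDict of Docs.Binding => docstring storage
```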

Recommended skills: Basic familiarity with Julia is sufficient.

Mentors: Morten Piibeleht

Machine Learning Projects - Summer of Code

CUDA Hacking

Are you a performance nut? Help us implement cutting-edge CUDA kernels in Julia for operations important across deep learning, scientific computing and more. We also need help developing our wrappers for machine learning, sparse matrices and more, as well as CI and infrastructure. Contact us to develop a project plan.

Mentors: Tim Besard, Dhairya Gandhi.

Reinforcement Learning Environments

Develop a series of reinforcement learning environments, in the spirit of the OpenAI Gym. Although we have wrappers for the gym available, it is hard to install (due to the Python dependency) and, since it's written in Python and C code, we can't do more interesting things with it (such as differentiate through the environments). A pure-Julia version that supports a similar API and visualisation options would be valuable to anyone doing RL with Flux.
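A hypothetical sketch of the kind of lightweight environment interface such a package could expose (the names reset! and step! are illustrative, not an existing API):

```julia
# Toy environment: a random walk on the line 1..N; reach N to win.
mutable struct RandomWalkEnv
    pos::Int
    N::Int
end

reset!(env::RandomWalkEnv) = (env.pos = env.N ÷ 2; env.pos)

function step!(env::RandomWalkEnv, action::Int)   # action ∈ (-1, +1)
    env.pos = clamp(env.pos + action, 1, env.N)
    reward = env.pos == env.N ? 1.0 : 0.0
    done = env.pos == 1 || env.pos == env.N
    return env.pos, reward, done
end

env = RandomWalkEnv(0, 11)
s = reset!(env)
s, r, done = step!(env, rand((-1, 1)))
```

Because everything is plain Julia, such environments can in principle be differentiated through, unlike Python/C implementations.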

Mentors: Dhairya Gandhi.

Reinforcement Learning Algorithms

Recent advances in reinforcement learning led to many breakthroughs in artificial intelligence. Some of the latest deep reinforcement learning algorithms have been implemented in ReinforcementLearning.jl with Flux. We'd like to have more interesting and practical algorithms added to enrich the whole community, including but not limited to the following directions:

Expected Outcomes

For each new algorithm, at least two experiments are expected to be added to ReinforcementLearningZoo.jl: a simple one to make sure it works on some toy games with CPU only, and a more practical one to produce results comparable to the original paper with GPU enabled. Besides this, a technical report on the implementation details and a speed/performance comparison with other baselines is preferred.

Mentors: Jun Tian

AlphaZero.jl

The philosophy of the AlphaZero.jl project is to provide an implementation of AlphaZero that is simple enough to be widely accessible for students and researchers, while also being sufficiently powerful and fast to enable meaningful experiments on limited computing resources (our latest release is consistently between one and two orders of magnitude faster than competing Python implementations).

Here are a few project ideas that build on AlphaZero.jl. Please contact us for additional details and let us know about your experience and interests so that we can build a project that best suits your profile.

Expected Outcomes

In all these projects, the goal is not only to showcase the current Julia ecosystem and test its limits, but also to push it forward through concrete contributions that other people can build on. Such contributions include:

Mentors: Jonathan Laurent

NLP Tools and Models

Difficulty: Medium to Hard

Build deep learning models for Natural Language Processing in Julia. TextAnalysis and WordTokenizers contain the basic algorithms and data structures to work with textual data in Julia. On top of that base, we want to build modern deep learning models based on recent research. The following tasks can span multiple students and projects.

It is important to note that we want practical, usable solutions to be created, not just research models. This implies that a large part of the effort will need to be in finding and using training data, and testing the models over a wide variety of domains. Pre-trained models must be available to users, who should be able to start using these without supplying their own training data.

Mentors: Avik Sengupta

Automated music generation

Difficulty: Hard

Neural network based models can be used for music analysis and music generation (composition). A suite of tools in Julia to enable research in this area would be useful. This is a large, complex project that is suited for someone with an interest in music and machine learning. This project will need a mechanism to read music files (primarily MIDI), a way to synthesise sounds, and finally a model to learn composition. All of this is admittedly a lot of work, so the exact boundaries of the project can be flexible, but this can be an exciting project if you are interested in both music and machine learning.

Recommended Skills: Music notation, some basic music theory, MIDI format, Transformer and LSTM architectures

Resources: Music Transformer, Wave2MIDI2Wave, MIDI.jl, Mplay.jl

Mentors: Avik Sengupta

Flux.jl

Flux usually takes part in Google Summer of Code, as part of the wider Julia organisation. We follow the same rules and application guidelines as Julia, so please check there for more information on applying. Below are a set of ideas for potential projects (though you are welcome to explore anything you are interested in).

Flux projects are typically very competitive; we encourage you to get started early, as successful students typically have early PRs or working prototypes as part of the application. It is a good idea to simply start contributing via issue discussion and PRs and let a project grow from there; you can take a look at this list of issues for some starter contributions.

Port ML Tutorials

There are many high-quality open-source tutorials and learning materials available, for example from PyTorch and fast.ai. We'd like to have Flux ports of these that we can add to the model zoo, and eventually publish to the Flux website.

Mentors: Dhairya Gandhi.

FermiNets: Generative Synthesis for Automating the Choice of Neural Architectures

The application of machine learning requires a practitioner to understand how to optimize a neural architecture for a given problem, or does it? Recently, techniques in automated machine learning, also known as AutoML, have dropped this requirement by allowing for good architectures to be found automatically. One such method is the FermiNet, which employs generative synthesis to give a neural architecture which respects certain operational requirements. The goal of this project is to implement the FermiNet in Flux to allow for automated synthesis of neural networks.

Mentors: Chris Rackauckas and Dhairya Gandhi.

Differentiable Rendering [HARD]

Expected Outcome: This project is motivated by SoftRasterizer/DiB-R-style differentiable renderers. We already have RayTracer.jl, which is motivated by OpenDR. (Of course, if someone wants to implement NeRF-like models, they are most welcome to submit a proposal.) We would ideally target at least two of these models.

Skills: GPU Programming, Deep Learning, (deep) familiarity with the literature, familiarity with defining (a lot of) Custom Adjoints

Mentors: Dhairya Gandhi, Julian Samaroo, Avik Pal

Core Development [MEDIUM]

Expected Outcomes:

Skills: GPU Programming, Deep Learning, familiarity with defining (a lot of) Custom Adjoints

Mentors: Dhairya Gandhi

FastAI.jl Development

Difficulty: Medium

In this project, you will assist the ML community team with building FastAI.jl on top of the existing JuliaML + FluxML ecosystem packages. The primary goal is to create an equivalent to docs.fast.ai. This will require building the APIs, documenting them, and creating the appropriate tutorials. Some familiarity with the following Julia packages is preferred, but it is not required:

A stretch goal can include extending FastAI.jl beyond its Python-equivalent by leveraging the flexibility in the underlying Julia packages. For example, creating and designing abstractions for distributed data parallel training.

Skills: Familiarity with deep learning pipelines, common practices, Flux.jl, and MLDataPattern.jl

Mentors: Kyle Daruwalla

Differentiable Computer Vision [HARD]

Expected Outcome:

Create a library of utility functions that can consume Julia's imaging libraries to make them differentiable. With Zygote.jl, we have the platform to take a general-purpose package and apply automatic differentiation to it. This project is motivated to take existing libraries that perform computer vision tasks and augment them with AD to perform tasks such as homography regression.
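A toy illustration of the pattern with Zygote.jl, differentiating a photometric loss through a simple image-space transform; real tasks like homography regression would differentiate through warping operators instead:

```julia
using Zygote

# Simple stand-in for an image transform: global brightness scaling.
brighten(img, k) = clamp.(img .* k, 0, 1)

img, target = rand(32, 32), rand(32, 32)
loss(k) = sum(abs2, brighten(img, k) .- target)

g = Zygote.gradient(loss, 1.2)[1]   # d(loss)/dk, no custom adjoint needed
```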

Skills: Familiarity with automatic differentiation, deep learning, and defining (a lot of) Custom Adjoints

Mentors: Dhairya Gandhi

Deep Learning for source code analysis

Difficulty: Easy to Medium

The application of deep learning tools to source code is an active area of research. With the runtime being able to easily introspect into Julia code (for example, with a clean, accessible AST format), using these techniques on Julia code would be a fruitful exercise.

Recommended Skills: Familiarity with compiler techniques as well as deep learning tools will be required. The "domain expertise" in this task is Julia programming, so it will need someone who has a reasonable experience of the Julia programming language.

Expected Outcome: Packages for each technique that is usable by general programmers.

Mentors: Avik Sengupta

High Performance and Parallel Computing Projects – Summer of Code

Julia is emerging as a serious tool for technical computing and is ideally suited for the ever-growing needs of big data analytics. This set of proposed projects addresses specific areas for improvement in analytics algorithms and distributed data management.

Difficulty: Medium

Scheduling algorithms for Distributed algorithms

Dagger.jl is a native Julia framework and scheduler for distributed execution of Julia code and general purpose data parallelism, using dynamic, runtime-generated task graphs which are flexible enough to describe multiple classes of parallel algorithms. This project proposes to implement different scheduling algorithms for Dagger to optimize scheduling of certain classes of distributed algorithms, such as MapReduce and MergeSort, and properly utilizing heterogeneous compute resources. Students will be expected to find published distributed scheduling algorithms and implement them on top of the Dagger framework, benchmarking scheduling performance on a variety of micro-benchmarks and real problems.
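For context, Dagger expresses computations as task graphs whose placement is decided by the scheduler this project would extend; a minimal sketch:

```julia
using Distributed
addprocs(2)
@everywhere using Dagger

# A tiny task graph; the scheduler decides where each thunk runs.
a = Dagger.delayed(+)(1, 2)
b = Dagger.delayed(+)(3, 4)
c = Dagger.delayed(*)(a, b)
collect(c)   # == 21, executed across the available workers
```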

Mentors: Julian Samaroo, Valentin Churavy

Distributed Training

Difficulty: Hard

Add a distributed training API for Flux models built on top of Dagger.jl. More detailed milestones include building Dagger.jl abstractions for UCX.jl, then building tools to map Flux models into data parallel Dagger DAGs. The final result should demonstrate a Flux model training with multiple devices in parallel via the Dagger.jl APIs. A stretch goal will include mapping operations with a model to a DAG to facilitate model parallelism as well.

Skills: Familiarity with UCX, representing execution models as DAGs, Flux.jl, and data/model parallelism in machine learning

Mentors: Kyle Daruwalla, Julian Samaroo, and Brian Chen

JuliaImages Projects – Summer of Code

JuliaImages (see the documentation) is a framework in Julia for multidimensional arrays, image processing, and computer vision (CV). It has an active development community and offers many features that unify CV and biomedical 3D/4D image processing, support big data, and promote interactive exploration.

Often the best ideas are the ones that candidate SoC students come up with on their own. We are happy to discuss such ideas and help you refine your proposal. Below are some potential project ideas that might help spur some thoughts. See the bottom of this page for information about mentors.

Wide-ranging demos (easy)

Description

For new or occasional users, JuliaImages would benefit from a large collection of complete worked examples organized by topic. While the current documentation contains many "mini-demos," they are scattered; an organized page would help users quickly find what they need. We have set up a landing page, but many more demos are needed. Scikit-image is one potential model.

Notes:

Skills

The applicant should be familiar with JuliaImages, and should be able to write good technical blogs in English.

Expected Outcomes

Mentors

Johnny Chen and Tim Holy

Benchmarking against other frameworks (medium)

Description

JuliaImages provides high-quality implementations of many algorithms; however, as yet there is no set of benchmarks that compare our code against that of other image-processing frameworks. Developing such benchmarks would allow us to advertise our strengths and/or identify opportunities for further improvement. See also the OpenCV project below.

Skills

Experience with JuliaImages is required. Some familiarity with other image-processing frameworks is preferred.

Expected Outcomes

Benchmarks for several performance-sensitive packages (e.g., ImageFiltering, ImageTransformations, ImageMorphology, ImageContrastAdjustment, ImageEdgeDetection, ImageFeatures, and/or ImageSegmentation) against frameworks like Scikit-image and OpenCV, and optionally others like ITK, ImageMagick, and Matlab/Octave.

This task splits into at least two pieces:

One should also be aware of the fact that differences in implementation (which may include differences in quality) may complicate the interpretation of some benchmarks.
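On the Julia side, the timings would likely be collected with BenchmarkTools.jl; a sketch for one ImageFiltering operation (the test image and kernel size are arbitrary choices):

```julia
using BenchmarkTools, ImageFiltering, TestImages

img = testimage("cameraman")
@btime imfilter($img, Kernel.gaussian(3));
# the same operation would then be timed in scikit-image/OpenCV on identical data
```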

Mentors

Tim Holy and Johnny Chen

GPU support for many algorithms (hard)

Description

JuliaImages supports many common algorithms, but targets only the CPU. With Julia now possessing first-in-class support for GPUs, it is time to provide GPU implementations of many of the same algorithms.

KernelAbstractions may make it easier to support both CPU and GPU with a common implementation.
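A small sketch of the single-source pattern with KernelAbstractions (using its 0.x event-based launch API; details may differ between versions):

```julia
using KernelAbstractions

# One kernel body, instantiable for CPU or GPU backends.
@kernel function brighten!(out, img, k)
    I = @index(Global)
    @inbounds out[I] = min(one(eltype(img)), img[I] * k)
end

img = rand(Float32, 512, 512)
out = similar(img)
kern = brighten!(CPU(), 64)                           # CPU backend, workgroup 64
event = kern(out, img, 1.2f0; ndrange = length(img))
wait(event)                                           # kernels launch asynchronously
```

Swapping CPU() for a GPU backend would run the same kernel body on the device.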

Skills

Familiarity with CUDA programming in Julia, i.e., CUDA.jl is required.

Expected Outcomes

Fairly widespread GPU support for a single nontrivial package. ImageFiltering would be a good choice.

Mentors

Tim Holy and Johnny Chen

Better ImageMagick supports (medium)

Description

ImageMagick is a widely used low-level image I/O and processing library. It has a Julia frontend, ImageMagick.jl, which is used widely across the Julia ecosystem. However, the ImageMagick.jl project is not under active maintenance; it lacks necessary documentation and has little test coverage. The applicant will revisit and upgrade the ImageMagick.jl codebase to enhance the package.

Skills

Experience with Linux cross-compilation, C, and Julia is required. Familiarity with the ImageMagick library is preferred.

Expected Outcomes

Mentors

Tim Holy and Johnny Chen

Better ImageIO supports (medium)

Description

Besides the gigantic ImageMagick library, Julia also provides a lighter ImageIO package for PNG, TIFF, and Netpbm image formats. However, there are more widely-used image formats (e.g., JPEG, GIF) that are not supported by ImageIO yet. The applicant will add support for the IO of a new image format by either 1) wrapping available C libraries via BinaryBuilder, or 2) re-implementing the functionality in pure Julia.

Skills

Experience with Julia is required. For library-wrapping projects, experience with cross-compilation on Linux is required, and familiarity with the source language (e.g., C) is preferred.

Expected Outcomes

Add at least one image format support.

Mentors

Ian Butterworth, Johnny Chen and Tim Holy

Interactivity and visualization tools (open-ended)

Description

Image processing often involves tight interaction between algorithms and visualization. While there are a number of older tools available, leveraging GLVisualize seems to hold the greatest promise. This project might implement a number of interactive tools for region-of-interest selection, annotation, measurement, and modification. Software suites like OpenCV, ImageJ/Fiji, scikit-image, and Matlab might serve as inspiration.

JuliaImages also provides several non-GUI visualization tools, e.g., ImageDraw.jl, ImageInTerminal.jl, ImageShow.jl and MosaicViews.jl. Improving these packages is also a good project idea.

Skills

For ImageView.jl and similar GUI projects, familiarity with GUI programming is required. For non-GUI projects, familiarity with Julia's array interfaces is preferred.

Mentors

Tim Holy. For non-GUI projects, Johnny Chen is also available.

Integration of OpenCV and JuliaImages (hard)

Description

OpenCV is one of the pre-eminent image-processing frameworks. During the summer of 2020, significant progress was made on a Julia wrapper. An important remaining task is to integrate the wrapper with Julia's binary packaging system.

Skills

Experience with C++ is required. Some familiarity with Julia, BinaryBuilder.jl, and CxxWrap.jl is preferred.

Expected Outcomes

An OpenCV package that can be installed across all major platforms with Pkg.add("OpenCV").

Mentors

Tim Holy

Contributions to a Stereo Matching Package (medium)

Description

When two images are taken of a scene with a calibrated stereo rig it is possible to construct a three-dimensional model of the scene provided that one can determine the coordinates of corresponding points in the two images. The task of determining the coordinates of corresponding points is frequently called stereo matching or disparity estimation. Numerous algorithms for this task have been proposed over the years and new ones continue to be developed.

This project will implement several stereo matching algorithms. Emphasis will be placed on efficient implementations which leverage all of Julia's features for writing fast code.
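As a baseline for comparison, a naive sum-of-absolute-differences block matcher along scanlines can be written in a few lines of plain Julia; the window radius w and disparity range maxd are arbitrary choices.

```julia
# Naive SAD block matcher: for each pixel of the left image, find the
# horizontal shift d that best aligns a window in the right image.
function sad_disparity(L, R; w = 4, maxd = 32)
    H, W = size(L)
    D = zeros(Int, H, W)
    for y in (1 + w):(H - w), x in (1 + w + maxd):(W - w)
        best, bestd = Inf, 0
        for d in 0:maxd
            c = sum(abs, @view(L[y-w:y+w, x-w:x+w]) .-
                         @view(R[y-w:y+w, x-d-w:x-d+w]))
            if c < best
                best, bestd = c, d
            end
        end
        D[y, x] = bestd
    end
    return D
end
```

The listed algorithms improve on this baseline with slanted windows, semi-global cost aggregation, and mutual-information matching costs.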

Example algorithms:

  1. Bleyer, Michael, Christoph Rhemann, and Carsten Rother. "PatchMatch Stereo-Stereo Matching with Slanted Support Windows." Bmvc. Vol. 11. 2011.

  2. Hirschmuller, Heiko. "Accurate and efficient stereo processing by semi-global matching and mutual information." Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on. Vol. 2. IEEE, 2005.

  3. Gehrig, Stefan K., and Clemens Rabe. "Real-time semi-global matching on the CPU." Computer Vision and Pattern Recognition Workshops (CVPRW), 2010 IEEE Computer Society Conference on. IEEE, 2010.

Skills

Experience with JuliaImages is required. Familiarity with the algorithms is preferred.

Expected Outcomes

A library of stereo matching algorithms with usage tutorials and documentation.

Mentors

Zygmunt Szpak

Contributions to a Calibration Target package (medium)

Description

Camera calibration involves determining a camera's intrinsic parameters from a series of images of a so-called "calibration target". Knowledge of the intrinsic parameters facilitates three-dimensional reconstruction from images or video. The most frequently used calibration target is a checkerboard pattern. A key step in camera calibration involves automatically detecting the checkerboard and identifying landmarks such as the corners of each checkerboard square.

This project will implement a recent automatic checkerboard detection and feature extraction algorithm.

Example algorithm:

  1. Y. Yan, P. Yang, L. Yan, J. Wan, Y. Sun, and K. Tansey, “Automatic checkerboard detection for camera calibration using self-correlation,” Journal of Electronic Imaging, vol. 27, no. 03, p. 1, May 2018.

Skills

Experience with JuliaImages is required. Familiarity with the algorithms is preferred.

Expected Outcomes

A checkerboard detection algorithm which can provide the necessary inputs to a camera calibration routine.

Mentors

Zygmunt Szpak

Where to go for discussion and to find mentors

Interested students are encouraged to open a discussion in Images.jl to introduce themselves and discuss the detailed project ideas. To increase the chance of getting useful feedback, please provide detailed plans and ideas (don't just copy the contents here).

Javis Projects – Summer of Code

Javis: Julia Animations and VISualizations

Javis.jl is a general purpose Julia library to easily construct informative, performant, and winsome animated graphics. Javis provides a powerful grammar for users to make animated visuals. Users of Javis have made animations to explain concepts in a variety of fields, from mathematical concepts like the Fourier transform to brain imaging with EEGs. It builds on top of the Julia drawing framework Luxor by adding functions to simplify the creation of objects and their actions.

The Summer of Code Javis projects aim at simplifying the creation of animations to explain difficult concepts and at communicating to broad audiences how Julia is a strong tool for graphics creation.

Below you can find a list of potential projects that can be tackled during Google Summer of Code. If interested in exploring any of these projects, please reach out to:

Thanks for your interest! 🎉

General Improvement to User Experience

Mentors: Ole Kröger, Jacob Zelko

Recommended skills: General understanding of Luxor and the underlying structure of Javis.

Difficulty: Medium

Description: This project is split across several tasks that are manageable enough to be worked on by a single student in the Google Summer of Code period. These small tasks come together to create an easier and more understandable syntax for Javis-based animated graphic creation. The following are the smaller tasks one could work on:

Graph and networks

Mentors: Ole Kröger, Jacob Zelko

Recommended skills: Knowledge about graph theory and LightGraphs.jl

Difficulty: Hard

Description: Javis could be a powerful platform to easily animate problems and their solutions in a variety of different fields. Currently, Javis lacks the ability to visualize graphs. The goal for this project would be to add graph support to Javis by supporting interoperability with LightGraphs.jl. The animation of flows and shortest path is something that's extremely valuable for teaching as well as in practical analysis of graph networks. To learn more about the current thoughts surrounding this problem, check this issue for more information.

Linear algebra

Mentors: Ole Kröger, Jacob Zelko

Recommended skills: Basic to intermediate knowledge about linear algebra.

Difficulty: Easy

Description: Linear algebra is of invaluable importance all across different fields of mathematics and engineering. Enabling the easy creation of visualizations regarding rotations, matrices, and other concepts is helpful in educating students about this amazing branch of mathematics. Here are a few issues related to tasks that could be worked on to bring about this capability:

Dynamical systems, complex systems & nonlinear dynamics – Summer of Code

Agents.jl

Difficulty: Easy to Medium.

Agents.jl is a pure Julia framework for agent-based modeling (ABM). It has an extensive list of features, excellent performance, and is easy to learn, use, and extend. Comparisons with other popular frameworks written in Python or Java (NetLogo, MASON, Mesa) show that Agents.jl outperforms all of them in computational speed, list of features, and usability.
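For a feel of the framework, a minimal random-walk ABM in the Agents.jl 4.x style might look as follows (a sketch; consult the Agents.jl docs for the current API):

```julia
using Agents

mutable struct Walker <: AbstractAgent
    id::Int
    pos::NTuple{2,Int}
end

model = ABM(Walker, GridSpace((10, 10)))
for _ in 1:5
    add_agent!(model)                 # place a Walker at a random cell
end

agent_step!(agent, model) = move_agent!(agent, model)  # hop to a random cell
step!(model, agent_step!, 10)         # run 10 steps
```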

In this project students will be paired with lead developers of Agents.jl to improve Agents.jl with more features, better performance, and overall higher polish. Possible features to implement are:

Recommended Skills: Familiarity with agent based modelling, Agents.jl and Julia's Type System. Background in complex systems, sociology, or nonlinear dynamics is not required.

Expected Results: Well-documented, well-tested useful new features for Agents.jl.

Mentors: George Datseris, Tim DuBois

DynamicalSystems.jl

Difficulty: Easy to Hard, depending on the algorithm chosen

DynamicalSystems.jl is an award-winning Julia software library for dynamical systems, nonlinear dynamics, deterministic chaos and nonlinear timeseries analysis. It has an impressive list of features, but one can never have enough. In this project students will be able to enrich DynamicalSystems.jl with new algorithms and enrich their knowledge of nonlinear dynamics and computer-assisted exploration of complex systems.

Possible projects are summarized in the wanted-features list of the library.

Examples include but are not limited to:

and many more.
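New algorithms would sit next to the library's existing high-level functions, which follow this usage pattern (a sketch against the DynamicalSystems.jl API at the time of writing):

```julia
using DynamicalSystems

ds = Systems.lorenz()          # predefined Lorenz-63 system
tr = trajectory(ds, 100.0)     # integrate and collect a dataset
λ = lyapunov(ds, 10_000)       # largest Lyapunov exponent, ≈ 0.9 for Lorenz
```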

Recommended Skills: Familiarity with nonlinear dynamics and/or differential equations and the Julia language.

Expected Results: Well-documented, well-tested new algorithms for DynamicalSystems.jl.

Mentors: George Datseris

Stochastic differential equations and continuous time signal processing – Summer of Code

Smoothing non-linear continuous time systems

The student implements a state-of-the-art smoother for continuous-time systems with additive Gaussian noise. The system's dynamics can be described as an ordinary differential equation with locally additive Gaussian random fluctuations, in other words a stochastic ordinary differential equation.

Given a series of measurements observed over time, containing statistical noise and other inaccuracies, the task is to produce an estimate of the unknown trajectory of the system that led to the observations.

Linear continuous-time systems are smoothed with the fixed-lag Kalman–Bucy smoother (related to the Kalman–Bucy filter). It relies on coupled ODEs describing how the mean and covariance of the conditional distribution of the latent system state evolve over time. A versatile implementation in Julia is missing.
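To fix ideas, the latent dynamics are an SDE that can be simulated with StochasticDiffEq.jl; the smoother would then condition such paths on noisy observations. A toy Ornstein–Uhlenbeck example (parameters arbitrary):

```julia
using StochasticDiffEq

# Latent Ornstein-Uhlenbeck state, dx = -x dt + 0.5 dW; the kind of SDE whose
# conditional mean/covariance ODEs a Kalman-Bucy smoother propagates.
f(x, p, t) = -x
g(x, p, t) = 0.5
prob = SDEProblem(f, g, 1.0, (0.0, 5.0))
sol = solve(prob, EM(); dt = 0.01)
# noisy observations of sol(t) would then be smoothed to recover the path
```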

Expected Results: Build efficient implementation of non-linear smoothing of continuous stochastic dynamical systems.

Recommended Skills: Gaussian random variables, Bayes' formula, Stochastic Differential Equations

Mentors: Moritz Schauer

Rating: Hard

Numerical Projects – Summer of Code

Numerical Linear Algebra

Matrix functions

Matrix functions map matrices onto other matrices, and can often be interpreted as generalizations of ordinary functions like sine and exponential, which map numbers to numbers. Once considered a niche province of numerical algorithms, matrix functions now appear routinely in applications to cryptography, aircraft design, nonlinear dynamics, and finance.

This project proposes to implement state-of-the-art algorithms that extend the currently available matrix functions in Julia, as outlined in issue #5840. In addition to matrix generalizations of standard functions such as real matrix powers, surds, and logarithms, students will be challenged to design generic interfaces for lifting general scalar-valued functions to their matrix analogues, enabling efficient computation of arbitrary (well-behaved) matrix functions and their derivatives.
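
As a flavor of what such a lifting interface might look like, here is a minimal sketch (the `matfun` name and design are illustrative, not part of any existing API). Julia's LinearAlgebra already ships `exp`, `log`, and `sqrt` for matrices, and a naive generic lift via eigendecomposition works for diagonalizable inputs:

```julia
using LinearAlgebra

A = [1.0 1.0; 0.0 2.0]

# A few matrix functions already available in LinearAlgebra:
exp(A)    # matrix exponential
log(A)    # matrix logarithm
sqrt(A)   # principal matrix square root

# Naive generic lifting of a scalar function f to a matrix argument,
# valid only for diagonalizable matrices: f(A) = V * f(Λ) * V⁻¹
function matfun(f, A::AbstractMatrix)
    λ, V = eigen(A)
    return V * Diagonal(f.(λ)) / V
end

matfun(sin, A)   # ≈ the matrix sine of A
```

A production implementation would instead use Schur-based algorithms, which remain stable for defective and nearly defective matrices.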

Recommended Skills: A strong understanding of calculus and numerical analysis.

Expected Results: New and faster methods for evaluating matrix functions.

Mentors: Jiahao Chen, Steven Johnson.

Difficulty: Hard

Better Bignums Integration

Julia currently supports big integers and rationals, making use of the GMP library. However, GMP currently doesn't permit good integration with a garbage collector.

This project therefore involves exploring ways to improve BigInt, possibly including:

This experimentation could be carried out as a package with a new implementation, or as patches over the existing implementation in Base.
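
To see the overhead in question, compare an allocating BigInt loop with one that mutates storage through Julia's internal MPZ wrappers. These wrappers are unexported, undocumented, and subject to change; they are shown here purely to illustrate the potential gains:

```julia
# Every arithmetic operation on BigInt allocates a fresh GMP integer,
# which is exactly the overhead this project targets.
function sum_allocating(n)
    s = big(0)
    for i in 1:n
        s = s + big(i)    # allocates new BigInts every iteration
    end
    return s
end

# In-place variant using Julia's internal (unexported, unstable) MPZ wrappers:
function sum_inplace(n)
    s, t = big(0), big(0)
    for i in 1:n
        Base.GMP.MPZ.set_si!(t, i)     # t = i, reusing t's storage
        Base.GMP.MPZ.add!(s, s, t)     # s = s + t, in place
    end
    return s
end

@time sum_allocating(10^6)   # many allocations
@time sum_inplace(10^6)      # far fewer allocations
```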

Expected Results: An implementation of BigInt in Julia with increased performance over the current one.

Required Skills: Familiarity with extended-precision numerics or performance considerations. Familiarity with either Julia or GMP.

Mentors: Jameson Nash

Difficulty: Hard

Special functions

As a technical computing language, Julia provides a huge number of special functions, both in Base as well as packages such as StatsFuns.jl. At the moment, many of these are implemented in external libraries such as Rmath and openspecfun. This project would involve implementing these functions in native Julia (possibly utilising the work in SpecialFunctions.jl), seeking out opportunities for possible improvements along the way, such as supporting Float32 and BigFloat, exploiting fused multiply-add operations, and improving errors and boundary cases.
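
As a small illustration of the implementation techniques involved, the sketch below evaluates a truncated Taylor polynomial for exp with Base's `evalpoly`, which applies Horner's scheme via `muladd` (and hence hardware fused multiply-adds where available) and is generic over Float32, Float64, and BigFloat. The polynomial itself is a toy, not a production approximation:

```julia
# Illustrative only: a degree-8 Taylor polynomial for exp(x) near zero.
# `evalpoly` on a tuple unrolls Horner's scheme with `muladd`, so hardware
# FMA is used where available, and the code is generic over element type.
const EXP_COEFFS = Tuple(1 / factorial(k) for k in 0:8)

exp_taylor(x) = evalpoly(x, EXP_COEFFS)

exp_taylor(0.1)        # Float64
exp_taylor(0.1f0)      # Float32, no code changes needed
exp_taylor(big"0.1")   # BigFloat (accuracy limited by the truncation)
```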

Recommended Skills: A strong understanding of calculus.

Expected Results: New and faster methods for evaluating properties of special functions.

Mentors: Steven Johnson. Ask on Discourse or on slack

A Julia-native CCSA optimization algorithm

The CCSA algorithm by Svanberg (2001) is a nonlinear programming algorithm widely used in topology optimization and for other large-scale optimization problems: it is a robust algorithm that can handle arbitrary nonlinear inequality constraints and huge numbers of degrees of freedom. Moreover, the relative simplicity of the algorithm makes it possible to easily incorporate sparsity in the Jacobian matrix (for handling huge numbers of constraints), approximate-Hessian preconditioners, and special-case optimizations for affine terms in the objective or constraints. However, it is currently only available in Julia via the NLopt.jl interface to an external C implementation, which greatly limits its flexibility.
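
For reference, this is roughly how the external C implementation is reached today through NLopt.jl (the CCSAQ variant; API per the NLopt.jl documentation), which a native-Julia package would replace:

```julia
using NLopt

# Minimize (x₁ - 1)² + (x₂ - 2)² subject to x₁ + x₂ ≤ 2, via the external
# C implementation of CCSA (CCSAQ variant) behind NLopt.jl.
function objective(x, grad)
    if length(grad) > 0
        grad[1] = 2(x[1] - 1)
        grad[2] = 2(x[2] - 2)
    end
    return (x[1] - 1)^2 + (x[2] - 2)^2
end

function constraint(x, grad)
    if length(grad) > 0
        grad[1] = 1.0
        grad[2] = 1.0
    end
    return x[1] + x[2] - 2   # inequality in the ≤ 0 form
end

opt = Opt(:LD_CCSAQ, 2)
opt.min_objective = objective
inequality_constraint!(opt, constraint, 1e-8)
opt.xtol_rel = 1e-6

minf, minx, ret = optimize(opt, [0.0, 0.0])
```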

Recommended Skills: Experience with nonlinear optimization algorithms and understanding of Lagrange duality, familiarity with sparse matrices and other Julia data structures.

Expected Results: A package implementing a native-Julia CCSA algorithm.

Mentors: Steven Johnson.

Pluto.jl projects

Pluto as a VS Code notebook

VS Code is an extensible editor, and one of its most recent features is a notebook GUI, with a corresponding Notebook API that allows extension developers to write their own notebook backends. We want to combine two popular Julia IDEs: VS Code and Pluto.jl, and use them to provide a mature editing and debugging experience combined with Pluto's reactivity.

Expected Results: Reactive notebook built on top of VSCode's notebook API.

Recommended skills: JavaScript/TypeScript, some Julia experience

Mentors: Sebastian Pfitzner (core maintainer of julia-vscode), Fons van der Plas (core maintainer of Pluto.jl) and friends

Also see the other VS Code projects!

Macro support

Macros are a core feature of Julia, and many important packages (Flux, JuMP, DiffEq, …) use them in creative ways. Pluto's reactivity is based on syntax analysis to find the assigned and referenced variables of each cell. This powers not just reactive evaluation, but also Pluto's global scope management, and @bind interactivity. (See the JuliaCon presentation for more info.)

Macros can assign to a variable without Pluto detecting it as such. For example, @variables x y from Symbolics.jl assigns to variables x and y, while Pluto thinks that x and y were referenced. Your project is to add macro support to Pluto. Julia has the built-in ability to 'expand' macros on demand, but integrating this into Pluto's reactive runtime remains a significant algorithm design problem. More info in Pluto.jl#196.
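
A tiny illustration of the problem and of on-demand expansion (`@makevar` is a made-up macro for this example):

```julia
# A macro that assigns to a variable the caller never writes explicitly:
macro makevar(name)
    esc(:($name = 1))
end

expr = :(@makevar x)

# Plain syntax analysis of `expr` sees no assignment to `x`, but
# expanding the macro first makes the assignment visible:
macroexpand(Main, expr)   # returns :(x = 1)
```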

Expected Results: First objective: process macros from packages, second (more difficult) objective: support macros defined inside the notebook itself.

Recommended skills: Julia, you will learn about metaprogramming, algorithm design and distributed computing

Mentors: Fons van der Plas and fellow Pluto.jl maintainers

Tools for education

Pluto's primary use case is education, and we recently started using Pluto notebooks as an 'interactive textbook': https://computationalthinking.mit.edu/ . If you are interested in design and interactive visualization, there are lots of cool JS projects in this area. Examples include:

Expected Results: One of the items above! When finished, your work will be used in future editions of the Computational Thinking course and more!

Recommended skills: JavaScript, CSS, you can learn Julia as part of the project.

Mentors: Fons van der Plas, Connor Burns and fellow Pluto.jl maintainers, with feedback from Alan Edelman

Pythia – Summer of Code

Machine Learning Time Series Regression

Pythia is a package for scalable machine learning time series forecasting and nowcasting in Julia.

The project mentors are Andrii Babii and Sebastian Vollmer.

Machine learning for nowcasting and forecasting

This project involves developing scalable machine learning time series regressions for nowcasting and forecasting. Nowcasting in economics is the prediction of the present, the very near future, and the very recent past state of an economic indicator. The term is a contraction of "now" and "forecasting" and originates in meteorology.

The objective of this project is to introduce scalable regression-based nowcasting and forecasting methodologies that have recently demonstrated empirical success in data-rich environments. Examples of existing popular packages for regression-based nowcasting on other platforms include the "MIDAS Matlab Toolbox", as well as the 'midasr' and 'midasml' packages in R. The starting point for this project is porting the 'midasml' package from R to Julia. Currently, Pythia has sparse-group LASSO regression functionality for forecasting.

The following functions are of interest: in-sample and out-of-sample forecasts/nowcasts, regularized MIDAS with Legendre polynomials, visualization of nowcasts, AIC/BIC and time series cross-validation tuning, forecast evaluation, pooled and fixed-effects panel data regressions for forecasting and nowcasting, HAC-based inference for sparse-group LASSO, and high-dimensional Granger causality tests. Other widely used existing functions from R/Python/Matlab are also of interest.
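
As a hedged sketch of one core ingredient, the snippet below builds MIDAS lag weights from a low-degree shifted-Legendre basis, so that the regression estimates a handful of basis coefficients rather than one weight per lag. This is illustrative only; the actual midasml parameterization may differ in its details:

```julia
# Illustrative sketch: MIDAS lag weights as a low-degree combination of
# shifted Legendre polynomials on [0, 1]. The regression then estimates
# the basis coefficients β rather than one free weight per lag.
legendre_shifted(u, k) = k == 0 ? one(u) :
                         k == 1 ? 2u - 1 :
                         k == 2 ? 6u^2 - 6u + 1 :
                         error("degree > 2 not implemented in this sketch")

function midas_weights(β::AbstractVector, nlags::Integer)
    us = range(0, 1; length = nlags)
    return [sum(β[k+1] * legendre_shifted(u, k) for k in 0:length(β)-1)
            for u in us]
end

w = midas_weights([1.0, -0.5, 0.2], 12)   # weights for 12 high-frequency lags
```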

Recommended skills: Graduate-level knowledge of time series analysis, machine learning, and optimization is helpful.

Expected output: The student is expected to produce code, documentation, visualization, and real-data examples.

References: Contact project mentors for references.

Time series forecasting at scales

Modern business applications often involve forecasting hundreds of thousands of time series. Producing such a gigantic number of reliable and high-quality forecasts is computationally challenging, which limits the scope of potential methods that can be used in practice; see, e.g., the 'forecast', 'fable', or 'prophet' packages in R. Currently, Julia lacks scalable time series forecasting functionality, and this project aims to develop automated, data-driven, and scalable time series forecasting methods.

The following functionality is of interest: forecasting intermittent demand (Croston, adjusted Croston, INARMA), scalable seasonal ARIMA with covariates, loss-based forecasting (gradient boosting), unsupervised time series clustering, forecast combinations, unit root tests (ADF, KPSS). Other widely used existing functions from R/Python/Matlab are also of interest.
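
To make one of the listed items concrete, here is a minimal sketch of classic Croston's method (initialization choices vary across implementations):

```julia
# A minimal sketch of Croston's method for intermittent demand:
# apply simple exponential smoothing separately to the nonzero demand
# sizes and to the intervals between them; forecast = size / interval.
function croston(y::AbstractVector{<:Real}; α = 0.1)
    idx = findall(!iszero, y)
    isempty(idx) && return 0.0
    z = float(y[idx[1]])                 # smoothed demand size
    p = float(idx[1])                    # smoothed demand interval
    for j in 2:length(idx)
        z = α * y[idx[j]] + (1 - α) * z
        p = α * (idx[j] - idx[j-1]) + (1 - α) * p
    end
    return z / p                         # flat per-period forecast
end

croston([0, 0, 3, 0, 0, 0, 2, 0, 1, 0])  # ≈ expected demand per period
```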

Recommended skills: Graduate-level knowledge of time series analysis is helpful.

Expected output: The student is expected to produce code, documentation, visualization, and real-data examples.

References: Contact project mentors for references.

Scientific Machine Learning (SciML) Projects

These projects are hosted by the SciML Open Source Scientific Machine Learning Software Organization.

Physics-Informed Neural Networks (PINNs) and Solving Differential Equations with Deep Learning

Neural networks can be used as a method for efficiently solving difficult partial differential equations. Recently this strategy has been dubbed physics-informed neural networks and has seen a resurgence because of its efficiency advantages over classical deep learning. Efficient implementations from recent papers are being explored as part of the NeuralNetDiffEq.jl package. The issue tracker contains links to papers which would be interesting new neural network based methods to implement and benchmark against classical techniques. Project work in this area includes:

This project is good for both software engineers interested in the field of scientific machine learning and those students who are interested in pursuing graduate research in the field.
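
For a taste of the underlying idea, the toy sketch below trains a small Flux network to satisfy u'(x) = -u(x) with u(0) = 1, using a hard-coded initial condition and finite-difference residuals. NeuralNetDiffEq.jl implements far more sophisticated variants of this:

```julia
using Flux

# Toy PINN: solve u'(x) = -u(x), u(0) = 1 on [0, 1].
# The trial solution u(x) = 1 + x * N(x) satisfies the initial
# condition by construction; the loss penalizes the ODE residual.
net = Chain(Dense(1, 16, tanh), Dense(16, 1))
u(x) = 1f0 + x * first(net([x]))

xs = Float32.(range(0, 1; length = 32))              # collocation points
ϵ  = 1f-3
residual(x) = (u(x + ϵ) - u(x - ϵ)) / (2ϵ) + u(x)    # central difference

function loss()
    s = 0f0
    for x in xs
        s += abs2(residual(x))
    end
    return s / length(xs)
end

ps  = Flux.params(net)
opt = ADAM(0.01)
for _ in 1:2000
    gs = Flux.gradient(() -> loss(), ps)
    Flux.update!(opt, ps, gs)
end

u(0.5f0)   # ≈ exp(-0.5) ≈ 0.6065 after training
```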

Recommended Skills: Background knowledge in numerical analysis and machine learning.

Expected Results: New neural network based solver methods.

Mentors: Chris Rackauckas

Improvements to Neural and Universal Differential Equations

Neural ordinary differential equations have been shown to be a way to use machine learning to learn differential equation models. Further improvements to the methodology, like universal differential equations have incorporated physical and biological knowledge into the system in order to make it a data and compute efficient learning method. However, there are many computational aspects left to explore. The purpose of this project is to enhance the universal differential equation approximation abilities of DiffEqFlux.jl, adding features like:

See the DiffEqFlux.jl issue tracker for full details.
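
For orientation, a minimal neural ODE with the existing DiffEqFlux entry point looks roughly like this (API as in the DiffEqFlux tutorials; details may change between versions):

```julia
using DiffEqFlux, OrdinaryDiffEq, Flux

# A 2-D neural ODE: the right-hand side du/dt = NN(u) is a small network.
dudt = Chain(Dense(2, 16, tanh), Dense(16, 2))
tspan = (0.0f0, 1.0f0)
node = NeuralODE(dudt, tspan, Tsit5(), saveat = 0.1f0)

u0  = Float32[1.0, 0.0]
sol = node(u0)   # forward solve; differentiable w.r.t. the network weights
```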

This project is good for both software engineers interested in the field of scientific machine learning and those students who are interested in pursuing graduate research in the field.

Recommended Skills: Background knowledge in numerical analysis and machine learning.

Expected Results: New and improved methods for neural and universal differential equations.

Accelerating optimization via machine learning with surrogate models

In many cases, when attempting to optimize a function f(p), each calculation of f is very expensive. For example, evaluating f may require solving a PDE or other applications of complex linear algebra. Thus, instead of always directly evaluating f, one can develop a surrogate model g which approximates f, by training on previous data collected from f evaluations. This technique of using a trained surrogate in place of the real function is called surrogate optimization, and it mixes techniques from machine learning to accelerate optimization.

Advanced techniques utilize radial basis functions and Gaussian processes in order to interpolate to new parameters to estimate f in areas which have not been sampled. Adaptive training techniques explore how to pick new areas to evaluate f to better home in on global optima. The purpose of this project is to explore these techniques and build a package which performs surrogate optimizations.
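
A self-contained toy version of the idea, using a Gaussian radial-basis-function surrogate fit by a single linear solve:

```julia
using LinearAlgebra

# Expensive function we want to optimize (pretend each call costs a PDE solve)
f(x) = sin(3x) + 0.5x^2

# Fit a Gaussian RBF surrogate g(x) = Σᵢ wᵢ φ(|x - xᵢ|) to a few samples
φ(r; σ = 0.5) = exp(-r^2 / (2σ^2))
xs = collect(range(-2, 2; length = 9))   # a few expensive evaluations of f
ys = f.(xs)
Φ = [φ(abs(a - b)) for a in xs, b in xs]
w = Φ \ ys

g(x) = sum(w[i] * φ(abs(x - xs[i])) for i in eachindex(xs))

# Optimize the cheap surrogate instead of f itself
grid = range(-2, 2; length = 1001)
xbest = grid[argmin(g.(grid))]
(xbest, f(xbest))   # candidate optimum, verified with one true f call
```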

Recommended Skills: Background knowledge of standard machine learning, statistical, or optimization techniques. Strong knowledge of numerical analysis is helpful but not required.

Expected Results: Library functions for performing surrogate optimization with tests on differential equation models.

Mentors: Chris Rackauckas

Parameter estimation for nonlinear dynamical models

Machine learning has become a popular tool for understanding data, but scientists typically understand the world through the lens of physical laws and their resulting dynamical models. These models are generally differential equations given by physical first principles, where the constants in the equations such as chemical reaction rates and planetary masses determine the overall dynamics. The inverse problem to simulation, known as parameter estimation, is the process of utilizing data to determine these model parameters.

The purpose of this project is to utilize the growing array of statistical, optimization, and machine learning tools in the Julia ecosystem to build library functions that make it easy for scientists to perform this parameter estimation with the most high-powered and robust methodologies. Possible projects include improving methods for Bayesian estimation of parameters via Stan.jl and Julia-based libraries like Turing.jl, or global optimization-based approaches. Novel techniques like classifying model outcomes via support vector machines and deep neural networks can also be considered. Research and benchmarking to attempt to find the most robust methods will take place in this project. Additionally, the implementation of methods for estimating structure, such as topological sensitivity analysis, along with performance enhancements to existing methods, will be considered.

Some work in this area can be found in DiffEqParamEstim.jl and DiffEqBayes.jl. Examples can be found in the DifferentialEquations.jl documentation.
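
In its simplest form, parameter estimation is a least-squares fit wrapped around the ODE solver. The generic sketch below (plain OrdinaryDiffEq plus Optim, not the DiffEqParamEstim API) recovers a decay rate from noisy data:

```julia
using OrdinaryDiffEq, Optim

# True model: u' = -p*u with p = 1.3; recover p from noisy observations.
decay!(du, u, p, t) = (du[1] = -p[1] * u[1])
ts = 0.0:0.2:2.0
prob = ODEProblem(decay!, [1.0], (0.0, 2.0), [1.3])
truth = [u[1] for u in solve(prob, Tsit5(), saveat = ts).u]
data = truth .+ 0.01 .* randn(length(ts))

# Sum-of-squares cost as a function of the parameter vector
function cost(p)
    sol = solve(remake(prob, p = p), Tsit5(), saveat = ts)
    return sum(abs2, [u[1] for u in sol.u] .- data)
end

res = Optim.optimize(cost, [0.5], NelderMead())
Optim.minimizer(res)   # ≈ [1.3]
```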

Recommended Skills: Background knowledge of standard machine learning, statistical, or optimization techniques. It's recommended but not required that one has basic knowledge of differential equations and DifferentialEquations.jl. Using the differential equation solver to get outputs from parameters can be learned on the job, but you should already be familiar (but not necessarily an expert) with the estimation techniques you are looking to employ.

Expected Results: Library functions for performing parameter estimation and inferring properties of differential equation solutions from parameters. Notebooks containing benchmarks determining the effectiveness of various methods and classifying when specific approaches are appropriate will be developed simultaneously.

Mentors: Chris Rackauckas, Vaibhav Dixit

Integration of FEniCS.jl with dolfin-adjoint + Zygote.jl for Finite Element Scientific Machine Learning

Scientific machine learning requires mixing scientific computing libraries with machine learning. This blog post highlights how the tooling of Julia is fairly advanced in this field compared to alternatives such as Python, but one area that has not been completely worked out is integration of automatic differentiation with partial differential equations. FEniCS.jl is a wrapper to the FEniCS project for finite element solutions of partial differential equations. We would like to augment the Julia wrappers to allow for integration with Julia's automatic differentiation libraries like Zygote.jl by using dolfin-adjoint. This would require setting up this library for automatic installation for Julia users and writing adjoint passes which utilize this adjoint builder library. It would result in the first total integration between PDEs and neural networks.

Recommended Skills: A basic background in differential equations and Python. Having previous Julia knowledge is preferred but not strictly required.

Expected Results: Efficient and high-quality implementations of adjoints for Zygote.jl over FEniCS.jl functions.

Mentors: Chris Rackauckas

Multi-Start Optimization Methods

While standard machine learning can be shown to be "safe" for local optimization, scientific machine learning can sometimes require the use of globalizing techniques to improve the optimization process. Hybrid methods, known as multistart optimization methods, glue a local optimization technique together with a parameter search over a large space of possible initial points. The purpose of this project would be to take MultistartOptimization.jl as a starting point and create a fully featured set of multistart optimization tools for use with Optim.jl.
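
The basic pattern a full-featured package would refine looks like this (a naive random-restart sketch around Optim.jl):

```julia
using Optim

# Rastrigin function, a standard multimodal test problem
f(x) = 10length(x) + sum(xi^2 - 10cos(2π * xi) for xi in x)

# Naive multistart: many random initial points, one local solve each,
# keep the best local optimum found.
function multistart(f, lb, ub; nstarts = 50)
    best = nothing
    for _ in 1:nstarts
        x0 = lb .+ rand(length(lb)) .* (ub .- lb)
        res = optimize(f, x0, LBFGS())
        if best === nothing || Optim.minimum(res) < Optim.minimum(best)
            best = res
        end
    end
    return best
end

best = multistart(f, [-5.0, -5.0], [5.0, 5.0])
Optim.minimizer(best), Optim.minimum(best)   # ideally ≈ ([0, 0], 0)
```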

Recommended Skills: A basic background in optimization. Having previous Julia knowledge is preferred but not strictly required.

Expected Results: Efficient and high-quality implementations of multistart optimization methods.

Mentors: Chris Rackauckas and Patrick Kofod Mogensen

Symbolic computation project ideas

Groebner basis and Symbolic root finding

Implement solving polynomial equation systems symbolically, i.e. finding the variety of a set of polynomials. This involves first computing the Gröbner basis of the set of polynomials. Gröbner basis computation has worst-case exponential complexity, so it is essential that the implementation is practical. The project should start by studying the literature on state-of-the-art Gröbner basis solvers.

Recommended Skills: Calculus and discrete mathematics. Prior knowledge of computational algebra and ring theory is preferred.

Expected Results: Working Gröbner basis and root-finding algorithms to be deployed in the Symbolics.jl package, along with documentation and tutorials.

Mentors: Shashi Gowda, Yingbo Ma, Mason Protter

Symbolic Integration

Implement the heuristic approach to symbolic integration, then hook into a repository of rules such as Rubi (the rule-based integrator).
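
To convey the flavor of the rule-based approach, here is a toy integrator over plain Julia `Expr` values, handling only linearity and the power rule (the real project would operate on Symbolics.jl expressions and a much larger rule set):

```julia
# Toy rule-based integrator on Julia `Expr`s: power rule + linearity only.
function integrate(ex, x::Symbol)
    if ex == x                                  # ∫ x dx = x²/2
        return :($x^2 / 2)
    elseif ex isa Number                        # ∫ c dx = c*x
        return :($ex * $x)
    elseif ex isa Expr && ex.head == :call
        f, args = ex.args[1], ex.args[2:end]
        if f == :+                              # linearity: integrate termwise
            return Expr(:call, :+, (integrate(a, x) for a in args)...)
        elseif f == :^ && args[1] == x && args[2] isa Number
            n = args[2]                         # power rule (n ≠ -1)
            return :($x^$(n + 1) / $(n + 1))
        end
    end
    error("no rule matches $ex")
end

integrate(:(x^3 + 2 + x), :x)   # → :(x ^ 4 / 4 + 2x + x ^ 2 / 2)
```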

Recommended Skills: Calculus

Expected Results: A working implementation of symbolic integration in the Symbolics.jl library, along with documentation and tutorials demonstrating its use in scientific disciplines.

Mentors: Shashi Gowda, Yingbo Ma, Mason Protter

Tabular Data – Summer of Code

Implement Flashfill in Julia

Difficulty: Medium

FlashFill is a mechanism for creating data manipulation pipelines using programming by example (PBE). As an example, see this implementation in Microsoft Excel. We want a version of FlashFill that can work against Julia tabular data structures, such as DataFrames and Tables.

Resources:

Recommended Skills: Compiler techniques, DSL generation, Program synthesis

Expected Output: A practical FlashFill implementation that can be used on any tabular data structure in Julia.

Mentors: Avik Sengupta

Parquet.jl enhancements

Difficulty: Medium

Apache Parquet is a binary data format for tabular data. It has features for compression and memory-mapping of datasets on disk. A decent implementation of Parquet in Julia is likely to be highly performant. It will be useful as a standard format for distributing tabular data in a binary format. There exists a Parquet.jl package that has a Parquet reader and a writer. It currently conforms to the Julia Tabular file IO interface at a very basic level. It needs more work to add support for critical elements that would make Parquet.jl usable for fast large scale parallel data processing. One or more of the following goals can be targeted:
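
Whichever goals are chosen, the starting point is the package's existing read/write round trip, roughly as below (function names per the Parquet.jl documentation; details may vary between versions):

```julia
using Parquet, DataFrames

df = DataFrame(a = 1:3, b = ["x", "y", "z"])
write_parquet("example.parquet", df)      # existing writer

tbl = read_parquet("example.parquet")     # existing reader (a Tables.jl source)
DataFrame(tbl)
```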

Resources:

Recommended skills: Good knowledge of Julia language, Julia data stack and writing performant Julia code.

Expected Results: Depends on the specific projects we would agree on.

Mentors: Shashi Gowda, Tanmay Mohapatra

Turing Projects – Summer of Code

Turing is a universal probabilistic programming language embedded in Julia. Turing allows the user to write models in standard Julia syntax, and provides a wide range of sampling-based inference methods for solving problems across probabilistic machine learning, Bayesian statistics, and data science. Since Turing is implemented in pure Julia code, its compiler and inference methods are amenable to hacking: new model families and inference methods can be easily added. Below is a list of ideas for potential projects, though you are welcome to propose your own to the Turing team.
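
For example, a complete coin-flip model and its inference fit in a few lines (a sketch following the Turing documentation):

```julia
using Turing

@model function coinflip(y)
    p ~ Beta(1, 1)                 # prior on the heads probability
    for i in eachindex(y)
        y[i] ~ Bernoulli(p)        # likelihood
    end
end

data  = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
chain = sample(coinflip(data), NUTS(), 1000)
mean(chain[:p])                    # posterior mean of p
```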

If you are interested in exploring any of these projects, please reach out to the listed project mentors. You can find their contact information at turing.ml/team.

MCMCChains improvements

Mentors: Cameron Pfiffer, Hong Ge

Project difficulty: Easy

Description: MCMCChains is a key component of the Turing.jl ecosystem. It is the package that determines how to analyze and store MCMC samples provided by packages like Turing. It's also used outside of Turing.

For this project, a student might improve the performance of the various statistical functions provided by MCMCChains, change the back end to use a data storage format that maintains the shape of parameter samples, or improve the general plotting functionality of the package.

There are lots of fun little things to do for MCMCChains. Check out this meta-issue for more details and discussions.
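
For orientation, constructing and summarizing a Chains object is straightforward (a sketch per the MCMCChains documentation):

```julia
using MCMCChains

# 500 iterations × 2 parameters × 3 chains of fake samples
vals = randn(500, 2, 3)
chn  = Chains(vals, [:a, :b])

summarystats(chn)   # mean, std, ESS, R̂ per parameter
```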

Particle filtering methods

Mentors: Hong Ge, Cameron Pfiffer

Project difficulty: Medium

Description: Turing's support for particle sampling methods is slowly being improved with the addition of AdvancedPS.jl. If you're interested in implementing or improving particle sampling methods, this is a great project for you!

Nested Sampling

Mentors: Miles Lucas, Cameron Pfiffer, Hong Ge

Project difficulty: Hard

Description: NestedSamplers.jl is an excellent package which implements nested sampling methods. As of yet, it is not connected to Turing.jl. For this project, a student would connect the NestedSamplers.jl library to Turing.jl.

GPU acceleration

Mentors: Mohamed Tarek, Hong Ge, Kai Xu, Tor Fjelde

Project difficulty: Medium

Description: Turing's native GPU support is limited in that the Metropolis-Hastings and HMC samplers do not implement GPU sampling methods. This can and should be done – GPU methods are awesome! If you are interested in working on parallelism and GPUs, this project is for you.

Students will work with the code at AdvancedMH or AdvancedHMC, depending on their interests.

Documentation and tutorial improvements

Mentors: Cameron Pfiffer, Martin Trapp

Project difficulty: Easy

Description: Turing's documentation and tutorials need a bit of an overhaul. Turing has changed significantly since the last time the documentation was written, and it's beginning to show. Students would use their knowledge of probabilistic programming languages and Turing to shore up or rewrite documentation and tutorials.

Iterative Methods for Inference in Gaussian Processes

Mentors: Will Tebbutt, S. T. John, Theo Galy-Fajou

Project difficulty: Medium

Description: There has recently been quite a bit of work on inference methods for GPs that use iterative methods rather than the Cholesky factorisation. They look quite promising, but no one has implemented any of them within the Julia GP ecosystem yet; they should fit nicely within the AbstractGPs framework. If you're interested in improving the GP ecosystem in Julia, this project might be for you!
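
The core idea, sketched with IterativeSolvers.jl: replace the Cholesky solve for the GP posterior coefficients with conjugate gradients, which needs only matrix-vector products:

```julia
using LinearAlgebra, IterativeSolvers

# GP regression: posterior coefficients α = (K + σ²I) \ y
k(x, y; ℓ = 1.0) = exp(-abs2(x - y) / (2ℓ^2))   # squared-exponential kernel
xs = range(0, 5; length = 200)
ys = sin.(xs) .+ 0.1 .* randn(length(xs))
K  = [k(a, b) for a in xs, b in xs]
σ² = 0.1^2

α_chol = (K + σ² * I) \ ys        # O(n³) Cholesky-style direct solve
α_cg   = cg(K + σ² * I, ys)       # iterative: matrix-vector products only
maximum(abs, α_chol - α_cg)        # should be small
```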

Implement advanced variational Gaussian process models

Mentors: ST John, Will Tebbutt, Theo Galy-Fajou

Project difficulty: Easy to Medium

Description: Sparse variational Gaussian process models provide the flexibility to scale to large datasets, handle arbitrary (non-conjugate) likelihoods, and serve as building blocks for composite models such as deep GPs. This project is about making such models more readily available within the Julia GP ecosystem. Depending on your interests, you can focus on making them easier for end users and providing good tutorials, or on implementations that match or exceed the performance of established Python packages such as GPflow, integration with Flux.jl, etc.

VS Code projects

VS Code extension

We are generally looking for folks that want to help with the Julia VS Code extension. We have a long list of open issues, and some of them amount to significant projects.

Required Skills: TypeScript, Julia, web development.

Expected Results: Depends on the specific projects we would agree on.

Mentors: David Anthoff

Package installation UI

The VSCode extension for Julia could provide a simple way to browse available packages and view what's installed on a user's system. To start with, this project could simply provide a GUI that reads in package data from a Project.toml/Manifest.toml and shows some UI elements to add/remove/manage those packages.

This could also be extended by having metadata about the package, such as a readme, github stars, activity and so on (somewhat similar to the VSCode-native extension explorer).
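
The data side is easy to prototype: Julia's TOML standard library reads a Project.toml directly (the UI itself would live in the extension's TypeScript code):

```julia
using TOML

project = TOML.parsefile("Project.toml")

# Direct dependencies as name => UUID pairs, the raw data a package
# management UI would display:
for (name, uuid) in project["deps"]
    println(name, " => ", uuid)
end
```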

Expected Results: A UI in VSCode for package operations.

Recommended Skills: Familiarity with TypeScript and Julia development.

Mentors: Sebastian Pfitzner

Also take a look at Pluto - VS Code integration!

Web Platform Projects – Summer of Code

Julia has early support for targeting WebAssembly and running in the web browser. Please note that this is a rapidly moving area (see the project repository for a more detailed overview), so if you are interested in this work, please make sure to inform yourself of the current state and talk to us to scope out an appropriate project. The below is intended as a set of possible starting points.

Mentor for these projects is Keno Fischer unless otherwise stated.

Code generation improvements and async ABI

Because Julia relies on an asynchronous task runtime and WebAssembly currently lacks native support for stack management, Julia needs to explicitly manage task stacks in the wasm heap and perform a compiler transformation to use this stack instead of the native WebAssembly stack. The overhead of this transformation directly impacts the performance of Julia on the wasm platform. Additionally, since all code Julia uses (including arbitrary C/C++ libraries) must be compiled using this transformation, it needs to cover a wide variety of inputs and be coordinated with other users having similar needs (e.g. the Pyodide project to run Python on the web). The project would aim to improve the quality, robustness and flexibility of this transformation.

Recommended Skills: Experience with LLVM.

Wasm threading

WebAssembly is in the process of standardizing threads. Simultaneously, work is ongoing to introduce a new threading runtime in Julia (see #22631 and related PRs). This project would investigate enabling threading support for Julia on the WebAssembly platform, implementing runtime parallel primitives on WebAssembly and ensuring that high-level threading constructs are correctly mapped to the underlying platform. Please note that both the WebAssembly and Julia threading infrastructures are still in active development and may continue to change over the duration of the project. An informed understanding of the state of these projects is a definite prerequisite for this project.

Recommended Skills: Experience with C and multi-threaded programming.

High performance, Low-level integration of js objects

WebAssembly is in the process of adding first-class references to native objects to its specification. This capability should allow very high performance integration between Julia and JavaScript objects. Since it is not possible to store references to JavaScript objects in regular memory, adding this capability will require several changes to the runtime system and code generation (possibly including at the LLVM level) in order to properly track these references and emit them either as direct references or as indirect references via the reference table.

Recommended Skills: Experience with C.

DOM Integration

While Julia now runs on the web platform, it is not yet a language that's suitable for first-class development of web applications. One of the biggest missing features is integration with, and abstraction over, more complicated JavaScript objects and APIs, in particular the DOM. Inspiration may be drawn from similar projects in Rust or other languages.

Recommended Skills: Experience with writing libraries in Julia, experience with JavaScript Web APIs.

Porting existing web-integration packages to the wasm platform

Several Julia libraries (e.g. WebIO.jl, Escher.jl) provide input and output capabilities for the web platform. Porting these libraries to run directly on the wasm platform would enable a number of existing UIs to automatically work on the web.

Recommended Skills: Experience with writing libraries in Julia.

Iodide notebook integration

Experimental support exists for running Julia/wasm inside Iodide notebooks. There are a number of possible improvements to this integration, such as improving the quality of output and allowing interactive exploration of Julia objects from the Iodide frontend. In addition, Iodide notebooks should have support for specifying Julia manifest files in order to allow reproducibility in the face of changing package versions.

Recommended Skills: Experience with JavaScript.

Native dependencies for the web

The Julia project uses BinaryBuilder to provide binaries of native dependencies of Julia packages. Experimental support exists to extend this support to the wasm platform, but few packages have been ported. This project would consist of attempting to port a significant fraction of the binary dependencies of the Julia ecosystem to the web platform by improving the toolchain support in BinaryBuilder or (if necessary) porting upstream packages to fix assumptions not applicable on the wasm platform.

Recommended Skills: Experience with building native libraries in Unix environments.

Distributed computing with untrusted parties

The distributed computing abstractions in Julia provide convenient primitives for implementing programs that span many communicating Julia processes on different machines. However, the existing abstractions generally assume that all communicating processes are part of the same trust domain (e.g. they allow messages to execute arbitrary code on the remote). With some of the nodes potentially running in the web browser (or multiple browser nodes being part of the same distributed computing cluster via WebRPC), this assumption no longer holds true, and new interfaces need to be designed to support multiple trust domains without overly restricting usability.

Recommended Skills: Experience with distributed computing and writing libraries in Julia.

Deployment

Currently supported use cases for Julia on the web platform are primarily geared towards providing interactive environments to support exploration of the full language. Of course, this leads to significantly larger binaries than would be required for using Julia as part of a production deployment. By disabling dynamic language features (e.g. eval), one could generate small binaries suitable for deployment. Some progress towards this exists in packages like PackageCompiler.jl, though significant work remains to be done.
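
For context, PackageCompiler.jl's current entry point for this kind of ahead-of-time work looks like the sketch below (API per the PackageCompiler documentation); the deployment work described here would go further by stripping dynamic features to shrink the output:

```julia
using PackageCompiler

# Bake a package into a custom system image to cut load time; a web
# deployment would additionally strip dynamic features (e.g. eval)
# to shrink the binary itself.
create_sysimage(["Example"]; sysimage_path = "sys_example.so")
```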

Recommended Skills: Interest in or experience with Julia internals.