Below are the projects which have been proposed for Google Season of Docs under the umbrella of the Julia Language. If you have questions about potential projects, the first point of contact would be the mentor(s) listed on the project. If you are unable to get a hold of the potential mentor(s), you should email jsoc@julialang.org and CC logan@julialang.org.
We at the Julia Language are committed to making the application process and participation in GSoD with Julia accessible to everyone. If you have questions or requests, please do reach out and we will do our best to accommodate you.
Learn from one of our technical writers about their experience with GSoD:
Below you can find a running list of potential GSoD projects. If any of these are of interest to you, please reach out to the respective mentor(s).
Given the large scope and breadth of the SciML ecosystem, the project will be participating independently from the Julia umbrella. SciML has also been a NumFOCUS-sponsored project since 2020 and runs other developer programs separately from the Julia umbrella. You can find the SciML GSoD projects on their website.
The Flux.jl project will be applying separately from the Julia umbrella since Flux is now a NumFOCUS-affiliated project. You can find the GSoD projects proposed for Flux on their website.
Turing.jl is a probabilistic programming language written in Julia. The team is looking for help with documentation and tutorials for several projects. Mentors for this project would be Cameron Pfiffer, Martin Trapp, Kai Xu, or Hong Ge.
Some ideas include:
Documentation and manuals for MCMCChains, AdvancedHMC, and Bijectors. Turing maintains several separate packages, all of which have some documentation, but each of these could be improved dramatically with a dedicated documentation website and better coverage of docstrings. Technical writers would review current documentation, meet with the development team for each package to determine the package goals, and assess whether current documentation meets those goals. AdvancedHMC, for example, requires a much more detailed guide that explores all of AdvancedHMC's functionality.
A more comprehensive tutorial that shows how to use Turing.jl with DifferentialEquations.jl. Turing has an existing tutorial that demonstrates how to perform Bayesian parameter estimation for the Lotka-Volterra model with DifferentialEquations.jl, but this is very much a toy example and does not demonstrate real-world applicability. A guide that shows how to apply Turing and DifferentialEquations in a useful setting would be valuable for Julia, Turing, and DifferentialEquations.
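To give a sense of the starting point, here is a minimal sketch of the kind of model the existing tutorial builds, assuming recent Turing.jl and DifferentialEquations.jl APIs (details may differ from the tutorial itself):

```julia
using Turing, DifferentialEquations, LinearAlgebra

# Lotka-Volterra predator-prey dynamics.
function lotka_volterra!(du, u, p, t)
    α, β, γ, δ = p
    du[1] = α * u[1] - β * u[1] * u[2]   # prey
    du[2] = δ * u[1] * u[2] - γ * u[2]   # predator
end

prob = ODEProblem(lotka_volterra!, [1.0, 1.0], (0.0, 10.0), [1.5, 1.0, 3.0, 1.0])

# Synthetic noisy observations to condition on (2 × N matrix).
sol  = solve(prob, Tsit5(); saveat=0.1)
data = Array(sol) + 0.5 * randn(size(Array(sol)))

# Infer the ODE parameters and observation noise from the data.
@model function fit_lv(data, prob)
    σ ~ InverseGamma(2, 3)                       # observation noise
    α ~ truncated(Normal(1.5, 0.5), 0.5, 2.5)
    β ~ truncated(Normal(1.2, 0.5), 0.0, 2.0)
    γ ~ truncated(Normal(3.0, 0.5), 1.0, 4.0)
    δ ~ truncated(Normal(1.0, 0.5), 0.0, 2.0)
    predicted = solve(prob, Tsit5(); p=[α, β, γ, δ], saveat=0.1)
    for i in 1:length(predicted)
        data[:, i] ~ MvNormal(predicted[i], σ^2 * I)
    end
end

chain = sample(fit_lv(data, prob), NUTS(), 1000)
```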
A more comprehensive tutorial that shows how to use Turing with Flux.jl. Turing has an existing tutorial that demonstrates how to build a Bayesian neural network with Flux.jl, but this is very much a toy example and does not demonstrate real-world applicability. A guide that shows how to apply Turing and Flux in a useful setting would be valuable for Julia, Turing, and Flux. One possibility is to reproduce some results from Radford Neal's PhD thesis.
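For orientation, the core idea behind such a tutorial might be sketched as follows. This is a hypothetical example, assuming recent Turing.jl and Flux.jl APIs, that uses `Flux.destructure` to place a prior over the network's weights:

```julia
using Turing, Flux

# A small network whose parameters we will treat as random variables.
nn = Chain(Dense(2 => 3, tanh), Dense(3 => 1, sigmoid))
θ₀, reconstruct = Flux.destructure(nn)   # flatten parameters, keep a rebuilder

@model function bayes_nn(xs, ys, nparams, reconstruct)
    θ ~ filldist(Normal(0, 1), nparams)   # Gaussian prior on every weight
    net = reconstruct(θ)                  # rebuild the network from θ
    for i in eachindex(ys)
        ys[i] ~ Bernoulli(net(xs[:, i])[1])  # network output as a probability
    end
end

xs = randn(2, 100)                       # toy inputs
ys = Int.(xs[1, :] .+ xs[2, :] .> 0)     # toy binary labels
chain = sample(bayes_nn(xs, ys, length(θ₀), reconstruct), NUTS(), 500)
```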
Polishing other existing Turing tutorials. Turing currently has more than 10 tutorials at https://turing.ml/dev/tutorials/, most of which were written in the early days of Turing.jl. Some of them require an update to the latest syntax, and most would benefit from a general editorial pass.
A structured explanation of the different inference algorithms provided in Turing, with details on when to use each algorithm and what the implications of each approach are. Turing has many sampling algorithms, but their value is not fully recognized by inexperienced users; heuristics on when to use which algorithm would greatly benefit the user experience. Documentation might run speed tests or measure convergence criteria for different sampler types. One such blog post focused mostly on speed, but there are features of different models and data that make some samplers preferable to others.
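As a rough illustration of the material such a guide would systematize, the sketch below samples the same toy model with several of Turing's samplers; the guide would compare runtime and convergence diagnostics across runs like these:

```julia
using Turing

# A toy model on which samplers can be compared.
@model function gaussian_mean(y)
    μ ~ Normal(0, 1)
    y .~ Normal(μ, 1)
end

y = randn(100) .+ 2

# Same model, three samplers; inspect each with describe(chain).
chain_mh   = sample(gaussian_mean(y), MH(), 5_000)        # Metropolis-Hastings
chain_hmc  = sample(gaussian_mean(y), HMC(0.05, 10), 5_000)  # Hamiltonian Monte Carlo
chain_nuts = sample(gaussian_mean(y), NUTS(), 5_000)      # No-U-Turn sampler
```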
An introduction to probabilistic modelling and the Bayesian approach in Turing, with a discussion of general Bayesian issues such as the choice of prior. Turing lacks a good resource for those who are just beginning with probabilistic modelling, and users tend to rely on Stan's excellent documentation or Probabilistic Programming & Bayesian Methods for Hackers to learn how to use probabilistic programming. Turing should be self-contained when it comes to welcoming early-phase learners.
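Such an introduction might open with the canonical coin-flip example, sketched here under the assumption of a recent Turing.jl API:

```julia
using Turing, Statistics

# Infer the bias of a coin from observed tosses.
@model function coinflip(y)
    p ~ Beta(1, 1)           # a uniform prior on the coin's bias
    for i in eachindex(y)
        y[i] ~ Bernoulli(p)  # each toss is a draw from the same coin
    end
end

tosses = [1, 1, 0, 1, 0, 1, 1, 1]
chain = sample(coinflip(tosses), NUTS(), 1_000)
mean(chain[:p])              # posterior mean of the bias
```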
Turing is a rapidly developing probabilistic programming language used by machine learning researchers, data scientists, statisticians, and economists. Any improvement to Turing's documentation and learning materials will help those communities integrate better with the Julia community, which will, in turn, improve the rest of Julia's ecosystem. Better documentation and guides will attract new learners and help transition more experienced people from tools that do not meet their needs.
MLJ (Machine Learning in Julia) is the most popular multi-paradigm machine learning toolbox written in the Julia language. It provides a common interface and meta-algorithms for selecting, tuning, evaluating, and composing almost 200 machine learning models written in Julia and other languages. Such models include neural networks (based on the popular Flux.jl package), tree-based models (such as random forests), support vector machines, nearest neighbor models, outlier detection models, general linear models, clustering algorithms (such as K-means), and more. In particular, MLJ wraps a large number of models from the Python toolbox scikit-learn.
While the MLJ ecosystem is spread over some two dozen repositories, the reference documentation is mostly collected in a single manual. Additional learning resources, which include a dedicated tutorial site, are listed here.
The reference manual is comprehensive from the point of view of what you can do with models (train, tune, evaluate, combine, etc.) but has no model-specific documentation at all. Only some models have document strings, and these usually lack detail or examples and do not conform to any standard.
The present project can be divided into two parts:
Model documentation
Create a detailed document string for each model in MLJ's model registry, as outlined in this GitHub issue, or at least for the most popular models. A key part of each document string is a short example illustrating basic usage. Most models are provided by third-party packages, which generally have their own documentation, so this is often a simple matter of adapting existing documentation to MLJ syntax.
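For illustration, a "basic usage" snippet of the kind a document string could include might look like the following hypothetical sketch, here for a decision tree classifier (it assumes the DecisionTree.jl interface package is installed):

```julia
using MLJ

# Load the model type from the registry.
Tree = @load DecisionTreeClassifier pkg=DecisionTree

X, y = @load_iris                 # a small built-in dataset
tree = Tree(max_depth=3)          # set a hyperparameter
mach = machine(tree, X, y)        # bind the model to data
fit!(mach)                        # train
yhat = predict(mach, X)           # probabilistic predictions
predict_mode(mach, X)             # point predictions
```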
The models wrapped from scikit-learn (about 70) constitute a separate case, as the available documentation is in Python, not Julia. Initially, docstrings for these models will simply quote the Python documentation. However, generating these may require some coding (e.g., Julia macros, artifacts) and so is optional for this project.
Integration of model documentation into the reference manual
Models can be loosely grouped into families (regressors, classifiers, clustering algorithms, etc.), and integrating the new document strings into the reference manual could follow such an organization - something resembling the model documentation in scikit-learn.
Writers will need some familiarity with basic machine learning workflows, and very basic Julia, but will also have the opportunity to develop both.
Julia is perceived as a potential game-changer for machine learning. Current practice is dominated by platforms in Python and R, but innovation there is increasingly stifled by the two-language problem, which Julia solves. Good documentation is essential to ensure MLJ.jl is an attractive option, both to practicing data scientists and to those training new data scientists.
Plots.jl is a unified API for several plotting libraries, popular for its composable recipe system.
Plots.jl has long had a list of examples demonstrating basic functionality. Recently, a user gallery was added to showcase particularly good-looking demos (choropleths, gradients, animations, ...) using DemoCards.jl. Interested writers would add new examples, potentially taking inspiration from other existing galleries.
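For context, a minimal Plots.jl example looks like the sketch below; gallery entries would be considerably more elaborate (choropleths, gradients, animations, ...):

```julia
using Plots

# Two curves on one set of axes, with basic styling.
x = range(0, 2π; length=200)
plot(x, [sin.(x) cos.(x)];
     label=["sin(x)" "cos(x)"], linewidth=2,
     title="A basic Plots.jl example", xlabel="x")
```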
Writers will need some experience in producing visualisations and a good sense of aesthetics.
Javis.jl is a tool focused on providing an easy-to-use interface for quickly creating winsome and efficient animations and visualizations - with an emphasis on having fun! 😃
Javis.jl uses an Action-Object relationship when creating animations. Tutorial 3 was written early on, when we were pioneering this paradigm for Javis. A lot more has been added to Javis since then that has made working with Objects and Actions even more powerful! Those interested in this project should be excited to dive into this paradigm and work with mentors on developing this tutorial further.
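For a flavor of the paradigm, here is a hypothetical minimal Object-Action sketch, assuming a recent Javis.jl API (the tutorials go well beyond this):

```julia
using Javis

# Background drawing function: white canvas, black pen.
function ground(args...)
    background("white")
    sethue("black")
end

video = Video(400, 400)
Background(1:60, ground)
ball = Object(1:60, (args...) -> circle(O, 30, :fill))  # an Object drawn each frame
act!(ball, Action(anim_translate(Point(120, 0))))       # an Action slides it right
render(video; pathname="ball.gif")
```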
Tutorial 5 was likewise written in the early days of the Action-Object paradigm. Those interested in this project should be excited to dive in and work with mentors on revamping our mascot for Javis!
A wonderful project in the JuliaPlots ecosystem called BeautifulMakie was created by Lazaro Alonso. Lazaro has graciously allowed us to potentially fork this project to make a new home for Javis - BeautifulJavis! BeautifulJavis is waiting to be created and populated with beautiful Javis animations, along with tutorials or instructions on how to make them. Interested individuals should be inspired by the opportunity to unleash their creativity, exploring Javis to its full potential to create beautiful and incredible animations.
Potential contributors will need creativity, some experience or appreciation of the arts, and enthusiasm for teaching.
Jacob Zelko - email, Slack (username: TheCedarPrince), or Zulip (username: TheCedarPrince)