A hands-on introduction to quarto with a Computo submission
May 31, 2024
Editorial board
IT support
Stat. learning, DR INRAE
Paris-Saclay University
Statistique, DR CNRS
IMT Toulouse
Optim./Machine-Learning
CR INRIA Lyon
CS/Stats/ML, IR CNRS
IMAG, Montpellier University
Machine learning, CR CNRS
Grenoble Alpes University
Statistics, MCF
Institut Agro Rennes-Angers
Machine Learning
CR MinesParisTech
Stats/ML/dev, IR CNRS
LBMC, ENS LYON
Fundamentally, it provides three things:
Tools to reproduce the results (that’s like cooking)
A “recipe” to reproduce the results (still like cooking)
A path to understand the results and the process that led to them (unlike cooking…)
The pdf era and paper submission.
Reproducibility was not a priority:
And then in the Machine Learning domain, there was distill.pub [1]
but…
… engineering was too complex for the average scientist (a lot of javascript, etc.)
In fact, the distill.pub project was discontinued in 2021 [2]
distill.pub
distill.pub’s goals were right, but they outpaced themselves in terms of development complexity.
\(\Rightarrow\) bring the community to higher standards
The French statistical society appoints a “publication” committee (led by Julien then Pierre) to develop a new journal
Assessment
Point of view
⇝ Emergence of “Computo” idea
Scientific perimeter
Promote contributions in statistics and machine learning that provide insight into which models or methods are most appropriate to address a specific scientific question
Open access
⇝ In accordance with Budapest Open Access Initiative (BOAI) and Plan S
Reproducible
Official launch at the end of 2021
Notebook and literate programming
text (markdown) + math (\(\LaTeX\)) + code (Python/R/Julia), references (bib\(\TeX\))
Environment management, Compilation, Multi-format publication (pdf, html)
Continuous integration/Continuous deployment (CI/CD)
markdown
Rmarkdown
Pandoc
Credit: Pratik89Roy CC-BY-SA-4.0 from Wikimedia
with template notebook document + doc + pre-configured compilation and publication setup
Let’s go, locally (same spirit as Jupyter/Rmarkdown notebooks)
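Working locally essentially means rendering the notebook on your own machine before pushing anything. A minimal sketch, assuming the quarto command-line tool is installed and that the article is a single .qmd file; the helper names build_render_command and render_locally are hypothetical and not part of the Computo template:

```python
import shutil
import subprocess


def build_render_command(qmd_file, output_format="html"):
    """Return the quarto CLI invocation that renders one notebook."""
    return ["quarto", "render", qmd_file, "--to", output_format]


def render_locally(qmd_file, output_format="html"):
    """Render the notebook if quarto is on the PATH; return True on success."""
    if shutil.which("quarto") is None:
        # Assumption: quarto is installed separately; nothing is rendered otherwise.
        print("quarto not found on PATH; install it first")
        return False
    result = subprocess.run(build_render_command(qmd_file, output_format))
    return result.returncode == 0
```

Calling render_locally("published-paper-tsne.qmd") would then run the same kind of compilation (computations included) that the CI performs after a push, which makes it a cheap way to catch errors early.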
Configuration file versioned and used during the CI compilation/publication action
A git push command will trigger your article compilation (including computations) and publication as a GitHub page
See the preconfigured .github/workflows/build.yml file for the GitHub Actions configuration
If the CI process succeeds, both HTML and PDF versions are published on the GitHub page associated with the repository
https://openreview.net/group?id=Computo
Submit:
After a “traditional” review process, a three-step procedure:
including
🥲 Fully operational + doi, ISSN
🙂 7 published articles, 3 in preproduction, 6 under review
🙂 x presentations (Montpellier, Toronto, Humastica, Grenoble, RR2023, etc.)
🙂 French reproducible research network
🤯 Difficult to find reviewers
🤔 Institutional support?
🤔 Changing of practices in the scientific community?
quarto: dynamic, language-agnostic, FOSS, community-based (pandoc), RStudio/Posit support
github: dynamic, large user community, but not institutional and limited computing resources
The global scientific workflow of a reproducible process for a Computo submission may be split into two types of steps:
External and Editorial
Requirement
If the notebook contains everything to produce the final document
\(\Rightarrow\) “Direct reproducibility” in the sense that the notebook is the only thing needed to reproduce the results.
Ultimately, the workflow must end with a direct reproducibility step which concludes the whole process.
data produced by the external process \(\Rightarrow\) transferred to the notebook environment.
Requirement
Not only are the intermediate results provided, but also the code to transfer them into the notebook environment.
There are a variety of software solutions to do so.
joblib.Memory, a caching mechanism for Python functions: it saves the results of a function call to disk and loads them back later.
The .RData file format, which can be loaded back in R with the load() function.
A standard data file format (.csv, .tsv, .json, etc.) is also a solution.
The saved results (.joblib directory or .RData file) could be committed to the git repository and directly loaded in the notebook environment.
In this workshop, we will learn how to use quarto to create a document that includes code, data, and narrative text. We will also learn how to get the CI (continuous integration) working.
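To make the transfer of intermediate results into the notebook concrete, here is a minimal sketch of the plain-file variant (.json) using only the Python standard library. The function names, the file name results.json, and the example values are all hypothetical; a real submission would store whatever the external computation actually produced:

```python
import json
import tempfile
from pathlib import Path


def save_results(results, path):
    """External step: write intermediate results to a JSON file."""
    Path(path).write_text(json.dumps(results, indent=2))


def load_results(path):
    """Notebook step: read the intermediate results back."""
    return json.loads(Path(path).read_text())


# Hypothetical intermediate results from a long-running external computation.
results = {"method": "tsne", "perplexity": 30, "embedding": [[0.1, 0.2], [0.3, 0.4]]}

# A temporary directory stands in for the git repository here.
with tempfile.TemporaryDirectory() as tmp:
    results_file = Path(tmp) / "results.json"
    save_results(results, results_file)
    reloaded = load_results(results_file)

print(reloaded == results)
```

In practice the file would be written once by the external process, committed to the repository, and only load_results would be called from the notebook, so that the CI never has to rerun the expensive step.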
To get started you will need to clone the mock template for this workshop. The template is available at
https://github.com/computorg/template-jds2024
Create your own repository from this template, checking the option Include all branches.
Make a git clone of the repository you just created from the template and open it in your favorite IDE.
Python version
Rename published-paper-tsne-python.qmd to published-paper-tsne.qmd
R version
Rename published-paper-tsne-R.qmd to published-paper-tsne.qmd
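Either rename can of course be done by hand in the IDE or file manager. For those who prefer to script it, a minimal sketch follows; the helper select_version is hypothetical, and the demonstration runs on a throwaway directory rather than on the real clone:

```python
import tempfile
from pathlib import Path


def select_version(repo_dir, language):
    """Rename published-paper-tsne-<language>.qmd to published-paper-tsne.qmd."""
    source = Path(repo_dir) / f"published-paper-tsne-{language}.qmd"
    target = Path(repo_dir) / "published-paper-tsne.qmd"
    if not source.exists():
        raise FileNotFoundError(source)
    source.rename(target)
    return target


# Demonstration on a temporary directory standing in for the cloned repository.
with tempfile.TemporaryDirectory() as repo:
    (Path(repo) / "published-paper-tsne-python.qmd").write_text("---\ntitle: demo\n---\n")
    renamed = select_version(repo, "python")
    renamed_name = renamed.name
    still_has_old = (Path(repo) / "published-paper-tsne-python.qmd").exists()
```

On the real clone, the call would be select_version(".", "python") or select_version(".", "R"), run from the root of the repository.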