An academic journal in statistics and machine learning promoting reproducibility and an alternative publication mode
November 5, 2024
Editorial board
IT support
Statistical learning, DR INRAE, Paris-Saclay University
Statistics, DR CNRS, IMT Toulouse
Optimization/Machine learning, CR INRIA Lyon
CS/Stats/ML, IR CNRS, IMAG, Montpellier University
Machine learning, CR CNRS, Grenoble Alpes University
Statistics, MCF, Institut Agro Rennes-Angers
Machine learning, CR MinesParisTech
Stats/ML/dev, IR CNRS, LBMC, ENS Lyon
Fundamentally, it provides three things:
Tools to reproduce the results (that’s like cooking)
A “recipe” to reproduce the results (still like cooking)
A path to understanding the results and the process that led to them (unlike cooking…)
The PDF era and paper submission.
Reproducibility was not a priority.
And then in the Machine Learning domain, there was distill.pub [1]
but…
… the engineering was too complex for the average scientist (a lot of JavaScript, etc.)
In fact, the distill.pub project was discontinued in 2021 [2]
distill.pub’s goals were right, but the project overreached itself in terms of development complexity.
French statistical society (SFdS)
Assessment
😔 Multiplication of “traditional” journals…
😔 No recognition of “negative” results
😥 Little or no recognition of source code and case studies
😱 Decline in publication quality and in the time dedicated to each article (on both the author and reviewer sides) [3]
😱 Issues with scientific reproducibility (analyses, experiments) [4–9]
Point of view
Scientific scope
Promote contributions in statistics and machine learning that provide insight into which models or methods are most appropriate to address a specific scientific question
Open access
⇝ In accordance with Budapest Open Access Initiative (BOAI) and Plan S
Reproducible
Official launch at the end of 2021
Notebook and literate programming
text (markdown) + math (\(\LaTeX\)) + code (Python/R/Julia) + references (bib\(\TeX\))
Environment management, Compilation, Multi-format publication (pdf, html)
Continuous integration/Continuous deployment (CI/CD)
[Figure: the markdown/Rmarkdown to output conversion workflow via Pandoc. Credit: Pratik89Roy, CC-BY-SA-4.0, from Wikimedia]
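To make these ingredients concrete, here is a minimal sketch of what such a notebook source could look like as a Quarto (.qmd) file with a Python chunk; the title, author, bibliography entry and computation are hypothetical placeholders, not taken from an actual Computo article.

````markdown
---
title: "A minimal example"          # hypothetical metadata
author: "Jane Doe"
format:
  html: default
  pdf: default
bibliography: references.bib        # assumed BibTeX file next to the notebook
jupyter: python3
---

## Introduction

We estimate the mean $\mu$ of a Gaussian sample $x_1, \dots, x_n$ [@doe2024].

```{python}
#| label: fig-sample
#| fig-cap: "Histogram of the simulated sample."
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)                 # fixed seed for reproducibility
x = rng.normal(loc=1.0, scale=2.0, size=1_000)  # simulated data
print(f"Empirical mean: {x.mean():.3f}")

plt.hist(x, bins=30)
plt.show()
```
````

Rendering this file with quarto render executes the code and produces both the HTML and PDF outputs declared in the header.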
A template repository is provided, with a template notebook document, documentation, and a pre-configured compilation and publication setup.
Let’s go, locally (same spirit as Jupyter/Rmarkdown notebooks)
Configuration file versioned and used during the CI compilation/publication action
A git push command will trigger your article compilation (including computations) and publication as a GitHub page.
See the preconfigured .github/workflows/build.yml file for the GitHub Action configuration.
If the CI process succeeds, both HTML and PDF versions are published on the GitHub page associated with the repository
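As a point of comparison, a minimal GitHub Actions workflow in this spirit might look like the sketch below. This is not Computo's actual build.yml but an assumed setup based on the publicly available quarto-dev/quarto-actions actions; the branch name, Python version and requirements.txt dependency file are illustrative.

```yaml
# Illustrative sketch of a .github/workflows/build.yml (not the journal's actual file)
name: build-and-publish

on:
  push:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: write                         # needed to push the rendered site to gh-pages
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install Python dependencies
        run: pip install -r requirements.txt  # assumed dependency file

      - name: Set up Quarto
        uses: quarto-dev/quarto-actions/setup@v2

      - name: Render and publish to GitHub Pages
        uses: quarto-dev/quarto-actions/publish@v2
        with:
          target: gh-pages                    # publishes the rendered HTML (and PDF) output
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```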
https://openreview.net/group?id=Computo
Submit:
After a “traditional” review process, a 3-step procedure follows, including…
🥲 Fully operational, with DOI and ISSN
🙂 13 published articles, 5 under review (more details here)
🙂 x presentations (Montpellier, Toronto, Humastica, Grenoble, RR2023, etc.)
🙂 French reproducible research network
🤯 Difficult to find reviewers
🤔 Institutional support?
🤔 Changing of practices in the scientific community?
quarto: dynamic, language-agnostic, FOSS, community-based (pandoc), RStudio/Posit support
github: dynamic, large user community, but not institutional and with limited computing resources
Reproducibility considerations
The global scientific workflow of a reproducible process for a Computo article may be split into two types of steps:
Processes run outside of the notebook environment to obtain (intermediate) results, for a number of reasons (non-exclusive to each other)
Notebook rendering with the results of the external process
Requirement
If the notebook contains everything needed to produce the final document
\(\Rightarrow\) “direct reproducibility”, in the sense that the notebook is the only thing needed to reproduce the results.
Ultimately, the workflow must end with a direct-reproducibility step that concludes the whole process.
Data produced by the external process \(\Rightarrow\) transferred to the notebook environment.
Requirement
Not only must the intermediate results be provided, but also the code to transfer them into the notebook environment.
There are a variety of software solutions to do so.
joblib.Memory, a caching mechanism for Python functions: it saves the results of a function call to disk and loads them back later (see the Python sketch after this list).
The .RData file format: results saved from R can be loaded back with the load() function.
Plain-text formats (.csv, .tsv, .json, etc.) are also a solution.
The resulting files (e.g. a .joblib cache directory or an .RData file) can be committed to the git repository and loaded directly in the notebook environment.
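To illustrate the joblib.Memory option, here is a minimal Python sketch; the long_simulation function and the .joblib_cache directory name are hypothetical, but the caching pattern is joblib's standard usage.

```python
from joblib import Memory
import numpy as np

# Cache directory: its contents can be committed to the git repository so that
# the CI run reloads the stored results instead of recomputing them.
memory = Memory("./.joblib_cache", verbose=1)

@memory.cache
def long_simulation(n_samples: int, seed: int = 0) -> np.ndarray:
    """Stand-in for an expensive computation run outside the notebook."""
    rng = np.random.default_rng(seed)
    return rng.normal(size=(n_samples, 10)).mean(axis=0)

# First call: runs the computation and stores the result on disk.
# Later calls with the same arguments (e.g. during notebook rendering in CI)
# load the result back from the cache.
result = long_simulation(1_000_000)
print(result)
```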