Team

Editorial board

IT support

Communication

Julien Chiquet (chief editor)

Stat. learning, DR INRAE
Paris-Saclay University

Pierre Neuvial

Statistics, DR CNRS
IMT Toulouse

François-David Collin

CS/Stats/ML, IR CNRS
IMAG, Montpellier University

Chloé Azencott

Machine learning, CR
MinesParisTech

Nelle Varoquaux

Machine learning, CR CNRS
Grenoble Alpes University

Marie-Pierre Étienne

Statistics, MCF
Institut Agro Rennes-Angers

Aymeric Stamm

Stats/dev, IR
Laboratoire Jean Leray, Univ. Nantes

Journal setup

Official launch at the end of 2021

“Economical” model

  • A few tenacious people…
  • Free/Open-source community tools (Pandoc, Quarto, Git forge)
  • Institutional support (AgroParisTech, CNRS, INRAE, INRIA, SFdS)

Open science call for projects (AAP Science Ouverte) 🤩

Motivations


Origin (~ 2020)

Mission carried out within the French Statistical Society (SFdS)


Assessment

😔 Proliferation of “traditional” journals, often predatory ones…

😱 Decline (↘) of publication quality and of the time dedicated to each article (on both the author and reviewer sides) [1]

😔 Too little recognition of “negative” results, source code, and case studies

😱 Issue with scientific reproducibility (analyses, experiments) [2–7]


Point of view

  • Need to renew how scientific research is carried out
  • Need for higher standards in the publication of results

Philosophy


Scientific perimeter

Promote contributions in stat/ML that provide insight into which models/methods are appropriate to address a scientific question


Open access

  • “Diamond” open access (free to publish and free to read, possible to reuse)
  • Content published under CC-BY license (attribution, share, adapt)
  • Reviews and discussions available after acceptance for publication (anonymous reviews)


Reproducible

Numerical reproducibility is a necessary condition for publication (source code and data must be available)

What is reproducible research?

Fundamentally, it provides three things:

Tools to reproduce the results (that’s like cooking)

A “recipe” to reproduce the results (still like cooking)

A path to understanding the results and the process that led to them (unlike cooking…)

Why reproducible research? [8–10]


Why reproduce scientific results?

  • To strengthen their credibility
  • To check for errors (everyone makes errors at some point!!!)
  • To build new research upon them (science is incremental)

Issues?

  • Reproducing numerical scientific results is often difficult (technology/environment evolution; source code, environment configuration, or software only partially available, or not available at all)
  • Time and resources are wasted reproducing existing non-reproducible results

Reproducible research?

  • For others, but also for your future self
  • Improves the credibility of results
  • Facilitates future research work

How reproducibility works at Computo

Writing system

Notebook and literate programming
text (markdown) + math (\(\LaTeX\)) + code (Python/R/Julia) + references (Bib\(\TeX\))
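
For illustration, a minimal Quarto notebook combining these ingredients could look as follows (a sketch only, not the official Computo template; the chunk uses Python, but R and Julia work the same way):

---
title: "A minimal example"
format: html
jupyter: python3
---

We estimate the mean $\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i$ of a simulated sample:

```{python}
import numpy as np
x = np.random.default_rng(0).normal(size=100)  # simulated data
print(x.mean())
```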

Publication system

Environment management, compilation, multi-format publication (PDF, HTML)
Continuous integration/Continuous deployment (CI/CD)

Reviewing system

  • Anonymous exchange published after acceptance
  • Reviewer pool (you can join)
  • [Ongoing switch from OpenReview to PCI]

Solutions/Prototype

Reproducible article and computations

Automatic editorial reproducibility

Scientific validation

Tools for authors

Document model

Document template

Git template repository

with a template notebook document + documentation + a pre-configured compilation and publication setup

Locally

  • Text editor/IDE (VS Code, RStudio, NeoVim, etc.)
  • Quarto (compilation; see the command below)
  • Julia / R / Python code + computations
  • git version control
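
For instance, compiling the article locally boils down to a single command (article.qmd is a hypothetical file name):

quarto render article.qmd   # produces the HTML/PDF outputs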

Author point of view (1/3)


Step 0: set up a git repository for your article

Start from a template repository (R, Python, Julia)

Tip

You can host your git repository on github and, soon, on any gitlab forge.
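
For example, starting from the Python template could look like this (the repository URL is an assumption; check the journal’s GitHub organization for the actual template names):

git clone https://github.com/computorg/template-computo-python my-article
cd my-article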


Step 1: write your article

Let’s go: write locally (same spirit as Jupyter/R Markdown notebooks)

Author point of view (2/3)

Step 2: configure the environment (dependency management)

venv: use a virtual environment and generate the requirements.txt file

# requirements.txt
jupyter
matplotlib
numpy
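
A typical sequence to produce this file (a sketch, using the packages listed above):

python -m venv .venv                    # create the virtual environment
source .venv/bin/activate               # activate it
pip install jupyter matplotlib numpy    # install the dependencies
pip freeze > requirements.txt           # record the exact versions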

renv: generate the renv.lock file

renv::init()
renv::install("ggplot2")
# or equivalently install.packages("ggplot2")
renv::snapshot()

Pkg: native Julia package manager (with generated Project.toml and Manifest.toml files), used from the Pkg REPL mode (press ] at the julia> prompt):

add Plots
add IJulia

The configuration file is versioned and used during the CI compilation/publication action

Step 3: (re)production

A git push command triggers the compilation of your article (including computations) and its publication as a github page

See the preconfigured .github/workflows/build.yml file for the github action configuration
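
As a rough sketch of what such a workflow can contain (not the journal’s actual build.yml; the setup/render steps assume the quarto-dev/quarto-actions project):

name: build
on: push
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4                  # fetch the article sources
      - uses: quarto-dev/quarto-actions/setup@v2   # install Quarto
      - uses: quarto-dev/quarto-actions/render@v2  # compile the notebook (runs the code)
      - uses: peaceiris/actions-gh-pages@v3        # publish the result as a github page
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: ./_site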

Author point of view (3/3)


Step 4: submit your article

If the CI process succeeds, both HTML and PDF versions are published on the github page associated with the repository



OpenReview

https://openreview.net/group?id=Computo

Submit:

  • your article PDF (scientific content review)
  • your git repository (source code and reproducibility review)

Editor point of view

After a “traditional” review process, a 3-step procedure:

  1. Acceptance
  2. Pre-production
  3. Publication in Computo (with a DOI)

including

  • Copy of the author’s git repository to https://github.com/computorg/
  • Final version formatting
  • Review report publication
  • Registration in the journal’s bibliographic database
  • Copy of the repository to Software Heritage for archiving
  • Publication of the article on the journal website

3-year report


🥲 Fully operational + DOI, ISSN

🙂 15 published articles, 5 under review

🙂 Several presentations (Montpellier, Toronto, Humastica, Grenoble, RR2023, etc.)

🙂 French reproducible research network

🤯 Difficult to find reviewers

🙂 Referencing and visibility: Mir@bel, Open Policy/Sherpa Romeo → DOAJ

🤯 Google Scholar: an opaque black box

🤔 How to build on institutional support?

🤔 How to change practices in the scientific community?


Discussion, Perspectives


  • Co-lead a stat/ML Peer Community In (PCI) for the reviewing process, to expand our visibility
  • Provision of computing resources (to be able to run all computations)
  • Improve long-term reproducibility stack
  • github: dynamic, with a large user community, but non-institutional and with limited computing resources. Switch to a French institutional gitlab forge?

How to help?

  • By submitting your work!
  • By becoming a reviewer

References

1. Hanson, M A, Barreiro, P G, Crosetto, P, and Brockington, D 2023 The strain on scientific publishing. DOI: https://doi.org/10.48550/arXiv.2309.15884
2. Ioannidis, J P A 2005 Why Most Published Research Findings Are False. PLoS Medicine, 2(8): e124. DOI: https://doi.org/10.1371/journal.pmed.0020124
3. Steen, R G 2011 Retractions in the scientific literature: Is the incidence of research fraud increasing? Journal of Medical Ethics, 37(4): 249–253. DOI: https://doi.org/10.1136/jme.2010.040923
4. Allison, D B, Brown, A W, George, B J, and Kaiser, K A 2016 Reproducibility: A tragedy of errors. Nature, 530(7588): 27–29. DOI: https://doi.org/10.1038/530027a
5. Bastian, H 2016 Reproducibility Crisis Timeline: Milestones in Tackling Research Reliability. URL https://absolutelymaybe.plos.org/2016/12/05/reproducibility-crisis-timeline-milestones-in-tackling-research-reliability/. [Online; accessed 22-March-2023]
6. Whitfield, J 2021 Replication Crisis. London Review of Books, 43(19). URL https://www.lrb.co.uk/the-paper/v43/n19/john-whitfield/replication-crisis. [Online; accessed 22-March-2023]
7. Hernández, J A and Colom, M 2023 Repeatability, Reproducibility, Replicability, Reusability (4R) in Journals’ Policies and Software/Data Management in Scientific Publications: A Survey, Discussion, and Perspectives. URL https://hal.science/hal-04322522. [Online; accessed 4-January-2024]
8. Desquilbet, L L, Granger, S, Hejblum, B, Legrand, A, Pernot, P, Rougier, N P, Castro Guerra, E de, Courbin-Coulaud, M, Duvaux, L, Gravier, P, Le Campion, G, Roux, S, and Santos, F 2019 Vers une recherche reproductible [Towards reproducible research]. Unité régionale de formation à l’information scientifique et technique de Bordeaux. URL https://hal.science/hal-02144142
9. Hejblum, B P, Kunzmann, K, Lavagnini, E, Hutchinson, A, Robertson, D, Jones, S, and Eckes-Shephard, A 2020 Realistic and Robust Reproducible Research for Biostatistics. DOI: https://doi.org/10.20944/preprints202006.0002.v1
10. The Turing Way Community 2022 The Turing Way: A handbook for reproducible, ethical and collaborative research. DOI: https://doi.org/10.5281/zenodo.7625728
11. Knuth, D E 1984 Literate programming. The Computer Journal, 27(2): 97–111.

Note on literate programming


  • Literate programming [11]: notebook including text and code
  • Markup formatting language: e.g. markdown
  • Separate content from rendering (≠ “what you see is what you get” editors)
  • Rendering includes text, code and results (from code computations)


Note on quarto


https://quarto.org


  • A generalization of R Markdown
  • Built on top of community tools such as the universal document converter Pandoc
  • Developed and supported by RStudio/Posit
  • Native support of complex documents (website, articles, books) and multiple languages for computations (R, Python, Julia)
  • Management of references, citations, figures, tables, metadata, etc.

Note on continuous integration

  • Implementation in git forges (e.g. github actions or gitlab CI/CD)
  • Triggered by commits
  • Automatic tests
  • Automatic deployment: package/software publication, website

[Illustration credit: Pratik89Roy, CC-BY-SA 4.0, via Wikimedia]

Two-fold reproducibility

The global scientific workflow of a reproducible process splits into two types of steps:

1. External

Process to obtain (intermediate) results outside of the notebook environment, for one or more of the following (non-exclusive) reasons:

  • the process is too long to be conducted in a notebook
  • the data to be processed is too big to be handled directly in the notebook
  • it needs a specific environment (e.g. a cluster, with gpus, etc.)
  • it needs to involve specific languages (e.g. C, C++, Fortran, etc.) or build tools (e.g. make, cmake, etc.)

2. Editorial

Notebook rendering with the results of the external process

Requirement

If the notebook contains everything needed to produce the final document,

\(\Rightarrow\) “direct reproducibility”, in the sense that the notebook is the only thing needed to reproduce the results.

Ultimately, the workflow must end with a direct-reproducibility step that concludes the whole process.

Other Reproducibility considerations

Data transfer
When applicable, the switch from external to editorial reproducibility is done through a “data transfer” step:

data produced by the external process \(\Rightarrow\) transferred to the notebook environment.

Requirement

Not only must the intermediate results be provided, but also the code to transfer them into the notebook environment.

There are a variety of software solutions to do so.

Examples of data transfer solutions

Intermediate results storage

  • Python: joblib.Memory, a caching mechanism for Python functions that saves the results of a function call to disk and loads them back later (see the sketch after this list).
  • R: the .RData file format, which can be loaded back in R with the load() function.
  • If the results are small enough, storing them in a plain text file (e.g. .csv, .tsv, .json) is also a solution.
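
A minimal sketch of the Python option (the function name and cache directory are hypothetical):

from joblib import Memory
import numpy as np

memory = Memory("cache", verbose=0)  # results stored under ./cache/

@memory.cache
def long_simulation(n_iter, seed=0):
    # stands in for an expensive external computation
    rng = np.random.default_rng(seed)
    return rng.normal(size=n_iter).cumsum()

results = long_simulation(10_000)  # computed once, loaded from disk on later calls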

Transfer of the results to the notebook environment

  • The stored results (a joblib cache directory or an .RData file) can be committed to the git repository and loaded directly in the notebook environment.
  • Alternatively, centralize input data (when too large) and intermediate results on a shared scientific repository (we recommend Zenodo for this purpose), and download them in the notebook environment, as sketched below.
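
For instance, fetching an intermediate-results file from Zenodo inside the notebook could look like this (the record URL and file name are hypothetical):

import urllib.request
import pandas as pd

# hypothetical Zenodo record hosting the intermediate results
url = "https://zenodo.org/records/1234567/files/intermediate_results.csv"
urllib.request.urlretrieve(url, "intermediate_results.csv")
results = pd.read_csv("intermediate_results.csv")  # load into the notebook session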