class: center, middle, bg_title, hide-count <img src="img/UniPD.png" width="150px"/> <img src="img/DSCTV.png" width="50px"/> <img src="img/UBEP.png" width="50px"/> <img src="img/LAIMS.png" width="50px"/> <style type="text/css"> .left-code { color: #777; width: 38%; height: 92%; float: left; } .right-code { color: #777; width: 55%; height: 92%; float: right; padding-top: 0.5em; } .left-plot { width: 43%; float: left; } .right-plot { width: 60%; float: right; } .hide-count .remark-slide-number { display: none; } .bg_title { position: relative; z-index: 1; } .bg_title::before { content: ""; background-image: url('img/bg1.png'); background-size: contain; position: absolute; top: 0px; right: 0px; bottom: 0px; left: 0px; opacity: 0.3; z-index: -1; } </style> <br> <br> <br> # Practical **.orange[Artificial Intelligence] for **Medical Data Analyses** with **.orange[R]<br>Advanced ## **Efficient Reproducibility in R**<br>**.orange[A Journey from Patchworks to Projects]** .orange[Biostatistics Month] 2025/05/14 [.orange[Corrado Lanera]](mailto:CorradoLanera@ubep.unipd.it) | [**Unit of Biostatistics, Epidemiology, and Public Health**](https://www.unipd-ubep.it/) --- class: inverse, bottom, right, hide-count <img src="img/profilo_CL.jpg" width="50%" /> # Find me at... [
](https://www.unipd-ubep.it/) [**www.unipd-ubep.it**](https://www.unipd-ubep.it/) [
](mailto:Corrado.Lanera@ubep.unipd.it) [**Corrado.Lanera**__.orange[@ubep.unipd.it]__](mailto:Corrado.Lanera@ubep.unipd.it) [
](https://github.com/corradolanera) [
](https://twitter.com/corradolanera) [
](https://telegram.me/CorradoLanera) **@CorradoLanera** [
](https://github.com/UBESP-DCTV) **@UBESP-DCTV** --- # How to .orange[follow] the workshops #### Resources: - .orange[Slides]: [https://corradolanera.github.io/ws-reproj/](https://corradolanera.github.io/ws-reproj/#1) #### Coding: 1. If you like it **.orange[local]**: code **on your system** - At any stage you can:<br> `usethis::use_course("CorradoLanera/<stage name>")`: - `<stage name>`s: - `bare.proj.01` - `foldered.02` - `changes.dry.03` - `structure.04` - `private.05` - `targets.06` 3. If you like it **.orange[remote]**: code (free of charge) **on Posit Cloud Workspace**:<br> [https://bit.ly/positcloud-ws-reproj](https://bit.ly/positcloud-ws-reproj) <small> .right[Whole .orange[project]: [https://github.com/CorradoLanera/ws-reproj](https://github.com/CorradoLanera/ws-reproj) ] </small> --- class: hide-count # .orange[Outline] - **.underline[Intro]**: definition of **.orange[AS]**AP, the anatomy of a **.orange[patchwork]** - **.underline[Program]**: start **.orange[slow]**, do not waste **.orange[_time_]** - **.underline[Step-by-step]**: - Setup an **R**/**RStudio** project: git, `{renv}`, and `{here}`. - Are you ready to **.orange[share]** and **.orange[collaborate]**? (welcome **GitHub**) - What about **.orange[folders]**? - What about **.orange[changes]**? + Data / **objects** (versions?) + Code / **Functions** (isolation, abstractions?) + **Bugs** (tests?) - Please, give me some **.orange[structure]**! The magic of the `DESCRIPTION` file - Oops! I have internal (**.orange[private]**) data for my (**.orange[public]**) project! - working with `{targets}` - **.underline[All-in-one template]**: welcome [**`{laims.analysis}`**](https://github.com/UBESP-DCTV/laims.analysis) - **.underline[Outro]**: definition of AS**.orange[AP]**, the anatomy of a **.orange[project]** --- class: inverse, middle, center, hide-count # Intro: **.orange[AS]AP** **Disclaimer**: ALL of the following is derived by personal experience, **.orange[failures]**, and lessons learnt in the last 11 years at UBEP as a biostatistician and data scientist, coding in R quite every single night 🌉. --- # .orange[AS]AP What usually happens... .orange[As Soon] As Possible - get the raw-data, manipulate them to .orange[directly fix easy-bugs] - .orange[do the analysis] -- - get updated data, overwrite previous, directly manipulate them to fix easy bugs (...some more **added**? some of them **forgotten**?) - update (overwrite) the analysis - add a copy **`analysis_v2.R`** of the script (just to be sure) -- - find a bug, fix it and save third **`analysis_v2_ok.R`** version of analysis - Searching for a faster execution, add an additional **`analysis_v2_ok_faster.R`** faster version (surely identical, console-checked!) to save full 15 seconds (after 2h of try-and-see coding experiments) -- - > **send** the (hopefully) correct report 3 minutes **.underline[before]** **.orange[_by yesterday_!!!]** -- - > get a .orange[**BUG** reported by the **BOSS**!] (... it's a **.orange[_B😨GSS_]** 🤦ðŸ˜) -- - But now we can not give up: produce a new version night-time (**.orange[without falling asleep]**) - **.orange[AS]AP.orange[AS]AP.orange[AS]AP.orange[AS]AP** -- .center[_... many new-files, new-versions, new-names, new-plots, new-fixes later..._] -- <br> .center[**SENT!**] --- class: inverse, middle, center, hide-count ... -- ... -- **.orange[_B😨GSS_]** 🤦😠-- The project is then passed to a new analyst ... **.orange[AS]AP.orange[AS]AP.orange[AS]AP.orange[AS]AP** ...<br>**.orange[ASAY]** (**.orange[A]s .orange[S]oon .orange[A]s .orange[Y]ESTERDAY**) -- But they can't understand **ANYTHING** of the work already done...! So they prefer to **RESTART** from scratch<br>(adding sub-folders, sub-files, sub-version, sub-plots, sub-fixes to the project!) <br> **.orange[DONE!]**, and before dinner! 😎😎😎 -- ... -- ... -- **.orange[_B😱GSS_]** 🤦😠--- class: inverse, middle, center, hide-count #### BOSS: "I would have spend less time if did it by myself!" "Ok, I will do it!" "mmm 🤔 ... which are the original/last data?!" ... -- ... -- ... -- <img src="img/gpt_help_me.png" width="100%"/> 😨🤦😠-- .orange[**THE END.**] --- class: inverse, middle, center, hide-count Go Tidyverse (but...) <img src="img/data-science.png" width="100%"/> [tidyverse.tidyverse.org/articles/paper.html](https://tidyverse.tidyverse.org/articles/paper.html) --- class: middle, center <img src="img/tidy-red.png" width="100%"/> --- class: middle, center <img src="img/time-red.png" width="100%"/> --- class: inverse, middle, center, hide-count # .orange[Don't waste **time**] --- # .orange[Whose] **time** <small><small>you don't have to waste</small></small>?! - Your time?! - My time?! - Boss' time?! -- <br> **.center[.orange[Project time] = ]** `$$\sum_{i = 1}^{\#\ correct\ version!}\left(\sum_{p\in team}\sum_{t\in tasks} time_i(by = p, for = t, include\_breaks = TRUE)\right)$$` -- <br> <br> .center[ **AS.orange[AP]** `\(:= argmin(Project\ time)\)` (**.orange[CL] personal** definition!) ] --- class: inverse, middle, center, hide-count # .orange[Start **slow**] --- # **If** you need it, .orange[do it]! - Don't do things you don't need > **you will .orange[waste] time**! - Be prepared to add features/components > **you will .orange[save] time**! -- <br> <br> <br> <br> In the following step-by-step project development, we aim to show how to add features accordingly to sequential requirements. - change **.orange[the order]** at your convenience - change **.orange[the tools]** at your convenience (or as they will improve/change) - don't implement stages you don't need (until you need them...!) > This won't be a workshop about the tools we will show. It will be a workshop about the processes, the structures, and the motivations in using them. --- class: inverse, middle, center, hide-count # **.orange[Step**-by-**step]** --- # Create a project We start from just a bare project > create new RStudio project (including git and {renv} ticks) > `usethis::use_course("CorradoLanera/bare.proj.01")` > Posit Cloud: 01 - Bare project -- - let's talk about `{here}` (and a `00-setup.R` script) -- - let's talk about RStudio projects and `{renv}` -- - let's talk about git --- # Start some analysis - analysis.R - report.rmd > Where do we put input data and output files? How do we reach them? --- class: inverse, middle, center, hide-count .orange[Live session] Start from: `usethis::use_course("CorradoLanera/bare.proj.01")` Posit Cloud: 01 - Bare project --- # Changes? .orange[DRY]: Do Not Repeat Yourself! > What if some input/output file changed? > What about repeated code? > What if you find some bugs? --- class: inverse, middle, center, hide-count .orange[Live session] Start from: `usethis::use_course("CorradoLanera/foldered.02")` Posit Cloud: 02 - Foldered --- # All this code .orange[together] is really really HORRIBLE and .orange[confusing]! > what about a clear structure? > what about error-prone repeated code? --- class: inverse, middle, center, hide-count .orange[Live session] Start from: `usethis::use_course("CorradoLanera/changes.dry.03")` Posit Cloud: 03 - Changes & DRY --- # Data Protection... > What if I have private data (in a centralized folder, and visible by colleagues only)? > What if I would like to share the code (but not the data)? --- class: inverse, middle, center, hide-count .orange[Live session] Start from: `usethis::use_course("CorradoLanera/structure.04")` Posit Cloud: 04 - structure --- # Clear pipeline and automation + automatic discovering and saving of intermediate/final computed objects + automatic discovering of functions' and objects' dependencies + automatic update of all the outdated dependencies for updates (and only those) > let's talk about `{targets}`: --- class: inverse, middle, center, hide-count .orange[Live session] Start from: `usethis::use_course("CorradoLanera/private.05")` Posit Cloud: 05 - private --- class: inverse, middle, center, hide-count .orange[Final result] Take as reference: `usethis::use_course("CorradoLanera/targets.06")` or Posit Cloud: 06 - targets --- # A comprehensive .orange[template] [https://github.com/UBESP-DCTV/laims.analysis](https://github.com/UBESP-DCTV/laims.analysis) <small>- _You will need a (free) GitHub account_</small> All explored features, plus: - an **.orange[`explore.R`]** script specifically designed to host completely **unruled** exploratory code, i.e., ugly, un-styled, un-commented, un-usual, un-working, temporary code (which can be easily deleted time-after-time, maintaining their evidence and lesson-learnt thanks to git!) - **.orange[`targets` objects directly usable]**: on computations, reports, and tests (maintaining their privacy if shared storage system is implemented) - a **.orange[qmd report template]**: including logos, authorship with OrcID, citations, marginal notes, custom size for figures, ... (and more as powered by Quarto<sup>*</sup>) - **.orange[test coverage]**: _**Percentage** of "**do-something**" code tested._ - **.orange[lint checks]**: _How **good** is your code "**statically**"?_ - **.orange[GitHub Action]**: _**automatic** execution of tests/lint/CRAN checks at **every push**!_ - **.orange[R-package structure]**: _make functions available as a **ready-to-use** r-package._ <small> .footnote[[*] We didn't talk about Quarto today, if interested see: [https://quarto.org/](https://quarto.org/), and [https://books.ropensci.org/targets/literate-programming.html#quarto-targets](https://books.ropensci.org/targets/literate-programming.html#quarto-targets).] </small> --- class: inverse, middle, center, hide-count # Outro: **AS.orange[AP]** --- # AS.orange[AP] As Soon .orange[As Possible] - Start .orange[easy] and .orange[slow] but keep your eyes on your final results! - If your results are not correct, all your _high-speed_ goes down to zero! - **Bugs will be there**. For sure! <br>The greatest error you can do is to assume your code is going to be error-free - Do/check .orange[only] what you really need to do/check - Enforce .orange[standards] and a coherent .orange[style] - Use **templates**: .orange[Conventions] are all about making repetitive stuff **.orange[invisible]**! --- # Technical recap - Track + changes (git) + environment (`{renv}`) - Use portable paths (`{here}`) - Use **standard** folder and files (keep structure .orange[invisible]) - .orange[Refactor] your code to extract logic by progressive layers of abstraction **DRY** - Use `DESCRIPTION` to activate R/RStudio super-powers (functions, tests, ...) - Environmental vars (from function call, `.gitignore`d) for private locations - Let `{targets}` manage project dependencies, efficient runs, and efficient storage of your objects - Use GitHub (or similar) to share code (not data nor secrets!) with collaborators, journals, public, ...! .center[**If you know you need all of above, use templates! Like .orange[`laims.analysis`].**] -- .center[ Go for AS**.orange[AP]** `\(:= argmin(Project\ time)\)` !! ] --- class: inverse, center, middle, hide-count .bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt5[ .left[ _**.orange[The best]** is the enemy of the good!_ <br> .center[... and ...] <br> .right[_The only way to go fast is to **.orange[go well!]**_] ] .tr[ — Voltaire + Uncle Bob ] ] # Thank .orange[you] for the attention! Slides: [https://CorradoLanera.github.io/ws-reproj](https://CorradoLanera.github.io/ws-reproj/#1) Repo: [https://github.com/CorradoLanera/ws-reproj](https://github.com/CorradoLanera/ws-reproj) <br> [
](https://www.unipd-ubep.it/) [**www.unipd-ubep.it**](https://www.unipd-ubep.it/) | [
](https://github.com/UBESP-DCTV) **@UBESP-DCTV** [
](mailto:Corrado.Lanera@ubep.unipd.it) [**.orange[Corrado.Lanera]@ubep.unipd.it**](mailto:Corrado.Lanera@ubep.unipd.it) | [
](https://github.com/corradolanera) [
](https://twitter.com/corradolanera) [
](https://telegram.me/CorradoLanera) **@CorradoLanera**