class: center, middle, bg_title, hide-count <img src="img/DSCTV.png" width="50px"/> <img src="img/UBEP.png" width="50px"/> <img src="img/LAIMS.png" width="50px"/> <style type="text/css"> .left-code { color: #777; width: 38%; height: 92%; float: left; } .right-code { color: #777; width: 55%; height: 92%; float: right; padding-top: 0.5em; } .left-plot { width: 43%; float: left; } .right-plot { width: 60%; float: right; } .hide-count .remark-slide-number { display: none; } .bg_title { position: relative; z-index: 1; } .bg_title::before { content: ""; background-image: url('img/bg1.png'); background-size: contain; position: absolute; top: 0px; right: 0px; bottom: 0px; left: 0px; opacity: 0.3; z-index: -1; } </style> <br> <br> <br> <br> <br> # **Machine Learning** .orange[and] **AI**<br>**.orange[Overview]** From **Basics** to **.orange[Agents]** <br> <br> <br> Corrado Lanera | **Unit of Biostatistics, Epidemiology, and Public Health** --- class: inverse, bottom, right, hide-count <img src="img/profilo_CL.jpg" width="50%" /> # Find me at... [
](https://www.unipd-ubep.it/) [**www.unipd-ubep.it**](https://www.unipd-ubep.it/) [
](mailto:Corrado.Lanera@ubep.unipd.it) [**Corrado.Lanera .orange[@ubep.unipd.it]**](mailto:Corrado.Lanera@ubep.unipd.it) [
](https://github.com/corradolanera) [
](https://telegram.me/CorradoLanera) **@CorradoLanera** [
](https://github.com/UBESP-DCTV) **@UBESP-DCTV**

---
class: inverse, hide-count

# What we are going to see

The purpose of this class is to introduce **.orange[what]** Machine Learning **.orange[is]**, which techniques are involved, how they work, and how to understand (and **.orange[trust]**!) their results; we will also look at **.orange[some examples]** and **.orange[best practices]** in conducting a machine learning project.

The class will cover everything from simple _classical_ techniques to large language models, introducing agents.

<br>
<br>

My principal aim is to give you the tools to start **.orange[understanding]** and **.orange[evaluating]** the quality of a machine learning project used for clinical purposes, possibly regardless of its complexity.

<br>
<br>
<br>
<br>

**.orange[Disclaimer]: Today I show you almost no code**

---
class: inverse, middle

# .center[**.orange[Overview]**]

- **Introduction**: what does "Machine Learning" mean?
- **Classifiers**
- **MLT Examples**
- Model **.orange[selection]** and **evaluation**
- **Neural Networks** and **.orange[Deep] Learning**
- **.orange[Unstructured] data** (e.g., images, text)
- **Large** **.orange[Language Models]**, ChatGPT, and Agents
- **Best practices** for implementing Machine Learning

---
class: inverse, middle, center, hide-count

# .orange[Introduction]

What does "Machine Learning" mean?

---

# .orange[What is **Machine Learning**]

.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt1[
.left[
Machine Learning deals with the study, the design, and the development of algorithms that give computers the capability to learn without being **explicitly** programmed.
]
.tr[
— Arthur Samuel, 1959
]
]

<img src="img/samuel.png" width="70%" style="display: block; margin: auto;" />

---

# .orange[What is **Machine Learning**]

.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt1[
.left[
A computer program is said to learn from **experience** (E) with respect to some class of **tasks** (T) and performance measure (P), if its **performance** at the given task improves with **experience**
]
.tr[
— _Machine Learning_ - Mitchell, 1997
]
]

.pull-left[
<img src="img/Tom-Mitchell-2.webp" width="100%" style="display: block; margin: auto;" />
]
.pull-right[
<br>
<br>
<br>

**.orange[Learning]**: performance on **T** as measured by **P** improves with **E**.
]

---

# .orange[What is **Machine Learning**]

.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt1[
.left[
A task (red box) requires an appropriate mapping - a model - from data described by features to outputs. **Obtaining** such a mapping from training data is what constitutes a **learning problem** (blue box).
]
.tr[
— _Machine Learning_ - Peter Flach, 2012
]
]

.left-column[
<img src="img/PeterCartoon-square.jpg" width="100%" style="display: block; margin: auto;" />
]
.right-column[
<img src="img/flach_learning_problem.png" width="100%" />
]

---

# .orange[Task]

A __task__ is something the ML must carry out

- the process of learning itself is not the task
- _Learning_ is .orange[the act of generating models] having the ability to perform the task

<br>
<br>

A __task__ is defined in terms of _how the ML should process a collection of examples_, i.e., a .orange[dataset].
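---

# .orange[T, P, E: a toy sketch]

A minimal, purely illustrative Python sketch of Mitchell's definition (the data and the 1-nearest-neighbour rule below are made up for this slide): **T** is classifying numbers as negative/non-negative, **P** is accuracy on a fixed test set, and **E** is a growing labelled sample.

``` python
# T: classify a number as negative (0) or non-negative (1)
# P: accuracy on a fixed test set
# E: a labelled training sample (more data = more experience)

def nn_predict(train, x):
    """1-nearest-neighbour rule: return the label of the closest training point."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

def accuracy(train, test):
    """P: fraction of test points classified correctly."""
    return sum(nn_predict(train, x) == y for x, y in test) / len(test)

test = [(x, int(x >= 0)) for x in (-2.5, -1.5, -0.5, 0.5, 1.5, 2.5)]

small_E = [(-3, 0), (1, 1)]  # little experience
large_E = [(x, int(x >= 0)) for x in (-3, -2, -1, -0.25, 0.25, 1, 2, 3)]

print(accuracy(small_E, test))  # 0.8333333333333334
print(accuracy(large_E, test))  # 1.0
```

Performance at **T**, as measured by **P**, improves with **E**: the program has learned.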
---

# .orange[Task: Example]

<small>

**Learning Problem** | Task **T**
---------------------|------------
Learning Checkers | **Playing checkers**
Handwriting recognition | **Recognizing/Classifying handwritten words/numbers within images**
Self-driving car | **Driving from A to B**
Disease extraction from EHRs | **Classifying EHRs by the disease reported (in free-text natural language)**
Describe patient movement in bed | **At any given time, provide position and dynamics of patients**

</small>

---

# .orange[Performance]

__Performance__ is a quantitative measure for assessing the ability of the ML system

- performance is measured on the task being carried out

Usually, __performance__ is measured in terms of:

- .orange[_accuracy_]: proportion of examples for which the model produces the correct output.
- .orange[_error rate_]: proportion of examples for which the model produces the incorrect output.

<br>
<br>
<br>

#### .orange[**WARNING**: unbalanced data require balanced metrics!]

---

# .orange[Performance: Example]

<small>

**Learning Problem** | Task **T** | Performance **P**
---------------------|------------|-------------------
Learning Checkers | Playing checkers | **% games won**
Handwriting recognition | Recognizing/Classifying handwritten words/numbers within images | **% correctly classified words**
Self-driving car | Driving from A to B | **Average distance traveled before an error (as judged by humans)**
Disease extraction from EHRs | Classifying EHRs by the disease reported (in free-text natural language) | **% of EHRs correctly classified**
Describe patient movement in bed | At any given time, provide position and dynamics of patients | **% average error in position or dynamics**

</small>

---

# .orange[Experience]

.orange[__Experience__] is primarily determined by the amount of supervision during the learning process and the availability of labeled data

---

# .orange[Experience: Example]

<small>

**Learning Problem** | Task **T** | Performance **P** | Experience **E**
---------------------|------------|-------------------|-----------------
Learning Checkers | Playing checkers | % games won | **playing (against itself)**
Handwriting recognition | Recognizing/Classifying handwritten words/numbers within images | % correctly classified words | **process data sets of handwritten words with given classification**
Self-driving car | Driving from A to B | Average distance traveled before an error (as judged by humans) | **Sequence of videos, still images, and steering commands recorded while observing a human driver**
Disease extraction from EHRs | Classifying EHRs by the disease reported (in free-text natural language) | % of EHRs correctly classified | **process EHRs with given classification**
Describe patient movement in bed | At any given time, provide position and dynamics of patients | % average error in position or dynamics | **time series of patient kinetic measures taken from wearable devices and bed weight sensors, and position and dynamics collected by videos recorded while observing in-bed patients**

</small>

---

# .orange[How do Machines **learn**?]

Machine learning is concerned with finding functions that **_best_ predict** outputs (responses), given data inputs (predictors)

.pull-left[
`$$Y \simeq f(X)$$`

<img src="img/ml-process.png" width="100%" />

.orange[_Learners_] are algorithms that improve their skills (in producing better models/functions) by learning from old/known .orange[__(training)__] data.
]
.pull-right[
<img src="img/mlt_loop.png" width="100%" />
]

> A .orange[_learner_] uses data and experience to perform better over time (i.e., producing new models that perform better than the previous ones)

---

# .orange[How do Machines **learn**?]

.left-code[
In traditional programming:

- **provide** an algorithm (a finite set of instructions)
- **provide** .orange[input] data (no training/new distinction)
- **obtain** the desired result.
In machine learning:

- **provide** the .orange[training] input (data)
- **provide** the .orange[training] known/desired result
- **obtain** the .orange[learning algorithm] (ingesting _new_ data and returning _new_ outputs).

Machine Learning problems are .orange[*optimisation*] ones.
]
.right-plot[
<img src="img/MLvsTrad.png" width="100%" style="display: block; margin: auto;" />
]

---

# Types of .orange[learning]

- **.orange[Unsupervised] learning**: The input data is _not labeled_ (there are no right answers!). Data is given to the model, which is left to learn optimal .orange[patterns]/.orange[clusters].

- **(Passive) .orange[Supervised] learning**: The learning algorithm is provided with a set of inputs along with the corresponding .orange[correct] outputs. The algorithm compares its current inferred output with the correct one to learn from its .orange[errors] (i.e., to minimize them).

- **.orange[Active] learning**: The learning algorithm .orange[interactively] queries a user (the _oracle_) to label new data with the desired (correct) outputs. With input data labeled on the fly by the .orange[oracle's knowledge], the model cycles through query/train stages on the remaining unlabeled data.

- **.orange[Reinforcement] learning (RL)**: The learning algorithm (an _agent_) interacts with an environment by performing actions in given states and then receiving rewards (or penalties). It learns a policy (state → action mapping) that optimally balances (maximizing) short-term and long-term gains.
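---

# Types of .orange[learning]: a toy sketch

A minimal sketch contrasting unsupervised and supervised learning on the same 1-D data (hypothetical numbers, hand-rolled for illustration; a real project would use a library such as scikit-learn):

``` python
data = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]

# Unsupervised: no labels; split the sorted data at the largest gap
# (a crude one-shot clustering).
xs = sorted(data)
gaps = [xs[i + 1] - xs[i] for i in range(len(xs) - 1)]
cut = gaps.index(max(gaps)) + 1
clusters = (xs[:cut], xs[cut:])
print(clusters)   # ([1.0, 2.0, 3.0], [10.0, 11.0, 12.0])

# Supervised: the correct outputs are given; learn a decision threshold
# halfway between the two labeled groups.
labels = [0, 0, 0, 1, 1, 1]
threshold = (max(x for x, y in zip(data, labels) if y == 0) +
             min(x for x, y in zip(data, labels) if y == 1)) / 2
print(threshold)  # 6.5
```

The unsupervised step finds .orange[patterns/clusters] on its own; the supervised step exploits the .orange[correct] outputs to learn a prediction rule.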
---

# .orange[Basic **components**]

.orange[**Training** dataset]: data used as input to the learner to train the model

.orange[**Validation** dataset]: data used by the learner for validation and optimization

.orange[Trained model]: the ML artifact that comes out of the training process

.orange[Cost (or loss) function]: a function to optimize in the ML system (e.g., sum of squared errors over the training data set)

.orange[**Test** dataset]: data provided to the trained model for performance estimation

---
class: inverse, middle, center, hide-count

# .orange[Classifiers]

---
class: middle, center, hide-count

<img src="img/food.jpg" width="100%" style="display: block; margin: auto;" />

---

# .orange[Classifiers] and .orange[regressors]

<img src="img/class-reg.png" alt="https://www.sharpsightlabs.com/blog/regression-vs-classification/" width="100%" />

---

# .orange[Classification]

.left-code[
__Feature space__

- data: points in `\(\mathbb{R}^d\)`
- dimensions: scalar measurements

<br>

__Classifier functions (_classifiers_)__

- a classifier for `\(K\)` classes is a function
$$ f:\mathbb{R}^d \to \{1, \ldots, K \} $$
- classifiers carve up the space into regions
]
.right-plot[
<img src="img/classification_3.png" width="100%" style="display: block; margin: auto;" />
]

---

# .orange[Quantifying errors]

If `\(f\)` is our classifier, i.e.,
for any given `\(x\)`, `\(f(x) = \tilde{y}\)` is the predicted class (with `\(y\)` the real one), then

**Loss function** (for .orange[K] classes `\(\{1, \cdots, K\}\)`):

`$$\begin{split}\mathfrak{L}: \{1, \cdots, K\}&\times \{1, \cdots, K\} &\to [0, \infty)\\ (f(x)&,\ y) &\mapsto \mathfrak{L} (f(x),\ y)\end{split}$$`

<br>

> If all mistakes are equally bad:
`$$\mathfrak{L}(i, j) = \begin{cases} 1 & \textrm {if } i\neq j \\ 0 & \textrm {if } i = j \\ \end{cases}$$`

Note: if the outcome of `\(f\)` is a set of probabilities for the classes, the same definition is valid considering

`$$\mathfrak{L}: \{p_1, \cdots, p_K\} \times \{1, \cdots, K\} \to [0, \infty)$$`

---

# .orange[Risk of classifier]

If the distribution of the classes is known, the .orange[risk] of a classifier is the expected loss

$$ {\rm risk}(f) = \mathbb{E}[\mathfrak{L}(f(X), {\rm true\ class\ of\ } X)] $$

We can evaluate a classifier by how large its risk is

<br>

> The best possible way to train a classifier is by **minimizing its risk**

<br>
<br>

> N.B. The above is a sort of common **_misuse of notation_**: a classifier is **.orange[a (single!) model]**, and **cannot be trained**!! When we say "train a model", the real meaning is

.center[**"an MLT starts producing models, somehow; new ones replacing the previous ones"**.]

Interpreting "replacement" as "modification", we can pretend that a (**.orange[single]**...) model is trained.
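---

# .orange[Quantifying errors]: a toy sketch

The 0-1 loss and the average loss of a fixed classifier on a labelled sample (an estimate of its risk) can be sketched in a few lines of Python (the classifier and sample below are made up):

``` python
def zero_one_loss(i, j):
    """L(i, j) = 1 if the predicted class i differs from the true class j, else 0."""
    return int(i != j)

def empirical_risk(f, sample):
    """Average loss of f over (x, true_class) pairs: an estimate of risk(f)."""
    return sum(zero_one_loss(f(x), y) for x, y in sample) / len(sample)

# A fixed (already produced) classifier: class 1 when x >= 0, else class 0.
f = lambda x: int(x >= 0)

sample = [(-2, 0), (-1, 0), (-0.5, 1), (0.5, 1), (2, 1)]
print(empirical_risk(f, sample))  # 0.2 (one mistake out of five)
```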
---
class: inverse, middle, center, hide-count

# .orange[Main MLT examples]

---

# .orange[Nearest neighbor]

.orange[__Idea:__] use the training data itself as the classifier

- Given: data point `\(x\)`
- Find the training data point closest to `\(x\)`
- Assign `\(x\)` the label of the closest point

---

# .orange[Nearest neighbor]

<img src="img/knn.png" width="60%" style="display: block; margin: auto;" />

---

## .orange[Nearest neighbor (100 data points)]

<img src="img/knn_2.png" width="60%" style="display: block; margin: auto;" />

---

# .orange[k-Nearest] Neighbor (kNN)

- Find the `\(k\)` closest training points
- Take a majority vote between these points

> .orange[__Rule of thumb:__] 3NN often works surprisingly well

---

## .orange[k-Nearest neighbor]

<div class="figure" style="text-align: center">
<img src="img/knn_k.png" alt="https://medium.com/analytics-vidhya/diabetes-classification-with-knn-and-logistic-regression-2edd3760a8c7" width="60%" />
<p class="caption">https://medium.com/analytics-vidhya/diabetes-classification-with-knn-and-logistic-regression-2edd3760a8c7</p>
</div>

---

# .orange[kNN: drawbacks]

In large datasets, finding the nearest data points is expensive

The computational burden grows with dimension

> it is the method of choice when the dataset is small

<br>
<br>
<br>
<br>

__What to do for large datasets:__

- Extract a concise summary

---

# .orange[Linear classifiers]

<img src="img/linear_classifier_3.png" width="70%" style="display: block; margin: auto;" />

---

# .orange[Linear classifiers: Support-Vector Machine]

A maximum margin classifier is called a __Support-Vector Machine__

<img src="img/linear_classifier_7.png" width="70%" style="display: block; margin: auto;" />

.footnote[A hyperplane is a **subspace of co-dimension = 1** of a space (i.e., 1 dimension less than the original space). E.g., a line (1D) in a plane (2D); a plane (2D) in a 3D space; a 23D subspace in a 24D space.]
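---

# .orange[kNN: a code sketch]

The k-nearest-neighbor rule from the slides above fits in a few lines of pure Python (1-D made-up data for illustration; real feature spaces are `\(\mathbb{R}^d\)`):

``` python
from collections import Counter

def knn_predict(train, x, k=3):
    """Classify x by majority vote among the k closest training points."""
    neighbours = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

# (feature, class) pairs: class "a" clusters near 0, class "b" near 10.
train = [(0, "a"), (1, "a"), (2, "a"), (9, "b"), (10, "b"), (11, "b")]

print(knn_predict(train, 1.5))  # a
print(knn_predict(train, 9.5))  # b
```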
---

# .orange[Limitations of linear classifiers]

.pull-left[
Problem 1: **curved optimal decision boundary**

- SVM solves Problem 1 using the so-called .orange[_kernel trick_]

<br>

Problem 2: **classes may overlap**

- SVM solves Problem 2 by:
  - permitting .orange[misclassified] training points (**C** hyper-parameter)
      - each such point contributes a .orange[_cost_] to the optimization target function
  - using the .orange[kernel trick]
]
.pull-right[
<img src="img/classification_3.png" width="100%" style="display: block; margin: auto;" />
]

---

# .orange[Example of the kernel trick]

Suppose you have .orange[non-linearly separable] data

<img src="img/kernel_1.png" width="70%" style="display: block; margin: auto;" />

> Accuracy of classification given by the linear classifier: .orange[75%]

---

# .orange[Example of the kernel trick]

Project it into a three-dimensional space where the new coordinates are

.left-column[
`$$\begin{cases} X_1 &= y_1^2 \\ X_2 &= y_2^2 \\ X_3 &= \sqrt{2}y_1y_2 \end{cases}$$`
]
.right-column[
<img src="img/kernel.gif" width="90%" style="display: block; margin: auto;" />
]

---

# .orange[Example of the kernel trick]

Run the SVM on the transformed data

.right-column[
<img src="img/kernel_2.gif" width="90%" style="display: block; margin: auto;" />
]

---

# .orange[Example of the kernel trick]

Now you have completely _linearly_ separable data

<img src="img/kernel_2.png" width="70%" style="display: block; margin: auto;" />

> Accuracy of classification given by the SVM classifier: .orange[100%]

---

# .orange[Ensemble classifiers]

__Weak classifier__

Consider two classes of equal size: assigning the class by coin flip gives a 50% expected error

> weak classifier: .orange[error rate **slightly below** 50%]

__Ensemble Classifier__

- trains .orange[many _weak_] classifiers
- .orange[combines results] by majority vote

If the weak classifiers are applicable to `\(k>2\)` classes, so is the ensemble.
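---

# .orange[Ensemble vote: a toy sketch]

Why combining weak classifiers helps, sketched with hand-built "weak" rules (hypothetical construction: each rule is wrong on a different, disjoint pair of inputs):

``` python
inputs = list(range(10))
truth = {x: x % 2 for x in inputs}  # true class: the parity of x

def make_weak(wrong_on):
    """A weak classifier: correct everywhere except on `wrong_on`."""
    return lambda x: 1 - truth[x] if x in wrong_on else truth[x]

weak = [make_weak({0, 1}), make_weak({2, 3}), make_weak({4, 5})]

def majority(x):
    """Ensemble prediction: simple majority vote of the weak classifiers."""
    return int(sum(f(x) for f in weak) > len(weak) / 2)

acc = lambda g: sum(g(x) == truth[x] for x in inputs) / len(inputs)
print([acc(f) for f in weak])  # [0.8, 0.8, 0.8]
print(acc(majority))           # 1.0 (at most one wrong vote per input)
```

Each weak rule alone is only 80% accurate, but no input collects two wrong votes, so the majority is always right.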
<br><br><br>

**Important example: .orange[Random Forests]**

---

# .orange[Classification by majority vote]

`\(m\)` classifiers take a vote

> let us suppose `\(m\)` is an odd number

Two choices:

- correct = `\(1\)`
- wrong = `\(-1\)`

The decision is made by simple majority

- for two classes and classifiers `\(f_1,\ldots ,f_m\)` with output `\(\pm1\)`, the majority vote at input `\(x\)` is

`$$\rm sgn \left( \sum_{j=1}^m f_j(x)\right)$$`

---

## .orange[Classification by majority vote]

<img src="index_files/figure-html/unnamed-chunk-22-1.png" width="70%" style="display: block; margin: auto;" />

---

# .orange[_Weak_ learner: tree classifier]

<img src="img/tree_1.png" width="70%" style="display: block; margin: auto;" />

---

# .orange[_Weak_ learner: tree classifier]

<img src="img/tree_2.png" width="70%" style="display: block; margin: auto;" />

---

# .orange[_Weak_ learner: tree classifier]

<img src="img/tree_3.png" width="70%" style="display: block; margin: auto;" />

---

# .orange[_Weak_ learner: tree classifier]

<img src="img/tree_4.png" width="70%" style="display: block; margin: auto;" />

---

# .orange[_Weak_ learner: tree classifier]

.pull-left[
<img src="img/tree_4.png" width="100%" style="display: block; margin: auto;" />
]
.pull-right[
<img src="img/tree_5.png" width="100%" style="display: block; margin: auto;" />
]

---

# .orange[Random forest]

.orange[Tree training]: Input `\(n\)` training points of classes `\(1,\ldots, K\)`

- select `\(n\)` points uniformly at random with replacement
- train a tree on the randomized data set

.orange[For each tree]:

- in each step, select `\(m\)` axes at random (**mtry** hyper-parameter)
- compute the best split point for each of these axes
- split along the one that minimizes the error

> .orange[Train **ntree** trees in total]
> - compute the class label of a new point `\(x\)` under each of the **ntree** trees
> - take a majority vote

---
class: inverse, middle, center, hide-count

# .orange[Model selection]

---

# .orange[Overfitting]

Sample data acts as
a proxy for the underlying data source

.orange[_Over-fitting_] means adapting too closely to the idiosyncrasies of a sample set

**Result**: small error on the training data but .orange[poor predictive performance]!

<img src="img/overfit.jpg" width="90%" style="display: block; margin: auto;" />

---

# .orange[Overfitting]

The model is .orange[not able to generalize]

It learns the data and .orange[not the underlying function]

It performs well on the training data but .orange[poorly with new data]

<img src="img/figure3.png" width="100%" style="display: block; margin: auto;" />

---

# .orange[Overfitting: example]

Two alternative models of human papillomavirus infection and its progression to cervical cancer (CIN)

The complex model includes multiple stages of pre-cancerous lesions, which can progress or regress at different rates (model parameters)

<br>

<div class="figure" style="text-align: center">
<img src="img/Figure_8.png" alt="Basu 2013" width="100%" />
<p class="caption">Basu 2013</p>
</div>

---

# .orange[Overfitting]

Prevalence data for CIN generated using the more complex model over a 30-year period among a fictional cohort of young women

The complex model (in green) actually has a better .orange[_fit_] to the early prevalence data (solid red dots) than does the simpler model (in blue)...
However, the complex model produced a pattern that poorly forecasts future prevalence (hollow red dots)

<img src="img/Figure_9.png" width="60%" style="display: block; margin: auto;" />

---

# .orange[Overfitting]

**.orange[Every]** additional parameter in the model introduces **new sources of uncertainty** and the potential to affect results in non-intuitive ways that may be either useful or deceptive

<br>
<br>
<br>

> Complex models must be well-characterized in terms of their behavior before they are used for .orange[__forecasting__]

---
name: berra
class: center, middle, hide-count

<img src="img/yogi-berra-photo-quote-1.jpg" width="100%" style="display: block; margin: auto;" />

---

# .orange[Bias and Variance trade-off]

In order to minimize the test error on new data points we need to

> **select a function** that achieves **.orange[_low variance_]** and **.orange[_low bias_]**.

- .orange[**Variance**] refers to the amount by which our predictions would **change if we estimated using a different training set**.

> The more flexible the model, the higher the variance.

- .orange[**Bias**] refers to the **error introduced by the approximation** we are making with our model (representing complicated data by a simple model).

> The simpler the model, the higher the bias.

There is a .orange[trade-off] between increasing variance (flexibility) and decreasing bias (simplicity) and vice versa.

<img src="img/tradeoff.png" width="30%" style="display: block; margin: auto;" />

---

# .orange[Cross-validation]

> How to select an adequate model based on sample data?

<br>

__Recall__: model selection **.orange[chooses a model complexity]** (hyper-parameter)

- Training a classifier chooses parameter values
- The training can often be formulated as minimizing the training error

<br>
<br>

.orange[Model selection **cannot be performed by minimizing the training error**]

- it would lead to overfitting

---

# .orange[Cross-validation]

1. Split data into three sets: a. training set, b.
validation set, c. test set (hold-out set)
2. Train classifiers with **.orange[different hyper-parameters]** on the training set
3. Select the one with the smallest **.orange[prediction error on the validation set]**
4. Estimate the **.orange[performance on the test set]**

<br>

A **.orange[separate test set]** is **crucial**:

- the prediction error estimate on the validation set is confounded by model selection

---

# .orange[Cross-validation]

Data splitting estimates the .orange[prediction error from data]

<br>

Prediction error estimates can be used in two ways

- model selection `\(\Leftrightarrow\)` .orange[optimize] performance
- classifier assessment `\(\Leftrightarrow\)` .orange[interpret] performance (estimates the prediction error of the final choice of classifier)

<br>
<br>

We **must not use** the **.orange[same data for both]**.

> **.orange[Every time]** you take even a single decision **after** looking at performance evaluated on some data, those data are no longer valid for performance estimation, and you need to use **.orange[new]** data for that.

---

# .orange[K-fold cross validation]

The misclassification error rate is computed on the observations in the held-out fold.

<img src="img/Cv1.png" width="100%" style="display: block; margin: auto;" />

---

# .orange[K-fold cross validation]

This procedure is .orange[repeated K] times; each time, a different group of observations is treated as the validation set.

<img src="img/Cv2.png" width="90%" style="display: block; margin: auto;" />

---

# .orange[K-fold cross validation]

The .orange[CV error rate] is then calculated as the average of these K error rates.

<img src="img/Cv3.png" width="90%" style="display: block; margin: auto;" />

---

# .orange[K-fold cross validation]

<img src="img/splits.png" width="100%" style="display: block; margin: auto;" />

Generally, .orange[K between 5 and 10] avoids over-training the model (variance), whilst avoiding too few training points (bias)

---

# .orange[K-fold cross validation]

0.
.orange[Remove test set] and set it aside 1. Divide remaining data into `\(K\)` .orange[equally sized] blocks 2. Cross-validate: for `\(k = 1,\cdots, K\)` - remove block `\(k\)` from training data - train classifier on remaining blocks. - estimate prediction error on block `\(k\)` 3. Estimates over all `\(k\)` and select best classifier 4. Retrain the best classifiers (i.e. with its hyper-parameters) on the whole training set (all K sets!) 5. When classifier is chosen and retrained, estimate its performance .orange[on test set] --- #.orange[Cross validation Flow] <img src="img/resampling.svg" width="100%" style="display: block; margin: auto;" /> --- #.orange[Cross validation variability] <img src="img/overfitting.png" width="100%" style="display: block; margin: auto;" /> --- #.orange[Bias variance trade-off] A predictor having high bias or variance won't do well in predicting on new data <img src="img/BV.png" width="70%" style="display: block; margin: auto;" /> Good, generalizable predictors need to have .orange[both low bias and low variance] --- #.orange[(Hyper-)parameters] MLT |parameters | Hyper-parameters| --------------------|-----------------------|----------------- Decision tree | Splits' locations | # splits Random forest | Splits' locations | # splits<br># trees<br># dimension (randomly selected) SVM | Hyper-plane's position | type of nonlinearity<br>margin<br>overlap Logistic regression | `\(\beta\)`s | polynomial degrees<br># nodes for splines<br>interactions ANN/DL | weights | # layers<br># neurons/layer<br># training's epochs<br>batch size<br>learning rate --- class: inverse, middle, center, hide-count # Ready to go _deeper_? 
<img src="img/perplesso.jpg" width="100%" style="display: block; margin: auto;" />

---
class: inverse, middle, center, hide-count

# .orange[Deep] Learning

---

# Neuron

I.e., anything more than good old-fashioned (generalized*) logistic regressions

<br>

.center[<img src="img/neuron.gif" width="70%"/>]

<br>

`\(\text{output(s)} = g(\sum_{i=1}^n a_i*w_i)\)`

<br>

\*generalized := any **non-linear**, **differentiable**, `\(g:\mathbb{R}^n\to \mathbb{R}\)` activation function.

---

# .orange[Fully connected] network

.center[
<img src="img/mlp.png" width="65%"/>
]

<br>

<img src="img/loss.png" width="100%"/>

---

### Can we try to write it down?

3 inputs; 2 hidden layers w/ 2 neurons each; 1 (sigmoid) output

---

# ML: .orange[optimized] neurons' network

.pull-left[
<img src="img/fc.png" width="80%"/>

Each `\(W\)` is one base dimension `\(\longrightarrow\)`<br>
The error is the height `\(\longrightarrow\)`

Every combination of possible `\(W\)`s has its own error, i.e., a height

Finding better `\(W\)`s by running down the (smooth) hills, we improve the model's performance! (.orange[It learns!])
]
.pull-right[
Select some (initially random) weights `\(W\)`, do the math, and obtain a result.

Compare it with the true result to obtain the error, i.e., a number.

(If the non-linearity `\(g\)` satisfies good mathematical requirements...)
<br>
<br>
<br>

<img src="img/descent.gif" width="100%"/>
]

---
class: inverse, middle, center, hide-count

# .orange[Unstructured] data

.left[
- Multi-dimensional single-information (e.g., images)
- Sequential one-dimension privileged single-information (e.g., text/signals)
]

---

## Images

<img src="img/imagetypegrayscale.png" width="90%" style="display: block; margin: auto;" />

<img src="img/imagetypergb.png" width="90%" style="display: block; margin: auto;" />

---

## Multi-dimensional single-information

### Convolutional networks

> Apply a filter (kernel) to the input data, producing a feature map that captures local patterns and spatial hierarchies in the data. I.e., a sort of summary, a spatial compression, of the input data.

<img src="img/convExample.png" width="100%" style="display: block; margin: auto;" />

---

# Convolutional networks

<img src="img/conv.jpg" width="100%" style="display: block; margin: auto;" />

---

# Convolutional networks

<img src="img/multi-cnn.png" width="100%" style="display: block; margin: auto;" />

---

# Convolutional networks

<br><br>

<img src="img/cnn-struct.png" width="100%" style="display: block; margin: auto;" />

---

### One-dimension privileged single-information

#### Sequences (input/output)

<br>

<img src="img/sequences.png" width="100%" style="display: block; margin: auto;" />

---

# Recurrent networks

<img src="img/rnn-full_CL.png" width="100%" style="display: block; margin: auto;" />

<small>
.pull-left[
`\(x^{<t>}\)`: input at position t

`\(T_x\)`: length of the input

`\(W^{[l]}_{yx}\)`: weight matrix used with input x for output y on layer l

`\(b^{[l]}_y\)`: (bias) vector for output y on layer l
]
.pull-right[
`\(y^{<t>}\)`: output at position t

`\(T_y\)`: length of the output

`\(a^{[l]}_{<t>}\)`: activation vector at position t on layer l
]
</small>

---

# Take them all

<small><small>

<div class="figure" style="text-align: center">
<img src="img/multi-dl.jpg" alt="<br><br><br><br><br>Network from
https://www.sciencedirect.com/science/article/pii/S0007091219306361 <br>Bradley A. Fritz, et al. 'Deep-learning model for predicting 30-day postoperative mortality' - BJA 2019" width="100%" />
<p class="caption"><br><br><br><br><br>Network from https://www.sciencedirect.com/science/article/pii/S0007091219306361 <br>Bradley A. Fritz, et al. 'Deep-learning model for predicting 30-day postoperative mortality' - BJA 2019</p>
</div>

</small></small>

---

# DARP-D

<small><small>

<div class="figure" style="text-align: center">
<img src="img/darp-d.png" alt="Network from https://doi.org/10.1371/journal.pone.0297793 <br>Corianò, Lanera, et al. 'Deep learning-based prediction of major arrhythmic events in dilated cardiomyopathy' - PLoS-One 2024" width="65%" />
<p class="caption">Network from https://doi.org/10.1371/journal.pone.0297793 <br>Corianò, Lanera, et al. 'Deep learning-based prediction of major arrhythmic events in dilated cardiomyopathy' - PLoS-One 2024</p>
</div>

</small></small>

---
class: inverse, middle, center, hide-count

# **Large** .orange[Language Models] (LLM)

---

### But... have you ever heard of [.orange[Chat GPT]](https://chat.openai.com)?

<img src="img/simple-prompt-en.png" width="100%" style="display: block; margin: auto;" />

---

<img src="img/100M-users.png" width="100%" />

---

# First appeared on 2022-11-30

<br>

<img src="img/1M-users.png" width="100%" />

---

# Currently

<br>

<img src="img/900M-uers.png" width="100%" />

---
class: hide-count

# **.orange[Chat GPT]**

- _What's Chat GPT?_ (reversed...)
  - T ...
  - P ...
  - G ...
  - Chat!

- _How Chat GPT?_
  - Base
  - Medium
  - Advanced

---
class: inverse, middle, center, hide-count

# .orange[What's **Chat GPT**?]

.left[
- **Transformer**
- **Pre-trained**
- **Generative**
- in **Chat**
]

---
class: inverse, middle, center, hide-count

# Chat GP.orange[T]: .orange[Transformer]

.pull-left[
#### June 2017 (The day everything changed!)
<img src="img/attention-is-all-you-need.png" alt="https://arxiv.org/abs/1706.03762" width="100%" /> ] .pull-right[ <img src="img/full_transformer.png" width="100%" /> ] --- # .orange[Encoder]/.orange[decoder] <img src="img/encdec.png" alt="https://jalammar.github.io/illustrated-transformer/" width="100%" style="display: block; margin: auto;" /> --- # Generative... .orange[Transformers] <img src="img/transformer-1.png" width="60%" style="display: block; margin: auto;" /> --- # Generative... .orange[Transformers] <img src="img/transformer-2.png" width="60%" style="display: block; margin: auto;" /> --- # Generative... .orange[Transformers] <img src="img/transformer-3.png" width="60%" style="display: block; margin: auto;" /> --- # Generative... .orange[Transformers] <img src="img/transformer-4.png" width="60%" style="display: block; margin: auto;" /> --- # Generative ... .orange[Transformers] <img src="img/transformer_decoding_2.gif" alt="https://jalammar.github.io/illustrated-transformer/" width="100%" style="display: block; margin: auto;" /> --- class: inverse, middle, center, hide-count # Chat G.orange[P]T: .orange[Pre-trained] <img src="img/2023-Alan-D-Thompson-AI-Bubbles-Rev-7b.png" alt="https://s10251.pcdn.co/pdf/2023-Alan-D-Thompson-AI-Bubbles-Rev-7b.pdf" width="90%" /> --- # .orange[Transfer] learning <img src="img/transferlearningworkflow.png" alt="https://it.mathworks.com/help/deeplearning/import-deep-neural-networks.html?s_tid=CRUX_lftnav" width="100%" /> --- # .orange[Transfer] learning <img src="img/transfer-learning.png" alt="https://www.mdpi.com/1424-8220/23/2/570" width="80%" /> --- # Potential .orange[usages] <img src="img/LLM-use-cases.png" alt="https://txt.cohere.com/llm-use-cases/" width="100%" /> --- class: inverse, middle, center, hide-count # In .orange[chat] --- # Prompt: .orange[cycle] <img src="img/llm-prompt.png" alt="https://medium.com/@tariqsaad1997/chatgpt-prompt-engineering-part-4-building-a-customized-chatbot-165db7515c29" width="100%" 
/> --- # Prompt: .orange[composition] <img src="img/chatgpt-prompt.png" alt="https://medium.com/@tariqsaad1997/chatgpt-prompt-engineering-part-4-building-a-customized-chatbot-165db7515c29" width="100%" /> --- class: inverse, middle, center, hide-count # .orange[How **Chat GPT**?] .left[ - Base: **Web** - Medium: **Prompt design** - Advanced: **API** ] --- class: inverse, middle, center, hide-count # .orange[Web] access .left[ https://chat.openai.com - privacy - age verification - robustness/correctness - chat history - free vs plus (vs pro) ] --- # Prompt design <small> Asking it for information is fun; in fact, it: - .orange[looks] smart... - makes mistakes quite .orange[often]... - helps us easily disseminate .orange[fake news]... but... - **.orange[Translate]** this text into Spanish... - **.orange[Summarize]** this article... - **.orange[Propose]** the outline for a course... - **.orange[Reply]** to an email... - **.orange[Extract]** cholesterol levels from test results... - **.orange[Rank]** the EHRs in trauma/non-trauma... - **.orange[Correct]** an exam... - **.orange[Draft]** a project... - **.orange[Write]** the R code to do this analysis... - **.orange[Debug]** this code... if we _play_ seriously, it could use a little more **.orange[strategy]**...<br> .right[... and a lot of **.orange[competence]**!] </small> Getting LLMs to perform tasks where we lack the competence to recognize if ( **.orange[when!!]** ) they provide **.orange[wrong/inaccurate]** answers... is **.orange[extremely dangerous]**!!! --- class: inverse, middle, center, hide-count # **.orange[Prompt Design]** <small> .left[ **role** = You are the assistant of a university professor. **context** = You are analyzing the comments from the students of the last course. **task** = Your task is to extract information from the provided text. **instructions** = You should extract the first and last words of the text.
**output** = Return the first and last words of the text separated by a dash, i.e., "first - last". **style** = Do not add any additional information, return only the requested information. **examples** = # Examples: text: 'This is an example text.' output: 'This - text' text: 'Another example text!!!' output: 'Another - text' **text** = The lecture was very interesting and the professor was very clear in his explanations. </small> ] --- class: inverse, middle, center, hide-count # .orange[API] access Application Programming Interface --- # <small>Application Programming Interface</small> ### .orange[Request] ``` python from openai import OpenAI client = OpenAI() completion = client.chat.completions.create( model="gpt-4o", messages=[ {"role": "developer", "content": "You are a helpful assistant."}, {"role": "user", "content": "Hello!"} ] ) print(completion.choices[0].message) ``` https://platform.openai.com/docs/guides/text-generation https://cdn.openai.com/spec/model-spec-2024-05-08.html#follow-the-chain-of-command --- # <small>Application Programming Interface</small> ### .orange[Response] <style type="text/css"> .small-code pre code { font-size: 60%; line-height: 1.2; } </style> .small-code[ ``` json { "id": "chatcmpl-123", "object": "chat.completion", "created": 1677652288, "model": "gpt-4o-mini", "system_fingerprint": "fp_44709d6fcb", "choices": [{ "index": 0, "message": { "role": "assistant", "content": "\n\nHello there, how may I assist you today?" }, "logprobs": null, "finish_reason": "stop" }], "service_tier": "default", "usage": { "prompt_tokens": 9, "completion_tokens": 12, "total_tokens": 21, "completion_tokens_details": { "reasoning_tokens": 0, "accepted_prediction_tokens": 0, "rejected_prediction_tokens": 0 } } } ``` ] https://platform.openai.com/docs/guides/gpt/chat-completions-api --- # .orange[Costs] <small><small><small>(https://platform.openai.com/docs/pricing)</small></small></small> <img src="img/gpt-api-price.png"
alt="https://openai.com/pricing" width="90%" /> --- # .orange[Counting] tokens <small>https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them</small> .pull-left[ <img src="img/token-count.png" alt="https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them" width="100%" /> ] .pull-right[ <br> <br> <br> > Interactive: https://platform.openai.com/tokenizer > > Programmatic: https://github.com/openai/tiktoken ] E.g., ~10,000 paragraphs, or ~750,000 words --> approx. ~1M tokens --- class: inverse, middle, center, hide-count # .orange[**Agents**] (.orange[autonomous] search) <img src="img/operator_find_guidelines.png" alt="https://operator.chatgpt.com" width="100%" /> --- class: inverse, middle, center, hide-count # .orange[**Agents**] (.orange[ask] for interaction) <img src="img/operator_request_interaction.png" alt="https://operator.chatgpt.com" width="50%" /> --- class: inverse, middle, center, hide-count # .orange[**Agents**] (.orange[ask] for sharing) <img src="img/agents_ask_for_sharing.png" alt="https://operator.chatgpt.com" width="100%" /> --- class: inverse, middle, center, hide-count # .orange[**Agents**] (.orange[do] stuff!!) <img src="img/agents_do_stuff.png" alt="https://operator.chatgpt.com" width="100%" /> --- class: inverse, middle, center, hide-count # .orange[Best practices for implementing]<br>Machine Learning<br>.orange[projects] --- ## .orange[Start quickly and simply; **then iterate**!]
Keep it .orange[robust] - lower model complexity and fewer parameters are always beneficial Keep it .orange[simple], both in model selection and in the data for your analysis - start with the minimal set of data that could get you to a good result ## .orange[Treat data with suspicion] .orange[Look] at the data - dig into its details - look for correlations - systematic biases, errors, and flaws .orange[Normalize] input data - ML algorithms can perform .orange[poorly and slowly] if there are large differences in scale between different features --- ## Validate (and fine-tune) your Model Separate your data into .orange[training], .orange[validation], and .orange[test sets]. > .orange[If you make **ANY** decision after having seen performance on a data set, it becomes a training set (even if you have treated it as a test one)] ## Do not be fooled by Accuracy For an event that happens only 1% of the time, you can easily report an accuracy of 99%: meaningless. Before starting a (classification) project, better to figure out which precision and recall (or other _metrics_) the application requires to be useful > - .orange[Build the model with these metrics in mind] > - .orange[When in doubt, use balanced metrics] --- ## .orange[Healthcare does not trust black boxes] Some ML methods are more transparent than others - Clustering tends to be easy to interpret, because it creates groupings of concepts - Linear regression can tell you how important each feature is to the final output - The same holds for decision trees, but they are easily prone to overfitting! <br> Random forests are .orange[difficult to interpret]. Neural networks and deep learning are .orange[truly black boxes], i.e., they offer very little transparency about what is important in the decision-making process (or require very high effort to obtain it).
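One model-agnostic way to peek inside such a black box is **permutation importance**: shuffle one feature's values and measure how much the error grows. A minimal pure-Python sketch (toy linear "model" and synthetic data, all hypothetical, just to illustrate the idea):

.small-code[
``` python
import random

# Toy "model": a fixed linear scorer over three features
# (purely hypothetical, not a real clinical model).
def predict(row):
    x1, x2, x3 = row
    return 2.0 * x1 + 0.5 * x2 + 0.0 * x3  # x3 is irrelevant by construction

def mse(rows, targets):
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(rows, targets, j, rng):
    """How much the error grows when feature j's values are shuffled."""
    baseline = mse(rows, targets)
    column = [r[j] for r in rows]
    rng.shuffle(column)
    permuted = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
    return mse(permuted, targets) - baseline

rng = random.Random(42)
rows = [tuple(rng.uniform(-1, 1) for _ in range(3)) for _ in range(200)]
targets = [predict(r) for r in rows]  # perfect fit: baseline error is 0

importances = [permutation_importance(rows, targets, j, rng) for j in range(3)]
# Expect: x1 (largest weight) most important, x3 (zero weight) near 0
```
]

A real project would use dedicated tooling (e.g., scikit-learn's `permutation_importance` or a SHAP library) rather than this toy loop, but the underlying intuition is the same.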
> Some techniques for explanation: - Variable importance - SHAP analyses --- # My .orange[very personal] classification of AI users | | Ignorant of AI | AI User | |----------------|---------------------------------------------|-------------------------------------------------------------------------| | **Inexperienced** | **Dependent Outsider**<br/>Relies on others. Doesn't cause harm, but brings no advantages either: out of the game, surviving only thanks to others. | **Exposed Charlatan**<br/>Might cause damage before getting burned. The most common category, at all levels. To them, everything appears way too easy (but only a true expert notices). | | **Expert** | **Resistant Craftsman**<br/>Risks being isolated or wiped out of the market. | **Modern Alchemist**<br/>Can produce anything from terrible to sublime, even "quality scams": the most ambivalent yet powerful figure. | --- class: inverse, center, middle, hide-count <img src="img/procione.jpeg" width="50%" /> <br> # Thank .orange[you] for your attention! <br> [
](https://www.unipd-ubep.it/) [**www.unipd-ubep.it**](https://www.unipd-ubep.it/) | [
](mailto:Corrado.Lanera@ubep.unipd.it) [**Corrado.Lanera@ubep.unipd.it**](mailto:Corrado.Lanera@ubep.unipd.it) [
](https://github.com/corradolanera) [
](https://twitter.com/corradolanera) [
](https://telegram.me/CorradoLanera) **@CorradoLanera** | [
](https://github.com/UBESP-DCTV) **@UBESP-DCTV** [
](https://calendly.com/corradolanera) [**calendly.com/corradolanera**](https://calendly.com/corradolanera)