class: center, middle, bg_title, hide-count <img src="img/DSCTV.png" width="50px"/> <img src="img/UBEP.png" width="50px"/> <img src="img/LAIMS.png" width="50px"/> <style type="text/css"> .left-code { color: #777; width: 38%; height: 92%; float: left; } .right-code { color: #777; width: 55%; height: 92%; float: right; padding-top: 0.5em; } .left-plot { width: 43%; float: left; } .right-plot { width: 60%; float: right; } .hide-count .remark-slide-number { display: none; } .bg_title { position: relative; z-index: 1; } .bg_title::before { content: ""; background-image: url('img/bg1.png'); background-size: contain; position: absolute; top: 0px; right: 0px; bottom: 0px; left: 0px; opacity: 0.3; z-index: -1; } </style> <br> <br> <br> <br> <br> # **Machine Learning** .orange[and] **AI**<br>**.orange[Overview]** From **Basics** to **.orange[Agents]** <br> <br> <br> Corrado Lanera | **Unit of Biostatistics, Epidemiology, and Public Health** --- class: inverse, bottom, right, hide-count <img src="img/profilo_CL.jpg" width="50%" /> # Find me at... [
](https://www.unipd-ubep.it/) [**www.unipd-ubep.it**](https://www.unipd-ubep.it/) [
](mailto:Corrado.Lanera@ubep.unipd.it) [**Corrado.Lanera .orange[@ubep.unipd.it]**](mailto:Corrado.Lanera@ubep.unipd.it) [
](https://github.com/corradolanera) [
](https://telegram.me/CorradoLanera) **@CorradoLanera** [
](https://github.com/UBESP-DCTV) **@UBESP-DCTV**

---
class: inverse, hide-count

# What we are going to see

The purpose of this class is to introduce **.orange[what]** Machine Learning **.orange[is]**, which techniques are involved, how they work, and how to understand (and **.orange[trust]**!) their results; we will also look at **.orange[some examples]** and **.orange[best practices]** in conducting a machine learning project.

The class will cover everything from simple _classical_ techniques to large language models, introducing agents.

<br>
<br>

My principal aim is to give you the tools to start **.orange[understanding]** and **.orange[evaluating]** the quality of a machine learning project used for clinical purposes, possibly regardless of its complexity.

<br>
<br>
<br>
<br>

**.orange[Disclaimer]: Today I show you almost no code**

---
class: inverse, middle

# .center[**.orange[Overview]**]

- **Introduction**: what does "Machine Learning" mean?
- **Classifiers**
- **MLT Examples**
- Model **.orange[selection]** and **evaluation**
- **Neural Networks** and **.orange[Deep] Learning**
- **.orange[Unstructured] data** (e.g., images, text)
- **Large** **.orange[Language Models]**, ChatGPT, and Agents
- **Best practices** for implementing Machine Learning

---
class: inverse, middle, center, hide-count

# .orange[Introduction]

What does "Machine Learning" mean?

---

# .orange[What is **Machine Learning**]

.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt1[
.left[
Machine Learning deals with the study, the design, and the development of algorithms that give computers the capability to learn without being **explicitly** programmed.
]
.tr[
— Arthur Samuel, 1959
]
]

<img src="img/samuel.png" width="70%" style="display: block; margin: auto;" />

---

# .orange[What is **Machine Learning**]

.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt1[
.left[
A computer program is said to learn from **experience** (E) with respect to some class of **tasks** (T) and performance measure (P), if its **performance** at the given task improves with **experience**
]
.tr[
— _Machine Learning_ - Mitchell, 1997
]
]

.pull-left[
<img src="img/Tom-Mitchell-2.webp" width="100%" style="display: block; margin: auto;" />
]
.pull-right[
<br>
<br>
<br>

**.orange[Learning]**: performance on **T** as measured by **P** improves with **E**.
]

---

# .orange[What is **Machine Learning**]

.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt1[
.left[
A task (red box) requires an appropriate mapping - a model - from data described by features to outputs. **Obtaining** such a mapping from training data is what constitutes a **learning problem** (blue box).
]
.tr[
— _Machine Learning_ - Peter Flach, 2012
]
]

.left-column[
<img src="img/PeterCartoon-square.jpg" width="100%" style="display: block; margin: auto;" />
]
.right-column[
<img src="img/flach_learning_problem.png" width="100%" />
]

---

# .orange[Task]

A __task__ is something the ML must carry out

- the process of learning itself is not the task
- _Learning_ is .orange[the act of generating models] having the ability to perform the task

<br>
<br>

A __task__ is defined in terms of _how the ML should process a collection of examples_, i.e., a .orange[dataset].
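---

# .orange[T, P, E: a toy sketch]

A minimal, purely illustrative Python sketch of Mitchell's definition (the data and the 1-nearest-neighbour rule below are made up for this slide): **T** is classifying numbers as negative/non-negative, **P** is accuracy on a fixed test set, and **E** is a growing labelled sample.

``` python
# T: classify a number as negative (0) or non-negative (1)
# P: accuracy on a fixed test set
# E: a labelled training sample (more data = more experience)

def nn_predict(train, x):
    """1-nearest-neighbour rule: return the label of the closest training point."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

def accuracy(train, test):
    """P: fraction of test points classified correctly."""
    return sum(nn_predict(train, x) == y for x, y in test) / len(test)

test = [(x, int(x >= 0)) for x in (-2.5, -1.5, -0.5, 0.5, 1.5, 2.5)]

small_E = [(-3, 0), (1, 1)]  # little experience
large_E = [(x, int(x >= 0)) for x in (-3, -2, -1, -0.25, 0.25, 1, 2, 3)]

print(accuracy(small_E, test))  # 0.8333333333333334
print(accuracy(large_E, test))  # 1.0
```

Performance at **T**, as measured by **P**, improves with **E**: the program has learned.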
---

# .orange[Task: Example]

<small>

**Learning Problem** | Task **T**
---------------------|------------
Learning Checkers | **Playing checkers**
Handwriting recognition | **Recognizing/Classifying handwritten words/numbers within images**
Self-driving car | **Driving from A to B**
Disease extraction from EHRs | **Classifying EHRs by the disease reported (in free-text natural language)**
Describe patient movement in bed | **At any given time, provide position and dynamics of patients**

</small>

---

# .orange[Performance]

__Performance__ is a quantitative measure for assessing the ability of the ML system

- performance is measured on the task being carried out

Usually, __performance__ is measured in terms of:

- .orange[_accuracy_]: proportion of examples for which the model produces the correct output.
- .orange[_error rate_]: proportion of examples for which the model produces the incorrect output.

<br>
<br>
<br>

#### .orange[**WARNING**: unbalanced data require balanced metrics!]

---

# .orange[Performance: Example]

<small>

**Learning Problem** | Task **T** | Performance **P**
---------------------|------------|-------------------
Learning Checkers | Playing checkers | **% games won**
Handwriting recognition | Recognizing/Classifying handwritten words/numbers within images | **% correctly classified words**
Self-driving car | Driving from A to B | **Average distance traveled before an error (as judged by humans)**
Disease extraction from EHRs | Classifying EHRs by the disease reported (in free-text natural language) | **% of EHRs correctly classified**
Describe patient movement in bed | At any given time, provide position and dynamics of patients | **% average error in position or dynamics**

</small>

---

# .orange[Experience]

.orange[__Experience__] is primarily determined by the amount of supervision during the learning process and the availability of labeled data

---

# .orange[Experience: Example]

<small>

**Learning Problem** | Task **T** | Performance **P** | Experience **E**
---------------------|------------|-------------------|-----------------
Learning Checkers | Playing checkers | % games won | **playing (against itself)**
Handwriting recognition | Recognizing/Classifying handwritten words/numbers within images | % correctly classified words | **process data sets of handwritten words with given classification**
Self-driving car | Driving from A to B | Average distance traveled before an error (as judged by humans) | **Sequence of videos, still images, and steering commands recorded while observing a human driver**
Disease extraction from EHRs | Classifying EHRs by the disease reported (in free-text natural language) | % of EHRs correctly classified | **process EHRs with given classification**
Describe patient movement in bed | At any given time, provide position and dynamics of patients | % average error in position or dynamics | **time series of patient kinetic measures taken from wearable devices and bed weight sensors, and position and dynamics collected by videos recorded while observing in-bed patients**

</small>

---

# .orange[How do Machines **learn**?]

Machine learning is concerned with finding functions that **_best_ predict** outputs (responses), given data inputs (predictors)

.pull-left[
`$$Y \simeq f(X)$$`

<img src="img/ml-process.png" width="100%" />

.orange[_Learners_] are algorithms that improve their skills (in producing better models/functions) by learning from old/known .orange[__(training)__] data.
]
.pull-right[
<img src="img/mlt_loop.png" width="100%" />
]

> A .orange[_learner_] uses data and experience to perform better over time (i.e., producing new models that perform better than the previous ones)

---

# .orange[How do Machines **learn**?]

.left-code[
In traditional programming:

- **provide** an algorithm (a finite set of instructions)
- **provide** .orange[input] data (no training/new distinction)
- **obtain** the desired result.
In machine learning:

- **provide** the .orange[training] input (data)
- **provide** the .orange[training] known/desired result
- **obtain** the .orange[learning algorithm] (ingesting _new_ data and returning _new_ outputs).

Machine Learning problems are .orange[*optimisation*] ones.
]
.right-plot[
<img src="img/MLvsTrad.png" width="100%" style="display: block; margin: auto;" />
]

---

# Types of .orange[learning]

- **.orange[Unsupervised] learning**: The input data is _not labeled_ (there are no right answers!). Data is given to the model, which is left to learn optimal .orange[patterns]/.orange[clusters].

- **(Passive) .orange[Supervised] learning**: The learning algorithm is provided with a set of inputs along with the corresponding .orange[correct] outputs. The algorithm compares its current inferred output with the correct one to learn from its .orange[errors] (i.e., to minimize them).

- **.orange[Active] learning**: The learning algorithm .orange[interactively] queries a user (the _oracle_) to label new data with the desired (correct) outputs. With input data labeled on the fly by the .orange[oracle's knowledge], the model cycles through query/train stages on the remaining unlabeled data.

- **.orange[Reinforcement] learning (RL)**: The learning algorithm (an _agent_) interacts with an environment by performing actions in given states and then receiving rewards (or penalties). It learns a policy (state → action mapping) that optimally balances (maximizing) short-term and long-term gains.
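---

# Types of .orange[learning]: a toy sketch

A minimal sketch contrasting unsupervised and supervised learning on the same 1-D data (hypothetical numbers, hand-rolled for illustration; a real project would use a library such as scikit-learn):

``` python
data = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]

# Unsupervised: no labels; split the sorted data at the largest gap
# (a crude one-shot clustering).
xs = sorted(data)
gaps = [xs[i + 1] - xs[i] for i in range(len(xs) - 1)]
cut = gaps.index(max(gaps)) + 1
clusters = (xs[:cut], xs[cut:])
print(clusters)   # ([1.0, 2.0, 3.0], [10.0, 11.0, 12.0])

# Supervised: the correct outputs are given; learn a decision threshold
# halfway between the two labeled groups.
labels = [0, 0, 0, 1, 1, 1]
threshold = (max(x for x, y in zip(data, labels) if y == 0) +
             min(x for x, y in zip(data, labels) if y == 1)) / 2
print(threshold)  # 6.5
```

The unsupervised step finds .orange[patterns/clusters] on its own; the supervised step exploits the .orange[correct] outputs to learn a prediction rule.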
---

# .orange[Basic **components**]

.orange[**Training** dataset]: data used as input to the learner to train the model

.orange[**Validation** dataset]: data used by the learner for validation and optimization

.orange[Trained model]: the ML artifact that comes out of the training process

.orange[Cost (or loss) function]: a function to optimize in the ML system (e.g., sum of squared errors over the training data set)

.orange[**Test** dataset]: data provided to the trained model for performance estimation

---
class: inverse, middle, center, hide-count

# .orange[Classifiers]

---
class: middle, center, hide-count

<img src="img/food.jpg" width="100%" style="display: block; margin: auto;" />

---

# .orange[Classifiers] and .orange[regressors]

<img src="img/class-reg.png" alt="https://www.sharpsightlabs.com/blog/regression-vs-classification/" width="100%" />

---

# .orange[Classification]

.left-code[
__Feature space__

- data: points in `\(\mathbb{R}^d\)`
- dimensions: scalar measurements

<br>

__Classifier functions (_classifiers_)__

- a classifier for `\(K\)` classes is a function
$$ f:\mathbb{R}^d \to \{1, \ldots, K \} $$
- classifiers carve up the space into regions
]
.right-plot[
<img src="img/classification_3.png" width="100%" style="display: block; margin: auto;" />
]

---

# .orange[Quantifying errors]

If `\(f\)` is our classifier, i.e.,
for any given `\(x\)`, `\(f(x) = \tilde{y}\)` is the predicted class (with `\(y\)` the real one), then

**Loss function** (for .orange[K] classes `\(\{1, \cdots, K\}\)`):

`$$\begin{split}\mathfrak{L}: \{1, \cdots, K\}&\times \{1, \cdots, K\} &\to [0, \infty)\\ (f(x)&,\ y) &\mapsto \mathfrak{L} (f(x),\ y)\end{split}$$`

<br>

> If all mistakes are equally bad:
`$$\mathfrak{L}(i, j) = \begin{cases} 1 & \textrm {if } i\neq j \\ 0 & \textrm {if } i = j \\ \end{cases}$$`

Note: if the outcome of `\(f\)` is a set of probabilities for the classes, the same definition is valid considering

`$$\mathfrak{L}: \{p_1, \cdots, p_K\} \times \{1, \cdots, K\} \to [0, \infty)$$`

---

# .orange[Risk of classifier]

If the distribution of the classes is known, the .orange[risk] of a classifier is the expected loss

$$ {\rm risk}(f) = \mathbb{E}[\mathfrak{L}(f(X), {\rm true\ class\ of\ } X)] $$

We can evaluate a classifier by how large its risk is

<br>

> The best possible way to train a classifier is by **minimizing its risk**

<br>
<br>

> N.B. The above is a sort of common **_misuse of notation_**: a classifier is **.orange[a (single!) model]**, and **cannot be trained**!! When we say "train a model", the real meaning is

.center[**"an MLT starts producing models, somehow; new ones replacing the previous ones"**.]

Interpreting "replacement" as "modification", we can pretend that a (**.orange[single]**...) model is trained.
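---

# .orange[Quantifying errors]: a toy sketch

The 0-1 loss and the average loss of a fixed classifier on a labelled sample (an estimate of its risk) can be sketched in a few lines of Python (the classifier and sample below are made up):

``` python
def zero_one_loss(i, j):
    """L(i, j) = 1 if the predicted class i differs from the true class j, else 0."""
    return int(i != j)

def empirical_risk(f, sample):
    """Average loss of f over (x, true_class) pairs: an estimate of risk(f)."""
    return sum(zero_one_loss(f(x), y) for x, y in sample) / len(sample)

# A fixed (already produced) classifier: class 1 when x >= 0, else class 0.
f = lambda x: int(x >= 0)

sample = [(-2, 0), (-1, 0), (-0.5, 1), (0.5, 1), (2, 1)]
print(empirical_risk(f, sample))  # 0.2 (one mistake out of five)
```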
---
class: inverse, middle, center, hide-count

# .orange[Main MLT examples]

---

# .orange[Nearest neighbor]

.orange[__Idea:__] use the training data itself as the classifier

- Given: data point `\(x\)`
- Find the training data point closest to `\(x\)`
- Assign `\(x\)` the label of the closest point

---

# .orange[Nearest neighbor]

<img src="img/knn.png" width="60%" style="display: block; margin: auto;" />

---

## .orange[Nearest neighbor (100 data points)]

<img src="img/knn_2.png" width="60%" style="display: block; margin: auto;" />

---

# .orange[k-Nearest] Neighbor (kNN)

- Find the `\(k\)` closest training points
- Take a majority vote between these points

> .orange[__Rule of thumb:__] 3NN often works surprisingly well

---

## .orange[k-Nearest neighbor]

<div class="figure" style="text-align: center">
<img src="img/knn_k.png" alt="https://medium.com/analytics-vidhya/diabetes-classification-with-knn-and-logistic-regression-2edd3760a8c7" width="60%" />
<p class="caption">https://medium.com/analytics-vidhya/diabetes-classification-with-knn-and-logistic-regression-2edd3760a8c7</p>
</div>

---

# .orange[kNN: drawbacks]

In large datasets, finding the nearest data points is expensive

The computational burden grows with dimension

> it is the method of choice when the dataset is small

<br>
<br>
<br>
<br>

__What to do for large datasets:__

- Extract a concise summary

---

# .orange[Linear classifiers]

<img src="img/linear_classifier_3.png" width="70%" style="display: block; margin: auto;" />

---

# .orange[Linear classifiers: Support-Vector Machine]

A maximum margin classifier is called a __Support-Vector Machine__

<img src="img/linear_classifier_7.png" width="70%" style="display: block; margin: auto;" />

.footnote[A hyperplane is a **subspace of co-dimension = 1** of a space (i.e., 1 dimension less than the original space). E.g., a line (1D) in a plane (2D); a plane (2D) in a 3D space; a 23D subspace in a 24D space.]
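---

# .orange[kNN: a code sketch]

The k-nearest-neighbor rule from the slides above fits in a few lines of pure Python (1-D made-up data for illustration; real feature spaces are `\(\mathbb{R}^d\)`):

``` python
from collections import Counter

def knn_predict(train, x, k=3):
    """Classify x by majority vote among the k closest training points."""
    neighbours = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

# (feature, class) pairs: class "a" clusters near 0, class "b" near 10.
train = [(0, "a"), (1, "a"), (2, "a"), (9, "b"), (10, "b"), (11, "b")]

print(knn_predict(train, 1.5))  # a
print(knn_predict(train, 9.5))  # b
```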
---

# .orange[Limitations of linear classifiers]

.pull-left[
Problem 1: **curved optimal decision boundary**

- SVM solves Problem 1 using the so-called .orange[_kernel trick_]

<br>

Problem 2: **classes may overlap**

- SVM solves Problem 2 by:
  - permitting .orange[misclassified] training points (**C** hyper-parameter)
      - each such point contributes a .orange[_cost_] to the optimization target function
  - using the .orange[kernel trick]
]
.pull-right[
<img src="img/classification_3.png" width="100%" style="display: block; margin: auto;" />
]

---

# .orange[Example of the kernel trick]

Suppose you have .orange[non-linearly separable] data

<img src="img/kernel_1.png" width="70%" style="display: block; margin: auto;" />

> Accuracy of classification given by the linear classifier: .orange[75%]

---

# .orange[Example of the kernel trick]

Project it into a three-dimensional space where the new coordinates are

.left-column[
`$$\begin{cases} X_1 &= y_1^2 \\ X_2 &= y_2^2 \\ X_3 &= \sqrt{2}y_1y_2 \end{cases}$$`
]
.right-column[
<img src="img/kernel.gif" width="90%" style="display: block; margin: auto;" />
]

---

# .orange[Example of the kernel trick]

Run the SVM on the transformed data

.right-column[
<img src="img/kernel_2.gif" width="90%" style="display: block; margin: auto;" />
]

---

# .orange[Example of the kernel trick]

Now you have completely _linearly_ separable data

<img src="img/kernel_2.png" width="70%" style="display: block; margin: auto;" />

> Accuracy of classification given by the SVM classifier: .orange[100%]

---

# .orange[Ensemble classifiers]

__Weak classifier__

Consider two classes of equal size: assigning the class by coin flip gives a 50% expected error

> weak classifier: .orange[error rate **slightly below** 50%]

__Ensemble Classifier__

- trains .orange[many _weak_] classifiers
- .orange[combines results] by majority vote

If the weak classifiers are applicable to `\(k>2\)` classes, so is the ensemble.
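---

# .orange[Ensemble vote: a toy sketch]

Why combining weak classifiers helps, sketched with hand-built "weak" rules (hypothetical construction: each rule is wrong on a different, disjoint pair of inputs):

``` python
inputs = list(range(10))
truth = {x: x % 2 for x in inputs}  # true class: the parity of x

def make_weak(wrong_on):
    """A weak classifier: correct everywhere except on `wrong_on`."""
    return lambda x: 1 - truth[x] if x in wrong_on else truth[x]

weak = [make_weak({0, 1}), make_weak({2, 3}), make_weak({4, 5})]

def majority(x):
    """Ensemble prediction: simple majority vote of the weak classifiers."""
    return int(sum(f(x) for f in weak) > len(weak) / 2)

acc = lambda g: sum(g(x) == truth[x] for x in inputs) / len(inputs)
print([acc(f) for f in weak])  # [0.8, 0.8, 0.8]
print(acc(majority))           # 1.0 (at most one wrong vote per input)
```

Each weak rule alone is only 80% accurate, but no input collects two wrong votes, so the majority is always right.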
<br><br><br>

**Important example: .orange[Random Forests]**

---

# .orange[Classification by majority vote]

`\(m\)` classifiers take a vote

> let us suppose `\(m\)` is an odd number

Two choices:

- correct = `\(1\)`
- wrong = `\(-1\)`

The decision is made by simple majority

- for two classes and classifiers `\(f_1,\ldots ,f_m\)` with output `\(\pm1\)`, the majority vote at input `\(x\)` is

`$$\rm sgn \left( \sum_{j=1}^m f_j(x)\right)$$`

---

## .orange[Classification by majority vote]

<img src="index_files/figure-html/unnamed-chunk-22-1.png" width="70%" style="display: block; margin: auto;" />

---

# .orange[_Weak_ learner: tree classifier]

<img src="img/tree_1.png" width="70%" style="display: block; margin: auto;" />

---

# .orange[_Weak_ learner: tree classifier]

<img src="img/tree_2.png" width="70%" style="display: block; margin: auto;" />

---

# .orange[_Weak_ learner: tree classifier]

<img src="img/tree_3.png" width="70%" style="display: block; margin: auto;" />

---

# .orange[_Weak_ learner: tree classifier]

<img src="img/tree_4.png" width="70%" style="display: block; margin: auto;" />

---

# .orange[_Weak_ learner: tree classifier]

.pull-left[
<img src="img/tree_4.png" width="100%" style="display: block; margin: auto;" />
]
.pull-right[
<img src="img/tree_5.png" width="100%" style="display: block; margin: auto;" />
]

---

# .orange[Random forest]

.orange[Tree training]: Input `\(n\)` training points of classes `\(1,\ldots, K\)`

- select `\(n\)` points uniformly at random with replacement
- train a tree on the randomized data set

.orange[For each tree]:

- in each step, select `\(m\)` axes at random (**mtry** hyper-parameter)
- compute the best split point for each of these axes
- split along the one that minimizes the error

> .orange[Train **ntree** trees in total]
> - compute the class label of a new point `\(x\)` under each of the **ntree** trees
> - take a majority vote

---
class: inverse, middle, center, hide-count

# .orange[Model selection]

---

# .orange[Overfitting]

Sample data acts as
a proxy for the underlying data source

.orange[_Over-fitting_] means adapting too closely to the idiosyncrasies of a sample set

**Result**: small error on the training data but .orange[poor predictive performance]!

<img src="img/overfit.jpg" width="90%" style="display: block; margin: auto;" />

---

# .orange[Overfitting]

The model is .orange[not able to generalize]

It learns the data and .orange[not the underlying function]

It performs well on the training data but .orange[poorly with new data]

<img src="img/figure3.png" width="100%" style="display: block; margin: auto;" />

---

# .orange[Overfitting: example]

Two alternative models of human papillomavirus infection and its progression to cervical cancer (CIN)

The complex model includes multiple stages of pre-cancerous lesions, which can progress or regress at different rates (model parameters)

<br>

<div class="figure" style="text-align: center">
<img src="img/Figure_8.png" alt="Basu 2013" width="100%" />
<p class="caption">Basu 2013</p>
</div>

---

# .orange[Overfitting]

Prevalence data for CIN generated using the more complex model over a 30-year period among a fictional cohort of young women

The complex model (in green) actually has a better .orange[_fit_] to the early prevalence data (solid red dots) than does the simpler model (in blue)...
However, the complex model produced a pattern that poorly forecasts future prevalence (hollow red dots)

<img src="img/Figure_9.png" width="60%" style="display: block; margin: auto;" />

---

# .orange[Overfitting]

**.orange[Every]** additional parameter in the model introduces **new sources of uncertainty** and the potential to affect results in non-intuitive ways that may be either useful or deceptive

<br>
<br>
<br>

> Complex models must be well-characterized in terms of their behavior before they are used for .orange[__forecasting__]

---
name: berra
class: center, middle, hide-count

<img src="img/yogi-berra-photo-quote-1.jpg" width="100%" style="display: block; margin: auto;" />

---

# .orange[Bias and Variance trade-off]

In order to minimize the test error on new data points we need to

> **select a function** that achieves **.orange[_low variance_]** and **.orange[_low bias_]**.

- .orange[**Variance**] refers to the amount by which our predictions would **change if we estimated using a different training set**.

> The more flexible the model, the higher the variance.

- .orange[**Bias**] refers to the **error introduced by the approximation** we are making with our model (representing complicated data by a simple model).

> The simpler the model, the higher the bias.

There is a .orange[trade-off] between increasing variance (flexibility) and decreasing bias (simplicity) and vice versa.

<img src="img/tradeoff.png" width="30%" style="display: block; margin: auto;" />

---

# .orange[Cross-validation]

> How to select an adequate model based on sample data?

<br>

__Recall__: model selection **.orange[chooses a model complexity]** (hyper-parameter)

- Training a classifier chooses parameter values
- The training can often be formulated as minimizing the training error

<br>
<br>

.orange[Model selection **cannot be performed by minimizing the training error**]

- it would lead to overfitting

---

# .orange[Cross-validation]

1. Split data into three sets: a. training set, b.
validation set, c. test set (hold-out set)
2. Train classifiers with **.orange[different hyper-parameters]** on the training set
3. Select the one with the smallest **.orange[prediction error on the validation set]**
4. Estimate the **.orange[performance on the test set]**

<br>

A **.orange[separate test set]** is **crucial**:

- the prediction error estimate on the validation set is confounded by model selection

---

# .orange[Cross-validation]

Data splitting estimates the .orange[prediction error from data]

<br>

Prediction error estimates can be used in two ways

- model selection `\(\Leftrightarrow\)` .orange[optimize] performance
- classifier assessment `\(\Leftrightarrow\)` .orange[interpret] performance (estimates the prediction error of the final choice of classifier)

<br>
<br>

We **must not use** the **.orange[same data for both]**.

> **.orange[Every time]** you take even a single decision **after** looking at performance evaluated on some data, those data are no longer valid for performance estimation, and you need to use **.orange[new]** data for that.

---

# .orange[K-fold cross validation]

The misclassification error rate is computed on the observations in the held-out fold.

<img src="img/Cv1.png" width="100%" style="display: block; margin: auto;" />

---

# .orange[K-fold cross validation]

This procedure is .orange[repeated K] times; each time, a different group of observations is treated as the validation set.

<img src="img/Cv2.png" width="90%" style="display: block; margin: auto;" />

---

# .orange[K-fold cross validation]

The .orange[CV error rate] is then calculated as the average of these K error rates.

<img src="img/Cv3.png" width="90%" style="display: block; margin: auto;" />

---

# .orange[K-fold cross validation]

<img src="img/splits.png" width="100%" style="display: block; margin: auto;" />

Generally, .orange[K between 5 and 10] avoids over-training the model (variance), whilst avoiding too few training points (bias)

---

# .orange[K-fold cross validation]

0.
.orange[Remove test set] and set it aside 1. Divide remaining data into `\(K\)` .orange[equally sized] blocks 2. Cross-validate: for `\(k = 1,\cdots, K\)` - remove block `\(k\)` from training data - train classifier on remaining blocks. - estimate prediction error on block `\(k\)` 3. Estimates over all `\(k\)` and select best classifier 4. Retrain the best classifiers (i.e. with its hyper-parameters) on the whole training set (all K sets!) 5. When classifier is chosen and retrained, estimate its performance .orange[on test set] --- #.orange[Cross validation Flow] <img src="img/resampling.svg" width="100%" style="display: block; margin: auto;" /> --- #.orange[Cross validation variability] <img src="img/overfitting.png" width="100%" style="display: block; margin: auto;" /> --- #.orange[Bias variance trade-off] A predictor having high bias or variance won't do well in predicting on new data <img src="img/BV.png" width="70%" style="display: block; margin: auto;" /> Good, generalizable predictors need to have .orange[both low bias and low variance] --- #.orange[(Hyper-)parameters] MLT |parameters | Hyper-parameters| --------------------|-----------------------|----------------- Decision tree | Splits' locations | # splits Random forest | Splits' locations | # splits<br># trees<br># dimension (randomly selected) SVM | Hyper-plane's position | type of nonlinearity<br>margin<br>overlap Logistic regression | `\(\beta\)`s | polynomial degrees<br># nodes for splines<br>interactions ANN/DL | weights | # layers<br># neurons/layer<br># training's epochs<br>batch size<br>learning rate --- class: inverse, middle, center, hide-count # Ready to go _deeper_? 
<img src="img/perplesso.jpg" width="100%" style="display: block; margin: auto;" />

---
class: inverse, middle, center, hide-count

# .orange[Deep] Learning

---

# Neuron

I.e., anything more than good old-fashioned (generalized*) logistic regressions

<br>

.center[<img src="img/neuron.gif" width="70%"/>]

<br>

`\(\text{output(s)} = g(\sum_{i=1}^n a_i*w_i)\)`

<br>

\*generalized := any **non-linear**, **differentiable**, `\(g:\mathbb{R}^n\to \mathbb{R}\)` activation function.

---

# .orange[Fully connected] network

.center[
<img src="img/mlp.png" width="65%"/>
]

<br>

<img src="img/loss.png" width="100%"/>

---

### Can we try to write it down?

3 inputs; 2 hidden layers w/ 2 neurons each; 1 (sigmoid) output

---

# ML: .orange[optimized] neurons' network

.pull-left[
<img src="img/fc.png" width="80%"/>

Each `\(W\)` is one base dimension `\(\longrightarrow\)`<br>
The error is the height `\(\longrightarrow\)`

Every combination of possible `\(W\)`s has its own error, i.e., a height

Finding better `\(W\)`s by running down the (smooth) hills, we improve the model's performance! (.orange[It learns!])
]
.pull-right[
Select some (initially random) weights `\(W\)`, do the math, and obtain a result.

Compare it with the true result to obtain the error, i.e., a number.

(If the non-linearity `\(g\)` satisfies good mathematical requirements...)
<br>
<br>
<br>

<img src="img/descent.gif" width="100%"/>
]

---
class: inverse, middle, center, hide-count

# .orange[Unstructured] data

.left[
- Multi-dimensional single-information (e.g., images)
- Sequential one-dimension privileged single-information (e.g., text/signals)
]

---

## Images

<img src="img/imagetypegrayscale.png" width="90%" style="display: block; margin: auto;" />

<img src="img/imagetypergb.png" width="90%" style="display: block; margin: auto;" />

---

## Multi-dimensional single-information

### Convolutional networks

> Apply a filter (kernel) to the input data, producing a feature map that captures local patterns and spatial hierarchies in the data. I.e., a sort of summary, a spatial compression, of the input data.

<img src="img/convExample.png" width="100%" style="display: block; margin: auto;" />

---

# Convolutional networks

<img src="img/conv.jpg" width="100%" style="display: block; margin: auto;" />

---

# Convolutional networks

<img src="img/multi-cnn.png" width="100%" style="display: block; margin: auto;" />

---

# Convolutional networks

<br><br>

<img src="img/cnn-struct.png" width="100%" style="display: block; margin: auto;" />

---

### One-dimension privileged single-information

#### Sequences (input/output)

<br>

<img src="img/sequences.png" width="100%" style="display: block; margin: auto;" />

---

# Recurrent networks

<img src="img/rnn-full_CL.png" width="100%" style="display: block; margin: auto;" />

<small>
.pull-left[
`\(x^{<t>}\)`: input at position t

`\(T_x\)`: length of the input

`\(W^{[l]}_{yx}\)`: weight matrix used with input x for output y on layer l

`\(b^{[l]}_y\)`: (bias) vector for output y on layer l
]
.pull-right[
`\(y^{<t>}\)`: output at position t

`\(T_y\)`: length of the output

`\(a^{[l]}_{<t>}\)`: activation vector at position t on layer l
]
</small>

---

# Take them all

<small><small>

<div class="figure" style="text-align: center">
<img src="img/multi-dl.jpg" alt="<br><br><br><br><br>Network from
https://www.sciencedirect.com/science/article/pii/S0007091219306361 <br>Bradley A. Fritz, et al. 'Deep-learning model for predicting 30-day postoperative mortality' - BJA 2019" width="100%" />
<p class="caption"><br><br><br><br><br>Network from https://www.sciencedirect.com/science/article/pii/S0007091219306361 <br>Bradley A. Fritz, et al. 'Deep-learning model for predicting 30-day postoperative mortality' - BJA 2019</p>
</div>

</small></small>

---

# DARP-D

<small><small>

<div class="figure" style="text-align: center">
<img src="img/darp-d.png" alt="Network from https://doi.org/10.1371/journal.pone.0297793 <br>Corianò, Lanera, et al. 'Deep learning-based prediction of major arrhythmic events in dilated cardiomyopathy' - PLoS-One 2024" width="65%" />
<p class="caption">Network from https://doi.org/10.1371/journal.pone.0297793 <br>Corianò, Lanera, et al. 'Deep learning-based prediction of major arrhythmic events in dilated cardiomyopathy' - PLoS-One 2024</p>
</div>

</small></small>

---
class: inverse, middle, center, hide-count

# **Large** .orange[Language Models] (LLM)

---

### But... have you ever heard of [.orange[Chat GPT]](https://chat.openai.com)?

<img src="img/simple-prompt-en.png" width="100%" style="display: block; margin: auto;" />

---

<img src="img/100M-users.png" width="100%" />

---

# First appeared on 2022-11-30

<br>

<img src="img/1M-users.png" width="100%" />

---

# Currently

<br>

<img src="img/900M-uers.png" width="100%" />

---
class: hide-count

# **.orange[Chat GPT]**

- _What's Chat GPT?_ (reversed...)
  - T ...
  - P ...
  - G ...
  - Chat!

- _How Chat GPT?_
  - Base
  - Medium
  - Advanced

---
class: inverse, middle, center, hide-count

# .orange[What's **Chat GPT**?]

.left[
- **Transformer**
- **Pre-trained**
- **Generative**
- in **Chat**
]

---
class: inverse, middle, center, hide-count

# Chat GP.orange[T]: .orange[Transformer]

.pull-left[
#### June 2017 (The day everything changed!)
<img src="img/attention-is-all-you-need.png" alt="https://arxiv.org/abs/1706.03762" width="100%" /> ] .pull-right[ <img src="img/full_transformer.png" width="100%" /> ] --- # .orange[Encoder]/.orange[decoder] <img src="img/encdec.png" alt="https://jalammar.github.io/illustrated-transformer/" width="100%" style="display: block; margin: auto;" /> --- # Generative... .orange[Transformers] <img src="img/transformer-1.png" width="60%" style="display: block; margin: auto;" /> --- # Generative... .orange[Transformers] <img src="img/transformer-2.png" width="60%" style="display: block; margin: auto;" /> --- # Generative... .orange[Transformers] <img src="img/transformer-3.png" width="60%" style="display: block; margin: auto;" /> --- # Generative... .orange[Transformers] <img src="img/transformer-4.png" width="60%" style="display: block; margin: auto;" /> --- # Generative ... .orange[Transformers] <img src="img/transformer_decoding_2.gif" alt="https://jalammar.github.io/illustrated-transformer/" width="100%" style="display: block; margin: auto;" /> --- class: inverse, middle, center, hide-count # Chat G.orange[P]T: .orange[Pre-trained] <img src="img/2023-Alan-D-Thompson-AI-Bubbles-Rev-7b.png" alt="https://s10251.pcdn.co/pdf/2023-Alan-D-Thompson-AI-Bubbles-Rev-7b.pdf" width="90%" /> --- # .orange[Transfer] learning <img src="img/transferlearningworkflow.png" alt="https://it.mathworks.com/help/deeplearning/import-deep-neural-networks.html?s_tid=CRUX_lftnav" width="100%" /> --- # .orange[Transfer] learning <img src="img/transfer-learning.png" alt="https://www.mdpi.com/1424-8220/23/2/570" width="80%" /> --- # Potential .orange[usages] <img src="img/LLM-use-cases.png" alt="https://txt.cohere.com/llm-use-cases/" width="100%" /> --- class: inverse, middle, center, hide-count # In .orange[chat] --- # Prompt: .orange[cycle] <img src="img/llm-prompt.png" alt="https://medium.com/@tariqsaad1997/chatgpt-prompt-engineering-part-4-building-a-customized-chatbot-165db7515c29" width="100%" 
/> --- # Prompt: .orange[composition] <img src="img/chatgpt-prompt.png" alt="https://medium.com/@tariqsaad1997/chatgpt-prompt-engineering-part-4-building-a-customized-chatbot-165db7515c29" width="100%" /> --- class: inverse, middle, center, hide-count # .orange[How **Chat GPT**?] .left[ - Base: **Web** - Medium: **Prompt design** - Advanced: **API** ] --- class: inverse, middle, center, hide-count # .orange[Web] access .left[ https://chat.openai.com - privacy - age verification - robustness/correctness - chat history - free vs plus (vs pro) ] --- # Prompt design <small> Asking it for information is fun; in fact, it: - .orange[looks] smart... - makes mistakes quite .orange[often]... - helps us easily disseminate .orange[fake news]... but... - **.orange[Translate]** this text into Spanish... - **.orange[Summarize]** this article... - **.orange[Propose]** the outline for a course... - **.orange[Reply]** to an email... - **.orange[Extract]** cholesterol levels from test results... - **.orange[Rank]** the EHRs in trauma/non-trauma... - **.orange[Correct]** an exam... - **.orange[Draft]** a project... - **.orange[Write]** the R code to do this analysis... - **.orange[Debug]** this code... if we _play_ seriously, it could use a little more **.orange[strategy]**...<br> .right[... and a lot of **.orange[competence]**!] </small> Getting LLMs to perform tasks where we lack the competence to recognize if ( **.orange[when!!]** ) they provide **.orange[wrong/inaccurate]** answers... is **.orange[extremely dangerous]**!!! --- class: inverse, middle, center, hide-count # **.orange[Prompt Design]** <small> .left[ **role** = You are the assistant of a university professor. **context** = You are analyzing the comments from the students of the last course. **task** = Your task is to extract information from the provided text. **instructions** = You should extract the first and last words of the text.
**output** = Return the first and last words of the text separated by a dash, i.e., "first - last". **style** = Do not add any additional information, return only the requested information. **examples** = # Examples: text: 'This is an example text.' output: 'This - text' text: 'Another example text!!!' output: 'Another - text' **text** = The lecture was very interesting and the professor was very clear in his explanations. </small> ] --- class: inverse, middle, center, hide-count # .orange[API] access Application Programming Interface --- # <small>Application Programming Interface</small> ### .orange[Request] ``` python from openai import OpenAI client = OpenAI() completion = client.chat.completions.create( model="gpt-4o", messages=[ {"role": "developer", "content": "You are a helpful assistant."}, {"role": "user", "content": "Hello!"} ] ) print(completion.choices[0].message) ``` https://platform.openai.com/docs/guides/text-generation https://cdn.openai.com/spec/model-spec-2024-05-08.html#follow-the-chain-of-command --- # <small>Application Programming Interface</small> ### .orange[Response] <style type="text/css"> .small-code pre code { font-size: 60%; line-height: 1.2; } </style> .small-code[ ``` json { "id": "chatcmpl-123", "object": "chat.completion", "created": 1677652288, "model": "gpt-4o-mini", "system_fingerprint": "fp_44709d6fcb", "choices": [{ "index": 0, "message": { "role": "assistant", "content": "\n\nHello there, how may I assist you today?" }, "logprobs": null, "finish_reason": "stop" }], "service_tier": "default", "usage": { "prompt_tokens": 9, "completion_tokens": 12, "total_tokens": 21, "completion_tokens_details": { "reasoning_tokens": 0, "accepted_prediction_tokens": 0, "rejected_prediction_tokens": 0 } } } ``` ] https://platform.openai.com/docs/guides/gpt/chat-completions-api --- # .orange[Costs] <small><small><small>(https://platform.openai.com/docs/pricing)</small></small></small> <img src="img/gpt-api-price.png"
alt="https://openai.com/pricing" width="90%" /> --- # .orange[Counting] tokens <small>https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them</small> .pull-left[ <img src="img/token-count.png" alt="https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them" width="100%" /> ] .pull-right[ <br> <br> <br> > Interactive: https://platform.openai.com/tokenizer > > Programmatic: https://github.com/openai/tiktoken ] E.g., ~10,000 paragraphs, or ~750,000 words --> approx. ~1M tokens --- class: inverse, middle, center, hide-count # .orange[**Agents**] (.orange[autonomous] search) <img src="img/operator_find_guidelines.png" alt="https://operator.chatgpt.com" width="100%" /> --- class: inverse, middle, center, hide-count # .orange[**Agents**] (.orange[ask] for interaction) <img src="img/operator_request_interaction.png" alt="https://operator.chatgpt.com" width="50%" /> --- class: inverse, middle, center, hide-count # .orange[**Agents**] (.orange[ask] for sharing) <img src="img/agents_ask_for_sharing.png" alt="https://operator.chatgpt.com" width="100%" /> --- class: inverse, middle, center, hide-count # .orange[**Agents**] (.orange[do] stuff!!) <img src="img/agents_do_stuff.png" alt="https://operator.chatgpt.com" width="100%" /> --- class: inverse, middle, center, hide-count # .orange[Best practices for implementing]<br>Machine Learning<br>.orange[projects] --- ## .orange[Start quickly and simply; **then iterate**!]
Keep it .orange[robust] - lower model complexity and fewer parameters are always beneficial Keep it .orange[simple], both in model selection and in the data for your analysis - start with the minimal set of data that could get you to a good result ## .orange[Treat data with suspicion] .orange[Look] at the data - dig into its details - look for correlations - systematic biases, errors, and flaws .orange[Normalize] input data - ML algorithms can perform .orange[poorly and slowly] if there are large differences in scale between different features --- ## Validate (and fine-tune) your Model Separate your data into .orange[training], .orange[validation], and .orange[test sets]. > .orange[If you make **ANY** decision after having seen performance on a data set, it becomes a training set (even if you have treated it as a test one)] ## Do not be fooled by Accuracy For an event that happens only 1% of the time, you can easily report an accuracy of 99%: meaningless. Before starting a (classification) project, better to figure out which precision and recall (or other _metrics_) the application requires to be useful > - .orange[Build the model with these metrics in mind] > - .orange[When in doubt, use balanced metrics] --- ## .orange[Healthcare does not trust black boxes] Some ML methods are more transparent than others - Clustering tends to be easy to interpret, because it creates groupings of concepts - Linear regression can tell you how important each feature is to the final output - The same holds for decision trees, but they are easily prone to overfitting! <br> Random forests are .orange[difficult to interpret]. Neural networks and deep learning are .orange[truly black boxes], i.e., they offer very little transparency about what is important in the decision-making process (or require very high effort to obtain it).
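One model-agnostic way to peek inside such a black box is **permutation importance**: shuffle one feature's values and measure how much the error grows. A minimal pure-Python sketch (toy linear "model" and synthetic data, all hypothetical, just to illustrate the idea):

.small-code[
``` python
import random

# Toy "model": a fixed linear scorer over three features
# (purely hypothetical, not a real clinical model).
def predict(row):
    x1, x2, x3 = row
    return 2.0 * x1 + 0.5 * x2 + 0.0 * x3  # x3 is irrelevant by construction

def mse(rows, targets):
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(rows, targets, j, rng):
    """How much the error grows when feature j's values are shuffled."""
    baseline = mse(rows, targets)
    column = [r[j] for r in rows]
    rng.shuffle(column)
    permuted = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
    return mse(permuted, targets) - baseline

rng = random.Random(42)
rows = [tuple(rng.uniform(-1, 1) for _ in range(3)) for _ in range(200)]
targets = [predict(r) for r in rows]  # perfect fit: baseline error is 0

importances = [permutation_importance(rows, targets, j, rng) for j in range(3)]
# Expect: x1 (largest weight) most important, x3 (zero weight) near 0
```
]

A real project would use dedicated tooling (e.g., scikit-learn's `permutation_importance` or a SHAP library) rather than this toy loop, but the underlying intuition is the same.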
> Some techniques for explanation: - Variable importance - SHAP analyses --- # My .orange[very personal] classification of AI users | | Ignorant of AI | AI User | |----------------|---------------------------------------------|-------------------------------------------------------------------------| | **Inexperienced** | **Dependent Outsider**<br/>Relies on others. Doesn't cause harm, but brings no advantages either: out of the game, surviving only thanks to others. | **Exposed Charlatan**<br/>Might cause damage before getting burned. The most common category, at all levels. To them, everything appears way too easy (but only a true expert notices). | | **Expert** | **Resistant Craftsman**<br/>Risks being isolated or wiped out of the market. | **Modern Alchemist**<br/>Can produce anything from terrible to sublime, even "quality scams": the most ambivalent yet powerful figure. | --- class: inverse, center, middle, hide-count <img src="img/procione.jpeg" width="50%" /> <br> # Thank .orange[you] for your attention! <br> [
](https://www.unipd-ubep.it/) [**www.unipd-ubep.it**](https://www.unipd-ubep.it/) | [
](mailto:Corrado.Lanera@ubep.unipd.it) [**Corrado.Lanera@ubep.unipd.it**](mailto:Corrado.Lanera@ubep.unipd.it) [
](https://github.com/corradolanera) [
](https://twitter.com/corradolanera) [
](https://telegram.me/CorradoLanera) **@CorradoLanera** | [
](https://github.com/UBESP-DCTV) **@UBESP-DCTV** [
](https://calendly.com/corradolanera) [**calendly.com/corradolanera**](https://calendly.com/corradolanera)