# What is Machine Learning

Post on 31-Mar-2021


What is Machine Learning

He He

Slides based on Lecture 1 from David Rosenberg’s course material.

CDS, NYU

Feb 2, 2021

He He Slides based on Lecture 1 from David Rosenberg’s course material. (CDS, NYU)DS-GA 1003 Feb 2, 2021 1 / 15

https://davidrosenberg.github.io/mlcourse/Archive/2017Fall/Lectures/01.black-box-ML.pdf https://github.com/davidrosenberg/mlcourse


Machine Learning Problems

The common theme is to solve a prediction problem:

given an input x,

predict an output y.

We’ll start with a few canonical examples...


Example: Spam Detection

Input: Incoming email

Output: “SPAM” or “NOT SPAM”

A binary classification problem, because there are only 2 possible outputs.

Example: Medical Diagnosis

Input: Symptoms (fever, cough, fast breathing, shaking, nausea, ...)

Output: Diagnosis (pneumonia, flu, common cold, bronchitis, ...)

A multiclass classification problem: choosing one of several discrete outputs.

How to express uncertainty?

Probabilistic classification or soft classification:

P(pneumonia) = 0.7, P(flu) = 0.2, ...
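To make soft classification concrete, here is a minimal Python sketch. It assumes the classifier produces one real-valued score per diagnosis and uses the softmax function (a common choice, not one this lecture prescribes) to turn those scores into probabilities that are positive and sum to 1. The scores are invented for illustration, and `softmax` and `hard_decision` are hypothetical helper names. Picking the most probable class recovers an ordinary (hard) multiclass prediction.

```python
import math


def softmax(scores: dict[str, float]) -> dict[str, float]:
    """Turn per-class scores into probabilities that are positive and sum to 1."""
    # Subtracting the max score avoids overflow; softmax is unchanged by a constant shift.
    m = max(scores.values())
    exps = {label: math.exp(s - m) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def hard_decision(probs: dict[str, float]) -> str:
    """Collapse a soft prediction to a single class: the most probable one."""
    return max(probs, key=probs.get)


# Hypothetical scores from some classifier, for illustration only.
scores = {"pneumonia": 2.0, "flu": 0.8, "common cold": -0.5, "bronchitis": -1.0}
probs = softmax(scores)
print(probs)
print(hard_decision(probs))  # pneumonia
```

Keeping the full distribution, rather than only the top label, lets a user see how confident the prediction is and which diagnoses are plausible runners-up.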


Example: Predicting a Stock Price

Input: History of the stock’s prices

Output: The stock’s closing price on the next day

A regression problem, because the output is continuous.
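A regression prediction function maps the input (here, a price history) to a real number. The sketch below shows two simple hand-picked baselines, not learned models and not methods from this lecture: repeat the last close, or average the last `k` closes. The prices and the helper names are hypothetical.

```python
def predict_last_close(history: list[float]) -> float:
    """Naive baseline: tomorrow's close equals today's close."""
    return history[-1]


def predict_moving_average(history: list[float], k: int = 3) -> float:
    """Average of the last k closing prices (fewer if the history is shorter)."""
    window = history[-k:]
    return sum(window) / len(window)


closes = [101.0, 102.5, 101.8, 103.2]  # hypothetical closing prices
print(predict_last_close(closes))      # 103.2
print(predict_moving_average(closes))  # mean of 102.5, 101.8 and 103.2
```

Both return an arbitrary real number rather than one of a fixed set of labels, which is what makes this a regression problem.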


The Prediction Function

A prediction function takes an input x and produces an output y.

We’re looking for prediction functions that solve particular problems.

Machine learning helps find the “best” prediction function automatically, using data. What does “best” mean?
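The slide leaves “best” open. One common answer, assumed in this sketch, is the candidate with the smallest average loss on example (x, y) pairs; squared error is used as the loss because the output here is a real number. The data, both candidate functions, and the helper names are hypothetical.

```python
from typing import Callable

# A prediction function maps an input x to an output y.
PredictionFunction = Callable[[float], float]


def squared_error(y_pred: float, y_true: float) -> float:
    return (y_pred - y_true) ** 2


def average_loss(f: PredictionFunction, data: list[tuple[float, float]]) -> float:
    """Mean squared error of f over (x, y) pairs."""
    return sum(squared_error(f(x), y) for x, y in data) / len(data)


data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]  # hypothetical (x, y) pairs
candidates: dict[str, PredictionFunction] = {
    "f1: y = x": lambda x: x,
    "f2: y = 2x": lambda x: 2 * x,
}
# "Best" under this assumption = lowest average loss on the data.
best = min(candidates, key=lambda name: average_loss(candidates[name], data))
print(best)  # f2: y = 2x
```

Note that this only measures fit on the examples at hand; how well the chosen function does on new inputs is a separate question.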


What is not ML: Rule-Based Approaches

Consider medical diagnosis.

1. Consult textbooks and medical doctors (i.e. “experts”).
2. Understand their diagnosis process.
3. Implement this as an algorithm (a “rule-based system”).

Doesn’t sound too bad...

Very popular in the 1980s.

(To be fair, expert systems could be much more sophisticated than they sound here. For example, through inference they could make new logical deductions from knowledge bases.)


Rule-Based Approach

Fig 1-1 from Hands-On Machine Learning with Scikit-Learn and TensorFlow by Aurelien Geron (2017).


Rule-Based Systems

Issues with rule-based systems:

Very labor-intensive to build.

Rules work very well for the areas they cover, but cannot generalize to unanticipated input combinations.

They don’t naturally handle uncertainty.

Expert systems came to be seen as brittle.
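A toy version of step 3 above shows these issues directly. The rules below are hypothetical and are NOT medical knowledge; they only illustrate the shape of a hand-written rule-based system. An input combination that no rule anticipates gets no answer at all, and when a rule does fire, it answers with full certainty rather than a probability.

```python
from typing import Optional


def rule_based_diagnosis(symptoms: set[str]) -> Optional[str]:
    """Hand-written rules (hypothetical, NOT medical knowledge)."""
    if {"fever", "cough", "fast breathing"} <= symptoms:
        return "pneumonia"
    if {"fever", "shaking"} <= symptoms:
        return "flu"
    if symptoms == {"cough"}:
        return "common cold"
    # No rule covers this combination: the system has nothing to say.
    return None


print(rule_based_diagnosis({"fever", "cough", "fast breathing"}))  # pneumonia
print(rule_based_diagnosis({"cough", "nausea"}))                   # None
```

Fixing the second case means someone has to write yet another rule by hand, which is the labor-intensive part.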


Modern AI: Machine Learning

Don’t reverse engineer an expert’s decision process.

The machine learns on its own.

We provide training data: many examples of (input x, output y) pairs, e.g.

A set of videos, and whether or not each has a cat.

A set of emails, and whether or not each is SPAM.

Learning from training data of this form is called supervised learning.


Machine Learning Algorithm

A machine learning algorithm learns from the training data:

Input: Training data

Output: A prediction function that produces output y given input x.

The success of ML depends on:

Availability of large amounts of data

Generalization to unseen samples (the test set)
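As a minimal sketch of this input/output contract, the hypothetical learner below takes training data, a list of (x, y) pairs where x is a single numeric feature (assumed here to be a count of suspicious words in an email) and y is “SPAM” or “NOT SPAM”, and returns a prediction function. It picks the threshold that makes the fewest training mistakes (a one-feature threshold rule, chosen only for simplicity). Accuracy on a separate, held-out test set is what speaks to generalization. All data and names are made up for illustration.

```python
from typing import Callable

Example = tuple[float, str]  # (x, y): numeric feature, label


def train_threshold_classifier(train: list[Example]) -> Callable[[float], str]:
    """Learn t minimizing training errors for: predict SPAM iff x >= t."""
    candidates = sorted({x for x, _ in train})

    def errors(t: float) -> int:
        return sum(("SPAM" if x >= t else "NOT SPAM") != y for x, y in train)

    best_t = min(candidates, key=errors)

    # The learned prediction function is the output of the learning algorithm.
    def predict(x: float) -> str:
        return "SPAM" if x >= best_t else "NOT SPAM"

    return predict


def accuracy(f: Callable[[float], str], data: list[Example]) -> float:
    return sum(f(x) == y for x, y in data) / len(data)


# Hypothetical feature: number of suspicious words in each email.
train = [(0, "NOT SPAM"), (1, "NOT SPAM"), (2, "NOT SPAM"),
         (5, "SPAM"), (7, "SPAM"), (9, "SPAM")]
test = [(1, "NOT SPAM"), (3, "NOT SPAM"), (6, "SPAM"), (8, "SPAM")]

predict = train_threshold_classifier(train)
print(accuracy(predict, train))  # 1.0
print(accuracy(predict, test))   # 1.0
```

With this few examples, even a perfect test score says little, which is one reason large amounts of data matter.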


Machine Learning Approach

Fig 1-2 from Hands-On Machine Learning with Scikit-Learn and TensorFlow by Aurelien Geron (2017).


Key concepts

Most common ML problem types:

classification (binary and multiclass)

regression

prediction function: predicts output y given input x

training data: a set of (input x, output y) pairs

supervised learning algorithm: takes training data and produces a prediction function

Beyond prediction:

Unsupervised learning: finding structure in data, e.g. clustering

Reinforcement learning: optimizing a long-term objective, e.g. playing Go

Representation learning: learning good features of real-world objects, e.g. text
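To contrast unsupervised learning with the supervised setting, the sketch below clusters points that come with no labels y at all. It uses k-means, one standard clustering algorithm chosen here for illustration (the lecture names clustering only generically), restricted to one-dimensional data. The points and starting centers are hypothetical.

```python
def kmeans_1d(points: list[float], centers: list[float], iters: int = 10) -> list[float]:
    """Plain k-means on 1-D data, starting from the given initial centers."""
    centers = list(centers)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        clusters: list[list[float]] = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to its cluster's mean (keep it if empty).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers


points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]  # unlabeled, hypothetical
print(kmeans_1d(points, centers=[0.0, 6.0]))  # roughly [1.0, 5.07]
```

The algorithm discovers the two groups from the inputs alone; nothing tells it what the “right” grouping is.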


Core Questions in Machine Learning

Given any task, the following questions need to be answered:

Modeling: What is the prediction function?

Learning: How to learn the prediction function from data?

Inference: Given a learned model, how to make predictions?
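The three questions can be seen in one small sketch, assuming a one-dimensional regression task: the model is a line f(x) = wx + b (modeling), its parameters are fit by ordinary least squares (learning, one standard method among many), and predictions come from evaluating the fitted line (inference). The data points are hypothetical and lie exactly on y = 2x + 1, so the fit should recover w = 2 and b = 1.

```python
def model(w: float, b: float, x: float) -> float:
    """Modeling: the prediction function is assumed to be a line."""
    return w * x + b


def fit_line(data: list[tuple[float, float]]) -> tuple[float, float]:
    """Learning: ordinary least squares for y ~ w*x + b (needs >= 2 distinct x)."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    w = sxy / sxx
    b = mean_y - w * mean_x
    return w, b


data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # hypothetical: y = 2x + 1
w, b = fit_line(data)
print(w, b)              # 2.0 1.0
print(model(w, b, 4.0))  # Inference on a new input: 9.0
```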
