The iron ML notebook


Multilayer Perceptron

Can draw complex non-linear decision boundaries to separate data
But needs more data than a linear model to learn them!

Fully-connected feedforward neural network
In the output layer:
- For classification, Softmax is commonly used (even for binary classification problems)
- However, the Sigmoid (logistic) function can be used as well for binary (1/0) classification.
In the hidden layer,
- we need to use a non-linear activation function
- we can use the Sigmoid (logistic) function
- but it's more common to use ReLU nowadays
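
The layer choices above can be sketched as a single forward pass. This is a minimal illustration in plain Python, not a reference implementation: the layer sizes, the input vector, and the helper names (`dense`, `mlp_forward`) are assumptions made up for the example.

```python
import math
import random

def relu(v):
    # Non-linear activation for the hidden layer: max(0, x) element-wise.
    return [max(0.0, x) for x in v]

def softmax(v):
    # Subtract the max before exponentiating for numerical stability.
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]

def dense(x, W, b):
    # Fully-connected layer: W holds one row of weights per output unit.
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def mlp_forward(x, W1, b1, W2, b2):
    h = relu(dense(x, W1, b1))        # hidden layer with ReLU
    return softmax(dense(h, W2, b2))  # output layer with Softmax

# Hypothetical sizes: 4 input features, 5 hidden units, 3 classes.
rng = random.Random(0)
n_in, n_hidden, n_classes = 4, 5, 3
W1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
b1 = [0.0] * n_hidden
W2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_classes)]
b2 = [0.0] * n_classes

probs = mlp_forward([0.5, -1.2, 3.0, 0.7], W1, b1, W2, b2)
print(probs)  # one probability per class, summing to 1
```

The network is untrained here, so the probabilities carry no meaning; the point is only the shape of the computation.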

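It can be seen as a Multiclass Logistic Regression model with Hidden layers: the hidden layers compute features, and the output layer is a softmax regression on those features.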

Normally, Cross-entropy loss is used to train it!
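
For a single example, cross-entropy reduces to the negative log of the probability assigned to the true class. A small sketch, assuming the model outputs softmax probabilities and the label is a class index; the probability vectors below are invented for illustration:

```python
import math

def cross_entropy(probs, target):
    # probs: predicted class probabilities; target: index of the true class.
    # The small floor avoids log(0) if the model assigns zero probability.
    return -math.log(max(probs[target], 1e-12))

# True class is index 1 in all three cases.
confident_right = cross_entropy([0.05, 0.90, 0.05], target=1)
uniform = cross_entropy([1 / 3, 1 / 3, 1 / 3], target=1)
confident_wrong = cross_entropy([0.90, 0.05, 0.05], target=1)

print(confident_right, uniform, confident_wrong)
```

The loss is small when the model is confident and right, equals log(3) for a uniform guess over 3 classes, and grows quickly when the model is confident and wrong.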

Wide vs Deep Networks

In theory, an MLP with 1 hidden layer is enough to approximate any continuous function (universal approximation theorem). But:
- Needs lots of hidden units (wide & shallow)
- Prone to overfitting
A narrow and deep MLP:
- needs fewer nodes and often generalizes better
- but, it's harder to train!
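
To make "fewer nodes" concrete, the sketch below counts hidden units and trainable parameters for two fully-connected architectures. Both sets of layer sizes are hypothetical, and the counts say nothing about which network fits or generalizes better; they only show how the bookkeeping works.

```python
def count_parameters(layer_sizes):
    # Each fully-connected layer has n_in * n_out weights plus n_out biases.
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Hypothetical architectures: 20 input features, 3 output classes.
wide_shallow = [20, 1000, 3]       # 1 hidden layer, 1000 hidden units
narrow_deep = [20, 32, 32, 32, 3]  # 3 hidden layers, 96 hidden units in total

print(count_parameters(wide_shallow))  # 24003
print(count_parameters(narrow_deep))   # 2883
```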

Initialize weights

Cannot initialize all the weights to 0: every hidden unit in a layer would compute the same output and receive the same update, so we would lose the power of having different hidden units.

Random initialization:
- Set the weights to small random numbers!
- So that the hidden units start from different values and can learn different features
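
The symmetry problem can be seen even in the forward pass. The sketch below compares the hidden pre-activations for all-zero weights and for small random weights. The input, the layer size, and the standard deviation of 0.01 are assumptions chosen for illustration, not recommended settings.

```python
import random

def pre_activations(x, W, b):
    # One weighted sum per hidden unit (one row of W per unit).
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

x = [0.5, -1.2, 3.0]
n_hidden = 4
b = [0.0] * n_hidden

# All-zero weights: every hidden unit is an exact copy of the others.
W_zero = [[0.0] * len(x) for _ in range(n_hidden)]

# Small random weights: each unit starts from a different point.
rng = random.Random(42)
W_rand = [[rng.gauss(0.0, 0.01) for _ in x] for _ in range(n_hidden)]

z_zero = pre_activations(x, W_zero, b)
z_rand = pre_activations(x, W_rand, b)

print(len(set(z_zero)))  # 1: all units compute the same value
print(z_rand)            # small values that differ from unit to unit
```

Units that start identical also receive identical updates during training, so they stay identical; random initialization breaks that tie.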


