Multilayer Perceptron

  • Fully-connected feedforward neural network

  • In the output layer: a softmax activation turns the scores into class probabilities

  • In the hidden layer: a nonlinear activation (e.g. sigmoid, tanh, or ReLU) is applied

It can be seen as a Multiclass Logistic Regression model with hidden layers.

Normally, cross-entropy loss is used to train it.
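To make this concrete, here is a minimal sketch of the forward pass described above, assuming NumPy, one sigmoid hidden layer, a softmax output, and hypothetical toy sizes (4 samples, 3 features, 5 hidden units, 2 classes):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    # Subtract the row-wise max for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mlp_forward(X, W1, b1, W2, b2):
    """One hidden layer (sigmoid), then a softmax output layer."""
    h = 1.0 / (1.0 + np.exp(-(X @ W1 + b1)))  # hidden activations
    return softmax(h @ W2 + b2)               # class probabilities

def cross_entropy(probs, y):
    """Mean negative log-likelihood of the true classes y."""
    return -np.log(probs[np.arange(len(y)), y]).mean()

# Toy data and small random weights (sizes are illustrative).
X = rng.normal(size=(4, 3))
W1, b1 = rng.normal(scale=0.1, size=(3, 5)), np.zeros(5)
W2, b2 = rng.normal(scale=0.1, size=(5, 2)), np.zeros(2)

probs = mlp_forward(X, W1, b1, W2, b2)
loss = cross_entropy(probs, np.array([0, 1, 0, 1]))
```

With no hidden layer, `softmax(X @ W + b)` alone would be exactly multiclass logistic regression; the hidden layer is what the MLP adds.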

Wide vs Deep Networks

  • In theory, an MLP with 1 hidden layer should be enough: it can approximate any continuous function given enough hidden units (universal approximation). But:

    • Needs lots of hidden units (wide & shallow)

    • Prone to overfitting

  • A narrow and deep MLP:

    • needs fewer nodes and generalizes better

    • but, it's harder to train!
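The parameter-count trade-off can be illustrated with a small sketch (the layer sizes below are hypothetical, chosen only for comparison):

```python
def mlp_params(layer_sizes):
    """Total number of weights + biases in a fully connected MLP,
    given the sizes of its layers from input to output."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Wide & shallow: one very large hidden layer.
wide = mlp_params([100, 4000, 10])        # 444_010 parameters

# Narrow & deep: several small hidden layers.
deep = mlp_params([100, 64, 64, 64, 10])  # 15_434 parameters
```

The deep network reaches comparable expressive depth with far fewer parameters, which is one reason it tends to generalize better, at the cost of a harder optimization problem.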

Initialize weights

  • Cannot initialize the weights to 0: all hidden units would then compute the same output and receive the same gradient, so they could never learn different features.

  • Random initialization:

    • Use small random numbers

    • This breaks the symmetry: hidden units start with different weights, so they can specialize
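A minimal sketch of such an initialization, assuming NumPy (the `scale=0.01` value is an illustrative choice, not prescribed by the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(n_in, n_out, scale=0.01):
    """Small random weights break the symmetry between hidden units;
    an all-zero init would make every unit compute the same function."""
    W = rng.normal(loc=0.0, scale=scale, size=(n_in, n_out))
    b = np.zeros(n_out)  # biases can safely start at zero
    return W, b

W, b = init_layer(3, 5)
```

Keeping the weights small also keeps sigmoid/tanh units away from their saturated regions at the start of training, where gradients would be near zero.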
