Multilayer Perceptron
Can draw complex non-linear boundaries to separate data
But needs more data!
Fully-connected feedforward neural network
In the output layer:
For classification, Softmax is commonly used (even for binary classification problems)
However, the Sigmoid (Logistic) function can also be used for binary (1/0) classification.
In the hidden layers:
we need to use a non-linear activation function
we can use the Sigmoid or Logistic function
but it's more common to use the ReLU nowadays (see the sketch below)
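As a reference, here is a minimal NumPy sketch of the three activations mentioned above; the function names and test values are just illustrative:

```python
import numpy as np

def relu(z):
    # Common choice for hidden layers: max(0, z), element-wise
    return np.maximum(0.0, z)

def sigmoid(z):
    # Logistic function: squashes any value into (0, 1); usable for 1/0 outputs
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Typical output activation for classification: turns scores into
    # probabilities that sum to 1 (shift by the max for numerical stability)
    e = np.exp(z - np.max(z))
    return e / e.sum()

scores = np.array([2.0, -1.0, 0.5])
print(relu(scores))     # [2.  0.  0.5]
print(sigmoid(scores))  # each entry strictly between 0 and 1
print(softmax(scores))  # entries sum to 1.0
```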

It can be seen as a Multiclass Logistic Regression model with Hidden layers:
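A hypothetical forward pass makes this view concrete: the output layer below is exactly a softmax (multiclass logistic) regression, only applied to the learned hidden representation instead of the raw input. The layer sizes and the 0.01 weight scale are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_hidden, n_classes = 4, 8, 3

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Hidden layer parameters (small random weights, zero biases)
W1 = 0.01 * rng.standard_normal((n_features, n_hidden))
b1 = np.zeros(n_hidden)
# Output layer parameters: a multiclass logistic regression on top of h
W2 = 0.01 * rng.standard_normal((n_hidden, n_classes))
b2 = np.zeros(n_classes)

def forward(x):
    h = relu(x @ W1 + b1)        # non-linear hidden representation
    return softmax(h @ W2 + b2)  # softmax "logistic regression" head

x = rng.standard_normal(n_features)
print(forward(x))  # class probabilities, summing to 1
```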

Wide vs Deep Networks
In theory, an MLP with 1 hidden layer should be enough (it is a universal approximator). But:
Needs lots of hidden units (wide & shallow)
Prone to overfitting
A narrow and deep MLP:
needs fewer nodes and generalizes better (see the parameter count sketched below)
but it's harder to train!
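To make the trade-off concrete, the sketch below counts parameters for two hypothetical architectures with the same input and output sizes; the layer widths are invented purely for illustration:

```python
def n_params(layer_sizes):
    # Each fully-connected layer has fan_in * fan_out weights plus fan_out biases
    return sum(a * b + b for a, b in zip(layer_sizes[:-1], layer_sizes[1:]))

wide_shallow = [100, 4096, 10]           # one very wide hidden layer
narrow_deep  = [100, 128, 128, 128, 10]  # several narrow hidden layers

print(n_params(wide_shallow))  # 454666
print(n_params(narrow_deep))   # 47242
```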

Initialize weights
The weights cannot be initialized to 0: all hidden units would then compute exactly the same function, and we would lose the power of having different hidden units.

Random initialization:
To small random numbers!
To keep all hidden units different from each other (breaking the symmetry); see the sketch below.
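A minimal sketch of this, assuming NumPy; the 0.01 scale is a simple illustrative choice, not a tuned value:

```python
import numpy as np

rng = np.random.default_rng(42)

def init_layer(fan_in, fan_out, scale=0.01):
    # Small random weights break the symmetry between hidden units;
    # biases can start at zero once the weights already differ.
    W = scale * rng.standard_normal((fan_in, fan_out))
    b = np.zeros(fan_out)
    return W, b

W1, b1 = init_layer(4, 8)
W2, b2 = init_layer(8, 3)
print(W1[0])  # small, distinct starting values for each hidden unit
```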