We find training residual networks with Fixup to be as stable as training with normalization -- even for networks with 10,000 layers.