While the CNN extracts features and maps the input to fixed-length vectors via an RNN/LSTM encoder, the 3D convolutional model encodes motion using video shape ...