We refer to this as a Multi-Head Attention layer with the learnable ... dict (e.g. when we save the model) self.register_buffer("pe", pe, ...
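For reference, the passage's own definitions are elided, so the following is the standard multi-head attention formulation from the original Transformer paper (Vaswani et al., 2017). The notation is assumed, not taken from this text. The learnable parameters are the per-head projections $W_i^{Q}, W_i^{K} \in \mathbb{R}^{d_{\text{model}} \times d_k}$ and $W_i^{V} \in \mathbb{R}^{d_{\text{model}} \times d_v}$, plus the output projection $W^{O} \in \mathbb{R}^{h d_v \times d_{\text{model}}}$:

$$
\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^{O},
\qquad
\mathrm{head}_i = \mathrm{Attention}\!\left(Q W_i^{Q},\; K W_i^{K},\; V W_i^{V}\right),
$$

$$
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V .
$$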
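To make the `register_buffer` fragment concrete, here is a minimal sketch. It assumes PyTorch 1.6 or later, where `register_buffer` accepts a `persistent` keyword, and it assumes the sinusoidal encoding from the original Transformer paper. The class name `PositionalEncoding` and the sizes are illustrative; they are not the elided original code. The tensor `pe` is registered as a buffer. A buffer is not learnable and is not returned by `parameters()`, but it moves with the module across devices. With `persistent=False`, it is also left out of the state dict, so it is not saved with the model.

```python
import math

import torch
import torch.nn as nn


class PositionalEncoding(nn.Module):
    """Illustrative sketch: sinusoidal positional encoding stored as a buffer."""

    def __init__(self, d_model: int, max_len: int = 5000):
        super().__init__()
        if d_model % 2 != 0:
            # Sin/cos columns are interleaved, so an even width is assumed here.
            raise ValueError("d_model must be even")
        # pe[pos, 2i]   = sin(pos / 10000^(2i / d_model))
        # pe[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(
            torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model)
        )
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        pe = pe.unsqueeze(0)  # shape [1, max_len, d_model], broadcasts over batch

        # Not a learnable parameter, but it follows the module to .to(device).
        # persistent=False keeps "pe" out of state_dict(), so it is not saved.
        self.register_buffer("pe", pe, persistent=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [batch, seq_len, d_model]; add the encoding for the first seq_len positions
        return x + self.pe[:, : x.size(1)]


enc = PositionalEncoding(d_model=16, max_len=100)
out = enc(torch.zeros(2, 10, 16))
print(out.shape)                     # torch.Size([2, 10, 16])
print("pe" in enc.state_dict())      # False: non-persistent buffer
print(len(list(enc.parameters())))   # 0: nothing learnable
```

Because `pe` is a deterministic function of `max_len` and `d_model`, the constructor can rebuild it cheaply whenever the model is instantiated. Leaving it out of the saved state dict therefore loses nothing and keeps checkpoints smaller.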