Tensor): target signal masked with self.padding_id (batch, seqlen) Returns: loss (torch. ... (KL-Divergence是KL散度的意思,中间不是减号).
確定! 回上一頁