Background
Batch Normalization (BN) is one of the landmark techniques of deep learning. Working on the assumption that activations are roughly normally distributed, it normalizes them, which greatly speeds up training of the whole network.
However, as tasks grow ever more complex, we often do not have enough GPU memory to hold large batches.
And when the batch size is too small, batch normalization no longer works well.
Although techniques such as GN (Group Normalization) normalize along other dimensions, none of them has managed to replace BN.
Recently, Google proposed FRN (Filter Response Normalization), whose performance remains stable even at small batch sizes.
Structure
The structure itself is simple: the output is computed by a single formula and then passed through a learned threshold.
FRN normalizes over the (H, W) dimensions, i.e., each channel of each sample is normalized independently.
epsilon is a small constant, normally set to 1e-6; for 1×1 convolutions, however, this fixed value does not work well, and the authors recommend making epsilon a learnable parameter instead.
Note that, unlike BN, the formula does not subtract the mean when normalizing. The normalized activations can therefore end up arbitrarily shifted away from zero, which would hurt a plain ReLU, so an extra learnable threshold tau is introduced to bound the output.
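Putting the pieces together, the computation described in the paper is (with $N = H \times W$ spatial positions per channel):

$$\nu^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2, \qquad \hat{x} = \frac{x}{\sqrt{\nu^2 + |\epsilon|}}, \qquad y = \max(\gamma\hat{x} + \beta,\ \tau)$$

where $\gamma$ and $\beta$ are the usual learned scale and offset, and $\tau$ is the learned threshold of the TLU (Thresholded Linear Unit).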
Code
TensorFlow version, copied directly from the official code in the paper:
def FRNLayer(x, tau, beta, gamma, eps=1e-6):
    # x: Input tensor of shape [BxHxWxC].
    # tau, beta, gamma: Variables of shape [1, 1, 1, C].
    # eps: A scalar constant or learnable variable.
    # Compute the mean norm of activations per channel.
    nu2 = tf.reduce_mean(tf.square(x), axis=[1, 2], keepdims=True)
    # Perform FRN.
    x = x * tf.rsqrt(nu2 + tf.abs(eps))
    # Return after applying the Offset-ReLU non-linearity.
    return tf.maximum(gamma * x + beta, tau)
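A minimal usage sketch (not from the paper; shapes and variable names are illustrative, and it assumes a TF 1.x-style environment, since the snippet above relies on the old top-level tf.rsqrt alias):

import tensorflow as tf

x = tf.random.normal([8, 32, 32, 64])        # feature map of shape [B, H, W, C]
tau = tf.Variable(tf.zeros([1, 1, 1, 64]))   # learned TLU threshold
beta = tf.Variable(tf.zeros([1, 1, 1, 64]))  # learned offset
gamma = tf.Variable(tf.ones([1, 1, 1, 64]))  # learned scale
y = FRNLayer(x, tau, beta, gamma)            # output has the same shape as x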
PyTorch version; open-source implementations are also available on GitHub:
import torch
import torch.nn as nn


class FilterResponseNormNd(nn.Module):
    def __init__(self, ndim, num_features, eps=1e-6, learnable_eps=False):
        """
        Input Variables:
        ----------------
        ndim: An integer indicating the number of dimensions of the expected input tensor.
        num_features: An integer indicating the number of input feature dimensions.
        eps: A scalar constant or learnable variable.
        learnable_eps: A bool value indicating whether the eps is learnable.
        """
        assert ndim in [3, 4, 5], 'FilterResponseNorm only supports 3d, 4d or 5d inputs.'
        super(FilterResponseNormNd, self).__init__()
        shape = (1, num_features) + (1,) * (ndim - 2)
        self.eps = nn.Parameter(torch.ones(*shape) * eps)
        if not learnable_eps:
            self.eps.requires_grad_(False)
        self.gamma = nn.Parameter(torch.Tensor(*shape))
        self.beta = nn.Parameter(torch.Tensor(*shape))
        self.tau = nn.Parameter(torch.Tensor(*shape))
        self.reset_parameters()

    def forward(self, x):
        avg_dims = tuple(range(2, x.dim()))  # (2, 3) for a 4d input
        nu2 = torch.pow(x, 2).mean(dim=avg_dims, keepdim=True)
        x = x * torch.rsqrt(nu2 + torch.abs(self.eps))
        return torch.max(self.gamma * x + self.beta, self.tau)

    def reset_parameters(self):
        nn.init.ones_(self.gamma)
        nn.init.zeros_(self.beta)
        nn.init.zeros_(self.tau)
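A quick shape check for the PyTorch layer (illustrative, not part of the open-source code):

frn = FilterResponseNormNd(ndim=4, num_features=3)  # for NCHW image tensors
x = torch.randn(2, 3, 224, 224)
y = frn(x)
print(y.shape)  # torch.Size([2, 3, 224, 224])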
Finally, here is a PaddlePaddle version I wrote:
import paddle.fluid as fluid
from paddle.fluid.initializer import ConstantInitializer
import numpy as np
from paddle.fluid.layers import elementwise_max
class FRNlayer(fluid.dygraph.Layer):
    """FRN layer with TLU, written against the fluid dygraph API."""

    def __init__(self, ndim, num_features, eps=1e-6):
        """
        :param ndim: number of dimensions of the input tensor
        :param num_features: number of input channels
        :param eps: guards against division by zero; the paper suggests an
                    initial value of 1e-6 and treats it as a learnable parameter
        """
        super(FRNlayer, self).__init__()
        assert ndim in [3, 4, 5], 'Only Supports 3d, 4d, 5d Tensor'
        shape = (1, num_features) + (1,) * (ndim - 2)
        self.ndim = ndim
        self.eps = self.create_parameter(
            shape=shape, default_initializer=ConstantInitializer(eps), dtype='float32')
        # gamma starts at 1 (as in the paper and the PyTorch version above);
        # starting it at 0 would zero out every activation.
        self.gamma = self.create_parameter(
            shape=shape, default_initializer=ConstantInitializer(1), dtype='float32')
        self.beta = self.create_parameter(
            shape=shape, default_initializer=ConstantInitializer(0), dtype='float32')
        self.tau = self.create_parameter(
            shape=shape, default_initializer=ConstantInitializer(0), dtype='float32')

    def forward(self, x):
        dims = list(range(2, self.ndim))
        # nu^2: mean squared activation over the spatial dimensions
        nu2 = fluid.layers.reduce_mean(fluid.layers.pow(x, 2), dim=dims, keep_dim=True)
        x = x * self.gamma * fluid.layers.rsqrt(nu2 + fluid.layers.abs(self.eps)) + self.beta
        # TLU: tau broadcasts over the batch and spatial dimensions
        return elementwise_max(x, self.tau)


with fluid.dygraph.guard():
    x = np.random.rand(1, 3, 224, 224).astype('float32')
    x = fluid.dygraph.to_variable(x)
    frn = FRNlayer(4, 3)
    y = frn(x)
    print(y.numpy())
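For reference, the same layer can also be written against the newer Paddle 2.x API. This is only a sketch under the assumption that paddle >= 2.0 is installed (the fluid dygraph API above has since been deprecated); it is not part of the original code:

import paddle
import paddle.nn as nn
from paddle.nn.initializer import Constant


class FRNLayer2(nn.Layer):
    def __init__(self, ndim, num_features, eps=1e-6):
        super().__init__()
        assert ndim in [3, 4, 5], 'Only supports 3d, 4d, 5d tensors'
        shape = (1, num_features) + (1,) * (ndim - 2)
        self.ndim = ndim
        self.eps = self.create_parameter(shape, default_initializer=Constant(eps))
        self.gamma = self.create_parameter(shape, default_initializer=Constant(1.0))
        self.beta = self.create_parameter(shape, default_initializer=Constant(0.0))
        self.tau = self.create_parameter(shape, default_initializer=Constant(0.0))

    def forward(self, x):
        # nu^2: mean squared activation over the spatial dimensions
        nu2 = paddle.mean(paddle.square(x), axis=list(range(2, self.ndim)), keepdim=True)
        x = self.gamma * x * paddle.rsqrt(nu2 + paddle.abs(self.eps)) + self.beta
        return paddle.maximum(x, self.tau)  # TLU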