Date: 2022-08-07 08:58:52 | Category: Python code
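Hooking is a programming term for a range of techniques that modify or extend the behavior of an operating system, application, or other software component by intercepting function calls, messages, or events passed between software modules. The code that handles the intercepted function calls, events, or messages is called a hook.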
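As a rough illustration of the general idea, the sketch below intercepts calls to an ordinary Python function and passes each call to a hook. The names `add_hook`, `record`, and `square` are hypothetical and exist only for this example; they are not part of PyTorch or any library.

```python
import functools

def add_hook(func, hook):
    """Return a wrapper that calls `hook` after every call to `func`."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        # The hook sees the intercepted call without changing its result
        hook(func.__name__, args, result)
        return result
    return wrapper

calls = []  # every intercepted call is appended here

def record(name, args, result):
    calls.append((name, args, result))

def square(x):
    return x * x

# Replace the original function with the hooked version
square = add_hook(square, record)
value = square(3)
```

The caller still uses `square` exactly as before; only the extra bookkeeping in `calls` reveals that the call was intercepted.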
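Hooks are a very useful feature of PyTorch. With them, we can conveniently read or modify the values and gradients of a network's intermediate-layer variables without changing the structure of the network's inputs and outputs. This feature is widely used to visualize the features and gradients of intermediate layers, which helps diagnose problems in a neural network and analyze how well it works.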
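A minimal sketch of capturing an intermediate feature with a forward hook might look like this. The three-layer `nn.Sequential` model, the `features` dictionary, and the `save_output` helper are assumptions made up for illustration; only `register_forward_hook` and the handle's `remove()` come from PyTorch itself.

```python
import torch
import torch.nn as nn

# Hypothetical toy model, used only for this illustration
model = nn.Sequential(
    nn.Linear(4, 3),
    nn.ReLU(),
    nn.Linear(3, 2),
)

features = {}  # intermediate outputs keyed by a label we choose

def save_output(name):
    def hook(module, inputs, output):
        # Detach so the stored copy does not keep the autograd graph alive
        features[name] = output.detach()
    return hook

# Hook the ReLU layer; the model's inputs and outputs stay unchanged
handle = model[1].register_forward_hook(save_output("relu"))

x = torch.randn(5, 4)
y = model(x)

# Remove the hook once the feature has been captured
handle.remove()
```

After the forward pass, `features["relu"]` holds the activation of the middle layer, which can then be plotted or inspected.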
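This article mainly uses hook functions to output the order in which forward and backward execute while the network runs, which is how I tracked down a bug.
Usage is as follows: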
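# Build the hook function
def hook_func(name, module):
    def hook_function(module, inputs, outputs):
        # Customize this body for your use case.
        # In a backward hook the two arguments are grad_input and grad_output.
        print(name + ' inputs', inputs)
        print(name + ' outputs', outputs)
    return hook_function

# Register forward and backward hooks on every module
# (register_backward_hook is deprecated in newer PyTorch releases
# in favor of register_full_backward_hook)
for name, module in model.named_modules():
    module.register_forward_hook(hook_func('[forward]: ' + name, module))
    module.register_backward_hook(hook_func('[backward]: ' + name, module))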
For example, a simple model for MNIST handwritten digit recognition is structured as follows:
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = nn.Dropout(0.25)
        self.dropout2 = nn.Dropout(0.5)
        self.fc1 = nn.Linear(9216, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        output = F.log_softmax(x, dim=1)
        return output
Printing the model gives:
Net(
  (conv1): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1))
  (conv2): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1))
  (dropout1): Dropout(p=0.25, inplace=False)
  (dropout2): Dropout(p=0.5, inplace=False)
  (fc1): Linear(in_features=9216, out_features=128, bias=True)
  (fc2): Linear(in_features=128, out_features=10, bias=True)
)
Build the hook function:
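# Build the hook function
def hook_func(name, module):
    def hook_function(module, inputs, outputs):
        # Append to a log file instead of printing
        with open("log_model.txt", 'a+') as f:
            # Customize this body for your use case.
            # len() of a tensor is the size of its first dimension.
            f.write(name + ' len(inputs): ' + str(len(inputs)) + '\n')
            f.write(name + ' len(outputs): ' + str(len(outputs)) + '\n')
    return hook_function

# Register forward and backward hooks on every module
# (register_backward_hook is deprecated in newer PyTorch releases
# in favor of register_full_backward_hook)
for name, module in model.named_modules():
    module.register_forward_hook(hook_func('[forward]: ' + name, module))
    module.register_backward_hook(hook_func('[backward]: ' + name, module))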
The forward and backward passes recorded in the log:
[forward]: conv1 len(inputs): 1
[forward]: conv1 len(outputs): 8
[forward]: conv2 len(inputs): 1
[forward]: conv2 len(outputs): 8
[forward]: dropout1 len(inputs): 1
[forward]: dropout1 len(outputs): 8
[forward]: fc1 len(inputs): 1
[forward]: fc1 len(outputs): 8
[forward]: dropout2 len(inputs): 1
[forward]: dropout2 len(outputs): 8
[forward]: fc2 len(inputs): 1
[forward]: fc2 len(outputs): 8
[forward]: len(inputs): 1
[forward]: len(outputs): 8
[backward]: len(inputs): 2
[backward]: len(outputs): 1
[backward]: fc2 len(inputs): 3
[backward]: fc2 len(outputs): 1
[backward]: dropout2 len(inputs): 1
[backward]: dropout2 len(outputs): 1
[backward]: fc1 len(inputs): 3
[backward]: fc1 len(outputs): 1
[backward]: dropout1 len(inputs): 1
[backward]: dropout1 len(outputs): 1
[backward]: conv2 len(inputs): 2
[backward]: conv2 len(outputs): 1
[backward]: conv1 len(inputs): 2
[backward]: conv1 len(outputs): 1
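Because the hooks fire on every forward and backward pass while the model is training, [forward] and [backward] lines are emitted continuously, so it is advisable to write the output to a file rather than print it.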
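One possible way to keep the log from growing without bound is to hold on to the handles returned at registration and remove the hooks once a pass has been recorded. The sketch below assumes a hypothetical toy `nn.Sequential` model and records names in a list called `order` instead of a file; `log_hook` is an illustrative helper, not a PyTorch function.

```python
import torch
import torch.nn as nn

# Hypothetical toy model, used only for this illustration
model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))

order = []  # execution order recorded by the hooks

def log_hook(name):
    def hook(module, inputs, outputs):
        order.append(name)
    return hook

# register_forward_hook returns a handle that can later remove the hook
handles = [
    module.register_forward_hook(log_hook('[forward]: ' + name))
    for name, module in model.named_modules()
]

model(torch.randn(8, 4))  # hooks fire during this pass

# Detach all hooks so later passes are no longer logged
for handle in handles:
    handle.remove()

model(torch.randn(8, 4))  # nothing is recorded for this pass
```

As in the log above, the root module (whose name is the empty string) is recorded last, because its forward hook runs only after all of its children have finished.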