The Road of Pitfalls

Pitfall 001

Problem description:

Running test_attack.py in PyCharm fails with the following error:

"D:\Program Files (x86)\Anaconda3\envs\MyPytorch\python.exe" "D:\Program Files\JetBrains\PyCharm 2021.2.1\plugins\python\helpers\pycharm\_jb_pytest_runner.py" --path "test_attack.py"
Testing started at 21:53 ...
Traceback (most recent call last):
File "D:\Program Files\JetBrains\PyCharm 2021.2.1\plugins\python\helpers\pycharm\_jb_pytest_runner.py", line 5, in <module>
import pytest
ModuleNotFoundError: No module named 'pytest'

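My first move was to install the pytest module, but the error was still there after installing it...

Solution

When pytest is PyCharm's test runner, PyCharm treats .py files whose names start with test or test_ as unit tests and launches them through its pytest runner (the _jb_pytest_runner.py in the log above) instead of running them as plain scripts.
Renaming test_attack.py to attack.py and running it again fixed the problem.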
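The file-name rule can be illustrated with pytest's own default collection patterns: its python_files option defaults to test_*.py and *_test.py, matched against the file's base name. This is a minimal sketch of that rule only. PyCharm's detection of test files may differ in details (the text above also mentions a plain test prefix), and looks_like_pytest_file is a hypothetical helper name.

```python
from fnmatch import fnmatch
from pathlib import PurePath

# Default value of pytest's python_files option
DEFAULT_PYTHON_FILES = ("test_*.py", "*_test.py")


def looks_like_pytest_file(path: str, patterns=DEFAULT_PYTHON_FILES) -> bool:
    """Return True if pytest would collect this file under the given patterns."""
    name = PurePath(path).name  # the patterns are matched against the base name
    return any(fnmatch(name, pattern) for pattern in patterns)


print(looks_like_pytest_file("test_attack.py"))  # True
print(looks_like_pytest_file("attack.py"))       # False
```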

Reference

https://www.cnblogs.com/yoyoblogs/p/11232819.html


Pitfall 002

Problem description

Installing the RobustBench library with pip fails.

Install command:

pip install git+https://github.com/RobustBench/robustbench

Error output:
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "balbabalba\setup.py", line 11, in <module>
long_description=open('README.md').read(),
UnicodeDecodeError: 'gbk' codec can't decode byte balabbalbal (I forgot the rest of the message)

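The problem is the encoding of README.md. setup.py reads it with open('README.md') and no encoding argument, so Python decodes it with the system's default encoding, which is GBK on a Chinese-locale Windows machine, and the file is not valid GBK.

Solution

1. Download the source code from https://github.com/RobustBench/robustbench and extract it.
2. Delete all the contents of README.md, but do not delete the README.md file itself (setup.py still opens it).
3. Compress the extracted files back into an archive, e.g. abcd.zip.
4. Go to the directory containing abcd.zip and run pip install abcd.zip in a terminal.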
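Steps 2 and 3 can be scripted. The following minimal sketch uses only the standard library. It assumes the source has already been downloaded and extracted into a folder that directly contains setup.py and README.md. The folder path and the helper name repack_with_empty_readme are placeholders. Step 4 is unchanged: run pip install on the archive the function returns.

```python
import shutil
from pathlib import Path


def repack_with_empty_readme(src_dir: str, archive_base: str = "abcd") -> str:
    """Empty README.md (step 2) and zip the source tree (step 3).

    src_dir must be the extracted folder that holds setup.py and README.md.
    Returns the path of the created .zip file (archive_base + ".zip").
    """
    src = Path(src_dir)
    if not (src / "setup.py").is_file():
        raise FileNotFoundError(f"no setup.py in {src}")
    # Truncate rather than delete: setup.py still calls open('README.md').
    (src / "README.md").write_text("", encoding="utf-8")
    # make_archive adds the ".zip" suffix and stores src's contents at the archive root.
    return shutil.make_archive(archive_base, "zip", root_dir=src)
```

For example, repack_with_empty_readme("robustbench-master") writes abcd.zip to the current directory.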


Pitfall 003

Problem description: a model trained and saved on a GPU is loaded on a CPU.

The error is:

RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

Source code:
model = CIFAR10()
model = torch.nn.DataParallel(model)
load_model(model, 'model/cifar10_gpu.pt')

...

def load_model(model, filename):
    """Load the trained model."""
    model.load_state_dict(torch.load(filename))

Solution

As the error message says, use torch.load with map_location=torch.device('cpu') to map the storages to the CPU.
Change the code as follows:

model.load_state_dict(torch.load(filename, map_location=torch.device('cpu')))
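Below is a variant of the load_model helper above that runs unchanged on both GPU machines and CPU-only machines. It picks the device at run time and passes that device as map_location. This is a sketch that assumes, as in the code above, that the file holds a plain state_dict.

```python
import torch


def load_model(model: torch.nn.Module, filename: str) -> torch.nn.Module:
    """Load a state_dict saved on any device into model, on the best local device."""
    # Use the GPU when this machine has one; otherwise remap every stored
    # CUDA tensor to the CPU, as the error message recommends.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    state_dict = torch.load(filename, map_location=device)
    model.load_state_dict(state_dict)
    return model.to(device)
```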

Pitfall 004

Problem description

Running the code on the server fails while loading the model, with the following error:

Traceback (most recent call last):
File "attck_cifar10.py", line 297, in <module>
load_model(model, 'model/cifar10_gpu.pt')
File "/home/ranyu/Workspace/Project/bandit-CIFAR10/allmodels.py", line 756, in load_model
model.load_state_dict(torch.load(filename))
File "/home/ranyu/anaconda3/envs/MyPytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 830, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for CIFAR10:
Missing key(s) in state_dict: "features.0.weight", "features.0.bias", "features.1.weight", "features.1.bias", "features.1.running_mean", "features.1.running_var", "features.3.weight", "features.3.bias", "features.4.weight", "features.4.bias", "features.4.running_mean", "features.4.running_var", "features.7.weight", "features.7.bias", "features.8.weight", "features.8.bias", "features.8.running_mean", "features.8.running_var", "features.10.weight", "features.10.bias", "features.11.weight", "features.11.bias", "features.11.running_mean", "features.11.running_var", "fc1.weight", "fc1.bias", "fc2.weight", "fc2.bias", "fc3.weight", "fc3.bias".
Unexpected key(s) in state_dict: "module.features.0.weight", "module.features.0.bias", "module.features.1.weight", "module.features.1.bias", "module.features.1.running_mean", "module.features.1.running_var", "module.features.3.weight", "module.features.3.bias", "module.features.4.weight", "module.features.4.bias", "module.features.4.running_mean", "module.features.4.running_var", "module.features.7.weight", "module.features.7.bias", "module.features.8.weight", "module.features.8.bias", "module.features.8.running_mean", "module.features.8.running_var", "module.features.10.weight", "module.features.10.bias", "module.features.11.weight", "module.features.11.bias", "module.features.11.running_mean", "module.features.11.running_var", "module.fc1.weight", "module.fc1.bias", "module.fc2.weight", "module.fc2.bias", "module.fc3.weight", "module.fc3.bias".

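Solution

Each unexpected key is one of the missing keys with a module. prefix added. That prefix is what torch.nn.DataParallel adds to parameter names, so the checkpoint was saved from a DataParallel-wrapped model.

The original code that raised the error:
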
model = CIFAR10()
model = torch.nn.DataParallel(model)
load_model(model, 'model/cifar10_gpu.pt')
model_to_fool=model.cuda()
model_to_fool.eval()

I changed the code as follows (note: I am using a single card, GPU 1):

model = CIFAR10()
model = torch.nn.DataParallel(model,device_ids=[1])
load_model(model, 'model/cifar10_gpu.pt')
model_to_fool=model.cuda()
model_to_fool.eval()
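Sometimes wrapping the model in DataParallel is not wanted, for example when loading the checkpoint into a plain CIFAR10 model on a CPU-only machine. In that case a commonly used alternative (not the fix above) is to strip the module. prefix from the checkpoint keys before loading. This is a minimal sketch, and strip_module_prefix is a hypothetical helper name. It avoids str.removeprefix so it also runs on the Python 3.6 environment seen in the traceback.

```python
from collections import OrderedDict


def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading DataParallel prefix removed.

    Keys without the prefix are kept unchanged, so this is safe to call on
    checkpoints saved with or without DataParallel.
    """
    return OrderedDict(
        (key[len(prefix):] if key.startswith(prefix) else key, value)
        for key, value in state_dict.items()
    )
```

Usage, with the names from the code above: model = CIFAR10(); model.load_state_dict(strip_module_prefix(torch.load('model/cifar10_gpu.pt', map_location='cpu'))).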

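Pitfall 005

Problem description: on the server, the data is on the GPU, but another tensor is still on the CPU, and the code fails with a device mismatch:
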
Traceback (most recent call last):
File "attck_cifar10.py", line 300, in <module>
main(args, model_to_fool, 32)
File "attck_cifar10.py", line 231, in main
res = make_adversarial_examples(images.cuda(), targets.cuda(), args, model_to_fool, dataset_size)
File "attck_cifar10.py", line 157, in make_adversarial_examples
l1 = L((image + args.fd_eta*q1/norm(q1)).cuda()) # L(prior + c*noise) # q1/norm(q1): unit direction vector
RuntimeError: expected device cuda:1 but got device cpu

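Solution

With torch.cuda.FloatTensor as the default tensor type, tensors created without an explicit device (for example by torch.randn(...)) are allocated on CUDA instead of on the CPU.

Uncomment this line (ch here is torch, imported as ch):

# ch.set_default_tensor_type('torch.cuda.FloatTensor')

so that it reads: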
ch.set_default_tensor_type('torch.cuda.FloatTensor')
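An alternative that avoids changing a global default is to create each new tensor directly on the device of the data it will be combined with. This also ages better: torch.set_default_tensor_type is deprecated as of PyTorch 2.1, which recommends torch.set_default_dtype and torch.set_default_device instead. The sketch below is hypothetical. The perturb helper mirrors the image + fd_eta*q1/norm(q1) expression from the traceback, but it is not the project's code.

```python
import torch


def perturb(image: torch.Tensor, eta: float) -> torch.Tensor:
    """Return image moved by eta along a random unit-norm direction.

    torch.randn_like creates the noise with the same shape, dtype and device
    as image, so the sum never mixes a CUDA tensor with a CPU tensor.
    """
    q = torch.randn_like(image)  # random direction, already on image's device
    return image + eta * q / q.norm()
```

When the new tensor needs a different shape, passing device=image.device to torch.randn or torch.zeros has the same effect.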

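Pitfall 006

Problem description: ImageNet images are normalized before going into the model. At the end, the normalized image has to be restored (de-normalized) for display.

The normalization applied earlier:
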
import torchvision.datasets as dsets
import torchvision.transforms as transforms

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
val_dataset = dsets.ImageFolder(
    './ImageNet_val',
    transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        normalize,
    ]))

My de-normalization code for displaying the image:
mean = torch.Tensor([0.485, 0.456, 0.406]) 
std = torch.Tensor([0.229, 0.224, 0.225])
tmp = x0[0] * std+ mean
tmp = tmp.swapaxes(0, 1)
tmp = tmp.swapaxes(1, 2)
plt.imshow(tmp)
plt.show()

It fails with:
The size of tensor a (224) must match the size of tensor b (3) at non-singleton dimension 2

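Solution

Reshape mean and std with view(-1, 1, 1). The image x0[0] is channels-first, with shape (3, 224, 224), while mean and std have shape (3,). Broadcasting aligns shapes from the last dimension, so the 3 channel values were matched against the image's last dimension (size 224) instead of its channel dimension, hence the error. Reshaped to (3, 1, 1), they apply per channel, and the de-normalized data is the correct image:
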
tmp = x0[0] * std.view(-1,1,1) + mean.view(-1,1,1)
tmp = tmp.swapaxes(0, 1)
tmp = tmp.swapaxes(1, 2)
plt.imshow(tmp)
plt.show()
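The reason view(-1, 1, 1) helps follows from the broadcasting rule PyTorch shares with NumPy. Shapes are compared from the last dimension backwards, and each pair of sizes must be equal or one of them must be 1. The pure-Python sketch below only reproduces that shape check (it is not how PyTorch implements broadcasting), and it reports the same dimension as the error above.

```python
def broadcast_shape(a, b):
    """Shape that results from broadcasting shapes a and b.

    Follows the rule shared by NumPy and PyTorch: align the shapes from the
    right, pad the shorter one with 1s on the left, and require each pair of
    sizes to be equal or to contain a 1. Raises ValueError otherwise.
    """
    ndim = max(len(a), len(b))
    a = (1,) * (ndim - len(a)) + tuple(a)
    b = (1,) * (ndim - len(b)) + tuple(b)
    result = []
    for dim, (x, y) in enumerate(zip(a, b)):
        if x != y and x != 1 and y != 1:
            raise ValueError(
                f"size {x} must match size {y} at non-singleton dimension {dim}")
        result.append(y if x == 1 else x)  # a size-1 axis stretches to the other size
    return tuple(result)


image = (3, 224, 224)                     # x0[0]: channels, height, width
print(broadcast_shape(image, (3, 1, 1)))  # (3, 224, 224): per-channel mean/std
try:
    broadcast_shape(image, (3,))          # mean/std without view(-1, 1, 1)
except ValueError as err:
    print(err)  # size 224 must match size 3 at non-singleton dimension 2
```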

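Pitfall 007

Problem description: when installing PyTorch on Linux, I could never get the GPU build installed.

Solution:

The package source (mirror channel) was the problem, and for a long time only the CPU build would install. It took ages of fiddling and drove me up the wall.
Make sure this entry (the pytorch mirror channel) comes first in the channel list; check it with:
vim ~/.condarc
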
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/
conda config --set show_channel_urls yes
conda install pytorch torchvision cudatoolkit=10.1

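After switching the sources, running conda install pytorch torchvision cudatoolkit=10.1 resolved matching versions of everything, and I didn't even have to install cuDNN myself. (conda config --add puts each channel at the top of the list, so the pytorch mirror, added last above, ends up first.)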
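After installing, you can quickly confirm that conda picked the GPU build rather than the CPU one. This is a sketch, and the dictionary keys are arbitrary names chosen here. On a CPU-only build, torch.version.cuda is None.

```python
import torch


def describe_torch_build() -> dict:
    """Report whether the installed PyTorch is a CUDA build and can see a GPU."""
    return {
        "torch": torch.__version__,
        # None on CPU-only builds; a version string such as "10.1" on CUDA builds
        "built_with_cuda": torch.version.cuda,
        # None when cuDNN is not available to this build
        "cudnn": torch.backends.cudnn.version(),
        "gpu_available": torch.cuda.is_available(),
    }


print(describe_torch_build())
```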

Pitfall 008

Problem description:

RuntimeError: cuDNN error: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input.

Solution:

Add the following line, which disables cuDNN:

torch.backends.cudnn.enabled = False

Reference: https://blog.csdn.net/qq_41895003/article/details/105620317
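The error message itself names one possible cause: a non-contiguous input. Switching cuDNN off affects the whole program, because cuDNN-backed layers such as convolutions then fall back to PyTorch's own, usually slower, implementations. Before doing that, it may be worth making the input contiguous. This is a hedged sketch, not the fix from the reference, and it will not help if the error has a different cause. prepare_input is a hypothetical name.

```python
import torch


def prepare_input(x: torch.Tensor) -> torch.Tensor:
    """Return x in contiguous memory before it goes into a cuDNN-backed layer.

    Tensor.contiguous() returns x itself when it is already contiguous, so the
    call only copies data when a copy is actually needed.
    """
    return x.contiguous()


# Permuting NHWC data to NCHW, for example, yields a non-contiguous view.
nchw = torch.rand(2, 32, 32, 3).permute(0, 3, 1, 2)
print(nchw.is_contiguous())                 # False
print(prepare_input(nchw).is_contiguous())  # True
```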

Pitfall 009

Problem description

patch_value = torch.rand(size=[bs, c, h, h]).cuda()
patched_img = (torch.ones_like(mask).cuda() - mask) * images + mask * patch_value
patch_value.requires_grad = True

# show_image(patched_img[0])
outputs = model(patched_img)
loss = loss_function(outputs, labels)
loss.backward()
patch_grad = torch.sign(patch_value.grad)

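The goal is to keep everything else fixed and compute only the gradient of patch_value. However, after backpropagation patch_value.grad is None, so patch_grad cannot be computed.

Solution

patch_value is an input, i.e. a leaf of the computation graph, and we want its gradient. Autograd records an operation only at the moment it runs, and only for inputs that already have requires_grad=True at that moment. In the code above, patched_img is computed before patch_value.requires_grad is set. As a result, neither patched_img nor anything computed from it (outputs, loss) is connected to patch_value, and backward() never reaches it. Setting patch_value.requires_grad = True immediately after creating patch_value fixes this, because every tensor computed from patch_value afterwards tracks it automatically. Change the code to:

patch_value = torch.rand(size=[bs, c, h, h]).cuda()
patch_value.requires_grad = True
# set requires_grad right after creating patch_value, before it is used anywhere
patched_img = (torch.ones_like(mask).cuda() - mask) * images + mask * patch_value

# show_image(patched_img[0])
outputs = model(patched_img)
loss = loss_function(outputs, labels)
loss.backward()
patch_grad = torch.sign(patch_value.grad)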
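Here is a CPU-only toy reproduction of the bug and the fix. The names are hypothetical. w stands in for the model's parameters, which make the loss require grad, so backward() itself succeeds in both cases. x stands in for patch_value.

```python
import torch


def grad_of_input(set_requires_grad_first: bool):
    """Toy version of the patch bug: return x.grad after one backward pass."""
    w = torch.ones(4, requires_grad=True)  # stands in for the model's parameters
    x = torch.rand(4)                      # stands in for patch_value (a leaf)
    if set_requires_grad_first:
        x.requires_grad = True             # the fix: set it before x is used
    loss = (x * w).sum()                   # the graph is recorded at this moment
    if not set_requires_grad_first:
        x.requires_grad = True             # the bug: too late, loss never saw x
    loss.backward()                        # works either way, because w requires grad
    return x.grad


print(grad_of_input(False))  # None
print(grad_of_input(True))   # tensor([1., 1., 1., 1.])
```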

Pitfall 010

Problem description

The first time model = timm.create_model('resnet18', pretrained=True, num_classes=2) is run, it fails with:

huggingface_hub.utils._errors.LocalEntryNotFoundError: An error happened while trying to locate the file on the Hub and we cannot find the requested files in the local cache. Please check your connection and try again or make sure your Internet connection is on.

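Solution

Download through a mirror site:
Method 1) Add the following at the very top of the Python code, before timm is imported. huggingface_hub reads HF_ENDPOINT when it is imported, so setting it later may have no effect:

import os
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"

Method 2) Set the environment variable on the command line when running the script:
HF_ENDPOINT=https://hf-mirror.com python xx.py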
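The Method 2 syntax (VAR=value command) works in POSIX shells such as bash, but not in Windows cmd.exe. A shell-independent way to get the same effect is to launch the script from Python with a modified copy of the environment. This is a minimal sketch, and run_with_hf_mirror is a hypothetical helper name.

```python
import os
import subprocess
import sys


def run_with_hf_mirror(script, *args, endpoint="https://hf-mirror.com", capture=False):
    """Run `python script args...` in a child process with HF_ENDPOINT set.

    Same effect as `HF_ENDPOINT=<endpoint> python script ...` in bash. The
    parent's environment is copied, not modified. Raises CalledProcessError
    if the script exits with a non-zero status.
    """
    env = dict(os.environ, HF_ENDPOINT=endpoint)
    return subprocess.run([sys.executable, script, *args], env=env,
                          capture_output=capture, text=True, check=True)
```

With the placeholder script name from above, the call would be run_with_hf_mirror("xx.py").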

Reference

https://blog.csdn.net/weixin_44257107/article/details/136532423

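Pitfall 011

Problem description

After logging in to the T4 machine over SSH, every Chinese file and folder name was displayed as question marks.

Solution

Run this in the terminal (it only affects the current shell session):
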
export LANG="zh_CN.UTF-8";export LANGUAGE="zh_CN:zh";export LC_ALL="zh_CN.UTF-8"

Reference

https://zhuanlan.zhihu.com/p/338941791
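Checking history showed that someone else had switched the language setting to a different language... I don't have sudo rights on T4, so I'm using this workaround for now.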
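The export line works because of the precedence among the locale variables. The sketch below is a simplified model of that lookup. It ignores whether the locale is actually installed on the system, which is also required (locale -a lists the installed ones). The function names are hypothetical. Calling is_utf8_locale(os.environ) in a Python session on the server gives a quick check.

```python
def effective_ctype_locale(env):
    """Locale that decides character encoding (LC_CTYPE), by POSIX precedence.

    LC_ALL overrides LC_CTYPE, which overrides LANG; unset or empty values are
    skipped. LANGUAGE is ignored here because gettext uses it only to pick
    message translations, not the character encoding.
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return value
    return "C"  # the POSIX default when none of them is set


def is_utf8_locale(env):
    """True if the effective LC_CTYPE locale names a UTF-8 codeset."""
    codeset = effective_ctype_locale(env).partition(".")[2]  # "zh_CN.UTF-8" -> "UTF-8"
    codeset = codeset.partition("@")[0]                       # drop modifiers like "@euro"
    return codeset.lower().replace("-", "") == "utf8"
```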

Pitfall 012

Problem description

Running IQA_Attack fails with:

  File "/data1/ranyu/workspace/IQA_ATTACK_P100/Attack/misc.py", line 23, in <module>
from torchvision.ops import _new_empty_tensor
ImportError: cannot import name '_new_empty_tensor' from 'torchvision.ops' (/data1/ranyu/home/anaconda3/envs/ranyu/lib/python3.11/site-packages/torchvision/ops/__init__.py)

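Solution

Comment out the following three lines in the project's misc.py. They are meant to run only on torchvision older than 0.7, but the check looks at just the first three characters of the version string. For a version with a two-digit minor number such as 0.15.2, torchvision.__version__[:3] is "0.1", so 0.1 < 0.7 is true. The code then attempts to import _new_empty_tensor, which the installed torchvision no longer provides.

if float(torchvision.__version__[:3]) < 0.7:
    from torchvision.ops import _new_empty_tensor
    from torchvision.ops.misc import _output_size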
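Instead of commenting the lines out, you could keep the guard and make the comparison numeric. This is a sketch, and version_tuple is a hypothetical helper that is not part of torchvision. If the packaging library is installed, packaging.version.parse is a ready-made alternative.

```python
import re


def version_tuple(version: str) -> tuple:
    """Turn '0.15.2+cu118' into (0, 15, 2) so versions compare numerically."""
    # Keep only the leading dotted-number part; drop tags like +cu118 or a0.
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


# The check in misc.py, as written, misfires on two-digit minor versions:
print(float("0.15.2"[:3]) < 0.7)               # True  (wrong: it compares 0.1)
print(version_tuple("0.15.2+cu118") < (0, 7))  # False (correct)

# A numerically correct version of the guard would then read:
# if version_tuple(torchvision.__version__) < (0, 7):
#     from torchvision.ops import _new_empty_tensor
#     from torchvision.ops.misc import _output_size
```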

Reference

https://zhuanlan.zhihu.com/p/586622773

Pitfall 013

Problem description

Running IQA_attack fails with:

torch.cuda.DeferredCudaCallError: CUDA call failed lazily at initialization with error: device >= 0

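Solution

Cause: os.environ["CUDA_VISIBLE_DEVICES"] = "1" was set too late in the script, so the model had already been initialized on another GPU.
Fix: set os.environ["CUDA_VISIBLE_DEVICES"] = "1" on the first line of the code.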
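The sketch below shows the intended layout at the top of the script. Once CUDA_VISIBLE_DEVICES is "1", physical GPU 1 is the only GPU the process can see, and inside the process it is numbered cuda:0. describe_visible_gpus is a hypothetical helper added only to make that visible.

```python
import os

# Must run before anything initializes CUDA; putting it before "import torch"
# is the simplest way to guarantee that. "1" is the GPU index used in this post.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import torch  # noqa: E402  (deliberately imported after setting the variable)


def describe_visible_gpus() -> str:
    """Summarize the GPUs this process can see after the masking above."""
    if not torch.cuda.is_available():
        return "no CUDA device visible"
    # The single visible card (physical GPU 1) is renumbered as cuda:0 here.
    return f"{torch.cuda.device_count()} visible; cuda:0 is {torch.cuda.get_device_name(0)}"


print(describe_visible_gpus())
```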

Pitfall 014

Problem description

After a macOS upgrade, git fails with: xcrun: error: invalid active developer path (/Library/Developer/CommandLineTools), missing xcrun at: /Library/Developer/CommandLineTools/usr/bin/xcrun

Solution

Reinstall the Xcode Command Line Tools from the command line:

xcode-select --install

If that does not solve the problem, run:
sudo xcode-select -switch /

Reference

https://blog.csdn.net/weixin_45072479/article/details/132862357