Fixing "OSError: Unable to load weights" when calling DistilBertTokenizer.from_pretrained

Symptoms and Background

When loading the DistilBERT tokenizer with Hugging Face's transformers library, developers often run into the following error:

OSError: Unable to load weights from pytorch_model.bin. 
If you tried to load a PyTorch model from a TF 2.0 checkpoint, 
set `from_tf=True`.

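The error usually appears when DistilBertTokenizer.from_pretrained('distilbert-base-uncased') is executed, and it means the pretrained weight file could not be loaded correctly. Several factors can cause it, including network connectivity problems, cache directory permissions, and version mismatches between the library and the model.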

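By default, transformers downloads pretrained models from the Hugging Face Hub. When the connection is unstable or blocked by a firewall, the download can fail or leave an incomplete file behind.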

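transformers caches downloaded models in the ~/.cache/huggingface/transformers directory (newer releases use ~/.cache/huggingface/hub). Common problems here include insufficient permissions on the cache directory and corrupted cached files.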
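The permission side of the cache problem can be checked directly. The sketch below is a minimal, hypothetical helper (`check_cache_dir` is not part of transformers) that reports whether a directory exists and whether a file can actually be created in it. The path passed at the end is an assumption; substitute the cache location your installation uses.

```python
import os
import tempfile
from pathlib import Path


def check_cache_dir(path):
    """Return (exists, writable) for a cache directory path."""
    p = Path(path).expanduser()
    if not p.is_dir():
        return False, False
    # Try to create a real file: this catches read-only mounts and
    # ownership problems that a plain existence check would miss.
    try:
        with tempfile.TemporaryFile(dir=p):
            pass
        return True, True
    except OSError:
        return True, False


# Assumed location; adjust to the cache directory in use on your machine.
print(check_cache_dir("~/.cache/huggingface"))
```

If the second value is False, fixing the directory's ownership or pointing the cache to a writable location is worth trying before anything else.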

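Different versions of the transformers library can also be incompatible with a given set of model weights.

To rule out a broken or partial cached copy, force a fresh download: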

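from transformers import DistilBertTokenizer

tokenizer = DistilBertTokenizer.from_pretrained(
    'distilbert-base-uncased',
    force_download=True,    # ignore cached files and download again
    resume_download=False   # do not continue a partial download
)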

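Download the model files manually, then point to the local path:

from transformers import DistilBertTokenizer

tokenizer = DistilBertTokenizer.from_pretrained(
    './local/path/to/distilbert',
    local_files_only=True   # never contact the Hub
)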
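Before loading from a local folder, it can help to confirm that the expected files are really there and non-empty. The following is a sketch, not an official check: `missing_files` and `EXPECTED_FILES` are hypothetical names, and the file list is an assumption (vocab.txt is the vocabulary file DistilBertTokenizer reads, while config.json and pytorch_model.bin matter only if the model is loaded from the same folder).

```python
from pathlib import Path

# Assumed file names for a manually downloaded distilbert-base-uncased folder.
EXPECTED_FILES = ("vocab.txt", "config.json", "pytorch_model.bin")


def missing_files(directory, expected=EXPECTED_FILES):
    """Return the expected file names that are absent or empty in `directory`."""
    root = Path(directory)
    return [
        name for name in expected
        if not (root / name).is_file() or (root / name).stat().st_size == 0
    ]


# Lists whatever is missing from the folder used in the example above.
print(missing_files("./local/path/to/distilbert"))
```

An empty list means every expected file is present. A zero-byte file is reported as missing because it usually indicates an interrupted download.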

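Clear the cache from Python:

from transformers.utils import TRANSFORMERS_CACHE
import shutil

# Delete the whole cache directory; ignore_errors avoids a failure if it does not exist
shutil.rmtree(TRANSFORMERS_CACHE, ignore_errors=True)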

For stubborn compatibility problems, install a pinned pair of versions:

pip install transformers==4.19.0 torch==1.11.0
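To see which versions are currently installed before or after pinning, the standard library can be queried directly. This is a small sketch; `installed_versions` is a hypothetical helper name, and it only reads package metadata without importing transformers or torch.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("transformers", "torch")):
    """Map each package name to its installed version, or None if not installed."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


print(installed_versions())
```

Including this output in a bug report makes version-related failures much easier to reproduce.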

Enable debug logging to see in more detail what happens during loading:

import logging
logging.basicConfig(level=logging.DEBUG)

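Verify that the downloaded file is complete:

import hashlib

# Print the MD5 digest so it can be compared with a known-good copy
with open('pytorch_model.bin', 'rb') as f:
    print(hashlib.md5(f.read()).hexdigest())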
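Weight files can be several hundred megabytes, so reading the whole file into memory for hashing is not always convenient. The sketch below hashes in fixed-size chunks instead and uses SHA-256, on the assumption that the reference checksum you compare against is a SHA-256 (Hub file pages typically show one for large files). `file_sha256` is a hypothetical helper name.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes (end of file)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage: compare file_sha256('pytorch_model.bin') with the published checksum.
```

A mismatch indicates a truncated or corrupted download, in which case the file should be fetched again.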

Set an environment variable to use a mirror inside mainland China:

import os
os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'  # set before importing transformers