How do you fix tf.sqrt returning NaN or inf in TensorFlow?

Symptoms and diagnosis

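When computing square roots with tf.sqrt(), developers often get NaN (not a number) or inf (infinity) in the result. Typical failure scenarios include negative inputs and extreme numerical values.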

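The underlying cause is the domain restriction of the real square root function, which is defined only for non-negative inputs: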

# Reproducing the problem
import tensorflow as tf
negative_tensor = tf.constant([-1.0, 0.0, 4.0])
result = tf.sqrt(negative_tensor)  # returns [nan, 0., 2.]

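Over the real numbers, the square root of a negative number is a complex value, but tf.sqrt keeps the input dtype and returns a real result. When a real-valued input contains negative elements, the corresponding output elements are NaN.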
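To locate the offending elements, one option is to mask the output with tf.math.is_nan or to let tf.debugging.check_numerics raise on the first bad tensor. The following is a minimal sketch under that assumption; the variable names (nan_mask, bad_indices, raised) are illustrative only.

```python
import tensorflow as tf

x = tf.constant([-1.0, 0.0, 4.0])
result = tf.sqrt(x)

# Boolean mask of the elements that came out as NaN
nan_mask = tf.math.is_nan(result)
has_nan = bool(tf.reduce_any(nan_mask))

# Indices of the input elements that produced NaN
bad_indices = tf.where(nan_mask)

# check_numerics raises InvalidArgumentError if the tensor holds NaN or inf
try:
    tf.debugging.check_numerics(result, message="tf.sqrt output")
    raised = False
except tf.errors.InvalidArgumentError:
    raised = True
```

The mask approach is convenient for inspecting data interactively, while check_numerics is better suited to failing fast inside a training step.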

Extreme values can also produce abnormal results.
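As an illustration, the sketch below feeds zero, infinity, and NaN through tf.sqrt and inspects both the forward values and the gradient. The chosen inputs are assumptions for demonstration and are not an exhaustive list of problem cases.

```python
import tensorflow as tf

# Illustrative extreme inputs: zero, +inf, and NaN
x = tf.constant([0.0, float("inf"), float("nan")])

with tf.GradientTape() as tape:
    tape.watch(x)
    y = tf.sqrt(x)

# Forward pass: sqrt(inf) is inf and NaN propagates unchanged.
# Backward pass: d/dx sqrt(x) = 0.5 / sqrt(x), which is unbounded at x = 0.
grad = tape.gradient(y, x)
```

The zero case matters most in practice: the forward value is a harmless 0, but the infinite gradient can poison every parameter update that depends on it.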

Use tf.maximum to clamp the input to a small positive floor so that it is never negative:

# x is any real-valued float tensor; values below 1e-10 are raised to 1e-10
safe_input = tf.maximum(x, 1e-10)
result = tf.sqrt(safe_input)

Wrap the numerical guard in a reusable function:

def safe_sqrt(x, eps=1e-8):
    with tf.name_scope('safe_sqrt'):
        return tf.sqrt(tf.maximum(x, eps))
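A short usage sketch for the wrapper follows, repeating its definition so the snippet runs on its own. It checks both the output and the gradient; the input values are assumptions chosen to cover a negative, a zero, and a positive element.

```python
import tensorflow as tf

def safe_sqrt(x, eps=1e-8):
    with tf.name_scope('safe_sqrt'):
        return tf.sqrt(tf.maximum(x, eps))

x = tf.constant([-1.0, 0.0, 4.0])

with tf.GradientTape() as tape:
    tape.watch(x)
    y = safe_sqrt(x)

# Elements that were clamped receive a zero gradient, because tf.maximum
# routes the gradient to eps rather than to x for those positions.
grad = tape.gradient(y, x)
```

Note the trade-off: clamped elements no longer produce NaN or inf, but they also stop contributing a gradient, which may or may not be acceptable for a given model.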

If negative values must be handled, cast to a complex dtype before taking the root:

complex_result = tf.sqrt(tf.cast(x, tf.complex64))
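The complex route can be sketched as follows. The sample inputs and the names real_part and imag_part are illustrative; tf.math.real and tf.math.imag split the result back into real-valued tensors.

```python
import tensorflow as tf

x = tf.constant([-4.0, 0.0, 9.0])

# Casting to complex64 sets the imaginary part to zero, so negative
# inputs map to purely imaginary roots (for example, -4 maps to 2j).
z = tf.sqrt(tf.cast(x, tf.complex64))

real_part = tf.math.real(z)
imag_part = tf.math.imag(z)
```

Downstream operations must then accept complex tensors, or the real and imaginary parts need to be handled separately.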

FP16 training scenarios need extra handling.
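One reason is that small epsilons such as 1e-10 or 1e-8 are below the smallest positive float16 value (about 6e-8) and round to zero, so the clamp silently stops protecting anything. A possible workaround, sketched below, is to clamp and take the root in float32 and cast back afterwards. The helper safe_sqrt_mixed and its default eps are hypothetical, not part of TensorFlow.

```python
import tensorflow as tf

# 1e-10 is not representable in float16; the cast rounds it to zero
eps_fp16 = tf.cast(tf.constant(1e-10, dtype=tf.float32), tf.float16)

def safe_sqrt_mixed(x, eps=1e-6):
    """Hypothetical helper: clamp and take the root in float32, then cast back."""
    x32 = tf.cast(x, tf.float32)
    y32 = tf.sqrt(tf.maximum(x32, eps))
    return tf.cast(y32, x.dtype)

x = tf.constant([-1.0, 0.0, 4.0], dtype=tf.float16)
y = safe_sqrt_mixed(x)  # result keeps the float16 dtype of the input
```

With eps=1e-6 the clamped root is about 1e-3, which float16 can still represent; a different floor may suit other value ranges better.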

When defining a custom gradient, make sure the backward pass is numerically stable:

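@tf.custom_gradient
def stable_sqrt(x):
    def grad(dy):
        # Clamp only in the backward pass so the gradient stays finite at x = 0
        return dy * (0.5 / tf.sqrt(tf.maximum(x, 1e-10)))
    # The forward pass is unchanged, so negative inputs still yield NaN
    return tf.sqrt(x), grad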
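The effect of the custom gradient can be checked with a GradientTape, as in this sketch. The function is repeated so the snippet is self-contained, and the input values are assumptions for illustration.

```python
import tensorflow as tf

@tf.custom_gradient
def stable_sqrt(x):
    def grad(dy):
        return dy * (0.5 / tf.sqrt(tf.maximum(x, 1e-10)))
    return tf.sqrt(x), grad

x = tf.constant([0.0, 4.0])

with tf.GradientTape() as tape:
    tape.watch(x)
    y = stable_sqrt(x)

# At x = 0 the gradient is large but finite (0.5 / sqrt(1e-10)) instead of inf
grad = tape.gradient(y, x)
```

Compared with clamping the input, this variant leaves the forward value exact and only bounds the gradient, at the cost of not guarding against negative inputs.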

In multi-GPU/TPU environments:

Runtime comparison of the different solutions (Tesla V100 GPU):