Retry your Python code until it fails

Use the Tenacity and Mock libraries to find the bugs hiding deep within your code.

Image by: Photo by Jen Wike Huger, CC BY-SA; original photo by Torkild Retvedt

Sometimes a function is called with bad inputs or in a bad program state, so it fails. In languages like Python, this usually results in an exception.

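But sometimes exceptions have other causes, or they are transitory. Imagine code that must keep working while cached data is being cleaned up. In theory, the code and the cleaner could carefully agree on a clean-up methodology that prevents the code from trying to access a non-existent file or directory. Unfortunately, that approach is complicated and error-prone. However, most of these problems are transitory, because the cleaner eventually creates the correct structures.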

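Even more frequently, the uncertain nature of network programming means that functions which abstract a network call fail because packets were lost or corrupted.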

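A common solution is to retry the failing code. This practice skips past transitional problems while still (eventually) failing if the issue persists. Python has several libraries that make retrying easier. Writing retry logic is also a common "finger exercise."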
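A minimal sketch of that finger exercise, using only the standard library, might look like the decorator below. The name `retry_on` and its `exceptions`/`attempts` parameters are hypothetical illustrations, not part of any library.

```python
import functools


def retry_on(exceptions, attempts=3):
    """Hypothetical hand-rolled retry decorator (not from any library).

    Calls the wrapped function up to `attempts` times, retrying only when
    one of `exceptions` is raised. The last exception propagates.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == attempts:
                        raise  # out of attempts: surface the last error
        return wrapper
    return decorator


calls = []


@retry_on(ValueError, attempts=3)
def flaky():
    # Fails on the first two calls and succeeds on the third.
    calls.append(1)
    if len(calls) < 3:
        raise ValueError("transient")
    return "ok"


print(flaky())     # ok
print(len(calls))  # 3
```

Even this small version leaves out waiting between attempts, logging and statistics, which is where a dedicated library starts to pay off.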

Tenacity

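One library that goes beyond a finger exercise and into a useful abstraction is tenacity. Install it with pip install tenacity, or depend on it with a dependencies = ["tenacity"] line in the [project] table of your pyproject.toml file.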

Set up logging

A handy built-in feature of tenacity is its support for logging. In error handling, seeing log details about retry attempts is invaluable.

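So that the remaining examples can display log messages, set up the logging library. In a real program, the central entry point or a logging configuration plugin does this. Here's a sample: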

import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s:%(name)s:%(levelname)s:%(message)s",
)

TENACITY_LOGGER = logging.getLogger("Retrying")

Selective failure

To demonstrate tenacity's features, it helps to have something that fails a few times before finally succeeding. unittest.mock is useful for this scenario:

from unittest import mock

thing = mock.MagicMock(side_effect=[ValueError(), ValueError(), 3])
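To see what that mock does on its own, the sketch below calls a fresh copy of it three times. When side_effect is a list, each call consumes the next item: exception instances are raised, and any other value is returned. The name `flaky` is just a local illustration.

```python
from unittest import mock

# Same side_effect as `thing` above: two failures, then a value.
flaky = mock.MagicMock(side_effect=[ValueError(), ValueError(), 3])

outcomes = []
for _ in range(3):
    try:
        outcomes.append(flaky())
    except ValueError:
        outcomes.append("ValueError")

print(outcomes)          # ['ValueError', 'ValueError', 3]
print(flaky.call_count)  # 3
```

A fourth call would raise StopIteration because the list is exhausted. Size the side_effect list to cover every attempt the retry logic can make.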

If you're new to unit testing, read my article on mock.

Before showing the power of tenacity, look at what happens when you implement retrying directly inside a function. This makes it easy to see how much manual work tenacity saves.

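def useit(a_thing):
    # Try up to three times, logging each recoverable failure.
    for _ in range(3):
        try:
            value = a_thing()
        except ValueError:
            TENACITY_LOGGER.info("Recovering")
            continue
        else:
            break
    else:
        # The loop never reached `break`: every attempt failed.
        raise ValueError()
    print("the value is", value)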

The function can be called with something that never fails:

>>> useit(lambda: 5)
the value is 5

With something that eventually succeeds:

>>> useit(thing)

2023-03-29 17:00:42,774:Retrying:INFO:Recovering
2023-03-29 17:00:42,779:Retrying:INFO:Recovering

the value is 3

Calling the function with something that fails too many times ends poorly:

try:
    useit(mock.MagicMock(side_effect=[ValueError()] * 5 + [4]))
except Exception as exc:
    print("could not use it", repr(exc))

The result:


2023-03-29 17:00:46,763:Retrying:INFO:Recovering
2023-03-29 17:00:46,767:Retrying:INFO:Recovering
2023-03-29 17:00:46,770:Retrying:INFO:Recovering

could not use it ValueError()

Simple tenacity usage

For the most part, the function above consists of retrying code. The next step is to let a decorator handle the retrying logic:

import tenacity

my_retry = tenacity.retry(
    stop=tenacity.stop_after_attempt(3),
    after=tenacity.after_log(TENACITY_LOGGER, logging.WARNING),
)

Tenacity supports a fixed number of attempts and logging after an attempt raises an exception.

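The useit function no longer has to care about retrying. Sometimes it still makes sense for the function to decide what is retryable. Tenacity lets code make that decision itself by raising the special exception TryAgain: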

@my_retry
def useit(a_thing):
    try:
        value = a_thing()
    except ValueError:
        raise tenacity.TryAgain()
    print("the value is", value)

Now when useit is called, it retries on ValueError without any custom retrying code:

useit(mock.MagicMock(side_effect=[ValueError(), ValueError(), 2]))

The output:

2023-03-29 17:12:19,074:Retrying:WARNING:Finished call to '__main__.useit' after 0.000(s), this was the 1st time calling it.
2023-03-29 17:12:19,080:Retrying:WARNING:Finished call to '__main__.useit' after 0.006(s), this was the 2nd time calling it.

the value is 2

Configure the decorator

The decorator above shows only a small part of what tenacity supports. Here's a more complicated decorator:

my_retry = tenacity.retry(
    stop=tenacity.stop_after_attempt(3),
    after=tenacity.after_log(TENACITY_LOGGER, logging.WARNING),
    before=tenacity.before_log(TENACITY_LOGGER, logging.WARNING),
    retry=tenacity.retry_if_exception_type(ValueError),
    wait=tenacity.wait_incrementing(1, 10, 2),
    reraise=True
)


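This is a more realistic decorator, with additional parameters:

before: Log before calling the function
retry: Instead of only retrying on TryAgain, retry on exceptions matching the given criteria
wait: Wait between calls (this is especially important when calling out to a service)
reraise: If retrying fails, reraise the exception from the last attempt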

Now that the decorator also specifies what is retryable, remove that code from useit:

@my_retry
def useit(a_thing):
    value = a_thing()
    print("the value is", value)

Here's how it works:

useit(mock.MagicMock(side_effect=[ValueError(), 5]))

The output:

2023-03-29 17:19:39,820:Retrying:WARNING:Starting call to '__main__.useit', this is the 1st time calling it.
2023-03-29 17:19:39,823:Retrying:WARNING:Finished call to '__main__.useit' after 0.003(s), this was the 1st time calling it.
2023-03-29 17:19:40,829:Retrying:WARNING:Starting call to '__main__.useit', this is the 2nd time calling it.


the value is 5

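Notice the time delay between the second and third log lines: it is almost exactly one second. When the call succeeds on the first attempt, only the "Starting call" line is logged: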

>>> useit(mock.MagicMock(side_effect=[5]))

2023-03-29 17:20:25,172:Retrying:WARNING:Starting call to '__main__.useit', this is the 1st time calling it.

the value is 5

For more detail, here is a call in which every attempt fails:

try:
    useit(mock.MagicMock(side_effect=[ValueError("detailed reason")]*3))
except Exception as exc:
    print("retrying failed", repr(exc))

The output:

2023-03-29 17:21:22,884:Retrying:WARNING:Starting call to '__main__.useit', this is the 1st time calling it.
2023-03-29 17:21:22,888:Retrying:WARNING:Finished call to '__main__.useit' after 0.004(s), this was the 1st time calling it.
2023-03-29 17:21:23,892:Retrying:WARNING:Starting call to '__main__.useit', this is the 2nd time calling it.
2023-03-29 17:21:23,894:Retrying:WARNING:Finished call to '__main__.useit' after 1.010(s), this was the 2nd time calling it.
2023-03-29 17:21:25,896:Retrying:WARNING:Starting call to '__main__.useit', this is the 3rd time calling it.
2023-03-29 17:21:25,899:Retrying:WARNING:Finished call to '__main__.useit' after 3.015(s), this was the 3rd time calling it.

retrying failed ValueError('detailed reason')
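The gaps in that log, about one second and then about two, follow from wait_incrementing(1, 10, 2). This assumes tenacity's positional order is start, increment, max, matching its wait_incrementing(start, increment, max) signature. Under that assumption, the wait after attempt n is start + increment * (n - 1), capped at max. A stdlib sketch of that arithmetic, using the hypothetical helper name `incrementing_waits`:

```python
def incrementing_waits(start, increment, maximum, attempts):
    """Hypothetical helper: the wait (in seconds) after each failed attempt,
    assuming the rule max(0, min(start + increment * (n - 1), maximum))."""
    return [
        max(0, min(start + increment * (n - 1), maximum))
        for n in range(1, attempts + 1)
    ]


# A three-attempt run only waits twice: after attempts 1 and 2.
print(incrementing_waits(1, 10, 2, 2))  # [1, 2]
```

The increment of 10 never shows up here. The 2-second cap already applies from the second wait onward.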

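Again, with KeyError instead of ValueError. The retry criterion covers only ValueError, so the KeyError is not retried and surfaces after a single attempt: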

try:
    useit(mock.MagicMock(side_effect=[KeyError("detailed reason")]*3))
except Exception as exc:
    print("retrying failed", repr(exc))

The output:

2023-03-29 17:21:37,345:Retrying:WARNING:Starting call to '__main__.useit', this is the 1st time calling it.

retrying failed KeyError('detailed reason')

Separate the decorator from the controller

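Often, the same retrying parameters are needed in several places. In these cases, it's best to create a retrying controller with those parameters: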

my_retryer = tenacity.Retrying(
    stop=tenacity.stop_after_attempt(3),
    after=tenacity.after_log(TENACITY_LOGGER, logging.WARNING),
    before=tenacity.before_log(TENACITY_LOGGER, logging.WARNING),
    retry=tenacity.retry_if_exception_type(ValueError),
    wait=tenacity.wait_incrementing(1, 10, 2),
    reraise=True
)

Decorate the function with the retrying controller:

@my_retryer.wraps
def useit(a_thing):
    value = a_thing()
    print("the value is", value)

Run it:

>>> useit(mock.MagicMock(side_effect=[ValueError(), 5]))

2023-03-29 17:29:25,656:Retrying:WARNING:Starting call to '__main__.useit', this is the 1st time calling it.
2023-03-29 17:29:25,663:Retrying:WARNING:Finished call to '__main__.useit' after 0.008(s), this was the 1st time calling it.
2023-03-29 17:29:26,667:Retrying:WARNING:Starting call to '__main__.useit', this is the 2nd time calling it.

the value is 5

This lets you collect statistics about the last call:

>>> my_retryer.statistics

{'start_time': 26782.847558759,
 'attempt_number': 2,
 'idle_for': 1.0,
 'delay_since_first_attempt': 0.0075125470029888675}

Use these statistics to update an internal statistics registry and to integrate with your monitoring framework.
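A minimal sketch of such a registry follows. It assumes only that the statistics form a dictionary with the keys printed above. The registry layout and the names `REGISTRY` and `record_retry_stats` are hypothetical, not part of tenacity or any monitoring framework.

```python
from collections import defaultdict

# Hypothetical in-process registry: metric name -> list of observations.
REGISTRY = defaultdict(list)


def record_retry_stats(name, stats, registry=REGISTRY):
    """Copy selected fields of a retry statistics dict into `registry`.

    Assumes the keys shown above; any key that is missing is skipped.
    """
    for key in ("attempt_number", "idle_for", "delay_since_first_attempt"):
        if key in stats:
            registry[f"{name}.{key}"].append(stats[key])


# The dictionary printed above, copied verbatim.
stats = {
    "start_time": 26782.847558759,
    "attempt_number": 2,
    "idle_for": 1.0,
    "delay_since_first_attempt": 0.0075125470029888675,
}
record_retry_stats("useit", stats)
print(REGISTRY["useit.attempt_number"])  # [2]
```

In a real program, you would pass my_retryer.statistics after each call instead of a copied dictionary. You would also forward the registry to whatever exporter your monitoring system uses.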

Extend tenacity

Many of the decorator's arguments are objects. These can be instances of subclasses, which allows for deep extensibility.

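For instance, suppose the Fibonacci sequence should determine the wait times. The twist is that the API for requesting a wait time only provides the attempt number, so the usual iterative way of calculating Fibonacci numbers is not useful.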

One way to accomplish this is to use the closed formula:

$$F_n = \frac{\left(\frac{1+\sqrt{5}}{2}\right)^n - \left(\frac{1-\sqrt{5}}{2}\right)^n}{\sqrt{5}}$$

A little trick is to skip the subtraction and round to the nearest integer instead:

$$F_n = \operatorname{round}\left(\frac{1}{\sqrt{5}}\left(\frac{1+\sqrt{5}}{2}\right)^n\right)$$
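The rounding trick works because the dropped term is always small. Write φ = (1+√5)/2 and ψ = (1−√5)/2, so that |ψ| ≈ 0.618 < 1. Then for every n ≥ 0, φ^n/√5 differs from F_n by less than one half:

```latex
\left|\frac{\varphi^n}{\sqrt{5}} - F_n\right| = \left|\frac{\psi^n}{\sqrt{5}}\right| \le \frac{1}{\sqrt{5}} \approx 0.447 < \frac{1}{2},
\qquad\text{hence}\qquad
F_n = \operatorname{round}\!\left(\frac{\varphi^n}{\sqrt{5}}\right).
```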

This translates to Python as:

int(((1 + sqrt(5))/2)**n / sqrt(5) + 0.5)

This can be used directly in a Python function:

from math import sqrt

def fib(n):
    return int(((1 + sqrt(5))/2)**n / sqrt(5) + 0.5)
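The closed form relies on floating-point arithmetic, so a quick sanity check against the usual iterative computation is worthwhile. The helper `fib_iterative` below is a hypothetical reference implementation, added only for the comparison. `fib` is repeated so that the block runs on its own.

```python
from math import sqrt


def fib(n):
    # Closed-form variant from above: round(phi**n / sqrt(5)).
    return int(((1 + sqrt(5))/2)**n / sqrt(5) + 0.5)


def fib_iterative(n):
    # Classic iteration, used here only as a reference to compare against.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


# The two agree for small n. Float precision eventually limits the closed
# form for large n, which does not matter for a handful of retry attempts.
mismatches = [n for n in range(40) if fib(n) != fib_iterative(n)]
print(mismatches)                   # []
print([fib(n) for n in range(8)])   # [0, 1, 1, 2, 3, 5, 8, 13]
```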

The Fibonacci sequence is indexed from 0, while attempt numbers start at 1, so the wait function needs to compensate:

def wait_fib(rcs):
    return fib(rcs.attempt_number - 1)
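To preview the schedule without running real retries, you can feed the wait function a stand-in object. This sketch assumes only that the state object passed to wait exposes attempt_number, which wait_fib already relies on. types.SimpleNamespace is a substitute for tenacity's real retry-state object, not part of its API.

```python
from math import sqrt
from types import SimpleNamespace


def fib(n):
    return int(((1 + sqrt(5))/2)**n / sqrt(5) + 0.5)


def wait_fib(rcs):
    # Attempt numbers start at 1; Fibonacci indices start at 0.
    return fib(rcs.attempt_number - 1)


# A seven-attempt run waits after attempts 1 through 6.
waits = [wait_fib(SimpleNamespace(attempt_number=k)) for k in range(1, 7)]
print(waits)  # [0, 1, 1, 2, 3, 5]
```

These are the gaps you should expect between consecutive attempts in a run that never succeeds.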

The function can be passed directly as the wait parameter:

@tenacity.retry(
    stop=tenacity.stop_after_attempt(7),
    after=tenacity.after_log(TENACITY_LOGGER, logging.WARNING),
    wait=wait_fib,
)
def useit(thing):
    print("value is", thing())


try:
    useit(mock.MagicMock(side_effect=[tenacity.TryAgain()] * 7))
except Exception:
    # All seven attempts fail, so tenacity gives up and raises; only the
    # log timestamps matter here.
    pass

Try it out:

2023-03-29 18:03:52,783:Retrying:WARNING:Finished call to '__main__.useit' after 0.000(s), this was the 1st time calling it.
2023-03-29 18:03:52,787:Retrying:WARNING:Finished call to '__main__.useit' after 0.004(s), this was the 2nd time calling it.
2023-03-29 18:03:53,789:Retrying:WARNING:Finished call to '__main__.useit' after 1.006(s), this was the 3rd time calling it.
2023-03-29 18:03:54,793:Retrying:WARNING:Finished call to '__main__.useit' after 2.009(s), this was the 4th time calling it.
2023-03-29 18:03:56,797:Retrying:WARNING:Finished call to '__main__.useit' after 4.014(s), this was the 5th time calling it.
2023-03-29 18:03:59,800:Retrying:WARNING:Finished call to '__main__.useit' after 7.017(s), this was the 6th time calling it.
2023-03-29 18:04:04,806:Retrying:WARNING:Finished call to '__main__.useit' after 12.023(s), this was the 7th time calling it.

Subtract each "after" time from the next one and truncate to whole seconds to see the Fibonacci sequence:

intervals = [
    0.000,
    0.004,
    1.006,
    2.009,
    4.014,
    7.017,
    12.023,
]
for x, y in zip(intervals[:-1], intervals[1:]):
    print(int(y-x), end=" ")

Does it work? Yes, exactly as expected:

0 1 1 2 3 5

Wrap up

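Writing ad hoc retry code can be a fun distraction. For production-grade code, a better choice is a proven library like tenacity. The tenacity library is configurable and extensible, and it will likely meet your needs.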


Moshe Zadka


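Moshe has been involved in the Linux community since 1998, helping out at Linux "installation parties." He has been programming Python since 1999 and has contributed to the core Python interpreter. Moshe was doing DevOps/SRE before those terms existed, and he cares deeply about software reliability, build reproducibility, and other such things.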
