Open source image recognition with Luminoth

Image credits: Agustin Azzinnari, CC BY

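Computer vision is a way to use artificial intelligence to automate image recognition, that is, to use computers to identify what is in a photograph, video, or another image type. The latest version of Luminoth (v. 0.1), an open source computer vision toolkit built in Python on top of TensorFlow and Sonnet, offers several improvements over its predecessor: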

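An implementation of the Single Shot MultiBox Detector (SSD) model, a faster (but slightly less accurate) object detector than the already-included Faster R-CNN. SSD allows real-time object detection on most modern GPUs, which makes it possible to process video streams, for example.
Some tweaks to the Faster R-CNN model, along with a new base configuration, that let it reach results comparable to existing implementations when trained on the COCO and Pascal VOC visual object detection datasets.
Checkpoints for the SSD and Faster R-CNN models, trained on the Pascal and COCO datasets, respectively, with state-of-the-art results. These make object detection in an image very simple, because the library downloads the checkpoints automatically, even when you only use the command-line interface (CLI).
General usability improvements, such as a cleaner CLI for most commands, support for predicting on videos, and a redesigned web frontend that makes it easier to play with the models.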

Let's explore these features by building our own computer vision image detector, step by step.

Installing and testing Luminoth

First, install Luminoth. Inside your virtual environment, run:

$ pip install luminoth

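If you have a GPU available and want to use it, run pip install tensorflow-gpu first, and then run the install command above.

Luminoth's new checkpoint functionality provides pre-trained models for both Faster R-CNN and SSD out of the box. This means you can download and use a fully trained object detection model with just a couple of commands. Let's start by refreshing the checkpoint repository with Luminoth's CLI tool, lumi, and then listing what is available: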

$ lumi checkpoint refresh
Retrieving remote index... done.
2 new remote checkpoints added.
$ lumi checkpoint list
================================================================================
|           id |                  name |       alias | source |         status |
================================================================================
| 48ed2350f5b2 |   Faster R-CNN w/COCO |    accurate | remote | NOT_DOWNLOADED |
| e3256ffb7e29 |      SSD w/Pascal VOC |        fast | remote | NOT_DOWNLOADED |
================================================================================

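The output shows all the available pre-trained checkpoints. Each checkpoint is identified by its id field (in this example, 48ed2350f5b2 and e3256ffb7e29) and, optionally, by an alias (e.g., accurate and fast). You can view additional information with the command lumi checkpoint detail <checkpoint_id_or_alias>. We're going to try out the Faster R-CNN checkpoint, so we'll download it (using the alias instead of the ID) and then run the lumi predict command: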

$ lumi checkpoint download accurate
Downloading checkpoint...  [####################################]  100%
Importing checkpoint... done.
Checkpoint imported successfully.
$ lumi predict image.jpg
Found 1 files to predict.
Neither checkpoint not config specified, assuming `accurate`.
Predicting image.jpg... done.
{
  "file": "image.jpg",
  "objects": [
    {"bbox": [294, 231, 468, 536], "label": "person", "prob": 0.9997},
    {"bbox": [494, 289, 578, 439], "label": "person", "prob": 0.9971},
    {"bbox": [727, 303, 800, 465], "label": "person", "prob": 0.997},
    {"bbox": [555, 315, 652, 560], "label": "person", "prob": 0.9965},
    {"bbox": [569, 425, 636, 600], "label": "bicycle", "prob": 0.9934},
    {"bbox": [326, 410, 426, 582], "label": "bicycle", "prob": 0.9933},
    {"bbox": [744, 380, 784, 482], "label": "bicycle", "prob": 0.9334},
    {"bbox": [506, 360, 565, 480], "label": "bicycle", "prob": 0.8724},
    {"bbox": [848, 319, 858, 342], "label": "person", "prob": 0.8142},
    {"bbox": [534, 298, 633, 473], "label": "person", "prob": 0.4089}
  ]
}

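The lumi predict command defaults to the checkpoint with the alias accurate, but we can specify another one with the option --checkpoint=<alias_or_id>. After about 30 seconds on a modern CPU, this is the output: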

People and bikes detected with the Faster R-CNN model.

You can also write the JSON output to a file (with the --output or -f option) and have Luminoth store the image with the bounding boxes drawn on it (with the --save-media-to or -d option).
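Because the predictions are plain JSON, they are easy to post-process. The following is a minimal sketch, assuming only the output format shown above (an "objects" list whose items carry "label" and "prob" keys). The sample data is a subset of that output, and the helper names confident_objects and count_by_label are made up for this illustration; they are not part of Luminoth.

```python
import json
from collections import Counter

# A subset of the `lumi predict` output shown above, embedded so the
# example is self-contained. In practice you would read the file that
# was written with the --output option instead.
SAMPLE_OUTPUT = """
{
  "file": "image.jpg",
  "objects": [
    {"bbox": [294, 231, 468, 536], "label": "person", "prob": 0.9997},
    {"bbox": [569, 425, 636, 600], "label": "bicycle", "prob": 0.9934},
    {"bbox": [506, 360, 565, 480], "label": "bicycle", "prob": 0.8724},
    {"bbox": [848, 319, 858, 342], "label": "person", "prob": 0.8142},
    {"bbox": [534, 298, 633, 473], "label": "person", "prob": 0.4089}
  ]
}
"""


def confident_objects(prediction, min_prob=0.5):
    """Keep only the detections whose probability is at least min_prob."""
    return [obj for obj in prediction["objects"] if obj["prob"] >= min_prob]


def count_by_label(objects):
    """Count how many detections there are for each label."""
    return Counter(obj["label"] for obj in objects)


prediction = json.loads(SAMPLE_OUTPUT)
kept = confident_objects(prediction, min_prob=0.5)
counts = count_by_label(kept)
print(dict(counts))
```

Raising min_prob trades missed objects for fewer false positives; the 0.5 used here is an arbitrary choice, not a Luminoth default.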

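Now in real time

Unless you are reading this several years in the future (hello from the past!), you probably noticed that Faster R-CNN took quite a while to detect the objects in the image. That is because this model favors prediction accuracy over computational efficiency, so it is not feasible to use it for things like real-time video processing (especially if you don't have modern hardware). Even on a fairly fast GPU, Faster R-CNN won't process more than two to five images per second.

Enter the Single Shot MultiBox Detector. This model gives up some accuracy (a gap that widens the more classes you want to detect) in exchange for speed: about 60 images per second on the same hardware as above, which makes it suitable for running on video streams, or on videos in general.

Let's try it out. Run lumi predict again, but this time with the fast checkpoint. Also, we won't download it beforehand this time; the CLI will notice that the checkpoint is missing and look for it in the remote repository.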
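To make the "real time" claim concrete, here is a small back-of-the-envelope sketch. It assumes a 30 frames-per-second video, which is a figure chosen for illustration and not taken from the article; the model throughputs are the rough numbers quoted above.

```python
def frame_budget_ms(fps):
    """Time available to process one frame, in milliseconds."""
    return 1000.0 / fps


def keeps_up(model_fps, video_fps):
    """True if the model processes frames at least as fast as they arrive."""
    return model_fps >= video_fps


VIDEO_FPS = 30  # assumption: a typical video frame rate

# Throughputs quoted in the text: Faster R-CNN tops out at about 5
# images per second, SSD reaches about 60.
for name, model_fps in [("Faster R-CNN", 5), ("SSD", 60)]:
    print(
        "{}: {:.1f} ms per frame, real time at {} fps: {}".format(
            name, frame_budget_ms(model_fps), VIDEO_FPS,
            keeps_up(model_fps, VIDEO_FPS),
        )
    )
```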

$ lumi predict video.mp4 --checkpoint=fast --save-media-to=.
Found 1 files to predict.
Predicting video.mp4  [####################################]  100%     fps: 45.9

Single Shot MultiBox Detector model applied to a dog playing fetch.

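Much faster! The command generates the video by running SSD frame by frame, so there are no fancy temporal prediction models (at least for now). In practice, this means you will probably see some jittering in the boxes, as well as some predictions appearing and disappearing out of nowhere, but that is nothing some post-processing can't fix.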
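As one example of such post-processing, the sketch below smooths the jitter of a single object's box with an exponential moving average. This is not something Luminoth does for you, and it assumes the hard part is already solved: the per-frame boxes passed in are known to belong to the same object. Associating detections across frames, which is what would address boxes appearing and disappearing, is left out. The function name smooth_boxes and the sample coordinates are invented for this illustration.

```python
def smooth_boxes(boxes, alpha=0.5):
    """Exponentially smooth a sequence of boxes for ONE tracked object.

    boxes: one box per frame, each a list of four coordinates.
    alpha: weight of the current frame; lower values smooth more
           but make the box lag behind fast motion.
    """
    smoothed = []
    previous = None
    for box in boxes:
        if previous is None:
            # First frame: nothing to average with yet.
            current = [float(c) for c in box]
        else:
            current = [
                alpha * c + (1 - alpha) * p for c, p in zip(box, previous)
            ]
        smoothed.append(current)
        previous = current
    return smoothed


# Hypothetical jittery detections of a mostly static object.
frames = [
    [100, 100, 200, 200],
    [104, 98, 206, 202],
    [99, 101, 199, 199],
]
for box in smooth_boxes(frames, alpha=0.5):
    print(box)
```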

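Training your own model

Say you want to detect cars out your window, and you aren't interested in the 80 classes present in COCO. Training your model to detect a smaller number of classes may improve detection quality, so let's do that. Note, however, that training on a CPU may take quite a while, so be sure to use a GPU or a cloud service such as Google's ML Engine (which Luminoth integrates with).

Luminoth includes tools to prepare and build a custom dataset from standard formats, such as the ones used by COCO and Pascal. You can also build your own dataset converter to support your own format, but that is beyond the scope of this article. For now, we'll use the lumi dataset CLI tool to build a dataset that contains only cars, taken from both COCO and Pascal (2007 and 2012).

Start by downloading the Pascal 2007, Pascal 2012, and COCO datasets and storing them in a datasets/ directory created in your working directory (specifically, so that the paths match the commands below: datasets/pascal/VOCdevkit/VOC2007/, datasets/pascal/VOCdevkit/VOC2012/, and datasets/coco/). Then run the following commands to convert the data and merge it into .tfrecords files (one for training and one for validation) that are ready to be consumed by Luminoth: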

$ lumi dataset transform \
        --type pascal \
        --data-dir datasets/pascal/VOCdevkit/VOC2007/ \
        --output-dir datasets/pascal/tf/2007/ \
        --split train --split val --split test \
        --only-classes=car
$ lumi dataset transform \
        --type pascal \
        --data-dir datasets/pascal/VOCdevkit/VOC2012/ \
        --output-dir datasets/pascal/tf/2012/ \
        --split train --split val \
        --only-classes=car
$ lumi dataset transform \
        --type coco \
        --data-dir datasets/coco/ \
        --output-dir datasets/coco/tf/ \
        --split train --split val \
        --only-classes=car
$ lumi dataset merge \
        datasets/pascal/tf/2007/classes-car/train.tfrecords \
        datasets/pascal/tf/2012/classes-car/train.tfrecords \
        datasets/coco/tf/classes-car/train.tfrecords \
        datasets/tf/train.tfrecords
$ lumi dataset merge \
        datasets/pascal/tf/2007/classes-car/val.tfrecords \
        datasets/pascal/tf/2012/classes-car/val.tfrecords \
        datasets/coco/tf/classes-car/val.tfrecords \
        datasets/tf/val.tfrecords
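The five commands above follow a regular pattern, so they can also be generated from a script. The sketch below only builds and prints the command lines; it uses exactly the subcommands, flags, and paths that appear above and does not call Luminoth. The helper names transform_command and merge_command are invented for this illustration, and the classes-car/ subdirectory is inferred from the merge commands above.

```python
import shlex


def transform_command(dataset_type, data_dir, output_dir, splits, only_class):
    """Build one `lumi dataset transform` command as an argument list."""
    cmd = [
        "lumi", "dataset", "transform",
        "--type", dataset_type,
        "--data-dir", data_dir,
        "--output-dir", output_dir,
    ]
    for split in splits:
        cmd += ["--split", split]
    cmd.append("--only-classes={}".format(only_class))
    return cmd


def merge_command(inputs, output):
    """Build one `lumi dataset merge` command: inputs first, output last."""
    return ["lumi", "dataset", "merge"] + list(inputs) + [output]


# (type, data dir, output dir, splits), copied from the commands above.
SOURCES = [
    ("pascal", "datasets/pascal/VOCdevkit/VOC2007/",
     "datasets/pascal/tf/2007/", ["train", "val", "test"]),
    ("pascal", "datasets/pascal/VOCdevkit/VOC2012/",
     "datasets/pascal/tf/2012/", ["train", "val"]),
    ("coco", "datasets/coco/", "datasets/coco/tf/", ["train", "val"]),
]

commands = [
    transform_command(kind, data_dir, out_dir, splits, "car")
    for kind, data_dir, out_dir, splits in SOURCES
]
for split in ["train", "val"]:
    inputs = [
        "{}classes-car/{}.tfrecords".format(out_dir, split)
        for _, _, out_dir, _ in SOURCES
    ]
    commands.append(
        merge_command(inputs, "datasets/tf/{}.tfrecords".format(split))
    )

for cmd in commands:
    print(" ".join(shlex.quote(part) for part in cmd))
```

To actually execute the commands, each list could be passed to subprocess.run(cmd, check=True) from the standard library.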

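Now we are ready to start training. To train a model with Luminoth, you must create a configuration file specifying some required information (such as a run name, the dataset location, and the model to use, plus a battery of model-dependent hyperparameters). Since Luminoth provides base configuration files, something like this will be enough: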

train:
  run_name: ssd-cars
  # Directory in which model checkpoints & summaries (for Tensorboard) will be saved.
  job_dir: jobs/

  # Specify the learning rate schedule to use. These defaults should be good enough.
  learning_rate:
    decay_method: piecewise_constant
    boundaries: [1000000, 1200000]
    values: [0.0003, 0.0001, 0.00001]

dataset:
  type: object_detection
  # Directory from which to read the dataset.
  dir: datasets/tf/

model:
  type: ssd
  network:
    # Total number of classes to predict. One, in this case.
    num_classes: 1

Store it in your working directory (where datasets/ is located) as config.yml. As you can see, we are going to train an SSD model. Run the following:

$ lumi train -c config.yml
INFO:tensorflow:Starting training for SSD
INFO:tensorflow:Constructing op to load 32 variables from pretrained checkpoint
INFO:tensorflow:ImageVisHook was created with mode = "debug"
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 1 into jobs/ssd-cars/model.ckpt.
INFO:tensorflow:step: 1, file: b'000004.jpg', train_loss: 20.626895904541016, in 0.07s
INFO:tensorflow:step: 2, file: b'000082.jpg', train_loss: 12.471542358398438, in 0.07s
INFO:tensorflow:step: 3, file: b'000074.jpg', train_loss: 7.3356428146362305, in 0.06s
INFO:tensorflow:step: 4, file: b'000137.jpg', train_loss: 8.618950843811035, in 0.07s
(ad infinitum)
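The learning_rate block in config.yml describes a step-shaped schedule. The sketch below shows how such a schedule can be evaluated, assuming Luminoth's piecewise_constant method follows the same convention as TensorFlow's piecewise-constant decay, where a step equal to a boundary still uses the earlier value. That convention is an assumption here, and learning_rate_at is a name invented for this illustration.

```python
from bisect import bisect_left

# Values copied from the learning_rate section of config.yml.
BOUNDARIES = [1000000, 1200000]
VALUES = [0.0003, 0.0001, 0.00001]


def learning_rate_at(step, boundaries=BOUNDARIES, values=VALUES):
    """Return the learning rate used at a given training step.

    Assumed convention: values[0] while step <= boundaries[0],
    values[1] while boundaries[0] < step <= boundaries[1], and so on.
    """
    if len(values) != len(boundaries) + 1:
        raise ValueError("need exactly one more value than boundaries")
    # bisect_left counts how many boundaries are strictly below `step`.
    return values[bisect_left(boundaries, step)]


for step in [1, 1000000, 1000001, 1200001]:
    print(step, learning_rate_at(step))
```

With these settings the rate stays at 0.0003 for the first million steps and is lowered twice after that.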

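After a few hours, the model should be producing some reasonable results (you can stop it once it goes beyond roughly one million steps). You can test it right away with the built-in web interface by running the following command: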

$ lumi server web -c config.yml
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)

Luminoth's frontend with cars detected