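Last year I ran into a problem: HTTPD (the Apache web server) would not start on a reboot or cold boot. To work around it, I added an override file, /etc/systemd/system/httpd.service.d/override.conf. It contains the following statements, which delay the start of HTTPD until the network is properly up and online. (If you have read my previous articles, you know that I use NetworkManager and systemd rather than the old SystemV network service and start scripts.)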
# Trying to delay the startup of httpd so that the network is
# fully up and running so that httpd can bind to the correct
# IP address
#
# By David Both, 2020-04-16
[Unit]
After=network-online.target
Wants=network-online.target
This workaround held until recently, when I had to start not only HTTPD by hand but DHCPD as well. Waiting for network-online.target had stopped working, for reasons I did not know.
The cause and my fix
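After more internet searching and some digging around in my /etc directory, I think I found the real culprit: an ancient relic of the SystemV and init days in the /etc/init.d directory. A copy of the old network start script was there, and it should not have been. I believe it was left over from the time I spent using the old network program before I switched to NetworkManager.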
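Apparently, systemd did what it is supposed to do. It generated a unit file on the fly from the SystemV start script and tried to start the network with both the SystemV start script and the systemd unit it had created. As a result, systemd tried to start HTTPD and DHCPD before the network was ready, and those services timed out and failed to start.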
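I removed the /etc/init.d/network script from my server, and it now reboots without my having to start the HTTPD and DHCPD services by hand. This is a better solution because it addresses the root cause rather than papering over it.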
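But it is still not the best solution. That file is owned by the network-scripts package and will be replaced if the package is updated. So I also removed the package from my server, which ensures this cannot happen again. Can you guess how I found that out?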
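After I upgraded to Fedora 34, DHCPD and HTTPD again failed to start. After some more experimentation, I found that the override.conf file needed a couple of additional lines. The two new lines force the two services to wait 60 seconds before starting. That seems to have solved the problem again, at least for now.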
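The revised override.conf file now looks like the following. It not only sleeps for 60 seconds before starting the service, it also specifies that the service should not start until after network-online.target has started. The latter appears to be broken, but I figured I might as well do both, since one or the other usually seems to work.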
# Delay the startup of any network service so that the
# network is fully up and running so that httpd can bind to the correct
# IP address.
#
# By David Both, 2020-04-28
#
################################################################################
# #
# Copyright (C) 2021 David Both #
# LinuxGeek46@both.org #
# #
# This program is free software; you can redistribute it and/or modify #
# it under the terms of the GNU General Public License as published by #
# the Free Software Foundation; either version 2 of the License, or #
# (at your option) any later version. #
# #
# This program is distributed in the hope that it will be useful, #
# but WITHOUT ANY WARRANTY; without even the implied warranty of #
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the #
# GNU General Public License for more details. #
# #
# You should have received a copy of the GNU General Public License #
# along with this program; if not, write to the Free Software #
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA #
# #
################################################################################
[Service]
ExecStartPre=/bin/sleep 60
[Unit]
After=network-online.target
Wants=network-online.target
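Since waiting for network-online.target seemed unreliable, it may be worth checking what systemd actually applied to the service. The read-only Ansible play below is a sketch under assumptions, not something from my own playbook: the play name, the svc variable, and the registered variable names are invented for illustration. It also checks NetworkManager-wait-online.service, on the assumption that this is the unit that holds back network-online.target on a NetworkManager system; if it is disabled, the After= and Wants= lines may have little effect.

```yaml
---
- name: Inspect ordering for a service that uses the override (read-only sketch)
  hosts: server
  gather_facts: false
  vars:
    # Hypothetical choice; any service that has the drop-in file will do.
    svc: httpd.service
  tasks:
    - name: Show what the service is ordered after and what it wants
      ansible.builtin.command: systemctl show {{ svc }} --property=After --property=Wants
      register: ordering
      changed_when: false

    - name: Check whether NetworkManager-wait-online.service is enabled
      ansible.builtin.command: systemctl is-enabled NetworkManager-wait-online.service
      register: wait_online
      changed_when: false
      # is-enabled exits non-zero for a disabled unit; that is not an error here.
      failed_when: false

    - name: Report the findings
      ansible.builtin.debug:
        msg:
          - "{{ ordering.stdout }}"
          - "NetworkManager-wait-online.service is {{ wait_online.stdout }}"
```

If network-online.target does not show up in the After= line, the drop-in was probably not picked up, and a systemctl daemon-reload or a reboot would be the first thing to try.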
Making it easier with Ansible
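This kind of problem is a good fit for Ansible, so I created a relatively simple playbook. It has two plays. The first play removes the network-scripts package and then removes the /etc/init.d/network script, because if the script is present but the package is not, removing the package will not delete the script. At least one of my systems was in that state. I run this play against all hosts, workstations and servers alike.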
The second play runs only against the server and installs the override.conf file.
################################################################################
# fix-network #
# #
# This Ansible playbook removes the network-scripts package and the #
# /etc/rc.d/init.d/network SystemV start script (also reachable as #
# /etc/init.d/network), which conflicts with NetworkManager and causes some #
# network services such as DHCPD and HTTPD to fail to start. #
# #
# This playbook also installs override files for httpd and dhcpd which cause #
# them to wait 60 seconds before starting. #
# #
# All of these things taken together seem to resolve or circumvent the issues #
# that seem to stem from multiple causes. #
# #
# NOTE: The override file is service neutral and can be used with any service. #
# I have found that using the systemctl edit command does not work as #
# it is supposed to according to the documentation. #
# #
# #
# From the network-scripts package info: #
# #
# : This package contains the legacy scripts for activating & deactivating of most
# : network interfaces. It also provides a legacy version of 'network' service.
# :
# : The 'network' service is enabled by default after installation of this package,
# : and if the network-scripts are installed alongside NetworkManager, then the
# : ifup/ifdown commands from network-scripts take precedence over the ones provided
# : by NetworkManager.
# :
# : If user has both network-scripts & NetworkManager installed, and wishes to
# : use ifup/ifdown from NetworkManager primarily, then they has to run command:
# : $ update-alternatives --config ifup
# :
# : Please note that running the command above will also disable the 'network'
# : service.
# #
# #
#------------------------------------------------------------------------------#
# #
# Change History #
# 2021/04/26 David Both V01.00 New code. #
# 2021/04/28 David Both V01.10 Revised to also remove network-scripts package. #
# Also install an override file to do a 60 second #
# timeout before the services start. #
# #
################################################################################
---
################################################################################
# Play 1: Remove the network-scripts package and the /etc/init.d/network file
################################################################################
- name: Play 1 - Remove the network-scripts legacy package on all hosts
  hosts: all
  tasks:
    - name: Remove the network-scripts package if it exists
      ansible.builtin.dnf:
        name: network-scripts
        state: absent

    - name: Remove /etc/init.d/network file if it exists but the network-scripts package is not installed
      ansible.builtin.file:
        path: /etc/init.d/network
        state: absent

################################################################################
# Play 2: Install the override file for the server services
################################################################################
- name: Play 2 - Install override files for the server services
  hosts: server
  tasks:
    - name: Install the override file for DHCPD
      ansible.builtin.copy:
        src: /root/ansible/BasicTools/files/override.conf
        # The trailing slash makes copy treat dest as a directory.
        dest: /etc/systemd/system/dhcpd.service.d/
        mode: '0644'
        owner: root
        group: root

    - name: Install the override file for HTTPD
      ansible.builtin.copy:
        src: /root/ansible/BasicTools/files/override.conf
        # The trailing slash makes copy treat dest as a directory.
        dest: /etc/systemd/system/httpd.service.d/
        mode: '0644'
        owner: root
        group: root
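This Ansible play removed those old files from two other hosts on my network and from one host I support on another network. None of the hosts that still had the SystemV network script and the network-scripts package had been reinstalled from scratch in several years; they had all been upgraded using dnf-upgrade. I never circumvented NetworkManager on my newer hosts, so they did not have this problem.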
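The playbook also installed the override file for both services. Note that the override file makes no reference to the service whose configuration it overrides. It can therefore be used for any service that fails to start because its startup attempt does not give the NetworkManager service time to finish starting.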
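Because the override file is service neutral, the two copy tasks could be folded into a loop over a list of services. The following is a sketch of that idea, not the playbook I actually run: the delayed_services variable and the Reload systemd handler are names I am assuming here, and the list would need to match the services on your own server.

```yaml
---
- name: Install the shared override file for several services (sketch)
  hosts: server
  vars:
    # Hypothetical list; add any service that starts before the network is ready.
    delayed_services:
      - dhcpd
      - httpd
  tasks:
    - name: Make sure each drop-in directory exists
      ansible.builtin.file:
        path: "/etc/systemd/system/{{ item }}.service.d"
        state: directory
        owner: root
        group: root
        mode: '0755'
      loop: "{{ delayed_services }}"

    - name: Copy the service-neutral override file into each directory
      ansible.builtin.copy:
        src: /root/ansible/BasicTools/files/override.conf
        dest: "/etc/systemd/system/{{ item }}.service.d/override.conf"
        owner: root
        group: root
        mode: '0644'
      loop: "{{ delayed_services }}"
      notify: Reload systemd

  handlers:
    # Runs once at the end of the play, and only if a file actually changed.
    - name: Reload systemd
      ansible.builtin.systemd:
        daemon_reload: true
```

Creating the directory in its own task keeps the copy task from depending on the drop-in directory already being there, and the handler lets systemd pick up the new file without waiting for the next reboot.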
Final thoughts
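Although this problem is related to systemd startup, I cannot blame systemd for it. It was at least partly self-inflicted, because I circumvented systemd. At the time I thought I was making things easier for myself, but I spent more time hunting down the problems caused by my avoidance of NetworkManager than I saved, since I had to learn it anyway. In reality, though, this problem has several possible causes, all of which the Ansible playbook addresses.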