Managing services with systemd: a start/stop/restart lab, unit files, and hands-on troubleshooting

Managing services with systemd is something a SysAdmin does almost every day. From web servers, databases, and monitoring agents to timers that replace cron, most services on modern Linux are managed through systemd.

This article focuses on real operations: checking status, start/stop/restart, enabling at boot, reading the relevant logs, and understanding unit files well enough to run production safely.

1. Why does systemd matter?

it manages the service lifecycle
it centralizes logging through the journal
it supports dependencies, restart policies, and timers

2. Basic commands to know by heart

systemctl status nginx --no-pager
sudo systemctl start nginx
sudo systemctl stop nginx
sudo systemctl restart nginx
sudo systemctl reload nginx
sudo systemctl enable nginx
sudo systemctl disable nginx

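Quick distinctions:

restart: stop the service, then start it again
reload: re-read the configuration without a full stop, if the application supports it
enable: start automatically at boot (this does not start the service now; add --now for that)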
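The distinction above can be turned into a small decision helper. This is a minimal sketch, not the only way to do it: it reads the `CanReload` property that `systemctl show` reports for a unit and suggests `reload` when the unit declares reload support, otherwise `restart`. The function name `pick_action` is hypothetical, and `nginx` is simply the example unit used in this article.

```shell
# pick_action: map a CanReload value ("yes" or anything else) to an action.
pick_action() {
  if [ "$1" = "yes" ]; then
    echo reload
  else
    echo restart
  fi
}

# Only query systemd when systemctl is actually present on this machine.
if command -v systemctl >/dev/null 2>&1; then
  can_reload=$(systemctl show -p CanReload --value nginx 2>/dev/null || true)
  echo "suggested action for nginx: $(pick_action "$can_reload")"
fi
```

systemctl also ships a built-in that makes the same choice in one step: `sudo systemctl reload-or-restart nginx`.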

3. Reading a service's logs

journalctl -u nginx -n 100 --no-pager
journalctl -u nginx -f

Always do this before you start restarting a service over and over.
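To make the log check fast enough that it really happens before each restart, the journal output can be summarized. A minimal sketch, under stated assumptions: the helper name `count_problem_lines` is hypothetical, and the keyword list (`error`, `fail`, `denied`, `emerg`) is only a starting point that should be tuned per service.

```shell
# count_problem_lines: count stdin lines that contain a common failure keyword.
count_problem_lines() {
  # grep -c prints the count; it exits 1 when the count is 0, so mask that.
  grep -Eic 'error|fail|denied|emerg' || true
}

if command -v journalctl >/dev/null 2>&1; then
  n=$(journalctl -u nginx -n 100 --no-pager 2>/dev/null | count_problem_lines)
  echo "problem lines in the last 100 nginx log entries: $n"
fi
```

journalctl can also filter by priority on its own, for example `journalctl -u nginx -p err --since "1 hour ago" --no-pager`.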

4. Finding services that have failed

systemctl --failed

On production, this command gives a quick overview when several services have problems after a reboot or a round of updates.
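When several units fail at once, it helps to pull the last few log lines for each of them in one pass. The sketch below is one possible approach: `failed_unit_names` is a hypothetical helper that extracts the unit name from the machine-friendly listing produced by `systemctl list-units --state=failed --no-legend --plain`, which reports the same set as `systemctl --failed`.

```shell
# failed_unit_names: first whitespace-separated field of each non-empty line.
failed_unit_names() {
  awk 'NF { print $1 }'
}

if command -v systemctl >/dev/null 2>&1; then
  systemctl list-units --state=failed --no-legend --plain 2>/dev/null |
    failed_unit_names |
    while read -r unit; do
      echo "== $unit =="
      # Show only the tail of each unit's log to keep the overview short.
      journalctl -u "$unit" -n 5 --no-pager 2>/dev/null || true
    done
fi
```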

5. Understanding unit files at an operational level

systemctl cat nginx
systemctl show nginx --no-pager | head -40

Fields worth paying attention to:

ExecStart
Restart
WantedBy
EnvironmentFile, if present
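The fields listed above can be pulled out of the `systemctl cat` output without scrolling through the whole unit. A minimal sketch, assuming plain `KEY=value` lines as they appear in unit files; the helper name `unit_directive` is hypothetical.

```shell
# unit_directive: print the value of every "KEY=value" line on stdin
# whose key equals $1. Anything after the first "=" is kept intact.
unit_directive() {
  awk -F= -v key="$1" '$1 == key { sub(/^[^=]*=/, ""); print }'
}

if command -v systemctl >/dev/null 2>&1; then
  for key in ExecStart Restart WantedBy EnvironmentFile; do
    echo "$key: $(systemctl cat nginx 2>/dev/null | unit_directive "$key")"
  done
fi
```

For the effective value after all drop-ins are merged, `systemctl show -p Restart --value nginx` asks systemd directly.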

6. Creating a simple lab service

Create the script:

cat <<'EOF' | sudo tee /usr/local/bin/hello-service.sh
#!/usr/bin/env bash
while true; do
  echo "$(date) hello from lab service"
  sleep 30
done
EOF
sudo chmod +x /usr/local/bin/hello-service.sh

Create the unit file:

cat <<'EOF' | sudo tee /etc/systemd/system/hello-lab.service
[Unit]
Description=Hello Lab Service
After=network.target

[Service]
ExecStart=/usr/local/bin/hello-service.sh
Restart=always

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now hello-lab.service

Verify:

systemctl status hello-lab.service --no-pager
journalctl -u hello-lab.service -n 20 --no-pager

7. Hands-on troubleshooting

the service fails to start because the ExecStart path is wrong
the service is stuck in a restart loop because the process exits immediately
the service runs fine by hand but fails under systemd because of missing environment variables or permissions
daemon-reload was forgotten after editing the unit file
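The first cause in the list, a wrong ExecStart path, can be caught before the service is even started. The sketch below is an illustration under assumptions, not a full unit-file parser: `check_execstart` is a hypothetical helper that reads only the first `ExecStart=` line, strips the optional prefix characters systemd allows there, and checks that the target exists and is executable.

```shell
# check_execstart: $1 is the path to a unit file.
check_execstart() {
  local unit_file=$1 cmd
  # Keep the first word after "ExecStart=", dropping prefixes such as "-" or "@".
  cmd=$(sed -n 's/^ExecStart=[-@:+!]*\([^[:space:]]*\).*/\1/p' "$unit_file" | head -n 1)
  if [ -z "$cmd" ]; then
    echo "no ExecStart found"; return 1
  elif [ ! -e "$cmd" ]; then
    echo "missing: $cmd"; return 1
  elif [ ! -x "$cmd" ]; then
    echo "not executable: $cmd"; return 1
  fi
  echo "ok: $cmd"
}

# Check the lab unit from this article if it is present on this machine.
if [ -r /etc/systemd/system/hello-lab.service ]; then
  check_execstart /etc/systemd/system/hello-lab.service || true
fi
```

For a broader check of the same file, `systemd-analyze verify /etc/systemd/system/hello-lab.service` reports problems that systemd itself detects.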

8. Step-by-step lab

Install Nginx or use a service that is already present.
Inspect it with systemctl status and journalctl.
Stop the service and verify that its port or function is no longer available.
Start the service again.
Create hello-lab.service as in the example above.
Deliberately break ExecStart, then read the log to find the error yourself.

9. Official documentation

10. Production checklist

read the logs before restarting repeatedly
know whether the service supports reload
run daemon-reload after every unit file change
check that services that must come back after a reboot are enabled
keep a runbook that records which services depend on which
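The "enabled" item on the checklist lends itself to a quick automated check. A minimal sketch, assuming the two-column `UNIT STATE` layout printed by `systemctl list-unit-files --no-legend`; the helper name `not_enabled` is hypothetical, and the unit names passed to it are only examples from this article.

```shell
# not_enabled: stdin is a "UNIT STATE ..." listing; arguments are the units
# that must be enabled. Prints every required unit that is not "enabled".
not_enabled() {
  local listing unit state
  listing=$(cat)
  for unit in "$@"; do
    state=$(printf '%s\n' "$listing" | awk -v u="$unit" '$1 == u { print $2; exit }')
    [ "$state" = "enabled" ] || echo "$unit: ${state:-not found}"
  done
}

if command -v systemctl >/dev/null 2>&1; then
  systemctl list-unit-files --type=service --no-legend 2>/dev/null |
    not_enabled nginx.service hello-lab.service
fi
```

For the dependency item, `systemctl list-dependencies nginx` shows what a unit pulls in, and adding `--reverse` shows which units depend on it.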

11. Telling restart, reload, and daemon-reload apart

restart: stops the service and starts it again; this can cause an interruption.
reload: re-reads the configuration if the service supports it; usually less disruptive.
daemon-reload: tells systemd to re-read unit files after you edit a .service file.

Many people edit a unit file and restart the service but forget daemon-reload, so systemd keeps using the old configuration.
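systemd itself can tell you when this has happened: the `NeedDaemonReload` property of a unit reads `yes` when the unit file on disk is newer than what systemd has loaded. The sketch below wraps that lookup in a hint; the function name `reload_hint` and its messages are hypothetical.

```shell
# reload_hint: translate a NeedDaemonReload value into a short message.
reload_hint() {
  case "$1" in
    yes) echo "unit file changed on disk: run 'sudo systemctl daemon-reload'" ;;
    no)  echo "systemd has the current unit file loaded" ;;
    *)   echo "could not query systemd" ;;
  esac
}

if command -v systemctl >/dev/null 2>&1; then
  need=$(systemctl show -p NeedDaemonReload --value nginx 2>/dev/null || true)
  reload_hint "$need"
fi
```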

12. Overriding a unit safely with systemctl edit

sudo systemctl edit nginx

This creates an override file at /etc/systemd/system/nginx.service.d/override.conf instead of editing the original unit directly. It is a good habit because the change is easy to audit and survives package updates.
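To show what such a drop-in looks like, the sketch below writes one into a scratch directory so that nothing on the system changes. The helper name `write_override` is hypothetical, and the `Restart=on-failure` and `RestartSec=5` values are illustrative assumptions, not recommendations from this article.

```shell
# write_override: $1 is a drop-in directory,
# e.g. /etc/systemd/system/nginx.service.d on a real host.
write_override() {
  mkdir -p "$1" &&
  cat > "$1/override.conf" <<'EOF'
[Service]
Restart=on-failure
RestartSec=5
EOF
}

# Demonstrate against a temporary directory instead of /etc.
scratch=$(mktemp -d)
write_override "$scratch/nginx.service.d"
cat "$scratch/nginx.service.d/override.conf"
```

On a real host the file would be written with root privileges, followed by `sudo systemctl daemon-reload`; `systemctl cat nginx` then shows the original unit with the drop-in appended. When you use `systemctl edit`, the reload happens automatically after the editor closes.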

13. Short runbook for a service that will not start

systemctl status service-name --no-pager
journalctl -u service-name -n 100 --no-pager
check the relevant configuration files
check whether the port is already taken, using ss -tulpn
check file permissions, the user the service runs as, and dependencies
restart only after the fix, then verify again
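The port check in the runbook is easier to read when the `ss -tulpn` output is narrowed to a single port. A minimal sketch, assuming the usual column layout in which the fifth column is `Local Address:Port`; the helper name `port_listeners` is hypothetical.

```shell
# port_listeners: stdin is the output of `ss -tulpn`; $1 is a port number.
# Prints the lines whose local port equals $1 (works for IPv4 and IPv6).
port_listeners() {
  awk -v port="$1" '{ n = split($5, parts, ":"); if (parts[n] == port) print }'
}

if command -v ss >/dev/null 2>&1; then
  ss -tulpn 2>/dev/null | port_listeners 80
fi
```

Without root privileges, ss shows the owning process only for sockets that belong to the current user, so run it with sudo when the process column matters.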

14. Practical lab: break a service on purpose, then fix it

If you use Nginx, introduce a small error into the lab configuration, then run:

sudo nginx -t
sudo systemctl restart nginx
journalctl -u nginx -n 50 --no-pager

The goal is not to break the service but to build the reflex of reading configuration errors instead of restarting on instinct.
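Once the failure has been seen in the lab, the same `nginx -t` check can be used as a gate so that a broken configuration never reaches a restart. The sketch below is one way to express that pattern; `restart_if_valid` is a hypothetical helper, and the validation command is passed in as a string so that the function works for services other than Nginx.

```shell
# restart_if_valid: $1 is a validation command (run via sh -c);
# the remaining arguments are the command to run only if validation passes.
restart_if_valid() {
  local check=$1
  shift
  if sh -c "$check"; then
    "$@"
  else
    echo "validation failed, service left untouched" >&2
    return 1
  fi
}

# On a real host (commented out so the sketch has no side effects):
# restart_if_valid 'sudo nginx -t' sudo systemctl restart nginx
```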

Conclusion: a solid grasp of systemd service management lets you respond methodically when a production service fails, rather than acting on gut feeling.