SAM3 Notes 4: Containerized Deployment of ComfyUI + SAM3

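Abstract:
Meta's SAM3 (Segment Anything Model 3) brings strong image segmentation and video tracking capabilities. This post walks through deploying ComfyUI-SAM3 in Docker, covering common pitfalls such as missing dependencies, compiling the CUDA acceleration extension, and model path configuration. It also provides ready-to-use Docker configuration files and a test workflow.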


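Meta's recently released SAM3 performs well at image segmentation and video object tracking. The ComfyUI community quickly followed up with the PozzettiAndrea/ComfyUI-SAM3 plugin, but deploying it in Docker ran us into a series of dependency and environment problems.

This post shares a Docker deployment setup that we have verified to work, including VRAM tuning, compiling the CUDA acceleration extension, and fixes for common errors.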

1. Core configuration files

We use pytorch/pytorch:2.5.1-cuda12.4-cudnn9-devel as the base image so that SAM3's CUDA acceleration extension can be compiled.

Dockerfile

This Dockerfile adds what the stock image lacks: GitPython, the uv package manager, and libraries that SAM3 needs at runtime, such as ftfy.

# ------------------------------------------------------------------------------
# Stage 1: base environment
# Use the devel image (includes nvcc) so Flash Attention and the SAM3 speedup extension can be compiled
# ------------------------------------------------------------------------------
FROM pytorch/pytorch:2.5.1-cuda12.4-cudnn9-devel AS base

# Environment variables
ENV DEBIAN_FRONTEND=noninteractive \
    PYTHONUNBUFFERED=1 \
    PIP_NO_CACHE_DIR=1 \
    HF_HOME="/root/cache/huggingface" \
    TORCH_HOME="/root/cache/torch" \
    # [Key] Force SAM3 initialization so that a false-positive pytest-environment check does not stop the nodes from loading
    SAM3_FORCE_INIT=1

# ------------------------------------------------------------------------------
# Stage 2: system dependencies
# ------------------------------------------------------------------------------
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential git wget curl nano zip unzip \
    ninja-build \
    libgl1-mesa-glx libglib2.0-0 libsm6 libxext6 libxrender-dev ffmpeg \
    libpng-dev libjpeg-dev \
    && apt-get clean && rm -rf /var/lib/apt/lists/*

# Upgrade pip
RUN pip install --upgrade pip setuptools wheel

# ------------------------------------------------------------------------------
# Stage 3: Python dependencies
# ------------------------------------------------------------------------------
WORKDIR /opt

# 1. Fetch the ComfyUI requirements list
RUN wget https://raw.githubusercontent.com/comfyanonymous/ComfyUI/master/requirements.txt -O comfyui_requirements.txt

# 2. Remove torch-related lines so the base image's versions are not overwritten.
#    The pattern matches every line containing "torch", so torchvision, torchaudio,
#    and torchsde are removed as well (torchsde is reinstalled in step 4).
RUN sed -i '/torch/d' comfyui_requirements.txt

# 3. Install the ComfyUI dependencies
RUN pip install -r comfyui_requirements.txt

# 4. Reinstall the core libraries, plus GitPython and uv, which ComfyUI-Manager requires
RUN pip install torchsde einops transformers safetensors GitPython uv

# 5. Install Flash Attention (slow to build)
ENV MAX_JOBS=4
RUN pip install flash-attn --no-build-isolation

# 6. Preinstall SAM 3 and video-processing dependencies
# [Important] Includes ftfy and regex to fix text-prompt errors
RUN pip install opencv-python pycocotools matplotlib onnxruntime-gpu scipy timm huggingface_hub \
    hydra-core iopath moviepy av ftfy regex

# ------------------------------------------------------------------------------
# Stage 4: finishing up and startup configuration
# ------------------------------------------------------------------------------
WORKDIR /opt/ComfyUI
EXPOSE 8188

COPY scripts/entrypoint.sh /usr/local/bin/entrypoint.sh
RUN chmod +x /usr/local/bin/entrypoint.sh

ENTRYPOINT ["/usr/local/bin/entrypoint.sh"]
CMD ["python", "main.py", "--listen", "0.0.0.0", "--port", "8188"]
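
After the image is built, it can be worth confirming that the packages installed by the Dockerfile can actually be found by Python inside the container. The following is a minimal sketch using only the standard library. The names EXPECTED_MODULES and missing_modules are hypothetical, and the list covers only a hand-picked subset of the packages above. Note that import names can differ from pip names (opencv-python imports as cv2, GitPython as git).

```python
import importlib.util

# Import names for a subset of the pip packages installed in the Dockerfile.
# The import name is not always the same as the pip package name.
EXPECTED_MODULES = [
    "torch",        # provided by the base image
    "torchsde",
    "einops",
    "transformers",
    "safetensors",
    "git",          # pip: GitPython
    "flash_attn",   # pip: flash-attn
    "cv2",          # pip: opencv-python
    "hydra",        # pip: hydra-core
    "av",
    "ftfy",
    "regex",
]


def missing_modules(names):
    """Return the names from `names` that the import system cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(EXPECTED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All expected modules were found.")
```

One way to run it is to save the sketch on the host and pipe it into the container, for example `docker exec -i comfyui-sam3 python - < check_deps.py`, where check_deps.py is whatever file name the sketch was saved under.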

docker-compose.yml

SAM3 is very memory-hungry when processing video, so shm_size must be set.

version: '3.8'

services:
  comfyui-sam3:
    build: .
    container_name: comfyui-sam3
    restart: unless-stopped
    ports:
      - "8188:8188"
    
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=compute,utility
      # Enable high-VRAM mode
      - CLI_ARGS=--highvram --preview-method auto
      - HF_HOME=/root/ai-dock/cache/huggingface
      - SAM3_FORCE_INIT=1
    
    # [Key] Increase shared memory to prevent crashes during video processing
    shm_size: 16gb
    
    volumes:
      - ./storage/comfyui_core:/opt/ComfyUI
      - ./storage/input:/opt/ComfyUI/input
      - ./storage/output:/opt/ComfyUI/output
      # Mount the model library
      - ./storage/models:/opt/ComfyUI/models
      # Mount the custom nodes (plugin) directory
      - ./storage/custom_nodes:/opt/ComfyUI/custom_nodes
      # Persist caches
      - ./storage/cache:/root/ai-dock/cache

    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
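
To confirm that the shm_size setting took effect, the size of /dev/shm can be read from inside the running container. A minimal sketch with the standard library follows. The function name total_gib is hypothetical, and the reported value should be compared by hand against the 16gb configured above.

```python
import os
import shutil


def total_gib(path="/dev/shm"):
    """Return the total size of the filesystem containing `path`, in GiB."""
    return shutil.disk_usage(path).total / 1024**3


if __name__ == "__main__":
    # /dev/shm is the shared-memory mount that shm_size controls on Linux.
    if os.path.isdir("/dev/shm"):
        print(f"/dev/shm total: {total_gib():.1f} GiB")
    else:
        print("/dev/shm not found on this system")
```

If the reported size is far below the configured value, the container is probably not running with the compose settings shown above.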

entrypoint.sh (startup script)

This script pulls the plugins automatically and attempts to compile the CUDA acceleration extension.

#!/bin/bash
set -e

COMFY_DIR="/opt/ComfyUI"
CUSTOM_NODES_DIR="${COMFY_DIR}/custom_nodes"

echo "Checking ComfyUI installation..."

# 1. Install or restore ComfyUI itself
if [ ! -f "${COMFY_DIR}/main.py" ]; then
    echo "ComfyUI main.py not found. Installing..."
    git clone https://github.com/comfyanonymous/ComfyUI.git /tmp/comfyui_temp
    # Copy "/." rather than "/*" so hidden entries such as .git are included
    cp -rn /tmp/comfyui_temp/. "${COMFY_DIR}/"
    rm -rf /tmp/comfyui_temp
fi

# 2. Install ComfyUI-Manager
if [ ! -d "${CUSTOM_NODES_DIR}/ComfyUI-Manager" ]; then
    git clone https://github.com/ltdrdata/ComfyUI-Manager.git "${CUSTOM_NODES_DIR}/ComfyUI-Manager"
fi

# 3. Install PozzettiAndrea/ComfyUI-SAM3
SAM3_NODE_DIR="${CUSTOM_NODES_DIR}/ComfyUI-SAM3"

if [ ! -d "${SAM3_NODE_DIR}" ]; then
    echo "Installing PozzettiAndrea/ComfyUI-SAM3..."
    git clone https://github.com/PozzettiAndrea/ComfyUI-SAM3.git "${SAM3_NODE_DIR}"
    
    cd "${SAM3_NODE_DIR}"
    echo "Running install.py..."
    python install.py
    
    # Try to compile the GPU acceleration (speeds up video tracking 5-10x)
    echo "Running speedup.py for GPU acceleration..."
    python speedup.py || echo "Warning: GPU speedup compilation failed. Falling back to standard mode."

    cd "${COMFY_DIR}"
fi

# Make sure GitPython and uv are present
pip install GitPython uv > /dev/null 2>&1 || true

echo "Starting ComfyUI..."
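# Append any extra flags from CLI_ARGS (set in docker-compose.yml).
# Left unquoted on purpose so the value splits into separate arguments.
exec "$@" ${CLI_ARGS:-}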
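
For reference, a string of flags such as the CLI_ARGS value in docker-compose.yml has to be split into separate argv tokens before it can be appended to the launch command. The following Python sketch illustrates that splitting. The helper name build_command is hypothetical and is not part of ComfyUI or the plugin. shlex.split follows POSIX shell quoting rules, which for a simple space-separated value like this one gives the same tokens as shell word splitting.

```python
import shlex


def build_command(base_cmd, cli_args=""):
    """Return base_cmd followed by the flags in cli_args, split shell-style."""
    return list(base_cmd) + shlex.split(cli_args)


if __name__ == "__main__":
    # Base command taken from the Dockerfile CMD; flags from docker-compose.yml.
    base = ["python", "main.py", "--listen", "0.0.0.0", "--port", "8188"]
    print(build_command(base, "--highvram --preview-method auto"))
```

Each flag and each flag value ends up as its own list element, which is what the Python argument parser in main.py expects to receive.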

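2. Preparing the model (key step)

The SAM3 model is large (about 3.2 GB). To avoid a failed download or a long wait at startup, we recommend downloading it manually and placing it in the expected directory.

Steps:

  1. Download the model file sam3.pt.
  2. Place it at the following path on the host:
    ./storage/models/sam3/sam3.pt

Inside the container this corresponds to /opt/ComfyUI/models/sam3/sam3.pt. If the directory does not exist, create it manually first.

Note: if you already have the model file locally, just copy it in. If you skip this step, the node downloads the model automatically from HuggingFace the first time it runs.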
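
Before starting the container, a quick check that the file is in place and not obviously truncated can save a debugging round. A minimal sketch follows. The function name check_model and the 1 GB threshold are assumptions chosen for illustration; the threshold is simply set well below the roughly 3.2 GB mentioned above.

```python
from pathlib import Path


def check_model(path, min_bytes=1_000_000_000):
    """Classify a model file as 'missing', 'too_small', or 'ok'.

    min_bytes is an arbitrary sanity threshold (an assumption), meant only
    to catch files left incomplete by an interrupted download.
    """
    p = Path(path)
    if not p.is_file():
        return "missing"
    if p.stat().st_size < min_bytes:
        return "too_small"
    return "ok"


if __name__ == "__main__":
    # Host-side path from the steps above, relative to the compose project.
    print(check_model("./storage/models/sam3/sam3.pt"))
```

A result of "missing" means the node will fall back to its own download on first run, as described in the note above.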

3. Running the demo workflow

Start the container:

docker-compose up --build -d

Once startup completes, open http://localhost:8188 in a browser. You can drag the JSON file below straight into the ComfyUI interface to load the workflow.
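
The first start can take a while, since the entrypoint clones repositories and may compile the speedup extension. A small polling helper can tell when the port begins to answer. This is a minimal sketch using only the standard library. The function name wait_for_http and its default timings are assumptions, and it treats any HTTP 200 response as "up".

```python
import http.client
import time
import urllib.request


def wait_for_http(url, timeout=120.0, interval=2.0):
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds pass.

    Returns True if the server responded in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Covers connection refused, timeouts, and HTTP error responses:
            # the server is not ready yet, so keep polling.
            pass
        time.sleep(interval)
    return False


# Example: wait_for_http("http://localhost:8188", timeout=300)
```

The example call in the final comment would block until the port published in docker-compose.yml responds or five minutes pass.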

Notes on the JSON:

  • This is a basic "text-prompt segmentation" workflow.
  • The LoadSAM3Model node is configured to read sam3.pt.
  • The SAM3Grounding node's default prompt is "person".

{
  "id": "9e5b67d0-53dc-42aa-bdb7-541f1939e114",
  "revision": 0,
  "last_node_id": 14,
  "last_link_id": 24,
  "nodes": [
    {
      "id": 1,
      "type": "LoadImage",
      "pos": [
        626.0189034779078,
        251.07377362433684
      ],
      "size": [
        274.080078125,
        314
      ],
      "flags": {},
      "order": 0,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            22
          ]
        },
        {
          "name": "MASK",
          "type": "MASK",
          "links": null
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.72",
        "Node name for S&R": "LoadImage"
      },
      "widgets_values": [
        "example_image.jpg",
        "image"
      ]
    },
    {
      "id": 4,
      "type": "MaskPreview",
      "pos": [
        1589.7307778177144,
        96.54681627376111
      ],
      "size": [
        210,
        258
      ],
      "flags": {},
      "order": 3,
      "mode": 0,
      "inputs": [
        {
          "name": "mask",
          "type": "MASK",
          "link": 24
        }
      ],
      "outputs": [],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.72",
        "Node name for S&R": "MaskPreview"
      },
      "widgets_values": []
    },
    {
      "id": 3,
      "type": "PreviewImage",
      "pos": [
        1233.971252285426,
        223.48707505899935
      ],
      "size": [
        312.1604553633704,
        258
      ],
      "flags": {},
      "order": 4,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 23
        }
      ],
      "outputs": [],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.72",
        "Node name for S&R": "PreviewImage"
      },
      "widgets_values": []
    },
    {
      "id": 13,
      "type": "SAM3Grounding",
      "pos": [
        921.0206799155256,
        127.5149432108375
      ],
      "size": [
        278.08203125,
        190
      ],
      "flags": {},
      "order": 2,
      "mode": 0,
      "inputs": [
        {
          "name": "sam3_model",
          "type": "SAM3_MODEL",
          "link": 21
        },
        {
          "name": "image",
          "type": "IMAGE",
          "link": 22
        },
        {
          "name": "positive_boxes",
          "shape": 7,
          "type": "SAM3_BOXES_PROMPT",
          "link": null
        },
        {
          "name": "negative_boxes",
          "shape": 7,
          "type": "SAM3_BOXES_PROMPT",
          "link": null
        }
      ],
      "outputs": [
        {
          "name": "masks",
          "type": "MASK",
          "links": [
            24
          ]
        },
        {
          "name": "visualization",
          "type": "IMAGE",
          "links": [
            23
          ]
        },
        {
          "name": "boxes",
          "type": "STRING",
          "links": null
        },
        {
          "name": "scores",
          "type": "STRING",
          "links": null
        }
      ],
      "properties": {
        "cnr_id": "comfyui-sam3",
        "ver": "82c8e4f88a6c0a9242712b827e05f5e67c4a90a7",
        "Node name for S&R": "SAM3Grounding"
      },
      "widgets_values": [
        0.2,
        "person",
        -1,
        false
      ]
    },
    {
      "id": 12,
      "type": "LoadSAM3Model",
      "pos": [
        619.424717299163,
        120.03196674570798
      ],
      "size": [
        270,
        82
      ],
      "flags": {},
      "order": 1,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "sam3_model",
          "type": "SAM3_MODEL",
          "links": [
            21
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfyui-sam3",
        "ver": "82c8e4f88a6c0a9242712b827e05f5e67c4a90a7",
        "Node name for S&R": "LoadSAM3Model"
      },
      "widgets_values": [
        "models/sam3/sam3.pt",
        ""
      ]
    }
  ],
  "links": [
    [
      21,
      12,
      0,
      13,
      0,
      "SAM3_MODEL"
    ],
    [
      22,
      1,
      0,
      13,
      1,
      "IMAGE"
    ],
    [
      23,
      13,
      1,
      3,
      0,
      "IMAGE"
    ],
    [
      24,
      13,
      0,
      4,
      0,
      "MASK"
    ]
  ],
  "groups": [],
  "config": {},
  "extra": {
    "ds": {
      "scale": 1.1167815779424768,
      "offset": [
        -353.95100285588137,
        87.4757595301123
      ]
    },
    "frontendVersion": "1.32.9",
    "VHS_latentpreview": false,
    "VHS_latentpreviewrate": 0,
    "VHS_MetadataImage": true,
    "VHS_KeepIntermediate": true,
    "workflowRendererVersion": "LG"
  },
  "version": 0.4
}
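
When editing a workflow file like this by hand, it is easy to leave a link pointing at a node that no longer exists. The following is a minimal consistency check. The function name dangling_links is hypothetical, and the link layout [link_id, from_node, from_slot, to_node, to_slot, type] is inferred from the JSON above rather than taken from a ComfyUI specification.

```python
import json
import sys


def dangling_links(workflow):
    """Return ids of links whose source or target node is not in workflow['nodes'].

    Each entry of workflow['links'] is read as
    [link_id, from_node, from_slot, to_node, to_slot, type].
    """
    node_ids = {node["id"] for node in workflow.get("nodes", [])}
    bad = []
    for link in workflow.get("links", []):
        link_id, from_node, _from_slot, to_node, _to_slot, _type = link
        if from_node not in node_ids or to_node not in node_ids:
            bad.append(link_id)
    return bad


if __name__ == "__main__":
    # Usage: python check_workflow.py workflow.json
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8") as fh:
            print("Dangling links:", dangling_links(json.load(fh)))
```

For the workflow above, every link (21 to 24) connects nodes that are present in the "nodes" array, so the check should report an empty list.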