Generative AI Short-Video Auto-Generation Challenge: Installing Stable Diffusion WebUI Forge and Using Its API

In the previous installments, working toward automatic generation of short videos with generative AI, I set up WAN2.2 and ComfyUI for I2V (image-to-video).

This time, I prepare Stable Diffusion to generate the source images for I2V.

Installing Stable Diffusion WebUI Forge
Basics of the Stable Diffusion APIs
Caveats of vibe coding
Summary

Installing Stable Diffusion WebUI Forge

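Stable Diffusion has several derivative UIs; this time I use WebUI Forge.
It is a somewhat older package, but that means plenty of information is available, which makes it a good fit for vibe coding against its API.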

I use Stability Matrix for the installation. Image-generation AI evolves quickly, and when I eventually want to try a different UI, Stability Matrix makes switching easy.

Its model browser is also more comfortable than checking CivitAI and Hugging Face separately, and downloaded models can be shared across multiple packages.

From Stability Matrix, WebUI Forge can be installed in a few clicks.
If your environment calls for a manual installation, the steps below may also help.

Models can also be used through ComfyUI, but when API access is the premise, as it is here, a WebUI-style package that lets you experiment interactively is easier to work with.

Basics of the Stable Diffusion APIs

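The WebUI-style Stable Diffusion packages are designed on the assumption that they will be called through an API.
Because the UI is layered on top of the API, anyone familiar with the WebUI should find the API's parameter layout intuitive.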
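
As a rough illustration of that API-first layout, the sketch below lists what a running instance offers before any image is generated. It assumes Forge keeps the AUTOMATIC1111-compatible read-only endpoints `/sdapi/v1/samplers` and `/sdapi/v1/sd-models` and that the API is enabled; the helper names `api_url`, `fetch_json`, and `pick` are made up for this example.

```python
import json
import urllib.request
from typing import Any, Dict, List

BASE_URL = "http://127.0.0.1:7860"  # assumed local Forge instance


def api_url(base_url: str, path: str) -> str:
    """Join a base URL and an API path without doubling slashes."""
    return f"{base_url.rstrip('/')}/{path.lstrip('/')}"


def fetch_json(url: str, timeout_sec: int = 30) -> Any:
    """GET a URL and decode the JSON body (needs a running server)."""
    with urllib.request.urlopen(url, timeout=timeout_sec) as resp:
        return json.loads(resp.read().decode("utf-8"))


def pick(items: List[Dict[str, Any]], key: str) -> List[str]:
    """Collect one field from each entry, skipping entries that lack it."""
    return [str(item[key]) for item in items if key in item]
```

With the server up, `pick(fetch_json(api_url(BASE_URL, "/sdapi/v1/samplers")), "name")` should return the sampler labels shown in the UI, and the same call against `/sdapi/v1/sd-models` with the key `"title"` should return checkpoint titles. Those key names follow the AUTOMATIC1111 response format and are worth confirming against your own instance.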

The positive prompt, negative prompt, and sampling settings are the items you will adjust most often.
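
Because these are the values that get adjusted most, it can help to build several request bodies that differ only in their sampling settings and compare the results. The sketch below is one way to do that. `sampling_sweep` is a hypothetical helper, the fixed seed value is arbitrary, and the code only builds payloads without sending them; the keys `steps`, `cfg_scale`, and `seed` are the txt2img request fields.

```python
import copy
import itertools
from typing import Any, Dict, Iterable, List


def sampling_sweep(
    base_payload: Dict[str, Any],
    steps_values: Iterable[int],
    cfg_values: Iterable[float],
    seed: int = 12345,
) -> List[Dict[str, Any]]:
    """Return one payload per (steps, cfg_scale) pair.

    The seed is fixed so that differences between images come from the
    sampling settings rather than from random noise.
    """
    payloads: List[Dict[str, Any]] = []
    for steps, cfg in itertools.product(steps_values, cfg_values):
        payload = copy.deepcopy(base_payload)  # leave the caller's dict untouched
        payload.update({"steps": steps, "cfg_scale": cfg, "seed": seed})
        payloads.append(payload)
    return payloads


base = {
    "prompt": "a cinematic portrait photo, 35mm, bokeh",
    "negative_prompt": "lowres, blurry",
}
variants = sampling_sweep(base, steps_values=[20, 28], cfg_values=[5.0, 7.0])
for v in variants:
    print(v["steps"], v["cfg_scale"], v["seed"])
```

Each entry in `variants` can then be posted to the txt2img endpoint in turn, which keeps the prompts constant while only the sampling values change.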

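To start, I set up the prompts and sampling settings and get to the point where images can be generated and downloaded. I left the code generation to ChatGPT.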

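Stable Diffusion WebUI Forge is running at http://127.0.0.1:7860/.
Generate code that uses its API to generate an image, specifying the positive prompt, negative prompt, sampling settings, and so on.
Define the prompts and setting values as variables inside the program so they can be passed in.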

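The generated code is shown below. I edited a few details such as the IP address, but the code is short, so it is reproduced in full. Note, however, that some parts need care before use, such as the sections underlined in red that are discussed later.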

"""
Sample that sends a txt2img request to Stable Diffusion WebUI Forge (AUTOMATIC1111-compatible API) to generate images.

Prerequisites:
- Forge is running at http://127.0.0.1:7860
- The WebUI API is enabled (normally /sdapi/v1/txt2img is available)

Output:
- PNG files saved under ./outputs/
"""

from __future__ import annotations

import base64
import json
import os
import time
from dataclasses import dataclass, asdict
from typing import Any, Dict, Optional, List

import requests


# =========================
# Settings (passed in as variables here)
# =========================

BASE_URL = "http://127.0.0.1:7860"  # no trailing slash needed
OUT_DIR = "outputs"


@dataclass
class Txt2ImgParams:
    # Prompts
    prompt: str
    negative_prompt: str = ""

    # Basic generation parameters
    width: int = 768
    height: int = 1024
    steps: int = 28
    cfg_scale: float = 6.0
    sampler_name: str = "DPM++ 2M Karras"  # match the label used by Forge/WebUI
    seed: int = -1  # -1 means random
    batch_size: int = 1
    n_iter: int = 1

    # Extras (as needed)
    restore_faces: bool = False
    tiling: bool = False
    enable_hr: bool = False
    hr_scale: float = 2.0
    hr_upscaler: str = "Latent"
    hr_second_pass_steps: int = 0
    denoising_strength: float = 0.55

    # Model/settings overrides (if needed)
    # e.g. override_settings={"sd_model_checkpoint":"xxx.safetensors", "CLIP_stop_at_last_layers":2}
    override_settings: Optional[Dict[str, Any]] = None

    # Add alwayson_scripts when using extensions such as ControlNet
    alwayson_scripts: Optional[Dict[str, Any]] = None

    # Whether to return generated images as base64 (normally True)
    send_images: bool = True
    save_images: bool = False  # set True to have the WebUI save images on its side


def build_payload(p: Txt2ImgParams) -> Dict[str, Any]:
    payload: Dict[str, Any] = {
        "prompt": p.prompt,
        "negative_prompt": p.negative_prompt,
        "width": p.width,
        "height": p.height,
        "steps": p.steps,
        "cfg_scale": p.cfg_scale,
        "sampler_name": p.sampler_name,
        "seed": p.seed,
        "batch_size": p.batch_size,
        "n_iter": p.n_iter,
        "restore_faces": p.restore_faces,
        "tiling": p.tiling,
        "enable_hr": p.enable_hr,
        "hr_scale": p.hr_scale,
        "hr_upscaler": p.hr_upscaler,
        "hr_second_pass_steps": p.hr_second_pass_steps,
        "denoising_strength": p.denoising_strength,
        "send_images": p.send_images,
        "save_images": p.save_images,
    }

    if p.override_settings:
        payload["override_settings"] = p.override_settings
        payload["override_settings_restore_afterwards"] = True

    if p.alwayson_scripts:
        payload["alwayson_scripts"] = p.alwayson_scripts

    return payload


def txt2img(
    base_url: str,
    params: Txt2ImgParams,
    timeout_sec: int = 300,
) -> Dict[str, Any]:
    url = f"{base_url.rstrip('/')}/sdapi/v1/txt2img"
    payload = build_payload(params)

    try:
        r = requests.post(url, json=payload, timeout=timeout_sec)
        r.raise_for_status()
    except requests.RequestException as e:
        # Include the payload in the error so failures are easier to debug
        raise RuntimeError(f"txt2img request failed: {e}\nURL={url}\nPAYLOAD={json.dumps(payload, ensure_ascii=False)}") from e

    data = r.json()
    return data


def save_images_from_response(resp: Dict[str, Any], out_dir: str) -> List[str]:
    os.makedirs(out_dir, exist_ok=True)
    saved_paths: List[str] = []

    images_b64 = resp.get("images") or []
    for i, img_b64 in enumerate(images_b64):
        # The WebUI may return a "data:image/png;base64," prefix, so strip it
        if "," in img_b64 and img_b64.strip().startswith("data:"):
            img_b64 = img_b64.split(",", 1)[1]

        img_bytes = base64.b64decode(img_b64)
        ts = time.strftime("%Y%m%d_%H%M%S")
        path = os.path.join(out_dir, f"txt2img_{ts}_{i:02d}.png")
        with open(path, "wb") as f:
            f.write(img_bytes)
        saved_paths.append(path)

    return saved_paths


def main() -> None:
    # These are the in-program variables; swap them freely
    positive_prompt = (
        "masterpiece, best quality, ultra-detailed, "
        "a cinematic portrait photo of a woman in orange light, 35mm, bokeh"
    )
    negative_prompt = (
        "lowres, worst quality, blurry, jpeg artifacts, deformed, bad anatomy, extra fingers"
    )

    params = Txt2ImgParams(
        prompt=positive_prompt,
        negative_prompt=negative_prompt,
        width=768,
        height=1024,
        steps=28,
        cfg_scale=6.0,
        sampler_name="DPM++ 2M Karras",
        seed=-1,
        batch_size=1,
        n_iter=1,
        enable_hr=False,
        # Example: to specify a model
        # override_settings={"sd_model_checkpoint": "your_model.safetensors"},
    )

    resp = txt2img(BASE_URL, params, timeout_sec=300)
    paths = save_images_from_response(resp, OUT_DIR)

    print("Saved:")
    for p in paths:
        print(" -", p)

    # If you want the generation info (infotext)
    info = resp.get("info")
    if info:
        print("\nInfo:")
        try:
            # info may be a JSON string
            info_obj = json.loads(info)
            print(json.dumps(info_obj, ensure_ascii=False, indent=2))
        except Exception:
            print(info)


if __name__ == "__main__":
    main()
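
Since the script sends `seed=-1`, each run gets a random seed, and reproducing a good image later requires knowing which seed was actually used. A minimal sketch follows, assuming the response's `info` field is a JSON string containing `seed` and `all_seeds`, as in the AUTOMATIC1111 API. `extract_seeds` is a hypothetical helper, and the sample response is made up purely for illustration.

```python
import json
from typing import Any, Dict, List


def extract_seeds(resp: Dict[str, Any]) -> List[int]:
    """Return the seeds recorded in a txt2img response, or [] if unavailable."""
    info = resp.get("info")
    if not info:
        return []
    if isinstance(info, str):
        try:
            info = json.loads(info)
        except json.JSONDecodeError:
            return []
    if not isinstance(info, dict):
        return []
    seeds = info.get("all_seeds")
    if isinstance(seeds, list) and seeds:
        return [int(s) for s in seeds]
    seed = info.get("seed")
    return [int(seed)] if seed is not None else []


# Made-up response for illustration; a real one also carries "images".
fake_resp = {"info": json.dumps({"seed": 111, "all_seeds": [111, 112]})}
print(extract_seeds(fake_resp))
```

Passing one of the returned values back as `seed` in `Txt2ImgParams`, with the other settings unchanged, should regenerate the same image on the same model.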