AMD ROCm
To use AMD ROCm with SD.Next:
- Install ROCm libraries.
- Run SD.Next with --use-rocm so it installs a compatible torch build.
Important
AMD ROCm is officially supported for specific AMD GPUs.
[!IMPORTANT]
Currently, PyTorch with ROCm on Windows is not officially maintained by the PyTorch team.
See AMD's announcement for more information.
[!WARNING]
Unofficial support for other platforms is provided by the community and SD.Next does not guarantee it will work.
Use of any third-party libraries is at your own risk.
- For preview support on Windows platform, see ROCm on Windows section.
- For unofficial support for Windows platform, see ZLUDA page.
On Ubuntu 24.04, install ROCm:
sudo apt update
wget https://repo.radeon.com/amdgpu-install/6.4.3/ubuntu/noble/amdgpu-install_6.4.60403-1_all.deb
sudo apt install ./amdgpu-install_6.4.60403-1_all.deb
sudo amdgpu-install --usecase=rocm
sudo usermod -a -G render,video $LOGNAME
Log out and back in (or reboot) so the new group membership takes effect.
Install Git and Python:
sudo apt install git python3 python3-dev python3-venv python3-pip
If you use Ubuntu 22.04, replace "noble" with "jammy" in the wget line.
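After logging back in, you can confirm that the usermod change took effect. The sketch below is only an illustration and is not part of SD.Next or ROCm: the helper name `missing_groups` is hypothetical, and it simply compares the group names reported by `id -nG` against the `render` and `video` groups added above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SD.Next or ROCm): print each required
# group (one per line) that is missing from a space-separated group list.
missing_groups() {
  local required="$1" current=" $2 " group
  for group in $required; do
    case "$current" in
      *" $group "*) ;;      # exact group name present
      *) echo "$group" ;;   # group missing
    esac
  done
}

# Check the groups of the current login session.
missing="$(missing_groups "render video" "$(id -nG)")"
if [ -z "$missing" ]; then
  echo "OK: user is in the render and video groups"
else
  echo "Missing group(s): $(echo "$missing" | tr '\n' ' ')" >&2
  echo "Log out and back in (or reboot) after running usermod." >&2
fi
```

The check matches whole group names only, so a group such as `renderers` does not count as `render`.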
On openSUSE Tumbleweed, install prerequisites:
sudo zypper in python312-devel python312-virtualenv python312-pip patterns-devel-base-devel_basis
Add the ROCm repository (not official, but maintained by AMD employees):
sudo zypper ar obs://science:GPU:ROCm/openSUSE_Factory ROCm
sudo zypper ref # when asked about the repository signing key, answer "a" (trust always)
Install the required packages (there is no ROCm pattern, so you must install them individually):
sudo zypper in rocm-runtime \
miopen rccl rocblas amdsmi \
hipblaslt hiprand hipcub \
hipsolver hipfft rocm-cmake \
rocm-compilersupport \
rocm-llvm-filesystem \
rocm-clang-runtime-devel \
hipcub-devel rocm-hip-devel \
libhipfft0-devel libhipsolver0-devel \
libhipsparse1-devel rocthrust-devel \
librocfft0 rocm-core rocrand rocsolver
This procedure should also work for Leap-based distributions and Slowroll (adjust the distribution-specific lines), but it has not been tested.
Note: This also installs build dependencies for flash-attention.
On Arch Linux, install ROCm and Git:
sudo pacman -S rocm-hip-runtime git
Install Python 3.12 (or any version from 3.10 to 3.13):
sudo pacman -S base-devel python-pip python-virtualenv
git clone https://aur.archlinux.org/python312.git
cd python312
makepkg -si
cd ..
export PYTHON=python3.12
# optionally, remove the package build files:
# rm -rf python312
Install ROCm SDK:
Note
ROCm SDK is optional. It is only required for building flash attention or similar custom kernels.
ROCm SDK uses 26 GB of disk space.
sudo pacman -S rocm-hip-sdk libxml2-legacy gcc14 gcc14-libs
Open a terminal in the folder where you want to install SD.Next, then run:
git clone https://github.com/vladmandic/sdnext
Then enter the `sdnext` folder:
cd sdnext
Run SD.Next with:
./webui.sh --use-rocm
Note
On the first run, SD.Next installs the required libraries, which can take a while depending on your internet connection.
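To check that the interpreter SD.Next will use falls within the 3.10 to 3.13 range mentioned above, here is a minimal sketch with a hypothetical helper `python_version_supported` (not part of SD.Next). It assumes, as the Arch Linux steps suggest, that `webui.sh` uses `$PYTHON` when it is set; falling back to `python3` otherwise is an assumption about the script's default.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SD.Next): succeed if a version string
# such as "3.12.4" is in the 3.10 to 3.13 range.
python_version_supported() {
  local major minor
  IFS=. read -r major minor _ <<< "$1"
  [ "$major" = 3 ] || return 1
  case "$minor" in
    10|11|12|13) return 0 ;;
    *) return 1 ;;
  esac
}

# Assumption: webui.sh uses $PYTHON if set, otherwise python3.
py="${PYTHON:-python3}"
if ! command -v "$py" >/dev/null 2>&1; then
  echo "$py not found" >&2
else
  ver="$("$py" -c 'import platform; print(platform.python_version())')"
  if python_version_supported "$ver"; then
    echo "$py $ver is in the supported range"
  else
    echo "$py $ver is outside 3.10-3.13" >&2
  fi
fi
```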
See the Docker page if you want to build a custom image.
Note
Installing ROCm on the host is not required when using Docker, because the container does not use the host's ROCm libraries; it only needs access to the GPU devices passed with --device.
To run a prebuilt Docker image:
export SDNEXT_DOCKER_ROOT_FOLDER=~/sdnext
sudo docker run -it \
--name sdnext-rocm \
--device /dev/dri \
--device /dev/kfd \
-p 7860:7860 \
-v $SDNEXT_DOCKER_ROOT_FOLDER/app:/app \
-v $SDNEXT_DOCKER_ROOT_FOLDER/python:/mnt/python \
-v $SDNEXT_DOCKER_ROOT_FOLDER/data:/mnt/data \
-v $SDNEXT_DOCKER_ROOT_FOLDER/models:/mnt/models \
-v $SDNEXT_DOCKER_ROOT_FOLDER/huggingface:/root/.cache/huggingface \
disty0/sdnext-rocm:latest
Note
On the first run, the container installs the required libraries, which can take a while depending on your internet connection.
The Docker image uses 3.2 GB of disk space (uncompressed), and the venv uses another 20 GB.
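Because the container reaches the GPU only through the device nodes passed with --device, a quick host-side check before docker run can catch a missing device early. A minimal sketch with a hypothetical helper `missing_devices` (not part of SD.Next or Docker):

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SD.Next or Docker): print each given
# device path (one per line) that does not exist on the host.
missing_devices() {
  local dev
  for dev in "$@"; do
    [ -e "$dev" ] || echo "$dev"
  done
}

# The same device nodes that the docker run command above passes through.
missing="$(missing_devices /dev/kfd /dev/dri)"
if [ -n "$missing" ]; then
  echo "Missing on host: $(echo "$missing" | tr '\n' ' ')" >&2
  echo "The amdgpu kernel driver may not be loaded." >&2
else
  echo "GPU device nodes found"
fi
```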
For details, see AMD-MIOpen Guide.
On first use, the first time a new resolution is used, or after a PyTorch upgrade, ROCm (through MIOpen) runs benchmarks to pick efficient kernels.
This can make startup slow (up to 5 to 8 minutes), especially with high-resolution refine passes, but it usually happens only once per resolution.
If startup time is the priority, set MIOPEN_FIND_MODE=FAST.
If generation performance is the priority, set MIOPEN_FIND_ENFORCE=SEARCH and accept slower first-time startup.
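A minimal sketch of choosing between the two settings at launch, using a hypothetical helper `miopen_tuning_env` (not part of SD.Next) that prints the matching assignment; only the variable names and values come from the text above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SD.Next): map a tuning priority to the
# MIOpen environment variable described above.
miopen_tuning_env() {
  case "$1" in
    startup)     echo "MIOPEN_FIND_MODE=FAST" ;;       # shorter first-time startup
    performance) echo "MIOPEN_FIND_ENFORCE=SEARCH" ;;  # slower first run, faster generation
    *)
      echo "usage: miopen_tuning_env startup|performance" >&2
      return 1
      ;;
  esac
}

# Example, from the sdnext folder (uncomment to launch):
# env "$(miopen_tuning_env performance)" ./webui.sh --use-rocm
```

Setting the variable inline is equivalent, for example `MIOPEN_FIND_MODE=FAST ./webui.sh --use-rocm`; the helper only keeps both profiles in one place.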
If you use bf16 (Settings > Compute Settings > Execution Precision > Device precision type), which is auto-detected on RDNA3 and newer cards, VRAM usage can become very high (16+ GB) during final decode and non-latent upscaling.
To reduce usage, set Device precision type to fp16 and disable VAE upcasting in Variational Auto Encoder > VAE upcasting.
Using fp16 can also improve performance.
On RDNA3 hardware (RX 7000 series), you can enable CK Flash Attention in Compute Settings > Cross Attention > SDP Options by toggling CK Flash attention and restarting SD.Next.
This requires rocm-hip-sdk, because SD.Next downloads and compiles an additional Python package at startup.
To install it manually, activate the virtual environment, then run pip:
pip install --no-build-isolation git+https://github.com/Disty0/flash-attention@navi_rotary_fix
ROCm on Windows
- Install Git and Python 3.12.
- Open a terminal in the folder where you want to install SD.Next, then clone SD.Next from GitHub with this command:
git clone https://github.com/vladmandic/sdnext
- Enter the sdnext folder:
cd sdnext
- Make sure that you are up to date:
git pull
- Run SD.Next with this command:
.\webui.bat --use-rocm