Hi all,
with the increasing popularity of neural-network-based tools, GPU acceleration becomes more and more important: otherwise the processing time for a single operation on a high-resolution image can easily exceed an hour. Currently the most prominent examples of such tools are Russell Croman's excellent processes for PixInsight (BlurXTerminator, StarXTerminator, etc.). Under the hood, tools of that kind often use the TensorFlow library, which provides various mechanisms for machine-learning-based calculations. It is well known that TensorFlow can use CUDA-enabled GPUs to accelerate such operations, often by one or more orders of magnitude. GPUs manufactured by Nvidia are therefore often favoured by those investing in hardware for deep-sky image processing.
However, what often seems to be overlooked is that TensorFlow also provides a backend for calculations based on ROCm, AMD's equivalent of CUDA. The only problem is that no prebuilt version of the library with ROCm support can simply be downloaded from the internet.
Nevertheless, it is possible to compile the library yourself with ROCm support enabled. That is exactly what I did, and I would like to share my experience with the procedure and the results.
Please do not take this as a comprehensive guide! I only want to demonstrate that Nvidia GPUs are not the only devices capable of this kind of acceleration.
Prerequisites: I am running Ubuntu 22.04 on my desktop machine, equipped with a Ryzen 9 5950X, 128 GB of DDR4 memory, and an AMD Radeon RX 6950 XT graphics card.
Outline of the procedure:
1. Install ROCm including development headers (see the amdgpu-install command)
2. Install bazel (for the build process)
3. Clone the official TensorFlow repository and check out version 2.14.1 (that's the one I got working)
4. Configure the build, disabling CUDA support, but enabling ROCm support
5. Start a monolithic build (with my processor this took more than an hour)
6. Copy the TensorFlow library, as well as the TensorFlow framework library, into the PixInsight directory, making sure to set the symbolic links accordingly
7. Make sure that the installed ROCm version is compatible with the running amdgpu driver (I initially ran a newer Linux kernel, which caused the graphics driver and ROCm to interfere with each other)
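The steps above can be sketched roughly as follows. This is only a sketch of what I did, not a copy-paste recipe: the amdgpu-install usecases, the Bazelisk install method, the Bazel target, and the PixInsight install path under /opt are assumptions based on my setup and may differ on yours.

```shell
# 1. ROCm with development headers (assumes AMD's amdgpu-install helper
#    is already available from the AMD repositories)
sudo amdgpu-install --usecase=rocm,rocmdev

# 2. Bazelisk fetches the exact Bazel version TensorFlow expects
sudo npm install -g @bazel/bazelisk

# 3. TensorFlow sources at the version that worked for me
git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow
git checkout v2.14.1

# 4. Interactive configuration: answer "N" when asked about CUDA
#    and "y" when asked about ROCm
./configure

# 5. Monolithic build of the shared library (over an hour on my 5950X)
bazel build --config=opt //tensorflow:libtensorflow.so

# 6. Copy both libraries into the PixInsight installation and recreate
#    the version symlinks (the target path is an example)
sudo cp bazel-bin/tensorflow/libtensorflow.so.2.14.1 \
        bazel-bin/tensorflow/libtensorflow_framework.so.2.14.1 \
        /opt/PixInsight/bin/lib/
cd /opt/PixInsight/bin/lib
sudo ln -sf libtensorflow.so.2.14.1 libtensorflow.so.2
sudo ln -sf libtensorflow_framework.so.2.14.1 libtensorflow_framework.so.2
```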
I recommend setting the environment variable TF_FORCE_GPU_ALLOW_GROWTH to true; otherwise TensorFlow allocates the entire VRAM at startup, which causes a lot of stuttering in the GUI.
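For example, exporting the variable in the shell before launching PixInsight makes the setting visible to the bundled TensorFlow (the launch path in the comment is just an example from my install):

```shell
# Let TensorFlow grow its VRAM allocation on demand instead of
# reserving all of it at startup
export TF_FORCE_GPU_ALLOW_GROWTH=true

# Launch PixInsight from the same shell so it inherits the variable,
# e.g.: /opt/PixInsight/bin/PixInsight
```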
If everything goes according to plan, PixInsight launches without errors. Running StarXTerminator or StarNET++ will now use the GPU for the calculation, massively speeding up the process:
2023-12-10 21:10:02.503992: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 15810 MB memory: -> device: 0, name: AMD Radeon RX 6950 XT, pci bus id: 0000:11:00.0
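Independently of PixInsight, a quick way to check whether a ROCm-enabled TensorFlow build detects the GPU is to ask it for its device list from Python. This assumes the same build is importable from Python (e.g. via a wheel built from the same source tree), which is not strictly required for the PixInsight setup itself:

```shell
# Should list the Radeon card as a physical GPU device if the
# ROCm backend is working
python3 -c 'import tensorflow as tf; print(tf.config.list_physical_devices("GPU"))'
```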
A screenshot of my desktop when running StarXTerminator using my AMD GPU:

A comparison of the CPU- and GPU-based execution times for the same image taken with a ZWO ASI6200MM Pro (with Generate Star Image, Unscreen Stars, and Large Overlap enabled):
- CPU (16-core AMD Ryzen 9 5950X): 1h19m58s
- GPU (AMD RX 6950 XT): 3m11s
I think this can be called a significant speed-up.
Clear skies,
Philipp