WBPP stacking duration

19 replies · 873 views
Daniel Arenas avatar
Hi dudes!

Yesterday I stacked NGC 7000 (the North America Nebula) with PixInsight's WBPP: 48 lights (and by the way, the ASI 2600 produces files weighing 56 MB each), plus 60 flats, 60 dark flats and 30 or 40 darks. Since the lights came from different sessions, WBPP had to calibrate by groups (each group of lights with its own flats and dark flats, plus the common darks) and then finish the stack.

Well, in total on my 12 year old Mac laptop it took 4 hours and 49 minutes.

The laptop has the following specifications:

Retina MacBook Pro mid 2012.
Processor: Intel Core i7 with 4 cores at 2.3 GHz.
RAM: 16GB 1600MHz DDR3.
Graphics card: NVIDIA GeForce 650M 1GB + Intel HD Graphics 4000 1536MB.

Of course I know that with a modern laptop, or better still a good gaming computer, the time should be shorter.

In fact I’ve read the recommended specifications for PixInsight and they are high: https://pixinsight.com/sysreq/index.html

DSS users have told me to try stacking with it (another program to learn), since it's usually fast.

I will try it next week; this weekend is full of family activities. But I will also re-stack with WBPP on my daughter's Windows laptop to compare times better (DSS is Windows-only). That laptop is more modern, with mid-range specs, so I expect WBPP itself to stack faster there.

What I can't get my head around is how two programs with (theoretically very) different stacking times can supposedly produce equally good master light files. I have my doubts; and if it's true, what do the algorithms of the newer programs spend all that extra time on?
I find it very hard to believe that a PixInsight script important enough to have driven the release of new PixInsight versions has some bad optimization problem 🤷🏻‍♂️. But I don't have the knowledge to analyze it.

Do you use WBPP to stack? In fact, now that I've learned it, I love its interface.

Kind regards and clear skies!

Daniel
John Hayes avatar
Wow… you have to work pretty hard to achieve a nearly 5-hour stacking job, even with a 2012 Mac!

I can't talk to you about the algorithms, but I can tell you one big thing that's causing it to take so long. Using 60 flats, 60 dark flats, and 30-40 darks is a complete waste of time. I don't have it here on my iPad, but in other threads I've posted a chart showing the impact on noise as a function of the number of calibration files, derived from the statistics of stacking calibrated files. For 48 lights, using calibration masters containing much more than around 16 subs doesn't provide any significant improvement in SNR. You didn't say how many different exposures you are using or the exposure lengths for your flats, so it's hard to work out how you could further consolidate the process. Flat darks aren't always needed, but that depends on how you are taking the data. It depends a little on how you operate, but in most cases it is MUCH faster to create reusable flat, dark, and bias masters prior to stacking your lights. In those cases, you save time by stacking the calibration files just once rather than stacking them with each new image. If you work a little smarter, I suspect that your old Mac will have no trouble stacking a set of 48 lights in less than 10-20 minutes, even with the ASI2600 using 1x1 binning.

John
Helpful Engaging Supportive
Die Launische Diva avatar
The short answer is that PixInsight's WBPP runs a totally different workflow than DSS. Thus you can't easily make direct comparisons between DSS and PI regarding processing time. Also consider that  PI stores a lot of intermediate files for you to inspect for troubleshooting or for some other special purpose.

Local Normalization, more sophisticated rejection algorithms, and dealing with faulty CCD columns are some of the PI processes that have a big impact on processing time. You have to do your research and decide whether a given WBPP setting is appropriate for your dataset. If something is a default in WBPP, that doesn't mean it is absolutely necessary; its developers enabled those defaults with more difficult datasets in mind.

It is desirable for both programs to produce similar masters under normal imaging conditions. However, an experienced user may be able to spot differences, and may be able to push a PI master light further during processing.

Huge disclaimer: I don't use WBPP 🙂
Helpful Insightful Respectful
Andy Wray avatar
As above: did you use Local Normalisation? On my fairly old workstation it tends to double my overall integration time and quite often produces worse results than if I didn't use it. I also stick with master darks and master dark flats for several months, so that part of the process takes zero seconds.

FYI: I do use WBPP most of the time, but tend to do it in two halves (calibrate the images -> SubFrameSelector to choose the best images -> back into WBPP to register and integrate).

Personally, I would not recommend an Apple laptop for PixInsight.  You'd be better off with an Nvidia-based gaming PC with multiple NVMe drives (based on their plan to start taking advantage of CUDA acceleration).  Even things like Starnet++ run so much faster using CUDA.
Helpful Insightful Respectful Engaging
Roger Nichol avatar
PI makes good use of PCs having many cores and plenty of RAM.  I don't use WBPP, but  a manually-driven calibration and integration of 50 ASI2600MC images takes about 20 minutes using local normalisation, and about 30 minutes using NormaliseScaleGradient.  The stacking itself (Image Integration) is just over 4 minutes. 

I'm using a PC with an AMD 5950X (16-core) processor, 64GB of RAM and a fast NVMe SSD. This replaced my old i7-based PC. The benchmarks ran 26x faster on the new PC.

You may find that your limitation is RAM. If PI runs out of physical RAM it uses virtual memory, which is disk, so involves swapping large amounts of data between RAM and disk - very slow.

You can check whether or not you are running out of RAM using Task Manager (Ctrl-Alt-Del, select Task Manager then the Performance tab).  Watch the memory graph during integration - if the memory gets to 100% then you will get a significant performance hit.  Increasing the amount of RAM in your PC may be a cost-effective way of speeding things up. The ASI2600 files are big - I found that 32GB RAM was not enough, but 64GB works fine.
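If you'd rather log the numbers than watch the graph, here is a minimal Python sketch (it assumes the third-party psutil package is installed, and works on both Windows and macOS) that records RAM and swap usage while WBPP runs:

```python
# Minimal RAM/swap logger to run alongside a WBPP session.
# Assumes the third-party psutil package is installed (pip install psutil);
# the 5-second polling interval is an arbitrary choice.
import time
import psutil

def log_memory(interval_s: float = 5.0) -> None:
    while True:
        vm = psutil.virtual_memory()
        sm = psutil.swap_memory()
        print(f"RAM {vm.percent:5.1f}% used "
              f"({vm.used / 2**30:.1f} / {vm.total / 2**30:.1f} GiB), "
              f"swap {sm.percent:5.1f}% used")
        time.sleep(interval_s)

if __name__ == "__main__":
    log_memory()  # stop with Ctrl-C once integration finishes
```

If RAM sits at 100% and swap keeps climbing during integration, that is the point where more physical memory would pay off.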
Well Written Helpful Insightful Engaging Supportive
pfile avatar
by default WBPP tends to automatically select ESD clipping for pixel rejection. this algorithm is extremely memory- and cpu-intensive. this could lead to paging (and extremely poor performance) as Roger explains above. if you manually select Winsorized Sigma Clipping that part should be a little quicker.

and as others have explained WBPP is also now using Local Normalization by default, which isn't super processor intensive but it is another step that has to be run against each subframe, which programs like DSS, etc. don't do. depending on the number of subframes this can also increase runtimes a lot.
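for anyone curious what winsorized sigma clipping actually does per pixel stack, here's a rough numpy sketch of the idea (not PixInsight's exact implementation; the 3-sigma thresholds, the 5/95 winsorization percentiles and the iteration cap are just illustrative assumptions):

```python
# rough sketch of winsorized sigma clipping for a single pixel stack.
# not PixInsight's implementation; thresholds, the 5/95 winsorization
# percentiles and the iteration cap are illustrative assumptions.
import numpy as np

def winsorized_sigma_clip(stack, sigma_low=3.0, sigma_high=3.0, max_iter=10):
    """Reject outliers from one pixel's stack, then average the survivors."""
    values = np.asarray(stack, dtype=float)
    keep = np.ones(values.size, dtype=bool)
    for _ in range(max_iter):
        kept = values[keep]
        center = np.median(kept)
        # winsorize: pull the most extreme values in before estimating sigma,
        # so a bright outlier doesn't inflate the estimate and hide itself.
        lo, hi = np.percentile(kept, [5, 95])
        sigma = np.std(np.clip(kept, lo, hi))
        new_keep = (keep
                    & (values >= center - sigma_low * sigma)
                    & (values <= center + sigma_high * sigma))
        if new_keep.sum() == keep.sum():
            break
        keep = new_keep
    return values[keep].mean()

# example: one pixel across 58 subs with a single satellite-trail outlier
rng = np.random.default_rng(0)
pixel_stack = np.append(rng.normal(1000, 25, 57), 5000.0)
print(round(winsorized_sigma_clip(pixel_stack), 1))  # close to 1000, outlier rejected
```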

rob
Helpful Concise
Daniel Arenas avatar
Thanks to all of you for your responses. 

Let  me share with you the image of the task monitor from the stacking I did. Here you can see the time it took for each part of the process:



One more thing: I also stack with the option to separate the channels and then recombine them.

@John Hayes I take my flats with 5-second exposures. That's the reason for taking dark flats. That, and because in other posts most of you told me that bias frames aren't needed with dedicated cameras, and that darks also include the electronic noise that is recorded in bias frames.

The chart you referred to seems very interesting; I will look for it. Is it yours, or some kind of study of how darks impact calibration?

@Andy Wray I usually check the option so that WBPP searches for the best reference image, and I suppose that is also demanding for my laptop's processor.

@Roger Nichol yes, I think you're right and I'm running out of RAM most of the time. In fact, with 16 GB there's usually a message from another program that monitors the laptop (Clean My Mac) telling me exactly that. Next time I'm going to check it.
Andy Wray avatar
Almost 2 hours of that was due to local normalisation … I'd suggest you really need to check if you need to do that based on your subs.  I know the WBPP script added it as a default recently, but that doesn't mean you have to use it.

I also think that doubling your RAM would help a lot.
Helpful Concise Supportive
pfile avatar
that looks reasonably balanced and there are not any huge outliers, time wise. you might save some time with the pixel rejection setting that i was talking about.

one thing i do see is that the LN reference seems to be composed of all 58 of your frames. i don't know if that's a display bug or if you are running an older version of WBPP, but the tooltip in mine (1.8.9-1 with WBPP 2.4.5) says that it will integrate the top 3-20 frames for an LN reference. since the LN reference generation times and master light integrations seem to be taking the same amount of time, it really does seem to me that WBPP is integrating all 58 images as the LN reference.

with these huge CMOS sensors it's definitely possible to run out of memory. however, 6248 * 4176 * 4 = 100MB per image and with 58 images you're looking at ~6GB of ram to load every single image in its entirety. splitting the CFA channels actually helps a lot here, otherwise you'd be looking at 18GB of ram. anyway PI shouldn't have to load the whole image into memory, so in theory it looks like you should have enough RAM to get by here without paging, but it is worth looking at while WBPP is running just to be sure.
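spelling that arithmetic out (assuming 32-bit floats and the 6248 x 4176 ASI2600 frame size used above):

```python
# back-of-the-envelope RAM needed to hold every calibrated sub at once,
# assuming 32-bit floats and the ASI2600's 6248 x 4176 frame.
width, height = 6248, 4176
bytes_per_pixel = 4            # 32-bit float
n_subs = 58

frame_mb = width * height * bytes_per_pixel / 2**20
print(f"one CFA/mono frame: {frame_mb:.0f} MB")                                  # ~100 MB
print(f"{n_subs} mono frames: {n_subs * frame_mb / 1024:.1f} GB")                # ~5.6 GB
print(f"{n_subs} debayered RGB frames: {3 * n_subs * frame_mb / 1024:.1f} GB")   # ~16.9 GB
```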

rob
Helpful
kuechlew avatar
Hi,

you may run the PixInsight Benchmark to check how your machine is performing.



PI is fairly IO-heavy, so the speed of the drive where you keep your files is important for overall performance. My very first image integration with PI took 8+ hours because I was stupid enough to run it from a slow NAS. In addition, you should create swap files under Preferences, one for each core of your CPU.





Using different drives for multiple swap files is better than what I'm doing ...
Your files and the swap files should be located on SSDs, preferably NVMe SSDs. With a lot of RAM, a RAM disk is another good solution.
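If you want a rough feel for how fast a candidate drive is before pointing WBPP's working directory at it, a quick Python sketch like this times writing and reading back one sub-sized file (the 100 MB size and the file name are arbitrary assumptions, and the read figure may be flattered by the OS cache):

```python
# crude sequential write/read timing for a candidate WBPP working directory.
# the 100 MB size (roughly one ASI2600 sub) and the file name are assumptions;
# the read figure may be flattered by the OS cache.
import os
import time

path = "wbpp_io_test.tmp"      # place this on the drive you want to test
size_mb = 100
payload = os.urandom(size_mb * 2**20)

t0 = time.perf_counter()
with open(path, "wb") as f:
    f.write(payload)
    f.flush()
    os.fsync(f.fileno())
write_s = time.perf_counter() - t0

t0 = time.perf_counter()
with open(path, "rb") as f:
    f.read()
read_s = time.perf_counter() - t0
os.remove(path)

print(f"write: {size_mb / write_s:.0f} MB/s, read: {size_mb / read_s:.0f} MB/s")
```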

The PixInsight Benchmark page provides you with typical results for various setups.

Clear skies
Wolfgang
Helpful Concise
Andy Wray avatar
I just thought I'd give you something to compare against.  This is using my 4 year-old Windows workstation with the following specs:

Core i7 8700 3.2GHz 6 core
32G RAM
Two NVMe drives: one for my boot and another for my data.  I've created separate swap files on the two drives.

The below is a slightly larger number of subs than yours being measured, registered and integrated.  That said, my camera has a smaller number of pixels.


I'm guessing that your laptop is RAM-limited and that you may not have NVMe drives in it and so are also I/O limited.

FWIW:  My kit is well below the recommended specs also.
Helpful Supportive
pfile avatar
the thing about swap files though is that i don’t think they are used at all by any process in the WBPP pipeline. they are the backing store for the view undo history. 

wbpp speed is more about putting your subs (and the working directory) on fast storage since just about every task in the pipeline is reading images from disk and writing them back out.

i couldn’t detect a meaningful difference between running wbpp on subs stored on an nvme ssd vs. a ram disk on my M1 ultra. but this is probably because that machine has a ridiculous amount of bandwidth available in the storage subsystem.
Helpful Insightful
Andy Wray avatar
[Update]  I was looking at Windows resource monitor during a WBPP session just now and what I've realised is:

* My NVMe drives are barely breaking into a sweat (maybe 25% utilised), so no I/O bottleneck there
* A lot of the time during registration and integration my processor is running at 100% on all 6 cores
* most of the time my 32G of memory was only 50% utilised (with 80 of my subs per channel), however during the final image integration of each channel this jumped to 85% utilised.  I'm assuming, therefore, that my 32G of RAM would start to become an issue for 100 or more subs

I'm sure there are plenty of articles about this out there, however when I do come round to upgrading my workstation I'll be looking at a much faster processor and 64 to 128G RAM
Helpful Insightful
John Hayes avatar
Daniel Arenas:
@John Hayes I take my flats with 5-second exposures. That's the reason for taking dark flats. That, and because in other posts most of you told me that bias frames aren't needed with dedicated cameras, and that darks also include the electronic noise that is recorded in bias frames.

Hi Daniel,
At 0C, the IMX455 only produces 0.014 e-/pixel/second of dark current, so at 5 seconds you don't need dark flats. Dark flats are just increasing your computing time with no benefit.
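As a quick sanity check on why that is negligible (the 1.5 e- read-noise figure below is an assumed typical value for this class of sensor, not a measurement):

```python
# dark signal accumulated in a 5 s flat exposure vs. typical read noise.
dark_current = 0.014   # e-/pixel/s at 0 C, the figure quoted above
exposure_s = 5.0       # flat exposure length
read_noise = 1.5       # e- RMS (an assumed typical value, not a measurement)

dark_signal = dark_current * exposure_s   # mean dark electrons per pixel
dark_noise = dark_signal ** 0.5           # Poisson noise of that signal

print(f"dark signal: {dark_signal:.2f} e-/pixel")                          # 0.07 e-
print(f"dark noise: {dark_noise:.2f} e- vs. read noise {read_noise} e-")   # 0.26 vs 1.5
```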

You are correct that dark data also includes the bias offset; however, the problem is that in order to properly subtract the pure dark signal, you have to be VERY careful about even small offsets.  It is so easy to subtract bias that I still recommend doing so--just to avoid the possibility of a calibration error.  If you are absolutely certain that the bias signal is identically zero across the chip, then it is safe to ignore bias but that's almost never the case.

Here is the chart that I was referring to.  With 50 subs, 64 subs in the calibration master only provides an additional 9%-10% reduction in noise compared to using only 16 subs in the master.  That small gain incurs a large penalty in computing time.   I suggest trying a calibration with 16 subs and I'll bet you that you will have a very, very hard time seeing that additional 10% noise reduction.
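For intuition only, here is a toy quadrature model of how the number of subs in a calibration master feeds into the final stack noise; the 0.3 ratio of calibration-frame noise to light-frame noise is an illustrative assumption, not the model behind the chart:

```python
# toy quadrature model: relative noise in the final stack as a function of
# how many subs go into the calibration master. the 0.3 ratio of calibration
# noise to light-frame noise is an illustrative assumption, not the model
# behind the chart above.
def relative_noise(n_lights, n_cal_subs, cal_to_light_noise=0.3):
    # light-frame noise averages down with the number of lights; the master's
    # residual noise is common to every calibrated light, so it only averages
    # down with the number of subs in the master itself.
    return (1.0 / n_lights + (cal_to_light_noise ** 2) / n_cal_subs) ** 0.5

n16 = relative_noise(50, 16)
n64 = relative_noise(50, 64)
print(f"16-sub master: {n16:.3f}  64-sub master: {n64:.3f}  "
      f"improvement: {100 * (1 - n64 / n16):.0f}%")   # ~9% under these assumptions
```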


John

Well Written Helpful Insightful Engaging
Andy Wray avatar
@John Hayes You are obviously right about the dark/flat calibration stuff.  Daniel's real problem is, however, with the time it's taking him in registration, local normalisation and integration.  Not accepting the WBPP defaults in these areas and maybe leaving local normalisation out of that process (given his limited PC capabilities) would actually cut hours from the process.
Dark Matters Astrophotography avatar
Andy Wray:
@John Hayes You are obviously right about the dark/flat calibration stuff.  Daniel's real problem is, however, with the time it's taking him in registration, local normalisation and integration.  Not accepting the WBPP defaults in these areas and maybe leaving local normalisation out of that process (given his limited PC capabilities) would actually cut hours from the process.

I have not used Local Normalization in a long time, as it can cause more problems than it solves. Do folks really see that much benefit from it? If it is adding hours to the process of stacking images -- I would hope there is some tangible benefit of doing so.
Well Written Engaging
pfile avatar
LN was kind of a mess for a long time and usually created more problems than it solved. but then normalize scale gradient came along and lit a fire under juan… and he rewrote LN. i think it uses photometry now but not 100% sure of that. anyway it is a lot better behaved now. 

for whatever reason it is now on by default in WBPP and new users probably are not aware of this.
Daniel Arenas avatar
Hello everyone!

After having spent some days disconnecting from the routine, I'm here again. I want to let you know that I'm very grateful for your help and comments, and I really appreciate them.
one thing i do see is that the LN reference seems to be composed of all 58 of your frames. i don't know if that's a display bug or if you are running an older version of WBPP, but the tooltip in mine (1.8.9-1 with WBPP 2.4.5) says that it will integrate the top 3-20 frames for an LN reference. since the LN reference generation times and master light integrations seem to be taking the same amount of time, it really does seem to me that WBPP is integrating all 58 images as the LN reference.


Hi @pfile,

That's a parameter where I just click the checkbox; I don't force it to use all the pictures in the configuration. It's something WBPP just does on its own. I'll show you a screen capture with the parameter configuration. I'm using the latest version, WBPP 2.4.5.



In the presets I choose the best quality. If the only price is stacking speed, I'd rather pay it and have a master light with the best quality.


Hi,

you may run the PixInsight Benchmark to check how your machine is performing.


Hi @kuechlew ,

I have done it, but I don't know how to interpret the report it gives.

Here it is, from my Mac.


Andy Wray:
just thought I'd give you something to compare against.  This is using my 4 year-old Windows workstation with the following specs:

Core i7 8700 3.2GHz 6 core
32G RAM
Two NVMe drives: one for my boot and another for my data.  I've created separate swap files on the two drives.

The below is a slightly larger number of subs than yours being measured, registered and integrated.  That said, my camera has a smaller number of pixels.


Hi @Andy Wray,

So, you have 2 more cores and more GHz (3.2 versus 2.3). The processor is an i7, but I suppose the 8700 model number means it has many more features (whatever that means) than mine, which is a 3615QM.
And you also have 16 GB more RAM than me, and I suppose it's DDR4 with good speed.

I think it's clear that all of those features move the data around better. Ah! My SSD is not an NVMe; it's a 10-year-old one. Just 18 minutes is impressive!
John Hayes:
Hi Daniel,
At 0C, the IMX455 only produces 0.014 e-/pixel/second of dark current, so at 5 seconds you don't need dark flats. Dark flats are just increasing your computing time with no benefit.

You are correct that dark data also includes the bias offset; however, the problem is that in order to properly subtract the pure dark signal, you have to be VERY careful about even small offsets.  It is so easy to subtract bias that I still recommend doing so--just to avoid the possibility of a calibration error.  If you are absolutely certain that the bias signal is identically zero across the chip, then it is safe to ignore bias but that's almost never the case.

Here is the chart that I was referring to.  With 50 subs, 64 subs in the calibration master only provides an additional 9%-10% reduction in noise compared to using only 16 subs in the master.  That small gain incurs a large penalty in computing time.   I suggest trying a calibration with 16 subs and I'll bet you that you will have a very, very hard time seeing that additional 10% noise reduction.


John


Hi @John Hayes ,
That's a great discussion. With my unmodified DSLR I used to take bias frames and flats with 1 or 2 seconds of exposure.
When I got my ASI 2600 MC Pro 16-bit dedicated camera, I asked what the best way to proceed was, and someone, whose opinion I value as I do everyone's, told me that dedicated cameras don't need bias frames and that it is better to take dark flats if the flats are longer than 1 or 2 seconds. I use a flat panel, and with NINA's flat assistant the best value is a 5-second exposure, so I took dark flats.

What I don't think I knew much about is what you say about the calibration master. I mean, I understood the graphical explanation, but do those 16 subs mean darks? Flats? Bias? Dark flats? Or 16 of each?

To take bias frames in NINA with a dedicated camera, do you set the exposure time to 0?
I have not used Local Normalization in a long time, as it can cause more problems than it solves. Do folks really see that much benefit from it? If it is adding hours to the process of stacking images -- I would hope there is some tangible benefit of doing so.


Hi @Bill Long 

I'm a newbie at this with just a year of experience; this is maybe my 5th stack with WBPP, and the first with FITS files taken with a cooled dedicated camera, and moreover with data from two night sessions. I check the parameters that are supposed to be checked, but I'm not at the point of knowing how to interpret every step of the stacking in a technical way, or the pros and cons of each one, or whether it's better to avoid one or another in this or other cases. I assume I'm still learning and have a long path ahead of me.
LN was kind of a mess for a long time and usually created more problems than it solved. but then normalize scale gradient came along and lit a fire under juan… and he rewrote LN. i think it uses photometry now but not 100% sure of that. anyway it is a lot better behaved now. 

for whatever reason it is now on by default in WBPP and new users probably are not aware of this.


Maybe that means the admins believe that the update to LN's internal process / algorithm (I don't know what the correct word is) has solved all the problems and it will give more pros than cons... I'm just guessing. I didn't know that LN had so many cons before.

Well, I have done a test: I copied all the data from my external Thunderbolt HDD to the internal 10-year-old SSD, and I got better stacking times. Instead of 4 hours 49 minutes, it now took 3 hours and 29 minutes.

Here are the 2 screen captures: first the new one, then the old.





I'm going to try to do that again with my daughter's Windows laptop, which is not a powerful one but is much more modern than my Mac.

What do you think about using a gaming PC (not a laptop, a desktop one) for PixInsight? My teenage daughter asked me to buy one to play computer games, and maybe it could be a good thing for both of us.

Well, in order not to mix topics, maybe it would be better to create another thread to talk about that.

Thanks everyone, and don't hesitate to share your opinion, experience or any other kind of contribution with me and with this great AstroBin community.

Kind regards and clear skies!
Andy Wray avatar
Daniel Arenas:
What do you think about using a gaming PC (not a laptop, a desktop one) for PixInsight? My teenage daughter asked me to buy one to play computer games, and maybe it could be a good thing for both of us.


Great idea!!  You'll get lots of appreciation from your daughter and a proper platform for PI at the same time ;)
Daniel Arenas avatar
All right mates,

I did the benchmark and the stacking with WBPP 2.4.5 on my daughter's Windows laptop. The laptop is not top-spec by any means, as you can see in the benchmark, but the stacking under the same conditions as the others was significantly quicker (though still slower than the 18 minutes of @Andy Wray's run).

Ok, the benchmark:



The swap performance score is much better than on the Mac, but the total performance score isn't as different: 6566 on the Windows laptop versus 4149 on the Mac laptop. Different, of course, but I'm not able to tell whether roughly 2000 points is a big improvement or not.

Now, the stacking result:



On Windows, WBPP ran a Linear Pattern Subtraction process that I'm not sure I configured...

The comparison of the results:

Mac with external Thunderbolt HDD: 4 h 49 min
Mac with internal 10-year-old SSD: 3 h 29 min
Windows laptop with NVMe drive: 2 h 19 min

Kind regards and clear skies!
Helpful