Just stumbled across this thread. I’m not a heavy abin user so pardon the late arrival. There are some aspects to this which can be explained, and are in the process of being addressed.
First, some background on the Alpaca aspect. Alpaca was designed, primarily, around the concept of the other network-connected devices being on the same IP subnet as the client. The Alpaca Discovery Protocol (ADP) works only on the local subnet. If the Alpaca device is on a different subnet, it won’t receive ADP packets from the client (NINA, in this case). Everyone has their own subnet at HCRO, so obviously using ADP to find either the roof controller’s Alpaca Safety Monitor endpoint or the SafeAlert weather station’s Alpaca endpoint isn’t going to happen.
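For anyone curious why discovery stops at the subnet boundary: ADP is a UDP broadcast, and broadcasts don’t cross routers. A minimal sketch, using the standard Alpaca discovery port and message as I understand them (the device answers with its Alpaca HTTP port in a small JSON payload):

```python
import json
import socket

DISCOVERY_PORT = 32227            # standard Alpaca discovery UDP port
DISCOVERY_MSG = b"alpacadiscovery1"

def parse_alpaca_port(payload: bytes) -> int:
    """A responding device answers with JSON like {"AlpacaPort": 11111}."""
    return json.loads(payload)["AlpacaPort"]

def discover(timeout: float = 2.0):
    """Broadcast a discovery packet and collect responders.

    The broadcast address never crosses a router, which is exactly why a
    client on one HCRO subnet can't discover a device on another member's
    subnet this way.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    sock.settimeout(timeout)
    sock.sendto(DISCOVERY_MSG, ("255.255.255.255", DISCOVERY_PORT))
    found = []
    try:
        while True:
            data, (addr, _port) = sock.recvfrom(1024)
            found.append((addr, parse_alpaca_port(data)))
    except socket.timeout:
        pass
    return found
```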
To get around that, you are instructed to create what’s called an ASCOM Dynamic Driver - a pseudo ASCOM driver that has the IP address, TCP port, and related configuration for those Safety Monitor and weather station Alpaca endpoints baked into it. You direct NINA (or Voyager, or whatever one uses) to use this ASCOM Dynamic Driver, which in turn does the talking to the remote Alpaca device. The ADD essentially acts as a middleman between the client and the remote Alpaca endpoint on a different network.
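Conceptually, the ADD’s job is to turn a local ASCOM property read into an Alpaca HTTP request against the remote endpoint. A rough sketch of that translation (the host, port, and device number here are illustrative, not HCRO’s real values, and this is not the ADD’s actual code):

```python
import json
import urllib.request

class AlpacaSafetyMonitorProxy:
    """Stand-in for what a Dynamic Driver does: map an ASCOM property
    read onto the corresponding Alpaca HTTP GET."""

    def __init__(self, host: str, port: int, device_number: int = 0):
        # Alpaca device API URL pattern: /api/v1/{device_type}/{device_number}
        self.base = f"http://{host}:{port}/api/v1/safetymonitor/{device_number}"

    def is_safe(self) -> bool:
        # One ASCOM property read == one HTTP GET to the remote endpoint.
        with urllib.request.urlopen(f"{self.base}/issafe", timeout=5) as resp:
            body = json.load(resp)
        if body.get("ErrorNumber", 0) != 0:
            raise RuntimeError(body.get("ErrorMessage", "Alpaca error"))
        return body["Value"]
```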
I’ve observed that, if the remote Alpaca device endpoint fails to service an HTTP request from the ADD, the ADD considers its connection to the device to be severed, and there doesn’t seem to be much in that framework to attempt a reconnect automatically before giving up. This situation then bubbles up to the client app (NINA, Voyager, etc.) as a disconnected device.
On occasion, the weather and roof controller will fail a request from one of the many clients now hitting these services for weather and safety status. The affected clients’ requests will get an error response from the service (HTTP 500 is what I’ve observed) or they could time out waiting for a response. Either way, the ADD marks this as a disconnect, and that ends up being registered by NINA et al as a disconnected device. Because the ADD looks like a “local” ASCOM driver and doesn’t know that the device it is working with is a remote Alpaca device elsewhere on the network, most clients will treat this like any other ASCOM device status change and that’s it. The context of possibly-failing network requests is lost on it.
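To make the gap concrete: the kind of logic that’s missing is a simple retry-before-giving-up wrapper around the flaky request. This is my own sketch of such a wrapper, not anything that exists in the ADD framework today:

```python
import time

def with_retries(fn, attempts: int = 3, delay: float = 1.0):
    """Retry a flaky request a few times before declaring the device lost.

    fn is any zero-argument callable that performs the request; a transient
    HTTP 500 or timeout surfaces here as an exception.
    """
    last_exc = None
    for i in range(attempts):
        try:
            return fn()
        except Exception as exc:            # e.g. HTTP 500 or a timeout
            last_exc = exc
            time.sleep(delay * (2 ** i))    # simple exponential backoff
    raise last_exc
```

Had something like this sat between the ADD and the remote endpoint, a single failed poll wouldn’t immediately bubble up to NINA as a dead device.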
Two things are being worked on at HCRO to help address this:
It’s clear that the large number of imaging PCs that now hit the weather and roof controller endpoints is putting these services under load that can reduce their availability - the classic “thundering herd” issue of load management. To deal with this, a caching service has been instituted for the roof controller’s Alpaca endpoints. This was put in place today and is now absorbing the load from the clients. It seems to be working quite well.
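The idea behind the caching service is simple: every client asking within a short window gets the same cached answer, so the device itself only sees one request per window no matter how many clients are polling. A minimal sketch of that pattern (not the actual HCRO proxy):

```python
import time

class TtlCache:
    """Tiny time-to-live cache of the sort a caching proxy puts in front
    of a device: many clients asking within the TTL window share one
    upstream request."""

    def __init__(self, fetch, ttl: float = 2.0):
        self.fetch = fetch          # callable that actually hits the device
        self.ttl = ttl
        self.value = None
        self.stamp = 0.0
        self.upstream_hits = 0      # how often the real device was queried

    def get(self):
        now = time.monotonic()
        if now - self.stamp >= self.ttl:
            self.value = self.fetch()   # only path that touches the device
            self.upstream_hits += 1
            self.stamp = now
        return self.value
```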
The weather station in particular is an acute example of the load issue. The weather station utilizes a piece of software called “ASCOM Remote”. This takes a regular Windows ASCOM driver (the Interactive Astronomy SafeAlert weather system’s driver, in this case) and creates an Alpaca network server for it. This permits everyone to reach it via the ObservingConditions ADD that everyone is advised to set up. So when the client (NINA, Voyager, etc.) uses the ADD to get weather stats, it’s contacting this ASCOM Remote component. The problem with ASCOM Remote is that it does not implement something newer to ASCOM called Device States. Device States allows an Alpaca server to bundle up multiple properties in a single response to a single query for the “device state”. ASCOM Remote is old enough that it doesn’t support this. As a result, each weather metric a client reads (temperature, humidity, air pressure, and so on) requires its own individual Alpaca/HTTP request.
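The difference in request volume is easy to see side by side. This is a conceptual illustration only: the URLs follow the Alpaca pattern, but the client here is a stand-in, and the metric names are a subset of the eleven real ones:

```python
# Four of the eleven metrics, for illustration.
METRICS = ["temperature", "humidity", "pressure", "windspeed"]

def poll_individually(http_get):
    # What ASCOM Remote forces today: one HTTP GET per property.
    return {m: http_get(f"/api/v1/observingconditions/0/{m}") for m in METRICS}

def poll_devicestate(http_get):
    # What Device States allows: a single GET bundles every property.
    return http_get("/api/v1/observingconditions/0/devicestate")
```

With all eleven metrics, the individual-poll path costs eleven requests per refresh where Device States would cost one.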
There are 11 total metrics served up by the weather station at HCRO, so that’s 11 individual queries by a client to get/update them all. In NINA, this is done every 2 seconds. Multiply that by the number of clients of all sorts at HCRO and you can then get an idea about how busy that ASCOM Remote component gets, and it sometimes fails due to that.
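To put a rough number on it (the client count below is purely illustrative; I don’t have the actual HCRO figure):

```python
metrics = 11           # properties served by the weather station
poll_period_s = 2      # NINA's refresh interval for observing conditions
clients = 20           # hypothetical count, for illustration only

requests_per_second = clients * metrics / poll_period_s
print(requests_per_second)   # 110.0 requests/s hitting ASCOM Remote
```

Even at a modest client count, that’s a steady hundred-plus requests per second against a component that was never built for it.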
A caching proxy is now also going in front of it. I’ll be testing it over the next few nights to ensure it functionally does what is needed. Eventually, guidance will be issued to members so they can re-point their ADD for the weather station to the caching proxy instead of directly at the ASCOM Remote service for it.
On the NINA side (I mention this only because I’m involved in its development), we are looking at a way to avoid the need to configure these ASCOM Dynamic Drivers in order to use Alpaca services that are not on the local subnet and so can’t be discovered via ADP. This is a bit more involved but, once in, NINA will be able to talk to such devices directly, and the need to go into ASCOM Diagnostics and configure an ASCOM Dynamic Driver will be obviated.
Hope this helps.