ERROR: 'Watcher thread is marking device DOWN'

Description

When running a CIMPLICITY project that uses the OPC Client, on occasion the following message appears in the status logs:

10/21/2015 6:30:37 PM Failure 3760 OPC_0 DEVICENAME 3821
Watcher thread is marking device DOWN!!!
Error of type: COR_DCRP_ERR, Code: 20016

What does this failure mean and how can it be rectified?

Resolution

The message means that the CIMPLICITY OPC Client's health check against the OPC Server has failed. An OPC Client must establish communications with a remote OPC Server using DCOM. The DCOM timeout that governs this communication is set to 60 seconds, which is far too long to detect communications failures promptly. For that reason, a mechanism was created to allow the CIMPLICITY OPC Client to detect a communications failure more rapidly.

To do this, three threads are used:

  1. TOOLKIT Thread – This is the “Worker” thread that actually does the writes/reads to the OPC Server.
  2. PING Thread – This thread periodically checks whether the OPC Server is still active. It does so by making a “GetStatus()” call against the OPC Server. The OPC Server must return a status of RUNNING for the call to count as successful.
  3. WATCHER Thread – This thread watches the “Ping Thread” to see how long each ping takes. If the ping thread does not return within the PingTimeout value (as set in the ini file), the connection is dropped.
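
The interaction between the ping and watcher threads can be illustrated with a minimal Python sketch. This is not CIMPLICITY code: the function name check_server, the get_status callable, and the returned strings are hypothetical stand-ins chosen for illustration, and the worker (TOOLKIT) thread is left out. It only shows the general idea of bounding a status call with a timeout much shorter than the DCOM timeout.

```python
import threading
import time


def check_server(get_status, ping_timeout_s):
    """Run one ping in its own thread and let the watcher bound its duration.

    get_status is a hypothetical stand-in for the GetStatus() call.
    """
    result = {}

    def ping():
        try:
            result["status"] = get_status()
        except Exception as exc:  # the ping itself failed
            result["error"] = exc

    ping_thread = threading.Thread(target=ping, daemon=True)
    ping_thread.start()

    # Watcher role: wait at most ping_timeout_s for the ping to return.
    ping_thread.join(ping_timeout_s)
    if ping_thread.is_alive():
        return "DOWN: ping timed out"
    if "error" in result:
        return "DOWN: ping failed"
    if result.get("status") != "RUNNING":
        return "DOWN: status was not RUNNING"
    return "UP"


def slow_server():
    # Simulates a server that answers correctly, but too slowly.
    time.sleep(0.5)
    return "RUNNING"


if __name__ == "__main__":
    print(check_server(lambda: "RUNNING", 3.0))  # healthy server
    print(check_server(slow_server, 0.05))       # slower than the timeout
```

In this sketch, a server that answers RUNNING but takes longer than the timeout is still reported as down, which is why a timeout that is too tight for a slow server produces the failure even though the server is healthy.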

When the “Watcher Thread Marking Device Down” failure message appears in the status logs, it indicates that the Watcher thread detected that the Ping Thread took too long to return. This could be due to several factors:

  • The Ping failed, in which case the device is correctly marked down.
  • The OPC Server was slow to respond, so the ping took longer than the configured timeout. To allow for this, increase the PingTimeout parameter (default of 3000 ms).
  • The OPC Server responded with an invalid status response.

Most often the cause is one of the first two. If the device is being marked down because the timing values are too tight, increase the PingInterval and PingTimeout parameters, which are found in the OPC Client Device Properties in the CIMPLICITY project.