Mail Flow Delays
Incident Report for Zix | AppRiver
Postmortem

On February 11, 2019, we identified an issue that caused customers to experience long, intermittent delays in the processing and delivery of email. Customers across all filtering sets were impacted. No customer emails were lost, but many were delayed during periods of high traffic volume. Our Dev team continued to troubleshoot the issue during the typical surge traffic periods from February 11 through February 15, and we ultimately determined that the delays were caused by DKIM test timeouts and the inability of the thread manager to recover expired DKIM test threads for further processing.

The code fix was deployed on the afternoon of February 15, 2019, and was validated once the typical surge traffic resumed on the morning of February 18, 2019.
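
The failure mode described above, a DKIM test that times out and ties up its thread, can be illustrated with a small sketch. This is not the fix that was deployed; it only shows one common way to bound a check with a deadline so that the message can move on. All names here (check_with_deadline, slow_check, fast_check, and the result labels) are hypothetical and do not come from the product.

```python
import concurrent.futures as cf
import time
from typing import Callable

# Hypothetical result labels; not taken from the incident report.
PASS, FAIL, TIMED_OUT = "pass", "fail", "timed_out"


def check_with_deadline(
    pool: cf.ThreadPoolExecutor,
    dkim_check: Callable[[bytes], bool],
    message: bytes,
    deadline_s: float,
) -> str:
    """Run dkim_check on a worker thread, but never wait longer than deadline_s."""
    future = pool.submit(dkim_check, message)
    try:
        return PASS if future.result(timeout=deadline_s) else FAIL
    except cf.TimeoutError:
        # Stop waiting so the message can continue through the pipeline.
        # cancel() only helps if the check has not started yet; a check that
        # is already running keeps its worker until it returns on its own.
        future.cancel()
        return TIMED_OUT


def slow_check(message: bytes) -> bool:
    # Stand-in for a DKIM test that stalls, for example on a slow DNS reply.
    time.sleep(0.5)
    return True


def fast_check(message: bytes) -> bool:
    # Stand-in for a DKIM test that completes promptly.
    return True
```

In this sketch the caller is never blocked past the deadline, but a stalled check still occupies its worker thread until it finishes, so the check itself would also need its own network timeouts for the pool to recover capacity under load.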

Posted Feb 21, 2019 - 16:04 CST

Resolved
This incident has been resolved.
Posted Feb 21, 2019 - 14:01 CST
Update
Since our last update, mail has continued to flow normally and without delay. However, we will continue to monitor everything throughout the rest of today and this weekend to ensure that it does. A root-cause analysis will be drafted and posted after full exploration of the issue. We understand the frustration over such an incident and sincerely apologize for any inconvenience we may have caused.
Posted Feb 15, 2019 - 15:09 CST
Monitoring
We are seeing confirmation that the corrective action our engineers and development staff implemented is continuing to resolve the mail delay issue. There are still some minor queues being worked through and senders' SMTP retry periods that have yet to elapse, but in general, mail flow has returned to normal. We are investigating the root cause and working to create a permanent fix. We will continue to update this throughout the day.
Posted Feb 15, 2019 - 11:35 CST
Update
Our engineering and development staff have rolled out a fix to hopefully alleviate the latency with mail flow. This will require approximately 1 hour to be fully benchmarked and assessed. We will update this status in about 1 hour. We understand the frustration over such an incident and sincerely apologize for any inconvenience.
Posted Feb 15, 2019 - 10:25 CST
Investigating
With the normal increase in traffic this morning, it has become apparent that our mitigation efforts were not sufficient. We are currently testing additional vectors to prevent latency. Testing has been deployed and is being evaluated. We will continue to update this page as we gather data.
Posted Feb 15, 2019 - 08:38 CST
Monitoring
Mail flow has generally normalized but we are going to continue monitoring to ensure resolution of this issue. A root-cause analysis will be drafted and posted after full exploration of the issue and its impacts.
Posted Feb 14, 2019 - 16:52 CST
Identified
We have mitigated the email delays. There are still queues that will need to be processed, and this could take upwards of an hour. New email flow is returning to normal and mail delivery should completely resolve shortly. Please accept our apology for this most inconvenient delay. A root-cause analysis will be drafted and posted after full exploration of the issue and its impacts. Future processes will also be implemented to prevent such an occurrence from happening again.
Posted Feb 14, 2019 - 15:32 CST
Update
We continue to investigate the mail flow latency issue and are working to address all concerns. We understand the frustration over such an incident and sincerely apologize for any inconvenience. More updates to come.
Posted Feb 14, 2019 - 13:12 CST
Investigating
Our staff is still working to resolve the mail flow delays that were identified previously. We’re still testing the issue and are focused on applying a quality resolution. More info on the issue soon.
Posted Feb 14, 2019 - 08:48 CST
Monitoring
We have taken corrective action to address the mail flow delay reported previously and continue to monitor it closely.
Posted Feb 13, 2019 - 15:10 CST
Investigating
We are currently aware of intermittent mail flow delays and are investigating the issue.
Posted Feb 13, 2019 - 08:43 CST