Azure Multi Service Impact in Switzerland North

Incident Report for Sprinklr

Postmortem

What happened?

Between 23:54 UTC on 26 September 2025 and 21:59 UTC on 27 September 2025, a platform issue affected customers in Switzerland North, who may have experienced service unavailability or degraded performance for resources hosted in the region. Affected Azure services included API Management, App Service, Application Gateway, Application Insights, Azure Cache for Redis, Azure Cosmos DB, Azure Data Explorer, Azure Database for PostgreSQL, Azure Databricks, Azure Firewall, Azure Kubernetes Service, Azure Storage, Azure Synapse Analytics, Backup, Batch, Data Factory, Log Analytics, Microsoft Sentinel, SQL Database, SQL Managed Instance, Site Recovery, Stream Analytics, Virtual Machine Scale Sets, and Virtual Machines. Other services that depend on these may also have been impacted. Most customers would have seen mitigation at approximately 04:00 UTC on 27 September 2025, while those affected by long-tail issues that persisted after the initial mitigation steps would have seen full mitigation at 21:59 UTC on 27 September 2025.

What do we know so far?

We were alerted by our telemetry to a significant drop in traffic. We identified that a recent deployment had introduced a malformed prefix in one of the certificates used for connection authorization for resources within the region. Once we identified the deployment error, we rolled it back to restore normal traffic flow and service availability. After the rollback was completed, additional long-tail recovery was required for a subset of resources that still had connectivity issues.

How did we respond?

23:54 UTC on 26 September 2025 – Customer impact began.
00:08 UTC on 27 September 2025 – Issue detected via automated monitoring.
00:12 UTC on 27 September 2025 – Investigation commenced by Storage and Networking engineering teams.
02:33 UTC on 27 September 2025 – Cause identified as the recent deployment, and a rollback to a previous safe state was initiated.
04:00 UTC on 27 September 2025 – Rollback successfully completed. Additional validation was performed for impacted services, and further connectivity issues were identified for a subset of resources.
16:15 UTC on 27 September 2025 – Long-tail recovery operations were investigated and performed to recover resources with remaining connectivity issues. This included applying mitigation scripts and performing steps to safely reboot and recover the remaining subset of resources.
21:59 UTC on 27 September 2025 – Long-tail recovery activities and validation completed, confirming full recovery of all impacted services and customers.

What happens next?

Our team will be completing an internal retrospective to understand the incident in more detail. We will share preliminary findings within 3 days. Once our internal retrospective is completed, generally within 14 days, we will publish a Final Post Incident Review (PIR) to all impacted customers.

Posted Sep 30, 2025 - 02:36 UTC

Resolved

Hi Team,

We’d like to inform you that the issue affecting access to Sprinklr platform URLs has been marked as operational by the cloud service provider (CSP) for Azure Switzerland North. The majority of the impacted services have fully recovered, and recovery of the remaining subset is nearing completion. We continue to monitor traffic and service stability to ensure full recovery.

For more details, you can refer to the official Azure status page: https://azure.status.microsoft/en-gb/status

Thank you for your patience and understanding during this incident. Should you face any further difficulties, please feel free to reach out to Sprinklr Support.

Posted Sep 27, 2025 - 09:41 UTC

Monitoring

Hi Team,

This is to inform you that Azure has confirmed that the majority of their impacted services have fully recovered, and recovery of the remaining subset is nearing completion. They will continue to monitor traffic and service stability to ensure full recovery.

We shall keep you posted as soon as we have further updates.

Posted Sep 27, 2025 - 06:01 UTC

Update

Hi Team,

This is to inform you that Azure has identified the root cause and begun rolling back the faulty deployment. Early signs of recovery are positive, and full mitigation is expected by 05:30 UTC. We shall keep you posted as soon as we have further updates.

Posted Sep 27, 2025 - 03:55 UTC

Update

Hi Team,

This is to inform you that Azure has identified a networking-related issue that is impacting multiple services in this region. They are actively investigating a recent networking change as a potential root cause and are preparing a hotfix to address the issue.
We shall keep you posted as soon as we have further updates.

Posted Sep 27, 2025 - 02:30 UTC

Investigating

This is to inform you that we are observing errors while opening Prod18 Sprinklr platform URLs. The errors appear to be caused by an issue with Virtual Machines on Azure. We will keep you posted as soon as we have an update.

We apologise for the inconvenience and appreciate your patience as we work towards fully restoring services. For further assistance or updates, please reach out to our support team.

Posted Sep 27, 2025 - 00:56 UTC

This incident affected: Cloud Hosting Provider (EMEA (Prod18)).