Downtime

Users unable to log in; API response failures

Jun 13, 2023 at 8:30am UTC

Affected services

EU Web App

Import Service

Export Service

Batch Service

Image Service

Resolved
Jun 18, 2023 at 1:56pm UTC

Please see below for our root cause analysis (RCA) of this incident.

Duration:

From 2:56am BST on Tuesday 13th June until 9:05am on Wednesday 14th June, users experienced intermittent issues logging in to Pimberly, increased error responses from our API, and delays to scheduled jobs.

Root cause:

The root cause was identified as an integration between Pimberly and BigCommerce. As soon as the issue was identified, we disabled the specific service; the problem ceased, and Pimberly returned to a fully operational state.

How we will avoid this in the future:

We have refined a number of logging tools to ensure that any similar scenarios are captured and alerted to us immediately and automatically.

A fix has been applied to protect against this scenario and prevent further occurrences.

Impact on customers:

· Users:
o When the platform was up and running, users could use all Pimberly functionality as usual
o On Tuesday, ad-hoc imports and exports were potentially delayed, as they were likely caught up in the overall backlog of jobs from the overnight delays
· Overnight feeds and channels:
o Overnight jobs were delayed on both Monday night and Tuesday night, although we processed the vast majority of them during the day on Tuesday and the remainder on Wednesday
o Normal access was restored at 9:05am on Wednesday morning
· Sandbox was not affected by the incident, so it remained fully operational throughout
· No data was lost during this incident

If you have any questions, please contact your account manager, our support team, or Mike.walker@pimberly.com.

Updated
Jun 14, 2023 at 3:35pm UTC

Ongoing monitoring of the application and database has confirmed that there have been no further recurrences of the issue. The backlog of queued jobs has now been processed, and new jobs are being picked up as they are generated.

A full post-mortem analysis of this incident will be published in due course.

Updated
Jun 14, 2023 at 12:02pm UTC

We have identified the root cause of the ongoing issue. We are continuing to monitor overall application and database performance, and are scaling resources further to process the queue of pending jobs.

We will provide a further update once the queue has been processed, and a full post-mortem analysis of this incident will be published in due course.

Updated
Jun 14, 2023 at 8:23am UTC

Our engineers are continuing to investigate the root cause of the issues being experienced. Task processing for overnight jobs and ad-hoc imports and exports may be delayed but we are actively managing the queue and scaling our resources as required.

Updated
Jun 13, 2023 at 7:30pm UTC

Our API and task processing services have stabilised, and scheduled imports and exports are executing as normal. We are still seeing some performance issues on our core database cluster, and our engineers are continuing to investigate the cause of these issues with our database provider.

Updated
Jun 13, 2023 at 3:19pm UTC

Despite periods of stability, we're continuing to experience further intermittent outages affecting the Pimberly application, the API and task processing. Our engineers are continuing to investigate with our database provider in order to find a resolution.

Updated
Jun 13, 2023 at 1:02pm UTC

Since deploying additional database resources, we have seen improved stability over the last hour; we continue to monitor and react as required. The underlying root cause is still being investigated by our engineering team and our database provider.

We will post further updates this afternoon.

Updated
Jun 13, 2023 at 11:27am UTC

Our engineers continue to investigate the ongoing incidents of unexpected downtime. We are in contact with our database provider who is assisting with the investigation.

As a mitigation step we are deploying additional database resource, and scaling our task processing infrastructure to help process the queue of pending tasks.

Updated
Jun 13, 2023 at 10:28am UTC

We are continuing to investigate the issue and will post a further update before 12:30 BST.

Updated
Jun 13, 2023 at 9:50am UTC

We're currently investigating a further occurrence of unexpected downtime affecting the Pimberly app and API.

We will post a further update before 11:30 BST.

Updated
Jun 13, 2023 at 8:59am UTC

Access to Pimberly has now been restored and the API is responding to requests as normal.

Our task infrastructure is now processing the queue of jobs such as feeds and channels. We will actively monitor and scale our resources to help process the queue as quickly as possible.

Created
Jun 13, 2023 at 8:30am UTC

We are currently investigating an issue that is preventing users from logging in to Pimberly, and that is also causing timeouts or other errors from our API. Scheduled feeds and channels are also affected.

Our senior engineers are working on this as a priority. We will share our next update before 10:00 BST.