Users unable to log in; API response failures
Resolved
Jun 18 at 02:56pm BST
Please see below for our RCA of this incident.
Duration:
From 2:56am BST on Tuesday 13th June until 9:05am BST on Wednesday 14th June, users experienced intermittent issues logging in to Pimberly, increased error responses from our API, and delays to scheduled jobs.
Root cause:
The root cause has been identified as an integration between Pimberly and BigCommerce. As soon as the issue was identified, we disabled the specific service and the problem ceased, returning Pimberly to a fully operational state.
How we will avoid this in the future:
We have refined a number of logging tools so that any similar scenarios are captured and we are alerted immediately and automatically.
A fix has been applied to protect against this scenario and prevent further occurrences.
Impact on customers:
· Users:
o When the platform was up and running, users could use all Pimberly functionality as usual
o On Tuesday, ad-hoc imports and exports were potentially delayed, as they were likely caught up in the overall backlog of jobs from the overnight delays
· Overnight feeds and channels:
o Delays would have occurred to overnight jobs on both Monday night and Tuesday night, although we managed to process the vast majority throughout the day on Tuesday and everything on Wednesday
o Normal access was returned at 9:05am on Wednesday morning
· Sandbox was not impacted by the incident and remained fully operational throughout
· No data was lost during this incident
If you have any questions, please contact your account manager, support, or Mike.walker@pimberly.com
Affected services
Web Interface
Import Service
Export Service
Batch Service
Image Service
Updated
Jun 14 at 04:35pm BST
Ongoing monitoring of the application and database has confirmed there have been no further recurrences of the issue. The queue of jobs has now been processed and is picking up new jobs as they are generated.
A full post-mortem analysis of this incident will be published in due course.
Affected services
Web Interface
Import Service
Export Service
Batch Service
Image Service
Updated
Jun 14 at 01:02pm BST
We have identified the root cause of the ongoing issue. We are continuing to monitor overall application and database performance and scaling resources further to process the queue of pending jobs.
We will provide a further update once the queue has been processed, and a full post-mortem analysis of this incident will be published in due course.
Affected services
Web Interface
Import Service
Export Service
Batch Service
Image Service
Updated
Jun 14 at 09:23am BST
Our engineers are continuing to investigate the root cause of the issues being experienced. Task processing for overnight jobs and ad-hoc imports and exports may be delayed but we are actively managing the queue and scaling our resources as required.
Affected services
Web Interface
Import Service
Export Service
Batch Service
Image Service
Updated
Jun 13 at 08:30pm BST
Our API and task processing services have stabilised, and scheduled imports and exports are executing as normal. We are still seeing some performance issues on our core database cluster, and our engineers are continuing to investigate the cause with our database provider.
Affected services
Web Interface
Import Service
Export Service
Batch Service
Image Service
Updated
Jun 13 at 04:19pm BST
Despite periods of stability, we're continuing to experience further intermittent outages affecting the Pimberly application, the API and task processing. Our engineers are continuing to investigate with our database provider in order to find a resolution.
Affected services
Web Interface
Import Service
Export Service
Batch Service
Image Service
Updated
Jun 13 at 02:02pm BST
Since deploying additional database resource, we have seen improvements to stability over the last hour; we continue to monitor and react as required. The underlying root cause continues to be investigated by our engineering team and our database provider.
We will post further updates this afternoon.
Affected services
Web Interface
Import Service
Export Service
Batch Service
Image Service
Updated
Jun 13 at 12:27pm BST
Our engineers continue to investigate the ongoing incidents of unexpected downtime. We are in contact with our database provider, who is assisting with the investigation.
As a mitigation step, we are deploying additional database resource and scaling our task processing infrastructure to help process the queue of pending tasks.
Affected services
Web Interface
Import Service
Export Service
Batch Service
Image Service
Updated
Jun 13 at 11:28am BST
We are continuing to investigate the issue and will post a further update before 12:30 BST.
Affected services
Web Interface
Import Service
Export Service
Batch Service
Image Service
Updated
Jun 13 at 10:50am BST
We're currently investigating a further occurrence of unexpected downtime affecting the Pimberly app and API.
We will post a further update before 11:30 BST.
Affected services
Web Interface
Import Service
Export Service
Batch Service
Image Service
Updated
Jun 13 at 09:59am BST
Access to Pimberly has now been restored and the API is responding to requests as normal.
Our task infrastructure is now processing the queue of jobs such as feeds and channels. We will actively monitor and scale our resources to help process the queue as quickly as possible.
Affected services
Web Interface
Import Service
Export Service
Batch Service
Image Service
Created
Jun 13 at 09:30am BST
We are currently investigating an issue that is preventing users from logging in to Pimberly, and that is also causing timeouts or other errors from our API. Scheduled feeds and channels are also affected.
Our senior engineers are working on this as a priority. We will share our next update before 10:00 BST.
Affected services
Web Interface
Import Service
Export Service
Batch Service
Image Service