On June 1 at 14:50 UTC, a customer opened a support case reporting that specific documents were missing from an index where they were expected. Our customer support team quickly established that the documents were still present in the database and directly accessible by reference, and confirmed that they were omitted from the index. The support on-call engaged the engineering on-call for the database team to investigate. The engineering on-call suspected that our garbage collector, which cleans up documents that have been deleted or have outlived their configured time-to-live (TTL), might be causing the issue, and disabled garbage collection at 15:54 UTC as a precaution. Two more customers reported documents missing from indexes through additional support cases, and additional engineers were brought in to investigate each report. At 17:20 UTC, the engineering team identified the root cause: a code defect led the garbage collector to write a partial history for some documents with a large number of versions. As a result, the indexing system missed adds and deletes for those documents and incorrectly included them in, or excluded them from, indexes. At 18:08 UTC, the engineering team began repairing the documents known to be impacted, and the repair completed on June 2 at 04:01 UTC.
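For readers who want a concrete picture of this failure mode, the following is a minimal illustrative sketch, not our actual implementation. It assumes a simplified index that tracks documents whose latest version has status "active", and it assumes the partial history drops the most recent version; the names `Version`, `replay`, and `find_inconsistent` are hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Version:
    seq: int
    status: Optional[str]  # None marks a delete (hypothetical convention)


def replay(index: Set[str], doc_id: str, history: List[Version]) -> None:
    """Apply each version in order: add the document on a match, remove it otherwise."""
    for version in sorted(history, key=lambda v: v.seq):
        if version.status == "active":
            index.add(doc_id)
        else:
            index.discard(doc_id)


def find_inconsistent(index: Set[str], latest: Dict[str, Version]) -> Dict[str, Set[str]]:
    """Compare index membership against the latest version stored in the database."""
    expected = {doc_id for doc_id, v in latest.items() if v.status == "active"}
    return {"missing": expected - index, "extra": index - expected}


# Complete histories as the database knows them.
full_history = {
    "a": [Version(1, "draft"), Version(2, "active")],  # should be in the index
    "b": [Version(1, "active"), Version(2, None)],     # deleted, should not be
}

# Assumption for illustration: the partial history loses the newest version.
partial_history = {doc_id: h[:-1] for doc_id, h in full_history.items()}

index: Set[str] = set()
for doc_id, history in partial_history.items():
    replay(index, doc_id, history)

latest = {doc_id: h[-1] for doc_id, h in full_history.items()}
report = find_inconsistent(index, latest)
print(report)
```

In this sketch, document "a" is wrongly excluded because its add was never replayed, and document "b" is wrongly included because its delete was never replayed. Both documents remain correct in the database itself, which matches the symptom customers reported.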
We know that data inconsistencies are unacceptable and we are prioritizing work to improve. Specifically, we’re taking the following steps:
We prioritize the availability, security, performance, and correctness of our service above everything else and apologize for any inconvenience that this event may have caused you. If you have further questions/comments about the event or require assistance with any remaining issues related to the event, please reach out to firstname.lastname@example.org.