Ferruccio Balestreri, CTO and Co-Founder at June

04 Feb 22

June's outage on January 12-13th 2022

We had an outage between Wednesday the 12th and Thursday the 13th of January 2022. After our standup we noticed some strange behavior with our Slack notifications: the volume of notifications we were sending didn't look right. Looking at our data, we couldn't see some of our events for the previous 18 hours.

While investigating why we weren't processing as many Slack notifications as usual, we learned that a code change had broken the ingestion of some events from Segment.

After a short time, we identified and fixed the issue.

Customer impact

If you were one of our impacted customers, you should have already received an email about this. If you haven't received an email, there's no impact on your workspace.

From 12 January 2022, 14:00 UTC to 13 January 2022, 10:00 UTC, we didn't ingest some events coming from backend SDKs, so you might have some data gaps in June for that interval.

After some discussions over the last 10 days, Segment was able to replay data into June for customers on their Business plan. This fixed the inconsistencies for those workspaces.

Unfortunately, they can't do this for all of our impacted customers: only customers on the Segment Business plan were able to replay data into June.

The technical details

What happened is that we introduced some faulty code in our event ingestion pipeline.

Most events that we receive from Segment's JavaScript SDK look like this:

{
  "anonymousId": "5092b127-80ae-474e-b99c-b019154a7197",
  "context": {
    "ip": "1.66.060.202",
    "library": {
      "name": "analytics.js",
      "version": "next-1.31.2"
    },
    "locale": "en-GB",
    "page": {
      "path": "/a/1234/event/5678",
      "referrer": "",
      "search": "",
      "title": "June Analytics",
      "url": "https://app.june.so/a/1234/event/5678"
    },
    "userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36"
  },
  "event": "events_opened",
  "integrations": {},
  "messageId": "ajs-next-166d11332078db31b9b74a41aabdbbf4",
  "originalTimestamp": "2022-01-27T11:11:56.086Z",
  "properties": {},
  "receivedAt": "2022-01-27T11:11:56.648Z",
  "sentAt": "2022-01-27T11:11:56.086Z",
  "timestamp": "2022-01-27T11:11:56.648Z",
  "type": "track",
  "userId": "1234"
}
An example of a track event we'd receive from Segment

When we receive backend events through Segment, the payload looks like this instead:

{
  "context": {
    "library": {
      "name": "analytics-ruby",
      "version": "2.2.8"
    }
  },
  "event": "event_created",
  "integrations": {},
  "messageId": "c97eac8b-2818-42de-b19e-f5e588abb4da",
  "originalTimestamp": "2022-01-27T11:24:11.863+00:00",
  "properties": {
    "event_count": 1
  },
  "receivedAt": "2022-01-27T11:24:12.384Z",
  "sentAt": "2022-01-27T11:24:11.865Z",
  "timestamp": "2022-01-27T11:24:12.382Z",
  "type": "track",
  "userId": "1234",
  "writeKey": "A_WRITE_KEY"
}
An example of a backend track event
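
Comparing the two examples, the backend payload has no anonymousId, no page context, and carries a writeKey. As a rough illustration of a validation step that accepts both shapes, here is a minimal Python sketch. It is hypothetical and not the actual ingestion code: the function names and the set of required fields are assumptions based only on the two example payloads.

```python
from __future__ import annotations

# Hypothetical sketch: field names are taken from the two example payloads;
# the function names and rules are assumptions, not June's real code.
REQUIRED_FIELDS = ("event", "messageId", "timestamp", "type")


def validation_errors(payload: dict) -> list[str]:
    """Return the reasons to reject a track payload (empty list if valid)."""
    errors = [f"missing {name}" for name in REQUIRED_FIELDS if name not in payload]
    # Backend libraries often send only userId, so accept either identifier.
    if not payload.get("userId") and not payload.get("anonymousId"):
        errors.append("missing userId or anonymousId")
    return errors


def is_backend_event(payload: dict) -> bool:
    """Heuristic from the examples: backend payloads carry no page context."""
    return "page" not in payload.get("context", {})


browser_event = {
    "anonymousId": "5092b127-80ae-474e-b99c-b019154a7197",
    "context": {
        "library": {"name": "analytics.js"},
        "page": {"path": "/a/1234/event/5678"},
    },
    "event": "events_opened",
    "messageId": "ajs-next-166d11332078db31b9b74a41aabdbbf4",
    "timestamp": "2022-01-27T11:11:56.648Z",
    "type": "track",
    "userId": "1234",
}
backend_event = {
    "context": {"library": {"name": "analytics-ruby"}},
    "event": "event_created",
    "messageId": "c97eac8b-2818-42de-b19e-f5e588abb4da",
    "timestamp": "2022-01-27T11:24:12.382Z",
    "type": "track",
    "userId": "1234",
    "writeKey": "A_WRITE_KEY",
}

# Both shapes should pass validation even though their fields differ.
for name, payload in (("browser", browser_event), ("backend", backend_event)):
    print(name, is_backend_event(payload), validation_errors(payload))
```

The point of the sketch is that validation should depend only on fields both kinds of payload share, so that a change tested against browser events does not silently reject backend ones.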

The faulty code meant that for 19 hours we rejected many backend events from Segment.

Rejecting these events caused a spike in 401 (Unauthorized) errors. Our instrumentation didn't flag it at the time, because the number of requests we receive is highly volatile and the spike didn't stand out.
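
When request volume is volatile, a fixed count of errors is a poor signal, while the share of requests that fail is more stable. The following Python sketch illustrates that idea only. It is not our actual monitoring setup: the `WindowStats` type, the 5% threshold, and the minimum sample size are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Request counts for one monitoring window (hypothetical type)."""
    total_requests: int
    unauthorized: int  # responses with HTTP status 401


def unauthorized_ratio(stats: WindowStats) -> float:
    """Share of requests in the window that were answered with a 401."""
    if stats.total_requests == 0:
        return 0.0
    return stats.unauthorized / stats.total_requests


def should_alert(stats: WindowStats, max_ratio: float = 0.05,
                 min_requests: int = 100) -> bool:
    """Alert on the error ratio, not the raw count.

    max_ratio and min_requests are illustrative values, not tuned thresholds.
    """
    # Skip tiny windows, where a handful of errors would skew the ratio.
    if stats.total_requests < min_requests:
        return False
    return unauthorized_ratio(stats) > max_ratio


quiet_hour = WindowStats(total_requests=1_000, unauthorized=10)
busy_hour = WindowStats(total_requests=50_000, unauthorized=500)
broken_hour = WindowStats(total_requests=1_000, unauthorized=400)

for window in (quiet_hour, busy_hour, broken_hour):
    print(window, should_alert(window))
```

In this sketch the busy hour has fifty times more 401s than the quiet hour, yet neither triggers an alert because the ratio is the same. Only the window where a large share of requests is rejected does.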

Our changes moving forward

In the last two weeks, we followed up on this incident to make sure we can catch things like this a lot earlier as they happen.

  1. We improved our instrumentation setup. We can now identify these problems within minutes. This makes it easier to spot new instances of similar errors.
  2. We improved our approval process for any code changes that impact our data ingestion. This should make it less likely for similar incidents to happen in the future.

Closing out

We deeply regret the disruption in service. Every incident is an opportunity to learn, and an unplanned investment in future reliability. We learned a lot from this incident and, as always, we intend to make the most of this unplanned investment to make our infrastructure better in 2022 and beyond.

If you have any questions, don't hesitate to reach out. We're here to address any of your concerns.
