Surprisingly, one of my biggest costs when running a serverless application has been the CloudTrail bill. Let’s look at some methods we can use to keep it under control.
The best place to start troubleshooting is the cost breakdown. Using a profile that has access to billing, navigate to Billing > Bills > Charges by Service.
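If you would rather pull the same breakdown from a script, here is a minimal sketch using the Cost Explorer API. It assumes boto3 is installed, Cost Explorer is enabled on the account, and that CloudTrail is listed under the service name "AWS CloudTrail" (check the name shown on your own bill). The function names are mine, chosen for illustration.

```python
from datetime import date


def fetch_cloudtrail_usage(start: date, end: date) -> dict:
    """Ask Cost Explorer for CloudTrail charges grouped by usage type.

    Needs AWS credentials with Cost Explorer access. The End date is exclusive.
    """
    import boto3  # imported lazily so summarize_usage works without AWS access

    ce = boto3.client("ce")
    return ce.get_cost_and_usage(
        TimePeriod={"Start": start.isoformat(), "End": end.isoformat()},
        Granularity="MONTHLY",
        Metrics=["UnblendedCost", "UsageQuantity"],
        # Assumption: the service name as it appears in your billing data.
        Filter={"Dimensions": {"Key": "SERVICE", "Values": ["AWS CloudTrail"]}},
        GroupBy=[{"Type": "DIMENSION", "Key": "USAGE_TYPE"}],
    )


def summarize_usage(response: dict) -> dict:
    """Collapse the response into {usage_type: (event_count, cost)}."""
    totals = {}
    for period in response.get("ResultsByTime", []):
        for group in period.get("Groups", []):
            usage_type = group["Keys"][0]
            metrics = group["Metrics"]
            quantity = float(metrics["UsageQuantity"]["Amount"])
            cost = float(metrics["UnblendedCost"]["Amount"])
            old_quantity, old_cost = totals.get(usage_type, (0.0, 0.0))
            totals[usage_type] = (old_quantity + quantity, old_cost + cost)
    return totals
```

Calling `summarize_usage(fetch_cloudtrail_usage(date(2024, 1, 1), date(2024, 2, 1)))` should give one entry per usage type, which makes it easy to spot two line items with the same event count.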
You should see something like the following…
We can see a couple of interesting things here. We have three different kinds of events, and two of them have an identical event count. That suggests we are paying for a duplicate copy of event records that should be free.
Let’s discuss the three types of events that we see in this billing breakdown.
Data events record API actions that access or modify the contents of resources in your AWS account, such as retrieving an object from an S3 bucket or modifying an item in a DynamoDB table, whether the call comes from the AWS SDKs, command line tools, or other AWS services.
Management events are very similar, but they record management-level operations such as resource creation (e.g. creating a new S3 bucket). They also record additional detail: not only what happened, but also who performed the action.
The line between a data event and a management event seems to be a bit blurry. For example, programmatically adding a new file to an S3 bucket is considered a data event, while programmatically creating a Glue table partition is a management event. 🤷‍♂️
The other two event types that we see in the billing console, FreeEventsRecorded and PaidEventsRecorded, are both management events.
In short, the first copy of every management event is recorded for free. Additional copies of those events cost money.
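To put a number on those additional copies, here is a quick back-of-the-envelope sketch. The price constant is an assumption based on the published list price for additional copies of management events at the time of writing; check the CloudTrail pricing page for your region before relying on it.

```python
# Assumed list price in USD for additional copies of management events.
# Verify against the current CloudTrail pricing page.
PAID_MANAGEMENT_PRICE_PER_100K = 2.00


def paid_management_cost(paid_events: int,
                         price_per_100k: float = PAID_MANAGEMENT_PRICE_PER_100K) -> float:
    """Estimate the charge for a PaidEventsRecorded event count."""
    return paid_events / 100_000 * price_per_100k
```

Plug in the PaidEventsRecorded count from your own bill to see how much the duplicate copy is costing each month.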
Somewhere in our settings we have unnecessarily enabled a second copy of the management event records. Let’s see if we can track that down.
If we navigate to CloudTrail > Trails we can find the culprits.
We have two trails here, and I can tell right away that the Step Functions trail is the one causing this issue. The Cognito logs will have nowhere near the volume of the Step Functions trail.
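The console is the quickest way to eyeball this, but the same check can be scripted. Below is a rough sketch, assuming boto3 and read access to CloudTrail; the helper names are hypothetical. It looks at each trail's event selectors and reports which trails record management events. If more than one trail in a region shows up, the extras are the paid copies.

```python
def logs_management_events(selectors: dict) -> bool:
    """Return True if a get_event_selectors response includes management events."""
    for selector in selectors.get("EventSelectors") or []:
        # IncludeManagementEvents defaults to True when not set explicitly.
        if selector.get("IncludeManagementEvents", True):
            return True
    for advanced in selectors.get("AdvancedEventSelectors") or []:
        for field in advanced.get("FieldSelectors", []):
            if field.get("Field") == "eventCategory" and "Management" in field.get("Equals", []):
                return True
    return False


def trails_with_management_events(selectors_by_trail: dict) -> list:
    """Given {trail_name: selectors}, list the trails recording management events."""
    return sorted(
        name for name, selectors in selectors_by_trail.items()
        if logs_management_events(selectors)
    )


def fetch_selectors() -> dict:
    """Collect event selectors for every trail visible in the current region."""
    import boto3  # imported lazily so the helpers above work without AWS access

    cloudtrail = boto3.client("cloudtrail")
    trails = cloudtrail.describe_trails()["trailList"]
    return {
        trail["Name"]: cloudtrail.get_event_selectors(TrailName=trail["TrailARN"])
        for trail in trails
    }
```

Running `trails_with_management_events(fetch_selectors())` against the account should return the names of every trail that is keeping its own copy of management events.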
If we examine this trail, we have a couple of options. We can see that it is tracking all read/write data events on all buckets, so we could simply disable the tracking of data events. Alternatively, we could make it more specific so that it only tracks particular buckets.
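For the "more specific" option, here is a minimal sketch of what a narrowed event selector could look like with boto3. The bucket and trail names are placeholders and the helper names are my own. Note that `put_event_selectors` replaces whatever selectors the trail already has, so treat this as a starting point rather than a drop-in fix.

```python
def s3_data_event_selector(bucket_names, include_management=False, read_write_type="All"):
    """Build an event selector that records S3 data events for specific buckets only."""
    return {
        "ReadWriteType": read_write_type,  # "ReadOnly", "WriteOnly" or "All"
        "IncludeManagementEvents": include_management,
        "DataResources": [
            {
                "Type": "AWS::S3::Object",
                # A trailing slash after the bucket name covers every object in it.
                "Values": [f"arn:aws:s3:::{name}/" for name in bucket_names],
            }
        ],
    }


def apply_selector(trail_name: str, selector: dict) -> None:
    """Overwrite the trail's event selectors with the one given."""
    import boto3  # imported lazily so the builder above works without AWS access

    boto3.client("cloudtrail").put_event_selectors(
        TrailName=trail_name,
        EventSelectors=[selector],
    )
```

For example, `apply_selector("my-trail", s3_data_event_selector(["my-bucket"]))` would limit the trail to data events on a single bucket and stop it from recording management events.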
Or we could choose to disable this trail entirely. Now that we understand that this trail is actually a duplicate copy of all management event logs, that seems like the best option for us.
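Disabling the trail can be done from the console, or with a couple of lines of boto3. This is a sketch with a hypothetical helper name: it takes a CloudTrail client as an argument and either stops logging (reversible) or deletes the trail outright.

```python
def disable_trail(client, trail_name: str, delete: bool = False) -> str:
    """Stop a trail from recording events, or delete it if delete=True.

    `client` is expected to be a boto3 CloudTrail client,
    e.g. boto3.client("cloudtrail").
    """
    if delete:
        client.delete_trail(Name=trail_name)
        return "deleted"
    # Stopping keeps the trail configuration so it can be re-enabled later.
    client.stop_logging(Name=trail_name)
    return "stopped"
```

Stopping first and deleting later is the cautious route, since it leaves the configuration in place in case the trail turns out to be needed after all.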