A Five Minute Overview of AWS Macie

(If you would prefer to listen to this article, click this link to hear it using Amazon Polly. It will also be available in iTunes: search for LabR Learning Resources.)

AWS Macie is a managed service that scans your stored data for sensitive information which may be exposed to outside parties. Macie uses machine learning to scan your data, classify it, and protect the sensitive data. This sounds like a great tool to have in your AWS security program. As of April 2019, Macie only supports analyzing data stored in S3.

Once you enroll in AWS Macie, you configure the S3 buckets to analyze, and then review your alerts in the Macie dashboard. Let’s take a look at setting up and configuring Macie.

Setting up AWS Macie

Setting up Macie is as simple as going to your AWS Console and selecting AWS Macie from the services menu.

Enable AWS Macie for your Account

After enabling Macie, the dashboard view is displayed. As we haven’t integrated anything into Macie at this point, the dashboard is empty. The dashboard view contains a high level view of critical alerts, event occurrences, user sessions, and users. The S3 data analysis consumes the majority of the dashboard view, and there are different views of the analysis. Here is a view of a dashboard after Macie has been set up and some analysis completed.

Initial AWS Macie Dashboard view

Integrations

As mentioned, Macie will not analyze anything until you configure some integrations. This involves selecting the AWS accounts Macie should examine, and which S3 buckets should be scanned.

In our example, there is only one AWS account, but it must still be selected for integration with Macie.


After selecting the account(s) to integrate, you can select the S3 buckets which Macie should analyze.


You may wish to be selective about the S3 buckets, as Macie can flag events that match one of its settings but are false positives because of their source. For example, including the S3 access log bucket in our sample may not be appropriate, as you can allow Macie to access CloudTrail for your account. Keeping our S3 access log bucket may result in duplicate findings between CloudTrail and the S3 bucket.

After selecting the buckets, Macie gives you the opportunity to review the selections you have made and gives you an estimated cost of the initial analysis.


After reviewing the settings, there is a final confirmation screen, at which point Macie will start processing the selected S3 buckets and classifying the data. It is just a matter of waiting for the analysis to be completed.

This is the one annoyance I found while preparing for this article: there is no easy way to determine if Macie has completed the initial scan. Indeed, while I was writing this article, the number of event alerts continued to increase! The ever-increasing results may have been because Macie had not yet completed the initial scan, or because Macie was scanning new data added to an S3 bucket by another process.

Once the S3 buckets have been integrated, you have the option of selecting additional accounts to integrate. Once you have finished integrating your accounts, you can review and possibly change the default Macie settings.

Settings

Macie comes with a very comprehensive configuration, which is accessed from the Settings view.


For each of the sections, you can see the risk rating associated with the analysis rule, what conditions are used to perform the match (such as specific keywords), and either edit the item (pencil) or search the existing findings for that specific item (magnifying glass).

Content Types

Macie makes use of the MIME type associated with the file format. This section has a very comprehensive list of content types, 207 in fact. From the content types view, you can select the pencil to edit the settings, or the magnifying glass to search the Macie analysis for the selected item. This behavior is consistent across all of the settings.

Supported Content Types

File Extension

Sometimes we use custom file extensions, or a file's extension suggests a data or content type different from what the file actually contains. For example, a .doc extension can appear on a Microsoft Word document or on a plain text document. Macie is aware of a long list of file extensions, 83 of them as of April 2019.
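
To illustrate why an extension alone can mislead, here is a minimal Python sketch that compares the MIME type guessed from a file name with a check of the file's leading bytes. It is not how Macie classifies objects; the helper `has_ole2_signature` and the sample data are made up for illustration. The only fact it relies on is that legacy binary Office files, including .doc, begin with the OLE2 compound-file signature.

```python
import mimetypes

# Signature at the start of OLE2 compound files, the container used by
# legacy binary Office documents such as .doc.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def has_ole2_signature(data: bytes) -> bool:
    """Hypothetical helper: True if the bytes start with the OLE2 signature."""
    return data.startswith(OLE2_MAGIC)


# Two objects that could both be named "notes.doc" (fabricated sample data).
binary_doc = OLE2_MAGIC + b"\x00" * 16        # stand-in for a binary Word file
text_doc = b"Just plain text saved as .doc"   # text file with a .doc name

# The extension-based guess is the same regardless of the content.
guessed_type, _ = mimetypes.guess_type("notes.doc")
print(guessed_type)

# The content check tells the two objects apart.
print(has_ole2_signature(binary_doc))
print(has_ole2_signature(text_doc))
```

A classifier that looks at both the extension and the content can catch the mismatch that the extension alone hides.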

Supported File Extensions

Theme

A theme is a title associated with a keyword selection and associated with a file regardless of content type or file extension. For example, the “Attorney Client Privileged” theme is associated with an object if the object contains the keyword “attorney-client”, “attorney client” or “privileged”. At first glance this could be a source of false positives, but the theme is only applied if there are at least 2 of the keyword combinations found in the object.
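
The matching rule described above can be sketched in a few lines of Python. This is an illustration under assumptions, not Macie's implementation: the function `theme_applies` is hypothetical, and it reads "at least 2 of the keyword combinations" as at least two distinct keywords from the list appearing in the object, which Macie may define differently.

```python
# Keywords the article lists for the "Attorney Client Privileged" theme.
KEYWORDS = ["attorney-client", "attorney client", "privileged"]


def theme_applies(text, keywords=KEYWORDS, minimum=2):
    """Hypothetical check: True when at least `minimum` distinct keywords occur."""
    lowered = text.lower()
    found = {kw for kw in keywords if kw in lowered}
    return len(found) >= minimum


# Only one keyword ("privileged"): below the threshold, theme not applied.
print(theme_applies("Members get privileged access to the lounge."))

# Two distinct keywords ("privileged" and "attorney-client"): theme applied.
print(theme_applies("This memo is privileged under attorney-client rules."))
```

Requiring two hits is what keeps a single everyday use of "privileged" from tagging the object.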

Supported Themes

Regex

The final analysis category is regex, where information is identified by using regular expressions to find potential matches in the S3 bucket data. These rules rely upon pattern matching and are generally more precise than a keyword match. In some cases, keyword matches are not possible. For example, identifying a Social Security Number could only be accomplished with a regular expression. (It is interesting that I chose this example, because the default Macie configuration has no regex to identify the 9 digit U.S. SSN pattern of ###-##-####.)
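
For reference, a pattern for the ###-##-#### layout can be sketched in Python as below. It is not a Macie rule, and the names `SSN_PATTERN` and `find_ssn_candidates` are made up here. It matches the layout only and does not check whether a number was ever issued.

```python
import re

# Three digits, two digits, four digits, separated by hyphens.
# The lookarounds stop the pattern from matching inside longer digit runs.
SSN_PATTERN = re.compile(r"(?<!\d)\d{3}-\d{2}-\d{4}(?!\d)")


def find_ssn_candidates(text):
    """Return every substring that has the ###-##-#### layout."""
    return SSN_PATTERN.findall(text)


# Fabricated sample: one SSN-shaped value, one longer number, one date.
sample = "Employee 123-45-6789 filed form 1234-56-78901 on 2019-04-01."
print(find_ssn_candidates(sample))  # ['123-45-6789']
```

Only the first value is reported; the longer number and the date have the wrong digit grouping.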


CloudTrail Events and Errors

Macie is capable of analyzing the account’s CloudTrail events for suspicious activity and errors. For example, “Access Denied” errors in CloudTrail are flagged for review.
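
As a rough illustration of what flagging such errors involves, the Python sketch below filters a few hand-written records for an `AccessDenied` error code. The records are fabricated sample data shaped like CloudTrail log entries, which carry an `errorCode` field when a call fails. The function `access_denied_events` is hypothetical and does not reflect Macie's internal logic.

```python
import json

# Hand-written sample shaped like a CloudTrail log file; not real data.
SAMPLE_LOG = json.dumps({
    "Records": [
        {"eventName": "GetObject", "eventSource": "s3.amazonaws.com",
         "errorCode": "AccessDenied"},
        {"eventName": "ListBuckets", "eventSource": "s3.amazonaws.com"},
        {"eventName": "PutObject", "eventSource": "s3.amazonaws.com",
         "errorCode": "AccessDenied"},
    ]
})


def access_denied_events(log_text):
    """Return the names of events whose errorCode is AccessDenied."""
    records = json.loads(log_text).get("Records", [])
    return [r["eventName"] for r in records
            if r.get("errorCode") == "AccessDenied"]


print(access_denied_events(SAMPLE_LOG))  # ['GetObject', 'PutObject']
```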

Basic Alerts

Macie also comes configured with 45 basic alerts, including things like AWS credentials embedded in source code.

Understanding the Dashboard

The Macie dashboard is a collection of widgets which can be clicked on to access additional information.


Here is another view of the dashboard. Contrast it with the one shown at the beginning of this story, and you will see that Macie has either not finished the initial scan or is continuing to monitor changes. While I am writing this, I have some code executing which is inserting objects into a specific S3 bucket.

The lower part of the page shows the information for whichever dashboard icon is selected.


These icons are:

  • Time graph
  • S3 objects
  • S3 objects by PII
  • S3 public objects and buckets
  • S3 objects by ACL
  • High Risk CloudTrail Events
  • High Risk CloudTrail Errors
  • Activity Location
  • CloudTrail Events
  • Activity by Internet Service Provider (ISP)
  • CloudTrail User Identity

Selecting any of these options updates the view with the information for that selected icon.

What does Macie cost?

Macie’s pricing follows a similar structure to other AWS services — there is a quantity of access provided at no cost, and a fee for subsequent usage. For example, the initial analysis by the classification engine costs $0 for the first 1 GB of content, and $5 per GB after that. The first 100,000 CloudTrail Events are free, with a cost of $4 per 1,000,000 events after that.
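
The arithmetic behind those two figures can be sketched in Python. The function below uses only the April 2019 rates quoted above and ignores any other charges. Its name and parameters are made up for illustration, it assumes usage beyond the free tier is billed proportionally rather than in whole units, and current AWS pricing may differ.

```python
def estimate_macie_cost(content_gb, cloudtrail_events):
    """Estimate cost in USD from the rates quoted in the article (April 2019)."""
    free_gb = 1                 # first 1 GB classified at no cost
    per_gb = 5.00               # $5 per GB after that
    free_events = 100_000       # first 100,000 CloudTrail events free
    per_million_events = 4.00   # $4 per 1,000,000 events after that

    # Assumption: usage beyond the free tier is billed proportionally.
    billable_gb = max(content_gb - free_gb, 0)
    billable_events = max(cloudtrail_events - free_events, 0)

    classification = billable_gb * per_gb
    events = billable_events / 1_000_000 * per_million_events
    return round(classification + events, 2)


# 51 GB of content and 2,100,000 events:
# 50 billable GB * $5 = $250, plus 2,000,000 billable events at $4/million = $8.
print(estimate_macie_cost(51, 2_100_000))  # 258.0
```

Staying within the free tier (1 GB and 100,000 events) yields an estimate of zero.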

The S3 metadata is stored for 30 days at no cost, and then deleted. If you wish to retain the metadata for a longer period, a charge of $0.05/GB is assessed. The AWS Macie Pricing webpage (listed in the References) has a good example of the sample costs for a three month period.

Additionally, since Macie needs both CloudTrail management and data events for the S3 buckets, there may be additional charges for these associated events.

Conclusion

AWS Macie is a machine learning based classification engine capable of analyzing your S3 buckets to identify sensitive information which could be improperly exposed. Like other security tools, enabling everything from everywhere may provide a lot of data needing evaluation prior to remediation. This just means you may need to “tune” the configuration to best suit your specific environment and needs.

Additionally, there may be specific classifications which are missing and need to be added for your unique situation. For example, there is a keyword match to possibly identify U.S. Social Security Numbers, but no corresponding regular expression to identify a Social Security Number. Your specific industry may have other needs to identify which are also not present in the Macie provided configuration.

This is an interesting and valuable tool. While I was writing this article, Macie identified an S3 bucket created during a lab which had not been deleted, and which anyone on the internet could have used to upload and download objects. (It is fixed now.)
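
An exposure like this often comes down to an ACL grant to everyone. As a hedged sketch, the Python below inspects a dictionary shaped like an S3 bucket ACL response and reports the permissions granted to the global AllUsers group. The sample ACL is fabricated, the function name is hypothetical, and this is not how Macie itself detects public buckets. Bucket policies, which can also make a bucket public, are not examined.

```python
# URI that S3 uses to identify the "everyone" group in ACL grants.
ALL_USERS_URI = "http://acs.amazonaws.com/groups/global/AllUsers"


def public_permissions(acl):
    """Return the permissions an ACL grants to the AllUsers group."""
    return sorted(
        grant["Permission"]
        for grant in acl.get("Grants", [])
        if grant.get("Grantee", {}).get("URI") == ALL_USERS_URI
    )


# Fabricated ACL resembling a forgotten lab bucket open to the internet.
sample_acl = {
    "Grants": [
        {"Grantee": {"Type": "CanonicalUser", "ID": "owner-id"},
         "Permission": "FULL_CONTROL"},
        {"Grantee": {"Type": "Group", "URI": ALL_USERS_URI},
         "Permission": "READ"},
        {"Grantee": {"Type": "Group", "URI": ALL_USERS_URI},
         "Permission": "WRITE"},
    ]
}

print(public_permissions(sample_acl))  # ['READ', 'WRITE']
```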

Remember, AWS Security Hub can also pull in data from Macie, which we will see in a second look at Security Hub in the coming months.

References

AWS Macie Product Overview

AWS Macie Pricing

AWS CloudTrail

AWS Security Hub

Copyright 2019, Chris Hare

Written by

Chris is the co-author of seven books and author of more than 70 articles and book chapters in technical, management, and information security publications.
