
7 Ways to Secure Amazon Athena

 5 years ago
source link: https://www.tuicool.com/articles/z2INfi3

[Photo by MILKOVÍ on Unsplash]

Amazon Athena is a serverless data analysis tool from Amazon Web Services. It’s easy to use, doesn’t need any server to be set up, and supports ANSI SQL to query structured and semi-structured data from Amazon S3. Many companies choose Athena for its simplicity and low cost.

There are many resources available on the Internet, and even here on Medium, about what Athena can and can’t do and the best practices to follow. I won’t go into those. Instead, let’s talk about the prickly but necessary question that always comes up in design meetings: security.

One of the first questions asked when considering Athena as a tool is how secure it is. Athena is a serverless tool: you don’t set up any Athena instance inside a VPC, and there’s no Security Group to filter out unwanted traffic. Just like S3, Athena is accessed through a public AWS service endpoint.

That’s when eyebrows go up in the security and data architecture teams. Why, they ask, should you choose such a risky tool?

Well, it’s not risky at all. Like any other piece of technology, how secure it is really depends on how you set it up.

Here is what you need to know about Athena security.

Broadly, data security can be considered in two areas: when data is at rest and when data is in flight.

Let’s consider data at rest.

Scenario #1: You have an S3 bucket containing data you want to query from Athena. How can you ensure the data is secure in the bucket?

First, make sure the source bucket isn’t publicly accessible, unless you have consciously decided to make it public for a very good reason. Period. That’s the first thing to do when considering Athena security (and, in fact, whenever you use S3 for any purpose). There are more than a few horror stories about data breaches caused by publicly accessible S3 buckets. Here is how you can change this property for a bucket from the AWS console:

[Screenshot: Changing the public access policy for individual buckets]
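If you would rather script this than click through the console, the setting corresponds to S3 Block Public Access. The sketch below only builds the configuration dictionary; the boto3 call `put_public_access_block` appears in a comment to show how it would typically be applied, and the bucket name `my-athena-source-bucket` is a hypothetical placeholder.

```python
def public_access_block_config() -> dict:
    """PublicAccessBlockConfiguration with all four S3 Block Public Access settings on."""
    return {
        "BlockPublicAcls": True,        # reject requests that add public ACLs
        "IgnorePublicAcls": True,       # ignore any public ACLs already present
        "BlockPublicPolicy": True,      # reject bucket policies that grant public access
        "RestrictPublicBuckets": True,  # restrict access if a public policy already exists
    }


config = public_access_block_config()

# Applying it (requires boto3 and AWS credentials; the bucket name is hypothetical):
#   import boto3
#   boto3.client("s3").put_public_access_block(
#       Bucket="my-athena-source-bucket",
#       PublicAccessBlockConfiguration=config,
#   )
```

Turning on all four switches is the conservative choice; relax one only if you have the "very good reason" mentioned above.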

Second, make sure the data in the bucket is encrypted. This is encryption at rest. There are two ways to do this: you can turn on default encryption for the bucket:

[Screenshot: Encrypting an S3 bucket from the AWS console]

or you can encrypt the source files:

[Screenshot: Encrypting an S3 object from the AWS console]

Either way, Athena can work with data encrypted using any of three options, two of which use keys from the AWS Key Management Service (KMS):

  • SSE-S3: S3 encrypts the data on the server side with keys that S3 creates and manages; KMS is not involved
  • CSE-KMS: your client encrypts the data with a KMS key before uploading it to S3
  • SSE-KMS: S3 encrypts the data on the server side with a KMS key, and KMS controls who can use that key
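As a rough sketch of what the two console options correspond to in the S3 API, assuming SSE-KMS: the first function builds the default-encryption rule for a bucket, and the second builds the keyword arguments for encrypting a single object on upload. In the S3 API, SSE-KMS is spelled `aws:kms` (SSE-S3 is `AES256`). The key ARN, bucket and object names are hypothetical, and the boto3 calls appear only in comments so the sketch runs without AWS access.

```python
def default_bucket_encryption(kms_key_arn: str) -> dict:
    """ServerSideEncryptionConfiguration that applies SSE-KMS to new objects by default."""
    return {
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",  # S3's API value for SSE-KMS
                    "KMSMasterKeyID": kms_key_arn,
                }
            }
        ]
    }


def encrypted_upload_args(bucket: str, key: str, body: bytes, kms_key_arn: str) -> dict:
    """Keyword arguments for s3.put_object that encrypt one object with SSE-KMS."""
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        "ServerSideEncryption": "aws:kms",
        "SSEKMSKeyId": kms_key_arn,
    }


# Hypothetical key ARN, bucket and object key; replace with your own.
KEY_ARN = "arn:aws:kms:us-east-1:111122223333:key/example-key-id"
bucket_cfg = default_bucket_encryption(KEY_ARN)
upload = encrypted_upload_args("my-athena-source-bucket", "sales/2020/01.csv", b"id,amount\n1,10\n", KEY_ARN)

# With boto3 installed and credentials configured, these would be applied as:
#   s3 = boto3.client("s3")
#   s3.put_bucket_encryption(Bucket="my-athena-source-bucket",
#                            ServerSideEncryptionConfiguration=bucket_cfg)
#   s3.put_object(**upload)
```

Keep in mind that default bucket encryption only applies to objects written after it is turned on; existing objects keep their current encryption until they are rewritten.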

Unless you have a reason to do otherwise, use SSE-KMS. With SSE-KMS, you can control who can use the key. The screenshots below show the two access levels for a KMS key: key administrators and key users.

[Screenshot: IAM users and roles can be made KMS key administrators]
[Screenshot: IAM users and roles can be made KMS key users]
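Behind those two console screens, KMS stores a JSON key policy. A rough sketch of a policy with a similar shape to the console's default (an account-root statement, a key-administrators statement and a key-users statement) might look like this. The account ID and role ARNs are hypothetical, and the action lists are modelled on the console's defaults rather than copied from any particular key.

```python
import json

# Modelled on the actions the KMS console grants by default; adjust as needed.
ADMIN_ACTIONS = [
    "kms:Create*", "kms:Describe*", "kms:Enable*", "kms:List*", "kms:Put*",
    "kms:Update*", "kms:Revoke*", "kms:Disable*", "kms:Get*", "kms:Delete*",
    "kms:TagResource", "kms:UntagResource",
    "kms:ScheduleKeyDeletion", "kms:CancelKeyDeletion",
]
USER_ACTIONS = [
    "kms:Encrypt", "kms:Decrypt", "kms:ReEncrypt*",
    "kms:GenerateDataKey*", "kms:DescribeKey",
]


def key_policy(account_id: str, admin_arns: list, user_arns: list) -> dict:
    """KMS key policy separating key administrators from key users."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Lets IAM policies in this account grant access to the key.
                "Sid": "EnableIAMUserPermissions",
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
                "Action": "kms:*",
                "Resource": "*",  # in a key policy, "*" refers to this key
            },
            {
                # Management actions only: no encrypt or decrypt in this statement.
                "Sid": "AllowKeyAdministration",
                "Effect": "Allow",
                "Principal": {"AWS": admin_arns},
                "Action": ADMIN_ACTIONS,
                "Resource": "*",
            },
            {
                # Cryptographic use only: no permission to change the key.
                "Sid": "AllowKeyUse",
                "Effect": "Allow",
                "Principal": {"AWS": user_arns},
                "Action": USER_ACTIONS,
                "Resource": "*",
            },
        ],
    }


# Hypothetical account ID and role ARNs.
policy_json = json.dumps(
    key_policy(
        "111122223333",
        ["arn:aws:iam::111122223333:role/kms-admin"],
        ["arn:aws:iam::111122223333:role/athena-analyst"],
    ),
    indent=2,
)
# Applying it (requires boto3 and credentials; a key policy is always named "default"):
#   boto3.client("kms").put_key_policy(KeyId="<key-id>", PolicyName="default", Policy=policy_json)
```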

Third, make sure your query results are encrypted. Amazon Athena stores query results in an S3 location (which you can configure) called the S3 staging directory. Encrypting the source bucket or source files doesn’t mean query results are encrypted too. Unless you encrypt the staging directory, you haven’t really encrypted your data at rest.

Think about it: you run a query from Athena on some sensitive data. The result set is saved as a file in the staging location. If the query fetches sensitive data and the staging location isn’t encrypted, some of that sensitive information will sit there unencrypted.

What about encrypting individual query results? Once you have encrypted the staging directory, every subsequent query result will be encrypted.

This is how you encrypt the staging directory:

[Screenshot: Encrypting the Athena query result location]
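The same result-encryption setting can also be enforced on an Athena workgroup, so individual users cannot switch it off for their own queries. This is a minimal sketch that builds a workgroup `Configuration` dictionary with SSE-KMS for results. The results bucket, key ARN and workgroup name are hypothetical, and the results key is meant to be a different key from the one protecting the source data.

```python
def results_workgroup_config(output_location: str, results_key_arn: str) -> dict:
    """Athena workgroup Configuration that encrypts every query result with SSE-KMS."""
    return {
        "ResultConfiguration": {
            "OutputLocation": output_location,  # the S3 staging directory
            "EncryptionConfiguration": {
                "EncryptionOption": "SSE_KMS",  # Athena also accepts "SSE_S3" and "CSE_KMS"
                "KmsKey": results_key_arn,
            },
        },
        # Workgroup settings override whatever a client sends with a query.
        "EnforceWorkGroupConfiguration": True,
    }


# Hypothetical names for illustration only.
wg_config = results_workgroup_config(
    "s3://my-athena-results-bucket/staging/",
    "arn:aws:kms:us-east-1:111122223333:key/results-key-id",
)

# Applying it (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("athena").create_work_group(Name="secure-analytics", Configuration=wg_config)
```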

Should you use the same key to encrypt your data and the query results? I recommend using different keys. There is a management overhead, but if one key is compromised, you know at least the other one is secure.

Fourth, you can encrypt your Glue Data Catalog. The Data Catalog holds all your Athena table definitions (among other things). Once you encrypt the catalog, the Athena table definitions (not the data) are encrypted. Depending on how far you want to go with encryption, this can be a nice addition.

[Screenshot: Encrypting the AWS Glue Data Catalog]
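Scripted, this corresponds to the Glue `PutDataCatalogEncryptionSettings` API. The sketch below only builds the settings dictionary; the key ARN is hypothetical, and the boto3 call is shown in a comment.

```python
def catalog_encryption_settings(catalog_key_arn: str) -> dict:
    """DataCatalogEncryptionSettings that encrypt Glue Data Catalog metadata with SSE-KMS."""
    return {
        "EncryptionAtRest": {
            "CatalogEncryptionMode": "SSE-KMS",  # note the hyphen; "DISABLED" turns it off
            "SseAwsKmsKeyId": catalog_key_arn,
        }
    }


# Hypothetical key ARN.
settings = catalog_encryption_settings("arn:aws:kms:us-east-1:111122223333:key/catalog-key-id")

# Applying it (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("glue").put_data_catalog_encryption_settings(DataCatalogEncryptionSettings=settings)
```

Once the catalog is encrypted, Athena users will typically also need KMS permissions such as `kms:Decrypt` on this key to read table definitions.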

Scenario #2: You have encrypted the data; how can you control access to the data?

You can fine-tune access to your source data with bucket policies. A bucket policy stipulates who has access to a bucket and what they can do with its contents (for example, read or write objects). Bucket policies can grant access to IAM users and roles from the same AWS account or from another one. They also work the other way: even if someone has access to the KMS key that encrypts the bucket, they won’t be able to read its contents if the bucket policy explicitly denies access to their user or role.

[Screenshot: The S3 bucket policy editor, where you write the bucket policy]
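As a minimal sketch of the explicit-deny case, the function below builds a bucket policy that stops one principal from reading any object in the bucket, regardless of what that principal is allowed to do with the KMS key. The bucket name and role ARN are hypothetical.

```python
import json


def deny_reads_policy(bucket: str, denied_principal_arn: str) -> dict:
    """Bucket policy that explicitly denies one principal from reading objects in the bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyObjectReads",
                "Effect": "Deny",  # an explicit Deny overrides any Allow
                "Principal": {"AWS": denied_principal_arn},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",  # every object in the bucket
            }
        ],
    }


# Hypothetical bucket and role names.
policy_json = json.dumps(
    deny_reads_policy("my-athena-source-bucket", "arn:aws:iam::111122223333:role/contractor-role")
)
# Applying it (requires boto3 and AWS credentials):
#   boto3.client("s3").put_bucket_policy(Bucket="my-athena-source-bucket", Policy=policy_json)
```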

Scenario #3: You want only certain users to run queries from Athena.

Athena doesn’t support user accounts like a traditional database. The only way to control who can access Athena is through IAM policies. There are two AWS-managed IAM policies for Athena:

[Screenshot: AWS managed policies for Amazon Athena]

The first policy (AmazonAthenaFullAccess) allows the user to perform any action on Athena. The second one (AWSQuicksightAthenaAccess) is meant for IAM users who access Athena through Amazon QuickSight.

I recommend creating two custom IAM policies for Athena users. One is a “power user” policy that allows creating, modifying or deleting Athena objects like databases, tables or views. The other is an “analyst” policy that can run queries but doesn’t have those administrative privileges.
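A minimal sketch of what the two custom policies might contain. Athena carries out DDL such as CREATE TABLE, CREATE VIEW or DROP TABLE through Glue Data Catalog actions, so in this sketch the only difference between the personas is whether Glue write actions are included. The action lists are illustrative rather than exhaustive, `"Resource": "*"` should be narrowed to your own workgroup, catalog and database ARNs, and both personas would still need separate S3 and KMS permissions on the source and results buckets, which are left out here.

```python
# Catalog reads both personas need to see databases, tables and partitions.
GLUE_READ = [
    "glue:GetDatabase", "glue:GetDatabases",
    "glue:GetTable", "glue:GetTables",
    "glue:GetPartition", "glue:GetPartitions",
]
# Catalog writes behind CREATE/ALTER/DROP statements (power users only).
GLUE_WRITE = [
    "glue:CreateDatabase", "glue:UpdateDatabase", "glue:DeleteDatabase",
    "glue:CreateTable", "glue:UpdateTable", "glue:DeleteTable",
    "glue:BatchCreatePartition", "glue:BatchDeletePartition",
]
# Running and reading queries.
ATHENA_QUERY = [
    "athena:StartQueryExecution", "athena:StopQueryExecution",
    "athena:GetQueryExecution", "athena:GetQueryResults",
    "athena:ListQueryExecutions", "athena:GetWorkGroup",
]


def athena_policy(power_user: bool) -> dict:
    """Identity policy for the 'power user' (True) or 'analyst' (False) persona."""
    glue_actions = GLUE_READ + (GLUE_WRITE if power_user else [])
    return {
        "Version": "2012-10-17",
        "Statement": [
            # Narrow these "*" resources to specific ARNs in a real account.
            {"Effect": "Allow", "Action": ATHENA_QUERY, "Resource": "*"},
            {"Effect": "Allow", "Action": glue_actions, "Resource": "*"},
        ],
    }


power_user_policy = athena_policy(power_user=True)
analyst_policy = athena_policy(power_user=False)
```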

Once you create the policies, attach each one to its own IAM group, then add individual IAM users to the group that matches their access requirements. If you prefer roles, create one role per policy and allow the members of each group to assume it; IAM roles themselves can’t be added to groups.
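A minimal sketch of the group step: it creates a group, attaches a policy to it as an inline policy, and adds users. It expects a boto3 IAM client (`create_group`, `put_group_policy` and `add_user_to_group` are real IAM client methods); a small recording stand-in is included so the sketch runs without AWS access. The group, policy and user names are hypothetical.

```python
import json


def grant_policy_to_group(iam, group: str, policy_name: str, policy_doc: dict, users: list) -> None:
    """Create an IAM group, attach an inline policy to it, and add the given users."""
    iam.create_group(GroupName=group)
    iam.put_group_policy(
        GroupName=group,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(policy_doc),  # IAM expects the policy as a JSON string
    )
    for user in users:
        iam.add_user_to_group(GroupName=group, UserName=user)


class RecordingIAM:
    """Offline stand-in for boto3.client("iam"): records each call instead of sending it."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Only reached for attributes that don't exist, i.e. the IAM method names.
        def record(**kwargs):
            self.calls.append((name, kwargs))
        return record


# Hypothetical group, policy and user names; the policy body is a small placeholder.
iam = RecordingIAM()
grant_policy_to_group(
    iam,
    "athena-analysts",
    "athena-analyst-policy",
    {"Version": "2012-10-17",
     "Statement": [{"Effect": "Allow", "Action": "athena:GetQueryExecution", "Resource": "*"}]},
    ["alice", "bob"],
)
print([name for name, _ in iam.calls])
# ['create_group', 'put_group_policy', 'add_user_to_group', 'add_user_to_group']
```

With real credentials you would pass `boto3.client("iam")` instead of `RecordingIAM()`. An inline policy keeps the sketch short; a customer managed policy attached with `attach_group_policy` is easier to reuse across groups.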

If your Athena queries run from an EC2 instance, you can attach the appropriate role to that instance through an instance profile.
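For a role to be usable by an EC2 instance, its trust policy has to allow the EC2 service to assume it. The sketch below builds that trust policy and lists, in comments, the usual boto3 sequence for wiring the role to an instance. All role, profile and instance names are hypothetical.

```python
import json


def ec2_trust_policy() -> dict:
    """Trust policy that lets EC2 instances assume the role through an instance profile."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "ec2.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


trust_json = json.dumps(ec2_trust_policy())

# Typical boto3 sequence (requires credentials; all names are hypothetical):
#   iam = boto3.client("iam")
#   iam.create_role(RoleName="athena-analyst-ec2", AssumeRolePolicyDocument=trust_json)
#   iam.put_role_policy(RoleName="athena-analyst-ec2", PolicyName="athena-analyst",
#                       PolicyDocument="<analyst policy JSON>")
#   iam.create_instance_profile(InstanceProfileName="athena-analyst-ec2")
#   iam.add_role_to_instance_profile(InstanceProfileName="athena-analyst-ec2",
#                                    RoleName="athena-analyst-ec2")
#   boto3.client("ec2").associate_iam_instance_profile(
#       IamInstanceProfile={"Name": "athena-analyst-ec2"}, InstanceId="i-0123456789abcdef0")
```

An instance carries only one instance profile at a time, so pick the role that matches what the workload on that instance needs.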

What about securing data in transit? How do you control that?

There’s nothing you need to do here. AWS service endpoints are SSL/TLS encrypted, which means Transport Layer Security (TLS) encrypts objects in transit between S3 and Athena.
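If you want to go a step further and make the source bucket refuse any request that doesn't use TLS, a widely used pattern is a bucket policy that denies requests where the `aws:SecureTransport` condition key is false. This is optional hardening rather than something Athena requires; the bucket name is hypothetical.

```python
import json


def require_tls_policy(bucket: str) -> dict:
    """Bucket policy that rejects any request to the bucket not sent over TLS."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object in it.
                "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
                # aws:SecureTransport is "false" for plain-HTTP requests.
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


policy_json = json.dumps(require_tls_policy("my-athena-source-bucket"))  # hypothetical bucket
```

A bucket has a single bucket policy and `put_bucket_policy` replaces it, so in practice this statement should be merged with any other statements the bucket already has rather than applied on its own.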

If you use a JDBC-compliant SQL query tool to run Athena queries, the data returned to the client is also SSL encrypted.

The last option isn’t really about securing anything; it’s more about monitoring.

You can enable AWS CloudTrail on your AWS account. Once enabled, it logs the API calls made to AWS services in that account (management events by default; object-level S3 data events have to be turned on separately). The log files for these events are stored in S3 in a compressed and encrypted format.

Since CloudTrail logs are saved in S3, you can query them from Athena. In fact, the CloudTrail console can create an Athena table for the logs for you.

[Screenshot: Creating an Athena table from the AWS CloudTrail console]

Once you create the table, you can search the logs.
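For example, assuming the table is named `cloudtrail_logs_my_trail_bucket` (a hypothetical name; use whatever name you gave it), a query listing who started Athena queries in the last week might look like the sketch below. The columns used (`eventtime`, `eventsource`, `eventname`, `useridentity.arn`, `sourceipaddress`) follow the schema of the CloudTrail table; the function only builds the SQL string.

```python
import re


def athena_activity_query(table: str, days: int = 7) -> str:
    """Athena SQL listing recent StartQueryExecution calls recorded by CloudTrail."""
    # The table name is interpolated into SQL, so accept only plain identifiers.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
        raise ValueError(f"unexpected table name: {table!r}")
    return (
        "SELECT eventtime, useridentity.arn AS caller, sourceipaddress\n"
        f"FROM {table}\n"
        "WHERE eventsource = 'athena.amazonaws.com'\n"
        "  AND eventname = 'StartQueryExecution'\n"
        f"  AND from_iso8601_timestamp(eventtime) > current_timestamp - interval '{int(days)}' day\n"
        "ORDER BY eventtime DESC"
    )


# Hypothetical table name.
query = athena_activity_query("cloudtrail_logs_my_trail_bucket")

# Running it (requires boto3 and AWS credentials; database and workgroup are hypothetical):
#   boto3.client("athena").start_query_execution(
#       QueryString=query, QueryExecutionContext={"Database": "default"}, WorkGroup="primary")
```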

You can also configure CloudTrail to log S3 data events like GetObject (read) and PutObject (write) for your Athena source buckets. You can use the logs from these data events to see when Athena is accessing your data in S3.

[Screenshot: AWS CloudTrail configuration for S3 data events]
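The data-event setting can also be applied with CloudTrail's `PutEventSelectors` API. This sketch builds the `EventSelectors` list for a set of source buckets; the trail and bucket names are hypothetical. Data events are billed separately from management events, so scoping them to the Athena source buckets keeps the volume down.

```python
def s3_data_event_selectors(bucket_names: list) -> list:
    """EventSelectors that log object-level reads and writes for the given buckets."""
    return [
        {
            "ReadWriteType": "All",           # both reads (GetObject) and writes (PutObject)
            "IncludeManagementEvents": True,  # keep logging management events too
            "DataResources": [
                {
                    "Type": "AWS::S3::Object",
                    # A trailing slash after the bucket name covers every object in it.
                    "Values": [f"arn:aws:s3:::{name}/" for name in bucket_names],
                }
            ],
        }
    ]


# Hypothetical bucket names.
selectors = s3_data_event_selectors(["my-athena-source-bucket", "my-athena-results-bucket"])

# Applying it (requires boto3 and AWS credentials; the trail name is hypothetical):
#   boto3.client("cloudtrail").put_event_selectors(TrailName="my-trail", EventSelectors=selectors)
```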

So now you have seen a few options for securing Amazon Athena. Which ones you implement is up to you.

What about other advanced areas? Like automating Athena? Or making it work with BI analytics tools? To learn these and more with hands-on exercises, you can enroll in my online course at Pluralsight: Advanced Operations with Amazon Athena.

