Deploying MWAA Using Terraform

source link: https://dzone.com/articles/deploy-mwaa-using-terraform

How to use HashiCorp's open source Infrastructure as Code tool Terraform to configure and deploy your Managed Workflows for Apache Airflow environments.

In a previous post, I showed you how to use AWS CDK to automate the deployment and configuration of your Apache Airflow environments using Managed Workflows for Apache Airflow (MWAA) on AWS. In this quick how-to guide, I will share how you can use Terraform to do the same thing.

You will need:

  1. An AWS account with the right level of privileges
  2. A development environment with the Terraform tool installed
  3. Access to an AWS region where Managed Workflows for Apache Airflow is supported

All code used in this how-to guide is provided in this GitHub repository.

The MWAA Terraform Module

The MWAA Terraform module lives in the aws-ia/terraform-aws-mwaa GitHub repository. Clone the repo into your local workspace.

git clone https://github.com/aws-ia/terraform-aws-mwaa.git

The examples/basic folder contains a simple MWAA stack that we can use to test that everything is working.

├── README.md
├── dags
│   └── hello_world_dag.py
├── main.tf
├── mwaa
│   └── requirements.txt
├── outputs.tf
├── providers.tf
└── variables.tf

The README.md contains a quick start on how to deploy your first environment. In this post, I will walk you through the steps. There are several key files you will need to understand before deploying your first MWAA environment.

MWAA and DAGs Folders

You will notice that we have a sample Apache Airflow DAG (hello_world_dag.py) that we want to deploy as we build our MWAA environment. We also have a requirements.txt file we will want to upload (it is currently empty).

As we will see later, the Terraform configuration uploads these files for you. For the moment, all you need to know is that these are resources you want deployed as you build out your MWAA environment.

Variables.tf

This file contains the configuration options you can change to customize your MWAA environment: the name of the environment, the AWS region, the default tags, and the VPC CIDR. For this demo, these are the values I am using.

variable "name" {
  description = "Name of MWAA Environment"
  default     = "terraform-dzone-mwaa"
  type        = string
}

variable "region" {
  description = "region"
  type        = string
  default     = "eu-central-1"
}

variable "tags" {
  description = "Default tags"
  default     = {"env": "dzone", "dept": "AWS Developer Relations"}
  type        = map(string)
}

variable "vpc_cidr" {
  description = "VPC CIDR for MWAA"
  type        = string
  default     = "10.1.0.0/16"
}

Main.tf

main.tf is the main Terraform configuration file; it deploys resources using the values defined in variables.tf.

At the top of the file, we define some local values. The important one is bucket_name, which configures a unique S3 bucket name for your MWAA environment. This matters because the subsequent uploads of the sample DAGs and the requirements.txt, as well as the IAM policy documents, all use this value.

locals {
  azs         = slice(data.aws_availability_zones.available.names, 0, 2)
  bucket_name = format("%s-%s", "aws-ia-mwaa", data.aws_caller_identity.current.account_id)
}

Note! You can change the bucket_name formatting rules to better match your own requirements if you need to.
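For example, if you wanted the region reflected in the bucket name as well, you could adjust the format call like this (a sketch; the "my-mwaa" prefix is a placeholder, not part of the example code):

```hcl
locals {
  azs = slice(data.aws_availability_zones.available.names, 0, 2)

  # Example only: embed the region so multi-region deployments get
  # distinct bucket names. "my-mwaa" is a placeholder prefix.
  bucket_name = format("%s-%s-%s", "my-mwaa", var.region, data.aws_caller_identity.current.account_id)
}
```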

Next in the file, we have the section that creates and uploads the sample DAG and the requirements.txt.

#-----------------------------------------------------------
# Create an S3 bucket and upload sample DAG
#-----------------------------------------------------------
#tfsec:ignore:AWS017 tfsec:ignore:AWS002 tfsec:ignore:AWS077
resource "aws_s3_bucket" "this" {
  bucket = local.bucket_name
  tags   = var.tags
}

resource "aws_s3_bucket_acl" "this" {
  bucket = aws_s3_bucket.this.id
  acl    = "private"
}

resource "aws_s3_bucket_versioning" "this" {
  bucket = aws_s3_bucket.this.id
  versioning_configuration {
    status = "Enabled"
  }
}
resource "aws_s3_bucket_server_side_encryption_configuration" "this" {
  bucket = aws_s3_bucket.this.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}

resource "aws_s3_bucket_public_access_block" "this" {
  bucket                  = aws_s3_bucket.this.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Upload DAGS
resource "aws_s3_object" "object1" {
  for_each = fileset("dags/", "*")
  bucket   = aws_s3_bucket.this.id
  key      = "dags/${each.value}"
  source   = "dags/${each.value}"
  etag     = filemd5("dags/${each.value}")
}

# Upload plugins/requirements.txt
resource "aws_s3_object" "reqs" {
  for_each = fileset("mwaa/", "*")
  bucket   = aws_s3_bucket.this.id
  key      = each.value
  source   = "mwaa/${each.value}"
  etag     = filemd5("mwaa/${each.value}")
}

In this section, we define the name of the MWAA environment we want to use, the version of Apache Airflow (1.10.12, 2.0.2, and 2.2.2 are the versions the MWAA service supports today), and the environment class that sizes the MWAA workers (mw1.small, mw1.medium, or mw1.large). We then define dag_s3_path, the folder Apache Airflow will use as its "dags folder" when searching for DAGs to run. Finally, you can optionally set the plugins.zip and requirements.txt file locations, but these are not set by default.

  name                 = "basic-mwaa"
  airflow_version      = "2.2.2"
  environment_class    = "mw1.medium"
  dag_s3_path          = "dags"
  #plugins_s3_path      = "plugins.zip"
  #requirements_s3_path = "requirements.txt"

Note! If you want to set plugins_s3_path or requirements_s3_path, you will need to set them here and then configure/deploy the plugins.zip and requirements.txt files separately.
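For example, to have MWAA install the packages listed in mwaa/requirements.txt, you could uncomment the path so it points at the object the configuration uploads (a sketch based on the upload resource shown earlier):

```hcl
  name                 = "basic-mwaa"
  airflow_version      = "2.2.2"
  environment_class    = "mw1.medium"
  dag_s3_path          = "dags"

  # Install the Python packages listed in the uploaded requirements.txt
  requirements_s3_path = "requirements.txt"
```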

The next section configures logging for the various MWAA components, allowing us to define the logging verbosity for each. Bear in mind that there is a cost associated with CloudWatch logging, so understand this before configuring these settings. The valid values are CRITICAL, ERROR, WARNING, INFO, and DEBUG.

  logging_configuration = {
    dag_processing_logs = {
      enabled   = true
      log_level = "INFO"
    }

    scheduler_logs = {
      enabled   = true
      log_level = "WARNING"
    }

    task_logs = {
      enabled   = true
      log_level = "DEBUG"
    }

    webserver_logs = {
      enabled   = true
      log_level = "INFO"
    }

    worker_logs = {
      enabled   = true
      log_level = "INFO"
    }
  }

Note! There are costs associated with keeping CloudWatch logs, so review these settings and make sure you keep an eye on the costs and adjust as needed.

In the next section, you can define custom Apache Airflow configuration parameters if you need to. Check the MWAA documentation to find out more; you might use these to tweak performance settings or to enable integration with AWS services such as AWS Secrets Manager.

  airflow_configuration_options = {
    "core.load_default_connections" = "false"
    "core.load_examples" = "false"
    "webserver.dag_default_view" = "tree"
    "webserver.dag_orientation" = "TB"
  }
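As one illustration, switching Apache Airflow over to AWS Secrets Manager as its secrets backend only needs two more options (a sketch based on the Airflow 2.x Amazon provider; the connections/variables prefixes shown are assumptions you should adapt to your own naming):

```hcl
  airflow_configuration_options = {
    "core.load_default_connections" = "false"
    "core.load_examples"            = "false"
    "webserver.dag_default_view"    = "tree"
    "webserver.dag_orientation"     = "TB"

    # Example only: look up Airflow connections and variables in AWS Secrets Manager
    "secrets.backend" = "airflow.providers.amazon.aws.secrets.secrets_manager.SecretsManagerBackend"
    "secrets.backend_kwargs" = jsonencode({
      connections_prefix = "airflow/connections"
      variables_prefix   = "airflow/variables"
    })
  }
```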

The next section provides scaling settings for the Apache Airflow workers and then the details of the VPC networking. You should probably not need to change these network settings.

  min_workers        = 1
  max_workers        = 5
  vpc_id             = module.vpc.vpc_id
  private_subnet_ids = module.vpc.private_subnets

  webserver_access_mode = "PUBLIC_ONLY"
  source_cidr           = ["10.1.0.0/16"] 

Finally, these options allow you to bring your own AWS security groups, execution roles, or S3 buckets to use within MWAA. If you do create your own, make sure they meet the minimum requirements by checking the MWAA documentation. You will also need to comment out the corresponding sections above so that Terraform configures the correct resources.

  # create_security_group = 
  # source_bucket_arn = 
  # execution_role_arn = 
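
Sketched out, bringing your own bucket and execution role might look like the following (the ARNs are placeholders, and you should check the exact input names against the module's documentation):

```hcl
  # Example only: reuse existing resources instead of letting the module create them.
  create_security_group = false
  source_bucket_arn     = "arn:aws:s3:::my-existing-mwaa-bucket"        # placeholder
  execution_role_arn    = "arn:aws:iam::123456789012:role/my-mwaa-role" # placeholder
```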

Now that we have reviewed the configuration files, you are ready to deploy your MWAA environment.

Deploying the MWAA Environment

From my Visual Studio Code IDE, I run the following command to initialize Terraform.

terraform init

This generates the following output (your output will vary, but it should look similar).

Initializing modules...
- mwaa in ../..
Downloading registry.terraform.io/terraform-aws-modules/vpc/aws 3.14.2 for vpc...
- vpc in .terraform/modules/vpc

Initializing the backend...

Initializing provider plugins...
- Finding hashicorp/aws versions matching ">= 3.63.0, ~> 4.20.0"...
- Installing hashicorp/aws v4.20.1...
- Installed hashicorp/aws v4.20.1 (signed by HashiCorp)

Terraform has created a lock file .terraform.lock.hcl to record the provider
selections it made above. Include this file in your version control repository
so that Terraform can guarantee to make the same selections by default when
you run "terraform init" in the future.

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.

If we want to display what AWS resources Terraform will create, we can use another Terraform command.

terraform plan

I will not display all the output, as it will differ from yours, but you can review it to see all the resources that are about to be configured and deployed.

To deploy, you can now run terraform apply. Review the output and answer "yes" when prompted if everything looks good. (I have shown only the initial output below.)

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

module.vpc.aws_vpc.this[0]: Creating...
module.vpc.aws_eip.nat[0]: Creating...
module.mwaa.aws_iam_role.mwaa[0]: Creating...
module.mwaa.aws_s3_bucket.mwaa[0]: Creating...
...
...

The deployment will take about 20-25 minutes. You should get the following output once the deployment has completed.

Apply complete! Resources: 30 added, 0 changed, 0 destroyed.

Outputs:

mwaa_arn = "arn:aws:airflow:eu-central-1:704533066374:environment/basic-mwaa"
mwaa_role_arn = "arn:aws:iam::704533066374:role/mwaa-executor20220628091608972100000001"
mwaa_security_group_id = "sg-0fc61902a4cbbae48"
mwaa_service_role_arn = "arn:aws:iam::704533066374:role/aws-service-role/airflow.amazonaws.com/AWSServiceRoleForAmazonMWAA"
mwaa_status = "AVAILABLE"
mwaa_webserver_url = "1008702f-c770-4b55-bfbf-8f9e6ee823c5.c9.eu-central-1.airflow.amazonaws.com"

We can now copy the mwaa_webserver_url into a browser and log in using our AWS credentials to access our new MWAA environment.

As you can see from the screenshot, our sample DAG has also been uploaded into the environment, and we can use this to test that our environment is working as expected.

Deleting Our MWAA Environment

So we have covered how to configure and deploy MWAA environments; now, I will cover how you can clean up and remove them. The clean-up process takes around 20 minutes to complete and is pretty straightforward. There are a couple of things to think about first though.

First, the destroy process will clean up all resources, including the S3 bucket that contains your DAGs. If you need to keep these safe, make sure that you copy/move them to another location before cleaning up your MWAA environment.

Second, the CloudWatch log groups that are created when the MWAA environment is configured will also not be deleted. If you want to completely clean up your environment, remember to go to CloudWatch and search under log groups and delete as needed.

With that out of the way, to remove this new environment we created and clean up all the resources, we issue the terraform destroy command, and when prompted, respond appropriately.

terraform destroy

Running this command will generate a lot of output, displaying what resources will be removed. You will be prompted to enter "yes" to confirm you want to delete the MWAA environment. It will then start cleaning up the resources. After about 3-4 minutes, you should start to see the following as it cleans up the MWAA environment. 

module.mwaa.aws_mwaa_environment.mwaa: Still destroying... [id=terraform-mwaa, 2m40s elapsed]
module.mwaa.aws_mwaa_environment.mwaa: Still destroying... [id=terraform-mwaa, 2m50s elapsed]
module.mwaa.aws_mwaa_environment.mwaa: Still destroying... [id=terraform-mwaa, 3m0s elapsed]
module.mwaa.aws_mwaa_environment.mwaa: Still destroying... [id=terraform-mwaa, 3m10s elapsed]
module.mwaa.aws_mwaa_environment.mwaa: Still destroying... [id=terraform-mwaa, 3m20s elapsed]

This will take approximately 20 minutes to complete.

What's Next?

The AWS team that put this new Terraform module together would love your feedback. Does it work as you expect? What examples would you like included? Did you try this and find any errors or quirks? Please let us know, either directly by raising an issue in the project or by reaching out to me.

All the resources you need are available on the Terraform module page, and make sure you check out the examples that I have used in this post. The MWAA Terraform module is also available in the Terraform registry.

