Low Cost "Overkill" AWS Infrastructure for a Newborn Startup

Introduction

On a cold and dark evening in December 2022, a good friend of mine called me and said: "Nicolas, I am creating a product that is going to scale massively and revolutionize the market, and I need your help." Now, if I had a dollar for every time I heard this sentence, I would be financing trips to Mars by now.

Nevertheless, I met with the friend and his technical lead. After long hours of discussions (and daydreaming), the business model was summarized as follows: "The product is a maintenance management platform designed to help companies and vehicle owners to efficiently manage their vehicles. The product aims to automate the entire maintenance procedure and provide preventive and predictive solutions by connecting vehicles to IoT devices, which allows the monitoring of maintenance parameters in real-time."

I agreed to help them for many reasons, some of which include:

They actually know what they are doing.
The technical lead is absolutely intelligent.
I trust they will make it.

My job, evidently, was to architect and implement the infrastructure, deployment, and maintenance of the application.

Requirements and Challenges

At the time of the discussion, they had just finished an MVP that was deployed poorly on AWS. In fact, both my friend and the technical lead had very little experience in anything related to infrastructure and DevOps. In addition, they had little money to pay my usual fees and therefore did not want to be a big burden on me. So at first, they suggested that I set up a very basic infrastructure and deployment strategy that they could use temporarily until they raised more money.

The first thought I had was: "Those noobs don't even know what they are talking about". From my experience consulting for more than two dozen companies (from small startups to extremely large multinationals), once you start working with a bad infrastructure, chances are you will keep building on top of it until working on it becomes a living hell, and then possibly go out of business due to bad tech. I was definitely not going to be part of this scenario.

Therefore, my answer was: "No, I will do it properly". So after countless back-and-forth discussions, below is the summary of the challenges to think about while architecting the solution:

There must be at least two environments: Develop and Production.
The developers must be able to operate the infrastructure without having to become DevOps Engineers.
Proper observability must be in place to quickly identify and solve issues when they happen (because they will happen).
The cost must be as optimized as possible.
And finally, I set a requirement, for my sake primarily: The solution must be robust enough to minimize the number of headaches I have to suffer from in the future.

Understanding the Application

Before coming up with a solution, it is a good idea to first understand the different components of the application. Therefore, as a first step, the technical lead was kind enough to walk me through them and show me how to run the application locally.

For simplicity, both the backend (Node.js) and frontend (React) applications live in a single mono repository managed through Nx. The application stores its data in a PostgreSQL database. Surprisingly, the application was very well documented, a phenomenon I have rarely seen in my life. Therefore, understanding the behavior and the build steps of the application wasn't so difficult.

In about three hours, I was able to containerize, deploy, and run all the containerized application components on a single Linux machine. Amazing! First step complete.

Infrastructure Requirements

Now that the application is containerized, and all the steps documented, it is time to architect the infrastructure. Whenever I am architecting a solution, regardless of its complexity and cost, I always make sure to achieve the following characteristics:

Security: Security is one of the most integral parts of any application. Robust software resists attacks such as SQL injection, password attacks, and cross-site scripting. Building security mechanisms into the code is mandatory to keep the system safe in general, and the data layer in particular.
Availability: The probability that a system is running as required, when required, during the time it is supposed to be running. A good way to achieve availability is to replicate the system and application components as much as possible (containers, machines, databases, etc.).
Scalability: The on-demand provisioning of resources offered by the cloud allows users to quickly scale resources in and out based on the varying load. This is essential for optimizing cost while still serving traffic consistently.
System Observability: One of the most important mechanisms required to achieve a robust application is system visibility:
1. Logging: Aggregating the application logs and displaying them in an organized fashion allows the developers to test, debug, and enhance the application.
2. Tracing: Tracing requests is another important practice. It makes it possible to follow every request flowing in and out of the system and to rapidly find and fix errors and bottlenecks.
3. Monitoring: It is essential to have accurate and reliable monitoring mechanisms in every aspect of the system. Key metrics that must be monitored include but are not limited to CPU utilization, Memory Utilization, Disk Read/Write Operations, Disk space, etc.
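
To make the availability point above concrete, here is a small arithmetic sketch. It assumes that replicas fail independently and that a single healthy replica is enough to serve traffic, which is a simplification. The 0.99 figure is illustrative, not a measurement from this project.

```python
def combined_availability(single: float, replicas: int) -> float:
    """Availability of `replicas` independent copies when one healthy copy suffices."""
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # The system is down only when every replica is down at the same time.
    return 1.0 - (1.0 - single) ** replicas


if __name__ == "__main__":
    # 0.99 is an assumed, illustrative availability for one replica.
    for count in (1, 2, 3):
        print(count, round(combined_availability(0.99, count), 6))
```

Under these assumptions, going from one replica to two already removes most of the expected downtime, which is why even a small setup benefits from spreading replicas across machines.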

Infrastructure Solution

In light of all the above, and after twisting my imagination for a little bit, I came up with the architecture depicted in the diagram below (Does not display all the components used):

Networking

The infrastructure is created in the region of Ireland (eu-west-1). The following network components are created:

Virtual Private Cloud (VPC): To isolate the resources in a private network.
Internet Gateway: To provide internet connectivity to the resources in the public subnets.
NAT Gateway: To provide outbound connectivity to private resources.
Public Subnets: In each availability zone.
Private Subnets: In each availability zone.
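
The subnet layout in the list above can be planned with a few lines of Python. This is only a sketch: the article does not state the CIDR ranges or the exact availability zones, so the `10.0.0.0/16` block, the `/20` subnet size, and the zone names below are assumptions.

```python
import ipaddress


def plan_subnets(vpc_cidr: str, zones: list[str], new_prefix: int = 20) -> dict[str, dict[str, str]]:
    """Assign one public and one private subnet to every availability zone."""
    network = ipaddress.ip_network(vpc_cidr)
    available = 2 ** (new_prefix - network.prefixlen)
    if 2 * len(zones) > available:
        raise ValueError("VPC range is too small for the requested subnets")
    # subnets() yields equal-sized, non-overlapping blocks in address order.
    blocks = network.subnets(new_prefix=new_prefix)
    plan: dict[str, dict[str, str]] = {}
    for zone in zones:
        plan[zone] = {"public": str(next(blocks)), "private": str(next(blocks))}
    return plan


if __name__ == "__main__":
    # Assumed CIDR block and zone names, for illustration only.
    for zone, subnets in plan_subnets("10.0.0.0/16", ["eu-west-1a", "eu-west-1b"]).items():
        print(zone, subnets)
```

Carving all subnets from one iterator guarantees they never overlap, and leaves the rest of the VPC range free for zones or tiers added later.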

A VPN instance with a free license is deployed to provide secure connectivity for the developers and system administrators to the private resources in the VPC.

AWS EKS

An AWS EKS cluster is created to orchestrate the backend service of each environment. The cluster is composed of one node pool made of 2 nodes, each in its own Availability Zone.

Application Load Balancer

An Application Load Balancer (Layer 7) is created to expose the endpoints and provide the routing rules required from the internet into the application. The load balancer is configured to serve traffic on ports 80 and 443.

AWS RDS PostgreSQL

An AWS RDS PostgreSQL database is created to hold and persist the application’s data. Both the develop and production environments are hosted on the same instance but are separated logically.
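
One common way to separate environments logically on a shared instance is to give each environment its own database and role on the same endpoint. The article does not say how the separation is done, so the sketch below is an assumption, and the `app_<environment>` naming, the host, and the credentials are hypothetical placeholders.

```python
from urllib.parse import quote

ENVIRONMENTS = ("develop", "production")  # the two environments named in the article


def database_url(host: str, environment: str, user: str, password: str, port: int = 5432) -> str:
    """Build a PostgreSQL connection URL for one environment on a shared instance."""
    if environment not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment}")
    database = f"app_{environment}"  # hypothetical naming scheme
    # Percent-encode credentials so characters such as '@' or ':' cannot break the URL.
    return f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}/{database}"


if __name__ == "__main__":
    # Placeholder host and credentials, for illustration only.
    print(database_url("db.example.internal", "develop", "app_develop", "p@ss"))
```

Giving each environment a separate role that can only reach its own database keeps a mistake in develop from touching production data, even though both share one instance.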

Clients VM

A private virtual machine hosts the client tools (e.g., kubectl and a PostgreSQL client) used to interact with different parts of the infrastructure.

AWS ECR

Two ECR repositories are created for the backend service, one for each environment.

S3 Bucket

An AWS S3 bucket is created to host the frontend application for each environment.

AWS CloudFront

An AWS CloudFront distribution is created for each environment to cache the frontend application hosted on AWS S3.

ACM public certificates are required for the domains. One public certificate must be created in the eu-west-1 region to be used by the load balancer, and another one in the us-east-1 region to be used by CloudFront.

CloudWatch

The infrastructure metrics and application logs are configured to be displayed on CloudWatch.

Application Deployment

Now that the infrastructure was successfully architected and created, I proceeded to deploy the containerized backend services and ensured their proper connectivity to the databases. Afterward, the frontend application was built and deployed on S3.

Continuous Delivery Pipelines

The last step before giving the team the good news was to automate the build and delivery steps of all the services. Evidently, none of the developers should have to perform the tedious and time-wasting tasks of building and deploying the application every time there is a change. As a matter of fact, knowing the pace at which the developers are working, I expect them to push code to develop 276 million times per day.

Therefore, I used AWS CodeBuild and AWS CodePipeline to automate the steps of building and deploying the services. The diagram below depicts all the steps required to continuously deliver the frontend and backend applications:
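
The article does not show the pipeline definitions, so the following is only a sketch of one decision such a pipeline has to make: which environment a pushed branch targets, and which ECR image the backend build should produce. The branch names, the `backend-<environment>` repository names, and the account ID are hypothetical. Only the region and the one-repository-per-environment layout come from the text above.

```python
from dataclasses import dataclass
from typing import Optional

REGION = "eu-west-1"  # region used throughout the article

# Hypothetical mapping from Git branch to environment.
BRANCH_TO_ENVIRONMENT = {"develop": "develop", "main": "production"}


@dataclass(frozen=True)
class DeliveryTarget:
    environment: str
    image_uri: str


def resolve_target(branch: str, commit_sha: str, account_id: str) -> Optional[DeliveryTarget]:
    """Return where a push should be delivered, or None if the branch is not deployed."""
    environment = BRANCH_TO_ENVIRONMENT.get(branch)
    if environment is None:
        return None
    repository = f"backend-{environment}"  # hypothetical repository naming
    tag = commit_sha[:8]  # short commit hash keeps every image traceable to a commit
    registry = f"{account_id}.dkr.ecr.{REGION}.amazonaws.com"
    return DeliveryTarget(environment, f"{registry}/{repository}:{tag}")


if __name__ == "__main__":
    # Placeholder commit hash and account ID, for illustration only.
    print(resolve_target("develop", "0123456789abcdef", "123456789012"))
```

Tagging images with the commit hash rather than a moving tag such as `latest` makes it straightforward to see which code is running in each environment and to roll back to a known build.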

Conclusion

Once everything was done, I met with my friend and the technical lead for a handover. They were very pleased with the outcome, stating that the infrastructure was amazing, but overkill and much more than they needed right now.

But in reality, it is not overkill. As a matter of fact, the product and the team are growing very rapidly. This solution is a skeleton that can be quickly and easily modified and scaled as needed:

The number of backend service replicas can be easily changed.
The EKS nodes can be easily scaled vertically and horizontally.
The frontend application is on S3, which is automatically scalable.
The database can be easily scaled vertically and horizontally.

After delivering the solution in mid-December 2022:

The developers are happy because of the robustness and ease of use of the infrastructure.
My friend is happy because his application is live, and is costing him less than $500 per month.
I am happy because they never called me with a complaint.

Everybody is happy :)))) The end!!
