This text was originally published in French in 2020 on the Revolve Blog by Jérémie RODON. Translation to English by Nicole Hage Chahine.
On AWS, it is considered best practice to segregate our environments. It is recommended to have separate environments for production, pre-production, and development, ensuring their isolation. Similarly, it is advisable to have separate environments for different teams. In recent years, AWS has dedicated significant efforts to enhance the features of AWS Organizations, promoting the division of resources into distinct AWS accounts. As a result, it has become common for companies, even of modest size, to manage several dozen AWS accounts… and an equal number of VPCs!
The issue of interconnecting these numerous VPCs, both with each other and with the corporate network, has rapidly become critical when designing an AWS infrastructure.
The costly “predecessor”: the Transit VPC model
VPC peering was ruled out early on: peering connections are not transitive, so every pair of VPCs needs its own peering, and we quickly hit the limits of the feature while ending up with a tangled mesh of connections.
In fact, for a long time, the only scalable model that worked was the Transit VPC. Although not our main topic, a quick recap helps us better understand where we came from. The concept was “simple”: in a dedicated VPC (the Transit VPC), we instantiated two large c4.8xlarge instances with our preferred router/firewall appliance AMI, and then configured these appliances to establish VPN connections to each of our other VPCs.
We ensured that the Transit VPC had connectivity with our on-premises network, and with proper configuration management, voila! We had an extended enterprise network in AWS, more or less highly available, capable of reaching thousands of VPCs.
Figure 1: Transit VPC with its router/firewall instances. Each instance establishes two tunnels to each of the “App VPCs.”
Certainly, the solution was far from perfect. Firstly, it incurred exorbitant costs due to the expenses associated with instances, licenses, and VPN connections. Secondly, there were difficulties in effectively managing appliance redundancy and keeping the overall configuration up to date as new VPCs were added.
As an interesting anecdote, I encountered a company that extensively used this solution (with thousands of VPCs), resulting in monthly costs amounting to hundreds of thousands of dollars. Adding new VPCs involved submitting a request to the network team in the form of an Excel file, and even then, it could take up to a week if luck was on your side.
In essence, as you may have surmised, this approach was not highly compatible with the principles of DevSecOps and agility. However, alternatives were limited, leaving us with no other choice.
The horizon finally brightened in November 2018 at the re:Invent conference in Las Vegas, with the eagerly anticipated announcement of AWS Transit Gateway.
Introducing Transit Gateway
AWS Transit Gateway simply replaces the entire concept of Transit VPC with a single, managed, scalable, redundant, and automated service.
Compared to what it does for us, I would readily assert that the service is easy to manage. However, when diving into it for the first time, you can easily feel a bit lost, especially if you are not familiar with networking concepts. Let’s first examine the different components of the service and their relationships, step by step.
Transit Gateway
In addition to being the name of the service, “Transit Gateway” is also the name of the central component. A Transit Gateway is a regional element: typically, one is created per region, regardless of the number of AWS accounts.
Generally, it is created in a dedicated AWS account (let’s call it “network”) and shared with the rest of our organization through AWS Resource Access Manager (RAM). This allows it to be visible and accessible from all our accounts.
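To make this concrete, here is a minimal boto3 sketch of that setup, run from the “network” account. The region, the organization ARN, and the resource-share name are placeholders of my own, not values from the article:

```python
import boto3

# Sketch only: run from the "network" account; region, ARN and names are placeholders.
ec2 = boto3.client("ec2", region_name="eu-west-1")
ram = boto3.client("ram", region_name="eu-west-1")

# Create the regional Transit Gateway.
tgw = ec2.create_transit_gateway(
    Description="Shared transit gateway",
    Options={
        # Disable the default route table behaviour so that new attachments
        # only route traffic once explicitly associated with a TGWRT.
        "DefaultRouteTableAssociation": "disable",
        "DefaultRouteTablePropagation": "disable",
        "AutoAcceptSharedAttachments": "disable",
    },
)["TransitGateway"]

# Share it with the whole organization through AWS Resource Access Manager.
ram.create_resource_share(
    name="transit-gateway-share",
    resourceArns=[tgw["TransitGatewayArn"]],
    principals=["arn:aws:organizations::111111111111:organization/o-exampleorgid"],
    allowExternalPrincipals=False,
)
```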
Attachment
To connect a VPC to a Transit Gateway, we create an “Attachment.” The creation process is initiated from the VPC account. The “network” account, hosting the Transit Gateway, then needs to accept the attachment creation. Alternatively, the Transit Gateway can be configured to automatically accept new attachments (but caution should be exercised to ensure that the Transit Gateway doesn’t use a default route table, as it could pose a potential security issue).
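As a rough illustration of this two-step flow, here is a boto3 sketch; the profile names and all resource IDs are placeholders:

```python
import boto3

# Sketch only: profile names and resource IDs are placeholders.
spoke_ec2 = boto3.Session(profile_name="app-account").client("ec2", region_name="eu-west-1")
network_ec2 = boto3.Session(profile_name="network-account").client("ec2", region_name="eu-west-1")

# 1. From the VPC account: request the attachment (the shared TGW is visible thanks to RAM).
attachment = spoke_ec2.create_transit_gateway_vpc_attachment(
    TransitGatewayId="tgw-0123456789abcdef0",
    VpcId="vpc-0123456789abcdef0",
    SubnetIds=["subnet-aaaa1111", "subnet-bbbb2222"],  # one subnet per Availability Zone to serve
)["TransitGatewayVpcAttachment"]

# 2. From the "network" account: accept it (unless auto-accept is enabled on the TGW).
network_ec2.accept_transit_gateway_vpc_attachment(
    TransitGatewayAttachmentId=attachment["TransitGatewayAttachmentId"]
)
```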
By default, a Transit Gateway can have up to 5000 attachments (soft limit).
Once the attachment is created, the VPC route tables can use a new target: the Transit Gateway. This allows us to decide which subnet(s) of the VPC can communicate with it. In general, we route all “private” traffic (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16) to the Transit Gateway or even set the Transit Gateway as the default target. However, the specific routing depends on the desired objectives.
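For example, a minimal sketch of pointing the three RFC 1918 ranges of a VPC route table at the Transit Gateway (the IDs are placeholders):

```python
import boto3

# Sketch only: the route table and transit gateway IDs are placeholders.
ec2 = boto3.client("ec2", region_name="eu-west-1")

# Send all "private" (RFC 1918) traffic to the Transit Gateway. The VPC's own
# CIDR is covered by the more specific "local" route, so intra-VPC traffic
# never leaves the VPC.
for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"):
    ec2.create_route(
        RouteTableId="rtb-0123456789abcdef0",
        DestinationCidrBlock=cidr,
        TransitGatewayId="tgw-0123456789abcdef0",
    )
```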
Since the Transit Gateway is a regional component, it can only be attached to VPCs within the same region. If you have VPCs in multiple regions, it is possible to establish peering connections between Transit Gateways in different regions, enabling the creation of a multi-region network.
Nevertheless, this topic is beyond the scope of this series of articles, and we will only consider the case of a single region. If you are interested in an article about Transit Gateways in a multi-region context, please let me know, as it would encourage me to write it!
A small clarification: an attachment is not necessarily a connection between a Transit Gateway and a VPC. An attachment can also be a Transit Gateway Peering, a VPN connection, or even a Direct Connect link (it technically has a different name in the case of Direct Connect).
Transit Gateway Route Table
When using a Transit Gateway, it is essential to have control over network routing. This is where Transit Gateway Route Tables (TGWRT) come into play. Each Transit Gateway can have up to 20 TGWRTs, containing a total of 10,000 routes (these are soft limits).
Each route consists of a CIDR block and a target, which is an attachment.
If we disregard the higher limits and different targets, a Transit Gateway Route Table functions similarly to the VPC Route Tables we are familiar with.
Notably, they follow the longest-prefix-match principle: the route used for the routing decision is the most specific one among those matching the destination of the packet. Just like with VPC Route Tables, the same CIDR block cannot be used twice in the same TGWRT, ensuring that there is no ambiguity.
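A minimal sketch of creating a TGWRT and adding a static route, run from the “network” account with placeholder IDs:

```python
import boto3

# Sketch only: all IDs are placeholders; run from the "network" account.
ec2 = boto3.client("ec2", region_name="eu-west-1")

# Create a route table on the Transit Gateway.
tgwrt = ec2.create_transit_gateway_route_table(
    TransitGatewayId="tgw-0123456789abcdef0"
)["TransitGatewayRouteTable"]

# A route is simply a CIDR block plus an attachment used as the target.
ec2.create_transit_gateway_route(
    DestinationCidrBlock="10.42.0.0/16",
    TransitGatewayRouteTableId=tgwrt["TransitGatewayRouteTableId"],
    TransitGatewayAttachmentId="tgw-attach-0123456789abcdef0",
)
```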
Association, propagation & routes
Each attachment is associated with a single TGWRT, and it is the TGWRT that determines the routing of IP packets originating from the attachment. The routing decision is based on the destination IP of the packet and the associated route table for the attachment. It’s important to note that while an attachment can only be associated with one TGWRT, a TGWRT can be associated with multiple attachments.
Additionally, it is possible to propagate an attachment to one or more TGWRTs. Propagation should not be confused with association. Propagation simply facilitates the creation of a route in a TGWRT. When an attachment is propagated to a TGWRT, the service automatically creates a route using the CIDR block of the corresponding VPC (or BGP announcements from the VPN if applicable) with that attachment as the target. The same configuration could be manually done without any functional difference; it’s just a convenience feature.
To summarize:
- We associate an attachment with a TGWRT when we want that TGWRT to be used for routing packets from that attachment (outbound traffic).
- We propagate an attachment to one or more TGWRTs when we want those TGWRTs to be able to route traffic to that attachment (send packets into the pipe).
Here, the attachments from our “App VPCs” are associated with the TGWRT “AppVPCs,” and they are propagated to the TGWRT “Network.” This specific configuration allows each “App VPC” to communicate via the VPN but not directly with each other.
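To illustrate, here is a boto3 sketch of that wiring. All IDs are placeholders, and the VPN-attachment side (shown in the article’s diagram) is my own assumption, added so the example is complete:

```python
import boto3

# Sketch only: route table and attachment IDs are placeholders.
ec2 = boto3.client("ec2", region_name="eu-west-1")

APPVPCS_TGWRT = "tgw-rtb-0aaaaaaaaaaaaaaa1"  # "AppVPCs" route table
NETWORK_TGWRT = "tgw-rtb-0bbbbbbbbbbbbbbb2"  # "Network" route table
APP_ATTACHMENTS = ["tgw-attach-0aaaaaaaaaaaaaaa1", "tgw-attach-0bbbbbbbbbbbbbbb2"]
VPN_ATTACHMENT = "tgw-attach-0ccccccccccccccc3"

for attachment_id in APP_ATTACHMENTS:
    # Outbound: packets coming from an App VPC are routed with the "AppVPCs" table...
    ec2.associate_transit_gateway_route_table(
        TransitGatewayRouteTableId=APPVPCS_TGWRT,
        TransitGatewayAttachmentId=attachment_id,
    )
    # ...while only the "Network" table learns the route back to that App VPC,
    # so App VPCs never learn routes to each other.
    ec2.enable_transit_gateway_route_table_propagation(
        TransitGatewayRouteTableId=NETWORK_TGWRT,
        TransitGatewayAttachmentId=attachment_id,
    )

# Assumed mirror configuration for the VPN attachment: associated with "Network"
# and propagated to "AppVPCs", so the App VPCs and the corporate network can
# reach each other through the VPN.
ec2.associate_transit_gateway_route_table(
    TransitGatewayRouteTableId=NETWORK_TGWRT,
    TransitGatewayAttachmentId=VPN_ATTACHMENT,
)
ec2.enable_transit_gateway_route_table_propagation(
    TransitGatewayRouteTableId=APPVPCS_TGWRT,
    TransitGatewayAttachmentId=VPN_ATTACHMENT,
)
```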
One flow, four route tables
One important point to note is that establishing connectivity between two VPCs via the Transit Gateway often involves more than a single TGWRT.
In fact, there is no restriction on associating the attachments of our VPCs with different TGWRTs. In practice, this is often the case, as we will see. Additionally, the VPC Route Tables of each VPC must also be properly configured.
As seen in the diagram, when instance A and instance B communicate, TGWRT 1 is used for traffic from A to B because it is associated with attachment A. However, TGWRT 2 is used for traffic from B to A because it is associated with attachment B. In general, bidirectional traffic is necessary for proper functionality, so we indeed need to consider all four route tables.
“One flow, two Transit Gateway Route Tables, and two VPC Route Tables.”
It is important to keep this in mind when debugging or auditing Transit Gateway configurations.
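As a debugging aid, a small sketch like the one below answers “which TGWRT, and which route in it, handles traffic entering through this attachment toward this destination?”. The attachment IDs and IP addresses are placeholders:

```python
import boto3

# Sketch only: attachment IDs and destination IPs are placeholders.
ec2 = boto3.client("ec2", region_name="eu-west-1")

def outbound_route(attachment_id: str, destination_ip: str):
    # The TGWRT associated with the source attachment makes the routing decision.
    attachment = ec2.describe_transit_gateway_attachments(
        TransitGatewayAttachmentIds=[attachment_id]
    )["TransitGatewayAttachments"][0]
    tgwrt_id = attachment["Association"]["TransitGatewayRouteTableId"]

    # Longest-prefix-match lookup in that route table.
    routes = ec2.search_transit_gateway_routes(
        TransitGatewayRouteTableId=tgwrt_id,
        Filters=[{"Name": "route-search.longest-prefix-match", "Values": [destination_ip]}],
    )["Routes"]
    return tgwrt_id, routes

# Check both directions of the flow: A -> B and B -> A use different TGWRTs.
print(outbound_route("tgw-attach-0aaaaaaaaaaaaaaa1", "10.2.0.15"))  # attachment of VPC A
print(outbound_route("tgw-attach-0bbbbbbbbbbbbbbb2", "10.1.0.10"))  # attachment of VPC B
```

The two VPC Route Tables still need to be checked separately (for example with describe_route_tables) to cover all four tables of the flow.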
The trivial use case: hub and spoke
The most trivial use case is the hub and spoke. The functional requirement is simple: I want everyone to communicate with each other.
It is very straightforward to set up with Transit Gateway as it requires only a single route table. We create all the attachments, associate them all with the single TGWRT, and propagate them all to this same TGWRT.
This results in the following configuration:
And everyone communicates with everyone else.
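A minimal sketch of that hub-and-spoke wiring, assuming boto3 and placeholder IDs:

```python
import boto3

# Sketch only: the single route table ID and the attachment IDs are placeholders.
ec2 = boto3.client("ec2", region_name="eu-west-1")

HUB_TGWRT = "tgw-rtb-0123456789abcdef0"
ATTACHMENTS = [
    "tgw-attach-0aaaaaaaaaaaaaaa1",
    "tgw-attach-0bbbbbbbbbbbbbbb2",
    "tgw-attach-0ccccccccccccccc3",
]

for attachment_id in ATTACHMENTS:
    # Outbound: every attachment uses the same, single route table...
    ec2.associate_transit_gateway_route_table(
        TransitGatewayRouteTableId=HUB_TGWRT,
        TransitGatewayAttachmentId=attachment_id,
    )
    # ...and that table learns a route to every attachment, so everyone can
    # reach everyone else.
    ec2.enable_transit_gateway_route_table_propagation(
        TransitGatewayRouteTableId=HUB_TGWRT,
        TransitGatewayAttachmentId=attachment_id,
    )
```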
However, in practice, we may not necessarily want that.
In the next article, we will see how to manage a more complex network at scale, where some VPCs need to communicate with each other, while others do not, or where certain VPCs need access to the VPN and others do not, and so on.
In short, I will introduce my method for managing and evolving a complex network composed of multiple routing domains at scale.