Amazon MSK Connect is a feature of Amazon Managed Streaming for Apache Kafka (Amazon MSK) that offers a fully managed Apache Kafka Connect environment on AWS. With MSK Connect, you can deploy fully managed connectors built for Kafka Connect that move data into or pull data from popular data stores like Amazon S3 and Amazon OpenSearch Service. With the introduction of private DNS support in MSK Connect, connectors can resolve private customer domain names by using the DNS servers configured in the customer VPC’s DHCP option set. This post demonstrates a solution for resolving private DNS hostnames defined in a customer VPC for MSK Connect.
You may want to use private DNS hostname support for MSK Connect for multiple reasons. Before this capability was available, MSK Connect used the service VPC DNS resolver for DNS resolution and didn’t use the private DNS servers defined in the customer VPC DHCP option set. Connectors could only reference hostnames in the connector configuration or plugin that were publicly resolvable. They couldn’t resolve private hostnames defined in a private hosted zone or served by DNS servers in another customer network.
Many customers ensure that the DNS names of their internal applications are not publicly resolvable. For example, you might have a MySQL or PostgreSQL database and may not want its DNS name to be publicly resolvable or accessible. Amazon Relational Database Service (Amazon RDS) and Amazon Aurora servers, by contrast, have DNS names that are publicly resolvable but not publicly accessible. You can have multiple internal applications, such as databases, data warehouses, or other systems, whose DNS names are not publicly resolvable.
With the recent launch of MSK Connect private DNS support, you can configure connectors to reference public or private domain names. Connectors use the DNS servers configured in your VPC’s DHCP option set to resolve domain names. You can now use MSK Connect to privately connect with databases, data warehouses, and other resources in your VPC to comply with your security needs.
If you have a MySQL or PostgreSQL database with a private DNS name, you can register that name on a custom DNS server and configure the VPC’s DHCP option set so that DNS resolution uses the custom DNS server local to the VPC instead of the service DNS resolver.
You have several architecture options for setting up MSK Connect. For example, you can have Amazon MSK and MSK Connect in the same VPC; the source system in VPC1 and Amazon MSK and MSK Connect in VPC2; or the source system, Amazon MSK, and MSK Connect each in a different VPC.
The following setup uses two different VPCs, where the MySQL VPC hosts the MySQL database and the MSK VPC hosts Amazon MSK, MSK Connect, the DNS server, and various other components. You can extend this architecture to support other deployment topologies using appropriate AWS Identity and Access Management (IAM) permissions and connectivity options.
This post provides step-by-step instructions to set up MSK Connect so that it receives data from a source MySQL database with a private DNS hostname in the MySQL VPC and sends that data to Amazon MSK in another VPC. The following diagram illustrates the high-level architecture.
The setup instructions include the following key steps:
To follow the tutorial in this post, you need the following:
Before configuring MSK Connect, we need to set up the VPCs, subnets, and other core infrastructure components. To set up resources in your AWS account, complete the following steps:
The CloudFormation template creates the following key resources in your account:
Please note, if you’re using VPC peering or AWS Transit Gateway with MSK Connect, don’t configure your connector for reaching the peered VPC resources with IPs in the CIDR ranges. For more information, refer to Connecting from connectors.
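One way to guard against this is to check a target address against the restricted ranges before putting it in a connector configuration. The following is a minimal sketch using Python’s standard ipaddress module. The function name is hypothetical, and the range in the usage lines is a documentation-only placeholder, not the list from the MSK Connect documentation; substitute the ranges given in Connecting from connectors.

```python
import ipaddress


def in_restricted_ranges(ip, restricted_cidrs):
    """Return True if `ip` falls inside any of the given CIDR ranges."""
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in restricted_cidrs)


# Placeholder range (TEST-NET-3, reserved for documentation), NOT the real restricted list.
EXAMPLE_RESTRICTED = ["203.0.113.0/24"]

print(in_restricted_ranges("203.0.113.25", EXAMPLE_RESTRICTED))  # True
print(in_restricted_ranges("10.0.1.15", EXAMPLE_RESTRICTED))     # False
```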
Complete the following steps to configure the DNS server:
For the allow-transfer attribute, add the DNS server’s internal IP address so that it reads allow-transfer { localhost; <DNS Server internal IP address>; };.
You can find the DNS server IP address on the Outputs tab of the CloudFormation stack.
Note that the MSK cluster is not set up yet at this stage. After setting up the MSK cluster later in this post, we need to update the /var/named/kafka.region.amazonaws.com configuration file with the Kafka broker DNS names and their respective internal IP addresses.
Also note that these settings configure the DNS server for this post. In your own environment, you can configure the DNS server as per your needs.
You should see the following message: Redirecting to /bin/systemctl restart named.service
Your custom DNS server is up and running now.
Typically, we could use an Amazon RDS for MySQL database, but for this post, we use a custom MySQL database server. Amazon RDS DNS names are publicly resolvable, so MSK Connect already supported them, but in the past it could not support databases or applications with private DNS names. With the launch of the private DNS hostnames feature, it can support private DNS names as well, so we use a MySQL database on an EC2 instance to demonstrate this.
This section describes a MySQL database set up on a single-node EC2 instance. Don’t use this for your production setup; follow appropriate guidance for setting up and configuring MySQL in your account.
The MySQL database is already set up using the CloudFormation template and is ready to use now. To upload the data, complete the following steps:
This inserts records into several tables in the salesdb database.
DHCP option sets give you control over the following aspects of routing in your virtual network:
To support private DNS, you can use an Amazon Route 53 private zone or your own custom DNS server. If you use a Route 53 private zone, the setup will work automatically and there is no need to make any changes to the default DHCP option set for the MSK VPC. For a custom DNS server, complete the following steps to set up a custom DHCP configuration using Amazon Virtual Private Cloud (Amazon VPC) and attach it to the MSK VPC.
Your VPC has a default DHCP option set that points to the Amazon-provided DNS server. At this stage, requests go to the Amazon-provided DNS server for resolution. Because we’re using a custom DNS server, we create a new DHCP option set.
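If you prefer to script this step rather than use the console, the following is a minimal sketch using the boto3 EC2 client’s create_dhcp_options and associate_dhcp_options calls. The function names, the DNS server IP, and the domain name are hypothetical placeholders, and the second function assumes boto3 is installed and AWS credentials are configured.

```python
def dhcp_options_request(dns_server_ip, domain_name):
    """Build the arguments for EC2 CreateDhcpOptions with a custom DNS server."""
    return {
        "DhcpConfigurations": [
            {"Key": "domain-name-servers", "Values": [dns_server_ip]},
            {"Key": "domain-name", "Values": [domain_name]},
        ]
    }


def attach_custom_dns(vpc_id, dns_server_ip, domain_name):
    """Create the option set and associate it with the VPC (needs boto3 and credentials)."""
    import boto3  # imported here so the builder above works without boto3 installed

    ec2 = boto3.client("ec2")
    created = ec2.create_dhcp_options(**dhcp_options_request(dns_server_ip, domain_name))
    options_id = created["DhcpOptions"]["DhcpOptionsId"]
    ec2.associate_dhcp_options(DhcpOptionsId=options_id, VpcId=vpc_id)
    return options_id


# Hypothetical values; take the real DNS server IP from the CloudFormation Outputs tab.
print(dhcp_options_request("10.0.0.10", "example.internal"))
```

Keeping the request builder separate from the AWS call makes it possible to review the exact options before anything is changed in the VPC.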
Now our base infrastructure setup is ready to move to the next stage. As part of our base infrastructure, we have set up the following key components successfully:
At this stage, the MySQL DB DNS name is resolvable using a custom DNS server instead of Route 53.
The next step is to deploy the MSK cluster and MSK Connect, which will fetch records from salesdb and publish them to Amazon MSK. In this section, we provide a walkthrough of replicating the MySQL database (salesdb) to Amazon MSK using Debezium, an open-source connector. The connector monitors the database and captures any changes to the tables.
With MSK Connect, you can run fully managed Apache Kafka Connect workloads on AWS. MSK Connect provisions the required resources and sets up the cluster. It continuously monitors the health and delivery state of connectors, patches and manages the underlying hardware, and auto scales connectors to match changes in throughput. As a result, you can focus your resources on building applications rather than managing infrastructure.
MSK Connect will make use of the custom DNS server in the VPC and it won’t be dependent on Route 53.
Complete the following steps to create an MSK cluster:
In the next step, we attach this configuration to a cluster. Complete the following steps:
Cluster creation can typically take 25–30 minutes. Its status changes to Active when it’s created successfully.
Before you create the MSK connector, update the DNS server configurations with the MSK cluster details.
Now you can log in to the DNS server and update the records for each broker and its IP address in the /var/named/kafka.region.amazonaws.com file.
Note that you need to provide the broker DNS as mentioned earlier. Remove .kafka.<region id>.amazonaws.com from the broker DNS name.
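The name trimming described above can be sketched as a small helper that produces A-record lines for the zone file. This is an illustration only: the function name, broker names, and IP addresses are hypothetical, and it assumes the zone’s origin is kafka.<region id>.amazonaws.com, as the file name suggests.

```python
def broker_a_records(brokers, region):
    """Turn {broker DNS name: internal IP} into BIND-style A-record lines.

    The suffix .kafka.<region>.amazonaws.com is stripped from each name,
    on the assumption that the zone origin already supplies it.
    """
    suffix = f".kafka.{region}.amazonaws.com"
    lines = []
    for dns_name, ip in sorted(brokers.items()):
        host = dns_name[: -len(suffix)] if dns_name.endswith(suffix) else dns_name
        lines.append(f"{host} IN A {ip}")
    return lines


# Hypothetical broker names and IPs, for illustration only.
brokers = {
    "b-1.mycluster.abc123.c2.kafka.us-east-1.amazonaws.com": "10.0.1.11",
    "b-2.mycluster.abc123.c2.kafka.us-east-1.amazonaws.com": "10.0.2.12",
}
for line in broker_a_records(brokers, "us-east-1"):
    print(line)
```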
You should see the following message:
Your custom DNS server is now up and running, and you should be able to resolve the broker DNS names using the internal DNS server.
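To confirm resolution, you could run a quick check from an instance in the MSK VPC that uses the custom DHCP option set. The sketch below relies only on Python’s standard socket module and the instance’s system resolver; the function name is hypothetical, and the broker name in the comment is a placeholder.

```python
import socket


def resolve_ipv4(hostname):
    """Return the sorted IPv4 addresses the system resolver gives for `hostname`.

    An empty list means the name did not resolve.
    """
    try:
        results = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    except socket.gaierror:
        return []
    # Each entry's sockaddr is (address, port); keep the unique addresses.
    return sorted({entry[4][0] for entry in results})


# Usage (replace the placeholder with a broker DNS name from your cluster):
#   print(resolve_ipv4("<broker DNS name>"))
```

If the returned addresses match the internal IPs you entered in the zone file, the connector should resolve the brokers the same way.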
It’s important to have the appropriate connectivity in place between MSK Connect and the MySQL database. Complete the following steps:
To create your MSK connector, complete the following steps:
Wait for the connector creation process to complete (about 10–15 minutes).
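For orientation, a Debezium MySQL source connector configuration has roughly the following shape. This is a sketch, not the configuration used in this post: the property names follow the Debezium 2.x MySQL connector (older versions use database.server.name and database.history.* instead of topic.prefix and schema.history.internal.*), and every value other than salesdb is a hypothetical placeholder. Depending on how your MSK cluster handles authentication, additional properties may be required.

```python
def debezium_mysql_config(db_hostname, bootstrap_servers):
    """Assemble a Debezium MySQL source connector configuration as string pairs."""
    return {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "tasks.max": "1",
        "database.hostname": db_hostname,  # private DNS name, resolved via the custom DNS server
        "database.port": "3306",
        "database.user": "<db user>",
        "database.password": "<db password>",
        "database.server.id": "123456",  # hypothetical unique numeric ID
        "database.include.list": "salesdb",
        "topic.prefix": "salesdb-server",  # hypothetical prefix for topic names
        "schema.history.internal.kafka.bootstrap.servers": bootstrap_servers,
        "schema.history.internal.kafka.topic": "schema-history.salesdb",
    }


# Hypothetical hostname and broker list, for illustration only.
config = debezium_mysql_config("mysql.example.internal", "<broker-1>:9092,<broker-2>:9092")
for key, value in sorted(config.items()):
    print(f"{key}={value}")
```

The point to note is database.hostname: with private DNS support, it can be a name that only the custom DNS server in your VPC can resolve.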
The MSK Connect connector is now up and running. You can log in to the MySQL database using your user ID and change a couple of records in the customer table. MSK Connect receives the CDC records, and the updates to the database become available in the MSK <Customer> topic.
To consume messages from the MSK topic, run the Kafka consumer on the MSK_Client EC2 instance available in the MSK VPC.
Now you’re ready to consume data.
Run the Kafka consumer on your EC2 machine and you will see messages similar to the following:
While testing the application, records with CUST_ID 1998, 1999, and 2000 were updated, and these records are available in the logs.
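If you want to pick the changed customer IDs out of the consumer output, a small parser along these lines may help. It is a sketch that assumes the connector emits JSON using the standard Debezium change event envelope (before, after, and op fields, optionally wrapped in payload); the function name and the sample event, including the NAME column, are hypothetical.

```python
import json


def summarize_change(message):
    """Return (operation, CUST_ID) for one Debezium change event given as a JSON string."""
    event = json.loads(message)
    payload = event.get("payload", event)  # the envelope may or may not carry a schema wrapper
    row = payload.get("after") or payload.get("before") or {}
    return payload.get("op"), row.get("CUST_ID")


# Hypothetical, heavily trimmed event; real messages carry more fields.
sample = json.dumps({
    "payload": {
        "before": {"CUST_ID": 1998, "NAME": "old name"},
        "after": {"CUST_ID": 1998, "NAME": "new name"},
        "op": "u",
    }
})
print(summarize_change(sample))  # ('u', 1998)
```

For delete events, where after is null, the helper falls back to the before image so the ID is still reported.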
It’s good practice to clean up all the resources created as part of this post to avoid additional cost. To clean up your resources, delete the MSK cluster, MSK Connect connector, EC2 instances, DNS server, bastion host, S3 bucket, VPC, subnets, and CloudWatch logs.
Additionally, clean up all other AWS resources that you created using AWS CloudFormation. You can delete these resources on the AWS CloudFormation console by deleting the stack.
In this post, we discussed the process of setting up MSK Connect using a private DNS. This feature allows you to configure connectors to reference public or private domain names.
We received the initial load and CDC records from a MySQL database that is hosted in a separate VPC and whose DNS name is not accessible or resolvable externally. MSK Connect was able to connect to the MySQL database and consume the records using the private DNS feature. The custom DHCP option set attached to the VPC ensured that DNS resolution was performed by the local DNS server instead of Route 53.
With MSK Connect private DNS support, you can keep your databases, data warehouses, and systems like secrets managers inside your own VPC, inaccessible from the internet, while still connecting to them from MSK Connect and complying with your corporate security posture.
To learn more and get started, refer to Private DNS for MSK Connect.
Amar is a Senior Solutions Architect at Amazon AWS in the UK. He works across power, utilities, manufacturing and automotive customers on strategic implementations, specializing in using AWS Streaming and advanced data analytics solutions, to drive optimal business outcomes.