Overview
Redpanda Connect, formerly known as Benthos (acquired by Redpanda), is a declarative data streaming service that solves a wide range of data engineering problems with simple, chained, stateless processing steps. It implements transaction-based resiliency with back pressure, so when connecting to at-least-once sources and sinks it’s able to guarantee at-least-once delivery without needing to persist messages during transit.
It’s simple to deploy, comes with a wide range of connectors, and is totally data agnostic, making it easy to drop into your existing infrastructure. Connect has functionality that overlaps with integration frameworks, log aggregators, and ETL workflow engines, and can therefore be used to complement these traditional data engineering tools or act as a simpler alternative.
Argo CD is a declarative, GitOps continuous delivery tool for Kubernetes. It pairs well with Redpanda Connect, and this guide shows what deploying and maintaining Redpanda Connect components with it looks like in practice.
In this document, we will walk through the steps of deploying Redpanda Connect on an Amazon EKS cluster using ArgoCD. The process involves fetching the Benthos Helm chart, uploading Helm chart manifests to GitHub, modifying the ConfigMap in Bitbucket, and deploying through ArgoCD.
Prerequisites
1. Amazon EKS Cluster: Ensure you have an EKS cluster running.
2. ArgoCD: Installed and configured on your EKS cluster.
3. Kubectl: Installed and configured to interact with your EKS cluster.
4. Helm: Installed for fetching and managing Helm charts.
5. GitHub Account: For storing Helm chart manifests.
6. Bitbucket Account: For managing ConfigMap modifications.
Introduction to Redpanda Connect
Refer: https://docs.redpanda.com/redpanda-connect/about/
NOTE: At the time of this writing (late August 2024), Redpanda has not yet converted all references to Redpanda Connect from Benthos and many updates are happening during this product integration phase. As a close partner of Redpanda, IntVerse regularly keeps up with changes and will be regularly updating this document – i.e., eventually this note will disappear!
Step-by-Step Deployment
1. Fetch the Helm Chart
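Use Helm to fetch the Benthos chart and untar it. This assumes the benthos chart repository has already been added with helm repo add (helm fetch is an alias of helm pull in Helm 3):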
helm fetch benthos/benthos --untar
2. Review the Chart:
Navigate to the directory where the Helm chart is untarred. This directory will contain the Chart.yaml, values.yaml, and templates.
cd benthos
Upload Helm Chart Manifests to GitHub
1. Create a Git Repository:
Set up a Git repository on GitHub if you don’t have one.
2. Add the Helm Chart:
Upload the Benthos directory (or relevant Helm chart files) to a new directory in your GitHub repository.
git init
git add benthos
git commit -m "Add Benthos Helm chart"
git branch -M main
git remote add origin https://github.com/your-username/your-repo.git
git push -u origin main

Modify ConfigMap on Bitbucket
1. Access Bitbucket:
Go to your Bitbucket repository where the ConfigMap is stored.
2. Edit ConfigMap:
Modify the ConfigMap as needed. For example, update benthos-config.yaml in your Bitbucket repository. Once the Benthos application is deployed, it reads its input and output definitions from this ConfigMap. The config parameter should contain the configuration exactly as it would be parsed by the Benthos binary.
Integrating Benthos and Redpanda BYOC using ConfigMap:
Modify the Benthos ConfigMap to match the example topology. In this use case, Kafka (Redpanda) is the input and a file is the output.
Reference:
https://docs.redpanda.com/redpanda-connect/components/inputs/about/
NOTE on this step:
If you’re seeing issues writing to or reading from Kafka with this component then it’s worth trying out the newer kafka_franz input.
A common symptom is a log line that reports "Failed to connect to Kafka: Kafka: the client has run out of available brokers to talk to (Is your cluster reachable?)" even though the brokers are reachable.
Unfortunately, this error message will appear for a wide range of connection problems even when the broker endpoint can be reached. Double-check your authentication configuration and also ensure that you have enabled TLS if applicable.
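A minimal sketch of such a configuration, written from the shell so it can be committed as benthos-config.yaml. The broker address, topic, consumer group, credential variables, and output path are placeholders (assumptions, not values from this deployment). The example also assumes the cluster requires TLS and SASL/SCRAM, which should be checked against your own Redpanda cluster settings.

```shell
# Write an example Benthos / Redpanda Connect config: Kafka (Redpanda) in, file out.
# All values below are hypothetical placeholders; replace them before committing.
# The quoted 'EOF' stops the shell from expanding ${...}, so the variables are
# left in the file for Benthos to resolve from its environment at startup.
cat > benthos-config.yaml <<'EOF'
input:
  kafka:
    addresses: [ "REDPANDA_BROKER:9092" ]   # placeholder broker endpoint
    topics: [ "example-topic" ]             # placeholder topic
    consumer_group: "example-group"         # placeholder consumer group
    tls:
      enabled: true                         # assumption: cluster requires TLS
    sasl:
      mechanism: SCRAM-SHA-256              # assumption: SCRAM authentication
      user: "${KAFKA_USER}"
      password: "${KAFKA_PASSWORD}"
output:
  file:
    path: /tmp/output.jsonl                 # placeholder output path
    codec: lines
EOF
```

If the connection error described in the note above appears, the tls and sasl blocks are the first things to compare against the cluster's settings.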

Commit Changes:
Commit the changes to Bitbucket.
git add benthos-config.yaml
git commit -m "Update Benthos ConfigMap"
git push origin master
Create an ArgoCD Application for Benthos
Create Application:
In the ArgoCD UI, click on +New APP. Choose EDIT AS YAML and use the following template:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: benthos
spec:
  destination:
    namespace: benthos
    server: https://kubernetes.default.svc
  source:
    path: redpandaconnect/benthos
    repoURL: https://github.com/Intverse/benthos.git
    targetRevision: main
  sources: []
  project: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
      allowEmpty: true
    syncOptions:
      - CreateNamespace=true

Save and Create: Save and create the application.
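Once the application is created, its sync and health state can also be read from the command line. A minimal sketch, assuming Argo CD is installed in the argocd namespace (its default) and kubectl is pointed at the EKS cluster; argocd_app_status is a hypothetical helper name, not part of Argo CD:

```shell
# Hypothetical helper: print the sync and health status of an Argo CD Application.
# $1 = application name (defaults to "benthos", the name used in the template)
# $2 = namespace where Argo CD runs (assumption: "argocd")
argocd_app_status() {
  app_name="${1:-benthos}"
  argocd_ns="${2:-argocd}"
  kubectl get applications.argoproj.io "$app_name" --namespace "$argocd_ns" \
    -o jsonpath='{.status.sync.status}{" "}{.status.health.status}{"\n"}'
}
```

Calling argocd_app_status with no arguments queries the benthos application. A healthy automated sync would typically report Synced and Healthy.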

Verify Deployment
Check Logs: Monitor the logs of the Benthos pods to ensure that they are processing iterations correctly and that there are no issues with the deployment.

Check logs of pod
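The pod logs can be tailed with kubectl. A minimal sketch: the label selector app.kubernetes.io/name=benthos is an assumption based on common Helm chart labelling, so confirm it with kubectl get pods --show-labels first. benthos_logs is a hypothetical helper name.

```shell
# Hypothetical helper: show recent logs from all pods matching a label selector.
# $1 = namespace (defaults to "benthos", the destination namespace in the template)
# $2 = label selector (assumption: the chart sets app.kubernetes.io/name=benthos)
benthos_logs() {
  logs_ns="${1:-benthos}"
  logs_selector="${2:-app.kubernetes.io/name=benthos}"
  kubectl logs --namespace "$logs_ns" --selector "$logs_selector" \
    --tail=100 --prefix
}
```

For example, benthos_logs rbenthos reads from the rbenthos namespace instead of the default.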

Streams mode
Prepare ConfigMap: When running Benthos in streams mode, combine individual stream configuration files into a single Kubernetes ConfigMap. Ensure this ConfigMap is applied before deploying the Helm chart.
Update values.yaml: Enable streams mode in values.yaml file:
# values.yaml
streams:
  enabled: true
  streamsConfigMap: "benthos-streams"

Currently, the streams mode ConfigMap must be applied separately from, and before, installation of the Helm chart; support for deploying additional ConfigMaps within the chart may be implemented later.
For this deployment, a config.yaml file was created on Bitbucket so that multiple input/output iterations (streams) can be defined explicitly for Benthos.
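A minimal sketch of what such a streams ConfigMap could look like, generated from the shell. The ConfigMap name benthos-streams matches the streamsConfigMap value above, and the benthosconfig directory matches the path used by the Argo CD application for the config. The two stream names, topics, broker address, and output paths are hypothetical placeholders.

```shell
# Write a ConfigMap holding two stream definitions, one key per stream.
# Stream names, topics, broker address, and file paths are placeholders.
mkdir -p benthosconfig
cat > benthosconfig/config.yaml <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: benthos-streams        # must match streamsConfigMap in values.yaml
data:
  stream_one.yaml: |
    input:
      kafka:
        addresses: [ "REDPANDA_BROKER:9092" ]
        topics: [ "topic-one" ]
        consumer_group: "benthos-stream-one"
    output:
      file:
        path: /tmp/stream_one.jsonl
        codec: lines
  stream_two.yaml: |
    input:
      kafka:
        addresses: [ "REDPANDA_BROKER:9092" ]
        topics: [ "topic-two" ]
        consumer_group: "benthos-stream-two"
    output:
      file:
        path: /tmp/stream_two.jsonl
        codec: lines
EOF
```

Each key under data becomes one stream, so adding another input/output iteration means adding another key rather than editing an existing one.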

Create an ArgoCD application for benthos config:
Click on +New APP, then click on EDIT AS YAML and use the following template:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: bent
spec:
  destination:
    namespace: rbenthos
    server: https://kubernetes.default.svc
  source:
    path: benthosconfig
    repoURL: https://github.com/Intverse/benthos.git
    targetRevision: main
  sources: []
  project: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
      allowEmpty: true
    syncOptions:
      - CreateNamespace=true
Save and Create: Save and create the application.

Deploy Benthos Application:
Deploy Application: In ArgoCD, configure the repository URL to point to the Benthos Helm chart manifest file and ensure values.yaml has streams enabled.
Create Application: Click on +New APP, choose EDIT AS YAML, and use the following template:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: benthos
spec:
  destination:
    namespace: rbenthos
    server: https://kubernetes.default.svc
  source:
    path: redpandaconnect/benthos
    repoURL: https://github.com/Intverse/benthos.git
    targetRevision: main
  sources: []
  project: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
      allowEmpty: true
    syncOptions:
      - CreateNamespace=true
Save and Create: Save and create the application.

Verify Deployment
Check Logs: Monitor the logs of the Benthos pods to ensure that they are processing iterations correctly and that there are no issues with the deployment.

Check the pod logs to confirm that multiple input/output iterations (streams) are running.

Conclusion:
By following this guide, you’ve successfully deployed Redpanda Connect on an Amazon EKS cluster using ArgoCD, demonstrating the power and flexibility of combining these robust tools. Redpanda Connect’s ability to handle complex data streaming tasks with ease, coupled with ArgoCD’s seamless continuous delivery capabilities, makes for an efficient and scalable solution for modern data engineering challenges.
This deployment process not only simplifies the management of your streaming data infrastructure but also ensures resilience and scalability, allowing your team to focus on building and optimizing data pipelines without the overhead of manual operations. With Redpanda Connect and ArgoCD, you’re well-equipped to handle the demands of today’s data-driven environments, ensuring reliable, real-time data processing across your organization.
To further enhance your capabilities and overcome any data streaming challenges, consider leveraging the expertise of the IntVerse team. Their services can provide tailored solutions, ensuring that your Redpanda Connect deployment is optimized for your specific needs. With IntVerse by your side, you can confidently tackle complex data engineering problems, knowing you have the right support to maximize the potential of your data streaming infrastructure.
Now that your setup is complete, you can explore further customization and optimization of your streaming data pipelines, backed by the expertise and support of the IntVerse team.