Doc: Deploying Redpanda Connect on EKS using ArgoCD


Overview

Redpanda Connect, formerly known as Benthos (acquired by Redpanda), is a declarative data streaming service that solves a wide range of data engineering problems with simple, chained, stateless processing steps. It implements transaction-based resiliency with back pressure, so when connecting to at-least-once sources and sinks it’s able to guarantee at-least-once delivery without needing to persist messages during transit.

It’s simple to deploy, comes with a wide range of connectors, and is totally data agnostic, making it easy to drop into your existing infrastructure. Connect has functionality that overlaps with integration frameworks, log aggregators, and ETL workflow engines, and can therefore be used to complement these traditional data engineering tools or act as a simpler alternative.

Argo CD is a declarative, GitOps continuous delivery tool for Kubernetes. It pairs well with Redpanda Connect because both are driven by declarative configuration that can be kept in Git, and this document shows what deploying and maintaining Redpanda Connect components through Argo CD looks like in practice.

In this document, we will walk through the steps of deploying Redpanda Connect on an Amazon EKS cluster using ArgoCD. The process involves fetching the Benthos Helm chart, uploading Helm chart manifests to GitHub, modifying the ConfigMap in Bitbucket, and deploying through ArgoCD.

Prerequisites

1. Amazon EKS Cluster: Ensure you have an EKS cluster running.

2. ArgoCD: Installed and configured on your EKS cluster.

3. Kubectl: Installed and configured to interact with your EKS cluster.

4. Helm: Installed for fetching and managing Helm charts.

5. GitHub Account: For storing Helm chart manifests.

6. Bitbucket Account: For managing ConfigMap modifications.

Introduction to Redpanda Connect

Redpanda Connect's core ideas are summarized in the Overview above. For a fuller introduction, refer to the official documentation: https://docs.redpanda.com/redpanda-connect/about/

NOTE: At the time of this writing (late August 2024), Redpanda has not yet converted all references from Benthos to Redpanda Connect, and many updates are happening during this product integration phase. As a close partner of Redpanda, IntVerse keeps up with these changes and will update this document regularly, so this note will eventually disappear.

Step-by-Step Deployment

1. Fetch the Helm Chart

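Use Helm to fetch the Benthos chart and untar it. This assumes the chart repository has already been added to Helm under the name benthos (with helm repo add). In Helm 3, helm fetch is an alias for helm pull.

helm fetch benthos/benthos --untar

2. Review the Chart: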

Navigate to the directory where the Helm chart is untarred. This directory will contain the Chart.yaml, values.yaml, and templates.

cd benthos

Upload Helm Chart Manifests to GitHub

1. Create a Git Repository:

Set up a Git repository on GitHub if you don’t have one.

2. Add the Helm Chart:

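Upload the Benthos directory (or relevant Helm chart files) to a new directory in your GitHub repository. Run the commands from the parent directory that contains the benthos folder.

# Step back out of the chart directory entered in the previous step
cd ..
git init
git add benthos
git commit -m "Add Benthos Helm chart"
# Name the local branch main so that it matches the push below
git branch -M main
git remote add origin https://github.com/your-username/your-repo.git
git push -u origin main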

Modify ConfigMap on Bitbucket

1. Access Bitbucket:

Go to your Bitbucket repository where the ConfigMap is stored.

2. Edit ConfigMap:

Modify the ConfigMap as needed. For example, update benthos-config.yaml in your Bitbucket repository. Once the Benthos application is deployed, it reads its input and output definitions from the ConfigMap provided here. The config parameter should contain the configuration exactly as it would be parsed by the Benthos binary.

Integrating Benthos and Redpanda BYOC using ConfigMap:

Modify Benthos ConfigMap to match our example topology. In this use case, we are using Kafka (Redpanda) as an “input” and File as an “Output”.
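As a rough illustration of this topology, the sketch below shows a Kafka (Redpanda) input feeding a file output. It assumes the chart's config parameter in values.yaml is where the pipeline configuration lives. The broker address, topic, consumer group, user name, and output path are placeholders rather than values from our environment, and the SASL mechanism is an assumption that should be matched to your cluster's settings.

```yaml
# values.yaml (excerpt) - illustrative only; every value below is a placeholder
config:
  input:
    kafka:
      addresses:
        - "seed-broker.example.com:9092"   # hypothetical Redpanda seed broker
      topics:
        - "example-topic"                  # hypothetical topic name
      consumer_group: "benthos-example"    # hypothetical consumer group
      tls:
        enabled: true                      # enable if your cluster requires TLS
      sasl:
        mechanism: SCRAM-SHA-256           # assumption: use your cluster's mechanism
        user: "example-user"
        password: "${REDPANDA_PASSWORD}"   # interpolated from the environment at startup
  output:
    file:
      path: "/tmp/benthos/output.jsonl"    # hypothetical path inside the container
      codec: lines                         # write one message per line
```

The ${REDPANDA_PASSWORD} placeholder relies on environment variable interpolation in the Connect configuration, so the variable would need to be supplied to the pod, for example from a Kubernetes Secret, instead of being committed to Git.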

Reference:
https://docs.redpanda.com/redpanda-connect/components/inputs/about/
https://docs.redpanda.com/redpanda-connect/components/processors/about
https://docs.redpanda.com/redpanda-connect/components/outputs/about

NOTE on this step:

If you’re seeing issues writing to or reading from Kafka with the kafka component, it’s worth trying out the newer kafka_franz input.

A common symptom is a log line such as "Failed to connect to Kafka: Kafka: the client has run out of available brokers to talk to (Is your cluster reachable?)" even though the brokers are reachable.

Unfortunately, this error message appears for a wide range of connection problems, including cases where the broker endpoint can be reached. Double-check your authentication configuration and ensure that you have enabled TLS if applicable.
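For comparison, a minimal sketch of an equivalent kafka_franz input is shown below. It is not taken from our deployment: the broker address, topic, consumer group, and credentials are placeholders, and the SASL mechanism is again an assumption. Note that kafka_franz uses seed_brokers instead of addresses and takes sasl as a list.

```yaml
# Illustrative kafka_franz input; all values are placeholders
input:
  kafka_franz:
    seed_brokers:
      - "seed-broker.example.com:9092"     # hypothetical Redpanda seed broker
    topics:
      - "example-topic"                    # hypothetical topic name
    consumer_group: "benthos-example"      # hypothetical consumer group
    tls:
      enabled: true                        # enable if your cluster requires TLS
    sasl:
      - mechanism: SCRAM-SHA-256           # assumption: use your cluster's mechanism
        username: "example-user"
        password: "${REDPANDA_PASSWORD}"   # interpolated from the environment at startup
```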

3. Commit Changes:

Commit the changes to Bitbucket.

git add benthos-config.yaml
git commit -m "Update Benthos ConfigMap"
git push origin master

Create an ArgoCD Application for Benthos

Create Application:

In the ArgoCD UI, click on +New APP. Choose EDIT AS YAML and use the following template:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: benthos
spec:
  destination:
    namespace: benthos
    server: https://kubernetes.default.svc
  source:
    path: redpandaconnect/benthos
    repoURL: https://github.com/Intverse/benthos.git
    targetRevision: main
  sources: []
  project: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
      allowEmpty: true
    syncOptions:
      - CreateNamespace=true

Save and Create: Save and create the application.

Verify Deployment

Check Logs: Monitor the logs of the Benthos pods (for example, with kubectl logs in the benthos namespace) to confirm that messages are flowing from the input to the output and that the deployment reports no errors.

Streams mode

Prepare ConfigMap: When running Benthos in streams mode, combine individual stream configuration files into a single Kubernetes ConfigMap. Ensure this ConfigMap is applied before deploying the Helm chart.

Update values.yaml: Enable streams mode in values.yaml file:

# values.yaml
streams:
 enabled: true
 streamsConfigMap: "benthos-streams"

Currently, the streams mode ConfigMap should be applied separately from, and before, installation of the Helm chart; support for deploying additional ConfigMaps within the chart may be implemented later.

We created a config.yaml file on Bitbucket so that we can explicitly define multiple input/output pairs (streams) for Benthos.
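A streams ConfigMap along these lines might look like the sketch below, where each key under data is one stream definition. The stream names, topics, broker address, and output paths are hypothetical. The ConfigMap name is chosen to match streamsConfigMap in values.yaml, and the namespace is assumed to be the one used by the Argo CD applications in this section.

```yaml
# Illustrative streams ConfigMap; stream names, topics, and paths are placeholders
apiVersion: v1
kind: ConfigMap
metadata:
  name: benthos-streams     # must match streams.streamsConfigMap in values.yaml
  namespace: rbenthos       # assumption: same namespace as the Benthos application
data:
  # Each key is one stream. In streams mode, a stream file holds only that
  # stream's own sections (input, pipeline, output).
  orders.yaml: |
    input:
      kafka:
        addresses: [ "seed-broker.example.com:9092" ]
        topics: [ "orders" ]
        consumer_group: "benthos-orders"
    output:
      file:
        path: "/tmp/benthos/orders.jsonl"
        codec: lines
  payments.yaml: |
    input:
      kafka:
        addresses: [ "seed-broker.example.com:9092" ]
        topics: [ "payments" ]
        consumer_group: "benthos-payments"
    output:
      file:
        path: "/tmp/benthos/payments.jsonl"
        codec: lines
```

Keeping one key per stream means a new input/output pair can be added by committing one more entry to the ConfigMap, without touching the Helm chart itself.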

Create an ArgoCD application for benthos config:

Click on +New APP, choose EDIT AS YAML, and use the following template:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: bent
spec:
  destination:
    namespace: rbenthos
    server: https://kubernetes.default.svc
  source:
    path: benthosconfig
    repoURL: https://github.com/Intverse/benthos.git
    targetRevision: main
  sources: []
  project: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
      allowEmpty: true
    syncOptions:
      - CreateNamespace=true

Save and Create: Save and create the application.

Deploy Benthos Application:

Deploy Application: In ArgoCD, configure the repository URL to point to the Benthos Helm chart manifest file and ensure values.yaml has streams enabled.

Create Application: Click on +New APP, choose EDIT AS YAML, and use the following template:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: benthos
spec:
  destination:
    namespace: rbenthos
    server: https://kubernetes.default.svc
  source:
    path: redpandaconnect/benthos
    repoURL: https://github.com/Intverse/benthos.git
    targetRevision: main
  sources: []
  project: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
      allowEmpty: true
    syncOptions:
      - CreateNamespace=true

Save and Create: Save and create the application.

Verify Deployment

Check Logs: Monitor the logs of the Benthos pods (for example, with kubectl logs in the rbenthos namespace) to confirm that each configured stream is running, that messages are flowing through every input/output pair, and that the deployment reports no errors.

Conclusion:

By following this guide, you’ve successfully deployed Redpanda Connect on an Amazon EKS cluster using ArgoCD, demonstrating the power and flexibility of combining these robust tools. Redpanda Connect’s ability to handle complex data streaming tasks with ease, coupled with ArgoCD’s seamless continuous delivery capabilities, makes for an efficient and scalable solution for modern data engineering challenges.

This deployment process not only simplifies the management of your streaming data infrastructure but also ensures resilience and scalability, allowing your team to focus on building and optimizing data pipelines without the overhead of manual operations. With Redpanda Connect and ArgoCD, you’re well-equipped to handle the demands of today’s data-driven environments, ensuring reliable, real-time data processing across your organization.

To further enhance your capabilities and overcome any data streaming challenges, consider leveraging the expertise of the IntVerse team. Their services can provide tailored solutions, ensuring that your Redpanda Connect deployment is optimized for your specific needs. With IntVerse by your side, you can confidently tackle complex data engineering problems, knowing you have the right support to maximize the potential of your data streaming infrastructure.

Now that your setup is complete, you can explore further customization and optimization of your streaming data pipelines, backed by the expertise and support of the IntVerse team.

