What are AWS Kinesis Data Firehose and Kinesis Data Streams?

What is Kinesis Data Streams?

Amazon Kinesis Data Streams collects and processes large streams of data records in real time. The processing is done by data-processing applications, known as Kinesis Data Streams applications.

A Kinesis Data Streams application reads data from a data stream as data records. These applications can use the Kinesis Client Library (KCL) and can run on Amazon EC2 instances.

You can send the processed records to dashboards, use them to generate alerts, dynamically change pricing and advertising strategies, or send data to a variety of other AWS services.

The producers continually push data to Kinesis Data Streams, and the consumers process the data in real time. Consumers (such as a custom application running on Amazon EC2 or an Amazon Kinesis Data Firehose delivery stream) can store their results using an AWS service such as Amazon DynamoDB, Amazon Redshift, or Amazon S3.
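
For example, a producer can write records with the AWS SDK and a consumer can read them back. The sketch below uses Python (boto3); the stream name is a placeholder, and a real consumer would more typically use the Kinesis Client Library rather than polling a single shard.

# A minimal sketch (assumes boto3, AWS credentials, and an existing stream
# named "my-data-stream"; the stream name is a placeholder).
import json
import boto3

kinesis = boto3.client("kinesis")

# Producer: push one record onto the data stream.
kinesis.put_record(
    StreamName="my-data-stream",
    Data=json.dumps({"event": "page_view", "user": "u-123"}).encode("utf-8"),
    PartitionKey="u-123",  # records with the same key land on the same shard
)

# Consumer: read records from the first shard (real applications usually use the KCL).
shard_iterator = kinesis.get_shard_iterator(
    StreamName="my-data-stream",
    ShardId="shardId-000000000000",
    ShardIteratorType="TRIM_HORIZON",
)["ShardIterator"]

response = kinesis.get_records(ShardIterator=shard_iterator, Limit=10)
for record in response["Records"]:
    print(record["Data"])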

What is Kinesis Data Firehose?

  • Kinesis Data Firehose is used to deliver real-time streaming data to Amazon S3, Amazon Redshift, Amazon OpenSearch Service, custom HTTP endpoints, and third-party providers such as Splunk, Dynatrace, or Datadog.
  • With Kinesis Data Firehose, you configure producers such as Amazon EC2, AWS WAF, and CloudWatch Logs to send data to a delivery stream, which then automatically delivers the data to the destination.
  • For Amazon Redshift destinations, streaming data is delivered to your S3 bucket first. Kinesis Data Firehose then issues an Amazon Redshift COPY command to load data from your S3 bucket to your Amazon Redshift cluster. 
  • You can also configure Kinesis Data Firehose to transform your data before delivering it.
  • Kinesis Data Firehose supports Amazon S3 server-side encryption with AWS Key Management Service (AWS KMS) for encrypting delivered data in Amazon S3.
  • If data transformation is enabled, Kinesis Data Firehose can log the Lambda invocation, and send data delivery errors to CloudWatch Logs.
  • Kinesis Data Firehose uses IAM roles for all the permissions that the delivery stream needs, such as access to your S3 bucket, AWS KMS key (if data encryption is enabled), and Lambda function (if data transformation is enabled).
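
As an illustration of the producer side, an application can also write to a delivery stream directly with the AWS SDK. The short Python (boto3) sketch below assumes a delivery stream named my-delivery-stream already exists; the name is a placeholder.

# A minimal sketch (assumes boto3, AWS credentials, and an existing
# delivery stream named "my-delivery-stream"; the name is a placeholder).
import json
import boto3

firehose = boto3.client("firehose")

# Send one record to the delivery stream; Firehose buffers it and
# delivers it to the configured destination (e.g. an S3 bucket).
firehose.put_record(
    DeliveryStreamName="my-delivery-stream",
    Record={"Data": (json.dumps({"status": 200, "path": "/index.html"}) + "\n").encode("utf-8")},
)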

Kinesis Data Firehose delivery stream

  • You create a delivery stream and then send your data to it.

Records

  • A record is the data of interest that your data producer sends to a Kinesis Data Firehose delivery stream.

Producers

  • Producers send records to Kinesis Data Firehose delivery streams. 
  • You can also configure your Kinesis Data Firehose delivery stream to automatically read data from an existing Kinesis data stream and load it into destinations.
  • When creating the delivery stream, you can choose an existing Kinesis data stream as the source instead of Direct PUT.
  • Another way to get data into Kinesis Data Firehose is the Amazon Kinesis Agent, a standalone Java software application that offers an easy way to collect and send data to Kinesis Data Firehose.
  • You can install the agent on Linux-based server environments such as web servers, log servers, and database servers. The agent can pre-process the records parsed from monitored files before sending them to your delivery stream.
sudo yum install -y aws-kinesis-agent
  • To configure the agent, open and edit its configuration file (as superuser if using the default file access permissions):
/etc/aws-kinesis/agent.json 

  • After editing the configuration, start the agent:
sudo service aws-kinesis-agent start
  • The IAM role or AWS credentials that you specify must have permission to perform the Kinesis Data Firehose PutRecordBatch operation for the agent to send data to your delivery stream (see the sketch after the configuration examples below). A minimal agent.json flow only needs a file pattern and a delivery stream name:
{ 
   "flows": [
        { 
            "filePattern": "/tmp/app.log*", 
            "deliveryStream": "yourdeliverystream"
        } 
   ] 
}
Example with data pre-processing (converting Apache common log entries to JSON):

{ 
    "flows": [
        {
            "filePattern": "/tmp/app.log*", 
            "deliveryStream": "my-delivery-stream",
            "dataProcessingOptions": [
                {
                    "optionName": "LOGTOJSON",
                    "logFormat": "COMMONAPACHELOG"
                }
            ]
        }
    ] 
}
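
For reference, PutRecordBatch is the same operation your own producer code would call. Below is a minimal Python (boto3) sketch; the delivery stream name is a placeholder.

# A minimal sketch of the PutRecordBatch operation the agent's IAM role must allow
# (the delivery stream name is a placeholder).
import boto3

firehose = boto3.client("firehose")

records = [{"Data": line.encode("utf-8")} for line in ("log line 1\n", "log line 2\n")]

response = firehose.put_record_batch(
    DeliveryStreamName="yourdeliverystream",
    Records=records,
)
# FailedPutCount tells you how many records need to be retried.
print(response["FailedPutCount"])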

How to create a Kinesis Data Firehose delivery stream with dynamic partitioning enabled

  • Navigate to the Amazon Kinesis console and click on Delivery streams.
  • Next, choose an S3 bucket as the destination. Here, we selected Dynamic partitioning as Not enabled.

Note: Dynamic partitioning enables you to create targeted data sets by partitioning streaming S3 data based on partitioning keys. You can partition your source data with inline parsing and/or the specified AWS Lambda function. You can enable dynamic partitioning only when you create a new delivery stream. You cannot enable dynamic partitioning for an existing delivery stream.
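
If you do enable dynamic partitioning, it can also be configured programmatically when the delivery stream is created. Below is a hedged Python (boto3) sketch; the ARNs, the bucket, and the customer_id partition key (extracted with inline JQ parsing) are placeholders.

# A minimal sketch (ARNs, bucket, and the customer_id key are placeholders).
import boto3

firehose = boto3.client("firehose")

firehose.create_delivery_stream(
    DeliveryStreamName="my-partitioned-stream",
    DeliveryStreamType="DirectPut",
    ExtendedS3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",
        "BucketARN": "arn:aws:s3:::my-destination-bucket",
        "DynamicPartitioningConfiguration": {"Enabled": True},
        # Keys extracted by inline parsing are referenced in the S3 prefix.
        "Prefix": "data/customer_id=!{partitionKeyFromQuery:customer_id}/",
        "ErrorOutputPrefix": "errors/",
        # Dynamic partitioning requires a larger buffer (64 MB minimum).
        "BufferingHints": {"SizeInMBs": 64, "IntervalInSeconds": 300},
        "ProcessingConfiguration": {
            "Enabled": True,
            "Processors": [
                {
                    "Type": "MetadataExtraction",
                    "Parameters": [
                        {"ParameterName": "MetadataExtractionQuery",
                         "ParameterValue": "{customer_id:.customer_id}"},
                        {"ParameterName": "JsonParsingEngine",
                         "ParameterValue": "JQ-1.6"},
                    ],
                }
            ],
        },
    },
)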

  • Select Amazon CloudWatch error logging as Enabled and also create a new IAM role.

Kinesis Data Firehose can use Amazon S3 to back up either all of the data, or only the data that failed delivery, when it attempts to deliver to your chosen destination. You can specify the S3 backup settings in the following cases (a sketch of the corresponding API fields follows this list):

  • If you set Amazon S3 as the destination for your Kinesis Data Firehose delivery stream and you choose to specify an AWS Lambda function to transform data records, or if you choose to convert data record formats for your delivery stream.
  • If you set Amazon Redshift as the destination for your Kinesis Data Firehose delivery stream and you choose to specify an AWS Lambda function to transform data records.
  • If you set any of the following services as the destination for your Kinesis Data Firehose delivery stream: Amazon OpenSearch Service, Datadog, Dynatrace, HTTP Endpoint, LogicMonitor, MongoDB Cloud, New Relic, Splunk, or Sumo Logic.
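
As a rough illustration of how this maps to the API, the backup behavior is expressed through an S3BackupMode field on the destination configuration. The Python fragment below uses placeholder ARNs.

# A minimal fragment (placeholders throughout) showing how S3 backup is expressed
# in the API: for an S3 destination, S3BackupMode is "Disabled" or "Enabled";
# for a Splunk destination it is "FailedEventsOnly" or "AllEvents".
extended_s3_config = {
    "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",
    "BucketARN": "arn:aws:s3:::my-destination-bucket",
    "S3BackupMode": "Enabled",
    "S3BackupConfiguration": {
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",
        "BucketARN": "arn:aws:s3:::my-backup-bucket",
    },
}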

When server-side encryption is enabled and you send data from your data producers to your data stream, Kinesis Data Streams encrypts your data using an AWS Key Management Service (AWS KMS) key before storing it at rest.

When your Kinesis Data Firehose delivery stream reads the data from your data stream, Kinesis Data Streams first decrypts the data and then sends it to Kinesis Data Firehose.
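
Server-side encryption is enabled per stream. The short Python (boto3) sketch below uses the AWS-managed KMS key for Kinesis; the stream name is a placeholder.

# A minimal sketch: enable server-side encryption on an existing stream
# using the AWS-managed KMS key for Kinesis (stream name is a placeholder).
import boto3

kinesis = boto3.client("kinesis")

kinesis.start_stream_encryption(
    StreamName="my-data-stream",
    EncryptionType="KMS",
    KeyId="alias/aws/kinesis",
)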

Writing to a Kinesis Data Firehose delivery stream using CloudWatch Events

  • On the CloudWatch page, click on Rules.
  • Click Create rule, and provide the Source and Target details.
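
The same rule can also be created programmatically. The Python (boto3) sketch below uses a hypothetical event pattern, placeholder ARNs, and a role that CloudWatch Events can assume to write to the delivery stream.

# A minimal sketch (event pattern, ARNs, and names are placeholders).
import json
import boto3

events = boto3.client("events")

# Source: match EC2 instance state-change events (a hypothetical choice).
events.put_rule(
    Name="ec2-state-to-firehose",
    EventPattern=json.dumps({"source": ["aws.ec2"],
                             "detail-type": ["EC2 Instance State-change Notification"]}),
    State="ENABLED",
)

# Target: the Firehose delivery stream, with a role CloudWatch Events can assume.
events.put_targets(
    Rule="ec2-state-to-firehose",
    Targets=[{
        "Id": "firehose-target",
        "Arn": "arn:aws:firehose:us-east-1:123456789012:deliverystream/my-delivery-stream",
        "RoleArn": "arn:aws:iam::123456789012:role/events-to-firehose-role",
    }],
)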

Sending Amazon VPC flow logs to a Kinesis Data Firehose delivery stream (Splunk) using CloudWatch

  • In the Amazon VPC console, create a VPC flow log with the destination set to a CloudWatch log group.
  • In the CloudWatch service, choose Log groups and create a log group.
  • Create a Kinesis Data Firehose delivery stream with Splunk as the destination.
  • Now, create a CloudWatch subscription filter which will send the CloudWatch logs to the delivery stream:
aws logs put-subscription-filter --log-group-name "VPCtoSplunkLogGroup" --filter-name "Destination" --filter-pattern "" --destination-arn "arn:aws:firehose:your-region:your-aws-account-id:deliverystream/VPCtoSplunkStream" --role-arn "arn:aws:iam::your-aws-account-id:role/VPCtoSplunkCWtoFHRole"
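
For the first step, the VPC flow log itself can also be created programmatically. Below is a hedged Python (boto3) sketch; the VPC ID and role ARN are placeholders, and the log group name matches the one used above.

# A minimal sketch (VPC ID and role ARN are placeholders).
import boto3

ec2 = boto3.client("ec2")

ec2.create_flow_logs(
    ResourceIds=["vpc-0123456789abcdef0"],
    ResourceType="VPC",
    TrafficType="ALL",
    LogDestinationType="cloud-watch-logs",
    LogGroupName="VPCtoSplunkLogGroup",
    DeliverLogsPermissionArn="arn:aws:iam::123456789012:role/VPCFlowLogsRole",
)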