Analytic, Steaming, Queue
Kinesis
- Collect, Process, and analyze streaming data in real-time
- Ingest real-time data such as Application logs, Metrics, Website clickstreams, IoT telemetry data, IoT telemetry data
- Services
- Kinesis Data Streams
- Capture, process, and store data stream
- Retention between 1 day to 365 days
- Ability to reprocess (replay) data
- Once data is inserted in Kinesis, it can't be deleted (immutability)
- Data that share the same partition foes to the same shard (ordering)
- Producers: AWS SDK, Kinesis Producer Library (KPL), Kinesis Agent
- Consumers:
- Write your own: Kinesis Client Library (KCL), AWS SDK
- Managed: AWS Lambda, Kinesis Data Firehose, Kinesis Data Analytics
- Capacity
- Provisioned Mode
- Choose the number of shards, scale manually
- Each shard gets 1MB/s in (or 1000 records per second)
- Each shard gets 2MB/s out (classic or enhanced fan-out consumer)
- Pay per shard provisioned per hour
- On-demand mode
- Default 4 MB/s in or 4000 records per second
- Scales automatically based on observed throughput peaks during the last 30 days
- Pay per stream per hour & data in/out per GB
- Security
- Control access/authorization using IAM policies
- Encryption in flight using HTTPS endpoints
- Encryption at rest using KMS
- You can implement encryption/decryption of data on the client side (harder)
- VPC Endpoints are available for Kinesis to access within VPC
- Monitor API calls using CloudTrail
- Kinesis Data Firehose
- Load data streams into AWS data stores
- Fully managed service, No administration, automatic scaling, serverless
- Destination
- AWS: Redshift, S3, OpenSearch
- 3rd party partners: Splunk, MongoDB, Datadog, NewRelic
- Custom: HTTP Endpoint
- Pay for data going through Firehose
- Near Real Time (Buffer internal 0 seconds to 900 seconds, Buffer size minimum 1MB)
- Support many data formats, Conversions, Transformations, Compression
- Supports custom data transformations using AWS Lambda
- Can send failed or all data to a backup S3 bucket
- Kinesis Data Analytics
- Analyze data streams with SQL or Apache Flink
- Kinesis Video Streams
- Capture, process, and store video streams
SQS
- Type of service communication
- Overview
- Producer
- Produced to SQS using the SDK (SendMessage API)
- The message persists in SQS until a consumer deletes it
- Consumer
- Poll SQS for messages (receive up to 10 messages at a time)
- Delete the messages using the
DeleteMessageAPI
- Type of the SQS
- Standard Queue (Oldest, over 10 years old)
- Unlimited throughput, unlimited number of messages in the queue
- Default retention of messages 4 days, max 14 days
- Low latency (<10 ms on publish and receive)
- Limitation of 256KB per message sent
- Can have duplicate messages (at least once delivery, occasionally)
- Can have out-of-order messages (best-effort ordering)
- FIFO Queue
- Limited throughput 300 msg/s without batching, 300 msg/s with batching
- Exactly-once-send capability (by removing duplicates)
- Use-cases
- Use as buffer to database writes
- Decouple between application tiers
- Scaling
- Features
- Message Visibility Timeout
- After a consumer polls a message, It becomes invisible to other consumers
- The default is 30 seconds
- If a message is not processed within the visibility timeout, It will be processed twice
- A consumer could call the ChangeMessageVisibility API to get more time
- If visibility timeout is high (hours), and the consumer crashes, re-processing will take time
- If visibility timeout is too low (seconds), you may get duplicates
- Long polling
- When a consumer requests messages from the queue, it can optionally
wait
for messages to arrive if there are none in the queue
- LongPolling decreases the number of API calls made to SQS while increasing the efficiency and latency of your application
- The wait time can be between 1 sec to 20 sec
- Long polling is preferable to short polling
- Long polling can be enabled at the queue level or at the API level using
WaitTimeSeconds
- Security
- In-flight encryption using HTTPS API
- At-rest encryption using KMS keys
- Client-side encryption
- Access Controls by IAM policies
- SQS Access Policies (similar to S3 bucket policies)
- Useful for cross-account access
- Useful for allowing other services to write to an SQS queue
SNS
- Up to 12,500,000 subscriptions per topic
- 100,000 topics limit
- Subscribers
- Emails
- SMS & Mobile Notifications
- HTTP(S) Endpoints
- SQS
- Lambda
- Kinesis Data Firehose
- Many AWS Services can send data directly to SNS
- CloudWatch Alarms
- AWS Budgets
- Lambda
- Auto Scaling Group (Notifications)
- S3 Bucket (Events)
- DynamoDB
- CloudFormation (State Changes)
- AWS DMS (New Replica)
- RDS Events
- Publish
- Topic Publish (SDK)
- Direct Publish (mobile apps SDK)
- Security
- Encryption
- In-flight encryption using HTTPS API
- At-rest encryption using KMS keys
- Client-side encryption if the client wants to perform encryption/decryption itself
- Access Controls
- IAM policies to regulate access to the SNS API
- SNS Access Policies (Similar to S3 bucket policies)
- Useful for cross-account access to SNS topics
- Useful for allowing other services to write to an SNS topic
- Type
- FIFO
- Ordering by Message Group ID (All messages in the same group are ordered)
- Deduplication using a Deduplication ID or Content-Based Deduplication
- Strictly-preserved message ordering
- Exactly once message delivery
- Highest throughput, up to 300 publishes/second
- Subscription protocols: SQS
- Standard
- Best effort message ordering
- At least once message delivery
- Highest throughput in publishes/second
- Subscription protocols: SQS, Lambda, HTTP, SMS, email, mobile application endpoints
- Message Filtering
- JSON policy used to filter messages sent to SNS topic's subscriptions
- If a subscription doesn't have a filter policy, It receives every message
SNS + SQS: Fan Out Pattern
- Push once in SNS, Receive in all SQS queues
- Fully decoupled, No data loss
- Ability to add more SQS subscribers over time
Kinesis vs SQS ordering
- For example, 100 trucks, 5 Kinesis shards, 1 SQS FIFO
- Kinesis Data Streams
- On average you will have 20 trucks per shard
- Trucks will have their data ordered within each shard
- The maximum amount of consumers in parallel we can have is 5
- Can receive up to 5 MB/s of data
- SQS FIFO
- 1 SQS FIFO queue
- 100 Group ID
- Can have up to 100 consumers (due to the 100 Group ID)
- Can have up to 300 messages per second (or 3000 if using batching)
Amazon MQ
- Managed message broker service for RabbitMQ, ActiveMQ
- It doesn't scale as much as SQS/SNS
- runs on servers, Can run in Multi-AZ with failover
- Has both queue feature (SQS) and topic features (SNS)
- High Availability
Others