Analytics, Streaming, Queues

Kinesis

Collect, process, and analyze streaming data in real time
Ingest real-time data such as application logs, metrics, website clickstreams, and IoT telemetry data
Services
- Kinesis Data Streams
  - Capture, process, and store data streams
  - Retention between 1 day to 365 days
  - Ability to reprocess (replay) data
  - Once data is inserted in Kinesis, it can't be deleted (immutability)
  - Records that share the same partition key go to the same shard (ordering)
  - Producers: AWS SDK, Kinesis Producer Library (KPL), Kinesis Agent
  - Consumers:
    - Write your own: Kinesis Client Library (KCL), AWS SDK
    - Managed: AWS Lambda, Kinesis Data Firehose, Kinesis Data Analytics
  - Capacity
    - Provisioned Mode
      - Choose the number of shards, scale manually
      - Each shard gets 1MB/s in (or 1000 records per second)
      - Each shard gets 2MB/s out, shared by all classic consumers (or 2MB/s per consumer per shard with enhanced fan-out)
      - Pay per shard provisioned per hour
    - On-demand mode
      - Default 4 MB/s in or 4000 records per second
      - Scales automatically based on observed throughput peaks during the last 30 days
      - Pay per stream per hour & data in/out per GB
  - Security
    - Control access/authorization using IAM policies
    - Encryption in flight using HTTPS endpoints
    - Encryption at rest using KMS
    - You can implement encryption/decryption of data on the client side (harder)
    - VPC endpoints are available to access Kinesis from within a VPC
    - Monitor API calls using CloudTrail
- Kinesis Data Firehose
  - Load data streams into AWS data stores
  - Fully managed service, No administration, automatic scaling, serverless
  - Destinations
    - AWS: Redshift, S3, OpenSearch
    - 3rd party partners: Splunk, MongoDB, Datadog, New Relic
    - Custom: HTTP Endpoint
  - Pay for data going through Firehose
  - Near real time (buffer interval 0 to 900 seconds, buffer size minimum 1MB)
  - Supports many data formats, conversions, transformations, and compression
  - Supports custom data transformations using AWS Lambda
  - Can send failed or all data to a backup S3 bucket
- Kinesis Data Analytics
  - Analyze data streams with SQL or Apache Flink
- Kinesis Video Streams
  - Capture, process, and store video streams
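The partition-key routing and the provisioned-mode limits of Kinesis Data Streams can be sketched in Python. Kinesis hashes each partition key with MD5 into a 128-bit hash key, and each shard owns a range of hash keys. This sketch assumes those ranges are equal slices of the key space (a simplification: real shards report their own HashKeyRange, which can be uneven after resharding). The function names are hypothetical.

```python
import hashlib
import math

HASH_KEY_SPACE = 2 ** 128  # MD5 digests are 128-bit integers


def shard_for_partition_key(partition_key: str, shard_count: int) -> int:
    """Return the index of the shard whose hash-key range contains this key.

    Hypothetical simplification: the 128-bit hash-key space is split into
    equal, contiguous ranges, one per shard.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    range_size = HASH_KEY_SPACE // shard_count
    # Clamp so the last shard absorbs any remainder of the integer division.
    return min(hash_key // range_size, shard_count - 1)


def provisioned_shards_needed(in_mb_per_s: float, in_records_per_s: float,
                              out_mb_per_s: float) -> int:
    """Estimate the shard count from the per-shard limits in these notes.

    Per shard: 1 MB/s or 1,000 records/s in, 2 MB/s out shared by classic consumers.
    """
    return max(
        1,
        math.ceil(in_mb_per_s / 1.0),
        math.ceil(in_records_per_s / 1000),
        math.ceil(out_mb_per_s / 2.0),
    )
```

Because the hash is deterministic, every record sent with the key "truck-42" lands on the same shard, which is what gives per-key ordering. The shard estimate takes the tightest of the three limits.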

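Firehose's near-real-time behavior comes from buffering: a batch is delivered when either the buffer size or the buffer interval is reached, whichever comes first. The hypothetical FirehoseBuffer class below models only that rule with an explicit clock; measuring the interval from the first buffered record is an assumption of this model, not a documented detail.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FirehoseBuffer:
    """Hypothetical model of Firehose buffering: deliver when either hint is hit."""

    size_limit_bytes: int = 1024 * 1024  # buffer size hint (1 MB minimum per the notes)
    interval_seconds: float = 300.0      # buffer interval hint, illustrative value (0-900 s)
    flushed: List[List[bytes]] = field(default_factory=list, init=False)
    _records: List[bytes] = field(default_factory=list, init=False)
    _bytes: int = field(default=0, init=False)
    _opened_at: float = field(default=0.0, init=False)

    def put(self, record: bytes, now: float) -> None:
        """Add a record at time `now` (seconds) and flush if a hint is reached."""
        if not self._records:
            self._opened_at = now  # model assumption: interval starts at the first record
        self._records.append(record)
        self._bytes += len(record)
        self.tick(now)

    def tick(self, now: float) -> None:
        """Flush the current batch if it is full enough or old enough."""
        if not self._records:
            return
        full = self._bytes >= self.size_limit_bytes
        expired = now - self._opened_at >= self.interval_seconds
        if full or expired:
            self.flushed.append(self._records)  # one delivery to the destination
            self._records = []
            self._bytes = 0
```

Small buffers and short intervals lower latency but produce many small deliveries; larger ones batch more efficiently at the cost of delay.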
SQS

Types of service communication
Overview
Producer
- Messages are sent to SQS using the SDK (SendMessage API)
- The message persists in SQS until a consumer deletes it
Consumer
- Poll SQS for messages (receive up to 10 messages at a time)
- Delete the messages using the DeleteMessage API
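A minimal producer/consumer sketch of the SendMessage, ReceiveMessage, and DeleteMessage flow, written against the boto3 SQS client method names (send_message, receive_message, delete_message). The queue URL and the InMemorySQS stand-in are hypothetical, included only so the sketch runs without AWS; in real use you would pass boto3.client("sqs") instead. WaitTimeSeconds=20 enables long polling.

```python
import uuid
from typing import Any, Callable, Dict, List

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/demo-queue"  # hypothetical


def produce(sqs: Any, queue_url: str, body: str) -> str:
    """Send one message with the SendMessage API and return its MessageId."""
    return sqs.send_message(QueueUrl=queue_url, MessageBody=body)["MessageId"]


def consume_once(sqs: Any, queue_url: str, handle: Callable[[str], None]) -> int:
    """Receive up to 10 messages, process each, then delete it; return the count."""
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,  # SQS returns at most 10 messages per call
        WaitTimeSeconds=20,      # long polling: wait up to 20 s if the queue is empty
    )
    processed = 0
    for message in response.get("Messages", []):  # key is absent when nothing arrived
        handle(message["Body"])
        # Delete only after processing succeeds; otherwise the message
        # becomes visible again once its visibility timeout expires.
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
        processed += 1
    return processed


class InMemorySQS:
    """Hypothetical offline stand-in for boto3.client("sqs").

    Single queue (QueueUrl is ignored), only the three calls used above,
    and no visibility timeout.
    """

    def __init__(self) -> None:
        self.messages: List[Dict[str, str]] = []

    def send_message(self, QueueUrl: str, MessageBody: str) -> Dict[str, str]:
        message_id = str(uuid.uuid4())
        self.messages.append(
            {"MessageId": message_id, "ReceiptHandle": message_id, "Body": MessageBody}
        )
        return {"MessageId": message_id}

    def receive_message(self, QueueUrl: str, MaxNumberOfMessages: int = 1,
                        WaitTimeSeconds: int = 0) -> Dict[str, Any]:
        batch = self.messages[:MaxNumberOfMessages]
        return {"Messages": list(batch)} if batch else {}

    def delete_message(self, QueueUrl: str, ReceiptHandle: str) -> None:
        self.messages = [m for m in self.messages if m["ReceiptHandle"] != ReceiptHandle]
```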
Types of SQS queues
- Standard Queue (Oldest, over 10 years old)
  - Unlimited throughput, unlimited number of messages in the queue
  - Default retention of messages 4 days, max 14 days
  - Low latency (<10 ms on publish and receive)
  - Maximum message size of 256KB
  - Can have duplicate messages (at least once delivery, occasionally)
  - Can have out-of-order messages (best-effort ordering)
- FIFO Queue
  - Limited throughput: 300 msg/s without batching, 3000 msg/s with batching
  - Exactly-once send capability (by removing duplicates)
  - Messages are processed in order by the consumer
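FIFO deduplication can be modelled as a map from deduplication ID to first-seen time: a repeated ID inside the 5-minute deduplication interval is accepted but not delivered again. With content-based deduplication, the ID is a SHA-256 hash of the message body. The FifoDeduplicator class is a hypothetical in-memory illustration, not the service's implementation.

```python
import hashlib
from typing import Dict, Optional

DEDUP_INTERVAL_SECONDS = 300  # SQS FIFO deduplication interval (5 minutes)


class FifoDeduplicator:
    """Hypothetical model of FIFO deduplication with an explicit clock."""

    def __init__(self, content_based: bool = False) -> None:
        self.content_based = content_based
        self._first_seen: Dict[str, float] = {}

    def dedup_id(self, body: str, explicit_id: Optional[str]) -> str:
        if explicit_id is not None:
            return explicit_id
        if not self.content_based:
            raise ValueError("a deduplication ID is required without content-based deduplication")
        # Content-based deduplication: SHA-256 of the message body.
        return hashlib.sha256(body.encode("utf-8")).hexdigest()

    def accept(self, body: str, now: float, explicit_id: Optional[str] = None) -> bool:
        """True if the message is enqueued, False if it is dropped as a duplicate."""
        key = self.dedup_id(body, explicit_id)
        first = self._first_seen.get(key)
        if first is not None and now - first < DEDUP_INTERVAL_SECONDS:
            return False  # duplicate inside the window: not delivered again
        self._first_seen[key] = now
        return True
```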
Use-cases
- Use as a buffer for database writes
- Decouple application tiers
Scaling
Features
- Message Visibility Timeout
  - After a consumer polls a message, it becomes invisible to other consumers
  - The default is 30 seconds
  - If a message is not deleted within the visibility timeout, it becomes visible again and may be processed twice
  - A consumer could call the ChangeMessageVisibility API to get more time
  - If visibility timeout is high (hours), and the consumer crashes, re-processing will take time
  - If visibility timeout is too low (seconds), you may get duplicates
- Long polling
  - When a consumer requests messages from the queue, it can optionally wait for messages to arrive if there are none in the queue
  - Long polling decreases the number of API calls made to SQS while increasing efficiency and reducing the latency of your application
  - The wait time can be between 1 and 20 seconds
  - Long polling is preferable to short polling
  - Long polling can be enabled at the queue level (ReceiveMessageWaitTimeSeconds) or at the API level using WaitTimeSeconds
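The visibility timeout rules can be checked with a small in-memory queue driven by an explicit clock. VisibilityQueue and its methods are hypothetical; change_visibility plays the role of ChangeMessageVisibility, and using one fixed handle per message is a simplification (real SQS issues a new receipt handle on every receive).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class _Message:
    body: str
    visible_at: float = 0.0  # time at which the message becomes visible again
    receive_count: int = 0


class VisibilityQueue:
    """Hypothetical in-memory queue modelling the SQS visibility timeout."""

    def __init__(self, visibility_timeout: float = 30.0) -> None:
        self.visibility_timeout = visibility_timeout
        self._messages: Dict[str, _Message] = {}
        self._next_id = 0

    def send(self, body: str) -> str:
        handle = f"m{self._next_id}"
        self._next_id += 1
        self._messages[handle] = _Message(body)
        return handle

    def receive(self, now: float, max_messages: int = 10) -> List[str]:
        """Return visible messages and hide them for visibility_timeout seconds."""
        handles = [h for h, m in self._messages.items() if m.visible_at <= now][:max_messages]
        for h in handles:
            self._messages[h].visible_at = now + self.visibility_timeout
            self._messages[h].receive_count += 1
        return handles

    def change_visibility(self, handle: str, now: float, timeout: float) -> None:
        """ChangeMessageVisibility analogue: stay hidden for `timeout` more seconds."""
        self._messages[handle].visible_at = now + timeout

    def delete(self, handle: str) -> None:
        del self._messages[handle]

    def receive_count(self, handle: str) -> int:
        return self._messages[handle].receive_count
```

Receiving at t=0 hides the message until t=30; if it is not deleted by then, the next receive returns it again, which is the duplicate-processing case above.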
Security
- In-flight encryption using HTTPS API
- At-rest encryption using KMS keys
- Client-side encryption
- Access control via IAM policies
- SQS Access Policies (similar to S3 bucket policies)
  - Useful for cross-account access
  - Useful for allowing other services to write to an SQS queue
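An SQS access policy that lets one SNS topic write to a queue can be built as a Python dict and serialized to JSON. The ARNs are hypothetical placeholders; the aws:SourceArn condition limits access to that single topic. For cross-account access, the Principal would instead name the other AWS account.

```python
import json


def sqs_policy_allow_sns(queue_arn: str, topic_arn: str) -> str:
    """Build an SQS access policy allowing one SNS topic to send to the queue."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "sns.amazonaws.com"},
                "Action": "sqs:SendMessage",
                "Resource": queue_arn,
                # Only this topic may write; other SNS topics are not covered.
                "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
            }
        ],
    }
    return json.dumps(policy, indent=2)
```

The resulting string can be attached with set_queue_attributes(QueueUrl=..., Attributes={"Policy": policy_json}) on a boto3 SQS client.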

SNS

Up to 12,500,000 subscriptions per topic
100,000 topics limit
Subscribers
- Email
- SMS & Mobile Notifications
- HTTP(S) Endpoints
- SQS
- Lambda
- Kinesis Data Firehose
Many AWS services can send data directly to SNS
- CloudWatch Alarms
- AWS Budgets
- Lambda
- Auto Scaling Group (Notifications)
- S3 Bucket (Events)
- DynamoDB
- CloudFormation (State Changes)
- AWS DMS (New Replica)
- RDS Events
Publish
- Topic Publish (SDK)
- Direct Publish (mobile apps SDK)
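A topic publish can be sketched with the argument names of boto3's SNS publish call, passed in as a plain callable (for example boto3.client("sns").publish) so the sketch runs without AWS. The topic ARN and the event_type attribute are hypothetical; message attributes are what subscription filter policies match on.

```python
from typing import Any, Callable, Dict


def publish_event(publish: Callable[..., Dict[str, Any]], topic_arn: str,
                  message: str, event_type: str) -> str:
    """Topic publish: one call; SNS then delivers to every matching subscription."""
    response = publish(
        TopicArn=topic_arn,
        Message=message,
        # A String attribute that subscription filter policies can match on.
        MessageAttributes={
            "event_type": {"DataType": "String", "StringValue": event_type}
        },
    )
    return response["MessageId"]
```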
Security
- Encryption
  - In-flight encryption using HTTPS API
  - At-rest encryption using KMS keys
  - Client-side encryption if the client wants to perform encryption/decryption itself
- Access Controls
  - IAM policies to regulate access to the SNS API
- SNS Access Policies (Similar to S3 bucket policies)
  - Useful for cross-account access to SNS topics
  - Useful for allowing other services to write to an SNS topic
Type
- FIFO
  - Ordering by Message Group ID (All messages in the same group are ordered)
  - Deduplication using a Deduplication ID or Content-Based Deduplication
  - Strictly-preserved message ordering
  - Exactly once message delivery
  - Limited throughput, up to 300 publishes/second
  - Subscription protocols: SQS
- Standard
  - Best effort message ordering
  - At least once message delivery
  - Highest throughput in publishes/second
  - Subscription protocols: SQS, Lambda, HTTP, SMS, email, mobile application endpoints
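Publishing to an SNS FIFO topic adds a MessageGroupId, plus a MessageDeduplicationId unless content-based deduplication is enabled on the topic; FIFO topic names end in .fifo. publish_fifo and the ARNs in the example are hypothetical, and publish is expected to behave like boto3's SNS publish.

```python
from typing import Any, Callable, Dict, Optional


def publish_fifo(publish: Callable[..., Dict[str, Any]], topic_arn: str, message: str,
                 group_id: str, dedup_id: Optional[str] = None) -> str:
    """Publish to an SNS FIFO topic; ordering is preserved within group_id."""
    if not topic_arn.endswith(".fifo"):
        raise ValueError("SNS FIFO topic names must end with '.fifo'")
    params: Dict[str, Any] = {
        "TopicArn": topic_arn,
        "Message": message,
        "MessageGroupId": group_id,  # same group => delivered in order
    }
    if dedup_id is not None:
        # Required unless the topic has content-based deduplication enabled.
        params["MessageDeduplicationId"] = dedup_id
    return publish(**params)["MessageId"]
```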
Message Filtering
- A JSON policy used to filter the messages delivered to an SNS topic's subscriptions
- If a subscription doesn't have a filter policy, it receives every message
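Filter-policy matching can be sketched for the simplest case, exact string values: every key in the policy must match (AND), and an attribute matches if it equals any of the listed values (OR). Real SNS filter policies also support numeric, prefix, anything-but, and exists operators, and can filter on the message body; these hypothetical helpers cover only exact matches on message attributes.

```python
from typing import Dict, List, Optional

FilterPolicy = Dict[str, List[str]]


def matches_filter_policy(policy: Optional[FilterPolicy], attributes: Dict[str, str]) -> bool:
    """Simplified SNS filter-policy check (exact string matching only)."""
    if not policy:
        return True  # no filter policy: the subscription receives every message
    # AND across keys, OR across the allowed values listed for each key.
    return all(attributes.get(key) in allowed for key, allowed in policy.items())


def recipients(subscriptions: Dict[str, Optional[FilterPolicy]],
               attributes: Dict[str, str]) -> List[str]:
    """Names of the subscriptions that would receive a message with these attributes."""
    return [name for name, policy in subscriptions.items()
            if matches_filter_policy(policy, attributes)]
```

For example, a subscription with {"state": ["placed"]} receives an order message whose state attribute is "placed", while a subscription with no policy receives it regardless.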

SNS + SQS fan-out

Push once to SNS, receive in all subscribed SQS queues
Fully decoupled, no data loss
Ability to add more SQS subscribers over time
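Setting up the fan-out means subscribing each queue to the topic with the sqs protocol. This sketch uses the argument names of boto3's SNS subscribe call, passed in as a callable so it runs without AWS; the ARNs are hypothetical. Each queue's access policy must also allow the topic to call sqs:SendMessage, or deliveries will fail.

```python
from typing import Any, Callable, Dict, List


def subscribe_queues(subscribe: Callable[..., Dict[str, Any]], topic_arn: str,
                     queue_arns: List[str]) -> List[str]:
    """Subscribe every SQS queue to one SNS topic; return the subscription ARNs."""
    subscription_arns = []
    for queue_arn in queue_arns:
        response = subscribe(
            TopicArn=topic_arn,
            Protocol="sqs",       # deliver into an SQS queue
            Endpoint=queue_arn,   # SQS endpoints are identified by queue ARN
            ReturnSubscriptionArn=True,
        )
        subscription_arns.append(response["SubscriptionArn"])
    return subscription_arns
```

Adding a subscriber later is just another subscribe call; the producer keeps publishing to the same topic unchanged.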

Kinesis vs SQS ordering

For example: 100 trucks, 5 Kinesis shards, 1 SQS FIFO queue
Kinesis Data Streams
- On average you will have 20 trucks per shard
- Trucks will have their data ordered within each shard
- The maximum number of consumers in parallel we can have is 5
- Can receive up to 5 MB/s of data
SQS FIFO
- 1 SQS FIFO queue
- 100 group IDs (one per truck)
- Can have up to 100 consumers (one per group ID)
- Can have up to 300 messages per second (or 3000 if using batching)
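The arithmetic behind this comparison fits in two small Python functions. The function names are hypothetical; the limits are the ones stated in these notes (1 MB/s in per shard, 300 or 3000 msg/s for SQS FIFO).

```python
from typing import Dict


def kinesis_ordering_capacity(producers: int, shards: int) -> Dict[str, float]:
    """Kinesis Data Streams side of the comparison (provisioned mode)."""
    return {
        "producers_per_shard": producers / shards,  # on average, with well-spread keys
        "max_parallel_consumers": shards,           # per consuming app: one reader per shard
        "max_ingest_mb_per_s": shards * 1.0,        # 1 MB/s in per shard
    }


def sqs_fifo_ordering_capacity(group_ids: int, batching: bool) -> Dict[str, float]:
    """SQS FIFO side: parallelism is bounded by the number of message group IDs."""
    return {
        "max_parallel_consumers": group_ids,
        "max_messages_per_s": 3000 if batching else 300,
    }
```

With 100 trucks and 5 shards, Kinesis gives 20 trucks per shard, 5 parallel consumers, and 5 MB/s of ingest; SQS FIFO with 100 group IDs allows 100 parallel consumers but caps throughput at 300 (or 3000 batched) messages per second.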