AWS S3 Storage Service
Master AWS S3 concepts with visual diagrams covering storage classes, replication, encryption, lifecycle policies, and data transfer methods. Complete guide with data flows and architecture patterns.
Q1: What is Amazon S3?
graph TB
S3Main[Amazon S3<br/>Simple Storage Service] --> Concept[Key Concepts]
Concept --> Bucket[Bucket<br/>Logical Container]
Concept --> Object[Object<br/>File + Metadata]
Concept --> KeyConcept[Key<br/>Unique Identifier]
Bucket --> B1[my-app-bucket]
Object --> O1[File: image.jpg<br/>Metadata: permissions]
KeyConcept --> K1[photos-2024-image.jpg]
style S3Main fill:#FF9900
S3 Core Concepts:
- Object storage service accessed through web service interfaces (REST API)
- Bucket: Logical container for objects, globally unique name, region-specific
- Object: File plus metadata (permissions, creation date, content type)
- Key: Unique identifier within bucket, acts like file path
- Unlike file systems with folders, S3 uses flat structure with key prefixes
- Maximum object size: 5TB, unlimited storage capacity
- Designed for 99.999999999% (11 nines) durability and 99.99% availability (S3 Standard)
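The flat key structure with prefixes can be illustrated without touching AWS at all. The sketch below mimics how a prefix-and-delimiter listing groups keys into folder-like entries; the function name `list_by_prefix` and the sample keys are made up for illustration, and the real service does this server-side when a listing request includes a prefix and delimiter.

```python
def list_by_prefix(keys, prefix="", delimiter="/"):
    """Mimic a prefix + delimiter listing over a flat set of object keys."""
    objects, common_prefixes = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Everything up to the next delimiter is rolled up into one "folder".
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return objects, sorted(common_prefixes)


# Hypothetical keys: there are no real directories, only strings with slashes.
keys = [
    "photos/2024/image.jpg",
    "photos/2023/old.jpg",
    "photos/readme.txt",
    "logs/app.log",
]
objects, folders = list_by_prefix(keys, prefix="photos/")
print(objects)
print(folders)
```

The "folders" returned here exist only as shared key prefixes, which is why renaming a "folder" in S3 means copying every object under it to a new key.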
Q2: Object Upload & Properties
sequenceDiagram
participant User
participant S3
participant Bucket
User->>S3: Upload file (max 5TB)
Note over User,S3: REST API PUT or POST
User->>S3: Set Properties
S3->>S3: Permissions
S3->>S3: Metadata
S3->>S3: Storage Class
S3->>S3: Encryption
S3->>Bucket: Store Object
Bucket-->>User: Success Response
Note over S3: Metadata immutable<br/>after upload
Upload Process:
- Maximum object size: 5TB; a single PUT is limited to 5GB, so larger objects require multipart upload (recommended for files > 100MB)
- Permissions: Public or private access control at object level
- Metadata: Key-value pairs (creation date, classification, custom tags)
- Storage Class: Standard, Infrequent Access, Glacier (affects cost and retrieval time)
- Encryption: SSE-S3 (keys managed by S3), SSE-KMS (keys held in AWS KMS, AWS managed or customer managed), SSE-C (customer-provided keys)
- Metadata cannot be modified in place after upload; to change it, copy the object with new metadata
- Use REST API (PUT/POST), AWS Console, CLI, or SDKs for uploads
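As a rough sketch of the upload decisions listed above, the code below plans a multipart upload and builds the keyword arguments one might pass to boto3's `put_object` and `copy_object`. It assumes the documented multipart limits (at most 10,000 parts, at least 5 MiB per part except the last); the helper names, the 100 MiB preferred part size, and the default storage class are illustrative choices, and nothing is sent to AWS.

```python
import math

MIB = 1024 ** 2
MIN_PART = 5 * MIB     # minimum size of every part except the last
MAX_PARTS = 10_000     # maximum number of parts in one multipart upload


def plan_upload(size_bytes, preferred_part=100 * MIB):
    """Return ("single", 1, size) or ("multipart", part_count, part_size)."""
    if size_bytes <= preferred_part:
        return ("single", 1, size_bytes)
    # Grow the part size if the preferred size would need more than 10,000 parts.
    part_size = max(preferred_part, MIN_PART, math.ceil(size_bytes / MAX_PARTS))
    return ("multipart", math.ceil(size_bytes / part_size), part_size)


def put_object_args(bucket, key, body, metadata, storage_class="STANDARD_IA"):
    """Keyword arguments for boto3's s3.put_object (built only, not sent)."""
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        "Metadata": metadata,                # user-defined key-value pairs
        "StorageClass": storage_class,
        "ServerSideEncryption": "AES256",    # SSE-S3
    }


def replace_metadata_args(bucket, key, new_metadata):
    """Keyword arguments for s3.copy_object that rewrite metadata via a self-copy."""
    return {
        "Bucket": bucket,
        "Key": key,
        "CopySource": {"Bucket": bucket, "Key": key},
        "Metadata": new_metadata,
        "MetadataDirective": "REPLACE",      # use the new metadata, not the source's
    }


print(plan_upload(1024 * MIB))
```

With credentials configured, these dictionaries would be used as `boto3.client("s3").put_object(**args)` and `copy_object(**args)`; in practice boto3's higher-level `upload_file` handles the multipart split on its own.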
Q3: S3 vs EBS Storage
graph TB
subgraph S3Storage[S3 - Object Storage]
S1[Internet or VPC Endpoint]
S2[File storage, backups]
S3[Fast access]
S4[Cross-AZ redundancy]
S5[Public or Private]
S6[Cannot mount as filesystem]
end
subgraph EBSStorage[EBS - Block Storage]
E1[Within VPC only]
E2[Install applications]
E3[Super fast IOPS]
E4[Within AZ only]
E5[Private only]
E6[Attach to EC2 instance]
end
style S3Storage fill:#FF9900
style EBSStorage fill:#4CAF50
Storage Comparison:
- S3: Object storage like FTP, accessed via HTTP/HTTPS from anywhere
- Best for static files, backups, archives, data lakes, content distribution
- EBS: Block storage like SAN, attached directly to EC2 instances
- Best for databases, operating systems, applications requiring low latency
- S3 has cross-AZ redundancy, EBS replicates within single AZ
- S3 can be public, EBS is always private within VPC
- EBS provides faster I/O for applications, S3 better for large-scale storage
Q4: Storage Classes & Lifecycle
flowchart LR
Upload[Upload] --> Standard[S3 Standard<br/>Day 0-30<br/>Frequent Access]
Standard --> IA[S3 IA<br/>Day 31-90<br/>Lower Cost]
IA --> Glacier[S3 Glacier<br/>Day 91-365<br/>Archival]
Glacier --> Delete[Delete<br/>After 365 days]
style Standard fill:#4CAF50
style IA fill:#FF9800
style Glacier fill:#2196F3
Storage Classes:
- S3 Standard: Frequent access, low latency, high throughput (websites, big data)
- S3 Infrequent Access (IA): Less frequent access, quick retrieval, lower cost (backups, DR)
- S3 Glacier: Archival storage, lowest cost, retrieval minutes to hours (compliance, archives)
- Lifecycle Policies: Automatically transition objects between classes based on age
- Reduce costs by moving older data to cheaper storage classes
- Can also automatically delete objects after specified time period
- Configure rules based on object age, prefix, or tags
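The timeline in the diagram (Standard, then IA after 30 days, Glacier after 90, delete after 365) maps fairly directly onto a lifecycle rule. The sketch below builds such a rule in the shape boto3's `put_bucket_lifecycle_configuration` expects and adds a small helper that predicts the class for a given object age. The rule ID, the `logs/` prefix, and the helper names are assumptions for illustration, and the age model is approximate since the service applies lifecycle actions asynchronously.

```python
def lifecycle_rule(rule_id, prefix, ia_after=30, glacier_after=90, expire_after=365):
    """Build one lifecycle rule dict for objects under the given key prefix."""
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_after, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_after, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": expire_after},
    }


def storage_class_at(age_days, rule):
    """Class an object of this age would be in under the rule; None once expired."""
    if age_days >= rule["Expiration"]["Days"]:
        return None
    current = "STANDARD"
    for transition in rule["Transitions"]:
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


rule = lifecycle_rule("tier-logs", "logs/")   # hypothetical ID and prefix
config = {"Rules": [rule]}

for age in (10, 45, 200, 400):
    print(age, storage_class_at(age, rule))
```

To apply it, the `config` dictionary would be passed as `LifecycleConfiguration=config` to `boto3.client("s3").put_bucket_lifecycle_configuration(Bucket=...)`.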
Q5: Data Transfer Methods
graph TB
Transfer[S3 Data Transfer] --> Internet[Public Internet]
Transfer --> Acceleration[S3 Transfer Acceleration]
Transfer --> DirectConnect[AWS Direct Connect]
Transfer --> Snowball[AWS Snowball]
Internet --> I1[S3 APIs, Console, CLI]
Internet --> I2[HTTPS port 443]
Transfer --> VPCE[VPC Endpoints<br/>Private path from a VPC]
Acceleration --> A1[CloudFront Edge]
Acceleration --> A2[AWS Backbone]
DirectConnect --> DC1[Private 1-100 Gbps]
Snowball --> SB1[Petabyte-scale]
Snowball --> SB2[Physical device]
style Transfer fill:#FF9900
Transfer Options:
- Public Internet: Standard method using S3 APIs, Console, CLI, or SDKs over HTTPS
- VPC Endpoints: Private connection from VPC without internet gateway (secure, cost-effective)
- S3 Transfer Acceleration: Uses CloudFront edge locations for faster long-distance transfers
- AWS Direct Connect: Dedicated private connection (1 Gbps, 10 Gbps, or faster ports) for consistent performance
- AWS Snowball: Physical device for petabyte-scale data transfer when network impractical
- Choose based on data volume, speed requirements, and security needs
- Gateway VPC endpoints for S3 avoid NAT Gateway data processing charges and keep traffic off the public internet
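Choosing between a network transfer and a physical device usually comes down to simple arithmetic: data volume divided by usable bandwidth. The sketch below does that calculation; the 80% utilization factor and the 100 TB example are assumptions picked for illustration, not AWS guidance.

```python
def transfer_days(data_bytes, link_bps, utilization=0.8):
    """Days needed to move data_bytes over a link of link_bps bits per second."""
    seconds = data_bytes * 8 / (link_bps * utilization)
    return seconds / 86_400


TB = 10 ** 12  # decimal terabyte

for label, bps in [("100 Mbps", 100e6), ("1 Gbps", 1e9), ("10 Gbps", 10e9)]:
    print(f"100 TB over {label}: {transfer_days(100 * TB, bps):.1f} days")
```

Under these assumptions, 100 TB takes well over three months on a 100 Mbps line and a bit over a day at 10 Gbps, which is the kind of gap that makes Snowball worth considering when only a slow link is available.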
Q6: S3 Replication
graph TB
Source[Source Bucket<br/>us-east-1] --> Replication[S3 Replication<br/>Automatic]
Replication --> CRR[Cross-Region<br/>Replication]
Replication --> SRR[Same-Region<br/>Replication]
CRR --> Target1[Target Bucket<br/>eu-west-1]
CRR --> Benefits1[Disaster Recovery<br/>Compliance<br/>Lower Latency]
SRR --> Target2[Target Bucket<br/>us-east-1]
SRR --> Benefits2[Log Aggregation<br/>Between Accounts]
style Replication fill:#FF9900
style CRR fill:#4CAF50
style SRR fill:#2196F3
Replication Features:
- Cross-Region Replication (CRR): Replicate objects to different AWS region
- Use for disaster recovery, compliance requirements, lower latency access
- Same-Region Replication (SRR): Replicate within same region
- Use for log aggregation, replication between accounts, data sovereignty
- Replication is automatic and asynchronous after initial configuration
- Can replicate entire bucket or specific prefixes/tags
- Requires versioning enabled on both source and destination buckets
- Replication rules apply only to objects written after the rule is created; existing objects need S3 Batch Replication
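A replication setup has two parts: versioning on both buckets, and a rule that names the destination. The sketch below builds both payloads in the shape boto3's `put_bucket_versioning` and `put_bucket_replication` accept. The bucket names, the role ARN, and the helper names are placeholders, and the IAM role that S3 assumes for replication is taken as already existing.

```python
def versioning_args(bucket):
    """Keyword arguments for s3.put_bucket_versioning (required on both buckets)."""
    return {"Bucket": bucket, "VersioningConfiguration": {"Status": "Enabled"}}


def replication_config(role_arn, dest_bucket, prefix=""):
    """ReplicationConfiguration dict with one rule; an empty prefix matches all keys."""
    return {
        "Role": role_arn,  # IAM role S3 assumes to copy objects
        "Rules": [{
            "ID": "replicate-" + (prefix.strip("/") or "all"),
            "Priority": 1,
            "Status": "Enabled",
            "Filter": {"Prefix": prefix},
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {"Bucket": f"arn:aws:s3:::{dest_bucket}"},
        }],
    }


# Placeholder account ID, role, and bucket names.
config = replication_config(
    "arn:aws:iam::123456789012:role/s3-replication-role",
    "my-app-bucket-replica",
    prefix="logs/",
)
print(config["Rules"][0]["Destination"]["Bucket"])
```

The same structure covers CRR and SRR; the only difference is whether the destination bucket lives in another region.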
Q7: S3 Security
graph TB
Security[S3 Security] --> IAM[IAM Policies]
Security --> Bucket[Bucket Policies]
Security --> Encryption[Encryption]
IAM --> IAM1[User-level access]
IAM --> IAM2[Programmatic]
Bucket --> B1[Bucket-level access]
Bucket --> B2[Deny statements]
Bucket --> B3[Explicit deny overrides IAM allow]
Encryption --> E1[In Transit: HTTPS]
Encryption --> E2[At Rest: SSE-S3, KMS, C]
Encryption --> E3[Client-side]
style Security fill:#FF9900
style Encryption fill:#4CAF50
Security Layers:
- IAM Policies: Identity-based permissions attached to users, groups, and roles
- Bucket Policies: Resource-based access control on the bucket; an explicit deny here overrides any allow granted through IAM
- ACLs: Legacy method, use IAM and bucket policies instead
- Encryption in Transit: HTTPS/TLS for data transfer security
- Encryption at Rest: SSE-S3 (keys managed by S3), SSE-KMS (keys held in AWS KMS), SSE-C (customer-provided keys)
- Client-side Encryption: Encrypt data before uploading to S3
- Use bucket policies for cross-account access and public access scenarios
- Enable MFA Delete for additional protection on versioned buckets
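To make the "deny statement" idea concrete, here is a sketch of a bucket policy that rejects any request not made over HTTPS, using the `aws:SecureTransport` condition key. The bucket name and the helper name are placeholders; the policy is only built and printed here, not applied.

```python
import json


def deny_insecure_transport_policy(bucket):
    """Bucket policy that denies every S3 action when the request is not over TLS."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyInsecureTransport",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                f"arn:aws:s3:::{bucket}",      # the bucket itself
                f"arn:aws:s3:::{bucket}/*",    # every object in it
            ],
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        }],
    }


policy_json = json.dumps(deny_insecure_transport_policy("my-app-bucket"), indent=2)
print(policy_json)
```

Because this is an explicit deny, it applies even to principals whose IAM policies allow the action. With credentials configured, the JSON string would be passed as `Policy=policy_json` to `boto3.client("s3").put_bucket_policy(Bucket=...)`.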
Q8: S3 Monitoring
graph LR
Monitoring[S3 Monitoring] --> CloudTrail[AWS CloudTrail]
Monitoring --> AccessLogs[S3 Access Logging]
Monitoring --> Metrics[CloudWatch Metrics]
CloudTrail --> CT1[API calls logging]
CloudTrail --> CT2[Bucket-level default]
CloudTrail --> CT3[Object-level optional]
AccessLogs --> AL1[Detailed logs]
AccessLogs --> AL2[Who, when, what]
Metrics --> M1[Storage metrics]
Metrics --> M2[Request metrics]
style Monitoring fill:#FF9900
Monitoring Tools:
- AWS CloudTrail: Records S3 API calls for auditing and compliance
- Bucket-level operations (management events) are logged by default; object-level operations require enabling data events
- S3 Server Access Logging: Detailed access logs (who accessed, when, what operation)
- Logs stored in separate S3 bucket for analysis
- CloudWatch Metrics: Daily storage metrics (bucket size, object count) and opt-in request metrics (GET/PUT counts)
- Use for capacity planning, performance monitoring, and cost optimization
- Set up CloudWatch alarms for unusual activity or threshold breaches
- Combine all three for comprehensive security and operational visibility
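As one example of using the storage metrics for capacity planning, the sketch below builds the query for a bucket's daily size in the `AWS/S3` CloudWatch namespace. The helper name, the seven-day window, and the bucket name are assumptions; the dictionary is shaped for boto3's CloudWatch `get_metric_statistics` call but is not sent anywhere here.

```python
from datetime import datetime, timedelta, timezone


def bucket_size_query(bucket, days=7, storage_type="StandardStorage"):
    """Keyword arguments for cloudwatch.get_metric_statistics on BucketSizeBytes."""
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/S3",
        "MetricName": "BucketSizeBytes",
        "Dimensions": [
            {"Name": "BucketName", "Value": bucket},
            {"Name": "StorageType", "Value": storage_type},
        ],
        "StartTime": end - timedelta(days=days),
        "EndTime": end,
        "Period": 86_400,            # storage metrics are reported once per day
        "Statistics": ["Average"],
    }


query = bucket_size_query("my-app-bucket")
print(query["MetricName"], query["Period"])
```

With credentials configured, `boto3.client("cloudwatch").get_metric_statistics(**query)` would return one datapoint per day, which could then feed an alarm or a growth projection.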