AWS S3 Interview Questions & Answers with Diagrams
Master AWS S3 concepts with visual diagrams covering storage classes, replication, encryption, lifecycle policies, and data transfer methods. Complete guide with data flows and architecture patterns.
Q1: What is Amazon S3?
graph TB
S3Main[Amazon S3<br/>Simple Storage Service] --> Concept[Key Concepts]
Concept --> Bucket[Bucket<br/>Logical Container]
Concept --> Object[Object<br/>File + Metadata]
Concept --> KeyConcept[Key<br/>Unique Identifier]
Bucket --> B1[my-app-bucket]
Object --> O1[File: image.jpg<br/>Metadata: permissions]
KeyConcept --> K1[photos-2024-image.jpg]
style S3Main fill:#FF9900
S3 Core Concepts:
- Object storage service accessed through web service interfaces (REST API)
- Bucket: Logical container for objects, globally unique name, region-specific
- Object: File plus metadata (permissions, creation date, content type)
- Key: Unique identifier within bucket, acts like file path
- Unlike file systems with folders, S3 uses flat structure with key prefixes
- Maximum object size: 5TB, unlimited storage capacity
- Designed for 99.999999999% (11 nines) durability and 99.99% availability in S3 Standard
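The flat key space with prefixes described above can be illustrated without touching AWS at all. The sketch below is a local simulation, not the S3 API: `list_with_delimiter` is a hypothetical helper and the keys are made-up examples. It shows how a prefix plus a delimiter turns a flat list of keys into something that looks like folders.

```python
def list_with_delimiter(keys, prefix="", delimiter="/"):
    """Group a flat set of keys the way a prefix + delimiter listing would."""
    objects, common_prefixes = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Everything up to the first delimiter is rolled up into one "folder".
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return objects, sorted(common_prefixes)


# Hypothetical keys; there are no real directories, only strings.
keys = [
    "photos/2024/image.jpg",
    "photos/2023/old.jpg",
    "photos/readme.txt",
    "logs/app.log",
]
objects, folders = list_with_delimiter(keys, prefix="photos/")
print(objects)  # ['photos/readme.txt']
print(folders)  # ['photos/2023/', 'photos/2024/']
```

The "folders" exist only in the listing result; deleting every key under a prefix makes the folder disappear as well.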
Q2: Object Upload & Properties
sequenceDiagram
participant User
participant S3
participant Bucket
User->>S3: Upload file (max 5TB)
Note over User,S3: REST API PUT or POST
User->>S3: Set Properties
S3->>S3: Permissions
S3->>S3: Metadata
S3->>S3: Storage Class
S3->>S3: Encryption
S3->>Bucket: Store Object
Bucket-->>User: Success Response
Note over S3: Metadata immutable<br/>after upload
Upload Process:
- Maximum object size: 5TB; a single PUT is limited to 5GB, and multipart upload is recommended for files > 100MB
- Permissions: Public or private access control at object level
- Metadata: Key-value pairs (creation date, classification, custom tags)
- Storage Class: Standard, Infrequent Access, Glacier (affects cost and retrieval time)
- Encryption: SSE-S3 (S3-managed keys), SSE-KMS (keys held in AWS KMS), SSE-C (customer-provided keys)
- Metadata cannot be modified in place after upload; to change it, copy the object with new metadata
- Use REST API (PUT/POST), AWS Console, CLI, or SDKs for uploads
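To make the multipart guidance concrete, here is a minimal sketch of how a client might pick a part size. It assumes the commonly documented multipart limits (parts between 5 MiB and 5 GiB, at most 10,000 parts); `plan_parts` and the 100 MiB default are illustrative choices, not part of any AWS SDK.

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3
MIN_PART_SIZE = 5 * MIB    # every part except the last must be at least this large
MAX_PART_SIZE = 5 * GIB
MAX_PARTS = 10_000


def plan_parts(object_size, preferred_part_size=100 * MIB):
    """Return (part_size, part_count) for uploading object_size bytes in parts."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    # Grow the part size if the preferred size would need more than 10,000 parts.
    part_size = max(
        preferred_part_size,
        MIN_PART_SIZE,
        math.ceil(object_size / MAX_PARTS),
    )
    if part_size > MAX_PART_SIZE:
        raise ValueError("object is too large for a single multipart upload")
    return part_size, math.ceil(object_size / part_size)


print(plan_parts(1 * GIB))  # (104857600, 11)
```

In practice the SDKs and the CLI make this decision automatically; the sketch only shows why very large objects end up with larger parts.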
Q3: S3 vs EBS Storage
graph TB
subgraph S3Storage[S3 - Object Storage]
S1[Internet or VPC Endpoint]
S2[File storage, backups]
S3[Fast access]
S4[Cross-AZ redundancy]
S5[Public or Private]
S6[Cannot mount as filesystem]
end
subgraph EBSStorage[EBS - Block Storage]
E1[Within VPC only]
E2[Install applications]
E3[Super fast IOPS]
E4[Within AZ only]
E5[Private only]
E6[Attach to EC2 instance]
end
style S3Storage fill:#FF9900
style EBSStorage fill:#4CAF50
Storage Comparison:
- S3: Object storage like FTP, accessed via HTTP/HTTPS from anywhere
- Best for static files, backups, archives, data lakes, content distribution
- EBS: Block storage like SAN, attached directly to EC2 instances
- Best for databases, operating systems, applications requiring low latency
- S3 has cross-AZ redundancy, EBS replicates within single AZ
- S3 objects can be made public; EBS volumes are reachable only through the EC2 instance they are attached to
- EBS provides faster I/O for applications, S3 better for large-scale storage
Q4: Storage Classes & Lifecycle
flowchart LR
Upload[Upload] --> Standard[S3 Standard<br/>Day 0-30<br/>Frequent Access]
Standard --> IA[S3 IA<br/>Day 31-90<br/>Lower Cost]
IA --> Glacier[S3 Glacier<br/>Day 91-365<br/>Archival]
Glacier --> Delete[Delete<br/>After 365 days]
style Standard fill:#4CAF50
style IA fill:#FF9800
style Glacier fill:#2196F3
Storage Classes:
- S3 Standard: Frequent access, low latency, high throughput (websites, big data)
- S3 Infrequent Access (IA): Less frequent access, quick retrieval, lower cost (backups, DR)
- S3 Glacier: Archival storage, lowest cost, retrieval minutes to hours (compliance, archives)
- Lifecycle Policies: Automatically transition objects between classes based on age
- Reduce costs by moving older data to cheaper storage classes
- Can also automatically delete objects after specified time period
- Configure rules based on object age, prefix, or tags
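The timeline in the diagram above could be written as a lifecycle rule roughly like the one below. This is a sketch: the dictionary follows the shape that boto3's `put_bucket_lifecycle_configuration` accepts, while the rule ID, the `logs/` prefix, and the helper `storage_class_at` are assumptions made for illustration.

```python
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-then-expire",      # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},    # hypothetical prefix
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        }
    ]
}


def storage_class_at(age_days, rule):
    """Return the class an object of this age should be in, or None once expired."""
    if age_days >= rule["Expiration"]["Days"]:
        return None
    current = "STANDARD"
    for transition in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


rule = lifecycle_configuration["Rules"][0]
for age in (10, 45, 200, 400):
    print(age, storage_class_at(age, rule))
```

The helper only models the intent of the rule. Real transitions run asynchronously, so an object may change class somewhat later than the configured day.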
Q5: Data Transfer Methods
graph TB
Transfer[S3 Data Transfer] --> Internet[Public Internet]
Transfer --> Acceleration[S3 Transfer Acceleration]
Transfer --> DirectConnect[AWS Direct Connect]
Transfer --> Snowball[AWS Snowball]
Internet --> I1[S3 APIs, Console, CLI]
Internet --> I2[HTTPS port 443]
Transfer --> VPCE[VPC Endpoints<br/>Private, no internet gateway]
Acceleration --> A1[CloudFront Edge]
Acceleration --> A2[AWS Backbone]
DirectConnect --> DC1[Private 1-100 Gbps]
Snowball --> SB1[Petabyte-scale]
Snowball --> SB2[Physical device]
style Transfer fill:#FF9900
Transfer Options:
- Public Internet: Standard method using S3 APIs, Console, CLI, or SDKs over HTTPS
- VPC Endpoints: Private connection from VPC without internet gateway (secure, cost-effective)
- S3 Transfer Acceleration: Uses CloudFront edge locations for faster long-distance transfers
- AWS Direct Connect: Dedicated private connection for consistent performance (dedicated ports such as 1, 10, or 100 Gbps)
- AWS Snowball: Physical device for petabyte-scale data transfer when network transfer is impractical
- Choose based on data volume, speed requirements, and security needs
- Gateway VPC endpoints for S3 avoid NAT Gateway data-processing charges and keep traffic off the public internet
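Choosing between a network transfer and a physical device usually comes down to simple arithmetic. The sketch below estimates how long a transfer would take over a given link; `transfer_days` and the 80% utilization figure are assumptions for illustration, not AWS guidance.

```python
def transfer_days(data_bytes, link_gbps, utilization=0.8):
    """Estimate days needed to move data_bytes over a link of link_gbps (decimal Gbps)."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    bits = data_bytes * 8
    seconds = bits / (link_gbps * 1e9 * utilization)
    return seconds / 86_400


PB = 10 ** 15  # one decimal petabyte
for gbps in (1, 10):
    print(f"1 PB over {gbps} Gbps: {transfer_days(PB, gbps):.1f} days")
# 1 PB over 1 Gbps: 115.7 days
# 1 PB over 10 Gbps: 11.6 days
```

When the estimate runs to weeks or months, a physical device such as Snowball is likely the more practical option.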
Q6: S3 Replication
graph TB
Source[Source Bucket<br/>us-east-1] --> Replication[S3 Replication<br/>Automatic]
Replication --> CRR[Cross-Region<br/>Replication]
Replication --> SRR[Same-Region<br/>Replication]
CRR --> Target1[Target Bucket<br/>eu-west-1]
CRR --> Benefits1[Disaster Recovery<br/>Compliance<br/>Lower Latency]
SRR --> Target2[Target Bucket<br/>us-east-1]
SRR --> Benefits2[Log Aggregation<br/>Between Accounts]
style Replication fill:#FF9900
style CRR fill:#4CAF50
style SRR fill:#2196F3
Replication Features:
- Cross-Region Replication (CRR): Replicate objects to different AWS region
- Use for disaster recovery, compliance requirements, lower latency access
- Same-Region Replication (SRR): Replicate within same region
- Use for log aggregation, replication between accounts, data sovereignty
- Replication is automatic and asynchronous after initial configuration
- Can replicate entire bucket or specific prefixes/tags
- Requires versioning enabled on both source and destination buckets
- Replication applies only to objects written after the rule is created; existing objects need S3 Batch Replication
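A replication rule can be sketched as a plain dictionary. The example below follows the shape that boto3's `put_bucket_replication` accepts and enforces the versioning requirement up front; the function name, rule ID, role ARN, and bucket name are all hypothetical.

```python
def build_replication_config(role_arn, destination_bucket, prefix="",
                             source_versioning="Enabled",
                             destination_versioning="Enabled"):
    """Build a replication configuration, refusing if versioning is not enabled."""
    if source_versioning != "Enabled" or destination_versioning != "Enabled":
        raise ValueError("versioning must be enabled on both buckets")
    return {
        "Role": role_arn,  # IAM role that S3 assumes to copy objects
        "Rules": [
            {
                "ID": "replicate-new-objects",   # hypothetical rule name
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},    # empty prefix = whole bucket
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": f"arn:aws:s3:::{destination_bucket}"},
            }
        ],
    }


config = build_replication_config(
    role_arn="arn:aws:iam::123456789012:role/replication-role",  # placeholder ARN
    destination_bucket="my-app-bucket-replica",                  # placeholder name
    prefix="logs/",
)
print(config["Rules"][0]["Destination"]["Bucket"])
# arn:aws:s3:::my-app-bucket-replica
```

The same structure serves CRR and SRR; the only difference is the region of the destination bucket.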
Q7: S3 Security
graph TB
Security[S3 Security] --> IAM[IAM Policies]
Security --> Bucket[Bucket Policies]
Security --> Encryption[Encryption]
IAM --> IAM1[User-level access]
IAM --> IAM2[Programmatic]
Bucket --> B1[Bucket-level access]
Bucket --> B2[Deny statements]
Bucket --> B3[Explicit Deny overrides IAM Allow]
Encryption --> E1[In Transit: HTTPS]
Encryption --> E2[At Rest: SSE-S3, KMS, C]
Encryption --> E3[Client-side]
style Security fill:#FF9900
style Encryption fill:#4CAF50
Security Layers:
- IAM Policies: Identity-based permissions attached to users, groups, and roles
- Bucket Policies: Resource-based access control on the bucket; an explicit Deny overrides any Allow granted by IAM
- ACLs: Legacy method, use IAM and bucket policies instead
- Encryption in Transit: HTTPS/TLS for data transfer security
- Encryption at Rest: SSE-S3 (S3-managed keys), SSE-KMS (keys held in AWS KMS), SSE-C (customer-provided keys)
- Client-side Encryption: Encrypt data before uploading to S3
- Use bucket policies for cross-account access and public access scenarios
- Enable MFA Delete for additional protection on versioned buckets
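As an illustration of a deny statement in a bucket policy, the sketch below builds a policy that rejects any request not sent over HTTPS, using the `aws:SecureTransport` condition key. The helper name is hypothetical, and `my-app-bucket` is the example bucket name from the Q1 diagram.

```python
import json


def deny_insecure_transport_policy(bucket_name):
    """Return a bucket policy document that denies all non-HTTPS requests."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object in it.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


print(json.dumps(deny_insecure_transport_policy("my-app-bucket"), indent=2))
```

Because the statement is an explicit Deny, it applies even to principals whose IAM policies would otherwise allow the request.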
Q8: S3 Monitoring
graph LR
Monitoring[S3 Monitoring] --> CloudTrail[AWS CloudTrail]
Monitoring --> AccessLogs[S3 Access Logging]
Monitoring --> Metrics[CloudWatch Metrics]
CloudTrail --> CT1[API calls logging]
CloudTrail --> CT2[Bucket-level default]
CloudTrail --> CT3[Object-level optional]
AccessLogs --> AL1[Detailed logs]
AccessLogs --> AL2[Who, when, what]
Metrics --> M1[Storage metrics]
Metrics --> M2[Request metrics]
style Monitoring fill:#FF9900
Monitoring Tools:
- AWS CloudTrail: Records S3 API calls for auditing and compliance
- Bucket-level (management) operations are logged by default; object-level operations require enabling data events
- S3 Server Access Logging: Detailed access logs (who accessed, when, what operation)
- Logs stored in separate S3 bucket for analysis
- CloudWatch Metrics: Storage metrics (bucket size, object count, reported daily), request metrics (GET/PUT counts, enabled per bucket)
- Use for capacity planning, performance monitoring, and cost optimization
- Set up CloudWatch alarms for unusual activity or threshold breaches
- Combine all three for comprehensive security and operational visibility
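For the storage metrics mentioned above, a query for daily bucket size might be assembled as follows. This is a sketch: the parameters match the shape that boto3's CloudWatch `get_metric_statistics` call expects, while `bucket_size_query`, the seven-day window, and the bucket name are assumptions.

```python
from datetime import datetime, timedelta, timezone


def bucket_size_query(bucket_name, days=7, now=None):
    """Build parameters for reading the daily BucketSizeBytes storage metric."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/S3",
        "MetricName": "BucketSizeBytes",
        "Dimensions": [
            {"Name": "BucketName", "Value": bucket_name},
            {"Name": "StorageType", "Value": "StandardStorage"},
        ],
        "StartTime": end - timedelta(days=days),
        "EndTime": end,
        "Period": 86_400,          # storage metrics are reported once per day
        "Statistics": ["Average"],
    }


params = bucket_size_query("my-app-bucket")
print(params["MetricName"], params["Period"])  # BucketSizeBytes 86400
```

With boto3 installed and credentials configured, these parameters would typically be passed as `cloudwatch.get_metric_statistics(**params)`.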