AWS • 2026-06-17

AWS S3 Interview Questions & Answers with Diagrams

Master AWS S3 concepts with visual diagrams covering storage classes, replication, encryption, lifecycle policies, and data transfer methods. Each question pairs a diagram of the data flow or architecture pattern with a short list of key points.

Q1: What is Amazon S3?

graph TB
    S3Main[Amazon S3<br/>Simple Storage Service] --> Concept[Key Concepts]
    
    Concept --> Bucket[Bucket<br/>Logical Container]
    Concept --> Object[Object<br/>File + Metadata]
    Concept --> KeyConcept[Key<br/>Unique Identifier]
    
    Bucket --> B1[my-app-bucket]
    Object --> O1[File: image.jpg<br/>Metadata: permissions]
    KeyConcept --> K1[photos-2024-image.jpg]
    
    style S3Main fill:#FF9900

S3 Core Concepts:

  • Object storage service accessed through web service interfaces (REST API)
  • Bucket: Logical container for objects, globally unique name, region-specific
  • Object: File plus metadata (permissions, creation date, content type)
  • Key: Unique identifier within bucket, acts like file path
  • Unlike a file system with folders, S3 uses a flat structure; key prefixes only simulate a folder hierarchy
  • Maximum object size: 5TB, unlimited storage capacity
  • Designed for 99.999999999% (11 nines) durability and 99.99% availability in S3 Standard
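
To make the "flat structure with key prefixes" point concrete, here is a minimal sketch in plain Python that mimics a prefix/delimiter listing over a flat set of keys. It does not call AWS; the `list_keys` helper and the sample keys are hypothetical and exist only for illustration.

```python
def list_keys(keys, prefix="", delimiter="/"):
    """Mimic a prefix/delimiter listing over a flat collection of keys.

    Returns (objects, common_prefixes): the keys that sit directly under
    `prefix`, and the folder-like prefixes one level below it.
    """
    objects, common_prefixes = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Everything up to the next delimiter is rolled up into one prefix.
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return objects, sorted(common_prefixes)


# Hypothetical keys: the bucket stores them as plain strings, not as a tree.
bucket_keys = [
    "photos/2024/image.jpg",
    "photos/2024/cat.png",
    "photos/2023/old.jpg",
    "readme.txt",
]

top_objects, top_prefixes = list_keys(bucket_keys)
year_objects, year_prefixes = list_keys(bucket_keys, prefix="photos/")
```

Listing with no prefix yields one object and one "folder" (`photos/`), even though no folder was ever created; the hierarchy is derived from the key strings at listing time.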

Q2: Object Upload & Properties

sequenceDiagram
    participant User
    participant S3
    participant Bucket
    
    User->>S3: Upload file (max 5TB)
    Note over User,S3: REST API PUT or POST
    
    User->>S3: Set Properties
    S3->>S3: Permissions
    S3->>S3: Metadata
    S3->>S3: Storage Class
    S3->>S3: Encryption
    
    S3->>Bucket: Store Object
    Bucket-->>User: Success Response
    
    Note over S3: Metadata immutable<br/>after upload

Upload Process:

  • Maximum object size: 5TB; a single PUT is limited to 5GB, and multipart upload is recommended for files > 100MB
  • Permissions: Public or private access control at object level
  • Metadata: Key-value pairs (creation date, classification, custom tags)
  • Storage Class: Standard, Infrequent Access, Glacier (affects cost and retrieval time)
  • Encryption: SSE-S3 (S3 managed keys), SSE-KMS (keys in AWS KMS, AWS managed or customer managed), SSE-C (customer provided keys)
  • Metadata cannot be modified in place after upload; to change it, copy the object with new metadata
  • Use REST API (PUT/POST), AWS Console, CLI, or SDKs for uploads
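
The multipart guidance above can be sketched as a small planning function. This is an illustrative calculation only, not an upload client: `plan_upload` and the 64 MiB default part size are assumptions, while the limits it encodes (parts between 5 MiB and 5 GiB, at most 10,000 parts) are the documented multipart constraints.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

MIN_PART = 5 * MIB                 # minimum part size (the last part may be smaller)
MAX_PART = 5 * GIB                 # maximum part size
MAX_PARTS = 10_000                 # maximum number of parts per upload
MULTIPART_THRESHOLD = 100 * MIB    # threshold quoted in the list above


def ceil_div(a, b):
    """Integer ceiling division, avoiding float rounding on large sizes."""
    return (a + b - 1) // b


def plan_upload(size_bytes, preferred_part=64 * MIB):
    """Decide between a single PUT and multipart, and pick a part size.

    `preferred_part` is a hypothetical default chosen for this sketch.
    """
    if size_bytes <= MULTIPART_THRESHOLD:
        return {"multipart": False, "part_size": size_bytes, "parts": 1}
    # Grow the part size when needed so the part count stays within the limit.
    part_size = max(preferred_part, MIN_PART, ceil_div(size_bytes, MAX_PARTS))
    if part_size > MAX_PART:
        raise ValueError("object is too large for one multipart upload")
    return {
        "multipart": True,
        "part_size": part_size,
        "parts": ceil_div(size_bytes, part_size),
    }


small = plan_upload(50 * MIB)           # single PUT
medium = plan_upload(1 * GIB)           # 16 parts of 64 MiB
largest = plan_upload(5 * 1024 ** 4)    # 5 TiB forces larger parts
```

For very large objects the preferred part size is no longer enough, so the function raises the part size until the object fits in 10,000 parts.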

Q3: S3 vs EBS Storage

graph TB
    subgraph S3Storage[S3 - Object Storage]
        S1[Internet or VPC Endpoint]
        S2[File storage, backups]
        S3[Fast access]
        S4[Cross-AZ redundancy]
        S5[Public or Private]
        S6[Cannot mount as filesystem]
    end
    
    subgraph EBSStorage[EBS - Block Storage]
        E1[Within VPC only]
        E2[Install applications]
        E3[Super fast IOPS]
        E4[Within AZ only]
        E5[Private only]
        E6[Attach to EC2 instance]
    end
    
    style S3Storage fill:#FF9900
    style EBSStorage fill:#4CAF50

Storage Comparison:

  • S3: Object storage, accessed via HTTP/HTTPS from anywhere rather than mounted as a disk
  • Best for static files, backups, archives, data lakes, content distribution
  • EBS: Block storage like SAN, attached directly to EC2 instances
  • Best for databases, operating systems, applications requiring low latency
  • S3 stores data redundantly across multiple AZs; EBS replicates within a single AZ
  • S3 objects can be made public; EBS volumes are reachable only through the EC2 instance they are attached to
  • EBS provides faster I/O for applications, S3 better for large-scale storage

Q4: Storage Classes & Lifecycle

flowchart LR
    Upload[Upload] --> Standard[S3 Standard<br/>Day 0-30<br/>Frequent Access]
    Standard --> IA[S3 IA<br/>Day 31-90<br/>Lower Cost]
    IA --> Glacier[S3 Glacier<br/>Day 91-365<br/>Archival]
    Glacier --> Delete[Delete<br/>After 365 days]
    
    style Standard fill:#4CAF50
    style IA fill:#FF9800
    style Glacier fill:#2196F3

Storage Classes:

  • S3 Standard: Frequent access, low latency, high throughput (websites, big data)
  • S3 Infrequent Access (IA): Less frequent access, quick retrieval, lower cost (backups, DR)
  • S3 Glacier: Archival storage, lowest cost, retrieval minutes to hours (compliance, archives)
  • Lifecycle Policies: Automatically transition objects between classes based on age
  • Reduce costs by moving older data to cheaper storage classes
  • Can also automatically delete objects after specified time period
  • Configure rules based on object age, prefix, or tags
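
As a rough illustration of the lifecycle in the diagram, the rule below is written in the dictionary shape that boto3's `put_bucket_lifecycle_configuration` accepts as `LifecycleConfiguration`. The rule ID and the `logs/` prefix are hypothetical, and `storage_class_at` is a helper invented for this sketch to show which class an object of a given age would be in.

```python
# Rule mirroring the diagram: Standard -> IA at 30 days -> Glacier at 90 days,
# then deletion at 365 days. ID and prefix are assumptions for illustration.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-then-expire",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        }
    ]
}


def storage_class_at(age_days, rule):
    """Return the storage class at a given object age, or None once expired."""
    if age_days >= rule["Expiration"]["Days"]:
        return None
    current = "STANDARD"
    for transition in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


rule = lifecycle_configuration["Rules"][0]
timeline = {age: storage_class_at(age, rule) for age in (10, 45, 200, 400)}
```

The helper only models the rule locally; in practice S3 evaluates lifecycle rules asynchronously, so an actual transition can lag the configured day.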

Q5: Data Transfer Methods

graph TB
    Transfer[S3 Data Transfer] --> Internet[Public Internet]
    Transfer --> Acceleration[S3 Transfer Acceleration]
    Transfer --> DirectConnect[AWS Direct Connect]
    Transfer --> Snowball[AWS Snowball]
    
    Internet --> I1[S3 APIs, Console, CLI]
    Internet --> I2[HTTPS port 443]
    Transfer --> VPCE[VPC Endpoints<br/>Private, no internet gateway]
    
    Acceleration --> A1[CloudFront Edge]
    Acceleration --> A2[AWS Backbone]
    
    DirectConnect --> DC1[Private 1-100 Gbps]
    
    Snowball --> SB1[Petabyte-scale]
    Snowball --> SB2[Physical device]
    
    style Transfer fill:#FF9900

Transfer Options:

  • Public Internet: Standard method using S3 APIs, Console, CLI, or SDKs over HTTPS
  • VPC Endpoints: Private connection from VPC without internet gateway (secure, cost-effective)
  • S3 Transfer Acceleration: Uses CloudFront edge locations for faster long-distance transfers
  • AWS Direct Connect: Dedicated private connection (typically 1 Gbps to 100 Gbps ports) for consistent performance
  • AWS Snowball: Physical device for petabyte-scale data transfer when network impractical
  • Choose based on data volume, speed requirements, and security needs
  • Gateway VPC endpoints for S3 avoid NAT Gateway data-processing charges for S3 traffic and keep that traffic off the public internet
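
Choosing between a network transfer and a physical device usually comes down to simple arithmetic. The sketch below estimates transfer time from data volume and link speed; the `transfer_days` helper and the 80% utilization default are assumptions for illustration, not AWS guidance.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(data_terabytes, link_gbps, utilization=0.8):
    """Estimate days needed to move data over a network link.

    Uses decimal units (1 TB = 10**12 bytes, 1 Gbps = 10**9 bits/s).
    `utilization` is the assumed fraction of the link actually achieved.
    """
    bits = data_terabytes * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * utilization)
    return seconds / SECONDS_PER_DAY


# 100 TB over a fully used 1 Gbps link: 8e14 bits / 1e9 bits/s = 800,000 s.
ideal_100tb = transfer_days(100, 1, utilization=1.0)

# 1 PB over a 10 Gbps link at the assumed 80% utilization.
petabyte_10g = transfer_days(1000, 10)
```

At these volumes the estimates land in the range of one to two weeks, which is the point where a physical device such as Snowball starts to look practical.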

Q6: S3 Replication

graph TB
    Source[Source Bucket<br/>us-east-1] --> Replication[S3 Replication<br/>Automatic]
    
    Replication --> CRR[Cross-Region<br/>Replication]
    Replication --> SRR[Same-Region<br/>Replication]
    
    CRR --> Target1[Target Bucket<br/>eu-west-1]
    CRR --> Benefits1[Disaster Recovery<br/>Compliance<br/>Lower Latency]
    
    SRR --> Target2[Target Bucket<br/>us-east-1]
    SRR --> Benefits2[Log Aggregation<br/>Between Accounts]
    
    style Replication fill:#FF9900
    style CRR fill:#4CAF50
    style SRR fill:#2196F3

Replication Features:

  • Cross-Region Replication (CRR): Replicate objects to different AWS region
  • Use for disaster recovery, compliance requirements, lower latency access
  • Same-Region Replication (SRR): Replicate within same region
  • Use for log aggregation, replication between accounts, data sovereignty
  • Replication is automatic and asynchronous after initial configuration
  • Can replicate entire bucket or specific prefixes/tags
  • Requires versioning enabled on both source and destination buckets
  • By default, replication applies only to objects written after the rule is created; existing objects need S3 Batch Replication
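
A replication rule can be sketched as the dictionary that boto3's `put_bucket_replication` takes as `ReplicationConfiguration`. Everything named here is hypothetical (the role ARN, bucket names, rule ID, and both helper functions); the sketch only builds the configuration and classifies it as CRR or SRR by comparing regions.

```python
def replication_kind(source_region, destination_region):
    """Classify a replication setup by comparing the two bucket regions."""
    return "SRR" if source_region == destination_region else "CRR"


def build_replication_config(role_arn, destination_bucket, prefix=""):
    """Build one replication rule; an empty prefix covers the whole bucket."""
    return {
        "Role": role_arn,  # IAM role S3 assumes to replicate objects
        "Rules": [
            {
                "ID": "replicate-rule-1",  # hypothetical rule name
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": f"arn:aws:s3:::{destination_bucket}"},
            }
        ],
    }


# Hypothetical values matching the regions used in the diagram.
config = build_replication_config(
    role_arn="arn:aws:iam::123456789012:role/example-replication-role",
    destination_bucket="example-target-bucket",
    prefix="logs/",
)
kind_cross = replication_kind("us-east-1", "eu-west-1")
kind_same = replication_kind("us-east-1", "us-east-1")
```

Both buckets would still need versioning enabled before S3 accepts a configuration like this.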

Q7: S3 Security

graph TB
    Security[S3 Security] --> IAM[IAM Policies]
    Security --> Bucket[Bucket Policies]
    Security --> Encryption[Encryption]
    
    IAM --> IAM1[User-level access]
    IAM --> IAM2[Programmatic]
    
    Bucket --> B1[Bucket-level access]
    Bucket --> B2[Deny statements]
    Bucket --> B3[Explicit Deny overrides IAM Allow]
    
    Encryption --> E1[In Transit: HTTPS]
    Encryption --> E2[At Rest: SSE-S3, KMS, C]
    Encryption --> E3[Client-side]
    
    style Security fill:#FF9900
    style Encryption fill:#4CAF50

Security Layers:

  • IAM Policies: Identity-based permissions attached to users, groups, and roles
  • Bucket Policies: Bucket-level access control; an explicit Deny here overrides any Allow granted by IAM
  • ACLs: Legacy method, use IAM and bucket policies instead
  • Encryption in Transit: HTTPS/TLS for data transfer security
  • Encryption at Rest: SSE-S3 (S3 managed keys), SSE-KMS (keys in AWS KMS, AWS managed or customer managed), SSE-C (customer provided keys)
  • Client-side Encryption: Encrypt data before uploading to S3
  • Use bucket policies for cross-account access and public access scenarios
  • Enable MFA Delete for additional protection on versioned buckets
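
One common way to combine the "deny statements" and "encryption in transit" points is a bucket policy that rejects any request not made over HTTPS. The sketch below builds such a policy as JSON; the `https_only_policy` helper is hypothetical, and the bucket name is the example one from Q1.

```python
import json


def https_only_policy(bucket_name):
    """Build a bucket policy that denies all requests not sent over TLS."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object in it.
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
                # aws:SecureTransport is "false" for plain HTTP requests.
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


policy = https_only_policy("my-app-bucket")
policy_json = json.dumps(policy, indent=2)
```

Because the statement is an explicit Deny, it applies even to principals whose IAM policies allow the action.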

Q8: S3 Monitoring

graph LR
    Monitoring[S3 Monitoring] --> CloudTrail[AWS CloudTrail]
    Monitoring --> AccessLogs[S3 Access Logging]
    Monitoring --> Metrics[CloudWatch Metrics]
    
    CloudTrail --> CT1[API calls logging]
    CloudTrail --> CT2[Bucket-level default]
    CloudTrail --> CT3[Object-level optional]
    
    AccessLogs --> AL1[Detailed logs]
    AccessLogs --> AL2[Who, when, what]
    
    Metrics --> M1[Storage metrics]
    Metrics --> M2[Request metrics]
    
    style Monitoring fill:#FF9900

Monitoring Tools:

  • AWS CloudTrail: Records S3 API calls for auditing and compliance
  • Bucket-level (management) operations are logged by default; object-level operations require enabling data events
  • S3 Server Access Logging: Detailed access logs (who accessed, when, what operation)
  • Logs stored in separate S3 bucket for analysis
  • CloudWatch Metrics: Storage metrics (bucket size, object count) reported daily; request metrics (GET/PUT counts) are opt-in
  • Use for capacity planning, performance monitoring, and cost optimization
  • Set up CloudWatch alarms for unusual activity or threshold breaches
  • Combine all three for comprehensive security and operational visibility
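
The alarm suggestion above can be sketched as the keyword arguments one might pass to boto3's CloudWatch `put_metric_alarm`. The `bucket_size_alarm` helper, the alarm name, and the 500 GiB threshold are assumptions for illustration; the namespace, metric name, and dimensions are the ones S3 publishes for daily storage metrics.

```python
GIB = 1024 ** 3


def bucket_size_alarm(bucket_name, threshold_bytes):
    """Build alarm parameters for the daily BucketSizeBytes storage metric."""
    return {
        "AlarmName": f"{bucket_name}-size-above-threshold",  # hypothetical name
        "Namespace": "AWS/S3",
        "MetricName": "BucketSizeBytes",
        "Dimensions": [
            {"Name": "BucketName", "Value": bucket_name},
            {"Name": "StorageType", "Value": "StandardStorage"},
        ],
        "Statistic": "Average",
        "Period": 86_400,          # storage metrics are reported once per day
        "EvaluationPeriods": 1,
        "Threshold": float(threshold_bytes),
        "ComparisonOperator": "GreaterThanThreshold",
    }


alarm = bucket_size_alarm("my-app-bucket", 500 * GIB)
```

A similar dictionary could target request metrics once they are enabled on the bucket, using a shorter period.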