AWS S3 Storage Service

Master AWS S3 concepts with visual diagrams covering storage classes, replication, encryption, lifecycle policies, and data transfer methods. Complete guide with data flows and architecture patterns.

Q1: What is Amazon S3?

graph TB
    S3Main[Amazon S3<br/>Simple Storage Service] --> Concept[Key Concepts]
    
    Concept --> Bucket[Bucket<br/>Logical Container]
    Concept --> Object[Object<br/>File + Metadata]
    Concept --> KeyConcept[Key<br/>Unique Identifier]
    
    Bucket --> B1[my-app-bucket]
    Object --> O1[File: image.jpg<br/>Metadata: permissions]
    KeyConcept --> K1["photos/2024/image.jpg"]
    
    style S3Main fill:#FF9900

S3 Core Concepts:

  • Object storage service accessed through web service interfaces (REST API)
  • Bucket: Logical container for objects, globally unique name, region-specific
  • Object: File plus metadata (permissions, creation date, content type)
  • Key: Unique identifier within bucket, acts like file path
  • Unlike file systems with folders, S3 uses flat structure with key prefixes
  • Maximum object size: 5TB, unlimited storage capacity
  • S3 Standard is designed for 99.999999999% (11 nines) durability and 99.99% availability
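
The flat key structure described above can be illustrated without touching AWS at all. The following is a minimal sketch in plain Python that mimics how a listing with a prefix and a delimiter groups flat keys into folder-like entries. The function name `list_with_delimiter` and the sample keys are hypothetical; the return value is only loosely modeled on the Contents and CommonPrefixes fields of an S3 list response.

```python
from typing import Iterable, List, Tuple


def list_with_delimiter(
    keys: Iterable[str], prefix: str = "", delimiter: str = "/"
) -> Tuple[List[str], List[str]]:
    """Group flat keys the way a prefix + delimiter listing would.

    Returns (objects, common_prefixes).
    """
    objects: List[str] = []
    common_prefixes = set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Everything up to the first delimiter is rolled up into one prefix.
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return objects, sorted(common_prefixes)


# Hypothetical keys in one bucket: there are no real directories, only strings.
keys = [
    "photos/2024/image.jpg",
    "photos/2023/old.jpg",
    "photos/readme.txt",
    "index.html",
]

print(list_with_delimiter(keys))
# (['index.html'], ['photos/'])
print(list_with_delimiter(keys, prefix="photos/"))
# (['photos/readme.txt'], ['photos/2023/', 'photos/2024/'])
```

The "folders" shown by tools are just this kind of grouping applied to key strings at listing time.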

Q2: Object Upload & Properties

sequenceDiagram
    participant User
    participant S3
    participant Bucket
    
    User->>S3: Upload file (max 5TB)
    Note over User,S3: REST API PUT or POST
    
    User->>S3: Set Properties
    S3->>S3: Permissions
    S3->>S3: Metadata
    S3->>S3: Storage Class
    S3->>S3: Encryption
    
    S3->>Bucket: Store Object
    Bucket-->>User: Success Response
    
    Note over S3: Metadata immutable<br/>after upload

Upload Process:

  • Maximum object size: 5TB; a single PUT handles up to 5GB, so larger objects require multipart upload (recommended for files > 100MB)
  • Permissions: Public or private access control at object level
  • Metadata: Key-value pairs (creation date, classification, custom tags)
  • Storage Class: Standard, Infrequent Access, Glacier (affects cost and retrieval time)
  • Encryption: SSE-S3 (keys managed by S3), SSE-KMS (keys held in AWS KMS, AWS managed or customer managed), SSE-C (customer provided keys)
  • Metadata cannot be modified in place after upload; to change it, copy the object with new metadata
  • Use REST API (PUT/POST), AWS Console, CLI, or SDKs for uploads
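
To make the multipart guidance concrete, here is a rough sketch of how an uploader might pick a part size. It assumes the commonly documented multipart limits (at most 10,000 parts, each part between 5 MiB and 5 GiB except the last) and treats the 100MB figure from the list above as the switch-over threshold. The function `plan_upload` and the 64 MiB preferred part size are illustrative assumptions, not part of any AWS SDK.

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4

MIN_PART_SIZE = 5 * MIB          # minimum size of every part except the last
MAX_PART_SIZE = 5 * GIB          # maximum size of a single part
MAX_PARTS = 10_000               # maximum number of parts per upload
MULTIPART_THRESHOLD = 100 * MIB  # the "> 100MB" guideline from the list above


def plan_upload(object_size: int, preferred_part_size: int = 64 * MIB):
    """Return (use_multipart, part_size, part_count) for an object size in bytes."""
    if object_size <= MULTIPART_THRESHOLD:
        # Small objects go up in a single request.
        return False, object_size, 1
    part_size = max(preferred_part_size, MIN_PART_SIZE)
    # Grow the part size if the preferred size would need more than 10,000 parts.
    part_size = max(part_size, math.ceil(object_size / MAX_PARTS))
    if part_size > MAX_PART_SIZE:
        raise ValueError("object is too large for a single multipart upload")
    return True, part_size, math.ceil(object_size / part_size)


print(plan_upload(50 * MIB))  # single request
print(plan_upload(1 * GIB))   # 16 parts of 64 MiB
print(plan_upload(5 * TIB))   # part size grows so the count stays within 10,000
```

In practice the AWS SDKs and CLI make this decision automatically; the sketch only shows why very large objects force larger parts.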

Q3: S3 vs EBS Storage

graph TB
    subgraph S3Storage[S3 - Object Storage]
        S1[Internet or VPC Endpoint]
        S2[File storage, backups]
        S3[Fast access]
        S4[Cross-AZ redundancy]
        S5[Public or Private]
        S6[Cannot mount as filesystem]
    end
    
    subgraph EBSStorage[EBS - Block Storage]
        E1[Within VPC only]
        E2[Install applications]
        E3[Super fast IOPS]
        E4[Within AZ only]
        E5[Private only]
        E6[Attach to EC2 instance]
    end
    
    style S3Storage fill:#FF9900
    style EBSStorage fill:#4CAF50

Storage Comparison:

  • S3: Object storage like FTP, accessed via HTTP/HTTPS from anywhere
  • Best for static files, backups, archives, data lakes, content distribution
  • EBS: Block storage like SAN, attached directly to EC2 instances
  • Best for databases, operating systems, applications requiring low latency
  • S3 has cross-AZ redundancy, EBS replicates within single AZ
  • S3 objects can be made public; EBS volumes are reachable only through the EC2 instance they are attached to
  • EBS provides faster I/O for applications, S3 better for large-scale storage

Q4: Storage Classes & Lifecycle

flowchart LR
    Upload[Upload] --> Standard[S3 Standard<br/>Day 0-30<br/>Frequent Access]
    Standard --> IA[S3 IA<br/>Day 31-90<br/>Lower Cost]
    IA --> Glacier[S3 Glacier<br/>Day 91-365<br/>Archival]
    Glacier --> Delete[Delete<br/>After 365 days]
    
    style Standard fill:#4CAF50
    style IA fill:#FF9800
    style Glacier fill:#2196F3

Storage Classes:

  • S3 Standard: Frequent access, low latency, high throughput (websites, big data)
  • S3 Infrequent Access (IA): Less frequent access, quick retrieval, lower cost (backups, DR)
  • S3 Glacier: Archival storage, lowest cost, retrieval minutes to hours (compliance, archives)
  • Lifecycle Policies: Automatically transition objects between classes based on age
  • Reduce costs by moving older data to cheaper storage classes
  • Can also automatically delete objects after specified time period
  • Configure rules based on object age, prefix, or tags
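
The timeline in the diagram above (Standard, then IA after 30 days, then Glacier after 90 days, then deletion after 365 days) could be expressed as a lifecycle configuration along these lines. This is a sketch rather than a definitive setup: the rule ID, the `logs/` prefix, and the bucket name in the commented-out call are assumptions for illustration. The dictionary follows the shape accepted by boto3's `put_bucket_lifecycle_configuration`.

```python
import json


def build_lifecycle_configuration(prefix: str = "logs/") -> dict:
    """Build a lifecycle rule that tiers objects down and finally expires them."""
    return {
        "Rules": [
            {
                "ID": "tier-then-expire",        # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},    # only keys under this prefix
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 365},     # delete after one year
            }
        ]
    }


config = build_lifecycle_configuration()
print(json.dumps(config, indent=2))

# Applying it needs boto3 and AWS credentials, so it is left commented out:
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-app-bucket", LifecycleConfiguration=config
# )
```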

Q5: Data Transfer Methods

graph TB
    Transfer[S3 Data Transfer] --> Internet[Public Internet]
    Transfer --> Acceleration[S3 Transfer Acceleration]
    Transfer --> DirectConnect[AWS Direct Connect]
    Transfer --> Snowball[AWS Snowball]
    
    Internet --> I1[S3 APIs, Console, CLI]
    Internet --> I2[HTTPS port 443]
    Internet --> I3[VPC Endpoints]
    
    Acceleration --> A1[CloudFront Edge]
    Acceleration --> A2[AWS Backbone]
    
    DirectConnect --> DC1[Private 1-10 Gbps]
    
    Snowball --> SB1[Petabyte-scale]
    Snowball --> SB2[Physical device]
    
    style Transfer fill:#FF9900

Transfer Options:

  • Public Internet: Standard method using S3 APIs, Console, CLI, or SDKs over HTTPS
  • VPC Endpoints: Private connection from VPC without internet gateway (secure, cost-effective)
  • S3 Transfer Acceleration: Uses CloudFront edge locations for faster long-distance transfers
  • AWS Direct Connect: Dedicated private connection (commonly 1 or 10 Gbps ports, with higher speeds available) for consistent performance
  • AWS Snowball: Physical device for petabyte-scale data transfer when network impractical
  • Choose based on data volume, speed requirements, and security needs
  • Gateway VPC endpoints for S3 keep traffic off the NAT Gateway, avoiding its data processing charges and improving security
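
Choosing between a network transfer and a physical device usually comes down to simple arithmetic on data volume and link speed. The sketch below estimates transfer time for a given link. The function name and the 80% sustained utilization figure are assumptions for illustration, not AWS guidance.

```python
TB = 10 ** 12    # bytes
PB = 10 ** 15    # bytes
GBPS = 10 ** 9   # bits per second


def transfer_days(data_bytes: float, link_bits_per_second: float,
                  utilization: float = 0.8) -> float:
    """Estimate days needed to move data_bytes over a link.

    utilization is the assumed fraction of the link actually sustained.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    seconds = data_bytes * 8 / (link_bits_per_second * utilization)
    return seconds / 86_400


print(f"100 TB over 1 Gbps: {transfer_days(100 * TB, 1 * GBPS):.1f} days")
print(f"1 PB over 10 Gbps:  {transfer_days(1 * PB, 10 * GBPS):.1f} days")
```

Under these assumptions both cases take roughly 11.6 days, which is the kind of number that makes a shipped device worth considering for petabyte-scale moves over slower links.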

Q6: S3 Replication

graph TB
    Source[Source Bucket<br/>us-east-1] --> Replication[S3 Replication<br/>Automatic]
    
    Replication --> CRR[Cross-Region<br/>Replication]
    Replication --> SRR[Same-Region<br/>Replication]
    
    CRR --> Target1[Target Bucket<br/>eu-west-1]
    CRR --> Benefits1[Disaster Recovery<br/>Compliance<br/>Lower Latency]
    
    SRR --> Target2[Target Bucket<br/>us-east-1]
    SRR --> Benefits2[Log Aggregation<br/>Between Accounts]
    
    style Replication fill:#FF9900
    style CRR fill:#4CAF50
    style SRR fill:#2196F3

Replication Features:

  • Cross-Region Replication (CRR): Replicate objects to different AWS region
  • Use for disaster recovery, compliance requirements, lower latency access
  • Same-Region Replication (SRR): Replicate within same region
  • Use for log aggregation, replication between accounts, data sovereignty
  • Replication is automatic and asynchronous after initial configuration
  • Can replicate entire bucket or specific prefixes/tags
  • Requires versioning enabled on both source and destination buckets
  • By default, replication applies only to objects written after the rule is created; existing objects need S3 Batch Replication
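
A replication rule along the lines described above might look like the following sketch. The role ARN, bucket names, and rule ID are placeholders invented for illustration. The dictionary follows the shape accepted by boto3's `put_bucket_replication`, and the commented-out calls show the versioning prerequisite on both buckets.

```python
import json


def build_replication_configuration(role_arn: str, destination_bucket_arn: str,
                                    prefix: str = "") -> dict:
    """Build a single-rule replication configuration for one prefix."""
    return {
        "Role": role_arn,  # IAM role that S3 assumes to copy objects
        "Rules": [
            {
                "ID": "replicate-new-objects",   # hypothetical rule name
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},    # empty prefix = whole bucket
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": destination_bucket_arn},
            }
        ],
    }


config = build_replication_configuration(
    role_arn="arn:aws:iam::111122223333:role/example-replication-role",  # placeholder
    destination_bucket_arn="arn:aws:s3:::example-target-bucket",         # placeholder
    prefix="logs/",
)
print(json.dumps(config, indent=2))

# Applying it needs boto3 and AWS credentials, so it is left commented out:
# import boto3
# s3 = boto3.client("s3")
# for bucket in ("example-source-bucket", "example-target-bucket"):
#     s3.put_bucket_versioning(
#         Bucket=bucket, VersioningConfiguration={"Status": "Enabled"}
#     )
# s3.put_bucket_replication(
#     Bucket="example-source-bucket", ReplicationConfiguration=config
# )
```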

Q7: S3 Security

graph TB
    Security[S3 Security] --> IAM[IAM Policies]
    Security --> Bucket[Bucket Policies]
    Security --> Encryption[Encryption]
    
    IAM --> IAM1[User-level access]
    IAM --> IAM2[Programmatic]
    
    Bucket --> B1[Bucket-level access]
    Bucket --> B2[Deny statements]
    Bucket --> B3[Explicit Deny overrides IAM Allow]
    
    Encryption --> E1[In Transit: HTTPS]
    Encryption --> E2[At Rest: SSE-S3, KMS, C]
    Encryption --> E3[Client-side]
    
    style Security fill:#FF9900
    style Encryption fill:#4CAF50

Security Layers:

  • IAM Policies: Identity-based permissions attached to users, groups, and roles
  • Bucket Policies: Resource-based access control at bucket level; an explicit Deny here overrides any Allow granted through IAM
  • ACLs: Legacy method, use IAM and bucket policies instead
  • Encryption in Transit: HTTPS/TLS for data transfer security
  • Encryption at Rest: SSE-S3 (keys managed by S3), SSE-KMS (keys held in AWS KMS, AWS managed or customer managed), SSE-C (customer provided keys)
  • Client-side Encryption: Encrypt data before uploading to S3
  • Use bucket policies for cross-account access and public access scenarios
  • Enable MFA Delete for additional protection on versioned buckets
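
As one example of a bucket policy Deny statement combined with encryption in transit, the sketch below builds a policy that rejects any request not made over HTTPS. The bucket name and statement ID are placeholders; the `aws:SecureTransport` condition key is a standard AWS global condition key.

```python
import json


def build_tls_only_policy(bucket_name: str) -> dict:
    """Build a bucket policy that denies all requests not sent over TLS."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",  # hypothetical statement ID
                "Effect": "Deny",                # explicit Deny wins over any Allow
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object in it.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


policy = build_tls_only_policy("my-app-bucket")
print(json.dumps(policy, indent=2))

# Applying it needs boto3 and AWS credentials, so it is left commented out:
# import boto3
# boto3.client("s3").put_bucket_policy(
#     Bucket="my-app-bucket", Policy=json.dumps(policy)
# )
```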

Q8: S3 Monitoring

graph LR
    Monitoring[S3 Monitoring] --> CloudTrail[AWS CloudTrail]
    Monitoring --> AccessLogs[S3 Access Logging]
    Monitoring --> Metrics[CloudWatch Metrics]
    
    CloudTrail --> CT1[API calls logging]
    CloudTrail --> CT2[Bucket-level default]
    CloudTrail --> CT3[Object-level optional]
    
    AccessLogs --> AL1[Detailed logs]
    AccessLogs --> AL2[Who, when, what]
    
    Metrics --> M1[Storage metrics]
    Metrics --> M2[Request metrics]
    
    style Monitoring fill:#FF9900

Monitoring Tools:

  • AWS CloudTrail: Records S3 API calls for auditing and compliance
  • Bucket-level operations are logged by default as management events; object-level operations are logged only when data events are enabled
  • S3 Server Access Logging: Detailed access logs (who accessed, when, what operation)
  • Logs stored in separate S3 bucket for analysis
  • CloudWatch Metrics: Daily storage metrics (bucket size, object count) and opt-in request metrics (GET/PUT counts)
  • Use for capacity planning, performance monitoring, and cost optimization
  • Set up CloudWatch alarms for unusual activity or threshold breaches
  • Combine all three for comprehensive security and operational visibility
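
A threshold alarm on the bucket-size storage metric might be defined as in the sketch below. The alarm name and threshold are hypothetical; the `AWS/S3` namespace, the `BucketSizeBytes` metric, and the `BucketName` and `StorageType` dimensions are the ones S3 publishes for daily storage metrics. The keys match the parameters of boto3's CloudWatch `put_metric_alarm`.

```python
def build_bucket_size_alarm(bucket_name: str, threshold_bytes: int) -> dict:
    """Build keyword arguments for an alarm on a bucket's Standard storage size."""
    return {
        "AlarmName": f"{bucket_name}-size-above-threshold",  # hypothetical name
        "Namespace": "AWS/S3",
        "MetricName": "BucketSizeBytes",
        "Dimensions": [
            {"Name": "BucketName", "Value": bucket_name},
            {"Name": "StorageType", "Value": "StandardStorage"},
        ],
        "Statistic": "Average",
        "Period": 86_400,            # storage metrics are reported once per day
        "EvaluationPeriods": 1,
        "Threshold": float(threshold_bytes),
        "ComparisonOperator": "GreaterThanThreshold",
    }


alarm = build_bucket_size_alarm("my-app-bucket", 500 * 10 ** 9)  # 500 GB, assumed
print(alarm)

# Creating the alarm needs boto3 and AWS credentials, so it is left commented out:
# import boto3
# boto3.client("cloudwatch").put_metric_alarm(**alarm)
```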