Blob Storage & Object Storage

Object storage (like S3) stores unstructured binary data, such as images, videos, and documents, as objects addressed by unique keys. It scales to effectively unlimited capacity, costs little per gigabyte, and is the standard place to keep media in modern applications.

What is Object Storage?

Object storage stores data as discrete objects (files) in flat namespaces (buckets). Unlike file systems (hierarchical) or block storage (fixed-size blocks), object storage is designed for:

  • Massive scale (petabytes)
  • High durability (99.999999999% — "eleven 9s")
  • Direct HTTP access
  • Cheap storage at scale
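To make "flat namespace" concrete, here is a toy in-memory model in Python. It is a hypothetical sketch, not a real SDK: the `Bucket` class and `list_objects` method are illustrative names. Keys are plain strings, and "folders" appear only when a listing groups keys by a prefix and delimiter, roughly the way the `Prefix`/`Delimiter` parameters of S3's ListObjectsV2 behave.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Bucket:
    """Toy bucket: a flat map from key string to bytes (illustrative only)."""
    name: str
    objects: dict[str, bytes] = field(default_factory=dict)

    def put(self, key: str, data: bytes) -> None:
        # No directories are created; "/" is just another character in the key.
        self.objects[key] = data

    def get(self, key: str) -> bytes:
        return self.objects[key]

    def list_objects(self, prefix: str = "",
                     delimiter: str | None = None) -> tuple[list[str], list[str]]:
        """Return (keys, common_prefixes) for keys that start with `prefix`."""
        keys: list[str] = []
        prefixes: set[str] = set()
        for key in sorted(self.objects):
            if not key.startswith(prefix):
                continue
            rest = key[len(prefix):]
            if delimiter and delimiter in rest:
                # Collapse everything up to the next delimiter into a "folder".
                prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
            else:
                keys.append(key)
        return keys, sorted(prefixes)
```

For example, after putting `users/avatar/abc123.jpg` and `readme.txt`, `list_objects("", "/")` returns `(["readme.txt"], ["users/"])`: the "users/" folder exists only as a shared prefix of key names.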

Examples: AWS S3, Google Cloud Storage, Azure Blob Storage, Cloudflare R2, MinIO

Core Concepts

Buckets

The top-level container for objects. In S3, bucket names share a global namespace, so each name must be unique across all AWS accounts, not just yours. Access policies, versioning, and lifecycle rules are configured at the bucket level.

Objects

Each object has:

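  • Key: The object's unique name within its bucket (e.g., users/avatar/abc123.jpg); the slashes are part of the name, not real directories
  • Value: The raw bytes (the file data)
  • Metadata: Key-value pairs (system metadata such as Content-Type, plus user-defined metadata)
  • Version ID: Assigned to each write when versioning is enabled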
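The four parts of an object, and what versioning changes, can be sketched as a small Python model. This is a hypothetical illustration (the `ObjectVersion` and `VersionedBucket` names and the "v0", "v1" IDs are made up; real S3 version IDs are opaque strings): with versioning on, a PUT to an existing key adds a new version instead of overwriting, and older versions stay retrievable by ID.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ObjectVersion:
    key: str                  # unique name within the bucket
    data: bytes               # the raw bytes
    metadata: dict[str, str]  # e.g. {"content-type": "image/jpeg"}
    version_id: str           # assigned per write when versioning is on


@dataclass
class VersionedBucket:
    """Toy model of a versioning-enabled bucket (not S3's implementation)."""
    versions: dict[str, list[ObjectVersion]] = field(default_factory=dict)
    _counter: itertools.count = field(default_factory=itertools.count)

    def put(self, key: str, data: bytes, metadata: dict[str, str] | None = None) -> str:
        version_id = f"v{next(self._counter)}"  # illustrative sequential ID
        obj = ObjectVersion(key, data, dict(metadata or {}), version_id)
        self.versions.setdefault(key, []).append(obj)  # keep history, don't overwrite
        return version_id

    def get(self, key: str, version_id: str | None = None) -> ObjectVersion:
        history = self.versions[key]
        if version_id is None:
            return history[-1]  # latest version by default
        for version in history:
            if version.version_id == version_id:
                return version
        raise KeyError(f"{key}@{version_id}")
```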

Access

Objects are accessed via HTTP(S) URLs:

https://bucket-name.s3.amazonaws.com/path/to/file.jpg
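Because the URL is a pure function of bucket and key, an application can store only the key and rebuild the URL when needed. A minimal Python sketch, assuming virtual-hosted-style addressing; the optional regional host form (`bucket.s3.<region>.amazonaws.com`) and the `object_url` helper name are assumptions for illustration. A URL like this only works for anonymous readers if the bucket policy allows public reads; otherwise you would hand out a pre-signed GET URL instead.

```python
from __future__ import annotations

from urllib.parse import quote


def object_url(bucket: str, key: str, region: str | None = None) -> str:
    """Build a virtual-hosted-style S3 URL for an object key (hypothetical helper)."""
    if region:
        host = f"{bucket}.s3.{region}.amazonaws.com"
    else:
        host = f"{bucket}.s3.amazonaws.com"
    # Percent-encode the key (spaces, unicode, etc.) but keep "/" readable.
    return f"https://{host}/{quote(key, safe='/')}"
```

For example, `object_url("bucket-name", "path/to/file.jpg")` yields the URL shown above.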

Common Use Cases

| Use Case | Pattern |
|---|---|
| User avatar uploads | Client uploads to pre-signed URL → Store key in DB |
| Video storage | Upload to S3 → Transcode via worker → Store output keys (serve via CDN) |
| Static website assets | S3 + CloudFront CDN |
| Application backups | Nightly dump to S3 Glacier |
| Data lake | Raw data in S3 → Query with Athena/Spark |
| Audit logs | Stream to S3 → Long-term retention |

Pre-Signed URLs

A critical pattern: never proxy file uploads through your server. Instead:

  1. Client requests an upload URL from your server
  2. Server generates a pre-signed URL: a time-limited URL, signed with the server's credentials, that authorizes one specific operation (e.g., a PUT to one key)
  3. Client uploads directly to S3 using the pre-signed URL
  4. Client notifies server that upload is complete
  5. Server stores the object key in the database
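Step 2 is a purely local computation: the server signs the request with its secret key, and S3 checks that signature only when the client actually uses the URL. The stdlib-only Python sketch below follows AWS Signature Version 4 query-string signing, assuming a virtual-hosted-style regional endpoint and only the `host` header signed; `presign_s3_url` is a hypothetical helper name. In practice you would use an SDK helper such as boto3's `generate_presigned_url` rather than hand-rolling this, and any hand-written version should be checked against the SigV4 documentation.

```python
from __future__ import annotations

import datetime
import hashlib
import hmac
from urllib.parse import quote

MAX_EXPIRES = 7 * 24 * 3600  # SigV4 presigned URLs can be valid for at most 7 days


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def presign_s3_url(method: str, bucket: str, key: str, region: str,
                   access_key: str, secret_key: str, expires_in: int = 900,
                   now: datetime.datetime | None = None) -> str:
    """Return a SigV4 query-string-signed URL for one S3 request (no network call)."""
    if not 1 <= expires_in <= MAX_EXPIRES:
        raise ValueError("expires_in must be between 1 second and 7 days")
    now = now or datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    host = f"{bucket}.s3.{region}.amazonaws.com"
    canonical_uri = "/" + quote(key, safe="/~")  # encode the key, keep "/" separators
    scope = f"{date_stamp}/{region}/s3/aws4_request"

    # Auth parameters travel in the query string instead of an Authorization header.
    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires_in),
        "X-Amz-SignedHeaders": "host",
    }
    canonical_query = "&".join(
        f"{quote(k, safe='-_.~')}={quote(v, safe='-_.~')}"
        for k, v in sorted(params.items())
    )
    canonical_request = "\n".join([
        method,
        canonical_uri,
        canonical_query,
        f"host:{host}\n",    # canonical headers block (each header ends with "\n")
        "host",              # signed header names
        "UNSIGNED-PAYLOAD",  # the body is unknown when the URL is signed
    ])
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,
        scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])
    # Derive a signing key scoped to date, region, and service.
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, "s3")
    k_signing = _hmac_sha256(k_service, "aws4_request")
    signature = hmac.new(k_signing, string_to_sign.encode("utf-8"),
                         hashlib.sha256).hexdigest()
    return f"https://{host}{canonical_uri}?{canonical_query}&X-Amz-Signature={signature}"
```

Because the HTTP method, key, and expiry are all part of the signed string, a URL signed for `PUT users/avatar/abc123.jpg` cannot be reused for a different key or operation.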

Benefits:

  • No file data passes through your servers (saves bandwidth and compute)
  • S3 absorbs the upload traffic and scales for it, so your servers don't have to
  • Pre-signed URLs expire automatically and grant only the operation they were signed for, which limits the damage if one leaks

Request flow:

Client → POST /api/upload-url → Server signs URL locally (no call to S3)
Client → PUT https://s3.amazonaws.com/bucket/file?X-Amz-Signature=... → S3
Client → POST /api/upload-complete → Server → DB saves key
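The server's side of this flow can be sketched as two handlers. Everything here is hypothetical: `UploadService` is an illustrative name, `sign_put_url` stands in for a real presigner such as an SDK call, and plain dicts stand in for the application database. The point of the sketch is that the server chooses the object key and only accepts completion notices for keys it issued to that same user.

```python
from __future__ import annotations

import uuid
from typing import Callable


class UploadService:
    """Hypothetical handlers for POST /api/upload-url and /api/upload-complete."""

    def __init__(self, sign_put_url: Callable[[str], str]):
        self.sign_put_url = sign_put_url  # stand-in for a real presigner
        self.pending: dict[str, str] = {}  # object key -> user who requested it
        self.db: dict[str, str] = {}       # user id -> stored avatar key

    def request_upload_url(self, user_id: str) -> dict[str, str]:
        # The server picks a random key, so clients cannot overwrite other users' files.
        key = f"users/avatar/{uuid.uuid4().hex}.jpg"
        self.pending[key] = user_id
        return {"key": key, "url": self.sign_put_url(key)}

    def complete_upload(self, user_id: str, key: str) -> None:
        # Accept only keys this server issued to this user.
        if self.pending.get(key) != user_id:
            raise PermissionError("unknown or foreign upload key")
        del self.pending[key]
        self.db[user_id] = key  # store the key, not a full URL
```

A production version might also confirm that the object actually landed (for example with a HEAD request to S3) before saving the key.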

Storage Classes & Cost

| S3 Class | Use Case | Storage cost (per GB) |
|---|---|---|
| Standard | Frequently accessed | Highest |
| Standard-IA | Infrequently accessed | Lower |
| One Zone-IA | Reproducible, infrequent | Lower |
| Glacier Instant Retrieval | Archives, occasional access | Much lower |
| Glacier Deep Archive | Long-term archival | Cheapest |

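Lifecycle policies automatically transition objects between classes based on age (e.g., move to Glacier after 90 days). The cheaper classes add per-GB retrieval fees and minimum storage durations, so they only save money for data that is rarely read.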
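A lifecycle rule is declarative: a filter plus a list of age-based transitions. The Python sketch below writes one rule in the shape of an S3 lifecycle configuration (the structure you would pass to boto3's `put_bucket_lifecycle_configuration`), with an assumed `backups/` prefix and assumed day thresholds. The `storage_class_for` function is only a hypothetical local model of which class an object would end up in; real S3 applies transitions asynchronously, so actual timing can lag.

```python
from __future__ import annotations

# Rule shape mirrors an S3 lifecycle configuration; prefix and days are assumptions.
LIFECYCLE_RULES = [
    {
        "ID": "tier-down-backups",
        "Filter": {"Prefix": "backups/"},
        "Status": "Enabled",
        "Transitions": [
            {"Days": 30, "StorageClass": "STANDARD_IA"},
            {"Days": 90, "StorageClass": "GLACIER_IR"},  # Glacier Instant Retrieval
            {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
        ],
    }
]


def storage_class_for(object_key: str, age_days: int,
                      rules: list[dict] = LIFECYCLE_RULES) -> str:
    """Return the class an object of this age would have been moved to (local model)."""
    current = "STANDARD"
    for rule in rules:
        prefix = rule["Filter"].get("Prefix", "")
        if rule["Status"] != "Enabled" or not object_key.startswith(prefix):
            continue
        # Apply transitions in age order; the last one reached wins.
        for transition in sorted(rule["Transitions"], key=lambda tr: tr["Days"]):
            if age_days >= transition["Days"]:
                current = transition["StorageClass"]
    return current
```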

Durability vs. Availability

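  • Durability (11 nines): S3 Standard stores data redundantly across at least 3 Availability Zones. Durability of 99.999999999% means an expected annual loss rate of 0.000000001% (10^-11) per object.
  • Availability (99.99%): S3 Standard is designed for 99.99% availability, which allows about 52 minutes of unavailability per year. The contractual SLA is lower (99.9%).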

Durability is about data survival. Availability is about being able to access it.
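The two percentages translate into very different quantities, which a few lines of arithmetic make clear. This Python sketch assumes a 365.25-day year and treats eleven nines as a flat per-object annual loss rate of 1e-11; the function names are illustrative.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes


def downtime_minutes_per_year(availability: float) -> float:
    """Unavailable minutes per year implied by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability)


def expected_objects_lost_per_year(num_objects: int,
                                   annual_loss_rate: float = 1e-11) -> float:
    """Eleven nines of durability treated as a 1e-11 annual loss rate per object."""
    return num_objects * annual_loss_rate


print(round(downtime_minutes_per_year(0.9999), 1))  # 52.6 minutes
lost = expected_objects_lost_per_year(10_000_000)
print(round(1 / lost))  # 10000: years between single-object losses, on average
```

So with 10 million objects stored, you would expect to lose one object roughly once every 10,000 years, while the same service may be unreachable for close to an hour every year.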

Interview Tips

  • Use pre-signed URLs for uploads — never proxy binary data through your application servers
  • CDN (CloudFront) in front of S3 is standard for serving files globally: content is cached at edge locations close to users
  • Always store the S3 object key in your database, not the full URL (the URL may change if you migrate buckets or providers)
  • For large files (video), discuss multipart uploads: the file is uploaded as independent parts (possibly in parallel), only failed parts are retried, and S3 assembles the parts when the upload is completed
  • Object storage is not a database — no querying by content, no transactions. It's a key-value store for binary data
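The multipart tip can be sketched in Python as two pieces: planning part ranges within S3's limits (parts of at least 5 MiB except the last, and at most 10,000 parts per upload), and uploading each part independently so a failure retries only that part. The `upload_part` callback, the `plan_parts`/`upload_with_retries` names, and the 8 MiB default part size are hypothetical choices; with boto3 the corresponding calls would be `create_multipart_upload`, `upload_part`, and `complete_multipart_upload`, and the returned list has the `PartNumber`/`ETag` shape that the completion call expects.

```python
from __future__ import annotations

from typing import Callable

MIB = 1024 * 1024
MIN_PART = 5 * MIB   # S3 minimum part size (the last part may be smaller)
MAX_PARTS = 10_000   # S3 maximum number of parts per upload


def plan_parts(file_size: int, part_size: int = 8 * MIB) -> list[tuple[int, int, int]]:
    """Split a file into (part_number, offset, length) ranges."""
    part_size = max(part_size, MIN_PART)
    # Double the part size until the file fits within MAX_PARTS parts.
    while -(-file_size // part_size) > MAX_PARTS:
        part_size *= 2
    return [
        (number, offset, min(part_size, file_size - offset))
        for number, offset in enumerate(range(0, file_size, part_size), start=1)
    ]


def upload_with_retries(parts: list[tuple[int, int, int]],
                        upload_part: Callable[[int, int, int], str],
                        max_attempts: int = 3) -> list[dict]:
    """Upload each part independently; a failure retries only that part."""
    etags: dict[int, str] = {}
    for number, offset, length in parts:
        for attempt in range(1, max_attempts + 1):
            try:
                etags[number] = upload_part(number, offset, length)  # returns an ETag
                break
            except OSError:
                if attempt == max_attempts:
                    raise  # give up on this part after max_attempts tries
    return [{"PartNumber": n, "ETag": etags[n]} for n in sorted(etags)]
```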