A Guide to Setting Up Vercel Log Storage in AWS S3

The Nativity with the Prophets Isaiah and Ezekiel (1308–1311) by Duccio di Buoninsegna.

Introduction

Modern cloud applications generate vast amounts of logs that are crucial for debugging, monitoring, and maintaining application health. While Vercel provides excellent built-in logging capabilities, these logs are only retained for 24 hours. For production applications, this brief retention period often proves insufficient for proper debugging, compliance requirements, and long-term analysis. This guide will show you how to implement a robust logging solution that preserves your Vercel logs in AWS S3 for future use.

Understanding the Architecture

Before diving into the implementation, let's understand how the components work together. When your Vercel application generates logs, they'll flow through a log drain to AWS S3. This process creates a persistent record of your application's behavior, which you can analyze using various AWS services. Think of it as creating a permanent, searchable archive of your application's history. This solution not only ensures that your logs are securely stored but also allows you to leverage AWS's powerful analytics tools to gain deeper insights into your application performance and user activity.
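
At a glance, the pipeline you will build looks like this (the Lambda function and Athena table are set up in the steps below):

Vercel application → log drain (HTTPS POST) → Lambda function → S3 bucket (logs/) → Athena / QuickSight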


Implementation Guide

Step 1: S3 Bucket Configuration

First, we'll create an S3 bucket with appropriate security settings and retention policies. The bucket will serve as our log archive, with different storage tiers to balance accessibility and cost.

  1. Create the S3 Bucket:

    • Log in to the AWS Management Console.
    • Navigate to the S3 service.
    • Click “Create bucket” and provide a unique name (e.g., vercel-log-storage).
    • Keep “Block all public access” enabled (the default) so the logs stay private.
    • Click “Create bucket”.
  2. Configure the Bucket Policy:

    • Add the following bucket policy to grant write access:
      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::your-bucket-name/logs/*",
            "Condition": {
              "StringEquals": {
                "s3:x-amz-acl": "bucket-owner-full-control"
              }
            }
          }
        ]
      }
    • This policy limits writes to the logs/ prefix and requires the bucket-owner-full-control ACL so the bucket owner retains ownership of every uploaded log. Replace the placeholder principal with the IAM role or account that actually delivers your logs (for example, the Lambda execution role from Step 2); a wildcard principal would let anyone write to the bucket and will typically be rejected while “Block all public access” is enabled.
  3. Set Up Lifecycle Policies:

    • Configure the following lifecycle policy to optimize storage costs:

      {
        "Rules": [
          {
            "ID": "Log Retention Policy",
            "Status": "Enabled",
            "Filter": {
              "Prefix": "logs/"
            },
            "Transitions": [
              {
                "Days": 30,
                "StorageClass": "STANDARD_IA"
              },
              {
                "Days": 90,
                "StorageClass": "GLACIER"
              }
            ],
            "Expiration": {
              "Days": 365
            }
          }
        ]
      }
    • This policy creates a tiered storage approach:

      • Logs remain in Standard storage for 30 days for immediate access during debugging.
      • Logs move to Standard-IA between days 30 and 90, cutting storage costs while remaining readily accessible.
      • Logs are archived to Glacier for low-cost long-term storage from day 90 to day 365.
      • Logs are deleted after 365 days unless otherwise required by compliance policies or business needs.
    • Regularly review your lifecycle policies to ensure they align with your organization’s evolving requirements. A boto3 sketch that scripts the bucket setup above follows this list.
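
For teams that manage infrastructure in code, here is a minimal boto3 sketch that creates the bucket, blocks public access, and applies the bucket policy and lifecycle rules from the steps above. The region, bucket name, and log-writer role ARN are placeholders to adapt to your account.

import json
import boto3

REGION = 'us-east-1'                   # placeholder region
BUCKET = 'vercel-log-storage'          # placeholder; bucket names are globally unique
LOG_WRITER_ROLE_ARN = 'arn:aws:iam::YOUR_ACCOUNT_ID:role/YOUR_LOG_WRITER_ROLE'  # placeholder

s3 = boto3.client('s3', region_name=REGION)

# Create the bucket (us-east-1 must omit CreateBucketConfiguration).
if REGION == 'us-east-1':
    s3.create_bucket(Bucket=BUCKET)
else:
    s3.create_bucket(
        Bucket=BUCKET,
        CreateBucketConfiguration={'LocationConstraint': REGION},
    )

# Block all public access.
s3.put_public_access_block(
    Bucket=BUCKET,
    PublicAccessBlockConfiguration={
        'BlockPublicAcls': True,
        'IgnorePublicAcls': True,
        'BlockPublicPolicy': True,
        'RestrictPublicBuckets': True,
    },
)

# Apply the bucket policy shown above, scoped to the role that delivers the logs.
policy = {
    'Version': '2012-10-17',
    'Statement': [
        {
            'Effect': 'Allow',
            'Principal': {'AWS': LOG_WRITER_ROLE_ARN},
            'Action': 's3:PutObject',
            'Resource': f'arn:aws:s3:::{BUCKET}/logs/*',
            'Condition': {
                'StringEquals': {'s3:x-amz-acl': 'bucket-owner-full-control'}
            },
        }
    ],
}
s3.put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(policy))

# Apply the lifecycle rules shown above.
s3.put_bucket_lifecycle_configuration(
    Bucket=BUCKET,
    LifecycleConfiguration={
        'Rules': [
            {
                'ID': 'Log Retention Policy',
                'Status': 'Enabled',
                'Filter': {'Prefix': 'logs/'},
                'Transitions': [
                    {'Days': 30, 'StorageClass': 'STANDARD_IA'},
                    {'Days': 90, 'StorageClass': 'GLACIER'},
                ],
                'Expiration': {'Days': 365},
            }
        ]
    },
)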


Step 2: Log Processing Implementation

The following Lambda function serves as our log processor, organizing incoming logs into a structured hierarchy. Expose it through an HTTPS endpoint (for example, a Lambda Function URL or API Gateway) and register that endpoint as the target of your Vercel log drain:

import boto3
import json
from datetime import datetime, timezone

s3 = boto3.client('s3')
BUCKET_NAME = 'your-bucket-name'

def lambda_handler(event, context):
    # The log drain delivers its payload as the HTTP request body.
    log_data = event.get('body') or '{}'
    log_json = json.loads(log_data)

    # The drain may deliver a batch (JSON array) of entries; use the first
    # entry to choose the branch for this object's key.
    first_entry = log_json[0] if isinstance(log_json, list) and log_json else log_json

    # Create a logical organization structure by branch
    branch = first_entry.get('branch', 'unknown') if isinstance(first_entry, dict) else 'unknown'
    timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%d_%H-%M-%S")

    # Establish a clear hierarchy for log storage. Under heavy traffic,
    # append a unique suffix (e.g., context.aws_request_id) to avoid
    # overwriting payloads that arrive within the same second.
    file_name = f"logs/{branch}/{timestamp}.json"

    # Store the raw payload unchanged so no fields are lost.
    s3.put_object(
        Bucket=BUCKET_NAME,
        Key=file_name,
        Body=log_data,
        ContentType='application/json'
    )

    return {
        'statusCode': 200,
        'body': json.dumps('Log stored successfully!')
    }

This implementation creates a logical folder structure in S3:

logs/
  ├── main/         # Production environment logs
  │   └── YYYY-MM-DD_HH-MM-SS.json
  ├── staging/      # Staging environment logs
  │   └── YYYY-MM-DD_HH-MM-SS.json
  └── dev/          # Development environment logs
      └── YYYY-MM-DD_HH-MM-SS.json

Organizing logs by branch improves visibility and traceability, enabling developers and operations teams to pinpoint issues specific to a particular environment. For example, a team debugging a deployment issue in staging can quickly filter logs for the staging/ folder.
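
For example, a minimal boto3 sketch (using the same placeholder bucket name as the Lambda) that pulls the most recent staging logs for inspection might look like this:

import boto3

s3 = boto3.client('s3')
BUCKET_NAME = 'your-bucket-name'   # same placeholder bucket as the Lambda

# List objects under the staging/ prefix (list_objects_v2 returns up to 1,000 keys per call).
response = s3.list_objects_v2(Bucket=BUCKET_NAME, Prefix='logs/staging/')
objects = sorted(
    response.get('Contents', []),
    key=lambda obj: obj['LastModified'],
    reverse=True,
)

# Print the five most recent log payloads.
for obj in objects[:5]:
    body = s3.get_object(Bucket=BUCKET_NAME, Key=obj['Key'])['Body'].read()
    print(obj['Key'])
    print(body.decode('utf-8'))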


Step 3: Log Analysis and Monitoring

Once your logs are flowing into S3, you can implement powerful analysis capabilities using AWS services. Here's how to set up basic querying using AWS Athena:

CREATE EXTERNAL TABLE vercel_logs (
    `timestamp` STRING,
    level STRING,
    message STRING,
    branch STRING,
    deployment_id STRING
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION 's3://your-bucket-name/logs/';

Because the Lambda writes objects under logs/{branch}/ rather than Hive-style date partitions, the table is left unpartitioned and Athena simply scans everything under the logs/ prefix. This table structure allows you to run SQL queries against your logs, such as:

-- Find all errors in production in the last 7 days
SELECT "timestamp", message
FROM vercel_logs
WHERE level = 'error'
  AND branch = 'main'
  AND from_iso8601_timestamp("timestamp") >= current_timestamp - interval '7' day
ORDER BY "timestamp" DESC;

The date filter above assumes the timestamp field is stored as an ISO-8601 string; if your drain emits epoch milliseconds instead, cast the value to BIGINT and use from_unixtime. You can extend this setup by integrating Athena with visualization tools like Amazon QuickSight to create dashboards that monitor key metrics such as error rates, response times, and user activity trends.
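
If you want to run such queries programmatically, for example from a scheduled job that feeds a dashboard, a minimal boto3 sketch might look like the following; the database name and results location are placeholders you would replace with your own.

import time
import boto3

athena = boto3.client('athena')

QUERY = '''
SELECT "timestamp", message
FROM vercel_logs
WHERE level = 'error'
  AND branch = 'main'
ORDER BY "timestamp" DESC
LIMIT 100
'''

# Start the query; Athena writes the results to the output location as CSV.
execution = athena.start_query_execution(
    QueryString=QUERY,
    QueryExecutionContext={'Database': 'default'},  # placeholder database
    ResultConfiguration={'OutputLocation': 's3://your-bucket-name/athena-results/'},  # placeholder
)
query_id = execution['QueryExecutionId']

# Poll until the query finishes.
while True:
    status = athena.get_query_execution(QueryExecutionId=query_id)
    state = status['QueryExecution']['Status']['State']
    if state in ('SUCCEEDED', 'FAILED', 'CANCELLED'):
        break
    time.sleep(1)

if state == 'SUCCEEDED':
    results = athena.get_query_results(QueryExecutionId=query_id)
    for row in results['ResultSet']['Rows'][1:]:  # the first row is the header
        print([col.get('VarCharValue') for col in row['Data']])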


Operational Best Practices

  • Monitor Costs: Use AWS Cost Explorer to track S3 usage and identify cost-saving opportunities.
  • Enable Encryption: Use server-side encryption (SSE) with AWS-managed keys (SSE-S3) or customer-managed keys (SSE-KMS) for logs.
  • Set Alerts: Configure CloudWatch alarms for unusual log activity or error spikes.
  • Compress Logs: Compress log payloads (for example with gzip) in the Lambda function before uploading to reduce storage costs without losing any data; see the sketch after this list.
  • Replicate Logs: Use cross-region replication for disaster recovery and compliance with data sovereignty laws.
  • Regular Reviews: Periodically review your S3 bucket permissions and lifecycle policies to ensure they remain aligned with security and cost requirements.
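
To make the compression and encryption points above concrete, here is a small sketch of how the Lambda's upload step could gzip the payload and request server-side encryption; it assumes the same placeholder bucket and key layout as the handler in Step 2.

import gzip
import boto3

s3 = boto3.client('s3')
BUCKET_NAME = 'your-bucket-name'   # same placeholder as in the Lambda handler

def store_compressed(log_data, file_name):
    # Gzip the raw payload and upload it with server-side encryption (SSE-S3).
    compressed = gzip.compress(log_data.encode('utf-8'))
    s3.put_object(
        Bucket=BUCKET_NAME,
        Key=file_name + '.gz',            # e.g. logs/main/2024-01-01_12-00-00.json.gz
        Body=compressed,
        ContentType='application/json',
        ContentEncoding='gzip',
        ServerSideEncryption='AES256',    # or 'aws:kms' plus SSEKMSKeyId for SSE-KMS
    )

Athena reads gzip-compressed JSON transparently as long as the objects keep the .gz extension, so the table from Step 3 continues to work.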

Verification and Testing

  • Deploy and Test: Deploy your Vercel application and generate logs to verify the end-to-end setup.
  • Run Queries: Use Athena to query and validate log data structure and content.
  • Check Lifecycle Policies: Confirm that lifecycle transitions (e.g., Standard to Glacier) are functioning as expected.
  • Audit Permissions: Regularly audit your bucket policy and public access settings to ensure that only authorized entities can access the logs; a small boto3 audit sketch follows this list.
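
The permission and lifecycle checks above can also be scripted; a read-only boto3 sketch (bucket name is a placeholder) might look like this:

import json
import boto3

s3 = boto3.client('s3')
BUCKET_NAME = 'your-bucket-name'   # placeholder

# Confirm that public access is still fully blocked.
pab = s3.get_public_access_block(Bucket=BUCKET_NAME)['PublicAccessBlockConfiguration']
print('Public access block:', pab)

# Review the bucket policy for overly broad principals.
policy = json.loads(s3.get_bucket_policy(Bucket=BUCKET_NAME)['Policy'])
for statement in policy.get('Statement', []):
    print('Statement principal:', statement.get('Principal'))

# Confirm the lifecycle transitions and expiration are still in place.
rules = s3.get_bucket_lifecycle_configuration(Bucket=BUCKET_NAME)['Rules']
for rule in rules:
    print(rule['ID'], rule.get('Transitions'), rule.get('Expiration'))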

Conclusion

By following this guide, you've created a robust logging system that extends well beyond Vercel's 24-hour retention limit. This solution provides the foundation for comprehensive application monitoring, debugging, and compliance reporting. Additionally, it equips your team with tools to gain actionable insights into application performance, ensuring a smoother user experience and improved operational efficiency.