How to Run Puppeteer on AWS Lambda

CaptureKit Team
puppeteer, web-scraping, AWS, serverless, automation

Running Puppeteer on AWS Lambda can be challenging due to the serverless environment's limitations and Chrome's resource requirements. However, with the right setup and optimizations, it's possible to create a reliable web scraping solution that scales automatically. In this guide, we'll explore how to set up Puppeteer on AWS Lambda and provide a working boilerplate solution.

Why Run Puppeteer on AWS Lambda?

Running Puppeteer on AWS Lambda offers several advantages:

  • Serverless Architecture: No need to manage servers or worry about uptime
  • Cost-Effective: Pay only for the compute time you use
  • Auto-Scaling: Automatically handle varying workloads
  • Easy Integration: Works well with other AWS services

However, there are some challenges to consider:

  • Lambda's execution time limits (up to 15 minutes)
  • Memory constraints (up to 10GB)
  • Cold starts affecting performance
  • Chrome binary compatibility issues

Setting Up Puppeteer on AWS Lambda

We've created a boilerplate repository that handles these challenges for you. Let's walk through the setup process:

Prerequisites

  1. Node.js 18.x (recommended)
  2. AWS Account with Lambda and S3 access
  3. AWS CLI configured for local deployment

Local Development Setup

First, clone the repository and set up your local environment:

# Install Node.js 18
nvm install 18
nvm use 18

# Install dependencies
npm install

# Create environment file
echo "SECRET=your-secret-key-here" > .env

# Run locally
node index.js
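
For orientation, here is a minimal sketch of what the index.js handler can look like. This is not the boilerplate's exact code: the response shape and the title extraction are illustrative, but the combination of puppeteer-extra, the stealth plugin, and @sparticuz/chromium is the pattern the boilerplate relies on.

// index.js - a minimal sketch; the actual boilerplate handler may differ
require('dotenv').config();
const chromium = require('@sparticuz/chromium');
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

// Register the stealth plugin before launching the browser
puppeteer.use(StealthPlugin());

exports.handler = async (event) => {
  // Reject requests that don't carry the shared secret header
  if ((event.headers || {}).secret !== process.env.SECRET) {
    return { statusCode: 401, body: 'Unauthorized' };
  }

  const { url } = JSON.parse(event.body || '{}');
  if (!url) {
    return { statusCode: 400, body: 'Missing "url" in request body' };
  }

  let browser;
  try {
    // @sparticuz/chromium resolves a Lambda-compatible Chromium binary
    browser = await puppeteer.launch({
      args: chromium.args,
      defaultViewport: chromium.defaultViewport,
      executablePath: await chromium.executablePath(),
      headless: chromium.headless,
    });
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle2' });
    const title = await page.title();
    return { statusCode: 200, body: JSON.stringify({ title }) };
  } finally {
    // Always close the browser so the Lambda sandbox can be reused cleanly
    if (browser) await browser.close();
  }
};

Note that @sparticuz/chromium ships a binary built for Lambda; for local runs you'll typically fall back to the full puppeteer package's bundled Chrome, which is why both puppeteer and puppeteer-core appear in the dependency list later in this post.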

AWS Configuration

  1. Create an S3 bucket for your Lambda deployment package
  2. Create a Lambda function with these recommended settings (a CLI equivalent follows the list):
    • Runtime: Node.js 18.x
    • Memory: 1024 MB
    • Timeout: 30 seconds
    • Architecture: x86_64
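
If you prefer the AWS CLI, the same function can be created in one command. The function name, role ARN, and bucket below are placeholders for your own values:

aws lambda create-function \
  --function-name your-function-name \
  --runtime nodejs18.x \
  --memory-size 1024 \
  --timeout 30 \
  --architectures x86_64 \
  --handler index.handler \
  --role arn:aws:iam::123456789012:role/your-lambda-execution-role \
  --code S3Bucket=your-bucket-name,S3Key=lambda.zip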

Deployment Options

Manual Deployment

# Create deployment package
zip -r lambda.zip index.js node_modules

# Upload to S3
aws s3 cp lambda.zip s3://your-bucket-name/lambda.zip

Then update your Lambda function through the AWS Console (or via the CLI, shown after these steps):

  1. Go to AWS Lambda Console
  2. Select your function
  3. Go to Code tab
  4. Click "Upload from" -> "Amazon S3 location"
  5. Paste the S3 URL of your uploaded zip file
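
Alternatively, point the function at the uploaded package straight from the CLI:

aws lambda update-function-code \
  --function-name your-function-name \
  --s3-bucket your-bucket-name \
  --s3-key lambda.zip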

Automated Deployment with GitHub Actions

The boilerplate includes a GitHub Actions workflow for automated deployment (a sketch of its shape follows the steps). To set it up:

  1. Add these secrets to your GitHub repository:

    • AWS_ACCESS_KEY_ID
    • AWS_SECRET_ACCESS_KEY
  2. Update the workflow file (.github/workflows/main.yml) with your values:

    • Replace {{your-bucket-name}} with your S3 bucket name
    • Replace {{your-function-name}} with your Lambda function name
  3. Push to main to trigger deployment
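
The workflow file in the boilerplate may differ in its details, but a minimal deployment workflow along these lines would work; the region and resource names are placeholders:

name: Deploy to Lambda

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 18
      - run: npm ci
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1
      - run: zip -r lambda.zip index.js node_modules
      - run: aws s3 cp lambda.zip s3://your-bucket-name/lambda.zip
      - run: |
          aws lambda update-function-code \
            --function-name your-function-name \
            --s3-bucket your-bucket-name \
            --s3-key lambda.zip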

Using the Lambda Function

The function accepts POST requests with this structure:

{
  "url": "https://example.com"
}

Required header:

secret: your-secret-key
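
How you call the function depends on how it's exposed. Assuming a Lambda function URL (the endpoint here is a placeholder), a test request looks like:

curl -X POST \
  -H "Content-Type: application/json" \
  -H "secret: your-secret-key" \
  -d '{"url": "https://example.com"}' \
  https://your-url-id.lambda-url.us-east-1.on.aws/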

Key Features of the Boilerplate

  1. Stealth Mode: Uses puppeteer-extra-plugin-stealth to avoid detection
  2. AWS Compatibility: Uses @sparticuz/chromium for Lambda compatibility
  3. Security: Secret key authentication
  4. Automated Deployment: GitHub Actions workflow included

Dependencies

The boilerplate uses these key dependencies:

  • @sparticuz/chromium: ^123.0.1
  • puppeteer-extra: ^3.3.4
  • puppeteer-core: 19.6
  • puppeteer-extra-plugin-stealth: ^2.11.1
  • puppeteer: ^21.5.0
  • dotenv: ^16.4.5

Alternative Solution: CaptureKit

While running Puppeteer on AWS Lambda is powerful, it requires significant maintenance and handling of edge cases. If you're looking for a managed solution that handles all the infrastructure and maintenance, consider using CaptureKit. It provides three powerful APIs in one platform:

Screenshot API

  • Reliable screenshot capture with no infrastructure management
  • Full-page screenshots with lazy loading support
  • Built-in ad and cookie banner blocking
  • Multiple output formats (PNG, WebP, JPEG, PDF)
  • Direct S3 upload integration

Content Extraction API

  • Clean, structured HTML extraction
  • Metadata parsing (title, description, OpenGraph & Schema data)
  • Link scraping (internal and external)
  • Consistent data without maintenance headaches
  • Perfect for data pipelines and web scraping

AI Analysis API

  • Instant webpage summarization
  • Key insights extraction
  • AI-powered content analysis
  • Scale your web research process
  • Focus on creating content, not extracting it

Every CaptureKit API is:

  • Developer-first, with instant access
  • Available on a free tier, no credit card required
  • Backed by lightning-fast support
  • Built for production use cases

Best Practices and Tips

  1. Memory Management

    • Monitor Lambda memory usage
    • Adjust memory allocation based on your needs
    • Clean up resources properly
  2. Performance Optimization

    • Use Lambda layers for dependencies
    • Implement connection pooling
    • Cache frequently accessed data
  3. Error Handling

    • Implement proper error logging
    • Set up CloudWatch alarms
    • Handle timeouts gracefully (see the sketch after this list)
  4. Security

    • Never commit AWS credentials
    • Use environment variables for secrets
    • Implement proper IAM roles
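
As a concrete illustration of the timeout point, a handler can derive its navigation budget from however much execution time Lambda has left. scrapeWithDeadline is a hypothetical helper, but getRemainingTimeInMillis is part of the standard Lambda context object:

// Hypothetical helper: budget page.goto against the remaining Lambda time
async function scrapeWithDeadline(page, url, context) {
  // Leave a 5-second buffer to close the browser before Lambda times out
  const budget = context.getRemainingTimeInMillis() - 5000;
  try {
    await page.goto(url, { waitUntil: 'networkidle2', timeout: budget });
  } catch (err) {
    // console.error output lands in CloudWatch Logs, where alarms can fire
    console.error('Navigation failed', { url, message: err.message });
    throw err;
  }
}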

Conclusion

Running Puppeteer on AWS Lambda is a powerful solution for serverless web scraping, but it requires careful setup and maintenance. The provided boilerplate handles many common challenges and provides a solid foundation for your projects.

For those who want to focus on their core business logic without managing infrastructure, CaptureKit offers a comprehensive solution that handles all the complexities of web scraping and content extraction.

Choose the approach that best fits your needs:

  • Use the Puppeteer Lambda boilerplate if you need full control and customization
  • Use CaptureKit if you want a managed solution with additional features