Icecrawl

A powerful web scraping application offering multiple interfaces: HTTP API, CLI, and MCP Server.

Overview

Icecrawl is a web scraping tool for crawling, data extraction, and site analysis. It is available through three interfaces: an HTTP API with a dashboard, a CLI, and an MCP server.

Features

🌐 HTTP API & Dashboard

RESTful API with web UI for managing scrapes and viewing results.

💻 CLI Tool

Command-line scraping with icecrawl for quick operations.

🤖 MCP Server

Programmatic scraping via icecrawl mcp-server for agent integrations.

🔐 Authentication

Role-based access control for secure usage.

💾 Database Storage

Persistent storage with Prisma ORM (SQLite by default).

🕷️ Flexible Crawling

Asynchronous crawling with depth & scope controls.

⚙️ Output Formats

Supports JSON, Markdown, HTML, and screenshots.

🚀 Performance & Proxy

Caching, request pooling, and proxy support for speed & reliability.

🖥️ JS Rendering

Optional headless browser via Puppeteer for dynamic sites.
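
The depth and scope controls listed under Flexible Crawling can be illustrated with a minimal sketch. This is not Icecrawl's actual crawler: the names crawl, LinkFetcher, CrawlOptions, maxDepth, and sameOriginOnly are hypothetical, and the link fetcher is passed in so the example runs without network access. It assumes a breadth-first crawl in which every page at a given depth is fetched concurrently.

```typescript
// Hypothetical sketch of depth- and scope-limited crawling.
// None of these names come from Icecrawl itself.

// Returns the absolute URLs linked from a page (injected by the caller).
type LinkFetcher = (url: string) => Promise<string[]>;

interface CrawlOptions {
  maxDepth: number;        // 0 visits only the start URL
  sameOriginOnly: boolean; // scope control: stay on the start URL's origin
}

// True when the link parses as a URL with the given origin.
function hasOrigin(link: string, origin: string): boolean {
  try {
    return new URL(link).origin === origin;
  } catch {
    return false; // relative or malformed links are treated as out of scope
  }
}

async function crawl(
  startUrl: string,
  fetchLinks: LinkFetcher,
  options: CrawlOptions
): Promise<string[]> {
  const origin = new URL(startUrl).origin;
  const visited = new Set<string>();
  let frontier: string[] = [startUrl];

  for (let depth = 0; depth <= options.maxDepth && frontier.length > 0; depth++) {
    // Drop URLs already seen, including duplicates within this level.
    const batch = frontier.filter((url) => {
      if (visited.has(url)) return false;
      visited.add(url);
      return true;
    });

    // Fetch every page at this depth concurrently.
    const linkLists = await Promise.all(batch.map((url) => fetchLinks(url)));

    // The next level keeps only unvisited links that are in scope.
    frontier = linkLists
      .flat()
      .filter((link) => !visited.has(link))
      .filter((link) => !options.sameOriginOnly || hasOrigin(link, origin));
  }

  return [...visited];
}
```

Calling crawl with maxDepth set to 1 visits the start page and the pages it links to directly. Setting sameOriginOnly to true drops links that point to other hosts before they are queued.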

Installation

From npm (Recommended)

npm install -g icecrawl

Installation creates the default data directory and seeds an admin user.

From Source (Development)

git clone https://github.com/wangdangel/icecrawl.git
cd icecrawl
npm install
cp .env.example .env
npx prisma migrate dev
npm run build
npm run build:dashboard

Usage

Start Dashboard & MCP Server

icecrawl

The dashboard is served at http://localhost:6971/dashboard and the API docs at /api-docs on the same host.

Start Dashboard Only

icecrawl dashboard

Start MCP Server Only

icecrawl mcp-server

Scrape via CLI

icecrawl scrape url https://example.com
echo "https://example.com" | icecrawl scrape

Troubleshooting & FAQ

Permission Denied: if the icecrawl binary is not executable, add execute permission, e.g. chmod +x "$(npm prefix -g)/bin/icecrawl" on Linux or macOS. The older npm bin -g command was removed in npm 9.

MCP Server Setup

Example stdio configuration for an MCP client. Replace the paths with the location of your own build:

{
  "command": "node",
  "args": ["k:/Documents/smart_crawler/dist/mcp-server.js"],
  "cwd": "k:/Documents/smart_crawler",
  "transportType": "stdio"
}

Default Login Credentials

Username  Password  Role
admin     password  admin

License

Licensed under the MIT License.

Contributing

Contributions welcome! Open issues or PRs on GitHub.