A powerful web scraping application offering multiple interfaces: HTTP API, CLI, and MCP Server.
Icecrawl is a flexible web scraping tool with multiple interfaces (HTTP API with dashboard, CLI, MCP server) designed for crawling, data extraction, and site analysis.
RESTful API with web UI for managing scrapes and viewing results.
Command-line scraping with icecrawl for quick operations.
Programmatic scraping via icecrawl mcp-server for agent integrations.
Role-based access control for secure usage.
Persistent storage with Prisma ORM (SQLite by default).
Asynchronous crawling with depth & scope controls.
Supports JSON, Markdown, HTML, and screenshots.
Caching, request pooling, and proxy support for speed & reliability.
Optional headless browser via Puppeteer for dynamic sites.
npm install -g icecrawl
Creates default data directory and seeds admin user.
git clone https://github.com/wangdangel/icecrawl.git
cd icecrawl
npm install
cp .env.example .env
npx prisma migrate dev
npm run build
npm run build:dashboard
icecrawl
Dashboard: http://localhost:6971/dashboard, API Docs: /api-docs
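When the server is started from a script, it can help to wait until the dashboard answers before doing anything else. The following is a minimal sketch, assuming curl is installed and that the dashboard URL above returns a non-error HTTP status once the server is ready; wait_for_dashboard is a hypothetical helper, not part of icecrawl.

```shell
# wait_for_dashboard is a hypothetical helper, not an icecrawl command.
# It polls a URL until curl reports success or the attempts run out.
wait_for_dashboard() {
  wfd_url="${1:-http://localhost:6971/dashboard}"
  wfd_attempts="${2:-10}"
  wfd_i=1
  while [ "$wfd_i" -le "$wfd_attempts" ]; do
    # -f: treat HTTP errors (>= 400) as failure, -s: silent,
    # -o /dev/null: discard the response body
    if curl -fs -o /dev/null "$wfd_url"; then
      echo "dashboard is up at $wfd_url"
      return 0
    fi
    sleep 1
    wfd_i=$((wfd_i + 1))
  done
  echo "dashboard did not respond at $wfd_url" >&2
  return 1
}
```

Called with no arguments, it checks the default dashboard address up to ten times, one second apart.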
icecrawl dashboard
icecrawl mcp-server
icecrawl scrape url https://example.com
echo "https://example.com" | icecrawl scrape
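The stdin form above extends naturally to a list of URLs. The following is a minimal sketch, assuming a plain-text file with one URL per line; scrape_list is a hypothetical helper, and it relies only on the icecrawl scrape invocation shown above, with no extra flags.

```shell
# scrape_list is a hypothetical helper, not an icecrawl command.
# It reads one URL per line from a file and pipes each URL to
# `icecrawl scrape`, using the stdin form shown above.
scrape_list() {
  sl_file="$1"
  # `|| [ -n "$sl_url" ]` also handles a last line without a trailing newline.
  while IFS= read -r sl_url || [ -n "$sl_url" ]; do
    # Skip blank lines and lines starting with '#'.
    case "$sl_url" in
      ''|'#'*) continue ;;
    esac
    printf '%s\n' "$sl_url" | icecrawl scrape
  done < "$sl_file"
}
```

For example, scrape_list urls.txt runs one scrape per listed URL, in order. Each scrape produces whatever the CLI normally prints.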
Permission Denied: add execute permissions for icecrawl if necessary (e.g., chmod +x $(npm bin -g)/icecrawl). On npm 9 and later, where npm bin was removed, use chmod +x $(npm prefix -g)/bin/icecrawl instead.
{
  "command": "node",
  "args": ["k:/Documents/smart_crawler/dist/mcp-server.js"],
  "cwd": "k:/Documents/smart_crawler",
  "transportType": "stdio"
}
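The paths in the entry above are specific to one machine. The following is a minimal sketch for printing the same entry for your own checkout, assuming the built server lives at dist/mcp-server.js inside the project directory, as in the example; mcp_config is a hypothetical helper, not part of icecrawl.

```shell
# mcp_config is a hypothetical helper, not an icecrawl command.
# It prints the MCP entry shown above with both paths pointing at
# the given project directory (default: the current directory).
mcp_config() {
  mc_dir="${1:-$PWD}"
  # Assumes the path contains no double quotes or backslashes,
  # which would need escaping inside JSON strings.
  printf '{\n'
  printf '  "command": "node",\n'
  printf '  "args": ["%s/dist/mcp-server.js"],\n' "$mc_dir"
  printf '  "cwd": "%s",\n' "$mc_dir"
  printf '  "transportType": "stdio"\n'
  printf '}\n'
}
```

Running mcp_config from the repository root prints an entry that can be pasted into an MCP client's configuration.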
| Username | Password | Role |
|---|---|---|
| admin | password | admin |
Licensed under MIT.
Contributions welcome! Open issues or PRs on GitHub.