Skip to main content
Scrape website content and bring it into Galaxy for processing. Galaxy fetches content, processes it, and extracts structure, text, and entities.
Browserbase 1

How to Connect

  1. Select Browserbase as the Source type
  2. Name the Source: Give your Website Scrape Source a name
  3. Provide base URL: Enter the base URL of the website to scrape
  4. (Optional) Add a Prompt: Provide additional instructions to guide extraction (e.g., focus on product documentation, ignore navigation, prioritize structured data)
  5. Choose Credentials
    • Galaxy managed (Recommended) — Galaxy handles Browserbase credentials automatically
    • Bring your own API key — Connect your own Browserbase account
  6. (Optional) Configure Max URLs to Scrape: Set the maximum number of URLs Galaxy should process from the starting page.
Once created, Galaxy begins scraping the specified URL and processing the content.

COMING SOON More Configuration Options

  • Max Depth: Maximum depth to crawl from the base URL
  • Max Pages: Maximum number of pages to scrape
  • Allowed Domains: List of domains that are allowed to be scraped (leave empty to allow all)
  • Delay Between Requests: Delay in milliseconds between requests (helps avoid overwhelming servers)
  • Respect robots.txt: Whether to respect robots.txt rules
  • User Agent: Custom user agent string to identify the scraper

Content Processing

Galaxy processes scraped website content with:
  • Text extraction: Extracts text content from web pages, preserving structure and layout
  • Content normalization: Normalizes scraped content for consistency
  • Entity extraction: Automatically extracts and normalizes semantic entities including:
    • Dates and times (normalized to standard formats)
    • Email addresses and URLs
    • Phone numbers
    • Measurements, money, and percentages
    • Serial numbers, model numbers, and part numbers
    • IP addresses and version numbers
    • Technical measurements (temperature, pressure, voltage, current, frequency)
  • Normalization: Extracted entities are normalized to standardized formats
Don’t see a source type you’re looking for? We connect to hundreds of systems - reach out to support@getgalaxy.io to request access.

What’s Next

Sources

Learn about Sources in general

Projects

Understand how to use Sources in Projects