Scrape website content and bring it into Galaxy for processing. Galaxy fetches content, processes it, and extracts structure, text, and entities.
Browserbase

How to Connect

  1. Select Browserbase as the Source type
  2. Name the Source: Give your Website Scrape Source a name
  3. Provide base URL: Enter the base URL of the website to scrape
  4. (Optional) Add a Prompt: Provide additional instructions to guide extraction (e.g., focus on product documentation, ignore navigation, prioritize structured data)
  5. Choose Credentials:
     • Galaxy managed (Recommended): Galaxy handles Browserbase credentials automatically
     • Bring your own API key: Connect your own Browserbase account
  6. (Optional) Configure Max URLs to Scrape: Set the maximum number of URLs Galaxy should process, starting from the base URL
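As a way to keep track of what each field means, the form above can be summarized as a small data structure. This is a hypothetical sketch only: `WebsiteScrapeSource`, `Credentials`, and the validation rules (for example, requiring an absolute http(s) base URL) are illustrative assumptions, not a Galaxy API. Sources are created through the Galaxy UI as described in the steps.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional
from urllib.parse import urlparse


class Credentials(Enum):
    # Hypothetical names mirroring the two credential choices in the UI
    GALAXY_MANAGED = "galaxy_managed"
    BRING_YOUR_OWN = "bring_your_own"


@dataclass
class WebsiteScrapeSource:
    """Hypothetical model of the Website Scrape Source form (not a Galaxy API)."""
    name: str
    base_url: str
    credentials: Credentials = Credentials.GALAXY_MANAGED
    prompt: Optional[str] = None    # optional extraction guidance
    api_key: Optional[str] = None   # only meaningful with BRING_YOUR_OWN
    max_urls: Optional[int] = None  # None: the optional field was left unset

    def validate(self) -> None:
        """Raise ValueError if a field breaks one of the assumed rules."""
        if not self.name.strip():
            raise ValueError("Source name must not be empty")
        parsed = urlparse(self.base_url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"Base URL must be an absolute http(s) URL: {self.base_url!r}")
        if self.credentials is Credentials.BRING_YOUR_OWN and not self.api_key:
            raise ValueError("Bring-your-own credentials need a Browserbase API key")
        if self.max_urls is not None and self.max_urls < 1:
            raise ValueError("Max URLs to Scrape must be a positive integer")
```

For example, `WebsiteScrapeSource(name="Docs", base_url="https://example.com/docs", max_urls=50).validate()` passes, while a base URL without a scheme such as `"example.com"` raises `ValueError`.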
Once created, Galaxy begins scraping the specified URL and processing the content.

More Configuration Options (Coming Soon)

  • Max Depth: Maximum depth to crawl from the base URL
  • Max Pages: Maximum number of pages to scrape
  • Allowed Domains: List of domains that are allowed to be scraped (leave empty to allow all)
  • Delay Between Requests: Delay in milliseconds between requests (helps avoid overwhelming servers)
  • Respect robots.txt: Whether to respect robots.txt rules
  • User Agent: Custom user agent string to identify the scraper
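Because these options are not available yet, their exact behavior is not specified here. As a rough mental model only, the sketch below shows how limits like these commonly bound a breadth-first crawl. Everything in it (`plan_crawl`, `get_links`, `robots_checker`, and the parameter names) is hypothetical and does not describe Galaxy's or Browserbase's implementation; `get_links` stands in for fetching a page and extracting its links. The robots.txt check uses Python's standard `urllib.robotparser`.

```python
import time
from collections import deque
from typing import Callable, Iterable, List, Optional, Set
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


def robots_checker(robots_txt: str, user_agent: str) -> Callable[[str], bool]:
    """Build a can-fetch predicate from robots.txt text (stdlib parser)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return lambda url: parser.can_fetch(user_agent, url)


def plan_crawl(
    base_url: str,
    get_links: Callable[[str], Iterable[str]],
    max_depth: int = 2,
    max_pages: int = 100,
    allowed_domains: Optional[Set[str]] = None,
    delay_ms: int = 0,
    can_fetch: Callable[[str], bool] = lambda url: True,
) -> List[str]:
    """Return pages in breadth-first order, stopping at the configured limits.

    Hypothetical sketch: get_links stands in for fetching a page and
    extracting its outgoing links.
    """
    scraped: List[str] = []
    seen = {base_url}
    queue = deque([(base_url, 0)])  # (url, depth from the base URL)
    while queue and len(scraped) < max_pages:
        url, depth = queue.popleft()
        host = urlparse(url).hostname or ""
        # An empty or missing allow-list means every domain is allowed
        if allowed_domains and host not in allowed_domains:
            continue
        if not can_fetch(url):
            continue  # disallowed by robots.txt (or another policy)
        if scraped and delay_ms > 0:
            time.sleep(delay_ms / 1000)  # pause before every request after the first
        scraped.append(url)
        if depth < max_depth:
            for link in get_links(url):
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    return scraped
```

One design consequence worth noting: `allowed_domains` is compared against the exact hostname, so `www.example.com` and `example.com` count as different domains in this sketch.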

Content Processing

Galaxy processes scraped website content with:
  • Text extraction: Extracts text content from web pages, preserving structure and layout
  • Content normalization: Normalizes scraped content for consistency
  • Entity extraction: Automatically extracts and normalizes semantic entities including:
    • Dates and times (normalized to standard formats)
    • Email addresses and URLs
    • Phone numbers
    • Measurements, money, and percentages
    • Serial numbers, model numbers, and part numbers
    • IP addresses and version numbers
    • Technical measurements (temperature, pressure, voltage, current, frequency)
  • Normalization: Extracted entities are normalized to standardized formats
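To make "extract and normalize" concrete, here is a small rule-based sketch covering four of the listed entity types: email addresses, dates, IPv4 addresses, and percentages. It is an illustration under assumptions, not Galaxy's extraction method; the regular expressions, the `extract_entities` name, and the output formats (ISO 8601 dates, percentages as fractions, lowercase email domains) are hypothetical choices.

```python
import re
from datetime import datetime
from ipaddress import IPv4Address
from typing import Dict, List, Union

# Illustrative patterns only; real-world extraction needs far broader coverage
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
DATE_RE = re.compile(
    r"\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.? \d{1,2}, \d{4}\b"
)
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
PERCENT_RE = re.compile(r"(\d+(?:\.\d+)?)\s?%")


def extract_entities(text: str) -> Dict[str, List[Union[str, float]]]:
    """Hypothetical extractor: find entities and normalize each to one format."""
    entities: Dict[str, List[Union[str, float]]] = {
        "emails": [], "dates": [], "ipv4": [], "percentages": []
    }

    for match in EMAIL_RE.finditer(text):
        local, domain = match.group().split("@", 1)
        # Domains are case-insensitive; local parts may not be, so keep them as-is
        entities["emails"].append(f"{local}@{domain.lower()}")

    for match in DATE_RE.finditer(text):
        raw = match.group().replace(".", "")
        for fmt in ("%B %d, %Y", "%b %d, %Y"):  # full, then abbreviated month
            try:
                entities["dates"].append(datetime.strptime(raw, fmt).date().isoformat())
                break
            except ValueError:
                continue  # try the next format; unparseable dates are skipped

    for match in IPV4_RE.finditer(text):
        try:
            entities["ipv4"].append(str(IPv4Address(match.group())))
        except ValueError:
            pass  # e.g. 999.1.1.1 matches the pattern but is not a valid address

    for match in PERCENT_RE.finditer(text):
        entities["percentages"].append(float(match.group(1)) / 100)

    return entities
```

For example, running it on `"Email Sales@Example.COM by March 5, 2024."` yields the email `Sales@example.com` and the date `2024-03-05`.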
Don’t see the source type you’re looking for? We connect to hundreds of systems. Reach out to support@getgalaxy.io to request access.
