# Selenium

Selenium drives a real browser, so proxy configuration depends on the underlying driver (Chrome / Firefox). Two approaches:

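1. **No auth**: pass the `--proxy-server` flag (Chrome) or set the proxy in the profile (Firefox). Neither carries credentials, so an authenticated proxy still triggers the browser's native login prompt.
2. **With auth**: answer the proxy's auth challenge programmatically, either through `selenium-wire` or through a small extension generated at run time. This is the standard pattern for scraping.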
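As a rough illustration of the first approach, the sketch below builds the `--proxy-server` switch and hands it to Chrome. The helper names `proxy_server_arg` and `make_chrome` are hypothetical, and the host and port are placeholders. It assumes an endpoint that does not ask for credentials, because the flag has no field for them.

```python
def proxy_server_arg(host: str, port: int, scheme: str = "http") -> str:
    """Build Chrome's --proxy-server switch (it cannot carry a username or password)."""
    return f"--proxy-server={scheme}://{host}:{port}"


def make_chrome(host: str, port: int):
    """Start Chrome routed through the proxy; needs selenium and a local Chrome."""
    # Imported lazily so the helper above can be used without selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    opts = Options()
    opts.add_argument(proxy_server_arg(host, port))
    return webdriver.Chrome(options=opts)


# Placeholder endpoint, shown only to illustrate the resulting switch.
print(proxy_server_arg("198.51.100.42", 8000))
# → --proxy-server=http://198.51.100.42:8000
```

If the endpoint does require credentials, this setup stalls on the native prompt, which is why the sections below use the second approach.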

## Chrome — with auth via Selenium-Wire

The cleanest option is `selenium-wire`, a Selenium wrapper that handles proxy auth transparently:

```bash
pip install selenium-wire
```

```python
from seleniumwire import webdriver

USER = "helo_s1a2b3c4d5e-type-res-region-us"
PASS = "PASSWORD"

options = {
    "proxy": {
        "http":  f"http://{USER}:{PASS}@gate.helodata.io:7777",
        "https": f"http://{USER}:{PASS}@gate.helodata.io:7777",
        "no_proxy": "localhost,127.0.0.1",
    }
}

driver = webdriver.Chrome(seleniumwire_options=options)
driver.get("https://ipv4.icanhazip.com")
print(driver.page_source)
driver.quit()
```

For ISP proxies, replace both the `http` and `https` proxy URLs with `http://helo_s1a2b3c4d5e:PASSWORD@198.51.100.42:8000`.
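One way to keep the residential and ISP variants side by side is a small builder for the `seleniumwire_options` dict. This is only a sketch: `wire_options` is a hypothetical helper, and the credentials and endpoints are the placeholder values from this page.

```python
def wire_options(user: str, password: str, host: str, port: int) -> dict:
    """Return a seleniumwire_options dict that sends HTTP and HTTPS through one proxy."""
    url = f"http://{user}:{password}@{host}:{port}"
    return {
        "proxy": {
            "http": url,
            "https": url,
            "no_proxy": "localhost,127.0.0.1",  # keep local traffic off the proxy
        }
    }


# Placeholder credentials and endpoints taken from the examples on this page.
residential = wire_options(
    "helo_s1a2b3c4d5e-type-res-region-us", "PASSWORD", "gate.helodata.io", 7777
)
isp = wire_options("helo_s1a2b3c4d5e", "PASSWORD", "198.51.100.42", 8000)

print(isp["proxy"]["https"])
# → http://helo_s1a2b3c4d5e:PASSWORD@198.51.100.42:8000
```

Either dict can then be passed as `webdriver.Chrome(seleniumwire_options=isp)`.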

## Chrome — without selenium-wire (manual auth extension)

If you can't add selenium-wire (e.g. corporate restrictions), build a manifest-v3 extension on the fly:

```python
import json, tempfile, zipfile
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

HOST, PORT = "gate.helodata.io", 7777
USER = "helo_s1a2b3c4d5e-type-res-region-us"
PASS = "PASSWORD"

# Manifest V3 declares API permissions and host patterns in separate keys.
manifest = {
    "manifest_version": 3,
    "name": "helo-proxy",
    "version": "1.0.0",
    "permissions": ["proxy", "webRequest", "webRequestAuthProvider"],
    "host_permissions": ["<all_urls>"],
    "background": {"service_worker": "bg.js"},
}
bg = f"""
chrome.proxy.settings.set({{ value: {{ mode: "fixed_servers", rules: {{
  singleProxy: {{ scheme: "http", host: "{HOST}", port: {PORT} }}, bypassList: ["localhost"]
}}}}, scope: "regular" }});
chrome.webRequest.onAuthRequired.addListener(
  () => ({{ authCredentials: {{ username: "{USER}", password: "{PASS}" }} }}),
  {{ urls: ["<all_urls>"] }},
  ["blocking"]
);
"""

tmp = tempfile.mkdtemp()
with open(f"{tmp}/manifest.json", "w") as f: json.dump(manifest, f)
with open(f"{tmp}/bg.js", "w") as f: f.write(bg)
ext_path = f"{tmp}/ext.zip"
with zipfile.ZipFile(ext_path, "w") as z:
    z.write(f"{tmp}/manifest.json", "manifest.json")
    z.write(f"{tmp}/bg.js", "bg.js")

opts = Options()
opts.add_extension(ext_path)
driver = webdriver.Chrome(options=opts)
driver.get("https://ipv4.icanhazip.com")
print(driver.page_source)  # should show the proxy's exit IP
driver.quit()
```
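The background script above interpolates the credentials into JavaScript verbatim, so a password containing a quote or a backslash would produce a broken script. A possible hardening, sketched here with a hypothetical `render_bg` helper, is to let `json.dumps` emit every value as a JSON literal, which is also valid JavaScript.

```python
import json


def render_bg(host: str, port: int, user: str, password: str) -> str:
    """Render the service-worker source with all values encoded as JSON literals."""
    proxy_config = {
        "mode": "fixed_servers",
        "rules": {
            "singleProxy": {"scheme": "http", "host": host, "port": port},
            "bypassList": ["localhost"],
        },
    }
    credentials = {"authCredentials": {"username": user, "password": password}}
    # json.dumps escapes quotes and backslashes, so the output stays valid JavaScript.
    return (
        f'chrome.proxy.settings.set({{ value: {json.dumps(proxy_config)}, scope: "regular" }});\n'
        "chrome.webRequest.onAuthRequired.addListener(\n"
        f"  () => ({json.dumps(credentials)}),\n"
        '  { urls: ["<all_urls>"] },\n'
        '  ["blocking"]\n'
        ");\n"
    )


# The awkward password is an illustrative value, not a real credential.
print(render_bg("gate.helodata.io", 7777, "helo_s1a2b3c4d5e-type-res-region-us", 'pa"ss\\word'))
```

The returned string can be written to `bg.js` in place of the f-string template.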

## Firefox

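Firefox takes the proxy host and port directly through profile preferences (`FirefoxProfile` or `Options.set_preference`, using the `network.proxy.*` keys). There is no preference for proxy credentials, so an authenticated proxy still needs a credential helper. With selenium-wire the same pattern as Chrome works: pass the same options dict to `webdriver.Firefox(seleniumwire_options=options)`.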
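A minimal sketch of the preference-based setup follows. The helper names `firefox_proxy_prefs` and `make_firefox` are hypothetical and the endpoint is a placeholder. It covers only the host and port, so it suits an endpoint that does not prompt for credentials.

```python
def firefox_proxy_prefs(host: str, port: int) -> dict:
    """Preferences that point Firefox at a manually configured HTTP/HTTPS proxy."""
    return {
        "network.proxy.type": 1,  # 1 = manual proxy configuration
        "network.proxy.http": host,
        "network.proxy.http_port": port,
        "network.proxy.ssl": host,  # proxy used for HTTPS traffic
        "network.proxy.ssl_port": port,
        "network.proxy.no_proxies_on": "localhost,127.0.0.1",
    }


def make_firefox(host: str, port: int):
    """Start Firefox with the proxy preferences; needs selenium and geckodriver."""
    # Imported lazily so the helper above can be used without selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    opts = Options()
    for name, value in firefox_proxy_prefs(host, port).items():
        opts.set_preference(name, value)
    return webdriver.Firefox(options=opts)


# Placeholder endpoint, printed only to show the resulting preferences.
print(firefox_proxy_prefs("198.51.100.42", 8000))
```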

## Verify

```python
print(driver.execute_script("return navigator.userAgent"))
driver.get("https://browserleaks.com/ip")
```
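For an automated check, one option is to pull the address out of the page source returned by an IP-echo page and compare it with your machine's own public IP. The sketch below uses a hypothetical `extract_ip` helper and a made-up sample page. Browsers usually wrap a plain-text response in HTML, hence the regex search rather than a direct string comparison.

```python
import ipaddress
import re


def extract_ip(page_source: str) -> str:
    """Return the first valid IPv4 address found in the page source."""
    for candidate in re.findall(r"\b(?:\d{1,3}\.){3}\d{1,3}\b", page_source):
        try:
            return str(ipaddress.IPv4Address(candidate))
        except ipaddress.AddressValueError:
            continue  # looked like an address but an octet was out of range
    raise ValueError("no IPv4 address found in page source")


# Illustrative sample of how a browser may wrap a plain-text response.
sample = "<html><head></head><body><pre>203.0.113.7\n</pre></body></html>"
print(extract_ip(sample))
# → 203.0.113.7
```

In a live session, `extract_ip(driver.page_source)` after loading `https://ipv4.icanhazip.com` should return the proxy's exit IP rather than your own address.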

## Common pitfalls

* **Native proxy-auth dialog** blocks headless runs. Always use selenium-wire or the extension pattern for headless.
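* **WebRTC leak**: WebRTC can open UDP connections that bypass the HTTP proxy and reveal the real IP. Restrict it with `--force-webrtc-ip-handling-policy=disable_non_proxied_udp`. Note that `--disable-features=WebRtcHideLocalIpsWithMdns` turns off the mDNS masking of local IPs rather than stopping the leak, and `chrome_options.add_argument("--disable-webrtc")` is a Brave/Vivaldi flag that stock Chrome ignores.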
* **Proxy not actually used in headless**: older Chrome releases used a legacy `--headless` mode that does not load extensions, so the auth extension never runs. On modern Chrome pass `--headless=new` for the extension model to work.
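The pitfalls above can be folded into one set of launch arguments. The sketch below is an assumption-laden example, not a required configuration: `proxy_safe_args` and `make_headless_chrome` are hypothetical names, and the WebRTC switch is the Chromium `force-webrtc-ip-handling-policy` override, which is worth confirming against your Chrome version.

```python
def proxy_safe_args(headless: bool = True) -> list:
    """Chrome switches for running the auth extension without WebRTC leaks."""
    # Assumption: this Chromium switch is honoured by the installed Chrome build.
    args = ["--force-webrtc-ip-handling-policy=disable_non_proxied_udp"]
    if headless:
        # The new headless mode is the one that loads extensions.
        args.append("--headless=new")
    return args


def make_headless_chrome(ext_path: str):
    """Start Chrome with the auth extension; needs selenium and a local Chrome."""
    # Imported lazily so the helper above can be used without selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    opts = Options()
    for arg in proxy_safe_args():
        opts.add_argument(arg)
    opts.add_extension(ext_path)  # zip built as in the manual auth extension example
    return webdriver.Chrome(options=opts)


print(proxy_safe_args())
```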

