# Selenium

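Selenium drives a real browser, so proxy configuration depends on the driver in use (Chrome or Firefox). There are two approaches:

1. **Without authentication**: pass `--proxy-server` to Chrome or set the proxy in a Firefox profile. If the proxy requires credentials, the browser still shows a credentials prompt.
2. **With authentication**: generate a temporary extension that answers the proxy credentials prompt automatically. **This is the standard approach for scraping workloads.**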
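
As a rough illustration of the unauthenticated approach, the sketch below builds the Chrome switches for a proxy and applies them to a driver. The helper names `proxy_args` and `open_chrome` and the default bypass list are assumptions made for this example. It only suits a proxy that does not ask for credentials.

```python
def proxy_args(host, port, bypass=("localhost", "127.0.0.1")):
    """Return Chrome switches for an unauthenticated HTTP proxy."""
    args = [f"--proxy-server=http://{host}:{port}"]
    if bypass:
        # Chrome separates bypass entries with semicolons.
        args.append("--proxy-bypass-list=" + ";".join(bypass))
    return args


def open_chrome(host, port):
    """Start Chrome through the proxy (needs selenium and a local Chrome)."""
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    opts = Options()
    for arg in proxy_args(host, port):
        opts.add_argument(arg)
    return webdriver.Chrome(options=opts)
```

Calling `open_chrome("gate.helodata.io", 7777)` returns a driver; if the proxy then requests credentials, Chrome shows the prompt described in item 1.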

## Chrome: authentication with selenium-wire

The cleanest option is `selenium-wire`, which wraps Selenium and handles proxy authentication transparently:

```bash
pip install selenium-wire
```

```python
from seleniumwire import webdriver

USER = "helo_s1a2b3c4d5e-type-res-region-us"
PASS = "PASSWORD"

options = {
    "proxy": {
        "http":  f"http://{USER}:{PASS}@gate.helodata.io:7777",
        "https": f"http://{USER}:{PASS}@gate.helodata.io:7777",
        "no_proxy": "localhost,127.0.0.1",
    }
}

driver = webdriver.Chrome(seleniumwire_options=options)
driver.get("https://ipv4.icanhazip.com")
print(driver.page_source)
driver.quit()
```

For ISP proxies, use `http://helo_s1a2b3c4d5e:PASSWORD@198.51.100.42:8000` as the proxy URL instead.
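
Because the gateway and ISP endpoints differ only in username, host, and port, one helper can build the selenium-wire options for both. This is a sketch: `wire_options` is a hypothetical name, and the values are the placeholders used on this page.

```python
def wire_options(user, password, host="gate.helodata.io", port=7777):
    """Build a selenium-wire options dict for one upstream proxy."""
    url = f"http://{user}:{password}@{host}:{port}"
    return {
        "proxy": {
            "http": url,
            "https": url,  # HTTPS traffic goes through the same HTTP proxy
            "no_proxy": "localhost,127.0.0.1",
        }
    }


# Gateway endpoint, matching the selenium-wire example above.
gateway = wire_options("helo_s1a2b3c4d5e-type-res-region-us", "PASSWORD")
# ISP endpoint: plain username with its own host and port.
isp = wire_options("helo_s1a2b3c4d5e", "PASSWORD", host="198.51.100.42", port=8000)
```

Either dict can be passed as `webdriver.Chrome(seleniumwire_options=isp)`.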

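## Chrome: without selenium-wire (extension generated on the fly)

When selenium-wire cannot be used (for example, because of corporate restrictions), generate a Manifest V3 extension on the fly: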

```python
import json
import os
import tempfile
import zipfile

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

HOST, PORT = "gate.helodata.io", 7777
USER = "helo_s1a2b3c4d5e-type-res-region-us"
PASS = "PASSWORD"

manifest = {
    "manifest_version": 3,
    "name": "helo-proxy",
    "version": "1.0.0",
    "permissions": ["proxy", "webRequest", "webRequestAuthProvider"],
    # Manifest V3 expects host match patterns here, not in "permissions".
    "host_permissions": ["<all_urls>"],
    "background": {"service_worker": "bg.js"},
}
# json.dumps() quotes and escapes the values so they are safe inside JavaScript.
bg = f"""
chrome.proxy.settings.set({{ value: {{ mode: "fixed_servers", rules: {{
  singleProxy: {{ scheme: "http", host: {json.dumps(HOST)}, port: {PORT} }}, bypassList: ["localhost"]
}}}}, scope: "regular" }});
chrome.webRequest.onAuthRequired.addListener(
  () => ({{ authCredentials: {{ username: {json.dumps(USER)}, password: {json.dumps(PASS)} }} }}),
  {{ urls: ["<all_urls>"] }},
  ["blocking"]
);
"""

# Package the two extension files into a zip inside a temporary directory.
tmp = tempfile.mkdtemp()
ext_path = os.path.join(tmp, "ext.zip")
with zipfile.ZipFile(ext_path, "w") as z:
    z.writestr("manifest.json", json.dumps(manifest))
    z.writestr("bg.js", bg)

# Load the extension into Chrome and check the exit IP.
opts = Options()
opts.add_extension(ext_path)
driver = webdriver.Chrome(options=opts)
driver.get("https://ipv4.icanhazip.com")
```

## Firefox

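Firefox takes its proxy settings directly from a `FirefoxProfile`, combined with a credentials helper for authentication. With selenium-wire, the code is the same as for Chrome, using `webdriver.Firefox` in place of `webdriver.Chrome`.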
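
A minimal sketch of the Firefox side, assuming Selenium 4, where preferences are commonly set through `Options.set_preference`. `firefox_proxy_prefs` and `open_firefox` are hypothetical helper names. These preferences only point Firefox at the proxy; they do not supply credentials, so an authenticating proxy still needs selenium-wire or a credentials helper.

```python
def firefox_proxy_prefs(host, port):
    """Return the about:config preferences for a manual HTTP/HTTPS proxy."""
    return {
        "network.proxy.type": 1,  # 1 = manual proxy configuration
        "network.proxy.http": host,
        "network.proxy.http_port": port,
        "network.proxy.ssl": host,
        "network.proxy.ssl_port": port,
        "network.proxy.no_proxies_on": "localhost, 127.0.0.1",
    }


def open_firefox(host, port):
    """Start Firefox through the proxy (needs selenium and a local Firefox)."""
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    opts = Options()
    for name, value in firefox_proxy_prefs(host, port).items():
        opts.set_preference(name, value)
    return webdriver.Firefox(options=opts)
```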

## Verification

```python
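# Print the user agent to confirm that the session responds.
print(driver.execute_script("return navigator.userAgent"))
# Load an IP-check page; the address shown should be the proxy's exit IP.
driver.get("https://browserleaks.com/ip")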
```
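
To check the result in code rather than by eye, the exit IP can be read from the plain-text echo page used earlier. The sketch below is one way to do that; `extract_ip` and `exit_ip` are hypothetical helpers, and it assumes the browser wraps the plain-text response in HTML, so the address has to be picked out of `page_source`.

```python
import ipaddress
import re


def extract_ip(page_source):
    """Return the first valid IPv4 address found in a page, or None."""
    for candidate in re.findall(r"\b(?:\d{1,3}\.){3}\d{1,3}\b", page_source):
        try:
            return str(ipaddress.IPv4Address(candidate))
        except ValueError:
            continue  # e.g. 999.1.1.1 matches the regex but is not an address
    return None


def exit_ip(driver):
    """Load the IP echo page and return the address the site sees."""
    driver.get("https://ipv4.icanhazip.com")
    return extract_ip(driver.page_source)
```

Comparing `exit_ip(driver)` for a proxied session with the machine's own public IP shows whether traffic really leaves through the proxy.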

## Common pitfalls

* **Native proxy authentication prompt**: it blocks headless runs, so **headless scenarios must use selenium-wire or the extension approach**.
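* **WebRTC leaks**: WebRTC can reveal addresses that bypass the proxy. Note that `--disable-features=WebRtcHideLocalIpsWithMdns` turns off the mDNS masking of local IPs, so it exposes more rather than less; use it only if that is intended. `--disable-webrtc`, added through the Chrome options, is a Brave/Vivaldi parameter and may have no effect in stock Chrome. Chromium's `--force-webrtc-ip-handling-policy=disable_non_proxied_udp` switch is a commonly used alternative; verify it against your Chrome version.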
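* **Proxy not applied in headless mode**: the legacy `--headless` mode does not load extensions. On recent Chrome versions, use `--headless=new` so that the extension mechanism takes effect.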
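
The headless point can be sketched as a small helper that applies the new headless mode together with the proxy extension. `configure_headless` is a hypothetical name; it accepts any object with `add_argument` and `add_extension`, such as Selenium's Chrome `Options`.

```python
def configure_headless(opts, ext_path=None):
    """Apply new-headless mode and, optionally, the proxy extension."""
    # The legacy --headless mode does not load extensions.
    opts.add_argument("--headless=new")
    if ext_path:
        opts.add_extension(ext_path)
    return opts
```

For example, `opts = configure_headless(Options(), ext_path)` before `webdriver.Chrome(options=opts)`.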

