# LangChain

LangChain pipelines that fetch web content (loaders, retrievers, web-search tools) can route through helodata to avoid IP-based blocks during LLM-driven scraping or RAG ingestion.

## Document loaders

`WebBaseLoader`, `PlaywrightURLLoader`, and friends use `requests` / `httpx` / Playwright under the hood. Set the proxy on the underlying client.

### `WebBaseLoader`

```python
from langchain_community.document_loaders import WebBaseLoader

USER = "helo_s1a2b3c4d5e-type-res-region-us"
PASS = "PASSWORD"
proxy = f"http://{USER}:{PASS}@gate.helodata.io:7777"

loader = WebBaseLoader("https://example.com")
loader.requests_kwargs = {
    "proxies": {"http": proxy, "https": proxy},
    "timeout": 30,
}
docs = loader.load()
```
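
If the password contains reserved characters such as `@`, `:` or `/`, embedding it raw in the proxy URL breaks parsing. The sketch below percent-encodes both credentials before building the URL. `build_proxy_url` is a hypothetical helper, not part of LangChain or helodata, and the sample password is made up for illustration.

```python
from urllib.parse import quote


def build_proxy_url(user: str, password: str,
                    host: str = "gate.helodata.io", port: int = 7777) -> str:
    """Return an http:// proxy URL with percent-encoded credentials."""
    # safe="" also escapes "/", so every reserved character is encoded.
    return f"http://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"


# Illustrative password containing characters that need escaping.
proxy = build_proxy_url("helo_s1a2b3c4d5e-type-res-region-us", "p@ss:word/1")
proxies = {"http": proxy, "https": proxy}
```

The resulting `proxies` mapping can be used wherever the examples on this page pass `{"http": proxy, "https": proxy}`.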

### `PlaywrightURLLoader`

```python
from langchain_community.document_loaders import PlaywrightURLLoader

loader = PlaywrightURLLoader(
    urls=["https://example.com"],
    remove_selectors=["nav", "footer"],
    continue_on_failure=True,
    headless=True,
    # Proxy settings are handed to Playwright when the browser is launched.
    proxy={
        "server":   "http://gate.helodata.io:7777",
        "username": "helo_s1a2b3c4d5e-type-res-region-us",
        "password": "PASSWORD",
    },
)
docs = loader.load()
```

## Web search tools

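Tools like `TavilySearch` or `DuckDuckGoSearchRun` call an external search service rather than the target site, so whether they need proxying depends on the provider. For tools that fetch pages directly over HTTP (e.g. `RequestsGetTool`), inject a proxied client. Here `proxy` is the URL built in the `WebBaseLoader` example: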

```python
from langchain_community.tools.requests.tool import RequestsGetTool
from langchain_community.utilities.requests import TextRequestsWrapper

wrapper = TextRequestsWrapper(
    headers={"User-Agent": "Mozilla/5.0"},
    proxies={"http": proxy, "https": proxy},
)
tool = RequestsGetTool(requests_wrapper=wrapper, allow_dangerous_requests=True)
```
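
If the wrapper or tool version you have installed gives you no place to pass `proxies`, one possible fallback is the standard proxy environment variables, which `requests` honours by default. The sketch below assumes process-wide proxying is acceptable; the `NO_PROXY` value is an illustrative assumption, not a helodata requirement.

```python
import os
import urllib.request

proxy = "http://helo_s1a2b3c4d5e-type-res-region-us:PASSWORD@gate.helodata.io:7777"

# Set both spellings: many clients check the lowercase name first.
for name in ("HTTP_PROXY", "HTTPS_PROXY"):
    os.environ[name] = proxy
    os.environ[name.lower()] = proxy

# Keep local services off the proxy (illustrative value).
os.environ["NO_PROXY"] = "localhost,127.0.0.1"

# The standard library reads the same variables, which gives a quick sanity check.
detected = urllib.request.getproxies()
```

Set the variables before constructing the tool. Every client in the process that reads these variables will use the proxy, so this is coarser than configuring a single wrapper.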

## Sticky session per LLM call

LangChain agents often make several sub-requests within a single chain run. Use a sticky session so the target sees the same exit IP for all of them:

```python
import random
import string

# Random 8-character session ID; reusing the same ID keeps the same session.
sid = "".join(random.choices(string.ascii_lowercase + string.digits, k=8))
user = f"helo_s1a2b3c4d5e-type-res-region-us-session-{sid}-sesstime-30"
proxy = f"http://{user}:PASSWORD@gate.helodata.io:7777"
```

Re-use `proxy` across all sub-requests within the chain.
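
One way to keep the session scoped to a single chain run is to build the proxy mapping once per run and hand that same mapping to every loader or tool in the run. The sketch below assumes the username format shown above; `sticky_proxy` and `proxies_for_chain` are hypothetical helper names, and `sesstime` is passed through unchanged with whatever unit helodata defines.

```python
import secrets
import string

ALPHABET = string.ascii_lowercase + string.digits


def sticky_proxy(base_user: str, password: str, sesstime: int = 30) -> str:
    """Build a proxy URL pinned to a fresh random session ID."""
    sid = "".join(secrets.choice(ALPHABET) for _ in range(8))
    user = f"{base_user}-session-{sid}-sesstime-{sesstime}"
    return f"http://{user}:{password}@gate.helodata.io:7777"


def proxies_for_chain() -> dict:
    """Call once per chain run and share the result across all sub-requests."""
    proxy = sticky_proxy("helo_s1a2b3c4d5e-type-res-region-us", "PASSWORD")
    return {"http": proxy, "https": proxy}


run_a = proxies_for_chain()  # first chain run
run_b = proxies_for_chain()  # second chain run gets a different session ID
```

Generating the ID inside the helper, rather than at import time, prevents unrelated chain runs from accidentally sharing one session.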

## Tagging for analytics

Pass `X-Helodata-Tag` in headers to separate LLM-driven traffic from your other scraping:

```python
wrapper = TextRequestsWrapper(
    headers={"X-Helodata-Tag": "llm-agent-prod"},
    proxies={"http": proxy, "https": proxy},
)
```

## Common pitfalls

* **`UnstructuredURLLoader` ignores proxies** in some versions: switch to `WebBaseLoader` or pre-download with `requests`.
* **Async loaders**: create the client as `httpx.AsyncClient(proxy=proxy)` (older `httpx` releases use `proxies=` instead); with `aiohttp`, pass `proxy=` on each request.
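
For the async case, a minimal sketch of pre-downloading pages with `httpx` through the proxy might look like this. It assumes a recent `httpx` release where the keyword is `proxy=`; `fetch_all` is a hypothetical helper name.

```python
import asyncio


async def fetch_all(urls: list[str], proxy: str) -> list[str]:
    """Fetch several pages concurrently through one proxy URL."""
    import httpx  # imported lazily so this module loads even without httpx installed

    # `proxy=` is the keyword in recent httpx releases; older ones used `proxies=`.
    async with httpx.AsyncClient(proxy=proxy, timeout=30) as client:
        responses = await asyncio.gather(*(client.get(u) for u in urls))
    return [r.text for r in responses]


# Usage (performs real network requests):
# pages = asyncio.run(fetch_all(["https://example.com"], proxy))
```

The downloaded HTML can then be passed to whichever parser or splitter the rest of the pipeline uses.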


