Synchronous scraping

POST /v1/scrape


curl --request POST \
  --url https://scrape.cleariflow.com/v1/scrape \
  --header 'Content-Type: application/json' \
  --data '
{
  "url": "<string>",
  "api_key": "<string>",
  "session_id": "<string>",
  "fingerprint": "<string>",
  "render": {
    "wait_until": "<string>",
    "timeout_ms": 123,
    "post_load_wait_ms": 123,
    "ignore_https_errors": true
  },
  "resources": {
    "block": ["<string>"]
  },
  "actions": [
    {
      "type": "<string>",
      "selector": "<string>",
      "text": "<string>",
      "to": "<string>",
      "wait_ms": 123,
      "timeout_ms": 123
    }
  ],
  "cookies": [
    {
      "name": "<string>",
      "value": "<string>"
    }
  ]
}
'


Introduction

Base URL

https://scrape.cleariflow.com/v1/scrape

Example request

curl -X POST 'https://scrape.cleariflow.com/v1/scrape' \
  -H 'Content-Type: application/json' \
  -d '{
    "api_key": "YOUR_UNIQUE_API_KEY",
    "url": "https://example.com",
    "render": {
      "wait_until": "networkidle",
      "timeout_ms": 60000
    }
  }'

A successful request returns the rendered page HTML and metadata:

{
  "ok": true,
  "html": "<!DOCTYPE html><html>...</html>",
  "meta": {
    "url": "https://example.com",
    "status_code": 200,
    "duration_ms": 4521
  }
}
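The same request can be made from Python using only the standard library. The sketch below assumes Python 3 and `urllib.request`; the helper names `build_scrape_request` and `scrape` are hypothetical, and the 90-second client timeout is an arbitrary choice intended to outlast the 60000 ms render timeout. Non-2xx responses raise `urllib.error.HTTPError`, and their body format is not documented in this section.

```python
import json
import urllib.request
from typing import Optional

# Endpoint taken from the base URL above.
SCRAPE_URL = "https://scrape.cleariflow.com/v1/scrape"


def build_scrape_request(api_key: str, url: str,
                         render: Optional[dict] = None) -> urllib.request.Request:
    """Build, but do not send, a POST request equivalent to the curl example."""
    payload = {"api_key": api_key, "url": url}
    if render is not None:
        payload["render"] = render
    return urllib.request.Request(
        SCRAPE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def scrape(api_key: str, url: str, render: Optional[dict] = None,
           timeout_s: float = 90.0) -> dict:
    """Send the request and decode the JSON body.

    urlopen raises urllib.error.HTTPError for non-2xx responses.
    """
    req = build_scrape_request(api_key, url, render)
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `scrape("YOUR_UNIQUE_API_KEY", "https://example.com", {"wait_until": "networkidle", "timeout_ms": 60000})` sends the same body as the curl command above.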

Rendering options

The render object controls how the page is loaded before its HTML is captured. All fields are optional; omitted fields fall back to the server defaults.

"render": {
  "wait_until": "networkidle",
  "timeout_ms": 60000,
  "post_load_wait_ms": 2000,
  "ignore_https_errors": false
}

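Field	Type	Default	Description
`wait_until`	String	`domcontentloaded`	When navigation is considered complete. Use `domcontentloaded` for faster results, or `networkidle` for pages that load data via XHR/fetch after the initial HTML.
`timeout_ms`	Integer	`60000`	Maximum time to wait for the page to load, in milliseconds. If it is exceeded, the request fails.
`post_load_wait_ms`	Integer	`0`	Extra wait, in milliseconds, after the `wait_until` condition is met and before the HTML is captured. Useful for animations, lazy-loaded widgets, and client-side rendering that continues after `networkidle`.
`ignore_https_errors`	Boolean	`false`	If `true`, TLS certificate errors on the target page are ignored.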
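As a rough illustration of the table, the sketch below builds a render object and leaves out every field that was not set, so the server defaults apply to those. The helper name `render_options` is hypothetical, and the range checks (positive timeout, non-negative post-load wait) are assumptions; the table does not state valid ranges.

```python
from typing import Optional

# Values listed for wait_until in the table above.
WAIT_UNTIL_VALUES = {"domcontentloaded", "networkidle"}


def render_options(
    wait_until: Optional[str] = None,
    timeout_ms: Optional[int] = None,
    post_load_wait_ms: Optional[int] = None,
    ignore_https_errors: Optional[bool] = None,
) -> dict:
    """Build a render object containing only the fields that were set.

    Unset fields are omitted so the server applies its own defaults.
    """
    if wait_until is not None and wait_until not in WAIT_UNTIL_VALUES:
        raise ValueError(f"unsupported wait_until: {wait_until!r}")
    # Assumed constraints; the documentation does not specify ranges.
    if timeout_ms is not None and timeout_ms <= 0:
        raise ValueError("timeout_ms must be positive")
    if post_load_wait_ms is not None and post_load_wait_ms < 0:
        raise ValueError("post_load_wait_ms must be non-negative")
    fields = {
        "wait_until": wait_until,
        "timeout_ms": timeout_ms,
        "post_load_wait_ms": post_load_wait_ms,
        "ignore_https_errors": ignore_https_errors,
    }
    return {key: value for key, value in fields.items() if value is not None}
```

For instance, the JavaScript-heavy example that follows corresponds to `render_options(wait_until="networkidle", timeout_ms=60000, post_load_wait_ms=1500, ignore_https_errors=False)`. Checking `is not None` rather than truthiness keeps an explicit `False` or `0` in the payload.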

Example for a JavaScript-heavy page that keeps fetching data after the initial load:

curl -X POST 'https://scrape.cleariflow.com/v1/scrape' \
  -H 'Content-Type: application/json' \
  -d '{
    "api_key": "YOUR_UNIQUE_API_KEY",
    "url": "https://quotes.toscrape.com/js/",
    "render": {
      "wait_until": "networkidle",
      "timeout_ms": 60000,
      "post_load_wait_ms": 1500,
      "ignore_https_errors": false
    }
  }'

Resource options

The resources object controls which asset types the browser loads while scraping. If you only need the text and structure of the HTML, blocking heavy resources makes requests faster.

"resources": {
  "block": ["images", "fonts", "media"]
}

Value	Blocks
`images`	Images (`<img>`, CSS backgrounds, SVG icons loaded as images)
`fonts`	Web fonts
`media`	Video and audio streams

All fields are optional. If resources is omitted, no resource types are blocked unless the deployment configures a server-side default. Example: skipping images and fonts for a faster scrape:

curl -X POST 'https://scrape.cleariflow.com/v1/scrape' \
  -H 'Content-Type: application/json' \
  -d '{
    "api_key": "YOUR_UNIQUE_API_KEY",
    "url": "https://quotes.toscrape.com/",
    "resources": {
      "block": ["images", "fonts"]
    }
  }'
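A small Python sketch for building the resources object, assuming the three values in the table are the only accepted ones. The helper `resources_block` is hypothetical; it drops duplicates and rejects unknown types before the request is sent instead of relying on the server to report them.

```python
from typing import Iterable

# Values from the resources table above.
BLOCKABLE = ("images", "fonts", "media")


def resources_block(kinds: Iterable[str]) -> dict:
    """Build a resources object, keeping first-seen order and removing duplicates."""
    block = []
    for kind in kinds:
        if kind not in BLOCKABLE:
            raise ValueError(f"unsupported resource type: {kind!r}")
        if kind not in block:
            block.append(kind)
    return {"block": block}
```

`resources_block(["images", "fonts"])` produces the same resources value as the curl example above.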

Request parameters

url

String

Required

The URL to scrape. It must be a public HTTP or HTTPS URL. Requests to localhost and private IP addresses are blocked by SSRF protection.
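The server enforces the SSRF rules, but a client can reject obviously invalid targets early. The sketch below is a hypothetical pre-check (`looks_public` is not part of the API): it allows only http and https, rejects `localhost`, and rejects IP literals that Python's `ipaddress` module does not classify as global. It does not resolve DNS, so a hostname that points at a private address still passes here and is left to the server to block.

```python
import ipaddress
from urllib.parse import urlsplit


def looks_public(url: str) -> bool:
    """Cheap client-side approximation of the documented SSRF rules.

    Hostnames are not resolved; the server remains the authority.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    host = parts.hostname  # lower-cased, IPv6 brackets stripped
    if not host:
        return False
    if host == "localhost" or host.endswith(".localhost"):
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return True  # a hostname rather than an IP literal
    # Loopback, private, and link-local addresses are not global.
    return ip.is_global
```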

api_key

String

Required

Your unique API key.

session_id

String

Optional session ID for reusing browser state (cookies, local storage) across multiple scrape requests.
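A sketch of reusing one session across requests, assuming the client may choose its own session ID string; this section does not say how IDs are issued or how long sessions last. `session_payloads` is a hypothetical helper that builds one payload per URL with a shared session_id. Sending the payloads one after another (for example, a login page followed by an account page) is presumably what lets later requests see the cookies and local storage set by earlier ones.

```python
import uuid
from typing import List, Optional


def session_payloads(api_key: str, urls: List[str],
                     session_id: Optional[str] = None) -> List[dict]:
    """Build one request payload per URL, all sharing a single session_id."""
    # Assumption: any client-generated string is an acceptable session ID.
    sid = session_id or f"sess-{uuid.uuid4().hex}"
    return [{"api_key": api_key, "url": url, "session_id": sid} for url in urls]
```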

fingerprint

String

Browser fingerprint preset. Supported values: desktop_en_us, desktop_ru_ru, mobile_en_us.

render

Object

Rendering options for the browser session.

render.wait_until

String

When navigation is considered complete. Values: domcontentloaded, networkidle. Default: domcontentloaded.

render.timeout_ms

Integer

Maximum time to wait for the page to load, in milliseconds. Default: 60000.

render.post_load_wait_ms

Integer

Extra wait after the page has loaded and before the content is captured, in milliseconds. Default: 0.

render.ignore_https_errors

Boolean

If true, TLS certificate errors on the target page are ignored.

resources

Object

Controls resource loading.

resources.block

Array

Resource types to block. Supported values: images, fonts, media.

actions

Array

An ordered list of browser actions to run before the content is captured. Each action is an object with a type field.

actions[].type

String

Required

Action type. Supported values: wait, wait_for, click, type, scroll.

actions[].selector

String

CSS selector for wait_for, click, and type actions.

actions[].text

String

Text to enter for a type action.

actions[].to

String

Scroll target for a scroll action (for example, bottom).

actions[].wait_ms

Integer

How long a wait action waits, in milliseconds.

actions[].timeout_ms

Integer

Timeout for a wait_for action, in milliseconds.
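The actions array is easy to get subtly wrong, so a client-side check can help. The sketch below assumes each action type needs the fields whose descriptions mention it (for example, type needs both selector and text); this mapping is inferred from the parameter descriptions above rather than from a documented schema, and timeout_ms is treated as optional. `validate_actions` and the selectors in `EXAMPLE_ACTIONS` are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed required fields per action type, inferred from the field descriptions.
REQUIRED_FIELDS: Dict[str, Tuple[str, ...]] = {
    "wait": ("wait_ms",),
    "wait_for": ("selector",),
    "click": ("selector",),
    "type": ("selector", "text"),
    "scroll": ("to",),
}


def validate_actions(actions: List[dict]) -> List[dict]:
    """Raise ValueError if an action has an unknown type or lacks an assumed field."""
    for i, action in enumerate(actions):
        kind = action.get("type")
        if kind not in REQUIRED_FIELDS:
            raise ValueError(f"actions[{i}]: unsupported type {kind!r}")
        missing = [field for field in REQUIRED_FIELDS[kind] if field not in action]
        if missing:
            raise ValueError(f"actions[{i}] ({kind}): missing {', '.join(missing)}")
    return actions


# Hypothetical search-and-scroll flow; the selectors are placeholders.
EXAMPLE_ACTIONS = [
    {"type": "wait_for", "selector": "#search", "timeout_ms": 10000},
    {"type": "type", "selector": "#search", "text": "scraping"},
    {"type": "click", "selector": "button[type=submit]"},
    {"type": "wait", "wait_ms": 2000},
    {"type": "scroll", "to": "bottom"},
]
```

Because actions run in order, the wait_for step comes first so that the later type and click steps target an element that already exists.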

cookies

Array

Cookies to inject before navigation. Each cookie object requires name and value; domain and path are optional.
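Cookies copied from a browser often arrive as a single `Cookie:` header string. The hypothetical helper below splits such a string into cookie objects with name and value, optionally attaching the same domain and path to each. It is a plain split on `;` and `=`, not a full RFC 6265 parser, so quoted values or unusual characters may need extra handling.

```python
from typing import List, Optional


def cookies_from_header(header: str, domain: Optional[str] = None,
                        path: Optional[str] = None) -> List[dict]:
    """Turn a Cookie header value such as "a=1; b=2" into a cookies array."""
    cookies = []
    for part in header.split(";"):
        # partition keeps any '=' inside the value, e.g. "tok=a=b" -> "a=b".
        name, sep, value = part.strip().partition("=")
        if not sep or not name:
            continue  # skip empty or malformed fragments
        cookie = {"name": name, "value": value}
        if domain is not None:
            cookie["domain"] = domain
        if path is not None:
            cookie["path"] = path
        cookies.append(cookie)
    return cookies
```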

Response parameters

The API returns a generic, lightweight JSON response.

ok

Boolean

Whether the scrape completed successfully.

html

String

The rendered page HTML.
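A sketch of reading the response in Python. It assumes the body has the shape shown in the example response (ok, html, and a meta object with url, status_code, and duration_ms) and treats anything other than `"ok": true` as a failure, since the error body is not documented in this section. `ScrapeResult` and `parse_scrape_response` are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScrapeResult:
    html: str
    url: Optional[str]
    status_code: Optional[int]
    duration_ms: Optional[int]


def parse_scrape_response(raw: str) -> ScrapeResult:
    """Decode a response body; raise RuntimeError unless ok is exactly true."""
    data = json.loads(raw)
    if data.get("ok") is not True:
        # Error shape is undocumented here, so surface the start of the body.
        raise RuntimeError(f"scrape failed: {raw[:200]}")
    meta = data.get("meta") or {}
    return ScrapeResult(
        html=data["html"],
        url=meta.get("url"),
        status_code=meta.get("status_code"),
        duration_ms=meta.get("duration_ms"),
    )
```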
