BigQuery v2 REST API surface

This is the emulator's canonical mapping from the public BigQuery v2 REST API to the Go handler that backs each endpoint, derived from the upstream documentation under BigQuery REST v2 reference.

The goal of this document is operational: when you're staring at a client library that's failing against the emulator, you want to know exactly which file to open. Keep this in sync with gateway/server.go and the gateway-HTTP-surface section of ROADMAP.md.

Status legend: done = end-to-end implemented · wired = route registered, returns a structurally-valid stub or 501 · todo = not yet in the route table

Method summary

The emulator targets the BigQuery REST surface served at https://bigquery.googleapis.com/bigquery/v2/.... Path templates here omit the host and use {x} for path variables.

Projects (bigquery.projects.*)

Method Path Status Handler
projects.list GET /bigquery/v2/projects wired gateway/handlers/projects.go::ProjectList
projects.getServiceAccount GET /bigquery/v2/projects/{projectId}/serviceAccount wired gateway/handlers/projects.go::ProjectGetServiceAccount

[!NOTE] There is no GET /bigquery/v2/projects/{projectId} endpoint in the public API; an early scaffold registered one, and it has since been removed. Use projects.list to enumerate projects and the Resource Manager APIs for per-project metadata.

Datasets (bigquery.datasets.*)

Method Path Status Handler
datasets.list GET /bigquery/v2/projects/{projectId}/datasets done gateway/handlers/datasets.go::DatasetList
datasets.insert POST /bigquery/v2/projects/{projectId}/datasets done gateway/handlers/datasets.go::DatasetInsert
datasets.get GET /bigquery/v2/projects/{projectId}/datasets/{datasetId} done gateway/handlers/datasets.go::DatasetGet
datasets.update PUT /bigquery/v2/projects/{projectId}/datasets/{datasetId} done gateway/handlers/datasets.go::DatasetUpdate
datasets.patch PATCH /bigquery/v2/projects/{projectId}/datasets/{datasetId} done gateway/handlers/datasets.go::DatasetPatch
datasets.delete DELETE /bigquery/v2/projects/{projectId}/datasets/{datasetId} done gateway/handlers/datasets.go::DatasetDelete
datasets.undelete POST /bigquery/v2/projects/{projectId}/datasets/{datasetId}:undelete done gateway/handlers/datasets.go::DatasetUndelete

datasets.undelete semantics: Restores the newest soft-deleted tombstone for the dataset (same 7-day window as table time travel). Restores member tables, views, routines, row-access policies, and gateway REST metadata (labels, friendlyName, …) captured at delete time. Returns 409 (alreadyExists) when a live dataset with the same id already exists. There is no deletedTime selector — only the newest tombstone is restored.

Dataset metadata: REST-only fields (friendlyName, description, labels, defaultCollation, defaultTableExpirationMs, defaultPartitionExpirationMs, defaultRoundingMode, maxTimeTravelHours, isCaseInsensitive, resourceTags, replicas[], access) persist in the gateway MetadataStore and merge on GET/PATCH/UPDATE. creationTime is stamped on insert and kept stable across GETs; lastModifiedTime advances on each write. Cross-region replicas[] is echo-only (no live replication).

Delete with contents: DELETE .../datasets/{datasetId}?deleteContents=true drops the dataset and all tables in the engine catalog and evicts the matching metadata-store entries. Without deleteContents=true, deleting a non-empty dataset returns 400 (failedPrecondition).
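The delete and undelete calls described above differ only in verb, query string, and the :undelete suffix. The following is a minimal sketch of how a test might build both URLs. The helper names are hypothetical, and the base URL assumes the gateway's default :9050 listener mentioned later in this document.

```go
package main

import (
	"fmt"
	"net/url"
)

// baseURL is an assumption: the gateway's default REST listener (:9050).
const baseURL = "http://localhost:9050"

// datasetURL builds the resource URL shared by datasets.get/update/patch/delete.
func datasetURL(project, dataset string) string {
	return fmt.Sprintf("%s/bigquery/v2/projects/%s/datasets/%s",
		baseURL, url.PathEscape(project), url.PathEscape(dataset))
}

// deleteDatasetURL is the target of a DELETE request. Without deleteContents
// the emulator is documented to reject non-empty datasets with 400.
func deleteDatasetURL(project, dataset string, deleteContents bool) string {
	u := datasetURL(project, dataset)
	if deleteContents {
		u += "?deleteContents=true"
	}
	return u
}

// undeleteDatasetURL is the target of a POST request; it restores the newest tombstone.
func undeleteDatasetURL(project, dataset string) string {
	return datasetURL(project, dataset) + ":undelete"
}

func main() {
	fmt.Println("DELETE", deleteDatasetURL("test-project", "sales", true))
	fmt.Println("POST", undeleteDatasetURL("test-project", "sales"))
}
```

A round-trip test would issue the DELETE, then the POST, and expect 409 from a second undelete because a live dataset with the same id exists again.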

Tables (bigquery.tables.*)

Method Path Status Handler
tables.list GET /bigquery/v2/projects/{projectId}/datasets/{datasetId}/tables done gateway/handlers/tables.go::TableList
tables.insert POST /bigquery/v2/projects/{projectId}/datasets/{datasetId}/tables done gateway/handlers/tables.go::TableInsert
tables.get GET /bigquery/v2/projects/{projectId}/datasets/{datasetId}/tables/{tableId} done gateway/handlers/tables.go::TableGet
tables.update PUT /bigquery/v2/projects/{projectId}/datasets/{datasetId}/tables/{tableId} done gateway/handlers/tables.go::TableUpdate
tables.patch PATCH /bigquery/v2/projects/{projectId}/datasets/{datasetId}/tables/{tableId} done gateway/handlers/tables.go::TablePatch
tables.delete DELETE /bigquery/v2/projects/{projectId}/datasets/{datasetId}/tables/{tableId} done (captures snapshot for undelete) gateway/handlers/tables.go::TableDelete
tables.getIamPolicy POST /bigquery/v2/projects/{projectId}/datasets/{datasetId}/tables/{tableId}:getIamPolicy wired gateway/handlers/tables.go::TableGetIamPolicy
tables.setIamPolicy POST /bigquery/v2/projects/{projectId}/datasets/{datasetId}/tables/{tableId}:setIamPolicy wired gateway/handlers/tables.go::TableSetIamPolicy
tables.testIamPermissions POST /bigquery/v2/projects/{projectId}/datasets/{datasetId}/tables/{tableId}:testIamPermissions wired gateway/handlers/tables.go::TableTestIamPermissions

Federated / BigLake posture (insert, update, patch): Requests that set biglakeConfiguration, objectTableOptions, or externalDataConfiguration.sourceFormat=OBJECT_TABLE return 501 notImplemented with a pointer to docs/ENGINE_POLICY.md. datasets.insert / datasets.update / datasets.patch with externalDatasetReference (Spanner / Cloud SQL external datasets) return the same envelope. Use fixture-backed EXTERNAL_QUERY or local external tables instead; see docs/guides/external-query.md.

Table metadata: REST-only fields (friendlyName, description, labels, expirationTime, partitioning/clustering specs, defaultCollation, defaultRoundingMode, caseInsensitive, resourceTags, tableConstraints, view / materializedView definitions including view.useLegacySql, requirePartitionFilter, encryption/external config) persist in MetadataStore. type includes TABLE, VIEW, MATERIALIZED_VIEW, EXTERNAL, and SNAPSHOT (copy-job destinations; see COPY jobs below). creationTime / lastModifiedTime follow the same overlay rules as datasets. Storage stats: numRows is computed from Catalog.ListRows; all byte counters (numBytes, numLongTermBytes, numActiveLogicalBytes, numTotalLogicalBytes, physical-byte fields, numTimeTravelPhysicalBytes) are explicitly stubbed to "0" until the engine exposes byte accounting.

External tables: tables.insert accepts externalDataConfiguration (sourceUris, sourceFormat, schema, csvOptions, …). Supported GCS formats (CSV, NEWLINE_DELIMITED_JSON, PARQUET, …) are fetched via fake-gcs (STORAGE_EMULATOR_HOST / FAKE_GCS_PORT, same as LOAD jobs) and materialized into the engine catalog at insert time; externalDataConfiguration round-trips through the gateway MetadataStore on GET/PATCH/UPDATE. type defaults to EXTERNAL. Bigtable (sourceFormat: BIGTABLE with https://googleapis.com/bigtable/projects/.../instances/.../tables/... URIs) registers metadata-only external tables (zero rows; query behavior is stubbed). Google Sheets (GOOGLE_SHEETS / fixture doc ids) is supported for dev fixtures. Azure Blob and non-Sheets Google Drive URIs return 400 with an explicit unsupported message.

Logical views: CREATE [OR REPLACE] VIEW via jobs.query registers the view in the engine's in-memory catalog and persists the DDL in DuckDBStorage (__bqemu_views). tables.list / tables.get return type=VIEW and view.query (with view.useLegacySql=false for GoogleSQL views). After an engine restart the gateway rehydrates views from storage so tables.get still resolves through Catalog.DescribeTable even though the gateway's in-memory metadata overlay is empty. CREATE MATERIALIZED VIEW DDL is also surfaced on tables.list / tables.get as type=MATERIALIZED_VIEW with materializedView.query (the engine materializes rows into a physical table).

Query-time ephemeral external tables use tableDefinitions on jobs.query and configuration.query (jobs.insert). When the query omits defaultDataset, definitions are registered under internal dataset _bq_external_temp and that dataset is forwarded as default_dataset_id so unqualified table ids in SQL resolve.

Tabledata (bigquery.tabledata.*)

Method Path Status Handler
tabledata.list GET /bigquery/v2/projects/{projectId}/datasets/{datasetId}/tables/{tableId}/data done gateway/handlers/tabledata.go::TableDataList
tabledata.insertAll POST /bigquery/v2/projects/{projectId}/datasets/{datasetId}/tables/{tableId}/insertAll done gateway/handlers/tabledata.go::TableDataInsertAll

tabledata.list notes: maxResults defaults to 10000 (cap 100000). maxResults=0 returns totalRows/etag with zero rows and no pageToken (same semantics as jobs.getQueryResults). pageToken is a decimal row index. selectedFields projects top-level columns (comma-separated; dotted paths select the top-level STRUCT). formatOptions.useInt64Timestamp=true emits TIMESTAMP cells as JSON int64 microseconds. Logical views have no Parquet backing — use jobs.query for preview; native tables paginate via pageToken.
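The paging rules above (default 10000, cap 100000, maxResults=0 returning no rows and no token, decimal row-index tokens) can be sketched as a small pure function. This is an illustration rather than the handler's code: pageWindow is a hypothetical name, and clamping an over-cap maxResults (instead of rejecting it) is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
)

const (
	defaultMaxResults = 10000  // used when maxResults is absent
	maxResultsCap     = 100000 // assumption: larger values are clamped, not rejected
)

// pageWindow returns the half-open row range [start, end) to serve and the
// next pageToken ("" when there is no further page).
func pageWindow(totalRows int, maxResults, pageToken string) (start, end int, next string, err error) {
	limit := defaultMaxResults
	if maxResults != "" {
		limit, err = strconv.Atoi(maxResults)
		if err != nil || limit < 0 {
			return 0, 0, "", fmt.Errorf("invalid maxResults %q", maxResults)
		}
	}
	if limit > maxResultsCap {
		limit = maxResultsCap
	}
	if pageToken != "" {
		// pageToken is a decimal row index.
		start, err = strconv.Atoi(pageToken)
		if err != nil || start < 0 {
			return 0, 0, "", fmt.Errorf("invalid pageToken %q", pageToken)
		}
	}
	if start > totalRows {
		start = totalRows
	}
	end = start + limit
	if end > totalRows {
		end = totalRows
	}
	// maxResults=0 yields zero rows and no pageToken.
	if limit > 0 && end < totalRows {
		next = strconv.Itoa(end)
	}
	return start, end, next, nil
}

func main() {
	start, end, next, _ := pageWindow(25, "10", "")
	fmt.Println(start, end, next)
}
```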

Jobs (bigquery.jobs.*)

Method Path Status Handler
jobs.list GET /bigquery/v2/projects/{projectId}/jobs done gateway/handlers/jobs.go::JobList
jobs.insert (metadata) POST /bigquery/v2/projects/{projectId}/jobs done (query, LOAD, COPY, EXTRACT) gateway/handlers/jobs.go::JobInsert
jobs.insert (media upload) POST /upload/bigquery/v2/projects/{projectId}/jobs done (multipart + resumable LOAD) gateway/handlers/jobs.go::JobInsertUpload
jobs.insert (resumable chunk) PUT /upload/bigquery/v2/projects/{projectId}/jobs done (resumable LOAD upload) gateway/handlers/jobs.go::JobInsertUpload
jobs.get GET /bigquery/v2/projects/{projectId}/jobs/{jobId} done gateway/handlers/jobs.go::JobGet
jobs.cancel POST /bigquery/v2/projects/{projectId}/jobs/{jobId}/cancel done gateway/handlers/jobs.go::JobCancel
jobs.delete DELETE /bigquery/v2/projects/{projectId}/jobs/{jobId}/delete done gateway/handlers/jobs.go::JobDelete

The literal /delete segment after {jobId} is not a typo — that is the upstream URL template, see jobs.delete reference.

jobs.list filters: stateFilter (repeatable pending/running/done), minCreationTime / maxCreationTime (epoch ms), parentJobId, maxResults (default 50), and opaque pageToken pagination are honored against the gateway job registry. allUsers=true returns 501 (no auth context).

INFORMATION_SCHEMA.JOBS*: queries against `region-*`.INFORMATION_SCHEMA.JOBS(_BY_PROJECT) are rewritten to `{project}`.`_bqemu_jobs`.`JOBS` with rows materialized from the job registry before the engine executes the SQL (gateway/query/info_schema_jobs.go).
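To make the rewrite concrete, here is a rough regular-expression sketch of the table-reference substitution. It is not the code in gateway/query/info_schema_jobs.go: the function name, the accepted region characters, and the use of a regex at all are assumptions, and the sketch does not materialize any rows.

```go
package main

import (
	"fmt"
	"regexp"
)

// jobsViewRE matches `region-<name>`.INFORMATION_SCHEMA.JOBS and
// ...JOBS_BY_PROJECT, case-insensitively. The trailing \b keeps other
// views such as JOBS_BY_USER from matching.
var jobsViewRE = regexp.MustCompile(
	"(?i)`region-[a-z0-9-]+`\\.INFORMATION_SCHEMA\\.JOBS(?:_BY_PROJECT)?\\b")

// rewriteJobsView points the view reference at the internal jobs table.
func rewriteJobsView(sql, project string) string {
	target := "`" + project + "`.`_bqemu_jobs`.`JOBS`"
	// Literal replacement so a '$' in the project id is never expanded.
	return jobsViewRE.ReplaceAllLiteralString(sql, target)
}

func main() {
	fmt.Println(rewriteJobsView(
		"SELECT job_id FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT",
		"test-project"))
}
```

A textual rewrite like this would also touch matching text inside string literals or comments, which is one reason a production rewrite may prefer to operate on parsed SQL.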

COPY / EXTRACT / undelete: configuration.copy copies rows from sourceTable / sourceTables into destinationTable, honoring writeDisposition (WRITE_EMPTY, WRITE_TRUNCATE, WRITE_APPEND). operationType accepts COPY (default), SNAPSHOT, RESTORE, and CLONE (clone billing is not modeled; treated like COPY). SNAPSHOT jobs stamp the destination with type: SNAPSHOT and optional destinationExpirationTime on tables.get; RESTORE recreates a TABLE from a snapshot source. Live sources prefer engine SQL (CREATE TABLE AS SELECT / UNION ALL); snapshot decorators on live tables use FOR SYSTEM_TIME AS OF TIMESTAMP_MILLIS(epoch) in SQL. Deleted-table snapshot decorators (tableId@epoch) and SQL fallbacks use catalog row copy via snapshots.Store. Copy-dataset in the UI is orchestrated as one copy job per table (no single dataset-copy job type). configuration.extract serializes a source table to destinationUris (CSV, NEWLINE_DELIMITED_JSON, optional GZIP) via fake-gcs HTTP upload. Table undelete (python test_undelete_table, node undeleteTable) is a COPY job from a snapshot decorator after tables.delete; there is no separate tables.undelete RPC.

LOAD jobs: configuration.load ingests object data into destinationTable synchronously (job returns state: DONE). Supported sourceFormat values: CSV, NEWLINE_DELIMITED_JSON, AVRO, PARQUET, ORC, and DATASTORE_BACKUP. Supported URI schemes:

Scheme Notes
file:// Local dev paths (preferred for offline ingest)
absolute path Same as file:// without the prefix
gs:// Requires the fake-gcs storage emulator (FAKE_GCS_PORT / STORAGE_EMULATOR_HOST)
s3:// Dev-only when S3_ENDPOINT is set (path-style HTTP GET); otherwise 400 with S3_ENDPOINT guidance

Unsupported in the load path: bare https:// direct fetch and GOOGLE_SHEETS (use external tables instead). Upload variants: POST /upload/.../jobs?uploadType=multipart (job JSON part + file part) and resumable upload (uploadType=resumable init + PUT chunks). See docs/guides/load-jobs.md for UI-oriented examples.
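The scheme rules above amount to a small decision table. The sketch below classifies a source URI the way the table describes; classifyLoadURI and its return labels are hypothetical, and treating any path that starts with "/" as absolute is a simplifying assumption.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// classifyLoadURI returns which fetch path a LOAD source URI would take,
// or an error for schemes the load path is documented not to support.
func classifyLoadURI(uri string, s3EndpointSet bool) (string, error) {
	switch {
	case strings.HasPrefix(uri, "file://"), strings.HasPrefix(uri, "/"):
		return "local", nil // absolute paths behave like file://
	case strings.HasPrefix(uri, "gs://"):
		return "fake-gcs", nil // needs the storage emulator to be reachable
	case strings.HasPrefix(uri, "s3://"):
		if !s3EndpointSet {
			return "", errors.New("s3:// requires S3_ENDPOINT to be set")
		}
		return "s3", nil
	case strings.HasPrefix(uri, "https://"):
		return "", errors.New("direct https:// fetch is not supported in the load path")
	default:
		return "", fmt.Errorf("unsupported source URI %q", uri)
	}
}

func main() {
	for _, uri := range []string{"file:///tmp/a.csv", "/tmp/a.csv", "gs://bucket/a.csv", "s3://bucket/a.csv"} {
		kind, err := classifyLoadURI(uri, false)
		fmt.Println(uri, kind, err)
	}
}
```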

Queries (synchronous query API)

Method Path Status Handler
jobs.query POST /bigquery/v2/projects/{projectId}/queries done (incl. tableDefinitions) gateway/handlers/queries.go::QueryRun
jobs.getQueryResults GET /bigquery/v2/projects/{projectId}/queries/{jobId} wired gateway/handlers/queries.go::QueryGetResults

Models (bigquery.models.*)

BQML has no trained-model store (inference stays UNIMPLEMENTED), but CREATE MODEL DDL registers metadata-only model resources the REST surface can list/get/delete for client-library round-trips.

Method Path Status Handler
models.list GET /bigquery/v2/projects/{projectId}/datasets/{datasetId}/models done gateway/handlers/models.go::ModelList
models.get GET /bigquery/v2/projects/{projectId}/datasets/{datasetId}/models/{modelId} done gateway/handlers/models.go::ModelGet
models.patch PATCH /bigquery/v2/projects/{projectId}/datasets/{datasetId}/models/{modelId} wired gateway/handlers/models.go::ModelPatch
models.delete DELETE /bigquery/v2/projects/{projectId}/datasets/{datasetId}/models/{modelId} done gateway/handlers/models.go::ModelDelete

Routines (bigquery.routines.*)

Routines (UDFs / TVFs / stored procedures) persist in the engine's DuckDBStorage catalog (__bqemu_routines in catalog.duckdb) and surface through the Catalog.ListRoutines / GetRoutine / UpsertRoutine / DeleteRoutine gRPC RPCs. REST insert/get/list/update/delete delegates to those RPCs when the gateway is wired to emulator_main; the in-memory gateway/routines/ store mirrors responses for the synchronous query path and supplies creationTime / lastModifiedTime / etag on catalog-backed reads. routines.list unions catalog and store entries when both are active. CREATE FUNCTION / CREATE PROCEDURE DDL via jobs.query also registers routines and surfaces ddlTargetRoutine on the job statistics envelope.

pythonOptions on routines.get: Python scalar UDFs created with CREATE FUNCTION ... LANGUAGE python OPTIONS (packages=[...], entry_point=...) round-trip pythonOptions.packages and pythonOptions.entryPoint on the REST Routine resource. See docs/guides/python-udfs.md.

Method Path Status Handler
routines.list GET /bigquery/v2/projects/{projectId}/datasets/{datasetId}/routines done gateway/handlers/routines.go::RoutineList
routines.insert POST /bigquery/v2/projects/{projectId}/datasets/{datasetId}/routines done gateway/handlers/routines.go::RoutineInsert
routines.get GET /bigquery/v2/projects/{projectId}/datasets/{datasetId}/routines/{routineId} done gateway/handlers/routines.go::RoutineGet
routines.update PUT /bigquery/v2/projects/{projectId}/datasets/{datasetId}/routines/{routineId} done gateway/handlers/routines.go::RoutineUpdate
routines.delete DELETE /bigquery/v2/projects/{projectId}/datasets/{datasetId}/routines/{routineId} done gateway/handlers/routines.go::RoutineDelete

Row-access policies (bigquery.rowAccessPolicies.*)

Row-level access policies persist in the engine catalog (__bqemu_row_access_policies) and round-trip through REST insert/get/list/update/delete.

Method Path Status Handler
rowAccessPolicies.list GET /bigquery/v2/projects/{projectId}/datasets/{datasetId}/tables/{tableId}/rowAccessPolicies done gateway/handlers/row_access_policies.go::RowAccessPolicyList
rowAccessPolicies.insert POST /bigquery/v2/projects/{projectId}/datasets/{datasetId}/tables/{tableId}/rowAccessPolicies done gateway/handlers/row_access_policies.go::RowAccessPolicyInsert
rowAccessPolicies.get GET /bigquery/v2/projects/{projectId}/datasets/{datasetId}/tables/{tableId}/rowAccessPolicies/{policyId} done gateway/handlers/row_access_policies.go::RowAccessPolicyGet
rowAccessPolicies.update PUT /bigquery/v2/projects/{projectId}/datasets/{datasetId}/tables/{tableId}/rowAccessPolicies/{policyId} done gateway/handlers/row_access_policies.go::RowAccessPolicyUpdate
rowAccessPolicies.delete DELETE /bigquery/v2/projects/{projectId}/datasets/{datasetId}/tables/{tableId}/rowAccessPolicies/{policyId} done gateway/handlers/row_access_policies.go::RowAccessPolicyDelete

Migration (bigquerymigration.v2alpha)

Served from the same HTTP listener as the BigQuery v2 surface. The official client libraries (cloud.google.com/go/bigquery/migration/apiv2alpha, google-cloud-bigquery-migration for Python/Node/Java) read BIGQUERY_MIGRATION_EMULATOR_HOST and fall back to BIGQUERY_EMULATOR_HOST. Routes are registered under both v2alpha and v2 (alias parity for client compatibility). Workflow metadata is held in an in-process store (no AST translator or LRO execution).

Method Path Status Handler
migration.workflows.list GET /v2alpha/projects/{projectId}/locations/{location}/workflows (also v2) done gateway/handlers/migration.go::MigrationWorkflowList
migration.workflows.create POST /v2alpha/projects/{projectId}/locations/{location}/workflows (also v2) done gateway/handlers/migration.go::MigrationWorkflowCreate
migration.workflows.get GET /v2alpha/projects/{projectId}/locations/{location}/workflows/{workflowId} (also v2) done gateway/handlers/migration.go::MigrationWorkflowGet
migration.workflows.delete DELETE /v2alpha/projects/{projectId}/locations/{location}/workflows/{workflowId} (also v2) done gateway/handlers/migration.go::MigrationWorkflowDelete
migration.workflows.start POST /v2alpha/projects/{projectId}/locations/{location}/workflows/{workflowId}:start (also v2) done gateway/handlers/migration.go::MigrationWorkflowCustomMethodPOST

Data Transfer Service (bigquerydatatransfer.v1)

Served from the same listener via BIGQUERY_EMULATOR_HOST. No data source catalog or transfer config store exists yet, so the standard list endpoints return the documented empty page, specific-resource gets return 404, and transferConfigs.create returns 501. Both project-scoped and location-scoped variants are wired (client libraries pick whichever the user's API region demands).

Method Path Status Handler
dataSources.list GET /v1/projects/{projectId}/dataSources wired gateway/handlers/data_transfer.go::DataTransferDataSourceList
dataSources.list (regional) GET /v1/projects/{projectId}/locations/{location}/dataSources wired gateway/handlers/data_transfer.go::DataTransferDataSourceList
dataSources.get GET /v1/projects/{projectId}/dataSources/{dataSourceId} wired gateway/handlers/data_transfer.go::DataTransferDataSourceGet
dataSources.get (regional) GET /v1/projects/{projectId}/locations/{location}/dataSources/{dataSourceId} wired gateway/handlers/data_transfer.go::DataTransferDataSourceGet
transferConfigs.list GET /v1/projects/{projectId}/transferConfigs wired gateway/handlers/data_transfer.go::DataTransferConfigList
transferConfigs.list (regional) GET /v1/projects/{projectId}/locations/{location}/transferConfigs wired gateway/handlers/data_transfer.go::DataTransferConfigList
transferConfigs.get GET /v1/projects/{projectId}/transferConfigs/{configId} wired gateway/handlers/data_transfer.go::DataTransferConfigGet
transferConfigs.get (regional) GET /v1/projects/{projectId}/locations/{location}/transferConfigs/{configId} wired gateway/handlers/data_transfer.go::DataTransferConfigGet
transferConfigs.create POST /v1/projects/{projectId}/transferConfigs wired gateway/handlers/data_transfer.go::DataTransferConfigCreate
transferConfigs.create (regional) POST /v1/projects/{projectId}/locations/{location}/transferConfigs wired gateway/handlers/data_transfer.go::DataTransferConfigCreate

Discovery and health

Method Path Status Handler
Discovery doc GET /discovery/v1/apis/bigquery/v2/rest wired gateway/handlers/discovery.go::Discovery
Health (emulator-only) GET / and GET /healthz done gateway/handlers/handlers.go::Health

Routing notes (Go specifics)

Go's net/http ServeMux requires every wildcard path segment to end in }. Several BigQuery endpoints use the AIP-136 custom-method shape with a :operation suffix on the resource, e.g. /datasets/{datasetId}:undelete. The mux can't match :undelete as a literal after a wildcard, so we register the parent route (POST /bigquery/v2/projects/{projectId}/datasets/{datasetId}) and use a tiny handler-level dispatcher in gateway/handlers/handlers.go::dispatchColonOp that reads the trailing :op from the captured wildcard. The same trick is used for the three tables.*IamPolicy*/tables.testIamPermissions endpoints.
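The dispatcher idea can be sketched in a few lines. This is not the body of dispatchColonOp: splitColonOp, newMux, and status are hypothetical names, and the example assumes Go 1.22 or later for method-and-wildcard patterns and Request.PathValue.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// splitColonOp splits a captured segment such as "sales:undelete" into the
// resource id and the custom-method name. A segment without ':' yields op == "".
func splitColonOp(segment string) (id, op string) {
	id, op, _ = strings.Cut(segment, ":")
	return id, op
}

// newMux registers only the parent route; the wildcard captures the whole
// last segment, including the ":op" suffix.
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("POST /bigquery/v2/projects/{projectId}/datasets/{datasetId}",
		func(w http.ResponseWriter, r *http.Request) {
			id, op := splitColonOp(r.PathValue("datasetId"))
			switch op {
			case "undelete":
				fmt.Fprintf(w, "undelete %s", id)
			default:
				http.Error(w, "unknown custom method", http.StatusNotFound)
			}
		})
	return mux
}

// status issues a POST against the mux and returns the response status code.
func status(path string) int {
	rec := httptest.NewRecorder()
	newMux().ServeHTTP(rec, httptest.NewRequest(http.MethodPost, path, nil))
	return rec.Code
}

func main() {
	fmt.Println(status("/bigquery/v2/projects/p/datasets/sales:undelete"))
	fmt.Println(status("/bigquery/v2/projects/p/datasets/sales:bogus"))
}
```

The collection route (POST .../datasets for datasets.insert) is a different pattern, so it is unaffected by the dispatcher.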

Error envelope

All non-2xx responses use BigQuery's documented JSON shape (see error messages doc):

{
  "error": {
    "code": 404,
    "message": "Not Found: Dataset myproject:foo",
    "status": "notFound",
    "errors": [
      {
        "domain": "global",
        "reason": "notFound",
        "message": "Not Found: Dataset myproject:foo"
      }
    ]
  }
}

reason values used by the emulator and recognized by BigQuery clients include notFound, notImplemented, invalid, invalidQuery, duplicate, quotaExceeded, accessDenied, stopped. These are a subset of the table in the upstream error messages doc.
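For tests that assert on failures, the envelope maps naturally onto a few Go structs. The type and function names below are illustrative only (they are not taken from the gateway); the field names and nesting follow the JSON shown above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ErrorItem is one entry of error.errors[].
type ErrorItem struct {
	Domain  string `json:"domain"`
	Reason  string `json:"reason"`
	Message string `json:"message"`
}

// ErrorBody is the object under the top-level "error" key.
type ErrorBody struct {
	Code    int         `json:"code"`
	Message string      `json:"message"`
	Status  string      `json:"status"`
	Errors  []ErrorItem `json:"errors"`
}

// ErrorEnvelope is the full non-2xx response body.
type ErrorEnvelope struct {
	Error ErrorBody `json:"error"`
}

// newError builds an envelope with a single "global" error item.
func newError(code int, reason, message string) ErrorEnvelope {
	return ErrorEnvelope{Error: ErrorBody{
		Code: code, Message: message, Status: reason,
		Errors: []ErrorItem{{Domain: "global", Reason: reason, Message: message}},
	}}
}

// reasonOf extracts the first reason from a response body, as a client test might.
func reasonOf(body []byte) (string, error) {
	var env ErrorEnvelope
	if err := json.Unmarshal(body, &env); err != nil {
		return "", err
	}
	if len(env.Error.Errors) == 0 {
		return "", fmt.Errorf("no error items in envelope")
	}
	return env.Error.Errors[0].Reason, nil
}

func main() {
	body, _ := json.Marshal(newError(404, "notFound", "Not Found: Dataset myproject:foo"))
	reason, err := reasonOf(body)
	fmt.Println(string(body), reason, err)
}
```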

SQL dialect

In the public API, the wire field useLegacySql defaults to true (legacy SQL). The emulator executes GoogleSQL via the engine (GoogleSQL's analyzer feeding the local execution coordinator). When useLegacySql=true, the gateway transpiles a narrow subset of legacy SQL used by third-party samples (bracket table references such as [project:dataset.table] and [dataset.table]) into GoogleSQL backtick form before forwarding; use_legacy_sql is always cleared on the engine RPC. The full legacy SQL dialect (functions, #legacy, JOIN variants, etc.) is not supported; unsupported constructs return HTTP 400 with reason: invalidQuery.

In summary, the gateway:

  • Treats useLegacySql unset or false as GoogleSQL (the common case).
  • Translates bracket table refs when useLegacySql=true, then runs GoogleSQL on the engine.

Clients that default to legacy via older library versions may still set useLegacySql=true for bracket-style samples; for new queries prefer useLegacySql=false (GoogleSQL).
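As an illustration of the bracket-reference translation, the following sketch rewrites [project:dataset.table] and [dataset.table] into backtick form with a regular expression. It is not the gateway's transpiler: the function name, the exact backtick layout (one quoted dotted path), and the permitted identifier characters are all assumptions.

```go
package main

import (
	"fmt"
	"regexp"
)

// bracketRefRE matches [project:dataset.table] and [dataset.table].
// Requiring a '.' inside the brackets keeps array subscripts such as
// arr[0] from matching.
var bracketRefRE = regexp.MustCompile(`\[(?:([\w-]+):)?(\w+)\.(\w+)\]`)

// transpileBracketRefs rewrites legacy bracket table references into
// GoogleSQL backtick-quoted paths and leaves everything else untouched.
func transpileBracketRefs(sql string) string {
	return bracketRefRE.ReplaceAllStringFunc(sql, func(ref string) string {
		m := bracketRefRE.FindStringSubmatch(ref)
		if m[1] != "" {
			return "`" + m[1] + "." + m[2] + "." + m[3] + "`"
		}
		return "`" + m[2] + "." + m[3] + "`"
	})
}

func main() {
	fmt.Println(transpileBracketRefs("SELECT * FROM [myproject:ds.events] LIMIT 10"))
	fmt.Println(transpileBracketRefs("SELECT * FROM [ds.events]"))
}
```

A purely textual pass like this cannot tell a table reference from bracketed text inside a string literal, so it only suits the narrow sample-query subset described above.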

Type wire encoding

For result marshaling, types follow StandardSqlDataType.TypeKind:

TypeKind Wire encoding
INT64 decimal string
BOOL JSON boolean
FLOAT64 JSON number, or string "NaN"/"Infinity"/"-Infinity"
STRING JSON string
BYTES base64 string (RFC 4648 §4)
TIMESTAMP RFC 3339 with mandatory Z (e.g. 1985-04-12T23:20:50.52Z)
DATE RFC 3339 full-date (1985-04-12)
TIME RFC 3339 partial-time (23:20:50.52)
DATETIME RFC 3339 full-date T partial-time (1985-04-12T23:20:50.52)
GEOGRAPHY WKT
NUMERIC / BIGNUMERIC decimal string
JSON string-encoded JSON
ARRAY list with element type per arrayElementType
STRUCT list (positional) with element types per structType.fields[i]
RANGE<T> pair [start, end) formatted with the inner type

f/v rows are always strings/objects/arrays at the wire level — even INT64 arrives as a decimal string. The query-execution row marshalers should never emit numeric types as JSON numbers except for FLOAT64.
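A reduced sketch of a row marshaler that follows these rules for a few scalar kinds is shown below. It is an assumption-laden illustration, not the emulator's marshaler: the cell and row types, the function names, and the mapping from Go types to TypeKinds (int64 for INT64, float64 for FLOAT64, []byte for BYTES, string for STRING, nil for NULL) are chosen for the example, and the remaining kinds are omitted.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
	"math"
	"strconv"
)

// cell and row mirror the {"f":[{"v":...}]} wire shape.
type cell struct {
	V any `json:"v"`
}

type row struct {
	F []cell `json:"f"`
}

// encodeValue converts a Go value to its wire representation.
func encodeValue(v any) any {
	switch x := v.(type) {
	case int64:
		return strconv.FormatInt(x, 10) // INT64 is always a decimal string
	case float64:
		switch {
		case math.IsNaN(x):
			return "NaN"
		case math.IsInf(x, 1):
			return "Infinity"
		case math.IsInf(x, -1):
			return "-Infinity"
		}
		return x // finite FLOAT64 stays a JSON number
	case []byte:
		return base64.StdEncoding.EncodeToString(x) // RFC 4648 section 4 alphabet
	default:
		return v // STRING passes through; nil becomes JSON null
	}
}

// marshalRow encodes one result row as JSON.
func marshalRow(values ...any) string {
	r := row{F: make([]cell, 0, len(values))}
	for _, v := range values {
		r.F = append(r.F, cell{V: encodeValue(v)})
	}
	b, err := json.Marshal(r)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(marshalRow(int64(42), 1.5, []byte("hi"), "x", nil))
}
```

Converting non-finite floats to strings before marshaling matters in Go, because encoding/json returns an error for NaN and infinite float64 values.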

Storage Read API

The Storage Read API surface implements google.cloud.bigquery.storage.v1.BigQueryRead. The gRPC service surface and the Avro/Arrow type tables are documented in Storage Read API and the per-method RPC reference under Storage Read RPC.

Transport: gRPC-only, served by the engine

The public BigQuery Storage Read API is gRPC-only in production — the REST surface BigQuery exposes does not proxy it, and the google-cloud-bigquery-storage client libraries (Go's cloud.google.com/go/bigquery/storage, Python's google-cloud-bigquery-storage, Java's google-cloud-bigquerystorage) all open a separate gRPC channel to bigquerystorage.googleapis.com:443. The emulator follows the same shape:

  • The REST gateway (gateway_main, default :9050) does not expose any bigquery_emulator.v1.StorageRead surface. Plan 39 intentionally keeps the gateway focused on the REST-only halves of the public API (projects, datasets, tables, jobs, tabledata.list, jobs.query, tabledata.insertAll).
  • The C++ engine (emulator_main, default :9060 via --grpc_port on gateway_main / --host_port on emulator_main) serves bigquery_emulator.v1.StorageRead directly on its gRPC port. task emulator:run-full exposes both ports, so point a Storage Read client at the engine port (localhost:9060) rather than the gateway.

For programmatic tests, the gateway/e2e/storage_read_test.go harness runs a BigQuery Storage client over the gateway's engine.Client channel (the same connection the gateway uses internally for Catalog and Query). See gateway/e2e/catalog_test.go::startEmulatorWithFlags for the full plumbing.

Supported ReadOptions

  • row_restriction: a single <column> = <literal> equality clause. Literals support INT64 (id = 42), BOOL (active = true, case-insensitive), and STRING (name = 'ada', with the SQL '' escape for embedded apostrophes). Backtick-quoted column names (`id` = 42) round-trip; bare identifiers are limited to [A-Za-z_][A-Za-z0-9_]*. Anything more complex (range / inequality ops, connectives, IN, NULL, ARRAY / STRUCT columns, FLOAT64 / DATE / NUMERIC literals) is rejected at CreateReadSession time with INVALID_ARGUMENT — the gateway surfaces that as the public Storage Read 400 envelope.

    Pushdown shape:

      • Memory backend: the predicate filters the row vector in C++ before offset / row_limit slicing.
      • DuckDB backend: the predicate becomes a WHERE clause on the read_parquet(...) scan, so DuckDB filters before materializing rows.

  • selected_fields: accepted and echoed on the ReadSession reply, but not enforced — every column is returned regardless. Pushing projection into the storage layer is deferred to a future plan.
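The accepted row_restriction grammar is small enough to sketch as one regular expression. The engine implements this check in C++; the Go version below only illustrates the documented shape. The type and function names are hypothetical, and accepting a leading minus sign on INT64 literals is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Restriction is a parsed "<column> = <literal>" clause.
type Restriction struct {
	Column string
	Kind   string // "INT64", "BOOL", or "STRING"
	Value  string
}

// Groups: 1 = backtick-quoted column, 2 = bare column,
// 3 = INT64 literal, 4 = BOOL literal, 5 = STRING body ('' escapes ').
var restrictionRE = regexp.MustCompile(
	"^\\s*(?:`([^`]+)`|([A-Za-z_][A-Za-z0-9_]*))\\s*=\\s*" +
		"(?:(-?[0-9]+)|((?i:true|false))|'((?:[^']|'')*)')\\s*$")

// parseRestriction accepts a single equality clause and rejects everything else.
func parseRestriction(s string) (Restriction, error) {
	m := restrictionRE.FindStringSubmatch(s)
	if m == nil {
		return Restriction{}, fmt.Errorf("INVALID_ARGUMENT: unsupported row_restriction %q", s)
	}
	r := Restriction{Column: m[1] + m[2]} // exactly one of the two groups is set
	switch {
	case m[3] != "":
		r.Kind, r.Value = "INT64", m[3]
	case m[4] != "":
		r.Kind, r.Value = "BOOL", strings.ToLower(m[4])
	default:
		r.Kind, r.Value = "STRING", strings.ReplaceAll(m[5], "''", "'")
	}
	return r, nil
}

func main() {
	for _, s := range []string{"id = 42", "active = TRUE", "name = 'o''brien'", "id > 42"} {
		r, err := parseRestriction(s)
		fmt.Printf("%q -> %+v, err=%v\n", s, r, err)
	}
}
```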

Persistence and --data-dir

The C++ engine persists catalog state under a directory passed as --data-dir on gateway_main (forwarded to emulator_main as --data_dir). A healthy initialized tree contains:

  • catalog.duckdb — DuckDB catalog (views, routines, metadata tables)
  • <project>/<dataset>/*.meta.json — table sidecars
  • <project>/<dataset>/*.parquet — row data for materialized tables

View definitions created through any path (CREATE VIEW DDL, tables.insert with a view.query body) are registered in the engine catalog and persisted to catalog.duckdb; they rehydrate on restart when the same --data-dir is mounted.

Migrating from recidiviz / goccy --database

The widely-deployed recidiviz fork (and goccy/bigquery-emulator) used a single SQLite file:

# old compose (recidiviz fork)
command: ["--database=/opt/x.db"]
volumes:
  - bq-data:/opt

This emulator expects a directory:

# this emulator
command: ["--data-dir=/opt"]
volumes:
  - bq-data:/opt

Old flag New flag Notes
--database=/opt/x.db --data-dir=/opt Parent directory of the old file
(same volume mount) (same volume mount) Re-use the volume; do not point --data-dir at the .db file itself

gateway_main still accepts --database for compatibility: it maps to --data-dir=<parent> and logs a deprecation warning. Data in the old single-file SQLite format is not automatically loaded. If --data-dir contains orphaned *.db files but no catalog.duckdb, the gateway warns at startup rather than silently starting empty.

Operators migrating a live volume should either re-seed from YAML (--seed-data-file) or confirm catalog.duckdb exists after the first successful run under --data-dir.
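The compatibility mapping and the orphaned-file warning can be expressed as two small pure functions. This is a sketch under assumptions, not gateway_main's flag handling: the function names are hypothetical, and letting --data-dir win when both flags are passed is a guess.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// resolveDataDir maps the legacy --database value to its parent directory.
// deprecated reports whether a deprecation warning should be logged.
// Assumption: an explicit --data-dir takes precedence over --database.
func resolveDataDir(dataDir, database string) (dir string, deprecated bool) {
	if dataDir != "" {
		return dataDir, false
	}
	if database != "" {
		return filepath.Dir(database), true
	}
	return "", false
}

// hasOrphanedDB reports whether a directory listing contains *.db files
// but no catalog.duckdb, the condition that triggers the startup warning.
func hasOrphanedDB(names []string) bool {
	sawDB := false
	for _, n := range names {
		if n == "catalog.duckdb" {
			return false
		}
		if strings.HasSuffix(n, ".db") {
			sawDB = true
		}
	}
	return sawDB
}

func main() {
	dir, deprecated := resolveDataDir("", "/opt/x.db")
	fmt.Println(dir, deprecated)
	fmt.Println(hasOrphanedDB([]string{"x.db"}))
}
```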

Authentication posture

The emulator follows cloud-spanner-emulator's posture: it parses but ignores bearer tokens, and the BIGQUERY_EMULATOR_HOST environment variable is the canonical client-library override (mirroring STORAGE_EMULATOR_HOST and SPANNER_EMULATOR_HOST). Concretely, code that targets BigQuery normally:

client, err := bigquery.NewClient(ctx, "test-project")

is pointed at the emulator with either:

client, err := bigquery.NewClient(ctx, "test-project",
    option.WithEndpoint("http://localhost:9050"),
    option.WithoutAuthentication(),
)

or by setting BIGQUERY_EMULATOR_HOST=localhost:9050 in the environment. The README's Client libraries section documents both forms for end users; this file documents the server-side posture.

Every request passes through gateway/middleware/auth.go::WithAuth, which:

  • Parses the Authorization header when present (RFC 6750 Bearer tokens have the scheme stripped; other schemes are stored verbatim).
  • Attaches a synthetic Principal to the request context with Email = "emulator@bigquery.local", regardless of what the client sent.
  • Never short-circuits the response — well-formed, malformed, and absent Authorization headers are all served identically. The emulator never returns 401.

Handlers that need to know whether the client tried to authenticate read the principal via middleware.PrincipalFromContext and inspect the Anonymous and Bearer fields.
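A parse-but-ignore middleware of this kind might look roughly like the sketch below. It is not the contents of gateway/middleware/auth.go: the Principal field types, the context key, the principalFor helper, and the function signatures are assumptions; only the synthetic email and the never-reject behavior come from the description above.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// Principal is the synthetic identity attached to every request.
type Principal struct {
	Email     string
	Anonymous bool   // true when no Authorization header was sent
	Bearer    string // Bearer token without the scheme; other schemes verbatim (assumed layout)
}

type principalKey struct{}

// principalFor derives the principal from a raw Authorization header value.
func principalFor(header string) Principal {
	p := Principal{Email: "emulator@bigquery.local", Anonymous: header == ""}
	if scheme, rest, ok := strings.Cut(header, " "); ok && strings.EqualFold(scheme, "Bearer") {
		p.Bearer = strings.TrimSpace(rest)
	} else {
		p.Bearer = header
	}
	return p
}

// WithAuth never rejects a request; it only annotates the context.
func WithAuth(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		p := principalFor(r.Header.Get("Authorization"))
		next.ServeHTTP(w, r.WithContext(context.WithValue(r.Context(), principalKey{}, p)))
	})
}

// PrincipalFromContext returns the principal stored by WithAuth, if any.
func PrincipalFromContext(ctx context.Context) (Principal, bool) {
	p, ok := ctx.Value(principalKey{}).(Principal)
	return p, ok
}

func main() {
	h := WithAuth(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		p, _ := PrincipalFromContext(r.Context())
		fmt.Fprintf(w, "%s anonymous=%t", p.Email, p.Anonymous)
	}))
	req := httptest.NewRequest(http.MethodGet, "/healthz", nil)
	req.Header.Set("Authorization", "Bearer not-checked")
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	fmt.Println(rec.Code, rec.Body.String())
}
```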

The full upstream auth model (ADC, service-account keys, IAM scopes) is documented under BigQuery authentication and is intentionally not modeled by the emulator.