BigQuery v2 REST API surface
This is the emulator's canonical mapping from the public BigQuery v2 REST API to the Go handler that backs each endpoint, derived from the upstream documentation under BigQuery REST v2 reference.
The goal of this document is operational: when you're staring at a client
library that's failing against the emulator, you want to know exactly
which file to open. Keep this in sync with gateway/server.go and the
gateway-HTTP-surface section of ROADMAP.md.
Status legend:
done = end-to-end implemented · wired = route registered, returns a structurally valid stub or 501 · todo = not yet in the route table
Method summary
The emulator targets the BigQuery REST surface served at
https://bigquery.googleapis.com/bigquery/v2/.... Path templates here
omit the host and use {x} for path variables.
Projects (bigquery.projects.*)
| Method | Path | Status | Handler |
|---|---|---|---|
| projects.list | GET /bigquery/v2/projects | wired | gateway/handlers/projects.go::ProjectList |
| projects.getServiceAccount | GET /bigquery/v2/projects/{projectId}/serviceAccount | wired | gateway/handlers/projects.go::ProjectGetServiceAccount |
[!NOTE] There is no GET /bigquery/v2/projects/{projectId} endpoint in the public API; an early scaffold registered one, and it has since been removed. Use projects.list to enumerate projects and the Resource Manager APIs for per-project metadata.
Datasets (bigquery.datasets.*)
| Method | Path | Status | Handler |
|---|---|---|---|
| datasets.list | GET /bigquery/v2/projects/{projectId}/datasets | done | gateway/handlers/datasets.go::DatasetList |
| datasets.insert | POST /bigquery/v2/projects/{projectId}/datasets | done | gateway/handlers/datasets.go::DatasetInsert |
| datasets.get | GET /bigquery/v2/projects/{projectId}/datasets/{datasetId} | done | gateway/handlers/datasets.go::DatasetGet |
| datasets.update | PUT /bigquery/v2/projects/{projectId}/datasets/{datasetId} | done | gateway/handlers/datasets.go::DatasetUpdate |
| datasets.patch | PATCH /bigquery/v2/projects/{projectId}/datasets/{datasetId} | done | gateway/handlers/datasets.go::DatasetPatch |
| datasets.delete | DELETE /bigquery/v2/projects/{projectId}/datasets/{datasetId} | done | gateway/handlers/datasets.go::DatasetDelete |
| datasets.undelete | POST /bigquery/v2/projects/{projectId}/datasets/{datasetId}:undelete | done | gateway/handlers/datasets.go::DatasetUndelete |
datasets.undelete semantics: Restores the newest soft-deleted tombstone
for the dataset (same 7-day window as table time travel). Restores member
tables, views, routines, row-access policies, and gateway REST metadata
(labels, friendlyName, …) captured at delete time. Returns 409
(alreadyExists) when a live dataset with the same id already exists.
There is no deletedTime selector — only the newest tombstone is restored.
Dataset metadata: REST-only fields (friendlyName, description,
labels, defaultCollation, defaultTableExpirationMs,
defaultPartitionExpirationMs, defaultRoundingMode,
maxTimeTravelHours, isCaseInsensitive, resourceTags, replicas[],
access) persist in the gateway MetadataStore and
merge on GET/PATCH/UPDATE. creationTime is stamped on insert and kept
stable across GETs; lastModifiedTime advances on each write. Cross-region
replicas[] is echo-only (no live replication).
Delete with contents: DELETE .../datasets/{datasetId}?deleteContents=true
drops the dataset and all tables in the engine catalog and evicts the
matching metadata-store entries. Without deleteContents=true, deleting a
non-empty dataset returns 400 (failedPrecondition).
Tables (bigquery.tables.*)
| Method | Path | Status | Handler |
|---|---|---|---|
| tables.list | GET /bigquery/v2/projects/{projectId}/datasets/{datasetId}/tables | done | gateway/handlers/tables.go::TableList |
| tables.insert | POST /bigquery/v2/projects/{projectId}/datasets/{datasetId}/tables | done | gateway/handlers/tables.go::TableInsert |
| tables.get | GET /bigquery/v2/projects/{projectId}/datasets/{datasetId}/tables/{tableId} | done | gateway/handlers/tables.go::TableGet |
| tables.update | PUT /bigquery/v2/projects/{projectId}/datasets/{datasetId}/tables/{tableId} | done | gateway/handlers/tables.go::TableUpdate |
| tables.patch | PATCH /bigquery/v2/projects/{projectId}/datasets/{datasetId}/tables/{tableId} | done | gateway/handlers/tables.go::TablePatch |
| tables.delete | DELETE /bigquery/v2/projects/{projectId}/datasets/{datasetId}/tables/{tableId} | done (captures snapshot for undelete) | gateway/handlers/tables.go::TableDelete |
| tables.getIamPolicy | POST /bigquery/v2/projects/{projectId}/datasets/{datasetId}/tables/{tableId}:getIamPolicy | wired | gateway/handlers/tables.go::TableGetIamPolicy |
| tables.setIamPolicy | POST /bigquery/v2/projects/{projectId}/datasets/{datasetId}/tables/{tableId}:setIamPolicy | wired | gateway/handlers/tables.go::TableSetIamPolicy |
| tables.testIamPermissions | POST /bigquery/v2/projects/{projectId}/datasets/{datasetId}/tables/{tableId}:testIamPermissions | wired | gateway/handlers/tables.go::TableTestIamPermissions |
Federated / BigLake posture (insert, update, patch): Requests that set
biglakeConfiguration, objectTableOptions, or
externalDataConfiguration.sourceFormat=OBJECT_TABLE return 501
notImplemented with a pointer to docs/ENGINE_POLICY.md.
datasets.insert / datasets.update / datasets.patch with
externalDatasetReference (Spanner / Cloud SQL external datasets) return the
same envelope. Use fixture-backed EXTERNAL_QUERY or local external tables
instead; see docs/guides/external-query.md.
Table metadata: REST-only fields (friendlyName, description,
labels, expirationTime, partitioning/clustering specs,
defaultCollation, defaultRoundingMode, caseInsensitive,
resourceTags, tableConstraints, view / materializedView
definitions including view.useLegacySql, requirePartitionFilter,
encryption/external config) persist in MetadataStore.
type includes TABLE, VIEW, MATERIALIZED_VIEW, EXTERNAL, and
SNAPSHOT (copy-job destinations; see COPY jobs below).
creationTime / lastModifiedTime follow the same overlay rules as
datasets. Storage stats: numRows is computed from
Catalog.ListRows; all byte counters (numBytes, numLongTermBytes,
numActiveLogicalBytes, numTotalLogicalBytes, physical-byte fields,
numTimeTravelPhysicalBytes) are explicitly stubbed to "0" until the
engine exposes byte accounting.
External tables: tables.insert accepts
externalDataConfiguration (sourceUris, sourceFormat, schema,
csvOptions, …). Supported GCS formats (CSV, NEWLINE_DELIMITED_JSON,
PARQUET, …) are fetched via fake-gcs (STORAGE_EMULATOR_HOST /
FAKE_GCS_PORT, same as LOAD jobs) and materialized into the engine
catalog at insert time; externalDataConfiguration round-trips through
the gateway MetadataStore on GET/PATCH/UPDATE.
type defaults to EXTERNAL. Bigtable (sourceFormat: BIGTABLE
with https://googleapis.com/bigtable/projects/.../instances/.../tables/...
URIs) registers metadata-only external tables (zero rows; query behavior
is stubbed). Google Sheets (GOOGLE_SHEETS / fixture doc ids) is
supported for dev fixtures. Azure Blob and non-Sheets Google Drive
URIs return 400 with an explicit unsupported message.
Logical views: CREATE [OR REPLACE] VIEW via jobs.query registers
the view in the engine's in-memory catalog and persists the DDL in
DuckDBStorage (__bqemu_views). tables.list / tables.get
return type=VIEW and view.query (with view.useLegacySql=false for
GoogleSQL views). After an engine restart the gateway rehydrates views
from storage so tables.get still resolves through
Catalog.DescribeTable even though the gateway's in-memory metadata
overlay is empty. CREATE MATERIALIZED VIEW DDL is also surfaced on
tables.list / tables.get as type=MATERIALIZED_VIEW with
materializedView.query (the engine materializes rows into a physical
table).
Query-time ephemeral external tables use tableDefinitions on
jobs.query and configuration.query (jobs.insert). When the query
omits defaultDataset, definitions are registered under internal dataset
_bq_external_temp and that dataset is forwarded as
default_dataset_id so unqualified table ids in SQL resolve.
Tabledata (bigquery.tabledata.*)
| Method | Path | Status | Handler |
|---|---|---|---|
| tabledata.list | GET /bigquery/v2/projects/{projectId}/datasets/{datasetId}/tables/{tableId}/data | done | gateway/handlers/tabledata.go::TableDataList |
| tabledata.insertAll | POST /bigquery/v2/projects/{projectId}/datasets/{datasetId}/tables/{tableId}/insertAll | done | gateway/handlers/tabledata.go::TableDataInsertAll |
tabledata.list notes: maxResults defaults to 10000 (cap 100000).
maxResults=0 returns totalRows/etag with zero rows and no pageToken
(same semantics as jobs.getQueryResults). pageToken is a decimal row index.
selectedFields projects top-level columns (comma-separated; dotted paths
select the top-level STRUCT). formatOptions.useInt64Timestamp=true emits
TIMESTAMP cells as JSON int64 microseconds. Logical views have no Parquet
backing — use jobs.query for preview; native tables paginate via
pageToken.
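The paging rules above can be sketched as a small window calculation. This is a minimal illustration under stated assumptions, not the handler's actual code: pageWindow and its signature are hypothetical, and only the defaults (10000, cap 100000), the maxResults=0 behaviour, and the decimal row-index token come from the notes above.

```go
package main

import (
	"fmt"
	"strconv"
)

const (
	defaultMaxResults = 10000  // default page size from the notes above
	maxResultsCap     = 100000 // upper bound from the notes above
)

// pageWindow is a hypothetical helper, not the emulator's code. It returns the
// half-open row range [start, end) for one page and the next pageToken
// ("" when there are no further pages). A nil maxResults means "unset".
func pageWindow(totalRows int, pageToken string, maxResults *int) (start, end int, next string, err error) {
	if pageToken != "" {
		start, err = strconv.Atoi(pageToken) // pageToken is a decimal row index
		if err != nil || start < 0 {
			return 0, 0, "", fmt.Errorf("invalid pageToken %q", pageToken)
		}
	}
	if start > totalRows {
		start = totalRows
	}
	limit := defaultMaxResults
	if maxResults != nil {
		limit = *maxResults
	}
	if limit < 0 {
		return 0, 0, "", fmt.Errorf("invalid maxResults %d", limit)
	}
	if limit > maxResultsCap {
		limit = maxResultsCap
	}
	if limit == 0 {
		// maxResults=0: the caller gets totalRows/etag only, no rows, no pageToken.
		return start, start, "", nil
	}
	end = start + limit
	if end >= totalRows {
		return start, totalRows, "", nil
	}
	return start, end, strconv.Itoa(end), nil
}

func main() {
	two := 2
	token := ""
	for {
		start, end, next, err := pageWindow(5, token, &two)
		if err != nil {
			panic(err)
		}
		fmt.Printf("rows [%d,%d) next=%q\n", start, end, next)
		if next == "" {
			break
		}
		token = next
	}
}
```

A client that loops until the token is empty, as main does here, walks a native table page by page without needing to know totalRows up front.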
Jobs (bigquery.jobs.*)
| Method | Path | Status | Handler |
|---|---|---|---|
| jobs.list | GET /bigquery/v2/projects/{projectId}/jobs | done | gateway/handlers/jobs.go::JobList |
| jobs.insert (metadata) | POST /bigquery/v2/projects/{projectId}/jobs | done (query, LOAD, COPY, EXTRACT) | gateway/handlers/jobs.go::JobInsert |
| jobs.insert (media upload) | POST /upload/bigquery/v2/projects/{projectId}/jobs | done (multipart + resumable LOAD) | gateway/handlers/jobs.go::JobInsertUpload |
| jobs.insert (resumable chunk) | PUT /upload/bigquery/v2/projects/{projectId}/jobs | done (resumable LOAD upload) | gateway/handlers/jobs.go::JobInsertUpload |
| jobs.get | GET /bigquery/v2/projects/{projectId}/jobs/{jobId} | done | gateway/handlers/jobs.go::JobGet |
| jobs.cancel | POST /bigquery/v2/projects/{projectId}/jobs/{jobId}/cancel | done | gateway/handlers/jobs.go::JobCancel |
| jobs.delete | DELETE /bigquery/v2/projects/{projectId}/jobs/{jobId}/delete | done | gateway/handlers/jobs.go::JobDelete |
The literal /delete segment after {jobId} is not a typo: it is part of
the upstream URL template (see the
jobs.delete reference).
jobs.list filters: stateFilter (repeatable pending/running/done),
minCreationTime / maxCreationTime (epoch ms), parentJobId,
maxResults (default 50), and opaque pageToken pagination are honored
against the gateway job registry. allUsers=true returns 501 (no auth
context).
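As a rough illustration of how a test client might assemble these filters, the sketch below builds a jobs.list URL with a repeated stateFilter. The helper name jobsListURL, its parameters, and the localhost:9050 base are assumptions for the example; only the parameter names come from the paragraph above.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// jobsListURL is a hypothetical helper that builds a jobs.list request URL.
// minCreationMs is epoch milliseconds; zero or negative values are omitted.
func jobsListURL(base, projectID string, states []string, minCreationMs int64, maxResults int) string {
	q := url.Values{}
	for _, s := range states {
		q.Add("stateFilter", s) // repeatable: one key per state
	}
	if minCreationMs > 0 {
		q.Set("minCreationTime", strconv.FormatInt(minCreationMs, 10))
	}
	if maxResults > 0 {
		q.Set("maxResults", strconv.Itoa(maxResults))
	}
	u := base + "/bigquery/v2/projects/" + url.PathEscape(projectID) + "/jobs"
	if enc := q.Encode(); enc != "" {
		u += "?" + enc // Encode sorts keys and keeps value order within a key
	}
	return u
}

func main() {
	fmt.Println(jobsListURL("http://localhost:9050", "test-project",
		[]string{"running", "done"}, 1700000000000, 25))
}
```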
INFORMATION_SCHEMA.JOBS*: queries against `region-*`.INFORMATION_SCHEMA.JOBS(_BY_PROJECT)
are rewritten to `{project}`.`_bqemu_jobs`.`JOBS` with rows
materialized from the job registry before the engine executes the SQL
(gateway/query/info_schema_jobs.go).
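To make the rewrite concrete, here is a simplified regex-based sketch of the substitution. It is not the code in gateway/query/info_schema_jobs.go: the function name and the pattern are assumptions, and a plain regex does not skip string literals or comments the way a real rewriter would need to.

```go
package main

import (
	"fmt"
	"regexp"
)

// infoSchemaJobs matches `region-xx`.INFORMATION_SCHEMA.JOBS and its
// _BY_PROJECT variant (case-insensitive). The trailing \b keeps other
// views such as JOBS_BY_USER from matching.
var infoSchemaJobs = regexp.MustCompile(
	"(?i)`region-[a-z0-9-]+`\\.INFORMATION_SCHEMA\\.JOBS(?:_BY_PROJECT)?\\b")

// rewriteInfoSchemaJobs is a hypothetical helper that points the query at
// the materialized `{project}`.`_bqemu_jobs`.`JOBS` table.
func rewriteInfoSchemaJobs(sql, project string) string {
	target := "`" + project + "`.`_bqemu_jobs`.`JOBS`"
	return infoSchemaJobs.ReplaceAllLiteralString(sql, target)
}

func main() {
	in := "SELECT job_id FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT WHERE state = 'DONE'"
	fmt.Println(rewriteInfoSchemaJobs(in, "test-project"))
}
```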
COPY / EXTRACT / undelete: configuration.copy copies rows
from sourceTable / sourceTables into destinationTable, honoring
writeDisposition (WRITE_EMPTY, WRITE_TRUNCATE, WRITE_APPEND).
operationType accepts COPY (default), SNAPSHOT, RESTORE, and
CLONE (clone billing is not modeled; treated like COPY). SNAPSHOT
jobs stamp the destination with type: SNAPSHOT and optional
destinationExpirationTime on tables.get; RESTORE recreates a
TABLE from a snapshot source. Live sources prefer engine SQL
(CREATE TABLE AS SELECT / UNION ALL); snapshot decorators on live
tables use FOR SYSTEM_TIME AS OF TIMESTAMP_MILLIS(epoch) in SQL.
Deleted-table snapshot decorators (tableId@epoch) and SQL fallbacks
use catalog row copy via snapshots.Store. Copy-dataset in the UI is
orchestrated as one copy job per table (no single dataset-copy job type).
configuration.extract serializes a source table to destinationUris
(CSV, NEWLINE_DELIMITED_JSON, optional GZIP) via fake-gcs HTTP upload.
Table undelete (python test_undelete_table, node undeleteTable) is a
COPY job from a snapshot decorator after tables.delete; there is no
separate tables.undelete RPC.
LOAD jobs: configuration.load ingests object data into
destinationTable synchronously (job returns state: DONE). Supported
sourceFormat values: CSV, NEWLINE_DELIMITED_JSON, AVRO, PARQUET,
ORC, and DATASTORE_BACKUP. Supported URI schemes:
| Scheme | Notes |
|---|---|
| file:// | Local dev paths (preferred for offline ingest) |
| absolute path | Same as file:// without the prefix |
| gs:// | Requires the fake-gcs storage emulator (FAKE_GCS_PORT / STORAGE_EMULATOR_HOST) |
| s3:// | Dev-only when S3_ENDPOINT is set (path-style HTTP GET); otherwise 400 with S3_ENDPOINT guidance |
Unsupported in the load path: bare https:// direct fetch and
GOOGLE_SHEETS (use external tables instead). Upload variants:
POST /upload/.../jobs?uploadType=multipart (job JSON part + file part) and
resumable upload (uploadType=resumable init + PUT chunks). See
docs/guides/load-jobs.md for UI-oriented examples.
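The scheme rules for LOAD sources can be summarized as a small classifier. This is a sketch under assumptions: classifyLoadURI, its return labels, and its error strings are hypothetical, and the real gateway may order or word its checks differently.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// classifyLoadURI is a hypothetical helper mirroring the scheme table for
// LOAD jobs. s3EndpointSet stands in for "S3_ENDPOINT is configured".
func classifyLoadURI(uri string, s3EndpointSet bool) (string, error) {
	switch {
	case strings.HasPrefix(uri, "file://"), filepath.IsAbs(uri):
		return "local", nil // file:// and bare absolute paths are equivalent
	case strings.HasPrefix(uri, "gs://"):
		return "gcs", nil // fetched through the fake-gcs storage emulator
	case strings.HasPrefix(uri, "s3://"):
		if !s3EndpointSet {
			return "", fmt.Errorf("s3:// sources require S3_ENDPOINT to be set")
		}
		return "s3", nil
	case strings.HasPrefix(uri, "https://"):
		return "", fmt.Errorf("direct https:// fetch is not supported in the load path")
	default:
		return "", fmt.Errorf("unsupported source URI %q", uri)
	}
}

func main() {
	for _, u := range []string{"file:///tmp/a.csv", "/tmp/a.csv", "gs://bucket/a.csv", "s3://bucket/a.csv"} {
		kind, err := classifyLoadURI(u, false)
		fmt.Println(u, kind, err)
	}
}
```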
Queries (synchronous query API)
| Method | Path | Status | Handler |
|---|---|---|---|
| jobs.query | POST /bigquery/v2/projects/{projectId}/queries | done (incl. tableDefinitions) | gateway/handlers/queries.go::QueryRun |
| jobs.getQueryResults | GET /bigquery/v2/projects/{projectId}/queries/{jobId} | wired | gateway/handlers/queries.go::QueryGetResults |
Models (bigquery.models.*)
BQML has no trained-model store (inference stays UNIMPLEMENTED), but
CREATE MODEL DDL registers metadata-only model resources the REST
surface can list/get/delete for client-library round-trips.
| Method | Path | Status | Handler |
|---|---|---|---|
| models.list | GET /bigquery/v2/projects/{projectId}/datasets/{datasetId}/models | done | gateway/handlers/models.go::ModelList |
| models.get | GET /bigquery/v2/projects/{projectId}/datasets/{datasetId}/models/{modelId} | done | gateway/handlers/models.go::ModelGet |
| models.patch | PATCH /bigquery/v2/projects/{projectId}/datasets/{datasetId}/models/{modelId} | wired | gateway/handlers/models.go::ModelPatch |
| models.delete | DELETE /bigquery/v2/projects/{projectId}/datasets/{datasetId}/models/{modelId} | done | gateway/handlers/models.go::ModelDelete |
Routines (bigquery.routines.*)
Routines (UDFs / TVFs / stored procedures) persist in the engine's
DuckDBStorage catalog (__bqemu_routines in catalog.duckdb) and
surface through the Catalog.ListRoutines / GetRoutine /
UpsertRoutine / DeleteRoutine gRPC RPCs. REST
insert/get/list/update/delete delegates to those RPCs when the gateway
is wired to emulator_main; the in-memory gateway/routines/
store mirrors responses for the synchronous query path and supplies
creationTime / lastModifiedTime / etag on catalog-backed reads.
routines.list unions catalog and store entries when both are active.
CREATE FUNCTION / CREATE PROCEDURE DDL via jobs.query also
registers routines and surfaces ddlTargetRoutine on the job
statistics envelope.
pythonOptions on routines.get: Python scalar UDFs created with
CREATE FUNCTION ... LANGUAGE python OPTIONS (packages=[...], entry_point=...)
round-trip pythonOptions.packages and pythonOptions.entryPoint on the
REST Routine resource. See docs/guides/python-udfs.md.
| Method | Path | Status | Handler |
|---|---|---|---|
| routines.list | GET /bigquery/v2/projects/{projectId}/datasets/{datasetId}/routines | done | gateway/handlers/routines.go::RoutineList |
| routines.insert | POST /bigquery/v2/projects/{projectId}/datasets/{datasetId}/routines | done | gateway/handlers/routines.go::RoutineInsert |
| routines.get | GET /bigquery/v2/projects/{projectId}/datasets/{datasetId}/routines/{routineId} | done | gateway/handlers/routines.go::RoutineGet |
| routines.update | PUT /bigquery/v2/projects/{projectId}/datasets/{datasetId}/routines/{routineId} | done | gateway/handlers/routines.go::RoutineUpdate |
| routines.delete | DELETE /bigquery/v2/projects/{projectId}/datasets/{datasetId}/routines/{routineId} | done | gateway/handlers/routines.go::RoutineDelete |
Row-access policies (bigquery.rowAccessPolicies.*)
Row-level access policies persist in the engine catalog
(__bqemu_row_access_policies) and round-trip through REST insert/get/list/update/delete.
| Method | Path | Status | Handler |
|---|---|---|---|
| rowAccessPolicies.list | GET /bigquery/v2/projects/{projectId}/datasets/{datasetId}/tables/{tableId}/rowAccessPolicies | done | gateway/handlers/row_access_policies.go::RowAccessPolicyList |
| rowAccessPolicies.insert | POST /bigquery/v2/projects/{projectId}/datasets/{datasetId}/tables/{tableId}/rowAccessPolicies | done | gateway/handlers/row_access_policies.go::RowAccessPolicyInsert |
| rowAccessPolicies.get | GET /bigquery/v2/projects/{projectId}/datasets/{datasetId}/tables/{tableId}/rowAccessPolicies/{policyId} | done | gateway/handlers/row_access_policies.go::RowAccessPolicyGet |
| rowAccessPolicies.update | PUT /bigquery/v2/projects/{projectId}/datasets/{datasetId}/tables/{tableId}/rowAccessPolicies/{policyId} | done | gateway/handlers/row_access_policies.go::RowAccessPolicyUpdate |
| rowAccessPolicies.delete | DELETE /bigquery/v2/projects/{projectId}/datasets/{datasetId}/tables/{tableId}/rowAccessPolicies/{policyId} | done | gateway/handlers/row_access_policies.go::RowAccessPolicyDelete |
Migration (bigquerymigration.v2alpha)
Served from the same HTTP listener as the BigQuery v2 surface. The
official client libraries
(cloud.google.com/go/bigquery/migration/apiv2alpha,
google-cloud-bigquery-migration for Python/Node/Java) read
BIGQUERY_MIGRATION_EMULATOR_HOST and fall back to
BIGQUERY_EMULATOR_HOST. Routes are registered under both v2alpha
and v2 (alias parity for client compatibility). Workflow metadata is
held in an in-process store (no AST translator or LRO execution).
| Method | Path | Status | Handler |
|---|---|---|---|
| migration.workflows.list | GET /v2alpha/projects/{projectId}/locations/{location}/workflows (also v2) | done | gateway/handlers/migration.go::MigrationWorkflowList |
| migration.workflows.create | POST /v2alpha/projects/{projectId}/locations/{location}/workflows (also v2) | done | gateway/handlers/migration.go::MigrationWorkflowCreate |
| migration.workflows.get | GET /v2alpha/projects/{projectId}/locations/{location}/workflows/{workflowId} (also v2) | done | gateway/handlers/migration.go::MigrationWorkflowGet |
| migration.workflows.delete | DELETE /v2alpha/projects/{projectId}/locations/{location}/workflows/{workflowId} (also v2) | done | gateway/handlers/migration.go::MigrationWorkflowDelete |
| migration.workflows.start | POST /v2alpha/projects/{projectId}/locations/{location}/workflows/{workflowId}:start (also v2) | done | gateway/handlers/migration.go::MigrationWorkflowCustomMethodPOST |
Data Transfer Service (bigquerydatatransfer.v1)
Served from the same listener via BIGQUERY_EMULATOR_HOST. No data
source catalog or transfer config store exists yet, so the standard
list endpoints return the documented empty page, specific-resource
gets return 404, and transferConfigs.create returns 501. Both
project-scoped and location-scoped variants are wired (client
libraries pick whichever the user's API region demands).
| Method | Path | Status | Handler |
|---|---|---|---|
| dataSources.list | GET /v1/projects/{projectId}/dataSources | wired | gateway/handlers/data_transfer.go::DataTransferDataSourceList |
| dataSources.list (regional) | GET /v1/projects/{projectId}/locations/{location}/dataSources | wired | gateway/handlers/data_transfer.go::DataTransferDataSourceList |
| dataSources.get | GET /v1/projects/{projectId}/dataSources/{dataSourceId} | wired | gateway/handlers/data_transfer.go::DataTransferDataSourceGet |
| dataSources.get (regional) | GET /v1/projects/{projectId}/locations/{location}/dataSources/{dataSourceId} | wired | gateway/handlers/data_transfer.go::DataTransferDataSourceGet |
| transferConfigs.list | GET /v1/projects/{projectId}/transferConfigs | wired | gateway/handlers/data_transfer.go::DataTransferConfigList |
| transferConfigs.list (regional) | GET /v1/projects/{projectId}/locations/{location}/transferConfigs | wired | gateway/handlers/data_transfer.go::DataTransferConfigList |
| transferConfigs.get | GET /v1/projects/{projectId}/transferConfigs/{configId} | wired | gateway/handlers/data_transfer.go::DataTransferConfigGet |
| transferConfigs.get (regional) | GET /v1/projects/{projectId}/locations/{location}/transferConfigs/{configId} | wired | gateway/handlers/data_transfer.go::DataTransferConfigGet |
| transferConfigs.create | POST /v1/projects/{projectId}/transferConfigs | wired | gateway/handlers/data_transfer.go::DataTransferConfigCreate |
| transferConfigs.create (regional) | POST /v1/projects/{projectId}/locations/{location}/transferConfigs | wired | gateway/handlers/data_transfer.go::DataTransferConfigCreate |
Discovery and health
| Method | Path | Status | Handler |
|---|---|---|---|
| Discovery doc | GET /discovery/v1/apis/bigquery/v2/rest | wired | gateway/handlers/discovery.go::Discovery |
| Health (emulator-only) | GET / and GET /healthz | done | gateway/handlers/handlers.go::Health |
Routing notes (Go specifics)
Go's net/http ServeMux requires each wildcard to occupy a whole path
segment: the closing } must be followed by / or the end of the
pattern. Several BigQuery endpoints use the AIP-136 custom-method
shape with a :operation suffix on the resource, e.g.
/datasets/{datasetId}:undelete. The mux can't match :undelete as a
literal after a wildcard, so we register the parent route
(POST /bigquery/v2/projects/{projectId}/datasets/{datasetId}) and use
a tiny handler-level dispatcher in
gateway/handlers/handlers.go::dispatchColonOp that reads
the trailing :op from the captured wildcard. The same trick is used
for the three table IAM endpoints (tables.getIamPolicy,
tables.setIamPolicy, tables.testIamPermissions).
Error envelope
All non-2xx responses use BigQuery's documented JSON shape (see error messages doc):
{
  "error": {
    "code": 404,
    "message": "Not Found: Dataset myproject:foo",
    "status": "notFound",
    "errors": [
      {
        "domain": "global",
        "reason": "notFound",
        "message": "Not Found: Dataset myproject:foo"
      }
    ]
  }
}
reason values used by the emulator and recognized by BigQuery clients
include notFound, notImplemented, invalid, invalidQuery,
duplicate, quotaExceeded, accessDenied, stopped. These are a
subset of the table in the upstream
error messages doc.
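For handler authors, the envelope maps naturally onto a few Go structs. The type and function names below are illustrative assumptions rather than the gateway's actual types, and setting status equal to reason simply mirrors the sample above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ErrorItem, ErrorBody, and ErrorEnvelope are hypothetical types that
// mirror the JSON shape shown above, field for field.
type ErrorItem struct {
	Domain  string `json:"domain"`
	Reason  string `json:"reason"`
	Message string `json:"message"`
}

type ErrorBody struct {
	Code    int         `json:"code"`
	Message string      `json:"message"`
	Status  string      `json:"status"`
	Errors  []ErrorItem `json:"errors"`
}

type ErrorEnvelope struct {
	Error ErrorBody `json:"error"`
}

// newError builds an envelope with a single entry in errors[].
// Assumption: status repeats the reason, as in the sample above.
func newError(code int, reason, message string) ErrorEnvelope {
	return ErrorEnvelope{Error: ErrorBody{
		Code:    code,
		Message: message,
		Status:  reason,
		Errors:  []ErrorItem{{Domain: "global", Reason: reason, Message: message}},
	}}
}

// JSON renders the envelope in compact form.
func (e ErrorEnvelope) JSON() string {
	b, err := json.Marshal(e)
	if err != nil {
		panic(err) // cannot happen for these plain string/int fields
	}
	return string(b)
}

func main() {
	fmt.Println(newError(404, "notFound", "Not Found: Dataset myproject:foo").JSON())
}
```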
SQL dialect
BigQuery's wire field useLegacySql defaults to true (legacy SQL).
The emulator executes GoogleSQL via the engine (GoogleSQL's analyzer
feeding the local execution coordinator). When useLegacySql=true,
the gateway transpiles a narrow subset of legacy SQL used by
third-party samples (bracket table references such as
[project:dataset.table] and [dataset.table]) into GoogleSQL
backtick form before forwarding; use_legacy_sql is always cleared on
the engine RPC. The full legacy SQL dialect (functions, #legacy, JOIN
variants, etc.) is not supported; unsupported constructs return HTTP
400 with reason: invalidQuery.
In practice, the emulator:
- Treats useLegacySql unset or false as GoogleSQL (the common case).
- Translates bracket table refs when useLegacySql=true, then runs GoogleSQL on the engine.
Clients that default to legacy via older library versions may still set
useLegacySql=true for bracket-style samples; for new queries prefer
useLegacySql=false (GoogleSQL).
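The bracket-to-backtick translation can be illustrated with a short regex rewrite. This is a simplified sketch, not the gateway's transpiler: rewriteBracketRefs and its pattern are assumptions, and a plain regex can misfire on bracketed expressions that merely look like table references (for example inside string literals).

```go
package main

import (
	"fmt"
	"regexp"
)

// bracketRef matches [project:dataset.table] and [dataset.table].
// Group 1 is the optional project, group 2 the dataset, group 3 the table.
var bracketRef = regexp.MustCompile(`\[(?:([A-Za-z0-9-]+):)?([A-Za-z0-9_]+)\.([A-Za-z0-9_]+)\]`)

// rewriteBracketRefs is a hypothetical helper that turns legacy bracket
// table references into GoogleSQL backtick-quoted paths.
func rewriteBracketRefs(sql string) string {
	return bracketRef.ReplaceAllStringFunc(sql, func(m string) string {
		parts := bracketRef.FindStringSubmatch(m)
		project, dataset, table := parts[1], parts[2], parts[3]
		if project != "" {
			return "`" + project + "." + dataset + "." + table + "`"
		}
		return "`" + dataset + "." + table + "`"
	})
}

func main() {
	fmt.Println(rewriteBracketRefs("SELECT * FROM [my-project:ds.t] AS a JOIN [ds.u] AS b ON a.id = b.id"))
}
```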
Type wire encoding
For result marshaling, types follow
StandardSqlDataType.TypeKind:
| TypeKind | Wire encoding |
|---|---|
| INT64 | decimal string |
| BOOL | JSON boolean |
| FLOAT64 | JSON number, or string "NaN"/"Infinity"/"-Infinity" |
| STRING | JSON string |
| BYTES | base64 string (RFC 4648 §4) |
| TIMESTAMP | RFC 3339 with mandatory Z (e.g. 1985-04-12T23:20:50.52Z) |
| DATE | RFC 3339 full-date (1985-04-12) |
| TIME | RFC 3339 partial-time (23:20:50.52) |
| DATETIME | RFC 3339 full-date T partial-time (1985-04-12T23:20:50.52) |
| GEOGRAPHY | WKT |
| NUMERIC / BIGNUMERIC | decimal string |
| JSON | string-encoded JSON |
| ARRAY | list with element type per arrayElementType |
| STRUCT | list (positional) with element types per structType.fields[i] |
| RANGE<T> | pair [start, end) formatted with the inner type |
f/v rows are always strings/objects/arrays at the wire level — even
INT64 arrives as a decimal string. The query-execution row marshalers
should never emit numeric types as JSON numbers except for FLOAT64.
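A few of these encodings are easy to get wrong in a marshaler, so here is a small sketch covering INT64, FLOAT64, and BYTES. The helper names and the cell/row types are assumptions for illustration; the point is that encoding/json refuses to marshal NaN or infinities as numbers, so they have to be mapped to strings first.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
	"math"
	"strconv"
)

// cell and row are hypothetical types for the f/v row shape.
type cell struct {
	V any `json:"v"`
}

type row struct {
	F []cell `json:"f"`
}

// encodeInt64 emits INT64 as a decimal string, never a JSON number.
func encodeInt64(v int64) string { return strconv.FormatInt(v, 10) }

// encodeFloat64 emits a JSON number, or a string for non-finite values.
func encodeFloat64(v float64) any {
	switch {
	case math.IsNaN(v):
		return "NaN"
	case math.IsInf(v, 1):
		return "Infinity"
	case math.IsInf(v, -1):
		return "-Infinity"
	}
	return v
}

// encodeBytes emits BYTES as standard (RFC 4648 §4) base64.
func encodeBytes(b []byte) string { return base64.StdEncoding.EncodeToString(b) }

// encodeRow marshals one (INT64, FLOAT64, BYTES) row in f/v form.
func encodeRow(id int64, score float64, blob []byte) (string, error) {
	r := row{F: []cell{
		{V: encodeInt64(id)},
		{V: encodeFloat64(score)},
		{V: encodeBytes(blob)},
	}}
	b, err := json.Marshal(r)
	return string(b), err
}

func main() {
	s, err := encodeRow(42, math.NaN(), []byte("hi"))
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```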
Storage Read API
The Storage Read API surface implements
google.cloud.bigquery.storage.v1.BigQueryRead. The
gRPC service surface and the Avro/Arrow type tables are documented in
Storage Read API and the per-method
RPC reference under Storage Read RPC.
Transport: gRPC-only, served by the engine
The public BigQuery Storage Read API is gRPC-only in production —
the REST surface BigQuery exposes does not proxy it, and the
google-cloud-bigquery-storage client libraries (Go's
cloud.google.com/go/bigquery/storage, Python's
google-cloud-bigquery-storage, Java's
google-cloud-bigquerystorage) all open a separate gRPC channel to
bigquerystorage.googleapis.com:443. The emulator follows the same
shape:
- The REST gateway (gateway_main, default :9050) does not expose any bigquery_emulator.v1.StorageRead surface. Plan 39 intentionally keeps the gateway focused on the REST-only halves of the public API (projects, datasets, tables, jobs, tabledata.list, jobs.query, tabledata.insertAll).
- The C++ engine (emulator_main, default :9060 via --grpc_port on gateway_main / --host_port on emulator_main) serves bigquery_emulator.v1.StorageRead directly on its gRPC port. task emulator:run-full exposes both ports, so point a Storage Read client at the engine port (localhost:9060) rather than the gateway.
For programmatic tests, the
gateway/e2e/storage_read_test.go harness gates a BigQuery Storage
client off the gateway's engine.Client channel (the same connection
the gateway uses internally for Catalog and Query). See
gateway/e2e/catalog_test.go::startEmulatorWithFlags for the full
plumbing.
Supported ReadOptions
- row_restriction: a single <column> = <literal> equality clause. Literals support INT64 (id = 42), BOOL (active = true, case-insensitive), and STRING (name = 'ada', with the SQL '' escape for embedded apostrophes). Backtick-quoted column names (`id` = 42) round-trip; bare identifiers are limited to [A-Za-z_][A-Za-z0-9_]*. Anything more complex (range / inequality ops, connectives, IN, NULL, ARRAY / STRUCT columns, FLOAT64 / DATE / NUMERIC literals) is rejected at CreateReadSession time with INVALID_ARGUMENT; the gateway surfaces that as the public Storage Read 400 envelope.
  Pushdown shape:
  * Memory backend: the predicate filters the row vector in C++
    before offset / row_limit slicing.
  * DuckDB backend: the predicate becomes a WHERE clause on the
    read_parquet(...) scan, so DuckDB filters before materializing
    rows.
- selected_fields: accepted and echoed on the ReadSession reply, but not enforced: every column is returned regardless. Pushing projection into the storage layer is deferred to a future plan.
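The accepted row_restriction grammar is small enough to sketch in a few lines. The engine's real parser lives in C++ and is not shown here; parseRowRestriction, the Restriction type, and the regex below are assumptions that only approximate the rules listed above.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// Restriction is a hypothetical parsed form: Value holds an int64,
// a bool, or a string depending on the literal.
type Restriction struct {
	Column string
	Value  any
}

// Group 1: backtick-quoted column, group 2: bare identifier, group 3: literal.
var restrictionRE = regexp.MustCompile(
	"^\\s*(?:`([^`]+)`|([A-Za-z_][A-Za-z0-9_]*))\\s*=\\s*(.+?)\\s*$")

func parseRowRestriction(s string) (Restriction, error) {
	bad := fmt.Errorf("INVALID_ARGUMENT: unsupported row_restriction %q", s)
	m := restrictionRE.FindStringSubmatch(s)
	if m == nil {
		return Restriction{}, bad
	}
	col := m[1]
	if col == "" {
		col = m[2]
	}
	lit := m[3]
	if n, err := strconv.ParseInt(lit, 10, 64); err == nil {
		return Restriction{Column: col, Value: n}, nil // INT64
	}
	if strings.EqualFold(lit, "true") || strings.EqualFold(lit, "false") {
		return Restriction{Column: col, Value: strings.EqualFold(lit, "true")}, nil // BOOL
	}
	if len(lit) >= 2 && lit[0] == '\'' && lit[len(lit)-1] == '\'' {
		inner := lit[1 : len(lit)-1]
		// Any apostrophe left after removing '' pairs would end the
		// literal early, which means extra tokens follow it: reject.
		if !strings.Contains(strings.ReplaceAll(inner, "''", ""), "'") {
			return Restriction{Column: col, Value: strings.ReplaceAll(inner, "''", "'")}, nil
		}
	}
	return Restriction{}, bad
}

func main() {
	for _, s := range []string{"id = 42", "`id` = 42", "active = TRUE", "name = 'it''s'", "id >= 42"} {
		r, err := parseRowRestriction(s)
		fmt.Printf("%-18q -> %+v, err=%v\n", s, r, err)
	}
}
```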
Persistence and --data-dir
The C++ engine persists catalog state under a directory passed as
--data-dir on gateway_main (forwarded to emulator_main as
--data_dir). A healthy initialized tree contains:
- catalog.duckdb: DuckDB catalog (views, routines, metadata tables)
- <project>/<dataset>/*.meta.json: table sidecars
- <project>/<dataset>/*.parquet: row data for materialized tables
View definitions created through any path (CREATE VIEW DDL,
tables.insert with a view.query body) are registered in the engine
catalog and persisted to catalog.duckdb; they rehydrate on restart
when the same --data-dir is mounted.
Migrating from recidiviz / goccy --database
The widely-deployed recidiviz fork (and goccy/bigquery-emulator) used a single SQLite file:
# old compose (recidiviz fork)
command: ["--database=/opt/x.db"]
volumes:
- bq-data:/opt
This emulator expects a directory:
# this emulator
command: ["--data-dir=/opt"]
volumes:
- bq-data:/opt
| Old flag | New flag | Notes |
|---|---|---|
| --database=/opt/x.db | --data-dir=/opt | Parent directory of the old file |
| (same volume mount) | (same volume mount) | Re-use the volume; do not point --data-dir at the .db file itself |
gateway_main still accepts --database for compatibility: it maps to
--data-dir=<parent> and logs a deprecation warning. Data in the old
single-file SQLite format is not automatically loaded. If
--data-dir contains orphaned *.db files but no catalog.duckdb, the
gateway warns at startup rather than silently starting empty.
Operators migrating a live volume should either re-seed from YAML
(--seed-data-file) or confirm catalog.duckdb exists after the first
successful run under --data-dir.
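The compatibility mapping and the startup check can be sketched as two small helpers. Both function names are hypothetical, and the precedence rule (an explicit --data-dir wins over --database) is an assumption that this document does not confirm.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// resolveDataDir maps the legacy --database value to its parent directory.
// Assumption: an explicit --data-dir takes precedence when both are set.
// deprecated reports whether a deprecation warning should be logged.
func resolveDataDir(dataDir, database string) (dir string, deprecated bool) {
	if dataDir != "" {
		return dataDir, false
	}
	if database != "" {
		return filepath.Dir(database), true
	}
	return "", false
}

// hasOrphanedSQLite reports whether dir holds *.db files but no
// catalog.duckdb, the situation the gateway warns about at startup.
func hasOrphanedSQLite(dir string) (bool, error) {
	_, err := os.Stat(filepath.Join(dir, "catalog.duckdb"))
	if err == nil {
		return false, nil // catalog present: nothing is orphaned
	}
	if !errors.Is(err, fs.ErrNotExist) {
		return false, err
	}
	matches, err := filepath.Glob(filepath.Join(dir, "*.db"))
	if err != nil {
		return false, err
	}
	return len(matches) > 0, nil
}

func main() {
	dir, deprecated := resolveDataDir("", "/opt/x.db")
	fmt.Println("data dir:", dir, "deprecated flag used:", deprecated)
	orphaned, err := hasOrphanedSQLite(dir)
	fmt.Println("orphaned SQLite files:", orphaned, "err:", err)
}
```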
Authentication posture
The emulator follows cloud-spanner-emulator's posture: it parses but
ignores bearer tokens, and the BIGQUERY_EMULATOR_HOST environment
variable is the canonical client-library override (mirroring
STORAGE_EMULATOR_HOST and SPANNER_EMULATOR_HOST). Concretely, code
that targets BigQuery normally:
client, err := bigquery.NewClient(ctx, "test-project")
is redirected at the emulator with either:
client, err := bigquery.NewClient(ctx, "test-project",
	option.WithEndpoint("http://localhost:9050"),
	option.WithoutAuthentication(),
)
or by setting BIGQUERY_EMULATOR_HOST=localhost:9050 in the
environment. The README's
Client libraries
documents both forms for end users; this file documents the server-side
posture.
Every request passes through
gateway/middleware/auth.go::WithAuth, which:
- Parses the Authorization header when present (RFC 6750 Bearer tokens have the scheme stripped; other schemes are stored verbatim).
- Attaches a synthetic Principal to the request context with Email = "emulator@bigquery.local", regardless of what the client sent.
- Never short-circuits the response: well-formed, malformed, and absent Authorization headers are all served identically. The emulator never returns 401.
Handlers that need to know whether the client tried to authenticate
read the principal via middleware.PrincipalFromContext and inspect
the Anonymous and Bearer fields.
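The behaviour described above fits in a short middleware. The sketch below reuses the names WithAuth, Principal, and PrincipalFromContext from this section, but the bodies and field semantics are assumptions (in particular, Anonymous meaning "no Authorization header was sent"); it is not a copy of gateway/middleware/auth.go. The probe helper is hypothetical and exists only to exercise the middleware in memory.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// Principal is a sketch of the synthetic identity attached to requests.
type Principal struct {
	Email     string
	Anonymous bool   // assumed: true when no Authorization header was sent
	Bearer    string // token without the scheme, or the raw header for other schemes
}

type principalKey struct{}

// WithAuth parses the header but never rejects the request.
func WithAuth(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		p := Principal{Email: "emulator@bigquery.local", Anonymous: true}
		if h := r.Header.Get("Authorization"); h != "" {
			p.Anonymous = false
			if len(h) > 7 && strings.EqualFold(h[:7], "Bearer ") {
				p.Bearer = strings.TrimSpace(h[7:]) // strip the scheme
			} else {
				p.Bearer = h // other schemes are kept verbatim
			}
		}
		ctx := context.WithValue(r.Context(), principalKey{}, p)
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

// PrincipalFromContext returns the principal stored by WithAuth.
func PrincipalFromContext(ctx context.Context) (Principal, bool) {
	p, ok := ctx.Value(principalKey{}).(Principal)
	return p, ok
}

// probe sends one in-memory request and reports what the handler saw.
func probe(authHeader string) (Principal, int) {
	var seen Principal
	h := WithAuth(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		seen, _ = PrincipalFromContext(r.Context())
		w.WriteHeader(http.StatusOK)
	}))
	req := httptest.NewRequest(http.MethodGet, "/bigquery/v2/projects", nil)
	if authHeader != "" {
		req.Header.Set("Authorization", authHeader)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return seen, rec.Code
}

func main() {
	for _, h := range []string{"", "Bearer abc123", "Basic dXNlcg=="} {
		p, code := probe(h)
		fmt.Printf("%q -> %d %+v\n", h, code, p)
	}
}
```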
The full upstream auth model (ADC, service-account keys, IAM scopes) is documented under BigQuery authentication and is intentionally not modeled by the emulator.