This document describes how to create and use clustered tables inBigQuery. For an overview of clustered table support inBigQuery, see Introduction to clustered tables.
Create clustered tables
You can create a clustered table by using the following methods:
Create a table from a query result:
- Run a DDLCREATE TABLE AS SELECTstatement.
- Run a query that creates aclustered destination table.
Use a DDL
CREATE TABLE
statement with aCLUSTER BY
clause containinga clustering_column_list.Run the bq command-line tool
bq mk
command.Make calls to the tables.insertAPI method.
Load data into BigQuery.
Use the client libraries.
Table naming
When you create a table in BigQuery, the table name mustbe unique per dataset. The table name can:
- Contain characters with a total of up to 1,024 UTF-8 bytes.
- Contain Unicode characters in category L (letter), M (mark), N (number),Pc (connector, including underscore), Pd (dash), Zs (space). For moreinformation, seeGeneral Category.
The following are all examples of valid table names:table 01
, ग्राहक
, 00_お客様
, étudiant-01
.
Caveats:
- Table names are case-sensitive by default.
mytable
andMyTable
cancoexist in the same dataset, unless they are part of a dataset withcase-sensitivity turned off. - Some table names and table name prefixes are reserved. Ifyou receive an error saying that your table name or prefix isreserved, then select a different name and try again.
If you include multiple dot operators (
.
) in a sequence, the duplicateoperators are implicitly stripped.For example, this:
project_name....dataset_name..table_name
Becomes this:
project_name.dataset_name.table_name
Required permissions
To create a table, you need the following IAM permissions:
bigquery.tables.create
bigquery.tables.updateData
bigquery.jobs.create
Additionally, you might require the bigquery.tables.getData
permission toaccess the data that you write to the table.
Each of the following predefined IAM roles includes the permissions that you need in order to create a table:
roles/bigquery.dataEditor
roles/bigquery.dataOwner
roles/bigquery.admin
(includes thebigquery.jobs.create
permission)roles/bigquery.user
(includes thebigquery.jobs.create
permission)roles/bigquery.jobUser
(includes thebigquery.jobs.create
permission)
Additionally, if you have the bigquery.datasets.create
permission, you cancreate and update tables in the datasets that you create.
For more information on IAM roles and permissions inBigQuery, seePredefined roles and permissions.
Create an empty clustered table with a schema definition
You specify clustering columns when you create a table inBigQuery. After the table is created, you can modify theclustering columns; see Modifying clustering specificationfor details.
Clustering columns must be top-level, non-repeated columns, and they must be oneof the following simple data types:
DATE
BOOLEAN
GEOGRAPHY
INTEGER
NUMERIC
BIGNUMERIC
STRING
TIMESTAMP
RANGE
(preview)
You can specify up to four clustering columns. When you specify multiplecolumns, the order of the columns determines how the data is sorted. Forexample, if the table is clustered by columns a, b and c, the data is sorted inthe same order: first by column a, then by column b, and then by columnc. As a best practice, place the most frequently filtered or aggregated columnfirst.
The order of your clustering columns also affects query performance and pricing.For more information about query best practices for clustered tables, seeQuerying clustered tables.
To create an empty clustered table with a schema definition:
Console
In the Google Cloud console, go to the BigQuery page.
Go to BigQuery
- In the Explorer pane, expand your project, and then select a dataset.
- In the Dataset info section, click add_box Create table.
- In the Create table panel, specify the following details:
- In the Source section, select Empty table in the Create table from list.
- In the Destination section, specify the following details:
- For Dataset, select the dataset in which you want to create the table.
- In the Table field, enter the name of the table that you want to create.
- Verify that the Table type field is set to Native table.
- In the Schema section, enter the schema definition. You can enter schema information manually by using one of the following methods:
- Option 1: Click Edit as text and paste the schema in the form of a JSON array. When you use a JSON array, you generate the schema using the same process as creating a JSON schema file. You can view the schema of an existing table in JSON format by entering the following command:
bq show --format=prettyjson dataset.table
- Option 2: Click add_box Add field and enter the table schema. Specify each field's Name, Type, and Mode.
- Option 1: Click Edit as text and paste the schema in the form of a JSON array. When you use a JSON array, you generate the schema using the same process as creating a JSON schema file. You can view the schema of an existing table in JSON format by entering the following command:
- For Clustering order, enter between one and four comma-separated column names.
- Optional: In the Advanced options section, if you want to use a customer-managed encryption key, then select the Use a customer-managed encryption key (CMEK) option. By default, BigQuery encrypts customer content stored at rest by using a Google-managed key.
- Click Create table.
SQL
Use the CREATE TABLE DDL statementcommand with the CLUSTER BY
option. The following example creates aclustered table named myclusteredtable
in mydataset
:
In the Google Cloud console, go to the BigQuery page.
Go to BigQuery
In the query editor, enter the following statement:
CREATE TABLE mydataset.myclusteredtable( customer_id STRING, transaction_amount NUMERIC)CLUSTER BY customer_id OPTIONS ( description = 'a table clustered by customer_id');
Click play_circle Run.
For more information about how to run queries, see Run an interactive query.
bq
Use the bq mk commandwith the following flags:
--table
(or the-t
shortcut).--schema
. You can supply the table's schema definition inline oruse a JSON schema file.--clustering_fields
. You can specify up to four clustering columns.
Optional parameters include --expiration
, --description
,--time_partitioning_type
, --time_partitioning_field
,--time_partitioning_expiration
, --destination_kms_key
, and --label
.
If you are creating a table in a project other than your default project,add the project ID to the dataset in the following format:project_id:dataset
.
--destination_kms_key
is not demonstrated here. For information aboutusing --destination_kms_key
, seecustomer-managed encryption keys.
Enter the following command to create an empty clustered table with aschema definition:
bq mk \ --table \ --expiration INTEGER1 \ --schema SCHEMA \ --clustering_fields CLUSTER_COLUMNS \ --description "DESCRIPTION" \ --label KEY:VALUE,KEY:VALUE \ PROJECT_ID:DATASET.TABLE
Replace the following:
INTEGER1
: the default lifetime, in seconds, forthe table.The minimum value is 3,600 seconds (one hour). The expirationtime evaluates to the current UTC time plus the integer value. If you setthe table's expiration time when you create a table, the dataset's defaulttable expiration setting is ignored. Setting this value deletes the tableafter the specified time.SCHEMA
: an inline schema definition in the formatCOLUMN:DATA_TYPE,COLUMN:DATA_TYPE
or the path tothe JSON schema file on your local machine.CLUSTER_COLUMNS
: a comma-separated list of up tofour clustering columns. The list cannot contain any spaces.DESCRIPTION
: a description of the table, in quotes.KEY:VALUE
: the key-value pair that represents alabel. You can enter multiple labels usinga comma-separated list.PROJECT_ID
: your project ID.DATASET
: a dataset in your project.TABLE
: the name of the table you're creating.
When you specify the schema on the command line, you cannot include aRECORD
(STRUCT)type, you cannot include a column description, and youcannot specify the column's mode. All modes default to NULLABLE
. Toinclude descriptions, modes, and RECORD
types, supply a JSON schemafile instead.
Examples:
Enter the following command to create a clustered tablenamed myclusteredtable
in mydataset
in your default project. The table'sexpiration is set to 2,592,000 (1 30-day month), the description is set toThis is my clustered table
, and the label is set toorganization:development
. The command uses the -t
shortcut instead of--table
.
The schema is specified inline as:timestamp:timestamp,customer_id:string,transaction_amount:float
. Thespecified clustering field customer_id
is used to cluster the table.
bq mk \ -t \ --expiration 2592000 \ --schema 'timestamp:timestamp,customer_id:string,transaction_amount:float' \ --clustering_fields customer_id \ --description "This is my clustered table" \ --label org:dev \ mydataset.myclusteredtable
Enter the following command to create a clustered table namedmyclusteredtable
in myotherproject
, not your default project. Thedescription is set to This is my clustered table
, and the label is setto organization:development
. The command uses the -t
shortcut instead of--table
. This command does not specify a table expiration. If the datasethas a default table expiration, it is applied. If the dataset has no defaulttable expiration, the table never expires.
The schema is specified in a local JSON file: /tmp/myschema.json
. Thecustomer_id
field is used to cluster the table.
bq mk \ -t \ --expiration 2592000 \ --schema /tmp/myschema.json \ --clustering_fields=customer_id \ --description "This is my clustered table" \ --label org:dev \ myotherproject:mydataset.myclusteredtable
After the table is created, you can update the table'sdescriptionand labels.
Terraform
Use thegoogle_bigquery_table
resource.
The following example creates a table named mytable
that is clusteredon the ID
and Created
columns:
resource "google_bigquery_dataset" "default" { dataset_id = "mydataset" default_partition_expiration_ms = 2592000000 # 30 days default_table_expiration_ms = 31536000000 # 365 days description = "dataset description" location = "US" max_time_travel_hours = 96 # 4 days labels = { billing_group = "accounting", pii = "sensitive" }}resource "google_bigquery_table" "default" { dataset_id = google_bigquery_dataset.default.dataset_id table_id = "mytable" deletion_protection = false # set to "true" in production clustering = ["ID", "Created"] schema = <<EOF[ { "name": "ID", "type": "INT64", "description": "Item ID" }, { "name": "Item", "type": "STRING", "mode": "NULLABLE" }, { "name": "Created", "type": "TIMESTAMP" }]EOF}
To apply your Terraform configuration in a Google Cloud project, complete the steps in the following sections.
Prepare Cloud Shell
- Launch Cloud Shell.
-
Set the default Google Cloud project where you want to apply your Terraform configurations.
You only need to run this command once per project, and you can run it in any directory.
export GOOGLE_CLOUD_PROJECT=PROJECT_ID
Environment variables are overridden if you set explicit values in the Terraform configuration file.
Prepare the directory
Each Terraform configuration file must have its own directory (alsocalled a root module).
- In Cloud Shell, create a directory and a new file within that directory. The filename must have the
.tf
extension—for examplemain.tf
. In this tutorial, the file is referred to asmain.tf
.mkdir DIRECTORY && cd DIRECTORY && touch main.tf
-
If you are following a tutorial, you can copy the sample code in each section or step.
Copy the sample code into the newly created
main.tf
.Optionally, copy the code from GitHub. This is recommended when the Terraform snippet is part of an end-to-end solution.
- Review and modify the sample parameters to apply to your environment.
- Save your changes.
- Initialize Terraform. You only need to do this once per directory.
terraform init
Optionally, to use the latest Google provider version, include the
-upgrade
option:terraform init -upgrade
Apply the changes
- Review the configuration and verify that the resources that Terraform is going to create or update match your expectations:
terraform plan
Make corrections to the configuration as necessary.
- Apply the Terraform configuration by running the following command and entering
yes
at the prompt:terraform apply
Wait until Terraform displays the "Apply complete!" message.
- Open your Google Cloud project to view the results. In the Google Cloud console, navigate to your resources in the UI to make sure that Terraform has created or updated them.
API
Call the tables.insertmethod with a defined table resourcethat specifies the clustering.fields
property and the schema
property.
Python
Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
from google.cloud import bigquery# Construct a BigQuery client object.client = bigquery.Client()# TODO(developer): Set table_id to the ID of the table to create.# table_id = "your-project.your_dataset.your_table_name"schema = [ bigquery.SchemaField("full_name", "STRING"), bigquery.SchemaField("city", "STRING"), bigquery.SchemaField("zipcode", "INTEGER"),]table = bigquery.Table(table_id, schema=schema)table.clustering_fields = ["city", "zipcode"]table = client.create_table(table) # Make an API request.print( "Created clustered table {}.{}.{}".format( table.project, table.dataset_id, table.table_id ))
Go
Before trying this sample, follow the Go setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Go API reference documentation. To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries. import ("context""fmt""time""cloud.google.com/go/bigquery")// createTableClustered demonstrates creating a BigQuery table with advanced properties like// partitioning and clustering features.func createTableClustered(projectID, datasetID, tableID string) error {// projectID := "my-project-id"// datasetID := "mydatasetid"// tableID := "mytableid"ctx := context.Background()client, err := bigquery.NewClient(ctx, projectID)if err != nil {return fmt.Errorf("bigquery.NewClient: %v", err)}defer client.Close()sampleSchema := bigquery.Schema{{Name: "timestamp", Type: bigquery.TimestampFieldType},{Name: "origin", Type: bigquery.StringFieldType},{Name: "destination", Type: bigquery.StringFieldType},{Name: "amount", Type: bigquery.NumericFieldType},}metaData := &bigquery.TableMetadata{Schema: sampleSchema,TimePartitioning: &bigquery.TimePartitioning{Field: "timestamp",Expiration: 90 * 24 * time.Hour,},Clustering: &bigquery.Clustering{Fields: []string{"origin", "destination"},},}tableRef := client.Dataset(datasetID).Table(tableID)if err := tableRef.Create(ctx, metaData); err != nil {return err}return nil}
Java
Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
import com.google.cloud.bigquery.BigQuery;import com.google.cloud.bigquery.BigQueryException;import com.google.cloud.bigquery.BigQueryOptions;import com.google.cloud.bigquery.Clustering;import com.google.cloud.bigquery.Field;import com.google.cloud.bigquery.Schema;import com.google.cloud.bigquery.StandardSQLTypeName;import com.google.cloud.bigquery.StandardTableDefinition;import com.google.cloud.bigquery.TableId;import com.google.cloud.bigquery.TableInfo;import com.google.cloud.bigquery.TimePartitioning;import com.google.common.collect.ImmutableList;public class CreateClusteredTable { public static void runCreateClusteredTable() { // TODO(developer): Replace these variables before running the sample. String datasetName = "MY_DATASET_NAME"; String tableName = "MY_TABLE_NAME"; createClusteredTable(datasetName, tableName); } public static void createClusteredTable(String datasetName, String tableName) { try { // Initialize client that will be used to send requests. This client only needs to be created // once, and can be reused for multiple requests. BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService(); TableId tableId = TableId.of(datasetName, tableName); TimePartitioning partitioning = TimePartitioning.of(TimePartitioning.Type.DAY); Schema schema = Schema.of( Field.of("name", StandardSQLTypeName.STRING), Field.of("post_abbr", StandardSQLTypeName.STRING), Field.of("date", StandardSQLTypeName.DATE)); Clustering clustering = Clustering.newBuilder().setFields(ImmutableList.of("name", "post_abbr")).build(); StandardTableDefinition tableDefinition = StandardTableDefinition.newBuilder() .setSchema(schema) .setTimePartitioning(partitioning) .setClustering(clustering) .build(); TableInfo tableInfo = TableInfo.newBuilder(tableId, tableDefinition).build(); bigquery.create(tableInfo); System.out.println("Clustered table created successfully"); } catch (BigQueryException e) { System.out.println("Clustered table was not created. \n" + e.toString()); } }}
Create a clustered table from a query result
There are two ways to create a clustered table from a query result:
- Write the results to a new destination table and specify the clusteringcolumns.
- By using a DDL
CREATE TABLE AS SELECT
statement. For more information aboutthis method, seeCreating a clustered table from the result of a queryon the Using data definition language statementspage.
You can create a clustered table by querying either a partitioned table or anon-partitioned table. You cannot change an existing table to a clustered tableby using query results.
When you create a clustered table from a query result, you must use standardSQL. Currently, legacy SQL is not supported for querying clustered tables orfor writing query results to clustered tables.
SQL
To create a clustered table from a query result, use theCREATE TABLE DDL statementwith the CLUSTER BY
option. The following example creates a newtable clustered by customer_id
by querying an existing unclustered table:
In the Google Cloud console, go to the BigQuery page.
Go to BigQuery
In the query editor, enter the following statement:
CREATE TABLE mydataset.clustered_table( customer_id STRING, transaction_amount NUMERIC)CLUSTER BY customer_idAS ( SELECT * FROM mydataset.unclustered_table);
Click play_circle Run.
For more information about how to run queries, see Run an interactive query.
bq
Enter the following command to create a new, clustered destinationtable from a query result:
bq --location=LOCATION query \ --use_legacy_sql=false 'QUERY'
Replace the following:
LOCATION
: the name of your location. The--location
flag isoptional. For example, if you are using BigQuery in theTokyo region, you can set the flag's value toasia-northeast1
. Youcan set a default value for the location using the.bigqueryrc file.QUERY
: a query in GoogleSQL syntax. Currently, you cannotuse legacy SQL to query clustered tables or to write query results toclustered tables. The query can contain aCREATE TABLE
DDLstatement that specifies the options for creating your clustered table.You can use DDL rather than specifying the individual command-lineflags.
Examples:
Enter the following command to write query results to a clustereddestination table named myclusteredtable
in mydataset
. mydataset
is inyour default project. The query retrieves data from a non-partitioned table:mytable. The table's customer_id
column is used to cluster thetable. The table's timestamp
column is used to create a partitioned table.
bq query --use_legacy_sql=false \ 'CREATE TABLE mydataset.myclusteredtable PARTITION BY DATE(timestamp) CLUSTER BY customer_id AS ( SELECT * FROM `mydataset.mytable` );'
API
To save query results to a clustered table,call the jobs.insertmethod, configure aquery job,and include a CREATE TABLE
DDLstatement that creates your clustered table.
Specify your location in the location
property in thejobReference
section of thejob resource.
Create a clustered table when you load data
You can create a clustered table by specifying clustering columns when you loaddata into a new table. You do not need to create an empty table before loadingdata into it. You can create the clustered table and load your data at the sametime.
For more information about loading data, seeIntroduction to loading data into BigQuery.
To define clustering when defining a load job:
SQL
Use the LOAD DATA statement.The following example loads AVRO data to create a table that is partitionedby the transaction_date
field and clustered by the customer_id
field.It also configures the partitions to expire after three days.
In the Google Cloud console, go to the BigQuery page.
Go to BigQuery
In the query editor, enter the following statement:
LOAD DATA INTO mydataset.mytablePARTITION BY transaction_dateCLUSTER BY customer_id OPTIONS ( partition_expiration_days = 3)FROM FILES( format = 'AVRO', uris = ['gs://bucket/path/file.avro']);
Click play_circle Run.
For more information about how to run queries, see Run an interactive query.
API
To define a clustering configuration when creating a table through aload job, you can populate theClusteringproperties for the table.
Go
Before trying this sample, follow the Go setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Go API reference documentation. To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries. import ("context""fmt""cloud.google.com/go/bigquery")// importClusteredTable demonstrates creating a table from a load job and defining partitioning and clustering// properties.func importClusteredTable(projectID, destDatasetID, destTableID string) error {// projectID := "my-project-id"// datasetID := "mydataset"// tableID := "mytable"ctx := context.Background()client, err := bigquery.NewClient(ctx, projectID)if err != nil {return fmt.Errorf("bigquery.NewClient: %v", err)}defer client.Close()gcsRef := bigquery.NewGCSReference("gs://cloud-samples-data/bigquery/sample-transactions/transactions.csv")gcsRef.SkipLeadingRows = 1gcsRef.Schema = bigquery.Schema{{Name: "timestamp", Type: bigquery.TimestampFieldType},{Name: "origin", Type: bigquery.StringFieldType},{Name: "destination", Type: bigquery.StringFieldType},{Name: "amount", Type: bigquery.NumericFieldType},}loader := client.Dataset(destDatasetID).Table(destTableID).LoaderFrom(gcsRef)loader.TimePartitioning = &bigquery.TimePartitioning{Field: "timestamp",}loader.Clustering = &bigquery.Clustering{Fields: []string{"origin", "destination"},}loader.WriteDisposition = bigquery.WriteEmptyjob, err := loader.Run(ctx)if err != nil {return err}status, err := job.Wait(ctx)if err != nil {return err}if status.Err() != nil {return fmt.Errorf("job completed with error: %v", status.Err())}return nil}
Java
Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
import com.google.cloud.bigquery.BigQuery;import com.google.cloud.bigquery.BigQueryException;import com.google.cloud.bigquery.BigQueryOptions;import com.google.cloud.bigquery.Clustering;import com.google.cloud.bigquery.Field;import com.google.cloud.bigquery.FormatOptions;import com.google.cloud.bigquery.Job;import com.google.cloud.bigquery.JobInfo;import com.google.cloud.bigquery.LoadJobConfiguration;import com.google.cloud.bigquery.Schema;import com.google.cloud.bigquery.StandardSQLTypeName;import com.google.cloud.bigquery.TableId;import com.google.cloud.bigquery.TimePartitioning;import com.google.common.collect.ImmutableList;public class LoadTableClustered { public static void runLoadTableClustered() throws Exception { // TODO(developer): Replace these variables before running the sample. String datasetName = "MY_DATASET_NAME"; String tableName = "MY_TABLE_NAME"; String sourceUri = "/path/to/file.csv"; loadTableClustered(datasetName, tableName, sourceUri); } public static void loadTableClustered(String datasetName, String tableName, String sourceUri) throws Exception { try { // Initialize client that will be used to send requests. This client only needs to be created // once, and can be reused for multiple requests. BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService(); TableId tableId = TableId.of(datasetName, tableName); Schema schema = Schema.of( Field.of("name", StandardSQLTypeName.STRING), Field.of("post_abbr", StandardSQLTypeName.STRING), Field.of("date", StandardSQLTypeName.DATE)); TimePartitioning partitioning = TimePartitioning.of(TimePartitioning.Type.DAY); Clustering clustering = Clustering.newBuilder().setFields(ImmutableList.of("name", "post_abbr")).build(); LoadJobConfiguration loadJobConfig = LoadJobConfiguration.builder(tableId, sourceUri) .setFormatOptions(FormatOptions.csv()) .setSchema(schema) .setTimePartitioning(partitioning) .setClustering(clustering) .build(); Job loadJob = bigquery.create(JobInfo.newBuilder(loadJobConfig).build()); // Load data from a GCS parquet file into the table // Blocks until this load table job completes its execution, either failing or succeeding. Job completedJob = loadJob.waitFor(); // Check for errors if (completedJob == null) { throw new Exception("Job not executed since it no longer exists."); } else if (completedJob.getStatus().getError() != null) { // You can also look at queryJob.getStatus().getExecutionErrors() for all // errors, not just the latest one. throw new Exception( "BigQuery was unable to load into the table due to an error: \n" + loadJob.getStatus().getError()); } System.out.println("Data successfully loaded into clustered table during load job"); } catch (BigQueryException | InterruptedException e) { System.out.println("Data not loaded into clustered table during load job \n" + e.toString()); } }}
Python
Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
from google.cloud import bigquery# Construct a BigQuery client object.client = bigquery.Client()# TODO(developer): Set table_id to the ID of the table to create.# table_id = "your-project.your_dataset.your_table_name"job_config = bigquery.LoadJobConfig( skip_leading_rows=1, source_format=bigquery.SourceFormat.CSV, schema=[ bigquery.SchemaField("timestamp", bigquery.SqlTypeNames.TIMESTAMP), bigquery.SchemaField("origin", bigquery.SqlTypeNames.STRING), bigquery.SchemaField("destination", bigquery.SqlTypeNames.STRING), bigquery.SchemaField("amount", bigquery.SqlTypeNames.NUMERIC), ], time_partitioning=bigquery.TimePartitioning(field="timestamp"), clustering_fields=["origin", "destination"],)job = client.load_table_from_uri( ["gs://cloud-samples-data/bigquery/sample-transactions/transactions.csv"], table_id, job_config=job_config,)job.result() # Waits for the job to complete.table = client.get_table(table_id) # Make an API request.print( "Loaded {} rows and {} columns to {}".format( table.num_rows, len(table.schema), table_id ))
Control access to clustered tables
To configure access to tables and views, you can grant anIAM role to an entity at the following levels, listed inorder of range of resources allowed (largest to smallest):
- a high level in theGoogle Cloud resource hierarchysuch as the project, folder, or organization level
- the dataset level
- the table or view level
You can also restrict data access within tables, by using the followingmethods:
- column-level security
- column data masking
- row-level security
Access with any resource protected by IAM is additive. Forexample, if an entity does not have access at the high level such as a project,you could grant the entity access at the dataset level, and then the entity willhave access to the tables and views in the dataset. Similarly, if the entitydoes not have access at the high level or the dataset level, you could grant theentity access at the table or view level.
Granting IAM roles at a higher level in the Google Cloudresource hierarchysuch as the project, folder, or organization level gives the entity access to abroad set of resources. For example, granting a role to an entity at the projectlevel gives that entity permissions that apply to all datasets throughout theproject.
Granting a role at the dataset level specifies the operations an entity isallowed to perform on tables and views in that specific dataset, even if theentity does not have access at a higher level. For information on configuringdataset-level access controls, seeControlling access to datasets.
Granting a role at the table or view level specifies the operations an entity isallowed to perform on specific tables and views, even if the entity does nothave access at a higher level. For information on configuring table-level accesscontrols, seeControlling access to tables and views.
You can also create IAM custom roles.If you create a custom role, the permissions you grant depend on the specificoperations you want the entity to be able to perform.
You can't set a "deny" permission on any resource protected byIAM.
For more information about roles and permissions, see Understanding rolesin the IAM documentation and the BigQueryIAM roles and permissions.
Use clustered tables
Get information about clustered tables
You can get information about tables in the following ways:
- Using the Google Cloud console.
- Using the bq command-line tool's
bq show
command. - Calling the tables.getAPI method.
- Querying INFORMATION_SCHEMA views.
Required permissions
At a minimum, to get information about tables, you must be grantedbigquery.tables.get
permissions. The following predefined IAMroles include bigquery.tables.get
permissions:
bigquery.metadataViewer
bigquery.dataViewer
bigquery.dataOwner
bigquery.dataEditor
bigquery.admin
In addition, if a user has bigquery.datasets.create
permissions, when thatuser creates a dataset, they are granted bigquery.dataOwner
access to it.bigquery.dataOwner
access gives the user the ability to get information abouttables in a dataset.
For more information about IAM roles and permissions inBigQuery, seePredefined roles and permissions.
Get clustered table information
To view information about a clustered table:
Console
In the Google Cloud console, go to the Resources pane.Click your dataset name to expand it, and then click the table nameyou want to view.
Click Details. This page displays the table'sdetails including the clustering columns.
SQL
For clustered tables, you can query the CLUSTERING_ORDINAL_POSITION
columnin the INFORMATION_SCHEMA.COLUMNS viewto find the 1-indexed offset of the column within the table's clusteringcolumns:
In the Google Cloud console, go to the BigQuery page.
Go to BigQuery
In the query editor, enter the following statement:
CREATE TABLE mydataset.data (column1 INT64, column2 INT64)CLUSTER BY column1, column2;SELECT column_name, clustering_ordinal_positionFROM mydataset.INFORMATION_SCHEMA.COLUMNS;
Click play_circle Run.
For more information about how to run queries, see Run an interactive query.
The clustering ordinal position is 1 for column1
and 2 for column2
.More table metadata is available through the TABLES
, TABLE_OPTIONS
,COLUMNS
, and COLUMN_FIELD_PATH
views in INFORMATION_SCHEMA.
bq
Issue the bq show
command to display all table information. Use the--schema
flag to display only table schema information. The --format
flag can be used to control the output.
If you are getting information about a table in a project other thanyour default project, add the project ID to the dataset in the followingformat: project_id:dataset
.
bq show \ --schema \ --format=prettyjson \ PROJECT_ID:DATASET.TABLE
Replace the following:
PROJECT_ID
: your project IDDATASET
: the name of the datasetTABLE
: the name of the table
Examples:
Enter the following command to display all information aboutmyclusteredtable
in mydataset
. mydataset
in your default project.
bq show --format=prettyjson mydataset.myclusteredtable
The output should look like the following:
{ "clustering": { "fields": [ "customer_id" ] },...}
API
Call the bigquery.tables.getmethod and provide any relevant parameters.
List clustered tables in a dataset
You can list clustered tables in datasets in the following ways:
- Using the Google Cloud console.
- Using the bq command-line tool's
bq ls
command. - Calling the tables.list APImethod.
- Using the client libraries.
- Querying the
CLUSTERING_ORDINAL_POSITION
column in theINFORMATION_SCHEMA.COLUMNS
view.
The permissions required to list clustered tables and the steps to list themare the same as for standard tables. For more information about listingtables, seeListing tables in a dataset.
Modify clustering specification
You can change or remove a table's clustering specifications, or change the setof clustered columns in a clustered table. This method of updating theclustering column set is useful for tables that use continuous streaming insertsbecause those tables cannot be easily swapped by other methods.
Follow these steps to apply a new clustering specification to unpartitioned orpartitioned tables.
In the bq tool, update the clustering specification of yourtable to match the new clustering:
bq update --clustering_fields=CLUSTER_COLUMN DATASET.ORIGINAL_TABLE
Replace the following:
CLUSTER_COLUMN
: the column you are clustering on—for example,mycolumn
DATASET
: the name of the dataset containing the table—for example,mydataset
ORIGINAL_TABLE
: the name of your original table—for example,mytable
You can also call the
tables.update
ortables.patch
API method to modify the clustering specification.To cluster all rows according to the new clustering specification,run the following
UPDATE
statement:UPDATE DATASET.ORIGINAL_TABLE SET CLUSTER_COLUMN=CLUSTER_COLUMN WHERE true
Table security
To control access to tables in BigQuery, seeIntroduction to table access controls.
What's next
- For information about querying clustered tables, seeQuerying clustered tables.
- For an overview of partitioned table support in BigQuery, seeIntroduction to partitioned tables.
- To learn how to create partitioned tables, seeCreating partitioned tables.
- To see an overview of
INFORMATION_SCHEMA
, go toIntroduction to BigQuery INFORMATION_SCHEMA.