Create and use clustered tables  |  BigQuery  |  Google Cloud

This document describes how to create and use clustered tables in BigQuery. For an overview of clustered table support in BigQuery, see Introduction to clustered tables.

Create clustered tables

You can create a clustered table by using the following methods:

  • Create a table from a query result:

    • Run a DDL CREATE TABLE AS SELECT statement.
    • Run a query that creates a clustered destination table.
  • Use a DDL CREATE TABLE statement with a CLUSTER BY clause containing a clustering_column_list.

  • Run the bq command-line tool bq mk command.

  • Make calls to the tables.insert API method.

  • Load data into BigQuery.

  • Use the client libraries.

Table naming

When you create a table in BigQuery, the table name must be unique per dataset. The table name can:

  • Contain characters with a total of up to 1,024 UTF-8 bytes.
  • Contain Unicode characters in category L (letter), M (mark), N (number), Pc (connector, including underscore), Pd (dash), Zs (space). For more information, see General Category.

The following are all examples of valid table names: table 01, ग्राहक, 00_お客様, étudiant-01.

Caveats:

  • Table names are case-sensitive by default. mytable and MyTable can coexist in the same dataset, unless they are part of a dataset with case-sensitivity turned off.
  • Some table names and table name prefixes are reserved. If you receive an error saying that your table name or prefix is reserved, then select a different name and try again.
  • If you include multiple dot operators (.) in a sequence, the duplicate operators are implicitly stripped.

    For example, this: project_name....dataset_name..table_name

    Becomes this: project_name.dataset_name.table_name

Required permissions

To create a table, you need the following IAM permissions:

  • bigquery.tables.create
  • bigquery.tables.updateData
  • bigquery.jobs.create

Additionally, you might require the bigquery.tables.getData permission to access the data that you write to the table.

Each of the following predefined IAM roles includes the permissions that you need in order to create a table:

  • roles/bigquery.dataEditor
  • roles/bigquery.dataOwner
  • roles/bigquery.admin (includes the bigquery.jobs.create permission)
  • roles/bigquery.user (includes the bigquery.jobs.create permission)
  • roles/bigquery.jobUser (includes the bigquery.jobs.create permission)

Additionally, if you have the bigquery.datasets.create permission, you can create and update tables in the datasets that you create.

For more information on IAM roles and permissions in BigQuery, see Predefined roles and permissions.

Create an empty clustered table with a schema definition

You specify clustering columns when you create a table in BigQuery. After the table is created, you can modify the clustering columns; see Modify clustering specification for details.

Clustering columns must be top-level, non-repeated columns, and they must be one of the following simple data types:

  • DATE
  • BOOLEAN
  • GEOGRAPHY
  • INTEGER
  • NUMERIC
  • BIGNUMERIC
  • STRING
  • TIMESTAMP
  • RANGE (preview)

You can specify up to four clustering columns. When you specify multiple columns, the order of the columns determines how the data is sorted. For example, if the table is clustered by columns a, b, and c, the data is sorted in the same order: first by column a, then by column b, and then by column c. As a best practice, place the most frequently filtered or aggregated column first.

The order of your clustering columns also affects query performance and pricing. For more information about query best practices for clustered tables, see Querying clustered tables.
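For example, given the myclusteredtable table that is created later in this document (the filter value here is illustrative), a query that filters on the leading clustering column customer_id scans only the blocks that contain matching values rather than the entire table:

SELECT
  SUM(transaction_amount) AS total_amount
FROM
  mydataset.myclusteredtable
WHERE
  customer_id = 'customer_123'  -- filter on the first clustering column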

To create an empty clustered table with a schema definition:

Console

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the Explorer pane, expand your project, and then select a dataset.
  3. In the Dataset info section, click add_box Create table.
  4. In the Create table panel, specify the following details:
    1. In the Source section, select Empty table in the Create table from list.
    2. In the Destination section, specify the following details:
      1. For Dataset, select the dataset in which you want to create the table.
      2. In the Table field, enter the name of the table that you want to create.
      3. Verify that the Table type field is set to Native table.
    3. In the Schema section, enter the schema definition. You can enter schema information manually by using one of the following methods:
      • Option 1: Click Edit as text and paste the schema in the form of a JSON array. When you use a JSON array, you generate the schema using the same process as creating a JSON schema file. You can view the schema of an existing table in JSON format by entering the following command:
         bq show --format=prettyjson dataset.table 
      • Option 2: Click add_box Add field and enter the table schema. Specify each field's Name, Type, and Mode.
    4. For Clustering order, enter between one and four comma-separated column names.
    5. Optional: In the Advanced options section, if you want to use a customer-managed encryption key, then select the Use a customer-managed encryption key (CMEK) option. By default, BigQuery encrypts customer content stored at rest by using a Google-managed key.
    6. Click Create table.

SQL

Use the CREATE TABLE DDL statement with the CLUSTER BY option. The following example creates a clustered table named myclusteredtable in mydataset:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, enter the following statement:

    CREATE TABLE mydataset.myclusteredtable
    (
      customer_id STRING,
      transaction_amount NUMERIC
    )
    CLUSTER BY customer_id
    OPTIONS (
      description = 'a table clustered by customer_id');
  3. Click play_circle Run.

For more information about how to run queries, see Run an interactive query.

bq

Use the bq mk command with the following flags:

  • --table (or the -t shortcut).
  • --schema. You can supply the table's schema definition inline or use a JSON schema file.
  • --clustering_fields. You can specify up to four clustering columns.

Optional parameters include --expiration, --description, --time_partitioning_type, --time_partitioning_field, --time_partitioning_expiration, --destination_kms_key, and --label.

If you are creating a table in a project other than your default project, add the project ID to the dataset in the following format: project_id:dataset.

--destination_kms_key is not demonstrated here. For information about using --destination_kms_key, see customer-managed encryption keys.

Enter the following command to create an empty clustered table with a schema definition:

bq mk \
    --table \
    --expiration INTEGER1 \
    --schema SCHEMA \
    --clustering_fields CLUSTER_COLUMNS \
    --description "DESCRIPTION" \
    --label KEY:VALUE,KEY:VALUE \
    PROJECT_ID:DATASET.TABLE

Replace the following:

  • INTEGER1: the default lifetime, in seconds, for the table. The minimum value is 3,600 seconds (one hour). The expiration time evaluates to the current UTC time plus the integer value. If you set the table's expiration time when you create a table, the dataset's default table expiration setting is ignored. Setting this value deletes the table after the specified time.
  • SCHEMA: an inline schema definition in the format COLUMN:DATA_TYPE,COLUMN:DATA_TYPE, or the path to the JSON schema file on your local machine.
  • CLUSTER_COLUMNS: a comma-separated list of up to four clustering columns. The list cannot contain any spaces.
  • DESCRIPTION: a description of the table, in quotes.
  • KEY:VALUE: the key-value pair that represents a label. You can enter multiple labels using a comma-separated list.
  • PROJECT_ID: your project ID.
  • DATASET: a dataset in your project.
  • TABLE: the name of the table you're creating.

When you specify the schema on the command line, you cannot include a RECORD (STRUCT) type, you cannot include a column description, and you cannot specify the column's mode. All modes default to NULLABLE. To include descriptions, modes, and RECORD types, supply a JSON schema file instead.
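For reference, the following is a minimal sketch of such a JSON schema file (the field names here are illustrative and are not part of the examples in this document). It sets a mode and a description, and nests fields under a RECORD column:

[
  {
    "name": "customer_id",
    "type": "STRING",
    "mode": "REQUIRED",
    "description": "Unique customer identifier"
  },
  {
    "name": "address",
    "type": "RECORD",
    "mode": "NULLABLE",
    "fields": [
      {"name": "city", "type": "STRING"},
      {"name": "zipcode", "type": "STRING"}
    ]
  }
]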

Examples:

Enter the following command to create a clustered table named myclusteredtable in mydataset in your default project. The table's expiration is set to 2,592,000 seconds (30 days), the description is set to This is my clustered table, and the label is set to organization:development. The command uses the -t shortcut instead of --table.

The schema is specified inline as timestamp:timestamp,customer_id:string,transaction_amount:float. The specified clustering field customer_id is used to cluster the table.

bq mk \
    -t \
    --expiration 2592000 \
    --schema 'timestamp:timestamp,customer_id:string,transaction_amount:float' \
    --clustering_fields customer_id \
    --description "This is my clustered table" \
    --label organization:development \
    mydataset.myclusteredtable

Enter the following command to create a clustered table named myclusteredtable in myotherproject, not your default project. The description is set to This is my clustered table, and the label is set to organization:development. The command uses the -t shortcut instead of --table. This command does not specify a table expiration. If the dataset has a default table expiration, it is applied. If the dataset has no default table expiration, the table never expires.

The schema is specified in a local JSON file: /tmp/myschema.json. The customer_id field is used to cluster the table.

bq mk \
    -t \
    --schema /tmp/myschema.json \
    --clustering_fields=customer_id \
    --description "This is my clustered table" \
    --label organization:development \
    myotherproject:mydataset.myclusteredtable

After the table is created, you can update the table's description and labels.
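For example, assuming the table created above, the following command updates the description and replaces a label with bq update (the new values here are illustrative):

bq update \
    --description "Updated table description" \
    --set_label organization:production \
    mydataset.myclusteredtable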

Terraform

Use the google_bigquery_table resource.

The following example creates a table named mytable that is clustered on the ID and Created columns:

resource "google_bigquery_dataset" "default" { dataset_id = "mydataset" default_partition_expiration_ms = 2592000000 # 30 days default_table_expiration_ms = 31536000000 # 365 days description = "dataset description" location = "US" max_time_travel_hours = 96 # 4 days labels = { billing_group = "accounting", pii = "sensitive" }}resource "google_bigquery_table" "default" { dataset_id = google_bigquery_dataset.default.dataset_id table_id = "mytable" deletion_protection = false # set to "true" in production clustering = ["ID", "Created"] schema = <<EOF[ { "name": "ID", "type": "INT64", "description": "Item ID" }, { "name": "Item", "type": "STRING", "mode": "NULLABLE" }, { "name": "Created", "type": "TIMESTAMP" }]EOF}

To apply your Terraform configuration in a Google Cloud project, complete the steps in the following sections.

Prepare Cloud Shell

  1. Launch Cloud Shell.
  2. Set the default Google Cloud project where you want to apply your Terraform configurations.

    You only need to run this command once per project, and you can run it in any directory.

    export GOOGLE_CLOUD_PROJECT=PROJECT_ID

    Environment variables are overridden if you set explicit values in the Terraform configuration file.

Prepare the directory

Each Terraform configuration file must have its own directory (alsocalled a root module).

  1. In Cloud Shell, create a directory and a new file within that directory. The filename must have the .tf extension—for example main.tf. In this tutorial, the file is referred to as main.tf.
    mkdir DIRECTORY && cd DIRECTORY && touch main.tf
  2. If you are following a tutorial, you can copy the sample code in each section or step.

    Copy the sample code into the newly created main.tf.

    Optionally, copy the code from GitHub. This is recommended when the Terraform snippet is part of an end-to-end solution.

  3. Review and modify the sample parameters to apply to your environment.
  4. Save your changes.
  5. Initialize Terraform. You only need to do this once per directory.
    terraform init

    Optionally, to use the latest Google provider version, include the -upgrade option:

    terraform init -upgrade

Apply the changes

  1. Review the configuration and verify that the resources that Terraform is going to create or update match your expectations:
    terraform plan

    Make corrections to the configuration as necessary.

  2. Apply the Terraform configuration by running the following command and entering yes at the prompt:
    terraform apply

    Wait until Terraform displays the "Apply complete!" message.

  3. Open your Google Cloud project to view the results. In the Google Cloud console, navigate to your resources in the UI to make sure that Terraform has created or updated them.

API

Call the tables.insert method with a defined table resource that specifies the clustering.fields property and the schema property.
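For illustration, a minimal Tables resource body for the table used in the SQL example might look like the following sketch (not a complete request; the values are assumptions):

{
  "tableReference": {
    "projectId": "PROJECT_ID",
    "datasetId": "mydataset",
    "tableId": "myclusteredtable"
  },
  "schema": {
    "fields": [
      {"name": "customer_id", "type": "STRING"},
      {"name": "transaction_amount", "type": "NUMERIC"}
    ]
  },
  "clustering": {
    "fields": ["customer_id"]
  }
}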

Python

Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set table_id to the ID of the table to create.
# table_id = "your-project.your_dataset.your_table_name"

schema = [
    bigquery.SchemaField("full_name", "STRING"),
    bigquery.SchemaField("city", "STRING"),
    bigquery.SchemaField("zipcode", "INTEGER"),
]

table = bigquery.Table(table_id, schema=schema)
table.clustering_fields = ["city", "zipcode"]
table = client.create_table(table)  # Make an API request.
print(
    "Created clustered table {}.{}.{}".format(
        table.project, table.dataset_id, table.table_id
    )
)

Go

Before trying this sample, follow the Go setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Go API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

import (
	"context"
	"fmt"
	"time"

	"cloud.google.com/go/bigquery"
)

// createTableClustered demonstrates creating a BigQuery table with advanced properties like
// partitioning and clustering features.
func createTableClustered(projectID, datasetID, tableID string) error {
	// projectID := "my-project-id"
	// datasetID := "mydatasetid"
	// tableID := "mytableid"
	ctx := context.Background()
	client, err := bigquery.NewClient(ctx, projectID)
	if err != nil {
		return fmt.Errorf("bigquery.NewClient: %v", err)
	}
	defer client.Close()

	sampleSchema := bigquery.Schema{
		{Name: "timestamp", Type: bigquery.TimestampFieldType},
		{Name: "origin", Type: bigquery.StringFieldType},
		{Name: "destination", Type: bigquery.StringFieldType},
		{Name: "amount", Type: bigquery.NumericFieldType},
	}
	metaData := &bigquery.TableMetadata{
		Schema: sampleSchema,
		TimePartitioning: &bigquery.TimePartitioning{
			Field:      "timestamp",
			Expiration: 90 * 24 * time.Hour,
		},
		Clustering: &bigquery.Clustering{
			Fields: []string{"origin", "destination"},
		},
	}
	tableRef := client.Dataset(datasetID).Table(tableID)
	if err := tableRef.Create(ctx, metaData); err != nil {
		return err
	}
	return nil
}

Java

Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Clustering;
import com.google.cloud.bigquery.Field;
import com.google.cloud.bigquery.Schema;
import com.google.cloud.bigquery.StandardSQLTypeName;
import com.google.cloud.bigquery.StandardTableDefinition;
import com.google.cloud.bigquery.TableId;
import com.google.cloud.bigquery.TableInfo;
import com.google.cloud.bigquery.TimePartitioning;
import com.google.common.collect.ImmutableList;

public class CreateClusteredTable {

  public static void runCreateClusteredTable() {
    // TODO(developer): Replace these variables before running the sample.
    String datasetName = "MY_DATASET_NAME";
    String tableName = "MY_TABLE_NAME";
    createClusteredTable(datasetName, tableName);
  }

  public static void createClusteredTable(String datasetName, String tableName) {
    try {
      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      TableId tableId = TableId.of(datasetName, tableName);

      TimePartitioning partitioning = TimePartitioning.of(TimePartitioning.Type.DAY);

      Schema schema =
          Schema.of(
              Field.of("name", StandardSQLTypeName.STRING),
              Field.of("post_abbr", StandardSQLTypeName.STRING),
              Field.of("date", StandardSQLTypeName.DATE));

      Clustering clustering =
          Clustering.newBuilder().setFields(ImmutableList.of("name", "post_abbr")).build();

      StandardTableDefinition tableDefinition =
          StandardTableDefinition.newBuilder()
              .setSchema(schema)
              .setTimePartitioning(partitioning)
              .setClustering(clustering)
              .build();

      TableInfo tableInfo = TableInfo.newBuilder(tableId, tableDefinition).build();

      bigquery.create(tableInfo);
      System.out.println("Clustered table created successfully");
    } catch (BigQueryException e) {
      System.out.println("Clustered table was not created. \n" + e.toString());
    }
  }
}

Create a clustered table from a query result

There are two ways to create a clustered table from a query result:

  • Write the results to a new destination table and specify the clustering columns.
  • Use a DDL CREATE TABLE AS SELECT statement. For more information about this method, see Creating a clustered table from the result of a query on the Using data definition language statements page.

You can create a clustered table by querying either a partitioned table or a non-partitioned table. You cannot change an existing table to a clustered table by using query results.

When you create a clustered table from a query result, you must use GoogleSQL. Currently, legacy SQL is not supported for querying clustered tables or for writing query results to clustered tables.

SQL

To create a clustered table from a query result, use the CREATE TABLE DDL statement with the CLUSTER BY option. The following example creates a new table clustered by customer_id by querying an existing unclustered table:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, enter the following statement:

    CREATE TABLE mydataset.clustered_table
    (
      customer_id STRING,
      transaction_amount NUMERIC
    )
    CLUSTER BY customer_id
    AS (
      SELECT * FROM mydataset.unclustered_table
    );
  3. Click play_circle Run.

For more information about how to run queries, see Run an interactive query.

bq

Enter the following command to create a new, clustered destination table from a query result:

bq --location=LOCATION query \
    --use_legacy_sql=false 'QUERY'

Replace the following:

  • LOCATION: the name of your location. The --location flag is optional. For example, if you are using BigQuery in the Tokyo region, you can set the flag's value to asia-northeast1. You can set a default value for the location using the .bigqueryrc file.
  • QUERY: a query in GoogleSQL syntax. Currently, you cannot use legacy SQL to query clustered tables or to write query results to clustered tables. The query can contain a CREATE TABLE DDL statement that specifies the options for creating your clustered table. You can use DDL rather than specifying the individual command-line flags.

Examples:

Enter the following command to write query results to a clustered destination table named myclusteredtable in mydataset. mydataset is in your default project. The query retrieves data from a non-partitioned table: mytable. The table's customer_id column is used to cluster the table. The table's timestamp column is used to create a partitioned table.

bq query --use_legacy_sql=false \
    'CREATE TABLE mydataset.myclusteredtable
       PARTITION BY DATE(timestamp)
       CLUSTER BY customer_id
     AS (
       SELECT *
       FROM `mydataset.mytable`
     );'

API

To save query results to a clustered table, call the jobs.insert method, configure a query job, and include a CREATE TABLE DDL statement that creates your clustered table.

Specify your location in the location property in the jobReference section of the job resource.
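As a rough sketch of the same flow using the Python client library (which calls jobs.insert under the hood; the table and dataset names follow the earlier examples):

from google.cloud import bigquery

client = bigquery.Client()

ddl = """
CREATE TABLE mydataset.myclusteredtable
(
  customer_id STRING,
  transaction_amount NUMERIC
)
CLUSTER BY customer_id
"""

# location corresponds to the jobReference.location property of the job resource.
query_job = client.query(ddl, location="US")
query_job.result()  # Waits for the DDL statement to finish.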

Create a clustered table when you load data

You can create a clustered table by specifying clustering columns when you load data into a new table. You do not need to create an empty table before loading data into it. You can create the clustered table and load your data at the same time.

For more information about loading data, see Introduction to loading data into BigQuery.

To define clustering as part of a load job:

SQL

Use the LOAD DATA statement. The following example loads AVRO data to create a table that is partitioned by the transaction_date field and clustered by the customer_id field. It also configures the partitions to expire after three days.

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, enter the following statement:

    LOAD DATA INTO mydataset.mytable
    PARTITION BY transaction_date
    CLUSTER BY customer_id
      OPTIONS (
        partition_expiration_days = 3)
    FROM FILES(
      format = 'AVRO',
      uris = ['gs://bucket/path/file.avro']);
  3. Click play_circle Run.

For more information about how to run queries, see Run an interactive query.

API

To define a clustering configuration when creating a table through a load job, you can populate the Clustering properties for the table.

Go

Before trying this sample, follow the Go setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Go API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

import (
	"context"
	"fmt"

	"cloud.google.com/go/bigquery"
)

// importClusteredTable demonstrates creating a table from a load job and defining partitioning and clustering
// properties.
func importClusteredTable(projectID, destDatasetID, destTableID string) error {
	// projectID := "my-project-id"
	// datasetID := "mydataset"
	// tableID := "mytable"
	ctx := context.Background()
	client, err := bigquery.NewClient(ctx, projectID)
	if err != nil {
		return fmt.Errorf("bigquery.NewClient: %v", err)
	}
	defer client.Close()

	gcsRef := bigquery.NewGCSReference("gs://cloud-samples-data/bigquery/sample-transactions/transactions.csv")
	gcsRef.SkipLeadingRows = 1
	gcsRef.Schema = bigquery.Schema{
		{Name: "timestamp", Type: bigquery.TimestampFieldType},
		{Name: "origin", Type: bigquery.StringFieldType},
		{Name: "destination", Type: bigquery.StringFieldType},
		{Name: "amount", Type: bigquery.NumericFieldType},
	}
	loader := client.Dataset(destDatasetID).Table(destTableID).LoaderFrom(gcsRef)
	loader.TimePartitioning = &bigquery.TimePartitioning{
		Field: "timestamp",
	}
	loader.Clustering = &bigquery.Clustering{
		Fields: []string{"origin", "destination"},
	}
	loader.WriteDisposition = bigquery.WriteEmpty

	job, err := loader.Run(ctx)
	if err != nil {
		return err
	}
	status, err := job.Wait(ctx)
	if err != nil {
		return err
	}
	if status.Err() != nil {
		return fmt.Errorf("job completed with error: %v", status.Err())
	}
	return nil
}

Java

Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Clustering;
import com.google.cloud.bigquery.Field;
import com.google.cloud.bigquery.FormatOptions;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.LoadJobConfiguration;
import com.google.cloud.bigquery.Schema;
import com.google.cloud.bigquery.StandardSQLTypeName;
import com.google.cloud.bigquery.TableId;
import com.google.cloud.bigquery.TimePartitioning;
import com.google.common.collect.ImmutableList;

public class LoadTableClustered {

  public static void runLoadTableClustered() throws Exception {
    // TODO(developer): Replace these variables before running the sample.
    String datasetName = "MY_DATASET_NAME";
    String tableName = "MY_TABLE_NAME";
    String sourceUri = "/path/to/file.csv";
    loadTableClustered(datasetName, tableName, sourceUri);
  }

  public static void loadTableClustered(String datasetName, String tableName, String sourceUri)
      throws Exception {
    try {
      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      TableId tableId = TableId.of(datasetName, tableName);

      Schema schema =
          Schema.of(
              Field.of("name", StandardSQLTypeName.STRING),
              Field.of("post_abbr", StandardSQLTypeName.STRING),
              Field.of("date", StandardSQLTypeName.DATE));

      TimePartitioning partitioning = TimePartitioning.of(TimePartitioning.Type.DAY);

      Clustering clustering =
          Clustering.newBuilder().setFields(ImmutableList.of("name", "post_abbr")).build();

      LoadJobConfiguration loadJobConfig =
          LoadJobConfiguration.builder(tableId, sourceUri)
              .setFormatOptions(FormatOptions.csv())
              .setSchema(schema)
              .setTimePartitioning(partitioning)
              .setClustering(clustering)
              .build();

      Job loadJob = bigquery.create(JobInfo.newBuilder(loadJobConfig).build());

      // Blocks until this load table job completes its execution, either failing or succeeding.
      Job completedJob = loadJob.waitFor();

      // Check for errors
      if (completedJob == null) {
        throw new Exception("Job not executed since it no longer exists.");
      } else if (completedJob.getStatus().getError() != null) {
        // You can also look at queryJob.getStatus().getExecutionErrors() for all
        // errors, not just the latest one.
        throw new Exception(
            "BigQuery was unable to load into the table due to an error: \n"
                + loadJob.getStatus().getError());
      }
      System.out.println("Data successfully loaded into clustered table during load job");
    } catch (BigQueryException | InterruptedException e) {
      System.out.println("Data not loaded into clustered table during load job \n" + e.toString());
    }
  }
}

Python

Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set table_id to the ID of the table to create.
# table_id = "your-project.your_dataset.your_table_name"

job_config = bigquery.LoadJobConfig(
    skip_leading_rows=1,
    source_format=bigquery.SourceFormat.CSV,
    schema=[
        bigquery.SchemaField("timestamp", bigquery.SqlTypeNames.TIMESTAMP),
        bigquery.SchemaField("origin", bigquery.SqlTypeNames.STRING),
        bigquery.SchemaField("destination", bigquery.SqlTypeNames.STRING),
        bigquery.SchemaField("amount", bigquery.SqlTypeNames.NUMERIC),
    ],
    time_partitioning=bigquery.TimePartitioning(field="timestamp"),
    clustering_fields=["origin", "destination"],
)

job = client.load_table_from_uri(
    ["gs://cloud-samples-data/bigquery/sample-transactions/transactions.csv"],
    table_id,
    job_config=job_config,
)
job.result()  # Waits for the job to complete.

table = client.get_table(table_id)  # Make an API request.
print(
    "Loaded {} rows and {} columns to {}".format(
        table.num_rows, len(table.schema), table_id
    )
)

Control access to clustered tables

To configure access to tables and views, you can grant an IAM role to an entity at the following levels, listed in order of range of resources allowed (largest to smallest):

  • a high level in the Google Cloud resource hierarchy, such as the project, folder, or organization level
  • the dataset level
  • the table or view level

You can also restrict data access within tables by using the following methods:

  • column-level security
  • column data masking
  • row-level security

Access with any resource protected by IAM is additive. For example, if an entity does not have access at a high level, such as a project, you could grant the entity access at the dataset level, and then the entity has access to the tables and views in the dataset. Similarly, if the entity does not have access at the high level or the dataset level, you could grant the entity access at the table or view level.

Granting IAM roles at a higher level in the Google Cloud resource hierarchy, such as the project, folder, or organization level, gives the entity access to a broad set of resources. For example, granting a role to an entity at the project level gives that entity permissions that apply to all datasets throughout the project.

Granting a role at the dataset level specifies the operations an entity is allowed to perform on tables and views in that specific dataset, even if the entity does not have access at a higher level. For information on configuring dataset-level access controls, see Controlling access to datasets.

Granting a role at the table or view level specifies the operations an entity is allowed to perform on specific tables and views, even if the entity does not have access at a higher level. For information on configuring table-level access controls, see Controlling access to tables and views.
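For example, table-level access can also be granted with a GRANT DCL statement. The following sketch (the table and user are assumed) gives one user read access to a single clustered table:

GRANT `roles/bigquery.dataViewer`
ON TABLE mydataset.myclusteredtable
TO "user:analyst@example.com";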

You can also create IAM custom roles. If you create a custom role, the permissions you grant depend on the specific operations you want the entity to be able to perform.

You can't set a "deny" permission on any resource protected byIAM.

For more information about roles and permissions, see Understanding roles in the IAM documentation and the BigQuery IAM roles and permissions page.

Use clustered tables

Get information about clustered tables

You can get information about tables in the following ways:

  • Using the Google Cloud console.
  • Using the bq command-line tool's bq show command.
  • Calling the tables.get API method.
  • Querying INFORMATION_SCHEMA views.

Required permissions

At a minimum, to get information about tables, you must be granted bigquery.tables.get permissions. The following predefined IAM roles include bigquery.tables.get permissions:

  • bigquery.metadataViewer
  • bigquery.dataViewer
  • bigquery.dataOwner
  • bigquery.dataEditor
  • bigquery.admin

In addition, if a user has bigquery.datasets.create permissions, when that user creates a dataset, they are granted bigquery.dataOwner access to it. bigquery.dataOwner access gives the user the ability to get information about tables in a dataset.

For more information about IAM roles and permissions in BigQuery, see Predefined roles and permissions.

Get clustered table information

To view information about a clustered table:

Console

  1. In the Google Cloud console, go to the Resources pane. Click your dataset name to expand it, and then click the table name that you want to view.

  2. Click Details. This page displays the table's details, including the clustering columns.


SQL

For clustered tables, you can query the CLUSTERING_ORDINAL_POSITION column in the INFORMATION_SCHEMA.COLUMNS view to find the 1-indexed offset of the column within the table's clustering columns:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, enter the following statement:

    CREATE TABLE mydataset.data (column1 INT64, column2 INT64)
    CLUSTER BY column1, column2;

    SELECT
      column_name, clustering_ordinal_position
    FROM
      mydataset.INFORMATION_SCHEMA.COLUMNS;
  3. Click play_circle Run.

For more information about how to run queries, see Run an interactive query.

The clustering ordinal position is 1 for column1 and 2 for column2. More table metadata is available through the TABLES, TABLE_OPTIONS, COLUMNS, and COLUMN_FIELD_PATH views in INFORMATION_SCHEMA.

bq

Issue the bq show command to display all table information. Use the --schema flag to display only table schema information. The --format flag can be used to control the output.

If you are getting information about a table in a project other than your default project, add the project ID to the dataset in the following format: project_id:dataset.

bq show \
    --schema \
    --format=prettyjson \
    PROJECT_ID:DATASET.TABLE

Replace the following:

  • PROJECT_ID: your project ID
  • DATASET: the name of the dataset
  • TABLE: the name of the table

Examples:

Enter the following command to display all information about myclusteredtable in mydataset. mydataset is in your default project.

bq show --format=prettyjson mydataset.myclusteredtable

The output should look like the following:

{ "clustering": { "fields": [ "customer_id" ] },...}

API

Call the bigquery.tables.getmethod and provide any relevant parameters.

List clustered tables in a dataset

You can list clustered tables in datasets in the following ways:

  • Using the Google Cloud console.
  • Using the bq command-line tool's bq ls command.
  • Calling the tables.list API method.
  • Using the client libraries.
  • Querying the CLUSTERING_ORDINAL_POSITION column in the INFORMATION_SCHEMA.COLUMNS view.

The permissions required to list clustered tables and the steps to list them are the same as for standard tables. For more information about listing tables, see Listing tables in a dataset.
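For example, the following query (a sketch; the dataset name is assumed) lists each clustered table in mydataset together with its clustering columns, in clustering order:

SELECT
  table_name,
  column_name,
  clustering_ordinal_position
FROM
  mydataset.INFORMATION_SCHEMA.COLUMNS
WHERE
  clustering_ordinal_position IS NOT NULL
ORDER BY
  table_name, clustering_ordinal_position;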

Modify clustering specification

You can change or remove a table's clustering specification, or change the set of clustered columns in a clustered table. This method of updating the clustering column set is useful for tables that use continuous streaming inserts, because those tables cannot be easily swapped by other methods.

Follow these steps to apply a new clustering specification to unpartitioned or partitioned tables.

  1. In the bq tool, update the clustering specification of your table to match the new clustering:

     bq update --clustering_fields=CLUSTER_COLUMN DATASET.ORIGINAL_TABLE 

    Replace the following:

    • CLUSTER_COLUMN: the column you are clustering on—for example, mycolumn
    • DATASET: the name of the dataset containing the table—for example, mydataset
    • ORIGINAL_TABLE: the name of your original table—for example, mytable

    You can also call the tables.update or tables.patch API method to modify the clustering specification.

  2. To cluster all rows according to the new clustering specification, run the following UPDATE statement:

    UPDATE DATASET.ORIGINAL_TABLE SET CLUSTER_COLUMN=CLUSTER_COLUMN WHERE true
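As a concrete sketch using names from the earlier examples (assumed, not prescriptive), the following switches myclusteredtable to cluster on transaction_amount and then rewrites all rows so that they follow the new specification:

bq update --clustering_fields=transaction_amount mydataset.myclusteredtable

UPDATE mydataset.myclusteredtable
SET transaction_amount = transaction_amount
WHERE true;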

Table security

To control access to tables in BigQuery, see Introduction to table access controls.

What's next

  • For information about querying clustered tables, see Querying clustered tables.
  • For an overview of partitioned table support in BigQuery, see Introduction to partitioned tables.
  • To learn how to create partitioned tables, see Creating partitioned tables.
  • To see an overview of INFORMATION_SCHEMA, go to Introduction to BigQuery INFORMATION_SCHEMA.

