0

Pricing

 3 years ago
source link: https://cloud.google.com/bigquery/pricing
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

Overview

BigQuery offers scalable, flexible pricing options to meet your technical needs and your budget.

Storage costs are based on the amount of data stored in BigQuery. Storage charges can be:

  • Active — A monthly charge for data stored in tables or in partitions that have been modified in the last 90 days.
  • Long-term — A lower monthly charge for data stored in tables or in partitions that have not been modified in the last 90 days.

For query costs, you can choose between two pricing models:

  • On-demand — This is the most flexible option. On-demand pricing is based on the amount of data processed by each query you run.
  • Flat-rate — This pricing option is best for customers who desire cost predictability. Flat-rate customers purchase dedicated resources for query processing and are not charged for individual queries.

For more information about storage and query pricing, see Google Cloud SKUs. Note that on-demand query pricing is referred to as analysis pricing on the SKUs page.

Pricing models apply to accounts, not individual projects, unless otherwise specified.

Pricing summary

The following table summarizes BigQuery pricing. BigQuery's quotas and limits apply to these operations.

How charges are billed

Each project you create has a billing account attached to it. Any charges incurred by BigQuery jobs run in the project are billed to the attached billing account. BigQuery storage charges are also billed to the attached billing account.

How to analyze billing data

You can view BigQuery costs and trends by using the Cloud Billing reports page in the Cloud Console. For information on analyzing billing data using reports, see View your billing reports and cost trends.

For information on analyzing your billing data in BigQuery, see Export Cloud Billing data to BigQuery in the Cloud Billing documentation.

Free operations

The following table shows BigQuery operations that are free of charge in every location. BigQuery's quotas and limits apply to these operations.

Operation Details Loading data

You are not charged for loading data from Cloud Storage or from local files into BigQuery. However, you are charged for storing data in Cloud Storage. See Data storage on the Cloud Storage pricing page for details. Once the data is loaded into BigQuery, it is subject to BigQuery Storage pricing.

If the target dataset is located in the US multi-region, you are not charged for network egress when loading from a Cloud Storage bucket in any other region. For more information, see Location considerations.

Copying data You are not charged for copying a table, but you do incur charges for storing the new table and the table you copied. For more information, see Copying an existing table. Exporting data When you export data from BigQuery to Cloud Storage, you are not charged for the export operation, but you do incur charges for storing the data in Cloud Storage. See Data storage on the Cloud Storage pricing page for details. For more information, see Exporting data from BigQuery. Deleting datasets You are not charged for deleting a dataset. Deleting tables, views, partitions, and functions You are not charged for deleting a table, deleting a view, deleting individual table partitions, or deleting a user-defined function. Metadata operations You are not charged for list, get, patch, update and delete calls. Examples include (but are not limited to): listing datasets, updating a dataset's access control list, updating a table's description, or listing user-defined functions in a dataset. Reading pseudo columns You are not charged for querying the contents of the following pseudo columns:

_TABLE_SUFFIX — Used when querying wildcard tables
_PARTITIONDATE — Used when querying ingestion-time partitioned tables
_PARTITIONTIME — Used when querying ingestion-time partitioned tables
_FILE_NAME — Used when querying tables based on external data sources Reading meta tables You are not charged for querying the contents of the following meta tables:

__PARTITIONS_SUMMARY__ — Used when getting metadata about partitions in a partitioned table or an ingestion-time partitioned table
__TABLES_SUMMARY__ — Used when getting metadata about the tables and views in a dataset Creating, replacing, or calling UDFs You are not charged for creating, replacing, or invoking persistent user-defined functions (UDFs).

Always free usage limits

As part of the Google Cloud Free Tier, BigQuery offers some resources free of charge up to a specific limit. These free usage limits are available during and after the free trial period. If you go over these usage limits and are no longer in the free trial period, you will be charged according to the pricing on this page.

Resource Monthly free usage limits Details Storage The first 10 GB per month is free. BigQuery ML models and training data stored in BigQuery are included in the BigQuery storage free tier. Queries (analysis) The first 1 TB of query data processed per month is free. Queries that use BigQuery ML prediction, inspection, and evaluation functions are included in the BigQuery analysis free tier. BigQuery ML queries that contain CREATE MODEL statements are not.
BigQuery flat-rate pricing is also available for high-volume customers that prefer a stable, monthly cost. BigQuery ML CREATE MODEL queries The first 10 GB of data processed by queries that contain CREATE MODEL statements per month is free. BigQuery ML CREATE MODEL queries are independent of the BigQuery analysis free tier, and only apply to BigQuery ML built-in models (models that are trained within BigQuery).

Query pricing

Query pricing refers to the cost of running your SQL commands, and user-defined functions, and qualifying Data Manipulation Language (DML) and Data Definition Language (DDL) statements.

BigQuery offers a choice of two pricing models:

  • On-demand pricing is flexible and efficient. You pay only for the queries you run.
  • Flat-rate pricing offers predictable and consistent month-to-month costs.

By default, you are billed according to the on-demand pricing model. You can change your billing model to flat-rate billing, or you can choose between on-demand and flat-rate billing for each specific project and location combination.

On-demand pricing

Under on-demand pricing, BigQuery charges for queries by using one metric: the number of bytes processed (also referred to as bytes read). You are charged for the number of bytes processed whether the data is stored in BigQuery or in an external data source such as Cloud Storage, Drive, or Cloud Bigtable. On-demand pricing is based solely on usage.

On-demand query pricing is as follows:

Note the following regarding query charges:

  • BigQuery uses a columnar data structure. You're charged according to the total data processed in the columns you select, and the total data per column is calculated based on the types of data in the column. For more information about how your data size is calculated, see Data size calculation.
  • You aren't charged for queries that return an error or for queries that retrieve results from the cache.
  • Charges are rounded to the nearest MB, with a minimum 10 MB data processed per table referenced by the query, and with a minimum 10 MB data processed per query.
  • Canceling a running query job might incur charges up to the full cost for the query if you let the query run to completion.
  • When you run a query, you're charged according to the data processed in the columns you select, even if you set an explicit LIMIT on the results.
  • Partitioning and clustering your tables can help reduce the amount of data processed by queries. As a best practice, use partitioning and clustering whenever possible.
  • On-demand query pricing is referred to as analysis pricing on the Google Cloud SKUs page.

On-demand query cost controls

BigQuery provides cost control mechanisms that enable you to cap your query costs. You can set:

Querying Cloud Storage data

When querying an external data source from BigQuery, you are charged for the number of bytes read by the query. For more information, see Query pricing. You are also charged for storing the data on Cloud Storage. For more information, see Cloud Storage Pricing.

Querying columnar formats on Cloud Storage

If your external data is stored in ORC or Parquet, the number of bytes charged is limited to the columns that BigQuery reads. Because the data types from an external data source are converted to BigQuery data types by the query, the number of bytes read is computed based on the size of BigQuery data types. For information about data type conversions, see the following pages:

Flat-rate pricing

BigQuery offers flat-rate pricing for customers who prefer a stable cost for queries rather than paying the on-demand price per TB of data processed.

You can choose to use flat-rate pricing using BigQuery Reservations.

When you enroll in flat-rate pricing, you purchase slot commitments—dedicated query processing capacity, measured in BigQuery slots. Your queries consume this capacity, and you are not billed for bytes processed. If your capacity demands exceed your committed capacity, BigQuery will queue up slots, and you will not be charged additional fees. For more information about how BigQuery leverages slots for query processing, see Slots.

Flat-rate pricing:

  • Applies to query costs, including BigQuery ML, DML, and DDL statements.
  • Does not apply to costs of storage, streaming ingest, or BI Engine.
  • Is purchased as a regional resource. Slot commitments purchased in one region or multi-region cannot be used in another region or multi-region and cannot be moved.
  • Allows customers to raise per-project concurrency quotas by contacting Google Cloud Support.
  • Is available in per-second, monthly, and annual commitments.
  • Can be shared across your entire organization. There is no need to buy slot commitments for every project.
  • Has a 100-slot minimum and is purchased in increments of 100 slots.
  • Is billed per second for the duration of your commitment.
Note: If you signed an offline agreement for your flat-rate purchase, your plan may deviate from the details described here. Contact your sales representative to take advantage of Reservations.Note: Customers using on-demand billing often have 2000 slots or more for query processing. Allocating 100 slots might result in worse query performance.

Monthly flat-rate commitments

The following table shows the cost of your monthly slot commitment.

Annual flat-rate commitments

The following table shows the cost of your annual slot commitment.

Flex slots: short-term commitments

Flex slots are a special commitment type:

  • Commitment duration is only 60 seconds.
  • You can cancel Flex slots any time thereafter.
  • You are charged only for the seconds your commitment was deployed.

Flex slots are subject to capacity availability. When you attempt to purchase Flex Slots, success of this purchase is not guaranteed. However, once your commitment purchase is successful, your capacity is guaranteed until you cancel it.

The following table shows the cost of your Flex slot commitment.

Trial slots (promotion)

Note: Trial slots are available only in the us and eu multi-regions.

On May 20, 2020, BigQuery introduced a limited promotion for new or returning BigQuery customers. Qualified customers can purchase Trial slots, which is a 500-slot, 6-month commitment at a deeply discounted rate.

Trial slots have the following behavior:

  • You must commit to a 6-month commitment.
  • You cannot cancel for 182 days from the moment of purchase.
  • You can only purchase 500 slots.
  • You can purchase other commitment types and combine them with Trial slots.
  • Trial slots are available only in US and EU multi-regions.
  • Trial slots have limited availability and are offered on a first-come, first-serve basis.
  • There are no performance or availability differences between Trial slots and other types of slot commitments.

Trial slots are subject to qualification criteria and are available to the following customers:

  • Brand new Google Cloud customers signing up for BigQuery
  • Existing Google Cloud customers signing up for BigQuery
  • Existing BigQuery customers whose spend over the past 3 months has not exceeded $500 per month
  • Customers who sign up using their company email addresses
  • Offer only available for purchases directly from Google; not available through resellers or distributors

For more information about how trial slots work, see Trial slots.

To participate in this promotion, fill out the BigQuery Trial Slots Promotion form, and we will get back to you in no more than five business days.

Storage pricing

Once your data is loaded into BigQuery, you are charged for storing it. Storage pricing is based on the amount of data stored in your tables when it is uncompressed.

The size of the data is calculated based on the data types of the individual columns. For a detailed explanation of how data size is calculated, see Data size calculation.

Active storage

Active storage charges are as follows:

Storage pricing is prorated per MB, per second. For example, if you store:

  • 100 MB for half a month, you pay $0.001 (a tenth of a cent)
  • 500 GB for half a month, you pay $5
  • 1 TB for a full month, you pay $20
Note: Storage usage is calculated in gigabytes (GB), where 1 GB is 230 bytes. This unit of measurement is also known as a gibibyte (GiB). Similarly, 1 TB is 240 bytes (1,024 GB).

Long-term storage

If a table is not edited for 90 consecutive days, the price of storage for that table automatically drops by approximately 50%. There is no degradation of performance, durability, availability, or any other functionality when a table is considered long-term storage.

Each partition of a partitioned table is considered separately for long-term storage pricing. If a partition hasn't been modified in the last 90 days, the data in that partition is considered long term storage and is charged at the discounted price.

Long-term storage pricing is as follows:

If the table is edited, the price reverts back to the regular storage pricing, and the 90-day timer starts counting from zero. Anything that modifies the data in a table resets the timer, including:

Action Details Loading data into a table Any load or query job that appends data to a destination table or overwrites a destination table. Copying data into a table Any copy job appends data to a destination table or overwrites a destination table. Writing query results to a table Any query job that appends data to a destination table or overwrites a destination table. Using the Data Manipulation Language (DML) Using a DML statement to modify table data. Using Data Definition Language (DDL) Using a CREATE OR REPLACE TABLE DDL statement to replace a table. Streaming data into the table Ingesting data using the tabledata.insertAll API call.

All other actions do not reset the timer, including the following:

  • Querying a table
  • Creating a view that queries a table
  • Exporting data from a table
  • Copying a table (to another destination table)
  • Patching or updating a table resource

For tables that reach the 90-day threshold during a billing cycle, the price is prorated accordingly.

Long-term storage pricing applies only to BigQuery storage, not to data stored in external data sources such as Cloud Bigtable, Cloud Storage, and Drive.

BigQuery Storage API pricing

The BigQuery Storage API has an on-demand pricing model. You are charged for the data that you read. Customers enrolled in flat-rate pricing can use the BigQuery Storage API to read up to 300 TB of data per month at no charge. Reads in excess of 300 TB per month are billed at the on-demand rate.

On-demand pricing

Under on-demand pricing, your BigQuery Storage API charges are based on the number of bytes read from BigQuery storage by calls to ReadRows.

The number of bytes read includes data used for filtering but not returned to you as output from ReadRows. You are not charged for data read from temporary tables.

On-demand BigQuery Storage API charges are as follows:

Note the following regarding BigQuery Storage API charges:

  • You are charged according to the total amount of data read. The total data read per column is calculated based on the type of data in the column, and the size of the data is calculated based on the column's data type. For a detailed explanation of how data size is calculated, see Data size calculation.
  • You are charged for any data read in a read session even if a ReadRows call fails.
  • If you cancel a ReadRows call before the end of the stream is reached, you are charged for any data read before the cancellation. Your charges can include data that was read but not returned to you before the cancellation of the ReadRows call.
  • As a best practice, use partitioned and clustered tables whenever possible. You can reduce the amount of data read by using a WHERE clause to prune partitions. For more information, see Querying partitioned tables.

Data size calculation

When you load data into BigQuery or query the data, you're charged according to the data size. Data size is calculated based on the size of each column's data type.

The size of your stored data, and the size of the data processed by your queries is calculated in gigabytes (GB), where 1 GB is 230 bytes. This unit of measurement is also known as a gibibyte (GiB). Similarly, 1 TB is 240 bytes (1,024 GB).

The size of BigQuery's data types is as follows:

Data type Size INT64/INTEGER 8 bytes FLOAT64/FLOAT 8 bytes NUMERIC 16 bytes BOOL/BOOLEAN 1 byte STRING 2 bytes + the UTF-8 encoded string size BYTES 2 bytes + the number of bytes in the value DATE 8 bytes DATETIME 8 bytes TIME 8 bytes TIMESTAMP 8 bytes STRUCT/RECORD 0 bytes + the size of the contained fields GEOGRAPHY 16 bytes + 24 bytes * the number of vertices in the geography type (you can verify the number of vertices using the ST_NumPoints function)

Null values for any data type are calculated as 0 bytes.

A repeated column is stored as an array, and the size is calculated based on the number of values. For example, an integer column (INT64) that is repeated (ARRAY<INT64>) and contains 4 entries is calculated as 32 bytes (4 entries x 8 bytes).

Streaming pricing

Loading data into BigQuery is free, with the exception of a small charge for streamed data.

Pricing for streaming inserts is as follows:

Data Manipulation Language pricing

BigQuery charges for DML queries based on the number of bytes processed by the query.

DML pricing for non-partitioned tables

For non-partitioned tables, the number of bytes processed is calculated as follows:

DML statement Bytes processed INSERT The sum of bytes processed for all the columns referenced from the tables scanned by the query. UPDATE The sum of bytes in all the columns referenced from the tables scanned by the query
+ the sum of bytes for all columns in the updated table at the time the UPDATE starts. DELETE The sum of bytes in all the columns referenced from the tables scanned by the query
+ the sum of bytes for all columns in the modified table at the time the DELETE starts. MERGE If there are only INSERT clauses in the MERGE statement, you are charged for the sum of bytes processed for all the columns referenced in all tables scanned by the query.
If there is an UPDATE or DELETE clause in the MERGE statement, you are charged for the sum of the bytes processed for all the columns referenced in the source tables scanned by the query
+ the sum of bytes for all columns in the target table (at the time the MERGE starts).

DML pricing for partitioned tables

For partitioned tables, the number of bytes processed is calculated as follows:

DML statement Bytes processed INSERT The sum of bytes processed for all the columns referenced in all partitions scanned by the query. UPDATE The sum of bytes processed for all the columns referenced in all partitions for the tables scanned by the query
+ the sum of bytes for all columns in the updated or scanned partitions for the table being updated (at the time the UPDATE starts). DELETE The sum of bytes processed for all the columns referenced in all partitions for the tables scanned by the query
+ the sum of bytes for all columns in the modified or scanned partitions for the table being modified (at the time the DELETE starts). MERGE If there are only INSERT clauses in the MERGE statement, you are charged for the sum of bytes processed for all the columns referenced in all partitions scanned by the query.
If there is an UPDATE or DELETE clause in the MERGE statement, you are charged for the sum of the bytes processed for all the columns referenced in all partitions for the source tables scanned by the query
+ the sum of bytes for all columns in the updated, deleted or scanned partitions for the target table (at the time the MERGE starts).

Data Definition Language pricing

BigQuery charges for DDL queries based on the number of bytes processed by the query. The number of bytes processed is calculated as follows for DDL statements:

DDL statement Bytes processed CREATE TABLE None. CREATE TABLE ... AS SELECT ... The sum of bytes processed for all the columns referenced from the tables scanned by the query. CREATE VIEW None. DROP TABLE None. DROP VIEW None.

Clustered table pricing

When you create and use clustered tables in BigQuery, your charges are based on how much data is stored in the tables and on the queries you run against the data. Clustered tables help you to reduce query costs by pruning data so it is not processed by the query. This process is called block pruning.

Block pruning

BigQuery sorts the data in a clustered table based on the values in the clustering columns and organizes them into blocks.

When you submit a query that contains a filter on a clustered column, BigQuery uses the clustering information to efficiently determine whether a block contains any data relevant to the query. This allows BigQuery to only scan the relevant blocks—a process referred to as block pruning.

Query pricing is based on the number of bytes processed. When you run a query against a clustered table, and the query includes a filter on the clustered columns, BigQuery uses the filter expression and the block metadata to prune the blocks scanned by the query.

When a block is pruned, it is not scanned. Only the scanned blocks are used to calculate the bytes of data processed by the query. The number of bytes processed by a query against a clustered table equals the sum of the bytes read in each column referenced by the query in the scanned blocks.

If a clustered table is referenced multiple times in a query that uses several filters, BigQuery charges for scanning the columns in the appropriate blocks in each of the respective filters.

Scripting pricing

During the BigQuery scripting beta, the BigQuery team recommends using projects with flat-rate reservations to avoid unintended query costs because the number of bytes scanned by a script is generally not known before executing it. Alternatively, you can use the BigQuery sandbox to take advantage of limited free script execution. The BigQuery team will provide more explicit control over total bytes scanned by scripts and individual statements within scripts over time. This is a beta release; for updates to pricing, see the BigQuery release notes.

If a script fails, the cost of any statements up until the failure still applies. The statement that failed does not incur any costs.

For publicly released statement types such as SELECT, INSERT, and UPDATE, the cost of executing the statement is as described in the public pricing documentation. For scripting-specific statement types, the following pricing applies:

  • DECLARE: the sum of bytes scanned for any tables referenced in the DEFAULT expression. DECLARE statements with no table references do not incur a cost.
  • SET: the sum of bytes scanned for any tables referenced in the expression. SET statements with no table references do not incur a cost.
  • IF: the sum of bytes scanned for any tables referenced in the condition expression. IF condition expressions with no table reference do not incur a cost. Any statements within the IF block that are not executed do not incur a cost.
  • WHILE: the sum of bytes scanned for any tables referenced in the condition expression. WHILE statements with no table references in the condition expression do not incur a cost. Any statements within the WHILE block that are not executed do not incur a cost.
  • CONTINUE or ITERATE: No associated cost.
  • BREAK or LEAVE: No associated cost.
  • BEGIN or END: No associated cost.

Temporary tables do not incur charges for storage while the script is running. However, regular pricing occurs for any statements that create, modify, or query them.

BigQuery pricing examples

Estimating query costs

For query pricing examples, see Estimating query costs.

Estimating storage costs

For storage pricing examples, see Estimating storage costs.

DML pricing examples for non-partitioned tables

The following examples demonstrate how BigQuery calculates bytes read for DML statements that modify non-partitioned tables.

Example 1: Non-partitioned table UPDATE

table1 has two columns: col1 of type INTEGER and col2 of type STRING.

UPDATE table1 SET col1 = 1 WHERE col2 = 2;

Bytes processed in this example =

  • sum of the number of bytes in col1 +
  • sum of the number of bytes in col2

Example 2: Non-partitioned table UPDATE

table1 has two columns: col1 of type INTEGER and col2 of type STRING. table2 has one column: field1 of type INTEGER.

UPDATE table1 SET col1 = 1 WHERE col1 in (SELECT field1 from table2)

Bytes processed in this example =

  • sum of the number of bytes in table1.col1 before UPDATE +
  • sum of the number of bytes in table1.col2 before UPDATE +
  • sum of the number of bytes in table2.field1

DML pricing examples for partitioned tables

The following examples demonstrate how BigQuery calculates bytes read for DML statements that modify partitioned tables. To view the JSON schema representations for the tables used in the examples, see Tables used in examples on the Updating Partitioned Table Data Using DML Statements page.

Example 1: Ingestion-time partitioned table INSERT

mytable2 has two columns: id of type INTEGER and ts of type TIMESTAMP. mytable has two columns: field1 of type INTEGER and field2 of type STRING.

INSERT INTO mytable (_PARTITIONTIME, field1) AS SELECT TIMESTAMP(DATE(ts)), id from mytable2

Bytes processed in this example =

  • sum of the number of bytes in mytable2.ts +
  • sum of the number of bytes in mytable2.id

The size of table into which the rows are inserted—mytable—does not affect the cost of the query.

Example 2: Partitioned table INSERT

mytable2 has two columns: id of type INTEGER and ts of type TIMESTAMP. mycolumntable has four columns: field1 of type INTEGER, field2 of type STRING, field3 of type BOOLEAN, and ts of type TIMESTAMP.

INSERT INTO mycolumntable (ts, field1) AS SELECT ts, id from mytable2

Bytes processed in this example =

  • sum of the number of bytes in mytable2.ts +
  • sum of the number of bytes in mytable2.id

The size of table into which the rows are inserted—mycolumntable—does not affect the cost of the query.

Example 3: Ingestion-time partitioned table UPDATE

DML statement 1: Updating a single partition

mytable2 has two columns: id of type INTEGER and ts of type TIMESTAMP. mytable has two columns: field1 of type INTEGER and field2 of type STRING.

UPDATE project.mydataset.mytable T SET T.field1 = T.field1 + 100 WHERE T._PARTITIONTIME = TIMESTAMP(“2017-05-01”) AND EXISTS (SELECT S.id from project.mydataset.mytable2 S WHERE S.id = T.field1)

Bytes processed in this example =

  • sum of the number of bytes in mytable2.id +
  • sum of the number of bytes in mytable.field1 in the "2017-05-01" partition +
  • sum of the number of bytes in mytable.field2 in the "2017-05-01" partition

DML statement 2: Updating a partition based on another partition in the table

UPDATE project.mydataset.mytable T SET T._PARTITIONTIME = TIMESTAMP(“2017-06-01”), T.field1 = T.field1 + 100 WHERE T._PARTITIONTIME = TIMESTAMP(“2017-05-01”) AND EXISTS (SELECT 1 from project.mydataset.mytable S WHERE S.field1 = T.field1 AND S._PARTITIONTIME = TIMESTAMP("2017-06-01") )

Bytes processed in this example =

  • sum of the number of bytes in mytable.field1 in the "2017-05-01" partition +
  • sum of the number of bytes in mytable.field2 in the "2017-05-01" partition +
  • sum of the number of bytes in mytable.field1 in the "2017-06-01" partition +
  • sum of the number of bytes in mytable.field2 in the "2017-06-01" partition

In this case, the cost of the UPDATE statement is the sum of sizes of all fields in the partitions corresponding to "2017-05-01" and "2017-06-01".

Example 4: Partitioned table UPDATE

DML statement 1: Updating a single partition

mytable2 has two columns: id of type INTEGER and ts of type TIMESTAMP. mycolumntable has four columns: field1 of type INTEGER, field2 of type STRING, field3 of type BOOLEAN, and ts of type TIMESTAMP.

UPDATE project.mydataset.mycolumntable T SET T.field1 = T.field1 + 100 WHERE DATE(T.ts) = “2017-05-01” AND EXISTS (SELECT S.id from project.mydataset.mytable2 S WHERE S.id = T.field1)

Bytes processed in this example =

  • sum of the number of bytes in mytable2.id +
  • sum of the number of bytes in mycolumntable.field1 in the "2017-05-01" partition +
  • sum of the number of bytes in mycolumntable.field2 in the "2017-05-01" partition +
  • sum of the number of bytes in mycolumntable.field3 in the "2017-05-01" partition +
  • sum of the number of bytes in mycolumntable.ts in the "2017-05-01" partition

DML statement 2: Updating a partition based on another partition in the table

UPDATE project.mydataset.mycolumntable T SET T.ts = TIMESTAMP(“2017-06-01”), T.field1 = T.field1 + 100 WHERE DATE(T.ts) = “2017-05-01” AND EXISTS (SELECT 1 from project.mydataset.mycolumntable S WHERE S.field1 = T.field1 AND DATE(S.ts) = "2017-06-01")

Bytes processed in this example =

  • sum of the number of bytes in mycolumntable.field1 in the "2017-05-01" partition +
  • sum of the number of bytes in mycolumntable.field2 in the "2017-05-01" partition +
  • sum of the number of bytes in mycolumntable.field3 in the "2017-05-01" partition +
  • sum of the number of bytes in mycolumntable.ts in the "2017-05-01" partition +
  • sum of the number of bytes in mycolumntable.field1 in the "2017-06-01" partition +
  • sum of the number of bytes in mycolumntable.field2 in the "2017-06-01" partition +
  • sum of the number of bytes in mycolumntable.field3 in the "2017-06-01" partition +
  • sum of the number of bytes in mycolumntable.ts in the "2017-06-01" partition

In this case, the cost of the UPDATE statement is the sum of sizes of all fields in the partitions corresponding to "2017-05-01" and "2017-06-01".

Example 5: Ingestion-time partitioned table DELETE

mytable2 has two columns: id of type INTEGER and ts of type TIMESTAMP. mytable has two columns: field1 of type INTEGER and field2 of type STRING.

DELETE project.mydataset.mytable T WHERE T._PARTITIONTIME = TIMESTAMP(“2017-05-01”) AND EXISTS (SELECT S.id from project.mydataset.mytable2 S WHERE S.id = T.field1)

Bytes processed in this example =

  • sum of the number of bytes in mytable2.id +
  • sum of the number of bytes in mytable.field1 in the "2017-05-01" partition +
  • sum of the number of bytes in mytable.field2 in the "2017-05-01" partition

Example 6: Partitioned table DELETE

mytable2 has two columns: id of type INTEGER and ts of type TIMESTAMP. mycolumntable has four columns: field1 of type INTEGER, field2 of type STRING, field3 of type BOOLEAN, and ts of type TIMESTAMP.

DELETE project.mydataset.mycolumntable T WHERE DATE(T.ts) =“2017-05-01” AND EXISTS (SELECT S.id from project.mydataset.mytable2 S WHERE S.id = T.field1)

Bytes processed in this example =

  • sum of the number of bytes in mytable2.id +
  • sum of the number of bytes in mycolumntable.field1 in the "2017-05-01" partition +
  • sum of the number of bytes in mycolumntable.field2 in the "2017-05-01" partition +
  • sum of the number of bytes in mycolumntable.field3 in the "2017-05-01" partition +
  • sum of the number of bytes in mycolumntable.ts in the "2017-05-01" partition

Clustered table pricing example

You have a clustered table named ClusteredSalesData. The table is partitioned by the timestamp column, and it is clustered by the customer_id column. The data is organized into the following set of blocks:

Partition identifier Block ID Minimum value for customer_id in the block Maximum value for customer_id in the block 20160501 B1 10000 19999 20160501 B2 20000 24999 20160502 B3 15000 17999 20160501 B4 22000 27999

You run the following query against the table. The query contains a filter on the customer_id column.

SELECT
  SUM(totalSale)
FROM
  `mydataset.ClusteredSalesData`
WHERE
  customer_id BETWEEN 20000
  AND 23000
  AND DATE(timestamp) = "2016-05-01"

This query:

  • Scans the timestamp, customer_id, and totalSale columns in blocks B2 and B4.
  • Prunes the B3 block because of the DATE(timestamp) = "2016-05-01" filter predicate on the timestamp partitioning column.
  • Prunes the B1 block because of the customer_id BETWEEN 20000 AND 23000 filter predicate on the customer_id clustering column.

Scripting pricing example

The following example script contains comments above every statement that explain what cost, if any, is incurred by the following statement.

-- No cost, since no tables are referenced.
DECLARE x DATE DEFAULT CURRENT_DATE();
-- Incurs the cost of scanning string_col from dataset.table.
DECLARE y STRING DEFAULT (SELECT MAX(string_col) FROM dataset.table);
-- Incurs the cost of copying the data from dataset.big_table.  Once the
-- table is created, you are not charged for storage while the rest of the
-- script runs.
CREATE TEMP TABLE t AS SELECT * FROM dataset.big_table;
-- Incurs the cost of scanning column1 from temporary table t.
SELECT column1 FROM t;
-- No cost, since y = 'foo' doesn't reference a table.
IF y = 'foo' THEN
  -- Incurs the cost of scanning all columns from dataset.other_table, if
  -- y was equal to 'foo', or otherwise no cost since it is not executed.
  SELECT * FROM dataset.other_table;
ELSE
  -- Incurs the cost of scanning all columns from dataset.different_table, if
  -- y was not equal to 'foo', or otherwise no cost since it is not executed.
  UPDATE dataset.different_table
  SET col = 10
  WHERE true;
END IF;
-- Incurs the cost of scanning date_col from dataset.table for each
-- iteration of the loop.
WHILE x < (SELECT MIN(date_col) FROM dataset.table) DO
  -- No cost, since the expression does not reference any tables.
  SET x = DATE_ADD(x, INTERVAL 1 DAY);
  -- No cost, since the expression does not reference any tables.
  IF true THEN
    -- LEAVE has no associated cost.
    LEAVE;
  END IF;
  -- Never executed, since the IF branch is always taken, so does not incur
  -- a cost.
  SELECT * FROM dataset.big_table;
END WHILE;

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK