4 posts tagged with "bigquery"

Big Query Cost Analysis using StackQL

October 25, 2021 · 2 min read

Technologist and Cloud Consultant

Queries (particularly) repetitive queries that don't take advantage of results caching can lead to extraordinarily high bills.

StackQL, with it's backend SQL engine, allows you to query Big Query statistics in real time, including identifying queries which are not served from cache and understanding billable charges per query or time slice.

Here is a simple query to break down a time period into hours and show the total queries, queries served from cache and the total query charges per hour.

StackQL
Data
Results

SELECT
 STRFTIME('%H', DATETIME(SUBSTR(JSON_EXTRACT(statistics, '$.startTime'), 1, 10), 'unixepoch')) as hour,
 COUNT(*) as num_queries,
 SUM(JSON_EXTRACT(statistics, '$.query.cacheHit')) as using_cache,
 SUM(JSON_EXTRACT(statistics, '$.query.totalBytesBilled')*{{ .costPerByte }} ) as queryCost
FROM google.bigquery.jobs
 WHERE projectId = '{{ .projectId }}'
  AND allUsers = 'true'
  AND minCreationTime = '{{ .minCreationTime }}'
  AND maxCreationTime = '{{ .maxCreationTime }}'
  AND state = 'DONE'
  AND JSON_EXTRACT(statistics, '$.query') IS NOT null
GROUP BY STRFTIME('%H', datetime(SUBSTR(JSON_EXTRACT(statistics, '$.startTime'), 1, 10), 'unixepoch'));

// variables

local projectId = 'my-project-id';
local costPerTb = 6.5;
local startTimeMs = 1634734801000;

{
	projectId: projectId,
	minCreationTime: std.toString(startTimeMs),
	maxCreationTime: std.toString(startTimeMs+86400000),
	costPerByte: costPerTb*(1/std.pow(1024, 4)),
}

|------|-------------|-------------|------------------------|
| hour | num_queries | using_cache |       queryCost        |
|------|-------------|-------------|------------------------|
|   00 |         182 |           0 |      70.73793411254883 |
|------|-------------|-------------|------------------------|
|   01 |          88 |           0 |      34.20295715332031 |
|------|-------------|-------------|------------------------|
|   02 |           2 |           0 |     0.7773399353027344 |
|------|-------------|-------------|------------------------|
|   03 |         267 |           0 |     103.77488136291504 |
|------|-------------|-------------|------------------------|
|   04 |         216 |           0 |      83.95271301269531 |
|------|-------------|-------------|------------------------|
|   05 |          47 |           0 |     18.267488479614258 |
|------|-------------|-------------|------------------------|
|   06 |         122 |           0 |       47.4177360534668 |
|------|-------------|-------------|------------------------|
|   07 |         195 |           0 |       75.7906436920166 |
|------|-------------|-------------|------------------------|
|   08 |         186 |           0 |       72.2926139831543 |
|------|-------------|-------------|------------------------|
|   09 |          75 |           0 |      29.15024757385254 |
|------|-------------|-------------|------------------------|
|   10 |          62 |           0 |     24.097537994384766 |
|------|-------------|-------------|------------------------|
|   11 |          56 |           0 |     21.765518188476562 |
|------|-------------|-------------|------------------------|
|   12 |          89 |           0 |      34.59162712097168 |
|------|-------------|-------------|------------------------|
|   15 |           3 |           0 | 0.00018596649169921875 |
|------|-------------|-------------|------------------------|
|   22 |           1 |           0 |     0.3886699676513672 |
|------|-------------|-------------|------------------------|
|   23 |          35 |           0 |     13.603448867797852 |
|------|-------------|-------------|------------------------|

Many more examples to come, including using this data to create visualisations in a Jupyter notebook, stay tuned!

Querying Big Query Errors and Load Stats

August 31, 2021 · 8 min read

Jeffrey Aven

Technologist and Cloud Consultant

Big Query provides a wealth of metrics and statistics for jobs run against it which could be queries, load jobs or export jobs. This article demonstrates some queries you can run using StackQL to bring back live statistics from load operations into Big Query as well as detail regarding errors encountered during the loading of data into Big Query.

Loading Data into Big Query from GCS using StackQL

In a previous blog, we demonstrated how to create a Big Query dataset and how to create a Big Query table using StackQL INSERT statements. Having created a target dataset and table in Big Query, we can invoke a load job using StackQL by performing an INSERT into the google.bigquery.jobs resource.

The data for this operation is shown in the Data tab which is supplied in Jsonnet format.

StackQL
Data

INSERT INTO google.bigquery.jobs(
  projectId,
  data__configuration
)
SELECT
  'stackql',
  '{{ .configuration }}'
;

{
  local project_id = 'stackql',
  local dataset_id = 'stackql_downloads',
  local table_name = 'usage',
  local bucket_name = 'stackql-download-logs',

  configuration: {
    load: {
      destinationTable: {
        projectId: project_id,
        datasetId: dataset_id,
        tableId: table_name,
      },
      sourceUris: [
	    "gs://%s/%s_%s_*" % [bucket_name, dataset_id, table_name],
      ],
      skipLeadingRows: 1,
      maxBadRecords: 0,
      projectionFields: [],
    }
  }
}

Query for Big Query Errors

The Big Query Job Object can be queried using an StackQL SELECT statement.

To see the available fields with their data types and descriptions, you can run the following StackQL DESCRIBE statement:

DESCRIBE EXTENDED google.bigquery.jobs;

As you can see from running the above command or looking at the API documentation, there is a state field which is an enum showing the state of the job, since we are only concerned with completed jobs we will filter on jobs with a state of DONE. The errorResult field is an object but its presence alone indicates that an error has occurred so we will add another filter to only show results where errorResult is not null.

A simple query to start off with is to count the number of errors, this will be for all job types (load, extract and query):

StackQL
Results

SELECT COUNT(*) as num_errors 
FROM google.bigquery.jobs
WHERE projectId = 'stackql'
AND state = 'DONE'
AND errorResult IS NOT null;

|------------|
| num_errors |
|------------|
|          2 |
|------------|

To get a little more information about Big Query errors we can run a detailed query, extracting fields from the errorResult object using the JSON_EXTRACT built in function. This function is exceptionally useful as many of the fields returned from Google APIs are complex objects.

StackQL
Results

SELECT id, JSON_EXTRACT(errorResult, '$.reason') AS errorReason
FROM google.bigquery.jobs
WHERE projectId = 'stackql'
AND state = 'DONE'
AND errorResult IS NOT null;

|-----------------------------------------|--------------|
|                   id                    | errorReason  |
|-----------------------------------------|--------------|
| stackql:US.bquxjob_3016f574_17ba908f99f | invalidQuery |
|-----------------------------------------|--------------|
| stackql:US.job_zqyy5XkNSGFlAcP7MfyfO5KA | accessDenied |
|-----------------------------------------|--------------|

Get Big Query Load Specific Errors

The previous queries returned all errors for all Big Query job types. If we want to narrow our query to just Big Query load operations we can use the Big Query JobStatistics object, which includes fields for each job type.

To refine results to only load operations add the following expression to the WHERE clause:

AND JSON_EXTRACT(statistics, '$.load') IS NOT null;

Date values returned in job responses are in Unix timestamp format, to format them in a human readable format we can use the DATETIME built in function. Here is a more advanced example:

StackQL
Results

SELECT id,
JSON_EXTRACT(errorResult, '$.message') AS errorMessage,
JSON_EXTRACT(errorResult, '$.reason') AS errorReason,
DATETIME(SUBSTR(JSON_EXTRACT(statistics, '$.creationTime'), 1, 10), 'unixepoch') AS creationTime,
DATETIME(SUBSTR(JSON_EXTRACT(statistics, '$.startTime'), 1, 10), 'unixepoch') AS startTime,
DATETIME(SUBSTR(JSON_EXTRACT(statistics, '$.endTime'), 1, 10), 'unixepoch') AS endTime
FROM google.bigquery.jobs
WHERE projectId = 'stackql'
AND state = 'DONE'
AND errorResult IS NOT null
AND JSON_EXTRACT(statistics, '$.load') IS NOT null;

|---------------------------------------------|-----------------------------------------------------------|--------------|---------------------|---------------------|---------------------|
|                     id                      |                       errorMessage                        | errorReason  |    creationTime     |      startTime      |       endTime       |
|---------------------------------------------|-----------------------------------------------------------|--------------|---------------------|---------------------|---------------------|
| stackql:US.job_zqyy5XkNSGFlAcP7MfyfO5KAo7oA | Access Denied: File                                       | accessDenied | 2021-08-13 01:22:35 | 2021-08-13 01:22:36 | 2021-08-13 01:22:36 |
|                                             | gs://stackql-download-logs/stackql_downloads_usage_2021_: |              |                     |                     |                     |
|                                             | Access Denied                                             |              |                     |                     |                     |
|---------------------------------------------|-----------------------------------------------------------|--------------|---------------------|---------------------|---------------------|
| stackql:US.job_UGJCZTQNUBq7OKF62HRT5Ic0T7bx | Access Denied: File                                       | accessDenied | 2021-08-13 00:08:45 | 2021-08-13 00:08:45 | 2021-08-13 00:08:45 |
|                                             | gs://stackql-download-logs/stackql_downloads_usage_2021:  |              |                     |                     |                     |
|                                             | Access Denied                                             |              |                     |                     |                     |
|---------------------------------------------|-----------------------------------------------------------|--------------|---------------------|---------------------|---------------------|
| stackql:US.job_18YCXQmstlwhfj4Pzc3SA8-DOgYP | Access Denied: File                                       | accessDenied | 2021-08-12 23:59:27 | 2021-08-12 23:59:27 | 2021-08-12 23:59:27 |
|                                             | gs://stackql-download-logs/stackql_downloads_usage_2021:  |              |                     |                     |                     |
|                                             | Access Denied                                             |              |                     |                     |                     |
|---------------------------------------------|-----------------------------------------------------------|--------------|---------------------|---------------------|---------------------|

Get Big Query Load Statistics

Now if you want to query for statistics for Big Query load operations which were successful, we can refine the query using the following conditions:

WHERE project = 'myproject'
AND state = 'DONE'
AND errorResult IS null
AND JSON_EXTRACT(statistics, '$.load') IS NOT null;

The JobStatistics object for a Big Query load job can be found here: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics3. Let's run an StackQL query to return all of the statistics for load jobs run in a given GCP project.

StackQL
Results

SELECT id,
DATETIME(SUBSTR(JSON_EXTRACT(statistics, '$.creationTime'), 1, 10), 'unixepoch') AS creationTime,
DATETIME(SUBSTR(JSON_EXTRACT(statistics, '$.startTime'), 1, 10), 'unixepoch') AS startTime,
DATETIME(SUBSTR(JSON_EXTRACT(statistics, '$.endTime'), 1, 10), 'unixepoch') AS endTime,
JSON_EXTRACT(statistics, '$.load.inputFiles') AS inputFiles,
JSON_EXTRACT(statistics, '$.load.inputFileBytes') AS inputFileBytes,
JSON_EXTRACT(statistics, '$.load.outputRows') AS outputRows,
JSON_EXTRACT(statistics, '$.load.outputBytes') AS outputBytes,
JSON_EXTRACT(statistics, '$.load.badRecords') AS badRecords
FROM google.bigquery.jobs
WHERE projectId = 'stackql'
AND state = 'DONE'
AND errorResult IS null
AND json_extract(statistics, '$.load') IS NOT null;

|-------------------------------------------------------|---------------------|---------------------|---------------------|------------|----------------|------------|-------------|------------|
|                          id                           |    creationTime     |      startTime      |       endTime       | inputFiles | inputFileBytes | outputRows | outputBytes | badRecords |
|-------------------------------------------------------|---------------------|---------------------|---------------------|------------|----------------|------------|-------------|------------|
| stackql:US.job_2wqdOf0hQmEbfoUtT_H2Vk6bhHb4           | 2021-09-03 01:21:01 | 2021-09-03 01:21:01 | 2021-09-03 01:21:05 |         34 |          24833 |         37 |       16913 |          0 |
|-------------------------------------------------------|---------------------|---------------------|---------------------|------------|----------------|------------|-------------|------------|
| stackql:US.job_XS1X7uaKznqkd0xKwB3M4yxBu1GP           | 2021-08-19 04:29:29 | 2021-08-19 04:29:29 | 2021-08-19 04:29:30 |          1 |            650 |          1 |         424 |          0 |
|-------------------------------------------------------|---------------------|---------------------|---------------------|------------|----------------|------------|-------------|------------|
| stackql:US.bqjob_r35be078db112298a_0000017b5ca605f3_1 | 2021-08-19 04:23:13 | 2021-08-19 04:23:13 | 2021-08-19 04:23:16 |         21 |          16136 |         24 |       11223 |          0 |
|-------------------------------------------------------|---------------------|---------------------|---------------------|------------|----------------|------------|-------------|------------|
| stackql:US.job_n4WpeG26fYQGN3q2s4oBS5ECZtrO           | 2021-08-19 04:18:32 | 2021-08-19 04:18:32 | 2021-08-19 04:18:36 |         21 |          16136 |         24 |       11223 |          0 |
|-------------------------------------------------------|---------------------|---------------------|---------------------|------------|----------------|------------|-------------|------------|
| stackql:US.job_jcYDO5Uqk_Kq56LyB5gir8t3dOcs           | 2021-08-19 04:09:30 | 2021-08-19 04:09:30 | 2021-08-19 04:09:33 |         21 |          16136 |         24 |       11223 |          0 |
|-------------------------------------------------------|---------------------|---------------------|---------------------|------------|----------------|------------|-------------|------------|
| stackql:US.bqjob_r563627d461394a48_0000017b18e560af_1 | 2021-08-06 00:38:12 | 2021-08-06 00:38:13 | 2021-08-06 00:38:14 |          7 |           5280 |          8 |        3632 |          0 |
|-------------------------------------------------------|---------------------|---------------------|---------------------|------------|----------------|------------|-------------|------------|
| stackql:US.bqjob_r53c339dfc533e536_0000017b18e3ae1d_1 | 2021-08-06 00:36:21 | 2021-08-06 00:36:22 | 2021-08-06 00:36:23 |          7 |           5280 |          8 |        3632 |          0 |
|-------------------------------------------------------|---------------------|---------------------|---------------------|------------|----------------|------------|-------------|------------|
| stackql:US.bqjob_r4e070967e1b17aad_0000017b18e12350_1 | 2021-08-06 00:33:35 | 2021-08-06 00:33:35 | 2021-08-06 00:33:37 |          7 |           5280 |          8 |        3632 |          0 |
|-------------------------------------------------------|---------------------|---------------------|---------------------|------------|----------------|------------|-------------|------------|
| stackql:US.bqjob_r7da027bed5c2df33_0000017b1307b5f2_1 | 2021-08-04 21:18:00 | 2021-08-04 21:18:00 | 2021-08-04 21:18:02 |          2 |            125 |          2 |          54 |          0 |
|-------------------------------------------------------|---------------------|---------------------|---------------------|------------|----------------|------------|-------------|------------|
| stackql:US.bqjob_r33090c6d975e8b83_0000017b13078053_1 | 2021-08-04 21:17:47 | 2021-08-04 21:17:47 | 2021-08-04 21:17:49 |          2 |           1075 |          2 |         616 |          0 |
|-------------------------------------------------------|---------------------|---------------------|---------------------|------------|----------------|------------|-------------|------------|
| stackql:US.bqjob_r13fcb919db480ae_0000017b0de8b157_1  | 2021-08-03 21:26:00 | 2021-08-03 21:26:00 | 2021-08-03 21:26:02 |          2 |           1075 |          2 |         616 |          0 |
|-------------------------------------------------------|---------------------|---------------------|---------------------|------------|----------------|------------|-------------|------------|
| stackql:US.bqjob_r5172493bba418297_0000017b0de7ab60_1 | 2021-08-03 21:24:54 | 2021-08-03 21:24:54 | 2021-08-03 21:24:56 |          1 |             62 |          1 |          27 |          0 |
|-------------------------------------------------------|---------------------|---------------------|---------------------|------------|----------------|------------|-------------|------------|
| stackql:US.bqjob_r61e992eab4f511bc_0000017b099e0bcf_1 | 2021-08-03 01:26:00 | 2021-08-03 01:26:00 | 2021-08-03 01:26:02 |          2 |           1075 |          2 |         616 |          0 |
|-------------------------------------------------------|---------------------|---------------------|---------------------|------------|----------------|------------|-------------|------------|

In future posts we will show similar examples using StackQL to query for errors and statistics for extract and query jobs in Big Query, see you then!

Enable Logging for Google Cloud Storage Buckets and Analyzing Logs in Big Query (Part II)

August 22, 2021 · 4 min read

Jeffrey Aven

Technologist and Cloud Consultant

In the previous post, we showed you how to enable usage and storage logging for GCS buckets. Now that we have enabled logging, let's load and analyze the logs using Big Query. We will build up a data file vars.jsonnet as we go and show the queries step by step, at the end we will show how to run this as one batch using StackQL.

Step 1 : Create a Big Query dataset

We will need a dataset (akin to a schema or a database in other RDMBS parlance), basically a container for objects such as tables or views, the data and code to do this are shown here:

StackQL
Data

INSERT INTO google.bigquery.datasets(
  projectId,
  data__location,
  data__datasetReference,
  data__description,
  data__friendlyName
)
SELECT
  '{{ .projectId }}',
  '{{ .location }}',
  '{ "datasetId": "{{ .datasetId }}", "projectId": "{{ .projectId }}" }',
  '{{ .description }}',
  '{{ .friendlyName }}'
;

// variables
local projectId = 'stackql';
local datasetId = 'stackql_downloads';

// config
{
  projectId: projectId,
  datasetId: datasetId,
  location: 'US',
  description: 'Storage and usage logs from the website',
  friendlyName: 'StackQL Download Logs',
}

Step 2 : Create `usage` table

Let's use StackQL to create a table named usage to host the GCS usage logs, the schema for the table is defined in a file named cloud_storage_usage_schema_v0.json which can be downloaded from the location provided, for reference this is provided in the Table Schema tab in the example provided below:

StackQL
Data
cloud_storage_usage_schema_v0.json

/* create_table.iql */

INSERT INTO google.bigquery.tables(
  datasetId,
  projectId,
  data__description,
  data__friendlyName,
  data__tableReference,
  data__schema
)
SELECT
  '{{ .datasetId }}',
  '{{ .projectId }}',
  '{{ .table.usage.description }}',
  '{{ .table.usage.friendlyName }}',
  '{"projectId": "{{ .projectId }}", "datasetId": "{{ .datasetId }}", "tableId": "{{ .table.usage.tableId }}"}',
  '{{ .table.usage.schema }}'
; 

/* vars.jsonnet */

// variables
local projectId = 'stackql';
local datasetId = 'stackql_downloads';
local usage_fields = import 'cloud_storage_usage_schema_v0.json';

// config
{
  projectId: projectId,
  datasetId: datasetId,
  location: 'US',
  description: 'Storage and usage logs from the website',
  friendlyName: 'StackQL Download Logs',
  table: {
    usage: {
      tableId: 'usage',
      friendlyName: 'Usage Logs',
	  description: 'Big Query table for GCS usage logs',
	  schema: {
        fields: usage_fields
	  }
    },
  },
}

[
  {
    "name": "time_micros",
    "type": "integer",
    "mode": "NULLABLE"
  },

  {
    "name": "c_ip",
    "type": "string",
    "mode": "NULLABLE"
  },

  {
    "name": "c_ip_type",
    "type": "integer",
    "mode": "NULLABLE"
  },

  {
    "name": "c_ip_region",
    "type": "string",
    "mode": "NULLABLE"
  },

  {
    "name": "cs_method",
    "type": "string",
    "mode": "NULLABLE"
  },

  {
    "name": "cs_uri",
    "type": "string",
    "mode": "NULLABLE"
  },

  {
    "name": "sc_status",
    "type": "integer",
    "mode": "NULLABLE"
  },

  {
    "name": "cs_bytes",
    "type": "integer",
    "mode": "NULLABLE"
  },

  {
    "name": "sc_bytes",
    "type": "integer",
    "mode": "NULLABLE"
  },

  {
    "name": "time_taken_micros",
    "type": "integer",
    "mode": "NULLABLE"
  },

  {
    "name": "cs_host",
    "type": "string",
    "mode": "NULLABLE"
  },

  {
    "name": "cs_referer",
    "type": "string",
    "mode": "NULLABLE"
  },

  {
    "name": "cs_user_agent",
    "type": "string",
    "mode": "NULLABLE"
  },

  {
    "name": "s_request_id",
    "type": "string",
    "mode": "NULLABLE"
  },

  {
    "name": "cs_operation",
    "type": "string",
    "mode": "NULLABLE"
  },

  {
    "name": "cs_bucket",
    "type": "string",
    "mode": "NULLABLE"
  },

  {
    "name": "cs_object",
    "type": "string",
    "mode": "NULLABLE"
  }
]

Run the following to execute the StackQL command with the input data shown:

stackql exec -i ./create_table.iql --iqldata ./vars.jsonnet

Step 3 : Load the usage data

We have a Big Query dataset and a table, lets load some data. To do this we will need to create and submit a load job, we can do this by inserting into the google.bigquery.jobs resource as shown here:

StackQL
Data

/* bq_load_job.iql */

INSERT INTO google.bigquery.jobs(
  projectId,
  data__configuration
)
SELECT
  'stackql',
  '{
    "load": {
      "destinationTable": {
        "projectId": "{{ .projectId }}",
        "datasetId": "{{ .datasetId }}",
        "tableId": "{{ .table.usage.tableId }}"
      },
      "sourceUris": [
        "gs://{{ .logs_bucket }}/{{ .object_prefix }}"
      ],
      "schema": {{ .table.usage.schema }},
	  "skipLeadingRows": 1,
      "maxBadRecords": 0,
      "projectionFields": []
    }
  }'
;

/* vars.jsonnet */

// variables
local projectId = 'stackql';
local datasetId = 'stackql_downloads';
local usage_fields = import 'cloud_storage_usage_schema_v0.json';

// config
{
  projectId: projectId,
  datasetId: datasetId,
  location: 'US',
  logs_bucket: 'stackql-download-logs',
  object_prefix: 'stackql_downloads_usage_2021*',
  description: 'Storage and usage logs from the website',
  friendlyName: 'StackQL Download Logs',
  table: {
    usage: {
      tableId: 'usage',
      friendlyName: 'Usage Logs',
	  description: 'Big Query table for GCS usage logs',
	  schema: {
        fields: usage_fields
	  }
    },
  },
}

Run the following to execute:

stackql exec -i ./bq_load_job.iql --iqldata ./vars.jsonnet

Clean up (optional)

If you want to clean up what you have done, you can do so using StackQL DELETE statements, as provided below:

NOTE: To delete a Big Query dataset, you need to delete all of the tables contained in the dataset first, as shown in the following example

StackQL
Data

-- delete table(s) 

DELETE FROM google.bigquery.tables 
WHERE projectId = '{{ .projectId }}' 
AND datasetId = '{{ .datasetId }}' 
AND tableId = '{{ .table.usage.tableId }}';

-- delete dataset

DELETE FROM google.bigquery.datasets 
WHERE projectId = '{{ .projectId }}' 
AND datasetId = '{{ .datasetId }}';

// generally you would use the same data used to create the dataset and table(s)  

// variables
local projectId = 'stackql';
local datasetId = 'stackql_downloads';

// config
{
  projectId: projectId,
  datasetId: datasetId,
  table: {
    usage: {
      tableId: 'usage',
    }
  },  
}

Enable Logging for Google Cloud Storage Buckets and Analyzing Logs in Big Query (Part I)

August 18, 2021 · 3 min read

Jeffrey Aven

Technologist and Cloud Consultant

In a previous article, Deploying and Querying GCS Buckets using StackQL, we walked through some basic creation and query operations on Google Cloud Storage buckets. In this post we will extend on this by enabling logging on a GCS bucket using StackQL. This post is based upon this article: Usage logs & storage logs.

Assuming we have deployed a bucket which we want to log activities on, follow the steps below:

Step 1 : Create a bucket to store the usage logs

One bucket in a project can be used to collect the usage logs from one or more other buckets in the project. Use the StackQL Command Shell (stackql shell) or stackql exec to create this logs bucket as shown here:

INSERT INTO google.storage.buckets(
  project,
  data__name,
  data__location,
  data__locationType
)
SELECT
  'stackql',
  'stackql-download-logs',
  'US',
  'multi-region'
;

for more examples of creating Google Cloud Storage buckets using StackQL, see Deploying and Querying GCS Buckets using StackQL.

Step 2: Set IAM policy for the logs bucket

You will need to create an IAM binding to enable writes to this bucket, do this by using the setIamPolicy method as shown here:

EXEC google.storage.buckets.setIamPolicy
@bucket = 'stackql-download-logs'
@@json = '{
  "bindings":[
    {
      "role": "roles/storage.legacyBucketWriter",
      "members":[
        "group:cloud-storage-analytics@google.com"
      ]
    }
  ]
}';

TIP: you should also add role bindings to the roles/storage.legacyBucketOwner role for serviceAccount or users who will be running StackQL SELECT queries against this logs bucket.

Step 3: Enable logging on the target bucket

To enable logging on your target bucket (or buckets) run the following StackQL EXEC method:

EXEC google.storage.buckets.patch
@bucket = 'stackql-downloads'
@@json = '{
 "logging": {
  "logBucket": "stackql-download-logs",
  "logObjectPrefix": "stackql_downloads"
 }
}';

TIP: use SHOW METHODS IN google.storage.buckets; to see what operations are avaialable such as the patch and setIamPolicy examples shown in the previous steps.

Step 4: Check logging status on target bucket

To see that logging has been enabled run the StackQL query below:

select name, logging
from google.storage.buckets
WHERE project = 'stackql'
and logging is not null;

To unpack the logging object, you can use the [JSON_EXTRACT]](/docs/language-spec/functions/json/json_extract) built in function as shown here:

select name, json_extract(logging, '$.logBucket') as logBucket,
json_extract(logging, '$.logObjectPrefix') as logObjectPrefix
from google.storage.buckets
WHERE project = 'stackql'
and logging is not null;

In Part II of this post, we will demonstrate how to create a Big Query dataset, then load and analyze the GCS usage logs you have collected using Big Query, stay tuned!

Loading Data into Big Query from GCS using StackQL​

Query for Big Query Errors​

Get Big Query Load Specific Errors​

Get Big Query Load Statistics​

Step 1 : Create a Big Query dataset​

Step 2 : Create usage table​

Step 3 : Load the usage data​

Clean up (optional)​

Step 1 : Create a bucket to store the usage logs​

Step 2: Set IAM policy for the logs bucket​

Step 3: Enable logging on the target bucket​

Step 4: Check logging status on target bucket​

Loading Data into Big Query from GCS using StackQL

Query for Big Query Errors

Get Big Query Load Specific Errors

Get Big Query Load Statistics

Step 1 : Create a Big Query dataset

Step 2 : Create `usage` table

Step 3 : Load the usage data

Clean up (optional)

Step 1 : Create a bucket to store the usage logs

Step 2: Set IAM policy for the logs bucket

Step 3: Enable logging on the target bucket

Step 4: Check logging status on target bucket