· 5 min read

In the previous post, we showed you how to enable usage and storage logging for GCS buckets. Now that logging is enabled, let's load and analyze the logs using BigQuery. We will build up a data file, vars.jsonnet, as we go and show the queries step by step; at the end, we will show how to run the whole process as one batch using StackQL.

Step 1: Create a BigQuery dataset

We will need a dataset (akin to a schema or database in RDBMS parlance), which is essentially a container for objects such as tables and views. The data and code to do this are shown here:

INSERT INTO google.bigquery.datasets(
projectId,
data__location,
data__datasetReference,
data__description,
data__friendlyName
)
SELECT
'{{ .projectId }}',
'{{ .location }}',
'{ "datasetId": "{{ .datasetId }}", "projectId": "{{ .projectId }}" }',
'{{ .description }}',
'{{ .friendlyName }}'
;
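
A minimal vars.jsonnet supplying the variables referenced above might look like the following sketch. All values here are illustrative assumptions to be changed for your own environment; the schema entry reads the cloud_storage_usage_schema_v0.json file described in Step 2 as a string using jsonnet's importstr:

// vars.jsonnet -- illustrative values only
local projectId = 'stackql'; // update to your project

{
  projectId: projectId,
  location: 'US',
  datasetId: 'gcs_logs',
  description: 'GCS usage and storage logs',
  friendlyName: 'gcs_logs',
  logs_bucket: 'stackql-download-logs',
  object_prefix: 'stackql_downloads_usage*',
  table: {
    usage: {
      tableId: 'usage',
      description: 'GCS usage logs',
      friendlyName: 'usage',
      // read the table schema from the downloaded JSON file as a string
      schema: importstr 'cloud_storage_usage_schema_v0.json',
    },
  },
}

Assuming the INSERT above is saved to a file named create_dataset.iql (a name we have made up for this example), you would run it with stackql exec -i ./create_dataset.iql --iqldata ./vars.jsonnet.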

Step 2: Create the usage table

Let's use StackQL to create a table named usage to host the GCS usage logs. The schema for the table is defined in a file named cloud_storage_usage_schema_v0.json, which can be downloaded from the location provided; for reference, it is shown in the Table Schema tab in the example below:

/* create_table.iql */

INSERT INTO google.bigquery.tables(
datasetId,
projectId,
data__description,
data__friendlyName,
data__tableReference,
data__schema
)
SELECT
'{{ .datasetId }}',
'{{ .projectId }}',
'{{ .table.usage.description }}',
'{{ .table.usage.friendlyName }}',
'{"projectId": "{{ .projectId }}", "datasetId": "{{ .datasetId }}", "tableId": "{{ .table.usage.tableId }}"}',
'{{ .table.usage.schema }}'
;

Run the following to execute the StackQL command with the input data shown:

stackql exec -i ./create_table.iql --iqldata ./vars.jsonnet

Step 3: Load the usage data

Now that we have a BigQuery dataset and a table, let's load some data. To do this, we will need to create and submit a load job, which we can do by inserting into the google.bigquery.jobs resource as shown here:

/* bq_load_job.iql */

INSERT INTO google.bigquery.jobs(
projectId,
data__configuration
)
SELECT
'{{ .projectId }}',
'{
"load": {
"destinationTable": {
"projectId": "{{ .projectId }}",
"datasetId": "{{ .datasetId }}",
"tableId": "{{ .table.usage.tableId }}"
},
"sourceUris": [
"gs://{{ .logs_bucket }}/{{ .object_prefix }}"
],
"schema": {{ .table.usage.schema }},
"skipLeadingRows": 1,
"maxBadRecords": 0,
"projectionFields": []
}
}'
;

Run the following to execute:

stackql exec -i ./bq_load_job.iql --iqldata ./vars.jsonnet
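
Once the job completes, you can confirm that the destination table exists by querying the google.bigquery.tables resource. This is a sketch; the field names shown are assumed from the BigQuery API, so run DESCRIBE google.bigquery.tables to confirm them:

SELECT id, type, creationTime
FROM google.bigquery.tables
WHERE projectId = '{{ .projectId }}'
AND datasetId = '{{ .datasetId }}';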

Clean up (optional)

If you want to clean up what you have done, you can do so using StackQL DELETE statements, as shown below:

NOTE: To delete a BigQuery dataset, you need to delete all of the tables contained in the dataset first, as shown in the following example.

-- delete table(s) 

DELETE FROM google.bigquery.tables
WHERE projectId = '{{ .projectId }}'
AND datasetId = '{{ .datasetId }}'
AND tableId = '{{ .table.usage.tableId }}';

-- delete dataset

DELETE FROM google.bigquery.datasets
WHERE projectId = '{{ .projectId }}'
AND datasetId = '{{ .datasetId }}';
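
Finally, as promised, you can run the whole process as one batch by combining the statements above into a single script. Assuming you have saved them, in order, to a file named gcs_usage_analysis.iql (a name we have made up here), the batch would be executed as:

stackql exec -i ./gcs_usage_analysis.iql --iqldata ./vars.jsonnet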

· 3 min read

In a previous article, Deploying and Querying GCS Buckets using StackQL, we walked through some basic creation and query operations on Google Cloud Storage buckets. In this post, we will build on this by enabling logging on a GCS bucket using StackQL. This post is based upon this article: Usage logs & storage logs.

Assuming you have deployed a bucket you want to log activity on, follow the steps below:

Step 1: Create a bucket to store the usage logs

One bucket in a project can be used to collect the usage logs from one or more other buckets in the project. Use the StackQL Command Shell (stackql shell) or stackql exec to create this logs bucket as shown here:

INSERT INTO google.storage.buckets(
project,
data__name,
data__location,
data__locationType
)
SELECT
'stackql',
'stackql-download-logs',
'US',
'multi-region'
;

For more examples of creating Google Cloud Storage buckets using StackQL, see Deploying and Querying GCS Buckets using StackQL.

Step 2: Set IAM policy for the logs bucket

You will need to create an IAM binding to enable writes to this bucket; do this using the setIamPolicy method as shown here:

EXEC google.storage.buckets.setIamPolicy
@bucket = 'stackql-download-logs'
@@json = '{
"bindings":[
{
"role": "roles/storage.legacyBucketWriter",
"members":[
"group:cloud-storage-analytics@google.com"
]
}
]
}';
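
To verify that the binding was applied, you could call the corresponding getIamPolicy method. This is a sketch following the same pattern as the setIamPolicy call above; use SHOW METHODS IN google.storage.buckets; to confirm the method and its parameters in your provider version:

EXEC google.storage.buckets.getIamPolicy
@bucket = 'stackql-download-logs';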

TIP: you should also add role bindings to the roles/storage.legacyBucketOwner role for service accounts or users who will be running StackQL SELECT queries against this logs bucket.

Step 3: Enable logging on the target bucket

To enable logging on your target bucket (or buckets), run the following StackQL EXEC command:

EXEC google.storage.buckets.patch
@bucket = 'stackql-downloads'
@@json = '{
"logging": {
"logBucket": "stackql-download-logs",
"logObjectPrefix": "stackql_downloads"
}
}';

TIP: use SHOW METHODS IN google.storage.buckets; to see what operations are available, such as the patch and setIamPolicy examples shown in the previous steps.

Step 4: Check logging status on target bucket

To see that logging has been enabled run the StackQL query below:

SELECT name, logging
FROM google.storage.buckets
WHERE project = 'stackql'
AND logging IS NOT NULL;
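
Given the logging configuration applied in Step 3, the result should look something like this (illustrative output):

|-------------------|--------------------------------------------------------------------------------|
| name              | logging                                                                        |
|-------------------|--------------------------------------------------------------------------------|
| stackql-downloads | {"logBucket": "stackql-download-logs", "logObjectPrefix": "stackql_downloads"} |
|-------------------|--------------------------------------------------------------------------------|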

To unpack the logging object, you can use the [JSON_EXTRACT](/docs/language-spec/functions/json/json_extract) built-in function as shown here:

SELECT name, json_extract(logging, '$.logBucket') AS logBucket,
json_extract(logging, '$.logObjectPrefix') AS logObjectPrefix
FROM google.storage.buckets
WHERE project = 'stackql'
AND logging IS NOT NULL;

In Part II of this post, we will demonstrate how to create a BigQuery dataset, then load and analyze the GCS usage logs you have collected using BigQuery. Stay tuned!

· 6 min read

StackQL is a simple way to automate the deployment, configuration, management, and removal of Google Cloud Storage buckets, as well as an easy way to query buckets in your environment for asset management or to look for misconfigurations such as public access, non-conformant encryption configuration, and more. It may be useful to review Exploring GCP services and resources using StackQL, which walks through the StackQL resource hierarchy, including the SHOW and DESCRIBE commands.

Generate an INSERT template for a new bucket

The SHOW INSERT command in StackQL can be used to generate an INSERT template for creating any resource in GCP. The easiest way to use this command is via the command line, using the text output type and suppressing column headers with the -H flag; the StackQL interactive shell can be used as well.

SHOW INSERT INTO google.storage.buckets;

To only show the mandatory fields for a resource, you can use the /*+ REQUIRED */ query hint.

You could also use the --outfile flag to write the template to a new IQL file, or redirect the results to a file using the appropriate shell operator (e.g. > or >>).
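
For example, assuming the SHOW INSERT statement above is saved to a file named show_insert.iql (a name we have made up here), the following would write the generated template to a file using shell redirection:

stackql exec -i ./show_insert.iql --output text -H > bucket_template.iql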

More information on generating templates can be found at Creating Infrastructure Templates.

Create a bucket

Now that you have a template, you can trim it down to the fields you need. Executing an INSERT command to create a bucket is as easy as:

-- change this for your project and bucket names
INSERT INTO google.storage.buckets(
project,
data__name,
data__location,
data__locationType,
data__labels
)
SELECT
'stackql-demo',
'stackql-demo-bucket',
'US',
'multi-region',
'[{"env":"demo"}]'
;
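
To confirm the bucket was created, you can run a quick query against the same resource (the fields used here also appear in the SELECT examples later in this post):

SELECT name, location, timeCreated
FROM google.storage.buckets
WHERE project = 'stackql-demo';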

Removing buckets

In some cases you may want to retire and remove buckets that are no longer needed. Doing so is easy in StackQL using the DELETE command; this can be done individually or in a batch (as we will demonstrate later).

NOTE: buckets cannot be deleted if they contain any objects. We will show examples of working with objects in future posts.

To delete the bucket you created in the previous step, run the following:

DELETE FROM google.storage.buckets
WHERE bucket = 'stackql-demo-bucket';

Using variables to automate bucket creation

Although the previous example was simple, providing static values for attributes is not very scalable. In most cases, configuration for resource creation and modification is provided as data associated with a StackQL script or module using json or jsonnet files.

This can be supplied inline in the IQL script or provided as a separate file, e.g. vars.jsonnet. For more information, see Using Variables.

The example below demonstrates how to create 3 different buckets using configuration data provided using jsonnet.

TIP: Use Jsonnet where possible, as it is a much more powerful and feature-rich configuration and data templating language, supporting inheritance, external variables, parameterization, functions and operators, and much more. For more information, visit https://jsonnet.org/.

Step 1: Save the following jsonnet configuration to a file named cloud_storage_vars.jsonnet:

local project = 'stackql-demo'; // update to your project name

{
project: project,
location: 'US',
locationType: 'multi-region',
buckets: [
{
name: project + '-bucket1',
label: [{"env":"demo1"}],
},
{
name: project + '-bucket2',
label: [{"env":"demo2"}],
},
{
name: project + '-bucket3',
label: [{"env":"demo3"}],
},
]
}
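
As an aside, since jsonnet supports functions and list comprehensions, the same configuration could be generated more compactly. The following is an equivalent sketch of the file above:

local project = 'stackql-demo'; // update to your project name

// a function that generates the configuration for one bucket
local bucket(i) = {
  name: project + '-bucket' + i,
  label: [{ env: 'demo' + i }],
};

{
  project: project,
  location: 'US',
  locationType: 'multi-region',
  buckets: [bucket(i) for i in std.range(1, 3)],
}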

Step 2: Save the following StackQL script to a file named deploy_buckets.iql:

{{range .buckets}}
INSERT INTO google.storage.buckets(
project,
data__location,
data__locationType,
data__name,
data__labels
)
SELECT
'{{ $.project }}',
'{{ $.location }}',
'{{ $.locationType }}',
'{{ .name }}',
'{{ .label }}'
;
{{end}}

Step 3: (Optional) Perform a dryrun operation to see what the resultant query would be

With any modification operation that sources data from a jsonnet file, it is useful to perform a dry run first to ensure the variable substitution is as you intended. This is done using the --dryrun flag as shown here:

stackql exec --iqldata ./cloud_storage_vars.jsonnet --infile ./deploy_buckets.iql --dryrun --output text -H

Step 4: Run the query

If you are satisfied with the results of the dry run, remove the --dryrun and --output flags and run the command as shown here (note that if you are creating or deploying resources in GCP, you will need to supply a valid service account key file or authenticate interactively):

stackql exec --iqldata ./cloud_storage_vars.jsonnet --infile ./deploy_buckets.iql --keyfilepath /mnt/c/tmp/stackql-demo.json

Querying buckets using SELECT statements

This is easy...

SELECT id, name, location, locationType, timeCreated
FROM google.storage.buckets
WHERE project = 'stackql-demo';
/* returns
|----------------------|----------------------|----------|--------------|--------------------------|
| id | name | location | locationType | timeCreated |
|----------------------|----------------------|----------|--------------|--------------------------|
| stackql-demo-bucket1 | stackql-demo-bucket1 | US | multi-region | 2021-08-10T21:32:25.847Z |
|----------------------|----------------------|----------|--------------|--------------------------|
| stackql-demo-bucket2 | stackql-demo-bucket2 | US | multi-region | 2021-08-10T21:32:26.877Z |
|----------------------|----------------------|----------|--------------|--------------------------|
| stackql-demo-bucket3 | stackql-demo-bucket3 | US | multi-region | 2021-08-10T21:32:27.864Z |
|----------------------|----------------------|----------|--------------|--------------------------|
*/

To order the result set by timeCreated latest first...

SELECT id, name, location, locationType, timeCreated
FROM google.storage.buckets
WHERE project = 'stackql-demo'
ORDER BY timeCreated DESC;

Or a simple count:

SELECT COUNT(*) as 'Number of Buckets'
FROM google.storage.buckets
WHERE project = 'stackql-demo'
AND name LIKE '%stackql-demo-bucket%';
/* returns
|-------------------|
| Number of Buckets |
|-------------------|
| 3 |
|-------------------|
*/

Cleanup

To clean up the environment and remove the test buckets, copy the following code to a file named bucket_cleanup.iql.

{{range .buckets}}
DELETE FROM google.storage.buckets
WHERE bucket = '{{.name}}';
{{end}}

Since we are in the danger zone here (deleting multiple persistent data objects), you may want to be sure you haven't fat-fingered anything and don't remove anything you didn't intend to. To do this, use the --dryrun option shown earlier to preprocess the input data and show the templated commands before you run them.

When you are ready to go, run this command:

stackql exec --iqldata ./cloud_storage_vars.jsonnet --infile ./bucket_cleanup.iql --keyfilepath /mnt/c/tmp/stackql-demo.json

Running the filtered count again should now return 0 resources, as shown here:

SELECT COUNT(*) as 'Number of Buckets'
FROM google.storage.buckets
WHERE project = 'stackql-demo'
AND name LIKE '%stackql-demo-bucket%';
/* returns
|-------------------|
| Number of Buckets |
|-------------------|
| 0 |
|-------------------|
*/

Future posts will demonstrate more advanced query and deployment operations with Cloud Storage, such as working with encryption using CMEK in KMS, and configuring logging, CORS, website access, ACLs, and IAM. See ya then!

· 6 min read

This article will walk through examples of exploring services and resources in GCP using StackQL - a SQL-based language for working with cloud and SaaS assets. If you are new to StackQL, it may be helpful to start here.

List available services using SHOW SERVICES

To list all of the services available in the google provider, use the SHOW SERVICES command as shown below:

StackQL >> SHOW SERVICES IN google;
|---------------|-----------|-----------------------|
| id            | name      | title                 |
|---------------|-----------|-----------------------|
| appengine__v1 | appengine | App Engine Admin API  |
|---------------|-----------|-----------------------|
| bigquery__v2  | bigquery  | BigQuery API          |
|---------------|-----------|-----------------------|
| compute__v1   | compute   | Compute Engine API    |
|---------------|-----------|-----------------------|
| container__v1 | container | Kubernetes Engine API |
|---------------|-----------|-----------------------|
| ...           | ...       | ...                   |
|---------------|-----------|-----------------------|

For more information on running StackQL commands interactively or non-interactively, see the shell and exec docs.
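
For example, the query above could also be run non-interactively by passing it to stackql exec (a sketch; see the exec docs for the full usage):

stackql exec "SHOW SERVICES IN google" --output text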

To search for a specific service or services, use the LIKE or WHERE operators as shown here:

StackQL >> SHOW SERVICES IN google WHERE name = 'compute';
|-------------|---------|--------------------|
| id | name | title |
|-------------|---------|--------------------|
| compute__v1 | compute | Compute Engine API |
|-------------|---------|--------------------|
StackQL >> SHOW SERVICES IN google LIKE '%container%';
|----------------------------|-------------------|------------------------|
| id | name | title |
|----------------------------|-------------------|------------------------|
| container__v1 | container | Kubernetes Engine API |
|----------------------------|-------------------|------------------------|
| containeranalysis__v1beta1 | containeranalysis | Container Analysis API |
|----------------------------|-------------------|------------------------|

List available resources for a service using SHOW RESOURCES

Once you have identified the service you want to work with, you can use the SHOW RESOURCES command to list all of the available resources in the given service, as shown here:

StackQL >> SHOW RESOURCES IN google.storage;
|-----------------------------|---------------------|---------------------|
| name | title | id |
|-----------------------------|---------------------|---------------------|
| buckets | Bucket | storage__v1.buckets |
|-----------------------------|---------------------|---------------------|
| objects | Object | storage__v1.objects |
|-----------------------------|---------------------|---------------------|
| ... | ... | ... |
|-----------------------------|---------------------|---------------------|

As with SHOW SERVICES, you can narrow your search by using the LIKE or WHERE operators with the SHOW RESOURCES command as well.

There is also an EXTENDED token which can be used to get descriptive information about the resource as shown here:

StackQL >> SHOW EXTENDED RESOURCES IN google.storage WHERE name = 'notifications';
|---------------|---------------------------|--------------|--------------------------------|
| name | id | title | description |
|---------------|---------------------------|--------------|--------------------------------|
| notifications | storage__v1.notifications | Notification | A subscription to receive |
| | | | Google PubSub notifications. |
|---------------|---------------------------|--------------|--------------------------------|

List the fields in a resource using DESCRIBE

If you have identified the resource you want to work with, you can use the DESCRIBE command to see the available fields for a SELECT operation as shown here:

StackQL >> DESCRIBE google.storage.buckets;
|-----------------------|---------|
| name | type |
|-----------------------|---------|
| id | string |
|-----------------------|---------|
| name | string |
|-----------------------|---------|
| location | string |
|-----------------------|---------|
| ... | ... |
|-----------------------|---------|

As with the SHOW command, you can use EXTENDED to provide descriptive information about each field as shown here:

StackQL >> DESCRIBE EXTENDED google.storage.buckets;
|-----------------------|---------|--------------------------------|
| name | type | description |
|-----------------------|---------|--------------------------------|
| id | string | The ID of the bucket. For |
| | | buckets, the id and name |
| | | properties are the same. |
|-----------------------|---------|--------------------------------|
| name | string | The name of the bucket. |
|-----------------------|---------|--------------------------------|
| location | string | The location of the bucket. |
| | | Object data for objects |
| | | in the bucket resides in |
| | | physical storage within this |
| | | region. Defaults to US. See |
| | | the developer's guide for the |
| | | authoritative list. |
|-----------------------|---------|--------------------------------|
| ... | ... | ... |
|-----------------------|---------|--------------------------------|

List the available methods for a resource using SHOW METHODS

To see the methods (operations) available for a given resource (beyond INSERT, SELECT, and DELETE), use the SHOW METHODS command as shown here:

StackQL >> SHOW METHODS IN google.compute.instances;
|-----------------------|----------------------------|
| MethodName | RequiredParams |
|-----------------------|----------------------------|
| getEffectiveFirewalls | zone, instance, |
| | networkInterface, project |
|-----------------------|----------------------------|
| setServiceAccount | instance, project, zone |
|-----------------------|----------------------------|
| getIamPolicy | project, resource, zone |
|-----------------------|----------------------------|
| stop | instance, project, zone |
|-----------------------|----------------------------|
| testIamPermissions | project, resource, zone |
|-----------------------|----------------------------|
| attachDisk | instance, project, zone |
|-----------------------|----------------------------|
| start | instance, project, zone |
|-----------------------|----------------------------|
| detachDisk | instance, project, zone, |
| | deviceName |
|-----------------------|----------------------------|
| ... | ... |
|-----------------------|----------------------------|

The operations shown by SHOW METHODS (which are referred to as provider stored procedures) can be invoked using the EXEC command.
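
For example, the stop method listed above could be invoked as follows. This is a sketch with illustrative parameter values, following the same pattern as the setIamPolicy example shown in the earlier post; the required parameters are those listed in the table above:

EXEC google.compute.instances.stop
@instance = 'demo-instance'
@project = 'stackql-demo'
@zone = 'us-central1-a';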

EXTENDED can be used like the other meta commands to see a description for the method as shown here:

StackQL >> SHOW EXTENDED METHODS IN google.compute.instances;
|--------------|--------------------------|--------------------------------|
| MethodName | RequiredParams | description |
|--------------|--------------------------|--------------------------------|
| attachDisk | instance, project, zone | Attaches an existing Disk |
| | | resource to an instance. You |
| | | must first create the disk |
| | | before you can attach it. |
| | | It is not possible to create |
| | | and attach a disk at the same |
| | | time. For more information, |
| | | read Adding a persistent disk |
| | | to your instance. |
|--------------|--------------------------|--------------------------------|
| ... | ... | ... |
|--------------|--------------------------|--------------------------------|

Next up, we will start creating infrastructure templates using SHOW INSERT, creating resources using INSERT, and querying resources using SELECT. Stay tuned!