
4 posts tagged with "google cloud storage"


· 2 min read

It's easy enough for anyone to deploy a Cloud Storage bucket in Google Cloud; this can be done through the console, gcloud, Terraform, or StackQL, as shown here: Deploying and Querying GCS Buckets using StackQL. It is also easy to inadvertently allow users to set public ACLs on a bucket, making its contents publicly visible by default. An easy way to prevent this from happening is to use public access prevention.

Let's work through a real-life scenario using StackQL.

Step 1: Run a query to find buckets which do not have public access prevention enforced

Run the following StackQL query from the shell or via exec:

SELECT name, 
JSON_EXTRACT(iamConfiguration, '$.publicAccessPrevention') as publicAccessPrevention
FROM google.storage.buckets
WHERE project = 'myco-terraform';
/* returns
|-------------------|------------------------|
| name | publicAccessPrevention |
|-------------------|------------------------|
| myco-tf-nonprod | unspecified |
|-------------------|------------------------|
| myco-tf-prod | enforced |
|-------------------|------------------------|
*/

We can see from the query results that the myco-tf-nonprod bucket does not have public access prevention enforced; let's fix it... using StackQL.

Step 2: Configure public access prevention for a bucket

Run the following StackQL procedure to enforce public access prevention:

EXEC google.storage.buckets.patch 
@bucket = 'myco-tf-nonprod'
@@json = '{
  "iamConfiguration": {
    "publicAccessPrevention": "enforced"
  }
}';

Step 3: Confirm public access prevention is enforced

Run the first query again, and you should see that the desired result is in place.

SELECT name, 
JSON_EXTRACT(iamConfiguration, '$.publicAccessPrevention') as publicAccessPrevention
FROM google.storage.buckets
WHERE project = 'myco-terraform';
/* returns
|-------------------|------------------------|
| name | publicAccessPrevention |
|-------------------|------------------------|
| myco-tf-nonprod | enforced |
|-------------------|------------------------|
| myco-tf-prod | enforced |
|-------------------|------------------------|
*/

Easy!
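As a further refinement, the same JSON_EXTRACT expression can be used in the WHERE clause to report only the buckets that still need attention. This is a minimal sketch, assuming the JSON functions shown above can also be used in filter predicates:

-- sketch: report only buckets without public access prevention enforced
SELECT name
FROM google.storage.buckets
WHERE project = 'myco-terraform'
AND JSON_EXTRACT(iamConfiguration, '$.publicAccessPrevention') != 'enforced';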

· 5 min read

In the previous post, we showed you how to enable usage and storage logging for GCS buckets. Now that logging is enabled, let's load and analyze the logs using BigQuery. We will build up a data file, vars.jsonnet, as we go and show the queries step by step; at the end we will show how to run this as one batch using StackQL.

Step 1: Create a BigQuery dataset

We will need a dataset (akin to a schema or a database in RDBMS parlance), which is basically a container for objects such as tables or views. The data and code to do this are shown here:

INSERT INTO google.bigquery.datasets(
projectId,
data__location,
data__datasetReference,
data__description,
data__friendlyName
)
SELECT
'{{ .projectId }}',
'{{ .location }}',
'{ "datasetId": "{{ .datasetId }}", "projectId": "{{ .projectId }}" }',
'{{ .description }}',
'{{ .friendlyName }}'
;
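If you save the statement above to a file (for example create_dataset.iql, a hypothetical name), you can run it with the same pattern used in the following steps:

stackql exec -i ./create_dataset.iql --iqldata ./vars.jsonnet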

Step 2: Create usage table

Let's use StackQL to create a table named usage to host the GCS usage logs. The schema for the table is defined in a file named cloud_storage_usage_schema_v0.json, which can be downloaded from the location provided and is referenced in the example below:

/* create_table.iql */

INSERT INTO google.bigquery.tables(
datasetId,
projectId,
data__description,
data__friendlyName,
data__tableReference,
data__schema
)
SELECT
'{{ .datasetId }}',
'{{ .projectId }}',
'{{ .table.usage.description }}',
'{{ .table.usage.friendlyName }}',
'{"projectId": "{{ .projectId }}", "datasetId": "{{ .datasetId }}", "tableId": "{{ .table.usage.tableId }}"}',
'{{ .table.usage.schema }}'
;

Run the following to execute the StackQL command with the input data shown:

stackql exec -i ./create_table.iql --iqldata ./vars.jsonnet

Step 3: Load the usage data

We have a BigQuery dataset and a table, so let's load some data. To do this we will need to create and submit a load job, which we can do by inserting into the google.bigquery.jobs resource as shown here:

/* bq_load_job.iql */

INSERT INTO google.bigquery.jobs(
  projectId,
  data__configuration
)
SELECT
  'stackql',
  '{
    "load": {
      "destinationTable": {
        "projectId": "{{ .projectId }}",
        "datasetId": "{{ .datasetId }}",
        "tableId": "{{ .table.usage.tableId }}"
      },
      "sourceUris": [
        "gs://{{ .logs_bucket }}/{{ .object_prefix }}"
      ],
      "schema": {{ .table.usage.schema }},
      "skipLeadingRows": 1,
      "maxBadRecords": 0,
      "projectionFields": []
    }
  }'
;

Run the following to execute:

stackql exec -i ./bq_load_job.iql --iqldata ./vars.jsonnet
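For reference, the assembled vars.jsonnet needs to supply each of the template variables referenced above. The following is a minimal sketch only: the values are illustrative placeholders, and it assumes the downloaded schema file can be passed through as a string using Jsonnet's importstr:

// vars.jsonnet (sketch - all values are placeholders, adjust for your environment)
local project = 'stackql';

{
  projectId: project,
  location: 'US',
  datasetId: 'gcs_logs',
  description: 'GCS usage and storage logs',
  friendlyName: 'GCS logs',
  logs_bucket: 'stackql-download-logs',
  object_prefix: 'stackql_downloads_usage*',
  table: {
    usage: {
      tableId: 'usage',
      description: 'GCS usage logs',
      friendlyName: 'usage',
      // assumption: the schema file contents are substituted as-is into the templates above
      schema: importstr 'cloud_storage_usage_schema_v0.json',
    },
  },
}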

Clean up (optional)

If you want to clean up what you have done, you can do so using StackQL DELETE statements, as provided below:

NOTE: To delete a BigQuery dataset, you need to delete all of the tables contained in the dataset first, as shown in the following example:

-- delete table(s) 

DELETE FROM google.bigquery.tables
WHERE projectId = '{{ .projectId }}'
AND datasetId = '{{ .datasetId }}'
AND tableId = '{{ .table.usage.tableId }}';

-- delete dataset

DELETE FROM google.bigquery.datasets
WHERE projectId = '{{ .projectId }}'
AND datasetId = '{{ .datasetId }}';
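If you save these DELETE statements to a file (for example bq_cleanup.iql, a hypothetical name), they can be run as a batch using the same pattern shown above:

stackql exec -i ./bq_cleanup.iql --iqldata ./vars.jsonnet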

· 3 min read

In a previous article, Deploying and Querying GCS Buckets using StackQL, we walked through some basic creation and query operations on Google Cloud Storage buckets. In this post we will build on this by enabling logging on a GCS bucket using StackQL. This post is based upon the article Usage logs & storage logs.

Assuming we have deployed a bucket which we want to log activities on, follow the steps below:

Step 1: Create a bucket to store the usage logs

One bucket in a project can be used to collect the usage logs from one or more other buckets in the project. Use the StackQL Command Shell (stackql shell) or stackql exec to create this logs bucket as shown here:

INSERT INTO google.storage.buckets(
project,
data__name,
data__location,
data__locationType
)
SELECT
'stackql',
'stackql-download-logs',
'US',
'multi-region'
;

For more examples of creating Google Cloud Storage buckets using StackQL, see Deploying and Querying GCS Buckets using StackQL.

Step 2: Set IAM policy for the logs bucket

You will need to create an IAM binding to enable writes to this bucket; do this using the setIamPolicy method as shown here:

EXEC google.storage.buckets.setIamPolicy
@bucket = 'stackql-download-logs'
@@json = '{
  "bindings": [
    {
      "role": "roles/storage.legacyBucketWriter",
      "members": [
        "group:cloud-storage-analytics@google.com"
      ]
    }
  ]
}';

TIP: you should also add role bindings to the roles/storage.legacyBucketOwner role for the service accounts or users who will be running StackQL SELECT queries against this logs bucket.
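For example, a single setIamPolicy call along the following lines would grant both roles. This is a sketch only: the serviceAccount member shown is hypothetical, so substitute your own principal:

-- sketch: the serviceAccount member below is a hypothetical placeholder
EXEC google.storage.buckets.setIamPolicy
@bucket = 'stackql-download-logs'
@@json = '{
  "bindings": [
    {
      "role": "roles/storage.legacyBucketWriter",
      "members": [
        "group:cloud-storage-analytics@google.com"
      ]
    },
    {
      "role": "roles/storage.legacyBucketOwner",
      "members": [
        "serviceAccount:stackql-reader@my-project.iam.gserviceaccount.com"
      ]
    }
  ]
}';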

Step 3: Enable logging on the target bucket

To enable logging on your target bucket (or buckets) run the following StackQL EXEC method:

EXEC google.storage.buckets.patch
@bucket = 'stackql-downloads'
@@json = '{
  "logging": {
    "logBucket": "stackql-download-logs",
    "logObjectPrefix": "stackql_downloads"
  }
}';

TIP: use SHOW METHODS IN google.storage.buckets; to see what operations are available, such as the patch and setIamPolicy examples shown in the previous steps.

Step 4: Check logging status on target bucket

To see that logging has been enabled run the StackQL query below:

select name, logging
from google.storage.buckets
WHERE project = 'stackql'
and logging is not null;

To unpack the logging object, you can use the JSON_EXTRACT built-in function as shown here:

select name, json_extract(logging, '$.logBucket') as logBucket,
json_extract(logging, '$.logObjectPrefix') as logObjectPrefix
from google.storage.buckets
WHERE project = 'stackql'
and logging is not null;

In Part II of this post, we will demonstrate how to create a BigQuery dataset, then load and analyze the GCS usage logs you have collected using BigQuery. Stay tuned!

· 6 min read

StackQL is a simple way to automate the deployment, configuration, management, and removal of Google Cloud Storage buckets, as well as an easy way to query buckets in your environment for asset management purposes or to look for misconfigurations, such as public access, non-conformant encryption configuration, and more. It may be useful to review Exploring GCP services and resources using StackQL, which walks through the StackQL resource hierarchy, including the SHOW and DESCRIBE commands.

Generate an INSERT template for a new bucket

The SHOW INSERT command in StackQL can be used to generate an INSERT template for creating any resource in GCP. The easiest way to use this command is via the command line, using the text output type and suppressing column headers with the -H flag; the StackQL interactive shell can be used as well.

SHOW INSERT INTO google.storage.buckets;

To only show the mandatory fields for a resource, you can use the /*+ REQUIRED */ query hint.

You could also use the --outfile flag to write the template to a new IQL file, or pipe the results to a file using the appropriate shell operator (e.g. > or >>).
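Putting these together, an invocation along the following lines would write the generated template to a file. This is a sketch: it assumes stackql exec accepts the query text directly as an argument (the other examples in this post use the -i/--infile flag instead), and the output file name is arbitrary:

stackql exec "SHOW INSERT INTO google.storage.buckets" --output text -H --outfile ./bucket_insert.iql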

More information on generating templates can be found at Creating Infrastructure Templates.

Create a bucket

Now that you have a template, you can trim it down to just the fields you need. Executing an INSERT command to create a bucket is as easy as:

-- change this for your project and bucket names
INSERT INTO google.storage.buckets(
project,
data__name,
data__location,
data__locationType,
data__labels
)
SELECT
'stackql-demo',
'stackql-demo-bucket',
'US',
'multi-region',
'[{"env":"demo"}]'
;

Removing buckets

In some cases you may want to retire and remove buckets that are no longer needed. Doing so is easy in StackQL using the DELETE command; this can be done individually or in a batch (as we will demonstrate later).

NOTE: buckets cannot be deleted if they contain any objects; we will show examples of working with objects in future posts.

To delete the bucket you created in the previous step, run the following:

DELETE FROM google.storage.buckets
WHERE bucket = 'stackql-demo-bucket';

Using variables to automate bucket creation

Although the previous example was simple, providing static values for attributes is not very scalable. In most cases, configuration for resource creation and modification is provided as data associated with a StackQL script or module using JSON or Jsonnet files.

This can be supplied inline in the IQL script or provided as a separate file, e.g. vars.jsonnet. For more information, see Using Variables.

The example below demonstrates how to create 3 different buckets using configuration data provided using jsonnet.

TIP: Use Jsonnet where possible, as it is a much more powerful and feature-rich configuration and data templating language, supporting inheritance, external variables, parameterization, functions and operators, and much more. For more information visit https://jsonnet.org/.

Step 1: Save the following jsonnet configuration to a file named cloud_storage_vars.jsonnet:

local project = 'stackql-demo'; // update to your project name

{
  project: project,
  location: 'US',
  locationType: 'multi-region',
  buckets: [
    {
      name: project + '-bucket1',
      label: [{"env":"demo1"}],
    },
    {
      name: project + '-bucket2',
      label: [{"env":"demo2"}],
    },
    {
      name: project + '-bucket3',
      label: [{"env":"demo3"}],
    },
  ]
}

Step 2: Save the following StackQL script to a file named deploy_buckets.iql:

{{range .buckets}}
INSERT INTO google.storage.buckets(
project,
data__location,
data__locationType,
data__name,
data__labels
)
SELECT
'{{ $.project }}',
'{{ $.location }}',
'{{ $.locationType }}',
'{{ .name }}',
'{{ .label }}'
;
{{end}}

Step 3: (Optional) Perform a dryrun operation to see what the resultant query would be

With any modification operation that sources data from a jsonnet file, it is useful to perform a dryrun operation first to ensure the variable substitution is as you intended. This is done using the --dryrun flag as shown here:

stackql exec --iqldata ./cloud_storage_vars.jsonnet --infile ./deploy_buckets.iql --dryrun --output text -H

Step 4: Run the query

If you are satisfied with the results of the dry run, remove the --dryrun and --output flags (which are not needed) and run the command, as shown here (note that if you are creating or deploying resources in GCP you will need to supply a valid service account key file or authenticate interactively):

stackql exec --iqldata ./cloud_storage_vars.jsonnet --infile ./deploy_buckets.iql --keyfilepath /mnt/c/tmp/stackql-demo.json

Querying buckets using SELECT statements

This is easy...

SELECT id, name, location, locationType, timeCreated
FROM google.storage.buckets
WHERE project = 'stackql-demo';
/* returns
|----------------------|----------------------|----------|--------------|--------------------------|
| id | name | location | locationType | timeCreated |
|----------------------|----------------------|----------|--------------|--------------------------|
| stackql-demo-bucket1 | stackql-demo-bucket1 | US | multi-region | 2021-08-10T21:32:25.847Z |
|----------------------|----------------------|----------|--------------|--------------------------|
| stackql-demo-bucket2 | stackql-demo-bucket2 | US | multi-region | 2021-08-10T21:32:26.877Z |
|----------------------|----------------------|----------|--------------|--------------------------|
| stackql-demo-bucket3 | stackql-demo-bucket3 | US | multi-region | 2021-08-10T21:32:27.864Z |
|----------------------|----------------------|----------|--------------|--------------------------|
*/

To order the result set by timeCreated latest first...

SELECT id, name, location, locationType, timeCreated
FROM google.storage.buckets
WHERE project = 'stackql-demo'
ORDER BY timeCreated DESC;

Or a simple count:

SELECT COUNT(*) as 'Number of Buckets'
FROM google.storage.buckets
WHERE project = 'stackql-demo'
AND name LIKE '%stackql-demo-bucket%';
/* returns
|-------------------|
| Number of Buckets |
|-------------------|
| 3 |
|-------------------|
*/

Cleanup

To clean up the environment and remove the test buckets, copy the following code to a file named bucket_cleanup.iql.

{{range .buckets}}
DELETE FROM google.storage.buckets
WHERE bucket = '{{.name}}';
{{end}}

Since we are in the danger zone here (deleting multiple persistent data objects), you may want to be sure you haven't fat-fingered anything and don't remove anything you didn't intend to... To do this, use the --dryrun option shown earlier to preprocess the input data and show the templated commands before you run them.
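For example, the same flags used for the earlier dry run apply here, with only the input file changed:

stackql exec --iqldata ./cloud_storage_vars.jsonnet --infile ./bucket_cleanup.iql --dryrun --output text -H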

When you are ready to go you can run this command:

stackql exec --iqldata ./cloud_storage_vars.jsonnet --infile ./bucket_cleanup.iql --keyfilepath /mnt/c/tmp/stackql-demo.json

Now you should see a filtered count return 0 resources as shown here:

SELECT COUNT(*) as 'Number of Buckets'
FROM google.storage.buckets
WHERE project = 'stackql-demo'
AND name LIKE '%stackql-demo-bucket%';
/* returns
|-------------------|
| Number of Buckets |
|-------------------|
| 0 |
|-------------------|
*/

Future posts will demonstrate more advanced query and deployment operations with Cloud Storage, such as working with encryption using CMEK in KMS, and configuring logging, CORS, website access, ACLs, and IAM. See ya then!