
4 posts tagged with "GCS"


· 2 min read

It's easy enough for anyone to deploy a Cloud Storage bucket in Google Cloud; this can be done through the console, gcloud, Terraform, or StackQL, as shown here: Deploying and Querying GCS Buckets using StackQL. It is also easy to inadvertently allow users to set public ACLs on a bucket, making its contents publicly visible by default. There is an easy way to prevent this from happening: using public access prevention.

Let's work through a real-life scenario using StackQL.

Step 1: Run a query to find buckets which do not have public access prevention enforced

Run the following StackQL query from the shell or via exec:

SELECT name, 
JSON_EXTRACT(iamConfiguration, '$.publicAccessPrevention') as publicAccessPrevention
FROM google.storage.buckets
WHERE project = 'myco-terraform';
/* returns
|-------------------|------------------------|
| name | publicAccessPrevention |
|-------------------|------------------------|
| myco-tf-nonprod | unspecified |
|-------------------|------------------------|
| myco-tf-prod | enforced |
|-------------------|------------------------|
*/

We can see from the query results that the myco-tf-nonprod bucket does not have public access prevention enforced. Let's fix it... using StackQL.
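
As a variation, you could filter the result set down to only the non-conformant buckets. This is a minimal sketch, assuming the JSON_EXTRACT built-in function can also be used in a WHERE clause:

SELECT name
FROM google.storage.buckets
WHERE project = 'myco-terraform'
AND JSON_EXTRACT(iamConfiguration, '$.publicAccessPrevention') != 'enforced';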

Step 2: Configure public access prevention for a bucket

Run the following StackQL procedure to enforce public access prevention:

EXEC google.storage.buckets.patch
@bucket = 'myco-tf-nonprod'
@@json = '{
  "iamConfiguration": {
    "publicAccessPrevention": "enforced"
  }
}';

Step 3: Confirm public access prevention is enforced

Run the first query again, and you should see that public access prevention is now enforced on both buckets.

SELECT name, 
JSON_EXTRACT(iamConfiguration, '$.publicAccessPrevention') as publicAccessPrevention
FROM google.storage.buckets
WHERE project = 'myco-terraform';
/* returns
|-------------------|------------------------|
| name | publicAccessPrevention |
|-------------------|------------------------|
| myco-tf-nonprod | enforced |
|-------------------|------------------------|
| myco-tf-prod | enforced |
|-------------------|------------------------|
*/

Easy!

· 5 min read

In the previous post, we showed you how to enable usage and storage logging for GCS buckets. Now that we have enabled logging, let's load and analyze the logs using BigQuery. We will build up a data file, vars.jsonnet, as we go and show the queries step by step; at the end, we will show how to run this as one batch using StackQL.
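
For reference, the complete vars.jsonnet file built up over the course of this post could look something like the sketch below. The field names match the templates used in the following steps; the values (project, dataset, table, and object prefix) are hypothetical placeholders, and jsonnet's importstr function is one way to pull the table schema in from the cloud_storage_usage_schema_v0.json file referenced in Step 2:

// vars.jsonnet -- all values below are example placeholders
{
  projectId: 'stackql',
  location: 'US',
  datasetId: 'gcs_logs',
  description: 'GCS usage and storage logs',
  friendlyName: 'GCS logs',
  logs_bucket: 'stackql-download-logs',
  object_prefix: 'stackql_downloads_usage*',
  table: {
    usage: {
      tableId: 'usage',
      description: 'GCS usage logs',
      friendlyName: 'usage',
      schema: importstr 'cloud_storage_usage_schema_v0.json',
    },
  },
}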

Step 1: Create a BigQuery dataset

We will need a dataset (akin to a schema or a database in other RDBMS parlance), which is essentially a container for objects such as tables or views. The data and code to do this are shown here:

INSERT INTO google.bigquery.datasets(
  projectId,
  data__location,
  data__datasetReference,
  data__description,
  data__friendlyName
)
SELECT
  '{{ .projectId }}',
  '{{ .location }}',
  '{ "datasetId": "{{ .datasetId }}", "projectId": "{{ .projectId }}" }',
  '{{ .description }}',
  '{{ .friendlyName }}'
;
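
Assuming the INSERT statement above is saved to a file (for example create_dataset.iql, a hypothetical file name), you can execute it against your vars.jsonnet data in the same way as the later steps in this post:

stackql exec -i ./create_dataset.iql --iqldata ./vars.jsonnet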

Step 2: Create the usage table

Let's use StackQL to create a table named usage to host the GCS usage logs. The schema for the table is defined in a file named cloud_storage_usage_schema_v0.json, which can be downloaded from Google; the StackQL statement to create the table is shown here:

/* create_table.iql */

INSERT INTO google.bigquery.tables(
  datasetId,
  projectId,
  data__description,
  data__friendlyName,
  data__tableReference,
  data__schema
)
SELECT
  '{{ .datasetId }}',
  '{{ .projectId }}',
  '{{ .table.usage.description }}',
  '{{ .table.usage.friendlyName }}',
  '{"projectId": "{{ .projectId }}", "datasetId": "{{ .datasetId }}", "tableId": "{{ .table.usage.tableId }}"}',
  '{{ .table.usage.schema }}'
;

Run the following to execute the StackQL command with the input data shown:

stackql exec -i ./create_table.iql --iqldata ./vars.jsonnet

Step 3: Load the usage data

We have a BigQuery dataset and a table, so let's load some data. To do this, we need to create and submit a load job, which we can do by inserting into the google.bigquery.jobs resource as shown here:

/* bq_load_job.iql */

INSERT INTO google.bigquery.jobs(
  projectId,
  data__configuration
)
SELECT
  '{{ .projectId }}',
  '{
    "load": {
      "destinationTable": {
        "projectId": "{{ .projectId }}",
        "datasetId": "{{ .datasetId }}",
        "tableId": "{{ .table.usage.tableId }}"
      },
      "sourceUris": [
        "gs://{{ .logs_bucket }}/{{ .object_prefix }}"
      ],
      "schema": {{ .table.usage.schema }},
      "skipLeadingRows": 1,
      "maxBadRecords": 0,
      "projectionFields": []
    }
  }'
;

Run the following to execute:

stackql exec -i ./bq_load_job.iql --iqldata ./vars.jsonnet
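
To check that the load job completed successfully, you could query the jobs resource in the same way we queried buckets earlier. This is a minimal sketch, assuming the google.bigquery.jobs resource can be listed with a SELECT and exposes a status object containing the job state:

SELECT id,
JSON_EXTRACT(status, '$.state') as state,
JSON_EXTRACT(status, '$.errorResult') as errorResult
FROM google.bigquery.jobs
WHERE projectId = '{{ .projectId }}';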

Clean up (optional)

If you want to clean up what you have done, you can do so using StackQL DELETE statements, as provided below:

NOTE: To delete a BigQuery dataset, you need to delete all of the tables contained in the dataset first, as shown in the following example:

-- delete table(s) 

DELETE FROM google.bigquery.tables
WHERE projectId = '{{ .projectId }}'
AND datasetId = '{{ .datasetId }}'
AND tableId = '{{ .table.usage.tableId }}';

-- delete dataset

DELETE FROM google.bigquery.datasets
WHERE projectId = '{{ .projectId }}'
AND datasetId = '{{ .datasetId }}';
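
Assuming the DELETE statements above are saved to a file (for example bq_cleanup.iql, a hypothetical file name), they can be run as a batch using the same pattern as the earlier steps:

stackql exec -i ./bq_cleanup.iql --iqldata ./vars.jsonnet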

· 3 min read

In a previous article, Deploying and Querying GCS Buckets using StackQL, we walked through some basic creation and query operations on Google Cloud Storage buckets. In this post, we will build on this by enabling logging on a GCS bucket using StackQL. This post is based on this article: Usage logs & storage logs.

Assuming we have deployed a bucket that we want to log activity on, follow the steps below:

Step 1: Create a bucket to store the usage logs

One bucket in a project can be used to collect the usage logs from one or more other buckets in the project. Use the StackQL Command Shell (stackql shell) or stackql exec to create this logs bucket as shown here:

INSERT INTO google.storage.buckets(
  project,
  data__name,
  data__location,
  data__locationType
)
SELECT
  'stackql',
  'stackql-download-logs',
  'US',
  'multi-region'
;

For more examples of creating Google Cloud Storage buckets using StackQL, see Deploying and Querying GCS Buckets using StackQL.

Step 2: Set IAM policy for the logs bucket

You will need to create an IAM binding to enable writes to this bucket; do this using the setIamPolicy method as shown here:

EXEC google.storage.buckets.setIamPolicy
@bucket = 'stackql-download-logs'
@@json = '{
  "bindings": [
    {
      "role": "roles/storage.legacyBucketWriter",
      "members": [
        "group:cloud-storage-analytics@google.com"
      ]
    }
  ]
}';

TIP: you should also add role bindings for the roles/storage.legacyBucketOwner role for any service accounts or users who will be running StackQL SELECT queries against this logs bucket.
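
As a sketch, the additional binding could be included in the same setIamPolicy call shown above (setIamPolicy replaces the bucket's existing policy, so supply all of the bindings you need in one call); the service account member below is a hypothetical placeholder:

-- the serviceAccount member is a hypothetical placeholder, replace with your own
EXEC google.storage.buckets.setIamPolicy
@bucket = 'stackql-download-logs'
@@json = '{
  "bindings": [
    {
      "role": "roles/storage.legacyBucketWriter",
      "members": [
        "group:cloud-storage-analytics@google.com"
      ]
    },
    {
      "role": "roles/storage.legacyBucketOwner",
      "members": [
        "serviceAccount:my-sa@stackql.iam.gserviceaccount.com"
      ]
    }
  ]
}';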

Step 3: Enable logging on the target bucket

To enable logging on your target bucket (or buckets), run the following StackQL EXEC method:

EXEC google.storage.buckets.patch
@bucket = 'stackql-downloads'
@@json = '{
  "logging": {
    "logBucket": "stackql-download-logs",
    "logObjectPrefix": "stackql_downloads"
  }
}';

TIP: use SHOW METHODS IN google.storage.buckets; to see what operations are available, such as the patch and setIamPolicy examples shown in the previous steps.

Step 4: Check logging status on target bucket

To see that logging has been enabled, run the StackQL query below:

SELECT name, logging
FROM google.storage.buckets
WHERE project = 'stackql'
AND logging IS NOT NULL;

To unpack the logging object, you can use the JSON_EXTRACT built-in function as shown here:

SELECT name,
JSON_EXTRACT(logging, '$.logBucket') as logBucket,
JSON_EXTRACT(logging, '$.logObjectPrefix') as logObjectPrefix
FROM google.storage.buckets
WHERE project = 'stackql'
AND logging IS NOT NULL;

In Part II of this post, we will demonstrate how to create a BigQuery dataset, then load and analyze the GCS usage logs you have collected using BigQuery. Stay tuned!

· 6 min read

StackQL is a simple way to automate the deployment, configuration, management, and removal of Google Cloud Storage buckets, as well as an easy way to query buckets in your environment for asset management purposes or to look for misconfigurations, such as public access, non-conformant encryption configuration, and more. It may be useful to review Exploring GCP services and resources using StackQL, which walks through the StackQL resource hierarchy, including the SHOW and DESCRIBE commands.

Generate an INSERT template for a new bucket

The SHOW INSERT command in StackQL can be used to generate an INSERT template which can be used to create any resource in GCP. The easiest way to use this command is via the command line using the text output type and suppressing column headers using the -H flag; the StackQL interactive shell can be used as well.

SHOW INSERT INTO google.storage.buckets;

To only show the mandatory fields for a resource, you can use the /*+ REQUIRED */ query hint.

You could also use the --outfile flag to write the template to a new IQL file, or pipe the results to a file using the appropriate shell operator (e.g. > or >>).
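
For example, the following is a sketch of what that command-line invocation might look like (the output file name is a hypothetical placeholder; check stackql exec --help for the exact flags supported by your version):

# run the SHOW INSERT statement and write the generated template to a file
stackql exec "SHOW INSERT INTO google.storage.buckets;" --output text -H > bucket_template.iql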

More information on generating templates can be found at Creating Infrastructure Templates.

Create a bucket

Now that you have a template, you can curate this to the fields you desire. Executing an INSERT command to create a bucket is as easy as:

-- change this for your project and bucket names
INSERT INTO google.storage.buckets(
  project,
  data__name,
  data__location,
  data__locationType,
  data__labels
)
SELECT
  'stackql-demo',
  'stackql-demo-bucket',
  'US',
  'multi-region',
  '[{"env":"demo"}]'
;

Removing buckets

In some cases you may want to retire and remove buckets that are no longer needed. Doing so is easy in StackQL using the DELETE command; this can be done individually or in a batch (as we will demonstrate later).

NOTE that buckets cannot be deleted if they contain any objects; we will show examples of working with objects in future posts.

To delete the bucket you created in the previous step, run the following:

DELETE FROM google.storage.buckets
WHERE bucket = 'stackql-demo-bucket';

Using variables to automate bucket creation

Although the previous example was simple, providing static values for attributes is not very scalable. In most cases, configuration for resource creation and modification is provided as data associated with a StackQL script or module using json or jsonnet files.

This can be supplied inline in the IQL script or provided as a separate file, e.g. vars.jsonnet. For more information, see Using Variables.

The example below demonstrates how to create three different buckets using configuration data provided in jsonnet.

TIP: Use Jsonnet where possible, as it is a much more powerful and feature-rich configuration and data templating language, supporting inheritance, external variables, parameterization, functions and operators, and much more. For more information visit https://jsonnet.org/.

Step 1: Save the following jsonnet configuration to a file named cloud_storage_vars.jsonnet:

local project = 'stackql-demo'; // update to your project name

{
  project: project,
  location: 'US',
  locationType: 'multi-region',
  buckets: [
    {
      name: project + '-bucket1',
      label: [{"env":"demo1"}],
    },
    {
      name: project + '-bucket2',
      label: [{"env":"demo2"}],
    },
    {
      name: project + '-bucket3',
      label: [{"env":"demo3"}],
    },
  ],
}

Step 2: Save the following StackQL script to a file named deploy_buckets.iql:

{{range .buckets}}
INSERT INTO google.storage.buckets(
  project,
  data__location,
  data__locationType,
  data__name,
  data__labels
)
SELECT
  '{{ $.project }}',
  '{{ $.location }}',
  '{{ $.locationType }}',
  '{{ .name }}',
  '{{ .label }}'
;
{{end}}

Step 3: (Optional) Perform a dryrun operation to see what the resultant query would be

With any modification operation that sources data from a jsonnet file, it is useful to perform a dry run first to ensure the variable substitution is as you intended. This is done using the --dryrun flag as shown here:

stackql exec --iqldata ./cloud_storage_vars.jsonnet --infile ./deploy_buckets.iql --dryrun --output text -H

Step 4: Run the query

If you are satisfied with the results of the dry run, remove the --dryrun and --output flags (which are no longer needed) and run the command, as shown here. Note that if you are creating or deploying resources in GCP, you will need to supply a valid service account key file or authenticate interactively:

stackql exec --iqldata ./cloud_storage_vars.jsonnet --infile ./deploy_buckets.iql --keyfilepath /mnt/c/tmp/stackql-demo.json

Querying buckets using SELECT statements

This is easy...

SELECT id, name, location, locationType, timeCreated
FROM google.storage.buckets
WHERE project = 'stackql-demo';
/* returns
|----------------------|----------------------|----------|--------------|--------------------------|
| id | name | location | locationType | timeCreated |
|----------------------|----------------------|----------|--------------|--------------------------|
| stackql-demo-bucket1 | stackql-demo-bucket1 | US | multi-region | 2021-08-10T21:32:25.847Z |
|----------------------|----------------------|----------|--------------|--------------------------|
| stackql-demo-bucket2 | stackql-demo-bucket2 | US | multi-region | 2021-08-10T21:32:26.877Z |
|----------------------|----------------------|----------|--------------|--------------------------|
| stackql-demo-bucket3 | stackql-demo-bucket3 | US | multi-region | 2021-08-10T21:32:27.864Z |
|----------------------|----------------------|----------|--------------|--------------------------|
*/

To order the result set by timeCreated latest first...

SELECT id, name, location, locationType, timeCreated
FROM google.storage.buckets
WHERE project = 'stackql-demo'
ORDER BY timeCreated DESC;

Or a simple count:

SELECT COUNT(*) as 'Number of Buckets'
FROM google.storage.buckets
WHERE project = 'stackql-demo'
AND name LIKE '%stackql-demo-bucket%';
/* returns
|-------------------|
| Number of Buckets |
|-------------------|
| 3 |
|-------------------|
*/

Cleanup

To clean up the environment and remove the test buckets, copy the following code to a file named bucket_cleanup.iql.

{{range .buckets}}
DELETE FROM google.storage.buckets
WHERE bucket = '{{.name}}';
{{end}}

Since we are in the danger zone here (deleting multiple persistent data objects), you may want to be sure you haven't fat-fingered anything and don't remove anything you didn't intend to. To do this, use the dryrun option shown earlier to run the script; it preprocesses the input data and shows the templated commands before you run them for real.
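
For example, reusing the dry run flags from the deployment step:

stackql exec --iqldata ./cloud_storage_vars.jsonnet --infile ./bucket_cleanup.iql --dryrun --output text -H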

When you are ready to go you can run this command:

stackql exec --iqldata ./cloud_storage_vars.jsonnet --infile ./bucket_cleanup.iql --keyfilepath /mnt/c/tmp/stackql-demo.json

Now you should see a filtered count return 0 resources as shown here:

SELECT COUNT(*) as 'Number of Buckets'
FROM google.storage.buckets
WHERE project = 'stackql-demo'
AND name LIKE '%stackql-demo-bucket%';
/* returns
|-------------------|
| Number of Buckets |
|-------------------|
| 0 |
|-------------------|
*/

Future posts will demonstrate more advanced query and deployment operations with Cloud Storage, such as working with encryption using CMEK in KMS, configuring logging, CORS, website access, ACLs, and IAM. See ya then!