Skip to main content

12 posts tagged with "infrastructure-as-code"

View All Tags

· 2 min read

I grappled with Terraform for the better part of a day trying to provision a GKE Autopilot cluster in a Shared VPC service project, I was able to do this with StackQL in 2 minutes, this is how...

Before starting you will need the following to use GKE Autopilot in your Shared VPC:

  • control plane IP address range
  • control plane authorized networks (if desired)
  • the host network and node subnet you intend to use
  • pod and services secondary CIDR ranges

(all of the above would typically be pre-provisioned in the Shared VPC design and deployment)

Step 1: Using the GCP console, navigate to your service project, go to Kubernetes Engine --> Clusters --> Create --> GKE Autopilot --> Configure. Enter in all of the desired configuration options (including the network configuration specified above). Do not select CREATE.

Step 2: At the bottom of the dialog used to configure the cluster in the console, use the Equivalent REST button to generate the GKE Autopilot API request body.

Step 3: Supply this as input data to an StackQL INSERT command, either via an iql file, on as inline configuration. Optionally you can convert this to Jsonnet and parameterise for use in other environments.

<<<json
{
"cluster": {
..from equivalent REST command..
}
}
>>>

INSERT INTO google.container.`projects.locations.clusters`(
parent,
data__cluster
)
SELECT 'projects/my-svc-project/locations/australia-southeast1',
'{{ .cluster }}'
;

easy!

· 2 min read

Jsonnet is a fantastic configuration language as discussed in Using Jsonnet to Configure Multiple Environments. Going slightly beyond the basics, this article is an introduction to anonymous functions and the map and format methods in the Jsonnet standard library.

Similar to map methods in various other functional programming languages or data processing frameworks, map in Jsonnet evaluates a named or anonymous function for each element within an array. map is a higher order function, meaning it is a function that calls another function. Its signature is here:

std.map(func, arr)

the func argument could be a named function or an unnamed (or anonymous function). arr is an input array which could include embedded dictionaries or other lists as well.

In this example I am templating some config for a NAT gateway in GCP for use in an StackQL routine, where I have a list of external IP's that need to be formatted in the Google selfLink format. Perfect use for the map method as well as the format command similar to the printf or equivalent commands found in various other languages. The easiest way to use this is similar to the way you would invoke this in Python:

"%s/%s/%s" % [string1, string2, string3]

Putting it all together in the following practical example, you can see the input Jsonnet in the Jsonnet tab and the templated or rendered output in the Json tab. x represents an element of the extIps array, then the function returns the fully qualified selfLink url.

{
local project_id = 'myproject-123',
local region = 'australia-southeast1',
local self_link_prefix = 'https://compute.googleapis.com/compute/v1/projects/',
local extIps = [{name: 'syd-extip1', region: region},{name: 'syd-extip2', region: region}],

nats: [
{
name: 'nat-config',
natIpAllocateOption: 'MANUAL_ONLY',
natIps: std.map((function(x) "%s/%s/regions/%s/addresses/%s" % [self_link_prefix, project_id, x.region, x.name]), extIps),
sourceSubnetworkIpRangesToNat: 'ALL_SUBNETWORKS_ALL_IP_RANGES'
},
],
}

more to come...

· 2 min read

Its easy enough for anyone to deploy a Cloud Storage bucket in google, this can be done through the console, gcloud, terraform or stackql as shown here: Deploying and Querying GCS Buckets using StackQL. It is also easy to inadvertently allow users to set public ACLs on a bucket, therefore making its contents publicly visible by default. There is an easy way to prevent this from happening by Using public access prevention.

Let's work through a real life scenario using StackQL.

Step 1 : Run a query to find buckets which do not have public access prevention enforced

Run the following StackQL query from the shell or via exec:

SELECT name, 
JSON_EXTRACT(iamConfiguration, '$.publicAccessPrevention') as publicAccessPrevention
FROM google.storage.buckets
WHERE project = 'myco-terraform';
/* returns
|-------------------|------------------------|
| name | publicAccessPrevention |
|-------------------|------------------------|
| myco-tf-nonprod | unspecified |
|-------------------|------------------------|
| myco-tf-prod | enforced |
|-------------------|------------------------|
*/

We can see from the query results that the myco-tf-nonprod bucket does not have public access prevention enforced, lets fix it...using StackQL.

Step 2 : Configure public access prevention for a bucket

Run the following StackQL procedure to enforce public access prevention:

EXEC google.storage.buckets.patch 
@bucket = 'myco-tf-nonprod'
@@json = '{
"iamConfiguration": {
"publicAccessPrevention": "enforced"
}
}';

Step 3: Confirm public access prevention is enforced

Run the first query again, and you should see that the desired result is in place.

SELECT name, 
JSON_EXTRACT(iamConfiguration, '$.publicAccessPrevention') as publicAccessPrevention
FROM google.storage.buckets
WHERE project = 'myco-terraform';
/* returns
|-------------------|------------------------|
| name | publicAccessPrevention |
|-------------------|------------------------|
| myco-tf-nonprod | enforced |
|-------------------|------------------------|
| myco-tf-prod | enforced |
|-------------------|------------------------|
*/

Easy!

· 5 min read

In the previous post, we showed you how to enable usage and storage logging for GCS buckets. Now that we have enabled logging, let's load and analyze the logs using Big Query. We will build up a data file vars.jsonnet as we go and show the queries step by step, at the end we will show how to run this as one batch using StackQL.

Step 1 : Create a Big Query dataset

We will need a dataset (akin to a schema or a database in other RDMBS parlance), basically a container for objects such as tables or views, the data and code to do this are shown here:

INSERT INTO google.bigquery.datasets(
projectId,
data__location,
data__datasetReference,
data__description,
data__friendlyName
)
SELECT
'{{ .projectId }}',
'{{ .location }}',
'{ "datasetId": "{{ .datasetId }}", "projectId": "{{ .projectId }}" }',
'{{ .description }}',
'{{ .friendlyName }}'
;

Step 2 : Create usage table

Let's use StackQL to create a table named usage to host the GCS usage logs, the schema for the table is defined in a file named cloud_storage_usage_schema_v0.json which can be downloaded from the location provided, for reference this is provided in the Table Schema tab in the example provided below:

/* create_table.iql */

INSERT INTO google.bigquery.tables(
datasetId,
projectId,
data__description,
data__friendlyName,
data__tableReference,
data__schema
)
SELECT
'{{ .datasetId }}',
'{{ .projectId }}',
'{{ .table.usage.description }}',
'{{ .table.usage.friendlyName }}',
'{"projectId": "{{ .projectId }}", "datasetId": "{{ .datasetId }}", "tableId": "{{ .table.usage.tableId }}"}',
'{{ .table.usage.schema }}'
;

Run the following to execute the StackQL command with the input data shown:

stackql exec -i ./create_table.iql --iqldata ./vars.jsonnet

Step 3 : Load the usage data

We have a Big Query dataset and a table, lets load some data. To do this we will need to create and submit a load job, we can do this by inserting into the google.bigquery.jobs resource as shown here:

/* bq_load_job.iql */

INSERT INTO google.bigquery.jobs(
projectId,
data__configuration
)
SELECT
'stackql',
'{
"load": {
"destinationTable": {
"projectId": "{{ .projectId }}",
"datasetId": "{{ .datasetId }}",
"tableId": "{{ .table.usage.tableId }}"
},
"sourceUris": [
"gs://{{ .logs_bucket }}/{{ .object_prefix }}"
],
"schema": {{ .table.usage.schema }},
"skipLeadingRows": 1,
"maxBadRecords": 0,
"projectionFields": []
}
}'
;

Run the following to execute:

stackql exec -i ./bq_load_job.iql --iqldata ./vars.jsonnet

Clean up (optional)

If you want to clean up what you have done, you can do so using StackQL DELETE statements, as provided below:

NOTE: To delete a Big Query dataset, you need to delete all of the tables contained in the dataset first, as shown in the following example

-- delete table(s) 

DELETE FROM google.bigquery.tables
WHERE projectId = '{{ .projectId }}'
AND datasetId = '{{ .datasetId }}'
AND tableId = '{{ .table.usage.tableId }}';

-- delete dataset

DELETE FROM google.bigquery.datasets
WHERE projectId = '{{ .projectId }}'
AND datasetId = '{{ .datasetId }}';

· 3 min read

In a previous article, Deploying and Querying GCS Buckets using StackQL, we walked through some basic creation and query operations on Google Cloud Storage buckets. In this post we will extend on this by enabling logging on a GCS bucket using StackQL. This post is based upon this article: Usage logs & storage logs.

Assuming we have deployed a bucket which we want to log activities on, follow the steps below:

Step 1 : Create a bucket to store the usage logs

One bucket in a project can be used to collect the usage logs from one or more other buckets in the project. Use the StackQL Command Shell (stackql shell) or stackql exec to create this logs bucket as shown here:

INSERT INTO google.storage.buckets(
project,
data__name,
data__location,
data__locationType
)
SELECT
'stackql',
'stackql-download-logs',
'US',
'multi-region'
;

for more examples of creating Google Cloud Storage buckets using StackQL, see Deploying and Querying GCS Buckets using StackQL.

Step 2: Set IAM policy for the logs bucket

You will need to create an IAM binding to enable writes to this bucket, do this by using the setIamPolicy method as shown here:

EXEC google.storage.buckets.setIamPolicy
@bucket = 'stackql-download-logs'
@@json = '{
"bindings":[
{
"role": "roles/storage.legacyBucketWriter",
"members":[
"group:cloud-storage-analytics@google.com"
]
}
]
}';

TIP: you should also add role bindings to the roles/storage.legacyBucketOwner role for serviceAccount or users who will be running StackQL SELECT queries against this logs bucket.

Step 3: Enable logging on the target bucket

To enable logging on your target bucket (or buckets) run the following StackQL EXEC method:

EXEC google.storage.buckets.patch
@bucket = 'stackql-downloads'
@@json = '{
"logging": {
"logBucket": "stackql-download-logs",
"logObjectPrefix": "stackql_downloads"
}
}';

TIP: use SHOW METHODS IN google.storage.buckets; to see what operations are avaialable such as the patch and setIamPolicy examples shown in the previous steps.

Step 4: Check logging status on target bucket

To see that logging has been enabled run the StackQL query below:

select name, logging
from google.storage.buckets
WHERE project = 'stackql'
and logging is not null;

To unpack the logging object, you can use the [JSON_EXTRACT]](/docs/language-spec/functions/json/json_extract) built in function as shown here:

select name, json_extract(logging, '$.logBucket') as logBucket,
json_extract(logging, '$.logObjectPrefix') as logObjectPrefix
from google.storage.buckets
WHERE project = 'stackql'
and logging is not null;

In Part II of this post, we will demonstrate how to create a Big Query dataset, then load and analyze the GCS usage logs you have collected using Big Query, stay tuned!