Skip to main content

10 posts tagged with "gcp"

View All Tags

· 3 min read

Queries (particularly) repetitive queries that don't take advantage of results caching can lead to extraordinarily high bills.

StackQL, with it's backend SQL engine, allows you to query Big Query statistics in real time, including identifying queries which are not served from cache and understanding billable charges per query or time slice.

Here is a simple query to break down a time period into hours and show the total queries, queries served from cache and the total query charges per hour.

SELECT
STRFTIME('%H', DATETIME(SUBSTR(JSON_EXTRACT(statistics, '$.startTime'), 1, 10), 'unixepoch')) as hour,
COUNT(*) as num_queries,
SUM(JSON_EXTRACT(statistics, '$.query.cacheHit')) as using_cache,
SUM(JSON_EXTRACT(statistics, '$.query.totalBytesBilled')*{{ .costPerByte }} ) as queryCost
FROM google.bigquery.jobs
WHERE projectId = '{{ .projectId }}'
AND allUsers = 'true'
AND minCreationTime = '{{ .minCreationTime }}'
AND maxCreationTime = '{{ .maxCreationTime }}'
AND state = 'DONE'
AND JSON_EXTRACT(statistics, '$.query') IS NOT null
GROUP BY STRFTIME('%H', datetime(SUBSTR(JSON_EXTRACT(statistics, '$.startTime'), 1, 10), 'unixepoch'));

Many more examples to come, including using this data to create visualisations in a Jupyter notebook, stay tuned!

· 4 min read

Understanding roles is integral to applying the principal of least privilege to GCP environments.

A quick primer on roles in GCP

A Role in GCP is a collection of permissions to services and APIs on the platform. Roles are "bound" to principals or members (users, groups and service accounts).

These bindings are referred to as "policies" which are scoped at a particular level - organisation, folder, project, resource.

There are three types of roles - Primitive Roles, Predefined Roles and Custom Roles.

Primitive (or Basic) Roles

These are legacy roles set at a GCP project level which include Owner, Editor, and Viewer. These are generally considered to be excessive in terms of permissions and their use should be minimised if not avoided altogether.

Predefined Roles

These are roles with fine grained access to discrete services in GCP. Google has put these together for your convenience. In most cases predefined roles are the preferred mechanism to assign permissions to members.

Custom Roles

Custom roles can be created with a curated collection of permissions if required, reasons for doing so include:

  • if the permissions in predefined roles are excessive for your security posture
  • if you want to combine permissions across different services, and cannot find a suitable predefined role although it is preferred to assign multiple predefined roles to a given member

Anatomy of an IAM Policy

An IAM Policy is a collection of bindings of one or more members (user, group or service account) to a role (primitive, predefined or custom). Policies are normally expressed as JSON objects as shown here:

{
"bindings": [
{
"members": [
"group:project-admins@my-cloud-identity-domain.com"
],
"role": "roles/owner"
},
{
"members": [
"serviceAccount:provisioner@my-project.iam.gserviceaccount.com",
"user:javen@avensolutions.com"
],
"role": "roles/resourcemanager.folderViewer"
}
]
}

Groups are Google Groups created in Cloud Identity or Google Workspace (formerly known as G-Suite)

Application of policies is an atomic operation, which will overwrite any existing policy attached to an entity (org, folder, project, resource).

Querying Roles with StackQL

Predefined and primitive roles are defined in the roles resource in StackQL (google.iam.roles) - which returns the following fields (as returned by DESCRIBE google.iam.roles):

NameDescription
nameName of the role in the format roles/[{service}.]{role}
for predefined or basic roles, or qualified for custom roles,
e.g. organizations/{org_id}/roles/[{service}.]{role}
descriptionAn optional, human-readable description for the role
includedPermissionsAn array of permissions this role grants (only displayed with
VIEW = 'full')
etagOutput only, used internally for consistency
titleAn optional, human-readable title for the role (visible in the
Console)
deletedA read only boolean field showing the current deleted state
of the role
stageThe current launch stage of the role, e.g. ALPHA

Get the name for a role

Often, you may know the "friendly" title for a role like "Logs Bucket Writer", but you need the actual role name to use in an Iam policy - which is roles/logging.bucketWriter. A simple query to find this using StackQL is shown here:

SELECT name
FROM google.iam.roles
WHERE title = 'Logs Bucket Writer';
/* RETURNS:
|----------------------------|
| name |
|----------------------------|
| roles/logging.bucketWriter |
|----------------------------|
*/

Conversely, if you have the name but want the friendly title you could use:

SELECT title
FROM google.iam.roles
WHERE name = 'roles/logging.bucketWriter';
/* RETURNS:
|--------------------|
| title |
|--------------------|
| Logs Bucket Writer |
|--------------------|
*/

Wildcards can also be used with the LIKE operator, for example to get the name and title for each predefined role in the logging service you could run:

SELECT name, title
FROM google.iam.roles
WHERE name LIKE 'roles/logging.%';

Get the permissions for a role

To return the includedPermissions you need to add the following WHERE clause:

WHERE view = 'FULL'

An example query to list the permissions for a given role is shown here:

SELECT includedPermissions
FROM google.iam.roles
WHERE view = 'FULL' AND
name = 'roles/cloudfunctions.viewer';
/* RETURNS
["cloudbuild.builds.get","cloudbuild.builds.list",...]
*/

A more common challenge is that you know a particular permission such as cloudfunctions.functions.get and you want to know which roles contain this permission you could run the following query:

SELECT name, title
FROM google.iam.roles
WHERE view = 'FULL'
AND includedPermissions LIKE '%cloudfunctions.functions.get%';

Creating custom roles and more...

In forthcoming articles, we will demonstrate how you can create custom roles using StackQL INSERT operations, as well as how you can construct a simple IAM framework to manage and provision access to resources in GCP, stay tuned!

· 2 min read

I grappled with Terraform for the better part of a day trying to provision a GKE Autopilot cluster in a Shared VPC service project, I was able to do this with StackQL in 2 minutes, this is how...

Before starting you will need the following to use GKE Autopilot in your Shared VPC:

  • control plane IP address range
  • control plane authorized networks (if desired)
  • the host network and node subnet you intend to use
  • pod and services secondary CIDR ranges

(all of the above would typically be pre-provisioned in the Shared VPC design and deployment)

Step 1: Using the GCP console, navigate to your service project, go to Kubernetes Engine --> Clusters --> Create --> GKE Autopilot --> Configure. Enter in all of the desired configuration options (including the network configuration specified above). Do not select CREATE.

Step 2: At the bottom of the dialog used to configure the cluster in the console, use the Equivalent REST button to generate the GKE Autopilot API request body.

Step 3: Supply this as input data to an StackQL INSERT command, either via an iql file, on as inline configuration. Optionally you can convert this to Jsonnet and parameterise for use in other environments.

<<<json
{
"cluster": {
..from equivalent REST command..
}
}
>>>

INSERT INTO google.container.`projects.locations.clusters`(
parent,
data__cluster
)
SELECT 'projects/my-svc-project/locations/australia-southeast1',
'{{ .cluster }}'
;

easy!

· 7 min read

Big Query provides a wealth of metrics and statistics for jobs run against it which could be queries, load jobs or export jobs. This article demonstrates some queries you can run using StackQL to bring back live statistics from load operations into Big Query as well as detail regarding errors encountered during the loading of data into Big Query.

Loading Data into Big Query from GCS using StackQL

In a previous blog, we demonstrated how to create a Big Query dataset and how to create a Big Query table using StackQL INSERT statements. Having created a target dataset and table in Big Query, we can invoke a load job using StackQL by performing an INSERT into the google.bigquery.jobs resource.

The data for this operation is shown in the Data tab which is supplied in Jsonnet format.

INSERT INTO google.bigquery.jobs(
projectId,
data__configuration
)
SELECT
'stackql',
'{{ .configuration }}'
;

Query for Big Query Errors

The Big Query Job Object can be queried using an StackQL SELECT statement.

To see the available fields with their data types and descriptions, you can run the following StackQL DESCRIBE statement:

DESCRIBE EXTENDED google.bigquery.jobs;

As you can see from running the above command or looking at the API documentation, there is a state field which is an enum showing the state of the job, since we are only concerned with completed jobs we will filter on jobs with a state of DONE. The errorResult field is an object but its presence alone indicates that an error has occurred so we will add another filter to only show results where errorResult is not null.

A simple query to start off with is to count the number of errors, this will be for all job types (load, extract and query):

SELECT COUNT(*) as num_errors 
FROM google.bigquery.jobs
WHERE projectId = 'stackql'
AND state = 'DONE'
AND errorResult IS NOT null;

To get a little more information about Big Query errors we can run a detailed query, extracting fields from the errorResult object using the JSON_EXTRACT built in function. This function is exceptionally useful as many of the fields returned from Google APIs are complex objects.

SELECT id, JSON_EXTRACT(errorResult, '$.reason') AS errorReason
FROM google.bigquery.jobs
WHERE projectId = 'stackql'
AND state = 'DONE'
AND errorResult IS NOT null;

Get Big Query Load Specific Errors

The previous queries returned all errors for all Big Query job types. If we want to narrow our query to just Big Query load operations we can use the Big Query JobStatistics object, which includes fields for each job type.

To refine results to only load operations add the following expression to the WHERE clause:

AND JSON_EXTRACT(statistics, '$.load') IS NOT null;

Date values returned in job responses are in Unix timestamp format, to format them in a human readable format we can use the DATETIME built in function. Here is a more advanced example:

SELECT id,
JSON_EXTRACT(errorResult, '$.message') AS errorMessage,
JSON_EXTRACT(errorResult, '$.reason') AS errorReason,
DATETIME(SUBSTR(JSON_EXTRACT(statistics, '$.creationTime'), 1, 10), 'unixepoch') AS creationTime,
DATETIME(SUBSTR(JSON_EXTRACT(statistics, '$.startTime'), 1, 10), 'unixepoch') AS startTime,
DATETIME(SUBSTR(JSON_EXTRACT(statistics, '$.endTime'), 1, 10), 'unixepoch') AS endTime
FROM google.bigquery.jobs
WHERE projectId = 'stackql'
AND state = 'DONE'
AND errorResult IS NOT null
AND JSON_EXTRACT(statistics, '$.load') IS NOT null;

Get Big Query Load Statistics

Now if you want to query for statistics for Big Query load operations which were successful, we can refine the query using the following conditions:

WHERE project = 'myproject'
AND state = 'DONE'
AND errorResult IS null
AND JSON_EXTRACT(statistics, '$.load') IS NOT null;

The JobStatistics object for a Big Query load job can be found here: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics3. Let's run an StackQL query to return all of the statistics for load jobs run in a given GCP project.

SELECT id,
DATETIME(SUBSTR(JSON_EXTRACT(statistics, '$.creationTime'), 1, 10), 'unixepoch') AS creationTime,
DATETIME(SUBSTR(JSON_EXTRACT(statistics, '$.startTime'), 1, 10), 'unixepoch') AS startTime,
DATETIME(SUBSTR(JSON_EXTRACT(statistics, '$.endTime'), 1, 10), 'unixepoch') AS endTime,
JSON_EXTRACT(statistics, '$.load.inputFiles') AS inputFiles,
JSON_EXTRACT(statistics, '$.load.inputFileBytes') AS inputFileBytes,
JSON_EXTRACT(statistics, '$.load.outputRows') AS outputRows,
JSON_EXTRACT(statistics, '$.load.outputBytes') AS outputBytes,
JSON_EXTRACT(statistics, '$.load.badRecords') AS badRecords
FROM google.bigquery.jobs
WHERE projectId = 'stackql'
AND state = 'DONE'
AND errorResult IS null
AND json_extract(statistics, '$.load') IS NOT null;

In future posts we will show similar examples using StackQL to query for errors and statistics for extract and query jobs in Big Query, see you then!

· 7 min read

The Cloud Asset API has recently gone GA, this is an exceptionally useful service which stores the history and inventory of cloud resources in your GCP org. Using the Cloud Asset API via StackQL you can enumerate all of the services and resources in your GCP org, including billable resources such as Cloud Storage buckets or Compute Engine instances, as well as other objects such as billing accounts, folders, projects, firewalls, service accounts and much more. All of this can be done using SQL!

Let’s start by exploring the available fields in this service:

Explore the API

Use the DESCRIBE or DESCRIBE EXTENDED to see the fields available in the google.cloudasset.assets resource as shown here:

DESCRIBE EXTENDED google.cloudasset.assets;

As you can see there is some very interesting stuff here, including where the asset fits in the organization hierarchy as well as whether the asset is included in a service perimeter.

Run some queries!

To start querying you just need to supply a root node from which you want to start enumerating assets, this can be at an org level, folder level or project level.

A simple query to group and count all of the different types of assets in a GCP project is shown here:

SELECT assetType, COUNT(*)
FROM google.cloudasset.assets
WHERE parent = 'projects/123123123123'
GROUP BY assetType;

or to see the most recent assets to be deployed or modified you could run:

SELECT name, updateTime
FROM google.cloudasset.assets
WHERE parent = 'organizations/12312312312'
ORDER BY updateTime DESC
LIMIT 3;

You can go nuts from here with other reports or drill into detail as to anomalies or stray assets, have fun!