
11 posts tagged with "gcp"


· 3 min read

StackQL allows you to query and interact with your cloud and SaaS assets using a simple SQL framework

Understanding entitlements across a GCP org with a complex hierarchy is a challenge. I have taken a data-centric approach to this in this article.

Prerequisites include setting up a Jupyter environment with StackQL (done here using Docker): stackql-jupyter-demo. You will also need a service account and associated key with the roles/iam.securityReviewer role.

I've broken the notebook down into sections to explain each step...

Setup

This step includes importing the required libraries (pandas etc.) and instantiating a StackQL client with the service account creds you created before. You will supply your root node here using the org_id and org_name variables.

Next we will create some helper functions; these will help us enumerate nodes in the GCP org resource hierarchy and fetch and unnest IAM policies.

Get all nodes in the resource hierarchy

Create a dataframe containing all nodes in the resource hierarchy, including the root node (the organization), each folder with its subfolders, and projects. The functions used will search each folder in the hierarchy to find its subfolders and projects using a depth-first search approach.
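For example, the helper functions run queries along these lines against each node as they walk the hierarchy (a sketch only: the resource and parameter names, such as parent, and the example ids shown are indicative and may differ between provider versions):

/* list folders directly under the root node (organization) */
SELECT name, displayName
FROM google.cloudresourcemanager.folders
WHERE parent = 'organizations/123456789012';

/* list projects directly under a given folder */
SELECT projectId, name
FROM google.cloudresourcemanager.projects
WHERE parent = 'folders/111111111111';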

Inspecting the output, it looks like this:

GCP Nodes

Create a dataset including each node and its associated IAM policies

This step will fetch all of the policies applied at each node in the data structure we created in the previous step.

The IAM policies response from SELECT role, members FROM google.cloudresourcemanager.project_iam_policies ... presents some challenges as members is a nested list which we need to unnest (or explode) along with the associated role and conditions (if they exist).

This bit of massaging will give us a SQL-friendly model we can use for analysis and join with another data source (such as a list of identities from an identity provider).
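For reference, the underlying per-project query that feeds this step looks something like this (a sketch: the identifier parameter, shown here as projectsId, and the availability of a condition column may differ by provider version):

SELECT role, members, condition
FROM google.cloudresourcemanager.project_iam_policies
WHERE projectsId = 'my-project';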

Inspecting the Final Output

We can now peek at the final data set, which looks like this:

GCP Nodes with IAM Policies

What's next? You could now join this with data from your IdP, or other SaaS services to correlate entitlements across your entire estate. You could also drill into specific service accounts, users, or groups. Queries are run in real-time, so you can refresh the data by simply rerunning the cells.

We welcome your feedback; get in touch or raise issues at stackql/stackql or stackql/stackql-provider-registry, and give us some ⭐️ love while you are there!

Enjoy!

· 3 min read

Queries, particularly repetitive queries that don't take advantage of results caching, can lead to extraordinarily high bills.

StackQL, with its backend SQL engine, allows you to query Big Query statistics in real time, including identifying queries which are not served from cache and understanding billable charges per query or time slice.

Here is a simple query to break down a time period into hours and show the total queries, queries served from cache and the total query charges per hour.

SELECT
STRFTIME('%H', DATETIME(SUBSTR(JSON_EXTRACT(statistics, '$.startTime'), 1, 10), 'unixepoch')) as hour,
COUNT(*) as num_queries,
SUM(JSON_EXTRACT(statistics, '$.query.cacheHit')) as using_cache,
SUM(JSON_EXTRACT(statistics, '$.query.totalBytesBilled')*{{ .costPerByte }} ) as queryCost
FROM google.bigquery.jobs
WHERE projectId = '{{ .projectId }}'
AND allUsers = 'true'
AND minCreationTime = '{{ .minCreationTime }}'
AND maxCreationTime = '{{ .maxCreationTime }}'
AND state = 'DONE'
AND JSON_EXTRACT(statistics, '$.query') IS NOT null
GROUP BY STRFTIME('%H', datetime(SUBSTR(JSON_EXTRACT(statistics, '$.startTime'), 1, 10), 'unixepoch'));
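As a variation on the query above, you could list the individual jobs that were not served from cache along with their estimated cost (a sketch: this assumes cacheHit is extracted as a 0/1 value by the backend SQL engine, as the SUM above implies):

SELECT id,
JSON_EXTRACT(statistics, '$.query.totalBytesBilled')*{{ .costPerByte }} as queryCost
FROM google.bigquery.jobs
WHERE projectId = '{{ .projectId }}'
AND allUsers = 'true'
AND minCreationTime = '{{ .minCreationTime }}'
AND maxCreationTime = '{{ .maxCreationTime }}'
AND state = 'DONE'
AND JSON_EXTRACT(statistics, '$.query') IS NOT null
AND JSON_EXTRACT(statistics, '$.query.cacheHit') = 0;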

Many more examples to come, including using this data to create visualisations in a Jupyter notebook. Stay tuned!

· 4 min read

Understanding roles is integral to applying the principle of least privilege to GCP environments.

A quick primer on roles in GCP

A Role in GCP is a collection of permissions to services and APIs on the platform. Roles are "bound" to principals or members (users, groups and service accounts).

These bindings are referred to as "policies" which are scoped at a particular level - organisation, folder, project, resource.

There are three types of roles - Primitive Roles, Predefined Roles and Custom Roles.

Primitive (or Basic) Roles

These are legacy roles set at a GCP project level which include Owner, Editor, and Viewer. These are generally considered to be excessive in terms of permissions and their use should be minimised if not avoided altogether.

Predefined Roles

These are roles with fine grained access to discrete services in GCP. Google has put these together for your convenience. In most cases predefined roles are the preferred mechanism to assign permissions to members.

Custom Roles

Custom roles can be created with a curated collection of permissions if required; reasons for doing so include:

  • if the permissions in predefined roles are excessive for your security posture
  • if you want to combine permissions across different services and cannot find a suitable predefined role (although assigning multiple predefined roles to a given member is generally preferred)

Anatomy of an IAM Policy

An IAM Policy is a collection of bindings of one or more members (user, group or service account) to a role (primitive, predefined or custom). Policies are normally expressed as JSON objects as shown here:

{
  "bindings": [
    {
      "members": [
        "group:project-admins@my-cloud-identity-domain.com"
      ],
      "role": "roles/owner"
    },
    {
      "members": [
        "serviceAccount:provisioner@my-project.iam.gserviceaccount.com",
        "user:javen@avensolutions.com"
      ],
      "role": "roles/resourcemanager.folderViewer"
    }
  ]
}

Groups are Google Groups created in Cloud Identity or Google Workspace (formerly known as G-Suite)

Application of policies is an atomic operation, which will overwrite any existing policy attached to an entity (org, folder, project, resource).
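Because primitive role bindings are generally discouraged, a quick way to audit them on a project is a query along these lines (a sketch: the identifier parameter, shown here as projectsId, may vary by provider version):

SELECT role, members
FROM google.cloudresourcemanager.project_iam_policies
WHERE projectsId = 'my-project'
AND role IN ('roles/owner', 'roles/editor', 'roles/viewer');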

Querying Roles with StackQL

Predefined and primitive roles are defined in the roles resource in StackQL (google.iam.roles), which returns the following fields (as returned by DESCRIBE google.iam.roles):

Name                  Description
name                  Name of the role in the format roles/[{service}.]{role} for predefined or basic roles, or qualified for custom roles, e.g. organizations/{org_id}/roles/[{service}.]{role}
description           An optional, human-readable description for the role
includedPermissions   An array of permissions this role grants (only returned when view = 'FULL')
etag                  Output only, used internally for consistency
title                 An optional, human-readable title for the role (visible in the Console)
deleted               A read-only boolean field showing the current deleted state of the role
stage                 The current launch stage of the role, e.g. ALPHA

Get the name for a role

Often, you may know the "friendly" title for a role like "Logs Bucket Writer", but you need the actual role name to use in an IAM policy, which is roles/logging.bucketWriter. A simple query to find this using StackQL is shown here:

SELECT name
FROM google.iam.roles
WHERE title = 'Logs Bucket Writer';
/* RETURNS:
|----------------------------|
| name |
|----------------------------|
| roles/logging.bucketWriter |
|----------------------------|
*/

Conversely, if you have the name but want the friendly title you could use:

SELECT title
FROM google.iam.roles
WHERE name = 'roles/logging.bucketWriter';
/* RETURNS:
|--------------------|
| title |
|--------------------|
| Logs Bucket Writer |
|--------------------|
*/

Wildcards can also be used with the LIKE operator, for example to get the name and title for each predefined role in the logging service you could run:

SELECT name, title
FROM google.iam.roles
WHERE name LIKE 'roles/logging.%';

Get the permissions for a role

To return the includedPermissions you need to add the following WHERE clause:

WHERE view = 'FULL'

An example query to list the permissions for a given role is shown here:

SELECT includedPermissions
FROM google.iam.roles
WHERE view = 'FULL' AND
name = 'roles/cloudfunctions.viewer';
/* RETURNS
["cloudbuild.builds.get","cloudbuild.builds.list",...]
*/

A more common challenge is that you know a particular permission, such as cloudfunctions.functions.get, and you want to know which roles contain this permission. To do this, you could run the following query:

SELECT name, title
FROM google.iam.roles
WHERE view = 'FULL'
AND includedPermissions LIKE '%cloudfunctions.functions.get%';

Creating custom roles and more...

In forthcoming articles, we will demonstrate how you can create custom roles using StackQL INSERT operations, as well as how you can construct a simple IAM framework to manage and provision access to resources in GCP. Stay tuned!

· 2 min read

I grappled with Terraform for the better part of a day trying to provision a GKE Autopilot cluster in a Shared VPC service project. I was able to do this with StackQL in 2 minutes; this is how...

Before starting you will need the following to use GKE Autopilot in your Shared VPC:

  • control plane IP address range
  • control plane authorized networks (if desired)
  • the host network and node subnet you intend to use
  • pod and services secondary CIDR ranges

(all of the above would typically be pre-provisioned in the Shared VPC design and deployment)

Step 1: Using the GCP console, navigate to your service project, go to Kubernetes Engine --> Clusters --> Create --> GKE Autopilot --> Configure. Enter in all of the desired configuration options (including the network configuration specified above). Do not select CREATE.

Step 2: At the bottom of the dialog used to configure the cluster in the console, use the Equivalent REST button to generate the GKE Autopilot API request body.

Step 3: Supply this as input data to a StackQL INSERT command, either via an iql file or as inline configuration. Optionally you can convert this to Jsonnet and parameterise it for use in other environments.

<<<json
{
  "cluster": {
    ..from equivalent REST command..
  }
}
>>>

INSERT INTO google.container.`projects.locations.clusters`(
parent,
data__cluster
)
SELECT 'projects/my-svc-project/locations/australia-southeast1',
'{{ .cluster }}'
;
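Once the INSERT completes, you could verify the cluster with a query along these lines (a sketch: the available fields and the parent parameter, which may be split into projectsId and locationsId, can vary by provider version):

SELECT name, status
FROM google.container.`projects.locations.clusters`
WHERE parent = 'projects/my-svc-project/locations/australia-southeast1';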

easy!

· 7 min read

Big Query provides a wealth of metrics and statistics for jobs run against it, whether they are queries, load jobs, or export jobs. This article demonstrates some queries you can run using StackQL to bring back live statistics from load operations into Big Query, as well as detail regarding errors encountered during the loading of data into Big Query.

Loading Data into Big Query from GCS using StackQL

In a previous blog, we demonstrated how to create a Big Query dataset and how to create a Big Query table using StackQL INSERT statements. Having created a target dataset and table in Big Query, we can invoke a load job using StackQL by performing an INSERT into the google.bigquery.jobs resource.

The data for this operation is shown in the Data tab which is supplied in Jsonnet format.
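For illustration, a minimal load configuration might look something like the following (the bucket, dataset, and table names here are hypothetical):

{
  "configuration": {
    "load": {
      "destinationTable": {
        "projectId": "stackql",
        "datasetId": "my_dataset",
        "tableId": "my_table"
      },
      "sourceUris": ["gs://my-bucket/data/my_file.csv"],
      "sourceFormat": "CSV",
      "skipLeadingRows": 1,
      "writeDisposition": "WRITE_APPEND"
    }
  }
}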

INSERT INTO google.bigquery.jobs(
projectId,
data__configuration
)
SELECT
'stackql',
'{{ .configuration }}'
;

Query for Big Query Errors

The Big Query Job Object can be queried using a StackQL SELECT statement.

To see the available fields with their data types and descriptions, you can run the following StackQL DESCRIBE statement:

DESCRIBE EXTENDED google.bigquery.jobs;

As you can see from running the above command or looking at the API documentation, there is a state field, which is an enum showing the state of the job. Since we are only concerned with completed jobs, we will filter on jobs with a state of DONE. The errorResult field is an object, but its presence alone indicates that an error has occurred, so we will add another filter to only show results where errorResult is not null.

A simple query to start off with is to count the number of errors; this covers all job types (load, extract and query):

SELECT COUNT(*) as num_errors 
FROM google.bigquery.jobs
WHERE projectId = 'stackql'
AND state = 'DONE'
AND errorResult IS NOT null;

To get a little more information about Big Query errors, we can run a detailed query, extracting fields from the errorResult object using the JSON_EXTRACT built-in function. This function is exceptionally useful, as many of the fields returned from Google APIs are complex objects.

SELECT id, JSON_EXTRACT(errorResult, '$.reason') AS errorReason
FROM google.bigquery.jobs
WHERE projectId = 'stackql'
AND state = 'DONE'
AND errorResult IS NOT null;

Get Big Query Load Specific Errors

The previous queries returned all errors for all Big Query job types. If we want to narrow our query to just Big Query load operations we can use the Big Query JobStatistics object, which includes fields for each job type.

To refine results to only load operations add the following expression to the WHERE clause:

AND JSON_EXTRACT(statistics, '$.load') IS NOT null;

Date values returned in job responses are in Unix timestamp format; to present them in a human-readable form we can use the DATETIME built-in function. Here is a more advanced example:

SELECT id,
JSON_EXTRACT(errorResult, '$.message') AS errorMessage,
JSON_EXTRACT(errorResult, '$.reason') AS errorReason,
DATETIME(SUBSTR(JSON_EXTRACT(statistics, '$.creationTime'), 1, 10), 'unixepoch') AS creationTime,
DATETIME(SUBSTR(JSON_EXTRACT(statistics, '$.startTime'), 1, 10), 'unixepoch') AS startTime,
DATETIME(SUBSTR(JSON_EXTRACT(statistics, '$.endTime'), 1, 10), 'unixepoch') AS endTime
FROM google.bigquery.jobs
WHERE projectId = 'stackql'
AND state = 'DONE'
AND errorResult IS NOT null
AND JSON_EXTRACT(statistics, '$.load') IS NOT null;

Get Big Query Load Statistics

If you want to query statistics for Big Query load operations which were successful, you can refine the query using the following conditions:

WHERE projectId = 'stackql'
AND state = 'DONE'
AND errorResult IS null
AND JSON_EXTRACT(statistics, '$.load') IS NOT null;

The JobStatistics object for a Big Query load job can be found here: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics3. Let's run a StackQL query to return all of the statistics for load jobs run in a given GCP project.

SELECT id,
DATETIME(SUBSTR(JSON_EXTRACT(statistics, '$.creationTime'), 1, 10), 'unixepoch') AS creationTime,
DATETIME(SUBSTR(JSON_EXTRACT(statistics, '$.startTime'), 1, 10), 'unixepoch') AS startTime,
DATETIME(SUBSTR(JSON_EXTRACT(statistics, '$.endTime'), 1, 10), 'unixepoch') AS endTime,
JSON_EXTRACT(statistics, '$.load.inputFiles') AS inputFiles,
JSON_EXTRACT(statistics, '$.load.inputFileBytes') AS inputFileBytes,
JSON_EXTRACT(statistics, '$.load.outputRows') AS outputRows,
JSON_EXTRACT(statistics, '$.load.outputBytes') AS outputBytes,
JSON_EXTRACT(statistics, '$.load.badRecords') AS badRecords
FROM google.bigquery.jobs
WHERE projectId = 'stackql'
AND state = 'DONE'
AND errorResult IS null
AND json_extract(statistics, '$.load') IS NOT null;

In future posts we will show similar examples using StackQL to query for errors and statistics for extract and query jobs in Big Query, see you then!