16 posts tagged with "analytics"

· 3 min read
info

stackql is a dev tool that allows you to query and manage cloud and SaaS resources using SQL, which developers and analysts can use for CSPM, assurance, user access management reporting, IaC, XOps and more.

Most AWS services and resources are regionally scoped, meaning the UI, CLI, SDKs, and all other methods of querying the aws provider give you a regional view (us-east-1 or ap-southeast-2, for instance). Many customer AWS estates span multiple regions, for example multinational organizations or organizations with numerous dispersed locations within the US.

Sure, you could write custom scripts wrapping the CLI or SDKs, which would require development effort (and would not be reusable for other providers), or get an abstract view with tools like AWS Config or Systems Manager, which requires these services to be enabled and configured (and is neither flexible nor extensible to other providers). In either case:

  1. You can't write and run customized queries and generate custom reports - as you can do in SQL
  2. Any solutions you build will have to be rebuilt entirely for other providers

Using the latest AWS provider for StackQL (which leverages the AWS Cloud Control API) and the executeQueriesAsync method in the pystackql Python package, I've put together an example here which runs a query to bring back attributes from all AWS Lambda functions deployed across 17 different AWS regions asynchronously. Results can be returned as a list of Python dictionaries or a Pandas dataframe. I am doing the former here, which took less than 10 seconds.

from pystackql import StackQL
from pprint import pprint
from asyncio import run

stackql = StackQL()
stackql.executeStmt("REGISTRY PULL aws")  # not required if the aws provider is already installed

async def stackql_async_queries(queries):
    return await stackql.executeQueriesAsync(queries)

regions = [
    "us-east-1", "us-east-2", "us-west-1", "us-west-2", "ap-south-1", "ap-northeast-3",
    "ap-northeast-2", "ap-southeast-1", "ap-southeast-2", "ap-northeast-1", "ca-central-1",
    "eu-central-1", "eu-west-1", "eu-west-2", "eu-west-3", "eu-north-1", "sa-east-1",
]

# list functions from all regions asynchronously
get_fns = [
    f"""
    SELECT *
    FROM aws.lambda.functions
    WHERE region = '{region}'
    """
    for region in regions
]

functions = run(stackql_async_queries(get_fns))

# get function details for all functions across all regions asynchronously
get_fn_details = [
    f"""
    SELECT
    function_name,
    region,
    arn,
    description,
    architectures,
    memory_size,
    runtime
    FROM aws.lambda.function
    WHERE region = '{function['region']}'
    AND data__Identifier = '{function['function_name']}'
    """
    for function in functions
]

function_details = run(stackql_async_queries(get_fn_details))
pprint(function_details)

which returns...

[{'architectures': '["x86_64"]',
  'arn': 'arn:aws:lambda:us-east-1:824532806693:function:stackql-helloworld-fn',
  'description': '',
  'function_name': 'stackql-helloworld-fn',
  'memory_size': '128',
  'region': 'us-east-1',
  'runtime': 'nodejs18.x'},
 {'architectures': '["x86_64"]',
  'arn': 'arn:aws:lambda:us-east-2:824532806693:function:stackql-helloworld-fn',
  'description': '',
  'function_name': 'stackql-helloworld-fn',
  'memory_size': '128',
  'region': 'us-east-2',
  'runtime': 'nodejs18.x'},
 {'architectures': '["x86_64"]',
  'arn': 'arn:aws:lambda:us-west-1:824532806693:function:stackql-helloworld-fn',
  'description': '',
  'function_name': 'stackql-helloworld-fn',
  'memory_size': '128',
  'region': 'us-west-1',
  'runtime': 'nodejs18.x'},
...
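
To illustrate the fan-out pattern used above, here is a minimal sketch that uses only the Python standard library. The `fake_region_query` coroutine is a hypothetical stand-in for a real provider query (it is not part of pystackql), and the sketch is not a description of how executeQueriesAsync is implemented; it only shows the general idea of starting one query per region, waiting for all of them together, and flattening the results into a single list.

```python
import asyncio

# Hypothetical stand-in for a provider call; a real query would go through
# pystackql instead of returning canned rows.
async def fake_region_query(region):
    await asyncio.sleep(0.01)  # simulate network latency
    return [{"function_name": "stackql-helloworld-fn", "region": region}]

async def query_all_regions(regions):
    # Start one coroutine per region and wait for all of them together;
    # gather returns results in the same order as its inputs
    per_region = await asyncio.gather(*(fake_region_query(r) for r in regions))
    # Flatten the per-region lists into a single list of rows
    return [row for rows in per_region for row in rows]

rows = asyncio.run(query_all_regions(["us-east-1", "us-east-2", "us-west-1"]))
print(rows)
```

Because the queries overlap while they wait on the network, the total time is closer to that of the slowest region than to the sum of all regions.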

You could customize the StackQL query to run specific reports and visualize the results in a Jupyter notebook, for example:

  • Functions by runtime
  • Functions by memory size
  • Functions by tags
  • etc...
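
As a rough sketch of the kind of report listed above, the rows returned as Python dictionaries could be grouped with the standard library alone. The sample rows below are copied from the output shown earlier, and `count_by` is a hypothetical helper name used for illustration; the same aggregation could equally be done in the StackQL query itself with GROUP BY.

```python
from collections import Counter

# Sample rows shaped like the output above (values are strings, as returned)
function_details = [
    {"function_name": "stackql-helloworld-fn", "region": "us-east-1",
     "memory_size": "128", "runtime": "nodejs18.x"},
    {"function_name": "stackql-helloworld-fn", "region": "us-east-2",
     "memory_size": "128", "runtime": "nodejs18.x"},
    {"function_name": "stackql-helloworld-fn", "region": "us-west-1",
     "memory_size": "128", "runtime": "nodejs18.x"},
]

def count_by(rows, key):
    """Count how many rows share each value of `key`."""
    return Counter(row[key] for row in rows)

by_runtime = count_by(function_details, "runtime")
by_memory = count_by(function_details, "memory_size")
by_region = count_by(function_details, "region")

print(by_runtime)
print(by_memory)
```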

You could do something similar for other hyperscalers, for example, GCP, which scopes resources by projects, or Azure, which scopes resources by resource groups.

Let us know your thoughts! Visit us and give us a ⭐ on GitHub

· 3 min read

GitHub Codespaces is a development environment completely hosted online, enabling seamless development without setting up local machines. One of the great features of Codespaces is its compatibility with IPython, providing an interactive computing environment to test and prototype StackQL queries.

Codespaces and the power of IPython and notebooks provide a quick and easy way to use StackQL to analyze and report on your cloud estate and resource configuration. No software necessary!

Setting Up StackQL in Codespaces

See the stackql-codespaces-notebook repository as an example.

With the example devcontainer.json configuration file shown below, you can use the stackql/stackql-codespaces-base image, which includes stackql and the pystackql package (which provides the IPython magic extension used to run stackql queries and return Pandas dataframes for inspection or visualization).

{
  "image": "stackql/stackql-codespaces-base",
  "containerEnv": {
    "STACKQL_GITHUB_PASSWORD": "${secrets:STACKQL_GITHUB_PASSWORD}",
    "STACKQL_GITHUB_USERNAME": "${secrets:STACKQL_GITHUB_USERNAME}"
  },
  "hostRequirements": {
    "cpus": 2
  },
  "customizations": {
    "vscode": {
      "extensions": [
        "ms-toolsai.jupyter",
        "ms-python.python"
      ]
    }
  }
}

Note that the devcontainer configuration includes the essential extensions for Jupyter and Python; you can also optionally specify host requirements for the Codespaces machine.

Provider Authentication

The environment variables required to authenticate to your specific provider or providers can be supplied using Codespaces secrets and passed securely to the Codespaces container using the containerEnv object in the example above. For specifics about variables required for different providers, see the provider documentation for your provider in the StackQL Provider Registry.
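
Before running queries, it can help to confirm that the secrets actually reached the container. The following is a minimal sketch, assuming the two variable names from the devcontainer.json example above; `missing_env_vars` is a hypothetical helper, and other providers need different variables.

```python
import os

# Variable names taken from the devcontainer.json example above; other
# providers need different variables (see the provider documentation).
REQUIRED_VARS = ["STACKQL_GITHUB_USERNAME", "STACKQL_GITHUB_PASSWORD"]

def missing_env_vars(required, environ=None):
    """Return the names in `required` that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in required if not environ.get(name)]

missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing Codespaces secrets:", ", ".join(missing))
else:
    print("All provider credentials are present")
```

Running this in the first notebook cell gives a clear message when a secret is missing, rather than an authentication error from a later query.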

Launching the Codespaces Environment

Given the devcontainer configuration shown above in the root of your repository at .devcontainer/devcontainer.json, you can launch Codespaces from <your_repo_slug>/codespaces, for example https://github.com/stackql/stackql-codespaces-notebook/codespaces. You can start a Codespaces environment on any branch of your repo.

Running queries and visualizing output

When the environment is provisioned (this usually takes a minute or two), you can use the StackQL magic extension and the %%stackql magic decorator to run StackQL queries, including variable substitution, for example:

%load_ext pystackql.magic

(loads the StackQL magic extension, making the %%stackql decorator available)

region = 'us-east-1'

(set some notebook variables for reusability)

%%stackql
SELECT instanceType, COUNT(*) as num_instances
FROM aws.ec2.instances
WHERE region = '$region'
GROUP BY instanceType

(run a query)

_.plot(kind='pie', y='num_instances', labels=_['instanceType'], title='Instances by Type', autopct='%1.1f%%')

(visualize the results - using matplotlib, plotly, or any other visualization package)

Here's an example:

[Image: stackql-codespaces-notebook]
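
One way to picture the `$region` variable substitution used in the query cell is with `string.Template` from the Python standard library. This is only an illustrative sketch of the substitution idea, not necessarily how the magic extension implements it.

```python
from string import Template

# The same query text as in the %%stackql cell above, with a $region placeholder
query_template = Template("""
SELECT instanceType, COUNT(*) as num_instances
FROM aws.ec2.instances
WHERE region = '$region'
GROUP BY instanceType
""")

region = 'us-east-1'  # notebook variable, as set earlier

# Replace $region with the value of the notebook variable
rendered = query_template.substitute(region=region)
print(rendered)
```

Changing the `region` variable and re-rendering produces a new query without editing the query text itself.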

Using the pystackql package, you can also run asynchronous queries such as querying assets across AWS regions, Azure resource groups (or subscriptions) or Google projects in one statement, for example:

# get multiple regions asynchronously
# (assumes `stackql` is a pystackql StackQL instance created earlier in the notebook;
# top-level await works in IPython/Jupyter cells)
regions = ["ap-southeast-2", "us-east-1"]

queries = [
    f"""
    SELECT '{region}' as region, instanceType, COUNT(*) as num_instances
    FROM aws.ec2.instances
    WHERE region = '{region}'
    GROUP BY instanceType
    """
    for region in regions
]

instances_df = await stackql.executeQueriesAsync(queries)

· 2 min read

Materialized Views are now available in StackQL. Materialized Views can be used to improve performance for dependent or repetitive queries within StackQL provisioning or analytics routines.

Refresher on Materialized Views

Unlike standard views that provide a virtual representation of data, a Materialized View physically stores the result set of a query. This implies that the data is pre-computed and stored, which can lead to performance gains as the data doesn't need to be fetched from the underlying resource(s) every time it is queried.
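
The difference can be sketched in a few lines of plain Python. This is a conceptual illustration only, not how StackQL implements Materialized Views; `MaterializedResult` and `fetch_instance_types` are hypothetical names, and the fetch function stands in for an expensive provider query.

```python
class MaterializedResult:
    """Stores a query result until it is explicitly refreshed."""

    def __init__(self, fetch):
        self._fetch = fetch          # callable that runs the underlying query
        self._rows = fetch()         # computed once at creation time

    def rows(self):
        return self._rows            # served from the stored copy

    def refresh(self):
        self._rows = self._fetch()   # re-run the underlying query


calls = []

def fetch_instance_types():
    # Hypothetical stand-in for an expensive provider query
    calls.append(1)
    return [{"instanceType": "t2.micro", "version": len(calls)}]

view = MaterializedResult(fetch_instance_types)
first = view.rows()
second = view.rows()      # no new fetch; same stored snapshot
view.refresh()
third = view.rows()       # reflects the refreshed data

print(len(calls))         # number of times the underlying query actually ran
```

A standard view would correspond to calling `fetch_instance_types()` on every read, which is always current but pays the query cost each time.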

Benefits of Materialized Views in StackQL

  1. Performance Boost: With data already stored and readily available, Materialized Views can substantially reduce StackQL query execution time, especially for complex and frequently-run queries.

  2. Data Consistency: Since a Materialized View provides a snapshot of the data at a specific point in time, it returns consistent data every time it is accessed until it is refreshed.

  3. Flexibility: You can refresh the Materialized View as needed using the REFRESH MATERIALIZED VIEW lifecycle operation in StackQL. This is particularly useful when working with rapidly changing data.

Using Materialized Views in StackQL

Here's a step-by-step guide on how to use this new feature in StackQL:

  1. Create a Materialized View:
     CREATE MATERIALIZED VIEW vw_ec2_instance_types AS
     SELECT
       memoryInfo,
       hypervisor,
       autoRecoverySupported,
       instanceType,
       SPLIT_PART(processorInfo, '\n', 3) as processorArch,
       currentGeneration,
       freeTierEligible,
       hibernationSupported,
       SPLIT_PART(vCpuInfo, '\n', 2) as vCPUs,
       bareMetal,
       burstablePerformanceSupported,
       dedicatedHostsSupported
     FROM aws.ec2.instance_types
     WHERE region = 'us-east-1';
  2. Refresh the Materialized View:
     REFRESH MATERIALIZED VIEW vw_ec2_instance_types;
  3. Use the Materialized View in a StackQL Query:
     SELECT
       i.instanceId,
       i.instanceType,
       it.vCPUs,
       it.memoryInfo
     FROM aws.ec2.instances i
     INNER JOIN vw_ec2_instance_types it
     ON i.instanceType = it.instanceType
     WHERE i.region = 'us-east-1';
  4. Drop the Materialized View:
     DROP MATERIALIZED VIEW vw_ec2_instance_types;

More information on Materialized Views in StackQL can be found here.

· 2 min read

Many provider query responses include columns which are arrays; the IAM policy-related resources in Google are a classic example of this. For example, this query:

select * 
from google.cloudresourcemanager.projects_iam_policies
where projectsId = 'stackql';

produces...

|-----------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------|                               
| condition | members | role |
|-----------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------|
| null | ["serviceAccount:1234567890-compute@developer.gserviceaccount.com","serviceAccount:1234567890@cloudservices.gserviceaccount.com","serviceAccount:stackql@appspot.gserviceaccount.com"] | roles/editor |
|-----------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------|
| null | ["serviceAccount:1234567890-compute@developer.gserviceaccount.com","serviceAccount:1234567890@cloudservices.gserviceaccount.com","serviceAccount:stackql@appspot.gserviceaccount.com"] | roles/editor |
|-----------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------|
| null | ["serviceAccount:1234567890-compute@developer.gserviceaccount.com","serviceAccount:1234567890@cloudservices.gserviceaccount.com","serviceAccount:stackql@appspot.gserviceaccount.com"] | roles/editor |
|-----------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------|

What you want to do is unnest each member in members for each role binding (and condition if applicable).

Enter the table-valued function json_each.

The json_each function accepts a field (optionally with a JSON path expression) and returns a table object with fields that can be projected in your result set. For example (querying the same underlying resource as above), this...

select
  iam.role,
  SPLIT_PART(json_each.value, ':', 1) as member_type,
  SPLIT_PART(json_each.value, ':', 2) as member
from google.cloudresourcemanager.projects_iam_policies iam, json_each(members)
where projectsId = 'stackql';

now provides something much more useful from an analytic perspective:

|--------------------------------------|----------------|-----------------------------------------------------------------------------|
| role | member_type | member |
|--------------------------------------|----------------|-----------------------------------------------------------------------------|
| roles/appengine.serviceAgent | serviceAccount | service-1234567890@gcp-gae-service.iam.gserviceaccount.com |
|--------------------------------------|----------------|-----------------------------------------------------------------------------|
| roles/cloudbuild.builds.builder | serviceAccount | 1234567890@cloudbuild.gserviceaccount.com |
|--------------------------------------|----------------|-----------------------------------------------------------------------------|
| roles/cloudbuild.serviceAgent | serviceAccount | service-1234567890@gcp-sa-cloudbuild.iam.gserviceaccount.com |
|--------------------------------------|----------------|-----------------------------------------------------------------------------|
| roles/compute.serviceAgent | serviceAccount | service-1234567890@compute-system.iam.gserviceaccount.com |
|--------------------------------------|----------------|-----------------------------------------------------------------------------|
| roles/container.serviceAgent | serviceAccount | service-1234567890@container-engine-robot.iam.gserviceaccount.com |
|--------------------------------------|----------------|-----------------------------------------------------------------------------|
| roles/containerregistry.ServiceAgent | serviceAccount | service-1234567890@containerregistry.iam.gserviceaccount.com |
|--------------------------------------|----------------|-----------------------------------------------------------------------------|
| roles/dataflow.serviceAgent | serviceAccount | service-1234567890@dataflow-service-producer-prod.iam.gserviceaccount.com |
|--------------------------------------|----------------|-----------------------------------------------------------------------------|
| roles/dataproc.serviceAgent | serviceAccount | service-1234567890@dataproc-accounts.iam.gserviceaccount.com |
|--------------------------------------|----------------|-----------------------------------------------------------------------------|
| roles/editor | serviceAccount | 1234567890-compute@developer.gserviceaccount.com |
|--------------------------------------|----------------|-----------------------------------------------------------------------------|
| roles/editor | serviceAccount | 1234567890@cloudservices.gserviceaccount.com |
|--------------------------------------|----------------|-----------------------------------------------------------------------------|
| roles/editor | serviceAccount | stackql@appspot.gserviceaccount.com |
|--------------------------------------|----------------|-----------------------------------------------------------------------------|
| roles/firebaserules.system | serviceAccount | service-1234567890@firebase-rules.iam.gserviceaccount.com |
|--------------------------------------|----------------|-----------------------------------------------------------------------------|
| roles/firestore.serviceAgent | serviceAccount | service-1234567890@gcp-sa-firestore.iam.gserviceaccount.com |
|--------------------------------------|----------------|-----------------------------------------------------------------------------|
| roles/owner | serviceAccount | stackql-provisioner@stackql.iam.gserviceaccount.com |
|--------------------------------------|----------------|-----------------------------------------------------------------------------|
| roles/owner | serviceAccount | t1-804@stackql.iam.gserviceaccount.com |
|--------------------------------------|----------------|-----------------------------------------------------------------------------|
| roles/owner | user | javen@stackql.io |
|--------------------------------------|----------------|-----------------------------------------------------------------------------|
| roles/owner | user | krimmer@stackql.io |
|--------------------------------------|----------------|-----------------------------------------------------------------------------|
| roles/pubsub.serviceAgent | serviceAccount | service-1234567890@gcp-sa-pubsub.iam.gserviceaccount.com |
|--------------------------------------|----------------|-----------------------------------------------------------------------------|
| roles/viewer | serviceAccount | testing-admin@stackql.iam.gserviceaccount.com |
|--------------------------------------|----------------|-----------------------------------------------------------------------------|
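
For readers more familiar with Python than SQL, the unnesting that json_each performs here can be sketched as a nested loop. This is an illustration of the idea only, using sample rows shaped like the output above; `unnest_members` is a hypothetical helper name, and the `members` column is treated as a JSON string as it appears in the first result set.

```python
import json

# Sample rows shaped like the un-nested query output above
bindings = [
    {
        "role": "roles/editor",
        "members": '["serviceAccount:1234567890-compute@developer.gserviceaccount.com",'
                   '"serviceAccount:stackql@appspot.gserviceaccount.com"]',
    },
    {"role": "roles/owner", "members": '["user:javen@stackql.io"]'},
]

def unnest_members(rows):
    """Emit one output row per member of each role binding."""
    out = []
    for row in rows:
        for value in json.loads(row["members"]):
            # Split "type:identity" at the first colon
            member_type, _, member = value.partition(":")
            out.append({"role": row["role"], "member_type": member_type, "member": member})
    return out

unnested = unnest_members(bindings)
for item in unnested:
    print(item)
```

Note that `partition` keeps everything after the first colon, whereas SPLIT_PART with index 2 returns only the second field, so the two differ for values that contain more than one colon.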

json_each is available from version 0.5.418 of stackql onwards. This function can also be used in StackQL GitHub Actions such as stackql-exec or stackql-assert and in Python and Pandas using pystackql.