AWS Global Inventory

In this guide, we will demonstrate how to use StackQL - a powerful dev tool that enables querying and deploying cloud infrastructure and resources using SQL syntax - to generate a comprehensive inventory of all resources, across all services and all regions, in a given AWS account.

Tested with: embedded SQL backend, PostgreSQL backend, macOS, Linux, PowerShell

How it works

StackQL implements a SQL engine that can process multiple queries asynchronously. For instance:

SELECT region, COUNT(*) as num_functions
FROM aws.lambda.functions
WHERE region IN (
'us-east-1','us-east-2','us-west-1','us-west-2',
'ap-south-1','ap-northeast-3','ap-northeast-2',
'ap-southeast-1','ap-southeast-2','ap-northeast-1',
'ca-central-1','eu-central-1','eu-west-1',
'eu-west-2','eu-west-3','eu-north-1','sa-east-1')
GROUP BY region;

The above query is actually 17 queries run in parallel (one for each region); the results are aggregated and presented as a single result set. Using pystackql - a Python wrapper for StackQL - we can iterate through all resources in all services, across all regions supported by each service. The result is a report of how many instances of each resource are deployed in each region, as seen in the example below:

[aws-inventory-example: sample report showing resource counts by service, resource, and region]
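To give a feel for pystackql itself, here is a minimal sketch (with a truncated region list for brevity) that runs the Lambda query above and returns the aggregated results as a pandas DataFrame:

from pystackql import StackQL

# a StackQL session that returns query results as pandas DataFrames
stackql = StackQL(output='pandas')

# the regional queries are dispatched asynchronously and the
# results are aggregated into a single result set
df = stackql.execute("""
SELECT region, COUNT(*) as num_functions
FROM aws.lambda.functions
WHERE region IN ('us-east-1','us-east-2','us-west-1','us-west-2')
GROUP BY region;
""")
print(df)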

Prerequisites

  • pystackql, which can be installed using pip install pystackql
  • An AWS user or role (which can be assumed using sts assume-role) with the ReadOnlyAccess policy or greater attached
  • AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables set (these are set automatically when using AWS CloudShell)
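Before running the inventory, you can optionally confirm the credentials are visible to your Python session; a quick, purely illustrative check:

import os

# fail fast if the credentials the aws provider expects are missing
for var in ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"):
    if not os.environ.get(var):
        raise EnvironmentError(f"{var} is not set")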

The Code

The code below performs the following actions:

  1. Defines a list of all AWS regions (regions) and a list of regions supported by each service (supported_regions); more info on deriving supported_regions can be found in AWS Service Support by Region at a Glance (a sketch of these helper modules follows this list)
  2. Pulls the aws StackQL provider if it's not already installed
  3. Iterates through each service (using SHOW SERVICES), then for each service iterates through each resource (using SHOW RESOURCES)
  4. SELECTs all instances of each resource across all regions asynchronously, then groups the data by region and counts the instances of each resource in each region
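The script imports two helper modules, regions.py and resources.py, which are not reproduced in full here. A minimal sketch of the shape the script expects follows; the specific entries below are illustrative assumptions, not the actual lists:

# regions.py (sketch) - illustrative entries only
regions = [
    'us-east-1', 'us-east-2', 'us-west-1', 'us-west-2',
    # ... the remaining commercial regions
]

# map each service to the regions it supports; a value containing
# 'global' tells the script to query the resource across all regions
supported_regions = {
    'lambda': regions,
    'iam': ['global'],
    # ...
}

# resources.py (sketch) - illustrative entries only
# resources to skip entirely, keyed by service
excluded_resources = {
    'ec2': ['host_management'],  # hypothetical entry
}

# extra WHERE clause fragments (note the leading ' AND') appended
# to the generated query for resources that require them
resource_filters = {
    'cloudwatch': {
        'alarms': " AND alarm_name IS NOT NULL",  # hypothetical entry
    },
}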

Here it is...

from regions import supported_regions, regions
from resources import excluded_resources, resource_filters
import time
from pystackql import StackQL
import pandas as pd

stackql = StackQL(output='pandas', page_limit=-1, execution_concurrency_limit=-1)

def pull_stackql_providers():
    # install the aws provider if it isn't already in the local registry
    providers_df = stackql.execute("SHOW PROVIDERS")
    is_aws_installed = 'aws' in providers_df['name'].values if not providers_df.empty else False
    if not is_aws_installed:
        print("aws not installed. Installing aws")
        res = stackql.executeStmt("REGISTRY PULL aws")
        print(res['message'])

def get_s3_buckets():
    # special case for s3 buckets: list every bucket, then resolve each
    # bucket's home region individually
    int_results_df = pd.DataFrame(columns=['bucket', 'region'])
    buckets = stackql.execute("""
        SELECT bucket_name
        FROM aws.s3.buckets WHERE region = 'us-east-1';
        """)['bucket_name'].values.tolist()
    for bucket in buckets:
        regional_domain_query = f"""
        SELECT
            bucket_name,
            bucket_location
        FROM aws.s3.bucket WHERE region = 'us-east-1' AND data__Identifier = '{bucket}'
        """
        print(f"Checking location for {bucket}...")
        regional_domain_name_df = stackql.execute(regional_domain_query)
        # extract the region from the bucket location
        if not regional_domain_name_df.empty:
            bucket_name = regional_domain_name_df['bucket_name'].values[0]
            bucket_location = regional_domain_name_df['bucket_location'].values[0]
            print("bucket_name: ", bucket_name)
            print("bucket_location: ", bucket_location)
            new_row = {'bucket': bucket_name, 'region': bucket_location}
            int_results_df = pd.concat([int_results_df, pd.DataFrame([new_row])], ignore_index=True)
    # group by region and count the total resources (buckets) in each region
    grouped = int_results_df.groupby('region').size().reset_index(name='total_resources')
    grouped['svc'] = 's3'
    grouped['res'] = 'buckets'
    output_df = grouped[['svc', 'res', 'region', 'total_resources']]
    return output_df

def main():
    start_time = time.time()
    pull_stackql_providers()
    results_df = pd.DataFrame(columns=['svc', 'res', 'region', 'total_resources'])
    # get all enumerable aws services
    services_df = stackql.execute("SHOW SERVICES IN aws")
    for _, svc_row in services_df.iterrows():
        service = svc_row['name']
        if service in supported_regions:
            # check all enumerable aws resources in the service against supported regions
            resources_df = stackql.execute(f"SHOW RESOURCES IN aws.{service}")
            plural_resources = [resource for resource in resources_df['name'].tolist() if resource.endswith('s')]
            for resource in plural_resources:
                print(f"Checking {resource} in {service}...")
                if service == 's3' and resource == 'buckets':
                    print("Checking location for s3 buckets...")
                    resource_df = get_s3_buckets()
                else:
                    # should we skip this resource?
                    if service in excluded_resources and resource in excluded_resources[service]:
                        continue
                    # check if the resource is global or regional
                    if 'global' in supported_regions[service]:
                        regions_in = ", ".join([f"'{region}'" for region in regions])
                    else:
                        regions_in = ", ".join([f"'{region}'" for region in supported_regions[service]])

                    # check if there are additional where clauses for this resource
                    where_clause = resource_filters.get(service, {}).get(resource, '')

                    resource_df = stackql.execute(f"""
                        SELECT '{service}' as svc, '{resource}' as res, region, COUNT(*) as total_resources
                        FROM aws.{service}.{resource}
                        WHERE region IN ({regions_in}){where_clause}
                        GROUP BY svc, res, region
                        """)

                if not resource_df.empty:
                    # append resource_df to the results_df, skipping error results
                    if 'error' not in resource_df.columns:
                        results_df = pd.concat([results_df, resource_df], ignore_index=True)
    end_time = time.time()
    elapsed_time = end_time - start_time
    print(f"Elapsed time: {elapsed_time:.2f} seconds")
    results_df.to_csv('all_aws_resources.csv', index=False)

if __name__ == "__main__":
    main()
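The resulting all_aws_resources.csv is a flat table of service, resource, region, and count. To reshape it into a region-by-resource matrix like the report shown earlier, one approach (a sketch, assuming the CSV produced above) is:

import pandas as pd

df = pd.read_csv('all_aws_resources.csv')

# one row per service/resource, one column per region, zero-filled
report = df.pivot_table(index=['svc', 'res'], columns='region',
                        values='total_resources', aggfunc='sum', fill_value=0)
print(report)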