Markdown-KV Output Format Available in pystackql

Technologist and Cloud Consultant

pystackql now includes a markdownkv output format optimized for LLM processing of control plane and data plane data from cloud providers.

Background

Recent research from ImprovingAgents.com tested 11 data formats to determine which ones LLMs parse most accurately. Using 1,000 synthetic employee records and 1,000 randomized queries, they measured how well different formats preserved data integrity through LLM processing.

Results for five of the formats tested:

| Format | Accuracy | 95% CI |
|---|---|---|
| Markdown-KV | 60.7% | 57.6%–63.7% |
| JSON | 52.3% | 49.2%–55.4% |
| Markdown Tables | 51.9% | 48.8%–55.0% |
| JSONL | 45.0% | 41.9%–48.1% |
| CSV | 44.3% | 41.2%–47.4% |


Markdown-KV showed a 37% relative improvement over CSV (16.4 percentage points) and a 16% relative improvement over JSON (8.4 percentage points). The tradeoff: it uses approximately 2.7x more tokens than CSV.
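Percentage points and relative improvement are easy to conflate, so here is a quick check of the arithmetic using the accuracy figures from the table. The function names are illustrative and not part of any library.

```python
# Accuracy figures (percent) copied from the results table.
accuracy = {
    "Markdown-KV": 60.7,
    "JSON": 52.3,
    "Markdown Tables": 51.9,
    "JSONL": 45.0,
    "CSV": 44.3,
}


def point_gain(fmt: str, baseline: str) -> float:
    """Absolute difference between two formats, in percentage points."""
    return accuracy[fmt] - accuracy[baseline]


def relative_gain(fmt: str, baseline: str) -> float:
    """Improvement of fmt over baseline, as a percentage of the baseline."""
    return (accuracy[fmt] / accuracy[baseline] - 1) * 100


for baseline in ("CSV", "JSON"):
    print(
        f"Markdown-KV vs {baseline}: "
        f"+{point_gain('Markdown-KV', baseline):.1f} points, "
        f"+{relative_gain('Markdown-KV', baseline):.0f}% relative"
    )
# Markdown-KV vs CSV: +16.4 points, +37% relative
# Markdown-KV vs JSON: +8.4 points, +16% relative
```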

What is Markdown-KV?

Markdown-KV uses hierarchical markdown headers with code blocks for key-value pairs:

# Query Results

## Record 1

id: i-1234567890abcdef0
name: prod-web-01
region: us-east-1
instance_type: t3.large
state: running


## Record 2

id: i-0987654321fedcba0
name: staging-web-01
region: us-west-2
instance_type: t3.medium
state: stopped

The format combines clear hierarchy, explicit key-value pairs, and readability for both humans and LLMs.
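To make the layout concrete, here is a minimal sketch of a renderer that turns a list of records into Markdown-KV. It is not pystackql's implementation: `to_markdown_kv` and its `fenced` parameter are hypothetical names, and the exact spacing is an assumption based on the example above.

```python
from typing import Any, Iterable, Mapping

FENCE = "`" * 3  # a markdown code fence, built here to keep the example easy to embed


def to_markdown_kv(
    records: Iterable[Mapping[str, Any]],
    title: str = "Query Results",
    fenced: bool = True,
) -> str:
    """Render a title, one '## Record N' header per record, then key: value lines."""
    lines = [f"# {title}", ""]
    for index, record in enumerate(records, start=1):
        lines += [f"## Record {index}", ""]
        pairs = [f"{key}: {value}" for key, value in record.items()]
        # Optionally wrap the pairs in a code block, as the format description suggests.
        lines += [FENCE, *pairs, FENCE] if fenced else pairs
        lines.append("")
    return "\n".join(lines)


records = [
    {"id": "i-1234567890abcdef0", "name": "prod-web-01", "state": "running"},
    {"id": "i-0987654321fedcba0", "name": "staging-web-01", "state": "stopped"},
]
print(to_markdown_kv(records))
```

Every key is repeated next to its value in every record, which is what makes the format explicit and also what makes it longer than a tabular format with a single header row.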

Usage

from pystackql import StackQL

stackql = StackQL()

# Query with Markdown-KV output
result = stackql.execute(
    """
    SELECT instanceId, instanceType, state, availabilityZone
    FROM aws.ec2.instances
    WHERE region = 'us-east-1'
    """,
    output='markdownkv'
)

# Use with LLMs (llm_client is a placeholder for your own LLM client)
response = llm_client.complete(
    f"Identify instances that should be stopped:\n\n{result}"
)

Works in server mode too:

stackql = StackQL(server_mode=True)

result = stackql.execute(
    "SELECT name, region, encryption FROM google.storage.buckets WHERE project = 'my-project'",
    output='markdownkv'
)

When to Use It

Markdown-KV is useful when:

  • Feeding infrastructure data to LLMs for analysis, security reviews, or recommendations
  • Building RAG pipelines that need to accurately retrieve and reason about infrastructure
  • Accuracy matters more than token efficiency (infrastructure decisions typically do)
  • Query results are focused datasets (most StackQL queries are)

The token cost is a real tradeoff, but infrastructure queries typically return targeted result sets, not massive datasets. When you're asking an LLM to analyze your production environment, accuracy matters.
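If you want a rough feel for that cost on your own result sets, one option is to compare the size of the same rows in both formats. The sketch below uses character counts as a crude stand-in for tokens (an assumption; real token counts depend on the model's tokenizer), and a simplified Markdown-KV layout without code blocks. The helper names and sample rows are illustrative.

```python
import csv
import io


def as_csv(rows: list[dict]) -> str:
    """Serialize rows as CSV with a single header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def as_markdown_kv(rows: list[dict]) -> str:
    """Serialize rows in a simplified Markdown-KV layout (headers plus key: value lines)."""
    blocks = [
        f"## Record {i}\n\n" + "\n".join(f"{k}: {v}" for k, v in row.items())
        for i, row in enumerate(rows, start=1)
    ]
    return "# Query Results\n\n" + "\n\n".join(blocks) + "\n"


rows = [
    {"id": "i-1234567890abcdef0", "name": "prod-web-01", "region": "us-east-1",
     "instance_type": "t3.large", "state": "running"},
    {"id": "i-0987654321fedcba0", "name": "staging-web-01", "region": "us-west-2",
     "instance_type": "t3.medium", "state": "stopped"},
]

csv_text, kv_text = as_csv(rows), as_markdown_kv(rows)
ratio = len(kv_text) / len(csv_text)
print(f"CSV: {len(csv_text)} chars, Markdown-KV: {len(kv_text)} chars, ratio {ratio:.1f}x")
```

The ratio depends on the data: short values and long column names push it up, because Markdown-KV repeats every key in every record.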

Getting Started

Update pystackql:

pip install --upgrade pystackql

Pass output='markdownkv' to individual execute calls, or set it when instantiating the StackQL object:

result = stackql.execute(query, output='markdownkv')

Resources

The Markdown-KV output format is available in pystackql v3.8.2 and later.
