Congressional Trading Data for AI Models — Claude, GPT-4 & Python

The GovGreed API returns structured JSON covering Triple Signals, bill ML scores, executive pre-vote buys, and committee markup calendars. This guide shows how to pipe that data into the Claude API, GPT-4, LangChain, scikit-learn, and custom ML pipelines.

What Is Congressional Trading Data — and Why Does It Work in AI Models?

Congressional trading data is a category of alternative data — non-traditional financial information derived from public government disclosures rather than market feeds. Members of Congress must disclose stock trades within 45 days under the STOCK Act (2012). Corporate executives must disclose within 2 business days under SEC Section 16(a), as amended by Sarbanes-Oxley. Lobbyists file quarterly with the Senate LDA. Campaign contributions are reported to the FEC.
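These disclosure windows are easy to operationalize. The sketch below uses illustrative field names (the real GovGreed schema may differ) to compute the lag between a trade and its disclosure and flag late STOCK Act filings:

```python
from datetime import date

# STOCK Act: congressional trades must be disclosed within 45 days.
STOCK_ACT_WINDOW_DAYS = 45

def disclosure_lag_days(trade_date: date, disclosure_date: date) -> int:
    """Days between a trade and its public disclosure."""
    return (disclosure_date - trade_date).days

def is_late_filing(trade_date: date, disclosure_date: date) -> bool:
    """Flag congressional disclosures filed past the 45-day window."""
    return disclosure_lag_days(trade_date, disclosure_date) > STOCK_ACT_WINDOW_DAYS

# Example: traded Jan 2, disclosed Mar 1 -> 58 days, past the window
print(is_late_filing(date(2025, 1, 2), date(2025, 3, 1)))
```

Late filings are themselves a signal worth tracking: a disclosure that arrives well past the statutory window tells you the market only just learned about an old trade.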

None of this data appears in Bloomberg, Refinitiv, or standard market data feeds. That information gap is precisely the edge.

Why it trains well: Congressional trading data has the rare property of being labeled. You know the outcome — did the bill pass? Did the stock move? This creates a supervised learning setup: train on historical signal patterns, validate on bill passage outcomes, deploy forward.
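That setup takes only a few lines to sketch. The frame below uses hypothetical columns, but the point is the split: train on past congresses, validate on the most recent one, so future outcomes never leak into training:

```python
import pandas as pd

# Hypothetical bill-level dataset: features plus the passage label
bills = pd.DataFrame({
    "congress": [117, 117, 118, 118, 119],
    "investability_score": [82, 40, 75, 55, 90],
    "enacted": [1, 0, 1, 0, 0],  # label: did the bill pass?
})

# Temporal split: fit on older congresses, validate on the newest
train = bills[bills["congress"] < 119]
test = bills[bills["congress"] == 119]
print(len(train), len(test))  # 4 1
```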

The four signal types available via API

- Triple Signal (rpc/get_triple_signals) — High-precision alert: committee overlap + trade + contribution. Structured, categorical, binary flag + score.
- Bill Investability Score (bills?investability_score=gte.70) — 25-feature ML score (0–100). Use as a feature in your own model or as a pre-filter for the signal universe.
- Exec Pre-Vote Buy (rpc/exec_timing_signals) — Time-series feature: days before vote, position size, officer flag. A natural fit for LSTM/temporal models.
- Committee Markup Calendar (upcoming_markups) — Event-driven signal: a scheduled markup is a catalyst. Combine with the bill score for timed entry.
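The "combine with the bill score for timed entry" idea reduces to a join. The sketch below uses toy frames and an assumed shared `bill_id` key rather than the live endpoints; an inner join keeps only bills that have both a scheduled markup and a high score:

```python
import pandas as pd

# Toy stand-ins for the upcoming_markups and bills endpoints
markups = pd.DataFrame({
    "bill_id": [1, 2],
    "markup_date": ["2026-03-10", "2026-03-12"],
})
bills = pd.DataFrame({
    "bill_id": [1, 3],
    "investability_score": [84.0, 62.0],
})

# Inner join: only bills with BOTH a scheduled markup and a score
timed = markups.merge(bills, on="bill_id", how="inner")
watchlist = timed[timed["investability_score"] >= 70]
print(watchlist["bill_id"].tolist())  # [1]
```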

Fetching Data from the GovGreed API

The GovGreed API is a standard PostgREST REST interface. No SDK required — plain HTTP GET with your API key in the header.

Python govgreed_client.py
# pip install requests pandas
import requests
import pandas as pd
from datetime import datetime

BASE_URL = "https://api.govgreed.com"
HEADERS = {
    "apikey": "YOUR_GOVGREED_API_KEY",
    "Authorization": "Bearer YOUR_GOVGREED_API_KEY",
    "Accept": "application/json",
}

def get_triple_signals(min_score=60, limit=50):
    """Get Triple Signals ranked by score. These are bills where
    a committee member overlaps with a stock trade AND campaign contribution."""
    resp = requests.get(
        f"{BASE_URL}/rest/v1/rpc/get_triple_signals",
        params={"min_score": min_score, "limit": limit},
        headers=HEADERS
    )
    resp.raise_for_status()
    return pd.DataFrame(resp.json())

def get_high_investability_bills(min_score=70):
    """Bills scoring ≥70 on investability (5.4× more likely to pass)."""
    resp = requests.get(
        f"{BASE_URL}/rest/v1/bills",
        params={
            "congress": "eq.119",
            "investability_score": f"gte.{min_score}",
            "order": "investability_score.desc",
            "select": "id,bill_number,title,investability_score,hot_score,committee_name"
        },
        headers=HEADERS
    )
    resp.raise_for_status()
    return pd.DataFrame(resp.json())

def get_exec_timing_signals(min_score=5.0):
    """Exec buys with high timing scores — officer bought before bill vote."""
    resp = requests.get(
        f"{BASE_URL}/rest/v1/exec_timing_signals_best",
        params={
            "timing_score": f"gte.{min_score}",
            "transaction_type": "eq.Purchase",
            "order": "timing_score.desc",
            "limit": "30"
        },
        headers=HEADERS
    )
    resp.raise_for_status()
    return pd.DataFrame(resp.json())

# Usage
signals_df = get_triple_signals(min_score=70)
bills_df = get_high_investability_bills()
print(f"Triple signals: {len(signals_df)} | High bills: {len(bills_df)}")
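PostgREST endpoints cap response sizes, so bulk historical pulls need pagination. Here is a generic offset/limit loop, decoupled from the HTTP call so it is easy to test; the commented wiring to /rest/v1/bills is an assumption about the live API, not tested against it:

```python
import pandas as pd

def paginate(fetch_page, page_size=1000, max_pages=50):
    """Generic offset/limit pagination loop. fetch_page(limit=, offset=)
    should return a list of row dicts for that page."""
    frames = []
    for page in range(max_pages):
        rows = fetch_page(limit=page_size, offset=page * page_size)
        if not rows:
            break  # past the last page
        frames.append(pd.DataFrame(rows))
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()

# Hypothetical wiring to the client above:
# def fetch_bills_page(limit, offset):
#     resp = requests.get(
#         f"{BASE_URL}/rest/v1/bills",
#         params={"limit": limit, "offset": offset, "order": "id.asc"},
#         headers=HEADERS,
#     )
#     resp.raise_for_status()
#     return resp.json()
# all_bills = paginate(fetch_bills_page)
```

Ordering by a stable column (here `id.asc`) matters: without a deterministic order, rows can shift between pages and you can silently miss or duplicate records.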

Using Claude API to Analyze Congressional Signals

Claude is well-suited for congressional signal analysis because it can reason about conflicting multi-factor patterns, generate plain-English investment thesis summaries, and flag which signals are most statistically unusual given historical context.

Python claude_analysis.py
# pip install anthropic
import anthropic
import json

client = anthropic.Anthropic(api_key="YOUR_ANTHROPIC_KEY")

def analyze_signal_with_claude(signal_data: dict) -> str:
    """Pass a GovGreed Triple Signal to Claude for analysis.
    Returns an investment thesis summary."""

    prompt = f"""You are analyzing a congressional insider trading signal.
Here is the signal data in JSON:

{json.dumps(signal_data, indent=2)}

Fields explanation:
- ticker: stock symbol affected by the bill
- bill_number: the bill being voted on
- investability_score: ML score 0-100 (≥70 = high signal)
- committee_member: name of the politician with oversight
- trade_amount: dollar value of their stock trade
- days_before_vote: how far before the vote they traded
- exec_buys: number of corporate execs also buying this stock
- campaign_contributions: industry money received by committee member

Provide:
1. A 2-sentence investment thesis (what the signal suggests)
2. The top risk factor for this signal
3. Confidence level (Low/Medium/High) with brief justification
4. Suggested action: WATCH / ENTER / AVOID

Be specific and analytical. Reference the data."""

    message = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=400,
        messages=[{"role": "user", "content": prompt}]
    )
    return message.content[0].text

# Example usage with a real signal
sample_signal = {
    "ticker": "NVDA",
    "bill_number": "HR.7530",
    "bill_title": "CHIPS and Science Act Expansion",
    "investability_score": 84.2,
    "committee_member": "[Senator Name]",
    "committee": "Senate Commerce, Science, and Transportation",
    "trade_amount": 485000,
    "days_before_vote": 38,
    "exec_buys": 3,
    "campaign_contributions": 125000,
    "triple_signal": True
}

thesis = analyze_signal_with_claude(sample_signal)
print(thesis)

Using OpenAI GPT-4 for Batch Signal Ranking

GPT-4 with structured outputs is useful for batch-ranking multiple signals and returning machine-readable JSON for downstream processing.

Python gpt4_batch_rank.py
# pip install openai
from openai import OpenAI
import json

client = OpenAI(api_key="YOUR_OPENAI_KEY")

def rank_signals_gpt4(signals: list[dict]) -> list:
    """Rank a list of Triple Signals using GPT-4.
    Returns signals with AI confidence score added."""

    resp = client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},
        messages=[
            {
                "role": "system",
                "content": "You are a quantitative analyst ranking congressional trading signals. Return JSON with ranked_signals array."
            },
            {
                "role": "user",
                "content": f"Rank these signals by expected alpha. Add ai_confidence (0-1) and ai_rank. Signals: {json.dumps(signals)}"
            }
        ]
    )
    result = json.loads(resp.choices[0].message.content)
    return result.get("ranked_signals", [])
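Once GPT-4 returns its ranking, you will usually want the scores back on the original DataFrame. A small helper sketch, assuming both sides share a `ticker` key (adjust to the real schema):

```python
import pandas as pd

def merge_ai_ranks(signals_df, ranked):
    """Merge GPT-4's ai_confidence / ai_rank back onto the original
    signals frame, ordered best rank first. Assumes a shared ticker key."""
    ai_df = pd.DataFrame(ranked)[["ticker", "ai_confidence", "ai_rank"]]
    out = signals_df.merge(ai_df, on="ticker", how="left")
    return out.sort_values("ai_rank").reset_index(drop=True)

# Demo with two hypothetical signals
demo = merge_ai_ranks(
    pd.DataFrame({"ticker": ["NVDA", "LMT"]}),
    [{"ticker": "LMT", "ai_confidence": 0.9, "ai_rank": 1},
     {"ticker": "NVDA", "ai_confidence": 0.6, "ai_rank": 2}],
)
print(demo["ticker"].tolist())  # ['LMT', 'NVDA']
```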

LangChain + Vector DB: Build a Congressional Signal RAG System

For more sophisticated AI analysis, store GovGreed data in a vector database (Pinecone, Qdrant, Supabase Vector) and use LangChain to build a retrieval-augmented system. Ask questions like "Which bills in the semiconductor sector have the highest triple signal count?" or "Show me historical patterns for defense sector signals in election years."

Python langchain_rag.py
# pip install langchain langchain-openai langchain-community
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain.docstore.document import Document

# Convert bill signals to documents for vector storage
def signals_to_documents(bills_df):
    docs = []
    for _, row in bills_df.iterrows():
        content = f"""Bill: {row['bill_number']} — {row['title']}
Committee: {row['committee_name']}
Investability Score: {row['investability_score']:.1f}/100
Triple Signal: {row.get('triple_signal', False)}
Tickers Affected: {row.get('tickers_affected', 'N/A')}
Exec Buys: {row.get('exec_buy_count', 0)}"""
        docs.append(Document(
            page_content=content,
            metadata={"bill_number": row["bill_number"], "score": row["investability_score"]}
        ))
    return docs

# Build retriever and QA chain
embeddings = OpenAIEmbeddings()
docs = signals_to_documents(bills_df)
vectorstore = FAISS.from_documents(docs, embeddings)
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-4o"),
    retriever=vectorstore.as_retriever(search_kwargs={"k": 5})
)

# Natural language queries over congressional data
answer = qa.invoke("Which semiconductor bills have triple signals?")
print(answer["result"])

Scikit-Learn: Train Your Own Congressional Alpha Model

The GovGreed API exposes the raw features behind the investability score. You can pull those features and train your own model — either to reproduce the score or to add your own signals.

Python train_model.py
# pip install scikit-learn lightgbm pandas requests
import requests
import pandas as pd
from lightgbm import LGBMClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Reuses BASE_URL and HEADERS from govgreed_client.py above
# Fetch bill features from the GovGreed API
resp = requests.get(
    f"{BASE_URL}/rest/v1/bill_features",
    params={"congress": "in.(117,118)"},  # historical congresses for training
    headers=HEADERS
)
resp.raise_for_status()
features_df = pd.DataFrame(resp.json())

# Key features available
FEATURE_COLS = [
    "insider_count", "insider_trade_value", "has_triple_signal",
    "sector_count", "impacted_ticker_count", "related_contributions",
    "exec_ahead_vote_count", "exec_officer_buy_count", "markup_scheduled",
    "sponsor_party_d", "cosponsors_count", "committee_seniority_avg"
]

X = features_df[FEATURE_COLS].fillna(0)
y = (features_df["enacted"] == True).astype(int)

# Note: for production, prefer a temporal split (e.g. train on the 117th
# Congress, test on the 118th) to avoid look-ahead leakage.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LGBMClassifier(n_estimators=200, learning_rate=0.05)
model.fit(X_train, y_train)

auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"AUC: {auc:.3f}")  # baseline ~0.71 with GovGreed features
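After training, it is worth checking which features actually drive predictions. LightGBM exposes `feature_importances_`, the same attribute as scikit-learn estimators; the self-contained sketch below uses a RandomForestClassifier on synthetic columns so it runs without LightGBM, but the pattern applies unchanged to the model above:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for features_df: two informative columns plus noise
rng = np.random.default_rng(42)
X = pd.DataFrame({
    "has_triple_signal": rng.integers(0, 2, 500).astype(float),
    "insider_count": rng.integers(0, 10, 500).astype(float),
    "noise": rng.normal(size=500),
})
# Label is a deterministic function of the two informative features
y = (X["has_triple_signal"] + (X["insider_count"] > 5)).clip(0, 1).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
importances = pd.Series(model.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False))
```

If `has_triple_signal` and `exec_ahead_vote_count` rank near the bottom on the real GovGreed features, that is a red flag for a data or join problem, not a finding.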

Frequently Asked Questions

Can I use Claude API to analyze congressional trading data?
Yes. Fetch congressional trade signals from GovGreed's REST API, then pass the JSON to Claude API with a structured prompt. Claude is particularly good at reasoning about conflicting signals, identifying the most suspicious patterns, and generating plain-English investment thesis summaries from the raw data. See the Claude API section above for a complete working example.
What makes congressional data good for AI and ML models?
Congressional trading data has several properties that make it well-suited for ML: (1) it's labeled — you know the outcome (did the bill pass?), (2) it has temporal structure — trades precede votes, (3) it has multiple correlated signals (trade + committee + contribution), and (4) it reflects a genuine informational edge — STOCK Act insiders trade with real oversight knowledge, creating a learnable pattern. GovGreed's internal model achieves a 5.4× pass-rate improvement on high-signal bills.
What Python libraries work best with GovGreed data?
For data fetching: requests or httpx. For signal processing: pandas, numpy. For ML: scikit-learn, lightgbm, xgboost. For AI analysis: anthropic SDK or openai SDK. For LLM pipelines: langchain. For execution: alpaca-trade-api or ibapi. For backtesting: backtrader or zipline-reloaded.
How do I get access to the GovGreed API?
GovGreed is currently in alpha. Join the waitlist to request access — everyone on the waitlist gets 30 days of full access free at launch (before Summer 2026). For bulk historical pulls, use the offset/limit parameters to paginate through large datasets.
Can I use GovGreed with Claude Code or GitHub Copilot for development?
Yes. GovGreed's API returns clean, well-structured JSON, which means you can give the response schema to Claude Code or Copilot and get accurate code generation for working with the data. The trading bot guide includes complete Python modules you can use as a starting point.