Unlike most sectors, biopharma stocks don’t trade primarily on revenue multiples—they trade on discrete catalyst events: FDA approvals, Phase 3 readouts, fast-track designations, and pivotal trial initiations. These moments can resolve years of clinical and regulatory uncertainty in a single press release, often moving market caps by tens of billions overnight. As IQVIA notes, trial outcomes in small-cap biotech—where a single asset can represent the bulk of enterprise value—can trigger valuation swings two orders of magnitude larger than those seen in diversified pharma.
This notebook walks through a live case study using the Captide API to programmatically track these events across biotech and pharma equities. While we focus here on pulling historical catalysts for exploration and visualization, the true edge emerges when combining this framework with Captide’s document webhook—allowing teams to detect and analyze material disclosures in bulk within seconds of publication, not hours or days later.
To keep things focused, we'll analyze a sample of four companies and their disclosures from 2023 to the present. The selected firms—Verve Therapeutics, Intellia Therapeutics, Editas Medicine, and Biogen—offer a mix of early- and late-stage pipelines across gene editing and neurology, making them ideal for illustrating a range of catalyst types.
import os
import re
import json
import requests
import pandas as pd
from typing import Dict, List
from datetime import datetime
from collections import defaultdict
# Load API key (fail fast if it isn't configured)
CAPTIDE_API_KEY = os.getenv("CAPTIDE_API_KEY")
assert CAPTIDE_API_KEY, "Set the CAPTIDE_API_KEY environment variable before running"
# Set headers
HEADERS = {
"X-API-Key": CAPTIDE_API_KEY,
"Content-Type": "application/json",
"Accept": "application/json"
}
# Define tickers and cutoff date
TICKERS = ["VERV", "NTLA", "EDIT", "BIIB"]
cutoff_date = "2022-12-31"
For U.S.-listed biotech and pharma companies, major catalyst events—like clinical trial results or FDA decisions—are typically disclosed through Form 8-K filings. These events are often deemed material under SEC rules, meaning they could significantly impact a company’s stock price. This is especially true for small-cap biotechs, where a single drug candidate may account for most of the company’s value. Filing an 8-K ensures compliance with Regulation FD, provides equal access to information, and reinforces transparency.
In this example, we’ll use the Captide API to retrieve and analyze 8-K filings from our sample companies. However, not all 8-Ks are relevant to catalyst events, so we’ll apply a filter using the helper function below to extract only the items of interest.
def is_valid_document(doc: Dict) -> bool:
    """Keep only 8-Ks filed under items likely to carry catalyst news."""
    if doc["sourceType"] == "8-K":
        item_str = doc.get("additionalKwargs", {}).get("item", "")
        # Items of interest: 1.01/1.02 (entry into / termination of a material
        # agreement), 2.05 (exit or disposal costs), 7.01 (Regulation FD
        # disclosure), 8.01 (other material events)
        relevant_items = {"1.01", "1.02", "2.05", "7.01", "8.01"}
        return any(item in item_str for item in relevant_items)
    return True
def parse_sse_response(sse_text: str) -> Dict:
    """Extract the final JSON answer from a Captide SSE stream."""
    try:
        # SSE events arrive as lines prefixed with "data: "
        lines = [l[6:] for l in sse_text.splitlines() if l.startswith("data: ")]
        for l in lines:
            obj = json.loads(l)
            if obj.get("type") == "full_answer":
                # Strip inline citation tags like [#abc123], then pull out the JSON object
                content = re.sub(r"\s*\[#\w+\]", "", obj["content"])
                m = re.search(r"\{.*\}", content, re.DOTALL)
                return json.loads(m.group(0)) if m else {}
    except Exception:
        pass  # fall through to the empty-dict default on any malformed payload
    return {}
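To make the stream format concrete, here is a toy example of what the helper consumes. The payload shapes below are simplified illustrations, not a verbatim Captide response, and the steps mirror those of parse_sse_response, unrolled for clarity:

```python
import json
import re

# Toy SSE stream: each event is a "data: <json>" line; the final answer is a
# "full_answer" event whose content mixes prose, citation tags, and JSON.
sample = (
    'data: {"type": "chunk", "content": "partial"}\n'
    'data: {"type": "full_answer", "content": "Answer [#a1b2]: {\\"eventType\\": \\"fda_decision\\", \\"eventDetails\\": {}}"}\n'
)

payloads = [json.loads(l[6:]) for l in sample.splitlines() if l.startswith("data: ")]
final = next(p for p in payloads if p.get("type") == "full_answer")
content = re.sub(r"\s*\[#\w+\]", "", final["content"])  # drop citation tags
parsed = json.loads(re.search(r"\{.*\}", content, re.DOTALL).group(0))
print(parsed)  # {'eventType': 'fda_decision', 'eventDetails': {}}
```

The citation tags are useful in interactive settings (they link back to source passages), but for batch extraction we strip them so the remaining text parses as clean JSON.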
We’ll now use the Captide API to fetch all 8-K filings from our sample companies that match our criteria.
def fetch_documents(ticker: str) -> List[Dict]:
url = f"https://rest-api.captide.co/api/v1/companies/ticker/{ticker}/documents?startDate={cutoff_date}"
    response = requests.get(url, headers=HEADERS, timeout=60)
    response.raise_for_status()  # surface auth or rate-limit errors early
    docs = response.json()
print(f"Fetched {len(docs)} raw documents for {ticker}")
return [
{
"ticker": doc["ticker"],
"sourceLink": doc["sourceLink"],
"date": doc["date"]
}
for doc in docs
if doc["sourceType"] == "8-K"
and "fiscalPeriod" in doc
and is_valid_document(doc)
]
# Example:
test_docs = fetch_documents("VERV")
print(json.dumps(test_docs, indent=2))
Below is a sample output of the filtered documents for Verve Therapeutics.
[
{
"ticker": "VERV",
"sourceLink": "https://rest-api.captide.co/api/v1/document?sourceType=8-K&documentId=6eff02cb-497d-466c-bf9e-54f5e29b2777",
"date": "2025-06-17"
},
{
"ticker": "VERV",
"sourceLink": "https://rest-api.captide.co/api/v1/document?sourceType=8-K&documentId=0a60ebbd-f8cb-4d1b-a95f-d132e83d4a2d",
"date": "2025-06-17"
},
{
"ticker": "VERV",
"sourceLink": "https://rest-api.captide.co/api/v1/document?sourceType=8-K&documentId=59e6ea4b-3806-4d72-8c2b-a0cd2dd8db51",
"date": "2025-06-17"
},
...
]
Once relevant 8-Ks are retrieved, the next step is to extract structured, machine-readable insights from these filings. To do this, we use the Captide Agentic RAG endpoint, which applies retrieval-augmented generation (RAG) to extract and parse information from SEC (and global) disclosures.
The function below sends a list of filing URLs and a detailed natural language prompt to the Captide API. The model then returns a single JSON object for each document, identifying the event type and associated key details, such as trial phase, drug name, regulatory decision, or program status—depending on the nature of the event.
If no relevant catalyst is found in a document, the model returns { "eventType": null, "eventDetails": {} }.
This approach allows us to standardize event data across filings, enabling easier comparison, visualization, and downstream modeling. The prompt supports five core event types commonly disclosed by biotech and pharma companies: clinical results, FDA decisions, clinical holds, program terminations, and program announcements.
def fetch_metrics_with_prompt(source_links: List[str], prompt: str) -> Dict:
payload = {"query": prompt, "sourceLink": source_links}
r = requests.post(
"https://rest-api.captide.co/api/v1/rag/agent-query-stream",
json=payload, headers=HEADERS, timeout=120
)
return parse_sse_response(r.text)
BASE_PROMPT = (
"""
Return a single valid JSON object with the following key-value pairs. Note that "eventType" dictates the required structure of the "eventDetails" field:
- "eventType": one of the following values — "clinical_results", "fda_decision", "clinical_hold", "program_termination", or "program_announcement".
- "eventDetails": an object containing structured details relevant to the specified "eventType", using the required fields listed below.
If the document does **not** contain any material information matching these event types, return:
{
"eventType": null,
"eventDetails": {}
}
Required fields per event type:
- If "eventType" = "clinical_results":
- "trialPhase" — e.g., Phase 1/2/3
- "drugName" — name of the investigational therapy
- "indication" — disease or condition being treated
- "resultsSummary" — concise summary of efficacy outcomes
- "safetyProfile" — any adverse events or tolerability findings
- "nextSteps" — upcoming plans (e.g., advancing to next phase)
- If "eventType" = "fda_decision":
- "decisionType" — e.g., approved, Complete Response Letter (CRL), delayed
- "applicationType" — e.g., NDA (New Drug Application), BLA (Biologics License Application)
- "drugName" — name of the drug reviewed
- "indication" — condition the drug is intended to treat
- "notes" — relevant regulatory context, labeling info, or committee outcomes
- If "eventType" = "clinical_hold":
- "holdType" — full or partial hold
- "reason" — FDA's stated rationale for the hold
- "drugName" — drug or program affected
- "impactSummary" — brief on trial or pipeline impact
- "resolutionPlan" — company's steps to address the hold
- If "eventType" = "program_termination":
- "programName" — name of the discontinued program
- "indication" — target disease area
- "developmentStage" — how far along the program was (e.g., preclinical, Phase 2)
- "reason" — scientific, strategic, or commercial rationale for termination
- "financialImpact" — expected cost savings or write-downs
- "strategicNotes" — effect on company direction or pipeline focus
- If "eventType" = "program_announcement":
- "programName" — name of the new or updated program
- "indication" — disease or condition being targeted
- "platform" — underlying technology (e.g., CRISPR, base editing)
- "developmentStage" — current status (e.g., preclinical, IND-enabling)
- "milestone" — key progress point announced (e.g., IND submission, trial start)
- "partnershipInfo" — details of any collaborations or licensing
- "timelineEstimate" — estimated timing of next milestone
"""
)
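For reference, a well-formed extraction for a clinical readout would come back shaped roughly like this. The drug name, indication, and outcomes below are invented purely for illustration:

```python
# Hypothetical model output for "clinical_results" — all values are invented.
example_event = {
    "eventType": "clinical_results",
    "eventDetails": {
        "trialPhase": "Phase 1",
        "drugName": "EXMPL-101",
        "indication": "hypercholesterolemia",
        "resultsSummary": "Dose-dependent LDL-C reduction observed",
        "safetyProfile": "No serious treatment-related adverse events reported",
        "nextSteps": "Dose-expansion cohort planned",
    },
}

# Downstream code branches on "eventType" to know which detail fields to expect.
print(example_event["eventType"])  # clinical_results
```

Because every event type carries its own required fields, keying the schema off "eventType" keeps the parsed records self-describing: downstream code never has to guess which columns a given filing contributed.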
With all the core building blocks in place—document retrieval, filtering, and event classification—we now bring everything together in a single pipeline using the process_ticker() function.
This function acts as the execution layer, orchestrating the following steps for each ticker: fetching and filtering its 8-K filings, running each filing through the RAG prompt to classify the event, and aggregating the parsed results into a table keyed by event type and year.
To speed up execution, we use Python’s ThreadPoolExecutor to run the pipeline concurrently across all selected tickers. This parallelization is especially useful when dealing with a large watchlist or running recurring batch jobs.
The result is results_by_ticker: a dictionary mapping each ticker to a structured table of catalyst events—summarizing what happened, when, and with what clinical or regulatory significance.
from concurrent.futures import ThreadPoolExecutor, as_completed
def process_ticker(ticker: str):
try:
documents = fetch_documents(ticker)
table_data = defaultdict(lambda: defaultdict(str)) # eventType -> year -> detail
for doc in documents:
year = datetime.strptime(doc["date"], "%Y-%m-%d").year
parsed = fetch_metrics_with_prompt([doc["sourceLink"]], BASE_PROMPT)
event_type = parsed.get("eventType")
if event_type:
full_event = {"eventType": event_type, "eventDetails": parsed["eventDetails"]}
table_data[event_type][year] += json.dumps(full_event) + "\n---\n"
if table_data:
df = pd.DataFrame(table_data).T
return ticker, df
else:
return ticker, None
    except Exception as e:
        print(f"⚠️ Failed to process {ticker}: {e}")
        return ticker, None
results_by_ticker = {}
with ThreadPoolExecutor(max_workers=5) as executor:
futures = {executor.submit(process_ticker, ticker): ticker for ticker in TICKERS}
for future in as_completed(futures):
ticker, df = future.result()
if df is not None:
results_by_ticker[ticker] = df
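Beyond per-ticker display, the tables can also be flattened into one long-form DataFrame for cross-company comparison. Here is a minimal sketch of that reshaping, using an invented sample event (stored under the name sample_results so as not to clobber the real results_by_ticker) in place of actual pipeline output:

```python
import json
from collections import defaultdict

import pandas as pd

# Stand-in for results_by_ticker with one invented event; a real run would
# use the dictionary produced by the pipeline above.
table = defaultdict(lambda: defaultdict(str))
sample_event = {"eventType": "fda_decision",
                "eventDetails": {"decisionType": "approved", "drugName": "EXMPL-101"}}
table["fda_decision"][2024] += json.dumps(sample_event) + "\n---\n"
sample_results = {"BIIB": pd.DataFrame(table).T}

# Flatten eventType -> year -> blob strings into one row per event
rows = []
for ticker, df in sample_results.items():
    for event_type, row in df.iterrows():
        for year, detail in row.items():
            for blob in str(detail).split("\n---\n"):
                if blob.strip():
                    parsed = json.loads(blob)
                    rows.append({"ticker": ticker, "year": year,
                                 "eventType": event_type, **parsed["eventDetails"]})

events_df = pd.DataFrame(rows)
print(events_df)
```

A long-form table like this makes it straightforward to group by year or event type across the whole watchlist, or to join catalyst dates against price data for event studies.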
This final section renders the output by formatting the extracted catalyst events into readable tables. For each ticker, it iterates through structured events by type and year, parses the stored JSON blobs, and displays the eventDetails as a clean DataFrame.
This step ties everything together, turning raw 8-K text into a digestible summary of what happened, when, and why it matters.
def camel_to_title(s):
return re.sub(r'(?<=[a-z])(?=[A-Z])', ' ', s).replace("_", " ").title()
for ticker, df in results_by_ticker.items():
print(f"\n📢 Ticker: {ticker}")
for event_type, row in df.iterrows():
for year, detail in row.items():
if not isinstance(detail, str) or not detail.strip():
continue
print(f"\n🗂️ {ticker} — {event_type} — {year}")
try:
# Split by delimiter and parse each event separately
event_blobs = [blob for blob in detail.split("\n---\n") if blob.strip()]
for blob in event_blobs:
parsed_detail = json.loads(blob)
details = parsed_detail.get("eventDetails", {})
if isinstance(details, dict) and details:
display(pd.DataFrame.from_dict(
{camel_to_title(k): v for k, v in details.items()},
orient="index",
columns=["Value"]
).style.set_properties(**{'white-space': 'pre-wrap'}))
else:
print(f"⚠️ No eventDetails in blob: {blob}")
except json.JSONDecodeError as e:
print(f"❌ JSON decode error for {event_type} in {year}: {e}")
except Exception as e:
print(f"❌ Failed to parse/display event for {event_type} in {year}: {e}")
Below are the results: