Adjusted EBITDA, Free Cash Flow, and Core Earnings are central to modern financial analysis. These non-GAAP metrics often drive valuation models, comparables analysis, and investment theses—especially when GAAP figures don’t fully reflect a company’s underlying performance. Yet accessing these metrics remains a significant challenge. They’re frequently embedded deep within earnings releases and 8-K filings, presented in unstructured and inconsistent formats that vary not only across companies but also between reporting periods for the same issuer. Tables may appear as images, poorly formatted text, or irregular layouts. Line item labels change, order varies, and no standard definition exists for terms like “Adjusted EBITDA.”
Traditional parsing techniques—such as regular expressions, templates, or table extraction tools—struggle to keep up. The result is a slow, manual, and error-prone process that resists scale and automation.
The Captide API enables prompt-driven, retrieval-augmented generation (RAG) workflows directly on SEC filings and other unstructured company disclosures. By putting advanced agentic AI behind an API with access to one of the largest datasets of financial disclosures, Captide can return structured, schema-consistent JSON with the exact metrics required (e.g., every line of a Net Income to Adjusted EBITDA reconciliation, always numeric, with explicit sign conventions).
This case study demonstrates an automated, end-to-end workflow that fetches the earnings press release 8-Ks from multiple public companies, extracts the reconciliation from Net Income to Adjusted EBITDA as a clean JSON object, and iteratively standardizes and aligns metric line items—even as they change across filings—so results are immediately suitable for analytics, visualization, or downstream modeling.
The first step in our workflow is to programmatically retrieve reconciliation line items from Net Income to Adjusted EBITDA directly from 8-K filings. Using Captide’s retrieval-augmented generation API, we can prompt the model to return these metrics as a clean, structured JSON object, regardless of how inconsistently they are presented in the underlying documents. Captide retrieves only information that is actually present in the filings and exposes audit traces for quality assurance.
We begin by filtering for recent 8-K filings (post-2022) that are classified as Item 2.02 (Results of Operations and Financial Condition), the documents most likely to include reconciliations and earnings tables. The fetch_documents function handles this filtering and returns metadata for qualifying filings across multiple tickers.
Once we’ve gathered the relevant source links, the fetch_metrics_with_prompt function sends a structured prompt to Captide’s agent-query endpoint. This prompt instructs the model to return only a numeric JSON reconciliation from Net Income to Adjusted EBITDA, omitting non-numeric commentary or inconsistent formatting. The response is streamed as server-sent events (SSE) and parsed into a usable Python dictionary by the parse_sse_response function.
This step automates what would otherwise be hours of manual work—parsing filings, normalizing line items, and extracting financial data hidden deep in unstructured disclosures.
import os
import re, json, requests
import pandas as pd
from typing import Dict, List
CAPTIDE_API_KEY = os.getenv("CAPTIDE_API_KEY", "YOUR_CAPTIDE_API_KEY")
HEADERS = {
"X-API-Key": CAPTIDE_API_KEY,
"Content-Type": "application/json",
"Accept": "application/json"
}
TICKERS = ["SNAP", "PLTR", "UBER"]
BASE_PROMPT = (
"Return a single valid JSON object with double-quoted keys and numeric values (in thousands of dollars). The object "
"must represent the reconciliation from Net Income to Adjusted EBITDA, including all reported line items. Use "
"positive values for additions to Net Income and negative values for subtractions. Do not include words like 'add' "
"or 'less' in the keys. Output only the JSON object—no commentary or extra text."
)
def is_valid_fiscal_period(fp: str) -> bool:
m = re.match(r"Q([1-4]) (\d{4})", fp)
return bool(m and int(m.group(2)) > 2022)
def is_valid_document(doc: Dict) -> bool:
if doc["sourceType"] == "8-K":
return "2.02" in doc.get("additionalKwargs", {}).get("item", "")
return True
def fetch_documents(ticker: str) -> List[Dict]:
url = f"https://rest-api.captide.co/api/v1/companies/ticker/{ticker}/documents"
docs = requests.get(url, headers=HEADERS, timeout=60).json()
return [
{"ticker": doc["ticker"],
"fiscalPeriod": doc["fiscalPeriod"],
"sourceLink": doc["sourceLink"]}
for doc in docs
if doc["sourceType"] == "8-K"
and "fiscalPeriod" in doc
and is_valid_fiscal_period(doc["fiscalPeriod"])
and is_valid_document(doc)
]
def parse_sse_response(sse_text: str) -> Dict:
try:
lines = [l[6:] for l in sse_text.splitlines() if l.startswith("data: ")]
for l in lines:
obj = json.loads(l)
if obj.get("type") == "full_answer":
content = re.sub(r"\s*\[#\w+\]", "", obj["content"])
m = re.search(r"\{.*\}", content, re.DOTALL)
return json.loads(m.group(0)) if m else {}
except Exception:
pass
return {}
def fetch_metrics_with_prompt(source_links: List[str], prompt: str) -> Dict:
payload = {"query": prompt, "sourceLink": source_links}
r = requests.post(
"https://rest-api.captide.co/api/v1/rag/agent-query-stream",
json=payload, headers=HEADERS, timeout=120
)
return parse_sse_response(r.text)
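Before calling the live endpoint, it helps to sanity-check the SSE parsing on a synthetic payload. The snippet below reproduces parse_sse_response from above and feeds it a hand-built stream; the event shape (a "full_answer" event whose content embeds the JSON plus citation tags like [#a1b2c3]) is an assumption inferred from the parsing logic, not a documented wire format.

```python
import json
import re
from typing import Dict

def parse_sse_response(sse_text: str) -> Dict:
    # Same logic as above: scan "data: " events for the full answer,
    # strip citation tags like [#a1b2c3], then extract the JSON object.
    try:
        lines = [l[6:] for l in sse_text.splitlines() if l.startswith("data: ")]
        for l in lines:
            obj = json.loads(l)
            if obj.get("type") == "full_answer":
                content = re.sub(r"\s*\[#\w+\]", "", obj["content"])
                m = re.search(r"\{.*\}", content, re.DOTALL)
                return json.loads(m.group(0)) if m else {}
    except Exception:
        pass
    return {}

# Synthetic SSE stream mimicking the assumed event shape
sse = (
    'data: {"type": "chunk", "content": "partial"}\n'
    'data: {"type": "full_answer", "content": '
    '"{\\"Net Income (Loss)\\": 16802, \\"Adjusted EBITDA\\": 133434} [#a1b2c3]"}\n'
)
print(parse_sse_response(sse))
# → {'Net Income (Loss)': 16802, 'Adjusted EBITDA': 133434}
```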
Below is a sample response for Palantir Technologies' Q1 2023 reconciliation.
{
"Net Income (Loss)": 16802,
"Net Income (Loss) to Non-Controlling Interests": 2349,
"Interest Income": -20853,
"Interest Expense": 1275,
"Other (Income) Expense Net": 2861,
"Income Tax (Benefit) Expense": 1681,
"Depreciation and Amortization": 8320,
"Stock-Based Compensation": 114714,
"Payroll Tax on Stock-Based Compensation": 6285,
"Adjusted EBITDA": 133434
}
Because reconciliation line items vary not only between companies but also across different reporting periods, the next step is to normalize this structure into a stable schema that can support time-series analysis and consistent aggregation.
The key challenge is that issuers often introduce, rename, or reorder line items in their Adjusted EBITDA reconciliations. A rigid schema would either lose valuable information or require constant manual updates. To solve this, we use a dynamic approach that incrementally learns and aligns the line item ordering.
The build_prompt function augments our original prompt with positional guidance derived from previously seen reconciliations. If prior periods contained line items in a particular order, we ask the model to maintain that order where possible, while still allowing new line items to be inserted at a sensible position.
The merge_key_lists function handles this logic. It iteratively compares the current period’s line items with a “master” list accumulated across filings. When new items appear, it infers the appropriate insertion point by checking for nearby keys that already exist in the master list. This approach preserves semantic continuity without enforcing rigid templates.
By the end of this step, we’ve established a schema-consistent, order-aware list of reconciliation line items—robust enough to absorb quarterly variations while maintaining structure for downstream processing.
def build_prompt(prev_keys: List[str]) -> str:
    if not prev_keys:
        return BASE_PROMPT
    joined = ", ".join(f'"{k}"' for k in prev_keys)
    # Note the leading space in the appended sentence, so it does not
    # run into the final period of BASE_PROMPT.
    return (
        BASE_PROMPT +
        f" Use the following keys in this order if they appear: [{joined}]. "
        "If the document contains additional reconciliation line items, insert "
        "them at the correct position relative to the list above."
    )
def merge_key_lists(master: List[str], this_quarter: List[str]) -> List[str]:
for i, k in enumerate(this_quarter):
if k in master:
continue
insert_pos = None
for j in range(i - 1, -1, -1):
prev_key = this_quarter[j]
if prev_key in master:
insert_pos = master.index(prev_key) + 1
break
if insert_pos is None:
for j in range(i + 1, len(this_quarter)):
nxt_key = this_quarter[j]
if nxt_key in master:
insert_pos = master.index(nxt_key)
break
if insert_pos is None:
insert_pos = len(master)
master.insert(insert_pos, k)
return master
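To see the insertion heuristic in action, here is a small worked example with merge_key_lists reproduced so the snippet runs standalone. The line items are hypothetical: a Restructuring Charges item appears for the first time mid-reconciliation and is slotted in after its nearest already-known neighbor.

```python
from typing import List

def merge_key_lists(master: List[str], this_quarter: List[str]) -> List[str]:
    # Same logic as above: place each unseen key next to a neighbor
    # that already exists in the master list.
    for i, k in enumerate(this_quarter):
        if k in master:
            continue
        insert_pos = None
        for j in range(i - 1, -1, -1):              # look backwards for an anchor
            if this_quarter[j] in master:
                insert_pos = master.index(this_quarter[j]) + 1
                break
        if insert_pos is None:                      # otherwise look forwards
            for j in range(i + 1, len(this_quarter)):
                if this_quarter[j] in master:
                    insert_pos = master.index(this_quarter[j])
                    break
        if insert_pos is None:                      # otherwise append at the end
            insert_pos = len(master)
        master.insert(insert_pos, k)
    return master

master = ["Net Income (Loss)", "Stock-Based Compensation", "Adjusted EBITDA"]
q3 = ["Net Income (Loss)", "Stock-Based Compensation",
      "Restructuring Charges", "Adjusted EBITDA"]
print(merge_key_lists(master, q3))
# → ['Net Income (Loss)', 'Stock-Based Compensation',
#    'Restructuring Charges', 'Adjusted EBITDA']
```

The backward scan anchors the new key after "Stock-Based Compensation", preserving the reconciliation's reading order without a fixed template.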
With the logic in place for extraction and normalization, we can now scale our workflow across multiple tickers—automating the collection of structured, schema-aligned reconciliation metrics at scale.
The run_one_ticker function orchestrates the full process for a single company. It fetches all qualifying 8-K filings, sorts them chronologically using fiscal_sort_key, and processes each in turn. As each quarter is parsed, the line items are normalized and merged into a cumulative schema using merge_key_lists. This ensures that earlier schema decisions inform how future filings are interpreted and structured.
The output is a time-indexed dictionary of structured reconciliation data per ticker, along with a final, consolidated ordering of all metric keys encountered.
To parallelize processing across the entire ticker list, we use a ThreadPoolExecutor. Each ticker is processed in its own thread, maximizing throughput and minimizing latency, which is especially valuable for network-bound operations like API calls.
The final per_ticker_output dictionary contains fully normalized, JSON-formatted reconciliation data for each company, ready for direct use in analytics, dashboards, or financial models.
from concurrent.futures import ThreadPoolExecutor, as_completed
def fiscal_sort_key(fp: str) -> tuple[int, int]:
m = re.match(r"Q([1-4]) (\d{4})", fp)
if not m:
return (9999, 9)
q, yr = int(m.group(1)), int(m.group(2))
return (yr, q)
def run_one_ticker(ticker: str) -> Dict[str, Dict[str, float]]:
docs = fetch_documents(ticker)
docs.sort(key=lambda d: fiscal_sort_key(d["fiscalPeriod"]))
key_order: List[str] = []
results: Dict[str, Dict[str, float]] = {}
for doc in docs:
prompt = build_prompt(key_order)
data = fetch_metrics_with_prompt([doc["sourceLink"]], prompt)
if not data:
continue
results[doc["fiscalPeriod"]] = data
key_order = merge_key_lists(key_order, list(data.keys()))
return {"keys": key_order, "data": results}
per_ticker_output = {}
with ThreadPoolExecutor(max_workers=len(TICKERS)) as pool:
futures = {pool.submit(run_one_ticker, t): t for t in TICKERS}
for fut in as_completed(futures):
ticker = futures[fut]
per_ticker_output[ticker] = fut.result()
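Because correct chronological ordering is what lets earlier quarters seed the schema for later ones, it is worth verifying the sort key in isolation. This snippet repeats fiscal_sort_key from above and sorts a small, hypothetical set of period labels:

```python
import re

def fiscal_sort_key(fp: str) -> tuple[int, int]:
    # Same logic as above: sort by (year, quarter);
    # unparseable labels sink to the end.
    m = re.match(r"Q([1-4]) (\d{4})", fp)
    if not m:
        return (9999, 9)
    return (int(m.group(2)), int(m.group(1)))

periods = ["Q3 2023", "Q1 2024", "Q4 2023", "Q1 2023"]
print(sorted(periods, key=fiscal_sort_key))
# → ['Q1 2023', 'Q3 2023', 'Q4 2023', 'Q1 2024']
```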
With normalized metrics collected across companies and time periods, the final step is to convert this data into a format suitable for inspection, visualization, or direct analysis.
Each company’s output, originally a nested dictionary of fiscal periods and corresponding reconciliation metrics, is transformed into a tidy, column-aligned pandas.DataFrame. The key_order list ensures that line items appear in a consistent and meaningful sequence across all periods, regardless of how they were presented in the original filings.
The result is a clean, rectangular table for each ticker, where rows represent standardized reconciliation line items (e.g., Depreciation and Amortization, Stock-Based Compensation) and columns represent fiscal quarters. This structure makes it trivial to compare quarters side by side, chart individual line items over time, or feed the data directly into downstream models.
tables = {}
for ticker, payload in per_ticker_output.items():
key_order = payload["keys"]
series_by_q = payload["data"]
df = pd.DataFrame(series_by_q).reindex(key_order)
df.index.name = "Line item"
tables[ticker] = df
for t, frame in tables.items():
print(f"\n📊 {t}")
print(frame)
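To make the reindex step concrete, here is the same transformation on a tiny synthetic payload shaped like run_one_ticker’s output (the values echo the Palantir sample earlier, but the payload itself is invented). Note how a line item missing from one quarter simply shows up as NaN rather than breaking the table:

```python
import pandas as pd

# Hypothetical two-quarter payload in the shape produced by run_one_ticker
key_order = ["Net Income (Loss)", "Stock-Based Compensation", "Adjusted EBITDA"]
series_by_q = {
    "Q1 2023": {"Net Income (Loss)": 16802, "Adjusted EBITDA": 133434},
    "Q2 2023": {"Net Income (Loss)": 28127,
                "Stock-Based Compensation": 114201,
                "Adjusted EBITDA": 143434},
}

# reindex(key_order) pins the row order and leaves NaN where a
# quarter omitted a line item
df = pd.DataFrame(series_by_q).reindex(key_order)
df.index.name = "Line item"
print(df)
```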
Below are the results we get for Snap Inc. and Palantir Technologies Inc.
EBITDA Reconciliation for SNAP
Line item Q1 2023 Q2 2023 Q3 2023 Q4 2023 Q1 2024 Q2 2024 Q3 2024 Q4 2024
Net Income (Loss) -328674 -377308 -368256 -248247 -305090 -248620 -153247 9101
Interest Income -37948 -43144 -43839 -43463 -39898 -36462 -38533 -38573
Interest Expense 5885 5343 5521 5275 4743 5113 5883 5813
Other (Income) Expense Net -11372 -1323 20662 34447 81 20792 4355 -8382
Income Tax (Benefit) Expense 6845 12093 5849 3275 6932 5202 8332 5164
Depreciation and Amortization 35220 39688 41209 43882 38098 37930 38850 39581
Stock-Based Compensation 314931 317943 353846 333063 254715 258946 260229 257731
Payroll Tax on Stock-Based Compensation 15926 8229 6463 8706 15970 10133 6093 5572
Restructuring Charges 0 0 18639 22211 70108 1943 0 0
Adjusted EBITDA 813 -38479 40094 159149 45659 54977 131962 276007
EBITDA Reconciliation for PLTR
Line item Q1 2023 Q2 2023 Q3 2023 Q4 2023 Q1 2024 Q2 2024 Q3 2024 Q4 2024 Q1 2025
Net Income (Loss) 16802 28127 71505 93391 105530 134126 143525 79009 214031
Net Income (Loss) to Non-Controlling Interests 2349 -255 1934 3522 541 1444 5816 -2073 3686
Interest Income -20853 -30310 -36864 -44545 -43352 -46593 -52120 -54727 -50441
Interest Expense 1275 1317 742 136 0 0 0 0 0
Other (Income) Expense Net 2861 9024 -3864 3956 13507 11173 8110 -14768 3173
Income Tax (Benefit) Expense 1681 2171 6530 9334 4655 5189 7809 3602 5599
Depreciation and Amortization 8320 8399 8663 7972 8438 8056 8087 7006 6622
Stock-Based Compensation 114714 114201 114380 132608 125651 141764 142425 281798 155339
Payroll Tax on Stock-Based Compensation 6285 10760 8909 10953 19926 6464 19950 79681 59323
Adjusted EBITDA 133434 143434 171935 217327 234896 261623 283602 379528 397332
What used to be a tedious, manual task—digging through 8-Ks for elusive non-GAAP metrics—can now be fully automated with precision and scale. By combining Captide’s retrieval-augmented generation capabilities with a structured prompt strategy and dynamic schema normalization, we’ve shown how even the most inconsistent financial disclosures can be transformed into clean, analysis-ready data.
This approach doesn’t just save time—it unlocks new possibilities. With standardized, machine-readable versions of Adjusted EBITDA reconciliations across companies and quarters, analysts can benchmark adjustment practices across peers, track how a company’s non-GAAP definitions shift over time, and feed consistent inputs directly into valuation models.
Crucially, this framework is generalizable. The same method can be applied to extract and normalize other elusive financial metrics: Free Cash Flow definitions, Core Earnings breakdowns, or custom KPIs buried in footnotes and management commentary.
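In practice, the adaptation can be as small as swapping the prompt. As a purely illustrative sketch (not a tested prompt), a Free Cash Flow variant of BASE_PROMPT might read:

```python
# Hypothetical prompt variant for Free Cash Flow reconciliations;
# the wording mirrors BASE_PROMPT above and has not been validated
# against real filings.
FCF_PROMPT = (
    "Return a single valid JSON object with double-quoted keys and numeric "
    "values (in thousands of dollars). The object must represent the "
    "reconciliation from Operating Cash Flow to Free Cash Flow, including "
    "all reported line items. Use positive values for additions and negative "
    "values for subtractions. Output only the JSON object—no commentary or "
    "extra text."
)
print(len(FCF_PROMPT) > 0)
```

Everything downstream (schema merging, ordering, tabulation) would work unchanged, since none of it is specific to Adjusted EBITDA.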
In a world increasingly driven by unstructured data, tools like Captide don’t just streamline workflows—they change the very nature of what's feasible in financial analysis. What was once hidden is now structured. What was once manual is now instant. And what was once brittle is now intelligent.