Visualizing Financial Statements | LightningChart Python

Tutorial

Assisted by AI

Step-by-step guide to start visualizing financial statements using LightningChart Python data visualization library.
Vindya-Nukulasooriya

Vindya Nukulasooriya

Data Science Developer

LinkedIn icon
Visualizing-Financial-Statements-Cover

Introduction

This project presents a comprehensive financial data visualization analysis using the SEC Financial Statement Data Sets, powered by the LightningChart Python library. The dataset, published by the U.S. Securities and Exchange Commission (SEC), contains key accounting indicators such as assets, liabilities, stockholders’ equity, revenues, cost of revenue, gross profit, operating income (loss), net income (loss), cash and cash equivalents, and operating cash flow.

The primary objectives of this project are to:

  • Explore the relationships between individual financial statement parameters and overall company performance.
  • Identify which factors show the strongest association with profitability, liquidity, and solvency.
  • Visualize multivariate interactions to understand the combined effects of balance sheet and income statement variables.
  • Transform raw SEC filings into clear, interactive visualizations that can aid investors, analysts, regulators, and corporate decision-makers.

To achieve these objectives, LightningChart Python was selected for its:

  • High-performance rendering, capable of efficiently managing millions of financial records across thousands of company filings.
  • Extensive 2D and 3D visualization capabilities, well-suited for comparative analysis, correlation studies, and time-series financial profiling.
  • Publication-quality interactive charts, enabling both scientific presentation and business intelligence reporting.

By converting raw accounting disclosures into intuitive visual insights, this project reveals critical patterns in corporate financial health, providing evidence-based guidance for evaluating profitability, liquidity, and solvency across U.S. companies.

Project Overview

To develop up to 10 interactive chart examples using LightningChart Python, focusing on uncovering patterns in financial statement parameters, their interrelationships, and their influence on the profitability, liquidity, and solvency of U.S. companies.

 Objectives

  • Assess how individual accounting indicators (eg, Net Income, Revenues, Assets, Liabilities) vary across companies and time.
  • Examine correlations between balance sheet and income statement components, identifying parameters with the strongest relationships.
  • Explore multi-parameter financial profiles to determine whether combined variables provide deeper insights into company performance.
  • Showcase LightningChart Python’s capability to deliver high-performance, interactive visualizations for large-scale financial datasets.

Deliverables

  • Nine high-performance visualizations created exclusively with LightningChart Python.
  • Well-documented Python code for each chart, including preprocessing, parameter selection, and reasoning.
  • Interpretive summaries highlighting trends, correlations, and patterns in profitability, liquidity, and solvency.
  • A conclusion discussing how LightningChart Python enhances financial data analysis and supports corporate performance evaluation.

Tools Used

Python 3.13.5, LightningChart Python, Jupyter Notebook, AI Assistance

About the Dataset

The SEC Financial Statement Data Sets contain structured disclosures from U.S. public companies, including quarterly and annual filings. They provide rich numerical data suitable for analysing accounting metrics, industry patterns, and long-term financial trends. Each record includes

  •  Balance Sheet Indicators: Assets, Liabilities, Stockholders’ Equity
  • Income Statement Indicators: Revenues, Cost of Revenue, Gross Profit, Operating Income, Net Income
  • Cash Flow Indicators: Cash and Cash Equivalents, Operating Cash Flow

    LightningChart Python

    LightningChart Python is a professional-grade data visualization library renowned for its ultra-fast rendering and scientific precision. Its ability to handle large-scale datasets and produce multidimensional, interactive visualizations makes it highly effective for analysing financial statement data.

    LightningChart-Python-About

    Setting Up Python Environment

    Before running the project, install Python and the other required libraries using:

    %pip install numpy pandas lightningchart

    Setting Up Your Development Environment:

    1. Set up a virtual environment:
    2. Use Visual Studio Code (VSCode) for a streamlined development experience.

    Loading and Preprocessing Data

    To create this China Water Pollution Monitoring Application, we will fetch the China water pollution data using the following function:

    Downloaded the dataset from https://www.sec.gov/data-research/sec-markets-data/financial-statement-data-sets

    To preprocess the dataset, we will import the pandas library:

    # Import necessary libraries (load pandas library to preprocess dataset)
    import pandas as pd

    Visualizing Data with LightningChart Python

    Histogram of NetIncomeLoss Distribution

    The decile histogram shows that corporate net incomes are highly unevenly distributed: the majority of firms report modest results, while a small fraction accounts for very large profits.

    Visualizing-Financial-Statements-Histogram
    # Chart 1 — Histogram of NetIncomeLoss Distribution (from num_clean.csv)
    # Developed with AI Assistance to demonstrate LightningChart Python
    
    import lightningchart as lc
    import pandas as pd
    import numpy as np
    
    # Load license
    with open("D:/HAMK/Internship/MyProjects/lc_license.txt", "r") as f:
        lc.set_license(f.read().strip())
    
    # Load cleaned dataset
    num = pd.read_csv("num_clean.csv", low_memory=False)
    
    # Filter to NetIncomeLoss and prep values 
    net = num[num["tag"].astype(str) == "NetIncomeLoss"].copy()
    net["value"] = pd.to_numeric(net["value"], errors="coerce")
    values = net["value"].dropna().to_numpy(dtype=float)
    
    if values.size == 0:
        raise ValueError("No NetIncomeLoss values found in num_clean.csv. Check preprocessing or tag filtering.")
    
    # Optional clipping to reduce extreme tails for readability
    CLIP = True
    if CLIP:
        p_low, p_high = np.nanpercentile(values, [1, 99])
        values = np.clip(values, p_low, p_high)
        clip_note = " (clipped 1–99%)"
    else:
        clip_note = ""
    
    # Build 10 quantile (decile) bins
    cats = pd.qcut(values, q=10, duplicates="drop")
    
    # Convert to Series before counting
    counts = pd.Series(cats).value_counts(sort=False)
    
    # Extract decile edges
    intervals = counts.index.to_list()  # pandas.Interval objects
    
    def fmtB(v):
        b = v / 1e9
        if abs(b) >= 100:
            return f"{b:.0f}B"
        elif abs(b) >= 10:
            return f"{b:.1f}B"
        else:
            return f"{b:.2f}B"
    
    bar_data = []
    for i, (iv, cnt) in enumerate(zip(intervals, counts.values), start=1):
        left, right = float(iv.left), float(iv.right)
        label = f"Q{i}: {fmtB(left)}–{fmtB(right)}"
        bar_data.append({"category": label, "value": int(cnt)})
    
    # Create BarChart (quantile histogram)
    chart = lc.BarChart(
        vertical=True,
        theme=lc.Themes.Light,
        title=f"NetIncomeLoss Distribution — Decile bins (X: Net income in USD, Y: Filing count){clip_note}"
    )
    
    chart.set_data(bar_data)
    chart.set_sorting('disabled')   # keep Q1..Q10 order
    # chart.set_bars_color('cyan')  # optional
    
    chart.open()

    Scatter Plot of Revenues vs NetIncomeLoss

    Revenues correlate positively with net income, but profitability varies widely for firms with comparable sales, indicating different cost structures and margins.

    Visualizing-Financial-Statements-Scatterplot
    # Chart 2 – Scatter Plot of Revenues vs NetIncomeLoss
    # Developed with AI Assistance to demonstrate LightningChart Python
    
    import lightningchart as lc
    import pandas as pd
    import numpy as np
    
    # Load license (adjust path if needed)
    with open("D:/HAMK/Internship/MyProjects/lc_license.txt", "r") as f:
        lc.set_license(f.read().strip())
    
    # Load + pivot so each row (adsh) has both metrics
    num = pd.read_csv("num_clean.csv", low_memory=False)
    keep = num[num["tag"].isin(["Revenues", "NetIncomeLoss"])].copy()
    keep["value"] = pd.to_numeric(keep["value"], errors="coerce")
    keep = keep.dropna(subset=["value"])
    
    pivot = (
        keep.pivot_table(index="adsh", columns="tag", values="value", aggfunc="max")
             .dropna(subset=["Revenues", "NetIncomeLoss"])
             .reset_index()
    )
    
    if pivot.empty:
        raise ValueError("No filings contain both Revenues and NetIncomeLoss in num_clean.csv.")
    
    # Scale to USD billions for readability
    pivot["Revenues_B"] = pivot["Revenues"] / 1e9
    pivot["NetIncomeLoss_B"] = pivot["NetIncomeLoss"] / 1e9
    
    # Optional clipping to reduce extreme tails (toggle False to see full range)
    CLIP = True
    x = pivot["Revenues_B"].to_numpy(float)
    y = pivot["NetIncomeLoss_B"].to_numpy(float)
    
    if CLIP:
        x1, x99 = np.nanpercentile(x, [1, 99])
        y1, y99 = np.nanpercentile(y, [1, 99])
        x = np.clip(x, x1, x99)
        y = np.clip(y, y1, y99)
        clip_note = " (clipped 1–99%)"
    else:
        clip_note = ""
    
    # Build chart
    chart = lc.ChartXY(title=f"Revenues vs Net Income — Scatter{clip_note}", theme=lc.Themes.Dark)
    
    # Scatter series
    pts = chart.add_point_series()
    
    # Color points by NetIncomeLoss (y)
    # Build a dynamic palette spanning min → 0 → upper quantile → max
    y_min, y_q75, y_max = float(np.nanmin(y)), float(np.nanpercentile(y, 75)), float(np.nanmax(y))
    pts.set_palette_point_coloring(
        steps=[
            {"value": y_min, "color": ("darkblue")}, # deep losses
            {"value": 0.0, "color": ("lightblue")}, # break-even
            {"value": y_q75, "color": ("orange")}, # strong profits
            {"value": y_max, "color": ("red")},
        ],
        look_up_property="y",
        percentage_values=False
    )
    
    # Add data
    pts.add(x=x, y=y)
    
    # Axis titles
    chart.get_default_x_axis().set_title("Revenues (USD billions)")
    chart.get_default_y_axis().set_title("Net income (USD billions)")
    
    chart.open()

    Bar Chart of Quarterly Operating Cash Flow

    For most companies, operating cash flow is near break-even each quarter, with a small bump in Q3.

    Visualizing-Financial-Statements-Barchart
    # Chart 3 – Bar Chart of Quarterly Operating Cash Flow
    # Developed with AI Assistance to demonstrate LightningChart Python
    
    import lightningchart as lc
    import pandas as pd
    import numpy as np
    
    # Load license
    with open("D:/HAMK/Internship/MyProjects/lc_license.txt", "r") as f:
        lc.set_license(f.read().strip())
    
    # Load raw files (need fp from sub)
    num = pd.read_csv("num.csv", encoding="ISO-8859-1", low_memory=False)
    sub = pd.read_csv("sub.csv", encoding="ISO-8859-1", low_memory=False)
    
    # Filter Operating Cash Flow tag
    num["tag"] = num["tag"].astype(str)
    ocf = num[num["tag"].str.fullmatch("NetCashProvidedByUsedInOperatingActivities", case=False, na=False)].copy()
    
    ocf["value"] = pd.to_numeric(ocf["value"], errors="coerce")
    ocf = ocf.dropna(subset=["value"])
    
    # Keep USD where available
    if "uom" in ocf.columns:
        ocf["uom"] = ocf["uom"].astype(str).str.upper()
        ocf = ocf[ocf["uom"].isin({"USD", "USD$", "US DOLLARS", "US$"})]
    
    # Join quarter label from sub (`fp`), or infer from `ddate`
    ocf["adsh"] = ocf["adsh"].astype(str).str.strip()
    sub["adsh"] = sub["adsh"].astype(str).str.strip()
    ocf = ocf.merge(sub[["adsh", "fp"]], on="adsh", how="left")
    
    if "ddate" in ocf.columns:
        ocf["ddate"] = pd.to_datetime(ocf["ddate"].astype(str), errors="coerce", format="%Y%m%d")
    
    def to_quarter(row):
        fp = str(row.get("fp", "")).upper()
        if fp in {"Q1", "Q2", "Q3", "Q4"}:
            return fp
        dt = row.get("ddate")
        if pd.notna(dt):
            return f"Q{int(((dt.month - 1)//3) + 1)}"
        return np.nan
    
    ocf["quarter"] = ocf.apply(to_quarter, axis=1)
    ocf = ocf[ocf["quarter"].isin(["Q1", "Q2", "Q3", "Q4"])]
    
    # OPTIONAL: if multiple OCF facts per filing, keep the latest
    if "ddate" in ocf.columns:
        ocf = ocf.sort_values(["adsh", "ddate"]).groupby("adsh", as_index=False).tail(1)
    
    # Aggregate: median per quarter (robust), then scale
    agg = ocf.groupby("quarter")["value"].median().reindex(["Q1", "Q2", "Q3", "Q4"])
    agg_B = (agg / 1e9).fillna(0.0)
    
    # Prepare BarChart data
    bar_data = [{"category": q, "value": float(v)} for q, v in agg_B.items()]
    
    # Plot
    chart = lc.BarChart(
        vertical=True,
        theme=lc.Themes.Dark,
        title="Quarterly Operating Cash Flow - Median per quarter (X: Quarter, Y: USD Billions)"
    )
    chart.set_sorting("disabled")   # Q1..Q4 order
    chart.set_data(bar_data)
    # chart.set_bars_color('#22d3ee')  # optional
    
    chart.open()

    Correlation Heatmap of Core Accounting Metrics

    Profit metrics move together (operating -> net), and sales scale costs and gross profit, consistent with accounting logic.

    Visualizing-Financial-Statements-Correlation-Heatmap
    # Chart 4 – Correlation Heatmap of Core Accounting Metrics
    # Developed with AI Assistance to demonstrate LightningChart Python
    
    import lightningchart as lc
    import pandas as pd
    import numpy as np
    
    # License
    with open("D:/HAMK/Internship/MyProjects/lc_license.txt", "r") as f:
        lc.set_license(f.read().strip())
    
    # Load + pivot
    num = pd.read_csv("num_clean.csv", low_memory=False)
    
    core_tags = [
        "Revenues",
        "CostOfRevenue",
        "GrossProfit",
        "OperatingIncomeLoss",
        "NetIncomeLoss",
        "NetCashProvidedByUsedInOperatingActivities",
    ]
    df = num[num["tag"].isin(core_tags)].copy()
    df["value"] = pd.to_numeric(df["value"], errors="coerce")
    df = df.dropna(subset=["value"])
    
    # one row per filing (adsh), columns = tags
    wide = (
        df.pivot_table(index="adsh", columns="tag", values="value", aggfunc="max")
          .dropna(how="all")
    )
    
    # Keep only metrics that actually exist in this quarter
    present = [c for c in core_tags if c in wide.columns]
    if len(present) < 2:
        raise ValueError("Not enough core metrics available in num_clean.csv to compute correlations.")
    
    # Pearson correlation across filings
    corr = wide[present].corr(method="pearson")
    
    # For readability, map long tag to shorter label
    display_map = {
        "Revenues": "Revenues",
        "CostOfRevenue": "Cost of Rev.",
        "GrossProfit": "Gross Profit",
        "OperatingIncomeLoss": "Operating Inc./Loss",
        "NetIncomeLoss": "Net Income/Loss",
        "NetCashProvidedByUsedInOperatingActivities": "Operating Cash Flow",
    }
    labels = [display_map.get(t, t) for t in present]
    
    # Build numeric grid (rows = Y, cols = X)
    M = corr.reindex(index=present, columns=present).to_numpy(dtype=float)
    
    # LightningChart heatmap
    chart = lc.ChartXY(
        title="Correlation Heatmap of Core Accounting Metrics (Pearson, -1 to +1)",
        theme=lc.Themes.Dark
    )
    
    rows, cols = M.shape
    heat = chart.add_heatmap_grid_series(columns=cols, rows=rows)
    heat.set_start(x=0, y=0)
    heat.set_end(x=cols, y=rows)
    heat.set_step(x=1, y=1)
    heat.set_intensity_interpolation(True)
    heat.invalidate_intensity_values(M.tolist())
    heat.hide_wireframe()
    
    # Diverging palette: blue (-1) → light (0) → red (+1)
    heat.set_palette_coloring(
        steps=[
            {"value": -1.0, "color": ("#2563eb")},   # blue
            {"value": -0.5, "color": ("#93c5fd")},
            {"value":  0.0, "color": ("#e5e7eb")},   # light gray
            {"value":  0.5, "color": ("#fdba74")},
            {"value":  1.0, "color": ("#ef4444")},   # red
        ],
        look_up_property="value",
        interpolate=True
    )
    
    # Axis titles (category labels vary by LC build; showing order below)
    chart.get_default_x_axis().set_title("Metrics (X)")
    chart.get_default_y_axis().set_title("Metrics (Y)")
    
    # Legend
    chart.add_legend(data=heat).set_title("Correlation")
    
    # Print label order to console so you can include it in your report
    print("Heatmap order (left→right / bottom→top):")
    for i, name in enumerate(labels, 1):
        print(f"{i}. {name}")
    
    chart.open()

    Line Chart of Revenues Over Time

    Weekly median revenue barely moves for the typical filer.

    Visualizing-Financial-Statements-Line-Chart
    # Chart 5 – Line Chart of Revenues Over Time
    # Developed with AI Assistance to demonstrate LightningChart Python
    
    import lightningchart as lc
    import pandas as pd
    import numpy as np
    
    # License (adjust if needed)
    with open("D:/HAMK/Internship/MyProjects/lc_license.txt", "r") as f:
        lc.set_license(f.read().strip())
    
    # Load data
    num = pd.read_csv("num_clean.csv", low_memory=False)
    # Use sub.csv because it reliably includes 'filed'/'accepted' in the raw dump
    sub = pd.read_csv("sub.csv", encoding="ISO-8859-1", low_memory=False)
    
    # Revenues per filing
    rev_rows = num[num["tag"].astype(str) == "Revenues"].copy()
    rev_rows["value"] = pd.to_numeric(rev_rows["value"], errors="coerce")
    rev_rows = rev_rows.dropna(subset=["value"])
    
    # Keep one revenue per filing (max if multiple facts)
    rev = rev_rows.groupby("adsh", as_index=False)["value"].max()
    
    # Merge possible date columns from sub
    sub_dates = sub[["adsh", "filed", "accepted", "period"]].copy()
    rev = rev.merge(sub_dates, on="adsh", how="left")
    
    # Also bring in ddate from num as last fallback
    rev_ddate = (
        rev_rows[["adsh", "ddate"]]
        .dropna()
        .drop_duplicates(subset=["adsh"], keep="last")
    )
    rev = rev.merge(rev_ddate, on="adsh", how="left")
    
    # Build a single datetime column with fallbacks
    def parse_ymd(s):
        return pd.to_datetime(str(s), errors="coerce", format="%Y%m%d")
    
    rev["filed_dt"]    = parse_ymd(rev["filed"])
    rev["accepted_dt"] = pd.to_datetime(rev["accepted"], errors="coerce")  # already timestamp-like
    rev["period_dt"]   = parse_ymd(rev["period"])
    rev["ddate_dt"]    = pd.to_datetime(rev["ddate"], errors="coerce", format="%Y%m%d")
    
    rev["date"] = rev["filed_dt"]
    rev.loc[rev["date"].isna(), "date"] = rev.loc[rev["date"].isna(), "accepted_dt"]
    rev.loc[rev["date"].isna(), "date"] = rev.loc[rev["date"].isna(), "period_dt"]
    rev.loc[rev["date"].isna(), "date"] = rev.loc[rev["date"].isna(), "ddate_dt"]
    
    # Drop rows with no usable date
    rev = rev.dropna(subset=["date"])
    if rev.empty:
        raise ValueError("No valid dates found (filed/accepted/period/ddate). Check your sub/num files.")
    
    # Aggregate to weekly median (robust)
    weekly = (
        rev.set_index("date")
           .groupby(pd.Grouper(freq="W-MON"))["value"]
           .median()
           .dropna()
    )
    # Fallback to daily if very sparse
    if weekly.size < 3:
        weekly = (
            rev.set_index("date")
               .groupby(pd.Grouper(freq="D"))["value"]
               .median()
               .dropna()
        )
    
    # Build series data (X as POSIX seconds, Y in USD billions)
    y_vals = (weekly / 1e9).to_numpy(dtype=float)
    x_vals = (weekly.index.astype("int64") // 10**9).to_numpy(dtype=float)
    
    # Plot
    chart = lc.ChartXY(theme=lc.Themes.Light, title="Revenues Over Time - Weekly median (USD billions)")
    line = chart.add_line_series()
    line.append_samples(x_values=x_vals, y_values=y_vals)
    
    # Axes
    x_axis = chart.get_default_x_axis()
    y_axis = chart.get_default_y_axis()
    y_axis.set_title("Revenues (USD billions)")
    
    # Use date ticks if available
    try:
        x_axis.set_tick_strategy(lc.AxisTickStrategies.DateTime)
        x_axis.set_title("Filing date")
    except Exception:
        x_axis.set_title("Time (POSIX seconds)")
    
    chart.open()

    Stacked Bar Chart of Liabilities (Current vs Noncurrent)

    The typical filer carries more short-term (current) debt than long-term.

    Visualizing-Financial-Statements-Stacked-Bar-Chart
    # Chart 6 – Stacked Bar Chart of Liabilities (Current vs Noncurrent)
    # Developed with AI Assistance to demonstrate LightningChart Python
    
    import lightningchart as lc
    import pandas as pd
    import numpy as np
    
    # Load license
    with open("D:/HAMK/Internship/MyProjects/lc_license.txt", "r") as f:
        lc.set_license(f.read().strip())
    
    # Load raw SEC files (need balance-sheet tags + quarter labels)
    num = pd.read_csv("num.csv", encoding="ISO-8859-1", low_memory=False)
    sub = pd.read_csv("sub.csv", encoding="ISO-8859-1", low_memory=False)
    
    num["tag"] = num["tag"].astype(str)
    num["adsh"] = num["adsh"].astype(str).str.strip()
    sub["adsh"] = sub["adsh"].astype(str).str.strip()
    
    # Keep only the tags we need
    tags = ["Liabilities", "LiabilitiesCurrent", "LiabilitiesNoncurrent"]
    df = num[num["tag"].isin(tags)].copy()
    
    # Numeric + USD filter
    df["value"] = pd.to_numeric(df["value"], errors="coerce")
    df = df.dropna(subset=["value"])
    if "uom" in df.columns:
        df["uom"] = df["uom"].astype(str).str.upper()
        df = df[df["uom"].isin({"USD", "USD$", "US DOLLARS", "US$"})]
    
    # If multiple ddates per (adsh, tag), keep the latest
    if "ddate" in df.columns:
        df["ddate"] = pd.to_datetime(df["ddate"].astype(str), errors="coerce", format="%Y%m%d")
        df = df.sort_values(["adsh", "tag", "ddate"]).groupby(["adsh", "tag"], as_index=False).tail(1)
    
    # Pivot to one row per filing
    wide = df.pivot_table(index="adsh", columns="tag", values="value", aggfunc="max")
    
    # Derive current & noncurrent liabilities per filing
    def split_liabilities(row):
        cur  = row.get("LiabilitiesCurrent", np.nan)
        nonc = row.get("LiabilitiesNoncurrent", np.nan)
        total = row.get("Liabilities", np.nan)
    
        # If both provided, use them.
        if pd.notna(cur) and pd.notna(nonc):
            return cur, nonc
        # If total + one component, infer the other.
        if pd.notna(total) and pd.notna(cur):
            nonc_inf = total - cur
            return cur, (nonc_inf if np.isfinite(nonc_inf) else np.nan)
        if pd.notna(total) and pd.notna(nonc):
            cur_inf = total - nonc
            return (cur_inf if np.isfinite(cur_inf) else np.nan), nonc
        # Otherwise, insufficient info.
        return np.nan, np.nan
    
    cur_vals, nonc_vals = [], []
    for _, row in wide.iterrows():
        c, n = split_liabilities(row)
        cur_vals.append(c); nonc_vals.append(n)
    
    wide = wide.assign(Liab_Current=cur_vals, Liab_Noncurrent=nonc_vals)
    wide = wide[(wide["Liab_Current"].notna()) | (wide["Liab_Noncurrent"].notna())].reset_index()
    
    # Attach quarter labels (prefer sub.fp; else infer from sub.filed)
    sub_small = sub[["adsh", "fp", "filed"]].copy()
    wide = pd.merge(wide, sub_small, on="adsh", how="left")
    
    # Parse fallback date to infer quarter if needed
    def to_quarter(fp, filed):
        fp = str(fp).upper()
        if fp in {"Q1", "Q2", "Q3", "Q4"}:
            return fp
        dt = pd.to_datetime(str(filed), errors="coerce", format="%Y%m%d")
        if pd.notna(dt):
            return f"Q{int(((dt.month - 1)//3) + 1)}"
        return np.nan
    
    wide["quarter"] = [to_quarter(fp, filed) for fp, filed in zip(wide["fp"], wide["filed"])]
    wide = wide[wide["quarter"].isin(["Q1", "Q2", "Q3", "Q4"])]
    
    if wide.empty:
        raise ValueError("No filings with usable Liabilities split and quarter labels.")
    
    # Aggregate to median per quarter (USD billions)
    agg = (
        wide.groupby("quarter")[["Liab_Current", "Liab_Noncurrent"]]
            .median()
            .reindex(["Q1","Q2","Q3","Q4"])
            / 1e9
    )
    agg = agg.fillna(0.0)
    
    categories = agg.index.tolist()  # ["Q1","Q2","Q3","Q4"]
    current_vals = agg["Liab_Current"].astype(float).tolist()
    noncurrent_vals = agg["Liab_Noncurrent"].astype(float).tolist()
    
    # Plot stacked bars
    chart = lc.BarChart(
        vertical=True,
        theme=lc.Themes.Light,
        title="Liabilities Structure by Quarter (X: Quarter, Y: USD Billions, median per filing)"
    )
    
    chart.set_sorting("disabled")  # keep Q1..Q4 order
    chart.set_data_stacked(
        categories,
        [
            {"subCategory": "Current",    "values": current_vals},
            {"subCategory": "Noncurrent", "values": noncurrent_vals},
        ],
    )
    chart.add_legend().add(chart)
    
    chart.open()

    Box Plot of Gross Profit Margin by Industry

    Profitability is industry-driven: asset-light sectors (Services/Finance) tend to earn higher margins, while cost-heavy sectors (Retail/Wholesale) run thin margins.

    Visualizing-Financial-Statements-Boxplot
    # Chart 7 – Box Plot of Gross Profit Margin by Industry
    # Developed with AI Assistance to demonstrate LightningChart Python
    
    import lightningchart as lc
    import pandas as pd
    import numpy as np
    
    # Load your license
    with open("D:/HAMK/Internship/MyProjects/lc_license.txt", "r") as f:
        lc.set_license(f.read().strip())
    
    # Load data
    num = pd.read_csv("num_clean.csv", low_memory=False)
    sub = pd.read_csv("sub_clean.csv", low_memory=False)   # needs 'sic'
    
    # Keep Revenues & GrossProfit, one value per filing
    df = num[num["tag"].isin(["Revenues", "GrossProfit"])].copy()
    df["value"] = pd.to_numeric(df["value"], errors="coerce")
    df = df.dropna(subset=["value"])
    
    wide = (
        df.pivot_table(index="adsh", columns="tag", values="value", aggfunc="max")
          .dropna(subset=["Revenues", "GrossProfit"])
          .reset_index()
    )
    
    # Merge SIC
    sub_small = sub[["adsh", "sic"]].copy()
    sub_small["adsh"] = sub_small["adsh"].astype(str).str.strip()
    wide["adsh"] = wide["adsh"].astype(str).str.strip()
    wide = wide.merge(sub_small, on="adsh", how="left")
    
    # Map SIC → division (compact)
    def sic_to_division(s):
        try:
            s = int(s)
        except: return "Other"
        if   100 <= s <=  999:  return "Ag"
        if  1000 <= s <= 1499:  return "Mining"
        if  1500 <= s <= 1799:  return "Const"
        if  2000 <= s <= 3999:  return "Mfg"
        if  4000 <= s <= 4999:  return "Trans"
        if  5000 <= s <= 5199:  return "Wh"      
        if  5200 <= s <= 5999:  return "Retail"
        if  6000 <= s <= 6799:  return "Fin"     
        if  7000 <= s <= 8999:  return "Svc"
        if  9000 <= s <= 9729:  return "Public"
        return "Other"
    
    wide["industry"] = wide["sic"].apply(sic_to_division)
    
    # Gross Profit Margin (%), clip for readability
    rev = wide["Revenues"].replace(0, np.nan)
    gpm = (wide["GrossProfit"] / rev) * 100.0
    wide["gpm_pct"] = gpm.replace([np.inf, -np.inf], np.nan)
    wide = wide.dropna(subset=["gpm_pct"])
    wide["gpm_pct"] = wide["gpm_pct"].clip(-100, 100)
    
    # Top 6 industries by count
    top_inds = (
        wide.groupby("industry")["gpm_pct"].size()
            .sort_values(ascending=False).head(6).index.tolist()
    )
    plot_df = wide[wide["industry"].isin(top_inds)].copy()
    if plot_df.empty:
        raise ValueError("No gross profit margin data after filtering. Check num_clean.csv contents.")
    
    # Create chart
    chart = lc.ChartXY(theme=lc.Themes.Light, title="Gross Profit Margin by Industry — Box Plot (%)")
    
    # Prepare box data with extra spacing + short labels
    SPACING = 3  # widen distance between categories to avoid overlap
    dataset, x_out, y_out = [], [], []
    
    for i, cat in enumerate(top_inds):
        vals = plot_df.loc[plot_df["industry"] == cat, "gpm_pct"].dropna().to_numpy()
        if vals.size < 2:
            continue
    
        start = (i * SPACING) + 1
        end = start + 1
    
        q1 = float(np.percentile(vals, 25))
        q3 = float(np.percentile(vals, 75))
        med = float(np.median(vals))
        iqr = q3 - q1
        lo_b, hi_b = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    
        inliers = vals[(vals >= lo_b) & (vals <= hi_b)]
        if inliers.size == 0:
            inliers = np.array([q1, q3], dtype=float)
    
        dataset.append({
            "start": start, "end": end,
            "lowerQuartile": q1, "upperQuartile": q3,
            "median": med,
            "lowerExtreme": float(inliers.min()),
            "upperExtreme": float(inliers.max()),
        })
    
        outs = vals[(vals < lo_b) | (vals > hi_b)]
        if outs.size:
            x_out += [start + 0.5] * outs.size
            y_out += outs.tolist()
    
    # Add box series
    series = chart.add_box_series()
    series.add_multiple(dataset)
    
    # Add outliers
    out = chart.add_point_series(sizes=True)
    out.set_point_color("red")
    if y_out:
        out.append_samples(x_values=x_out, y_values=y_out, sizes=[9] * len(y_out))
    
    # Axis titles
    x_axis = chart.get_default_x_axis()
    y_axis = chart.get_default_y_axis()
    x_axis.set_title("Industry (SIC division, abbreviated)")
    y_axis.set_title("Gross profit margin (%)")
    
    # Custom ticks with short labels (no overlap)
    try:
        # Set the visible range to fit all categories
        x_axis.set_interval(0, (len(top_inds) - 1) * SPACING + 3)
        for i, cat in enumerate(top_inds):
            tick = x_axis.add_custom_tick()
            tick.set_value((i * SPACING) + 1.5)  # centered between start & end
            tick.set_text(cat)                   # already short (e.g., Mfg, Retail)
    except Exception:
        # If custom ticks aren't supported, print order to console
        print("Custom ticks not supported; category order:", top_inds)
    
    chart.open()

    Scatter Plot of Assets vs StockholdersEquity

    Across filings, assets and equity move together, as expected from the balance sheet.

    Visualizing-Financial-Statements-Scatterplot-Assets
    # Chart 8 – Scatter Plot of Assets vs StockholdersEquity
    # Developed with AI Assistance to demonstrate LightningChart Python
    
    import lightningchart as lc
    import numpy as np
    import pandas as pd
    
    # Load license
    with open("D:/HAMK/Internship/MyProjects/lc_license.txt", "r") as f:
        lc.set_license(f.read().strip())
    
    # Load numeric (need balance-sheet tags)
    num = pd.read_csv("num.csv", encoding="ISO-8859-1", low_memory=False)
    
    num["tag"] = num["tag"].astype(str)
    num["adsh"] = num["adsh"].astype(str).str.strip()
    num["value"] = pd.to_numeric(num["value"], errors="coerce")
    num = num.dropna(subset=["value"])
    
    # Keep only the tags we need
    tags = [
        "Assets", "AssetsCurrent", "AssetsNoncurrent",
        "Liabilities",
        "StockholdersEquity",
        "StockholdersEquityIncludingPortionAttributableToNoncontrollingInterest",
    ]
    df = num[num["tag"].isin(tags)].copy()
    
    # If multiple dates per (adsh, tag), keep latest
    if "ddate" in df.columns:
        df["ddate"] = pd.to_datetime(df["ddate"].astype(str), errors="coerce", format="%Y%m%d")
        df = df.sort_values(["adsh", "tag", "ddate"]).groupby(["adsh", "tag"], as_index=False).tail(1)
    
    # Pivot to wide per filing
    wide = df.pivot_table(index="adsh", columns="tag", values="value", aggfunc="max")
    
    # Compute Assets (proxy) and Equity (choose best available)
    def assets_proxy(row):
        if pd.notna(row.get("Assets")):
            return row["Assets"]
        ac, anc = row.get("AssetsCurrent"), row.get("AssetsNoncurrent")
        if pd.notna(ac) and pd.notna(anc):
            return ac + anc
        liab = row.get("Liabilities")
        eq = row.get("StockholdersEquity")
        if pd.isna(eq):
            eq = row.get("StockholdersEquityIncludingPortionAttributableToNoncontrollingInterest")
        if pd.notna(liab) and pd.notna(eq):
            return liab + eq
        return np.nan
    
    def equity_value(row):
        if pd.notna(row.get("StockholdersEquity")):
            return row.get("StockholdersEquity")
        return row.get("StockholdersEquityIncludingPortionAttributableToNoncontrollingInterest")
    
    wide["Assets_val"] = wide.apply(assets_proxy, axis=1)
    wide["Equity_val"] = wide.apply(equity_value, axis=1)
    
    data = wide[["Assets_val", "Equity_val"]].dropna()
    if data.empty:
        raise ValueError("No filings with both Assets (direct/proxy) and Stockholders' Equity found in num.csv.")
    
    # Scale to USD billions
    x = (data["Assets_val"] / 1e9).to_numpy(float)
    y = (data["Equity_val"] / 1e9).to_numpy(float)
    
    # Optional clipping to reduce extreme tails
    CLIP = True
    if CLIP:
        x1, x99 = np.nanpercentile(x, [1, 99])
        y1, y99 = np.nanpercentile(y, [1, 99])
        x = np.clip(x, x1, x99)
        y = np.clip(y, y1, y99)
        clip_note = " (clipped 1–99%)"
    else:
        clip_note = ""
    
    # Plot
    chart = lc.ChartXY(title=f"Assets vs Stockholders’ Equity - Scatter{clip_note}", theme=lc.Themes.Light)
    pts = chart.add_point_series()
    
    # Color by equity magnitude (y) for a bit of depth; safe for all builds
    y_min, y_q75, y_max = float(np.nanmin(y)), float(np.nanpercentile(y, 75)), float(np.nanmax(y))
    pts.set_palette_point_coloring(
        steps=[
            {"value": y_min, "color": ("#1e3a8a")},  # darker for low equity
            {"value": 0.0,   "color": ("#60a5fa")},  # near zero
            {"value": y_q75, "color": ("#f59e0b")},  # higher equity
            {"value": y_max, "color": ("#ef4444")},  # very high equity
        ],
        look_up_property="y",
        percentage_values=False
    )
    
    pts.add(x=x, y=y)
    
    # Axes
    chart.get_default_x_axis().set_title("Assets (USD billions)")
    chart.get_default_y_axis().set_title("Stockholders’ equity (USD billions)")
    
    chart.open()

    Heatmap of Filing Frequency by Year and Form Type

    Filing activity is seasonal: 10-Q dominates around quarter deadlines; 10-K clusters around fiscal year-end; 8-K trickles through the period.

    Visualizing-Financial-Statements-Heatmap
    # Chart 9 – Heatmap of Filing Frequency by Year and Form Type
    # Developed with AI Assistance to demonstrate LightningChart Python
    
    import lightningchart as lc
    import numpy as np
    import pandas as pd
    
    # Load license
    with open("D:/HAMK/Internship/MyProjects/lc_license.txt", "r") as f:
        lc.set_license(f.read().strip())
    
    # Load submissions (forms + dates)
    sub = pd.read_csv("sub.csv", encoding="ISO-8859-1", low_memory=False)
    sub["adsh"] = sub["adsh"].astype(str).str.strip()
    sub["form"] = sub["form"].astype(str).str.strip()
    
    # Helpers
    def get_col(df, name):
        return df[name] if name in df.columns else pd.Series([pd.NA] * len(df), index=df.index)
    def parse_ymd_series(series):
        return pd.to_datetime(series.astype(str), errors="coerce", format="%Y%m%d")
    
    # Build a single datetime (filed → accepted → period → fy)
    dt = parse_ymd_series(get_col(sub, "filed"))
    dt = dt.fillna(pd.to_datetime(get_col(sub, "accepted"), errors="coerce"))
    dt = dt.fillna(parse_ymd_series(get_col(sub, "period")))
    fy = pd.to_numeric(get_col(sub, "fy"), errors="coerce")
    dt = dt.fillna(pd.to_datetime(fy, errors="coerce", format="%Y"))
    
    sub = sub.assign(date=dt).dropna(subset=["date"])
    
    # Prefer YEAR; if only 1 year in the slice, fallback to MONTH (YYYY-MM)
    years_n = sub["date"].dt.year.nunique()
    if years_n >= 2:
        y_field = "year"
        sub["year"] = sub["date"].dt.year.astype(int)
    else:
        y_field = "month"
        sub["month"] = sub["date"].dt.to_period("M").astype(str)
    
    # Keep top 8 forms; rest = "Other"
    top_forms = sub["form"].value_counts().head(8).index.tolist()
    sub["form_simpl"] = np.where(sub["form"].isin(top_forms), sub["form"], "Other")
    
    # Pivot to Y×Form counts
    counts = (
        sub.groupby([y_field, "form_simpl"]).size()
           .unstack(fill_value=0)
           .sort_index()
    )
    
    # Remove empty columns (no filings)
    counts = counts.loc[:, counts.sum(axis=0) > 0]
    if counts.empty:
        raise ValueError("No filing counts available after parsing sub.csv.")
    
    # Intensity matrix (log + normalize to 0..1)
    M = counts.to_numpy(dtype=float)
    M_log = np.log1p(M)
    m_min, m_max = float(M_log.min()), float(M_log.max())
    if m_max == m_min:
        # Still flat → center fill to show up with legend, avoids “all green”
        M_norm = np.full_like(M_log, 0.5)
    else:
        M_norm = (M_log - m_min) / (m_max - m_min)
    
    # LightningChart heatmap expects a list shaped [columns][rows]
    # We have rows=Y, cols=Forms -> transpose
    intensity = M_norm.T.tolist()
    
    rows = counts.shape[0]              # number of Y bins
    cols = counts.shape[1]              # number of form columns
    Y_labels = counts.index.tolist()    # Y axis labels (years or months)
    X_labels = counts.columns.tolist()  # X axis labels (form types)
    
    # LightningChart heatmap
    chart = lc.ChartXY(
        title=f"Filing Frequency by {y_field.capitalize()} & Form Type - Heatmap",
        theme=lc.Themes.Light
    )
    
    heat = chart.add_heatmap_grid_series(columns=cols, rows=rows)
    heat.set_start(x=0, y=0)
    heat.set_end(x=cols, y=rows)
    heat.set_step(x=1, y=1)
    heat.set_intensity_interpolation(False)      # crisp cells
    heat.invalidate_intensity_values(intensity)  # <- transposed list
    heat.hide_wireframe()
    
    # Palette 0→1
    heat.set_palette_coloring(
        steps=[
            {"value": 0.00, "color": ("#eef2ff")},   # low
            {"value": 0.50, "color": ("#93c5fd")},   # mid
            {"value": 1.00, "color": ("#ef4444")},   # high
        ],
        look_up_property="value",
        interpolate=True
    )
    
    # Axes & labeled ticks (center each cell)
    x_axis = chart.get_default_x_axis()
    y_axis = chart.get_default_y_axis()
    x_axis.set_title("Form type")
    y_axis.set_title(y_field.capitalize())
    
    try:
        x_axis.set_interval(0, cols)
        for i, f in enumerate(X_labels):
            t = x_axis.add_custom_tick(); t.set_value(i + 0.5); t.set_text(f)
    
        y_axis.set_interval(0, rows)
        for j, y in enumerate(Y_labels):
            t = y_axis.add_custom_tick(); t.set_value(j + 0.5); t.set_text(str(y))
    except Exception:
        print("Forms (left→right):", X_labels)
        print(f"{y_field.capitalize()} (bottom→top):", Y_labels)
    
    chart.add_legend(data=heat).set_title("Relative intensity")
    chart.open()

    Conclusion

    This project explored the SEC Financial Statement Data Sets using LightningChart Python to start visualizing financial statements’ key measures of profitability, liquidity, and solvency. Nine charts were developed, including histograms, scatter plots, bar charts, heatmaps, and box plots to highlight distribution patterns, cross-metric relationships, industry differences, and regulatory filing trends. Together, these visualizations provided a clearer understanding of how U.S. companies’ financial structures and results vary over time and across sectors.

    Continue learning with LightningChart

    How to Create a Strip Chart

    How to Create a Strip Chart

    Written by a human | Updated on April 9th, 2025What is a Strip chart application and what are the modern equivalents to it?  Before computers exist or were taking their first steps, a Strip chart was a way to visualize an analog electrical signal. Voltage was...

    Data Visualization Template for Electron JS | LightningChart®

    Updated on April 4th, 2025 | Written by humanAre you already building cross-platform applications with Electron JS?  In some of our previous articles, we’ve worked on TypeScript projects where we created pie charts and vibration chart applications. And as we...

    Bar chart race JavaScript

    Bar chart race JavaScript

    Updated on April 14th, 2025 | Written by humanBar chart race JavaScript  When I wrote this article, the COVID-19 pandemic was at its peak point. Today, things are much better thanks to vaccinations that continued their steady positive global effect. With this bar...