API Reference

mindoff-dataport has a deliberately small public surface: four functions that mirror the four steps of building a report. This page documents each one in full. The reference below is generated directly from the docstrings in the source, so it always matches the version you have installed.

Import Alias

The recommended entry point bundles all four functions under one namespace:

from mindoff_dataport import mo_dataport

mo_dataport.extract(...)   # same as extract_template
mo_dataport.inputs(...)    # same as get_template_inputs
mo_dataport.compile(...)   # same as compile_report_bundle
mo_dataport.export(...)    # same as export_report_bundle

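Every function is also importable at the top level under both its full name and a short alias. Importing the compile alias directly shadows Python's built-in compile() in that module, so prefer the full name or the mo_dataport namespace when that matters: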

from mindoff_dataport import (
    extract_template,        # alias: extract
    get_template_inputs,     # alias: inputs
    compile_report_bundle,   # alias: compile
    export_report_bundle,    # alias: export
)

Functions

1. Template Extraction

Read an .xlsx template and return a WorkbookSchema.

Extraction captures cell values, styles (font, fill, alignment, borders), column widths and row heights, merged regions, manual print breaks, theme colors, and every {{key:type}} placeholder it finds. The schema is the in-memory blueprint everything downstream is built from.
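
To make the {{key:type}} placeholder form concrete, here is a minimal sketch of how such markers could be picked out of a cell's text. It is not the library's parser: the regular expression, the tolerance for whitespace, and the find_placeholders name are all assumptions made for illustration.

```python
import re

# Assumed grammar (not taken from the library): {{key:type}}, where key and
# type are runs of word characters, with optional whitespace around each.
PLACEHOLDER = re.compile(r"\{\{\s*(?P<key>\w+)\s*:\s*(?P<type>\w+)\s*\}\}")


def find_placeholders(cell_text: str) -> list[tuple[str, str]]:
    """Return (key, type) pairs for every placeholder found in cell_text."""
    return [(m["key"], m["type"]) for m in PLACEHOLDER.finditer(cell_text)]


print(find_placeholders("{{report_title:string}} as of {{generated_on:date}}"))
# [('report_title', 'string'), ('generated_on', 'date')]
```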

Usage

from mindoff_dataport import mo_dataport

schema = mo_dataport.extract("invoice_template.xlsx")

# or the explicit name:
from mindoff_dataport import extract_template
schema = extract_template("invoice_template.xlsx")

Parameter  Type  Required  Description
path       str   Yes       Path to the .xlsx template file

Returns: WorkbookSchema — the extracted template blueprint.

2. Input Discovery

Inspect a schema and report every input the template expects.

Returns a sheet-scoped dictionary keyed by sheet name, then by placeholder key, with the value being the placeholder type ("string", "number", "date", "dataframe", and so on). Call this before building a payload so you know exactly what compile() will ask for — no guessing, no trial runs.

Usage

contract = mo_dataport.inputs(schema)

# or the explicit name:
from mindoff_dataport import get_template_inputs
contract = get_template_inputs(schema)

Parameter  Type            Required  Description
template   WorkbookSchema  Yes       Schema produced by extract()

Returns: dict[str, dict[str, str | list]] — the per-sheet input contract.

Example output

{
    "Sales Summary": {
        "report_title": "string",
        "generated_on": "date",
        "sales_rows": "dataframe",
    }
}
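
One way to use a contract like the one above is to check a payload against it before compiling. The sketch below is plain Python and does not call the library. It assumes the payload mirrors the contract's sheet-then-key nesting, and missing_inputs is a hypothetical helper name.

```python
def missing_inputs(contract: dict, payload: dict) -> dict[str, list[str]]:
    """Return {sheet: [keys]} for contract keys that the payload does not supply."""
    missing: dict[str, list[str]] = {}
    for sheet, expected in contract.items():
        provided = payload.get(sheet, {})
        absent = [key for key in expected if key not in provided]
        if absent:
            missing[sheet] = absent
    return missing


contract = {
    "Sales Summary": {
        "report_title": "string",
        "generated_on": "date",
        "sales_rows": "dataframe",
    }
}
# Illustrative payload that only fills in one of the three inputs.
payload = {"Sales Summary": {"report_title": "Q3 Sales"}}

print(missing_inputs(contract, payload))
# {'Sales Summary': ['generated_on', 'sales_rows']}
```

An empty result means every key named in the contract is present. It says nothing about whether the values have the right type.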

3. Bundle Compilation

Bind runtime data to a template and produce a ReportBundle.

Compilation validates the payload against the sheet contract, resolves scalar cells in place, materialises Polars DataFrames / LazyFrames to Parquet, stores compact dataframe anchors and repeat plans, and (optionally) shifts template content out of the way of expanding dataframes. The result is a portable bundle you can export now, or persist on disk and export later from any process.

Usage

bundle = mo_dataport.compile(
    template=schema,
    data=payload,
    bundle_path="out_bundle",      # omit to keep the bundle in memory
    dataframe_options=None,
    dataframe_shift="both",
)

Parameter          Type                   Required  Description
template           WorkbookSchema         Yes       Schema from extract()
data               dict[str, Any]         Yes       Sheet-scoped payload (see the Data Contract guide)
bundle_path        str | None             No        Write the bundle to this directory; omit for in-memory only
dataframe_options  dict[str, Any] | None  No        Per-sheet, per-placeholder column occupation and alignment overrides
dataframe_shift    str                    No        How surrounding cells/merges move around dataframe output: "both", "horizontal", "vertical", or "none"

Returns: ReportBundle — the compiled, exportable bundle.

Raises: KeyError if a required placeholder key is missing from the payload.
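
Since a missing key surfaces as a KeyError, callers may want to re-raise it with some payload context. The wrapper below is a sketch under assumptions: compile_with_context is a hypothetical helper, the compile function is passed in as a parameter (for example mo_dataport.compile), and the exception's first argument is assumed, not confirmed, to be the missing key.

```python
from typing import Any, Callable


def compile_with_context(
    compile_fn: Callable[..., Any],
    schema: Any,
    payload: dict[str, Any],
    **kwargs: Any,
) -> Any:
    """Call compile_fn and re-raise a KeyError with the payload's sheet names."""
    try:
        # template= and data= are the keyword names shown in the usage above.
        return compile_fn(template=schema, data=payload, **kwargs)
    except KeyError as exc:
        # Assumption: the first argument of the exception is the missing key.
        missing = exc.args[0] if exc.args else "<unknown>"
        sheets = ", ".join(sorted(payload)) or "<none>"
        raise KeyError(
            f"missing placeholder key {missing!r}; payload sheets: {sheets}"
        ) from exc
```

A call would look like compile_with_context(mo_dataport.compile, schema, payload, dataframe_shift="both"). Any extra keyword arguments are forwarded unchanged.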

4. Bundle Export

Render a compiled bundle to a file on disk.

Accepts either an in-memory ReportBundle or a path to a persisted bundle directory, and writes .xlsx or .pdf output. XLSX supports a full-fidelity mode and a low-memory streaming mode; PDF always paginates automatically. Format-specific keyword options (export mode, sizing, fonts, page size, and so on) are passed through **options.

Usage

mo_dataport.export(bundle, "report.xlsx", format="xlsx")
mo_dataport.export("out_bundle", "report.pdf", format="pdf")

Parameter       Type                Required  Default  Description
bundle_or_path  ReportBundle | str  Yes                In-memory bundle or path to a bundle directory
output_path     str                 Yes                Destination file path (.xlsx or .pdf)
format          str                 No        "xlsx"   "xlsx" or "pdf". "image" is reserved and raises NotImplementedError
**options                           No                 Sizing and format-specific options (see the Exporting guides)

Returns: None for fidelity XLSX and all PDF exports. For streaming XLSX, a list[str]: one workbook path when no split is needed, or a single .zip path when the export is split across multiple workbooks.
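
Because the return value depends on the export mode, calling code may want to normalise it before checking what was written. The helpers below are a sketch, not part of the library. written_files and is_split_archive are hypothetical names, and the sketch assumes that a None result means the output landed at output_path.

```python
from __future__ import annotations

from pathlib import Path


def written_files(result: list[str] | None, output_path: str) -> list[Path]:
    """Turn export()'s return value into a list of paths to look for."""
    if result is None:
        # Fidelity XLSX and PDF exports return None; assume the file is at output_path.
        return [Path(output_path)]
    # Streaming XLSX returns a list holding one workbook path or one .zip path.
    return [Path(p) for p in result]


def is_split_archive(paths: list[Path]) -> bool:
    """True when the export was split and delivered as a single .zip archive."""
    return len(paths) == 1 and paths[0].suffix.lower() == ".zip"


print(written_files(None, "report.pdf"))
print(is_split_archive(written_files(["report.zip"], "report.xlsx")))  # True
```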

Supporting Types

ReportBundle

The canonical intermediate artifact produced by compile() and consumed by export(). It can live in memory or be persisted to a directory and reloaded later:

from mindoff_dataport import ReportBundle, mo_dataport

bundle = ReportBundle.load("saved_bundle")   # reload a persisted bundle
mo_dataport.export(bundle, "report.xlsx")

The on-disk layout is documented in Architecture → The Pipeline.

repeat_records(...)

A helper that wraps source-backed repeat payloads (an ordered set of scalar record columns plus constant dataframe payloads) so large repeat sections don't have to materialise every record in memory.

from mindoff_dataport import repeat_records

records = repeat_records(scalar_records, constants={"line_items": shared_df})

See the Repeat Sections recipe for context on when to reach for it.