Mindoff Dataport

Build high-fidelity Excel and PDF reports from reusable .xlsx templates.

Mindoff Dataport turns styled Excel workbooks into reusable report templates, compiles runtime data into a portable ReportBundle, and exports .xlsx and .pdf outputs while preserving the layout, structure, and styling you designed.

Documentation: https://dataport.mindoff.work

Source: https://github.com/mindoffwork/mindoff-dataport

Key Features

Template-First Report Generation
Turn real Excel workbooks into reusable report templates without rebuilding layouts in code.
One Bundle, Two Formats
The same compiled bundle renders to both .xlsx and .pdf. No second pipeline, no separate styling pass.
Dataframes Plug Directly Into Templates
Pass Polars DataFrames or LazyFrames straight into template placeholders, so report generation slots into existing data pipelines.
Streaming for Large Exports
In streaming mode, Parquet sources are read in batches and rows are written incrementally. Peak memory stays roughly flat as row count grows, instead of climbing with the dataset.
Flexible Repeating and Dynamic Sheets
Generate repeated sections and dynamic sheets for customer-wise, region-wise, or report-wise output from a single template.
Adjustable Layout at Export Time
Column occupation, alignment, and collision shifting are configurable at runtime, without touching the original template.
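The streaming pattern behind the memory claim can be shown without the library. Rows are pulled from a lazy source in fixed-size batches and written out one batch at a time, so only one batch is held at once. This is a generic illustration, not Mindoff Dataport's internals. `read_batches` and `stream_rows` are hypothetical names, and in a real pipeline a batched Parquet reader such as pyarrow's `ParquetFile.iter_batches` would replace the generator source.

```python
import io
from itertools import islice
from typing import Iterable, Iterator, TextIO


def read_batches(rows: Iterable[tuple], batch_size: int) -> Iterator[list[tuple]]:
    """Yield lists of at most `batch_size` rows from any row iterable."""
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def stream_rows(rows: Iterable[tuple], sink: TextIO, batch_size: int = 1000) -> tuple[int, int]:
    """Write rows to `sink` one batch at a time (hypothetical helper).

    Returns (rows_written, largest_batch_held): the second value shows that
    rows held in memory are bounded by the batch size, not the total row count.
    """
    written = 0
    largest = 0
    for batch in read_batches(rows, batch_size):
        largest = max(largest, len(batch))
        for row in batch:
            sink.write(",".join(map(str, row)) + "\n")
        written += len(batch)
    return written, largest


# A generator source: rows are produced lazily and never collected into a list.
source = ((i, f"item-{i}") for i in range(10_000))
buffer = io.StringIO()
print(stream_rows(source, buffer, batch_size=500))  # (10000, 500)
```

`io.StringIO` stands in for an output file only so the example runs without touching disk. It keeps everything it receives, whereas a real file handle lets written rows leave memory.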

Performance

Streaming mode holds near-constant peak memory regardless of dataset size. These benchmarks compare it against hand-written openpyxl, xlsxwriter, and ReportLab loops that produce equivalent layout and styling, since those raw loops are the most direct alternative.

Fig. 1: XLSX export time and peak memory at scale. Left: wall-clock time for all Mindoff modes; both streaming and fidelity modes scale linearly (O(n)). Right: peak RSS; Mindoff streaming stays near-constant, while the openpyxl and xlsxwriter raw loops grow with dataset size.

Fig. 2: PDF export time and peak memory at scale. Left: wall-clock time, which scales linearly (O(n)). Right: peak RSS for Mindoff streaming versus a raw ReportLab loop.

Scenario	Mode	Why
≤ 50K rows, full style fidelity	`export_mode="fidelity"`	Full merged-cell and style support; no streaming constraints
> 50K rows, XLSX	`export_mode="streaming"`	Near-constant memory regardless of row count
Any size, PDF	automatic	PDF always paginates; no `export_mode` setting needed
> 1M rows, split output	`export_mode="streaming"` + `max_rows_per_workbook`	Splits output across multiple workbook files; Excel caps each worksheet at 1,048,576 rows
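The scenario table can be encoded as a small planning helper that decides settings before you export. This is a sketch: `plan_export` and `ExportPlan` are hypothetical names, the 50K and 1M cut-offs are the table's rules of thumb rather than hard limits, and where `export_mode` and `max_rows_per_workbook` are actually passed in the API is documented in the developer guide, not assumed here.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExportPlan:
    export_mode: Optional[str]            # None for PDF: pagination is automatic
    max_rows_per_workbook: Optional[int]  # None when a single workbook is enough
    workbook_count: int


def plan_export(row_count: int, fmt: str = "xlsx", rows_per_workbook: int = 1_000_000) -> ExportPlan:
    """Pick settings using the rule-of-thumb thresholds from the scenario table."""
    if fmt not in ("xlsx", "pdf"):
        raise ValueError(f"unsupported format: {fmt!r}")
    if fmt == "pdf":
        # PDF always paginates, so no export_mode is chosen.
        return ExportPlan(None, None, 1)
    if row_count <= 50_000:
        return ExportPlan("fidelity", None, 1)
    if row_count <= 1_000_000:
        return ExportPlan("streaming", None, 1)
    # Above 1M rows: stream and split across ceil(n / limit) workbook files.
    return ExportPlan("streaming", rows_per_workbook, math.ceil(row_count / rows_per_workbook))


print(plan_export(20_000))
# ExportPlan(export_mode='fidelity', max_rows_per_workbook=None, workbook_count=1)
print(plan_export(2_500_000))
# ExportPlan(export_mode='streaming', max_rows_per_workbook=1000000, workbook_count=3)
```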

Full methodology, fairness notes, and instructions to reproduce the numbers yourself are in the Benchmarking guide.

Quick Start

Think of it like a mail merge for spreadsheets: you design the layout once in Excel, then the library fills in the data. Every report follows four steps: extract → inspect → compile → export.
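To make the mail-merge analogy concrete, here is a toy version in plain Python. The "template" is a grid of cell strings containing `{{key:type}}` placeholders, and filling it means substituting values by key. It only illustrates the idea: `fill_cells` is a hypothetical name, and the sketch models none of what the library also preserves, such as styles, merged cells, layout, or dataframe expansion.

```python
import re

# Matches placeholders such as {{customer_name:string}}; group 1 is the key, group 2 the type.
PLACEHOLDER = re.compile(r"\{\{(\w+):(\w+)\}\}")


def fill_cells(grid: list[list[str]], values: dict[str, object]) -> list[list[str]]:
    """Return a new grid with every placeholder replaced by its value."""
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value supplied for placeholder {key!r}")
        return str(values[key])

    return [[PLACEHOLDER.sub(substitute, cell) for cell in row] for row in grid]


toy_template = [
    ["Invoice #", "{{invoice_number:number}}"],
    ["Bill to", "{{customer_name:string}}"],
]
print(fill_cells(toy_template, {"invoice_number": 1024, "customer_name": "Acme Industries"}))
# [['Invoice #', '1024'], ['Bill to', 'Acme Industries']]
```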

1. Install the Package

pip install mindoff-dataport

For dataframe support (Polars DataFrames or LazyFrames):

pip install "mindoff-dataport[polars]"

2. Extract the Template

Read your .xlsx file into a schema. This captures the layout, every style, and the {{key:type}} placeholders you marked in cells. You run this once per template, not once per report.

import polars as pl
from mindoff_dataport import mo_dataport

template = mo_dataport.extract("invoice_template.xlsx")
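As a rough mental model of what extraction records about placeholders, the sketch below scans sheets of cell text for `{{key:type}}` markers and builds a sheet-scoped `{sheet: {key: type}}` mapping, the same shape `mo_dataport.inputs` returns in the next step. It is a hypothetical stand-in (`collect_placeholders` is not part of the library) that ignores styles and layout, and it assumes a key repeated on one sheet must keep a single type.

```python
import re

# Group 1 is the placeholder key, group 2 its declared type.
PLACEHOLDER = re.compile(r"\{\{(\w+):(\w+)\}\}")


def collect_placeholders(sheets: dict[str, list[list[str]]]) -> dict[str, dict[str, str]]:
    """Map each sheet name to {placeholder key: declared type}."""
    schema: dict[str, dict[str, str]] = {}
    for sheet_name, rows in sheets.items():
        found: dict[str, str] = {}
        for row in rows:
            for cell in row:
                for key, kind in PLACEHOLDER.findall(cell):
                    # Reject a key declared with two different types on the same sheet.
                    if found.get(key, kind) != kind:
                        raise ValueError(f"{sheet_name}: {key!r} declared as both {found[key]!r} and {kind!r}")
                    found[key] = kind
        schema[sheet_name] = found
    return schema


sheets = {
    "Invoice": [
        ["Bill to", "{{customer_name:string}}"],
        ["Invoice #", "{{invoice_number:number}}"],
        ["{{line_items:dataframe}}", ""],
    ]
}
print(collect_placeholders(sheets))
# {'Invoice': {'customer_name': 'string', 'invoice_number': 'number', 'line_items': 'dataframe'}}
```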

3. Inspect What the Template Needs

Before you build a payload, ask the template what it expects. This is especially useful when you're working with a template someone else designed, or returning to one after a while.

required_inputs = mo_dataport.inputs(template)
# {'Invoice': {'customer_name': 'string', 'invoice_number': 'number', 'line_items': 'dataframe'}}

The result is sheet-scoped: the outer key is the sheet name, and the inner keys are the placeholders on that sheet with their expected types.
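Because the contract is a plain nested dictionary, you can diff a payload against it before compiling, for example to surface gaps early in a larger pipeline. A minimal sketch, assuming `required_inputs` has exactly the shape shown above; `diff_payload` is a hypothetical helper, it only compares keys (not types), and `compile` still performs its own validation.

```python
def diff_payload(
    required: dict[str, dict[str, str]], data: dict[str, dict[str, object]]
) -> dict[str, dict[str, list[str]]]:
    """Per sheet, list required keys absent from `data` and keys the template does not declare."""
    report: dict[str, dict[str, list[str]]] = {}
    for sheet in sorted(required.keys() | data.keys()):
        wanted = set(required.get(sheet, {}))
        given = set(data.get(sheet, {}))
        missing, unexpected = sorted(wanted - given), sorted(given - wanted)
        if missing or unexpected:
            report[sheet] = {"missing": missing, "unexpected": unexpected}
    return report


required_inputs = {
    "Invoice": {"customer_name": "string", "invoice_number": "number", "line_items": "dataframe"}
}
payload = {"Invoice": {"customer_name": "Acme Industries", "invoice_nr": 1024}}
print(diff_payload(required_inputs, payload))
# {'Invoice': {'missing': ['invoice_number', 'line_items'], 'unexpected': ['invoice_nr']}}
```

An empty result means every declared placeholder has a value and nothing extra was supplied.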

4. Compile: Bind Your Data

Hand the template your real data, keyed by sheet name. The library validates it against the contract from step 3 and produces a ReportBundle: a portable artifact you can export immediately or save to disk and export later.

line_items = pl.DataFrame({"item": ["Widget A", "Widget B"], "amount": [125, 275]})

bundle = mo_dataport.compile(
    template,
    data={
        "Invoice": {
            "customer_name": "Acme Industries",
            "invoice_number": 1024,
            "line_items": line_items,
        }
    },
)

5. Export

Render the bundle to a file. The same bundle drives both formats, so there is no second pipeline to maintain.

mo_dataport.export(bundle, "invoice_filled.xlsx")
mo_dataport.export(bundle, "invoice_filled.pdf", format="pdf")

When you're ready to go further (placeholders, the data contract, streaming, repeats, dynamic sheets, custom fonts, and the full API reference), head to the developer guide.

License

Released under the MIT License.