Exporting to Excel (XLSX)¶

Once you have a bundle, exporting to Excel is a single call. The interesting choice is how: a full-fidelity render that supports every feature, or a streaming render that holds memory flat no matter how many rows you throw at it.

mo_dataport.export(bundle, "report.xlsx")                       # fidelity (default)
mo_dataport.export(bundle, "report.xlsx", export_mode="streaming")

Prerequisites¶

A compiled ReportBundle or a path to a persisted bundle directory (see Compiling Report Bundles)
Polars installed if the bundle contains dataframe sources: pip install "mindoff-dataport[polars]"

Implementation¶

1. Choose an Export Mode¶

Mode	Best for	Trade-off
`"fidelity"`	Up to 50K rows, full styling	Renders in memory; supports every feature, no constraints
`"streaming"`	Large or unbounded datasets	Near-constant memory; a few feature limits (listed below)

Fidelity is the default because most reports are small and want every feature. Reach for streaming when row counts climb into the hundreds of thousands and memory becomes the bottleneck. Under the hood, streaming reads Parquet in batches and writes rows incrementally, so peak memory barely moves as the dataset grows. The numbers are in Architecture: Benchmarking.

Streaming trades a little flexibility for that flat memory profile:

No hug sizing (it would require buffering every row to measure content).
No merged cells may remain intersecting dataframe-content rows after compile-time shifting.
Only one dataframe-content placeholder per non-repeat sheet.

If your report fits within these, streaming is a free win on memory.

2. Configure Export Options¶

All options are keyword arguments to export().

Option	Type	Default	Description
`export_mode`	`str`	`"fidelity"`	`"fidelity"` (full in-memory render) or `"streaming"` (row-by-row, lower memory)
`streaming_engine`	`str`	`"openpyxl"`	Streaming writer backend: `"openpyxl"` or `"xlsxwriter"`. Rejected in fidelity mode
`column_width_mode`	`str`	schema value	`"fixed"`, `"even"`, or `"hug"`. Overrides the template schema. See Sizing & Styling
`row_height_mode`	`str`	schema value	`"fixed"`, `"even"`, or `"hug"`. Overrides the template schema
`default_column_width`	`float`	schema value	Fallback width (Excel character units) for `"even"` mode or when no width is stored
`default_row_height`	`float`	schema value	Fallback height (points) for `"even"` mode or when no height is stored
`streaming_chunk_rows`	`int`	`50000`	Parquet rows read per batch during streaming
`max_rows_per_workbook`	`int`	`1048576`	Split output into multiple `.xlsx` parts when a sheet reaches this row limit
`auto_delete_bundle`	`bool`	`False`	Delete the bundle directory after a successful export

Which engine?

OpenPyXL is the default streaming engine and the safe choice. xlsxwriter is opt-in and can be faster for some pure-write workloads; benchmark both against your own data if write speed is critical.

3. Split Large Output¶

Excel caps a worksheet at 1,048,576 rows. When you're exporting beyond that (or just want manageable file sizes), set max_rows_per_workbook. In streaming mode, the library writes the workbook parts, bundles them into output.zip, deletes the loose parts, and returns a one-item list[str] containing the zip path.

outputs = mo_dataport.export(
    bundle,
    "output.xlsx",
    export_mode="streaming",
    max_rows_per_workbook=500_000,
)
# outputs -> ["output.zip"] when a split occurred

4. Understand the Return Value¶

Fidelity XLSX returns None.
Streaming XLSX returns a list[str]: a single workbook path when no split is needed, or one .zip path when the output was split.

Core Concepts¶

1. How Streaming Achieves Flat Memory¶

Streaming reads dataframe-content from Parquet in batches (controlled by streaming_chunk_rows, default 50K rows) and writes each batch to the workbook immediately. At any point, only one batch is in memory. That's why peak RSS barely moves as dataset size grows. Fidelity, by contrast, expands all rows before writing, so memory scales with row count. The numbers are in Architecture: Benchmarking.

2. When to Choose Streaming over Fidelity¶

Row count alone isn't the deciding factor. Reach for streaming when:

Row counts exceed tens of thousands and memory is a constraint.
The data source is a Polars LazyFrame or Parquet file you'd rather not fully load into memory.
You need to split output across multiple workbooks with max_rows_per_workbook.

Stay with fidelity when the report needs hug sizing, retains merged cells across dataframe-content rows, or uses more than one dataframe-content placeholder per non-repeat sheet.

Troubleshooting¶

Streaming rejected my sheet. Check the three streaming constraints above; usually it's hug sizing or a merged cell sitting on dataframe-content rows.
streaming_engine raised an error. It's only valid in streaming mode (and never for PDF). Remove it for fidelity exports.
I expected one file but got a zip. A sheet crossed max_rows_per_workbook, so the export was split. Inspect the returned list[str].