Exporting to Excel (XLSX)¶
Once you have a bundle, exporting to Excel is a single call. The interesting choice is how: a full-fidelity render that supports every feature, or a streaming render that holds memory flat no matter how many rows you throw at it.
mo_dataport.export(bundle, "report.xlsx") # fidelity (default)
mo_dataport.export(bundle, "report.xlsx", export_mode="streaming")
Prerequisites¶
- A compiled
ReportBundleor a path to a persisted bundle directory (see Compiling Report Bundles) - Polars installed if the bundle contains dataframe sources:
pip install "mindoff-dataport[polars]"
Implementation¶
1. Choose an Export Mode¶
| Mode | Best for | Trade-off |
|---|---|---|
"fidelity" |
Up to 50K rows, full styling | Renders in memory; supports every feature, no constraints |
"streaming" |
Large or unbounded datasets | Near-constant memory; a few feature limits (listed below) |
Fidelity is the default because most reports are small and want every feature. Reach for streaming when row counts climb into the hundreds of thousands and memory becomes the bottleneck. Under the hood, streaming reads Parquet in batches and writes rows incrementally, so peak memory barely moves as the dataset grows. The numbers are in Architecture: Benchmarking.
Streaming trades a little flexibility for that flat memory profile:
- No
hugsizing (it would require buffering every row to measure content). - No merged cells may remain intersecting
dataframe-contentrows after compile-time shifting. - Only one
dataframe-contentplaceholder per non-repeat sheet.
If your report fits within these, streaming is a free win on memory.
2. Configure Export Options¶
All options are keyword arguments to export().
| Option | Type | Default | Description |
|---|---|---|---|
export_mode |
str |
"fidelity" |
"fidelity" (full in-memory render) or "streaming" (row-by-row, lower memory) |
streaming_engine |
str |
"openpyxl" |
Streaming writer backend: "openpyxl" or "xlsxwriter". Rejected in fidelity mode |
column_width_mode |
str |
schema value | "fixed", "even", or "hug". Overrides the template schema. See Sizing & Styling |
row_height_mode |
str |
schema value | "fixed", "even", or "hug". Overrides the template schema |
default_column_width |
float |
schema value | Fallback width (Excel character units) for "even" mode or when no width is stored |
default_row_height |
float |
schema value | Fallback height (points) for "even" mode or when no height is stored |
streaming_chunk_rows |
int |
50000 |
Parquet rows read per batch during streaming |
max_rows_per_workbook |
int |
1048576 |
Split output into multiple .xlsx parts when a sheet reaches this row limit |
auto_delete_bundle |
bool |
False |
Delete the bundle directory after a successful export |
Which engine?
OpenPyXL is the default streaming engine and the safe choice. xlsxwriter is opt-in and can be faster for some pure-write workloads; benchmark both against your own data if write speed is critical.
3. Split Large Output¶
Excel caps a worksheet at 1,048,576 rows. When you're exporting beyond that (or just want manageable file sizes), set max_rows_per_workbook. In streaming mode, the library writes the workbook parts, bundles them into output.zip, deletes the loose parts, and returns a one-item list[str] containing the zip path.
outputs = mo_dataport.export(
bundle,
"output.xlsx",
export_mode="streaming",
max_rows_per_workbook=500_000,
)
# outputs -> ["output.zip"] when a split occurred
4. Understand the Return Value¶
- Fidelity XLSX returns
None. - Streaming XLSX returns a
list[str]: a single workbook path when no split is needed, or one.zippath when the output was split.
Core Concepts¶
1. How Streaming Achieves Flat Memory¶
Streaming reads dataframe-content from Parquet in batches (controlled by streaming_chunk_rows, default 50K rows) and writes each batch to the workbook immediately. At any point, only one batch is in memory. That's why peak RSS barely moves as dataset size grows. Fidelity, by contrast, expands all rows before writing, so memory scales with row count. The numbers are in Architecture: Benchmarking.
2. When to Choose Streaming over Fidelity¶
Row count alone isn't the deciding factor. Reach for streaming when:
- Row counts exceed tens of thousands and memory is a constraint.
- The data source is a Polars
LazyFrameor Parquet file you'd rather not fully load into memory. - You need to split output across multiple workbooks with
max_rows_per_workbook.
Stay with fidelity when the report needs hug sizing, retains merged cells across dataframe-content rows, or uses more than one dataframe-content placeholder per non-repeat sheet.
Troubleshooting¶
- Streaming rejected my sheet.
Check the three streaming constraints above; usually it's
hugsizing or a merged cell sitting ondataframe-contentrows. streaming_engineraised an error. It's only valid in streaming mode (and never for PDF). Remove it for fidelity exports.- I expected one file but got a zip.
A sheet crossed
max_rows_per_workbook, so the export was split. Inspect the returnedlist[str].