u/BricksterJ

Hey r/databricks!

Native Excel ingestion on Databricks is now Generally Available across AWS, Azure, and GCP.

With this release, you can ingest, parse, and query .xls / .xlsx / .xlsm files directly.

Public docs: https://docs.databricks.com/aws/en/query/formats/excel

📂 What is it?

Native Excel support that lets you:

  • Directly read .xls, .xlsx, and .xlsm files using Spark (spark.read.excel(...)) or SQL (read_files, COPY INTO).
  • Upload Excel files through the "Create or modify table" UI and land them as Delta.
  • Specify exact sheets and cell ranges (e.g., "Sheet1!A2:D10") for complex layouts.
  • Infer schema, headers, and data types automatically, or bring your own.
  • Stream Excel files with Auto Loader using cloudFiles.format = "excel".
  • List sheets in a workbook programmatically before ingesting.

🤷 Why?

Until now, Databricks didn't have a native Excel reader. That meant writing custom Python with pandas / openpyxl to convert Excel → DataFrame → Delta, manually exporting sheets to CSV before you could ingest them, or giving up on workflows because the Databricks file-upload UI rejected .xlsx.
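For reference, the old workaround usually looked something like this (a minimal sketch; the path and table name are placeholders, and note pandas pulls the whole workbook into driver memory):

import pandas as pd

# Pre-GA workaround: parse the workbook with pandas on the driver,
# convert to a Spark DataFrame, then land it as Delta.
pdf = pd.read_excel("<path to excel file>", sheet_name="Sheet1")
df = spark.createDataFrame(pdf)
df.write.mode("overwrite").saveAsTable("<catalog>.<schema>.my_table")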

GA makes Excel a first-class file format across Spark, SQL, Auto Loader, and the table-creation UI. It also opens the door to Excel ingestion via our managed file connectors (SharePoint, Google Drive, SFTP, and more coming soon).

🧑‍💻 How do I try it?

1️⃣ Requirements

  • Databricks Runtime 18.1 or above.

2️⃣ Try it in the UI

  • Click New → Add Data → Create or modify table.
  • Upload an .xls, .xlsx, or .xlsm file.
  • Pick the sheet. Adjust header rows or cell range if needed.
  • Preview the inferred schema.
  • Click Create table. It lands as a Delta table in Unity Catalog.

3️⃣ Try it in Spark (batch)

# Read the first sheet of a workbook
df = spark.read.excel("<path to excel file>")

# Use a header row and a specific sheet + range
df = (
  spark.read
    .option("headerRows", 1)
    .option("dataAddress", "Sheet1!A1:E10")
    .excel("<path to excel directory or file>")
)

df.write.mode("overwrite").saveAsTable("<catalog>.<schema>.my_table")

4️⃣ Try it in SQL with read_files

CREATE TABLE my_sheet_table AS
SELECT * FROM read_files(
  "<path to excel directory or file>",
  format              => "excel",
  headerRows          => 1,
  dataAddress         => "Sheet1!A2:D10",
  schemaEvolutionMode => "none"
);

5️⃣ Try it with COPY INTO

COPY INTO excel_demo_table
FROM "<path to excel directory or file>"
FILEFORMAT = EXCEL;
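If your sheets have a header row, you can presumably pass the same reader options through FORMAT_OPTIONS, as COPY INTO supports for other file formats. A hedged sketch (I'm assuming the Excel options from the table below pass through; FORMAT_OPTIONS values are strings), wrapped in spark.sql to stay in Python:

spark.sql("""
  COPY INTO excel_demo_table
  FROM '<path to excel directory or file>'
  FILEFORMAT = EXCEL
  FORMAT_OPTIONS ('headerRows' = '1', 'dataAddress' = 'Sheet1')
""")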

6️⃣ Try it with Auto Loader (streaming)

df = (
  spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "excel")
    .option("cloudFiles.inferColumnTypes", True)
    .option("headerRows", 1)
    .option("cloudFiles.schemaLocation", "<schema location>")
    .load("<path to excel directory or file>")
)

(df.writeStream
  .format("delta")
  .option("checkpointLocation", "<checkpoint path>")
  .table("<catalog>.<schema>.excel_stream"))

7️⃣ List sheets in a workbook

sheets = (
  spark.read
    .option("operation", "listSheets")
    .excel("<path to workbook>")
)
sheets.show()  # returns sheetIndex, sheetName
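
Until multi-sheet ingestion lands (see "What's next?" below), you can combine listSheets with dataAddress to roll your own one-table-per-sheet load. A rough sketch, with hypothetical table-name sanitization:

# Hypothetical multi-sheet ingest: one Delta table per sheet.
for row in sheets.collect():
    sheet = row["sheetName"]
    df = (
      spark.read
        .option("headerRows", 1)
        .option("dataAddress", sheet)  # a bare sheet name selects the whole sheet
        .excel("<path to workbook>")
    )
    table = sheet.lower().replace(" ", "_")  # naive; adapt to your naming rules
    df.write.mode("overwrite").saveAsTable(f"<catalog>.<schema>.{table}")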

🎛️ Supported options

| Option | Description |
| --- | --- |
| dataAddress | Cell range in Excel syntax, e.g. "MySheet!C5:H10", "C5:H10", "Sheet1". Defaults to all valid cells on the first sheet. |
| headerRows | Number of header rows inside dataAddress (0 or 1). Default: 0. |
| operation | "readSheet" (default) or "listSheets". |
| dateFormat | Custom date format. Default: yyyy-MM-dd. |
| timestampNTZFormat | Custom timestamp (no time zone) format. Default: yyyy-MM-dd'T'HH:mm:ss[.SSS]. |
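
As a quick example, if a sheet stores European-style dates like 31/12/2025, you could override the parse format (a sketch using the options above):

df = (
  spark.read
    .option("headerRows", 1)
    .option("dateFormat", "dd/MM/yyyy")  # matches values like 31/12/2025
    .excel("<path to excel file>")
)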

⚠️ Known limitations + behaviors

  • Password-protected files are not supported.
  • One header row max (headerRows = 0 or 1).
  • "Strict OOXML" format is not supported.
  • Schema evolution is not supported with Auto Loader streaming.
  • Merged cells: only the top-left value is retained; other cells in the merge become NULL.
  • Duplicate column headers are not supported (workaround: headerRows = 0 and rename post-read; see the sketch after this list).
  • .xlsm macros are not evaluated (computed values come through, but macros don't run).
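
The duplicate-header workaround from the list above, sketched out: skip the header row via dataAddress, read with headerRows = 0, then assign your own unique names (the column names here are made up):

df = (
  spark.read
    .option("headerRows", 0)
    .option("dataAddress", "Sheet1!A2:D10")  # start below the duplicated header row
    .excel("<path to excel file>")
)
df = df.toDF("id", "amount_q1", "amount_q2", "notes")  # your own unique names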

⏭️ What's next?

  • Writing to Excel files.
  • Multi-sheet → multi-table ingestion in a single pass.
  • .xlsb binary format support.
  • Excel ingestion via managed connectors (SharePoint, Google Drive, SFTP, OneDrive, Box, Dropbox).

💬 Feedback

  • Drop a comment below or reach out to your Databricks account team. We'd love to hear which Excel workflows you want us to prioritize next.