The Problem
Ever had two tables with the same columns, but only one of them has proper column descriptions? The source table may have all the nice comments like “Unique identifier” or “Date policy was activated”, while the target table is bare.
You could retype those by hand. Or, you could use Databricks to automate it in a few lines.
So the question becomes:
How do I copy column comments from one table to another, for all matching column names in an automated way?
Automating with PySpark
Using a Python notebook in Databricks, you can run the code below to write comments from a source table to a target table. If you aren't familiar with running a Python notebook in Databricks, jump to the setup section below.
Simply replace the values for source_table and target_table with the full path to your table or view, making sure to include the catalog, schema, and table name.
This code appends comments from your source table to your target table, so if your target table already has comments you want to keep, the source comments are added after them rather than replacing them.
# --- CONFIGURATION ---
source_table = "source_catalog.source_schema.source_table"
target_table = "target_catalog.target_schema.target_table"
separator = " | " # how appended text will be separated
overwrite_if_empty = True # whether to fill in comments if the target has none
# ----------------------
from pyspark.sql.functions import col, lower
# Get source comments
source_cols = (
    spark.sql(f"DESCRIBE TABLE EXTENDED {source_table}")
    .where("col_name != '' AND data_type IS NOT NULL AND NOT col_name LIKE '#%'")
    .selectExpr("col_name", "comment as src_comment")
)
# Get target comments
target_cols = (
    spark.sql(f"DESCRIBE TABLE EXTENDED {target_table}")
    .where("col_name != '' AND data_type IS NOT NULL AND NOT col_name LIKE '#%'")
    .selectExpr("col_name as col_name_target", "comment as tgt_comment")
)
# Match columns by name (case-insensitive)
overlap = (
    source_cols.join(
        target_cols,
        lower(col("col_name")) == lower(col("col_name_target")),
        "inner",
    )
    .select(
        source_cols.col_name,
        source_cols.src_comment,
        target_cols.tgt_comment,
    )
    .where("src_comment IS NOT NULL AND src_comment != ''")
)
rows = overlap.collect()
# Apply updates
for r in rows:
    col_name = r["col_name"]
    src_comment = (r["src_comment"] or "").strip()
    tgt_comment = (r["tgt_comment"] or "").strip()

    # Skip if both are empty
    if not src_comment and not tgt_comment:
        continue

    # Append or overwrite as needed
    if tgt_comment:
        combined = f"{tgt_comment}{separator}{src_comment}"
    else:
        if overwrite_if_empty:
            combined = src_comment
        else:
            continue

    # Escape quotes and execute
    combined = combined.replace("'", "''")
    stmt = f"COMMENT ON COLUMN {target_table}.{col_name} IS '{combined}'"
    print(stmt)
    spark.sql(stmt)
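Once the loop finishes, a quick way to confirm the result is to describe the target table again and look at the comment column. This is an optional spot check, not part of the copy itself; it reuses the target_table value from the configuration cell and the spark session that Databricks notebooks provide.
# Optional: spot-check that the comments landed on the target table
(
    spark.sql(f"DESCRIBE TABLE {target_table}")
    .where("col_name != '' AND NOT col_name LIKE '#%'")
    .select("col_name", "comment")
    .show(truncate=False)
)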
What’s Happening Here
- DESCRIBE TABLE EXTENDED is used on both the source and target tables to pull their column names and comments.
- We join them case-insensitively on column name, keeping only shared columns that have a non-empty comment in the source.
- For each column:
  - If the target already has a comment, we append the new one with a separator (default: " | ").
  - If the target comment is empty and overwrite_if_empty is True, we copy the source comment directly.
- We then issue a COMMENT ON COLUMN statement that writes the combined text back to the target.
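For example, for a hypothetical column named policy_start_date that already has a short note on the target, the statement printed by the loop would look something like this (the column name and comments are made up for illustration):
COMMENT ON COLUMN target_catalog.target_schema.target_table.policy_start_date IS 'Internal note | Date policy was activated'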
Setting Up Your Databricks Notebook and Cluster
Before you even write a line of code, you need to make sure you’re working in the right notebook and cluster environment. Databricks notebooks aren’t just text editors — they run code on a compute cluster, and getting that part right matters.
Here’s what that looks like in practice:
- Create a new notebook
- From the Databricks workspace sidebar, click Workspace → Create → Notebook.
- Give it a clear name (e.g., Copy Column Comments) and choose Python as the default language.
- You can still mix %sql cells later if you want to inspect metadata directly.
- Attach it to a cluster
- In the upper-right of the notebook, there’s a dropdown to select your cluster.
- Choose a running cluster from that list — if none are available, you’ll need to start or request one.
- Check with your admin if you’re unsure
- You might not have permissions to see every cluster, or your workspace could have separate compute environments for dev, staging, or prod.
- Some clusters may not have access to Unity Catalog or the catalogs/schemas you need.
- If you’re not sure which one to use, contact your Databricks admin or the team that owns the workspace. They can tell you:
- Which cluster has access to the source and target tables.
- What size cluster you need — metadata updates are light, so a small shared cluster is usually enough.
- Verify your catalog and schema access
- You can test connectivity by running a quick query:
%sql
SHOW TABLES IN your_catalog.your_schema
- If that fails, you’re probably on the wrong cluster or missing permissions.
- Then, start coding
Once your notebook is attached to a cluster that can see both tables, you’re ready to run code to copy or append comments.
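If you prefer to stay in Python, a rough equivalent of the connectivity check above is to describe both tables from a notebook cell before running the copy script. This is only a sketch; it assumes the source_table and target_table values from the configuration cell and the spark session that Databricks notebooks provide.
# Sanity check: can this cluster read both tables?
# Assumes source_table and target_table are set as in the configuration cell.
for t in (source_table, target_table):
    try:
        spark.sql(f"DESCRIBE TABLE {t}").limit(1).collect()
        print(f"OK: {t} is readable from this cluster")
    except Exception as e:
        print(f"Cannot read {t}: {e}")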
