Transferring Comments from Table to Table in Databricks

The Problem

Ever had two tables with the same columns, but only one of them has proper column descriptions? The source table may have all the nice comments like “Unique identifier” or “Date policy was activated”, while the target table is bare.

You could retype those by hand. Or, you could use Databricks to automate it in a few lines.

So the question becomes:
How do I copy column comments from one table to another, for all matching column names in an automated way?

Automating with PySpark

Using a Python notebook in Databricks, you can run the code below to copy column comments from a source table to a target table. If you aren't familiar with running a Python notebook in Databricks, jump to the setup section further down.

Simply replace the values of source_table and target_table with the full path of your table or view, making sure to include the catalog, schema, and table name.

This code appends comments from your source table to your target table, so any existing comments on the target are kept and the source comment is added after them. For example, if the target column already has the comment "Policy ID" and the matching source comment is "Unique identifier", the result with the default separator is "Policy ID | Unique identifier".

# --- CONFIGURATION ---
source_table = "source_catalog.source_schema.source_table"
target_table = "target_catalog.target_schema.target_table"
separator = " | "  # how appended text will be separated
overwrite_if_empty = True  # whether to fill in comments if the target has none
# ----------------------

from pyspark.sql.functions import col, lower

# Get source column names and comments (skip blank rows and '# ...' section headers in the DESCRIBE output)
source_cols = (
    spark.sql(f"DESCRIBE TABLE EXTENDED {source_table}")
    .where("col_name != '' AND data_type IS NOT NULL AND NOT col_name LIKE '#%'")
    .selectExpr("col_name", "comment as src_comment")
)

# Get target column names and comments (same filtering as the source)
target_cols = (
    spark.sql(f"DESCRIBE TABLE EXTENDED {target_table}")
    .where("col_name != '' AND data_type IS NOT NULL AND NOT col_name LIKE '#%'")
    .selectExpr("col_name as col_name_target", "comment as tgt_comment")
)

# Match columns by name (case-insensitive)
overlap = (
    source_cols.join(
        target_cols,
        lower(col("col_name")) == lower(col("col_name_target")),
        "inner"
    )
    .select(
        source_cols.col_name,
        source_cols.src_comment,
        target_cols.tgt_comment
    )
    .where("src_comment IS NOT NULL AND src_comment != ''")
)

rows = overlap.collect()

# Apply updates
for r in rows:
    col_name = r["col_name"]
    src_comment = (r["src_comment"] or "").strip()
    tgt_comment = (r["tgt_comment"] or "").strip()

    # Skip if both are empty
    if not src_comment and not tgt_comment:
        continue

    # Append or overwrite as needed
    if tgt_comment:
        combined = f"{tgt_comment}{separator}{src_comment}"
    else:
        if overwrite_if_empty:
            combined = src_comment
        else:
            continue

    # Escape single quotes in the comment text, backtick-quote the column name
    # (in case it contains spaces or special characters), and build the statement
    combined = combined.replace("'", "''")
    stmt = f"COMMENT ON COLUMN {target_table}.`{col_name}` IS '{combined}'"
    print(stmt)
    spark.sql(stmt)
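
If you want to preview which columns matched before anything is written, you can inspect the overlap DataFrame first. This is just an optional check, run before the update loop:

# Optional preview: show the matched columns with their source and target comments
overlap.show(truncate=False)   # or display(overlap) in a Databricks notebook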

What’s Happening Here

  1. DESCRIBE TABLE EXTENDED is used on both source and target tables to pull their column names and comments.
  2. We join them case-insensitively on column name, keeping only shared columns that have a non-empty comment in the source.
  3. For each column:
    • If the target already has a comment, we append the new one with a separator (default: " | ").
    • If the target comment is empty and overwrite_if_empty is True, we copy the source comment directly.
  4. We then issue a COMMENT ON COLUMN statement that writes the combined text back to the target.
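
To confirm the comments actually landed, you can re-run the same kind of DESCRIBE check on the target. The sketch below reuses the target_table variable from the script above:

# Verify the result: list the target's columns and their updated comments,
# filtering out blank rows and '# ...' section headers as before
updated = (
    spark.sql(f"DESCRIBE TABLE {target_table}")
    .where("col_name != '' AND NOT col_name LIKE '#%'")
    .select("col_name", "comment")
)
updated.show(truncate=False)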

Setting Up Your Databricks Notebook and Cluster

Before you even write a line of code, you need to make sure you’re working in the right notebook and cluster environment. Databricks notebooks aren’t just text editors — they run code on a compute cluster, and getting that part right matters.

Here’s what that looks like in practice:

  1. Create a new notebook
    • From the Databricks workspace sidebar, click Workspace → Create → Notebook.
    • Give it a clear name (e.g., Copy Column Comments) and choose Python as the default language.
    • You can still mix %sql cells later if you want to inspect metadata directly.
  2. Attach it to a cluster
    • In the upper-right of the notebook, there’s a dropdown to select your cluster.
    • Choose a running cluster from that list — if none are available, you’ll need to start or request one.
  3. Check with your admin if you’re unsure
    • You might not have permissions to see every cluster, or your workspace could have separate compute environments for dev, staging, or prod.
    • Some clusters may not have access to Unity Catalog or the catalogs/schemas you need.
    • If you’re not sure which one to use, contact your Databricks admin or the team that owns the workspace. They can tell you:
      • Which cluster has access to the source and target tables.
      • What size cluster you need — metadata updates are light, so a small shared cluster is usually enough.
  4. Verify your catalog and schema access
    • You can test connectivity by running a quick query in a %sql cell: SHOW TABLES IN your_catalog.your_schema (there is also a Python pre-flight sketch after this list).
    • If that fails, you’re probably on the wrong cluster or missing permissions.
  5. Then, start coding
    Once your notebook is attached to a cluster that can see both tables, you’re ready to run code to copy or append comments.
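
If you prefer to do that pre-flight check from Python, a minimal sketch is to try describing both tables and see whether the attached cluster can read them. The two table names below are the same placeholders used in the main script:

# Pre-flight check: can this cluster describe both tables?
tables = [
    "source_catalog.source_schema.source_table",
    "target_catalog.target_schema.target_table",
]
for t in tables:
    try:
        spark.sql(f"DESCRIBE TABLE {t}").show(5, truncate=False)
        print(f"OK: able to describe {t}")
    except Exception as e:
        print(f"Cannot describe {t}: {e}")
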
Author: Salome Grasland