<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <!-- Source: https://rendiment.io/feed.xml -->
  <generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator>
  <link href="https://siftrss.com/f/03QdNQW78K" rel="self" type="application/atom+xml"/>
  <link href="https://rendiment.io/" rel="alternate" type="text/html"/>
  <updated>2026-03-31T10:06:13-04:00</updated>
  <id>https://siftrss.com/f/03QdNQW78K</id>
  <title type="html">Rendiment</title>
  <subtitle>Rendiment aims to help the community improve database performance, with a focus on MySQL and PostgreSQL</subtitle>
  <author>
    <name>Daniel Guzman Burgos</name>
  </author>
  <entry>
    <title type="html">PostgreSQL Internals for the MySQL DBA</title>
    <link href="https://rendiment.io/postgresql/mysql/2026/03/31/postgresql-internals-for-mysql-dba.html" rel="alternate" type="text/html" title="PostgreSQL Internals for the MySQL DBA"/>
    <published>2026-03-31T00:00:00-04:00</published>
    <updated>2026-03-31T00:00:00-04:00</updated>
    <id>https://rendiment.io/postgresql/mysql/2026/03/31/postgresql-internals-for-mysql-dba</id>
    <content type="html" xml:base="https://rendiment.io/postgresql/mysql/2026/03/31/postgresql-internals-for-mysql-dba.html"><![CDATA[<p>I’ve been a MySQL DBA for 20 years. InnoDB internals, replication topologies, fleet management at scale. My entire mental model of how a relational database works was built on MySQL.</p>

<p>I’ve also been using PostgreSQL for about six years now. But using it and <a href="/postgresql/2026/01/21/pgtrgm-pgvector-music.html">understanding its internals</a> are different things. I knew how to write queries and stand up a server. I didn’t have a clear picture of what Postgres was actually doing under the hood, or why it made the design choices it made. So I decided to dig in. And I found that every explanation out there falls into one of two buckets: either it’s a “Postgres vs MySQL” comparison written by someone who clearly picked a side, or it’s a Postgres tutorial that starts from “what is a table.” Neither is useful when you already know what a clustered index is and you want to understand why Postgres doesn’t have one.</p>

<p>This post is what I wish existed when I started: PostgreSQL internals explained from the perspective of someone who already knows MySQL. Every concept mapped to its InnoDB equivalent. No fanboy energy. Just the architecture.</p>

<hr />

<h2 id="mvcc-two-philosophies-same-goal">MVCC: Two philosophies, same goal</h2>

<p>Both InnoDB and PostgreSQL implement MVCC. Readers don’t block writers, writers don’t block readers. Same goal. Completely different implementations.</p>

<p><strong>What you already know from InnoDB:</strong></p>

<p>When you run an UPDATE, InnoDB modifies the row <strong>in-place</strong> in the buffer pool page. The previous version gets copied to the <strong>undo log</strong> (rollback segment). The modified page is marked dirty and eventually flushed to the tablespace by a checkpoint or background flushing. The redo log guarantees durability before the flush happens. If another transaction needs the old version, InnoDB follows the pointer chain in the undo log until it finds the right one. When no active transaction needs those old versions anymore, the <strong>purge thread</strong> cleans them up quietly in the background.</p>

<p>The key insight: the table (the clustered index) always contains only the latest version. History lives elsewhere.</p>

<p><strong>How PostgreSQL does it:</strong></p>

<p>Postgres does the opposite. When you run an UPDATE, the original row stays untouched. Postgres writes a <strong>brand new copy</strong> of the entire row into the same table. The old row gets metadata that says “I was killed by transaction Y.” The new row gets metadata that says “I was born in transaction Y.”</p>

<p>Every row (Postgres calls them “tuples”) carries two internal fields: <code class="language-plaintext highlighter-rouge">xmin</code> (the transaction that created it) and <code class="language-plaintext highlighter-rouge">xmax</code> (the transaction that deleted or updated it). A live row has an <code class="language-plaintext highlighter-rouge">xmin</code> but no effective <code class="language-plaintext highlighter-rouge">xmax</code>. A dead row has both.</p>

<p>So the table itself is a graveyard of versions. Current rows, dead rows, all mixed together in the same physical structure. When a transaction needs to read data, it checks each tuple’s <code class="language-plaintext highlighter-rouge">xmin</code> and <code class="language-plaintext highlighter-rouge">xmax</code> against its own snapshot to decide: “can I see this version?”</p>
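
<p>You can inspect this machinery directly: <code class="language-plaintext highlighter-rouge">xmin</code>, <code class="language-plaintext highlighter-rouge">xmax</code>, and the tuple’s physical address are hidden system columns you can SELECT by name. A minimal sketch (table and values are made up):</p>

```sql
CREATE TABLE accounts (id int PRIMARY KEY, balance int);
INSERT INTO accounts VALUES (1, 100);

-- Hidden system columns: ask for them explicitly.
SELECT xmin, xmax, ctid, * FROM accounts;
-- A live tuple shows xmax = 0: nobody has deleted or updated it yet.

UPDATE accounts SET balance = 90 WHERE id = 1;

-- The visible row is now a brand new tuple: new xmin, new ctid,
-- because the UPDATE wrote a fresh physical copy into the heap.
SELECT xmin, xmax, ctid, * FROM accounts;
```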

<p>Think of it this way. In InnoDB, the undo log is an archive in the basement. The table is the office, and it always has the current document on the desk. In Postgres, there is no basement. Every version of every document, current and historical, is piled on the same desk. Each one has a sticky note that says when it was valid.</p>

<hr />

<h2 id="the-heap-why-theres-no-clustered-index">The heap: why there’s no clustered index</h2>

<p>In InnoDB, the primary key index <strong>is</strong> the table. The B+Tree leaf nodes contain the actual row data, physically sorted by the primary key. This is the <a href="/mysql/fundamentals/2026/03/03/before-the-index-mysql-three-promises.html">clustered index</a>. When you do a SELECT by PK, InnoDB walks the tree and lands directly on the data. A secondary index stores the PK value as a pointer, so a secondary index lookup means: find the PK in the secondary index, then look up the PK in the clustered index to get the row.</p>

<p>Postgres doesn’t have this concept. The table data lives in a structure called the <strong>heap</strong>. It’s a file where tuples are written in whatever order they arrive, page by page. Pages are 8KB (vs InnoDB’s 16KB). No sorting by any key. No tree structure. Just pages with tuple slots, filled wherever there’s room.</p>

<p>When you create any index in Postgres, including the primary key index, it’s a <strong>separate B-Tree</strong> that points to the physical location of the tuple in the heap. The pointer is a <code class="language-plaintext highlighter-rouge">ctid</code> (a page number and slot offset, like “page 42, slot 7”). Every index in Postgres is essentially what InnoDB would call a secondary index, except instead of pointing to a PK value, it points to a physical address.</p>

<p>The consequences for operations you already understand:</p>

<p><strong>Range scans on the PK.</strong> In InnoDB, a range scan on the PK is a sequential read through contiguous leaf pages. Fast, because the data is physically ordered. In Postgres, even if you scan the PK index in order, each tuple you find could be anywhere in the heap. You’re doing random I/O into the heap for each row. Postgres has a “bitmap heap scan” optimization that collects all the heap addresses first and then reads them in physical order, but it’s still fundamentally different from InnoDB’s clustered access pattern.</p>

<p><strong>UPDATEs and indexes.</strong> In InnoDB, when you update a non-indexed column, only the clustered index page changes. Secondary indexes still point to the same PK, which still points to the same row (now updated in place). In Postgres, because an UPDATE creates a new tuple at a new physical location, <strong>every index</strong> on that table needs a new entry pointing to the new location. This is expensive. Postgres mitigates this with HOT (Heap-Only Tuple) updates: if the new tuple fits in the same heap page and you didn’t change any indexed column, Postgres can skip the index updates and chain the old tuple to the new one within the page. But the moment you change an indexed column or the page is full, you pay the full cost.</p>
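
<p>You can watch HOT happen with the same hidden columns. The <code class="language-plaintext highlighter-rouge">ctid</code> is “(page, slot)”, so a HOT update keeps the page number and only changes the slot. A sketch with made-up names:</p>

```sql
CREATE TABLE notes (id int PRIMARY KEY, body text);
INSERT INTO notes VALUES (1, 'first');

SELECT ctid FROM notes WHERE id = 1;   -- e.g. (0,1)

-- body is NOT indexed, so this update is HOT-eligible
-- as long as the new tuple fits in the same 8KB page:
UPDATE notes SET body = 'second' WHERE id = 1;

SELECT ctid FROM notes WHERE id = 1;   -- e.g. (0,2): same page, new slot

-- Postgres counts HOT updates separately in the statistics views:
SELECT n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE relname = 'notes';
```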

<p><strong>The CLUSTER command.</strong> Postgres has a <code class="language-plaintext highlighter-rouge">CLUSTER</code> command that physically reorders the heap to match an index, once. It doesn’t maintain the order over time. After a few thousand inserts and updates, the heap is scattered again. It’s not a clustered index. It’s a one-time defragmentation.</p>
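
<p>For reference, the syntax (table and index names are illustrative):</p>

```sql
-- One-time physical reorder of the heap to match an index.
-- Takes an ACCESS EXCLUSIVE lock for the duration.
CLUSTER orders USING orders_pkey;

-- Postgres remembers which index was used, so a later bare CLUSTER
-- on the table repeats the reorder. Nothing maintains it in between.
CLUSTER orders;
```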

<hr />

<h2 id="vacuum-the-price-of-keeping-history-in-the-table">VACUUM: the price of keeping history in the table</h2>

<p>In InnoDB, the purge thread runs automatically in the background. It cleans up undo log entries that no active transaction needs. You almost never think about it. The worst case is when a very long-running transaction holds back the purge (the <a href="/mysql/innodb/2024/09/01/innodb-semaphore-contention.html">history list length grows</a>), but even then the problem is contained to the undo tablespace.</p>

<p>In Postgres, the equivalent is <strong>VACUUM</strong>. Because dead tuples live in the table itself, someone has to come along and mark that space as reusable. That someone is VACUUM (or autovacuum, the background process that triggers it automatically).</p>

<p>VACUUM scans the table, identifies tuples that are invisible to all current transactions, and marks their space as available for new inserts. It doesn’t return the space to the OS by default. The table file stays the same size, but the free space inside it gets reused. This is why Postgres tables can “bloat”: if VACUUM can’t keep up, or if something prevents it from cleaning (like a long-running transaction holding back the visibility horizon), dead tuples accumulate and the table grows beyond what the live data needs.</p>
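
<p>A quick way to see whether autovacuum is keeping up, using the standard statistics views (the table name in the manual VACUUM is made up):</p>

```sql
-- Dead tuples per table, and when vacuum last ran.
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;

-- Manual pass on one table: marks dead-tuple space reusable,
-- but does not shrink the file on disk.
VACUUM (VERBOSE) orders;
```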

<p>The closest InnoDB analogy is imagining that the undo log didn’t exist as a separate structure, and instead old row versions accumulated directly inside the <code class="language-plaintext highlighter-rouge">.ibd</code> file. And then the purge thread had to walk the entire file to find and reclaim the dead ones. That’s basically what VACUUM does.</p>

<p><strong>VACUUM FULL</strong> is the nuclear option. It rewrites the entire table into a new file, copying only live tuples. Reclaims all the space. But it takes an <code class="language-plaintext highlighter-rouge">ACCESS EXCLUSIVE</code> lock, which means nothing can read or write the table while it runs. On a busy production table, this is rarely an option.</p>

<p>There’s a parameter called <code class="language-plaintext highlighter-rouge">idle_in_transaction_session_timeout</code> that you’ll want to know about: it kills any session that sits idle inside an open transaction longer than the timeout. Why does that matter? A session that starts a transaction and then idles holds back the visibility horizon for the entire database, preventing VACUUM from cleaning any tuples created after that transaction started. It’s the Postgres equivalent of a long-running transaction inflating the history list length in InnoDB, except the consequences are worse because the bloat is inside every table.</p>
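
<p>Setting it is one line; the five-minute value here is just an example, tune it to your workload:</p>

```sql
-- Server-wide (needs superuser; takes effect on reload):
ALTER SYSTEM SET idle_in_transaction_session_timeout = '5min';
SELECT pg_reload_conf();

-- Or per session:
SET idle_in_transaction_session_timeout = '5min';
```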

<hr />

<h2 id="wal-and-logical-decoding-the-postgres-binlog">WAL and logical decoding: the Postgres binlog</h2>

<p>The WAL (Write-Ahead Log) is the Postgres equivalent of the redo log and binlog combined. Every change is written to WAL before it’s applied to the data files. It handles crash recovery (like the redo log) and replication (like the binlog).</p>

<p>For change data capture (CDC), what matters is <strong>logical decoding</strong>. This is the mechanism that reads WAL records and converts them into logical change events (INSERT, UPDATE, DELETE with row data). You consume these events through a <strong>replication slot</strong> with an <strong>output plugin</strong> (<code class="language-plaintext highlighter-rouge">pgoutput</code>, <code class="language-plaintext highlighter-rouge">wal2json</code>, <code class="language-plaintext highlighter-rouge">test_decoding</code>).</p>

<p>The replication slot tells Postgres: “don’t discard WAL segments past this point, I have a consumer that hasn’t caught up yet.” Same concept as MySQL retaining binary logs until all replicas have consumed them.</p>
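
<p>You can play with logical decoding from plain SQL, no external consumer needed. This assumes <code class="language-plaintext highlighter-rouge">wal_level = logical</code> on the server; the slot name is made up:</p>

```sql
-- Create a slot with the built-in test_decoding plugin:
SELECT pg_create_logical_replication_slot('demo_slot', 'test_decoding');

-- ...generate some INSERT/UPDATE/DELETE traffic, then:

-- Peek at decoded changes without consuming them:
SELECT * FROM pg_logical_slot_peek_changes('demo_slot', NULL, NULL);

-- Or consume them, advancing the slot:
SELECT * FROM pg_logical_slot_get_changes('demo_slot', NULL, NULL);

-- Drop the slot when done, or Postgres will retain WAL forever:
SELECT pg_drop_replication_slot('demo_slot');
```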

<h3 id="replica-identity-the-postgres-binlog_row_image">REPLICA IDENTITY: the Postgres binlog_row_image</h3>

<p>In MySQL, <code class="language-plaintext highlighter-rouge">binlog_row_image</code> controls what data gets written to the binlog for each row event. FULL writes all columns before and after. MINIMAL writes only what’s needed to identify and apply the change.</p>

<p>Postgres has <code class="language-plaintext highlighter-rouge">REPLICA IDENTITY</code>, which is set per table:</p>

<ul>
  <li><strong>DEFAULT</strong>: Uses the primary key columns as the row identifier. For UPDATEs, you get all new column values plus only the PK as the “old key.” For DELETEs, you get only the PK of the deleted row.</li>
  <li><strong>FULL</strong>: The entire old row is included. All columns, before and after. This is what you need if you want complete before/after images for CDC.</li>
  <li><strong>USING INDEX</strong>: Uses a specific unique index as the identifier instead of the PK.</li>
  <li><strong>NOTHING</strong>: No row identification. UPDATEs and DELETEs can’t be replicated.</li>
</ul>
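
<p>It’s one DDL statement per table (table name illustrative), and you can check the current setting in <code class="language-plaintext highlighter-rouge">pg_class</code>:</p>

```sql
ALTER TABLE orders REPLICA IDENTITY FULL;

-- relreplident: 'd' = default, 'f' = full, 'i' = index, 'n' = nothing
SELECT relname, relreplident
FROM pg_class
WHERE relname = 'orders';
```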

<p>The comparison to <code class="language-plaintext highlighter-rouge">binlog_row_image</code> is not exact. <code class="language-plaintext highlighter-rouge">binlog_row_image</code> controls what gets logged for all purposes. <code class="language-plaintext highlighter-rouge">REPLICA IDENTITY</code> is specifically about how the subscriber (or CDC consumer) identifies which row was affected. But the practical effect for building a change stream is similar.</p>

<h3 id="toast-the-large-value-problem">TOAST: the large-value problem</h3>

<p>Here’s something that has no direct parallel in the binlog world. Postgres pages are 8KB. A tuple can’t span pages. So when a tuple grows past roughly 2KB, Postgres first tries to compress its large values, and if that’s not enough, moves them out of line into a separate <strong>TOAST table</strong> (The Oversized-Attribute Storage Technique), leaving pointers in the main tuple.</p>

<p>Think of it like InnoDB’s overflow pages for BLOB/TEXT columns with DYNAMIC row format, where large values get stored off-page.</p>
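
<p>You can find a table’s TOAST relation and see how much it weighs (table name illustrative):</p>

```sql
-- The TOAST table behind a regular table, if it has one:
SELECT reltoastrelid::regclass AS toast_table
FROM pg_class
WHERE relname = 'orders';

-- Total footprint: heap + TOAST + indexes.
SELECT pg_size_pretty(pg_total_relation_size('orders'));
```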

<p>The CDC implication: when you do an UPDATE that doesn’t touch a TOASTed column, Postgres doesn’t include that column’s value in the WAL event. Your CDC consumer gets a placeholder that means “unchanged.” With <code class="language-plaintext highlighter-rouge">REPLICA IDENTITY FULL</code>, the old values including TOAST are included, but at the cost of more WAL volume and I/O.</p>

<p>In MySQL with <code class="language-plaintext highlighter-rouge">binlog_row_image=FULL</code>, every column is always included regardless of size. No special handling needed. One less thing to worry about.</p>

<hr />

<h2 id="replication-why-the-standby-cant-have-its-own-mvcc">Replication: why the standby can’t have its own MVCC</h2>

<p>This is where the architectural difference between MySQL and Postgres replication becomes most visible.</p>

<h3 id="physical-replication">Physical replication</h3>

<p>Postgres streaming replication works by shipping WAL records to a standby server, which applies them byte by byte. The standby’s data files become a physical copy of the primary. When a WAL record says “write these bytes to page 42 of table X”, the standby does exactly that.</p>

<p>If you enable <code class="language-plaintext highlighter-rouge">hot_standby = on</code>, you can run read queries against the standby. Those queries need MVCC to get a consistent snapshot. Here’s the problem: the standby doesn’t have its own MVCC machinery. It doesn’t generate its own tuple versions. It just replays what the primary did.</p>

<p>In MySQL, a replica applies relay log events using InnoDB. InnoDB on the replica generates its own undo log, manages its own buffer pool, handles its own MVCC. The replica is a sovereign database engine that happens to receive instructions from the primary. The purge thread on the primary can clean up whatever it wants, and the replica doesn’t care because it has its own undo log for its own read queries.</p>

<p>In Postgres, the standby has no independent version management. When a read query on the standby needs to see a tuple that was alive at time T, it relies on that tuple still existing in the heap. But if the primary ran VACUUM and removed that tuple, and the WAL replay applied that VACUUM to the standby, the tuple is gone. The query fails with a replication conflict.</p>

<p>Postgres resolves this in one of three ways:</p>

<ol>
  <li>Cancel the query on the standby (the default).</li>
  <li>Delay WAL replay to give standby queries time to finish (<code class="language-plaintext highlighter-rouge">max_standby_streaming_delay</code>).</li>
  <li>Have the standby report its oldest needed xmin to the primary with <code class="language-plaintext highlighter-rouge">hot_standby_feedback = on</code>, so the primary delays VACUUM for those tuples.</li>
</ol>

<p>Option 3 sounds good until you realize it means slow queries on the standby can cause bloat on the primary. A read query 3,000 miles away can prevent VACUUM from running on your primary server. In MySQL, this problem literally cannot exist because the replica’s MVCC is independent.</p>
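
<p>In <code class="language-plaintext highlighter-rouge">postgresql.conf</code> terms, on the standby (values illustrative):</p>

```ini
hot_standby = on
max_standby_streaming_delay = 30s   # option 2: give queries a grace period before replay wins
hot_standby_feedback = on           # option 3: export the standby's xmin to the primary
```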

<h3 id="logical-replication">Logical replication</h3>

<p>Postgres also has logical replication, which decodes the WAL into logical changes (INSERT, UPDATE, DELETE) and sends them to a subscriber. This is conceptually closer to MySQL’s row-based replication. The subscriber has its own tables, its own heap, its own MVCC. It applies the changes as regular SQL operations.</p>

<p>A logical replication slot retains WAL segments until the consumer catches up. If the consumer is down or lagging, WAL accumulates on disk. The slot also holds a <code class="language-plaintext highlighter-rouge">catalog_xmin</code> that prevents VACUUM from cleaning dead tuples in the <strong>system catalog tables</strong> (pg_class, pg_attribute, etc.) because the logical decoder needs the old catalog state to correctly decode WAL records from the past.</p>

<p>Important distinction: a logical slot’s <code class="language-plaintext highlighter-rouge">catalog_xmin</code> only blocks VACUUM on system catalog tables, not on your user tables. Your regular tables get vacuumed normally. This is different from physical replication slots (with <code class="language-plaintext highlighter-rouge">hot_standby_feedback</code>), which can block VACUUM on user tables too.</p>

<p>An abandoned logical replication slot causes two problems: unbounded WAL accumulation on disk, and bloat in the system catalogs. In extreme cases, the catalog bloat can become severe enough to threaten transaction ID wraparound. Always monitor your replication slots. Always drop the ones you’re not using.</p>
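
<p>The monitoring query is short enough to memorize. An inactive slot with a growing <code class="language-plaintext highlighter-rouge">retained_wal</code> is your red flag (the slot name in the DROP is made up):</p>

```sql
SELECT slot_name, slot_type, active, restart_lsn, catalog_xmin,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn))
         AS retained_wal
FROM pg_replication_slots;

-- Drop an abandoned slot:
SELECT pg_drop_replication_slot('old_slot');
```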

<hr />

<h2 id="the-operational-lens">The operational lens</h2>

<p>Let me be direct about what I think after studying this.</p>

<p>MySQL’s architecture makes the DBA’s operational life easier in several specific ways. The purge thread is invisible. The clustered index eliminates a whole class of random I/O problems. The binlog is self-contained and doesn’t need to reference the data dictionary to decode events from the past. Replication is logically independent on each node.</p>

<p>Postgres gives the developer more features out of the box. Richer type system, better JSON support, <a href="/postgresql/2026/01/21/pgtrgm-pgvector-music.html">extensions like PostGIS</a>, more standard SQL compliance. For the person writing queries and designing schemas, Postgres often feels more complete.</p>

<p>The trade-off is operational complexity. VACUUM tuning is a real discipline. Bloat monitoring requires active attention. Replication conflicts on hot standbys are a problem MySQL DBAs have never had to think about. TOAST adds a layer of complexity to CDC that doesn’t exist with the MySQL binlog.</p>

<p>Neither system is better. They made different <a href="/mysql/fundamentals/2026/03/03/before-the-index-mysql-three-promises.html">architectural bets</a>. Understanding both makes you better at each one, because you stop taking your database’s design choices for granted and start seeing them as trade-offs that could have gone the other way.</p>

<hr />

<p><em>I’m a MySQL DBA by trade. This is what I found when I stopped using PostgreSQL on autopilot and started reading its source of truth.</em></p>]]></content>
    <author>
      <name>Daniel Guzman Burgos</name>
    </author>
    <category term="postgresql"/>
    <category term="mysql"/>
    <category term="internals"/>
    <category term="mvcc"/>
    <category term="vacuum"/>
    <category term="replication"/>
    <category term="architecture"/>
    <summary type="html"><![CDATA[PostgreSQL internals explained for MySQL DBAs. MVCC, heap vs clustered index, VACUUM, WAL, and replication, all mapped to their InnoDB equivalents.]]></summary>
  </entry>
  <entry>
    <title type="html">The Claude Code Engineer</title>
    <link href="https://rendiment.io/engineering/2026/03/08/claude-code-engineer-new-role.html" rel="alternate" type="text/html" title="The Claude Code Engineer"/>
    <published>2026-03-08T00:00:00-05:00</published>
    <updated>2026-03-08T00:00:00-05:00</updated>
    <id>https://rendiment.io/engineering/2026/03/08/claude-code-engineer-new-role</id>
    <content type="html" xml:base="https://rendiment.io/engineering/2026/03/08/claude-code-engineer-new-role.html"><![CDATA[<p>There is a new type of job showing up. Maybe it already exists and nobody named it yet. But the pattern is clear, and I think it deserves a name: <strong>Claude Code Engineer</strong>.</p>

<p>This is not a developer. Not exactly. Not a project manager either. And definitely not a prompt engineer (that term needs to retire already). The Claude Code Engineer is the person who sits between the backlog and production, and their main tool is not an IDE. It’s an agentic coding assistant.</p>

<h2 id="the-loop">The loop</h2>

<p>Here’s what a Claude Code Engineer does, every day:</p>

<ol>
  <li><strong>Picks up issues.</strong> From GitHub, Jira, Linear, wherever. They tell Claude Code to read the issue and come up with a plan.</li>
  <li><strong>Reviews the plan.</strong> They don’t approve blindly. They read the plan, understand the trade-offs, and leave a comment on the ticket explaining why they approved, changed, or rejected it. This means the engineer needs to understand the code <em>and</em> the business.</li>
  <li><strong>Enforces documentation.</strong> Every issue must produce or update documentation. No exceptions.</li>
  <li><strong>Enforces testing.</strong> Unit tests, integration tests, regression tests, e2e. The engineer decides what coverage the change needs and makes sure Claude Code generates it.</li>
  <li><strong>Runs the tests and does QA.</strong> They run the full test suite, review the output, and do manual QA. Last checkpoint before merge.</li>
  <li><strong>Manages the git workflow.</strong> Worktrees, pull requests, clean commit history. The engineer handles the merge when everything is green.</li>
</ol>

<p>That’s the loop. Issue, plan, review, docs, tests, QA, merge. Every day. Multiple issues in parallel.</p>

<!-- TODO: Diagram showing the loop as a cycle: Issue → Plan → Review → Docs → Tests → QA → Merge → back to Issue. Clean, minimal style. Could be a simple flowchart or circular diagram. -->
<p><img src="/assets/img/gallery/claude-code-engineer-loop.jpg" alt="The Claude Code Engineer loop: from issue to merge" /></p>

<h2 id="why-qa-is-the-main-skill">Why QA is the main skill</h2>

<p>If I had to rank the skills needed for this role: QA first, product second, dev third.</p>

<p>Claude Code can write code. It can also write tests, but it will write the tests that pass, not the tests that <em>should</em> pass. The difference between those two is the whole point of quality assurance.</p>

<p>The Claude Code Engineer needs to think about edge cases the LLM didn’t consider and how the change interacts with what already exists. They need to be the person who says “what happens when the input is null and the user is unauthenticated and the database is in read-only mode?” and then makes Claude Code write that test.</p>

<p>This is not a junior role. This is someone who has broken enough things in production to know where things break.</p>

<h2 id="why-git-worktrees-not-just-branches">Why git worktrees, not just branches</h2>

<p>Quick detour: a git repo only has one working directory. If Claude Code is halfway through a feature, you can’t switch branches and start another task. You’re stuck waiting.</p>

<p>Worktrees fix this. Multiple working directories, each on its own branch, all sharing the same repo history. No cloning the repo five times. Claude Code has <a href="https://code.claude.com/docs/en/common-workflows">built-in worktree support</a>: <code class="language-plaintext highlighter-rouge">claude --worktree feature-auth</code> and you’re running. The team at incident.io <a href="https://incident.io/blog/shipping-faster-with-claude-code-and-git-worktrees">wrote about their experience</a> running four or five Claude agents at the same time this way. It’s what makes the Claude Code Engineer role possible: one person coordinating multiple parallel AI sessions, each working on a different issue.</p>
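
<p>Stripped of the Claude-specific wrapper, the underlying git mechanics are plain worktree commands. A self-contained sketch (repo, branch, and identity values are made up):</p>

```shell
# One repo, one history, multiple working directories.
mkdir app && cd app && git init -q
git config user.email "dev@example.com" && git config user.name "Dev"
git commit --allow-empty -q -m "initial commit"

# One worktree per issue, each on its own branch:
git worktree add ../app-issue-101 -b issue-101
git worktree add ../app-issue-102 -b issue-102

# Both directories share the same .git history:
git worktree list
```

<p>When an issue merges, <code class="language-plaintext highlighter-rouge">git worktree remove ../app-issue-101</code> cleans it up without touching the other sessions.</p>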

<!-- TODO: Diagram showing one engineer with 3-4 parallel worktrees, each on a different branch/issue. Shows the "one person, multiple AI sessions" concept visually. -->
<p><img src="/assets/img/gallery/claude-code-engineer-worktrees.jpg" alt="One engineer running multiple parallel worktrees, each working on a different issue" /></p>

<h2 id="the-issue-architect">The Issue Architect</h2>

<p>Someone has to create those issues. The <strong>Issue Architect</strong> has a product/infrastructure/architecture profile. They think in milestones and epics. They break down big initiatives into small, well-scoped issues that can be worked on independently and <strong>in parallel</strong>.</p>

<p>Why parallel? Because if the issues are well-structured, there’s no reason you can’t have multiple worktrees running at the same time. The bottleneck is no longer “how fast can we type code.” The bottleneck is “how well did we define the work.”</p>

<p>The Issue Architect’s value is in knowing how to split things up. How to break a large feature into independent pieces that don’t create merge conflicts and can be tested on their own. That skill becomes 10x more valuable when your execution speed goes way up.</p>

<h2 id="the-toolsmith">The Toolsmith</h2>

<p>The <strong>Claude Code Toolsmith</strong> builds custom plugins, skills, CLAUDE.md files, and hooks that put the team’s knowledge into Claude Code’s workflow: code review checklists, database safety rules, changelog validation, naming conventions, QA templates built from real production incidents.</p>

<p>They make sure that when Claude Code writes code for <em>your</em> company, it writes it the way <em>your</em> company writes code. Not generic code. Not “best practices from Stack Overflow” code. <em>Your</em> code.</p>

<h2 id="what-about-the-people-who-arent-developers">What about the people who aren’t developers?</h2>

<p>I’m a DBA. I don’t write great code. I know databases deeply, plus some adjacent areas: DevOps, Linux, monitoring. So when I look at these roles, I ask myself: where do I fit?</p>

<p>AI can write a migration. It can generate an <code class="language-plaintext highlighter-rouge">ALTER TABLE</code> that looks perfectly fine. But it has no idea that your table has 500 million rows and that ALTER will lock it for 20 minutes during peak traffic. It doesn’t know that your replication topology means that DDL needs to run through <code class="language-plaintext highlighter-rouge">pt-online-schema-change</code>, not as a direct statement. It doesn’t know that <code class="language-plaintext highlighter-rouge">ON DELETE CASCADE</code> on that particular table will <a href="/mysql/tools/2026/02/25/mysql-fk-cascade-blind-spot.html">silently wipe half your data</a> if someone deletes the wrong parent row. The DBA knows that.</p>
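
<p>For the curious, this is roughly what “run it through pt-online-schema-change” looks like in practice; the database, table, column, and load thresholds here are all made up:</p>

```shell
# Dry run first: validates the plan without copying any data.
pt-online-schema-change \
  --alter "ADD COLUMN archived TINYINT NOT NULL DEFAULT 0" \
  D=shop,t=orders \
  --max-load Threads_running=50 \
  --critical-load Threads_running=100 \
  --dry-run

# Re-run with --execute to do the online copy and atomic rename.
```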

<p>In this model, the specialist becomes the most important kind of Toolsmith. The DBA builds the skill that flags DDL on tables over 10M rows. The SRE builds the hook that validates resource limits. The security engineer builds the check that scans for hardcoded secrets.</p>

<p>And yes, I see the irony. I’m building <a href="https://github.com/nethalo/dbsafe">dbsafe</a> that does exactly this: it checks DDL and DML for dangerous patterns so that developers (and AI) don’t <a href="/mysql/tools/2026/02/14/introducing-dbsafe-know-before-you-alter.html">blow up the database</a>. I’m literally teaching the machine everything I know. At some point, someone will say “if the tool catches all the problems, why do we need the DBA?”</p>

<p>It’s uncomfortable. But Terraform exists and infrastructure engineers didn’t disappear. Kubernetes exists and SREs didn’t disappear. The tool automates the execution, but someone still has to decide <em>what</em> to execute and <em>why</em>.</p>

<p>dbsafe started by detecting DDL algorithms and warning about table locks. Then I had to add topology awareness for PXC clusters. Then <a href="/mysql/tools/2026/02/28/foreign-key-impact-ddl-dbsafe.html">foreign key impact analysis</a>. Then <a href="/mysql/tools/2026/03/06/dbsafe-cloud-aurora-rds-tls.html">Aurora and RDS support</a>. Each one was a new guardrail that only existed because production taught me a new lesson.</p>

<p>And then there’s work no tool covers. Migrating a fleet of MySQL master-slave clusters to Percona XtraDB Cluster with ProxySQL: months of planning, fixing tables with no primary key, building monitoring dashboards, writing failover runbooks, dealing with the 15 things that go wrong that you didn’t plan for. Could Claude Code help with parts of that? Probably yes. Could it plan and execute the whole thing? No. Half of it is judgment calls that depend on knowing your specific infrastructure.</p>

<p>The tool catches what you taught it to catch. It doesn’t catch what you haven’t seen yet. I’d rather be the person who built the guardrail than the person who gets called at 3 AM because nobody did.</p>

<h2 id="the-elephant-in-the-room-what-about-juniors">The elephant in the room: what about juniors?</h2>

<p>Salesforce announced it would halt junior hiring for 2025, saying AI had made its existing engineers productive enough. Then they <a href="https://codeconductor.ai/blog/future-of-junior-developers-ai/">reversed course</a> when the strategy failed. That reversal matters.</p>

<p>Still, the landscape is not great. A <a href="https://digitaleconomy.stanford.edu/wp-content/uploads/2025/08/Canaries_BrynjolfssonChandarChen.pdf">Stanford/ADP study</a> found developers aged 22-25 lost nearly 20% of their jobs since late 2022. <a href="https://www.signalfire.com/blog/signalfire-state-of-talent-report-2025">SignalFire</a> shows new grad hiring at the top 15 tech companies is down over 50% from pre-pandemic levels. <a href="https://leaddev.com/the-ai-impact-report-2025">LeadDev</a> found 54% of engineering leaders expect junior hiring to go down because of AI tools.</p>

<p>But that logic has a problem: <strong>if you don’t hire juniors today, you won’t have seniors tomorrow.</strong> As the <a href="https://stackoverflow.blog/2025/12/26/ai-vs-gen-z/">Stack Overflow blog</a> put it: “if you don’t hire junior developers, you’ll someday never have senior developers.”</p>

<p>The Claude Code Engineer role is a new entry point. Instead of “write this CRUD endpoint,” it becomes “review this CRUD endpoint that Claude Code wrote, check the edge cases, make sure the docs match.” Different skill. Learnable. And I’d say it teaches engineering judgment faster than writing boilerplate ever did.</p>

<h2 id="the-other-elephant-what-about-seniors">The other elephant: what about seniors?</h2>

<p>“If a mid-level person can review AI output, why are we paying senior salaries?”</p>

<p>Some companies will try this. They will fire seniors, hire cheaper people to “operate” AI tools, and call it efficiency. It will blow up.</p>

<p>The Claude Code Engineer reviews the output. But someone has to define the architecture, build the skills and plugins (the Toolsmith), break down the roadmap into good issues (the Issue Architect), and make the call when the AI proposes something that breaks a constraint only 10 years of production experience would catch. That’s the senior.</p>

<p>The senior’s value goes up in this model. Their architecture decisions shape every feature that every Claude Code Engineer ships. Their output is no longer measured in lines of code. It’s measured in how many bad decisions they prevented the AI from making.</p>

<p>Companies that cut seniors will ship fast and break things. Companies that keep seniors and give them leverage will ship fast and break less.</p>

<h2 id="what-this-looks-like-in-practice">What this looks like in practice</h2>

<p>Imagine a team of five:</p>

<ul>
  <li><strong>1 Issue Architect</strong> who breaks down the roadmap into well-scoped parallel issues</li>
  <li><strong>2 Claude Code Engineers</strong> who each run 3-5 parallel worktrees, reviewing plans, enforcing quality, and merging</li>
  <li><strong>1 Toolsmith</strong> who builds the internal skills and safety checks</li>
  <li><strong>1 senior engineer</strong> who handles system design, incident response, performance tuning, security reviews</li>
</ul>

<p>That team of five ships what used to require fifteen. Not because the other ten got fired, but because the work changed shape.</p>

<p><img src="/assets/img/gallery/claude-code-engineer-team.jpg" alt="A five-person team structure: Issue Architect, two Claude Code Engineers, Toolsmith, and senior engineer" /></p>

<h2 id="the-inevitable-objection">The inevitable objection</h2>

<p>“But what if Claude Code gets good enough that you don’t need the Engineer?”</p>

<p>Maybe. Eventually. But the answer to “will we still need someone to verify the work?” has always been yes. We automated deployments and still need SREs. We automated testing and still need QA engineers. We automated monitoring and still need on-call rotations.</p>

<p>The tools change. The responsibility doesn’t.</p>

<p>The backlog is infinite. It always has been.</p>]]></content>
    <author>
      <name>Daniel Guzman Burgos</name>
    </author>
    <category term="engineering"/>
    <category term="claude-code"/>
    <category term="ai"/>
    <category term="workflow"/>
    <category term="devops"/>
    <summary type="html"><![CDATA[Meet the Claude Code Engineer: a new role where QA, product thinking, and AI coordination matter more than writing code. How agentic tools reshape teams.]]></summary>
  </entry>
  <entry>
    <title type="html">dbsafe in the Cloud: Safe Schema Changes on AWS Aurora and RDS</title>
    <link href="https://rendiment.io/mysql/tools/2026/03/06/dbsafe-cloud-aurora-rds-tls.html" rel="alternate" type="text/html" title="dbsafe in the Cloud: Safe Schema Changes on AWS Aurora and RDS"/>
    <published>2026-03-06T00:00:00-05:00</published>
    <updated>2026-03-06T00:00:00-05:00</updated>
    <id>https://rendiment.io/mysql/tools/2026/03/06/dbsafe-cloud-aurora-rds-tls</id>
    <content type="html" xml:base="https://rendiment.io/mysql/tools/2026/03/06/dbsafe-cloud-aurora-rds-tls.html"><![CDATA[<p>You moved your production MySQL to Aurora. The application runs faster, failovers are automatic, and you don’t think about storage anymore. Then you need to run a schema change. You <code class="language-plaintext highlighter-rouge">ssh</code> to your bastion, fire up gh-ost with the Aurora writer endpoint, and it hangs. gh-ost is waiting for a binlog entry that will never arrive, because Aurora’s <a href="https://github.com/github/gh-ost/blob/master/doc/rds.md">binlog filtering is enabled by default</a> and silently drops the events gh-ost needs to proceed.</p>

<p>Aurora MySQL uses a <a href="https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.Overview.html">shared-storage architecture</a>: the Aurora overview describes reader instances as connecting “to the same storage volume as the primary DB instance” rather than replaying binlog events. This means gh-ost needs <a href="https://github.com/github/gh-ost/blob/master/doc/rds.md">a complex cross-cluster replication setup</a> and specific parameter changes to function on Aurora, instead of the straightforward single-cluster operation you get on standard MySQL.</p>

<p>dbsafe detects Aurora automatically and steers you toward pt-osc, which uses DML triggers and standard SQL rather than binlog streaming. No cross-cluster setup required.</p>

<blockquote>
  <p><strong>This is part of the dbsafe series.</strong> <a href="/mysql/tools/2026/02/14/introducing-dbsafe-know-before-you-alter.html">Introducing dbsafe</a> covers installation and full feature overview. <a href="/mysql/tools/2026/02/22/copy-algorithm-mysql-dbsafe.html">When MySQL Rebuilds Your Table: Understanding COPY Algorithm DDL</a> covers which operations trigger a full table rebuild, the context that makes tool selection critical. <a href="/mysql/tools/2026/02/28/foreign-key-impact-ddl-dbsafe.html">Foreign Keys and Schema Changes</a> covers another case where gh-ost is excluded.</p>
</blockquote>

<h2 id="connecting-with-tls">Connecting with TLS</h2>

<p>Cloud MySQL endpoints typically require encrypted connections. dbsafe supports TLS natively via the <code class="language-plaintext highlighter-rouge">--tls</code> flag:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dbsafe plan <span class="se">\</span>
  <span class="nt">-H</span> my-cluster.cluster-abc123.us-east-1.rds.amazonaws.com <span class="se">\</span>
  <span class="nt">-u</span> dbsafe_ro <span class="se">\</span>
  <span class="nt">--tls</span> required <span class="se">\</span>
  <span class="nt">-d</span> myapp <span class="se">\</span>
  <span class="s2">"ALTER TABLE orders ADD COLUMN fulfillment_id VARCHAR(50)"</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">--tls</code> flag accepts five modes:</p>

<table>
  <thead>
    <tr>
      <th>Mode</th>
      <th>Behavior</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">disabled</code></td>
      <td>No encryption. Rejected by most cloud endpoints.</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">preferred</code></td>
      <td>Encrypt if the server supports it, fall back to plaintext if not.</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">required</code></td>
      <td>Encrypt or fail. Does not verify the server certificate.</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">skip-verify</code></td>
      <td>Encrypt, but skip certificate hostname verification. Useful for tunnels.</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">custom</code></td>
      <td>Encrypt with full certificate verification against a CA you provide.</td>
    </tr>
  </tbody>
</table>

<p>For AWS environments, <code class="language-plaintext highlighter-rouge">required</code> is the minimum. For strict certificate verification (recommended for production), use <code class="language-plaintext highlighter-rouge">custom</code> with the <code class="language-plaintext highlighter-rouge">--tls-ca</code> flag pointing to the <a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/UsingWithRDS.SSL.html">AWS RDS CA bundle</a>:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dbsafe plan <span class="se">\</span>
  <span class="nt">-H</span> my-cluster.cluster-abc123.us-east-1.rds.amazonaws.com <span class="se">\</span>
  <span class="nt">-u</span> dbsafe_ro <span class="se">\</span>
  <span class="nt">--tls</span> custom <span class="se">\</span>
  <span class="nt">--tls-ca</span> /path/to/aws-rds-global-bundle.pem <span class="se">\</span>
  <span class="nt">-d</span> myapp <span class="se">\</span>
  <span class="s2">"ALTER TABLE orders ADD COLUMN fulfillment_id VARCHAR(50)"</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">--tls-ca</code> flag loads the CA certificate file and verifies the server’s certificate chain against it. This is the equivalent of MySQL’s <code class="language-plaintext highlighter-rouge">--ssl-ca</code> option. AWS publishes a <a href="https://truststore.pki.rds.amazonaws.com/global/global-bundle.pem">global CA bundle</a> that covers all commercial AWS regions. Download it once and reference it in your dbsafe config profile.</p>
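<p>One way to do the download step, keeping the bundle next to your dbsafe config (the destination path is just a suggestion):</p>

```shell
# Fetch the AWS global CA bundle once; reference this path via --tls-ca
# or the tls_ca key in ~/.dbsafe/config.yaml
curl -sS --fail -o ~/.dbsafe/aws-rds-global-bundle.pem \
  https://truststore.pki.rds.amazonaws.com/global/global-bundle.pem
```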

<h2 id="aurora-auto-detection">Aurora Auto-Detection</h2>

<p>When dbsafe connects to a MySQL instance, it checks the <code class="language-plaintext highlighter-rouge">basedir</code> system variable. Aurora instances have a distinctive <code class="language-plaintext highlighter-rouge">basedir</code> that contains the Aurora version string:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="o">@@</span><span class="k">version</span><span class="p">,</span> <span class="o">@@</span><span class="n">basedir</span><span class="p">;</span>
<span class="c1">-- @@version = 8.0.28</span>
<span class="c1">-- @@basedir = /rdsdbbin/oscar-8.0.mysql_aurora.3.04.0.0.32961.0/</span>
</code></pre></div></div>

<p>Note that <code class="language-plaintext highlighter-rouge">@@version</code> returns the MySQL compatibility version (<code class="language-plaintext highlighter-rouge">8.0.28</code>), not the Aurora version. The Aurora version string only appears in <code class="language-plaintext highlighter-rouge">basedir</code>. dbsafe parses it and extracts:</p>

<ul>
  <li><strong>Flavor</strong>: <code class="language-plaintext highlighter-rouge">aurora-mysql</code> (detected from the <code class="language-plaintext highlighter-rouge">mysql_aurora</code> substring in <code class="language-plaintext highlighter-rouge">basedir</code>)</li>
  <li><strong>Aurora version</strong>: <code class="language-plaintext highlighter-rouge">3.04.0</code> (extracted from the <code class="language-plaintext highlighter-rouge">basedir</code> path)</li>
  <li><strong>Effective MySQL version</strong>: <code class="language-plaintext highlighter-rouge">8.0.28</code> (from <code class="language-plaintext highlighter-rouge">@@version</code>), used for algorithm detection (INSTANT DDL eligibility, etc.)</li>
</ul>
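<p>The parsing rule is simple enough to sketch in a few lines of shell. This is an illustration of the detection logic described above, not dbsafe’s actual implementation:</p>

```shell
# Example @@basedir value from the query above
basedir="/rdsdbbin/oscar-8.0.mysql_aurora.3.04.0.0.32961.0/"

# Flavor: the mysql_aurora substring marks Aurora; rdsdbbin alone marks RDS
case "$basedir" in
  *mysql_aurora*) flavor="aurora-mysql" ;;
  *rdsdbbin*)     flavor="rds-mysql" ;;
  *)              flavor="mysql" ;;
esac

# Aurora version: the first three numeric components after "mysql_aurora."
aurora_version=$(printf '%s' "$basedir" \
  | sed -n 's/.*mysql_aurora\.\([0-9]*\.[0-9]*\.[0-9]*\).*/\1/p')

echo "$flavor $aurora_version"   # aurora-mysql 3.04.0
```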

<p>This distinction matters because Aurora’s MySQL compatibility is not always the latest patch. According to the <a href="https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Updates.MySQLBugs.html">Aurora MySQL version mapping</a>, Aurora 3.04.0 is compatible with MySQL 8.0.28. If a DDL feature was introduced in MySQL 8.0.29 (like <a href="/mysql/tools/2026/02/21/instant-ddl-mysql-dbsafe.html">INSTANT ADD COLUMN at any position</a>), dbsafe correctly reports that the feature is unavailable on this Aurora version, even though the base MySQL 8.0 branch supports it.</p>

<p>Here’s <code class="language-plaintext highlighter-rouge">dbsafe plan</code> against an Aurora Writer for a COPY-algorithm operation:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">DBSAFE_PASSWORD</span><span class="o">=</span>mypassword dbsafe plan <span class="se">\</span>
  <span class="nt">-H</span> my-cluster.cluster-abc123.us-east-1.rds.amazonaws.com <span class="se">\</span>
  <span class="nt">-u</span> dbsafe_ro <span class="se">\</span>
  <span class="nt">--tls</span> required <span class="se">\</span>
  <span class="nt">-d</span> myapp <span class="se">\</span>
  <span class="s2">"ALTER TABLE orders MODIFY COLUMN status VARCHAR(100)"</span>
</code></pre></div></div>

<p><img src="/assets/img/gallery/dbsafe-aurora-writer.png" alt="dbsafe plan output against Aurora Writer showing aurora-mysql flavor, COPY algorithm, DANGEROUS risk, and pt-osc recommendation with gh-ost excluded" /></p>

<p>The output shows the Aurora flavor, the effective MySQL version used for algorithm detection, and the topology. For a COPY operation, gh-ost is excluded and pt-osc is recommended, with the reason displayed inline.</p>

<h2 id="aurora-writer-vs-reader-detection">Aurora Writer vs Reader Detection</h2>

<p>The <a href="https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.BestPractices.html">Aurora MySQL best practices</a> documentation recommends checking the <code class="language-plaintext highlighter-rouge">innodb_read_only</code> global variable to determine whether you are connected to a writer or reader instance:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SHOW</span> <span class="k">GLOBAL</span> <span class="n">VARIABLES</span> <span class="k">LIKE</span> <span class="s1">'innodb_read_only'</span><span class="p">;</span>
</code></pre></div></div>

<p>The variable is <code class="language-plaintext highlighter-rouge">OFF</code> on the writer and <code class="language-plaintext highlighter-rouge">ON</code> on reader instances. dbsafe queries this variable and reports the topology:</p>

<ul>
  <li><strong>Aurora Writer</strong>: the instance accepting writes. This is where DDL should run.</li>
  <li><strong>Aurora Reader</strong>: a read-only instance. DDL executed here will fail.</li>
</ul>
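<p>The classification can be mirrored in a small shell helper. A sketch only; the <code class="language-plaintext highlighter-rouge">role_for</code> function is hypothetical, not part of dbsafe:</p>

```shell
# Classify an instance from the value of @@innodb_read_only, e.g. obtained
# with: mysql -h "$HOST" -N -e "SELECT @@innodb_read_only"
# SELECT returns 0/1; SHOW GLOBAL VARIABLES returns OFF/ON, so accept both.
role_for() {
  case "$1" in
    1|ON) echo "Aurora Reader (read-only: DDL here will fail)" ;;
    *)    echo "Aurora Writer (DDL runs here)" ;;
  esac
}

role_for 0    # Aurora Writer (DDL runs here)
role_for ON   # Aurora Reader (read-only: DDL here will fail)
```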

<p>When dbsafe detects an Aurora Reader, it surfaces a warning: run your schema change on the Writer endpoint instead. This catches a common mistake: connecting to the reader endpoint (the <code class="language-plaintext highlighter-rouge">-ro</code> suffix in the DNS name) when you meant the cluster endpoint.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Connecting to the reader endpoint by mistake</span>
<span class="nv">DBSAFE_PASSWORD</span><span class="o">=</span>mypassword dbsafe plan <span class="se">\</span>
  <span class="nt">-H</span> my-cluster.cluster-ro-abc123.us-east-1.rds.amazonaws.com <span class="se">\</span>
  <span class="nt">-u</span> dbsafe_ro <span class="se">\</span>
  <span class="nt">--tls</span> required <span class="se">\</span>
  <span class="nt">-d</span> myapp <span class="se">\</span>
  <span class="s2">"ALTER TABLE orders ADD COLUMN priority INT"</span>
</code></pre></div></div>

<p><img src="/assets/img/gallery/dbsafe-aurora-reader-warning.png" alt="dbsafe plan output against Aurora Reader showing read_only warning and recommendation to run DDL on the Writer instance" /></p>

<p>The analysis still runs. You see the algorithm, risk, and table metadata, but the warning makes it clear that executing this DDL here would fail. The Reader endpoint is for <code class="language-plaintext highlighter-rouge">SELECT</code> queries and read-only analysis, not schema changes.</p>

<h2 id="why-gh-ost-needs-special-handling-on-aurora">Why gh-ost Needs Special Handling on Aurora</h2>

<p>gh-ost’s architecture depends on <a href="https://github.com/github/gh-ost/blob/master/doc/triggerless-design.md">binlog streaming</a> to capture row changes. As the gh-ost docs describe, it “pretends to be a MySQL replica: it connects to the MySQL server and begins requesting for binlog events as though it were a real replication server.” When the shadow table is caught up, it performs a cut-over swap.</p>

<p>On traditional MySQL replication, this works because the binlog is the authoritative source of changes. Replicas apply binlog events to stay in sync. gh-ost taps into the same stream.</p>

<p>On Aurora, the architecture is different. The <a href="https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.Overview.html">Aurora overview</a> describes reader instances as connecting “to the same storage volume as the primary DB instance.” The <a href="https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.Replication.html">Aurora replication documentation</a> notes that updates “are visible to all Aurora Replicas with minimal replica lag, usually much less than 100 milliseconds after the primary instance has written an update.” This replication happens through the shared storage layer, not through binlog replay.</p>

<p>The <a href="https://github.com/github/gh-ost/blob/master/doc/rds.md">gh-ost RDS documentation</a> describes the specific obstacles on Aurora:</p>

<ol>
  <li>
    <p><strong>Binlog filtering</strong>: Aurora enables <code class="language-plaintext highlighter-rouge">aurora_enable_repl_bin_log_filtering</code> by default. The gh-ost docs explain the consequence: “gh-ost waits for an entry in the binlog to proceed but this entry will never end up in the binlog because it gets filtered out.” You must set this parameter to <code class="language-plaintext highlighter-rouge">0</code> before running gh-ost and restore it to <code class="language-plaintext highlighter-rouge">1</code> afterward.</p>
  </li>
  <li>
    <p><strong>Master detection</strong>: gh-ost detects it is running on the master even when connected to a reader endpoint, because all Aurora instances share the same storage. The workaround requires setting up a separate Aurora cluster configured as a binlog replica, following the <a href="https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Replication.CrossRegion.html">Aurora cross-cluster replication</a> documentation.</p>
  </li>
  <li>
    <p><strong>Preflight requirements</strong>: the gh-ost RDS docs list a preflight checklist including a secondary cluster, consistent parameters, verified replication status, and backup retention exceeding 1 day.</p>
  </li>
</ol>
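<p>For reference, the binlog-filtering toggle in step 1 is a cluster parameter change. With the AWS CLI it would look roughly like this; the parameter group name is a placeholder, and you should confirm the apply behavior for your Aurora version before relying on it:</p>

```shell
# Disable Aurora binlog filtering before running gh-ost...
aws rds modify-db-cluster-parameter-group \
  --db-cluster-parameter-group-name my-aurora-params \
  --parameters "ParameterName=aurora_enable_repl_bin_log_filtering,ParameterValue=0,ApplyMethod=immediate"

# ...and restore the default afterward
aws rds modify-db-cluster-parameter-group \
  --db-cluster-parameter-group-name my-aurora-params \
  --parameters "ParameterName=aurora_enable_repl_bin_log_filtering,ParameterValue=1,ApplyMethod=immediate"
```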

<p>This is not a simple “add two flags” situation like RDS standalone. It requires provisioning infrastructure (a second Aurora cluster), changing Aurora parameters, and managing cross-cluster replication state.</p>

<p>dbsafe detects <code class="language-plaintext highlighter-rouge">aurora-mysql</code> as the flavor and excludes gh-ost from the tool recommendation. pt-osc takes a different path: it creates DML triggers on the original table and populates the shadow table with standard SQL INSERT/UPDATE/DELETE statements (the <a href="https://docs.percona.com/percona-toolkit/pt-online-schema-change.html">pt-osc documentation</a>: “the tool creates triggers on the original table to update the corresponding rows in the new table”). Those statements go through Aurora’s storage layer like any other application write, so pt-osc runs on Aurora without any special infrastructure.</p>

<h2 id="rds-standalone-detection">RDS Standalone Detection</h2>

<p>For non-Aurora RDS instances (standard MySQL on RDS), dbsafe performs best-effort detection by checking the <code class="language-plaintext highlighter-rouge">basedir</code> system variable. RDS instances have <code class="language-plaintext highlighter-rouge">basedir</code> set to a path containing <code class="language-plaintext highlighter-rouge">rdsdbbin</code>:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- On RDS:</span>
<span class="k">SELECT</span> <span class="o">@@</span><span class="n">basedir</span><span class="p">;</span>
<span class="c1">-- /rdsdbbin/mysql-8.0.45.R3/</span>
</code></pre></div></div>

<p>When dbsafe detects this pattern, it sets a <code class="language-plaintext highlighter-rouge">cloud-managed</code> flag. The implications for schema changes:</p>

<ul>
  <li><strong>No SSH access</strong>: you cannot install or run gh-ost on the RDS host itself. gh-ost must run from an external host with <code class="language-plaintext highlighter-rouge">--allow-on-master</code>.</li>
  <li><strong>No SUPER privilege</strong>: RDS does not grant <code class="language-plaintext highlighter-rouge">SUPER</code>. The <a href="https://github.com/github/gh-ost/blob/master/doc/requirements-and-limitations.md">gh-ost requirements</a> document that you can use <code class="language-plaintext highlighter-rouge">--assume-rbr</code> to avoid the <code class="language-plaintext highlighter-rouge">STOP SLAVE/START SLAVE</code> operations that require <code class="language-plaintext highlighter-rouge">SUPER</code>, as long as your replication is already in <code class="language-plaintext highlighter-rouge">binlog_format=ROW</code>.</li>
  <li><strong>pt-osc works without extra configuration</strong>: it connects as a regular MySQL client and uses DML triggers. No special flags needed for RDS.</li>
</ul>
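<p>Put together, a gh-ost run against RDS standalone looks roughly like this. It is a sketch with placeholder host, user, and table values; dbsafe generates the full command for you:</p>

```shell
# Run from an external host (you cannot SSH into the RDS box).
# --allow-on-master : there is no binlog replica to attach to, so gh-ost
#                     reads the binlog from the master itself
# --assume-rbr      : skips the STOP SLAVE/START SLAVE step, which would
#                     need SUPER (not granted on RDS); requires ROW binlogs
gh-ost \
  --host=mydb.abc123.us-east-1.rds.amazonaws.com \
  --user=ghost_user \
  --password="$GHOST_PASSWORD" \
  --database=myapp \
  --table=orders \
  --alter="MODIFY COLUMN status VARCHAR(100)" \
  --allow-on-master \
  --assume-rbr \
  --ssl \
  --execute
```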

<p>Here’s <code class="language-plaintext highlighter-rouge">dbsafe plan</code> against an RDS standalone instance:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">DBSAFE_PASSWORD</span><span class="o">=</span>mypassword dbsafe plan <span class="se">\</span>
  <span class="nt">-H</span> mydb.abc123.us-east-1.rds.amazonaws.com <span class="se">\</span>
  <span class="nt">-u</span> dbsafe_ro <span class="se">\</span>
  <span class="nt">--tls</span> required <span class="se">\</span>
  <span class="nt">-d</span> myapp <span class="se">\</span>
  <span class="s2">"ALTER TABLE orders MODIFY COLUMN status VARCHAR(100)"</span>
</code></pre></div></div>

<p><img src="/assets/img/gallery/dbsafe-rds-standalone.png" alt="dbsafe plan output against RDS standalone showing cloud-managed flag and gh-ost extra flags warning" /></p>

<p>Unlike Aurora, gh-ost is not excluded on RDS standalone. The <a href="https://github.com/github/gh-ost/blob/master/doc/rds.md">gh-ost RDS documentation</a> confirms that “gh-ost has been updated to work with Amazon RDS.” dbsafe shows both gh-ost and pt-osc commands, with the gh-ost command including the additional flags needed for RDS.</p>

<h2 id="configuration-profiles-for-cloud">Configuration Profiles for Cloud</h2>

<p>Typing <code class="language-plaintext highlighter-rouge">--tls required --tls-ca /path/to/cert.pem</code> on every command is tedious. dbsafe supports <a href="/mysql/tools/2026/02/14/introducing-dbsafe-know-before-you-alter.html">configuration profiles</a> in <code class="language-plaintext highlighter-rouge">~/.dbsafe/config.yaml</code> that store connection defaults:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">connections</span><span class="pi">:</span>
  <span class="na">aurora-prod</span><span class="pi">:</span>
    <span class="na">host</span><span class="pi">:</span> <span class="s">my-cluster.cluster-abc123.us-east-1.rds.amazonaws.com</span>
    <span class="na">port</span><span class="pi">:</span> <span class="m">3306</span>
    <span class="na">user</span><span class="pi">:</span> <span class="s">dbsafe_ro</span>
    <span class="na">database</span><span class="pi">:</span> <span class="s">myapp</span>
    <span class="na">tls</span><span class="pi">:</span> <span class="s">required</span>
    <span class="na">tls_ca</span><span class="pi">:</span> <span class="s">/path/to/aws-rds-global-bundle.pem</span>

  <span class="na">rds-staging</span><span class="pi">:</span>
    <span class="na">host</span><span class="pi">:</span> <span class="s">staging.abc123.us-east-1.rds.amazonaws.com</span>
    <span class="na">port</span><span class="pi">:</span> <span class="m">3306</span>
    <span class="na">user</span><span class="pi">:</span> <span class="s">dbsafe_ro</span>
    <span class="na">database</span><span class="pi">:</span> <span class="s">myapp_staging</span>
    <span class="na">tls</span><span class="pi">:</span> <span class="s">required</span>

<span class="na">defaults</span><span class="pi">:</span>
  <span class="na">format</span><span class="pi">:</span> <span class="s">text</span>
</code></pre></div></div>

<p>With this config, your daily workflow becomes:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Analyze against Aurora production (TLS and CA handled by the profile)</span>
<span class="nv">DBSAFE_PASSWORD</span><span class="o">=</span>mypassword dbsafe plan <span class="nt">--connection</span> aurora-prod <span class="se">\</span>
  <span class="s2">"ALTER TABLE orders ADD COLUMN fulfillment_id VARCHAR(50)"</span>

<span class="c"># Same analysis against RDS staging</span>
<span class="nv">DBSAFE_PASSWORD</span><span class="o">=</span>mypassword dbsafe plan <span class="nt">--connection</span> rds-staging <span class="se">\</span>
  <span class="s2">"ALTER TABLE orders ADD COLUMN fulfillment_id VARCHAR(50)"</span>
</code></pre></div></div>

<p>The password is passed via the <code class="language-plaintext highlighter-rouge">DBSAFE_PASSWORD</code> environment variable, never stored in the config file. For CI/CD pipelines, pull it from your secrets manager (AWS Secrets Manager, Vault, etc.) into the environment before invoking dbsafe.</p>
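<p>A minimal sketch of the secrets-manager step for a pipeline, using AWS Secrets Manager; the secret id <code class="language-plaintext highlighter-rouge">prod/dbsafe</code> is a placeholder for your own:</p>

```shell
# Pull the password into the environment just for this job; it never
# touches the config file or the shell history of an interactive session
DBSAFE_PASSWORD=$(aws secretsmanager get-secret-value \
  --secret-id prod/dbsafe \
  --query SecretString \
  --output text)
export DBSAFE_PASSWORD

dbsafe plan --connection aurora-prod \
  "ALTER TABLE orders ADD COLUMN fulfillment_id VARCHAR(50)"
```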

<h2 id="practical-workflow-for-cloud-schema-changes">Practical Workflow for Cloud Schema Changes</h2>

<p><strong>1. Set up a config profile</strong> for each environment (production Aurora, staging RDS, etc.) with TLS and CA certificate paths. Do this once.</p>

<p><strong>2. Run <code class="language-plaintext highlighter-rouge">dbsafe plan</code> against the production Writer</strong> endpoint. Verify the topology shows <code class="language-plaintext highlighter-rouge">Aurora Writer</code> (not Reader) and the effective MySQL version matches your expectations.</p>

<p><strong>3. Check the algorithm.</strong> INSTANT operations are safe on Aurora. They complete in milliseconds just like on standalone MySQL. The shared storage architecture doesn’t affect metadata-only changes.</p>

<p><strong>4. For COPY operations, use pt-osc.</strong> gh-ost is excluded on Aurora because the cross-cluster setup it requires is impractical for routine schema changes. dbsafe generates the pt-osc command pre-populated with your connection parameters. If the table has triggers, <code class="language-plaintext highlighter-rouge">--preserve-triggers</code> is included automatically.</p>

<p><strong>5. For RDS standalone COPY operations, choose your tool.</strong> Both gh-ost and pt-osc work, but gh-ost requires <code class="language-plaintext highlighter-rouge">--allow-on-master</code> and <code class="language-plaintext highlighter-rouge">--assume-rbr</code>. dbsafe includes these flags in the generated command. pt-osc works without extra configuration.</p>

<p><strong>6. Never run DDL on a Reader endpoint.</strong> dbsafe warns you, but the best practice is to always use the cluster endpoint (which routes to the Writer) rather than a specific instance endpoint.</p>

<p><strong>7. For CI/CD gates</strong>, use <code class="language-plaintext highlighter-rouge">dbsafe plan --format json</code> with the cloud profile:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">RESULT</span><span class="o">=</span><span class="si">$(</span><span class="nv">DBSAFE_PASSWORD</span><span class="o">=</span><span class="s2">"</span><span class="nv">$DB_PASSWORD</span><span class="s2">"</span> dbsafe plan <span class="se">\</span>
  <span class="nt">--connection</span> aurora-prod <span class="se">\</span>
  <span class="nt">--format</span> json <span class="se">\</span>
  <span class="s2">"ALTER TABLE orders ADD COLUMN fulfillment_id VARCHAR(50)"</span><span class="si">)</span>

<span class="nv">ALGORITHM</span><span class="o">=</span><span class="si">$(</span><span class="nb">echo</span> <span class="s2">"</span><span class="nv">$RESULT</span><span class="s2">"</span> | jq <span class="nt">-r</span> <span class="s1">'.algorithm'</span><span class="si">)</span>
<span class="k">if</span> <span class="o">[</span> <span class="s2">"</span><span class="nv">$ALGORITHM</span><span class="s2">"</span> <span class="o">!=</span> <span class="s2">"INSTANT"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
  </span><span class="nb">echo</span> <span class="s2">"Non-INSTANT DDL on Aurora: requires pt-osc migration plan"</span>
  <span class="nb">exit </span>1
<span class="k">fi</span>
</code></pre></div></div>

<h2 id="summary">Summary</h2>

<ol>
  <li>
    <p><strong>TLS is standard for cloud MySQL.</strong> dbsafe supports five TLS modes via <code class="language-plaintext highlighter-rouge">--tls</code>, with <code class="language-plaintext highlighter-rouge">--tls-ca</code> for custom CA certificate verification against the <a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/UsingWithRDS.SSL.html">AWS RDS CA bundle</a>. Use <code class="language-plaintext highlighter-rouge">required</code> at minimum; <code class="language-plaintext highlighter-rouge">custom</code> with the CA bundle for full certificate verification.</p>
  </li>
  <li>
    <p><strong>Aurora auto-detection</strong> parses the <code class="language-plaintext highlighter-rouge">basedir</code> variable (which contains <code class="language-plaintext highlighter-rouge">mysql_aurora</code> and the Aurora version) to identify the flavor. The effective MySQL version comes from <code class="language-plaintext highlighter-rouge">@@version</code>. The <a href="https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Updates.MySQLBugs.html">Aurora version mapping</a> determines INSTANT DDL eligibility based on the effective MySQL version, not the Aurora release number.</p>
  </li>
  <li>
    <p><strong>Aurora Writer/Reader topology</strong> is detected via the <code class="language-plaintext highlighter-rouge">innodb_read_only</code> variable, as <a href="https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.BestPractices.html">recommended by AWS</a>. Reader instances get a warning: DDL must run on the Writer. This catches the common mistake of connecting to the <code class="language-plaintext highlighter-rouge">-ro</code> endpoint.</p>
  </li>
  <li>
    <p><strong>gh-ost requires a complex setup on Aurora.</strong> The <a href="https://github.com/github/gh-ost/blob/master/doc/rds.md">gh-ost RDS documentation</a> documents three obstacles: default binlog filtering that blocks gh-ost events, master detection on all endpoints due to shared storage, and the need for a separate cross-cluster replication target. dbsafe excludes gh-ost on Aurora and recommends pt-osc, which works with standard SQL and requires no special infrastructure.</p>
  </li>
  <li>
    <p><strong>RDS standalone detection</strong> is best-effort via the <code class="language-plaintext highlighter-rouge">basedir</code> variable. gh-ost <a href="https://github.com/github/gh-ost/blob/master/doc/rds.md">works on RDS</a> but requires <code class="language-plaintext highlighter-rouge">--allow-on-master</code> and <code class="language-plaintext highlighter-rouge">--assume-rbr</code>. dbsafe includes these in the generated command. pt-osc works without extra configuration.</p>
  </li>
  <li>
    <p><strong>Configuration profiles</strong> in <code class="language-plaintext highlighter-rouge">~/.dbsafe/config.yaml</code> store host, TLS, and CA paths per environment. Passwords go in <code class="language-plaintext highlighter-rouge">DBSAFE_PASSWORD</code>, never in the config file.</p>
  </li>
</ol>

<h2 id="references">References</h2>

<p><strong>AWS Documentation:</strong></p>
<ul>
  <li><a href="https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.Overview.html">Amazon Aurora Overview</a></li>
  <li><a href="https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.Replication.html">Aurora Replication</a></li>
  <li><a href="https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Updates.MySQLBugs.html">Aurora MySQL Version Mapping</a></li>
  <li><a href="https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.BestPractices.html">Aurora MySQL Best Practices</a></li>
  <li><a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/UsingWithRDS.SSL.html">Using SSL/TLS with RDS</a></li>
  <li><a href="https://truststore.pki.rds.amazonaws.com/global/global-bundle.pem">RDS CA Certificate Bundle</a></li>
</ul>

<p><strong>Tools:</strong></p>
<ul>
  <li><a href="https://github.com/nethalo/dbsafe">dbsafe, GitHub Repository</a></li>
  <li><a href="https://docs.percona.com/percona-toolkit/pt-online-schema-change.html">pt-online-schema-change, Percona Toolkit</a></li>
  <li><a href="https://github.com/github/gh-ost">gh-ost, GitHub’s Online Schema Migration Tool</a></li>
  <li><a href="https://github.com/github/gh-ost/blob/master/doc/triggerless-design.md">gh-ost Triggerless Design</a></li>
  <li><a href="https://github.com/github/gh-ost/blob/master/doc/rds.md">gh-ost on RDS and Aurora</a></li>
  <li><a href="https://github.com/github/gh-ost/blob/master/doc/requirements-and-limitations.md">gh-ost Requirements and Limitations</a></li>
</ul>

<p><strong>Related Posts:</strong></p>
<ul>
  <li><a href="/mysql/tools/2026/02/14/introducing-dbsafe-know-before-you-alter.html">Introducing dbsafe: Know Before You ALTER</a></li>
  <li><a href="/mysql/tools/2026/02/21/instant-ddl-mysql-dbsafe.html">Zero-Downtime Schema Changes with INSTANT DDL</a></li>
  <li><a href="/mysql/tools/2026/02/22/copy-algorithm-mysql-dbsafe.html">When MySQL Rebuilds Your Table: Understanding COPY Algorithm DDL</a></li>
  <li><a href="/mysql/tools/2026/02/28/foreign-key-impact-ddl-dbsafe.html">Foreign Keys and Schema Changes: The Constraint You Didn’t Plan For</a></li>
</ul>]]></content>
    <author>
      <name>Daniel Guzman Burgos</name>
    </author>
    <category term="mysql"/>
    <category term="tools"/>
    <category term="ddl"/>
    <category term="aurora"/>
    <category term="rds"/>
    <category term="aws"/>
    <category term="tls"/>
    <category term="dbsafe"/>
    <category term="pt-online-schema-change"/>
    <summary type="html"><![CDATA[Run dbsafe against Aurora MySQL and RDS with TLS. Learn how Aurora's shared-storage architecture complicates gh-ost and why dbsafe auto-detects cloud topology.]]></summary>
  </entry>
  <entry>
    <title type="html">Before the Index, Before the Schema: MySQL Makes Three Promises</title>
    <link href="https://rendiment.io/mysql/fundamentals/2026/03/03/before-the-index-mysql-three-promises.html" rel="alternate" type="text/html" title="Before the Index, Before the Schema: MySQL Makes Three Promises"/>
    <published>2026-03-03T00:00:00-05:00</published>
    <updated>2026-03-03T00:00:00-05:00</updated>
    <id>https://rendiment.io/mysql/fundamentals/2026/03/03/before-the-index-mysql-three-promises</id>
    <content type="html" xml:base="https://rendiment.io/mysql/fundamentals/2026/03/03/before-the-index-mysql-three-promises.html"><![CDATA[<p>Most people, when asked what a database does, say something like: “it stores data.”</p>

<p>That’s like saying a restaurant “stores food.”</p>

<p>Technically true. Completely misses the point.</p>

<p>A restaurant has to cook fast, serve many tables at once, and not poison anyone. Fail any one of those three and it doesn’t matter how good the kitchen looks. A database has the same problem — except the stakes are your production system at 2am.</p>

<p>A few years ago I gave a talk at Percona Live in Denver where I tried to answer this properly. Not from a features list. Not from a vendor slide deck. From first principles: what does a database <em>have</em> to do?</p>

<p>Three things. Everything else — every configuration parameter, every architecture decision, every incident you’ve ever fought — falls into one of them.</p>

<hr />

<h2 id="execute-queries">Execute Queries</h2>

<p><img src="/assets/img/gallery/what-databases-do-execute.jpg" alt="MySQL query execution flow through InnoDB — buffer pool, redo log, and doublewrite buffer working together" /></p>

<p>A restaurant has one core job: take an order and bring food to the table. Fast, correct, and for as many tables as possible simultaneously.</p>

<p>A database has the same job. Answer questions about data. Record changes. As fast as possible, as many as possible, without corrupting anything in the process.</p>

<p>That last part is the one that gets sacrificed first when you’re optimizing for speed. InnoDB’s entire machinery — the buffer pool, the redo log, the doublewrite buffer — exists to make sure “fast” and “correct” happen at the same time. ACID isn’t a marketing term. It’s the contract the database makes with every query it executes.</p>

<p>The tension is real. Disabling <code class="language-plaintext highlighter-rouge">foreign_key_checks</code> before a bulk load makes the operation faster. It also removes a correctness guarantee while it’s disabled. That tradeoff isn’t inherently wrong — but you can only make it deliberately if you understand what you’re trading. If you’re curious about the hidden consequences of foreign keys, I covered one particularly dangerous scenario in <a href="/mysql/tools/2026/02/25/mysql-fk-cascade-blind-spot.html">the ON DELETE CASCADE blind spot in MySQL’s binary log</a>.</p>

<p>When a query is slow, the reflex is to reach for indexes. Sometimes that’s right. But a query can also be slow because lock contention is serializing execution, because the working set stopped fitting in the buffer pool, or because something upstream is flooding the connection pool. Same symptom, completely different root causes, completely different solutions. Knowing the responsibility narrows the search. Understanding <a href="/mysql/innodb/2024/09/01/innodb-semaphore-contention.html">InnoDB semaphore contention</a> is one way to tell lock contention apart from other causes.</p>

<hr />

<h2 id="relationships">Relationships</h2>

<p><img src="/assets/img/gallery/what-databases-do-relationships.jpg" alt="Database relationships — users, replicas, and dev/ops teams all depend on the database" /></p>

<p>No database is an island.</p>

<p>Think of it like a person who has three very different kinds of relationships in their life — and does a bad job with any one of them at their own peril.</p>

<p><strong>With users</strong>, the relationship is trust and boundaries. Who gets in, what they can see, what they can touch. MySQL’s account model — hosts, privileges, roles — is the entire machinery for this. When someone asks why the application can’t just run as root, this is why. The database has a responsibility to protect data from people and systems that shouldn’t have it. That responsibility doesn’t disappear because setting it up is inconvenient.</p>

<p><strong>With other databases</strong>, the relationship is coordination. A replica trusts that the primary is sending it a faithful copy of reality. A PXC node trusts that the other nodes in the cluster will agree on the same writes. When <code class="language-plaintext highlighter-rouge">wsrep_local_recv_queue</code> starts climbing, the cluster is telling you a relationship is under stress — one node can’t keep up with what the others are sending. It’s a relationship problem before it’s a performance problem. Treating it as a performance problem first is how you end up chasing the wrong metric.</p>
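
<p>If the cluster is Galera/PXC, that relationship stress is directly observable from status counters. A quick check, using standard wsrep status variables:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Current depth of the replication receive queue, and its recent average.
-- A sustained non-zero average means this node cannot keep up with its peers.
SHOW GLOBAL STATUS LIKE 'wsrep_local_recv_queue';
SHOW GLOBAL STATUS LIKE 'wsrep_local_recv_queue_avg';

-- Fraction of time flow control paused the whole cluster so this node
-- could catch up: the relationship under stress, quantified.
SHOW GLOBAL STATUS LIKE 'wsrep_flow_control_paused';
</code></pre></div></div>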

<p><strong>With dev and ops teams</strong>, the relationship is communication. Logs, status variables, Performance Schema — this is how the database talks. When you skip configuring the slow query log because it adds overhead, you’re choosing silence. You’ll regret that choice during the next incident, when you’re flying blind trying to reconstruct what happened. Tools like <a href="/mysql/postgresql/pmm/percona/monitoring/2024/09/19/pmm-query-analytics-clickhouse.html">PMM Query Analytics</a> exist precisely to bridge this communication gap.</p>

<p>A database that executes queries correctly but can’t communicate its state, can’t cooperate with peers, and can’t enforce who has access — is a ticking clock.</p>

<hr />

<h2 id="survive">Survive</h2>

<p><img src="/assets/img/gallery/what-databases-do-survive.jpg" alt="Database survival — CPU, memory, and disk as physical constraints the database must negotiate" /></p>

<p>This is the one nobody talks about at conferences, and it’s the one that kills you.</p>

<p>A database doesn’t run in the cloud. It runs on a machine. A machine with a CPU that can be saturated, memory that can be exhausted, and a disk that fills up and then — not slowly degrades, but <strong>stops</strong>. Full disk doesn’t slow MySQL down. It stops it cold.</p>

<p>Think of it like a tenant who has to know the rules of the building they live in. The landlord — the OS — controls memory allocation, file descriptors, I/O scheduling. The tenant can push their luck, but only so far before the landlord intervenes. An OOM kill at 3am is the landlord evicting a tenant who was using more than their share.</p>

<p><code class="language-plaintext highlighter-rouge">innodb_buffer_pool_size</code> is the most important negotiation a MySQL server has with its host machine. Too low and you’re leaving performance on the table. Too high on a box running other processes and you’re gambling that the OS won’t reclaim that memory mid-write. That configuration parameter isn’t a performance knob. It’s a survival decision.</p>
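
<p>The negotiation is measurable. Two counters tell you whether the working set still fits, no benchmark required:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Logical read requests vs. reads that had to go to disk.
-- If Innodb_buffer_pool_reads climbs quickly relative to read_requests,
-- the working set has outgrown the buffer pool.
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read_requests';
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';

-- The current allocation, in bytes:
SELECT @@innodb_buffer_pool_size;
</code></pre></div></div>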

<p>Disk is more insidious. A table that grows 100MB per day doesn’t look dangerous today. In six months it’s 18GB. The database won’t warn you. It will just stop one day. The monitoring that watches disk growth trends and alerts before the cliff — that’s not operational overhead. That’s the database fulfilling its responsibility to survive the physical world it lives in. Setting up <a href="/monitoring/prometheus/2024/09/10/smart-alerting-dynamic-thresholds.html">smart alerting with dynamic thresholds</a> is how you catch these slow-moving threats.</p>
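
<p>You don’t need a full monitoring stack to start watching this. A size snapshot from <code class="language-plaintext highlighter-rouge">information_schema</code>, taken on a schedule and diffed, is enough to see the trend (the figures come from InnoDB statistics, so treat them as estimates, not exact byte counts):</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Approximate on-disk footprint of the ten largest tables.
SELECT table_schema, table_name,
       ROUND((data_length + index_length) / 1024 / 1024) AS approx_mb
FROM information_schema.TABLES
WHERE table_schema NOT IN
      ('mysql', 'sys', 'performance_schema', 'information_schema')
ORDER BY (data_length + index_length) DESC
LIMIT 10;
</code></pre></div></div>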

<p>Backups live here too. A database that can’t be recovered after a failure didn’t survive. Full stop.</p>

<hr />

<h2 id="why-this-framework-matters">Why This Framework Matters</h2>

<p><img src="/assets/img/gallery/what-databases-do-framework.jpg" alt="The three promises as a diagnostic framework — locating a problem before solving it" /></p>

<p>These three categories won’t tell you how to fix anything. They’re not a checklist. What they give you is a way to locate a problem before you start solving it — and that matters more than most people admit.</p>

<p>Replica falling behind? Three possible zip codes:</p>

<ul>
  <li><strong>Execute Queries</strong> — the primary is running queries so heavy that the replica can’t replay them fast enough</li>
  <li><strong>Relationships</strong> — the network between primary and replica can’t carry the replication stream</li>
  <li><strong>Survive</strong> — the replica’s disk I/O is the bottleneck</li>
</ul>

<p>Same symptom. Three completely different tools. If you go straight to tuning queries when the real problem is disk throughput on the replica, you will waste hours.</p>

<p>The framework doesn’t solve the problem. It tells you which drawer to open first.</p>

<p>Every decision you make as a DBA is in service of one of these three things. Execute queries correctly and fast. Manage relationships with users, peers, and teams. Survive the physical constraints of the machine it runs on.</p>

<p>That’s the whole job.</p>

<hr />

<p><em>I first presented this framework at Percona Live in Denver. The talk was aimed at DBAs, but I’ve always believed that database fundamentals should be explainable to anyone — and that explaining them clearly forces a deeper understanding than talking only to specialists.</em></p>]]></content>
    <author>
      <name>Daniel Guzman Burgos</name>
    </author>
    <category term="mysql"/>
    <category term="fundamentals"/>
    <category term="architecture"/>
    <category term="dba"/>
    <summary type="html"><![CDATA[What is a database actually supposed to do? A framework of three core promises that applies whether you're a DBA, developer, or ops engineer.]]></summary>
  </entry>
  <entry>
    <title type="html">Foreign Keys and Schema Changes: The Constraint You Didn’t Plan For</title>
    <link href="https://rendiment.io/mysql/tools/2026/02/28/foreign-key-impact-ddl-dbsafe.html" rel="alternate" type="text/html" title="Foreign Keys and Schema Changes: The Constraint You Didn’t Plan For"/>
    <published>2026-02-28T00:00:00-05:00</published>
    <updated>2026-02-28T00:00:00-05:00</updated>
    <id>https://rendiment.io/mysql/tools/2026/02/28/foreign-key-impact-ddl-dbsafe</id>
    <content type="html" xml:base="https://rendiment.io/mysql/tools/2026/02/28/foreign-key-impact-ddl-dbsafe.html"><![CDATA[<p>You added a foreign key constraint when you designed the schema, verified referential integrity, and moved on. Two years later, you need to modify a column on the child table. You run <code class="language-plaintext highlighter-rouge">dbsafe plan</code> and the output shows a section you weren’t expecting: a FK listing with the constraint name, the referenced table, and <code class="language-plaintext highlighter-rouge">ON DELETE CASCADE</code>. What does that mean for your ALTER TABLE?</p>

<p>Foreign key constraints surface in two places during schema changes: in the metadata locks MySQL acquires for the duration of the DDL, and in the tool selection available to you. Understanding both before you run is exactly what <code class="language-plaintext highlighter-rouge">dbsafe plan</code> is for.</p>

<blockquote>
  <p><strong>This is the fifth post in the dbsafe series.</strong> <a href="/mysql/tools/2026/02/14/introducing-dbsafe-know-before-you-alter.html">Introducing dbsafe</a> covers installation and the full feature overview. <a href="/mysql/tools/2026/02/22/copy-algorithm-mysql-dbsafe.html">When MySQL Rebuilds Your Table: Understanding COPY Algorithm DDL</a> covers the operations that require a full table rebuild — the context you need before this post.</p>
</blockquote>

<h2 id="what-dbsafe-detects">What dbsafe Detects</h2>

<p>When analyzing a table, dbsafe queries <a href="https://dev.mysql.com/doc/refman/8.0/en/information-schema-key-column-usage-table.html"><code class="language-plaintext highlighter-rouge">information_schema.KEY_COLUMN_USAGE</code></a> and <a href="https://dev.mysql.com/doc/refman/8.0/en/information-schema-referential-constraints-table.html"><code class="language-plaintext highlighter-rouge">information_schema.REFERENTIAL_CONSTRAINTS</code></a> to surface every FK relationship involving the target table. The output shows the referenced tables and columns that the target table depends on.</p>
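
<p>These are the same views you can query by hand. A sketch of both directions against the <code class="language-plaintext highlighter-rouge">demo</code> schema used in this post — not dbsafe’s exact queries, but the same source data:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Child side: constraints where order_items references other tables.
SELECT rc.CONSTRAINT_NAME, kcu.COLUMN_NAME,
       kcu.REFERENCED_TABLE_NAME, kcu.REFERENCED_COLUMN_NAME,
       rc.UPDATE_RULE, rc.DELETE_RULE
FROM information_schema.REFERENTIAL_CONSTRAINTS rc
JOIN information_schema.KEY_COLUMN_USAGE kcu
  ON  kcu.CONSTRAINT_SCHEMA = rc.CONSTRAINT_SCHEMA
  AND kcu.CONSTRAINT_NAME   = rc.CONSTRAINT_NAME
WHERE rc.CONSTRAINT_SCHEMA = 'demo'
  AND rc.TABLE_NAME = 'order_items';

-- Parent side: constraints in other tables that reference orders.
SELECT kcu.TABLE_NAME AS child_table, kcu.CONSTRAINT_NAME
FROM information_schema.KEY_COLUMN_USAGE kcu
WHERE kcu.TABLE_SCHEMA = 'demo'
  AND kcu.REFERENCED_TABLE_NAME = 'orders';
</code></pre></div></div>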

<p>Here’s <code class="language-plaintext highlighter-rouge">dbsafe plan</code> on <code class="language-plaintext highlighter-rouge">order_items</code>, a child table with a FK pointing to <code class="language-plaintext highlighter-rouge">orders</code>:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dbsafe plan <span class="nt">-H</span> 127.0.0.1 <span class="nt">-P</span> 23306 <span class="nt">-u</span> dbsafe <span class="nt">-d</span> demo <span class="se">\</span>
  <span class="s2">"ALTER TABLE order_items ADD COLUMN note_text VARCHAR(500)"</span>
</code></pre></div></div>

<p><img src="/assets/img/gallery/fk-child-table.png" alt="dbsafe plan output for ALTER TABLE order_items ADD COLUMN note_text showing INSTANT algorithm, SAFE risk level, and FK refs line showing 2 references to orders.id and products.id" /></p>

<p>The <code class="language-plaintext highlighter-rouge">FK refs</code> line in the output shows the count and the referenced tables and columns — <code class="language-plaintext highlighter-rouge">orders.id</code> and <code class="language-plaintext highlighter-rouge">products.id</code>. At a glance you can see which parent tables this child depends on. Surfacing this requires querying <code class="language-plaintext highlighter-rouge">information_schema</code> separately; dbsafe includes it automatically in every analysis.</p>

<p>FK metadata appears regardless of the DDL algorithm. Even an INSTANT operation like <code class="language-plaintext highlighter-rouge">ADD COLUMN</code> surfaces the FK relationships, so you understand the full constraint context before changing anything.</p>

<h2 id="metadata-locks-extend-to-related-tables">Metadata Locks Extend to Related Tables</h2>

<p>The <a href="https://dev.mysql.com/doc/refman/8.0/en/metadata-locking.html">MySQL documentation on metadata locking</a> states:</p>

<blockquote>
  <p>“Metadata locks are extended, as necessary, to tables related by a foreign key constraint to prevent conflicting DML and DDL operations from executing concurrently on the related tables. When updating a parent table, a metadata lock is taken on the child table while updating foreign key metadata. Foreign key metadata is owned by the child table.”</p>
</blockquote>

<p>The <a href="https://dev.mysql.com/doc/refman/8.0/en/create-table-foreign-keys.html">FOREIGN KEY Constraints reference</a> adds the lock type detail:</p>

<blockquote>
  <p>“If a table is locked explicitly with LOCK TABLES, any tables related by a foreign key constraint are opened and locked implicitly. For foreign key checks, a shared read-only lock (LOCK TABLES READ) is taken on related tables. For cascading updates, a shared-nothing write lock (LOCK TABLES WRITE) is taken on related tables that are involved in the operation.”</p>
</blockquote>

<p>In practice:</p>

<ul>
  <li><strong>ALTER TABLE on the child table</strong> acquires a metadata lock on the parent table for the duration of the DDL. Any transaction on the parent that hasn’t committed must complete first. New transactions that need to modify the parent must wait.</li>
  <li><strong>ALTER TABLE on the parent table</strong> works in reverse: the lock protects the child table’s FK references from being invalidated while the parent schema is changing.</li>
</ul>
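
<p>You can watch the lock extension happen. While the ALTER runs in one session, inspect <code class="language-plaintext highlighter-rouge">performance_schema.metadata_locks</code> from another (the <code class="language-plaintext highlighter-rouge">wait/lock/metadata/sql/mdl</code> instrument is enabled by default in MySQL 8.0). Expect lock rows on both tables even though the DDL names only one:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Run this in a second session while the ALTER TABLE is in flight.
SELECT OBJECT_TYPE, OBJECT_SCHEMA, OBJECT_NAME,
       LOCK_TYPE, LOCK_STATUS, OWNER_THREAD_ID
FROM performance_schema.metadata_locks
WHERE OBJECT_SCHEMA = 'demo'
  AND OBJECT_NAME IN ('orders', 'order_items');
</code></pre></div></div>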

<p>The <a href="https://dev.mysql.com/doc/refman/8.0/en/innodb-online-ddl-operations.html">Online DDL Operations reference</a> documents the specific wait conditions that arise from CASCADE and SET NULL rules:</p>

<blockquote>
  <p>“An ALTER TABLE on the child table could wait for another transaction to commit, if a change to the parent table causes associated changes in the child table through an ON UPDATE or ON DELETE clause using the CASCADE or SET NULL parameters.”</p>
</blockquote>

<p>And in the other direction:</p>

<blockquote>
  <p>“In the same way, if a table is the parent table in a foreign key relationship, even though it does not contain any FOREIGN KEY clauses, it could wait for the ALTER TABLE to complete if an INSERT, UPDATE, or DELETE statement causes an ON UPDATE or ON DELETE action in the child table.”</p>
</blockquote>

<p>The practical consequence: an ALTER TABLE on <code class="language-plaintext highlighter-rouge">order_items</code> can block or be blocked by concurrent DML on <code class="language-plaintext highlighter-rouge">orders</code>, and vice versa — even though you’re only changing one table. On high-write systems this lock interaction can cause visible application latency during the DDL window. dbsafe’s FK section makes this non-obvious relationship explicit before you schedule the operation.</p>

<h2 id="the-copy-algorithm-on-fk-constrained-tables">The COPY Algorithm on FK-Constrained Tables</h2>

<p>When the DDL algorithm is COPY — a full table rebuild — the FK relationship adds a validation layer to every row written into the shadow table.</p>

<p>InnoDB enforces FK constraints on every write operation. With <code class="language-plaintext highlighter-rouge">foreign_key_checks</code> set to <code class="language-plaintext highlighter-rouge">ON</code> (the default), every row inserted into the shadow table during the COPY phase is subject to the same referential validity check that applies to any <code class="language-plaintext highlighter-rouge">INSERT</code> against the table. The <a href="https://dev.mysql.com/doc/refman/8.0/en/innodb-online-ddl-operations.html">Online DDL Operations documentation</a> captures the dependency between FK enforcement and the algorithm available:</p>

<blockquote>
  <p>“The INPLACE algorithm is supported when foreign_key_checks is disabled. Otherwise, only the COPY algorithm is supported.”</p>
</blockquote>

<p>This is stated in the context of <code class="language-plaintext highlighter-rouge">ADD FOREIGN KEY</code>, but it reflects the general principle: FK enforcement and COPY algorithm are coupled. On a child table with millions of rows and a FK pointing to a large parent, the COPY phase performs a referential validity check alongside every row copy.</p>

<p>The <a href="https://dev.mysql.com/doc/refman/8.0/en/create-table-foreign-keys.html">FOREIGN KEY Constraints reference</a> describes what disabling <code class="language-plaintext highlighter-rouge">foreign_key_checks</code> actually bypasses:</p>

<blockquote>
  <p>“When foreign_key_checks is disabled, foreign key constraints are ignored, with the following exceptions: [recreating a dropped table must still conform to FK definitions referencing it; incorrectly formed FK definitions still return errors; dropping an index required by a FK constraint requires removing the FK first].”</p>
</blockquote>

<p>Disabling <code class="language-plaintext highlighter-rouge">foreign_key_checks</code> removes referential integrity enforcement for the duration of the operation. On large tables where the validation overhead is measurable, it’s a tradeoff: a faster rebuild, but no constraint protection if the operation is interrupted or if concurrent DML writes orphaned rows during the rebuild window.</p>
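
<p>If you do take that tradeoff, keep the scope as narrow as possible. A sketch with an illustrative ALTER — the important parts are the session scope and the fact that re-enabling the variable does not re-validate rows written while it was off:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Session-scoped: every other connection keeps full FK enforcement.
SET SESSION foreign_key_checks = 0;

ALTER TABLE order_items MODIFY COLUMN note_text VARCHAR(1000);

SET SESSION foreign_key_checks = 1;
-- No re-validation happens here. Any orphan rows written during the
-- window stay orphaned until you find them yourself.
</code></pre></div></div>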

<h2 id="gh-ost-does-not-support-fk-constrained-tables">gh-ost Does Not Support FK-Constrained Tables</h2>

<p>This is where the FK impact is most operationally significant: <strong>gh-ost refuses to operate on tables involved in FK relationships by default</strong>.</p>

<p>The <a href="https://github.com/github/gh-ost/blob/master/doc/requirements-and-limitations.md">gh-ost requirements and limitations documentation</a> is explicit:</p>

<blockquote>
  <p>“Foreign key constraints are not supported. They may be supported in the future, to some extent.”</p>
</blockquote>

<p>The restriction applies in both directions:</p>

<ul>
  <li><strong>Child-side</strong>: a table that has a <code class="language-plaintext highlighter-rouge">FOREIGN KEY ... REFERENCES</code> clause — like <code class="language-plaintext highlighter-rouge">order_items</code> with its FK to <code class="language-plaintext highlighter-rouge">orders</code></li>
  <li><strong>Parent-side</strong>: a table that is referenced by other tables’ FK constraints — like <code class="language-plaintext highlighter-rouge">orders</code> referenced by <code class="language-plaintext highlighter-rouge">order_items</code></li>
</ul>

<p>Both cause gh-ost to abort. Two escape hatches exist: <code class="language-plaintext highlighter-rouge">--skip-foreign-key-checks</code> (bypasses the detection check entirely, with a logged warning) and <code class="language-plaintext highlighter-rouge">--discard-foreign-keys</code> (allows child-side FK tables if you accept that the FK constraint will be dropped on the ghost table and not recreated). Neither is appropriate for production use when referential integrity must be maintained.</p>

<p><strong>The operational result</strong>: any table involved in a FK relationship requires pt-osc. dbsafe detects FK constraints and routes the tool recommendation accordingly.</p>

<p>Here’s <code class="language-plaintext highlighter-rouge">dbsafe plan</code> on the parent <code class="language-plaintext highlighter-rouge">orders</code> table, which is referenced by <code class="language-plaintext highlighter-rouge">order_items</code>:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dbsafe plan <span class="nt">-H</span> 127.0.0.1 <span class="nt">-P</span> 23306 <span class="nt">-u</span> dbsafe <span class="nt">-d</span> demo <span class="se">\</span>
  <span class="s2">"ALTER TABLE orders MODIFY COLUMN total_amount DECIMAL(14,4)"</span>
</code></pre></div></div>

<p><img src="/assets/img/gallery/fk-parent-table.png" alt="dbsafe plan output for ALTER TABLE orders MODIFY COLUMN total_amount showing COPY algorithm, DANGEROUS risk level, FK section listing order_items as a referencing child table, and a pt-osc recommendation with --preserve-triggers flag" /></p>

<p>The output surfaces two reasons gh-ost is excluded: the <code class="language-plaintext highlighter-rouge">orders</code> table has triggers (<code class="language-plaintext highlighter-rouge">trg_orders_after_update</code>, <code class="language-plaintext highlighter-rouge">trg_orders_after_delete</code>) — <a href="/mysql/tools/2026/02/22/copy-algorithm-mysql-dbsafe.html">covered in the COPY algorithm post</a> — and it is referenced by FK constraints in child tables. Either condition alone is sufficient to exclude gh-ost. pt-osc is the only viable tool.</p>

<h2 id="pt-osc-and-alter-foreign-keys-method">pt-osc and --alter-foreign-keys-method</h2>

<p>When pt-osc alters a parent table, the rename swap at the end creates a problem: child tables still have FK constraints that reference the original table name. After the rename, those constraints point to a non-existent table.</p>

<p>The <a href="https://docs.percona.com/percona-toolkit/pt-online-schema-change.html">pt-online-schema-change documentation</a> describes three modes for <code class="language-plaintext highlighter-rouge">--alter-foreign-keys-method</code>:</p>

<p><strong><code class="language-plaintext highlighter-rouge">rebuild_constraints</code></strong> — the safe default:</p>

<blockquote>
  <p>“This method uses ALTER TABLE to drop and re-add foreign key constraints that reference the new table. This is the preferred technique, unless one or more of the ‘child’ tables is so large that the ALTER would take too long.”</p>
</blockquote>

<p>Note from the pt-osc docs: “Due to a limitation in MySQL, foreign keys will not have the same names after the ALTER that they did prior to it.” The names get an underscore prefix. If your application or monitoring refers to FK constraint names explicitly, update those references after the migration.</p>
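
<p>A before/after snapshot of the constraint names makes the rename visible and gives you the list of references to update:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Run before and after the migration; with rebuild_constraints, expect
-- the names to differ by a leading underscore afterward.
SELECT TABLE_NAME, CONSTRAINT_NAME
FROM information_schema.REFERENTIAL_CONSTRAINTS
WHERE CONSTRAINT_SCHEMA = 'demo'
  AND REFERENCED_TABLE_NAME = 'orders';
</code></pre></div></div>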

<p><strong><code class="language-plaintext highlighter-rouge">drop_swap</code></strong> — faster, but riskier:</p>

<blockquote>
  <p>“Disable foreign key checks (FOREIGN_KEY_CHECKS=0), then drop the original table before renaming the new table into its place. This is different from the normal method of swapping the old and new table, which uses an atomic RENAME that is undetectable to client applications.”</p>
</blockquote>

<p>Between the drop and the rename, the original table does not exist. Queries against it fail during that window. If the rename fails, the original table is gone. The pt-osc docs describe this as riskier for exactly these two reasons.</p>

<p><strong><code class="language-plaintext highlighter-rouge">none</code></strong> — requires manual follow-up:</p>

<blockquote>
  <p>“This method is like drop_swap without the ‘swap’. Any foreign keys that referenced the original table will now reference a nonexistent table.”</p>
</blockquote>

<p>This is explicitly for administrators who intend to handle FK re-pointing manually. The resulting state after the migration is broken until that manual step is performed.</p>

<p>When dbsafe generates a pt-osc command for a parent table COPY operation, <code class="language-plaintext highlighter-rouge">rebuild_constraints</code> is the right default unless the child tables are so large that the secondary <code class="language-plaintext highlighter-rouge">ALTER TABLE</code> to drop/recreate FKs falls outside your maintenance window. In that case, <code class="language-plaintext highlighter-rouge">drop_swap</code> trades a brief availability gap for faster completion — a deliberate tradeoff, not a safe shortcut.</p>

<h2 id="a-note-on-on-delete-cascade-during-the-migration-window">A Note on ON DELETE CASCADE During the Migration Window</h2>

<p>If the parent table has <code class="language-plaintext highlighter-rouge">ON DELETE CASCADE</code> relationships to child tables, there is an additional consideration during the migration window. When pt-osc runs:</p>

<ol>
  <li>pt-osc installs <code class="language-plaintext highlighter-rouge">AFTER INSERT</code>, <code class="language-plaintext highlighter-rouge">AFTER UPDATE</code>, and <code class="language-plaintext highlighter-rouge">AFTER DELETE</code> triggers on the original table to capture DML and replay it on the ghost table.</li>
  <li>When a row is deleted from <code class="language-plaintext highlighter-rouge">orders</code>, the trigger fires and pt-osc replicates the delete to the ghost table — correctly.</li>
  <li>But the <code class="language-plaintext highlighter-rouge">ON DELETE CASCADE</code> to <code class="language-plaintext highlighter-rouge">order_items</code> fires inside the InnoDB storage engine, below the SQL layer, and therefore <strong>below pt-osc’s triggers</strong>. The cascaded deletions on <code class="language-plaintext highlighter-rouge">order_items</code> are invisible to pt-osc’s change tracking on <code class="language-plaintext highlighter-rouge">orders</code>.</li>
</ol>

<p>This is the same architectural limitation documented in <a href="/mysql/innodb/2026/02/25/mysql-fk-cascade-blind-spot.html">ON DELETE CASCADE: The Foreign Key Change MySQL Doesn’t Log</a>. In the context of a parent table migration, pt-osc’s tracking of <code class="language-plaintext highlighter-rouge">orders</code> is complete; it’s the shadow <code class="language-plaintext highlighter-rouge">order_items</code> data that may drift if CASCADE fires during the copy phase.</p>

<p>Whether this matters depends on your schema. If you’re only migrating <code class="language-plaintext highlighter-rouge">orders</code> and <code class="language-plaintext highlighter-rouge">order_items</code> is not being migrated simultaneously, the drift affects data in <code class="language-plaintext highlighter-rouge">order_items</code> (the real table) — not the shadow table being built for <code class="language-plaintext highlighter-rouge">orders</code>. In most cases this is informational. But if you’re orchestrating coordinated schema changes across both tables at once, sequencing and the migration window need careful planning.</p>

<h2 id="practical-workflow-for-fk-constrained-tables">Practical Workflow for FK-Constrained Tables</h2>

<p><strong>1. Run <code class="language-plaintext highlighter-rouge">dbsafe plan</code> on every table in the FK relationship</strong>, not just the one you’re altering. The metadata lock extension means the parent table’s DDL window affects child table DML and vice versa. Knowing both sides before scheduling is necessary for accurate impact assessment.</p>

<p><strong>2. Accept that gh-ost is unavailable</strong> for any table with FK relationships. pt-osc is the tool. This is not a dbsafe preference — it reflects gh-ost’s documented limitation.</p>

<p><strong>3. Check the algorithm.</strong> If INSTANT, the FK impact is limited to the MDL extension discussed above. If COPY, FK constraint validation adds overhead to the row copy phase. The larger the tables involved, the more significant the overhead.</p>

<p><strong>4. Review <code class="language-plaintext highlighter-rouge">--alter-foreign-keys-method</code> for parent table changes.</strong> The <code class="language-plaintext highlighter-rouge">rebuild_constraints</code> default is safe. Before using <code class="language-plaintext highlighter-rouge">drop_swap</code>, verify you can tolerate the brief availability gap. Never use <code class="language-plaintext highlighter-rouge">none</code> unless you have a manual FK repair step ready to run immediately after the migration completes.</p>
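<p>For illustration, here is a pt-osc invocation with the safe default spelled out explicitly. The alter and the DSN are placeholders; verify the flags against your Percona Toolkit version:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pt-online-schema-change \
  --alter "ADD COLUMN fulfillment_notes TEXT" \
  --alter-foreign-keys-method=rebuild_constraints \
  D=shop,t=orders \
  --dry-run          # switch to --execute once the dry run is clean
</code></pre></div></div>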

<p><strong>5. For type changes that cascade through FK columns</strong> — such as migrating <code class="language-plaintext highlighter-rouge">INT</code> primary keys to <code class="language-plaintext highlighter-rouge">BIGINT</code> — each table in the FK chain requires its own migration. The child FK column type must match the parent PK type. Plan each migration separately, verify referential integrity between steps, and do not attempt both simultaneously.</p>
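<p>Between steps, the referential integrity check is a simple anti-join; an empty result set means the previous migration orphaned no child rows (table names follow the running example):</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Any order_items row whose parent no longer exists in orders
SELECT oi.id, oi.order_id
FROM order_items oi
LEFT JOIN orders o ON o.id = oi.order_id
WHERE o.id IS NULL;
</code></pre></div></div>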

<h2 id="summary">Summary</h2>

<ol>
  <li>
    <p><strong>dbsafe surfaces FK relationships automatically</strong> from <code class="language-plaintext highlighter-rouge">information_schema.KEY_COLUMN_USAGE</code> and <code class="language-plaintext highlighter-rouge">information_schema.REFERENTIAL_CONSTRAINTS</code>. The FK section appears in every analysis regardless of DDL algorithm.</p>
  </li>
  <li>
    <p><strong>MySQL extends metadata locks to FK-related tables</strong> for the duration of the DDL. The <a href="https://dev.mysql.com/doc/refman/8.0/en/metadata-locking.html">metadata locking documentation</a> and the <a href="https://dev.mysql.com/doc/refman/8.0/en/innodb-online-ddl-operations.html">Online DDL Operations reference</a> document specific wait conditions for CASCADE and SET NULL rules — wait conditions that affect both the child table DDL and the parent table DDL.</p>
  </li>
  <li>
    <p><strong>COPY algorithm + FK constraints</strong> means InnoDB applies its standard FK enforcement to every row written into the shadow table during the rebuild. Disabling <code class="language-plaintext highlighter-rouge">foreign_key_checks</code> removes that overhead but also removes referential integrity protection for the duration.</p>
  </li>
  <li>
    <p><strong>gh-ost refuses FK-constrained tables</strong> by default — both parent-side and child-side. The <a href="https://github.com/github/gh-ost/blob/master/doc/requirements-and-limitations.md">gh-ost documentation</a> is explicit: “Foreign key constraints are not supported.” pt-osc is required for any table in a FK relationship.</p>
  </li>
  <li>
    <p><strong>For parent table migrations with pt-osc</strong>, <code class="language-plaintext highlighter-rouge">--alter-foreign-keys-method</code> controls how child FK references are updated after the rename swap. <code class="language-plaintext highlighter-rouge">rebuild_constraints</code> is the safe default. <code class="language-plaintext highlighter-rouge">drop_swap</code> is faster but introduces a brief window where the original table does not exist. <code class="language-plaintext highlighter-rouge">none</code> leaves child FKs pointing to a non-existent table until manually repaired.</p>
  </li>
</ol>

<h2 id="references">References</h2>

<p><strong>MySQL Official Documentation:</strong></p>
<ul>
  <li><a href="https://dev.mysql.com/doc/refman/8.0/en/metadata-locking.html">Metadata Locking — MySQL 8.0</a></li>
  <li><a href="https://dev.mysql.com/doc/refman/8.0/en/create-table-foreign-keys.html">FOREIGN KEY Constraints — MySQL 8.0</a></li>
  <li><a href="https://dev.mysql.com/doc/refman/8.0/en/innodb-online-ddl-operations.html">Online DDL Operations — MySQL 8.0</a></li>
  <li><a href="https://dev.mysql.com/doc/refman/8.0/en/information-schema-key-column-usage-table.html">INFORMATION_SCHEMA KEY_COLUMN_USAGE Table — MySQL 8.0</a></li>
  <li><a href="https://dev.mysql.com/doc/refman/8.0/en/information-schema-referential-constraints-table.html">INFORMATION_SCHEMA REFERENTIAL_CONSTRAINTS Table — MySQL 8.0</a></li>
</ul>

<p><strong>Tools:</strong></p>
<ul>
  <li><a href="https://docs.percona.com/percona-toolkit/pt-online-schema-change.html">pt-online-schema-change — Percona Toolkit</a></li>
  <li><a href="https://github.com/github/gh-ost/blob/master/doc/requirements-and-limitations.md">gh-ost Requirements and Limitations</a></li>
  <li><a href="https://github.com/nethalo/dbsafe">dbsafe — GitHub Repository</a></li>
</ul>

<p><strong>Related Posts:</strong></p>
<ul>
  <li><a href="/mysql/tools/2026/02/14/introducing-dbsafe-know-before-you-alter.html">Introducing dbsafe: Know Before You ALTER</a></li>
  <li><a href="/mysql/tools/2026/02/22/copy-algorithm-mysql-dbsafe.html">When MySQL Rebuilds Your Table: Understanding COPY Algorithm DDL</a></li>
  <li><a href="/mysql/innodb/2026/02/25/mysql-fk-cascade-blind-spot.html">ON DELETE CASCADE: The Foreign Key Change MySQL Doesn’t Log</a></li>
</ul>]]></content>
    <author>
      <name>Daniel Guzman Burgos</name>
    </author>
    <category term="mysql"/>
    <category term="tools"/>
    <category term="ddl"/>
    <category term="foreign-keys"/>
    <category term="schema-changes"/>
    <category term="dbsafe"/>
    <category term="pt-online-schema-change"/>
    <summary type="html"><![CDATA[MySQL foreign key constraints extend metadata locks to related tables and eliminate gh-ost for DDL. dbsafe surfaces the full FK impact before you alter.]]></summary>
  </entry>
  <entry>
    <title type="html">ON DELETE CASCADE: The Foreign Key Change MySQL Doesn’t Log</title>
    <link href="https://rendiment.io/mysql/innodb/2026/02/25/mysql-fk-cascade-blind-spot.html" rel="alternate" type="text/html" title="ON DELETE CASCADE: The Foreign Key Change MySQL Doesn’t Log"/>
    <published>2026-02-25T00:00:00-05:00</published>
    <updated>2026-02-25T00:00:00-05:00</updated>
    <id>https://rendiment.io/mysql/innodb/2026/02/25/mysql-fk-cascade-blind-spot</id>
    <content type="html" xml:base="https://rendiment.io/mysql/innodb/2026/02/25/mysql-fk-cascade-blind-spot.html"><![CDATA[<p>You’ve set <code class="language-plaintext highlighter-rouge">binlog_format=ROW</code> and <code class="language-plaintext highlighter-rouge">binlog_row_image=FULL</code>. Every row change is in the binlog, right? Triggers, stored procedures, LOAD DATA, all of it — every effect is captured as row events.</p>

<p>Mostly yes. But there’s one case where MySQL is completely silent: <strong>foreign key cascades</strong>.</p>

<p>And the silence isn’t just in the binlog. It’s in the audit plugin and in your triggers too. Three blind spots, same root cause.</p>

<hr />

<h2 id="what-row--full-actually-captures">What ROW + FULL actually captures</h2>

<p>When you use row-based logging, the server records the <em>effect</em> of every DML operation — not the statement itself. The <a href="https://dev.mysql.com/doc/refman/8.0/en/stored-programs-logging.html">MySQL docs on stored program binary logging</a> are clear about this:</p>

<blockquote>
  <p>For stored procedures, the CALL statement is not logged. For stored functions, row changes made within the function are logged, not the function invocation.</p>
</blockquote>

<p>The same applies to triggers and events. The code doesn’t appear in the binlog — the row changes do. That’s good. With <code class="language-plaintext highlighter-rouge">binlog_row_image=FULL</code>, each row event contains every column of the row, before and after the change.</p>

<p>So for direct DML, stored procedures, triggers, stored functions, prepared statements, LOAD DATA INFILE, and Event Scheduler events — you’re fully covered.</p>

<p>Foreign key cascades are the exception.</p>

<hr />

<h2 id="the-cascade-blind-spot">The cascade blind spot</h2>

<p>When you delete a row from a parent table and InnoDB cascades that delete to child rows, only the parent deletion appears in the binlog. The child table deletions are nowhere to be found.</p>

<p>This is <a href="https://dev.mysql.com/doc/refman/8.0/en/create-table-foreign-keys.html">documented as a known limitation</a> in the MySQL 8.0 Reference Manual, section 13.1.20.5.</p>

<p>Here’s what the binlog actually shows for a cascaded delete:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>| binlog.000003 | 354 | Table_map    | table_id: 90 (orders)      |
| binlog.000003 | 410 | Delete_rows  | table_id: 90               |
| binlog.000003 | 459 | Xid          | COMMIT                     |
</code></pre></div></div>

<p>If <code class="language-plaintext highlighter-rouge">orders</code> has a child table <code class="language-plaintext highlighter-rouge">order_items</code> with <code class="language-plaintext highlighter-rouge">ON DELETE CASCADE</code>, every deletion in <code class="language-plaintext highlighter-rouge">order_items</code> is missing from that output. The binlog tells you the parent row is gone, and nothing else.</p>

<p>The reason is architectural. InnoDB handles cascade enforcement entirely inside the storage engine. When InnoDB processes a <code class="language-plaintext highlighter-rouge">DELETE</code> on the parent table, it finds the matching child rows and removes them internally — without ever surfacing those operations back to the SQL layer. The binary log is written by the SQL layer. If the SQL layer doesn’t see it, the binlog doesn’t record it.</p>

<p>This behavior is the same whether you’re using statement-based or row-based replication. Changing <code class="language-plaintext highlighter-rouge">binlog_format</code> doesn’t help.</p>

<hr />

<h2 id="and-it-gets-worse-triggers-dont-fire-either">And it gets worse: triggers don’t fire either</h2>

<p>Before reaching for triggers as a workaround, there’s a second problem. Triggers on the child table also don’t fire when rows are deleted by a cascade.</p>

<p>The <a href="https://dev.mysql.com/doc/refman/8.0/en/create-trigger.html">MySQL 8.0 Reference Manual section on CREATE TRIGGER</a> states this directly:</p>

<blockquote>
  <p>Cascaded foreign key actions do not activate triggers.</p>
</blockquote>

<p>The same sentence appears in the <a href="https://dev.mysql.com/doc/refman/8.0/en/create-table-foreign-keys.html">FOREIGN KEY Constraints section</a> and has been consistent across every MySQL version from 5.5 to 8.4. It was first reported as <a href="https://bugs.mysql.com/bug.php?id=11472">bug #11472</a> back in 2005 and remains open as of MySQL 8.x.</p>

<p>So if you had a <code class="language-plaintext highlighter-rouge">AFTER DELETE</code> trigger on <code class="language-plaintext highlighter-rouge">order_items</code> to write to an audit table — it won’t fire when the cascade removes rows. It will only fire if rows are deleted directly with an explicit <code class="language-plaintext highlighter-rouge">DELETE</code> statement.</p>

<hr />

<h2 id="and-the-audit-plugin-too">And the audit plugin too</h2>

<p>The MySQL Enterprise Audit plugin, the Percona Audit Log plugin, and MariaDB-based audit plugins all intercept events at the SQL layer — the same layer the binlog is written at. Since cascade operations never reach the SQL layer, they are invisible to every audit plugin as well.</p>

<p>The complete picture in MySQL 8.x:</p>

<table>
  <thead>
    <tr>
      <th>Mechanism</th>
      <th>Sees direct DML</th>
      <th>Sees FK cascade effects</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Binlog (ROW + FULL)</td>
      <td>✅</td>
      <td>❌ confirmed</td>
    </tr>
    <tr>
      <td>Audit plugin</td>
      <td>✅</td>
      <td>❌ confirmed</td>
    </tr>
    <tr>
      <td>Triggers on child table</td>
      <td>✅</td>
      <td>❌ confirmed</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">general_log</code></td>
      <td>✅</td>
      <td>❌ confirmed</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">events_statements_*</code></td>
      <td>✅</td>
      <td>❌ confirmed</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">table_io_waits_summary_by_table</code></td>
      <td>✅</td>
      <td>⚠️ likely blind spot</td>
    </tr>
  </tbody>
</table>

<p>The first five are confirmed blind spots — all operate at or above the SQL layer, and the cascade never reaches them.</p>

<p>The last one is more nuanced. <code class="language-plaintext highlighter-rouge">table_io_waits_summary_by_table</code> instruments the <code class="language-plaintext highlighter-rouge">wait/io/table/sql/handler</code> instrument, which hooks into the handler API (<code class="language-plaintext highlighter-rouge">ha_delete_row</code>, <code class="language-plaintext highlighter-rouge">ha_write_row</code>, etc.) at a lower level than statements. In theory, if InnoDB called <code class="language-plaintext highlighter-rouge">ha_delete_row</code> for each cascaded row, Performance Schema would count it. But in practice, InnoDB’s cascade implementation calls internal row deletion functions directly, bypassing the standard handler interface that Performance Schema instruments. The architectural expectation is that this is also a blind spot, but I haven’t found explicit documentation confirming it for this specific table — and it’s worth verifying empirically in your environment before relying on it for anything compliance-related.</p>
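<p>The empirical check is quick, using the <code class="language-plaintext highlighter-rouge">orders</code>/<code class="language-plaintext highlighter-rouge">order_items</code> pair from earlier (the schema name is a placeholder):</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Snapshot the child table's DELETE counter
SELECT OBJECT_NAME, COUNT_DELETE
FROM performance_schema.table_io_waits_summary_by_table
WHERE OBJECT_SCHEMA = 'shop' AND OBJECT_NAME = 'order_items';

-- Fire a cascade
DELETE FROM orders WHERE id = 1;

-- Re-run the snapshot query: if COUNT_DELETE did not move,
-- the cascade bypassed the instrumented handler API
</code></pre></div></div>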

<p>The only layer with complete visibility into cascade operations is InnoDB itself — in the undo log and the tablespace pages.</p>

<hr />

<h2 id="verifying-it-yourself">Verifying it yourself</h2>

<p>You can confirm this in minutes. Set up a simple parent/child relationship and watch the binlog:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">orders</span> <span class="p">(</span>
  <span class="n">id</span> <span class="nb">INT</span> <span class="k">PRIMARY</span> <span class="k">KEY</span>
<span class="p">)</span> <span class="n">ENGINE</span><span class="o">=</span><span class="n">InnoDB</span><span class="p">;</span>

<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">order_items</span> <span class="p">(</span>
  <span class="n">id</span> <span class="nb">INT</span> <span class="k">PRIMARY</span> <span class="k">KEY</span><span class="p">,</span>
  <span class="n">order_id</span> <span class="nb">INT</span><span class="p">,</span>
  <span class="k">FOREIGN</span> <span class="k">KEY</span> <span class="p">(</span><span class="n">order_id</span><span class="p">)</span> <span class="k">REFERENCES</span> <span class="n">orders</span><span class="p">(</span><span class="n">id</span><span class="p">)</span> <span class="k">ON</span> <span class="k">DELETE</span> <span class="k">CASCADE</span>
<span class="p">)</span> <span class="n">ENGINE</span><span class="o">=</span><span class="n">InnoDB</span><span class="p">;</span>

<span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">orders</span> <span class="k">VALUES</span> <span class="p">(</span><span class="mi">1</span><span class="p">);</span>
<span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">order_items</span> <span class="k">VALUES</span> <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span> <span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span> <span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>

<span class="c1">-- Now delete the parent</span>
<span class="k">DELETE</span> <span class="k">FROM</span> <span class="n">orders</span> <span class="k">WHERE</span> <span class="n">id</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
</code></pre></div></div>

<p>Then check what made it into the binlog:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mysqlbinlog <span class="nt">--base64-output</span><span class="o">=</span>DECODE-ROWS <span class="nt">-v</span> /var/lib/mysql/binlog.000001
</code></pre></div></div>

<p>You’ll see one <code class="language-plaintext highlighter-rouge">Delete_rows</code> event for <code class="language-plaintext highlighter-rouge">orders</code>. Nothing for <code class="language-plaintext highlighter-rouge">order_items</code>, even though those three rows are gone.</p>

<hr />

<h2 id="the-workaround-move-the-cascade-to-the-sql-layer">The workaround: move the cascade to the SQL layer</h2>

<p>The only clean solution in MySQL 8.x is to stop letting InnoDB handle the cascade and do it explicitly in the SQL layer instead. You keep the foreign key for referential integrity, but change the action to <code class="language-plaintext highlighter-rouge">RESTRICT</code>, and add a <code class="language-plaintext highlighter-rouge">BEFORE DELETE</code> trigger on the parent that performs the deletion on children explicitly:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- Step 1: Change the FK to RESTRICT (no cascade)</span>
<span class="k">ALTER</span> <span class="k">TABLE</span> <span class="n">order_items</span>
  <span class="k">DROP</span> <span class="k">FOREIGN</span> <span class="k">KEY</span> <span class="n">order_items_ibfk_1</span><span class="p">,</span>
  <span class="k">ADD</span> <span class="k">CONSTRAINT</span> <span class="n">order_items_ibfk_1</span>
    <span class="k">FOREIGN</span> <span class="k">KEY</span> <span class="p">(</span><span class="n">order_id</span><span class="p">)</span> <span class="k">REFERENCES</span> <span class="n">orders</span><span class="p">(</span><span class="n">id</span><span class="p">)</span> <span class="k">ON</span> <span class="k">DELETE</span> <span class="k">RESTRICT</span><span class="p">;</span>

<span class="c1">-- Step 2: Add a trigger on the parent that does the deletion</span>
<span class="k">DELIMITER</span> <span class="o">//</span>
<span class="k">CREATE</span> <span class="k">TRIGGER</span> <span class="n">before_order_delete</span>
<span class="k">BEFORE</span> <span class="k">DELETE</span> <span class="k">ON</span> <span class="n">orders</span>
<span class="k">FOR</span> <span class="k">EACH</span> <span class="k">ROW</span>
<span class="k">BEGIN</span>
  <span class="k">DELETE</span> <span class="k">FROM</span> <span class="n">order_items</span> <span class="k">WHERE</span> <span class="n">order_id</span> <span class="o">=</span> <span class="k">OLD</span><span class="p">.</span><span class="n">id</span><span class="p">;</span>
<span class="k">END</span> <span class="o">//</span>
<span class="k">DELIMITER</span> <span class="p">;</span>
</code></pre></div></div>

<p>Now when you delete from <code class="language-plaintext highlighter-rouge">orders</code>, the trigger fires, issues an explicit <code class="language-plaintext highlighter-rouge">DELETE</code> on <code class="language-plaintext highlighter-rouge">order_items</code>, and that statement goes through the full SQL layer: it appears in the binlog, it fires any <code class="language-plaintext highlighter-rouge">AFTER DELETE</code> triggers on <code class="language-plaintext highlighter-rouge">order_items</code>, and the audit plugin captures it.</p>

<p><strong>This comes with a tradeoff.</strong> InnoDB’s native cascade is highly optimized — it uses a depth-first search on the FK index and deletes child rows in a single internal pass. When you replace it with a trigger-based <code class="language-plaintext highlighter-rouge">DELETE</code>, that statement goes through the full SQL engine execution path. For tables with millions of child rows per parent, this can be measurably slower. Test it against your actual data volumes before deploying.</p>

<hr />

<h2 id="data-forensics-when-you-cant-prevent-it">Data forensics when you can’t prevent it</h2>

<p>If cascades already happened and you need to reconstruct what was deleted from child tables, it’s possible — but only if you have a backup predating the event.</p>

<p>The cascade is deterministic. Given the parent row’s primary key and the FK definition, you know exactly which child rows InnoDB would have removed. The process:</p>

<ol>
  <li>Parse the binlog to extract the PK values of deleted parent rows and their timestamps</li>
  <li>Restore your last backup to a separate instance</li>
  <li>Apply the binlog up to just before the delete event</li>
  <li>Query the child tables filtering by the FK column — those results are exactly what the cascade removed</li>
  <li>Repeat recursively if you have multi-level FK chains</li>
</ol>
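<p>Step 4 reduces to a filter on the FK column. Assuming the binlog showed that the parent row with <code class="language-plaintext highlighter-rouge">id = 42</code> was deleted (the value is illustrative):</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Run on the restored instance, replayed to just before the DELETE event
SELECT * FROM order_items WHERE order_id = 42;
-- These rows are exactly what the cascade removed in production
</code></pre></div></div>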

<p>If you don’t have a backup, the only path is low-level InnoDB recovery: <code class="language-plaintext highlighter-rouge">undrop-for-innodb</code> or manual tablespace page parsing. That’s a different category of work entirely and not guaranteed to succeed.</p>

<hr />

<h2 id="mysql-96-finally-fixes-this">MySQL 9.6 finally fixes this</h2>

<p>MySQL 9.6, released January 2026, addresses the root cause. Oracle moved foreign key enforcement and cascade execution from the InnoDB storage engine to the SQL engine layer. Now when a cascade fires, the SQL engine generates discrete DML statements for the child table operations, and those statements are logged to the binlog normally.</p>

<p>The same <code class="language-plaintext highlighter-rouge">DELETE FROM orders WHERE id = 1</code> that previously produced one binlog event now produces events for both the parent deletion and all child table deletions — and audit plugins and triggers on child tables finally see them too.</p>

<p>For MySQL 8.x and Percona Server 8.x, this fix is not available. The architectural change is deep enough that a backport to 8.4 LTS would be a major engineering effort, and there’s no indication Percona plans to do it.</p>

<p>If you’re running Percona XtraDB Cluster 8.x and this gap matters for your compliance or CDC requirements, the trigger-based workaround described above is your best option today.</p>

<hr />

<h2 id="summary">Summary</h2>

<p>In MySQL 8.x, <code class="language-plaintext highlighter-rouge">ON DELETE CASCADE</code> and <code class="language-plaintext highlighter-rouge">ON UPDATE CASCADE</code> with InnoDB are handled entirely inside the storage engine, invisible to every observability layer that operates at or above the SQL layer:</p>

<ul>
  <li>The binlog only records the parent table operation</li>
  <li>Audit plugins miss the child table changes entirely</li>
  <li>Triggers on child tables do not fire</li>
</ul>

<p>The trigger-based workaround — replacing cascade FKs with <code class="language-plaintext highlighter-rouge">RESTRICT</code> and doing explicit child deletions in a <code class="language-plaintext highlighter-rouge">BEFORE DELETE</code> trigger on the parent — is the only way to get full visibility on MySQL 8.x. It’s more verbose and potentially slower at scale, but it makes the cascade operations first-class citizens in your binlog and audit logs.</p>

<p>MySQL 9.6 solves this at the architecture level. Until you get there, document the gap explicitly and design around it.</p>

<hr />

<p><em>References:</em></p>
<ul>
  <li><em><a href="https://dev.mysql.com/doc/refman/8.0/en/create-table-foreign-keys.html">MySQL 8.0 Docs: FOREIGN KEY Constraints §13.1.20.5</a></em></li>
  <li><em><a href="https://dev.mysql.com/doc/refman/8.0/en/create-trigger.html">MySQL 8.0 Docs: CREATE TRIGGER §13.1.22</a></em></li>
  <li><em><a href="https://dev.mysql.com/doc/refman/8.0/en/stored-programs-logging.html">MySQL 8.0 Docs: Stored Program Binary Logging §25.7</a></em></li>
  <li><em><a href="https://bugs.mysql.com/bug.php?id=32506">MySQL Bug #32506: Foreign key cascades do not appear when binlog_format = ‘ROW’</a></em></li>
  <li><em><a href="https://bugs.mysql.com/bug.php?id=11472">MySQL Bug #11472: Triggers not executed following foreign key updates/deletes</a></em></li>
  <li><em><a href="https://blogs.oracle.com/mysql/no-more-hidden-changes-how-mysql-9-6-transforms-foreign-key-management">Oracle MySQL Blog: No More Hidden Changes: How MySQL 9.6 Transforms Foreign Key Management</a></em></li>
</ul>]]></content>
    <author>
      <name>Daniel Guzman Burgos</name>
    </author>
    <category term="mysql"/>
    <category term="innodb"/>
    <category term="binlog"/>
    <category term="replication"/>
    <category term="foreign-keys"/>
    <category term="cdc"/>
    <category term="audit"/>
    <summary type="html"><![CDATA[MySQL InnoDB's ON DELETE CASCADE bypasses the binlog, audit plugins, and triggers — the three blind spots explained plus a trigger-based workaround for 8.x.]]></summary>
  </entry>
  <entry>
    <title type="html">Always Have an Exit Strategy: dbsafe Automatic Rollback Plans</title>
    <link href="https://rendiment.io/mysql/tools/2026/02/23/rollback-plans-dbsafe.html" rel="alternate" type="text/html" title="Always Have an Exit Strategy: dbsafe Automatic Rollback Plans"/>
    <published>2026-02-23T00:00:00-05:00</published>
    <updated>2026-02-23T00:00:00-05:00</updated>
    <id>https://rendiment.io/mysql/tools/2026/02/23/rollback-plans-dbsafe</id>
    <content type="html" xml:base="https://rendiment.io/mysql/tools/2026/02/23/rollback-plans-dbsafe.html"><![CDATA[<p>You’ve planned the schema change. You’ve verified the algorithm. You’ve scheduled the maintenance window. And then, fifteen minutes after the <code class="language-plaintext highlighter-rouge">ALTER TABLE</code> commits, the application starts throwing errors. The column type is wrong. The index breaks a query. You need to undo it — immediately, under pressure, while your on-call phone is ringing.</p>

<p>Do you have the rollback SQL written down?</p>

<p>Every schema change has a mirror operation. <code class="language-plaintext highlighter-rouge">ADD COLUMN</code> → <code class="language-plaintext highlighter-rouge">DROP COLUMN</code>. <code class="language-plaintext highlighter-rouge">ADD INDEX</code> → <code class="language-plaintext highlighter-rouge">DROP INDEX</code>. <a href="/mysql/tools/2026/02/14/introducing-dbsafe-know-before-you-alter.html">dbsafe</a> generates that rollback SQL automatically as part of every <code class="language-plaintext highlighter-rouge">plan</code> output. Before you touch production, you already know the undo operation — and whether it’s actually reversible at all.</p>

<blockquote>
  <p><strong>Related:</strong> This is part of the dbsafe series. <a href="/mysql/tools/2026/02/21/instant-ddl-mysql-dbsafe.html">Zero-Downtime Schema Changes with INSTANT DDL</a> covers metadata-only operations that take milliseconds. <a href="/mysql/tools/2026/02/22/copy-algorithm-mysql-dbsafe.html">When MySQL Rebuilds Your Table: Understanding COPY Algorithm DDL</a> covers the dangerous end — full table rebuilds, exclusive locks, and when to use gh-ost or pt-osc.</p>
</blockquote>

<h2 id="the-two-kinds-of-rollback">The Two Kinds of Rollback</h2>

<p>dbsafe’s rollback plan section covers two fundamentally different scenarios:</p>

<p><strong>Structural rollbacks</strong> — operations where the inverse SQL is mechanically clean. Adding a column can be undone by dropping it. Adding an index can be undone by dropping it. These are reversible at the SQL level.</p>

<p><strong>Data destruction</strong> — operations where there is no SQL rollback. <code class="language-plaintext highlighter-rouge">DROP COLUMN</code> permanently destroys the values stored in that column. A <code class="language-plaintext highlighter-rouge">DROP COLUMN</code> → <code class="language-plaintext highlighter-rouge">ADD COLUMN</code> pair restores the column definition but not the data. dbsafe surfaces a warning rather than a misleading rollback command.</p>

<p>Knowing which scenario you’re in — before you run the change — is what makes the difference between a prepared runbook and an incident.</p>

<h2 id="add-column-clean-rollback">ADD COLUMN: Clean Rollback</h2>

<p>The simplest rollback case. Adding a column has a clean mirror: drop the same column. The schema returns to its original state.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dbsafe plan <span class="s2">"ALTER TABLE orders ADD COLUMN fulfillment_notes TEXT"</span>
</code></pre></div></div>

<p><img src="/assets/img/gallery/dbsafe-rollback-add-column.png" alt="dbsafe output for ADD COLUMN fulfillment_notes showing INSTANT algorithm, NONE locking, SAFE risk, and rollback plan with DROP COLUMN fulfillment_notes" /></p>

<p>dbsafe identifies this as <a href="/mysql/tools/2026/02/21/instant-ddl-mysql-dbsafe.html">INSTANT</a> — a metadata-only change, no table rebuild, milliseconds on any table size. The rollback plan at the bottom of the report shows exactly what to run to undo it:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">ALTER</span> <span class="k">TABLE</span> <span class="n">orders</span> <span class="k">DROP</span> <span class="k">COLUMN</span> <span class="n">fulfillment_notes</span><span class="p">;</span>
</code></pre></div></div>

<p>On MySQL 8.0.29+, that rollback is also INSTANT. The column is marked as dropped in the InnoDB data dictionary and the change commits in milliseconds — no rebuild, no rows touched.</p>

<p>One important caveat: the structural rollback only works cleanly if no data has been written to the column yet. If the application started populating <code class="language-plaintext highlighter-rouge">fulfillment_notes</code> between the <code class="language-plaintext highlighter-rouge">ALTER TABLE</code> and the rollback, that data disappears when the column is dropped. The SQL is correct, but it’s a schema rollback, not a data rollback. Backups cover the data; the rollback SQL covers the structure.</p>
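<p>A pre-flight check before running the rollback costs nothing: if this returns a non-zero count, the application has already written to the column and dropping it destroys that data:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>SELECT COUNT(*) AS populated_rows
FROM orders
WHERE fulfillment_notes IS NOT NULL;
</code></pre></div></div>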

<h2 id="add-index-clean-rollback">ADD INDEX: Clean Rollback</h2>

<p>Index additions are even cleaner to roll back. <code class="language-plaintext highlighter-rouge">ADD INDEX</code> → <code class="language-plaintext highlighter-rouge">DROP INDEX</code>, with zero data loss. Indexes are derived structures built from existing row data — dropping one never removes a single byte of actual stored data.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dbsafe plan <span class="s2">"ALTER TABLE orders ADD INDEX idx_total_amount (total_amount)"</span>
</code></pre></div></div>

<p><img src="/assets/img/gallery/dbsafe-rollback-add-index.png" alt="dbsafe output for ADD INDEX showing INPLACE algorithm and rollback plan with DROP INDEX idx_total_amount on orders" /></p>

<p>The rollback plan shows:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">ALTER</span> <span class="k">TABLE</span> <span class="n">orders</span> <span class="k">DROP</span> <span class="k">INDEX</span> <span class="n">idx_total_amount</span><span class="p">;</span>
</code></pre></div></div>

<p>The forward operation is <a href="/mysql/tools/2026/02/21/instant-ddl-mysql-dbsafe.html">INPLACE</a> — MySQL reads all rows to build the index structure without creating a shadow copy of the table. The rollback is faster: dropping an index is a metadata change that doesn’t touch row data at all. If you realize the new index is causing query planner issues or unexpected lock contention, the DROP INDEX rollback is safe to run immediately under load.</p>

<h2 id="drop-column-data-is-gone">DROP COLUMN: Data Is Gone</h2>

<p>This is where the rollback plan becomes a warning.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dbsafe plan <span class="s2">"ALTER TABLE orders DROP COLUMN fulfillment_notes"</span>
</code></pre></div></div>

<p><img src="/assets/img/gallery/dbsafe-rollback-drop-column.png" alt="dbsafe output for DROP COLUMN showing data loss warning — no SQL rollback possible, values are permanently destroyed on commit" /></p>

<p>dbsafe does not generate <code class="language-plaintext highlighter-rouge">ADD COLUMN fulfillment_notes TEXT</code> as the rollback. That SQL would restore the column definition but not the data stored in it across all existing rows. A misleading rollback is worse than no rollback — it creates the false impression that you can recover something you can’t.</p>

<p>Instead, the rollback section surfaces a clear warning: <strong>this operation permanently destroys data</strong>. After the commit, the only recovery path is restoring from a backup.</p>

<p>Before running any <code class="language-plaintext highlighter-rouge">DROP COLUMN</code> on a production table:</p>

<ol>
  <li>Confirm the column is genuinely unused — check application code, ORM models, stored procedures, and views</li>
  <li>Verify you have a recent backup and you know how to restore from it</li>
  <li>Consider a deprecation buffer: rename the column to <code class="language-plaintext highlighter-rouge">_col_deprecated_YYYYMMDD</code> and wait two full deployment cycles before dropping. If something breaks, you rename it back in milliseconds</li>
  <li>On MySQL 8.0.29+, the physical bytes stay on disk until the next rebuild (<code class="language-plaintext highlighter-rouge">OPTIMIZE TABLE</code> or a COPY-algorithm ALTER) — but they’re completely inaccessible to MySQL and cannot be queried</li>
</ol>

<p>The rename strategy is underused. It costs nothing operationally — a <code class="language-plaintext highlighter-rouge">RENAME COLUMN</code> is <a href="/mysql/tools/2026/02/21/instant-ddl-mysql-dbsafe.html">INSTANT on MySQL 8.0.28+</a> — and it buys you weeks of confidence before the irreversible step.</p>
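<p>The strategy is easy to automate. Below is a minimal sketch of the dated-suffix rename as plain string templating; the helper name and suffix format are illustrative, not a dbsafe feature:</p>

```python
from datetime import date

def deprecation_rename(table: str, column: str, on: date = None) -> dict:
    """Forward (deprecate) and rollback (restore) RENAME COLUMN statements
    for the dated-suffix strategy. Pure string templating; nothing here
    talks to MySQL."""
    stamp = (on or date.today()).strftime("%Y%m%d")
    hidden = f"_{column}_deprecated_{stamp}"
    return {
        # Both directions are INSTANT on MySQL 8.0.28+, so each runs in milliseconds
        "forward":  f"ALTER TABLE {table} RENAME COLUMN {column} TO {hidden};",
        "rollback": f"ALTER TABLE {table} RENAME COLUMN {hidden} TO {column};",
    }

plan = deprecation_rename("orders", "fulfillment_notes", on=date(2026, 3, 31))
print(plan["forward"])
# ALTER TABLE orders RENAME COLUMN fulfillment_notes TO _fulfillment_notes_deprecated_20260331;
```

<p>Keep both statements in the runbook; after two clean deployment cycles, the final <code class="language-plaintext highlighter-rouge">DROP COLUMN</code> goes through the same dbsafe review as any other change.</p>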

<h2 id="mysql-8029-instant-drop">MySQL 8.0.29+: INSTANT Drop</h2>

<p>On MySQL 8.0.29 or newer, <code class="language-plaintext highlighter-rouge">DROP COLUMN</code> executes as <a href="/mysql/tools/2026/02/21/instant-ddl-mysql-dbsafe.html">INSTANT DDL</a>. The column is marked as dropped in the InnoDB data dictionary and the change commits in milliseconds, regardless of table size (with limited exceptions: <code class="language-plaintext highlighter-rouge">ROW_FORMAT=COMPRESSED</code> tables, tables with a <code class="language-plaintext highlighter-rouge">FULLTEXT</code> index, and tables that have exhausted the 64 INSTANT row-version limit).</p>

<p>The data destruction caveat is unchanged. INSTANT execution means the structural change is fast — it doesn’t mean the data is preserved or recoverable. What it does change for the rollback picture:</p>

<ul>
  <li><strong>No long-running ALTER to interrupt</strong> — with the COPY algorithm, the data-copy phase can take minutes or hours, giving you a window to kill the query before the final table swap. With INSTANT, the commit is atomic and immediate. There is no window to intervene after you press Enter.</li>
  <li><strong>Physical bytes remain until the next rebuild</strong> — the column data is invisible to MySQL and cannot be accessed via SQL. A full table rebuild — via <code class="language-plaintext highlighter-rouge">OPTIMIZE TABLE</code>, <code class="language-plaintext highlighter-rouge">ALTER TABLE ... FORCE</code>, or <code class="language-plaintext highlighter-rouge">ALTER TABLE ... ENGINE=InnoDB</code> — is the only way to reclaim the disk space.</li>
</ul>

<p>dbsafe surfaces a note when the <code class="language-plaintext highlighter-rouge">DROP COLUMN</code> will execute as INSTANT, so you can set correct expectations for both timing and the permanence of the change.</p>

<h2 id="rollback-plans-in-practice">Rollback Plans in Practice</h2>

<p>The rollback section is designed for one specific use case: runbook preparation.</p>

<p>When you’re preparing a change for a production deployment window, capture both the forward SQL and the rollback SQL in your runbook before the window opens. During the change, if something goes wrong, you paste from a document you reviewed hours earlier — not SQL you’re writing under pressure while the on-call pager fires.</p>

<p>The workflow:</p>

<ol>
  <li><strong>Write your <code class="language-plaintext highlighter-rouge">ALTER TABLE</code></strong> statement</li>
  <li><strong>Run <code class="language-plaintext highlighter-rouge">dbsafe plan</code></strong> against your production server or a production-scale replica</li>
  <li><strong>Copy the rollback SQL</strong> into your runbook alongside the forward operation</li>
  <li><strong>Note the rollback category</strong>: structural (reversible) or destructive (backup required)</li>
  <li><strong>Execute the forward change</strong> during the window</li>
  <li><strong>If rollback needed</strong>: paste and run the rollback SQL immediately</li>
</ol>

<p>For <a href="/mysql/tools/2026/02/22/copy-algorithm-mysql-dbsafe.html">COPY-algorithm operations</a> executed via gh-ost or pt-online-schema-change, the rollback window is wider. Both tools maintain the original table until the final cutover: gh-ost keeps the <code class="language-plaintext highlighter-rouge">_orders_gho</code> shadow table, pt-osc keeps <code class="language-plaintext highlighter-rouge">_orders_new</code>. Aborting before cutover leaves the original table completely untouched — the rollback is just stopping the tool. After cutover, you’re in the same position as a direct ALTER, and the dbsafe rollback SQL applies.</p>

<blockquote>
  <p><strong>Related:</strong> After a schema change rollback, monitor for InnoDB mutex contention — especially if the change involved a heavily-written table. See <a href="/mysql/innodb/2024/09/01/innodb-semaphore-contention.html">Contention in MySQL InnoDB</a> for how to read the <code class="language-plaintext highlighter-rouge">SEMAPHORES</code> section of <code class="language-plaintext highlighter-rouge">SHOW ENGINE INNODB STATUS</code> and interpret elevated wait counts after DDL.</p>
</blockquote>

<h2 id="summary">Summary</h2>

<ol>
  <li><strong>dbsafe generates rollback SQL automatically</strong> for every <code class="language-plaintext highlighter-rouge">ALTER TABLE</code> plan. It’s included in the standard output, no extra flags needed.</li>
  <li><strong>ADD COLUMN and ADD INDEX have clean structural rollbacks</strong> — drop the same column or index. No data loss, and the rollback operation is often faster than the forward operation.</li>
  <li><strong>DROP COLUMN is irreversible</strong> — dbsafe shows a data loss warning instead of misleading rollback SQL. The only real recovery is a backup.</li>
  <li><strong>MySQL 8.0.29+ makes DROP COLUMN INSTANT</strong> — milliseconds, but the data destruction is still permanent. No window to intervene after commit.</li>
  <li><strong>Capture rollback SQL in your runbook before the change window</strong> — not while the incident is happening.</li>
</ol>

<p>Happy (safe) schema changes!</p>

<h2 id="references">References</h2>

<p><strong>MySQL Official Documentation:</strong></p>
<ul>
  <li><a href="https://dev.mysql.com/doc/refman/8.0/en/alter-table.html">ALTER TABLE Statement — MySQL 8.0</a></li>
  <li><a href="https://dev.mysql.com/doc/refman/8.0/en/innodb-online-ddl-operations.html">Online DDL Operations — MySQL 8.0</a></li>
  <li><a href="https://dev.mysql.com/doc/refman/8.0/en/innodb-online-ddl.html">InnoDB and Online DDL — MySQL 8.0</a></li>
</ul>

<p><strong>Tools:</strong></p>
<ul>
  <li><a href="https://github.com/nethalo/dbsafe">dbsafe — GitHub Repository</a></li>
  <li><a href="https://github.com/github/gh-ost">gh-ost — GitHub’s Online Schema Migration Tool</a></li>
  <li><a href="https://docs.percona.com/percona-toolkit/pt-online-schema-change.html">pt-online-schema-change — Percona Toolkit</a></li>
</ul>

<p><strong>Related Posts:</strong></p>
<ul>
  <li><a href="/mysql/tools/2026/02/14/introducing-dbsafe-know-before-you-alter.html">Introducing dbsafe: Know Before You ALTER</a></li>
  <li><a href="/mysql/tools/2026/02/21/instant-ddl-mysql-dbsafe.html">Zero-Downtime Schema Changes with INSTANT DDL</a></li>
  <li><a href="/mysql/tools/2026/02/22/copy-algorithm-mysql-dbsafe.html">When MySQL Rebuilds Your Table: Understanding COPY Algorithm DDL</a></li>
  <li><a href="/mysql/innodb/2024/09/01/innodb-semaphore-contention.html">Contention in MySQL InnoDB</a></li>
</ul>]]></content>
    <author>
      <name>Daniel Guzman Burgos</name>
    </author>
    <category term="mysql"/>
    <category term="tools"/>
    <category term="ddl"/>
    <category term="schema-changes"/>
    <category term="dbsafe"/>
    <category term="rollback"/>
    <category term="innodb"/>
    <summary type="html"><![CDATA[Learn how dbsafe auto-generates rollback SQL for every schema change — with data loss warnings for DROP COLUMN and INSTANT drop notes for MySQL 8.0.29+.]]></summary>
  </entry>
  <entry>
    <title type="html">When MySQL Rebuilds Your Table: Understanding COPY Algorithm DDL</title>
    <link href="https://rendiment.io/mysql/tools/2026/02/22/copy-algorithm-mysql-dbsafe.html" rel="alternate" type="text/html" title="When MySQL Rebuilds Your Table: Understanding COPY Algorithm DDL"/>
    <published>2026-02-22T00:00:00-05:00</published>
    <updated>2026-02-22T00:00:00-05:00</updated>
    <id>https://rendiment.io/mysql/tools/2026/02/22/copy-algorithm-mysql-dbsafe</id>
    <content type="html" xml:base="https://rendiment.io/mysql/tools/2026/02/22/copy-algorithm-mysql-dbsafe.html"><![CDATA[<p>You run <code class="language-plaintext highlighter-rouge">ALTER TABLE orders MODIFY COLUMN status VARCHAR(100)</code> on your production table. It looks simple — the column already exists, you’re just increasing the size limit. Then you watch the operation spin for 40 minutes while your application throws lock timeout errors and your replica falls an hour behind. MySQL didn’t update the column definition. It built an entirely new copy of the table from scratch.</p>

<p>This is the COPY algorithm: the most disruptive class of <code class="language-plaintext highlighter-rouge">ALTER TABLE</code> operations. Unlike the <a href="/mysql/tools/2026/02/21/instant-ddl-mysql-dbsafe.html">INSTANT DDL operations covered in the previous post</a>, COPY operations touch every row, block DML for the duration, and can take hours on large tables — or days.</p>

<p>This post covers exactly which operations trigger COPY, what physically happens at the InnoDB level, how table size translates to actual risk, and how <a href="/mysql/tools/2026/02/14/introducing-dbsafe-know-before-you-alter.html">dbsafe</a> detects the algorithm and generates the mitigation commands you need.</p>

<blockquote>
  <p><strong>Related:</strong> This is the third post in the dbsafe series. <a href="/mysql/tools/2026/02/14/introducing-dbsafe-know-before-you-alter.html">Introducing dbsafe</a> covers installation and the full analysis capabilities. <a href="/mysql/tools/2026/02/21/instant-ddl-mysql-dbsafe.html">Zero-Downtime Schema Changes with INSTANT DDL</a> covers the safe alternative — the metadata-only operations that don’t touch rows at all.</p>
</blockquote>

<h2 id="the-problem-alter-table-as-a-full-table-rebuild">The Problem: ALTER TABLE as a Full Table Rebuild</h2>

<p>When MySQL uses the COPY algorithm, it doesn’t modify your table in place. It executes a sequence of physical operations that are equivalent to recreating the table from scratch:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- What MySQL does internally during a COPY algorithm ALTER TABLE:</span>

<span class="c1">-- Step 1: Create a new shadow table with the modified structure</span>
<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">orders_new</span> <span class="k">LIKE</span> <span class="n">orders</span><span class="p">;</span>
<span class="k">ALTER</span> <span class="k">TABLE</span> <span class="n">orders_new</span> <span class="k">MODIFY</span> <span class="k">COLUMN</span> <span class="n">status</span> <span class="nb">VARCHAR</span><span class="p">(</span><span class="mi">100</span><span class="p">);</span>

<span class="c1">-- Step 2: Copy every row from the original to the shadow table</span>
<span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">orders_new</span> <span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">orders</span><span class="p">;</span>

<span class="c1">-- Step 3: Acquire an exclusive lock and drop the original</span>
<span class="k">LOCK</span> <span class="n">TABLES</span> <span class="n">orders</span> <span class="k">WRITE</span><span class="p">;</span>
<span class="k">DROP</span> <span class="k">TABLE</span> <span class="n">orders</span><span class="p">;</span>

<span class="c1">-- Step 4: Rename the shadow table to the original name</span>
<span class="k">RENAME</span> <span class="k">TABLE</span> <span class="n">orders_new</span> <span class="k">TO</span> <span class="n">orders</span><span class="p">;</span>
<span class="n">UNLOCK</span> <span class="n">TABLES</span><span class="p">;</span>
</code></pre></div></div>

<p>No rows are skipped. The operation is as expensive as it looks — a full sequential read of every row, a full write into a new table, and a metadata swap under an exclusive lock.</p>

<p>Four risk factors compound this:</p>

<ol>
  <li><strong>Shared lock</strong> — the table is locked for writes (DML) but reads (<code class="language-plaintext highlighter-rouge">SELECT</code>) continue under <code class="language-plaintext highlighter-rouge">LOCK=SHARED</code> during the copy. Applications cannot insert, update, or delete rows, but queries can still read.</li>
  <li><strong>Disk space doubles</strong> — the new table must exist alongside the original until the rename. A 200GB table requires 200GB of free disk space during the operation.</li>
  <li><strong>Replication lag</strong> — replicas must independently re-execute the full <code class="language-plaintext highlighter-rouge">ALTER TABLE</code>, creating lag proportional to table size and I/O throughput.</li>
  <li><strong>Duration scales with row count</strong> — 10 million rows takes roughly 10× longer than 1 million rows. There’s no shortcut.</li>
</ol>

<p>INPLACE operations are better — they often avoid the row-by-row copy — but can still require an internal data rebuild and may hold metadata locks. INSTANT is the only truly lock-free path, and it only applies to <a href="/mysql/tools/2026/02/21/instant-ddl-mysql-dbsafe.html">a specific set of operations</a>.</p>

<h2 id="the-orders-table">The Orders Table</h2>

<p>The examples below use the same <code class="language-plaintext highlighter-rouge">orders</code> table introduced in the <a href="/mysql/tools/2026/02/21/instant-ddl-mysql-dbsafe.html">INSTANT DDL post</a>: 21 columns, 7 indexes, and a foreign key to <code class="language-plaintext highlighter-rouge">customers</code>. It’s a realistic production schema where algorithm choices have real consequences.</p>

<p>Refer to the INSTANT DDL post for the full <code class="language-plaintext highlighter-rouge">CREATE TABLE</code> statement. The column names used in the examples below — <code class="language-plaintext highlighter-rouge">status VARCHAR(20)</code>, <code class="language-plaintext highlighter-rouge">total_amount DECIMAL(12,2)</code>, and the mixed-charset string columns — are all part of that schema.</p>

<h2 id="modify-column-expanding-a-varchar">MODIFY COLUMN: Expanding a VARCHAR</h2>

<p>The single most common accidental COPY trigger. A product manager asks for longer status values, so you expand <code class="language-plaintext highlighter-rouge">status VARCHAR(20)</code> to <code class="language-plaintext highlighter-rouge">VARCHAR(100)</code>. Same column, same data type, just a bigger limit — surely that’s instant?</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dbsafe plan <span class="s2">"ALTER TABLE orders MODIFY COLUMN status VARCHAR(100)"</span>
</code></pre></div></div>

<p><img src="/assets/img/gallery/dbsafe-copy-modify-varchar.png" alt="dbsafe output showing COPY algorithm, SHARED lock, DANGEROUS risk, and pt-osc recommendation for MODIFY COLUMN status VARCHAR(100) — full output including pt-online-schema-change command with --preserve-triggers visible" /></p>

<p>dbsafe reports <code class="language-plaintext highlighter-rouge">Algorithm: COPY</code>, <code class="language-plaintext highlighter-rouge">Lock: SHARED</code>, and a <code class="language-plaintext highlighter-rouge">Dangerous</code> risk assessment. The analysis includes disk space required, an explanation of why gh-ost cannot be used (the table has triggers), and a ready-to-run <code class="language-plaintext highlighter-rouge">pt-online-schema-change</code> command with <code class="language-plaintext highlighter-rouge">--preserve-triggers</code>.</p>

<p>The reason this triggers COPY comes down to how InnoDB stores variable-length columns. MySQL’s row format encodes the length of each <code class="language-plaintext highlighter-rouge">VARCHAR</code> value using 1 or 2 bytes depending on the declared maximum:</p>

<ul>
  <li><strong>1-byte length prefix</strong>: VARCHAR where <code class="language-plaintext highlighter-rouge">max_chars × bytes_per_char ≤ 255</code></li>
  <li><strong>2-byte length prefix</strong>: VARCHAR where <code class="language-plaintext highlighter-rouge">max_chars × bytes_per_char &gt; 255</code></li>
</ul>

<p>Crossing that 255-byte boundary forces an on-disk format change that requires rewriting every row — triggering COPY.</p>

<p>The charset is the critical variable. With <code class="language-plaintext highlighter-rouge">latin1</code> (1 byte/char), <code class="language-plaintext highlighter-rouge">VARCHAR(20)</code> = 20 bytes and <code class="language-plaintext highlighter-rouge">VARCHAR(100)</code> = 100 bytes — both stay under 255, so the expansion is INPLACE. With <code class="language-plaintext highlighter-rouge">utf8mb4</code> (up to 4 bytes/char), <code class="language-plaintext highlighter-rouge">VARCHAR(20)</code> = 80 bytes and <code class="language-plaintext highlighter-rouge">VARCHAR(100)</code> = 400 bytes — crossing the boundary triggers COPY. The <code class="language-plaintext highlighter-rouge">orders</code> table uses <code class="language-plaintext highlighter-rouge">utf8mb4</code>, which is why dbsafe reports COPY here.</p>
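<p>The boundary check is simple arithmetic you can sanity-check before even running dbsafe. A sketch, with worst-case byte widths hardcoded for illustration:</p>

```python
BYTES_PER_CHAR = {"latin1": 1, "utf8mb3": 3, "utf8mb4": 4}  # worst-case encoded width

def length_prefix_bytes(max_chars: int, charset: str) -> int:
    """InnoDB stores each VARCHAR value's length in 1 byte when the maximum
    encoded size fits in 255 bytes, otherwise in 2 bytes."""
    return 1 if max_chars * BYTES_PER_CHAR[charset] <= 255 else 2

def expansion_crosses_boundary(old_chars: int, new_chars: int, charset: str) -> bool:
    """True when the new size needs a wider length prefix: an on-disk
    format change, which rules out the cheap in-place path."""
    return length_prefix_bytes(old_chars, charset) != length_prefix_bytes(new_chars, charset)

print(expansion_crosses_boundary(20, 100, "latin1"))   # False: 20 and 100 bytes, same prefix
print(expansion_crosses_boundary(20, 100, "utf8mb4"))  # True: 80 bytes vs 400 bytes
```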

<p>The practical rule: <strong>any <code class="language-plaintext highlighter-rouge">MODIFY COLUMN</code> that changes size, type, nullability, or charset will use COPY or INPLACE — never INSTANT.</strong></p>

<h2 id="change-column-rename-with-type-change">CHANGE COLUMN: Rename with Type Change</h2>

<p><code class="language-plaintext highlighter-rouge">CHANGE COLUMN</code> is the syntax for renaming a column while optionally changing its type. You want to rename <code class="language-plaintext highlighter-rouge">total_amount</code> to <code class="language-plaintext highlighter-rouge">amount</code> and change the precision from <code class="language-plaintext highlighter-rouge">DECIMAL(12,2)</code> to <code class="language-plaintext highlighter-rouge">DECIMAL(14,4)</code> to track fractional amounts more precisely:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dbsafe plan <span class="s2">"ALTER TABLE orders CHANGE COLUMN total_amount amount DECIMAL(14,4)"</span>
</code></pre></div></div>

<p><img src="/assets/img/gallery/dbsafe-copy-change-column.png" alt="dbsafe output showing COPY algorithm for CHANGE COLUMN with type change" /></p>

<p>The type change from <code class="language-plaintext highlighter-rouge">DECIMAL(12,2)</code> to <code class="language-plaintext highlighter-rouge">DECIMAL(14,4)</code> forces COPY. Changing decimal scale (the digits after the decimal point) modifies the internal binary encoding of every stored value — so every row must be rewritten.</p>

<p>All DECIMAL precision changes require <code class="language-plaintext highlighter-rouge">ALGORITHM=COPY</code> — there are no INSTANT or INPLACE exceptions like VARCHAR has (<a href="https://dev.mysql.com/doc/refman/8.0/en/innodb-online-ddl-operations.html">MySQL docs</a>). When in doubt, run <code class="language-plaintext highlighter-rouge">dbsafe plan</code> against your actual MySQL version before assuming the algorithm.</p>

<blockquote>
  <p><strong>Tip:</strong> If you only need to rename the column — without changing the type — use <code class="language-plaintext highlighter-rouge">RENAME COLUMN</code> instead of <code class="language-plaintext highlighter-rouge">CHANGE COLUMN</code>. <code class="language-plaintext highlighter-rouge">RENAME COLUMN</code> is available from MySQL 8.0.3+; it executes as INSTANT DDL from MySQL 8.0.28+ (INPLACE on earlier 8.0.x versions). Either way: no full row copy, milliseconds on any table size.</p>

  <div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- INSTANT on MySQL 8.0.3+ (rename only, no type change)</span>
<span class="k">ALTER</span> <span class="k">TABLE</span> <span class="n">orders</span> <span class="k">RENAME</span> <span class="k">COLUMN</span> <span class="n">total_amount</span> <span class="k">TO</span> <span class="n">amount</span><span class="p">;</span>

<span class="c1">-- COPY (scale change forces full rebuild)</span>
<span class="k">ALTER</span> <span class="k">TABLE</span> <span class="n">orders</span> <span class="n">CHANGE</span> <span class="k">COLUMN</span> <span class="n">total_amount</span> <span class="n">amount</span> <span class="nb">DECIMAL</span><span class="p">(</span><span class="mi">14</span><span class="p">,</span><span class="mi">4</span><span class="p">);</span>
</code></pre></div>  </div>
</blockquote>

<h2 id="character-set-conversion">Character Set Conversion</h2>

<p>Converting a table’s character set triggers a full table rebuild. The algorithm MySQL uses — COPY or INPLACE — depends on whether the table has indexes on character columns. Per <a href="https://dev.mysql.com/worklog/task/?id=11605">WL#11605</a>, collation changes on indexed columns cannot be performed inplace. If any <code class="language-plaintext highlighter-rouge">VARCHAR</code>, <code class="language-plaintext highlighter-rouge">CHAR</code>, or <code class="language-plaintext highlighter-rouge">TEXT</code> column involved in the conversion is part of an index, MySQL falls back to <code class="language-plaintext highlighter-rouge">ALGORITHM=COPY</code>.</p>

<p>Even when INPLACE is possible (tables with no indexes on string columns), <code class="language-plaintext highlighter-rouge">CONVERT TO CHARACTER SET</code> does not permit concurrent DML: the table is blocked for writes during the entire rebuild. Contrast this with <code class="language-plaintext highlighter-rouge">ALTER TABLE ... CHARACTER SET = ...</code>, which only changes the table default without converting existing columns, and which therefore does allow concurrent DML under INPLACE.</p>

<p>For the <code class="language-plaintext highlighter-rouge">orders</code> table, four indexes reference VARCHAR columns (<code class="language-plaintext highlighter-rouge">uq_order_number</code>, <code class="language-plaintext highlighter-rouge">idx_status</code>, <code class="language-plaintext highlighter-rouge">idx_payment_method</code>, <code class="language-plaintext highlighter-rouge">idx_status_created</code>), so this conversion uses COPY. When you run:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dbsafe plan <span class="s2">"ALTER TABLE orders CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci"</span>
</code></pre></div></div>

<p><img src="/assets/img/gallery/dbsafe-copy-charset-conversion.png" alt="dbsafe output showing COPY algorithm for CONVERT TO CHARACTER SET utf8mb4" /></p>

<p>Every string column in the table must be re-encoded against the new character set. For <code class="language-plaintext highlighter-rouge">utf8mb3</code> to <code class="language-plaintext highlighter-rouge">utf8mb4</code>, the stored bytes don’t actually change: <code class="language-plaintext highlighter-rouge">utf8mb3</code> is a strict subset of <code class="language-plaintext highlighter-rouge">utf8mb4</code>, so every valid <code class="language-plaintext highlighter-rouge">utf8mb3</code> byte sequence is already valid <code class="language-plaintext highlighter-rouge">utf8mb4</code>. But the jump in maximum bytes-per-character (3 to 4) still changes column metadata, index key lengths, and the VARCHAR length-prefix calculations, which is why the table must be rebuilt. When converting from a non-subset charset like <code class="language-plaintext highlighter-rouge">latin1</code>, the data bytes do change, and MySQL cannot know in advance which characters need a wider encoding, so it rewrites every string value in every row.</p>

<p>Two additional risks specific to charset conversions:</p>

<ul>
  <li><strong>Table size can grow</strong> — if your data contains characters above ASCII (accented letters, emojis, non-Latin scripts), <code class="language-plaintext highlighter-rouge">utf8mb4</code> encoding uses more bytes than <code class="language-plaintext highlighter-rouge">latin1</code>. A 100GB table can become 120GB post-conversion.</li>
  <li><strong>Index prefix limits</strong> — <code class="language-plaintext highlighter-rouge">utf8mb4</code> uses up to 4 bytes per character. A <code class="language-plaintext highlighter-rouge">VARCHAR(255)</code> with <code class="language-plaintext highlighter-rouge">utf8mb4</code> requires 1020 bytes for a full-column index, which exceeds InnoDB’s 767-byte index prefix limit on the <code class="language-plaintext highlighter-rouge">COMPACT</code> row format. If you have indexes on long string columns, the conversion may fail unless you use the <code class="language-plaintext highlighter-rouge">DYNAMIC</code> or <code class="language-plaintext highlighter-rouge">COMPRESSED</code> row format (the default since MySQL 5.7.9).</li>
</ul>
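<p>The prefix-limit math from the second bullet checks out the same way. A sketch, using InnoDB's documented limits of 767 bytes (COMPACT/REDUNDANT row formats) and 3072 bytes (DYNAMIC/COMPRESSED):</p>

```python
COMPACT_INDEX_LIMIT = 767    # bytes, COMPACT/REDUNDANT row formats
DYNAMIC_INDEX_LIMIT = 3072   # bytes, DYNAMIC/COMPRESSED row formats

def full_column_key_bytes(max_chars: int, bytes_per_char: int) -> int:
    """Worst-case key size for a full-column index on a VARCHAR."""
    return max_chars * bytes_per_char

key = full_column_key_bytes(255, 4)     # VARCHAR(255) in utf8mb4
print(key)                              # 1020
print(key > COMPACT_INDEX_LIMIT)        # True: fails on COMPACT
print(key <= DYNAMIC_INDEX_LIMIT)       # True: fits under DYNAMIC
```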

<p>For large tables with mixed-charset data, dbsafe’s estimated duration and disk space requirements are the numbers to review carefully before scheduling the operation.</p>

<h2 id="when-dbsafe-recommends-gh-ost-or-pt-online-schema-change">When dbsafe Recommends gh-ost or pt-online-schema-change</h2>

<p>When dbsafe detects a COPY algorithm operation, it doesn’t just warn you — it generates the exact migration tool command to use. The goal is zero-downtime: instead of taking an exclusive lock while copying rows, gh-ost and pt-osc copy rows in the background while the table remains fully writable.</p>

<p>You already saw the full output in the MODIFY COLUMN screenshot above. The bottom of the dbsafe report includes a complete <code class="language-plaintext highlighter-rouge">pt-online-schema-change</code> command with your server’s connection parameters, the table name, and the <code class="language-plaintext highlighter-rouge">--alter</code> flag pre-populated with your statement. You copy, review, and run it.</p>

<p>For the <code class="language-plaintext highlighter-rouge">orders</code> table in our demo, dbsafe recommended pt-osc — not gh-ost — because <code class="language-plaintext highlighter-rouge">orders</code> has two triggers (<code class="language-plaintext highlighter-rouge">trg_orders_after_update</code>, <code class="language-plaintext highlighter-rouge">trg_orders_after_delete</code>). gh-ost explicitly does not support tables with existing triggers — this is a documented hard limitation. gh-ost is triggerless by design: it captures row changes via binlog streaming rather than installing triggers. But when the source table has its own triggers, those triggers fire during gh-ost’s row copy and can produce unexpected side effects (double-firing, inconsistent data). gh-ost refuses to run in this case. dbsafe detects triggers via <code class="language-plaintext highlighter-rouge">information_schema.TRIGGERS</code> and switches the recommendation to pt-osc automatically.</p>

<p><strong>The full decision matrix for which tool dbsafe recommends:</strong></p>

<ul>
  <li>
    <p><strong>Table has triggers → pt-online-schema-change</strong> — gh-ost cannot operate on tables with existing triggers (known limitation). dbsafe checks for triggers first and routes to pt-osc, which handles triggers correctly via <code class="language-plaintext highlighter-rouge">--preserve-triggers</code>.</p>
  </li>
  <li>
    <p><strong>Galera/PXC cluster → pt-online-schema-change</strong> — gh-ost has known incompatibilities with Galera/PXC due to differences in how DDL and locking interact with writeset replication. pt-osc uses standard SQL DML that replicates correctly through wsrep. dbsafe detects the cluster topology and switches the recommendation automatically.</p>
  </li>
  <li>
    <p><strong>Amazon Aurora → pt-online-schema-change</strong> — gh-ost requires additional configuration and workarounds to run against Aurora (<code class="language-plaintext highlighter-rouge">--allow-on-master</code>, binary log configuration). pt-osc works correctly without extra configuration.</p>
  </li>
  <li>
    <p><strong>Standalone MySQL or async replication, no triggers → gh-ost (default)</strong> — gh-ost uses binlog streaming rather than triggers, making it pausable, throttleable, and safer for high-write environments. It’s the preferred tool when no triggers or cluster topology prevents it.</p>
  </li>
</ul>
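<p>The routing logic itself is small. A sketch of the matrix above as code (dbsafe's actual implementation may differ in detail):</p>

```python
def recommend_tool(has_triggers: bool, is_galera: bool, is_aurora: bool) -> str:
    """Mirror the decision matrix: any of the three constraints routes to
    pt-osc; otherwise gh-ost is the binlog-based default."""
    if has_triggers:
        return "pt-online-schema-change"   # gh-ost refuses tables with triggers
    if is_galera or is_aurora:
        return "pt-online-schema-change"   # topology constraints
    return "gh-ost"                        # pausable, throttleable default

# The demo orders table: two triggers, standalone MySQL
print(recommend_tool(has_triggers=True, is_galera=False, is_aurora=False))
# pt-online-schema-change
```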

<blockquote>
  <p>A future post will cover the full decision matrix for gh-ost vs pt-osc in detail, including throttling configuration, chunk sizing, and monitoring during execution.</p>
</blockquote>

<h2 id="risk-assessment-table-size-matters">Risk Assessment: Table Size Matters</h2>

<p>dbsafe factors table size into its risk assessment and estimated duration. The rough sizing guide:</p>

<table>
  <thead>
    <tr>
      <th>Table Size</th>
      <th>Estimated Duration</th>
      <th>Risk Level</th>
      <th>Recommendation</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>&lt; 100 MB</td>
      <td>Seconds</td>
      <td>Moderate</td>
      <td>Native MySQL acceptable with caution</td>
    </tr>
    <tr>
      <td>100 MB – 1 GB</td>
      <td>Minutes</td>
      <td>High</td>
      <td>Use gh-ost or pt-osc, avoid peak hours</td>
    </tr>
    <tr>
      <td>1 GB – 10 GB</td>
      <td>Tens of minutes</td>
      <td>Dangerous</td>
      <td>gh-ost or pt-osc required</td>
    </tr>
    <tr>
      <td>10 GB – 100 GB</td>
      <td>Hours</td>
      <td>Critical</td>
      <td>gh-ost or pt-osc, staged rollout, monitoring</td>
    </tr>
    <tr>
      <td>&gt; 100 GB</td>
      <td>Many hours</td>
      <td>Critical</td>
      <td>Full migration plan, staging validation, on-call</td>
    </tr>
  </tbody>
</table>

<p>These estimates assume a reasonably loaded server with local SSD storage. NFS-mounted data directories, high concurrent write load, and row formats with large BLOBs all increase duration significantly. A <code class="language-plaintext highlighter-rouge">&lt; 100 MB</code> table under heavy write load can be more disruptive than a <code class="language-plaintext highlighter-rouge">1 GB</code> table on a quiet replica.</p>

<p>dbsafe’s duration estimate comes from the table’s current row count and average row size (read from <code class="language-plaintext highlighter-rouge">information_schema.TABLES</code>), so it reflects your actual data, not a generic estimate.</p>
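<p>The arithmetic behind such an estimate is straightforward. A sketch with an assumed copy throughput; the constant is illustrative, and dbsafe's own model is not reproduced here:</p>

```python
def estimated_copy_seconds(table_rows: int, avg_row_length: int,
                           copy_mb_per_sec: float = 10.0) -> float:
    """Rough COPY duration: bytes to rewrite divided by an assumed copy
    throughput. The two inputs mirror TABLE_ROWS and AVG_ROW_LENGTH from
    information_schema.TABLES; 10 MB/s is a deliberately conservative
    figure that leaves headroom for secondary index rebuilds."""
    total_mb = table_rows * avg_row_length / (1024 * 1024)
    return total_mb / copy_mb_per_sec

# 50M rows at ~200 bytes/row is ~9.5 GB: roughly 16 minutes at 10 MB/s
print(round(estimated_copy_seconds(50_000_000, 200) / 60))  # 16
```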

<h2 id="practical-workflow-for-copy-operations">Practical Workflow for COPY Operations</h2>

<p>The workflow when you suspect or confirm a COPY operation:</p>

<ol>
  <li><strong>Write your <code class="language-plaintext highlighter-rouge">ALTER TABLE</code> statement</strong> — start with what you actually need</li>
  <li><strong>Run <code class="language-plaintext highlighter-rouge">dbsafe plan</code></strong> against your production server or a replica with production-scale data</li>
  <li><strong>Confirm the algorithm</strong> — if COPY, do not execute natively on a large table</li>
  <li><strong>Check the table size</strong> — dbsafe shows current size; compare against the sizing guide above</li>
  <li><strong>Review the generated command</strong> — gh-ost or pt-osc, pre-populated by dbsafe</li>
  <li><strong>Test on staging</strong> — run the full migration tool command on a staging server with a production-size dataset</li>
  <li><strong>Execute with gh-ost or pt-osc</strong> — during a lower-traffic window, with throttling configured</li>
  <li><strong>Verify after completion</strong> — check row counts, spot-check data, confirm replication is caught up</li>
</ol>
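<p>Step 5 is worth slowing down for. The sketch below assembles a gh-ost command line similar in shape to what dbsafe generates, then prints it for human review; the host, database, and threshold values are placeholders, not tool output:</p>

```shell
# Hypothetical gh-ost invocation built for review. Every value is a
# placeholder; the flags themselves (--alter, --chunk-size, --max-lag-millis,
# --postpone-cut-over-flag-file, --execute) are real gh-ost options.
DB="shop"
TABLE="orders"
ALTER="MODIFY COLUMN status VARCHAR(30)"

CMD="gh-ost --host=replica-1.example.com --database=$DB --table=$TABLE \
--alter=\"$ALTER\" --chunk-size=1000 --max-lag-millis=1500 \
--postpone-cut-over-flag-file=/tmp/gh-ost.postpone --execute"

# Print for review instead of running directly
echo "$CMD"
```

<p>The postpone flag file keeps the cut-over from happening until you remove the file, which gives you a natural checkpoint for step 8’s verification.</p>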

<blockquote>
  <p><strong>Related:</strong> Heavy DDL operations — even when run via gh-ost — can cause InnoDB mutex and semaphore contention. See <a href="/mysql/innodb/2024/09/01/innodb-semaphore-contention.html">Contention in MySQL InnoDB</a> for how to detect contention using <code class="language-plaintext highlighter-rouge">SHOW ENGINE INNODB STATUS</code> during and after schema operations. The <code class="language-plaintext highlighter-rouge">SEMAPHORES</code> section will show elevated waits if the operation is stressing the buffer pool or lock subsystem.</p>
</blockquote>

<p>For automated pipelines, <code class="language-plaintext highlighter-rouge">dbsafe plan --format json</code> lets you extract the algorithm and generated command programmatically. If the algorithm is not INSTANT, the pipeline can halt and present the migration command for human review before any production change proceeds.</p>
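<p>A minimal sketch of such a gate, run here against a hard-coded sample payload (the JSON field names are assumed for illustration; check the real <code class="language-plaintext highlighter-rouge">dbsafe plan --format json</code> output for the actual schema):</p>

```shell
# Sample payload standing in for `dbsafe plan --format json` output.
# Field names are assumed for illustration.
RESULT='{"algorithm":"COPY","risk":"CRITICAL","command":"gh-ost ..."}'

# The sample is flat JSON, so a sed capture is enough; use jq for real output.
ALGORITHM=$(printf '%s' "$RESULT" | sed -n 's/.*"algorithm":"\([^"]*\)".*/\1/p')

if [ "$ALGORITHM" != "INSTANT" ]; then
  echo "Halting pipeline: algorithm is $ALGORITHM, human review required"
fi
```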

<h2 id="summary">Summary</h2>

<ol>
  <li><strong>COPY algorithm means a full table duplicate</strong> — MySQL creates a new table, copies every row, then swaps. Disk space doubles temporarily. DML (writes) is blocked throughout under <code class="language-plaintext highlighter-rouge">LOCK=SHARED</code>, but reads can continue.</li>
  <li><strong>The most common COPY triggers are</strong> <code class="language-plaintext highlighter-rouge">MODIFY COLUMN</code> (any size or type change that crosses the VARCHAR length-prefix boundary or changes binary encoding), <code class="language-plaintext highlighter-rouge">CHANGE COLUMN</code> with a type change, and dropping a primary key without replacement. Charset conversions (<code class="language-plaintext highlighter-rouge">CONVERT TO CHARACTER SET</code>) use COPY when the table has indexes on character columns — which is the common case for production tables. Even when INPLACE is possible, concurrent DML is not permitted. In either case, the physical work is a full row-by-row rebuild.</li>
  <li><strong>Risk scales with table size</strong> — a COPY on a 500GB table takes hours; a COPY on a 50MB table takes seconds. dbsafe estimates duration from your actual row count and row size.</li>
  <li><strong>dbsafe detects the algorithm and generates the mitigation command</strong> — gh-ost for standalone and async replication, pt-osc for triggered tables, Galera/PXC clusters, and Aurora.</li>
  <li><strong>Always run <code class="language-plaintext highlighter-rouge">dbsafe plan</code> before any production schema change</strong> — especially for operations that look innocent, like expanding a VARCHAR or renaming a column with a type change.</li>
</ol>

<p>Happy (safe) schema changes!</p>

<h2 id="references">References</h2>

<p><strong>MySQL Official Documentation:</strong></p>
<ul>
  <li><a href="https://dev.mysql.com/doc/refman/8.0/en/alter-table.html">ALTER TABLE Statement — MySQL 8.0</a></li>
  <li><a href="https://dev.mysql.com/doc/refman/8.0/en/innodb-online-ddl-operations.html">Online DDL Operations — MySQL 8.0</a></li>
  <li><a href="https://dev.mysql.com/doc/refman/8.0/en/innodb-online-ddl.html">InnoDB and Online DDL — MySQL 8.0</a></li>
  <li><a href="https://dev.mysql.com/doc/refman/8.0/en/innodb-row-format.html">InnoDB Row Formats — MySQL 8.0</a></li>
  <li><a href="https://dev.mysql.com/doc/refman/8.0/en/alter-table.html#alter-table-character-set">Character Set Conversion — MySQL 8.0</a></li>
  <li><a href="https://dev.mysql.com/doc/refman/8.0/en/alter-table.html">RENAME COLUMN — MySQL 8.0</a></li>
</ul>

<p><strong>Tools:</strong></p>
<ul>
  <li><a href="https://github.com/nethalo/dbsafe">dbsafe — GitHub Repository</a></li>
  <li><a href="https://github.com/github/gh-ost">gh-ost — GitHub’s Online Schema Migration Tool</a></li>
  <li><a href="https://github.com/github/gh-ost/blob/master/doc/triggerless-design.md">gh-ost Triggerless Design</a></li>
  <li><a href="https://docs.percona.com/percona-toolkit/pt-online-schema-change.html">pt-online-schema-change — Percona Toolkit</a></li>
</ul>

<p><strong>Related Posts:</strong></p>
<ul>
  <li><a href="/mysql/tools/2026/02/14/introducing-dbsafe-know-before-you-alter.html">Introducing dbsafe: Know Before You ALTER</a></li>
  <li><a href="/mysql/tools/2026/02/21/instant-ddl-mysql-dbsafe.html">Zero-Downtime Schema Changes with INSTANT DDL</a></li>
  <li><a href="/mysql/innodb/2024/09/01/innodb-semaphore-contention.html">Contention in MySQL InnoDB</a></li>
</ul>]]></content>
    <author>
      <name>Daniel Guzman Burgos</name>
    </author>
    <category term="mysql"/>
    <category term="tools"/>
    <category term="ddl"/>
    <category term="copy-algorithm"/>
    <category term="schema-changes"/>
    <category term="dbsafe"/>
    <category term="gh-ost"/>
    <category term="pt-online-schema-change"/>
    <summary type="html"><![CDATA[Understand MySQL COPY algorithm DDL that rebuilds entire tables. Learn why MODIFY COLUMN and charset conversions are dangerous, and how dbsafe mitigates risk.]]></summary>
  </entry>
  <entry>
    <title type="html">Zero-Downtime Schema Changes with MySQL 8.0 INSTANT DDL</title>
    <link href="https://rendiment.io/mysql/tools/2026/02/21/instant-ddl-mysql-dbsafe.html" rel="alternate" type="text/html" title="Zero-Downtime Schema Changes with MySQL 8.0 INSTANT DDL"/>
    <published>2026-02-21T00:00:00-05:00</published>
    <updated>2026-02-21T00:00:00-05:00</updated>
    <id>https://rendiment.io/mysql/tools/2026/02/21/instant-ddl-mysql-dbsafe</id>
    <content type="html" xml:base="https://rendiment.io/mysql/tools/2026/02/21/instant-ddl-mysql-dbsafe.html"><![CDATA[<p>You need to add a column to a 500-million-row production table. Traditionally, that means hours of disk I/O, replication lag that grows faster than you can drain it, and a maintenance window to tell your users about. With MySQL 8.0 INSTANT DDL, that same change completes in milliseconds — no table rebuild, no row copies, only a brief exclusive metadata lock during the commit phase.</p>

<p>This post covers exactly which operations qualify for INSTANT execution, how MySQL 8.0.29 extended the feature significantly, and how to use <a href="/mysql/tools/2026/02/14/introducing-dbsafe-know-before-you-alter.html">dbsafe</a> to verify the algorithm before you ever touch production.</p>

<blockquote>
  <p><strong>Related:</strong> New to dbsafe? The <a href="/mysql/tools/2026/02/14/introducing-dbsafe-know-before-you-alter.html">Introducing dbsafe</a> post covers installation, configuration, and the full range of analysis capabilities including DML, topology detection, and CI/CD integration.</p>
</blockquote>

<h2 id="the-problem-schema-changes-that-dont-need-to-be-painful">The Problem: Schema Changes That Don’t Need to Be Painful</h2>

<p>MySQL’s <code class="language-plaintext highlighter-rouge">ALTER TABLE</code> has three execution algorithms, and they have very different performance profiles:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- COPY: New table created, all rows copied, original dropped. Hours on large tables.</span>
<span class="k">ALTER</span> <span class="k">TABLE</span> <span class="n">orders</span> <span class="k">MODIFY</span> <span class="k">COLUMN</span> <span class="n">status</span> <span class="nb">VARCHAR</span><span class="p">(</span><span class="mi">30</span><span class="p">),</span> <span class="n">ALGORITHM</span><span class="o">=</span><span class="k">COPY</span><span class="p">;</span>

<span class="c1">-- INPLACE: Modified in-place without full row copy, but often still rebuilds data on disk.</span>
<span class="k">ALTER</span> <span class="k">TABLE</span> <span class="n">orders</span> <span class="k">ADD</span> <span class="k">INDEX</span> <span class="n">idx_tracking</span> <span class="p">(</span><span class="n">tracking_number</span><span class="p">),</span> <span class="n">ALGORITHM</span><span class="o">=</span><span class="n">INPLACE</span><span class="p">;</span>

<span class="c1">-- INSTANT: Metadata-only change. No rows touched. Milliseconds regardless of table size.</span>
<span class="k">ALTER</span> <span class="k">TABLE</span> <span class="n">orders</span> <span class="k">ADD</span> <span class="k">COLUMN</span> <span class="n">notes</span> <span class="nb">TEXT</span><span class="p">,</span> <span class="n">ALGORITHM</span><span class="o">=</span><span class="n">INSTANT</span><span class="p">;</span>
</code></pre></div></div>

<p>The key question is: does this specific operation need to touch data at all? Adding a column with a default of <code class="language-plaintext highlighter-rouge">NULL</code> doesn’t require reading or writing a single row — the column simply doesn’t exist yet in the physical data. MySQL can record that fact in the InnoDB data dictionary and be done.</p>

<p>The problem is knowing which operations qualify. That’s where the confusion (and production incidents) happen.</p>

<h2 id="what-is-instant-ddl">What Is INSTANT DDL?</h2>

<p>INSTANT DDL was introduced in <a href="https://dev.mysql.com/blog-archive/mysql-8-0-innodb-now-supports-instant-add-column/">MySQL 8.0.12 for trailing ADD COLUMN</a>. Instead of rebuilding the table’s physical storage, MySQL makes a metadata-only change: it updates the InnoDB data dictionary to record the new column definition, without touching any of the actual row data.</p>

<p>The key properties:</p>

<ul>
  <li><strong>No table rebuild</strong> — physical row data is not copied or reorganized</li>
  <li><strong>Brief metadata lock only</strong> — reads and writes proceed normally; a brief exclusive metadata lock is taken during the commit phase</li>
  <li><strong>Instant execution</strong> — completes in milliseconds regardless of table size (500 rows or 500 million)</li>
  <li><strong>Metadata-only</strong> — only the InnoDB data dictionary is modified</li>
</ul>

<p><a href="https://dev.mysql.com/blog-archive/mysql-8-0-instant-add-and-drop-columns/">MySQL 8.0.29 extended INSTANT DDL significantly</a>, adding support for adding columns at any position (not just trailing) and for dropping columns entirely — operations that previously always required a full rebuild.</p>

<h2 id="how-dbsafe-detects-instant-operations">How dbsafe Detects INSTANT Operations</h2>

<p>dbsafe connects to your MySQL server, checks the version, reads the table’s column structure, and maps your specific <code class="language-plaintext highlighter-rouge">ALTER TABLE</code> statement to the algorithm MySQL will use. No guessing, no reading documentation — it tells you directly.</p>

<p>For the examples below, create a self-contained test table. The <code class="language-plaintext highlighter-rouge">orders</code> table is a realistic schema with 21 columns, 7 indexes, and a foreign key — the kind of table where DDL decisions actually matter.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CREATE</span> <span class="k">DATABASE</span> <span class="n">IF</span> <span class="k">NOT</span> <span class="k">EXISTS</span> <span class="n">dbsafe_demo</span><span class="p">;</span>
<span class="n">USE</span> <span class="n">dbsafe_demo</span><span class="p">;</span>

<span class="c1">-- Minimal customers table (required for the FK constraint)</span>
<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">customers</span> <span class="p">(</span>
  <span class="n">id</span> <span class="nb">INT</span> <span class="nb">UNSIGNED</span> <span class="k">PRIMARY</span> <span class="k">KEY</span> <span class="n">AUTO_INCREMENT</span><span class="p">,</span>
  <span class="n">name</span> <span class="nb">VARCHAR</span><span class="p">(</span><span class="mi">150</span><span class="p">)</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
  <span class="n">email</span> <span class="nb">VARCHAR</span><span class="p">(</span><span class="mi">255</span><span class="p">)</span> <span class="k">NOT</span> <span class="k">NULL</span>
<span class="p">)</span> <span class="n">ENGINE</span><span class="o">=</span><span class="n">InnoDB</span><span class="p">;</span>

<span class="c1">-- Production-like orders table: ~3.9M rows, 21 columns, 7 indexes, 1 FK</span>
<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">orders</span> <span class="p">(</span>
  <span class="n">id</span> <span class="nb">INT</span> <span class="nb">UNSIGNED</span> <span class="k">NOT</span> <span class="k">NULL</span> <span class="n">AUTO_INCREMENT</span><span class="p">,</span>
  <span class="n">order_number</span> <span class="nb">VARCHAR</span><span class="p">(</span><span class="mi">30</span><span class="p">)</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
  <span class="n">customer_id</span> <span class="nb">INT</span> <span class="nb">UNSIGNED</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
  <span class="n">status</span> <span class="nb">VARCHAR</span><span class="p">(</span><span class="mi">20</span><span class="p">)</span> <span class="k">NOT</span> <span class="k">NULL</span> <span class="k">DEFAULT</span> <span class="s1">'pending'</span><span class="p">,</span>
  <span class="n">tracking_number</span> <span class="nb">VARCHAR</span><span class="p">(</span><span class="mi">100</span><span class="p">)</span> <span class="k">DEFAULT</span> <span class="k">NULL</span><span class="p">,</span>
  <span class="n">subtotal</span> <span class="nb">DECIMAL</span><span class="p">(</span><span class="mi">12</span><span class="p">,</span><span class="mi">2</span><span class="p">)</span> <span class="k">NOT</span> <span class="k">NULL</span> <span class="k">DEFAULT</span> <span class="s1">'0.00'</span><span class="p">,</span>
  <span class="n">tax_amount</span> <span class="nb">DECIMAL</span><span class="p">(</span><span class="mi">12</span><span class="p">,</span><span class="mi">2</span><span class="p">)</span> <span class="k">NOT</span> <span class="k">NULL</span> <span class="k">DEFAULT</span> <span class="s1">'0.00'</span><span class="p">,</span>
  <span class="n">shipping_amount</span> <span class="nb">DECIMAL</span><span class="p">(</span><span class="mi">12</span><span class="p">,</span><span class="mi">2</span><span class="p">)</span> <span class="k">NOT</span> <span class="k">NULL</span> <span class="k">DEFAULT</span> <span class="s1">'0.00'</span><span class="p">,</span>
  <span class="n">total_amount</span> <span class="nb">DECIMAL</span><span class="p">(</span><span class="mi">12</span><span class="p">,</span><span class="mi">2</span><span class="p">)</span> <span class="k">NOT</span> <span class="k">NULL</span> <span class="k">DEFAULT</span> <span class="s1">'0.00'</span><span class="p">,</span>
  <span class="n">shipping_address_id</span> <span class="nb">INT</span> <span class="nb">UNSIGNED</span> <span class="k">DEFAULT</span> <span class="k">NULL</span><span class="p">,</span>
  <span class="n">billing_address_id</span> <span class="nb">INT</span> <span class="nb">UNSIGNED</span> <span class="k">DEFAULT</span> <span class="k">NULL</span><span class="p">,</span>
  <span class="n">payment_method</span> <span class="nb">VARCHAR</span><span class="p">(</span><span class="mi">50</span><span class="p">)</span> <span class="k">DEFAULT</span> <span class="k">NULL</span><span class="p">,</span>
  <span class="n">payment_status</span> <span class="nb">VARCHAR</span><span class="p">(</span><span class="mi">20</span><span class="p">)</span> <span class="k">NOT</span> <span class="k">NULL</span> <span class="k">DEFAULT</span> <span class="s1">'unpaid'</span><span class="p">,</span>
  <span class="n">shipped_at</span> <span class="nb">DATETIME</span> <span class="k">DEFAULT</span> <span class="k">NULL</span><span class="p">,</span>
  <span class="n">delivered_at</span> <span class="nb">DATETIME</span> <span class="k">DEFAULT</span> <span class="k">NULL</span><span class="p">,</span>
  <span class="n">cancelled_at</span> <span class="nb">DATETIME</span> <span class="k">DEFAULT</span> <span class="k">NULL</span><span class="p">,</span>
  <span class="n">cancel_reason</span> <span class="nb">VARCHAR</span><span class="p">(</span><span class="mi">255</span><span class="p">)</span> <span class="k">DEFAULT</span> <span class="k">NULL</span><span class="p">,</span>
  <span class="n">ip_address</span> <span class="nb">VARCHAR</span><span class="p">(</span><span class="mi">45</span><span class="p">)</span> <span class="k">DEFAULT</span> <span class="k">NULL</span><span class="p">,</span>
  <span class="n">user_agent</span> <span class="nb">VARCHAR</span><span class="p">(</span><span class="mi">512</span><span class="p">)</span> <span class="k">DEFAULT</span> <span class="k">NULL</span><span class="p">,</span>
  <span class="n">created_at</span> <span class="nb">DATETIME</span> <span class="k">NOT</span> <span class="k">NULL</span> <span class="k">DEFAULT</span> <span class="k">CURRENT_TIMESTAMP</span><span class="p">,</span>
  <span class="n">updated_at</span> <span class="nb">DATETIME</span> <span class="k">NOT</span> <span class="k">NULL</span> <span class="k">DEFAULT</span> <span class="k">CURRENT_TIMESTAMP</span> <span class="k">ON</span> <span class="k">UPDATE</span> <span class="k">CURRENT_TIMESTAMP</span><span class="p">,</span>
  <span class="k">PRIMARY</span> <span class="k">KEY</span> <span class="p">(</span><span class="n">id</span><span class="p">),</span>
  <span class="k">UNIQUE</span> <span class="k">KEY</span> <span class="n">uq_order_number</span> <span class="p">(</span><span class="n">order_number</span><span class="p">),</span>
  <span class="k">KEY</span> <span class="n">idx_customer_id</span> <span class="p">(</span><span class="n">customer_id</span><span class="p">),</span>
  <span class="k">KEY</span> <span class="n">idx_status</span> <span class="p">(</span><span class="n">status</span><span class="p">),</span>
  <span class="k">KEY</span> <span class="n">idx_tracking_number</span> <span class="p">(</span><span class="n">tracking_number</span><span class="p">),</span>
  <span class="k">KEY</span> <span class="n">idx_payment_status</span> <span class="p">(</span><span class="n">payment_status</span><span class="p">),</span>
  <span class="k">KEY</span> <span class="n">idx_created_at</span> <span class="p">(</span><span class="n">created_at</span><span class="p">),</span>
  <span class="k">CONSTRAINT</span> <span class="n">fk_orders_customer</span> <span class="k">FOREIGN</span> <span class="k">KEY</span> <span class="p">(</span><span class="n">customer_id</span><span class="p">)</span> <span class="k">REFERENCES</span> <span class="n">customers</span> <span class="p">(</span><span class="n">id</span><span class="p">)</span>
<span class="p">)</span> <span class="n">ENGINE</span><span class="o">=</span><span class="n">InnoDB</span> <span class="k">DEFAULT</span> <span class="n">CHARSET</span><span class="o">=</span><span class="n">utf8mb3</span><span class="p">;</span>
</code></pre></div></div>

<h2 id="instant-add-column-mysql-8012">INSTANT ADD COLUMN (MySQL 8.0.12+)</h2>

<p>The original INSTANT capability: adding a column at the end of the table (trailing position). This works on any MySQL 8.0.12+ server.</p>

<p>Customer service wants a free-text annotations field on orders. A <code class="language-plaintext highlighter-rouge">notes TEXT</code> column has no default value and no NOT NULL constraint — it’s the textbook INSTANT case.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dbsafe plan <span class="s2">"ALTER TABLE orders ADD COLUMN notes TEXT"</span>
</code></pre></div></div>

<p><img src="/assets/img/gallery/dbsafe-instant-add-trailing.png" alt="dbsafe output showing INSTANT algorithm, NONE locking, SAFE risk for trailing ADD COLUMN" /></p>

<p>dbsafe reports <code class="language-plaintext highlighter-rouge">Algorithm: INSTANT</code>, <code class="language-plaintext highlighter-rouge">Locking: NONE</code>, and <code class="language-plaintext highlighter-rouge">Risk: SAFE</code>. The analysis also shows that no rows will be touched, and the operation will complete in milliseconds regardless of table size.</p>

<p>This is the most straightforward INSTANT case: a nullable column added at the end, nothing in the physical data needs to change.</p>

<h2 id="instant-add-column-at-any-position-mysql-8029">INSTANT ADD COLUMN at Any Position (MySQL 8.0.29+)</h2>

<p>Before 8.0.29, if you wanted to add a column somewhere other than the last position — using <code class="language-plaintext highlighter-rouge">AFTER column_name</code> or <code class="language-plaintext highlighter-rouge">FIRST</code> — MySQL fell back to INPLACE, which still rebuilds the table data on disk.</p>

<p>On 8.0.29+, column position no longer matters:</p>

<p>You want to add a <code class="language-plaintext highlighter-rouge">currency VARCHAR(3)</code> column right next to the monetary columns for readability. Using <code class="language-plaintext highlighter-rouge">AFTER total_amount</code> places it in the middle of the column list — something that would have forced a rebuild on 8.0.28 and earlier.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dbsafe plan <span class="s2">"ALTER TABLE orders ADD COLUMN currency VARCHAR(3) DEFAULT 'USD' AFTER total_amount"</span>
</code></pre></div></div>

<p><img src="/assets/img/gallery/dbsafe-instant-add-after.png" alt="dbsafe output showing INSTANT algorithm for ADD COLUMN AFTER on MySQL 8.0.29+" /></p>

<p>The same statement on MySQL 8.0.28 would produce <code class="language-plaintext highlighter-rouge">Algorithm: INPLACE</code> — a full rebuild. On 8.0.29+, it’s INSTANT.</p>

<blockquote>
  <p><strong>Tip:</strong> If your MySQL version is 8.0.28 or earlier, <code class="language-plaintext highlighter-rouge">AFTER column_name</code> clauses will not be INSTANT. dbsafe detects your server version and shows you exactly what algorithm your server will use — not what the latest version supports.</p>
</blockquote>

<p>This version boundary matters in practice. If your staging server is on 8.0.30 but production is on 8.0.27, dbsafe will give you different answers when run against each server. Always run it against the target server.</p>
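<p>The boundary test itself is trivial to sketch. dbsafe reads the live server version; the version strings below are hypothetical examples:</p>

```shell
# Rough check: can ADD COLUMN ... AFTER/FIRST run as INSTANT on this version?
# (Positional INSTANT ADD COLUMN requires 8.0.29 or later.)
supports_instant_positional() {
  ver=$1
  major=${ver%%.*}
  rest=${ver#*.}
  minor=${rest%%.*}
  patch=${rest#*.}
  patch=${patch%%-*}   # strip build suffixes like "-log"
  if [ "$major" -gt 8 ]; then echo yes; return; fi
  if [ "$major" -eq 8 ] && { [ "$minor" -gt 0 ] || [ "$patch" -ge 29 ]; }; then
    echo yes
  else
    echo no
  fi
}

supports_instant_positional 8.0.27   # prints no
supports_instant_positional 8.0.30   # prints yes
supports_instant_positional 8.4.2    # prints yes
```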

<h2 id="instant-drop-column-mysql-8029">INSTANT DROP COLUMN (MySQL 8.0.29+)</h2>

<p>Before 8.0.29, <code class="language-plaintext highlighter-rouge">DROP COLUMN</code> always rebuilt the table. Every row had to be rewritten without that column’s data. On large tables this was as disruptive as any other COPY or INPLACE rebuild.</p>

<p>Starting with 8.0.29, dropping a column is also metadata-only:</p>

<p>The <code class="language-plaintext highlighter-rouge">user_agent</code> column was added years ago for fraud detection, but your new fraud system pulls that data from a separate audit log. You want to drop it for GDPR data minimization — a <code class="language-plaintext highlighter-rouge">VARCHAR(512)</code> column across 3.9M rows is meaningful storage.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dbsafe plan <span class="s2">"ALTER TABLE orders DROP COLUMN user_agent"</span>
</code></pre></div></div>

<p><img src="/assets/img/gallery/dbsafe-instant-drop-column.png" alt="dbsafe output showing INSTANT algorithm for DROP COLUMN on MySQL 8.0.29+" /></p>

<p>Internally, InnoDB marks the column as dropped in the data dictionary. The physical data remains on disk — the rows still contain the column’s bytes — until the next time a full table rebuild occurs (such as an <code class="language-plaintext highlighter-rouge">OPTIMIZE TABLE</code> or a COPY-algorithm ALTER).</p>

<blockquote>
  <p><strong>Tip:</strong> Disk space is not immediately reclaimed after an INSTANT DROP COLUMN. The column’s data stays in the row format on disk until the next rebuild. This is expected behavior — the column is simply invisible to MySQL. If you need to reclaim the space, run <code class="language-plaintext highlighter-rouge">OPTIMIZE TABLE orders</code> during a maintenance window. Also be aware that InnoDB records each instant column add or drop as a row version, capped at 64 per table (visible in <code class="language-plaintext highlighter-rouge">information_schema.INNODB_TABLES</code>); once the cap is reached, the next ALTER requires a full rebuild.</p>
</blockquote>

<h2 id="when-instant-ddl-wont-work">When INSTANT DDL Won’t Work</h2>

<p>Not every <code class="language-plaintext highlighter-rouge">ALTER TABLE</code> qualifies for INSTANT. Some operations that look simple still require INPLACE or COPY because they actually need to touch or reorganize row data.</p>

<p>Expanding a VARCHAR column is a good example. Whether MySQL can do it in-place depends on whether the change crosses the <a href="https://dev.mysql.com/doc/refman/8.0/en/innodb-online-ddl-operations.html">255-byte length-prefix boundary</a>: VARCHAR values up to 255 bytes use a 1-byte length prefix, while values of 256 bytes or more use a 2-byte prefix. When the extension stays within the same boundary, MySQL only updates metadata. When it crosses, every row must be rewritten.</p>

<p>The <code class="language-plaintext highlighter-rouge">orders</code> table uses <code class="language-plaintext highlighter-rouge">utf8mb3</code> (3 bytes per character), so the byte math matters:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">VARCHAR(30)</code> × 3 = <strong>90 bytes</strong> → 1-byte length prefix</li>
  <li><code class="language-plaintext highlighter-rouge">VARCHAR(50)</code> × 3 = <strong>150 bytes</strong> → 1-byte length prefix</li>
  <li><code class="language-plaintext highlighter-rouge">VARCHAR(255)</code> × 3 = <strong>765 bytes</strong> → 2-byte length prefix</li>
</ul>

<p>Extending <code class="language-plaintext highlighter-rouge">order_number</code> from <code class="language-plaintext highlighter-rouge">VARCHAR(30)</code> to <code class="language-plaintext highlighter-rouge">VARCHAR(50)</code> stays within the 1-byte prefix range — both are under 255 bytes. MySQL handles this as an INPLACE, metadata-only change:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dbsafe plan <span class="s2">"ALTER TABLE orders MODIFY COLUMN order_number VARCHAR(50)"</span>
</code></pre></div></div>

<p><img src="/assets/img/gallery/dbsafe-instant-varchar-safe.png" alt="dbsafe output showing INPLACE algorithm for VARCHAR(30) to VARCHAR(50) extension" /></p>

<p>But extending to <code class="language-plaintext highlighter-rouge">VARCHAR(255)</code> crosses the boundary — from 90 bytes (1-byte prefix) to 765 bytes (2-byte prefix). MySQL must rewrite every row to change the length prefix, forcing a full COPY rebuild:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dbsafe plan <span class="s2">"ALTER TABLE orders MODIFY COLUMN order_number VARCHAR(255)"</span>
</code></pre></div></div>

<p><img src="/assets/img/gallery/dbsafe-instant-varchar-copy.png" alt="dbsafe output showing COPY algorithm for VARCHAR(30) to VARCHAR(255) extension" /></p>

<blockquote>
  <p><strong>Tip:</strong> The boundary depends on your character set. With <code class="language-plaintext highlighter-rouge">utf8mb4</code> (4 bytes/char), <code class="language-plaintext highlighter-rouge">VARCHAR(64)</code> is already 256 bytes — past the threshold. With <code class="language-plaintext highlighter-rouge">latin1</code> (1 byte/char), you can extend up to <code class="language-plaintext highlighter-rouge">VARCHAR(255)</code> in-place. Always check the byte length, not the character count.</p>
</blockquote>
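<p>The byte arithmetic is easy to script as a quick sanity check before writing the ALTER. This sketch implements only the length-prefix rule described above; dbsafe performs the same check against the table’s actual character set:</p>

```shell
# 1-byte vs 2-byte VARCHAR length prefix, decided by max byte length.
# Bytes-per-char by charset: latin1=1, utf8mb3=3, utf8mb4=4.
prefix_bytes() {
  chars=$1
  bytes_per_char=$2
  max_bytes=$((chars * bytes_per_char))
  if [ "$max_bytes" -le 255 ]; then echo 1; else echo 2; fi
}

prefix_bytes 30 3    # utf8mb3 VARCHAR(30): 90 bytes, prints 1
prefix_bytes 255 3   # utf8mb3 VARCHAR(255): 765 bytes, prints 2
prefix_bytes 64 4    # utf8mb4 VARCHAR(64): 256 bytes, prints 2
prefix_bytes 255 1   # latin1 VARCHAR(255): 255 bytes, prints 1
```

<p>An extension that keeps the result of <code class="language-plaintext highlighter-rouge">prefix_bytes</code> unchanged can be done in-place; a change from 1 to 2 forces a full rebuild.</p>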

<p>Other common operations that won’t be INSTANT:</p>

<ul>
  <li><strong>Changing column data type</strong> (<code class="language-plaintext highlighter-rouge">INT</code> → <code class="language-plaintext highlighter-rouge">BIGINT</code>, <code class="language-plaintext highlighter-rouge">VARCHAR</code> → <code class="language-plaintext highlighter-rouge">TEXT</code>) → COPY</li>
  <li><strong>Adding an index</strong> → INPLACE (reads all rows to build the index)</li>
  <li><strong>Changing <code class="language-plaintext highlighter-rouge">NULL</code> to <code class="language-plaintext highlighter-rouge">NOT NULL</code></strong> → INPLACE or COPY (needs to validate existing rows)</li>
  <li><strong>Dropping the primary key without a replacement</strong> → COPY (entire clustered index must be rebuilt); adding or replacing a PK → INPLACE (with rebuild, but faster than COPY)</li>
</ul>

<p>For these operations, you need a different approach: <a href="https://github.com/github/gh-ost">gh-ost</a>, <a href="https://docs.percona.com/percona-toolkit/pt-online-schema-change.html">pt-online-schema-change</a>, or a carefully planned maintenance window. The <a href="https://dev.mysql.com/doc/refman/8.0/en/innodb-online-ddl-operations.html">MySQL Online DDL Operations reference</a> has the full matrix of what’s possible.</p>
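<p>Independent of dbsafe, MySQL itself provides a guard rail: naming the algorithm explicitly makes the statement fail with <code class="language-plaintext highlighter-rouge">ER_ALTER_OPERATION_NOT_SUPPORTED</code> instead of silently falling back to a slower algorithm. A dry-run sketch, with the actual <code class="language-plaintext highlighter-rouge">mysql</code> invocation left commented out:</p>

```shell
# Request INSTANT explicitly: MySQL refuses with an error rather than
# silently running INPLACE or COPY. Shown here as a dry run.
STMT="ALTER TABLE orders MODIFY COLUMN order_number VARCHAR(255), ALGORITHM=INSTANT"
echo "$STMT"
# mysql dbsafe_demo -e "$STMT"   # would fail: this change requires COPY
```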

<h2 id="version-matrix">Version Matrix</h2>

<table>
  <thead>
    <tr>
      <th>Operation</th>
      <th>8.0.12–8.0.28</th>
      <th>8.0.29+ / 8.4 LTS</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>ADD COLUMN (trailing)</td>
      <td>INSTANT</td>
      <td>INSTANT</td>
    </tr>
    <tr>
      <td>ADD COLUMN (AFTER/FIRST)</td>
      <td>INPLACE (rebuild)</td>
      <td>INSTANT</td>
    </tr>
    <tr>
      <td>DROP COLUMN</td>
      <td>INPLACE (rebuild)</td>
      <td>INSTANT</td>
    </tr>
    <tr>
      <td>RENAME COLUMN</td>
      <td>INPLACE</td>
      <td>INSTANT</td>
    </tr>
    <tr>
      <td>Set/drop column default</td>
      <td>INSTANT</td>
      <td>INSTANT</td>
    </tr>
    <tr>
      <td>MODIFY COLUMN (type change)</td>
      <td>COPY</td>
      <td>COPY</td>
    </tr>
    <tr>
      <td>ADD INDEX</td>
      <td>INPLACE</td>
      <td>INPLACE</td>
    </tr>
    <tr>
      <td>Change NULL → NOT NULL</td>
      <td>INPLACE/COPY</td>
      <td>INPLACE/COPY</td>
    </tr>
    <tr>
      <td>Drop PRIMARY KEY (no replacement)</td>
      <td>COPY</td>
      <td>COPY</td>
    </tr>
    <tr>
      <td>Add/replace PRIMARY KEY</td>
      <td>INPLACE (rebuild)</td>
      <td>INPLACE (rebuild)</td>
    </tr>
  </tbody>
</table>

<h2 id="practical-workflow">Practical Workflow</h2>

<p>The workflow for any production schema change:</p>

<ol>
  <li><strong>Write your <code class="language-plaintext highlighter-rouge">ALTER TABLE</code></strong> statement</li>
  <li><strong>Run <code class="language-plaintext highlighter-rouge">dbsafe plan</code></strong> against your production server (or a replica with production data)</li>
  <li><strong>Check the algorithm</strong>: INSTANT → proceed; INPLACE/COPY → evaluate alternatives</li>
  <li><strong>If INSTANT</strong>: execute directly during business hours, no maintenance window needed</li>
  <li><strong>If INPLACE or COPY</strong>: consider <a href="https://github.com/github/gh-ost">gh-ost</a> for zero-downtime, or pt-osc, or schedule a maintenance window</li>
</ol>

<p>For CI/CD pipelines, <code class="language-plaintext highlighter-rouge">dbsafe plan --format json</code> lets you gate deployments on the algorithm:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">RESULT</span><span class="o">=</span><span class="si">$(</span>dbsafe plan <span class="nt">--format</span> json <span class="s2">"ALTER TABLE orders ADD COLUMN fulfillment_id INT"</span><span class="si">)</span>

<span class="nv">ALGORITHM</span><span class="o">=</span><span class="si">$(</span><span class="nb">echo</span> <span class="s2">"</span><span class="nv">$RESULT</span><span class="s2">"</span> | jq <span class="nt">-r</span> <span class="s1">'.algorithm'</span><span class="si">)</span>
<span class="nv">RISK</span><span class="o">=</span><span class="si">$(</span><span class="nb">echo</span> <span class="s2">"</span><span class="nv">$RESULT</span><span class="s2">"</span> | jq <span class="nt">-r</span> <span class="s1">'.risk'</span><span class="si">)</span>

<span class="k">if</span> <span class="o">[</span> <span class="s2">"</span><span class="nv">$ALGORITHM</span><span class="s2">"</span> <span class="o">!=</span> <span class="s2">"INSTANT"</span> <span class="o">]</span> <span class="o">||</span> <span class="o">[</span> <span class="s2">"</span><span class="nv">$RISK</span><span class="s2">"</span> <span class="o">!=</span> <span class="s2">"SAFE"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
  </span><span class="nb">echo</span> <span class="s2">"Schema change is not INSTANT/SAFE — blocking deployment"</span>
  <span class="nb">echo</span> <span class="s2">"Algorithm: </span><span class="nv">$ALGORITHM</span><span class="s2">, Risk: </span><span class="nv">$RISK</span><span class="s2">"</span>
  <span class="nb">exit </span>1
<span class="k">fi

</span><span class="nb">echo</span> <span class="s2">"Schema change is safe to run — proceeding"</span>
mysql <span class="nt">-e</span> <span class="s2">"ALTER TABLE orders ADD COLUMN fulfillment_id INT"</span>
</code></pre></div></div>

<p>This pattern catches dangerous migrations before they reach production. The pipeline fails fast, and the developer sees exactly why: the algorithm, the locking, the risk level.</p>

<blockquote>
  <p><strong>Related:</strong> Heavy schema change traffic can cause InnoDB mutex contention. See <a href="/mysql/innodb/2024/09/01/innodb-semaphore-contention.html">Contention in MySQL InnoDB</a> for how to detect contention using <code class="language-plaintext highlighter-rouge">SHOW ENGINE INNODB STATUS</code> during and after schema operations.</p>
</blockquote>

<p>For Galera/PXC clusters, the stakes are higher: even an INSTANT DDL in TOI mode blocks all cluster nodes for the duration. dbsafe detects cluster topology and adjusts its risk assessment accordingly. For cluster-specific testing patterns, see <a href="/mysql/proxysql/2026/02/03/sysbench-proxysql.html">How to Test ProxySQL Read/Write Split with sysbench</a> for context on how cluster load behaves during DDL.</p>
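
<p>You can check a node’s DDL mode yourself before running anything. On PXC, <a href="https://docs.percona.com/percona-xtradb-cluster/8.0/rsu.html">RSU</a> can be enabled per session to apply a change locally, node by node. A sketch (variable names per the Percona docs; verify against your own PXC version first):</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>SHOW VARIABLES LIKE 'wsrep_OSU_method';   -- TOI (default) or RSU
SHOW STATUS LIKE 'wsrep_cluster_size';    -- how many nodes block in TOI

-- Rolling Schema Upgrade: the DDL applies to this node only and is
-- not replicated; repeat on each node after draining its traffic.
SET SESSION wsrep_OSU_method='RSU';
ALTER TABLE orders ADD INDEX idx_fulfillment (fulfillment_id);
SET SESSION wsrep_OSU_method='TOI';
</code></pre></div></div>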

<h2 id="summary">Summary</h2>

<ol>
  <li><strong>INSTANT DDL modifies only the InnoDB data dictionary</strong> — no row copies, no table rebuild, only a brief exclusive metadata lock during the commit phase, milliseconds regardless of table size.</li>
  <li><strong>MySQL 8.0.12</strong> introduced INSTANT ADD COLUMN for trailing positions only.</li>
  <li><strong>MySQL 8.0.29</strong> extended INSTANT to ADD COLUMN at any position and DROP COLUMN — two of the most common DBA operations.</li>
  <li><strong>Not every ALTER qualifies</strong>: data type changes still fall back to COPY, and index additions and NULL → NOT NULL changes still do INPLACE or COPY work that grows with table size.</li>
  <li><strong>Use <code class="language-plaintext highlighter-rouge">dbsafe plan</code> before every production schema change</strong> to confirm the algorithm against your specific MySQL version, table structure, and cluster topology.</li>
</ol>

<p>Happy schema changes!</p>

<h2 id="references">References</h2>

<p><strong>MySQL Official Documentation:</strong></p>
<ul>
  <li><a href="https://dev.mysql.com/doc/refman/8.0/en/alter-table.html">ALTER TABLE Statement — MySQL 8.0</a></li>
  <li><a href="https://dev.mysql.com/doc/refman/8.0/en/innodb-online-ddl-operations.html">Online DDL Operations — MySQL 8.0</a></li>
  <li><a href="https://dev.mysql.com/doc/refman/8.0/en/innodb-online-ddl.html">InnoDB and Online DDL — MySQL 8.0</a></li>
  <li><a href="https://dev.mysql.com/doc/refman/8.0/en/innodb-data-dictionary.html">InnoDB Data Dictionary — MySQL 8.0</a></li>
</ul>

<p><strong>MySQL Blog Posts:</strong></p>
<ul>
  <li><a href="https://dev.mysql.com/blog-archive/mysql-8-0-innodb-now-supports-instant-add-column/">MySQL 8.0: InnoDB now supports Instant ADD COLUMN</a></li>
  <li><a href="https://dev.mysql.com/blog-archive/mysql-8-0-instant-add-and-drop-columns/">MySQL 8.0 INSTANT ADD and DROP Column(s)</a></li>
</ul>

<p><strong>Tools:</strong></p>
<ul>
  <li><a href="https://github.com/nethalo/dbsafe">dbsafe — GitHub Repository</a></li>
  <li><a href="https://github.com/github/gh-ost">gh-ost — GitHub’s Online Schema Migration Tool</a></li>
  <li><a href="https://docs.percona.com/percona-toolkit/pt-online-schema-change.html">pt-online-schema-change — Percona Toolkit</a></li>
</ul>

<p><strong>Related Posts:</strong></p>
<ul>
  <li><a href="/mysql/tools/2026/02/14/introducing-dbsafe-know-before-you-alter.html">Introducing dbsafe: Know Before You ALTER</a></li>
  <li><a href="/mysql/innodb/2024/09/01/innodb-semaphore-contention.html">Contention in MySQL InnoDB</a></li>
  <li><a href="/mysql/proxysql/2026/02/03/sysbench-proxysql.html">How to Test ProxySQL Read/Write Split with sysbench</a></li>
</ul>]]></content>
    <author>
      <name>Daniel Guzman Burgos</name>
    </author>
    <category term="mysql"/>
    <category term="tools"/>
    <category term="ddl"/>
    <category term="instant-ddl"/>
    <category term="schema-changes"/>
    <category term="dbsafe"/>
    <category term="innodb"/>
    <summary type="html"><![CDATA[Master MySQL 8.0 INSTANT DDL for zero-downtime schema changes. Learn ADD COLUMN, DROP COLUMN without table rebuilds, and use dbsafe to verify before production.]]></summary>
  </entry>
  <entry>
    <title type="html">Introducing dbsafe: Know Before You ALTER</title>
    <link href="https://rendiment.io/mysql/tools/2026/02/14/introducing-dbsafe-know-before-you-alter.html" rel="alternate" type="text/html" title="Introducing dbsafe: Know Before You ALTER"/>
    <published>2026-02-14T00:00:00-05:00</published>
    <updated>2026-02-14T00:00:00-05:00</updated>
    <id>https://rendiment.io/mysql/tools/2026/02/14/introducing-dbsafe-know-before-you-alter</id>
    <content type="html" xml:base="https://rendiment.io/mysql/tools/2026/02/14/introducing-dbsafe-know-before-you-alter.html"><![CDATA[<p>You’ve probably been there: running <code class="language-plaintext highlighter-rouge">ALTER TABLE users ADD COLUMN email VARCHAR(255)</code> on a production table, expecting it to take a few seconds, and watching it lock the table for 20 minutes instead. Or spending an hour reading MySQL documentation trying to figure out if your <code class="language-plaintext highlighter-rouge">MODIFY COLUMN</code> will use INSTANT DDL or rebuild the entire table.</p>

<p>The problem with MySQL’s <code class="language-plaintext highlighter-rouge">ALTER TABLE</code> is that you don’t know what algorithm it will use, what locks it will take, or how long it will run until you actually execute it. By then, if you guessed wrong, your application is already timing out.</p>

<h2 id="the-problem-alter-table-is-a-black-box">The Problem: ALTER TABLE is a Black Box</h2>

<p>When planning a schema change, these are the questions you need answered:</p>

<ul>
  <li>Will it use <a href="https://dev.mysql.com/doc/refman/8.0/en/alter-table.html">INSTANT, INPLACE, or COPY algorithm</a>?</li>
  <li>What locks will it take? Can the table still handle reads/writes?</li>
  <li>How long will it take on a table with 500 million rows?</li>
  <li>Will it work differently on MySQL 8.0.12 vs 8.0.29?</li>
  <li>What about my Galera cluster? Will it block all nodes in TOI mode?</li>
  <li>Can I roll it back if something goes wrong?</li>
  <li>Should I use <a href="https://github.com/github/gh-ost">gh-ost</a> or <a href="https://docs.percona.com/percona-toolkit/pt-online-schema-change.html">pt-online-schema-change</a> instead?</li>
</ul>

<p>You can test on staging, but staging never has production-scale data. You can read the documentation, but you still need to mentally map your MySQL version, your table structure, and your specific ALTER syntax to figure out what will happen.</p>

<p>There should be a tool that just tells you.</p>

<h2 id="enter-dbsafe">Enter dbsafe</h2>

<p><a href="https://github.com/nethalo/dbsafe">dbsafe</a> is a command-line tool that connects to your MySQL server, analyzes your DDL or DML statement without running it, and tells you exactly what will happen.</p>

<p><img src="/assets/img/gallery/dbsafe-output-safe.png" alt="dbsafe plan output showing INSTANT algorithm and SAFE risk level" /></p>

<p>It’s read-only analysis. It doesn’t modify anything. It just tells you what MySQL would do if you ran that statement.</p>

<h2 id="the-same-statement-different-outcomes">The Same Statement, Different Outcomes</h2>

<p>Here’s what makes schema changes tricky: similar-looking statements can behave completely differently.</p>

<p><img src="/assets/img/gallery/dbsafe-safe-vs-dangerous.png" alt="Comparison of safe ADD COLUMN vs dangerous MODIFY COLUMN operations" /></p>

<p>The first one is instant, no locks, safe for production. The second one rebuilds the entire table with an exclusive lock (<a href="https://dev.mysql.com/doc/refman/8.0/en/alter-table.html">COPY algorithm</a> creates a new table and copies all rows). You need to know which is which before you run it.</p>
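
<p>As a concrete pair, per the <a href="https://dev.mysql.com/doc/refman/8.0/en/innodb-online-ddl-operations.html">MySQL 8.0 online DDL matrix</a> (hypothetical <code class="language-plaintext highlighter-rouge">users</code> table and columns):</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Metadata-only on 8.0.12+: INSTANT, milliseconds at any table size
ALTER TABLE users ADD COLUMN email VARCHAR(255);

-- Data type change: COPY, full table rebuild under an exclusive lock
ALTER TABLE users MODIFY COLUMN login_count BIGINT;
</code></pre></div></div>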

<h2 id="topology-detection">Topology Detection</h2>

<p>If you’re running Percona XtraDB Cluster, dbsafe detects it and adjusts its analysis:</p>

<p><img src="/assets/img/gallery/dbsafe-topology-pxc.png" alt="dbsafe topology detection for Percona XtraDB Cluster" /></p>

<p><em>In TOI mode, <a href="https://docs.percona.com/percona-xtradb-cluster/8.0/toi.html">DDL locks the entire cluster for the duration of the operation</a> - all nodes are blocked from accepting writes. Cluster detection uses <a href="https://docs.percona.com/percona-xtradb-cluster/8.0/wsrep-status-index.html">wsrep status variables</a>. For large tables, use <a href="https://docs.percona.com/percona-toolkit/pt-online-schema-change.html">pt-online-schema-change</a> or <a href="https://docs.percona.com/percona-xtradb-cluster/8.0/rsu.html">RSU method</a>.</em></p>

<p>The same ALTER that’s safe on standalone MySQL can block your entire cluster for minutes in <a href="https://docs.percona.com/percona-xtradb-cluster/8.0/toi.html">TOI (Total Order Isolation) mode</a>. dbsafe detects:</p>

<ul>
  <li>Galera/PXC clusters (<a href="https://docs.percona.com/percona-xtradb-cluster/8.0/wsrep-status-index.html">wsrep status variables</a>)</li>
  <li>MySQL Group Replication (<a href="https://dev.mysql.com/doc/refman/8.0/en/performance-schema-replication-group-members-table.html">performance_schema.replication_group_members</a>)</li>
  <li>Async replication topologies (<a href="https://dev.mysql.com/doc/refman/8.0/en/show-replica-status.html">SHOW REPLICA STATUS</a>)</li>
  <li>Semi-sync replication</li>
</ul>

<blockquote>
  <p><strong>Related:</strong> For load-testing PXC clusters with ProxySQL read/write split, see <a href="/mysql/proxysql/2026/02/03/sysbench-proxysql.html">How to Test ProxySQL Read/Write Split with sysbench</a>.</p>
</blockquote>

<p>And adjusts its risk assessment and recommendations accordingly.</p>
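
<p>All of these probes are ordinary read-only queries you can run yourself; a sketch of the kind of checks involved:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Galera/PXC: only wsrep-enabled builds report a cluster size
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';

-- Group Replication membership and roles
SELECT member_host, member_state, member_role
  FROM performance_schema.replication_group_members;

-- Classic async (or semi-sync) replica?
SHOW REPLICA STATUS\G
</code></pre></div></div>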

<h2 id="dml-analysis">DML Analysis</h2>

<p>dbsafe also analyzes DELETE and UPDATE statements:</p>

<p><img src="/assets/img/gallery/dbsafe-dml-analysis.png" alt="DML analysis showing chunked DELETE script generation" /></p>

<p>It uses <a href="https://dev.mysql.com/doc/refman/8.0/en/explain.html"><code class="language-plaintext highlighter-rouge">EXPLAIN</code></a> to estimate affected rows, checks for triggers, and generates a chunked execution script for large operations. The script uses <code class="language-plaintext highlighter-rouge">LIMIT</code> with <code class="language-plaintext highlighter-rouge">SLEEP()</code> between batches to avoid replication lag and long-running transactions.</p>
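
<p>Hand-written, the chunked pattern looks roughly like this (hypothetical table and predicate; a sketch of the pattern, not dbsafe’s actual generated script):</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Repeat until ROW_COUNT() reports 0: the LIMIT keeps each
-- transaction short, the SLEEP lets replicas catch up between batches.
DELETE FROM orders WHERE created_at &lt; '2024-01-01' LIMIT 10000;
SELECT ROW_COUNT();
SELECT SLEEP(1);
</code></pre></div></div>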

<h2 id="version-specific-features">Version-Specific Features</h2>

<p><a href="https://dev.mysql.com/blog-archive/mysql-8-0-innodb-now-supports-instant-add-column/">MySQL 8.0.12 introduced INSTANT ADD COLUMN</a> for trailing positions. <a href="https://dev.mysql.com/blog-archive/mysql-8-0-instant-add-and-drop-columns/">MySQL 8.0.29 extended it to any position and added INSTANT DROP COLUMN</a>. dbsafe detects your MySQL version and tells you what’s supported.</p>

<p><img src="/assets/img/gallery/dbsafe-version-comparison.png" alt="Version comparison between MySQL 8.0.11 and 8.0.29+ INSTANT DDL support" /></p>

<p>Same statement, different behavior depending on version. You need to know what your specific MySQL version supports.</p>

<h2 id="features">Features</h2>

<p><strong>Read-only analysis</strong> - Connects to your database, reads metadata, never modifies data</p>

<p><strong>Topology detection</strong> - Detects Galera/PXC, Group Replication, async replication, adjusts recommendations</p>

<p><strong>Version-aware</strong> - Knows feature differences between MySQL 8.0.12, 8.0.29, 8.4 LTS, and Percona variants</p>

<p><strong>Rollback plans</strong> - Generates undo SQL for every DDL operation</p>

<p><strong>DML analysis</strong> - Analyzes DELETE/UPDATE statements, generates chunked execution scripts</p>

<p><strong>Multiple output formats</strong> - Text (colored), Plain (no colors), JSON (for automation), Markdown</p>

<p><strong>Table metadata</strong> - Shows table size, row count, indexes, foreign keys, triggers</p>

<p><strong>Tool recommendations</strong> - Tells you when to use <a href="https://github.com/github/gh-ost">gh-ost</a>, pt-online-schema-change, or native MySQL</p>

<p><strong>CI/CD integration</strong> - JSON output, exit codes for pipeline automation</p>

<h2 id="requirements">Requirements</h2>

<ul>
  <li>MySQL 8.0.x or 8.4 LTS (including Percona Server variants, XtraDB Cluster, Group Replication)</li>
  <li>MySQL 5.7 and MariaDB are NOT supported</li>
  <li>Read-only MySQL user with <a href="https://dev.mysql.com/doc/refman/8.0/en/privileges-provided.html">SELECT, PROCESS, and REPLICATION CLIENT privileges</a></li>
</ul>

<h2 id="installation">Installation</h2>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Linux x86_64</span>
<span class="nv">VERSION</span><span class="o">=</span>0.2.0
curl <span class="nt">-L</span> https://github.com/nethalo/dbsafe/releases/download/v<span class="k">${</span><span class="nv">VERSION</span><span class="k">}</span>/dbsafe_<span class="k">${</span><span class="nv">VERSION</span><span class="k">}</span>_linux_amd64.tar.gz | <span class="nb">tar </span>xz
<span class="nb">sudo mv </span>dbsafe /usr/local/bin/

<span class="c"># macOS Apple Silicon</span>
<span class="nv">VERSION</span><span class="o">=</span>0.2.0
curl <span class="nt">-L</span> https://github.com/nethalo/dbsafe/releases/download/v<span class="k">${</span><span class="nv">VERSION</span><span class="k">}</span>/dbsafe_<span class="k">${</span><span class="nv">VERSION</span><span class="k">}</span>_darwin_arm64.tar.gz | <span class="nb">tar </span>xz
<span class="nb">sudo mv </span>dbsafe /usr/local/bin/

<span class="c"># Create MySQL user for dbsafe (read-only)</span>
mysql <span class="nt">-u</span> root <span class="nt">-p</span> <span class="o">&lt;&lt;</span> <span class="sh">'</span><span class="no">SQL</span><span class="sh">'
CREATE USER 'dbsafe'@'%' IDENTIFIED BY 'your_password';
GRANT SELECT, PROCESS, REPLICATION CLIENT ON *.* TO 'dbsafe'@'%';
</span><span class="no">SQL

</span><span class="c"># Setup configuration</span>
dbsafe config init

<span class="c"># Test connection</span>
dbsafe connect
</code></pre></div></div>

<h2 id="quick-start">Quick Start</h2>

<p>Create a test database:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CREATE</span> <span class="k">DATABASE</span> <span class="n">dbsafe_demo</span><span class="p">;</span>
<span class="n">USE</span> <span class="n">dbsafe_demo</span><span class="p">;</span>
<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">products</span> <span class="p">(</span>
  <span class="n">id</span> <span class="nb">INT</span> <span class="k">PRIMARY</span> <span class="k">KEY</span> <span class="n">AUTO_INCREMENT</span><span class="p">,</span>
  <span class="n">name</span> <span class="nb">VARCHAR</span><span class="p">(</span><span class="mi">100</span><span class="p">),</span>
  <span class="n">price</span> <span class="nb">DECIMAL</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span><span class="mi">2</span><span class="p">)</span>
<span class="p">)</span> <span class="n">ENGINE</span><span class="o">=</span><span class="n">InnoDB</span><span class="p">;</span>
</code></pre></div></div>

<p>Test a safe change (add <code class="language-plaintext highlighter-rouge">-p</code> to be prompted for password):</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dbsafe plan <span class="s2">"ALTER TABLE products ADD COLUMN description TEXT"</span>
</code></pre></div></div>

<p>Output shows INSTANT algorithm, no locks, safe for production.</p>

<p>Test a dangerous change:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dbsafe plan <span class="s2">"ALTER TABLE products MODIFY COLUMN name VARCHAR(255)"</span>
</code></pre></div></div>

<p>Output shows COPY algorithm, exclusive locks, recommendation to use <a href="https://github.com/github/gh-ost">gh-ost</a> or pt-online-schema-change.</p>

<p><strong>Note</strong>: Connection parameters come from <code class="language-plaintext highlighter-rouge">~/.dbsafe/config.yaml</code>. Use the <code class="language-plaintext highlighter-rouge">-p</code> flag if you need to enter your password interactively (recommended - don’t store passwords in config files).</p>

<h2 id="use-cases">Use Cases</h2>

<p><strong>Planning production schema changes</strong> - Analyze before you run, know the impact</p>

<p><strong>Reviewing migration scripts</strong> - Add to CI/CD pipeline to catch dangerous changes early</p>

<p><strong>Galera/PXC cluster operations</strong> - Understand TOI blocking behavior before it happens</p>

<p><strong>Large table operations</strong> - Get realistic time estimates, decide between native MySQL and <a href="https://github.com/github/gh-ost">gh-ost</a></p>

<p><strong>DML safety</strong> - Analyze bulk DELETE/UPDATE, get chunked execution scripts</p>

<h2 id="how-it-works">How It Works</h2>

<p>dbsafe connects to your MySQL server (read-only) and performs the following analysis:</p>

<ol>
  <li><strong>SQL Parsing</strong> - Uses <a href="https://github.com/vitessio/vitess/tree/main/go/vt/sqlparser">Vitess sqlparser</a> to parse and understand your DDL/DML statement</li>
  <li><strong>Topology Detection</strong> - Queries <code class="language-plaintext highlighter-rouge">wsrep_*</code> variables (Galera/PXC), <code class="language-plaintext highlighter-rouge">performance_schema.replication_group_members</code> (Group Replication), or runs <code class="language-plaintext highlighter-rouge">SHOW REPLICA STATUS</code> (async replication)</li>
  <li><strong>Metadata Collection</strong> - Gathers table size, row count, indexes, foreign keys, triggers from <a href="https://dev.mysql.com/doc/refman/8.0/en/information-schema.html"><code class="language-plaintext highlighter-rouge">information_schema</code></a></li>
  <li><strong>Version Detection</strong> - Checks MySQL version to determine available INSTANT DDL features</li>
  <li><strong>Algorithm Determination</strong> - Maps your operation + MySQL version to execution algorithm (INSTANT/INPLACE/COPY)</li>
  <li><strong>Impact Estimation</strong> - Calculates estimated duration, lock requirements, and replication impact</li>
  <li><strong>Recommendations</strong> - Suggests <a href="https://github.com/github/gh-ost">gh-ost</a>/pt-osc for COPY operations on large tables, chunked scripts for bulk DML</li>
</ol>

<p>All analysis is read-only. No test runs, no locks taken, no data modified.</p>
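
<p>Steps 3 and 4 in particular boil down to plain metadata queries, something of this shape (schema and table names hypothetical):</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Step 3: size and row estimate come from metadata, not a table scan
SELECT table_rows,
       ROUND(data_length / 1024 / 1024)  AS data_mb,
       ROUND(index_length / 1024 / 1024) AS index_mb
  FROM information_schema.tables
 WHERE table_schema = 'myapp' AND table_name = 'orders';

-- Step 4: feature gating starts from the server version string
SELECT VERSION();
</code></pre></div></div>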

<h2 id="output-formats">Output Formats</h2>

<p><strong>Text</strong> (default) - Colored output for terminal use</p>

<p><strong>Plain</strong> - No colors, for log files and CI/CD</p>

<p><strong>JSON</strong> - Machine-readable for automation:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dbsafe plan <span class="nt">--format</span> json <span class="s2">"ALTER TABLE users ADD COLUMN email VARCHAR(255)"</span> <span class="o">&gt;</span> analysis.json
</code></pre></div></div>

<p><strong>Markdown</strong> - For documentation</p>

<h2 id="configuration">Configuration</h2>

<p>The <code class="language-plaintext highlighter-rouge">dbsafe config init</code> command creates <code class="language-plaintext highlighter-rouge">~/.dbsafe/config.yaml</code> interactively. You can manually edit it for multiple environments:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">connections</span><span class="pi">:</span>
  <span class="na">default</span><span class="pi">:</span>
    <span class="na">host</span><span class="pi">:</span> <span class="s">localhost</span>
    <span class="na">port</span><span class="pi">:</span> <span class="m">3306</span>
    <span class="na">user</span><span class="pi">:</span> <span class="s">dbsafe</span>
    <span class="na">database</span><span class="pi">:</span> <span class="s">myapp</span>

  <span class="na">production</span><span class="pi">:</span>
    <span class="na">host</span><span class="pi">:</span> <span class="s">prod.example.com</span>
    <span class="na">port</span><span class="pi">:</span> <span class="m">3306</span>
    <span class="na">user</span><span class="pi">:</span> <span class="s">dbsafe_ro</span>
    <span class="na">database</span><span class="pi">:</span> <span class="s">production</span>

<span class="na">defaults</span><span class="pi">:</span>
  <span class="na">chunk_size</span><span class="pi">:</span> <span class="m">10000</span>
  <span class="na">format</span><span class="pi">:</span> <span class="s">text</span>
</code></pre></div></div>

<p><strong>Important</strong>: Never store passwords in the config file. Use the <code class="language-plaintext highlighter-rouge">-p</code> flag when running commands to enter the password interactively.</p>

<p>View current configuration:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dbsafe config show
</code></pre></div></div>

<p>Then run analysis using the default connection:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dbsafe plan <span class="s2">"ALTER TABLE users ADD COLUMN region VARCHAR(50)"</span>
</code></pre></div></div>

<h2 id="links">Links</h2>

<ul>
  <li><a href="https://github.com/nethalo/dbsafe">GitHub Repository</a></li>
  <li><a href="https://github.com/nethalo/dbsafe/issues">Report Issues</a></li>
</ul>

<h2 id="references">References</h2>

<p><strong>MySQL Official Documentation:</strong></p>
<ul>
  <li><a href="https://dev.mysql.com/doc/refman/8.0/en/alter-table.html">ALTER TABLE Statement - MySQL 8.0</a></li>
  <li><a href="https://dev.mysql.com/doc/refman/8.0/en/innodb-online-ddl-operations.html">Online DDL Operations - MySQL 8.0</a></li>
  <li><a href="https://dev.mysql.com/doc/refman/8.0/en/innodb-storage-engine.html">InnoDB Storage Engine - MySQL 8.0</a></li>
  <li><a href="https://dev.mysql.com/doc/refman/8.0/en/innodb-online-ddl.html">InnoDB and Online DDL - MySQL 8.0</a></li>
  <li><a href="https://dev.mysql.com/doc/refman/8.0/en/information-schema.html">INFORMATION_SCHEMA Tables - MySQL 8.0</a></li>
  <li><a href="https://dev.mysql.com/doc/refman/8.0/en/privileges-provided.html">Privileges Provided by MySQL</a></li>
  <li><a href="https://dev.mysql.com/doc/refman/8.0/en/explain.html">EXPLAIN Statement - MySQL 8.0</a></li>
  <li><a href="https://dev.mysql.com/doc/refman/8.0/en/show-replica-status.html">SHOW REPLICA STATUS - MySQL 8.0</a></li>
</ul>

<p><strong>MySQL Blog Posts:</strong></p>
<ul>
  <li><a href="https://dev.mysql.com/blog-archive/mysql-8-0-innodb-now-supports-instant-add-column/">MySQL 8.0: InnoDB now supports Instant ADD COLUMN</a></li>
  <li><a href="https://dev.mysql.com/blog-archive/mysql-8-0-instant-add-and-drop-columns/">MySQL 8.0 INSTANT ADD and DROP Column(s)</a></li>
</ul>

<p><strong>MySQL Group Replication:</strong></p>
<ul>
  <li><a href="https://dev.mysql.com/doc/refman/8.0/en/performance-schema-replication-group-members-table.html">performance_schema.replication_group_members Table</a></li>
  <li><a href="https://dev.mysql.com/doc/refman/8.0/en/group-replication-monitoring.html">Monitoring Group Replication - MySQL 8.0</a></li>
</ul>

<p><strong>Percona XtraDB Cluster Documentation:</strong></p>
<ul>
  <li><a href="https://docs.percona.com/percona-xtradb-cluster/8.0/toi.html">Total Order Isolation (TOI)</a></li>
  <li><a href="https://docs.percona.com/percona-xtradb-cluster/8.0/rsu.html">Rolling Schema Upgrade (RSU)</a></li>
  <li><a href="https://docs.percona.com/percona-xtradb-cluster/8.0/wsrep-status-index.html">Index of wsrep Status Variables</a></li>
  <li><a href="https://docs.percona.com/percona-xtradb-cluster/8.0/online-schema-upgrade.html">Online Schema Upgrade</a></li>
</ul>

<p><strong>Percona Toolkit:</strong></p>
<ul>
  <li><a href="https://docs.percona.com/percona-toolkit/pt-online-schema-change.html">pt-online-schema-change</a></li>
</ul>

<p><strong>Third-Party Tools:</strong></p>
<ul>
  <li><a href="https://github.com/github/gh-ost">gh-ost - GitHub’s Online Schema Migration Tool</a></li>
  <li><a href="https://github.com/vitessio/vitess/tree/main/go/vt/sqlparser">Vitess sqlparser</a></li>
</ul>]]></content>
    <author>
      <name>Daniel Guzman Burgos</name>
    </author>
    <category term="mysql"/>
    <category term="tools"/>
    <category term="ddl"/>
    <category term="schema-changes"/>
    <category term="dba"/>
    <summary type="html"><![CDATA[Analyze MySQL ALTER TABLE statements before running them. Know the algorithm, locks, and risks without testing on production. Read-only DDL safety tool.]]></summary>
  </entry>
</feed>
