<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
  <!-- Source: http://dev.to/feed/rdunklau -->
  <channel>
    <title>DEV Community: Ronan Dunklau</title>
    <description>The latest articles on DEV Community by Ronan Dunklau (@rdunklau).</description>
    <link>https://siftrss.com/f/KRVGYGRMYv8</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F99mvlsfu5tfj9m7ku25d.png</url>
      <title>DEV Community: Ronan Dunklau</title>
      <link>https://dev.to/rdunklau</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://siftrss.com/f/KRVGYGRMYv8"/>
    <language>en</language>
    <item>
      <title>Future PostgreSQL: improvement to the replication protocol</title>
      <dc:creator>Ronan Dunklau</dc:creator>
      <pubDate>Thu, 13 Jan 2022 13:29:45 +0000</pubDate>
      <link>https://dev.to/rdunklau/future-postgresql-improvement-to-the-replication-protocol-5f9n</link>
      <guid>https://dev.to/rdunklau/future-postgresql-improvement-to-the-replication-protocol-5f9n</guid>
      <description>&lt;p&gt;Today's post is about a new PostgreSQL feature (coming in PostgreSQL 15) developed by Aiven to solve a reliability problem we sometimes run into with replication. The &lt;code&gt;READ_REPLICATION_SLOT&lt;/code&gt; command makes a physical replication client, particularly &lt;code&gt;pg_receivewal&lt;/code&gt;, more reliable.&lt;/p&gt;

&lt;p&gt;At Aiven's OSPO office, we dedicate our time to the open source products Aiven offers or develops, and we were very happy to collaborate on this feature.&lt;/p&gt;

&lt;h2&gt;
  
  
  PostgreSQL replication surprises
&lt;/h2&gt;

&lt;p&gt;This story starts with our own &lt;a href="https://github.com/aiven/pghoard"&gt;PgHoard&lt;/a&gt;, a &lt;a href="https://www.postgresql.org/docs/current/continuous-archiving.html"&gt;PITR&lt;/a&gt; backup tool for PostgreSQL. PgHoard offers several methods to archive the WAL (Write Ahead Log), including &lt;a href="https://www.postgresql.org/docs/current/app-pgreceivewal.html"&gt;&lt;code&gt;pg_receivewal&lt;/code&gt;&lt;/a&gt;, a small application shipping with PostgreSQL which connects to a PostgreSQL cluster using the physical replication protocol to stream WAL as they are produced, optionally keeping track of the position on the server using a replication slot.&lt;/p&gt;

&lt;p&gt;We noticed that we could occasionally lose some WAL when PgHoard is restarted on another machine. This happens because &lt;code&gt;pg_receivewal&lt;/code&gt; determines its start position like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Look at what segments are present in the archive folder, and resume from the latest archived one.&lt;/li&gt;
&lt;li&gt;If the archive folder is empty, start from the current server LSN position.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The problem is, in our case, since &lt;code&gt;pg_receivewal&lt;/code&gt; was started on another server, it didn't have access to the archive folder directly so it resumed from the current LSN (Log Sequence Number) position.&lt;/p&gt;

&lt;p&gt;You can observe this in action by launching &lt;code&gt;pg_receivewal&lt;/code&gt;, stopping it after some WAL have been archived, then relaunching it with its directory empty:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;❯ sudo -u postgres pg_receivewal -D /tmp/wals  -v -s pg_receivewal_slot
pg_receivewal: starting log streaming at 47/E5000000 (timeline 1)
pg_receivewal: finished segment at 47/E6000000 (timeline 1)
pg_receivewal: received interrupt signal, exiting
pg_receivewal: not renaming "0000000100000047000000E6.partial", segment is not complete
pg_receivewal: received interrupt signal, exiting
❯ sudo rm /tmp/wals/* -rf
# Wait for some WAL to be generated on the server, and relaunch pg_receivewal
❯ sudo -u postgres pg_receivewal -D /tmp/wals  -v -s pg_receivewal_slot
pg_receivewal: starting log streaming at 47/E9000000 (timeline 1)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We notice here that the segments &lt;code&gt;0000000100000047000000E7&lt;/code&gt; and &lt;code&gt;0000000100000047000000E8&lt;/code&gt; have not been archived at all, jumping from &lt;code&gt;47/E6000000&lt;/code&gt; to &lt;code&gt;47/E9000000&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;What should happen here is resuming from the replication slot's &lt;code&gt;restart_lsn&lt;/code&gt;, which is kept on the server.&lt;/p&gt;

&lt;h2&gt;
  
  
  New feature: &lt;code&gt;READ_REPLICATION_SLOT&lt;/code&gt; command
&lt;/h2&gt;

&lt;p&gt;Prior to version 15 of PostgreSQL, the only way to know a replication slot's position was to query the &lt;code&gt;pg_replication_slots&lt;/code&gt; view, which means regular SQL queries need to be issued. This is fine for logical replication, as logical replication connection are bound to a database and are allowed to perform SQL queries. But in the case of physical replication connections, &lt;a href="https://www.postgresql.org/docs/14/protocol-replication.html"&gt;we do not have this possibility&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That meant that a physical replication client had no way to know the state of it's replication slot, except by opening a separate, non-replication connection, which is not something we can do with the &lt;code&gt;pg_receivewal&lt;/code&gt; application.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;READ_REPLICATION_SLOT&lt;/code&gt; command aims to fulfill this gap, by allowing a replication connection to read the current &lt;code&gt;restart_lsn&lt;/code&gt; and timeline associated&lt;br&gt;
to a replication slot:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  ❯ psql postgres -d "replication=1"
psql (15devel)
  Type "help" for help.

  postgres=# READ_REPLICATION_SLOT slot1;
  slot_type | restart_lsn | restart_tli 
  -----------+-------------+-------------
  physical  | 0/1501000   |           1
(1 row)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With that new command, &lt;code&gt;pg_receivewal&lt;/code&gt; can now determine its starting LSN slightly differently:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Look at what segments are present in the archive folder, and resume from the latest archived one.&lt;/li&gt;
&lt;li&gt;If the archive folder is empty and a replication slot is used, issue a &lt;code&gt;READ_REPLICATION_SLOT&lt;/code&gt; command to read the state from the server.&lt;/li&gt;
&lt;li&gt;Otherwise, start streaming from the server's current LSN.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This new command is currently only supported for physical replication slots, and returns limited information but that could evolve in future if needed.&lt;/p&gt;

&lt;p&gt;Already the feature will help with the "surprises" we encountered when using &lt;code&gt;pg_receivewal&lt;/code&gt;, and will also benefit other replication clients by making them more reliable. It has been accepted into the project and will be available in the version 15 release.&lt;/p&gt;

&lt;h2&gt;
  
  
  Next steps
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Learn more about &lt;a href="https://aiven.io/postgresql"&gt;Aiven for PostgreSQL&lt;/a&gt;, or &lt;a href="https://console.aiven.io/signup"&gt;sign up for our free trial&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Check out the &lt;a href="https://github.com/aiven/pghoard"&gt;PgHoard project on GitHub&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Find out what else our &lt;a href="https://aiven.io/open-source"&gt;OSPO&lt;/a&gt; (Open Source Program Office) is up to.&lt;/li&gt;
&lt;li&gt;Follow us on &lt;a href="https://twitter.com/aiven_io"&gt;Twitter&lt;/a&gt; to keep up with more posts like this.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>postgres</category>
    </item>
  </channel>
</rss>
