<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
  <!-- Source: https://dev.to/feed/raoufchebri -->
  <channel>
    <title>DEV Community: Raouf Chebri</title>
    <description>The latest articles on DEV Community by Raouf Chebri (@raoufchebri).</description>
    <link>https://siftrss.com/f/yNoZW8R639</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F853339%2F3fe742c9-df9d-4d6a-af4a-4e995a896424.JPG</url>
      <title>DEV Community: Raouf Chebri</title>
      <link>https://dev.to/raoufchebri</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://siftrss.com/f/yNoZW8R639"/>
    <language>en</language>
    <item>
      <title>Autoscaling in Action: Postgres Load Testing with pgbench</title>
      <dc:creator>Raouf Chebri</dc:creator>
      <pubDate>Fri, 23 Feb 2024 09:25:39 +0000</pubDate>
      <link>https://dev.to/neon-postgres/autoscaling-in-action-postgres-load-testing-with-pgbench-5e84</link>
      <guid>https://dev.to/neon-postgres/autoscaling-in-action-postgres-load-testing-with-pgbench-5e84</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyr5pb09yvp93ankmtyfn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyr5pb09yvp93ankmtyfn.png" alt="Blog post cover"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this article, I’ll show Neon autoscaling in action by running a load test using one of Postgres’ most popular benchmarking tools, &lt;code&gt;pgbench&lt;/code&gt;. The test simulates 30 clients running a heavy query.&lt;/p&gt;

&lt;p&gt;While 30 doesn’t sound like a lot, the query involves a mathematical function with high computational overhead, which signals to the autoscaler-agent that it needs to allocate more resources to the VM.&lt;/p&gt;

&lt;p&gt;We will not cover how autoscaling works, but for those interested in knowing the details, you can read more about &lt;a href="https://neon.tech/blog/scaling-serverless-postgres" rel="noopener noreferrer"&gt;how we implemented autoscaling in Neon&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For this load test, you will need:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://console.neon.tech" rel="noopener noreferrer"&gt;A Neon account&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://wiki.postgresql.org/wiki/Homebrew" rel="noopener noreferrer"&gt;pgbench&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The load test
&lt;/h2&gt;

&lt;p&gt;Ensuring your production database can perform under varying loads is crucial. That’s why we added autoscaling to Neon, a feature that dynamically adjusts the resources allocated to a database in real time, based on its current workload.&lt;/p&gt;

&lt;p&gt;However, the effectiveness and efficiency of autoscaling are often taken for granted without thorough testing. To showcase autoscaling in action, we turn to Postgres and &lt;code&gt;pgbench&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;pgbench&lt;/code&gt; is a benchmarking tool included with Postgres, designed to evaluate the performance of a Postgres server. The tool simulates client load on the server and runs tests to measure how the server handles concurrent data requests. &lt;/p&gt;

&lt;p&gt;&lt;code&gt;pgbench&lt;/code&gt; is executed from the command line, and its usage can vary widely depending on the specific tests or benchmarks being run. Here is the command we will use in our test:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pgbench &lt;span class="nt"&gt;-f&lt;/span&gt; test.sql &lt;span class="nt"&gt;-c&lt;/span&gt; 30 &lt;span class="nt"&gt;-T&lt;/span&gt; 120 &lt;span class="nt"&gt;-P&lt;/span&gt; 1 &amp;amp;lt&lt;span class="p"&gt;;&lt;/span&gt;CONNECTION_STRING&amp;amp;gt&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this example, &lt;code&gt;pgbench&lt;/code&gt; executes the query in &lt;code&gt;test.sql&lt;/code&gt;. The parameter &lt;code&gt;-c 30&lt;/code&gt; specifies 30 client connections, and &lt;code&gt;-T 120&lt;/code&gt; runs the test for 120 seconds against your database. &lt;code&gt;-P 1&lt;/code&gt; specifies that pgbench should report the progress of the test every 1 second. The progress report typically includes the number of transactions completed so far and the number of transactions per second.&lt;/p&gt;

&lt;p&gt;Thirty clients might not seem like enough to stress a database. Well, it depends on the query you’re executing, which we’ll see next.&lt;/p&gt;

&lt;h2&gt;
  
  
  Query execution plan
&lt;/h2&gt;

&lt;p&gt;Here is the query we’ll use for our load test:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;factorial&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;32000&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;factorial&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20000&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Mathematically, this query essentially compares the growth rates of the factorials of 32,000 and 20,000 by examining the ratio of their logarithms. &lt;/p&gt;

&lt;p&gt;Remember factorials? The factorial of a number n (denoted as n!) is the product of all positive integers less than or equal to n. For example, the factorial of 5 (5!) is 5 * 4 * 3 * 2 * 1 = 120. Factorials grow very rapidly with increasing numbers. &lt;/p&gt;

&lt;p&gt;To give you a sense of scale, the factorial of just 20 is already a 19-digit number: 20! = 2,432,902,008,176,640,000.&lt;/p&gt;

&lt;p&gt;The logarithm, on the other hand, is the power to which a base must be raised to obtain the value x. (In Postgres, &lt;code&gt;log()&lt;/code&gt; is the base-10 logarithm and &lt;code&gt;ln()&lt;/code&gt; is the natural logarithm with base &lt;em&gt;e&lt;/em&gt; ≈ 2.71828; since our query divides one logarithm by another, the base cancels out.)&lt;/p&gt;
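&lt;p&gt;As a rough sanity check outside Postgres, we can estimate the query’s result in Python via &lt;code&gt;math.lgamma&lt;/code&gt;, using the identity ln(n!) = lgamma(n + 1):&lt;/p&gt;

```python
import math

# ln(n!) equals lgamma(n + 1), so we can estimate the query's result
# without computing the astronomically large factorials themselves.
ratio = math.lgamma(32000 + 1) / math.lgamma(20000 + 1)
print(round(ratio, 3))  # ≈ 1.684
```

&lt;p&gt;The ratio itself is cheap to approximate this way; it’s evaluating the exact, arbitrary-precision factorials that makes the Postgres query so expensive.&lt;/p&gt;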

&lt;p&gt;In other words, this operation should take a long time to process. How long? Let’s examine the query execution plan using EXPLAIN ANALYZE:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;EXPLAIN&lt;/span&gt; &lt;span class="k"&gt;ANALYZE&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;factorial&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;32000&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;factorial&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20000&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;QUERY PLAN                                      

&lt;span class="nt"&gt;-------------------------------------------------------------------------------------&lt;/span&gt;

 Result  &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;cost&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0.00..0.01 &lt;span class="nv"&gt;rows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 &lt;span class="nv"&gt;width&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;32&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;actual &lt;span class="nb"&gt;time&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0.000..0.001 &lt;span class="nv"&gt;rows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 &lt;span class="nv"&gt;loops&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1&lt;span class="o"&gt;)&lt;/span&gt;

 Planning Time: 1921.630 ms

 Execution Time: 0.005 ms

&lt;span class="o"&gt;(&lt;/span&gt;3 rows&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This query was executed on ¼ vCPU. &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt; includes the planner’s estimates and real execution metrics. Execution Time appears to be quite fast. However, Planning Time (the time taken by the Postgres query planner to generate the execution plan) takes almost 2 seconds and suggests that preparing to run this mathematical function involves significant computational overhead.&lt;/p&gt;

&lt;p&gt;Combine 30 of those, and we should stress Postgres enough to trigger autoscaling.&lt;/p&gt;

&lt;h2&gt;
  
  
  Enabling autoscaling
&lt;/h2&gt;

&lt;p&gt;Autoscaling is the process of automatically increasing or decreasing the CPU and memory allocated to a database based on its current load. It dynamically adjusts the compute resources allocated to a Neon compute instance in response to the current load, eliminating the need for manual intervention. &lt;a href="https://neon.tech/docs/introduction/autoscaling" rel="noopener noreferrer"&gt;Learn more about autoscaling in the docs&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;You can enable autoscaling by defining the minimum and maximum compute units (CU) you’d like to allocate to your Postgres instance. This way, you remain in control of your resource consumption. For example, 1 CU allocates 1 vCPU and 4 GB of RAM to your instance.&lt;/p&gt;

&lt;p&gt;You can set your instance size when you create a new project or by navigating to the Branches page on your Neon Console, clicking on the database branch, and setting the CU range.&lt;/p&gt;



&lt;p&gt;I will set the range for this load test from ¼ to 7 CUs.&lt;/p&gt;

&lt;h1&gt;
  
  
  Executing &amp;amp; monitoring the load test
&lt;/h1&gt;

&lt;p&gt;Let’s run our load test now and observe its effect on our Postgres instance. We recently added graphs to monitor the resources allocated to your Postgres instance and its usage, which will come in handy later. After enabling autoscaling, follow these steps to execute the load test:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create your project folder and test.sql file:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir &lt;/span&gt;pgbench-load-test
&lt;span class="nb"&gt;cd &lt;/span&gt;pgbench-load-test
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"SELECT log(factorial(32000)) / log(factorial(20000));"&lt;/span&gt; &amp;amp;gt&lt;span class="p"&gt;;&lt;/span&gt; test.sql&lt;span class="s1"&gt;'
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="2"&gt;
&lt;li&gt;Execute the load test by running the following command:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pgbench &lt;span class="nt"&gt;-f&lt;/span&gt; test.sql &lt;span class="nt"&gt;-c&lt;/span&gt; 8 &lt;span class="nt"&gt;-T&lt;/span&gt; 120 &lt;span class="nt"&gt;-P&lt;/span&gt; 1 &amp;amp;lt&lt;span class="p"&gt;;&lt;/span&gt;YOUR_CONNECTION_STRING&amp;amp;gt&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you don’t have a connection string, you can create a &lt;a href="https://console.neon.tech" rel="noopener noreferrer"&gt;Neon project&lt;/a&gt; to get one.&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;Navigate to the autoscaling graph to monitor usage:&lt;/li&gt;
&lt;/ol&gt;



&lt;p&gt;You should observe a rapid change in CPU and memory allocated. The result should look similar to the graph below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fneondatabase.wpengine.com%2Fwp-content%2Fuploads%2F2024%2F02%2FScreenshot-2024-02-23-at-10.04.45-982x1024.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fneondatabase.wpengine.com%2Fwp-content%2Fuploads%2F2024%2F02%2FScreenshot-2024-02-23-at-10.04.45-982x1024.png" alt="Autoscaling graph"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The performance summary returned by &lt;code&gt;pgbench&lt;/code&gt; should look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;latency average &lt;span class="o"&gt;=&lt;/span&gt; 6000.891 ms
latency stddev &lt;span class="o"&gt;=&lt;/span&gt; 2768.066 ms
initial connection &lt;span class="nb"&gt;time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; 3712.770 ms
tps &lt;span class="o"&gt;=&lt;/span&gt; 4.978907 &lt;span class="o"&gt;(&lt;/span&gt;without initial connection &lt;span class="nb"&gt;time&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On average, each operation took slightly over 6 seconds to complete. A standard deviation of 2768.066 ms means that the latencies of individual operations varied quite a bit around the average latency. A higher standard deviation indicates more variability in how long each operation took to complete.&lt;/p&gt;

&lt;p&gt;Establishing the initial connection took approximately 3.7 seconds before any operations could be performed. A TPS of around 4.98 means that, on average, the database completed nearly five transactions every second during the test, excluding the initial connection time.&lt;/p&gt;
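&lt;p&gt;As a quick sanity check, with each client issuing one transaction after another, throughput should roughly equal the number of clients divided by the average latency:&lt;/p&gt;

```python
clients = 30                # the -c 30 flag from the pgbench command
avg_latency_s = 6.000891    # latency average reported by pgbench, in seconds

# Each client completes ~1/latency transactions per second.
approx_tps = clients / avg_latency_s
print(round(approx_tps, 2))  # ≈ 5.0, close to the reported 4.978907 tps
```

&lt;p&gt;The small gap between this estimate and the measured value reflects per-transaction overhead that pgbench accounts for but the back-of-the-envelope formula ignores.&lt;/p&gt;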

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;pgbench&lt;/code&gt; is a simple yet powerful tool to test your database and simulate multiple clients running heavy SQL queries. We also saw how to examine the query execution plan with &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt;, which provides insights to optimize your SQL queries.&lt;/p&gt;

&lt;p&gt;If you’re running an application subject to varying workloads, autoscaling gives you confidence that your database will keep performing under the stress of real-world demands.&lt;/p&gt;

&lt;p&gt;Thanks for reading. If you are curious about autoscaling, &lt;a href="https://console.neon.tech" rel="noopener noreferrer"&gt;give Neon a try&lt;/a&gt; and join our &lt;a href="https://neon.tech/discord" rel="noopener noreferrer"&gt;Discord&lt;/a&gt;. We look forward to seeing you there and hearing your feedback.&lt;/p&gt;

&lt;p&gt;Happy scaling!&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>test</category>
      <category>scale</category>
    </item>
    <item>
      <title>Point In Time Recovery Under the Hood in Serverless Postgres</title>
      <dc:creator>Raouf Chebri</dc:creator>
      <pubDate>Thu, 22 Feb 2024 12:44:01 +0000</pubDate>
      <link>https://dev.to/neon-postgres/point-in-time-recovery-under-the-hood-in-serverless-postgres-2dhn</link>
      <guid>https://dev.to/neon-postgres/point-in-time-recovery-under-the-hood-in-serverless-postgres-2dhn</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fneondatabase.wpengine.com%2Fwp-content%2Fuploads%2F2024%2F02%2Fimage-28-1024x576.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fneondatabase.wpengine.com%2Fwp-content%2Fuploads%2F2024%2F02%2Fimage-28-1024x576.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Imagine working on a crucial project when suddenly, due to an unexpected event, you lose significant chunks of your database. Whether it’s a human error, a malicious attack, or a software bug, data loss is a nightmare scenario. But fear not! We recently added support for &lt;a href="https://dev.to/evanatneon/announcing-point-in-time-restore-864-temp-slug-3100656"&gt;Point-In-Time Restore (PITR)&lt;/a&gt; to Neon, so you can turn back the clock to a happier moment before things went south.&lt;/p&gt;

&lt;p&gt;In the video below and in the &lt;a href="https://dev.to/evanatneon/announcing-point-in-time-restore-864-temp-slug-3100656"&gt;PITR announcement article&lt;/a&gt;, my friend Evan shows how you can recover your data in a few clicks. He also uses Time Travel Assist to observe the state of the database at a given timestamp so he can run the restore process confidently and safely.&lt;/p&gt;



&lt;p&gt;How is this possible? This article is for those interested in understanding how PITR works under the hood in Neon. To better explain this, we will: &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Cover the basics of PITR in Postgres &lt;/li&gt;
&lt;li&gt;Explore the underlying infrastructure that allows for PITR in Neon. &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;By the end of this post, you’ll be prepared for when disaster strikes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding the basics of Point-In-Time Recovery in Postgres
&lt;/h2&gt;

&lt;p&gt;PITR in Postgres is made possible using two key components:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Write-Ahead Logging:&lt;/strong&gt; Postgres uses &lt;a href="https://www.postgresql.org/docs/current/wal-intro.html" rel="noopener noreferrer"&gt;Write-Ahead Logging&lt;/a&gt; (WAL) to record all changes made to the database. Think of WAL as the database’s diary, keeping track of every detail of its day-to-day activities. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Base backups:&lt;/strong&gt; Base backups are snapshots of your database at a particular moment in time. &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;With these two elements combined, you can define a strategy to restore your database to any point after the base backup was taken, effectively traveling through your database’s timeline. However, you need to do some groundwork first, which consists of the following:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Setting up WAL archiving:&lt;/strong&gt; By defining an &lt;code&gt;archive_command&lt;/code&gt; and setting &lt;code&gt;archive_mode&lt;/code&gt; to &lt;code&gt;on&lt;/code&gt;  in your &lt;code&gt;postgresql.conf&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Creating base backups:&lt;/strong&gt; You can use &lt;code&gt;pg_basebackup&lt;/code&gt; to create daily backups.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2FDR4PajEMGMxzyTFgdbCNEUmieSgTLWZsjfaN94aUmc5mdNV1Fa3ZAkr56df29EdFfG-U5kC_8Zg7MDSqP6aJCHf0ZhpjFEfKdKhCXtHlGAUudLiCF4iuXViEXZCZJx7y3pYlo8p5cwvRTiduMn45Xuc" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2FDR4PajEMGMxzyTFgdbCNEUmieSgTLWZsjfaN94aUmc5mdNV1Fa3ZAkr56df29EdFfG-U5kC_8Zg7MDSqP6aJCHf0ZhpjFEfKdKhCXtHlGAUudLiCF4iuXViEXZCZJx7y3pYlo8p5cwvRTiduMn45Xuc"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If, for any reason, you need to restore your database, you need to recover the latest backup and replay the WAL on top of it. The same logic applies to restoring from a point in time in the retention period. &lt;/p&gt;

&lt;p&gt;Let’s say we want to restore the database to its state on February 1st at 14:30. We first locate the last backup file created before that target time, restore it, and then replay the WAL up to that time. &lt;/p&gt;
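&lt;p&gt;The selection step can be sketched in a few lines of Python (an illustrative toy, not Postgres internals; the backup names and dates are made up):&lt;/p&gt;

```python
from datetime import datetime

def plan_pitr(backups, target):
    """Pick the newest base backup taken at or before the target time.

    `backups` is a list of (timestamp, name) tuples; WAL replay then
    covers the gap between that backup and the target.
    """
    eligible = [b for b in backups if b[0] <= target]
    if not eligible:
        raise ValueError("no base backup precedes the target time")
    return max(eligible)  # newest backup at or before the target

backups = [
    (datetime(2024, 1, 25), "base_0125"),
    (datetime(2024, 1, 31), "base_0131"),
    (datetime(2024, 2, 2), "base_0202"),  # too late for our target
]
target = datetime(2024, 2, 1, 14, 30)
print(plan_pitr(backups, target))  # base_0131, then replay WAL up to 14:30
```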

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2FvTul66-QTVPuOMRscFhgCSpHVLZBUxNENuxuIVl0c9Vd8nvuoeFQiqOqW-TpMQ0-ZcmTffmzs4OF8TwE1on5qVQAhYPSPYK7ub9oKPZIkTPlghMzVQu9U8jQCcjQHGqsik8J9_PcYOBPVH1B2bQansA" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2FvTul66-QTVPuOMRscFhgCSpHVLZBUxNENuxuIVl0c9Vd8nvuoeFQiqOqW-TpMQ0-ZcmTffmzs4OF8TwE1on5qVQAhYPSPYK7ub9oKPZIkTPlghMzVQu9U8jQCcjQHGqsik8J9_PcYOBPVH1B2bQansA"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Great! We now know how to perform a PITR in Postgres. However, there are a few limitations to this approach:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You might notice a drop in performance while performing backups.&lt;/li&gt;
&lt;li&gt;Because you have a finite storage capacity, you must define a limit to your archived WAL. This limit is known as the retention period (a.k.a history retention), which determines how far back in time your data can be restored.&lt;/li&gt;
&lt;li&gt;You have a single point of failure (SPOF) since all base backups and WAL archives are in the same location.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We can enhance our architecture by adopting disaster recovery tools like &lt;a href="https://pgbarman.org/" rel="noopener noreferrer"&gt;Barman&lt;/a&gt; to avoid SPOF and downtime. With Barman, Postgres streams base backups and WAL archives to an external backup server. Or, if you know what you’re doing, you can configure Postgres to stream base backups and WAL archives to an AWS S3 bucket, and add a standby, which serves as an exact copy of your database, to avoid downtime. Your setup would look like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2Fhj6272QvVTzQ0MIduTt6MpFoCY7fMJSdDJjo9jE0yzRzokzKaZ4B5A1HymLIIP6g8FbblXxsR5ks73VPWI0yTvHQFCQ8JSiaYIV5YnhFmf4ORS6bwEXS_SCLtMnoHsSZ1mJltkpk13xKRLpnFyZ06nQ" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2Fhj6272QvVTzQ0MIduTt6MpFoCY7fMJSdDJjo9jE0yzRzokzKaZ4B5A1HymLIIP6g8FbblXxsR5ks73VPWI0yTvHQFCQ8JSiaYIV5YnhFmf4ORS6bwEXS_SCLtMnoHsSZ1mJltkpk13xKRLpnFyZ06nQ"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To sum it up and to perform a PITR in Postgres without downtime, you need to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Have a backup server&lt;/li&gt;
&lt;li&gt;Set up WAL archiving and stream it to the backup&lt;/li&gt;
&lt;li&gt;Schedule daily backups&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Additionally, you need to install several packages and then configure and maintain this infrastructure, time that could be spent on your application instead. It’s that convenience, simplicity, and confidence in your data that Neon offers.&lt;/p&gt;

&lt;p&gt;So, how do we make it look so easy? Let’s step back and explain how Neon’s storage engine works.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding Neon’s architecture
&lt;/h2&gt;

&lt;p&gt;Neon’s philosophy is that the “database is its logs”. In our case: “Postgres is its WAL records”.&lt;/p&gt;

&lt;p&gt;Neon configures Postgres to stream the WAL to a custom Rust-based storage engine. Neon’s storage engine is composed of three parts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A persistence layer called “&lt;a href="https://github.com/neondatabase/neon/blob/main/docs/rfcs/014-safekeepers-gossip.md" rel="noopener noreferrer"&gt;Safekeepers&lt;/a&gt;” makes sure the written data is never lost, &lt;a href="https://neon.tech/blog/paxos" rel="noopener noreferrer"&gt;using Paxos as a consensus algorithm&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;A storage layer called “Pageservers”: multi-tenant storage that can reconstruct the data from WAL and send it to Postgres.&lt;/li&gt;
&lt;li&gt;A second persistence layer to durably store the WAL in AWS S3.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And since all the data is stored in Neon’s storage engine, Postgres doesn’t need to persist data on the local disk. This turns Postgres into a stateless compute instance that can start in under 500ms, making Neon serverless. &lt;/p&gt;

&lt;p&gt;As a result, we no longer require: &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A standby: because, in the case of a Postgres crash, we can quickly spin up another instance.&lt;/li&gt;
&lt;li&gt;Backups: Neon’s storage engine stores the WAL and performs &lt;a href="https://en.wikipedia.org/wiki/Compaction" rel="noopener noreferrer"&gt;compactions&lt;/a&gt;.
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The data flow would look like the following:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2FziTbF_Fwf1anMRcVZwo4f7DbNWmhSFVvJOXqL7x-B2lTZ-zeq6m7eVxwGXFMTg4_8kd8-fociJ-ka4QCKntbS3jj5L7F7HAJ2TXuCCHbixTFo6m0ukn_keRa1ZsLRD0Ryn9vx0Y2xg45-OIQBK7XysI" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2FziTbF_Fwf1anMRcVZwo4f7DbNWmhSFVvJOXqL7x-B2lTZ-zeq6m7eVxwGXFMTg4_8kd8-fociJ-ka4QCKntbS3jj5L7F7HAJ2TXuCCHbixTFo6m0ukn_keRa1ZsLRD0Ryn9vx0Y2xg45-OIQBK7XysI"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Check out the &lt;a href="https://neon.tech/blog/architecture-decisions-in-neon" rel="noopener noreferrer"&gt;&lt;em&gt;Architecture decisions in Neon&lt;/em&gt; article by Heikki Linnakangas&lt;/a&gt; to learn more.&lt;/p&gt;

&lt;p&gt;To understand the magic behind PITR in Neon, we’ll explore how the Pageservers work.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pageservers: under the hood
&lt;/h2&gt;

&lt;p&gt;Each transaction in the WAL is associated with a Log Sequence Number (LSN), marking the byte position in the WAL stream where the record of that transaction starts. If we follow our initial analogy of WAL being a detailed diary of everything in the database, then the LSN is the page number in that diary.&lt;/p&gt;
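&lt;p&gt;Postgres prints LSNs in a two-part hexadecimal notation (for example &lt;code&gt;0/16B3748&lt;/code&gt;), where the first part holds the high 32 bits of the byte position and the second the low 32 bits. A minimal sketch of the conversion:&lt;/p&gt;

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert Postgres's textual LSN ('hi/lo' in hex) to a byte offset."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

# Later LSNs are simply larger byte positions in the WAL stream.
a = lsn_to_bytes("0/16B3748")
b = lsn_to_bytes("1/0")     # 4 GiB into the stream
print(b > a)  # True
```

&lt;p&gt;Because LSNs are just monotonically increasing byte offsets, comparing two of them tells you which change came first, which is exactly what makes them useful page numbers for the diary.&lt;/p&gt;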

&lt;p&gt;The Pageserver can be represented by a 2-dimensional graph, where the Y-axis is the &lt;code&gt;LSN&lt;/code&gt;, and the X-axis is the &lt;code&gt;key&lt;/code&gt; that points to the database, relation, and then block number. A key, for example, can point to certain rows in your database.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2FfRlrDbEpnnuLSCTH2XwuiuhsU74euugyHI-ebB7EPrvwR0FbuEDSgkG9HvkzeDZwPyIrF_dQTz2hWIXHEl0NgKILbydD5QPMlJz5sKuFuLDneJKsOWrtyx4oRVJk8AJL58zdY5yLxdAJildhuEOMuAI" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2FfRlrDbEpnnuLSCTH2XwuiuhsU74euugyHI-ebB7EPrvwR0FbuEDSgkG9HvkzeDZwPyIrF_dQTz2hWIXHEl0NgKILbydD5QPMlJz5sKuFuLDneJKsOWrtyx4oRVJk8AJL58zdY5yLxdAJildhuEOMuAI"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When data is written in Neon, the role of Pageservers is to accumulate WAL records. Then, when these records reach approximately 1GB in size, Pageservers create two types of immutable layer files:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Image layers (bars)&lt;/strong&gt;: contain a &lt;strong&gt;&lt;em&gt;snapshot&lt;/em&gt;&lt;/strong&gt; of a key range for a specific LSN. You can see Image Layers as the state of rows in certain tables or indexes at a given time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Delta layers (rectangles)&lt;/strong&gt;: contain the &lt;strong&gt;&lt;em&gt;incremental changes&lt;/em&gt;&lt;/strong&gt; within a key range. You can see Delta layers as a log of all the changes that happened to your rows.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Does this sound familiar?&lt;/p&gt;

&lt;p&gt;Indeed, it employs the same principle as the traditional Postgres setups for PITR we’ve previously discussed, which include base backups and WAL archiving. The main difference here is that you don’t need to initiate a lengthy and complex restore procedure every time you wish to read data from a previous state of the database. This is because Pageservers inherently know how to reconstruct the state of the page at any given LSN or timeline.&lt;/p&gt;
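&lt;p&gt;A toy model of that reconstruction (purely illustrative; the real Pageserver is far more involved): start from the newest image layer at or below the requested LSN, then replay the matching delta records on top of it.&lt;/p&gt;

```python
def get_page_at_lsn(images, deltas, key, lsn):
    """Reconstruct the value of `key` as of `lsn`.

    `images` maps image_lsn -> {key: value} snapshots; `deltas` is a
    list of (delta_lsn, key, value) change records.
    """
    base_lsn = max(l for l in images if l <= lsn)
    value = images[base_lsn].get(key)
    for d_lsn, d_key, d_value in sorted(deltas):
        if base_lsn < d_lsn <= lsn and d_key == key:
            value = d_value  # replay the change on top of the snapshot
    return value

images = {100: {"row1": "a"}}
deltas = [(150, "row1", "b"), (200, "row1", "c")]
print(get_page_at_lsn(images, deltas, "row1", 180))  # "b"
print(get_page_at_lsn(images, deltas, "row1", 250))  # "c"
```

&lt;p&gt;Every historical state stays readable without a restore procedure: answering “what was this page at LSN 180?” is just a different lookup, not a rebuild of the database.&lt;/p&gt;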

&lt;h2&gt;
  
  
  Ephemeral branches
&lt;/h2&gt;

&lt;p&gt;We mentioned previously that, in Postgres, each WAL record is associated with an LSN. In Neon, Postgres tracks the last evicted LSN in the buffer cache, so Postgres knows at which point in time it should fetch the data. &lt;/p&gt;

&lt;p&gt;When Postgres requests a page from the Pageserver, it triggers the &lt;a href="https://github.com/neondatabase/neon/blob/main/pageserver/pagebench/src/cmd/getpage_latest_lsn.rs" rel="noopener noreferrer"&gt;GetPage@LSN&lt;/a&gt; function, which returns the state of a given key at that specific LSN.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fneondatabase.wpengine.com%2Fwp-content%2Fuploads%2F2024%2F02%2F87-1024x456.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fneondatabase.wpengine.com%2Fwp-content%2Fuploads%2F2024%2F02%2F87-1024x456.jpg"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Read the &lt;a href="https://neon.tech/blog/get-page-at-lsn" rel="noopener noreferrer"&gt;Deep dive in Neon’s storage engine&lt;/a&gt; article to learn more about Neon’s architecture.&lt;/p&gt;

&lt;p&gt;In practice, you can access different timelines through database branches. These branches are copy-on-write clones of your database, representing the state of your data at any point in its history. When you create a branch, you specify the LSN (or a timestamp), and Neon’s control plane generates a timeline associated with your project, keeping track of it.&lt;/p&gt;

&lt;p&gt;We’ve enhanced the Point In Time Recovery (PITR) feature in Neon with Time Travel Assist. This functionality allows you to perform Time Travel queries to review the state of your database at a specific timestamp or LSN, following the same underlying steps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Creating a timeline, and&lt;/li&gt;
&lt;li&gt;Running &lt;a href="mailto:GetPage@LSN"&gt;GetPage@LSN&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;However, these branches are ephemeral, having a Time To Live (TTL) of 10 seconds. We refer to these as ephemeral branches, and they will soon become a crucial part of your development workflows.&lt;/p&gt;

&lt;p&gt;Ephemeral branches enable you to connect to a previous state of your database by merely specifying the LSN or timestamp in your connection string. This capability is natively supported by Pageservers, and Neon’s PITR feature is the first step towards making ephemeral connections available to developers. Stay tuned for more development in this area.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;While Postgres’ features offer powerful options and tools like Barman to help with disaster recovery, Neon’s approach makes PITR reliable, accessible, efficient, and integrated into a seamless database management experience. &lt;/p&gt;

&lt;p&gt;By first exploring how to do PITR in Postgres, we’ve learned about the importance of continuous archiving and creating base backups. &lt;/p&gt;

&lt;p&gt;Neon’s storage engine saves WAL records and snapshots of your database and can natively reconstruct data for any point in time in your history. This capability allows Time Travel Assist to query your database at a given timestamp before you proceed to restoration using short-lived, or ephemeral, branches.&lt;/p&gt;

&lt;p&gt;Ephemeral branches introduce a unique way to interact with your data’s history: by accessing different timelines and running Time Travel queries, developers can review prior states and understand their data’s lifecycle.&lt;/p&gt;

&lt;p&gt;What about you? How often do you use PITR in your projects? Join us on &lt;a href="https://neon.tech/discord" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; and let us know how we can enhance your Postgres experience in the cloud.&lt;/p&gt;

&lt;p&gt;Special thanks to &lt;a href="https://twitter.com/skeptrune" rel="noopener noreferrer"&gt;skeptrune&lt;/a&gt; for reviewing and suggesting adding a mention to Barman.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>recovery</category>
      <category>disaster</category>
    </item>
    <item>
      <title>PgBouncer: The one with prepared statements</title>
      <dc:creator>Raouf Chebri</dc:creator>
      <pubDate>Thu, 15 Feb 2024 09:43:20 +0000</pubDate>
      <link>https://dev.to/neon-postgres/pgbouncer-the-one-with-prepared-statements-198i</link>
      <guid>https://dev.to/neon-postgres/pgbouncer-the-one-with-prepared-statements-198i</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fneondatabase.wpengine.com%2Fwp-content%2Fuploads%2F2024%2F02%2Fimage-26-1024x576.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fneondatabase.wpengine.com%2Fwp-content%2Fuploads%2F2024%2F02%2Fimage-26-1024x576.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The latest release of &lt;a href="https://github.com/pgbouncer/pgbouncer/releases/tag/pgbouncer_1_22_0" rel="noopener noreferrer"&gt;PgBouncer 1.22.0&lt;/a&gt; increases query throughput by 15% to 250% and includes support for &lt;code&gt;DEALLOCATE ALL&lt;/code&gt; and &lt;code&gt;DISCARD ALL&lt;/code&gt;, as well as protocol-level prepared statements released in &lt;a href="https://github.com/pgbouncer/pgbouncer/releases/tag/pgbouncer_1_21_0" rel="noopener noreferrer"&gt;1.21.0&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In this article, we’ll explore what prepared statements are and how to use PgBouncer to optimize your queries in Postgres.&lt;/p&gt;

&lt;h2&gt;
  
  
  What are Prepared Statements?
&lt;/h2&gt;

&lt;p&gt;In Postgres, a prepared statement is a feature that allows you to create and optimize an SQL query once and then execute it multiple times with different parameters. It’s a template where you define the structure of your query and later fill in the specific values you want to use.&lt;/p&gt;

&lt;p&gt;Here’s an example of creating a prepared statement with &lt;code&gt;PREPARE&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;PREPARE&lt;/span&gt; &lt;span class="n"&gt;user_fetch_plan&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;username&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="nv"&gt;"&amp;gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, &lt;code&gt;user_fetch_plan&lt;/code&gt; is the name of the prepared statement, and &lt;code&gt;$1&lt;/code&gt; is a placeholder for the parameter. &lt;/p&gt;

&lt;p&gt;Here is how to execute the prepared statement:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;EXECUTE&lt;/span&gt; &lt;span class="n"&gt;user_fetch_plan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'alice'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="nv"&gt;"&amp;gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This query will fetch all columns from the &lt;code&gt;users&lt;/code&gt; table where the &lt;code&gt;username&lt;/code&gt; is &lt;code&gt;alice&lt;/code&gt;.&lt;/p&gt;
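&lt;p&gt;Prepared statements are scoped to the current session. If you want to drop one before the session ends, use &lt;code&gt;DEALLOCATE&lt;/code&gt; (the &lt;code&gt;DEALLOCATE ALL&lt;/code&gt; variant mentioned above drops every prepared statement in the session at once):&lt;/p&gt;

```sql
DEALLOCATE user_fetch_plan;  -- or: DEALLOCATE ALL;
```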

&lt;h2&gt;
  
  
  Why Use Prepared Statements?
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Performance&lt;/strong&gt; : Since the SQL statement is parsed and the execution plan is created only once, subsequent executions can be faster. However, this benefit might be more noticeable in databases with heavy and repeated traffic.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Security&lt;/strong&gt; : Prepared statements are a great way to avoid SQL injection attacks. Since data values are sent separately from the query, they aren’t executed as SQL, making injecting malicious SQL code difficult.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
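&lt;p&gt;The security benefit is easy to demonstrate. The sketch below uses Python’s built-in &lt;code&gt;sqlite3&lt;/code&gt; module purely for brevity; the same principle applies to any Postgres client library that binds parameters separately from the query text:&lt;/p&gt;

```python
import sqlite3

# In-memory database standing in for Postgres, just to illustrate the principle.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT)")
conn.execute("INSERT INTO users VALUES ('alice'), ('bob')")

malicious = "alice' OR '1'='1"

# Unsafe: string interpolation lets the attacker's SQL execute, matching every row.
unsafe = conn.execute(
    f"SELECT * FROM users WHERE username = '{malicious}'"
).fetchall()

# Safe: the value is bound as data, not parsed as SQL, so no row matches.
safe = conn.execute(
    "SELECT * FROM users WHERE username = ?", (malicious,)
).fetchall()

print(len(unsafe))  # 2 -- injection succeeded
print(len(safe))    # 0 -- injection neutralized
```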

&lt;h2&gt;
  
  
  What is PgBouncer?
&lt;/h2&gt;

&lt;p&gt;Before diving into what PgBouncer is, let’s take a step back and briefly touch on how Postgres operates. &lt;/p&gt;

&lt;p&gt;Postgres runs on a system of several interlinked processes, with the &lt;code&gt;postmaster&lt;/code&gt; taking the lead. This initial process kicks things off, supervises other processes, and listens for new connections. The &lt;code&gt;postmaster&lt;/code&gt; also allocates a shared memory for these processes to interact.&lt;/p&gt;

&lt;p&gt;Whenever a client wants to establish a new connection, the &lt;code&gt;postmaster&lt;/code&gt; creates a new backend process for that client. This new connection starts a session with the backend, which stays active until the client decides to leave or the connection drops.&lt;/p&gt;

&lt;p&gt;Here’s where it gets tricky: Many applications, such as serverless backends, open numerous connections, and most eventually become inactive. Postgres needs to create a unique backend process for each client connection. When many clients try to connect, more memory is needed. In Neon, for example, the default maximum number of &lt;a href="https://neon.tech/docs/connect/connection-pooling#default-connection-limits" rel="noopener noreferrer"&gt;concurrent direct connections is set to 100&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;The solution to this problem is connection pooling with PgBouncer, which helps keep the number of active backend processes low.&lt;/p&gt;

&lt;p&gt;PgBouncer is a lightweight connection pooler whose primary function is to manage and maintain a pool of database connections to overcome Postgres’ connection limitations. Neon projects come with both direct and pooled connections by default. The latter uses PgBouncer and currently offers up to 10,000 connections.&lt;/p&gt;

&lt;p&gt;Depending on your database provider, you'll have different ways to access PgBouncer. On Neon, you can check the “Pooled connection” box in the connection details widget and make sure the connection string contains the &lt;code&gt;-pooler&lt;/code&gt; suffix:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;postgres://johndoe:mypassword@ep-billowing-wood-25959289-pooler.us-east-1.aws.neon.tech/neondb"&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2FRnGdfOkY2GRjsa2zooCkJkdq838AK63X9LvXn2zvuEjbpNz3Hc3rVbwottAaEwQRkZ1NQd5USaFgMiKDJvtL5HUI5sUh058PTSG5NelFpJyJ8uwHjmQEavFjmgxp2BxmOugIrDpf-I1C-MriITe-Lkk" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2FRnGdfOkY2GRjsa2zooCkJkdq838AK63X9LvXn2zvuEjbpNz3Hc3rVbwottAaEwQRkZ1NQd5USaFgMiKDJvtL5HUI5sUh058PTSG5NelFpJyJ8uwHjmQEavFjmgxp2BxmOugIrDpf-I1C-MriITe-Lkk"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Using Prepared Statements with PgBouncer in client libraries:
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/pgbouncer/pgbouncer/releases/tag/pgbouncer_1_22_0" rel="noopener noreferrer"&gt;PgBouncer&lt;/a&gt; supports prepared statements at the protocol level, and therefore, the above SQL-level prepared statement using &lt;code&gt;PREPARE&lt;/code&gt; and &lt;code&gt;EXECUTE&lt;/code&gt; will not work with PgBouncer. See &lt;a href="https://www.pgbouncer.org/config.html#max_prepared_statements" rel="noopener noreferrer"&gt;PgBouncer’s documentation&lt;/a&gt; for more information.&lt;/p&gt;

&lt;p&gt;However, you can use prepared statements with pooled connections in a client library. Most PostgreSQL client libraries offer support for prepared statements, often abstracting away the explicit use of &lt;code&gt;PREPARE&lt;/code&gt; and &lt;code&gt;EXECUTE&lt;/code&gt;. Here’s how you might use it in a few popular languages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="n"&gt;using&lt;/span&gt; &lt;span class="n"&gt;psycopg2&lt;/span&gt;
&lt;span class="n"&gt;cur&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;quot&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="n"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="n"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;username&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;quot&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;alice&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,),&lt;/span&gt; &lt;span class="n"&gt;prepare&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetchall&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// using pg  &lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
   &lt;span class="c1"&gt;// give the query a unique name&lt;/span&gt;
   &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;fetch-user&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;SELECT * FROM users WHERE username = $1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;values&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;alice&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;&amp;gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In these client libraries, the actual SQL command is parsed and prepared on the server, and then the data values are sent separately, ensuring both efficiency and security.&lt;/p&gt;

&lt;p&gt;Under the hood, PgBouncer examines all the queries sent as a prepared statement by clients and assigns each unique query string an internal name (e.g. PGBOUNCER_123). PgBouncer rewrites each command that uses a prepared statement to use the matching internal name before forwarding the corresponding command to Postgres.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;                +-------------+
                | Client |
                +------+------+
                       |
                       | Sends Prepared Statement &lt;span class="o"&gt;(&lt;/span&gt;e.g., &amp;amp;quot&lt;span class="p"&gt;;&lt;/span&gt;SELECT &lt;span class="k"&gt;*&lt;/span&gt; FROM &lt;span class="nb"&gt;users &lt;/span&gt;WHERE &lt;span class="nb"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; ?&amp;amp;quot&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
                       |
                +------v------+
                | PgBouncer |
                | |
                | 1. Examines and tracks the client&lt;span class="s1"&gt;'s statement. |
                | 2. Assigns an internal name (e.g., PGBOUNCER_123).|
                | 3. Checks if the statement is already prepared |
                | on the PostgreSQL server. |
                | 4. If not, prepares the statement on the server. |
                | 5. Rewrites the client'&lt;/span&gt;s &lt;span class="nb"&gt;command &lt;/span&gt;to use the |
                | internal name. |
                +------^------+
                       |
                       | Forwards Rewritten Statement &lt;span class="o"&gt;(&lt;/span&gt;e.g., &amp;amp;quot&lt;span class="p"&gt;;&lt;/span&gt;SELECT &lt;span class="k"&gt;*&lt;/span&gt; FROM &lt;span class="nb"&gt;users &lt;/span&gt;WHERE &lt;span class="nb"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; ?&amp;amp;quot&lt;span class="p"&gt;;&lt;/span&gt; as PGBOUNCER_123&lt;span class="o"&gt;)&lt;/span&gt;
                       |
                +------v------+
                | PostgreSQL |
                | Server |
                | |
                | Executes the forwarded statement with the internal name. |
                +-------------+&lt;span class="s2"&gt;"&amp;gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
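&lt;p&gt;A toy model of this bookkeeping, written in Python rather than PgBouncer’s actual C implementation, might look like the following; the class and method names are illustrative only:&lt;/p&gt;

```python
# Toy model of PgBouncer's prepared-statement tracking (not the real implementation):
# each unique query string gets one internal name, prepared on the server at most once.

class StatementCache:
    def __init__(self):
        self._names = {}                 # query text -> internal name
        self._prepared_on_server = set() # internal names already sent to Postgres
        self._counter = 0

    def internal_name(self, query: str) -> str:
        # Assign a stable internal name (e.g., PGBOUNCER_1) per unique query string.
        if query not in self._names:
            self._counter += 1
            self._names[query] = f"PGBOUNCER_{self._counter}"
        return self._names[query]

    def ensure_prepared(self, query: str) -> tuple[str, bool]:
        """Return (internal_name, newly_prepared_on_server)."""
        name = self.internal_name(query)
        if name in self._prepared_on_server:
            return name, False           # reuse the server-side plan
        self._prepared_on_server.add(name)  # real PgBouncer would send Parse here
        return name, True

cache = StatementCache()
print(cache.ensure_prepared("SELECT * FROM users WHERE id = $1"))  # ('PGBOUNCER_1', True)
print(cache.ensure_prepared("SELECT * FROM users WHERE id = $1"))  # ('PGBOUNCER_1', False)
```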



&lt;h2&gt;
  
  
  In Summary
&lt;/h2&gt;

&lt;p&gt;PgBouncer bridges the gap between the inherent connection limitations of Postgres and the ever-growing demand for higher concurrency in modern applications. &lt;/p&gt;

&lt;p&gt;Leveraging prepared statements can be a valuable asset to boost your Postgres query performance and adds a layer of security against potential SQL injection attacks when using pooled connections. &lt;/p&gt;

&lt;p&gt;You can try prepared statements in PgBouncer with Neon today. We can’t wait to see what you build using it. Happy querying.&lt;br&gt;&lt;br&gt;
If you have any questions or feedback, don’t hesitate to get in touch with us on &lt;a href="https://neon.tech/discord" rel="noopener noreferrer"&gt;Discord&lt;/a&gt;. We’d love to hear from you.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>pgbouncer</category>
    </item>
    <item>
      <title>pgvector: 30x Faster Index Build for your Vector Embeddings</title>
      <dc:creator>Raouf Chebri</dc:creator>
      <pubDate>Wed, 07 Feb 2024 15:43:47 +0000</pubDate>
      <link>https://dev.to/neon-postgres/pgvector-30x-faster-index-build-for-your-vector-embeddings-46da</link>
      <guid>https://dev.to/neon-postgres/pgvector-30x-faster-index-build-for-your-vector-embeddings-46da</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5gsml9s8t1p9imi47u93.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5gsml9s8t1p9imi47u93.jpg" alt="Image description" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;We are Neon, the serverless Postgres. We power thousands of AI apps with the pgvector extension and separate storage and compute, enabling your database resources to scale independently. In this article, Raouf explains how you can use Neon’s elasticity and parallel HNSW index build in pgvector (0.5.1 for now, and 0.6.0 soon) to scale your AI apps.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Postgres’ most popular vector search extension, pgvector, recently implemented a parallel index build feature, which significantly improves the Hierarchical Navigable Small World (HNSW) index build time by a factor of 30.&lt;/p&gt;

&lt;p&gt;Congratulations to &lt;a href="https://github.com/ankane" rel="noopener noreferrer"&gt;Andrew Kane&lt;/a&gt; and pgvector contributors for this release, which solidifies Postgres’ position as one of the best databases for vector search and allows you to utilize the full power of your database to build the index.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--y8Cdewke--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/JF6GlzjwCLdIxG0PpOh66Q8GgqU60Ea_dXyGbGoKxjMCPQMtMMjzweMs4o9FeCBXY_ZKYNJQ2TuO8F-tUTFypUmN97XtyqhRgBM1ZjHg1wccgN5-IxTH5fpVQ7xrdM7l10lj99cJsmeYcOMPF-QGd0c" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--y8Cdewke--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/JF6GlzjwCLdIxG0PpOh66Q8GgqU60Ea_dXyGbGoKxjMCPQMtMMjzweMs4o9FeCBXY_ZKYNJQ2TuO8F-tUTFypUmN97XtyqhRgBM1ZjHg1wccgN5-IxTH5fpVQ7xrdM7l10lj99cJsmeYcOMPF-QGd0c" alt="Post image" width="800" height="516"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Tests run by Jonathan Katz using a 10M dataset with 1,536-dimension vectors on a 64 vCPU, 512GB RAM instance.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;With Neon’s elastic capabilities and its architecture that separates storage and compute, you can allocate additional resources to your Postgres instance, from the console or via the Neon API, specifically for the HNSW index build, and then scale back down to meet user demand. This makes Neon and pgvector a match made in heaven for efficient AI applications that scale to millions of users.&lt;/p&gt;

&lt;p&gt;This article details how you can use pgvector with Neon.&lt;/p&gt;
&lt;h2&gt;
  
  
  The power of pgvector
&lt;/h2&gt;

&lt;p&gt;Pgvector is Postgres’ most popular extension for vector similarity search. Vector search has become increasingly crucial to semantic search and Retrieval Augmented Generation (RAG) applications, enhancing the long-term memory of large language models (LLMs).&lt;/p&gt;

&lt;p&gt;In both semantic search and RAG use cases, the database contains a knowledge base that the LLM wasn’t trained on, split into a series of texts or chunks. Each text is saved in a row and is associated with a vector generated by an embedding model such as &lt;a href="https://platform.openai.com/docs/guides/embeddings/embedding-models" rel="noopener noreferrer"&gt;OpenAI’s ada-embedding-002&lt;/a&gt; or &lt;a href="https://docs.mistral.ai/platform/client/#embeddings" rel="noopener noreferrer"&gt;Mistral-AI’s mistral-embed&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Vector search is then used to find the texts most similar (closest) to the query vector. This is achieved by comparing the query vector with every row in the database, making vector search hard to scale. This is why pgvector implemented &lt;a href="https://en.wikipedia.org/wiki/(1%2B%CE%B5)-approximate_nearest_neighbor_search" rel="noopener noreferrer"&gt;approximate nearest neighbor (ANN) algorithms&lt;/a&gt; (or indexes), which conduct the vector search over a subset of the database to avoid lengthy sequential scans.&lt;/p&gt;

&lt;p&gt;One of the most efficient ANN algorithms is the Hierarchical Navigable Small World (HNSW) index. Its graph-based and multi-layered nature is designed for billions-of-row vector search. This makes HNSW extremely fast and efficient at scale and one of the most popular indexes in the vector store market.&lt;/p&gt;
&lt;h2&gt;
  
  
  HNSW’s Achilles heel: memory and build time
&lt;/h2&gt;

&lt;p&gt;HNSW was first introduced by Yu. A. Malkov and D. A. Yashunin in their paper “Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs.”&lt;/p&gt;

&lt;p&gt;HNSW is a graph-based approach to indexing high-dimensional data. It constructs a hierarchy of graphs, where each layer is a subset of the previous one, which results in a time complexity of &lt;code&gt;O(log(rows))&lt;/code&gt;. During the search, it navigates through these graphs to quickly find the nearest neighbors.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--RvQVmOyF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/KqNnqzGKbZDgAGUD-Mbv_kUs4igPMlxV2t-L-OnHbMONP-KQ91MhNE1VwMhP9XHCjKGXXxFr6wpsBpGxaTR5z8PfiX4cmZPRs6c4MeU3IfvkliMJOQjjS4ghjdekfft16M2SZq7SNAaIBltie-VH7Mg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--RvQVmOyF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/KqNnqzGKbZDgAGUD-Mbv_kUs4igPMlxV2t-L-OnHbMONP-KQ91MhNE1VwMhP9XHCjKGXXxFr6wpsBpGxaTR5z8PfiX4cmZPRs6c4MeU3IfvkliMJOQjjS4ghjdekfft16M2SZq7SNAaIBltie-VH7Mg" alt="Post image" width="716" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As fast and efficient as HNSW is, the index has two drawbacks:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Memory&lt;/strong&gt;: The index requires significantly more memory than other indexes, such as the Inverted File Index (IVFFlat). You can solve the memory issue by having a larger database instance. But if you use standalone Postgres such as AWS RDS, you will find yourself in a position where you over-provision just for the index build. With Neon scaling capabilities, however, you can scale up, build the HNSW index, and then scale back down to save on cost.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--pKP2VR_3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/amQuw2B6GIie69Mfc_2QH1-13H9oVvur1pMutPy8XjosF8BFYAVtfKFlaqu7hQeE1Z6xU-zjqj_faSelXhj8EzulxztxZdprzCCGFE-HBaqPyvmzz9FZ337Mp-9pAGdWdK4cRq5DlQ7K5J6xRYFzqHA" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--pKP2VR_3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/amQuw2B6GIie69Mfc_2QH1-13H9oVvur1pMutPy8XjosF8BFYAVtfKFlaqu7hQeE1Z6xU-zjqj_faSelXhj8EzulxztxZdprzCCGFE-HBaqPyvmzz9FZ337Mp-9pAGdWdK4cRq5DlQ7K5J6xRYFzqHA" alt="Post image" width="800" height="482"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Build time&lt;/strong&gt;: The HNSW index can take hours to build for million-row datasets. This is mainly due to the time spent calculating the distance among vectors. And this is precisely what pgvector 0.6.0 solves by introducing &lt;a href="https://github.com/pgvector/pgvector/issues/409" rel="noopener noreferrer"&gt;Parallel Index Build&lt;/a&gt;. By allocating more CPU and workers, you build your HNSW index 30x faster.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--gN38rRGc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/-D_PE-Rd0kcV-U6x52yJjdwconeLHodXZ1MXTddB2p1q5-uFONE5Moem9RYmTLrB71uXKlA_sSyiN-viT1c9Xt26qbHHvFEwvGlXNEDgD1AmIgCak4GZPyvYQsX-4mwNWYAfGpc2nj31rp1cMihUKaM" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--gN38rRGc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/-D_PE-Rd0kcV-U6x52yJjdwconeLHodXZ1MXTddB2p1q5-uFONE5Moem9RYmTLrB71uXKlA_sSyiN-viT1c9Xt26qbHHvFEwvGlXNEDgD1AmIgCak4GZPyvYQsX-4mwNWYAfGpc2nj31rp1cMihUKaM" alt="Post image" width="716" height="386"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But wait! The HNSW index supports updates, so why is a parallel index build necessary if you only need to build the index once?&lt;/p&gt;

&lt;p&gt;Well, there are two cases where you need to create an HNSW index:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;When you want faster queries and to optimize for vector search&lt;/li&gt;
&lt;li&gt;When you already have an HNSW index, and you delete vectors from the table&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The latter might cause the indexed search to return false positives, negatively impacting the quality of the LLM response and the overall performance of your AI application.&lt;/p&gt;
&lt;h2&gt;
  
  
  Scale up and boost index build time
&lt;/h2&gt;

&lt;p&gt;pgvector 0.6.0 speeds up index build time up to 30 times compared to previous versions when using parallel workers. This improvement is especially notable when dealing with large data sets and vector sizes, such as OpenAI 1536 dimension vector embeddings.&lt;/p&gt;

&lt;p&gt;Creating an HNSW index can require significant resources because you need to allocate enough &lt;code&gt;maintenance_work_mem&lt;/code&gt; to fit the index in memory. Otherwise, the HNSW graph will take significantly longer to build.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;NOTICE:  hnsw graph no longer fits into maintenance_work_mem after 100000 tuples
DETAIL:  Building will take significantly longer.
HINT:  Increase maintenance_work_mem to speed up builds.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With Neon, you can scale up your Postgres instance using the Console or the API, configure it to build the index, and then scale back down to save on cost.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--orTawCEn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://neondatabase.wpengine.com/wp-content/uploads/2024/02/Export-1707305452931.mp4" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--orTawCEn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://neondatabase.wpengine.com/wp-content/uploads/2024/02/Export-1707305452931.mp4" alt="Neon Console Operation" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To effectively use parallel index build, it’s essential to configure Postgres with suitable settings. Key parameters to consider are:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;maintenance_work_mem&lt;/strong&gt;: This parameter determines the memory allocated for creating or rebuilding indexes. This parameter affects the performance and efficiency of these operations. Setting this to a high value, such as 8GB, allows for more efficient handling of the index build process.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SET maintenance_work_mem = '8GB';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;max_parallel_maintenance_workers&lt;/strong&gt;: This dictates the number of parallel workers that can be employed. The default value of max_parallel_maintenance_workers is typically set to 2 in Postgres. Setting this to a high number enables the utilization of more computing resources for faster index builds.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SET max_parallel_maintenance_workers = 7; -- plus leader
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
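&lt;p&gt;With those settings in place, the index build itself is a standard pgvector &lt;code&gt;CREATE INDEX&lt;/code&gt; statement. The table and column names below are illustrative:&lt;/p&gt;

```sql
-- m and ef_construction are optional tuning parameters (shown at their defaults).
CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
```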



&lt;blockquote&gt;
&lt;p&gt;Note: Neon currently supports pgvector 0.5.1. However, our engineering team is working on adding support for 0.6.0. Stay tuned.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  How does this affect recall performance?
&lt;/h2&gt;

&lt;p&gt;Recall is as important a metric as query execution time in RAG applications. Recall is the percentage of correct answers the ANN search provides. In the HNSW index, &lt;code&gt;ef_search&lt;/code&gt; is the parameter that determines the number of neighbors to scan at search time. The higher &lt;code&gt;ef_search&lt;/code&gt;, the higher the recall and the longer the query execution time.&lt;/p&gt;
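&lt;p&gt;In pgvector, this parameter is exposed as &lt;code&gt;hnsw.ef_search&lt;/code&gt; and can be adjusted per session:&lt;/p&gt;

```sql
-- Default is 40; raising it trades query speed for better recall.
SET hnsw.ef_search = 100;
```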

&lt;p&gt;&lt;a href="https://github.com/pgvector/pgvector/issues/409#issuecomment-1898605567" rel="noopener noreferrer"&gt;The tests conducted by Johnathan Katz&lt;/a&gt; show that using parallel builds has negligible impact on recall, with most changes swinging positively by over 1%. Despite the substantial speed improvements, this remarkable stability in recall rates highlights the effectiveness of pgvector 0.6.0’s parallel build process.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--JsJxTTw4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/P-_6yF4v7-mzDO-_AdCzEQlKNz7KqleEIFz1jje5YNktnWlZ-MU5VyillAUJjdo0CZ-ux2PILd7_llFpE_hawJ_kexmF2b6w9zJ6r2G-mZl0fr3IaKRZgfDiFf5VDmg9y8TKsOK-GdpXXC2ZIPU9c0A" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--JsJxTTw4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/P-_6yF4v7-mzDO-_AdCzEQlKNz7KqleEIFz1jje5YNktnWlZ-MU5VyillAUJjdo0CZ-ux2PILd7_llFpE_hawJ_kexmF2b6w9zJ6r2G-mZl0fr3IaKRZgfDiFf5VDmg9y8TKsOK-GdpXXC2ZIPU9c0A" alt="Recall Performance Graph" width="800" height="477"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;pgvector 0.6.0 represents a significant leap forward, proving that Postgres is an important player in the vector search space. By harnessing the power of parallel index building, developers can now construct HNSW indexes more rapidly and efficiently, significantly reducing the time and resources traditionally required for such tasks.&lt;/p&gt;

&lt;p&gt;Neon’s flexible and scalable serverless Postgres offering complements pgvector’s capabilities perfectly. Users can scale their database resources according to their specific needs for index building and then scale down to optimize costs, ensuring an economical yet powerful solution.&lt;/p&gt;

&lt;p&gt;What AI applications are you currently building? &lt;a href="https://console.neon.tech" rel="noopener noreferrer"&gt;Try pgvector on Neon today&lt;/a&gt;, join us on &lt;a href="https://neon.tech/discord" rel="noopener noreferrer"&gt;Discord&lt;/a&gt;, and let us know how we can improve your experience with serverless PostgreSQL.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>pgvector</category>
      <category>vector</category>
      <category>ai</category>
    </item>
    <item>
      <title>Bring Your Own Extensions to Serverless PostgreSQL</title>
      <dc:creator>Raouf Chebri</dc:creator>
      <pubDate>Wed, 17 Jan 2024 14:07:36 +0000</pubDate>
      <link>https://dev.to/neon-postgres/bring-your-own-extensions-to-serverless-postgresql-1ba8</link>
      <guid>https://dev.to/neon-postgres/bring-your-own-extensions-to-serverless-postgresql-1ba8</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy04pls5fo5okulg8itif.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy04pls5fo5okulg8itif.png" alt="Bring Your Own Extensions Cover" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Extensions in PostgreSQL are comparable to libraries in programming languages or plugins in web browsers. They are pivotal in the PostgreSQL ecosystem, providing additional functionalities ranging from encryption and AI to handling time series and geospatial data. More complex extensions can transform PostgreSQL into a graph or analytical database, and some companies even create custom private extensions for specific business logic.&lt;/p&gt;

&lt;p&gt;Neon’s compute is stateless PostgreSQL, which runs as a VM or a Kubernetes pod. The compute image comes with a &lt;a href="https://neon.tech/docs/extensions/pg-extensions" rel="noopener noreferrer"&gt;list of supported extensions&lt;/a&gt;. However, supporting a wide range of PostgreSQL extensions can pose performance and security risks in a multi-tenant serverless environment like Neon. This is why we are excited to announce we added &lt;a href="https://neon.tech/docs/extensions/pg-extensions#custom-built-extensions" rel="noopener noreferrer"&gt;support for private and custom extensions&lt;/a&gt; using Dynamic Extension Loading. &lt;/p&gt;

&lt;p&gt;This feature is currently in beta on request only. You can contact support if you want to bring your own extensions to Neon. In this article, we’ll introduce Dynamic Extension Loading, its implementation, its benefits, and our future plans.&lt;/p&gt;

&lt;h2&gt;
  
  
  Extensions in PostgreSQL
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9b4wtxtczmev9fq5d3gr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9b4wtxtczmev9fq5d3gr.png" alt="PostgreSQL Extension Ecosystem" width="800" height="398"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;PostgreSQL is a robust and versatile database system that is further enhanced by its support for extensions. Some of the most popular extensions are &lt;a href="https://postgis.net/" rel="noopener noreferrer"&gt;PostGIS&lt;/a&gt; for geospatial data, &lt;a href="https://www.postgresql.org/docs/current/pgstatstatements.html" rel="noopener noreferrer"&gt;pg_stat_statements&lt;/a&gt; for tracking query execution statistics, and &lt;a href="https://github.com/pgvector/pgvector" rel="noopener noreferrer"&gt;pgvector&lt;/a&gt; for vector similarity search. &lt;/p&gt;

&lt;p&gt;Extensions in PostgreSQL come in various forms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SQL Object Packages&lt;/strong&gt;: These are the most common, comprising domain-specific data types, functions, triggers, etc.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Procedural Languages&lt;/strong&gt;: Extensions like PLPython or PLV8 enable the use of different programming languages within PostgreSQL.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Internal API Enhancements&lt;/strong&gt;: Written in C, these powerful extensions can introduce new storage methods, volume replication, background jobs, and configuration parameters.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extensions in Other Languages&lt;/strong&gt;: Beyond C, extensions can be developed in languages like C++ or Rust, broadening the scope of functionality.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To use an extension, it must be built against the correct major version of PostgreSQL. Installation involves placing the extension’s control and SQL script files in the shared extension directory and its library files in the libdir, paths that vary across platforms. After placing the files, the &lt;code&gt;CREATE EXTENSION&lt;/code&gt; command is executed in the database, prompting PostgreSQL to locate and run the installation scripts for the extension.&lt;/p&gt;
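&lt;p&gt;For illustration, with a hypothetical extension named &lt;code&gt;my_ext&lt;/code&gt;, installing and inspecting it once the files are in place looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Hypothetical extension: my_ext.control and my_ext--1.0.sql are in
-- the shared extension directory, my_ext.so is in the libdir.
CREATE EXTENSION my_ext;

-- List the extensions installed in the current database:
SELECT extname, extversion FROM pg_extension;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;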

&lt;h2&gt;
  
  
  Extension support limitations in serverless environments
&lt;/h2&gt;

&lt;p&gt;In Neon's serverless PostgreSQL environment, each compute runs as an ephemeral Kubernetes pod or VM. A compute instance can be scaled up, down, and descheduled whenever the workload changes. Therefore, supporting a wide range of PostgreSQL extensions presents significant challenges such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Compatibility&lt;/strong&gt;: Many extensions are not designed for serverless architectures, particularly those needing persistent storage or deep system integration, such as &lt;a href="https://github.com/citusdata/pg_cron" rel="noopener noreferrer"&gt;pg_cron&lt;/a&gt; and &lt;a href="https://www.postgresql.org/docs/current/file-fdw.html" rel="noopener noreferrer"&gt;file_fdw&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance Issues&lt;/strong&gt;: Embedding all extensions in the compute image significantly increases its size, leading to slower start times and reduced performance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maintenance Overhead&lt;/strong&gt;: Traditional methods require frequent updates to the entire compute image for each extension update, causing potential service disruptions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security Risks&lt;/strong&gt;: A larger set of extensions in the base image increases the potential attack surface, especially with extensions that remain unused by many users.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Limited Customization&lt;/strong&gt;: The open-source nature of compute images restricts the inclusion of custom or closed-source extensions, limiting tailored solutions for specific customer needs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Therefore, the conventional method of bundling extension files into compute images is impractical due to the sheer number of extensions and the varied needs of users. This led us to rethink how we provide extensions with Dynamic Extension Loading.&lt;/p&gt;

&lt;h2&gt;
  
  
  Dynamic Extension Loading: A New Approach
&lt;/h2&gt;

&lt;p&gt;At Neon, we've addressed these challenges with our dynamic extension loading mechanism. Here's how it works:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Building and Storing Extensions&lt;/strong&gt;: We build extensions in a separate repository and store the resulting files in an S3 bucket.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Configuring Extensions&lt;/strong&gt;: Extensions are configured per user in the Neon control plane, enhancing customization.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;On-Demand Loading&lt;/strong&gt;: Compute instances download control files at startup, and library files are fetched as needed when extension functions are called.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiszbg91ueiv8onrg4fwu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiszbg91ueiv8onrg4fwu.png" alt="Custom Extension download diagram on Neon" width="800" height="176"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With Dynamic Extension Loading, private and default extensions can be added to compute instances without restarting, reducing maintenance overhead. Additionally, it brings performance benefits to Neon. Our plans with Dynamic Extension Loading include moving all default-supported extensions to the extension storage, resulting in a smaller compute image size and faster start times.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to bring your own extension to Neon
&lt;/h2&gt;

&lt;p&gt;To request support for a Postgres extension, paid plan users can &lt;a href="https://console.neon.tech/app/projects?modal=support" rel="noopener noreferrer"&gt;open a support ticket&lt;/a&gt;. Free plan users can submit a request via the feedback channel on our &lt;a href="https://discord.com/invite/92vNTzKDGp" rel="noopener noreferrer"&gt;Discord Server&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Our engineers will then evaluate your extension’s compatibility with Neon, build it, and upload the artifacts to the extension storage once it passes all the security tests.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This feature is currently in beta, with plans for general availability in the near future. This development marks a significant step forward in making PostgreSQL more adaptable and efficient in a serverless environment.&lt;/p&gt;

&lt;p&gt;What about you? Do you use PostgreSQL extensions in your projects? Join us on Discord and let us know which extensions you use and how we can enhance your PostgreSQL experience in the cloud.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>extensions</category>
      <category>cloud</category>
      <category>database</category>
    </item>
    <item>
      <title>Change Data Capture with Serverless Postgres</title>
      <dc:creator>Raouf Chebri</dc:creator>
      <pubDate>Thu, 21 Dec 2023 12:23:07 +0000</pubDate>
      <link>https://dev.to/neon-postgres/change-data-capture-with-serverless-postgres-823</link>
      <guid>https://dev.to/neon-postgres/change-data-capture-with-serverless-postgres-823</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk40ge716hp6gprsd0d0a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk40ge716hp6gprsd0d0a.png" alt="Cover Image" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Modern applications often rely on loosely coupled components and services that help teams and systems scale. These architectures generate continuous data streams that need to be replicated, processed, or analyzed. &lt;/p&gt;

&lt;p&gt;However, moving data between different data stores can compromise the quality and reliability of your decisions, because data can become inconsistent or corrupted during transformation. This is why &lt;a href="https://en.wikipedia.org/wiki/Change_data_capture" rel="noopener noreferrer"&gt;Change Data Capture (CDC)&lt;/a&gt; has emerged as one of the most popular methods to synchronize data across multiple data stores. One way to use CDC in Postgres is with &lt;a href="https://www.postgresql.org/docs/current/logical-replication.html#:~:text=Logical%20replication%20is%20a%20method,byte%2Dby%2Dbyte%20replication." rel="noopener noreferrer"&gt;logical replication&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Today, we’re excited to announce the release of logical replication in beta on Neon. This feature lets you stream your data hosted on Neon to external data stores, allowing for change data capture and real-time analytics.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why CDC matters
&lt;/h2&gt;

&lt;p&gt;CDC refers to the process of capturing changes made to data in a database – such as inserts, updates, and deletes – and then delivering these changes to downstream processes or systems. &lt;/p&gt;

&lt;p&gt;CDC operates by monitoring and capturing data changes in a source database as they occur. This is a departure from traditional batch processing, where data updates are transferred at scheduled intervals. CDC ensures that every change is captured and can be acted upon almost instantaneously.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Data Synchronization&lt;/strong&gt;: In a distributed system architecture, keeping data synchronized across various platforms and services is critical. CDC facilitates this by providing a mechanism for real-time data replication.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Minimizing Latency&lt;/strong&gt;: By capturing changes as they happen, CDC minimizes the latency in data transfer. This is essential for applications where even a slight delay in data availability can lead to significant issues, such as financial trading systems.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enabling Event-Driven Architectures&lt;/strong&gt;: CDC is a cornerstone for building event-driven systems. In such architectures, actions are triggered in response to data changes, making real-time data capture essential.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Warehousing and Real-Time Analytics&lt;/strong&gt;: For organizations relying on data warehouses and analytics tools for decision-making, CDC ensures that the data in these systems is current, enhancing the accuracy of insights.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now that we understand it better, let’s explore the technical mechanics of how CDC is implemented in Postgres through logical replication. &lt;/p&gt;

&lt;h2&gt;
  
  
  Logical replication: under the hood
&lt;/h2&gt;

&lt;p&gt;In Postgres, logical replication is one of the methods of implementing CDC and streaming data from your database to an external source. It uses a publisher-subscriber model. &lt;/p&gt;

&lt;p&gt;Your Neon database works as a publisher, copying first a snapshot of the data and then streaming changes to one or more target data stores (subscribers). This model allows for selective replication, where only specified tables or even specific columns within a table can be replicated.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F72iu7yda24nhm4jw8tfy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F72iu7yda24nhm4jw8tfy.png" alt="Neon as publisher" width="686" height="479"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Learn more about connecting &lt;a href="https://neon.tech/docs/guides/logical-replication-guide" rel="noopener noreferrer"&gt;Neon to different data stores&lt;/a&gt; in the documentation.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://www.postgresql.org/docs/current/wal-intro.html" rel="noopener noreferrer"&gt;Write-Ahead-Log (WAL)&lt;/a&gt; is a fundamental component in Postgres, designed to ensure data integrity and facilitate recovery. It records every change made to the database, including transactions and their states.&lt;/p&gt;

&lt;p&gt;For logical replication, the WAL serves as the primary data source. The WAL captures the comprehensive sequence of data changes, which are then decoded for replication purposes. Logical replication transforms the WAL to a format accepted by the subscriber through logical decoding, and the &lt;code&gt;walsender&lt;/code&gt; then streams the transformed data using the replication protocol. &lt;/p&gt;

&lt;p&gt;The &lt;code&gt;walsender&lt;/code&gt; initiates the logical decoding of the WAL using an output plugin. Postgres ships with several logical decoding plugins that can output the data in various formats. In addition, new plugins can be developed.&lt;/p&gt;

&lt;p&gt;For instance, in a Postgres-to-Postgres logical replication, the standard &lt;code&gt;pgoutput&lt;/code&gt; plugin transforms the data changes to the logical replication protocol. The transformed data is subsequently streamed using the replication protocol, which maps it to local tables and applies the changes in the exact sequence of the original transactions. However, integrations with non-Postgres systems require an output format different from the standard one specifically designed for Postgres-to-Postgres logical replication. &lt;/p&gt;

&lt;p&gt;Today’s data pipelines involve more than one data store type. For example, you can integrate all your Postgres databases into a data warehouse or streaming platform, such as &lt;a href="https://materialize.com/" rel="noopener noreferrer"&gt;Materialize&lt;/a&gt; or &lt;a href="https://kafka.apache.org/" rel="noopener noreferrer"&gt;Kafka&lt;/a&gt;, to process and analyze data at higher scales. This is why, with the release of logical replication on Neon, we added support for &lt;a href="https://github.com/eulerto/wal2json" rel="noopener noreferrer"&gt;wal2json&lt;/a&gt;, which outputs changes in the JSON format to be easily consumed by other systems and data stores.&lt;/p&gt;
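&lt;p&gt;To see what logical decoding produces, you can create a slot that uses the &lt;code&gt;wal2json&lt;/code&gt; output plugin and peek at pending changes (the slot name is illustrative, and &lt;code&gt;wal2json&lt;/code&gt; must be available on the server):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Create a logical replication slot that decodes changes with wal2json:
SELECT * FROM pg_create_logical_replication_slot('demo_slot', 'wal2json');

-- Peek at decoded changes without consuming them from the slot:
SELECT * FROM pg_logical_slot_peek_changes('demo_slot', NULL, NULL);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;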

&lt;p&gt;You can read more on &lt;a href="https://neon.tech/blog/cdc-with-materialize" rel="noopener noreferrer"&gt;Change Data Capture using Neon and Materialize&lt;/a&gt; by our friend Marta Paes, to learn how to integrate your database with external systems.&lt;/p&gt;
&lt;h2&gt;
  
  
  Logical vs. physical replication
&lt;/h2&gt;

&lt;p&gt;Logical replication differs from physical replication in that it replicates changes at the data level (row-level changes) rather than copying entire database blocks. This allows for more selective replication and reduces the amount of data transferred. Unlike snapshot replication, which provides a full copy of the data at a specific point in time, logical replication ensures continuous streaming of changes, making it more suitable for applications that require near real-time data availability.&lt;/p&gt;

&lt;p&gt;This comparison highlights the distinct characteristics, advantages, and applications of logical and physical replication.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Logical Replication&lt;/th&gt;
&lt;th&gt;Physical Replication&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Row-Level Changes&lt;/strong&gt;: focuses on replicating specific row-level changes (INSERT, UPDATE, DELETE) in selected tables.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Block-Level Replication&lt;/strong&gt;: replicates the entire database at the block level. It creates an exact copy of the source database, including all tables and system catalogs.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Flexibility&lt;/strong&gt;: Offers the flexibility to replicate specific tables and even specific columns within tables.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Limitations&lt;/strong&gt;: Doesn’t allow for selective table replication and requires the same PostgreSQL version on both the primary and standby servers.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;WAL-based&lt;/strong&gt;: Uses the WAL for capturing changes, but with logical decoding to convert these changes into a readable format for the subscriber.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Streaming Replication&lt;/strong&gt;: Changes are streamed as they are written to the WAL, minimizing lag.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Use Cases&lt;/strong&gt;: Ideal for situations requiring selective replication, minimal impact on the source database, or cross-version compatibility.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Use Cases&lt;/strong&gt;: Best suited for creating read-only replicas for load balancing, high availability, and disaster recovery solutions.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h2&gt;
  
  
  Get started with logical replication
&lt;/h2&gt;

&lt;p&gt;To enable logical replication, navigate to your project’s settings in the console, click the “Beta” tab, locate Logical Replication, and click the “Enable” button. &lt;/p&gt;

&lt;p&gt;Note that enabling logical replication will restart your compute instance, which will drop existing connections. A subscriber may also keep the connection to your Neon database active, preventing your Neon instance from scaling to zero.&lt;/p&gt;

&lt;p&gt;This action is also irreversible, and you will not be able to disable logical replication for your project.&lt;/p&gt;

&lt;p&gt;Ensure logical replication is enabled by running the following query in the SQL Editor within the Neon console or using psql on your terminal.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SHOW wal_level;

 wal_level 
-----------
 logical
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Create a publication
&lt;/h2&gt;

&lt;p&gt;Let’s assume you have the following &lt;code&gt;users&lt;/code&gt; table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE users (

  id SERIAL PRIMARY KEY,

  username VARCHAR(50) NOT NULL,

  email VARCHAR(100) NOT NULL

);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Execute the following query to create a publication for the &lt;code&gt;users&lt;/code&gt; table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE PUBLICATION users_publication FOR TABLE users;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
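
&lt;p&gt;On the subscriber side, which must be a non-Neon Postgres instance since Neon currently acts only as a publisher, a matching subscription would look roughly as follows (the connection string is a placeholder, and the &lt;code&gt;users&lt;/code&gt; table must already exist on the subscriber):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Run on the subscriber database:
CREATE SUBSCRIPTION users_subscription
  CONNECTION 'postgresql://user:password@host/dbname'
  PUBLICATION users_publication;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;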



&lt;p&gt;Learn more about &lt;a href="https://neon.tech/docs/guides/logical-replication-guide" rel="noopener noreferrer"&gt;how to connect Neon to different data stores&lt;/a&gt; in the documentation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Limitations
&lt;/h2&gt;

&lt;p&gt;While logical replication in Neon Postgres offers numerous benefits for real-time data synchronization and flexibility, it has some limitations:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Publisher, not a subscriber&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This release of logical replication on Neon is in beta, and for security reasons, it does not include subscriber capabilities at the moment. We are currently working on these security constraints, which should be supported in future releases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Logical replication and Auto-suspend&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In a logical replication setup, a subscriber may keep the connection to your Neon publisher database active to poll for changes or perform sync operations, preventing your Neon compute instance from scaling to zero. Some subscribers allow you to configure connection or sync frequency, which may be necessary to continue taking advantage of Neon’s Auto-suspend feature. Please refer to your subscriber’s documentation or contact their support team for details.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data Definition Language (DDL) Operations&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Logical replication in Postgres primarily handles Data Manipulation Language (DML) operations like INSERT, UPDATE, and DELETE. However, it does not automatically replicate Data Definition Language (DDL) operations such as CREATE TABLE, ALTER TABLE, or DROP TABLE. This means that schema changes in the publisher database are not directly replicated to the subscriber database.&lt;/p&gt;

&lt;p&gt;Manual intervention is required to replicate DDL changes. This can be done by applying the DDL changes separately in both the publisher and subscriber databases or by using third-party tools that can handle DDL replication.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Replication Lag&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In high-volume transaction environments, there is potential for replication lag. This is the time delay between a transaction being committed on the publisher and the same transaction being applied on the subscriber.&lt;/p&gt;

&lt;p&gt;It’s important to monitor replication lag and understand its impact, especially for applications that require near-real-time data consistency. Proper resource allocation and optimizing the network can help mitigate this issue.&lt;/p&gt;
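&lt;p&gt;One way to approximate lag on the publisher is to compare each slot’s confirmed position with the current WAL position (the exact columns available vary slightly across Postgres versions):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Approximate replication lag, in bytes, per replication slot:
SELECT slot_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS lag_bytes
FROM pg_replication_slots;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;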

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Logical replication is undoubtedly one of the most important features for modern applications. As we continue to develop its capabilities, we encourage you to test, experiment, and push the boundaries of what logical replication can do. Join us on Discord, and share your experiences, suggestions, and challenges with us. &lt;/p&gt;

&lt;p&gt;We can’t wait to see what you build with Neon.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>database</category>
      <category>cdc</category>
      <category>streaming</category>
    </item>
  </channel>
</rss>
